Moving from single fire to scheduled tasks with Puppeteer and Lighthouse Part 2

This article is part of a series covering the usage of Puppeteer and Lighthouse; you can view the first article of this series here.
This is a continuation of Moving from single fire to scheduled tasks with Puppeteer and Lighthouse Part 1, and we're going to jump right in where we left off in the previous article.

We're going to allow a repeat property to be added to the options object that is passed from the client. This will be in the same format that bull accepts, excluding Date instances, as we can only accept a string or number when using JSON:

ts
interface RepeatOpts {
  cron?: string; // Cron string
  tz?: string; // Timezone
  startDate?: string | number; // Start date when the repeat job should start repeating (only with cron).
  endDate?: string | number; // End date when the repeat job should stop repeating.
  limit?: number; // Number of times the job should repeat at max.
  every?: number; // Repeat every millis (cron setting cannot be used together with this setting.)
  count?: number; // The start value for the repeat iteration count.
}

To support this on the service side, we only need to add the repeat field to our options when adding our job to the queue. We can also allow the client to set a few more options that make sense for a client to control.

Once our job is created, we will also want a way to disable it, so we're going to pass our report identifier as our job id. We can then use this later to delete the job:

js
// Inside our `requestGenerateReport` function in `reports.js`
// Replacing our previous `reportGenerationQueue.add` code
const queueOptions = {
  jobId: id,
  removeOnComplete: true,
  removeOnFail: true // We have no way to handle this atm 
};
const allowedOptions = [
  "repeat",
  "backoff",
  "attempts",
  "delay"
];
allowedOptions
  // Add if the options has that key
  .filter(key => options.hasOwnProperty(key))
  .forEach(key => {
    queueOptions[key] = options[key];
  });
await reportGenerationQueue.add(document, queueOptions);
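
The whitelist above can also be pulled out into a small helper, which makes it easier to see that a client cannot override fixed values such as jobId. This is only a sketch: pickAllowedOptions and the example values are hypothetical names made up for illustration, not part of the original service.

```javascript
// Hypothetical helper: copy only whitelisted keys from the client-provided options
function pickAllowedOptions(options, allowedKeys) {
  const picked = {};
  for (const key of allowedKeys) {
    // Call the prototype method so a client-supplied `hasOwnProperty` key cannot interfere
    if (options != null && Object.prototype.hasOwnProperty.call(options, key)) {
      picked[key] = options[key];
    }
  }
  return picked;
}

// Example values only; in the service `jobId` would be the report identifier
const queueOptions = Object.assign(
  { jobId: "example-id", removeOnComplete: true, removeOnFail: true },
  pickAllowedOptions(
    { repeat: { cron: "* * * * *" }, jobId: "client-value", priority: 1 },
    ["repeat", "backoff", "attempts", "delay"]
  )
);

console.log(queueOptions);
```

Because jobId and priority are not in the allowed list, they are ignored even though the client sent them.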

Be aware that we've just given clients the ability to schedule a job every millisecond if they want to, and to create as many jobs as they want! A bad actor may take advantage of this and cause an outage for your service, so in production code you should validate these inputs to ensure they're within limits for the client. In a future article we're going to extend this further and introduce tokenization of these jobs, and also add some validation!
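
As a rough idea of what that validation could look like, here is a minimal sketch that checks the every and limit fields of the repeat options. The function name validateRepeat and the two limits are assumptions picked for illustration, and it does not attempt to check how often a cron string fires, which would need a cron parser.

```javascript
// Hypothetical limits; tune these for your own service
const MIN_EVERY_MS = 60 * 1000; // No more often than once a minute
const MAX_LIMIT = 100; // No more than 100 repeats per job

// Returns a list of problems, an empty list means the options look acceptable
function validateRepeat(repeat) {
  if (repeat === undefined) {
    return []; // A single fire job, nothing to check
  }
  if (typeof repeat !== "object" || repeat === null || Array.isArray(repeat)) {
    return ["repeat must be an object"];
  }
  const errors = [];
  if (repeat.cron !== undefined && repeat.every !== undefined) {
    errors.push("cron and every cannot be used together");
  }
  if (repeat.cron !== undefined && typeof repeat.cron !== "string") {
    errors.push("cron must be a string");
  }
  if (repeat.every !== undefined && !(Number.isInteger(repeat.every) && repeat.every >= MIN_EVERY_MS)) {
    errors.push(`every must be an integer of at least ${MIN_EVERY_MS}ms`);
  }
  if (repeat.limit !== undefined && !(Number.isInteger(repeat.limit) && repeat.limit >= 1 && repeat.limit <= MAX_LIMIT)) {
    errors.push(`limit must be an integer between 1 and ${MAX_LIMIT}`);
  }
  return errors;
}

console.log(validateRepeat({ every: 1 }));
console.log(validateRepeat({ cron: "* * * * *" }));
```

A route could reject the request with a 400 response whenever the returned list is not empty.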

Now in our removeReport function we will also need to delete the job from our queue:

js
// Inside our `removeReport` function before Promise.all
async function removeQueueJob() {
  const job = await reportGenerationQueue.getJob(id);
  if (!job) {
    return; // Already completed
  }
  // This may throw an error, but I'm unsure what to do with that atm, we should probably
  // do this function first so we can handle it
  await job.remove();
}
// Call the function so that Promise.all waits on its promise rather than the function itself
promises.push(removeQueueJob());
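
One possible way to act on the comment in the code above is to remove the queue job before anything else, so that a failure leaves the stored report untouched. The sketch below assumes a queue with a bull-like getJob(id) that resolves to a job with remove(); the function name removeQueueJobFirst and the removeDocuments callback are hypothetical.

```javascript
// `queue` only needs a bull-like `getJob(id)` resolving to a job with `remove()`
// `removeDocuments` is a hypothetical callback that deletes the stored report data
async function removeQueueJobFirst(queue, id, removeDocuments) {
  const job = await queue.getJob(id);
  if (job) {
    // If this throws we stop here, so the stored documents are left alone
    await job.remove();
  }
  await removeDocuments();
  return true;
}

// Usage with stand-in objects rather than a real queue
const emptyQueue = { getJob: async () => null };
removeQueueJobFirst(emptyQueue, "example-id", async () => console.log("Documents removed"))
  .then(removed => console.log("Removed:", removed));
```

The caller can then catch the error and decide whether to retry or to report the failure to the client.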

In our service we also want to swap out our single reportPath property and use results instead, which will be an array of objects with the keys id, path and createdAt:

js
// Inside our doReportWork function, replacing the assignment of `document`:
const document = Object.assign({ results: [] }, await getReport(payload.id));
document.results.push({
  id: uuid.v4(),
  path: reportPath,
  createdAt: new Date().toISOString()
});

This is a breaking change, so be aware that our client will no longer work. We're going to introduce a couple more routes and change the usage of getReportWithResult to getReport. We'll also create getReportResult and getReportResultDocument functions, delete our old getReportWithResult function, add a removeReportResult function, and update our removeReport function to delete each result document:

js
// In reports.js

// New function
export async function getReportResult(id, resultId, document = undefined) {
  document = document || await getReport(id);
  if (!document || !Array.isArray(document.results)) {
    return undefined;
  }
  return document.results
    .find(({ id }) => id === resultId);
}

// New function replacing `getReportWithResult`
export async function getReportResultDocument(id, resultId) {
  const result = await getReportResult(id, resultId);
  if (!result) {
    return undefined;
  }
  const reportJSONBuffer = await getDocument(result.path);
  if (!reportJSONBuffer) {
    throw new Error("Unable to find report result"); 
  }
  return JSON.parse(reportJSONBuffer.toString("utf-8"));
}

// New function
export async function removeReportResult(id, resultId) {
  const document = await getReport(id);
  const result = await getReportResult(id, resultId, document);
  if (!result) {
    return false;
  }
  await removeDocument(result.path);
  const newDocument = Object.assign({}, document, {
    results: document.results
      // Filter out our result
      .filter(({ id }) => id !== resultId)
  });
  const store = await getStore();
  await store.set(id, JSON.stringify(newDocument));
  // Report success so the route can respond with 204 rather than 404
  return true;
}

// Within `removeReport`, replacing the assignment of promises & the if statement for the reportPath
let promises = [];
if (document.results) {
  promises = promises.concat(
      document.results.map(({ path }) => removeDocument(path))
  );
}

And our new routes:

js
// At the top of our file
import { requestGenerateReport, getReport, removeReport, getReportResultDocument, removeReportResult } from "./reports.js";


// Replacing all the routes excluding `POST /report`
app.get("/report/:id", asyncHandler(async (request, response) => {
  const report = await getReport(request.params.id);
  if (!report) {
    return response.sendStatus(404); // We couldn't find it
  }
  response.set("Content-Type", "application/json");
  return response.send(report);
}));

app.get("/report/:id/result/:resultId", asyncHandler(async (request, response) => {
  const result = await getReportResultDocument(request.params.id, request.params.resultId);
  if (!result) {
    return response.sendStatus(404);
  }
  response.set("Content-Type", "application/json");
  return response.send(result);
}));

app.delete("/report/:id", asyncHandler(async (request, response) => {
  const foundAndDeleted = await removeReport(request.params.id);
  response.sendStatus(foundAndDeleted ? 204 : 404);
}));

app.delete("/report/:id/result/:resultId", asyncHandler(async (request, response) => {
  const foundAndDeleted = await removeReportResult(request.params.id, request.params.resultId);
  response.sendStatus(foundAndDeleted ? 204 : 404);
}));

Now we can delete individual report results, and also fetch individual report results!

If we allow for a query parameter of html in our GET /report/:id/result/:resultId route we can also serve up HTML directly, meaning that we can view the reports without first processing the result:

js
app.get("/report/:id/result/:resultId", asyncHandler(async (request, response) => {
  const result = await getReportResultDocument(request.params.id, request.params.resultId);
  if (!result) {
    return response.sendStatus(404);
  }
  if (request.query.html) {
    response.set("Content-Type", "text/html");
    // result.report is an HTML string
    return response.send(result.report);
  } else {
    // JSON by default with all the contents
    response.set("Content-Type", "application/json");
    return response.send(result);
  }
}));

Now in our client we're going to replace our checkResults function so that we can handle the results. You might also notice we're no longer automatically deleting the results; instead we allow each result to be deleted individually, as well as the entire report generation request (for ongoing reports):

js
function deleteReport(parentNode, identifier) {
  fetch(
    `/report/${identifier}`,
    {
      method: "DELETE"
    }
  )
    .then(() => parentNode.remove())
    .catch((error) => {
      console.warn(error);
      alert("Couldn't delete report!");
    })
}

function deleteResult(parentNode, identifier, resultIdentifier) {
  fetch(
    `/report/${identifier}/result/${resultIdentifier}`,
    {
      method: "DELETE"
    }
  )
    .then(() => parentNode.remove())
    .catch((error) => {
      console.warn(error);
      alert("Couldn't delete result!");
    })
}
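
One thing to keep in mind with the two functions above is that fetch only rejects on network failures, so a 404 or 500 response would still reach the then handler and remove the element. A small wrapper can turn those responses into errors. This is a sketch: deleteResource is a hypothetical name, and the fetchImpl parameter exists only so the function can be exercised without a browser.

```javascript
// `fetchImpl` defaults to the global fetch; it is injectable purely for testing
function deleteResource(url, fetchImpl = fetch) {
  return fetchImpl(url, { method: "DELETE" }).then(response => {
    if (!response.ok) {
      // Any status outside of 200-299 ends up in the caller's catch handler
      throw new Error(`DELETE ${url} failed with status ${response.status}`);
    }
    return response;
  });
}

// Usage with a stand-in fetch that always responds with 404
const notFoundFetch = () => Promise.resolve({ ok: false, status: 404 });
deleteResource("/report/example-id", notFoundFetch)
  .catch(error => console.warn(error.message));
```

deleteReport and deleteResult could then call deleteResource in place of fetch and keep their existing then and catch handlers.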

function displayReport(element, identifier, originalUrl, report, onDelete = undefined) {
  // Empty out the element
  while (element.firstChild) {
    element.removeChild(element.firstChild);
  }
  if (!report.results) {
    // Display waiting info
    const info = document.createElement("span");
    info.innerText = "Waiting for report generation";
    element.appendChild(info);
  } else {
    // Display list of results
    report.results
      .forEach(
        ({ id: resultId, createdAt }, index, array) => {
          const span = document.createElement("span");
          const link = document.createElement("a");
          link.innerText = `Report generated for ${originalUrl} at ${new Date(createdAt).toString()} (Report ${index + 1})`;
          link.href = `/report/${identifier}/result/${resultId}?html=1`;
          link.target = "_blank";
          span.appendChild(link);
          const deleteButton = document.createElement("button");
          deleteButton.innerText = "Delete";
          deleteButton.addEventListener("click", () => deleteResult(span, identifier, resultId));
          span.appendChild(deleteButton);
          element.appendChild(span);
          if (array.length > (index + 1)) {
            // Add a line break in between each report:
            element.appendChild(
              document.createElement("br")
            );
          }
        }
      );
  }

  // Allow deletion
  const deleteButton = document.createElement("button");
  deleteButton.innerText = "Delete Report";
  deleteButton.addEventListener("click", () => {
    deleteReport(element, identifier);
    if (onDelete) {
      onDelete();
    }
  });
  element.appendChild(deleteButton);
}

function checkResults(element, identifier, originalUrl, report = undefined) {

  const intervalHandle = setInterval(doCheck, 2500);
  let secondaryIntervalHandle;

  const onDelete = () => {
    if (secondaryIntervalHandle) {
      clearInterval(secondaryIntervalHandle);
    } else {
      clearInterval(intervalHandle);
    }
  };

  displayReport(element, identifier, originalUrl, report || { id: identifier, url: originalUrl }, onDelete);

  function doCheck() {
    fetch(`/report/${identifier}`)
      .then(response => response.json())
      .then(report => {
        if (report.results) {
          // Clear the old interval
          clearInterval(intervalHandle);

          // Now try every minute
          if (!secondaryIntervalHandle) {
            secondaryIntervalHandle = setInterval(doCheck, 60 * 1000);
          }
        }
        displayReport(element, identifier, originalUrl, report, onDelete);
      })
      .catch(console.warn);
  }
}

Now we want to be able to accept options, so next to our textarea for URL values we're going to have another textarea where a JSON object can be provided for the options to use. We're still going to force the output type to be html, however, so that will be one option that can't be changed:

html
<!-- After #urls -->
<textarea id="options" rows="10">{}</textarea>

We'll need to validate these options and merge them with our preset options. We also want to save our options to local storage so that we don't need to create them every time:

js
// Within script
const options = document.querySelector("#options");

// Try and init our options from storage
try {
  options.value = localStorage.getItem("report-options") || "{}";
} catch(e) {
  // localStorage may be unavailable, in which case we keep the textarea's default value
}

// Reset our options so they're nicely formatted
getOptions();

// When we blur, try and validate the options, and then make it pretty 
options.addEventListener("blur", getOptions);

function getOptions() {
  let providedOptions = {};
  try {
    const value = (options.value || "{}").trim();
    // If it's an object, good to go
    if (value && value[0] === "{" && value[value.length - 1] === "}") {
      providedOptions = JSON.parse(value);
    } else if (value) {
      // Not an object
      throw new Error("Invalid options!");
    }
  } catch (e) {
    return alert("Could not parse options as JSON");
  }
  // Set our options to a nice formatted version
  const string = JSON.stringify(providedOptions, undefined, "  ");
  try {
    localStorage.setItem("report-options", string);
  } catch(e) {
    // Saving is best effort, so carry on if localStorage is unavailable
  }
  options.value = string;
  // We want to receive an html output
  return Object.assign({}, providedOptions, { output: "html" })
}
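
The parsing and merging part of getOptions can also be written as a pure function with no DOM or localStorage access, which makes it easier to test. This is a sketch under that assumption; parseOptions is a hypothetical name, and it throws rather than showing an alert so the caller can decide how to report the problem.

```javascript
// Hypothetical pure version of the parsing step, no DOM or storage access
function parseOptions(text) {
  // Treat an empty or whitespace-only value as an empty object
  const value = (text || "").trim() || "{}";
  const parsed = JSON.parse(value); // Throws a SyntaxError on invalid JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new TypeError("Options must be a JSON object");
  }
  // The output type is always forced to html, whatever was provided
  return Object.assign({}, parsed, { output: "html" });
}

console.log(parseOptions('{ "output": "json", "repeat": { "every": 60000 } }'));
```

Since the forced output value is assigned last, an output key provided in the textarea is overwritten.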

Now when we send off our request to create our reports, we'll pass in our options:

js
function send(urls) {
  const options = getOptions();
  fetch("/report", {
    method: "POST",
    headers: {
      "Content-Type": "application/json"
    },
    body: JSON.stringify(urls.map(url => ({ url, options })))
  })
    .then(response => response.json())
    .then(displayResults)
    .catch(() => alert("Something went wrong while sending our request"));
}

Now if we input these options, and submit a URL, we should get a report every minute:

json
{
  "repeat": {
    "cron": "* * * * *"
  }
}
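
The five fields of the cron string above are minute, hour, day of month, month and day of week, so five asterisks means every minute. A few other combinations built from the fields of the RepeatOpts interface shown earlier might look like the following; the schedules and the timezone are example values only.

```javascript
// Example values only, each object could be pasted into the options textarea as JSON
const repeatExamples = {
  everyMinute: { repeat: { cron: "* * * * *" } },
  // Minute field of */15 fires at 0, 15, 30 and 45 minutes past each hour
  everyFifteenMinutes: { repeat: { cron: "*/15 * * * *" } },
  // Minute 0 of hour 9 every day, evaluated in the given timezone
  dailyAtNine: { repeat: { cron: "0 9 * * *", tz: "Pacific/Auckland" } },
  // `every` is in milliseconds and cannot be combined with `cron`
  everyFiveMinutesTwelveTimes: { repeat: { every: 5 * 60 * 1000, limit: 12 } }
};

console.log(JSON.stringify(repeatExamples.everyFiveMinutesTwelveTimes, undefined, "  "));
```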

Now that we have reports generating, it would be great if we could list all the reports that are available. For this we're going to need a function called listReports, but we'll first need to promisify the keys function in store.js:

js
function getPromiseClient(client) {
  client.set = promisify(client.set.bind(client));
  client.get = promisify(client.get.bind(client));
  client.del = promisify(client.del.bind(client));
  client.keys = promisify(client.keys.bind(client));
  return client;
}

Next we'll make our listReports function, which first lists all the keys starting with report: and then returns the stored value for each key:

js
// In our reports.js file
export async function listReports() {
  const store = await getStore();
  const keys = await store.keys("report:*");
  return Promise.all(
    keys.map(key => getReport(key))
  );
}
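
The Redis KEYS command walks the entire keyspace in one blocking call, which is fine for a small demo but can stall a busy server. Redis also offers the cursor based SCAN command for this. The sketch below assumes store.scan has been promisified in the same way as the other commands (that is not part of the original store.js) and that it resolves to a pair of next cursor and keys; scanKeys is a hypothetical name.

```javascript
// Assumes `store.scan` is promisified like `store.keys`, this is not in the original store.js
async function scanKeys(store, pattern, count = 100) {
  const keys = new Set(); // SCAN may return the same key more than once
  let cursor = "0";
  do {
    const [nextCursor, batch] = await store.scan(cursor, "MATCH", pattern, "COUNT", count);
    batch.forEach(key => keys.add(key));
    cursor = String(nextCursor);
  } while (cursor !== "0"); // A cursor of 0 means the iteration is complete
  return Array.from(keys);
}

// Usage with a stand-in store that serves two pages of keys
const pages = {
  "0": ["17", ["report:a", "report:b"]],
  "17": ["0", ["report:b", "report:c"]]
};
const fakeStore = { scan: async cursor => pages[cursor] };
scanKeys(fakeStore, "report:*").then(keys => console.log(keys));
```

Inside listReports, the call to store.keys could then be swapped for scanKeys(store, "report:*").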

And then create a GET /report route:

js
// At the top of our file
import { listReports } from "./reports.js";

// After our other routes
app.get("/report", asyncHandler(async (request, response) => {
  response.json(await listReports());
}));

Now when our client loads we want to load all available reports:

js
// Within our script in index.html
function loadReports() {
  fetch("/report", {
    headers: {
      Accept: "application/json"
    }
  })  
    .then(response => response.json())
    .then(reports => {
      reports.forEach(
        report => {
          const element = document.createElement("li");
          results.appendChild(element);
          checkResults(element, report.id, report.url, report);
        }
      );
    })
    .catch(error => {
      alert("Unable to load reports");
      console.warn(error);
    });
}
loadReports();

Now when we re-load our client we should see a bunch of reports that we have previously created, and see our ongoing reports being generated!

In the next article we're going to cover creating meta reports derived from the information collected in this process.

Fabian Cook

Software Engineer @ Dovetail

JavaScript Developer.


Sep 20, 2020
