Moving from single fire to scheduled tasks with Puppeteer and Lighthouse Part 2

September 20, 2020
This article is part of a series covering the usage of Puppeteer and Lighthouse; you can view the first article of this series here.
This is a continuation of Moving from single fire to scheduled tasks with Puppeteer and Lighthouse Part 1, so we're going to jump right in where we left off in the previous article.
We're going to allow a repeat property to be added to the options object that is passed from the client. This will be in the same format that bull accepts, excluding Date instances, because we can only accept a string or a number when using JSON:
interface RepeatOpts {
cron?: string; // Cron string
tz?: string; // Timezone
startDate?: string | number; // Start date when the repeat job should start repeating (only with cron).
endDate?: string | number; // End date when the repeat job should stop repeating.
limit?: number; // Number of times the job should repeat at max.
every?: number; // Repeat every millis (cron setting cannot be used together with this setting.)
count?: number; // The start value for the repeat iteration count.
}
To support this on the service side, we only need to add the repeat field to our options when adding our job to the queue. We can also allow the client to set a few more options that make sense for a client to control (backoff, attempts and delay).
Once our job is created, we will also want a way to disable it, so we're going to pass our report identifier as our job id. We can then use it later to delete the job:
// Inside our `requestGenerateReport` function in `reports.js`
// Replacing our previous `reportGenerationQueue.add` code
const queueOptions = {
jobId: id,
removeOnComplete: true,
removeOnFail: true // We have no way to handle this atm
};
const allowedOptions = [
"repeat",
"backoff",
"attempts",
"delay"
];
allowedOptions
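// Copy the key across if the client provided it
// (Object.prototype.hasOwnProperty.call still works if the client sends a "hasOwnProperty" key)
.filter(key => Object.prototype.hasOwnProperty.call(options, key))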
.forEach(
key => queueOptions[key] = options[key]
);
await reportGenerationQueue.add(document, queueOptions);
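If you'd like to unit test this option handling, the copying step can be pulled out into a small pure function. This is a sketch: buildQueueOptions and ALLOWED_QUEUE_OPTIONS are hypothetical names mirroring queueOptions and allowedOptions above. It checks keys with Object.prototype.hasOwnProperty.call, so a JSON key named hasOwnProperty can't break the lookup, and it tolerates a request with no options at all.

```javascript
// Hypothetical name mirroring `allowedOptions` above
const ALLOWED_QUEUE_OPTIONS = ["repeat", "backoff", "attempts", "delay"];

// Hypothetical helper: build the options object passed to `reportGenerationQueue.add`
function buildQueueOptions(id, clientOptions) {
  const source = clientOptions || {}; // Treat missing options as "no extra options"
  const queueOptions = {
    jobId: id, // Always our report id, the client can't override it
    removeOnComplete: true,
    removeOnFail: true
  };
  for (const key of ALLOWED_QUEUE_OPTIONS) {
    // Only copy keys the client actually set on its own object
    if (Object.prototype.hasOwnProperty.call(source, key)) {
      queueOptions[key] = source[key];
    }
  }
  return queueOptions;
}

// Usage: "priority" isn't in the allowed list, so it is dropped
const example = buildQueueOptions("example-id", {
  repeat: { cron: "* * * * *" },
  priority: 1
});
console.log(Object.keys(example));
```

Keeping jobId outside the allowed list means a client can never point its options at another report's job.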
Be aware that we've just given clients the ability to schedule a job every millisecond if they want to, and to create as many jobs as they like! A bad actor could take advantage of this and cause an outage for your service, so in production code you should validate these inputs to ensure they're within limits for the client. In a future article we're going to extend this further by introducing tokenization of these jobs and adding some validation.
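As a starting point for that validation, here is a hedged sketch that checks the repeat options before the job is queued. The limits MIN_EVERY_MS and MAX_LIMIT and the name validateRepeatOptions are hypothetical. Rejecting cron and every together follows the note in the RepeatOpts interface, while requiring one of them is our own policy. It does not check how often a cron expression fires, so "* * * * *" would still run every minute.

```javascript
// Hypothetical limits, pick values that suit your service
const MIN_EVERY_MS = 60 * 1000; // No more often than once a minute
const MAX_LIMIT = 1000; // No more than 1000 repetitions per job

// Returns a list of problems; an empty list means the options look acceptable
function validateRepeatOptions(repeat) {
  const errors = [];
  if (repeat === undefined) {
    return errors; // Not a repeating job
  }
  if (typeof repeat !== "object" || repeat === null || Array.isArray(repeat)) {
    return ["repeat must be an object"];
  }
  const hasCron = repeat.cron !== undefined;
  const hasEvery = repeat.every !== undefined;
  if (hasCron && hasEvery) {
    errors.push("cron and every cannot be used together");
  }
  if (!hasCron && !hasEvery) {
    // Our own policy: a repeat must say how often to run
    errors.push("either cron or every is required");
  }
  if (hasCron && typeof repeat.cron !== "string") {
    errors.push("cron must be a string");
  }
  if (hasEvery && (!Number.isFinite(repeat.every) || repeat.every < MIN_EVERY_MS)) {
    errors.push(`every must be a number of at least ${MIN_EVERY_MS}ms`);
  }
  if (
    repeat.limit !== undefined &&
    (!Number.isInteger(repeat.limit) || repeat.limit < 1 || repeat.limit > MAX_LIMIT)
  ) {
    errors.push(`limit must be an integer between 1 and ${MAX_LIMIT}`);
  }
  return errors;
}

console.log(validateRepeatOptions({ every: 1 }));
```

A route could respond with a 400 and the returned messages whenever the list isn't empty.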
Now, in our removeReport function, we will also need to delete the job from our queue:
// Inside our `removeReport` function before Promise.all
async function removeQueueJob() {
const job = await reportGenerationQueue.getJob(id);
if (!job) {
return; // Already completed
}
// This may throw an error, but I'm unsure what to do with that atm, we should probably
// do this function first so we can handle it
await job.remove();
}
promises.push(removeQueueJob()); // Push the promise, not the function, so Promise.all waits for it
In our service we also want to swap out our single reportPath property and instead use results, an array of objects with the keys id, path and createdAt:
// Inside our doReportWork function, replacing the assignment of `document`:
const document = Object.assign({ results: [] }, await getReport(payload.id));
document.results.push({
id: uuid.v4(),
path: reportPath,
createdAt: new Date().toISOString()
});
This means we've introduced a breaking change, so be aware that our client will no longer work until we update it. We're going to introduce a couple more routes, change the usage of getReportWithResult to getReport, create getReportResult and getReportResultDocument functions, delete our old getReportWithResult function, add a removeReportResult function, and update our removeReport function to delete each result document:
// In reports.js
// New function
export async function getReportResult(id, resultId, document = undefined) {
document = document || await getReport(id);
if (!(document && Array.isArray(document.results))) {
return undefined; // No report, or no results yet
}
return document.results
.find(({ id }) => id === resultId);
}
// New function replacing `getReportWithResult`
export async function getReportResultDocument(id, resultId) {
const result = await getReportResult(id, resultId);
if (!result) {
return undefined;
}
const reportJSONBuffer = await getDocument(result.path);
if (!reportJSONBuffer) {
throw new Error("Unable to find report result");
}
return JSON.parse(reportJSONBuffer.toString("utf-8"));
}
// New function
export async function removeReportResult(id, resultId) {
const document = await getReport(id);
const result = await getReportResult(id, resultId, document);
if (!result) {
return false;
}
await removeDocument(result.path);
const newDocument = Object.assign({}, document, {
results: document.results
// Filter out our result
.filter(({ id }) => id !== resultId)
});
const store = await getStore();
await store.set(id, JSON.stringify(newDocument));
return true; // Our DELETE route uses this to choose between 204 and 404
}
// Within `removeReport`, replacing the assignment of promises & the if statement for the reportPath
let promises = [];
if (document.results) {
promises = promises.concat(
document.results.map(({ path }) => removeDocument(path))
);
}
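To exercise the result bookkeeping without Redis or stored report files, the same logic can be sketched against an in-memory Map. This is an illustration only: createMemoryReports, the Map store and the sample ids and paths are hypothetical stand-ins for getStore, getDocument and removeDocument, the functions are synchronous for brevity, and deleting the result file itself is left out.

```javascript
// Hypothetical in-memory stand-in for the Redis-backed report store
function createMemoryReports() {
  const store = new Map(); // report id -> JSON string, like store.set/store.get

  function getReport(id) {
    const value = store.get(id);
    return value === undefined ? undefined : JSON.parse(value);
  }

  function addResult(id, result) {
    // Mirrors doReportWork: start with an empty results array if there isn't one yet
    const document = Object.assign({ results: [] }, getReport(id));
    document.results.push(result);
    store.set(id, JSON.stringify(document));
  }

  function getReportResult(id, resultId, document = getReport(id)) {
    if (!(document && Array.isArray(document.results))) {
      return undefined;
    }
    return document.results.find((result) => result.id === resultId);
  }

  function removeReportResult(id, resultId) {
    const document = getReport(id);
    if (!getReportResult(id, resultId, document)) {
      return false; // The route responds with 404
    }
    const results = document.results.filter((result) => result.id !== resultId);
    store.set(id, JSON.stringify(Object.assign({}, document, { results })));
    return true; // The route responds with 204
  }

  return { store, getReport, addResult, getReportResult, removeReportResult };
}

const reports = createMemoryReports();
reports.store.set("report-1", JSON.stringify({ id: "report-1", url: "https://example.com" }));
reports.addResult("report-1", { id: "result-1", path: "results/1.json", createdAt: new Date().toISOString() });
console.log(reports.removeReportResult("report-1", "result-1")); // true
console.log(reports.removeReportResult("report-1", "result-1")); // false, it's already gone
```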
And our new routes:
// At the top of our file
import { requestGenerateReport, getReport, removeReport, getReportResultDocument, removeReportResult } from "./reports.js";
// Replacing all the routes excluding `POST /report`
app.get("/report/:id", asyncHandler(async (request, response) => {
const report = await getReport(request.params.id);
if (!report) {
return response.sendStatus(404); // We couldn't find it
}
response.set("Content-Type", "application/json");
return response.send(report);
}));
app.get("/report/:id/result/:resultId", asyncHandler(async (request, response) => {
const result = await getReportResultDocument(request.params.id, request.params.resultId);
if (!result) {
return response.sendStatus(404);
}
response.set("Content-Type", "application/json");
return response.send(result);
}));
app.delete("/report/:id", asyncHandler(async (request, response) => {
const foundAndDeleted = await removeReport(request.params.id);
response.sendStatus(foundAndDeleted ? 204 : 404);
}));
app.delete("/report/:id/result/:resultId", asyncHandler(async (request, response) => {
const foundAndDeleted = await removeReportResult(request.params.id, request.params.resultId);
response.sendStatus(foundAndDeleted ? 204 : 404);
}));
Now we can both fetch and delete individual report results!
If we allow for an html query parameter in our GET /report/:id/result/:resultId route, we can also serve up HTML directly, meaning that we can view the reports without first processing the result:
app.get("/report/:id/result/:resultId", asyncHandler(async (request, response) => {
const result = await getReportResultDocument(request.params.id, request.params.resultId);
if (!result) {
return response.sendStatus(404);
}
if (request.query.html) {
response.set("Content-Type", "text/html");
// result.report is an HTML string
return response.send(result.report);
} else {
// JSON by default with all the contents
response.set("Content-Type", "application/json");
return response.send(result);
}
}));
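One thing to keep in mind: Express parses query values into strings (or arrays when a key is repeated), so ?html=0 or ?html=false is still truthy and would still serve HTML. A small helper can make the flag stricter; isEnabledFlag and the accepted spellings are hypothetical choices, not something Express provides.

```javascript
// Hypothetical helper: interpret a query string flag such as `?html=1`
function isEnabledFlag(value) {
  if (Array.isArray(value)) {
    // `?html=1&html=0` can arrive as an array; use the last value
    value = value[value.length - 1];
  }
  if (typeof value !== "string") {
    return false;
  }
  return ["1", "true", "yes"].includes(value.trim().toLowerCase());
}

// Usage inside the route: if (isEnabledFlag(request.query.html)) { ... }
console.log(isEnabledFlag("1"), isEnabledFlag("0"));
```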
Now in our client we're going to replace our checkResults function so that we can handle the results. You might also notice that we're no longer automatically deleting the results; instead, each result can be deleted individually, as can the entire report generation request (for ongoing reports):
function deleteReport(parentNode, identifier) {
fetch(
`/report/${identifier}`,
{
method: "DELETE"
}
)
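.then((response) => {
// fetch only rejects on network errors, so check the status ourselves
if (!response.ok) {
throw new Error(`Unexpected status ${response.status}`);
}
parentNode.remove();
})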
.catch((error) => {
console.warn(error);
alert("Couldn't delete report!");
})
}
function deleteResult(parentNode, identifier, resultIdentifier) {
fetch(
`/report/${identifier}/result/${resultIdentifier}`,
{
method: "DELETE"
}
)
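.then((response) => {
// fetch only rejects on network errors, so check the status ourselves
if (!response.ok) {
throw new Error(`Unexpected status ${response.status}`);
}
parentNode.remove();
})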
.catch((error) => {
console.warn(error);
alert("Couldn't delete result!");
})
}
function displayReport(element, identifier, originalUrl, report, onDelete = undefined) {
// Empty out the element
while (element.firstChild) {
element.removeChild(element.firstChild);
}
if (!report.results) {
// Display waiting info
const info = document.createElement("span");
info.innerText = "Waiting for report generation";
element.appendChild(info);
} else {
// Display list of results
report.results
.forEach(
({ id: resultId, createdAt }, index, array) => {
const span = document.createElement("span");
const link = document.createElement("a");
link.innerText = `Report generated for ${originalUrl} at ${new Date(createdAt).toString()} (Report ${index + 1})`;
link.href = `/report/${identifier}/result/${resultId}?html=1`;
link.target = "_blank";
span.appendChild(link);
const deleteButton = document.createElement("button");
deleteButton.innerText = "Delete";
deleteButton.addEventListener("click", () => deleteResult(span, identifier, resultId));
span.appendChild(deleteButton);
element.appendChild(span);
if (array.length > (index + 1)) {
// Add a line break in between each report:
element.appendChild(
document.createElement("br")
);
}
}
);
}
// Allow deletion
const deleteButton = document.createElement("button");
deleteButton.innerText = "Delete Report";
deleteButton.addEventListener("click", () =>{
deleteReport(element, identifier);
if (onDelete) {
onDelete();
}
});
element.appendChild(deleteButton);
}
function checkResults(element, identifier, originalUrl, report = undefined) {
const intervalHandle = setInterval(doCheck, 2500);
let secondaryIntervalHandle;
const onDelete = () => {
if (secondaryIntervalHandle) {
clearInterval(secondaryIntervalHandle);
} else {
clearInterval(intervalHandle);
}
};
displayReport(element, identifier, originalUrl, report || { id: identifier, url: originalUrl }, onDelete);
function doCheck() {
fetch(`/report/${identifier}`)
.then(response => response.json())
.then(report => {
if (report.results) {
// Clear the old interval
clearInterval(intervalHandle);
// Now try every minute
if (!secondaryIntervalHandle) {
secondaryIntervalHandle = setInterval(doCheck, 60 * 1000);
}
}
displayReport(element, identifier, originalUrl, report, onDelete);
})
.catch(console.warn)
}
}
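The two interval handles in checkResults work, but the switch from the fast to the slow interval is spread across a few places. An alternative is a single setTimeout chain that picks the next delay from the latest report. This sketch assumes check returns a promise resolving to the report JSON; startPolling and pollDelayFor are hypothetical names. Because the next check is only scheduled after the previous one settles, slow responses can't pile up.

```javascript
const FAST_POLL_MS = 2500; // While waiting for the first result
const SLOW_POLL_MS = 60 * 1000; // Once results are coming in

// Hypothetical helper: pick the delay before the next check
function pollDelayFor(report) {
  return report && Array.isArray(report.results) ? SLOW_POLL_MS : FAST_POLL_MS;
}

// Hypothetical helper: call `check` repeatedly and hand each report to `onReport`.
// Returns a function that stops polling.
function startPolling(check, onReport) {
  let handle;
  let stopped = false;
  async function tick() {
    let report;
    try {
      report = await check();
      if (!stopped) {
        onReport(report);
      }
    } catch (error) {
      console.warn(error); // `report` may be undefined here, so we fall back to the fast delay
    }
    if (!stopped) {
      handle = setTimeout(tick, pollDelayFor(report));
    }
  }
  handle = setTimeout(tick, FAST_POLL_MS);
  return function stop() {
    stopped = true;
    clearTimeout(handle);
  };
}

// Possible usage inside checkResults:
// const stop = startPolling(
//   () => fetch(`/report/${identifier}`).then(response => response.json()),
//   report => displayReport(element, identifier, originalUrl, report, stop)
// );
```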
Now we want to be able to accept options, so next to our textarea for URL values we're going to add another textarea where a JSON object can be provided for the options to use. We're still going to force the output to be html, however, so that is one option that can't be changed:
<!-- After #urls -->
<textarea id="options" rows="10">{}</textarea>
We'll need to validate these options and merge them with our preset options. We also want to save our options to local storage so that we don't need to re-enter them every time:
// Within script
const options = document.querySelector("#options");
// Try and init our options from storage
try {
options.value = localStorage.getItem("report-options") || "{}";
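} catch(e) {
// localStorage can be unavailable (for example when storage is blocked), so keep the textarea's default
}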
// Reset our options so they're nicely formatted
getOptions();
// When we blur, try and validate the options, and then make it pretty
options.addEventListener("blur", getOptions);
function getOptions() {
let providedOptions = {};
try {
const value = (options.value || "{}").trim();
// If its an object, good to go
if (value && value[0] === "{" && value[value.length - 1] === "}") {
providedOptions = JSON.parse(value);
} else if (value) {
// Not an object
throw new Error("Invalid options!");
}
} catch (e) {
alert("Could not parse options as JSON");
return undefined; // Callers should check for this before sending a request
}
// Set our options to a nice formatted version
const string = JSON.stringify(providedOptions, undefined, " ");
try {
localStorage.setItem("report-options", string);
} catch(e) {
// Saving is best effort; the options still apply to this request
}
options.value = string;
// We want to receive an html output
return Object.assign({}, providedOptions, { output: "html" })
}
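The parsing half of getOptions can be separated from the DOM and localStorage code so it can be checked on its own. In this hypothetical parseOptionsText, a failure is returned as a value rather than shown with alert, and JSON.parse does the actual validation, followed by a check that the result is a plain object (arrays and null are valid JSON but aren't usable as options).

```javascript
// Hypothetical helper: turn the textarea contents into report options
function parseOptionsText(text) {
  const value = (text || "").trim() || "{}"; // An empty textarea means "no options"
  let parsed;
  try {
    parsed = JSON.parse(value);
  } catch (error) {
    return { ok: false, error: "Could not parse options as JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "Options must be a JSON object" };
  }
  return {
    ok: true,
    // Pretty version to write back into the textarea and localStorage
    formatted: JSON.stringify(parsed, undefined, " "),
    // We always want an html output, whatever the client provided
    options: Object.assign({}, parsed, { output: "html" })
  };
}

console.log(parseOptionsText('{"repeat":{"cron":"* * * * *"}}').options);
```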
Now when we send off our request to create our reports, we'll pass in our options:
function send(urls) {
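const options = getOptions();
if (!options) {
return; // The options couldn't be parsed, and getOptions has already told the user
}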
fetch("/report", {
method: "POST",
headers: {
"Content-Type": "application/json"
},
body: JSON.stringify(urls.map(url => ({ url, options })))
})
.then(response => response.json())
.then(displayResults)
.catch(() => alert("Something went wrong while sending our request"))
}
Now if we input these options, and submit a URL, we should get a report every minute:
{
"repeat": {
"cron": "* * * * *"
}
}
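The same RepeatOpts fields can express other schedules. These payloads are illustrative only: cron with limit stops after a fixed number of repeats, and every takes a number of milliseconds instead of a cron expression (the two can't be combined). They're written as JavaScript objects here; the JSON.stringify output is what would go into the options textarea.

```javascript
// At minute 0 of every hour, stopping after 24 repeats
const hourlyForADay = { repeat: { cron: "0 * * * *", limit: 24 } };

// Every 15 minutes, expressed in milliseconds
const everyFifteenMinutes = { repeat: { every: 15 * 60 * 1000 } };

console.log(JSON.stringify(hourlyForADay, undefined, " "));
console.log(JSON.stringify(everyFifteenMinutes, undefined, " "));
```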
Now that we have reports generating, it would be great if we could list all the reports that are available. For this we're going to need a function called listReports, but first we'll need to promisify the keys function in store.js:
function getPromiseClient(client) {
client.set = promisify(client.set.bind(client));
client.get = promisify(client.get.bind(client));
client.del = promisify(client.del.bind(client));
client.keys = promisify(client.keys.bind(client));
return client;
}
Next we'll make our listReports function by listing all the keys starting with report: and then returning the report stored under each key:
// In our reports.js file
export async function listReports() {
const store = await getStore();
const keys = await store.keys("report:*");
return Promise.all(
keys.map(key => getReport(key))
);
}
And then create a GET /report route:
// At the top of our file (or add listReports to our existing import from "./reports.js")
import { listReports } from "./reports.js";
// After our other routes
app.get("/report", asyncHandler(async (request, response) => {
response.json(await listReports());
}));
Now, when our client loads, we want to load all available reports:
// Within our script in index.html
function loadReports() {
fetch("/report", {
headers: {
Accept: "application/json"
}
})
.then(response => response.json())
.then(reports => {
reports.forEach(
report => {
const element = document.createElement("li");
results.appendChild(element);
checkResults(element, report.id, report.url, report);
}
);
})
.catch(error => {
alert("Unable to load reports");
console.warn(error);
});
}
loadReports();
Now when we reload our client we should see the reports that we have previously created, along with any ongoing reports still being generated!
In the next article we're going to cover creating meta reports derived from the information collected in this process.

Fabian Cook
Software Engineer @ Dovetail
JavaScript Developer.