Moving from single fire to scheduled tasks with Puppeteer and Lighthouse Part 2
This article is part of a series covering the usage of Puppeteer and Lighthouse; you can view the first article of this series here.
This is a continuation of Moving from single fire to scheduled tasks with Puppeteer and Lighthouse Part 1, so we're going to jump right in where we left off in the previous article.
We're going to allow a `repeat` property to be added to the `options` object that is passed from the client. This will be in the same format that bull accepts (excluding the usage of `Date` instances, as we will only be able to accept `string` or `number` when using JSON):
```ts
interface RepeatOpts {
  cron?: string; // Cron string
  tz?: string; // Timezone
  startDate?: string | number; // Start date when the repeat job should start repeating (only with cron).
  endDate?: string | number; // End date when the repeat job should stop repeating.
  limit?: number; // Number of times the job should repeat at max.
  every?: number; // Repeat every millis (cron setting cannot be used together with this setting.)
  count?: number; // The start value for the repeat iteration count.
}
```
To support this on the service side, we only need to add the `repeat` property to our options when adding our job to the queue. We can also allow the client to set a few more options that make sense for a client to be able to access.
Once our job is created, we will also want a way to disable it, so we're going to pass our report identifier as our job id. We can then later use this to delete the job:
```js
// Inside our `requestGenerateReport` function in `reports.js`
// Replacing our previous `reportGenerationQueue.add` code
const queueOptions = {
  jobId: id,
  removeOnComplete: true,
  removeOnFail: true // We have no way to handle this atm
};
const allowedOptions = [
  "repeat",
  "backoff",
  "attempts",
  "delay"
];
allowedOptions
  // Add if the options has that key
  .filter(key => options.hasOwnProperty(key))
  .forEach(
    key => queueOptions[key] = options[key]
  );
await reportGenerationQueue.add(document, queueOptions);
```
Be aware that we've just given clients the ability to schedule a job every millisecond if they wanted to, and to create as many jobs as they want! A bad actor may take advantage of this and cause an outage for your service, so in production code you should validate these inputs to ensure they're within limits for the client. In a future article we're going to extend this further and introduce tokenization of these jobs, and also add some validation!
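Until that future article arrives, the kind of limit checking described above could look something like the following. This is only a minimal sketch: `validateRepeatOptions`, `MIN_EVERY_MS` and `MAX_LIMIT` are hypothetical names and limits that are not part of the service, and the cron string itself is not inspected here.

```javascript
// Hypothetical limits, tune these for your own service
const MIN_EVERY_MS = 60 * 1000; // Never more often than once a minute
const MAX_LIMIT = 100; // Never more than 100 repeats per job

// Returns a copy of `repeat` with a capped `limit`, or throws if the
// provided options fall outside of our (assumed) limits
function validateRepeatOptions(repeat) {
  if (typeof repeat !== "object" || repeat === null || Array.isArray(repeat)) {
    throw new Error("repeat must be an object");
  }
  const hasCron = repeat.cron !== undefined;
  const hasEvery = repeat.every !== undefined;
  if (hasCron === hasEvery) {
    // Either both were provided, or neither was
    throw new Error("repeat requires exactly one of cron or every");
  }
  if (hasCron && typeof repeat.cron !== "string") {
    throw new Error("cron must be a string");
  }
  if (hasEvery && (!Number.isInteger(repeat.every) || repeat.every < MIN_EVERY_MS)) {
    throw new Error(`every must be an integer of at least ${MIN_EVERY_MS}`);
  }
  // Default to our maximum so that no job repeats forever
  const limit = repeat.limit === undefined ? MAX_LIMIT : repeat.limit;
  if (!Number.isInteger(limit) || limit < 1 || limit > MAX_LIMIT) {
    throw new Error(`limit must be an integer between 1 and ${MAX_LIMIT}`);
  }
  return Object.assign({}, repeat, { limit });
}
```

If we went this way, we could run `options.repeat` through this function inside `requestGenerateReport` before copying it onto `queueOptions`, and respond with a 400 when it throws.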
Now, in our `removeReport` function, we will also need to delete the job from our queue:
```js
// Inside our `removeReport` function before Promise.all
async function removeQueueJob() {
  const job = await reportGenerationQueue.getJob(id);
  if (!job) {
    return; // Already completed
  }
  // This may throw an error, but I'm unsure what to do with that atm, we should probably
  // do this function first so we can handle it
  await job.remove();
}
// Invoke the function so that we push its promise, not the function itself
promises.push(removeQueueJob());
```
In our service we also want to swap out our single `reportPath` property and instead use `results`, which will be an array of objects with the keys `id`, `path` and `createdAt`:
```js
// Inside our doReportWork function, replacing the assignment of `document`:
const document = Object.assign({ results: [] }, await getReport(payload.id));
document.results.push({
  id: uuid.v4(),
  path: reportPath,
  createdAt: new Date().toISOString()
});
```
This means we've introduced a breaking change, so be aware that our existing client will no longer work. We're going to introduce a couple more routes, change the usage of `getReportWithResult` to `getReport`, create `getReportResult` and `getReportResultDocument` functions, delete our old `getReportWithResult` function, add a `removeReportResult` function, and update our `removeReport` function to delete each result document:
```js
// In reports.js

// New function
export async function getReportResult(id, resultId, document = undefined) {
  document = document || await getReport(id);
  // Bail out if we have no document, or the document has no results yet
  if (!document || !Array.isArray(document.results)) {
    return undefined;
  }
  return document.results
    .find(({ id }) => id === resultId);
}

// New function replacing `getReportWithResult`
export async function getReportResultDocument(id, resultId) {
  const result = await getReportResult(id, resultId);
  if (!result) {
    return undefined;
  }
  const reportJSONBuffer = await getDocument(result.path);
  if (!reportJSONBuffer) {
    throw new Error("Unable to find report result");
  }
  return JSON.parse(reportJSONBuffer.toString("utf-8"));
}

// New function
export async function removeReportResult(id, resultId) {
  const document = await getReport(id);
  const result = await getReportResult(id, resultId, document);
  if (!result) {
    return false;
  }
  await removeDocument(result.path);
  const newDocument = Object.assign({}, document, {
    results: document.results
      // Filter out our result
      .filter(({ id }) => id !== resultId)
  });
  const store = await getStore();
  await store.set(id, JSON.stringify(newDocument));
  // Let the caller know that we found and deleted the result
  return true;
}

// Within `removeReport`, replacing the assignment of promises & the if statement for the reportPath
let promises = [];
if (document.results) {
  promises = promises.concat(
    document.results.map(({ path }) => removeDocument(path))
  );
}
```
And our new routes:
```js
// At the top of our file
import {
  requestGenerateReport,
  getReport,
  removeReport,
  getReportResultDocument,
  removeReportResult
} from "./reports.js";

// Replacing all the routes excluding `POST /report`
app.get("/report/:id", asyncHandler(async (request, response) => {
  const report = await getReport(request.params.id);
  if (!report) {
    return response.sendStatus(404); // We couldn't find it
  }
  response.set("Content-Type", "application/json");
  return response.send(report);
}));

app.get("/report/:id/result/:resultId", asyncHandler(async (request, response) => {
  const result = await getReportResultDocument(request.params.id, request.params.resultId);
  if (!result) {
    return response.sendStatus(404);
  }
  response.set("Content-Type", "application/json");
  return response.send(result);
}));

app.delete("/report/:id", asyncHandler(async (request, response) => {
  const foundAndDeleted = await removeReport(request.params.id);
  response.sendStatus(foundAndDeleted ? 204 : 404);
}));

app.delete("/report/:id/result/:resultId", asyncHandler(async (request, response) => {
  const foundAndDeleted = await removeReportResult(request.params.id, request.params.resultId);
  response.sendStatus(foundAndDeleted ? 204 : 404);
}));
```
Now we can delete individual report results, and also fetch individual report results!
If we allow for a query parameter of `html` in our `GET /report/:id/result/:resultId` route we can also serve up HTML directly, meaning that we can view the reports without first processing the result:
```js
app.get("/report/:id/result/:resultId", asyncHandler(async (request, response) => {
  const result = await getReportResultDocument(request.params.id, request.params.resultId);
  if (!result) {
    return response.sendStatus(404);
  }
  if (request.query.html) {
    response.set("Content-Type", "text/html");
    // result.report is an HTML string
    return response.send(result.report);
  } else {
    // JSON by default with all the contents
    response.set("Content-Type", "application/json");
    return response.send(result);
  }
}));
```
Now in our client we're going to replace our `checkResults` function so that we can handle the results. You might also notice that we're no longer automatically deleting the results; instead we allow each result to be deleted individually, as well as the entire report generation request (for ongoing reports):
```js
function deleteReport(parentNode, identifier) {
  fetch(
    `/report/${identifier}`,
    {
      method: "DELETE"
    }
  )
    .then(() => parentNode.remove())
    .catch((error) => {
      console.warn(error);
      alert("Couldn't delete report!");
    });
}

function deleteResult(parentNode, identifier, resultIdentifier) {
  fetch(
    `/report/${identifier}/result/${resultIdentifier}`,
    {
      method: "DELETE"
    }
  )
    .then(() => parentNode.remove())
    .catch((error) => {
      console.warn(error);
      alert("Couldn't delete result!");
    });
}

function displayReport(element, identifier, originalUrl, report, onDelete = undefined) {
  // Empty out the element
  while (element.firstChild) {
    element.removeChild(element.firstChild);
  }
  if (!report.results) {
    // Display waiting info
    const info = document.createElement("span");
    info.innerText = "Waiting for report generation";
    element.appendChild(info);
  } else {
    // Display list of results
    report.results
      .forEach(
        ({ id: resultId, createdAt }, index, array) => {
          const span = document.createElement("span");
          const link = document.createElement("a");
          link.innerText = `Report generated for ${originalUrl} at ${new Date(createdAt).toString()} (Report ${index + 1})`;
          link.href = `/report/${identifier}/result/${resultId}?html=1`;
          link.target = "_blank";
          span.appendChild(link);
          const deleteButton = document.createElement("button");
          deleteButton.innerText = "Delete";
          deleteButton.addEventListener("click", () => deleteResult(span, identifier, resultId));
          span.appendChild(deleteButton);
          element.appendChild(span);
          if (array.length > (index + 1)) {
            // Add a line break in between each report:
            element.appendChild(
              document.createElement("br")
            );
          }
        }
      );
  }
  // Allow deletion
  const deleteButton = document.createElement("button");
  deleteButton.innerText = "Delete Report";
  deleteButton.addEventListener("click", () => {
    deleteReport(element, identifier);
    if (onDelete) {
      onDelete();
    }
  });
  element.appendChild(deleteButton);
}

function checkResults(element, identifier, originalUrl, report = undefined) {
  const intervalHandle = setInterval(doCheck, 2500);
  let secondaryIntervalHandle;
  const onDelete = () => {
    if (secondaryIntervalHandle) {
      clearInterval(secondaryIntervalHandle);
    } else {
      clearInterval(intervalHandle);
    }
  };
  displayReport(element, identifier, originalUrl, report || { id: identifier, url: originalUrl }, onDelete);

  function doCheck() {
    fetch(`/report/${identifier}`)
      .then(response => response.json())
      .then(report => {
        if (report.results) {
          // Clear the old interval
          clearInterval(intervalHandle);
          // Now try every minute
          if (!secondaryIntervalHandle) {
            secondaryIntervalHandle = setInterval(doCheck, 60 * 1000);
          }
        }
        displayReport(element, identifier, originalUrl, report, onDelete);
      })
      .catch(console.warn);
  }
}
```
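The two interval handles above implement a simple idea: poll quickly until the first result appears, then back off to once a minute. As an alternative, here is a rough sketch of the same idea with a single `setTimeout` chain. `nextPollDelay` and `pollReport` are hypothetical helpers that are not part of the client, and `fetchReport` is assumed to be any function returning a promise of the report.

```javascript
const INITIAL_POLL_MS = 2500; // While waiting for the first result
const SETTLED_POLL_MS = 60 * 1000; // Once we have at least one result

// Decide how long to wait before the next check, using the same
// truthiness check on `report.results` as `checkResults` does
function nextPollDelay(report) {
  return report && report.results ? SETTLED_POLL_MS : INITIAL_POLL_MS;
}

// Polls using `fetchReport`, handing each report to `onReport`.
// Returns a function that stops the polling.
function pollReport(fetchReport, onReport, schedule = setTimeout) {
  let stopped = false;
  async function tick() {
    if (stopped) {
      return;
    }
    let delay = INITIAL_POLL_MS;
    try {
      const report = await fetchReport();
      if (stopped) {
        return; // We were stopped while the request was in flight
      }
      onReport(report);
      delay = nextPollDelay(report);
    } catch (error) {
      console.warn(error);
    }
    schedule(tick, delay);
  }
  schedule(tick, INITIAL_POLL_MS);
  return () => {
    stopped = true;
  };
}
```

With something like this, the function returned by `pollReport` could be passed straight through as `onDelete`, since there would only ever be one timer to cancel.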
Now we want to be able to accept options, so next to our `textarea` for URL values we're going to have another `textarea` where a JSON object can be provided for the options to use. We're still going to force the output type to be `html`, however, so that will be one option that can't be changed:
```html
<!-- After #urls -->
<textarea id="options" rows="10">{}</textarea>
```
We'll need to validate these options and merge them with our preset options. We also want to save our options to local storage so that we don't need to re-enter them every time:
```js
// Within script
const options = document.querySelector("#options");

// Try and init our options from storage
try {
  options.value = localStorage.getItem("report-options") || "{}";
} catch(e) {

}

// Reset our options so they're nicely formatted
getOptions();

// When we blur, try and validate the options, and then make it pretty
options.addEventListener("blur", getOptions);

function getOptions() {
  let providedOptions = {};
  try {
    const value = (options.value || "{}").trim();
    // If its an object, good to go
    if (value && value[0] === "{" && value[value.length - 1] === "}") {
      providedOptions = JSON.parse(value);
    } else if (value) {
      // Not an object
      throw new Error("Invalid options!");
    }
  } catch (e) {
    return alert("Could not parse options as JSON");
  }
  // Set our options to a nice formatted version
  const string = JSON.stringify(providedOptions, undefined, "  ");
  try {
    localStorage.setItem("report-options", string);
  } catch(e) {

  }
  options.value = string;
  // We want to receive an html output
  return Object.assign({}, providedOptions, { output: "html" });
}
```
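The parsing and merging logic in `getOptions` is tangled up with the DOM and local storage, which makes it hard to try out on its own. As a rough sketch, the same rules could be pulled into a pure function like the one below. `parseOptions` is a hypothetical helper that isn't part of the client, and it checks the parsed value rather than the first and last characters of the string.

```javascript
// Parses the text of our options textarea. Returns `{ ok: true, formatted, options }`
// on success, or `{ ok: false, error }` when the text isn't a JSON object.
function parseOptions(text) {
  // Treat an empty textarea the same as an empty object
  const value = (text || "").trim() || "{}";
  let parsed;
  try {
    parsed = JSON.parse(value);
  } catch (error) {
    return { ok: false, error: "Could not parse options as JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "Options must be a JSON object" };
  }
  return {
    ok: true,
    // A nicely formatted version to write back to the textarea and storage
    formatted: JSON.stringify(parsed, undefined, 2),
    // We always want to receive an html output, so it is applied last
    options: Object.assign({}, parsed, { output: "html" })
  };
}

const example = parseOptions('{ "output": "json", "repeat": { "every": 60000 } }');
console.log(example.options.output); // "html", the client can't override it
```

`getOptions` would then only need to read the textarea, call this function, and either alert on `error` or store `formatted`.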
Now when we send off our request to create our reports, we'll pass in our options:
```js
function send(urls) {
  const options = getOptions();
  fetch("/report", {
    method: "POST",
    headers: {
      "Content-Type": "application/json"
    },
    body: JSON.stringify(urls.map(url => ({ url, options })))
  })
    .then(response => response.json())
    .then(displayResults)
    .catch(() => alert("Something went wrong while sending our request"));
}
```
Now if we input these options, and submit a URL, we should get a report every minute:
```json
{
  "repeat": {
    "cron": "* * * * *"
  }
}
```
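The five fields of a standard cron string are minute, hour, day of month, month and day of week, so `* * * * *` matches every minute. A few more option objects that fit the `RepeatOpts` shape from earlier are sketched below as illustrations. The variable names and schedules are made up for this example, and the service as written doesn't validate any of them.

```javascript
// Every five minutes, stopping after twelve runs
const everyFiveMinutes = {
  repeat: { cron: "*/5 * * * *", limit: 12 }
};

// Every 90 seconds, using `every` (milliseconds) instead of `cron`
const everyNinetySeconds = {
  repeat: { every: 90 * 1000 }
};

// At 09:00 on weekdays, evaluated in a specific timezone
const weekdayMornings = {
  repeat: { cron: "0 9 * * 1-5", tz: "Europe/London" }
};

// Any of these can be pasted into the options textarea once serialized
console.log(JSON.stringify(everyFiveMinutes, undefined, 2));
```

Remember that `cron` and `every` can't be combined in the same `repeat` object.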
Now that we have reports generating, it would be great if we could list all the reports that are available. For this we're going to need a function called `listReports`, but we'll first need to promisify the `keys` function in `store.js`:
```js
function getPromiseClient(client) {
  client.set = promisify(client.set.bind(client));
  client.get = promisify(client.get.bind(client));
  client.del = promisify(client.del.bind(client));
  client.keys = promisify(client.keys.bind(client));
  return client;
}
```
Next we'll make our `listReports` function by first listing all the keys starting with `report:` and then returning the value for each key:
```js
// In our reports.js file
export async function listReports() {
  const store = await getStore();
  const keys = await store.keys("report:*");
  return Promise.all(
    keys.map(key => getReport(key))
  );
}
```
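One thing to keep in mind is that the Redis `KEYS` command walks the entire keyspace in a single call, so it can block the server when there are a lot of keys. Redis also provides `SCAN`, which returns the keys in batches along with a cursor for the next batch. The sketch below shows roughly how that could replace `store.keys`. It assumes that `store.scan` has been promisified in `getPromiseClient` in the same way as `keys` (which we haven't done above), and `scanKeys` is a hypothetical helper.

```javascript
// Collects every key matching `pattern` by following the SCAN cursor
// until Redis hands back "0", which signals the end of the iteration
async function scanKeys(store, pattern, count = 100) {
  // SCAN may return the same key more than once, so de-duplicate with a Set
  const keys = new Set();
  let cursor = "0";
  do {
    // Assumes a promisified `scan` that resolves to [nextCursor, batchOfKeys]
    const [nextCursor, batch] = await store.scan(cursor, "MATCH", pattern, "COUNT", count);
    batch.forEach(key => keys.add(key));
    cursor = String(nextCursor);
  } while (cursor !== "0");
  return Array.from(keys);
}
```

Inside `listReports` this would be used as `const keys = await scanKeys(store, "report:*");`, with the rest of the function staying the same.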
And then create a `GET /report` route:
```js
// At the top of our file (alongside our other imports from `reports.js`)
import { listReports } from "./reports.js";

// After our other routes
app.get("/report", asyncHandler(async (request, response) => {
  response.json(await listReports());
}));
```
Now when our client loads we want to load all available reports:
```js
// Within our script in index.html
function loadReports() {
  fetch("/report", {
    headers: {
      Accept: "application/json"
    }
  })
    .then(response => response.json())
    .then(reports => {
      reports.forEach(
        report => {
          const element = document.createElement("li");
          results.appendChild(element);
          checkResults(element, report.id, report.url, report);
        }
      );
    })
    .catch(error => {
      alert("Unable to load reports");
      console.warn(error);
    });
}

loadReports();
```
Now when we reload our client we should see the reports that we have previously created, and see our ongoing reports being generated!
In the next article we're going to cover creating meta reports derived from the information collected in this process.