Moving from single fire to scheduled tasks with Puppeteer and Lighthouse Part 1

This article and the following one largely show the process of taking the code from the previous article on this topic and turning it into a service that supports generating reports on a schedule. We make use of this service in this article, which shows how to create meta reports about a website's performance using these scheduled reports.

You'll see I jump back and forth between portions of the service; this is to show my thinking around each section of code, and to explain the why rather than just the outcome.

In our last article discussing Puppeteer and Lighthouse we showed how to generate Lighthouse reports automatically by utilising Chrome's DevTools protocol. In this article we're going to take that one step further and have our reports generated on a schedule.

To do this we're going to extend the code from our previous article; re-using older code is always a bonus!

The code for the previous article can be found here

In our previous article we accepted a list of URL values along with a set of options, which means we already have a framework in place for scheduling our work. Once the work has been done we additionally want to take a PDF snapshot of the rendered page, and also generate a PDF of our Lighthouse report. We're still going to retain an HTML snapshot of the page, but a PDF snapshot captures exactly how the page was rendered at that point in time, without needing to think about which browser version rendered it.
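
For a rough idea of what the PDF snapshot side involves, here's a minimal sketch of capturing a rendered page as a PDF with Puppeteer; the function name, URL handling, and options are placeholders rather than the exact code we'll end up with in the service:

js
import puppeteer from "puppeteer";

// Minimal sketch, not the service's final code: render a URL and return a PDF buffer
async function capturePagePdf(url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait for the network to go quiet so the snapshot reflects the fully rendered page
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.pdf({ format: "A4", printBackground: true });
  } finally {
    await browser.close();
  }
}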

We need a way to schedule our jobs for the future, rather than adding them straight to a queue like before. We don't require a precise schedule for this, but the closer the better! In our previous article we utilised LevelDB for our data store, but for this article we're going to upgrade to Redis. We're doing this so we can utilise bull, which supports scheduled tasks out of the box.
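
To give a sense of why bull is a good fit, this is roughly what delayed and repeating jobs look like with it; the queue name and payload below are placeholders, and the actual wiring is covered in the next article:

js
import Queue from "bull";

// Sketch only: bull takes per-job options for delayed and repeating work
const reportsQueue = new Queue("reports", process.env.REDIS_URL);

// Run a job once, roughly an hour from now
reportsQueue.add({ url: "https://example.com" }, { delay: 60 * 60 * 1000 });

// Or run a job on a repeating cron schedule (midnight every day)
reportsQueue.add({ url: "https://example.com" }, { repeat: { cron: "0 0 * * *" } });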

We're assuming you have redis installed on your machine and ready to roll; if you don't, please see here for redis installation instructions.

First we need to swap our store over to redis, so we're going to install the redis npm module:

bash
npm install --save redis

We're going to split our store creation out into a store.js file, which only needs to export a getStore function that returns an instance of the redis client. In preparation for deployment we also want to support the REDIS_URL environment variable, so we're going to pass this to our client; if it doesn't exist, the client will default to our local redis instance. We also only want a single instance to be used across our process, so we're going to utilise an IIFE to scope our client inside a factory function of sorts. Inside our code, instead of using a store that's scoped to our requests, we're now going to use the global instance by invoking getStore.

We also want to use promises with our redis client, so for now we're going to promisify only the functions that we know we're going to utilise (set, get, del):

js
import { createClient } from "redis";
import { promisify } from "util";

function getPromiseClient(client) {
  client.set = promisify(client.set.bind(client));
  client.get = promisify(client.get.bind(client));
  client.del = promisify(client.del.bind(client));
  return client;
}

export const getStore = (() => {
  let singletonClient;
  return async () => {
    // Already created, no need to try and make another
    if (singletonClient) {
      return singletonClient;
    }
    const options = {
      // Include in the case that it's passed, else will default to local
      url: process.env.REDIS_URL
    };
    // Create a local variable that we will use later on to reset our client
    // if we run into issues
    const client = getPromiseClient(createClient(options));
    // We want to reset the client if we run into an issue that we can't handle,
    // or if for some reason our client was closed
    function onReset() {
      // Remove our listeners so this function is cleaned up
      client.removeListener("error", onReset);
      client.removeListener("end", onReset);
      if (singletonClient !== client) {
        // This client is no longer the singletonClient, so don't try and reset it
        return;
      }
      // Reset the client
      singletonClient = undefined;
    }
    client.once("error", onReset);
    client.once("end", onReset);
    // Set our singleton client as a promise that resolves once we have a connection
    // This means when the next caller comes along we won't need to wait for the same process
    const connectedPromise = new Promise((resolve, reject) => {
      function reset() {
        // Remove all our listeners
        client.removeListener("error", onError);
        client.removeListener("connect", onConnect);
      }
      function onError(error) {
        reset(); // Remove listeners
        reject(error);
      }
      function onConnect() {
        reset(); // Remove listeners
        // Resolve this promise with our client, so whoever is waiting will get
        // the client rather than just waiting for a connection
        resolve(client);
      }
      client.once("error", onError);
      client.once("connect", onConnect);
    });
    singletonClient = connectedPromise;
    await connectedPromise;
    if (singletonClient === connectedPromise) {
      // If we're still the primary client, get rid of the promise
      // and replace the singleton with the client directly
      singletonClient = client;
    }
    return client;
  };
})();
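
As a quick sanity check, usage elsewhere in the codebase now looks something like the following; the key and value here are purely hypothetical:

js
import { getStore } from "./store";

// Hypothetical usage: every caller awaits getStore and shares the same underlying client
async function example() {
  const store = await getStore();
  await store.set("example-key", JSON.stringify({ hello: "world" }));
  const value = await store.get("example-key");
  console.log(JSON.parse(value)); // { hello: 'world' }
  await store.del("example-key");
}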

We're also going to need to create a bull instance, which we'll do in a schedule.js file, starting with a createQueue function. bull will manage its own connection to redis, so we just need to pass our REDIS_URL value if it is available:

js
import Queue from "bull";

function createQueue(name) {
  return Queue(name, process.env.REDIS_URL);
}

We're also going to create a getQueue function, which retains a single instance of each queue, similar to how we retained the redis instance previously:

js
export const getQueue = (() => {
  const queues = {};
  return (name) => {
    if (queues[name]) {
      return queues[name];
    }
    // Retain our queue so we don't create multiple of the same queue
    // It won't be a big problem, we just don't want to create too many redis clients
    queues[name] = createQueue(name);
    return queues[name];
  };
})();

Our redis client uses the set method rather than put, so we'll need to update our usage of that first. We're also going to remove our usage of level-jobs and replace it with bull. Our getStore function now returns a promise as well, so we'll need to update our usage to accommodate that. You can see the list of changes I made to swap from level to redis here.
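
For context, the shape of that swap looks roughly like this; the queue name, ids, and payload fields are stand-ins rather than the exact code from the repository:

js
import { getStore } from "./store";
import { getQueue } from "./schedule";

// Rough sketch of the swap, not the exact diff:
// level's put becomes redis set, and level-jobs is replaced by a bull queue
async function queueReport(id, payload) {
  // getStore now resolves to the shared redis client, so we await it
  const store = await getStore();
  await store.set(id, JSON.stringify(payload));
  // Jobs are pushed onto a bull queue instead of level-jobs
  await getQueue("reports").add({ id });
}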

Now that we have redis ready to use, we also need a document store so that we can save our PDFs for future use. For this we're going to create a storage.js module that exposes putDocument, getDocument, and removeDocument functions. For now we're going to use the local fs, but in a later article we'll extend this so that we can utilise something like AWS S3, IPFS, or a similar service to store our documents:

js
import { writeFile, readFile, mkdir, unlink } from "fs";
import { join, basename, dirname } from "path";
import { promisify } from "util";

function getPath(name) {
  // Use basename to ensure the name isn't a reference to a separate directory
  return join("./documents/", basename(name));
}

export async function putDocument(name, buffer) {
  const path = getPath(name);
  // Ensure the directory exists
  await promisify(mkdir)(dirname(path))
    // Catch if it already exists
    .catch(() => {});
  await promisify(writeFile)(path, buffer);
}

export function getDocument(name) {
  const path = getPath(name);
  return promisify(readFile)(path)
    // Return undefined if we ran into an issue
    .catch(() => {});
}

export function removeDocument(name) {
  const path = getPath(name);
  return promisify(unlink)(path)
    // Return undefined if we ran into an issue
    .catch(() => {});
}

We're going to use our new "document store" to retain and retrieve the generated JSON report, as we only want to keep a reference to the resulting report in redis so we don't fill our redis instance with report results.

Inside our doReportWork function in reports.js:

js
// At the top of our file
import { putDocument } from "./storage";

// In our doReportWork function
const reportPath = `${uuid.v4()}.json`;
await putDocument(
  reportPath,
  Buffer.from(
    JSON.stringify(result),
    "utf-8"
  )
);
const document = Object.assign({}, payload, { reportPath });

Now whenever we retrieve our report, we will want to grab the file at the specified path, so we'll create a function to do this for us:

js
// At the top of our file
import { getDocument } from "./storage";

export async function getReport(id) {
  const store = await getStore();
  const documentJSON = await store.get(id);
  if (!documentJSON) {
    return undefined;
  }
  return JSON.parse(documentJSON);
}

export async function getReportWithResult(id) {
  const document = await getReport(id);
  if (!document || !document.reportPath) {
    // Not found, or not yet complete
    return document;
  }
  const reportJSONBuffer = await getDocument(document.reportPath);
  if (!reportJSONBuffer) {
    throw new Error("Unable to find report result");
  }
  return Object.assign({}, document, {
    result: JSON.parse(reportJSONBuffer.toString("utf-8"))
  });
}

Now instead of accessing our store directly in our GET /report route handler, we will use getReportWithResult:

js
// At the top of our file
import { getReportWithResult } from "./reports.js";

// Replacing our old handler
app.get("/report/:id", asyncHandler(async (request, response) => {
  const report = await getReportWithResult(request.params.id);
  if (!report) {
    return response.sendStatus(404); // We couldn't find it
  }
  // report is now a plain object rather than a JSON string,
  // so let express serialise it as JSON for us
  return response.json(report);
}));

We're also going to need to remove our document when we delete the associated report, so let's create a generic removeReport function:

js
// At the top of our file
import { removeDocument } from "./storage";

export async function removeReport(id) {
  const document = await getReport(id);
  if (!document) {
    return false;
  }
  const store = await getStore();
  const promises = [];
  if (document.reportPath) {
    promises.push(removeDocument(document.reportPath));
  }
  promises.push(store.del(id));
  await Promise.all(promises);
  return true;
}

Now we'll replace our old report deletion handler:

js
// At the top of our file
import { removeReport } from "./reports.js";

// Replacing our old handler
app.delete("/report/:id", asyncHandler(async (request, response) => {
  const foundAndDeleted = await removeReport(request.params.id);
  response.sendStatus(foundAndDeleted ? 204 : 404);
}));

Now it's time to add some scheduling, which we will cover in the next article.

Next: Moving from single fire to scheduled tasks with Puppeteer and Lighthouse Part 2
