How to optimize Node.js Docker image (Part 2)


Introduction

In the first part of our Node-in-Docker optimization series, we covered:

  • Reducing the number of running processes
  • Handling signals properly
  • Making use of the build cache
  • Using ENTRYPOINT
  • Using EXPOSE to document exposed ports

This was the resulting Dockerfile we finished with:

Dockerfile
FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080

In the second part of this article, we will cover:

  • Reducing the image size by:
    • Removing obsolete files
    • Using a lighter base image
  • Using labels (LABEL)
  • Adding semantics to labels
  • Linting your Dockerfile
Hint
You can find the source code of the Node application used in this guide at github.com/buddy-works/tutorials-docker-node-frontend in the docker/basic branch.

Reducing the Docker image file size

If we take a look at our Node image now, we'll find that it's huge (939 MB, to be exact).

bash
docker images demo-frontend:expose

REPOSITORY      TAG      IMAGE ID       SIZE
demo-frontend   expose   9ffa262cf2ce   939MB

To deploy this image to a remote server, we must transfer at least 939 MB of data. Now, imagine a scenario where you need to roll back a production deployment, e.g. because of errors in the source code: if your image is large, there may be noticeable downtime before the image finishes transferring to the server and the rollback is complete.

Therefore, reducing the file size of our Docker image is important.
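Before optimizing, it helps to know how much data a deployment would actually push over the wire. As a quick sketch using standard Docker commands, you can export the image with docker save and measure the resulting archive:

bash
# Size of the raw image archive, in bytes
docker save demo-frontend:expose | wc -c

# Registries transfer compressed layers, so gzipping gives a closer estimate
docker save demo-frontend:expose | gzip | wc -c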

Removing obsolete files

If we examine the contents of our container, we'll find many source code files that were required by the build process but are not needed at runtime:

bash
docker exec -it demo-frontend du -ahd1

16K     ./dist
36K     ./src
4.0K    ./webpack.config.js
55M     ./node_modules
15M     ./.npm
4.0K    ./package.json
164K    ./package-lock.json
70M     .

In fact, out of the files above, only dist/ and node_modules/ are needed. The rest of the files merely increase the container size and can be removed without a second thought.

A naive approach would be to add an extra RUN instruction to remove these files.

Dockerfile
FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080

Whilst this does get rid of some files, it does not reduce the file size of our Node image. This is because Docker images are built layer by layer: once a layer is added, it cannot be removed from the image. Adding an extra RUN instruction therefore actually increases the image's file size.
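You can see this for yourself with docker history, which lists every layer in an image along with its size. In the sketch below (the naive-cleanup tag is just an illustrative name for an image built from the Dockerfile above), the cleanup instruction appears as a new layer on top, while the layers containing the removed files keep their full size:

bash
# Build the image from the Dockerfile with the extra cleanup RUN instruction
docker build -t demo-frontend:naive-cleanup .

# List the layers: the find/rm instruction adds a new entry,
# but the npm install layer below it retains its original size
docker history demo-frontend:naive-cleanup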

Another approach would be to combine build and cleanup steps into a single instruction:

Dockerfile
FROM node
WORKDIR /root/
COPY [".", "./"]
RUN ["/bin/sh", "-c", "npm install && npm run build && find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080

And again: while this does reduce the image size, it undoes all the good work we've done leveraging the build cache.

What should we do, then? Use multi-stage builds to remove obsolete files, whilst still taking advantage of the build cache, of course.

What is a multi-stage Dockerfile?

Multi-stage build is a Dockerfile feature introduced in Docker 17.05 that lets you specify multiple images (stages) within the same Dockerfile. More importantly, you can copy build artifacts from one stage to another.

Therefore, inside our Dockerfile, we can have a builder stage where we install development dependencies and build the application from source, splitting that process into multiple instructions to leverage the build cache. Then we copy only the files required at runtime from the builder stage into the final image:

Dockerfile
FROM node as builder
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080

Note that the COPY instruction now carries the --from option, which tells Docker to copy files from the builder stage instead of the build context.
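As a side note, you can also build a single stage in isolation with the --target flag, which is handy when you want to debug the builder stage. A minimal sketch (the demo-frontend:builder tag is just an illustrative name):

bash
# Build only the builder stage and tag it so we can inspect it
docker build --target builder -t demo-frontend:builder .

# Open a shell inside the builder stage's filesystem
docker run --rm -it demo-frontend:builder bash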

Success
Using multi-stage builds allows us to leverage the build cache, whilst keeping our final image size small.

If we build our image again, you'll see that we've already shaved ~9 MB off the image. It's still not a small image, but it's a start!

bash
docker build -t demo-frontend:multi-stage .
docker images

REPOSITORY      TAG           IMAGE ID       SIZE
demo-frontend   multi-stage   cf57206dc983   930MB
<none>          <none>        8874c0fec4c9   939MB

The <none>:<none> image is the intermediate image produced by the builder stage. It can be safely discarded, although doing so will also remove the cached layers.

Hint
We will outline a way to easily clean up intermediate images later in this article.

Using a lighter base image

Even though we got rid of unnecessary build artifacts, 9 MB is not much relative to the overall size of the image. We can reduce the size far more significantly by using a lighter base image.

At the moment, we are using the official node base image, which itself is 904 MB:

bash
docker images node

REPOSITORY   TAG      IMAGE ID       SIZE
node         latest   a9c1445cbd52   904MB

This means that no matter how much we minimize our demo-frontend image, it will never get smaller than 904 MB. So why is the base image so large?

If we look inside the Dockerfile for the node base image, we'll find that it's based on the buildpack-deps image, which contains a large number of common Debian packages, including build tools, system libraries, and system utilities. We might need these utilities when building our demo-frontend image, but we won't need them to run our node process.

Fortunately, there's a variant of the image called node:alpine. It is based on Alpine Linux, a minimal distribution whose base image is only 5.53 MB:

bash
docker images alpine

REPOSITORY   TAG      IMAGE ID       SIZE
alpine       latest   5cb3aa00f899   5.53MB

The alpine image doesn't include any build tools or libraries (it doesn't even have Bash!), which is why node:alpine is so much smaller than node:latest:

bash
docker images node

REPOSITORY   TAG      IMAGE ID       SIZE
node         slim     e52c23bbdd87   148MB
node         latest   a9c1445cbd52   904MB
node         alpine   953c516e1466   76.1MB
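You can verify just how bare-bones the alpine variant is by looking for Bash inside it:

bash
# node:alpine ships BusyBox's sh, but no Bash
docker run --rm node:alpine sh -c 'command -v bash || echo "bash not found"'

This is also why the cleanup RUN instruction in our builder stage can safely invoke /bin/bash: the builder runs on the full node image, where Bash is available.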

Therefore, the next step is to update our Dockerfile to use node:alpine for the final image, whilst keeping node for the builder stage:

Dockerfile
FROM node as builder
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080

When you build the image again, you should notice a drastic decrease in size:

bash
docker images demo-frontend:alpine

REPOSITORY      TAG      IMAGE ID       SIZE
demo-frontend   alpine   97373fdcb697   102MB

Removing intermediate images

Multi-stage builds are a great feature, as they allow you to keep images small whilst still using the build cache. However, they also generate a lot of intermediate images.

These intermediate images are a type of dangling image, i.e. an image without a name. Generally, you should keep dangling images around, as they are the basis of the build cache. But having them litter your terminal output can be annoying, and if you maintain a CI/CD server, you may want to clean up dangling images regularly.

You can list dangling images by passing the --filter flag to docker images:

bash
docker images --filter dangling=true

REPOSITORY   TAG      IMAGE ID       SIZE
<none>       <none>   8874c0fec4c9   939MB

To remove dangling images, run:

bash
docker rmi $(docker images --filter dangling=true --quiet)
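On Docker 1.13 and later, docker image prune provides a convenient shorthand for the same cleanup:

bash
# Remove all dangling images (asks for confirmation; add --force to skip the prompt)
docker image prune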

However, both commands indiscriminately remove all dangling images. What if you just want to remove the images generated by a certain build? Enter labels!

Using the LABEL instruction

The LABEL instruction allows you to attach metadata to your image as key-value pairs. You can use labels to:

  • document contact details of the author and/or maintainer of the image
  • check the build date of the image (see the sketch after this list)
  • add licensing information
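A label value doesn't have to be hard-coded: it can be fed in at build time through a build argument. Below is a minimal sketch of recording the build date this way; the BUILD_DATE argument and demo-frontend:dated tag are hypothetical names, and the sketch assumes the Dockerfile declares ARG BUILD_DATE followed by LABEL build-date=$BUILD_DATE:

bash
# Pass the current UTC timestamp into the build as a build argument.
# Assumes the Dockerfile contains:
#   ARG BUILD_DATE
#   LABEL build-date=$BUILD_DATE
docker build --build-arg BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)" -t demo-frontend:dated .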

In our case, we can use labels to mark an image as intermediate and belonging to the demo-frontend build:

Dockerfile
FROM node as builder
LABEL name=demo-frontend
LABEL intermediate=true
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
LABEL name=demo-frontend
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080

Now, when we run docker build, the resulting images will already be labeled, and we can use those labels to filter the output of docker images:

bash
docker build -t demo-frontend:labels .
docker images --filter label=name=demo-frontend

REPOSITORY      TAG      IMAGE ID       SIZE
demo-frontend   labels   6965537afe54   102MB
<none>          <none>   0cbce2a3844b   939MB

Labels also allow us to remove only the intermediate images of our demo-frontend build(s) by running:

bash
docker rmi $(docker images --filter label=name=demo-frontend --filter label=intermediate=true --quiet)
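Alternatively, docker image prune accepts the same label filters, saving you the command substitution:

bash
# Remove only dangling images that carry our build's labels
docker image prune --force --filter label=name=demo-frontend --filter label=intermediate=true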

Adding semantics to labels

Above, we picked two plain strings, name and intermediate, as our label keys. This is fine for now, but what if the author of another Docker image decides to use the same keys? This is why Docker recommends namespacing label keys with the reverse DNS notation of a domain you own, which helps avoid clashes in label key names. Therefore, we should update our labels accordingly:

Dockerfile
FROM node as builder
LABEL works.buddy.name=demo-frontend
LABEL works.buddy.intermediate=true
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
LABEL works.buddy.name=demo-frontend
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080

Whilst namespacing prevents label keys from clashing, it lacks common semantics: how would a user know what works.buddy.intermediate means? Or whether works.buddy.intermediate conveys the same meaning as com.acme.intermediate?

In the past, Docker users and organizations came up with multiple conventions for imposing semantics on label key names; however, these have since been superseded by the annotations defined in the OCI Image Format Specification.

The specification defines a set of pre-defined annotation keys, each prefixed with the org.opencontainers.image. namespace.

For example, it states that the org.opencontainers.image.title label should hold the "human-readable title of the image", and the org.opencontainers.image.vendor label the "name of the distributing entity, organization or individual".

So let's update our Dockerfile with these standardized label keys wherever possible:

Dockerfile
FROM node as builder
LABEL org.opencontainers.image.title=demo-frontend
LABEL works.buddy.intermediate=true
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
LABEL org.opencontainers.image.title=demo-frontend
LABEL org.opencontainers.image.vendor="Buddy Team"
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080

Linting your Dockerfile

The last thing we will do in this article is to lint our Dockerfile. There are multiple tools for linting Dockerfiles, including Buddy's official Dockerfile Linter: https://github.com/buddy-works/dockerfile-linter

For the purposes of this guide, however, we'll use hadolint, with a brief mention of dockerfilelint at the end.

Hadolint

Hadolint parses the Dockerfile into an abstract syntax tree (AST), a structured object representing the contents of the Dockerfile. Conceptually, this is similar to how your browser parses HTML source code into the Document Object Model (DOM).

Hadolint then tests the AST against a list of rules to detect places where the Dockerfile does not follow best practices. Let's run it against our Dockerfile to see where we can improve.

The easiest way to use hadolint is to run the hadolint/hadolint image with Docker:

bash
docker pull hadolint/hadolint
docker run --rm -i hadolint/hadolint < Dockerfile

/dev/stdin:1 DL3006 Always tag the version of an image explicitly

Hadolint reports the DL3006 error, which says that the first line of the Dockerfile (/dev/stdin:1) should use an explicitly tagged image.

So let's update our FROM instruction to give the base image the latest tag:

Dockerfile
FROM node:latest as builder
LABEL org.opencontainers.image.title=demo-frontend
...

Run the linter again. This time, it gives another error:

bash
docker run --rm -i hadolint/hadolint < Dockerfile

/dev/stdin:1 DL3007 Using latest is prone to errors if the image will ever update. Pin the version explicitly to a release tag

This error informs us that we shouldn't use the latest tag, as node:latest can refer to different images over time. Instead, we should pick a more specific tag.

We could be as specific as possible and use a tag like 10.15.3-stretch. However, I've found that the lts tag strikes the right balance, as it follows the latest Long Term Support (LTS) version of Node.js:

Dockerfile
FROM node:lts as builder
LABEL org.opencontainers.image.title=demo-frontend
...

Now, when we run hadolint again, it no longer reports any errors.

Hadolint's rule codes fall into two general categories:

  • Rules that begin with DL flag issues in the Dockerfile instructions themselves
  • Rules that begin with SC flag issues in the shell commands embedded in the Dockerfile. These are picked up by another tool called ShellCheck, which performs static analysis on shell scripts.
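If a particular rule doesn't apply to your project, hadolint can be told to skip it with one or more --ignore flags; for example, to suppress DL3006:

bash
# Lint from stdin ("-" is the file argument) while ignoring rule DL3006
docker run --rm -i hadolint/hadolint hadolint --ignore DL3006 - < Dockerfile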

Using a second linter

Linting your Dockerfile ensures you are following best practices, but you don't have to limit yourself to a single linter! For instance, you can also use the dockerfilelint npm package alongside hadolint.

Running dockerfilelint against the pre-fix version of our Dockerfile yields a similar result. Note that dockerfilelint prints human-readable CLI output by default, which might be better suited for everyday use.

bash
dockerfilelint Dockerfile

File:   Dockerfile
Issues: 1

Line 1: FROM node as builder
Issue  Category  Title                   Description
1      Clarity   Base Image Missing Tag  Base images should specify a tag to use.
Hint
dockerfilelint can also output as JSON, which may be advantageous for programmatic use.
bash
dockerfilelint Dockerfile -o json | jq .files[0].issues

[
  {
    "line": "1",
    "content": "FROM node as builder",
    "category": "Clarity",
    "title": "Base Image Missing Tag",
    "description": "Base images should specify a tag to use."
  }
]
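The JSON output also makes it easy to fail a CI step whenever any issue is reported. A minimal sketch, assuming the .files[0].issues path shown above:

bash
# Exit with a non-zero status if dockerfilelint reports any issues
test "$(dockerfilelint Dockerfile -o json | jq '.files[0].issues | length')" -eq 0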

Once the issues are fixed, dockerfilelint's output looks like this:

bash
dockerfilelint Dockerfile

File:   Dockerfile
Issues: None found 👍

Using multiple linters has the advantage of catching errors that a single linter might miss. To finish up, let's build our image using the double-linted Dockerfile!

bash
docker build -t demo-frontend:oci-annotations .
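To double-check that the OCI annotations made it into the final image, you can read its labels back with docker inspect:

bash
# Print the image's labels as JSON
docker inspect --format '{{ json .Config.Labels }}' demo-frontend:oci-annotations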
Success
If you'd like to learn more about image optimization and further reducing container size, I recommend watching the talk I gave at the London Node User Group (LNUG), entitled Dockerizing JavaScript Applications.

Next step: security

Although this article does not focus on image security, we've already improved it by moving from node to node:alpine. Every library and tool in the container has the potential to be exploited in an attack; by reducing their number, we reduce the potential attack surface. The same principle applies to reducing the number of running processes in our container.

However, there's a lot more we can do, and for that I invite you to the last article in my series.

