How to optimize a Node.js Docker image (Part 2)
Introduction
In the first part of the Node-in-Docker optimization series, we covered:
- Reducing the number of running processes
- Handling signals properly
- Making use of the build cache
- Using ENTRYPOINT
- Using EXPOSE to document exposed ports
This was the resulting Dockerfile we finished with:
```dockerfile
FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
In this second part of the series, we will cover:
- Reducing the image size by:
  - Removing obsolete files
  - Using a lighter base image
- Using labels (`LABEL`)
- Adding semantics to labels
- Linting your Dockerfile
The code for this article can be found in the `docker/basic` branch.
Reducing the Docker image file size
If we take a look at our Node image now, we'll find that it's huge: 939 MB, to be exact.
```bash
docker images demo-frontend:expose
REPOSITORY      TAG      IMAGE ID       SIZE
demo-frontend   expose   9ffa262cf2ce   939MB
```
To deploy this image to a remote server, we must transfer at least 939 MB of data. Now, imagine a scenario where you need to roll back to a previous deployment in production, e.g. because of errors in the source code: if your image is large, there may be noticeable downtime before the image finishes transferring to the server and the rollback is complete.
Therefore, reducing the file size of our Docker image is important.
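By the way, the size reported by `docker images` is the uncompressed size; what actually travels over the wire are compressed layers. Here's a rough sketch, using standard tooling, of how to estimate the transfer size:

```bash
# Registries transfer compressed layers, so the gzipped tarball is a
# closer approximation of what a deployment actually moves over the network.
docker save demo-frontend:expose | gzip | wc -c
```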
Removing obsolete files
If we examine the contents of our container, we'll find many source code files that were required by the build process, but are not needed at runtime:
```bash
docker exec -it demo-frontend du -ahd1
16K     ./dist
36K     ./src
4.0K    ./webpack.config.js
55M     ./node_modules
15M     ./.npm
4.0K    ./package.json
164K    ./package-lock.json
70M     .
```
In fact, out of the files above, only `dist/` and `node_modules/` are needed. The rest of the files only increase the container size and can be removed without a second thought.
A naive approach would be to add an extra `RUN` instruction to remove these files:
```dockerfile
FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
Whilst this does get rid of some files, it does not reduce the size of our Node image. This is because Docker images are built layer by layer: once a layer is added, it can never be removed from the image. Adding an extra `RUN` instruction therefore actually increases the image's size.
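You can see this layering for yourself with `docker history`, which lists every layer in an image along with its size. A quick sketch, assuming the Dockerfile above was built with a hypothetical `demo-frontend:naive` tag:

```bash
# Each instruction produces a layer with its own size; the cleanup RUN
# shows up as an additional layer while the earlier layers keep their size.
docker history demo-frontend:naive
```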
Another approach would be to combine build and cleanup steps into a single instruction:
```dockerfile
FROM node
WORKDIR /root/
COPY [".", "./"]
RUN ["/bin/sh", "-c", "npm install && npm run build && find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
And again: while this does reduce the image size, it undoes all the good work we've done leveraging the build cache.
What should we do, then? Use multi-stage builds to remove obsolete files, whilst still taking advantage of the build cache, of course.
What is a multi-stage Dockerfile?
Multi-stage builds are a Dockerfile feature, introduced in Docker v17.05, that allows you to specify multiple images (stages) within the same Dockerfile. More importantly, you can copy build artifacts from one stage to another.
Therefore, inside our Dockerfile, we can have a builder stage, where we install development dependencies and build the application from the source code, splitting that process into multiple instructions to leverage the build cache. Then, we copy only the files required at runtime from the builder stage into the final image:
```dockerfile
FROM node as builder
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
Note that the `COPY` instruction has been enriched with the `--from` option to signify that it should copy files from the `builder` stage instead of from the build context.
If we build our image again, we'll see that we've already shaved ~9 MB off the image. Not a huge saving per se, but it's a start!
```bash
docker build -t demo-frontend:multi-stage .
docker images
REPOSITORY      TAG           IMAGE ID       SIZE
demo-frontend   multi-stage   cf57206dc983   930MB
<none>          <none>        8874c0fec4c9   939MB
```
The `<none>:<none>` image is an intermediate builder stage image, which can be safely discarded, although doing so will also remove the cached layers.
Using a lighter base image
Even though we got rid of unnecessary build artifacts, 9 MB is not much relative to the size of the image. We can significantly reduce the size of the image by using a lighter base image.
At the moment, we are using the official `node` base image, which is itself 904 MB:
```bash
docker images node
REPOSITORY   TAG      IMAGE ID       SIZE
node         latest   a9c1445cbd52   904MB
```
This means that no matter how much we minimize our `demo-frontend` image, it will never get smaller than 904 MB. So why is the base image so large?
If we look inside the Dockerfile for the `node` base image, we'll find that it's based on the `buildpack-deps` image, which contains a large number of common Debian packages, including build tools, system libraries, and system utilities. We might need these utilities when building our `demo-frontend` image, but we won't need them to run our `node` process.
Fortunately, there's a variant of the image called `node:alpine`. It is based on Alpine Linux, a small distribution whose base image is only 5.53 MB:
```bash
docker images alpine
REPOSITORY   TAG      IMAGE ID       SIZE
alpine       latest   5cb3aa00f899   5.53MB
```
The `alpine` image doesn't include any build tools or libraries (it doesn't even have Bash!), allowing for a much smaller size than the `node:latest` image:
```bash
docker images node
REPOSITORY   TAG      IMAGE ID       SIZE
node         slim     e52c23bbdd87   148MB
node         latest   a9c1445cbd52   904MB
node         alpine   953c516e1466   76.1MB
```
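You can verify the missing-Bash claim yourself. A quick check, assuming you have `node:alpine` pulled locally:

```bash
# Prints nothing and exits non-zero: Bash is not installed in the image
docker run --rm node:alpine which bash

# BusyBox still provides a POSIX shell at /bin/sh
docker run --rm node:alpine which sh
```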
Therefore, the first step is to update our Dockerfile to use `node:alpine` for our final image. At the same time, we keep the full `node` image for our `builder` stage:
```dockerfile
FROM node as builder
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
When you build the image again, you should notice a drastic decrease in size:
```bash
docker images demo-frontend:alpine
REPOSITORY      TAG      IMAGE ID       SIZE
demo-frontend   alpine   97373fdcb697   102MB
```
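Before moving on, it's worth a quick smoke test to confirm the Alpine-based image still serves our app. A minimal sketch, assuming port 8080 is free and the container name `demo` is unused:

```bash
# Start the container in the background, publishing the documented port
docker run --rm -d -p 8080:8080 --name demo demo-frontend:alpine

# Request the headers of the index page to confirm http-server responds
curl -I http://localhost:8080

# Stop the container (--rm removes it automatically afterwards)
docker stop demo
```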
Removing Intermediate Images
Multi-stage builds are a great feature, as they allow you to keep images small and use the build cache at the same time. But this also means a lot of intermediate images are going to be generated.
These intermediate images are a type of dangling image, i.e. images that do not have a name. Generally, you should keep these dangling images around, as they are the basis of the build cache. But having them littering your terminal can be annoying; and if you are maintaining a CI/CD server, you may also want to clean up dangling images regularly.
You can output a list of dangling images by passing the `--filter` flag to the standard listing command:
```bash
docker images --filter dangling=true
REPOSITORY   TAG      IMAGE ID       SIZE
<none>       <none>   8874c0fec4c9   939MB
```
To remove dangling images, run:
```bash
docker rmi $(docker images --filter dangling=true --quiet)
```
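As an aside, Docker 1.13 and later also provide a dedicated subcommand that does the same thing in one step:

```bash
# Removes all dangling images after a confirmation prompt;
# pass --force to skip the prompt (useful on CI/CD servers)
docker image prune
```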
However, this indiscriminately removes all dangling images. What if you want to remove only the images generated by a certain build? Enter labels!
Using the LABEL instruction
The `LABEL` instruction allows you to specify metadata in your image as key-value pairs. You can use labels to:
- document contact details of the author and/or maintainer of the image
- check the build date of the image
- add licensing information
In our case, we can use labels to mark an image as intermediate and as belonging to the `demo-frontend` build:
```dockerfile
FROM node as builder
LABEL name=demo-frontend
LABEL intermediate=true
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
LABEL name=demo-frontend
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
Now, when we run `docker build`, the resulting images will already be labeled, and we can use those labels to filter the output of the listing command:
```bash
docker build -t demo-frontend:labels .
docker images --filter label=name=demo-frontend
REPOSITORY      TAG      IMAGE ID       SIZE
demo-frontend   labels   6965537afe54   102MB
<none>          <none>   0cbce2a3844b   939MB
```
It also allows us to remove the intermediate images of our `demo-frontend` build(s) by running:
```bash
docker rmi $(docker images --filter label=name=demo-frontend --filter label=intermediate=true --quiet)
```
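If you ever need to double-check which labels actually made it into an image, you can query its metadata with `docker inspect`:

```bash
# Print the labels baked into the image as a JSON object
docker inspect --format '{{ json .Config.Labels }}' demo-frontend:labels
```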
Adding semantics to labels
Above, we picked two strings, `name` and `intermediate`, as our label keys. This is fine for now, but what if the author of another Docker image decides to use the same keys? This is why Docker recommends that the keys of all `LABEL` instructions be namespaced with the reverse DNS notation of a domain that you own. This will help avoid clashes in label key names. Therefore, we should update our labels accordingly:
```dockerfile
FROM node as builder
LABEL works.buddy.name=demo-frontend
LABEL works.buddy.intermediate=true
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
LABEL works.buddy.name=demo-frontend
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
Whilst namespacing prevents label keys from clashing, it lacks common semantics: how would a user know what `works.buddy.intermediate` means? Or whether `works.buddy.intermediate` conveys the same meaning as `com.acme.intermediate`?
In the past, Docker users and organizations came up with multiple conventions for imposing semantics on label key names, including:
- Label Schema, which uses a shared `org.label-schema` namespace
- Generic labels suggested by Project Atomic
However, both have been superseded by annotations defined in the OCI Image Format Specification.
This specification defines a set of pre-defined annotation keys, each prefixed with the `org.opencontainers.image.` namespace.
For example, the annotation specification states that the `org.opencontainers.image.title` label should be used to specify the "human-readable title of the image", and the `org.opencontainers.image.vendor` label for the "name of the distributing entity, organization or individual".
So let's update our Dockerfile with these standardized label keys wherever possible:
```dockerfile
FROM node as builder
LABEL org.opencontainers.image.title=demo-frontend
LABEL works.buddy.intermediate=true
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
LABEL org.opencontainers.image.title=demo-frontend
LABEL org.opencontainers.image.vendor="Buddy Team"
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
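Some OCI keys, such as `org.opencontainers.image.created` (the date and time at which the image was built), change on every build, so they are better supplied at build time via `docker build`'s `--label` flag than hard-coded in the Dockerfile. A sketch (the `demo-frontend:labeled` tag here is just an example):

```bash
# Stamp the image with an RFC 3339 build date using a standard OCI key
docker build \
  --label "org.opencontainers.image.created=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  -t demo-frontend:labeled .
```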
Linting your Dockerfile
The last thing we will do in this article is to lint our Dockerfile. There are multiple tools for linting Dockerfiles, including Buddy's official Dockerfile Linter: https://github.com/buddy-works/dockerfile-linter
For the purposes of this guide, however, we'll use hadolint, with a brief mention of dockerfilelint at the end.
Hadolint
Hadolint parses the Dockerfile into an abstract syntax tree (AST), which is a structured object representing the contents of the Dockerfile. In concept, it's similar to how your browser parses HTML source code into the Document Object Model (DOM).
Hadolint then tests the AST against a list of rules to detect places where the Dockerfile does not follow best practices. Let's run it against our Dockerfile to see where we can improve.
The easiest way to run hadolint is via the `hadolint/hadolint` Docker image:
```bash
docker pull hadolint/hadolint
docker run --rm -i hadolint/hadolint < Dockerfile
/dev/stdin:1 DL3006 Always tag the version of an image explicitly
```
Notice that hadolint reports the `DL3006` error, which says that the first line of the Dockerfile (`/dev/stdin:1`) should use an explicitly tagged image.
So let's update our `FROM` instruction to give the base image the `latest` tag:
```dockerfile
FROM node:latest as builder
LABEL org.opencontainers.image.title=demo-frontend
...
```
Run the linter again. This time, it gives another error:
```bash
docker run --rm -i hadolint/hadolint < Dockerfile
/dev/stdin:1 DL3007 Using latest is prone to errors if the image will ever update. Pin the version explicitly to a release tag
```
This error informs us that we shouldn't use the `latest` tag, as `node:latest` can refer to different images over time. Instead, we should pick a more specific tag.
We could be as specific as possible and use a tag like `10.15.3-stretch`. However, I've found that the `lts` tag strikes the right balance, as it follows the latest Long Term Support (LTS) version of Node.js:
```dockerfile
FROM node:lts as builder
LABEL org.opencontainers.image.title=demo-frontend
...
```
Now, when we run hadolint again, it no longer reports any errors.
Hadolint's rule codes fall into two groups:
- Rules that begin with `DL` indicate errors in the Dockerfile syntax
- Rules that begin with `SC` indicate errors in the shell script(s) within the Dockerfile. These are picked up by another tool called ShellCheck, which performs static analysis on your shell scripts.
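Not every rule fits every project. If you've consciously decided to break one, hadolint can suppress it with the `--ignore` flag; below we ignore `DL3018` (Alpine's rule about pinning `apk` package versions) purely as an example:

```bash
# Run hadolint from its Docker image with an explicit rule exclusion
docker run --rm -i hadolint/hadolint hadolint --ignore DL3018 - < Dockerfile
```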
Using a Second Linter
Linting your Dockerfile ensures you are following best practices, but you don't have to limit yourself to a single linter! For instance, you can also use the `dockerfilelint` npm package alongside hadolint.
Running `dockerfilelint` against our pre-linted Dockerfile yields a similar result, although `dockerfilelint` outputs in a CLI format by default, which might be better for everyday use:
```bash
dockerfilelint Dockerfile

File:   Dockerfile
Issues: 1

Line 1: FROM node as builder
Issue  Category   Title                    Description
1      Clarity    Base Image Missing Tag   Base images should specify a tag to use.
```
`dockerfilelint` can also output JSON, which may be advantageous for programmatic use:
```bash
dockerfilelint Dockerfile -o json | jq .files[0].issues
[
  {
    "line": "1",
    "content": "FROM node as builder",
    "category": "Clarity",
    "title": "Base Image Missing Tag",
    "description": "Base images should specify a tag to use."
  }
]
```
Once the issues are fixed, this is the output from `dockerfilelint`:
```bash
dockerfilelint Dockerfile

File:   Dockerfile
Issues: None found 👍
```
Using multiple linters has the advantage of catching errors that a single linter might miss. To finish up, let's build our image using the double-linted Dockerfile!
```bash
docker build -t demo-frontend:oci-annotations .
```
Next Step: Security
Although this article does not focus on image security, we've already improved it by moving our image from `node` to `node:alpine`. This is because every library and tool in the container has the potential to be exploited in an attack; by reducing their number, we reduce the potential attack surface. The same principle applies to reducing the number of running processes in our container.
However, there's a lot more we can do, and for this I invite you to the last article in my series.