Optimizing Dockerfile for Node.js (Part 2)
In the first part of this article, we covered:

- Reducing the number of running processes
- Handling signals properly
- Making use of the build cache
- Using `ENTRYPOINT`
- Using `EXPOSE` to document exposed ports

This was the resulting `Dockerfile` we finished with from Part 1:
```dockerfile
FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
In the second part of this article, we will cover:

- Reducing the Docker image file size by:
  - Removing obsolete files
  - Using a lighter base image
- Using labels (`LABEL`)
- Adding semantics to labels
- Linting your `Dockerfile`
Reducing the Docker Image file size
If we take a look at our image now, we'll find that it's huge (939MB, to be exact).
```
$ docker images demo-frontend:expose
REPOSITORY      TAG      IMAGE ID       SIZE
demo-frontend   expose   9ffa262cf2ce   939MB
```
For us to deploy this image to a remote server and run it, at least 939MB must be transferred over. Imagine a scenario where you need to roll back to a previous deployment in production; if your Docker image is large, there may be noticeable downtime before the Docker image finishes being transferred onto the servers and the rollback is complete. Therefore, reducing the file size of our Docker image is important.
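A good first step is to find out where the bulk comes from. `docker history` lists every layer in the image, along with the instruction that created it and its size, which helps pinpoint the layers worth attacking first (output omitted here; run it against the tag we built earlier):

```
$ docker history demo-frontend:expose
```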
Removing Obsolete Files
If we examine the contents of our container, we will find many files that were required for the build process, but not during runtime.
```
$ docker exec -it demo-frontend du -ahd1
16K     ./dist
36K     ./src
4.0K    ./webpack.config.js
55M     ./node_modules
15M     ./.npm
4.0K    ./package.json
164K    ./package-lock.json
70M     .
```
In fact, out of the files above, only `dist/` and `node_modules/` are needed. We should remove the rest.
A naive approach would be to add an extra `RUN` instruction to remove these files.
```dockerfile
FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
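The `find` invocation in that `RUN` instruction is dense, so here is the same command annotated (a plain-shell sketch of what it executes):

```sh
# Delete every direct child of the working directory except dist/ and node_modules/:
#   ! -name dist ! -name node_modules   exclude the two entries we want to keep
#   -maxdepth 1 -mindepth 1             match only direct children, not . itself
#   -exec rm -rf {} \;                  remove each match
find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \;
```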
Whilst this does get rid of the files, it will not reduce the file size of our image. This is because Docker images are built layer by layer; once a layer is added, it cannot be removed from the image. Adding an additional `RUN` instruction will actually increase the image's file size.
Another approach would be to combine the build and cleanup steps into a single instruction.
```dockerfile
FROM node
WORKDIR /root/
COPY [".", "./"]
RUN ["/bin/sh", "-c", "npm install && npm run build && find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
Whilst this does reduce the image size, it undoes all the good work we've done leveraging the build cache.
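As an aside, the single-instruction version copies the entire build context with `COPY [".", "./"]`, so it's worth adding a `.dockerignore` file to keep local artifacts out of the image. A minimal sketch (adjust the entries to your project):

```
# .dockerignore
node_modules
dist
.git
```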
Instead, we can use multi-stage builds to remove obsolete files, whilst still taking advantage of the build cache.
Using Multi-stage Builds
Multi-stage builds are a Docker feature introduced in v17.05 that allows you to specify multiple images (stages) within the same `Dockerfile`. More importantly, you are able to `COPY` build artifacts from one stage to another stage.
Therefore, inside our `Dockerfile`, we can have a builder stage, where we install dependencies and build our application, splitting that process into multiple instructions to leverage the build cache. Then, we copy only what is needed to run the application from the builder stage into the final image.
```dockerfile
FROM node as builder
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
Note that we specified a `--from` option to `COPY` to signify that it should copy from the `builder` stage, and not from the build context.
Using multi-stage builds allows us to leverage the build cache, whilst keeping our final image size small.
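As a bonus, multi-stage builds let you build a single stage in isolation with the `--target` flag, which is handy for debugging the `builder` stage on its own (the tag name here is just an example):

```
$ docker build --target builder -t demo-frontend:builder-only .
```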
If we build our image again, we'll see that we've saved ~9MB from the image.
```
$ docker build -t demo-frontend:multi-stage .
$ docker images
REPOSITORY      TAG           IMAGE ID       SIZE
demo-frontend   multi-stage   cf57206dc983   930MB
<none>          <none>        8874c0fec4c9   939MB
```
The `<none>:<none>` image is the intermediate builder stage image, which can be safely discarded, although doing so will also remove the cached layers. We will outline a way to easily clean up intermediate images later in this article.
Using a lighter base image
Even though we've gotten rid of unnecessary build artifacts, 9MB is not a lot relative to the size of the image. We can reduce the size of the image more significantly by using a lighter base image.
At the moment, we are using the `node` base image, which is, itself, 904MB.
```
$ docker images node
REPOSITORY   TAG      IMAGE ID       SIZE
node         latest   a9c1445cbd52   904MB
```
This means that no matter how much we minimize our `demo-frontend` image, it will never get smaller than 904MB. So why is it so large?
If we look inside the `Dockerfile` for the `node` base image, we'll find that it's based on the `buildpack-deps` image, which contains a large number of common Debian packages, including build tools, system libraries, and system utilities. We might need these utilities when building our `demo-frontend` image, but we won't need them to run our `node` process.
Fortunately, there's a variant of the `node` image called `node:alpine`. The `node:alpine` image is based on the `alpine` (Alpine Linux) image, which is a much smaller base image (5.53MB).
```
$ docker images alpine
REPOSITORY   TAG      IMAGE ID       SIZE
alpine       latest   5cb3aa00f899   5.53MB
```
The `alpine` image doesn't include any build tools or libraries (it doesn't even have Bash!), allowing it to have a much smaller image size than the `node:latest` image.
```
$ docker images node
REPOSITORY   TAG      IMAGE ID       SIZE
node         slim     e52c23bbdd87   148MB
node         latest   a9c1445cbd52   904MB
node         alpine   953c516e1466   76.1MB
```
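You can see this minimalism first-hand; the following quick check (assuming you've pulled `node:alpine`) shows that Bash is absent while BusyBox's `sh` is available:

```
$ docker run --rm node:alpine which bash    # prints nothing: Bash is not installed
$ docker run --rm node:alpine which sh
/bin/sh
```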
Therefore, we should update our `Dockerfile` to use `node:alpine` instead of `node` for our final image (but keep using `node` for our `builder` stage).
```dockerfile
FROM node as builder
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
When we build our image again, we'll notice that its size has decreased drastically!
```
$ docker images demo-frontend:alpine
REPOSITORY      TAG      IMAGE ID       SIZE
demo-frontend   alpine   97373fdcb697   102MB
```
Removing Intermediate Images
Multi-stage builds are a great feature, as they allow you to keep images small and make use of the build cache. But this also means a lot of intermediate images are going to be generated.
These intermediate images are a type of dangling image, which is an image that does not have a name. Generally, you should keep these dangling images, as they are the basis of the build cache. But having them littered across the output of `docker images` can be annoying; and if you are maintaining a CI/CD server, you may also want to clean up dangling images regularly.
You can output a list of dangling images by using the `--filter` flag of `docker images`.
```
$ docker images --filter dangling=true
REPOSITORY   TAG      IMAGE ID       SIZE
<none>       <none>   8874c0fec4c9   939MB
```
And you can remove them by running `docker rmi $(docker images --filter dangling=true --quiet)`. However, this indiscriminately removes all dangling images. What if you just want to remove dangling images generated from a certain build? Enter labels!
Using labels (LABEL)
The `LABEL` instruction allows you to add metadata (as key-value pairs) to your image. You can use labels to:

- document contact details of the author and/or maintainer of the image (this replaces the deprecated `MAINTAINER` instruction)
- record the build date of the image
- add licensing information
In our case, we can use labels to mark an image as intermediate and as belonging to the `demo-frontend` build.
```dockerfile
FROM node as builder
LABEL name=demo-frontend
LABEL intermediate=true
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
LABEL name=demo-frontend
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
Now, when we build our image, it will be labelled, and we can filter the output of `docker images` using the labels.
```
$ docker build -t demo-frontend:labels .
$ docker images --filter label=name=demo-frontend
REPOSITORY      TAG      IMAGE ID       SIZE
demo-frontend   labels   6965537afe54   102MB
<none>          <none>   0cbce2a3844b   939MB
```
It also allows us to remove the intermediate image of our `demo-frontend` build(s) by running `docker rmi $(docker images --filter label=name=demo-frontend --filter label=intermediate=true --quiet)`.
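Alternatively, `docker image prune` accepts the same label filters and only removes dangling images, so the tagged final image is safe (a sketch of an equivalent cleanup; `--force` just skips the confirmation prompt):

```
$ docker image prune --force --filter label=name=demo-frontend --filter label=intermediate=true
```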
Adding Semantics to Labels
Above, we picked two strings, `name` and `intermediate`, as our label keys. This is fine for now, but what if the author of another Docker image decides to use the same labels? This is why Docker recommends that all `LABEL` instructions use keys namespaced with the reverse DNS name of a domain that you own. This avoids clashes in label key names. Therefore, we should update our labels accordingly.
```dockerfile
FROM node as builder
LABEL works.buddy.name=demo-frontend
LABEL works.buddy.intermediate=true
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
LABEL works.buddy.name=demo-frontend
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```
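You can confirm the labels made it into the final image with `docker inspect` (the `namespaced` tag below is just an example name for this build):

```
$ docker build -t demo-frontend:namespaced .
$ docker inspect --format '{{ json .Config.Labels }}' demo-frontend:namespaced
```

This prints the image's labels as a JSON object, so you can check that the namespaced keys are present.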
Whilst namespacing prevents label keys from clashing, it lacks common semantics: how would a user know what `works.buddy.intermediate` means? Or whether `works.buddy.intermediate` conveys the same meaning as `com.acme.intermediate`?
In the past, Docker users and organizations have come up with multiple conventions for imposing semantics on label key names, including:

- Label Schema, which uses a shared `org.label-schema` namespace
- Generic labels suggested by Project Atomic
However, both have been superseded by annotations defined in the Open Container Initiative (OCI) Image Format Specification. This specification defines multiple pre-defined annotation keys, each prefixed with the `org.opencontainers.image.` namespace.
For example, the annotations specification specifies that the `org.opencontainers.image.title` label be used for the "human-readable title of the image", and the `org.opencontainers.image.vendor` label for the "name of the distributing entity, organization or individual".
So let's update the label keys in our `Dockerfile` with these standardized label keys wherever possible.
```dockerfile
FROM node as builder
LABEL org.opencontainers.image.title=demo-frontend
LABEL works.buddy.intermediate=true
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
RUN ["/bin/bash", "-c", "find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \\;"]

FROM node:alpine
LABEL org.opencontainers.image.title=demo-frontend
LABEL org.opencontainers.image.vendor="Buddy Team"
WORKDIR /root/
COPY --from=builder /root/ ./
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
EXPOSE 8080
```

Note that `title` carries the image's name and `vendor` the distributing entity, matching the definitions above; there is no standardized key for marking intermediate stages, so `works.buddy.intermediate` keeps its namespaced form.
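The specification also defines `org.opencontainers.image.created` for the build date of the image. Because label values are fixed when the image is built, a common pattern (a sketch; the `BUILD_DATE` build argument is something we introduce here, not part of the project so far) is to pass the date in at build time:

```dockerfile
# In the final stage: accept the build date as a build argument
ARG BUILD_DATE
LABEL org.opencontainers.image.created=$BUILD_DATE
```

```
$ docker build --build-arg BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)" -t demo-frontend:oci-annotations .
```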
Linting your Dockerfile
The last thing we will do in this article is to lint our `Dockerfile`. There are multiple tools available for linting `Dockerfile`s:
- Haskell Dockerfile Linter, or hadolint, written in Haskell
- `dockerfilelint`, written in JavaScript, with an online version at fromlatest.io
- Red Hat's Project Atomic's `dockerfile_lint`, also written in JavaScript
- `dockerlint`, another linter written in JavaScript
In this article, we will use hadolint, with a brief mention of `dockerfilelint` at the end.
Hadolint
Hadolint parses the `Dockerfile` into an abstract syntax tree (AST), which is a structured object representing the contents of the `Dockerfile`. It is similar in concept to how your browser parses HTML source code into the Document Object Model (DOM).
Hadolint then tests the AST against a list of rules to detect places in the `Dockerfile` which do not follow best practices. Let's run it against our `Dockerfile` to see where we can improve.
The easiest way to run hadolint is by running the `hadolint/hadolint` image using Docker.
```
$ docker pull hadolint/hadolint
$ docker run --rm -i hadolint/hadolint < Dockerfile
/dev/stdin:1 DL3006 Always tag the version of an image explicitly
```
Hadolint displayed the `DL3006` error, which says that the first line (`/dev/stdin:1`) of the `Dockerfile` should use a tagged image. So let's update our `FROM` instruction to give our `node` base image the `latest` tag.
```dockerfile
FROM node:latest as builder
LABEL org.opencontainers.image.title=demo-frontend
...
```
We can run hadolint again; this time, it gives another error.
```
$ docker run --rm -i hadolint/hadolint < Dockerfile
/dev/stdin:1 DL3007 Using latest is prone to errors if the image will ever update. Pin the version explicitly to a release tag
```
The `DL3007` error informs us that we shouldn't use the `latest` tag, as `node:latest` can reference different images over time. Instead, we should pick a more specific tag. We could be as specific as possible and use a tag like `10.15.3-stretch`. However, I've found that using the `lts` tag is often sufficient, as it follows the latest Long Term Support (LTS) version of Node.js.
```dockerfile
FROM node:lts as builder
LABEL org.opencontainers.image.title=demo-frontend
...
```
Now, when we run hadolint again, it no longer reports any errors!
In general, when using hadolint, there are two types of rules:

- Rules beginning with `DL` indicate errors in the `Dockerfile` syntax
- Rules beginning with `SC` indicate errors in the script(s) you specified within the `Dockerfile`. These are picked up by another tool called ShellCheck, which performs static analysis on your shell scripts.
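Occasionally a rule won't apply to your situation. hadolint lets you suppress individual rules, either per run with the `--ignore` flag or per instruction with an inline comment (shown here suppressing `DL3007`, purely as an illustration):

```
$ docker run --rm -i hadolint/hadolint hadolint --ignore DL3007 - < Dockerfile
```

The same effect can be scoped to a single instruction inside the `Dockerfile`:

```dockerfile
# hadolint ignore=DL3007
FROM node:latest as builder
```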
Using a Second Linter
Linting your `Dockerfile` ensures you are following best practices; but you don't have to limit yourself to a single linter! For instance, you can also use the `dockerfilelint` npm package alongside hadolint.
Using `dockerfilelint` with our pre-linted `Dockerfile` yields a similar result, although `dockerfilelint` outputs in a human-readable CLI format by default, which might be better for everyday use.
```
$ dockerfilelint Dockerfile
File:   Dockerfile
Issues: 1

Line 1: FROM node as builder
Issue  Category  Title                   Description
1      Clarity   Base Image Missing Tag  Base images should specify a tag to use.
```
`dockerfilelint` can also output JSON, which may be advantageous for programmatic use.
```
$ dockerfilelint Dockerfile -o json | jq .files[0].issues
[
  {
    "line": "1",
    "content": "FROM node as builder",
    "category": "Clarity",
    "title": "Base Image Missing Tag",
    "description": "Base images should specify a tag to use."
  }
]
```
When the issues are fixed, this is the output from `dockerfilelint`:
```
$ dockerfilelint Dockerfile
File:   Dockerfile
Issues: None found 👍
```
Using multiple linters has the advantage of catching errors that one linter misses. To finish up, let's build our image using the (double-)linted `Dockerfile`!
```
$ docker build -t demo-frontend:oci-annotations .
```
Next Steps
In this article, we have only covered the basics. If you'd like to learn more, I'd recommend you watch a talk I gave at the London Node User Group (LNUG) back in October 2018, titled Dockerizing JavaScript Applications.
An important aspect we haven't covered is security.
Unbeknownst to you, we've already made strides in securing our Docker image! Moving from the `node` image to `node:alpine` already improved the security of the image.
This is because everything inside a container has the potential to be exploited in an attack. By reducing the number of libraries and tools, we reduce the potential attack surface. The same principle applies when we reduced the number of running processes in our container.
However, there is a lot more we can, and should, do to secure our image. So stay tuned for our next article - Securing our Docker image - which builds on top of this one.
Daniel Li
Staff Software Engineer @ Zinc Work
Daniel Li is a DevOps Engineer and Fullstack Node.js Developer, working with AWS, Ansible, Terraform, Docker, Kubernetes, and Node.js. He is the author of the book Building Enterprise JavaScript Applications, published by Packt.