Optimizing Dockerfile for Node.js (Part 1)
Let's say you've just built an amazing Node.js application and want to distribute it as a Docker image. You write a `Dockerfile`, run `docker build`, and distribute the generated image on a Docker registry like Docker Hub.
You pat yourself on the back and utter to yourself, "Not too shabby!" But being the perfectionist that you are, you want to make sure that your Docker image and `Dockerfile` are as optimized as possible.
Well, that's exactly what we'll cover in this article! But because there are a lot of techniques for optimizing your Docker image, we've split this article into two parts. The first part (this one) will cover:
- Reducing the number of running processes
- Handling signals properly
- Making use of the build cache
- Using `ENTRYPOINT`
- Using `EXPOSE` to document exposed ports
In the second part, we will cover:
- Reducing the Docker image file size by:
  - Removing obsolete files
  - Using a lighter base image
- Using labels (`LABEL`)
- Adding semantics to labels
- Linting your `Dockerfile`
Later on, we will publish a dedicated article on securing our Docker image, where we will cover:
- Following the Principle of Least Privilege
- Signing and verifying Docker Images
- Using `.dockerignore` to ignore sensitive files
- Vulnerability scanning
Although this article deals with a Node.js application, the principles outlined here apply to applications written in other languages and frameworks too!
Background
For this article, we will work to optimize the `Dockerfile` associated with a basic front-end application. Start by cloning the repository at github.com/d4nyll/docker-demo-frontend. Specifically, we want to use the `docker/basic` branch.
$ git clone -b docker/basic https://github.com/d4nyll/docker-demo-frontend.git
Next, open up the `Dockerfile` to see what instructions are already there.
FROM node
WORKDIR /root/
COPY . .
RUN npm install
RUN npm run build
CMD npm run serve
Each line inside a `Dockerfile` is called an instruction. You can find all valid instructions at the Dockerfile reference page.
Pretty basic stuff. But for those new to Docker, here's a brief overview:
- `FROM node` - we want to build our Docker image based on the `node` base image
- `WORKDIR /root/` - we want all subsequent instructions in this `Dockerfile` to be carried out inside the specified directory. It's similar to running `cd /root/` on your terminal.
- `COPY . .` - copy everything from the build context to the current `WORKDIR`. Don't know what a build context is? Check out the documentation on `docker build`.
- `RUN npm install` - run `npm install` to install all the application's dependencies, as specified inside the `dependencies` property of the `package.json`, as well as the `package-lock.json` file
- `RUN npm run build` - run the `build` npm script inside `package.json`, which simply uses Webpack to build the application
- `CMD npm run serve` - whilst all the previous instructions are executed during the `docker build` process, the `CMD` command is executed when you run `docker run`. It specifies which process should run inside the container as the first process.
Try running `docker build` to build our image.
$ docker build -t demo-frontend:basic .
...
Removing intermediate container a3d5032b851b
---> 703e723acecf
Successfully built 703e723acecf
Successfully tagged demo-frontend:basic
You should now be able to see the `demo-frontend:basic` image when you run `docker images`.
$ docker images
REPOSITORY TAG IMAGE ID SIZE
demo-frontend basic 703e723acecf 939MB
node latest b18afbdfc458 908MB
Next, run `docker run` to run our application.
$ docker run --name demo-frontend demo-frontend:basic
> frontend@1.0.0 serve /root
> http-server ./dist/
Starting up http-server, serving ./dist/
Available on:
http://127.0.0.1:8080
Hit CTRL-C to stop the server
If you see this (rather rudimentary) interface on the URL outputted by `npm run serve` (`127.0.0.1:8080` in our example), then the application is running successfully.
Reducing the Number of Processes
With our Docker container running, we can run `docker exec` on another terminal to see what processes are running inside our container.
$ docker exec demo-frontend ps -eo pid,ppid,user,args --sort pid
PID PPID USER COMMAND
1 0 root /bin/sh -c npm run serve
6 1 root npm
17 6 root sh -c http-server ./dist/
18 17 root node /root/node_modules/.bin/http-server ./dist/
25 0 root ps -eo pid,ppid,user,args --sort pid
We can see that a `/bin/sh` shell (with a process ID (PID) of `1`) is invoked to execute `npm` (PID `6`), which invokes another `sh` shell (PID `17`) to run our `serve` npm script, which then executes the `node` command (PID `18`) that we actually want.
The `ps` command is the same one that we are running with `docker exec`. It would not normally be running inside the container, and we can ignore it here.
That's a lot of processes that are not needed to run our application, and each one takes up a large amount of memory relative to the total memory usage of the container. It would be ideal if we could just run the `node /root/node_modules/.bin/http-server ./dist/` command and nothing else.
Avoid using npm scripts
It's best not to use `npm` as the `CMD` command because, as you saw above, `npm` will invoke a sub-shell and execute the script inside that sub-shell, yielding a redundant process. Instead, you should specify the command directly as the value of the `CMD` instruction.
Update the `CMD` instruction inside your `Dockerfile` to invoke our `node` process directly.
FROM node
WORKDIR /root/
COPY . .
RUN npm install
RUN npm run build
CMD node /root/node_modules/.bin/http-server ./dist/
Next, try stopping our existing `http-server` instance by pressing Ctrl + C. Hmmm, it seems like it's not working! We will explain the reason shortly, but for now, run `docker stop demo-frontend` and `docker rm demo-frontend` on a separate terminal to stop and remove the container.
With a clean slate, let's build and run our image again.
$ docker build -t demo-frontend:no-npm .
$ docker run --name demo-frontend demo-frontend:no-npm
Once again, run `docker exec` on a separate terminal. This time, the number of processes has been reduced from 4 to 2.
$ docker exec demo-frontend ps -eo pid,ppid,user,args --sort pid
PID PPID USER COMMAND
1 0 root /bin/sh -c node /root/node_modules/.bin/http-server ./dist/
6 1 root node /root/node_modules/.bin/http-server ./dist/
13 0 root ps -eo pid,ppid,user,args --sort pid
If we calculate the 'real memory' used by the container before and after the change, we'll find that we've saved ~16MB, just by removing the superfluous `npm` and `sh` processes.
If you are interested in how to calculate the 'real memory' usage, have a read around the topic of proportional set size (PSS).
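If you just want a rough number, you can sum the `Pss:` entries in the init process's memory map from inside the container. This is only a sketch; it assumes `awk` is available in the image (it is in the Debian-based `node` image):

$ docker exec demo-frontend awk '/^Pss:/ { total += $2 } END { print total " kB" }' /proc/1/smaps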
However, our `node` command is still being run inside a `/bin/sh` shell. How do we get rid of that shell and invoke `node` as the first and only process inside our container? To answer that, we must understand and use the exec form syntax in our `Dockerfile`.
Using the Exec Form
Docker supports two different syntaxes for specifying instructions inside your `Dockerfile` - the shell form, which is what we've been using, and the exec form.
The exec form specifies the command and its options and arguments in the form of a JSON array, rather than a simple string. Our `Dockerfile` using the exec form would look like this:
FROM node
WORKDIR /root/
COPY . .
RUN ["npm", "install"]
RUN ["npm", "run", "build"]
CMD ["node", "/root/node_modules/.bin/http-server" , "./dist/"]
Shell vs. Exec Form
The practical difference is that with the shell form, Docker will implicitly invoke a shell and run the `CMD` command inside that shell (this is what we saw earlier). With the exec form, the command we specified is run directly, without first invoking a shell.
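To make the difference concrete, here are the two forms of our original `CMD` instruction side by side:

# Shell form - Docker implicitly wraps the command in a shell:
CMD npm run serve
# equivalent to running: /bin/sh -c "npm run serve"

# Exec form - Docker runs the command directly, without a shell:
CMD ["npm", "run", "serve"]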
Again, stop and remove the existing `demo-frontend` container, update your `Dockerfile` to use the exec form, build it, run it, and run `docker exec` to query the container's process(es).
$ docker stop demo-frontend && docker rm demo-frontend
$ docker build -t demo-frontend:exec .
$ docker run --name demo-frontend demo-frontend:exec
$ docker exec demo-frontend ps -eo pid,ppid,user,args --sort pid
PID PPID USER COMMAND
1 0 root node /root/node_modules/.bin/http-server ./dist/
12 0 root ps -eo pid,ppid,user,args --sort pid
Great, now the only process running inside our container is the `node` process we care about! We have successfully reduced the number of running processes to just one!
Signal Handling
However, saving a single process is not the reason why we prefer the exec form over the shell form. The real reason is signal handling.
On Linux, different processes can communicate with each other through inter-process communication (IPC). One method of IPC is signalling. If you use the command line, you've probably used signals without realizing it. For example, when you press Ctrl + C, you're actually instructing the kernel to send a `SIGINT` signal to the process, requesting it to stop.
Remember previously, when we tried to stop our container by pressing Ctrl + C, it didn't work. But now, let's try that again. With the `demo-frontend:exec` image running, try pressing Ctrl + C on the terminal running `http-server`. This time, the `http-server` stops successfully.
$ docker run --name demo-frontend demo-frontend:exec
Starting up http-server, serving ./dist/
Available on:
http://127.0.0.1:8080
http://172.17.0.2:8080
Hit CTRL-C to stop the server
^Chttp-server stopped.
So why did it work this time, but not earlier? This is because when we send the `SIGINT` signal from our terminal, we are actually sending it to the first process run inside the container. This process is known as the init process, and has a PID of `1`.
Therefore, the init process must have the ability to listen for the `SIGINT` signal. When it receives the signal, it must try to shut down gracefully. For example, for a web server, the server must stop accepting any new requests, wait for any remaining requests to finish, and then exit.
With the shell form, the init process is `/bin/sh`. When `/bin/sh` receives the `SIGINT` signal, it'll simply ignore it. Therefore, our container and the `http-server` process won't be stopped.
When we run `docker stop demo-frontend`, the Docker daemon similarly sends a `SIGTERM` signal to the container's init process, but again, `/bin/sh` ignores it. After a period of around 10 seconds, the Docker daemon realizes the container is not responding to the `SIGTERM` signal, and issues a `SIGKILL` signal, which forcefully kills the process. The `SIGKILL` signal cannot be handled; this means processes within the container do not get a chance to shut down gracefully. For a web server, it might mean that existing requests won't have a chance to run to completion, and your client might have to retry the request.
If we measure the time it takes to stop a container where the init process is `/bin/sh`, you can see that it takes just over 10 seconds, which is the timeout period Docker will wait before sending a `SIGKILL`.
$ time docker stop demo-frontend
real 0m10.443s
user 0m0.072s
sys 0m0.022s
In comparison, when we use the exec form, `node` is the init process and it will handle the `SIGINT` and `SIGTERM` signals. You can either include a `process.on('SIGINT')` handler yourself, or the default one will be used. The point is, with `node` as the first command, you have the ability to catch signals and handle them, as the sketch below illustrates.
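As an illustration only (this handler is not part of the demo app, which relies on `http-server`'s built-in signal handling), a graceful shutdown in a plain Node.js server might look something like this:

// A minimal sketch of graceful signal handling in Node.js.
const http = require('http');

const server = http.createServer((req, res) => {
  res.end('Hello, world!\n');
});

server.listen(8080);

// As PID 1, this process receives SIGTERM from `docker stop` and
// SIGINT from Ctrl + C. Stop accepting new connections, let any
// in-flight requests finish, then exit.
function shutdown(signal) {
  console.log(`Received ${signal}, shutting down gracefully...`);
  server.close(() => process.exit(0));
}

process.on('SIGINT', () => shutdown('SIGINT'));
process.on('SIGTERM', () => shutdown('SIGTERM'));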
To demonstrate, with the new image built using the exec form `Dockerfile`, the container can be stopped in under half a second.
$ time docker stop demo-frontend
real 0m0.420s
user 0m0.053s
sys 0m0.026s
If the application you are running cannot handle signals, you should run `docker run` with the `--init` flag, which will execute `tini` as its first process. Unlike `sh`, `tini` is a minimalistic init system that can handle and propagate signals.
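For example, the flag goes before the image name:

$ docker run --init --name demo-frontend demo-frontend:basic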
Caching Layers
So far, we've looked at techniques that improve the function of our Docker image whilst it's running. In this section, we'll look at how we can use Docker's build cache to make the build process faster.
When we run `docker build`, Docker will run the base image as a container, execute each instruction sequentially on top of it, save the resulting state of the container in a layer, and use that layer as the base image for the next instruction. The final image is built this way - layer by layer.
You can conceptualize a layer as a diff from the previous layer.
(Taken from the About images, containers, and storage drivers page)
However, pulling or building an image from scratch every single time can be time-consuming. This is why Docker will try to use an existing, cached layer whenever possible. If Docker determines that the next instruction will yield the same result as an existing layer, it will use the cached layer.
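You can see these layers for yourself with the `docker history` command, which lists each layer of an image together with the instruction that created it and its size:

$ docker history demo-frontend:basic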
For example, let's say we've updated something inside the `src` directory; when we run `docker build` again, Docker will use the cached layers associated with the `FROM node` and `WORKDIR /root/` instructions.
When it gets to the `COPY` instruction, it will notice that the source code has changed, invalidate the cached layer, and build it from scratch. This also invalidates every layer that comes after it. Therefore, every instruction after the `COPY` instruction must be executed again. In this instance, the build process takes ~10 seconds.
$ time docker build -t demo-frontend:exec .
Sending build context to Docker daemon 511kB
Step 1/6 : FROM node
---> a9c1445cbd52
Step 2/6 : WORKDIR /root/
---> Using cache
---> 7ac595062ce2
Step 3/6 : COPY . .
---> 3c2f3cfb6f92
Step 4/6 : RUN ["npm", "install"]
...
Successfully built 326bf48a8488
Successfully tagged demo-frontend:exec
real 0m10.387s
user 0m0.187s
sys 0m0.089s
However, making a small change in our source code (e.g. fixing a typo) shouldn't affect the dependencies of our application, so there's really no need to run `npm install` again. But because the cache is invalidated in an earlier step, every subsequent step must be re-run from scratch.
To optimize this, we should copy only what is needed for the next immediate step. This means if the next step is `npm install`, we should `COPY` only the `package.json` and `package-lock.json` files, and nothing else.
Update our `Dockerfile` to copy only what is needed for the next immediate step:
FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
CMD ["node", "/root/node_modules/.bin/http-server" , "./dist/"]
By `COPY`ing only what is needed immediately, we allow more layers of the image to be cached. Now, if we update the `src/` directory again, every instruction and layer up until `COPY ["src/", "./src/"]` is cached.
$ time docker build -t demo-frontend:cache .
Step 1/8 : FROM node
Step 2/8 : WORKDIR /root/
---> Using cache
Step 3/8 : COPY ["package.json", "package-lock.json", "./"]
---> Using cache
Step 4/8 : RUN ["npm", "install"]
---> Using cache
Step 5/8 : COPY ["webpack.config.js", "./"]
---> Using cache
Step 6/8 : COPY ["src/", "./src/"]
Step 7/8 : RUN ["npm", "run", "build"]
...
Successfully tagged demo-frontend:cache
real 0m3.175s
user 0m0.193s
sys 0m0.132s
Now, instead of taking ~10 seconds to build, it takes only ~3 seconds (your mileage may vary, but using the build cache will always be faster).
You can find more details on caching, including how Docker determines when a cache is invalidated, on the Dockerfile Best Practices page.
Using ENTRYPOINT and CMD together
Right now, the command that's run by `docker run` is specified by the `CMD` instruction. This command can be overridden by the user of the image (the one executing `docker run`). For example, if I want to use a different port (e.g. `4567`) rather than the default (`8080`), then I can run:
$ docker run --name demo-frontend demo-frontend:cache node /root/node_modules/.bin/http-server ./dist/ -p 4567
Starting up http-server, serving ./dist/
Available on:
http://127.0.0.1:4567
http://172.17.0.2:4567
Hit CTRL-C to stop the server
However, we have to specify the whole command in its entirety. This requires the user of the image to know where the executable is located within the container (i.e. `/root/node_modules/.bin/http-server`). We should make it as easy as possible for the user to run our application. Wouldn't it be nice if they could run the containerized application in the same way as the non-containerized application?
You guessed it! We can!
Instead of using the `CMD` instruction alone, we can use the `ENTRYPOINT` instruction to specify the default command and options to run, and use the `CMD` instruction to specify any additional options that are commonly overridden.
Update our `Dockerfile` to make use of the `ENTRYPOINT` instruction.
FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server" , "./dist/"]
And build the image.
$ docker build -t demo-frontend:entrypoint .
Using this method, the user can run the image as if it were the `http-server` command, and does not need to know the underlying file structure of the container.
$ docker run --name demo-frontend demo-frontend:entrypoint -p 4567
The command specified by the `ENTRYPOINT` instruction can also be overridden using the `--entrypoint` flag of `docker run`. For example, if we want to run a `/bin/sh` shell inside the container to explore it, we can run:
$ docker run --name demo-frontend -it --entrypoint /bin/sh demo-frontend:entrypoint
# hostname
1b64852541eb
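Going one step further, `ENTRYPOINT` and `CMD` can be combined, with `CMD` supplying default options that the user may override. As a sketch (the `-p 8080` default is our own addition, not part of the demo repository):

ENTRYPOINT ["node", "/root/node_modules/.bin/http-server", "./dist/"]
CMD ["-p", "8080"]

With this, `docker run demo-frontend` serves on port `8080` by default, whilst `docker run demo-frontend -p 4567` overrides only the `CMD` portion and leaves the entrypoint intact.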
Using EXPOSE to document exposed ports
Lastly, let's finish up the first part of this article with some documentation. By default, our `http-server` listens on port `8080`; however, a user of our image won't know this without looking up the documentation for `http-server`. Likewise, if we are running our own application, the user would have to look inside our implementation code to know which port the application listens on.
We can make it easier for users by using an `EXPOSE` instruction to document which ports and protocols (TCP or UDP) the application expects to listen on. This way, the user can easily figure out which ports need to be published.
FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server" , "./dist/"]
EXPOSE 8080/tcp
Once again, build the image using `docker build`.
$ docker build -t demo-frontend:expose .
Now a user can see which ports are exposed either by looking at the `Dockerfile`, or by using `docker inspect` on the image.
$ docker inspect --format '{{range $key, $value := .ContainerConfig.ExposedPorts}}{{ $key }}{{end}}' demo-frontend:expose
8080/tcp
Note that the `EXPOSE` instruction does not publish the port. If the user wishes to publish the port, they would have to either:
- use the `-p` flag on `docker run` to individually specify each host-to-container port mapping, or
- use the `-P` flag to automatically map all exposed container ports to ephemeral high-order host ports
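For instance, either of the following publishes our exposed port (the `8080:8080` host-to-container mapping in the first command is our own choice):

$ docker run -p 8080:8080 --name demo-frontend demo-frontend:expose
$ docker run -P --name demo-frontend demo-frontend:expose

With `-P`, you can then run `docker port demo-frontend` to find out which host port was assigned.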
Summary
By following the five techniques outlined above, we have improved our `Dockerfile` and Docker image. However, this is only the beginning! Keep an eye out for the next part of this article, where we will reduce the size of our Docker image, learn to use labels, and lint our `Dockerfile`!
Daniel Li
Staff Software Engineer @ Zinc Work
Daniel Li is a DevOps Engineer and Fullstack Node.js Developer, working with AWS, Ansible, Terraform, Docker, Kubernetes, and Node.js. He is the author of the book Building Enterprise JavaScript Applications, published by Packt.