Kevin's Blog

The only way to discover the limits of the possible is to go beyond them into the impossible. - Arthur C. Clarke

Jun 4, 2017 - 8 minute read - workshop

Docker Image Optimization

This is a presentation that I gave at our Nashville Docker Meetup in May. We will cover the new multi-stage build feature of the Dockerfile, along with some basics of how Docker images work and how to optimize your image builds. Some understanding of Dockerfiles and images is assumed.

Topics of Discussion

  • What are image layers? How do they work?
  • Tips for building minimal images
  • Dockerfile: Introducing multi-stage builds

Slide Deck (w/ speaker notes)

You can find the original presentation on the Nashville Docker GitHub page.

Docker Images

A Docker image is a collection of layers, each made up of metadata and files. In most cases the layers are generated by running a docker build command.

What are Layers?

Each instruction in a Dockerfile is executed in sequential order. If an instruction changes the filesystem, those changes are written to a read/write layer, which is assigned a random ID. Once the command has finished executing, the contents are committed as an immutable layer and its digest is calculated. This digest is a SHA-256 hash of the contents of that layer.

  • Layers are composed and then digested
  • Each digested layer is given a signature
  • Signatures govern integrity and security
  • More layers, more problems (not really, just more disk space - usually)
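To make the digest idea concrete, here is a minimal Python sketch of content addressing. It is an illustration only, not Docker's implementation: real layers are tar archives of filesystem changes, and the function name `layer_digest` and the sample byte strings are made up for this example.

```python
import hashlib


def layer_digest(layer_bytes: bytes) -> str:
    """Return a Docker-style content digest ("sha256:<hex>") for raw layer bytes."""
    return "sha256:" + hashlib.sha256(layer_bytes).hexdigest()


# Hypothetical layer contents; a real layer is a tar archive of changed files.
layer_a = b"contents of /usr/share/nginx/html/index.html"
layer_b = b"contents of /usr/share/nginx/html/index.html"
layer_c = b"contents of /usr/share/nginx/html/index.htm"

# Identical content always produces the identical digest.
print(layer_digest(layer_a) == layer_digest(layer_b))
# Any change to the content produces a different digest.
print(layer_digest(layer_a) == layer_digest(layer_c))
```

Because the digest depends only on the content, it can serve both as an identifier and as an integrity check: if the bytes you pulled do not hash to the digest you asked for, something changed along the way.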

To learn more about images, I encourage you to read the official documentation covering this subject matter.

Reducing Layer Clutter

There are several tactics that can be deployed to reduce the size and number of your layers. We’ll cover some of the most widely regarded methods and give real-world examples of the image size reductions.

Let’s go over a fairly normal looking Dockerfile, step-by-step, and discuss how we might optimize it.

Unoptimized Dockerfile

Dockerfile
FROM    ubuntu:latest

RUN     apt-get update -y && apt-get install -y curl gnupg nginx
RUN     curl -sL https://deb.nodesource.com/setup_6.x | bash - 
RUN     apt-get install -y nodejs
RUN     npm install npm@latest -g

WORKDIR /root/node-app

COPY    node-app/. .

RUN     npm install
RUN     npm run build:prod
RUN     cp -R ./dist/* /usr/share/nginx/html

WORKDIR /usr/share/nginx/html

COPY    entrypoint.sh /usr/local/bin/

ENTRYPOINT [ "entrypoint.sh" ]

In this image we’re instructing the Docker image builder to perform the following actions.

Line     Description
----------------------------------------------------------------------
1        Fetch the latest ubuntu image from Docker Hub and use it as our base image
3        Fetch the software dependencies that the build process needs to install/compile our application
4        Download and run the setup script for the debian nodejs repository
5-6      Install nodejs, then install the latest version of npm
8-10     Establish our working directory, then copy our source files into it
12-14    Install npm dependencies, compile our node app, and copy the build output to the nginx web root
16       Set our working directory to the folder our web app will be served from
18       Copy the server startup script to a directory in the system PATH
20       Instruct docker to execute this script when the container starts. This will launch our nginx web server

There are quite a number of optimizations we can make to this Dockerfile. First, we’ll look at the size of the base image and of the image we’ve built above.

Image Size Results

  • base image: 117 MB
  • our image: 555 MB
image            tag                 size
----------------------------------------------------------------------
ubuntu           latest              117 MB
    \_ ubuntu
node-app         0.1.0-ubuntu        555 MB
    \_ ubuntu, nginx, nodejs+npm, app

The base image we started with, ubuntu:latest, is 117 MB. After installing nginx, nodejs, npm, the app dependencies (node modules), and our application, we’ve published an image that weighs in at 555 MB.

This could become problematic if we’re building/pushing images this size on a regular basis. If your host does not do a sweep of stale images you might find yourself out of disk space in a matter of days.

Reducing Base Image size

There are much lighter base images available on Docker Hub from which you can compose your docker images. The most notable of the slim images is alpine. We’ll discuss the pros and cons of alpine and offer some alternative base images as well.

nginx            1.13.0-alpine      15.5 MB
    \_ alpine, nginx

The base image for alpine with nginx is 15.5 MB. This is a pretty big difference from our ubuntu base image. There are some caveats that come with using alpine.

  • alpine linux ships BusyBox, which provides slimmed down, less functional versions of common utilities
  • alpine linux is built on musl libc (rather than glibc)

Going into detail on what these mean is beyond the scope of this post, but I’d encourage anybody to read up on both. Either could be a show-stopper for your organization.

When should I use a full base OS

This is a loaded question. What is a full base OS? Which components constitute a “full” OS? Is it dev tooling, package managers, debug tools, 9 different shells? Most base operating system images for linux don’t include gcc, build tools, or debug tools. They do include package managers, and sometimes multiple shells.

The point I’m getting at here is that you can find fairly minimal distributions of debian, centos, ubuntu, etc. out there. They’ve been trimmed down to include only the bare minimum. You might also consider developing your application with a full base image and then slimming down to alpine once you move to day 1/day 2 operations.

There are some reasons you might choose a fully loaded base image. I’ve observed the following cases where they’ve been justified.

  • Support
    • Have a contract with Red Hat? Why would you install CentOS?
  • Security / Compliance
    • Distro specific security hardening practices
    • SOX says you run Ubuntu, you run Ubuntu
  • Ease of Use
    • Familiarity with package managers

Optimized Dockerfile

Dockerfile
FROM    nginx:1.13.0-alpine

RUN     apk add --update nodejs && npm install npm@4.5.0 -g && \
        rm -rf /var/cache/apk/*

WORKDIR /root/node-app
COPY    node-app/package.json .
RUN     npm set progress=false && npm config set depth 0 && \
        npm install --production

COPY    node-app/. .
RUN     npm run build:prod && cp -R dist/. /usr/share/nginx/html && \
        cd / && rm -rf /root/node-app

WORKDIR /usr/share/nginx/html
COPY    entrypoint.sh /usr/local/bin/

ENTRYPOINT [ "entrypoint.sh" ]
Line     Description
----------------------------------------------------------------------
1        We are now basing our image on a prepackaged nginx image built on alpine
3-4      Install nodejs and a specific version of npm, then remove the cache files left behind by apk --update
6        Set our working directory for the compilation stage
7-9      This is a language specific optimization. We copy package.json before the rest of our application and fetch our dependencies, which ensures this layer will be cached for any subsequent builds where package.json hasn’t changed
11-13    Copy the application source files, execute the build, copy the build output to the nginx web root, and then clean up the source files afterwards
15       Set our working directory to the folder our web app will be served from
16       Copy the server startup script to a directory in the system PATH
18       Instruct docker to execute this script when the container starts. This will launch our nginx web server
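The package.json trick is easier to see with a toy model of the build cache. The sketch below is a simplification I wrote for illustration, not how Docker computes its cache internally: the function `cache_keys`, the file names, and the "v1"/"v2" checksums are all assumptions. The idea it models is that a step can be reused only if its instruction, its copied files, and every step before it are unchanged.

```python
import hashlib


def cache_keys(steps, file_checksums):
    """Return one cache key per build step.

    steps: list of (instruction, [paths copied by that instruction]).
    file_checksums: dict mapping path -> checksum of its current content.
    Each key covers the parent key, the instruction text, and the checksums
    of any copied files, so a change invalidates that step and all later ones.
    """
    keys = []
    parent = ""
    for instruction, copied_paths in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for path in sorted(copied_paths):
            h.update(file_checksums[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


# Hypothetical view of the four steps that matter in the Dockerfile above.
STEPS = [
    ("COPY node-app/package.json .", ["package.json"]),
    ("RUN npm install --production", []),
    ("COPY node-app/. .", ["package.json", "src/app.js"]),
    ("RUN npm run build:prod", []),
]

before = cache_keys(STEPS, {"package.json": "v1", "src/app.js": "v1"})
after = cache_keys(STEPS, {"package.json": "v1", "src/app.js": "v2"})

# Steps whose key is unchanged can be served from cache.
reused = [step for (step, _), b, a in zip(STEPS, before, after) if b == a]
print(reused)
```

In this model, editing only src/app.js leaves the first two keys untouched, so the expensive npm install step is reused. Change package.json instead and every key changes, which is exactly when you want dependencies fetched again.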

Image Size Results

  • optimized image: 291 MB
  • previous image: 555 MB
image            tag                 size
----------------------------------------------------------------------
node-app         0.1.0-alpine        291 MB
    \_ alpine, nginx, nodejs+npm, app
node-app         0.1.0-ubuntu        555 MB
    \_ ubuntu, nginx, nodejs+npm, app

With minimal changes, and even though this image still carries layers with unnecessary content, we’ve seen a 48% reduction in size. Our application doesn’t need nodejs/npm or node_modules to serve the compiled assets, yet they are still stored in the image. We’ve traded off cache efficiency for image size, which may be unacceptable to you.
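Why does cleaning up in a later instruction not shrink the image much? Here is a rough Python model of the idea. The layer structure, function names, and sizes are hypothetical and chosen only for illustration; the point is that each layer keeps whatever it added, and a later delete only hides those files.

```python
def stored_size(layers):
    """Size stored on disk: every layer keeps the files it added."""
    return sum(sum(layer["adds"].values()) for layer in layers)


def visible_size(layers):
    """Size visible in the running container after applying layers in order."""
    fs = {}
    for layer in layers:
        fs.update(layer["adds"])
        for path in layer["deletes"]:
            fs.pop(path, None)
    return sum(fs.values())


# Hypothetical sizes in MB, not measurements from the images in this post.
layers = [
    # RUN npm install: node_modules is committed in its own layer.
    {"adds": {"/root/node-app/node_modules": 200}, "deletes": []},
    # RUN npm run build:prod && cp ... && rm -rf /root/node-app
    {"adds": {"/usr/share/nginx/html": 5}, "deletes": ["/root/node-app/node_modules"]},
]

print(stored_size(layers))   # 205
print(visible_size(layers))  # 5
```

The container only sees 5 MB of files, but the image still stores 205 MB, because the layer that added node_modules is immutable. Cleanup only saves space when it happens in the same instruction that created the files.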

Multistage Dockerfile

Dockerfile
FROM    alpine:3.6 AS build

WORKDIR /root/node-app

RUN     apk add --update nodejs && npm install npm@4.5.0 -g

COPY    node-app/package.json .
RUN     npm set progress=false && npm config set depth 0 && \
        npm install

COPY    node-app/. .
RUN     npm run build:prod

FROM    nginx:1.13.0-alpine

COPY    --from=build /root/node-app/dist /usr/share/nginx/html
COPY    entrypoint.sh /usr/local/bin/

ENTRYPOINT [ "entrypoint.sh" ]
Line     Description
----------------------------------------------------------------------
1        Our build starts off with a bare bones alpine image. We assign the image an alias, which will be used later
5        Install node and a specific version of npm (notice that we aren’t cleaning up here)
7-9      This is a language specific optimization. We copy package.json before the rest of our application and fetch our dependencies, which ensures this layer will be cached for any subsequent builds where package.json hasn’t changed
11-12    Copy the application source files and execute the build (again, notice we’re not cleaning up after the build)
14       What is this? Another FROM ??? IS THIS LEGAL!?? (it is, we are instructing docker to start a new image from this point)
16       Instruct docker to copy the files from the path /root/node-app/dist in the first image build
17-19    Copy the server startup script to a directory in the system PATH and instruct docker to execute this script when the container starts. This will launch our nginx web server

Image Size Results

  • multistage image: 20.4 MB
  • optimized image: 291 MB
  • un-optimized image: 555 MB

By changing our base image and utilizing multi-stage builds we’ve seen a 96% reduction in image size. DevOps engineers could achieve something similar prior to multi-stage builds, but it was a painful and messy process. This might be an extreme case, but this type of efficiency can be expected for image builds that collect dependencies, compile, and ship applications written in dynamic languages.
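For anyone who wants to check the percentages, the arithmetic is simple enough to sketch in a few lines of Python. The sizes are the ones reported in this post; the helper name `reduction_percent` is just something I made up for the example.

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Percentage by which after_mb is smaller than before_mb."""
    return (before_mb - after_mb) / before_mb * 100


# Image sizes in MB, as reported in the results above.
sizes_mb = {"un-optimized": 555, "optimized": 291, "multistage": 20.4}

for name, size in sizes_mb.items():
    saved = reduction_percent(sizes_mb["un-optimized"], size)
    print(f"{name}: {size} MB, {saved:.0f}% smaller than the original")
```

Rounded to whole numbers, this gives 48% for the optimized image and 96% for the multistage image, matching the figures quoted above.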

The presentation this blog post was derived from has several language specific hints/tips/tricks. I’ll make an effort to create additional blog posts with some best practices regarding the new multistage builds and languages such as Ruby, PHP, Python, Go, and Java.

Thank you for reading! :)

Tags: Docker Multistage Dockerfile
