Several years ago I started a new greenfield project. Based on the state of technology affairs back then, I decided to go all-in on container technology - Docker and Kubernetes. We dove head-first into all of the new technologies and got our application up and running pretty fast. Back then the majority of the Docker Library was based on Debian, and that resulted in quite large images - our average Python app container image weighed about 700-1000MB. The time has finally come to rectify it.
Why do you care
Docker images are not pulled too often, and 1GB is not a big number in the age of clouds, so why should you care? Your mileage may vary, but these are our reasons:
- Image pull speed - on GCE it takes about 30 seconds to pull a 1GB image. While downloads when pulling from GCR are almost instant, extraction takes a notable amount of time. When a GKE node crashes and pods migrate to other nodes, the image pull time adds to your application downtime. For comparison - pulling the 40MB coredns image from GCR takes only 1.3 seconds.
- Disk space on GKE nodes - when you have lots of containers and update them often, you may end up with disk space pressure. Same goes for developers' laptops.
- Deploying off-cloud - pulling gigabytes of data is no fun when you try that over saturated 4G network during a conference demo.
Here are the strategies currently available on the market.
Use alpine based images
Sounds trivial, right? Alpine images have been around for quite some time already and the majority of the Docker Library has an -alpine variant. But not all alpine images are born the same:
Docker Library alpine variant of Python:
$ docker pull python:3.6-alpine
$ docker images python:3.6-alpine --format '{{.Size}}'
74.2MB
DIY alpine Python:
$ cat Dockerfile
FROM alpine:3.8
RUN apk add --no-cache python3
$ docker build -t alpython .
$ docker images alpython --format '{{.Size}}'
56.2MB
This is a 25% size reduction compared to the Docker Library Python!
Note: There is another "space-saving" project that takes a slightly different approach - instead of providing a complete Linux distro, albeit a smaller one, it provides a minimal runtime base image for each language. Have a look at Distroless.
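For instance, a Python app could look roughly like this (a sketch based on the Distroless project's examples - the gcr.io/distroless/python3 image and its python3 entrypoint are assumptions I have not verified here):
FROM gcr.io/distroless/python3
COPY my.py /
# The base image's entrypoint is python3, so CMD is just the script to run
CMD ["/my.py"]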
Avoid unnecessary layers
It's quite natural to write your Dockerfile as follows:
FROM alpine:3.8
RUN apk add --no-cache build-base
COPY hello.c /
RUN gcc -Wall -o hello hello.c
RUN apk del build-base
RUN rm -rf hello.c
It provides nice reuse of the layer cache, e.g. if hello.c changes, we can still reuse the installation of the build-base package from cache.
There is one problem though - in the above example, the resulting image weighs 157MB(!) even though the actual hello binary is just 10KB:
$ cat hello.c
#include <stdio.h>
#include <stdlib.h>
int main() {
    printf("Hello world!\n");
    return EXIT_SUCCESS;
}
$ docker build -t layers .
$ docker images layers --format '{{.Size}}'
157MB
$ docker run --rm -ti layers ls -lah /hello
-rwxr-xr-x 1 root root 10.4K Sep 18 10:45 /hello
The reason is that each line in a Dockerfile produces a new layer that constitutes a part of the image, even though the final FS layout may not contain all of the files.
You can see the hidden "culprits" using docker history:
$ docker history layers
IMAGE CREATED CREATED BY SIZE COMMENT
8c85cd4cd954 16 minutes ago /bin/sh -c rm -rf hello.c 0B
b0f981eae17a 17 minutes ago /bin/sh -c apk del build-base 20.6kB
5f5c41aaddac 17 minutes ago /bin/sh -c gcc -Wall -o hello hello.c 10.6kB
e820eacd8a70 18 minutes ago /bin/sh -c #(nop) COPY file:380754830509a9a2… 104B
0617b2ee0c0b 18 minutes ago /bin/sh -c apk add --no-cache build-base 153MB
196d12cf6ab1 6 days ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 6 days ago /bin/sh -c #(nop) ADD file:25c10b1d1b41d46a1… 4.41MB
The last one - the missing one - is our base image, the third line from the top is our binary, and the rest is just junk.
Squash those squishy bugs!
You can build Docker images with the --squash flag. What it does is essentially leave your image with just two layers - the one you started FROM, and another one that contains only the files visible in the resulting FS (minus the FROM image).
It plays nice with the layer cache - all intermediate images are still cached, so building similar Docker images will yield a high cache hit rate. A small catch - it's still considered experimental, though the feature has been available since Docker 1.13 (Jan 2017). To enable it, run your dockerd with --experimental or add "experimental": true to your /etc/docker/daemon.json. I'm also not sure about its support among SaaS container builders, but you can always spin up your own docker daemon.
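For reference, enabling it on the daemon side amounts to this one key (a minimal /etc/docker/daemon.json; restart dockerd after editing it):
$ cat /etc/docker/daemon.json
{
  "experimental": true
}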
Let's see it in action:
# Same Dockerfile as above
$ docker build --squash -t layers:squashed .
$ docker images layers:squashed --format '{{.Size}}'
4.44MB
This is essentially our alpine base image plus the 10KB hello binary:
$ docker inspect layers:squashed | jq '.[].RootFS.Layers' # Just two layers as promised
[
"sha256:df64d3292fd6194b7865d7326af5255db6d81e9df29f48adde61a918fbd8c332",
"sha256:5b55011753b4704fdd9efef0ac8a56e51a552b237238af1ba5938e20e019f440"
]
$ mkdir /tmp/img && docker save layers:squashed | tar -xC /tmp/img; du -hsc /tmp/img/*
52K /tmp/img/118227640c4bf55636e129d8a2e1eaac3e70ca867db512901b35f6247b978cdd
4.5M /tmp/img/1341a124286c4b916d8732b6ae68bf3d9753cbb0a36c4c569cb517456a66af50
4.0K /tmp/img/712000f83bae1ca16c4f18e025c0141995006f01f83ea6d9d47831649a7c71f9.json
4.0K /tmp/img/manifest.json
4.0K /tmp/img/repositories
4.6M total
Neat!
Nothing is perfect though. Squashing your layers reduces potential for reusing them when pulling images. Consider the following:
$ cat Dockerfile
FROM alpine:3.8
RUN apk add --no-cache python3
RUN apk add --no-cache --virtual .build-deps build-base openssl-dev python3-dev
RUN pip3 install pycrypto==2.6.1
RUN apk del .build-deps
# my.py contains just one "import Crypto" line
COPY my.py /
$ docker build -t mycrypto .
$ docker build --squash -t mycrypto:squashed .
$ docker images mycrypto
REPOSITORY TAG IMAGE ID CREATED SIZE
mycrypto squashed 9a1e85fa63f0 11 seconds ago 58.6MB
mycrypto latest 53b3803aa92f About a minute ago 246MB
The difference is very positive - compared to the basic Python alpine image I built earlier, the squashed one here is just 2 megabytes larger. The squashed image has, again, just two layers: the alpine base, and the rest - our Python, pycrypto, and our code - squashed into one.
And here is the downside: If you have 10 such Python apps on your Docker/Kubernetes host, you are going to download and store Python 10 times, and instead of having 1 alpine layer (2MB), one Python layer (~50MB) and 10 app layers (10x2MB) which is ~75MB, we end up with ~600MB.
One way to avoid this is to use proper base images, e.g. instead of basing every app on alpine, we can build our own Python base image and work FROM it.
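A rough sketch of that approach (the mypython image name and tag below are made up for the illustration):
$ cat Dockerfile.pybase
FROM alpine:3.8
RUN apk add --no-cache python3
$ docker build -f Dockerfile.pybase -t mypython:3.8 .
# Every Python app then starts FROM the shared base, so the ~50MB Python
# layer is stored and pulled only once per node:
$ cat Dockerfile-app
FROM mypython:3.8
COPY my.py /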
Let's combine
Another widely employed technique is combining RUN instructions to avoid "spilling over" unnecessary layers. I.e. the above Dockerfile can be rewritten as follows:
$ cat Dockerfile-comb
FROM alpine:3.8
RUN apk add --no-cache python3 # Other Python apps will reuse it
RUN set -ex && \
    apk add --no-cache --virtual .build-deps build-base openssl-dev python3-dev && \
    pip3 install pycrypto==2.6.1 && \
    apk del .build-deps
COPY my.py /
$ docker build -f Dockerfile-comb -t mycrypto:comb .
$ docker images mycrypto
REPOSITORY TAG IMAGE ID CREATED SIZE
mycrypto comb 4b89e6ea6f72 7 seconds ago 59MB
mycrypto squashed 9a1e85fa63f0 38 minutes ago 58.6MB
mycrypto latest 53b3803aa92f 39 minutes ago 246MB
$ docker inspect mycrypto:comb | jq '.[].RootFS.Layers'
[
"sha256:df64d3292fd6194b7865d7326af5255db6d81e9df29f48adde61a918fbd8c332",
"sha256:f9ac7d1d908f7d2afb3c724bbd5845f034aa41048afcf953672dfefdb43582d0",
"sha256:10c59ffc3c3cb7aefbeed9db7e2dc94a39e4896941e55e26c6715649bf6c1813",
"sha256:f0ac8bc96a6b044fe0e9b7d9452ecb6a01c1112178abad7aa80236d18be0a1f9"
]
The end result is similar to a squashed one and now we can control the layers.
Downsides? There are some.
One is cache reuse, or lack thereof. Every single image will have to install build-base over and over. Consider a real example with a 70-line-long RUN instruction: your image may take 10 minutes to build, and changing a single line in that huge instruction will start it all over.
The second is that the development experience is somewhat hackish - you descend from Dockerfile mastery into shell witchery. E.g. you can easily overlook a space character that crept in after a trailing backslash. This increases development time and ups our frustration - we are all human.
Multi-stage builds
This feature is so amazing that I wonder why it is not more famous. It seems like only hard-core Docker builders are aware of it. The idea is to allow one image to borrow artifacts from another image. Let's apply it to the example that compiles C code:
$ cat Dockerfile-multi
FROM alpine:3.8 AS builder
RUN apk add --no-cache build-base
COPY hello.c /
RUN gcc -Wall -o hello hello.c
RUN apk del build-base
FROM alpine:3.8
COPY --from=builder /hello /
$ docker build -f Dockerfile-multi -t layers:multi .
$ docker images layers
REPOSITORY TAG IMAGE ID CREATED SIZE
layers multi 98329d4147f0 About a minute ago 4.42MB
layers squashed 712000f83bae 2 hours ago 4.44MB
layers latest a756fa351578 2 hours ago 157MB
That is, the size is as good as it gets (even a bit better, since our squashed variant still has a couple of apk metadata files left behind). It works just great for toolchains that produce clearly distinguishable artifacts. Here is another (simplified) example for nodejs:
FROM alpine:3.8 AS builder
RUN apk add --no-cache nodejs
COPY src /src
WORKDIR /src
RUN npm install
RUN ./node_modules/.bin/jspm install
RUN ./node_modules/.bin/gulp export # Outputs to ./build
FROM alpine:3.8
RUN apk add --no-cache nginx
COPY --from=builder /src/build /srv/www
It's trickier for other toolchains like Python, where it's not immediately clear how to copy artifacts after pip-install'ing your app. The proper way to do it, for Python, is yet to be discovered (by me).
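For the record, here is one direction I have been eyeing but have not adopted yet (a rough, untested sketch: pip's --prefix flag stages the installed packages under a separate directory, which the runtime stage then copies into its default site-packages):
$ cat Dockerfile-py-multi
FROM alpine:3.8 AS builder
RUN apk add --no-cache python3 build-base openssl-dev python3-dev
# Stage the compiled packages under /install instead of the system prefix
RUN pip3 install --prefix=/install pycrypto==2.6.1
FROM alpine:3.8
RUN apk add --no-cache python3
# /install/lib/python3.6/site-packages lands in /usr/lib/python3.6/site-packages
COPY --from=builder /install /usr
COPY my.py /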
I will not describe other perks of this feature since Docker's documentation on the subject is quite verbose.
Conclusion
As you can probably tell, there is no single ultimate method to rule them all. Alpine images are a no-brainer; multi-stage builds provide nice & clean separation, but I miss RUN --from=...; squashing has its trade-offs; and humongous RUN instructions are still a necessary evil.
We use multi-stage approach for our nodejs images and mega-RUNs for Python ones. When I find a clean way to extract pip's artifacts, I will definitely move to multi-stage builds there as well.