Saturday, May 11, 2019

Testing Lua "classes" speed

"Testing" is a bit too strong word for what I've done here, but numbers are still interesting.

I developed a "smart" reverse proxy recently where I decided to use OpenResty platform - it's basically Nginx + Lua + goodies. Lua is the first class language so theoretically you can implement anything with it.

After the couple of weeks I spent with Lua, it strongly reminds me of JavaScript 5 - while it's a complete language, it's very "raw" in the sense that it has the constructs to do anything, but there is no standard (as in "industry-standard") way to do many things, classes being one of them. Having a strong Python background, I'm used to spending my time mostly on business logic and not googling around to find the best third-party set/dict/etc. implementation. Many praise Lua's standard library asceticism (which reminds me of similar sentiments from the JS 5 days), but most of the time I get paid to create products, not tools. Also, the lack of a uniform way to do common tasks results in a quite non-uniform code base.

Having said the above, I chose OpenResty. I already had Nginx deployed, so switching to OpenResty was a natural extension. It was exactly what I was looking for - a scriptable proxy - which is OpenResty's primary goal as a project. I didn't want to take a generic web server and write a middleware/plugin for it - that sounded a bit too adventurous and risky from a security perspective. So going back to the JS 5 days with a niche language like Lua was a good trade-off.

Eventually I came to like Lua. There is a special cuteness to it - I often find myself smiling while reading Lua code. In particular, it provided great relief from the Nginx IF evilness I had used before.

Let's get to the point of this post, shall we? While imbuing my proxy with some logic, I decided to check which class-like approach in Lua is the fastest. I ended up with 3 contenders:

  • Metatables
  • Closures
  • pl.class - part of the excellent Penlight Lua library that aims to complement Lua with Python-inspired data types and utilities. This class implementation is also metatable-based but involves more internal boilerplate to support, e.g., inheritance.

I implemented a class to test object member access, method invocation, and method chaining. The code is in the gist.

Let's run it

I used the LuaJIT 2.1.0-beta3 that is supplied with the latest OpenResty docker image. pl.class documents two ways to define a class, so I had two versions to see if there is any difference.

Initialization speed


Func:       815,112,512 calls/sec
Metatable:  815,737,335 calls/sec
Closure:      2,459,325 calls/sec
PLClass1:     1,536,435 calls/sec
PLClass2:     1,545,817 calls/sec

Initialization + call speed


Metatable:  816,309,204 calls/sec
Closure:      2,104,911 calls/sec
PLClass1:     1,390,997 calls/sec
PLClass2:     1,453,514 calls/sec

We can see that Metatable is as fast as our baseline plain func. Also, with metatables, invocation does not affect speed - probably the JIT is doing an amazing job here (considering the code is trivial and predictable).

Closures are much slower, and invocation has a cost. pl.class, while the most syntactically rich, is the slowest one and also takes a hit from invocation.

Conclusions

Being a casual Lua developer myself, I prefer the Closure approach:

  • It promotes composition
  • Easy to understand - no implicit self var
  • More importantly, it's unambiguous to use - no one needs to think about whether to access something with a dot or a colon

Again, I'm a casual Lua developer. Had I spent more time with it, I assume my brain would adjust to things like the implicit self and maybe my recommendation would change.

For pure speed, metatables are the way to go, though I wonder what difference it would make in a real application (time your assumptions).

Out of curiosity, I did similar tests in Python (where there is one sane way to write this code). The results were surprising:

CPython3.7


Benchmarking init
Func:      18,378,052 ops/sec
Class:      4,760,040 ops/sec
Closure:    2,825,914 ops/sec
Benchmarking init+invoke
Class:      1,742,217 ops/sec
Closure:    1,549,709 ops/sec

PyPy3.6-7.1.1:


Benchmarking init
Func:   1,076,386,157 ops/sec
Class:    247,935,234 ops/sec
Closure:  189,527,406 ops/sec
Benchmarking init+invoke
Class:  1,073,107,020 ops/sec
Closure:  175,466,657 ops/sec

On CPython, if you want to do anything with your classes besides initializing them, there is not much difference between Class and Closure. "Func" aside, CPython's performance is on par with Lua.
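
To make the comparison concrete, class-based and closure-based objects in Python look roughly like this (a minimal sketch with illustrative names - the actual benchmark code in the gist differs):

import timeit

class PointClass:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def move(self, dx, dy):
        self.x += dx
        self.y += dy
        return self

def point_closure(x, y):
    state = {"x": x, "y": y}

    def move(dx, dy):
        state["x"] += dx
        state["y"] += dy
        return obj

    obj = {"move": move, "state": state}
    return obj

# init + invoke, comparable to the numbers above
print(timeit.timeit(lambda: PointClass(1, 2).move(3, 4), number=1_000_000))
print(timeit.timeit(lambda: point_closure(1, 2)["move"](3, 4), number=1_000_000))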

PyPy just shines - its JIT outperforms LuaJIT by far. The fact that the speed of init+invoke on Class is similar to the raw Func benchmark says something about its ability to trace away code that does nothing :)

On the emotional side

Don't believe benchmarks - lies, damned lies, and benchmarks :)

Seriously though, before thinking "why didn't they embed Python", other aspects should be contemplated:

  • Memory. Lua uses much less of it. An array of 10 million strings of 10 bytes each weighs about 400MB in Lua versus 700+MB in CPython/PyPy (a rough check is sketched below).
  • Python was originally a synchronous language, with async support introduced much later. Nginx is an async server, hence Lua fits there more naturally - but I'm speculating here.
  • Everyone says that Lua is much easier to embed.
Finally, both can do amazing things through FFI.
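
As a rough check of the memory point above, here is what CPython itself reports for 10 million 10-byte strings (a back-of-the-envelope sketch using sys.getsizeof; exact numbers vary by Python version and platform):

import sys

strings = [f"{i:010d}" for i in range(10_000_000)]  # 10 million distinct 10-byte strings
payload = sum(sys.getsizeof(s) for s in strings)    # per-string object header + data
container = sys.getsizeof(strings)                  # the list of pointers itself
print((payload + container) // 1024 ** 2, "MB")     # roughly 0.6-0.7GB on CPython 3.x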

Friday, February 1, 2019

A warning about JSON serialization

I added caching capabilities to one of my projects using aiocache with the JSON serializer. While doing that I came across a strange issue: I was putting {1: "a"} into the cache but receiving {"1": "a"} on retrieval - the integer key 1 came back as the string "1". At first I thought it was a bug in aiocache, but the maintainer kindly pointed out that JSON, being JavaScript Object Notation, does not allow mapping keys to be non-strings.

However, there is a point here that's worth paying attention to - it looks like JSON libraries, at least in Python and Chrome/Firefox, will happily accept {1: "a"} for encoding but will silently convert the keys to strings. This may lead to quite subtle bugs, as in my example above - cache hits will return data different from the original.


>>> import json
>>> json.dumps({1:"a"})
'{"1": "a"}'
>>> json.loads('{1:"a"}')
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/.../lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/home/.../lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/.../lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
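
The practical takeaway: a round trip through JSON is not an identity for dicts with non-string keys - which is exactly how the cache surprise above happens. A minimal illustration:

>>> import json
>>> original = {1: "a"}
>>> json.loads(json.dumps(original))
{'1': 'a'}
>>> json.loads(json.dumps(original)) == original
False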

Tuesday, October 2, 2018

How Google turned a simple feature into configuration disaster

A grain of frustration with a light at the end of the tunnel.

A while back, G Suite had a simple way to create email distribution lists - in the admin panel you could simply create a group of users, e.g. support@example.com, add a couple of members, decide whether users outside of your organization could send email to the group address, and you were done.

Today I tried to do the same - to create a distribution list - with the current version of G Suite for Business, which ended up taking hours of effort and a lengthy conversation with G Suite support.

First I tried to create a group and to send an email to its address - nope, does not work:

Grou-what? A Google Group? I don't want no Google Group! I want, you know, a distribution list!

Clicking on "Access Settings" of the group properties in G Suite leads to a particular settings page on... groups.google.com. Just one of 21(!) other setting pages! My day schedule didn't include a ramp up on Google Groups, so I hooked into support chat straight away. The Google guy explained to me that with G Suite Business the only option is to configure a particular Google Group to behave as a distribution list.

First we worked on enabling people outside of the organization to post (send emails) to the group. Once we configured the setting, I was about to jump away to test it, but the guy stopped me. Quoting:

Zaar Hai: OK. Saving and testing. Can you please hold on? 
G Suite Support, Jay: Wait. 
Zaar Hai: OK :)
G Suite Support, Jay: That is not going to work right away.
G Suite Support, Jay: We have what we called propagation.
G Suite Support, Jay: You need to wait for 24 hours propagation for the changes to take effect.
G Suite Support, Jay: Most of the time it works in less than 24 hours.
Zaar Hai: Seriously??
Zaar Hai: I thought I'm dealing with Google... 
G Suite Support, Jay: Yes, we are Google.

24 hours! After soothing myself, I decided to give it a shot - and it worked! The configuration propagated quite fast, it seems.

It was still not a classic distribution list though, since all of the correspondence was archived and ready to be seen on groups.google.com. I didn't want this behaviour, so we kept digging. Eventually the guy asked me to reset the group to the Email list type:

The messages still got archived though, so we blamed it on the propagation and the guy advised me to come back the next day if it still did not work.

Well, after taking a 24-hour break, it still didn't work. I did a bit of settings exploration myself and found that there is a dedicated toggle responsible for message archiving. It turns out the reset does not clear it. Once disabled, the change propagated within a minute.

That was the frustration part. Now the light - a guide on how to set up a distribution list with G Suite.

How to configure a G Suite group to behave like a distribution list

Step 1: Create a group

Create a group in the G Suite admin console. If you just need an internal mailing list - that is, for members only - and are fine with message archiving, then you are done. If you need outside users to be able to send emails to it (as you probably do with, e.g., sales@example.com), then read on.

Step 2: Enable external access

  • Go to groups.google.com.
  • Click on "My groups" and then on manage link under the name of the group in question
  • On the settings page, navigate to Permissions -> Basic permis... in the menu on the left
  • In the Post row drop-down select "Anyone on the web". Click Save and you should be done

This is almost a classic distribution list - we only need to disable archiving.

Step 3: Disable archiving

Eventually I discovered that archiving is controlled by the toggle located under Information -> Content control in the settings menu:

In my case, the change went into effect immediately.

Afterthoughts

  • Doing all of the above steps may be quite daunting for a system administrator who needs to manage many groups. Why not have a shortcut right in the G Suite admin console to make it easier?
  • The 24-hour propagation period sounds like a blast from the past. The Google guy told me that any G Suite setting change can take up to 24 hours to take effect. Back to the future now: Google offers a distributed database with cross-continental ACID transactions, which makes me wonder about the reasons behind the 24-hour propagation period.

Wednesday, September 26, 2018

Setting GCE snapshot labels en masse

I'm working on our cloud cost analysis, and one of the things to do here is to assign labels, e.g. service=ci, to our resources. We use GCE PD snapshots heavily for database backups, and I want to label them as well, per service.

You can do it through:

  • Cloud console, max 200 snapshots at a time
  • gcloud, one at a time

    The problem is... I have thousands of snapshots to label. Running gcloud in a simple for loop takes up to several seconds per iteration, so the whole process would take a day. Therefore I crafted a script to do it in parallel, which, thanks to lesser-known features of xargs, turned out to be really simple:

    
    #!/bin/bash
    
    set -e
    
    NAME=$1
    LABELS=$2
    
    JOBS=${JOBS:-10}
    
    if ! ([[ "$NAME" ]] && [[ "$LABELS" ]]); then
     echo "Usage: $0 <name substring> <labels>"
     echo "Label format: k1=v1,k2=v2"
     exit 1
    fi
    
    gcloud compute snapshots list \
          --filter "name~$NAME" \
          --format="table[no-heading](name)" | \
     xargs -I @ -P $JOBS -t gcloud compute snapshots update --update-labels=$LABELS @
    
    
    Note that each gcloud instance is alive for several seconds and occupies ~80MB of RAM, i.e. running 10 jobs can easily consume about 1GB of RAM. Obviously, doing it through the GCP APIs with dedicated code, say in Python, would not have that RAM issue, but it's not worth the effort in this case.
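
    If the gcloud RAM overhead ever becomes a problem, the same can be done directly against the Compute API. Below is a rough, untested sketch using the google-api-python-client library; the project name, name substring, and label values are placeholders:

    import googleapiclient.discovery

    PROJECT = "my-project"          # placeholder
    NAME_SUBSTRING = "ci-db"        # placeholder, mirrors the script's $NAME
    NEW_LABELS = {"service": "ci"}  # placeholder, mirrors the script's $LABELS

    compute = googleapiclient.discovery.build("compute", "v1")

    request = compute.snapshots().list(project=PROJECT)
    while request is not None:
        response = request.execute()
        for snapshot in response.get("items", []):
            if NAME_SUBSTRING not in snapshot["name"]:
                continue
            labels = dict(snapshot.get("labels", {}))
            labels.update(NEW_LABELS)
            compute.snapshots().setLabels(
                project=PROJECT,
                resource=snapshot["name"],
                # labelFingerprint guards against concurrent label updates
                body={"labels": labels, "labelFingerprint": snapshot["labelFingerprint"]},
            ).execute()
        request = compute.snapshots().list_next(previous_request=request, previous_response=response)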
    Friday, September 21, 2018

    Docker multi-stage builds for Python apps

    I previously spoke highly of the multi-stage Docker image build approach, though it was not immediately clear how to apply it to Python applications.

    In Python you install application dependencies and (preferably) the application itself using the pip tool. When run during an image build, pip just installs everything under /usr, so there is no immediate way to copy the artifacts (that is, the app and its dependencies installed by pip) into the next build stage.

    The solution that I came up with is to coerce pip to install everything into a dedicated directory. There are many ways of doing so, but from my experiments, installing with the --user flag and properly setting PYTHONUSERBASE is the most convenient way to install both Python libraries and app binaries (e.g. entrypoint scripts).

    Eventually it's quite straightforward, and I wonder why I didn't find any formal guides on this.

    One caveat I ran into later: if you have packages already installed in the system as part of pip/pipenv/setuptools dependencies, pip will not reinstall them under /pyroot, hence there will be missing dependencies in the production image - this is the reason for using the --ignore-installed flag.

    Without further ado, let's see how it can be done.

    Setup

    Let's use a sample Python Hello World project that contains a proper setup.py to install both the app's libs and the entrypoint script.

    Note: I urge you to use setup.py even if you don't plan to distribute your app. Simply copying your Python sources into the docker image will eventually break - you may end up copying __pycache__ directories, tests, test fixtures, etc. Having a working setup.py makes it easy to use your app as an installable component in other apps/images.

    Let's set up our test environment:

    
    git clone git@github.com:haizaar/python-helloworld.git
    cd python-helloworld/
    # Add some artificial requirements to make the example more real
    echo pycrypto==2.6.1 > requirements.txt
    

    The Dockerfile

    All the "magic" is happening below. I've added inline comments to ease on reading.
    
    FROM alpine:3.8 AS builder
    
    ENV LANG C.UTF-8
    
    # This is our runtime
    RUN apk add --no-cache python3
    RUN ln -sf /usr/bin/pip3 /usr/bin/pip
    RUN ln -sf /usr/bin/python3 /usr/bin/python
    
    # This is dev runtime
    RUN apk add --no-cache --virtual .build-deps build-base python3-dev
    # Using latest versions, but pinning them
    RUN pip install --upgrade pip==19.0.1
    RUN pip install --upgrade setuptools==40.4.1
    
    # This is where pip will install to
    ENV PYROOT /pyroot
    # A convenience to have console_scripts in PATH
    ENV PATH $PYROOT/bin:$PATH
    ENV PYTHONUSERBASE $PYROOT
    
    # THE MAIN COURSE #
    
    WORKDIR /build
    
    # Install dependencies
    COPY requirements.txt ./
    RUN pip install --user --ignore-installed -r requirements.txt
    # Install our application
    COPY . ./
    RUN pip install --user .
    
    ####################
    # Production image #
    ####################
    FROM alpine:3.8 AS prod
    # This is our runtime, again
    # It's better be refactored to a separate image to avoid instruction duplication
    RUN apk add --no-cache python3
    RUN ln -sf /usr/bin/pip3 /usr/bin/pip
    RUN ln -sf /usr/bin/python3 /usr/bin/python
    
    ENV PYROOT /pyroot
    ENV PATH $PYROOT/bin:$PATH
    ENV PYTHONPATH $PYROOT/lib/python
    # This is crucial for pkg_resources to work
    ENV PYTHONUSERBASE $PYROOT
    
    # Finally, copy artifacts
    COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
    # In most cases we don't need entry points provided by other libraries
    COPY --from=builder $PYROOT/bin/helloworld_in_python $PYROOT/bin/
    
    CMD ["helloworld_in_python"]
    

    Let's see that it works:

    
    $ docker build -t pyhello .
    $ docker run --rm -ti pyhello
    Hello, world
    

    As I mentioned before - it's really straightforward. I've since packed one of our real apps using this approach, and it works well so far.

    Using pipenv?

    If you use pipenv, which I like a lot, you can happily apply the same approach. It's a bit tricky to coerce pipenv to install into a separate dir, but this command does the trick:
    
    # THE MAIN COURSE #
    
    WORKDIR /build
    
    # Install dependencies
    COPY Pipfile Pipfile.lock ./
    RUN PIP_USER=1 PIP_IGNORE_INSTALLED=1 pipenv install --system --deploy
    # Install our application
    COPY . ./
    RUN pip install --user .
    

    Tuesday, September 18, 2018

    Reducing docker image sizes

    Several years ago I started a new greenfield project. Based on the state of technology affairs back then, I decided to go all in with container technology - Docker and Kubernetes. We dove head-first into all of the new technologies and had our application up pretty fast. Back then the majority of the Docker Library was based on Debian, and it resulted in quite large images - our average Python app container image weighs about 700-1000MB. Finally the time has come to rectify that.

    Why do you care

    Docker images are not pulled too often and 1GB is not too big of a number in the age of clouds, so why do you care? Your mileage may vary, but these are our reasons:
    • Image pull speed - on GCE it takes about 30 seconds to pull a 1GB image. While downloads when pulling from GCR are almost instant, extraction takes a notable amount of time. When a GKE node crashes and pods migrate to other nodes, the image pull time adds to your application downtime. To compare - pulling a 40MB coredns image from GCR takes only 1.3 seconds.
    • Disk space on GKE nodes - when you have lots of containers and update them often, you may end up with disk space pressure. Same goes for developers' laptops.
    • Deploying off-cloud - pulling gigabytes of data is no fun when you try that over saturated 4G network during a conference demo.

    Here are the strategies currently available on the market.

    Use alpine based images

    Sounds trivial, right? They have been around for quite some time already, and the majority of the Docker Library has an -alpine variant. But not all alpine images are born the same:

    Docker Library alpine variant of Python:

    
    $ docker pull python:3.6-alpine
    $ docker images python:3.6-alpine --format '{{.Size}}'
    74.2MB
    

    DIY alpine Python:

    
    $ cat Dockerfile
    FROM alpine:3.8
    RUN apk add --no-cache python3
    $ docker build -t alpython .
    $ docker images alpython --format '{{.Size}}'
    56.2MB
    

    This is a 25% size reduction compared to the Docker Library Python image!

    Note: There is another "space-saving" project that takes a slightly different approach - instead of providing a complete Linux distro, albeit a smaller one, it provides a minimal runtime base image for each language. Have a look at Distroless.

    Avoid unnecessary layers

    It's quite natural to write your Dockerfile as follows:
    
    FROM alpine:3.8
    RUN apk add --no-cache build-base
    COPY hello.c /
    RUN gcc -Wall -o hello hello.c
    RUN apk del build-base
    RUN rm -rf hello.c
    

    It provides nice reuse of the layer cache, e.g. if hello.c changes, we can still reuse the installation of the build-base package from cache. There is one problem though - in the above example, the resulting image weighs 157MB(!) though the actual hello binary is just 10KB:

    
    $ cat hello.c 
    #include <stdio.h>
    #include <stdlib.h>
    
    int main() {
     printf("Hello world!\n");
     return EXIT_SUCCESS;
    }
    $ docker build -t layers .
    $ docker images layers --format '{{.Size}}'
    157MB
    $ docker run --rm -ti layers ls -lah /hello
    -rwxr-xr-x    1 root     root       10.4K Sep 18 10:45 /hello
    

    The reason is that each line in the Dockerfile produces a new layer that constitutes a part of the image, even though the final FS layout may not contain all of the files. You can see the hidden culprits using docker history:

    
    $ docker history layers
    IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
    8c85cd4cd954        16 minutes ago      /bin/sh -c rm -rf hello.c                       0B                  
    b0f981eae17a        17 minutes ago      /bin/sh -c apk del build-base                   20.6kB              
    5f5c41aaddac        17 minutes ago      /bin/sh -c gcc -Wall -o hello hello.c           10.6kB              
    e820eacd8a70        18 minutes ago      /bin/sh -c #(nop) COPY file:380754830509a9a2…   104B                
    0617b2ee0c0b        18 minutes ago      /bin/sh -c apk add --no-cache build-base        153MB               
    196d12cf6ab1        6 days ago          /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B                  
    <missing>           6 days ago          /bin/sh -c #(nop) ADD file:25c10b1d1b41d46a1…   4.41MB       
    

    The last one - marked missing - is our base image, the third line from the top is our binary, and the rest is just junk.

    Squash those squishy bugs!

    You can build docker images with the --squash flag. What it does is essentially leave your image with just two layers - the one you started FROM, and another one that contains only the files visible in the resulting FS (minus the FROM image).

    It plays nicely with the layer cache - all intermediate images are still cached, so building similar docker images will yield a high cache hit rate. A small catch - it's still considered experimental, though the feature has been available since Docker 1.13 (Jan 2017). To enable it, run your dockerd with --experimental or add "experimental": true to your /etc/docker/daemon.json. I'm also not sure about its support among SaaS container builders, but you can always spin up your own docker daemon.

    Let's see it in action:

    
    # Same Dockerfile as above
    $ docker build --squash -t layers:squashed .
    $ docker images layers:squashed --format '{{.Size}}'
    4.44MB
    

    This is exactly our alpine image plus the 10KB hello binary:

    
    $ docker inspect layers:squashed | jq '.[].RootFS.Layers'  # Just two layers as promised
    [
      "sha256:df64d3292fd6194b7865d7326af5255db6d81e9df29f48adde61a918fbd8c332",
      "sha256:5b55011753b4704fdd9efef0ac8a56e51a552b237238af1ba5938e20e019f440"
    ]
    $ mkdir /tmp/img && docker save layers:squashed | tar -xC /tmp/img; du -hsc /tmp/img/*
    52K /tmp/img/118227640c4bf55636e129d8a2e1eaac3e70ca867db512901b35f6247b978cdd
    4.5M /tmp/img/1341a124286c4b916d8732b6ae68bf3d9753cbb0a36c4c569cb517456a66af50
    4.0K /tmp/img/712000f83bae1ca16c4f18e025c0141995006f01f83ea6d9d47831649a7c71f9.json
    4.0K /tmp/img/manifest.json
    4.0K /tmp/img/repositories
    4.6M total
    

    Neat!

    Nothing is perfect though. Squashing your layers reduces the potential for reusing them when pulling images. Consider the following:

    
    $ cat Dockerfile
    FROM alpine:3.8
    RUN apk add --no-cache python3
    RUN apk add --no-cache --virtual .build-deps build-base openssl-dev python3-dev
    RUN pip3 install pycrypto==2.6.1
    RUN apk del .build-deps
    # my.py contains just one "import Crypto" line
    COPY my.py /
    
    $ docker build -t mycrypto .
    $ docker build --squash -t mycrypto:squashed .
    $ docker images mycrypto
    REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
    mycrypto            squashed            9a1e85fa63f0        11 seconds ago       58.6MB
    mycrypto            latest              53b3803aa92f        About a minute ago   246MB
    

    The difference is very positive - compared to the basic Python Alpine image I built earlier, the squashed one here is just 2 megabytes larger. The squashed image has, again, just two layers: the alpine base, and the rest (our Python, pycrypto, and our code) squashed.

    And here is the downside: if you have 10 such Python apps on your Docker/Kubernetes host, you are going to download and store Python 10 times. Instead of having one alpine layer (~4MB), one Python layer (~50MB) and 10 app layers (10x2MB), which is ~75MB, we end up with ~600MB.
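
    To make that arithmetic explicit, using the sizes measured earlier in this post:

    # Shared layers: alpine and python are stored once per node
    shared = 4.4 + 52        # MB: alpine base + python3 layer, approximately
    per_app = 10 * 2         # MB: ten app-specific layers of ~2MB each
    print(shared + per_app)  # ~76MB on the node

    # Squashed images: every image carries its own copy of python
    print(10 * 58.6)         # ~586MB on the node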

    One way to avoid this is to use proper base images, e.g. instead of basing on alpine, we can build our own Python base image and work FROM it.

    Let's combine

    Another widely employed technique is combining RUN instructions to avoid "spilling over" unnecessary layers. I.e. the above Dockerfile can be rewritten as follows:
    
    $ cat Dockerfile-comb 
    FROM alpine:3.8
    RUN apk add --no-cache python3  # Other Python apps will reuse it
    RUN set -ex && \
     apk add --no-cache --virtual .build-deps build-base openssl-dev python3-dev && \
     pip3 install pycrypto==2.6.1 && \
     apk del .build-deps
    COPY my.py /
    
    $ docker build -f Dockerfile-comb -t mycrypto:comb .
    $ docker images mycrypto
    REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
    mycrypto            comb                4b89e6ea6f72        7 seconds ago       59MB
    mycrypto            squashed            9a1e85fa63f0        38 minutes ago      58.6MB
    mycrypto            latest              53b3803aa92f        39 minutes ago      246MB
    
    $ docker inspect  mycrypto:comb | jq '.[].RootFS.Layers'
    [
      "sha256:df64d3292fd6194b7865d7326af5255db6d81e9df29f48adde61a918fbd8c332",
      "sha256:f9ac7d1d908f7d2afb3c724bbd5845f034aa41048afcf953672dfefdb43582d0",
      "sha256:10c59ffc3c3cb7aefbeed9db7e2dc94a39e4896941e55e26c6715649bf6c1813",
      "sha256:f0ac8bc96a6b044fe0e9b7d9452ecb6a01c1112178abad7aa80236d18be0a1f9"
    ]
    

    The end result is similar to a squashed one and now we can control the layers.

    Downsides? There are some.

    One is cache reuse, or lack thereof. Every single image will have to install build-base over and over. Consider a real example with a 70-line-long RUN instruction. Your image may take 10 minutes to build, and changing a single line in that huge instruction will start it all over.

    Second is that the development experience is somewhat hackish - you descend from Dockerfile mastery into shell witchery. E.g. you can easily overlook a space character that crept in after a trailing backslash. This increases development time and ups our frustration - we are all human.

    Multi-stage builds

    This feature is so amazing that I wonder why it is not very famous. It seems like only hard-core docker builders are aware of it.

    The idea is to allow one image to borrow artifacts from another image. Let's apply it to the example that compiles the C code:

    
    $ cat Dockerfile-multi 
    FROM alpine:3.8 AS builder
    RUN apk add --no-cache build-base
    COPY hello.c /
    RUN gcc -Wall -o hello hello.c
    RUN apk del build-base
    
    FROM alpine:3.8
    COPY --from=builder /hello /
    
    $ docker build -f Dockerfile-multi -t layers:multi .
    $ docker images layers
    REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
    layers              multi               98329d4147f0        About a minute ago   4.42MB
    layers              squashed            712000f83bae        2 hours ago          4.44MB
    layers              latest              a756fa351578        2 hours ago          157MB
    

    That is, the size is as good as it gets (even a bit better, since our squashed variant still has a couple of apk metadata files left behind). It works just great for toolchains that produce clearly distinguishable artifacts. Here is another (simplified) example for nodejs:

    
    FROM alpine:3.8 AS builder
    RUN apk add --no-cache nodejs
    COPY src /src
    WORKDIR /src
    RUN npm install
    RUN ./node_modules/.bin/jspm install
    RUN ./node_modules/.bin/gulp export  # Outputs to ./build
    
    FROM alpine:3.8
    RUN apk add --no-cache nginx
    COPY --from=builder /src/build /srv/www
    

    It's trickier for other toolchains like Python, where it's not immediately clear how to copy artifacts after pip-install'ing your app. The proper way to do it for Python is yet to be discovered (by me).

    I will not describe other perks of this feature since Docker's documentation on the subject is quite verbose.

    Conclusion

    As you can probably tell, there is no one ultimate method to rule them all. Alpine images are a no-brainer; multi-stage builds provide a nice & clean separation, but I miss RUN --from=...; squashing has its trade-offs; and humongous RUN instructions are still a necessary evil.

    We use the multi-stage approach for our nodejs images and mega-RUNs for the Python ones. When I find a clean way to extract pip's artifacts, I will definitely move to multi-stage builds there as well.

    Monday, August 6, 2018

    Running docker multi-stage builds on GKE

    I recently worked on reducing docker image sizes for our applications, and one of the approaches is to use docker multi-stage builds. It all worked well on my dev machine, but then I pushed the new Dockerfiles to CI and it all shattered, complaining that our docker server is way too old.

    The thing is that GKE K8s nodes still use docker server v17.03, even on the latest K8s 1.10 they have available. If you, like us, run your Jenkins on GKE as well and use the K8s node's docker server for image builds, then this GKE lag will bite you one day.

    There is a solution though - run your own docker server and make Jenkins use it. Fortunately the community has thought about this before, and the official docker images for docker itself include a -dind flavour, which stands for Docker-In-Docker.

    Our Jenkins used to talk to the host's docker server through /var/run/docker.sock mounted from the host. Now instead we run DinD as a deployment and talk to it over TCP:

    
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: dind
    spec:
      replicas: 1
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            component: dind
        spec:
          containers:
          - name: dind
            image: docker:18.06.0-ce-dind
            env:
            - name: DOCKER_HOST
              value: tcp://0.0.0.0:2375
            args:
              - dockerd
              - --storage-driver=overlay2
              - -H tcp://0.0.0.0:2375
            ports:
            - name: http
              containerPort: 2375
            securityContext:
              privileged: true
            volumeMounts:
            - name: varlibdocker
              mountPath: /var/lib/docker
            livenessProbe:
              httpGet:
                path: /v1.38/info
                port: http
            readinessProbe:
              httpGet:
                path: /v1.38/info
                port: http
          volumes:
          - name: varlibdocker
            emptyDir: {}
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: dind
      labels:
        component: dind
    spec:
      selector:
        component: dind
      ports:
      - name: http
        targetPort: http
        port: 2375
    

    After loading it into your cluster you can add the following environment variable to your Jenkins containers: DOCKER_HOST=tcp://dind:2375 and verify that you are now talking to your new & shiny docker server 18.06:

    
    root@jenkins-...-96d867487-rb5r8:/# docker version
    Client:
     Version: 17.12.0-ce
     API version: 1.35
     Go version: go1.9.2
     Git commit: c97c6d6
     Built: Wed Dec 27 20:05:38 2017
     OS/Arch: linux/amd64
    
    Server:
     Engine:
      Version: 18.06.0-ce
      API version: 1.38 (minimum version 1.12)
      Go version: go1.10.3
      Git commit: 0ffa825
      Built: Wed Jul 18 19:13:39 2018
      OS/Arch: linux/amd64
      Experimental: false
    

    Caveat: the setup I'm describing uses emptyDir to store built docker images and cache, i.e. restarting the pod will empty the cache. It's good enough for my needs, but you may consider using a PV/PVC for persistence, which on GKE is trivial to set up. Using emptyDir will also consume disk space from your K8s node - something to watch for if you don't have an automatic job that purges older images.

    Another small bonus of this solution is that running docker images on your Jenkins pod now only returns the images you have built. Previously this list would also include the images of containers currently running on the node.