Friday, September 21, 2018

Docker multi-stage builds for Python apps

I have previously praised the multi-stage Docker image build approach, though it was not immediately clear how to apply it to Python applications.

In Python, you install the application's dependencies and (preferably) the application itself with pip. When run during an image build, pip installs everything under /usr, mixed in with the system's own files, so there is no obvious way to copy just the artifacts (the app and its dependencies installed by pip) into the next build stage.

The solution I came up with is to make pip install everything into a dedicated directory. There are many ways of doing so, but in my experiments, installing with the --user flag and pointing PYTHONUSERBASE at that directory proved the most convenient way to install both Python libraries and app binaries (e.g. entrypoint scripts).
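
To make the mechanism concrete, here is a minimal sketch (not part of the sample project) that asks a fresh interpreter where a --user install would land once PYTHONUSERBASE is set. The helper name user_install_paths is made up for illustration, and the posix_user scheme assumes a Linux-like system such as the Alpine image used below.

```python
import json
import os
import subprocess
import sys
import tempfile


def user_install_paths(user_base):
    """Ask a fresh interpreter where `pip install --user` would put files.

    `user_install_paths` is a hypothetical helper, not a pip or stdlib API.
    """
    # PYTHONUSERBASE is read at interpreter startup, so probe a child process
    # instead of changing the environment of the current one.
    env = dict(os.environ, PYTHONUSERBASE=user_base)
    probe = (
        "import json, site, sysconfig;"
        "print(json.dumps({"
        "'base': site.getuserbase(),"
        "'site_packages': site.getusersitepackages(),"
        "'scripts': sysconfig.get_path('scripts', 'posix_user')}))"
    )
    out = subprocess.check_output([sys.executable, "-c", probe], env=env, text=True)
    return json.loads(out)


if __name__ == "__main__":
    # A temporary directory stands in for /pyroot from the Dockerfile.
    with tempfile.TemporaryDirectory() as pyroot:
        for name, path in user_install_paths(pyroot).items():
            print(f"{name}: {path}")
```

All three reported paths sit under the chosen base directory, which is why copying that one directory tree into the next stage should be enough to carry both the libraries and the console scripts.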

In the end it's quite straightforward, and I wonder why I didn't find any formal guides on this.

Without further ado, let's see how it can be done.

Setup

Let's use a sample Python Hello World project that contains a proper setup.py to install both the app's libs and the entrypoint script.

Note: I urge you to use setup.py even if you don't plan to distribute your app. Simply copying your Python sources into a Docker image will eventually break: you may end up copying __pycache__ directories, tests, test fixtures, etc. Having a working setup.py makes it easy to use your app as an installable component in other apps/images.
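
For readers who have never written one, a setup.py of this kind might look roughly like the sketch below. This is not the sample repository's actual file: the package name, version, and the helloworld.main:main target are assumptions made for illustration. Only the helloworld_in_python command name is taken from the Dockerfile in this post.

```python
# setup.py (hypothetical sketch; the real project's file may differ)

SETUP_KWARGS = dict(
    name="helloworld",            # assumed distribution name
    version="0.1.0",              # assumed version
    # Listing packages explicitly keeps tests and fixtures out of the install.
    packages=["helloworld"],      # assumed package layout
    entry_points={
        "console_scripts": [
            # Installs an executable named helloworld_in_python that calls
            # main() from helloworld/main.py (module and function assumed).
            "helloworld_in_python = helloworld.main:main",
        ],
    },
)

if __name__ == "__main__":
    # Imported here so the metadata above can be inspected without setuptools.
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

With a file like this in place, pip install --user . puts the package under the user site-packages directory and the generated helloworld_in_python script under the user base's bin directory.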

Let's set up our test environment:


git clone git@github.com:haizaar/python-helloworld.git
cd python-helloworld/
# Add some artificial requirements to make the example more real
echo pycrypto==2.6.1 > requirements.txt

The Dockerfile

All the "magic" happens below. I've added inline comments to make it easier to read.

FROM alpine:3.8 AS builder

ENV LANG C.UTF-8

# This is our runtime
RUN apk add --no-cache python3
RUN ln -sf /usr/bin/pip3 /usr/bin/pip
RUN ln -sf /usr/bin/python3 /usr/bin/python

# This is dev runtime
RUN apk add --no-cache --virtual .build-deps build-base python3-dev
# Using latest versions, but pinning them
RUN pip install --upgrade pip==18.0
RUN pip install --upgrade setuptools==40.4.1

COPY . /build
WORKDIR /build

# This is where pip will install to
ENV PYROOT /pyroot
# A convenience to have console_scripts in PATH
ENV PATH $PYROOT/bin:$PATH

# THE MAIN COURSE #

# Install dependencies
RUN PYTHONUSERBASE=$PYROOT pip install --user -r requirements.txt
# Install our application
RUN PYTHONUSERBASE=$PYROOT pip install --user .

####################
# Production image #
####################
FROM alpine:3.8 AS prod
# This is our runtime, again
# It would be better to refactor this into a separate base image to avoid duplicating instructions
RUN apk add --no-cache python3
RUN ln -sf /usr/bin/pip3 /usr/bin/pip
RUN ln -sf /usr/bin/python3 /usr/bin/python

ENV PYROOT /pyroot
ENV PATH $PYROOT/bin:$PATH
ENV PYTHONPATH $PYROOT/lib/python
# This is crucial for pkg_resources to work
ENV PYTHONUSERBASE $PYROOT

# Finally, copy artifacts
COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
# In most cases we don't need entry points provided by other libraries
COPY --from=builder $PYROOT/bin/helloworld_in_python $PYROOT/bin/

CMD ["helloworld_in_python"]

Let's see that it works:


$ docker build -t pyhello .
$ docker run --rm -ti pyhello
Hello, world

As I mentioned before, it's really straightforward. I've already packaged one of our real apps with this approach, and it works well so far.

Using pipenv?

If you use pipenv, which I like a lot, you can still apply this approach. Instead of doing pipenv install --system --deploy, do the following in your Dockerfile:

ENV REQS /tmp/requirements.txt
RUN pipenv lock -r > $REQS
RUN PYTHONUSERBASE=$PYROOT pip install --user -r $REQS