Friday, September 21, 2018

Docker multi-stage builds for Python apps

I've previously praised the multi-stage Docker image build approach, though it was not immediately clear how to apply it to Python applications.

In Python you install application dependencies and (preferably) the application itself with the pip tool. When pip runs during an image build, it installs everything under /usr, so there is no obvious set of artifacts (the app and the dependencies pip installed) to copy into the next build stage.

The solution I came up with is to coerce pip into installing everything into a dedicated directory. There are many ways to do this, but in my experiments, installing with the --user flag while setting PYTHONUSERBASE appropriately turned out to be the most convenient way to capture both Python libraries and app binaries (e.g. entrypoint scripts).
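To illustrate the mechanism (a sketch, not part of the original post): pip's --user install locations come from Python's site module, which derives them from PYTHONUSERBASE. The /pyroot path below is just the dedicated directory used throughout this article:

```python
import os
import subprocess
import sys

# With PYTHONUSERBASE set, the site module (and therefore pip --user)
# resolves the per-user install prefix to that directory instead of ~/.local.
env = dict(os.environ, PYTHONUSERBASE="/pyroot")
code = "import site; print(site.USER_BASE); print(site.USER_SITE)"
base, site_dir = subprocess.run(
    [sys.executable, "-c", code],
    env=env, capture_output=True, text=True, check=True,
).stdout.splitlines()

print(base)      # /pyroot
print(site_dir)  # e.g. /pyroot/lib/python3.6/site-packages
```

This is why copying $PYROOT/lib and $PYROOT/bin into the production stage is enough: everything pip installs with --user lands under that one prefix.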

In the end it's quite straightforward, and I wonder why I couldn't find any formal guides on this.

One caveat I ran into later: if some packages are already installed system-wide (e.g. as dependencies of pip/pipenv/setuptools), pip will not reinstall them under /pyroot, so they will be missing from the production image. This is the reason for the --ignore-installed flag used below.

Without further ado, let's see how it can be done.

Setup

Let's use a sample Python Hello World project that contains a proper setup.py to install both the app's libraries and the entrypoint script.

Note: I urge you to use setup.py even if you don't plan to distribute your app. Simply copying your Python sources into the Docker image will eventually break - you may end up copying __pycache__ directories, tests, test fixtures, etc. Having a working setup.py makes it easy to use your app as an installable component in other apps/images.
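For reference, a minimal setup.py along these lines might look as follows. The helloworld_in_python entry point name matches the sample project; the helloworld package name and the helloworld.main:main module path are illustrative assumptions - adjust them to your project's layout:

```python
# Minimal setup.py sketch (packaging config, not runnable on its own).
# Package name and module path below are assumed for illustration.
from setuptools import find_packages, setup

setup(
    name="helloworld",
    version="0.1.0",
    # find_packages() only picks up real packages (directories with
    # __init__.py), so __pycache__ never leaks into the install,
    # and tests are excluded explicitly.
    packages=find_packages(exclude=["tests", "tests.*"]),
    # Mirror requirements.txt from the example above
    install_requires=["pycrypto==2.6.1"],
    entry_points={
        "console_scripts": [
            # pip install --user drops this script into $PYTHONUSERBASE/bin
            "helloworld_in_python = helloworld.main:main",
        ],
    },
)
```

With this in place, `pip install --user .` installs both the package and the console script under $PYTHONUSERBASE, which is exactly what the multi-stage build below relies on.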

Let's setup our test environment:


git clone git@github.com:haizaar/python-helloworld.git
cd python-helloworld/
# Add an artificial requirement to make the example more realistic
echo pycrypto==2.6.1 > requirements.txt

The Dockerfile

All the "magic" happens below. I've added inline comments to ease reading.

FROM alpine:3.8 AS builder

ENV LANG C.UTF-8

# This is our runtime
RUN apk add --no-cache python3
RUN ln -sf /usr/bin/pip3 /usr/bin/pip
RUN ln -sf /usr/bin/python3 /usr/bin/python

# This is dev runtime
RUN apk add --no-cache --virtual .build-deps build-base python3-dev
# Using latest versions, but pinning them
RUN pip install --upgrade pip==19.0.1
RUN pip install --upgrade setuptools==40.4.1

# This is where pip will install to
ENV PYROOT /pyroot
# A convenience to have console_scripts in PATH
ENV PATH $PYROOT/bin:$PATH
ENV PYTHONUSERBASE $PYROOT

# THE MAIN COURSE #

WORKDIR /build

# Install dependencies
COPY requirements.txt ./
RUN pip install --user --ignore-installed -r requirements.txt
# Install our application
COPY . ./
RUN pip install --user .

####################
# Production image #
####################
FROM alpine:3.8 AS prod
# This is our runtime, again
# This would better be refactored into a separate base image to avoid duplicating instructions
RUN apk add --no-cache python3
RUN ln -sf /usr/bin/pip3 /usr/bin/pip
RUN ln -sf /usr/bin/python3 /usr/bin/python

ENV PYROOT /pyroot
ENV PATH $PYROOT/bin:$PATH
ENV PYTHONPATH $PYROOT/lib/python
# This is crucial for pkg_resources to work
ENV PYTHONUSERBASE $PYROOT

# Finally, copy artifacts
COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
# In most cases we don't need entry points provided by other libraries
COPY --from=builder $PYROOT/bin/helloworld_in_python $PYROOT/bin/

CMD ["helloworld_in_python"]

Let's see that it works:


$ docker build -t pyhello .
$ docker run --rm -ti pyhello
Hello, world

As I mentioned, it's really straightforward. I've since packaged one of our real apps with this approach, and so far it works well.

Using pipenv?

If you use pipenv, which I like a lot, you can happily apply the same approach. It's a bit tricky to coerce pipenv into installing into a separate directory, but this command does the trick:

# THE MAIN COURSE #

WORKDIR /build

# Install dependencies
COPY Pipfile Pipfile.lock ./
# --ignore-installed is vital to re-install packages that are already present
# (e.g. brought by pipenv dependencies) into $PYROOT
# Need to use pip eventually because of https://github.com/pypa/pipenv/issues/4453
RUN set -ex && \
    export HOME=/tmp && \
    pipenv lock -r | pip install --user --ignore-installed -r /dev/stdin
# Install our application
COPY . ./
RUN pip install --user .

7 comments:

  1. Thanks. This was very helpful for figuring out how to make Docker work with Pipenv. I do want to point out two errors (I think) in the pipenv instructions:

    Pipefile.lock should be Pipfile.lock (it took me a while to figure out why it kept saying the file didn't exist).

    Missing "RUN" command before pipenv install:
    RUN PIP_USER=1 PIP_IGNORE_INSTALLED=1 pipenv install --system --deploy

    Replies
    1. Thanks for the remarks - fixed! I'm glad to hear it helped someone. I surely missed such guide when I was digging into it myself originally.

  2. This comment has been removed by the author.

  3. I think this is the first time I have ever commented on an article, and I just wanted to say that it is awesome!

    Replies
    1. I'm happy it was helpful and thanks for the feedback.

  4. Hi, can you provide full example for pipenv. I can't get it working.

    Replies
    1. What error are you getting exactly?
      Recent pipenv versions broke support for PIP_IGNORE_INSTALLED env var. I've fixed the example to address that.
