
Current Problem

I am trying to productionize a Python application. This is not easy, because Python does not provide a way to create a single compiled binary from the source code. Furthermore, Python does not offer a natural way of creating shared libraries. I will explain this in further detail in the following section.

Python and Shared Libraries

Most languages offer some relatively convenient way of productionizing code. This usually involves:

  • Creating some shared libraries containing code which is common to a project
  • Creating executable files which use the shared libraries (and maybe other libraries)

Examples:

  • Java has a way to bundle code into a .jar file. There are tools like Maven and Gradle to assist with the process of deployment and bundling.
  • With languages like C and C++, one typically builds binary libraries and binary executable files, which can then be deployed. An executable can also be linked statically so that it contains all the library code.
  • Rust has Cargo which typically builds individual binary executables containing all relevant code.

Python works a bit differently, because out of the box the interpreter only searches the directory containing the script being run (not an arbitrary working directory) and any paths which have been added to the PYTHONPATH environment variable.

  • Modifying the PYTHONPATH environment variable isn’t generally advisable: it doesn’t scale well and adds the overhead of maintaining its value across machines and environments.
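As a rough illustration (lib1 and /opt/shared are made-up names), this is the difference PYTHONPATH makes to import resolution:

$ python3 -c "import lib1"                          # ModuleNotFoundError unless lib1/ is in the current directory
$ PYTHONPATH=/opt/shared python3 -c "import lib1"   # works if /opt/shared/lib1/__init__.py exists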

This means that either:

  • Python executables must be a single file and there can be no shared code between executables
  • or, any shared code has to go in a library which must live in the same directory as the executables themselves

This suggests a project structure like this:

my-project/
  bin/
    executable1.py
    executable2.py
    ...
    lib1/
      __init__.py
      ...
    lib2/
      __init__.py
      ...

Clearly we don’t build projects like this. Putting all the shared library code in the same directory as the executable files is already weird enough. The structure also breaks down if we want shared library code alongside multiple directories under bin for different "groups" of executables. For example, it is not possible to create two subdirectories under bin called group1 and group2 and share common Python code in a library used by both groups.
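Concretely, a layout like this (common_lib is a made-up name) gives the scripts under group1 and group2 no way to import the shared code:

my-project/
  bin/
    group1/
      executable1.py
    group2/
      executable2.py
    common_lib/        # neither group's scripts can import this
      __init__.py      # without resorting to PYTHONPATH or sys.path hacks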

This is why I say Python does not support shared libraries. You can build a shared library, but you need to use some other tool to do it. (Unless you resort to modifying PYTHONPATH. Let’s assume we don’t want to do that.)

Solutions to Python Shared Libraries Problems

In reality, we would use a tool like virtualenv (venv) or Poetry (which essentially manages virtualenvs for us) to move common library code into some other directory while still enabling the Python interpreter to find it.
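Roughly speaking, these tools work by installing the shared code into the environment’s site-packages directory, which is always on the interpreter’s search path; an editable install just drops a link back to the source tree instead of copying it. A minimal sketch with a plain venv (the exact file names under site-packages vary with pip and setuptools versions):

$ python3 -m venv .venv
$ .venv/bin/pip install -e .
# pip writes a small .pth/loader file into .venv/lib/python3.X/site-packages/
# pointing back at src/, so `import lib1` resolves whenever the venv's
# interpreter is the one running the code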

My current workflow situation

Until now, I have been using venv in interactive mode to develop Python software.

This means that I have a project structure like this:

my-project
  .venv/...
  bin/
    ...
  src/
    lib1/...
    lib2/...
  pyproject.toml

and I have been using the venv in interactive mode:

$ source .venv/bin/activate
$ pip3 install -e .

This is great for development, because if the library code under src is changed, those changes show up "live". (Meaning there is no bundle or install step required: just run the executable and the "current" code is in use.)

It isn’t so good for deployment.

Why Docker?

My first approach to trying to "install" a production ready version of the code was to do the following:

  • Use systemd to manage processes (starting and stopping)
  • Copy the executable code from the bin folder to some "deployment" location on the system (for example /opt/my-project/bin/)
  • Build a wheel (.whl) file
  • Install the whl system-wide

I did not get as far as the final two steps, as I realized this was probably not a good approach. There are some problems with this:

  • The point of using a virtual environment was to avoid having to install packages with pip system-wide. It therefore doesn’t make a lot of sense to create a whl and install that system-wide.
  • I also do not know how I would use a virtual environment in this context. I described copying the executable Python files to some directory like /opt which is (obviously) outside of the "development" directory which contains the .venv.
  • This whole idea doesn’t seem to make a lot of sense

This leads me to believe that using Docker is a much more sensible approach. We can install a whl system-wide inside a Docker container.

  • On reflection, I actually don’t think there is a sensible way to productionize and deploy Python code without using Docker. If anyone has any thoughts on that it would be great to get some feedback.

The Docker Approach and Current Issues

I don’t currently understand how to create a Docker image and how to install a whl file into it.

Here’s how I think the process should work:

  • The first step: somehow (I do not know exactly how) the shared library code should be built into a whl
  • The Dockerfile should copy the executable Python files and whl file into a container image
  • The Dockerfile should install the whl and then delete it

The reason I am confused is that I am using Poetry to manage the virtual envs for the development process (these are not needed in the production Docker image) and I do not understand how to split the virtual env development process from the Docker container production process – if that makes any sense?

Here’s what I have in a Dockerfile right now:

from python:3.12-bookworm
run pip install poetry # do I need this?
workdir /opt/docker-python-poetry-example
copy dist/*.whl /opt/docker-python-poetry-example
run pip install *.whl
run rm /opt/docker-python-poetry-example/*.whl
copy bin /opt/docker-python-poetry-example/bin
run chmod +x /opt/docker-python-poetry-example/bin/*
copy pyproject.toml poetry.lock /opt/docker-python-poetry-example/
env PATH="/opt/docker-python-poetry-example/bin:${PATH}"
run poetry config virtualenvs.create false
run poetry install --no-dev --no-interaction --no-ansi

cmd poetry run example_executable.py

Here’s what I have in my pyproject.toml:

[tool.poetry]
name = "docker-python-poetry-example"
version = "0.1.0"
description = ""
authors = ["Example <[email protected]>"]
readme = "README.md"

[tool.poetry.scripts]
example-executable = 'bin.example_executable:main'

[tool.poetry.dependencies]
python = "^3.11"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

At this point I have become a bit lost and I am hoping someone may be able to clear up the situation so that I can understand how I should be approaching this problem.

If the question isn’t completely clear please leave a comment and I will try to provide further clarification.

2 Answers


  1. Chosen as BEST ANSWER

    I think it may be simpler than I originally thought.

    I think Poetry is only needed outside of the Docker container; inside the container there is no use for it.

    This is what I have come up with.

    Firstly, the sequence of commands which must be run:

    # build the .whl using Poetry
    poetry build
    
    # build the Docker image from the Dockerfile
    docker build -t example-container .
    

    Here's the updated Dockerfile:

    from python:3.12-bookworm
    workdir /opt/docker-python-poetry-example
    
    # copy the whl, and install it system-wide, inside the Docker container
    copy dist/*.whl /opt/docker-python-poetry-example
    run pip install *.whl
    run rm /opt/docker-python-poetry-example/*.whl
    
    # copy the executable "binaries" (Python scripts), add this dir to `PATH`
    copy bin /opt/docker-python-poetry-example/bin
    run chmod +x /opt/docker-python-poetry-example/bin/*
    env PATH="/opt/docker-python-poetry-example/bin:${PATH}"
    
    # entry point
    cmd example_executable.py
    

    It seems to work.
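    To try it out (assuming the image tag used above, and that example_executable.py writes its output to stdout):

    $ docker run --rm example-container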

    Here's why I think this is a sensible approach:

    • We want to distribute some Python code (libraries)
    • Normally this would be done by building a wheel, which can be uploaded to PyPI for others to use, or just copied to a target machine to be installed with pip
    • In this case, we bundle the wheel and executable Python "binaries" together inside a Docker container image
    • The Docker image is easy to distribute, start and stop. It is a self contained "thing" or "package"

    What I think the shortcomings are:

    • The way library code is distributed and the way binaries are distributed are different
    • There should ideally be a unified approach where both code elements are treated the same way
    • Libraries are distributed via a wheel
    • Binary code is just copied over

    I personally think the requirement of treating libraries and binaries differently is a shortcoming.

    • The manual step of updating the PATH is perhaps also a shortcoming. It might be better to install the binaries somewhere else?

  2. "how to create a Docker image and how to install a whl file into it."

    You install, build, and do everything inside the Docker image. You only copy the source code, from a fresh, unmodified repository without any new files. The idea is that the build is self-contained, reproducible, and other such buzzwords.

    FROM python
    COPY . .
    RUN pip install -e .
    

    Done. Usually you install requirements.txt first to cache the dependencies in Docker layers for faster builds, roughly as sketched below.
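    Something like this, assuming the project keeps its dependencies in a requirements.txt:

    FROM python
    WORKDIR /app
    # copy only the dependency list first, so this layer stays cached
    # until requirements.txt actually changes
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    # now copy the rest of the source; editing code no longer invalidates
    # the dependency layer above
    COPY . .
    RUN pip install -e .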

    I have no idea about Poetry; pip is the standard tool that I use. For Poetry, an editable install is, according to the docs, done with poetry add --editable ..
