Current Problem
I am trying to productionize a Python application. This is not easy, because Python does not provide a way to create a single compiled binary from the source code. Furthermore, Python does not provide a natural way of creating shared libraries. I will explain this in further detail in the following section.
Python and Shared Libraries
Most languages offer some relatively convenient way of productionizing code. This usually involves:
- Creating some shared libraries containing code which is common to a project
- Creating executable files which use the shared libraries (and maybe other libraries)
Examples:
- Java has a way to bundle code into a .jar file, and there are tools like Maven and Gradle to assist with the process of bundling and deployment.
- With languages like C and C++, one typically builds binary libraries and binary executable files, which can then be deployed. An executable can also be statically linked so that it contains all the shared library code.
- Rust has Cargo, which typically builds individual binary executables containing all relevant code.
Python works a bit differently: by default, the interpreter only searches the directory of the script being run (or the current working directory in an interactive session), any paths listed in the PYTHONPATH environment variable, and the standard library and site-packages of the active environment.
- Modifying the PYTHONPATH environment variable isn’t generally advisable, as it doesn’t scale well and adds the ongoing overhead of maintaining its value.
This means that either:
- Python executables must each be a single file, with no shared code between executables,
- or any shared code has to go in a library placed in the same directory as the executables themselves.
This suggests a project structure like this:
my-project/
    bin/
        executable1.py
        executable2.py
        ...
        lib1/
            __init__.py
            ...
        lib2/
            __init__.py
            ...
Clearly we don’t build projects like this. Putting all the shared library code in the same directory as executable files is already weird enough. The structure also breaks if we want to have shared library code and multiple directories under bin for different "groups" of executables. For example, it is not possible to create two subdirectories under bin called group1 and group2 and share common Python code in a shared library used by both groups.
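For instance, with a layout like the following (names are illustrative), running python bin/group1/executable1.py puts only bin/group1/ (the script's own directory) on the search path, so import lib1 fails with ModuleNotFoundError in both groups:
my-project/
    bin/
        group1/
            executable1.py
        group2/
            executable2.py
        lib1/
            __init__.py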
This is why I say Python does not support shared libraries. You can build a shared library, but you need to use some other tool to do it. (Unless you resort to modifying PYTHONPATH. Let’s assume we don’t want to do that.)
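One way to see what the interpreter will actually search is to print sys.path (a minimal sketch; the file name is made up):
# check_path.py
import sys

# The entries are searched in order: the running script's own directory
# (or the current directory in an interactive session), anything listed in
# PYTHONPATH, then the standard library and the environment's site-packages.
for entry in sys.path:
    print(entry)
Nothing outside those entries is importable, which is why a sibling library directory stays invisible unless something (PYTHONPATH, an install step, or a virtual environment) puts its location on that list.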
Solutions to the Python Shared Library Problem
In reality, we would use a tool like virtualenv (venv) or Poetry (which essentially manages virtualenvs) to let us keep common library code in some other directory and still enable the Python interpreter to find it.
My current workflow situation
Until now, I have been using venv in interactive mode to develop Python software.
This means that I have a project structure like this
my-project/
    .venv/...
    bin/
        ...
    src/
        lib1/...
        lib2/...
    pyproject.toml
and I have been using the venv in interactive mode:
$ source .venv/bin/activate
$ pip3 install -e .
This is great for development, because if the library code under src is changed, those changes show up "live". (Meaning there is no bundle or install step required: just run the executable and the "current" code is in use.)
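As a quick check (assuming lib1 is one of the packages under src/, and with the venv activated), the editable install resolves imports straight back into the working tree rather than to a copy inside site-packages:
$ python3 -c "import lib1; print(lib1.__file__)"
The printed path should end somewhere under my-project/src/lib1/, which is exactly why edits to the source show up without any reinstall.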
It isn’t so good for deployment.
Why Docker?
My first approach to trying to "install" a production ready version of the code was to do the following:
- Use systemd to manage processes (starting and stopping); a sketch of such a unit file follows this list
- Copy the executable code from the bin folder to some "deployment" location on the system (for example /opt/my-project/bin/)
- Build a wheel (.whl) file
- Install the whl system-wide
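For the systemd part, the unit file would have looked something like this (every path and name here is illustrative, and in practice ExecStart would need to point at whatever interpreter has the dependencies installed):
# /etc/systemd/system/executable1.service
[Unit]
Description=my-project executable1
After=network.target

[Service]
ExecStart=/usr/bin/python3 /opt/my-project/bin/executable1.py
Restart=on-failure

[Install]
WantedBy=multi-user.target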
I did not get as far as the final two steps, as I realized this was probably not a good approach. There are some problems with this:
- The point of using a virtual environment was to avoid having to install packages with pip system-wide. It therefore doesn’t make a lot of sense to create a whl and install that system-wide.
- I also do not know how I would use a virtual environment in this context. I described copying the executable Python files to some directory like /opt, which is (obviously) outside of the "development" directory which contains the .venv.
- This whole idea doesn’t seem to make a lot of sense.
This leads me to believe that using Docker results in a much more sensible approach: we can install a whl system-wide inside a Docker container.
- On reflection, I actually don’t think there is a sensible way to productionize and deploy Python code without using Docker. If anyone has any thoughts on that it would be great to get some feedback.
The Docker Approach and Current Issues
I don’t currently understand how to create a Docker image and how to install a whl file into it.
Here’s how I think the process should work:
- The first step: somehow (I do not know exactly how) the shared library code should be built into a whl; a sketch of how this might be done follows this list
- The Dockerfile should copy the executable Python files and whl file into a container image
- The Dockerfile should install the whl and then delete it
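A sketch of how that first step might look from the host, assuming the wheel is built with Poetry (which writes its artifacts into dist/, matching the COPY dist/*.whl line below) and an arbitrary image tag:
$ poetry build                    # produces a .whl (and an sdist) under dist/
$ docker build -t my-project .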
The reason I am confused is that I am using Poetry to manage the virtual envs for the development process (these are not needed in the production Docker image), and I do not understand how to separate the virtual-env development process from the Docker container production process, if that makes any sense.
Here’s what I have in a Dockerfile right now:
FROM python:3.12-bookworm
RUN pip install poetry # do I need this?
WORKDIR /opt/docker-python-poetry-example
COPY dist/*.whl /opt/docker-python-poetry-example
RUN pip install *.whl
RUN rm /opt/docker-python-poetry-example/*.whl
COPY bin /opt/docker-python-poetry-example/bin
RUN chmod +x /opt/docker-python-poetry-example/bin/*
COPY pyproject.toml poetry.lock /opt/docker-python-poetry-example/
ENV PATH="/opt/docker-python-poetry-example/bin:${PATH}"
RUN poetry config virtualenvs.create false
RUN poetry install --no-dev --no-interaction --no-ansi
CMD poetry run example_executable.py
Here’s what I have in my pyproject.toml:
[tool.poetry]
name = "docker-python-poetry-example"
version = "0.1.0"
description = ""
authors = ["Example <[email protected]>"]
readme = "README.md"
[tool.poetry.scripts]
example-executable = 'bin.example_executable:main'
[tool.poetry.dependencies]
python = "^3.11"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
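For reference, for the example-executable entry above to resolve, bin.example_executable has to be an importable module exposing a main callable; bin/ would typically need an __init__.py, and the file would look roughly like this (the body is just a placeholder):
# bin/example_executable.py
def main():
    # real work would go here
    print("hello from example-executable")

if __name__ == "__main__":
    main()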
At this point I have become a bit lost and I am hoping someone may be able to clear up the situation so that I can understand how I should be approaching this problem.
If the question isn’t completely clear please leave a comment and I will try to provide further clarification.
2 Answers
I think it may be simpler than I originally thought.
I think Poetry is only needed outside of the Docker container; inside the container, there is no use for it.
This is what I have come up with.
Firstly, the sequence of commands which must be run:
Here's the updated Dockerfile. It seems to work.
Here's why I think this is a sensible approach:
What I think the shortcomings are:
I personally think the requirement of treating libraries and binaries differently is a shortcoming. PATH is perhaps also a shortcoming; it might be better to install the binaries somewhere else?

You install, build, do everything inside the docker image. You only copy the source code, from a fresh, unmodified repository without any new files. The idea is that it is self-contained, reproducible, and other such buzzwords.
Done. Usually you install requirements.txt first to cache dependencies in Docker layers for faster builds.
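A minimal sketch of that layering pattern (the base image, paths and entry point are illustrative):
FROM python:3.12-slim
WORKDIR /app

# Copy only the dependency list first, so this layer stays cached
# until requirements.txt itself changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copying the source last means code changes do not invalidate the
# dependency-install layer above
COPY . .

CMD ["python", "main.py"]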
I have no idea about Poetry; pip is the standard tool that I use. For Poetry, an editable install is, according to the docs, done with poetry add --editable . (the trailing dot being the path to the project).