I have to dockerize an existing project that uses setuptools for building from a setup.py file, instead of requirements.txt.
This build includes large binary downloads (pytorch, faster-whisper) and, after the build, an initial download of the corresponding models at runtime. Altogether ~10GB.
Problem
In order to get the build done correctly, I need to COPY the files before the installation, which results in a rebuild every time I change a file of the source code.
If I only copy the setup.py for installation, the package will be missing, as described in detail in another question’s answer.
Dockerfile example
FROM python:3.11-slim
WORKDIR /app
RUN apt update && \
    apt install -y --no-install-recommends git ffmpeg curl
COPY setup.py /app
# this is the problem:
# if I move this line behind the next line,
# the build will result in an incomplete package
# but if I keep it here, all the following
# layers will not be cached and the
# downloads will run again
COPY mypackage /app/mypackage
# runs setuptools and installs deps,
# including 2.2GB pytorch
RUN pip install ./ --extra-index-url https://download.pytorch.org/whl/cu118
# downloads ~8GB of models
RUN ["mypackage", "init"]
# I would love to move COPY of the project
# files to this position
CMD ["mypackage", "start"]
Content of the setup.py file
from setuptools import setup, find_packages
from distutils.util import convert_path
import platform
system = platform.system()
if system in ["Windows", "Linux"]:
    torch = "torch==2.0.0+cu118"
if system == "Darwin":
    torch = "torch==2.0.0"
main_ns = {}
ver_path = convert_path('mypackage/version.py')
with open(ver_path) as ver_file:
    exec(ver_file.read(), main_ns)
setup(
    name='aTrain',
    version=main_ns['__version__'],
    readme="README.md",
    license="LICENSE",
    python_requires=">=3.10",
    install_requires=[
        torch,
        "torchaudio==2.0.1",
        "faster-whisper>=0.8",
        "transformers",
        "ffmpeg-python>=0.2",
        "pandas",
        "pyannote.audio==3.0.0",
        "Flask==2.3.2",
        "pywebview==4.2.2",
        "flaskwebgui",
        "screeninfo==0.8.1",
        "wakepy==0.7.2",
        "show-in-file-manager==1.1.4"
    ],
    packages=find_packages(),
    include_package_data=True,
    entry_points={
        'console_scripts': ['mypackage = mypackage:cli',]
    }
)
I am still new to all this and I wonder what options I have to avoid downloading all the
2 Answers
Here is a description of how layers work: https://docs.docker.com/build/cache/
It’s easy enough to use pip freeze to create a requirements file. Outside of Docker, create a virtual environment, install your application into it, run pip freeze, and commit the resulting file to source control.
The difference between setup.py and requirements.txt here is that requirements.txt will always contain an exact version of every package your application uses, directly or indirectly. Your setup.py has a couple of version ranges and a couple of packages with no version constraints.
In the Dockerfile, then, you can first install the requirements file, and then install the rest of the application package. pip install has a --no-deps option to avoid installing dependencies, which makes sense in this specific case, since you’ve already done that step.
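The approach described above could look roughly like the following Dockerfile. This is only a sketch under a few assumptions that are not in the question: a requirements.txt exists next to setup.py, it was generated with pip freeze on a Linux machine (so that the +cu118 torch pin matches), and the line that pip freeze typically emits for the project itself was removed from it by hand.

```dockerfile
# Sketch only. Assumes requirements.txt was created beforehand on the host, e.g.:
#   python -m venv venv && . venv/bin/activate
#   pip install ./ --extra-index-url https://download.pytorch.org/whl/cu118
#   pip freeze > requirements.txt
# and that the entry for the project itself was deleted from the file.
FROM python:3.11-slim
WORKDIR /app
RUN apt update && \
    apt install -y --no-install-recommends git ffmpeg curl

# Dependency layer: rebuilt only when requirements.txt changes,
# so the 2.2GB pytorch download stays cached across source edits.
COPY requirements.txt /app/
RUN pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118

# Application layer: source changes invalidate the cache from here on.
COPY setup.py /app/
COPY mypackage /app/mypackage
# --no-deps skips dependency resolution, since everything is already installed.
RUN pip install --no-deps ./

# Still runs after the COPY of the source, so this step is NOT cached
# when the source changes.
RUN ["mypackage", "init"]
CMD ["mypackage", "start"]
```

Note that this only keeps the pip dependencies cached. The model download in the "mypackage init" step needs the installed package, so it still sits below the COPY of the source and will run again whenever the source changes.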