Dockerfile build with setuptools - how to avoid full rebuild when files change

Jankapunkt
January 24, 2024
2 Answers

I have to dockerize an existing project that is built with setuptools from a setup.py file instead of a requirements.txt.

The build includes large binary downloads (PyTorch, faster-whisper), and after the build an initial run downloads the corresponding models. Altogether this is about 10 GB.

Problem

To get a working build, I have to COPY the source files before the installation step, which invalidates the cache and triggers a full rebuild every time I change any source file.

If I copy only setup.py before the installation, the package itself is missing from the install, as described in detail in another question's answer.

Dockerfile example

FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && \
    apt-get install -y --no-install-recommends git ffmpeg curl && \
    rm -rf /var/lib/apt/lists/*

COPY setup.py /app

# this is the problem:
# if I move this line after the pip install below,
# the build results in an incomplete package,
# but if I keep it here, every source change
# invalidates the cache for all following layers
# and the downloads run again
COPY mypackage /app/mypackage

# runs setuptools and installs deps,
# including 2.2GB pytorch 
RUN pip install ./ --extra-index-url https://download.pytorch.org/whl/cu118

# downloads ~8GB of models
RUN ["mypackage", "init"]

# I would love to move COPY of the project
# files to this position

CMD ["mypackage", "start"]

Content of the `setup.py` file

from setuptools import setup, find_packages
import os
import platform

# CUDA 11.8 build of torch on Windows/Linux, default PyPI build elsewhere
system = platform.system()
if system in ["Windows", "Linux"]:
    torch = "torch==2.0.0+cu118"
else:
    torch = "torch==2.0.0"

# read __version__ from the package; this makes setup.py depend on
# mypackage/version.py being present when pip runs it
main_ns = {}
ver_path = os.path.join('mypackage', 'version.py')
with open(ver_path) as ver_file:
    exec(ver_file.read(), main_ns)

setup(
    name='aTrain',
    version=main_ns['__version__'],
    readme="README.md",
    license="LICENSE",
    python_requires=">=3.10",
    install_requires=[
        torch,
        "torchaudio==2.0.1",
        "faster-whisper>=0.8",
        "transformers",
        "ffmpeg-python>=0.2",
        "pandas",
        "pyannote.audio==3.0.0",
        "Flask==2.3.2",
        "pywebview==4.2.2",
        "flaskwebgui",
        "screeninfo==0.8.1",
        "wakepy==0.7.2",
        "show-in-file-manager==1.1.4"
    ],
    packages=find_packages(),
    include_package_data=True,
    entry_points={
        'console_scripts': ['mypackage = mypackage:cli',]
    }
)

I am still new to all this, and I wonder what options I have to avoid downloading all the dependencies and models again every time the source code changes.

Answers

You could use a different base image that already ships with the packages you need, for example https://hub.docker.com/r/pytorch/pytorch

# dockerfile
FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime

WORKDIR /app
[...]
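If you go the base-image route, it is worth checking whether the preinstalled packages actually satisfy your pins: the question's setup.py requires `torch==2.0.0+cu118`, while this image tag (by its name) ships PyTorch 2.1.2, so pip would still download the pinned build on top of it. Below is a minimal sketch using only the standard library's `importlib.metadata` that lists which exact `name==version` pins the current environment does not already meet; the helper name `unmet_pins` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def unmet_pins(pins, installed_version=version):
    """Return the exact pins (e.g. "torch==2.0.0+cu118") not already installed.

    Only plain "name==version" strings are handled, and versions are compared
    as plain strings (no PEP 440 normalization), which is enough for a quick
    check but not a full resolver.
    """
    unmet = []
    for pin in pins:
        name, sep, wanted = pin.partition("==")
        if not sep:
            raise ValueError(f"not an exact pin: {pin!r}")
        try:
            have = installed_version(name.strip())
        except PackageNotFoundError:
            have = None  # not installed at all
        if have != wanted.strip():
            unmet.append(pin)
    return unmet
```

For example, `unmet_pins(["torch==2.0.0+cu118", "torchaudio==2.0.1"])` run inside the image should report every pin that pip would still have to download.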

Alternatively, you could install the `install_requires` dependencies first, from a requirements.txt, in a layer of their own.
The Docker documentation explains how layer caching works: https://docs.docker.com/build/cache/
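To see why the position of the COPY lines matters, here is a deliberately simplified model of the build cache (an illustration under assumptions, not Docker's actual implementation): each layer's cache key combines its parent's key, the instruction text and, for COPY steps, a hash of the copied files, so a single changed source file changes every key after it. The file name `mypackage/__init__.py` and its contents are hypothetical.

```python
import hashlib


def layer_keys(steps, files):
    """Compute one cache key per build step in a simplified cache model.

    steps: list of (instruction, copied_paths); copied_paths is empty for RUN.
    files: dict mapping path -> file content (bytes) in the build context.
    Every key includes the previous key, so a change propagates downward.
    """
    keys = []
    parent = b""
    for instruction, paths in steps:
        h = hashlib.sha256(parent)
        h.update(instruction.encode())
        for path in sorted(paths):
            h.update(path.encode())
            h.update(hashlib.sha256(files[path]).digest())
        parent = h.digest()
        keys.append(h.hexdigest())
    return keys


def rebuilt_steps(steps, old_files, new_files):
    """Return the instructions whose cache key changed, i.e. that would re-run."""
    old = layer_keys(steps, old_files)
    new = layer_keys(steps, new_files)
    return [step[0] for step, a, b in zip(steps, old, new) if a != b]


steps = [
    ("COPY requirements.txt", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY mypackage", ["mypackage/__init__.py"]),
    ("RUN pip install --no-deps .", []),
]
before = {"requirements.txt": b"torch==2.0.0+cu118\n", "mypackage/__init__.py": b"v1"}
after = {**before, "mypackage/__init__.py": b"v2"}
print(rebuilt_steps(steps, before, after))
# prints ['COPY mypackage', 'RUN pip install --no-deps .']
```

Editing only the package source leaves the expensive dependency layer untouched, whereas editing requirements.txt would re-run all four steps.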

# dockerfile
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && \
    apt-get install -y --no-install-recommends git ffmpeg curl && \
    rm -rf /var/lib/apt/lists/*

# you can use . as the destination since WORKDIR is already /app
COPY requirements.txt setup.py .

# install deps - this is the task which takes a long time
RUN pip install -r requirements.txt

COPY mypackage ./mypackage

# runs setuptools; dependencies already satisfied by the
# layer above (same pinned versions) are not downloaded again
RUN pip install ./ --extra-index-url https://download.pytorch.org/whl/cu118

# downloads ~8GB of models
RUN ["mypackage", "init"]


CMD ["mypackage", "start"]

# requirements.txt
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.0.0+cu118
torchaudio==2.0.1
faster-whisper>=0.8
transformers
ffmpeg-python>=0.2
pandas
pyannote.audio==3.0.0
Flask==2.3.2
pywebview==4.2.2
flaskwebgui
screeninfo==0.8.1
wakepy==0.7.2
show-in-file-manager==1.1.4
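Keeping a hand-written requirements.txt in sync with setup.py is error-prone. One option (a sketch, not a setuptools feature) is to read the `install_requires` list out of setup.py with Python's `ast` module, without executing setup.py. Because the question's list contains the variable `torch` rather than a string literal, the hypothetical helper below takes a `substitutions` mapping for such names.

```python
import ast


def install_requires_from_setup(source, substitutions=None):
    """Return the install_requires entries of a setup.py, without running it.

    String literals are kept as-is; bare names (such as the `torch` variable
    in the question's setup.py) are resolved through `substitutions`.
    Only a direct `setup(...)` call with a literal list/tuple is supported.
    """
    substitutions = substitutions or {}
    for node in ast.walk(ast.parse(source)):
        is_setup_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "setup"
        )
        if not is_setup_call:
            continue
        for kw in node.keywords:
            if kw.arg != "install_requires":
                continue
            if not isinstance(kw.value, (ast.List, ast.Tuple)):
                raise ValueError("install_requires is not a literal list")
            reqs = []
            for elt in kw.value.elts:
                if isinstance(elt, ast.Constant) and isinstance(elt.value, str):
                    reqs.append(elt.value)
                elif isinstance(elt, ast.Name) and elt.id in substitutions:
                    reqs.append(substitutions[elt.id])
                else:
                    raise ValueError(f"cannot resolve entry: {ast.dump(elt)}")
            return reqs
    raise ValueError("no setup(install_requires=[...]) call found")
```

For example, `install_requires_from_setup(open("setup.py").read(), {"torch": "torch==2.0.0+cu118"})` would return the question's dependency list with the Linux torch pin substituted, ready to be written out one entry per line.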

- DavidMaze
- January 24, 2024 at 11:53 am
It’s easy enough to use pip freeze to create a requirements file. Outside of Docker, create a virtual environment, install your application into it, run pip freeze, and commit the resulting file to source control.
```
python -m venv ./venv
. ./venv/bin/activate
pip install .
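# leave out the project itself, which freeze would record as a local file:// requirement
pip freeze --exclude aTrain > requirements.txt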
git add requirements.txt
git commit -m 'create lock file'
```
The difference between setup.py and requirements.txt here is that requirements.txt will always contain an exact version of every package your application uses, directly or indirectly. Your setup.py has a couple of version ranges and a couple of packages with no version constraints.
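This distinction is easy to check mechanically. A minimal stdlib-only sketch (the helper name `loose_requirements` is hypothetical) that lists requirement lines not pinned to an exact version with `==`; applied to the question's dependencies it would flag the range entries and the ones without any version constraint.

```python
import re


def loose_requirements(lines):
    """Return requirement lines that are not pinned with '=='.

    Comments ('#' at the start or after whitespace), blank lines and pip
    options (lines starting with '-') are skipped; anything after ';'
    (an environment marker) is ignored for the check.
    """
    loose = []
    for raw in lines:
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line or line.startswith("-"):
            continue
        spec = line.split(";", 1)[0]
        if "==" not in spec:
            loose.append(line)
    return loose
```

Passing an open file works too, e.g. `loose_requirements(open("requirements.txt"))`; a file produced by `pip freeze` should come back empty.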

In the Dockerfile, then, you can first install the requirements file, and then install the rest of the application package. pip install has a --no-deps option to avoid installing dependencies, which makes sense in this specific case, since you’ve already done that step.
```
WORKDIR /app

# Install package dependencies
COPY requirements.txt ./
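# the frozen torch==2.0.0+cu118 build is served from the PyTorch index
RUN pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118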

# Copy in the rest of the application
COPY setup.py ./
COPY mypackage/ ./mypackage/
RUN pip install --no-deps .
```
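Because `pip install --no-deps .` skips dependency resolution, a dependency added to setup.py but not yet to requirements.txt would be silently left out. pip's own `pip check` command reports missing or broken requirements of installed packages, so a `RUN pip check` line after the install is one way to catch this. The sketch below only illustrates the idea with the standard library's `importlib.metadata`; the helper name is hypothetical, and it checks names only, not version constraints.

```python
import re
from importlib.metadata import PackageNotFoundError, requires, version

# leading project name of a requirement string such as "torch==2.0.0+cu118"
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def missing_dependencies(dist_name, get_requires=requires, get_version=version):
    """List unconditional dependencies of `dist_name` that are not installed.

    Requirements carrying an environment marker (anything after ';') are
    skipped and version constraints are ignored, to keep the sketch short.
    """
    missing = []
    for req in get_requires(dist_name) or []:
        if ";" in req:
            continue
        match = _NAME.match(req)
        if not match:
            continue
        name = match.group(1)
        try:
            get_version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Inside the image, `missing_dependencies("aTrain")` (the distribution name from the question's setup.py) should return an empty list when requirements.txt is up to date.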