I have to dockerize an existing project that is built with setuptools from a setup.py file instead of being installed from a requirements.txt.

The build includes large binary downloads (PyTorch, faster-whisper), followed by an initial download of the corresponding models when the package first runs. Altogether that is about 10 GB.

Problem

To get a correct build I need to COPY the source files before the installation, which invalidates the cache for every later layer each time I change a source file.

If I copy only setup.py before the installation, the package itself will be missing from the result, as described in detail in another question’s answer.

Dockerfile example

FROM python:3.11-slim

WORKDIR /app

RUN apt update && \
    apt install -y --no-install-recommends git ffmpeg curl

COPY setup.py /app

# this is the problem:
# if I move this line behind the next line,
# the build will result in an incomplete package
# but if I keep it here, all the following
# layers will not be cached and the 
# downloads will run again
COPY mypackage /app/mypackage

# runs setuptools and installs deps,
# including 2.2GB pytorch 
RUN pip install ./ --extra-index-url https://download.pytorch.org/whl/cu118

# downloads ~8GB of models
RUN ["mypackage", "init"]

# I would love to move COPY of the project
# files to this position

CMD ["mypackage", "start"]

Content of the setup.py file

from setuptools import setup, find_packages
from distutils.util import convert_path
import platform

system = platform.system()
if system in ["Windows","Linux"]:
    torch = "torch==2.0.0+cu118"
if system == "Darwin":
    torch = "torch==2.0.0"

main_ns = {}
ver_path = convert_path('mypackage/version.py')
with open(ver_path) as ver_file:
    exec(ver_file.read(), main_ns)

setup(
    name='aTrain',
    version=main_ns['__version__'],
    readme="README.md",
    license="LICENSE",
    python_requires=">=3.10",
    install_requires=[
        torch,
        "torchaudio==2.0.1",
        "faster-whisper>=0.8",
        "transformers",
        "ffmpeg-python>=0.2",
        "pandas",
        "pyannote.audio==3.0.0",
        "Flask==2.3.2",
        "pywebview==4.2.2",
        "flaskwebgui",
        "screeninfo==0.8.1",
        "wakepy==0.7.2",
        "show-in-file-manager==1.1.4"
    ],
    packages=find_packages(),
    include_package_data=True,
    entry_points={
        'console_scripts': ['mypackage = mypackage:cli',]
    }
)

I am still new to all this and I wonder what options I have to avoid downloading all the

2 Answers


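    1. You could use another base image that already ships the packages you need, for example the official PyTorch image (https://hub.docker.com/r/pytorch/pytorch). Pick a tag whose torch version matches the one pinned in setup.py; otherwise pip will still download its own copy.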
    # dockerfile
    FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime
    
    WORKDIR /app
    [...]
    
    2. You could install the install_requires dependencies first, in a layer of their own.
      The Docker documentation describes how layer caching works (https://docs.docker.com/build/cache/).
    # dockerfile
    FROM python:3.11-slim
    
    WORKDIR /app
    
    RUN apt update && \
        apt install -y --no-install-recommends git ffmpeg curl
    
    # "./" works as the destination here since WORKDIR is already /app
    COPY requirements.txt setup.py ./
    
    # install deps - this is the task which takes a long time
    RUN pip install -r requirements.txt
    
    COPY mypackage ./mypackage
    
    # runs setuptools; any dependency already installed
    # in a matching version is not downloaded again
    RUN pip install ./ --extra-index-url https://download.pytorch.org/whl/cu118
    
    # downloads ~8GB of models; this layer is still rebuilt
    # whenever the files under mypackage change
    RUN ["mypackage", "init"]
    
    CMD ["mypackage", "start"]
    
    # requirements.txt
    --extra-index-url https://download.pytorch.org/whl/cu118
    torch==2.0.0+cu118
    torchaudio==2.0.1
    faster-whisper>=0.8
    transformers
    ffmpeg-python>=0.2
    pandas
    pyannote.audio==3.0.0
    Flask==2.3.2
    pywebview==4.2.2
    flaskwebgui
    screeninfo==0.8.1
    wakepy==0.7.2
    show-in-file-manager==1.1.4
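
A hand-written requirements.txt like the one above can drift away from setup.py. One way to check it is to read the install_requires list out of setup.py without executing it, so the check also works when mypackage/version.py is not present. The following is a minimal sketch and not part of the original answer: the function name read_install_requires and the inline EXAMPLE source are hypothetical, and only the standard ast module is used. Entries that are not plain string literals (such as the torch variable in the question's setup.py) are reported separately so they can be added by hand.

```python
import ast


def read_install_requires(setup_source: str):
    """Return (literal_requirements, unresolved_entries) found in a setup() call.

    The source is only parsed, never executed.
    """
    tree = ast.parse(setup_source)
    literals, unresolved = [], []
    for node in ast.walk(tree):
        # Only look at direct calls of the form setup(...)
        if not isinstance(node, ast.Call):
            continue
        if getattr(node.func, "id", None) != "setup":
            continue
        for kw in node.keywords:
            if kw.arg != "install_requires" or not isinstance(kw.value, ast.List):
                continue
            for elt in kw.value.elts:
                if isinstance(elt, ast.Constant) and isinstance(elt.value, str):
                    literals.append(elt.value)
                else:
                    # e.g. a variable such as `torch`; needs manual handling
                    unresolved.append(ast.unparse(elt))
    return literals, unresolved


# Hypothetical, shortened stand-in for the question's setup.py
EXAMPLE = '''
from setuptools import setup
torch = "torch==2.0.0+cu118"
setup(
    name="aTrain",
    install_requires=[torch, "torchaudio==2.0.1", "pandas"],
)
'''

literals, unresolved = read_install_requires(EXAMPLE)
print("\n".join(literals))
print("needs manual handling:", unresolved)
```

The literal entries can be written to requirements.txt as they are; the unresolved ones have to be pinned by hand, as with the torch line.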
    
  1. It’s easy enough to use pip freeze to create a requirements file. Outside of Docker, create a virtual environment, install your application into it, run pip freeze, and commit the resulting file to source control.

    python -m venv ./venv
    . ./venv/bin/activate
    pip install . --extra-index-url https://download.pytorch.org/whl/cu118
    pip freeze > requirements.txt
    git add requirements.txt
    git commit -m 'create lock file'
    

    The difference between setup.py and requirements.txt here is that requirements.txt will always contain an exact version of every package your application uses, directly or indirectly. Your setup.py has a couple of version ranges and a couple of packages with no version constraints.
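
To make the distinction concrete, here is a small sketch that flags which entries of the question's install_requires list are not exact pins. It is an illustration only: the helper name unpinned and the regular expression are assumptions of this sketch, and the pattern is a simplification that does not cover the full requirement-specifier syntax (environment markers, URLs and so on).

```python
import re

# Package name, optional extras, then exactly one "==" clause without wildcards.
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s,;*]+$"
)


def unpinned(requirements):
    """Return the requirement strings that are not pinned to one exact version."""
    return [r for r in requirements if not _PINNED.match(r.strip())]


# install_requires from the question's setup.py (Linux/Windows torch variant)
INSTALL_REQUIRES = [
    "torch==2.0.0+cu118",
    "torchaudio==2.0.1",
    "faster-whisper>=0.8",
    "transformers",
    "ffmpeg-python>=0.2",
    "pandas",
    "pyannote.audio==3.0.0",
    "Flask==2.3.2",
    "pywebview==4.2.2",
    "flaskwebgui",
    "screeninfo==0.8.1",
    "wakepy==0.7.2",
    "show-in-file-manager==1.1.4",
]

loose = unpinned(INSTALL_REQUIRES)
print(loose)
```

The two ranges and the three unconstrained names are the entries that may resolve to different versions from one build to the next; pip freeze replaces them with whatever was actually installed.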

    In the Dockerfile, then, you can first install the requirements file, and then install the rest of the application package. pip install has a --no-deps option to avoid installing dependencies, which makes sense in this specific case, since you’ve already done that step.

    WORKDIR /app
    
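    # Install package dependencies; the extra index is needed
    # if the frozen file pins a +cu118 build of torch
    COPY requirements.txt ./
    RUN pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118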
    
    # Copy in the rest of the application
    COPY setup.py ./
    COPY mypackage/ ./mypackage/
    RUN pip install --no-deps .
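
Why this ordering helps can be shown with a toy model of the build cache. This is a deliberately simplified sketch and not Docker's actual implementation: it assumes each layer's cache key depends only on the parent layer's key, the instruction text, and (for COPY) the content of the copied files. The names layer_keys and STEPS and the file contents are hypothetical.

```python
import hashlib


def layer_keys(steps, file_contents):
    """Toy cache model: chain parent key, instruction text and copied content."""
    keys, parent = [], ""
    for instruction, copied_paths in steps:
        h = hashlib.sha256(parent.encode())
        h.update(instruction.encode())
        for path in copied_paths:
            h.update(file_contents[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


# The instructions of the Dockerfile above, with the paths each COPY reads
STEPS = [
    ("COPY requirements.txt ./", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY setup.py ./", ["setup.py"]),
    ("COPY mypackage/ ./mypackage/", ["mypackage"]),
    ("RUN pip install --no-deps .", []),
]

before = {"requirements.txt": "torch==2.0.0", "setup.py": "setup()", "mypackage": "v1"}
after = dict(before, mypackage="v2")  # only the application code changed

# True where a layer from the first build could be reused in the second
reused = [a == b for a, b in zip(layer_keys(STEPS, before), layer_keys(STEPS, after))]
print(reused)
```

In this model a change under mypackage leaves the first three layers reusable, including the expensive dependency installation, and only the final copy and the quick --no-deps install are rebuilt.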
    