
Packages installed by Poetry significantly increase the image size when the image is built for amd64.

I’m building a Docker image on my host machine (macOS, M2 Pro) that I want to deploy to an EC2 instance. A normal build produces a 2GB image, which is fine, but it causes a platform compatibility issue when deployed on EC2: WARNING: The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested. So I tried a build with the buildx command instead. However, that results in a whopping 13GB image, even though all I changed was the build command. I’d like to know why, and how to reduce the size.

Here is the Dockerfile:

FROM python:3.11-slim

# for -slim version (it breaks if you don't comment out && apt-get clean)
RUN apt-get update && apt-get install -y \
    gfortran \
    libopenblas-dev \
    liblapack-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Set environment variables to make Python and Poetry play nice
ENV POETRY_VERSION=1.7.1 \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    EXPERIMENT_ID=$EXPERIMENT_ID \
    RUN_ID=$RUN_ID


## Install poetry
RUN pip install "poetry==$POETRY_VERSION"

## copy project requirement files here to ensure they will be cached.
WORKDIR /app

COPY pyproject.toml ./

RUN poetry config virtualenvs.create false \
    && poetry install --no-interaction --no-dev --no-ansi --verbose \
    && poetry cache clear pypi --all

With this pyproject.toml, you can reproduce the build.

[tool.poetry]
name = "malicious-url"
version = "0.1.0"
description = ""
authors = ["Makoto1021 <[email protected]>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.11"
numpy = "^1.26.4"
tld = "^0.13"
fuzzywuzzy = "^0.18.0"
scikit-learn = "^1.4.1.post1"
pandas = "^2.2.1"
mlflow = {extras = ["pipelines"], version = "^2.11.3"}
xgboost = "^2.0.3"
python-dotenv = "^1.0.1"
imblearn = "^0.0"
torch = "^2.2.2"
flask = "^3.0.3"
googlesearch-python = "^1.2.3"
whois = "^1.20240129.2"
nltk = "^3.8.1"

[tool.poetry.group.dev.dependencies]
ipykernel = "^6.29.3"
tldextract = "^5.1.2"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

And this command will build a 2GB image:

docker build -f ./docker/Dockerfile \
    -t malicious-url-prediction-img:v1 .

And this one will make a 13GB image:

docker buildx build --platform linux/amd64 -f ./docker/Dockerfile \
    -t malicious-url-prediction-img:v1-amd64 .

The image size stays small if I remove the RUN poetry config virtualenvs.create... instruction, even when I build for amd64, so I assume Poetry is causing the problem. Still, it’s strange to see such a big difference in size from just changing the target platform.

I went inside the huge image with /bin/bash and ran du, and found that the total used space is only around 2MB. This suggests that the large image size is not directly related to the files I’m adding, but rather to the base image and the layers created by package installations and configuration.
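For reference, docker history prints the size of each layer, so comparing the two tags shows which instruction the extra space comes from:

# per-layer sizes of the small native build vs. the big amd64 build
docker history malicious-url-prediction-img:v1
docker history malicious-url-prediction-img:v1-amd64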

I also used dive to investigate the space usage, but it doesn’t really tell me what’s wrong.

I have a feeling that it’s coming from Poetry. Any advice?

FYI, this is how I run the container.

docker run --rm -p 7070:5000 -v $(pwd)/logs:/app/logs malicious-url-prediction-img:v1-amd64

EDITED:

  • changed the Dockerfile to a minimal example
  • added pyproject.toml to reproduce the build
  • added my investigation into Poetry

3 Answers


  1. The problem is builds.

    When you run AMD64, chances are there are already precompiled wheels you can use, so there’s no need to build anything. When you’re using ARM64 there are often far fewer wheels available, requiring you to build the packages yourself.

    Builds take up space: they download libraries, compile artifacts, and link things together.
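    You can check whether a prebuilt wheel exists for a platform without building anything. As a rough illustration (using pip directly rather than Poetry, with xgboost as one example package from your list):

    # succeeds if a prebuilt x86_64 wheel exists; errors out if pip would have to build from source
    pip download xgboost --only-binary=:all: --no-deps \
        --platform manylinux2014_x86_64 --python-version 3.11 -d /tmp/wheels

    # the same check against the arm64 wheel tag
    pip download xgboost --only-binary=:all: --no-deps \
        --platform manylinux2014_aarch64 --python-version 3.11 -d /tmp/wheels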

    What you should do instead is use a multi-stage build: run the installation in one stage of the build process, and then copy the resulting files into a fresh image.

    For example (this isn’t tested, but is based on your file and should work with some tweaks):

    FROM python:3.11-slim AS builder
    
    # for -slim version (it breaks if you don't comment out && apt-get clean)
    RUN apt-get update && apt-get install -y \
        gfortran \
        libopenblas-dev \
        liblapack-dev \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/*
    
    # Set environment variables to make Python and Poetry play nice
    ENV POETRY_VERSION=1.7.1 \
        PYTHONUNBUFFERED=1 \
        PYTHONDONTWRITEBYTECODE=1 \
        EXPERIMENT_ID=$EXPERIMENT_ID \
        RUN_ID=$RUN_ID
    
    ## Install poetry
    RUN pip install "poetry==$POETRY_VERSION"
    
    ## copy project requirement files here to ensure they will be cached.
    WORKDIR /app
    
    COPY poetry.lock pyproject.toml ./
    
    RUN poetry config virtualenvs.create false \
        && poetry install --no-interaction --no-dev --no-ansi --verbose \
        && poetry cache clear pypi --all
    
    
    # Build our actual container now.
    FROM python:3.11-slim
    
    ENV POETRY_VERSION=1.7.1 \
        PYTHONUNBUFFERED=1 \
        PYTHONDONTWRITEBYTECODE=1 \
        EXPERIMENT_ID=$EXPERIMENT_ID \
        RUN_ID=$RUN_ID
    
    WORKDIR /app
    
    # Copy the packages installed in the builder stage, plus the
    # console-script entry points (poetry, flask, ...) from /usr/local/bin.
    COPY --from=builder /usr/local/lib/python3.11 /usr/local/lib/python3.11
    COPY --from=builder /usr/local/bin /usr/local/bin
    
    # Copy the model directory to the Docker image
    COPY mlartifacts/542535033067401306/2ba405d6d3144db6a5cd237509e53ef8 /app/mlartifacts/542535033067401306/2ba405d6d3144db6a5cd237509e53ef8/
    
    # Copy the blacklist and whitelist data
    COPY data/blacklist.txt data/whitelist.txt /app/data/
    
    ## Copy Python files
    COPY *.py ./
    COPY utils/ ./utils/
    COPY .env .
    
    EXPOSE 7070
    
    CMD ["poetry", "run", "flask", "run", "--host=0.0.0.0"]
    

    This method is used for all of the Multi-Py projects (disclaimer: I’m the author of these), so you can look there for examples.

  2. Some ISAs may produce smaller or larger builds, but the amount of increase in your case isn’t normal. Here I propose a small refactoring of your Dockerfile.

    I use this trick all the time, but I haven’t tested it here.

    You need gfortran and some other packages to build your Python packages. After the build they are no longer necessary, and you can uninstall them:

    RUN apt update && apt install -y gfortran liblapack3 liblapack-dev libopenblas0 libopenblas-dev
    RUN <build your packages>
    RUN apt purge -y gfortran liblapack-dev libopenblas-dev
    RUN apt autoremove -y --purge
    RUN <other cleanups>
    

    But the code above does not reduce your image size, since the build dependencies still exist in the earlier layers. You must collapse it into a single RUN instruction:

    RUN apt update && apt install -y gfortran liblapack3 liblapack-dev libopenblas0 libopenblas-dev ; \
        <build your packages> ; \
        apt purge -y gfortran liblapack-dev libopenblas-dev ; \
        apt autoremove -y --purge
    

    This will decrease your image size considerably.

    apt, pip, and similar tools download files into cache directories (although apt deletes downloaded .deb files by default). You can add --mount=type=cache to your RUN instructions to keep these caches in separate cache volumes instead of in your image layers.

    So here is your refactored Dockerfile (I added the liblapack3 and libopenblas0 runtime packages explicitly so they remain after apt autoremove):

    FROM python:3.11-slim
    
    ENV POETRY_VERSION=1.7.1 \
        PYTHONUNBUFFERED=1 \
        PYTHONDONTWRITEBYTECODE=1 \
        EXPERIMENT_ID=$EXPERIMENT_ID \
        RUN_ID=$RUN_ID
    
    WORKDIR /app
    COPY poetry.lock pyproject.toml ./
    
    RUN --mount=type=cache,target=/var/cache/apt \
        --mount=type=cache,target=/root/.cache/pip \
        --mount=type=cache,target=/root/.cache/pypoetry/cache \
        set -eux ; \
        apt update && apt install -y gfortran libopenblas0 libopenblas-dev liblapack3 liblapack-dev ; \
        pip install "poetry==$POETRY_VERSION" ; \
        poetry config virtualenvs.create false ; \
        poetry install --no-interaction --no-dev --no-ansi --verbose ; \
        poetry cache clear pypi --all ; \
        apt purge -y gfortran libopenblas-dev liblapack-dev ; \
        apt autoremove -y --purge ; \
        rm -rf /var/lib/apt/lists/*
    
    COPY mlartifacts/542535033067401306/2ba405d6d3144db6a5cd237509e53ef8 /app/mlartifacts/542535033067401306/2ba405d6d3144db6a5cd237509e53ef8/
    COPY data/blacklist.txt data/whitelist.txt /app/data/
    COPY *.py ./
    COPY utils/ ./utils/
    COPY .env .
    
    EXPOSE 7070
    
    CMD ["poetry", "run", "flask", "run", "--host=0.0.0.0"]
    
  3. the difference between the arm64 and amd64 image size comes down to a conditional architecture-dependent dependency

    torch on amd64 pulls in a series of nvidia* dependencies to support gpu development — but does not pull those in on other architectures: https://github.com/pytorch/pytorch/blob/7a6edb0b6644eb2b28650ea3be1c806e4a57e351/.github/workflows/generated-linux-binary-manywheel-main.yml#L51

    nvidia-cuda-nvrtc-cu11==11.8.89; platform_system == 'Linux' and platform_machine == 'x86_64' ...
    

    you can see this reflected in the wheel metadata from pypi:

    $ curl --silent https://files.pythonhosted.org/packages/c3/33/d7a6123231bd4d04c7005dde8507235772f3bc4622a25f3a88c016415d49/torch-2.2.2-cp311-cp311-manylinux1_x86_64.whl.metadata | grep platform_system
    Requires-Dist: nvidia-cuda-nvrtc-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cuda-runtime-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cuda-cupti-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cudnn-cu12 (==8.9.2.26) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cublas-cu12 (==12.1.3.1) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cufft-cu12 (==11.0.2.54) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-curand-cu12 (==10.3.2.106) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cusolver-cu12 (==11.4.5.107) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cusparse-cu12 (==12.1.0.106) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-nccl-cu12 (==2.19.3) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-nvtx-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: triton (==2.2.0) ; platform_system == "Linux" and platform_machine == "x86_64" and python_version < "3.12"
    

    and in the aarch64 wheel as well:

    $ curl --silent https://files.pythonhosted.org/packages/cd/fd/2121f53629c433589273a2e8f71d29705e98024e0abe2360e63b852e80bb/torch-2.2.1-cp311-cp311-manylinux2014_aarch64.whl.metadata | grep platform_system
    Requires-Dist: nvidia-cuda-nvrtc-cu12 ==12.1.105 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cuda-runtime-cu12 ==12.1.105 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cuda-cupti-cu12 ==12.1.105 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cudnn-cu12 ==8.9.2.26 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cublas-cu12 ==12.1.3.1 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cufft-cu12 ==11.0.2.54 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-curand-cu12 ==10.3.2.106 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cusolver-cu12 ==11.4.5.107 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-cusparse-cu12 ==12.1.0.106 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-nccl-cu12 ==2.19.3 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: nvidia-nvtx-cu12 ==12.1.105 ; platform_system == "Linux" and platform_machine == "x86_64"
    Requires-Dist: triton ==2.2.0 ; platform_system == "Linux" and platform_machine == "x86_64" and python_version < "3.12"
    

    these dependencies are not pulled in on arm64 (aarch64):

    # du -hs /usr/local/lib/python3.11/site-packages/{triton,nvidia}
    420M    /usr/local/lib/python3.11/site-packages/triton
    2.8G    /usr/local/lib/python3.11/site-packages/nvidia
    

    (there may be more as well; I only traced torch, whose dependency problem I’m already familiar with, since I couldn’t build your docker image without running out of disk space!)
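    as a possible follow-up (a sketch I haven’t tested against this project): if the EC2 target has no gpu, you can tell poetry to pull torch from pytorch’s CPU-only wheel index, which skips the nvidia/triton payload entirely. roughly, in pyproject.toml:

    # untested sketch: the source name "pytorch-cpu" is arbitrary
    [[tool.poetry.source]]
    name = "pytorch-cpu"
    url = "https://download.pytorch.org/whl/cpu"
    priority = "explicit"

    [tool.poetry.dependencies]
    torch = { version = "^2.2.2", source = "pytorch-cpu" }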
