I am very new to Docker but enjoying the study. Recently I tried to build an image for a simple ML app that I had built earlier. I used the python:3.11-slim base image and installed a few dependencies. After the final build, the image size turned out to be 1.13 GB. How is this happening?
Following is my Dockerfile:
FROM python:3.11-slim
EXPOSE 8080
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR /app
COPY . .
ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8080"]
=================================================================
Following is my requirements.txt:
joblib==1.4.2
numpy==1.26.4
scikit-learn==1.5.0
scipy==1.13.1
streamlit==1.35.0
xgboost==2.1.0
2 Answers
In general, the packages you install usually pull in further dependencies of their own.
In your case, XGBoost and scikit-learn are heavy packages that most likely brought large compiled binaries along with them during installation.
I ran your Dockerfile, and at first glance one layer of the build stood out: the RUN pip install --no-cache-dir -r requirements.txt step.
It appears to be the main source of your large image.
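You can reproduce this per-layer breakdown with docker history, which prints each layer's size next to the instruction that created it. A minimal sketch, assuming the image was tagged ml-app (a hypothetical tag; use whatever you passed to docker build -t). The guard simply skips the command when the CLI or the image is not available:

```shell
# Print each layer's size next to the instruction that created it.
# "ml-app" is a hypothetical tag; replace it with your image tag.
if command -v docker >/dev/null 2>&1 && docker image inspect ml-app >/dev/null 2>&1; then
    docker history --format '{{.Size}}\t{{.CreatedBy}}' ml-app
else
    echo "docker CLI or ml-app image not available; skipping"
fi
```

The pip install layer will dominate the listing. Note that --no-cache-dir only stops pip from caching downloaded wheels; it does nothing about the size of the installed packages themselves.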
Looking a bit closer at the top three largest pip packages in the built image: some of them are not even listed in your requirements.txt directly, but are sub-dependencies pulled in by the packages you did list (streamlit, for example, pulls in the sizeable pyarrow).
You can check all the dependencies and their sizes by saving the output to a separate buildinfo file:
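One way to produce such a buildinfo file is to measure the on-disk size of every entry in site-packages. A sketch, assuming a POSIX shell with GNU du and sort available; buildinfo.txt is just an illustrative filename. Run it inside the container (for example via docker run --rm --entrypoint sh on your image) or in any environment where the requirements are installed:

```shell
# Measure the on-disk size of every top-level entry in site-packages,
# sort the listing largest-first, and save it to buildinfo.txt
# (an illustrative filename).
site_packages=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
du -sh "$site_packages"/* 2>/dev/null | sort -rh > buildinfo.txt
head -n 10 buildinfo.txt   # show the ten largest entries
```

Namespace directories such as xgboost's bundled lib folder show up here too, which is where most of the "hidden" size usually lives.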
Here we can see that just two dependencies take up roughly 400 MB on their own.