My Python-based Docker image is 6.35GB. I tried a multi-stage build and several other options I found while searching (such as cache cleanup), but nothing helped.
I might be missing something really important.
# Use a smaller base image
FROM python:3.11.4-slim as compiler
# Set the working directory in the container
WORKDIR /app
RUN python -m venv /opt/venv
# Enable venv
ENV PATH="/opt/venv/bin:$PATH"
COPY ./requirements.txt /app/requirements.txt
#RUN pip install -Ur requirements.txt
RUN pip install pip==23.1.2 \
    && pip install -r requirements.txt \
    && rm -rf /root/.cache
FROM python:3.11.4-slim as runner
WORKDIR /app/
COPY --from=compiler /opt/venv /opt/venv
# Enable venv
ENV PATH="/opt/venv/bin:$PATH"
COPY . /app/
# Expose the port the app runs on
EXPOSE 8501
# Define environment variables
ENV STREAMLIT_THEME_BASE=dark
ENV STREAMLIT_THEME_SECONDARY_BACKGROUND_COLOR="#3A475C"
ENV STREAMLIT_THEME_BACKGROUND_COLOR="#2d3748"
# Run the Streamlit app
CMD ["streamlit", "run", "welcome.py"]
requirements.txt
ibm-generative-ai>=2.0.0
pydantic>=2.6.1
langchain>=0.1.0
streamlit==1.27.2
ray==2.7.1
chromadb>=0.4.14
python-dotenv==1.0.0
beautifulsoup4==4.12.2
sentence-transformers==2.2.2
ibm-watson>=6.1.0
markdown==3.5.2
ibm-generative-ai[langchain]
The app folder contains multiple .py files that do language model processing.
I used pip list and then checked the size of the individual packages:
1.5G /app/venv/lib/python3.11/site-packages/torch
420M /app/venv/lib/python3.11/site-packages/triton
170M /app/venv/lib/python3.11/site-packages/ray
126M /app/venv/lib/python3.11/site-packages/pyarrow
86M /app/venv/lib/python3.11/site-packages/transformers
79M /app/venv/lib/python3.11/site-packages/pandas
73M /app/venv/lib/python3.11/site-packages/sympy
29M /app/venv/lib/python3.11/site-packages/kubernetes
25M /app/venv/lib/python3.11/site-packages/streamlit
24M /app/venv/lib/python3.11/site-packages/onnxruntime
23M /app/venv/lib/python3.11/site-packages/sqlalchemy
17M /app/venv/lib/python3.11/site-packages/networkx
16M /app/venv/lib/python3.11/site-packages/pip
15M /app/venv/lib/python3.11/site-packages/langchain
14M /app/venv/lib/python3.11/site-packages/torchvision
14M /app/venv/lib/python3.11/site-packages/nltk
14M /app/venv/lib/python3.11/site-packages/altair
12M /app/venv/lib/python3.11/site-packages/uvloop
12M /app/venv/lib/python3.11/site-packages/tokenizers
9.3M /app/venv/lib/python3.11/site-packages/pydeck
9.0M /app/venv/lib/python3.11/site-packages/pygments
6.7M /app/venv/lib/python3.11/site-packages/setuptools
5.9M /app/venv/lib/python3.11/site-packages/aiohttp
5.6M /app/venv/lib/python3.11/site-packages/pydantic_core
5.3M /app/venv/lib/python3.11/site-packages/watchfiles
5.1M /app/venv/lib/python3.11/site-packages/mpmath
5.0M /app/venv/lib/python3.11/site-packages/safetensors
4.3M /app/venv/lib/python3.11/site-packages/tornado
3.7M /app/venv/lib/python3.11/site-packages/chromadb
3.6M /app/venv/lib/python3.11/site-packages/pydantic
3.5M /app/venv/lib/python3.11/site-packages/regex
3.0M /app/venv/lib/python3.11/site-packages/sentencepiece
2.8M /app/venv/lib/python3.11/site-packages/tzdata
2.8M /app/venv/lib/python3.11/site-packages/pytz
2.6M /app/venv/lib/python3.11/site-packages/joblib
2.5M /app/venv/lib/python3.11/site-packages/rich
2.5M /app/venv/lib/python3.11/site-packages/msgpack
2.5M /app/venv/lib/python3.11/site-packages/greenlet
2.4M /app/venv/lib/python3.11/site-packages/bcrypt
1.7M /app/venv/lib/python3.11/site-packages/fsspec
1.5M /app/venv/lib/python3.11/site-packages/oauthlib
1.4M /app/venv/lib/python3.11/site-packages/fastapi
1.3M /app/venv/lib/python3.11/site-packages/jinja2
1.2M /app/venv/lib/python3.11/site-packages/yarl
1.1M /app/venv/lib/python3.11/site-packages/websockets
1.1M /app/venv/lib/python3.11/site-packages/pyasn1
1.1M /app/venv/lib/python3.11/site-packages/jsonschema
1.1M /app/venv/lib/python3.11/site-packages/httptools
1.0M /app/venv/lib/python3.11/site-packages/anyio
1004K /app/venv/lib/python3.11/site-packages/urllib3
932K /app/venv/lib/python3.11/site-packages/frozenlist
860K /app/venv/lib/python3.11/site-packages/click
820K /app/venv/lib/python3.11/site-packages/markdown
804K /app/venv/lib/python3.11/site-packages/httpx
788K /app/venv/lib/python3.11/site-packages/httpcore
780K /app/venv/lib/python3.11/site-packages/starlette
712K /app/venv/lib/python3.11/site-packages/humanfriendly
696K /app/venv/lib/python3.11/site-packages/uvicorn
696K /app/venv/lib/python3.11/site-packages/toolz
680K /app/venv/lib/python3.11/site-packages/langsmith
632K /app/venv/lib/python3.11/site-packages/pypika
628K /app/venv/lib/python3.11/site-packages/watchdog
620K /app/venv/lib/python3.11/site-packages/posthog
568K /app/venv/lib/python3.11/site-packages/tqdm
548K /app/venv/lib/python3.11/site-packages/h11
540K /app/venv/lib/python3.11/site-packages/idna
540K /app/venv/lib/python3.11/site-packages/gitdb
512K /app/venv/lib/python3.11/site-packages/multidict
484K /app/venv/lib/python3.11/site-packages/requests
480K /app/venv/lib/python3.11/site-packages/importlib_resources
468K /app/venv/lib/python3.11/site-packages/marshmallow
444K /app/venv/lib/python3.11/site-packages/typer
412K /app/venv/lib/python3.11/site-packages/packaging
372K /app/venv/lib/python3.11/site-packages/wrapt
348K /app/venv/lib/python3.11/site-packages/referencing
340K /app/venv/lib/python3.11/site-packages/soupsieve
328K /app/venv/lib/python3.11/site-packages/coloredlogs
328K /app/venv/lib/python3.11/site-packages/certifi
284K /app/venv/lib/python3.11/site-packages/flatbuffers
264K /app/venv/lib/python3.11/site-packages/rsa
260K /app/venv/lib/python3.11/site-packages/orjson
248K /app/venv/lib/python3.11/site-packages/validators
192K /app/venv/lib/python3.11/site-packages/smmap
188K /app/venv/lib/python3.11/site-packages/tenacity
188K /app/venv/lib/python3.11/site-packages/build
188K /app/venv/lib/python3.11/site-packages/asgiref
156K /app/venv/lib/python3.11/site-packages/toml
136K /app/venv/lib/python3.11/site-packages/tzlocal
124K /app/venv/lib/python3.11/site-packages/overrides
120K /app/venv/lib/python3.11/site-packages/markupsafe
120K /app/venv/lib/python3.11/site-packages/backoff
108K /app/venv/lib/python3.11/site-packages/pyproject_hooks
108K /app/venv/lib/python3.11/site-packages/cachetools
108K /app/venv/lib/python3.11/site-packages/blinker
96K /app/venv/lib/python3.11/site-packages/filelock
84K /app/venv/lib/python3.11/site-packages/mdurl
80K /app/venv/lib/python3.11/site-packages/mmh3
68K /app/venv/lib/python3.11/site-packages/deprecated
64K /app/venv/lib/python3.11/site-packages/zipp
64K /app/venv/lib/python3.11/site-packages/attrs
60K /app/venv/lib/python3.11/site-packages/sniffio
48K /app/venv/lib/python3.11/site-packages/aiolimiter
24K /app/venv/lib/python3.11/site-packages/aiosignal
2 Answers
sentence-transformers is the biggest culprit here. Sometimes it does not handle its dependencies well and pulls in far more than is needed, which makes the image huge.
Use this in your Dockerfile; it should reduce the size to around 2.5GB.
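A minimal sketch of one common way to do this, not necessarily the exact snippet this answer had in mind: install the CPU-only builds of torch and torchvision before the rest of requirements.txt, so that sentence-transformers does not pull in the default CUDA-enabled wheels (the 420M triton entry in the listing suggests those are what got installed). It assumes the app only needs CPU inference; the index URL is PyTorch's CPU wheel index.

```dockerfile
# Sketch only: assumes CPU-only inference is acceptable for this app.
FROM python:3.11.4-slim AS compiler
WORKDIR /app
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY ./requirements.txt /app/requirements.txt
# Install CPU-only torch/torchvision first; the later requirements install
# then finds torch already satisfied instead of fetching the CUDA build.
RUN pip install --no-cache-dir torch torchvision \
        --index-url https://download.pytorch.org/whl/cpu \
    && pip install --no-cache-dir -r requirements.txt
# The runner stage from the question stays unchanged.
```

How close the result gets to the 2.5GB figure depends on the remaining dependencies, so it is worth checking with docker images after the build.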
There are several ways to optimize and reduce the size of the Docker image you have provided. Here are a few suggestions:
Use Alpine or Debian as a base image: Instead of using the slim version of the Python base image, try using smaller base images such as Alpine or Debian. These images are specifically created for Docker and have a smaller footprint compared to other distros or OSes. You can use python:3.11.4-alpine or python:3.11.4-buster as your base image.
Remove unnecessary packages: From the output of your pip list command, it appears that there are several packages that may not be necessary for your application. Consider removing or optimizing them to reduce the image size. For example, you can remove altair, beautifulsoup4, and networkx if they are not required for your application.
Use a virtual environment: Use a virtual environment to manage packages for your application. It can reduce the size of the image by removing any unused packages or libraries.
Avoid installing unnecessary dependent packages: Ensure that you are installing only the packages that are required for your web application. Sometimes when you install some packages, it automatically installs some dependencies. To avoid this, you can use --no-deps with your pip install command to avoid installing unnecessary packages.
Use Dockerignore: Use a .dockerignore file in your Docker build context to exclude any unnecessary files and directories that are not required for your application. This will help you reduce the size of your Docker image and speed up the build process.
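As a complement to a .dockerignore file, the runner stage can also copy only what the app needs instead of the whole build context. The size listing in the question shows paths under /app/venv, while the Dockerfile builds its environment in /opt/venv. This suggests, though the build context is not shown, that a local venv directory is being carried into the image by COPY . /app/. A minimal sketch, assuming all application modules are top-level .py files (the *.py pattern is illustrative, not taken from the question):

```dockerfile
# Runner stage only; replaces the blanket "COPY . /app/".
# Assumption: every application module is a top-level .py file.
# A local venv/, .git/ or cache directory is then never added to the image.
COPY ./*.py /app/
```

If the app also has subfolders or data files, each of them would need its own COPY line, at which point an ignore file is usually the simpler option.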