I’m pretty new to docker and, although I’ve read lots of articles, tutorials and watched YouTube videos, I’m still finding that my image size is in excess of 1 GB when the alpine image for Python is only about 25 MB (if I’m reading this correctly!).
I’m trying to work out how to make it smaller (if in fact it needs to be).
[Note: I’ve been following tutorials to create what I have below. Most of it makes sense… but some of it feels like voodoo.]
Here is my Dockerfile:
FROM python:3.8.3-alpine
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
RUN mkdir -p /home/app
RUN addgroup -S app && adduser -S app -G app
ENV HOME=/home/app
ENV APP_HOME=/home/app/web
RUN mkdir $APP_HOME
RUN mkdir $APP_HOME/staticfiles
RUN mkdir $APP_HOME/mediafiles
WORKDIR $APP_HOME
RUN pip install --upgrade pip
COPY requirements.txt .
RUN apk update \
    && apk add --virtual build-deps gcc python3-dev musl-dev \
    && apk add postgresql-dev \
    && apk add jpeg-dev zlib-dev libjpeg \
    && apk add --update --no-cache postgresql-client
RUN pip install -r requirements.txt
RUN apk del build-deps
COPY entrypoint.prod.sh $APP_HOME
COPY . $APP_HOME
RUN chown -R app:app $APP_HOME
USER app
ENTRYPOINT ["/home/app/web/entrypoint.prod.sh"]
Using Pillow and psycopg2-binary has caused a world of confusion and hurt. Particularly with the following:
RUN apk update \
    && apk add --virtual build-deps gcc python3-dev musl-dev \
    && apk add postgresql-dev \
    && apk add jpeg-dev zlib-dev libjpeg \
    && apk add --update --no-cache postgresql-client
RUN pip install -r requirements.txt
RUN apk del build-deps
This was originally:
RUN apk update \
    && apk add --virtual build-deps gcc python3-dev musl-dev \
    && apk add postgresql \
    && apk add postgresql-dev \
    && apk add --update --no-cache postgresql-client \
    && pip install psycopg2-binary \
    && apk add jpeg-dev zlib-dev libjpeg \
    && pip install Pillow \
    && apk del build-deps
I really have no idea how much of the above I need to make it work. I think there might be a way of reducing the build.
I know there is a way to build the original image and then use that to transfer things over, but the only tutorials are confusing and I am struggling to get my head around this without adding more complexity. I really wish I had someone who could just explain it in person.
I also don’t know if the size of the image is coming from the requirements.txt file. I’m using Django, and there are a number of requirements:
requirements.txt
asgiref==3.4.1
Babel==2.9.1
boto3==1.18.12
botocore==1.21.12
certifi==2021.5.30
charset-normalizer==2.0.4
crispy-bootstrap5==0.4
defusedxml==0.7.1
diff-match-patch==20200713
Django==3.2.5
django-anymail==8.4
django-compat==1.0.15
django-crispy-forms==1.12.0
django-environ==0.4.5
django-extensions==3.1.3
django-hijack==2.3.0
django-hijack-admin==2.1.10
django-import-export==2.5.0
django-money==2.0.1
django-recaptcha==2.0.6
django-social-share==2.2.1
django-storages==1.11.1
et-xmlfile==1.1.0
fontawesomefree==5.15.3
gunicorn==20.1.0
idna==3.2
jmespath==0.10.0
MarkupPy==1.14
odfpy==1.4.1
openpyxl==3.0.7
Pillow==8.3.1
psycopg2-binary==2.9.1
py-moneyed==1.2
python-dateutil==2.8.2
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
s3transfer==0.5.0
six==1.16.0
sqlparse==0.4.1
stripe==2.60.0
tablib==3.0.0
urllib3==1.26.6
xlrd==2.0.1
xlwt==1.3.0
The question I have is: how do I make the image smaller? And does it actually need to be smaller?
I’m just trying to find the best way to deploy the Django app to DigitalOcean, and there is a world of confusion with so many approaches and tutorials. I don’t know if using Docker makes it easier. Do I just use their App Platform? Will that provide SSL? What are the advantages of using Docker?
docker-compose file (for reference)
version: '3.7'

services:
  web:
    build:
      context: .
      dockerfile: Dockerfile.prod
    command: gunicorn maffsguru.wsgi:application --bind 0.0.0.0:8000
    volumes:
      - static_volume:/home/app/web/staticfiles
      - media_volume:/home/app/web/mediafiles
    expose:
      - 8000
    env_file:
      - .env.docker
    depends_on:
      - db
  db:
    image: postgres:12.0-alpine
    env_file:
      - .env.docker
    volumes:
      - postgres_data:/var/lib/postgresql/data/
    ports:
      - 5432:5432
  nginx:
    build: ./nginx
    volumes:
      - static_volume:/home/app/web/staticfiles
      - media_volume:/home/app/web/mediafiles
    ports:
      - 1337:80
    depends_on:
      - web

volumes:
  postgres_data:
  static_volume:
  media_volume:
Just to say … the above all seems to work … but I don’t know if the size of the image etc is going to be a problem?
I am also confused as to why nginx seems to need me to go to http://0.0.0.0:1337 to view the site. Isn’t the whole point to view it by navigating to http://0.0.0.0/?
Thanks for any advice or guidance you might be able to give and apologies for the random nature of my questions
2 Answers

Welcome to Docker! It can be quite the thing to wrap one’s head around, especially when beginning, but you’re asking really valid questions that are all pertinent.
Reducing Size
How to
A great place to start is Docker’s own Dockerfile best practices page:
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
They explain neatly how each directive (COPY, RUN, ENV, etc.) creates an additional layer, increasing your container’s size. Importantly, they show how to reduce your image size by minimising the number of directives. The key to a lot of that minimisation is chaining commands in RUN statements with &&.
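As an illustrative sketch (not from any particular tutorial), the mkdir and apk steps in the Dockerfile above could each be collapsed into a single layer:

```dockerfile
# One layer instead of three separate RUN mkdir layers
# (mkdir -p creates /home/app/web and both subdirectories in one go):
RUN mkdir -p /home/app/web/staticfiles /home/app/web/mediafiles

# One layer for all the apk and pip work; --no-cache avoids baking the
# apk package index into the image, so a separate `apk update` layer
# is not needed, and `apk del build-deps` in the SAME layer means the
# compiler never ends up in a committed layer at all:
RUN apk add --no-cache --virtual build-deps gcc python3-dev musl-dev \
    && apk add --no-cache postgresql-dev jpeg-dev zlib-dev postgresql-client \
    && pip install --no-cache-dir -r requirements.txt \
    && apk del build-deps
```

Note that deleting build-deps in a *later* RUN (as in the original) does not shrink the image, because the layer that installed the compiler is still part of the image history.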
Something else I note in your Dockerfile is one specific line: COPY . $APP_HOME
Now, depending on how you build your container (specifically, what folder you pass to Docker as the build context), that COPY . will copy EVERYTHING it has available to it. Chances are, this is bringing in your venv folder etc. if you have one. I feel that this may be the largest perpetrator of size for you. You can mitigate this by adding explicit COPY lines, or by using a .dockerignore file.

I built your image (without any source code, and without copying in entrypoint.sh), and it came out to 710 MB as a base. It could be a good idea to check the size of your source code and see if anything else is getting in there.

After I re-arranged some of the commands to reuse directives, the image was 484 MB, which is considerably smaller!
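A .dockerignore file at the root of the build context keeps those folders out of COPY . entirely. A minimal sketch (the entry names are assumptions; match them to what is actually in your repo):

```
.git
venv/
__pycache__/
*.pyc
.env.docker
staticfiles/
mediafiles/
```

Anything listed here is never sent to the Docker daemon, so it cannot end up in a layer even with a blanket COPY.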
If you get stuck, I can pop it into a gist on GitHub for you and walk you through it; however, the Docker documentation should hopefully get you going.
Why?
Well, larger applications / images aren’t inherently bad, but with any increase in data, some operations may be slower.
When I say operations, I tend to mean pulling images from a registry, or pushing them to publish. It will take longer to transfer 1GB than it will 50MB.
There’s also a consideration to be made when you scale your containers. While the image size does not necessarily correlate directly to how much disk you will use when you start a container, it will certainly increase the requirements for the machine you’re running on, and limit your options on smaller devices.
Docker
The advantages of using Docker are widespread, and I can’t cover them all here without submitting my writing for thesis defence 😉
But it mainly boils down to reproducibility, isolation, and portability: the same image runs the same way on your laptop as it does on your server.
Nginx
You’ve set things up well there, from what I can gather! I imagine nginx is ‘telling you’ (via the logs?) to navigate to 0.0.0.0 because that is what it will have bound to in the container. Now, you’ve forwarded traffic with 1337:80. Docker follows the format host:container, so this means that traffic on localhost:1337 will be directed to the container’s port 80.

You may need to swap this around based on your nginx configuration, but rest assured you will be able to navigate to localhost in your browser and see your website once everything is set up.
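If you’d rather browse to plain http://localhost/ without a port number, you can change the host side of that mapping in your compose file (a sketch of just the nginx service):

```yaml
nginx:
  build: ./nginx
  ports:
    - 80:80   # host:container — port 80 on your machine forwards to port 80 in nginx
```

Only the left-hand (host) side changes; nginx inside the container still listens on 80.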
Let me know if you need help with any of the above, or want more resources to aid you. Happy to correspond and walk you through anything anytime given we seem to be in the same timezone 🤙
Notice that you have to install a compiler. That compiler takes a lot of space.
Most Python packages include pre-compiled binary packages, so why do you need a compiler? Because you are using Alpine. Binary packages (==wheels) from PyPI don’t work on Alpine.
So: switch to python:3.8-slim-buster.

Details: https://pythonspeed.com/articles/alpine-docker-python/
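A sketch of what the top of the Dockerfile could look like on slim (Debian-based, so apt replaces apk; treat the package choices here as assumptions to adapt, not a drop-in file):

```dockerfile
FROM python:3.8-slim-buster
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /home/app/web

# On Debian, psycopg2-binary and Pillow install as pre-built wheels,
# so no gcc or *-dev packages are required. postgresql-client is kept
# only on the assumption that the entrypoint script uses it to wait
# for the database; drop it if not.
RUN apt-get update \
    && apt-get install -y --no-install-recommends postgresql-client \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```

With no compiler installed at all, there is nothing to `del` afterwards, which removes most of the voodoo from the original apk block.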
The alternative is a multi-stage build, where your final image doesn’t include the unnecessary compiler. This adds more complexity of course.
Starting point for that (it’s a 3-article series): https://pythonspeed.com/articles/smaller-python-docker-images/
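If you do want to stay on Alpine, the multi-stage idea looks roughly like this (an illustrative sketch based on the Dockerfile above; the runtime package names such as libpq and libjpeg-turbo are assumptions to verify):

```dockerfile
# Stage 1: builder — the compiler only ever exists in this stage
FROM python:3.8.3-alpine AS builder
RUN apk add --no-cache gcc python3-dev musl-dev postgresql-dev jpeg-dev zlib-dev
COPY requirements.txt .
# Compile everything into wheel files instead of installing directly
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Stage 2: final image — starts clean and copies in only the finished wheels,
# so gcc and the *-dev headers never appear in any of its layers
FROM python:3.8.3-alpine
RUN apk add --no-cache libpq libjpeg-turbo zlib postgresql-client
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*
```

The final image only ever sees the second FROM onwards, which is where the size saving comes from.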