I’m building a Docker image that contains some big files (>1.0 GB) and some small Python scripts. The big files rarely change, so I want them to be cached.
The directory looks like this:
- app/
  - main.py
  - modules/
    - foo.py
    - bar.py
  - big_files/
    - bigone.tar
    - bigtwo.tar
My first Dockerfile:

```dockerfile
FROM python:3
COPY ./app /opt/app
```
When I update the Python scripts, the COPY step is invalidated and all of the files are copied again, which takes a long time.
What I want to achieve:
```dockerfile
FROM python:3
COPY ./app/big_files /opt/app/big_files
COPY ./app /opt/app
```
However, the second COPY still copies the big files as well.
How can I split the COPY into two steps so that the big files stay cached?
2 Answers
You should change your app’s structure so that the big files are outside the app/ folder.
If you don’t want to do that, then you have to explicitly COPY only the paths under app/ that you need:
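A minimal sketch of that approach, assuming the layout from the question (the /opt/app target path and the python:3 base image are only illustrative):

```dockerfile
FROM python:3

# Rarely changing big files first, so this layer stays cached
COPY ./app/big_files /opt/app/big_files

# Then only the code paths, listed explicitly so big_files is not copied again
COPY ./app/main.py /opt/app/main.py
COPY ./app/modules /opt/app/modules
```

As long as nothing under big_files changes, editing the Python scripts only invalidates the later COPY layers.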
Each step happens in its own layer. In your case that means everything must be run on the initial build.
With a setup similar to yours:
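For example, a layout like this (the bigFiles name matches what is referenced below; the other names are only illustrative):

```
app/
    main.py
    modules/
        foo.py
    bigFiles/
        bigone.tar
        bigtwo.tar
```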
and a Dockerfile:
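A Dockerfile along these lines (assuming the /opt/app target and the python:3 base image; the two COPY instructions correspond to steps 2 and 3 mentioned below):

```dockerfile
FROM python:3
COPY ./app/bigFiles /opt/app/bigFiles
COPY ./app /opt/app
```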
Now if you run the exact same build again and nothing has changed, each layer can be reused:
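With the classic builder, the output of such a rebuild looks roughly like this (the IDs are placeholders):

```
Step 1/3 : FROM python:3
 ---> <image-id>
Step 2/3 : COPY ./app/bigFiles /opt/app/bigFiles
 ---> Using cache
 ---> <layer-id>
Step 3/3 : COPY ./app /opt/app
 ---> Using cache
 ---> <layer-id>
```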
If you touch the big files, that layer has to be rebuilt:
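Roughly, with placeholders again; note that neither COPY step says "Using cache" any more:

```
Step 1/3 : FROM python:3
 ---> <image-id>
Step 2/3 : COPY ./app/bigFiles /opt/app/bigFiles
 ---> <new-layer-id>
Step 3/3 : COPY ./app /opt/app
 ---> <new-layer-id>
```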
You can see that in the example above, steps 2 and 3 both had to be rebuilt, because `bigFiles` is also part of `app`, so neither COPY layer could be cached.
However, if you only change the small files and not the big files, the big-file layer can be cached:
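Roughly, with placeholders again; only the `bigFiles` layer is reused while the final COPY is rebuilt:

```
Step 1/3 : FROM python:3
 ---> <image-id>
Step 2/3 : COPY ./app/bigFiles /opt/app/bigFiles
 ---> Using cache
 ---> <layer-id>
Step 3/3 : COPY ./app /opt/app
 ---> <new-layer-id>
```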