I am trying to speed up my Docker multistage build. In the first stage I copy `.` (except what is excluded by `.dockerignore`) and build. I deliberately use `COPY . .` so I don't have to list the interesting folders and to avoid COPY flattening. A Dockerfile snippet is at the end of this post.

I thought the `RUN rm -rf nginx/` would mean that, during development, if I changed a file in `nginx/` and rebuilt the image, the first stage would be skipped. But that is not working: the first stage is rebuilt, which is slow. Am I misunderstanding how the caching works? Basically I am trying to make changes in `nginx/`, and it is slow to rebuild everything just to try something. Is there a way to avoid rebuilding the first stage if I change something in `nginx/`?
```dockerfile
FROM myimage/node:14.19.3 as build
WORKDIR /home/node
COPY . .
# During development NGINX changes should not rebuild this layer
RUN rm -rf nginx/
RUN npm run build

FROM myimage/nginx:1.23.3
COPY ./nginx/nginx.conf /etc/nginx/nginx.conf
# SNIP
```
2 Answers
Generally speaking, each instruction in a Dockerfile becomes roughly one layer in the final image. Every layer adds content on top of the layers below it and, finally, all the layers compose a stack. Whenever a layer changes, that layer needs to be rebuilt, and this affects all the layers that come after it.
In your case, the thing preventing the initial layers from being cached is the `COPY . .` instruction. From a caching point of view that instruction is very inefficient: changing any file invalidates that layer, so every later step (dependency install, build) is re-run on every `docker build`, even when those steps were not affected by the change. If you cannot remove that instruction, try to move it so it is executed as late as possible, for example as sketched below.
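A minimal sketch of the idea, assuming a Node project with `package.json` and `package-lock.json` at the root: copy only the dependency manifests first, install, and copy the rest of the sources as late as possible so that unrelated edits do not invalidate the install layer.

```dockerfile
FROM myimage/node:14.19.3 as build
WORKDIR /home/node

# Copy only the dependency manifests first: this layer (and the npm ci
# layer below) stays cached as long as these two files do not change.
COPY package.json package-lock.json ./
RUN npm ci

# Copy the rest of the sources as late as possible.
COPY . .
RUN rm -rf nginx/
RUN npm run build

FROM myimage/nginx:1.23.3
COPY ./nginx/nginx.conf /etc/nginx/nginx.conf
```

With this layout a change under `nginx/` still re-runs the final `COPY . .` and `npm run build` (they depend on the full build context), but the usually expensive dependency install stays cached.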
In short
The fact that you copy all the files and folders in your directory with `COPY . .` invalidates the cache when you modify any file (even unrelated ones) in that folder. The cache is recreated starting from the first layer for which your edited file is relevant. In this case the relevance is that your change to `nginx.conf` invalidates the layer, because `COPY . .` copies all the files, including the `.conf`, so the checksum of the resulting layer changes and the cached layer can no longer be reused. This answer explains how layer checksums are calculated.
TL;DR
To better explain what is happening, and since I don't know what is in your app, I will use a similar example of mine: a simple React app. So, let's get into it.

In this example, specific directories are copied individually:
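A hedged reconstruction of what such a Dockerfile might look like; the base image, the directory names `src/` and `public/`, and the `uselessfile.txt` copy are assumptions based on a typical React project and on what the rest of this answer describes:

```dockerfile
FROM node:16-alpine as build
WORKDIR /app

# An unrelated file copied early on purpose: editing it invalidates
# this layer and therefore every layer below it.
COPY ./uselessfile.txt ./veryuseless.txt

# Dependencies and sources, copied as separate layers.
COPY package.json package-lock.json ./
RUN npm ci
COPY ./public ./public
COPY ./src ./src
RUN npm run build
```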
When I launch the build with `docker build -t deusdog .` the image is built from scratch. Very importantly, if I relaunch the same command without modifying anything, the output shows a lot of cached layers.
So if you make any modification to the file `uselessfile.txt` and then relaunch the `docker build` command, all the layers after the copy of `uselessfile.txt` will be recreated. Every. Single. Time.

Another (superfast) example
In the following example, I'll put the line `COPY ./uselessfile.txt ./veryuseless.txt` after a lot of layers, like this:
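Under the same assumptions as the sketch above, the reordered version would look roughly like this, with the `uselessfile.txt` copy moved to the very end:

```dockerfile
FROM node:16-alpine as build
WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci

COPY ./public ./public
COPY ./src ./src
RUN npm run build

# Moved to the end: editing uselessfile.txt now only rebuilds this
# single layer; everything above it is reused from the cache.
COPY ./uselessfile.txt ./veryuseless.txt
```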
Then I launch the `docker build` command and the image is built. When I relaunch the `docker build` command, even if I modify `uselessfile.txt`, all the layers before that final copy stay cached and only the last layer is recreated.
Conclusions
Every layer depends on the preceding one, and when you do as you do (which is fine for certain production scopes), using `COPY . .`, any change to any file recreates a lot of layers. Anyway, I suggest you keep one Dockerfile for development and one for production, and keep them similar and up to date.
For example, in the development one you can copy `node_modules` so you can skip the `npm install` command, or a mix of all these things. If you do a lot of editing in the `src` folder or the `App.js` file, you don't need to recreate `node_modules` on every docker build. Moreover, you could edit the file directly in the container with vim to speed up your development.
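A minimal sketch of such a development Dockerfile, assuming `node_modules` is already installed locally and not excluded by `.dockerignore`, and that the layout (`package.json`, `src/`, `public/`) matches the project; this is only an illustration of the idea, not the original file:

```dockerfile
FROM myimage/node:14.19.3 as dev
WORKDIR /home/node

# Reuse the locally installed node_modules instead of running
# npm install inside the image.
COPY package.json package-lock.json ./
COPY node_modules/ ./node_modules/

# Sources last, so edits to src/ or App.js only rebuild from here.
COPY public/ ./public/
COPY src/ ./src/

CMD ["npm", "start"]
```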
Hope this clarifies it!
All the best!