Is there any advantage to layer cache invalidation by using ADD instead of RUN?
Background
I frequently see Dockerfiles that install wget or curl just to RUN wget … or RUN curl … to install some dependency that cannot be found in package management.
I suspect these could be converted to simple ADD <url> <dest> lines, and that would at least obviate the need for adding curl or wget to the image.
Further, it seems like the Docker daemon could rely on HTTP cache invalidation to inform its own layer cache invalidation. At a minimum (e.g. in the absence of HTTP cache headers), it could GET the resource, hash it, and calculate invalidation the same way it does for local files.
NOTE: I am familiar with the usage of ADD vs RUN …, but I am looking for a strong reason to choose one over the other. In particular, I want to know if ADD <url> can behave any more intelligently with regard to layer cache invalidation.
2 Answers
Certainly.
The RUN instruction will not invalidate the cache unless its text changes. So if the remote file is updated, you won’t get it. Docker will use the cached layer.
The ADD instruction will always download the file, and the cache will be invalidated if the checksum of the file no longer matches.
I would recommend using ADD instead of RUN wget ... or RUN curl ... . I imagine people use the latter as it’s more familiar, but the ADD instruction is quite powerful. It can untar files (local tar archives only; a file fetched from a URL is not extracted) and set ownership. It’s also considered best practice to avoid downloading any packages that are not necessary for your process to run (though there are multiple ways to accomplish this, like using multi-stage builds).
Docs on cache invalidation:
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache
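To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The base image, the URL https://example.com/tool, and the destination path are hypothetical placeholders, not taken from the question:

```dockerfile
FROM debian:bookworm-slim

# Option A: RUN + curl.
# The layer is keyed on the instruction text, so a changed remote file is not
# picked up until this line (or an earlier layer) changes. curl also has to be
# installed in the image just for the download.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/* \
    && curl -fsSL -o /usr/local/bin/tool-a https://example.com/tool \
    && chmod 755 /usr/local/bin/tool-a

# Option B: ADD <url> <dest>.
# No download tool is needed in the image, and the layer is invalidated when
# the fetched content changes. Files added from a URL get mode 600, hence the
# chmod in a follow-up step.
ADD https://example.com/tool /usr/local/bin/tool-b
RUN chmod 755 /usr/local/bin/tool-b
```

In a real Dockerfile you would pick one option; both appear here only so the difference in caching behaviour and image contents is visible in one place.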
RUN wget … or RUN curl … is preferable if you are downloading an archive. It allows extraction of the archive files and deletion of the downloaded file in the same RUN command. Therefore the downloaded file itself is not stored in the image.
As the Docker documentation says: "using ADD to fetch packages from remote URLs is strongly discouraged"
Avoid doing things like:
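For instance, a sketch of the discouraged pattern, modeled on the example in Docker's best-practices guide. The archive URL and the /usr/src/things directory are illustrative placeholders:

```dockerfile
# Hypothetical URL and paths, for illustration only.
# The ADD layer keeps big.tar.xz in the image permanently, even if a later
# RUN step deletes the file, and each instruction adds its own layer.
ADD https://example.com/big.tar.xz /usr/src/things/
RUN tar -xJf /usr/src/things/big.tar.xz -C /usr/src/things
RUN make -C /usr/src/things all
```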
And instead, do something like:
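A sketch of the preferred pattern, again modeled on Docker's best-practices guide. The URL and directory are illustrative placeholders, and curl, tar with xz support, and make are assumed to be available in the base image:

```dockerfile
# Hypothetical URL and paths, for illustration only.
# Download, extract, and build in a single RUN step. The archive is streamed
# straight into tar, so it is never written to disk or stored in a layer.
RUN mkdir -p /usr/src/things \
    && curl -fsSL https://example.com/big.tar.xz \
    | tar -xJC /usr/src/things \
    && make -C /usr/src/things all
```

The trade-off is the one described in the first answer: this layer is cached on the instruction text, so a changed remote archive is not fetched again unless the line itself changes.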