Does using ADD instead of RUN offer any advantage for layer cache invalidation?

Background

I frequently see Dockerfiles that install wget or curl just to RUN wget … or RUN curl … to install some dependency that cannot be found in package management.

I suspect these could be converted to simple ADD <url> <dest> lines, and that would at least obviate the need for adding curl or wget to the image.

Further, it seems like the docker daemon could rely on HTTP cache invalidation to inform its own layer cache invalidation. At a minimum (e.g. in the absence of HTTP cache headers), it could GET the resource, hash it, and calculate invalidation the same way it does for local files.

NOTE: I am familiar with the usage of ADD vs. RUN …, but I am looking for a strong reason to choose one over the other. In particular, I want to know if ADD <url> can behave any more intelligently with regard to layer cache invalidation.

2 Answers


  1. Certainly.

    The RUN instruction will not invalidate the cache unless its text changes. So if the remote file is updated, you won’t get it. Docker will use the cached layer.

    The ADD instruction will always download the file and the cache will be invalidated if the checksum of the file no longer matches.
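
    A minimal sketch of that difference, assuming a placeholder URL (https://example.com/tool.bin), a Debian base image, and made-up destination paths; none of these come from the question:

    ```dockerfile
    FROM debian:bookworm-slim

    # curl is installed only so the RUN variant below can download the file.
    RUN apt-get update \
        && apt-get install -y --no-install-recommends ca-certificates curl \
        && rm -rf /var/lib/apt/lists/*

    # RUN variant: the cache key is the instruction text. If the file behind
    # the URL changes but this line does not, the cached layer is reused.
    RUN curl -fsSL -o /opt/tool-via-run.bin https://example.com/tool.bin

    # ADD variant: the builder checks the remote file on each build and
    # invalidates this layer (and every later one) when its content changes.
    ADD https://example.com/tool.bin /opt/tool-via-add.bin
    ```

    Note that a file added from a URL is written with 600 permissions, so an executable fetched this way typically still needs its mode changed in a later step.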

    I would recommend using ADD instead of RUN wget ... or RUN curl .... I imagine people use the latter because it's more familiar, but the ADD instruction is quite powerful: it can set ownership and unpack tar archives (automatic extraction applies to local archives only; a file fetched from a URL is stored as-is). It's also considered best practice to avoid installing any packages that are not necessary for your process to run (though there are multiple ways to accomplish this, such as multi-stage builds).
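
    As one possible shape for the multi-stage approach, here is a sketch in which the download tooling exists only in a throwaway stage. The stage name (fetch), base image, archive URL, paths, and the 1000:1000 owner are all illustrative assumptions:

    ```dockerfile
    # Stage 1: curl and xz-utils live only in this stage.
    FROM debian:bookworm-slim AS fetch
    RUN apt-get update \
        && apt-get install -y --no-install-recommends ca-certificates curl xz-utils \
        && rm -rf /var/lib/apt/lists/*
    # Download and extract in one step so the archive itself is never stored.
    RUN mkdir -p /usr/src/things \
        && curl -fsSL https://example.com/big.tar.xz \
        | tar -xJC /usr/src/things

    # Stage 2: the final image receives only the extracted files,
    # with ownership set during the copy.
    FROM debian:bookworm-slim
    COPY --from=fetch --chown=1000:1000 /usr/src/things /usr/src/things
    ```

    The trade-off is that the download is again a RUN step, so it is cached on its instruction text rather than on the remote file's content.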

    Docs on cache invalidation:

    https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache

  2. RUN wget … or RUN curl … is preferable if you are downloading an archive. It allows extraction of the archive files and deletion of the downloaded file in the same RUN command. Therefore the downloaded file itself is not stored in the image.

    As the Docker documentation says: "using ADD to fetch packages from remote URLs is strongly discouraged"

    Avoid doing things like:

    ADD https://example.com/big.tar.xz /usr/src/things/
    RUN tar -xJf /usr/src/things/big.tar.xz -C /usr/src/things
    RUN make -C /usr/src/things all
    

    And instead, do something like:

    RUN mkdir -p /usr/src/things \
        && curl -SL https://example.com/big.tar.xz \
        | tar -xJC /usr/src/things \
        && make -C /usr/src/things all
    