Consider this example from the docs:
```dockerfile
# syntax=docker/dockerfile:1
FROM node
WORKDIR /app
COPY package.json yarn.lock .    # Copy package management files
RUN npm install                  # Install dependencies
COPY . .                         # Copy over project files
RUN npm build                    # Run build
```
By installing dependencies in earlier layers of the Dockerfile, there
is no need to rebuild those layers when a project file has changed.
Which layers can be skipped here?
By my understanding, the `npm` command is a black box to Docker, so Docker doesn't know what `npm`'s inputs are or what it will produce. If so, then Docker always has to run the `npm install` and `npm build` commands, which means caching is useless here. What am I missing?
Answers
No, caching is not useless here. And to "Which layers can be skipped here?": potentially all of them; each layer is reused as long as its inputs haven't changed.
Docker knows exactly that: the input is all the files plus the image produced by the earlier layers, and the output is the set of modifications made to the files in the image. Those modifications are saved as a new layer.
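If you want to see those layers concretely, a quick sketch (the `cache-demo` tag is just a placeholder name):

```sh
# Build the image, then list its layers: each Dockerfile instruction shows up
# as one entry, together with the size of the modifications it contributed.
docker build -t cache-demo .
docker history cache-demo
```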
Docker indeed doesn't know what `npm install` means (e.g., you might configure your shell to alias `npm install` to `echo 42` or anything else). But it does know that running it does not depend on any files beyond what is already in the image: for a `RUN` instruction, the cache key is just the command string plus the state of the preceding layers. "What it will produce" depends on those inputs (as @jonrsharpe also said in the comments), so if the inputs aren't changed, the outputs aren't changed either, and Docker can infer that.
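If you ever want `RUN npm install` to re-run even though none of its inputs changed (say, to pick up newer package versions that your version ranges allow), you have to change one of its inputs yourself. A minimal sketch, assuming you add a build argument (the name `CACHE_BUST` is arbitrary and not part of the original Dockerfile):

```dockerfile
# Passing a new value, e.g. `docker build --build-arg CACHE_BUST=$(date +%s) .`,
# changes this step's inputs and so invalidates it and every layer after it.
# The blunter alternative is `docker build --no-cache`, which re-runs every step.
ARG CACHE_BUST=unset
RUN echo "cache bust: $CACHE_BUST" && npm install
```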
Compare `RUN npm install` with, for example, `COPY package.json yarn.lock .`, which does depend on the contents of `package.json` and `yarn.lock` (Docker checksums the files it copies). Conversely, if either of those files is changed, then the `COPY …` step is re-applied, and thus the subsequent `RUN npm install` is re-applied as well.

So, to answer the titular question:
It's rather the opposite: `RUN`-ing side-effecty scripts is useless when Docker caching is involved, because the cached layer will be reused even if the external state the script depends on has changed.
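As a concrete illustration of that last point (the URL is made up), a step like the following has an input Docker cannot see, yet its cache key never changes, so rebuilds keep reusing the stale cached layer:

```dockerfile
# The command string is constant, so this layer stays cached even if the file
# behind the URL is updated; only an explicit cache bust (the ARG trick above)
# or `docker build --no-cache` would fetch it again.
RUN curl -fsSL https://example.com/latest-data.tar.gz -o /tmp/data.tar.gz
```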