When I build an image from a Dockerfile in my project, how does Docker know whether any of the packages I install have changed? For example, let’s say I have `RUN pip install flask` in my Dockerfile, and I build an image from it. If I rebuild the image from the same Dockerfile a few days later, after the Flask package has been updated, does Docker still use the cached layer, or does it run the command afresh to get the latest Flask package? If it does not use the cache, how does it know that the Flask package was updated?
I know that there are options to clear the cache and rebuild the image, but how would I know that a package I installed has been updated? Checking by hand does not seem like a reasonable solution: if we use hundreds of packages, we would have to check each and every one of them for updates.
I tried googling this question, but I keep getting results about the `docker diff` command, which is not what I need.
2 Answers
Docker doesn’t know whether a package has changed remotely. Apart from the text of the instructions themselves, the only thing that influences the build cache is changes to the files in your build context. E.g., if your Dockerfile includes a `COPY` instruction for `requirements.txt` and you modify `requirements.txt`, this will invalidate the cache for that instruction and any following instructions. On the other hand, if you have `RUN pip install flask`, that will stay cached indefinitely, regardless of whether the `flask` package gets an update. Docker doesn’t know anything about Python packages (or `apt` packages, or `rpm` packages, etc.).

You don’t need to check all the packages individually. If you occasionally build the image with caching disabled, you’ll get the latest version of all your packages.
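As a rough illustration of the two cases above, the sketch below assumes a `python:3.11-slim` base image and a `requirements.txt` in the build context; the paths are illustrative and not taken from the original answer:

```dockerfile
# Assumed base image; any Python image would do.
FROM python:3.11-slim

# Case 1: editing requirements.txt invalidates this COPY layer
# and every layer after it, so pip runs again.
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

# Case 2: nothing in the build context feeds this instruction, so it
# stays cached until its text changes or an earlier layer is rebuilt.
RUN pip install flask
```

Building with `docker build --no-cache .` ignores the cache for every layer, which is one way to pick up newer package versions on demand.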
On the other hand, if you have an application configured and working, you may not want to update potentially hundreds of packages (what if it breaks?). That’s why in production, many people will pin their dependencies to a specific version (`pip install flask==2.2.2`): this prevents unexpected updates from breaking things, and means that you control when updates happen.

For Python in particular, tools like Pipenv can help manage version pinning for large numbers of dependencies.
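To make the pinning idea concrete, here is a small sketch, assuming a `python:3.11-slim` base image; the version number is the one mentioned above:

```dockerfile
# Assumed base image; any Python image would do.
FROM python:3.11-slim

# The version is part of the instruction text, so this layer is rebuilt
# only when you edit the pin, i.e. when you decide to upgrade.
RUN pip install flask==2.2.2
```

Changing `2.2.2` to a newer release changes the instruction text, which invalidates the cached layer at exactly the point where you chose to upgrade.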
Example Dockerfile:
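A minimal sketch of such a Dockerfile, assuming a `python:3.11-slim` base image and a hypothetical `app.py` in the build context; `COMMAND1`, `COMMAND2`, and `BUILDARG` are the placeholders discussed next, wrapped in `echo` here only so that the file builds:

```dockerfile
# Assumed base image; any image with a shell would do.
FROM python:3.11-slim

# Layer 1: cached as long as the text of this instruction is unchanged.
RUN echo "COMMAND1"

# Build argument with a default; override with --build-arg BUILDARG=...
ARG BUILDARG=default

# Layer 2: rebuilt when the instruction text or the value of BUILDARG changes.
RUN echo "COMMAND2:$BUILDARG"

# Rebuilt when the checksum of app.py (a hypothetical file) changes.
COPY app.py /app/app.py
```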
For commands that aren’t `ADD` or `COPY`, Docker will only check the text of the directive for changes. I.e., it will check if the first layer, `RUN "COMMAND1"`, has changed, and it will check if there is a change to the build arg of `RUN "COMMAND2:$BUILDARG"`.

However, for `ADD` and `COPY`, the checksum of each file is what is checked, and the last-accessed and last-modified times are ignored.