When we build an image from a Dockerfile, each step creates an intermediate container from the image produced by the previous step and runs that step's instruction in it.
However, when it comes to the final CMD step, won't the image created by running CMD on top of the penultimate step have the same startup command, CMD?
Example:
Step 1. FROM alpine
Step 2. RUN apk --update add redis
Step 3. EXPOSE 6379
Step 4. CMD ["redis-server"]
Step 1, FROM alpine, gives image I1.
Step 2 starts a container C1 from I1 with the RUN command as its startup command, creating a new image I2 (and removing the intermediate container C1).
Step 3 uses image I2 to run a new container C2 with the EXPOSE command as its startup command and creates a new image I3 (removing the intermediate container C2).
Step 4 uses image I3 to start a new container C3 and runs the startup command CMD, which creates a new image I4 (removing the intermediate container C3).
Finally, I4 is presented as the final image. This image I4 will also have the same startup command, CMD, as image I3.
My question is: why do we have to run container C3 and create image I4? Why not just stop at image I3, with the startup command CMD that will run when a user creates a container from I3?
2 Answers
I found the solution to the question:
To start with, images I3 and I4 do not have the same startup command. Let me explain:
Whenever a step executes, Docker takes the image created by the previous step and creates a container from it. It then executes the current step's command inside this running container using `docker exec`, modifying the filesystem snapshot without modifying the startup command. The result is then used as the base image for the next step.

However, the behavior is slightly different for the final CMD step. In this step, Docker again takes the image generated by the penultimate step and creates a container. Then, instead of executing the instruction in the CMD step, it simply sets the startup command for the container, takes an image snapshot, and shuts down the intermediate container.
So the difference is that while in other steps the filesystem snapshot gets updated, in the ultimate step of CMD, only the startup command gets updated.
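To make that distinction concrete, here is a toy model in Python. It is only a sketch and not how Docker is implemented: the `Image` class, the `run`, `expose`, and `cmd` helpers, and the layer IDs are all hypothetical, and the `/bin/sh` default for the base image is an assumption about what `alpine` ships with. It shows that a RUN step appends a layer, while CMD changes only metadata, so I3 and I4 share every layer but differ in startup command and therefore in ID.

```python
import hashlib
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Image:
    """Toy image: an ordered tuple of layer IDs plus config metadata."""
    layers: tuple = ()
    cmd: tuple = ()
    exposed_ports: tuple = ()

    def image_id(self) -> str:
        # Hash everything that describes the image, loosely mirroring how
        # a real image ID is derived from the image's config JSON.
        blob = json.dumps(
            {"layers": self.layers, "cmd": self.cmd, "ports": self.exposed_ports},
            sort_keys=True,
        )
        return hashlib.sha256(blob.encode()).hexdigest()[:12]


def run(image: Image, command: str) -> Image:
    # RUN changes the filesystem, so it appends a new (hypothetical) layer ID.
    layer_id = hashlib.sha256(command.encode()).hexdigest()[:12]
    return replace(image, layers=image.layers + (layer_id,))


def expose(image: Image, port: str) -> Image:
    # EXPOSE is metadata only: no new layer.
    return replace(image, exposed_ports=image.exposed_ports + (port,))


def cmd(image: Image, argv: list) -> Image:
    # CMD is metadata only: it replaces the startup command, no new layer.
    return replace(image, cmd=tuple(argv))


# Assumption: the base image's default startup command is /bin/sh.
i1 = Image(layers=("alpine-base",), cmd=("/bin/sh",))
i2 = run(i1, "apk --update add redis")
i3 = expose(i2, "6379/tcp")
i4 = cmd(i3, ["redis-server"])

print(i3.layers == i4.layers)          # True: same filesystem layers
print(i3.cmd, i4.cmd)                  # ('/bin/sh',) ('redis-server',)
print(i3.image_id() != i4.image_id())  # True: different config, different ID
```

In this model, starting a container from I3 would launch the startup command inherited from the base image, not `redis-server`, which is why the final step still has to produce I4.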
It doesn’t run these containers; that only happens for RUN steps. There are also no new layers added; the only steps that can add filesystem layers are RUN, COPY, and ADD. This is a modification to the image’s config JSON metadata. The image manifest contains a reference to this config JSON, so you get a new image digest for every config change. With the classic build process, the easy way to generate that metadata is to create a container, but as the output here shows, it’s not being run; its status is only `created`:

Importantly, images are JSON that references layers and config metadata. You don’t copy all the layers, you only create additional references to the same layers, so it takes only a few KB to track another JSON file that points to another config file. You can see that when pushing to a registry:
And looking at the history section of the image config, it shows those steps as not having any layer data (`empty_layer: true`):

Docker tracks each of these separately because it impacts caching. Only when you start from the same previous state is it valid to reuse a cached step from a previous build.
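The caching rule can be sketched with a small Python model. This is a simplification under stated assumptions, not Docker's actual cache implementation: the `step_id` and `build` helpers are hypothetical, and the cache key here is just the pair (parent ID, instruction text), whereas a real builder also considers things such as the checksums of copied files.

```python
import hashlib


def step_id(parent_id: str, instruction: str) -> str:
    # Hypothetical ID for the state produced by running `instruction`
    # on top of the state identified by `parent_id`.
    return hashlib.sha256(f"{parent_id}\n{instruction}".encode()).hexdigest()[:12]


def build(instructions, cache):
    """Replay the instructions, returning the final ID and per-step cache hits."""
    parent = ""
    hits = []
    for instruction in instructions:
        key = (parent, instruction)  # same previous state AND same instruction
        hit = key in cache
        if not hit:
            cache[key] = step_id(parent, instruction)
        hits.append(hit)
        parent = cache[key]
    return parent, hits


cache = {}
first = [
    "FROM alpine",
    "RUN apk --update add redis",
    "EXPOSE 6379",
    'CMD ["redis-server"]',
]
# Same file, but with a different port in the EXPOSE step.
second = first[:2] + ["EXPOSE 6380", 'CMD ["redis-server"]']

_, hits_first = build(first, cache)
_, hits_again = build(first, cache)
_, hits_second = build(second, cache)

print(hits_first)   # [False, False, False, False]
print(hits_again)   # [True, True, True, True]
print(hits_second)  # [True, True, False, False]
```

In the last build, the CMD instruction is textually identical to the cached one, yet it misses the cache because the step before it produced a different state. That is the reason each metadata-only step needs its own tracked identity.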
Note that there are other ways to generate this metadata, and BuildKit does this without creating or running containers:

In the above, you can see `#5` executes the RUN, and then it immediately writes the result.