What is the idiomatic way to write a Dockerfile for building against many different versions of the same compiler?
I have a project which tests against a wide range of versions of different compilers, like gcc and clang, as part of a CI job. At some point, the agents for the CI tasks were updated/changed, resulting in newer jobs failing, so I've started looking into dockerizing these builds to try to guarantee better reliability and stability.
However, I'm having some difficulty understanding the proper, idiomatic approach to producing build images like this without a large amount of duplicated layers.
For example, let’s say I want to build using the following toolset:
- gcc (4.8, 4.9, 5.1, … various versions)
- cmake (latest)
- ninja-build
I could write something like:
# syntax=docker/dockerfile:1.3-labs
# Parameterizing here possible, but would cause bloat from duplicated
# layers defined after this
FROM gcc:4.8
ENV DEBIAN_FRONTEND=noninteractive
# Set the work directory
WORKDIR /home/dev
COPY . /home/dev/
# Install tools (cmake, ninja, etc)
# this will cause bloat if the FROM layer changes
RUN <<EOF
apt update
apt install -y cmake ninja-build
rm -rf /var/lib/apt/lists/*
EOF
# Default command is to use CMake
CMD ["cmake"]
However, the installation of tools like ninja-build and cmake occurs after the base image, which changes per compiler version. Since these layers are built off of a different parent layer, this would (as far as I'm aware) result in layer duplication for each different compiler version that is used.
One alternative to avoid this duplication could hypothetically be to use a smaller base image like alpine and install the compiler separately instead. The tools could be installed first so the layers remain shared, and only the compiler changes as the last layer (as sketched below). However, this presents its own difficulties, since certain compiler versions often require custom steps, such as installing certain keyrings.
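A rough sketch of that ordering (the package names and the COMPILER_PKG argument are assumptions, just to illustrate the layer order):

FROM alpine
# shared tools first: these layers are identical for every compiler image
RUN apk add --no-cache cmake ninja
# only the final layer varies per compiler
ARG COMPILER_PKG=gcc
RUN apk add --no-cache ${COMPILER_PKG}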
What is the idiomatic way of accomplishing this? Would this typically be done through multiple docker files, or a single docker file with parameters? Any examples would be greatly appreciated.
3 Answers
Put an ARG before the FROM and then invoke the ARG as the FROM, like so:
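A minimal sketch (the default value for COMPILER below is an assumption):

# COMPILER selects the base image; declared before FROM so FROM can use it
ARG COMPILER=gcc:4.8
FROM ${COMPILER}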
Then you run:

docker build . -t test/clang-8 --build-arg COMPILER=clang-8

or similar.
If you want to automate it, just make a list of compilers and a bash script that loops over the lines of the file, passing each line as input to the tag and the COMPILER build arg.
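For instance, a sketch assuming a file compilers.txt with one image reference per line:

# compilers.txt is a hypothetical list, e.g. gcc:4.8, gcc:4.9, clang-8, ...
while read -r compiler; do
  docker build . -t "test/${compiler}" --build-arg COMPILER="${compiler}"
done < compilers.txt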
As for CMake, I'd just do:
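One plausible sketch, assuming the distribution packages are recent enough:

RUN apt-get update && \
    apt-get install -y cmake ninja-build && \
    rm -rf /var/lib/apt/lists/*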
When copying, I find it cleaner to do:
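Something along these lines, assuming the working directory is set first so the destination stays relative:

WORKDIR /home/dev
COPY . .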
I would separate the steps of preparing the compiler and running the build, so the source doesn't become part of the docker container.
Prepare Compiler
For preparing the compiler I would take the ARG approach, but without copying the data into the container. If you want fast retries and have enough resources, you can spin up multiple instances at the same time.
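A sketch of such an image, assuming a GCC_VERSION build argument and the tools from the question:

# GCC_VERSION is a hypothetical build argument selecting the gcc tag
ARG GCC_VERSION=4.8
FROM gcc:${GCC_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
# tools only; the source tree is never copied in
RUN apt-get update && \
    apt-get install -y cmake ninja-build && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /src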
Build it
Here you have a few options. You could either prepare a volume with the sources, or use bind mounts together with docker exec, like this:
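A sketch, assuming the image above was tagged test/gcc-4.8 and the sources are in the current directory:

# keep a container running with the sources bind-mounted
docker run -d --name builder -v "$(pwd)":/src -w /src test/gcc-4.8 sleep infinity
# drive the build from outside with docker exec
docker exec builder sh -c "mkdir -p build && cd build && cmake -G Ninja .. && ninja"
docker rm -f builder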
And because the source is not part of the docker image, you don't have bloat. You can also have two mounts: one for a read-only source tree and one for the output files.
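For example (the host paths are illustrative):

docker run --rm \
  -v "$(pwd)/src":/src:ro \
  -v "$(pwd)/out":/out \
  test/gcc-4.8 sh -c "cd /out && cmake -G Ninja /src && ninja"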
Note: if you remove the CMake command, you could also spin up the docker containers in parallel and use docker exec to start the builds. The downside of this is that you have to take care of out-of-source builds to avoid clashes on the output folder.
As far as I know, there is no way to do that easily and safely. You could use a RUN --mount=type=cache, but mind what the documentation says about it. I have not tried it, but I guess the layers are duplicated anyway; you just save time, assuming the cache is not emptied.
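For reference, the syntax looks roughly like this (a sketch; the apt cache paths follow the BuildKit documentation's apt example):

# syntax=docker/dockerfile:1
FROM gcc:4.8
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    rm -f /etc/apt/apt.conf.d/docker-clean && \
    apt-get update && apt-get install -y cmake ninja-build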
The other possible solution you have is similar to the one you mention in the question: starting with the tools installation and then customizing it with the gcc image. Instead of starting with an alpine image, you could start FROM scratch. scratch is basically the empty image: you could COPY the files generated by the tools installation, and then COPY the entire gcc filesystem. However, I am not sure it will work, because the order of the initial layers is now reversed. This means that some files that were in the upper layer (coming from the tools) are now in the lower layer and could be overwritten. In the comments, I asked you for a working Dockerfile because I wanted to try this out before answering. If you want, you can try this method and let us know. Anyway, the first step is extracting the files created by the tools layer.

How to extract changes from a layer?
Let's consider this Dockerfile and build it with docker build -t test .:
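A stand-in sketch with three layer-creating instructions (the exact content is an assumption; it only matters that it adds three layers):

FROM debian
# each of the following instructions creates one new layer
RUN apt-get update && apt-get install -y cmake
RUN apt-get install -y ninja-build
COPY . /app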
Now that we have built the test image, we should find 3 new layers. You mainly have 2 ways to extract the changes from each layer:

- The first is docker inspect-ing the image and then finding the ids of the layers in the /var/lib/docker folder, assuming you are on Linux. Each layer has a diff subfolder containing the changes. Actually, I think it is more complex than this, which is why I would opt for…
- skopeo: you can install it with apt install skopeo, and it is a very useful tool for operating on docker images. The command you are interested in is copy, which extracts the layers of an image and exports them as .tar files:
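A sketch of the invocation, reading the image from the local daemon (the destination folder name is arbitrary):

skopeo copy docker-daemon:image_name:latest dir:./layers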
where image_name is test in this case.

Extracting layer content with Skopeo
In the specified folder, you should find some tar files and a configuration file (look at the skopeo copy command output and you will know which one it is). Then extract each {layer}.tar into a different folder and you are done.

Note: to find the layer containing your tools, just open the configuration file (maybe using jq, because it is JSON) and take the diff_id that corresponds to the RUN instruction you find in the history property. You should understand it once you open the JSON configuration. This is unnecessary if you have a small image that has, for example, debian as parent image and a single RUN instruction containing the tools you want to install.
Get GCC image content

Now that we have the tools layer content, we need to extract the gcc filesystem. We don't need skopeo for this one; docker export is enough:

- create a container from gcc (with the tag you need),
- export it as a tar,
- finally, extract the tar file (see the sketch below).
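Putting those three steps together (the container and file names are illustrative):

docker create --name gcc-fs gcc:4.8       # 1. a stopped container from the image
docker export gcc-fs -o gcc.tar           # 2. its whole filesystem as a tar archive
mkdir -p gcc && tar -xf gcc.tar -C gcc    # 3. extracted into a folder
docker rm gcc-fs                          # clean up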
Putting it all together
The final Dockerfile could be something like:
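A sketch consistent with the description, assuming ./tools and ./gcc_4.8 are the folders holding the extracted tools layer and gcc filesystem:

FROM scratch
# the shared tools layer goes first, so it is identical for every compiler image
COPY ./tools/ /
# the compiler filesystem goes last; the folder is parameterized per build
ARG GCC_DIR=gcc_4.8
COPY ./${GCC_DIR}/ /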
In this way, the tools layer is always reused (unless you change the content of that folder, of course), but you can parameterize the gcc_4.x folder with the ARG instruction, for example.

Read carefully: all of this is not tested, but you might encounter 2 issues:
- The gcc image overwrites some files you have changed in the tools layer. You could check whether this happens by computing the diff between the gcc layer folder and the tools layer folder. If it happens, you can only keep track of those files and add them in the Dockerfile after the COPY ./gcc ... with another COPY.
- When a file is removed in an upper layer, docker marks that file with a .wh. prefix (not sure if it is different with skopeo). If in the tools layer you delete a file that exists in the gcc layer, then that file will not be deleted using the above Dockerfile (the COPY ./gcc ... instruction would overwrite the .wh. marker). In this case too, you would need to add an additional RUN rm ... instruction.

This is probably not the correct approach if you have a more complex image than the one you are showing us. In my opinion, you could give this a try and just see if it works out with a single Dockerfile. Obviously, if you have many compilers, each with its own tool set, the maintainability of this approach could be a real burden. Instead, if the Dockerfile is more or less linear for all the compilers, this might be good (after all, you do not do this every day).
Now the question is: is avoiding layer replication so important that you are willing to complicate the image-building process this much?