Can someone explain to me in simple terms what
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes -c yes
does when called right before doing a docker build of a container from a Dockerfile?
I have the notion that it permits running containers built for other architectures on the x86 architecture, but I am not sure I quite understand the explanations I found on some sites.
Does the presence of the above instruction (docker run) imply that the Dockerfile of the build stage is for another architecture?
2 Answers
I too had this question recently, and I don’t have a complete answer, but here is what I do know, or at least believe:
Setup & Test
The magic to set up, required once per reboot of the system, is just this:
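A minimal sketch of that setup step plus a quick test. It assumes root's dockerd is already running (hence the sudo on the registration step) and that the arm64v8/ubuntu image is used for the test. The run wrapper and the DRY_RUN variable are hypothetical helpers added here so the commands can be previewed without executing them; the --reset -p yes flags are the ones from the question.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Register the QEMU interpreters with the kernel (needed once per boot).
# Assumption: this goes through root's dockerd, hence sudo.
setup_qemu_binfmt() {
  run sudo docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
}

# Quick test: an arm64 userland should report its own architecture
# (aarch64) if the registration worked.
test_arm64() {
  run docker run --rm -t arm64v8/ubuntu uname -m
}

# Usage (preview only, nothing is executed):
#   DRY_RUN=1 setup_qemu_binfmt
#   DRY_RUN=1 test_arm64
```

Dropping DRY_RUN=1 runs the commands for real, which requires docker and permission to start privileged containers.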
Note that the test example assumes that you are running your own personal "rootless-" docker, therefore as yourself, not as root (nor via sudo), and it works just dandy.
Gory Details
… which are important if you want to understand how/why this works.
The main sources for this info cover using buildx to build multi/cross-arch images, and qemu-user-static.
The fundamental trick to making this work is to register new "magic" strings with the kernel (via binfmt_misc) so that when an (ARM) executable is run inside a docker image, the kernel recognizes the bin-fmt and uses the QEMU interpreter (from the multiarch/* docker image) to execute it. Before we set up the bin formats, the kernel's binfmt_misc registry (/proc/sys/fs/binfmt_misc) contains no QEMU entries. After we start (root's) dockerd and set up the formats, the QEMU handlers appear there.
Now we can run an ARM version of Ubuntu:
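As a hedged sketch of both steps, the before/after check and the ARM run: list_qemu_binfmts is a hypothetical helper that lists the qemu-* entries under /proc/sys/fs/binfmt_misc (the standard binfmt_misc mount point), and the image name arm64v8/ubuntu is an assumption.

```shell
# Hypothetical helper: list registered QEMU handlers, one name per line.
# The directory argument defaults to the usual binfmt_misc mount point.
list_qemu_binfmts() {
  dir="${1:-/proc/sys/fs/binfmt_misc}"
  for entry in "$dir"/qemu-*; do
    # Skip the unexpanded pattern when nothing is registered.
    [ -e "$entry" ] || continue
    basename "$entry"
  done
}

# Before registration this typically prints nothing;
# afterwards it prints names such as qemu-aarch64.
list_qemu_binfmts

# Running an ARM Ubuntu userland (commented out because it needs docker):
#   docker run --rm -t arm64v8/ubuntu uname -m
```

When no platform is requested, docker usually prints a warning that the image's platform does not match the host's.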
The warning is to be expected since the host CPU is amd64 (x86_64) while the image is built for ARM, and it can be gotten rid of by specifying the platform to docker:
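A sketch of what that could look like, using docker run's --platform option. The PLATFORM and IMAGE variables and the arm_run_command helper are hypothetical names for illustration, and arm64v8/ubuntu is again an assumed image.

```shell
# Assumed values for illustration.
PLATFORM="linux/arm64"
IMAGE="arm64v8/ubuntu"

# Hypothetical helper: print the docker command with the platform named
# explicitly, so it can be inspected before it is run.
arm_run_command() {
  echo "docker run --rm -t --platform $PLATFORM $IMAGE uname -m"
}

arm_run_command

# To actually execute it (needs docker and the registered QEMU handlers):
#   eval "$(arm_run_command)"
```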
How does this really work?
At the base of it is just QEMU's ability to interpose a dynamic binary translation (DBT) layer that translates the instruction set of one system to that of the underlying platform.
The only trick we have to do is tell the underlying system where to find those interpreters. That's what the qemu-user-static image does in registering the binary format magic strings / interpreters. So, what's in those binfmts? The registered entry names /usr/bin/qemu-aarch64-static as its interpreter.
Huh, that's interesting, especially because on the host system there is no /usr/bin/qemu-aarch64-static, and it's not in the target image either, so where does this thing live? It's in the qemu-user-static image itself, with the appropriate tag of the form <HOST-ARCH>-<GUEST-ARCH>, as in multiarch/qemu-user-static:x86_64-aarch64.
That's the real magic that I don't yet quite understand. Somehow docker is, I believe, using that image to spin up the QEMU interpreter, and then feeding it the code from the actual image/container you want to run, as in the uname example from earlier. Some web-searching left me unsatiated as to how this magic is achieved, but I'm guessing if I kept following links from here I might find the true source of that sleight-of-hand.
To complement @crimson-egret's answer: The
fix-binary flag in binfmt_misc was used to make the statically compiled qemu emulator work across different namespaces/chroots/containers.
In the doc for binfmt_misc you can find the explanation of the fix-binary flag, and this bug report also explains it.
If you use the qemu-user-static image without the -p yes option, the fix-binary flag won't be added, and running the arm64 container won't work because now the kernel will actually try to open the qemu emulator in the container's root filesystem:
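The difference shows up in the registered entry itself. A minimal sketch, assuming the entry file has the usual binfmt_misc layout with an "interpreter" line and a "flags:" line; binfmt_interpreter and has_fix_binary are hypothetical helpers, and the default path is the standard binfmt_misc mount point.

```shell
# Hypothetical helpers for inspecting one binfmt_misc entry.
ENTRY_DEFAULT=/proc/sys/fs/binfmt_misc/qemu-aarch64

# Print the interpreter path recorded in the entry.
binfmt_interpreter() {
  sed -n 's/^interpreter //p' "${1:-$ENTRY_DEFAULT}"
}

# Print "yes" if the flags line contains F (fix-binary), otherwise "no".
has_fix_binary() {
  flags=$(sed -n 's/^flags: *//p' "${1:-$ENTRY_DEFAULT}")
  case "$flags" in
    *F*) echo yes ;;
    *)   echo no ;;
  esac
}

# Usage on a host with the handlers registered:
#   binfmt_interpreter
#   has_fix_binary
```

With -p yes the flags line is expected to include F. Without it, the kernel looks up the interpreter path inside the container's root filesystem when an arm64 binary is executed, and the emulator is not there.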