I’ve just started learning Docker for work, and my current belief is that if one runs a Docker container with the --rm
option, i.e.
docker run --rm mycontainer python3 ./mycode.py --logging='true'
then whatever output is produced disappears when the container exits. However, I just came across some code documentation that said:
"--rm: removes container at end of execution. Note: since output is stored in a volume, this persists beyond the life of the container"
What does this mean?
The original command this came from had the form:
docker run -it --name p2p -d --rm \
  --mount type=bind,source=/home/me/scripts,target=/scripts \
  --mount source=data,target=/data \
  --mount source=output,target=/output \
  --gpus device=GPU-5jhfjhjhjg-jhg-jgjgjh \
  my_docker_container \
  python3 mycode.py --logging='true' <lots of other flags>
What does it mean "the output is stored in a volume" and how do I go about finding this volume?
2 Answers
Docker volumes are basically just directories on the host, usually under /var/lib/docker/volumes. It’s a little trickier to get to /var/lib/docker on OS X, because Docker runs inside a Linux VM there.

You can run docker volume ls to list the volumes, and docker volume inspect <id> to get the path on disk (the Mountpoint field). The volume should hang around after the container is removed, unless you explicitly remove it with docker volume rm <id> or run docker system prune with the --volumes flag (and should be automatically re-attached by running the same command).

The first command in your question doesn’t mount a volume, so anything it writes inside the container is removed along with it, although I’ve occasionally lucked out and found data under /var/lib/docker that hasn’t been deleted/garbage-collected/etc yet. The original command is different: its --mount source=data,target=/data and --mount source=output,target=/output options attach named volumes called data and output (volume is the default --mount type), which is why that output persists.

Volumes are the primary method of persisting data beyond the lifetime of a container, and also of sharing data between containers: provided that the volume is writable, a change one container makes is visible to the others.
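As a rough sketch of how one might locate a named volume and look inside it, the two helper functions below wrap docker volume inspect and a throwaway container. The function names volume_path and volume_ls are made up for illustration, and the alpine image is an assumption (any image that ships ls would do).

```shell
# Print the host directory that backs a named volume.
# volume_path is a hypothetical helper name, not part of Docker.
volume_path() {
    docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# List the files in a named volume without digging through /var/lib/docker:
# mount the volume read-only into a short-lived container and run ls there.
# Assumption: the alpine image is available or can be pulled.
volume_ls() {
    docker run --rm --mount "source=$1,target=/mnt,readonly" alpine ls -l /mnt
}
```

With the command from the question, volume_path output would print the directory behind the output volume, and volume_ls output would show whatever the script wrote to /output, even after the --rm container is gone. Reading the host path directly typically needs root on Linux, which is one reason to go through a container instead.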
Think of it like network file storage shared between two computers that have no hard disks of their own. If one computer shuts down and restarts, it has no disk of its own to load persisted data from, but because of the network file storage it can still see the latest changes made by the other machine. The same goes for volumes and containers.
The source of a Docker volume could be any logical abstraction of persistent disk storage, whether it’s a Windows drive or a Linux mount point. When mounting a volume on a container, you’re basically creating a Linux mount point within the container that points to the outside storage, so that the container sees what the host sees and vice versa. For example, in the command you shared, the contents of the host directory /home/me/scripts are seen by the container under /scripts. In fact, if you open a shell in the container and run rm on any file in /scripts, the file is removed from /home/me/scripts as well, because host and container are pointing at the same thing.
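The "same thing seen from both sides" idea can be tried out with a small experiment. This is only a sketch: demo_bind_mount is a made-up function name, the scratch directory comes from mktemp rather than /home/me/scripts, and the alpine image is an assumption.

```shell
# Hypothetical demo: write a file from inside a container into a bind-mounted
# host directory, then read it back on the host.
demo_bind_mount() {
    # Scratch directory on the host, standing in for /home/me/scripts.
    host_dir=$(mktemp -d)

    # The container sees the host directory as /scripts and writes into it.
    docker run --rm --mount "type=bind,source=$host_dir,target=/scripts" alpine \
        sh -c 'echo "written in container" > /scripts/hello.txt'

    # The container is already removed, but the file is on the host.
    cat "$host_dir/hello.txt"
}
```

Calling demo_bind_mount should print the line that was written inside the container, since /scripts and the scratch directory are one and the same location.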
Volumes are essential for running databases in containers because the container by itself is ephemeral and everything written inside it is lost when it is removed. But having a volume means that if the db container is started up again with the same volume mount, pointing to the place where the db data resides, the db state remains intact.
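A database is more than is needed to see this effect. The sketch below uses a plain log file as a stand-in for database state; record_run and the volume name demo_state are invented for illustration, and the alpine image is again an assumption.

```shell
# Each call starts a brand-new container that removes itself on exit (--rm),
# appends one line to a file in the named volume, and prints the line count.
# Assumption: demo_state is a made-up volume name; Docker creates the volume
# on first use if it does not exist yet.
record_run() {
    docker run --rm --mount source=demo_state,target=/state alpine \
        sh -c 'date >> /state/runs.log; wc -l < /state/runs.log'
}
```

Running record_run several times should report one more line on each call, even though no container survives between calls. The state lives in the volume, which can be cleaned up afterwards with docker volume rm demo_state.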
Most of what I said is aimed at getting the basic idea of a volume across rather than at being completely accurate; I hope you get what I am saying. Here is a great article that goes deeper into Docker volumes. It’s 5 years old but the concept still holds.