I have a docker.tar file that contains numerous Docker images – the file is quite big, sitting at around 44 gigabytes. These images are loaded from the tar file, retagged, and then pushed to another registry. All in all, this entire process takes about 40 minutes due to how many images there are.
So far, I’ve managed to cut this time down to about 20 minutes by using xargs to push the images once they have been retagged. The loading of the images is the next thing I want to address, as it takes a fair amount of time as well.
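As a rough illustration of the push step only (the registry name, image filter, and parallelism level are my own placeholders, not taken from the real script, which may differ):

# Push all images retagged for the new registry, up to 4 at a time.
NEW_REGISTRY="registry.dest.example.com"   # placeholder
docker image list --format '{{.Repository}}:{{.Tag}}' \
  | grep "^${NEW_REGISTRY}/" \
  | xargs -P 4 -n 1 docker push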
I have tried using split to break the original tar file into smaller parts, and then using xargs to run docker load on each part in parallel; however, I get errors saying these new tar files are not valid archives (incorrect header, unexpected EOF, etc.).
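A rough reconstruction of that failing attempt (chunk size and file names are guesses on my part); because split cuts at arbitrary byte offsets, the resulting pieces are not self-contained tar archives, which is why docker load reports bad headers and unexpected EOF:

# Split the archive into fixed-size byte chunks...
split -b 2G docker.tar docker_part_
# ...and try to load each chunk in parallel. This fails: each chunk is just a
# slice of bytes, not a standalone tar file.
ls docker_part_* | xargs -P 4 -n 1 docker load --input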
Apart from that, I haven’t found much on the topic besides this thread: https://forums.docker.com/t/docker-save-load-performance/9245 – but the one comment there that offers a possible improvement deals with docker save rather than docker load.
Is there any other way I can improve the speed of docker load? Ideally, any improvements would need to be done in bash.
UPDATE: Please find below the process for how the images are saved, loaded, retagged, and pushed. If any more info is required, please comment.
- A Jenkins job is triggered which pulls the Docker images from the internal registry and saves them to a docker.tar file (a sketch of this step is included after the load script below)
- This tar file is then packaged alongside bash scripts (which will eventually load, retag, and push these images to a new registry) inside a CSAR package: https://wiki.onap.org/display/DW/Csar+Structure
- An external user is then given this package – they unzip the CSAR package
- From there, they run the provided script to load the images from the docker.tar file:
RED="33[1;31m"
GREEN="33[1;32m"
NOCOLOR="33[0m"
IMAGE_TAR=$1
logger() {
echo -e "`date '+%Y-%m-%d %H:%M:%S:'` $@" | tee -a ${LOG_FILE}
}
logger "Loading [$IMAGE_TAR]..."
docker load --input ${IMAGE_TAR}
LIST_OF_DOCKER_IMAGES=`docker image list | awk '{print $1":"$2;}'`
This script is then called like so: bash load_docker_images.sh docker.tar
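For context, a hedged sketch of the save step that the Jenkins job performs (the image-list file and its format are my assumptions; the real job may build docker.tar differently):

# Pull each image from the internal registry, then bundle them all into one tar.
IMAGE_LIST="images.txt"                      # assumption: one image:tag per line
xargs -n 1 docker pull < "${IMAGE_LIST}"
docker save --output docker.tar $(cat "${IMAGE_LIST}")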
2 Answers
There’s no need to use docker pull; docker save; docker load; docker tag; docker push in this scenario. All of those steps transform the content, moving it in and out of the backend storage system, extracting it to temporary directories, etc., none of which is needed simply to move an image between two registries across an air gap.
The common solution these days is to use an OCI Layout to store the image on the filesystem, optionally tarring up the content (if you already tar up some other files in the process of packaging the scripts, I’d skip that as redundant). Multiple tools allow you to copy images to or from an OCI Layout, including crane from Google, oras from Microsoft, skopeo from RedHat, and regctl from myself. E.g. with regctl, the commands would look like:
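A hedged sketch, assuming regctl’s image copy subcommand and its ocidir:// syntax for referring to an OCI Layout on disk; the registry names, repository path, and tag are placeholders:

# On the connected side: copy the image from the source registry into an OCI
# Layout directory on disk (registry names and paths are placeholders).
regctl image copy registry.internal.example.com/myapp:1.0 ocidir://images/myapp:1.0
# Ship the "images" directory (or a tar of it) inside the package, then on the
# air-gapped side push straight from the OCI Layout to the destination registry.
regctl image copy ocidir://images/myapp:1.0 registry.dest.example.com/myapp:1.0

The OCI Layout directory (here images/) would then travel inside the CSAR package in place of docker.tar.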
Yes there is.
The docker save command has to flatten all the layers of an image in order to create the tar. This can easily be verified by extracting the tar. Then the tar is transferred over the network. Then docker load has to unpack all the layers from the archive. This is slow.
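For example, you can list the archive contents without extracting it (assuming the file produced by docker save is named docker.tar); each layer appears as its own entry:

# Each layer shows up as a separate blob or layer.tar entry, depending on the
# Docker version that produced the archive.
tar -tvf docker.tar | head -n 20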
You should use an image registry, either public or private.
Even if the image is transferred twice, once for docker push and once for docker pull, layers that are already present, whether on the source, on the registry, or on the destination, are not re-transferred, which means updates are fast. Moreover, layers are downloaded in parallel and extracted without waiting for the last one, which is not possible with a tar.
And if the image has to be deployed to more than one machine, you save even more.
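A minimal sketch of the registry-based flow this answer recommends, with placeholder registry hostnames and image name; only the layers missing on the destination are actually uploaded:

# Pull from the source registry, retag for the destination, and push.
docker pull registry.internal.example.com/myapp:1.0
docker tag registry.internal.example.com/myapp:1.0 registry.dest.example.com/myapp:1.0
docker push registry.dest.example.com/myapp:1.0   # only missing layers are uploaded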