I have a docker.tar file that contains numerous Docker images – the file is quite big, sitting at around 44 gigabytes. These images are loaded from the tar file, retagged, and then pushed to another registry. All in all, this entire process takes about 40 minutes due to how many images there are.
So far, I’ve managed to cut this time down to about 20 minutes by using xargs to push the images once they have been retagged. The loading of the images is the next thing I want to address, as it takes a fair amount of time as well.
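As a rough illustration of the push step only (the registry name, image filter, and parallelism level are my own placeholders, not taken from the real script, which may differ):

# Push all images retagged for the new registry, up to 4 at a time.
NEW_REGISTRY="registry.dest.example.com"   # placeholder
docker image list --format '{{.Repository}}:{{.Tag}}' \
  | grep "^${NEW_REGISTRY}/" \
  | xargs -P 4 -n 1 docker push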
I have tried using split to break the original tar file into smaller parts, and then using xargs to run docker load on each part in parallel; however, I get errors saying these new tar files are not valid archives (incorrect header, unexpected EOF, etc.).
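A rough reconstruction of that failing attempt (chunk size and file names are guesses on my part); because split cuts at arbitrary byte offsets, the resulting pieces are not self-contained tar archives, which is why docker load reports bad headers and unexpected EOF:

# Split the archive into fixed-size byte chunks...
split -b 2G docker.tar docker_part_
# ...and try to load each chunk in parallel. This fails: each chunk is just a
# slice of bytes, not a standalone tar file.
ls docker_part_* | xargs -P 4 -n 1 docker load --input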
Apart from that, I haven’t found much on the topic besides this thread: https://forums.docker.com/t/docker-save-load-performance/9245 – but the one comment there that offers a possible improvement deals with docker save rather than docker load.
Is there any other way I can improve the speed of docker load? Ideally, any improvements would need to be done in bash.
UPDATE: Please find below the process for how the images are saved, loaded, retagged, and pushed. If any more info is required, please comment.
- A Jenkins job is triggered which pulls the Docker images from the internal registry and saves them to a docker.tar file (a sketch of this step is included after the load script below)
- This tar file is then packaged alongside bash scripts (which will eventually load, retag, and push these images to a new registry) inside a CSAR package: https://wiki.onap.org/display/DW/Csar+Structure
- An external user is then given this package – they unzip the CSAR package
- From there, they run the provided script to load the images from the docker.tar file:
RED="33[1;31m"
GREEN="33[1;32m"
NOCOLOR="33[0m"
IMAGE_TAR=$1
logger() {
echo -e "`date '+%Y-%m-%d %H:%M:%S:'` $@" | tee -a ${LOG_FILE}
}
logger "Loading [$IMAGE_TAR]..."
docker load --input ${IMAGE_TAR}
LIST_OF_DOCKER_IMAGES=`docker image list | awk '{print $1":"$2;}'`
This script is then called like so: bash load_docker_images.sh docker.tar
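For context, a hedged sketch of the save step that the Jenkins job performs (the image-list file and its format are my assumptions; the real job may build docker.tar differently):

# Pull each image from the internal registry, then bundle them all into one tar.
IMAGE_LIST="images.txt"                      # assumption: one image:tag per line
xargs -n 1 docker pull < "${IMAGE_LIST}"
docker save --output docker.tar $(cat "${IMAGE_LIST}")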
2 Answers
There’s no need to use docker pull; docker save; docker load; docker tag; docker push in this scenario. All of those steps transform the content, moving it in and out of the backend storage system, extracting it to temporary directories, etc., none of which is needed simply to move an image between two registries across an air gap.
The common solution these days is to use an OCI Layout to store the image on the filesystem, optionally tarring up the content (if you already tar up some other files in the process of packaging the scripts, I’d skip that as redundant). Multiple tools allow you to copy images to or from an OCI Layout, including crane from Google, oras from Microsoft, skopeo from RedHat, and regctl from myself. E.g. with regctl, the commands would look like:
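A hedged sketch, assuming regctl’s image copy subcommand and its ocidir:// syntax for referring to an OCI Layout on disk; the registry names, repository path, and tag are placeholders:

# On the connected side: copy the image from the source registry into an OCI
# Layout directory on disk (registry names and paths are placeholders).
regctl image copy registry.internal.example.com/myapp:1.0 ocidir://images/myapp:1.0
# Ship the "images" directory (or a tar of it) inside the package, then on the
# air-gapped side push straight from the OCI Layout to the destination registry.
regctl image copy ocidir://images/myapp:1.0 registry.dest.example.com/myapp:1.0

The OCI Layout directory (here images/) would then travel inside the CSAR package in place of docker.tar.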
Yes there is.
The docker save command has to flatten all the layers of an image in order to create the tar. This can easily be verified by extracting the tar. Then the tar is transferred over the network. Then docker load has to unpack all the layers from the archive. This is slow.
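For example, you can list the archive contents without extracting it (assuming the file produced by docker save is named docker.tar); each layer appears as its own entry:

# Each layer shows up as a separate blob or layer.tar entry, depending on the
# Docker version that produced the archive.
tar -tvf docker.tar | head -n 20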
You should use an image registry, either public or private.
Even if the image is transferred twice, once for docker push and once for docker pull, layers that are already present, whether on the source, on the registry, or on the destination, are not re-transferred, which means updates are fast. Moreover, layers are downloaded in parallel and extracted without waiting for the last one, which is not possible with a tar.
And if the image has to be deployed to more than one machine, you save even more.
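A minimal sketch of the registry-based flow this answer recommends, with placeholder registry hostnames and image name; only the layers missing on the destination are actually uploaded:

# Pull from the source registry, retag for the destination, and push.
docker pull registry.internal.example.com/myapp:1.0
docker tag registry.internal.example.com/myapp:1.0 registry.dest.example.com/myapp:1.0
docker push registry.dest.example.com/myapp:1.0   # only missing layers are uploaded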