
One of the main benefits of Docker is reproducibility: you can specify exactly which programs and libraries get installed, and how and where.

However, I’m trying to think this through and can’t wrap my head around it. As I understand it, reproducibility means that if you request a certain tag, you receive the same image with the same contents every time. However, there are two issues with that:

  • Even if I try to specify a version as thoroughly as possible, for example python:3.8.3, I seem to have no guarantee that it points to a static non-changing image? A new version could be pushed to it at any time.
  • python:3.8.3 is a synonym for python:3.8.3-buster which refers to the Debian Buster OS image this is based on. So even if Python doesn’t change, the underlying OS might have changes in some packages, is that correct? I looked at the official Dockerfile and it does not specify a specific version or build of Debian Buster.

2 Answers


  1. If you depend on external Docker images, your Docker image indeed has no guarantee of reproducibility. The solution is to import the python:3.8.3 image into your own Docker registry, ideally one that can prevent tags from being overwritten (immutability), e.g. Harbor.

    However, making your Docker image reproducible takes more than fixing the base image you import. E.g. if you install some pip packages, and one of them does not pin the version of a package it depends on, you still have no guarantee that rebuilding your Docker image leads to the same image. Hosting those Python packages in your own pip repository (e.g. Artifactory) is again the solution here.
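To make the pinning point concrete, here is a minimal sketch that flags requirement lines without an exact `==` pin. The function name `unpinned_requirements` and the sample requirements are hypothetical, and the check only covers the lines it is given: transitive dependencies still need to be listed explicitly (for example from `pip freeze` output) before a check like this can see them.

```python
import re

# Exact pin such as "requests==2.24.0" or "requests[socks]==2.24.0".
# Ranges (>=, ~=) and wildcards ("1.*") are deliberately not accepted.
PINNED = re.compile(r"[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[A-Za-z0-9.!+_-]+")


def unpinned_requirements(text):
    """Return requirement lines that are not pinned to one exact version."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        spec = line.split(";", 1)[0].strip()  # ignore environment markers
        if not PINNED.fullmatch(spec):
            unpinned.append(line)
    return unpinned


# Hypothetical requirements file contents, for illustration only.
example = """\
requests==2.24.0
flask>=1.1   # range, not an exact pin
numpy
"""
print(unpinned_requirements(example))  # ['flask>=1.1', 'numpy']
```

A check like this could run before `docker build` so that an unpinned line fails the build early instead of silently changing the image later.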

  2. Addressing your individual concerns.

    • Even if I try to specify a version as thoroughly as possible, for example python:3.8.3, I seem to have no guarantee that it points to a static non-changing image? A new version could be pushed to it at any time.

    I posted this in my comment on your question, but I am addressing it here as well. Large packages use semantic versioning. In order for trust to work, it has to be established. This method of versioning introduces trust and consistency to an otherwise sometimes arbitrary system.

    The trust is that once they have uploaded 3.8.3, it will remain as constant as possible in the future. If they add another patch, they will upload 3.8.4; if they add a feature, they will upload 3.9.0; and if they break a feature, they will create 4.0.0. This assures you, the user, that 3.8.3 will behave the same, every time, everywhere.
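The bump rules described above can be sketched in a few lines. The function name `bump` and the change labels are hypothetical, and the sketch assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffix.

```python
def bump(version, change):
    """Return the next version for a 'patch', 'feature', or 'breaking' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "breaking":
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown change type: {change!r}")


for change in ("patch", "feature", "breaking"):
    print(change, "->", bump("3.8.3", change))
# patch -> 3.8.4
# feature -> 3.9.0
# breaking -> 4.0.0
```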

    Frameworks and operating systems often backport patches. PHP is known for this. If they find a security hole in v7 that was also in v5, they will update all versions of v5 that had it. While all the v5 versions were updated from their original published versions, functionality remained constant. This is important: this is the trust.

    So, unless you were “utilizing” that security hole to do what you needed to do, or relying on a bug, you can feel confident using 3.8.3 from Docker Hub.

    NodeJS is a great example. They keep all their old deprecated versions available in Docker Hub for archival purposes.

    I have been using named tags (NOT latest) from Docker Hub in all my projects for work and home, and I’ve never run into an issue after deployment where a project crashed because something changed “under my feet”. In fact, just last week, I rebuilt and updated some code on an older version of NodeJS (from 4 years ago) which required a repull, and because it was a named version (not latest), it worked exactly as expected.

    • python:3.8.3 is a synonym for python:3.8.3-buster which refers to the Debian Buster OS image this is based on. So even if Python doesn’t change, the underlying OS might have changes in some packages, is that correct? I looked at the official Dockerfile and it does not specify a specific version or build of Debian Buster.

    Once a child image (python) is built off a parent image (buster), it is immutable. The exception is if the child image (python) is rebuilt at a later date and CHOOSES to use a different version of the parent image (buster). But this is considered bad form and sneaky, and it undermines the PURPOSE of containers. I don’t know of any major package that does this.

    This is like doing a git push --force on your repository after you changed around some commits. It’s seriously bad practice.

    The system is designed and built on trust, and in order for it to be used, adopted and grow, the trust must remain. Always check the older tags of any container you want to use, and be sure they allow their old deprecated tags to live on.

    Thus, when you download python:3.8.3 today, or 2 years from now, it should function exactly the same.

    For example, if you run docker pull python:2.7.8 and then docker inspect python:2.7.8, you’ll find that it is the same image that was created 5 years ago.

            "Created": "2014-11-26T22:30:48.061850283Z",
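If you want to compare that field across pulls, a sketch like the following reads the "Created" timestamp out of `docker inspect` JSON. The helper name `created_at` is hypothetical, and `sample` is a trimmed stand-in for the real output (which contains many more fields) that reuses the timestamp quoted above.

```python
import json
from datetime import datetime, timezone


def created_at(inspect_output):
    """Parse the "Created" timestamp of the first object in `docker inspect` JSON."""
    created = json.loads(inspect_output)[0]["Created"]
    # Docker prints nanoseconds; keep only whole seconds so strptime can parse it.
    seconds = created.split(".", 1)[0].rstrip("Z")
    parsed = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


# Trimmed, illustrative stand-in for real `docker inspect python:2.7.8` output.
sample = '[{"Created": "2014-11-26T22:30:48.061850283Z"}]'
built = created_at(sample)
print(built.isoformat())  # 2014-11-26T22:30:48+00:00
```

Recording this value when you first pull a tag and comparing it on later pulls is one simple way to notice whether the tag has been pointed at a rebuilt image.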
    