TL;DR: If my CI docker build instruction is something like
```shell
DOCKER_BUILDKIT=1 docker build \
  --cache-from registry.my.org/project-x/app:latest \
  --tag registry.my.org/project-x/app:latest \
  --tag registry.my.org/project-x/app:$CI_BUILD_NUM \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --build-arg BUILD_NUM=$CI_BUILD_NUM \
  .
```
how can I limit the expiration or maximum age of the acceptable image cache, so that the first `FROM <some-language>:latest` Dockerfile directive is refreshed weekly or so, triggering a full rebuild?
Context: My CI system, Bitbucket Pipelines, will not cache Docker layers produced with BuildKit, which I plan to enable company-wide for its various improvements. The suggested workaround is to use the `--build-arg BUILDKIT_INLINE_CACHE=1` and `--cache-from` options when building, so that previously published images serve as the cache. See https://support.atlassian.com/bitbucket-cloud/docs/run-docker-commands-in-bitbucket-pipelines/#Docker-BuildKit-caching-limitations
This would be awesome because it would free me from cache size limits, which currently result in frequent layer cache misses in big projects.
BUT
Due to the structure of my Dockerfiles, which usually go like this:
- Pull language runtime
- Update language package manager
- Install system requirements (rarely updated)
- Copy dependency pinning and install it (updated weekly)
- Copy application sources (rarely cached, though potentially cached in microservice monorepos, or when changes only touch files outside the build context, such as CI files)
- Enumerate the release with the CI incremental run number (never cached but super cheap!)
(See this example for a Python project; Node or PHP projects are written in a very similar fashion.)
```dockerfile
FROM python:3.9-slim
RUN pip install --upgrade pip
RUN apt-get update && apt-get install --assume-yes \
    gcc gettext libcurl4-openssl-dev libpangoft2-1.0-0 libssl-dev ... whatever
WORKDIR /app
COPY requirements.txt /app
RUN pip install --requirement requirements.txt
COPY . /app
ARG BUILD_NUM
RUN test -n "$BUILD_NUM"
ENV RELEASE_NUM=$BUILD_NUM
CMD ["python", "/app/main.py"]
```
I fear I will get a perfect cache hit forever on the preamble that installs the runtime, package manager, and system libraries, dragging them along on old versions over time.
Right now the Docker layer cache is cleared weekly, so the images eventually stay up to date!
Answers
Building on this mitigation https://stackoverflow.com/a/73003290/11715259, one possible workaround is to write a timestamp into an arbitrary file cached by the CI system and use it as a build arg right after the FROM directive, as a "cache epoch" that effectively invalidates the whole image cache whenever the contents of that file change.
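For illustration, here is a minimal sketch of that idea; the `CACHE_EPOCH` arg name and the `.cache-epoch` file are hypothetical. The trick relies on the fact that a RUN instruction consuming a build arg is invalidated, together with every later layer, whenever the arg's value changes:

```dockerfile
FROM python:3.9-slim

# Hypothetical cache-busting knob: the CI job rotates the value weekly,
# e.g. with `date +%G-%V > .cache-epoch`, and passes it at build time:
#   DOCKER_BUILDKIT=1 docker build --build-arg CACHE_EPOCH="$(cat .cache-epoch)" ...
ARG CACHE_EPOCH=unset
# Consuming the arg here means this layer, and everything after it,
# gets rebuilt whenever the epoch value changes.
RUN echo "cache epoch: ${CACHE_EPOCH}"

RUN pip install --upgrade pip
# ... rest of the Dockerfile unchanged
```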
But this feels quite cumbersome; I'd expect a BuildKit built-in mechanism instead.
A mitigation is to add the `--pull` option to the build instruction. If you are using popular official language base images, they are updated quite regularly, so this would suffice to invalidate all cached layers from time to time.
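Concretely, that just means adding the flag to the build command from the question, so Docker checks the registry for a newer base image on every build; when the base image digest changes, the FROM layer and everything after it are rebuilt:

```shell
DOCKER_BUILDKIT=1 docker build \
  --pull \
  --cache-from registry.my.org/project-x/app:latest \
  --tag registry.my.org/project-x/app:latest \
  --tag registry.my.org/project-x/app:$CI_BUILD_NUM \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --build-arg BUILD_NUM=$CI_BUILD_NUM \
  .
```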
Yet, if for some reason your base images are updated sparingly, you would consequently miss updates to libraries and package managers for long periods.
Altogether, this depends on the release cycle of your base images, so it is hardly controllable and not a canonical solution.