I’m trying to speed up my build pipeline by using a previously built Docker image as a cache. This works locally, but on Azure DevOps the pipeline rebuilds the Docker image from scratch every time. I’ve split up the instructions in the Dockerfile so that changes in the source code should only affect the last layer of the image.
Dockerfile:

```dockerfile
FROM my_teams_own_baseimage

# Set index for package installation of our own python packages
ARG INDEX_URL
ENV PIP_EXTRA_INDEX_URL=$INDEX_URL

# Copy the requirements file, set the working directory, and install
# python requirements for this project.
COPY requirements.txt /work/
WORKDIR /work
RUN python -m pip install pip --upgrade \
    && pip install --no-cache -r requirements.txt

# Copy all the remaining stuff into /work
COPY . /work/
```
Relevant pipeline steps:

- Authenticate pip to create the `PIP_EXTRA_INDEX_URL`, which is passed to the Docker build with `--build-arg`
- Log in to the Azure Container Registry
- Pull the "latest" image
- Build the new image using the "latest" image as cache
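For reference, the equivalent commands when I test this locally look roughly like the sketch below (the registry and image names are placeholders standing in for the pipeline variables):

```shell
# Placeholder values standing in for $(ACR), $(ImagePrefix), $(ImageName).
ACR="myregistry.azurecr.io"
IMAGE="$ACR/myprefix/myimage"

# Pull the previous build to seed the layer cache; tolerate failure
# on the very first build, when no "latest" tag exists yet.
docker pull "$IMAGE:latest" || true

# Build, telling Docker it may reuse layers from the pulled image.
docker build \
  --cache-from "$IMAGE:latest" \
  --build-arg INDEX_URL="$PIP_EXTRA_INDEX_URL" \
  -t "$IMAGE:new" .
```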
```yaml
- task: PipAuthenticate@1
  displayName: 'Pip authenticate'
  inputs:
    artifactFeeds: $(ArtifactFeed)
    onlyAddExtraIndex: true

- task: Docker@1
  displayName: 'Docker login'
  inputs:
    containerregistrytype: 'Azure Container Registry'
    azureSubscriptionEndpoint: '$(AzureSubscription)'
    azureContainerRegistry: '$(ACR)'
    command: 'login'

- script: "docker pull $(ACR)/$(ImagePrefix)$(ImageName):latest"
  displayName: "Pull latest image for layer caching"
  continueOnError: true # for first build, no cache

- task: Docker@1
  displayName: 'Build image'
  inputs:
    containerregistrytype: 'Azure Container Registry'
    azureSubscriptionEndpoint: '$(AzureSubscription)'
    azureContainerRegistry: '$(ACR)'
    command: 'Build an image'
    dockerFile: 'Dockerfile'
    arguments: |
      --cache-from $(ACR)/$(ImagePrefix)$(ImageName):latest
      --build-arg INDEX_URL=$(PIP_EXTRA_INDEX_URL)
    imageName: '$(ACR)/$(ImagePrefix)$(ImageName):$(ImageTag)'
```
When I do this locally, all layers come from the cache except the last one, as expected. On ADO, however, the only thing that is cached is the layers needed to download the image specified in the `FROM` instruction (so clearly something is cached), plus, weirdly, the second step (`ARG`) but nothing after it.
Output from the ADO pipeline log:

```
Step 1/11 : FROM my_teams_own_baseimage
 ---> 257aee2d50ca
Step 2/11 : ARG INDEX_URL
 ---> Using cache
 ---> 51b3ddad9198
Step 3/11 : ENV PIP_EXTRA_INDEX_URL=$INDEX_URL
 ---> Running in 01338566e424
Removing intermediate container 01338566e424
 ---> df80a24236d0
Step 4/11 : COPY requirements.txt /work/
 ---> 10c7e91c753e
Step 5/11 : WORKDIR /work
 ---> Running in af615be24108
Removing intermediate container af615be24108
 ---> f01b0b69df75
Step 6/11 : RUN python -m pip install pip --upgrade && pip install --no-cache -r requirements.txt
 ---> Running in 0266deda77c6
```
I have tried skipping the `Docker@1` ADO task and using an inline script like I would locally, but the result is the same.

My only idea is that the `INDEX_URL` is actually different each time and therefore invalidates the subsequent layers. I cannot print it out to check because it's a secret, so ADO masks it as `****`.
EDIT

After some more trial and error, it appears that the `PIP_EXTRA_INDEX_URL` created by the `PipAuthenticate@1` task is unique every time and only valid for a day or two. Because the `ARG` that passes in this value sits at the top of the Dockerfile (it is needed for the `pip install` command), every subsequent image layer is un-cached. I cannot find a way around this, except for a static `PIP_EXTRA_INDEX_URL`, but that seems non-ideal in a cloud environment.
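One possible workaround, not mentioned in the original post and only a sketch: BuildKit build secrets are not part of the layer cache key, so passing the index URL as a secret instead of a build arg means a changing value no longer invalidates subsequent layers. This assumes BuildKit is enabled (`DOCKER_BUILDKIT=1`) and the build is invoked with something like `docker build --secret id=pip_index,env=PIP_EXTRA_INDEX_URL .`:

```dockerfile
# syntax=docker/dockerfile:1
FROM my_teams_own_baseimage

COPY requirements.txt /work/
WORKDIR /work

# The secret is mounted only for this RUN step and is never stored in
# an image layer, so a rotating index URL cannot bust the cache.
RUN --mount=type=secret,id=pip_index \
    export PIP_EXTRA_INDEX_URL="$(cat /run/secrets/pip_index)" \
    && python -m pip install pip --upgrade \
    && pip install --no-cache -r requirements.txt

COPY . /work/
```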
2 Answers
I guess the answer to the question is that there is no way to use the cache when also passing in a variable (here, the `PIP_EXTRA_INDEX_URL`) at the top which changes every time, because all subsequent layers will be invalidated for caching purposes.

The way around this that I found was to do this part in our base image, which is generated once a day. Then only the `FROM` part of the `Dockerfile` changes, which means that the `pip install` step can be cached from an earlier version of the same build.

Are you using self-hosted or Microsoft-hosted agent pools for your pipeline? If you are running your pipeline on Microsoft-hosted agents, a new ephemeral agent is provisioned for each pipeline run. A newly provisioned agent has no history of previous pipeline executions, so it has no cache for your Docker build stage. The best solution in this case is to use a self-hosted agent pool, where you have more control over your pipeline's behaviour. If you are running more than one agent in a pool, you can always point your pipeline at a particular agent by specifying `agent.name` in the demands, to keep the cache of the Docker build task.
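The base-image split described in the first answer could look roughly like this; the base tag `my_teams_python_base:daily` is illustrative, not from the original post. The first file is built by a daily scheduled pipeline, the second per commit:

```dockerfile
# Dockerfile.base -- rebuilt once a day by a scheduled pipeline,
# so the rotating INDEX_URL only ever invalidates this build.
FROM my_teams_own_baseimage

ARG INDEX_URL
ENV PIP_EXTRA_INDEX_URL=$INDEX_URL

COPY requirements.txt /work/
WORKDIR /work
RUN python -m pip install pip --upgrade \
    && pip install --no-cache -r requirements.txt
```

```dockerfile
# Dockerfile -- the per-commit build; no build args are needed here,
# so only the final COPY layer is rebuilt when source code changes.
FROM my_teams_python_base:daily
WORKDIR /work
COPY . /work/
```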