I’m trying to speed up my build pipeline by using a previously built Docker image as a cache. This works locally, but on Azure DevOps the pipeline rebuilds the Docker image from scratch every time. I’ve split up the instructions in the Dockerfile so that changes in the source code should only affect the last layer of the image.
Dockerfile:

```dockerfile
FROM my_teams_own_baseimage

# Set index for package installation of our own python packages
ARG INDEX_URL
ENV PIP_EXTRA_INDEX_URL=$INDEX_URL

# Copy the requirements file, set the working directory, and install
# python requirements for this project.
COPY requirements.txt /work/
WORKDIR /work
RUN python -m pip install pip --upgrade \
    && pip install --no-cache -r requirements.txt

# Copy all the remaining stuff into /work
COPY . /work/
```
Relevant pipeline steps:

- Authenticate pip to create the `PIP_EXTRA_INDEX_URL`, which is passed to the Docker build with `--build-arg`
- Log in to the Azure Container Registry
- Pull the "latest" image
- Build the new image using the "latest" image as cache
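For reference, the equivalent commands when I test this locally look roughly like the sketch below (the registry and image names are placeholders standing in for the pipeline variables):

```shell
# Placeholder values standing in for $(ACR), $(ImagePrefix), $(ImageName).
ACR="myregistry.azurecr.io"
IMAGE="$ACR/myprefix/myimage"

# Pull the previous build to seed the layer cache; tolerate failure
# on the very first build, when no "latest" tag exists yet.
docker pull "$IMAGE:latest" || true

# Build, telling Docker it may reuse layers from the pulled image.
docker build \
  --cache-from "$IMAGE:latest" \
  --build-arg INDEX_URL="$PIP_EXTRA_INDEX_URL" \
  -t "$IMAGE:new" .
```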
```yaml
- task: PipAuthenticate@1
  displayName: 'Pip authenticate'
  inputs:
    artifactFeeds: $(ArtifactFeed)
    onlyAddExtraIndex: true

- task: Docker@1
  displayName: 'Docker login'
  inputs:
    containerregistrytype: 'Azure Container Registry'
    azureSubscriptionEndpoint: '$(AzureSubscription)'
    azureContainerRegistry: '$(ACR)'
    command: 'login'

- script: "docker pull $(ACR)/$(ImagePrefix)$(ImageName):latest"
  displayName: "Pull latest image for layer caching"
  continueOnError: true # for first build, no cache

- task: Docker@1
  displayName: 'Build image'
  inputs:
    containerregistrytype: 'Azure Container Registry'
    azureSubscriptionEndpoint: '$(AzureSubscription)'
    azureContainerRegistry: '$(ACR)'
    command: 'Build an image'
    dockerFile: 'Dockerfile'
    arguments: |
      --cache-from $(ACR)/$(ImagePrefix)$(ImageName):latest
      --build-arg INDEX_URL=$(PIP_EXTRA_INDEX_URL)
    imageName: '$(ACR)/$(ImagePrefix)$(ImageName):$(ImageTag)'
```
When I do this locally, all layers come from the cache except the last one, as expected. On ADO, however, the only thing that is cached is the layers needed to download the image specified in the `FROM` instruction (so clearly something is cached), plus, weirdly, the second step (`ARG`) but nothing after it.
Output from the ADO pipeline log:

```
Step 1/11 : FROM my_teams_own_baseimage
 ---> 257aee2d50ca
Step 2/11 : ARG INDEX_URL
 ---> Using cache
 ---> 51b3ddad9198
Step 3/11 : ENV PIP_EXTRA_INDEX_URL=$INDEX_URL
 ---> Running in 01338566e424
Removing intermediate container 01338566e424
 ---> df80a24236d0
Step 4/11 : COPY requirements.txt /work/
 ---> 10c7e91c753e
Step 5/11 : WORKDIR /work
 ---> Running in af615be24108
Removing intermediate container af615be24108
 ---> f01b0b69df75
Step 6/11 : RUN python -m pip install pip --upgrade && pip install --no-cache -r requirements.txt
 ---> Running in 0266deda77c6
```
I have tried skipping the `Docker@1` ADO task and using an inline script like I would locally, but the result is the same.

My only idea is that the `INDEX_URL` is actually different each time and therefore invalidates the subsequent layers. I cannot print it out to check because it's a secret, so ADO masks it as `****`.
EDIT

After some more trial and error, it appears that the `PIP_EXTRA_INDEX_URL` created by the `PipAuthenticate@1` task is unique every time and only valid for a day or two. Because the `ARG` that passes in this value sits at the top of the Dockerfile (it is needed for the `pip install` command), every subsequent image layer is un-cached. I cannot find a way around this, except for a static `PIP_EXTRA_INDEX_URL`, but that seems non-ideal in a cloud environment.
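One possible workaround, not mentioned in the original post and only a sketch: BuildKit build secrets are not part of the layer cache key, so passing the index URL as a secret instead of a build arg means a changing value no longer invalidates subsequent layers. This assumes BuildKit is enabled (`DOCKER_BUILDKIT=1`) and the build is invoked with something like `docker build --secret id=pip_index,env=PIP_EXTRA_INDEX_URL .`:

```dockerfile
# syntax=docker/dockerfile:1
FROM my_teams_own_baseimage

COPY requirements.txt /work/
WORKDIR /work

# The secret is mounted only for this RUN step and is never stored in
# an image layer, so a rotating index URL cannot bust the cache.
RUN --mount=type=secret,id=pip_index \
    export PIP_EXTRA_INDEX_URL="$(cat /run/secrets/pip_index)" \
    && python -m pip install pip --upgrade \
    && pip install --no-cache -r requirements.txt

COPY . /work/
```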
2 Answers
I guess the answer to the question is that there is no way to use the cache when also passing in a variable (here, the `PIP_EXTRA_INDEX_URL`) at the top which changes every time, because all subsequent layers will be invalidated for caching purposes.

The way around this that I found was to do this part in our base image, which is generated once a day. Then only the `FROM` part of the `Dockerfile` changes, which means that the `pip install` step can be cached from an earlier version of the same build.

Are you using self-hosted or Microsoft-hosted agent pools for your pipeline? If you are running your pipeline on Microsoft-hosted agents, a new ephemeral agent is provisioned for each pipeline run. A newly provisioned agent has no history of previous pipeline executions, so it has no cache for your Docker build stage. The best solution in this case is to use a self-hosted agent pool, where you have more control over your pipeline's behaviour. If you are running more than one agent in a pool, you can always point your pipeline at a particular agent by specifying `agent.name` in the demands, to keep the cache of the Docker build task.
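The base-image split described in the first answer could look roughly like this; the base tag `my_teams_python_base:daily` is illustrative, not from the original post. The first file is built by a daily scheduled pipeline, the second per commit:

```dockerfile
# Dockerfile.base -- rebuilt once a day by a scheduled pipeline,
# so the rotating INDEX_URL only ever invalidates this build.
FROM my_teams_own_baseimage

ARG INDEX_URL
ENV PIP_EXTRA_INDEX_URL=$INDEX_URL

COPY requirements.txt /work/
WORKDIR /work
RUN python -m pip install pip --upgrade \
    && pip install --no-cache -r requirements.txt
```

```dockerfile
# Dockerfile -- the per-commit build; no build args are needed here,
# so only the final COPY layer is rebuilt when source code changes.
FROM my_teams_python_base:daily
WORKDIR /work
COPY . /work/
```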