Optimizing Docker Image Storage and Cache Management in Self-Hosted Build Agents

BhaveshPatil
July 11, 2024

I am facing a significant issue with my self-hosted build agents in my CI/CD pipeline, and I need a comprehensive solution. Here are the details:

Build Agents and Services

I have 6 build agents.
I manage 30-40 services in the pipeline.

Problem

During the build process, my self-hosted build agents’ disks are becoming full due to the accumulation of Docker images and cache layers.
We are already using the scratch base image to minimize the size of the Docker images.

Requirements

I need a solution that preserves the cache layers to keep builds fast while also ensuring that the disk does not fill up.

What I Tried and Expected Results

I have attempted several solutions to address the issue, but none have been entirely satisfactory:

Docker System Prune -a

Command: docker system prune -a
Expected Outcome: Free up disk space by removing all unused data (containers, networks, images, and cache).
Actual Result: This solution significantly slowed down the build process as it removed essential cache layers.

Docker Image Removal in Azure Pipeline

Command:

bash: |
  docker rmi img/address/image_name:$(Build.BuildId)
displayName: "Docker Delete"

Expected Outcome: Remove specific images to free up space.
Actual Result: This command did not free up enough space to prevent the disk from becoming full.

Desired Solution

I need a solution that keeps the build process fast and prevents the disk from getting full. It is acceptable to delete images and containers if necessary.

Please provide a solution that addresses the following:

Maintaining Build Speed: Ensure that the build process remains efficient and quick.
Effective Disk Space Management: Implement a strategy to manage disk space effectively.
Preserving Cache Layers: Find a way to keep essential cache layers while freeing up disk space.
Automation and Maintenance: Utilize scripts or tools to automate the cleanup process without manual intervention.

Can you help with a solution that meets these requirements?

Answers

- ZiyangLiuMSFT
- July 11, 2024 at 11:26 am
Because you are using self-hosted agents, you need to manage disk resources on the agent machines themselves rather than from within each build pipeline.

As a workaround, you can create a separate pipeline that removes unused images and stopped containers, and schedule it to run regularly.
1. Instead of docker system prune -a, which removes all unused data, use the following commands:
- docker image prune -a --filter "until=24h", which removes all unused images created more than 24 hours ago.
- docker container prune --filter "until=24h", which removes all stopped containers created more than 24 hours ago.
2. Add your self-hosted agents to an environment as VM resources, and use a deployment job to run the above commands on the target machines.
3. Add a schedule trigger to the pipeline so that it cleans up images and containers regularly.
Below is a sample YAML for your reference.
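```
# YAML file in the main branch
trigger: none # disable CI triggers; this pipeline runs on the schedule only

schedules:
- cron: '0 0 * * *' # every day at midnight (cron schedules are evaluated in UTC)
  displayName: Daily midnight build
  branches:
    include:
    - main
  always: true # run even if there are no new commits since the last scheduled run

jobs:
- deployment: VMDeploy
  displayName: Deploy to VMs
  environment:
    name: VMenv
    resourceType: virtualMachine
  strategy:
    runOnce:
      deploy:
        steps:
        - task: Bash@3
          inputs:
            targetType: 'inline'
            script: |
              # Remove unused images and stopped containers older than 24 hours
              docker image prune -a --filter "until=24h" -f
              docker container prune --filter "until=24h" -f
```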
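If a fixed schedule turns out to be too aggressive for the cache, one option is to prune only when the disk is actually close to full. The following is a minimal sketch of that idea, not a tested production script: the function names, the 80% default threshold, and the /var/lib/docker data root are all assumptions to adjust for your agents. It also prunes the build cache with docker builder prune, which the sample above does not do.

```shell
#!/usr/bin/env bash
# Sketch: prune Docker data only when the disk is nearly full.
# Function names, the 80% default and /var/lib/docker are illustrative assumptions.

# Print the used-space percentage (digits only) of the filesystem holding a path.
disk_usage_percent() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Succeed (exit status 0) when usage is at or above the threshold.
over_threshold() {
  local usage="$1"
  local threshold="$2"
  [ "$usage" -ge "$threshold" ]
}

# Run the time-filtered prune commands only if the disk is over the threshold.
cleanup_if_needed() {
  local threshold="${1:-80}"
  local docker_root="${2:-/var/lib/docker}"  # assumed default data root
  local usage
  usage=$(disk_usage_percent "$docker_root")
  if over_threshold "$usage" "$threshold"; then
    docker image prune -a -f --filter "until=24h"
    docker container prune -f --filter "until=24h"
    # Also trim build cache entries matching the same 24h filter
    docker builder prune -f --filter "until=24h"
  else
    echo "Disk usage ${usage}% is below ${threshold}%; skipping cleanup."
  fi
}
```

Calling cleanup_if_needed 80 from the scheduled job (or from cron on each agent) leaves the cache untouched on quiet days and only frees space when usage reaches the threshold.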

- RuiJarimba
- July 11, 2024 at 11:46 am
We have a 500GB disk that gets full in 3-4 weeks

As a starting point, and to keep it simple:
- Create a pipeline that does a full cleanup (running docker system prune -a) on a schedule, for example every day after office hours, or 1-2 times per week.
- Consider also running the prune command without -a more often. Without -a it removes stopped containers, unused networks, and dangling images, but keeps tagged images that are merely unused; only with -a are those removed as well.
- You can also cache some Docker base images by pulling some (or all) of them in the same pipeline right after the cleanup.
Something like:
```
bash: |
  # -f skips the confirmation prompt, which would otherwise block a non-interactive agent
  docker system prune -a -f

  docker pull my-base-image-a
  docker pull my-base-image-b
  # other base images to pull here
displayName: "Docker cleanup and pull images"
retryCountOnTaskFailure: 3 # configure retries
condition: always()
```
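If the pulls are flaky, the re-pull step can also retry per image rather than retrying the whole task. This is a rough sketch under assumptions: retry and warm_base_images are hypothetical helper names, RETRY_DELAY is an invented environment variable, and the three attempts simply mirror the retry count used above.

```shell
#!/usr/bin/env bash
# Sketch: re-pull base images after a cleanup, retrying each pull separately.
# retry, warm_base_images and RETRY_DELAY are hypothetical names.

# Run a command up to N times, sleeping between failed attempts.
retry() {
  local attempts="$1"
  shift
  local delay="${RETRY_DELAY:-5}"  # seconds between attempts (assumed default)
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Pull every image given as an argument; report failures but keep going.
warm_base_images() {
  local image
  local failed=0
  for image in "$@"; do
    if ! retry 3 docker pull "$image"; then
      echo "warning: could not pull $image" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, warm_base_images my-base-image-a my-base-image-b would run right after the prune, so one unreachable image does not prevent the others from being cached.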
It’s fine to keep the following task, which removes the image built by the current run once it is no longer needed:
```
bash: |
  docker rmi img/address/image_name:$(Build.BuildId)
displayName: "Docker Delete"
condition: always()
```
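An alternative to deleting the tag in every run is to keep only the newest few tags per repository, so recent layers stay on disk for caching. The sketch below assumes docker images lists images newest first (its usual ordering) and that GNU xargs is available for the -r flag; images_beyond_newest and prune_old_tags are hypothetical names, and the default of 5 is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: keep the newest N tags of a repository and remove the rest.
# Function names and the default of 5 are illustrative assumptions.

# Read image references on stdin (newest first, one per line) and print
# every line after the first N, i.e. the candidates for removal.
images_beyond_newest() {
  local keep="$1"
  tail -n "+$((keep + 1))"
}

# Remove all but the newest N tagged images of one repository.
prune_old_tags() {
  local repo="$1"
  local keep="${2:-5}"
  # Assumes `docker images` prints newest images first; untagged entries are skipped.
  docker images --format '{{.Repository}}:{{.Tag}}' "$repo" \
    | grep -v ':<none>$' \
    | images_beyond_newest "$keep" \
    | xargs -r docker rmi
}
```

With the image name from the question, prune_old_tags img/address/image_name 5 would keep the five most recent build tags and remove the older ones.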