I am facing a significant issue with my self-hosted build agents in my CI/CD pipeline, and I need a comprehensive solution. Here are the details:
Build Agents and Services
- I have 6 build agents.
- I manage 30-40 services in the pipeline.
Problem
During the build process, my self-hosted build agents’ disks are becoming full due to the accumulation of Docker images and cache layers.
We are already using the scratch base image to minimize the size of the Docker images.
Requirements
I need a solution that preserves the cache layers to keep builds fast while ensuring the disk does not fill up.
What I Tried and Expected Results
I have attempted several solutions to address the issue, but none have been entirely satisfactory:
Docker System Prune -a
- Command: `docker system prune -a`
- Expected Outcome: Free up disk space by removing all unused data (containers, networks, images, and build cache).
- Actual Result: This significantly slowed down the build process, since it also removed essential cache layers.
Docker Image Removal in Azure Pipeline
- Command:
```yaml
- bash: |
    docker rmi img/address/image_name:$(Build.BuildId)
  displayName: "Docker Delete"
```
- Expected Outcome: Remove specific images to free up space.
- Actual Result: This command did not free up enough space to prevent the disk from becoming full.
Desired Solution
I need a solution that keeps the build process fast and prevents the disk from getting full. It is acceptable to delete images and containers if necessary.
Please provide a solution that addresses the following:
- Maintaining Build Speed: Ensure that the build process remains efficient and quick.
- Effective Disk Space Management: Implement a strategy to manage disk space effectively.
- Preserving Cache Layers: Find a way to keep essential cache layers while freeing up disk space.
- Automation and Maintenance: Use scripts or tools to automate the cleanup process without manual intervention.
Can you help with a solution that meets these requirements?
2 Answers
Since you are using self-hosted agents, you should maintain these resources on the machines themselves rather than from within each build pipeline.
As a workaround, you can create a separate pipeline that removes all unused images and stopped containers, and configure schedules so it is triggered regularly.
Instead of `docker system prune -a`, which removes all unused data, you can use the `docker image prune -a --filter "until=24h"` command, which removes all unused images created more than 24 hours ago, and the `docker container prune --filter "until=24h"` command, which removes all stopped containers created more than 24 hours ago.
- Add your self-hosted agents to an environment as VM resources.
- Use a deployment job to run the above commands on your target machines.
- Add a schedule trigger to the pipeline so that it cleans up images and containers regularly.
Below is a sample YAML for your reference.
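A minimal sketch of such a pipeline, assuming an environment named `BuildAgents` that holds the agents as VM resources; the cron schedule and all names are placeholders to adapt to your setup:

```yaml
# Sketch of the scheduled cleanup pipeline. The environment name
# 'BuildAgents' and the cron schedule are assumptions; adjust to your setup.
trigger: none                # no CI trigger; runs only on the schedule

schedules:
- cron: "0 2 * * *"          # every day at 02:00 UTC
  displayName: Nightly Docker cleanup
  branches:
    include:
    - main
  always: true               # run even if there are no new commits

jobs:
- deployment: DockerCleanup
  displayName: Prune old Docker data on agents
  environment:
    name: BuildAgents        # environment holding the agents as VM resources
    resourceType: VirtualMachine
  strategy:
    runOnce:
      deploy:
        steps:
        - bash: |
            # Remove unused images and stopped containers older than 24 hours;
            # -f skips the confirmation prompt since this runs unattended.
            docker image prune -a --filter "until=24h" -f
            docker container prune --filter "until=24h" -f
          displayName: "Prune old images and containers"
```

The `-f` flag matters here because the prune commands otherwise wait for interactive confirmation, which would hang an unattended job.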
As a starting point, and to keep it simple:
- Create a pipeline that does a proper cleanup (running `docker system prune -a`) on a schedule, say every day after office hours, or 1-2 times per week.
- Consider also running the prune command without `-a` more often; that removes stopped containers, unused networks, and dangling images, while keeping tagged images that can still serve as cache.
- You can also cache some Docker base images by pulling some (or all) of them in the same pipeline, right after the cleanup.
Something like:
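Here is a sketch of such a scheduled cleanup-and-warm-up pipeline; the cron expression and the base images pulled at the end are assumptions, so substitute the ones your services actually use:

```yaml
# Sketch: scheduled full cleanup followed by re-pulling base images.
# The cron expression and image names are placeholders.
trigger: none                # run only on the schedule, not on pushes

schedules:
- cron: "0 20 * * 1,4"       # Mondays and Thursdays at 20:00 UTC, after office hours
  displayName: Docker cleanup and cache warm-up
  branches:
    include:
    - main
  always: true               # run even when there are no new commits

steps:
- bash: |
    # Full cleanup: removes all unused containers, networks, images, and build cache.
    docker system prune -a -f
  displayName: "Docker full prune"

- bash: |
    # Re-pull common base images so the first builds after cleanup stay fast.
    # These image names are examples only.
    docker pull mcr.microsoft.com/dotnet/sdk:8.0
    docker pull node:20-alpine
  displayName: "Warm Docker image cache"
```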
It’s ok to keep the following task in case the image is no longer used:
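This is the per-build image removal step from the question; `img/address/image_name` is the placeholder registry path used there:

```yaml
- bash: |
    docker rmi img/address/image_name:$(Build.BuildId)
  displayName: "Docker Delete"
```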