
I am facing a significant issue with my self-hosted build agents in my CI/CD pipeline, and I need a comprehensive solution. Here are the details:

Build Agents and Services

I have 6 build agents.
I manage 30-40 services in the pipeline.

Problem

During the build process, my self-hosted build agents’ disks are becoming full due to the accumulation of Docker images and cache layers.
We are already using the scratch base image to minimize the size of the Docker images.

Requirements

I need a solution that preserves the cache layers to keep builds fast while ensuring that the disk does not fill up.

What I Tried and Expected Results

I have attempted several solutions to address the issue, but none have been entirely satisfactory:

docker system prune -a
  • Command: docker system prune -a
  • Expected Outcome: Free up disk space by removing all unused data (containers, networks, images, and cache).
  • Actual Result: This solution significantly slowed down the build process as it removed essential cache layers.

Docker Image Removal in Azure Pipeline

  • Command:

bash: |
  docker rmi img/address/image_name:$(Build.BuildId)
displayName: "Docker Delete"

  • Expected Outcome: Remove specific images to free up space.
  • Actual Result: This command did not free up enough space to prevent the disk from becoming full.

Desired Solution

I need a solution that keeps the build process fast and prevents the disk from getting full. It is acceptable to delete images and containers if necessary.

Please provide a solution that addresses the following:

  • Maintaining Build Speed: Ensure that the build process remains efficient and quick.

  • Effective Disk Space Management: Implement a strategy to manage disk space effectively.

  • Preserving Cache Layers: Find a way to keep essential cache layers while freeing up disk space.

  • Automation and Maintenance: Utilize scripts or tools to automate the cleanup process without manual intervention.

Can you help with a solution that meets these requirements?

2 Answers


  1. Because you are using self-hosted agents, disk cleanup is something you maintain on the agent machines themselves rather than from within each build pipeline.

    As a workaround, you can create a separate pipeline to remove all unused images and stopped containers and configure schedules for it to be triggered regularly.

    1. Instead of docker system prune -a, which removes all unused data, use time-filtered prune commands:
    • docker image prune -a --filter "until=24h" removes all unused images created more than 24 hours ago.
    • docker container prune --filter "until=24h" removes all stopped containers created more than 24 hours ago.

    2. Add your self-hosted agents to an environment as VM resources, and use a deployment job to run the above commands on the target machines.

    3. Add a schedule trigger to the pipeline so that images and containers are cleaned up regularly.

    Below is a sample yaml for your reference.

    # YAML file in the main branch
    trigger: none
    
    schedules:
    - cron: '0 0 * * *' # every day at midnight (UTC)
      displayName: Daily midnight cleanup
      branches:
        include:
        - main
      always: true # run even when there are no new commits
    
    jobs:
    - deployment: VMDeploy
      displayName: Clean up Docker on VMs
      environment:
        name: VMenv
        resourceType: virtualMachine
      strategy:
        runOnce:
          deploy:
            steps:
            - task: Bash@3
              inputs:
                targetType: 'inline'
                script: |
                  # Prune containers first so the images they reference become unused
                  docker container prune --filter "until=24h" -f
                  docker image prune -a --filter "until=24h" -f
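
Since the goal is to keep cache layers whenever the disk allows, the scheduled job could also check disk usage first and prune only above a threshold. Here is a minimal sketch of that idea, not a tested production script. The function names, the /var/lib/docker path, the 80% threshold, the 7-day build cache window, and the RUN_CLEANUP switch are all assumptions to adjust for your agents.

```shell
#!/bin/sh
# Hypothetical helper: prune Docker data only when the disk is nearly full.
# DOCKER_DIR and THRESHOLD are assumptions; adjust them for your agents.
DOCKER_DIR="${DOCKER_DIR:-/var/lib/docker}"
THRESHOLD="${THRESHOLD:-80}"   # percent of disk used that triggers a cleanup

# Succeeds (exit status 0) when used percent ($1) is above the threshold ($2).
over_threshold() {
  [ "$1" -gt "$2" ]
}

# Prints the used percentage of the filesystem that holds the given path.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

cleanup_if_needed() {
  used="$(disk_used_percent "$DOCKER_DIR")"
  if over_threshold "$used" "$THRESHOLD"; then
    # Containers first, so the images they reference become unused.
    docker container prune --force --filter "until=24h"
    docker image prune --all --force --filter "until=24h"
    # Assumption: builds use BuildKit; also trim build cache outside a 7-day window.
    docker builder prune --force --filter "until=168h"
  else
    echo "Disk usage ${used}% is at or below ${THRESHOLD}%; keeping images and cache."
  fi
}

# Nothing is deleted unless explicitly requested: RUN_CLEANUP=1 sh cleanup.sh
if [ "${RUN_CLEANUP:-0}" = "1" ]; then
  cleanup_if_needed
fi
```

In the pipeline above, the inline script could call this file with RUN_CLEANUP=1 instead of running the two prune commands unconditionally.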
    
  2. We have a 500GB disk that gets full in 3-4 weeks

    As a starting point, and to keep it simple:

    • Create a pipeline to do a proper cleanup (running docker system prune -a -f) on a schedule, for example every day after office hours or 1-2 times per week. The -f flag skips the confirmation prompt, which nobody can answer in a pipeline.

    • Consider also running the prune command without -a more often. Without -a it removes stopped containers, unused networks, and dangling images only; tagged images that are merely unused are kept.

    • You can also cache some Docker base images by pulling some (or all) of them in the same pipeline right after the cleanup.

    Something like:

    bash: |
      # -f skips the interactive confirmation prompt
      docker system prune -a -f
    
      docker pull my-base-image-a
      docker pull my-base-image-b
      # other base images to pull here
    displayName: "Docker cleanup and pull images"
    retryCountOnTaskFailure: 3 # configure retries
    condition: always()
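
If the list of base images grows, the pull step could read it from a file rather than hard-coding each docker pull. The following is a rough sketch under assumptions: the one-image-per-line file format, the IMAGE_LIST variable, and the list_images and warm_cache function names are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: re-pull base images after a cleanup so the next builds start warm.
# The one-image-per-line list format and the IMAGE_LIST variable are assumptions.

# Prints image references from a list file, skipping blank lines and "#" comments.
list_images() {
  sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" | grep -v '^$'
}

# Pulls every listed image; a failed pull is reported but does not stop the loop.
warm_cache() {
  list_images "$1" | while IFS= read -r image; do
    docker pull "$image" || echo "warning: could not pull $image" >&2
  done
}

# Only pulls when a list file is given: IMAGE_LIST=base-images.txt sh warm-cache.sh
if [ -n "${IMAGE_LIST:-}" ]; then
  warm_cache "$IMAGE_LIST"
fi
```

Keeping the list in the repository means a new base image is added with a one-line change instead of a pipeline edit.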
    

    It’s fine to keep the following task so that each build removes its own image once it is no longer needed:

    bash: |
      docker rmi img/address/image_name:$(Build.BuildId)
    displayName: "Docker Delete"
    condition: always()
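
If images from earlier builds still pile up, a similar step could keep only the newest few tags per repository. This is a rough sketch, assuming the tags are the numeric $(Build.BuildId) values used above. The old_tags and remove_old_images functions, the IMAGE_REPO and KEEP variables, and the default of 3 are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep the newest KEEP build images of a repository, remove the rest.
# Assumes image tags are numeric build IDs, as with $(Build.BuildId).

# Reads numeric tags on stdin and prints all but the $1 highest ones.
old_tags() {
  sort -rn | tail -n "+$(( $1 + 1 ))"
}

# Removes every numerically tagged image of repository $1 except the newest $2.
remove_old_images() {
  docker image ls --format '{{.Tag}}' "$1" \
    | grep -E '^[0-9]+$' \
    | old_tags "$2" \
    | while IFS= read -r tag; do
        docker rmi "$1:$tag" || echo "warning: could not remove $1:$tag" >&2
      done
}

# Only runs when a repository is given: IMAGE_REPO=img/address/image_name KEEP=3 sh trim.sh
if [ -n "${IMAGE_REPO:-}" ]; then
  remove_old_images "$IMAGE_REPO" "${KEEP:-3}"
fi
```

Layers shared with the kept images stay on disk, so recent builds can still reuse them.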
    