I’ve been running some build pipelines in Google Cloud Build that use docker-compose without any issue for the past 2 months or so.
The pipeline sets up an integration testing environment using Docker Compose, and runs the applicable unit and integration tests on the main container (container-dev) using the docker exec command.

Here is a snippet of the Cloud Build file:

  - id: "Set Up Testing Instances (Docker Compose)"
    name: docker
    env:
      - 'DISCORD_TOKEN=automated_test'
      - 'DOCKER_NETWORK=cloudbuild'
    args: ["compose", "up", "-d", "--build"]

  - id: "Run Unit & Integration Tests"
    name: docker
    args: ["exec", "container-dev", "python", "-m", "coverage", "run", "-m", "pytest"]

  - id: "Show Test Coverage"
    name: docker
    args: ["exec", "container-dev", "python", "-m", "coverage", "report"]

  - id: "Build Test Docker Container"
    name: docker
    args: ["build", ".", "--target", "live", "-t", "us-east1-docker.pkg.dev/$PROJECT_ID/my-registry/mycontainer:test"]

This configuration has been working well for me for months. However, on 13 September, as soon as I tried to run these pipelines (even retrying ones that had previously succeeded), the second step (Step #1 in the logs), which runs the unit and integration tests, started failing without any explanation beyond exit status 137. The logs I get are as follows:

Starting Step #1 - "Run Unit & Integration Tests"
Step #1 - "Run Unit & Integration Tests": Already have image (with digest): gcr.io/cloud-builders/docker
Finished Step #1 - "Run Unit & Integration Tests"
ERROR
ERROR: build step 1 "gcr.io/cloud-builders/docker" failed: step exited with non-zero status: 137

I am aware that exit status 137 means the process was killed with SIGKILL (128 + 9), which typically happens when the machine running the container runs out of memory or when another process terminates it.
The most puzzling aspect is that pipelines that previously succeeded now fail on retry.
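
As a small illustration of how the 137 above can be read, here is a minimal Python sketch that decodes a shell-style exit status. The function name `describe_exit_status` is made up for this example, and the sketch assumes a POSIX system such as Linux. It only shows the 128 + N convention; it cannot tell which process sent the signal or why.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status such as the 137 reported above."""
    if status > 128:
        # By shell convention, 128 + N means the process died from signal N.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


print(describe_exit_status(137))
```

On Linux this reports SIGKILL for 137, which is consistent with an out-of-memory kill but does not prove one.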

Unless there has been a change to the underlying machines being used in Cloud Build recently, I don’t believe memory availability is the issue. I tried running just the unit tests in a standalone container (no docker compose) with significantly less memory usage, which had the exact same result. The common denominator is the image gcr.io/cloud-builders/docker.

My question is: has there been a recent change to Cloud Build that could cause this? Why is this only happening now?


EDIT:
I tried a few more things, such as updating all the Docker images from Google, and have updated the YAML above to reflect that. Before the upgrade, the docker compose step seemed to be terminating prematurely, which led to an error code 1 (container not available).

Now that I have resolved that issue by upgrading the images, the 137 error is back, even when I run from the main branch of my repo, which succeeded before 13 September.
I have since also run the pipeline with more memory and CPU, but the result was the same: a 137 exit code from docker.

As for logs, what I get now is much the same as what I posted above. I’ll add a few lines here for more context. Step #0 of the pipeline sets up the containers on the cloudbuild network so that they can reach each other.

Step #1 runs the docker image to execute the unit tests inside the container via the exec command, and this is where it fails with 137. It reports that it already has the docker image, then cuts out without any logs, so I have no clue what is happening inside that container.

Step #0 - "Set Up Testing Instances (Docker Compose)": 
Step #0 - "Set Up Testing Instances (Docker Compose)": #26 [container-dev] exporting to image
Step #0 - "Set Up Testing Instances (Docker Compose)": #26 exporting layers
Step #0 - "Set Up Testing Instances (Docker Compose)": #26 exporting layers 1.6s done
Step #0 - "Set Up Testing Instances (Docker Compose)": #26 writing image sha256:9ae549b1894c5ffcfadde428bd790fc26201a3a5b56b9d199adfac67b58ce669 done
Step #0 - "Set Up Testing Instances (Docker Compose)": #26 naming to docker.io/library/container-dev done
Step #0 - "Set Up Testing Instances (Docker Compose)": #26 DONE 2.8s
Step #0 - "Set Up Testing Instances (Docker Compose)": Container cloud-storage  Creating
Step #0 - "Set Up Testing Instances (Docker Compose)": Container firestore  Creating
Step #0 - "Set Up Testing Instances (Docker Compose)": Container pubsub  Creating
Step #0 - "Set Up Testing Instances (Docker Compose)": Container cloud-storage  Created
Step #0 - "Set Up Testing Instances (Docker Compose)": Container firestore  Created
Step #0 - "Set Up Testing Instances (Docker Compose)": Container pubsub  Created
Step #0 - "Set Up Testing Instances (Docker Compose)": Container data-prep  Creating
Step #0 - "Set Up Testing Instances (Docker Compose)": Container data-prep  Created
Step #0 - "Set Up Testing Instances (Docker Compose)": Container container-dev  Creating
Step #0 - "Set Up Testing Instances (Docker Compose)": Container container-dev  Created
Step #0 - "Set Up Testing Instances (Docker Compose)": Container cloud-storage  Starting
Step #0 - "Set Up Testing Instances (Docker Compose)": Container firestore  Starting
Step #0 - "Set Up Testing Instances (Docker Compose)": Container pubsub  Starting
Step #0 - "Set Up Testing Instances (Docker Compose)": Container firestore  Started
Step #0 - "Set Up Testing Instances (Docker Compose)": Container cloud-storage  Started
Step #0 - "Set Up Testing Instances (Docker Compose)": Container pubsub  Started
Step #0 - "Set Up Testing Instances (Docker Compose)": Container data-prep  Starting
Step #0 - "Set Up Testing Instances (Docker Compose)": Container data-prep  Started
Step #0 - "Set Up Testing Instances (Docker Compose)": Container container-dev  Starting
Step #0 - "Set Up Testing Instances (Docker Compose)": Container container-dev  Started
Finished Step #0 - "Set Up Testing Instances (Docker Compose)"
Starting Step #1 - "Run Unit & Integration Tests"
Step #1 - "Run Unit & Integration Tests": Already have image: docker
Finished Step #1 - "Run Unit & Integration Tests"
ERROR
ERROR: build step 1 "docker" failed: step exited with non-zero status: 137
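
Since step #1 ends without any output, one way to see what happened inside container-dev would be to ask Docker for the container's state and recent logs. The following is a hedged sketch, not part of the original pipeline: the helper names `summarize_state` and `report` are hypothetical, and it assumes it runs somewhere with both Python and the Docker CLI, talking to the same Docker daemon as the build. The two underlying commands (`docker inspect` and `docker logs`) could equally be run directly as an extra build step.

```python
import json
import subprocess


def summarize_state(state: dict) -> str:
    """Condense the State object from `docker inspect` into one line."""
    return (
        f"status={state.get('Status', 'unknown')} "
        f"exit_code={state.get('ExitCode')} "
        f"oom_killed={state.get('OOMKilled', False)}"
    )


def report(container: str = "container-dev") -> None:
    """Print the container's state and its last log lines (needs Docker access)."""
    inspect = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State}}", container],
        capture_output=True, text=True, check=True,
    )
    print(summarize_state(json.loads(inspect.stdout)))
    logs = subprocess.run(
        ["docker", "logs", "--tail", "50", container],
        capture_output=True, text=True,
    )
    # Container output may arrive on either stream, so print both.
    print(logs.stdout + logs.stderr)
```

Calling `report()` between the compose step and the test step would show whether container-dev is still running, whether Docker flagged it as OOM-killed, and any traceback it printed before exiting.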

2 Answers


  1. Chosen as BEST ANSWER

    After much trial and error, I found that the container being tested was failing because of a missing dependency. For some reason, the Google Pub/Sub Python library had a silent upgrade or change and suddenly needed a library called pytz, which wasn't declared in its setup.py/pyproject.toml. I'm not sure how that happened...

    Anyway, after adding pytz directly to my requirements.txt file, the container now works, and the pipeline runs.


  2. I’d start by using the official Cloud Builders images: set the name property to just docker in all three steps, as explained here. The old image name is not well supported anymore.

    And then don’t forget to change the first argument of the first step from docker-compose to just compose.
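
The missing pytz dependency described in the accepted answer is the kind of failure an early import check can surface with a readable message. Below is a minimal sketch under stated assumptions: the function name `import_failures` and the module list passed to it are hypothetical, and the check only proves that the modules import, not that they work.

```python
import importlib


def import_failures(module_names):
    """Try importing each module; return {name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # transitive dependency (such as pytz here) is reported by name.
            failures[name] = str(exc)
    return failures


# "no_such_module_for_demo" is a made-up name used to show a failure.
print(import_failures(["json", "no_such_module_for_demo"]))
```

In a project like the one in the question, the list might contain the modules the application imports (for example `google.cloud.pubsub_v1`, assuming that is what the code uses), and a first test could assert that the returned dictionary is empty so the pipeline fails with the name of the missing module.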
