
I have a Python (3.x) web service deployed on GCP. Every time Cloud Run shuts down instances, most noticeably after a big load spike, I get many logs like "Uncaught signal: 6, pid=6, tid=6, fault_addr=0." together with "[CRITICAL] WORKER TIMEOUT (pid:6)". The signal is always 6.

The service uses FastAPI and Gunicorn, running in a Docker container with this start command:

CMD gunicorn -w 2 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8080 app.__main__:app

The service is deployed using Terraform with 1 GiB of RAM and 2 CPUs, and the timeout is set to 2 minutes:

resource "google_cloud_run_service" <resource-name> {
  name     = <name>
  location = <location>

  template {
    spec {
      service_account_name = <sa-email>
      timeout_seconds = 120
      containers {
        image = var.image
        env {
          name = "GCP_PROJECT"
          value = var.project
        }
        env {
          name = "BRANCH_NAME"
          value = var.branch
        }
        resources {
          limits = {
            cpu = "2000m"
            memory = "1Gi"
          }
        }
      }
    }
  }
  autogenerate_revision_name = true
}

I have already tried tweaking the resources and timeout in Cloud Run, and using the --timeout and --preload flags for Gunicorn, since that is what people seem to recommend when you google the problem, but all without success. I also don't know exactly why the workers are timing out.

3 Answers


  1. Unless you have enabled "CPU always allocated", background threads and processes might stop receiving CPU time once all HTTP requests have returned. This means background threads and processes can fail, connections can time out, and so on. I cannot think of any benefit to running background workers on Cloud Run unless the --no-cpu-throttling flag is set. Cloud Run instances that are not processing requests can be terminated.

    Signal 6 is SIGABRT (abort), which terminates the process that receives it. This probably means your container is being terminated due to a lack of requests to process.
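
To double-check what the number in the log stands for, Python's standard signal module can translate it. This is a minimal sketch, assuming it runs on Linux like the Cloud Run container does; describe_signal is a hypothetical helper name, not something from the service.

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name of a signal number on the current platform."""
    # signal.Signals is the standard-library enum of the signals this OS defines.
    return signal.Signals(number).name


# On Linux, signal 6 is expected to print as SIGABRT.
print(describe_signal(6))
```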

    Run more workloads on Cloud Run with new CPU allocation controls

    What if my application is doing background work outside of request processing?

  2. Extending the top answer, which is correct: you are using Gunicorn, a process manager that manages the Uvicorn worker processes, which run the actual app.

    When Cloud Run wants to shut down the instance (probably due to a lack of requests), it will send signal 6 to process 1. However, Gunicorn occupies this process as the manager and will not pass the signal on to the Uvicorn workers for handling, which is why you see the "Uncaught signal: 6" log.

    The simplest solution is to run Uvicorn directly instead of through Gunicorn (possibly with a smaller instance) and let Cloud Run handle the scaling.

    CMD ["uvicorn", "app.__main__:app", "--host", "0.0.0.0", "--port", "8080"]
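
The same start-up can also be expressed in Python, which makes it easy to honor the PORT environment variable that Cloud Run injects into the container. This is a minimal sketch, assuming Uvicorn is installed in the image; uvicorn_settings and main are hypothetical helper names, and the server only starts when main() is called.

```python
import os


def uvicorn_settings(env=None):
    """Build the host/port keyword arguments for uvicorn.run()."""
    env = os.environ if env is None else env
    # Cloud Run sets PORT; fall back to 8080, the port used in the CMD above.
    return {"host": "0.0.0.0", "port": int(env.get("PORT", "8080"))}


def main():
    # Imported here so the helper above stays usable without Uvicorn installed.
    import uvicorn

    # "app.__main__:app" is the same import string used in the CMD above.
    uvicorn.run("app.__main__:app", **uvicorn_settings())
```

Calling main() at the bottom of a hypothetical serve.py and starting the container with CMD ["python", "serve.py"] would then run a single Uvicorn process and leave scaling to Cloud Run.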
    
  3. This error happens when a background process is aborted. Running background threads has some advantages on Cloud Run, just as in other applications. Luckily, you can still use them on Cloud Run without the processes getting aborted. To do so, choose the option "CPU always allocated" instead of "CPU only allocated during request processing" when deploying.

    For more details, check https://cloud.google.com/run/docs/configuring/cpu-allocation
