
We are using Symfony Messenger in combination with supervisor running in a Docker container on AWS ECS. We noticed the worker is not shut down gracefully. After debugging it appears it does work as expected when using APP_ENV=dev, but not when APP_ENV=prod.

I made a simple sleepMessage whose handler sleeps for 1 second and then prints a message, repeating this for 60 seconds. This is the output when running with APP_ENV=dev:
[Screenshot: worker output with APP_ENV=dev]

As you can see it’s clearly waiting for the program to stop running.
Now with APP_ENV=prod:
[Screenshot: worker output with APP_ENV=prod]

It stops immediately without waiting.

In the Dockerfile we have configured the following to install supervisor. The image is based on php:8.1-apache, which is why STOPSIGNAL has been configured:

RUN apt-get update && apt-get install -y --no-install-recommends \
    # for supervisor
    python \
    supervisor

The start-worker.sh script contains this

#!/usr/bin/env bash

cp config/worker/messenger-worker.conf ../../../etc/supervisor/supervisord.conf
exec /usr/bin/supervisord

We do this because certain env variables are only available when starting up.
For debugging purposes the config has been hardcoded to test.
Below is the messenger-worker.conf

[unix_http_server]
file=/tmp/supervisor.sock

[supervisord]
nodaemon=true               ; start in foreground if true; default false

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[program:messenger-consume]
stderr_logfile_maxbytes=0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
command=bin/console messenger:consume async -vv --env=prod --time-limit=129600
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
numprocs=1
environment=
    MESSENGER_TRANSPORT_DSN="https://sqs.eu-central-1.amazonaws.com/{id}/dev-symfony-messenger-queue"

So in short, when using --env=prod in the config above it doesn’t wait for the worker to stop, while with --env=dev it does. Does anybody know how to solve this?

2 Answers


  1. Chosen as BEST ANSWER

    It turned out to be related to the wait_time option of the SQS transport. A long-polling request that was started just before the container exited was probably answered only after the container no longer existed. Setting wait_time to 0 fixed that problem.
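
    A minimal sketch of that change, applied to the hardcoded debugging config from the question. It assumes the SQS transport reads its options from DSN query parameters; the same option can instead be set under the transport's `options` key in the Messenger configuration. `{id}` is the placeholder from the question and is left as-is:

    ```ini
    [program:messenger-consume]
    ; wait_time=0 turns off SQS long polling, so no receive request
    ; should still be pending when the worker is asked to stop.
    environment=
        MESSENGER_TRANSPORT_DSN="https://sqs.eu-central-1.amazonaws.com/{id}/dev-symfony-messenger-queue?wait_time=0"
    ```

    The trade-off is that the worker polls the queue more often and gets more empty responses.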

    Then there was this which could lead to the same issue


  2. I don’t know why there would be a difference between the dev and prod environments, but it seems you have no grace period set (at least for Supervisor). As I added in the docs:

    As you use Docker too, you can also set the grace period at the service level, which defaults to 10s:

    services:
        my_app:
            stop_grace_period: 20s
            # ...
    

    With this configuration, running docker-compose down (just an example):

    • Docker sends a SIGTERM signal to the service entrypoint (Supervisor) and waits 20s for it to exit
    • Supervisor sends a SIGTERM signal to its programs (messenger:consume commands) and waits 20s for them to exit
    • the messenger:consume processes will "catch" the signal, finish handling the current message and stop
    • once every program has stopped, Supervisor can stop, followed by the Docker Compose stack
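
    The Supervisor side of this sequence could be configured roughly as follows. `stopsignal` and `stopwaitsecs` are standard Supervisor program options. The value 20 is an assumption chosen to match the stop_grace_period above, and the command line is copied from the question:

    ```ini
    [program:messenger-consume]
    command=bin/console messenger:consume async -vv --env=prod --time-limit=129600
    ; Signal sent to the worker on stop (TERM is Supervisor's default).
    stopsignal=TERM
    ; Seconds to wait for the worker to exit before Supervisor sends SIGKILL
    ; (Supervisor's default is 10).
    stopwaitsecs=20
    ```

    If Docker's grace period is shorter than stopwaitsecs, Docker may kill Supervisor before the worker has finished its current message, so the Docker value should be at least as long.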