We are using Symfony Messenger in combination with Supervisor, running in a Docker container on AWS ECS. We noticed the worker is not shut down gracefully. After debugging, it appears to work as expected with APP_ENV=dev, but not with APP_ENV=prod.
I made a simple sleepMessage, which sleeps for 1 second and then prints a message for 60 seconds. This is when running with APP_ENV=dev
As you can see it’s clearly waiting for the program to stop running.
Now with APP_ENV=prod:
It stops immediately without waiting.
In the Dockerfile we have configured the following to start supervisor. It’s based on php:8.1-apache, so that’s why STOPSIGNAL has been configured:
RUN apt-get update && apt-get install -y --no-install-recommends \
    # for supervisor
    python \
    supervisor
The start-worker.sh script contains this
#!/usr/bin/env bash
cp config/worker/messenger-worker.conf ../../../etc/supervisor/supervisord.conf
exec /usr/bin/supervisord
We do this because certain env variables are only available when starting up.
For debugging purposes the config has been hardcoded to test.
Below is the messenger-worker.conf
[unix_http_server]
file=/tmp/supervisor.sock
[supervisord]
nodaemon=true ; start in foreground if true; default false
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[program:messenger-consume]
stderr_logfile_maxbytes=0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
command=bin/console messenger:consume async -vv --env=prod --time-limit=129600
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
numprocs=1
environment=MESSENGER_TRANSPORT_DSN="https://sqs.eu-central-1.amazonaws.com/{id}/dev-symfony-messenger-queue"
So in short, when using --env=prod in the config above it doesn’t wait for the worker to stop, while with --env=dev it does. Does anybody know how to solve this?
2 Answers
Turns out it was related to the wait_time option of the SQS transport. It probably caused a long-polling request that was started just before the container exited, and whose response came back when the container did not exist anymore. So, setting wait_time to 0 fixed that problem.

Then there was this, which could lead to the same issue.
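For the wait_time change above, a minimal sketch of how it could be applied to the Supervisor config from the question. It assumes the option is passed as a query parameter on the DSN; it can also be set under the transport's options in the Messenger configuration instead.

```ini
[program:messenger-consume]
; wait_time=0 turns off SQS long polling, so no receive request is left
; open while the container is shutting down.
; Assumption: the option is appended to the DSN from the question.
environment=MESSENGER_TRANSPORT_DSN="https://sqs.eu-central-1.amazonaws.com/{id}/dev-symfony-messenger-queue?wait_time=0"
```

The trade-off is that the worker falls back to short polling, which means more frequent receive calls that may come back empty.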
I don’t know why there would be a difference between the dev and prod environments, but it seems you have no grace period set (at least for Supervisor). As I added in the docs:
- the worker handles the SIGTERM signal if you have the PCNTL PHP extension
- you should add stopwaitsecs to your Supervisor program configuration

As you use Docker too, you can also set the grace period at the service level (stop_grace_period in Compose), which defaults to 10s.
With both grace periods set to 20s, running docker-compose down (just an example):
- Docker sends a SIGTERM signal to the service entrypoint (Supervisor) and waits 20s for it to exit
- Supervisor sends a SIGTERM signal to its programs (the messenger:consume commands) and waits 20s for them to exit
- the messenger:consume processes will "catch" the signal, finish handling the current message and stop
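On the Supervisor side, the steps above could look like the following. This is a minimal sketch rather than the configuration from the original answer: it reuses the program section from the question and assumes a 20s grace period to match the walkthrough.

```ini
[program:messenger-consume]
command=bin/console messenger:consume async -vv --env=prod --time-limit=129600
; Signal sent when the program is stopped. TERM is already Supervisor's
; default; it is spelled out here only for clarity.
stopsignal=TERM
; Seconds Supervisor waits after the stop signal before sending SIGKILL
; (default 10). 20 is an assumed value; size it to your slowest message.
stopwaitsecs=20
```

The container-level grace period should be at least as long as stopwaitsecs, otherwise Docker may kill Supervisor before the worker has finished its current message.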