I am building a Celery + Django + Selenium application. I am running Selenium-based browsers in separate processes with the help of Celery. Versions:
celery==5.2.6
redis==3.4.1
selenium-wire==5.1.0
Django==4.0.4
djangorestframework==3.13.1
I found out that after several hours the application generates thousands of zombie processes. I also found that the problem lies with the Celery Docker container, because after
sudo /usr/local/bin/docker-compose -f /data/new_app/docker-compose.yml restart celery
I have 0 zombie processes.
My code
from rest_framework.decorators import api_view

@api_view(['POST'])
def periodic_check_all_urls(request):  # web-service endpoint
    ...
    check_urls.delay(parsing_results_ids)  # call the Celery task
Celery task code
import logging
import traceback
from datetime import datetime
from typing import List

from celery import shared_task

logger = logging.getLogger(__name__)

@shared_task()
def check_urls(parsing_result_ids: List[int]):
    """
    Run the Selenium-based parser;
    the parser extracts data and saves it to the database.
    """
    try:
        logger.info(f"{datetime.now()} Start check_urls")
        parser = Parser()  # open a Selenium browser
        parsing_results = ParsingResult.objects.filter(pk__in=parsing_result_ids).exclude(status__in=["DONE", "FAILED"])
        parser.check_parsing_result(parsing_results)
    except Exception as e:
        full_trace = traceback.format_exc()
    finally:
        if 'parser' in locals():
            parser.stop()
Selenium browser stop function and destructor
class Parser():
    def __init__(self):
        """
        Prepare parser
        """
        if not USE_GUI:
            self.display = Display(visible=0, size=(800, 600))
            self.display.start()

        """ Replaced with Firefox
        self.driver = get_chromedriver(proxy_data)
        """
        proxy_data = {
            ...
        }
        self.driver = get_firefox_driver(proxy_data=proxy_data)

    def __del__(self):
        self.stop()

    def stop(self):
        try:
            self.driver.quit()
            logger.info("Selenium driver closed")
        except:
            pass
        try:
            self.display.stop()
            logger.info("Display stopped")
        except:
            pass
I also tried several settings to limit Celery task resources and run time (they didn’t help with the zombie processes).
My Celery settings in Django settings.py
# celery setting (documents generation)
CELERY_BROKER_URL = os.environ.get("CELERY_BROKER", "redis://redis:6379/0")
CELERY_RESULT_BACKEND = os.environ.get("CELERY_BROKER", "redis://redis:6379/0")
CELERY_IMPORTS = ("core_app.celery",)
CELERY_TASK_TIME_LIMIT = 10 * 60
My Celery service in docker-compose.yml
celery:
  build: ./project
  command: celery -A core_app worker --loglevel=info --concurrency=15 --max-memory-per-child=1000000
  volumes:
    - ./project:/usr/src/app
    - ./project/media:/project/media
    - ./project/logs:/project/logs
  env_file:
    - .env
  environment:
    # environment variables declared in the environment section override env_file
    - DJANGO_ALLOWED_HOSTS=localhost 127.0.0.1 [::1]
    - CELERY_BROKER=redis://redis:6379/0
    - CELERY_BACKEND=redis://redis:6379/0
  depends_on:
    - django
    - redis
I read "Django/Celery – How to kill a celery task?" but it didn’t help.
I also read "Celery revoke leaving zombie ffmpeg process", but my task already contains try/except.
Example of zombie processes
ps aux | grep 'Z'
root 32448 0.0 0.0 0 0 ? Z 13:45 0:00 [Utility Process] <defunct>
root 32449 0.0 0.0 0 0 ? Z 13:09 0:00 [Utility Process] <defunct>
root 32450 0.0 0.0 0 0 ? Z 11:13 0:00 [sh] <defunct>
root 32451 0.0 0.0 0 0 ? Z 13:44 0:00 [Utility Process] <defunct>
root 32452 0.0 0.0 0 0 ? Z 10:12 0:00 [Utility Process] <defunct>
root 32453 0.0 0.0 0 0 ? Z 09:52 0:00 [sh] <defunct>
root 32454 0.0 0.0 0 0 ? Z 10:40 0:00 [Utility Process] <defunct>
root 32455 0.0 0.0 0 0 ? Z 09:52 0:00 [Utility Process] <defunct>
root 32456 0.0 0.0 0 0 ? Z 10:13 0:00 [sh] <defunct>
root 32457 0.0 0.0 0 0 ? Z 10:51 0:00 [Utility Process] <defunct>
root 32459 0.0 0.0 0 0 ? Z 14:01 0:00 [Utility Process] <defunct>
root 32460 0.0 0.0 0 0 ? Z 13:16 0:00 [Utility Process] <defunct>
root 32461 0.0 0.0 0 0 ? Z 10:40 0:00 [Utility Process] <defunct>
root 32462 0.0 0.0 0 0 ? Z 10:12 0:00 [Utility Process] <defunct>
2 Answers
Use time_limit and soft_time_limit
You have already set CELERY_TASK_TIME_LIMIT, but it can be beneficial to also use soft_time_limit. The soft time limit raises a SoftTimeLimitExceeded exception inside the task, which you can catch to clean up resources before the task is forcefully terminated at the hard time_limit.
Here’s how you can set both:
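A sketch based on the task from the question (the 9- and 10-minute values are illustrative):

from celery import shared_task
from celery.exceptions import SoftTimeLimitExceeded

@shared_task(soft_time_limit=9 * 60, time_limit=10 * 60)
def check_urls(parsing_result_ids: List[int]):
    parser = None
    try:
        parser = Parser()
        ...  # the parsing work from the question
    except SoftTimeLimitExceeded:
        # the soft limit fired: clean up before the hard limit kills the process
        logger.warning("check_urls hit the soft time limit")
    finally:
        if parser is not None:
            parser.stop()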
Ensure All Selenium Processes Are Cleaned Up
Make sure all subprocesses, including the Selenium driver and the X virtual display (in headless mode), are correctly stopped. This could involve explicit process killing if necessary. For instance:
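One possible helper, sketched with psutil (kill_child_processes is a hypothetical name; you would call it from the task's finally block):

import psutil

def kill_child_processes():
    """Terminate any children this worker process has left behind
    (geckodriver, Firefox, Xvfb, ...) and reap them."""
    current = psutil.Process()
    children = current.children(recursive=True)
    for child in children:
        try:
            child.terminate()
        except psutil.NoSuchProcess:
            pass  # already gone
    # wait_procs() reaps the terminated children so they don't stay defunct
    gone, alive = psutil.wait_procs(children, timeout=5)
    for child in alive:
        child.kill()  # escalate for anything that ignored SIGTERM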
To summarise:
- Implement soft_time_limit and time_limit for task termination.
- Ensure that all Selenium resources are released (including the driver and the display).
- Use psutil to clean up lingering processes.
- Configure Docker memory limits and restart policies.
- Use max-tasks-per-child to automatically restart workers (see the worker command below).
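For example, the worker command from the compose file would become (the value 50 is arbitrary; tune it to your workload):

celery -A core_app worker --loglevel=info --concurrency=15 --max-memory-per-child=1000000 --max-tasks-per-child=50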
I’d start by turning the Parser class into a context manager:
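A sketch of what that could look like (keeping the Display and get_firefox_driver helpers from the question):

class Parser:
    def __init__(self):
        if not USE_GUI:
            self.display = Display(visible=0, size=(800, 600))
            self.display.start()
        proxy_data = {
            ...
        }
        self.driver = get_firefox_driver(proxy_data=proxy_data)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # runs whether or not the with-block raised
        self.stop()

    def stop(self):
        self.driver.quit()
        logger.info("Selenium driver closed")
        if not USE_GUI:
            self.display.stop()
            logger.info("Display stopped")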
If there is an error thrown within the with block, Parser.__exit__ will be called before the exception is raised, which gives you the chance to kill the driver and the display before the process closes. Note that I removed your empty try/except blocks in the stop method. This is bad practice, because you won’t see the traceback, which would be quite useful for debugging your question… Now in your task:
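Something along these lines (same task body as in the question):

@shared_task()
def check_urls(parsing_result_ids: List[int]):
    logger.info(f"{datetime.now()} Start check_urls")
    with Parser() as parser:  # __exit__ runs even if the block raises
        parsing_results = ParsingResult.objects.filter(pk__in=parsing_result_ids).exclude(status__in=["DONE", "FAILED"])
        parser.check_parsing_result(parsing_results)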
It’s unlikely Celery is the problem. Using Selenium within a Docker container seems to be the root cause of the zombie processes. See Jimmy Engelbrecht’s answer for further details.
Jimmy’s solution to the zombie problem:
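In short: zombies pile up because nothing inside the container acts as an init process and reaps dead children. Giving the container one is a one-line change in docker-compose (a sketch; init: true requires compose file format 2.2/3.7 or later):

celery:
  build: ./project
  init: true  # Docker runs tini as PID 1, which reaps defunct (zombie) children
  command: celery -A core_app worker --loglevel=info --concurrency=15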
If this solution doesn’t work, please show us the traceback you suppressed in your Parser.stop method.