
I need some help,

My setup is Django, Postgres, Celery, and Redis, all dockerized. Besides the regular user-related features, the app should scrape info in the background.

What I need is to launch the scraping function manually from a management command like "python manage.py start_some_scrape --param1 --param2 ...etc", and to know that the script keeps working in the background, informing me only through logs.

At the moment the script works without Celery, and only while the terminal connection is alive, which is not useful because the scraper should run for a long time, potentially days.

Is Celery the right option for this?
How do I properly pass a task from a management command to Celery?
How do I prevent Celery from being blocked by the long-running task? Celery also has other tasks, both related and unrelated to the scraping script. Are there threads or some other way?

Thanks for the help!

2 Answers


  1. A simple way would be to just send your task to the background of your shell. With nohup it shouldn’t be terminated even if your shell session expires.

    your-pc:~$ nohup python manage.py start_some_scrape --param1 --param2 > logfile.txt &
    
  2. The route to achieve what you want is

    Django -> Celery (Redis) -> SQLite <- Scrapyd <- Scrapy

    If you want to use a shared DB like Postgres, you need to patch Scrapyd, as it supports only SQLite.
    There is a GitHub project, https://github.com/holgerd77/django-dynamic-scraper, that can do what you want or simply teach you how to pass Celery tasks.
    Celery is an asynchronous task queue/job queue, so you don't get any kind of blocking from its side. A rough sketch of that wiring follows.
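
    This is only an illustrative sketch, not from the original answer: it assumes a Celery app is already configured for the project, and the module path scraper/tasks.py, the task name run_scraper, the queue name "scraping", and the project name yourproject are all placeholder assumptions.

    # scraper/tasks.py -- hypothetical module; assumes your Celery app is already set up
    import logging

    from celery import shared_task

    logger = logging.getLogger(__name__)

    @shared_task
    def run_scraper(param1, param2):
        # Your existing scraping loop goes here; log progress instead of
        # printing, since no terminal is attached to the worker.
        logger.info("Scraping started with %s / %s", param1, param2)


    # scraper/management/commands/start_some_scrape.py -- hypothetical path
    from django.core.management.base import BaseCommand

    from scraper.tasks import run_scraper

    class Command(BaseCommand):
        help = "Queue the scraper to run in the background via Celery"

        def add_arguments(self, parser):
            parser.add_argument("--param1")
            parser.add_argument("--param2")

        def handle(self, *args, **options):
            # apply_async returns immediately; the Celery worker does the work.
            # Sending it to a dedicated queue keeps other tasks from being
            # starved while the scrape runs for days.
            result = run_scraper.apply_async(
                kwargs={"param1": options["param1"], "param2": options["param2"]},
                queue="scraping",
            )
            self.stdout.write(f"Queued scraping task {result.id}")

    If you route the long task to its own queue like this, you can run a separate worker just for that queue (for example celery -A yourproject worker -Q scraping --concurrency=1) alongside your normal worker, so the other tasks keep flowing while the scrape runs.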
