I'm trying to get a celery-based scraper up and running. The celery worker seems to function on its own, but when I also run the celery beat server, the worker gives me this KeyError:

  File "c:usersmyusername.virtualenvsdjango-news-scraper-dbqk-dk5libsite-packagesceleryworkerconsumerconsumer.py", line 555, in on_task_received
    strategy = strategies[type_]
KeyError: 'core.tasks.scrape_dev_to'
[2020-10-04 16:51:41,231: ERROR/MainProcess] Received unregistered task of type 'core.tasks.scrape_dev_to'.
The message has been ignored and discarded.

I’ve been through many similar answers on Stack Overflow, but none solved my problem. I’ll list the things I tried at the end.

Project structure:

core
  - tasks

newsscraper
  - celery.py
  - settings.py

tasks:

import time
from newsscraper.celery import shared_task, task
from .scrapers import scrape

@task
def scrape_dev_to():
    URL = "https://dev.to/search?q=django"
    scrape(URL)
    return

settings.py:

INSTALLED_APPS = [
    'django.contrib.admin',
     ...
    'django_celery_beat',
    'core',
]
...
# I added this setting while troubleshooting and got a new ModuleNotFoundError for core.tasks
#CELERY_IMPORTS = (
#    'core.tasks',
#)
CELERY_BROKER_URL = 'redis://localhost:6379'
CELERY_BEAT_SCHEDULE = {
    "ScrapeStuff": {
        'task': 'core.tasks.scrape_dev_to',
        'schedule': 10  # crontab(minute="*/30")
    }
}
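
For reference, if I switched to the commented-out crontab alternative instead of the plain 10-second interval, the entry would look roughly like this (a sketch, not what is currently in my settings):

from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "ScrapeStuff": {
        'task': 'core.tasks.scrape_dev_to',
        'schedule': crontab(minute="*/30"),  # every 30 minutes instead of every 10 seconds
    }
}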

celery.py:

from __future__ import absolute_import, unicode_literals

import os

from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'newsscraper.settings')

app = Celery('newsscraper')

app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

When I run the celery worker with debug logging, I see that celery doesn’t have the task I want (scrape_dev_to) registered. Shouldn’t the app.autodiscover_tasks() call in celery.py take care of this? Here’s the output:

  . celery.accumulate
  . celery.backend_cleanup
  . celery.chain
  . celery.chord
  . celery.chord_unlock
  . celery.chunks
  . celery.group
  . celery.map
  . celery.starmap

I also get a ModuleNotFoundError when I try to add core.tasks to a CELERY_IMPORTS setting. This is my best guess for where the problem is, but I don’t know how to solve it.

Things I tried:

  1. Adding core.tasks to a CELERY_IMPORTS setting. This causes a new error when I try to run celery beat: ModuleNotFoundError: No module named 'core.tasks'.
  2. Hardcoding the name in the task decorator: name='core.tasks.scrape_dev_to'
  3. Specifying the celery config explicitly when calling the worker: celery -A newsscraper worker -l INFO --settings=celeryconfig
  4. Playing with my imports (from newsscraper.celery instead of from celery, for instance)
  5. Adding some config code to the __init__.py for the module containing tasks (I already had it in the __init__.py for the module containing settings and celery.py; see the sketch after this list)
  6. Running python manage.py check, which identified no issues
  7. Calling the worker with core.tasks explicitly: celery -A core.tasks worker -l INFO
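
For reference, the __init__.py wiring mentioned in item 5 is the standard snippet from the Celery "First steps with Django" docs (a sketch of that pattern, not copied verbatim from my project):

# newsscraper/__init__.py
# Make sure the Celery app is loaded whenever Django starts,
# so that @shared_task decorators bind to it.
from .celery import app as celery_app

__all__ = ('celery_app',)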

3 Answers


  1. I had the same problem and this setup solved it for me.

    In your settings:

    CELERY_IMPORTS = [
        'app_name.tasks',
    ]
    

    and

    # app_name/tasks.py
    
    from celery import shared_task
    
    @shared_task
    def my_task(*args, **kwargs):
        pass
    

    Docs ref for imports.
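
    A minimal usage sketch under this setup (app_name and my_task are the placeholders above, not names from the asker's project):

    # Call the task from Django code once a worker is running.
    from app_name.tasks import my_task

    my_task.delay()                    # enqueue for asynchronous execution
    my_task.apply_async(countdown=10)  # same, but delayed by ten seconds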

  2. This can occur when you configured a celery task and then removed it.
    Purge the old task messages and configure the task again:

    $ celery -A proj purge
    

    or

    from proj.celery import app
    app.control.purge()
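
    Note that purge discards every message waiting in the queue, not just the stale ones, so only run it if losing queued tasks is acceptable.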
    
    In settings.py, I added the following:

    CELERY_IMPORTS = [
        'app_name.tasks',
    ]
    

    and it worked for me.
