
I am using Celery to run background jobs for my Django app, hosted on Heroku, with Redis as the broker, and I want to set up task prioritization.

I am currently using Celery's default queue, which all the workers feed from. I was thinking about implementing prioritization within that single queue, but it is described everywhere as a bad practice.
The consensus on the best approach to the priority problem is to set up separate Celery queues for each priority level. Let's say (see the sketch after the list):

  • Queue1 for highest priority tasks, assigned to x workers

  • Queue2, the default queue, assigned to all the other workers
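
For reference, routing tasks into two such queues would look roughly like this (a minimal sketch; the project name, task name and broker URL below are placeholders):

```python
# celery.py -- hypothetical project layout
from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")

app.conf.task_default_queue = "queue2"  # everything lands here unless routed elsewhere
app.conf.task_routes = {
    "proj.tasks.urgent_task": {"queue": "queue1"},  # hypothetical high-priority task
}

# The x high-priority workers and the remaining workers would then be started
# with different -Q flags, for example:
#   celery -A proj worker -Q queue1
#   celery -A proj worker -Q queue2
```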

The first problem I see with this method is that if there are no high-priority tasks at a given time, I lose the productivity of x workers.

Also, let's say my infrastructure scales up and I have more workers available: only the number of “default” workers will expand dynamically. Besides, this method prevents me from keeping identical dynos (containers on Heroku), which doesn't look optimal for scalability.

Is there an efficient way to deal with task prioritization and keep replicable workers at the same time?

2 Answers


  1. For this answer, W1 and W2 are workers consuming high- and low-priority tasks respectively.

    You can scale W1 and W2 as separate containers. You can have three containers, all built from the same image: one for the app and two for the workers. If you have a higher number of one kind of task, only that container needs to scale. Also, depending on the kind of dyno you are using, you can set the workers' concurrency to use resources more efficiently.
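
    As a rough sketch of that layout, assuming a standard Django/Celery setup (the project name, queue names, concurrency values and Procfile entries below are illustrative, not taken from an actual project):

    ```python
    # proj/celery.py: the single codebase every dyno is built from
    import os
    from celery import Celery

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

    app = Celery("proj")
    app.config_from_object("django.conf:settings", namespace="CELERY")
    app.autodiscover_tasks()

    # Only the start command differs per dyno (hypothetical Procfile entries):
    #   web:         gunicorn proj.wsgi
    #   worker_high: celery -A proj worker -Q high --concurrency=4
    #   worker_low:  celery -A proj worker -Q low --concurrency=2
    #
    # `heroku ps:scale worker_high=2 worker_low=3` then scales each worker type
    # independently, while every dyno still runs the same image.
    ```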

    For your reference, this is something that I did in one of my projects.

    Scaling workers

  2. I know this is a very old question, but this might be helpful for someone who is still looking for an answer.

    In my current project, we have an implementation like this:

    1) High        --> User actions (the user is waiting for this task to complete)
    2) Low         --> Non-user actions (there is no wait time)
    3) Reporting   --> Sending emails/reports to the user
    4) Maintenance --> Cleaning up historical data from the DB
    

    Each of these queues holds tasks of different priorities (low, medium and high), so we had to implement priority handling for our Celery tasks.
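
    For illustration, queues like these could be declared as follows (a minimal sketch; only the queue names come from the list above, everything else is assumed):

    ```python
    from celery import Celery
    from kombu import Queue

    app = Celery("proj", broker="redis://localhost:6379/0")

    app.conf.task_queues = (
        Queue("high"),         # user actions: the user is waiting for the result
        Queue("low"),          # non-user actions: no one is waiting
        Queue("reporting"),    # sending emails/reports to the user
        Queue("maintenance"),  # cleaning up historical data from the DB
    )
    app.conf.task_default_queue = "low"  # assumed default; pick whatever fits your app
    ```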

    A) Let's take the scenario in which we are mainly interested in processing tasks based on their priority.

    1) We have 2 or more queues, and we push tasks into the queue(s) while specifying a priority for each task.
    2) All the workers (let's say I have 4 workers) listen to all the queues.
    3) Say you have 100 tasks across your queues, of which 20 are high priority, 30 are medium priority and 50 are low priority.
    4) Celery then processes the high-priority tasks across the queues first, then the medium ones, and finally the low-priority tasks (see the configuration sketch below).
    
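    A minimal sketch of this setup with a Redis broker, based on Celery's documented priority emulation for Redis (the task name and the number of priority steps are assumptions):

    ```python
    from celery import Celery

    app = Celery("proj", broker="redis://localhost:6379/0")

    # Redis has no native message priorities, so Celery emulates them by
    # splitting each queue into priority sub-queues and consuming them in order.
    app.conf.broker_transport_options = {
        "queue_order_strategy": "priority",
        "priority_steps": list(range(10)),  # 10 levels; with Redis, lower numbers are consumed first
        "sep": ":",
    }

    @app.task
    def process_user_action(user_id):
        ...  # hypothetical task body

    # Callers attach a priority when queuing the task:
    #   process_user_action.apply_async(args=[42], priority=0)  # user is waiting
    #   process_user_action.apply_async(args=[42], priority=9)  # can wait
    ```
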

    B) Queue1 for the highest-priority tasks, assigned to x workers;
    Queue2, the default queue, assigned to all the other workers.

    1) This approach is helpful **when you care most about how quickly tasks in a specific queue are processed**: for example, I have a queue **HIGH**, and tasks in this queue are very important to me irrespective of each task's individual priority.
    2) So I should have a dedicated worker that processes tasks only from that particular queue, as sketched below. (As you mentioned in the question, this has some limitations.)
    
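    As a sketch of option B (reusing the queue names from above; the task and worker names are assumptions), the key point is that one worker is pinned to the HIGH queue with -Q while the other workers consume the remaining queues:

    ```python
    from celery import Celery

    app = Celery("proj", broker="redis://localhost:6379/0")

    @app.task
    def confirm_user_action(user_id):
        ...  # hypothetical user-facing task

    # Producers send it straight to the HIGH queue:
    #   confirm_user_action.apply_async(args=[42], queue="high")
    #
    # One dedicated worker consumes only that queue; the others share the rest:
    #   celery -A proj worker -Q high -n high_worker@%h
    #   celery -A proj worker -Q low,reporting,maintenance -n general_worker@%h
    ```
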

    We have to choose between these two options based on our requirements. I hope this is helpful,
    Suresh.
