
I’m learning about concurrency in Python.

I set up Celery to run a script that fetches a URL 50 times, and it takes only ~0.13 seconds to finish, which really surprised me!

When I compare it with regular synchronous code, the same work takes about a minute to complete.

AFAIK, Celery spawns a pool of processes to execute the tasks. But I don’t think 50 tasks executed by a Celery worker can be that much faster than synchronous execution.

# sync.py
import json
import requests
from timer import Timer

URL = 'https://httpbin.org/uuid'

def fetch(url):
    response = requests.get(url)
    data = json.loads(response.text)
    print(data['uuid'])


def main():
    with Timer():
        for _ in range(50):
            fetch(URL)

main()
# takes ~50 seconds

# celery_demo.py
from celery import Celery
from timer import Timer
import json
import requests

URL = 'https://httpbin.org/uuid'

BROKER_URL = 'redis://localhost:6379/0'
RESULT_BACKEND = 'redis://localhost:6379/0'

app = Celery(__name__, broker=BROKER_URL, backend=RESULT_BACKEND)


@app.task(name='fetch')
def fetch():
    res = requests.get(URL)
    return res.json()['uuid']


def main():
    with Timer():
        for i in range(50):
            res = fetch.delay()
            print(res)


if __name__ == '__main__':
    main()

# Celery configuration
# celery -A celery_demo.app worker --concurrency=4

# takes ~0.1 to 0.2 seconds

Can someone give me some insight into this?

2 Answers


  1. With the Celery version, you only time how long it takes to put 50 task messages on the Celery queue; when the tasks actually finish is not measured.

    So it is likely that the requests had not finished at all when the Timer context was left – you probably still had e.g. about 46 tasks sitting in the Redis queue.

    Documentation is here: https://docs.celeryq.dev/en/stable/userguide/calling.html; the delay() method only sends the task message to the broker and does not wait for the task to finish.
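The enqueue-vs-execute distinction can be sketched without Celery at all, using a plain in-process queue and one worker thread (a `time.sleep` stands in for the HTTP request; the 50-task count mirrors the question):

```python
import queue
import threading
import time

def slow_task(n):
    # stand-in for the HTTP request in the question; sleep instead of network I/O
    time.sleep(0.01)
    return n

task_queue = queue.Queue()

def worker():
    # a single worker draining the queue, like one Celery worker process
    while True:
        item = task_queue.get()
        if item is None:
            break
        slow_task(item)
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()

start = time.perf_counter()
# "delay()": putting 50 task messages on the queue returns almost instantly
for i in range(50):
    task_queue.put(i)
enqueue_time = time.perf_counter() - start

# the work itself only finishes once the worker has drained the queue
task_queue.join()
finish_time = time.perf_counter() - start

task_queue.put(None)  # shut the worker down
t.join()

print(f"enqueue: {enqueue_time:.3f}s  finish: {finish_time:.3f}s")
```

The enqueue time corresponds to what the question's Timer measured; the finish time is what the synchronous script measured.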

  2. This happens because the way you measure is incorrect.

    Problem is this part:

        res = fetch.delay()
        print(res)  # this does not print the result as you may have expected

    Why? Because delay() returns immediately with an AsyncResult object.

    To make it comparable to non-Celery solution (your first script) add a call to get():

        ar = fetch.delay()
        print(ar.get())  # this _does_ print the result

    Measured this way, you will get more or less the same time as in your non-Celery solution. However, what you can do instead is call fetch.delay() for all the tasks first, without calling get(), and then collect the results once they are ready. This utilises the distributed nature of Celery: you will get the results faster because multiple worker processes will most likely be executing your tasks in parallel.
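The submit-everything-first, collect-later pattern described above can be sketched without a running broker by using `concurrent.futures` (a `time.sleep` and a fake uuid string stand in for the real httpbin request):

```python
import concurrent.futures
import time

def fetch(n):
    # stand-in for requests.get(URL); sleep instead of real network I/O
    time.sleep(0.05)
    return f"uuid-{n}"

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    # submit everything first (analogous to calling fetch.delay() in a loop)...
    futures = [pool.submit(fetch, n) for n in range(50)]
    # ...then collect the results once they are ready (analogous to ar.get())
    results = [f.result() for f in futures]
elapsed = time.perf_counter() - start

print(len(results), f"{elapsed:.2f}s")
```

With 10 workers the 50 tasks overlap, so the total time is a fraction of the ~2.5 s a serial loop would take; the same effect is what multiple Celery worker processes give you.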
