I’m learning about concurrency in Python.
I set up Celery to run a script that fetches the same request 50 times, and it only takes about 0.13 seconds to finish, which really surprised me!
When I compare that with regular synchronous programming, the same 50 requests take roughly a minute to complete.
AFAIK, Celery spawns a pool of worker processes to execute the tasks, but I don’t see how 50 tasks executed by the Celery worker can be that much faster than running the same 50 requests synchronously.
# sync.py
import json
import requests
from timer import Timer

URL = 'https://httpbin.org/uuid'

def fetch(url):
    response = requests.get(url)
    data = json.loads(response.text)
    print(data['uuid'])

def main():
    with Timer():
        for _ in range(50):  # 50 sequential requests
            fetch(URL)

main()
# takes ~50 seconds
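The timer module is not shown in the post; a minimal sketch of what such a Timer context manager might look like (purely an assumption, it just prints elapsed wall-clock time):

# timer.py (hypothetical, not part of the original post)
import time

class Timer:
    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        elapsed = time.perf_counter() - self._start
        print(f'Elapsed: {elapsed:.2f} s')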
# celery_demo.py
from celery import Celery
from timer import Timer
import json
import requests

URL = 'https://httpbin.org/uuid'
BROKER_URL = 'redis://localhost:6379/0'
RESULT_BACKEND = 'redis://localhost:6379/0'

app = Celery(__name__, broker=BROKER_URL, backend=RESULT_BACKEND)

@app.task(name='fetch')
def fetch():
    res = requests.get(URL)
    return res.json()['uuid']

def main():
    with Timer():
        for i in range(50):
            res = fetch.delay()
            print(res)

if __name__ == '__main__':
    main()

# Celery worker command:
# celery -A celery_demo.app worker --concurrency=4
# takes ~0.1 to 0.2 seconds
Can anyone give me some insight into why this happens?
2 Answers
With the Celery version you only measure how long it takes to put 50 task executions into the Celery queue; when Celery actually finishes them is not measured. So it is likely that most of the requests have not finished at all by the time the Timer context is left, and you still have e.g. about 46 tasks sitting in the Redis queue. The documentation is here: https://docs.celeryq.dev/en/stable/userguide/calling.html; the delay() method only sends the task to the broker, it does not wait for it to finish.
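A quick way to see this for yourself, assuming the worker from the question is running and the Redis result backend is configured as shown:

# illustrative sketch, not from the original answer
from celery_demo import fetch

res = fetch.delay()
print(res.ready())          # very likely False: the task has only been queued
print(res.get(timeout=10))  # blocks until a worker has actually executed the task
print(res.ready())          # True once the result is stored in the backend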
It is so because the way you measure is incorrect. The problem is this part:
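Presumably this refers to the timed loop in celery_demo.py, which only enqueues the tasks:

with Timer():
    for i in range(50):
        res = fetch.delay()  # returns an AsyncResult immediately; nothing has run yet
        print(res)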
Why? Because delay() returns an AsyncResult object immediately; it does not wait for the task to execute.
To make it comparable to the non-Celery solution (your first script), add a call to get():
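A minimal sketch of that change inside celery_demo.py's main() (the timeout value here is an arbitrary assumption):

def main():
    with Timer():
        for i in range(50):
            res = fetch.delay()
            # res.get() blocks until a worker has finished this task,
            # so the 50 tasks now run one after another
            print(res.get(timeout=30))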
Doing it this way you will get more or less the same time as in your non-Celery solution. However, what you can do instead is call fetch.delay() for all 50 tasks in one go first, without calling get(), and then collect the results once they are ready. This utilises the distributed nature of Celery, and you will get the results faster because you will most likely have multiple worker processes executing your tasks in parallel.
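A minimal sketch of that approach (the timeout is again an assumption):

def main():
    with Timer():
        # enqueue all 50 tasks first; the worker pool picks them up in parallel
        results = [fetch.delay() for _ in range(50)]
        # then collect the results as they become ready
        for res in results:
            print(res.get(timeout=30))

With --concurrency=4 roughly four requests are in flight at any time, so this should finish noticeably faster than the sequential version, though not in 0.13 seconds.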