I have a complex API which takes around 7GB memory when I deploy it using Uvicorn.
I want to understand how I can deploy it, such a way that from my end I want to be able to make parallel requests. The deployed API should be capable of processing two or three requests at same time.
I am using FastAPI with uvicorn and nginx for deployment. Here is my deployed command.
uvicorn --host 0.0.0.0 --port 8888
Can someone provide some clarity on how people achieve that?
2
Answers
You can use gunicorn instead of uvicorn to handle your backend. Gunicorn offers multiple workers to effectively make load balancing of the arriving requests. This means that you will have as many gunicorn running process as you specify to receive and process requests. From the doc, gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second. However, the number of workers should be no more than (2 x number_of_cpu_cores) + 1 to avoid running out of memory errors. You can check this out in the doc.
For example, if you want to use 4 workers for your fastapi-based backend, you can specify it with the flag
w
:gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b "0.0.0.0:8888"
In this case, the script where I have my backend functionalities is called
main
and fastapi is instantiated asapp
.I’m working on something like this using Docker and NGINX.
There’s a Docker official image created by the guy who developed FastAPI that deploys uvicorn/gunicorn for you that can be configured to your needs:
It took some time to get the hang of Docker but I’m really liking it now. You can build an nginx image using the below configuration and then build x amount of your app inside of separate containers for however many you need to serve as hosts.
The below example is running a weighted load balancer for two of my app services with a backup third if those two should fail.
https://hub.docker.com/r/tiangolo/uvicorn-gunicorn-fastapi
nginx Dockerfile:
nginx.conf:
app Dockerfile: