Recently, our users have been complaining about sometimes very long loading times in our web application. This occurs both when reloading the page and while using it, and all users are affected at the same time. Sometimes everything runs smoothly, but at other times the application freezes and nothing works for several minutes.
I assume that at most about 20 people are using the application at the same time. We use a PostgreSQL v16 database running in a Docker container, and Gunicorn as the application server.
The API logs basically look as I expect them to. Sometimes there is the message "Ignoring EPIPE". If I understand correctly, this indicates that the client has closed the connection due to a timeout.
I suspect that queries are piling up somewhere. Where could the problem lie, and what is the best way to isolate the cause?
I would be very grateful for any input…
- even in times of congestion, most sessions are idle and only about 4 sessions are actually active (see the sketch after this list for one way to check this)
- the API logs look fine, apart from the mentioned "Ignoring EPIPE"
- during times of congestion the CPU usage is consistently over 300 % (on 4 cores). Seems a bit high to me…
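For reference, this kind of session picture can be checked with a query against pg_stat_activity; a minimal sketch (the exact columns are a matter of taste):

    -- list what every session connected to the current database is doing,
    -- longest-running statement first
    SELECT pid, state, now() - query_start AS runtime, left(query, 80) AS query
    FROM pg_stat_activity
    WHERE datname = current_database()
    ORDER BY runtime DESC NULLS LAST;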
2 Answers
So... in the end it was a wrong API call from the frontend that triggered the problem: instead of passing the argument id=xy, todo_id=xy was passed, which led to selecting and serializing the whole database. That took about 5 minutes per call, and after 4 parallel calls the whole system was congested and nothing worked anymore for any user.
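To illustrate the effect (the table and column names below are placeholders, not the actual schema): with the correct argument the endpoint runs a single indexed lookup, whereas the unrecognised todo_id argument meant no filter was applied at all, so every row was selected and serialized.

    -- what the endpoint was supposed to run: one row via the primary key, returns instantly
    SELECT * FROM todo WHERE id = 42;

    -- what effectively ran when the unknown todo_id argument was ignored:
    -- the entire table, which the API then had to serialize
    SELECT * FROM todo;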
Thanks a lot for your suggestions. They helped me look in the right places.
Sounds like a query is taking too long. This could be any number of things, but most likely it is a poorly optimized query (i.e. a query that doesn't use an index where it should).
Try setting log_min_duration_statement in postgresql.conf to 1000 (milliseconds) or greater. Then, after a while, check your database server logs.
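For example, if you would rather not edit the file by hand inside the container, the same setting can be applied from psql (a sketch, assuming superuser access; 1000 ms is just a starting point):

    -- log every statement that takes longer than one second to complete
    ALTER SYSTEM SET log_min_duration_statement = '1000';

    -- reload the configuration without restarting the server
    SELECT pg_reload_conf();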
If you update your question with the slow statements from the logs, maybe I or others can assist you in creating the necessary index and/or re-writing the query.
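As a rough sketch of what that usually looks like (todo and owner_id are placeholder names, not your schema):

    -- run the suspect statement with timing to see which plan the planner chooses
    EXPLAIN ANALYZE SELECT * FROM todo WHERE owner_id = 42;

    -- if the plan shows a sequential scan over a large table,
    -- an index on the filtered column is usually the fix
    CREATE INDEX idx_todo_owner_id ON todo (owner_id);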