My Flask app regularly returns 503 errors and I can’t tell why. It could be load related. It is not systematic, so it’s not a file permissions issue: roughly 5 out of 10 consecutive requests fail. It is easy to reproduce by pressing F5 in a browser.
I’d like to debug that but I can’t find anything in the logs.
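To put a number on the failure rate instead of pressing F5, a small script can repeat the request and tally the status codes. This is only a sketch using the Python standard library; the function name count_statuses and the URL in the usage note are placeholders, not part of the original setup.

```python
import collections
import urllib.error
import urllib.request


def count_statuses(url, attempts=10, timeout=10):
    """Request `url` `attempts` times and tally the HTTP status codes seen."""
    counts = collections.Counter()
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                counts[response.status] += 1
        except urllib.error.HTTPError as error:
            # urllib raises on 4xx/5xx responses; the code is on the exception.
            counts[error.code] += 1
    return counts
```

For example, count_statuses("http://localhost/api/", attempts=10) (hypothetical URL) returns a Counter mapping each status code to the number of times it was seen, which can then be compared with what the logs show for the same time window.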
I’ve checked the main Apache log files (access/error) and the VirtualHost access/error log files. I’ve tried setting LogLevel to debug, to no avail.
When the application itself returns a 503 (e.g. using abort(503) with Flask), the error is logged in the VirtualHost access log (this is not an Apache error, so it goes in the access log). It is also logged in my app log because my framework logs all HTTP errors.
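For readers who want the same kind of application-side trace, here is a minimal sketch of a WSGI middleware that logs every 4xx/5xx response. It is not the framework logging mentioned above; the class name ErrorStatusLogger and the logger name are made up for illustration.

```python
import logging

# Hypothetical logger name; point it at whatever handler the app already uses.
logger = logging.getLogger("my_app.http_errors")


class ErrorStatusLogger:
    """WSGI middleware that logs every response whose status is 4xx or 5xx."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def logging_start_response(status, headers, exc_info=None):
            # The status string looks like "503 Service Unavailable".
            code = int(status.split(" ", 1)[0])
            if code >= 400:
                logger.warning(
                    "%s %s -> %s",
                    environ.get("REQUEST_METHOD"),
                    environ.get("PATH_INFO"),
                    status,
                )
            return start_response(status, headers, exc_info)

        return self.app(environ, logging_start_response)
```

With Flask, such a middleware is typically installed with app.wsgi_app = ErrorStatusLogger(app.wsgi_app). A 503 that reaches the client without passing through this wrapper was not produced by the application.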
I’ve had load issues in the past, where no thread was available. This resulted in 503 errors returned by Apache itself, and I’m pretty sure those were logged in either the access log or the error log (most probably the error log).
How is it possible that the client gets a 503 and there’s no trace of it in the logs?
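One way to confirm that the failing requests really leave no trace is to count the statuses recorded in the access log for the same period and compare them with what the client saw. The sketch below assumes the combined log format configured further down; the function name, regular expression, and default path prefix are illustrative.

```python
import collections
import re

# Matches the start of an Apache "combined" log line:
# host ident user [time] "METHOD path protocol" status ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def count_logged_statuses(lines, path_prefix="/api"):
    """Tally status codes of logged requests whose path starts with `path_prefix`."""
    counts = collections.Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(path_prefix):
            counts[int(match.group("status"))] += 1
    return counts
```

Passing an open log file (for instance the my-app-access.log file from the config below) as lines yields a Counter of status codes. If the client observed more 503s than the log contains, the missing responses were generated before the request reached this VirtualHost’s logging.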
Virtual host config excerpt:
ErrorLog ${APACHE_LOG_DIR}/my-app-error.log
CustomLog ${APACHE_LOG_DIR}/my-app-access.log combined
WSGIDaemonProcess my-app threads=5
WSGIScriptAlias /api /srv/my-app/application.wsgi process-group=my-app application-group=%{GLOBAL}
WSGIPassAuthorization On
<Location /api>
WSGIProcessGroup my-app
</Location>
<Directory /srv/my-app/>
Options FollowSymLinks
AllowOverride All
</Directory>
Debian Stretch, apache 2.4.25, mod_wsgi 4.5.11.
Edit 1: All WSGI applications are affected
We notice 503 errors on another WSGI application in another virtual host on the same Apache instance. This application is under a light (close to zero) load, so it shouldn’t return 503. However, I don’t get a 503 when loading the default VHost page (the “Apache2 Debian Default Page” “It works!” page). It looks as if there were some sort of mod_wsgi limitation common to all WSGI applications, rather than a global Apache limitation, since only WSGI applications are affected.
Edit 2: Restarting apache
systemctl reload apache2 doesn’t change anything. However, systemctl restart apache2 solved it for now. Until next time.
Before the restart
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-12-04 11:13:23 CET; 2 weeks 0 days ago
Process: 10023 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=0/SUCCESS)
Process: 536 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 977 (apache2)
Tasks: 133 (limit: 4915)
Memory: 1.7G
CPU: 6d 6h 3min 51.105s
CGroup: /system.slice/apache2.service
├─ 977 /usr/sbin/apache2 -k start
├─10066 /usr/sbin/apache2 -k start
├─10067 /usr/sbin/apache2 -k start
├─10068 /usr/sbin/apache2 -k start
├─10069 /usr/sbin/apache2 -k start
├─16834 /usr/sbin/apache2 -k start
└─16836 /usr/sbin/apache2 -k start
After the restart
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-12-19 12:32:02 CET; 3s ago
Process: 11840 ExecStop=/usr/sbin/apachectl stop (code=exited, status=0/SUCCESS)
Process: 11735 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=0/SUCCESS)
Process: 11850 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 11854 (apache2)
Tasks: 79 (limit: 4915)
Memory: 125.3M
CPU: 4.080s
CGroup: /system.slice/apache2.service
├─11854 /usr/sbin/apache2 -k start
├─11855 /usr/sbin/apache2 -k start
├─11856 /usr/sbin/apache2 -k start
├─11857 /usr/sbin/apache2 -k start
└─11858 /usr/sbin/apache2 -k start
The differences I see here are the number of processes (no idea what to conclude about that) and the memory usage. Alright, the application seems to be a bit greedy with the memory but I think the server can handle that.
2 Answers
First of all, did you check the access log? If there is nothing in the error log, the server was still accessed, so there should be something in the access log.
If there is, check if Flask is indeed serving.
Secondly, are you proxying requests? If you do, make sure your proxy config is ok.
And of course, make sure your mod_wsgi config is correct.
In case it helps: we have experienced a similar issue with a Flask WSGI application intermittently returning 503 (say, every 5-10 requests). Manual testing revealed that the corresponding requests did not show up in the Apache access log (while the successful requests did).
As hinted in workaround’s answer, the Apache config did indeed also contain proxy configurations for other apps, and we were using the keepalive=On keyword for one of our ProxyPass directives (not for the Flask app, but for another app served under the same prefix). Excerpt:
There was actually no good reason for us to use the keepalive keyword here (no internal firewall). Removing the keyword from the ProxyPass directive seems to have resolved the 503 issue for the Flask app as a side effect.