503 errors with no trace in the logs using Apache/mod_wsgi

Jérôme
December 19, 2018

My Flask app regularly returns 503 errors, and I can’t tell why. It could be load-related. It isn’t systematic, so it’s not a file-permissions issue: it happens roughly 5 times out of 10 consecutive requests, and it’s easy to reproduce by pressing F5 in a browser.
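The F5 test can be scripted so the 503 rate is measured instead of eyeballed. Below is a minimal sketch that uses only the Python 3 standard library. `fetch_status` and `tally_statuses` are hypothetical helper names, and the URL in the usage comment is a placeholder for an endpoint of the affected application.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send one GET request to `url` and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; its .code is the status.
        return err.code


def tally_statuses(fetch: Callable[[], int], attempts: int = 10) -> Counter:
    """Call `fetch` `attempts` times in a row and count the returned status codes."""
    return Counter(fetch() for _ in range(attempts))


# Usage (placeholder URL; point it at an endpoint of the affected app):
#   print(tally_statuses(lambda: fetch_status("http://localhost/api/"), attempts=10))
```

Requests are sent sequentially on purpose. This mirrors pressing F5 repeatedly, so any 503s cannot be blamed on the script itself generating concurrent load.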

I’d like to debug that but I can’t find anything in the logs.

I’ve checked the main Apache log files (access and error) as well as the VirtualHost access/error log files. I’ve also tried setting LogLevel to debug, to no avail.

When the application itself returns a 503 (e.g. using abort(503) in Flask), the error is logged in the virtual host access log (it is not an Apache error, so it goes to the access log). It is also logged in my application log, because my framework logs all HTTP errors.

I’ve had load issues in the past, when no thread was available. Those resulted in 503 errors returned by Apache itself, and I’m pretty sure they were logged in either the access log or the error log (most probably the error log).
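To compare what the client saw with what Apache recorded, you can count the status codes in the virtual host access log. Below is a minimal sketch. It assumes the log uses the `combined` format named in the virtual host’s CustomLog directive. On Debian, `${APACHE_LOG_DIR}` normally expands to `/var/log/apache2`. `count_statuses` is a hypothetical helper, and it skips lines that do not match the expected prefix.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the start of a line in Apache's "combined" (or "common") format:
# %h %l %u %t "%r" %>s ...  The request line may contain escaped quotes (\").
_COMBINED_PREFIX = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "(?:[^"\\]|\\.)*" (\d{3}) ')


def count_statuses(lines: Iterable[str]) -> Counter:
    """Count final status codes (%>s) in access-log lines, skipping unparsable ones."""
    counts = Counter()
    for line in lines:
        match = _COMBINED_PREFIX.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

For example, opening the my-app access log and taking `count_statuses(f)[503]` gives the number of logged 503s. You can set this against the number of 503s the client actually received over the same period.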

How is it possible that the client gets a 503 and there’s no trace of it in the logs?
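One way to tell which layer produced a given 503 is to log every status line the application itself emits. If the client receives a 503 that has no matching entry in this log, the response was generated before the request reached the application (by Apache or mod_wsgi). Below is a minimal sketch of such WSGI middleware. `StatusLogger` is a hypothetical name. The sketch assumes that the `application` object in application.wsgi can be wrapped as `application = StatusLogger(application)` and that Python logging is configured to write somewhere persistent.

```python
import logging
import os
import threading
from typing import Any, Callable, Iterable

log = logging.getLogger("wsgi.status")


class StatusLogger:
    """WSGI middleware that logs the status line of every response the app starts.

    A 503 seen by the client with no matching entry in this log was produced
    before the request reached the application (e.g. by Apache or mod_wsgi).
    """

    def __init__(self, app: Callable[..., Iterable[bytes]]) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable[..., Any]) -> Iterable[bytes]:
        def logging_start_response(status: str, headers: list, exc_info: Any = None) -> Any:
            # Record pid/thread too, to see which daemon process and thread answered.
            log.info("pid=%s thread=%s %s %s -> %s",
                     os.getpid(), threading.get_ident(),
                     environ.get("REQUEST_METHOD"), environ.get("PATH_INFO"), status)
            return start_response(status, headers, exc_info)

        # Return the app's iterable unchanged so the server still calls its close().
        return self.app(environ, logging_start_response)
```

This wraps `start_response` rather than inspecting the response body, so it adds almost no overhead and works with any WSGI framework, Flask included.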

Virtual host config excerpt:

    ErrorLog ${APACHE_LOG_DIR}/my-app-error.log
    CustomLog ${APACHE_LOG_DIR}/my-app-access.log combined

    WSGIDaemonProcess my-app threads=5
    WSGIScriptAlias /api /srv/my-app/application.wsgi process-group=my-app application-group=%{GLOBAL}
    WSGIPassAuthorization On

    <Location /api>
        WSGIProcessGroup my-app
    </Location>

    <Directory /srv/my-app/>
        Options FollowSymLinks
        AllowOverride All
    </Directory>
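A throwaway diagnostic WSGI app can confirm that requests really run in the my-app daemon process group and in the main interpreter (as requested by application-group=%{GLOBAL}). It reports the `mod_wsgi.process_group` and `mod_wsgi.application_group` values that mod_wsgi places in the WSGI environ. Below is a minimal sketch. Where to mount it is an assumption left to the reader, for instance a second WSGIScriptAlias pointing at a hypothetical check script with the same process-group and application-group options.

```python
import os
import threading


def application(environ, start_response):
    """Diagnostic WSGI app reporting which mod_wsgi process group and interpreter ran it."""
    # mod_wsgi sets these keys. An empty process group means embedded mode
    # (not a daemon process group), and an empty application group means
    # the main interpreter, i.e. %{GLOBAL}. Outside mod_wsgi they are absent.
    process_group = environ.get("mod_wsgi.process_group", "<not set>")
    application_group = environ.get("mod_wsgi.application_group", "<not set>")
    body = ("process_group=%r\napplication_group=%r\npid=%d thread=%d\n" % (
        process_group, application_group, os.getpid(), threading.get_ident()
    )).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

The code avoids f-strings so it also runs on older Python 3 interpreters that a distribution’s mod_wsgi package may be built against.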

Debian Stretch, Apache 2.4.25, mod_wsgi 4.5.11.

Edit 1: All WSGI applications are affected

We also see 503 errors on another WSGI application, in another virtual host on the same Apache instance. That application is under a light (close to zero) load, so it shouldn’t return 503. However, I don’t get a 503 when loading the default virtual host page (the “Apache2 Debian Default Page” / “It works!” page). It is as if there were some mod_wsgi limitation common to all WSGI applications, rather than a global Apache limitation, since only WSGI applications are affected.

Edit 2: Restarting apache

systemctl reload apache2 doesn’t change anything. However, systemctl restart apache2 has solved it for now, until the next time it happens.

Before the restart

```
● apache2.service - The Apache HTTP Server
   Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-12-04 11:13:23 CET; 2 weeks 0 days ago
  Process: 10023 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=0/SUCCESS)
  Process: 536 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
 Main PID: 977 (apache2)
    Tasks: 133 (limit: 4915)
   Memory: 1.7G
      CPU: 6d 6h 3min 51.105s
   CGroup: /system.slice/apache2.service
           ├─  977 /usr/sbin/apache2 -k start
           ├─10066 /usr/sbin/apache2 -k start
           ├─10067 /usr/sbin/apache2 -k start
           ├─10068 /usr/sbin/apache2 -k start
           ├─10069 /usr/sbin/apache2 -k start
           ├─16834 /usr/sbin/apache2 -k start
           └─16836 /usr/sbin/apache2 -k start
```

After the restart

```
● apache2.service - The Apache HTTP Server
   Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-12-19 12:32:02 CET; 3s ago
  Process: 11840 ExecStop=/usr/sbin/apachectl stop (code=exited, status=0/SUCCESS)
  Process: 11735 ExecReload=/usr/sbin/apachectl graceful (code=exited, status=0/SUCCESS)
  Process: 11850 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
 Main PID: 11854 (apache2)
    Tasks: 79 (limit: 4915)
   Memory: 125.3M
      CPU: 4.080s
   CGroup: /system.slice/apache2.service
           ├─11854 /usr/sbin/apache2 -k start
           ├─11855 /usr/sbin/apache2 -k start
           ├─11856 /usr/sbin/apache2 -k start
           ├─11857 /usr/sbin/apache2 -k start
           └─11858 /usr/sbin/apache2 -k start
```

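The differences I see are the number of processes (seven in the cgroup before the restart, five after) and tasks (133 vs 79), and I don’t know what to conclude from that. The other difference is memory usage (1.7G before vs 125.3M after). Admittedly, the application seems somewhat memory-hungry, but I think the server can handle it.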

Answers

- workaround
- December 19, 2018 at 12:37 pm
First of all, did you check the access log? If there is nothing in the error log, the server was presumably reached without error, so there should be something in the access log. If there is, check whether Flask actually served the request.

Secondly, are you proxying requests? If so, make sure your proxy configuration is correct.

And of course, make sure your mod_wsgi configuration is correct.


- leopoldtalirz
- February 1, 2021 at 6:02 pm
In case it helps: we experienced a similar issue with a Flask WSGI application intermittently returning 503 (say, once every 5-10 requests).

Manual testing revealed that the corresponding requests did not show up in the apache access log (while the successful requests did).

As hinted in workaround’s answer, our Apache config did indeed also contain proxy configurations for other apps. We were using the keepalive=On keyword on one of our ProxyPass directives, not for the Flask app but for another app served under the same prefix. Excerpt:
```
    # This is the Flask app. (Apache does not allow a comment on the same line as a directive.)
    <Location /curated-cofs>
        WSGIProcessGroup curated-cofs
    </Location>

    <Location /curated-cofs/optimade>
        ProxyPass http://localhost:3759 keepalive=On timeout=1200
        ProxyPassReverse http://localhost:3759
    </Location>
```
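There was actually no good reason for us to use the keepalive keyword here. It is meant for setups with a firewall between Apache and the backend that drops idle connections, and we have no such internal firewall.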

Removing the keyword from the ProxyPass directive seems to have resolved the 503 issue for the Flask app as a side effect.
