I recently had an outage on an Nginx/Rails application server. It turned out we were being bombarded by requests to a particular URL that takes a few seconds to load. It appears that a user was continually refreshing that page for several minutes – my guess is they accidentally rested an object on their keyboard in a way that triggered a constant stream of browser refreshes.
Regardless of the cause, I need to put protection in place against this kind of problem. Note that this is not static content – it’s dynamic, user-specific content sitting behind authentication.
I’ve looked into using Cache-Control, but this appears to be a non-starter: in Chrome at least, refreshing a page within the same tab triggers a request regardless of the Cache-Control header (see “Is Chrome ignoring Cache-Control: max-age?” on Stack Overflow).
I believe the answer may be rate limiting. If so, I wouldn’t be able to do it by IP address, because many of our customers share the same one. However, I may be able to add a header that identifies the user and have Nginx apply rate limiting based on that.
Does this sound like the way forward? This feels like it should be a fairly common problem!
2 Answers
A colleague of mine suggested a solution that I think is the best fit for our situation. I'll explain why in case this proves useful to anyone else.
Note that we were receiving requests at a low rate - just 6 per second. The reason this was a problem was that the page in question was quite a slow loading report, only accessible to authenticated users.
Server-side caching is not a great solution for us because it needs to be implemented individually on each affected page and we have a complex app with lots of different controllers.
Rate limiting via Nginx might be viable, but it’s tricky to optimise and hard to cover with automated tests.
Anyway, my colleague's solution is as follows: we already have a table that logs the details of each request, including the ID of the user who made it. To find out whether a user is refreshing too often, we simply schedule a Sidekiq job every, say, 30 seconds to check this table for users whose request rate exceeds our threshold, and then kill those users' active sessions.
How you kill a session depends on how you manage them – in our case, we can simply add a rate_limited flag to the user, have the Sidekiq job set it to true, and check the flag on each request. If it's true, the user is redirected away from the slow page and on to the login screen, which will happily cope with being refreshed 6 times per second.
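A minimal sketch of both pieces, assuming an illustrative Request model over the request-log table (with user_id and created_at columns) and a rate_limited boolean on users – the names and the threshold here are placeholders, not our actual schema:

```ruby
# Illustrative sketch -- model names, column names, and the threshold
# are assumptions, not the actual schema.
class RateLimitCheckJob
  include Sidekiq::Job # Sidekiq::Worker on older Sidekiq versions

  WINDOW = 30.seconds
  THRESHOLD = 60 # max requests per user per window

  def perform
    offender_ids = Request.where("created_at > ?", WINDOW.ago)
                          .group(:user_id)
                          .having("COUNT(*) > ?", THRESHOLD)
                          .pluck(:user_id)
    # "Kill the session" by flagging the user; the flag is checked per request.
    User.where(id: offender_ids).update_all(rate_limited: true)
  end
end

# In ApplicationController:
# before_action :enforce_rate_limit
def enforce_rate_limit
  redirect_to login_path if current_user&.rate_limited? # login_path is a placeholder
end
```

Scheduling the job every 30 seconds is left to whatever periodic-job mechanism you already use (e.g. sidekiq-cron).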
You could achieve something similar even without a request logging table, e.g. by keeping track of the request rate in a new column on the users table.
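A rough sketch of that variant – the request_count and window_started_at columns, the window length, and the threshold are all hypothetical:

```ruby
# Illustrative only -- request_count / window_started_at are assumed columns.
RATE_WINDOW = 30.seconds
RATE_THRESHOLD = 60

# Call from a before_action on every request. Not concurrency-safe,
# but good enough as a sketch of the idea.
def track_request_rate(user)
  if user.window_started_at.nil? || user.window_started_at < RATE_WINDOW.ago
    # Start a fresh window and reset the counter.
    user.update_columns(window_started_at: Time.current, request_count: 1)
  else
    user.increment!(:request_count)
    user.update_columns(rate_limited: true) if user.request_count >= RATE_THRESHOLD
  end
end
```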
Note that this solution offers a better UX than Nginx rate limiting, as users are never actually locked out of the app.
Nginx rate limiting is a fast configuration change if you need immediate mitigation. As others have mentioned, combining it with caching would also be ideal.
The $http_authorization header or a unique cookie (e.g. $cookie_foo) could also be used to uniquely identify requests that would otherwise collide on the same IP/User-Agent values.
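For completeness, a minimal sketch of that approach, keying the shared zone on the cookie rather than the client IP – the cookie name (foo), rate, and location are placeholders:

```nginx
# In the http block. Key the zone on a session cookie so customers
# behind a shared IP don't throttle each other. Note that requests
# where the cookie is absent produce an empty key and are not
# accounted at all.
limit_req_zone $cookie_foo zone=per_user:10m rate=2r/s;

server {
    location /slow_report {
        # Absorb small bursts, then reject with 429 instead of queueing.
        limit_req zone=per_user burst=5 nodelay;
        limit_req_status 429;
        proxy_pass http://rails_upstream; # placeholder upstream
    }
}
```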