
I’m working on a Django middleware that stores every request/response in my main database (Postgres/SQLite).
The overhead of a synchronous write per request would obviously be high, so I’m thinking of using Redis to queue the requests for a while and then write them to the database slowly in batches.
e.g. receive 100 requests, store them in the database, wait for the next 100 requests, and repeat.
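The buffering idea described above could be sketched like this. A plain in-memory deque stands in for the Redis list so the snippet is self-contained; with real Redis you would `LPUSH` JSON-encoded records and pop them from a periodic worker (e.g. a Celery task). All names here are illustrative, not from any existing project.

```python
import json
from collections import deque

class RequestLogBuffer:
    """Buffer request records and hand them to `writer` in batches."""

    def __init__(self, writer, batch_size=100):
        self.queue = deque()          # stand-in for a Redis list
        self.writer = writer          # e.g. a wrapper around bulk_create
        self.batch_size = batch_size

    def push(self, record):
        # Records are serialized the same way they would be for Redis.
        self.queue.append(json.dumps(record))
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        # Drain everything currently queued into one batch write:
        # one bulk INSERT instead of N single-row INSERTs.
        batch = []
        while self.queue:
            batch.append(json.loads(self.queue.popleft()))
        if batch:
            self.writer(batch)

# Demo: collect flushed batches in a list instead of writing to a database.
batches = []
buf = RequestLogBuffer(batches.append, batch_size=100)
for i in range(250):
    buf.push({"url": f"/page/{i}", "method": "GET", "status": 200})
buf.flush()  # write whatever is left over
```

With 250 pushed records this produces batches of 100, 100 and 50. The same shape works with Redis: `push` becomes an `LPUSH` from the middleware, and `flush` runs out of band on a timer.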

The model is like this:

url
method
status
user
remote_ip
referer
user_agent
user_ip
metadata # any important piece of data related to request/response e.g. errors or ...
created_at
updated_at

My questions are: "Is this a good approach, and how can we implement it? Do you have an example that does something like this?"
And the other question is: "Is there a better solution?"

2 Answers


  1. This doesn’t suit the concrete question/answer format particularly well, unfortunately.

    "Is this a good approach" is difficult to answer directly with a yes or no response. It will work and your proposed implementation looks sound, but you’ll be implementing quite a bit of software and adding quite a bit of complexity to your project.

    Whether this is desirable isn’t easily answerable without context only you have.

    Some things you’ll want to answer:

    • What am I doing with these stored requests? Debugging? Providing an audit trail?
      • If it’s for debugging, what does a database record get us that our web server’s request logs do not?
      • If it’s for an audit trail, is every individual HTTP request the best representation of that audit trail? Does an auditor care that someone asked for /favicon.ico? Does it convey the meaning and context they need?
    • Do we absolutely need every request stored? For how long? How do we handle going over our storage budget? How do we handle edge cases like the client hanging up before receiving the response, or the server processing the request but crashing before sending a response or logging the record?
    • Does logging a request in band with the request itself present a performance cost we actually can’t afford?

    Compare the strengths and weaknesses of your approach to some alternatives:

    • We can rely on the web server’s logs, which we’re already paying the cost for and are built to handle many of the oddball cases here.
    • We can write an HTTPLog model in band with the request using a simple middleware function, which solves some complexities like "what if redis is down but django and the database aren’t?"
    • We can write an audit logging system that hands any needed context to an out-of-band process (perhaps through signals or Redis+Celery)
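    The "simple middleware" alternative above can be sketched as a Django function-style middleware factory. The `store` callable is injected here so the example runs without a database; in a real project it might be something like `RequestLog.objects.create` (a hypothetical model, not from the original question).

    ```python
    def logging_middleware_factory(get_response, store):
        """Log each request/response pair in band, right after the view runs."""
        def middleware(request):
            response = get_response(request)
            store({
                "url": request.path,
                "method": request.method,
                "status": response.status_code,
                "user_agent": request.META.get("HTTP_USER_AGENT", ""),
            })
            return response
        return middleware

    # Demo with plain fakes instead of real Django request/response objects.
    class FakeRequest:
        path, method, META = "/hello", "GET", {"HTTP_USER_AGENT": "curl/8.0"}

    class FakeResponse:
        status_code = 200

    logged = []
    handler = logging_middleware_factory(lambda req: FakeResponse(), logged.append)
    handler(FakeRequest())
    ```

    The synchronous write costs a little latency per request, but there is no queue to operate and nothing to lose if Redis is down.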

    Above all: capture your actual requirements first, implement the simplest solution that works second, and optimize only after you actually see performance issues.

  2. I would not put this functionality in my Django application. There are many tools for this. One of them is NGINX, a reverse proxy server you can put in front of Django; you can then use NGINX’s access log, and you can format those logs according to your needs. For this volume of data it is usually better not to store it in a database, because it will rarely be queried. You can store the logs in an S3 bucket or just in plain files and use a log-parser tool when you need them.
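    As a rough sketch, a custom NGINX `log_format` could capture roughly the same fields as the model in the question. The variables are standard NGINX access-log variables; the format name and file path are illustrative:

    ```nginx
    # JSON-per-line access log, one record per request.
    log_format reqlog escape=json
        '{"time":"$time_iso8601","remote_ip":"$remote_addr",'
        '"method":"$request_method","url":"$request_uri",'
        '"status":$status,"referer":"$http_referer",'
        '"user_agent":"$http_user_agent"}';

    access_log /var/log/nginx/access.json reqlog;
    ```

    JSON-per-line output is easy to ship to S3 or feed to a log parser later.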
