skip to Main Content

We want to create a view counter per post (https://my-domain.com/posts/post-title) and we want to store views per day to allow posts to be sorted by "Most popular in the last 7 days", "Most popular in the last 30 days", etc.
Several solutions are available to us.

Solution A:
Have a "view_counters" table with the columns below:

id
post_id INT
08_28_2020 INT DEFAULT 0
08_29_2020 INT DEFAULT 0
08_30_2020 INT DEFAULT 0
08_31_2020 INT DEFAULT 0
...

Every day, a cron job in PHP would add a column with today’s date as the name.
There will be one row per post. With each view, we would increment the column by the current date.
This solution allows to have something more easily readable for humans with a web interface (PhpMyAdmin) but still creates a lot of columns and it is probably not the most optimized solution for the database engine. since after 1 year there will be more than 365 columns in the table.


Solution B:
Have a "view_counters" table with the columns below:

id
post_id INT
current_date DATE
counter INT DEFAULT 0

There will be one row per post per day. At each view, we would increment the "counter" column of the line of the post in question.
It is admittedly less readable via PhpMyAdmin (you would have to make a small query) but certainly easier to read and process for the database engine. Correct me if I say wrong!


Solution C:
Have a "view_counters" table with the columns below:

id
post_id INT
current_date DATE

Each view would add a row to the table. We would then have a query that would count the number of views of such and such post for the current date (with a COUNT()). However, we think that this solution is not suitable because it would be necessary to do a COUNT() each time a visitor loads the page of a post, knowing that we have more than 100,000 pageviews/day, that would use a lot of resources to re-count each time. So, solution not suitable in our opinion…


Solution D:
If you have another solution with a more optimized structure, I’d love to learn more!

Hoping to have been clear and understandable.
Thank you in advance for your answers!

3

Answers


  1. Solution A

    If a solution requires regular changes to your database structure within an RDBMS, then that solution is wrong. RDBMSs work with a defined structure that may occasionally change. but that also results in a code change. Also, how would you get the sum of views within a range this way?

    Solution B and C

    From a database design perspective, these are equivalent solutions because you have one record per measured event. The difference between the two is the level of granularity of the information you collect.

    If you are only interested in how many times a post was viewed in a day, then go with solution B.

    However, if you need more granular information, like who visited the post, which time of the day the post was visited, are the recurring users visiting the same post, etc., then you have to go with solution C. Obviously, solution C makes sense only if you store additional details, not just the date of the view.

    Login or Signup to reply.
  2. TL;DR answer

    Go with solution C coupled with a materialized view of the daily page views.

    Longer answer

    Solution A is not a good solution from a database design perspective. First it makes aggregating data difficult and it requires a daily modification to the table. The latter may result in inefficient storage of the rows in the tablespaces.

    Solution B, is workable and would be a good solution if it was a detail table (i.e. had a FK relationship to a primary table of posts) and you do not care about temporal resolution less than a day. In my opinion, it is a bit of dead end from a database design perspective and I would not recommend it.

    Solution C, provides the opportunity to store the actual date/time of a view as well as the opportunity to store additional facts about a post view (e.g. viewing user (login or anonymous), country of origin, etc). With solution C you can create a materialized view that produces solution B (refereshing it daily), so solution C is a 2-for-1 solution. Also, solution C will also allow you to create other aggregate views (weekly aggregates, views as a function of time of day, views by user, etc).

    Login or Signup to reply.
  3. B, but simpler

    post_id INT
    date DATE   -- (there is nothing "current" about the date")
    counter INT DEFAULT 0
    

    That is, get rid of id, it is useless since you should have PRIMARY KEY(post_id, date).

    Each night tally up the counts for the day.

    You have a "summary table". More discussion: http://mysql.rjweb.org/doc.php/summarytables

    The "report" would sum the counts to get a total for any (weekly/monthly/whatever) range of dates. "Most popular" then builds on having those counts.

    C

    "it would be necessary to do a COUNT() each time a visitor loads the page" — That is not a real problem; it could be handled with IODKU, which would either add a new row (once a day, per post) or update the existing row.

    And IODKU is an alternative to having code to do the daily summarization. Note: my table schema would work with IODKU. Also, 100K hits per day is "trivial". If you get to 10M/day, you might need to switch to the nightly summarization.

    A Rule of Thumb: Under 100 queries/second is not a problem; more than that may need special handling.

    A

    It is almost always a bad idea to spread an "array" across columns.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search