I want to know how many people visited each blog page. For that, I have a column in the Blogs table (MS SQL DB) to keep the total visits count. But I also want the visits to be as unique as possible.
So I keep the user’s unique Id and blog Id in the Redis cache, and every time a user visits a page, I check whether she has visited this page before. If not, I increase the total visit count.
My question is, what is the best way of storing such data?
Currently, I create a key like this "project-visit-{blogId}-{userId}" and use StringSetAsync and StringGetAsync. But I don’t know if this method is efficient or not.
Any ideas?
3 Answers
Your solution is not atomic, unless you wrap the get and set operation in a transaction or Lua script.
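To illustrate the atomicity point, here is a minimal Python sketch of the set-based approach this answer goes on to recommend. A single SADD both records the visitor and reports whether the member was new, so there is no separate get-then-set window. The one-set-per-blog key layout, the function names, and the `FakeRedis` class are assumptions for illustration; `FakeRedis` is only an in-memory stand-in that mirrors the `sadd` and `scard` methods of redis-py so the sketch runs without a server.

```python
class FakeRedis:
    """In-memory stand-in for a Redis client (illustration only, not a real client)."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        # Like SADD: returns how many of the given members were newly added.
        members = self._sets.setdefault(name, set())
        before = len(members)
        members.update(values)
        return len(members) - before

    def scard(self, name):
        # Like SCARD: returns the number of members in the set.
        return len(self._sets.get(name, set()))


def record_visit(client, blog_id, user_id):
    """Return True only the first time this user is seen for this blog."""
    key = f"project-visit-{blog_id}"  # hypothetical layout: one set per blog
    return client.sadd(key, str(user_id)) == 1


def unique_visits(client, blog_id):
    """Return the number of distinct users recorded for this blog."""
    return client.scard(f"project-visit-{blog_id}")


client = FakeRedis()  # with redis-py this would be a redis.Redis(...) instance
first = record_visit(client, 42, "user-1")
again = record_visit(client, 42, "user-1")
other = record_visit(client, 42, "user-2")
total = unique_visits(client, 42)
```

When `record_visit` returns True, that would be the moment to increment the visit column in the Blogs table.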
A better solution is to save project-visit-{blogId}-{userId} into a Redis set. When you get a visit, call SADD to add an item to the set. Redis adds a new item to the set only if the user has not visited this page before. If you want to get the total count, just call SCARD
to get the size of the set.

Regardless of the back-end technology (programming language, etc.), you can use a Redis stream. Streams were introduced in Redis 5 and let you define a publisher and a subscriber for a topic (stream) created in Redis. Then, on each user visit, you append a new record to this stream (asynchronously, of course). The record can hold whatever info you want (user IP, ID, etc.).
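A rough Python sketch of the stream approach, under stated assumptions: the stream name `blog-visits`, the field names, and both helper functions are hypothetical, and `FakeStreamRedis` is an in-memory stand-in that mimics only the basic shape of redis-py's `xadd` and `xrange` so the example runs without a server. The consumer side here simply reads the whole stream and deduplicates in memory; a real subscriber would more likely read incrementally.

```python
import itertools


class FakeStreamRedis:
    """In-memory stand-in for the stream commands (illustration only)."""

    def __init__(self):
        self._streams = {}
        self._seq = itertools.count(1)

    def xadd(self, name, fields):
        # Like XADD with an auto-generated ID: append the entry, return its ID.
        entry_id = f"{next(self._seq)}-0"
        self._streams.setdefault(name, []).append((entry_id, dict(fields)))
        return entry_id

    def xrange(self, name):
        # Like XRANGE over the full stream: list of (id, fields) pairs.
        return list(self._streams.get(name, []))


STREAM = "blog-visits"  # hypothetical stream name


def publish_visit(client, blog_id, user_id, ip):
    """Producer side: append one record per page view."""
    fields = {"blog_id": str(blog_id), "user_id": str(user_id), "ip": ip}
    return client.xadd(STREAM, fields)


def unique_visits_per_blog(client):
    """Consumer side: count each (blog, user) pair once."""
    seen = set()
    counts = {}
    for _entry_id, fields in client.xrange(STREAM):
        pair = (fields["blog_id"], fields["user_id"])
        if pair not in seen:
            seen.add(pair)
            counts[fields["blog_id"]] = counts.get(fields["blog_id"], 0) + 1
    return counts


client = FakeStreamRedis()
publish_visit(client, 42, "user-1", "10.0.0.1")
publish_visit(client, 42, "user-1", "10.0.0.1")  # repeat view, same user
publish_visit(client, 42, "user-2", "10.0.0.2")
counts = unique_visits_per_blog(client)
```

Note that a real redis-py client returns field names and values as bytes unless it is created with `decode_responses=True`.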
Defining a key for each unique visit is not a good idea at all, because:
Conclusion:
If you want to use Redis, go with Redis Stream. If you are not tied to Redis, go with Kafka for sure (or a similar technology).
If you can sacrifice some precision, the HyperLogLog (HLL) probabilistic data structure is a great solution for counting unique visits because:
The HyperLogLog algorithm is really smart, but you don’t need to understand its inner workings in order to use it, because Redis added it as a data structure some years ago. So all you, as a user, need to know is that with HyperLogLogs you can count unique elements (visits) in a fixed memory space of 12 KB, with a standard error of 0.81%.
Let’s say you want to keep a count of unique visits per day; you would have to have one HyperLogLog per day, named something like
cnt:page-name:20200917
and every time a user visits a page you would add them to the HLL:
PFADD cnt:page-name:20200917 {userId}
If you add the same user multiple times, they will still only be counted once.
To get the count you run:
PFCOUNT cnt:page-name:20200917
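In application code, adding a visitor and reading the daily count might look like the following Python sketch, built on PFADD and PFCOUNT (the Redis HyperLogLog commands). The helper names are hypothetical, and `FakeHLLRedis` is an in-memory stand-in that mirrors redis-py's `pfadd` and `pfcount` so the example runs without a server. Unlike a real HyperLogLog, the stand-in counts exactly, so it does not show the approximation error.

```python
import datetime


class FakeHLLRedis:
    """Exact in-memory stand-in for PFADD/PFCOUNT (illustration only)."""

    def __init__(self):
        self._hlls = {}

    def pfadd(self, name, *values):
        # Like PFADD: returns 1 if the structure changed, otherwise 0.
        members = self._hlls.setdefault(name, set())
        before = len(members)
        members.update(values)
        return 1 if len(members) > before else 0

    def pfcount(self, name):
        # Like PFCOUNT: the number of distinct elements (approximate in Redis).
        return len(self._hlls.get(name, set()))


def daily_key(page_name, day):
    """Build a per-day key in the cnt:page-name:YYYYMMDD style."""
    return f"cnt:{page_name}:{day:%Y%m%d}"


def record_visit(client, page_name, user_id, day):
    """Add the user to the HLL for that page and day."""
    client.pfadd(daily_key(page_name, day), str(user_id))


def unique_visits(client, page_name, day):
    """Read the distinct-visitor count for that page and day."""
    return client.pfcount(daily_key(page_name, day))


client = FakeHLLRedis()  # with redis-py this would be a redis.Redis(...) instance
day = datetime.date(2020, 9, 17)
record_visit(client, "page-name", "user-1", day)
record_visit(client, "page-name", "user-1", day)  # repeat visit, not recounted
record_visit(client, "page-name", "user-2", day)
key = daily_key("page-name", day)
count = unique_visits(client, "page-name", day)
```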
You can change the granularity of unique users by having different HLLs for different time intervals, for example
cnt:page-name:202009
for the month of September 2020.

This quick explainer lays it out pretty well: https://www.youtube.com/watch?v=UAL2dxl1fsE
This blog post might help too: https://redislabs.com/redis-best-practices/counting/hyperloglog/
And if you’re curious about the internal implementation, Antirez’s release post is a great read: http://antirez.com/news/75
NOTE: With this solution you lose the information about which user visited the page; you only have the count.