
I want to know how many people visited each blog page. For that, I have a column in the Blogs table (MS SQL database) that keeps the total visit count. But I also want the visits to be as unique as possible.
So I keep the user's unique ID and the blog ID in the Redis cache, and every time a user visits a page, I check whether she has visited it before; if not, I increase the total visit count.

My question is, what is the best way of storing such data?
Currently, I create a key like "project-visit-{blogId}-{userId}" and use StringSetAsync and StringGetAsync. But I don't know whether this method is efficient.

Any ideas?

3 Answers


  1. Your solution is not atomic unless you wrap the get and set operations in a transaction or a Lua script.

    A better solution is to save the visitors in a Redis set, keyed per blog, e.g. project-visit-{blogId}. When you get a visit, call SADD to add the user to the set. Redis adds the item only if the user has not visited this page before, and SADD's return value tells you whether it was a first visit. If you want the total count, just call SCARD to get the size of the set.
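    To make the mechanics concrete, here is a minimal Python sketch of the set-based approach. The FakeSetStore class is an in-memory stand-in that mimics Redis's SADD/SCARD semantics so the sketch runs without a server (with redis-py or StackExchange.Redis you would issue the same commands against a real instance); the key format blog-visitors:{blogId} is just an assumption for illustration.

```python
class FakeSetStore:
    """In-memory stand-in mimicking Redis SADD/SCARD semantics."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        # Like Redis SADD: returns 1 if the member was newly added,
        # 0 if it was already in the set.
        s = self._sets.setdefault(key, set())
        if member in s:
            return 0
        s.add(member)
        return 1

    def scard(self, key):
        # Like Redis SCARD: size of the set (0 if the key is missing).
        return len(self._sets.get(key, set()))


def record_visit(client, blog_id, user_id):
    """SADD is a single atomic command; returns True only on a first visit."""
    return client.sadd(f"blog-visitors:{blog_id}", user_id) == 1


def unique_visits(client, blog_id):
    return client.scard(f"blog-visitors:{blog_id}")


store = FakeSetStore()
first_time = record_visit(store, 7, "alice")   # True: first visit
repeat = record_visit(store, 7, "alice")       # False: already counted
record_visit(store, 7, "bob")
```

    Because the check and the insert happen in one SADD command, there is no get-then-set race to guard against.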

  2. Regardless of the back-end technology (programming language, etc.), you can use a Redis stream. Streams are a feature introduced in Redis 5 that lets you define publishers and subscribers on a topic (stream) created in Redis. Then, on each user visit, you commit a new record (asynchronously, of course) to this stream. You can hold whatever info you want in that record (user IP, ID, etc.).

    Defining a key for each unique visit is not a good idea, because:

    • It makes life harder for Redis's garbage collection
    • For this use case, its performance is not comparable to a stream's, especially if you use that Redis instance for other purposes as well
    • Constantly collecting and processing these unique visits is inefficient: you always have to scan through all the keys

    Conclusion:
    If you want to use Redis, go with a Redis stream. If Redis can be swapped out, go with Kafka for sure (or a similar technology).
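    A rough sketch of the stream idea, under stated assumptions: FakeStream below is an in-memory stand-in for a real Redis stream (with redis-py, the corresponding commands would be XADD to append and XRANGE to read back), and the field names user_id / blog_id are made up for illustration.

```python
class FakeStream:
    """Append-only log mimicking a Redis stream's XADD/XRANGE behavior."""

    def __init__(self):
        self._entries = []

    def xadd(self, fields):
        # Like XADD: append an entry and return its generated ID.
        entry_id = f"{len(self._entries) + 1}-0"
        self._entries.append((entry_id, dict(fields)))
        return entry_id

    def xrange(self):
        # Like XRANGE - +: return all entries in insertion order.
        return list(self._entries)


def unique_visitors(stream):
    """Consumer side: fold raw visit events into a unique-visitor count."""
    return len({fields["user_id"] for _, fields in stream.xrange()})


visits = FakeStream()
for user in ["alice", "bob", "alice"]:
    visits.xadd({"user_id": user, "blog_id": "7"})
```

    The producer just appends cheap, fire-and-forget records; deduplication is deferred to a consumer that processes the stream in batches.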

  3. If you can sacrifice some precision, the HyperLogLog (HLL) probabilistic data structure is a great solution for counting unique visits because:

    • It uses only 12 KB of memory, and that footprint is fixed – it doesn't grow with the number of unique visits
    • You don’t need to store user data, which makes your service more privacy-oriented

    The HyperLogLog algorithm is really smart, but you don't need to understand its inner workings to use it; Redis added it as a data structure some years ago. So all you, as a user, need to know is that with HyperLogLogs you can count unique elements (visits) in a fixed memory space of 12 KB, with a 0.81% standard error.

    Let's say you want to keep a count of unique visits per day; you would need one HyperLogLog per day, named something like cnt:page-name:20200917, and every time a user visits a page you would add them to the HLL:

    > PFADD cnt:page-name:20200917 {userID}
    

    If you add the same user multiple times, they will still only be counted once.
    To get the count you run:

    > PFCOUNT cnt:page-name:20200917
    

    You can change the granularity of unique users by having different HLLs for different time intervals, for example cnt:page-name:202009 for the month of September, 2020.
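    To see why a fixed block of registers can estimate a count, here is a toy HyperLogLog in Python. This is a simplification for illustration only – Redis's real PFADD/PFCOUNT use sparse/dense encodings and extra bias corrections – but it shows the core mechanics: 2^14 registers (matching the ~12 KB figure above), each tracking the longest run of leading zero bits seen, plus the standard linear-counting correction for small cardinalities.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Simplified HyperLogLog; illustrative, not Redis's implementation."""

    P = 14            # 2**14 = 16384 registers, ~12 KB at 6 bits each
    M = 1 << P

    def __init__(self):
        self.registers = [0] * self.M

    def add(self, item):
        # Analogous to PFADD: hash the item to 64 bits.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h & (self.M - 1)         # low P bits pick a register
        w = h >> self.P                # remaining 50 bits
        # rank = 1-based position of the first 1-bit in w
        rank = (50 - w.bit_length()) + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        # Analogous to PFCOUNT: harmonic-mean estimate over registers.
        alpha = 0.7213 / (1 + 1.079 / self.M)
        raw = alpha * self.M * self.M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.M and zeros:
            # Small-range (linear counting) correction.
            return round(self.M * math.log(self.M / zeros))
        return round(raw)


hll = ToyHyperLogLog()
for user in ["alice", "bob", "alice"]:
    hll.add(user)      # duplicates leave the registers unchanged
```

    Adding an element can only raise a register's maximum, so re-adding a seen element never changes the state – which is exactly why duplicates don't inflate the count.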

    This quick explainer lays it out pretty well: https://www.youtube.com/watch?v=UAL2dxl1fsE

    This blog post might help too: https://redislabs.com/redis-best-practices/counting/hyperloglog/

    And if you’re curious about the internal implementation Antirez’s release post is a great read: http://antirez.com/news/75

    Note that with this solution you lose the information of which users visited the page; you only have the count.
