I’m designing an algorithm to count unique users on a set of pages, based on a 60-minute sliding window.
So it needs to find the unique IPs (or tokens) that have hit a particular page and total up those hits within the last 60 minutes.
I need this to be very fast at scale (mainly writes; fast reads are a bonus). We could have tens of thousands of users per page, multiplied by thousands of pages.
My research is pointing me to using Redis with HyperLogLog
I’m new to Redis, coming from a Memcached background. Could anyone give me any pointers?
Thanks
2 Answers
One way of doing this would be to keep an HLL key for each page (or set of pages) at minute resolution. For example, if we’re tracking ‘index.html’ and the current timestamp is 0, a visitor with the ID ‘abc’ can be tracked with `PFADD index.html:0 abc`. Once that minute has passed – i.e. timestamp 1 for simplicity – a visitor such as ‘def’ is added to the next key with `PFADD index.html:1 def`.
And so forth. To count the number of unique visitors over the last 60 minutes, assuming the current timestamp is 100, you’ll need to call the `PFCOUNT` command and provide it with the names of all 60 of these keys, e.g. `PFCOUNT index.html:41 index.html:42 ... index.html:100`. Note: if you want “old” counts to be evicted, call `EXPIRE` after each call to `PFADD`.
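The windowed count can be sketched as follows, again assuming redis-py and the same illustrative `page:minute` key layout; the helper names and the 61-minute TTL are assumptions, not prescribed by the answer:

```python
def window_keys(page: str, now_minute: int, window: int = 60) -> list[str]:
    """Keys covering the last `window` minutes, oldest first.
    With now_minute=100 and window=60 this spans minutes 41..100."""
    start = max(now_minute - window + 1, 0)
    return [f"{page}:{m}" for m in range(start, now_minute + 1)]

def unique_visitors(r, page: str, now_minute: int) -> int:
    """PFCOUNT over multiple keys returns the cardinality of their union."""
    return r.pfcount(*window_keys(page, now_minute))

def track_with_ttl(r, page: str, visitor_id: str, minute: int) -> None:
    """PFADD, then EXPIRE so stale minute keys evict themselves.
    A minute-M key can appear in queries up to minute M+59, so a
    61-minute TTL (one minute of slack) outlives every query using it."""
    key = f"{page}:{minute}"
    r.pfadd(key, visitor_id)
    r.expire(key, 61 * 60)
```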
You can’t get time intervals in a single HyperLogLog key.

A sorted set could be an option: add each visitor with `ZADD`, using the visit timestamp as the score, e.g. `ZADD page1 1 user1` (small numbers are used as timestamps for the example), then call `ZCOUNT` to get the total number of unique users in that time interval. When you use `ZCOUNT`, you define `MIN` as (current time – (60*60)) and `MAX` as `+inf`, so it covers everything between (now – 3600 seconds) and now. One drawback of this approach is that you need to remove old data from these sets manually, using `ZREMRANGEBYSCORE`.