I have a following table representing user activity in API endpoints.
create table crud_logs (
id bigint generated always as identity
constraint pk_crud_logs
primary key,
object_type varchar(255) not null,
object_id bigint not null,
action crudtypes not null,
operation_ts timestamp with time zone default now() not null,
user_id bigint
constraint fk_crud_logs_user_id_users
references users
on delete set null
);
As part of a user statistics API, I want to get the rank for a given user_id
(numeric or %-wise), ranked by the count of rows per user for a given period. For simplicity let’s say from forever.
Example:
id | object_type | object_id | action | operation_ts | user_id |
---|---|---|---|---|---|
57 | Enterprise | 56 | update | 2023-07-21 12:51:43.904511 +00:00 | 34 |
58 | Enterprise | 56 | update | 2023-07-21 12:52:46.472828 +00:00 | 34 |
59 | Enterprise | 56 | delete | 2023-07-21 13:00:26.499430 +00:00 | 34 |
60 | Enterprise | 56 | update | 2023-07-21 12:51:43.904511 +00:00 | 34 |
61 | Enterprise | 56 | update | 2023-07-21 12:52:46.472828 +00:00 | 34 |
62 | Enterprise | 56 | delete | 2023-07-21 13:00:26.499430 +00:00 | 34 |
67 | Enterprise | 56 | update | 2023-07-21 12:51:43.904511 +00:00 | 34 |
68 | Enterprise | 56 | update | 2023-07-21 12:52:46.472828 +00:00 | 34 |
69 | Enterprise | 56 | delete | 2023-07-21 13:00:26.499430 +00:00 | 59 |
70 | Enterprise | 56 | update | 2023-07-21 12:51:43.904511 +00:00 | 58 |
71 | Enterprise | 56 | update | 2023-07-21 12:52:46.472828 +00:00 | 59 |
72 | Enterprise | 56 | delete | 2023-07-21 13:00:26.499430 +00:00 | 58 |
Here, user_id = 34
would have first place because that user has the most entries.
I have managed to write the following query:
select user_id, rank() over (order by cnt desc )
from (select user_id, count(*) cnt from crud_logs group by user_id) sq
It works for all users, whereas I need the result only for 1 specific user. For example, the expected output for user_id = 58
would be:
user_id = 58, rank = 2
2
Answers
The query you’ve written is pretty close. To obtain the rank for a specific user, you could simply add a WHERE clause in the outer SQL query to filter for the required user_id.
Here’s how you could modify your query:
In this query, the innermost subquery groups the crud_logs table by user_id and counts the number of entries for each user. The middle subquery ranks these users by their count in descending order. Finally, the outer query filters these results to return the rank of the user with
user_id = 58
.Basic query
You can wrap the window function
rank()
around the aggregatecount(*)
in the same query level:Subtle issue 1: ignore
null
values?You have a FK constraint on
user_id
withON DELETE SET NULL
– suggesting that users can be deleted and their log entries are kept withuser_id
reset tonull
. The "NULL" user would be ranked like any other, representing a collection of deleted users. I suspect you’d want to exclude that from the ranks. So:Subtle issue 2: proper time periods
Those time periods (which we are ignoring for the sake of simplicity) can be tricky with the type
timestamptz
. Typically, you’d want statistics for days/weeks/months in a given time zone. Avoid sneaky corner case errors by factoring in the time zone properly.Example for the month July 2023 in Australia:
fiddle
See:
If the table is big, an index on
(operation_ts)
should help.