Postgresql - Select rank for one given user_id

AlekseiKhatkevich
July 25, 2023
264 views
1 vote
2 Answers

I have a following table representing user activity in API endpoints.

create table crud_logs (
    id           bigint generated always as identity
        constraint pk_crud_logs
            primary key,
    object_type  varchar(255)                           not null,
    object_id    bigint                                 not null,
    action       crudtypes                              not null,
    operation_ts timestamp with time zone default now() not null,
    user_id      bigint
        constraint fk_crud_logs_user_id_users
            references users
            on delete set null
);

As part of a user statistics API, I want to get the rank for a given user_id (numeric or %-wise), ranked by the count of rows per user for a given period. For simplicity let’s say from forever.

Example:

id	object_type	object_id	action	operation_ts	user_id
57	Enterprise	56	update	2023-07-21 12:51:43.904511 +00:00	34
58	Enterprise	56	update	2023-07-21 12:52:46.472828 +00:00	34
59	Enterprise	56	delete	2023-07-21 13:00:26.499430 +00:00	34
60	Enterprise	56	update	2023-07-21 12:51:43.904511 +00:00	34
61	Enterprise	56	update	2023-07-21 12:52:46.472828 +00:00	34
62	Enterprise	56	delete	2023-07-21 13:00:26.499430 +00:00	34
67	Enterprise	56	update	2023-07-21 12:51:43.904511 +00:00	34
68	Enterprise	56	update	2023-07-21 12:52:46.472828 +00:00	34
69	Enterprise	56	delete	2023-07-21 13:00:26.499430 +00:00	59
70	Enterprise	56	update	2023-07-21 12:51:43.904511 +00:00	58
71	Enterprise	56	update	2023-07-21 12:52:46.472828 +00:00	59
72	Enterprise	56	delete	2023-07-21 13:00:26.499430 +00:00	58

Here, user_id = 34 would have first place because that user has the most entries.

I have managed to write the following query:

select user_id, rank() over (order by cnt desc )
from (select user_id, count(*) cnt from crud_logs group by user_id) sq

It works for all users, whereas I need the result only for 1 specific user. For example, the expected output for user_id = 58 would be:

user_id = 58, rank = 2

Answers

- Mahesh
- July 24, 2023 at 3:41 pm
- 0 votes
0
The query you’ve written is pretty close. To obtain the rank for a specific user, you could simply add a WHERE clause in the outer SQL query to filter for the required user_id.

Here’s how you could modify your query:
```
SELECT user_id, rank 
FROM (
    SELECT user_id, RANK() OVER (ORDER BY cnt DESC) as rank 
    FROM (
        SELECT user_id, COUNT(*) cnt 
        FROM crud_logs 
        GROUP BY user_id
    ) sq
) result
WHERE user_id = 58
```
In this query, the innermost subquery groups the crud_logs table by user_id and counts the number of entries for each user. The middle subquery ranks these users by their count in descending order. Finally, the outer query filters these results to return the rank of the user with user_id = 58.
Login or Signup to reply.

- ErwinBrandstetter
- July 25, 2023 at 3:51 am
- 0 votes
0
Basic query

You can wrap the window function rank() around the aggregate count(*) in the same query level:
```
SELECT *
FROM  (
   SELECT user_id, rank() OVER (ORDER BY count(*) DESC) AS rank
   FROM   crud_logs 
   GROUP  BY 1
   ) sub
WHERE  user_id = 58;
```
Subtle issue 1: ignore null values?

You have a FK constraint on user_id with ON DELETE SET NULL – suggesting that users can be deleted and their log entries are kept with user_id reset to null. The "NULL" user would be ranked like any other, representing a collection of deleted users. I suspect you’d want to exclude that from the ranks. So:
```
SELECT *
FROM  (
   SELECT user_id, rank() OVER (ORDER BY count(*) DESC) AS rank
   FROM   crud_logs 
   WHERE  user_id IS NOT NULL  -- !
   GROUP  BY 1
   ) sub
WHERE  user_id = 58;
```
Subtle issue 2: proper time periods

Those time periods (which we are ignoring for the sake of simplicity) can be tricky with the type timestamptz. Typically, you’d want statistics for days/weeks/months in a given time zone. Avoid sneaky corner case errors by factoring in the time zone properly.

Example for the month July 2023 in Australia:
```
SELECT *
FROM  (
   SELECT user_id
        , rank() OVER (ORDER BY count(*) DESC) AS rank
   FROM   crud_logs 
   WHERE  user_id IS NOT NULL  -- !
   AND    operation_ts >= timestamp '2023-07-01' AT TIME ZONE 'Australia/Sydney'  -- !
   AND    operation_ts <  timestamp '2023-08-01' AT TIME ZONE 'Australia/Sydney'  -- !
   GROUP  BY 1
   ) sub
WHERE  user_id = 58;
```
fiddle

See:
- Ignoring time zones altogether in Rails and PostgreSQL
- Group and count events per time intervals, plus running total
If the table is big, an index on (operation_ts) should help.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Select rank for one given user_id

Answers

Basic query

Subtle issue 1: ignore `null` values?

Subtle issue 2: proper time periods

Postgresql – Select rank for one given user_id

Answers

Basic query

Subtle issue 1: ignore null values?

Subtle issue 2: proper time periods

Subtle issue 1: ignore `null` values?