I have a Postgres table with these columns:
id int8
, user_id varchar
, is_favorite boolean
, join_time timestamptz
I want to delete some rows in this table with some conditions:
- Keep a maximum of 10 rows for each
user_id
. - These 10 rows must contain each
user_id
‘s rows withis_favorite=true
(There can’t be more than 5 rows withis_favorite=true
peruser_id
.) - The rest of 10 rows must be the ones with the latest
join_time
.
I want to delete rows past the 10 per user_id in this table.
Example
id|user_id |is_favorite|join_time
------------------------------------+------------------------------------+-------
1 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-04 15:16:40.000 +0300
2 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-03 15:16:25.000 +0300
3 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-02 15:16:40.000 +0300
4 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-04-22 15:16:40.000 +0300
5 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-23 15:16:25.000 +0300
6 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-21 15:16:25.000 +0300
7 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-20 15:16:40.000 +0300
8 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-19 15:16:25.000 +0300
9 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-18 15:16:40.000 +0300
10|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-17 15:16:25.000 +0300
11|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-16 15:16:40.000 +0300
12|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-15 15:16:25.000 +0300
13|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:40.000 +0300
14|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:39.000 +0300
15|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 12:16:25.000 +0300
16|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 11:16:25.000 +0300
17|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 10:16:25.000 +0300
18|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 09:16:25.000 +0300
19|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 08:16:25.000 +0300
20|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 07:16:25.000 +0300
21|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 06:16:25.000 +0300
22|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 05:16:25.000 +0300
23|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 04:16:25.000 +0300
24|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 03:16:25.000 +0300
25|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 02:16:25.000 +0300
26|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 01:16:25.000 +0300
For user_id = 655caab8-ce81-11ed-afa1-0242ac120002
these IDs must be deleted: 11,12,13,14
For user_id = 81c126b6-ce81-11ed-afa1-0242ac120002
these IDs must be deleted 25,26.
2
Answers
Since you are processing the whole table, using a simple subquery with
row_number()
should be fastest:Adding
ROWS UNBOUNDED PRECEDING
is optional, but should make it substantially faster (until Postgres 16 is released). See:Applying the right sort order, this skips the top 10 of most desirable rows per user and deletes the rest.
true
sorts beforefalse
in descending order. See:If there can be
null
values, you need to do more. Like, first of all clarify your question.Obviously, there would be race conditions with concurrent writes. If there can be concurrent write load, take a write lock on the table in the same transaction first …
If that’s going to delete the majority of rows, it may be cheaper to create a new table of survivors instead …
There are other ways. Like:
Aside: use type
uuid
for youruser_id
column. Much better. See:You can use row_number() twice. first one used to remove records above rank 10, and second one will be used to remove any records
is_favorite=true
above rank 5.Demo here