I have a Postgres table with these columns:
id int8
, user_id varchar
, is_favorite boolean
, join_time timestamptz
I want to delete some rows in this table with some conditions:
- Keep a maximum of 10 rows for each
user_id
. - These 10 rows must contain each
user_id
‘s rows withis_favorite=true
(There can’t be more than 5 rows withis_favorite=true
peruser_id
.) - The rest of 10 rows must be the ones with the latest
join_time
.
I want to delete rows past the 10 per user_id
in this table with multiple user_id
criteria.
There is a topic without user_id
criteria, but related to performance issues, I opened a new topic. For whole table deletion, see:
The script will execute 8 times in about 5 minutes.Table contains about 200K rows and relationol with users
table.(version:PostgreSQL 12.10)
Example:
id|user_id |is_favorite|join_time
------------------------------------+------------------------------------+-------
1 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-04 15:16:40.000 +0300
2 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-03 15:16:25.000 +0300
3 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-02 15:16:40.000 +0300
4 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-04-22 15:16:40.000 +0300
5 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-23 15:16:25.000 +0300
6 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-21 15:16:25.000 +0300
7 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-20 15:16:40.000 +0300
8 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-19 15:16:25.000 +0300
9 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-18 15:16:40.000 +0300
10|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-17 15:16:25.000 +0300
11|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-16 15:16:40.000 +0300
12|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-15 15:16:25.000 +0300
13|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:40.000 +0300
14|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:39.000 +0300
15|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 12:16:25.000 +0300
16|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 11:16:25.000 +0300
17|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 10:16:25.000 +0300
18|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 09:16:25.000 +0300
19|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 08:16:25.000 +0300
20|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 07:16:25.000 +0300
21|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 06:16:25.000 +0300
22|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 05:16:25.000 +0300
23|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 04:16:25.000 +0300
24|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 03:16:25.000 +0300
25|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 02:16:25.000 +0300
26|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 01:16:25.000 +0300
27|91a005bc-cf89-11ed-afa1-0242ac120002|true |2023-01-08 05:27:25.000 +0300
28|91a005bc-cf89-11ed-afa1-0242ac120002|true |2023-02-09 07:32:25.000 +0300
29|91a005bc-cf89-11ed-afa1-0242ac120002|false |2023-02-10 08:51:25.000 +0300
For user_id in '655caab8-ce81-11ed-afa1-0242ac120002, 81c126b6-ce81-11ed-afa1-0242ac120002'
these IDs must be deleted: 11,12,13,14,25,26
What is the most efficient way of deal with this?
2
Answers
You can use a CTE (Common Table Expression) to achieve this.
Applied to the whole table (all users), the query I suggested to your previous question is faster.
Applied to only a few given users, I suggest:
Filter early.
An index with leading
user_id
will help a lot. (You probably have one.)An index with leading
id
should exist in any case.Other considerations as detailed in my previous answer: