skip to Main Content

I have a Postgres table with these columns:

  id          int8
, user_id     varchar
, is_favorite boolean
, join_time   timestamptz

I want to delete some rows in this table with some conditions:

  • Keep a maximum of 10 rows for each user_id.
  • These 10 rows must contain each user_id‘s rows with is_favorite=true (There can’t be more than 5 rows with is_favorite=true per user_id.)
  • The rest of 10 rows must be the ones with the latest join_time.

I want to delete rows past the 10 per user_id in this table with multiple user_id criteria.

There is a topic without user_id criteria, but related to performance issues, I opened a new topic. For whole table deletion, see:

The script will execute 8 times in about 5 minutes.Table contains about 200K rows and relationol with users table.(version:PostgreSQL 12.10)

Example:

id|user_id                             |is_favorite|join_time                    
------------------------------------+------------------------------------+-------
1 |655caab8-ce81-11ed-afa1-0242ac120002|true       |2023-03-04 15:16:40.000 +0300
2 |655caab8-ce81-11ed-afa1-0242ac120002|true       |2023-03-03 15:16:25.000 +0300
3 |655caab8-ce81-11ed-afa1-0242ac120002|true       |2023-03-02 15:16:40.000 +0300
4 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-04-22 15:16:40.000 +0300
5 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-23 15:16:25.000 +0300
6 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-21 15:16:25.000 +0300
7 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-20 15:16:40.000 +0300
8 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-19 15:16:25.000 +0300
9 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-18 15:16:40.000 +0300
10|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-17 15:16:25.000 +0300
11|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-16 15:16:40.000 +0300
12|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-15 15:16:25.000 +0300
13|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-14 15:16:40.000 +0300
14|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-14 15:16:39.000 +0300
15|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 12:16:25.000 +0300
16|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 11:16:25.000 +0300
17|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 10:16:25.000 +0300
18|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 09:16:25.000 +0300
19|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 08:16:25.000 +0300
20|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 07:16:25.000 +0300
21|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 06:16:25.000 +0300
22|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 05:16:25.000 +0300
23|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 04:16:25.000 +0300
24|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 03:16:25.000 +0300
25|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 02:16:25.000 +0300
26|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 01:16:25.000 +0300
27|91a005bc-cf89-11ed-afa1-0242ac120002|true       |2023-01-08 05:27:25.000 +0300
28|91a005bc-cf89-11ed-afa1-0242ac120002|true       |2023-02-09 07:32:25.000 +0300
29|91a005bc-cf89-11ed-afa1-0242ac120002|false      |2023-02-10 08:51:25.000 +0300

For user_id in '655caab8-ce81-11ed-afa1-0242ac120002, 81c126b6-ce81-11ed-afa1-0242ac120002' these IDs must be deleted: 11,12,13,14,25,26

What is the most efficient way of deal with this?

2

Answers


  1. You can use a CTE (Common Table Expression) to achieve this.

    WITH ranked_rows AS (
      SELECT
        id,
        user_id,
        ROW_NUMBER() OVER (
          PARTITION BY user_id
          ORDER BY is_favorite DESC, join_time DESC
        ) AS row_num
      FROM your_table_name
      WHERE user_id IN ('655caab8-ce81-11ed-afa1-0242ac120002', '81c126b6-ce81-11ed-afa1-0242ac120002')
    )
    DELETE FROM your_table_name
    WHERE id IN (
      SELECT id FROM ranked_rows WHERE row_num > 10
    );
    
    Login or Signup to reply.
  2. Applied to the whole table (all users), the query I suggested to your previous question is faster.
    Applied to only a few given users, I suggest:

    DELETE FROM tbl t
    USING (
       SELECT id, row_number() OVER (PARTITION BY user_id
                                     ORDER BY is_favorite DESC, join_time DESC
                                     ROWS UNBOUNDED PRECEDING) AS rn
       FROM   tbl t
       WHERE  user_id = ANY ({'655caab8-ce81-11ed-afa1-0242ac120002
                              ,81c126b6-ce81-11ed-afa1-0242ac120002}')
       ) del
    WHERE  t.id = del.id
    AND    del.rn > 10;

    Filter early.
    An index with leading user_id will help a lot. (You probably have one.)
    An index with leading id should exist in any case.

    Other considerations as detailed in my previous answer:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search