skip to Main Content

I have a Postgres table with these columns:

  id          int8
, user_id     varchar
, is_favorite boolean
, join_time   timestamptz

I want to delete some rows in this table with some conditions:

  1. Keep a maximum of 10 rows for each user_id.
  2. These 10 rows must contain each user_id‘s rows with is_favorite=true
    (There can’t be more than 5 rows with is_favorite=true per user_id.)
  3. The rest of 10 rows must be the ones with the latest join_time.

I want to delete rows past the 10 per user_id in this table.

Example

id|user_id                             |is_favorite|join_time                    
------------------------------------+------------------------------------+-------
1 |655caab8-ce81-11ed-afa1-0242ac120002|true       |2023-03-04 15:16:40.000 +0300
2 |655caab8-ce81-11ed-afa1-0242ac120002|true       |2023-03-03 15:16:25.000 +0300
3 |655caab8-ce81-11ed-afa1-0242ac120002|true       |2023-03-02 15:16:40.000 +0300
4 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-04-22 15:16:40.000 +0300
5 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-23 15:16:25.000 +0300
6 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-21 15:16:25.000 +0300
7 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-20 15:16:40.000 +0300
8 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-19 15:16:25.000 +0300
9 |655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-18 15:16:40.000 +0300
10|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-17 15:16:25.000 +0300
11|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-16 15:16:40.000 +0300
12|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-15 15:16:25.000 +0300
13|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-14 15:16:40.000 +0300
14|655caab8-ce81-11ed-afa1-0242ac120002|false      |2023-03-14 15:16:39.000 +0300
15|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 12:16:25.000 +0300
16|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 11:16:25.000 +0300
17|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 10:16:25.000 +0300
18|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 09:16:25.000 +0300
19|81c126b6-ce81-11ed-afa1-0242ac120002|true       |2023-03-01 08:16:25.000 +0300
20|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 07:16:25.000 +0300
21|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 06:16:25.000 +0300
22|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 05:16:25.000 +0300
23|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 04:16:25.000 +0300
24|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 03:16:25.000 +0300
25|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 02:16:25.000 +0300
26|81c126b6-ce81-11ed-afa1-0242ac120002|false      |2023-03-01 01:16:25.000 +0300

For user_id = 655caab8-ce81-11ed-afa1-0242ac120002 these IDs must be deleted: 11,12,13,14

For user_id = 81c126b6-ce81-11ed-afa1-0242ac120002 these IDs must be deleted 25,26.

2

Answers


  1. Since you are processing the whole table, using a simple subquery with row_number() should be fastest:

    DELETE FROM tbl t
    USING (
       SELECT id, row_number() OVER (PARTITION BY user_id
                                     ORDER BY is_favorite DESC, join_time DESC
                                     ROWS UNBOUNDED PRECEDING) AS rn
       FROM   tbl t
       ) del
    WHERE  t.id = del.id
    AND    del.rn > 10;
    

    Adding ROWS UNBOUNDED PRECEDING is optional, but should make it substantially faster (until Postgres 16 is released). See:

    Applying the right sort order, this skips the top 10 of most desirable rows per user and deletes the rest.

    true sorts before false in descending order. See:

    If there can be null values, you need to do more. Like, first of all clarify your question.

    Obviously, there would be race conditions with concurrent writes. If there can be concurrent write load, take a write lock on the table in the same transaction first …

    If that’s going to delete the majority of rows, it may be cheaper to create a new table of survivors instead …

    There are other ways. Like:

    Aside: use type uuid for your user_id column. Much better. See:

    Login or Signup to reply.
  2. You can use row_number() twice. first one used to remove records above rank 10, and second one will be used to remove any records is_favorite=true above rank 5.

    with cte as (
      select *, row_number() over (partition by user_id order by is_favorite desc, join_time desc) as rn,
      (CASE WHEN is_favorite = 'true'
            THEN row_number() over (partition by user_id order by is_favorite desc)
            ELSE 0
      END) as fav_rn
      from mytable
    )
    delete from mytable
    where id in (
      select id
      from cte
      where rn>10 or fav_rn > 5
    )
    

    Demo here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search