skip to Main Content

Given the following schema

CREATE TEMP TABLE tmp_table 
(
    id_bucket int2 NOT NULL, 
    id_property int2 NOT NULL,
    v1 varchar(320) NOT NULL,
    v2 varchar(320) NULL
);

And the following data

id_bucket  id_property v1   v2
------------------------------
1          3           OT   A
1          8           RO   C
1          2           OT   C
1          1           RT   D   
1          5           OT   C
1          4           OT   C

I would like that all rows that have the same values for v1 and v2 (in this case v1 = OT, v2 = C) be updated with a value for v2 of NULL except for one, regardless.

I tried grouping by my criterias, but then I lose the details of the rows in question. I know id_property will probably be useful, but when using GROUP BY’s, I don’t have access to that column.

I’ve been pulling my hair out for a while on this one. Any ideas ?

2

Answers


  1. "Except for one".
    You didn’t actually specify which one, but if you don’t care, you can do something like:

    UPDATE tmp_table t
    SET v2 = NULL
    WHERE (id_bucket, v1, v2) IN (
        SELECT id_bucket, v1, v2
        FROM tmp_table
        GROUP BY id_bucket, v1, v2
        HAVING COUNT(*) > 1
    )
    AND id_property <> (
        SELECT MIN(id_property)
        FROM tmp_table
        WHERE id_bucket = t.id_bucket AND v1 = t.v1 AND v2 = t.v2
    );
    
    Login or Signup to reply.
  2. Your solution are analytic functions aka window functions.

    UPDATE tmp_table
    SET v2 = NULL
    WHERE (id_bucket,
           id_property) in -- this assumes these columns form a unique key
        (SELECT id_bucket,
                id_property
         FROM
           (SELECT id_bucket,
                   id_property,
                   row_number() OVER (
                      PARTITION BY v1, v2) rn -- numbers rows with identical v1 and v2
                                                    
            FROM tmp_table
           ) x
         WHERE rn > 1 
        ); -- removes the first row with identical v1, v2
    

    Results in

    id_bucket id_property v1 v2
    1 3 OT A
    1 8 RO C
    1 2 OT C
    1 1 RT D
    1 5 OT null
    1 4 OT null

    See the DB-fiddle for a full example.

    The benefit of a solution based on window functions is that the underlying table gets selected only once, while comparable solutions based on group by in general need to select the table at least a second time. This makes window functions much more efficient.
    Of course, that is only really relevant when table get large.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search