skip to Main Content

I have a table in Postgres with a column that has distinct alphanumeric values in the pattern 1234P001. However, due to some bug, there are duplicate values in the column, like 1234P001 appearing thrice.

I want to replace duplicate 1234P001‘s with 1234P002, 1234P003 and 1234P004. How can I do this in PostgresSql?

I tried using sequence but it didn’t work.

2

Answers


  1. This can be done with a temporary table and the use of row_number window function. Here is an illustration.

    -- Prepare a test case
    create table the_table (id integer, the_column text);
    insert into the_table values 
    (1, '1234P001'), 
    (2, '1235P001'), 
    (3, '1234P001'), 
    (4, '1236P001'), 
    (5, '1235P001'), 
    (6, '1234P001');
    
    
    create temporary table the_temp_table as 
     select *, row_number() over (partition by the_column order by id) ord 
     from the_table ;
    
    update the_temp_table
     set the_column = the_column||'.'||ord::text where ord > 1;
    
    truncate table the_table;
    
    insert into the_table(id, the_column)
     select id, the_column from the_temp_table;
    
    select * from the_table order by the_column;
    
    id the_column
    1 1234P001
    3 1234P001.2
    6 1234P001.3
    2 1235P001
    5 1235P001.2
    4 1236P001
    Login or Signup to reply.
  2. Using this sample data to illustrate the concept

    create table tab (id varchar(8) );
    insert into tab(id) values 
    ('1234P001'), 
    ('1234P001'), 
    ('1234P001'), 
    ('1234P002'), 
    ('1234P004'), 
    ('1234P004'),
    ('1234P005');
    

    First you need to identify the duplicated key – use count .. over

    select id,
    count(*) over (partition by id) > 1  is_dup
    from tab;
    
    id      |is_dup|
    --------+------+
    1234P001|true  |
    1234P001|true  |
    1234P001|true  |
    1234P002|false |
    1234P004|true  |
    1234P004|true  |
    1234P005|false |
    

    Assign each duplicated row a unique sequence number (you’ll see soon why)

    with dup as (
    select id,
    count(*) over (partition by id) > 1  is_dup
    from tab
    )
    select id,  
    row_number() over (order by id) dup_idx
    from dup
    where is_dup;
    
    id      |dup_idx|
    --------+-------+
    1234P001|      1|
    1234P001|      2|
    1234P001|      3|
    1234P004|      4|
    1234P004|      5|
    

    Now generate all not existing keys based on you key schema (here prefix of length 5 and 3 digit integer)

    with free_key as (
    select distinct substring(id,1,5)||lpad(idx::text,3,'0') id 
    from tab
    cross join generate_series(1,10) as t(idx) /* increase the count up to 999 if required */
    except 
    select id from tab)
    select id,
    row_number() over (order by id) free_id_idx
    from free_key
    
    id      |free_id_idx|
    --------+-----------+
    1234P003|          1|
    1234P006|          2|
    1234P007|          3|
    1234P008|          4|
    1234P009|          5|
    1234P010|          6|
    

    In the last step simple join the table with duplicated keys with the unassigned key using teh unique index to get the resolution old_id and the unique new_id

    Note I use an outer join – if you get an empty new_id there is a problem you have no free key to fix in your schema.

    with dup as (
    select id,
    count(*) over (partition by id) > 1  is_dup
    from tab
    ),
    dup2 as (
    select id,  
    row_number() over (order by id) dup_idx
    from dup
    where is_dup),
    free_key as (
    select distinct substring(id,1,5)||lpad(idx::text,3,'0') id 
    from tab
    cross join generate_series(1,10) as t(idx) /* increase the count up to 999 if required */
    except 
    select id from tab),
    free_key2 as (
    select id,
    row_number() over (order by id) free_id_idx
    from free_key)
    select dup2.id old_id, free_key2.id new_id
    from dup2
    left outer join free_key2
    on dup2.dup_idx = free_key2.free_id_idx;
    
    old_id  |new_id  |
    --------+--------+
    1234P001|1234P003|
    1234P001|1234P006|
    1234P001|1234P007|
    1234P004|1234P008|
    1234P004|1234P009|
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search