skip to Main Content

I am attempting to address these duplicates without removing the second entry. I am seeking a solution where duplicate records of an identifier would be assigned the maximum identifier value for the account_id.

select account_id, identifier, count(*)
from books                                           
group by account_id, identifier   
HAVING count(*) > 1;

 account_id | identifier | count 
------------+------------+-------
        111 |        155 |     2
        111 |        198 |     2
        111 |        178 |     2
        111 |        167 |     2
        111 |        196 |     2
        111 |        156 |     2
        111 |        150 |     2
        111 |        223 |     2

For Example: (Processing only the first record)
(Consider the maxidentifier as 223 based on the account_id)
For the below record

 account_id | identifier | count 
------------+------------+-------
        111 |        155 |     2

it should set the identifier as (max_identifier+1)

 account_id | identifier | count 
------------+------------+-------
        111 |        155 |     1
        111 |        224 |     1

Likewise it should do this in the loop for all the records without breaking other Identifiers and records.

#
# Table name: books
#
#  id                 :bigint           not null, primary key
#  account_id         :bigint           not null
#  identifier         :bigint           not null

# Indexes
#  unique_account_identifier             (account_id,identifier) UNIQUE

2

Answers


  1. Chosen as BEST ANSWER
    with max_id as (select account_id,max(identifier) last_identifier from books group by 1
                )
                
    update books set identifier = last_identifier + r_id 
    from (SELECT id,last_identifier, ROW_NUMBER() OVER (PARTITION BY sub.account_id order by id) r_id
        FROM (
            SELECT id, account_id, identifier, ROW_NUMBER() OVER (PARTITION BY account_id, identifier order by id) AS rn
            FROM invoices
        ) sub
     left join max_id on sub.account_id = max_id.account_id
        WHERE rn > 1)a 
    where a.id= books.id;
    

    This resolved the Duplicate issue without Deleting.This Query updates the duplicates with the max_identifier of an acocunt_id.


  2. This should do it:

    UPDATE books
    SET identifier = new_identifier
    FROM (
      SELECT
        id,
        (
          SELECT MAX(identifier) FROM books WHERE account_id = dup.account_id
        ) + (
          ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY id)
        ) AS new_identifier
      FROM books dup
      WHERE EXISTS(
        SELECT *
        FROM books orig
        WHERE dup.account_id = orig.account_id
          AND dup.identifier = orig.identifier
          AND orig.id < dup.id
      )
    ) to_update
    WHERE books.id = to_update.id
    

    (online demo)

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search