Postgresql - psql - Resolve Duplicates without deleting, set max_identifier for the duplicates based on an account_id

Azar
February 11, 2024
107 views
1 vote
2 Answers

I am attempting to address these duplicates without removing the second entry. I am seeking a solution where duplicate records of an identifier would be assigned the maximum identifier value for the account_id.

select account_id, identifier, count(*)
from books                                           
group by account_id, identifier   
HAVING count(*) > 1;

 account_id | identifier | count 
------------+------------+-------
        111 |        155 |     2
        111 |        198 |     2
        111 |        178 |     2
        111 |        167 |     2
        111 |        196 |     2
        111 |        156 |     2
        111 |        150 |     2
        111 |        223 |     2

For Example: (Processing only the first record)
(Consider the maxidentifier as 223 based on the account_id)
For the below record

 account_id | identifier | count 
------------+------------+-------
        111 |        155 |     2

it should set the identifier as (max_identifier+1)

 account_id | identifier | count 
------------+------------+-------
        111 |        155 |     1
        111 |        224 |     1

Likewise it should do this in the loop for all the records without breaking other Identifiers and records.

#
# Table name: books
#
#  id                 :bigint           not null, primary key
#  account_id         :bigint           not null
#  identifier         :bigint           not null

# Indexes
#  unique_account_identifier             (account_id,identifier) UNIQUE

Answers

Chosen as BEST ANSWER

with max_id as (select account_id,max(identifier) last_identifier from books group by 1
            )
            
update books set identifier = last_identifier + r_id 
from (SELECT id,last_identifier, ROW_NUMBER() OVER (PARTITION BY sub.account_id order by id) r_id
    FROM (
        SELECT id, account_id, identifier, ROW_NUMBER() OVER (PARTITION BY account_id, identifier order by id) AS rn
        FROM invoices
    ) sub
 left join max_id on sub.account_id = max_id.account_id
    WHERE rn > 1)a 
where a.id= books.id;

This resolved the Duplicate issue without Deleting.This Query updates the duplicates with the max_identifier of an acocunt_id.

(Edit)

This should do it:

UPDATE books
SET identifier = new_identifier
FROM (
  SELECT
    id,
    (
      SELECT MAX(identifier) FROM books WHERE account_id = dup.account_id
    ) + (
      ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY id)
    ) AS new_identifier
  FROM books dup
  WHERE EXISTS(
    SELECT *
    FROM books orig
    WHERE dup.account_id = orig.account_id
      AND dup.identifier = orig.identifier
      AND orig.id < dup.id
  )
) to_update
WHERE books.id = to_update.id

^{(online demo)}

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – psql – Resolve Duplicates without deleting, set max_identifier for the duplicates based on an account_id

Answers