I am trying to build a query to remove duplicate records and retain the latest one in PostgreSQL.
Input : Object_Name,Id,name,insert_date,update_date
a,NULL,NULL,2023-08-28,
a,NULL,NULL,2023-08-29,
b,test,test1,,2023-08-29
a,NULL,NULL,2023-08-30,
Requirement: Need to delete duplicate rows where object name is same and Id and name is null and need to retain the latest record based on insert date.
Expected output:
b,test,test1,,2023-08-29
a,NULL,NULL,2023-08-30
I tried the query below:
DELETE FROM table
WHERE object_name IN (SELECT object_name FROM table EXCEPT SELECT max(object_name) FROM table GROUP BY object_name)
But it’s not working as expected.
2 Answers
To remove duplicate records and retain the latest one based on the insert_date, you can combine common table expressions (CTEs) and a self-join. Here’s how you can construct the query:
Replace your_table with the actual name of your table.
Here’s how the query works:
The LatestRecords CTE selects distinct rows with non-null Id and name where the object_name appears. It orders the rows by object_name and insert_date DESC, meaning that for each object_name, the latest record will be the first one.
The DELETE statement joins your_table with the LatestRecords CTE on the object_name and checks if the insert_date of the row in your_table is earlier than the insert_date of the latest record in LatestRecords. If this condition is met, the row in your_table will be deleted.
This query will remove the duplicate rows while keeping the latest one for each object_name.
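As a rough reconstruction of the approach described above (not necessarily the exact query this answer had in mind), a CTE plus a join in the DELETE might look like the sketch below. It assumes PostgreSQL, the placeholder table name your_table, and the column names object_name and insert_date from the question. It leaves out the non-null filter on Id and name mentioned in the description, because in the question the duplicates are exactly the rows where both are NULL; that omission, and the NULLS LAST ordering, are assumptions.

```sql
-- Sketch only: your_table and the column names are placeholders taken from the question.
WITH LatestRecords AS (
    -- One row per object_name: the one with the most recent insert_date.
    -- NULLS LAST keeps a row with a missing insert_date from being picked as "latest".
    SELECT DISTINCT ON (object_name) object_name, insert_date
    FROM your_table
    ORDER BY object_name, insert_date DESC NULLS LAST
)
DELETE FROM your_table AS t
USING LatestRecords AS lr
WHERE t.object_name = lr.object_name
  AND t.insert_date < lr.insert_date;  -- only strictly older rows are removed
```

Because the comparison is strict, rows that tie on the latest insert_date are all kept, and rows whose insert_date is NULL (such as the b row in the sample) are never deleted.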
One way to do this is to fetch every object name with the latest insert date in a subquery or a CTE and then to use NOT EXISTS to delete all other rows.
This query will do the first step:
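A minimal sketch of such a first-step query, assuming the placeholder table name your_table and the column names from the question:

```sql
-- Sketch only: your_table is a placeholder for the real table name.
-- Returns one row per object_name with its most recent insert_date.
SELECT object_name, MAX(insert_date) AS insert_date
FROM your_table
GROUP BY object_name;
```

With the sample data from the question, this yields 2023-08-30 for a and NULL for b, since b has no insert_date at all.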
This will be our subquery or CTE. We will keep exactly those rows selected by this query. We will delete all others:
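The delete step could then be sketched as follows, again assuming PostgreSQL and the placeholder table name your_table. The CTE name latest is made up for illustration. The comparison uses IS NOT DISTINCT FROM rather than = so that a row whose insert_date is NULL (like b in the sample) still matches its own group; that choice is an assumption about the desired behaviour.

```sql
-- Sketch only: your_table and the CTE name "latest" are placeholders.
WITH latest AS (
    SELECT object_name, MAX(insert_date) AS insert_date
    FROM your_table
    GROUP BY object_name
)
DELETE FROM your_table AS t
WHERE NOT EXISTS (
    -- Keep a row only if it is the latest one for its object_name.
    SELECT 1
    FROM latest AS l
    WHERE l.object_name = t.object_name
      AND l.insert_date IS NOT DISTINCT FROM t.insert_date  -- NULL-safe equality
)
-- add this if necessary:
-- AND t.id IS NULL AND t.name IS NULL
;
```

On the sample data this removes the two a rows dated 2023-08-28 and 2023-08-29 and keeps the a row dated 2023-08-30 as well as the b row.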
Note: You wrote this as your requirement:
Need to delete duplicate rows where object name is same and Id and name is null and need to retain the latest record based on insert date
I don’t know if the part about Id and name being null is just meant to describe your desired result after the deletion or if it should actually be a condition.
That’s why I wrote the part "add this if necessary: …" in the WHERE clause of the above delete command. It’s up to you to use that part or not.
Anyway, we can check this is working as expected on this db<>fiddle with your sample data.