skip to Main Content

I am trying to build the query to remove duplicate records and retain the latest in postgres.

Input : Object_Name,Id,name,insert_date,update_date

a,NULL,NULL,2023-08-28,
a,NULL,NULL,2023-08-29,
b,test,test1,,2023-08-29
a,NULL,NULL,2023-08-30,

Output :Need to delete duplicate rows where object name is same and Id and name is null and need to retain the latest record based on insert date.

b,test,test1,,2023-08-29
a,NULL,NULL,2023-08-30

I am trying below but it’s not working.
Delete from table
where object_name in (select object_name from table except select max(object_name) from table group by object_name)
But it’s not working as expected.

2

Answers


  1. To remove duplicate records and retain the latest one based on the insert_date, you can combine common table expressions (CTEs) and a self-join. Here’s how you can construct the query:

    WITH LatestRecords AS (
        SELECT DISTINCT ON (object_name)
               object_name, Id, name, insert_date, update_date
        FROM your_table
        WHERE (Id IS NULL OR name IS NULL)
        ORDER BY object_name, insert_date DESC
    )
    DELETE FROM your_table
    USING LatestRecords
    WHERE your_table.object_name = LatestRecords.object_name
      AND your_table.insert_date < LatestRecords.insert_date;
    

    Replace your_table with the actual name of your table.

    Here’s how the query works:

    1. The LatestRecords CTE selects distinct rows with non-null Id and name where the object_name appears. It orders the rows by object_name and insert_date DESC, meaning that for each object_name, the latest record will be the first one.
    2. The main DELETE statement joins the your_table with the LatestRecords CTE on the object_name and checks if the insert_date of the row in your_table is earlier than the insert_date of the latest record in LatestRecords. If this condition is met, the row in your_table will be deleted.

    This query will remove the duplicate rows while keeping the latest one for each object_name.

    Login or Signup to reply.
  2. One way to do this is to fetch every object name with the latest insert date in a subquery or a CTE and then to use NOT EXISTS to delete all other rows.

    This query will do the first step:

    SELECT 
      Object_Name, 
      MAX(insert_date) AS insert_date
    FROM input
    GROUP BY Object_Name;
    

    This will be our subquery or CTE. We will keep exactly those rows selected by this query. We will delete all others:

    WITH maxData AS
      (SELECT 
         Object_Name, 
         MAX(insert_date) AS insert_date
       FROM input
       GROUP BY Object_Name)
    DELETE FROM input i
    WHERE -- add this if necessary: id IS NULL AND name IS NULL AND
      NOT EXISTS 
        (SELECT 1 FROM maxData md WHERE
          i.object_name = md.object_name
          AND i.insert_date = md.insert_date);
    

    Note: You wrote this as your requirement:

    Need to delete duplicate rows where object name is same and Id and name is null and need to retain the latest record based on insert date

    I don’t know if the bold part is just to make clear your desired result after the deletion or if it should actually be a condition.

    That’s why I wrote the part add this if necessary:… in the WHERE clause of the above delete command. It’s up to you to use that part or not.

    Anyway, we can check this is working as expected on this db<>fiddle with your sample data.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search