skip to Main Content

I’m using Postgresql,and I have a table named stock,this table has no index or id:

open high low close volume datetime
383.97 384.22 383.66 384.08 1298649 2022-12-16 14:25:00
383.59 384.065 383.45 383.98 991327 2022-12-16 14:20:00
383.59 384.065 383.45 383.98 991327 2022-12-16 14:20:00
383.59 384.065 383.45 383.98 991327 2022-12-16 14:20:00
383.64 384.2099 383.54 383.61 1439271 2022-12-16 14:15:00

How can I remove the rows that have duplicated datetime ,and only keep 1 row of it,only keep the latest row of it,by using Postgresql sql ?

The output should be:

open high low close volume datetime
383.97 384.22 383.66 384.08 1298649 2022-12-16 14:25:00
383.59 384.065 383.45 383.98 991327 2022-12-16 14:20:00
383.64 384.2099 383.54 383.61 1439271 2022-12-16 14:15:00

Something like:

delete from stock where datetime duplicated > 1

2

Answers


  1. One possible option to solve this problem is to:

    • store one row of each of your duplicate records in a temporary table
    • delete all records having duplicates from the original table
    • adding duplicate records back from the temporary table to the original table

    These steps can be condensed in the following three queries:

    CREATE TEMPORARY TABLE deleted_values AS 
    SELECT *
    FROM tab
    GROUP BY open, high, low, close, volume, datetime
    HAVING COUNT(*) > 1;
    
    DELETE FROM tab
    USING deleted_values
    WHERE tab.open = deleted_values.open
      AND tab.high = deleted_values.high
      AND tab.low = deleted_values.low
      AND tab.close = deleted_values.close
      AND tab.datetime = deleted_values.datetime;
    
    INSERT INTO tab 
    SELECT * FROM deleted_values;
    

    Check the demo here.

    Login or Signup to reply.
  2. As your table does not contain the primary key you’ll have to use ctid

    This query reports the duplication "index" – all rows with rn > 1 are duplicates

    select ctid, tab.*,
    row_number() over (partition by datatime order by open) rn
    from tab;
    
    ctid |open|datatime  |rn|
    -----+----+----------+--+
    (0,1)| 300|2022-12-16| 1|
    (0,2)| 310|2022-12-16| 2|
    (0,3)| 400|2022-12-17| 1|
    (0,4)| 500|2022-12-18| 1|
    (0,5)| 500|2022-12-18| 2| 
    

    Note that you set the partition by to your unique key and with order by you can controll which row will be preserved.

    Than you use the ctidof the duplicated row to get rid of them

    with t as (
    select ctid, tab.*,
    row_number() over (partition by datatime order by open) rn
    from tab
    )
    delete from tab where ctid in (
       select ctid from t where rn > 1
    );
    -- 2 rows deleted
    

    Sample (simplified) data

    create table tab as 
    select * from (values
    (300, DATE'2022-12-16'),
    (310, DATE'2022-12-16'),
    (400, DATE'2022-12-17'),
    (500, DATE'2022-12-18'),
    (500, DATE'2022-12-18')
    ) tab(open, datatime)
    ;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search