Postgresql how to remove duplicated rows by columns value from a table without index or id

William
December 17, 2022
245 views
0 votes
2 Answers

I’m using Postgresql,and I have a table named stock,this table has no index or id:

open	high	low	close	volume	datetime
383.97	384.22	383.66	384.08	1298649	2022-12-16 14:25:00
383.59	384.065	383.45	383.98	991327	2022-12-16 14:20:00
383.59	384.065	383.45	383.98	991327	2022-12-16 14:20:00
383.59	384.065	383.45	383.98	991327	2022-12-16 14:20:00
383.64	384.2099	383.54	383.61	1439271	2022-12-16 14:15:00

How can I remove the rows that have duplicated datetime ,and only keep 1 row of it,only keep the latest row of it,by using Postgresql sql ?

The output should be:

open	high	low	close	volume	datetime
383.97	384.22	383.66	384.08	1298649	2022-12-16 14:25:00
383.59	384.065	383.45	383.98	991327	2022-12-16 14:20:00
383.64	384.2099	383.54	383.61	1439271	2022-12-16 14:15:00

Something like:

delete from stock where datetime duplicated > 1

Answers

- lemon
- December 17, 2022 at 7:10 pm
- 0 votes
0
One possible option to solve this problem is to:
- store one row of each of your duplicate records in a temporary table
- delete all records having duplicates from the original table
- adding duplicate records back from the temporary table to the original table
These steps can be condensed in the following three queries:
```
CREATE TEMPORARY TABLE deleted_values AS 
SELECT *
FROM tab
GROUP BY open, high, low, close, volume, datetime
HAVING COUNT(*) > 1;

DELETE FROM tab
USING deleted_values
WHERE tab.open = deleted_values.open
  AND tab.high = deleted_values.high
  AND tab.low = deleted_values.low
  AND tab.close = deleted_values.close
  AND tab.datetime = deleted_values.datetime;

INSERT INTO tab 
SELECT * FROM deleted_values;
```
Check the demo here.
Login or Signup to reply.

As your table does not contain the primary key you’ll have to use ctid

This query reports the duplication "index" – all rows with rn > 1 are duplicates

select ctid, tab.*,
row_number() over (partition by datatime order by open) rn
from tab;

ctid |open|datatime  |rn|
-----+----+----------+--+
(0,1)| 300|2022-12-16| 1|
(0,2)| 310|2022-12-16| 2|
(0,3)| 400|2022-12-17| 1|
(0,4)| 500|2022-12-18| 1|
(0,5)| 500|2022-12-18| 2|

Note that you set the partition by to your unique key and with order by you can controll which row will be preserved.

Than you use the ctidof the duplicated row to get rid of them

with t as (
select ctid, tab.*,
row_number() over (partition by datatime order by open) rn
from tab
)
delete from tab where ctid in (
   select ctid from t where rn > 1
);
-- 2 rows deleted

Sample (simplified) data

create table tab as 
select * from (values
(300, DATE'2022-12-16'),
(310, DATE'2022-12-16'),
(400, DATE'2022-12-17'),
(500, DATE'2022-12-18'),
(500, DATE'2022-12-18')
) tab(open, datatime)
;

Please signup or login to give your own answer.

Click here to cancel reply.