I want to delete duplicates from a Redshift table that are true duplicates. Below is an example of two rows that are true duplicates.
Since it is Redshift, there are no primary keys to the table. Any help is appreciated.
id | Col 1 | Col 2 |
---|---|---|
1 | Val 1 | Val 2 |
1 | Val 1 | Val 2 |
I tried using window functions row_number()
, rank()
. Neither worked as when applying Delete command, SQL command cannot differentiate both rows.
Trial 1:
The below command deletes both rows
DELETE From test_table
where (id) IN
(
select *,row_number() over(partition by id) as rownumber from test*table where row*number !=1
);
Trial 2:
The below command retains both rows.
DELETE From test_table
where (id) IN
(
select *,rank() over(partition by id) as rownumber from test*table where row*number !=1
);
2
Answers
All row values are identical. Hence you unable to delete specific rows in that table.
In that I would recommend to create dummy table, and load unique records.
Steps to follow:
create table dummy as select * from main_table where 1=2
insert into dummy(col1,col2..coln) select distinct col1,col2..coln from main_table;
Alter table main_table rename to main_table_bk
alter table dummy rename to main.
drop main_table_bk
Hope it will help.
You cannot delete one without deleting the other as they are identical. The way to do this is to: