Removing duplicates in AWS Redshift using SQL

AIViz
December 15, 2022
145 views
0 votes
2 Answers

Summary:
I’ve got a table which looks something like this

Entity Id	Entity Name
01	Entity1
02	Entity2
01	Entity1
03	Entity3

Question
I’m trying to remove duplicates using SQL proc but struggling to find a solution. Is there a way to remove duplicates in this case using just SQL? Ideally, I’d like to create a procedure which I can call when required to remove duplicates.

I tried using CTE and row_number() but struggling to get it working…

Answers

- TimBiegeleisen
- December 15, 2022 at 4:33 pm
- 0 votes
0
If I glean your data correctly, a simple distinct select should work here:
```
SELECT DISTINCT EntityId, EntityName
FROM yourTable;
```
Login or Signup to reply.

- BillWeiner
- December 15, 2022 at 5:06 pm
- 0 votes
0
Your on the right track with row_number() but it looks like you are trying to delete all but one of the duplicated rows. This isn’t possible since there is no way to tell one copy from another. You will need to do this in a few steps.
1. Save a single copy of duplicated rows into a temp table. Using row_number() where this value equals 2 is a good way.
2. Delete all matching rows from the original table.
3. Insert the values from the temp table back into the original table.
You can do this in a transaction block so that no other user of the database will see the table with the rows missing.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.