I am using the row_number() functon to identify duplicates, partitioned by ID and Name:
SELECT *
, ROW_NUMBER() OVER(PARTITION BY "ID", "Name" ORDER BY "date" ASC) as row_number
from table;
My data looks like:
ID | Name | date | row_number |
---|---|---|---|
1 | Name1 | 2024-11-19 | 2 |
1 | Name1 | 2024-11-18 | 1 |
2 | Name1 | 2024-11-18 | 1 |
3 | Name2 | 2024-11-19 | 1 |
4 | Name3 | 2024-11-12 | 3 |
4 | Name3 | 2024-11-11 | 2 |
4 | Name3 | 2024-11-10 | 1 |
10 | Name7 | 2024-11-10 | 1 |
Now I want to keep only the rows where we have multiple times the same ID and Name combination.
So the third, fourth and last row should be removed and the output should be as below. How can this be done?
ID | Name | date | row_number |
---|---|---|---|
1 | Name1 | 2024-11-19 | 2 |
1 | Name1 | 2024-11-18 | 1 |
4 | Name3 | 2024-11-12 | 3 |
4 | Name3 | 2024-11-11 | 2 |
4 | Name3 | 2024-11-10 | 1 |
2
Answers
I would use
COUNT()
here as a window function, with the same partition as you were using withROW_NUMBER()
: