skip to Main Content

I’m currently trying to find a way to return all rows in which there is at least one other row with the same values in any number of columns and unique values in any number of other columns.

I want a generic approach to this to be able to apply it to any scenario in which I want to get all rows that have another row with a matching id or multiple id columns but also has all distinct values for another column or other columns.

My current challenge is trying to get rows from a license db where each row is a license. I want to see if multiple users have two of the same kind of license active. Therefore, I want to get all rows that represent active licenses that have at least one other row with a matching email and license type, but different license ids. This is so I can tell if a user is being charged twice for the same type of license. Here is an example table called Licenses (assume there is a unique primary key):

id type email uid
76 B bob 12
96 C bob 12
1 A bob 11
1 A bob 11
3 A bob 11
3 A bob 12
90 A bob 12
99 A joe 14
5 B joe 15
6 B joe 15
7 B joe 16
12 A pat 23
13 A pat 23
57 A ira 47
57 A ira 47
60 A ian 99

This is what I want to get:

id type email uid
1 A bob 11
3 A bob 11
90 A bob 12
5 B joe 15
6 B joe 15
7 B joe 16
12 A pat 23
13 A pat 23

But I am getting this (notice the the extra rows for bob):

id type email uid
1 A bob 11
1 A bob 11
3 A bob 11
3 A bob 12
90 A bob 12
5 B joe 15
6 B joe 15
7 B joe 16
12 A pat 23
13 A pat 23

So, ira was able to be filtered out properly since all records for ira and type A have the id of 57. BUT, bob has 2 extra rows included since the id differs from at least one other id with the same email and type. How can I write a query that will not return these duplicate IDs? This is the query I tried:

SELECT * FROM Licenses t1
WHERE EXISTS (
    SELECT 1 FROM Licenses t2
    WHERE t1.email = t2.email 
    AND t1.type = t2.type
    AND t1.id <> t2.id
) ORDER BY email;

How would I write another query that further filters on column x to make sure x is unique for all the entries as well?

Is there a generic way to approach a problem like this?

Thanks!

2

Answers


  1. This should work:

    with dup_licenses as (
        select email, type, id
        from licenses
        group by email, type, id
        having count(1) >1
    )
    SELECT * FROM Licenses t1
    INNER JOIN DUP_LICENSES dl ON
        t1.email = dl.email 
        AND t1.type =dl.type
        AND t1.id = dl.id
    ) ORDER BY t1.email;
    

    if you want to change the columns that define duplicates then just add/remove them from the CTE SELECT/GROUP BY and amend the join condition

    Login or Signup to reply.
  2. Try the following:

    select id, type, email, uid 
    from
    (
      select *,
        min(id) over (partition by type, email) min_id,
        max(id) over (partition by type, email) max_id,
        row_number() over (partition by id, type, email order by uid)  rn -- or maybe order by uid desc
      from tbl_name
    ) t
    where min_id <> max_id -- this is to ensure that there are multiple ids for each (type, email) group
      and rn = 1           -- this is to select only one row for each (id, type, email) group
    order by type, email, id
    

    demo

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search