skip to Main Content

I have a table like this

CREATE TABLE userinteractions
(
userid  bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);

My problem is that some of the data is polluted with multiple dobyr values for the same user.

The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.

I want to be able to create a clean table that contains unique userid and dobyr limited to the cases where there is only one value of dobyr for the userid in userinteractions.

For example I start with data like this:

userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989

And I want to select from this to get a table like this:

userid,dobyr
1,1995
2,1999
4,1989

Is there an elegant, efficient way to get this in a single sql query?

I am using postgres.

EDIT: I do not have permissions to modify the userinteractions table, so I need a SELECT solution, not a DELETE solution.

2

Answers


  1. Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:

    1. If there are many rows with the same userid value but also the same dobyr value, one of them is kept (doesn’t matter which one), rest gets discarded.
    2. All rows for a given userid are discarded if it occurs with different dobyr values.
    create table userinteractions_clean as 
    select distinct on (userid,dobyr) * 
    from userinteractions
    where userid in ( 
        select   userid
        from     userinteractions
        group by userid
        having   count(distinct dobyr)=1 )
    order by userid,dobyr;
    

    This could also be done with an not in, not exists or exists conditions. Also, select which combination to keep by adding columns at the end of order by.
    Updated demo with tests and more rows.

    If you don’t need the other columns in the table, only something you’ll later use as a filter/whitelist, plain userid‘s from records with (userid,dobyr) pairs matching your criteria are enough, as they already uniquely identify those records:

    create table userinteractions_whitelist as
    select   userid
    from     userinteractions
    group by userid
    having   count(distinct dobyr)=1
    
    Login or Signup to reply.
  2. Just use a HAVING clause to assert that all rows in a group must have the same dobyr.

    SELECT
      userid,
      MAX(dobyr) AS dobyr
    FROM
      userinteractions
    GROUP BY
      userid
    HAVING
      COUNT(DISTINCT dobyr) = 1
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search