How to group by one column and limit to rows where another column has the same value for all rows in group? - Postgresql

mmdanziger
November 30, 2022
69 views
0 votes
2 Answers

I have a table like this

CREATE TABLE userinteractions
(
userid  bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);

My problem is that some of the data is polluted with multiple dobyr values for the same user.

The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.

I want to be able to create a clean table that contains unique userid and dobyr limited to the cases where there is only one value of dobyr for the userid in userinteractions.

For example I start with data like this:

userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989

And I want to select from this to get a table like this:

userid,dobyr
1,1995
2,1999
4,1989

Is there an elegant, efficient way to get this in a single sql query?

I am using postgres.

EDIT: I do not have permissions to modify the userinteractions table, so I need a SELECT solution, not a DELETE solution.

Answers

- Zegarek
- November 30, 2022 at 3:07 pm
- 0 votes
0
Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:
1. If there are many rows with the same userid value but also the same dobyr value, one of them is kept (doesn’t matter which one), rest gets discarded.
2. All rows for a given userid are discarded if it occurs with different dobyr values.
```
create table userinteractions_clean as 
select distinct on (userid,dobyr) * 
from userinteractions
where userid in ( 
    select   userid
    from     userinteractions
    group by userid
    having   count(distinct dobyr)=1 )
order by userid,dobyr;
```
This could also be done with an not in, not exists or exists conditions. Also, select which combination to keep by adding columns at the end of order by.
Updated demo with tests and more rows.

If you don’t need the other columns in the table, only something you’ll later use as a filter/whitelist, plain userid‘s from records with (userid,dobyr) pairs matching your criteria are enough, as they already uniquely identify those records:
```
create table userinteractions_whitelist as
select   userid
from     userinteractions
group by userid
having   count(distinct dobyr)=1
```
Login or Signup to reply.

- MatBailie
- November 30, 2022 at 3:22 pm
- 0 votes
0
Just use a HAVING clause to assert that all rows in a group must have the same dobyr.
```
SELECT
  userid,
  MAX(dobyr) AS dobyr
FROM
  userinteractions
GROUP BY
  userid
HAVING
  COUNT(DISTINCT dobyr) = 1
```
Login or Signup to reply.