I have a table like this
CREATE TABLE userinteractions
(
userid bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);
My problem is that some of the data is polluted with multiple dobyr
values for the same user.
The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.
I want to be able to create a clean table that contains unique userid
and dobyr
limited to the cases where there is only one value of dobyr
for the userid
in userinteractions
.
For example I start with data like this:
userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989
And I want to select from this to get a table like this:
userid,dobyr
1,1995
2,1999
4,1989
Is there an elegant, efficient way to get this in a single sql query?
I am using postgres.
EDIT: I do not have permissions to modify the userinteractions
table, so I need a SELECT
solution, not a DELETE
solution.
2
Answers
Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:
userid
value but also the samedobyr
value, one of them is kept (doesn’t matter which one), rest gets discarded.userid
are discarded if it occurs with differentdobyr
values.This could also be done with an
not in
,not exists
orexists
conditions. Also, select which combination to keep by adding columns at the end oforder by
.Updated demo with tests and more rows.
If you don’t need the other columns in the table, only something you’ll later use as a filter/whitelist, plain
userid
‘s from records with(userid,dobyr)
pairs matching your criteria are enough, as they already uniquely identify those records:Just use a HAVING clause to assert that all rows in a group must have the same dobyr.