Postgresql non distinct pairs across multiple columns

convolution
March 29, 2024
304 views
3 votes
3 Answers

I want to return records from a table in postgresql that when a distinct value in column 1 has more then one different values in column 2 spread over multiple records.

As a simple example:

table1:

column1 - column2
1 – 1
1 – 2
2 – 1
2 – 1 
3 - 3
4 - 5

return:

1 - 1
1 - 2

There is a lot of examples showing distinct (such as this), but I’m looking for non distinct.

select distinct (column2, column1) my_distinct from table1;

Tags: postgresql sql

Answers

- Marth
- March 29, 2024 at 3:05 pm
- 0 votes
0
You can use a CTE to get all the values of column1 you’re looking for, then filter the table:

Schema (PostgreSQL v15)
```
CREATE TABLE t (
  column1 INTEGER,
  column2 INTEGER
);

INSERT INTO t
  (column1, column2)
VALUES
  (1, 1),
  (1, 2),
  (2, 1),
  (2, 1),
  (3, 3),
  (4, 5);
```
Query #1
```
WITH cte AS (
  SELECT column1
  FROM t
  GROUP BY 1
  HAVING COUNT(DISTINCT column2) > 1
)
SELECT column1, column2
FROM t
JOIN cte USING (column1);
```
column1 column2

1 1

1 2

Alternatively you can create a list of the column2 values (per column1) and unnest that, which avoids scanning the table a second time.

Query #2
```
SELECT column1, unnest(array_agg(column2)) AS column2
FROM t
GROUP BY 1
HAVING COUNT(DISTINCT column2) > 1;
```
column1 column2

1 1

1 2

View on DB Fiddle
Login or Signup to reply.

- Error_2646
- March 29, 2024 at 4:24 pm
- 0 votes
0
This is a vanilla way to do it. Just an exists to check for records where there is a different value on column2 for the same column1.

Fiddle: https://dbfiddle.uk/Zf9vTS9K
```
CREATE TABLE t (
  column1 INTEGER,
  column2 INTEGER
);

INSERT INTO t
  (column1, column2)
VALUES
  (1, 1),
  (1, 2),
  (2, 1),
  (2, 1),
  (3, 3),
  (4, 5);

select * 
  from t t1
 where exists
         (
           select 1
             from t t2
            where t1.column1 = t2.column1
              and t1.column2 <> t2.column2
         );
```
Login or Signup to reply.

- Charlieface
- March 29, 2024 at 4:54 pm
- 0 votes
0
You should use window functions for this, it is likely far more efficient than joins.

Unfortunately, COUNT(DISTINCT is not implemented for window functions, so you need to hack it with DENSE_RANK and MAX
```
with ranked as (
    select *,
      dense_rank() over (partition by column1 order by column2) as dr 
    from t
),
maxed as (
    select *,
      max(dr) over (partition by column1) as max_dr
    from ranked
)
select
  column1,
  column2
from maxed
where max_dr > 1;
```
Another option in your specific case is to compare counts with two different partitionings.
```
with counted as (
    select *,
      count(*) over (partition by column1, column2) as c1,
      count(*) over (partition by column1) as c2
    from t
)
select
  column1,
  column2
from counted
where c1 < c2;
```
db<>fiddle
Login or Signup to reply.

column1	column2
1	1
1	2

column1	column2
1	1
1	2

Please signup or login to give your own answer.

Click here to cancel reply.