skip to Main Content

I want to return records from a table in postgresql that when a distinct value in column 1 has more then one different values in column 2 spread over multiple records.

As a simple example:

table1:

column1 - column2
1 – 1
1 – 2
2 – 1
2 – 1 
3 - 3
4 - 5

return:

1 - 1
1 - 2

There is a lot of examples showing distinct (such as this), but I’m looking for non distinct.

select distinct (column2, column1) my_distinct from table1;

3

Answers


  1. You can use a CTE to get all the values of column1 you’re looking for, then filter the table:

    Schema (PostgreSQL v15)

    CREATE TABLE t (
      column1 INTEGER,
      column2 INTEGER
    );
    
    INSERT INTO t
      (column1, column2)
    VALUES
      (1, 1),
      (1, 2),
      (2, 1),
      (2, 1),
      (3, 3),
      (4, 5);
    

    Query #1

    WITH cte AS (
      SELECT column1
      FROM t
      GROUP BY 1
      HAVING COUNT(DISTINCT column2) > 1
    )
    SELECT column1, column2
    FROM t
    JOIN cte USING (column1);
    
    column1 column2
    1 1
    1 2

    Alternatively you can create a list of the column2 values (per column1) and unnest that, which avoids scanning the table a second time.

    Query #2

    SELECT column1, unnest(array_agg(column2)) AS column2
    FROM t
    GROUP BY 1
    HAVING COUNT(DISTINCT column2) > 1;
    
    column1 column2
    1 1
    1 2

    View on DB Fiddle

    Login or Signup to reply.
  2. This is a vanilla way to do it. Just an exists to check for records where there is a different value on column2 for the same column1.

    Fiddle: https://dbfiddle.uk/Zf9vTS9K

    CREATE TABLE t (
      column1 INTEGER,
      column2 INTEGER
    );
    
    INSERT INTO t
      (column1, column2)
    VALUES
      (1, 1),
      (1, 2),
      (2, 1),
      (2, 1),
      (3, 3),
      (4, 5);
    
    select * 
      from t t1
     where exists
             (
               select 1
                 from t t2
                where t1.column1 = t2.column1
                  and t1.column2 <> t2.column2
             );
    
    Login or Signup to reply.
  3. You should use window functions for this, it is likely far more efficient than joins.

    Unfortunately, COUNT(DISTINCT is not implemented for window functions, so you need to hack it with DENSE_RANK and MAX

    with ranked as (
        select *,
          dense_rank() over (partition by column1 order by column2) as dr 
        from t
    ),
    maxed as (
        select *,
          max(dr) over (partition by column1) as max_dr
        from ranked
    )
    select
      column1,
      column2
    from maxed
    where max_dr > 1;
    

    Another option in your specific case is to compare counts with two different partitionings.

    with counted as (
        select *,
          count(*) over (partition by column1, column2) as c1,
          count(*) over (partition by column1) as c2
        from t
    )
    select
      column1,
      column2
    from counted
    where c1 < c2;
    

    db<>fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search