Postgresql - Query optimization for this table

CCM
March 7, 2024
147 views
1 vote
2 Answers

ID	Amount	Brand
1	10	NULL
1	20	NULL
2	30	Mazada
2	NULL	BMW
3	40	NULL
3	40	KIA
4	NULL	Honda
4	NULL	Honda

Above is the original table, and my goal is to find out any difference within the same ID for each column.

In the same ID group, if there is any difference in the column, creating a new column and then specify which column has difference.

The outcome should be like this

ID	Difference
1	Amount
2	Amount
2	Brand
3	Brand

My code for PostgreSQL

SELECT
    ID,
    'Amount' AS Difference
FROM
    table
GROUP BY
    ID
HAVING
    COUNT(DISTINCT amount) > 1
    OR (COUNT(amount) != COUNT(*) AND COUNT(DISTINCT amount) > 0)

UNION ALL 

SELECT
    ID,
    'Brand' AS Difference
FROM
    table
GROUP BY
    ID
HAVING
    COUNT(DISTINCT brand) > 1
    OR (COUNT(brand) != COUNT(*) AND COUNT(DISTINCT brand) > 0)

Be careful that there is NULL value in the record, so that the aggregation function would not count the NULL value. The NULL vs NULL view it as same, but NULL vs any value view it as difference.

Is there any better or faster query for this solution? Like lateral join or window function? Thanks

Answers

- SalmanA
- March 7, 2024 at 9:52 am
- 0 votes
0
Counting distinct values — including nulls — is easy. Add 1 to distinct count if you have at least one null. That being said, your query could as follows, making it easier to expand for multiple columns without doing union:
```
with cte as (
    select id
         , count(distinct amount) + max(case when amount is null then 1 else 0 end) > 1 as a_is_different
         , count(distinct brand)  + max(case when brand  is null then 1 else 0 end) > 1 as b_is_different
    from t
    group by id
)
select id, column_name
from cte, lateral (values
    ('amount', a_is_different),
    ('brand', b_is_different)
) as v(column_name, is_different)
where is_different
```
DB<>Fiddle
Login or Signup to reply.

- ErwinBrandstetter
- March 7, 2024 at 9:55 am
- 0 votes
0
A single pass over the table should be quite a bit faster:
```
SELECT id, col
FROM  (
   SELECT id
        , min(amount) <> max(amount) AS a_diff, bool_or(true) FILTER (WHERE amount IS NULL) AS a_null
        , min(brand)  <> max(brand)  AS b_diff, bool_or(true) FILTER (WHERE brand  IS NULL) AS b_null
   FROM   tbl
   GROUP  BY id
   ) t
JOIN   LATERAL (
   VALUES
     ('Amount', a_diff, a_null)
   , ('Brand' , b_diff, b_null)
   ) x(col, diff, has_null) ON (diff OR NOT diff AND has_null)
ORDER  BY id, col
```
fiddle

Plus, count(DISTINCT ...) is notoriously expensive. And we don’t actually need that count. We just need to know if the min value differs from the max (or does not, but there is also at least one NULL).

Very similar case with more explanation:
- Optimize query for columns with distinct values per ID
Looks like you have seen that one, but not yet applied to your case.
Like I mentioned over there: there are faster ways for many rows per group …
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Query optimization for this table

Answers