ID | Amount | Brand |
---|---|---|
1 | 10 | NULL |
1 | 20 | NULL |
2 | 30 | Mazada |
2 | NULL | BMW |
3 | 40 | NULL |
3 | 40 | KIA |
4 | NULL | Honda |
4 | NULL | Honda |
Above is the original table, and my goal is to find out any difference within the same ID for each column.
In the same ID group, if there is any difference in the column, creating a new column and then specify which column has difference.
The outcome should be like this
ID | Difference |
---|---|
1 | Amount |
2 | Amount |
2 | Brand |
3 | Brand |
My code for PostgreSQL
SELECT
ID,
'Amount' AS Difference
FROM
table
GROUP BY
ID
HAVING
COUNT(DISTINCT amount) > 1
OR (COUNT(amount) != COUNT(*) AND COUNT(DISTINCT amount) > 0)
UNION ALL
SELECT
ID,
'Brand' AS Difference
FROM
table
GROUP BY
ID
HAVING
COUNT(DISTINCT brand) > 1
OR (COUNT(brand) != COUNT(*) AND COUNT(DISTINCT brand) > 0)
Be careful that there is NULL value in the record, so that the aggregation function would not count the NULL value. The NULL vs NULL view it as same, but NULL vs any value view it as difference.
Is there any better or faster query for this solution? Like lateral join or window function? Thanks
2
Answers
Counting distinct values — including nulls — is easy. Add 1 to distinct count if you have at least one null. That being said, your query could as follows, making it easier to expand for multiple columns without doing union:
DB<>Fiddle
A single pass over the table should be quite a bit faster:
fiddle
Plus,
count(DISTINCT ...)
is notoriously expensive. And we don’t actually need that count. We just need to know if the min value differs from the max (or does not, but there is also at least oneNULL
).Very similar case with more explanation:
Looks like you have seen that one, but not yet applied to your case.
Like I mentioned over there: there are faster ways for many rows per group …