I’m trying to get a distinct count of multiple columns grouped by another column, but I want the results to only include counts greater than ONE. So if I have the following:
SELECT *
FROM cast
ORDER BY cast_characters, cast_identities, cast_roles
cast_characters | cast_identities | cast_roles |
---|---|---|
Barry | William | Hero |
Barry | William | Hero |
Barry | Scott | Hero |
Barry | Scott | Hero |
Alice | Susan | Villain |
Jerry | Smith | Villain |
Jerry | Smith | Villain |
Carlos | Salvador | Supporting |
Carlos | Salvador | Supporting |
As I’d like to count by unique roles over ONE, based on the above, "Hero" should have two counts with Barry/William and Barry/Scott. "Villain", should have one count, with Jerry/Smith (Alice/Susan should be ignored, as there’s only one instance of her), and "Supporting" should have one with Carlos/Salvador. So I tried this and want:
SELECT cast_roles, COUNT(DISTINCT CONCAT(cast_characters, cast_identities, cast_roles)) AS 'cnt'
FROM cast
GROUP BY cast_roles
HAVING cnt > 1;
cast_roles | cnt |
---|---|
Hero | 2 |
Villains | 1 |
Supporting | 1 |
But I get…
cast_roles | cnt |
---|---|
Hero | 2 |
Villains | 2 |
Supporting | 1 |
So pretty close, but it looks like it’s counting all distinct characters/identities/roles regardless of how many instances. Indeed, when I remove the "having" element from the query, I get the same results, so it doesn’t seem to be doing anything, though it doesn’t give me an error message, either.
What am I missing?
2
Answers
This sub query might help get the results you want
fiddle
You should start with a subquery that removes all non-duplicated rows from the original data. Then get the per-role counts from that.
Result:
DEMO
Note that you shouldn’t include the column you’re grouping by in the
COUNT(DISTINCT ...)
expression.