Let’s say I have a table containing cities with their populations and a table containing buildings of different types. I want to know the percentage of people living in cities containing schools.
Cities:
name | pop |
---|---|
A | 10 |
B | 100 |
C | 1000 |
Buildings:
is_school | city |
---|---|
false | A |
true | B |
true | B |
true | C |
If I do somthing like:
SELECT
SUM(CASE WHEN building.is_school = true THEN city.pop ELSE 0 END) school,
SUM(city.pop) total
FROM city
LEFT JOIN building ON building.city = city.name;
I get city B population summed twice.
I would like to have:
school | total |
---|---|
1100 | 1110 |
but I get:
school | total |
---|---|
1200 | 1210 |
I can do a subquery:
SELECT
SUM(CASE WHEN city.name in (
SELECT city.name
FROM city
LEFT JOIN building ON building.city = city.name
WHERE building.is_school = true
) THEN city.pop ELSE 0 END),
SUM(city.pop)
FROM city;
but it feels unnecessary convoluted considering what I’m trying to achieve, is there really no other way ?
2
Answers
You can do:
Result:
Or, you can use a lateral join:
See running examples at db<>fiddle.
Another approach you can use is to join with a filtered version of buildings instead of building itself
It is not recommended to use DISTINCT for huge database, since the process of determining distinct records is very slow, but at least the query is slim and cute