I have the following tables (simplified) in Postgres:
CREATE TABLE party(
id int PRIMARY KEY,
family_name varchar(50) NOT NULL
);
CREATE TABLE election(
id int,
country_name varchar(50) NOT NULL,
e_type election_type NOT NULL,
e_date date NOT NULL,
vote_share numeric,
seats int,
seats_total int NOT NULL,
party_name_short varchar(10) NOT NULL,
party_name varchar(255) NOT NULL,
party_name_english varchar(255) NOT NULL,
party_id int REFERENCES party(id)
);
I'd like to know how a certain political party family (Conservatives, Social Democracy, and so on) performs in elections. Getting the vote share per election is quite easy:
SELECT
e.country_name,
extract(year FROM e.e_date) AS year,
sum(e.vote_share) AS vote_share
FROM
election e
JOIN party p ON e.party_id = p.id
WHERE
e.e_type = 'parliament'
AND p.family_name IN ('Green/Ecologist')
AND e.country_name = 'Austria'
AND e.e_date >= '1980-01-01'::date
AND e.e_date < '2020-01-01'::date
GROUP BY
e.country_name,
e.e_date
I’d like to know how a certain party family performs in elections by decade, i.e. what its vote share is in the years 1980-1989, 1990-1999, and so on. Luckily, Postgres has the date_trunc function, which does exactly what I want. I wrote the following query:
SELECT
e.country_name,
sum(e.vote_share) AS vote_share,
extract(year FROM date_trunc('decade', e.e_date)) || 's' AS decade
FROM
election e
JOIN party p ON e.party_id = p.id
WHERE
e.e_type = 'parliament'
AND p.family_name IN ('Green/Ecologist')
AND e.country_name = 'Austria'
AND e.e_date >= '1980-01-01'::date
AND e.e_date < '2020-01-01'::date
GROUP BY
e.country_name,
decade
It returns incorrect results because it simply sums the vote shares. The query should instead sum the vote shares across all elections in a given decade and then divide by the number of elections in that decade. How do I do that?
Here are my incorrect results:
|country_name|vote_share|decade|
|------------|----------|------|
|Austria |8.2 |1980s |
|Austria |26.3 |1990s |
|Austria |31 |2000s |
|Austria |36.4 |2010s |
--------------------------------
As requested in a comment, here is the input data:
+---------+------+------------+
| country | year | vote_share |
+---------+------+------------+
| Austria | 1983 | 1.4 |
| Austria | 1983 | 2.0 |
| Austria | 1986 | 4.8 |
| Austria | 1990 | 4.8 |
| Austria | 1990 | 2.0 |
| Austria | 1994 | 7.3 |
| Austria | 1995 | 4.8 |
| Austria | 1999 | 7.4 |
+---------+------+------------+
Expected result:
1980s: sum: 1.4 + 2.0 + 4.8 = 8.2
average vote share: 8.2 / 2 = 4.1 -- divided by 2 because there were two elections (1983, 1986)
2 Answers
You need to divide the sum of the vote shares by the number of elections in each decade.
This below should help:
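One way to read that advice is as a two-step aggregation: first total the family's vote share per election, then average those totals within each decade. The query below is only a sketch under that reading; the CTE name `per_election` and the output column `avg_vote_share` are made up for illustration, while the tables, columns and filters come from the question.

```sql
-- Step 1: one row per election date with the family's combined vote share.
-- "per_election" is a hypothetical name chosen for this sketch.
WITH per_election AS (
    SELECT
        e.country_name,
        e.e_date,
        sum(e.vote_share) AS vote_share
    FROM election e
    JOIN party p ON e.party_id = p.id
    WHERE e.e_type = 'parliament'
      AND p.family_name IN ('Green/Ecologist')
      AND e.country_name = 'Austria'
      AND e.e_date >= '1980-01-01'::date
      AND e.e_date < '2020-01-01'::date
    GROUP BY e.country_name, e.e_date
)
-- Step 2: average the per-election totals within each decade.
SELECT
    country_name,
    extract(year FROM date_trunc('decade', e_date)) || 's' AS decade,
    avg(vote_share) AS avg_vote_share
FROM per_election
GROUP BY country_name, decade
ORDER BY decade;
```

Because each row of `per_election` is one election, `avg` here is the decade's total divided by the number of elections in it.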
There’s nothing stopping you from doing arithmetic with aggregate expressions, exactly as you phrased it:
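A minimal sketch of that idea, assuming the number of elections is obtained by counting distinct election years; it is the question's query with one division added, and the alias `avg_vote_share` is an illustrative name, not something from the original post.

```sql
SELECT
    e.country_name,
    -- sum of vote shares in the decade divided by the number of elections in it
    sum(e.vote_share) / count(DISTINCT extract(year FROM e.e_date)) AS avg_vote_share,
    extract(year FROM date_trunc('decade', e.e_date)) || 's' AS decade
FROM
    election e
    JOIN party p ON e.party_id = p.id
WHERE
    e.e_type = 'parliament'
    AND p.family_name IN ('Green/Ecologist')
    AND e.country_name = 'Austria'
    AND e.e_date >= '1980-01-01'::date
    AND e.e_date < '2020-01-01'::date
GROUP BY
    e.country_name,
    decade;
```

If two elections of the same type could ever fall in the same calendar year, counting `DISTINCT e.e_date` instead of the extracted year would be the safer choice.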
The second function counts the distinct election years, which are by nature unique within the decade you’re already grouping by, effectively giving you the number of elections in that decade. Demo:
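To check the arithmetic against the input data posted in the question, here is a self-contained sketch that needs no tables. The CTE name `sample_data` and its column names are assumptions made for this example, and the decade is derived from the integer year with integer division rather than with `date_trunc`, since the posted data contains only years.

```sql
-- Input rows copied from the question; "sample_data" is a hypothetical name.
WITH sample_data(country, election_year, vote_share) AS (
    VALUES
        ('Austria', 1983, 1.4),
        ('Austria', 1983, 2.0),
        ('Austria', 1986, 4.8),
        ('Austria', 1990, 4.8),
        ('Austria', 1990, 2.0),
        ('Austria', 1994, 7.3),
        ('Austria', 1995, 4.8),
        ('Austria', 1999, 7.4)
)
SELECT
    country,
    -- integer division drops the last digit: 1983 / 10 * 10 = 1980
    (election_year / 10 * 10)::text || 's' AS decade,
    sum(vote_share) AS total_vote_share,
    count(DISTINCT election_year) AS elections,
    sum(vote_share) / count(DISTINCT election_year) AS avg_vote_share
FROM sample_data
GROUP BY country, decade
ORDER BY decade;
-- 1980s: 8.2 / 2 elections = 4.1
-- 1990s: 26.3 / 4 elections = 6.575
```

The 1980s row matches the expected result stated in the question: two elections (1983, 1986) and an average of 4.1.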
Apart from adding that one division you wanted, your query can remain unchanged.