Postgresql - Splitting results by decades in Postgres

menteith
September 19, 2023
253 views
0 votes
2 Answers

I have the following tables (simplified) in Postgres:

CREATE TABLE party(
    id int PRIMARY KEY,
    family_name varchar(50) NOT NULL
);

CREATE TABLE election(
    id int,
    country_name varchar(50) NOT NULL,
    e_type election_type NOT NULL,
    e_date date NOT NULL,
    vote_share numeric,
    seats int,
    seats_total int NOT NULL,
    party_name_short varchar(10) NOT NULL,
    party_name varchar(255) NOT NULL,
    party_name_english varchar(255) NOT NULL,
    party_id int REFERENCES party(id)
);

I'd like to know how a certain political party family (Conservatives, Social Democracy and so on) performs in elections. That can be done quite easily:

SELECT
    e.country_name,
    extract(year FROM e.e_date) AS year,
    sum(e.vote_share) AS vote_share
FROM
    election e
    JOIN party p ON e.party_id = p.id
WHERE
    e.e_type = 'parliament'
    AND p.family_name IN ('Green/Ecologist')
    AND e.country_name = 'Austria'
    AND e.e_date >= '1980-01-01'::date
    AND e.e_date < '2020-01-01'::date
GROUP BY
    e.country_name,
    e.e_date

I'd like to know how a certain party family performs in elections by decade, i.e. what its vote share is in the years 1980-1989, 1990-1999 and so on. Luckily, Postgres has the date_trunc function, which does exactly what I want. I wrote the following query:

SELECT
    e.country_name,
    sum(e.vote_share) AS vote_share,
    extract(year FROM date_trunc('decade', e.e_date)) || 's' AS decade
FROM
    election e
    JOIN party p ON e.party_id = p.id
WHERE
    e.e_type = 'parliament'
    AND p.family_name IN ('Green/Ecologist')
    AND e.country_name = 'Austria'
    AND e.e_date >= '1980-01-01'::date
    AND e.e_date < '2020-01-01'::date
GROUP BY
    e.country_name,
    decade

It returns incorrect results, because it simply sums the vote share of every row in the decade. The query should instead sum the vote share in each election in a given decade and then divide by the number of elections in that decade. How do I do that?

Here are my incorrect results:

|country_name|vote_share|decade|
|------------|----------|------|
|Austria     |8.2       |1980s |
|Austria     |26.3      |1990s |
|Austria     |31        |2000s |
|Austria     |36.4      |2010s |
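The decade bucketing in the query above behaves as intended; the problem lies only in the aggregation. One quick way to confirm this is to run the question's label expression, extract(year FROM date_trunc('decade', ...)) || 's', on a few literal dates. The dates below are illustrative only, not taken from the election table:

```sql
-- Apply the question's decade-label expression to a few literal dates.
-- date_trunc('decade', ...) truncates to January 1 of the decade's first year.
SELECT
    d,
    date_trunc('decade', d) AS decade_start,
    extract(year FROM date_trunc('decade', d)) || 's' AS decade
FROM (VALUES
    ('1983-04-24'::date),
    ('1989-12-31'::date),
    ('1990-01-01'::date),
    ('2019-09-29'::date)
) AS v(d)
ORDER BY d;
```

1989-12-31 should land in the 1980s bucket and 1990-01-01 in the 1990s bucket, matching the 1980-1989 and 1990-1999 ranges the question asks for.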

As requested in the comments, here is the input data:

+---------+------+------------+
| country | year | vote_share |
+---------+------+------------+
| Austria | 1983 |        1.4 |
| Austria | 1983 |        2.0 |
| Austria | 1986 |        4.8 |
| Austria | 1990 |        4.8 |
| Austria | 1990 |        2.0 |
| Austria | 1994 |        7.3 |
| Austria | 1995 |        4.8 |
| Austria | 1999 |        7.4 |
+---------+------+------------+

Expected result:

1980s: sum: 1.4 + 2.0 + 4.8 = 8.2
average vote share: 8.2 / 2 = 4.1 -- here I divide by 2 because there were two elections (1983, 1986)
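The expected calculation above amounts to two aggregation steps: first total the party family's vote share per election, then average those totals per decade. A minimal sketch, assuming the sample rows are loaded into a hypothetical temporary table `sample_votes` and that each year identifies a single election (as it does in this sample); the view name `decade_avg` is also hypothetical:

```sql
-- Hypothetical temp table holding the sample rows from the question:
-- one row per party per election, with the election identified by its year.
CREATE TEMP TABLE sample_votes (year int, vote_share numeric);

INSERT INTO sample_votes (year, vote_share) VALUES
    (1983, 1.4), (1983, 2.0), (1986, 4.8),
    (1990, 4.8), (1990, 2.0), (1994, 7.3),
    (1995, 4.8), (1999, 7.4);

-- Hypothetical view name; it just makes the result easy to inspect.
CREATE TEMP VIEW decade_avg AS
WITH per_election AS (
    -- Step 1: the party family's total vote share in each election
    SELECT year, sum(vote_share) AS total
    FROM sample_votes
    GROUP BY year
)
-- Step 2: average the per-election totals within each decade
SELECT
    ((year / 10) * 10)::text || 's' AS decade,
    avg(total) AS avg_vote_share
FROM per_election
GROUP BY 1;

SELECT * FROM decade_avg ORDER BY decade;
```

On this sample it should give 4.1 for the 1980s, matching the hand calculation, and 6.575 for the 1990s ((6.8 + 7.3 + 4.8 + 7.4) / 4). With the real schema, step 1 corresponds to the first query in the question (grouped by e.e_date), so that query could be wrapped in a subquery the same way.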

Tags: postgresql sql

Answers

You need to divide the summed vote share by the number of elections per decade.

The query below should help:

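WITH cte AS
(
    -- number of distinct Austrian parliamentary elections (dates) per decade,
    -- including any the party family did not contest
    SELECT
        count(DISTINCT e_date) AS elect_count,
        extract(year FROM date_trunc('decade', e_date)) || 's' AS decade
    FROM election
    WHERE
        e_type = 'parliament'
        AND country_name = 'Austria'
        AND e_date >= '1980-01-01'::date
        AND e_date < '2020-01-01'::date
    GROUP BY decade
),
cte2 AS
(
    -- total vote share of the party family per decade
    SELECT
        e.country_name,
        sum(e.vote_share) AS vote_share,
        extract(year FROM date_trunc('decade', e.e_date)) || 's' AS decade
    FROM
        election e
        JOIN party p ON e.party_id = p.id
    WHERE
        e.e_type = 'parliament'
        AND p.family_name IN ('Green/Ecologist')
        AND e.country_name = 'Austria'
        AND e.e_date >= '1980-01-01'::date
        AND e.e_date < '2020-01-01'::date
    GROUP BY
        e.country_name,
        decade
)
SELECT
    a.country_name,
    a.vote_share / b.elect_count AS avg_vote_share,
    a.vote_share,
    b.elect_count,
    a.decade
FROM cte2 a
    LEFT JOIN cte b ON a.decade = b.decade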
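The election count has to count distinct elections rather than rows: in the question's sample, the 1983 election has two Green/Ecologist rows, so a plain count(*) overstates the number of elections. A small comparison on those sample rows, inlined as a hypothetical CTE `sample_rows` and assuming one election per year as in the sample:

```sql
-- Sample rows from the question, inlined as a hypothetical CTE:
-- one row per party per election, with the election identified by its year.
WITH sample_rows(year, vote_share) AS (
    VALUES (1983, 1.4), (1983, 2.0), (1986, 4.8),
           (1990, 4.8), (1990, 2.0), (1994, 7.3),
           (1995, 4.8), (1999, 7.4)
)
SELECT
    ((year / 10) * 10)::text || 's' AS decade,
    count(*) AS row_count,                         -- party rows, not elections
    count(DISTINCT year) AS election_count,        -- distinct elections in this sample
    sum(vote_share) / count(*) AS per_row_avg,     -- wrong divisor
    sum(vote_share) / count(DISTINCT year) AS per_election_avg
FROM sample_rows
GROUP BY 1
ORDER BY 1;
```

For the 1980s this should report 3 rows but 2 elections, so dividing by the row count gives about 2.73 instead of the expected 4.1.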

- Zegarek
- September 18, 2023 at 10:03 pm
- 0 votes
The query should instead sum the vote share in each election in a given decade and then divide by the number of elections in that decade.

Nothing stops you from doing arithmetic with aggregate expressions, exactly as you phrased it:
```
sum(e.vote_share) --'sum the vote share in each election in a given decade'
  /                                 --'and then divide'
count(distinct                      --'by the number'
      extract(year FROM e.e_date) ) --'of elections in that decade'
```
The second aggregate counts the distinct election years within the decade you're already grouping by. As long as a country holds at most one parliamentary election per year, that is the number of elections in the decade. Demo:
```
SELECT
    e.country_name,
    extract(year FROM date_trunc('decade', e.e_date)) || 's' AS decade,
    sum(e.vote_share)/count(distinct extract(year FROM e.e_date)) AS vote_share
FROM election e
    JOIN party p ON e.party_id = p.id
WHERE
    e.e_type = 'parliament'
    AND p.family_name IN ('Green/Ecologist')
    AND e.country_name = 'Austria'
    AND e.e_date >= '1980-01-01'::date
    AND e.e_date < '2020-01-01'::date
GROUP BY
    e.country_name,
    decade
```
|country_name|decade|vote_share        |
|------------|------|------------------|
|Austria     |1980s |4.1000000000000000|
|Austria     |1990s |6.5750000000000000|
Apart from adding that one division you wanted, your query can remain unchanged.
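If two elections can fall in the same year, counting distinct e.e_date values is the safer divisor than counting distinct years. A sketch with invented dates (hypothetical data, not from the question) showing where the two counts diverge:

```sql
-- Hypothetical rows (invented dates): two elections in the same year.
WITH sample_rows(e_date, vote_share) AS (
    VALUES ('2011-03-01'::date, 4.0),
           ('2011-10-01'::date, 6.0),
           ('2015-05-01'::date, 5.0)
)
SELECT
    count(DISTINCT extract(year FROM e_date)) AS distinct_years,  -- 2011, 2015
    count(DISTINCT e_date) AS distinct_dates,                      -- three election dates
    sum(vote_share) / count(DISTINCT extract(year FROM e_date)) AS avg_by_year,
    sum(vote_share) / count(DISTINCT e_date) AS avg_by_date
FROM sample_rows;
```

The vote shares sum to 15.0, so dividing by 2 distinct years gives 7.5, while dividing by 3 distinct dates gives 5.0, the actual per-election average. In the question's sample each year appears to correspond to a single election, so both divisors should agree there.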