
I have the following tables (simplified) in Postgres:

CREATE TABLE party(
    id int PRIMARY KEY,
    family_name varchar(50) NOT NULL
);

CREATE TABLE election(
    id int,
    country_name varchar(50) NOT NULL,
    e_type election_type NOT NULL,
    e_date date NOT NULL,
    vote_share numeric,
    seats int,
    seats_total int NOT NULL,
    party_name_short varchar(10) NOT NULL,
    party_name varchar(255) NOT NULL,
    party_name_english varchar(255) NOT NULL,
    party_id int REFERENCES party(id)
);

I would like to know how a certain political party family (Conservatives, Social Democracy, and so on) performs in elections. That is quite easy to do:

SELECT
    e.country_name,
    extract(year FROM e.e_date) AS year,
    sum(e.vote_share) AS vote_share
FROM
    election e
    JOIN party p ON e.party_id = p.id
WHERE
    e.e_type = 'parliament'
    AND p.family_name IN ('Green/Ecologist')
    AND e.country_name = 'Austria'
    AND e.e_date >= '1980-01-01'::date
    AND e.e_date < '2020-01-01'::date
GROUP BY
    e.country_name,
    e.e_date

I’d like to know how a certain party family performs in elections by decade, i.e. what its vote share is in the years 1980-1989, 1990-1999, and so on. Luckily, Postgres has the date_trunc function, which does exactly what I want. I wrote the following query:

SELECT
    e.country_name,
    sum(e.vote_share) AS vote_share,
    extract(year FROM date_trunc('decade', e.e_date)) || 's' AS decade
FROM
    election e
    JOIN party p ON e.party_id = p.id
WHERE
    e.e_type = 'parliament'
    AND p.family_name IN ('Green/Ecologist')
    AND e.country_name = 'Austria'
    AND e.e_date >= '1980-01-01'::date
    AND e.e_date < '2020-01-01'::date
GROUP BY
    e.country_name,
    decade

It returns incorrect results because it simply sums the vote shares. The query should instead sum the vote shares of each election in a given decade and then divide by the number of elections in that decade. How do I do that?

Here are my incorrect results:

|country_name|vote_share|decade|
|------------|----------|------|
|Austria     |8.2       |1980s |
|Austria     |26.3      |1990s |
|Austria     |31        |2000s |
|Austria     |36.4      |2010s |
--------------------------------

As requested in a comment, here is the input data:

+---------+------+------------+
| country | year | vote_share |
+---------+------+------------+
| Austria | 1983 |        1.4 |
| Austria | 1983 |        2.0 |
| Austria | 1986 |        4.8 |
| Austria | 1990 |        4.8 |
| Austria | 1990 |        2.0 |
| Austria | 1994 |        7.3 |
| Austria | 1995 |        4.8 |
| Austria | 1999 |        7.4 |
+---------+------+------------+

Expected result:

1980s: sum: 1.4 + 2.0 + 4.8 = 8.2
average vote share: 8.2 / 2 = 4.1 -- here I divide by 2 because there were two elections (1983, 1986)

2 Answers


  1. You need to divide the sum of the vote shares by the number of elections per decade.

    The query below should help:

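    WITH cte AS (
        -- Number of distinct elections per country and decade.
        -- The table holds one row per party per election, so count(*) would overcount.
        SELECT
            country_name,
            count(DISTINCT e_date) AS elect_count,
            extract(year FROM date_trunc('decade', e_date)) || 's' AS decade
        FROM election
        WHERE
            e_type = 'parliament'
            AND country_name = 'Austria'
            AND e_date >= '1980-01-01'::date
            AND e_date < '2020-01-01'::date
        GROUP BY
            country_name,
            decade
    ),
    cte2 AS (
        -- Summed vote share of the party family per country and decade.
        SELECT
            e.country_name,
            sum(e.vote_share) AS vote_share,
            extract(year FROM date_trunc('decade', e.e_date)) || 's' AS decade
        FROM
            election e
            JOIN party p ON e.party_id = p.id
        WHERE
            e.e_type = 'parliament'
            AND p.family_name IN ('Green/Ecologist')
            AND e.country_name = 'Austria'
            AND e.e_date >= '1980-01-01'::date
            AND e.e_date < '2020-01-01'::date
        GROUP BY
            e.country_name,
            decade
    )
    SELECT
        a.country_name,
        a.vote_share / b.elect_count AS avg_vote_share,
        a.vote_share,
        b.elect_count,
        a.decade
    FROM cte2 a
        LEFT JOIN cte b ON a.country_name = b.country_name AND a.decade = b.decade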
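
    When counting elections per decade, keep in mind that the question's input data contains two rows for 1983, so counting rows and counting elections are not the same thing. A minimal sketch of the difference, using the three 1980s rows from the question (the `sample` name and the month/day parts of the dates are placeholders assumed for illustration):

    ```sql
    -- Three party rows from the 1980s, belonging to two elections.
    -- Month and day are assumed placeholders; only the years come from the question.
    WITH sample(e_date) AS (
        VALUES (DATE '1983-01-01'), (DATE '1983-01-01'), (DATE '1986-01-01')
    )
    SELECT
        count(*)               AS party_rows,  -- counts every row
        count(DISTINCT e_date) AS elections    -- counts each election date once
    FROM sample;
    ```

    Here `party_rows` is 3 and `elections` is 2, which is why the divisor should be a distinct count of election dates rather than a plain row count.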
    
  2. "The query should instead sum the vote share in each election in a given decade and then divide by the number of elections in that decade."

    There’s nothing stopping you from doing arithmetic with aggregate expressions exactly the way you phrased it:

    sum(e.vote_share) --'sum the vote share in each election in a given decade'
      /                                 --'and then divide'
    count(distinct                      --'by the number'
          extract(year FROM e.e_date) ) --'of elections in that decade'
    

    The count(distinct ...) expression counts the distinct election years within each decade you’re already grouping by. As long as there is at most one election per year, that is the number of elections in that decade. Demo:

    SELECT
        e.country_name,
        extract(year FROM date_trunc('decade', e.e_date)) || 's' AS decade,
        sum(e.vote_share)/count(distinct extract(year FROM e.e_date)) AS vote_share
    FROM election e
        JOIN party p ON e.party_id = p.id
    WHERE
        e.e_type = 'parliament'
        AND p.family_name IN ('Green/Ecologist')
        AND e.country_name = 'Austria'
        AND e.e_date >= '1980-01-01'::date
        AND e.e_date < '2020-01-01'::date
    GROUP BY
        e.country_name,
        decade
    
    |country_name|decade|vote_share        |
    |------------|------|------------------|
    |Austria     |1980s |4.1000000000000000|
    |Austria     |1990s |6.5750000000000000|

    Apart from adding that one division you wanted, your query can remain unchanged.
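
    To try this without the real tables, here is a self-contained sketch that feeds the question's sample rows through the same aggregation. The `sample` CTE, the month/day parts of the dates, and the extra output columns are assumptions made for illustration. It also swaps in count(DISTINCT e_date) for the distinct-year count, which should behave the same on this data but would still count correctly if two elections ever fell in the same year:

    ```sql
    -- Sample rows copied from the question's input table.
    -- Month and day are assumed placeholders; only the year matters here.
    WITH sample(country_name, e_date, vote_share) AS (
        VALUES
            ('Austria', DATE '1983-01-01', 1.4),
            ('Austria', DATE '1983-01-01', 2.0),
            ('Austria', DATE '1986-01-01', 4.8),
            ('Austria', DATE '1990-01-01', 4.8),
            ('Austria', DATE '1990-01-01', 2.0),
            ('Austria', DATE '1994-01-01', 7.3),
            ('Austria', DATE '1995-01-01', 4.8),
            ('Austria', DATE '1999-01-01', 7.4)
    )
    SELECT
        country_name,
        extract(year FROM date_trunc('decade', e_date)) || 's' AS decade,
        sum(vote_share)                          AS vote_share_sum,  -- total per decade
        count(DISTINCT e_date)                   AS elections,       -- elections per decade
        sum(vote_share) / count(DISTINCT e_date) AS vote_share_avg   -- average per election
    FROM sample
    GROUP BY country_name, decade
    ORDER BY decade;
    ```

    On these rows the 1980s average is 8.2 / 2 = 4.1 and the 1990s average is 26.3 / 4 = 6.575, matching the demo output above.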
