
I have the following data:

            country  objectid  objectuse
record_date
2022-07-20    chile         0          4
2022-07-01    chile         1          4
2022-07-02    chile         1          4
2022-07-03    chile         1          4
2022-07-04    chile         1          4
...             ...       ...        ...
2022-07-26     peru      3088          4
2022-07-27     peru      3088          4
2022-07-28     peru      3088          4
2022-07-30     peru      3088          4
2022-07-31     peru      3088          4

The data describes the daily usage of an object within a country for a single month (July 2022), and not all objects are used every day. One of the things I am interested in finding is, for each country, the sum of every object's maximum usage over the month:

WITH month_max AS (
    SELECT
        country,
        objectid,
        MAX(objectuse) AS maxuse
    FROM mytable
    GROUP BY
        country,
        objectid
)
SELECT
    country,
    SUM(maxuse)
FROM month_max
GROUP BY country;

Which results in this:

country   sum
-------------
chile    1224
peru    17008   

What I actually want, though, is the running sum of the maxima from the beginning of the month up to each date, so that the result looks like this:

            country       sum  
record_date
2022-07-01    chile         1
2022-07-01     peru         1
2022-07-02    chile         2
2022-07-02     peru         3
...             ...       ...
2022-07-31    chile       1224
2022-07-31     peru      17008

I tried using a window function like this to no avail:

SELECT
    *,
    SUM(objectuse) OVER (
        PARTITION BY country
        ORDER BY record_date ROWS 30 PRECEDING
    ) AS cumesum
FROM mytable
ORDER BY cumesum DESC;

Is there a way I can achieve the desired result in SQL?

Thanks in advance.

EDIT: For what it’s worth, I asked the same question for pandas and received an answer; perhaps it helps in working out how to do it in SQL.

Answers


  1. Chosen as BEST ANSWER

    What ended up working is probably not the most efficient approach to this problem. For each day in the month, I build a backward-looking block that runs from that day back to the beginning of the month. Within each block I take the maximum of objectuse for each objectid, and then I sum those maxima. I do this for every day in the data.

    Here is the query that does it:

    WITH daily_lookback AS (
        SELECT
            A.record_date,
            A.country,
            B.objectid,
            MAX(B.objectuse) AS maxuse
        FROM mytable AS A
        LEFT JOIN mytable AS B
            ON A.record_date >= B.record_date
            AND A.country = B.country
            AND DATE_PART('month', A.record_date) = DATE_PART('month', B.record_date)
            AND DATE_PART('year', A.record_date) = DATE_PART('year', B.record_date)
        GROUP BY
            A.record_date,
            A.country,
            B.objectid
    )
    SELECT
        record_date,
        country,
        SUM(maxuse) AS usetotal
    FROM daily_lookback
    GROUP BY 
        record_date,
        country
    ORDER BY
        record_date;
    

    This gives me exactly what I was looking for: the cumulative sum of the per-objectid maxima over each backward-looking period, like this:

                country       sum  
    record_date
    2022-07-01    chile         1
    2022-07-01     peru         1
    2022-07-02    chile         2
    2022-07-02     peru         3
    ...             ...       ...
    2022-07-31    chile       1224
    2022-07-31     peru      17008
    

  2. You need to change your inner query to use a windowed (running) maximum:

    WITH month_max AS (
        SELECT record_date, country, objectid,
            MAX(objectuse) over (PARTITION BY country, objectid ORDER BY record_date) AS mx
        FROM mytable
    )
    SELECT record_date, country, SUM(mx) as "sum"
    FROM month_max
    GROUP BY record_date, country;
    

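    This assumes exactly one row per object per date: an object with no row on a given date drops out of that date's sum, and duplicate rows would be counted more than once.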
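
    Since the question says that not every object is used every day, here is a minimal sketch of one way to relax that assumption. This is a swapped-in technique, not the query above: it takes each object's running maximum, turns it into the increase over that object's previous running maximum, and accumulates those increases per country. The sample rows and the delta/daily names are made up for illustration; the CTE named mytable only shadows the real table so the statement runs on its own. It assumes the table holds a single month, as in the question; for several months, date_trunc('month', record_date) would have to be added to every PARTITION BY.

    ```sql
    WITH mytable (record_date, country, objectid, objectuse) AS (
        -- Hypothetical sample rows; remove this CTE to query the real table.
        VALUES
            (DATE '2022-07-01', 'chile', 1, 1),
            (DATE '2022-07-02', 'chile', 2, 2),
            (DATE '2022-07-03', 'chile', 1, 4),
            (DATE '2022-07-01', 'peru',  7, 5),
            (DATE '2022-07-03', 'peru',  7, 3)
    ),
    running AS (
        -- Running maximum of each object up to and including each of its rows.
        SELECT record_date, country, objectid,
               MAX(objectuse) OVER (
                   PARTITION BY country, objectid
                   ORDER BY record_date
               ) AS mx
        FROM mytable
    ),
    deltas AS (
        -- How much the object's running maximum grew compared with its previous row.
        SELECT record_date, country,
               mx - COALESCE(LAG(mx) OVER (
                   PARTITION BY country, objectid
                   ORDER BY record_date
               ), 0) AS delta
        FROM running
    ),
    daily AS (
        -- Total growth per country and date.
        SELECT record_date, country, SUM(delta) AS delta
        FROM deltas
        GROUP BY record_date, country
    )
    SELECT record_date, country,
           SUM(delta) OVER (PARTITION BY country ORDER BY record_date) AS usetotal
    FROM daily
    ORDER BY record_date, country;

    -- Result for the sample rows:
    -- 2022-07-01  chile  1
    -- 2022-07-01  peru   5
    -- 2022-07-02  chile  3
    -- 2022-07-03  chile  6
    -- 2022-07-03  peru   5
    ```

    Object 2 has no row on 2022-07-03, yet its maximum of 2 still counts towards chile's total of 6 on that date, because the totals are built from increases rather than from the rows present on each day. Dates on which a country has no rows at all still do not appear in the output.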

    Here’s a rewritten version of your query. With a suitable index, it might run faster:

    select record_date, country, min(usetotal) as usetotal
    from mytable d inner join lateral (
        select distinct sum(max(objectuse)) over () as usetotal from mytable a
        where a.record_date between date_trunc('month', d.record_date) and d.record_date
          and a.country = d.country
        group by objectid
    ) T on 1 = 1
    group by record_date, country
    order by record_date, country;
    

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=63760e30aecf4c885ec4967045b6cd03
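
    As a sketch of the kind of index that could help the lateral query: the index name and the choice of columns are assumptions, not something tested against the real table, so it is worth comparing plans with EXPLAIN ANALYZE before and after creating it.

    ```sql
    -- Hypothetical index; the name and column choice are illustrative assumptions.
    -- The lateral subquery filters on country (equality) and record_date (range),
    -- so country comes first in the key.
    CREATE INDEX IF NOT EXISTS mytable_country_date_idx
        ON mytable (country, record_date)
        INCLUDE (objectid, objectuse);  -- INCLUDE needs PostgreSQL 11+; may allow index-only scans
    ```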
