SQL Window Function over sliding time window - Amazon web services

ManuelMartinez
August 24, 2022

I have the following data:

            country  objectid  objectuse
record_date
2022-07-20    chile         0          4
2022-07-01    chile         1          4
2022-07-02    chile         1          4
2022-07-03    chile         1          4
2022-07-04    chile         1          4
...             ...       ...        ...
2022-07-26     peru      3088          4
2022-07-27     peru      3088          4
2022-07-28     peru      3088          4
2022-07-30     peru      3088          4
2022-07-31     peru      3088          4

The data describes the daily usage of each object within a country for a single month (July 2022); not all objects are used every day. One thing I am interested in is the sum, per country, of each object's maximum usage over the month:

WITH month_max AS (
    SELECT
        country,
        objectid,
        MAX(objectuse) AS maxuse
    FROM mytable
    GROUP BY
        country,
        objectid
)
SELECT
    country,
    SUM(maxuse)
FROM month_max
GROUP BY country;

Which results in this:

country   sum
-------------
chile    1224
peru    17008

But what I actually want is the cumulative (month-to-date) sum of those maxima, from the beginning of the month up to each date, so that I get something that looks like:

            country       sum  
record_date
2022-07-01    chile         1
2022-07-01     peru         1
2022-07-02    chile         2
2022-07-02     peru         3
...             ...       ...
2022-07-31    chile       1224
2022-07-31     peru      17008

I tried using a window function like this to no avail:

SELECT
    *,
    SUM(objectuse) OVER (
        PARTITION BY country
        ORDER BY record_date ROWS 30 PRECEDING
    ) AS cumesum
FROM mytable
ORDER BY cumesum DESC;

Is there a way I can achieve the desired result in SQL?

Thanks in advance.

EDIT: For what it’s worth, I asked the same question about pandas and received an answer; perhaps it helps to figure out how to do it in SQL.

Answers

Chosen as BEST ANSWER

What ended up working is probably not the most efficient approach to this problem. For each day in the month, I build a backward-looking block that runs from that day back to the beginning of the month. Within each block I take the maximum of objectuse for each objectid, then sum those maxima across all objects. I do this for every day in the data.

Here is the query that does it:

WITH daily_lookback AS (
    SELECT
        A.record_date,
        A.country,
        B.objectid,
        MAX(B.objectuse) AS maxuse
    FROM mytable AS A
    -- Pair each row in A with every row in B for the same country and
    -- calendar month whose date is on or before A's date.
    LEFT JOIN mytable AS B
        ON A.record_date >= B.record_date
        AND A.country = B.country
        AND DATE_PART('month', A.record_date) = DATE_PART('month', B.record_date)
        AND DATE_PART('year', A.record_date) = DATE_PART('year', B.record_date)
    GROUP BY
        A.record_date,
        A.country,
        B.objectid
)
SELECT
    record_date,
    country,
    SUM(maxuse) AS usetotal
FROM daily_lookback
GROUP BY 
    record_date,
    country
ORDER BY
    record_date;

Which gives me exactly what I was looking for: the cumulative sum of the per-objectid maxima over the backward-looking period, like this:

            country       sum  
record_date
2022-07-01    chile         1
2022-07-01     peru         1
2022-07-02    chile         2
2022-07-02     peru         3
...             ...       ...
2022-07-31    chile       1224
2022-07-31     peru      17008
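
As a sanity check on the logic described above, here is a minimal Python sketch of the same computation. The toy rows and the function name month_to_date_totals are made up for illustration and are not part of the original data; it is a plain-loop restatement of the idea, not a replacement for the query.

```python
# Hypothetical toy rows: (record_date, country, objectid, objectuse).
# Object 1 has no row on 2022-07-02, like an object that is not used every day.
rows = [
    ("2022-07-01", "chile", 1, 1),
    ("2022-07-03", "chile", 1, 2),
    ("2022-07-02", "chile", 2, 5),
    ("2022-07-03", "chile", 2, 3),
    ("2022-07-01", "peru", 7, 4),
]


def month_to_date_totals(rows):
    """Sum, per (record_date, country), each object's maximum use so far that month."""
    totals = {}
    for day, country in sorted({(d, c) for d, c, _, _ in rows}):
        best = {}  # objectid -> largest objectuse seen in the lookback block
        for d, c, obj, use in rows:
            # ISO date strings compare chronologically; d[:7] is "YYYY-MM".
            if c == country and d[:7] == day[:7] and d <= day:
                best[obj] = max(best.get(obj, use), use)
        totals[(day, country)] = sum(best.values())
    return totals


totals = month_to_date_totals(rows)
for key in sorted(totals):
    print(key, totals[key])
```

On this toy data, chile totals 1, 6, and 7 across the three dates: object 1's earlier maximum still counts on 2022-07-02 even though it has no row that day.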

(Edit)

You need to change your inner query to use the windowed maximum:

WITH month_max AS (
    SELECT record_date, country, objectid,
        MAX(objectuse) OVER (PARTITION BY country, objectid ORDER BY record_date) AS mx
    FROM mytable
)
SELECT record_date, country, SUM(mx) AS "sum"
FROM month_max
GROUP BY record_date, country;

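This assumes one row per object per date. If an object has no row on a given date, its running maximum is missing from that date's sum, so the total for that date comes out too low.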
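
To see what that assumption means in practice, here is a small Python sketch that runs the windowed-maximum query against an in-memory SQLite database. The toy rows are hypothetical, and SQLite stands in for the real database purely for illustration; it assumes SQLite 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical toy rows: (record_date, country, objectid, objectuse).
# Object 1 has no row on 2022-07-02, so the one-row-per-date assumption is broken.
rows = [
    ("2022-07-01", "chile", 1, 1),
    ("2022-07-03", "chile", 1, 2),
    ("2022-07-02", "chile", 2, 5),
    ("2022-07-03", "chile", 2, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# Same query as above, plus an ORDER BY so the result order is deterministic.
windowed = conn.execute(
    """
    WITH month_max AS (
        SELECT record_date, country, objectid,
            MAX(objectuse) OVER (
                PARTITION BY country, objectid ORDER BY record_date
            ) AS mx
        FROM mytable
    )
    SELECT record_date, country, SUM(mx) AS "sum"
    FROM month_max
    GROUP BY record_date, country
    ORDER BY record_date, country
    """
).fetchall()
conn.close()

for row in windowed:
    print(row)
```

The row for 2022-07-02 comes back as 5, while the month-to-date total should be 6 (object 1's maximum of 1 plus object 2's maximum of 5). The dates with a row for every object, 2022-07-01 and 2022-07-03, come back as 1 and 7, as expected.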

Here’s a rewritten version of your query. With a suitable index, it may run faster:

select record_date, country, min(usetotal) as usetotal
from mytable d inner join lateral (
    select distinct sum(max(objectuse)) over () as usetotal from mytable a
    where a.record_date between date_trunc('month', d.record_date) and d.record_date
      and a.country = d.country
    group by objectid
) T on 1 = 1
group by record_date, country
order by record_date, country;

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=63760e30aecf4c885ec4967045b6cd03
