I have the following data:
country objectid objectuse
record_date
2022-07-20 chile 0 4
2022-07-01 chile 1 4
2022-07-02 chile 1 4
2022-07-03 chile 1 4
2022-07-04 chile 1 4
... ... ... ...
2022-07-26 peru 3088 4
2022-07-27 peru 3088 4
2022-07-28 peru 3088 4
2022-07-30 peru 3088 4
2022-07-31 peru 3088 4
The data describes the daily usage of an object within a country for a single month (July 2022), and not all objects are used every day. One of the things I am interested in finding is the sum of the per-object maximums for the month:
WITH month_max AS (
    SELECT
        country,
        objectid,
        MAX(objectuse) AS maxuse
    FROM mytable
    GROUP BY
        country,
        objectid
)
SELECT
    country,
    SUM(maxuse)
FROM month_max
GROUP BY country;
Which results in this:
country sum
-------------
chile 1224
peru 17008
But what I actually want is the rolling sum of the maxima from the beginning of the month up to each date, so that I get something that looks like:
country sum
record_date
2022-07-01 chile 1
2022-07-01 peru 1
2022-07-02 chile 2
2022-07-02 peru 3
... ... ...
2022-07-31 chile 1224
2022-07-31 peru 17008
I tried using a window function like this to no avail:
SELECT
    *,
    SUM(objectuse) OVER (
        PARTITION BY country
        ORDER BY record_date ROWS 30 PRECEDING
    ) AS cumesum
FROM mytable
ORDER BY cumesum DESC;
Is there a way I can achieve the desired result in SQL?
Thanks in advance.
EDIT: For what it’s worth, I asked the same question but on Pandas and I received an answer; perhaps it helps to figure out how to do it in SQL.
2 Answers
What ended up working is probably not the most efficient approach to this problem. I essentially created backward-looking blocks from each day in the month back to the beginning of the month. Within each of these buckets I take the maximum of objectuse for each objectid. After taking the max, I sum across all the maxima for that backward-looking period. I do this for every day in the data. Here is the query that does it:
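For illustration, here is a minimal sketch of the bucket approach just described. It is a reconstruction under assumptions, not the answerer's original query. It runs the SQL through Python's built-in sqlite3 module so that it is self-contained, it reuses the table name mytable and the column names from the question, and the sample rows are invented. The SQL uses only standard constructs, so it is expected to carry over to PostgreSQL.

```python
import sqlite3

# In-memory database with a few invented rows (not the asker's real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
)
rows = [
    ("2022-07-01", "chile", 1, 1),
    ("2022-07-02", "chile", 1, 2),
    ("2022-07-02", "chile", 2, 3),  # object 2 is not used on 07-01
    ("2022-07-03", "chile", 2, 1),  # object 1 is not used on 07-03
    ("2022-07-01", "peru", 7, 4),
    ("2022-07-03", "peru", 7, 2),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

query = """
WITH days AS (
    -- one bucket per (country, date) that appears in the data
    SELECT DISTINCT country, record_date FROM mytable
),
bucket_max AS (
    -- for each bucket, look back to the start of the month and
    -- take the maximum objectuse seen so far for every objectid
    SELECT d.country, d.record_date, t.objectid,
           MAX(t.objectuse) AS maxuse
    FROM days AS d
    JOIN mytable AS t
      ON t.country = d.country
     AND t.record_date <= d.record_date
    GROUP BY d.country, d.record_date, t.objectid
)
-- sum the per-object maxima within each bucket
SELECT record_date, country, SUM(maxuse) AS cumesum
FROM bucket_max
GROUP BY record_date, country
ORDER BY record_date, country;
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because each bucket looks back over all earlier rows, an object still contributes its earlier maximum on days when it has no row of its own. A country only gets an output row for dates on which it has at least one record. The self-join grows roughly with the square of the number of days, which fits the remark that this is probably not the most efficient approach.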
Which gives me exactly what I was looking for: the cumulative sum of the objectid maximums for the backward-looking period, like this:

You need to change your inner query to use the windowed maximum:
This does assume one row per object per date.
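The windowed-maximum suggestion can be sketched as follows. This is an illustrative reconstruction, not the answerer's code: the SQL is run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer), the table and column names are taken from the question, and the sample rows are invented so that every object has exactly one row per date, as the answer assumes.

```python
import sqlite3

# In-memory database; invented rows with one row per object per date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "record_date TEXT, country TEXT, objectid INTEGER, objectuse INTEGER)"
)
rows = [
    ("2022-07-01", "chile", 1, 1),
    ("2022-07-02", "chile", 1, 3),
    ("2022-07-03", "chile", 1, 2),
    ("2022-07-01", "chile", 2, 2),
    ("2022-07-02", "chile", 2, 1),
    ("2022-07-03", "chile", 2, 5),
    ("2022-07-01", "peru", 7, 4),
    ("2022-07-02", "peru", 7, 2),
    ("2022-07-03", "peru", 7, 6),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

query = """
WITH running AS (
    -- running maximum of objectuse per object, from the first date
    -- up to and including the current row
    SELECT record_date, country, objectid,
           MAX(objectuse) OVER (
               PARTITION BY country, objectid
               ORDER BY record_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_max
    FROM mytable
)
-- sum the running maxima of all objects for each country and date
SELECT record_date, country, SUM(running_max) AS cumesum
FROM running
GROUP BY record_date, country
ORDER BY record_date, country;
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The outer SUM only sees objects that have a row on a given date. If an object skips a day, its running maximum is missing from that day's total, which is why the one-row-per-object-per-date assumption matters here.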
Here’s a re-written version of your query. With indexing it seems possible that it might run faster:
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=63760e30aecf4c885ec4967045b6cd03