Let’s assume I have a table with 3 columns: Id, Value, date Time. I need to find the min, max, first and last value of each 1 minute window between the provided start and end time. And in case no row exists for the start time I need to backfill the last value before the start for it and even for any missing window use the last value.
e.g. Let’s assume a table, simplifying to just include time:
Id Time Value
1 10:10:05 3
1 10:11:06 4
1 10:13:13 5
1 10:13:19 9
1 10:13:32 8
1 10:14:35 2
And if I want the results from start time 10:12:00 to end time 10:14:00.
It will give for a window for 12-13 min as the following information:
start value = 4 (backfill from the last value in data from the preceeding value)
end value = 4
min value = 4
max value = 4
minute start = 10:12:00
minute end = 10:13:00
For a window for 13-14 min as the following information:
start value = 4 (backfill from the last value in data from the preceeding value)
end value = 8
min value = 4
max value = 9
minute start = 10:13:00
minute end = 10:14:00
Basically for any window use the last window value.
Is there a query that can do it? If the above query is quite complex can we at least do a query which simple gives the values between the provided start and end time however if no row exists for the start time, it simple gives the last value before the start time. Basically, I always have a value for start time.
SO for example if I ask for values between start and end time of 10:12:00 and 10:14:00 it will give the following values:
1 10:12:00 4 (back fill from last value)
1 10:13:13 5
1 10:13:19 9
1 10:13:32 8
Rest I’ll do programmatically.
2
Answers
Result:
The easiest way to do this is probably by making an overview of which values per minute are applicable and then grouping on them for min/max and distinct on (with sort is desc/asc) for start and end values.
Gettig to the overview has some issues to overcome, so i tried to clarify why to take certain actions with comments in the SQL.
Initiating table:
I named the table given in the question "main", since it was missing a name.
Getting overviews into a result:
Demo at db<>fiddle:
generate_series()
to generate 1-minute slots in your range, thenleft join
on the matching minutes.left join
will produce all-null
rows wherever you had nothing in that window.coalesce(val,lag(val)over w1)
to get the "previous" value in those all-null
rows. Problem is, PostgreSQL doesn’t haveskip null
clause and it could happen that the previous minute you’d want values from, is also empty. To address that, I aggregate all earlier values into an array,filter
to skipnull
, and ask for the last element.arr[-1]
. You’d have to either add an intermediate cte to first build the arrays in a cte/subquery, and reuse their name to doarr[array_upper(arr)]
, or usejsonb
arrays instead –jsonb
arrays support negative subscripts, but you have to cast from their resultingtext
type back to your initialint
. With native arrays you can also do(array_agg()over w1)[count(*)over w1]
similar to(array_agg())[count(*)]
.time
-looking values literally, which is why I used a time-based range type. If that’s just an example, this can be changed to use propertimestamptz
andtstzrange
.OFFSET 1
at the end skips the additional, initial row that you have to have to be able to backfill from it, but don’t want it in your final input, since it precedes your target range.