skip to Main Content

Let’s assume I have a table with 3 columns: Id, Value, date Time. I need to find the min, max, first and last value of each 1 minute window between the provided start and end time. And in case no row exists for the start time I need to backfill the last value before the start for it and even for any missing window use the last value.

e.g. Let’s assume a table, simplifying to just include time:


    Id Time    Value
    1 10:10:05 3
    1 10:11:06 4
    1 10:13:13 5
    1 10:13:19 9
    1 10:13:32 8
    1 10:14:35 2 

And if I want the results from start time 10:12:00 to end time 10:14:00.

It will give for a window for 12-13 min as the following information:

start value = 4 (backfill from the last value in data from the preceeding value) 
end value = 4
min value = 4
max value = 4
minute start = 10:12:00
minute end = 10:13:00

For a window for 13-14 min as the following information:

start value = 4 (backfill from the last value in data from the preceeding value) 
end value = 8
min value = 4
max value = 9
minute start = 10:13:00
minute end = 10:14:00

Basically for any window use the last window value.

Is there a query that can do it? If the above query is quite complex can we at least do a query which simple gives the values between the provided start and end time however if no row exists for the start time, it simple gives the last value before the start time. Basically, I always have a value for start time.

SO for example if I ask for values between start and end time of 10:12:00 and 10:14:00 it will give the following values:


    1 10:12:00 4 (back fill from last value)
    1 10:13:13 5
    1 10:13:19 9
    1 10:13:32 8

Rest I’ll do programmatically.

2

Answers


  1. Result:

    enter image description here

    The easiest way to do this is probably by making an overview of which values per minute are applicable and then grouping on them for min/max and distinct on (with sort is desc/asc) for start and end values.

    Gettig to the overview has some issues to overcome, so i tried to clarify why to take certain actions with comments in the SQL.

    Initiating table:

    I named the table given in the question "main", since it was missing a name.

    -- INIT table
    CREATE TABLE main (
      id integer,
      "time" timestamp without time zone,
      value integer
    );
    
    -- Fill table
    INSERT INTO main values 
        (1,'2024-01-01 10:10:05', 3),
        (1,'2024-01-01 10:11:06', 4),
        (1,'2024-01-01 10:13:13', 5),
        (1,'2024-01-01 10:13:19', 9),
        (1,'2024-01-01 10:13:32', 8),
        (1,'2024-01-01 10:14:35', 2);
    

    Getting overviews into a result:

    -- overview is a CTE to create an overview of all the time durations a 
    -- value spans. So from the variable time, till the time of the next record, 
    -- renamed end. This is also declared in minutes so we can use it later on.
    
    with overview as (
        SELECT 
            distinct on (a.time) 
            a.id, 
            a.time, 
            b.time as "end", 
            a.value, 
            date_trunc('minute', a.time) as minute_start, 
            date_trunc('minute', b.time) as minute_end 
        FROM 
            main a 
        left join 
            main b 
        on 
            a."time"<b."time" and 
            a.id = b.id 
        order by 
            a.time, b.time asc
        ),
    
    -- overview 2 makes an overview of records which are applicable for each minute. 
    -- This means any record which was originaly in main, unioned with any record 
    -- which also spans into another minute, added again because they are applicable 
    -- in two minutes. These will be added once with the original time (they count 
    -- in their original minute) and once with a time equal to the new minute 
    -- (they also count in the followup minute)
    
    overview2 as (
        select 
            id, 
            date_trunc('minute', "end") as time, 
            date_trunc('minute', "end") as minute, 
            value, 
            true as backfill 
        from 
            overview 
        where 
            minute_start <> minute_end
        UNION ALL
        select 
            id, 
            time, 
            date_trunc('minute', time) as minute, 
            value, 
            false as backfill 
        from 
            overview
        ),  
    
    -- then we still have an issue with minutes which are jumped by all "time"-values. 
    -- For example there is no "time" in minute 12. 
    -- Eventhough minute 13 gets the startdate from minute 11 in overview2, 
    -- minute 12 still has no backfilled startvalue. These are filled in overview3. 
    -- This is done, by generating a list of all minutes. Checking which are missing, 
    -- fetching the right values for them by joining on main on the condition 
    -- that the main time is smaller then the current time, and then picking the 
    -- value where the time is maximzed (via distinct on)
    
    overview3 as (
        select 
            * 
        from 
            overview2 
        UNION ALL (
            Select 
                distinct on (a.missingminute) 
                c.id, 
                a.missingminute as time, 
                a.missingminute as minute, 
                c.value, 
                true as backfill 
            from (
                SELECT 
                    date_trunc('minute', time.time) as missingminute
                FROM 
                    generate_series((select min(minute) from overview2),(select max(minute) from overview2),'1 minute'::interval) time 
                left join (
                    select distinct 
                        minute 
                    from 
                        overview2
                    ) b 
                on 
                    date_trunc('minute', time) = b.minute 
                where 
                    b.minute isnull
                ) a 
            left join 
                main c 
            on 
                a.missingminute > c.time 
            order by 
                a.missingminute, 
                c.time desc
            ) 
        order by 
            time
        )
    
    -- now that a full overview is generated, it just comes down to generating the 
    -- wanted values, and placing them in a result table
    
    select 
        t1.id, 
        t1.minute as minute_start, 
        t1.minute + interval '1 minute' as minute_end, 
        t1.backfill as start_backfill,
        t1.start, 
        t2.end, 
        -- below values are coalesce with startvalue since if a minute is missing, then it will only have a startvalue. In that case min = max = start. So if min or mox are null then they are start
        coalesce(t3.min, t1.start) as min, 
        coalesce(t3.max, t1.start) as max 
    from 
        (select distinct on (id, minute) id, minute, value as start, backfill from overview3 order by id, minute, time asc) t1 
    left join
        (select distinct on (id, minute) id, minute, value as end from overview3 order by id, minute, time desc) t2 on t1.id = t2.id and t1.minute = t2.minute 
    left join
        (select id, minute, min(value) min, max(value) max from overview2 group by id,minute) t3 on t1.id = t3.id and t1.minute = t3.minute
    
    Login or Signup to reply.
  2. Demo at db<>fiddle:

    with param(target_range) as (values ('[10:12:00,10:14:00]'::trange))
    ,add_missing as (
        select date_trunc('minute',g-'1 minute'::interval)::time as start,
               date_trunc('minute',g)::time as end,
               (array_agg("Value" order by "Time"))[count(*)] end_value,
               min("Value"),
               max("Value")
        from param
        cross join generate_series(current_date + lower(target_range),
                                   current_date + upper(target_range),
                                   '1 minute')g
        left join your_table t on date_trunc('minute',g-'1 minute'::interval)::time
                                 =date_trunc('minute',t."Time")
                              and g::time <@ target_range
        group by 1,2)
    select "start","end"
          ,((jsonb_agg(end_value)filter(where end_value is not null)over w1)->>-1
           )::int as start_value
          ,((jsonb_agg(end_value)filter(where end_value is not null)over w1)->>-1
           )::int as end_value
          ,((jsonb_agg(min)filter(where min is not null)over w1)->>-1
           )::int as min
          ,((jsonb_agg(max)filter(where max is not null)over w1)->>-1
           )::int as max
    from add_missing
    window w1 as (order by start)
    offset 1;
    
    start end start_value end_value min max
    10:12:00 10:13:00 4 4 4 4
    10:13:00 10:14:00 8 8 5 9
    1. There’s a gaps-and-islands aspect to your problem.
    2. You can use generate_series() to generate 1-minute slots in your range, then left join on the matching minutes.
    3. The left join will produce all-null rows wherever you had nothing in that window.
    4. Normally, you’d use coalesce(val,lag(val)over w1) to get the "previous" value in those all-null rows. Problem is, PostgreSQL doesn’t have skip null clause and it could happen that the previous minute you’d want values from, is also empty. To address that, I aggregate all earlier values into an array, filter to skip null, and ask for the last element.
    5. Native arrays don’t support negative subscripts, which means the last element can’t be taken as arr[-1]. You’d have to either add an intermediate cte to first build the arrays in a cte/subquery, and reuse their name to do arr[array_upper(arr)], or use jsonb arrays instead – jsonb arrays support negative subscripts, but you have to cast from their resulting text type back to your initial int. With native arrays you can also do (array_agg()over w1)[count(*)over w1] similar to (array_agg())[count(*)].
    6. I took your time-looking values literally, which is why I used a time-based range type. If that’s just an example, this can be changed to use proper timestamptz and tstzrange.
    7. OFFSET 1 at the end skips the additional, initial row that you have to have to be able to backfill from it, but don’t want it in your final input, since it precedes your target range.
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search