skip to Main Content

We are storing data corresponding to rates (ex: electricity price) in a SQL table, such as:

Date Value
2022-08-25 01:00 12.3
2022-09-23 06:12 14.5
2022-10-18 05:34 9.8

The date interval between two rows is not regular. In this table, 12.3 is the current rate until it’s replaced by the new value on September 23rd, when the rate becomes 14.5

From there, we want to generate an hourly time-series, with each value corresponding to the correct rate, such as:

Date Value
2022-08-25 01:00 12.3
2022-08-25 02:00 12.3
2022-08-25 03:00 12.3
2022-08-25 04:00 12.3
2022-08-25 05:00 12.3
12.3
2022-09-23 06:12 14.5
2022-09-23 07:00 14.5
2022-09-23 08:00 14.5
14.5
2022-10-18 05:34 9.8
9.8

how you would generate such as time-series in PostgreSQL ?

2

Answers


  1. So you need to do two things: generate the time series with hourly intervals and then check for each interval which value was active during that.

    For Postgres I would also create a timestamp range that contains the start and end of the range in which the price is valid (excluding the upper bound). This can be used in a join condition against the generated time series

    with time_series ("date") as (
      select g.*
      from ( 
        select min("date") as start_date, max("date") as end_date
        from the_table
      ) x 
        cross join generate_series(x.start_date, x.end_date, interval '1 hour') as g
    ), ranges as (
      select tsrange("date", lead("date") over (order by "date"), '(]') as valid_during, 
             value
      from the_table
    )
    select ts."date",
           r.value 
    from time_series ts
      join ranges r on r.valid_during @> ts."date"
    

    If you don’t really need a "dynamic time series", you can just use generate_series() with a hard-coded start and end which would simplify this a bit.

    Online example

    Login or Signup to reply.
  2. This is solution for Postgres. I think it’s what you wanted, the intervals end with full hour and after generation ends the next hour is exact timestamp from the original table (see table). It was done through comparison of the generated date with original date truncated to the hours. To make sure that the last date appears in the result I made COALESCE on LAG window function to fill the NULL value with the last date. Hope it doesn’t look too hacky.

    hourly_interval value
    2022-08-25 01:00:00 12.3
    2022-08-25 02:00:00 12.3
    2022-09-23 06:00:00 12.3
    2022-09-23 06:12:00 14.5
    2022-09-23 07:00:00 14.5
    2022-10-18 05:00:00 14.5
    2022-10-18 05:34:00 9.8

    The result has 1303 rows

    WITH cte AS (
        SELECT *,
        date_trunc('hour',generate_series(date,
            COALESCE((LAG(date,-1) OVER (ORDER BY date)),date),
            '1 hour')) hourly_interval
        FROM electricity
    ) 
    SELECT
        CASE WHEN
            hourly_interval = date_trunc('hour',date)
        THEN
            date
        ELSE
            hourly_interval
        END AS hourly_interval,
        value
    FROM cte
    
    

    Feel free to fiddle around

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search