skip to Main Content

Is postgresql can do average for rolling data? I have this case where I need to average the values for 5 days and get all the days filled up while waiting for the actual values so basically if I’m in day 6 today, my formula would be day 1-5 then day 7 would be 2-6 but day 6 values comes from the average today and so on. You can see the values below. I have been searching if sql can do this but can’t find any function. I only found recursive but not available in postgres.

SAMPLE TABLE

enter image description here

2

Answers


  1. The key to calculating rolling means with missing value substitution is to maintain a buffer holding the values contributing to the mean. The following query demonstrates calculating a rolling mean using a recursive CTE with an array holding the contributing values:

    WITH RECURSIVE
      -- generate sample data as per original post
      t AS (
        SELECT s.n AS day_number, (CASE WHEN s.n <= 5 THEN s.n + 1 END)::float AS day_value
          FROM generate_series(1, 9) s(n)),
      cte AS (
        SELECT t.day_number,
               t.day_value AS original_day_value,
               t.day_value,
               COALESCE(t.day_value, 0) AS rolling_sum,
               CASE WHEN t.day_value IS NULL THEN 0 ELSE 1 END AS sample_count,
               -- array size determines maximum number of samples contributing to rolling mean
               array_fill(NULL::float, ARRAY[4]) || ARRAY[t.day_value] AS samples
          FROM t
          WHERE t.day_number = 1
        UNION ALL
        SELECT t.day_number,
               t.day_value AS original_day_value,
               w.day_value,
               -- maintain rolling sum to eliminate need to repeatedly sum samples
               cte.rolling_sum - COALESCE(samples[cte.sample_count - cardinality(cte.samples) + 1], 0) + COALESCE(w.day_value, 0) AS rolling_sum,
               -- because missing values have substitutions, sample_count can never decrease
               LEAST(cardinality(cte.samples),
                     (cte.sample_count + CASE WHEN w.day_value IS NOT NULL THEN 1 ELSE 0 END)) AS sample_count,
               -- shift out the oldest sample and append the newest
               cte.samples[2:] || w.day_value
          FROM t
          JOIN cte
            ON t.day_number = cte.day_number + 1
          CROSS JOIN LATERAL (
            -- compute the new day value once instead of repeating the calculation in the enclosing query's select list
            SELECT COALESCE(t.day_value, cte.rolling_sum / NULLIF(cte.sample_count, 0)) AS day_value) w)
    SELECT cte.*
      FROM cte
      ORDER BY cte.day_number;
    

    I don’t have access to a postgreSQL 9.1 database, so I have only run this with PostgreSQL 15.3. There might be some changes needed for the older version; however, the general concepts still apply.

    Login or Signup to reply.
  2. You can get what the 5-day running average you asked for with the window version of the avg() function, see here and here with demo here.

    select day
         , value
         , case when rn>5
                then p5::numeric(5,2)
                else null
           end "prior 5 day average"
      from (select * 
                 , avg(value) over win p5
                 , row_number() over() rn
              from sample_table
            window win as (order by day 
                           range between 5 preceding
                                     and 1 preceding
                          )   
           ) sq;
    

    Well at least you get what you asked for, 5 day prior average. You cannot however get what you have in your sample table as days 6 thru 10 do not have a valid value and I am not going to what values you lead to your posed results.

    NOTE: Demo actually run with v15. As v9.1 is no longer carried by db<>fiddle, but documentation references are v9.1.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search