skip to Main Content

The table statuses has 3 columns

  • id (uuid, PK)
  • time (timestamp with time zone, NOT NULL)
  • duration (integer, NOT NULL)

The records it contains are not evenly distributed; there could be a single record in a day or thousands in some other day.

I need to aggregate the duration (average) for any arbitrary time range such that the returned result has no more than 100 records.

Had it been grouping by day or by hour, it would have been pretty straightforward. But what’s throwing me off a bit is the arbitrary time range and finding that sweet spot so the records are within the limit.

Example:

Actual data:

+------------------------+----------+
| time                   | duration |
|------------------------+----------|
| 2023-06-30 03:49:43+00 | 96       |
| 2023-06-30 03:51:25+00 | 132      |
| 2023-06-30 03:51:55+00 | 124      |
| 2023-06-30 03:52:25+00 | 102      |
| 2023-06-30 03:52:55+00 | 119      |
| 2023-06-30 03:53:25+00 | 136      |
| 2023-06-30 03:53:55+00 | 92       |
| 2023-06-30 03:54:25+00 | 86       |
| 2023-06-30 03:54:55+00 | 99       |
| 2023-06-30 03:55:25+00 | 105      |
| 2023-06-30 03:55:55+00 | 94       |
| 2023-06-30 03:56:25+00 | 76       |
| 2023-06-30 03:56:55+00 | 77       |
| 2023-06-30 03:57:25+00 | 66       |
| 2023-06-30 03:57:55+00 | 80       |
| 2023-06-30 03:59:13+00 | 98       |
| 2023-06-30 03:59:43+00 | 201      |
| 2023-06-30 04:00:13+00 | 652      |
| 2023-06-30 04:00:43+00 | 154      |
| 2023-06-30 04:01:13+00 | 272      |
+------------------------+----------+

Assuming, I want at most 10 output (instead of 100 as originally mentioned in the question), I would expect the following output

+------------------------+----------+
| time                   | duration |
|------------------------+----------|
| 2023-06-30 03:49:43+00 | 114      |
| 2023-06-30 03:51:55+00 | 113      |
| 2023-06-30 03:52:55+00 | 127.5    |
| 2023-06-30 03:53:55+00 | 89.0     |
...6 more rows
+------------------------+----------+

2

Answers


  1. is this what you want (obviously you’d substitute ‘x’ and ‘y’ for the limits of your date range):

    with data_set as (select duration from table where time between 'x' and  'y' LIMIT 100)
    select avg(duration)
    from data_set
    ;
    

    If this isn’t what you want, can you update your question with some sample data and the result you want to achieve?

    Login or Signup to reply.
  2. Using window functions with lead.

    create table job_table(line_id integer generated always as identity, ts timestamptz, duration integer);
    
    insert into job_table(ts, duration) values
    (' 2023-06-30 03:49:43+00 ', 96),
    (' 2023-06-30 03:51:25+00 ', 132),
    (' 2023-06-30 03:51:55+00 ', 124),
    (' 2023-06-30 03:52:25+00 ', 102),
    (' 2023-06-30 03:52:55+00 ', 119),
    (' 2023-06-30 03:53:25+00 ', 136),
    (' 2023-06-30 03:53:55+00 ', 92 ),
    (' 2023-06-30 03:54:25+00 ', 86 ),
    (' 2023-06-30 03:54:55+00 ', 99 ),
    (' 2023-06-30 03:55:25+00 ', 105),
    (' 2023-06-30 03:55:55+00 ', 94 ),
    (' 2023-06-30 03:56:25+00 ', 76 ),
    (' 2023-06-30 03:56:55+00 ', 77 ),
    (' 2023-06-30 03:57:25+00 ', 66 ),
    (' 2023-06-30 03:57:55+00 ', 80 ),
    (' 2023-06-30 03:59:13+00 ', 98 ),
    (' 2023-06-30 03:59:43+00 ', 201),
    (' 2023-06-30 04:00:13+00 ', 652),
    (' 2023-06-30 04:00:43+00 ', 154),
    (' 2023-06-30 04:01:13+00 ', 272); 
    
    select 
        ts, round((duration + lead(duration) over(order by ts))/2.0, 2) 
    from 
        job_table 
    where 
       ts between '2023-06-30 03:49:43+00' and ' 2023-06-30 04:01:13+00' 
    limit 10;
    
               ts            | duration 
    -------------------------+----------
     06/29/2023 20:49:43 PDT |   114.00
     06/29/2023 20:51:25 PDT |   128.00
     06/29/2023 20:51:55 PDT |   113.00
     06/29/2023 20:52:25 PDT |   110.50
     06/29/2023 20:52:55 PDT |   127.50
     06/29/2023 20:53:25 PDT |   114.00
     06/29/2023 20:53:55 PDT |    89.00
     06/29/2023 20:54:25 PDT |    92.50
     06/29/2023 20:54:55 PDT |   102.00
     06/29/2023 20:55:25 PDT |    99.50
    
    
    
    
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search