Postgresql - Limit RANGE window function amount of rows considered

mirabilos
September 19, 2023
186 views
0 votes
2 Answers

I have a PostgreSQL scenario where I have a database table with packet captures (each packet has a timestamp and a length), and I want to calculate the “current” bandwidth in use. I’m currently using a 100 ms sliding window, like this (simplified from the full query for readability reasons):

WITH
    counted AS (
        SELECT pkttimestamp, pktlen,
            sum(pktlen) OVER (
                ORDER BY pkttimestamp
                -- 0.1 PRECEDING ⇒ 100ms sliding window
                RANGE BETWEEN 0.1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
            ) AS lensum, min(pkttimestamp) OVER (
                ORDER BY pkttimestamp
                -- same sliding window as above!
                RANGE BETWEEN 0.1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
            ) AS smallesttimestamp
        FROM packets
    ),
    divided AS (
        SELECT pkttimestamp, pktlen,
            lensum / (pkttimestamp - smallesttimestamp) AS currentbandwidth
        FROM counted
    )
SELECT * FROM divided; -- any final query

In this, I use a window function to calculate the sum of the preceding 100 ms worth of packets’ length, and the timestamp of the oldest packet in that range, which I then use (with the timestamp of the current packet) to divide the bytes that have been sent since the oldest timestamp up to just before “this” packet is sent by the timestamp difference to get the sliding average bandwidth.

This works well, limiting to 100 ms makes the resulting graph “good” when there’s few traffic (e.g. the area before iperf tests where there’s only the occasional packet). However, during iperf tests with high bandwidth settings, the 100 ms are too rough (especially if the available bandwidth suddenly drops, the graph takes 100 ms to catch up).

Reducing the sliding window is an option, but it results in more jagged, less “good” graphs in low bandwidth scenarios (not enough packets in the range).

Can I reword the window functions to put an “extra limit” on top, so that the preceding 0.1 seconds are selected, but only the closest say 50 rows? (Given how I do any further calculations based on the first/lowest timestamp in the window, it doesn’t effect the calculations if fewer rows are selected.)

Just switching to ROWS BETWEEN 50 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW is not an option because it produces really bad results for when the last 50 packets were only captured across multiple minutes.

Tags: postgresql window-functions

Answers

- MuhammadSarmad
- September 4, 2023 at 9:39 pm
- 0 votes
0
You can change your SQL query to adapt the window size based on the number of rows inside a given time frame to accomplish dynamic window sizing for your bandwidth calculation in PostgreSQL. This enables a coarser granularity during times of low traffic and a finer one during times of high bandwidth.
The modified query first determines the current timestamp currenttimestamp and the cumulative sum of packet lengths lensum for each row. The timestamp of the preceding row is then retrieved using the lag window function. Finally, the time difference between the timestamps of the current row and the previous row is taken into account while calculating the bandwidth dynamically. With this method, the window size is guaranteed to adjust to network traffic conditions, resulting in more precise and responsive bandwidth predictions, which are especially helpful in scenarios with fluctuating traffic loads. You can fine-tune this strategy to meet your unique demands by changing the number of rows and desired time range.

Login or Signup to reply.

- Zegarek
- September 6, 2023 at 1:30 pm
- 0 votes
0
You cannot do that (easily or cleanly) in a single window, but you can use two and switch between them. Demo:
```
SELECT pkttimestamp, 
       pktlen,
       --(pkttimestamp-lag(pkttimestamp,50)over w1) as time_reach_50_packets_back,
       --(count(*)over w2) as packet_reach_100_ms_back,
       case when (count(*)over w2)/*packet_reach_100_ms_back*/>50 
            then (sum(pktlen)OVER w1)/(pkttimestamp - min(pkttimestamp)OVER w1) 
            else (sum(pktlen)OVER w2)/(pkttimestamp - min(pkttimestamp)OVER w2)
            end as currentbandwidth
FROM packets
WINDOW w1 AS (order by pkttimestamp
              ROWS BETWEEN 50 PRECEDING AND CURRENT 
              ROW EXCLUDE CURRENT ROW ),
       w2 AS (ORDER BY pkttimestamp
              RANGE BETWEEN 0.1 PRECEDING AND CURRENT 
              ROW EXCLUDE CURRENT ROW )
```
1. It’s good to define your windows once, in their WINDOW section of the query and reference their alias. Especially in your case, when the window definition is pretty long and repeated whenever it’s needed.
2. Since numeric/interval operation you use for bandwidth is undefined in PostgreSQL, and neither is RANGE BETWEEN 0.1 PRECEDING for a timestamp column, I’m assuming your pkttimestamp is actually an epoch saved as numeric. If you make it a regular timestamp you could use date/time functions without having to cast and you can still do RANGE BETWEEN '100 ms'::interval PRECEDING.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Limit RANGE window function amount of rows considered

Answers