I have a PostgreSQL scenario where I have a database table with packet captures (each packet has a timestamp and a length), and I want to calculate the “current” bandwidth in use. I’m currently using a 100 ms sliding window, like this (simplified from the full query for readability reasons):
WITH
counted AS (
SELECT pkttimestamp, pktlen,
sum(pktlen) OVER (
ORDER BY pkttimestamp
-- 0.1 PRECEDING ⇒ 100ms sliding window
RANGE BETWEEN 0.1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
) AS lensum, min(pkttimestamp) OVER (
ORDER BY pkttimestamp
-- same sliding window as above!
RANGE BETWEEN 0.1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
) AS smallesttimestamp
FROM packets
),
divided AS (
SELECT pkttimestamp, pktlen,
lensum / (pkttimestamp - smallesttimestamp) AS currentbandwidth
FROM counted
)
SELECT * FROM divided; -- any final query
In this, I use a window function to calculate the sum of the preceding 100 ms worth of packets’ length, and the timestamp of the oldest packet in that range, which I then use (with the timestamp of the current packet) to divide the bytes that have been sent since the oldest timestamp up to just before “this” packet is sent by the timestamp difference to get the sliding average bandwidth.
This works well, limiting to 100 ms makes the resulting graph “good” when there’s few traffic (e.g. the area before iperf
tests where there’s only the occasional packet). However, during iperf
tests with high bandwidth settings, the 100 ms are too rough (especially if the available bandwidth suddenly drops, the graph takes 100 ms to catch up).
Reducing the sliding window is an option, but it results in more jagged, less “good” graphs in low bandwidth scenarios (not enough packets in the range).
Can I reword the window functions to put an “extra limit” on top, so that the preceding 0.1 seconds are selected, but only the closest say 50 rows? (Given how I do any further calculations based on the first/lowest timestamp in the window, it doesn’t effect the calculations if fewer rows are selected.)
Just switching to ROWS BETWEEN 50 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
is not an option because it produces really bad results for when the last 50 packets were only captured across multiple minutes.
2
Answers
You can change your SQL query to adapt the window size based on the number of rows inside a given time frame to accomplish dynamic window sizing for your bandwidth calculation in PostgreSQL. This enables a coarser granularity during times of low traffic and a finer one during times of high bandwidth.
The modified query first determines the current timestamp
currenttimestamp
and the cumulative sum of packet lengthslensum
for each row. The timestamp of the preceding row is then retrieved using thelag
window function. Finally, the time difference between the timestamps of the current row and the previous row is taken into account while calculating the bandwidth dynamically. With this method, the window size is guaranteed to adjust to network traffic conditions, resulting in more precise and responsive bandwidth predictions, which are especially helpful in scenarios with fluctuating traffic loads. You can fine-tune this strategy to meet your unique demands by changing the number of rows and desired time range.You cannot do that (easily or cleanly) in a single window, but you can use two and switch between them. Demo:
WINDOW
section of the query and reference their alias. Especially in your case, when the window definition is pretty long and repeated whenever it’s needed.numeric
/interval
operation you use for bandwidth is undefined in PostgreSQL, and neither isRANGE BETWEEN 0.1 PRECEDING
for atimestamp
column, I’m assuming yourpkttimestamp
is actually an epoch saved asnumeric
. If you make it a regulartimestamp
you could use date/time functions without having to cast and you can still doRANGE BETWEEN '100 ms'::interval PRECEDING
.