The table is represented by following scripts:
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
product_id INTEGER,
sales_date DATE,
quantity INTEGER,
price NUMERIC
);
INSERT INTO sales (product_id, sales_date, quantity, price) VALUES
(1, '2023-01-01', 10, 10.00),
(1, '2023-01-02', 12, 12.00),
(1, '2023-01-03', 15, 15.00),
(2, '2023-01-01', 8, 8.00),
(2, '2023-01-02', 10, 10.00),
(2, '2023-01-03', 12, 12.00);
The task is to sum sales quantity for last 3 days per each product_id. The period must be counted backwards from maximum (last) date of each product_id. So for 1 maximum is 2023-01-03, same for 2. But for product_id 2 last day may be differ from 1 – let’s say 2023-01-05.
By applying this query with window function in subquery:
select product_id, max(increasing_sum) as quantity_last_3_days
from
(SELECT product_id,
SUM(quantity) OVER (PARTITION BY product_id ORDER BY sales_date RANGE BETWEEN INTERVAL '2 days'
PRECEDING AND CURRENT ROW) AS increasing_sum
FROM sales) as s
group by product_id;
I receive the expected output:
| product_id | quantity_last_3_days |
|____________|______________________|
|_____1______|___________37_________|
|_____2______|___________30_________|
But is it solution optimal? Is there any way to solve the problem, by using window function without subquery ?
2
Answers
If you want to avoid window functions (which one might because of their cognitive load), you could also solve the problem using a
WITH
clause (a.k.a. CTE). Since the query does not actually need anything more than a single threshold for each product id, you could express the filtering in a join condition with a CTE as follows:No, not if you just need "to sum sales
quantity
for last 3 days per eachproduct_id
".You can usually trade window functions for correlated subqueries,
lateral
or scalar, but there’s an easier way to speed up your query, and correct it. It is doing extra work trying to get you a rolling/stepping sum; the window isn’t trying to get the 3 most recent days for eachproduct_id
.Instead, for each row, it’s looking back at rows with the same
product_id
up to 2 days prior. You’re later picking the 3 day period with the highest sum ofquantity
, which won’t necessarily be the 3 most recent dates.On 400k samples, your query takes a whole
1.0s
without an index,0.7s
with a covering index, and you can go down from that to0.4s
without or0.1s
with the covering index. You just need to only ask about the sum for the 3 most recent dates for eachproduct_id
: demo at db<>fiddleThe trick here is that the window function will execute with a
Run Condition: (3 >= row_number() OVER (?))
, which means it’ll just grab the 3 most recent and quit. It can even get them straight off the top of the covering index, without ever having to visit the table.Your original query has to scan the whole thing (either the entire table or the whole index, if it’s available), then sort that to get the
max()
.