skip to Main Content

The table is represented by following scripts:

CREATE TABLE sales (
  id SERIAL PRIMARY KEY,
  product_id INTEGER,
  sales_date DATE,
  quantity INTEGER,
  price NUMERIC
);

INSERT INTO sales (product_id, sales_date, quantity, price) VALUES
   (1, '2023-01-01', 10, 10.00),
   (1, '2023-01-02', 12, 12.00),
   (1, '2023-01-03', 15, 15.00),
   (2, '2023-01-01', 8, 8.00),
   (2, '2023-01-02', 10, 10.00),
   (2, '2023-01-03', 12, 12.00);

The task is to sum sales quantity for last 3 days per each product_id. The period must be counted backwards from maximum (last) date of each product_id. So for 1 maximum is 2023-01-03, same for 2. But for product_id 2 last day may be differ from 1 – let’s say 2023-01-05.

By applying this query with window function in subquery:

select product_id, max(increasing_sum) as quantity_last_3_days
   from 
        (SELECT product_id,
         SUM(quantity) OVER (PARTITION BY product_id ORDER BY sales_date RANGE BETWEEN INTERVAL '2 days'
                PRECEDING AND CURRENT ROW) AS increasing_sum
         FROM sales) as s
   group by product_id;

I receive the expected output:

  | product_id | quantity_last_3_days |
  |____________|______________________|            
  |_____1______|___________37_________|
  |_____2______|___________30_________|     
 

But is it solution optimal? Is there any way to solve the problem, by using window function without subquery ?

2

Answers


  1. If you want to avoid window functions (which one might because of their cognitive load), you could also solve the problem using a WITH clause (a.k.a. CTE). Since the query does not actually need anything more than a single threshold for each product id, you could express the filtering in a join condition with a CTE as follows:

    WITH latest_sales AS (
      SELECT product_id, max(sales_date) AS time 
      FROM sales 
      GROUP BY product_id
    )
    SELECT sales.product_id, sum(sales.quantity) as quantity_last_3_days
    FROM sales INNER JOIN latest_sales ON 
        sales.product_id = latest_sales.product_id 
        AND sales.sales_date >= latest_sales.time - interval '2 days'
    GROUP BY sales.product_id;
    
    Login or Signup to reply.
  2. But is it solution optimal?

    No, not if you just need "to sum sales quantity for last 3 days per each product_id".

    Is there any way to solve the problem, by using window function without subquery?

    You can usually trade window functions for correlated subqueries, lateral or scalar, but there’s an easier way to speed up your query, and correct it. It is doing extra work trying to get you a rolling/stepping sum; the window isn’t trying to get the 3 most recent days for each product_id.

    Instead, for each row, it’s looking back at rows with the same product_id up to 2 days prior. You’re later picking the 3 day period with the highest sum of quantity, which won’t necessarily be the 3 most recent dates.

    On 400k samples, your query takes a whole 1.0s without an index, 0.7s with a covering index, and you can go down from that to 0.4s without or 0.1s with the covering index. You just need to only ask about the sum for the 3 most recent dates for each product_id: demo at db<>fiddle

    select product_id,sum(quantity) as quantity_last_3_days
    from(select *,row_number()over(PARTITION BY product_id ORDER BY sales_date DESC)
         from sales)_
    where 3>=row_number
    group by product_id;
    

    The trick here is that the window function will execute with a Run Condition: (3 >= row_number() OVER (?)), which means it’ll just grab the 3 most recent and quit. It can even get them straight off the top of the covering index, without ever having to visit the table.
    Your original query has to scan the whole thing (either the entire table or the whole index, if it’s available), then sort that to get the max().

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search