skip to Main Content

Here are the exemple tables:

Product Prices (prd)

start_date product price
2023-04-01 prod_A 10.0
2023-04-15 prod_A 20.0
2023-04-01 prod_B 20.0

Order Products (ord)

order_date id product
2023-04-01 10001 prod_A
2023-04-01 10001 prod_B
2023-04-02 10002 prod_A
2023-04-02 10002 prod_B
2023-04-16 10003 prod_A
2023-04-16 10003 prod_B

Desired Result

order_date id product price
2023-04-01 10001 prod_A 10.0
2023-04-01 10001 prod_B 20.0
2023-04-02 10002 prod_A 10.0
2023-04-02 10002 prod_B 20.0
2023-04-16 10003 prod_A 20.0
2023-04-16 10003 prod_B 20.0

My first attempt was the following approach:

SELECT ord.order_date, ord.id, ord.product, prd.price
FROM tra
LEFT JOIN (
    SELECT *
    FROM prd
    ORDER BY prd.start_date ASC
) AS prd ON ord.id = prd.id AND ord.order_date >= prd.start_date

But some records keep getting duplicated, like:

Undesired Result

order_date id product price
2023-04-16 10003 prod_A 10.0
2023-04-16 10003 prod_B 20.0

I know why they are duplicated but don’t know what to do.
Any idea how I could fix the query?

4

Answers


  1. This is called a lateral join, and the syntax is different (or may not be supported at all) depending on what kind of database you have. SQL Server uses APPLY, for example. Additionally, very often you can re-write the query using a windowing function for better performance.

    Login or Signup to reply.
  2. The intent of the query is indeed a lateral join, as Joel Coehoorn points out.

    Any idea how I could fix the query?

    The technique is to move the predicates within the subquery.

    In Postgres:

    SELECT o.*, p.price, p.start_date
    FROM ord o
    LEFT JOIN LATERAL (
        SELECT p.*
        FROM prd p
        WHERE p.id = o.id and p.start_date <= o.order_date
        ORDER BY start_date DESC LIMIT 1
    ) p ON TRUE
    

    One of the benefits of the lateral join is that it can return multiple rows and columns. But if all you want is the price, a scalar subquery is good enough as well.

    Login or Signup to reply.
  3. This is an other approach using ABS function to get the closest start date from the product table and the window function row_number() to sort the dataset and pick only the first match (the smallest Date_Diff )

    with cte as (
      select o.*, p.price, ABS(order_date - start_date) as date_diff
      from ord o
      left join prd p on p.product = o.product
    ),
    cte2 as (
      select *, row_number() over (partition by order_date, id, product order by date_diff asc) as rn
      from cte
    )
    select order_date,  id, product,    price
    from cte2
    where rn = 1;
    

    Result :

    order_date  id      product price
    2023-04-01  10001   prod_A  10
    2023-04-01  10001   prod_B  20
    2023-04-02  10002   prod_A  10
    2023-04-02  10002   prod_B  20
    2023-04-16  10003   prod_A  20
    2023-04-16  10003   prod_B  20
    

    Demo here

    Login or Signup to reply.
  4. I used postgresql and used Lead function to fix this issue. Try and see if this works for you.

    SELECT o.order_date, o.id, o.product, pp.price AS price
    FROM orders o
    LEFT JOIN (
        SELECT product, price, 
               start_date, 
               LEAD(start_date) OVER (PARTITION BY product ORDER BY start_date) AS end_date 
        FROM product_prices
    ) pp ON o.product = pp.product AND o.order_date >= pp.start_date AND (o.order_date < pp.end_date OR pp.end_date IS NULL)
    ORDER BY o.order_date, o.id, o.product;
    

    I used an additional column "end_date" that represents the start date of the next price record for the same product. This is obtained by using a window function (LEAD) to get the next start date for each product.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search