Here are the exemple tables:
Product Prices (prd)
start_date | product | price |
---|---|---|
2023-04-01 | prod_A | 10.0 |
2023-04-15 | prod_A | 20.0 |
2023-04-01 | prod_B | 20.0 |
Order Products (ord)
order_date | id | product |
---|---|---|
2023-04-01 | 10001 | prod_A |
2023-04-01 | 10001 | prod_B |
2023-04-02 | 10002 | prod_A |
2023-04-02 | 10002 | prod_B |
2023-04-16 | 10003 | prod_A |
2023-04-16 | 10003 | prod_B |
Desired Result
order_date | id | product | price |
---|---|---|---|
2023-04-01 | 10001 | prod_A | 10.0 |
2023-04-01 | 10001 | prod_B | 20.0 |
2023-04-02 | 10002 | prod_A | 10.0 |
2023-04-02 | 10002 | prod_B | 20.0 |
2023-04-16 | 10003 | prod_A | 20.0 |
2023-04-16 | 10003 | prod_B | 20.0 |
My first attempt was the following approach:
SELECT ord.order_date, ord.id, ord.product, prd.price
FROM tra
LEFT JOIN (
SELECT *
FROM prd
ORDER BY prd.start_date ASC
) AS prd ON ord.id = prd.id AND ord.order_date >= prd.start_date
But some records keep getting duplicated, like:
Undesired Result
order_date | id | product | price |
---|---|---|---|
2023-04-16 | 10003 | prod_A | 10.0 |
2023-04-16 | 10003 | prod_B | 20.0 |
I know why they are duplicated but don’t know what to do.
Any idea how I could fix the query?
4
Answers
This is called a lateral join, and the syntax is different (or may not be supported at all) depending on what kind of database you have. SQL Server uses
APPLY
, for example. Additionally, very often you can re-write the query using a windowing function for better performance.The intent of the query is indeed a lateral join, as Joel Coehoorn points out.
The technique is to move the predicates within the subquery.
In Postgres:
One of the benefits of the lateral join is that it can return multiple rows and columns. But if all you want is the price, a scalar subquery is good enough as well.
This is an other approach using
ABS
function to get the closest start date from the product table and the window functionrow_number()
to sort the dataset and pick only the first match (the smallest Date_Diff )Result :
Demo here
I used postgresql and used Lead function to fix this issue. Try and see if this works for you.
I used an additional column "end_date" that represents the start date of the next price record for the same product. This is obtained by using a window function (LEAD) to get the next start date for each product.