I have an input table like this:
Load_date | projected_date | current stock | consumed demand |
---|---|---|---|
2020-01-01 | 2020-01-01 | 100 | 15 |
2020-01-01 | 2020-01-02 | 150 | 80 |
2020-01-01 | 2020-01-03 | 200 | 25 |
2020-01-01 | 2020-01-04 | 300 | 70 |
2020-01-01 | 2020-01-05 | 400 | 65 |
My intermediate output will be like this:
For projected date: 2020-01-01
Load_date | projected_date | current stock | consumed demand | rolling consumed demand | Finished projected date |
---|---|---|---|---|---|
2020-01-01 | 2020-01-01 | 100 | 15 | — | 2020-01-03 |
2020-01-01 | 2020-01-02 | 150 | 80 | 80 | — |
2020-01-01 | 2020-01-03 | 200 | 25 | 105 | — |
2020-01-01 | 2020-01-04 | 300 | 70 | 175 | — |
2020-01-01 | 2020-01-05 | 400 | 65 | 240 | — |
For projected date: 2020-01-02
Load_date | projected_date | current stock | consumed demand | rolling consumed demand | Finished projected date |
---|---|---|---|---|---|
2020-01-01 | 2020-01-01 | 100 | 15 | — | |
2020-01-01 | 2020-01-02 | 150 | 80 | — | 2020-01-05 |
2020-01-01 | 2020-01-03 | 200 | 25 | 25 | — |
2020-01-01 | 2020-01-04 | 300 | 70 | 95 | — |
2020-01-01 | 2020-01-05 | 400 | 65 | 160 | — |
Similarly we have to derive the finished projected date for other projected dates
Final output will be:
Load_date | projected_date | current stock | consumed demand | Finished projected date |
---|---|---|---|---|
2020-01-01 | 2020-01-01 | 100 | 15 | 2020-01-03 |
2020-01-01 | 2020-01-02 | 150 | 80 | 2020-01-05 |
2020-01-01 | 2020-01-03 | 200 | 25 | — |
2020-01-01 | 2020-01-04 | 300 | 70 | — |
2020-01-01 | 2020-01-05 | 400 | 65 | — |
Can you please help me with the SQL query? I have tried to write query in Databricks SQL, but not getting this output. Actually it is a small part of my entire requirement.
The logic here is to calculate column Finished projected date, whenever the rolling consumed demand will exceed the current stock for a particular projected date, it will return that rolling consumed demand’s corresponding projected date.
Like in the above table, the 2020-01-01 projected date has current stock of 100. The rolling consumed demand calculation starts 1 day after actual projected date i.e. from 2020-01-02 onwards. The rolling consumed demand reaches to 105 on projected date 2020-01-03. So, for 1st row it will return the finished projected date as 2020-01-03.
Similarly for projected date 2020-01-02, the rolling consumed demand exceeds the total demand on 2020-01-05. As 160 > 150. So the finished projected date would be 2020-01-05.
2
Answers
I haven’t try it yet.
But should be look like this using sample table. You could this as an initial guide.
You could find refence and sample on this site. SQL Cumulative Sum: A Comprehensive Guide
To achieve the desired output, you can use a SQL query that involves window functions to compute the rolling consumed demand and then determine the finished projected date based on the logic you provided.
Here is the SQL query that should work in Databricks SQL:
Explanation:
CTE_RollingDemand: This common table expression calculates the rolling consumed demand for each projected date.
SUM(consumed_demand) OVER (PARTITION BY Load_date ORDER BY projected_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
computes the cumulative sum of consumed demand.CTE_FinishedProjectedDate: This common table expression finds the first projected date where the rolling consumed demand exceeds the current stock.
MIN(b.projected_date) AS Finished_projected_date
identifies the earliest projected date where the rolling consumed demand surpasses the current stock.Final Select: The final select statement joins the original input table with the
CTE_FinishedProjectedDate
to get theFinished_projected_date
for each row, ordering the results byLoad_date
andprojected_date
.Assumptions:
input_table
. Please replace it with the actual name of your table.This query should give you the final output with the finished projected date calculated as per your requirements.