TL;DR
Base table is incorrect, need to refine data using query. Final value is incorrect (slightly less) as nesting on itself leading to some records omission as it’s a partial join.
Detail-
This is a query that I am trying to write. My original table has a wrong logic and not sure how to get it rectified, hence decided to pivot the correct table using SQL.
Each out_id can have multiple distinct ord_type on any day. Let’s assume there are three ord_type-
Alpha, Beta and Gamma (and so on), not necessarily each exists on all days. Some days might have one, some neither, some all three.
Actual data in table:
dt | out_id | ord_type | identifier1 | Non_Recon_Amnt | act_returns |
---|---|---|---|---|---|
16/12 | 01 | Alpha | True | 1 | 3 |
16/12 | 01 | Beta | False | 2 | 4 |
16/12 | 01 | Gamma | False | 3 | 5 |
17/12 | 01 | Beta | False | 4 | 6 |
17/12 | 01 | Gamma | False | 5 | 7 |
18/12 | 01 | Alpha | True | 6 | 8 |
18/12 | 01 | Gamma | False | 7 | 9 |
Data I want from Query:
dt | out_id | ord_type | identifier1 | Non_Recon_Amnt | act_returns |
---|---|---|---|---|---|
16/12 | 01 | Alpha | True | 1 | 0 |
16/12 | 01 | Beta | False | 2 | 7 |
16/12 | 01 | Gamma | False | 3 | 5 |
17/12 | 01 | Beta | False | 4 | 6 |
17/12 | 01 | Gamma | False | 5 | 7 |
18/12 | 01 | Alpha | True | 6 | 0 |
18/12 | 01 | Gamma | False | 7 | 9 |
18/12 | 01 | Beta | False | 0 | 8 |
This is because the very definition of Alpha means act_returns will be 0 and all values under this should ideally be attributed to Beta.
Data my current query gives-
dt | out_id | ord_type | identifier1 | Non_Recon_Amnt | act_returns |
---|---|---|---|---|---|
16/12 | 01 | Alpha | True | 1 | 0 |
16/12 | 01 | Beta | False | 2 | 7 |
16/12 | 01 | Gamma | False | 3 | 5 |
17/12 | 01 | Beta | False | 4 | 6 |
17/12 | 01 | Gamma | False | 5 | 7 |
18/12 | 01 | Alpha | True | 6 | 0 |
18/12 | 01 | Gamma | False | 7 | 9 |
You may notice it doesn’t replace Alpha’s act_returns value with Beta on 18th Dec
Full Query:
SELECT
dt AS date_,
out_id,
out_name,
ct,
ord_type,
identifier1,
identifier2,
SUM(Non_Recon_Amnt),
SUM(ret_loss),
SUM(act_returns)
FROM
(
SELECT
dt,
out_id,
out_name,
ct,
ord_type,
identifier1,
identifier2,
SUM(Non_Recon_Amnt) as Non_Recon_Amnt,
CASE WHEN ord_type='Alpha' AND identifier1='true' THEN SUM(ret_loss) ELSE 0 END as ret_loss,
CASE
WHEN ord_type = 'Alpha' AND identifier1 = 'true' THEN 0
WHEN ord_type = 'Beta' THEN
(
SELECT SUM(act_returns)
FROM generic_table
WHERE dt=G.dt AND out_id=G.out_id
AND ord_type = 'Alpha' AND identifier1 = 'true'
OR dt=G.dt AND out_id=G.out_id
AND ord_type = 'Beta'
)
ELSE SUM(act_returns)
END AS act_returns,
FROM
generic_table G
WHERE
dt >= current_date - 30 AND dt < current_date
GROUP BY
1, 2, 3, 4, 5, 6, 7
) AS subquery
GROUP BY 1, 2, 3, 4, 5, 6, 7
In my subquery,
CASE
WHEN ord_type = 'Alpha' AND identifier1 = 'true' THEN 0
WHEN ord_type = 'Beta' THEN
(
SELECT SUM(act_returns)
FROM generic_table
WHERE dt=G.dt AND out_id=G.out_id
AND ord_type = 'Alpha' AND identifier1 = 'true'
OR dt=G.dt AND out_id=G.out_id
AND ord_type = 'Beta'
)
ELSE SUM(act_returns)
END AS act_returns,`
since I have used ord_type Beta, if generic table doesn’t have Beta on that particular date, it skips them. How to make the solution better?
2
Answers
This is for SQL Server since I don’t have redshift. Tried to keep the query as DBMS independent as possible. Certainly this could be done in fewer lines with window functions, rather than the last self-join, but not sure exactly what and how redshift supports.
The idea is to first manufacturer your missing record for ‘Beta’ on each day. Then later you can self join and define act_returns how you’ve described.
Fiddle: https://dbfiddle.uk/_oqtPxct
Edit: A bit more brief. I’d be surprised if Redshift didn’t support this. Same idea as above but just looks across the window instead of with a self join.
The following LEFT JOINs on to a mapping table to duplicate Alpha rows as Beta rows, then uses aggregation and CASE expressions to move the values around as necessary.
fiddle