skip to Main Content

TL;DR
Base table is incorrect, need to refine data using query. Final value is incorrect (slightly less) as nesting on itself leading to some records omission as it’s a partial join.

Detail-
This is a query that I am trying to write. My original table has a wrong logic and not sure how to get it rectified, hence decided to pivot the correct table using SQL.
Each out_id can have multiple distinct ord_type on any day. Let’s assume there are three ord_type-
Alpha, Beta and Gamma (and so on), not necessarily each exists on all days. Some days might have one, some neither, some all three.

Actual data in table:

dt out_id ord_type identifier1 Non_Recon_Amnt act_returns
16/12 01 Alpha True 1 3
16/12 01 Beta False 2 4
16/12 01 Gamma False 3 5
17/12 01 Beta False 4 6
17/12 01 Gamma False 5 7
18/12 01 Alpha True 6 8
18/12 01 Gamma False 7 9

Data I want from Query:

dt out_id ord_type identifier1 Non_Recon_Amnt act_returns
16/12 01 Alpha True 1 0
16/12 01 Beta False 2 7
16/12 01 Gamma False 3 5
17/12 01 Beta False 4 6
17/12 01 Gamma False 5 7
18/12 01 Alpha True 6 0
18/12 01 Gamma False 7 9
18/12 01 Beta False 0 8

This is because the very definition of Alpha means act_returns will be 0 and all values under this should ideally be attributed to Beta.

Data my current query gives-

dt out_id ord_type identifier1 Non_Recon_Amnt act_returns
16/12 01 Alpha True 1 0
16/12 01 Beta False 2 7
16/12 01 Gamma False 3 5
17/12 01 Beta False 4 6
17/12 01 Gamma False 5 7
18/12 01 Alpha True 6 0
18/12 01 Gamma False 7 9

You may notice it doesn’t replace Alpha’s act_returns value with Beta on 18th Dec


Full Query:

SELECT
    dt AS date_,
    out_id,
    out_name,
    ct,
    ord_type,
    identifier1,
    identifier2,
    SUM(Non_Recon_Amnt),
    SUM(ret_loss),
    SUM(act_returns)

FROM
    (
        SELECT
            dt,
            out_id,
            out_name,
            ct,
            ord_type,
            identifier1,
            identifier2,
            SUM(Non_Recon_Amnt) as Non_Recon_Amnt,
            CASE WHEN ord_type='Alpha' AND identifier1='true' THEN SUM(ret_loss) ELSE 0 END as ret_loss,
            CASE
                WHEN ord_type = 'Alpha' AND identifier1 = 'true' THEN 0
                WHEN ord_type = 'Beta' THEN
                    (
                        SELECT SUM(act_returns)
                        FROM generic_table
                        WHERE dt=G.dt AND out_id=G.out_id
                        AND ord_type = 'Alpha' AND identifier1 = 'true'
                        OR dt=G.dt AND out_id=G.out_id
                        AND ord_type = 'Beta'
                    )
                ELSE SUM(act_returns)
            END AS act_returns,
            FROM
            generic_table G
        WHERE
            dt >= current_date - 30 AND dt < current_date
        GROUP BY
            1, 2, 3, 4, 5, 6, 7
    ) AS subquery
GROUP BY 1, 2, 3, 4, 5, 6, 7

In my subquery,

CASE
                WHEN ord_type = 'Alpha' AND identifier1 = 'true' THEN 0
                WHEN ord_type = 'Beta' THEN
                    (
                        SELECT SUM(act_returns)
                        FROM generic_table
                        WHERE dt=G.dt AND out_id=G.out_id
                        AND ord_type = 'Alpha' AND identifier1 = 'true'
                        OR dt=G.dt AND out_id=G.out_id
                        AND ord_type = 'Beta'
                    )
                ELSE SUM(act_returns)
            END AS act_returns,`

since I have used ord_type Beta, if generic table doesn’t have Beta on that particular date, it skips them. How to make the solution better?

2

Answers


  1. This is for SQL Server since I don’t have redshift. Tried to keep the query as DBMS independent as possible. Certainly this could be done in fewer lines with window functions, rather than the last self-join, but not sure exactly what and how redshift supports.

    The idea is to first manufacturer your missing record for ‘Beta’ on each day. Then later you can self join and define act_returns how you’ve described.

    Fiddle: https://dbfiddle.uk/_oqtPxct

    CREATE TABLE generic_table (
        dt DATE,
        out_id VARCHAR(2),
        ord_type VARCHAR(10),
        identifier1 VARCHAR(5),
        Non_Recon_Amnt INT,
        act_returns INT
    );
    
    INSERT INTO generic_table (dt, out_id, ord_type, identifier1, Non_Recon_Amnt, act_returns) VALUES ('2023-12-16', '01', 'Alpha', 'True', 1, 3);
    INSERT INTO generic_table (dt, out_id, ord_type, identifier1, Non_Recon_Amnt, act_returns) VALUES ('2023-12-16', '01', 'Beta', 'False', 2, 4 );
    INSERT INTO generic_table (dt, out_id, ord_type, identifier1, Non_Recon_Amnt, act_returns) VALUES ('2023-12-16', '01', 'Gamma', 'False', 3, 5);
    INSERT INTO generic_table (dt, out_id, ord_type, identifier1, Non_Recon_Amnt, act_returns) VALUES ('2023-12-17', '01', 'Beta', 'False', 4, 6);
    INSERT INTO generic_table (dt, out_id, ord_type, identifier1, Non_Recon_Amnt, act_returns) VALUES ('2023-12-17', '01', 'Gamma', 'False', 5, 7);
    INSERT INTO generic_table (dt, out_id, ord_type, identifier1, Non_Recon_Amnt, act_returns) VALUES ('2023-12-18', '01', 'Alpha', 'True', 6, 8);
    INSERT INTO generic_table (dt, out_id, ord_type, identifier1, Non_Recon_Amnt, act_returns) VALUES ('2023-12-18', '01', 'Gamma', 'False', 7, 9);
    
    
    with with_mocked_missing_beta_record as (
      select * 
        from generic_table 
       union all
      select dt,
             out_id,
             'Beta' as ord_type,
             'False' as identifier1,
             0 as Non_Recon_Amnt,
             0 as act_returns
        from generic_table t1
       where not exists
               ( select 1
                   from generic_table t2
                  where t1.dt = t2.dt
                    and t1.out_id = t2.out_id
                    and t2.ord_type = 'Beta'
               )
         and t1.ord_type = 'Alpha'
    )
    select orig.dt,
           orig.out_id,
           orig.ord_type,
           orig.identifier1,
           orig.Non_Recon_Amnt,
           case when orig.ord_type = 'Alpha' then 0
                when orig.ord_type = 'Beta' then orig.act_returns
                                                 + 
                                                 coalesce(alpha_to_beta.act_returns, 0)
                else orig.act_returns
            end as act_returns
      from with_mocked_missing_beta_record orig
      left
      join with_mocked_missing_beta_record alpha_to_beta
        on orig.dt = alpha_to_beta.dt
       and orig.out_id = alpha_to_beta.out_id
       and orig.ord_type = 'Beta'
       and alpha_to_beta.ord_type = 'Alpha'
    order 
        by orig.dt,
           orig.ord_type;
    
    dt out_id ord_type Non_Recon_Amnt (No column name)
    2023-12-16 01 Alpha 1 0
    2023-12-16 01 Beta 2 7
    2023-12-16 01 Gamma 3 5
    2023-12-17 01 Beta 4 6
    2023-12-17 01 Gamma 5 7
    2023-12-18 01 Alpha 6 0
    2023-12-18 01 Beta 0 8
    2023-12-18 01 Gamma 7 9

    Edit: A bit more brief. I’d be surprised if Redshift didn’t support this. Same idea as above but just looks across the window instead of with a self join.

    with with_mocked_missing_beta_record as (
      select * 
        from generic_table 
       union all
      select dt,
             out_id,
             'Beta' as ord_type,
             'False' as identifier1,
             0 as Non_Recon_Amnt,
             0 as act_returns
        from generic_table t1
       where not exists
               ( select 1
                   from generic_table t2
                  where t1.dt = t2.dt
                    and t1.out_id = t2.out_id
                    and t2.ord_type = 'Beta'
               )
         and t1.ord_type = 'Alpha'
    )
    select dt,
           out_id,
           ord_type,
           identifier1,
           Non_Recon_Amnt,
           case when ord_type = 'Alpha' then 0
                when ord_type = 'Beta' then 
                     act_returns + 
                       max(case when ord_type = 'Alpha' then act_returns else 0 end)
                         over ( partition by dt, out_id)
                else act_returns
            end as act_returns
      from with_mocked_missing_beta_record
     order 
        by dt,
           ord_type
    
    Login or Signup to reply.
  2. The following LEFT JOINs on to a mapping table to duplicate Alpha rows as Beta rows, then uses aggregation and CASE expressions to move the values around as necessary.

    CREATE TABLE generic_table (
      dt               DATE,
      out_id           INT,
      ord_type         VARCHAR(8),
      identifier1      BOOLEAN,
      non_recon_amnt   INT,
      act_return       INT
    );
    
    INSERT INTO
      generic_table
    VALUES
      ('2023-12-16', 1, 'Alpha', True,  1, 3),
      ('2023-12-16', 1, 'Beta',  False, 2, 4),
      ('2023-12-16', 1, 'Gamma', False, 3, 5),
      ('2023-12-17', 1, 'Beta',  False, 4, 6),
      ('2023-12-17', 1, 'Gamma', False, 5, 7),
      ('2023-12-18', 1, 'Alpha', True,  6, 8),
      ('2023-12-18', 1, 'Gamma', False, 7, 9)
    ;
    
    SELECT
      g.dt,
      g.out_id,
      COALESCE(map.ord_type,    g.ord_type),
      COALESCE(map.identifier1, g.identifier1),
      SUM(CASE WHEN map.source IS NULL
                 OR map.source  = map.ord_type THEN g.non_recon_amnt ELSE 0 END),
      SUM(CASE WHEN map.source  = map.ord_type THEN 0 ELSE g.act_return     END)
    FROM
      generic_table        AS g
    LEFT JOIN
    (
      SELECT
        'Alpha'   AS source,
        'Alpha'   AS ord_type,
        True      AS identifier1
    
      UNION ALL
    
      SELECT
        'Alpha'   AS source,
        'Beta'    AS ord_type,
        False     AS identifier1
    )
      AS map
        ON map.source = g.ord_type
    GROUP BY
      g.dt,
      g.out_id,
      COALESCE(map.ord_type,    g.ord_type),
      COALESCE(map.identifier1, g.identifier1)
    ORDER BY
      g.dt,
      g.out_id,
      COALESCE(map.ord_type,    g.ord_type),
      COALESCE(map.identifier1, g.identifier1)
    
    
    dt out_id coalesce coalesce sum sum
    2023-12-16 1 Alpha t 1 0
    2023-12-16 1 Beta f 2 7
    2023-12-16 1 Gamma f 3 5
    2023-12-17 1 Beta f 4 6
    2023-12-17 1 Gamma f 5 7
    2023-12-18 1 Alpha t 6 0
    2023-12-18 1 Beta f 0 8
    2023-12-18 1 Gamma f 7 9

    fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search