skip to Main Content

I have an customer_audit table, that logs INSERT and UPDATE happened on a customer table.

customer_audit

id operation timestamp customer_id address1 address2
1 I 2024-10-05 100 link st number 1
2 U 2024-10-06 100 link st number 2
3 U 2024-10-07 100 link road number 2
4 U 2024-10-08 100 link road number 200
5 I 2024-10-06 200 park st number 20
6 U 2024-10-08 200 park st number 200

The expected output, in this case for address1 and address2, displays all historical values for address1 and address2 separately, with from_date and to_date calculated from the timestamp column.

E.g. for customer_id 100:

customer_id column_name column_value from_date to_date
100 address1 link st 2024-10-05 2024-10-07
100 address1 link road 2024-10-07 null (meaning column value is current)
100 address2 number 1 2024-10-05 2024-10-06
100 address2 number 2 2024-10-06 2024-10-08
100 address2 number 200 2024-10-08 null

I managed to solve it by tweaking what mcwolf’s query a bit:

select customer_id, name, 'address1' as attribute_name, address1 as "attribute_value", min(from_date) as "valid_from", max(to_date) as "valid_to"
from (
select customer_id, 
       name, 
       address1,
       "mod_timestamp" from_date,
         COALESCE(lead("mod_timestamp") over change_window, to_date('99991231', 'yyyymmdd')) to_date,
       ROW_NUMBER () OVER ( PARTITION BY customer_id, address1 ORDER BY mod_timestamp) as rownum
from customer_audit
WINDOW change_window as (PARTITION BY customer_id order by mod_timestamp)   
order by customer_id, mod_timestamp ) a
group by customer_id, name, attribute_name, "attribute_value"
order by customer_id, "valid_from";

2

Answers


  1. … here is another way to do it …

    WITH
      dates AS    -- track changes of addresses ( ..._rn = 1 )
        ( Select ca.id, customer_id, "timestamp" as ts,
                 address1, Row_Number() Over(Partition By customer_id, address1 Order By id) addr1_rn,
                 address2, Row_Number() Over(Partition By customer_id, address2 Order By id) addr2_rn
          From   customer_audit ca
        ) 
    --    M a i n    S Q L : 
    SELECT  f.customer_id, 'address1' as column_name, f.address1 as column_value, f.ts as from_date,  
            Lead(ts) Over(Partition By customer_id, 'address1' Order By id) as to_date
    FROM  ( Select  * From  dates Where addr1_rn = 1 ) f
    UNION ALL 
    SELECT  f.customer_id, 'address2', f.address2, f.ts, 
            Lead(ts) Over(Partition By customer_id, 'address2' Order By id) as to_date
    FROM  ( Select  * From  dates Where addr2_rn = 1 ) f 
    ORDER BY  customer_id, column_name, from_date
    
    /*     R e s u l t : 
    customer_id column_name column_value    from_date               to_date
    ----------- ----------- --------------  ----------------------  ----------------------
            100 address1    link st         2024-10-05 00:00:00     2024-10-07 00:00:00
            100 address1    link road       2024-10-07 00:00:00     null
            100 address2    number 1        2024-10-05 00:00:00     2024-10-06 00:00:00
            100 address2    number 2        2024-10-06 00:00:00     2024-10-08 00:00:00
            100 address2    number 200      2024-10-08 00:00:00     null
    --
            200 address1    park st         2024-10-06 00:00:00     null
            200 address2    number 20       2024-10-06 00:00:00     2024-10-08 00:00:00
            200 address2    number 200      2024-10-08 00:00:00     null                    */
    

    See the fiddle here.

    Login or Signup to reply.
  2. Your appended answer does not seem to work for me. This does:

    SELECT customer_id, 'address1' AS col_name, address1 AS col_value
         , min(timestamp) AS valid_from
         , max(to_ts) AS valid_to
    FROM  (
       SELECT *, count(step OR null) OVER (PARTITION BY customer_id ORDER BY timestamp) AS grp
       FROM  (
          SELECT customer_id, address1, timestamp
               , lag(address1) OVER (PARTITION BY customer_id ORDER BY timestamp) <> address1 AS step
               , lead(timestamp, 1, '9999-12-31') OVER (PARTITION BY customer_id ORDER BY timestamp) AS to_ts
          FROM   customer_audit
          ) sub1
       ) sub2
    GROUP BY customer_id, grp, address1
    ORDER BY customer_id, grp;
    

    fiddle

    Repeat the same for address2 and append results. Both operations are mostly independent as they form different groups.

    Related (with links to more):

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search