skip to Main Content

I got this SCD table:

start_date end_date partition
2022-03-08 15:35:09.856 2022-03-09 14:57:36.610 1
2022-03-09 14:57:36.610 2022-05-18 13:26:31.195 2
2022-05-18 13:26:31.195 2022-08-02 10:12:02.441 2
2022-08-02 10:12:02.441 2022-09-01 11:10:01.019 2
2022-09-01 11:10:01.019 2022-09-01 11:10:20.777 1
2022-09-01 11:10:20.777 2022-09-01 11:21:26.526 1

I would like to know for each partition the last value of start_date and end_date of the other partition (there are only two). for the given table:

start_date end_date partition max_start_date max_end_date
2022-03-08 15:35:09.856 2022-03-09 14:57:36.610 1 null null
2022-03-09 14:57:36.610 2022-05-18 13:26:31.195 2 2022-03-08 15:35:09.856 2022-03-09 14:57:36.610
2022-05-18 13:26:31.195 2022-08-02 10:12:02.441 2 2022-03-08 15:35:09.856 2022-03-09 14:57:36.610
2022-08-02 10:12:02.441 2022-09-01 11:10:01.019 2 2022-03-08 15:35:09.856 2022-03-09 14:57:36.610
2022-09-01 11:10:01.019 2022-09-01 11:10:20.777 1 2022-08-02 10:12:02.441 2022-09-01 11:10:01.019
2022-09-01 11:10:20.777 2022-09-01 11:21:26.526 1 2022-08-02 10:12:02.441 2022-09-01 11:10:01.019

I tried some last_value window function and didn’t made it. like this:
, last_value (start_date) OVER (partition by partition = ‘1’ order by start_date asc) as last_start_date_partition
, last_value (end_date) OVER (partition by partition = ‘1’ order by end_date asc) as last_end_date_partition
is there any option to inject a condition to window function and make it function that way?

3

Answers


  1. Using dense_rank:

    with cte as (
       select (select sum((s1.start_date < s.start_date and s1.partition != s.partition)::int) 
         from scd s1) r, s.* 
       from scd s
    ),
    n_part as (
       select dense_rank() over (order by c.r) dr, c.* from cte c
    )
    select np.start_date, np.end_date, np.partition, max(np1.start_date), max(np1.end_date) 
    from n_part np left join n_part np1 on np1.dr = np.dr - 1
    group by np.start_date, np.end_date, np.partition
    order by np.start_date, np.end_date
    

    See fiddle.

    Login or Signup to reply.
  2. Using windows functions and gaps-and-islandish approach:

    SELECT start_date,
           end_date,
           PARTITION,
           max(start_date) OVER (ORDER BY grp RANGE UNBOUNDED PRECEDING EXCLUDE GROUP) max_start_date, -- use max value without current group
           max(end_date) OVER (ORDER BY grp RANGE UNBOUNDED PRECEDING EXCLUDE GROUP) max_end_date -- use max value without current group
    FROM
      (SELECT start_date,
              end_date,
              PARTITION,
              sum(lag) OVER (ORDER BY end_date) AS grp -- use cumulative sum to create a group
       FROM
         (SELECT *,
                 CASE
                     WHEN lag(PARTITION) OVER (ORDER BY end_date) != PARTITION THEN 1
                     ELSE 0
                 END lag -- use lag to determine if the partition has changed
          FROM mytable) t) tt 
    

    Fiddle

    Login or Signup to reply.
  3. You could do a self-left join and aggregate as the following:

    select T.start_date, T.end_date, T.partition_,
           max(D.start_date) max_start_date,
           max(D.end_date) max_end_date
    from SCD T left join SCD D
    on T.start_date > D.start_date and 
       T.partition_ <> D.partition_
    group by T.start_date, T.end_date, T.partition_
    order by T.start_date
    

    See demo

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search