skip to Main Content

I have a table with data on activated subscriptions in PosgreSQL database. The table has the following columns: User ID, Start Date of Subscription, End Date of Subscription, and Type of Subscription.

If a subscription is currently active, then the End Date of Subscription is set to today’s date.

There are two independent types of subscriptions: 1. Premium, 2. Books. They are activated independently. For example, the data might look like this:

User ID Start Date of Subscription End Date of Subscription Type of Subscription
675 2023-01-01 2023-05-10 Premium
675 2023-02-15 2023-02-28 Books
675 2023-04-18 2023-06-18 Books
726 2023-01-01 2023-10-10 Premium
726 2023-03-16 2023-05-28 Books
855 2023-04-05 2023-05-28 Books
855 2023-04-20 2023-07-25 Premium

What I need is to adjust the periods of the Premium subscription if another subscription (Books) is activated during its term. In other words, the Premium subscription should be split into several periods before and after the Book subscription activation. The desired output would look like this:

User ID Start Date of Subscription End Date of Subscription Type of Subscription
675 2023-01-01 2023-02-15 Premium
675 2023-02-15 2023-02-28 Books
675 2023-02-28 2023-04-18 Premium
675 2023-04-18 2023-06-18 Books
726 2023-01-01 2023-03-16 Premium
726 2023-03-16 2023-05-28 Books
726 2023-05-28 2023-10-10 Premium
855 2023-04-05 2023-05-28 Books
855 2023-05-28 2023-07-25 Premium

Made smth like this, but it splits the first subscription named premium into period before the next sub "books". But I don’t understand how to continue sub "premium" after the "books" finished.

WITH ordered_subscriptions AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY start_date) as rn
    FROM subscriptions
),
date_ranges AS (
    SELECT 
        a.user_id, 
        a.start_date, 
        MIN(b.start_date) as end_date,
        a.subscription_type
    FROM ordered_subscriptions a
    LEFT JOIN ordered_subscriptions b ON a.user_id = b.user_id AND a.rn < b.rn
    GROUP BY a.user_id, a.start_date, a.subscription_type
),
filtered_subscriptions AS (
    SELECT 
        user_id, 
        start_date, 
        COALESCE(end_date, CURRENT_DATE) as end_date, 
        subscription_type
    FROM date_ranges
    WHERE subscription_type = 'Premium' AND NOT EXISTS (
        SELECT 1 
        FROM date_ranges d2 
        WHERE d2.user_id = date_ranges.user_id 
        AND d2.subscription_type = 'Books' 
        AND d2.start_date < date_ranges.end_date 
        AND (d2.end_date IS NULL OR d2.end_date > date_ranges.start_date)
    )
)

SELECT * FROM filtered_subscriptions
UNION ALL
SELECT user_id, start_date, COALESCE(end_date, CURRENT_DATE), subscription_type 
FROM date_ranges 
WHERE subscription_type = 'Books'
ORDER BY user_id, start_date;

2

Answers


  1. I’ll try to do a better write-up later and possibly even clean up the query a bit.

    The presumption is that you want to keep all the Books subscriptions. Where there’s a Premium subscription that overlaps in some way then the period that overlaps needs to be absorbed into the other subscription while the non-overlaps (possibly two) are adjusted/created.

    There are four different ways that two such subscriptions can intersect. There’s also the possibility that multiple Books can fall under a single Premium. The first part of the query identifies whether there is overlap, the type of overlap, and gathers data related to forward and backward "links".

    The second portion generates a set of four dates corresponding to boundaries between different segments of the overlap. It also handles creating an extra row where a Premium must split into three parts (one remains with a Books) and where a Premium is fully subsumed by a Books and needs to disappear completely.

    The rest of the logic just drives the assembly of all the parts. Care is taken to not duplicate segments where a "linkage" happens.

    There’s no recursion and only a single join so I think this will be efficient. Appropriate indexes on columns user_id, start_date, end_date would probably be beneficial.

    with data as (
        select
            s.user_id, s.subscription_type, s.start_date, s.end_date,
            b.start_date as b_start_date, b.end_date as b_end_date,
            case when b.subscription_type is not null
                 then lag(b.end_date)
                          over (partition by s.user_id, s.start_date order by b.start_date) end as pb_end_date,
            case when b.subscription_type is not null
                 then lead(b.start_date)
                          over (partition by s.user_id, s.start_date order by b.start_date) end as nb_start_date,
            case when s.start_date < b.start_date then 'p'
                 when s.start_date > b.start_date then 'b' else '' end ||
            case when s.end_date   < b.end_date   then 'b'
                 when s.end_date   > b.end_date   then 'p' else '' end as overlap
        from ordered_subscriptions s left outer join ordered_subscriptions as b
            on      s.subscription_type = 'Premium' and b.subscription_type = 'Books'
                and s.user_id = b.user_id
                and s.start_date < b.end_date and s.end_date > b.start_date
    ), data2 as (
        select
            user_id, subscription_type, overlap,
            case when pb_end_date is not null then 'Y' else 'N' end as linked,
            least(start_date, b_start_date) as d1,
            greatest(start_date, b_start_date) as d2,
            least(end_date, b_end_date) as d3,
            coalesce(nb_start_date, greatest(end_date, b_end_date)) as d4,
            seg_num
        from data left outer join lateral (
                select * from (values (1), (2)) as v(seg_num)
                where subscription_type = 'Premium' and (seg_num = 1 or overlap = 'pp')
            ) as s(seg_num) on 1 = 1
        where (subscription_type = 'Books' or overlap <> 'bb')
    )
    select
        user_id, subscription_type,
        case when seg_num is null then d1
             when seg_num = 1 then case when overlap = 'bp' then d3 else d1 end
             when seg_num = 2 then d3 end as start_date,
        case when seg_num is null then d3
             when seg_num = 1 then case when overlap = 'bp' then d4 else d2 end
             when seg_num = 2 then d4 end as end_date
    from data2
    where (linked = 'N' or seg_num = 2)
    order by user_id, start_date, end_date;
    

    https://dbfiddle.uk/chT-SLiC

    I had originally thought that using an outer join would be cleaner and generate a better plan than going with a union. Potentially, it is in fact better to go this other route even though the two queries are mostly the same.

    https://dbfiddle.uk/raMy-wOL

    Login or Signup to reply.
  2. See example.
    There periods with subscription_type Premium joined with rows Books (not Premium, more accurately). Join conditions on scheme

    Premium period     =====++++========++++++++====
    Books period 1  --------|  |        |      |
    Books period 2             |        |      |---------
    Books period 1 and 2       |--------|
    

    To reduce calculations, we also immediately take the end of the previous Books period and the beginning of the next one.
    Case when Books period completely overlaps Premium period, conditions 1 and 2 will not be satisfied and this row is excluded from the output (inner join).
    For Joint sequences, like 2023-12-01->2023-12-04 <->2023-12-04->2023-12-16 need to clarify the conditions.

    with tB as(
    select *
       ,lag(end_date)over(partition by user_id order by start_date) prev_end
       ,lead(start_date)over(partition by user_id order by start_date) next_start
    from subscriptions s
    where s.subscription_type<>'Premium'
    )
    select user_id,start_daten start_date,end_daten end_date,subscription_type
      --,start_dateorig,end_dateorig,start_dateb,end_dateb
    from(
    select s.user_id,s.subscription_type
      ,s.start_date as start_dateOrig,s.end_date as end_dateOrig
      ,tb.start_date as start_dateB,tb.end_date as end_dateB
      ,case when n=1 then tb.end_date
            when n=2 then 
             case when tb.prev_end is null or tb.prev_end<s.start_date then s.start_date
             else tb.prev_end
             end
       end start_dateN
      ,case when n=2 then tb.start_date
            when n=1 then 
             case when tb.next_start is null then s.end_date --? current_date 
                  when tb.next_start>s.end_date then s.end_date
             else tb.next_start
             end
       end end_dateN
    from subscriptions s
    left join tb on tB.user_id=s.user_id 
       and ( tb.start_date between s.start_date and s.end_date
           or tb.end_date between s.start_date and s.end_date) 
    inner join (select 1 n union all select 2 ) nn 
      on   (n=1 and tb.end_date between s.start_date and s.end_date )
        or (n=2 and tb.start_date between s.start_date and s.end_date 
              and (tb.prev_end is null or tb.prev_end<s.start_date) )
    where s.subscription_type='Premium'
    ) x
    union all
    select user_id,start_date,end_date
      ,subscription_type
    from tB s
    where s.subscription_type<>'Premium'
    order by user_id,start_date
    

    Fiddle here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search