Postgresql - Writing query. I need to split the period of the subscription into periods before and after the another subscription, which contained in the first one

anonymous
December 17, 2023
250 views
1 vote
2 Answers

I have a table with data on activated subscriptions in PosgreSQL database. The table has the following columns: User ID, Start Date of Subscription, End Date of Subscription, and Type of Subscription.

If a subscription is currently active, then the End Date of Subscription is set to today’s date.

There are two independent types of subscriptions: 1. Premium, 2. Books. They are activated independently. For example, the data might look like this:

User ID	Start Date of Subscription	End Date of Subscription	Type of Subscription
675	2023-01-01	2023-05-10	Premium
675	2023-02-15	2023-02-28	Books
675	2023-04-18	2023-06-18	Books
726	2023-01-01	2023-10-10	Premium
726	2023-03-16	2023-05-28	Books
855	2023-04-05	2023-05-28	Books
855	2023-04-20	2023-07-25	Premium

What I need is to adjust the periods of the Premium subscription if another subscription (Books) is activated during its term. In other words, the Premium subscription should be split into several periods before and after the Book subscription activation. The desired output would look like this:

User ID	Start Date of Subscription	End Date of Subscription	Type of Subscription
675	2023-01-01	2023-02-15	Premium
675	2023-02-15	2023-02-28	Books
675	2023-02-28	2023-04-18	Premium
675	2023-04-18	2023-06-18	Books
726	2023-01-01	2023-03-16	Premium
726	2023-03-16	2023-05-28	Books
726	2023-05-28	2023-10-10	Premium
855	2023-04-05	2023-05-28	Books
855	2023-05-28	2023-07-25	Premium

Made smth like this, but it splits the first subscription named premium into period before the next sub "books". But I don’t understand how to continue sub "premium" after the "books" finished.

WITH ordered_subscriptions AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY start_date) as rn
    FROM subscriptions
),
date_ranges AS (
    SELECT 
        a.user_id, 
        a.start_date, 
        MIN(b.start_date) as end_date,
        a.subscription_type
    FROM ordered_subscriptions a
    LEFT JOIN ordered_subscriptions b ON a.user_id = b.user_id AND a.rn < b.rn
    GROUP BY a.user_id, a.start_date, a.subscription_type
),
filtered_subscriptions AS (
    SELECT 
        user_id, 
        start_date, 
        COALESCE(end_date, CURRENT_DATE) as end_date, 
        subscription_type
    FROM date_ranges
    WHERE subscription_type = 'Premium' AND NOT EXISTS (
        SELECT 1 
        FROM date_ranges d2 
        WHERE d2.user_id = date_ranges.user_id 
        AND d2.subscription_type = 'Books' 
        AND d2.start_date < date_ranges.end_date 
        AND (d2.end_date IS NULL OR d2.end_date > date_ranges.start_date)
    )
)

SELECT * FROM filtered_subscriptions
UNION ALL
SELECT user_id, start_date, COALESCE(end_date, CURRENT_DATE), subscription_type 
FROM date_ranges 
WHERE subscription_type = 'Books'
ORDER BY user_id, start_date;

Tags: postgresql sql

Answers

I’ll try to do a better write-up later and possibly even clean up the query a bit.

The presumption is that you want to keep all the Books subscriptions. Where there’s a Premium subscription that overlaps in some way then the period that overlaps needs to be absorbed into the other subscription while the non-overlaps (possibly two) are adjusted/created.

There are four different ways that two such subscriptions can intersect. There’s also the possibility that multiple Books can fall under a single Premium. The first part of the query identifies whether there is overlap, the type of overlap, and gathers data related to forward and backward "links".

The second portion generates a set of four dates corresponding to boundaries between different segments of the overlap. It also handles creating an extra row where a Premium must split into three parts (one remains with a Books) and where a Premium is fully subsumed by a Books and needs to disappear completely.

The rest of the logic just drives the assembly of all the parts. Care is taken to not duplicate segments where a "linkage" happens.

There’s no recursion and only a single join so I think this will be efficient. Appropriate indexes on columns user_id, start_date, end_date would probably be beneficial.

with data as (
    select
        s.user_id, s.subscription_type, s.start_date, s.end_date,
        b.start_date as b_start_date, b.end_date as b_end_date,
        case when b.subscription_type is not null
             then lag(b.end_date)
                      over (partition by s.user_id, s.start_date order by b.start_date) end as pb_end_date,
        case when b.subscription_type is not null
             then lead(b.start_date)
                      over (partition by s.user_id, s.start_date order by b.start_date) end as nb_start_date,
        case when s.start_date < b.start_date then 'p'
             when s.start_date > b.start_date then 'b' else '' end ||
        case when s.end_date   < b.end_date   then 'b'
             when s.end_date   > b.end_date   then 'p' else '' end as overlap
    from ordered_subscriptions s left outer join ordered_subscriptions as b
        on      s.subscription_type = 'Premium' and b.subscription_type = 'Books'
            and s.user_id = b.user_id
            and s.start_date < b.end_date and s.end_date > b.start_date
), data2 as (
    select
        user_id, subscription_type, overlap,
        case when pb_end_date is not null then 'Y' else 'N' end as linked,
        least(start_date, b_start_date) as d1,
        greatest(start_date, b_start_date) as d2,
        least(end_date, b_end_date) as d3,
        coalesce(nb_start_date, greatest(end_date, b_end_date)) as d4,
        seg_num
    from data left outer join lateral (
            select * from (values (1), (2)) as v(seg_num)
            where subscription_type = 'Premium' and (seg_num = 1 or overlap = 'pp')
        ) as s(seg_num) on 1 = 1
    where (subscription_type = 'Books' or overlap <> 'bb')
)
select
    user_id, subscription_type,
    case when seg_num is null then d1
         when seg_num = 1 then case when overlap = 'bp' then d3 else d1 end
         when seg_num = 2 then d3 end as start_date,
    case when seg_num is null then d3
         when seg_num = 1 then case when overlap = 'bp' then d4 else d2 end
         when seg_num = 2 then d4 end as end_date
from data2
where (linked = 'N' or seg_num = 2)
order by user_id, start_date, end_date;

https://dbfiddle.uk/chT-SLiC

I had originally thought that using an outer join would be cleaner and generate a better plan than going with a union. Potentially, it is in fact better to go this other route even though the two queries are mostly the same.

https://dbfiddle.uk/raMy-wOL

See example.
There periods with subscription_type Premium joined with rows Books (not Premium, more accurately). Join conditions on scheme

Premium period     =====++++========++++++++====
Books period 1  --------|  |        |      |
Books period 2             |        |      |---------
Books period 1 and 2       |--------|

To reduce calculations, we also immediately take the end of the previous Books period and the beginning of the next one.
Case when Books period completely overlaps Premium period, conditions 1 and 2 will not be satisfied and this row is excluded from the output (inner join).
For Joint sequences, like 2023-12-01->2023-12-04 <->2023-12-04->2023-12-16 need to clarify the conditions.

with tB as(
select *
   ,lag(end_date)over(partition by user_id order by start_date) prev_end
   ,lead(start_date)over(partition by user_id order by start_date) next_start
from subscriptions s
where s.subscription_type<>'Premium'
)
select user_id,start_daten start_date,end_daten end_date,subscription_type
  --,start_dateorig,end_dateorig,start_dateb,end_dateb
from(
select s.user_id,s.subscription_type
  ,s.start_date as start_dateOrig,s.end_date as end_dateOrig
  ,tb.start_date as start_dateB,tb.end_date as end_dateB
  ,case when n=1 then tb.end_date
        when n=2 then 
         case when tb.prev_end is null or tb.prev_end<s.start_date then s.start_date
         else tb.prev_end
         end
   end start_dateN
  ,case when n=2 then tb.start_date
        when n=1 then 
         case when tb.next_start is null then s.end_date --? current_date 
              when tb.next_start>s.end_date then s.end_date
         else tb.next_start
         end
   end end_dateN
from subscriptions s
left join tb on tB.user_id=s.user_id 
   and ( tb.start_date between s.start_date and s.end_date
       or tb.end_date between s.start_date and s.end_date) 
inner join (select 1 n union all select 2 ) nn 
  on   (n=1 and tb.end_date between s.start_date and s.end_date )
    or (n=2 and tb.start_date between s.start_date and s.end_date 
          and (tb.prev_end is null or tb.prev_end<s.start_date) )
where s.subscription_type='Premium'
) x
union all
select user_id,start_date,end_date
  ,subscription_type
from tB s
where s.subscription_type<>'Premium'
order by user_id,start_date

Fiddle here

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Writing query. I need to split the period of the subscription into periods before and after the another subscription, which contained in the first one

Answers