I have a table with data on activated subscriptions in PosgreSQL database. The table has the following columns: User ID, Start Date of Subscription, End Date of Subscription, and Type of Subscription.
If a subscription is currently active, then the End Date of Subscription is set to today’s date.
There are two independent types of subscriptions: 1. Premium, 2. Books. They are activated independently. For example, the data might look like this:
User ID | Start Date of Subscription | End Date of Subscription | Type of Subscription |
---|---|---|---|
675 | 2023-01-01 | 2023-05-10 | Premium |
675 | 2023-02-15 | 2023-02-28 | Books |
675 | 2023-04-18 | 2023-06-18 | Books |
726 | 2023-01-01 | 2023-10-10 | Premium |
726 | 2023-03-16 | 2023-05-28 | Books |
855 | 2023-04-05 | 2023-05-28 | Books |
855 | 2023-04-20 | 2023-07-25 | Premium |
What I need is to adjust the periods of the Premium subscription if another subscription (Books) is activated during its term. In other words, the Premium subscription should be split into several periods before and after the Book subscription activation. The desired output would look like this:
User ID | Start Date of Subscription | End Date of Subscription | Type of Subscription |
---|---|---|---|
675 | 2023-01-01 | 2023-02-15 | Premium |
675 | 2023-02-15 | 2023-02-28 | Books |
675 | 2023-02-28 | 2023-04-18 | Premium |
675 | 2023-04-18 | 2023-06-18 | Books |
726 | 2023-01-01 | 2023-03-16 | Premium |
726 | 2023-03-16 | 2023-05-28 | Books |
726 | 2023-05-28 | 2023-10-10 | Premium |
855 | 2023-04-05 | 2023-05-28 | Books |
855 | 2023-05-28 | 2023-07-25 | Premium |
Made smth like this, but it splits the first subscription named premium into period before the next sub "books". But I don’t understand how to continue sub "premium" after the "books" finished.
WITH ordered_subscriptions AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY start_date) as rn
FROM subscriptions
),
date_ranges AS (
SELECT
a.user_id,
a.start_date,
MIN(b.start_date) as end_date,
a.subscription_type
FROM ordered_subscriptions a
LEFT JOIN ordered_subscriptions b ON a.user_id = b.user_id AND a.rn < b.rn
GROUP BY a.user_id, a.start_date, a.subscription_type
),
filtered_subscriptions AS (
SELECT
user_id,
start_date,
COALESCE(end_date, CURRENT_DATE) as end_date,
subscription_type
FROM date_ranges
WHERE subscription_type = 'Premium' AND NOT EXISTS (
SELECT 1
FROM date_ranges d2
WHERE d2.user_id = date_ranges.user_id
AND d2.subscription_type = 'Books'
AND d2.start_date < date_ranges.end_date
AND (d2.end_date IS NULL OR d2.end_date > date_ranges.start_date)
)
)
SELECT * FROM filtered_subscriptions
UNION ALL
SELECT user_id, start_date, COALESCE(end_date, CURRENT_DATE), subscription_type
FROM date_ranges
WHERE subscription_type = 'Books'
ORDER BY user_id, start_date;
2
Answers
I’ll try to do a better write-up later and possibly even clean up the query a bit.
The presumption is that you want to keep all the Books subscriptions. Where there’s a Premium subscription that overlaps in some way then the period that overlaps needs to be absorbed into the other subscription while the non-overlaps (possibly two) are adjusted/created.
There are four different ways that two such subscriptions can intersect. There’s also the possibility that multiple Books can fall under a single Premium. The first part of the query identifies whether there is overlap, the type of overlap, and gathers data related to forward and backward "links".
The second portion generates a set of four dates corresponding to boundaries between different segments of the overlap. It also handles creating an extra row where a Premium must split into three parts (one remains with a Books) and where a Premium is fully subsumed by a Books and needs to disappear completely.
The rest of the logic just drives the assembly of all the parts. Care is taken to not duplicate segments where a "linkage" happens.
There’s no recursion and only a single join so I think this will be efficient. Appropriate indexes on columns
user_id, start_date, end_date
would probably be beneficial.https://dbfiddle.uk/chT-SLiC
I had originally thought that using an outer join would be cleaner and generate a better plan than going with a union. Potentially, it is in fact better to go this other route even though the two queries are mostly the same.
https://dbfiddle.uk/raMy-wOL
See example.
There periods with subscription_type
Premium
joined with rowsBooks
(not Premium, more accurately). Join conditions on schemeTo reduce calculations, we also immediately take the end of the previous
Books
period and the beginning of the next one.Case when Books period completely overlaps Premium period, conditions 1 and 2 will not be satisfied and this row is excluded from the output (inner join).
For Joint sequences, like 2023-12-01->2023-12-04 <->2023-12-04->2023-12-16 need to clarify the conditions.
Fiddle here