I have a table that has users’ activiies on their membership (activate or deactivate) and when those activities happened. The activity table look like this:
user_id | activity | date |
---|---|---|
123 | activate | 06/01/2024 |
123 | deactivate | 06/15/2024 |
123 | activate | 06/20/2024 |
123 | deactivate | 06/30/2024 |
456 | activate | 06/25/2024 |
123 | deactivate | 07/08/2024 |
123 | activate | 07/10/2024 |
There is another table called dim_date that basically has one row for each date until today like this:
date |
---|
06/01/2024 |
06/02/2024 |
06/03/2024 |
… |
07/21/2024 |
What is the optimal way to write a SQL to return a daily status table that has one row for each user per day showing whether their membership status is active or inactive? The output table should look like:
user_id | date | membership_status |
---|---|---|
123 | 06/01/2024 | active |
123 | 06/02/2024 | active |
123 | … | active |
123 | 06/14/2024 | active |
123 | 06/15/2024 | inactive |
123 | 06/16/2024 | inactive |
123 | … | inactive |
123 | 06/19/2024 | inactive |
123 | 06/20/2024 | active |
123 | 06/21/2024 | active |
123 | … | active |
123 | 06/29/2024 | active |
123 | 06/30/2024 | inactive |
123 | 07/01/2024 | inactive |
123 | … | inactive |
123 | 07/21/2024 | inactive |
Currently, what I have is:
with cte as (
select
a.user_id
,a.activity
,a.date as activity_date
,dd.date
,row_number() over (partition by a.user_id, dd.date order by a.date desc) as rn
from activity a
left join dim_date dd on a.date <= dd.date
)
select
user_id
,date
,case when a.activity = "activate" then "active" else "inactive" end as membership_status
from cte
where rn = 1
Was wondering if anyone could provide something simpler? Possibly without cte/subquery?
2
Answers
Maybe not really simpler but I hope clear and easy to read using a scalar subquery for
membership_status
.DB Fiddle demo
"Simpler" is subjective. The main concern with your original query isn’t the complexity, it’s not terribly complex. But it will have a large intermediate result set with a lot of redundancy for row_number() to reckon with only having the
a.date <= dd.date
condition.Explicitly deriving an effective (start/end) range with
lead()
and joining on where the date falls within that range will probably be more performant.Fiddle: https://dbfiddle.uk/8WcyWwOa
Note:
coalesce()
to ‘9999-12-31’ is a bit of syntactic sugar just to force the dimension date to fall below the derived end of the user’s last status.