I am working on a transaction dataset with one date column and one identifier column. I want to retroactively create a boolean column indicating whether a particular activity (e.g. a marketing campaign) should have been sent for that identifier, on the condition that the last campaign was sent more than 30 days earlier. I can use the lag function, but the problem is that the starting point keeps changing.
For example, a campaign would be sent on the first transaction date, and then no campaign for any transaction within the next 30 days. If the next transaction then happens, say, on the 35th day after the first transaction, a campaign is sent and the new 30-day counter starts from that 35th day.
I haven't been able to figure out how to achieve this. I am using Redshift SQL (table for reference below).
Identifier | Date (YYYY-MM-DD time) | desired column |
---|---|---|
A | 2024-06-30 11:22:03 | FALSE |
A | 2024-06-16 14:02:36 | FALSE |
A | 2024-06-15 8:15:57 | TRUE |
A | 2024-05-24 14:30:57 | FALSE |
A | 2024-05-10 4:45:06 | FALSE |
A | 2024-05-08 15:19:48 | FALSE |
A | 2024-05-07 15:09:14 | FALSE |
A | 2024-05-06 4:16:39 | FALSE |
A | 2024-05-04 10:37:16 | TRUE |
A | 2024-04-08 5:02:00 | FALSE |
A | 2024-04-06 9:03:23 | FALSE |
A | 2024-03-30 11:05:55 | TRUE |
A | 2024-03-16 8:39:56 | FALSE |
A | 2024-03-15 14:06:10 | FALSE |
A | 2024-02-28 16:55:28 | TRUE |
Thanks for the help!
3 Answers
I tried kind of a hacky solution for now which worked -
You are looking for an iteration. The first date is 2024-02-28. From this date you look for the first date that falls after its 30-day range. That date is 2024-03-30. Then you again look for the first date after that date's 30-day range, and so on.
Iteration is done with recursive queries in SQL.
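To make the iteration concrete before translating it to SQL, here is a minimal procedural sketch in Python (not Redshift). It assumes "after its 30-day range" means strictly more than 30 days after the last campaign; the function name `flag_campaigns` is hypothetical, and the sample timestamps are the ones from the question's table.

```python
from datetime import datetime, timedelta

def flag_campaigns(timestamps, window=timedelta(days=30)):
    """Map each timestamp to True if a campaign would be sent on it.

    Assumption: a campaign is sent when strictly more than `window`
    has passed since the last campaign (or when none was sent yet).
    """
    flags = {}
    last_sent = None
    for ts in sorted(timestamps):
        send = last_sent is None or ts - last_sent > window
        if send:
            last_sent = ts  # the 30-day counter restarts here
        flags[ts] = send
    return flags

rows = [
    "2024-02-28 16:55:28", "2024-03-15 14:06:10", "2024-03-16 08:39:56",
    "2024-03-30 11:05:55", "2024-04-06 09:03:23", "2024-04-08 05:02:00",
    "2024-05-04 10:37:16", "2024-05-06 04:16:39", "2024-05-07 15:09:14",
    "2024-05-08 15:19:48", "2024-05-10 04:45:06", "2024-05-24 14:30:57",
    "2024-06-15 08:15:57", "2024-06-16 14:02:36", "2024-06-30 11:22:03",
]
parsed = [datetime.strptime(r, "%Y-%m-%d %H:%M:%S") for r in rows]
flags = flag_campaigns(parsed)
sent = [ts.strftime("%Y-%m-%d") for ts, flag in flags.items() if flag]
print(sent)  # ['2024-02-28', '2024-03-30', '2024-05-04', '2024-06-15']
```

The four dates printed are exactly the rows marked TRUE in the question, which is a useful check for whatever SQL version you end up with.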
If Redshift cannot yet handle aggregation in recursive CTEs, as you say in the comments below, then use another way to get the top 1 row per group. Getting the top 1 means that no better row exists, so you may be able to solve this with NOT EXISTS.
The part
can also be written as
for instance.
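As an illustration only, the recursive-query idea with NOT EXISTS in place of aggregation might look like the sketch below. It is not the query from this answer and it is not Redshift syntax: it runs against SQLite through Python's `sqlite3` module so that it can be executed anywhere, and the table name `tx`, the CTE name `sent`, and the column `send_campaign` are hypothetical. SQLite's `datetime(ts, '+30 days')` would need to be replaced by Redshift's own date arithmetic (for example `DATEADD`). It also assumes a strict "more than 30 days" rule and unique timestamps per identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (identifier TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO tx VALUES ('A', ?)",
    [(t,) for t in [
        "2024-02-28 16:55:28", "2024-03-15 14:06:10", "2024-03-30 11:05:55",
        "2024-04-08 05:02:00", "2024-05-04 10:37:16", "2024-05-24 14:30:57",
        "2024-06-15 08:15:57", "2024-06-30 11:22:03",
    ]],
)

query = """
WITH RECURSIVE sent(identifier, ts) AS (
    -- anchor: the first transaction per identifier always gets a campaign
    SELECT identifier, MIN(ts) FROM tx GROUP BY identifier
    UNION ALL
    -- step: the earliest transaction after the previous campaign's window;
    -- "earliest" is expressed with NOT EXISTS instead of MIN()
    SELECT t.identifier, t.ts
    FROM sent AS s
    JOIN tx AS t
      ON t.identifier = s.identifier
     AND t.ts > datetime(s.ts, '+30 days')
    WHERE NOT EXISTS (
        SELECT 1
        FROM tx AS t2
        WHERE t2.identifier = s.identifier
          AND t2.ts > datetime(s.ts, '+30 days')
          AND t2.ts < t.ts
    )
)
SELECT tx.identifier, tx.ts, sent.ts IS NOT NULL AS send_campaign
FROM tx
LEFT JOIN sent
  ON sent.identifier = tx.identifier AND sent.ts = tx.ts
ORDER BY tx.identifier, tx.ts
"""
result = conn.execute(query).fetchall()
campaign_dates = [ts[:10] for _, ts, flag in result if flag]
print(campaign_dates)
```

The timestamps are stored as zero-padded ISO strings so that plain string comparison orders them correctly in SQLite; with a real timestamp column in Redshift that workaround is unnecessary.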
One option is to find reference dates. You could try combining the dates and the differences in days between them with a few CTEs…
… remove the time part and combine the dates (practically a cross join) to get all day differences between them …
… filter the result set …
… get the reference dates using UNION (not UNION ALL) to get rid of duplicates …
… the reference dates are left joined to your data, defining the values 'TRUE'/'FALSE' for the new column using CASE expressions …
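The queries for these steps were not preserved here, so the following is only a rough Python sketch of the first step (dropping the time part and pairing the dates to get day differences). The filtering and UNION steps are left out because their exact conditions are not given above. The variable names are hypothetical, and the dates are the first four from the question's table.

```python
from datetime import date
from itertools import combinations

# Dates with the time part removed
dates = [date(2024, 2, 28), date(2024, 3, 15), date(2024, 3, 16), date(2024, 3, 30)]

# Every (earlier, later) pair with the difference in days,
# roughly what a self cross join filtered to earlier < later would give
pairs = [(a, b, (b - a).days) for a, b in combinations(sorted(dates), 2)]

for earlier, later, diff in pairs:
    print(earlier, later, diff)
```

In this sample the only pair more than 30 days apart is 2024-02-28 to 2024-03-30 (31 days), which matches the first two reference dates in the question.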
See the fiddle here.