We have 4 tables, to study the email performance of a campaign:
- Table A contains the campaign name information
- Table B contains the campaign id delivery performance (who the campaign was sent out to)
- Table C contains the campaign id open performance (who opened the email)
- Table D contains the campaign’s click performance (who clicked on any elements in the email)
Sample of what the table structures look like (note: assume this is a snapshot and not the full dataset):
-- Table A: Campaign Information
CREATE TABLE TableA (
campaign_name VARCHAR(255),
campaign_id INT
);
-- Table B: Email Delivery Information
CREATE TABLE TableB (
campaign_id INT,
delivery_date DATE,
user_id VARCHAR(50)
);
-- Table C: Email Open Information
CREATE TABLE TableC (
campaign_id INT,
open_date DATE,
user_id VARCHAR(50)
);
-- Table D: Email Click Information
CREATE TABLE TableD (
campaign_id INT,
click_date DATE,
user_id VARCHAR(50)
);
-- Inserts for TableA (Campaign Information)
INSERT INTO TableA (campaign_name, campaign_id) VALUES
('Instacash Promo 1', 112233),
('Instacash Promo 2', 112244),
('RoarMoney Balance 5', 112259);
-- Inserts for TableB (Email Delivery Information)
INSERT INTO TableB (campaign_id, delivery_date, user_id) VALUES
(112233, '2021-01-01', 'a'),
(112233, '2021-01-01', 'b'),
(112233, '2021-01-01', 'c'),
(112244, '2021-01-05', 'd'),
(112244, '2021-01-05', 'e');
-- Inserts for TableC (Email Opened Information)
INSERT INTO TableC (campaign_id, open_date, user_id) VALUES
(112233, '2021-01-03', 'a'),
(112233, '2021-01-05', 'b'),
(112244, '2021-01-07', 'd'),
(112244, '2021-01-10', 'e');
-- Inserts for TableD (Email Link Clicked Information)
INSERT INTO TableD (campaign_id, click_date, user_id) VALUES
(112233, '2021-01-03', 'a'),
(112244, '2021-01-11', 'e');
I need to write a SQL to query the average time between the 2nd and 3rd email for each individual product on an overall level? The output should consists of:
campaign_category | Average time taken to 2nd to 3rd email |
---|---|
campaign_name should be classified as campaign_category of Instacash, RoarMoney or any campaign that is not under Instacash or RoarMoney can be dubbed “Others”.
Here is my code but I think the code below is wrong:
-- Calculate DATEDIFF between rn = 3 and rn = 2
WITH temp as (
SELECT
CASE
WHEN campaign_name REGEXP '^Instacash' THEN 'Instacash'
WHEN campaign_name REGEXP '^RoarMoney' THEN 'RoarMoney'
ELSE 'Others'
END AS campaign_category,
a.campaign_id,
a.campaign_name,
b.delivery_date
FROM TableA a
LEFT JOIN TableB b ON a.campaign_id = b.campaign_id
),
temp_1 as (
SELECT *, ROW_NUMBER() OVER (PARTITION BY campaign_name ORDER BY delivery_date) as rn
FROM temp
),
temp_2 as (
SELECT *
FROM temp_1
WHERE rn in (2, 3)
),
temp_3 as (
SELECT
t1.campaign_category,
t1.campaign_id,
t1.campaign_name,
DATEDIFF(day, t2.delivery_date, t3.delivery_date) as date_difference
FROM temp_2 t1
JOIN temp_2 t2 ON t1.campaign_id = t2.campaign_id AND t2.rn = 2
JOIN temp_2 t3 ON t1.campaign_id = t3.campaign_id AND t3.rn = 3
)
-- Select the calculated result
SELECT * FROM temp_3;
Can anyone help me on this?
2
Answers
I think this code is closer to what the question wants. Thanks @user1191247.
Your REGEXP for campaign_name seems a little excessive and can be replaced with LIKE. If you want the average of the number of days between 2nd and 3rd emails being delivered, per campaign category, then you can use something like this:
Note: I have changed the partitioning to be on
campaign_id
instead ofcampaign_name
in the CTE, and addedGROUP BY campaign_category
to the final select to get the average percampaign_category
.Here’s db<>fiddle.