skip to Main Content

We have 4 tables, to study the email performance of a campaign:

  • Table A contains the campaign name information
  • Table B contains the campaign id delivery performance (who the campaign was sent out to)
  • Table C contains the campaign id open performance (who opened the email)
  • Table D contains the campaign’s click performance (who clicked on any elements in the email)

Sample of what the table structures look like (note: assume this is a snapshot and not the full dataset):

-- Table A: Campaign Information
CREATE TABLE TableA (
campaign_name VARCHAR(255),
campaign_id INT
);

-- Table B: Email Delivery Information
CREATE TABLE TableB (
campaign_id INT,
delivery_date DATE,
user_id VARCHAR(50)
);

-- Table C: Email Open Information
CREATE TABLE TableC (
campaign_id INT,
open_date DATE,
user_id VARCHAR(50)
);

-- Table D: Email Click Information
CREATE TABLE TableD (
campaign_id INT,
click_date DATE,
user_id VARCHAR(50)
);

-- Inserts for TableA (Campaign Information)
INSERT INTO TableA (campaign_name, campaign_id) VALUES
('Instacash Promo 1', 112233),
('Instacash Promo 2', 112244),
('RoarMoney Balance 5', 112259);

-- Inserts for TableB (Email Delivery Information)
INSERT INTO TableB (campaign_id, delivery_date, user_id) VALUES
(112233, '2021-01-01', 'a'),
(112233, '2021-01-01', 'b'),
(112233, '2021-01-01', 'c'),
(112244, '2021-01-05', 'd'),
(112244, '2021-01-05', 'e');

-- Inserts for TableC (Email Opened Information)
INSERT INTO TableC (campaign_id, open_date, user_id) VALUES
(112233, '2021-01-03', 'a'),
(112233, '2021-01-05', 'b'),
(112244, '2021-01-07', 'd'),
(112244, '2021-01-10', 'e');

-- Inserts for TableD (Email Link Clicked Information)
INSERT INTO TableD (campaign_id, click_date, user_id) VALUES
(112233, '2021-01-03', 'a'),
(112244, '2021-01-11', 'e');

I need to write a SQL to query the average time between the 2nd and 3rd email for each individual product on an overall level? The output should consists of:

campaign_category Average time taken to 2nd to 3rd email

campaign_name should be classified as campaign_category of Instacash, RoarMoney or any campaign that is not under Instacash or RoarMoney can be dubbed “Others”.

Here is my code but I think the code below is wrong:

-- Calculate DATEDIFF between rn = 3 and rn = 2
WITH temp as (
    SELECT 
        CASE 
            WHEN campaign_name REGEXP '^Instacash' THEN 'Instacash'
            WHEN campaign_name REGEXP '^RoarMoney' THEN 'RoarMoney'
            ELSE 'Others'
        END AS campaign_category,
        a.campaign_id,
        a.campaign_name,
        b.delivery_date
    FROM TableA a
    LEFT JOIN TableB b ON a.campaign_id = b.campaign_id
),
temp_1 as (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY campaign_name ORDER BY delivery_date) as rn
    FROM temp
),
temp_2 as (
    SELECT *
    FROM temp_1
    WHERE rn in (2, 3)
),
temp_3 as (
    SELECT
        t1.campaign_category,
        t1.campaign_id,
        t1.campaign_name,
        DATEDIFF(day, t2.delivery_date, t3.delivery_date) as date_difference
    FROM temp_2 t1
    JOIN temp_2 t2 ON t1.campaign_id = t2.campaign_id AND t2.rn = 2
    JOIN temp_2 t3 ON t1.campaign_id = t3.campaign_id AND t3.rn = 3
)

-- Select the calculated result
SELECT * FROM temp_3;

Can anyone help me on this?

2

Answers


  1. Chosen as BEST ANSWER
    WITH temp as (
    SELECT 
        CASE 
            WHEN a.campaign_name LIKE 'Instacash%' THEN 'Instacash'
            WHEN a.campaign_name LIKE 'RoarMoney%' THEN 'RoarMoney'
            ELSE 'Others'
        END AS campaign_category,
        a.campaign_id,
        a.campaign_name,
        b.delivery_date
    FROM TableA a
    INNER JOIN TableB b ON a.campaign_id = b.campaign_id),
    temp_1 as (SELECT *, 
    ROW_NUMBER() OVER (PARTITION BY campaign_category ORDER BY delivery_date) as rn FROM temp)
    SELECT t1.campaign_category,
    AVG(datediff(t2.delivery_date,t1.delivery_date)) as Average_time_taken_to_2nd_to_3rd_email 
    FROM temp_1 t1 
    JOIN temp_1 t2 ON t1.campaign_id = t2.campaign_id 
    AND t2.rn = 3 WHERE t1.rn = 2 
    GROUP BY t1.campaign_category;
    

    I think this code is closer to what the question wants. Thanks @user1191247.


  2. Your REGEXP for campaign_name seems a little excessive and can be replaced with LIKE. If you want the average of the number of days between 2nd and 3rd emails being delivered, per campaign category, then you can use something like this:

    WITH temp as (
        SELECT 
            CASE 
                WHEN a.campaign_name LIKE 'Instacash%' THEN 'Instacash'
                WHEN a.campaign_name LIKE 'RoarMoney%' THEN 'RoarMoney'
                ELSE 'Others'
            END AS campaign_category,
            a.campaign_id,
            a.campaign_name,
            b.delivery_date,
            ROW_NUMBER() OVER (PARTITION BY a.campaign_id ORDER BY b.delivery_date) as rn
        FROM TableA a
        INNER JOIN TableB b ON a.campaign_id = b.campaign_id
    )
    SELECT
        t1.campaign_category,
        AVG(DATEDIFF(t2.delivery_date, t1.delivery_date)) as date_difference
    FROM temp t1
    JOIN temp t2 ON t1.campaign_id = t2.campaign_id AND t2.rn = 3
    WHERE t1.rn = 2
    GROUP BY t1.campaign_category;
    

    Note: I have changed the partitioning to be on campaign_id instead of campaign_name in the CTE, and added GROUP BY campaign_category to the final select to get the average per campaign_category.

    Here’s db<>fiddle.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search