skip to Main Content

I’m currently trying to calculate the retention rates of my users, and I am having difficulties to get what I need. What I want to do is for any given date be able to calculate all the new registered users and see how many come back the following days, weeks, months, etc.

I have therefore different retention metrics which I’m calling D1 (one day retention rate), D7 (one week retention rate) and this up to D365 (so one year retention rate).

Below is a simplified example of my table structure:

loginDate userId installDate
01/01/2023 1 01/01/2023
01/01/2023 2 01/01/2023
01/01/2023 3 01/01/2023
02/01/2023 1 01/01/2023
02/01/2023 4 02/01/2023
03/01/2023 4 02/01/2023
08/01/2023 1 01/01/2023

And here is a simplified example of the result that my query should return:

Date D1 D7
02/01/2023 33% NULL
03/01/2023 100% NULL
08/01/2023 NULL 33%

Finally, here is my current query:

  SELECT
       loginDate AS Date,
       sum(CASE WHEN DATEDIFF(loginDate, installDate) = 1 THEN 1 END) / count(*) AS D1,
       sum(CASE WHEN DATEDIFF(loginDate, installDate) = 7 THEN 1 END) / count(*) AS D7
  FROM logins
  GROUP BY loginDate

While I believe the sum part is correct, dividing by count(*) looks wrong. I’ve been trying to divide by count(distinct userId) but no luck as well. Overall, I’m not too sure on how to take into account each relevant day for my calculations. For instance, if we take the D1 metric, I’m struggling to find out how to divide the users who came back on T+1 by the total users who created their account on T.

Could anyone help please?

EDIT:

Replying to @FanoFN.
Please find below the code I used based on yours:

WITH cte1 AS
(SELECT loginDate,
       DATEDIFF(loginDate, installDate) Intervals,
       COUNT(loginDate=installDate) userCount
  FROM logins
GROUP BY Intervals, loginDate),
 cte2 AS (
  SELECT loginDate,
       SUM(CASE WHEN Intervals=0 THEN userCount ELSE 0 END) totalInstalledUser,
       SUM(CASE WHEN Intervals=1 THEN userCount ELSE 0 END) D1,
       SUM(CASE WHEN Intervals=7 THEN userCount ELSE 0 END) D7
FROM cte1
GROUP BY loginDate)
SELECT loginDate,
       (D1/totalInstalledUser)*100 D1Percentage,
       (D7/totalInstalledUser)*100 D7Percentage
   FROM cte2
   GROUP BY loginDate

2

Answers


  1. I misunderstood your expected result as the result you’re getting with your current query and a couple of crucial information that you missed out in your question is that the loginDate is in fact DATETIME datatype and " sometimes a user would login more than once in a day…… We therefore need to make sure each userId is being counted only once". I’ve made another attempt here for you to test:

    WITH cte1 AS (
      SELECT installDate,
             SUM(DATE(loginDate)=installDate) InstalledCount
      FROM logins 
      GROUP BY installDate
      ),
      cte2 AS (
    SELECT DATE(l.loginDate) LgnDt,
           DATEDIFF(l.loginDate, l.installDate) Intervals,
           COUNT(DISTINCT userId) TotalLogin,
           cte1.InstalledCount
    FROM logins l
    JOIN cte1
     ON l.installDate=cte1.installDate
    WHERE DATE(l.loginDate) != l.installDate
    GROUP BY LgnDt, Intervals, cte1.InstalledCount)
    SELECT LgnDt,
           CASE WHEN Intervals=1 THEN (TotalLogin/InstalledCount)*100 ELSE 0 END AS D1,
           CASE WHEN Intervals=7 THEN (TotalLogin/InstalledCount)*100 ELSE 0 END AS D7
    FROM cte2
    

    With a fiddle

    The changes I made are:

    1. Using DATE() function to extract date from loginDate column wherever necessary.
    2. Adding DISTINCT in COUNT(DISTINCT userId) TotalLogin, to make sure that a userId is only counted once a day regardless how many times they login.
    Login or Signup to reply.
  2. You can do it using WITH, COUNT OVER PARTITION As follows :

    cte and cte2 is to get number of loggings per day.

    cte3 is to get the Amount of users and If the range is D1 or D7.

    cte4 to get the percentages.

    WITH cte AS (
      SELECT *,
      count(case when loginDate=installDate then 1 end) 
        OVER (
          PARTITION BY installDate ORDER BY loginDate
        ) AS Count_Logins
      FROM logins
      
    ),
    cte2 AS (
      select cte.*, count(1) OVER (PARTITION BY loginDate, installDate order by loginDate) as today_logs
      from cte
    ), 
    cte3 AS (
       select loginDate, installDate,
       MAX(CASE WHEN DATEDIFF(loginDate, installDate) = 1 THEN 1 END)  AS D1,
       MAX(CASE WHEN DATEDIFF(loginDate, installDate) = 7 THEN 1 END)  AS D7,
       MAX(today_logs)/MAX(Count_Logins) AS Amount
       from cte2
       WHERE DATEDIFF(loginDate, installDate) > 0
       GROUP BY loginDate, installDate 
    ),
    cte4 AS (
      select loginDate, D1*Amount*100 AS D1, D7*Amount*100 AS D7
      from cte3
    )
    select *
    from cte4;
    

    Demo here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search