I’m currently trying to calculate the retention rates of my users, and I am having difficulties to get what I need. What I want to do is for any given date be able to calculate all the new registered users and see how many come back the following days, weeks, months, etc.
I have therefore different retention metrics which I’m calling D1 (one day retention rate), D7 (one week retention rate) and this up to D365 (so one year retention rate).
Below is a simplified example of my table structure:
loginDate | userId | installDate |
---|---|---|
01/01/2023 | 1 | 01/01/2023 |
01/01/2023 | 2 | 01/01/2023 |
01/01/2023 | 3 | 01/01/2023 |
02/01/2023 | 1 | 01/01/2023 |
02/01/2023 | 4 | 02/01/2023 |
03/01/2023 | 4 | 02/01/2023 |
08/01/2023 | 1 | 01/01/2023 |
And here is a simplified example of the result that my query should return:
Date | D1 | D7 |
---|---|---|
02/01/2023 | 33% | NULL |
03/01/2023 | 100% | NULL |
08/01/2023 | NULL | 33% |
Finally, here is my current query:
SELECT
loginDate AS Date,
sum(CASE WHEN DATEDIFF(loginDate, installDate) = 1 THEN 1 END) / count(*) AS D1,
sum(CASE WHEN DATEDIFF(loginDate, installDate) = 7 THEN 1 END) / count(*) AS D7
FROM logins
GROUP BY loginDate
While I believe the sum part is correct, dividing by count(*) looks wrong. I’ve been trying to divide by count(distinct userId) but no luck as well. Overall, I’m not too sure on how to take into account each relevant day for my calculations. For instance, if we take the D1 metric, I’m struggling to find out how to divide the users who came back on T+1 by the total users who created their account on T.
Could anyone help please?
EDIT:
Replying to @FanoFN.
Please find below the code I used based on yours:
WITH cte1 AS
(SELECT loginDate,
DATEDIFF(loginDate, installDate) Intervals,
COUNT(loginDate=installDate) userCount
FROM logins
GROUP BY Intervals, loginDate),
cte2 AS (
SELECT loginDate,
SUM(CASE WHEN Intervals=0 THEN userCount ELSE 0 END) totalInstalledUser,
SUM(CASE WHEN Intervals=1 THEN userCount ELSE 0 END) D1,
SUM(CASE WHEN Intervals=7 THEN userCount ELSE 0 END) D7
FROM cte1
GROUP BY loginDate)
SELECT loginDate,
(D1/totalInstalledUser)*100 D1Percentage,
(D7/totalInstalledUser)*100 D7Percentage
FROM cte2
GROUP BY loginDate
2
Answers
I misunderstood your expected result as the result you’re getting with your current query and a couple of crucial information that you missed out in your question is that the
loginDate
is in factDATETIME
datatype and " sometimes a user would login more than once in a day…… We therefore need to make sure eachuserId
is being counted only once". I’ve made another attempt here for you to test:With a fiddle
The changes I made are:
DATE()
function to extract date fromloginDate
column wherever necessary.DISTINCT
inCOUNT(DISTINCT userId) TotalLogin,
to make sure that auserId
is only counted once a day regardless how many times they login.You can do it using
WITH
,COUNT OVER PARTITION
As follows :cte
andcte2
is to get number of loggings per day.cte3
is to get the Amount of users and If the range is D1 or D7.cte4
to get the percentages.Demo here