I am trying to create a cumulative count of unique customers by the month they purchased. The example table is:
customer_email | cohortMonth |
---|---|
[email protected] | 10/2019 |
[email protected] | 10/2019 |
[email protected] | 10/2019 |
[email protected] | 11/2019 |
[email protected] | 11/2019 |
[email protected] | 12/2019 |
The output I am looking for: the total customers for 10/2019 would be 3. The cumulative total for 11/2019 would be 4, taking all of the customers who purchased in 10/2019 and adding [email protected], as this is the only new unique customer email for that month. The cumulative total for 12/2019 would still be 4, as no new customers purchased in that month.
cohortMonth | cumulative_total_customers |
---|---|
10/2019 | 3 |
11/2019 | 4 |
12/2019 | 4 |
3 Answers
try:
The ideal solution would be to use a window function with a DISTINCT clause to eliminate the duplicate emails, but Postgres rejects this with an error, because DISTINCT is not implemented for window functions.
So in order to eliminate the duplicates, we have two solutions:
Solution 1: build the array of customer_email by cohortMonth and then eliminate the duplicates with a query
Solution 2: build the array of customer_email by cohortMonth and then eliminate the duplicates with a specific function, count_distinct_array_element
Result: see the test results in dbfiddle
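The logic shared by both solutions (collect the emails per month, then count the distinct emails accumulated so far) can be sketched outside the database. This is only an illustration in plain Python, not the answer's SQL: the sample rows, `month_key`, and `cumulative_distinct_customers` are hypothetical names made up for this sketch.

```python
from collections import defaultdict

# Hypothetical sample rows (customer_email, cohortMonth); placeholder addresses only.
rows = [
    ("a@example.com", "10/2019"),
    ("b@example.com", "10/2019"),
    ("c@example.com", "10/2019"),
    ("a@example.com", "11/2019"),
    ("d@example.com", "11/2019"),
    ("b@example.com", "12/2019"),
]


def month_key(cohort_month):
    """Turn 'MM/YYYY' into a sortable (year, month) tuple."""
    month, year = cohort_month.split("/")
    return int(year), int(month)


def cumulative_distinct_customers(rows):
    # Step 1: build the "array" of emails for each month.
    emails_by_month = defaultdict(list)
    for email, cohort_month in rows:
        emails_by_month[cohort_month].append(email)

    # Step 2: walk the months in order and merge each array into a running set,
    # which drops the duplicates; the set size is the cumulative distinct count.
    seen = set()
    result = []
    for cohort_month in sorted(emails_by_month, key=month_key):
        seen.update(emails_by_month[cohort_month])
        result.append((cohort_month, len(seen)))
    return result


print(cumulative_distinct_customers(rows))
```

With these sample rows the counts come out as 3, 4, 4, matching the output the question asks for.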
We can do this simply with two levels of aggregation.
The subquery retrieves the earliest appearance of each customer, which we can then aggregate by month to generate the cumulative count.
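A minimal sketch of that idea, run through Python's built-in sqlite3 module so it can be tried without a Postgres server. It is not the answer's original query. The table name `purchases`, the column `cohort_month`, and the sample rows are assumptions, and months are stored as 'YYYY-MM' text so that `MIN` and `ORDER BY` sort them chronologically. The window function needs SQLite 3.25 or later; the SQL itself should also be valid in Postgres.

```python
import sqlite3

# Hypothetical sample data; months use 'YYYY-MM' so they sort correctly as text.
ROWS = [
    ("a@example.com", "2019-10"),
    ("b@example.com", "2019-10"),
    ("c@example.com", "2019-10"),
    ("a@example.com", "2019-11"),
    ("d@example.com", "2019-11"),
    ("b@example.com", "2019-12"),
]

QUERY = """
SELECT first_month,
       SUM(new_customers) OVER (ORDER BY first_month) AS cumulative_total_customers
FROM (
    -- Level 2: how many customers made their first purchase in each month.
    SELECT first_month, COUNT(*) AS new_customers
    FROM (
        -- Level 1: the earliest month in which each customer appears.
        SELECT customer_email, MIN(cohort_month) AS first_month
        FROM purchases
        GROUP BY customer_email
    ) AS first_purchase
    GROUP BY first_month
) AS per_month
ORDER BY first_month
"""


def cumulative_customers(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer_email TEXT, cohort_month TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(cumulative_customers(ROWS))
```

With this sample data only 2019-10 and 2019-11 are returned: no customer makes a first purchase in 2019-12, so that month produces no row.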
If there are months without new customers, and you still want them to show in the result set, then we can tweak the query a little:
Demo on DB Fiddle
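One possible way to keep months that have no new customers, sketched under the same assumptions as before (hypothetical `purchases` table, 'YYYY-MM' text months, sqlite3 used only to make the SQL runnable). It may differ from the tweak in the original answer. The list of distinct months is left-joined to the first-purchase subquery, so a month with no first purchases contributes zero to the running sum.

```python
import sqlite3

# Hypothetical sample data; months use 'YYYY-MM' so they sort correctly as text.
SAMPLE_ROWS = [
    ("a@example.com", "2019-10"),
    ("b@example.com", "2019-10"),
    ("c@example.com", "2019-10"),
    ("a@example.com", "2019-11"),
    ("d@example.com", "2019-11"),
    ("b@example.com", "2019-12"),
]

QUERY_ALL_MONTHS = """
SELECT cohort_month,
       SUM(new_customers) OVER (ORDER BY cohort_month) AS cumulative_total_customers
FROM (
    -- COUNT(f.customer_email) ignores NULLs, so unmatched months count as 0.
    SELECT m.cohort_month, COUNT(f.customer_email) AS new_customers
    FROM (SELECT DISTINCT cohort_month FROM purchases) AS m
    LEFT JOIN (
        SELECT customer_email, MIN(cohort_month) AS first_month
        FROM purchases
        GROUP BY customer_email
    ) AS f ON f.first_month = m.cohort_month
    GROUP BY m.cohort_month
) AS per_month
ORDER BY cohort_month
"""


def cumulative_customers_all_months(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer_email TEXT, cohort_month TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    result = conn.execute(QUERY_ALL_MONTHS).fetchall()
    conn.close()
    return result


print(cumulative_customers_all_months(SAMPLE_ROWS))
```

This version only covers months that appear somewhere in the table. A month with no purchases at all would still be missing unless the month list came from a calendar source instead.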