Postgresql - Looking for a Cumulative Total for Unique Customers Only Month by Month

MrLockett
November 1, 2022
212 views
3 votes
3 Answers

I am trying to create a cumulative count for unique customers only by the month they purchased. The example Table is:

customer_email	cohortMonth
[email protected]	10/2019
[email protected]	10/2019
[email protected]	10/2019
[email protected]	11/2019
[email protected]	11/2019
[email protected]	12/2019

The output I am looking for is the total Customers for 10/2019 would be 3, The cumulative total customers for 11/2019 would be 4 taking all of the customers purchased in 10/2019 and adding [email protected] as this is the only Unique customer email for the month. The cumulative total customers for 12/2019 will still be 4 as no new customers purchased in this month.

cohortMonth	cumulative_total_customers
10/2019	3
11/2019	4
12/2019	4

Answers

try:

with dataset as (
    SELECT 
        customer,
        date_trunc('month', date(purchase_date)) purchase_date 
    FROM (
            VALUES 
                ('[email protected]', '2019-10-05'),
                ('[email protected]', '2019-10-10'),
                ('[email protected]', '2019-10-15'),
                ('[email protected]', '2019-11-03'),
                ('[email protected]', '2019-11-28'),
                ('[email protected]', '2019-12-12')
        ) AS x (customer, purchase_date)
)


SELECT purchase_date,
       COUNT( customer ) AS daily_cumulative_count
FROM ( 
    SELECT 
        a.purchase_date AS purchase_date,
        b.customer
    FROM dataset AS a
    INNER JOIN dataset AS b ON a.purchase_date >= b.purchase_date
    GROUP BY 
        a.purchase_date,
        b.customer
     ) 
GROUP BY 
    purchase_date
;

The ideal solution would be to use a window function with a DISTINCT clause so that to eliminate the duplicated emails :

SELECT DISTINCT ON (cohortMonth)
       cohortMonth
     , count(DISTINCT customer_email) OVER (ORDER BY cohortMonth RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_total_customers
  FROM mytable

But I get a Postgres error :

DISTINCT is not implemented for window functions

So in order to eliminate the duplicates, we have two solutions :

Solution 1 : build the array of customer_email by cohortMonth and then eliminate the duplicates with a query

SELECT l.cohortMonth, array_length(array_agg(DISTINCT a), 1) AS cumulative_total_customers
FROM
   ( SELECT DISTINCT ON (cohortMonth)
            cohortMonth
          , array_agg(customer_email) OVER (ORDER BY cohortMonth RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS email_array
       FROM mytable
   ) AS l
CROSS JOIN LATERAL unnest(l.email_array) AS a
GROUP BY l.cohortMonth

Solution 2 : build the array of customer_email by cohortMonth and then eliminate the duplicates with a specific function count_distinct_array_element

CREATE OR REPLACE FUNCTION count_distinct_array_element(x anyarray) RETURNS bigint LANGUAGE sql AS $$
SELECT count(DISTINCT e) FROM unnest(x) AS e ; $$ ;

SELECT DISTINCT ON (cohortMonth)
       cohortMonth
     , count_distinct_array_element(array_agg(customer_email) OVER (ORDER BY cohortMonth RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS cumulative_total_customers
  FROM mytable

Result :

cohortmonth	cumulative_total_customers
10/2019	3
11/2019	4
12/2019	4

See test results in dbfiddle

- GMB
- November 2, 2022 at 3:09 am
- 0 votes
0
We can do this simply we two levels of aggregation.
```
select cohort_month, 
    sum(count(*)) over(order by cohort_month) cumulative_total_customers 
from (select min(cohort_month) cohort_month from mytable group by customer_email) t
group by cohort_month
order by cohort_month
```
The subquery retrieves the earliest appearance of each customer, which we can then aggregate by month to generate the cumulative count.

If there are months without new customer, and you still want them to show in the resultset, then we can tweak the query a litle:
```
select c.cohort_month, 
    sum(count(t.cohort_month)) over(order by c.cohort_month) cumulative_total_customers 
from (select distinct cohort_month from mytable) c
left join (select min(cohort_month) cohort_month from mytable group by customer_email) t using (cohort_month)
group by c.cohort_month
order by c.cohort_month
```
Demo on DB Fiddlde
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Looking for a Cumulative Total for Unique Customers Only Month by Month

Answers