skip to Main Content

I am trying to create a cumulative count for unique customers only by the month they purchased. The example Table is:

customer_email cohortMonth
[email protected] 10/2019
[email protected] 10/2019
[email protected] 10/2019
[email protected] 11/2019
[email protected] 11/2019
[email protected] 12/2019

The output I am looking for is the total Customers for 10/2019 would be 3, The cumulative total customers for 11/2019 would be 4 taking all of the customers purchased in 10/2019 and adding [email protected] as this is the only Unique customer email for the month. The cumulative total customers for 12/2019 will still be 4 as no new customers purchased in this month.

cohortMonth cumulative_total_customers
10/2019 3
11/2019 4
12/2019 4

3

Answers


  1. try:

    with dataset as (
        SELECT 
            customer,
            date_trunc('month', date(purchase_date)) purchase_date 
        FROM (
                VALUES 
                    ('[email protected]', '2019-10-05'),
                    ('[email protected]', '2019-10-10'),
                    ('[email protected]', '2019-10-15'),
                    ('[email protected]', '2019-11-03'),
                    ('[email protected]', '2019-11-28'),
                    ('[email protected]', '2019-12-12')
            ) AS x (customer, purchase_date)
    )
    
    
    SELECT purchase_date,
           COUNT( customer ) AS daily_cumulative_count
    FROM ( 
        SELECT 
            a.purchase_date AS purchase_date,
            b.customer
        FROM dataset AS a
        INNER JOIN dataset AS b ON a.purchase_date >= b.purchase_date
        GROUP BY 
            a.purchase_date,
            b.customer
         ) 
    GROUP BY 
        purchase_date
    ;
    

    enter image description here

    Login or Signup to reply.
  2. The ideal solution would be to use a window function with a DISTINCT clause so that to eliminate the duplicated emails :

    SELECT DISTINCT ON (cohortMonth)
           cohortMonth
         , count(DISTINCT customer_email) OVER (ORDER BY cohortMonth RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_total_customers
      FROM mytable
    

    But I get a Postgres error :

    DISTINCT is not implemented for window functions

    So in order to eliminate the duplicates, we have two solutions :

    Solution 1 : build the array of customer_email by cohortMonth and then eliminate the duplicates with a query

    SELECT l.cohortMonth, array_length(array_agg(DISTINCT a), 1) AS cumulative_total_customers
    FROM
       ( SELECT DISTINCT ON (cohortMonth)
                cohortMonth
              , array_agg(customer_email) OVER (ORDER BY cohortMonth RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS email_array
           FROM mytable
       ) AS l
    CROSS JOIN LATERAL unnest(l.email_array) AS a
    GROUP BY l.cohortMonth
    

    Solution 2 : build the array of customer_email by cohortMonth and then eliminate the duplicates with a specific function count_distinct_array_element

    CREATE OR REPLACE FUNCTION count_distinct_array_element(x anyarray) RETURNS bigint LANGUAGE sql AS $$
    SELECT count(DISTINCT e) FROM unnest(x) AS e ; $$ ;
    
    SELECT DISTINCT ON (cohortMonth)
           cohortMonth
         , count_distinct_array_element(array_agg(customer_email) OVER (ORDER BY cohortMonth RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS cumulative_total_customers
      FROM mytable
    

    Result :

    cohortmonth cumulative_total_customers
    10/2019 3
    11/2019 4
    12/2019 4

    See test results in dbfiddle

    Login or Signup to reply.
  3. We can do this simply we two levels of aggregation.

    select cohort_month, 
        sum(count(*)) over(order by cohort_month) cumulative_total_customers 
    from (select min(cohort_month) cohort_month from mytable group by customer_email) t
    group by cohort_month
    order by cohort_month
    

    The subquery retrieves the earliest appearance of each customer, which we can then aggregate by month to generate the cumulative count.

    If there are months without new customer, and you still want them to show in the resultset, then we can tweak the query a litle:

    select c.cohort_month, 
        sum(count(t.cohort_month)) over(order by c.cohort_month) cumulative_total_customers 
    from (select distinct cohort_month from mytable) c
    left join (select min(cohort_month) cohort_month from mytable group by customer_email) t using (cohort_month)
    group by c.cohort_month
    order by c.cohort_month
    

    Demo on DB Fiddlde

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search