skip to Main Content

In postgresql, how to get cumulative count? I wanted to get a cumulative count of customers enrolled between 01-01 and 02-01, 01-01 and 03-01, 01-01 and 04-01 etc.,

Not sure if we can do for loop like in Python? I checked out a few for loop posts but am lost

select count(distinct customer)
from mytable
where enrollment_dt between '2023-01-01' and '2023-xx-01' 
(this xx will increase by 1 month in each run, like 02-01, 03-01, 04-01...)

so ideally the output should be like:

date count
01/01-02/01 10
01/01-03/01 12
01/01-04/01 15
01/01-05/01 19

2

Answers


  1. Looks like you need a window function to do a running sum over the normal aggregation.

    select
      count(distinct mt.customer) as count,
      sum(count(distinct mt.customer)) over (order by mt.enrollment_dt rows unbounded preceding) as running_count
    from mytable mt
    where mt.enrollment_dt between '2023-01-01' and '2023-xx-01' 
    
    Login or Signup to reply.
  2. It is not quite clear what exactly you want to do. I took a blind guess and could offer solutions for two possible options. To test it – lets assume that your sample data could be like this:

    --      S a m p l e    D a t a :
    Create Table customers (id Int, customer Varchar(64), enrollment_dt Date);
    Insert Into customers
    VALUES (101, 'Customer 101', '2022-11-17'), 
           (102, 'Customer 102', '2022-12-03'), 
           (103, 'Customer 103', '2023-01-17'), 
           (104, 'Customer 104', '2023-01-09'), 
           (105, 'Customer 105', '2023-03-01'), 
           (106, 'Customer 106', '2023-04-03'), 
           (107, 'Customer 107', '2023-04-21'), 
           (108, 'Customer 108', '2023-04-29'),
           (109, 'Customer 109', '2023-07-01');
    

    There are months with and without newly enrolled costumers. So you maybe want to select just the rows with new enrollments:

    --  Option 1.  - get only the data for months having newly enrolled customer(s)
    --      S Q L :
    SELECT    yr_mnth, mnth_count,
              Sum(mnth_count) 
                  Over(Order By yr_mnth
                       Rows Between Unbounded Preceding And Current Row) as running_sum
    FROM   ( Select    To_Char(Date_Trunc('month', enrollment_dt), 'yyyy-mm') as yr_mnth, 
                       Count(*) as mnth_count
             From      customers
             Where     enrollment_dt >= '2023-01-01'
             Group By  yr_mnth
           )
    ORDER BY yr_mnth
    
    /*    R e s u l t : 
    yr_mnth mnth_count  running_sum
    ------- ----------  -----------
    2023-01          2            2
    2023-03          1            3
    2023-04          3            6
    2023-07          1            7  */
    

    … on the other hand maybe you want to select all the months in certain period regardless new enrollments…
    .. for this you will need a sort of calendar of months in period…

    --  Option 2.  - get the data for all months regardless there were newly enrolled cusomer(s) or not
    --  Use Recursive cte to generate rows with all the months in period of interest
    WITH
        Recursive mnths ( month_id ) as
            ( Select Date_Trunc('month', Cast('2023-01-01' as Date)) as month_id 
             UNION ALL
              Select month_id + interval '1 month' 
              From mnths
              Where month_id < Date_Trunc('month', Current_Date)
            )
    

    … the calendar data should be joined to the sql from Option 1. (Left Join) handling the Null values…

    --    M a i n    S Q L :
    Select     To_Char(m.month_id, 'yyyy-mm') as yr_mnth, 
               Coalesce(c.mnth_count, 0) as mnth_count,
               Coalesce(c.running_sum, Max(running_sum) Over(Order By To_Char(m.month_id, 'yyyy-mm')
                                           Rows Between Unbounded Preceding And 1 Preceding)) as running_sum
    From       mnths m 
    Left Join  ( SELECT    yr_mnth, mnth_count,
                           Sum(mnth_count) 
                               Over(Order By yr_mnth
                               Rows Between Unbounded Preceding And Current Row) as running_sum
                 FROM   ( Select    To_Char(Date_Trunc('month', enrollment_dt), 'yyyy-mm') as yr_mnth, 
                                    Count(*) as mnth_count
                          From      customers
                          Where     enrollment_dt >= '2023-01-01'
                          Group By  yr_mnth
                       )
               ) c ON(c.yr_mnth = To_Char(m.month_id, 'yyyy-mm'))
    Order By   To_Char(m.month_id, 'yyyy-mm')
    
    /*    R e s u l t : 
    yr_mnth mnth_count  running_sum
    ------- ----------  -----------
    2023-01          2            2
    2023-02          0            2
    2023-03          1            3
    2023-04          3            6
    2023-05          0            6
    2023-06          0            6
    2023-07          1            7
    2023-08          0            7
    2023-09          0            7
    2023-10          0            7
    2023-11          0            7
    2023-12          0            7
    2024-01          0            7
    2024-02          0            7
    2024-03          0            7
    2024-04          0            7
    2024-05          0            7
    2024-06          0            7
    2024-07          0            7   */
    

    See the fiddle here.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search