skip to Main Content

First of all, sorry for bad title – I can’t figure out how to write generalized formulation of my problem.
I have a table in PostgreSQL with users and dates of their internships. It looks like this:

user_id start end
1 December 22, 2019 June 29, 2020
2 March 8, 2020 September 8, 2020
3 May 21, 2020 November 21, 2020

From this I need to calculate for each month, how many people were on internship during this month. I only need to calculate full months (if internship actually started on December 22, 2019, I will calculate from January 2022. If internship were finished at June 29, 2020, I will calculate till May 2020.
Finally I need this table:

Month Count
Jan-20 1
Feb-20 1
Mar-20 1
Apr-20 2
May-20 2
Jun-20 2
Jul-20 2
Aug-20 2
Sep-20 1
Oct-20 1

For making it absolutely clear, this is how I got it:

Month user_1 user_2 user_3 Count
Jan-20 1 1
Feb-20 1 1
Mar-20 1 1
Apr-20 1 1 2
May-20 1 1 2
Jun-20 1 1 2
Jul-20 1 1 2
Aug-20 1 1 2
Sep-20 1 1
Oct-20 1 1

My idea is to:

  1. Reshape my initial table, so it will look like this:
user_id date event
1 December 22, 2019 start
1 June 29, 2020 end
2 March 8, 2020 start
2 September 8, 2020 end
3 May 21, 2020 start
3 November 21, 2020 end
  1. Generate series between each start and end event:
user_id month
1 Jan-20
1 Feb-20
1 Mar-20
1 Apr-20
1 May-20
2 Apr-20
2 May-20
2 Jun-20
2 Jul-20
2 Aug-20
3 Jun-20
3 Jul-20
3 Aug-20
3 Sep-20
3 Oct-20
  1. Using count() GROUP BY month

Unfortunately, I have problems with 1 and 2 clauses.

I don’t know how to reshape the table in PostgreSQL. In Pandas I would use ‘stack’ function. For my case I can’t find the appropriate function.

Even if I can reshape it, I don’t understand how to make series of month for each user (shown above).

Please advise, what can be done here to solve my problem?

4

Answers


  1. One approach could be to

    • generate a calendar table using GENERATE_SERIES
    • joining the calendar table with your original table on date ranges
    • aggregating to count users for each month
    WITH calendar AS (
        SELECT DATE('2020-01-01') + (num_months::text || ' month')::interval AS months
        FROM GENERATE_SERIES(0, 11) AS num_months
    )
    SELECT c.months, COUNT(user_id) AS cnt
    FROM calendar c
    INNER JOIN tab 
            ON c.months BETWEEN DATE_TRUNC('month', tab.start_) + INTERVAL '1 month' AND DATE_TRUNC('month', tab.end_) - INTERVAL '1 month'
    GROUP BY c.months
    ORDER BY c.months
    

    Check the demo here.


    If you’re using a PostgreSQL legacy version, you can obtain the calendar table with a recursive query:

    WITH RECURSIVE calendar AS (
        SELECT '2020-01-01'::timestamp AS months, 
               0 AS num_months
        
        UNION ALL 
      
        SELECT months + INTERVAL '1 month' AS months,
               num_months + 1 AS num_months
        FROM calendar
        WHERE num_months +1 <= 12
    )
    SELECT c.months, COUNT(user_id) AS cnt
    FROM calendar c
    INNER JOIN tab 
            ON c.months BETWEEN DATE_TRUNC('month', tab.start_) + INTERVAL '1 month' AND DATE_TRUNC('month', tab.end_) - INTERVAL '1 month'
    GROUP BY c.months
    ORDER BY c.months
    

    Check the demo here.

    Login or Signup to reply.
  2. this query should do the job considering your table as test :

    SELECT to_char(d.date, 'Mon-YY') AS month, count(*) AS count
      FROM
         ( SELECT generate_series(date_trunc('month', min(start_date)), date_trunc('month', max(end_date)), interval '1 month') :: date AS date
             FROM test
         ) AS d
     INNER JOIN test AS t
        ON daterange(t.start_date, t.end_date, '[]') && daterange(d.date, (d.date + interval '1 month') :: date)
     WHERE daterange(t.start_date, t.end_date, '[]') @> daterange(d.date, (d.date + interval '1 month') :: date)
     GROUP BY d.date
    
    • The first sub query calculate the months covered in table test.
    • The JOIN clause calculates the intersections between the months and the date interval for each user
    • The WHERE clause filters the rows where the date interval for a user corresponds to a full months.

    Result :

    month count
    Jan-20 1
    Feb-20 1
    Mar-20 1
    Apr-20 2
    May-20 2
    Jun-20 2
    Jul-20 2
    Aug-20 2
    Sep-20 1
    Oct-20 1

    see results in dbfiddle

    Login or Signup to reply.
  3. Here is how I accomplished this:

    • Generated a calendar table using generate_series
    • Truncated the start and end dates to ensure we are only including instances where a full month of the internship was completed.
    • Performed a cross join to generate a cartesian product set.
    • Finally, add the WHERE predicate to include instances where the date_mm is between the truncated start and end dates.

    SQL:

    SELECT a.date_mm AS MONTH,
           count(b.user_id) AS COUNT
    FROM
      (SELECT date_mm :: date
       FROM generate_series('2020-01-01', '2023-01-01', '1 month' :: interval) date_mm) a
    CROSS JOIN
      (SELECT a.user_id,
              a.start,
             /* start_next_fom = first day of next month */ 
             (date_trunc('month', a.start) + interval '1 month') AS start_next_fom,
             a.end,
             /* end_last_eom = last day of last month */
             (date_trunc('month', a.end) - interval '1 day') AS end_last_eom
       FROM users a) b
    WHERE a.date_mm BETWEEN b.start_next_fom AND b.end_last_eom
    GROUP BY a.date_mm
    ORDER BY a.date_mm
    

    Result:

    |      month | count |
    |------------|-------|
    | 2020-01-01 |     1 |
    | 2020-02-01 |     1 |
    | 2020-03-01 |     1 |
    | 2020-04-01 |     2 |
    | 2020-05-01 |     2 |
    | 2020-06-01 |     2 |
    | 2020-07-01 |     2 |
    | 2020-08-01 |     2 |
    | 2020-09-01 |     1 |
    | 2020-10-01 |     1 |
    

    SQL Fiddle:
    http://sqlfiddle.com/#!17/9f9f3/37

    Login or Signup to reply.
  4. Example with date calendar as subquery – number sequence (10*12) – 10 year.
    Test data:

    create table test (user_id integer, start_date date, end_date date) ;
    insert into test values
    (1, 'December 22, 2019',    'June 29, 2020')
    ,(2,    'March 8, 2020',    'September 8, 2020')
    ,(3,    'May 21, 2020', 'November 21, 2020') --june-october
    ,(4,    'May 31, 2020', 'November 30, 2020') --june-november
    ,(5,    'May 01, 2020', 'May 31, 2020') -- 1 full month - may
    ,(6,    'May 01, 2020', 'May 30, 2020') -- part of month  - null
    ,(7,    'May 05, 2020', 'May 30, 2020') -- part of month - null
    ,(8,    'May 05, 2020', 'May 31, 2020') -- part of month - null
    ,(9,    'May 05, 2020', 'June 6, 2020') -- part of month - null
    ,(10,   'May 05, 2020', 'July 6, 2020') -- part of 2 month -june
    ,(11,   'Jan 01, 2018', 'Dec 31, 2020') -- full 3 year
      ;
    

    Main query:

    SELECT user_id,start_date,end_date,mn monthNum
      ,to_char(case when date_trunc('month', start_date)=start_date 
                    then  (start_date+cast((mn-1) ||' month' as interval))
               else (start_date+cast(mn ||' month' as interval))
               end, 'YY-mm-Mon') AS month
      ,case when date_trunc('month', start_date)=start_date 
            then  (start_date+cast((mn-1) ||' month' as interval))
       else (start_date+cast(mn ||' month' as interval))
       end date
    FROM  test AS t 
    left join --series of 120 numbers (month)
      (select yn*12+mn as mn
       from (select * from(values(1),(2),(3),(4),(5),(6),(7),(9),(10),(11),(12))tm(mn))tm -- month
           ,(select * from(values(0),(1),(2),(3),(4),(5),(6),(7),(9),(10))ty(yn)) ty --years
      )mm
      on case when date_trunc('month', start_date)=start_date -- first day of month
            then  (start_date+cast((mn-1) ||' month' as interval))
          else (start_date+cast(mn ||' month' as interval))
          end
      <=
        case when end_date
                  =(date_trunc('month', end_date) 
                               + interval '1 month' - interval '1 day') --eomonth
             then end_date
        else (end_date-cast(extract(day from end_date) ||' day' as interval))
        end
    ORDER BY user_id,mn
    

    Somewhat complicated to check start_date is first day of month and end_date is last day of month
    Fiddle here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search