How to count number of people for each month, who was on internship, if we know dates of internship - Postgresql

DanielG
January 20, 2023
220 views
2 votes
4 Answers

First of all, sorry for bad title – I can’t figure out how to write generalized formulation of my problem.
I have a table in PostgreSQL with users and dates of their internships. It looks like this:

user_id	start	end
1	December 22, 2019	June 29, 2020
2	March 8, 2020	September 8, 2020
3	May 21, 2020	November 21, 2020

From this I need to calculate for each month, how many people were on internship during this month. I only need to calculate full months (if internship actually started on December 22, 2019, I will calculate from January 2022. If internship were finished at June 29, 2020, I will calculate till May 2020.
Finally I need this table:

Month	Count
Jan-20	1
Feb-20	1
Mar-20	1
Apr-20	2
May-20	2
Jun-20	2
Jul-20	2
Aug-20	2
Sep-20	1
Oct-20	1

For making it absolutely clear, this is how I got it:

Month	user_1	user_2	user_3	Count
Jan-20	1			1
Feb-20	1			1
Mar-20	1			1
Apr-20	1	1		2
May-20	1	1		2
Jun-20		1	1	2
Jul-20		1	1	2
Aug-20		1	1	2
Sep-20			1	1
Oct-20			1	1

My idea is to:

Reshape my initial table, so it will look like this:

user_id	date	event
1	December 22, 2019	start
1	June 29, 2020	end
2	March 8, 2020	start
2	September 8, 2020	end
3	May 21, 2020	start
3	November 21, 2020	end

Generate series between each start and end event:

user_id	month
1	Jan-20
1	Feb-20
1	Mar-20
1	Apr-20
1	May-20
2	Apr-20
2	May-20
2	Jun-20
2	Jul-20
2	Aug-20
3	Jun-20
3	Jul-20
3	Aug-20
3	Sep-20
3	Oct-20

Using count() GROUP BY month

Unfortunately, I have problems with 1 and 2 clauses.

I don’t know how to reshape the table in PostgreSQL. In Pandas I would use ‘stack’ function. For my case I can’t find the appropriate function.

Even if I can reshape it, I don’t understand how to make series of month for each user (shown above).

Please advise, what can be done here to solve my problem?

Answers

One approach could be to

generate a calendar table using GENERATE_SERIES
joining the calendar table with your original table on date ranges
aggregating to count users for each month

WITH calendar AS (
    SELECT DATE('2020-01-01') + (num_months::text || ' month')::interval AS months
    FROM GENERATE_SERIES(0, 11) AS num_months
)
SELECT c.months, COUNT(user_id) AS cnt
FROM calendar c
INNER JOIN tab 
        ON c.months BETWEEN DATE_TRUNC('month', tab.start_) + INTERVAL '1 month' AND DATE_TRUNC('month', tab.end_) - INTERVAL '1 month'
GROUP BY c.months
ORDER BY c.months

Check the demo here.

If you’re using a PostgreSQL legacy version, you can obtain the calendar table with a recursive query:

WITH RECURSIVE calendar AS (
    SELECT '2020-01-01'::timestamp AS months, 
           0 AS num_months
    
    UNION ALL 
  
    SELECT months + INTERVAL '1 month' AS months,
           num_months + 1 AS num_months
    FROM calendar
    WHERE num_months +1 <= 12
)
SELECT c.months, COUNT(user_id) AS cnt
FROM calendar c
INNER JOIN tab 
        ON c.months BETWEEN DATE_TRUNC('month', tab.start_) + INTERVAL '1 month' AND DATE_TRUNC('month', tab.end_) - INTERVAL '1 month'
GROUP BY c.months
ORDER BY c.months

Check the demo here.

- Edouard
- January 20, 2023 at 10:05 pm
- 0 votes
0
this query should do the job considering your table as test :
```
SELECT to_char(d.date, 'Mon-YY') AS month, count(*) AS count
  FROM
     ( SELECT generate_series(date_trunc('month', min(start_date)), date_trunc('month', max(end_date)), interval '1 month') :: date AS date
         FROM test
     ) AS d
 INNER JOIN test AS t
    ON daterange(t.start_date, t.end_date, '[]') && daterange(d.date, (d.date + interval '1 month') :: date)
 WHERE daterange(t.start_date, t.end_date, '[]') @> daterange(d.date, (d.date + interval '1 month') :: date)
 GROUP BY d.date
```
- The first sub query calculate the months covered in table test.
- The JOIN clause calculates the intersections between the months and the date interval for each user
- The WHERE clause filters the rows where the date interval for a user corresponds to a full months.
Result :

month count

Jan-20 1

Feb-20 1

Mar-20 1

Apr-20 2

May-20 2

Jun-20 2

Jul-20 2

Aug-20 2

Sep-20 1

Oct-20 1

see results in dbfiddle
Login or Signup to reply.

Here is how I accomplished this:

Generated a calendar table using generate_series
Truncated the start and end dates to ensure we are only including instances where a full month of the internship was completed.
Performed a cross join to generate a cartesian product set.
Finally, add the WHERE predicate to include instances where the date_mm is between the truncated start and end dates.

SQL:

SELECT a.date_mm AS MONTH,
       count(b.user_id) AS COUNT
FROM
  (SELECT date_mm :: date
   FROM generate_series('2020-01-01', '2023-01-01', '1 month' :: interval) date_mm) a
CROSS JOIN
  (SELECT a.user_id,
          a.start,
         /* start_next_fom = first day of next month */ 
         (date_trunc('month', a.start) + interval '1 month') AS start_next_fom,
         a.end,
         /* end_last_eom = last day of last month */
         (date_trunc('month', a.end) - interval '1 day') AS end_last_eom
   FROM users a) b
WHERE a.date_mm BETWEEN b.start_next_fom AND b.end_last_eom
GROUP BY a.date_mm
ORDER BY a.date_mm

Result:

|      month | count |
|------------|-------|
| 2020-01-01 |     1 |
| 2020-02-01 |     1 |
| 2020-03-01 |     1 |
| 2020-04-01 |     2 |
| 2020-05-01 |     2 |
| 2020-06-01 |     2 |
| 2020-07-01 |     2 |
| 2020-08-01 |     2 |
| 2020-09-01 |     1 |
| 2020-10-01 |     1 |

SQL Fiddle:
http://sqlfiddle.com/#!17/9f9f3/37

Example with date calendar as subquery – number sequence (10*12) – 10 year.
Test data:

create table test (user_id integer, start_date date, end_date date) ;
insert into test values
(1, 'December 22, 2019',    'June 29, 2020')
,(2,    'March 8, 2020',    'September 8, 2020')
,(3,    'May 21, 2020', 'November 21, 2020') --june-october
,(4,    'May 31, 2020', 'November 30, 2020') --june-november
,(5,    'May 01, 2020', 'May 31, 2020') -- 1 full month - may
,(6,    'May 01, 2020', 'May 30, 2020') -- part of month  - null
,(7,    'May 05, 2020', 'May 30, 2020') -- part of month - null
,(8,    'May 05, 2020', 'May 31, 2020') -- part of month - null
,(9,    'May 05, 2020', 'June 6, 2020') -- part of month - null
,(10,   'May 05, 2020', 'July 6, 2020') -- part of 2 month -june
,(11,   'Jan 01, 2018', 'Dec 31, 2020') -- full 3 year
  ;

Main query:

SELECT user_id,start_date,end_date,mn monthNum
  ,to_char(case when date_trunc('month', start_date)=start_date 
                then  (start_date+cast((mn-1) ||' month' as interval))
           else (start_date+cast(mn ||' month' as interval))
           end, 'YY-mm-Mon') AS month
  ,case when date_trunc('month', start_date)=start_date 
        then  (start_date+cast((mn-1) ||' month' as interval))
   else (start_date+cast(mn ||' month' as interval))
   end date
FROM  test AS t 
left join --series of 120 numbers (month)
  (select yn*12+mn as mn
   from (select * from(values(1),(2),(3),(4),(5),(6),(7),(9),(10),(11),(12))tm(mn))tm -- month
       ,(select * from(values(0),(1),(2),(3),(4),(5),(6),(7),(9),(10))ty(yn)) ty --years
  )mm
  on case when date_trunc('month', start_date)=start_date -- first day of month
        then  (start_date+cast((mn-1) ||' month' as interval))
      else (start_date+cast(mn ||' month' as interval))
      end
  <=
    case when end_date
              =(date_trunc('month', end_date) 
                           + interval '1 month' - interval '1 day') --eomonth
         then end_date
    else (end_date-cast(extract(day from end_date) ||' day' as interval))
    end
ORDER BY user_id,mn

Somewhat complicated to check start_date is first day of month and end_date is last day of month
Fiddle here

Please signup or login to give your own answer.

Click here to cancel reply.

How to count number of people for each month, who was on internship, if we know dates of internship – Postgresql

Answers