I have a table with student data which includes their start and withdrawal dates. Not all students have withdrawal dates as they are still enrolled.
I am trying to calculate the number of new students per month, withdrawals per month, existing students per month. I have already completed counts of new and withdrawals per month, I am facing trouble in the calculation of existing students.
Data starts from January 2023.
This is the query I have come up with:
WITH student_activity AS
(
-- Convert start and withdrawal date keys to actual date format
SELECT to_date(fe.start_date_key::text, 'YYYYMMDD') AS start_date,
to_date(fe.withdrawal_date_key::text, 'YYYYMMDD') AS withdrawal_date,
dp.product_name, dp.sku
FROM fact_enrolment fe
INNER JOIN dim_product dp ON fe.product_key = dp.product_key
)
SELECT date_trunc('month', month_series) AS month,
COUNT(*) AS existing_students,
sa.product_name
FROM (
SELECT generate_series(
(SELECT MIN(to_date(start_date_key::text, 'YYYYMMDD')) FROM fact_enrolment),'2100-12-31',INTERVAL '1 month') AS month_series) AS months
LEFT JOIN student_activity sa ON sa.start_date < month_series AND (sa.withdrawal_date IS NULL OR sa.withdrawal_date >= month_series)
GROUP BY month, sa.product_name
This query does get the existing counts correctly but I am facing an issue where if a month does not have students starting in it, then the month is not displayed at all in the result set. I need it to still display each month and if it does not have any new values to it, then it should use the previous month’s count as those students are still enrolled.
CREATE TABLE fact_enrolment (
student_key int4 NULL,
start_date_key int4 NULL,
withdrawal_date_key int4 NULL,
product_key int4 NULL);
CREATE TABLE dim_product (
product_key int4 GENERATED ALWAYS AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 START 1 CACHE 1 NO CYCLE) NOT NULL,
product_name varchar NULL,
sku varchar NULL);
INSERT INTO dim_product (product_name , sku) VALUES ('Preschool' , 'ABC123');
INSERT INTO fact_enrolment (student_key, start_date_key, withdrawal_date_key, product_key) VALUES
(12, 20230105, 20230130, 1)
, (14, 20230106, 20230120, 1)
, (45, 20230405, 20230420, 1);
INSERT INTO fact_enrolment (student_key, start_date_key, product_key) VALUES
(17, 20230110, 1)
, (20, 20230120, 1)
, (21, 20230220, 1)
, (22, 20230202, 1)
, (23, 20230228, 1)
, (34, 20230206, 1)
, (44, 20230406, 1);
Calculations based on the sample given:
month | adds | withdrawals | existing students |
---|---|---|---|
2023-01 | 4 | 2 | 4 |
2023-02 | 4 | 0 | 6 |
2023-03 | 0 | 0 | 6 |
2023-04 | 2 | 1 | 8 |
Assuming that the withdrawals happen at the end of the month in the above table, they are subtracted in the next month’s count.
In the example to note 2023-03 is still shown with the previous month’s count.
I am using Postgres version 13.15.
2
Answers
You can use aggregate
filter
clause tocount(*)
different things in a single aggregated query. Window functions then let you run two stepping sums of those counts and subtract them to get current balance.demo at db<>fiddle
I’m not sure why after 10 students enrolled and 3 withdrew, you expect to be left with 8. That’s 7:
You need to create "date_spine" dataset that You can use as Your main table during the joins.
The result that You will get is like below: