I have a gaps and islands SQL problem combined with a recursion.
Background: I run a PostgreSQL database with school vacations and bank holidays from which I render calendar views for schools, city and what not. I code a lot of the logic with the programming language on top but try to move it into SQL to get better performance.
The table locations
stores schools (e.g. "Example school"), federal states (e.g. "Baden-Württemberg") and countries (e.g. "Germany"). The country is a parent of the federal state. A federal state is a parent of a school. All have different school vacations and bank holidays. Example for the current December:
Vacations, bank holidays (blue) and what not are stored in the entries
table. Depending on their hierarchy they are linked to different locations (e.g. country, federal state or school). Weekends (gray) belong to the country. Normal school vacations (green) to the federal state. Special events belong to the school. They often overlap.
Assuming I want to query all entries
from the location with the id
2 and want to know how many days (duration) an entry has, I can do this with
SELECT
e.starts_on,
e.ends_on,
e.name,
(e.ends_on - e.starts_on + 1) AS days,
l.name AS location_name
FROM
entries e
INNER JOIN locations l ON e.location_id = l.id
WHERE
l.id = 2;
This results in
starts_on | ends_on | name | days | location_name |
---|---|---|---|---|
2022-12-21 | 2023-01-07 | Christmas school vacation | 18 | Baden-Württemberg |
2023-01-06 | 2023-01-06 | Bank Holiday | 1 | Baden-Württemberg |
But any student wants to know the total number of days there is no school. Including adjoining weekends or bank holidays.
That could be solved with Find rows with adjourning date ranges and accumulate their durations
But I have trouble to combine that with the recursion (searching for a school but actually search for a school, a federal state and a country).
Question
How can I select the days
of the entry and the total_days
of all the adjoining entries of either the location and of all parents of the location? And can I also aggregate the entry_ids
that adjoin or overlap in one attribute too?
I want to search for the "Christmas school vacation" of the school with the id
3 and get this result:
starts_on | ends_on | name | days | location_name | total_days | aggr_location_ids | real_start | real_end |
---|---|---|---|---|---|---|---|---|
2022-12-21 | 2023-01-07 | Christmas school vacation | 18 | Baden-Württemberg | 19 | [1, 2, 3, 4, 6, 7, 8] | 2022-12-21 | 23-01-08 |
Setup with PostgreSQL
https://www.db-fiddle.com/f/jVPQoFPQw7axf7XivwpiPZ/0
CREATE TABLE locations (
id serial PRIMARY KEY,
name varchar(255) NOT NULL,
parent_id integer REFERENCES locations (id)
);
CREATE TABLE entries (
id serial PRIMARY KEY,
starts_on date NOT NULL,
ends_on date NOT NULL,
name varchar(255) NOT NULL,
location_id integer REFERENCES locations (id) NOT NULL
);
ALTER TABLE entries ADD CONSTRAINT ends_after_starts CHECK (ends_on >= starts_on);
INSERT INTO locations (name) VALUES ('Germany');
INSERT INTO locations (name, parent_id) VALUES ('Baden-Württemberg', 1);
INSERT INTO locations (name, parent_id) VALUES ('Example school', 2);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-21', '2023-01-07', 'Christmas school vacation', 2);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-25', '2022-12-26', 'Christmas', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-01', '2023-01-01', 'New Year', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-06', '2023-01-06', 'Bank Holiday', 2);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-17', '2022-12-18', 'Weekend', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-24', '2022-12-25', 'Weekend', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-31', '2023-01-01', 'Weekend', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-07', '2023-01-08', 'Weekend', 1);
INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-14', '2023-01-15', 'Weekend', 1);
2
Answers
Hello there is a solution using WITH RECURSIVE
The first query from WITH is used as a pivot and get stored inside of WITH rec(eid , lid , parent_id ) , the next query from union ALL
does a SELECT using the vales from REC , uses itself to relies under recursivity.
The query below first produces the recursive lookup for all the location entries and then, in
get_vacations
, searches for vacations that are adjacent to the holiday originally searched for:See fiddle.