skip to Main Content

I have a gaps and islands SQL problem combined with a recursion.

Background: I run a PostgreSQL database with school vacations and bank holidays from which I render calendar views for schools, city and what not. I code a lot of the logic with the programming language on top but try to move it into SQL to get better performance.

The table locations stores schools (e.g. "Example school"), federal states (e.g. "Baden-Württemberg") and countries (e.g. "Germany"). The country is a parent of the federal state. A federal state is a parent of a school. All have different school vacations and bank holidays. Example for the current December:

enter image description here

Vacations, bank holidays (blue) and what not are stored in the entries table. Depending on their hierarchy they are linked to different locations (e.g. country, federal state or school). Weekends (gray) belong to the country. Normal school vacations (green) to the federal state. Special events belong to the school. They often overlap.

Assuming I want to query all entries from the location with the id 2 and want to know how many days (duration) an entry has, I can do this with

SELECT
    e.starts_on,
    e.ends_on,
    e.name,
    (e.ends_on - e.starts_on + 1) AS days,
    l.name AS location_name
FROM
    entries e
    INNER JOIN locations l ON e.location_id = l.id
WHERE
    l.id = 2;

This results in

starts_on ends_on name days location_name
2022-12-21 2023-01-07 Christmas school vacation 18 Baden-Württemberg
2023-01-06 2023-01-06 Bank Holiday 1 Baden-Württemberg

But any student wants to know the total number of days there is no school. Including adjoining weekends or bank holidays.

That could be solved with Find rows with adjourning date ranges and accumulate their durations

But I have trouble to combine that with the recursion (searching for a school but actually search for a school, a federal state and a country).

Question

How can I select the days of the entry and the total_days of all the adjoining entries of either the location and of all parents of the location? And can I also aggregate the entry_ids that adjoin or overlap in one attribute too?

I want to search for the "Christmas school vacation" of the school with the id 3 and get this result:

starts_on ends_on name days location_name total_days aggr_location_ids real_start real_end
2022-12-21 2023-01-07 Christmas school vacation 18 Baden-Württemberg 19 [1, 2, 3, 4, 6, 7, 8] 2022-12-21 23-01-08

Setup with PostgreSQL

https://www.db-fiddle.com/f/jVPQoFPQw7axf7XivwpiPZ/0

CREATE TABLE locations (
  id serial PRIMARY KEY,
  name varchar(255) NOT NULL,
  parent_id integer REFERENCES locations (id)
);

CREATE TABLE entries (
  id serial PRIMARY KEY,
  starts_on date NOT NULL,
  ends_on date NOT NULL,
  name varchar(255) NOT NULL,
  location_id integer REFERENCES locations (id) NOT NULL
);
ALTER TABLE entries ADD CONSTRAINT ends_after_starts CHECK (ends_on >= starts_on);

INSERT INTO locations (name) VALUES ('Germany');
INSERT INTO locations (name, parent_id) VALUES ('Baden-Württemberg', 1);
INSERT INTO locations (name, parent_id) VALUES ('Example school', 2);

INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-21', '2023-01-07', 'Christmas school vacation', 2);

INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-25', '2022-12-26', 'Christmas', 1);

INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-01', '2023-01-01', 'New Year', 1);

INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-06', '2023-01-06', 'Bank Holiday', 2);

INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-17', '2022-12-18', 'Weekend', 1);

INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-24', '2022-12-25', 'Weekend', 1);

INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2022-12-31', '2023-01-01', 'Weekend', 1);

INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-07', '2023-01-08', 'Weekend', 1);

INSERT INTO entries (starts_on, ends_on, name, location_id) VALUES ('2023-01-14', '2023-01-15', 'Weekend', 1);

2

Answers


  1. Hello there is a solution using WITH RECURSIVE

        WITH RECURSIVE rec( eid , lid ,parent_id    )
        AS (
       SELECT e.id , l.id ,l.parent_id
    FROM entries e
    INNER JOIN locations l ON e.location_id = l.id
    WHERE l.id = 2 
        UNION ALL 
         SELECT rec.eid , b.id , b.parent_id  
         FROM rec  
         JOIN locations AS b ON rec.parent_id = b.id 
        )
        SELECT  
        e.id
        ,e.starts_on,
        e.ends_on,
        
        SUM((e.ends_on - e.starts_on + 1)) AS days
    
        FROM rec 
        JOIN locations AS l ON l.id = rec.lid 
        JOIN entries AS e ON e.id = rec.eid 
        GROUP BY e.id 
    

    The first query from WITH is used as a pivot and get stored inside of WITH rec(eid , lid , parent_id ) , the next query from union ALL
    does a SELECT using the vales from REC , uses itself to relies under recursivity.

    Login or Signup to reply.
  2. The query below first produces the recursive lookup for all the location entries and then, in get_vacations, searches for vacations that are adjacent to the holiday originally searched for:

    with recursive cte as (
       select l.*, jsonb_build_array(l.id) tree from locations l where l.parent_id is null
       union all
       select l.*, c.tree || jsonb_build_array(l.id) from cte c join locations l on l.parent_id = c.id
    ),
    get_vacations(id, t, h_id, r_s, r_e) as (
       select c.id, c.tree, e.id, e.starts_on, e.ends_on 
       from cte c join entries e on exists (select 1 from jsonb_array_elements(c.tree) v 
           where (v.value#>>'{}')::int = e.location_id) 
       where c.id = 3 and e.name = 'Christmas school vacation' -- SEARCH CRITERIA
       union all
       select g.id, g.t, g.h_id, least(e.starts_on, g.r_s), greatest(e.ends_on, g.r_e) 
       from get_vacations g join entries e on exists (select 1 from jsonb_array_elements(g.t) v 
          where (v.value#>>'{}')::int = e.location_id) 
        and e.starts_on = g.r_e or e.ends_on = g.r_s
    )
    select e.starts_on, e.ends_on, e.name, e.ends_on - e.starts_on days, 
       l.name location_name, t.r_e - t.r_s total_days, t.t aggr_location_ids, 
       t.r_s real_start, t.r_e real_end
    from (select g.id, g.t, g.h_id, min(g.r_s) r_s, max(g.r_e) r_e 
          from get_vacations g group by g.id, g.t, g.h_id) t 
    join entries e on e.id = t.h_id 
    join locations l on l.id = (t.t ->> 1)::int
    

    See fiddle.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search