skip to Main Content

This is my first question on the platform and I am beginning the SQL journey! 😀
I have a bit of a problem with my code and i do not know what I overlooked

I have a table ‘Vacations’ with 3 columns in Mysql
marca, which is unique code for a person’s name
start_date which is the first day of his vacation
and end_date which is the last one.

This is a portion of my table:

marca start_date end_date
750001 2022-01-12 2022-01-25
750001 2022-09-30 2022-10-15
750002 2022-03-30 2022-03-30
750002 2022-04-09 2022-04-10
750002 2022-06-15 2022-06-29
750003 2022-02-20 2022-02-20
750003 2022-05-16 2022-05-20 (16-20 it’s Monday to Friday)
750003 2022-05-23 2022-05-27 (same)
750003 2022-05-30 2022-06-03 (same)
750003 2022-06-06 2022-06-07 (Monday and Tuesday)
750003 2022-08-02 2022-08-23
750003 2022-08-23 2022-08-25

I need to see which people exceeds 10 free days.
Some people have it clear how they have it, like 750001
But some people have vacation from Monday to Friday and then again starting with the next Monday.
Therefor, I want my code to take in consideration that and combine those 2 (or more) segments of vacation that are separated by the same weekend and also if the next start_date is exactly the next day after the last end_date to combine them.

So the resulting table should be looking like:

marca start_date end_date
750001 2022-01-12 2022-01-25
750001 2022-09-30 2022-10-15
750002 2022-06-15 2022-06-29
750003 2022-05-16 2022-06-07
750003 2022-08-02 2022-08-25

This is the code:

SELECT marca, start_date, end_date
FROM (
    SELECT marca, start_date,
        CASE WHEN DATE_FORMAT(end_date, '%W') = 'Friday' AND DATE_FORMAT(LEAD(start_date) OVER (PARTITION BY marca ORDER BY start_date), '%W') = 'Monday'
                 AND DATEDIFF(LEAD(start_date) OVER (PARTITION BY marca ORDER BY start_date), end_date) = 3
             THEN LEAD(end_date) OVER (PARTITION BY marca ORDER BY start_date)
             ELSE end_date
        END AS end_date
    FROM vacations
) AS subquery
WHERE start_date <= end_date
ORDER BY marca, start_date;

It doesn’t do nothing 🙁

2

Answers


  1. Your current query does exactly what you have asked of it. It extends the end_date of the CURRENT ROW to that of the LEAD ROW when the conditions are met:

    marca start_date orig_end_date end_date
    750001 2022-01-12 2022-01-25 2022-01-25
    750001 2022-09-30 2022-10-15 2022-10-15
    750002 2022-03-30 2022-03-30 2022-03-30
    750002 2022-04-09 2022-04-10 2022-04-10
    750002 2022-06-15 2022-06-29 2022-06-29
    750003 2022-02-20 2022-02-20 2022-02-20
    750003 2022-05-16 2022-05-20 2022-05-27
    750003 2022-05-23 2022-05-27 2022-06-03
    750003 2022-05-30 2022-06-03 2022-06-07
    750003 2022-06-06 2022-06-07 2022-06-07
    750003 2022-08-02 2022-08-23 2022-08-23
    750003 2022-08-23 2022-08-25 2022-08-25

    One approach to this problem is to treat it as a gaps-and-islands problem, as you are looking for sequences of consecutive free days.

    We start by creating a list of all dates we are interested in:

    WITH RECURSIVE calendar (dt) AS (
        SELECT MIN(start_date) FROM vacations
        UNION ALL
        SELECT dt + INTERVAL 1 DAY FROM calendar WHERE dt < (SELECT MAX(end_date) FROM vacations)
    )
    

    Note: cte_max_recursion_depth has a default value of 1000, causing the CTE to terminate when it recurses past 1000 levels. You can read more here – Limiting Common Table Expression Recursion.

    If you are running this query for a period greater than 1000 days you will need to increase the recursion depth:

    SET SESSION cte_max_recursion_depth = 10000;
    

    We can then CROSS JOIN the calendar to a distinct list of the employees we are interested in (I have used a DISTINCT query on the vacations table), and then LEFT JOIN to vacations to get a full list of when employees have a free_day, either because they are on vacation or it is a weekend day (v.id IS NOT NULL OR WEEKDAY(c.dt) IN (5, 6)).

    To look for sequences of consecutive free-days we are using c.dt - INTERVAL (DENSE_RANK() OVER (PARTITION BY e.marca ORDER BY c.dt)) DAY which can be illustrated by:

    dt dense_rank dt – INTERVAL (dense_rank) DAY
    2022-01-03 1 2022-01-02
    2022-01-04 2 2022-01-02
    2022-01-05 3 2022-01-02
    2022-01-05 3 2022-01-02
    2022-01-06 4 2022-01-02
    2022-01-07 5 2022-01-02
    2022-01-19 6 2022-01-13
    2022-01-20 7 2022-01-13

    ROW_NUMBER() is commonly used for the above grouping but, due to your example having an employee with two vacation entries for the same date (750003, 2022-08-23), we need to use DENSE_RANK() instead.

    WITH RECURSIVE calendar (dt) AS (
        SELECT MIN(start_date) FROM vacations
        UNION ALL
        SELECT dt + INTERVAL 1 DAY FROM calendar WHERE dt < (SELECT MAX(end_date) FROM vacations)
    ),
    free_days AS (
        SELECT c.dt, e.marca, v.start_date, v.end_date,
            c.dt - INTERVAL (DENSE_RANK() OVER (PARTITION BY e.marca ORDER BY c.dt)) DAY AS free_day_grp
        FROM calendar c
        CROSS JOIN (SELECT DISTINCT marca FROM vacations) e
        LEFT JOIN vacations v
            ON c.dt BETWEEN v.start_date AND v.end_date
            AND e.marca = v.marca
        WHERE v.id IS NOT NULL OR WEEKDAY(c.dt) IN (5, 6)
    )
    SELECT marca, MIN(start_date) AS start_date, MAX(end_date) AS end_date
    FROM free_days
    GROUP BY marca, free_day_grp
    HAVING DATEDIFF(end_date, start_date) + 1 > 10;
    
    -- Obviously, the HAVING clause could be written as
    -- HAVING DATEDIFF(end_date, start_date) > 9;
    

    Output:

    marca start_date end_date
    750001 2022-01-12 2022-01-25
    750001 2022-09-30 2022-10-15
    750002 2022-06-15 2022-06-29
    750003 2022-05-16 2022-06-07
    750003 2022-08-02 2022-08-25

    Here’s a db<>fiddle

    Login or Signup to reply.
  2. Following should list any date sequence that extends or overlaps a prior vacation. Merging the two so you see both is an enhancement you should be able to do from this:

    WITH t1 AS(
        SELECT marca
             , CASE DATE_FORMAT(start_date,'%W')
                   WHEN 'Sunday' THEN DATE_ADD(start_date,interval -1 day)
                   WHEN 'Monday' THEN DATE_ADD(start_date,interval -2 day)
                   ELSE start_date
               END AS start_date
             , CASE DATE_FORMAT(end_date, '%W')
                   WHEN 'Friday' THEN DATE_ADD(end_date, interval 2 day)
                   WHEN 'Saturday' THEN DATE_ADD(end_date, interval 1 day)
               ELSE end_date END AS end_date
        FROM vacations )
    SELECT * 
    FROM   t1
    WHERE EXISTS(
        SELECT 1
        FROM t1 sub
        WHERE  sub.marca=t1.marca
           AND sub.end_date BETWEEN t1.start_date AND t1.end_date
           AND sub.start_date <> t1.start_date
         );
    

    It works by adjusting every start date that is Sunday or Monday to the preceding Saturday date… and converting every end date that is Friday or Saturday to the following Sunday date. Then, it looks for instances where there’s an overlap in dates (which there will be in your problem case).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search