This is my first question on the platform and I am beginning the SQL journey! 😀
I have a bit of a problem with my code and i do not know what I overlooked
I have a table ‘Vacations’ with 3 columns in Mysql
marca, which is unique code for a person’s name
start_date which is the first day of his vacation
and end_date which is the last one.
This is a portion of my table:
marca | start_date | end_date | |
---|---|---|---|
750001 | 2022-01-12 | 2022-01-25 | |
750001 | 2022-09-30 | 2022-10-15 | |
750002 | 2022-03-30 | 2022-03-30 | |
750002 | 2022-04-09 | 2022-04-10 | |
750002 | 2022-06-15 | 2022-06-29 | |
750003 | 2022-02-20 | 2022-02-20 | |
750003 | 2022-05-16 | 2022-05-20 | (16-20 it’s Monday to Friday) |
750003 | 2022-05-23 | 2022-05-27 | (same) |
750003 | 2022-05-30 | 2022-06-03 | (same) |
750003 | 2022-06-06 | 2022-06-07 | (Monday and Tuesday) |
750003 | 2022-08-02 | 2022-08-23 | |
750003 | 2022-08-23 | 2022-08-25 |
I need to see which people exceeds 10 free days.
Some people have it clear how they have it, like 750001
But some people have vacation from Monday to Friday and then again starting with the next Monday.
Therefor, I want my code to take in consideration that and combine those 2 (or more) segments of vacation that are separated by the same weekend and also if the next start_date is exactly the next day after the last end_date to combine them.
So the resulting table should be looking like:
marca | start_date | end_date |
---|---|---|
750001 | 2022-01-12 | 2022-01-25 |
750001 | 2022-09-30 | 2022-10-15 |
750002 | 2022-06-15 | 2022-06-29 |
750003 | 2022-05-16 | 2022-06-07 |
750003 | 2022-08-02 | 2022-08-25 |
This is the code:
SELECT marca, start_date, end_date
FROM (
SELECT marca, start_date,
CASE WHEN DATE_FORMAT(end_date, '%W') = 'Friday' AND DATE_FORMAT(LEAD(start_date) OVER (PARTITION BY marca ORDER BY start_date), '%W') = 'Monday'
AND DATEDIFF(LEAD(start_date) OVER (PARTITION BY marca ORDER BY start_date), end_date) = 3
THEN LEAD(end_date) OVER (PARTITION BY marca ORDER BY start_date)
ELSE end_date
END AS end_date
FROM vacations
) AS subquery
WHERE start_date <= end_date
ORDER BY marca, start_date;
It doesn’t do nothing 🙁
2
Answers
Your current query does exactly what you have asked of it. It extends the
end_date
of theCURRENT ROW
to that of theLEAD ROW
when the conditions are met:One approach to this problem is to treat it as a gaps-and-islands problem, as you are looking for sequences of consecutive free days.
We start by creating a list of all dates we are interested in:
Note: cte_max_recursion_depth has a default value of 1000, causing the CTE to terminate when it recurses past 1000 levels. You can read more here – Limiting Common Table Expression Recursion.
If you are running this query for a period greater than 1000 days you will need to increase the recursion depth:
We can then
CROSS JOIN
the calendar to a distinct list of the employees we are interested in (I have used a DISTINCT query on the vacations table), and thenLEFT JOIN
to vacations to get a full list of when employees have afree_day
, either because they are on vacation or it is a weekend day (v.id IS NOT NULL OR WEEKDAY(c.dt) IN (5, 6)
).To look for sequences of consecutive free-days we are using
c.dt - INTERVAL (DENSE_RANK() OVER (PARTITION BY e.marca ORDER BY c.dt)) DAY
which can be illustrated by:ROW_NUMBER() is commonly used for the above grouping but, due to your example having an employee with two vacation entries for the same date (750003, 2022-08-23), we need to use DENSE_RANK() instead.
Output:
Here’s a db<>fiddle
Following should list any date sequence that extends or overlaps a prior vacation. Merging the two so you see both is an enhancement you should be able to do from this:
It works by adjusting every start date that is Sunday or Monday to the preceding Saturday date… and converting every end date that is Friday or Saturday to the following Sunday date. Then, it looks for instances where there’s an overlap in dates (which there will be in your problem case).