gaps and islands with date format - Mysql

dannielsen
February 14, 2023

Let's say I have a table like this:

ma_id	act_date
1	2023-01-01
1	2023-01-02
1	2023-01-03
1	2023-01-05
1	2023-01-06
2	2023-02-08
2	2023-02-09

I have read a lot of guides but couldn’t find what I was looking for…
I want a result like this:

ma_id	start_date	end_date
1	2023-01-01	2023-01-03
1	2023-01-05	2023-01-06
2	2023-02-08	2023-02-09

That is, a new row should start each time a date is missing…

Here is a query I have tried, but it was designed for datetime values and not plain date values:

select *, min(act_date), max(act_date)
from (
    select t.*,
           sum(case when prev_act_date >= act_date then 0 else 1 end)
               over (partition by ma_id, date_format(act_date, '%d-%m-%Y') order by act_date) grp
    from (
        select t.*,
               lag(act_date) over (partition by ma_id, date_format(act_date, '%d-%m-%Y') order by act_date) prev_act_date
        from XXXX.XXXX t
        where t.ma_id in (1, 2)
    ) t
) t
group by ma_id, date_format(act_date, '%d-%m-%Y'), grp
order by min(act_date)

Tags: mysql

Answers

- TimBiegeleisen
- February 14, 2023 at 11:17 pm
Assuming MySQL 8+:
```
WITH cte AS (
    SELECT *, DATEDIFF(act_date,
                       LAG(act_date, 1, act_date - INTERVAL 1 DAY) OVER
                           (PARTITION BY ma_id ORDER BY act_date)) - 1 AS diff
    FROM yourTable
),
cte2 AS (
    SELECT *, SUM(diff) OVER (PARTITION BY ma_id ORDER BY act_date) AS grp
    FROM cte
)

SELECT ma_id, MIN(act_date) AS start_date, MAX(act_date) AS end_date
FROM cte2
GROUP BY ma_id, grp
ORDER BY 1, 2;
```
The first CTE computes, for each row, the difference in days to the previous row of the same ma_id, minus one, so that consecutive dates give a value of zero. We use the three-argument form of the LAG() window function, with the previous day as the default, to ensure that the first record in each partition also has a difference of zero. The second CTE takes a running sum of this difference to form a pseudo group number. Note that the group number only changes when there is a date gap greater than one day. Finally, we aggregate by ma_id and pseudo group to find the start and end dates.
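
To see how the pseudo groups form, here is a small sketch that prints the intermediate diff and grp columns for the sample data from the question. The table name islands_demo_lag and the column types are assumptions for illustration only; the CTE is the same as in the query above.

```sql
-- Hypothetical demo table (name and column types are assumptions).
CREATE TABLE islands_demo_lag (ma_id INT, act_date DATE);

INSERT INTO islands_demo_lag VALUES
    (1, '2023-01-01'), (1, '2023-01-02'), (1, '2023-01-03'),
    (1, '2023-01-05'), (1, '2023-01-06'),
    (2, '2023-02-08'), (2, '2023-02-09');

-- Same first CTE as above, then the running sum, without the final GROUP BY.
WITH cte AS (
    SELECT ma_id, act_date,
           DATEDIFF(act_date,
                    LAG(act_date, 1, act_date - INTERVAL 1 DAY) OVER
                        (PARTITION BY ma_id ORDER BY act_date)) - 1 AS diff
    FROM islands_demo_lag
)
SELECT ma_id, act_date, diff,
       SUM(diff) OVER (PARTITION BY ma_id ORDER BY act_date) AS grp
FROM cte
ORDER BY ma_id, act_date;

-- ma_id | act_date   | diff | grp
--     1 | 2023-01-01 |    0 |   0
--     1 | 2023-01-02 |    0 |   0
--     1 | 2023-01-03 |    0 |   0
--     1 | 2023-01-05 |    1 |   1   <- one missing day, so a new group starts
--     1 | 2023-01-06 |    0 |   1
--     2 | 2023-02-08 |    0 |   0
--     2 | 2023-02-09 |    0 |   0
```

Grouping these rows by (ma_id, grp) gives the three islands asked for in the question.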

I will try to explain step by step (using the row_number() function):

1. add a fixed date column to each row, like ‘19700101’

select yt.ma_id, yt.act_date, '19700101' from yourTable yt;

2. add a column `diff` holding the `datediff` between act_date and ‘19700101’

select yt.ma_id, yt.act_date, '19700101', datediff(date(yt.act_date), '19700101') as diff from yourTable yt;

3. add a column `row_num` using row_number() over (partition by yt.ma_id order by yt.act_date)

 select yt.ma_id, yt.act_date, '19700101', datediff(date(yt.act_date), '19700101') as diff, row_number() over (partition by yt.ma_id order by date(yt.act_date)) as row_num from yourTable yt;

4. calculate `diff` - `row_num`; we are interested in the column `diff_minus_row_num`, which stays constant across consecutive dates within one ma_id

 select yt.ma_id, yt.act_date, '19700101', datediff(date(yt.act_date), '19700101') as diff, row_number() over (partition by yt.ma_id order by date(yt.act_date)) as row_num, datediff(date(yt.act_date), '19700101')  -  row_number() over (partition by yt.ma_id order by date(yt.act_date)) as diff_minus_row_num from yourTable yt;

5. use `group by` (on both ma_id and diff_minus_row_num) and `min`, `max` to get the results:

 select outside.ma_id, min(act_date) as start_date, max(act_date) as end_date from (select yt.ma_id, yt.act_date, '19700101', datediff(date(yt.act_date), '19700101') as diff, row_number() over (partition by yt.ma_id order by date(yt.act_date)) as row_num, datediff(date(yt.act_date), '19700101')  -  row_number() over (partition by yt.ma_id order by date(yt.act_date)) as diff_minus_row_num from yourTable yt) outside group by outside.ma_id, outside.diff_minus_row_num order by 1,2;
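
To make steps 2 to 4 concrete, here is a small sketch that prints the intermediate columns for the sample data from the question. The table name islands_demo_rn and the column types are assumptions for illustration only, and the date literal is written as '1970-01-01' instead of '19700101'.

```sql
-- Hypothetical demo table (name and column types are assumptions).
CREATE TABLE islands_demo_rn (ma_id INT, act_date DATE);

INSERT INTO islands_demo_rn VALUES
    (1, '2023-01-01'), (1, '2023-01-02'), (1, '2023-01-03'),
    (1, '2023-01-05'), (1, '2023-01-06'),
    (2, '2023-02-08'), (2, '2023-02-09');

-- diff counts days since the fixed date; row_num counts rows per ma_id.
-- Within a run of consecutive dates both grow by 1, so their difference is constant.
SELECT ma_id, act_date,
       DATEDIFF(act_date, '1970-01-01') AS diff,
       ROW_NUMBER() OVER (PARTITION BY ma_id ORDER BY act_date) AS row_num,
       DATEDIFF(act_date, '1970-01-01')
           - ROW_NUMBER() OVER (PARTITION BY ma_id ORDER BY act_date) AS diff_minus_row_num
FROM islands_demo_rn
ORDER BY ma_id, act_date;

-- ma_id | act_date   | diff  | row_num | diff_minus_row_num
--     1 | 2023-01-01 | 19358 |       1 | 19357
--     1 | 2023-01-02 | 19359 |       2 | 19357
--     1 | 2023-01-03 | 19360 |       3 | 19357
--     1 | 2023-01-05 | 19362 |       4 | 19358   <- gap: diff jumps by 2, row_num by 1
--     1 | 2023-01-06 | 19363 |       5 | 19358
--     2 | 2023-02-08 | 19396 |       1 | 19395
--     2 | 2023-02-09 | 19397 |       2 | 19395
```

The value of diff_minus_row_num is only meaningful within one ma_id. For example, another ma_id whose first date is 2023-01-02 would also get 19358, the same value as the second island of ma_id 1, so the final aggregation should group on ma_id together with diff_minus_row_num.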
