MySQL update query against a certain number of consecutive rows

trifola
April 3, 2023
233 views
2 votes
3 Answers

this is an example of initial table data:

      |   id     | reservation |
      | -------- | ----------- |
      |    1     |      15     |
      |    2     |      0      |
      |    3     |      0      |
      |    4     |      16     |
      |    5     |      16     |
      |    6     |      0      |
      |    7     |      0      |
      |    8     |      0      |
      |    9     |      17     |
      |    10    |      17     |
      |    11    |      17     |
      |    12    |      17     |
      |    13    |      17     |
      |    14    |      17     |
      |    15    |      0      |
      |    16    |      0      |
      |    17    |      0      |
      |    18    |      0      |
      ...

I would like to update the table with reservation = 18 only where I find the first 3 consecutive rows without resevation, that is reservation = 0.

After the update, the resulting table would be:

      |   id     | reservation |
      | -------- | ----------- |
      |    1     |      15     |
      |    2     |      0      |
      |    3     |      0      |
      |    4     |      16     |
      |    5     |      16     |
      |    6     |      18     |<
      |    7     |      18     |<
      |    8     |      18     |<
      |    9     |      17     |
      |    10    |      17     |
      |    11    |      17     |
      |    12    |      17     |
      |    13    |      17     |
      |    14    |      17     |
      |    15    |      0      |
      |    16    |      0      |
      |    17    |      0      |
      |    18    |      0      |
      ...

Is it possible using Recursive Common Table Expressions?
How can i make it?

Thanks!

Answers

- PanagiotisKanavos
- April 3, 2023 at 10:20 am
- 0 votes
0
This is a gaps-and-islands problem. You’re trying to find islands of consecutive ids where reservation is 0. Problems like this can’t be solved with recursive CTEs. Typically, they’re solved using windowing and ranking functions like LAG() or ROW_NUMBER(). Such functions were added to MySQL in version 8.0. This means that most articles on gaps-and-islands refer to different databases. The functionality is the same though.

One way to solve the problem is to take advantage of the consecutive IDs. The ROW_NUMBER() ranking function returns the row number in the subset of rows without reservations. The difference of the row number from ID will be the same inside each island. The following query :
```
select 
    ID,
    Reservation, 
    ROW_NUMBER() OVER(ORDER BY ID) as Row_ID,
    ID-ROW_NUMBER() OVER(ORDER BY ID) as island_id
FROM Reservations
Where Reservation=0
```
Returns
```
ID  Reservation Row_ID  island_id
2   0   1   1
3   0   2   1
6   0   3   3
7   0   4   3
8   0   5   3
15  0   6   9
16  0   7   9
17  0   8   9
18  0   9   9
```
Once we have a way to identify the islands, we can find islands with exactly 3 members:
```
with islands as (
    select 
        ID,
        Reservation, 
        ROW_NUMBER() OVER(ORDER BY ID) as Row_ID,
        ID-ROW_NUMBER() OVER(ORDER BY ID) as island_id
    FROM Reservations
    Where Reservation=0
)
select 
    MIN(ID) AS island_from,
    MAX(ID) AS island_to,
    ROW_NUMBER() OVER (ORDER BY island_id) as Range_ID
from islands
group by island_id
having count(*)=3
```
This returns
```
island_from island_to Range_ID
6           8         1
```
ROW_NUMBER() OVER (ORDER BY island_id) as Range_ID is used to give an identifier to each range, allowing us to update the IDs returned in the first (or last) range only.

This can be converted to a CTE itself and used to update the Reservations table :
```
with islands as (
    select 
        ID,
        Reservation, 
        ROW_NUMBER() OVER(ORDER BY ID) as Row_ID,
        ID - ROW_NUMBER() OVER(ORDER BY ID) as island_id
    FROM Reservations
    Where Reservation=0
), 
ranges as (
    select 
        MIN(ID) AS island_from,
        MAX(ID) AS island_to,
        ROW_NUMBER() OVER (ORDER BY island_id) as Range_ID
    from islands
    group by island_id
    having count(*)=3
    )
UPDATE reservations r 
    INNER JOIN ranges on r.ID BETWEEN island_from and island_to
SET r.Reservation=18
WHERE Range_ID=1;
```
The UPDATE ... JOIN is a MySQL peculiarity. In other databases (PostgreSQL, SQL Server) you’d write
```
UPDATE reservations
SET Reservation=18
FROM reservations inner join ranges 
    on reservations.id between island_from and island_to
WHERE Range_ID=1;
```
Why count(*)=3 ?

This avoids fragmentation of the remaining islands. If there were three ranges with 5,3, and 6 rooms free, selecting the first available range would result in a new 2-room range that would be harder to use. Selecting the exact match would create no new range while selecting the largest would create a 3-room range, that would be more useful.

This is a common problem in disk and RAM allocation – or any allocation scenario.

To take advantage of larger areas one would need to come up with a strategy that reduces fragmentation. This can be done by changing the Range_ID order

The first strategy, using (ORDER BY island_id) will keep picking the same ranges first.

A best match strategy can be specified with ORDER BY MAX(ID)-MIN(ID) DESC:
```
    ROW_NUMBER() OVER (ORDER BY MAX(ID)-MIN(ID)) as Range_ID
from islands
group by island_id
having count(*)>=3
```
Surprisingly, this is not a good strategy as it results in a lot of small islands. That’s why file systems try to avoid it for larger files

A worst match strategy results in less fragmentation and larger ranges
```
ROW_NUMBER() OVER (ORDER BY MAX(ID)-MIN(ID) DESC) as Range_ID
```
Login or Signup to reply.

- user1191247
- April 3, 2023 at 10:21 am
- 0 votes
0
You can use ROW_NUMBER() to give a continuous sequence, subtracted from your id column, to find groups of consecutive 0 values:
```
UPDATE reservations r
JOIN (
    SELECT *, COUNT(*) OVER (PARTITION BY grp) AS consecutive_available
    FROM (
        SELECT id, reservation, id - ROW_NUMBER() OVER (ORDER BY id ASC) AS grp
        FROM reservations
        WHERE reservation = 0
    ) t1
) t2 ON r.id = t2.id
SET r.reservation = 18
WHERE consecutive_available >= 3
ORDER BY t2.id ASC
LIMIT 3;
```
The innermost subquery (t1) gives us the groups of consecutive ids and then the next subquery (t2) gives us the count within each group.

id reservation grp consecutive_available

2 0 1 2

3 0 1 2

6 0 3 3

7 0 3 3

8 0 3 3

15 0 9 4

16 0 9 4

17 0 9 4

18 0 9 4

If you are concerned about fragmentation, which you should be as pointed out by Panagiotis, you can:
```
ORDER BY
    IF(t2.consecutive_available = 3, 999999, t2.consecutive_available) DESC,
    t2.id ASC
```
which will give you an exact match on the gap size if available (by replacing it with high out of range value), otherwise it will take the largest gap to reduce the number of tiny gaps created.

Here’s a db<>fiddle showing the steps.
Login or Signup to reply.

id	grp	consecutive_available
2	1	2
3	1	2
6	3	3
7	3	3
8	3	3
15	9	4
16	9	4
17	9	4
18	9	4

For MySQL 8.0 or versions below, it’s possible to do it without using a window function or doing the row_number sequencing (supposing all your id values are consecutive). Since you only require 3 consecutive 0, we can use table joining. Check the steps below:

-- First of all, we shall get the initial id which satifies 3 consecutive 0 as the reservation value:
select t1.id
from test t1
join test t2
on t2.id=t1.id+1
join test t3
on t3.id=t2.id+1
where t1.reservation=0 and t2.reservation=0 and t3.reservation=0
order by id
limit 1
; 

-- result:
+------+
| id   |
+------+
|    6 |
+------+

Then , Use CASE to replace the reservation value whose id has the value between the calculated initial id and two ids right after it by using BETWEEN.

select id, 
case when id between (select t1.id
                    from test t1
                    join test t2
                    on t2.id=t1.id+1
                    join test t3
                    on t3.id=t2.id+1
                    where t1.reservation=0 and t2.reservation=0 and t3.reservation=0
                    order by id
                    limit 1) 
                AND
                    (select t1.id
                    from test t1
                    join test t2
                    on t2.id=t1.id+1
                    join test t3
                    on t3.id=t2.id+1
                    where t1.reservation=0 and t2.reservation=0 and t3.reservation=0
                    order by id
                    limit 1) +2 
    then 18
    else reservation end as reservation
from test
order by id;    

-- result:
+------+-------------+
| id   | reservation |
+------+-------------+
|    1 |          15 |
|    2 |           0 |
|    3 |           0 |
|    4 |          16 |
|    5 |          16 |
|    6 |          18 |
|    7 |          18 |
|    8 |          18 |
|    9 |          17 |
|   10 |          17 |
|   11 |          17 |
|   12 |          17 |
|   13 |          17 |
|   14 |          17 |
|   15 |           0 |
|   16 |           0 |
|   17 |           0 |
|   18 |           0 |
+------+-------------+

By the way, if ID is indeed consecutive and unique, please make sure it’s made the table’s PRIMARY KEY to improve performance.

-- without any index for id, here is the execution plan, which resort to full table scan for every single table:
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
|  1 | PRIMARY     | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |   100.00 | Using filesort                                     |
|  3 | SUBQUERY    | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |    10.00 | Using where; Using temporary; Using filesort       |
|  3 | SUBQUERY    | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |     9.09 | Using where; Using join buffer (Block Nested Loop) |
|  3 | SUBQUERY    | t3    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |     9.09 | Using where; Using join buffer (Block Nested Loop) |
|  2 | SUBQUERY    | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |    10.00 | Using where; Using temporary; Using filesort       |
|  2 | SUBQUERY    | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |     9.09 | Using where; Using join buffer (Block Nested Loop) |
|  2 | SUBQUERY    | t3    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |     9.09 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+


-- after id set to PK, execution plan significantly improves by using PK and eq_ref as the access type:
+----+-------------+-------+------------+--------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+------+------+----------+-------------+
|  1 | PRIMARY     | test  | NULL       | index  | NULL          | PRIMARY | 4       | NULL |   18 |   100.00 | NULL        |
|  3 | SUBQUERY    | t1    | NULL       | index  | NULL          | PRIMARY | 4       | NULL |   18 |    10.00 | Using where |
|  3 | SUBQUERY    | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | func |    1 |    10.00 | Using where |
|  3 | SUBQUERY    | t3    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | func |    1 |    10.00 | Using where |
|  2 | SUBQUERY    | t1    | NULL       | index  | NULL          | PRIMARY | 4       | NULL |   18 |    10.00 | Using where |
|  2 | SUBQUERY    | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | func |    1 |    10.00 | Using where |
|  2 | SUBQUERY    | t3    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | func |    1 |    10.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------+---------+------+------+----------+-------------+
 
-- this is the query time and number of examined rows without a PK:
Query_time: 0.001728  Lock_time: 0.000602 Rows_sent: 18  Rows_examined: 152

-- this is the query time and number of examined rows with ID being PK:
Query_time: 0.001666  Lock_time: 0.001178 Rows_sent: 18  Rows_examined: 40

Please signup or login to give your own answer.

Click here to cancel reply.