skip to Main Content

this is an example of initial table data:

      |   id     | reservation |
      | -------- | ----------- |
      |    1     |      15     |
      |    2     |      0      |
      |    3     |      0      |
      |    4     |      16     |
      |    5     |      16     |
      |    6     |      0      |
      |    7     |      0      |
      |    8     |      0      |
      |    9     |      17     |
      |    10    |      17     |
      |    11    |      17     |
      |    12    |      17     |
      |    13    |      17     |
      |    14    |      17     |
      |    15    |      0      |
      |    16    |      0      |
      |    17    |      0      |
      |    18    |      0      |
      ...

I would like to update the table with reservation = 18 only where I find the first 3 consecutive rows without resevation, that is reservation = 0.

After the update, the resulting table would be:

      |   id     | reservation |
      | -------- | ----------- |
      |    1     |      15     |
      |    2     |      0      |
      |    3     |      0      |
      |    4     |      16     |
      |    5     |      16     |
      |    6     |      18     |<
      |    7     |      18     |<
      |    8     |      18     |<
      |    9     |      17     |
      |    10    |      17     |
      |    11    |      17     |
      |    12    |      17     |
      |    13    |      17     |
      |    14    |      17     |
      |    15    |      0      |
      |    16    |      0      |
      |    17    |      0      |
      |    18    |      0      |
      ...

Is it possible using Recursive Common Table Expressions?
How can i make it?

Thanks!

3

Answers


  1. This is a gaps-and-islands problem. You’re trying to find islands of consecutive ids where reservation is 0. Problems like this can’t be solved with recursive CTEs. Typically, they’re solved using windowing and ranking functions like LAG() or ROW_NUMBER(). Such functions were added to MySQL in version 8.0. This means that most articles on gaps-and-islands refer to different databases. The functionality is the same though.

    One way to solve the problem is to take advantage of the consecutive IDs. The ROW_NUMBER() ranking function returns the row number in the subset of rows without reservations. The difference of the row number from ID will be the same inside each island. The following query :

    select 
        ID,
        Reservation, 
        ROW_NUMBER() OVER(ORDER BY ID) as Row_ID,
        ID-ROW_NUMBER() OVER(ORDER BY ID) as island_id
    FROM Reservations
    Where Reservation=0
    

    Returns

    ID  Reservation Row_ID  island_id
    2   0   1   1
    3   0   2   1
    6   0   3   3
    7   0   4   3
    8   0   5   3
    15  0   6   9
    16  0   7   9
    17  0   8   9
    18  0   9   9
    

    Once we have a way to identify the islands, we can find islands with exactly 3 members:

    
    with islands as (
        select 
            ID,
            Reservation, 
            ROW_NUMBER() OVER(ORDER BY ID) as Row_ID,
            ID-ROW_NUMBER() OVER(ORDER BY ID) as island_id
        FROM Reservations
        Where Reservation=0
    )
    select 
        MIN(ID) AS island_from,
        MAX(ID) AS island_to,
        ROW_NUMBER() OVER (ORDER BY island_id) as Range_ID
    from islands
    group by island_id
    having count(*)=3
    

    This returns

    island_from island_to Range_ID
    6           8         1
    

    ROW_NUMBER() OVER (ORDER BY island_id) as Range_ID is used to give an identifier to each range, allowing us to update the IDs returned in the first (or last) range only.

    This can be converted to a CTE itself and used to update the Reservations table :

    with islands as (
        select 
            ID,
            Reservation, 
            ROW_NUMBER() OVER(ORDER BY ID) as Row_ID,
            ID - ROW_NUMBER() OVER(ORDER BY ID) as island_id
        FROM Reservations
        Where Reservation=0
    ), 
    ranges as (
        select 
            MIN(ID) AS island_from,
            MAX(ID) AS island_to,
            ROW_NUMBER() OVER (ORDER BY island_id) as Range_ID
        from islands
        group by island_id
        having count(*)=3
        )
    UPDATE reservations r 
        INNER JOIN ranges on r.ID BETWEEN island_from and island_to
    SET r.Reservation=18
    WHERE Range_ID=1;
    

    The UPDATE ... JOIN is a MySQL peculiarity. In other databases (PostgreSQL, SQL Server) you’d write

    UPDATE reservations
    SET Reservation=18
    FROM reservations inner join ranges 
        on reservations.id between island_from and island_to
    WHERE Range_ID=1;
    

    Why count(*)=3 ?

    This avoids fragmentation of the remaining islands. If there were three ranges with 5,3, and 6 rooms free, selecting the first available range would result in a new 2-room range that would be harder to use. Selecting the exact match would create no new range while selecting the largest would create a 3-room range, that would be more useful.

    This is a common problem in disk and RAM allocation – or any allocation scenario.

    To take advantage of larger areas one would need to come up with a strategy that reduces fragmentation. This can be done by changing the Range_ID order

    The first strategy, using (ORDER BY island_id) will keep picking the same ranges first.

    A best match strategy can be specified with ORDER BY MAX(ID)-MIN(ID) DESC:

        ROW_NUMBER() OVER (ORDER BY MAX(ID)-MIN(ID)) as Range_ID
    from islands
    group by island_id
    having count(*)>=3
    

    Surprisingly, this is not a good strategy as it results in a lot of small islands. That’s why file systems try to avoid it for larger files

    A worst match strategy results in less fragmentation and larger ranges

    ROW_NUMBER() OVER (ORDER BY MAX(ID)-MIN(ID) DESC) as Range_ID
    
    Login or Signup to reply.
  2. You can use ROW_NUMBER() to give a continuous sequence, subtracted from your id column, to find groups of consecutive 0 values:

    UPDATE reservations r
    JOIN (
        SELECT *, COUNT(*) OVER (PARTITION BY grp) AS consecutive_available
        FROM (
            SELECT id, reservation, id - ROW_NUMBER() OVER (ORDER BY id ASC) AS grp
            FROM reservations
            WHERE reservation = 0
        ) t1
    ) t2 ON r.id = t2.id
    SET r.reservation = 18
    WHERE consecutive_available >= 3
    ORDER BY t2.id ASC
    LIMIT 3;
    

    The innermost subquery (t1) gives us the groups of consecutive ids and then the next subquery (t2) gives us the count within each group.

    id reservation grp consecutive_available
    2 0 1 2
    3 0 1 2
    6 0 3 3
    7 0 3 3
    8 0 3 3
    15 0 9 4
    16 0 9 4
    17 0 9 4
    18 0 9 4

    If you are concerned about fragmentation, which you should be as pointed out by Panagiotis, you can:

    ORDER BY
        IF(t2.consecutive_available = 3, 999999, t2.consecutive_available) DESC,
        t2.id ASC
    

    which will give you an exact match on the gap size if available (by replacing it with high out of range value), otherwise it will take the largest gap to reduce the number of tiny gaps created.

    Here’s a db<>fiddle showing the steps.

    Login or Signup to reply.
  3. For MySQL 8.0 or versions below, it’s possible to do it without using a window function or doing the row_number sequencing (supposing all your id values are consecutive). Since you only require 3 consecutive 0, we can use table joining. Check the steps below:

    -- First of all, we shall get the initial id which satifies 3 consecutive 0 as the reservation value:
    select t1.id
    from test t1
    join test t2
    on t2.id=t1.id+1
    join test t3
    on t3.id=t2.id+1
    where t1.reservation=0 and t2.reservation=0 and t3.reservation=0
    order by id
    limit 1
    ; 
    
    -- result:
    +------+
    | id   |
    +------+
    |    6 |
    +------+
    
    

    Then , Use CASE to replace the reservation value whose id has the value between the calculated initial id and two ids right after it by using BETWEEN.

    select id, 
    case when id between (select t1.id
                        from test t1
                        join test t2
                        on t2.id=t1.id+1
                        join test t3
                        on t3.id=t2.id+1
                        where t1.reservation=0 and t2.reservation=0 and t3.reservation=0
                        order by id
                        limit 1) 
                    AND
                        (select t1.id
                        from test t1
                        join test t2
                        on t2.id=t1.id+1
                        join test t3
                        on t3.id=t2.id+1
                        where t1.reservation=0 and t2.reservation=0 and t3.reservation=0
                        order by id
                        limit 1) +2 
        then 18
        else reservation end as reservation
    from test
    order by id;    
    
    -- result:
    +------+-------------+
    | id   | reservation |
    +------+-------------+
    |    1 |          15 |
    |    2 |           0 |
    |    3 |           0 |
    |    4 |          16 |
    |    5 |          16 |
    |    6 |          18 |
    |    7 |          18 |
    |    8 |          18 |
    |    9 |          17 |
    |   10 |          17 |
    |   11 |          17 |
    |   12 |          17 |
    |   13 |          17 |
    |   14 |          17 |
    |   15 |           0 |
    |   16 |           0 |
    |   17 |           0 |
    |   18 |           0 |
    +------+-------------+
    

    By the way, if ID is indeed consecutive and unique, please make sure it’s made the table’s PRIMARY KEY to improve performance.

    -- without any index for id, here is the execution plan, which resort to full table scan for every single table:
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
    | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
    |  1 | PRIMARY     | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |   100.00 | Using filesort                                     |
    |  3 | SUBQUERY    | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |    10.00 | Using where; Using temporary; Using filesort       |
    |  3 | SUBQUERY    | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |     9.09 | Using where; Using join buffer (Block Nested Loop) |
    |  3 | SUBQUERY    | t3    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |     9.09 | Using where; Using join buffer (Block Nested Loop) |
    |  2 | SUBQUERY    | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |    10.00 | Using where; Using temporary; Using filesort       |
    |  2 | SUBQUERY    | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |     9.09 | Using where; Using join buffer (Block Nested Loop) |
    |  2 | SUBQUERY    | t3    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   11 |     9.09 | Using where; Using join buffer (Block Nested Loop) |
    +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
    
    
    -- after id set to PK, execution plan significantly improves by using PK and eq_ref as the access type:
    +----+-------------+-------+------------+--------+---------------+---------+---------+------+------+----------+-------------+
    | id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
    +----+-------------+-------+------------+--------+---------------+---------+---------+------+------+----------+-------------+
    |  1 | PRIMARY     | test  | NULL       | index  | NULL          | PRIMARY | 4       | NULL |   18 |   100.00 | NULL        |
    |  3 | SUBQUERY    | t1    | NULL       | index  | NULL          | PRIMARY | 4       | NULL |   18 |    10.00 | Using where |
    |  3 | SUBQUERY    | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | func |    1 |    10.00 | Using where |
    |  3 | SUBQUERY    | t3    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | func |    1 |    10.00 | Using where |
    |  2 | SUBQUERY    | t1    | NULL       | index  | NULL          | PRIMARY | 4       | NULL |   18 |    10.00 | Using where |
    |  2 | SUBQUERY    | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | func |    1 |    10.00 | Using where |
    |  2 | SUBQUERY    | t3    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | func |    1 |    10.00 | Using where |
    +----+-------------+-------+------------+--------+---------------+---------+---------+------+------+----------+-------------+
     
    -- this is the query time and number of examined rows without a PK:
    Query_time: 0.001728  Lock_time: 0.000602 Rows_sent: 18  Rows_examined: 152
    
    -- this is the query time and number of examined rows with ID being PK:
    Query_time: 0.001666  Lock_time: 0.001178 Rows_sent: 18  Rows_examined: 40
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search