We have IP allocation tasks(currently blocking one) with the intent of improving we have our goal set out to have it non-blocking with the use of SKIP LOCKED.
Our IP allocation tasks run the query that looks like this
SELECT
ip_addresses.id,
ip_addresses.is_active,
ip_addresses.region_id
FROM ip_addresses
WHERE ip_addresses.is_active=0
and ip_addresses.region_id = ?
LIMIT 1 FOR UPDATE SKIP LOCKED;
// do some work and then.
UPDATE ip_addresses, vnics, boxs
SET vnics.ip_address_id = ip_addresses.id,
ip_addresses.is_active = 1,
vnics.updated_at = now(),
ip_addresses.updated_at = now()
WHERE vnics.box_id = boxs.id
AND ip_addresses.id = ?
AND vnics.name = ?
AND boxs.uuid = ?;
after the additions of SKIP LOCKED, we are seeing some spurious random occurrences of deadlock which we cannot comprehend.
I wrote a small test to replicate this in my local system located here and the basic dump exists here.
the only thing I could find is that if I remove the index idx_unique_ip_address_id
I don’t see any deadlock error.
the MySQL server is our pre-prod and local is 8.0.34
explain ip_addresses
+------------+--------------+------+-----+-------------------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+-------------------+-------------------+
| id | int | NO | PRI | NULL | auto_increment |
| is_active | tinyint(1) | YES | MUL | NULL | |
| region_id | int | YES | | NULL | |
| created_at | datetime | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
| updated_at | datetime | YES | | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
| gateway | varchar(255) | YES | | NULL | |
| ip | varchar(255) | YES | | NULL | |
+------------+--------------+------+-----+-------------------+-------------------+
show index from ip_addresses;
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| ip_addresses | 0 | PRIMARY | 1 | id | A | 3 | NULL | NULL | | BTREE | | | YES | NULL |
| ip_addresses | 1 | idx_is_active | 1 | is_active | A | 2 | NULL | NULL | YES | BTREE | | | YES | NULL |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.01 sec)
explain vnics;
+---------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| name | varchar(255) | YES | MUL | NULL | |
| ip_address_id | int | YES | MUL | NULL | |
| box_id | int | YES | MUL | NULL | |
| updated_at | datetime | YES | | NULL | |
| created_at | datetime | YES | | NULL | |
+---------------+--------------+------+-----+---------+----------------+
show index from vnics;
+-------+------------+--------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+--------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| vnics | 0 | PRIMARY | 1 | id | A | 2 | NULL | NULL | | BTREE | | | YES | NULL |
| vnics | 0 | idx_unique_ip_address_id | 1 | ip_address_id | A | 2 | NULL | NULL | YES | BTREE | | | YES | NULL |
| vnics | 1 | idx_vnic_xid | 1 | ip_address_id | A | 2 | NULL | NULL | YES | BTREE | | | YES | NULL |
| vnics | 1 | idx_vnic_zid | 1 | box_id | A | 2 | NULL | NULL | YES | BTREE | | | YES | NULL |
| vnics | 1 | idx_vnic_name | 1 | name | A | 2 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-------+------------+--------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
5 rows in set (0.01 sec)
explain boxs;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| name | varchar(255) | YES | | NULL | |
| uuid | varchar(255) | YES | MUL | NULL | |
| created_at | datetime | YES | | NULL | |
| updated_at | datetime | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
show index from boxs;
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| boxs | 0 | PRIMARY | 1 | id | A | 2 | NULL | NULL | | BTREE | | | YES | NULL |
| boxs | 1 | idx_uuid_box | 1 | uuid | A | 2 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2
Answers
BEGIN;
— Step 1: Lock and update IP Address
SELECT id, is_active, region_id
FROM ip_addresses
WHERE is_active = 0 AND region_id = ?
LIMIT 1
FOR UPDATE SKIP LOCKED;
— Do some processing here
— Step 2: Update
ip_addresses
to setis_active
to 1UPDATE ip_addresses
SET is_active = 1, updated_at = NOW()
WHERE id = ?;
— Step 3: Update corresponding
vnics
entryUPDATE vnics
SET ip_address_id = ?, updated_at = NOW()
WHERE box_id = (SELECT id FROM boxs WHERE uuid = ?) AND name = ?;
COMMIT;
Turn on innodb_print_all_deadlocks to log deadlock info to error log.
Read innodb locking documentation at: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html if you haven’t already.
What’s happening is when the secondary index is available, the optimizer will use the secondary index column to find matching row(s). As the execution engine goes through rows based on that index, it puts a lock on the secondary index (you can see this in the deadlock info especially in error log). Secondary indexes are a tuple consisting of the value and a pointer to the primary key. MySQL uses this pointer to find the PK, and locks it. Then it can determine if the row is a match to the rest of the where clause.
While this is happening, if another thread/app (or the same app if multi-threaded) inserts or updates the table, it may need an insert intention lock on the index, a gap lock covering the row(s) locked by FOR UPDATE, or a lock on the index to update the value (which MySQL does by deleting the old tuple and inserting a new tuple; hence the insert intention lock). At least partially influenced by timing, if the PK is locked by thread 2 and the secondary index is locked by thread 1 – thread 1 needs to lock the PK but can’t because thread 2 has it locked, while thread 2 needs to lock the secondary but can’t because thread 1 has it locked. And that’s a deadlock.
MySQL does not (currently) skip locked PK when accessed through a secondary index; so when a second transaction has the PK locked and the first transaction has the secondary index locked, it deadlocks.
The fix is to query without skip locked, update row(s) to set them aside for processing, then do the code processing and come back to the DB to update the row(s); or remove the secondary index if the table isn’t too big for that. It looks like a scenario "skip locked" wasn’t designed to handle.