We have a long standing problem with a certain deadlock in our DB. The conflicting queries are almost always these two:
UPDATE db.tbl
SET pid = 68111
WHERE pid IS NULL
AND processed IS NULL
AND `time` <= now()
LIMIT 10
and
UPDATE `db`.`tbl`
SET `processed` = -1,
`updated_on` = NOW()
WHERE (`tbl`.`id` = 108588129)
The pid
, id
and processed
values vary, but the queries are the same.
What I don’t understand is what exactly could deadlock there, since the second query updates a specific record by its ID. There’s just one lock to be had, but you need at least 2 to create a deadlock. There are also no transactions that would lock other rows – each of these queries is standalone.
Why is this happening and how could I avoid it?
Anonymized/shortened output of show engine status innodb;
:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-06-08 08:16:27 0x7f70f3ef1700
*** (1) TRANSACTION:
TRANSACTION 2857804352, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2480 lock struct(s), heap size 286928, 9581 row lock(s)
MySQL thread id 25966272, OS thread handle 140144661681920, query id 847014117 x.x.x.x db_user updating
UPDATE db.tbl SET pid = ''68111'' WHERE pid IS NULL AND processed IS NULL AND `time` <= now() LIMIT 10
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 2714 page no 86552 n bits 1552 index processed of table `db`.`tbl` trx id 2857804352 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: SQL NULL;
1: len 4; hex 8678cf9b; asc x ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: SQL NULL;
1: len 4; hex 8678cfad; asc x ;;
<Snip a lot of Record Locks just like the one above>
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2714 page no 113955 n bits 160 index PRIMARY of table `db`.`tbl` trx id 2857804352 lock_mode X locks rec but not gap waiting
Record lock, heap no 92 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
0: len 4; hex 8678ec61; asc x a;;
1: len 6; hex 0000aa56a25d; asc V ];;
2: len 7; hex 0100002bc01686; asc + ;;
3: len 4; hex 803ff583; asc ? ;;
4: len 4; hex 73746f70; asc stop;;
5: len 4; hex 80000002; asc ;;
6: len 1; hex 83; asc ;;
7: len 4; hex 80005737; asc W7;;
8: len 6; hex 4b4a38363431; asc KJ8641;;
9: len 8; hex 80000008a515e59b; asc ;;
10: len 4; hex 800001be; asc ;;
11: len 5; hex 99b050b41a; asc P ;;
12: len 4; hex 84ad8892; asc ;;
13: len 3; hex 736d73; asc sms;;
14: SQL NULL;
15: SQL NULL;
16: len 1; hex 7f; asc ;;
17: SQL NULL;
18: SQL NULL;
19: SQL NULL;
20: len 4; hex 53746f70; asc Stop;;
21: len 5; hex 99b050b41b; asc P ;;
22: len 5; hex 99b050b41b; asc P ;;
*** (2) TRANSACTION:
TRANSACTION 2857804381, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 25966282, OS thread handle 140122483259136, query id 847014368 x.x.x.x other_db_user updating
UPDATE `db`.`tbl` SET `processed` = ''-1'', `updated_on` = NOW() WHERE (`tbl`.`id` = 108588129)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2714 page no 113955 n bits 160 index PRIMARY of table `db`.`tbl` trx id 2857804381 lock_mode X locks rec but not gap
Record lock, heap no 92 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
0: len 4; hex 8678ec61; asc x a;;
1: len 6; hex 0000aa56a25d; asc V ];;
2: len 7; hex 0100002bc01686; asc + ;;
3: len 4; hex 803ff583; asc ? ;;
4: len 4; hex 73746f70; asc stop;;
5: len 4; hex 80000002; asc ;;
6: len 1; hex 83; asc ;;
7: len 4; hex 80005737; asc W7;;
8: len 6; hex 4b4a38363431; asc KJ8641;;
9: len 8; hex 80000008a515e59b; asc ;;
10: len 4; hex 800001be; asc ;;
11: len 5; hex 99b050b41a; asc P ;;
12: len 4; hex 84ad8892; asc ;;
13: len 3; hex 736d73; asc sms;;
14: SQL NULL;
15: SQL NULL;
16: len 1; hex 7f; asc ;;
17: SQL NULL;
18: SQL NULL;
19: SQL NULL;
20: len 4; hex 53746f70; asc Stop;;
21: len 5; hex 99b050b41b; asc P ;;
22: len 5; hex 99b050b41b; asc P ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2714 page no 86552 n bits 1552 index processed of table `db`.`tbl` trx id 2857804381 lock_mode X locks rec but not gap waiting
Record lock, heap no 1462 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: SQL NULL;
1: len 4; hex 8678ec61; asc x a;;
*** WE ROLL BACK TRANSACTION (2)
Also, a simplified table definition:
CREATE TABLE `tbl` (
`id` int NOT NULL AUTO_INCREMENT,
`pid` int DEFAULT NULL,
`processed` tinyint DEFAULT NULL,
`time` datetime DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`other` varchar(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `ix_other` (`other`),
KEY `ix_processed` (`processed`),
KEY `ix_time` (`time`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I should also note that there are many rows (currently over 5000, but the number constantly varies during the day) with processed=null, pid=null
but time
in the future.
Added: A bit more context, as per comments:
There are currently a little over 5 million rows in this table. It’s a sort of a queue and we get roughly 1.4M rows per month. The earliest rows are in February. When time
is in the future, it’s almost always in the same day.
pid
and processed
are related to the queue. When a row (job) is inserted, both pid
and processed
are null
. time
is when the job should be performed (some jobs need to be performed only AFTER a specified moment in time; others need to be performed ASAP).
When one of the processors is ready to execute some jobs, it starts by running the first query and "reserving" a batch of 10 jobs for itself by settings its PID (Linux process ID) to those rows. It then selects those rows and works through them one-by-one, setting the processed
flag when it’s done (the second query).
However sometimes those two queries clash and we get a deadlock.
3
Answers
I have a very limited understanding of locking and deadlocks, so this is over-simplified, as I am not sure "which lock is requested when" (initial secondary index scan, PRIMARY and then other secondary indices impacted by update?) by the main
UPDATE
query:Session 1 sends the initial
UPDATE
query, chooses theix_processed
index as the most selective, and starts locking the index rows, until it has gathered enough rows to satisfy the query.Session 2 sends the
UPDATE on PK
query which immediately gets the x-lock on row in the clustered index.Session 2 tries to apply the update to
processed
column which requires x-lock onix_processed
but it is already locked by Session 1, so it waits.Session 1 tries to get the x-lock on the clustered index for rows gathered in
1.
, but the row being updated is locked by Session 2 (2.
), so it waits.💣 deadlock 💣
Your current
UPDATE
requires a lot more locks than you might think.When using the repeatable-read isolation level (
select @@session.tx_isolation
), which is the default, the exclusive lock will be placed on every row examined for the UPDATE, not just those that are updated.You can test this with something like:
Adding a composite index on
(processed, pid, time)
will reduce the number of locks required for theUPDATE
statement to just 10.If you do not want to add the index, then you could try setting a distinct value for the
processed
column (-2 maybe) in theUPDATE
, as this would mean it is immediately excluded from subsequent updates.If you are using MySQL 8, another approach might be to use
SELECT ... FOR UPDATE SKIP LOCKED
:I would suggest updating your update
To an update where ID IN (select). At least my thinking about it. The update is locking while it is trying to gather and figure out what it SHOULD update. By pre-querying what you want as a select, and updating based on those IDs, the Select does not lock, finishes getting the few records it needs, then applies the update to them by their key ID.
So the time index is used first for the select (should be fast), then the update is done based on the PKID returned from the select. Something like
Now to optimize the Select query, I would have a single index to include the parts it needs without having to go to the raw data pages. So instead of your index on just the TIME column, I would have an index on
So the entire where is qualified, but also has the ID for returning the outer update WHERE ID IN condition.
"… Understanding …"
Without a suitable index (see user1191247 answer), too many rows will be locked, leading to a deadlock.
If you have a recent version of MySQL,
EXPLAIN UPDATE ...
will provide more information about what index is being used.(I doubt if using a subquery will help.)