skip to Main Content

We have a long standing problem with a certain deadlock in our DB. The conflicting queries are almost always these two:

UPDATE db.tbl 
    SET pid = 68111
WHERE pid IS NULL 
AND processed IS NULL 
AND `time` <= now() 
LIMIT 10

and

UPDATE `db`.`tbl` 
    SET `processed` = -1,
        `updated_on` = NOW() 
WHERE (`tbl`.`id` = 108588129)

The pid, id and processed values vary, but the queries are the same.

What I don’t understand is what exactly could deadlock there, since the second query updates a specific record by its ID. There’s just one lock to be had, but you need at least 2 to create a deadlock. There are also no transactions that would lock other rows – each of these queries is standalone.

Why is this happening and how could I avoid it?

Anonymized/shortened output of show engine status innodb;:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-06-08 08:16:27 0x7f70f3ef1700
*** (1) TRANSACTION:
TRANSACTION 2857804352, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2480 lock struct(s), heap size 286928, 9581 row lock(s)
MySQL thread id 25966272, OS thread handle 140144661681920, query id 847014117 x.x.x.x db_user updating
UPDATE db.tbl SET pid = ''68111'' WHERE pid IS NULL AND processed IS NULL AND `time` <= now() LIMIT 10

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 2714 page no 86552 n bits 1552 index processed of table `db`.`tbl` trx id 2857804352 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: SQL NULL;
 1: len 4; hex 8678cf9b; asc  x  ;;

Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: SQL NULL;
 1: len 4; hex 8678cfad; asc  x  ;;

<Snip a lot of Record Locks just like the one above>

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2714 page no 113955 n bits 160 index PRIMARY of table `db`.`tbl` trx id 2857804352 lock_mode X locks rec but not gap waiting
Record lock, heap no 92 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
 0: len 4; hex 8678ec61; asc  x a;;
 1: len 6; hex 0000aa56a25d; asc    V ];;
 2: len 7; hex 0100002bc01686; asc    +   ;;
 3: len 4; hex 803ff583; asc  ?  ;;
 4: len 4; hex 73746f70; asc stop;;
 5: len 4; hex 80000002; asc     ;;
 6: len 1; hex 83; asc  ;;
 7: len 4; hex 80005737; asc   W7;;
 8: len 6; hex 4b4a38363431; asc KJ8641;;
 9: len 8; hex 80000008a515e59b; asc         ;;
 10: len 4; hex 800001be; asc     ;;
 11: len 5; hex 99b050b41a; asc   P  ;;
 12: len 4; hex 84ad8892; asc     ;;
 13: len 3; hex 736d73; asc sms;;
 14: SQL NULL;
 15: SQL NULL;
 16: len 1; hex 7f; asc  ;;
 17: SQL NULL;
 18: SQL NULL;
 19: SQL NULL;
 20: len 4; hex 53746f70; asc Stop;;
 21: len 5; hex 99b050b41b; asc   P  ;;
 22: len 5; hex 99b050b41b; asc   P  ;;


*** (2) TRANSACTION:
TRANSACTION 2857804381, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 25966282, OS thread handle 140122483259136, query id 847014368 x.x.x.x other_db_user updating
UPDATE `db`.`tbl` SET `processed` = ''-1'', `updated_on` = NOW() WHERE (`tbl`.`id` = 108588129)

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2714 page no 113955 n bits 160 index PRIMARY of table `db`.`tbl` trx id 2857804381 lock_mode X locks rec but not gap
Record lock, heap no 92 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
 0: len 4; hex 8678ec61; asc  x a;;
 1: len 6; hex 0000aa56a25d; asc    V ];;
 2: len 7; hex 0100002bc01686; asc    +   ;;
 3: len 4; hex 803ff583; asc  ?  ;;
 4: len 4; hex 73746f70; asc stop;;
 5: len 4; hex 80000002; asc     ;;
 6: len 1; hex 83; asc  ;;
 7: len 4; hex 80005737; asc   W7;;
 8: len 6; hex 4b4a38363431; asc KJ8641;;
 9: len 8; hex 80000008a515e59b; asc         ;;
 10: len 4; hex 800001be; asc     ;;
 11: len 5; hex 99b050b41a; asc   P  ;;
 12: len 4; hex 84ad8892; asc     ;;
 13: len 3; hex 736d73; asc sms;;
 14: SQL NULL;
 15: SQL NULL;
 16: len 1; hex 7f; asc  ;;
 17: SQL NULL;
 18: SQL NULL;
 19: SQL NULL;
 20: len 4; hex 53746f70; asc Stop;;
 21: len 5; hex 99b050b41b; asc   P  ;;
 22: len 5; hex 99b050b41b; asc   P  ;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2714 page no 86552 n bits 1552 index processed of table `db`.`tbl` trx id 2857804381 lock_mode X locks rec but not gap waiting
Record lock, heap no 1462 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: SQL NULL;
 1: len 4; hex 8678ec61; asc  x a;;

*** WE ROLL BACK TRANSACTION (2)

Also, a simplified table definition:

CREATE TABLE `tbl` (
  `id` int NOT NULL AUTO_INCREMENT,
  `pid` int DEFAULT NULL,
  `processed` tinyint DEFAULT NULL,
  `time` datetime DEFAULT NULL,
  `created_on` datetime DEFAULT NULL,
  `updated_on` datetime DEFAULT NULL,

  `other` varchar(10) NOT NULL,
  
  PRIMARY KEY (`id`),
  KEY `ix_other` (`other`),
  KEY `ix_processed` (`processed`),
  KEY `ix_time` (`time`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

I should also note that there are many rows (currently over 5000, but the number constantly varies during the day) with processed=null, pid=null but time in the future.

Added: A bit more context, as per comments:

There are currently a little over 5 million rows in this table. It’s a sort of a queue and we get roughly 1.4M rows per month. The earliest rows are in February. When time is in the future, it’s almost always in the same day.

pid and processed are related to the queue. When a row (job) is inserted, both pid and processed are null. time is when the job should be performed (some jobs need to be performed only AFTER a specified moment in time; others need to be performed ASAP).

When one of the processors is ready to execute some jobs, it starts by running the first query and "reserving" a batch of 10 jobs for itself by settings its PID (Linux process ID) to those rows. It then selects those rows and works through them one-by-one, setting the processed flag when it’s done (the second query).

However sometimes those two queries clash and we get a deadlock.

3

Answers


  1. I have a very limited understanding of locking and deadlocks, so this is over-simplified, as I am not sure "which lock is requested when" (initial secondary index scan, PRIMARY and then other secondary indices impacted by update?) by the main UPDATE query:

    1. Session 1 sends the initial UPDATE query, chooses the ix_processed index as the most selective, and starts locking the index rows, until it has gathered enough rows to satisfy the query.

    2. Session 2 sends the UPDATE on PK query which immediately gets the x-lock on row in the clustered index.

    3. Session 2 tries to apply the update to processed column which requires x-lock on ix_processed but it is already locked by Session 1, so it waits.

    4. Session 1 tries to get the x-lock on the clustered index for rows gathered in 1., but the row being updated is locked by Session 2 (2.), so it waits.

    5. 💣 deadlock 💣


    Your current UPDATE requires a lot more locks than you might think.

    When using the repeatable-read isolation level (select @@session.tx_isolation), which is the default, the exclusive lock will be placed on every row examined for the UPDATE, not just those that are updated.

    You can test this with something like:

    START TRANSACTION;
    
    UPDATE tbl 
        SET pid = 68111
    WHERE pid IS NULL 
    AND processed IS NULL 
    AND `time` <= now() 
    LIMIT 10;
    
    SELECT dl.object_name, dl.index_name, dl.lock_type, dl.lock_mode, dl.lock_status, dl.lock_data, tbl.*
    FROM performance_schema.data_locks dl
    JOIN performance_schema.threads t ON dl.THREAD_ID = t.THREAD_ID
    JOIN tbl ON tbl.id = dl.lock_data
    WHERE dl.object_name = 'tbl'
    AND dl.index_name = 'PRIMARY'
    AND t.PROCESSLIST_ID = CONNECTION_ID();
    
    ROLLBACK;
    

    Adding a composite index on (processed, pid, time) will reduce the number of locks required for the UPDATE statement to just 10.

    ALTER TABLE tbl ADD INDEX ix_processed_pid_time (processed, pid, time);
    

    If you do not want to add the index, then you could try setting a distinct value for the processed column (-2 maybe) in the UPDATE, as this would mean it is immediately excluded from subsequent updates.

    If you are using MySQL 8, another approach might be to use SELECT ... FOR UPDATE SKIP LOCKED:

    START TRANSACTION;
    
    SELECT id FROM tbl
    WHERE pid IS NULL AND processed IS NULL AND `time` <= NOW()
    LIMIT 10
    FOR UPDATE SKIP LOCKED;
    
    UPDATE tbl
        SET pid = 68111,
            processed = -2
    WHERE id IN ( ... );
    
    COMMIT;
    
    Login or Signup to reply.
  2. I would suggest updating your update

    UPDATE db.tbl 
        SET pid = 68111
    WHERE pid IS NULL 
    AND processed IS NULL 
    AND `time` <= now() 
    LIMIT 10
    

    To an update where ID IN (select). At least my thinking about it. The update is locking while it is trying to gather and figure out what it SHOULD update. By pre-querying what you want as a select, and updating based on those IDs, the Select does not lock, finishes getting the few records it needs, then applies the update to them by their key ID.

    So the time index is used first for the select (should be fast), then the update is done based on the PKID returned from the select. Something like

    UPDATE db.tbl 
       SET pid = 68111
       WHERE id in ( select t2.id
                        from db.tbl t2
                        where t2.time <= now()
                          and t2.pid is null
                          and t2.processed is null 
                        limit 10 )    
    

    Now to optimize the Select query, I would have a single index to include the parts it needs without having to go to the raw data pages. So instead of your index on just the TIME column, I would have an index on

    (time, pid, processed, id )
    

    So the entire where is qualified, but also has the ID for returning the outer update WHERE ID IN condition.

    Login or Signup to reply.
  3. "… Understanding …"

    Without a suitable index (see user1191247 answer), too many rows will be locked, leading to a deadlock.

    If you have a recent version of MySQL, EXPLAIN UPDATE ... will provide more information about what index is being used.

    (I doubt if using a subquery will help.)

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search