I have the following query:
UPDATE
`temp_table`,
`sms`
SET
`sms`.`to` = `temp_table`.`new_value`
WHERE
`sms`.`to` = `temp_table`.`old_value`;
temp_table has ~200,000 rows.
sms has ~2,000,000 rows.
sms.to, temp_table.new_value, and temp_table.old_value are VARCHAR(255) with indexes.
Values are UK phone numbers.
The update query is so slow it never completes. Does anyone know why?
Explain:
Text results for creating the tables, from phpMyAdmin:
CREATE TABLE `temp_table` (
`old_value` varchar(255) DEFAULT NULL,
`new_value` varchar(255) DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = latin1 COLLATE = latin1_swedish_ci;
ALTER TABLE `temp_table`
ADD KEY `old_value` (`old_value`),
ADD KEY `new_value` (`new_value`);
CREATE TABLE `sms` (
`id` int(11) NOT NULL,
`branch_id` int(11) DEFAULT NULL,
`customer_id` int(11) DEFAULT NULL,
`message_id` int(11) DEFAULT NULL,
`message` text DEFAULT NULL,
`from` varchar(255) DEFAULT NULL,
`to` varchar(255) DEFAULT NULL,
`status` varchar(255) DEFAULT NULL,
`created_at` int(11) DEFAULT NULL,
`updated_at` int(11) DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COLLATE = utf8_general_ci;
ALTER TABLE `sms`
ADD PRIMARY KEY (`id`),
ADD KEY `idx-sms-branch_id` (`branch_id`),
ADD KEY `idx-sms-customer_id` (`customer_id`),
ADD KEY `idx-sms-message_id` (`message_id`),
ADD KEY `idx-sms-to` (`to`),
ADD KEY `idx-sms-created_at` (`created_at`),
ADD KEY `idx-sms-updated_at` (`updated_at`);
ALTER TABLE `sms`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
4 Answers
If the Optimizer chooses to walk through sms (and then JOIN to the other table), then it would be better for temp_table to have INDEX(old_value, new_value) instead of INDEX(old_value).

Why are both message_id and the message text in this table? Possibly having that pair in another table would make this query much faster. (Then have only the message_id in this table.) This is especially important if the text is often quite large.

UPDATEing 200K rows all at once takes much time and RAM. One plan is to do the updates 1K at a time. See chunking.

I can't predict which of these three suggestions will help the most. I recommend doing all of them, if practical.
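A minimal sketch of the first and third suggestions above (the key name old_new and the 1,000-row chunk size are assumptions, not taken from the question):

ALTER TABLE `temp_table`
  ADD KEY `old_new` (`old_value`, `new_value`);

-- Chunked variant of the update: restrict each statement to a slice of
-- sms.id so each pass touches a bounded number of rows; repeat for
-- 1001-2000, 2001-3000, and so on from a script, committing between chunks.
UPDATE `sms`
JOIN `temp_table` ON `sms`.`to` = `temp_table`.`old_value`
SET `sms`.`to` = `temp_table`.`new_value`
WHERE `sms`.`id` BETWEEN 1 AND 1000;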
You don’t have an ON clause, but you are using a WHERE clause:
…
In your case the end result is the same (but in other cases it is not). Conceptually, the "where" filtering happens after the tables have been joined in a Cartesian product. The query optimizer can certainly do something about it and "push down" the WHERE predicate, but I would suggest, for starters, instructing MySQL with the dedicated syntax so it can build an informed execution plan.
If you look at the documentation for UPDATE, it says:
So the example used in the MySQL docs uses one of the possible joins available, and that one is the Cartesian product.
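For instance, the original statement rewritten with the dedicated join syntax might look like this (a sketch; it should update the same rows as the WHERE form):

UPDATE `sms`
INNER JOIN `temp_table`
        ON `sms`.`to` = `temp_table`.`old_value`
SET `sms`.`to` = `temp_table`.`new_value`;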
The EXPLAIN of the UPDATE statement shows the usage of 'old_value' (possible_keys) for temp_table, and the select_type is 'UPDATE' for the sms table; both are different from the OP's output (see further below for my output).
Besides that, there is also an inconsistency in the COLLATE and CHARSET definitions of the two tables, which could also have some side effects on performance.
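If that mismatch is a factor here, one possible fix is to give temp_table the same charset and collation as sms (a sketch; it assumes the latin1 data in temp_table, being phone numbers, converts cleanly to utf8):

ALTER TABLE `temp_table`
  CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

Comparing a latin1 column with a utf8 column forces a character-set conversion inside the join condition, which can prevent MySQL from using the index on the converted side.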
Testing with about 20k sms rows against 20k temp_table rows, my humble machine does it in 0.63 seconds, with all records updated. Note: only the fields relevant to the update query have been filled with unique values.
The Update:
Table temp_table:
Good Luck
You can use MySQL UPDATE with JOIN in your query.
Also, it would be better to index the search column temp_table.old_value, so the lookup is faster.