I have the following query:
UPDATE
`temp_table`,
`sms`
SET
`sms`.`to` = `temp_table`.`new_value`
WHERE
`sms`.`to` = `temp_table`.`old_value`;
temp_table has ~200,000 rows.
sms has ~2,000,000 rows.
sms.to, temp_table.new_value, and temp_table.old_value are VARCHAR(255) with indexes.
Values are UK phone numbers.
The update query is so slow it never completes. Does anyone know why?
Explain:
Text results for creating the tables, from phpMyAdmin:
CREATE TABLE `temp_table` (
`old_value` varchar(255) DEFAULT NULL,
`new_value` varchar(255) DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = latin1 COLLATE = latin1_swedish_ci;
ALTER TABLE `temp_table`
ADD KEY `old_value` (`old_value`),
ADD KEY `new_value` (`new_value`);
CREATE TABLE `sms` (
`id` int(11) NOT NULL,
`branch_id` int(11) DEFAULT NULL,
`customer_id` int(11) DEFAULT NULL,
`message_id` int(11) DEFAULT NULL,
`message` text DEFAULT NULL,
`from` varchar(255) DEFAULT NULL,
`to` varchar(255) DEFAULT NULL,
`status` varchar(255) DEFAULT NULL,
`created_at` int(11) DEFAULT NULL,
`updated_at` int(11) DEFAULT NULL
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COLLATE = utf8_general_ci;
ALTER TABLE `sms`
ADD PRIMARY KEY (`id`),
ADD KEY `idx-sms-branch_id` (`branch_id`),
ADD KEY `idx-sms-customer_id` (`customer_id`),
ADD KEY `idx-sms-message_id` (`message_id`),
ADD KEY `idx-sms-to` (`to`),
ADD KEY `idx-sms-created_at` (`created_at`),
ADD KEY `idx-sms-updated_at` (`updated_at`);
ALTER TABLE `sms`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
4 Answers
If the Optimizer chooses to walk through sms (and then JOIN to the other table), then it would be better for temp_table to have INDEX(old_value, new_value) instead of INDEX(old_value).

Why are both message_id and the message text in this table? Possibly having that pair in another table would make this query much faster. (Then have only the message_id in this table.) This is especially important if the text is often quite large.

UPDATEing 200K rows all at once takes much time and RAM. One plan is to do the updates 1K at a time. See chunking.

I can't predict which of these three suggestions will help the most. I recommend doing all of them, if practical.
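A minimal sketch of the first and third suggestions above (the key name old_new and the 1,000-row chunk size are assumptions, not taken from the question):

ALTER TABLE `temp_table`
  ADD KEY `old_new` (`old_value`, `new_value`);

-- Chunked variant of the update: restrict each statement to a slice of
-- sms.id so each pass touches a bounded number of rows; repeat for
-- 1001-2000, 2001-3000, and so on from a script, committing between chunks.
UPDATE `sms`
JOIN `temp_table` ON `sms`.`to` = `temp_table`.`old_value`
SET `sms`.`to` = `temp_table`.`new_value`
WHERE `sms`.`id` BETWEEN 1 AND 1000;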
You don’t have an ON clause, but you are using a WHERE clause:
…
In your case the end result is the same (but in other cases it is not). Conceptually, the "where" filtering happens after the tables have been joined in a Cartesian product. The query optimizer can certainly do something about it and "push down" the WHERE predicate, but I would suggest, for starters, instructing MySQL with the dedicated syntax so it can build an informed execution plan.
If you look at the documentation for UPDATE, it says:
So the example used in the MySQL docs uses one of the possible joins available, and that one is the Cartesian product.
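For instance, the original statement rewritten with the dedicated join syntax might look like this (a sketch; it should update the same rows as the WHERE form):

UPDATE `sms`
INNER JOIN `temp_table`
        ON `sms`.`to` = `temp_table`.`old_value`
SET `sms`.`to` = `temp_table`.`new_value`;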
The EXPLAIN of the UPDATE statement shows the usage of 'old_value' (possible_keys) for temp_table, and the select_type is 'UPDATE' for the sms table; both are different from the OP's output (see further below for my output).
Besides that, there is also an inconsistency in the COLLATE and CHARSET definitions of the two tables, which could also have some side effects on performance.
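If that mismatch is a factor here, one possible fix is to give temp_table the same charset and collation as sms (a sketch; it assumes the latin1 data in temp_table, being phone numbers, converts cleanly to utf8):

ALTER TABLE `temp_table`
  CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

Comparing a latin1 column with a utf8 column forces a character-set conversion inside the join condition, which can prevent MySQL from using the index on the converted side.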
Testing with about 20k sms rows against 20k temp_table rows, my humble machine does it in 0.63 seconds, with all records updated. Note: only the fields relevant to the update query have been filled with unique values.
The Update:
Table temp_table:
Good Luck
You can use MySQL UPDATE with JOIN in your query.
Also, it would be better to index the search column temp_table.old_value, so the lookup is faster.