I have a MySQL database table with 15 columns. It has two indexes: a primary key, which is an int (auto increment), and a compound unique index across 4 columns (int, int, int, and datetime).
The table currently has 5.5M entries, and each new multi INSERT (1.2M rows) is taking less than 2 minutes when importing a SQL data file using the following method…
mysql -h localhost -P 3306 -u test_user -D my_database -C < data.sql
However, when I try to perform a similar process to the above, but using a multi UPDATE (only 2,000 rows!) using a single statement and only updating 1 column, e.g…
UPDATE table SET col15 = (CASE WHEN col2=x AND col3=y AND col4=z AND col5=d THEN 'a' ... WHEN col2=x AND col3=y AND col4=z AND col5=d THEN 'b' END);
…it is taking over 9 minutes (NOTE: the columns in the WHEN clause are the columns that make up the unique index).
This doesn’t sound right to me. I appreciate that using a unique index will be slower than using the primary, but there is still a huge difference in performance between these 2 results!
Any thoughts?
2 Answers
True. Updating 2000 rows this way requires:

- Since there is no WHERE clause, all 5.5M rows will be checked.
- For each row, the long CASE expression is evaluated, testing col2 through col5 in every WHEN until one matches.

9 minutes seems high, but you have a combination of checking 5.5M rows and taking extra time on 2000 of them.
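A quick way to confirm the full scan: EXPLAIN works on UPDATE statements since MySQL 5.6. The table and column values below are placeholders for the OP's real schema:

```sql
-- Hypothetical names; the real table has 15 columns and 5.5M rows.
EXPLAIN UPDATE my_table
SET col15 = CASE WHEN col2 = 1 AND col3 = 2 AND col4 = 3
                  AND col5 = '2020-01-01 00:00:00'
                 THEN 'a' END;
-- With no WHERE clause there is nothing for the optimizer to filter on,
-- so the plan has to visit every row in the table.
```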
Assuming you really meant to test col2/3/4 the same way in every WHEN, but vary only col5, I suggest this would be much better: move the common tests into a WHERE clause, together with an index that the WHERE can use (the leading columns of your existing unique index already qualify).

The important thing is to have a WHERE clause that filters. If there is nothing in common between the two WHENs, then it will be faster to use two UPDATEs. If you need them to happen 'simultaneously', then put them in a transaction (BEGIN...COMMIT) to make the two UPDATEs "atomic".

It will be way faster if you separate your UPDATE statement per condition, breaking it down into multiple updates, one for each case. Also, for huge data, filtering on date ranges first, before the other comparisons, should bring an advantage.
One thing to check, from the table structure, is that the indexes are created as BTREE, not HASH.
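Putting the answers together, here is a sketch of the suggested rewrite. The table name and the placeholder values (x, y, z, d1, d2, 'a', 'b') are stand-ins for the OP's real data, not actual names from the question:

```sql
START TRANSACTION;  -- keeps the two UPDATEs "atomic", as suggested above

-- Each UPDATE filters on the leading columns of the existing unique
-- index (col2, col3, col4, col5), so only matching rows are touched.
UPDATE my_table
SET col15 = 'a'
WHERE col2 = x AND col3 = y AND col4 = z AND col5 = d1;

UPDATE my_table
SET col15 = 'b'
WHERE col2 = x AND col3 = y AND col4 = z AND col5 = d2;

COMMIT;

-- Sanity check from the last answer: the Index_type column of the
-- output should read BTREE for both indexes.
SHOW INDEX FROM my_table;
```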