skip to Main Content

I have a MySQL database table with 15 columns. It has 2 indexes; a primary which is an int (auto increment), and a compound unique index across 4 columns, int, int, int and datetime.

The table currently has 5.5M entries, and each new multi INSERT (1.2M rows) is taking less than 2 minutes when importing a SQL data file using the following method…

mysql -h localhost -P 3306 -u test_user -D my_database -C < data.sql

However, when I try to perform a similar process to the above, but using a multi UPDATE (only 2,000 rows!) using a single statement and only updating 1 column, e.g…

UPDATE table SET col15 = (CASE WHEN col2=x AND col3=y AND col4=z and col5=d THEN 'a' ... WHEN col2=x AND col3=y AND col4=z and col5=d THEN 'b' END);

…it is taking over 9 minutes (NOTE: the columns in the WHEN clause are the columns that make up the unique index).

This doesn’t sound right to me. I appreciate that using a unique index will be slower than using the primary, but there is still a huge difference in performance between these 2 results!

Any thoughts?

2

Answers


  1. True.

    Updating 2000 rows requires:

    • Without a WHERE clause, all 5.5M rows will be checked.
    • No index is useful for CASE.
    • Keeping a copy of each modified row — in case it needs to be ROLLBACK’d due to crash, etc.
    • Updating any indexes that include col5.

    9 minutes seem high, but you have a combination of checking 5.5M rows and taking extra time on 2000 of them.

    Assuming you really meant for testing col2/3/4 the same way, but not col5, I suggest this would be much better:

    UPDATE  table SET col15 = ( CASE 
              WHEN  col5=d THEN 'a' ... 
              WHEN  col5=e THEN 'b' END )
        WHERE  col2=x
          AND  col3=y
          AND  col4=z
    

    together with

    INDEX(col2, col3, col4)   -- in any order
    

    The important thing is to have a WHERE that filters. If there is nothing common between the two WHENs, the it will be faster to use two UPDATEs. If you need them to happen ‘simultaneously’, then put them in a transaction (BEGIN...COMMIT) to make the two Updates "atomic".

    Login or Signup to reply.
  2. Will be way faster if you separate your UPDATE statement per condition; breaking it down to multiple updates for each case. Also, filtering out with date ranges first before the other comparisons, for huge data, should bring an advantage.

    One thing to check is, from the table structure, that the indexes are created with BTREE, not HASH.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search