I know there are a lot related questions with many answers, but I have a bit of a more nuanced question. I have been doing reading on different insert techniques for mass records, but are there limits on how big a query insert can be? Can the same technique be used for REPLACE
and INSERT ...ON DUPLICATE KEY UPDATE ...
? Is there a faster method?
Table:
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| a | int(11) | NO | PRI | NULL | auto_increment |
| b | int(11) | YES | | NULL | |
| c | int(11) | YES | | NULL | |
#1
1) "INSERT INTO TABLE COLUMNS (a,b,c) values (1,2,3);"
2) "INSERT INTO TABLE COLUMNS (a,b,c) values (5,6,7);"
3) "INSERT INTO TABLE COLUMNS (a,b,c) values (8,9,10);"
...
10,000) "INSERT INTO TABLE COLUMNS (a,b,c) values (30001,30002,30003);"
or
#2 – should be faster, but is there a limit?
"INSERT INTO TABLE COLUMNS (a,b,c) values (1,2,3),(4,5,6),(8,9,10)....(30001,30002,30003)" ;
From a scripting perspective (PHP), using #2, is it better to loop through and queue up 100 entries (1000 times)…or a 1000 entries (100 times), or just all 10,000 at once? Could this be done with 100,000 entries?
Can the same be used with REPLACE
:
"REPLACE INTO TABLE (a, b, c) VALUES(1,2,3),(4,5,6)(7,8,9),...(30001,30002,30003);"
Can it also be used with INSERT ON DUPLICATE
?
INSERT INTO TABLE (a, b, c) VALUES(1,2,3),(4,5,6),(7,8,9),....(30001,30002,30003) ON DUPLICATE KEY UPDATE (b=2,c=3)(b=5,c=6),(b=8,c=9),....(b=30002,c=30003) ?
For any and all of the above (assuming the replace/on duplicate are valid), are there faster methods to achieve the inserts?
2
Answers
The length of any SQL statement is limited by a MySQL option called
max_allowed_packet
.The syntax of INSERT allows you to add an unlimited number of tuples after the VALUES clause, but the total length of the statement from
INSERT
to the last tuple must still be no more than the number of bytes equal tomax_allowed_packet
.Regardless of that, I have found that LOAD DATA INFILE is usually significantly faster than any INSERT syntax. It’s so much faster, that you might even find it faster to write your tuples to a temporary CSV file and then use LOAD DATA INFILE on that CSV file.
You might like my presentation comparing different bulk-loading solutions in MySQL: Load Data Fast!
#1 (single-row inserts) — Slow. A variant is
INSERT IGNORE
— beware: it burnsAUTO_INCREMENT
ids.#2 (batch insert) — Faster than #1 by a factor of 10. But do the inserts in batches of no more than 1000. (After that, you are into "diminishing returns" and may conflict with other activities.
#3
REPLACE
— Bad. It is essentially aDELETE
plus anINSERT
. Once IODKU was added to MySQL, I don’t think there is any use forREPLACE
. All the oldAUTO_INCREMENT
ids will be tossed and new ones created.#4 IODKU (Upsert) — [If you need to test before Insert.] It can be batched, but not the way you presented it. (There is no need to repeat the
b
andc
values.)Or, in MySQL 8.0, the last 2 lines are:
IODKU also burns ids.
MySQL LOAD DATA INFILE with ON DUPLICATE KEY UPDATE discusses a 2-step process of
LOAD
+ IODKU. Depending on how complex the "updates" are, 2+ steps may be your best answer.#5
LOAD DATA
— as Bill mentions, this is a good way if the data comes from a file. (I am dubious about its speed if you also have to write the data to a file first.) Be aware of the usefulness of @variables to make minor tweaks as you do the load. (Eg,STR_TO_DATE(..)
to fix aDATE
format.)#6
INSERT ... SELECT ...;
— If the data is already in some other table(s), you may as well combine the Insert and Select. This works for IODKU, too.As a side note, if you need to get
AUTO_INCREMENT
ids of each batched row, I recommend some variant on the following. It is aimed at batch-normalization of id-name pairs that might already exist in the mapping table. Normalization