Mysql fastest technique for insert, replace, on duplicate of mass records

rolinger
May 8, 2023
229 views
0 votes
2 Answers

I know there are a lot related questions with many answers, but I have a bit of a more nuanced question. I have been doing reading on different insert techniques for mass records, but are there limits on how big a query insert can be? Can the same technique be used for REPLACE and INSERT ...ON DUPLICATE KEY UPDATE ... ? Is there a faster method?

Table:

+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| a         | int(11)     | NO   | PRI | NULL    | auto_increment |
| b         | int(11)     | YES  |     | NULL    |                |
| c         | int(11)     | YES  |     | NULL    |                |

1) "INSERT INTO TABLE COLUMNS (a,b,c) values (1,2,3);"
2) "INSERT INTO TABLE COLUMNS (a,b,c) values (5,6,7);"
3) "INSERT INTO TABLE COLUMNS (a,b,c) values (8,9,10);"
 ...
10,000) "INSERT INTO TABLE COLUMNS (a,b,c) values (30001,30002,30003);"

#2 – should be faster, but is there a limit?

"INSERT INTO TABLE COLUMNS (a,b,c) values (1,2,3),(4,5,6),(8,9,10)....(30001,30002,30003)" ;

From a scripting perspective (PHP), using #2, is it better to loop through and queue up 100 entries (1000 times)…or a 1000 entries (100 times), or just all 10,000 at once? Could this be done with 100,000 entries?

Can the same be used with REPLACE:

"REPLACE INTO TABLE (a, b, c) VALUES(1,2,3),(4,5,6)(7,8,9),...(30001,30002,30003);"

Can it also be used with INSERT ON DUPLICATE?

INSERT INTO TABLE (a, b, c) VALUES(1,2,3),(4,5,6),(7,8,9),....(30001,30002,30003) ON DUPLICATE KEY UPDATE (b=2,c=3)(b=5,c=6),(b=8,c=9),....(b=30002,c=30003) ?

For any and all of the above (assuming the replace/on duplicate are valid), are there faster methods to achieve the inserts?

Answers

- BillKarwin
- September 12, 2022 at 8:51 pm
- 0 votes
0
The length of any SQL statement is limited by a MySQL option called max_allowed_packet.

The syntax of INSERT allows you to add an unlimited number of tuples after the VALUES clause, but the total length of the statement from INSERT to the last tuple must still be no more than the number of bytes equal to max_allowed_packet.

Regardless of that, I have found that LOAD DATA INFILE is usually significantly faster than any INSERT syntax. It’s so much faster, that you might even find it faster to write your tuples to a temporary CSV file and then use LOAD DATA INFILE on that CSV file.

You might like my presentation comparing different bulk-loading solutions in MySQL: Load Data Fast!

Login or Signup to reply.

- RickJames
- September 14, 2022 at 6:33 am
- 0 votes
0
#1 (single-row inserts) — Slow. A variant is INSERT IGNORE — beware: it burns AUTO_INCREMENT ids.

#2 (batch insert) — Faster than #1 by a factor of 10. But do the inserts in batches of no more than 1000. (After that, you are into "diminishing returns" and may conflict with other activities.

#3 REPLACE — Bad. It is essentially a DELETE plus an INSERT. Once IODKU was added to MySQL, I don’t think there is any use for REPLACE. All the old AUTO_INCREMENT ids will be tossed and new ones created.

#4 IODKU (Upsert) — [If you need to test before Insert.] It can be batched, but not the way you presented it. (There is no need to repeat the b and c values.)
```
INSERT INTO (
INSERT INTO TABLE (a, b, c)
     VALUES(1,2,3),(4,5,6),(7,8,9),....(30001,30002,30003)
     ON DUPLICATE KEY UPDATE
         b = VALUES(b),
         c = VALUES(c);
```
Or, in MySQL 8.0, the last 2 lines are:
```
         b = NEW.b,
         c = NEW.c;
```
IODKU also burns ids.

MySQL LOAD DATA INFILE with ON DUPLICATE KEY UPDATE discusses a 2-step process of LOAD + IODKU. Depending on how complex the "updates" are, 2+ steps may be your best answer.

#5 LOAD DATA — as Bill mentions, this is a good way if the data comes from a file. (I am dubious about its speed if you also have to write the data to a file first.) Be aware of the usefulness of @variables to make minor tweaks as you do the load. (Eg, STR_TO_DATE(..) to fix a DATE format.)

#6 INSERT ... SELECT ...; — If the data is already in some other table(s), you may as well combine the Insert and Select. This works for IODKU, too.

As a side note, if you need to get AUTO_INCREMENT ids of each batched row, I recommend some variant on the following. It is aimed at batch-normalization of id-name pairs that might already exist in the mapping table. Normalization
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.