I am trying to find a way to improve performance of ‘Load data infile’ command for InnoDB table, which is extremely slow. Generally I have been using MyISAM for more than 10 years, because this engine by design is the best option for tables with rare writes (once per day without concurency) and many reads (select queries). But with the time MyISAM has been left in the corner in favour of InnoDB. I decided to give InnoDB a try, because I would like to use partitioning. And this was the point when my headaches started with the attempts to achieve reasonable speed for loading of data.
My table has 860 columns and it cannot be normalized, because all columns are with equal priority and all of them are used to generate a result set from a query. The table has 4 indexes and a primary key, which consists of 2 columns – object and time. Every day about 80k rows are being inserted in the table, which results of close to 100 million rows for 3 years time. Edit: More information about the table’s structure can be found at the end of the question. No partitioning was implemented at the moment.
I did a few experiemnts playing with differents parameters and one MySQL tool. The experiment setup is the importing of 51000 rows in a blank InnoDB table with the properties mentioned above. The reference was the importing time in MyISAM table which was just 12 seconds. The best importing time that I achieved in the same table, but with InnoDB engine was 12 minutes, meaning 60 times slower, which is not tolerable at all.
The steps I did for the experiment are listed below. Apparently something else should be possible to be done, but I am not aware what and any suggestions will be welcome. My aim is to get importing times close to the ones for MyISAM table, otherwise InnoDB will be of no use to me.
- Importing time – 17 min 09 sec:
innodb_flush_log_at_trx_commit=1 /* (the default value) */
innodb_buffer_pool_size=0.125 GB /* (the default value) */
- Importing time – 12 min 04 sec:
innodb_flush_log_at_trx_commit=1 /* (the default value) */
innodb_buffer_pool_size=7 GB /* (out of 8 GB available) */
- Importing time – 11 min 59 sec:
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=7 GB /* (out of 8 GB available) */
- Importing time – 11 min 47 sec:
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=7 GB /* (out of 8 GB available) */
- Importing time – 10 min 59 sec:
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=7 GB /* (out of 8 GB available) */
/* all indexes were deleted prior to importing */
- Importing time – 03 min 15 sec (the fastest but not a usable solution):
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=7 GB /* (out of 8 GB available) */
/* all indexes were deleted prior to importing */
/* primary key was deleted prior to importing */
As can be seen above most improvement provides the increase of innodb_buffer_pool_size. Some improvement can be achieved with disabling of indexes before importing and after it finishes to enable them again (but this process will take probably some additional time). Still the improvements are so minor compared to MyISAM table performance. Obviously most time is spent with dealing with primary key, but there is no option not to have it.
Further experiments:
-
Importing time – 14 min. Usage of util.import_table in MySQL shell. 4 threads were used for paralel imports. 2 attempts were made with innodb_flush_log_at_trx_commit=1 and with innodb_flush_log_at_trx_commit=2. Importing time was the same in both cases – 14 minutes.
-
Importing time – 13 min 20 sec. Usage of autocommit = 0 as suggested by MySQL documentation. Setup was with following parameters: innodb_flush_log_at_trx_commit=2 and innodb_buffer_pool_size=7 GB
SET autocommit=0;
Load data infile '//Import_tools//process_hua//export1.csv' into table testdb.h_cell_inno FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY 'n';
COMMIT;
Edit: One more experiment was done with reduced number of columns. I left just 60 out of 860 columns in the table and also shortened the input CSV file to contain just 60 records per row. Results: Importing to MyISAM table took 2 seconds and to InnoDB it took 50 seconds. (Setup was with following parameters: innodb_flush_log_at_trx_commit=2 and innodb_buffer_pool_size=6 GB)
Worth mentioning is another comparison between MyISAM and InnoDB importing. After the first loading of data to MyISAM table every next import of 51k rows took approximately the same time – bteween 8 and 10 seconds. But when I loaded one more chunk of 51k rows to the InnoDB table it took more than twice the time spent to inject the first portion – 26 min vs. 12 min. This is also quite a problem considering that the table will be populated with a lot of data.
Edit: Now the results of SHOW CREATE TABLE and SHOW TABLE STATUS statements are included in the question. SHOW CREATE TABLE is shortend and is showing just a few columns with all the datatypes used.
CREATE TABLE `h_cell_inno` (
`Time` date NOT NULL,
`Site` int NOT NULL,
`Cell` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
`LocalCI` int NOT NULL,
`Object` int NOT NULL,
`Integrity` varchar(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
`Counter1` int DEFAULT NULL,
`Counter2` bigint DEFAULT NULL,
`Counter3` double DEFAULT NULL,
..................
`Counter863` int DEFAULT NULL,
PRIMARY KEY (`Cell`,`Time`) USING BTREE,
KEY `Index1` (`Cell`,`Time`) USING BTREE,
KEY `Index2` (`Site`,`CI`,`Time`) USING BTREE,
KEY `Index3` (`Time`,`CI`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=DYNAMIC
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+--------------------+---------+
| h_cell_inno | InnoDB | 10 | Dynamic | 36741 | 7756 | 284983296 | 0 | 8454144 | 6291456 | NULL | 2024-01-17 11:04:24 | NULL | NULL | utf8mb4_0900_ai_ci | NULL | row_format=DYNAMIC | |
+-------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+--------------------+---------+
Is there anything else which I could do to improve the speed of data loading in InnoDB? If not should I change MySQL database (like PostgreSQL for example), which could provide better import speeds, ACID and partitioning, while keeping select queries fast enough?
2
Answers
The
PRIMARY KEY
is aUNIQUE
key is anINDEX
. That is, yourIndex1
is redundant and should be dropped.What is
CI
?If the data is "sensor type, the
DOUBLE
(8 bytes, 16 significant digits) is probably overkill; consider usingFLOAT
(4 bytes, 7 significant digits).Similarly for
BIGINT
andINT
.Eh? Rows seem to be inserted every second, but
Time
is onlyDATE
?Cell
is up to 82 bytes; a good candidate for normalizing.With so many columns, there may be only 1 or 2 rows per node in the BTree; shrinking the datatypes is very important.
Ivaylo,
I totally agree with Rick James, INDEX1 is redundant and totally WASTES space in your table, CPU cycles for every insert of a row. Delete INDEX1 from your table, please.
We need to find out more about your CELL column. This query will answer my questions about data Maximum length and Average length, if you will let it run through your COMPLETE table content and post TEXT results.
sel-MLALh_cell_inno.sql MLAL last upd 2024-01-18 WLH
original 2024-01-18 WLH
if this helps you, please email SHAREABLE comments to [email protected]
SELECT
‘Begin Time’,NOW(),DATABASE(),’ ‘,COUNT(*),’ ‘,
MAX(LENGTH(cell)),AVG(LENGTH(cell)),’cell’,’ ‘
FROM testdb.h_cell_inno
G;
SELECT ‘ End Time’,NOW();