I have a flat file with 3,200,000 rows, and the performance of the application decreases as time progresses: for example, from 100,000 rows in 30 seconds to 320,000 rows in a total time of about 4 hours.

I am using JdbcTemplate for batch saving. I have also tried an async read with CompletableFuture.

Database used: MySQL, with 2 indexes (1 primary key, 1 unique index).

I expected a lower time.
2 Answers
"Replacing" a table: If you are really "replacing" the entire table, this is much better.
For example, are you getting a replacement for your table every night? If so there is a much less invasive way to do such. (Not faster, but I suspect that is not the real issue.) The following keeps the
real
table "online" except for a very brief, ACID,RENAME
LOAD DATA INFILE (applies to either adding or replacing)

If the "flat file" is comma- or tab-separated, see whether LOAD DATA can be used. It is very fast.
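A sketch of issuing it through JDBC, assuming a tab-separated file at /data/rows.tsv and a table named my_table; with MySQL Connector/J the client must opt in via allowLoadLocalInfile=true, and the server must have local_infile enabled:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        // allowLoadLocalInfile=true lets Connector/J send a client-side file.
        String url = "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement st = conn.createStatement()) {
            // Tab-separated fields, one row per line; adjust the
            // terminators to match the actual file format.
            st.execute(
                "LOAD DATA LOCAL INFILE '/data/rows.tsv' " +
                "INTO TABLE my_table " +
                "FIELDS TERMINATED BY '\\t' " +
                "LINES TERMINATED BY '\\n'");
        }
    }
}
```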
Batch loading

If you load with batched INSERTs instead, commit as you go rather than holding the entire load in a single transaction (e.g., leave autocommit=ON). (Saving 3.2M rows in the undo buffer is costly.)
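If batched INSERTs are the route (as in the question's JdbcTemplate setup), here is a minimal plain-JDBC sketch; the two-column table, the 1,000-row chunk size, and rewriteBatchedStatements=true (which lets Connector/J coalesce a batch into multi-row INSERTs) are assumptions, not anything from the answer:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class ChunkedInsert {
    // rows: the parsed lines of the flat file (id, name are assumed columns).
    public static void insertAll(List<Object[]> rows) throws Exception {
        // rewriteBatchedStatements=true lets Connector/J send multi-row INSERTs.
        String url = "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO my_table (id, name) VALUES (?, ?)")) {
            int n = 0;
            for (Object[] row : rows) {
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.addBatch();
                // Flush every 1000 rows; with autocommit=ON each flush commits,
                // so the undo buffer never holds millions of pending rows.
                if (++n % 1000 == 0) ps.executeBatch();
            }
            ps.executeBatch(); // remaining partial chunk
        }
    }
}
```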
Other notes

Rethink whether you really need two unique indexes (the PK is one); that doubles some of the work of inserting each row. (This partially addresses the "slow down" part of the question.)
The setting for innodb_buffer_pool_size should be about 70% of RAM. (However, if RAM is less than 4GB, or lots of RAM is taken up by other apps, that percentage is too high.) If the buffer_pool grows bigger than the available RAM, "swapping" comes into play, and that is terrible for performance.
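A sketch for sanity-checking the current value from the application side (connection details are placeholders; since MySQL 5.7 the variable can also be resized at runtime with SET GLOBAL):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BufferPoolCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "pass");
             Statement st = conn.createStatement();
             // Server-side system variable holding the pool size in bytes.
             ResultSet rs = st.executeQuery("SELECT @@innodb_buffer_pool_size")) {
            if (rs.next()) {
                long bytes = rs.getLong(1);
                System.out.printf("buffer pool: %.1f GiB%n",
                        bytes / (1024.0 * 1024 * 1024));
            }
        }
    }
}
```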
How big is the table? If it has lots of big text columns, then it could be quite big. Big implies that it might overflow the buffer_pool and "slow down" as more rows are added.
Indexing with UUIDs can also cause a slowdown due to overflowing the buffer_pool, because of the randomness of such indexes. More: UUIDs.

Do you really need both unique indexes? Here's the issue: when inserting a row, all unique indexes are checked before moving on to the next row.
- AUTO_INCREMENT (and not specified): appends to the 'end' of the table.
- UUID: jumps around 'randomly'.