I am trying to do a bulk insert into Db2 from an Excel file.
I am running WildFly 12 on my Windows 10 PC; Db2 is running in a VM under CentOS 7.
Using the POI SAX methods, reading the complete XLSX file of 2.5M records takes 80 seconds.
I started off using simple INSERT statements with literal values in my EJB program. That took about 1.5 hours.
After reading https://www.idug.org/p/bl/et/blogid=2&blogaid=602 I changed to prepared statements with batching. I also first write to a global temporary table and at the end run an "INSERT INTO ... SELECT ... FROM".
I turned off auto-commit.
The end result still takes about an hour, i.e. roughly 750 inserts per second.
Prepare the items:
// Get a connection using the DriverManager class
try {
    connection = DriverManager.getConnection(
            "jdbc:db2://192.168.0.5:50000/mydb",
            "hussain",
            "hussain");
} catch (SQLException e1) {
    out.println("Failed to get connection");
    e1.printStackTrace();
    return;
}
// Clear the main table and restart its identity column
try {
    Statement statement = connection.createStatement();
    statement.execute("TRUNCATE TABLE " + strMainTable + " IMMEDIATE");
    statement.execute("ALTER TABLE " + strMainTable + " ALTER COLUMN ID RESTART WITH 1 ACTIVATE NOT LOGGED INITIALLY");
    statement.close();
} catch (SQLException e2) {
    out.println("Failed to clear table<br>");
    e2.printStackTrace();
    return;
}
// Drop the temporary table if it is left over from a previous run
try {
    Statement statement = connection.createStatement();
    statement.execute("DROP TABLE " + strTempTable);
    statement.close();
} catch (SQLException e2) {
    // Ignored: the table may simply not exist yet
    out.println("Failed to delete table<br>");
}
// One-time setup in Db2:
// CREATE USER TEMPORARY TABLESPACE USERTEMP MANAGED BY AUTOMATIC STORAGE
// GRANT USE OF TABLESPACE USERTEMP TO USER HUSSAIN
try {
    Statement statement = connection.createStatement();
    statement.execute("CREATE GLOBAL TEMPORARY TABLE " + strTempTable + " (" +
            " COL1 VARCHAR(900 OCTETS), " +
            " COL2 VARCHAR(20 OCTETS), " +
            " COL3 VARCHAR(225 OCTETS), " +
            " COL4 VARCHAR(500 OCTETS)" +
            " ) NOT LOGGED ON COMMIT PRESERVE ROWS ");
    statement.close();
} catch (SQLException e2) {
    out.println("Failed to create table<br>");
    e2.printStackTrace();
}
// Prepare the batched INSERT statement
String sql = "INSERT INTO " + strTempTable + " (COL1, COL2, COL3, COL4) VALUES(?,?,?,?)";
try {
    statementInsert = connection.prepareStatement(sql);
    connection.setAutoCommit(false);
} catch (SQLException e1) {
    out.println("Failed to prepare statement<br>");
    e1.printStackTrace();
    return;
}
Then, in the loop that reads rows from the XLSX file:
try {
    // col1..col4 hold the values parsed from the current row
    statementInsert.setString(1, col1);
    statementInsert.setString(2, col2);
    statementInsert.setString(3, col3);
    statementInsert.setString(4, col4);
    statementInsert.addBatch();
    batchRows++;
    // Flush the batch every 10,000 rows
    if (batchRows >= 10000) {
        statementInsert.executeBatch();
        batchRows = 0;
    }
} catch (SQLException e) {
    e.printStackTrace();
}
At the end:
// Flush any remaining rows in the batch, then copy into the main table
statementInsert.executeBatch();
Statement statement = connection.createStatement();
statement.execute("INSERT INTO " + strMainTable +
        " (COL1, COL2, COL3, COL4) " +
        " SELECT COL1, COL2, COL3, COL4 FROM " + strTempTable);
statement.execute("DROP TABLE " + strTempTable);
connection.commit();
Yes, Db2 is running in a VM, so write speeds are affected by lots of factors. Still, 750 records per second is extremely slow. The author of the article says his worst speed was 3,000/s and his best was 140k/s. It would be nice to get close.
What else should I be looking at?
3 Answers
Your comment thread mentions populating an unlogged session table (declared global temporary table) in an acceptable time, but you found that the bulk of the elapsed time was spent doing a logged INSERT INTO ... SELECT FROM the session table into the final table.
Logged inserts are significantly slower than LOAD, but sometimes they are the only option.
You can also speed up logged inserts on Db2-LUW using various techniques that are outside the scope of this question but which are well known to DBAs.
My suggestion was to use the ADMIN_CMD stored procedure to invoke a load-from-cursor action to populate the final table.
You tested this approach and found the performance acceptable.
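For reference, a load-from-cursor call through ADMIN_CMD can be sketched roughly as below. The table names and column list are placeholders, and NONRECOVERABLE is one possible choice (it avoids leaving the table in backup-pending state, at the cost of roll-forward recoverability for that table):

```java
import java.sql.CallableStatement;
import java.sql.Connection;

public class LoadFromCursor {

    // Build the LOAD command text that ADMIN_CMD will execute.
    // The SELECT acts as the cursor source, reading the session temp table.
    public static String buildLoadCommand(String tempTable, String mainTable) {
        return "LOAD FROM (SELECT COL1, COL2, COL3, COL4 FROM " + tempTable + ") "
             + "OF CURSOR INSERT INTO " + mainTable + " (COL1, COL2, COL3, COL4) "
             + "NONRECOVERABLE";
    }

    // Invoke the load through the ADMIN_CMD stored procedure on the
    // same connection that populated the session table.
    public static void loadFromTemp(Connection connection,
                                    String tempTable, String mainTable) throws Exception {
        try (CallableStatement cs = connection.prepareCall("CALL SYSPROC.ADMIN_CMD(?)")) {
            cs.setString(1, buildLoadCommand(tempTable, mainTable));
            cs.execute();
        }
    }

    public static void main(String[] args) {
        // Print the command that would be passed to ADMIN_CMD
        System.out.println(buildLoadCommand("SESSION.MYTEMP", "MYSCHEMA.MAINTAB"));
    }
}
```

Because LOAD bypasses normal logging, this replaces the slow logged INSERT ... SELECT step entirely.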
Note that in an HA configuration you will need to consider the impact of the LOAD, possibly needing a load copy or other approaches for synchronisation or recovery (database restore etc.). The approach might also need to be reconsidered if Db2 is running on a cloud service, depending on what the vendor offers.
You can turn off logging of the inserts to the main table:
ALTER TABLE main_table ACTIVATE NOT LOGGED INITIALLY
That setting remains in effect until you commit.
If you can export the Excel sheet as a comma-delimited file, you can use the LOAD command, which is much faster.
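If you go the delimited-file route, the same ADMIN_CMD procedure can run the LOAD from your program. A rough sketch, where the file path and table name are placeholders; keep in mind that ADMIN_CMD's LOAD reads the file on the database server, so the export must first be copied to the CentOS VM:

```java
import java.sql.CallableStatement;
import java.sql.Connection;

public class LoadFromDelFile {

    // Build the LOAD command for a server-side comma-delimited file.
    // DEL format uses a comma as its default column delimiter.
    public static String buildLoadCommand(String serverFilePath, String mainTable) {
        return "LOAD FROM " + serverFilePath + " OF DEL "
             + "INSERT INTO " + mainTable + " (COL1, COL2, COL3, COL4) "
             + "NONRECOVERABLE";
    }

    // Execute the load through ADMIN_CMD on an open Db2 connection.
    public static void load(Connection connection, String serverFilePath,
                            String mainTable) throws Exception {
        try (CallableStatement cs = connection.prepareCall("CALL SYSPROC.ADMIN_CMD(?)")) {
            cs.setString(1, buildLoadCommand(serverFilePath, mainTable));
            cs.execute();
        }
    }

    public static void main(String[] args) {
        // Print the command that would be passed to ADMIN_CMD
        System.out.println(buildLoadCommand("/tmp/export.del", "MYSCHEMA.MAINTAB"));
    }
}
```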
Alter your strMainTable ID column to use CACHE 1000 and NO ORDER:
ID INTEGER NOT NULL GENERATED BY DEFAULT AS IDENTITY (START WITH 1, INCREMENT BY 1, CACHE 1000, NO ORDER)
https://developer.ibm.com/technologies/databases/articles/dm-0403wilkins/#table5