When I ran my code against tables with a few thousand rows, it worked fine. But when I performance-tested it against tables with millions of records, I ran into this issue.
I have tried the approach shown below. I could switch to an alternate approach without the CopyManager, but it would be great if I could keep using this piece of code. Any suggestions are welcome.
OFFSET starts at 0; LIMIT is 3 million.
Here is the code:
Connection connection = null;
try (StringWriter out = new StringWriter(); Writer printWriter = new PrintWriter(out, true)) {
    connection = dataSource.getConnection();
    OFFSET = 0;
    // totalRecords is the total number of rows in the table
    while (OFFSET <= totalRecords && totalRecords > 0) {
        if (connection != null && connection.isWrapperFor(PGConnection.class)) {
            PGConnection pgConnection = connection.unwrap(PGConnection.class);
            CopyManager copyManager = pgConnection.getCopyAPI();
            String sql = "SELECT " + table.getAttributeList() + " FROM " + table.getSchemaName().trim() + "."
                    + table.getTableName().trim() + " WHERE modified_ts > '" + dateTime + "'"
                    + " OFFSET " + OFFSET + " LIMIT " + LIMIT;
            LOGGER.info(sql);
            // Here I am trying to copy the result into the PrintWriter
            // (backed by the in-memory StringWriter)
            long i = copyManager.copyOut("COPY (" + sql + ") TO STDOUT WITH (FORMAT CSV)", printWriter);
            LOGGER.info("Total no of records in {} : {}", table.getTableName(), i);
            printWriter.flush();
            OFFSET = OFFSET + LIMIT;
        }
    }
    String now = LocalDate.now().format(DateTimeFormatter.ofPattern("dd/MM/yyyy")).replace("/", "");
    String localFileName = table.getTableName() + "_" + now + ".csv";
    InputStream inputStream = new ByteArrayInputStream(out.toString().getBytes(UTF8));
    postObjectToS3.uploadFile(saveFilePath + localFileName, inputStream);
    inputStream.close();
}
And here is the stack trace:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:633)
at java.base/java.lang.StringBuffer.append(StringBuffer.java:397)
at java.base/java.io.StringWriter.write(StringWriter.java:122)
at java.base/java.io.PrintWriter.write(PrintWriter.java:542)
at java.base/java.io.PrintWriter.write(PrintWriter.java:559)
at org.postgresql.copy.CopyManager.copyOut(CopyManager.java:92)
2 Answers
The issue was with the following line:

    copyManager.copyOut("COPY (" + sql + ") TO STDOUT WITH (FORMAT CSV)", printWriter);

I was iterating the loop and copying the result set of each query into the PrintWriter, whose backing StringWriter kept everything in memory. It stored as many records as it could but, in the end, ran out of heap space.
To resolve this, I consumed the data inside the PrintWriter in every iteration instead of letting it accumulate. The basic idea that solved this was:
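A minimal sketch of that idea, reusing the question's variables (copyManager, totalRecords, LIMIT, postObjectToS3, saveFilePath, localFileName); the per-chunk ".part" key naming is illustrative, not the exact code. The key change is uploading each chunk and resetting the StringWriter's buffer so the heap never holds more than one chunk:

    try (StringWriter out = new StringWriter(); Writer printWriter = new PrintWriter(out, true)) {
        long offset = 0;
        int part = 0;
        while (offset <= totalRecords && totalRecords > 0) {
            String sql = "SELECT " + table.getAttributeList() + " FROM " + table.getSchemaName().trim() + "."
                    + table.getTableName().trim() + " WHERE modified_ts > '" + dateTime + "'"
                    + " OFFSET " + offset + " LIMIT " + LIMIT;
            copyManager.copyOut("COPY (" + sql + ") TO STDOUT WITH (FORMAT CSV)", printWriter);
            printWriter.flush();

            // Consume the chunk right away instead of letting it accumulate across iterations
            byte[] chunk = out.toString().getBytes(StandardCharsets.UTF_8);
            postObjectToS3.uploadFile(saveFilePath + localFileName + ".part" + part++,
                    new ByteArrayInputStream(chunk));

            // Reset the StringWriter's backing buffer so the next chunk starts from empty
            out.getBuffer().setLength(0);
            offset += LIMIT;
        }
    }

With the buffer reset on every pass, peak memory is bounded by a single chunk (LIMIT rows) rather than by the whole table.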
Hope this clarifies the problem and the solution. Let me know if more clarification is needed.
There is no magic in memory management, and it seems you need to find the balance: either expand the heap's Xmx, or reduce the LIMIT (which is quite large). Assuming you can drop the CopyManager, iterate the ResultSet instead, and release each row once you have converted it to text (otherwise you hold the data twice in memory). This approach lets the garbage collector reclaim memory as you move the results from SQL to text, keeping the data in memory only once. So, the basic idea is:
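A minimal sketch of that ResultSet approach, assuming the same connection and sql from the question; setFetchSize together with autocommit off makes the PostgreSQL driver stream rows in batches instead of materializing the whole result, and the CSV handling here is deliberately naive:

    connection.setAutoCommit(false);            // PostgreSQL only honors fetch size with autocommit off
    try (PreparedStatement ps = connection.prepareStatement(sql);
         Writer writer = new BufferedWriter(new FileWriter(localFileName))) {
        ps.setFetchSize(10_000);                // stream rows in batches of 10k
        try (ResultSet rs = ps.executeQuery()) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int c = 1; c <= cols; c++) {
                    if (c > 1) row.append(',');
                    row.append(rs.getString(c)); // real code must quote/escape CSV values
                }
                row.append('\n');
                writer.write(row.toString());   // each written row becomes eligible for GC
            }
        }
    }

Alternatively, keep the CopyManager but raise the heap (e.g. run with -Xmx4g) or shrink LIMIT so one chunk fits comfortably in memory.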