When I ran my code against tables with a few thousand rows, it worked fine. But when I performance-tested it against tables with millions of records, I ran into this issue.
I have tried the approach shown below. I could switch to an alternate approach without the CopyManager, but it would be great if I could keep using this piece of code. Any suggestions are welcome.
OFFSET starts at 0; LIMIT is 3 million.
Here is the code:
Connection connection = null;
try (StringWriter out = new StringWriter(); Writer printWriter = new PrintWriter(out, true)) {
    connection = dataSource.getConnection();
    OFFSET = 0;
    // totalRecords is the total number of rows in the table
    while (OFFSET <= totalRecords && totalRecords > 0) {
        if (connection != null && connection.isWrapperFor(PGConnection.class)) {
            PGConnection pgConnection = connection.unwrap(PGConnection.class);
            CopyManager copyManager = pgConnection.getCopyAPI();
            String sql = "SELECT " + table.getAttributeList() + " FROM " + table.getSchemaName().trim() + "."
                    + table.getTableName().trim() + " WHERE modified_ts > '" + dateTime + "'"
                    + " OFFSET " + OFFSET + " LIMIT " + LIMIT;
            LOGGER.info(sql);
            // Here I am trying to copy the result into the PrintWriter
            // (backed by the in-memory StringWriter)
            long i = copyManager.copyOut("COPY (" + sql + ") TO STDOUT WITH (FORMAT CSV)", printWriter);
            LOGGER.info("Total no of records in {} : {}", table.getTableName(), i);
            printWriter.flush();
            OFFSET = OFFSET + LIMIT;
        }
    }
    String now = LocalDate.now().format(DateTimeFormatter.ofPattern("dd/MM/yyyy")).replace("/", "");
    String localFileName = table.getTableName() + "_" + now + ".csv";
    InputStream inputStream = new ByteArrayInputStream(out.toString().getBytes(UTF8));
    postObjectToS3.uploadFile(saveFilePath + localFileName, inputStream);
    inputStream.close();
}
And here is the stack trace:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:633)
at java.base/java.lang.StringBuffer.append(StringBuffer.java:397)
at java.base/java.io.StringWriter.write(StringWriter.java:122)
at java.base/java.io.PrintWriter.write(PrintWriter.java:542)
at java.base/java.io.PrintWriter.write(PrintWriter.java:559)
at org.postgresql.copy.CopyManager.copyOut(CopyManager.java:92)
2 Answers
The issue was with the following line:

    copyManager.copyOut("COPY (" + sql + ") TO STDOUT WITH (FORMAT CSV)", printWriter);

I was iterating the loop and copying the result set of each query into the PrintWriter, whose backing StringWriter kept everything in memory. It stored as many records as it could but, in the end, ran out of heap space.
To resolve this, I consumed the data inside the PrintWriter in every iteration instead of letting it accumulate. The basic idea that solved this was:
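A minimal sketch of that idea, reusing the question's variables (copyManager, totalRecords, LIMIT, postObjectToS3, saveFilePath, localFileName); the per-chunk ".part" key naming is illustrative, not the exact code. The key change is uploading each chunk and resetting the StringWriter's buffer so the heap never holds more than one chunk:

    try (StringWriter out = new StringWriter(); Writer printWriter = new PrintWriter(out, true)) {
        long offset = 0;
        int part = 0;
        while (offset <= totalRecords && totalRecords > 0) {
            String sql = "SELECT " + table.getAttributeList() + " FROM " + table.getSchemaName().trim() + "."
                    + table.getTableName().trim() + " WHERE modified_ts > '" + dateTime + "'"
                    + " OFFSET " + offset + " LIMIT " + LIMIT;
            copyManager.copyOut("COPY (" + sql + ") TO STDOUT WITH (FORMAT CSV)", printWriter);
            printWriter.flush();

            // Consume the chunk right away instead of letting it accumulate across iterations
            byte[] chunk = out.toString().getBytes(StandardCharsets.UTF_8);
            postObjectToS3.uploadFile(saveFilePath + localFileName + ".part" + part++,
                    new ByteArrayInputStream(chunk));

            // Reset the StringWriter's backing buffer so the next chunk starts from empty
            out.getBuffer().setLength(0);
            offset += LIMIT;
        }
    }

With the buffer reset on every pass, peak memory is bounded by a single chunk (LIMIT rows) rather than by the whole table.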
Hope this clarifies the problem and the solution. Let me know if more clarification is needed.
There is no magic in memory management, and it seems you need to find the balance: either expand the heap's Xmx, or reduce the LIMIT (which is quite large). Assuming you can drop the CopyManager, iterate the ResultSet instead, and release each row once you have converted it to text (otherwise you hold the data twice in memory). This approach lets the garbage collector reclaim memory as you move the results from SQL to text, keeping the data in memory only once. So, the basic idea is:
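A minimal sketch of that ResultSet approach, assuming the same connection and sql from the question; setFetchSize together with autocommit off makes the PostgreSQL driver stream rows in batches instead of materializing the whole result, and the CSV handling here is deliberately naive:

    connection.setAutoCommit(false);            // PostgreSQL only honors fetch size with autocommit off
    try (PreparedStatement ps = connection.prepareStatement(sql);
         Writer writer = new BufferedWriter(new FileWriter(localFileName))) {
        ps.setFetchSize(10_000);                // stream rows in batches of 10k
        try (ResultSet rs = ps.executeQuery()) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int c = 1; c <= cols; c++) {
                    if (c > 1) row.append(',');
                    row.append(rs.getString(c)); // real code must quote/escape CSV values
                }
                row.append('\n');
                writer.write(row.toString());   // each written row becomes eligible for GC
            }
        }
    }

Alternatively, keep the CopyManager but raise the heap (e.g. run with -Xmx4g) or shrink LIMIT so one chunk fits comfortably in memory.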