skip to Main Content

I’m cleaning up some code that’s started throwing java.lang.OutOfMemoryError in Production.

The problematic area has a couple of methods that process large collections, e.g.:

public void doSomething(Collection<HeavyObject> inputs) {

  ... do some stuff using INPUTS, deriving some different objects ...

  ... do some other stuff NOT using INPUTS, only derived objects ...

}

public void unsuspectingCaller() {
  Collection<HeavyObject> largeCollection;
  ... some stuff to populate the collection ...
  doSomething(largeCollection);
  ... other stuff ...
   // this following code may be added in the future
  kaboom(largeCollection); // walks into maintenance trap!
}

The code is blowing up and running out of memory in ... do some other stuff NOT using INPUTS ...

I can fix reduce the memory consumption (allowing early GC) by adding a inputs.clear() in between the two blocks.

But, I do not want to set a trap for future maintainers who might not be aware that the input collection is cleared. In fact, the inputs would ideally have been immutable, to more clearly communicate the intent of the code.

Is there an idiomatic way to declare doSomething() to make it clear, or even compiler verifiable, that the caller of doSomething() is not supposed to continue using the collection after doSomething() has been called?

UPDATE

For additional clarity, I renamed the parameter to inputs, instead of targets. Just keep that in mind when reviewing the comments.

UPDATE2

Addressing the suggestion from @Stephen C, we can see clearly that the JVM does not release references held by the caller, even if they are just passed in as an unnamed parameter. Execute with -Xmx8g (fail) and -Xmx9g (pass):

package com.stackoverflow.sandbox;

import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.junit.jupiter.api.Test;

public class MemoryTest {

    static class HeavyObject {
        int[] oneGigabyte = IntStream.range(0, 256_000_000).toArray();
        public int[] getGig() {
            return oneGigabyte;
        }
    }
    
    private int[] skynet(int[] in) {
        // perform out-of-this-world artificial intelligence computation
        return Arrays.stream(in).map(x -> x >> 1).toArray();
    }
    
    void doSomething(Collection<HeavyObject> input) {
        Collection<int[]> doubleMemoryUsage = input.stream().map(HeavyObject::getGig).map(this::skynet).collect(Collectors.toList());

        input = null;
        
        Collection<int[]> tripleMemoryUsage = doubleMemoryUsage.stream().map(this::skynet).collect(Collectors.toList());
        
        double sum = tripleMemoryUsage.stream().flatMapToDouble(array -> Arrays.stream(array).asDoubleStream()).sum();
        System.out.println("sum = " + sum);
    }
    
    @Test
    void caller1() {
        doSomething(List.of(new HeavyObject(), new HeavyObject(), new HeavyObject()));
        System.out.println("done1");
    }
    
    @Test
    void caller2() {
        Collection<HeavyObject> threeGigs = List.of(new HeavyObject(), new HeavyObject(), new HeavyObject());
        doSomething(threeGigs);
        System.out.println("done2");
    }

}

Another way to state the challenge, is how to reduce the memory usage in doSomething() from triple to double in an idiomatic way that enforces safe usage at compile time?

3

Answers


  1. Chosen as BEST ANSWER

    I was hoping for something better but here's what I came up with:

    Refactor doSomething() into a class.

    class DoerOfSomething {
       public DoerOfSomething(Collection<HeavyObject> inputs) {
         ... do some stuff using INPUTS, deriving other objects ...
         // derived objects are set as members
         // inputs goes out of scope
       } 
       public void doSomething() {
         ... do some other stuff NOT using INPUTS, only derived objects ...
       }
    }
    

    Now the caller can perform their own analysis to see if targets.clear() is ok to call.


  2. The problem with calling targets.clear() is (as you noted) that something else may be using the collection. So here’s how I would approach this:

    public void doSomething(Collection<Widget> targets) {
      // ... do some stuff using TARGETS ...
    
      targets = null;
    
      // ... do some other stuff NOT using TARGETS ...
    }
    

    The targets = null; will prevent this call from retaining a reference to the collection longer than it needs to. The performance impact is almost zero, and the damage (if any!) of nulling prematurely is localized to the doSomething() method itself.

    The problem then falls on the caller:

    // This should be OK
    doSomething(computeWidgets(...));
    
    // This may be a problem
    Collection<Widget> targets = computeWidgets(...);
    doSomething(targets);
    // Don't use 'targets' from now on.
    

    In the first example, the JVM should be able to tell that the caller has no reachable reference once the call has started. In the second example, it is more difficult for the JVM to know. But in both cases, you are dependent on the JVM to detect that the reference is effectively unreachable in the caller.

    UPDATE

    I suspect that the reason that your MemoryTest example fails is that your doSomething code is creating a temporary variable or using a register or something to hold a reference to the Stream object. The JVM may not realize that that variable / register is no longer live, and may therefore be treating the Stream object as reachable. But the Stream object will most likely have a reference to the original collection, and that will make the collection reachable as well.

    It could be argued that that would be a JVM bug, but I don’t think so. The JLS and JVMS make no strong statements about whether / when the JVM should detect that an local variable (or temporary variable / register) used a method call is no longer reachable.


    But I really think that Bohemian has given you the best answer. (No. I don’t think he was being funny.)

    If you are having to micro-optimize this to squeeze your problem into memory footprint of your current (small) heap, then the simple solution is to make the heap bigger.

    As you noted the various clever things that you could do to optimize storage usage (e.g. by clearing things) could actually break the application or make it harder to maintain.

    (And your MemoryTest example is a good illustration of how clever optimizations may fail. There is stuff happening behind the scenes that is difficult to predict.)

    Login or Signup to reply.
  3. I am only posting this since it’s too big for a comment:

    import java.util.Arrays;
    import java.util.Collection;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;
    
    public class MemoryTest {
    
        public static void main(String[] args) {
            new MemoryTest().doSomething(List.of(new HeavyObject(), new HeavyObject(), new HeavyObject()));
        }
    
    
        static class HeavyObject {
            int[] oneGigabyte = IntStream.range(0, 256_000_000).toArray();
            public int[] getGig() {
                return oneGigabyte;
            }
        }
    
        private int[] skynet(int[] in) {
            // perform out-of-this-world artificial intelligence computation
            return Arrays.stream(in)
                    .map(x -> x >> 1)
                    .toArray();
        }
    
    
        void doSomething(Collection<HeavyObject> input) {
            Collection<int[]> doubleMemoryUsage = input.stream().map(HeavyObject::getGig).map(this::skynet).collect(Collectors
                    .toList());
    
            input = null;
    
            Collection<int[]> tripleMemoryUsage = doubleMemoryUsage.stream().map(this::skynet).collect(Collectors.toList());
    
            double sum = tripleMemoryUsage.stream().flatMapToDouble(array -> Arrays.stream(array).asDoubleStream()).sum();
            System.out.println("sum = " + sum);
        }
    
    }
    

    And run this with java -Xms6g -Xmx6g MemoryTest.java. It will work.

    Now comment that input = null; and run it: it will fail.

    Running this on jdk-15 btw.


    In theory even if you change the method to:

    void doSomething() {
        Collection<HeavyObject> input = List.of(new HeavyObject(), new HeavyObject(), new HeavyObject());
        Collection<int[]> doubleMemoryUsage = input.stream().map(HeavyObject::getGig).map(this::skynet).collect(Collectors
                .toList());
    
        //input = null;
    
        Collection<int[]> tripleMemoryUsage = doubleMemoryUsage.stream().map(this::skynet).collect(Collectors.toList());
    
        double sum = tripleMemoryUsage.stream().flatMapToDouble(array -> Arrays.stream(array).asDoubleStream()).sum();
        System.out.println("sum = " + sum);
    }
    

    it should not fail either, but it does. Even with ZGC or Shenandoah. I don’t have the knowledge to explain why, though.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search