
I have an "annotations" collection in MongoDB, which contains 5 million documents. The collection size is almost 2.5 GB, and its index size is 55 MB.

I was trying to store the whole collection in a variable:

const Annotation = await database.collection("annotations").find().toArray();

But whenever I tried to run the application, it crashed with this error:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

 1: 00007FF6351E7C4F v8::internal::CodeObjectRegistry::~CodeObjectRegistry+114207
 2: 00007FF635175EC6 DSA_meth_get_flags+65542
 3: 00007FF635176D7D node::OnFatalError+301
 4: 00007FF635AAB6CE v8::Isolate::ReportExternalAllocationLimitReached+94
 5: 00007FF635A95CAD v8::SharedArrayBuffer::Externalize+781
 6: 00007FF63593907C v8::internal::Heap::EphemeronKeyWriteBarrierFromCode+1468
 7: 00007FF635945D29 v8::internal::Heap::PublishPendingAllocations+1129
 8: 00007FF635942CFA v8::internal::Heap::PageFlagsAreConsistent+2842
 9: 00007FF635935959 v8::internal::Heap::CollectGarbage+2137
10: 00007FF63593E21B v8::internal::Heap::GlobalSizeOfObjects+331
11: 00007FF63598498B v8::internal::StackGuard::HandleInterrupts+891
12: 00007FF63568C3C6 v8::internal::DateCache::Weekday+8038
13: 00007FF635B393C1 v8::internal::SetupIsolateDelegate::SetupHeap+494417
14: 000001F1E83C5EC9

I tried to fix it by using this command, but it still gives me the same error.

$env:NODE_OPTIONS="--max-old-space-size=8192"

Can someone suggest how I can easily manage such a large dataset in my application?

2 Answers


  1. The best way to deal with a large amount of data is to handle it as a stream, so you never need to load the entire dataset into memory at once. You can try something like this:

    const cursor = database.collection("annotations").find();
    for await (const annotation of cursor) {
        // do something with the annotation
    }
    
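    If you prefer working with Node's stream API, the driver's cursor can also be consumed as a Readable stream. This is only a minimal sketch, assuming the official "mongodb" Node.js driver and the connected `database` handle from your question:

    const { pipeline } = require("node:stream/promises");
    const { Writable } = require("node:stream");

    // The cursor is consumed as a stream, so only a small batch of
    // documents is held in memory at any moment.
    const cursorStream = database.collection("annotations").find().stream();

    await pipeline(
        cursorStream,
        new Writable({
            objectMode: true,
            write(annotation, _encoding, callback) {
                // do something with the annotation, then signal completion
                callback();
            },
        })
    );
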
  2. In your place, I would ask myself whether I really need to fetch all this data and hold it in memory at once.

    If you are going to process it, usually only some fields are needed at a time, so you can project just those. Alternatively, do the processing/analytics in the database itself, which is usually much faster because the data does not have to move out of the database into your variable. A minimal sketch of both ideas follows.
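
    For example (assuming the official "mongodb" Node.js driver and the connected `database` handle from the question; the fields `label` and `page` are only placeholders for whatever your documents actually contain):

    // Project only the fields you need instead of whole documents.
    const projected = database
        .collection("annotations")
        .find({}, { projection: { label: 1, page: 1, _id: 0 } });

    // Or push the work into the database with an aggregation pipeline,
    // e.g. counting annotations per label instead of pulling them all out.
    const countsByLabel = await database
        .collection("annotations")
        .aggregate([{ $group: { _id: "$label", count: { $sum: 1 } } }])
        .toArray();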
