
I’m having trouble understanding the code below. Its performance worsens significantly after 2^23 (8,388,608) items. I was thinking it might have something to do with how JSObject is implemented, but it turns out that if I change the string keys to some sort of hashes, the performance is actually pretty good.

I want to understand how V8 works under the hood with this example:

function foo() {
    const result = Object.create(null);
    for (let i = 0; i < 25_000_000; i += 1) {
        console.time(`Prop ${i}`);
        result[`prop_${i}`] = i + (Math.random() * 25_000_000);
        console.timeEnd(`Prop ${i}`);
    }
    return result;
}


const a = foo();

%DebugPrint(a);

console.log(a[`prop_24000000`]);
console.log(Object.keys(a).length);

To run it:

$ node --allow-natives-syntax file.js

The recorded time is around 0.001 ms before 2^23 items, but after that, it increases to approximately 1.5 seconds per iteration.

I also want to profile it to see if the theories are correct. If you have any theories, please let me know how I can profile it.

Thanks in advance!

2 Answers


  1. V8 stores object layouts in hidden "classes" (also called shapes). A class is defined by the unique sequence of keys inserted into an object. Almost every time you add a new key, the object branches off to a new class describing the extended layout. If you know linked lists from other programming languages, that is roughly how these classes are chained together in memory.

    For example:

    const obj = { a: 1 } // Class 1 Object
    obj.b = 2; // Class 2 Object
    // Here, obj's class is replaced by Class 2
    // Class 2's memory structure is
    // Keys: b
    // Values: 2
    // extends: Class 1 Object (pointer to the previous instantiation of Class 1)
    
    // Class 1's memory structure:
    // Keys: a
    // Values: 1
    // extends: nothing.
    

    If this process happens every time you add a new key, you can imagine how many classes end up in memory. I don’t know specifically why an insert takes over 1.5 seconds, but it probably has to do with the amount of memory in use, plus some kind of per-insertion analysis V8 does on the object before adding another class, which could be O(n). V8 is also limited to about 2-4 GB of heap because of its pointer compression, so that could also play a big role when manipulating such a big object.
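    The class mechanism above can be sketched in plain code. The comments state what V8 does internally; the `makeAB`/`makeBA` names are illustrative, and the sharing itself is only directly observable with `%HaveSameMap` under `node --allow-natives-syntax`:

```javascript
// Two objects built with the same key insertion order walk the same
// hidden-class transition chain, so they end up sharing one class.
// Reversing the insertion order produces a different chain.
function makeAB() { const o = {}; o.a = 1; o.b = 2; return o; }
function makeBA() { const o = {}; o.b = 2; o.a = 1; return o; }

const x = makeAB();
const y = makeAB(); // shares x's hidden class (same keys, same order)
const z = makeBA(); // different hidden class, despite equal contents

// Under --allow-natives-syntax you could verify:
//   %HaveSameMap(x, y)  -> true
//   %HaveSameMap(x, z)  -> false
```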

    This is why the Map() class exists: a real hash map doesn’t use hidden classes at all.
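    A minimal sketch of the alternative, assuming the same `prop_N` key scheme as the question (scaled down here for brevity):

```javascript
// A Map is backed by a real hash table, so inserting many keys never
// triggers hidden-class transitions the way object properties do.
const m = new Map();
for (let i = 0; i < 1_000; i += 1) {
  m.set(`prop_${i}`, i);
}

console.log(m.get("prop_999")); // 999
console.log(m.size);            // 1000
```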

    Of course, this includes many simplifications and some speculation on my part. I would encourage you to find out yourself with the V8 Blog!

    https://v8.dev/blog/fast-properties

    https://v8.dev/blog/oilpan-pointer-compression

    https://v8.dev/blog/fast-super

  2. (V8 developer here.)

    V8 represents large objects as dictionaries under the hood. (That’s because its hidden-class system isn’t meant for huge objects and wouldn’t work well for them, so it has an internal limit of tracking up to 1020 properties, slightly less than 2**10.)

    JavaScript as a language requires object properties to be iterated in insertion order, so when an engine chooses to use a dictionary representation for an object, it must store the insertion/enumeration index with each entry (because dictionaries, by nature, arbitrarily reshuffle their entries internally). V8 stores the dictionary’s next insertion index (that the next entry will get) in a bit field that happens to be 23 bits wide (out of a 32-bit wide slot; the other bits are used for other stuff). When that’s exceeded, the next enumeration index needs to be computed every time by walking the entire dictionary. That’s pretty slow (which is why it’s normally avoided with the cached "next index").
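    The ordering requirement mentioned above is easy to see in a small example. Per the language spec, integer-like keys come first in ascending numeric order, followed by string keys in insertion order, and a dictionary-mode object must keep per-entry enumeration indices to reproduce this:

```javascript
// String keys keep insertion order; integer-like keys are always
// enumerated first, in ascending numeric order.
const o = {};
o.b = 1;
o.a = 2;
o["2"] = 3;
o["1"] = 4;

console.log(Object.keys(o)); // [ '1', '2', 'b', 'a' ]
```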

    This is clearly unfortunate, but it’s also assumed to be very rare in practice, as most objects don’t get that big. I don’t know how feasible it would be to do something about this performance cliff.
    In the meantime, your best bet is probably to use a Map, which scales better to large sizes. If you need even more entries than a Map can hold, consider a sharded system, i.e. multiple smaller Maps (or objects) each being responsible for part of your index space.
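    A hedged sketch of the sharding idea: the `ShardedMap` class, the `SHARDS` count, and the string hash below are all illustrative choices, not anything V8 or Node provides. The point is only that each inner Map stays well below the sizes where things get slow:

```javascript
// Split the key space across several smaller Maps, picking a shard
// by hashing the key string.
const SHARDS = 16; // illustrative; tune to your key count

class ShardedMap {
  constructor() {
    this.shards = Array.from({ length: SHARDS }, () => new Map());
  }
  shardFor(key) {
    // Simple 32-bit string hash, used only to pick a shard.
    let h = 0;
    for (let i = 0; i < key.length; i += 1) {
      h = (h * 31 + key.charCodeAt(i)) >>> 0;
    }
    return this.shards[h % SHARDS];
  }
  set(key, value) { this.shardFor(key).set(key, value); return this; }
  get(key) { return this.shardFor(key).get(key); }
  get size() { return this.shards.reduce((n, m) => n + m.size, 0); }
}

const sm = new ShardedMap();
for (let i = 0; i < 1_000; i += 1) sm.set(`prop_${i}`, i);

console.log(sm.get("prop_999")); // 999
console.log(sm.size);            // 1000
```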

    If you insist on profiling this yourself, you can use the d8 shell’s --prof option with a suitable test case, and then your platform’s "tick processor" script (e.g. for Linux). You’ll see certain C++ functions high up on the profile that have to do with Runtime_SetKeyedProperty and EnumIndexComparator. You can then use code search to understand the mechanisms that are at work under the hood, in particular BaseNameDictionary::NextEnumerationIndex.
