For large arrays (i.e. 1M elements), is it faster to create and iterate over a Set (from an Array) than an Array – if you just need a single or two iterations to occur ever and performance is the only requirement (no need for deduping the array etc)?
Consider the following two scenarios for illustration of what I mean:
const array = Array.from({length: 1_000_000});
const methodOne = () => { array.forEach(() => {...}) }
const methodTwo = () => { const set = new Set(array); set.forEach(() => {...}) }
// end of program
In browser when timed, this goes in favor of the first approach (somewhat logical). But some evidence, such as this, states that set traversals are much faster than array traversals, so I’m wondering if it’s ever worthwhile to create a Set off an Array just to traverse it one or more times.
I’m guessing it boils down to the exact implementation of the Set constructor, which I can’t find and whether it needs to traverse the Array in order to create a Set. I’m aware that Set is implemented as an ordered hash table of some sort.
2
Answers
Yes it does traverse the array. Essentially it is implemented as
So creating a set by traversing the array, then traversing the set, will always be slower than directly traversing the array.
This approach only might be worth it (for overall runtime) when the set will eliminate some duplicates so the body of the loop over the set gets executed less often. Whether that is significant depends on the number of iterations you can save, what the body does exactly for each element, and how that compares to the overhead of the set creation.
V8 developer here.
I see no reason why that would be the case, and your own measurements don’t indicate so either. So no, it’s safe to assume that iterating over an array directly is faster than first creating a set and then iterating over that.
Just because someone on the internet claims otherwise (or you read their somewhat unclear statements as possibly claiming otherwise) doesn’t make it so.
The exact implementation of the
Set
constructor doesn’t matter. Either way, it’s obvious that it needs to copy all array elements into the set. That’s extra work (cost-wise on the order of a full iteration), which is totally unnecessary if you only want to iterate.For clarification, the one clear case where
Set
has a performance advantage is finding one specific item among a large collection.Array
methods like.find
and.indexOf
need to iterate, which could get lucky and find the right element immediately, but "on average" needs to iterate over half the array’s elements.Set
methods like.get
and.has
are required by the specification to have faster implementations than that; typically they do a hash-based "constant-time" lookup; in simple terms that means they only need to look in one specific place whether the element in question is there or not.That said, even this difference only becomes important for pretty large collections; it’s a common mistake among human programmers to intuitively think they should use a Set whenever they have more than two or three elements, whereas in truth even dead-simple iteration-based implementations tend to scale pretty well (i.e. end up being at least as fast, or even faster) up to several dozen elements.