I need to iterate over a large set of records and count the number of distinct values in a particular field:
>> const cats = [
{name: 'Sophie', color: 'black'},
{name: 'Moo', color: 'grey'},
{name: 'Tipper', color: 'black'}
]
>> countBy(cats, 'color')
Map('black' => 2, 'grey' => 1) # (precise data structure unimportant)
Hey, that’s what lodash countBy is for! But in my application, performance is paramount. In my (very informal) benchmark, this simple code beats lodash by around 50%:
function countBy<R extends object>(records: R[], key: keyof R) {
const counts = new Map<keyof R, number>()
for (const i = 0; i < records.length; i++) {
const value = records[i][key]
counts.set(value, (counts.get(value) || 0) + 1)
}
}
(This is perhaps unsurprising since it doesn’t need to handle fancier "iteratees" than keyof R
.)
Question: is there any way to beat this algorithm, speed-wise?
2
Answers
It totally depends on the amount of data on which we are going to perform the operation. I think using
Lodash
orvanilla JS
both are efficient way.But after checking the performance in the JS performance benchmark tool, I found that Vanilla JS code is more performance efficient as compare to Lodash one.
So just for the demo and testing purpose, I used
Array.reduce()
method to count the color from thecats
array.And after compare it with the
Lodash
solution.I found that lodash is slower than Vanilla JS. Here is the screenshot from
jsbench
.There’s not much you can do to avoid the
O(n)
nature of this. What you can do is:It will most likely depend on which browser(s) you need to support, or which Node.js/Deno/Bun version you use.
The TypeScript aspect of this won’t matter too much in terms of performance, the same function signature can be used regardless of the implementation, but you could tweak it a bit in my mind – you don’t really need to use
Map
:Memory usage can possibly play a role, so one other thing to try is to alter the input to rely on some form of streaming mechanism. This, however, would require a different function signature and may introduce an asynchronous aspect.