skip to Main Content

I need to iterate over a large set of records and count the number of distinct values in a particular field:

>> const cats = [
    {name: 'Sophie', color: 'black'},
    {name: 'Moo',    color: 'grey'},
    {name: 'Tipper', color: 'black'}
  ]

>> countBy(cats, 'color')
Map('black' => 2, 'grey' => 1) # (precise data structure unimportant)

Hey, that’s what lodash countBy is for! But in my application, performance is paramount. In my (very informal) benchmark, this simple code beats lodash by around 50%:

function countBy<R extends object>(records: R[], key: keyof R) {
  const counts = new Map<keyof R, number>()
  for (const i = 0; i < records.length; i++) {
    const value = records[i][key]
    counts.set(value, (counts.get(value) || 0) + 1)
  }
}

(This is perhaps unsurprising since it doesn’t need to handle fancier "iteratees" than keyof R.)

Question: is there any way to beat this algorithm, speed-wise?

2

Answers


  1. It totally depends on the amount of data on which we are going to perform the operation. I think using Lodash or vanilla JS both are efficient way.

    But after checking the performance in the JS performance benchmark tool, I found that Vanilla JS code is more performance efficient as compare to Lodash one.

    So just for the demo and testing purpose, I used Array.reduce() method to count the color from the cats array.

    const cats = [
      {name: 'Sophie', color: 'black'},
      {name: 'Moo',    color: 'grey'},
      {name: 'Tipper', color: 'black'}
    ];
    
    function countBy(arr, prop) {
        return arr.reduce((total, current) => {
        total[current[prop]] = (total[current[prop]] || 0) + 1;
        return total;
      }, {})
    }
    
    const result = countBy(cats, 'color');
    
    console.log(result);

    And after compare it with the Lodash solution.

    const cats = [
      {name: 'Sophie', color: 'black'},
      {name: 'Moo',    color: 'grey'},
      {name: 'Tipper', color: 'black'}
    ];
    
    const result = _.countBy(cats, 'color')
    
    console.log(result)
    <script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.js"></script>

    I found that lodash is slower than Vanilla JS. Here is the screenshot from jsbench.

    enter image description here

    Login or Signup to reply.
  2. There’s not much you can do to avoid the O(n) nature of this. What you can do is:

    • test different variations of the loop (for, for-of, forEach, etc)
    • how you access the requested property
    • how you create new entries when you find a new prop

    It will most likely depend on which browser(s) you need to support, or which Node.js/Deno/Bun version you use.

    The TypeScript aspect of this won’t matter too much in terms of performance, the same function signature can be used regardless of the implementation, but you could tweak it a bit in my mind – you don’t really need to use Map:

    function countBy<T extends { [K in keyof T]: T[K] }, K extends keyof T>(
      list: T[],
      prop: K
    ): { [key in T[K]]: number } {
      const acc = {} as { [key in T[K]]: number };
      for (let i = 0; i < list.length; i++) {
        const key = list[i][prop];
        acc[key] = (acc[key] ?? 0) + 1;
      }
      return acc;
    }
    

    Memory usage can possibly play a role, so one other thing to try is to alter the input to rely on some form of streaming mechanism. This, however, would require a different function signature and may introduce an asynchronous aspect.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search