skip to Main Content

I’m trying to find a fast way to search for partial strings in array elements and put them into a new array if the input value is greater than 2 characters.

Ultimately, this array will be used to create a list as results are found, a local data search.

This works okay but the arrays can be quite large (thousands of entries) and was looking for the best way to approach this that won’t end up a performance problem.
 

<html>
    <body>
        <input id="test">
    </body>
    <script>

        let test = ['Fruits: Apples', 'Fruits: Oranges', 'Fruits: Banannas', 'Fruits: Lemons', 'Vegetables: Corn', 'Vegetables: Potatoes']
        let found = []

        document.querySelector('#test').addEventListener('keyup', e => {
            let value = e.currentTarget.value
            if (value.length > 2) {
                let index = test.findIndex(i => i.includes(value))
                test.forEach(v => {
                    if(v.includes(value) && !found.includes(v))
                        found.push(v)
                })
            } else {
                found = []
            }
            console.log(found)
        })
    </script>
</html>

2

Answers


  1. First of all, you should examine, if it is really a performance problem, because as Don Knuth once said "Premature optimization is the root of all evil".

    Unless you create a fulltext index over your test array, there is no other way than looking at each element in the values range if it contains your search teams.

    But you could use a Set instead of an array for your found collection. This way you can just do

    let found = new Set();
    
    ...
    
    for (let v of test) {
      if (v.includes(value)) found.add(value);
    }
    

    The duplicate check on a Set is much faster than in an array, and will be done automatically by the add() function.

    As a further optimization you could consider restricting the collection of values you need to search, when your add additional characters to your searchterm. Ie when your searchterm first was Fru and now is Fruits, matches can only come from your previous found set, because a term, that doesn’t contain Fru can’t contain Fruits. Not sure if this suggestion meets your usecase, though … And it will of course only work for follow up searches under certain conditions.

    Login or Signup to reply.
    1. Convert your array to set and back. Thus you will have a unique array and no need to check values later.
    2. Use Array::filter() just filter the array.
    3. Also you could use a precompiled regex to make the search case insensitive.
    let test = ['Fruits: Apples', 'Fruits: Oranges', 'Fruits: Banannas', 'Fruits: Lemons', 'Vegetables: Corn', 'Vegetables: Potatoes']
    let unique = [...new Set(test)];
    let found = []
    
    document.querySelector('#test').addEventListener('keyup', e => {
        let value = e.currentTarget.value
        if (value.length > 2) {
            const regex = new RegExp(value, 'i');
            found = unique.filter(v => regex.test(v));
        } else {
            found = []
        }
        console.log(found)
    })
    <input id="test">

    A more complex and potentially more performant option could be to build a suffix array and binary search it:

    const source = ['Fruits: Apples', 'Fruits: Oranges', 'Fruits: Banannas', 'Fruits: Lemons', 'Vegetables: Corn', 'Vegetables: Potatoes'];
    const test = [...new Set(source)].map(v => v.toLowerCase());
    
    // build a suffix array
    let suff = test.reduce((r, v, i) => ([...[...v].keys()].forEach(idx => r.push([i, idx])), r), []).sort((a, b) => {
      a = test[a[0]].slice(a[1]);
      b = test[b[0]].slice(b[1]);
      return a > b ? 1 : -1;
    });
    
    const found = [];
    
    document.querySelector('#test').addEventListener('keyup', e => {
        let value = e.currentTarget.value.toLowerCase();
        found.length = 0;
        if (value.length > 2) {
          // binary search the suffix array
          let r = -1;
          let low = 0, high = suff.length, mid;
          while (low <= high){
            mid = (low + high) / 2 | 0;
            const [i, idx] = suff[mid];
            const v = test[i].slice(idx);
            if ( v.startsWith(value)) {
               r = mid;
               break;
            } else if (value > v)
              low = mid + 1;
            else            
              high = mid - 1;
          }
          if(r >= 0){
            // check neighbor suffix array values
            const idxs = [r];
            let n = r;
            let i, idx, v;
            while(([i, idx] = suff[++n]) && (v = test[i].slice(idx)) && v.startsWith(value)) idxs.push(n);
            n = r;
            while(([i, idx] = suff[--n]) && (v = test[i].slice(idx)) && v.startsWith(value)) idxs.push(n);
            found.push(...idxs.map(i => source[suff[i][0]]));
          }
        }
        console.log(found)
    })
    <input id="test">
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search