
I have a large string (thousands of words) that I want to compare against every element of an array, which also contains large strings, to find every match of 3 or more consecutive words. I implemented this with a regex, but the array of matches comes back empty.

Example with smaller text:

let textToCompare = "Hello there how are you doing with your life";

let textsToCompareWith= [
  { id:1, text:"Hope you are doing good with your life" },
  { id:2, text:"what are you doing with your life. hello there how are you" },
  { id:3, text:"hello there mate" }
];

Expected Output:

[
  {id:1, matchedText:["with your life"]}, 
  {id:2, matchedText:["are you doing with your life","hello there how are you"]},
  {id:3, matchedText:[]}
];

Current Output:

[
  {id:1, matchedText:[]}, 
  {id:2, matchedText:[]},
  {id:3, matchedText:[]}
];

My Code:

let regex = new RegExp("\\b" + textToCompare.split(" ").join("\\b.*\\b") + "\\b", "gi");

let output = textsToCompareWith.map(textObj => {
  // Match against each element in the array
  let matchedText = textObj?.text.match(regex);
  console.log(matchedText);
  return {
    id: textObj.id,
    matchedText: matchedText ? matchedText : [] // Return an empty array if no match is found
  };
});
                                  
console.log(output);

2 Answers


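  1. You could compare each word of one text with each word of the other and keep track of the position where each match ends, so that a shorter match ending at the same word is not reported again.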

    const
        compare = (w1, w2) => {
            const
                result = [],
                ends = {};
            
            for (let i = 0; i < w1.length; i++) {
                for (let j = 0; j < w2.length; j++) {
                    if (w1[i] !== w2[j]) continue;
                    let k = 0;
                    while (i + k < w1.length && j + k < w2.length) {
                        if (w1[i + k] !== w2[j + k]) break;
                        k++;
                    }
                    if (k > 2 && !ends[j + k]) {
                        result.push(w2.slice(j, j + k).join(' '));
                        ends[j + k] = true;
                    }
                }
            }
            return result;
        },
        lower = s => s.toLowerCase(),
        textToCompare = "Hello there how are you doing with your life",
        textsToCompareWith = [{ id: 1, text: "Hope you are doing good with your life" }, { id: 2, text: "what are you doing with your life. hello there how are you" }, { id: 3, text: "hello there mate" }],
        words = textToCompare.match(/\w+/g).map(lower),
        result = textsToCompareWith.map(({ id, text }) => ({
            id,
            matchedText: compare(words, text.match(/\w+/g).map(lower))
        }));
    
    console.log(result);

    A slightly different approach that skips words already used in an earlier match.

    const
        compare = (w1, w2) => {
            const
                result = [],
                skip = {};
            
            for (let i = 0; i < w1.length; i++) {
                for (let j = 0; j < w2.length; j++) {
                    if (skip[j] || w1[i] !== w2[j]) continue;
                    let k = 0;
                    while (i + k < w1.length && j + k < w2.length) {
                        if (w1[i + k] !== w2[j + k]) break;
                        k++;
                    }
                    if (k > 2) {
                        result.push(w2.slice(j, j + k).join(' '));
                        while (k--) skip[j + k] = true;
                    }
                }
            }
            return result;
        },
        lower = s => s.toLowerCase(),
        textToCompare = "Hello there how are you doing with your life",
        textsToCompareWith = [{ id: 1, text: "Hope you are doing good with your life" }, { id: 2, text: "what are you doing with your life. hello there how are you" }, { id: 3, text: "hello there mate" }],
        words = textToCompare.match(/\w+/g).map(lower),
        result = textsToCompareWith.map(({ id, text }) => ({
            id,
            matchedText: compare(words, text.match(/\w+/g).map(lower))
        }));
    
    console.log(result);
  2. I created an answer just for my own learning of JavaScript. Piecing stuff together, I came up with:

    let textToCompare = "Hello there how are you doing with your life";
    let words = textToCompare.split(/\s+/);
    let x = words.length;
    let textsToCompareWith= [
      { id:1, text:"Hope you are doing good with your life" },
      { id:2, text:"what are you doing with your life. hello there how are you" },
      { id:3, text:"hello there mate" }
    ];
    
    let combos = [...chunks(words)];
    combos.sort(function(a, b){return b.length - a.length});
    
    console.log(textsToCompareWith.map(({ id, text }) => ({id, matchedText: FindMatches(text)})));
    
    function* chunks(arr) {
        for (let i = 0; i < x-2; i++) {
            for (let j = i+3; j < x+1; j++) {
                yield arr.slice(i,j).join(" ");
            }
        }
    }
    
    function FindMatches(s) {
        var r = [];
        for (let i = 0; i < combos.length; i++) {
            const re = new RegExp(`\\b${combos[i]}\\b`, 'i');
            if (re.test(s)) {
                r.push(combos[i]);
                s = s.replace(re, ' ');
            } 
        }
        return r;
    }

    I’m pretty sure this code has many flaws and looks clunky, but the idea is to split your input into chunks of 3 or more words, on the assumption that it can be split on whitespace. I then sorted the resulting array by length, longest first, so that shorter substrings are not found before the longer chunks that contain them.
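
    To make the chunking step concrete, here is a minimal sketch of it on a four-word input. The names `wordChunks`, `escapeRegExp`, `sample` and `sampleChunks` are illustrative and not part of the code above. `escapeRegExp` is an extra assumption on my part: it keeps characters such as `.` or `?` in the input from being read as regex syntax when a chunk is turned into a pattern.

```javascript
// Hypothetical helper: yield every run of `min` or more consecutive words.
function* wordChunks(words, min = 3) {
    for (let i = 0; i + min <= words.length; i++) {
        for (let j = i + min; j <= words.length; j++) {
            yield words.slice(i, j).join(" ");
        }
    }
}

// Hypothetical helper: escape regex metacharacters so a chunk matches literally.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

const sample = "how are you doing".split(/\s+/);

// Longest chunks first (by character count, as in the answer above),
// so a long match is tried before the shorter chunks it contains.
const sampleChunks = [...wordChunks(sample)].sort((a, b) => b.length - a.length);
console.log(sampleChunks);
// [ 'how are you doing', 'are you doing', 'how are you' ]

// Build a case-insensitive, word-bounded pattern from the longest chunk.
const re = new RegExp(`\\b${escapeRegExp(sampleChunks[0])}\\b`, "i");
console.log(re.test("So, HOW are you doing today?")); // true
```

    Note that the number of chunks grows quadratically with the number of words, so for inputs of thousands of words this approach may become slow.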

    Who knows, maybe something in here is actually usable.
