
I am trying to match the first instance of a word that appears after a certain other word. There may be whitespace and many lines between them.

Here is an example of desired outcomes with the requirement of the word "before" coming before the word "after" and then only matching the first instance.

beforeafter - match

before after - match

beforeafterafter - matches only first after

beforeafter something after - matches only first after

before
(whitespace)
after - match

befafter - no match

before bar foo baz after - match

I believe I am close with this regex

(?<=(before([^after]*)))(after) which utilizes a positive lookbehind.
But the [^after] negates the characters in the set rather than the whole word itself (which leads to the final example above not matching).
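
To make the character-class behavior concrete, here is a minimal sketch that runs the regex above against two inputs. The variable names are only for illustration, and it assumes an engine with lookbehind support (for example a recent Chrome or Node.js).

```javascript
// [^after] is a negated character class: it matches any ONE character
// that is not 'a', 'f', 't', 'e', or 'r'. It does not mean "not the word after".
const attempt = /(?<=(before([^after]*)))(after)/;

// Only spaces separate the two words, and a space is allowed by [^after].
const spaced = "before   after";
// "bar" contains 'a' and 'r', which the class rejects, so the lookbehind fails.
const withWords = "before bar foo baz after";

const spacedIndex = spaced.search(attempt);       // 9
const withWordsIndex = withWords.search(attempt); // -1
console.log(spacedIndex, withWordsIndex);
```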

It’s almost like I want a "positive look behind" (aka look for "before") and then between that and the word I desire, I want a negative look behind of my initial word (i.e. "after").

When I tried that, I also got something close but not quite: (?<=(before(.|\s)*(?<!(after))))after. This wrongly selects a later "after" whenever it is not directly adjacent to the first "after".
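
A small sketch of what the second attempt does when run globally. The helper name indexes is hypothetical, and lookbehind support is assumed. The negative lookbehind only checks the five characters immediately before each candidate, so a later "after" is rejected only when it directly follows another "after".

```javascript
const attempt2 = /(?<=(before(.|\s)*(?<!(after))))after/g;

// Collect the start index of every match in the text.
const indexes = (text) => [...text.matchAll(attempt2)].map((m) => m.index);

// The second "after" is preceded by "hing ", not "after", so it matches too.
const separated = indexes("beforeafter something after"); // [6, 22]
// The second "after" is directly preceded by "after", so it is rejected.
const adjacent = indexes("beforeafterafter");              // [6]
console.log(separated, adjacent);
```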

3 Answers


  1. You don’t need a regex for this. You can just use indexOf.

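    // Find "before" first; only then look for "after" past the end of it.
    // Without the -1 check, a missing "before" would restart the search at index 0.
    const findAfter = (text) => {
      const start = text.indexOf("before");
      return start === -1 ? -1 : text.indexOf("after", start + "before".length);
    };
    
    let test = "beforeafter something after";
    
    console.log(findAfter(test)); // logs 6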
  2. Here’s an example that works in Python:

    import re
    
    def find_word_after_keyword(text, keyword):
        # Lookbehind for the keyword plus one whitespace character, then capture the next word.
        pattern = re.compile(r'(?<=' + re.escape(keyword) + r'\s)\w+')
        match = pattern.search(text)
    
        if match:
            return match.group(0)
        else:
            return None
    
    # Example usage:
    text = "This is a sample text with keyword example and the word to find is after the keyword."
    keyword = "keyword"
    result = find_word_after_keyword(text, keyword)
    
    if result:
        print("Word found:", result)
    else:
        print("Word not found after the keyword.")
    
  3. Just use a positive look behind.

    The key is to put .* between before and after inside the lookbehind, so that any characters can separate the two words. To let . match line breaks as well, add the s flag to the regexp.

    The same task can be achieved with String::indexOf(), which is about 5x faster in the benchmark below.

    const strings = `beforeafter 
    before after
    beforeafterafter
    beforeafter something after
    before(whitespace)after
    befafter
    before bar foo baz after`.split('\n').map(str => str.replace('(whitespace)', '\n'));
    
    const lookAfter = (str, before, after) => {
      const idx = str.indexOf(before);
      return idx >= 0 ? str.indexOf(after, idx + before.length) : -1;
    }
    
    strings.forEach(str => console.log(str, '=>', 
      str.search(/(?<=before.*)after/s), // search with a regexp
      lookAfter(str, 'before', 'after') // search with String::indexOf()
    ));
    Chrome/120
    -------------------------------------------------------
    indexOf()   1.00x  |  x1000000  133  138  140  142  145
    regex       5.01x  |  x1000000  666  681  686  696  697
    -------------------------------------------------------
    https://github.com/silentmantra/benchmark
    
    const strings = `beforeafter 
    before after
    beforeafterafter
    beforeafter something after
    before(whitespace)after
    befafter
    before bar foo baz after`.split('\n').map(str => str.replace('(whitespace)', '\n'));
    
    const lookAfter = (str, before, after) => {
      const idx = str.indexOf(before);
      if(idx >= 0) return str.indexOf(after, idx + before.length);
      return -1;
    }
    
    // @benchmark indexOf()
    strings.map(str => {
      const idx = str.indexOf('before');
      return idx >= 0 ? str.indexOf('after', idx + 'before'.length) : -1;
    });
    
    
    // @benchmark regex
    strings.map(str =>str.search(/(?<=before.*)after/s));
    
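
For engines without lookbehind support, the same result can be sketched with a lazy match and a little index arithmetic. This is only an illustration under that assumption, not one of the answers above; firstAfterIndex and samples are hypothetical names, and the sample strings are the cases listed in the question.

```javascript
// Match from the first "before" lazily up to the nearest following "after",
// then work out where that "after" starts. Returns -1 when there is no match.
const firstAfterIndex = (text) => {
  // [\s\S] matches any character including line breaks, so no s flag is needed.
  const m = /before[\s\S]*?after/.exec(text);
  return m ? m.index + m[0].length - "after".length : -1;
};

const samples = [
  "beforeafter",
  "before after",
  "beforeafterafter",
  "beforeafter something after",
  "before\n \nafter",
  "befafter",
  "before bar foo baz after",
];

const results = samples.map(firstAfterIndex);
console.log(results); // [6, 7, 6, 6, 9, -1, 19]
```

Because the quantifier is lazy, the match always stops at the first "after" that follows "before", which is the "only the first instance" requirement from the question.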