
I’m working with regular expressions, and my task is to find words that contain both the substrings "a" and "ha" in a text file full of random words. Unfortunately, the regex I wrote keeps matching words like hat, which contains the substring ha but no separate a.

This was my regex: \b(?=\w*a)\w*(?=\w*ha)\w*. It matches words like hat and hail when it should only match words that contain both substrings, for example haa and hata. How can I fix this?

P.S. The order of the substrings shouldn’t matter, and the matching is just a general exercise, i.e. not for any particular programming language.

2 Answers


  1. You could use this regex:

    \b(?:\w*ha\w*a|\w*a\w*ha)\w*
    

    which matches:

    • \b : a word boundary
    • (?:\w*ha\w*a|\w*a\w*ha) : either
      • \w* : some number of word characters
      • ha : literal ha
      • \w* : some number of word characters
      • a : literal a; or
      • \w* : some number of word characters
      • a : literal a
      • \w* : some number of word characters
      • ha : literal ha
    • \w* : some number of word characters

    Regex demo on regex101
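
As a quick check of the pattern above, here is a minimal sketch using Python's built-in `re` module. The function name `words_with_ha_and_a` and the sample text are illustrative assumptions, not part of the original answer.

```python
import re

# The pattern from the answer above, with its backslashes written out.
PATTERN = re.compile(r"\b(?:\w*ha\w*a|\w*a\w*ha)\w*")


def words_with_ha_and_a(text):
    """Return every word that contains "ha" plus a separate "a", in either order."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return PATTERN.findall(text)


print(words_with_ha_and_a("hat hail haa hata aha banana"))
# → ['haa', 'hata', 'aha']
```

Words such as hat and hail are skipped because their only a is the one inside ha.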

  2. At the expense of performance, you might consider the following pattern if you expect to expand the requirements in the future or are generating the pattern dynamically. The focus is more on developer convenience than on computational performance.

    /\b\w*(ha|a)\w*(?!\1)(?1)\w*/
    
    • \b\w* will match zero or more word characters before the first satisfying substring.

    • (ha|a) will match either of your alternative substrings as capture group one.

    • \w* will match zero or more word characters.

    • (?!\1) will ensure that the next characters are not the same as the text captured by group one.

    • (?1) will re-execute the alternation in the first capture group (so that you don’t need to repeat and invert it manually).

    • \w* will match zero or more word characters until the end of the word.
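
The `(?1)` construct is a subroutine call, which PCRE-style engines support but Python's built-in `re` module does not. For the "dynamically generated pattern" case, a rough alternative sketch in Python is to build the ordered alternation from the first answer out of a list of substrings. This is a different technique from the pattern above, and `build_pattern` is a hypothetical helper named here only for illustration.

```python
import re
from itertools import permutations


def build_pattern(substrings):
    r"""Compile a regex for words containing every substring, without overlap.

    Hypothetical helper: it emits one alternative per ordering of the
    substrings, joined by \w*, so the alternation grows factorially.
    """
    alternatives = [
        r"\w*".join(re.escape(s) for s in order)
        for order in permutations(substrings)
    ]
    return re.compile(r"\b\w*(?:" + "|".join(alternatives) + r")\w*")


pattern = build_pattern(["ha", "a"])
print(pattern.pattern)
print(pattern.findall("hat hail haa hata aha"))
# → ['haa', 'hata', 'aha']
```

Because every ordering becomes its own alternative, this approach is only practical for a handful of substrings.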
