I’m working with regular expressions, and my task is to find words that contain both of the substrings "a" and "ha" in a text file full of random words. Unfortunately, I can’t get this to work: the regex I wrote keeps matching words like `hat`, which has the substring `ha` but not a separate `a`.
This was my regex: `\b(?=\w*a)\w*(?=\w*ha)\w*`. It matches words like `hat` and `hail` when it should only match words with both substrings, for example `haa` and `hata`. How can I fix this?
PS. The order of the substrings shouldn’t matter, and the matching is just a general exercise, i.e. not for any particular programming language.
2 Answers
You could use this regex:
`\b(?:\w*ha\w*a|\w*a\w*ha)\w*`
which matches:
- `\b`: a word boundary
- `(?:\w*ha\w*a|\w*a\w*ha)`: either
  - `\w*`: some number of word characters
  - `ha`: literal `ha`
  - `\w*`: some number of word characters
  - `a`: literal `a`
  - ; or
  - `\w*`: some number of word characters
  - `a`: literal `a`
  - `\w*`: some number of word characters
  - `ha`: literal `ha`
- `\w*`: some number of word characters
Regex demo on regex101
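As a quick check, the pattern described above can be tried against the sample words from the question. This is only a minimal sketch: the question is language-agnostic, so the use of Python's `re` module and the extra sample word `aha` are assumptions made for illustration.

```python
import re

# Pattern from the question: both lookaheads can be satisfied by the same "ha",
# because the "a" inside "ha" also counts as the required "a".
original = re.compile(r"\b(?=\w*a)\w*(?=\w*ha)\w*")

# Pattern from this answer: "ha" and a separate "a", in either order.
fixed = re.compile(r"\b(?:\w*ha\w*a|\w*a\w*ha)\w*")

# "hat", "hail", "haa" and "hata" come from the question; "aha" is an added example.
text = "hat hail haa hata aha"

print(original.findall(text))  # ['hat', 'hail', 'haa', 'hata', 'aha']
print(fixed.findall(text))     # ['haa', 'hata', 'aha']
```

Because the alternation consumes `ha` and `a` one after the other, the two substrings cannot share a character, which is what the lookahead version in the question fails to enforce.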
At the expense of performance, you might consider the following pattern if you may need to expand the requirements in the future or are generating the pattern dynamically. The focus is more on developer convenience than on computational performance.
`\b\w*(ha|a)\w*(?!\1)(?1)\w*`
- `\b\w*` will match zero or more word characters before the first satisfying substring.
- `(ha|a)` will match your alternate substrings as capture group one.
- `\w*` will match zero or more word characters.
- `(?!\1)` will ensure that the next matched character(s) are not the same as the first capture group.
- `(?1)` will re-execute the alternate expression in the first capture group (so that you don’t need to manually repeat and invert it).
- `\w*` will match zero or more word characters until the end of the word.
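The `(?1)` subroutine call is PCRE-style syntax and is not accepted by Python's built-in `re` module, so the sketch below does not reproduce this pattern. Instead it illustrates the "dynamically generating the pattern" idea with a different technique: building an explicit alternation over every ordering of the substrings. The helper name `build_pattern` is hypothetical and was chosen just for this example.

```python
import re
from itertools import permutations

def build_pattern(substrings):
    """Build a regex for words containing every substring, in any order, without overlap."""
    # One alternative per ordering, e.g. \w*ha\w*a and \w*a\w*ha for ["ha", "a"].
    alternatives = [
        r"\w*" + r"\w*".join(re.escape(s) for s in order)
        for order in permutations(substrings)
    ]
    return r"\b(?:" + "|".join(alternatives) + r")\w*"

pattern = build_pattern(["ha", "a"])
print(pattern)                                       # \b(?:\w*ha\w*a|\w*a\w*ha)\w*
print(re.findall(pattern, "hat hail haa hata aha"))  # ['haa', 'hata', 'aha']
```

The number of alternatives grows factorially with the number of substrings, so this approach is only practical for a small list.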