I am trying to create a regex that checks whether a tag is inside the text, but a couple of tags (out of several thousand; don’t ask me why, an SEO expert told my client it’s good) end with parentheses.
The regex works great for normal tags but fails outright on tags with parentheses. The match has to be exact, so I am forced to use a word boundary (\b). Is there a way to allow this?
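To make the failure concrete, here is a minimal Python sketch. It assumes the original regex simply wrapped the escaped tag in \b on both sides (the actual regex is only behind the regex101 link), and the function name has_tag_word_boundary is hypothetical:

```python
import re

def has_tag_word_boundary(tag: str, text: str) -> bool:
    # Assumed reconstruction of the original approach: \b on both sides of the tag.
    pattern = r"\b" + re.escape(tag) + r"\b"
    return re.search(pattern, text) is not None

# A plain tag behaves as expected.
print(has_tag_word_boundary("jadad", "lipsum is a dummy jadad text"))        # True
# A tag ending in ")" breaks: \b after ")" needs a word character to follow it.
print(has_tag_word_boundary("text (jadad)", "lipsum is a dummy text (jadad)"))     # False
print(has_tag_word_boundary("text (jadad)", "lipsum is a dummy text (jadad)asd"))  # True (wrong)
```

The trailing \b behaves exactly backwards for such tags: it rejects the tag at the end of the text and accepts it when letters are glued on.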
Here is the original regex I used:
https://regex101.com/r/wN9jO8/1
This is what I tried (yes, I am not good with regexes, but I tried googling and could not find anything really useful):
https://regex101.com/r/wN9jO8/2
I also tried modifying the word boundary, but it did not work correctly (it always matched one extra character of the string before and after the tag).
Basically, for the tag "text (jadad)":
lipsum is a dummy text (jadad) alsdasldk. // match
lipsum is a dummy text (jadad). // match
lipsum is a dummy text (jadad) // match
lipsum is a dummy (text (jadad)) // match
lipsum is a dummy text (jadad // should not match
lipsum is a dummy text jadad) // should not match
lipsum is a dummy text (jadad)asd // should not match
The main problem is that it has to work for tags both with and without parentheses, and ideally be easy to extend to other unusual characters in tags ([, >, or tags ending with ., ? or !).
I am really lost right now. If you need any more info, just comment and I will try to add it in.
Thanks for the help.
3 Answers
I can’t see the regex101 link because I’m on my phone, but maybe this is what you are looking for?
http://www.phpliveregex.com/p/fo9
Edit:
Try this: http://www.phpliveregex.com/p/fob
Edit 2:
http://www.phpliveregex.com/p/foc
Edit 3:
With the text (jadad) tag:
http://www.phpliveregex.com/p/fod
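You can use a negative lookahead (?!\w) (meaning the next position is not a word character). Note that you cannot use \b here, because \b cannot assert a boundary after ), which is considered a non-word character (a \b placed right after ) only succeeds when a word character follows it).
Updated Regex Demo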
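A minimal Python sketch of this lookahead approach, checked against the test cases from the question. The function name has_tag is hypothetical; Python's re module is used here, and \b and (?!\w) have the same meaning in the PCRE flavor behind the regex101/phpliveregex demos:

```python
import re

def has_tag(tag: str, text: str) -> bool:
    # \b before the tag (these tags start with a word character);
    # (?!\w) after it: the next character must not be a word character,
    # which also holds at the very end of the string.
    pattern = r"\b" + re.escape(tag) + r"(?!\w)"
    return re.search(pattern, text) is not None

cases = [
    ("lipsum is a dummy text (jadad) alsdasldk.", True),
    ("lipsum is a dummy text (jadad).", True),
    ("lipsum is a dummy text (jadad)", True),
    ("lipsum is a dummy (text (jadad))", True),
    ("lipsum is a dummy text (jadad", False),
    ("lipsum is a dummy text jadad)", False),
    ("lipsum is a dummy text (jadad)asd", False),
]

for text, expected in cases:
    assert has_tag("text (jadad)", text) == expected, text
print("all cases pass")
```

The same function still works for plain tags: has_tag("jadad", "a jadad b") is True, while has_tag("jadad", "jadads") is False.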
I think this is what you’re looking for:
DEMO
\b is equivalent to (?<!\w)(?=\w)|(?<=\w)(?!\w): a position that is either followed by a word character and not preceded by one (beginning of word), or preceded by a word character and not followed by one (end of word). You’ve got a “word” that ends with a non-word character, so you have to drop the (?<=\w) part of that word boundary.
Depending on your needs, you may want to change the first \b to (?<!\w). Also, be aware that \w includes digits and underscores (_); if that doesn’t suit your needs, you can use a character class instead, e.g. (?![A-Za-z0-9]).
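The following Python sketch applies both suggestions from this answer: (?<!...) replaces the leading \b and (?!...) the trailing one, so one pattern covers tags that start or end with any character ([, ?, ), and so on). The helpers tag_pattern and has_tag and the word_chars parameter are hypothetical; word_chars must match exactly one character, because Python lookbehinds have to be fixed width:

```python
import re

def tag_pattern(tag: str, word_chars: str = r"\w") -> re.Pattern:
    # word_chars is a regex fragment for "characters a tag must not touch",
    # e.g. r"\w" (letters, digits, _) or "[A-Za-z0-9]" (lets _ act as a separator).
    # (?<!...) replaces the leading \b and (?!...) the trailing one, so the pattern
    # works whether the tag starts/ends with a word character or not.
    return re.compile(
        "(?<!" + word_chars + ")" + re.escape(tag) + "(?!" + word_chars + ")"
    )

def has_tag(tag: str, text: str, word_chars: str = r"\w") -> bool:
    return tag_pattern(tag, word_chars).search(text) is not None

print(has_tag("text (jadad)", "lipsum is a dummy (text (jadad))"))  # True
print(has_tag("what?", "is it what? yes"))                          # True
print(has_tag("what?", "somewhat? no"))                             # False
print(has_tag("[foo", "see [food"))                                 # False
print(has_tag("jadad", "x_jadad_y"))                                # False
print(has_tag("jadad", "x_jadad_y", "[A-Za-z0-9]"))                 # True
```

Because re.escape handles the special characters, supporting a new "weird" character needs no pattern changes. With thousands of tags, it is probably worth compiling each pattern once and reusing it.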