
I am trying to create a regex that checks whether a tag appears in a text, but a couple of the tags (out of several thousand; don’t ask me why, an SEO expert told my client it’s good) end with parentheses.

The regex works great for normal tags but fails outright on tags with parentheses. The match has to be exact, so I am forced to use a word boundary. Is there a way to allow this?

Here is original regex I used:

https://regex101.com/r/wN9jO8/1

This is what I tried (yes, I am not good with regexes, but I tried googling and could not find anything really useful):

https://regex101.com/r/wN9jO8/2

I also tried modifying the word boundary, but it did not work correctly (it always matched one extra character before and after the tag).
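
To make the failure concrete, here is a minimal sketch in Python's `re` module. It assumes the original attempt simply wrapped the escaped tag in `\b ... \b`; the actual patterns behind the regex101 links are not reproduced here, so treat `naive` as a hypothetical stand-in.

```python
import re

# Assumption: the original attempt was the escaped tag wrapped in \b ... \b.
naive = re.compile(r"\btext \(jadad\)\b")

# After ")" (a non-word character), \b only succeeds if a word character follows,
# so the results are the opposite of what is wanted.
print(naive.search("lipsum is a dummy text (jadad) alsdasldk.") is not None)  # False
print(naive.search("lipsum is a dummy text (jadad)") is not None)             # False
print(naive.search("lipsum is a dummy text (jadad)asd") is not None)          # True
```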

Basically, for the tag text (jadad):

lipsum is a dummy text (jadad) alsdasldk. // match
lipsum is a dummy text (jadad). // match
lipsum is a dummy text (jadad) // match
lipsum is a dummy (text (jadad)) // match

lipsum is a dummy text (jadad // should not match
lipsum is a dummy text jadad) // should not match
lipsum is a dummy text (jadad)asd // should not match

The main problem is that it has to work for tags both with and without parentheses, and ideally be easy to extend to other unusual characters in tags ([ or >, or a tag ending with ., ? or !).

I am really lost right now. If you need any more info, just comment and I will try to add it in.

Thanks for the help.

3 Answers


  1. I can’t see the regex101 links because I’m on my phone, but maybe this is what you are looking for?

     preg_match_all("/(\(.*\))/", $input_lines, $output_array);
    

    http://www.phpliveregex.com/p/fo9

    Edit:

    Try this http://www.phpliveregex.com/p/fob

    Edit2

    http://www.phpliveregex.com/p/foc

    Edit3
    With text (jadad) tag:

    preg_match("/.*text (\(jadad\))[^\w].*/", $input_line, $output_array);
    

    http://www.phpliveregex.com/p/fod

  2. You can use a negative lookahead, (?!\w), which means the next position does not hold a word character. Note that you cannot use \b at the end: ) is a non-word character, so \b after it only succeeds when a word character follows, which is the opposite of what you want:

    \btext \(jadad\)(?!\w)
    

    Updated Regex Demo

  3. I think this is what you’re looking for:

    \btext \(jadad\)(?!\w)
    

    DEMO

    \b is equivalent to (?<!\w)(?=\w)|(?<=\w)(?!\w): a position that is either followed by a word character and not preceded by one (beginning of word), or preceded by a word character and not followed by one (end of word). You’ve got a “word” that ends with a non-word character, so you have to drop the (?<=\w) part of that word boundary.

    Depending on your needs, you may want to change the first \b to (?<!\w). Also, be aware that \w includes digits and underscores (_); if that doesn’t suit your needs, you can use a character class instead, e.g. (?![A-Za-z0-9]).
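
For tags with arbitrary punctuation, the lookaround idea from the answers above can be sketched roughly as follows. This is only an illustration in Python's `re` module rather than the PHP used in the first answer; the lookaround syntax is the same in PCRE, and PHP's `preg_quote` plays the role of `re.escape`. The helper names `tag_pattern` and `contains_tag` are made up for this example.

```python
import re

def tag_pattern(tag):
    # Hypothetical helper: escape the tag so (, ), [, ., ?, ! are matched literally,
    # then require "no word character" on both sides instead of \b.
    return re.compile(r"(?<!\w)" + re.escape(tag) + r"(?!\w)")

def contains_tag(text, tag):
    # Hypothetical helper: True if the tag occurs as a whole unit in the text.
    return tag_pattern(tag).search(text) is not None

# The cases from the question, with the expected outcome for the tag "text (jadad)".
CASES = [
    ("lipsum is a dummy text (jadad) alsdasldk.", True),
    ("lipsum is a dummy text (jadad).", True),
    ("lipsum is a dummy text (jadad)", True),
    ("lipsum is a dummy (text (jadad))", True),
    ("lipsum is a dummy text (jadad", False),
    ("lipsum is a dummy text jadad)", False),
    ("lipsum is a dummy text (jadad)asd", False),
]

for text, expected in CASES:
    print(contains_tag(text, "text (jadad)") == expected, text)
```

Because both ends use lookarounds rather than \b, the same pattern builder should also cope with plain tags and with tags that start or end in other punctuation. If \w is too broad (digits, underscore), the two lookarounds can be swapped for an explicit character class as described above.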
