I know the /s modifier in regex, but it doesn’t work with my specific case.
For example, I’m trying to create a spam filter that matches URLs with various domains, like this:
https://www.theonlineleaflets.com/u=/544hfb34s21jv335hs/u
Regex:
https://www\..+?/u/\w{18}/u
The problem is that the spammers insert newlines and = symbols randomly, like this:
<area coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/u=
/544hfb34s21jv335hs/u"/>
OR:
<area coords=3D"0,0,1000,1000" href=3D"https://www.netprofessionalbitcoin.=
com/u/565i71cag5hd3kdh3mds/u"/>
OR:
<area coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/=
u/544hfb34s21jv335hs/u"/>
I’m fairly sure newlines cannot be ignored, but I’m asking in case I’m wrong, or in case someone knows a better regex to flag these spammers that would be precise enough.
NOTE: This is for cPanel, so I suppose it’s standard Perl format, and I don’t think it supports modifiers like /s anyway.
UPDATE: It seems the newline always follows the = sign; however, this sign can be anywhere in the URL.
2 Answers
I came up with this regex that takes into account potential newlines.
Basically, I use
(?:\s*?)?
which is an optional, non-capturing, lazy match of any number of whitespace characters, including newlines. If you want to restrict it to just newlines, use \n instead. Here’s a demo.

I have changed your regex to support ‘=’ and whitespace (including newlines). This is the regex:
What I have changed is to use character classes instead of literal matches. That way the ‘=’ and newlines are effectively ignored and it will match all your examples.
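The answer's own regex did not survive in this thread, so the following is only a sketch of what a character-class version could look like, not the original pattern. The pattern, the `is_spam_link` helper, and the `example.com` negative case are assumptions made for illustration. Python is used only to exercise the pattern against the three samples from the question; whether a cPanel filter applies a pattern across line breaks has not been verified here.

```python
import re

# Hypothetical pattern (not the answerer's original): every position where the
# spammers may inject "=" or a newline uses a character class that tolerates them.
PATTERN = re.compile(
    r'https://www\.'      # fixed prefix
    r'[\w.=\s-]+?'        # domain, allowing "=" and whitespace inside it
    r'/[=\s]*u[=\s]*/'    # "/u/" with optional junk around the "u"
    r'[\w=\s]+'           # token, length unrestricted (the {18} was dropped)
    r'/[=\s]*u'           # trailing "/u"
)

def is_spam_link(text: str) -> bool:
    """Return True if the text contains a URL shaped like the spam samples."""
    return PATTERN.search(text) is not None

# The three samples from the question, with the injected "=" + newline.
SAMPLES = [
    'href=3D"https://www.theonlineleaflets.com/u=\n/544hfb34s21jv335hs/u"/>',
    'href=3D"https://www.netprofessionalbitcoin.=\ncom/u/565i71cag5hd3kdh3mds/u"/>',
    'href=3D"https://www.theonlineleaflets.com/=\nu/544hfb34s21jv335hs/u"/>',
]

for sample in SAMPLES:
    print(is_spam_link(sample))
```

Because the token length is no longer checked, this version is looser than the original regex and may flag more legitimate URLs that happen to follow the /u/.../u shape.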
The only ‘problem’ is that I removed the ‘{18}’ quantifier (since those bad characters take up room).

Edit as per the comment:
I have changed a dot ‘.’ to the character class ‘[\s\S]’. Now there can be newlines in the URL as well.
About the 18 quantifier: there are 20 characters in your second example, so it won’t match if you limit that string.
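One way to keep a length check despite the injected characters is to count only the word characters and allow an optional break after each one. The sketch below combines the optional-group idea from the first answer with the question's update that the newline always follows the = sign. The `BREAK` fragment, the `tolerant` helper, and the 18 to 20 range (taken from the two token lengths seen in the samples) are assumptions, not something confirmed by either answer.

```python
import re

# Optional injected break: "=" followed by any whitespace (including a newline).
# Assumes, per the question's update, that the newline always follows "=".
BREAK = r'(?:=\s*)?'

def tolerant(literal: str) -> str:
    """Escape a literal string and allow an injected break after each character."""
    return ''.join(re.escape(ch) + BREAK for ch in literal)

PATTERN = re.compile(
    tolerant('https://www.')
    + r'(?:[\w.-]' + BREAK + r')+?'   # domain characters, matched lazily
    + tolerant('/u/')
    + r'(?:\w' + BREAK + r'){18,20}'  # 18 to 20 word characters (assumed range)
    + tolerant('/u')
)

# The three samples from the question, with the injected "=" + newline.
SAMPLES = [
    'href=3D"https://www.theonlineleaflets.com/u=\n/544hfb34s21jv335hs/u"/>',
    'href=3D"https://www.netprofessionalbitcoin.=\ncom/u/565i71cag5hd3kdh3mds/u"/>',
    'href=3D"https://www.theonlineleaflets.com/=\nu/544hfb34s21jv335hs/u"/>',
]

for sample in SAMPLES:
    print(bool(PATTERN.search(sample)))
```

Since the breaks are matched separately from the counted characters, the quantifier applies to the real token length, so a short path such as /u/short/u is not flagged.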