
I know about the /s modifier in regex, but it doesn’t work in my specific case.

For example, I’m trying to create a spam filter that matches URLs with various domains, like this:

https://www.theonlineleaflets.com/u=/544hfb34s21jv335hs/u

Regex: https://www\..+?/u/\w{18}/u

The problem is that the spammers insert newlines and = signs at random positions, like this:

<area  coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/u=
/544hfb34s21jv335hs/u"/>

OR:

<area  coords=3D"0,0,1000,1000" href=3D"https://www.netprofessionalbitcoin.=
com/u/565i71cag5hd3kdh3mds/u"/>

OR:

<area  coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/=
u/544hfb34s21jv335hs/u"/>
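To see why neither the regex as written nor the /s modifier is enough, here is a minimal check of the question's regex (with its backslashes written out: https://www\..+?/u/\w{18}/u) against the three samples above, with and without re.DOTALL, which is Python's name for /s. Python's re module stands in for the Perl-style engine; the constructs used here behave the same way in both. The clean string is the third sample with its = and line break removed, and found is only an illustrative helper name.

```python
import re

# The question's regex, with the backslashes written out.
ORIGINAL = r"https://www\..+?/u/\w{18}/u"

# The three samples; "\n" marks where the spammer's line break falls.
samples = [
    '<area  coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/u=\n/544hfb34s21jv335hs/u"/>',
    '<area  coords=3D"0,0,1000,1000" href=3D"https://www.netprofessionalbitcoin.=\ncom/u/565i71cag5hd3kdh3mds/u"/>',
    '<area  coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/=\nu/544hfb34s21jv335hs/u"/>',
]

# The third sample with "=\n" removed, i.e. the URL the spammer intended.
clean = "https://www.theonlineleaflets.com/u/544hfb34s21jv335hs/u"

def found(text, flags=0):
    """True if ORIGINAL matches anywhere in text."""
    return re.search(ORIGINAL, text, flags) is not None

print("clean:", found(clean))
for text in samples:
    # First column: no flags. Second column: '.' may also match "\n".
    print(found(text), found(text, re.DOTALL))
```

Only clean matches. In the first and third samples the = and line break split the literal /u/, and in the second the token after /u/ is 20 characters long, so \w{18} fails even after the line break is crossed.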

I’m pretty sure newlines cannot be ignored, but I’m asking in case I’m wrong, or in case someone knows a better regex to flag these spammers that is still precise enough.

NOTE: This is for cPanel, so I suppose it’s standard Perl format, and I don’t think it supports modifiers like /s anyway.
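On the /s point: in Perl-style regexes, /s only changes what . matches. \s (any whitespace, newline included) and [\s\S] (any character at all) match a line break with no modifier, which is what both answers rely on. A minimal check, using Python's re (where the s flag is called re.DOTALL) on a fragment of the first sample; patterns and matches are illustrative names:

```python
import re

# A fragment of the first sample: "=" followed by a line break.
fragment = "u=\n/544"

patterns = {
    "dot": r"u=./",            # '.' excludes "\n" unless the s flag is set
    "whitespace": r"u=\s/",    # '\s' includes "\n"
    "any_char": r"u=[\s\S]/",  # whitespace or non-whitespace, i.e. any character
}

def matches(pattern, flags=0):
    """True if pattern matches anywhere in fragment."""
    return re.search(pattern, fragment, flags) is not None

for name, pattern in patterns.items():
    # Columns: without any flag, then with re.DOTALL (Python's name for /s).
    print(name, matches(pattern), matches(pattern, re.DOTALL))
```

Only the dot pattern depends on the flag; the other two match the line break either way.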

UPDATE: It seems the newline always follows an = sign; however, the = sign can appear anywhere in the URL.
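The =3D in the samples suggests the message body is quoted-printable encoded. In that encoding an = at the very end of a line is a soft line break (RFC 2045), which would explain why the newline always follows an =. If the body can be processed before matching (an assumption: a script can do this, but a cPanel filter rule may not allow it), one alternative is to delete every = plus line break and then apply a simple regex. The sketch below does that with re.sub; find_spam_urls is a hypothetical helper, and {18} is loosened to + because the second sample's token is 20 characters long.

```python
import re

# '=' immediately followed by a line break: a quoted-printable soft line break.
SOFT_BREAK = re.compile(r"=\r?\n")

# The question's regex with {18} loosened to + (the second sample's token is 20 characters).
URL = re.compile(r"https://www\..+?/u/\w+/u")

def find_spam_urls(body):
    """Hypothetical helper: rejoin soft-broken lines, then search for the URL pattern."""
    joined = SOFT_BREAK.sub("", body)
    return URL.findall(joined)

# The three samples from the question, one physical line per list entry.
body = "\n".join([
    '<area  coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/u=',
    '/544hfb34s21jv335hs/u"/>',
    '<area  coords=3D"0,0,1000,1000" href=3D"https://www.netprofessionalbitcoin.=',
    'com/u/565i71cag5hd3kdh3mds/u"/>',
    '<area  coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/=',
    'u/544hfb34s21jv335hs/u"/>',
])

for url in find_spam_urls(body):
    print(url)
```

It prints the three rejoined URLs. The trade-off is the extra preprocessing step, whereas the answers keep everything in a single regex.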


Answers


  1. I came up with this regex, which takes potential newlines into account:

    https://www\..+?/=?(?:\s*?)?u(?:\s*?)?=?(?:\s*?)?/.*?u
    

    Basically, I use (?:\s*?)?, which is an optional, non-capturing, lazy match of any number of whitespace characters, including newlines. If you want to restrict it to newlines only, use \n instead of \s. Here’s a demo.

  2. I changed your regex to allow ‘=’ and whitespace (including newlines).

    This is the regex:

    https://www\..+?/[u=\s]+/[\w=\s]+/[u=\s]+
    

    What I changed is to use character classes instead of literal matches. That way, ‘=’ and newlines in the path are effectively ignored, so it matches your first and third examples.

    The only ‘problem’ is that I removed the ‘{18}’ quantifier, since the inserted characters would also count toward the length.

    Edit as per the comment:

    https://www\.[\s\S]+?/[u=\s]+/[\w=\s]+/[u=\s]+
    

    I changed the dot ‘.’ to the character class ‘[\s\S]’, which matches any character, newline included. Now there can be newlines in the domain part of the URL as well.

    About the {18} quantifier: the token in your second example has 20 characters, so it won’t match if you limit that part to 18.

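To compare the two answers, the sketch below runs each regex against the three samples from the question with no flags, using Python's re as a stand-in for the Perl-style engine. It assumes the backslashes that the page lost are restored as \s, \w and \. in the patterns; samples, answers and check are illustrative names.

```python
import re

samples = {
    "first": '<area  coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/u=\n/544hfb34s21jv335hs/u"/>',
    "second": '<area  coords=3D"0,0,1000,1000" href=3D"https://www.netprofessionalbitcoin.=\ncom/u/565i71cag5hd3kdh3mds/u"/>',
    "third": '<area  coords=3D"0,0,1000,1000" href=3D"https://www.theonlineleaflets.com/=\nu/544hfb34s21jv335hs/u"/>',
}

answers = {
    "answer 1": r"https://www\..+?/=?(?:\s*?)?u(?:\s*?)?=?(?:\s*?)?/.*?u",
    "answer 2": r"https://www\..+?/[u=\s]+/[\w=\s]+/[u=\s]+",
    "answer 2 (edited)": r"https://www\.[\s\S]+?/[u=\s]+/[\w=\s]+/[u=\s]+",
}

def check(pattern):
    """For each sample, whether the pattern finds a match (no flags)."""
    return {name: re.search(pattern, text) is not None for name, text in samples.items()}

for label, pattern in answers.items():
    print(label, check(pattern))

# Token lengths behind the {18} remark.
print(len("544hfb34s21jv335hs"), len("565i71cag5hd3kdh3mds"))
```

Answer 1 and the first version of answer 2 match the first and third samples but not the second, where the line break falls inside the domain and .+? cannot cross it; the edited answer 2, with [\s\S]+?, matches all three. The last line prints 18 and 20.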