^((?!ca-ct.mydomain)(?!ca.mydomain)(?!cats.mydomain).)*mydomain.com$
I got the above expression from a web.config file. It’s supposed to filter out anything that contains ca-ct.mydomain, ca.mydomain or cats.mydomain.
I just cannot understand what the .)* piece means. The closing parenthesis between the dot and the asterisk seems to break what would otherwise read logically as "any number of characters after matching any of the 3 negative lookaheads".
2 Answers
The negative look-ahead assertions are checked at successive positions. After "." consumes one character, the "*" repetition applies those assertions again at the next position, and so on.
It is just one way to do it. Another approach is to have the three negative look-ahead assertions execute only once, at the beginning of the input, and look further into the input by incorporating ".*" in each of them. When these assertions succeed, the input can then be consumed with yet another ".*". The work involved for the regex engine is similar, though.
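To make the position-by-position behaviour concrete, here is a small Python sketch that re-implements the question's group ((?!..)(?!..)(?!..).)* as an explicit loop and compares it with Python's re module. Python is used only for illustration (the web.config rule is evaluated by a different engine), the inputs are assumed to be single-line host names with no newlines, and the helper names consumable_lengths and manual_match are hypothetical.

```python
import re

# Pattern from the question: the three lookaheads sit inside the repeated group,
# so they are re-checked before every character that "." consumes.
TEMPERED = re.compile(
    r"^((?!ca-ct.mydomain)(?!ca.mydomain)(?!cats.mydomain).)*mydomain.com$"
)

# The same fragments as stand-alone patterns. The dots are left unescaped, as in
# the original, so each "." still matches any single character.
FORBIDDEN = [re.compile(p) for p in (r"ca-ct.mydomain", r"ca.mydomain", r"cats.mydomain")]
TAIL = re.compile(r"mydomain.com")


def consumable_lengths(s):
    """Prefix lengths the group ((?!..)(?!..)(?!..).)* can cover, starting at 0."""
    lengths = [0]  # "*" may also repeat zero times
    for pos in range(len(s)):
        # The lookaheads run at the current position, before "." consumes anything.
        if any(f.match(s, pos) for f in FORBIDDEN):
            break  # a lookahead fails here, so the repetition cannot go further
        lengths.append(pos + 1)  # "." consumed one character; "*" tries again
    return lengths


def manual_match(s):
    """Hypothetical re-implementation of TEMPERED for single-line input."""
    # The engine may backtrack to any of those repetition counts; the overall
    # match succeeds if the rest of the string is exactly "mydomain.com".
    return any(TAIL.fullmatch(s, n) for n in consumable_lengths(s))


for host in ["www.mydomain.com", "ca.mydomain.com", "sub.cats.mydomain.com"]:
    print(host, consumable_lengths(host)[-1], manual_match(host), bool(TEMPERED.match(host)))
```

The second printed value is how far the group got before a lookahead stopped it. For sub.cats.mydomain.com it stops at 4, right before "cats", so the remaining text can never be just mydomain.com and the whole pattern fails.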
The regular expression "^(?!.*ca-ct.mydomain)(?!.*ca.mydomain)(?!.*cats.mydomain).*mydomain.com$" is used to match strings that contain mydomain.com but exclude specific subdomains. Let’s break it down step by step:
^: Asserts the start of the string.
(?!.*ca-ct.mydomain): This is a negative lookahead assertion that ensures ca-ct.mydomain does not appear anywhere in the string. The .* allows any characters to appear before ca-ct.mydomain.
(?!.*ca.mydomain): This negative lookahead ensures ca.mydomain does not appear anywhere in the string.
(?!.*cats.mydomain): This negative lookahead ensures cats.mydomain does not appear anywhere in the string.
.*mydomain.com: This part matches any character (except for newline) zero or more times followed by mydomain.com.
$: Asserts the end of the string.
In summary, this regex pattern will match any string that contains mydomain.com but does not contain the subdomains ca-ct.mydomain, ca.mydomain, or cats.mydomain.
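A quick way to check that the question's per-position form and this up-front form behave the same is to run both over a few inputs. The sketch below uses Python's re module as a stand-in for the engine behind the web.config rule, made-up sample host names, and assumes single-line input. The dots are left unescaped exactly as in both patterns, and because each lookahead is a plain substring test, a name such as africa.mydomain.com is rejected as well, since it contains "ca.mydomain".

```python
import re

# Question's form: the lookaheads are re-checked at every position.
PER_POSITION = re.compile(
    r"^((?!ca-ct.mydomain)(?!ca.mydomain)(?!cats.mydomain).)*mydomain.com$"
)

# Answer's form: three lookaheads run once at the start, each scanning ahead
# with ".*", and then ".*" consumes the input.
UP_FRONT = re.compile(
    r"^(?!.*ca-ct.mydomain)(?!.*ca.mydomain)(?!.*cats.mydomain).*mydomain.com$"
)

# Hypothetical sample host names, chosen only for illustration.
HOSTS = [
    "www.mydomain.com",
    "mydomain.com",
    "ca.mydomain.com",
    "ca-ct.mydomain.com",
    "cats.mydomain.com",
    "shop.cats.mydomain.com",
    "africa.mydomain.com",  # contains "ca.mydomain" as a plain substring
    "example.org",
]

for host in HOSTS:
    a = bool(PER_POSITION.match(host))
    b = bool(UP_FRONT.match(host))
    print(f"{host:24} question={a} answer={b}")
```

For single-line inputs the two columns agree: the only positions the per-position form never checks are those inside the final mydomain.com, and none of the three fragments (all starting with "ca") can begin there.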