
I’m using the following IIS Rewrite Rule to block as many bots as possible.

<rule name="BotBlock" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <add input="{HTTP_USER_AGENT}" pattern="^$|bot|crawl|spider" />
  </conditions>
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>

This rule blocks all requests with an empty User-Agent string, or with a User-Agent string that contains bot, crawl, or spider. This works great, but it also blocks googlebot, which I do not want.
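
For reference, here is a rough way I can reproduce the behaviour outside IIS. This is only a minimal sketch using Python's re module and a few illustrative User-Agent strings of my own choosing; it assumes the condition is evaluated case-insensitively (the rule's default) and with "match anywhere" semantics, which re.search approximates.

import re

# The condition pattern from the rule above, compiled case-insensitively.
pattern = re.compile(r"^$|bot|crawl|spider", re.IGNORECASE)

# Illustrative User-Agent strings (not an exhaustive list).
user_agents = [
    "",                                                              # empty UA
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",  # regular browser
]

for ua in user_agents:
    blocked = bool(pattern.search(ua))
    print(f"blocked={blocked}  UA={ua!r}")

The empty string and both bot User-Agents match, including Googlebot's, which is exactly the problem.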

So how do I exclude the googlebot string from the above pattern so that Googlebot can still reach the site?

I’ve tried

^$|!googlebot|bot|crawl|spider

^$|(?!googlebot)|bot|crawl|spider

^(?!googlebot)$|bot|crawl|spider

^$|(!googlebot)|bot|crawl|spider

But these either block all User-Agents or still do not allow googlebot. Does anyone have a solution and know a bit about regex?

2 Answers


  1. Try this regex: ^$|(?!.*googlebot)(bot|crawl|spider)

  2. If you want to match bot, but not google bot:

    ^$|(?<!\bgoogle)bot|crawl|spider
    

    Or you could group the alternatives in a non-capturing group and surround that group with word boundaries to prevent partial matches for any of the alternatives (see the comparison sketch after the answers):

    ^$|\b(?:bot|crawl|spider)\b
    

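If it helps, the suggested patterns can be compared side by side against a few sample User-Agent strings before going into the rule. This is only a minimal sketch with Python's re module; the rewrite module's regex flavor is not identical (lookbehind support in particular should be verified in IIS itself), and the sample User-Agents are just illustrative.

import re

# Candidate condition patterns suggested in the answers above.
patterns = {
    "lookahead (answer 1)":       r"^$|(?!.*googlebot)(bot|crawl|spider)",
    "lookbehind (answer 2)":      r"^$|(?<!\bgoogle)bot|crawl|spider",
    "word boundaries (answer 2)": r"^$|\b(?:bot|crawl|spider)\b",
}

# Illustrative User-Agent strings (not an exhaustive list).
user_agents = [
    "",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
]

for name, pat in patterns.items():
    rx = re.compile(pat, re.IGNORECASE)
    print(name)
    for ua in user_agents:
        print(f"  blocked={bool(rx.search(ua))}  UA={ua[:55]!r}")

Whichever pattern behaves the way you want still deserves a final check inside the actual rule, since the rewrite module may not accept every construct Python does.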