
Recently I saw a site’s robots.txt as follows:

User-agent: *
Allow: /login
Allow: /register

I could find only Allow entries and no Disallow entries.

As I understand it, robots.txt is essentially a blacklist: it lists the pages that must not be crawled. So Allow is used only to re-allow a part of the site that is otherwise blocked by Disallow, similar to this:

Allow: /crawlthis
Disallow: /

But, that robots.txt has no Disallow entries. So, does this robots.txt let Google crawl all the pages? Or, does it allow only the specified pages tagged with Allow?

2 Answers


  1. You are right that this robots.txt file allows Google to crawl all the pages on the website. A thorough guide can be found here: http://www.robotstxt.org/robotstxt.html.

    If you want Googlebot to be allowed to crawl only the specified pages, the correct format would be:

    User-agent: *
    Disallow: /
    Allow: /login
    Allow: /register
    

    (I would normally disallow those specific pages though as they don’t provide much value to searchers.)

    It’s important to note that the Allow directive is only honored by some robots (including Googlebot).
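To see how one particular parser reads the whitelist-style record above, here is a rough sketch using Python's standard-library `urllib.robotparser`. This is only one parser, not Googlebot: it applies the first rule that matches, so for it the Allow lines have to come before `Disallow: /`, whereas Google documents that the most specific (longest) matching rule wins regardless of order. The helper name `allowed` and the test path `/about` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`.

    Hypothetical helper; it only reflects how urllib.robotparser behaves.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Rule order as written in the answer: Disallow first.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /
Allow: /login
Allow: /register
"""

# Same rules with the Allow lines first (safer for first-match parsers).
ALLOW_FIRST = """\
User-agent: *
Allow: /login
Allow: /register
Disallow: /
"""

for name, text in [("disallow first", DISALLOW_FIRST), ("allow first", ALLOW_FIRST)]:
    results = {p: allowed(text, p) for p in ("/login", "/register", "/about")}
    print(name, results)
```

With this parser, the "disallow first" variant blocks everything, because `Disallow: /` matches every path before the Allow lines are reached. Putting the Allow lines first keeps the record working for both first-match and longest-match robots.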

  2. There is no point in having a robots.txt record that has Allow lines but no Disallow lines. Everything is allowed to be crawled by default anyway.

    According to the original robots.txt specification (which doesn’t define Allow), it’s even invalid, as at least one Disallow line is required (bold emphasis mine):

    The record starts with one or more User-agent lines, followed by one or more Disallow lines […]

    At least one Disallow field needs to be present in a record.


    In other words, a record like

    User-agent: *
    Allow: /login
    Allow: /register
    

    is equivalent to the record

    User-agent: *
    Disallow:
    

    i.e., everything is allowed to be crawled, including (but not limited to) URLs with paths that start with /login and /register.
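The equivalence of the two records can be checked quickly with Python's standard-library `urllib.robotparser`. This is just a sketch against one parser, not a statement about how every crawler behaves. The helper name `can_crawl` and the path `/some/other/page` are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: parse the given robots.txt text and test one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The record from the question: Allow lines only.
ALLOW_ONLY = "User-agent: *\nAllow: /login\nAllow: /register\n"

# The record with an empty Disallow value, which blocks nothing.
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"

for path in ["/", "/login", "/register", "/some/other/page"]:
    # Both columns should agree for every path if the records are equivalent.
    print(path, can_crawl(ALLOW_ONLY, path), can_crawl(EMPTY_DISALLOW, path))
```

Both records report every path as crawlable here, including paths that no Allow line mentions.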
