Recently I saw a site’s robots.txt as follows:
User-agent: *
Allow: /login
Allow: /register
I could find only Allow entries and no Disallow entries.
From this, I understood that robots.txt is essentially a blacklist file that uses Disallow to block pages from being crawled, and that Allow is used only to permit a sub-part of a domain that is already blocked with Disallow, similar to this:
Allow: /crawlthis
Disallow: /
But that robots.txt has no Disallow entries. So, does this robots.txt let Google crawl all the pages, or does it allow only the specified pages listed with Allow?
2 Answers
You are right that this robots.txt file allows Google to crawl all the pages on the website. A thorough guide can be found here: http://www.robotstxt.org/robotstxt.html.
If you want Googlebot to be allowed to crawl only the specified pages, the format would be something like this, blocking everything with a blanket Disallow and then adding Allow exceptions for those paths:
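User-agent: Googlebot
Disallow: /
Allow: /login
Allow: /register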
(I would normally disallow those specific pages, though, as they don't provide much value to searchers.)
It's important to note that the Allow directive only works with some robots (including Googlebot).
There is no point in having a robots.txt record that has Allow lines but no Disallow lines; everything is allowed to be crawled by default anyway.
According to the original robots.txt specification (which doesn't define Allow), such a record is even invalid, as at least one Disallow line is required: "At least one Disallow field needs to be present in a record."
In other words, a record like
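User-agent: *
Allow: /login
Allow: /register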
is equivalent to the record
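User-agent: *
Disallow: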
i.e., everything is allowed to be crawled, including (but not limited to) URLs with paths that start with /login and /register.