
On my website I have a page for the cart at http://www.example.com/cart, and another for the cartoons at http://www.example.com/cartoons. What should I write in my robots.txt file to block only the cart page?

The cart page's URL does not take a trailing slash, so if I write
Disallow: /cart, it will block /cartoons too.

I don’t know whether something like /cart$ is possible and would be parsed correctly by spider bots. I don’t want to rely on Allow: /cartoon, because there may be other pages with the same prefix.

2 Answers


  1. You could explicitly allow and disallow both paths. When rules conflict, the longer (more specific) rule takes precedence:

    Disallow: /cart
    Allow: /cartoon
    

    More info is available at: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
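
    That longest-match precedence can be illustrated with a small sketch. Note this is a hypothetical helper for illustration, not Google's actual parser; the assumption that a length tie goes to allow follows Google's documented "least restrictive rule" behaviour:

```python
# Hypothetical sketch of Google's documented robots.txt precedence:
# among all rules whose path is a prefix of the requested path, the
# longest one wins. Assumption: on a length tie, the less restrictive
# rule ("allow") wins, per Google's robots.txt documentation.
def most_specific_rule(rules, path):
    matching = [(directive, rule_path)
                for directive, rule_path in rules
                if path.startswith(rule_path)]
    if not matching:
        return ("allow", "")  # no rule matches: crawling is allowed
    return max(matching, key=lambda r: (len(r[1]), r[0] == "allow"))

rules = [("disallow", "/cart"), ("allow", "/cartoon")]
print(most_specific_rule(rules, "/cart"))     # ('disallow', '/cart')
print(most_specific_rule(rules, "/cartoon"))  # ('allow', '/cartoon')
```

    Here /cartoon matches both rules, but the allow rule's path is longer, so it wins and /cartoon stays crawlable.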

  2. In the original robots.txt specification, this is not possible. It supports neither Allow nor any characters with special meaning inside a Disallow value.

    But some consumers support additional things. For example, Google gives a special meaning to the $ sign, where it represents the end of the URL path:

    Disallow: /cart$
    

    For Google, this will block /cart, but not /cartoon.

    Consumers that don’t give this special meaning will interpret $ literally, so they will block /cart$, but not /cart or /cartoon.

    So if you use this, you should target the bots that support it with a specific User-agent line.
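
    You can observe the literal interpretation with Python's standard-library urllib.robotparser, which follows the original specification and gives $ no special meaning (a small demo of an original-spec consumer, not of Google's parser):

```python
from urllib import robotparser

# Python's urllib.robotparser follows the original specification and does
# simple prefix matching, so "$" is treated as a literal character.
rp = robotparser.RobotFileParser()
rp.modified()  # mark the rules as loaded so can_fetch() consults them
rp.parse([
    "User-agent: *",
    "Disallow: /cart$",
])

print(rp.can_fetch("*", "http://www.example.com/cart"))     # True  (not blocked)
print(rp.can_fetch("*", "http://www.example.com/cartoon"))  # True  (not blocked)
print(rp.can_fetch("*", "http://www.example.com/cart$"))    # False (literal match)
```

    In other words, for an original-spec consumer the rule only blocks a URL that literally ends in the $ character.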

    Alternative

    Maybe you are fine with crawling but just want to prevent indexing? In that case you could use a robots meta tag (with a noindex value) instead of robots.txt. Supporting bots will still crawl the /cart page (and follow its links, unless you also use nofollow), but they won’t index it.

    <!-- in the <head> of the /cart page -->
    <meta name="robots" content="noindex" />
    