
I am trying to block our job board from being crawled. Can a specific URL be blocked with "Disallow" in a robots.txt file, and what would that look like for this URL? I don’t want to disallow just HTML; I want to block only the URL jobs.example.com. Something like this:

Disallow: https://jobs.example.com/

2 Answers


  1. To disallow web crawlers from crawling one specific page, you can use the following lines:

    User-agent: *
    Disallow: /path/to/page/
    

    Or to disallow the entire website:

    User-agent: *
    Disallow: /
    

    Note that not all search engines/crawlers will respect that file.
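    If you want to sanity check rules like these, Python's standard library ships a robots.txt parser. A minimal sketch (the crawler name "MyCrawler" and the example URLs are placeholders):

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /path/to/page/",
    ])

    # The disallowed path is refused; everything else is still crawlable.
    print(rp.can_fetch("MyCrawler", "https://example.com/path/to/page/"))  # False
    print(rp.can_fetch("MyCrawler", "https://example.com/other/"))         # True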

  2. You can’t put full URLs into robots.txt disallow rules. Your proposed rule WON’T WORK as written:

    # INCORRECT
    Disallow: https://jobs.example.com/
    

    It looks like you might be trying to disallow crawling on the jobs subdomain. That is possible, but each subdomain gets its own robots.txt file, so you would have to configure your server to serve different content at these two URLs:

    • https://example.com/robots.txt
    • https://jobs.example.com/robots.txt

    Then your jobs robots.txt should disallow all crawling on that subdomain:

    User-Agent: *
    Disallow: /
    
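    One way to serve different robots.txt content per hostname, sketched with Python's standard http.server (the hostnames and port here are assumptions; in practice you would do the equivalent in your web server or framework configuration):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical mapping from Host header to robots.txt body.
    ROBOTS = {
        "jobs.example.com": "User-Agent: *\nDisallow: /\n",  # block the whole jobs subdomain
        "example.com": "User-Agent: *\nDisallow:\n",         # allow crawling on the main site
    }

    class RobotsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/robots.txt":
                host = (self.headers.get("Host") or "").split(":")[0]
                body = ROBOTS.get(host, "User-Agent: *\nDisallow:\n").encode("utf-8")
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

    if __name__ == "__main__":
        HTTPServer(("", 8000), RobotsHandler).serve_forever()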

    If you are trying to disallow just the home page for that subdomain, you would have to use syntax that only the major search engines understand. You can use a $ for "ends with" and the major search engines will interpret it correctly:

    User-Agent: *
    Disallow: /$
    
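    To see what that anchor does, here is a deliberately simplified illustration of the "$" matching (not a full robots.txt matcher; real parsers also handle "*" wildcards and rule precedence):

    # Simplified sketch of how "$" is interpreted; illustration only.
    def rule_matches(rule: str, path: str) -> bool:
        if rule.endswith("$"):
            return path == rule[:-1]    # "$" anchors the rule to the end of the path
        return path.startswith(rule)    # otherwise a rule is a prefix match

    print(rule_matches("/$", "/"))          # True: the home page itself is blocked
    print(rule_matches("/$", "/openings"))  # False: deeper pages stay crawlable
    print(rule_matches("/", "/openings"))   # True: plain "/" would block everything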