I am trying to block our job board from being crawled. Can a specific URL be blocked with "Disallow" in a robots.txt file? And what would that look like for this URL? I don’t want to just disallow HTML pages; I only want to block the URL jobs.example.com:
    Disallow: https://jobs.example.com/
2 Answers
In order to disallow web crawlers from crawling one specific page, you can do so with the following lines:
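For example, assuming the page lives at the placeholder path /page-to-block.html (not a path from the question):

    # Placeholder path; substitute the page you actually want to block
    User-agent: *
    Disallow: /page-to-block.html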
Or the entire website:
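    # Applies to all crawlers; blocks every path on the site
    User-agent: *
    Disallow: /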
Note that not all search engines/crawlers will respect that file.
You can’t put full URLs into robots.txt disallow rules. Your proposed rule won’t work as written:
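    # Invalid: Disallow takes a path relative to the site root, never a full URL
    Disallow: https://jobs.example.com/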
It looks like you might be trying to disallow crawling on the jobs subdomain. Doing so is possible. Each subdomain gets its own robots.txt file, so you would have to configure your server to serve different content for the two robots.txt URLs:

https://example.com/robots.txt
https://jobs.example.com/robots.txt
Then your jobs robots.txt should disallow all crawling on that subdomain:
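A minimal file for that, served at https://jobs.example.com/robots.txt:

    # Block all compliant crawlers from the entire jobs subdomain
    User-agent: *
    Disallow: /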
If you are trying to disallow just the home page for that subdomain, you would have to use syntax that only the major search engines understand. You can use $ for "ends with", and they will interpret it correctly: