I want to prevent a page from being indexed, along with its assets (images).
So if I tell crawlers to skip that page, but that page is still registered in sitemap.xml, will any information on that page be indexed?
2 Answers
robots.txt disallows crawling, not indexing.
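If you disallow crawling of a URL in your robots.txt and also list that URL in your sitemap, crawlers are still not allowed to fetch it. Appearing in a sitemap doesn't change this.
The URL itself might still be indexed, though (whether it's in the sitemap or not). For example, if other pages link to it, a search engine can list the bare URL without ever having read the page's content.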
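To make the crawling half of this concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs and the helper name is_crawl_allowed are made up for illustration. It only shows what a well-behaved crawler is allowed to fetch, and says nothing about what a search engine may index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the page is disallowed, and a sitemap is declared.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page.html
Sitemap: https://example.com/sitemap.xml
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The Sitemap line has no effect on the Disallow rule.
    print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/private-page.html"))
    print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/index.html"))
```

The disallowed page stays off limits even if sitemap.xml lists it, while pages not covered by a Disallow rule remain fetchable.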
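Just to add to the previous answer: some sites use a Noindex directive in their robots.txt file. It is not part of the standard AFAIK (see blog), there are diverging opinions about it, and Google stopped honoring it on September 1, 2019, so I would not rely on it. Alternatively, and more reliably, you can use the robots meta tag with noindex in your webpages. Note that a crawler has to fetch the page to see that tag, so the page must not also be disallowed in robots.txt. For images, which can't carry a meta tag, the equivalent is the X-Robots-Tag HTTP response header.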
As usual, there is no guarantee that all crawlers will respect robots directives; the major ones do, however.
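As a rough illustration of the robots meta tag approach, the sketch below checks whether a page carries a noindex directive, using Python's standard html.parser. The sample page and the names RobotsMetaParser and has_noindex are hypothetical. Real crawlers also look at crawler-specific meta names and HTTP headers, which this ignores.

```python
from html.parser import HTMLParser

# Hypothetical page that asks crawlers not to index it.
PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private page</title>
</head><body>...</body></html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


if __name__ == "__main__":
    print(has_noindex(PAGE))
```

A check like this can be handy in a deployment test, to confirm that pages meant to stay out of the index actually ship with the tag.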