We have a website instance on a domain that is protected by an .htaccess password. Some IPs, such as the company’s network, are allowed through.
- There are no inbound links (although obviously cannot guarantee this 100%)
- The site has no robots.txt
- The robots meta tag is set to follow and index
With all of these conditions, is there any way that search engines could still index the site? I think not, but I want to make sure there is no loophole I didn’t know about.
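For reference, the setup described above boils down to something like the following sketch, assuming Apache 2.4 and Basic auth; the realm, file path, and IP range are placeholders:

```
# .htaccess: require a password, but let requests from the allowed network straight through.
AuthType Basic
AuthName "Staging"
AuthUserFile /path/to/.htpasswd
<RequireAny>
    Require valid-user
    Require ip 203.0.113.0/24
</RequireAny>
```

The robots meta tag mentioned above is just `<meta name="robots" content="index, follow">` in the page head.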
2 Answers
Also see this post from a Google employee:
I’m pretty sure any crawler would be stopped before reaching any content at the point where .htaccess demands a password, seeing as that’s the whole point of having an .htaccess password.
If you wanted to be doubly sure, for educational purposes you could test from various browsers in private tabs, and perhaps send a raw request over a socket to see what output you get back. Here’s a page that describes how to send a raw HTTP request: https://www3.ntu.edu.sg/home/ehchua/programming/webprogramming/HTTP_Basics.html
Here’s the gist of the example on that page, where they fetch a page at http://nowhere123.com/docs/index.html: you open a TCP connection to the host on port 80 and write a plain-text GET request (the request line and a Host header), ending with a blank line.
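If you’d rather script it than type it, here is a minimal sketch of that same raw request sent over a socket from Python; the host and path are just the example values above, so swap in the URL you actually want to test:

```python
import socket

HOST = "nowhere123.com"    # example host from the tutorial page; substitute your own domain
PATH = "/docs/index.html"  # example path; substitute the page you want to test

# Build the raw HTTP/1.1 request by hand: request line, Host header, blank line.
request = (
    f"GET {PATH} HTTP/1.1\r\n"
    f"Host: {HOST}\r\n"
    "Connection: close\r\n"
    "\r\n"
).encode("ascii")

# Open a plain TCP connection on port 80, send the request, and read the full reply.
with socket.create_connection((HOST, 80)) as sock:
    sock.sendall(request)
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

print(response.decode("iso-8859-1"))
```

Point it at a page behind your .htaccess password from an IP that isn’t allowlisted, and the output is exactly what a crawler would be shown.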
You can also send raw requests using telnet, which is available in most Linux distros and can usually be installed or enabled on Windows, too; just connect to port 80 and type the same request lines by hand, ending with a blank line.
I went ahead and issued this request (with modified path and host) to one of my own servers with a known .htaccess password gateway, and got this response:
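As you’d expect from a path behind Basic auth, it was a 401 challenge along these lines (a generic illustration rather than the exact output; the realm, remaining headers, and error body will differ):

```
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="..."
(remaining headers and the server’s standard error page follow)
```

The protected page’s content never comes back, which is why a crawler has nothing to index.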
So … maybe this will help you.