We have a website instance on a domain that is protected by an .htaccess password. Some IPs, such as the company’s network, are allowed through.

  • There are no inbound links (although obviously we cannot guarantee this 100%)

  • The site has no robots.txt

  • The robots meta tag is set to follow and index

With all of these conditions, is there any way that search engines could still index the site? I think not, but I want to make sure there is no loophole I don’t know about.

2 Answers


  1. Pages that are password-protected will not be accessible to the search
    engines.

    Search engine robots typically can’t log in to crawl pages,
    so content behind a login will not make it into the search index.

    source: http://www.yourseoplan.com/is-password-protected-content-indexable-by-search-engines/

    Also see this post from a Google employee:

    No, our crawlers can’t access login protected pages.

    source: Gary Illyes, Google, https://productforums.google.com/forum/#!topic/news/2SdcGEWht1o

  2. I’m pretty sure any crawler would be stopped before reaching any content, at the point .htaccess demands a password, seeing as how that’s the whole point of having an .htaccess password.

    If you wanted to be redundantly sure for educational purposes, you could probably test from various browsers in private tabs, and maybe send a raw request on a socket to see what output you get back. Here’s a page that describes how you’d send a raw HTTP request: https://www3.ntu.edu.sg/home/ehchua/programming/webprogramming/HTTP_Basics.html

    Here’s an excerpt from that page, where they describe how you’d go about fetching a page at http://www.nowhere123.com/docs/index.html:

    GET /docs/index.html HTTP/1.1
    Host: www.nowhere123.com
    Accept: image/gif, image/jpeg, */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
    (blank line)
    

    You can send raw requests using telnet, which is available on most Linux distros (sometimes as a separate package) and on Windows as an optional feature.

    I went ahead and issued this request (with modified path and host) to one of my own servers with a known .htaccess password gateway, and got this response:

    HTTP/1.0 401 Unauthorized
    Date: Fri, 24 Jun 2016 15:08:26 GMT
    WWW-Authenticate: Basic realm="Restricted Area"
    Content-Type: text/plain
    Content-Length: 19
    
    Invalid CredentialsConnection closed by foreign host.
    

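    So the server answers 401 Unauthorized and sends no page content. A crawler making the same request without credentials gets the same response and has nothing to index. (The trailing “Connection closed by foreign host.” is telnet’s own message, not part of the 19-byte response body.)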
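
The manual telnet check above can also be scripted. What follows is only a minimal sketch in Python, assuming the site is reachable over plain HTTP on port 80 (an HTTPS site would need the socket wrapped with the `ssl` module). The function names `build_request`, `parse_status`, `fetch_raw` and `is_auth_gated` are made up for this illustration and are not from the original answer.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal raw HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: */*",
        "Connection: close",  # ask the server to close the socket when done
        "",                   # the two empty items produce the blank line
        "",                   # that terminates the request headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(raw):
    """Return (status_code, headers) parsed from a raw HTTP response."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers


def fetch_raw(host, path="/", port=80):
    """Send the request over a plain TCP socket and read until close."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def is_auth_gated(raw):
    """True if the response is a 401 that carries an auth challenge."""
    code, headers = parse_status(raw)
    return code == 401 and "www-authenticate" in headers
```

A call such as `is_auth_gated(fetch_raw("staging.example.com"))` (the hostname is a placeholder) should be run from a machine outside the allowed IP range, since a request from the company network would presumably skip the password prompt and return the page itself.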
