
These days, robots.txt has become an important SEO tool for websites. Through this file, web developers tell crawler robots which paths to check and which to skip. On the other hand, many websites contain sensitive directories and files whose paths should not be revealed to anyone, in order to reduce security risk. Mentioning them is like handing a thief a map to all the doors.
The problem is that robots.txt is a plain-text file that anybody can read, since it is almost always stored in the root directory with full read permission. So if I have a file like this:

User-agent: *
Disallow: /admin/

I am telling everybody (especially hackers): “I have a directory named admin, and it must not be crawled.” But I would rather nobody knew such a directory exists on my website.

How can we solve this problem?

2 Answers


  1. You can use the X-Robots-Tag HTTP response header on the pages you don’t want crawled.
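
    For instance, a minimal sketch assuming an nginx server (the /admin/ path is taken from the question; substitute your own paths):

    location /admin/ {
        # ask well-behaved indexers not to index or follow anything under /admin/
        add_header X-Robots-Tag "noindex, nofollow";
    }

    Unlike robots.txt, this header is only seen by a client that already requests the page, so it does not advertise the path to anyone.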

    But I really prefer an IP whitelist when one is available.
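
    A minimal whitelist sketch, again assuming nginx; 203.0.113.0/24 is a placeholder for your own trusted address range:

    location /admin/ {
        allow 203.0.113.0/24;  # trusted clients are let through
        deny all;              # everyone else receives 403 Forbidden
    }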

  2. In robots.txt, you only have to specify the beginning of the URL path, because Disallow values are prefix matches.

    In the case of /admin/, you could, for example, specify:

    Disallow: /adm
    

    You just have to find a prefix that blocks only the URLs you want blocked, and no others (like /administer-better).

    Depending on your URL structure, it might make sense to add a path segment to all “secret” URLs and refer only to this segment in your robots.txt, not to the segments that follow:

    Disallow: /private/
    # nothing to see when visiting /private/ 
    # the secret URLs are:
    #   /private/admin/
    #   /private/login/
    