I have a secret folder on my website and I don’t want search engines to know about it. I didn’t put the folder name in a Disallow rule in robots.txt, because writing the folder name in robots.txt would tell my visitors about that secret folder.
My question is, will search engines be able to know about this folder / crawl it even if I don’t have any links published to this folder?
2 Answers
Yes they can crawl it.
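Your folder is not “secret” at all: anything the server hands out without authentication can be fetched by anyone who learns or guesses its URL. Do a quick search for a wget or curl command line that downloads a whole site, then try it on your own site to convince yourself that your security approach is invalid.
Here is a good example: download all folders, subfolders, and files using wget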
You can use .htaccess to prevent agents from requesting the directory listing, and this will probably protect you fairly well if you don’t give your folder an obvious name like “site”, but I’d test it.
See: deny direct access to a folder and file by htaccess
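One way to run that test is to request the folder URL and see whether the server answers with an auto-generated index page. The sketch below is a rough check, not a definitive one: it assumes an Apache-style listing whose page title starts with "Index of" (other servers format listings differently), and the function names `looks_like_directory_listing` and `fetch` are made up for illustration. On Apache, listings are normally switched off with `Options -Indexes` in .htaccess, provided the server configuration allows that override.

```python
from __future__ import annotations

import urllib.error
import urllib.request


def looks_like_directory_listing(status: int, body: str) -> bool:
    """Heuristic: a 200 response whose title starts with 'Index of' (Apache-style autoindex)."""
    return status == 200 and "<title>index of" in body.lower()


def fetch(url: str) -> tuple[int, str]:
    """Return (status, body) for url; HTTP errors such as 403 or 404 are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A blocked or missing folder ends up here; report the status with an empty body.
        return err.code, ""
```

Calling `looks_like_directory_listing(*fetch(url))` on the parent folder's URL should return `False` once listings are disabled. A `False` result only means no listing was detected; it does not show that the folder itself is unreachable.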
The only truly reliable way to hide a directory from everyone is to put it behind a password. If you absolutely cannot put it behind a password, one band-aid solution is to give the folder a long, hard-to-guess name that begins with a short prefix, and then block just that first part of the name in robots.txt.
This will effectively block the directory without revealing its full name. It will prevent any crawler that obeys robots.txt from crawling the directory, but it won’t make the directory easy for hostile crawlers to find. Just don’t mistake this for actual security. This will keep the major search engines out. There are no guarantees beyond that. Again, the only truly reliable way to keep everyone out of a secret directory is to put the directory behind a password.
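The prefix approach can be checked offline with Python's standard `urllib.robotparser`. This is only a sketch under assumptions: the folder name `private-x7Qk29vLm4`, the blocked prefix `/private-`, the crawler name `ExampleBot`, and the `allowed` helper are all invented for illustration and do not come from the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the prefix is published, never the full folder name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """True if a crawler that obeys this robots.txt may fetch the given path."""
    return parser.can_fetch("ExampleBot", "https://example.com" + path)


# The hypothetical secret folder is covered by the prefix rule.
print(allowed("/private-x7Qk29vLm4/report.html"))
# Ordinary pages stay crawlable.
print(allowed("/index.html"))
```

Because Disallow rules match by path prefix, the rule covers the whole folder while robots.txt reveals only `/private-`. This affects only crawlers that choose to obey robots.txt; it does nothing to stop a client that requests the URL directly.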