Hi, this is my first question on Stack Overflow, so please can you help? It concerns .htaccess and robots.txt files. In October I rebuilt a previously non-WordPress website in WordPress. I built the new site on a sub-domain of the existing site so that the live site could stay up whilst I worked on the new one.
The site I built on the sub-domain is now live, but I am unsure whether I should delete the old .htaccess and robots.txt files. I created new .htaccess and robots.txt files for the new site and have left the old ones in place. All the old content files are still sitting on the server in a folder called ‘old files’, so I am assuming that these aren’t affecting matters. Here are the contents of each file:
I access the .htaccess and robots.txt files by opening ‘public_html’ over FTP in FileZilla. Below is the .htaccess for the site I built. W3TC (W3 Total Cache) is a WordPress caching plugin which I installed just a few days ago, so I am not asking about anything W3TC-related here:
# BEGIN W3TC Browser Cache
<IfModule mod_deflate.c>
<IfModule mod_headers.c>
Header append Vary User-Agent env=!dont-vary
</IfModule>
<IfModule mod_filter.c>
AddOutputFilterByType DEFLATE text/css text/x-component application/x-javascript application/javascript text/javascript text/x-js text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon application/json
<IfModule mod_mime.c>
# DEFLATE by extension
AddOutputFilter DEFLATE js css htm html xml
</IfModule>
</IfModule>
</IfModule>
# END W3TC Browser Cache
# BEGIN W3TC CDN
<FilesMatch ".(ttf|ttc|otf|eot|woff|font.css)$">
<IfModule mod_headers.c>
Header set Access-Control-Allow-Origin "*"
</IfModule>
</FilesMatch>
# END W3TC CDN
# BEGIN W3TC Page Cache core
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteRule .* - [E=W3TC_ENC:_gzip]
RewriteCond %{HTTP_COOKIE} w3tc_preview [NC]
RewriteRule .* - [E=W3TC_PREVIEW:_preview]
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} =""
RewriteCond %{REQUEST_URI} /$
RewriteCond %{HTTP_COOKIE} !(comment_author|wp-postpass|w3tc_logged_out|wordpress_logged_in|wptouch_switch_toggle) [NC]
RewriteCond "%{DOCUMENT_ROOT}/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" -f
RewriteRule .* "/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" [L]
</IfModule>
# END W3TC Page Cache core
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
I have 7 redirects in place to new page URLs. I have tested each one and they all work, so I have no issue with these.
#Force non-www:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www.websiteurl.co.uk [NC]
RewriteRule ^(.*)$ http://websiteurl/$1 [L,R=301]
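A note on rule order, offered as a hedged sketch rather than a definitive fix: mod_rewrite works through .htaccess from top to bottom, and WordPress's last rule (`RewriteRule . /index.php [L]`) ends the pass for any URL that is not a real file or directory. The request is then re-run as /index.php, which the first WordPress rule stops immediately. As a result, a redirect placed after `# END WordPress` may only fire for the home page and real files, not for permalink URLs. A commonly recommended arrangement puts the non-www redirect above the WordPress block (many people put it at the very top of the file, above the W3TC blocks too). `websiteurl` is the placeholder from the question, not a real host; the dots in the host pattern are escaped so they match literally.

```apache
# Force non-www: placed ABOVE the WordPress block so it is evaluated
# before WordPress rewrites permalink URLs to /index.php.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.websiteurl\.co\.uk [NC]
RewriteRule ^(.*)$ http://websiteurl/$1 [L,R=301]
</IfModule>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

A quick way to check the change is to request a deep www URL (not just the home page) and confirm that it returns a 301 to the non-www address.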
The previous site (.htaccess for the old site):
Deny from all
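`Deny from all` is Apache 2.2 access-control syntax. On Apache 2.4 it only works when mod_access_compat is loaded; the native equivalent is `Require all denied`. The sketch below covers both versions. It assumes the file sits in the folder you want to hide (for example ‘old files’) and that your host permits access-control directives in .htaccess:

```apache
# Block all web access to this directory and everything below it.
# Apache 2.4+ (mod_authz_core) uses the native syntax:
<IfModule mod_authz_core.c>
    Require all denied
</IfModule>
# Apache 2.2 falls back to the legacy syntax used in the original file:
<IfModule !mod_authz_core.c>
    Order allow,deny
    Deny from all
</IfModule>
```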
The site I built (robots.txt):
User-agent: *
Disallow: /wp-admin/
Sitemap: http://websitehomepageurl/sitemap_index.xml
The previous site (robots.txt):
User-agent: *
Disallow:
Please can you assist? I’d really appreciate your time.
Thanks a lot.
Hi, thanks for the somewhat minimal response; I got help elsewhere. I added a robots.txt file to the development site so that bots aren't allowed, and I set up a redirect for every attachment to its original page. All other files are in place, so I will leave it there. To the person who did reply, thanks. However, saying that all I had to do was delete the old robots.txt and .htaccess files was incorrect, because they are still needed in the grand scheme of things. Stack Overflow has a really good reputation online, so when helping others, try to explain the logic behind your advice so that they can understand it. I am glad I did not take your advice, because I could have ended up with a larger problem to fix. Have a good day.
Remove the old robots.txt and .htaccess files.
A little follow-up tip: in addition to blocking content via robots.txt, I would suggest that you add this meta tag to the <head> of EACH PAGE:
<meta name="robots" content="noindex,noarchive,nofollow">
The reason is that some bots ignore robots.txt.
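The meta tag only covers HTML pages, and it has to be added to every page template. The same directives can be sent as an `X-Robots-Tag` HTTP header from the development sub-domain's .htaccess, which also covers images, PDFs and other files. This is a sketch assuming mod_headers is enabled on the host and that the file goes in the sub-domain's document root. One caveat: compliant crawlers such as Googlebot only see a noindex directive (header or meta tag) if they are allowed to fetch the URL, so a robots.txt `Disallow: /` on the same site will hide that directive from them.

```apache
# .htaccess at the document root of the development sub-domain (placement assumed).
# Adds the header to successful (2xx) responses served from this directory and below,
# whatever the file type.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, noarchive, nofollow"
</IfModule>
```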
Also, I would NEVER allow people or bots to see old .htaccess files! You risk serious security issues if people can read your .htaccess content.
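On the security point: Apache's stock configuration already refuses requests for files whose names start with `.ht` (via a `<Files ".ht*">` block in httpd.conf), so on most hosts .htaccess is not readable over the web. If you want to be sure, here is a belt-and-braces sketch for your own .htaccess. It uses Apache 2.4 syntax and assumes the host's AllowOverride settings permit `Require` there:

```apache
# Refuse any web request for .htaccess, .htpasswd and other ".ht*" files.
<Files ".ht*">
    Require all denied
</Files>
```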