
I have a rule which boils down to:

RewriteCond %{REQUEST_URI} ^(.+)\.html$
RewriteRule ^(.+)\.html$ $1 [R=302,L]

It won’t work without the first line, even though the second line contains exactly the same regex. As I understand it, if there’s no “.html” at the end, RewriteRule won’t rewrite anything, so why can’t it work without that RewriteCond? Trying to access example.com/test/abcd.html produces this error in the server log:

[REWRITE] detected external loop redirection with target URL: /test/abcd, skip.

Here is the whole .htaccess file:

RewriteEngine On

# HTTPS everywhere and strip WWW
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^www\.(.+) [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [L,R=301]

# if example.com/xxx is not directory AND example.com/xxx.html file exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
# rewrite example.com/xxx to example.com/xxx.html
# only if there's no slash at the end
RewriteRule ^(.*[^/])$ $1.html

# if example.com/xxx/ is not directory, rewrite to example.com/xxx
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ $1 [R=301,L]

# if xxx.html is not directory AND xxx.html file exists
# redirect from xxx.html to xxx
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} -f
# won't work without the line below, even though both have ^(.+)\.html$ - can't understand why
RewriteCond %{REQUEST_URI} ^(.+)\.html$
RewriteRule ^(.+)\.html$ $1 [R=301,L]

2 Answers


  1. Chosen as BEST ANSWER

    EDIT: I was wrong. I wasn't even aware that my website is hosted on LiteSpeed Web Server (LSWS), which is somewhat compatible with Apache, but not 100%. So the following reasoning applies to LSWS, but not to Apache.

    I've finally understood why it didn't work.

    In the original version, the rules were ordered like this:

    1. redirect to HTTPS and non-WWW (and stop processing subsequent rules - flag [L])
    2. redirect /foo/bar.html/ to /foo/bar.html (and stop as above)
    3. rewrite /foo/bar to /foo/bar.html (internally)
    4. redirect /foo/bar.html to /foo/bar (and stop as in 1. and 2.)

    So, when /foo/bar.html was requested, it was matched by rule 4 and redirected to /foo/bar. That triggered a new request for /foo/bar, so rewriting started again, and rule 3 rewrote it internally to /foo/bar.html. Processing then continued to the next rule, rule 4 again, which redirected back to /foo/bar. That caused yet another request and another pass through the rules, at which point the server detected the loop and blocked it.
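
    To make the loop concrete, here is a sketch of the two rules in their original order, without the extra REQUEST_URI condition, annotated with the path a request for /foo/bar.html takes according to the explanation above. It reflects the LSWS behavior described here (not necessarily Apache's) and assumes that /foo/bar.html exists and /foo/bar.html.html does not:

    ```apache
    # rule 3: internal rewrite /foo/bar -> /foo/bar.html
    # no [L] flag, so processing continues with the next rule
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME}.html -f
    RewriteRule ^(.*[^/])$ $1.html

    # rule 4: external redirect /foo/bar.html -> /foo/bar
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} -f
    RewriteRule ^(.+)\.html$ $1 [R=301,L]

    # Request 1: GET /foo/bar.html
    #   rule 3 does not apply, rule 4 redirects to /foo/bar
    # Request 2: GET /foo/bar
    #   rule 3 rewrites to /foo/bar.html, then rule 4 sees the .html URL
    #   and redirects to /foo/bar again, and so on
    ```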

    There are two ways to fix that. The first way is to change the order of the last two operations:

    1. redirect to HTTPS and non-WWW (and stop processing subsequent rules - flag [L])
    2. redirect /foo/bar.html/ to /foo/bar.html (and stop)
    3. redirect /foo/bar.html to /foo/bar (and stop)
    4. rewrite /foo/bar to /foo/bar.html (internally)

    In this scenario, a request for /foo/bar.html is redirected to /foo/bar (3.) as before, and the new request is rewritten internally to /foo/bar.html (4.), and that's all. It won't be redirected back to /foo/bar because there are no redirects or other rules after 4.

    The second way is to add the [L] flag to the rule rewriting /foo/bar to /foo/bar.html, which has the same effect as changing the order. The rewriting then goes like this:

    1. redirect to HTTPS and non-WWW (and stop)
    2. redirect /foo/bar.html/ to /foo/bar.html (and stop)
    3. rewrite /foo/bar to /foo/bar.html (internally) (and stop)
    4. redirect /foo/bar.html to /foo/bar (and stop)
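
    As a minimal sketch of this second way, assuming the internal rewrite rule from the original file and the LSWS behavior described above, only the flag changes:

    ```apache
    # rewrite example.com/xxx to example.com/xxx.html (internally)
    # [L] stops processing here, so the later redirect from xxx.html
    # back to xxx is not reached in this pass
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME}.html -f
    RewriteRule ^(.*[^/])$ $1.html [L]
    ```

    Note that on Apache httpd itself, [L] in a .htaccess file only ends the current pass over the rules; the rewritten URL is then run through the rules again, so [L] alone may not be enough there. Apache 2.4 provides the [END] flag for stopping rewrite processing entirely.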

    I'll go with the first way (reordering) as it will allow me to add other rules after the "/foo/bar to /foo/bar.html" rule.

    The final (for now...) .htaccess file:

    RewriteEngine On
    
    # force HTTPS everywhere and strip WWW
    RewriteCond %{HTTPS} !=on
    RewriteCond %{HTTP_HOST} ^www\.(.+) [NC]
    RewriteRule ^ https://%1%{REQUEST_URI} [R=301,L]
    
    # rewrite example.com/xxx/ to example.com/xxx
    # if example.com/xxx/ is not directory
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.+)/$ $1 [R=301,L]
    
    # redirect from xxx.html to xxx
    # if xxx.html is not directory AND xxx.html file exists
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} -f
    RewriteRule ^(.+)\.html$ $1 [R=301,L]
    
    # rewrite example.com/xxx to example.com/xxx.html
    # if example.com/xxx is not directory AND example.com/xxx.html file exists
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME}.html -f
    RewriteRule ^ %{REQUEST_FILENAME}.html
    

  2. Your rules generate an infinite redirect loop. Indeed, something like foo/bar.html goes to foo/bar, which will go to foo/bar.html internally, which will go back to foo/bar, and so on.

    The following rules prevent such a redirect loop (a few improvements included):

    RewriteEngine On
    
    # strip www
    RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
    RewriteRule ^ https://%1%{REQUEST_URI} [L,R=301]
    
    # HTTPS everywhere
    RewriteCond %{HTTPS} !=on
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
    
    # if example.com/xxx/ is not directory, rewrite to example.com/xxx
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.+)/$ /$1 [R=301,L]
    
    # if xxx.html is not directory AND xxx.html file exists
    # redirect from xxx.html to xxx
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} -f
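    # THE_REQUEST is the original request line (e.g. "GET /foo/bar.html HTTP/1.1");
    # internal rewrites do not change it, so this matches only when the client itself asked for .html
    RewriteCond %{THE_REQUEST} \s/(.+)\.html(?:\s|\?) [NC]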
    RewriteRule ^ /%1? [R=301,L]
    
    # if example.com/xxx is not directory AND example.com/xxx.html file exists
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME}.html -f
    # rewrite example.com/xxx to example.com/xxx.html
    RewriteRule ^(.+)$ /$1.html [L]
    