I have the following rewrite rule in .htaccess:
RewriteRule ^.*/-y.* /handleurl.php [L]
Its purpose is to display appropriate pages depending on the values in the url, for example:
example.com/books/BookA/-y?act=x
will display the BookA page.
the variable holding the book name is encoded such that …
example.com/books/Book B/-y?act=x
becomes example.com/books/Book+B/-y?act=x
… which is fine (it’s decoded in handleurl.php).
However, if the book is called Book A/B, I have …
example.com/books/Book A/B/-y?act=x
which becomes example.com/books/Book+A%2FB/-y?act=x
It appears that htaccess decodes this before the rewrite rule, so the rewrite rule sees too many elements in the URL delimited by the /.
Is there any way I can get the rewrite rule to ignore the encoded / as intended?
I have seen a previous response to a similar question, but I only need the / to be ignored, not other encoded characters.
2 Answers
The rewrite rule was never the problem. I think it was Apache not liking the encoded '/' and the fact that the downstream URL handling program was using '/' as a delimiter when identifying the individual URL elements. I have to work out: 1) whether I want to allow '/' in the variables that make up the elements of the friendly URL, and 2) if so, how to pass it without upsetting Apache and how to subsequently dissect the URL. Maybe I will convert '/' to '~' for the benefit of the URL, then convert back to '/' prior to subsequent display. Thank you Mr White.
This is not the problem. Whether the URL-path /books/Book+A%2FB/-y is decoded or not makes no difference here (*1). Both would match the (rather generous) regex ^.*/-y.* in the RewriteRule pattern.
(*1 But yes, the URL-path matched by the RewriteRule pattern is URL decoded, i.e. %-decoded.)
The problem is likely to be that Apache (by default) rejects – with a 404 – any URL that contains a %-encoded slash, i.e. %2F (or backslash %5C), in the URL-path portion of the URL. This is a security feature; allowing them "could potentially allow unsafe paths" (source).
However, this can be overridden with the AllowEncodedSlashes directive. But this directive can only be used in a server or virtualhost context. It cannot be used in .htaccess.
You need to set either AllowEncodedSlashes On, which allows encoded slashes and also decodes them, as with other characters, or AllowEncodedSlashes NoDecode, which permits encoded slashes but does not decode them. The latter is preferred and probably what you are expecting.
Aside #1:
The regex ^.*/-y.* is very generic, possibly too generic. This is the same as simply /-y. What is the .* after -y intended to match? From your example URLs it looks like -y is always at the end of the URL-path, so this could be anchored, e.g. /-y$. And if the URL that you need to match always starts /books/ then maybe this should also be included in the regex?
Aside #2:
This isn’t strictly "URL encoded"; you have converted the space into a + in the URL-path. The + is a valid "URL encoding" for a space when used in the query string only. A + in the URL-path is a literal + (and will be seen by search engines as such). In the URL-path, a space would be URL encoded as %20. (You may have used the wrong PHP encoding functions, e.g. urlencode() instead of rawurlencode()?)
Of course, you are free to convert/encode the URL however you wish to create a more readable URL – providing it’s valid.