I have some URLs with %20 (an encoded space) inside, and I have to remove the %20.
So, for example
https://www.example.com/aaaa%20bbbb/page.html
must become
https://www.example.com/aaaabbbb/page.html
I tried with this Apache rewrite rule:
RewriteRule ^/([^%20]*)(?:%20)+(.*)$ /$1$2 [R=301]
The problem is that the first group stops at any %, 2, or 0.
So the rule doesn’t work for a URL like this:
https://www.example.com/aa2aa%20bbbb/page.html
Any suggestion?
2 Answers
You can change the first group to a character class that matches any alphanumeric character or underscore, so that it no longer stops at a 2 or a 0 that appears before the encoded space. Here’s an updated version of your Apache rewrite rule:
RewriteRule ^/([a-zA-Z0-9_]+)(?:%20)+(.*)$ /$1$2 [R=301]
This regular expression matches a path that starts with one or more alphanumeric characters or underscores, followed by one or more occurrences of the encoded space (%20), followed by the rest of the path. The first group captures the characters before the encoded space, and the second group captures everything after it.
With this rule, URLs like https://www.example.com/aa2aa%20bbbb/page.html should be correctly rewritten to https://www.example.com/aa2aabbbb/page.html.
A fundamental problem here is that the URL-path that the RewriteRule pattern matches against is already URL-decoded (%-decoded), so you need to match a literal space, not %20 (an encoded space). (Aside: You would only need to match %20 if the requested URL had a doubly encoded space, i.e. %2520.) For example:
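A rule matching the description that follows might look like this. It is a sketch reconstructed from those notes (`\S*` for the first group, `\s+` for the spaces, no trailing `$`, a leading slash, and `R=301` combined with `L`), not necessarily the exact directive. The `RewriteEngine On` line is an assumption in case the engine is not already enabled in this context.

```apache
# Sketch for a server or virtualhost context (hence the leading slash in the pattern).
# The pattern is tested against the already-decoded URL-path, so \s+ matches the
# literal space(s) that arrived as %20 in the request.
RewriteEngine On
RewriteRule ^/(\S*)\s+(.*) /$1$2 [R=301,L]
```

With this, a request for /aa2aa%20bbbb/page.html would be redirected to /aa2aabbbb/page.html. Each redirect removes only the first run of spaces, so a URL containing several separate runs would be cleaned up over successive redirects.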
\s is a shorthand character class for any whitespace character. \S is the same as [^\s] (i.e. anything except whitespace). Alternatively, you could just use a regex like ^/(.*)\s+(.*), but that is less efficient since it potentially requires a lot more backtracking.
I removed the trailing $ on the regex since it’s not required here (since the * quantifier is greedy by default).
The slash prefix on the RewriteRule pattern assumes this directive is being used directly in a server or virtualhost context (i.e. not in a <Directory> or .htaccess context).
You were also missing the L flag (important if you have any directives that follow in this context).
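If the rule has to live in a <Directory> or .htaccess context instead, the pattern changes slightly, because the URL-path seen there has the directory prefix (including the leading slash) removed. A minimal sketch, assuming the .htaccess file sits in the document root and the rewrite engine is not already enabled:

```apache
# Sketch for .htaccess in the document root (per-directory context):
# the URL-path the pattern sees has no leading slash here, so the pattern
# starts at ^ directly. The substitution stays root-relative.
RewriteEngine On
RewriteRule ^(\S*)\s+(.*) /$1$2 [R=301,L]
```

Placed in a subdirectory, the same rule would need the substitution adjusted, since $1 and $2 would then be relative to that directory rather than to the document root.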