I have some URLs that contain %20 (an encoded space).
I need to remove the %20.

So, for example

https://www.example.com/aaaa%20bbbb/page.html

must become

https://www.example.com/aaaabbbb/page.html

I tried with this Apache rewrite rule:

RewriteRule ^/([^%20]*)(?:%20)+(.*)$ /$1$2 [R=301] 

The problem is that the first group stops at %, 2, or 0.
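A small Python sketch shows why `[^%20]` behaves this way. It uses Python's `re` module as a stand-in for Apache's PCRE engine, which agrees with it for this pattern. It also tests the still-encoded path strings, so it only illustrates the character class, not exactly what mod_rewrite sees:

```python
import re

# [^%20] is a negated character class: "any ONE character except %, 2 or 0".
# It does not mean "anything except the three-character sequence %20".
pattern = re.compile(r"^/([^%20]*)(?:%20)+(.*)$")

ok = pattern.match("/aaaa%20bbbb/page.html")
print(ok.groups())  # ('aaaa', 'bbbb/page.html')

# Here the first group stops at the "2" in "aa2aa", so it can never
# reach the %20 and the whole match fails.
bad = pattern.match("/aa2aa%20bbbb/page.html")
print(bad)  # None
```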

So the rule doesn’t work for a URL like this:

https://www.example.com/aa2aa%20bbbb/page.html

Any suggestion?

Answers


  1. You can modify your regular expression so that the first group uses a character class of alphanumeric characters and underscores ([a-zA-Z0-9_]) instead of [^%20]. Because [^%20] excludes the individual characters %, 2 and 0, this change lets the first group match digits such as 2 and 0 in the part before the encoded space (although it will now stop at any other character, such as - or .). Here’s an updated version of your Apache rewrite rule:

    RewriteRule ^/([a-zA-Z0-9_]+)(?:%20)+(.*)$ /$1$2 [R=301]

    This regular expression matches a path that starts with one or more alphanumeric characters or underscores, followed by one or more occurrences of the encoded space (%20), followed by the rest of the path. The first group captures everything before the encoded space, and the second group captures everything after it.

    With this rule, URLs like https://www.example.com/aa2aa%20bbbb/page.html should be correctly rewritten to https://www.example.com/aa2aabbbb/page.html.

  2. A fundamental problem here is that the URL-path that the RewriteRule pattern matches against is already URL-decoded (%-decoded), so you need to match a literal space, not %20 (an encoded space).

    (Aside: You would only need to match %20 if the requested URL contained a doubly encoded space, i.e. %2520.)

    For example:

    RewriteRule ^/(\S+)\s+(.*) /$1$2 [R=301,L]
    

    \s is a shorthand character class for any whitespace character, and \S is the same as [^\s] (i.e. anything except whitespace). Alternatively, you could just use a regex like ^/(.*)\s+(.*), but that is less efficient since it potentially requires a lot more backtracking.

    I removed the trailing $ from the regex because it’s not required here (the * quantifier is greedy by default).

    The slash prefix on the RewriteRule pattern assumes this directive is used directly in a server or virtualhost context (i.e. not in a <Directory> or .htaccess context).

    You were also missing the L flag (important if you have any directives that follow in this context).

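The second answer's main point is that the pattern is matched against the already-decoded path. A rough Python simulation can check this. It also shows how the suggested rule behaves with several separate spaces: with [R=301,L], each request removes only the first run of whitespace, so the browser follows a chain of redirects. This is a sketch under stated assumptions:

- `urllib.parse.unquote` and `quote` stand in for Apache's %-decoding and the browser's re-encoding.
- Python's `re` stands in for PCRE.
- `follow_redirects` and `max_hops` are hypothetical names, not part of Apache.

```python
import re
from typing import Tuple
from urllib.parse import quote, unquote

# Assumed stand-in for Apache: the URL-path is %-decoded *before*
# the RewriteRule pattern is matched against it.
requested = "/aa2aa%20bbbb/page.html"
decoded = unquote(requested)
print(decoded)            # /aa2aa bbbb/page.html
print("%20" in decoded)   # False: a %20 pattern has nothing to match

# Only a doubly encoded space (%2520) still contains a literal %20 after decoding.
print(unquote("/aa2aa%2520bbbb/page.html"))  # /aa2aa%20bbbb/page.html

# Same regex as the suggested rule: RewriteRule ^/(\S+)\s+(.*) /$1$2
RULE = re.compile(r"^/(\S+)\s+(.*)")

def follow_redirects(url_path: str, max_hops: int = 10) -> Tuple[str, int]:
    """Hypothetical helper: each request is decoded, matched once and
    301-redirected to /$1$2, so each hop removes only the first run of
    whitespace. Returns the final path and the number of redirects."""
    path = url_path
    hops = 0
    while hops < max_hops:
        m = RULE.match(unquote(path))
        if m is None:
            break  # no whitespace left: the request is served normally
        # The browser re-requests the new URL, re-encoding any remaining spaces.
        path = quote("/" + m.group(1) + m.group(2))
        hops += 1
    return path, hops

print(follow_redirects("/aa2aa%20bbbb/page.html"))  # ('/aa2aabbbb/page.html', 1)
print(follow_redirects("/a%20b%20%20c/page.html"))  # ('/abc/page.html', 2)
```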