
I've got a weird URL returning a 200 status where it shouldn't. It should just return a 404 error. Is there a 404 redirect I can use in .htaccess for this?

Good URLs look like this

www.example.com/this-is-static/anytext

or

www.example.com/this-is-static/anytext/alsoanytext_123

or

www.example.com/this-is-static/anytext/alsoanytext_123-123

or

www.example.com/this-is-static/anytext/alsoanytext/alsoanytextagain_123-123

The bad URL looks like this

www.example.com/this-is-static/anytext/alsoanytext

Side note: the words anytext, alsoanytext and alsoanytextagain are wildcards (*); they can be any words. The numbers "123" could be any combination of digits.

The "this-is-static" part doesn't change.

So as you can see, the bad URL doesn’t have the part "_XXXXXX"

I basically need a 404 whenever the URL reaches the subfolder "alsoanytext" (or goes beyond it) with no underscore after it.

Hopefully, this makes sense

EDIT 2:

I'm not sure the method below would work, as the regex matches all the URLs except the "bad" one, which makes this impossible with .htaccess without setting up new rules for all the other subfolders.

Just to simplify, using this as an example (https://regexr.com/5e5fd), how would we get the line

www.example.com/this-is-static/anytext/alsoanytext

to be the only match

2 Answers


  1. Use the following expression with a non-capturing group:

    www.example.com/this-is-static/[^/\s]*(?:/[^_\s]+_.*)?$
    

    Explained:

    • Match www.example.com/this-is-static/ literally.
    • [^/\s]* Match anything except a slash or whitespace.
    • (?: ... )? Non-capturing group for the /alsoanytext part. The ? quantifier at the end makes the group optional, so /anytext without anything else also matches.
    • / Match the slash.
    • [^_\s]+ Match anything except an underscore or whitespace.
    • _ Match the underscore.
    • .* Match anything else except line breaks. Use your preferred method to match the rest of the URL if matching everything is not okay.
    • $ Match the end of the string. Required to make sure that nothing follows /anytext except the properly formed /alsoanytext_whatever. Otherwise, you may have a partial match when the wrong URL is used (since the beginning will match with /anytext).

    You can see it at regexr.
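
The pattern can also be checked offline against the sample URLs from the question. This is only a minimal sketch: it assumes Python's re module behaves like regexr for this pattern (nothing engine-specific is used), it writes the character classes with \s for whitespace, and the names GOOD, BAD and is_good are made up for the illustration.

```python
import re

# The "good URL" pattern from the answer, with \s (whitespace) in the
# negated character classes. The dots are left unescaped, as in the answer.
GOOD_URL = re.compile(r"www.example.com/this-is-static/[^/\s]*(?:/[^_\s]+_.*)?$")

# Sample URLs taken from the question.
GOOD = [
    "www.example.com/this-is-static/anytext",
    "www.example.com/this-is-static/anytext/alsoanytext_123",
    "www.example.com/this-is-static/anytext/alsoanytext_123-123",
    "www.example.com/this-is-static/anytext/alsoanytext/alsoanytextagain_123-123",
]
BAD = "www.example.com/this-is-static/anytext/alsoanytext"


def is_good(url: str) -> bool:
    """Return True when the pattern matches the URL up to its end."""
    return GOOD_URL.search(url) is not None


if __name__ == "__main__":
    for url in GOOD + [BAD]:
        print(f"{url}: {'match' if is_good(url) else 'no match'}")
```

The fourth good URL matches because [^_\s]+ is allowed to run across the slash between alsoanytext and alsoanytextagain before it reaches the underscore.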

  2. There are two aspects to this… the regex and Apache .htaccess.

    www.example.com/this-is-static/[^/\s]*(?:/[^_\s]+_.*)?$
    

    @MarcSances appears to have created the regex that matches "good" URLs (+1). (Although do you need \s (whitespace) to be part of the negated character class?) Using mod_rewrite you can simply negate this regex (with a ! prefix) so that the rule is successful only when the URL does not match a "good" URL (ie. successful for "bad" URLs).

    In Apache config files you do not need to backslash-escape slashes since the slash carries no special meaning (there are no regex-delimiters, except for spaces as argument delimiters). (It’s unfortunate that regexr.com does not allow you to change the regex delimiters?!)

    Note that with the RewriteRule directive you only match against the URL-path (less the slash prefix in .htaccess), not the hostname.

    RewriteRule ^this-is-static/[^/\s]*(?:/[^_\s]+_.*)?$ error.php [R=404,L]
    

    When you specify a non-3xx status for the R flag, the substitution string (ie. error.php in this example) is ignored. You should specify a single hyphen (-) instead to explicitly indicate "no substitution". Also, the L flag is superfluous; it is implied.

    So, to negate this expression it would become:

    RewriteRule !^this-is-static/[^/]*(?:/[^_]+_.*)?$ - [R=404]
    

    This will serve the 404 ErrorDocument for requested URLs that do not match "good" URLs, for example:

    /this-is-static/anytext/alsoanytext
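
To see which requests the negated rule would catch, its logic can be imitated outside Apache. This is a rough sketch only: it assumes Python's re module agrees with Apache's regex engine for this pattern, it assumes the static segment is spelled this-is-static, and negated_rule_fires and the path "contact" are hypothetical names for the illustration. Paths are given without the leading slash, as RewriteRule sees them in .htaccess.

```python
import re

# Pattern for "good" URL-paths, as used in the negated RewriteRule.
GOOD_PATH = re.compile(r"^this-is-static/[^/]*(?:/[^_]+_.*)?$")


def negated_rule_fires(url_path: str) -> bool:
    """Imitate '!pattern': the rule applies when the pattern does NOT match."""
    return GOOD_PATH.search(url_path) is None


if __name__ == "__main__":
    samples = [
        "this-is-static/anytext",                  # good
        "this-is-static/anytext/alsoanytext_123",  # good
        "this-is-static/anytext/alsoanytext",      # bad
        "contact",                                 # hypothetical unrelated page
    ]
    for path in samples:
        print(f"{path}: {'404' if negated_rule_fires(path) else 'pass'}")
```

Note that, in this sketch, the unrelated path also triggers the rule, since a negated pattern succeeds for everything that is not a "good" URL. If other pages exist on the site, the rule would presumably need to be scoped to the this-is-static prefix.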
    

    However, it looks like you should be able to match a "bad" URL directly, unless alsoanytext itself could legitimately contain underscores (_). For example:

    RewriteRule ^this-is-static/[^/]+/[^/_]+$ - [R=404]
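
The direct "bad URL" pattern can be checked the same way. Again a minimal sketch, assuming Python's re module matches Apache's behaviour for this simple pattern; matches_bad and the deeper path ending in /more are hypothetical and not from the question.

```python
import re

# Direct match for the "bad" URL-path: exactly two segments after
# this-is-static/, the second one containing no underscore.
BAD_PATH = re.compile(r"^this-is-static/[^/]+/[^/_]+$")


def matches_bad(url_path: str) -> bool:
    """Return True when the path would be caught by the direct 404 rule."""
    return BAD_PATH.search(url_path) is not None


if __name__ == "__main__":
    samples = [
        "this-is-static/anytext",
        "this-is-static/anytext/alsoanytext_123",
        "this-is-static/anytext/alsoanytext/alsoanytextagain_123-123",
        "this-is-static/anytext/alsoanytext",
        "this-is-static/anytext/alsoanytext/more",  # hypothetical deeper path
    ]
    for path in samples:
        print(f"{path}: {'404' if matches_bad(path) else 'pass'}")
```

In this sketch only this-is-static/anytext/alsoanytext is caught. A deeper path with no underscore, such as the hypothetical one ending in /more, is not matched, so the pattern would need extending if those should also return a 404.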
    