I’ve got a weird URL returning a 200 status where it shouldn’t. It should just return a 404 error. Is there a 404 redirect I can use in .htaccess for this?
Good URLs look like this
www.example.com/this-is-static/anytext
or
www.example.com/this-is-static/anytext/alsoanytext_123
or
www.example.com/this-is-static/anytext/alsoanytext_123-123
or
www.example.com/this-is-static/anytext/alsoanytext/alsoanytextagain_123-123
The bad URL looks like this
www.example.com/this-is-static/anytext/alsoanytext
Side note: The words anytext, alsoanytext and alsoanytextagain are wildcards (*); they can be any words. The numbers "123" could be any combination of digits.
The "this-is-static" part doesn’t change.
So as you can see, the bad URL doesn’t have the "_XXXXXX" part.
I basically need any URL that reaches the "alsoanytext" subfolder (or goes beyond it) without an underscore after it to return a 404.
Hopefully this makes sense.
EDIT 2:
I’m not sure the method below would work, as the regex matches all the URLs except for the "bad one", making this impossible with .htaccess without setting up new rules for all the other subfolders.
Just to simplify, using this as an example (https://regexr.com/5e5fd), how would we get the line
www.example.com/this-is-static/anytext/alsoanytext
to be the only match?
2 Answers
Use the following expression with a non-capturing group:
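Pieced together from the parts explained below, the expression would look roughly like this. The literal `www\.example\.com\/this-is-static\/` prefix is an assumption taken from the question's example URLs, and the slashes are backslash-escaped because regexr uses `/` as its delimiter:

```regex
www\.example\.com\/this-is-static\/[^\/\s]*(?:\/[^_\s]+_.*)?$
```

When testing several URLs at once (one per line), the multiline flag is needed so that `$` anchors at the end of each line.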
Explained:
`[^\/\s]*`: Matches the /anytext segment (anything except a slash or whitespace).
`(?:` … `)?`: Non-capturing group for the /alsoanytext stuff. The `?` quantifier at the end makes this part optional, so /anytext without anything else also matches.
`\/`: Match the slash now.
`[^_\s]+`: Match all but underscore (and whitespace).
`_`: The literal underscore.
`.*`: Use your preferred method to match the rest of the URL if matching all is not okay.
`$`: Required to make sure that nothing follows /anytext except the properly formed /alsoanytext_whatever. Otherwise, you may have a partial match when the wrong URL is used (since the beginning will match with /anytext).
You can see it at regexr.
There are two aspects to this… the regex and Apache `.htaccess`.
@MarcSances appears to have created the regex that matches "good" URLs (+1). (Although do you need `\s` (whitespace) to be part of the negated character class?) Using mod_rewrite you can simply negate this regex (with a `!` prefix) to not-match good URLs (i.e. the rule is successful for "bad" URLs).
In Apache config files you do not need to backslash-escape slashes since the slash carries no special meaning (there are no regex delimiters, except for spaces as argument delimiters). (It’s unfortunate that regexr.com does not allow you to change the regex delimiters?!)
Note that with the `RewriteRule` directive you only match against the URL-path (less the slash prefix in `.htaccess`), not the hostname.
When you specify a non-3xx status for the `R` flag, the substitution string (i.e. `error.php` in this example) is ignored. You should specify a single hyphen (`-`) instead to explicitly indicate "no substitution". Also, the `L` flag is superfluous; it is implied.
So, to negate this expression it would become:
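A sketch of what that negated rule might look like, assuming the "good URL" regex from the first answer with the hostname, the backslash escapes and the `\s` removed (the exact pattern is an assumption, not necessarily the rule originally posted):

```apache
# Sketch only: the pattern is adapted from the first answer and is an assumption.
RewriteEngine On

# "!" negates the pattern, so the rule fires for any URL-path that is NOT a "good" URL.
# "-" means no substitution; R=404 returns a 404 status and stops rewriting.
RewriteRule !^this-is-static/[^/]*(?:/[^_]+_.*)?$ - [R=404]
```

Note that, as written, the negation applies to every request this `.htaccess` file handles, so URLs outside this-is-static/ would also fail the "good URL" test and get a 404.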
This will serve the 404 `ErrorDocument` for requested URLs that do not match "good URLs".
However, it looks like you should be able to match a "bad" URL directly, unless `alsoanytext` itself could legitimately contain underscores (`_`). For example:
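A minimal sketch of such a direct match, assuming a "bad" URL never contains an underscore after the first anytext segment (the pattern is illustrative, not necessarily the one originally posted):

```apache
# Sketch only: the pattern is an assumption based on the question's example URLs.
RewriteEngine On

# this-is-static/<segment>/<anything without an underscore> is a "bad" URL.
# [^/]+ matches the first segment (anytext); [^_]+ matches the rest only if no "_" appears.
RewriteRule ^this-is-static/[^/]+/[^_]+$ - [R=404]
```

This matches this-is-static/anytext/alsoanytext (and deeper paths that contain no underscore), while this-is-static/anytext and the URLs ending in `_123` or `_123-123` are left alone.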