
I was trying to protect the main page because my Google Search Console report shows URLs with a query string, like this example:

https://example.com/?s=something.g

I would like to return a 404 for any query string, but only on the main page "example.com/". Everything else, such as JavaScript/CSS files, folders and wp-admin, should still be able to use query strings.

this is not allowed (only on main page):

https://example.com/?anything=something
https://example.com/?anythingnew=something&anotherone=something
https://example.com/index.php?anything=something

But these URLs should be allowed (everything else should keep working):

https://example.com/something.js?anything=something
https://example.com/folder/?anything=something
https://example.com/folder/anotherfolder/anyfile.php?anything=something

i was trying to do this:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /([^?]*)?
RewriteRule (.*) /$1? [R=404,L]

With this rule, all query strings are disallowed, including those on files and folders inside the site.

i also tried this:

RewriteCond %{QUERY_STRING} .+
RewriteRule (.*) /$1? [R=404,L]

Same thing: nothing worked. The rule should apply only to the main page. Thanks in advance.

2 Answers


  1. You were not far from the solution:

    RewriteCond %{QUERY_STRING} ^.+$
    RewriteRule ^(?:index\.php)?$ - [R=404,L]
    

    Explained

    1. The RewriteRule pattern is matched against the URL path (without
      the query string). So, to apply this rule only to the homepage
      (with or without index.php), the regular expression has to be
      something like ^(?:index\.php)?$ :

      • ^ matches the beginning of the string, meaning "it should
        start with" instead of just "it should contain".
      • $ matches the end of the string, meaning
        "it should finish with".
      • (?:) is a non-capturing group. A plain () is a capturing
        group, which makes the matched text available as $1.
        We don't need to capture this part to put it back in a
        rewritten URL, because the substitution is just - ("nothing to
        change") and the rule only generates the 404 error. The question
        mark after the group makes it optional, so index.php may or may
        not be present in the URL. The dot is escaped as \. because an
        unescaped . means "any char" in a regular expression pattern.

      You might also see someone write ^/?(?:index\.php)?$ to allow
      for an optional leading slash. In a per-directory context
      (.htaccess or a <Directory> block), Apache strips this leading
      slash before the RewriteRule test, so the extra /? only costs a
      few CPU cycles for nothing. In server or virtual host context the
      path does start with a slash, and there the /? is needed.

    2. The RewriteCond is evaluated only after the RewriteRule pattern
      has matched. Here, we want to test whether the query string is
      non-empty. This is easily done by matching any char one or more
      times with .+. It would work with or without the ^ and $ around
      it. I prefer putting them to show that the test applies to the
      full query string.
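
To check the rule against the URLs from the question without touching the server, here is a minimal Python sketch. It assumes a per-directory (.htaccess) context at the document root, and it uses Python's re module as a stand-in for Apache's PCRE engine, which should behave the same for patterns this simple. The function name is_blocked is hypothetical and only for illustration.

```python
import re
from urllib.parse import urlsplit

# Patterns copied from the rule above (assumption: Python's re behaves
# like Apache's PCRE for these simple expressions).
RULE_PATTERN = re.compile(r"^(?:index\.php)?$")   # RewriteRule pattern
QUERY_PATTERN = re.compile(r"^.+$")               # RewriteCond %{QUERY_STRING}


def is_blocked(url: str) -> bool:
    """Return True if the rule would answer 404 for this URL (hypothetical helper)."""
    parts = urlsplit(url)
    # In .htaccess at the document root, the leading slash is stripped
    # before the RewriteRule pattern is tested.
    path = parts.path
    rule_input = path[1:] if path.startswith("/") else path
    rule_matches = RULE_PATTERN.search(rule_input) is not None
    cond_matches = QUERY_PATTERN.search(parts.query) is not None
    return rule_matches and cond_matches


if __name__ == "__main__":
    urls = [
        "https://example.com/?anything=something",
        "https://example.com/index.php?anything=something",
        "https://example.com/something.js?anything=something",
        "https://example.com/folder/?anything=something",
        "https://example.com/",
    ]
    for url in urls:
        print("404 " if is_blocked(url) else "pass", url)
```

Only the first two URLs in the list are reported as 404; the homepage without a query string passes because the RewriteCond fails on an empty string.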

  2. The rewrite configuration below passed all of your provided test cases.

    RewriteCond %{QUERY_STRING} .+
    RewriteCond %{REQUEST_URI} !^/\w+\.(js|css)$
    RewriteRule !/ - [R=404,L]
    

    Firstly, the RewriteRule pattern ensures that the rule is applied only to requests at the top level of the site, which includes your homepage. In these examples, the request and the resulting string input to the RewriteRule are:

    Request: '/',                RewriteRule input: ''
    Request: '/index.php',       RewriteRule input: 'index.php'
    Request: '/folder/anything', RewriteRule input: 'folder/anything'
    

    As we prefixed the regex in the RewriteRule with !, the result is negated, meaning only strings that do not contain a / will continue to the RewriteCond checks.

    Next, all RewriteCond lines need to evaluate as true for the RewriteRule to be applied. Here we have two:

    • RewriteCond %{QUERY_STRING} .+ checks that the query string is not empty by matching 1 or more of any character.
    • RewriteCond %{REQUEST_URI} !^/\w+\.(js|css)$ checks that the URI is not requesting a file with a JavaScript or CSS extension. The pattern itself matches a slash, one or more word characters, a literal ., and then either js or css. Because the condition is negated with !, it is true only when the URI does not match that pattern.

    There is an implied AND condition between these RewriteCond rules.
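
The evaluation order described above (negated RewriteRule pattern first, then both RewriteCond lines joined by AND) can be sketched in Python as follows. This is only an approximation under stated assumptions: per-directory context at the document root, Python's re standing in for Apache's PCRE, and a hypothetical helper name returns_404.

```python
import re

# Patterns copied from the configuration above; the negation (!) is
# applied in the function, not in the regex itself.
RULE_PATTERN = re.compile(r"/")                 # RewriteRule !/
QUERY_COND = re.compile(r".+")                  # RewriteCond %{QUERY_STRING} .+
STATIC_COND = re.compile(r"^/\w+\.(js|css)$")   # RewriteCond %{REQUEST_URI} !^/\w+\.(js|css)$


def returns_404(request_uri: str, query_string: str) -> bool:
    """Approximate whether the rule forces a 404 (hypothetical helper)."""
    # Per-directory context: the leading slash is stripped from the rule input.
    rule_input = request_uri[1:] if request_uri.startswith("/") else request_uri
    # Negated rule pattern: stop if the remaining path contains a "/".
    if RULE_PATTERN.search(rule_input):
        return False
    has_query = QUERY_COND.search(query_string) is not None
    is_static = STATIC_COND.search(request_uri) is not None
    # Implied AND: query string present AND not a top-level js/css file.
    return has_query and not is_static


if __name__ == "__main__":
    for uri, query in [
        ("/", "anything=something"),
        ("/index.html", "test=true"),
        ("/test.css", "test=true"),
        ("/folder/anything", "anything=something"),
    ]:
        print("404 " if returns_404(uri, query) else "pass", uri, query)
```

Note that REQUEST_URI keeps its leading slash while the RewriteRule input does not, which is why the second condition's pattern starts with ^/.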


    As a bonus, if required, you can enable additional logging to troubleshoot the rewrite module by adding the line below to your Apache conf file.

    LogLevel alert rewrite:trace6
    

    Some examples of the output you’ll see taken from my testing:

    Request: /index.html?test=true, Logs:

    applying pattern '/' to uri 'index.html'
    RewriteCond: input='test=true' pattern='.+' => matched
    RewriteCond: input='/index.html' pattern='!^/\w+\.(js|css)$' => matched
    forcing responsecode 404 for /opt/homebrew/var/www/index.html
    

    Request: /test.css?test=true, Logs:

    applying pattern '/' to uri 'test.css'
    RewriteCond: input='test=true' pattern='.+' => matched
    RewriteCond: input='/test.css' pattern='!^/\w+\.(js|css)$' => not-matched
    pass through /opt/homebrew/var/www/test.css
    