
Trying to use htaccess to redirect Googlebot from an incoming URL file request of this form:

v_3099_0726dd5b5e8dd67a214c0c243436d131_all.css

to a file of this form, where the four digits (5028 in this example) are not known in advance but are always four digits:

v_5028_0726dd5b5e8dd67a214c0c243436d131_all.css

I don’t think this is possible with regex, because a catch-all regex for those four characters cannot be used in the target path. In other words, Rewrite cannot be told, “go look in directory ____ for a file name that matches everything literally except those four characters, and match them with a regex catch-all.”

In RewriteCond, regex pattern matching is only available on the right-hand side (CondPattern), not on the left-hand side (TestString). So there is no way to use a regex with the -f flag to match a file name on the server and then use the captured group as a backreference in RewriteRule. That strategy won’t work.

In RewriteRule, pattern matching with regex is only available on the left-hand side, in Pattern of the incoming URL, so can’t be used on the right-hand side to say, “go look in the directory ____ for a file that is the same except for these four unknown digits, and redirect to that file.” That strategy won’t work, either.

Any ideas how to accomplish the goal stated at the top? Thank you.

2 Answers


  1. Chosen as BEST ANSWER

    I was unable to determine why the server configuration or site code was overriding the '410 Gone' response directive in htaccess with a 404 response, so I had to do something like this to tell Googlebot to stop hunting for CSS/JS files that get purged periodically (and renamed when regenerated).

    in .htaccess:

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule v_(.*)_(.*)$ /410response.php [L]
    

    in 410response.php placed in root:

    <?php header($_SERVER['SERVER_PROTOCOL'].' 410 Gone');

    UPDATE I

    The 404 response I got when trying to return 410 from htaccess was being forced by the server: it apparently has a custom 410 error document, which in turn apparently routed to a 404. Adding a directive to reset the 410 error document to the default then allowed htaccess to return 410 for pattern matches in RewriteRule. (I thought I had already checked this yesterday, since @MrWhite said in his answer to allow for the server possibly having a custom 410 document; when I made the check today, it worked, indicating that the server's 410-to-404 redirection was overriding my 410 directive.)

    ErrorDocument 410 default
    RewriteRule test\.txt$ - [NC,R=410]
    

    MrWhite! I located this solution in one of your posts on Stack Exchange.


  2. I can’t think of a way to do this in .htaccess alone. The file that you want to redirect/rewrite to must be “known”. There’s no way (in .htaccess) that I can see to scan a particular directory for a file that matches a particular pattern and return that instead (without the help of an external script).

    (Aside: MultiViews allows a file with an essentially unknown extension to be served – but that’s not the case here.)

    You can potentially “test” the existence of various files before redirecting/rewriting to one of them, but with what looks like a “random” 4-digit number – that would be hopelessly inefficient.

    However, what you could do is internally rewrite the request for such a file (that does not exist) to a server-side script (such as PHP). It would then be trivial (assuming there is just 1 file that should match this pattern) for this script to check for the “current” file and either redirect or return this file instead.

    The .htaccess portion of this would be something like:

    RewriteEngine On
    
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^v_\d{4}_[0-9a-f]{32}_all\.css$ return-current-file.php [L]
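
For illustration, here is a minimal sketch of what return-current-file.php might look like. It is not the answerer's actual script. It assumes the CSS files sit in the same directory as the script, that at most one file matches a given hash, and that a 302 redirect is wanted when a match exists (with 410 otherwise). The function name find_current_css is hypothetical.

```php
<?php
// return-current-file.php: hypothetical sketch under the assumptions stated above.

/**
 * Return the name of the first file in $dir matching v_NNNN_<hash>_all.css,
 * or null when the hash is malformed or no such file exists.
 */
function find_current_css(string $dir, string $hash): ?string
{
    // Accept only a 32-character lowercase hex hash, so nothing taken from
    // the request can inject glob metacharacters or path separators.
    if (!preg_match('/^[0-9a-f]{32}$/', $hash)) {
        return null;
    }
    // glob() has no {4} quantifier, so spell out four digit classes.
    $matches = glob($dir . '/v_[0-9][0-9][0-9][0-9]_' . $hash . '_all.css');
    if ($matches === false || count($matches) === 0) {
        return null;
    }
    return basename($matches[0]);
}

// Only handle a request under a web server; skipped when run from the CLI.
if (PHP_SAPI !== 'cli') {
    // REQUEST_URI still holds the originally requested URL after the rewrite.
    $path = parse_url($_SERVER['REQUEST_URI'] ?? '', PHP_URL_PATH) ?: '';
    $current = null;
    if (preg_match('~^(.*/)?v_\d{4}_([0-9a-f]{32})_all\.css$~', $path, $m)) {
        // Assumption: the CSS files live next to this script.
        $current = find_current_css(__DIR__, $m[2]);
    }
    if ($current === null) {
        http_response_code(410); // no current file to offer
        exit;
    }
    // Redirect to the current file in the same URL directory as the request.
    header('Location: ' . $m[1] . $current, true, 302);
    exit;
}
```

Because the RewriteCond only passes requests for files that do not exist, the script never redirects a file to itself. Whether to redirect or to send 410 for stale names is a design choice; the accepted answer above ended up preferring 410.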
    