I created a script that serves private files from outside of the domains folder. The intent is to have a PHP script verify that whoever is requesting the file has proper access to what they are asking for.
To avoid repeatedly transferring a file the user has already downloaded, I added some headers to establish caching rules for the file. So far, everything appears to be working correctly for determining whether the served file has changed. However, even when the script no longer sends a 304 Not Modified status, the old file is still displayed.
For example:
- Request newly added file; serve file with 200 OK code
- Request previously loaded file; serve 304 Not Modified code
- Change image, request newly changed image; serve new file with 200 OK code
The only part that isn’t working is serving the new image: my browser refuses to update it unless I do Ctrl+F5. If I dump just the raw data of the file, I can see that it’s updating properly, but as soon as I add the Content-type header, it refuses to update.
Here’s a sample of the code I’m using to handle the caching headers
(actual values are determined before the script reaches this part, and they aren’t really relevant)
<?php
$filename = 'test.jpg';
$path = '/0001/user/';
$filesize = 300000;
$filetype = 'image/jpeg';
$created = '2023-10-17 12:00:00';
$modified = '2023-10-17 13:00:00';
$etag = md5($filename.$modified);

// Set expiration date to 30 days from the current date
$expiration = strtotime('+30 days');
$expiration = gmdate('D, d M Y H:i:s GMT', $expiration);

// Set the "Cache-Control" header to specify private caching for 30 days
header('Cache-Control: private, max-age=2592000');
header('Expires: '.$expiration);

if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) {
    if ($_SERVER['HTTP_IF_NONE_MATCH'] === $etag) {
        // The client's cached version matches the current version, send a 304 Not Modified response
        $serverLastModified = max(strtotime($modified), strtotime($created));
        header('Last-Modified: '.gmdate('D, d M Y H:i:s GMT', $serverLastModified));
        header('ETag: '.$etag);
        header('HTTP/1.1 304 Not Modified');
        die;
    }
} else if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    // If the browser has a cached version, check if it's up to date
    $clientLastModified = strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']);
    $serverLastModified = max(strtotime($modified), strtotime($created));
    if ($clientLastModified >= $serverLastModified) {
        // The client's cached version is up to date, send a 304 Not Modified response
        header('Last-Modified: '.gmdate('D, d M Y H:i:s GMT', $serverLastModified));
        // Add an ETag header for cache validation
        header('ETag: '.$etag);
        header('HTTP/1.1 304 Not Modified');
        die;
    }
}

// Add an ETag header for cache validation
header('ETag: '.$etag);

// Set the "Last-Modified" header using the most recent date
$lastModifiedDate = max(strtotime($modified), strtotime($created));
$lastModifiedDateFormatted = gmdate('D, d M Y H:i:s GMT', $lastModifiedDate);
header('Last-Modified: '.$lastModifiedDateFormatted);

// Output file to browser
header('Content-type: '.$filetype);

// Flush output buffer and send headers
clearstatcache();
if (ob_get_length()) {
    ob_clean();
    ob_end_clean();
}
flush();
readfile(__DIR__.'/../../private/'.$path.$filename);
In case it helps at all, here are the headers as sniffed by redbot.org
HTTP/1.1 200 OK
Connection: Keep-Alive
Cache-Control: private, max-age=2592000
Expires: Fri, 17 Nov 2023 13:09:12 GMT
Vary: Accept-Encoding,User-Agent
ETag: "92951e93c30b48ff39298b48877a0ccb"
Last-Modified: Thu, 12 Oct 2023 18:09:23 GMT
Content-Type: image/jpeg
Transfer-Encoding: chunked
Date: Wed, 18 Oct 2023 12:09:12 GMT
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Alt-Svc: quic=":443"; ma=2592000; v="43,46", h3-Q043=":443"; ma=2592000, h3-Q046=":443"; ma=2592000, h3-Q050=":443"; ma=2592000, h3-25=":443"; ma=2592000, h3-27=":443"; ma=2592000
2 Answers
You don’t need to use an ETag or expiration for what you want. The Last-Modified mechanism is enough. And, most importantly, you need to use the correct value for Cache-Control, which is no-cache. Despite its name, the no-cache directive doesn’t mean that the resource can’t be cached; it just means that the browser must validate it with the server before using it.
Here is a simple example of a script that serves a file named data.txt:
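A minimal sketch of such a script could look like the following. The function names isNotModified() and serveFile() are made up for illustration, and the sketch assumes that data.txt sits next to the script and that the modification time reported by the filesystem is the right freshness signal.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether the client's cached copy is still current.
 * $ifModifiedSince is the raw If-Modified-Since request header, or null if absent.
 * (Hypothetical helper name, used here for illustration only.)
 */
function isNotModified(int $mtime, ?string $ifModifiedSince): bool
{
    if ($ifModifiedSince === null) {
        return false;
    }
    $clientTime = strtotime($ifModifiedSince);
    // strtotime() returns false for an unparseable date; treat that as "modified".
    return $clientTime !== false && $clientTime >= $mtime;
}

/**
 * Serve $path with Last-Modified validation and "Cache-Control: no-cache".
 * (Hypothetical helper name, used here for illustration only.)
 */
function serveFile(string $path, string $contentType = 'text/plain'): void
{
    if (!is_file($path)) {
        http_response_code(404);
        return;
    }
    $mtime = filemtime($path);

    // no-cache: the browser may store the file but must revalidate before reusing it.
    header('Cache-Control: no-cache');
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');

    if (isNotModified($mtime, $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null)) {
        // The cached copy is current: send headers only, no body.
        http_response_code(304);
        return;
    }

    header('Content-Type: ' . $contentType);
    header('Content-Length: ' . filesize($path));
    readfile($path);
}
```

The script itself would then consist of a single call such as serveFile(__DIR__ . '/data.txt'); placed after any access checks. Because the browser revalidates on every use, a changed file is picked up on the next request without a forced reload.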
TL;DR; The solution posted by @Olivier in their answer is probably suitable for most use cases.
However, there are a couple of things to consider.
Any request comes with an additional time penalty and uses resources on both client and server. Setting Cache-Control: no-cache means that every time the client uses the resource, there will be a request to the web server. If the resource wasn’t modified, the payload will not be transferred, but the overhead that comes with a request will still be there. So if, at the time of a request, you can be sure the resource won’t change for another X seconds, it would be better to use something like Cache-Control: max-age=X, must-revalidate (where X is the number of seconds the resource won’t be stale for). If your resource changes on a regular basis, you could additionally use the header Age: Y (where Y is the age of the resource in seconds). For example: if you created the resource via cron job every 10 minutes, last time 2 minutes ago, your headers could look something like:
Cache-Control: max-age=600, must-revalidate
Age: 120
The client will automatically subtract the Age (120) from the max-age (600).
Resources might change more often than once a second. When using Last-Modified to identify a modification, this could lead to problems. Consider a change at second 0.1 (Version 1.0), a request at second 0.3, another change at second 0.8 (Version 2.0) and then no further changes for one hour. The client would then use (the stale) Version 1.0 for another hour, because Version 1.0 and Version 2.0 carry the same modification time. This might seem unlikely at first glance, but it is quite common for log files, for example. For that kind of file it would be better to use an ETag with either the file size or (if the file is not growing constantly) an ETag with a combination of file size and last modification time.
The last modification time might change without the contents actually changing, e.g. when touching a resource. In this case it would probably be best to use an ETag with (a hash of) the file size only or even with a complete file hash. While the latter will cost a lot of CPU and I/O time on the server, it is the safest approach and might be suitable for very large resources and/or slow connections.
I tried to cover all of these scenarios with a single PHP class, which can be found here. While it still lacks proper documentation, it should be self-explanatory for the most part. Everybody feel free to comment and/or contribute.