I have a question in regard to how CloudFront will use an S3 object’s ETag to determine if it needs to send a refreshed object or not.
I know that the ETag will be part of the Request to the CloudFront distribution. In my case I’m seeing the "weak" version (prefixed with W/):
if-none-match: W/"eabcdef4036c3b4f8fbf1e8aa81502542"
If this ETag being sent does not match the S3 Object’s current ETag value, then CloudFront will send the latest version.
I’m seeing this work as expected, but only after the TTL in CloudFront’s cache policy has expired. In my case it’s set to 20 mins.
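For reference, the ETag check being described is an ordinary HTTP conditional GET. Below is a minimal Python sketch of how a server or cache that actually consults the current ETag would evaluate If-None-Match. The function names are hypothetical (they are not CloudFront or S3 APIs), and the sketch uses the weak comparison that If-None-Match calls for in RFC 7232, so the W/ prefix is ignored. It only shows the comparison step; it says nothing about when CloudFront performs it.

```python
from typing import Optional


def _opaque_tag(tag: str) -> str:
    """Drop the W/ weak-validator prefix so two tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(if_none_match: str, current_etag: str) -> bool:
    """True if any tag in the If-None-Match header weakly matches the current ETag."""
    if if_none_match.strip() == "*":
        return True
    # Naive comma split: fine for hex ETags like S3's, not for tags containing commas.
    sent = [_opaque_tag(t) for t in if_none_match.split(",") if t.strip()]
    return _opaque_tag(current_etag) in sent


def conditional_get_status(if_none_match: Optional[str], current_etag: str) -> int:
    """304 if the client's copy is still current, otherwise 200 with the full body."""
    if if_none_match is not None and if_none_match_hits(if_none_match, current_etag):
        return 304
    return 200


if __name__ == "__main__":
    sent = 'W/"eabcdef4036c3b4f8fbf1e8aa81502542"'
    print(conditional_get_status(sent, '"eabcdef4036c3b4f8fbf1e8aa81502542"'))  # 304
    print(conditional_get_status(sent, '"exyzcde4099c3b4f8fuy1e8aa81501122"'))  # 200
```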
CloudFront with a Cache Policy (TTL values in seconds):
- Minimum TTL: 1
- Maximum TTL: 1200 (20 mins)
- Default TTL: 900
- Origin Request Policy is not set
S3 Bucket:
- Set to only allow access via its corresponding CloudFront distribution above.
- Bucket and objects not public.
- The test object (index.html) in this case has only one header set: Content-Type = text/html
- While I am using CloudFront’s Cache Policy, I’ve also tested using the S3 Object header Cache-Control = max-age=6000. This had no effect on the refresh of the "index.html" object with regard to the ETag check I’m asking about.
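How an origin max-age interacts with the TTL settings listed above: as I understand CloudFront’s documented expiration rules, an origin Cache-Control max-age is clamped between the cache policy’s Minimum and Maximum TTL, and the Default TTL applies only when the origin sends no caching headers. A simplified Python sketch of that rule (the function is hypothetical, and s-maxage, Expires and no-cache/no-store are deliberately left out). Under this model, max-age=6000 would be capped at the 1200-second Maximum TTL, so it could not lengthen the 20-minute window.

```python
from typing import Optional


def effective_ttl(min_ttl: int, default_ttl: int, max_ttl: int,
                  origin_max_age: Optional[int]) -> int:
    """Seconds an edge keeps an object before going back to the origin (simplified model).

    Assumption: only Cache-Control max-age is modelled; s-maxage, Expires,
    no-cache/no-store/private are ignored.
    """
    if origin_max_age is None:
        # No caching header from the origin: the policy's Default TTL applies.
        return default_ttl
    # Otherwise the origin's max-age is clamped into [Minimum TTL, Maximum TTL].
    return max(min_ttl, min(origin_max_age, max_ttl))


if __name__ == "__main__":
    print(effective_ttl(1, 900, 1200, 6000))  # 1200: capped by Maximum TTL
    print(effective_ttl(1, 900, 1200, None))  # 900: Default TTL
    print(effective_ttl(0, 0, 0, 6000))       # 0: all TTLs set to 0
```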
The Scenario:
Upon first "putObject" to that S3 bucket, the "index.html" file has an ETag of:
eabcdef4036c3b4f8fbf1e8aa81502542
When I hit the URL (GET) for that "index.html" file, the cache of 20 mins is effectively started.
Subsequent hits to the "index.html" URL (GET) have the Request with the value
if-none-match: W/"eabcdef4036c3b4f8fbf1e8aa81502542"
I also see "x-cache: Hit from cloudfront" in the Response coming back.
Before the 20 mins is up, I’ll make a change to the "index.html" file and re-upload via a "putObject" command in my code.
That will then change the ETag to:
exyzcde4099c3b4f8fuy1e8aa81501122
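Why the ETag changes on re-upload: for an object written with a single-part PUT (such as putObject) and not encrypted with SSE-KMS, S3 uses the MD5 digest of the body as the ETag, so any change to the content produces a new ETag. Multipart uploads and SSE-KMS objects get ETags that are not a plain MD5. A small Python sketch (the helper name is hypothetical):

```python
import hashlib


def expected_s3_etag(body: bytes) -> str:
    """Quoted MD5 hex digest: S3's ETag for a single-part, non-SSE-KMS upload."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


if __name__ == "__main__":
    old = expected_s3_etag(b"<p>first draft</p>")
    new = expected_s3_etag(b"<p>second draft</p>")
    print(old != new)  # True: edited content, new ETag
```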
I would expect, then, that the next Request to CloudFront, before the 20-minute TTL and with the old "if-none-match" value, would prompt CloudFront to see that the ETag is different and send the latest version.
But in all cases/tests it doesn’t. CloudFront seems to ignore the ETag difference and continues to send the older "index.html" version.
It’s only after the 20 mins (cache TTL) are up that CloudFront sends the latest version.
At that time the ETag in the Request changes/updates as well:
if-none-match: W/"exyzcde4099c3b4f8fuy1e8aa81501122"
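The behaviour above is what any TTL-based cache does: while an entry is fresh, the origin is not consulted, so an upload during that window is invisible. Below is a toy Python model, not CloudFront’s actual implementation. The class names and the simple refetch-on-expiry are assumptions (CloudFront can instead revalidate an expired object with a conditional request).

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Entry:
    body: str
    expires_at: float


@dataclass
class EdgeCache:
    """Toy TTL cache; `origin` stands in for the S3 bucket (key -> body)."""
    origin: Dict[str, str]
    ttl: float
    entries: Dict[str, Entry] = field(default_factory=dict)

    def get(self, key: str, now: float) -> Tuple[str, str]:
        entry = self.entries.get(key)
        if entry is not None and now < entry.expires_at:
            # Fresh: answer from the cache without contacting the origin,
            # so an upload made after this entry was stored is not seen yet.
            return entry.body, "Hit"
        # Missing or expired: fetch from the origin and restart the TTL clock.
        body = self.origin[key]
        self.entries[key] = Entry(body, now + self.ttl)
        return body, "Miss"


if __name__ == "__main__":
    bucket = {"index.html": "version 1"}
    edge = EdgeCache(origin=bucket, ttl=1200)
    print(edge.get("index.html", now=0))     # ('version 1', 'Miss')
    bucket["index.html"] = "version 2"       # re-upload at t = 300 s
    print(edge.get("index.html", now=300))   # ('version 1', 'Hit'): stale content, still fresh by TTL
    print(edge.get("index.html", now=1200))  # ('version 2', 'Miss'): TTL expired
```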
Question (finally, huh?):
Is there a way to configure CloudFront to listen to the incoming ETag, and if needed, send the latest Object without having to wait for the Cache Policy TTL to expire?
UPDATE:
Kevin Henry’s response explains it well:
"CloudFront doesn’t know that you updated S3. You told it not to check with the origin until the TTL has expired. So it’s just serving the old file until the TTL has expired and it sees the new one that you uploaded to S3. (Note that this doesn’t have anything to do with ETags)."
So I decided to test how the ETag would be used if I set all three TTL values in the CloudFront cache policy to 0. I know that this defeats the purpose, and one of the strengths, of CloudFront, but I’m still wrapping my head around certain key aspects of CDN caching.
After setting the cache to 0, I’m seeing "Miss from cloudfront" in every Response coming back.
I expected this, and in the first response I see an HTTP status of 200. Note the file size being returned is 128KB for this test.
Subsequent calls to this same file return an HTTP status of 304, with a file size of around 400B being returned.
As soon as I update the "index.html" file in the S3 bucket, and call that same URL, the status code is 200 with a file size of 128KB.
Subsequent calls return a status of 304, again with an average of 400B in file size.
Looking again at the definition of an HTTP status of 304:
"A conditional GET or HEAD request has been received and would have resulted in a 200 OK response if it were not for the fact that the condition evaluated to false.
In other words, there is no need for the server to transfer a representation of the target resource because the request indicates that the client, which made the request conditional, already has a valid representation; the server is therefore redirecting the client to make use of that stored representation as if it were the payload of a 200 OK response."
So am I correct in thinking that I’m using the Browser’s cache at this point?
The calls to CloudFront will now pass the requests to the Origin, where the ETag is used to verify whether the resource has changed.
Since it hasn’t, a 304 is returned and the Browser kicks in and serves its stored version of "index.html".
Would this be a correct assumption?
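Under the simplifying assumption that, with all TTLs at 0, every page load ends up being checked against the current ETag in S3, the 200/304 pattern and the browser-cache reading above can be modelled as follows. This is a hypothetical Python sketch, not CloudFront’s real request flow; the function names are made up.

```python
from typing import List, Optional, Tuple


def origin_get(body: str, etag: str, if_none_match: Optional[str]) -> Tuple[int, str]:
    """Conditional GET: 304 with an empty payload if the sent ETag still matches."""
    sent = if_none_match
    if sent is not None and sent.startswith("W/"):
        sent = sent[2:]  # weak comparison: ignore the W/ prefix
    if sent == etag:
        return 304, ""
    return 200, body


def replay(versions: List[Tuple[str, str]]) -> List[Tuple[int, Optional[str]]]:
    """versions[i] is the (body, etag) stored in S3 at page load i.

    Returns (status, what the browser displays) for each load.
    """
    cached_body: Optional[str] = None
    cached_etag: Optional[str] = None
    shown: List[Tuple[int, Optional[str]]] = []
    for body, etag in versions:
        status, payload = origin_get(body, etag, cached_etag)
        if status == 200:
            # New representation: the browser stores it with its validator.
            cached_body, cached_etag = payload, etag
        # On 304 the payload is empty and the browser shows its stored copy.
        shown.append((status, payload if status == 200 else cached_body))
    return shown


if __name__ == "__main__":
    loads = [("v1", '"e1"'), ("v1", '"e1"'), ("v2", '"e2"'), ("v2", '"e2"')]
    print([status for status, _ in replay(loads)])  # [200, 304, 200, 304]
```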
In case you’re wondering, I can’t use the invalidation method for clearing cache, as my site could expect several thousand invalidations a day. I’m hosting a writing journal site, where the authors could update their files daily, therefore producing new versions of their work on S3.
I would also rather not use the versioning method, with a timestamp or other string added as a query to the page URL, mainly for SEO reasons.
My ideal scenario would be to serve the same version of the author’s work until they’ve updated it, at which time the next call to that same page would show its latest version.
This research/exercise is helping me to learn and weigh my options.
Thanks again for the help/input.
Jon
Answers
"I would expect then that the next Request to CloudFront, before the 20-minute TTL and with the old
if-none-match
value, would then prompt the CloudFront to see theETag
is different and send the latest version."That is a mistaken assumption. CloudFront doesn’t know that you updated S3. You told it not to check with the origin until the TTL has expired. So it’s just serving the old file until the TTL has expired and it sees the new one that you uploaded to S3. (Note that this doesn’t have anything to do with
ETags
).CloudFront does offer ways to invalidate the cache, and you can read more about how to combine that with S3 updates in these answers.
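For completeness, here is a sketch of the upload-then-invalidate approach this answer refers to, using boto3’s put_object and create_invalidation. The bucket, key and distribution ID are placeholders, and the helper names are hypothetical. As the question notes, at several thousand updates a day this may not be practical; the sketch only shows the mechanics.

```python
import time
from typing import Dict, List, Optional


def build_invalidation_batch(paths: List[str], caller_reference: Optional[str] = None) -> Dict:
    """Build the InvalidationBatch argument for CloudFront's CreateInvalidation call."""
    # Invalidation paths must start with "/".
    normalized = [p if p.startswith("/") else "/" + p for p in paths]
    return {
        "Paths": {"Quantity": len(normalized), "Items": normalized},
        # CallerReference must be unique per request; a timestamp is a common choice.
        "CallerReference": caller_reference or str(time.time_ns()),
    }


def upload_and_invalidate(bucket: str, key: str, body: bytes, distribution_id: str) -> None:
    """Upload a new version to S3, then ask CloudFront to drop its cached copy.

    Needs boto3 and AWS credentials; imported here so the builder above
    can be used without them.
    """
    import boto3

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body, ContentType="text/html")
    boto3.client("cloudfront").create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch([key]),
    )
```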
We can enable bucket versioning, and the object with the new ETag will then be picked up by CloudFront.