How do I 301 redirect (HTTP to HTTPS) && (www to non-www) for a single domain using S3 and Cloudfront? - SEO

holaymolay
October 10, 2018
253 views
0 votes
2 Answers

I am hosting a static site (purely html/css) on AWS S3 with a CloudFront distribution. I have no problem configuring only CloudFront to redirect HTTP to HTTPS. Nor do I have a problem only having S3 redirect www to a non-www (naked) subdomain.

The problem comes when I try to redirect all HTTP traffic to HTTPS and simultaneously redirect all www subdomains to non-www.

It simply doesn’t work. I’ve been looking for a solution for months and haven’t found one. It may seem like StackOverflow has the answer, but I’m telling you it doesn’t. Either the solutions there reach a dead end, or they were written for an older AWS user interface that doesn’t match the way it is today.

The best I have been able to come up with is an HTML redirect for www to non-www, but that’s not ideal from an SEO and maintainability standpoint.

What is the best solution for this configuration?

Answers

As I mentioned in Supporting HTTPS URL redirection with a single CloudFront distribution, the simple and straightforward solution involves two buckets and two CloudFront distributions — one for www and one for the bare domain. I am highly skeptical that this would have any negative SEO impact.

However, that answer pre-dates the introduction of the CloudFront Lambda@Edge extension, which offers another solution because it allows you to trigger a Javascript Lambda function to run at specific points during CloudFront’s request processing, to inspect the request and potentially modify it or otherwise react to it.

There are several examples in the documentation but they are all very minimalistic, so here’s a complete, working example, with more comments than actual code, explaining exactly what it does and how it does it.

This function — configured as an Origin Request trigger — will fire every time there is a cache miss, and inspect the Host header sent by the browser, to see if the request should be allowed through, or if it should be redirected without actually sending the request all the way through to S3. For cache hits, the function will not fire, because CloudFront already has the content cached.

Any other domain name associated with the CloudFront distribution will be redirected to the “real” domain name of your site, as configured in the function body. Optionally, it will also return a generated 404 response if someone accesses your distribution’s *.cloudfront.net default hostname directly.

You may be wondering how the cache of a single CloudFront distribution can differentiate between the content for example.com/some-path and www.example.com/some-path and cache them separately, but the answer is that it can and it does if you configure it appropriately for this setup — which means telling it to cache based on selected request headers — specifically the Host header.

Normally, enabling that configuration wouldn’t be quite compatible with S3, but it works here because the Lambda function also sets the Host header back to what S3 expects. Note that you need to configure the Origin Domain Name — the web site hosting endpoint of your bucket — inline, in the code.

With this configuration, you only need one bucket, and the bucket’s name does not need to match any of the domain names. You can use whatever bucket you want… but you do need to use the web site hosting endpoint for the bucket, so that CloudFront treats it as a custom origin. Creating an “S3 Origin” using the REST endpoint for the bucket will not work.

'use strict';

// if an incoming request is for a domain name other than the canonical
// (official) hostname for the site, this Lambda@Edge trigger
// will redirect the request back to the official site, subject to the
// configuration parameters below.

// this trigger must be deployed as an Origin Request trigger.

// in the CloudFront Cache Behavior settings, the Host header must be
// whitelisted for forwarding, in order for this function to work as intended;
// this is an artifact of the way the Lambda@Edge interface interacts with the
// CloudFront cache key mechanism -- we can't react to what we can't see,
// and if it isn't part of the cache key, CloudFront won't expose it.

// specify the official hostname of the site; requests to this domain will
// be passed through; others will redirect to it...

const canonical_domain_name = 'example.com'; 

// ...but note that every CloudFront distribution has a default *.cloudfront.net
// hostname that can't be disabled; you may not want this hostname to do
// anything at all, including redirect; set this parameter to true if you
// want to return 404 for the default hostname; see the render_reject()
// function to customize the behavior further.

const reject_default_hostname = false; 

// the "origin" is the server that provides your content; this is configured
// in the distribution and selected in the Cache Behavior settings, but
// that information needs to be provided here, so that we can modify
// successful requests to match what the destination expects.

const origin_domain_name = 'example-bucket.s3-website.us-east-2.amazonaws.com';

// http status code for redirects; you may want 302 or 307 for testing,
// and 301 or 308 for production; note that this is a string, not a number.

const redirect_http_status_code = '302';

// for generated redirects, we can also set a cache control header; you'll need
// to ensure you format this correctly, since the code below does not validate
// the syntax; here, max-age is how long the browser should cache redirects, 
// while s-maxage tells CloudFront how long to potentially cache them;
// higher values should result in less traffic and potentially lower costs;
// set to empty string or null if you don't want to set a value.

const redirect_cache_control = 'max-age=300, s-maxage=86400';

// set false to drop the query string on redirects; true to preserve

const redirect_preserve_querystring = true;

// set false to change the path to '/' on redirects; true to preserve

const redirect_preserve_path = true;

// end of configuration

// the URL in the generated redirect will always use https unless you
// configure whitelisting of CloudFront-Forwarded-Proto, in which case we
// will use that value; if you want to send http to https, use the
// Viewer Protocol Policy settings in the CloudFront cache behavior.


exports.handler = (event, context, callback) => {

    // extract the CloudFront object from the trigger event    
    const cf = event.Records[0].cf;

    // extract the request object
    const request = cf.request;

    // extract the HTTP Host header
    const host = request.headers.host[0].value;

    // check whether the host header matches the canonical value; if so,
    // set the host header to what the origin expects, and return control
    // to CloudFront

    if(host === canonical_domain_name)
    {
        request.headers.host[0].value = origin_domain_name;
        return callback(null, request);
    }

    // check for rejection

    if (reject_default_hostname && host.endsWith('.cloudfront.net'))
    {
        return render_reject(cf, callback);
    }

    // if neither 'return' above has been invoked, then we need to generate a redirect.

    const proto = (request.headers['cloudfront-forwarded-proto'] || [{ value: 'https' }])[0].value;

    const path = redirect_preserve_path ? request.uri : '/';

    const query = (redirect_preserve_querystring && request.querystring !== '') ? ('?' + request.querystring) : '';

    const location = proto + '://' + canonical_domain_name + path + query;

    // build a response object to redirect the browser.

    const response = {

        status: redirect_http_status_code,
        headers: {
            'location': [ { key: 'Location', value: location } ],
        },    
        body: '',

    };

    // add the cache control header, if configured

    if(redirect_cache_control)
    {
        response.headers['cache-control'] = [{ key: 'Cache-Control', value: redirect_cache_control }];
    }

    // return the response object, preventing the request from being sent to
    // the origin server

    return callback(null, response);

};

function render_reject(cf, callback) {
    // only invoked if the request is for *.cloudfront.net and you set
    // reject_default_hostname to true; here, we generate a very simple
    // response, text/plain, with a 404 error.  This can be customized to HTML
    // or XML, etc., according to your local practices, but be sure you properly
    // escape the request URI, since it is untrusted data and could lead to an
    // XSS injection otherwise; no similar vulnerability exists with plain text.

    const body_text = `The requested URL '${cf.request.uri}' does not exist ` +
                      'on this server, or access is not enabled via the ' +
                      `${ cf.request.headers.host[0].value } endpoint.\r\n`;

    // generate a response; you may want to customize this; note that
    // Lambda@Edge is strict with regard to the way headers are specified;
    // the outer keys are lowercase, the inner keys can be mixed.

    const response = {
        status: '404',
        headers: {
            'cache-control': [{ key: 'Cache-Control', value: 'no-cache, s-maxage=86400' }],
            'content-type':  [{ key: 'Content-Type',  value: 'text/plain' }],
        },
        body: body_text,
    };

    return callback(null, response);
}

// eof
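
Before deploying, it can help to exercise the handler locally. The following is a minimal sketch, not part of the original function: makeOriginRequestEvent and invokeSync are hypothetical helper names, the event contains only the fields the handler above actually reads, and invokeSync assumes the handler calls its callback synchronously (as the code above does).

```javascript
'use strict';

// Build a minimal Origin Request event of the shape Lambda@Edge passes
// to the handler. Only the fields the handler above reads are filled in.
function makeOriginRequestEvent(host, uri, querystring) {
    return {
        Records: [{
            cf: {
                config: { eventType: 'origin-request' },
                request: {
                    method: 'GET',
                    uri: uri,
                    querystring: querystring || '',
                    headers: {
                        host: [{ key: 'Host', value: host }],
                    },
                },
            },
        }],
    };
}

// Call a callback-style handler and return whatever it passes to callback().
// ASSUMPTION: the handler invokes the callback synchronously.
function invokeSync(handler, event) {
    let result;
    let called = false;
    handler(event, {}, (err, value) => {
        if (err) { throw err; }
        called = true;
        result = value;
    });
    if (!called) {
        throw new Error('handler did not call back synchronously');
    }
    return result;
}

// Example usage, assuming the trigger code is saved as ./index.js
// (a hypothetical path):
// const { handler } = require('./index.js');
// console.log(invokeSync(handler, makeOriginRequestEvent('www.example.com', '/a', 'x=1')));
```

A request for the canonical hostname should come back as the request object with its Host header rewritten to the origin, while any other hostname should come back as a response object with a status and a location header.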

- Michaelsqlbot
- October 14, 2018 at 6:29 pm
- 0 votes

While finishing up the other answer here, which uses Lambda@Edge, I realized there is a significantly simpler solution, using only a single CloudFront distribution and three S3 buckets (explained below).

There are more constraints to this solution, but it has fewer moving parts and costs less to implement and use.

Here are the constraints:
- You must be using the S3 web site hosting feature (should be a given, since we’re talking about hosting content and doing redirects)
- The buckets must all be in the same AWS region
- The first two buckets must be named exactly the same as the hostnames you want to handle — e.g. you need a bucket named example.com and a bucket named www.example.com.
- You also need to create a bucket whose name exactly matches the hostname assigned to the CloudFront distribution, e.g. dzczcexample.cloudfront.net, and this bucket also must be in the same region as the other two.

Configure the CloudFront distribution’s Origin Domain Name to point to your main content bucket using its web site hosting endpoint, e.g. example.com.s3-website.us-east-2.amazonaws.com.

Configure the Alternate Domain Name settings for both example.com and www.example.com.

Whitelist the Host header for forwarding to the origin. This setting takes advantage of the fact that when S3 does not recognize the incoming HTTP Host header as being one that belongs to S3, then…

the bucket for the request is the lowercase value of the Host header, and the key for the request is the Request-URI.

https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html

Ummm… perfect! That’s exactly what we need — and it gives us a way to pass requests to multiple buckets in one S3 region, through a single CloudFront distribution, based on what the browser asks for… because with this setup, we’re able to split the logic:
- the Origin Domain Name is used only for routing the request from the CloudFront edge to the correct S3 region, then
- the whitelisted Host header is used when the request arrives at S3 for selecting which bucket handles the request.

(This is why all the buckets have to be in the same region, as mentioned above. Otherwise, the request will be delivered to the region of the “main” bucket, and that region will reject it as misrouted if the identified bucket is in a different region.)
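
To make the bucket selection rule quoted above concrete, here is a small illustrative model of it. This is only a sketch of the documented behavior, not anything S3 exposes: selectBucket is a hypothetical name, and "belongs to S3" is approximated here as "ends with .amazonaws.com", which is a simplifying assumption.

```javascript
'use strict';

// Simplified model of the quoted rule: when the Host header is not an
// S3 hostname, the bucket name is the lowercase value of the Host header.
// ASSUMPTION: "belongs to S3" is approximated as "ends with .amazonaws.com".
function selectBucket(hostHeader) {
    const host = hostHeader.toLowerCase();
    if (host.endsWith('.amazonaws.com')) {
        // S3 derives the bucket from its own endpoint name instead.
        return null;
    }
    return host;
}

// The three hostnames that can reach S3 through the distribution each
// map to a bucket of the same name.
for (const h of ['example.com', 'WWW.Example.com', 'd111jozxyqk.cloudfront.net']) {
    console.log(h, '->', selectBucket(h));
}
```

The last hostname in the loop is the reason the third bucket matters: requests to the distribution’s default domain name select a bucket of that name too.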

With this configuration in place, you’ll find that example.com requests are handled by the example.com bucket, and www.example.com requests are handled by the www.example.com bucket, which means all you need to do now is configure the buckets as desired.
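
For the www.example.com bucket, "as desired" here most likely means a redirect-everything website configuration. As a sketch, the object below follows the WebsiteConfiguration structure of the S3 PutBucketWebsite API; the hostname and protocol values are examples, and the variable name is hypothetical.

```javascript
'use strict';

// Website configuration for the www.example.com bucket: send every
// request to the bare domain over https.
// ASSUMPTION: example.com is the canonical hostname, as in the question.
const wwwBucketWebsiteConfig = {
    RedirectAllRequestsTo: {
        HostName: 'example.com',
        Protocol: 'https',
    },
};

// Print as JSON, e.g. to save to a file for use with other tooling.
console.log(JSON.stringify(wwwBucketWebsiteConfig, null, 2));
```

Saved to a file, this JSON should be usable with aws s3api put-bucket-website via its --website-configuration option, or the same settings can be entered in the bucket’s static website hosting properties in the console.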

But there is one more critical step. You absolutely need to create a bucket named after your CloudFront distribution’s assigned default domain name (e.g. d111jozxyqk.cloudfront.net), in order to avoid setting up an exploitable scenario. It’s not a security vulnerability, it’s a billing one. It doesn’t make a great deal of difference how you configure this bucket, but it is important that you own the bucket so that nobody else can create it. Why? Because with this configuration, requests sent directly to your CloudFront distribution’s default domain name (not your custom domains) are routed to a bucket of that name, and as long as no such bucket exists, S3 returns a No Such Bucket error. If someone else were to discover your setup, they could create that bucket, and you’d pay for all their data traffic through your CloudFront distribution. Create the bucket and either leave it empty (so that an error is returned) or set it up to redirect to your main web site.