
I want to improve SEO (i.e., correctly index my pages on search engines) in a serverless architecture when my website is hosted on AWS S3.

As I’m using a JavaScript approach to routing (something akin to Angular, but simpler) and getting dynamic content to fill meta tags, I’m finding everything to be quite troublesome for scrapers without JavaScript support, like Facebook’s.

I have default meta-tags already inserted and those are, of course, loaded just fine but I need the updated ones.

I know most people use pre-rendering on a server or through something like Prerender.io, but I really wanted to find an alternative that makes sense in a serverless approach.

I thought I had it figured out, since Open Graph meta tags allow for a “pointer” URL where you can serve a meta-tags-only HTML page if needed. So I was thinking of using a Lambda function to generate the HTML response with the right meta tags on a GET request. The problem is: since the Facebook scraper has no JavaScript support, how can I send the dynamic content on the GET request?

4 Answers


  1. If you are using S3, you must prerender the pages before uploading them. You can’t call Lambda functions on the fly because the crawler will not execute JavaScript. You can’t even use Prerender.io with S3.

    Suggestion:

    1. Host your website locally.
2. Use PhantomJS to fetch the pages and write a prerendered version.
    3. Upload each page to S3 following the page address*.

    * E.g.: the address from example.com/about/us must be mapped as a us.html file inside a folder about in your bucket root.
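    The address-to-file mapping described above can be sketched as a small helper (the function name is illustrative, not part of any library):

    ```javascript
    // Map an SPA route to the S3 object key of its prerendered HTML file.
    // E.g. "/about/us" -> "about/us.html" (a us.html file inside an "about"
    // folder at the bucket root), and "/" -> "index.html".
    function routeToS3Key(routePath) {
      // Strip leading and trailing slashes: "/about/us/" -> "about/us"
      var trimmed = routePath.replace(/^\/+|\/+$/g, '');
      return trimmed === '' ? 'index.html' : trimmed + '.html';
    }
    ```

    With S3 static website hosting, a request for `/about/us` can then be served by the `about/us.html` object without any JavaScript running on the crawler’s side.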

    Now, your users and the crawlers will see exactly the same pages, without needing JavaScript to load the initial state. The difference is that with JavaScript enabled, your framework (Angular?) will load the JS dependencies (like routes, services, etc.) and take control like a normal SPA application. When the user clicks to browse another page, the SPA will reload the inner content without making a full page reload.

    Pros:

    • Easy to set up.
    • Very fast to serve content. You can also use CloudFront to improve the speed.

    Cons:

    • If you have 1000 pages (e.g., 1000 products that you sell in your store), you need to make 1000 prerendered pages.
    • If your page data changes frequently, you need to prerender frequently.
    • Sometimes the crawler will index old content*.

    * The crawler will see the old content, but the user will probably see the current content as the SPA framework will take control of the page and load the inner content again.


    You said that you are using S3. If you want to prerender on the fly, you can’t use S3. You need to use the following:

    Route 53 => CloudFront => API Gateway => Lambda

    Configure:
    – Set the API Gateway endpoint as the CloudFront origin.
    – Use “HTTPS Only” in the “Origin Protocol Policy” (CloudFront).
    – The Lambda function must be a proxy.

    In this case, your Lambda function will know the requested address and will be able to correctly render the requested HTML page.
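    A minimal sketch of such a proxy Lambda follows. The in-memory lookup table stands in for a real database query, and all names here are illustrative:

    ```javascript
    // Sketch of an API Gateway (proxy integration) Lambda that returns
    // prerendered HTML with page-specific meta tags. PAGES stands in for
    // a database lookup; in production you would query your data store.
    var PAGES = {
      '/about/us': { title: 'About Us', description: 'Who we are' }
    };

    // Build the full HTML document for a given request path.
    function renderHtml(path) {
      var meta = PAGES[path] ||
        { title: 'My Site', description: 'Default description' };
      return '<!DOCTYPE html><html><head>' +
        '<title>' + meta.title + '</title>' +
        '<meta property="og:title" content="' + meta.title + '">' +
        '<meta property="og:description" content="' + meta.description + '">' +
        '</head><body></body></html>';
    }

    // Lambda entry point: with proxy integration, event.path holds the
    // requested address, and the return shape maps directly to the HTTP response.
    exports.handler = function (event, context, callback) {
      callback(null, {
        statusCode: 200,
        headers: { 'Content-Type': 'text/html' },
        body: renderHtml(event.path)
      });
    };
    ```

    Because the HTML is built per request, a crawler doing a plain GET receives the correct meta tags with no JavaScript required.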

    Pros:

    • As Lambda has access to the database, the rendered page will always be updated.

    Cons:

    • Much slower to load the webpages.
  2. If you are willing to use CloudFront on top of your S3 bucket, there is a new possibility for solving your problem with prerendering on the fly. Lambda@Edge is a new feature that allows code to be executed with low latency when a page is requested. With it, you can check whether the agent is a crawler and prerender the page for it.

    01 Dec 2016 announcement: Lambda@Edge – Preview

    Just last week, a comment that I made on Hacker News resulted in an
    interesting email from an AWS customer!

    (…)

    Here’s how he explained his problem to me:

    In order to properly get indexed by search engines and in order for
    previews of our content to show up correctly within Facebook and
    Twitter, we need to serve a prerendered version of each of our pages.
    In order to do this, every time a normal user hits our site need for
    them to be served our normal front end from Cloudfront. But if the
    user agent matches Google / Facebook / Twitter etc., we need to
    instead redirect them the prerendered version of the site.

    Without spilling any beans I let him know that we were very aware of
    this use case and that we had some interesting solutions in the works.
    Other customers have also let us know that they want to customize
    their end user experience by making quick decisions out at the edge.

    This feature is currently in preview mode (Dec 2016), but you can request access from AWS to experiment with it.

  3. Here’s a solution that uses (and is approved by) prerender.cloud: https://github.com/sanfrancesco/prerendercloud-lambda-edge

    This uses Lambda@Edge to prerender your app via a make deploy command.

    Taken from the repo’s README:

    Server-side rendering (pre-rendering) via Lambda@Edge for single-page apps hosted on CloudFront with an s3 origin.

    This is a serverless project with a make deploy command that:

    1. serverless.yml deploys 3 functions to Lambda (viewerRequest, originRequest, originResponse)
    2. deploy.js associates them with your CloudFront distribution
    3. create-invalidation.js clears/invalidates your CloudFront cache
  4. There are actually a couple of options, most of which require CloudFront and Lambda@Edge. One possible approach is to add logic to your Lambda@Edge function that checks the request’s ‘user-agent’ header to differentiate between requests from crawlers and regular users. If the request comes from a crawler, you can return a crawler-friendly response with meta tags optimized for it.
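    A sketch of that check in a Lambda@Edge viewer-request handler (the bot list and the `X-Prerender` header name are illustrative assumptions, not a fixed convention):

    ```javascript
    // Sketch: Lambda@Edge viewer-request handler that flags crawler
    // requests by User-Agent. The pattern below is illustrative and
    // far from exhaustive.
    var BOT_PATTERN = /facebookexternalhit|twitterbot|googlebot|linkedinbot/i;

    function isCrawler(userAgent) {
      return BOT_PATTERN.test(userAgent || '');
    }

    exports.handler = function (event, context, callback) {
      var request = event.Records[0].cf.request;
      // CloudFront events expose headers as arrays of {key, value} pairs.
      var uaHeader = request.headers['user-agent'];
      var ua = uaHeader && uaHeader[0] ? uaHeader[0].value : '';
      if (isCrawler(ua)) {
        // Tag the request so a later stage (e.g. an origin-request
        // function) can serve prerendered HTML instead of the SPA shell.
        request.headers['x-prerender'] = [{ key: 'X-Prerender', value: 'true' }];
      }
      callback(null, request);
    };
    ```

    Regular users fall through untouched, so only crawler traffic pays the prerendering cost.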

    This will definitely require some extra work, and it means a Lambda@Edge execution with almost every request. I hope AWS gives us an option to differentiate based on headers in the future.
