I want to improve SEO (i.e., have search engines correctly index my pages) in a serverless architecture, with my website hosted on AWS S3.
As I’m using a JavaScript approach to routing (something akin to Angular, but simpler) and fetching dynamic content to fill the meta tags, I’m finding everything quite troublesome for scrapers without JavaScript support, like Facebook’s.
I already have default meta tags inserted and those are, of course, loaded just fine, but I need the updated ones.
I know most people use pre-rendering on a server or through something like Prerender.io, but I really wanted to find an alternative that makes sense for a serverless approach.
I thought I had it figured out, since Open Graph meta tags allow for a “pointer” URL where you can serve a “metatags-only” HTML page if needed. So I was thinking of using a Lambda function to generate the HTML response with the right meta tags on a GET request. The problem: since the Facebook scraper has no JavaScript support, how can I send it the dynamic content on that GET request?
4 Answers
If you are using S3, you must prerender the pages before uploading them. You can’t call a Lambda function on the fly, because the crawler will not execute the JavaScript that would trigger it, and you can’t use Prerender.io with S3 either.
Suggestion:
* Prerender each page of your site and upload the resulting HTML files to S3, mirroring your routes in the bucket’s folder structure.
* E.g.: the address example.com/about/us must be mapped to a us.html file inside an about folder in your bucket root.
Now, your users and the crawlers will see exactly the same pages, without needing JavaScript to load the initial state. The difference is that with JavaScript enabled, your framework (Angular?) will load the JS dependencies (routes, services, etc.) and take control like a normal SPA. When the user clicks to browse to another page, the SPA will reload the inner content without making a full page reload.
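A minimal sketch of that upload step, assuming your build has already written the prerendered HTML into a local dist/ folder; the bucket name and folder layout below are placeholders, not part of the original answer:

```js
// Upload prerendered HTML files to S3 so that dist/about/us.html
// becomes the object "about/us.html" in the bucket root (aws-sdk v2).
const AWS = require('aws-sdk');
const fs = require('fs');
const path = require('path');

const s3 = new AWS.S3();
const BUCKET = 'example.com'; // placeholder bucket name

function uploadDir(dir, prefix) {
  for (const entry of fs.readdirSync(dir)) {
    const full = path.join(dir, entry);
    const key = prefix ? prefix + '/' + entry : entry;
    if (fs.statSync(full).isDirectory()) {
      uploadDir(full, key);
    } else if (entry.endsWith('.html')) {
      s3.putObject({
        Bucket: BUCKET,
        Key: key, // e.g. "about/us.html"
        Body: fs.readFileSync(full),
        ContentType: 'text/html',
      }).promise().then(() => console.log('uploaded', key));
    }
  }
}

uploadDir('dist', '');
```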
Pros:
* The crawler receives complete HTML, with the right meta tags, without executing any JavaScript, and the site stays purely static (no servers).
Cons:
* The crawler will see the old (prerendered) content, but the user will probably see the current content, as the SPA framework takes control of the page and loads the inner content again.
You said that you are using S3. If you want to prerender on the fly, you can’t use S3. You need to use the following:
Route 53 => CloudFront => API Gateway => Lambda
Configure:
– Set the API Gateway endpoint as the CloudFront origin.
– Use “HTTPS Only” for the “Origin Protocol Policy” (CloudFront).
– The Lambda function must use proxy integration, so it receives the full request and returns the complete HTTP response.
In this case, your Lambda function will know the requested address and will be able to correctly render the requested HTML page.
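A minimal sketch of such a proxy function in Node.js; the getPageMeta() lookup and the example.com URLs are hypothetical placeholders for wherever your dynamic metadata actually lives:

```js
// API Gateway proxy integration: the handler receives the requested path
// and returns a full HTML document with the right meta tags already in place.
exports.handler = async (event) => {
  const path = event.path || '/';
  const meta = getPageMeta(path); // hypothetical lookup (DynamoDB, S3, a hard-coded map, ...)

  const html = `<!DOCTYPE html>
<html>
  <head>
    <title>${meta.title}</title>
    <meta property="og:title" content="${meta.title}" />
    <meta property="og:description" content="${meta.description}" />
    <meta property="og:image" content="${meta.image}" />
    <meta property="og:url" content="https://example.com${path}" />
  </head>
  <body>
    <div id="app"></div>
    <script src="/app.js"></script>
  </body>
</html>`;

  return {
    statusCode: 200,
    headers: { 'Content-Type': 'text/html' },
    body: html,
  };
};

// Hypothetical metadata source, hard-coded here just for the example.
function getPageMeta(path) {
  const pages = {
    '/about/us': { title: 'About us', description: 'Who we are', image: 'https://example.com/us.jpg' },
  };
  return pages[path] || { title: 'Home', description: 'Default description', image: 'https://example.com/default.jpg' };
}
```

This way the crawler gets the final meta tags in the initial GET response, while the same HTML can still bootstrap your SPA for real users.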
Pros:
* The HTML is rendered on demand, so the crawler always sees up-to-date content and meta tags.
Cons:
* Every page view goes through API Gateway and Lambda, which adds latency and cost compared with serving static files straight from S3.
If you are willing to use CloudFront on top of your S3 bucket, there is a new possibility for prerendering on the fly. Lambda@Edge is a new feature that allows code to be executed with low latency when a page is requested. With it, you can check whether the requester is a crawler and prerender the page for it.
01 Dec 2016 announcement: Lambda@Edge – Preview
(…)
This feature is currently in preview mode (Dec 2016), but you can ask AWS for access to experiment with it.
Here’s a solution that uses (and is approved by) prerender.cloud: https://github.com/sanfrancesco/prerendercloud-lambda-edge
This uses Lambda@Edge to prerender your app via a make deploy command (as described in the repo’s README).
There are actually a couple of options, and most of them will require CloudFront and Lambda@Edge. One possible way is to add some logic to your Lambda@Edge function that checks the user-agent header of the request to differentiate between crawlers and regular users. If the request comes from a crawler, you can serve a crawler-friendly response with meta tags optimized for it.
This will definitely require some extra work, and it means a Lambda@Edge execution on almost every request. I hope AWS gives us an option to differentiate based on headers in the future.
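As a rough sketch of that user-agent check, a Lambda@Edge viewer-request handler could look like the following; the bot list and the /prerendered/ path prefix are assumptions for illustration, not something the answer specifies:

```js
'use strict';

// Matches common social/search crawlers; extend the list as needed.
const BOT_AGENTS = /facebookexternalhit|twitterbot|linkedinbot|slackbot|googlebot/i;

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const userAgentHeader = request.headers['user-agent'];
  const userAgent = userAgentHeader ? userAgentHeader[0].value : '';

  if (BOT_AGENTS.test(userAgent)) {
    // Rewrite the URI so CloudFront serves a static, prerendered copy
    // instead of the SPA shell, e.g. /about/us -> /prerendered/about/us.html
    const page = request.uri === '/' ? '/index' : request.uri;
    request.uri = '/prerendered' + page + '.html';
  }

  callback(null, request);
};
```

Regular users still get the normal SPA shell; only crawler requests are rewritten, though, as noted above, the function itself runs on every request.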