
I have an AWS REST API Gateway configured with mTLS and a custom domain. This works like a charm. Now I want to use AWS Route53 health checks that hit the /health endpoint of my gateway. I don’t know how to make the health checks present a client certificate, so because the gateway is behind mTLS my configured health checks always fail.

I tried enabling the default endpoint on my gateway just to expose that /health endpoint. It works, but I can’t go with that solution: exposing the default endpoint basically opens my entire API to unauthenticated traffic.

I also tried to make a non-mTLS /health endpoint in my gateway, but it seems that mTLS is a per-gateway (per custom domain) setting and cannot be disabled for just one endpoint.

Any thoughts on how I can solve this? Thanks in advance.

2

Answers


  1. Chosen as BEST ANSWER

    @Mark B, I managed to go the script+mTLS way, but I decided to do it a little differently. Let me give you some broader context so we are on the same page. My primary region is us-west-2 and my secondary is us-east-2. In the primary region I have an EKS cluster. I realized I can create, for example, a CronJob that has the TLS certs mounted via secrets; all it does is call the /health endpoints of my primary and secondary API Gateways every 5 minutes. If the response from /health is non-2xx, I put a 0 into a specific CloudWatch metric. If the response is 2xx, I put a 1 into the metric. A CloudWatch alarm then watches that metric and triggers based on the values in the metric stream. That alarm is in turn hooked up to the R53 health check.
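For readers who want to try the same approach, the probe described above could be sketched roughly as follows. This is a minimal sketch, not the code from this answer: the function names, the `ApiHost` dimension, and the namespace and metric name passed in by the caller are all hypothetical, and it assumes `boto3` is installed and the client certificate and key are available as PEM files (for example, mounted from a Kubernetes secret).

```python
import http.client
import ssl


def metric_value_for_status(status):
    """Map a probe result to the metric value: 1 for any 2xx, otherwise 0."""
    return 1 if status is not None and 200 <= status < 300 else 0


def probe_health(host, cert_file, key_file, path="/health", timeout=5.0):
    """GET https://<host><path> presenting a client certificate (mTLS).

    Returns the HTTP status code, or None if the request failed entirely
    (DNS error, TLS handshake failure, timeout, ...).
    """
    context = ssl.create_default_context()
    # The client certificate and key are read from PEM files on disk.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    conn = http.client.HTTPSConnection(host, timeout=timeout, context=context)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def publish_health_metric(namespace, metric_name, host, region, value):
    """Write one data point to CloudWatch; names are chosen by the caller."""
    import boto3  # third-party dependency, imported lazily (assumed installed)

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[
            {
                "MetricName": metric_name,
                # "ApiHost" is a hypothetical dimension name for this sketch.
                "Dimensions": [{"Name": "ApiHost", "Value": host}],
                "Value": float(value),
                "Unit": "Count",
            }
        ],
    )


def run_once(hosts, cert_file, key_file, namespace, metric_name, region):
    """Probe every host once and publish a 1 or 0 for each."""
    for host in hosts:
        status = probe_health(host, cert_file, key_file)
        value = metric_value_for_status(status)
        publish_health_metric(namespace, metric_name, host, region, value)
```

A CronJob container would call `run_once` with the two gateway hostnames and the paths of the mounted certificate files. A failed connection is treated the same as a non-2xx response, so the metric drops to 0 in both cases.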

    I coded the entire solution, only to realize later that I had hit a chicken-and-egg problem:

    1. APIGW is healthy.
    2. The CronJob makes a curl GET /health call to this APIGW and sees that all is OK.
    3. We put a 1 into the CloudWatch metric and the alarm does not trigger.
    4. Now the APIGW somehow breaks and returns 5xx all the time.
    5. The CronJob makes a curl GET /health call to this APIGW and gets a 5xx because the APIGW is down.
    6. We put a 0 into the CloudWatch metric and the alarm triggers.
    7. Because the alarm has triggered, DNS failover stops passing traffic to the dead APIGW and forwards all traffic to the backup region instead.
    8. A couple of minutes later, the APIGW is up again.
    9. The CronJob makes a curl GET /health call to this APIGW, but this time, even though the APIGW is healthy, the call can’t reach it because R53 failover still thinks this API is down. So we can basically never send a healthy 1 to CloudWatch again.

    Is this really an unsolvable problem? I believe using CloudWatch Synthetics would end up with the same issue.


  2. It doesn’t look possible to use mTLS with Route53 Health Checks. You could instead use Amazon CloudWatch Synthetics to perform your health checks, which is more flexible because it allows you to specify your own script to perform the health check. The runtime environment for CloudWatch Synthetics is a bit limited, but you can embed your mTLS certificate as a PEM format string in your health check script.
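As a rough illustration of the embedded-PEM idea, here is a minimal sketch in plain Python. It assumes a Python runtime; `write_private_temp` and `mtls_context_from_pem` are hypothetical helper names, and how the script is wired into a Synthetics canary is not shown. Python's `ssl.SSLContext.load_cert_chain` takes file paths rather than strings, so the embedded PEM text is written to temporary files first.

```python
import os
import ssl
import tempfile


def write_private_temp(pem_text):
    """Write PEM text to a temp file (created owner-only by mkstemp)."""
    fd, path = tempfile.mkstemp(suffix=".pem")
    with os.fdopen(fd, "w") as handle:
        handle.write(pem_text)
    return path


def mtls_context_from_pem(cert_pem, key_pem):
    """Build a client-side SSLContext from PEM strings embedded in the script."""
    cert_path = write_private_temp(cert_pem)
    key_path = write_private_temp(key_pem)
    try:
        context = ssl.create_default_context()
        # The files are read here, so they can be deleted right afterwards.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
        return context
    finally:
        os.remove(cert_path)
        os.remove(key_path)
```

The returned context can be passed to `http.client.HTTPSConnection(host, context=context)` to call the /health endpoint. Keeping a private key inline in a script is a trade-off; loading it from a secrets store at run time would be an alternative if the runtime allows it.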

    Otherwise, you might consider performing health checks against the underlying service(s) that your API Gateway sends traffic to, instead of performing health checks against API Gateway itself.
