I’m new to Kubernetes and to supporting a particular website hosted in Kubernetes. I’m trying to figure out why cert-manager did not renew the certificate in the QA environment a few weeks back.

Looking at the details of various certificate-related resources, the problem seems to be that the challenge failed:

State: invalid, Reason: Error accepting authorization: acme: authorization error for [DOMAIN]: 400 urn:ietf:params:acme:error:connection: Fetching http://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING]: Timeout during connect (likely firewall problem)

I assume that error means Let’s Encrypt wasn’t able to access the challenge file at http://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING]

(Domain and challenge token string redacted)

I’ve tried connecting to the URL via PowerShell:

PS C:\Users\Simon> invoke-webrequest -uri http://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING] -SkipCertificateCheck

and it returns a 200 OK.

However, PowerShell follows redirects automatically, and checking with Wireshark shows that the Nginx web server is performing a 308 permanent redirect to https://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING]

(same URL but just redirecting HTTP to HTTPS)

I understand that Let’s Encrypt should be able to handle HTTP to HTTPS redirects.
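
Because PowerShell follows redirects by default, the 200 OK above is the final response, not the first one. A minimal sketch for looking at the first hop only, assuming a shell with curl installed (the function name first_hop and the 10-second limit are illustrative choices, not anything defined by cert-manager or Let's Encrypt):

```shell
# Print the status code and redirect target of the FIRST response only.
# curl does not follow redirects unless -L is given, so a 308 stays visible.
# first_hop is a hypothetical helper name; --max-time 10 is an arbitrary limit.
first_hop() {
  url="$1"
  curl --silent --output /dev/null --max-time 10 \
       --write-out '%{http_code} %{redirect_url}\n' "$url"
}
```

Calling it as first_hop "http://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING]" should print the status code followed by the redirect target, if there is one. Note that this only shows what the server returns to the machine running the command; it says nothing about whether Let's Encrypt's validation servers could connect at the time of the failure.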

Given that the URL Let’s Encrypt was trying to reach is accessible from the internet I’m at a loss as to what the next step should be in investigating this issue. Could anyone provide any advice?

Here is the full output of the kubectl cert-manager plugin, checking the status of the certificate and associated resources:

PS C:\Users\Simon> kubectl cert-manager status certificate -n qa containers-tls-secret

Name: containers-tls-secret
Namespace: qa
Created at: 2020-10-16T08:40:14+13:00
Conditions:
  Ready: False, Reason: Expired, Message: Certificate expired on Sun, 14 Mar 2021 17:41:12 UTC
  Issuing: False, Reason: Failed, Message: The certificate request has failed to complete and will be retried: Failed to wait for order resource "containers-tls-secret-q2cwr-3223066309" to become ready: order is in "invalid" state:
DNS Names:
- [DOMAIN]
Events:
  Type     Reason   Age                 From          Message
  ----     ------   ----                ----          -------
  Normal   Issuing  31s (x236 over 9d)  cert-manager  Renewing certificate as renewal was scheduled at 2021-02-12 17:41:12 +0000 UTC
  Normal   Reused   31s (x236 over 9d)  cert-manager  Reusing private key stored in existing Secret resource "containers-tls-secret"
  Warning  Failed   31s (x236 over 9d)  cert-manager  The certificate request has failed to complete and will be retried: Failed to wait for order resource "containers-tls-secret-q2cwr-3223066309" to become ready: order is in "invalid" state:
Issuer:
  Name: letsencrypt
  Kind: ClusterIssuer
  Conditions:
    Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
  Events:  <none>
Secret:
  Name: containers-tls-secret
  Issuer Country: US
  Issuer Organisation: Let's Encrypt
  Issuer Common Name: R3
  Key Usage: Digital Signature, Key Encipherment
  Extended Key Usages: Server Authentication, Client Authentication
  Public Key Algorithm: RSA
  Signature Algorithm: SHA256-RSA
  Subject Key ID: dadf29869b58d05e980c390fdc8783f52369228d
  Authority Key ID: 142eb317b75856cbae500940e61faf9d8b14c2c6
  Serial Number: 04f7356add94a7909afab94f0847a3457765
  Events:  <none>
Not Before: 2020-12-15T06:41:12+13:00
Not After: 2021-03-15T06:41:12+13:00
Renewal Time: 2021-02-13T06:41:12+13:00
CertificateRequest:
  Name: containers-tls-secret-q2cwr
  Namespace: qa
  Conditions:
    Ready: False, Reason: Failed, Message: Failed to wait for order resource "containers-tls-secret-q2cwr-3223066309" to become ready: order is in "invalid" state:
  Events:  <none>
Order:
  Name: containers-tls-secret-q2cwr-3223066309
  State: invalid, Reason:
  Authorizations:
    URL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/10810339315, Identifier: [DOMAIN], Initial State: pending, Wildcard: false
  FailureTime: 2021-02-13T06:41:59+13:00
Challenges:
- Name: containers-tls-secret-q2cwr-3223066309-2302286353, Type: HTTP-01, Token: [CHALLENGE TOKEN STRING], Key: [CHALLENGE TOKEN STRING].8b00cc-ysOWGQ8vtmpOJobWOFa2cEQUe4Sun5NUKCws, State: invalid, Reason: Error accepting authorization: acme: authorization error for [DOMAIN]: 400 urn:ietf:params:acme:error:connection: Fetching http://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING]: Timeout during connect (likely firewall problem), Processing: false, Presented: false

By the way, the invoke-webrequest results show an HTML page was returned:

<!doctype html><html lang="en"><head><meta charset="utf-8"><title>Containers</title><base href="./"><meta name="viewport" content="width=device-width,initial-scale=1"><link rel="icon" href="favicon.ico…

Could that be the issue? I don’t know what Let’s Encrypt expects to find at the URL of the HTTP01 challenge. Is a web page allowed or is it expecting something different?

EDIT: I now suspect the HTML page returned by invoke-webrequest is not normal, since I understand the file should include the Let’s Encrypt token and a key. Here is the full HTML page:

<!doctype html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>Wineworks</title>
        <base href="./">
        <meta name="viewport" content="width=device-width,initial-scale=1">
        <link rel="icon" href="favicon.ico">
        <link rel="apple-touch-icon-precomposed" href="favicon-152.png">
        <meta name="msapplication-TileColor" content="#FFFFFF">
        <meta name="msapplication-TileImage" content="favicon-152.png">
        <script src="https://secure.aadcdn.microsoftonline-p.com/lib/1.0.16/js/adal.min.js"/>
        <link href="styles.025a840d59ecfcfe427e.bundle.css" rel="stylesheet"/>
    </head>
    <body>
        <app-root/>
        <script type="text/javascript" src="inline.ce954cfcbe723b5986e6.bundle.js"/>
        <script type="text/javascript" src="polyfills.7edc676f7558876c179d.bundle.js"/>
        <script type="text/javascript" src="main.da3590aac44ee76e7b3a.bundle.js"/>
    </body>
</html>

Any idea what might cause cert-manager to drop the wrong kind of file at the challenge location?
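
For comparison, the body of a valid HTTP-01 response is the key authorization: the challenge token, a dot, and the base64url-encoded SHA-256 thumbprint of the ACME account key (RFC 8555, section 8.3). That matches the shape of the Key field in the status output above. A rough sketch that checks whether a response body has this shape, assuming a POSIX shell (looks_like_key_auth is a made-up helper name, and the check is on format only, not on whether the thumbprint is the right one):

```shell
# Return 0 if BODY looks like "<token>.<43 base64url characters>", else 1.
# looks_like_key_auth is a hypothetical helper; it validates shape only.
looks_like_key_auth() {
  token="$1"
  body="$2"
  # The body must start with the token followed by a dot.
  case "$body" in
    "$token".*) ;;
    *) return 1 ;;
  esac
  thumb="${body#"$token".}"
  # A SHA-256 thumbprint is 32 bytes, i.e. 43 base64url characters unpadded.
  [ "${#thumb}" -eq 43 ] || return 1
  # Reject anything outside the base64url alphabet.
  case "$thumb" in
    *[!A-Za-z0-9_-]*) return 1 ;;
  esac
  return 0
}
```

An HTML page like the one above fails this check at the first step. One possible explanation, offered as an assumption rather than a diagnosis: cert-manager serves the key authorization from a temporary solver pod that exists only while a challenge is in progress, so once the challenge has finished (here it is already "invalid" with Presented: false), a request to the same path may simply fall through to the application, which returns its normal index page.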

2 Answers


  1. Chosen as BEST ANSWER

    In the end I was unable to determine the cause of the certificate renewal failure. However, events on one of the certificate-related resources suggested that previous renewals had worked, so I thought the problem might have been transient or a one-off, and that trying the renewal again might work.

    Reading various articles and blog posts it appeared that deleting the CertificateRequest object would prompt cert-manager to create a new one, which should result in a certificate renewal. Also, deleting the CertificateRequest object would automatically delete the associated ACME Order and Challenge objects as well, so it wouldn't be necessary to delete them manually.
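
The relationship between these objects can be inspected before deleting anything. A small sketch, assuming kubectl is configured for the cluster and that the cascade described above is implemented through ordinary Kubernetes owner references (show_acme_chain and owner_of are hypothetical helper names):

```shell
# List the CertificateRequests, Orders and Challenges in one namespace.
# show_acme_chain is a hypothetical helper name.
show_acme_chain() {
  ns="$1"
  kubectl get certificaterequests,orders,challenges -n "$ns"
}

# Print "<Kind>/<name>" of the first owner reference of an object,
# e.g. the CertificateRequest that owns an Order.
# owner_of is a hypothetical helper name.
owner_of() {
  ns="$1"; kind="$2"; name="$3"
  kubectl get "$kind" "$name" -n "$ns" \
    -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'
}
```

For example, owner_of qa order containers-tls-secret-q2cwr-3223066309 would be expected to name the CertificateRequest containers-tls-secret-q2cwr from the status output in the question, if the owner-reference assumption holds.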

    Deleting the CertificateRequest object did work: the certificate was renewed successfully. However, it wasn't renewed straight away. Further reading suggests the renewal may take an hour (I didn't check how long it actually took, so I can't verify this).

    To delete a CertificateRequest:

    kubectl delete certificaterequest <certificateRequest name> -n <namespace>
    

    For example:

    kubectl delete certificaterequest my-certificate-zrt6p -n qa
    

    If you wish to force an immediate renewal rather than waiting an hour, delete the CertificateRequest object, wait for cert-manager to create a new one, and then run the following kubectl command (this requires the kubectl cert-manager plugin):

    kubectl cert-manager renew <certificate name>
    

    For example, to renew certificate my-certificate in namespace qa:

    kubectl cert-manager renew my-certificate -n qa
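
To avoid polling by hand after triggering the renewal, the renew command can be followed by kubectl wait on the Certificate's Ready condition. A sketch under the assumption that the plugin is installed and the certificate is currently not Ready (renew_and_wait is a hypothetical helper name and the 300s default timeout is arbitrary):

```shell
# Trigger a manual renewal, then block until the Certificate reports Ready
# or the timeout expires. renew_and_wait is a hypothetical helper name.
renew_and_wait() {
  ns="$1"; cert="$2"; timeout="${3:-300s}"
  kubectl cert-manager renew "$cert" -n "$ns" || return 1
  kubectl wait --for=condition=Ready "certificate/$cert" \
    -n "$ns" --timeout="$timeout"
}
```

For example: renew_and_wait qa my-certificate 600s. If the certificate is still valid and already Ready, the wait returns immediately and does not prove that a new certificate was issued; in that case compare the Not After date before and after.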
    

    NOTE: The easiest way to install the kubectl cert-manager plugin is via the Krew plugin manager:

    kubectl krew install cert-manager
    

    See https://krew.sigs.k8s.io/docs/user-guide/setup/install/ for details of how to install Krew (which is useful for all kubectl plugins, not just cert-manager).

    One further thing I found from researching this is that sometimes the old certificate secret can get "stuck", preventing a new secret from being created. You can delete the certificate secret to avoid this problem. For example:

    kubectl delete secret my-certificate -n qa
    

    I assume, however, that without a certificate secret your website will have no certificate, which may prevent browsers from accessing it. So I would only delete the existing secret as a last resort.
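
If deleting the secret does become necessary, keeping a copy first leaves a way back. A minimal sketch, assuming kubectl access to the namespace (backup_secret is a hypothetical helper name):

```shell
# Save the current secret as YAML before deleting it.
# backup_secret is a hypothetical helper name; the output file contains the
# private key, so it should be stored and removed with care.
backup_secret() {
  ns="$1"; name="$2"; file="$3"
  kubectl get secret "$name" -n "$ns" -o yaml > "$file"
}
```

For example: backup_secret qa my-certificate my-certificate-backup.yaml. The saved YAML includes server-populated metadata such as resourceVersion and uid, which will probably need to be removed before the object can be re-created from the file.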


  2. Maybe this will help someone in the future. In my case the cause of this problem was a misleading wildcard (*) IPv6 DNS record. Let's Encrypt checks both the IPv4 and the IPv6 records.

    Therefore the solution was to remove the ipv6 record.
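
A quick way to see whether a name resolves over IPv6 at all is to query its AAAA records. A sketch assuming dig is installed (has_ipv6_record is a hypothetical helper name); a wildcard record answers for the queried name, so it shows up here as well:

```shell
# Return 0 if the domain has at least one AAAA (IPv6) answer, else 1.
# has_ipv6_record is a hypothetical helper name.
has_ipv6_record() {
  domain="$1"
  answer=$(dig +short AAAA "$domain")
  [ -n "$answer" ]
}
```

If it succeeds, running curl -6 against the challenge URL shows whether the site is actually reachable over that address; a timeout there would be consistent with the "Timeout during connect" error in the question.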
