We have a dozen of services exposed using a ingress-nginx controller in GKE.
In order to route the traffic correctly on the same domain name, we need to use a rewrite-target rule.
The services worked well without any maintenance since their launch in 2019, that is until recently; when cert-manager suddenly stopped renewing the Let’s Encrypt certificates, we "resolved" this by temporarily removing the "tls" section from the ingress definition, forcing our clients to use the http version.
After that we removed all traces of cert-manager attempting to set it up from scratch.
Now, the cert-manager is creating the certificate signing request, spawns an acme http solver pod and adds it to the ingress, however upon accessing its url I can see that it returns an empty response, and not the expected token.
This has to do with the rewrite-target annotation that messes up the routing of the acme challenge.
What puzzles me the most, is that this used to work before. (It was set up by a former employee)
Disabling rewrite-target is unfortunately not an option, because it will stop the routing from working correctly.
Using dns01 won’t work because our ISP does not support programmatic changes of the DNS records.
Is there a way to make this work without disabling rewrite-target?
P.S.
Here’s a number of similar cases reported on Github:
- https://github.com/cert-manager/cert-manager/issues/2826
- https://github.com/cert-manager/cert-manager/issues/286
- https://github.com/cert-manager/cert-manager/issues/487
None of them help.
Here’s the definition of my ClusterIssuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
# The ACME server URL
server: https://acme-v02.api.letsencrypt.org/directory
# Email address used for ACME registration
email: [email protected]
# Name of a secret used to store the ACME account private key
privateKeySecretRef:
name: letsencrypt-prod
# Enable the HTTP-01 challenge provider
solvers:
- http01:
ingress:
class: nginx
2
Answers
Please share the cluster issuer or issue you are using.
ingressClass
Ref : https://cert-manager.io/v0.12-docs/configuration/acme/http01/#ingressclass
Mostly we don’t see the HTTP solver challenge it comes and get removed if DNS or HTTP working fine.
Also, make sure your ingress doesn’t have SSL-redirect annotation that could be also once reason behind certs not getting generated.
Did you try checking the other object of cert-manager like order and certificate status request ?
kubectl describe challenge
are you getting 404 there ?If you are trying continuously there could be chance you hit rate limit of let’s encrypt to request generating certificates.
Troubleshooting : https://cert-manager.io/docs/faq/troubleshooting/#troubleshooting-a-failed-certificate-request
When you configure an Issuer with
http01
, the default serviceType isNodePort
. This means, it won’t even go through the ingress controller. From the docs:I’m not sure how the rest of your setup looks like, but
http01
cause the acme server to make HTTP requests (not https). You need to make sure your nginx has listener for http (80). It does follow redirects, so you can listen on http and redirect all traffic to https, this is legit and working.The cert-manager creates an
ingress
resource for validation. It directs traffic to the temporary pod. This ingress has it’s own set of rules, and you can control it using this setting. You can try and disable or modify the rewrite-targets on this resource.Another thing I would try is to access this URL from inside the cluster (bypassing the ingress nginx). If it works directly, then it’s an ingress / networking problem, otherwise it’s something else.
Please share the relevant nginx and cert-manager logs, it might be useful for debugging or understanding where your problem exist.