We upgraded our Kubernetes cluster from v1.21 to v1.22. After this operation we discovered that the pods of our nginx-ingress-controller deployment are failing to start with the following error message:
```
pkg/mod/k8s.io/client-go@.../tools/cache/reflector.go:125: Failed to list *v1beta1.Ingress: the server could not find the requested resource
```
We found out that this issue is tracked here: https://github.com/bitnami/charts/issues/7264
Because Azure doesn't allow downgrading the cluster back to 1.21, could you please help us fix the nginx-ingress-controller deployment? Could you please be specific about what should be done and from where (local machine, Azure CLI, etc.), as we are not very familiar with Helm.
This is our deployment's current YAML:
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx-ingress-controller
  namespace: ingress
  uid: 575c7699-1fd5-413e-a81d-b183f8822324
  resourceVersion: '166482672'
  generation: 16
  creationTimestamp: '2020-10-10T10:20:07Z'
  labels:
    app: nginx-ingress
    app.kubernetes.io/component: controller
    app.kubernetes.io/managed-by: Helm
    chart: nginx-ingress-1.41.1
    heritage: Helm
    release: nginx-ingress
  annotations:
    deployment.kubernetes.io/revision: '2'
    meta.helm.sh/release-name: nginx-ingress
    meta.helm.sh/release-namespace: ingress
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:replicas: {}
      subresource: scale
    - manager: Go-http-client
      operation: Update
      apiVersion: apps/v1
      time: '2020-10-10T10:20:07Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app: {}
            f:app.kubernetes.io/component: {}
            f:app.kubernetes.io/managed-by: {}
            f:chart: {}
            f:heritage: {}
            f:release: {}
        f:spec:
          f:progressDeadlineSeconds: {}
          f:revisionHistoryLimit: {}
          f:selector: {}
          f:strategy:
            f:rollingUpdate:
              .: {}
              f:maxSurge: {}
              f:maxUnavailable: {}
            f:type: {}
          f:template:
            f:metadata:
              f:labels:
                .: {}
                f:app: {}
                f:app.kubernetes.io/component: {}
                f:component: {}
                f:release: {}
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  .: {}
                  f:args: {}
                  f:env:
                    .: {}
                    k:{"name":"POD_NAME"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                    k:{"name":"POD_NAMESPACE"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:fieldRef: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:livenessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:name: {}
                  f:ports:
                    .: {}
                    k:{"containerPort":80,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                    k:{"containerPort":443,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:name: {}
                      f:protocol: {}
                  f:readinessProbe:
                    .: {}
                    f:failureThreshold: {}
                    f:httpGet:
                      .: {}
                      f:path: {}
                      f:port: {}
                      f:scheme: {}
                    f:initialDelaySeconds: {}
                    f:periodSeconds: {}
                    f:successThreshold: {}
                    f:timeoutSeconds: {}
                  f:resources:
                    .: {}
                    f:limits: {}
                    f:requests: {}
                  f:securityContext:
                    .: {}
                    f:allowPrivilegeEscalation: {}
                    f:capabilities:
                      .: {}
                      f:add: {}
                      f:drop: {}
                    f:runAsUser: {}
                  f:terminationMessagePath: {}
                  f:terminationMessagePolicy: {}
              f:dnsPolicy: {}
              f:restartPolicy: {}
              f:schedulerName: {}
              f:securityContext: {}
              f:serviceAccount: {}
              f:serviceAccountName: {}
              f:terminationGracePeriodSeconds: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-24T01:23:22Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:type: {}
            k:{"type":"Progressing"}:
              .: {}
              f:type: {}
    - manager: Mozilla
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:18:41Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:template:
            f:spec:
              f:containers:
                k:{"name":"nginx-ingress-controller"}:
                  f:resources:
                    f:limits:
                      f:cpu: {}
                      f:memory: {}
                    f:requests:
                      f:cpu: {}
                      f:memory: {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2022-01-28T23:29:49Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:deployment.kubernetes.io/revision: {}
        f:status:
          f:conditions:
            k:{"type":"Available"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
            k:{"type":"Progressing"}:
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
          f:observedGeneration: {}
          f:replicas: {}
          f:unavailableReplicas: {}
          f:updatedReplicas: {}
      subresource: status
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-ingress
      app.kubernetes.io/component: controller
      release: nginx-ingress
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx-ingress
        app.kubernetes.io/component: controller
        component: controller
        release: nginx-ingress
    spec:
      containers:
        - name: nginx-ingress-controller
          image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1
          args:
            - /nginx-ingress-controller
            - '--default-backend-service=ingress/nginx-ingress-default-backend'
            - '--election-id=ingress-controller-leader'
            - '--ingress-class=nginx'
            - '--configmap=ingress/nginx-ingress-controller'
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          resources:
            limits:
              cpu: 300m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 256Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            runAsUser: 101
            allowPrivilegeEscalation: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 60
      dnsPolicy: ClusterFirst
      serviceAccountName: nginx-ingress
      serviceAccount: nginx-ingress
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 16
  replicas: 3
  updatedReplicas: 2
  unavailableReplicas: 3
  conditions:
    - type: Available
      status: 'False'
      lastUpdateTime: '2022-01-28T22:58:07Z'
      lastTransitionTime: '2022-01-28T22:58:07Z'
      reason: MinimumReplicasUnavailable
      message: Deployment does not have minimum availability.
    - type: Progressing
      status: 'False'
      lastUpdateTime: '2022-01-28T23:29:49Z'
      lastTransitionTime: '2022-01-28T23:29:49Z'
      reason: ProgressDeadlineExceeded
      message: >-
        ReplicaSet "nginx-ingress-controller-59d9f94677" has timed out
        progressing.
```
2 Answers
@Philip Welz's answer is the correct one, of course. It was necessary to upgrade the ingress controller because of the `v1beta1` Ingress API version that was removed in Kubernetes v1.22. But that's not the only problem we faced, so I've decided to make a "very very short" guide of how we finally ended up with a healthy running cluster (5 days later), so it may save someone else the struggle.

1. Upgrading the nginx-ingress-controller version in the YAML file
Here we simply changed the controller image version in the deployment YAML, as sketched below.
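In the deployment above that is this single line; the upstream image path for v1.1.1 below is an assumption based on the official ingress-nginx registry:

```yaml
# from:
image: us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1
# to (assumed upstream path for the v1.1.1 controller):
image: k8s.gcr.io/ingress-nginx/controller:v1.1.1
```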
After this operation, a new pod running v1.1.1 was spawned. It started nicely and was running healthy. Unfortunately, that didn't bring our microservices back online. Now I know it was probably because of changes that had to be made to the existing ingress YAML files to make them compatible with the new version of the ingress controller. So go directly to step 2 now (two headers below).
Don't do this step for now; do it only if step 2 fails for you: Reinstall the nginx-ingress-controller
We decided that in this situation we would reinstall the controller from scratch, following Microsoft's official documentation: https://learn.microsoft.com/en-us/azure/aks/ingress-basic?tabs=azure-cli. Be aware that this will probably change the external IP address of your ingress controller. The easiest way in our case was to just remove the whole `ingress` namespace. Unfortunately, that doesn't remove the ingress class, so an additional command is required.
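A minimal sketch of that cleanup, assuming the names from the deployment above (`ingress` namespace, `nginx` ingress class):

```bash
# Deleting the namespace removes the controller deployment, service, etc.
kubectl delete namespace ingress

# The IngressClass is cluster-scoped, so it survives the namespace deletion
# and has to be deleted separately:
kubectl delete ingressclass nginx
```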
Then install the new controller:
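The installation itself follows the Microsoft tutorial; a sketch, assuming you want to keep the release name and namespace used above (the replica count mirrors the old deployment):

```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm install nginx-ingress ingress-nginx/ingress-nginx \
  --create-namespace \
  --namespace ingress \
  --set controller.replicaCount=2
```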
If you reinstalled the nginx-ingress-controller or the IP address changed after the upgrade in step 1: Update your network security groups, load balancers and domain DNS
In your AKS resource group there should be a resource of type `Network security group`. It contains inbound and outbound security rules (I understand it works as a firewall). There should be a default network security group that is automatically managed by Kubernetes, and the IP address should be automatically refreshed there. Unfortunately, we also had an additional custom one, and we had to update its rules manually.
In the same resource group there should be a resource of type `Load balancer`. In the `Frontend IP configuration` tab, double-check that the IP address reflects your new IP address. As a bonus, you can verify in the `Backend pools` tab that the addresses there match your internal node IPs. Lastly, don't forget to adjust your domain's DNS records.
2. Upgrade your ingress YAML configuration files to match syntax changes
It took us a while to determine a working template, but installing the helloworld application from the Microsoft tutorial mentioned above helped us a lot. We started from this:
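Our original manifest isn't reproduced here, but it was an old-style ingress roughly along these lines; the host, names, and paths are hypothetical stand-ins, and the point is the deprecated `networking.k8s.io/v1beta1` API with the old `serviceName`/`servicePort` backend syntax:

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-app                 # hypothetical name
  namespace: default
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
  rules:
    - host: example.com        # hypothetical host
      http:
        paths:
          - path: /(.*)
            backend:
              serviceName: my-app   # old v1beta1 backend syntax
              servicePort: 80
```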
After introducing changes incrementally, we finally made it to the version shown below. But I'm pretty sure the issue was that we were missing the `nginx.ingress.kubernetes.io/use-regex: 'true'` entry. Just in case someone would like to install the helloworld app for testing purposes, its YAMLs are included after the ingress example below.
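A hedged sketch of the final shape, using the same hypothetical names; note the `networking.k8s.io/v1` API, the `pathType` field, the new `service.name`/`service.port` backend syntax, and the `use-regex` annotation:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
  ingressClassName: nginx
  rules:
    - host: example.com
      http:
        paths:
          - path: /(.*)
            pathType: ImplementationSpecific   # recommended for regex paths
            backend:
              service:
                name: my-app
                port:
                  number: 80
```

And, for the helloworld test app, manifests modelled on the Microsoft tutorial (the `aks-helloworld` image is the one that tutorial uses; treat the rest as an assumption):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-helloworld
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-helloworld
  template:
    metadata:
      labels:
        app: aks-helloworld
    spec:
      containers:
        - name: aks-helloworld
          image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
          ports:
            - containerPort: 80
          env:
            - name: TITLE
              value: Welcome to Azure Kubernetes Service (AKS)
---
apiVersion: v1
kind: Service
metadata:
  name: aks-helloworld
spec:
  type: ClusterIP
  ports:
    - port: 80
  selector:
    app: aks-helloworld
```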
3. Deal with other crashing applications ...
Another application that was crashing in our cluster was `cert-manager`. It was at version 1.0.1, so, first, we upgraded it to version 1.1.1 (the upgrade command is sketched below). That created a brand-new healthy pod. We were happy and decided to stay with v1.1 because we were a bit scared of the additional measures that have to be taken when upgrading to higher versions (check the bottom of this page: https://cert-manager.io/docs/installation/upgrading/).
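A sketch of the upgrade, assuming a default installation with the release named `cert-manager` in the `cert-manager` namespace (depending on how it was installed, you may also need to apply the matching CRDs):

```bash
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm upgrade cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.1.1
```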
The cluster is now finally fixed. It is, right?
4. ... but be sure to check the compatibility charts!
Well... now we know that cert-manager is compatible with Kubernetes v1.22 only starting from version 1.5. We were so unlucky that exactly that night our SSL certificate crossed the 30-day threshold before its expiration date, so cert-manager decided to renew the cert! The operation failed and cert-manager crashed. Kubernetes fell back to the "Kubernetes Fake Certificate", and the web page went down again because browsers killed the traffic over the invalid certificate. The fix was to upgrade to 1.5 and upgrade the CRDs as well:
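A sketch of that upgrade; the exact v1.5.x patch version is an assumption:

```bash
# Upgrade the CRDs first, then the Helm release itself:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.4/cert-manager.crds.yaml

helm upgrade cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.5.4
```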
After this, the new instance of cert-manager refreshed our certificate successfully. Cluster saved again.
In case you need to force the renewal, you can take a look at this issue: https://github.com/jetstack/cert-manager/issues/2641

@ajcann suggests adding a `renewBefore` property to the certificates, waiting for the certificates to renew, and then removing the property again.
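A sketch of forcing the renewal this way, with a hypothetical certificate name; `renewBefore` has to stay below `spec.duration` (2160h by default) but exceed the time remaining until expiry:

```bash
# Trigger an early renewal by making the renewal window cover "now":
kubectl patch certificate my-cert --type merge \
  -p '{"spec":{"renewBefore":"2159h"}}'

# After the certificate has been renewed, remove the property again:
kubectl patch certificate my-cert --type json \
  -p '[{"op": "remove", "path": "/spec/renewBefore"}]'
```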
Kubernetes 1.22 is supported only by NGINX Ingress Controller 1.0.0 and higher; see https://github.com/kubernetes/ingress-nginx#support-versions-table.

You need to upgrade your `nginx-ingress-controller` Bitnami Helm Chart to version 9.0.0 in `Chart.yaml`, then run `helm upgrade nginx-ingress-controller bitnami/nginx-ingress-controller`.

You should also update your ingress controller regularly, especially since version v0.34.1 is very, very old, because the ingress is normally the only entry point into your cluster from outside.
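A sketch of that upgrade, assuming the chart was installed directly as a release named `nginx-ingress-controller` in the `ingress` namespace (if you instead pin it as a dependency in your own `Chart.yaml`, bump the dependency's `version` to 9.0.0 there first):

```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

helm upgrade nginx-ingress-controller bitnami/nginx-ingress-controller \
  --namespace ingress \
  --version 9.0.0
```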