This deployment has been running fine for months. It looks like the Pods were redeployed early this morning, probably related to applying the 2023.10.31 update (AKSSecurityPatchedVHD).
The Pods that mount Azure Files for file storage are stuck in ContainerCreating
with the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m46s default-scheduler Successfully assigned env/<deployment> to <aks-node>
Warning FailedMount 3m45s (x2 over 3m46s) kubelet MountVolume.MountDevice failed for volume "<pvc>" : rpc error: code = Internal desc = volume(<resource-group>) mount //<stuff>.file.core.windows.net/<pvc> on /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/<stuff>/globalmount failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o mfsymlinks,actimeo=30,nosharesock,file_mode=0777,dir_mode=0777,<masked> //<stuff>.file.core.windows.net/<pvc> /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/<stuff>/globalmount
Output: mount error: cifs filesystem not supported by the system
mount error(19): No such device
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)
Please refer to http://aka.ms/filemounterror for possible causes and solutions for mount errors.
Warning FailedMount 104s kubelet Unable to attach or mount volumes: unmounted volumes=[file-storage], unattached volumes=[file-storage kube-api-access-xbprr]: timed out waiting for the condition
Kind of stumped. What I’ve tried:
- Redeploying the pod deployments
- Redeploying storage
- Confirmed the PVC being referenced does in fact exist
Issue persists and I’m not sure what to try next other than redeploying everything.
There isn’t anything helpful at http://aka.ms/filemounterror. Nothing has changed in the environment for months. Another environment that is basically a duplicate of this one is running fine, so the problem seems isolated to this one. These are Linux nodes.
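The "cifs filesystem not supported by the system" / mount error(19) output usually means the node's running kernel cannot load the cifs module, so one thing worth checking is the node itself rather than the PVC. This is only a minimal sketch of how to do that, assuming kubectl debug is available; ubuntu is just an arbitrary debug image and the node name is a placeholder:

kubectl debug node/<aks-node> -it --image=ubuntu

# Inside the debug pod (the node's root filesystem is mounted at /host):
uname -r                                              # node's running kernel
grep cifs /proc/modules || echo "cifs not loaded"     # is the module currently loaded?
find /host/lib/modules/$(uname -r) -name 'cifs.ko*'   # does the node image ship the module at all?

If no cifs.ko turns up for the running kernel, the failure is in the node image/kernel rather than in the claim or the storage account.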
My storage.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: file-storage
  namespace: env
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 25Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-storage
  namespace: env
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: default
  resources:
    requests:
      storage: 25Gi
postgres-storage seems to be fine; it is file-storage that is having the issue.
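Since postgres-storage (an Azure Disk claim) binds and mounts fine while only the Azure Files claim fails at mount time, it can also help to compare the node the stuck pod landed on against a node in the healthy environment. A small sketch, using the namespace and claim names from the YAML above and otherwise standard kubectl:

# Which node the stuck pod was scheduled to, and the claim's status
kubectl -n env get pods -o wide
kubectl -n env get pvc file-storage

# Kernel and node image per node, to compare with the working environment
kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion,IMAGE:.status.nodeInfo.osImage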
2 Answers
Both environments were running on Kubernetes 1.26.6. Upgraded the one that was having issues to 1.27.3
and that fixed it. Why one environment was having issues and not the other, I'm not sure.
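For reference, the upgrade can be done with the Azure CLI; this is only a sketch (resource group and cluster names are placeholders, and the versions available vary by region), not necessarily the exact commands used here:

# List the versions the cluster can be upgraded to
az aks get-upgrades --resource-group <resource-group> --name <cluster-name> --output table

# Upgrade the control plane and node pools; node pools are reimaged with a newer node image as part of this
az aks upgrade --resource-group <resource-group> --name <cluster-name> --kubernetes-version 1.27.3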
This didn’t affect us for AKS, but it did for other VMs in our Azure tenant; it seems there is an issue with the CIFS module not being included in the MS kernel builds. I’m not sure why it hasn’t affected Kubernetes 1.27.3; perhaps MS haven’t moved that to the 6.2.0.1206 kernel yet?
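Not necessarily the workaround referenced below, but on Ubuntu-based Azure VMs a missing cifs module can usually be confirmed, and often restored, along these lines (a sketch, assuming an Ubuntu image where cifs.ko ships in the linux-modules-extra package matching the running kernel):

# Check whether cifs.ko exists for the running kernel
uname -r
modinfo cifs || echo "cifs module not present for this kernel"

# Install the extra-modules package for the running kernel, then load cifs
sudo apt-get update
sudo apt-get install -y linux-modules-extra-$(uname -r)
sudo modprobe cifs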
A workaround has been posted: