
This deployment has been running fine for months. It looks like the Pods were redeployed early this morning, probably related to the 2023.10.31 AKSSecurityPatchedVHD release being applied.

The Pods that mount Azure Files for file storage are stuck in ContainerCreating with the following error:

Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    3m46s                  default-scheduler  Successfully assigned env/<deployment> to <aks-node>
  Warning  FailedMount  3m45s (x2 over 3m46s)  kubelet            MountVolume.MountDevice failed for volume "<pvc>" : rpc error: code = Internal desc = volume(<resource-group>) mount //<stuff>.file.core.windows.net/<pvc> on /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/<stuff>/globalmount failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o mfsymlinks,actimeo=30,nosharesock,file_mode=0777,dir_mode=0777,<masked> //<stuff>.file.core.windows.net/<pvc> /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/<stuff>/globalmount
Output: mount error: cifs filesystem not supported by the system
mount error(19): No such device
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)

Please refer to http://aka.ms/filemounterror for possible causes and solutions for mount errors.
  Warning  FailedMount  104s  kubelet  Unable to attach or mount volumes: unmounted volumes=[file-storage], unattached volumes=[file-storage kube-api-access-xbprr]: timed out waiting for the condition
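
The "cifs filesystem not supported by the system" / "No such device" part of the output suggests the node kernel cannot load the cifs module, rather than a problem with the share or the PVC itself. If it helps, this is a rough sketch of how that can be checked from the cluster (assuming node debug access; the node name is a placeholder):

## Open a debug shell on the affected node and switch to the host filesystem
kubectl debug node/<aks-node> -it --image=ubuntu
chroot /host

## Check the running kernel and whether the cifs module can be loaded
uname -r
modprobe cifs && lsmod | grep cifs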

Kind of stumped. What I’ve tried:

  • Redeploying the pod deployments
  • Redeploying storage
  • Confirmed the PVC being referred to does in fact exist

Issue persists and I’m not sure what to try next other than redeploying everything.
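
For reference, this is roughly how I checked the PVC and the file CSI driver (the csi-azurefile-node label and the azurefile container name are assumptions and may differ per cluster):

## Confirm the claim is Bound and which PV/StorageClass backs it
kubectl get pvc -n env
kubectl describe pvc file-storage -n env

## Check the Azure Files CSI node pods for errors
kubectl get pods -n kube-system -l app=csi-azurefile-node
kubectl logs -n kube-system -l app=csi-azurefile-node -c azurefile --tail=50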

There isn’t anything helpful at http://aka.ms/filemounterror. Nothing has changed in the environment for months. Another environment is running fine and it is basically a duplicate of this one, so it seems isolated to this one. These are Linux nodes.


My storage.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: file-storage
  namespace: env
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 25Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-storage
  namespace: env
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: default
  resources:
    requests:
      storage: 25Gi

postgres-storage seems to be fine; it is file-storage that is the issue.
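
Since the other environment is a near duplicate, one thing that might be worth comparing is the node image and kernel on each cluster, and which provisioner backs the azurefile class (a sketch; the class name is taken from the YAML above):

## Compare node OS image and kernel version between the two environments
kubectl get nodes -o wide

## Confirm the azurefile class points at the file CSI driver
kubectl get sc azurefile -o yaml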

2 Answers


  1. Chosen as BEST ANSWER

    Both environments were running Kubernetes 1.26.6. Upgraded the one that was having issues to 1.27.3, which fixed it. Why one environment was affected and not the other, I'm not sure.
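
    For anyone else hitting this, the upgrade was done with something like the following (resource group and cluster names are placeholders):

    ## See which versions the cluster can move to
    az aks get-upgrades --resource-group <resource-group> --name <cluster>

    ## Upgrade the control plane and node pools; the nodes are reimaged as part of this
    az aks upgrade --resource-group <resource-group> --name <cluster> --kubernetes-version 1.27.3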


  2. This didn’t affect us for AKS, but it did for other VMs in our Azure tenant; it seems there is an issue with the CIFS module not being included in the Microsoft kernel builds.

    Between 6.2.0-1015 and 6.2.0-1016, the CIFS module was moved from fs/cifs/* to fs/smb/client/, fs/smb/common/ and fs/smb/server/*. The inclusion list (root/debian.azure-6.2/control.d/azure.inclusion-list) was not updated for this change, so the module is not included in the linux-modules-6.2.0-1026-azure package.
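
    A quick way to confirm this on an affected VM (a sketch; it assumes the stock Ubuntu azure kernel packaging) is to look for the module in the running kernel's module tree or package:

    ## Look for the cifs module under the running kernel
    find /lib/modules/$(uname -r) -name 'cifs.ko*'

    ## Or check the kernel modules package contents directly
    dpkg -L linux-modules-$(uname -r) | grep -i cifs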

    I’m not sure why it hasn’t affected Kubernetes version 1.27.3; perhaps MS haven’t moved that to the 6.2.0-1026 kernel yet?

    A work around has been posted:

    ## Install older kernel
    sudo apt install linux-image-6.2.0-1015-azure
    
    ## Remove newer kernel (select NO when asked)
    sudo apt remove linux-image-6.2.0-1016-azure
    
    ## Reboot
    sudo reboot
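
    After the reboot, it may be worth confirming the older kernel is active and that the module loads again, for example:

    ## Verify the running kernel and that cifs now loads
    uname -r
    sudo modprobe cifs && lsmod | grep cifs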
    