
I have two nodepools in my AKS cluster: the default nodepool and an ‘application’ nodepool. I use the default nodepool for services like Airflow, and the application nodepool to run ETL jobs. However, the application nodepool never scales to zero, even when no ETL jobs have been scheduled for many hours.

I fail to understand why. Does anyone have suggestions for the root cause of this issue?

The cluster is deployed using Terraform. The autoscaler is configured as follows:

auto_scaler_profile {
    # (Optional) Maximum number of seconds the cluster autoscaler waits for pod termination when trying to scale down a node. Defaults to 600.
    max_graceful_termination_sec = 180
    # (Optional) How long after the scale up of AKS nodes the scale down evaluation resumes. Defaults to 10m.
    scale_down_delay_after_add = "3m"
    # (Optional) How long a node should be unneeded before it is eligible for scale down. Defaults to 10m.
    scale_down_unneeded = "3m"
    # (Optional) If true, the cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods). Defaults to true.
    skip_nodes_with_system_pods = false
  }

and the application nodepool is defined as:

resource "azurerm_kubernetes_cluster_node_pool" "main" {
  name                  = "application"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_B4ms"
  enable_auto_scaling   = true
  min_count             = 0
  max_count             = 2
  max_pods              = 15

  node_labels = {
    "type" = "application"
  }

}
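
The ETL jobs target this pool with a node selector on the type label, roughly like this (a minimal sketch; the job name and image are hypothetical placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: example-etl        # hypothetical name
spec:
  template:
    spec:
      nodeSelector:
        type: application  # matches the node_labels on the pool above
      containers:
        - name: etl
          image: example.azurecr.io/etl:latest  # placeholder image
      restartPolicy: Never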

Below are some relevant details about the AKS cluster:

k top nodes

NAME                                  CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
aks-application-XXXXXXXX-vmss000000   55m          1%     1579Mi          12%       
aks-default-XXXXXXXX-vmss000000       677m         17%    7783Mi          61%  
az aks nodepool show \
  --resource-group <my-rg> \
  --cluster-name <my-cluster> \
  --name application \
  --query "{min: minCount, max: maxCount}"

{
  "max": 2,
  "min": 0
}
az aks show \
  --resource-group <my-rg> \
  --name <my-cluster> \
  --query autoScalerProfile

{
  "balanceSimilarNodeGroups": "false",
  "expander": "random",
  "maxEmptyBulkDelete": "10",
  "maxGracefulTerminationSec": "180",
  "maxNodeProvisionTime": "15m",
  "maxTotalUnreadyPercentage": "45",
  "newPodScaleUpDelay": "0s",
  "okTotalUnreadyCount": "3",
  "scaleDownDelayAfterAdd": "3m",
  "scaleDownDelayAfterDelete": "10s",
  "scaleDownDelayAfterFailure": "3m",
  "scaleDownUnneededTime": "3m",
  "scaleDownUnreadyTime": "20m",
  "scaleDownUtilizationThreshold": "0.5",
  "scanInterval": "10s",
  "skipNodesWithLocalStorage": "true",
  "skipNodesWithSystemPods": "false"
}
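
If the managed autoscaler exposes its status ConfigMap (it should on recent AKS versions), it records whether a node is considered unneeded and which pods block scale-down, which can be inspected with:

k get configmap cluster-autoscaler-status -n kube-system -o yaml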
k get pods  --sort-by="{.spec.nodeName}" -A -o wide                                                                                                          
NAMESPACE      NAME                                  READY   STATUS    RESTARTS   AGE     NODE                                 
kube-system    azure-ip-masq-agent-XXXXX             1/1     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    metrics-server-XXXXXXXXXX-XXXXX       2/2     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    metrics-server-XXXXXXXXXX-XXXXX       2/2     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    kube-proxy-XXXXX                      1/1     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    csi-blob-node-XXXXX                   3/3     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    csi-azurefile-node-XXXXX              3/3     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    csi-azuredisk-node-XXXXX              3/3     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    cloud-node-manager-XXXXX              1/1     Running   0          3d17h   aks-application-XXXXXXXX-vmss000000
kube-system    cloud-node-manager-XXXXX              1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-pgbouncer-XXXXXXXXXX-XXXXX    2/2     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-triggerer-XXXXXXXXX-XXXXX     1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-webserver-XXXXXXXXX-XXXXX     1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-scheduler-XXXXXXXXX-XXXXX     2/2     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    azure-ip-masq-agent-XXXXX             1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-postgresql-0                  1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    coredns-XXXXXXXXXX-XXXXX              1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    coredns-XXXXXXXXXX-XXXXX              1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    coredns-autoscaler-XXXXXXXXXX-XXXXX   1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
airflow-prod   airflow-statsd-XXXXXXXX-XXXXX         1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    csi-azuredisk-node-XXXXX              3/3     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    csi-azurefile-node-XXXXX              3/3     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    csi-blob-node-XXXXX                   3/3     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
kube-system    konnectivity-agent-XXXXXXXXXX-XXXXX   1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    konnectivity-agent-XXXXXXXXXX-XXXXX   1/1     Running   0          3d17h   aks-default-XXXXXXXX-vmss000000
kube-system    kube-proxy-XXXXX                      1/1     Running   0          3d21h   aks-default-XXXXXXXX-vmss000000
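
Also worth checking is whether any of the kube-system pods on the application node carry a PodDisruptionBudget that would block eviction:

k get pdb -A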


2 Answers


  1. Chosen as BEST ANSWER

    It turned out that adding a dedicated system node pool, as suggested by user 'Vuillemot Florian', solved the issue.

    I added a system node pool with the following configuration to my Terraform file:

    resource "azurerm_kubernetes_cluster_node_pool" "system" {
      name                  = "systempool"
      kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
      vm_size               = "Standard_B2s"
      enable_auto_scaling   = true
      min_count             = 1
      max_count             = 2
      node_taints           = ["CriticalAddonsOnly=true:NoSchedule"]
      mode                  = "System"
    }
    

    I believe the pods that prevented the node pool from scaling to 0 were the metrics-server-XXXXXXXX-XXXXX pods.
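
    Once the system pool was up, something like the following confirms where metrics-server landed (using the same k alias as above):

    k get pods -n kube-system -o wide | grep metrics-server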


  2. Some system pods are blocking node removal: because there is no node affinity (or taint) keeping them on the default pool, pods like metrics-server get scheduled onto your application nodes, and the autoscaler will not drain a node they are running on.

    You can remedy this by deploying a dedicated system node pool.
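
    For anyone not using Terraform, an equivalent system pool via the Azure CLI would look roughly like this (pool name and VM size are placeholders):

    az aks nodepool add \
      --resource-group <my-rg> \
      --cluster-name <my-cluster> \
      --name systempool \
      --mode System \
      --node-vm-size Standard_B2s \
      --node-taints CriticalAddonsOnly=true:NoSchedule \
      --enable-cluster-autoscaler \
      --min-count 1 \
      --max-count 2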
