I have two nodepools in my AKS cluster: the default nodepool and an ‘application’ nodepool. I use the default nodepool for services like Airflow, and the application nodepool to run ETL jobs. However, the application nodepool never scales to zero, even when I have not scheduled any ETL jobs for many hours.
I do not understand why. Does anyone have a suggestion for the root cause of this issue?
The cluster is deployed using Terraform. The autoscaler is configured as follows:
auto_scaler_profile {
  # (Optional) Maximum number of seconds the cluster autoscaler waits for pod termination when trying to scale down a node. Defaults to 600.
  max_graceful_termination_sec = 180
  # (Optional) How long after the scale up of AKS nodes the scale down evaluation resumes. Defaults to 10m.
  scale_down_delay_after_add = "3m"
  # (Optional) How long a node should be unneeded before it is eligible for scale down. Defaults to 10m.
  scale_down_unneeded = "3m"
  # (Optional) If true, the cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods). Defaults to true.
  skip_nodes_with_system_pods = false
}
and the application nodepool is defined as:
resource "azurerm_kubernetes_cluster_node_pool" "main" {
name = "application"
kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
vm_size = "Standard_B4ms"
enable_auto_scaling = true
min_count = 0
max_count = 2
max_pods = 15
node_labels = {
"type" = "application"
}
}
Below are some relevant details about the AKS cluster:
k top nodes
NAME                                  CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-application-XXXXXXXX-vmss000000   55m          1%     1579Mi          12%
aks-default-XXXXXXXX-vmss000000       677m         17%    7783Mi          61%
az aks nodepool show \
  --resource-group <my-rg> \
  --cluster-name <my-cluster> \
  --name application \
  --query "{min: minCount, max: maxCount}"
{
"max": 2,
"min": 0
}
az aks show \
  --resource-group <my-rg> \
  --name <my-cluster> \
  --query autoScalerProfile
{
"balanceSimilarNodeGroups": "false",
"expander": "random",
"maxEmptyBulkDelete": "10",
"maxGracefulTerminationSec": "180",
"maxNodeProvisionTime": "15m",
"maxTotalUnreadyPercentage": "45",
"newPodScaleUpDelay": "0s",
"okTotalUnreadyCount": "3",
"scaleDownDelayAfterAdd": "3m",
"scaleDownDelayAfterDelete": "10s",
"scaleDownDelayAfterFailure": "3m",
"scaleDownUnneededTime": "3m",
"scaleDownUnreadyTime": "20m",
"scaleDownUtilizationThreshold": "0.5",
"scanInterval": "10s",
"skipNodesWithLocalStorage": "true",
"skipNodesWithSystemPods": "false"
}
k get pods --sort-by="{.spec.nodeName}" -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE NODE
kube-system azure-ip-masq-agent-XXXXX 1/1 Running 0 3d17h aks-application-XXXXXXXX-vmss000000
kube-system metrics-server-XXXXXXXXXX-XXXXX 2/2 Running 0 3d17h aks-application-XXXXXXXX-vmss000000
kube-system metrics-server-XXXXXXXXXX-XXXXX 2/2 Running 0 3d17h aks-application-XXXXXXXX-vmss000000
kube-system kube-proxy-XXXXX 1/1 Running 0 3d17h aks-application-XXXXXXXX-vmss000000
kube-system csi-blob-node-XXXXX 3/3 Running 0 3d17h aks-application-XXXXXXXX-vmss000000
kube-system csi-azurefile-node-XXXXX 3/3 Running 0 3d17h aks-application-XXXXXXXX-vmss000000
kube-system csi-azuredisk-node-XXXXX 3/3 Running 0 3d17h aks-application-XXXXXXXX-vmss000000
kube-system cloud-node-manager-XXXXX 1/1 Running 0 3d17h aks-application-XXXXXXXX-vmss000000
kube-system cloud-node-manager-XXXXX 1/1 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
airflow-prod airflow-pgbouncer-XXXXXXXXXX-XXXXX 2/2 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
airflow-prod airflow-triggerer-XXXXXXXXX-XXXXX 1/1 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
airflow-prod airflow-webserver-XXXXXXXXX-XXXXX 1/1 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
airflow-prod airflow-scheduler-XXXXXXXXX-XXXXX 2/2 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
kube-system azure-ip-masq-agent-XXXXX 1/1 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
airflow-prod airflow-postgresql-0 1/1 Running 0 3d17h aks-default-XXXXXXXX-vmss000000
kube-system coredns-XXXXXXXXXX-XXXXX 1/1 Running 0 3d17h aks-default-XXXXXXXX-vmss000000
kube-system coredns-XXXXXXXXXX-XXXXX 1/1 Running 0 3d17h aks-default-XXXXXXXX-vmss000000
kube-system coredns-autoscaler-XXXXXXXXXX-XXXXX 1/1 Running 0 3d17h aks-default-XXXXXXXX-vmss000000
airflow-prod airflow-statsd-XXXXXXXX-XXXXX 1/1 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
kube-system csi-azuredisk-node-XXXXX 3/3 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
kube-system csi-azurefile-node-XXXXX 3/3 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
kube-system csi-blob-node-XXXXX 3/3 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
kube-system konnectivity-agent-XXXXXXXXXX-XXXXX 1/1 Running 0 3d17h aks-default-XXXXXXXX-vmss000000
kube-system konnectivity-agent-XXXXXXXXXX-XXXXX 1/1 Running 0 3d17h aks-default-XXXXXXXX-vmss000000
kube-system kube-proxy-XXXXX 1/1 Running 0 3d21h aks-default-XXXXXXXX-vmss000000
2 Answers
It turned out that adding a dedicated system node pool, as suggested by user 'Vuillemot Florian', solved the issue.
I added a system node pool with the following configuration to my Terraform file:
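For reference, a minimal sketch of such a dedicated system node pool; the resource name, pool name, VM size, and node count below are placeholders rather than the exact values from my cluster:
resource "azurerm_kubernetes_cluster_node_pool" "system" {
  # Placeholder name and sizing -- adjust to your own cluster.
  name                  = "system"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_B2ms"
  mode                  = "System"
  node_count            = 1

  # CriticalAddonsOnly is the only taint AKS allows on system node pools.
  # It keeps regular workloads off this pool, reserving it for kube-system components.
  node_taints = ["CriticalAddonsOnly=true:NoSchedule"]
}
Note that kube-system deployments already running on the application node (such as metrics-server) are not moved automatically when the new pool is created; they may still need to be rescheduled before the autoscaler can drain that node.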
I believe the pods that prevented the node pool from scaling to 0 were the metrics-server-XXXXXXXX-XXXXX pods. Some system pods block node removal because there is no node affinity on your system pods. You can remedy this by deploying a dedicated system node pool.