It was a working setup, and no manual changes were made.
When we try to deploy an application on AKS, it fails to pull an image from ACR.
Output of kubectl describe po:
Failed to pull image "xyz.azurecr.io/xyz:-beta-68": [rpc error: code = Unknown desc = Error response from daemon: Get https://xyz.azurecr.io/v2/: dial tcp: lookup rxyz.azurecr.io on [::1]:53: read udp [::1]:46256->[::1]:53: read: connection refused, rpc error: code = Unknown desc = Error response from daemon: Get https://xyz.azurecr.io/v2/: dial tcp: lookup xyz.azurecr.io on [::1]:53: read udp [::1]:46112->[::1]:53: read: connection refused, rpc error: code = Unknown desc = Error response from daemon: Get https://xyz.azurecr.io/v2/: dial tcp: lookup xyz.azurecr.io on [::1]:53: read udp [::1]:36677->[::1]:53: read: connection refused]
While troubleshooting, I noticed that image pulls work without issue on the nodes that have a DNS entry in /etc/resolv.conf and fail on the nodes that do not have one.
If I manually add a DNS entry to /etc/resolv.conf on a node that lacks it, the change is reverted to the initial state within a few minutes.
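To tell affected nodes apart, the check for a DNS entry can be scripted. The following is a minimal Python sketch, assuming it is run on the node itself and that "DNS entry" means a `nameserver` line; the function name `nameservers` is only illustrative.

```python
from pathlib import Path


def nameservers(resolv_conf_text):
    """Return the addresses listed on 'nameserver' lines of a resolv.conf."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    path = Path("/etc/resolv.conf")
    # Reading the file is the only side effect; nothing is modified.
    found = nameservers(path.read_text()) if path.exists() else []
    if found:
        print("nameservers configured:", ", ".join(found))
    else:
        print("no nameserver entries found in", path)
```

A node that prints no nameserver entries matches the failing case described above, where the lookup in the error message goes to [::1]:53 and is refused.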
Is there a procedure to edit /etc/resolv.conf, or another way to fix the image pull issue?
3 Answers
Restart the cluster; that will fix the problem. It started because of a DNS issue introduced by the Ubuntu team.
There is a bug in Ubuntu that affects AKS globally.
You can check the status at the link below.
https://status.azure.com/en-us/status
In addition, the thread below has suggestions you can follow to work around this issue.
https://learn.microsoft.com/en-us/answers/questions/987231/error-connecting-aks-with-acr.html
Restarting the nodes solved the ACR pull problem:
https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-manage-cli#restart-vms-in-a-scale-set
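As a rough sketch of scripting that restart, assuming the Azure CLI (`az`) is installed and logged in. The resource group and scale set names below are placeholders, not values from this thread; for AKS, the scale sets live in the cluster's node resource group. The script only prints the command unless `RUN` is set to `True`.

```python
import subprocess


def vmss_restart_command(resource_group, scale_set):
    """Build the 'az vmss restart' command line for one scale set."""
    return [
        "az", "vmss", "restart",
        "--resource-group", resource_group,
        "--name", scale_set,
    ]


if __name__ == "__main__":
    # Placeholder names (assumptions); replace with your own values.
    NODE_RESOURCE_GROUP = "my-node-resource-group"
    SCALE_SET = "my-node-pool-vmss"
    RUN = False  # flip to True to actually restart the scale set

    cmd = vmss_restart_command(NODE_RESOURCE_GROUP, SCALE_SET)
    print(" ".join(cmd))
    if RUN:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(cmd, check=True)
```

Restarting a scale set interrupts the workloads running on its nodes, so it may be worth doing one node pool at a time.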