I am running nginx as part of the docker-compose template.
In the nginx config I refer to other services by their Docker hostnames (e.g. `backend`, `ui`).
That works fine until I do the following:
docker stop backend
docker stop ui
docker start ui
docker start backend
This makes the backend and ui containers swap IP addresses (Docker assigns private network IPs by handing each new requester the next available IP in the network's CIDR range). These four commands imitate the rare case where both upstream containers are restarted at the same time but the nginx container is not. I believe this should also be a very common situation when running pods on Kubernetes-based clusters.
Now nginx resolves the `backend` host to ui's IP and `ui` to backend's IP.
Reloading the nginx configuration does help (`nginx -s reload`).
Also, if I run nslookup from within the nginx container, the IPs are always resolved correctly.
So this isolates the problem to nginx's own DNS caching.
The things I tried:
- I have the resolver set in the http {} block of the nginx config:
resolver 127.0.0.11 ipv6=off valid=10s;
- The most common solution proposed online is to use a variable in proxy_pass (this should prevent nginx from resolving and caching DNS records at startup). That did not make ANY difference at all:
server {
<...>
set $mybackend "backend:3000";
location /backend/ {
proxy_pass http://$mybackend;
}
}
- Tried adding the resolver line inside the location itself
- Tried setting the variable at the http {} block level, using `map`:
http {
map "" $mybackend {
default backend:3000;
}
server {
...
}
}
- Tried the OpenResty fork of nginx (https://hub.docker.com/r/openresty/openresty/) with `resolver local=true`
None of these had any effect. The DNS cache is only cleared if I reload the nginx configuration inside the container OR restart the container manually.
My current workaround is to use a static Docker network declared in docker-compose.yml, but this has its downsides too.
Nginx version used: 1.20.0 (latest as of now)
Openresty versions used: 1.13.6.1 and 1.19.3.1 (latest as of now)
I would appreciate any thoughts.
UPDATE 2021-09-08: A few months later I am back on this same issue, still with no luck. It really looks like a bug in nginx: I cannot make nginx re-resolve the DNS names. There seems to be no timeout on nginx's DNS cache, and none of the options listed above for introducing timeouts or triggering a DNS flush work.
UPDATE 2022-01-11: I think the problem really is in nginx. A couple of months ago I tested my config in many ways, and it looks like something else in my nginx.conf prevents the `valid` parameter of the `resolver` directive from working properly: either the `limit_req_zone` or the `proxy_cache_path` directive (used for request rate limiting and caching, respectively). For some reason these just don't play nicely with the `valid` parameter, and I could not find any information about this anywhere in the nginx docs.
I will get back to this later to confirm my hypothesis.
4 Answers
Maybe it's because nginx's DNS re-resolution for upstream servers only works in the commercial version, NGINX Plus?
https://www.nginx.com/products/nginx/load-balancing/#service-discovery
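For reference, NGINX Plus does this with the `resolve` parameter on `server` lines inside an `upstream` block, which also requires a shared-memory `zone` and a `resolver` in the http (or upstream) context. A minimal sketch, assuming the Docker embedded DNS at 127.0.0.11 and a hypothetical group name `backend_pool`; as far as I know, open-source nginx only gained the same `resolve` parameter in 1.27.3, well after this question was asked.

```nginx
http {
    # Docker's embedded DNS; valid=10s overrides the TTL of the answers
    resolver 127.0.0.11 ipv6=off valid=10s;

    upstream backend_pool {
        # "resolve" only works for groups kept in shared memory
        zone backend_pool 64k;
        # nginx keeps re-resolving "backend" and updates the group's addresses
        server backend:3000 resolve;
    }

    server {
        listen 80;

        location /backend/ {
            proxy_pass http://backend_pool;
        }
    }
}
```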
TL;DR: your Internet provider may be caching DNS records without respecting tiny TTL values (like 1 second).
I tried to reproduce the same thing locally.
nslookup is your friend: you can query each DNS server between nginx and the root DNS server.
Something very easy to reproduce (without setting up a local DNS server):
- Create a Route 53 'A' record with a TTL of 1 second and query the AWS DNS server for your hosted zone (it will be something like ns-239.awsdns-29.com).
- Play around with dig / nslookup: it will return the IP you have set.
- Change the Route 53 'A' record to some other IP.
- Use dig / nslookup again and make sure you see the change immediately.
- Then set the resolver in nginx to the AWS DNS server name (for testing purposes only).
If that works, the DNS answer is being cached somewhere else and this is no longer an nginx issue!
In my case it was my Sunrise Wi-Fi router, which only started seeing the new IP after I restarted it (I assume it would have picked it up on its own after some longer timeout).
A great help when debugging this is having your nginx compiled with:
Then in the nginx logs you can see whether a given DNS name was resolved, and to what IP.
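One way to get those resolver messages is to raise the error log level to `debug`, which only has an effect on a binary built with debug support (the official Docker image ships an extra `nginx-debug` binary for this purpose). The log path below is an assumption; adjust it to your setup.

```nginx
# Debug-level messages are only emitted by a debug-enabled binary.
# Among other things they show the names the resolver looks up and the
# addresses it gets back, so stale answers become visible in the log.
error_log /var/log/nginx/error.log debug;
```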
My whole config looks like this (here with the standard Docker resolver, which has to be set if you are using variables in proxy_pass!):
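A minimal config of that shape might look like the following. It is a sketch, not the answerer's actual file: the hostname `backend` and port 3000 come from the question, and the variable name `$backend_target` is hypothetical.

```nginx
events {}

http {
    # Docker's embedded DNS server. Required when proxy_pass uses a variable,
    # because the name is then resolved at request time instead of at startup.
    resolver 127.0.0.11 ipv6=off valid=10s;

    server {
        listen 80;

        location /backend/ {
            # The target is a variable, so nginx resolves "backend" through the
            # resolver above and reuses the answer for 10s (valid= overrides TTL).
            set $backend_target "backend:3000";
            proxy_pass http://$backend_target;
        }
    }
}
```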
Then you can test it with:
I was struggling with exactly the same thing (Docker Swarm), and to make it work I had to remove the `upstream` block from my configuration. Something that works well (tested 5 minutes ago on NGINX 2.22):
where `$bck_parameters` is NOT an upstream but the real server behind it. Doing the same thing with an upstream will fail.
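Why the upstream variant fails is consistent with the proxy_pass documentation: when a variable's value is a domain name, nginx first looks it up among the defined server groups and only falls back to the resolver if no group matches. A matching group is resolved once, at startup. To my understanding, this includes not only explicit `upstream` blocks but also the implicit group nginx creates for a literal `proxy_pass http://backend:3000;` elsewhere in the config. A sketch of the pattern to avoid, reusing the question's `backend` name:

```nginx
http {
    resolver 127.0.0.11 ipv6=off valid=10s;

    # PROBLEM: this group's address is resolved once at startup.
    upstream backend {
        server backend:3000;
    }

    server {
        listen 80;

        location /backend/ {
            # The value "backend" matches the upstream name above, so nginx
            # uses that group and never consults the resolver: the IP stays
            # stale until a reload.
            set $bck_parameters "backend";
            proxy_pass http://$bck_parameters;
        }
    }
}
```

Deleting the `upstream` block (and any literal `proxy_pass` to the same host and port) and setting `$bck_parameters` to `backend:3000` lets the lookup fall through to the resolver.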
After a long search I found a solution for `uwsgi_pass`. The same should work for `proxy_pass`.
where `UWSGI_ADDR` is the name of your application container with port, e.g. `app:8000`.
UPD:
In fact, this follows from the `proxy_pass` documentation.
You can also find useful information in the section "Setting the Domain Name in a Variable" of a blog post by one of the nginx developers.
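A minimal sketch of the variable approach applied to `uwsgi_pass`, assuming the application container is reachable as `app:8000` (the example above) and the Docker embedded DNS is at 127.0.0.11. The variable name `$uwsgi_target` is hypothetical and not necessarily what the answerer used. Like proxy_pass, uwsgi_pass accepts variables and then resolves domain names through `resolver`.

```nginx
http {
    # Docker's embedded DNS; needed because uwsgi_pass below uses a variable
    resolver 127.0.0.11 ipv6=off valid=10s;

    server {
        listen 80;

        location / {
            include uwsgi_params;

            # Holding "<container>:<port>" in a variable defers DNS resolution
            # to request time. Do not also define an upstream named "app", or
            # nginx will use that group instead of the resolver.
            set $uwsgi_target "app:8000";
            uwsgi_pass $uwsgi_target;
        }
    }
}
```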