I am running nginx as part of a docker-compose template.
In the nginx config I refer to other services by their Docker hostnames (e.g. backend, ui).
That works fine until I pull this trick:

docker stop backend
docker stop ui
docker start ui
docker start backend

which makes the backend and ui containers exchange IP addresses (Docker hands out private network IPs by giving the next available IP in the CIDR range to each new requester). These four commands imitate the rare case where both upstream containers get restarted at the same time while the nginx container does not. I also believe this should be a very common situation when running pods on Kubernetes-based clusters.

Now nginx resolves the backend host to ui’s IP and ui to backend’s IP.
Reloading nginx’s configuration does help (nginx -s reload).
Also, if I run nslookup from within the nginx container, the IPs are always resolved correctly.

So this isolates the problem to a pure nginx issue around its DNS caching.

The things I tried:

  1. I have the resolver set under the http {} block in nginx config:
resolver 127.0.0.11 ipv6=off valid=10s;
  2. The most common solution proposed by folks on the internet: use a variable in proxy_pass (this prevents nginx from resolving and caching DNS records at startup). That did not make ANY difference at all:
server {
  <...>
  set $mybackend "backend:3000";
  location /backend/ {
    proxy_pass http://$mybackend;
  }
}
  3. Tried adding the resolver line into the location block itself (see the sketch after this list)
  4. Tried setting the variable at the http {} block level, using map:
http {  
  map "" $mybackend {
    default backend:3000;
  }
  server {
   ...
  }
}
  5. Tried the openresty fork of nginx (https://hub.docker.com/r/openresty/openresty/) with resolver local=true
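
A minimal sketch of attempt 3 above (the same placeholder hostname and port as in attempt 2, with the resolver directive moved into the location block):

server {
  <...>
  set $mybackend "backend:3000";
  location /backend/ {
    # resolver placed directly in the location, per attempt 3
    resolver 127.0.0.11 ipv6=off valid=10s;
    proxy_pass http://$mybackend;
  }
}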

None of the solutions had any effect at all. The DNS cache is only flushed if I reload the nginx configuration inside the container OR restart the container manually.

My current workaround is to use a static docker network declared in docker-compose.yml. But this has its cons too.

Nginx version used: 1.20.0 (latest as of now)
Openresty versions used: 1.13.6.1 and 1.19.3.1 (latest as of now)

Would appreciate any thoughts

UPDATE 2021-09-08: A few months later I am back to solving this same issue and still no luck. It really looks like a bug in nginx – I cannot make nginx re-resolve the DNS names. There seems to be no timeout on nginx’s DNS cache, and none of the options listed above to introduce timeouts or trigger a DNS flush work.

UPDATE 2022-01-11: I think the problem really is in nginx. I tested my config in many ways a couple of months ago, and it looks like something else in my nginx.conf prevents the valid parameter of the resolver directive from working properly. It is either the limit_req_zone or the proxy_cache_path directive, used for request rate limiting and caching respectively. These just don’t play nicely with the valid param for some reason, and I could not find any information about this anywhere in the nginx docs.
I will get back to this later to confirm my hypothesis.
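
For reference, a minimal sketch of the kind of combination I suspect (zone names, sizes and paths below are placeholders, not my exact config):

http {
  limit_req_zone $binary_remote_addr zone=req_limit:10m rate=10r/s;
  proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m max_size=100m;

  resolver 127.0.0.11 ipv6=off valid=10s;

  server {
    set $mybackend "backend:3000";
    location /backend/ {
      limit_req zone=req_limit burst=20;
      proxy_cache app_cache;
      # valid=10s appears to be ignored when the two zone directives above are present
      proxy_pass http://$mybackend;
    }
  }
}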

4 Answers


  1. Maybe it’s because nginx’s DNS resolver for upstream servers only works in the commercial version, nginx plus?

    https://www.nginx.com/products/nginx/load-balancing/#service-discovery
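
    For reference, this is roughly what that Plus-only service discovery looks like: the resolve parameter on an upstream server (together with a shared memory zone) re-resolves the name at run time. The names and values here are placeholders:

        # in the http {} block
        resolver 127.0.0.11 valid=10s;

        upstream backend {
            zone backend_zone 64k;
            # the "resolve" parameter is an NGINX Plus feature
            server backend:3000 resolve;
        }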

  2. TLDR: Your Internet provider may be caching DNS records with no respect for tiny TTL values (like 1 second).

    I’ve been trying to retest the same thing locally.

    • Your Docker might be using the local resolver (127.0.0.11)
    • Then DNS might be cached by your OS (which you may clear; that’s OS specific)
    • Then you might have it cached on your WiFi router (yes!)
    • Later it goes to your ISP and is beyond your control.

    But nslookup is your friend: you can query each DNS server between nginx and the root DNS server.

    Something very easy to reproduce (without setting up a local DNS server):

    Create a Route 53 ‘A’ record with a TTL of 1 second and try to query the AWS DNS server in your hosted zone (it will be something like ns-239.awsdns-29.com).
    Play around with the dig / nslookup commands:

    nslookup
    set type=a
    server ns-239.awsdns-29.com
    your.domain.com
    

    It will return the IP you have set.

    Change the Route 53 ‘A’ record to some other IP.

    Use dig / nslookup and make sure you see the change immediately.

    Then set the resolver in nginx to the AWS DNS server (for testing purposes only).
    If that works, it means the DNS is cached elsewhere and this is no longer an nginx issue!

    In my case it was a Sunrise WiFi router, which began to see the new IP only after I restarted it (I assume things would have resolved after some longer period).

    A great help when debugging this is having nginx compiled with

    --with-debug
    

    Then in the nginx logs you can see whether a given name was resolved and to what IP.

    My whole config looks like this (here with the standard Docker resolver, which has to be set if you are using variables in proxy_pass!):

    server {
        listen 0.0.0.0:8888;
        server_name nginx.my.custom.domain.in.aws;
        resolver 127.0.0.11 valid=1s;

        location / {
            proxy_ssl_server_name on;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-Proto https;
            proxy_set_header Host $host;
            set $backend_servers my.custom.domain.in.aws;
            proxy_pass https://$backend_servers$request_uri;
        }
    }
    

    Then you can try to test it with

     curl -L http://nginx.my.custom.domain.in.aws:8888/ --resolve nginx.my.custom.domain.in.aws:8888:127.0.0.1
    
  3. I was struggling with exactly the same thing (on Docker Swarm), and to make it work I had to keep the upstream block out of my configuration.

    Something that works well (tested 5 minutes ago on nginx 1.22):

    location ~* /api/parameters/(.*)$ {
        resolver 127.0.0.11 ipv6=off valid=1s;
        set $bck_parameters parameters:8000;
        proxy_pass http://$bck_parameters/api/$1$is_args$args;
    }
    

    where $bck_parameters is NOT an upstream but the real service name behind it.
    Doing the same thing with an upstream block will fail, as the sketch below shows.
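
    For contrast, a minimal sketch (not from the original config) of the upstream-based variant that fails: names inside an upstream block are resolved once when the configuration is loaded, so the resolver’s valid parameter never applies to them.

    upstream parameters_upstream {
        # resolved once at config load/reload, then kept until the next reload
        server parameters:8000;
    }

    location ~* /api/parameters/(.*)$ {
        proxy_pass http://parameters_upstream/api/$1$is_args$args;
    }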

  4. After a long search I found a solution for uwsgi_pass. The same should work for proxy_pass.

        resolver 127.0.0.11 valid=10s;
        set $upstream_endpoint ${UWSGI_ADDR};
        location / {
            uwsgi_pass $upstream_endpoint;
            include uwsgi_params;
        }
    

    where UWSGI_ADDR is the name of your application container with port, e.g. app:8000.

    UPD:

    In fact, it follows from the proxy_pass documentation.

    Parameter value can contain variables. In this case, if an address is specified as a domain name, the name is searched among the described server groups, and, if not found, is determined using a resolver.

    Also, you can find some useful information in the section "Setting the Domain Name in a Variable" of the blog post authored by one of the nginx developers.
