I’m trying to set up an Airflow instance using docker-compose, as described in the official docs, and I’m stuck at the airflow-init step. It looks like there is no connectivity between containers, but I don’t know how to fix it.

I use exactly the same docker-compose.yaml as described in the docs. It can be downloaded here: https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml

Currently, I see this in my shell:

~/dwn $ docker-compose up airflow-init
51ad8448b197_dwn_redis_1 is up-to-date
70409dec742c_dwn_postgres_1 is up-to-date
Starting dwn_airflow-init_1 ... done
Attaching to dwn_airflow-init_1
airflow-init_1       | BACKEND=postgresql+psycopg2
airflow-init_1       | DB_HOST=postgres
airflow-init_1       | DB_PORT=5432

The docs say I should see something like this:

airflow-init_1       | Upgrades done
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.1.2
start_airflow-init_1 exited with code 0

Instead, the command just hangs and never exits. htop shows that netcat is running inside the container and trying to connect to postgres:

nc -zvvn 172.19.0.3 5432

curl shows timeout:

~/dwn $ docker exec -it dwn_airflow-init_1 curl postgres:5432
curl: (7) Failed to connect to postgres port 5432: Connection timed out

Why does it hang?
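
A timeout is a clue in itself. A refused connection usually means nothing is listening on the port, a resolution error points at DNS, and a silent timeout usually means packets are being dropped somewhere on the path. Below is a minimal Python sketch that tells these cases apart, assuming a Python interpreter is available inside the container. The function name probe and the 3-second default timeout are illustrative choices, not part of Airflow or Docker:

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify the outcome of a single TCP connection attempt to host:port."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The name could not be resolved at all.
        return "dns-failure"
    except socket.timeout:
        # No answer within the timeout: packets are likely dropped on the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else (e.g. network unreachable) is reported verbatim.
        return f"error: {exc}"
```

Given the curl output above, calling probe("postgres", 5432) from a Python shell inside dwn_airflow-init_1 would be expected to return "timeout", which hints at filtering rather than at a postgres problem.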


I tried a few things to fix this:

  1. I tried setting ports option in postgres service to 5432:5432 – no effect

  2. I tried setting links option – no effect

  3. Another question suggested that system entropy might be too low – no, there is plenty of entropy

  4. There is enough free RAM, CPU, disk space

  5. I tried setting up a network like in this answer – it is even worse, container names aren’t resolved:

    ~/dwn $ docker exec -it dwn_airflow-init_1 curl postgres:5432
    curl: (6) Could not resolve host: postgres
    
  6. I tried resetting iptables as suggested in this answer – no effect


Some system info:

  • OS: Arch Linux
  • docker version: 20.10.7, build f0df35096d
  • docker-compose version: 1.29.2

Logs! (as requested by @larsks)

~/dwn $ docker-compose ps
       Name                     Command                  State                        Ports                  
-------------------------------------------------------------------------------------------------------------
dwn_airflow-init_1   /usr/bin/dumb-init -- /ent ...   Up             8080/tcp                                
dwn_postgres_1       docker-entrypoint.sh postgres    Up (healthy)   5432/tcp                                
dwn_redis_1          docker-entrypoint.sh redis ...   Up (healthy)   0.0.0.0:6379->6379/tcp,:::6379->6379/tcp
~/dwn $ docker-compose logs postgres
Attaching to dwn_postgres_1
postgres_1           | The files belonging to this database system will be owned by user "postgres".
postgres_1           | This user must also own the server process.
postgres_1           | 
postgres_1           | The database cluster will be initialized with locale "en_US.utf8".
postgres_1           | The default database encoding has accordingly been set to "UTF8".
postgres_1           | The default text search configuration will be set to "english".
postgres_1           | 
postgres_1           | Data page checksums are disabled.
postgres_1           | 
postgres_1           | fixing permissions on existing directory /var/lib/postgresql/data ... ok
postgres_1           | creating subdirectories ... ok
postgres_1           | selecting dynamic shared memory implementation ... posix
postgres_1           | selecting default max_connections ... 100
postgres_1           | selecting default shared_buffers ... 128MB
postgres_1           | selecting default time zone ... Etc/UTC
postgres_1           | creating configuration files ... ok
postgres_1           | running bootstrap script ... ok
postgres_1           | performing post-bootstrap initialization ... ok
postgres_1           | initdb: warning: enabling "trust" authentication for local connections
postgres_1           | You can change this by editing pg_hba.conf or using the option -A, or
postgres_1           | --auth-local and --auth-host, the next time you run initdb.
postgres_1           | syncing data to disk ... ok
postgres_1           | 
postgres_1           | 
postgres_1           | Success. You can now start the database server using:
postgres_1           | 
postgres_1           |     pg_ctl -D /var/lib/postgresql/data -l logfile start
postgres_1           | 
postgres_1           | waiting for server to start....2021-07-17 07:31:38.491 UTC [47] LOG:  starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
postgres_1           | 2021-07-17 07:31:38.493 UTC [47] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1           | 2021-07-17 07:31:38.499 UTC [48] LOG:  database system was shut down at 2021-07-17 07:31:35 UTC
postgres_1           | 2021-07-17 07:31:38.521 UTC [47] LOG:  database system is ready to accept connections
postgres_1           |  done
postgres_1           | server started
postgres_1           | CREATE DATABASE
postgres_1           | 
postgres_1           | 
postgres_1           | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
postgres_1           | 
postgres_1           | 2021-07-17 07:31:39.613 UTC [47] LOG:  received fast shutdown request
postgres_1           | waiting for server to shut down....2021-07-17 07:31:39.615 UTC [47] LOG:  aborting any active transactions
postgres_1           | 2021-07-17 07:31:39.616 UTC [47] LOG:  background worker "logical replication launcher" (PID 54) exited with exit code 1
postgres_1           | 2021-07-17 07:31:39.616 UTC [49] LOG:  shutting down
postgres_1           | 2021-07-17 07:31:39.644 UTC [47] LOG:  database system is shut down
postgres_1           |  done
postgres_1           | server stopped
postgres_1           | 
postgres_1           | PostgreSQL init process complete; ready for start up.
postgres_1           | 
postgres_1           | 2021-07-17 07:31:39.741 UTC [1] LOG:  starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
postgres_1           | 2021-07-17 07:31:39.741 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
postgres_1           | 2021-07-17 07:31:39.741 UTC [1] LOG:  listening on IPv6 address "::", port 5432
postgres_1           | 2021-07-17 07:31:39.748 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1           | 2021-07-17 07:31:39.756 UTC [75] LOG:  database system was shut down at 2021-07-17 07:31:39 UTC
postgres_1           | 2021-07-17 07:31:39.781 UTC [1] LOG:  database system is ready to accept connections
postgres_1           | 2021-07-17 07:33:49.955 UTC [79] LOG:  using stale statistics instead of current ones because stats collector is not responding
postgres_1           | 2021-07-17 07:34:00.040 UTC [79] LOG:  using stale statistics instead of current ones because stats collector is not responding
postgres_1           | 2021-07-17 07:34:00.049 UTC [235] LOG:  using stale statistics instead of current ones because stats collector is not responding
postgres_1           | 2021-07-17 07:34:10.141 UTC [79] LOG:  using stale statistics instead of current ones because stats collector is not responding

When I edit the postgres service to make it accessible from the host (ports option), I can see it really is there:

~/dwn $ pg_isready -h localhost -p 5432
localhost:5432 - accepting connections

Here is what the network created by docker-compose looks like:

[
    {
        "Name": "dwn_default",
        "Id": "8c4e4ab1629cd7d2cb5d532e28b0837a11bc3516ba094248294e5d734a69dc11",
        "Created": "2021-07-17T10:15:50.694208715+02:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.19.0.0/16",
                    "Gateway": "172.19.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "2c6dd1bcd0d81740ab17ff7816acd983ff053be2a8f886ef281b3e5ec1ec642b": {
                "Name": "dwn_airflow-init_1",
                "EndpointID": "945c9bd23ffb52bdee7ae9fdf32f48be623ac73cd60a5b248f919fce6aede366",
                "MacAddress": "02:42:ac:13:00:04",
                "IPv4Address": "172.19.0.4/16",
                "IPv6Address": ""
            },
            "3a79a194d97e491c75e573fa78492c9d4f73efd4d868e709c20eb23c9a0ff2a6": {
                "Name": "dwn_postgres_1",
                "EndpointID": "b3245b8ab82edc78b205485cd39c368881d7c7b2bc29f325fd3f6f6d8605d9c1",
                "MacAddress": "02:42:ac:13:00:03",
                "IPv4Address": "172.19.0.3/16",
                "IPv6Address": ""
            },
            "dd023f1d42be72d967c5045b7be29deca88caf99377e7d144c51f2212059cefa": {
                "Name": "dwn_redis_1",
                "EndpointID": "f85a6cd841028efb7fab17e40f814b0d9de300e90f9506df373d973695a38d97",
                "MacAddress": "02:42:ac:13:00:02",
                "IPv4Address": "172.19.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "default",
            "com.docker.compose.project": "dwn",
            "com.docker.compose.version": "1.29.2"
        }
    }
]
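
The inspect output above puts all three containers inside 172.19.0.0/16, so they should be able to reach each other directly over the bridge. A small Python sketch can check this mechanically. It assumes the JSON comes from docker network inspect dwn_default, and the helper name containers_outside_subnet is hypothetical:

```python
import ipaddress
import json
from typing import List


def containers_outside_subnet(inspect_json: str) -> List[str]:
    """Return names of containers whose IPv4 address lies outside every IPAM subnet."""
    outside = []
    for network in json.loads(inspect_json):
        subnets = [
            ipaddress.ip_network(cfg["Subnet"])
            for cfg in network["IPAM"]["Config"]
        ]
        for container in network["Containers"].values():
            if not container["IPv4Address"]:
                continue  # container has no IPv4 address on this network
            # "172.19.0.4/16" parses as an interface; .ip is the bare address.
            address = ipaddress.ip_interface(container["IPv4Address"]).ip
            if not any(address in subnet for subnet in subnets):
                outside.append(container["Name"])
    return outside
```

For the data shown above the function returns an empty list, which rules out an addressing mistake and suggests the problem sits elsewhere.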

@jarek-potiuk suggested I should check the IPv6 configuration. It still doesn’t work, but I got some errors this time. Here is what I did:

I created /etc/docker/daemon.json with the following content:

{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}

This caused the following error (after a daemon restart):

could not find an available, non-overlapping IPv6 address pool among the defaults to assign to the network

This error can be fixed by setting network_mode: bridge for every service in the compose file, and now my services have IPv6 addresses:

[
    {
        "Name": "bridge",
        "Id": "092767c3c4137429a7caaa85a1b87c7cb977c4f02055624fa84c4d586ed9758f",
        "Created": "2021-07-17T14:42:08.353393246+02:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": true,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                },
                {
                    "Subnet": "2001:db8:1::/64",
                    "Gateway": "2001:db8:1::1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "964c9edadb8f7eb757cd7f1296c2af154ab407ef4d9872f8e613f61d64d6a443": {
                "Name": "dwn_postgres_1",
                "EndpointID": "a19bd83ff487611e78074eddafbca18e545edcf9ddc9d7851d3b6d68b7962419",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16",
                "IPv6Address": "2001:db8:1::242:ac11:2/64"
            },
            "b45ca546c1539f5f0f1d76423bd4f071efed2e3d6e118b8811e3fd28164fab5a": {
                "Name": "dwn_airflow-init_1",
                "EndpointID": "3a2fc42dfda6a534b6840971f4b11af9c78aac2253a036f46721ed6e5659f7b9",
                "MacAddress": "02:42:ac:11:00:04",
                "IPv4Address": "172.17.0.4/16",
                "IPv6Address": "2001:db8:1::242:ac11:4/64"
            },
            "f140d9c90c24fca254e34aec549b559ec5f82bc8b14537e7249192e604110d53": {
                "Name": "dwn_redis_1",
                "EndpointID": "1c26f7afa8ada58626b67e7446347e1c4d540513df72784addcf334f99fd53d1",
                "MacAddress": "02:42:ac:11:00:03",
                "IPv4Address": "172.17.0.3/16",
                "IPv6Address": "2001:db8:1::242:ac11:3/64"
            }
        },
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]

but there is another problem – name resolution stopped working:

~/dwn $ docker-compose up airflow-init
dwn_postgres_1 is up-to-date
dwn_redis_1 is up-to-date
Starting dwn_airflow-init_1 ... done
Attaching to dwn_airflow-init_1
airflow-init_1       | BACKEND=postgresql+psycopg2
airflow-init_1       | DB_HOST=postgres
airflow-init_1       | DB_PORT=5432
airflow-init_1       | ....................
airflow-init_1       | ERROR! Maximum number of retries (20) reached.
airflow-init_1       | 
airflow-init_1       | Last check result:
airflow-init_1       | $ run_nc 'postgres' '5432'
airflow-init_1       | Traceback (most recent call last):
airflow-init_1       |   File "<string>", line 1, in <module>
airflow-init_1       | socket.gaierror: [Errno -3] Temporary failure in name resolution
airflow-init_1       | Can't parse  as an IP address
airflow-init_1       | 
dwn_airflow-init_1 exited with code 1
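
The socket.gaierror in the traceback above comes from Python's resolver. To see what a name resolves to for each address family from inside a container, a sketch like the following may help. The helper name resolve_by_family is hypothetical, and the code only uses the standard socket module:

```python
import socket
from typing import Dict, List


def resolve_by_family(host: str) -> Dict[str, List[str]]:
    """Resolve host separately over IPv4 and IPv6; an empty list means no result."""
    results = {}
    for label, family in (("ipv4", socket.AF_INET), ("ipv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string.
            results[label] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            # Resolution failed for this family (no record, or no resolver).
            results[label] = []
    return results
```

Inside the container, resolve_by_family("postgres") returning empty lists for both families would be consistent with the "Temporary failure in name resolution" message above.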

The name resolution failure is actually documented behavior: containers on the default bridge network can only reach each other by IP address. However, access by IP still doesn’t work:

~/dwn $ docker exec -i -t dwn_airflow-init_1 sh -c 'echo "PING" | nc -v 172.17.0.3 6379'
172.17.0.3: inverse host lookup failed: Host name lookup failure
^C
~/dwn $ echo "PING" | ncat -v localhost 6379
Ncat: Version 7.91 ( https://nmap.org/ncat )
Ncat: Connected to ::1:6379.
+PONG
Ncat: 5 bytes sent, 7 bytes received in 0.01 seconds.

I also found that disabling IPv6 at the daemon level does not disable IPv6 in containers, so I tried to disable it in the postgres container by setting sysctls. That works as expected:

~/dwn $ docker exec -i -t dwn_postgres_1 cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1

but still no network access.

I’m out of ideas at this point.

2

Answers


  1. Chosen as BEST ANSWER

    Well, I figured it out myself.

    TL;DR: PEBKAC. The user misconfigured the firewall and forgot he had told the kernel to drop forwarded packets.


    Let's start from the very beginning: docker-compose up airflow-init prints just this and waits for something:

    ~/dwn $ docker-compose up airflow-init
    51ad8448b197_dwn_redis_1 is up-to-date
    70409dec742c_dwn_postgres_1 is up-to-date
    Starting dwn_airflow-init_1 ... done
    Attaching to dwn_airflow-init_1
    airflow-init_1       | BACKEND=postgresql+psycopg2
    airflow-init_1       | DB_HOST=postgres
    airflow-init_1       | DB_PORT=5432
    

    Maybe the host name postgres points to a weird place:

    ~/dwn $ docker exec -i -t dwn_airflow-init_1 host postgres
    postgres has address 172.20.0.2
    

    Not really, it looks like any other Docker IP, but this netcat invocation still hangs:

     nc -zvvn 172.20.0.2 5432
    

    That means the postgres service did not respond at all to the airflow-init container. However, postgres did respond to requests coming from the host system. So traffic is not getting through between postgres and airflow, even though they are on the same network. Maybe the kernel drops forwarded packets?

    ~ # sysctl net/ipv4/conf/all/forwarding
    net.ipv4.conf.all.forwarding = 1
    ~ # sysctl net/ipv6/conf/all/forwarding
    net.ipv6.conf.all.forwarding = 1
    

    Forwarding is enabled. Maybe the firewall drops them?

    ~ # iptables -S FORWARD
    -P FORWARD ACCEPT
    -A FORWARD -j DOCKER-USER
    -A FORWARD -j DOCKER-ISOLATION-STAGE-1
    -A FORWARD -o br-b4a6c0b51ae7 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    -A FORWARD -o br-b4a6c0b51ae7 -j DOCKER
    -A FORWARD -i br-b4a6c0b51ae7 ! -o br-b4a6c0b51ae7 -j ACCEPT
    -A FORWARD -i br-b4a6c0b51ae7 -o br-b4a6c0b51ae7 -j ACCEPT
    -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    -A FORWARD -o docker0 -j DOCKER
    -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
    -A FORWARD -i docker0 -o docker0 -j ACCEPT
    

    According to these rules, packets should get through. Maybe Docker is somehow broken? I reinstalled it and restarted, and the same thing happens.

    Maybe iptables is somehow broken?

    ~ # pacman -S iptables
    resolving dependencies...
    looking for conflicting packages...
    :: iptables and iptables-nft are in conflict. Remove iptables-nft? [y/N]
    

    Oh. I have nftables installed. That's weird. How is my firewall actually managed?

    ~ # systemctl status iptables nftables
    ○ iptables.service - IPv4 Packet Filtering Framework
         Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
         Active: inactive (dead)
    
    ● nftables.service - Netfilter Tables
         Loaded: loaded (/usr/lib/systemd/system/nftables.service; enabled; vendor preset: disabled)
         Active: active (exited) since Mon 2021-07-19 10:30:41 CEST; 6h ago
           Docs: man:nft(8)
        Process: 824 ExecStart=/usr/bin/nft -f /etc/nftables.conf (code=exited, status=0/SUCCESS)
       Main PID: 824 (code=exited, status=0/SUCCESS)
            CPU: 9ms
    

    And... what does the forward chain look like?

    ~ # nft list chain inet filter forward                               
    table inet filter {
        chain forward {
            type filter hook forward priority filter; policy accept;
            drop
        }
    }
    

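    Hmmm... The chain policy is accept, but the single unconditional drop rule discards every forwarded packet before the policy is ever consulted.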

    Now, what if I delete that chain:

    ~ # nft delete chain inet filter forward
    

    Suddenly, airflow starts printing a lot in the second terminal. Goal achieved:

    airflow-init_1       | Admin user airflow created
    airflow-init_1       | 2.1.2
    dwn_airflow-init_1 exited with code 0
    

  2. Very detailed analysis. Great to see someone taking that many steps to dig down.

    Everything looks good in your setup and logs, so I do not think the problem is with docker-compose. It must be a problem with your environment.

    I noticed one thing however, and while I am not 100% sure, this might be the reason.

    Your postgres server listens on both IPv4 and IPv6, but your docker-compose network only shows IPv4 addresses.

    My hypothesis is that while you have IPv6 enabled for the Docker engine, it is disabled (or misconfigured) for the docker-compose network.

    What would then happen is that the postgres address gets resolved over IPv6 and the lookup hangs on the misconfigured DNS, hence the timeout.

    You can likely set ipv6 to true (https://docs.docker.com/config/daemon/ipv6/) in /etc/docker/daemon.json and restart the daemon:

    {
      "ipv6": true,
      "fixed-cidr-v6": "2001:db8:1::/64"
    }
    