I’m trying to set up an Airflow instance using docker-compose as described in the official docs, and I’m stuck at the airflow-init step. It looks like there is no connectivity between the containers, but I don’t know how to fix it.
I use exactly the same docker-compose.yaml as described in the docs. It can be downloaded here: https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml
Currently, I see this in my shell:
~/dwn $ docker-compose up airflow-init
51ad8448b197_dwn_redis_1 is up-to-date
70409dec742c_dwn_postgres_1 is up-to-date
Starting dwn_airflow-init_1 ... done
Attaching to dwn_airflow-init_1
airflow-init_1 | BACKEND=postgresql+psycopg2
airflow-init_1 | DB_HOST=postgres
airflow-init_1 | DB_PORT=5432
The docs say I should see something like this:
airflow-init_1 | Upgrades done
airflow-init_1 | Admin user airflow created
airflow-init_1 | 2.1.2
start_airflow-init_1 exited with code 0
but the command just hangs and never exits. htop shows that netcat is running inside the container and is trying to connect to postgres:
nc -zvvn 172.19.0.3 5432
curl shows a timeout:
~/dwn $ docker exec -it dwn_airflow-init_1 curl postgres:5432
curl: (7) Failed to connect to postgres port 5432: Connection timed out
Why does it hang?
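A timeout, a refused connection, and a failed name lookup each point to a different cause, so it can help to classify the failure explicitly. The following is a minimal sketch, not part of the official Airflow setup: the function name `probe` and the 3-second default timeout are assumptions chosen for illustration.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The name could not be resolved at all.
        return "dns_error"
    except (socket.timeout, TimeoutError):
        # No answer came back; packets were probably dropped on the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing listens on that port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. "no route to host".
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
```

Calling `probe("postgres", 5432)` inside the airflow-init container would be expected to report `timeout` for the situation described here, whereas a stopped database would more likely show up as `refused`.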
I tried a few things to fix this:
- I tried setting the ports option in the postgres service to 5432:5432: no effect
- I tried setting the links option: no effect
- Another question suggested system entropy is too low: no, there is plenty of entropy
- There is enough free RAM, CPU, and disk space
- I tried setting the network like in this answer: it is even worse, container names aren’t resolved:
  ~/dwn $ docker exec -it dwn_airflow-init_1 curl postgres:5432
  curl: (6) Could not resolve host: postgres
- I tried resetting iptables as suggested in this answer: no effect
Some system info:
- OS: Arch Linux
- docker version: 20.10.7, build f0df35096d
- docker-compose version: 1.29.2
Logs! (as requested by @larsks)
~/dwn $ docker-compose ps
Name Command State Ports
-------------------------------------------------------------------------------------------------------------
dwn_airflow-init_1 /usr/bin/dumb-init -- /ent ... Up 8080/tcp
dwn_postgres_1 docker-entrypoint.sh postgres Up (healthy) 5432/tcp
dwn_redis_1 docker-entrypoint.sh redis ... Up (healthy) 0.0.0.0:6379->6379/tcp,:::6379->6379/tcp
~/dwn $ docker-compose logs postgres
Attaching to dwn_postgres_1
postgres_1 | The files belonging to this database system will be owned by user "postgres".
postgres_1 | This user must also own the server process.
postgres_1 |
postgres_1 | The database cluster will be initialized with locale "en_US.utf8".
postgres_1 | The default database encoding has accordingly been set to "UTF8".
postgres_1 | The default text search configuration will be set to "english".
postgres_1 |
postgres_1 | Data page checksums are disabled.
postgres_1 |
postgres_1 | fixing permissions on existing directory /var/lib/postgresql/data ... ok
postgres_1 | creating subdirectories ... ok
postgres_1 | selecting dynamic shared memory implementation ... posix
postgres_1 | selecting default max_connections ... 100
postgres_1 | selecting default shared_buffers ... 128MB
postgres_1 | selecting default time zone ... Etc/UTC
postgres_1 | creating configuration files ... ok
postgres_1 | running bootstrap script ... ok
postgres_1 | performing post-bootstrap initialization ... ok
postgres_1 | initdb: warning: enabling "trust" authentication for local connections
postgres_1 | You can change this by editing pg_hba.conf or using the option -A, or
postgres_1 | --auth-local and --auth-host, the next time you run initdb.
postgres_1 | syncing data to disk ... ok
postgres_1 |
postgres_1 |
postgres_1 | Success. You can now start the database server using:
postgres_1 |
postgres_1 | pg_ctl -D /var/lib/postgresql/data -l logfile start
postgres_1 |
postgres_1 | waiting for server to start....2021-07-17 07:31:38.491 UTC [47] LOG: starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
postgres_1 | 2021-07-17 07:31:38.493 UTC [47] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1 | 2021-07-17 07:31:38.499 UTC [48] LOG: database system was shut down at 2021-07-17 07:31:35 UTC
postgres_1 | 2021-07-17 07:31:38.521 UTC [47] LOG: database system is ready to accept connections
postgres_1 | done
postgres_1 | server started
postgres_1 | CREATE DATABASE
postgres_1 |
postgres_1 |
postgres_1 | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
postgres_1 |
postgres_1 | 2021-07-17 07:31:39.613 UTC [47] LOG: received fast shutdown request
postgres_1 | waiting for server to shut down....2021-07-17 07:31:39.615 UTC [47] LOG: aborting any active transactions
postgres_1 | 2021-07-17 07:31:39.616 UTC [47] LOG: background worker "logical replication launcher" (PID 54) exited with exit code 1
postgres_1 | 2021-07-17 07:31:39.616 UTC [49] LOG: shutting down
postgres_1 | 2021-07-17 07:31:39.644 UTC [47] LOG: database system is shut down
postgres_1 | done
postgres_1 | server stopped
postgres_1 |
postgres_1 | PostgreSQL init process complete; ready for start up.
postgres_1 |
postgres_1 | 2021-07-17 07:31:39.741 UTC [1] LOG: starting PostgreSQL 13.3 (Debian 13.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
postgres_1 | 2021-07-17 07:31:39.741 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
postgres_1 | 2021-07-17 07:31:39.741 UTC [1] LOG: listening on IPv6 address "::", port 5432
postgres_1 | 2021-07-17 07:31:39.748 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1 | 2021-07-17 07:31:39.756 UTC [75] LOG: database system was shut down at 2021-07-17 07:31:39 UTC
postgres_1 | 2021-07-17 07:31:39.781 UTC [1] LOG: database system is ready to accept connections
postgres_1 | 2021-07-17 07:33:49.955 UTC [79] LOG: using stale statistics instead of current ones because stats collector is not responding
postgres_1 | 2021-07-17 07:34:00.040 UTC [79] LOG: using stale statistics instead of current ones because stats collector is not responding
postgres_1 | 2021-07-17 07:34:00.049 UTC [235] LOG: using stale statistics instead of current ones because stats collector is not responding
postgres_1 | 2021-07-17 07:34:10.141 UTC [79] LOG: using stale statistics instead of current ones because stats collector is not responding
When I edit the postgres service to make it accessible from the host (ports option), I can see it really is there:
~/dwn $ pg_isready -h localhost -p 5432
localhost:5432 - accepting connections
Here is what the network created by docker-compose looks like:
[
{
"Name": "dwn_default",
"Id": "8c4e4ab1629cd7d2cb5d532e28b0837a11bc3516ba094248294e5d734a69dc11",
"Created": "2021-07-17T10:15:50.694208715+02:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.19.0.0/16",
"Gateway": "172.19.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"2c6dd1bcd0d81740ab17ff7816acd983ff053be2a8f886ef281b3e5ec1ec642b": {
"Name": "dwn_airflow-init_1",
"EndpointID": "945c9bd23ffb52bdee7ae9fdf32f48be623ac73cd60a5b248f919fce6aede366",
"MacAddress": "02:42:ac:13:00:04",
"IPv4Address": "172.19.0.4/16",
"IPv6Address": ""
},
"3a79a194d97e491c75e573fa78492c9d4f73efd4d868e709c20eb23c9a0ff2a6": {
"Name": "dwn_postgres_1",
"EndpointID": "b3245b8ab82edc78b205485cd39c368881d7c7b2bc29f325fd3f6f6d8605d9c1",
"MacAddress": "02:42:ac:13:00:03",
"IPv4Address": "172.19.0.3/16",
"IPv6Address": ""
},
"dd023f1d42be72d967c5045b7be29deca88caf99377e7d144c51f2212059cefa": {
"Name": "dwn_redis_1",
"EndpointID": "f85a6cd841028efb7fab17e40f814b0d9de300e90f9506df373d973695a38d97",
"MacAddress": "02:42:ac:13:00:02",
"IPv4Address": "172.19.0.2/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {
"com.docker.compose.network": "default",
"com.docker.compose.project": "dwn",
"com.docker.compose.version": "1.29.2"
}
}
]
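To cross-check the address that netcat is trying to reach against what Docker has assigned, the inspect output can be reduced to a name-to-address map. This is a small sketch that assumes the JSON above came from `docker network inspect dwn_default`; the function name `container_addresses` is made up for illustration.

```python
import ipaddress
import json


def container_addresses(inspect_output: str) -> dict:
    """Map container name -> IPv4 address for every network in the JSON."""
    result = {}
    for network in json.loads(inspect_output):
        containers = network.get("Containers") or {}
        for info in containers.values():
            ipv4 = info.get("IPv4Address", "")
            if ipv4:
                # "172.19.0.3/16" -> "172.19.0.3"
                result[info["Name"]] = str(ipaddress.ip_interface(ipv4).ip)
    return result
```

For the output above, this would map dwn_postgres_1 to 172.19.0.3, the same address that appears in the hanging nc command, which suggests that name resolution itself is fine on this network.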
@jarek-potiuk suggested I should check the IPv6 configuration. It still doesn’t work, but I got some errors this time. Here is what I did:
I created /etc/docker/daemon.json with the following content:
{
"ipv6": true,
"fixed-cidr-v6": "2001:db8:1::/64"
}
This caused the following error (after daemon restart):
could not find an available, non-overlapping IPv6 address pool among the defaults to assign to the network
This error can be fixed by setting network_mode: bridge for every service in the compose file, and now my services have IPv6 addresses:
[
{
"Name": "bridge",
"Id": "092767c3c4137429a7caaa85a1b87c7cb977c4f02055624fa84c4d586ed9758f",
"Created": "2021-07-17T14:42:08.353393246+02:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": true,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.17.0.0/16",
"Gateway": "172.17.0.1"
},
{
"Subnet": "2001:db8:1::/64",
"Gateway": "2001:db8:1::1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"964c9edadb8f7eb757cd7f1296c2af154ab407ef4d9872f8e613f61d64d6a443": {
"Name": "dwn_postgres_1",
"EndpointID": "a19bd83ff487611e78074eddafbca18e545edcf9ddc9d7851d3b6d68b7962419",
"MacAddress": "02:42:ac:11:00:02",
"IPv4Address": "172.17.0.2/16",
"IPv6Address": "2001:db8:1::242:ac11:2/64"
},
"b45ca546c1539f5f0f1d76423bd4f071efed2e3d6e118b8811e3fd28164fab5a": {
"Name": "dwn_airflow-init_1",
"EndpointID": "3a2fc42dfda6a534b6840971f4b11af9c78aac2253a036f46721ed6e5659f7b9",
"MacAddress": "02:42:ac:11:00:04",
"IPv4Address": "172.17.0.4/16",
"IPv6Address": "2001:db8:1::242:ac11:4/64"
},
"f140d9c90c24fca254e34aec549b559ec5f82bc8b14537e7249192e604110d53": {
"Name": "dwn_redis_1",
"EndpointID": "1c26f7afa8ada58626b67e7446347e1c4d540513df72784addcf334f99fd53d1",
"MacAddress": "02:42:ac:11:00:03",
"IPv4Address": "172.17.0.3/16",
"IPv6Address": "2001:db8:1::242:ac11:3/64"
}
},
"Options": {
"com.docker.network.bridge.default_bridge": "true",
"com.docker.network.bridge.enable_icc": "true",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
"com.docker.network.bridge.name": "docker0",
"com.docker.network.driver.mtu": "1500"
},
"Labels": {}
}
]
but there is another problem – name resolution stopped working:
~/dwn $ docker-compose up airflow-init
dwn_postgres_1 is up-to-date
dwn_redis_1 is up-to-date
Starting dwn_airflow-init_1 ... done
Attaching to dwn_airflow-init_1
airflow-init_1 | BACKEND=postgresql+psycopg2
airflow-init_1 | DB_HOST=postgres
airflow-init_1 | DB_PORT=5432
airflow-init_1 | ....................
airflow-init_1 | ERROR! Maximum number of retries (20) reached.
airflow-init_1 |
airflow-init_1 | Last check result:
airflow-init_1 | $ run_nc 'postgres' '5432'
airflow-init_1 | Traceback (most recent call last):
airflow-init_1 | File "<string>", line 1, in <module>
airflow-init_1 | socket.gaierror: [Errno -3] Temporary failure in name resolution
airflow-init_1 | Can't parse as an IP address
airflow-init_1 |
dwn_airflow-init_1 exited with code 1
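The name-resolution failure in this log can be examined separately from the connection attempt. The sketch below lists which address families a name resolves to, which also shows whether an IPv6 address is being handed out; the function name `resolve` is an assumption for illustration, not something the Airflow entrypoint provides.

```python
import socket


def resolve(host: str, port: int) -> list:
    """Return sorted (family, address) pairs for host, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Same error class as in the traceback above.
        return []
    return sorted({(socket.AddressFamily(info[0]).name, info[4][0]) for info in infos})
```

An empty list for `resolve("postgres", 5432)` inside the container would match the gaierror in the log, while a list containing only AF_INET entries would indicate that IPv6 is not involved in the lookup.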
This is actually documented: containers on the default bridge network can only access each other by IP, but access by IP still doesn’t work:
~/dwn $ docker exec -i -t dwn_airflow-init_1 sh -c 'echo "PING" | nc -v 172.17.0.3 6379'
172.17.0.3: inverse host lookup failed: Host name lookup failure
^C
~/dwn $ echo "PING" | ncat -v localhost 6379
Ncat: Version 7.91 ( https://nmap.org/ncat )
Ncat: Connected to ::1:6379.
+PONG
Ncat: 5 bytes sent, 7 bytes received in 0.01 seconds.
I also found that disabling ipv6 at daemon level does not disable ipv6 in containers, so I tried to disable it in postgres container by setting sysctls. It works as expected:
~/dwn $ docker exec -i -t dwn_postgres_1 cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1
but still no network access.
I’m out of ideas at this point.
2 Answers
Well, I figured it out myself.
TL;DR: PEBKAC - the user misconfigured the firewall and forgot he had told the kernel to drop forwarded packets
Let's start from the very beginning: docker-compose up airflow-init prints just this and waits for something:
Maybe host postgres points to a weird place:
Not really, it looks like any other docker IP, but this netcat invocation still hangs:
That means the postgres service did not respond at all to the airflow-init container. However, postgres responded to requests coming from the host system. That means there is no route between postgres and airflow even though they are in the same network. Maybe the kernel drops forwarded packets?
Forwarding is enabled. Maybe the firewall drops them?
It looks like they can get through. Maybe docker is somehow broken? Reinstalled, restarted, the same happens.
Maybe iptables is somehow broken?
Oh. I have nftables installed. That's weird. How is my firewall actually managed?
And... what does the forward chain look like?
Hmmm...
Now, what if I delete that chain:
Suddenly, airflow starts printing a lot in the second terminal. Goal achieved:
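The two checks at the heart of this diagnosis, whether the kernel forwards packets and whether an nftables chain drops forwarded traffic, can be sketched as follows. These are not the commands used in this answer; the function names are hypothetical, and the parser assumes the usual text layout printed by `nft list ruleset`, where a base chain declares its hook and policy on one line.

```python
import re
from pathlib import Path


def ipv4_forwarding_enabled(path: str = "/proc/sys/net/ipv4/ip_forward") -> bool:
    """True when the kernel is allowed to forward IPv4 packets."""
    return Path(path).read_text().strip() == "1"


def forward_chains_with_drop_policy(ruleset: str) -> list:
    """Names of nftables chains hooked into forward whose policy is drop."""
    dropped = []
    current_chain = None
    for line in ruleset.splitlines():
        chain = re.match(r"\s*chain\s+(\S+)\s*\{", line)
        if chain:
            # Remember which chain the following lines belong to.
            current_chain = chain.group(1)
        elif current_chain and re.search(r"hook\s+forward\b.*policy\s+drop\b", line):
            dropped.append(current_chain)
    return dropped
```

Feeding the output of `nft list ruleset` (which typically requires root) into `forward_chains_with_drop_policy` would flag a chain like the one deleted above. The sketch only looks at the chain policy, so an explicit drop rule inside a chain whose policy is accept would go unnoticed.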
Very detailed analysis. Great to see someone taking that many steps to dig down.
Everything looks good in your setup and logs, so I do not think the problem is with docker-compose; it must be a problem with your environment.
I noticed one thing however, and while I am not 100% sure, this might be the reason.
I noticed that your postgres server listens on both IPV4 and IPV6 networks, however your docker-compose networks only show IPV4 addresses.
My hypothesis is that while you have IPv6 enabled for the Docker engine, it is disabled (or misconfigured) for the docker-compose network.
What would then happen is that resolving the postgres address over IPv6 hangs while retrieving the address via misconfigured DNS, hence the timeout.
You can likely set ipv6 to false (https://docs.docker.com/config/daemon/ipv6/) in /etc/docker/daemon.json and restart the daemon: