I’ve been using this docker compose configuration for a couple of years in production now, and it’s been fine until randomly crashing twice in the past week…
version: "3.7"
services:
  web:
    image: backend
    build: ..
    restart: unless-stopped
    expose:
      - 8000
    ports:
      - 9109:8000
    env_file:
      - &ENV_FILE ../.env
    depends_on:
      - db
      - worker
    volumes:
      - &MEDIA_VOLUME /srv/media/:/srv/media
      - &STATIC_VOLUME /srv/static/:/srv/static
      - &TMP_VOLUME /tmp/:/tmp/host/
    logging:
      driver: journald
      options:
        tag: docker-web
  worker:
    image: backend
    environment:
      - REMAP_SIGTERM=SIGQUIT
    command: /usr/bin/start-worker.sh
    restart: unless-stopped
    env_file:
      - *ENV_FILE
    depends_on:
      - db
      - redis
      - rabbitmq
    volumes:
      - *MEDIA_VOLUME
      - *STATIC_VOLUME
      - *TMP_VOLUME
    logging:
      driver: journald
      options:
        tag: docker-worker
  db:
    image: mdillon/postgis:11
    shm_size: '256m'
    restart: unless-stopped
    env_file:
      - *ENV_FILE
    volumes:
      - /var/docker-postgres/:/var/lib/postgresql/data/
      - *TMP_VOLUME
    logging:
      driver: journald
      options:
        tag: docker-db
  memcached:
    container_name: memcached
    image: memcached:latest
    ports:
      - "11211:11211"
  rabbitmq:
    image: rabbitmq:management
    ports:
      - 5672:5672
      - 15672:15672
  redis:
    image: redis:latest
    expose:
      - 6379
Suddenly last week I started seeing errors from the web process:
could not translate host name "db" to address: Name or service not known
Error -2 connecting to redis:6379. Name or service not known.
When I checked on the processes they all seemed to be running:
$ docker-compose ps
       Name                    Command               State     Ports
---------------------------------------------------------------------------------------------
docker_db_1         docker-entrypoint.sh postgres    Up        5432/tcp
docker_rabbitmq_1   docker-entrypoint.sh rabbi ...   Up        15671/tcp, 0.0.0.0:15672->15672/tcp, :::15672->15672/tcp, 15691/tcp, 15692/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp, :::5672->5672/tcp
docker_redis_1      docker-entrypoint.sh redis ...   Up        6379/tcp
docker_web_1        /bin/sh -c /usr/bin/start.sh     Up        0.0.0.0:9109->8000/tcp, :::9109->8000/tcp
docker_worker_1     /usr/bin/start-worker.sh         Up
memcached           docker-entrypoint.sh memcached   Up        0.0.0.0:11211->11211/tcp, :::11211->11211/tcp
However, it seems the containers were unable to communicate, as these errors continued indefinitely until I stopped the containers and started them again. Then everything was fine for a few days, until it suddenly happened again without any apparent cause…
Any ideas what might be happening?!
$ docker -v
Docker version 24.0.1, build 6802122
$ uname -a
Linux redacted 3.10.0-1160.53.1.el7.x86_64 #1 SMP Fri Jan 14 13:59:45 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ docker-compose --version
docker-compose version 1.24.0, build 0aa59064
/etc/resolv.conf in the web container:
nameserver 127.0.0.11
options ndots:0
UPDATE

OK, I found some clues in /var/log/messages at the time of the latest crash (this morning):
# still working at this point
May 27 05:02:07 my-hostname yum[31556]: Updated: docker-buildx-plugin.x86_64 0.10.5-1.el7
May 27 05:02:09 my-hostname yum[31556]: Updated: docker-ce-cli.x86_64 1:24.0.2-1.el7
May 27 05:02:10 my-hostname yum[31556]: Updated: docker-ce-rootless-extras.x86_64 24.0.2-1.el7
May 27 05:02:16 my-hostname yum[31556]: Updated: docker-ce.x86_64 3:24.0.2-1.el7
May 27 05:02:16 my-hostname systemd: Reloading.
May 27 05:02:17 my-hostname systemd: Stopping Docker Application Container Engine...
May 27 05:02:17 my-hostname dockerd: time="2023-05-27T05:02:17.087647049+10:00" level=info msg="Processing signal 'terminated'"
May 27 05:02:17 my-hostname dockerd: time="2023-05-27T05:02:17.113911997+10:00" level=info msg="Daemon shutdown complete"
May 27 05:02:17 my-hostname dockerd: time="2023-05-27T05:02:17.117216292+10:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
May 27 05:02:17 my-hostname dockerd: time="2023-05-27T05:02:17.117728466+10:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=moby
May 27 05:02:17 my-hostname systemd: Stopped Docker Application Container Engine.
May 27 05:02:17 my-hostname systemd: Starting Docker Application Container Engine...
May 27 05:02:17 my-hostname dockerd: time="2023-05-27T05:02:17.301708572+10:00" level=info msg="Starting up"
May 27 05:02:18 my-hostname dockerd: time="2023-05-27T05:02:18.895994520+10:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
May 27 05:02:30 my-hostname dockerd: time="2023-05-27T05:02:30.709737036+10:00" level=info msg="Loading containers: start."
May 27 05:02:30 my-hostname dockerd: time="2023-05-27T05:02:30.754470385+10:00" level=error msg="stream copy error: reading from a closed fifo"
May 27 05:02:30 my-hostname dockerd: time="2023-05-27T05:02:30.756943164+10:00" level=error msg="stream copy error: reading from a closed fifo"
May 27 05:02:30 my-hostname dockerd: time="2023-05-27T05:02:30.798420878+10:00" level=info msg="ignoring event" container=562f53739a6c564fb7ca240a68de87489c5132f513977ae53012ecba752d90c4 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 27 05:02:30 my-hostname containerd: time="2023-05-27T05:02:30.798418353+10:00" level=info msg="shim disconnected" id=562f53739a6c564fb7ca240a68de87489c5132f513977ae53012ecba752d90c4
May 27 05:02:30 my-hostname containerd: time="2023-05-27T05:02:30.798866207+10:00" level=warning msg="cleaning up after shim disconnected" id=562f53739a6c564fb7ca240a68de87489c5132f513977ae53012ecba752d90c4 namespace=moby
May 27 05:02:30 my-hostname containerd: time="2023-05-27T05:02:30.798950034+10:00" level=info msg="cleaning up dead shim"
May 27 05:02:30 my-hostname containerd: time="2023-05-27T05:02:30.827844408+10:00" level=warning msg="cleanup warnings time="2023-05-27T05:02:30+10:00" level=info msg="starting signal loop" namespace=moby pid=22533 runtime=io.containerd.runc.v2n"
May 27 05:02:30 my-hostname dockerd: time="2023-05-27T05:02:30.924708741+10:00" level=info msg="Firewalld: docker zone already exists, returning"
May 27 05:02:31 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER' failed: iptables: No chain/target/match by that name.
May 27 05:02:31 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t nat -D PREROUTING' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May 27 05:02:31 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t nat -D OUTPUT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May 27 05:02:31 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER' failed: iptables: Too many links.
May 27 05:02:31 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION-STAGE-1' failed: iptables: Too many links.
May 27 05:02:31 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -F DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
May 27 05:02:31 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
May 27 05:02:31 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -D FORWARD -i br-7630a2794dac -o br-7630a2794dac -j DROP' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May 27 05:02:31 my-hostname dockerd: time="2023-05-27T05:02:31.454941777+10:00" level=info msg="Firewalld: interface br-7630a2794dac already part of docker zone, returning"
May 27 05:02:31 my-hostname dockerd: time="2023-05-27T05:02:31.522751685+10:00" level=info msg="Firewalld: interface br-7630a2794dac already part of docker zone, returning"
May 27 05:02:31 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -D FORWARD -i docker0 -o docker0 -j DROP' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May 27 05:02:31 my-hostname dockerd: time="2023-05-27T05:02:31.780958499+10:00" level=info msg="Firewalld: interface docker0 already part of docker zone, returning"
May 27 05:02:31 my-hostname dockerd: time="2023-05-27T05:02:31.844234522+10:00" level=info msg="Firewalld: interface docker0 already part of docker zone, returning"
May 27 05:02:32 my-hostname dockerd: time="2023-05-27T05:02:32.460772601+10:00" level=error msg="failed to populate fields for osl sandbox 9a40c0210cd412288d7c33eb61bad40530ef9ec48f6701bd1e7184c13fa64d3c"
May 27 05:02:32 my-hostname dockerd: time="2023-05-27T05:02:32.485507497+10:00" level=error msg="failed to populate fields for osl sandbox cd0489e190d48bd6f4e1361ffb1ef948c5b62695f1c3cc08c0793a12a36e0a70"
May 27 05:02:32 my-hostname dockerd: time="2023-05-27T05:02:32.511399464+10:00" level=error msg="failed to populate fields for osl sandbox cf16beecf54d69628a896ed823c890010dc59a288545514b3caab4288b96bbd3"
May 27 05:02:32 my-hostname dockerd: time="2023-05-27T05:02:32.538845672+10:00" level=error msg="failed to populate fields for osl sandbox d9d36cdee5fed3740df123decea7d79f189f42de8879500e36c5ce96695603cb"
May 27 05:02:32 my-hostname dockerd: time="2023-05-27T05:02:32.567477583+10:00" level=error msg="failed to populate fields for osl sandbox 6735141218e84ebd4f0dc0bfe178c926eb4aedd878cc957463d44506d2e55e83"
May 27 05:02:32 my-hostname dockerd: time="2023-05-27T05:02:32.597308395+10:00" level=error msg="failed to populate fields for osl sandbox 74703e4f1bd0cc3d22c0eac14e46c3794a8daef00727f775dd4c4389ee728875"
May 27 05:02:32 my-hostname kernel: br-7630a2794dac: port 8(vethdba34b5) entered disabled state
May 27 05:02:32 my-hostname kernel: device vethdba34b5 left promiscuous mode
May 27 05:02:32 my-hostname kernel: br-7630a2794dac: port 8(vethdba34b5) entered disabled state
May 27 05:02:32 my-hostname NetworkManager[708]: <info> [1685127752.6322] device (vethdba34b5): released from master device br-7630a2794dac
May 27 05:02:33 my-hostname dockerd: time="2023-05-27T05:02:33.030110748+10:00" level=info msg="Removing stale sandbox 8c4bad304b8e2c68b55d49ae928839050b9e6fc20caf369c6d3fae18f2f22f89 (562f53739a6c564fb7ca240a68de87489c5132f513977ae53012ecba752d90c4)"
May 27 05:02:33 my-hostname dockerd: time="2023-05-27T05:02:33.038984969+10:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 7630a2794dacc176f1eca74b359658aa646fc256cf8189af6a0963e182e8f85f 4f0d21ab87e2c9f5bcca66178add3e2bd787af62be9be0bc1ef21e41d1ddab6e], retrying...."
May 27 05:02:33 my-hostname dockerd: time="2023-05-27T05:02:33.055058812+10:00" level=error msg="failed to populate fields for osl sandbox 945e03bae7b186562c5a3d2993f8d2b4d6fcd1ddea6277743aaac5c20dd26b50"
May 27 05:02:33 my-hostname dockerd: time="2023-05-27T05:02:33.055937650+10:00" level=info msg="there are running containers, updated network configuration will not take affect"
May 27 05:02:33 my-hostname dockerd: time="2023-05-27T05:02:33.058198587+10:00" level=info msg="Loading containers: done."
May 27 05:02:33 my-hostname dockerd: time="2023-05-27T05:02:33.163904779+10:00" level=info msg="Docker daemon" commit=659604f graphdriver=overlay2 version=24.0.2
May 27 05:02:33 my-hostname dockerd: time="2023-05-27T05:02:33.164168658+10:00" level=info msg="Daemon has completed initialization"
May 27 05:02:33 my-hostname dockerd: time="2023-05-27T05:02:33.213072781+10:00" level=info msg="API listen on /var/run/docker.sock"
May 27 05:02:33 my-hostname systemd: Started Docker Application Container Engine.
# now failing
And the previous failure:
May 21 07:05:59 my-hostname yum[7323]: Updated: docker-compose-plugin.x86_64 2.18.1-1.el7
May 21 07:06:00 my-hostname yum[7323]: Updated: docker-ce-cli.x86_64 1:24.0.1-1.el7
May 21 07:06:01 my-hostname yum[7323]: Updated: docker-ce-rootless-extras.x86_64 24.0.1-1.el7
# working at this point
May 21 07:06:07 my-hostname yum[7323]: Updated: docker-ce.x86_64 3:24.0.1-1.el7
May 21 07:06:07 my-hostname systemd: Reloading.
May 21 07:06:07 my-hostname systemd: Stopping Docker Application Container Engine...
May 21 07:06:07 my-hostname dockerd: time="2023-05-21T07:06:07.434010268+10:00" level=info msg="Processing signal 'terminated'"
May 21 07:06:07 my-hostname dockerd: time="2023-05-21T07:06:07.462825347+10:00" level=info msg="Daemon shutdown complete"
May 21 07:06:07 my-hostname dockerd: time="2023-05-21T07:06:07.463607568+10:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=moby
May 21 07:06:07 my-hostname dockerd: time="2023-05-21T07:06:07.463903566+10:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
May 21 07:06:07 my-hostname systemd: Stopped Docker Application Container Engine.
May 21 07:06:07 my-hostname systemd: Starting Docker Application Container Engine...
May 21 07:06:07 my-hostname dockerd: time="2023-05-21T07:06:07.712187157+10:00" level=info msg="Starting up"
May 21 07:06:09 my-hostname dockerd: time="2023-05-21T07:06:09.213598321+10:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
May 21 07:06:21 my-hostname dockerd: time="2023-05-21T07:06:21.320119275+10:00" level=info msg="Loading containers: start."
May 21 07:06:21 my-hostname dockerd: time="2023-05-21T07:06:21.428525472+10:00" level=info msg="Firewalld: docker zone already exists, returning"
May 21 07:06:21 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER' failed: iptables: No chain/target/match by that name.
May 21 07:06:21 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t nat -D PREROUTING' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May 21 07:06:21 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t nat -D OUTPUT' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May 21 07:06:21 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER' failed: iptables: Too many links.
May 21 07:06:21 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION-STAGE-1' failed: iptables: Too many links.
May 21 07:06:21 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -F DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
May 21 07:06:21 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -t filter -X DOCKER-ISOLATION' failed: iptables: No chain/target/match by that name.
May 21 07:06:22 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -D FORWARD -i br-7630a2794dac -o br-7630a2794dac -j DROP' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May 21 07:06:22 my-hostname dockerd: time="2023-05-21T07:06:22.104048740+10:00" level=info msg="Firewalld: interface br-7630a2794dac already part of docker zone, returning"
May 21 07:06:22 my-hostname dockerd: time="2023-05-21T07:06:22.176584026+10:00" level=info msg="Firewalld: interface br-7630a2794dac already part of docker zone, returning"
May 21 07:06:22 my-hostname firewalld[704]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w10 -D FORWARD -i docker0 -o docker0 -j DROP' failed: iptables: Bad rule (does a matching rule exist in that chain?).
May 21 07:06:22 my-hostname dockerd: time="2023-05-21T07:06:22.493966150+10:00" level=info msg="Firewalld: interface docker0 already part of docker zone, returning"
May 21 07:06:22 my-hostname dockerd: time="2023-05-21T07:06:22.587591571+10:00" level=info msg="Firewalld: interface docker0 already part of docker zone, returning"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.228002494+10:00" level=error msg="failed to populate fields for osl sandbox bc76908fce84399c2679fb6ec97763ec2b3ed11cfc61599ae25df14cc99cab81"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.253473513+10:00" level=error msg="failed to populate fields for osl sandbox dfd2cc470474dd5a0dd7f067fc6988621c53e14c86f06be738b55cb248985965"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.277796542+10:00" level=error msg="failed to populate fields for osl sandbox 53adb48ffcd597198cfb549eab5f9b0e34b6b2d565a82a22ea3b7cfe8198e48b"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.303381498+10:00" level=error msg="failed to populate fields for osl sandbox 6f664d964c34d953bc869dd97cb52345e7271b991423f6defc277d57ab1d8d18"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.333007410+10:00" level=error msg="failed to populate fields for osl sandbox 8c780783dc3060797e991c1fa896e894b25faf94a94ca255731928d339a3fca1"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.363884199+10:00" level=error msg="failed to populate fields for osl sandbox 90004d1d46832307ae98d041c56cea879787fca8c8c9d20c62df0978784b707f"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.364885308+10:00" level=error msg="failed to populate fields for osl sandbox 945e03bae7b186562c5a3d2993f8d2b4d6fcd1ddea6277743aaac5c20dd26b50"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.366407341+10:00" level=info msg="there are running containers, updated network configuration will not take affect"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.368298228+10:00" level=info msg="Loading containers: done."
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.480334538+10:00" level=info msg="Docker daemon" commit=463850e graphdriver=overlay2 version=24.0.1
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.480582539+10:00" level=info msg="Daemon has completed initialization"
May 21 07:06:23 my-hostname dockerd: time="2023-05-21T07:06:23.537863940+10:00" level=info msg="API listen on /var/run/docker.sock"
May 21 07:06:23 my-hostname systemd: Started Docker Application Container Engine.
# now failing
So I think the trigger for the issue is the automatic update of the docker packages by yum on CentOS, and the resulting restart of the Docker Application Container Engine, after which the networking breaks down. The message “there are running containers, updated network configuration will not take affect” sounds particularly suspicious!

Aside from turning off automated yum updates, how can I ensure that the container networking won’t break in these situations?
4 Answers
Have you tried with a specific network?
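Something like this (a minimal sketch; the network name app_net is only an example, and each service in the file would be attached to it):

networks:
  app_net:
    driver: bridge

services:
  web:
    networks:
      - app_net
  db:
    networks:
      - app_net
  # ...attach worker, redis, rabbitmq and memcached the same way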
Since the containers are having issues talking to each other, one thing we can test is whether DNS resolution is working or not.

Check if DNS resolution is working correctly within the Docker network. You can run a temporary container and test whether DNS resolution works for the service names, for example as shown below.
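Assuming the compose project is named docker (the container names above, such as docker_db_1, suggest this), the default network would be docker_default:

$ docker run --rm --network docker_default busybox nslookup db
$ docker run --rm --network docker_default busybox nslookup redis

If these fail while the containers are up, the problem is with the embedded DNS server (127.0.0.11) rather than with the individual services.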
-> If you have any DNS caching mechanisms in place (e.g., DNS caching on the host machine or within the containers), ensure that they are not causing conflicts or serving outdated information.

-> As a last resort, restart Docker: if the issue persists and you cannot find any other cause, you can try restarting the Docker service itself, as shown below. This may help resolve networking or DNS-related issues.
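On a systemd host (which your logs show this is):

$ sudo systemctl restart docker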
-> It’s also worth mentioning that updating Docker Compose to the latest version might resolve known issues or bugs. The version you are using is docker-compose 1.24.0, build 0aa59064, and many new versions have been released since then; please check here for the latest versions:
https://github.com/docker/compose/releases
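Incidentally, your yum log above shows docker-compose-plugin 2.18.1 being installed, so Compose v2 may already be available on the host as a CLI plugin; you can check with:

$ docker compose version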
I faced issues myself because of the old docker-compose version used in my application; once I upgraded to the latest one, the issue was fixed. But that issue was not exactly the same as the one you are facing.

I would not suggest testing in the prod environment, but in parallel you can update the docker-compose version, test in dev or QA, and monitor for issues.
I would also suggest defining a network in the docker-compose file so that containers are linked through an explicit network. I am not sure it will resolve this name-resolution issue, but it is something you should be implementing in your services anyway.

Hope this will help.
I have written this as an answer, instead of a comment, for text-formatting reasons.

Can you try to put the following in /etc/resolv.conf? It is what I got when I ran your docker-compose.yml.
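Namely, the standard embedded-DNS settings (an assumption here, matching what the question shows for the web container):

nameserver 127.0.0.11
options ndots:0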
Also, check /etc/resolv.conf in all your containers.
Such issues when upgrading the Docker daemon are expected: on upgrade, the Docker daemon shuts down, and existing containers are restarted by default.

You can configure the Docker daemon to enable Live Restore, so that containers keep running while the daemon is down. Edit or create /etc/docker/daemon.json (the default path for the Docker daemon config on Linux) to specify:
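{
  "live-restore": true
}

With live-restore enabled, containers keep running while the daemon is down, so a daemon restart (such as the one triggered by the yum update in your logs) should no longer tear down container networking. After editing the file, the setting can be applied with a daemon reload, e.g. sudo systemctl reload docker.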
Note, however, that it may not work across a major upgrade (such as 23.x -> 24.x), so you may want to configure automatic updates accordingly.
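For example, one option (a sketch, assuming a stock yum configuration) is to exclude the Docker packages from automatic updates in the [main] section of /etc/yum.conf, and upgrade Docker manually during a maintenance window instead:

[main]
exclude=docker-ce* containerd.io docker-buildx-plugin docker-compose-plugin

The exclude directive applies to yum-cron as well, so unattended updates would skip these packages.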