I have a strange issue which I can’t seem to narrow down. Worse still is that it’s intermittent.
My requirements:
I am running app1 and app2 in a Jenkins job. app1 is created first; I then get its Docker container IP, put that into the environment variables file, and spin up app2.
The issue is that every now and then, app2 will complain that app1 is not reachable. See error below from the application:
Failed to open TCP connection to 172.19.0.2:3006 (Connection refused - connect(2) for "172.19.0.2" port 3006
Here is my Jenkins job:
#!/bin/bash -e
#Create docker network so app1 can talk to app2
docker network create mynetwork
#Run app1
docker run -d --name app1 --network=mynetwork -p 3006:3006 --env-file .app1vars 12345678.dkr.ecr.us-east-1.amazonaws.com/app1:latest docker/start-server.sh
#Update app2 env vars
newip=$(docker container inspect -f '{{ .NetworkSettings.Networks.mynetwork.IPAddress }}' app1)
sed -i "s/1.1.1.1/$newip/" .app1vars #1.1.1.1 is dummy IP
#Run app2
docker run -d --name app2 --network=mynetwork -p 3003 --env-file .app1vars 12345678.dkr.ecr.us-east-1.amazonaws.com/app2:latest docker/start-server.sh
docker exec app2 rake sometask
One thing to note: we use the Jenkins EC2 plugin, which spins up an EC2 server in AWS for each Jenkins job. I thought the server spinning up might be causing the issue somehow, but that is not the case either. We also have hundreds of other jobs that run without any issues.
Troubleshooting steps I have performed:
- Confirmed this is an intermittent issue as I can run this successfully most of the time
- Ensured app1 and app2 are running
- Confirmed that the newip is correct
Any ideas how I go about this?
2 Answers
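"Ensured app1 and app2 are running": you’re not doing anything in particular to verify that app1 is healthy; you only confirm that you can inspect it and get its IP address. The simplest guess is that app1 occasionally hasn’t finished starting, and so isn’t yet listening on port 3006, when app2 attempts to connect. A "Connection refused" error is consistent with that: the address is reachable, but nothing is accepting connections on the port yet. I would suggest using a health check on app1 and waiting for it to be healthy before starting app2.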
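The health-check suggestion above could be sketched roughly as follows. This assumes app1 is started with Docker’s `--health-cmd` option (or that its image defines a `HEALTHCHECK`); the function name `wait_healthy` and the timeout value are illustrative only, and the actual health command depends on what app1 exposes.

```shell
#!/bin/bash
# Poll a container's health status until it is "healthy", it reports
# "unhealthy", or the timeout (in seconds) expires.
# Usage: wait_healthy <container> <timeout_seconds>
# Assumption: the container has a health check configured, e.g. it was
# started with: docker run --health-cmd '<command that exits 0 when ready>' ...
wait_healthy() {
  local container=$1 timeout=$2 status
  local deadline=$((SECONDS + timeout))
  while ((SECONDS < deadline)); do
    # The template fails if no health check is configured, so fall back
    # to "unknown" instead of aborting a script that runs under "bash -e".
    status=$(docker inspect -f '{{ .State.Health.Status }}' "$container" 2>/dev/null) || status=unknown
    case $status in
      healthy) return 0 ;;
      unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
    esac
    sleep 1
  done
  echo "$container not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```

In the Jenkins job this would sit between the two `docker run` commands, for example `wait_healthy app1 60 || exit 1`, so that app2 is only started once app1 reports healthy.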
Please check the suggestions below and see whether any of them helps.
Problem: Your setup occasionally faces an issue where app2 complains that it can’t connect to app1, even though most of the time it works fine.
Possible Cause and Solution:
Timing Issue: Sometimes, app2 might try to connect to app1 before app1 is fully ready to accept connections.
Solution: Add a small delay before starting app2 after starting app1. You can use the sleep command to pause the script for a few seconds.
# After starting app1
sleep 5 # Wait for 5 seconds
# Start app2
Network Update Delay: Occasionally, the IP update might not propagate to app2 in time, causing it to use the old IP.
Solution: Double-check that the IP update is happening consistently before starting app2. You could print out the updated IP to ensure it’s correct.
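One way to make that check automatic rather than visual is to validate the value before it is written into the env file. The sketch below is only an illustration; `is_ipv4` is a hypothetical helper name, and it checks the shape of the string, not whether each octet is in the 0-255 range.

```shell
#!/bin/bash
# Return 0 if the argument looks like a dotted-quad IPv4 address.
# This only checks the format (four dot-separated groups of 1-3 digits).
is_ipv4() {
  [[ $1 =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]
}

# Hypothetical use in the job, right after $newip is assigned:
#   echo "app1 IP: '$newip'"
#   is_ipv4 "$newip" || { echo "unexpected app1 IP: '$newip'" >&2; exit 1; }
```

Failing the job early on an empty or malformed value makes it easier to tell an IP lookup problem apart from a startup timing problem.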
Docker Networking: Docker networking can be finicky, causing intermittent connection issues.
Solution: Test the connectivity between containers using simple tools like ping or curl from within the app2 container to the IP of app1. If there are issues, it might be related to Docker’s networking setup.
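A connectivity test of this kind might look like the sketch below. Since ping or curl may not be installed in the image, it assumes Ruby is available inside the container (the job already runs rake there) and uses Ruby’s standard `TCPSocket` class; `can_reach` is a hypothetical helper name.

```shell
#!/bin/bash
# Check from inside a container whether host:port accepts TCP connections.
# Usage: can_reach <container> <host> <port>
# Assumption: the container image has a "ruby" executable on its PATH.
can_reach() {
  local container=$1 host=$2 port=$3
  docker exec "$container" ruby -rsocket \
    -e 'TCPSocket.new(ARGV[0], ARGV[1].to_i).close' "$host" "$port"
}

# Hypothetical use in the job, after app2 has been started:
#   can_reach app2 "$newip" 3006 && echo "app1 reachable from app2"
```

On a user-defined network such as mynetwork, containers can also resolve each other by name, so `can_reach app2 app1 3006` is worth trying as well; if that works, the IP substitution step may not be needed at all.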
Logging and Debugging:
Solution: Add more logs and diagnostics to both app1 and app2 containers. These logs can help you understand if there are any issues during startup or runtime.
Retry Mechanism:
Solution: Implement a retry mechanism in app2 to handle cases where the connection to app1 fails. This can improve the system’s resilience to intermittent issues.
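The suggestion above is to retry inside app2 itself. As a stopgap at the job level, a retry wrapper with a doubling delay can be sketched in the shell script instead; this is a different technique from an in-application retry, `retry_with_backoff` is a hypothetical helper name, and it assumes the wrapped command is safe to run more than once.

```shell
#!/bin/bash
# Run a command, retrying on failure with a delay that doubles each time.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if ((attempt >= max_attempts)); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Hypothetical use in the job:
#   retry_with_backoff 5 1 docker exec app2 rake sometask
```

A wrapper like this hides the race rather than removing it, so it is best combined with an explicit readiness check on app1.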
External Factors:
Solution: While other jobs are working fine, external factors can still occasionally affect your setup. Keep an eye on AWS-related events or network conditions that might impact your containers.
Remember to apply these solutions one at a time and test thoroughly after each change to identify which one helps mitigate the intermittent issue.