Suddenly, my Django website has stopped being served over the internet. I have no idea what changed.
When I open the website in a browser, I get a bunch of error messages (screenshot attached). The errors point at the web server (nginx) that hosts my website.
My environment:
Ubuntu 18
Gunicorn
Nginx
Website hosted on AWS (screenshot of inbound/outbound rules attached).
I have checked the output of sudo journalctl -u nginx.service:
Aug 15 04:15:39 primarySNS.schoolnskill.com systemd[1]: Starting A high performance web server and a reverse proxy server...
Aug 15 04:15:39 primarySNS.schoolnskill.com systemd[1]: Started A high performance web server and a reverse proxy server.
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: Stopping A high performance web server and a reverse proxy server...
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: Stopped A high performance web server and a reverse proxy server.
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: Starting A high performance web server and a reverse proxy server...
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: nginx.service: Failed to parse PID from file /run/nginx.pid: Invalid argument
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: Started A high performance web server and a reverse proxy server.
I can see an "Invalid argument" line. I am not sure whether it has anything to do with my situation.
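To separate a line like that from the routine start/stop messages, a small filter along these lines could be used. This is only a sketch: the function name nginx_journal_anomalies is made up for illustration, and the pattern assumes the unit description shown in the journal above.

```shell
# Read journal text on stdin and print only the lines that are NOT the
# routine Starting/Started/Stopping/Stopped lifecycle messages.
# The unit description in the pattern is copied from the journal above.
nginx_journal_anomalies() {
    grep -Ev 'systemd\[[0-9]+\]: (Starting|Started|Stopping|Stopped) A high performance web server'
}
```

It could be fed with sudo journalctl -u nginx.service | nginx_journal_anomalies. In the excerpt above, the only line it would keep is the "Failed to parse PID" one, and the journal shows nginx reported as "Started" immediately after it.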
I have also checked the nginx error log. It is 0 bytes:
-rw-r----- 1 xxx yyy 0 Aug 15 06:25 /var/log/nginx/error.log
The syslog does have some interesting entries:
Aug 15 06:25:01 primarySNS rsyslogd: [origin software="rsyslogd" swVersion="8.32.0" x-pid="920" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Aug 15 06:26:07 primarySNS systemd-timesyncd[600]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Aug 15 06:26:50 primarySNS systemd-networkd[771]: eth0: Configured
Aug 15 06:26:50 primarySNS systemd-timesyncd[600]: Network configuration changed, trying to establish connection.
Aug 15 06:26:50 primarySNS systemd-timesyncd[600]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Aug 15 06:35:01 primarySNS CRON[2678]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 06:45:01 primarySNS CRON[2693]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 06:50:54 primarySNS gunicorn[1432]: Not Found: /robots.txt
Aug 15 06:53:30 primarySNS gunicorn[1432]: Not Found: /profile1/
Aug 15 06:55:01 primarySNS CRON[2718]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 06:56:50 primarySNS systemd-networkd[771]: eth0: Configured
Aug 15 06:56:50 primarySNS systemd-timesyncd[600]: Network configuration changed, trying to establish connection.
Aug 15 06:56:50 primarySNS systemd-timesyncd[600]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Aug 15 07:05:01 primarySNS CRON[2734]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:15:01 primarySNS CRON[2750]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:17:01 primarySNS CRON[2756]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 15 07:25:01 primarySNS CRON[2769]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:26:50 primarySNS systemd-networkd[771]: eth0: Configured
Aug 15 07:26:50 primarySNS systemd-timesyncd[600]: Network configuration changed, trying to establish connection.
Aug 15 07:26:50 primarySNS systemd-timesyncd[600]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Aug 15 07:35:01 primarySNS CRON[2802]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:41:11 primarySNS systemd-resolved[784]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 15 07:41:11 primarySNS systemd-resolved[784]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 15 07:45:01 primarySNS CRON[2852]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: 2020-08-15 07:50:43 INFO Backing off health check to every 3600 seconds for 10800 seconds.
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: 2020-08-15 07:50:43 ERROR Health ping failed with error - EC2RoleRequestError: no EC2 instance role found
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: caused by: EC2MetadataError: failed to make EC2Metadata request
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: #011status code: 404, request id:
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: caused by: <?xml version="1.0" encoding="iso-8859-1"?>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: #011"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: <head>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: <title>404 - Not Found</title>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: </head>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: <body>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: <h1>404 - Not Found</h1>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: </body>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: </html>
Aug 15 07:52:26 primarySNS systemd-resolved[784]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 15 07:52:26 primarySNS systemd-resolved[784]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 15 07:55:01 primarySNS CRON[2870]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Follow-up questions and their answers:
Are you using route53 as the DNS resolver?
Yes
Has your EC2 been stopped and started again, and if so, have you checked that the ip address is still the same?
Yes, but I have made sure that the new IP is updated in Route 53.
Is your ec2 in a public subnet? Can you reach google.com or 8.8.8.8 from the command line on it?
ping google.com
PING google.com (172.217.2.110) 56(84) bytes of data.
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=1 ttl=112 time=1.31 ms
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=2 ttl=112 time=1.29 ms
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=3 ttl=112 time=1.33 ms
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=4 ttl=112 time=1.33 ms
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=5 ttl=112 time=1.34 ms
Is nginx actually listening on the ec2? If you ssh to it and run curl -vvvv http://localhost/ do you actually get a response?
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Sat, 15 Aug 2020 13:56:06 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Fri, 10 Jul 2020 11:16:00 GMT
< Connection: keep-alive
< ETag: "5f084df0-264"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host localhost left intact
What happens when you run curl -vvvv http://(ec2.public.ip.address)/ ?
* Rebuilt URL to: http://<public_ip>/
* Trying <public_ip>...
* TCP_NODELAY set
* Connected to <public_ip> (<public_ip>) port 80 (#0)
> GET / HTTP/1.1
> Host: <public_ip>
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Sat, 15 Aug 2020 13:57:13 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Fri, 10 Jul 2020 11:16:00 GMT
< Connection: keep-alive
< ETag: "5f084df0-264"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host <public_ip> left intact
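Both responses above are the stock "Welcome to nginx!" page, which is what nginx serves when the request's Host header (localhost or the bare IP) does not select the site's own server block. One possible follow-up check, sketched below on the assumption that the Django site is configured as a name-based server block, is to repeat the request with the site's domain as the Host header. The function names and the example.com domain are placeholders, not part of the original setup.

```shell
# Succeeds if the HTML on stdin looks like the stock nginx welcome page.
is_default_nginx_page() {
    grep -q '<title>Welcome to nginx!</title>'
}

# Request / from the local nginx while presenting the given Host header,
# then report whether the stock welcome page came back.
# Anything else (the real site, an error page, no response at all)
# is reported as "non-default".
check_vhost() {
    if curl -s -H "Host: $1" http://localhost/ | is_default_nginx_page; then
        echo "default nginx page served for Host: $1"
    else
        echo "non-default response for Host: $1"
    fi
}
```

Running check_vhost example.com on the instance (with the real domain substituted) would show whether nginx routes that hostname to the application or still falls back to the default page.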
Is your site running at the path or virtual domain you think it is? Has your nginx config perhaps changed?
I didn’t make any changes to my nginx configuration.
What happens if you run curl http://169.254.169.254/latest/meta-data – do you get a response?
curl http://169.254.169.254/latest/meta-data
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
events/
hibernation/
hostname
identity-credentials/
instance-action
instance-id
instance-life-cycle
instance-type
local-hostname
local-ipv4
mac
metrics/
network/
placement/
profile
public-hostname
public-ipv4
public-keys/
reservation-id
security-groups
2 Answers
Without more information this is hard to debug. Things to check:
Are you using route53 as the DNS resolver? Has your EC2 been stopped and started again, and if so, have you checked that the IP address is still the same?
Is your EC2 in a public subnet? Can you reach google.com or 8.8.8.8 from the command line on it?
Is nginx actually listening on the EC2? If you ssh to it and run curl -vvvv http://localhost/ do you actually get a response?
What happens when you run curl -vvvv http://(ec2.public.ip.address)/ ?
As above, what happens with curl -k -vvvv https://(ec2.public.ip.address)/ ?
Is your site running at the path or virtual domain you think it is? Has your nginx config perhaps changed?
What happens if you run curl http://169.254.169.254/latest/meta-data – do you get a response? The ssm agent timeout is curious.
As an aside, your security group egress rules are unnecessarily complex. You can remove the http, https and ssh rules because your all traffic rule already covers them.
Based on the comments.
I went to the OP’s website URL and the website is running. Therefore, there doesn’t seem to be any issue with the EC2 instance or its settings.
Note that the site works only over HTTP, not HTTPS, so attempts to access it using https:// will fail. This could explain why it was not reachable when tested initially.
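The HTTP-versus-HTTPS difference can be checked from any machine with curl. The following is a minimal sketch rather than a definitive test: the function names are made up for illustration, example.com stands in for the real domain, and the 10-second timeout is an arbitrary choice.

```shell
# Print the HTTP status code curl sees for a URL. When no HTTP response
# is received at all (connection refused, timeout, TLS failure), curl's
# %{http_code} write-out is typically 000.
probe_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1" || true
}

# Probe the same host over plain HTTP and over HTTPS, one line each.
compare_schemes() {
    host="$1"
    for scheme in http https; do
        printf '%s://%s/ -> %s\n' "$scheme" "$host" "$(probe_status "$scheme://$host/")"
    done
}
```

Calling compare_schemes example.com prints one line per scheme. A 200 for http alongside 000 for https would be consistent with a site that only listens on port 80.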