We are attempting to configure live chat on our website using Django Channels 2, AWS, and Nginx + Daphne. Our setup works fine running locally, but we are running into issues when deploying to production.
Our production environment consists of two Docker containers deployed to AWS using Elastic Container Service (Fargate). The front container runs nginx, acting as a proxy server and serving static files. The second container runs our API/Django site. The proxy listens on port 8000 and forwards incoming requests to the API/Django container, which runs on port 9000. I will also note that we are using Terraform to configure our AWS environment.
I have referenced multiple articles that have accomplished similar setups. For example:
https://medium.com/@elspanishgeek/how-to-deploy-django-channels-2-x-on-aws-elastic-beanstalk-8621771d4ff0
However, that setup uses an Elastic Beanstalk deployment, which we are not using.
Proxy Dockerfile:
FROM nginxinc/nginx-unprivileged:1-alpine
LABEL maintainer='CodeDank'
COPY ./default.conf.tpl /etc/nginx/default.conf.tpl
COPY ./uwsgi_params /etc/nginx/uwsgi_params
ENV LISTEN_PORT=8000
ENV APP_HOST=app
ENV APP_PORT=9000
USER root
RUN mkdir -p /vol/static
RUN chmod 755 /vol/static
RUN touch /etc/nginx/conf.d/default.conf
RUN chown nginx:nginx /etc/nginx/conf.d/default.conf
COPY ./entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
USER nginx
CMD ["/entrypoint.sh"]
API/site Dockerfile:
FROM python:3.7-alpine3.11
LABEL maintainer="CodeDank"
ENV PYTHONUNBUFFERED 1
ENV PATH="/scripts:${PATH}"
RUN pip install --upgrade pip
COPY ./requirements.txt /requirements.txt
RUN apk add --update --no-cache postgresql-client jpeg-dev
RUN apk add --update --no-cache --virtual .tmp-build-deps \
    gcc libc-dev linux-headers postgresql-dev \
    musl-dev zlib zlib-dev
RUN apk add --update --no-cache libressl-dev musl-dev libffi-dev
RUN apk add --update --no-cache g++ freetype-dev jpeg-dev
RUN pip install -r /requirements.txt
RUN apk del .tmp-build-deps
RUN mkdir /app
WORKDIR /app
COPY ./app /app
COPY ./scripts /scripts
RUN chmod +x /scripts/*
RUN mkdir -p /vol/web/media
RUN mkdir -p /vol/web/static
RUN adduser -D user
RUN chown -R user:user /vol/
RUN chmod -R 755 /vol/web
USER user
CMD ["entrypoint.sh"]
(entrypoint scripts shown below)
We have created an AWS Elasticache Redis server to be used as the CHANNEL_LAYERS backend for Django channels. The ‘REDIS_HOSTNAME’ environment variable is the endpoint address of the redis server.
# Channels Settings
ASGI_APPLICATION = "app.routing.application"
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [
                (os.environ.get('REDIS_HOSTNAME'), 6379)
            ],
        },
    },
}
asgi.py file:
import os
import django
from channels.routing import get_default_application
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'app.settings')
django.setup()
application = get_default_application()
Following the Channels docs, we are attempting to configure Daphne to run the ASGI application within our project. Ideally, we would like our nginx proxy to forward all WebSocket requests to the Daphne server running on port 9001. All of our WebSocket endpoints contain /ws/, so the nginx proxy configuration is defined as shown below.
default.conf.tpl:
upstream channels-backend {
    server localhost:9001;
}

server {
    listen ${LISTEN_PORT};

    location /static {
        alias /vol/static;
    }

    location / {
        uwsgi_pass ${APP_HOST}:${APP_PORT};
        include /etc/nginx/uwsgi_params;
        client_max_body_size 4G;
    }

    location /ws/ {
        proxy_pass http://channels-backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Host $server_name;
    }
}
Proxy entrypoint script:
#!/bin/sh
set -e
envsubst '${LISTEN_PORT},${APP_HOST},${APP_PORT}' < /etc/nginx/default.conf.tpl > /etc/nginx/conf.d/default.conf
nginx -g 'daemon off;'
API/site entrypoint script:
#!/bin/sh
set -e
python manage.py collectstatic --noinput
python manage.py wait_for_db
python manage.py migrate
uwsgi --socket :9000 --workers 4 --master --enable-threads --module app.wsgi
daphne -b 0.0.0.0 -p 9001 app.asgi:application
Upon trying to connect to the websocket on our site, a 502 error is returned.
Error during WebSocket handshake: Unexpected response code: 502.
I suspect that the Daphne server is either not running as we expect, or it is not properly configured with the nginx server. Within the API entrypoint script, would the daphne command even be run as it currently stands? Or is there anything we are missing that is required to have Daphne run behind the nginx proxy? My initial thought is that the daphne command never runs because it comes after the uwsgi command in the entrypoint script, but I am not sure where else it would need to be placed in order to start the Daphne process.
The CloudWatch logs for the proxy are not very detailed, but I receive this error message when attempting to connect to a WebSocket on the site.
[error] 8#8: *53700 connect() failed (111: Connection refused) while connecting to upstream, client: 10.1.1.190, server: , request: "GET /ws/chat/djagno/ HTTP/1.1", upstream: "http://127.0.0.1:9001/ws/chat/djagno/", host: "mycustomdomain.net"
I have seen other approaches to this problem that do not use the Nginx proxy to direct WebSocket traffic to Daphne. Maybe our approach is not the best solution? We are open to alternative configurations.
Any feedback would be greatly appreciated. Thanks!
3 Answers
One thing that comes to my mind is: are you scaling the nginx container? You might need to enable session stickiness on your Application Load Balancer in order to make WebSockets work.
Reference:
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#sticky-sessions
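If you manage the load balancer with Terraform, stickiness is configured on the target group. A minimal sketch (resource names, ports, and the cookie duration are placeholders, not your actual setup):

resource "aws_lb_target_group" "proxy" {
  name        = "proxy-tg"      # placeholder name
  port        = 8000
  protocol    = "HTTP"
  vpc_id      = var.vpc_id      # assumed variable
  target_type = "ip"            # Fargate tasks register by IP

  stickiness {
    type            = "lb_cookie"   # ALB-generated cookie
    cookie_duration = 86400         # seconds; tune as needed
    enabled         = true
  }
}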
Since you mentioned you are using Terraform for your AWS deployments, I would also check the configuration of your AWS security groups, specifically where you set up the security groups between your application and Elasticache Redis.
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/elasticache_cluster
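As a rough illustration (the security group resource names are assumptions), the Redis security group needs an ingress rule on port 6379 from the security group your application tasks use:

resource "aws_security_group_rule" "redis_ingress" {
  type                     = "ingress"
  from_port                = 6379
  to_port                  = 6379
  protocol                 = "tcp"
  security_group_id        = aws_security_group.redis.id  # assumed SG attached to Elasticache
  source_security_group_id = aws_security_group.app.id    # assumed SG attached to the ECS tasks
}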
Edit: On second glance, I just noticed how you are starting up uwsgi and daphne. The way you are doing it now, uwsgi starts in the foreground and blocks, so daphne never gets started up (hence the 502 error).
Change
uwsgi --socket :9000 --workers 4 --master --enable-threads --module app.wsgi
daphne -b 0.0.0.0 -p 9001 app.asgi:application
to
uwsgi --socket :9000 --workers 4 --master --enable-threads --module app.wsgi & daphne -b 0.0.0.0 -p 9001 app.asgi:application
This will start uwsgi in the background and then move on to start Daphne.
If you need a way to then kill both, you can run this in a script and add a wait at the end, so that when you kill the script the uwsgi and daphne processes get killed as well.
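A minimal sketch of what that combined entrypoint could look like, reusing the commands from the question (one possible arrangement, not the only one):

#!/bin/sh
set -e
python manage.py collectstatic --noinput
python manage.py wait_for_db
python manage.py migrate
# Start uwsgi in the background so the script does not block here
uwsgi --socket :9000 --workers 4 --master --enable-threads --module app.wsgi &
# Start daphne for the ASGI/WebSocket side, also in the background
daphne -b 0.0.0.0 -p 9001 app.asgi:application &
# Keep the container alive and tie both processes to this script
wait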
Otherwise, you can look into daemonizing the uwsgi and daphne startups with systemd or supervisor.

There could be a few issues here. The first thing I discovered when dealing with WebSocket requests is that they behave differently on your server than they do on localhost. I had to modify my Django Channels logic in several different areas depending on the versions of Django, Django Channels, Daphne, etc.
For example: when we upgraded to Channels 3.0, we couldn't access our database without the database_sync_to_async() decorator and had to offload the calls to their own separate functions.
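For illustration, offloading an ORM call from a consumer might look like this (the ChatMessage model and consumer name are made up):

# consumers.py (illustrative sketch)
from channels.db import database_sync_to_async
from channels.generic.websocket import AsyncWebsocketConsumer

from .models import ChatMessage  # hypothetical model


class ChatConsumer(AsyncWebsocketConsumer):
    async def receive(self, text_data=None, bytes_data=None):
        # ORM access must be wrapped when called from async code
        await self.save_message(text_data)

    @database_sync_to_async
    def save_message(self, text):
        # Runs in a worker thread, so the synchronous ORM is safe here
        ChatMessage.objects.create(body=text)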
Check your routing.py for request stoppers like AllowedHostsOriginValidator. If you are using custom middleware, the scope object is different based on your environment and the way you access the data.
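If it helps, a defensive way to read values off the scope in custom middleware (Channels 3 style; the header and scope key below are hypothetical) is something like:

from channels.middleware import BaseMiddleware


class HeaderMiddleware(BaseMiddleware):
    async def __call__(self, scope, receive, send):
        # scope["headers"] is a list of (name, value) byte-string pairs,
        # and some keys may be absent depending on the server/environment
        headers = dict(scope.get("headers", []))
        scope["x_forwarded_for"] = headers.get(b"x-forwarded-for", b"").decode()
        return await super().__call__(scope, receive, send)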
Also, try running your Daphne outside of your daemon process, through a unix socket.
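A sketch of such a command (the socket path is just an example) could be:

daphne -u /run/daphne/daphne.sock app.asgi:application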
We use the following setup, if you want to give it a go: a load-balancing nginx config in front, a front-end nginx config, and a Django server nginx config.
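As a rough sketch of what the Django server piece could look like when Daphne listens on a unix socket (paths and names here are assumptions, not their actual config):

upstream daphne {
    # Daphne bound to a unix socket, e.g. started with: daphne -u /run/daphne/daphne.sock app.asgi:application
    server unix:/run/daphne/daphne.sock;
}

server {
    listen 80;

    location /ws/ {
        proxy_pass http://daphne;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}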