I have an application deployed with Docker on an EC2 instance: t3a.xlarge.
My application uses 7 different containers (cf. image docker-ps.png):
- A Django app, serving as an API (Python 3.6)
- An Angular application (Angular 2+)
- A memcached server
- A certbot container (using Let's Encrypt to automatically renew my SSL certificates)
- An Nginx server, used as a reverse proxy to serve my Angular application and my Django API
- A Postgres database
- A pgAdmin instance to manage my database
The issue happens when we send a push notification to our users via Firebase (around 42,000 users). The API stops responding for somewhere between 1 and 6 minutes.
The Django API uses the Gunicorn web server (https://gunicorn.org/) with this configuration:
gunicorn xxxx_api.wsgi -b 0.0.0.0:80 --max-requests 500 --max-requests-jitter 50 --enable-stdio-inheritance -k gevent --workers=16 -t 80
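For reference, the same flags can be expressed as a `gunicorn.conf.py` file (a sketch; the values are copied from the command line above, nothing is changed):

```python
# gunicorn.conf.py -- same settings as the command-line flags above
bind = "0.0.0.0:80"
workers = 16                      # --workers=16
worker_class = "gevent"           # -k gevent
timeout = 80                      # -t 80
max_requests = 500                # recycle a worker after ~500 requests
max_requests_jitter = 50          # plus up to 50 extra, to stagger restarts
enable_stdio_inheritance = True   # --enable-stdio-inheritance
```

It would then be started with `gunicorn xxxx_api.wsgi -c gunicorn.conf.py`.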
Neither the server nor the containers ever crash. When I check the metrics, we never use more than 60% of the CPU. Here is a screenshot of some metrics from when the notification was sent: https://ibb.co/Mc0v7R1
Is it because we are using more bandwidth than our instance allows? Or should I use another AWS service?
3 Answers
Memory utilisation metrics are not captured for EC2 instances, since OS-level metrics are not available to AWS. You can collect custom metrics yourself.
Reference:
https://awscloudengineer.com/create-custom-cloudwatch-metrics-centos-7/
I think your problem is one of design: you could try sending your push notifications through an async queue, using something like SNS & SQS (the AWS way) or Celery & Redis (the traditional way).
If you choose the traditional way, this post may help:
https://blog.devartis.com/sending-real-time-push-notifications-with-django-celery-and-redis-829c7f2a714f
I think it's because of the queued HTTP requests to Firebase. I believe you are sending 42,000 Firebase requests in a loop. I/O calls are blocking in nature: if you are running the Django app single-threaded under Gunicorn, those 42,000 HTTP calls will block new requests until they finish, and the new requests will stay queued as long as the connection is alive or they are within Nginx's threshold. I don't think 42,000 push notifications will exhaust memory or processing unless the payload is very large.
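Even without moving to a task queue, bounded concurrency shortens the blocking window dramatically: at roughly 10 ms per call, 42,000 sequential sends take about 7 minutes, which matches the outage described. A sketch with `concurrent.futures` (`fcm_send` is a hypothetical stand-in for the real Firebase request):

```python
from concurrent.futures import ThreadPoolExecutor

def fcm_send(token):
    # Hypothetical stand-in for one blocking Firebase HTTP request.
    return f"ok:{token}"

def send_push_batch(tokens, max_workers=32):
    # 42,000 sequential calls at ~10 ms each take ~7 minutes;
    # 32 concurrent workers cut that to well under 15 seconds.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fcm_send, tokens))
```

Even so, the work still ties up the request that triggered it, so an async queue remains the better long-term design.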