I have created a Python script that processes videos using Selenium on an EC2 instance, pulling one message at a time from an SQS queue. This works fine for a single video, but I need to process hundreds of videos concurrently. My current solution is to spin up a new EC2 instance for each message in the SQS queue, using the services below:
- SQS queue holding metadata for each video rendering task
- EC2 AMI for the EC2 template
- EC2 Auto Scaling Group for scaling
- Lambda function for launching new EC2 instances
I am also wondering whether each EC2 instance should retrieve a batch of messages from SQS instead of one message at a time, to cut down on the number of instances spun up, sacrificing speed for the sake of cost.
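For reference, batch retrieval would look roughly like this with boto3 (the queue URL and `render_video` are placeholders for my own setup):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/video-tasks"  # placeholder

def render_video(body):
    ...  # existing Selenium rendering logic

# Receive up to 10 messages per call (the SQS maximum) so one instance
# works through a batch instead of handling a single video.
resp = sqs.receive_message(
    QueueUrl=QUEUE_URL,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,  # long polling reduces empty responses
)
for msg in resp.get("Messages", []):
    render_video(msg["Body"])
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```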
Is this the correct approach in terms of architecture?
2 Answers
I think a better approach would be to use AWS Batch, which can even run on Fargate. There are some situations where EC2 is a better fit than Fargate, though; you can read about them here: https://docs.aws.amazon.com/batch/latest/userguide/fargate.html#when-to-use-fargate.
You would still need the Lambda function to start the AWS Batch jobs.
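A minimal sketch of such a Lambda handler, assuming an SQS trigger and placeholder Batch job queue/definition names:

```python
import json
import boto3

batch = boto3.client("batch")

def handler(event, context):
    # Triggered by SQS: submit one Batch job per message.
    for record in event["Records"]:
        task = json.loads(record["body"])
        batch.submit_job(
            jobName=f"render-{task['video_id']}",  # assumed metadata field
            jobQueue="video-render-queue",         # placeholder Batch job queue
            jobDefinition="video-render-job",      # placeholder job definition
            containerOverrides={
                # Pass the raw task to the container as an environment variable
                "environment": [{"name": "TASK", "value": record["body"]}]
            },
        )
```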
It is "correct" in the sense that it is a valid solution to your problem. Alternatively you could have your software running in the EC2 server simply pull another SQS message and process another task, continuing until there are no more tasks to process, then shutting down the EC2 server. This would be faster and more efficient because you wouldn’t have to spend the time waiting for the EC2 instance to start up each time.
I would call this part of your solution "incorrect":

- EC2 Auto Scaling Group for scaling
- Lambda function for launching new EC2 instances
Both the Auto Scaling Group and the Lambda function are creating EC2 instances here; you only need one of the two. If the EC2 instance is pulling messages one-by-one in a loop, you need the Auto Scaling Group. If you are passing a single SQS message to a new EC2 instance via `user-data`, you need the Lambda function.
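For the second variant, a hedged sketch of the Lambda call, with placeholder AMI ID, instance type, and startup service (the AMI would need something on boot that reads the task file and starts rendering):

```python
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    for record in event["Records"]:
        # Embed the single task in user-data; a startup script baked into
        # the AMI reads it and begins rendering immediately.
        user_data = (
            "#!/bin/bash\n"
            f"echo '{record['body']}' > /opt/render/task.json\n"
            "systemctl start render-worker\n"  # hypothetical service on the AMI
        )
        ec2.run_instances(
            ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
            InstanceType="c5.xlarge",         # placeholder instance type
            MinCount=1,
            MaxCount=1,
            UserData=user_data,
            # Let the worker terminate itself by shutting down when done
            InstanceInitiatedShutdownBehavior="terminate",
        )
```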