I have a Lambda function which I’m currently invoking from EC2. But I’m invoking it sequentially in a loop. So, if there are 1000 items to be processed, I’m using a simple loop to iterate over them and invoke the Lambda function for each item.
If this were done for a million items, it would not be efficient. What options do I have to reduce the time taken for this process?
The processing of each item is independent, so I could invoke, say, 1000 Lambdas in 1000 threads. But is that appropriate? I don’t know much about Celery, SQS, Redis, etc., but would those be useful in this scenario?
Just in case it’s relevant, each Lambda takes around 5 minutes to complete.
2 Answers
Using a message broker would be more efficient. You can go with Redis or RabbitMQ.
A common practice is to push messages to an Amazon SQS queue and configure the queue to trigger an AWS Lambda function.
You can push up to 10 messages per call to the SQS queue by using the batch send operation (SendMessageBatch).
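As a rough sketch of the producer side, the EC2 process could enqueue items in groups of 10 instead of invoking Lambda directly. The helper names `chunked` and `enqueue_items`, the JSON message format, and the queue URL in the usage comment are assumptions for illustration. `send_message_batch` is the boto3 SQS client call, and the client is passed in so the function can be exercised without AWS access.

```python
import json
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def enqueue_items(sqs_client, queue_url, items, batch_size=10):
    """Send items to SQS in batches.

    Returns (number of messages accepted, list of entries SQS rejected).
    `sqs_client` is expected to behave like boto3.client("sqs").
    """
    sent = 0
    failed = []
    for batch in chunked(items, batch_size):
        # Ids only need to be unique within a single batch request.
        entries = [
            {"Id": str(i), "MessageBody": json.dumps(item)}
            for i, item in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(
            QueueUrl=queue_url, Entries=entries
        )
        sent += len(response.get("Successful", []))
        failed.extend(response.get("Failed", []))
    return sent, failed


# Usage (assumes boto3 is installed and the queue URL is yours):
#   import boto3
#   sqs = boto3.client("sqs")
#   sent, failed = enqueue_items(sqs, "<your-queue-url>", my_items)
```

A batch call can partially succeed, so the returned `failed` list is worth checking and retrying rather than assuming every message was accepted.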
This approach also has the benefit of handling failed invocations: messages that repeatedly fail processing can be moved to a Dead Letter Queue, if one is configured for the queue.
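On the consumer side, a minimal handler sketch could look like the following. The `process_item` function is a hypothetical placeholder for the real per-item work, and the JSON message body is an assumption. The `Records`, `body`, and `messageId` fields follow the SQS event format that Lambda passes to the function. The `batchItemFailures` return value only has an effect if `ReportBatchItemFailures` is enabled on the event source mapping; without that setting the return value is ignored and a handler that catches exceptions like this would hide failures, so in that case it is safer to let the exception propagate.

```python
import json


def process_item(item):
    """Hypothetical placeholder for the real per-item work."""
    if item is None:
        raise ValueError("empty item")
    return item


def handler(event, context):
    """Process each SQS record and report the ones that failed.

    Assumes ReportBatchItemFailures is enabled on the event source
    mapping, so only the listed messages are returned to the queue.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            process_item(json.loads(record["body"]))
        except Exception:
            # Failed messages become visible again and, after enough
            # failed receives, can move to the Dead Letter Queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting failures per message means one bad item does not force the whole batch to be retried.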
See: Using AWS Lambda with Amazon SQS