How can I use AWS tools to process a large list of email addresses in a Python script more quickly? I have a script that iterates through the list and passes each email address to a function, but even with multi-threading, it’s not processing quickly enough to meet my deadline.
I’m wondering if it’s possible to use AWS Step Functions and/or containerization to pass each email address to its own node/instance. Can someone walk me through the steps to achieve this?
2 Answers
One option would be:
1. Send each email address as a separate message to an Amazon SQS queue.
2. Configure an AWS Lambda function containing your processing logic to be triggered by the queue.
For testing purposes, just send one message to the queue at a time until you have the code working.
Lambda scales out automatically with the queue depth, up to a default limit of 1,000 concurrent executions per account, so once you send many messages to the queue they will be processed in parallel. A minimal sketch of both sides is below.
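Here's roughly what that looks like with boto3, assuming your existing per-address function is called process_email; the queue URL, region, and account ID are placeholders for your own values:

```python
import boto3

# Hypothetical queue URL - substitute your own.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/email-jobs"

sqs = boto3.client("sqs")

def enqueue(emails):
    """Fan the email list out to SQS, 10 messages per request
    (send_message_batch accepts at most 10 entries per call)."""
    for start in range(0, len(emails), 10):
        batch = emails[start:start + 10]
        sqs.send_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[
                {"Id": str(start + i), "MessageBody": email}
                for i, email in enumerate(batch)
            ],
        )

def handler(event, context):
    """Lambda consumer, invoked by the SQS event source mapping.
    Each invocation receives up to the configured batch size of messages."""
    for record in event["Records"]:
        process_email(record["body"])  # your existing per-address function
```

With the event source mapping in place, Lambda polls the queue for you and spins up more concurrent invocations as the backlog grows, so the parallelism comes for free.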
I think Step Functions Distributed Map would be a great fit here. You can use it to iterate through your list and run a child workflow execution for each address (or batch them up if you find that helpful); a sketch of a state machine definition is below.
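To make that concrete, here's a minimal sketch of a Distributed Map state machine created with boto3. It assumes your list has been uploaded as a CSV to S3 and that a worker Lambda exists; the bucket, key, function name, and role ARN are all placeholders:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

definition = {
    "StartAt": "ProcessEmails",
    "States": {
        "ProcessEmails": {
            "Type": "Map",
            # Read the email list straight from a CSV object in S3,
            # so the list never has to fit in the execution input.
            "ItemReader": {
                "Resource": "arn:aws:states:::s3:getObject",
                "ReaderConfig": {
                    "InputType": "CSV",
                    "CSVHeaderLocation": "FIRST_ROW",
                },
                "Parameters": {"Bucket": "my-email-bucket", "Key": "emails.csv"},
            },
            # Hand each child execution a batch of rows rather than one.
            "ItemBatcher": {"MaxItemsPerBatch": 100},
            "MaxConcurrency": 1000,
            "ItemProcessor": {
                "ProcessorConfig": {
                    "Mode": "DISTRIBUTED",
                    "ExecutionType": "EXPRESS",
                },
                "StartAt": "ProcessBatch",
                "States": {
                    "ProcessBatch": {
                        "Type": "Task",
                        "Resource": "arn:aws:states:::lambda:invoke",
                        "Parameters": {
                            "FunctionName": "process-emails",
                            "Payload.$": "$",
                        },
                        "End": True,
                    }
                },
            },
            "End": True,
        }
    },
}

sfn.create_state_machine(
    name="process-email-list",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsEmailRole",
)
```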
As for how you process each item, you have options there too. If each processing step for an email address takes under 15 minutes (the Lambda timeout limit), you can use Lambda and compose the steps with Step Functions; a sketch of a batched worker handler is below. If a step will take longer, you might want to look at the Optimized Integration with ECS to run your processing as ECS tasks. Or, if you just want to run these tasks on EC2 instances, you can use Activities for that. And you can mix and match as you need to.
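For the Lambda path with the Distributed Map definition sketched above, the worker receives the batch as its payload. With an ItemBatcher configured and a CSV header row, each item arrives as a dict keyed by column name; "email" as the column name and process_email are assumptions to adapt to your own code:

```python
def handler(event, context):
    """Worker Lambda for the Distributed Map processor. With
    ItemBatcher configured, the payload carries an "Items" array
    of rows from the source CSV."""
    for item in event["Items"]:
        process_email(item["email"])  # your existing per-address function
    return {"processed": len(event["Items"])}
```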
To get started, I'd encourage you to check out the Distributed Map module in the Step Functions Workshop. And if you want a broader overview, a few of us gave a presentation on Distributed Map at re:Invent last year, which you can find on YouTube.