I’m creating a serverless application in AWS that needs to store some data gathered through an external API in a DynamoDB table.
I’m trying to achieve this with Lambda functions, but the data is quite heavy, so every time I try to perform some data manipulation on it I get a "timeout error", even though my timeout is set to several minutes (if I run the same script on my computer, it takes less than 10 s to execute).
I tried only gathering the data and parsing it with json.loads(), but I still get the timeout error.
Looking around the internet I saw there are quite a few methods to pull data from an external API endpoint in AWS, such as Glue or AppFlow.
My questions are:
- Is Lambda a good choice for this type of task?
- What could cause my "timeout" problems?
- Do you suggest better alternatives to accomplish this task?
Thank you in advance
2 Answers
Lambda has a maximum timeout of 15 minutes. If your work could take more than 15 minutes then you can try a couple of things:
Make sure the task isn’t stuck in a loop, and use adequate logging to see where the time goes. Increase the Lambda memory: CPU is allocated in proportion to memory, so with enough memory Lambda should be able to match your local machine’s run time.
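To make the logging advice concrete, here is a minimal sketch of a handler that times each stage and logs the remaining execution time. The endpoint URL, the `timed` helper, and the `fetch_fn` parameter are assumptions made for illustration, not part of the original script; `context.get_remaining_time_in_millis()` is the standard method on the Lambda context object.

```python
import json
import logging
import time
import urllib.request

logger = logging.getLogger()
logger.setLevel(logging.INFO)

API_URL = "https://example.com/api/data"  # hypothetical endpoint


def fetch(url, timeout_s=10):
    # Explicit timeout: fail fast instead of hanging until Lambda kills the invocation.
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read()


def timed(stage, fn, *args):
    """Run fn(*args), log how long the stage took, and return its result."""
    start = time.perf_counter()
    result = fn(*args)
    logger.info("%s finished in %.3f s", stage, time.perf_counter() - start)
    return result


def handler(event, context, fetch_fn=fetch):
    # fetch_fn is injectable (an assumption of this sketch) so the handler can be tested offline.
    logger.info("remaining at start: %d ms", context.get_remaining_time_in_millis())
    raw = timed("fetch", fetch_fn, API_URL)
    records = timed("parse", json.loads, raw)
    logger.info("remaining at end: %d ms", context.get_remaining_time_in_millis())
    return {"count": len(records)}
```

If the "fetch" line never shows up in CloudWatch Logs, the request itself is probably hanging rather than the processing being slow. One possible cause is a function attached to a VPC without a route to the internet.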
It depends: if it’s a large amount of data (GB+), then I would suggest using AWS Glue.
In addition to the points mentioned by Leeroy Hannigan, with respect to question 3:
See whether you can split the workload and leverage AWS Step Functions.
E.g. you can use a different set of Lambda functions to split, iterate, transform, and then load the data.
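A rough sketch of what those separate functions could look like, assuming the state machine passes the output of the split step to a Map state that invokes the transform and load steps once per batch. The event shapes, the `pk` attribute, and the table name `my-table` are hypothetical; the batch size of 25 matches the per-call limit of DynamoDB’s BatchWriteItem.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_handler(event, context):
    # "split": turn one large payload into batches a Map state can iterate over.
    return {"batches": chunk(event["records"], event.get("batch_size", 25))}


def transform_handler(event, context):
    # "transform": called once per batch; here the event is assumed to be a list of records.
    return [{"pk": str(r["id"]), "payload": r} for r in event]


def load_handler(event, context, table_name="my-table"):  # hypothetical table name
    # "load": write one transformed batch to DynamoDB.
    import boto3  # imported lazily; boto3 ships with the Lambda Python runtime

    table = boto3.resource("dynamodb").Table(table_name)
    with table.batch_writer() as writer:
        for item in event:
            writer.put_item(Item=item)
    return {"written": len(event)}
```

Keep in mind that Step Functions limits the payload passed between states to 256 KB, so for heavy data it may be better to store each batch in S3 and pass only the object keys between steps.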