skip to Main Content

I have a Lambda function about data science that gets a user id and a list of operations to perform over the data of this user.

Example path:

calculate?userId=1&operations=func1,func2,func3,func4,func5

In the Lambda function I am running calling all specified functions in a for loop and the functions are not that short-running. Every single one of them query the database and there are some overlapping queries. I have implemented sharing of the queries between functions.

I am suspecting that calling each function in the for loop is a good thing because for example while the func1 is running, func2 is waiting and so on. Should I:

  1. Run all of the functions in parallel with asyncio? So that they do not wait for each other to finish.
  2. Convert this function to a state machine and multiple Lambda functions (one for each function that I specified in query params) and implement necessary state transitions and etc.

2

Answers


  1. If the functions are dependent on each other and you’re not hitting lambda timeout limit, I would leave it as it is.

    If they can run independently, I would split:

    1. run all shared queries with asyncio
    2. run all functions with asyncio (assuimg they still use IO to write the results somewhere)

    Unfortunately, without any more specific example it is difficult to give more details.

    Login or Signup to reply.
  2. I’d personally go for step functions with state machines but that would also increase the cost in your case. The additional cost, though, comes with added benefits of the step function service.

    You can create a one function to query database and pass relevant information to each worker function.

    Also using this, if you want to add / remove any functionality it will be easier to manage. Also, the step functions have a great visualisation console where you can monitor and debug all of the state changes that your process goes through. It can be helpful for audit and debugging. Also, the built-in try-catch capability can come in handy too. And by using this, you won’t have to worry about timeout issues.

    Also, if in future you want to run some functions in parallel and some of them sequentially, then this kind of complex architecture can be easily handled using a state machine.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search