We can ingest two datasets (dataset A and dataset B) daily; dataset A may or may not show up every day.

Each one triggers its own Step Functions state machine. Step function B (dataset B) can't run until step function A (dataset A) has stopped running or never ran. That way step function B has the most recent data.

Can I add ListExecutions and a wait until the execution status != 'RUNNING'?

I'm trying to set this up, but ListExecutions returns the full execution history. How would I produce this without a Lambda, using just Step Functions logic?

    {
        "executions": [
            {
                "executionArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345:1234-1",
                "stateMachineArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345",
                "name": "1234-1",
                "status": "RUNNING",
                "startDate": "2023-02-27T15:21:22.205000-08:00",
                "stopDate": "2023-02-27T15:28:11.358000-08:00"
            },
            {
                "executionArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345:1234-2",
                "stateMachineArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345",
                "name": "1234-2",
                "status": "ABORTED",
                "startDate": "2023-02-27T15:19:55.739000-08:00",
                "stopDate": "2023-02-27T15:21:11.924000-08:00"
            },
            {
                "executionArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345:1234-3",
                "stateMachineArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345",
                "name": "1234-3",
                "status": "COMPLETED",
                "startDate": "2023-02-27T15:18:45.228000-08:00",
                "stopDate": "2023-02-27T15:19:20.651000-08:00"
            },
            {
                "executionArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345:1234-4",
                "stateMachineArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345",
                "name": "1234-4",
                "status": "FAILED",
                "startDate": "2023-02-27T15:18:30.145000-08:00",
                "stopDate": "2023-02-27T15:18:34.315000-08:00"
            }
        ]
    }

Is there a way to transform the ListExecutions output above into something like this:

    {
       "status": "RUNNING"
    } 

or

    {
       "status": "NOT RUNNING"
    } 

With just Step Functions logic?

2 Answers


  1. Here are two ways to identify RUNNING executions. The running field filters the list down to the executions whose status is RUNNING. The hasRunning field is true if at least one execution is running, false otherwise.

    {
      "Comment": "Identify running executions",
      "StartAt": "FindRunning",
      "States": {
        "FindRunning": {
          "Type": "Pass",
          "Parameters": {
            "running.$": "$.executions.[?(@.status == 'RUNNING')]",
            "hasRunning.$": "States.ArrayContains($..status, 'RUNNING')"
          },
          "End": true
        }
      }
    }
    

    Given the OP input, the output is:

    {
      "running": [
        {
          "executionArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345:1234-1",
          "stateMachineArn": "arn:aws:states:us-west-2:123456789:execution:StateMachineA12345-12345-12345",
          "name": "1234-1",
          "status": "RUNNING",
          "startDate": "2023-02-27T15:21:22.205000-08:00",
          "stopDate": "2023-02-27T15:28:11.358000-08:00"
        }
      ],
      "hasRunning": true
    }
    

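    You could use this state as part of a loop that runs until no running executions are found: ListExecutions -> FindRunning -> Choice state on hasRunning. If hasRunning is true, go to a Wait state and then back to ListExecutions; if it is false, continue with the rest of the workflow.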
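
    A polling loop along these lines could be sketched as follows. This is a rough sketch under several assumptions, not a tested definition: it calls ListExecutions through the AWS SDK service integration with StatusFilter set to RUNNING, so only running executions come back instead of the full history; it assumes the integration returns the list under the PascalCase key Executions (the lowercase keys in the question look like CLI output); the state machine ARN is a placeholder built from the sample data; the 60-second wait and the ProcessDatasetB state are hypothetical stand-ins. The execution role would also need permission for states:ListExecutions, and the execution input is assumed to be a JSON object so that ResultPath can add a field to it.

    ```json
    {
      "Comment": "Sketch: wait until state machine A has no RUNNING executions",
      "StartAt": "ListRunningA",
      "States": {
        "ListRunningA": {
          "Type": "Task",
          "Resource": "arn:aws:states:::aws-sdk:sfn:listExecutions",
          "Parameters": {
            "StateMachineArn": "arn:aws:states:us-west-2:123456789:stateMachine:StateMachineA12345-12345-12345",
            "StatusFilter": "RUNNING"
          },
          "ResultSelector": {
            "runningCount.$": "States.ArrayLength($.Executions)"
          },
          "ResultPath": "$.aCheck",
          "Next": "IsARunning"
        },
        "IsARunning": {
          "Type": "Choice",
          "Choices": [
            {
              "Variable": "$.aCheck.runningCount",
              "NumericGreaterThan": 0,
              "Next": "WaitForA"
            }
          ],
          "Default": "ProcessDatasetB"
        },
        "WaitForA": {
          "Type": "Wait",
          "Seconds": 60,
          "Next": "ListRunningA"
        },
        "ProcessDatasetB": {
          "Type": "Pass",
          "End": true
        }
      }
    }
    ```

    Because the filtering happens in the API call, the Choice state only needs the count of returned executions. ResultPath keeps the original execution input and overwrites $.aCheck on every pass through the loop.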

  2. From what I can see, you essentially want a mutex so that these two workflows don't run concurrently. You could solve this by polling the ListExecutions API action using AWS SDK service integrations. But you might also look at the solution in this blog post to implement a lock.

    Controlling concurrency in distributed systems using AWS Step Functions

    This might seem more complex than the polling solution, but it is less prone to race conditions (the other state machine could start running between the time you poll and the time you start your own work). And once you use this, it becomes easier to do more sophisticated things (such as a semaphore, if you want to allow more than one but not too many concurrent executions).
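
    To give a feel for the lock idea, here is a minimal sketch that uses a DynamoDB conditional write as the mutex. It is not the blog post's exact implementation; it only illustrates the same general pattern. The table name WorkflowLocks, the key attribute LockName, the HeldBy attribute, the lock value dataset-pipeline, the 30-second wait, and the ProcessDataset state are all hypothetical. Both state machines would wrap their work in the same acquire and release steps.

    ```json
    {
      "Comment": "Sketch: acquire a lock, do the work, release the lock",
      "StartAt": "AcquireLock",
      "States": {
        "AcquireLock": {
          "Type": "Task",
          "Resource": "arn:aws:states:::dynamodb:putItem",
          "Parameters": {
            "TableName": "WorkflowLocks",
            "Item": {
              "LockName": { "S": "dataset-pipeline" },
              "HeldBy": { "S.$": "$$.Execution.Id" }
            },
            "ConditionExpression": "attribute_not_exists(LockName)"
          },
          "ResultPath": "$.lock",
          "Catch": [
            {
              "ErrorEquals": ["DynamoDB.ConditionalCheckFailedException"],
              "ResultPath": "$.lockError",
              "Next": "WaitForLock"
            }
          ],
          "Next": "ProcessDataset"
        },
        "WaitForLock": {
          "Type": "Wait",
          "Seconds": 30,
          "Next": "AcquireLock"
        },
        "ProcessDataset": {
          "Type": "Pass",
          "Next": "ReleaseLock"
        },
        "ReleaseLock": {
          "Type": "Task",
          "Resource": "arn:aws:states:::dynamodb:deleteItem",
          "Parameters": {
            "TableName": "WorkflowLocks",
            "Key": {
              "LockName": { "S": "dataset-pipeline" }
            }
          },
          "ResultPath": "$.release",
          "End": true
        }
      }
    }
    ```

    The conditional write only succeeds when no item with that LockName exists, so checking and taking the lock happen in one atomic step, which is what closes the race window that polling leaves open. A real workflow would also need a Catch on the processing steps that routes to ReleaseLock, otherwise a failed execution would leave the lock held.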
