I have some work that needs two S3 objects. Object A is uploaded by another system; I have to generate Object B.

In fact, there is not one Object A but several (A1, A2, A3). Each one may be uploaded by an external system at any time. For each Object Ax, a separate instance of the work has to be launched.

On the other hand, Object B remains the same for a specified period of time, after which I have to regenerate it. Generating the object takes time.

I can use EventBridge Scheduler to trigger generation of Object B, and I can also use EventBridge to emit an event for each Object Ax that gets uploaded.

My question is how to combine these two events, so that a job is launched only after both Object B has been generated and Object Ax has been uploaded, ensuring that exactly one job is launched for every Object Ax that gets uploaded.

(Something similar to Promise.all in JavaScript.)
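
Conceptually, what I'm after looks roughly like this (these helpers are made up and don't exist):

    // Illustrative only: await both preconditions, then run exactly one job per Ax.
    const [objectB, objectA] = await Promise.all([
      ensureObjectBGenerated(),   // regenerated on a schedule
      waitForObjectAxUpload(),    // fired by the external upload
    ]);
    await launchJob(objectB, objectA);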

2 Answers


  1. I suggest you use a Step Functions workflow, triggered at the specified time by EventBridge Scheduler, to implement this logic. It can do the work to generate the file required for your second step. It can then use Step Functions AWS SDK Service Integrations (e.g., arn:aws:states:::aws-sdk:s3:listObjectsV2) to determine if the object from the other system has been uploaded yet. If it has, then carry on. If not, then you can wait until it does (e.g., using a Job Poller pattern to check for existence and wait if not found) and implement any compensating logic you might want (e.g., to handle if the file is not uploaded by the other system within the timeline you expect). This would be the simplest approach (fewest moving parts) and would likely work well for you.
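
    For illustration, a minimal Amazon States Language sketch of such a workflow with a poll-and-wait loop (the bucket, prefix, and function names are assumptions, and a production version would also bound the number of retries to implement the compensating logic mentioned above):

    {
      "Comment": "Generate Object B, then poll for Object A before running the job",
      "StartAt": "GenerateObjectB",
      "States": {
        "GenerateObjectB": {
          "Type": "Task",
          "Resource": "arn:aws:states:::lambda:invoke",
          "Parameters": { "FunctionName": "generate-object-b" },
          "Next": "CheckForObjectA"
        },
        "CheckForObjectA": {
          "Type": "Task",
          "Resource": "arn:aws:states:::aws-sdk:s3:listObjectsV2",
          "Parameters": { "Bucket": "my-bucket", "Prefix": "incoming/object-a" },
          "ResultPath": "$.listing",
          "Next": "ObjectAUploaded?"
        },
        "ObjectAUploaded?": {
          "Type": "Choice",
          "Choices": [
            { "Variable": "$.listing.KeyCount", "NumericGreaterThan": 0, "Next": "RunJob" }
          ],
          "Default": "WaitBeforeRetry"
        },
        "WaitBeforeRetry": { "Type": "Wait", "Seconds": 60, "Next": "CheckForObjectA" },
        "RunJob": {
          "Type": "Task",
          "Resource": "arn:aws:states:::lambda:invoke",
          "Parameters": { "FunctionName": "run-job" },
          "End": true
        }
      }
    }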

    If you later had additional things you wanted to wait for (e.g., you had two objects from different external systems to wait for), you could build that into your workflow using Step Functions Parallel State or Map State.

    If the other system typically uploads the files before your timed run, then this will be low latency. In scenarios where the other system uploads after, there will be some additional latency between the polling calls in your Step Functions workflow. You also have a trade-off there between frequency of polling (lowers latency between object creation and detection) and cost for state transitions (as the workflow needs to do work each time). If the volume is high enough and the latency important enough to you, you could enhance this further. For example, when the object from the other system isn't there, you could use the .waitForTaskToken Service Integration Pattern from Step Functions to write a Task Token to a DynamoDB table for the expected object and wait. Then have another Step Functions Express Workflow, triggered by EventBridge S3 Notifications, that looks up the expected object in the DynamoDB table and, if it finds it, calls back to Step Functions (SendTaskSuccess) to complete the workflow execution. This would be preferable if your scenario is high volume, but as you can see it would be more development effort and you'd need to be more careful to manage race conditions.
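
    A sketch of that callback side as an Express Workflow, triggered by an EventBridge S3 "Object Created" notification (the table and attribute names are assumptions):

    {
      "Comment": "Complete the waiting execution for an expected object, if any",
      "StartAt": "LookupTaskToken",
      "States": {
        "LookupTaskToken": {
          "Type": "Task",
          "Resource": "arn:aws:states:::dynamodb:getItem",
          "Parameters": {
            "TableName": "expected-objects",
            "Key": { "objectKey": { "S.$": "$.detail.object.key" } }
          },
          "Next": "TokenFound?"
        },
        "TokenFound?": {
          "Type": "Choice",
          "Choices": [
            { "Variable": "$.Item.taskToken.S", "IsPresent": true, "Next": "CompleteWaitingExecution" }
          ],
          "Default": "NotAnExpectedObject"
        },
        "CompleteWaitingExecution": {
          "Type": "Task",
          "Resource": "arn:aws:states:::aws-sdk:sfn:sendTaskSuccess",
          "Parameters": { "TaskToken.$": "$.Item.taskToken.S", "Output": "{}" },
          "End": true
        },
        "NotAnExpectedObject": { "Type": "Succeed" }
      }
    }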

  2. While the Step Functions approach above seems like the right solution generally for this kind of orchestration scenario, I wonder if, in this simple case, it would be sufficient to have a single Lambda listen to both S3 notifications and simply check whether the other file already exists?

    Something like:

    // extractS3Object, isFirstFile/isSecondFile and the *Exists() helpers are
    // placeholders; the existence checks would typically be S3 HeadObject calls.
    exports.handler = async function (event, context) {
      const s3Object = extractS3Object(event);
      if ((isSecondFile(s3Object) && await firstFileExists())
          || (isFirstFile(s3Object) && await secondFileExists())) {
        // Both objects are present: launch the job
      } else {
        // The other object isn't there yet; do nothing and let the other
        // object's own S3 notification re-trigger this handler later
      }
    };
    

    Update: This doesn't work as well if you need a strong guarantee that the job is executed only once, but it's still possible with a bit of a hack. You can:

    • Set your Lambda function's concurrency limit (reserved concurrency) to 1
    • Store the IDs of successfully processed jobs in a DynamoDB table
    • Check for the ID in the table before processing a job and, if it is found, skip it (a sketch of this check follows below)

    However, I would first consider whether it's possible to make your job idempotent, so that you don't have to enforce an "only-once" policy.
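
    For the deduplication check, a conditional write lets you claim and record the job ID in a single atomic call, avoiding the race that a separate read-then-write would have. A sketch (the table and attribute names are assumptions):

    const { DynamoDBClient, PutItemCommand } = require("@aws-sdk/client-dynamodb");

    const ddb = new DynamoDBClient({});

    // Returns true if this call claimed the job ID, false if it was
    // already processed by an earlier invocation.
    async function claimJob(jobId) {
      try {
        await ddb.send(new PutItemCommand({
          TableName: "processed-jobs", // assumed table name
          Item: { jobId: { S: jobId } },
          ConditionExpression: "attribute_not_exists(jobId)",
        }));
        return true;
      } catch (err) {
        if (err.name === "ConditionalCheckFailedException") return false;
        throw err;
      }
    }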
