
Tech stack: Salesforce data -> AWS AppFlow -> S3 -> Databricks job

Hello! I have an AppFlow flow that grabs Salesforce data and uploads it to S3 as a folder containing multiple Parquet files. I have a Lambda listening on the prefix where this folder is dropped; the Lambda then triggers a Databricks job, which is an ingestion process I've created.

My main issue is that when these files are uploaded to S3, the Lambda is triggered once per file. How can I have the Lambda run just once per flow run?
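
For reference, my trigger Lambda is roughly the following sketch (the host, token, and job ID environment variable names are placeholders, not my real values):

```python
# Rough sketch of the current trigger Lambda (placeholder names throughout).
# With an S3 event notification, this handler fires once PER OBJECT uploaded,
# which is the behavior I'm trying to avoid.
import json
import os
import urllib.request

DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]
JOB_ID = os.environ["DATABRICKS_JOB_ID"]           # ID of the ingestion job

def lambda_handler(event, context):
    # Each S3 event notification carries the object(s) that triggered it.
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        print(f"Triggered by s3://{record['s3']['bucket']['name']}/{key}")

    # Kick off the Databricks ingestion job via the Jobs run-now API.
    req = urllib.request.Request(
        f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
        data=json.dumps({"job_id": int(JOB_ID)}).encode(),
        headers={
            "Authorization": f"Bearer {DATABRICKS_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```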

2 Answers


  1. I hope I've understood your issue correctly, but it sounds like your Lambda is working as configured: with an S3 trigger, the Lambda is invoked on every upload, so you get one invocation per file.

    If you want to reduce how often your Lambda runs, set up an EventBridge trigger instead: an EventBridge CRON schedule can invoke the Lambda at a defined interval to check the bucket for new files. You could then send all the files to your Databricks job in bulk rather than individually, as sketched below.
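
    A rough sketch of what the scheduled Lambda could look like, assuming a placeholder bucket and prefix; `trigger_databricks_job` stands in for your existing run-now call:

    ```python
    # Sketch of a Lambda invoked by an EventBridge CRON schedule.
    # Bucket and prefix names are placeholders.
    import boto3

    s3 = boto3.client("s3")

    BUCKET = "my-appflow-bucket"     # placeholder
    PREFIX = "salesforce/landing/"   # placeholder: the AppFlow drop prefix

    def trigger_databricks_job(keys):
        # Placeholder: call the Databricks Jobs run-now API here,
        # passing the batch of keys as job parameters if useful.
        ...

    def lambda_handler(event, context):
        # Gather everything currently sitting under the prefix.
        # (list_objects_v2 returns up to 1000 keys; paginate if you expect more.)
        resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
        keys = [obj["Key"] for obj in resp.get("Contents", [])]

        if not keys:
            return {"status": "nothing to do"}

        # One Databricks run for the whole batch instead of one per file.
        # In practice you'd also mark these keys as processed, e.g. by
        # moving them to a 'processed/' prefix, so the next scheduled run
        # doesn't pick them up again.
        trigger_databricks_job(keys)
        return {"status": "triggered", "files": len(keys)}
    ```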

  2. Amazon AppFlow publishes a flow notification when a flow is complete. From the Amazon AppFlow documentation:

    Amazon AppFlow is integrated with Amazon CloudWatch Events to publish events related to the status of a flow. The following flow events are published to your default event bus.

    AppFlow End Flow Run Report: This event is published when a flow run is complete.

    You could trigger the Lambda function when this event is published. That way, it is invoked only once, when the flow is complete.
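
    A sketch of setting up such a rule with boto3; the rule name, function ARN, and the commented-out flow-name filter are placeholders (inspect a real event payload to confirm the exact detail field names):

    ```python
    # Sketch: create an EventBridge rule that invokes the Lambda only when
    # the AppFlow run completes. Rule and function names are placeholders.
    import json
    import boto3

    events = boto3.client("events")

    pattern = {
        "source": ["aws.appflow"],
        "detail-type": ["AppFlow End Flow Run Report"],
        # Optionally narrow to one flow; the detail field name below is an
        # assumption -- check an actual event to confirm it.
        # "detail": {"flow-name": ["my-salesforce-flow"]},
    }

    events.put_rule(
        Name="appflow-run-complete",
        EventPattern=json.dumps(pattern),
    )

    events.put_targets(
        Rule="appflow-run-complete",
        Targets=[{
            "Id": "1",
            "Arn": "arn:aws:lambda:us-east-1:123456789012:function:ingest-trigger",
        }],
    )
    # You'd also need to grant EventBridge permission to invoke the function
    # (lambda add-permission with principal events.amazonaws.com).
    ```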
