I have an Amazon S3 bucket.

Every minute or so, a new file is added to that bucket.

I need a program to retrieve those files (into program memory, not to disk) for processing.

The program is launched manually, and on each launch it should process only files that have not been processed before.

Is there a function somewhere in the S3 SDK that lets me write it as easily as

var newFiles = s3client.GetFilesAfter(timestamp);

or will I have to write a more involved solution?

EDIT: I tagged this C#, but if the API differs markedly between languages, I am open to solutions in other languages.

2 Answers


  1. Amazon S3 Event Notifications can be automatically triggered when objects are created/modified/deleted.

    The event can:

    • Send a message to an Amazon Simple Notification Service topic
    • Push a message into an Amazon Simple Queue Service queue
    • Invoke an AWS Lambda function
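    As a rough illustration of wiring this up, the helper below builds the NotificationConfiguration document that boto3's put_bucket_notification_configuration accepts, for any of the three destinations above. It is only a sketch: the helper name, the optional key prefix and the choice of the s3:ObjectCreated:* event are assumptions for this example, and the destination ARN has to be your own.

```python
# Maps each destination kind to the NotificationConfiguration list name and
# the ARN field S3 expects inside each entry of that list.
_DESTINATIONS = {
    "sns": ("TopicConfigurations", "TopicArn"),
    "sqs": ("QueueConfigurations", "QueueArn"),
    "lambda": ("LambdaFunctionConfigurations", "LambdaFunctionArn"),
}


def object_created_notification(kind, arn, prefix=None):
    """Build a NotificationConfiguration that fires whenever an object is created.

    Hypothetical helper: `kind` is "sns", "sqs" or "lambda"; `arn` is the ARN
    of your topic, queue or function.
    """
    list_name, arn_field = _DESTINATIONS[kind]
    entry = {arn_field: arn, "Events": ["s3:ObjectCreated:*"]}
    if prefix:
        # Only react to keys under this prefix, e.g. "incoming/".
        entry["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {list_name: [entry]}
```

    You would then pass the result as the NotificationConfiguration argument of boto3.client("s3").put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=...). Be aware that this call replaces the bucket's entire existing notification configuration, and that the destination also needs a policy allowing S3 to publish to it (a topic policy, a queue policy or a Lambda resource-based permission).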

    The easiest method would be to invoke an AWS Lambda function that runs your code (assuming your code can run in Lambda). The function receives the bucket name and key of the object that triggered it, so your code will typically process just that one object, but it will do so immediately after the object is created.
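    A minimal Python sketch of such a Lambda function, assuming the S3 trigger is already configured: it reads each triggering object straight into memory with get_object rather than saving it to disk. The process function is a hypothetical placeholder for your own logic. Object keys arrive URL-encoded in the event (a space becomes +), hence the unquote_plus call.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys are URL-encoded in the notification, so decode them first.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; it is imported lazily
    # so extract_s3_objects can be exercised without it.
    import boto3

    s3 = boto3.client("s3")
    for bucket, key in extract_s3_objects(event):
        # Read the whole object into memory; no local file is written.
        data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        process(key, data)


def process(key, data):
    # Hypothetical placeholder for your own processing logic.
    print(f"{key}: {len(data)} bytes")
```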

    Alternatively, if you want to keep running your existing code, it will need to:

    • List the entire contents of the bucket
    • Compare the listing against the previous run time
    • Determine which objects have been created/modified
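    Those steps can be sketched in Python as follows. This assumes the listing has already been fetched (for example with ListObjectsV2) as dicts holding a Key and a timezone-aware LastModified, and that the previous run time is kept in a small local file; the file name and all function names here are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def load_checkpoint(path):
    """Return the previous run's checkpoint, or the epoch on the first run."""
    p = Path(path)
    if not p.exists():
        return EPOCH
    return datetime.fromisoformat(p.read_text().strip())


def save_checkpoint(path, ts):
    """Persist the checkpoint as an ISO-8601 string for the next launch."""
    Path(path).write_text(ts.isoformat())


def select_new_objects(listing, last_run):
    """Pick objects modified after last_run, oldest first, plus the new checkpoint.

    `listing` holds dicts shaped like ListObjectsV2 "Contents" entries: a "Key"
    and a timezone-aware "LastModified" datetime.
    """
    new = sorted(
        (obj for obj in listing if obj["LastModified"] > last_run),
        key=lambda obj: obj["LastModified"],
    )
    # Advance to the newest timestamp actually seen rather than the wall-clock
    # time of this run, so objects missing from this listing are less likely
    # to be skipped. Caveat: an object whose LastModified is not after the
    # checkpoint (e.g. it shares its second) counts as already processed;
    # remembering processed keys is more robust if that matters.
    checkpoint = new[-1]["LastModified"] if new else last_run
    return new, checkpoint
```

    On each launch you would call load_checkpoint("checkpoint.txt"), fetch the listing, process the returned objects in order, and only then call save_checkpoint with the returned checkpoint, so a crash mid-run leads to reprocessing rather than skipping.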
  2. You could use the S3 API through the AWS CLI to list the objects modified after a certain timestamp, using the --query parameter:

    aws s3api list-objects-v2 --bucket "$bucket" \
        --query 'Contents[?LastModified > `2023-05-30`]'

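    But there is a catch: ListObjectsV2 returns at most 1,000 keys per request, along with a continuation token for the next page, so the entire bucket still has to be listed page by page until the last key. The AWS CLI follows the pages automatically, but SDK code has to loop over them (or use a paginator). Because --query is a JMESPath filter applied on the client side after the results are downloaded, you may see performance issues in buckets with a very large number of keys.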
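    The same list-and-filter approach can be sketched with boto3 (the Python SDK), whose list_objects_v2 paginator follows the continuation tokens for you. The client is passed in so the function can be exercised without AWS, and the function name is illustrative. As with --query, the date filter runs locally, after every page has been downloaded.

```python
from datetime import datetime, timezone


def list_modified_after(s3_client, bucket: str, since: datetime):
    """Yield (key, last_modified) for objects in `bucket` modified after `since`.

    The paginator keeps requesting pages (up to 1,000 keys each) until the
    listing is complete, so this still walks the whole bucket.
    """
    # boto3 returns timezone-aware UTC datetimes; treat a naive `since` as UTC
    # so the comparison below does not raise TypeError.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)

    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket (or empty page) has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            if obj["LastModified"] > since:
                yield obj["Key"], obj["LastModified"]
```

    For example, iterating over list_modified_after(boto3.client("s3"), bucket, datetime(2023, 5, 30, tzinfo=timezone.utc)) yields the same objects as the CLI command above, one page at a time.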

    The lighter-weight approach is to use S3 Event Notifications, which can trigger the subsequent processing.

    You can create any of the event notification triggers described in the AWS documentation pages "Amazon S3 Event Notifications" and "Event notification types and destinations".

    You can either:

    • send all events to a messaging service (an SNS topic or an SQS queue) and configure that to trigger a Lambda function, or
    • invoke a Lambda function directly from the S3 events.
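    Because the program in the question is launched manually, one variation not spelled out above is to skip Lambda: point the S3 notification at an SQS queue and have the program drain the queue on each launch, so new objects are found without comparing timestamps. A rough sketch, assuming S3 delivers directly to the queue (messages routed through SNS are wrapped in an SNS envelope unless raw message delivery is enabled); the handle callback and function names are hypothetical. Standard queues deliver at least once, so handling should tolerate an occasional duplicate, and messages expire after the queue's retention period (4 days by default).

```python
import json
from urllib.parse import unquote_plus


def keys_from_message_body(body):
    """Return (bucket, key) pairs from an S3 event delivered directly to SQS.

    S3 also sends an "s3:TestEvent" message (no "Records") when the
    notification is first configured; that yields an empty list.
    """
    event = json.loads(body)
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def drain_queue(sqs_client, queue_url, handle):
    """Handle messages until a receive call returns none, then return.

    A message is deleted only after `handle` succeeds; if the program crashes
    first, the message becomes visible again after the visibility timeout.
    """
    while True:
        resp = sqs_client.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=1
        )
        messages = resp.get("Messages", [])
        if not messages:
            return
        for msg in messages:
            for bucket, key in keys_from_message_body(msg["Body"]):
                handle(bucket, key)
            sqs_client.delete_message(
                QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
            )
```

    With boto3 you would pass boto3.client("sqs") and your queue URL, and have handle fetch each object into memory with get_object.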

    Examples:
    https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
