I have an Amazon S3 bucket.
Every minute or so, a new file is added to that bucket.
I need a program to retrieve those files (into program memory, not to disk) for processing.
The program is launched manually, and on each launch it should process only files that have not been processed before.
Is there a function somewhere in the S3 SDK that lets me write it as easily as
var newFiles = s3client.GetFilesAfter(timestamp);
or will I have to write a more involved solution?
EDIT: I tagged this C# but if the API is markedly different in different languages, I am open to solutions in other languages.
2 Answers
Amazon S3 Event Notifications can be automatically triggered when objects are created/modified/deleted.
The event can:
The easiest method would be to invoke an AWS Lambda function that can (hopefully) run your code. The function will be passed the Bucket and Key of the object that triggered the function so your code will likely just process the one object, but it will happen immediately after the object is created.
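As a rough illustration of the Lambda route, here is a minimal Python sketch of a handler that pulls the bucket and key out of the S3 event and reads the object into memory. The helper name extract_s3_objects, the optional s3_client parameter, and the "record the size" placeholder are assumptions made for this example; the real processing step would replace the placeholder.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context, s3_client=None):
    """Read each newly created object into memory and 'process' it."""
    if s3_client is None:
        # Assumption: running inside the AWS Lambda Python runtime,
        # where boto3 is available.
        import boto3
        s3_client = boto3.client("s3")

    sizes = {}
    for bucket, key in extract_s3_objects(event):
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        sizes[key] = len(body)  # placeholder for the real processing
    return sizes
```

The s3_client parameter is only there so the handler can be exercised locally with a stand-in client; Lambda itself calls the handler with just the event and context.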
Alternatively, if you want to keep running your existing code, it will need to:
You could use the S3 API to list the objects modified after a certain timestamp, for example by filtering the listing with the AWS CLI's --query parameter. The challenge is that each listing call returns at most 1000 keys plus a pagination marker, so you have to keep iterating until the last key. With a huge number of keys you may run into performance problems.
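The listing approach can be sketched in Python roughly as follows. This is a sketch under assumptions, not a built-in "get files after" call: the function names are made up for the example, s3_client is expected to behave like boto3.client("s3"), and the timestamp filter runs on the client side after each page is fetched, so every key in the bucket is still listed.

```python
from datetime import datetime, timezone


def list_keys_modified_after(s3_client, bucket, cutoff):
    """Return keys in `bucket` whose LastModified is later than `cutoff`.

    `cutoff` must be a timezone-aware datetime, because the listing
    reports LastModified as a timezone-aware value.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    new_keys = []
    # Each page holds at most 1000 keys; the paginator follows the
    # continuation token until the listing is exhausted.
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if obj["LastModified"] > cutoff:
                new_keys.append(obj["Key"])
    return new_keys


def read_object_bytes(s3_client, bucket, key):
    """Read one object fully into memory without touching disk."""
    return s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()


# Hypothetical usage (bucket name and cutoff are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
#   for key in list_keys_modified_after(s3, "my-bucket", cutoff):
#       data = read_object_bytes(s3, "my-bucket", key)
```

To process only unseen files on each manual launch, the program would also need to persist the cutoff (or the last processed key) between runs; where to store it is left open here.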
A lightweight approach would be to use S3 Event Notifications, which can trigger the subsequent processing.
You can create any kind of event notification trigger as defined in "AWS S3 Event Notifications" and "Event notification types and destinations".
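For illustration, a notification that invokes a Lambda function on every object creation could be configured along these lines. This is a sketch only: the helper name, bucket name, and function ARN are placeholders, and the Lambda function must separately grant S3 permission to invoke it.

```python
def build_lambda_notification(function_arn, events=("s3:ObjectCreated:*",)):
    """Build a NotificationConfiguration dict targeting one Lambda function."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": list(events),
            }
        ]
    }


# Hypothetical usage with boto3 (names are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_notification_configuration(
#       Bucket="my-bucket",
#       NotificationConfiguration=build_lambda_notification(
#           "arn:aws:lambda:us-east-1:123456789012:function:process-file"
#       ),
#   )
```

Note that applying a notification configuration replaces whatever configuration the bucket already has, so existing rules need to be included in the same call.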
You can either choose
Examples:
https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html