I’ve taken a script written by Paul Davies about reingesting Splunk Logs from the AWS Cloud.
When my logs fail to process in Kinesis Firehose, they get placed in a backup S3 bucket. The current format of the key is the following:
Folder/Folder/Year/Month/Day/HH/failedlogs
Example:
splunk-kinesis-firehose/splunk-failed/2023/01/01/01/failedlogs.gz
The key lookup in the script is set like this:
key=urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
Is there a way to get all the files within the S3 bucket under the subfolder – splunk-kinesis-firehose – or is there a better way of looping through all the folders?
2 Answers
As John Rotenshtein says, your Lambda function, if invoked by an S3 trigger, will receive the key as part of the request. You could also invoke the Lambda manually and pass the key in the request.
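For instance, a manual invocation could pass a synthetic S3-style event so your existing key lookup keeps working. This is only a sketch: the function name and bucket name below are placeholders, not values from your setup.

import json
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical function and bucket names -- replace with your own.
payload = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-backup-bucket"},
                "object": {"key": "splunk-kinesis-firehose/splunk-failed/2023/01/01/01/failedlogs.gz"},
            }
        }
    ]
}

# Asynchronous invocation with the synthetic event as the payload
lambda_client.invoke(
    FunctionName="splunk-reingest-lambda",
    InvocationType="Event",
    Payload=json.dumps(payload).encode("utf-8"),
)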
But if, for some reason, you want to do a full (or partial) listing under a path, then please take a look at s3list(), which I describe in this SO post. It is a fairly general S3 lister. In your case, you would call it with the splunk-kinesis-firehose prefix to get all the objects under that path, or narrow the prefix to get just the files for, say, the month of May 2023, as sketched below.
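A minimal sketch of those two calls, assuming s3list accepts a boto3 Bucket resource and a path prefix (check the linked post for the exact signature; the bucket name is a placeholder):

import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("my-backup-bucket")  # hypothetical bucket name

# Everything under the failed-logs prefix
for obj in s3list(bucket, "splunk-kinesis-firehose"):
    print(obj)

# Only the objects for May 2023
for obj in s3list(bucket, "splunk-kinesis-firehose/splunk-failed/2023/05"):
    print(obj)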
Note that s3list is a generator: you can start listing a trillion objects and stop whenever you like (internally, it goes by chunks of up to 1,000 objects per call to AWS).

To list objects in an Amazon S3 bucket, you can use the client method list_objects_v2 (see the Boto3 documentation):
Or you can use the resource method, which is a bit more Pythonic:
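For example (again with a placeholder bucket name):

import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("my-backup-bucket")  # hypothetical bucket name

# objects.filter handles pagination for you
for obj in bucket.objects.filter(Prefix="splunk-kinesis-firehose/"):
    print(obj.key)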