
I’ve taken a script written by Paul Davies for reingesting Splunk logs from the AWS Cloud.


When my logs fail to process in Kinesis Firehose, they are placed in a backup S3 bucket. The current format of the key is:

Folder/Folder/Year/Month/Day/HH/failedlogs

Example:

splunk-kinesis-firehose/splunk-failed/2023/01/01/01/failedlogs.gz

The key lookup in the script is set like this:

key=urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')

Is there a way to get all the files in the S3 bucket under the subfolder splunk-kinesis-firehose, or is there a better way of looping through all the folders?

2 Answers


  1. As John Rotenshtein says, your Lambda function, if invoked by an S3 trigger, will receive the key as part of the request. You could also invoke the Lambda manually and pass the key in the request.
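
    For example, if you invoke the Lambda manually, you can pass a minimal test event that mimics the S3 trigger payload (the bucket name and key below are just placeholders), and the existing key lookup line will work unchanged:

    # Hypothetical test event mimicking the S3 notification structure;
    # bucket name and key are placeholders.
    test_event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "your-bucket-name"},
                    "object": {"key": "splunk-kinesis-firehose/splunk-failed/2023/01/01/01/failedlogs.gz"}
                }
            }
        ]
    }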

    But if, for some reason, you want to do a full (or partial) listing under a path, then take a look at s3list(), which I describe in this SO post. It is a fairly general S3 lister. In your case, you would call it with:

    bucket = boto3.resource('s3').Bucket('bucket-name')
    path = 'splunk-kinesis-firehose/splunk-failed'
    
    for s3obj in s3list(bucket, path, list_dirs=False):
        key = s3obj.key
        ...
    

    to get all the objects under that path, or, for example:

    for s3obj in s3list(bucket, path, start='2023/05/01', end='2023/06', list_dirs=False):
        key = s3obj.key
        ...
    

    to get just the files for the month of May 2023.

    Note that s3list is a generator: you can start listing a trillion objects and stop whenever you like (internally, it goes in chunks of up to 1000 objects per call to AWS).
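
    For reference, here is a rough sketch of how such a generator-based lister might be implemented (this is not the code from the linked post, just an illustration built on boto3's object collections, which paginate transparently):

    def s3list(bucket, path, start=None, end=None, list_dirs=True):
        # Rough sketch, not the implementation from the linked post.
        # Lazily yields ObjectSummary objects under `path`, optionally bounded
        # by `start`/`end` key suffixes (keys come back in lexicographic order).
        prefix = path.rstrip('/') + '/'
        kwargs = {'Prefix': prefix}
        if start:
            # Begin the listing just after this key.
            kwargs['Marker'] = prefix + start
        for obj in bucket.objects.filter(**kwargs):
            if end and obj.key >= prefix + end:
                break  # past the requested range, so stop early
            if not list_dirs and obj.key.endswith('/'):
                continue  # skip zero-byte "directory" placeholder objects
            yield obj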

  2. To list objects in an Amazon S3 bucket, you can use the client method list_objects_v2 (see the Boto3 documentation):

    import boto3
    
    s3_client = boto3.client('s3')
    
    response = s3_client.list_objects_v2(
        Bucket='your-bucket-name',
        Prefix='splunk-kinesis-firehose',
    )
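
    Note that list_objects_v2 returns at most 1,000 keys per call. For larger listings you can let a paginator handle the continuation tokens, for example:

    # Pages through the results; each page holds up to 1,000 keys.
    paginator = s3_client.get_paginator('list_objects_v2')

    for page in paginator.paginate(Bucket='your-bucket-name', Prefix='splunk-kinesis-firehose'):
        for obj in page.get('Contents', []):
            print(obj['Key'])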
    

    Or you can use the resource method, which is a bit more Pythonic:

    import boto3
    
    s3_resource = boto3.resource('s3')
    
    bucket = s3_resource.Bucket('your-bucket-name')
    
    for obj in bucket.objects.all():
        print(obj.key)
    
    
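    To restrict the resource-based listing to the prefix from the question, you could filter the collection instead of listing the whole bucket:

    # Only keys under the splunk-kinesis-firehose prefix.
    for obj in bucket.objects.filter(Prefix='splunk-kinesis-firehose'):
        print(obj.key)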