I have this path in S3: object1/object2/object3/object4/
In object4/
I have a list of objects, for example:
directory1/directory2/directory3/directory4/2022-30-09-15h21/
directory1/directory2/directory3/directory4/2023-20-12-12h30/
directory1/directory2/directory3/directory4/2022-31-12-09h34/
directory1/directory2/directory3/directory4/2023-12-08-14h56/
I would like to select the last created directory in directory4/
and then download all the files inside it.
I wrote this script to do it:
import boto3
from datetime import datetime
session_root = boto3.Session(region_name='eu-west-3', profile_name='my_profile')
s3_client = session_root.client('s3')
bucket_name = 'my_bucket'
prefix = 'object1/object2/object3/object4/'
# List objects in the bucket
response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
# Extract the object keys together with their LastModified timestamps
objects_with_dates = [(obj['Key'], obj['LastModified']) for obj in response.get('Contents', [])]
# Find the latest created object
latest_object = max(objects_with_dates, key=lambda x: x[1])
print("Last created S3 object:", latest_object[0]) # the returned value is: object1/object2/object3/object4/2023-20-12-12h30/my_file.csv
My script selects the last created directory in directory4/
but it only finds the last created file inside it; the result of my script is: directory1/directory2/directory3/directory4/2023-20-12-12h30/my_file.csv
But I would like to download all the files inside.
Do you have any idea how I can modify my script to select the last created directory in directory4/
and download all the files inside?
Thanks
2 Answers
A way to select the last created object in your S3 bucket is to build a catalog in DynamoDB: use a Lambda (together with S3 Object Lambda) to save each object's key and timestamp into a DynamoDB table, with the index placed on the modified/change time.
Of course you can use another database than DynamoDB, but DynamoDB is very cheap to start with, and you can always change databases later once you know what makes sense; with the on-demand option, DynamoDB only costs money when you use it.
It's a little more complex than what you asked for, but if you have 100,000,000 objects in your S3 bucket you pay for every list scan and object lookup, so mistakes can get very expensive; that is why I recommend S3 Object Lambda ( https://aws.amazon.com/s3/features/object-lambda/ )
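The catalog idea above could be sketched as an S3-event-triggered Lambda. Everything here is an assumption for illustration: the table name "s3-object-catalog", its schema, and the use of the event's `eventTime` as the sort value. `boto3` is imported lazily inside the handler so the event-parsing helper stays plain Python:

```python
# Sketch: a Lambda triggered by s3:ObjectCreated events that writes each
# new object's key and event time into a DynamoDB table (hypothetical
# name "s3-object-catalog"), so "latest object" becomes a cheap query.

TABLE_NAME = "s3-object-catalog"  # hypothetical table name


def extract_items(event):
    """Pure helper: turn an S3 event payload into DynamoDB items."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        items.append({
            "bucket": s3["bucket"]["name"],
            "key": s3["object"]["key"],
            "event_time": record["eventTime"],  # ISO-8601, sorts lexically
        })
    return items


def lambda_handler(event, context):
    import boto3  # imported lazily so extract_items is testable without AWS
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    for item in extract_items(event):
        table.put_item(Item=item)
```

With the event time as a sort key (or a secondary index on it), finding the newest object no longer requires listing the bucket at all.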
It appears that your requirement is: find the most recent sub-directory under
directory1/directory2/directory3/directory4/
where the sub-directory names are in the
YYYY-DD-MM-HHhmm
format, then download all files inside it. Here is a sample program that uses the list of
CommonPrefixes
returned by S3, which is effectively a list of sub-directories.
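A sketch of such a program, reusing the bucket and profile names from your question as assumptions. Passing Delimiter='/' makes list_objects_v2 return the immediate "sub-directories" as CommonPrefixes; the latest one is chosen by parsing the directory name itself (YYYY-DD-MM-HHhmm), not LastModified. boto3 is only used inside the download function, so the prefix-selection helpers are plain Python:

```python
from datetime import datetime

BUCKET = "my_bucket"  # assumption, from the question
PREFIX = "directory1/directory2/directory3/directory4/"


def parse_stamp(prefix):
    """Parse '.../2023-20-12-12h30/' into a datetime (YYYY-DD-MM-HHhmm)."""
    name = prefix.rstrip("/").rsplit("/", 1)[-1]
    return datetime.strptime(name, "%Y-%d-%m-%Hh%M")


def latest_prefix(prefixes):
    """Return the sub-directory prefix whose name is the newest timestamp."""
    return max(prefixes, key=parse_stamp)


def download_latest(bucket=BUCKET, prefix=PREFIX):
    import os
    import boto3
    s3 = boto3.Session(profile_name="my_profile").client("s3")
    # Delimiter='/' makes S3 return the immediate "sub-directories"
    # as CommonPrefixes instead of every object below the prefix.
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    target = latest_prefix([p["Prefix"] for p in resp["CommonPrefixes"]])
    # Download every object under the chosen sub-directory (paginated,
    # since a single list call returns at most 1000 keys).
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=target):
        for obj in page.get("Contents", []):
            filename = os.path.basename(obj["Key"])
            if filename:  # skip the zero-byte "directory" marker itself
                s3.download_file(bucket, obj["Key"], filename)
```

Note that comparing the parsed directory names is what makes 2023-20-12-12h30 (December 20) beat 2023-12-08-14h56 (August 12); a plain string sort on your YYYY-DD-MM names would pick the wrong one.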