I want to retrieve files with a specific tag from an AWS S3 bucket using Python.
I have a working solution using boto3 and loops, but it takes a very long time, depending on the number of files in the bucket.
Is there any direct method that accepts a tag and a bucket name and returns only the files that match? The files could be of any type: pdf, docx, png, etc. I would prefer to do this with boto3, but I am open to any other library that solves the problem.
The current Python version is 3.10.7.
2 Answers
I have not tried this in Python. However, using the Java SDK, you can call this method:
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3Client.html#getObjectTagging(software.amazon.awssdk.services.s3.model.GetObjectTaggingRequest)
The GetObjectTaggingRequest does not let you filter on a specific tag (e.g., Outdoors). It returns the full tag set for one object, and you then check that tag set for the tag you want.
So the answer to your question:
"Is there any direct method that accepts tag and bucket name and return only those files that fulfill the criteria"
The answer is no. You need to loop through the objects and check each one's result set (i.e., its list of tags).
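In Python, the same loop can be sketched with boto3's list_objects_v2 paginator and get_object_tagging. This is a minimal, untested-against-AWS sketch, not a faster method: it still makes one tagging request per object. The function name keys_with_tag is hypothetical, and the client is passed in as an argument (for example boto3.client("s3")).

```python
from typing import Any, Iterator


def keys_with_tag(
    s3_client: Any, bucket: str, tag_key: str, tag_value: str
) -> Iterator[str]:
    """Yield the keys in `bucket` whose tag set contains tag_key=tag_value.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # The paginator follows continuation tokens, so buckets with more than
    # 1000 objects are listed completely.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent when a page (or the whole bucket) is empty.
        for obj in page.get("Contents", []):
            tagging = s3_client.get_object_tagging(Bucket=bucket, Key=obj["Key"])
            tags = {t["Key"]: t["Value"] for t in tagging["TagSet"]}
            if tags.get(tag_key) == tag_value:
                yield obj["Key"]
```

With a real client, a call would look like list(keys_with_tag(boto3.client("s3"), "your-bucket", "location", "Outdoors")). Because it is a generator, matching keys become available as soon as they are found instead of after the whole bucket has been scanned.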
The Java logic to get the tags for an object in a specific bucket is:
You should be able to limit access by assuming a role whose permissions are restricted to objects carrying the tag in question; in this example, a tag called "location" with the value "Outdoors". I haven’t had time to verify this code, but the concept should help.
This assumes a role called role-with-get-perms already exists and has permissions to get all objects in the bucket your-bucket. You can read more about sts assume_role here. There it explains that the Policy attribute in the request:
So the following code should do as you wish, with some modifications to suit your context.
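A minimal, unverified sketch of that idea with boto3 might look like the following. It passes an inline session policy to sts assume_role that allows s3:GetObject only on objects whose "location" tag equals "Outdoors", using the s3:ExistingObjectTag condition key. The helper names, the session name, and the role ARN in the usage note are assumptions made for illustration; the STS client is passed in as an argument (for example boto3.client("sts")).

```python
import json
from typing import Any, Dict


def tag_scoped_policy(bucket: str, tag_key: str, tag_value: str) -> str:
    """Build a session policy allowing GetObject only on objects with the tag."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
                },
            }
        ],
    }
    return json.dumps(policy)


def assume_tag_scoped_role(
    sts_client: Any, role_arn: str, bucket: str, tag_key: str, tag_value: str
) -> Dict[str, Any]:
    """Assume `role_arn` with the tag-scoped session policy; return credentials.

    `sts_client` is assumed to be a boto3 STS client. The session name below
    is an arbitrary placeholder.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName="tag-scoped-session",
        Policy=tag_scoped_policy(bucket, tag_key, tag_value),
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken and Expiration.
    return response["Credentials"]
```

The returned credentials can then be used to build a restricted S3 client, for example boto3.client("s3", aws_access_key_id=creds["AccessKeyId"], aws_secret_access_key=creds["SecretAccessKey"], aws_session_token=creds["SessionToken"]). Note that this does not turn S3 into a tag index: a get_object call on an object without the tag should fail with an access-denied error rather than be filtered out of a listing, so each key still has to be attempted, and listing keys would need s3:ListBucket allowed in both the role and the session policy.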