
I want to retrieve files that carry a specific tag from an AWS S3 bucket using Python.
I already have a solution using boto3 and loops, but it takes a very long time, depending on the number of files in the bucket.
Is there any direct method that accepts a tag and a bucket name and returns only the files that meet the criteria? The files could be of any type (pdf, docx, png, etc.). I would prefer to achieve this with boto3, but I am open to any other library that can solve my problem.

Current Python version is 3.10.7

2 Answers


  1. I have not tried this in Python. However, using the Java SDK, you can call this method:

    https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3Client.html#getObjectTagging(software.amazon.awssdk.services.s3.model.GetObjectTaggingRequest)

    GetObjectTaggingRequest does not let you filter on a specific tag (e.g. Outdoors). It returns the tag set of a single object, which you can then check for the tag you want.

    So, to answer your question:

    "Is there any direct method that accepts a tag and a bucket name and returns only the files that meet the criteria?"

    The answer is no. You need to request the tag set of each object and loop through it (i.e. the list of tags).

    The Java logic to get the tags for an object in a specific bucket is:

    // Returns true if the S3 object carries the tag key we are looking for.
    public boolean tagCheck(String bucketName, String keyName) {
        // Use the bucket passed in rather than a hard-coded constant.
        List<Tag> tags = getObjectTags(bucketName, keyName);
        for (Tag tag : tags) {
            if (tag.key().equals(PhotoApplication.REKOGNITION_TAG_KEY)) {
                return true;
            }
        }
        return false;
    }

    // Fetches the full tag set of a single object.
    private List<Tag> getObjectTags(String bucketName, String keyName) {
        S3Client s3 = getClient();
        GetObjectTaggingRequest request = GetObjectTaggingRequest.builder()
            .bucket(bucketName)
            .key(keyName)
            .build();

        GetObjectTaggingResponse response = s3.getObjectTagging(request);
        return response.tagSet();
    }
    
  2. You should be able to restrict access by assuming a role with a session policy that only permits reading objects carrying the tag in question; in this example, a tag called "location" with the value "Outdoors". I haven't had time to verify this code, but the concept should help.

    This assumes a role called role-with-get-perms already exists and has permission to list and get all objects in the bucket your-bucket. You can read more in the documentation for STS assume_role, which describes the Policy parameter of the request as follows:

    An IAM policy in JSON format that you want to use as an inline session policy.

    This parameter is optional. Passing policies to this operation returns new temporary credentials. The resulting session’s permissions are the intersection of the role’s identity-based policy and the session policies. You can use the role’s temporary credentials in subsequent Amazon Web Services API calls to access resources in the account that owns the role. You cannot use session policies to grant more permissions than those allowed by the identity-based policy of the role that is being assumed. For more information, see Session Policies in the IAM User Guide.

    So the following code should do what you want, with some modifications to suit your context.

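    import json

    import boto3
    from botocore.exceptions import ClientError


    def build_session_policy(bucket, tag_value):
        """Allow listing the bucket, but GetObject only where location == tag_value."""
        # Built with json.dumps because str.format() would choke on the JSON braces.
        return json.dumps({
            "Version": "2012-10-17",
            "Statement": [
                {
                    # Listing must be allowed here too: a session policy can only
                    # narrow the role's permissions, so anything omitted is denied.
                    "Effect": "Allow",
                    "Action": ["s3:ListBucket"],
                    "Resource": f"arn:aws:s3:::{bucket}",
                },
                {
                    "Effect": "Allow",
                    "Action": ["s3:GetObject"],
                    "Resource": f"arn:aws:s3:::{bucket}/*",
                    "Condition": {
                        "StringEquals": {"s3:ExistingObjectTag/location": tag_value}
                    },
                },
            ],
        })


    def main():
        sts = boto3.client("sts")
        bucket = "your-bucket"
        tag = "Outdoors"
        assumed = sts.assume_role(
            RoleArn="arn:aws:iam::123412341234:role/role-with-get-perms",
            RoleSessionName=f"GetObjectsTagged{tag}",
            Policy=build_session_policy(bucket, tag),
        )
        creds = assumed["Credentials"]
        # list_objects_v2 and get_object are client methods, not resource methods.
        s3 = boto3.client(
            "s3",
            aws_access_key_id=creds["AccessKeyId"],
            aws_secret_access_key=creds["SecretAccessKey"],
            aws_session_token=creds["SessionToken"],
        )
        objects = []
        for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
            for item in page.get("Contents", []):
                try:
                    objects.append(s3.get_object(Bucket=bucket, Key=item["Key"]))
                except ClientError as err:
                    # Objects without the matching tag are rejected by the session policy.
                    if err.response["Error"]["Code"] != "AccessDenied":
                        raise
        return objects


    if __name__ == "__main__":
        main()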
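
For reference, the per-object tag check described in the first answer could be sketched in Python roughly as follows. This is only an illustration, not a verified solution: the helper names (`iter_keys`, `has_tag`, `keys_with_tag`) and the `max_workers` default are made up for this example, and it assumes `s3` is a boto3 S3 client (or anything exposing the same `get_paginator` and `get_object_tagging` calls). It still makes one tagging request per object; the thread pool only overlaps those requests so the total wall-clock time may drop.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_keys(s3, bucket):
    """Yield every object key in the bucket, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket (or empty page) has no "Contents" entry.
        for item in page.get("Contents", []):
            yield item["Key"]


def has_tag(s3, bucket, key, tag_key, tag_value):
    """Return True if the object's tag set contains tag_key=tag_value."""
    tag_set = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    return any(t["Key"] == tag_key and t["Value"] == tag_value for t in tag_set)


def keys_with_tag(s3, bucket, tag_key, tag_value, max_workers=16):
    """Return the keys whose tags match, checking objects concurrently."""
    keys = list(iter_keys(s3, bucket))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so flags line up with keys.
        flags = pool.map(
            lambda k: has_tag(s3, bucket, k, tag_key, tag_value), keys
        )
        return [k for k, ok in zip(keys, flags) if ok]
```

With real credentials this might be called as `keys_with_tag(boto3.client("s3"), "your-bucket", "location", "Outdoors")`. Passing the client in as a parameter keeps the helpers easy to exercise with a stub, and a single client is shared across threads on the assumption that boto3 low-level clients are safe to use that way.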
    