
A third-party application uploads around 10,000 objects per day to my bucket+prefix. My requirement is to fetch all objects that were uploaded to my bucket+prefix in the last 24 hours.

There are a huge number of files in my bucket+prefix.

So I assume that when I call

response = s3_paginator.paginate(Bucket=bucket,Prefix='inside-bucket-level-1/', PaginationConfig={"PageSize": 1000})

then it may make multiple calls to the S3 API, and that may be why it is showing the Rate Exceeded error.

Below is my Python Lambda function.

import json
import boto3
import time
from datetime import datetime, timedelta


def lambda_handler(event, context):
    s3 = boto3.client("s3")
    from_date = datetime.today() - timedelta(days=1)
    string_from_date = from_date.strftime("%Y-%m-%d, %H:%M:%S")
    print("Date :", string_from_date)
    s3_paginator = s3.get_paginator('list_objects_v2')
    list_of_buckets = ['kush-dragon-data']
    bucket_wise_list = {}
    for bucket in list_of_buckets:

        response = s3_paginator.paginate(Bucket=bucket,Prefix='inside-bucket-level-1/', PaginationConfig={"PageSize": 1000})

        filtered_iterator = response.search(
            "Contents[?to_string(LastModified)>='\"" + string_from_date + "\"'].Key")

        keylist = []
        for key_data in filtered_iterator:

            if "/" in key_data:
                splitted_array = key_data.split("/")
                if len(splitted_array) > 1:
                    if splitted_array[-1]:
                        keylist.append(splitted_array[-1])
            else:
                keylist.append(key_data)

        bucket_wise_list.update({bucket: keylist})

    print("Total Number Of Object = ", bucket_wise_list)

    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps(bucket_wise_list)
    }

So when we execute the above Lambda function, it shows the error below.

"Calling the invoke API action failed with this message: Rate Exceeded."

Can anyone help me resolve this error and achieve my requirement?

2 Answers


  1. This is probably due to your account restrictions. You should add a retry with a few seconds between retries, or increase the page size.
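
    A minimal sketch of that idea (untested; the bucket name and prefix are taken from the question, and the one-second sleep is just an example value, not part of this answer):

    import time
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    keys = []
    for page in paginator.paginate(
            Bucket="kush-dragon-data",
            Prefix="inside-bucket-level-1/",
            PaginationConfig={"PageSize": 1000}):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        time.sleep(1)  # crude pause between ListObjectsV2 calls to stay under the rate limit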

  2. This is most likely due to you reaching your quota limit for AWS S3 API calls. The "bigger hammer" solution is to request a quota increase, but if you don’t want to do that, there is another way using botocore.Config's built-in retries, for example:

    import json
    import time
    from datetime import datetime, timedelta
    from boto3 import client
    from botocore.config import Config
    
    config = Config(
       retries = {
          'max_attempts': 10,
          'mode': 'standard'
       }
    )
    def lambda_handler(event, context):
        s3 = client('s3', config=config)
    
    ###ALL OF YOUR CURRENT PYTHON CODE EXACTLY THE WAY IT IS###
    

    This config will use an exponentially increasing sleep timer, up to the maximum number of retries. From the docs:

    • Any retry attempt will include an exponential backoff by a base factor of 2 for a maximum backoff time of 20 seconds.

    There is also an adaptive mode, which is still experimental. For more info, see the docs on botocore.Config retries.
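
    As a sketch, switching to adaptive mode is just a different mode value in the same Config (the max_attempts value here is only an example):

    from botocore.config import Config

    # 'adaptive' adds client-side rate limiting on top of the standard retry
    # behaviour; botocore documents it as experimental.
    adaptive_config = Config(
        retries={
            'max_attempts': 10,
            'mode': 'adaptive'
        }
    )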

    Another (much less robust, IMO) option would be to write your own paginator with a sleep programmed in, though you’d probably just want to use the built-in backoff in 99.99% of cases (even if you do have to write your own paginator). (This code is untested and isn’t asynchronous, so the sleep will be in addition to the wait time for a page response. To make the "sleep time" exactly sleep_secs, you’d need to use concurrent.futures or asyncio; the AWS built-in paginators mostly use concurrent.futures.)

    from boto3 import client
    from typing import Generator
    from time import sleep
    
    def get_pages(bucket: str, prefix: str, page_size: int, sleep_secs: float) -> Generator:
        s3 = client('s3')
        # First page of results
        page: dict = s3.list_objects_v2(
            Bucket=bucket,
            MaxKeys=page_size,
            Prefix=prefix
        )
        next_token: str = page.get('NextContinuationToken')
        yield page
        # Keep fetching pages until S3 stops returning a continuation token,
        # sleeping between calls to stay under the request rate
        while next_token:
            sleep(sleep_secs)
            page = s3.list_objects_v2(
                Bucket=bucket,
                MaxKeys=page_size,
                Prefix=prefix,
                ContinuationToken=next_token
            )
            next_token = page.get('NextContinuationToken')
            yield page
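
    For completeness, a rough usage sketch of that paginator against the original "last 24 hours" requirement (again untested; the bucket, prefix, page size and sleep value are just examples):

    from datetime import datetime, timedelta, timezone

    cutoff = datetime.now(timezone.utc) - timedelta(days=1)
    recent_keys = []
    for page in get_pages("kush-dragon-data", "inside-bucket-level-1/", 1000, 1.0):
        for obj in page.get("Contents", []):
            # LastModified is returned as a timezone-aware datetime, so it can
            # be compared directly against a UTC cutoff
            if obj["LastModified"] >= cutoff:
                recent_keys.append(obj["Key"])
    print(len(recent_keys), "objects uploaded in the last 24 hours")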
    
    