
I have a bucket with 100,000 images (png/jpg) and I want to read them all in one S3 request.
This is my code:

# imports used by this snippet
import boto3
import numpy as np
import torch
from io import BytesIO
from PIL import Image

# getting a connection to s3 (bucket_name, prefix, device are defined elsewhere)
s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket_name)

# getting the list of all ObjectSummary in the relevant dir - this is a request to the api
crops_path_list = bucket.objects.filter(Prefix=prefix)

# iterating on the list and getting objects
images = []
for index, crop in enumerate(crops_path_list):
    # getting the bytes from s3 - this is a request to the api
    obj = crop.get()

    # working with the bytes to make them an image
    image_content = obj['Body'].read()
    bytes_image = BytesIO(image_content)
    image = Image.open(bytes_image)
    image = image.convert("RGB")
    image = np.asarray(image)
    image = np.ascontiguousarray(image.transpose(2, 0, 1))
    image = torch.from_numpy(image).unsqueeze(0).to(dtype=torch.float32, device=device)

    # adding to the list of all images
    images.append(image)

It takes ages because every .get() is a separate request to the API. I could not find any way to fetch the whole crops_path_list in one request. Other than running threads/subprocesses/zipping, any other ideas on how to use less IO and fewer API requests?

Thanks 🙂

2 Answers


  1. Chosen as BEST ANSWER

    PySpark works great for me, solved
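
    The answerer doesn't say how they set this up, but a minimal sketch of the idea using Spark's binaryFile data source (Spark 3.0+, assuming a configured s3a connector) might look like this; "my-bucket" and "my-prefix" are placeholders, not from the original:

    from pyspark.sql import SparkSession

    # a minimal sketch, assuming Spark 3.0+ and a configured s3a connector;
    # "my-bucket" and "my-prefix" are placeholder names
    spark = SparkSession.builder.appName("s3-image-load").getOrCreate()

    # the binaryFile source lists the prefix and reads the objects in parallel
    # across executors; each row holds the object path and its raw bytes in
    # the "content" column
    df = (spark.read.format("binaryFile")
          .option("pathGlobFilter", "*.png")
          .load("s3a://my-bucket/my-prefix/"))

    rows = df.select("path", "content").collect()

    Each executor still issues one GET per object under the hood; the win comes from running those requests in parallel. Note that collect() pulls all the bytes to the driver, so with 100,000 images you would normally process the rows on the executors instead.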


  2. This is not possible. The S3 API has no batch GET; downloading each file is its own request.
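
    Since there is no batch GET, the usual mitigation is to overlap the per-object requests (the asker hoped to avoid threads, but it is the standard workaround). A minimal sketch with a thread pool; bucket_name and prefix are assumed from the question:

    import boto3
    from concurrent.futures import ThreadPoolExecutor

    # bucket_name and prefix are assumed from the question's context;
    # boto3 clients (unlike resources) are generally safe to share across threads
    s3_client = boto3.client('s3')

    # listing is also paginated: each page returns at most 1000 keys per request
    keys = []
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        keys.extend(item['Key'] for item in page.get('Contents', []))

    def fetch(key):
        # each call is still a separate API request, but the calls overlap
        return s3_client.get_object(Bucket=bucket_name, Key=key)['Body'].read()

    with ThreadPoolExecutor(max_workers=32) as pool:
        payloads = list(pool.map(fetch, keys))

    This does not reduce the request count, only the wall-clock time; the one-request-per-object cost is fixed by the API.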
