
How do I write the contents of an S3 object into SharedMemory?

from multiprocessing.shared_memory import SharedMemory
import boto3

s3_client = boto3.client('s3')
MB=100
mem = SharedMemory(create=True, size=MB*2**20)
response = s3_client.get_object(Bucket='my_bucket', Key='path/to/obj')
mem.buf[:] = response['Body'].read()

However, I then get an error:

memoryview assignment: lvalue and rvalue have different structures

Printing the memoryview shape gives this:

(105906176,)

When I then try this:

mem.buf[0] = response['Body'].read()

I get a different error:

memoryview: invalid type for format 'B'

How can I write the contents of an S3 file into SharedMemory? I don’t want to write to disk.

2 Answers


  1. If you want to use slice notation, you have to give the slice exactly the same size as the data.

    i.e. you need something like:

     mem.buf[:len(data)] = data
    

    because with slice syntax, mem.buf[:] = data would normally mean "resize the container on the left to be the same size as data". Compare how a list behaves, and then how a memoryview behaves:

    mylist = [1,2,3,4,5]
    print(mylist, len(mylist))  # [1, 2, 3, 4, 5] 5
    mylist[:] = [99, 98]
    print(mylist, len(mylist))  # [99, 98] 2
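
    A memoryview is fixed-size and cannot be resized this way, which is exactly why the assignment fails. A minimal standalone illustration, using a plain bytearray in place of SharedMemory:

    buf = memoryview(bytearray(10))
    # buf[:] = b"abc"  # ValueError: memoryview assignment: lvalue and rvalue have different structures
    buf[:3] = b"abc"   # works: the slice length matches the data length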
    

    So in this case, just:

    data = response['Body'].read()
    mem.buf[:len(data)] = data
    

    Of course, this requires enough auxiliary space to keep a duplicate of the whole object in memory (your original approach did as well, so I assume this is OK).

    To do this memory-efficiently, you can instead iterate over the streaming body, which reads it in 1 KB chunks. Something to the effect of:

    i = 0
    for chunk in response['Body']:
        size = len(chunk)
        mem.buf[i:i+size] = chunk
        i += size
    

    If you want to fiddle with the chunksize (in bytes), you can do something like:

    chunksize = 100_000
    stream = response['Body']
    i = 0
    for chunk in iter(lambda: stream.read(chunksize), b""):
        mem.buf[i:i+len(chunk)] = chunk
        i += len(chunk)
    

    Or an equivalent while-loop, if the two-arg form of iter is too arcane:

    chunksize = 100_000
    stream = response['Body']
    i = 0
    while chunk := stream.read(chunksize):
        mem.buf[i:i+len(chunk)] = chunk
        i += len(chunk)
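
    Whichever loop you use, the segment will usually be larger than the object, so keep the final value of i: it is the number of bytes actually written, and whoever reads the segment needs it. A minimal sketch of the reading side, assuming the segment name and byte count are passed to the consumer (e.g. as arguments or over a queue):

    from multiprocessing.shared_memory import SharedMemory

    def read_object(name, nbytes):
        # attach to the existing segment by name and copy out only the bytes that were written
        shm = SharedMemory(name=name)
        try:
            return bytes(shm.buf[:nbytes])
        finally:
            shm.close()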
    
  2. You need to ensure that the data you're placing in the shared memory buffer is the same size as the slice of the buffer you're writing to. Further, if you want to avoid keeping a second copy of the data around, you can read from S3 in chunks and write to the buffer as you download:

    from multiprocessing.shared_memory import SharedMemory
    import boto3

    s3_client = boto3.client('s3')

    MB=100
    mem = SharedMemory(create=True, size=MB*2**20)
    
    left = mem.size
    offset = 0
    response = s3_client.get_object(Bucket='my_bucket', Key='path/to/obj')
    while left > 0:
        # Read 1 MB at a time to keep the in-memory copy of the data to a minimum
        # Make sure we don't read more than is left in the shared buffer
        buffer = response['Body'].read(min(left, 2 ** 20))
        # If nothing's left, stop
        if len(buffer) == 0: break
        # Store the data into the shared memory
        mem.buf[offset:offset+len(buffer)] = buffer
        left -= len(buffer)
        offset += len(buffer)
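
    Either way, remember to release the segment once every process is finished with it. A minimal sketch of the cleanup, assuming this process is the one that created the segment:

    mem.close()   # detach this process's view of the buffer
    mem.unlink()  # free the segment itself; call this once, from the creating process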
    