
How do I write the contents of an S3 object into SharedMemory?

from multiprocessing.shared_memory import SharedMemory
import boto3

s3_client = boto3.client('s3')
MB=100
mem = SharedMemory(create=True, size=MB*2**20)
response = s3_client.get_object(Bucket='my_bucket', Key='path/to/obj')
mem.buf[:] = response['Body'].read()

However, I then get an error:

memoryview assignment: lvalue and rvalue have different structures

Printing the memoryview shape gives this:

(105906176,)

When I then try this:

mem.buf[0] = response['Body'].read()

I get a different error:

memoryview: invalid type for format 'B'

How can I write the contents of an S3 file into SharedMemory? I don’t want to write to disk.

2 Answers


  1. If you want to use slice notation, you have to give the slice exactly the same size as the data.

    i.e. you need something like:

     mem.buf[:len(data)] = data
    

    because with slice syntax, mem.buf[:] = data would normally mean "resize the container on the left to be the same size as data". Compare how a list behaves, and then how a memoryview behaves:

    mylist = [1,2,3,4,5]
    print(mylist, len(mylist))  # [1, 2, 3, 4, 5] 5
    mylist[:] = [99, 98]
    print(mylist, len(mylist))  # [99, 98] 2
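
    A memoryview is fixed-size and cannot be resized this way, which is exactly why the assignment fails. A minimal standalone illustration, using a plain bytearray in place of SharedMemory:

    buf = memoryview(bytearray(10))
    # buf[:] = b"abc"  # ValueError: memoryview assignment: lvalue and rvalue have different structures
    buf[:3] = b"abc"   # works: the slice length matches the data length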
    

    So in this case, just:

    data = response['Body'].read()
    mem.buf[:len(data)] = data
    

    Of course, this requires enough auxiliary space to keep a duplicate of the whole object in memory (your original approach did as well, so I assume this is OK).

    To do this memory-efficiently, you can instead iterate over the streaming body, which reads it in 1 KB chunks. Something to the effect of:

    i = 0
    for chunk in response['Body']:
        size = len(chunk)
        mem.buf[i:i+size] = chunk
        i += size
    

    If you want to fiddle with the chunksize (in bytes), you can do something like:

    chunksize = 100_000
    stream = response['Body']
    i = 0
    for chunk in iter(lambda: stream.read(chunksize), b""):
        mem.buf[i:i+len(chunk)] = chunk
        i += len(chunk)
    

    Or an equivalent while-loop, if the two-arg form of iter is too arcane:

    chunksize = 100_000
    stream = response['Body']
    i = 0
    while chunk := stream.read(chunksize):
        mem.buf[i:i+len(chunk)] = chunk
        i += len(chunk)
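
    Whichever loop you use, the segment will usually be larger than the object, so keep the final value of i: it is the number of bytes actually written, and whoever reads the segment needs it. A minimal sketch of the reading side, assuming the segment name and byte count are passed to the consumer (e.g. as arguments or over a queue):

    from multiprocessing.shared_memory import SharedMemory

    def read_object(name, nbytes):
        # attach to the existing segment by name and copy out only the bytes that were written
        shm = SharedMemory(name=name)
        try:
            return bytes(shm.buf[:nbytes])
        finally:
            shm.close()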
    
  2. You need to ensure that the data you're placing in the shared memory buffer is the same size as the slice of the buffer you're writing to. Further, if you want to avoid keeping a second copy of the data around, you can read from S3 in chunks and write to the buffer as you download:

    from multiprocessing.shared_memory import SharedMemory
    import boto3

    s3_client = boto3.client('s3')

    MB=100
    mem = SharedMemory(create=True, size=MB*2**20)
    
    left = mem.size
    offset = 0
    response = s3_client.get_object(Bucket='my_bucket', Key='path/to/obj')
    while left > 0:
        # Read 1 MB at a time to keep the in-memory copy of the data to a minimum
        # Make sure we don't read more than is left in the shared buffer
        buffer = response['Body'].read(min(left, 2 ** 20))
        # If nothing's left, stop
        if len(buffer) == 0: break
        # Store the data into the shared memory
        mem.buf[offset:offset+len(buffer)] = buffer
        left -= len(buffer)
        offset += len(buffer)
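
    Either way, remember to release the segment once every process is finished with it. A minimal sketch of the cleanup, assuming this process is the one that created the segment:

    mem.close()   # detach this process's view of the buffer
    mem.unlink()  # free the segment itself; call this once, from the creating process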
    