Python Get MIME of s3 object on Lambda

ashrafminhaj
January 3, 2023
273 views
0 votes
2 Answers

I have a Lambda that is triggered by an S3 PutObject event. Before proceeding, the Lambda needs to check whether the file is actually a video (MP4 in my case). The file extension is not reliable because it can be faked, so I tried checking the MIME type with the filetype library, which works on my local machine.
I don't want to download large files from S3 in full, only a small portion that I save locally to check whether it is an MP4.
So far I have tried this (on my local machine):

import boto3
import filetype
from time import sleep

REGION = 'ap-southeast-1'

tmp_path = "path/src/my_file.mp4"

start_byte = 0
end_byte = 9000

s3 = boto3.client('s3', region_name=REGION)

resp = s3.get_object(
    Bucket="test", 
    Key="MVI_1494.MP4", 
    Range='bytes={}-{}'.format(start_byte, end_byte)
    )

# the file
object_content = resp['Body'].read()

print(type(object_content))
with open(tmp_path, "wb") as binary_file:
    # Write bytes to file
    binary_file.write(object_content)

sleep(5)
kind = filetype.guess_mime(tmp_path)
print(kind)

But this always returns None as the MIME type. I think I am not saving the binary file properly. Any help would really save my day.

TL;DR: download a small portion of a large file from S3 -> save it in temporary storage -> get the MIME type.

Answers

- AnkushJain
- January 3, 2023 at 7:09 am
- 0 votes
Boto3 has a function S3.Client.head_object:

The HEAD action retrieves metadata from an object without returning
the object itself. This action is useful if you’re only interested in
an object’s metadata. To use HEAD, you must have READ access to the
object.

You can call this method to get the metadata associated with an S3 object.
```
metadata = s3client.head_object(Bucket='MyBucketName', Key='MyS3ItemKey')
```
The response includes a ContentType field, which you can use to check the object's type.
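As a minimal sketch of the head_object approach: the helper names `get_content_type` and `looks_like_mp4` are hypothetical, and the client is passed in so the function can be tried without AWS access. Note that the value returned is whatever was set at upload time; S3 does not inspect the object's bytes.

```python
def get_content_type(s3_client, bucket, key):
    """Return the stored Content-Type of an S3 object without fetching its body."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    # 'ContentType' reflects what the uploader declared, not the actual bytes.
    return response.get("ContentType")


def looks_like_mp4(content_type):
    """True only when the declared Content-Type is exactly video/mp4."""
    return content_type == "video/mp4"


# Usage (requires boto3 and AWS credentials; bucket and key taken from the question):
#   import boto3
#   s3 = boto3.client("s3", region_name="ap-southeast-1")
#   print(looks_like_mp4(get_content_type(s3, "test", "MVI_1494.MP4")))
```

If the uploader did not set a content type, S3 typically reports `binary/octet-stream`, so a non-match here does not prove the file is not a video.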

OR

If you can't trust ContentType because it can be faked, you can save the object's MIME type in DynamoDB when uploading it and read it from there whenever you need it.

OR

You can create a Lambda that is triggered on upload and downloads the object, since Lambda provides 512 MB of ephemeral storage (/tmp) by default, configurable up to 10 GB. You can determine the content type there and update the object's metadata. Metadata can be set when the object is uploaded and replaced later (by copying the object over itself) as your needs change.
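A rough sketch of the "update it afterwards" step, assuming the Lambda receives a standard S3 event notification. The helper names `bucket_and_key` and `set_content_type` are hypothetical; `copy_object` with `MetadataDirective="REPLACE"` is the usual way to change metadata on an existing object, since it cannot be edited in place.

```python
from urllib.parse import unquote_plus


def bucket_and_key(event):
    """Extract bucket name and object key from the first record of an S3 event."""
    s3_info = event["Records"][0]["s3"]
    # Keys arrive URL-encoded in the event (for example, spaces as '+').
    return s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def set_content_type(s3_client, bucket, key, content_type):
    """Rewrite the object's Content-Type by copying the object onto itself."""
    s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        ContentType=content_type,
        MetadataDirective="REPLACE",  # needed to change metadata on a self-copy
    )
```

Two caveats: `REPLACE` drops any existing user-defined metadata unless it is passed again, and the copy itself produces an object-created event, so a trigger configured for all object-created events (rather than only Put) could re-invoke the Lambda.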

- MehmetGüngören
- January 3, 2023 at 1:52 pm
- 0 votes
You don't need to save the file to disk for the filetype library.

The guess_mime function accepts bytes as well.
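A minimal sketch of that idea, combining the ranged `get_object` call from the question with `filetype.guess_mime` on the returned bytes. The function name `sniff_mime`, the 8192-byte default, and the injectable `guess` parameter are assumptions made for illustration; `guess` exists only so the function can be exercised without the filetype package installed.

```python
def sniff_mime(s3_client, bucket, key, num_bytes=8192, guess=None):
    """Guess an object's MIME type from its first num_bytes bytes, in memory."""
    if guess is None:
        import filetype  # third-party: pip install filetype
        guess = filetype.guess_mime

    # Ask S3 for the leading bytes only; the Range end is inclusive.
    response = s3_client.get_object(
        Bucket=bucket,
        Key=key,
        Range="bytes=0-{}".format(num_bytes - 1),
    )
    head = response["Body"].read()

    # guess_mime accepts bytes directly, so no temporary file is needed.
    return guess(head)


# Usage (requires boto3, filetype and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3", region_name="ap-southeast-1")
#   print(sniff_mime(s3, "test", "MVI_1494.MP4"))
```

This removes the temporary file and the `sleep` call from the picture. If the result is still None, the likely cause is that the library does not recognize that file's header rather than how the bytes were saved.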
