I have a Lambda that triggers on S3 PutObject. Before proceeding, the Lambda needs to check whether the uploaded file is actually a video file (MP4 in my case). The file extension is no help because it can be faked, so I tried checking the MIME type with the filetype library, which works on my local machine.
I don't want to download large files from S3 in full; I just want to fetch some portion, save it locally, and check whether it is MP4 or not.
So far I have tried this (on my local machine):
import boto3
import filetype
from time import sleep

REGION = 'ap-southeast-1'
tmp_path = "path/src/my_file.mp4"
start_byte = 0
end_byte = 9000

s3 = boto3.client('s3', region_name=REGION)

# ranged GET: fetch only the first bytes of the object
resp = s3.get_object(
    Bucket="test",
    Key="MVI_1494.MP4",
    Range='bytes={}-{}'.format(start_byte, end_byte)
)
object_content = resp['Body'].read()
print(type(object_content))

# write the fetched bytes to a local file
with open(tmp_path, "wb") as binary_file:
    binary_file.write(object_content)

sleep(5)
kind = filetype.guess_mime(tmp_path)
print(kind)
But this always returns None as the MIME type. I think I am not saving the binary file properly; any help would really save my day.
TL;DR: download a small portion of a large file from S3 -> save it in tmp storage -> get the MIME type.
2 Answers
Boto3 has the method S3.Client.head_object. You can call it to fetch the metadata associated with an S3 object without downloading the body; that metadata includes a ContentType property, which you can use to check the object type.
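A minimal sketch of that lookup, reusing the bucket and key from the question:

import boto3

s3 = boto3.client('s3', region_name='ap-southeast-1')

# HEAD request: returns only metadata, no object body is transferred
head = s3.head_object(Bucket="test", Key="MVI_1494.MP4")

# ContentType is whatever was set at upload time, e.g. "video/mp4"
print(head['ContentType'])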
Or, if you can't trust ContentType because it can be faked, you can simply save the object's MIME type in DynamoDB while uploading it, and read it from there whenever you need it.
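A sketch of that pattern; the table name uploaded-files and the object_key attribute are hypothetical, so adapt them to your own schema:

import boto3

dynamodb = boto3.resource('dynamodb', region_name='ap-southeast-1')
table = dynamodb.Table('uploaded-files')  # hypothetical table name

# at upload time, record the verified MIME type alongside the object key
table.put_item(Item={
    'object_key': 'MVI_1494.MP4',
    'mime_type': 'video/mp4',
})

# later, look it up instead of trusting the object's own metadata
item = table.get_item(Key={'object_key': 'MVI_1494.MP4'})['Item']
print(item['mime_type'])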
Or you can have the triggered Lambda download the object itself, since a Lambda gets around 512 MB of ephemeral storage in /tmp by default. You can determine the content type there and update the object, because metadata can be set when you upload an object and edited later as your needs change.
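A sketch of such a handler; the event parsing follows the standard S3 notification shape, and copy_object with MetadataDirective='REPLACE' is one way to rewrite the stored ContentType (it re-copies the object in place):

import boto3
import filetype
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def handler(event, context):
    # the S3 PutObject notification carries the bucket and key
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    # keys with spaces or special characters arrive URL-encoded
    key = unquote_plus(record['object']['key'])

    # /tmp offers ~512 MB of ephemeral storage by default
    tmp_path = '/tmp/upload'
    s3.download_file(bucket, key, tmp_path)

    mime = filetype.guess_mime(tmp_path)
    if mime == 'video/mp4':
        # rewrite the object's ContentType via an in-place copy;
        # note this fires the PutObject trigger again, so guard against loops
        s3.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={'Bucket': bucket, 'Key': key},
            ContentType=mime,
            MetadataDirective='REPLACE',
        )
    return mime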
You don't need to save the file on disk for the filetype lib; its guess_mime function accepts a bytes object as well.
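So the question's code can skip the temp file entirely. A sketch using the same ranged GET; per the filetype README only the first 261 bytes are needed, so the 9000-byte range is already more than enough:

import boto3
import filetype

s3 = boto3.client('s3', region_name='ap-southeast-1')

# fetch only the first few KB of the object
resp = s3.get_object(
    Bucket="test",
    Key="MVI_1494.MP4",
    Range='bytes=0-9000',
)
header = resp['Body'].read()

# pass the bytes straight in, no temp file needed
print(filetype.guess_mime(header))  # "video/mp4" for MP4, None if unknown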