
I have a Lambda function that is triggered by an S3 PutObject event. Before proceeding, the Lambda needs to check whether the uploaded file is actually a video (MP4 in my case). The file extension is not reliable because it can be faked, so I tried checking the MIME type with the filetype library, which works on my local machine.
I don't want to download large files from S3 in full. I only want to fetch the first portion, save it locally, and check whether it is an MP4.
This is what I have tried so far (on my local machine):

import boto3
import filetype
from time import sleep

REGION = 'ap-southeast-1'

tmp_path = "path/src/my_file.mp4"

start_byte = 0
end_byte = 9000

s3 = boto3.client('s3', region_name=REGION)

resp = s3.get_object(
    Bucket="test", 
    Key="MVI_1494.MP4", 
    Range='bytes={}-{}'.format(start_byte, end_byte)
    )

# the file
object_content = resp['Body'].read()

print(type(object_content))
with open(tmp_path, "wb") as binary_file:
    # Write bytes to file
    binary_file.write(object_content)

sleep(5)
kind = filetype.guess_mime(tmp_path)
print(kind)

But this always returns None as the MIME type. I think I am not saving the binary file properly. Any help would really save my day.

TL;DR: Download a small portion of a large file from S3 -> save it in temporary storage -> get the MIME type.

2 Answers


  1. Boto3 has a function S3.Client.head_object:

    The HEAD action retrieves metadata from an object without returning
    the object itself. This action is useful if you’re only interested in
    an object’s metadata. To use HEAD, you must have READ access to the
    object.

    You can call this method to get the metadata associated with an S3 object.

    metadata = s3client.head_object(Bucket='MyBucketName', Key='MyS3ItemKey')
    

    The returned metadata includes a ContentType property, which you can use to check the object's type.

    OR

    If you can't trust ContentType because it can be faked, you can save the object's MIME type in DynamoDB when uploading it and read it from there whenever you need it.

    OR

    You can create a Lambda function that is triggered on upload and download the object inside it, since Lambda provides 512 MB of ephemeral storage (/tmp) by default. You can determine the content type there and update it on the object. Metadata can be set when the object is uploaded; changing it later means copying the object over itself with the new metadata.
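
The first option above, reading the declared ContentType through head_object, could look roughly like the sketch below. The function names are hypothetical, and the client is passed in as a parameter only to keep the example self-contained. Note that this checks what the uploader declared, not what the bytes actually are.

```python
def declared_content_type(s3_client, bucket, key):
    """Return the Content-Type stored with the object, without fetching its body."""
    metadata = s3_client.head_object(Bucket=bucket, Key=key)
    # head_object returns a dict; ContentType holds whatever was set at upload time.
    return metadata.get("ContentType")


def is_declared_mp4(s3_client, bucket, key):
    """True if the object's declared Content-Type is video/mp4."""
    return declared_content_type(s3_client, bucket, key) == "video/mp4"


# Intended use (requires boto3 to be installed):
#   import boto3
#   s3client = boto3.client("s3")
#   print(is_declared_mp4(s3client, "MyBucketName", "MyS3ItemKey"))
```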

  2. You don't need to save the file to disk to use the filetype library.

    The guess_mime function accepts bytes as well.
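
As a minimal sketch of that idea, the ranged get_object call from the question can feed its bytes straight into the guesser, with no temporary file. The function name sniff_mime_from_s3 and the default of 8192 bytes are assumptions made for illustration; the guesser is passed in as a parameter only so the sketch does not depend on any installed package.

```python
def sniff_mime_from_s3(s3_client, bucket, key, guess_mime, num_bytes=8192):
    """Fetch only the first num_bytes of an object and guess its MIME type in memory."""
    resp = s3_client.get_object(
        Bucket=bucket,
        Key=key,
        # HTTP byte ranges are inclusive, so 0..num_bytes-1 is num_bytes bytes.
        Range="bytes=0-{}".format(num_bytes - 1),
    )
    head = resp["Body"].read()  # bytes held in memory, never written to disk
    return guess_mime(head)     # e.g. filetype.guess_mime; None if unrecognized


# Intended use (requires boto3 and filetype to be installed):
#   import boto3, filetype
#   s3 = boto3.client("s3", region_name="ap-southeast-1")
#   print(sniff_mime_from_s3(s3, "test", "MVI_1494.MP4", filetype.guess_mime))
```

Calling filetype.guess_mime(head) directly inside the function would work the same way; the result may still be None if the library does not recognize the leading bytes.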

