
I have a Lambda function with an ffmpeg layer on it.

The command I want to use is basically

ffmpeg -i video.mp4 -qscale:v 2 -vf fps=8 '%04d.jpg'

so it takes an input file and extracts 8 frames per second into the same folder.

This code seems to do everything except writing the files. What am I missing?

import boto3
import json
import logging
import os
import shlex
import subprocess
import tempfile

SIGNED_URL_TIMEOUT = 60
FPS_SAMPLES = 8

def lambda_handler(event, context):
    # Set up logging.
    logger = logging.getLogger(__name__)

    s3_client = boto3.client('s3')

    s3_source_bucket = event['Records'][0]['s3']['bucket']['name']
    s3_source_key = event['Records'][0]['s3']['object']['key']

    s3_source_basename = os.path.splitext(os.path.basename(s3_source_key))[0]

    logger.info("bucket: %s, key: %s, basename: %s", s3_source_bucket, s3_source_key, s3_source_basename)

    s3_source_signed_url = s3_client.generate_presigned_url(
        'get_object',
        Params={'Bucket': s3_source_bucket, 'Key': s3_source_key},
        ExpiresIn=SIGNED_URL_TIMEOUT)

    with tempfile.TemporaryDirectory() as tmpdir:
        os.chdir(tmpdir)  # change the current folder to that one (current one is in os.getcwd())
        cwd = os.getcwd()
        ffmpeg_cmd = "/opt/bin/ffmpeg -i \"" + s3_source_signed_url + "\" -qscale:v 2 -vf fps=" + str(FPS_SAMPLES) + " " + cwd + "/%04d.jpg"
        print("COMMAND: " + ffmpeg_cmd)

        command1 = shlex.split(ffmpeg_cmd)
        p1 = subprocess.run(command1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

        # List all files and directories in the current directory
        contents = os.listdir(cwd)

        # Print the contents
        print(f"Contents of {cwd}:")
        for item in contents:
            print(item)  # <--- Nothing here...

    return {
        'statusCode': 200,
        'body': json.dumps("bucket: %s, key: %s, basename: %s" % (s3_source_bucket, s3_source_key, s3_source_basename))
    }

2 Answers


  1. AWS Lambda functions can only write to the /tmp/ directory, so make sure the temporary directory is being created there.

    It might also be better to have the Lambda function download the input file to /tmp/ and then pass that file to ffmpeg. This way, you can better diagnose what might be happening.

    Also, please note that the Lambda environment might be re-used, so it is generally a good idea to delete newly-created files before exiting the Lambda function; otherwise the available disk space (512 MB by default) might be filled by subsequent executions.
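    A minimal sketch of that download-to-/tmp flow (the `/opt/bin/ffmpeg` path and the `-qscale:v 2 -vf fps=...` options are taken from the question; the helper names are hypothetical):

```python
import os
import subprocess
import tempfile

def build_ffmpeg_args(ffmpeg_path, input_path, out_dir, fps):
    """Build the argv list for extracting `fps` JPEG frames per second."""
    return [
        ffmpeg_path,
        "-i", input_path,
        "-qscale:v", "2",
        "-vf", f"fps={fps}",
        os.path.join(out_dir, "%04d.jpg"),
    ]

def extract_frames(s3_client, bucket, key, fps=8, ffmpeg_path="/opt/bin/ffmpeg"):
    """Download the object to /tmp, run ffmpeg on the local copy, and
    return the list of frame filenames that were produced."""
    # Lambda only allows writes under /tmp, so stage everything there.
    with tempfile.TemporaryDirectory(dir="/tmp") as tmpdir:
        local_input = os.path.join(tmpdir, os.path.basename(key))
        s3_client.download_file(bucket, key, local_input)

        result = subprocess.run(
            build_ffmpeg_args(ffmpeg_path, local_input, tmpdir, fps),
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            # ffmpeg writes its diagnostics to stderr; surface them
            # instead of failing silently like the original code.
            raise RuntimeError(result.stderr[-2000:])

        return sorted(f for f in os.listdir(tmpdir) if f.endswith(".jpg"))
    # TemporaryDirectory removes everything on exit, keeping /tmp free
    # for subsequent invocations of the re-used environment.
```

    Because the ffmpeg invocation is built as an argv list, there is no shell quoting to get wrong, and the failure path makes ffmpeg's stderr visible, which is what the original code was missing when nothing appeared in the directory listing.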

  2. As you mentioned, you’re following along with the example code from AWS seen in their blog.

    This is problematic. It worked a few FFMpeg versions ago due to a quirk in how the binary was built back then, but for quite a while the static builds of FFMpeg have shipped with this warning:

    A limitation of statically linking glibc is the loss of DNS resolution

    This means that with a static build running on Lambda, you can no longer use FFMpeg in any scenario where it has to make a DNS query, such as when you pass it a presigned URL. One way to work around this is to use Python to download the content, then pass the local file to FFMpeg.

    There is another, more direct way: perform the DNS query manually, modify the URL, and pass the modified URL to FFMpeg along with an HTTP header to compensate for the changed hostname. This avoids the need for local storage of the original content (and indeed, depending on the output format, can be used to stream content back to S3 directly). Here’s an example of that:

    import boto3, os, stat, subprocess
    from urllib.parse import urlparse
    from socket import gethostbyname
    
    s3 = boto3.client('s3')
    def lambda_handler(event, context):
        # TODO: Get a copy of ffmpeg from S3 for this example
        #       It probably makes more sense to include it in the Lambda archive
        if not os.path.isfile('/tmp/ffmpeg'):
            s3.download_file('example-bucket', 'temp/ffmpeg', '/tmp/ffmpeg')
            st = os.stat('/tmp/ffmpeg')
            os.chmod('/tmp/ffmpeg', st.st_mode | stat.S_IEXEC)
    
        # Generate a presigned URL for an artifact in S3
        url = s3.generate_presigned_url(
            'get_object', 
            Params={'Bucket': 'example-bucket', 'Key': 'temp/example.mp3'}, 
            ExpiresIn=300,
        )
    
        # Pull the hostname out of the URL, and replace it with an IP address
        parsed = urlparse(url)
        hostname = parsed.hostname
        parsed = parsed._replace(netloc=gethostbyname(hostname))
        url = parsed.geturl()
    
        # Call FFMpeg with the target URL, along with the hostname so the webserver 
        # will properly parse the request. For this example, just transcode the MP3 
        # to a constant bitrate, the exact call you want to make to FFMpeg will 
        # no doubt differ
        subprocess.check_call([
            '/tmp/ffmpeg', 
            '-headers', f"Host: {hostname}",
            '-i', url, 
            "-ar", "44100", "-ac", "1", "-b:a", "64k",
            "/tmp/output.mp3",
        ])
    
        # For this example, just upload the object back to S3
        s3.upload_file('/tmp/output.mp3', 'example-bucket', 'temp/output.mp3')
    
        # TODO: Clean up /tmp when complete
    
        return {'statusCode': 200, 'body': "OK"}
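    The hostname-pinning step above can be factored out and exercised with an injected resolver (a sketch; note that `_replace(netloc=...)` discards any explicit port, which is fine for standard S3 URLs):

```python
from socket import gethostbyname
from urllib.parse import urlparse

def pin_url_to_ip(url, resolve=gethostbyname):
    """Swap the URL's hostname for its IP address.

    Returns (pinned_url, original_hostname). The caller passes the
    original hostname back to ffmpeg via '-headers Host: ...' so the
    server still routes and validates the request correctly.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    # Resolve once in Python, where DNS works, so the statically
    # linked ffmpeg never has to perform a lookup itself.
    pinned = parsed._replace(netloc=resolve(host))
    return pinned.geturl(), host
```

    Injecting `resolve` keeps the function testable without network access; in the Lambda itself the default `gethostbyname` is used, exactly as in the answer above.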
    