
I have an S3 bucket which holds a pipeline.log file. Each time I run a Lambda function, I want new log lines to be written to that file and the file uploaded back to my S3 bucket.

I have created custom functions that handle the downloads and uploads from S3. These custom S3 functions are tested and running well in SageMaker, Lambdas and Glue jobs.
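
(For context, these helpers are roughly thin wrappers around boto3; the sketch below is only illustrative, not the exact implementation.)

import boto3

s3 = boto3.client('s3')


def download_from_s3(bucket, key, to):
    # Fetch the object from S3 and save it to a local path.
    s3.download_file(bucket, key, to)


def upload_to_s3(bucket, key, frm):
    # Push a local file back up to S3 under the given key.
    s3.upload_file(frm, bucket, key)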

In the next step, the same uploaded log file will be downloaded and written to in a Glue job, then uploaded again and propagated to further Lambdas, Step Functions states, and so on.

Below is a sample of the code I have tried: it downloads the log file from S3, writes to it, and uploads it back to end the module. I am not interested in using CloudWatch, nor do I want to disable the CloudWatch logs. I just want a plain old download-write-upload.

import logging

logging.basicConfig(filename='/tmp/pipeline.log',
                    level=logging.INFO,
                    format='%(asctime)s %(message)s',
                    filemode='w')


def lambda_handler(event, context):

    download_from_s3(bucket=bucket,          # download pipeline.log from S3
                     key='pipeline.log',
                     to='/tmp/pipeline.log')

    logging.info('Starting Pipeline')        # add a log line to pipeline.log

    upload_to_s3(bucket=bucket,              # re-upload to S3 to be downloaded by the next module
                 key='pipeline.log',
                 frm='/tmp/pipeline.log')

    return None

The invocation completes without errors and returns 200. However, the pipeline.log file in S3 remains empty; only its timestamp changes to that of the most recent Lambda run. The same code works perfectly from Glue jobs (ipynb uploads), where the written logs are visible in the log file in S3, but somehow I am unable to update the log file from Lambdas.

Any idea on how to get this done? I want the same pipeline.log to be downloaded, written to, and uploaded by each module of the Step Functions pipeline.

2 Answers


  1. Assuming you are not exceeding the /tmp storage limit (/tmp is not guaranteed to persist between invocations) or the Lambda execution time limit, you can try this:

    import json
    import logging
    import boto3
    
    
    def lambda_handler(event, context):
        
        s3_client = boto3.client('s3')
        bucket = event["bucket"]
        key = event["key"]
        
        filepath = "/tmp/" + key
        
        # Download the current log file from S3 into /tmp
        s3_client.download_file(bucket, key, filepath)
        
        ## open file and do smth ##
        
        # Upload the (modified) file back to the same S3 key
        s3_client.upload_file(filepath, bucket, key)
        
        return {
            'statusCode': 200,
            'body': json.dumps("Uploaded successfully.")
        }
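
    For the "open file and do smth" step, a minimal sketch is to append a line to the downloaded file before re-uploading it (the message written here is just a placeholder):

    # Append new log output to the file downloaded above
    with open(filepath, "a") as log_file:
        log_file.write("Starting Pipeline\n")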
    
  2. Did you consider hosting your logs on EFS? For your Lambda function, you can simply mount EFS to a local directory (and avoid concurrency complications with writing to S3 down the road!)
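
    As a rough sketch (the /mnt/pipeline mount path is an assumption; the EFS access point, VPC settings, and mount are configured on the function, outside the code), the handler could then append to the shared log file directly:

    LOG_PATH = "/mnt/pipeline/pipeline.log"  # assumed EFS mount path configured on the function


    def lambda_handler(event, context):
        # Append to the shared log on EFS instead of round-tripping through S3
        with open(LOG_PATH, "a") as log_file:
            log_file.write("Starting Pipeline\n")
        return {"statusCode": 200}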
