
I’m trying to develop a simple Lambda function that scrapes a PDF and saves it to an S3 bucket, given the URL and the desired filename as input data. I keep receiving the error "Read-only file system", and I’m not sure if I have to change the bucket permissions or if there is something else I am missing. I am new to S3 and Lambda and would appreciate any help.

This is my code:

import urllib.request
import json
import boto3


def lambda_handler(event, context):   
    s3 = boto3.client('s3') 
    url = event['url']
    filename = event['filename'] + ".pdf"
    response = urllib.request.urlopen(url)   
    file = open(filename, 'w')
    file.write(response.read())
    s3.upload_fileobj(response.read(), 'sasbreports', filename)
    file.close()

This was my event file:

{
  "url": "https://purpose-cms-preprod01.s3.amazonaws.com/wp-content/uploads/2022/03/09205150/FY21-NIKE-Impact-Report_SASB-Summary.pdf",
  "filename": "nike"
}

When I tested the function, I received this error:

{
  "errorMessage": "[Errno 30] Read-only file system: 'nike.pdf.pdf'",
  "errorType": "OSError",
  "requestId": "de0b23d3-1e62-482c-bdf8-e27e82251941",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 15, in lambda_handler\n    file = open(filename + \".pdf\", 'w')\n"
  ]
}

2 Answers


  1. AWS Lambda functions can only write to the /tmp/ directory. All other directories are read-only.

    Also, there is a default limit of 512 MB of storage in /tmp/, so make sure you delete the files after uploading them to S3, since the Lambda execution environment can be re-used for future invocations.
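    A minimal sketch of this approach, assuming the `sasbreports` bucket from the question (`tmp_path` is a hypothetical helper added here for illustration):

    ```python
    import os
    import urllib.request
    import boto3


    def tmp_path(name):
        # /tmp is the only writable directory in a Lambda environment
        return os.path.join("/tmp", name + ".pdf")


    def lambda_handler(event, context):
        s3 = boto3.client("s3")
        path = tmp_path(event["filename"])
        # download the PDF into the writable /tmp directory
        urllib.request.urlretrieve(event["url"], path)
        try:
            s3.upload_file(path, "sasbreports", os.path.basename(path))
        finally:
            # free /tmp space in case this execution environment is reused
            os.remove(path)
    ```

    The `try`/`finally` ensures the temporary file is deleted even if the upload fails, so repeated invocations don't saturate the 512 MB of /tmp storage.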

  2. AWS Lambda has limited space in /tmp, the sole writable location.
    Writing there can be risky without proper disk management, because the storage persists across multiple invocations of the same execution environment: it can fill up, or unexpectedly leak files from previous requests.
    Instead of saving the PDF locally, write it directly to S3 without involving the file system, like this:

    import urllib.request
    import json
    import boto3


    def lambda_handler(event, context):
        s3 = boto3.client('s3')
        url = event['url']
        filename = event['filename']
        # urlopen returns a file-like object, which is what upload_fileobj expects
        response = urllib.request.urlopen(url)
        s3.upload_fileobj(response, 'sasbreports', filename)
    

    BTW: the `.pdf` appending should be removed or kept according to your use case.
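    For small files like this one, an equivalent sketch reads the whole response into memory and uses `put_object`, which accepts raw bytes directly (again assuming the `sasbreports` bucket from the question):

    ```python
    import urllib.request
    import boto3


    def lambda_handler(event, context):
        s3 = boto3.client("s3")
        filename = event["filename"] + ".pdf"  # keep or drop the extension per your use case
        with urllib.request.urlopen(event["url"]) as response:
            # put_object takes the body as bytes, so no file is written anywhere
            s3.put_object(Bucket="sasbreports", Key=filename, Body=response.read())
        return {"bucket": "sasbreports", "key": filename}
    ```

    This trades memory for simplicity; for large PDFs the streaming `upload_fileobj` version above is the safer choice.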
