Here is the pseudo-code for what my current Lambda function looks like:
import pandas
import pymysql

def get_db_data(con_):
    query = "SELECT * FROM mytable"
    data = pandas.read_sql(query, con_)
    return data

def lambda_handler(event, context):
    con = pymysql.connect()
    data = get_db_data(con)
    """
    do other things with event
    """
    con.close()
I am debating if I can do this instead:
import pandas
import pymysql

con = pymysql.connect()

def get_db_data(con_):
    query = "SELECT * FROM mytable"
    data = pandas.read_sql(query, con_)
    return data

data = get_db_data(con)

def lambda_handler(event, context):
    """
    do other things with event
    """
    con.close()
But I am not sure if it is a good practice. What implications would the second option have on run-time and cost? Is it against the recommended way?
2 Answers
In summary, both approaches have their merits, and the choice depends on factors such as cold start time sensitivity, resource management, and the nature of your Lambda workload. To reuse connections, consider using connection pooling and ensuring proper scoping to mitigate potential issues.
First Version (Connection Created Inside lambda_handler):
Pros:
Cons:
Second Version (Connection Created Outside lambda_handler):
Pros:
Cons:
Resource Leaks: Keep connection scope limited to Lambda execution; global connections can lead to resource leaks.
Concurrency Issues: Ensure thread safety if multiple invocations modify the same global connection.
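One way to address the scoping concern is to cache the connection at module level but hand it out through a small helper that reconnects when the cached connection has gone stale. The sketch below is only an illustration: `get_connection` and `_is_alive` are hypothetical helper names, and `sqlite3` stands in for the MySQL driver so the snippet runs without a database server. With pymysql, a call such as `con.ping(reconnect=True)` could play the role of the liveness check.

```python
import sqlite3  # stand-in driver so the sketch runs without a MySQL server

_con = None  # module-level cache; survives across warm invocations


def _is_alive(con):
    """Return True if the connection still answers a trivial query."""
    try:
        con.execute("SELECT 1")
        return True
    except sqlite3.Error:
        return False


def get_connection():
    """Return the cached connection, reconnecting if it is missing or dead."""
    global _con
    if _con is None or not _is_alive(_con):
        _con = sqlite3.connect(":memory:")  # with MySQL: pymysql.connect(...)
    return _con


def lambda_handler(event, context):
    con = get_connection()
    # do other things with event, using con
    return con.execute("SELECT 1").fetchone()[0]
```

The handler never closes the connection itself, so a warm invocation can pick it up again; if the connection was dropped in the meantime, the helper opens a fresh one.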
Recommendations:
When working with a database connection in a Lambda function, it is best to follow AWS best practices and put expensive setup in the INIT code, the code outside the handler (which is where you are almost heading).
Lambda can run from either a COLD or a WARM start. On a COLD start, the code outside the lambda handler is executed. On a WARM start, the resources loaded during the COLD start are still available. By opening resources such as database connections during the COLD start, subsequent WARM starts do not have to repeat the same expensive operation. Getting a WARM start requires that calls to the specific Lambda arrive within a short period of each other. This can greatly reduce the execution time of your Lambda functions and thus reduce costs!
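The COLD/WARM distinction can be seen with a tiny sketch. The names `INIT_TIME` and `invocation_count` are made up for illustration. Module-level code runs once when the execution environment is created, and the values it sets are still there on later invocations handled by the same environment.

```python
import time

# INIT code: runs once per execution environment (COLD start).
INIT_TIME = time.time()
invocation_count = 0


def lambda_handler(event, context):
    # Runs on every invocation; module-level state persists while WARM.
    global invocation_count
    invocation_count += 1
    return {
        "cold_start": invocation_count == 1,
        "invocation": invocation_count,
        "init_time": INIT_TIME,
    }
```

Calling the handler twice in the same process mimics a COLD start followed by a WARM one: `cold_start` is true only on the first call, and `init_time` does not change.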
Based on where you were going, I would say to rewrite it as such:
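A minimal sketch of that rewrite, under stated assumptions rather than as a definitive implementation: `sqlite3` with an in-memory table stands in for pymysql and MySQL so the snippet runs anywhere, and the demo rows are made up. In the real function you would call `pymysql.connect(...)` with your own settings and could keep `pandas.read_sql(query, con_)` inside `get_db_data`.

```python
import sqlite3  # stand-in for pymysql so the sketch runs without a MySQL server

# INIT code: executed once per execution environment (COLD start).
con = sqlite3.connect(":memory:")  # with MySQL: pymysql.connect(...)

# Demo data only; a real database would already contain mytable.
con.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
con.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])


def get_db_data(con_):
    """Fetch every row from mytable."""
    query = "SELECT * FROM mytable"
    return con_.execute(query).fetchall()


# Loaded once at COLD start and reused by WARM invocations.
data = get_db_data(con)


def lambda_handler(event, context):
    # do other things with event; `con` and `data` are already available.
    # The connection is deliberately not closed here, otherwise the next
    # WARM invocation would find it closed.
    return {"rows": len(data)}
```

Note that `data` is fetched once per COLD start, so WARM invocations see the rows as they were at that moment; if the table changes often, the query belongs inside the handler while the connection stays outside.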
This concept is also explained well in the AWS Lambda docs here.