I have 3 tables in Athena and I want to run ‘SELECT * FROM table’ for each of the 3 tables. Below is the basic python code which is executed for one table named as client:
import json
import boto3
import time
def lambda_handler(event, context):
client = boto3.client('athena')
#Setup and perform query
queryStart = client.start_query_execution(
QueryString = 'SELECT * FROM client limit 10;',
QueryExecutionContext = {
'Database': 'dev'
},
ResultConfiguration = {
'OutputLocation': 's3://bucketpune/athena-output/lambda/'
}
)
#Observe results
queryId = queryStart['QueryExecutionId']
time.sleep(15)
results = client.get_query_results(QueryExecutionId = queryId)
for row in results['ResultSet']['Rows']:
print(row)
I want this to loop again with other tables one by one like ‘SELECT * FROM customers limit 10;’
I used for loop as shown below but it takes only the last table name ‘region’.
import json
import boto3
import time
table = ["client", "customer", "region"]
for x in table:
def lambda_handler(event, context):
client = boto3.client('athena')
#Setup and perform query
queryStart = client.start_query_execution(
QueryString = 'SELECT * FROM ' + x + ' limit 10;',
QueryExecutionContext = {
'Database': 'dev'
},
ResultConfiguration = {
'OutputLocation': 's3://bucketpune/athena-output/lambda/'
}
)
#Observe results
queryId = queryStart['QueryExecutionId']
time.sleep(15)
results = client.get_query_results(QueryExecutionId = queryId)
for row in results['ResultSet']['Rows']:
print(row)
2
Answers
Your
for x in table:
must be inside your handler, not outside:What @Marcin suggested is not bad, although since Athena is asynchronous in some sense, you can benefit from that (AS LONG AS YOU DON’T NEED ANSWER FROM PREVIOUS QUERY!). The technical term for this is that Athena works in the best effort manner. It can take a second or half a minute to return your rows, so remember that time.sleep(15) will not ensure that Athena finished processing. Here’s an example of a function in python that fetches data from Athena using also incremental wait (to not spam Athena each second and in the worst case scenario hit API qoutas):
Now, you can use that as a base to create a function that will first schedule all three (or n) queries and check for the answers periodically.