I am querying Athena thru lambda. Results are getting stored in csv format in S3 bucket.
The csv files has two columns – EventTime and instance id.
I am reading csv file via one of function in my lambda handler:
def read_instanceids(path):
s3 = boto3.resource('s3')
bucket = s3.Bucket('aws-athena-query-results-mybucket-us-east-1')
obj = bucket.Object(key= path)
response = obj.get()
lines = response['Body'].read().decode('utf-8').split()
return lines**
Output:
[
""eventTime","instanceId"",
""2021-09-27T19:46:08Z","""i-0aa1f4dd"""",
""2021-09-27T21:04:13Z","""i-0465c287"""",
""2021-09-27T21:10:48Z","""i-08b75f79"""",
""2021-09-27T19:40:43Z","""i-0456700b"""",
""2021-03-29T21:58:40Z","""i-0724f99f"""",
""2021-03-29T23:27:44Z","""i-0fafbe64"""",
""2021-03-29T21:41:12Z","""i-0064a8552"""",
""2021-03-29T23:19:09Z","""i-07f5f08e5""""
]
I want to store only my instance ids in one array.
How I can achieve that. I cant use Pandas/Numpy.
If I am using get_query_results – and returning the response – its in the below format:
[
{
"Data": [
{
"VarCharValue": "eventTime"
},
{
"VarCharValue": "instanceId"
}
]
},
{
"Data": [
{
"VarCharValue": "2021-09-23T22:36:15Z"
},
{
"VarCharValue": ""i-053090803""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T21:58:40Z"
},
{
"VarCharValue": ""i-0724f62a""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T21:41:12Z"
},
{
"VarCharValue": ""i-552""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T23:19:09Z"
},
{
"VarCharValue": ""i-07f4e5""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T23:03:09Z"
},
{
"VarCharValue": ""i-0eb453""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-30T19:18:11Z"
},
{
"VarCharValue": ""i-062120""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-30T18:15:26Z"
},
{
"VarCharValue": ""i-0121a04""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-29T23:27:44Z"
},
{
"VarCharValue": ""i-0f213""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-30T18:07:05Z"
},
{
"VarCharValue": ""i-0ee19d8""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-04-28T14:49:22Z"
},
{
"VarCharValue": ""i-04ad3c29""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-04-28T14:38:43Z"
},
{
"VarCharValue": ""i-7c6166""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-03-30T19:13:42Z"
},
{
"VarCharValue": ""i-07bc579d""
}
]
},
{
"Data": [
{
"VarCharValue": "2021-04-29T19:47:34Z"
},
{
"VarCharValue": ""i-0b8bc7df5""
}
]
}
]
3
Answers
IF your list was valid, you can do:
Python has csv module in standard library. https://docs.python.org/3/library/csv.html
But in this use case, if instanceIds doesn’t contain comma you can split lines by comma, take second field and strip double quotes.
You can use the result returned from Amazon Athena via
get_query_results()
.If the
data
variable contains the JSON shown in your question, you can extract a list of the instances with:The output is:
You can skip the column header by referencing:
rows[1:]