I have json files in S3 bucket generated by AWS Textract service and I’m using Athena to query data from those files. Every file has the same structure and I created a table in Athena where I have column "blocks" that is array of struct:
"blocks": [{
"BlockType": "LINE",
"Id": "12345",
"Text": "Text from document",
"Confidence": 98.7022933959961,
"Page": "1",
"SourceLanguage": "de",
"TargetLanguage": "en",
},
...100+ blocks]
How can I query just for the "Text" property from every block that has one?
Thanks in advance!
2
Answers
It looks like column stores array of rows, so you can process it as one (array functions):
I have defined a table with exact schema of yours using sample JSON provided.
I have used unnest operator to flatten the array of blocks and fetch the Text column from it using below query: