Amazon web services - AWS Athena query returning Unexpected token < in JSON

TreyCopeland
May 24, 2023
212 views
0 votes
2 Answers

I have a JSON file in s3 that I’m trying to run queries against using Athena. When I run my Lambda that tries to call Athena, I get this error: SyntaxError: Unexpected token < in JSON at position 0 at JSON.parse (<anonymous>)

Any help would be greatly appreciated.

My code looks like this:

const athena_client = new AthenaClient({ region: process.env.AWS_REGION });

const sql = `SELECT * FROM customer_data`

const params = {
            QueryString: sql,
            WorkGroup: 'customer-data-workgroup-prod'
        };

const command = new StartQueryExecutionCommand(params);

console.log("command", command)

try {
        const data = await client.send(command);
        console.log("data response: ", data)
  } catch (error) {
        console.log("error", error)
  }

In s3, I have a file called movies.json, that just pulls from this test API – https://jsonmock.hackerrank.com/api/movies

My Glue Table Cloudformation template looks like this:

  Type: AWS::Glue::Table
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseName: !Ref GlueDatabase
    TableInput:
      Name: customer_data
      StorageDescriptor:
        Columns:
          - Name: createdAt
            Type: timestamp
          - Name: fullJson  
            Type: string
          - Name: data  
            Type: string  
          - Name: title
            Type: string                
        Location: 's3://customer-data-bucket-prod/'
        InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
        OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
        SerdeInfo:
          SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
      Parameters: {'classification': 'json'}
      TableType: "EXTERNAL_TABLE"

JSON:

{
  "page": 1,
  "per_page": 10,
  "total": 2770,
  "total_pages": 277,
  "data": [
    {
      "Title": "Waterworld",
      "Year": 1995,
      "imdbID": "tt0114898"
    },
    {
      "Title": "Waterworld",
      "Year": 1995,
      "imdbID": "tt0189200"
    },
    {
      "Title": "The Making of 'Waterworld'",
      "Year": 1995,
      "imdbID": "tt2670548"
    },
    {
      "Title": "Waterworld 4: History of the Islands",
      "Year": 1997,
      "imdbID": "tt0161077"
    },
    {
      "Title": "Waterworld",
      "Year": 1997,
      "imdbID": "tt0455840"
    },
    {
      "Title": "Waterworld",
      "Year": 1997,
      "imdbID": "tt0390617"
    },
    {
      "Title": "Swordquest: Waterworld",
      "Year": 1983,
      "imdbID": "tt2761086"
    },
    {
      "Title": "Behind the Scenes of the Most Fascinating Waterworld on Earth: The Great Backwaters, Kerala.",
      "Year": 2014,
      "imdbID": "tt5847056"
    },
    {
      "Title": "Louise's Waterworld",
      "Year": 1997,
      "imdbID": "tt0298417"
    },
    {
      "Title": "Waterworld",
      "Year": 2001,
      "imdbID": "tt0381702"
    }
  ]
}

Answers

This is the Glue table schema to allow Athena parsing the text file correctly.

"Columns": [
    {
        "Name": "page",
        "Type": "int"
    },
    {
        "Name": "per_page",
        "Type": "int"
    },
    {
        "Name": "total",
        "Type": "int"
    },
    {
        "Name": "total_pages",
        "Type": "int"
    },
    {
        "Name": "data",
        "Type": "array<struct<Title:string,Year:int,imdbID:string>>"
    }
]

"InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
"OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",

"SerdeInfo": {
    "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe",
    "Parameters": {
        "paths": "data,page,per_page,total,total_pages"
    }
}

In Athena,

SELECT data FROM customer_data;

will return all movies in an array

[{title=Waterworld, year=1995, imdbid=tt0114898}, {title=Waterworld, year=1995, imdbid=tt0189200}, {title=The Making of 'Waterworld', year=1995, imdbid=tt2670548}, {title=Waterworld 4: History of the Islands, year=1997, imdbid=tt0161077}, {title=Waterworld, year=1997, imdbid=tt0455840}, {title=Waterworld, year=1997, imdbid=tt0390617}, {title=Swordquest: Waterworld, year=1983, imdbid=tt2761086}, {title=Behind the Scenes of the Most Fascinating Waterworld on Earth: The Great Backwaters, Kerala., year=2014, imdbid=tt5847056}, {title=Louise's Waterworld, year=1997, imdbid=tt0298417}, {title=Waterworld, year=2001, imdbid=tt0381702}]

If you want a regular table, create a view based on this query

WITH w AS
  (SELECT DATA
   FROM customer_data)
SELECT movie.title,
       movie.year,
       movie.imdbid
FROM w,
     UNNEST (DATA) t(movie);

which gives you the result

#   title   year    imdbid
1   Waterworld  1995    tt0114898
2   Waterworld  1995    tt0189200
3   The Making of 'Waterworld'  1995    tt2670548
...

As @Isc also mentioned, you need to define the columns as described in the json data.

Glue Table definition expects the column to be present in the data.Glue crawler crawls the data from external location using the Serde configuration and maps the data in the specified columns.
Hence you need to ensure that right schema is defined. Based on that, it crawls.

Your cloudformation template Glue::Table should look like this:

 Type: AWS::Glue::Table
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseName: !Ref GlueDatabase
    TableInput:
      Name: customer_data
      StorageDescriptor:
        Columns:
          - Name: page
            Type: int
          - Name: per_page  
            Type: int
          - Name: total  
            Type: int  
          - Name: total_pages
            Type: int     
          - Name: data
            Type: array<struct<Title:string,Year:int,imdbID:string>>           
        Location: 's3://customer-data-bucket-prod/'
        InputFormat: org.apache.hadoop.mapred.TextInputFormat
        OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
        SerdeInfo:
          SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
      Parameters: {'classification': 'json'}
      TableType: "EXTERNAL_TABLE"

Please signup or login to give your own answer.

Click here to cancel reply.

Amazon web services – AWS Athena query returning Unexpected token < in JSON

Answers