skip to Main Content

I’m starting working on json columns in google bigquery.

My aim is to implement a query which: extracts a given value in the json object based on path and returns its real SQL value.

Example: given {"foo": {"bar": 123}}, I’d like to get the $.foo.bar value as integer.
But, when querying the table containing json column I get a string.
I do understand this behavior is well documented.
I’m searching for a snippet / recipe / a way to implement something that would do the scalar casting for me.

I got to something which is not valid SQL, but represents what I search for:

WITH
  base AS (
  SELECT
    JSON_QUERY( JSON '{"foo": {"unknown":123}}', '$.foo') AS col)
SELECT
  CASE JSON_TYPE(JSON_QUERY(col, '$.unknown'))
    WHEN 'null' THEN NULL
    WHEN 'number' THEN LAX_FLOAT64(JSON_QUERY(col, '$.unknown'))
    WHEN 'boolean' THEN LAX_BOOL(JSON_QUERY(col, '$.unknown'))
    WHEN 'string' THEN LAX_STRING(JSON_QUERY(col, '$.unknown'))
  ELSE
  -- array or map are not scalar - keep them as they are:
  JSON_QUERY(col, '$.unknown')
END as output
FROM
  base

Any help is highly appreciated! THANKS 🙂

2

Answers


  1. WITH base AS (
      SELECT '{"foo": {"unknown":123}}' AS col
    )
    
    SELECT
      CASE 
        WHEN JSON_TYPE(JSON_QUERY(col, '$.unknown')) = 'null' THEN NULL
        WHEN JSON_TYPE(JSON_QUERY(col, '$.unknown')) = 'number' THEN CAST(JSON_EXTRACT(col, '$.unknown') AS FLOAT64)
        WHEN JSON_TYPE(JSON_QUERY(col, '$.unknown')) = 'boolean' THEN CAST(JSON_EXTRACT(col, '$.unknown') AS BOOL)
        WHEN JSON_TYPE(JSON_QUERY(col, '$.unknown')) = 'string' THEN CAST(JSON_EXTRACT(col, '$.unknown') AS STRING)
        ELSE JSON_QUERY(col, '$.unknown')
      END AS output
    FROM base;
    

    This query uses the JSON_EXTRACT function to extract the value and then uses a CASE statement to check the JSON type and perform the appropriate casting.

    Make sure to replace the example JSON string in the base CTE with your actual column reference.

    Login or Signup to reply.
  2. Query:

    In your query output column expecting only one data type,so instead of creating one column you can consider the below query:

    select  *, if(type="string",LAX_STRING(col),null) stringcolumn, if(type="number",LAX_FLOAT64(col),null) intcolumn, if(type="boolean",LAX_BOOL(col),null) boolcolumn, from (select *,json_type(col)as type from base);
    

    Result:

    image

    From the above result,you can select the column that does not contain null values. For example if your input value is number then column
    stringcolumn and boolcolumn value is null. Using a subquery instead of selecting all columns you can select the intcolumn column.

    Based on my understanding there is no builtin BigQuery function that extracts an exact given value in the json object based on the path and returns its real SQL value. So If you want this feature in BigQuery, you can raise a feature request in Issue Tracker.

    For more information you can refer to these link1,link2 and link3.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search