skip to Main Content

Trying to load a nested JSON file into Duckdb.
Here is the JSON contents:

{
   "MainLevel":[
      {
         "More":{
         }
      },
      {
         "More":{
            "Level2":[
               {
                  "Field1":"A"
               }
            ]
         }
      }
   ]
}

This is the Duckdb SQL script I am using to load:

CREATE TABLE duckdbtest1.main.nested_JSON AS
SELECT 
    Level2.Field1,
FROM 
    (SELECT unnest(MainLevel) as MainLevel
     FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
     as MainLevel,
    unnest(MainLevel.More.Level2) as Level2;

The error I get is:

SQL Error: java.sql.SQLException: Binder Error: Table "Level2" does not have a column named "Field1"
LINE 3: Level2.Field1,

I have done similar things in the past with no issues.

The only thing I can think of is that the first "More" has no "Level2".

Could that be causing an issue? Ideas of how to get this to work?

Have tried various combinations of unnest and left joins on unnest with no success.

Here is my hacky trial-and-error (I obviously have no idea what I’m doing):

1)

SELECT 
    *
FROM 
    (SELECT unnest(MainLevel) as MainLevel
     FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
     as MainLevel,
    unnest(MainLevel.More.Level2) as Level2;

Gives:

MainLevel unnest
{More={Level2=[{Field1=A}]}} {Field1=A}

2)

SELECT 
    Mainlevel
FROM 
    (SELECT unnest(MainLevel) as MainLevel
     FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
     as MainLevel,
    unnest(MainLevel.More.Level2) as Level2;

Gives:

MainLevel
{More={Level2=[{Field1=A}]}}

3)

SELECT 
    Mainlevel
FROM 
    (SELECT unnest(MainLevel) as MainLevel
     FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
     as MainLevel;
 

Gives:

MainLevel
{More={Level2=null}}
{More={Level2=[{Field1=A}]}}

4)

SELECT 
    Mainlevel
FROM 
    (SELECT unnest(MainLevel, recursive := true) as MainLevel
     FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
     as MainLevel;
 

Gives:

Mainlevel
{Level2=null}
{Level2=[{Field1=A}]}

2

Answers


  1. There seems to be a behaviour change starting in DuckDB 1.1.0

    duckdb.sql("""
    from (
       from values ([{'More': {'Level2': [{'Field1': 'A'}]}}]) as tbl(MainLevel)
       select unnest(MainLevel) as MainLevel
    ), unnest(MainLevel.More.Level2) as FOOBAR
    """)                           # ^^^^^^^^^ this was always "ignored" in all versions
    

    Previously, it seems the unnest was inherting the name from the final identifier in the path.

    DuckDB 1.0.0

    ┌──────────────────────────────────────────────────────┬────────────────────────┐
    │                      MainLevel                       │         Level2         │
    │ struct(more struct(level2 struct(field1 varchar)[])) │ struct(field1 varchar) │
    ├──────────────────────────────────────────────────────┼────────────────────────┤
    │ {'More': {'Level2': [{'Field1': A}]}}                │ {'Field1': A}          │
    └──────────────────────────────────────────────────────┴────────────────────────┘
    

    This made it seem like your as Level2 was doing something, but it was infact a "no-op".

    DuckDB 1.1.0

    ┌──────────────────────────────────────────────────────┬────────────────────────┐
    │                      MainLevel                       │         unnest         │
    │ struct(more struct(level2 struct(field1 varchar)[])) │ struct(field1 varchar) │
    ├──────────────────────────────────────────────────────┼────────────────────────┤
    │ {'More': {'Level2': [{'Field1': A}]}}                │ {'Field1': A}          │
    └──────────────────────────────────────────────────────┴────────────────────────┘
    

    Now you end up with the literal name unnest (I’m not sure if this is intended or not?)

    As for the structure of your query: from (subquery), unnest()

    That would read to me as from (subquery) select ... in which case the as ... applies.

    duckdb.sql("""
    from (
       from values ([{'More': {'Level2': [{'Field1': 'A'}]}}]) as tbl(MainLevel)
       select unnest(MainLevel) as MainLevel
    )
    select unnest(MainLevel.More.Level2) as FOOBAR
    """)
    
    ┌────────────────────────┐
    │         FOOBAR         │
    │ struct(field1 varchar) │
    ├────────────────────────┤
    │ {'Field1': A}          │
    └────────────────────────┘
    
    Login or Signup to reply.
  2. An unproblematic way to express the query would be:

    SELECT Level2.Field1
    FROM (SELECT unnest(MainLevel.More.Level2) as Level2
          FROM (SELECT unnest(MainLevel) as MainLevel,
                FROM read_JSON_auto('input.json') ));
    
    

    The hierarchy of SELECT/FROM statements allows DuckDB to work its unnest magic in a straightforward way. In particular, since unnest(NULL) has the effect of zapping a row, by the time the
    SELECT Level2.Field1 statement starts executing, the row that would otherwise have caused it to fail has been removed.

    By contrast, the statement having the form:

    FROM 
        (SELECT unnest(MainLevel) as MainLevel
         FROM read_JSON_auto('input.json') as MainLevel,
         unnest(MainLevel.More.Level2) as Level2
    

    seems to be at best dubious because of the way SELECT unnest(), unnest() works.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search