Trying to load a nested JSON file into Duckdb.
Here is the JSON contents:
{
"MainLevel":[
{
"More":{
}
},
{
"More":{
"Level2":[
{
"Field1":"A"
}
]
}
}
]
}
This is the Duckdb SQL script I am using to load:
CREATE TABLE duckdbtest1.main.nested_JSON AS
SELECT
Level2.Field1,
FROM
(SELECT unnest(MainLevel) as MainLevel
FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
as MainLevel,
unnest(MainLevel.More.Level2) as Level2;
The error I get is:
SQL Error: java.sql.SQLException: Binder Error: Table "Level2" does not have a column named "Field1"
LINE 3: Level2.Field1,
I have done similar things in the past with no issues.
The only thing I can think of is that the first "More" has no "Level2".
Could that be causing an issue? Ideas of how to get this to work?
Have tried various combinations of unnest and left joins on unnest with no success.
Here is my hacky trial-and-error (I obviously have no idea what I’m doing):
1)
SELECT
*
FROM
(SELECT unnest(MainLevel) as MainLevel
FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
as MainLevel,
unnest(MainLevel.More.Level2) as Level2;
Gives:
MainLevel | unnest |
---|---|
{More={Level2=[{Field1=A}]}} | {Field1=A} |
2)
SELECT
Mainlevel
FROM
(SELECT unnest(MainLevel) as MainLevel
FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
as MainLevel,
unnest(MainLevel.More.Level2) as Level2;
Gives:
MainLevel |
---|
{More={Level2=[{Field1=A}]}} |
3)
SELECT
Mainlevel
FROM
(SELECT unnest(MainLevel) as MainLevel
FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
as MainLevel;
Gives:
MainLevel |
---|
{More={Level2=null}} |
{More={Level2=[{Field1=A}]}} |
4)
SELECT
Mainlevel
FROM
(SELECT unnest(MainLevel, recursive := true) as MainLevel
FROM read_JSON_auto('C:\jsonfiles\*.json', maximum_object_size = 999999999))
as MainLevel;
Gives:
Mainlevel |
---|
{Level2=null} |
{Level2=[{Field1=A}]} |
2
Answers
There seems to be a behaviour change starting in DuckDB 1.1.0
Previously, it seems the unnest was inherting the name from the final identifier in the path.
DuckDB 1.0.0
This made it seem like your
as Level2
was doing something, but it was infact a "no-op".DuckDB 1.1.0
Now you end up with the literal name
unnest
(I’m not sure if this is intended or not?)As for the structure of your query:
from (subquery), unnest()
That would read to me as
from (subquery) select ...
in which case theas ...
applies.An unproblematic way to express the query would be:
The hierarchy of SELECT/FROM statements allows DuckDB to work its
unnest
magic in a straightforward way. In particular, sinceunnest(NULL)
has the effect of zapping a row, by the time theSELECT Level2.Field1
statement starts executing, the row that would otherwise have caused it to fail has been removed.By contrast, the statement having the form:
seems to be at best dubious because of the way
SELECT unnest(), unnest()
works.