i’m having trouble trying to read a json just like below:
{
"cats":{
"1":{
"name":"meow",
"favorite_food":"milk"
},
"2":{
"name":"honey",
"favorite_food":"fish"
},
"3":{
"name":"cutie",
"favorite_food":"milk"
},
"4":{
"name":"puss",
"favorite_food":"chicken"
},
"n"......
}
}
I want to output a table_cats like this:
name | favorite_food |
---|---|
meow | milk |
honey | fish |
the problem that i’m having is that I can have multiple "ids" and multiple keys inside each id (like favorite_toy, favorite_owner, etc) – i need to read this dynamic json.
any ideas?
I was trying reading, exploding and in some way unpivoting the output, but it wasn’t practical at all since i can have thousands of columns like this.
2
Answers
You can load your JSON into Python and convert the dictionary under
"cats"
to a list of records:Result:
read JSON files in spark using this spark.read.json(), but use the multiLine option as a single JSON is spread across multiple lines.