skip to Main Content

i’m having trouble trying to read a json just like below:

{
  "cats":{
    "1":{
      "name":"meow",
      "favorite_food":"milk"
    },
    "2":{
      "name":"honey",
      "favorite_food":"fish"
    },
    "3":{
      "name":"cutie",
      "favorite_food":"milk"
    },
    "4":{
      "name":"puss",
      "favorite_food":"chicken"
    },
    "n"......
  }
}

I want to output a table_cats like this:

name favorite_food
meow milk
honey fish

the problem that i’m having is that I can have multiple "ids" and multiple keys inside each id (like favorite_toy, favorite_owner, etc) – i need to read this dynamic json.
any ideas?

I was trying reading, exploding and in some way unpivoting the output, but it wasn’t practical at all since i can have thousands of columns like this.

2

Answers


  1. You can load your JSON into Python and convert the dictionary under "cats" to a list of records:

    import pandas as pd
    import json
    
    data = """{
      "cats":{
        "1":{
          "name":"meow",
          "favorite_food":"milk"
        },
        "2":{
          "name":"honey",
          "favorite_food":"fish"
        },
        "3":{
          "name":"cutie",
          "favorite_food":"milk"
        },
        "4":{
          "name":"puss",
          "favorite_food":"chicken"
        }
      }
    }"""
    
    json_data = json.loads(data)
    df = pd.DataFrame([v for k, v in json_data["cats"].items()])
    

    Result:

        name favorite_food
    0   meow          milk
    1  honey          fish
    2  cutie          milk
    3   puss       chicken
    
    Login or Signup to reply.
  2. read JSON files in spark using this spark.read.json(), but use the multiLine option as a single JSON is spread across multiple lines.

    spark.read.option(multiline=True).json(path)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search