skip to Main Content

I have a json file with this structure:

[
  {
    "name": "myName",
    "type": {
      "x": {
        "id": [
          "x1",
          "x2"
        ]
      },
      "y": {
        "id": "y1"
      },
      "z": {
        "id": "z1"
      }
    }
  }
]

Type can only take values x, y, and z. Any of those can be missing, but there is always at least one x, y, or z. There is no limit to the number of id’s for each type, although there are never more than ten id-values in the data.

I would like to transform that into a Pandas dataframe, with structure:

Name type id
myName x x1
myName x x2
myName y y1
myName z z1

I’ve experimented with to_json{orient="..."} and json_normalize(), but I’m not able to flatten the lists in the json.

2

Answers


  1. You should first create the rows, than convert those into DF:

    data = json.loads(json_data)
    
    def create_rows(data):
      rows = []
      for item in data:
        name = item['name']
        type_info = item['type']
        
        for type_key, type_value in type_info.items():
            ids = type_value['id']
            if not isinstance(ids, list):
                ids = [ids]
            for id_value in ids:
                rows.append({"name": name, "type": type_key, "id": id_value})
                  
      return rows
    
    rows = create_rows(data)
    df = pd.DataFrame(rows)
    

    Result:

    |    | name   | type   | id   |
    |---:|:-------|:-------|:-----|
    |  0 | myName | x      | x1   |
    |  1 | myName | x      | x2   |
    |  2 | myName | y      | y1   |
    |  3 | myName | z      | z1   |
    
    Login or Signup to reply.
  2. Assuming lst the input (as python object) and using a nested list comprehension:

    df = pd.DataFrame([{'name': d['name'], 'type': t, 'id': i}
                       for d in lst
                       for t, ids in d['type'].items()
                       for i in (ids['id'] if isinstance(ids['id'], list)
                                 else [ids['id']])
                      ])
    

    Output:

         name type  id
    0  myName    x  x1
    1  myName    x  x2
    2  myName    y  y1
    3  myName    z  z1
    

    Which can be simplified using explode:

    df = pd.DataFrame([{'name': d['name'], 'type': t, 'id': ids['id']}
                       for d in lst
                       for t, ids in d['type'].items()
                      ]).explode('id')
    

    Output:

         name type  id
    0  myName    x  x1
    0  myName    x  x2
    1  myName    y  y1
    2  myName    z  z1
    

    Used input:

    lst = [
      {
        "name": "myName",
        "type": {
          "x": {
            "id": [
              "x1",
              "x2"
            ]
          },
          "y": {
            "id": "y1"
          },
          "z": {
            "id": "z1"
          }
        }
      }
    ]
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search