How can I convert a json file to a pandas dataframe

Matthijs
May 17, 2024
204 views
1 vote
2 Answers

I have a json file with this structure:

[
  {
    "name": "myName",
    "type": {
      "x": {
        "id": [
          "x1",
          "x2"
        ]
      },
      "y": {
        "id": "y1"
      },
      "z": {
        "id": "z1"
      }
    }
  }
]

Type can only take values x, y, and z. Any of those can be missing, but there is always at least one x, y, or z. There is no limit to the number of id’s for each type, although there are never more than ten id-values in the data.

I would like to transform that into a Pandas dataframe, with structure:

Name	type	id
myName	x	x1
myName	x	x2
myName	y	y1
myName	z	z1

I’ve experimented with to_json{orient="..."} and json_normalize(), but I’m not able to flatten the lists in the json.

Answers

You should first create the rows, than convert those into DF:

data = json.loads(json_data)

def create_rows(data):
  rows = []
  for item in data:
    name = item['name']
    type_info = item['type']
    
    for type_key, type_value in type_info.items():
        ids = type_value['id']
        if not isinstance(ids, list):
            ids = [ids]
        for id_value in ids:
            rows.append({"name": name, "type": type_key, "id": id_value})
              
  return rows

rows = create_rows(data)
df = pd.DataFrame(rows)

Result:

|    | name   | type   | id   |
|---:|:-------|:-------|:-----|
|  0 | myName | x      | x1   |
|  1 | myName | x      | x2   |
|  2 | myName | y      | y1   |
|  3 | myName | z      | z1   |

Assuming lst the input (as python object) and using a nested list comprehension:

df = pd.DataFrame([{'name': d['name'], 'type': t, 'id': i}
                   for d in lst
                   for t, ids in d['type'].items()
                   for i in (ids['id'] if isinstance(ids['id'], list)
                             else [ids['id']])
                  ])

Output:

     name type  id
0  myName    x  x1
1  myName    x  x2
2  myName    y  y1
3  myName    z  z1

Which can be simplified using explode:

df = pd.DataFrame([{'name': d['name'], 'type': t, 'id': ids['id']}
                   for d in lst
                   for t, ids in d['type'].items()
                  ]).explode('id')

Output:

     name type  id
0  myName    x  x1
0  myName    x  x2
1  myName    y  y1
2  myName    z  z1

Used input:

lst = [
  {
    "name": "myName",
    "type": {
      "x": {
        "id": [
          "x1",
          "x2"
        ]
      },
      "y": {
        "id": "y1"
      },
      "z": {
        "id": "z1"
      }
    }
  }
]

Please signup or login to give your own answer.

Click here to cancel reply.