Read json with multiple levels into a DataFrame [python]

André Lourenço
December 7, 2022
269 views
1 vote
2 Answers

I have json files in this generic format:

{"attribute1": "test1",
 "attribute2": "test2",
 "data": {
      "0": 
         {"metadata": {
             "timestamp": "2022-08-14"},
         "detections": {
             "0": {"dim1": 40, "dim2": 30},
             "1": {"dim1": 50, "dim2": 20}}},
      "1": 
         {"metadata": {
             "timestamp": "2022-08-15"},
         "detections": {
             "0": {"dim1": 30, "dim2": 10},
             "1": {"dim1": 100, "dim2": 80}}}}}

These JSON files hold measurements collected by a 3D camera. The top-level entries under the data key correspond to frames; each frame has its own metadata and can contain multiple detections, each with its own dimensions (represented here by dim1 and dim2). I want to convert this type of JSON file to a pandas DataFrame in the following format:

timestamp	dim1	dim2
2022-08-14	40	30
2022-08-14	50	20
2022-08-15	30	10
2022-08-15	100	80

So any field in metadata (I only included timestamp here, but there could be several) must be repeated for each entry under the detections key.

I can already convert this type of JSON to a pandas DataFrame, but it takes multiple steps and for loops within a single file, with everything concatenated at the end. I have also tried pd.json_normalize, playing with the record_path, meta and max_level arguments, but so far I have not managed to convert this type of JSON to a DataFrame in just a few steps. Is there a clean way to do this?

Answers

I think a good solution could be:

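import pandas as pd

# d is the parsed JSON from the question (a dict with a 'data' key)
# Turn each frame's 'detections' dict into a list so it can serve as record_path
data = [dict(d1, **{'detections': list(d1['detections'].values())})
        for d1 in d['data'].values()]
# Equivalent with map:
#data = list(map(lambda d1: dict(d1,
#                **{'detections': list(d1['detections'].values())}),
#               d['data'].values()))

print(data)
# Parentheses let the method chain span several lines
df = (
    pd.json_normalize(data, 'detections', [['metadata', 'timestamp']])
    .rename({'metadata.timestamp': 'timestamp'}, axis=1)
)
print(df)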

#[{'metadata': {'timestamp': '2022-08-14'}, 'detections': [{'dim1': 40, 'dim2': 30}, {'dim1': 50, 'dim2': 20}]}, {'metadata': {'timestamp': '2022-08-15'}, 'detections': [{'dim1': 30, 'dim2': 10}, {'dim1': 100, 'dim2': 80}]}]
#   dim1  dim2   timestamp
#0    40    30  2022-08-14
#1    50    20  2022-08-14
#2    30    10  2022-08-15
#3   100    80  2022-08-15
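
If metadata holds several fields, as the question anticipates, the meta list can be built from the keys instead of being typed out. Below is a minimal sketch of that idea. It assumes there is at least one frame and that every frame carries the same metadata keys as the first. The "camera" field and the helper name frames_to_df are made up for illustration and do not come from the question.

```python
import pandas as pd

# Sample shaped like the question's file; "camera" is a hypothetical extra metadata field.
d = {"data": {
    "0": {"metadata": {"timestamp": "2022-08-14", "camera": "A"},
          "detections": {"0": {"dim1": 40, "dim2": 30},
                         "1": {"dim1": 50, "dim2": 20}}},
    "1": {"metadata": {"timestamp": "2022-08-15", "camera": "A"},
          "detections": {"0": {"dim1": 30, "dim2": 10},
                         "1": {"dim1": 100, "dim2": 80}}}}}


def frames_to_df(d):
    # json_normalize expects a list of records under record_path, so each
    # frame's "detections" dict is replaced by a list of its values.
    frames = [dict(f, detections=list(f["detections"].values()))
              for f in d["data"].values()]
    # Assumption: all frames share the metadata keys of the first frame.
    meta_keys = list(frames[0]["metadata"])
    meta = [["metadata", k] for k in meta_keys]
    df = pd.json_normalize(frames, record_path="detections", meta=meta)
    # Drop the "metadata." prefix that json_normalize puts on meta columns.
    return df.rename(columns={f"metadata.{k}": k for k in meta_keys})


df = frames_to_df(d)
print(df)
```

The detection columns come first, followed by the metadata columns in the order of the first frame's keys.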

Use a nested list comprehension to flatten the values and merge each frame's metadata with each of its detections, then pass the resulting list of dictionaries to the DataFrame constructor:

import pandas as pd

json = {"attribute1": "test1",
 "attribute2": "test2",
 "data": {
      "0": 
         {"metadata": {
             "timestamp": "2022-08-14"},
         "detections": {
             "0": {"dim1": 40, "dim2": 30},
             "1": {"dim1": 50, "dim2": 20}}},
      "1": 
         {"metadata": {
             "timestamp": "2022-08-15"},
         "detections": {
             "0": {"dim1": 30, "dim2": 10},
             "1": {"dim1": 100, "dim2": 80}}}}}

# One dict per detection: the frame's metadata fields merged with the detection's fields
L = [{**x['metadata'], **y} for x in json['data'].values()
                            for y in x['detections'].values()]

df = pd.DataFrame(L)
print(df)
    timestamp  dim1  dim2
0  2022-08-14    40    30
1  2022-08-14    50    20
2  2022-08-15    30    10
3  2022-08-15   100    80
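
Since the question mentions several JSON files, the same comprehension could be wrapped in a function and applied per file. This is only a sketch: the function names detections_to_df and load_detection_files, and the temporary file names, are hypothetical, and every file is assumed to follow the structure shown in the question.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def detections_to_df(doc):
    # One row per detection; the frame's metadata fields are repeated on each row.
    rows = [{**frame["metadata"], **det}
            for frame in doc["data"].values()
            for det in frame["detections"].values()]
    return pd.DataFrame(rows)


def load_detection_files(paths):
    # Parse each file and stack the per-file results into one DataFrame.
    frames = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            frames.append(detections_to_df(json.load(fh)))
    return pd.concat(frames, ignore_index=True)


# Usage with two throwaway files written to a temporary directory.
sample = {"data": {
    "0": {"metadata": {"timestamp": "2022-08-14"},
          "detections": {"0": {"dim1": 40, "dim2": 30},
                         "1": {"dim1": 50, "dim2": 20}}},
    "1": {"metadata": {"timestamp": "2022-08-15"},
          "detections": {"0": {"dim1": 30, "dim2": 10},
                         "1": {"dim1": 100, "dim2": 80}}}}}

with tempfile.TemporaryDirectory() as tmp:
    paths = [Path(tmp) / f"run{i}.json" for i in range(2)]
    for p in paths:
        p.write_text(json.dumps(sample), encoding="utf-8")
    combined = load_detection_files(paths)

print(combined)
```

Note that if a detection has a key with the same name as a metadata field, the detection's value overwrites the metadata value in the merged row.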
