Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Creating nested json with list from pandas dataframe

TeodoricoLevoff
March 14, 2023
136 views
1 vote
2 Answers

I have a DataFrame with these columns: id, jan_x, jan_y, feb_x, feb_y, ..., dec_x, dec_y that I would like export as a json that is structured like this:

{
    "id1": [
        [jan_x, feb_x, ..., dec_x],
        [jan_y, feb_y, ..., dec_y]
    ],
    "id2": [
        [jan_x, feb_x, ..., dec_x],
        [jan_y, feb_y, ..., dec_y]
    ]
}

The initial keys e.g., id1 correspond to an id in column id of my dataframe. Without any custom parsing function is there a straight forward functional way of achieving this? I have tried dumping it as a json but the desired list structure isn’t captured.

Here is a sample data frame with just two months.

data = {'id': ['1', '2', '3', '4'],
        'jan_x': [1, 2, 3, 4],
        'jan_y': [5, 6, 7, 8],
        'feb_x': [9, 10, 11 12],
        'feb_y': [13 14, 15, 16]}
  
df = pd.DataFrame(data)

Sample Output:

{
    "1": [
        [1, 9],
        [5, 13]
    ],
    "2": [
        [2, 10],
        [6, 14]
    ],
    "3": [
        [3, 11],
        [7, 15]
    ],
    "4": [
        [4, 12],
        [8, 16]
    ]
}

Answers

- Corralien
- March 14, 2023 at 6:16 am
- 0 votes
0
Update:

Use to_dict('list') after aggregate by common suffix (‘x’, ‘y’, …):
```
import json

out = (df.rename(columns=lambda x: x.split('_')[-1])
         .set_index('id').groupby(level=0, axis=1, sort=False)
         .agg(lambda x: x.values.tolist()).T.to_dict('list'))
jd = json.dumps(out, indent=4)
```
Output:
```
>>> out
{'1': [[1, 9], [5, 13]],
 '2': [[2, 10], [6, 14]],
 '3': [[3, 11], [7, 15]],
 '4': [[4, 12], [8, 16]]}
```
Login or Signup to reply.

I am not sure what you mean by "any custom parsing function", but here is my attempt.

The solution uses iterrows go iterate through rows. For each row, I take the columns that ends with _x using x_columns variable that I defined above, and the same for columns that ends with _y. row[x_columns] gives you the values as a pandas Series, therefore you need to convert it to a list before adding to fit your desired format. I append all those to a result variable and print the result variable at the end.

import pandas as pd

# Create a dataframe for demo
data={"id":[15,12,13,22,32,11], "jan_x":[15,12,13,22,32,11],"jan_y":[12,21,23,22,56,11], "feb_x":[15,12,13,22,32,11],"feb_y":[12,21,23,22,56,11]}
df=pd.DataFrame(data)

# Get columns that ends with _x and _y
x_columns = [col for col in df if col.endswith('_x')]
y_columns = [col for col in df if col.endswith('_y')]

# Iterate through rows
results = []
for index, row in df.iterrows():
  results.append({
    "id"+str(row['id']): [
        list(row[x_columns]),
        list(row[y_columns])
        ]
    })
    
print(result)

Here is my example input:

   id  jan_x  jan_y  feb_x  feb_y
0  15     15     12     15     12
1  12     12     21     12     21
2  13     13     23     13     23
3  22     22     22     22     22
4  32     32     56     32     56
5  11     11     11     11     11

and here is the output:

[{'id15': [[15, 15], [12, 12]]}, 
 {'id12': [[12, 12], [21, 21]]}, 
 {'id13': [[13, 13], [23, 23]]}, 
 {'id22': [[22, 22], [22, 22]]}, 
 {'id32': [[32, 32], [56, 56]]}, 
 {'id11': [[11, 11], [11, 11]]}
 ]

Please signup or login to give your own answer.

Click here to cancel reply.