read nested json inside csv file using pandas?

Tue
September 11, 2024
96 views
0 votes
2 Answers

I have a csv file that with rows that looks like this:

745198;2024-09-10 10:09:10.7;leaf-2;{"Accelerometer": {"X": 0.055297852, "Y": 0.993530273, "Z": 0.000244141}}
745199;2024-09-10 10:09:10.71;leaf-2;{"Accelerometer": {"X": 0.056274414, "Y": 0.994384766, "Z": 0.000976563}}
745200;2024-09-10 10:09:10.721;leaf-2;{"Accelerometer": {"X": 0.055786133, "Y": 0.994018555, "Z": 0.000854492}}
745201;2024-09-10 10:09:10.732;leaf-2;{"Accelerometer": {"X": 0.055053711, "Y": 0.993530273, "Z": 0.000854492}}

and I would like to read this data into a dataframe, and somehow get the 3 accelerometer data into separate columns, but I haven’t been able to figure out a good way of doing this.
I have searched for similar cases Split / Explode a column of dictionaries into separate columns with pandas
But none of the suggested solutions seems to work.

I can create a for loop and manually extract the information I need line by line, but I’m guessing there should be a nice fast way of doing this that I just do not know of.

Answers

First convert the JSON string in the ‘Accelerometer’ column to a dictionary:

df = pd.read_csv('file.csv', sep=';', header=None, names=['ID', 'Timestamp', 'Device', 'Accelerometer'])

df['Accelerometer'] = df['Accelerometer'].apply(json.loads)

Then normalize the ‘Accelerometer’ column to separate columns:

accelerometer_df = pd.json_normalize(df['Accelerometer'])
accelerometer_df.columns = ['X', 'Y', 'Z']

Finally concatenate the original DataFrame with the new accelerometer columns:

result_df = pd.concat([df.drop(columns=['Accelerometer']), accelerometer_df], axis=1)
print(result_df)

The output:

       ID                Timestamp  Device         X         Y         Z
0  745198    2024-09-10 10:09:10.7  leaf-2  0.055298  0.993530  0.000244
1  745199   2024-09-10 10:09:10.71  leaf-2  0.056274  0.994385  0.000977
2  745200  2024-09-10 10:09:10.721  leaf-2  0.055786  0.994019  0.000854
3  745201  2024-09-10 10:09:10.732  leaf-2  0.055054  0.993530  0.000854

You can do this entirely in pandas as per the answer from @m-sarabi

However, if you want something that’s more efficient then you could just process the source file using basic Python techniques then finally build your dataframe from the objects you’ve created (a list of dictionaries).

import pandas as pd
import json
from pathlib import Path

SOURCE = Path("foo.csv")

data = []
with SOURCE.open() as source:
    for t in map(lambda x: x.split(";"), source):
        d = dict(zip("ABC", t[:-1]))
        d.update(loads(t[-1])["Accelerometer"])
        data.append(d)

print(pd.DataFrame(data))

Output:

        A                        B       C         X         Y         Z
0  745198    2024-09-10 10:09:10.7  leaf-2  0.055298  0.993530  0.000244
1  745199   2024-09-10 10:09:10.71  leaf-2  0.056274  0.994385  0.000977
2  745200  2024-09-10 10:09:10.721  leaf-2  0.055786  0.994019  0.000854
3  745201  2024-09-10 10:09:10.732  leaf-2  0.055054  0.993530  0.000854

Please signup or login to give your own answer.

Click here to cancel reply.