skip to Main Content

I have a csv file that with rows that looks like this:

745198;2024-09-10 10:09:10.7;leaf-2;{"Accelerometer": {"X": 0.055297852, "Y": 0.993530273, "Z": 0.000244141}}
745199;2024-09-10 10:09:10.71;leaf-2;{"Accelerometer": {"X": 0.056274414, "Y": 0.994384766, "Z": 0.000976563}}
745200;2024-09-10 10:09:10.721;leaf-2;{"Accelerometer": {"X": 0.055786133, "Y": 0.994018555, "Z": 0.000854492}}
745201;2024-09-10 10:09:10.732;leaf-2;{"Accelerometer": {"X": 0.055053711, "Y": 0.993530273, "Z": 0.000854492}}

and I would like to read this data into a dataframe, and somehow get the 3 accelerometer data into separate columns, but I haven’t been able to figure out a good way of doing this.
I have searched for similar cases Split / Explode a column of dictionaries into separate columns with pandas
But none of the suggested solutions seems to work.

I can create a for loop and manually extract the information I need line by line, but I’m guessing there should be a nice fast way of doing this that I just do not know of.

2

Answers


  1. First convert the JSON string in the ‘Accelerometer’ column to a dictionary:

    df = pd.read_csv('file.csv', sep=';', header=None, names=['ID', 'Timestamp', 'Device', 'Accelerometer'])
    
    df['Accelerometer'] = df['Accelerometer'].apply(json.loads)
    

    Then normalize the ‘Accelerometer’ column to separate columns:

    accelerometer_df = pd.json_normalize(df['Accelerometer'])
    accelerometer_df.columns = ['X', 'Y', 'Z']
    

    Finally concatenate the original DataFrame with the new accelerometer columns:

    result_df = pd.concat([df.drop(columns=['Accelerometer']), accelerometer_df], axis=1)
    print(result_df)
    

    The output:

           ID                Timestamp  Device         X         Y         Z
    0  745198    2024-09-10 10:09:10.7  leaf-2  0.055298  0.993530  0.000244
    1  745199   2024-09-10 10:09:10.71  leaf-2  0.056274  0.994385  0.000977
    2  745200  2024-09-10 10:09:10.721  leaf-2  0.055786  0.994019  0.000854
    3  745201  2024-09-10 10:09:10.732  leaf-2  0.055054  0.993530  0.000854
    
    Login or Signup to reply.
  2. You can do this entirely in pandas as per the answer from @m-sarabi

    However, if you want something that’s more efficient then you could just process the source file using basic Python techniques then finally build your dataframe from the objects you’ve created (a list of dictionaries).

    import pandas as pd
    import json
    from pathlib import Path
    
    SOURCE = Path("foo.csv")
    
    data = []
    with SOURCE.open() as source:
        for t in map(lambda x: x.split(";"), source):
            d = dict(zip("ABC", t[:-1]))
            d.update(loads(t[-1])["Accelerometer"])
            data.append(d)
    
    print(pd.DataFrame(data))
    

    Output:

            A                        B       C         X         Y         Z
    0  745198    2024-09-10 10:09:10.7  leaf-2  0.055298  0.993530  0.000244
    1  745199   2024-09-10 10:09:10.71  leaf-2  0.056274  0.994385  0.000977
    2  745200  2024-09-10 10:09:10.721  leaf-2  0.055786  0.994019  0.000854
    3  745201  2024-09-10 10:09:10.732  leaf-2  0.055054  0.993530  0.000854
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search