skip to Main Content

I’m trying to separate some data into a metadata section and the actual measurement taken. Each file has multiple measurements.

The .json file comes out like this.

{
    "version": "1",                                
    "data": [                                
        {                                ##This starts the first point
            "metadata1": 1.1,            ##metadata parts
            "metadata2": 2.1,            #
            "metadata3": 3.1,            #
            "metadata4": 4.1,            #
            "metadata5": 5.5,            #
            "measurements": [            ##This is the part that I can't get out
                {                        #
                    "datax": 0,          #This is the first x-value of the first measurement
                    "datay": 10          #This is the first y-value of the first measurement
                },
                {
                    "datax": .3,         #This is the second x-value of the first measurement
                    "datay": 15          #This is the second y-value of the first measurement
                },
                {
                    "datax": .7,
                    "datay": 25
                },
                {
                    "datax": 1.1,
                    "datay": 40
                },
                {
                    "datax": 1.7,
                    "datay": 55
                },
            ]
        },
        {                                ##This starts the second point
            "metadata1": 1.2,            ##metadata parts
            "metadata2": 2.2,            #
            "metadata3": 3.2,            #
            "metadata4": 4.4,            #
            "metadata5": 5.5,            #
            "measurements": [            ##This is the part that I can't get out
                {                        #
                    "datax": 1,          #This is the first x-value of the second measurement
                    "datay": 20          #This is the first y-value of the second measurement
                },
                {
                    "datax": 2.3,
                    "datay": 35
                },
                {
                    "datax": 3.7,
                    "datay": 25
                },
                {
                    "datax": 4.1,
                    "datay": 30
                },
                {
                    "datax": 5.7,
                    "datay": 32
                },
            ]
        }
    ]
}

I can get the metadata out easily but only the first measurement x and y value get grabbed and stuck into the same slot. Any attempt to get the data out of the measurements results in errors. I think the biggest problem is all the x and y values are labeled the same, which throws everything off. I want to group all the first measuements datax and datay into lists, and the second and so on measuements datax and datay into lists.

with open('Filename.json') as f:
    data = json.load(f)
df=pd.DataFrame(data['data'])
display(df)
metadata1 metadata2 metadata3 metadata4 metadata5 measurements
1.1 2.1 3.1 4.1 5.1 [{‘datax’: 0, ‘datay’: 10}]
1.2 2.2 3.2 4.2 5.2 [{‘datax’: 1, ‘datay’: 20}]

I’ve tried reading the file measurements in a similar way but it only gives KeyError: ‘measurements’

with open('Filename.json') as a:
    measurements = json.load(a)
df1=pd.DataFrame(data['measurements'])

What I want is to get all the metadata into one list for each aspect of the metadata which is currently working.
What I now need is to separate the data into:

  • datax into a list of datax at point 1,
  • datay into a list of datay at point 1,
  • datax into a list of datax at point 2,
  • datay into a list of datay at point 2,
  • etc.

The metadata will always be in the same spot in each file but the measurements section can be a variable length.

I’m still new to this and never worked with json files before so forgive me if this is a simple solution. Also feel free to ask questions if I didn’t explain something well.

2

Answers


  1. IIUC you can do:

    for d in data["data"]:
        for i, dct in enumerate(d.pop("measurements"), 1):
            d[f"point_{i}_x"] = dct["datax"]
            d[f"point_{i}_y"] = dct["datay"]
    
    df = pd.DataFrame(data["data"])
    print(df)
    

    Prints:

       metadata1  metadata2  metadata3  metadata4  metadata5  point_1_x  point_1_y  point_2_x  point_2_y  point_3_x  point_3_y  point_4_x  point_4_y  point_5_x  point_5_y
    0        1.1        2.1        3.1        4.1        5.5          0         10        0.3         15        0.7         25        1.1         40        1.7         55
    1        1.2        2.2        3.2        4.4        5.5          1         20        2.3         35        3.7         25        4.1         30        5.7         32
    

    OR: looking at your comment, maybe:

    for i, d in enumerate(data["data"], 1):
        datax, datay = [], []
    
        for m in d.pop("measurements"):
            datax.append(m["datax"])
            datay.append(m["datay"])
    
        d[f"datax{i}"] = datax
        d[f"datay{i}"] = datay
    
    df = pd.DataFrame(data["data"])
    print(df)
    

    Prints:

       metadata1  metadata2  metadata3  metadata4  metadata5                   datax1                datay1                   datax2                datay2
    0        1.1        2.1        3.1        4.1        5.5  [0, 0.3, 0.7, 1.1, 1.7]  [10, 15, 25, 40, 55]                      NaN                   NaN
    1        1.2        2.2        3.2        4.4        5.5                      NaN                   NaN  [1, 2.3, 3.7, 4.1, 5.7]  [20, 35, 25, 30, 32]
    
    Login or Signup to reply.
  2. Ignoring the comments in the example JSON, there are still a few minor syntax and formatting issues. However, once you fix those you can access the keys in your JSON-turned-DataFrame like so:

    import json
    import pandas as pd
    
    with open('Filename.json') as a:
        jsondata = json.load(a)  # renamed this to jsondata to avoid confusion with 'data'
    
    # you can access the measurements from the dataframe like so
    df1 = pd.DataFrame(jsondata['data'])['measurements']
    print(df1)
    
    # 0    [{'datax': 0, 'datay': 10}, {'datax': 0.3, 'da...
    # 1    [{'datax': 1, 'datay': 20}, {'datax': 2.3, 'da...
    # Name: measurements, dtype: object
    

    Each metadata item is available in the dataframe as a separate key as well, e.g.:

    df1 = pd.DataFrame(jsondata['data'])['metadata1']
    print(df1)
    
    # 0    1.1
    # 1    1.2
    # Name: metadata1, dtype: float64
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search