Separating data in .json file in python

wallydog
April 15, 2024
127 views
0 votes
2 Answers

I’m trying to separate some data into a metadata section and the actual measurement taken. Each file has multiple measurements.

The .json file comes out like this.

{
    "version": "1",                                
    "data": [                                
        {                                ##This starts the first point
            "metadata1": 1.1,            ##metadata parts
            "metadata2": 2.1,            #
            "metadata3": 3.1,            #
            "metadata4": 4.1,            #
            "metadata5": 5.5,            #
            "measurements": [            ##This is the part that I can't get out
                {                        #
                    "datax": 0,          #This is the first x-value of the first measurement
                    "datay": 10          #This is the first y-value of the first measurement
                },
                {
                    "datax": .3,         #This is the second x-value of the first measurement
                    "datay": 15          #This is the second y-value of the first measurement
                },
                {
                    "datax": .7,
                    "datay": 25
                },
                {
                    "datax": 1.1,
                    "datay": 40
                },
                {
                    "datax": 1.7,
                    "datay": 55
                },
            ]
        },
        {                                ##This starts the second point
            "metadata1": 1.2,            ##metadata parts
            "metadata2": 2.2,            #
            "metadata3": 3.2,            #
            "metadata4": 4.4,            #
            "metadata5": 5.5,            #
            "measurements": [            ##This is the part that I can't get out
                {                        #
                    "datax": 1,          #This is the first x-value of the second measurement
                    "datay": 20          #This is the first y-value of the second measurement
                },
                {
                    "datax": 2.3,
                    "datay": 35
                },
                {
                    "datax": 3.7,
                    "datay": 25
                },
                {
                    "datax": 4.1,
                    "datay": 30
                },
                {
                    "datax": 5.7,
                    "datay": 32
                },
            ]
        }
    ]
}

I can get the metadata out easily but only the first measurement x and y value get grabbed and stuck into the same slot. Any attempt to get the data out of the measurements results in errors. I think the biggest problem is all the x and y values are labeled the same, which throws everything off. I want to group all the first measuements datax and datay into lists, and the second and so on measuements datax and datay into lists.

with open('Filename.json') as f:
    data = json.load(f)
df=pd.DataFrame(data['data'])
display(df)

metadata1	metadata2	metadata3	metadata4	metadata5	measurements
1.1	2.1	3.1	4.1	5.1	[{‘datax’: 0, ‘datay’: 10}]
1.2	2.2	3.2	4.2	5.2	[{‘datax’: 1, ‘datay’: 20}]

I’ve tried reading the file measurements in a similar way but it only gives KeyError: ‘measurements’

with open('Filename.json') as a:
    measurements = json.load(a)
df1=pd.DataFrame(data['measurements'])

What I want is to get all the metadata into one list for each aspect of the metadata which is currently working.
What I now need is to separate the data into:

datax into a list of datax at point 1,
datay into a list of datay at point 1,
datax into a list of datax at point 2,
datay into a list of datay at point 2,
etc.

The metadata will always be in the same spot in each file but the measurements section can be a variable length.

I’m still new to this and never worked with json files before so forgive me if this is a simple solution. Also feel free to ask questions if I didn’t explain something well.

Answers

IIUC you can do:

for d in data["data"]:
    for i, dct in enumerate(d.pop("measurements"), 1):
        d[f"point_{i}_x"] = dct["datax"]
        d[f"point_{i}_y"] = dct["datay"]

df = pd.DataFrame(data["data"])
print(df)

Prints:

   metadata1  metadata2  metadata3  metadata4  metadata5  point_1_x  point_1_y  point_2_x  point_2_y  point_3_x  point_3_y  point_4_x  point_4_y  point_5_x  point_5_y
0        1.1        2.1        3.1        4.1        5.5          0         10        0.3         15        0.7         25        1.1         40        1.7         55
1        1.2        2.2        3.2        4.4        5.5          1         20        2.3         35        3.7         25        4.1         30        5.7         32

OR: looking at your comment, maybe:

for i, d in enumerate(data["data"], 1):
    datax, datay = [], []

    for m in d.pop("measurements"):
        datax.append(m["datax"])
        datay.append(m["datay"])

    d[f"datax{i}"] = datax
    d[f"datay{i}"] = datay

df = pd.DataFrame(data["data"])
print(df)

Prints:

   metadata1  metadata2  metadata3  metadata4  metadata5                   datax1                datay1                   datax2                datay2
0        1.1        2.1        3.1        4.1        5.5  [0, 0.3, 0.7, 1.1, 1.7]  [10, 15, 25, 40, 55]                      NaN                   NaN
1        1.2        2.2        3.2        4.4        5.5                      NaN                   NaN  [1, 2.3, 3.7, 4.1, 5.7]  [20, 35, 25, 30, 32]

- JRiggles
- April 15, 2024 at 8:50 pm
- 0 votes
0
Ignoring the comments in the example JSON, there are still a few minor syntax and formatting issues. However, once you fix those you can access the keys in your JSON-turned-DataFrame like so:
```
import json
import pandas as pd

with open('Filename.json') as a:
    jsondata = json.load(a)  # renamed this to jsondata to avoid confusion with 'data'

# you can access the measurements from the dataframe like so
df1 = pd.DataFrame(jsondata['data'])['measurements']
print(df1)

# 0    [{'datax': 0, 'datay': 10}, {'datax': 0.3, 'da...
# 1    [{'datax': 1, 'datay': 20}, {'datax': 2.3, 'da...
# Name: measurements, dtype: object
```
Each metadata item is available in the dataframe as a separate key as well, e.g.:
```
df1 = pd.DataFrame(jsondata['data'])['metadata1']
print(df1)

# 0    1.1
# 1    1.2
# Name: metadata1, dtype: float64
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.