I tried different ways to get this done, but unable to do so. Please help here.
I am trying to explode a nested json list using python pd.json_normalize. After exploding the output is coming in two rows causing duplicate PKeys. My data is below and i am trying to get a single record with "id" as pkey and values in the nested list
{
"data": [
{
"id": "123456",
"created_by": "[email protected]",
"created_at": "2023-12-11",
"modified_by": "[email protected]",
"modified_at": "2023-12-11",
"rg": {
"id": "xyz234",
"parent_id": "cpqrst",
"account_id": "ntb1we",
"values": [
{
"value1": 100,
"value2": 101,
"value10": {
"valueid": "btqns",
},
{
"value1": 200,
"value2": 201,
"value10": {
"valueid": "apqns",
}
}
],
"page_info": {
"page": 1,
"per_page": 100,
"total_pages": 286,
"total_count": 28580
}
}
my code as below to normalize and explode
response = requests.get(getURL, params={"page": page}, headers=headers)
if response.status_code == 200:
json_data = response.json()
target_schema = StructType([StructField(.................. StructField('version', LongType(), True)])
normalized_data = pd.json_normalize(json_data,["data"],"page_info"]).explode("values").reset_index(drop=True)
JSONdf = json.dumps(json.loads(normalized_data.to_json(orient="records")))
results.append(JSONdf)
This provides data in the format as below with id value 123456 repeating twice for values
Current output with id 123456 repeating in 2 rows for values.value1 and values.value2
desired output is to have a single row with id 123456 for values column having the entire list
Any help greatly appreciated in getting the desired output. I tried all possible ways but could not get the desired output.
2
Answers
You can achieve this by adjusting the column names inside the pd.json_normalize function according to the actual structure of your JSON data.
IIUC use:
If need separator
_
: