How to convert a pyspark data frame like below to a json array structure
OrderID field fieldValue itemSeqNo
123 Date 01-01-23 1
123 Amount 10.00 1
123 description Pencil 1
123 Date 01-02-23 2
123 Amount 11.00 2
123 description Pen 2
Into below json array structure
{
"orderDetails": {
"orderID": "123"
},
"itemizationDetails": [
{
"Date": "01-01-23",
"Amount": "10.00",
"description": "Pencil"
},
{
"Date": "01-02-23 ",
"Amount": "11.00",
"description": "Pen"
}
]
}
This is the current code I have and out put is not as expected .
import pandas as pd
test_dataframe = pd.DataFrame(
{
"OrderID" : ['123','123','123','123','123','123'],
"field" :
["Date","Amount",'description','Date','Amount','description'],
"fieldValue": ['01-01-23','10.00','Pencil','01-02-23
','11.00','Pen '],
"itemSeqNo" : ['1','1','1','2','2','2']
}
)
import json
res = json.loads(test_dataframe.to_json(orient='records'))
print(res)
[{'OrderID': '123', 'field': 'Date', 'fieldValue': '01-01-23', 'itemSeqNo': '1'}, {'OrderID': '123', 'field': 'Amount', 'fieldValue': '10.00', 'itemSeqNo': '1'}, {'OrderID': '123', 'field': 'description', 'fieldValue': 'Pencil', 'itemSeqNo': '1'}, {'OrderID': '123', 'field': 'Date', 'fieldValue': '01-02-23 ', 'itemSeqNo': '2'}, {'OrderID': '123', 'field': 'Amount', 'fieldValue': '11.00', 'itemSeqNo': '2'}, {'OrderID': '123', 'field': 'description', 'fieldValue': 'Pen ', 'itemSeqNo': '2'}]
2
Answers
You can convert it with pandas easily
Pyspark Solution
Pivot to reshape the dataframe
Pack the required columns into struct type
Group the dataframe by OrderID and collect list of structs
Pack OrderID into struct field
Export the dataframe to JSON