I have this json file which contains a list of objects
[ {
"infos": [
{ "localisation": "Kassel" },
{ "date": "01.08.2023" },
{ "places": "1 freier Platz" }
],
"applying email": "[email protected]",
"additional infos": {
"name": "jhon",
"position": "RH",
"email": "[email protected]" }
},
{
"infos": [
{ "places": "1 freier Platz" }
],
"applying email": "",
"applying link": "www.exemple.com",
"additional infos": {
"name": "jhon",
"position": "RH",
"email": "[email protected]"
}
},]
and I want to convert it into csv file (the object contains other fields but I wanted to make is as simple as possible) the problem is that the file contains nested objects and some field in the nested objects are missing from other objects in the list as what you can see in the infos fields (in the first item on the list you can see a localisation field under the infos field but it’s missing in the second object of the list), what i want to achieve is that my csv file header will contain all the field (the missing ones too) and if it didn’t find that missing field in the json object while appending data to it , i want to ignore it and let the place empty
3
Answers
Python has a module called "csv" that has methods for reading and writing CSV files. You can use the csv.DictReader() method to read a JSON file into a CSV file. The method takes in a JSON object, converts it into a csv file, and returns a reader object that can be used to iterate over the rows of the resulting csv file.
My preferred method to save to CSV is with
DataFrame.to_csv
and pandas also has ajson_normalize
method which could have been the shortest way to flatten this, but I’m not quite sure how to handle theinfos
lists.However, if you parse the JSON into a list of python dictionaries (
pData
)then you can flatten it with this
flattenObj
function(view examples) before converting to pandas DataFrames and the saving as CSV:I ran into a similar issue with a recent project. I used the built in csv module
this assumes you already have your data in the format you need, in my case, a list of dictionaries that I pass into the function. It reads through one dictionary at a time placing each record in its corresponding column based on the key names that are defined in field_names.
Im brand new to this so I’ve probably done this in a terribly inefficient way but it worked for me.