
I have this JSON file, which contains a list of objects:

[ {
    "infos": [
      { "localisation": "Kassel" },
      { "date": "01.08.2023" },
      { "places": "1 freier Platz" }
    ],

    "applying email": "[email protected]",
    "additional infos": { 
       "name": "jhon", 
       "position": "RH", 
       "email": "[email protected]" }
  },
  {
    "infos": [
      { "places": "1 freier Platz" }
    ],
    "applying email": "",
    "applying link": "www.exemple.com",
    "additional infos": {
       "name": "jhon", 
       "position": "RH", 
       "email": "[email protected]"
    }
  },] 

and I want to convert it into a CSV file (the objects contain other fields too, but I wanted to keep the example as simple as possible). The problem is that the file contains nested objects, and some fields inside the nested objects are missing from other objects in the list, as you can see in the infos field: the first item in the list has a localisation field under infos, but it is missing in the second object. What I want to achieve is that my CSV header contains all the fields (including the missing ones), and if a field is not found in a JSON object while appending its data, it should simply be skipped and that cell left empty.

3 Answers


  1. Python has a built-in module called csv with classes for reading and writing CSV files. For this case csv.DictWriter is the relevant one: you give it the complete list of column names as fieldnames and then pass it your dictionaries; any key that is missing from a particular dictionary is simply written as an empty cell.
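
    As a minimal sketch (assuming the nested objects have already been flattened into plain dictionaries; the rows list, the column names, and the out.csv filename are just placeholders), you can build the header from the union of all keys and let DictWriter leave blanks:

    import csv

    ## assumes the nested JSON has already been flattened into plain dicts
    rows = [
        {"applying email": "[email protected]", "infos > localisation": "Kassel"},
        {"applying email": "", "applying link": "www.exemple.com"},
    ]

    ## the header is the union of all keys seen across the rows
    field_names = []
    for row in rows:
        for key in row:
            if key not in field_names:
                field_names.append(key)

    with open("out.csv", "w", newline="") as csvfile:
        ## restval="" writes an empty cell whenever a row is missing a field
        writer = csv.DictWriter(csvfile, fieldnames=field_names, restval="")
        writer.writeheader()
        writer.writerows(rows)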

  2. My preferred way to save to CSV is DataFrame.to_csv, and pandas also has a json_normalize method, which might have been the shortest way to flatten this, but I’m not quite sure how it would handle the infos lists.

    However, if you parse the JSON into a list of Python dictionaries (pData)

    import json
    jData = '''
    [ { "infos": [{ "localisation": "Kassel" }, { "date": "01.08.2023" }, { "places": "1 freier Platz" }],
        "applying email": "[email protected]",
        "additional infos": {"name": "jhon", "position": "RH", "email": "[email protected]" }  },
      { "infos": [{ "places": "1 freier Platz" }],
        "applying email": "", "applying link": "www.exemple.com",
        "additional infos": {"name": "jhon", "position": "RH", "email": "[email protected]"} }] 
    ''' ## I removed the trailing comma because it was causing a parsing error
    
    pData = json.loads(jData) 
    

    then you can flatten it with a flattenObj function (view examples) before converting to a pandas DataFrame and saving as CSV.
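
    The linked flattenObj isn’t reproduced here, so here is a rough sketch of what such a flattener might look like (my own approximation, not the linked code): it joins nested keys with kSep and merges lists of single-key dicts such as infos:

    def flattenObj(obj, kSep=' > ', pKey=''):
        ## rough sketch of a recursive flattener; the linked original may differ
        flat = {}
        if isinstance(obj, dict):
            for k, v in obj.items():
                newKey = f'{pKey}{kSep}{k}' if pKey else k
                flat.update(flattenObj(v, kSep, newKey))
        elif isinstance(obj, list):
            for i, v in enumerate(obj):
                ## dict elements (like the "infos" entries) merge under the parent key;
                ## other elements get an index appended
                newKey = pKey if isinstance(v, dict) else f'{pKey}{kSep}{i}'
                flat.update(flattenObj(v, kSep, newKey))
        else:
            flat[pKey] = obj
        return flat

    With some flattener like that in hand, converting to a DataFrame and saving as CSV looks like: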

    import pandas as pd
    csv_path = 'x.csv' ## path/name of csv file to save to
    
    # pd.DataFrame([flattenObj(d, kSep=' > ') for d in pData]).to_csv(csv_path, index=False)
    df = pd.DataFrame([flattenObj(d, kSep=' > ') for d in pData])
    df.to_csv(csv_path, index=False)
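
    With the sample data above, the resulting columns would include the applying email/link, the infos entries, and the additional infos fields (under whatever column names flattenObj produces), and any field a record doesn’t have ends up as an empty cell in the CSV, since to_csv writes missing values as empty strings by default.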
    
  3. I ran into a similar issue on a recent project. I used the built-in csv module:

    import csv

    def generate_csv(records):
        ## the columns are fixed up front; DictWriter leaves a blank cell for any missing key
        field_names = ['Ticker', 'Quantity', 'Cost Per Share', 'Date']
        with open('positions.csv', 'w', newline='') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=field_names)
            writer.writeheader()
            writer.writerows(records)
    

    This assumes you already have your data in the format you need; in my case, a list of dictionaries that I pass into the function. It writes one dictionary at a time, placing each value in its corresponding column based on the key names defined in field_names.
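
    For example (made-up records, not my real data), a record that has no Date key just gets an empty cell, because DictWriter fills missing fields with an empty string by default:

    records = [
        {'Ticker': 'AAPL', 'Quantity': 10, 'Cost Per Share': 150.0, 'Date': '01.08.2023'},
        {'Ticker': 'MSFT', 'Quantity': 5, 'Cost Per Share': 300.0},  ## no 'Date' key -> blank cell
    ]
    generate_csv(records)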

    I’m brand new to this, so I’ve probably done this in a terribly inefficient way, but it worked for me.
