Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Converting JSON Dictionary to Dataframe

MatC
December 31, 2022
296 views
0 votes
2 Answers

I am relatively new to JSON and after going through several tutorials I have decided to reach out here for additional advice. I am trying to convert the following JSON dictionary via Jupyternotebook/Pandas into a dataframe and later into a csv file (data is extracted from a public API).

Example JSON dictionary without indentation:

json_search_result = 
{'inactive_count': 0, 'links': {'self': '/company/08892216/officers'}, 'etag': '1ccc4c74454bc359a9b9ef686d11e70cc03b6cb8', 'kind': 'officer-list', 'resigned_count': 0, 'total_results': 1, 'active_count': 1, 'items_per_page': 35, 'items': [{'date_of_birth': {'year': 1964, 'month': 2}, 'links': {'self': '/company/08892216/appointments/Ao1-pIsFQHy0FVSv26y6ep4Qhyw', 'officer': {'appointments': '/officers/JpJgO2pKVmGq5VHD7j7lluLW-jo/appointments'}}, 'officer_role': 'director', 'address': {'premises': '19', 'locality': 'London', 'country': 'United Kingdom', 'address_line_1': 'Norcott Road', 'postal_code': 'N16 7EJ'}, 'appointed_on': '2014-02-12', 'nationality': 'Irish', 'country_of_residence': 'England', 'name': 'FLOOD, Eileen', 'occupation': 'Company Director'}], 'start_index': 0}

With indentation for better visibility:

json_search_result = 
{'inactive_count': 0, 'links': {'self': '/company/08892216/officers'},
 'etag': '1ccc4c74454bc359a9b9ef686d11e70cc03b6cb8', 'kind': 'officer-list', 
'resigned_count': 0, 'total_results': 1, 'active_count': 1, 'items_per_page': 35, 'items': 
[{'date_of_birth': {'year': 1964, 'month': 2}, 'links': {'self': '/company/08892216/appointments/Ao1-pIsFQHy0FVSv26y6ep4Qhyw', 'officer': {'appointments': 
'/officers/JpJgO2pKVmGq5VHD7j7lluLW-jo/appointments'}}, 'officer_role': 'director', 'address': {'premises': '19', 'locality': 'London', 'country': 'United Kingdom', 
'address_line_1': 'Norcott Road', 'postal_code': 'N16 7EJ'},
 'appointed_on': '2014-02-12', 'nationality': 'Irish', 'country_of_residence': 'England', 
'name': 'FLOOD, Eileen', 'occupation': 'Company Director'}], 'start_index': 0}

I have tried among others json_normalize but I either get errors given this is a dictionary:

df2 =  pd.json_normalize(json_search_result)

Thank you for any advice on this issues!

Answers

Please try this if this helps your problem.
Good reference for further improvements: Convert list of dictionaries to a pandas DataFrame

json_search_result = {'inactive_count': 0, 'links': {'self': '/company/08892216/officers'}, 'etag': '1ccc4c74454bc359a9b9ef686d11e70cc03b6cb8', 'kind': 'officer-list', 'resigned_count': 0, 'total_results': 1, 'active_count': 1, 'items_per_page': 35, 'items': [{'date_of_birth': {'year': 1964, 'month': 2}, 'links': {'self': '/company/08892216/appointments/Ao1-pIsFQHy0FVSv26y6ep4Qhyw', 'officer': {'appointments': '/officers/JpJgO2pKVmGq5VHD7j7lluLW-jo/appointments'}}, 'officer_role': 'director', 'address': {'premises': '19', 'locality': 'London', 'country': 'United Kingdom', 'address_line_1': 'Norcott Road', 'postal_code': 'N16 7EJ'}, 'appointed_on': '2014-02-12', 'nationality': 'Irish', 'country_of_residence': 'England', 'name': 'FLOOD, Eileen', 'occupation': 'Company Director'}], 'start_index': 0}

pprint(json_search_result)

out = pd.DataFrame.from_dict(json_search_result, orient='index').T
print(out)
print(out.columns)

I assume that you will also want to flatten your dataframe since it contains additional nested information. So start by transforming you json string to a dataframe:

json_search_result = {'inactive_count': 0, 'links': {'self': '/company/08892216/officers'}, 'etag': '1ccc4c74454bc359a9b9ef686d11e70cc03b6cb8', 'kind': 'officer-list', 'resigned_count': 0, 'total_results': 1, 'active_count': 1, 'items_per_page': 35, 'items': [{'date_of_birth': {'year': 1964, 'month': 2}, 'links': {'self': '/company/08892216/appointments/Ao1-pIsFQHy0FVSv26y6ep4Qhyw', 'officer': {'appointments': '/officers/JpJgO2pKVmGq5VHD7j7lluLW-jo/appointments'}}, 'officer_role': 'director', 'address': {'premises': '19', 'locality': 'London', 'country': 'United Kingdom', 'address_line_1': 'Norcott Road', 'postal_code': 'N16 7EJ'}, 'appointed_on': '2014-02-12', 'nationality': 'Irish', 'country_of_residence': 'England', 'name': 'FLOOD, Eileen', 'occupation': 'Company Director'}], 'start_index': 0}
A = pd.DataFrame.from_dict(json_search_result, orient='index').T

which gives:

inactive_count                                   links  
0              0  {'self': '/company/08892216/officers'}   

                                       etag          kind resigned_count  
0  1ccc4c74454bc359a9b9ef686d11e70cc03b6cb8  officer-list              0   

  total_results active_count items_per_page  
0             1            1             35   

                                               items start_index  
0  [{'date_of_birth': {'year': 1964, 'month': 2},...           0

Now, this function will flatten ANY dataframe with nested json strings:


def flatten_nested_json_df(df):
    df = df.reset_index()
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()

    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            exploded.index = df.index
            df = pd.concat([df, exploded], axis=1).drop(columns=[col])
            new_columns.extend(exploded.columns)  # inplace

        for col in list_columns:
            # print(f"exploding: {col}")
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df

Applying it:

flatten_nested_json_df(A)

will result in:

  index inactive_count                                      etag  
0      0              0  1ccc4c74454bc359a9b9ef686d11e70cc03b6cb8   

           kind resigned_count total_results active_count items_per_page  
0  officer-list              0             1            1             35   

  start_index                  links.self  ...  items.occupation  
0           0  /company/08892216/officers  ...  Company Director   

  items.date_of_birth.year items.date_of_birth.month  
0                     1964                         2   

                                    items.links.self  
0  /company/08892216/appointments/Ao1-pIsFQHy0FVS...   

                    items.links.officer.appointments items.address.premises  
0  /officers/JpJgO2pKVmGq5VHD7j7lluLW-jo/appointm...                     19   

   items.address.locality  items.address.country items.address.address_line_1  
0                  London         United Kingdom                 Norcott Road   

  items.address.postal_code  
0                   N16 7EJ

As you can see, the nested fields have been made to columns:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 25 columns):
 #   Column                            Non-Null Count  Dtype 
---  ------                            --------------  ----- 
 0   index                             1 non-null      int64 
 1   inactive_count                    1 non-null      object
 2   etag                              1 non-null      object
 3   kind                              1 non-null      object
 4   resigned_count                    1 non-null      object
 5   total_results                     1 non-null      object
 6   active_count                      1 non-null      object
 7   items_per_page                    1 non-null      object
 8   start_index                       1 non-null      object
 9   links.self                        1 non-null      object
 10  items.officer_role                1 non-null      object
 11  items.appointed_on                1 non-null      object
 12  items.nationality                 1 non-null      object
 13  items.country_of_residence        1 non-null      object
 14  items.name                        1 non-null      object
 15  items.occupation                  1 non-null      object
 16  items.date_of_birth.year          1 non-null      int64 
 17  items.date_of_birth.month         1 non-null      int64 
 18  items.links.self                  1 non-null      object
 19  items.links.officer.appointments  1 non-null      object
 20  items.address.premises            1 non-null      object
 21  items.address.locality            1 non-null      object
 22  items.address.country             1 non-null      object
 23  items.address.address_line_1      1 non-null      object
 24  items.address.postal_code         1 non-null      object
dtypes: int64(3), object(22)
memory usage: 328.0+ bytes
None

Please signup or login to give your own answer.

Click here to cancel reply.