skip to Main Content

I am unfamiliar with how to convert Json objects to Dataframe. I am hoping to get some ideas of more efficient ways to convert Json object to data frame.

As I loop through 2 rows of my input_df, I will get 2 json objects when I print(suggestion_string) as follow:

enter image description here

My expected output would be a data frame as follow:

suggestion1 reason1 suggestion2 reason2 suggestion3 reason3
free text free text free text free text free text free text
free text free text free text free text free text free text

I had attempted with the codes as follows, and it worked. I am hoping to get some ideas of more efficient ways to convert json object to data frame.

suggestion_list = []

for i in range(len(input_df)):

    description = input_df.loc[i, 'description']
    
    query = fr"""
    sample text ... {description}?
    """
    
    suggestion_string = return_json_object(query)
    string = suggestion_string.replace("`",'')
    string = string.replace('json', '')
    str_list = string.split('n')
    dict_str = ''.join(str_list)
    output_dict = json.loads(dict_str)
    suggestion_list.append(output_dict)

lists_of_dicts = []
for dict in suggestion_list:
    list_of_dicts.append(dict['suggestions'])

flat_data = []
for sublist in list_of_dicts:
    row_data = {}
    for i, item in enumerate(sublist):
        row_data[f'suggestion{i+1}'] = item['suggestion']
        row_data[f'reason{i+1}'] = item['reason']
    flat_data.append(row_data)

suggestion_df = pd.DataFrame(flat_data)

Thank you for your time!

2

Answers


  1. Here is a function and a sample json file to test it on. This function is applicable to any type of json, with or without nested fields.

    data = [
        {
            "id": 1,
            "name": "John Doe",
            "email": "[email protected]",
            "location": {
                "city": "New York",
                "country": "USA"
            },
            "skills": ["Python", "Data Analysis"]
        },
        {
            "id": 2,
            "name": "Jane Doe",
            "email": "[email protected]",
            "location": {
                "city": "San Francisco",
                "country": "USA"
            },
            "skills": ["JavaScript", "React"]
        }
    ]
    
    df = pd.DataFrame(data)
    
    def flatten_nested_json_df(df):
        df = df.reset_index()
        s = (df.applymap(type) == list).all()
        list_columns = s[s].index.tolist()
        
        s = (df.applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    
        while len(list_columns) > 0 or len(dict_columns) > 0:
            new_columns = []
    
            for col in dict_columns:
                horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
                horiz_exploded.index = df.index
                df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
                new_columns.extend(horiz_exploded.columns) # inplace
    
            for col in list_columns:
                df = df.drop(columns=[col]).join(df[col].explode().to_frame())
                new_columns.append(col)
    
            s = (df[new_columns].applymap(type) == list).all()
            list_columns = s[s].index.tolist()
    
            s = (df[new_columns].applymap(type) == dict).all()
            dict_columns = s[s].index.tolist()
        return df
    

    Apllying it

    flatten_nested_json_df(df)
    

    retunrs

       index  id      name             email  location.city location.country  
    0      0   1  John Doe  [email protected]       New York              USA   
    0      0   1  John Doe  [email protected]       New York              USA   
    1      1   2  Jane Doe  [email protected]  San Francisco              USA   
    1      1   2  Jane Doe  [email protected]  San Francisco              USA   
    
              skills  
    0         Python  
    0  Data Analysis  
    1     JavaScript  
    1          React
    
    Login or Signup to reply.
  2. I have created dummy suggestion strings – but you can continue your for loop by modifying the following code and incorporating your input_df logic. I have hardcoded the suggestions strings for simplicity.

    Code:

    import pandas as pd
    import json
    
    suggestion_string_1 = '''
    ```
    json
    {
        "suggestions": [
            {"suggestion": "free text", "reason": "free text"},
            {"suggestion": "free text", "reason": "free text"},
            {"suggestion": "free text", "reason": "free text"}
        ]
    }
    ```
    '''
    
    suggestion_string_2 = '''
    ```
    json
    {
        "suggestions": [
            {"suggestion": "free text", "reason": "free text"},
            {"suggestion": "free text", "reason": "free text"},
            {"suggestion": "free text", "reason": "free text"}
        ]
    }
    ```
    '''
    
    suggestion_list = []
    input_df = [suggestion_string_1, suggestion_string_2]
    
    for i in range(len(input_df)):
        # description = input_df.loc[i, 'description']
        #
        # query = fr"""
        # sample text ... {description}?
        # """
        #
        # suggestion_string = return_json_object(query)
    
        # I have hard-coded the suggestion_string for brevity - 
        # Assuming you are getting the same strings that I have 
        # hard-coded as per the screenshot shared by you.
        suggestion_string = input_df[i] # So, you can just replace this line with your logic for getting the suggestion string.
        suggestions = json.loads(
            suggestion_string[suggestion_string.find('{'):suggestion_string.rfind('}') + 1]
        )['suggestions']
        suggestion_list.append(suggestions)
    
    suggestion_df = pd.DataFrame()
    
    for idx, suggestions in enumerate(suggestion_list):
        for i, suggestion in enumerate(suggestions):
            suggestion_df.loc[idx, f'suggestion{i+1}'] = suggestion['suggestion']
            suggestion_df.loc[idx, f'reason{i+1}'] = suggestion['reason']
            
    print(suggestion_df)
    

    Output:

    suggestion1 reason1 suggestion2 reason2 suggestion3 reason3
    free text free text free text free text free text free text
    free text free text free text free text free text free text
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search