skip to Main Content

I have a Pandas DataFrame with this column:
This is an extraction from a database in Mongo but I don’t know how to handle a column containing both [] and {}:

Problem

How can split this column into two columns?

Desired result:

Desired result

Thanks for your help !

4

Answers


  1. You can create a list of dictionaries (instead of a list of lists with dictionaries), then a dataframe and join this to the original df.

    import pandas as pd
    
    data = {"coeff":[[{"value": 0.0641, "year":2000}],
                     [{"value": 0.0641, "year":2000}],
                     [{"value": 0.0641, "year":2000}],
                     [{"value": 0.0652, "year":2005}]]}
    
    df = pd.DataFrame(data)
    #                                coeff
    # 0  [{'value': 0.0641, 'year': 2000}]
    # 1  [{'value': 0.0641, 'year': 2000}]
    # 2  [{'value': 0.0641, 'year': 2000}]
    # 3  [{'value': 0.0652, 'year': 2005}]
    
    df = df.join(pd.DataFrame([x[0] for x in df.coeff]))
    #                                coeff   value  year
    # 0  [{'value': 0.0641, 'year': 2000}]  0.0641  2000
    # 1  [{'value': 0.0641, 'year': 2000}]  0.0641  2000
    # 2  [{'value': 0.0641, 'year': 2000}]  0.0641  2000
    # 3  [{'value': 0.0652, 'year': 2005}]  0.0652  2005
    
    Login or Signup to reply.
  2. pandas has a function to construct DF from dictionaries

    import pandas as pd

    my_data = {"coeff":[[{"value": 0.0641, "year":2000}],
                     [{"value": 0.0641, "year":2000}],
                     [{"value": 0.0641, "year":2000}],
                     [{"value": 0.0652, "year":2005}]]
               }
    
    df = pd.DataFrame(my_data)
    
    df2 = pd.DataFrame.from_records(d[0] for d in df['coeff'])
    
    print(df2)
    

    gives:

        value  year
    0  0.0641  2000
    1  0.0641  2000
    2  0.0641  2000
    3  0.0652  2005
    
    Login or Signup to reply.
  3. Combine explode and json_normalize:

    out = pd.json_normalize(df['coeff'].explode())
    

    Or, if you have only one dictionary per list:

    out = pd.json_normalize(df['coeff'].str[0])
    

    Or usig from_records:

    out = pd.DataFrame.from_records(df['coeff'].str[0])
    

    Output:

        value  year
    0  0.0641  2000
    1  0.0641  2000
    2  0.0641  2000
    3  0.0652  2005
    
    Login or Signup to reply.
  4. Create a df out of your base data:

    data = {"coeff":[[{"value": 0.0641, "year":2000}],
                 [{"value": 0.0641, "year":2000}],
                 [{"value": 0.0641, "year":2000}],
                 [{"value": 0.0652, "year":2005}]]}
    

    enter image description here

    Each element in the df is a dictionary within a list. Isolate the dictionary within the list using the apply method and a lambda function to access the first element in the list (the dictionary).

    Use .values() to retrieve the dictionary values (year and value) which will exist as an object with dtype dict_values.

    The dytpe of dict_values is pretty limiting so wrap it in a list function to convert to a list so you can use slicing and inxdexing:

    df2 = df.coeff.apply(lambda x: list(x[0].values())) 
    

    enter image description here

    Use the apply method with a lambda function and index positions to retrieve the years and values respectively, assign these to their respective column names within a dictionary and pass this as an arguement into the pd.DataFrame class to create a new dataframe:

    pd.DataFrame(data = {'year': df2.apply(lambda y: y[1]),
                     'value':df2.apply(lambda y: y[0])})
    

    enter image description here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search