
Here is my pd.DataFrame with its metadata column:

      date           metadata
0     2022-12-03     [{'key': 'key1', 'value': 'value0.1'}, {'key': 'key2', 'value': 'value0.2'}, {'key': 'key3', 'value': 'value0.3'}]
1     2022-12-07     [{'key': 'key1', 'value': 'value1.1'}, {'key': 'key2', 'value': 'value1.2'}, {'key': 'key3', 'value': 'value1.3'}]
2     2022-12-02     [{'key': 'key1', 'value': 'value2.1'}, {'key': 'key2', 'value': 'value2.2'}, {'key': 'key3', 'value': 'value2.3'}]
3     2022-12-01     [{'key': 'key1', 'value': 'value3.1'}, {'key': 'key2', 'value': 'value3.2'}, {'key': 'key3', 'value': 'value3.3'}]

What can I do so it becomes:

      date           key1         key2         key3
0     2022-12-03     value0.1     value0.2     value0.3
1     2022-12-07     value1.1     value1.2     value1.3
2     2022-12-02     value2.1     value2.2     value2.3
3     2022-12-01     value3.1     value3.2     value3.3

Edit:

I don't know the names of the keys in advance, nor how many there are.
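
For anyone who wants to reproduce this, here is a minimal sketch that builds a frame like the one above. The value strings are just the placeholders from the table, and keeping the dates as plain strings is an assumption:

```python
import pandas as pd

# Placeholder data matching the table above: each cell of "metadata"
# holds a list of {"key": ..., "value": ...} dicts.
dates = ["2022-12-03", "2022-12-07", "2022-12-02", "2022-12-01"]  # dtype assumed
df = pd.DataFrame({
    "date": dates,
    "metadata": [
        [{"key": f"key{k}", "value": f"value{i}.{k}"} for k in (1, 2, 3)]
        for i in range(len(dates))
    ],
})
print(df)
```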

3 Answers


  1. Use a list comprehension with a nested dict comprehension to extract the keys and values of the dictionaries, pass the result to the DataFrame constructor and join it to the original DataFrame. DataFrame.pop removes the metadata column after processing:

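    import ast
    import pandas as pd
    
    # only needed if the column holds strings instead of lists of dicts
    #df['metadata'] = df['metadata'].apply(ast.literal_eval)
    
    # one dict per row ({key: value, ...}); pop removes the column from df
    df1 = pd.DataFrame([{y['key']:y['value'] for y in x} for x in df.pop('metadata')],
                       index=df.index)
    df = df.join(df1)
    print(df)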
             date      key1      key2      key3
    0  2022-12-03  value0.1  value0.2  value0.3
    1  2022-12-07  value1.1  value1.2  value1.3
    2  2022-12-02  value2.1  value2.2  value2.3
    3  2022-12-01  value3.1  value3.2  value3.3
    

    If each dictionary always holds exactly two values, the key first and the value second, use:

    df1 = pd.DataFrame([dict(y.values() for y in x) for x in df.pop('metadata')], 
                       index=df.index)
    df = df.join(df1)
    print (df)
             date      key1      key2      key3
    0  2022-12-03  value0.1  value0.2  value0.3
    1  2022-12-07  value1.1  value1.2  value1.3
    2  2022-12-02  value2.1  value2.2  value2.3
    3  2022-12-01  value3.1  value3.2  value3.3
    

    EDIT: Final solution. Passing index=df.index is not necessary if df has the default RangeIndex:

    df = df.join(pd.DataFrame([{y['key']: y['value'] for y in x} for x in df.pop('metadata')]))
    
  2. You can also, although this is not very efficient, build a DataFrame from the metadata in each row and then use pivot for the long-to-wide transformation:

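    df_list = []
    for i in range(len(df.index)):
        # one small frame per row, with columns "key" and "value"
        d = pd.DataFrame(df["metadata"].iloc[i])
        d["ID"] = df.index[i]  # remember which row the entries came from
        df_list.append(d)
    d_all = pd.concat(df_list)
    # long to wide: one row per ID, one column per key
    wide = d_all.pivot(index="ID", columns="key", values="value")
    # attach the new columns to the original frame
    df = df.drop(columns="metadata").join(wide)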
    
  3. Explode the metadata column so each dict gets its own row, turn the dicts into columns with json_normalize, then pivot:

    s = df.pop('metadata').explode()
    
    df = df.join(pd.json_normalize(s).set_index(s.index)
                   .pivot(columns='key', values='value'))
    

    Output:

             date      key1      key2      key3
    0  2022-12-03  value0.1  value0.2  value0.3
    1  2022-12-07  value1.1  value1.2  value1.3
    2  2022-12-02  value2.1  value2.2  value2.3
    3  2022-12-01  value3.1  value3.2  value3.3
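
Since the question's edit says the key names and their number are not known in advance, here is a minimal sketch of what the dict-comprehension approach from answer 1 does when rows carry different keys. The rows are made up for illustration and are not the asker's data. The resulting columns are the union of all keys seen, and a key missing from a row ends up as NaN:

```python
import pandas as pd

# Made-up rows for illustration: the second row lacks "key2" and adds "key4".
df = pd.DataFrame({
    "date": ["2022-12-03", "2022-12-07"],
    "metadata": [
        [{"key": "key1", "value": "a"}, {"key": "key2", "value": "b"}],
        [{"key": "key1", "value": "c"}, {"key": "key4", "value": "d"}],
    ],
})

# One plain dict per row; pandas aligns the dicts on their keys.
rows = [{item["key"]: item["value"] for item in items}
        for items in df.pop("metadata")]
df = df.join(pd.DataFrame(rows, index=df.index))
print(df)
```

No key names are hard-coded anywhere, so the same lines should work for any number of keys per row.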
    