skip to Main Content

Hello Stackoverflowers!

I have a DataFrame, obtained from Facebook Marketing API, and I’d like to unnest a column into several rows.

This is the a sample of the data I obtained through the API:

ad_name      video_play_curve_actions
ad_1         [{'action_type': 'video_view', 'value': [100, 40, 16, 10, 7, 5, 4, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}]
ad_2         [{'action_type': 'video_view', 'value': [100, 51, 22, 13, 9, 7, 6, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}]

What I am looking for, is an outcome that looks like illustrated below

enter image description here

I’d need a for loop for this because the solution needs to work for many 100 rows.

I’ve added example data and desired output to this sheet: https://docs.google.com/spreadsheets/d/1jjbtJlfBNZV_wyyAoPY_scyn_jCNFD04XO1-JsztKAg/edit?usp=sharing

Really hope someone here can help me out.

Thanks in advance

Edit:

Thank you so much. Seems like there are multiple ways to fix it, but all the solutions included: pandas.explode: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html

Will definitely use it going forward.

Happy Wednesday

3

Answers


  1. You’re looking for pandas.explode: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html, plus some necessary preprocessing for your video_play_curve_actions column.

    import pandas as pd
    import argparse
    
    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--filepath')
        args = parser.parse_args()
    
        df = pd.read_csv(args.filepath)
        df['video_play_curve_actions'] = df['video_play_curve_actions'].apply(lambda x: eval(x)[0]['value'])
        df_exploded = df.explode('video_play_curve_actions').rename(columns={'video_play_curve_actions': 'value'})
        print(df_exploded.head())
    

    output:

      ad_name value
    0    ad_1   100
    0    ad_1    40
    0    ad_1    16
    0    ad_1    10
    0    ad_1     7
    

    Note that I used eval here to process the values in video_play_curve_actions which isn’t always considered best practice. If the input contained double quotes " instead of single quotes ' we could have used json.loads instead.

    Login or Signup to reply.
  2. One way would be to grab the values of your video_play_curve_actions using str.split(), and then explode():

    (
      df.set_index('ad_name')
            .video_play_curve_actions.str.split('[').str[-1].str[:-3]
                .str.split(',').explode().str.strip().reset_index()
                )
    

    prints:

        ad_name video_play_curve_actions
    0      ad_1                      100
    1      ad_1                       40
    2      ad_1                       16
    3      ad_1                       10
    4      ad_1                        7
    ..      ...                      ...
    105    ad_5                        0
    106    ad_5                        0
    107    ad_5                        0
    108    ad_5                        0
    109    ad_5                        0
    
    Login or Signup to reply.
  3. Use ast.literal_eval to convert the strings to a python data structure (list of dict here) then explode and extract ‘value’ key:

    import ast
    
    out = df[['ad_name']].join(
              df['video_play_curve_actions'].apply(ast.literal_eval).explode()
                                            .apply(lambda x: x['value']).explode()
          ).reset_index(drop=True)
    

    Output:

    >>> out
        ad_name video_play_curve_actions
    0      ad_1                      100
    1      ad_1                       40
    2      ad_1                       16
    3      ad_1                       10
    4      ad_1                        7
    ..      ...                      ...
    105    ad_5                        0
    106    ad_5                        0
    107    ad_5                        0
    108    ad_5                        0
    109    ad_5                        0
    
    [110 rows x 2 columns]
    

    Note: The best way is probably to use directly the response from Facebook Marketing API rather than load data from an excel file.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search