
I am using the Facebook API (v2.10), from which I’ve extracted the data I need, 95% of which is perfect. My problem is the ‘actions’ metric, which is returned as a list of dictionaries nested inside another dictionary.

At present, all the data is in a DataFrame; however, the ‘actions’ column holds a list of dictionaries, one for each individual action on that day.

{
    "actions": [
        {
            "action_type": "offsite_conversion.custom.xxxxxxxxxxx",
            "value": "7"
        },
        {
            "action_type": "offsite_conversion.custom.xxxxxxxxxxx",
            "value": "3"
        },
        {
            "action_type": "offsite_conversion.custom.xxxxxxxxxxx",
            "value": "144"
        },
        {
            "action_type": "offsite_conversion.custom.xxxxxxxxxxx",
            "value": "34"
        }]}

All this appears in one cell (row) within the DataFrame.

What is the best way to:

  • Get the action type, create a new column and use the “action_type” as the column name?
  • List the correct value under this column

It looks like JSON, but when I check the type, it’s a pandas Series (stored with dtype object).

For those willing to help (thank you, I greatly appreciate it): can you either point me towards the right material, so I can read it and work it out on my own (I’m not entirely sure what to look for), or, if you decide this is an easy problem, explain how and why you solved it this way? I don’t just want the answer.

I have tried the following (with help from a friend) and it kind of works, but I have issues running it in my script, i.e. if it runs within a bigger code block, I get the error shown further down:

for i in range(df.shape[0]):
    line = df.loc[i, 'Conversions']
    L = ast.literal_eval(line)
    for l in L:
        cid = l['action_type']
        value = l['value']
        df.loc[i, cid] = value

If I save the DataFrame as a CSV and load it back with pd.read_csv, the loop executes properly, but it fails when run directly within the script. I have no idea why.

Error:

ValueError: malformed node or string: [{'value': '1', 'action_type': 'offsite_conversion.custom.xxxxx}]
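A likely explanation, assuming the column comes straight from the parsed API response: each cell already holds a Python list, and `ast.literal_eval` only accepts a string (or an AST node), so it raises this `ValueError`. After a CSV round trip the cell is the text representation of that list, which is why the loop works there. The sketch below handles both cases; `parse_actions` and the sample data are hypothetical names made up for illustration.

```python
import ast

import pandas as pd


def parse_actions(cell):
    """Return a list of action dicts whether `cell` is a list or its string form."""
    if isinstance(cell, str):
        # After a CSV round trip the cell is text such as "[{'action_type': ...}]".
        return ast.literal_eval(cell)
    if isinstance(cell, list):
        # Straight from the parsed API response the cell is already a list.
        return cell
    # Anything else (e.g. NaN or None for a day without actions) becomes an empty list.
    return []


# Hypothetical sample: the first cell is a real list, the second is its string form.
df = pd.DataFrame({
    "Conversions": [
        [{"action_type": "type_a", "value": "7"}],
        "[{'action_type': 'type_b', 'value': '3'}]",
    ]
})

parsed = df["Conversions"].apply(parse_actions)
```

Checking `type(df.loc[0, 'Conversions'])` in the failing script should show which of the two cases applies.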

Any help would be greatly appreciated.

Thanks,
Adrian

2 Answers


  1. You can use json_normalize:

    In [11]: d  # e.g. dict from json.load OR instead pass the json path to json_normalize
    Out[11]:
    {'actions': [{'action_type': 'offsite_conversion.custom.xxxxxxxxxxx',
       'value': '7'},
      {'action_type': 'offsite_conversion.custom.xxxxxxxxxxx', 'value': '3'},
      {'action_type': 'offsite_conversion.custom.xxxxxxxxxxx', 'value': '144'},
      {'action_type': 'offsite_conversion.custom.xxxxxxxxxxx', 'value': '34'}]}
    
    In [12]: pd.io.json.json_normalize(d, record_path="actions")
    Out[12]:
                                 action_type value
    0  offsite_conversion.custom.xxxxxxxxxxx     7
    1  offsite_conversion.custom.xxxxxxxxxxx     3
    2  offsite_conversion.custom.xxxxxxxxxxx   144
    3  offsite_conversion.custom.xxxxxxxxxxx    34
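A short sketch of the same idea for several days at once. In pandas 1.0 and later the function is available at the top level as `pd.json_normalize`. The `meta` argument copies a parent field onto every flattened row; the `date_start` field and the `type_a`/`type_b` action names here are placeholders, not values from the question.

```python
import pandas as pd

# Hypothetical records shaped like the API response, one dict per day.
records = [
    {
        "date_start": "2017-09-01",
        "actions": [
            {"action_type": "type_a", "value": "7"},
            {"action_type": "type_b", "value": "3"},
        ],
    },
    {
        "date_start": "2017-09-02",
        "actions": [
            {"action_type": "type_a", "value": "144"},
        ],
    },
]

# One row per action; 'date_start' is repeated for each action of that day.
long_df = pd.json_normalize(records, record_path="actions", meta=["date_start"])
```

This gives a long table (one row per action), which still needs a pivot step to get one column per action type.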
    
  2. You can use df.join(pd.DataFrame(df['Conversions'].tolist()).pivot(columns='action_type', values='value').reset_index(drop=True)).

    Explanation:
    df['Conversions'].tolist() returns the column's cells as a Python list, which is a list of dictionaries when each cell holds a single dictionary. This list is then turned into a DataFrame using pd.DataFrame. Finally, the pivot function reshapes the table so that each action_type becomes a column holding its value.

    Lastly, you can join the table with your original DataFrame. Note that this only works if your DataFrame’s index is the default (i.e., integers starting from 0). If this is not the case, you can do this instead:

    df2 = pd.DataFrame(df['Conversions'].tolist()).pivot(columns='action_type', values='value').reset_index(drop=True)
    for col in df2.columns:
        # Assign by position; a plain df2[col] would be aligned on index labels.
        df[col] = df2[col].values
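When each cell holds a list of several dictionaries, as in the question, a variant of the same pivot idea is to explode the lists first. This is a sketch under assumptions, not the answerer's method: it needs pandas 0.25 or later for `Series.explode`, the `date` column and action names are invented sample data, and repeated action types within one row are summed.

```python
import pandas as pd

# Hypothetical sample: each 'Conversions' cell is a list of action dicts.
df = pd.DataFrame({
    "date": ["2017-09-01", "2017-09-02"],
    "Conversions": [
        [{"action_type": "type_a", "value": "7"},
         {"action_type": "type_b", "value": "3"}],
        [{"action_type": "type_a", "value": "144"}],
    ],
})

# One row per action; explode() repeats the original index label for each element.
exploded = df["Conversions"].explode().dropna()

# Turn each dict into a row with 'action_type' and 'value', keeping the index labels.
actions = pd.DataFrame(exploded.tolist(), index=exploded.index)
actions["value"] = pd.to_numeric(actions["value"])

# One column per action_type, one row per original index label.
wide = (
    actions.rename_axis("row")
    .reset_index()
    .pivot_table(index="row", columns="action_type", values="value", aggfunc="sum")
)
wide.columns.name = None

# The join aligns on index labels, so a non-default index on df is fine.
result = df.join(wide)
```

Days that lack a given action type end up with NaN in that column.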
