skip to Main Content

I want to learn Pandas framework, so I find free csv with Euro data from kaggle.com
https://www.kaggle.com/datasets/piterfm/football-soccer-uefa-euro-1960-2024/data
But there’re plenty of columns which looks like this

subset['goals'][1]

"[{'phase': 'FIRST_HALF', 'time': {'minute': 7, 'second': 41}, 'international_name': 'Xavi Simons', 'club_shirt_name': 'Xavi', 'country_code': 'NED', 'national_field_position': 'FORWARD', 'national_jersey_number': '7', 'goal_type': 'SCORED'}, {'phase': 'FIRST_HALF', 'time': {'minute': 18, 'second': 34}, 'international_name': 'Harry Kane', 'club_shirt_name': 'Kane', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '9', 'goal_type': 'PENALTY'}, {'phase': 'SECOND_HALF', 'time': {'injuryMinute': 1, 'minute': 90, 'second': 1}, 'international_name': 'Ollie Watkins', 'club_shirt_name': 'Watkins', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '19', 'goal_type': 'SCORED'}]"

So I’d like to extract this data and manipulate on it
example dataframe

I’ve traied to use this code

import json
stdf = subset['goals'].apply(json.loads)
# stlst = list(stdf)
# stjson = json.dumps(stlst)
# subset.join(pandas.read_json(stjson))

But for stdf = subset['goals'].apply(json.loads) I’m getting the error message
the JSON object must be str, bytes or bytearray, not float

So, I don’t know how to solve this problem.

I guess I have to iterate over the goal column, I’ve tried something, but still the results were not what they should have been.

3

Answers


  1. Chosen as BEST ANSWER

    @Daweo - ok it's helpful but It's not working with NaN values. I've tried iterate over every row but it returned me ValueError.

    I will show it in a different perspective.

    this is my dataframe

    data = [[2036211,'Spain','England','','',''], [2036210,'Netherlands','England',1.0,2.0,"[{'phase': 'FIRST_HALF', 'time': {'minute': 7, 'second': 41}, 'international_name': 'Xavi Simons', 'club_shirt_name': 'Xavi', 'country_code': 'NED', 'national_field_position': 'FORWARD', 'national_jersey_number': '7', 'goal_type': 'SCORED'}, {'phase': 'FIRST_HALF', 'time': {'minute': 18, 'second': 34}, 'international_name': 'Harry Kane', 'club_shirt_name': 'Kane', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '9', 'goal_type': 'PENALTY'}, {'phase': 'SECOND_HALF', 'time': {'injuryMinute': 1, 'minute': 90, 'second': 1}, 'international_name': 'Ollie Watkins', 'club_shirt_name': 'Watkins', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '19', 'goal_type': 'SCORED'}]"]]
    
    df = pd.DataFrame(data, columns=['id_match' ,'home_team' ,'away_team' ,'home_score','away_score','goals'])
    

    How to extract goals column to have dataframe with column id_match, home_team, away_team, home_score, away_score, goals_phase, goals_time etc.

    If there are more than one goal, It returns many rows with the same id_match, home_team, away_team, home_score, away_score columns and unique from goals column.


  2. This

    "[{'phase': 'FIRST_HALF', 'time': {'minute': 7, 'second': 41}, 'international_name': 'Xavi Simons', 'club_shirt_name': 'Xavi', 'country_code': 'NED', 'national_field_position': 'FORWARD', 'national_jersey_number': '7', 'goal_type': 'SCORED'}, {'phase': 'FIRST_HALF', 'time': {'minute': 18, 'second': 34}, 'international_name': 'Harry Kane', 'club_shirt_name': 'Kane', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '9', 'goal_type': 'PENALTY'}, {'phase': 'SECOND_HALF', 'time': {'injuryMinute': 1, 'minute': 90, 'second': 1}, 'international_name': 'Ollie Watkins', 'club_shirt_name': 'Watkins', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '19', 'goal_type': 'SCORED'}]"
    

    is not JSON, but is valid python list and thus can be loaded using ast.literal_eval that is

    import ast
    import pandas as pd
    data_string = "[{'phase': 'FIRST_HALF', 'time': {'minute': 7, 'second': 41}, 'international_name': 'Xavi Simons', 'club_shirt_name': 'Xavi', 'country_code': 'NED', 'national_field_position': 'FORWARD', 'national_jersey_number': '7', 'goal_type': 'SCORED'}, {'phase': 'FIRST_HALF', 'time': {'minute': 18, 'second': 34}, 'international_name': 'Harry Kane', 'club_shirt_name': 'Kane', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '9', 'goal_type': 'PENALTY'}, {'phase': 'SECOND_HALF', 'time': {'injuryMinute': 1, 'minute': 90, 'second': 1}, 'international_name': 'Ollie Watkins', 'club_shirt_name': 'Watkins', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '19', 'goal_type': 'SCORED'}]"
    data = ast.literal_eval(data_string)
    df = pd.DataFrame(data)
    print(df)
    

    gives output

             phase  ... goal_type
    0   FIRST_HALF  ...    SCORED
    1   FIRST_HALF  ...   PENALTY
    2  SECOND_HALF  ...    SCORED
    
    [3 rows x 8 columns]
    
    Login or Signup to reply.
  3. So what you’ll need to do is first convert those string values into a dictionary. Then use explode to convert each if those values of the keys into rows. Then ultimately use json_normalize

    import pandas as pd
    import ast
    
    file = 'C:/Users/SchvaJ01/Downloads/2024.csv/2024.csv'
    df = pd.read_csv(file)
    
    
    # Convert string representations of lists to actual lists
    df['goals'] = df['goals'].fillna('[]').apply(ast.literal_eval)
    
    # Explode the list of dictionaries into separate rows
    df_exploded = df.explode('goals')
    
    # Normalize the nested dictionaries
    df_flattened = pd.json_normalize(df_exploded['goals'])
    
    
    print(df_flattened.head(5).to_string())
    

    Output:

    print(df_flattened.head(5).to_string())
             phase international_name   club_shirt_name country_code national_field_position national_jersey_number goal_type  time.minute  time.second  time.injuryMinute
    0  SECOND_HALF      Nico Williams  WILLIAMS ARTHUER          ESP                 FORWARD                     17    SCORED         47.0         10.0                NaN
    1  SECOND_HALF        Cole Palmer            Palmer          ENG              MIDFIELDER                     24    SCORED         73.0          9.0                NaN
    2  SECOND_HALF    Mikel Oyarzabal         Oyarzabal          ESP                 FORWARD                     21    SCORED         86.0         56.0                NaN
    3   FIRST_HALF        Xavi Simons              Xavi          NED                 FORWARD                      7    SCORED          7.0         41.0                NaN
    4   FIRST_HALF         Harry Kane              Kane          ENG                 FORWARD                      9   PENALTY         18.0         34.0                NaN
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search