I want to learn Pandas, so I found a free CSV with UEFA Euro data on kaggle.com:
https://www.kaggle.com/datasets/piterfm/football-soccer-uefa-euro-1960-2024/data
But there are plenty of columns which look like this:
subset['goals'][1]
"[{'phase': 'FIRST_HALF', 'time': {'minute': 7, 'second': 41}, 'international_name': 'Xavi Simons', 'club_shirt_name': 'Xavi', 'country_code': 'NED', 'national_field_position': 'FORWARD', 'national_jersey_number': '7', 'goal_type': 'SCORED'}, {'phase': 'FIRST_HALF', 'time': {'minute': 18, 'second': 34}, 'international_name': 'Harry Kane', 'club_shirt_name': 'Kane', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '9', 'goal_type': 'PENALTY'}, {'phase': 'SECOND_HALF', 'time': {'injuryMinute': 1, 'minute': 90, 'second': 1}, 'international_name': 'Ollie Watkins', 'club_shirt_name': 'Watkins', 'country_code': 'ENG', 'national_field_position': 'FORWARD', 'national_jersey_number': '19', 'goal_type': 'SCORED'}]"
So I’d like to extract this data and work with it.
example dataframe
I’ve tried to use this code:
import json
stdf = subset['goals'].apply(json.loads)
# stlst = list(stdf)
# stjson = json.dumps(stlst)
# subset.join(pandas.read_json(stjson))
But for the line stdf = subset['goals'].apply(json.loads)
I’m getting the error message:
TypeError: the JSON object must be str, bytes or bytearray, not float
So I don’t know how to solve this problem.
I guess I have to iterate over the goals column. I’ve tried something, but the results were still not what they should have been.
3 Answers
@Daweo - OK, that's helpful, but it doesn't work with NaN values. I tried iterating over every row, but that raised a ValueError.
I will show it from a different perspective.
This is my dataframe:
How can I extract the goals column to get a dataframe with the columns
id_match, home_team, away_team, home_score, away_score, goals_phase, goals_time etc.?
If there is more than one goal, it should return several rows with the same
id_match, home_team, away_team, home_score, away_score
columns and unique values from the goals column.

This is not JSON, but it is a valid Python list literal and thus can be loaded using ast.literal_eval.
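A minimal sketch of that idea, which also covers the NaN cells mentioned in the comment above. The helper name parse_goals and the two-row sample frame are assumptions made for illustration; only the goal entry itself is taken from the question.

```python
import ast

import pandas as pd


def parse_goals(value):
    """Parse a stringified Python list; return an empty list for missing values."""
    # Missing cells are NaN (a float), so only real strings go to literal_eval.
    if isinstance(value, str):
        return ast.literal_eval(value)
    return []


# Hypothetical sample: one match with a goal, one match with a missing value.
subset = pd.DataFrame(
    {
        "id_match": [1, 2],
        "goals": [
            "[{'phase': 'FIRST_HALF', 'time': {'minute': 7, 'second': 41}, "
            "'international_name': 'Xavi Simons', 'goal_type': 'SCORED'}]",
            float("nan"),
        ],
    }
)

parsed = subset["goals"].apply(parse_goals)
print(parsed[0][0]["international_name"])  # Xavi Simons
print(parsed[1])  # []
```

Returning an empty list for missing values is one possible choice; returning None would work as well if the later steps are written to expect it.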
So what you’ll need to do is first convert those string values into lists of dictionaries. Then use explode to turn each element of those lists into its own row. Then ultimately use json_normalize to flatten the dictionaries into columns.
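The three steps could be sketched roughly as follows. The sample frame, its id_match values and the "Team A"/"Team B" row are hypothetical; the column names follow the question, and the goals_ prefix is an assumption based on the column names the asker wants. Matches without goals are dropped here, which may or may not be what you need.

```python
import ast

import pandas as pd

# Hypothetical sample: one match with two goals, one match with a missing value.
matches = pd.DataFrame(
    {
        "id_match": [1, 2],
        "home_team": ["Netherlands", "Team A"],
        "away_team": ["England", "Team B"],
        "goals": [
            "[{'phase': 'FIRST_HALF', 'time': {'minute': 7, 'second': 41}, "
            "'international_name': 'Xavi Simons', 'goal_type': 'SCORED'}, "
            "{'phase': 'FIRST_HALF', 'time': {'minute': 18, 'second': 34}, "
            "'international_name': 'Harry Kane', 'goal_type': 'PENALTY'}]",
            float("nan"),
        ],
    }
)

# Step 1: turn each string into a list of dicts; NaN becomes an empty list.
matches["goals"] = matches["goals"].apply(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else []
)

# Step 2: one row per goal. Empty lists explode to NaN, so drop those rows.
exploded = (
    matches.explode("goals")
    .dropna(subset=["goals"])
    .reset_index(drop=True)
)

# Step 3: flatten each goal dict into columns such as goals_time_minute.
goal_cols = pd.json_normalize(exploded["goals"].tolist(), sep="_").add_prefix("goals_")

# Put the match columns and the flattened goal columns side by side.
result = exploded.drop(columns="goals").join(goal_cols)
print(result)
```

Keys that appear only in some goals, such as injuryMinute in the question's data, end up as columns filled with NaN for the other rows.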
Output: