I am trying to split the data from one of the column in json / dict format to new rows and column.
e.g. Input
Column A | Column B | Column C |
---|---|---|
john | blue | [{city: "Manhattan", job_type: "PERM", sal_gp: 0d-1, age: 3.4}, {city: "Dallas", job_type: "TEMP", sal_gp: 0d-1, age: 1.4}] |
jack | gold | [{city: "San Diego", job_type: "TEMP", sal_gp: 0d-1, age: 5.44}] |
expected output
Column A | Column B | city | job_type | sal_gp | age |
---|---|---|---|---|---|
john | blue | Manhattan | PERM | 0d-1 | 3.4 |
john | blue | Dallas | TEMP | 0d-1 | 1.4 |
Jack | gold | San Diego | TEMP | 0d-1 | 5.44 |
output of df.to_dict(‘tight’):
{'index': [0,1], 'columns': ['Column A', 'Column B', 'Column C'], 'data': [['john','blue',[{city: "Manhattan", job_type: "PERM", sal_gp: 0d-1, age: 3.4}, {city: "Dallas", job_type: "TEMP", sal_gp: 0d-1, age:
1.4}]],'jack', 'gold', [{city: "San Diego", job_type: "TEMP", sal_gp: 0d-1, age: 5.44}]], 'index_names':[None], 'column_names': [None] }
I already tried json loads and pandas explode but it is not working.
I am not sure but is there any alternative solution for this?
2
Answers
Here is the minimal reproducible example (that you should have provided)
I’ve simplified a bit the names and the fields, since this is what "minimal" examples are. And it is reproducible, because anybody can copy and paste the code and try it at home.
The thing is, column C needs two layer of "expanding". One for the array, then another for the dicts in that array.
Let’s start by exploding arrays
Gives
So, half of the job is done: one row for each value in
C
array. Rest the expanding of fields of dict.We can find how to do it here
json_normalize
creates columns with fields ofC
.join
add those new columns. Anddrop
drops theC
column now that we no longer need itResult is:
Code
explode & apply
output:
If the results are not the same, please provide your example in code.
Example Code
Intermediate
df1