Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Json – How to split pandas column dictionary into new columns and rows (explode not working)

rootuser3214
October 3, 2023
236 views
0 votes
2 Answers

I am trying to split the data from one of the column in json / dict format to new rows and column.
e.g. Input

Column A	Column B	Column C
john	blue	[{city: "Manhattan", job_type: "PERM", sal_gp: 0d-1, age: 3.4}, {city: "Dallas", job_type: "TEMP", sal_gp: 0d-1, age: 1.4}]
jack	gold	[{city: "San Diego", job_type: "TEMP", sal_gp: 0d-1, age: 5.44}]

expected output

Column A	Column B	city	job_type	sal_gp	age
john	blue	Manhattan	PERM	0d-1	3.4
john	blue	Dallas	TEMP	0d-1	1.4
Jack	gold	San Diego	TEMP	0d-1	5.44

output of df.to_dict(‘tight’):

{'index': [0,1], 'columns': ['Column A', 'Column B', 'Column C'], 'data': [['john','blue',[{city: "Manhattan", job_type: "PERM", sal_gp: 0d-1, age: 3.4}, {city: "Dallas", job_type: "TEMP", sal_gp: 0d-1, age:
1.4}]],'jack', 'gold',  [{city: "San Diego", job_type: "TEMP", sal_gp: 0d-1, age: 5.44}]], 'index_names':[None], 'column_names': [None] }

I already tried json loads and pandas explode but it is not working.
I am not sure but is there any alternative solution for this?

Answers

Here is the minimal reproducible example (that you should have provided)

df=pd.DataFrame({'A':['john', 'jack'], 'B':['blue','gold'], 'C':[[{'city':'Manhattan', 'job':'perm'}, {'city':'Dallas', 'job':'temp'}], [{'city':'San Diego', 'job':'temp'}]]})

I’ve simplified a bit the names and the fields, since this is what "minimal" examples are. And it is reproducible, because anybody can copy and paste the code and try it at home.

	A	B	C
0	john	blue	[{‘city’: ‘Manhattan’, ‘job’: ‘perm’}, {‘city’: ‘Dallas’, ‘job’: ‘temp’}]
1	jack	gold	[{‘city’: ‘San Diego’, ‘job’: ‘temp’}]

The thing is, column C needs two layer of "expanding". One for the array, then another for the dicts in that array.

Let’s start by exploding arrays

dfc=df.explode('C').reset_index(drop=True)

Gives

	A	B	C
0	john	blue	{‘city’: ‘Manhattan’, ‘job’: ‘perm’}
1	john	blue	{‘city’: ‘Dallas’, ‘job’: ‘temp’}
2	jack	gold	{‘city’: ‘San Diego’, ‘job’: ‘temp’}

So, half of the job is done: one row for each value in C array. Rest the expanding of fields of dict.

We can find how to do it here

dfc.join(pd.json_normalize(dfc.C)).drop(columns='C')

json_normalize creates columns with fields of C. join add those new columns. And drop drops the C column now that we no longer need it

Result is:

	A	B	city	job
0	john	blue	Manhattan	perm
1	john	blue	Dallas	temp
2	jack	gold	San Diego	temp

Code

explode & apply

df1 = df.explode('Column C', ignore_index=True)
df1[['Column A', 'Column B']].join(df1['Column C'].apply(pd.Series))

output:

    Column A    Column B    city        job_type    sal_gp  age
0   john        blue        Manhattan   PERM        0d-1    3.40
1   john        blue        Dallas      TEMP        0d-1    1.40
2   jack        gold        San Diego   TEMP        0d-1    5.44

If the results are not the same, please provide your example in code.

Example Code

import pandas as pd
data1 = {'Column A': ['john', 'jack'], 'Column B': ['blue', 'gold'], 
         'Column C': [[{'city': "Manhattan", 'job_type': "PERM", 'sal_gp': '0d-1', 'age': 3.4}, 
                       {'city': "Dallas", 'job_type': "TEMP", 'sal_gp': '0d-1', 'age': 1.4}], 
                      [{'city': "San Diego", 'job_type': "TEMP", 'sal_gp': '0d-1', 'age': 5.44}]]}
df = pd.DataFrame(data1)

Intermediate

df1

Column A    Column B    Column C
0   john    blue    {'city': 'Manhattan', 'job_type': 'PERM', 'sal...
1   john    blue    {'city': 'Dallas', 'job_type': 'TEMP', 'sal_gp...
2   jack    gold    {'city': 'San Diego', 'job_type': 'TEMP', 'sal...

Please signup or login to give your own answer.

Click here to cancel reply.