Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Json – pandas dataframe breaking long table to multiple tables

Nathi
December 6, 2023
141 views
0 votes
2 Answers

I have a dataframe which has a structure like below.
id, name, value, employee_0_salary, employee_0_skills_0_id, employee_0_skills_1_id, employee_1_salary, employee_1_skills_0_id so on..

Basically formed from a flattened json. But the number of columns exceed 1024. I couldn’t copy the structure of json here but its a nested json with lists, arrays and dictionaries.

Now I want to split the dataframe so that the id column has duplicates but empoyee details comes as columns.

id salary skills_id

How to do it in pandas?

TIA

Answers

try this

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2],
    'employee_0_salary': [30000, 35000],
    'employee_0_skills_0_id': [101, 102],
    'employee_0_skills_1_id': [103, 104],
    'employee_1_salary': [32000, 36000],
    'employee_1_skills_0_id': [105, 106],
    # Add other columns...
})


reshaped_data = []

for employee in range(2):  # Update this based on the actual number of employees
    for skill in range(2):  # Update this based on the actual number of skills per employee
        salary_col = f'employee_{employee}_salary'
        skill_col = f'employee_{employee}_skills_{skill}_id'

        temp_df = df[['id', salary_col, skill_col]].copy()

        temp_df.rename(columns={salary_col: 'salary', skill_col: 'skills_id'}, inplace=True)

        reshaped_data.append(temp_df)

final_df = pd.concat(reshaped_data)

final_df = final_df.drop_duplicates().reset_index(drop=True)

print(final_df)

maybe melt() does what you want?

df = pd.DataFrame({
    'id': [1, 2],
    'name': ['joe','sue'],
    'employee_0_salary': [30000, 35000],
    'employee_0_skills_0_id': [101, 102],
    'employee_0_skills_1_id': [103, 104],
    'employee_1_salary': [32000, 36000],
    'employee_1_skills_0_id': [105, 106],
    # Add other columns...
})

print(df)
   id name  employee_0_salary  employee_0_skills_0_id  employee_0_skills_1_id  
0   1  joe              30000                     101                     103   
1   2  sue              35000                     102                     104   

print(df.melt(id_vars=['id','name']))
   id name                variable  value
0   1  joe       employee_0_salary  30000
1   2  sue       employee_0_salary  35000
2   1  joe  employee_0_skills_0_id    101
3   2  sue  employee_0_skills_0_id    102
4   1  joe  employee_0_skills_1_id    103
5   2  sue  employee_0_skills_1_id    104
6   1  joe       employee_1_salary  32000
7   2  sue       employee_1_salary  36000
8   1  joe  employee_1_skills_0_id    105
9   2  sue  employee_1_skills_0_id    106

Please signup or login to give your own answer.

Click here to cancel reply.