Json - How to convert categorical entries in a pd dataframe row into a column?

Snak
June 12, 2024
159 views
1 vote
2 Answers

I have a JSON file which, when converted into a DataFrame, looks something like this:

sl no.    id    name    value    date
1        101    Math    90       -
2        101    Phy     87       -
3        201    Math    85       -
4        201    Phy     93       -

(Not the actual data, but the same principle: several repeated entries per id that differ only in one category.)

What I’m trying to achieve looks something like this:

sl no.    id    Math    Phy    date
1        101    90      87     -
...

Is there any way to easily convert these categorical entries into columns of the same name instead of having multiple repeating entries for each student?

Answers

- MuhammadAteeq
- June 12, 2024 at 2:48 pm
- 0 votes
Yes, you can achieve this using the pivot_table function in pandas. Here’s a sample code snippet:
```
import pandas as pd

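# Assuming df is your DataFrame containing the JSON data.
# Index on 'id' only: 'sl no.' is unique per row, so putting it in the index
# would keep one row per original entry and leave the other subjects as NaN.
# Add 'date' to the index only if it is identical for every row of an id.
df_pivoted = pd.pivot_table(df, index='id', columns='name', values='value').reset_index()

# Drop the leftover 'name' label on the column axis
df_pivoted.columns.name = None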

# Display the pivoted DataFrame
print(df_pivoted)
```
This will create a DataFrame where each unique value in the ‘name’ column becomes a separate column, with the corresponding ‘value’ entries under each student’s ‘id’. Note that pivot_table aggregates repeated id/name pairs, using the mean by default.
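For reference, here is a minimal, self-contained sketch of this approach run against the sample table from the question. The DataFrame is rebuilt by hand rather than loaded from JSON, the date column is left out because the question does not show its values, and indexing on id alone is an assumption about the desired result (one row per student).

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (the real data comes from JSON)
df = pd.DataFrame({
    "sl no.": [1, 2, 3, 4],
    "id": [101, 101, 201, 201],
    "name": ["Math", "Phy", "Math", "Phy"],
    "value": [90, 87, 85, 93],
})

# One row per id, one column per distinct value of 'name'
df_pivoted = pd.pivot_table(df, index="id", columns="name", values="value").reset_index()

# Remove the 'name' label left on the column axis
df_pivoted.columns.name = None

print(df_pivoted)
```

If the same id/name pair can occur more than once, the values are averaged by default; passing a different aggfunc changes that behaviour.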

- EmiOB
- June 12, 2024 at 2:51 pm
- 0 votes
You can use the pivot() method to do this. Assuming id is the column that identifies each output row, it becomes the index (you can pass a list if other columns should be part of the index as well). columns is the column whose values become the new column headers (here, name), and values specifies the data to show in the pivoted DataFrame. The call looks like this:
```
df.pivot(index='id', columns='name', values=['value', 'date'])
```
Which returns:
```
     value       date       
name  Math Phy   Math    Phy
id                          
101     90  87  date1  date2
201     85  93  date3  date4
```
(This assumes you want to keep both the Math and Phy dates.)

To remove the multi-index columns, you can rename them using:
```
df_pivot = df.pivot(index='id', columns='name', values=['value', 'date'])
df_pivot.columns = [' '.join(cols) for cols in df_pivot.columns]
```
Giving:
```
    value Math value Phy date Math date Phy
id                                         
101         90        87     date1    date2
201         85        93     date3    date4
```