I have a JSON file which, when converted into a dataframe, looks something like this:

sl no.    id    name    value    date
1        101    Math    90       -
2        101    Phy     87       -
3        201    Math    85       -
4        201    Phy     93       -

(Not the actual data but of the same principle with multiple repeating entries with one differing category)

What I’m trying to achieve looks something like this:

sl no.    id    Math    Phy    date
1        101    90      87     -
...

Is there any way to easily convert these categorical entries into columns of the same name instead of having multiple repeating entries for each student?

2 Answers


  1. Yes, you can achieve this using the pivot_table function in pandas. Here’s a sample code snippet:

    import pandas as pd
    
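    # Assuming df is your DataFrame containing the JSON data.
    # Leave 'sl no.' out of the index: it is unique per row, so including it
    # keeps every row separate and fills the new columns with NaN. The same
    # happens with 'date' if it differs between subjects; then use index='id'.
    df_pivoted = pd.pivot_table(df, index=['id', 'date'], columns='name', values='value').reset_index()
    
    # Remove the leftover 'name' label from the column axis
    df_pivoted.columns.name = None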
    
    # Display the pivoted DataFrame
    print(df_pivoted)
    

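    This will create a DataFrame where each unique value in the ‘name’ column becomes a separate column, with the corresponding ‘value’ entries under each student’s ‘id’. Note that pivot_table aggregates rows that share the same index and column values, using the mean by default.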
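
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the question's sample table by hand and pivots it. The date values are made-up placeholders (the question elides them), and I am assuming every row for a given id shares the same date. I pass `aggfunc="first"` so the values are copied over rather than averaged, and the `sl no.` column is regenerated afterwards because the original serial numbers no longer apply once rows are merged.

```python
import pandas as pd

# Sample data mirroring the question; the dates are made-up placeholders.
df = pd.DataFrame({
    "sl no.": [1, 2, 3, 4],
    "id": [101, 101, 201, 201],
    "name": ["Math", "Phy", "Math", "Phy"],
    "value": [90, 87, 85, 93],
    "date": ["2024-01-01"] * 4,
})

# One row per (id, date); each distinct 'name' becomes its own column.
wide = df.pivot_table(
    index=["id", "date"], columns="name", values="value", aggfunc="first"
).reset_index()

wide.columns.name = None                            # drop the 'name' axis label
wide.insert(0, "sl no.", range(1, len(wide) + 1))   # fresh serial numbers
wide = wide[["sl no.", "id", "Math", "Phy", "date"]]  # order as in the question
print(wide)
```

If the dates can differ between subjects for the same student, keeping `date` in the index would split that student across several rows again, so in that case pivot on `id` alone and handle the dates separately.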

  2. You can use pandas.pivot() to do this. Assuming id is the column that identifies each row, use it as the index (you can pass a list if other columns belong there too). columns takes the column whose values become the new column headers (here, name), and values specifies the data to show in the pivoted dataframe. That looks like this:

    df.pivot(index='id', columns='name', values=['value', 'date'])
    

    This returns:

         value       date       
    name  Math Phy   Math    Phy
    id                          
    101     90  87  date1  date2
    201     85  93  date3  date4
    

    (Assuming you want to keep both the Math and Phy dates)

    To remove the multi-index columns, you can rename them using:

    df_pivot = df.pivot(index='id', columns='name', values=['value', 'date'])
    df_pivot.columns = [' '.join(cols) for cols in df_pivot.columns]
    

    Giving:

        value Math value Phy date Math date Phy
    id                                         
    101         90        87     date1    date2
    201         85        93     date3    date4
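
Here is a self-contained sketch of the pivot() approach above that can be run as is. The `date1` to `date4` strings are placeholders standing in for real dates, and the `dup` frame is a hypothetical example I added to show one caveat: unlike pivot_table, pivot() does not aggregate, so it raises a ValueError when the same (id, name) pair appears more than once.

```python
import pandas as pd

# Sample data mirroring the question; date1..date4 are placeholder strings.
df = pd.DataFrame({
    "id": [101, 101, 201, 201],
    "name": ["Math", "Phy", "Math", "Phy"],
    "value": [90, 87, 85, 93],
    "date": ["date1", "date2", "date3", "date4"],
})

df_pivot = df.pivot(index="id", columns="name", values=["value", "date"])

# Columns are now tuples such as ('value', 'Math'); join each into one label.
df_pivot.columns = [" ".join(cols) for cols in df_pivot.columns]
df_pivot = df_pivot.reset_index()  # turn 'id' back into an ordinary column
print(df_pivot)

# Hypothetical duplicate: repeat the first row so (101, 'Math') occurs twice.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index="id", columns="name", values="value")
    raised = False
except ValueError:
    raised = True
print("duplicate (id, name) pair raises ValueError:", raised)
```

If your real data can contain such duplicates, pivot_table with an explicit aggfunc is probably the safer choice.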
    