skip to Main Content

After creating a DataFrame with some duplicated cell values in column with the name ‘keys’:

import pandas as pd
df = pd.DataFrame({'keys': [1,2,2,3,3,3,3],'values':[1,2,3,4,5,6,7]})

enter image description here

I go ahead and create two more DataFrames which are the consolidated versions of the original DataFrame df. Those newly created DataFrames will have no duplicated cell values under the ‘keys’ column:

df_sum = df_a.groupby('keys', axis=0).sum().reset_index()
df_mean = df_b.groupby('keys', axis=0).mean().reset_index()

As you can see df_sum['values'] cells values were all summed together.
While df_mean['values'] cell values were averaged with mean() method.
Lastly I rename the ‘values’ column in both dataframes with:

df_sum.columns = ['keys', 'sums']
df_mean.columns = ['keys', 'means']

enter image description here

Now I would like to copy the df_mean['means'] column into the dataframe df_sum.

How to achieve this?

The Photoshoped image below illustrates the dataframe I would like to create. Both ‘sums’ and ‘means’ columns are merged into a single DataFrame:

enter image description here

2

Answers


  1. There are several ways to do this. Using the merge function off the dataframe is the most efficient.

    df_both = df_sum.merge(df_mean, how='left', on='keys')
    
    df_both
    
    Out[1]:
       keys  sums  means
    0     1     1    1.0
    1     2     5    2.5
    2     3    22    5.5
    
    Login or Signup to reply.
  2. I think pandas.merge() is the function you are looking for. Like pd.merge(df_sum, df_mean, on = "keys"). Besides, this result can also be summarized on one agg function as following:

    df.groupby('keys')['values'].agg(['sum', 'mean']).reset_index()
    #  keys sum mean
    #0    1   1  1.0
    #1    2   5  2.5
    #2    3  22  5.5
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search