After creating a DataFrame with some duplicated cell values in column with the name ‘keys’:
import pandas as pd
df = pd.DataFrame({'keys': [1,2,2,3,3,3,3],'values':[1,2,3,4,5,6,7]})
I go ahead and create two more DataFrames which are the consolidated versions of the original DataFrame df
. Those newly created DataFrames will have no duplicated cell values under the ‘keys’ column:
df_sum = df_a.groupby('keys', axis=0).sum().reset_index()
df_mean = df_b.groupby('keys', axis=0).mean().reset_index()
As you can see df_sum['values']
cells values were all summed together.
While df_mean['values']
cell values were averaged with mean()
method.
Lastly I rename the ‘values’ column in both dataframes with:
df_sum.columns = ['keys', 'sums']
df_mean.columns = ['keys', 'means']
Now I would like to copy the df_mean['means']
column into the dataframe df_sum
.
How to achieve this?
The Photoshoped image below illustrates the dataframe I would like to create. Both ‘sums’ and ‘means’ columns are merged into a single DataFrame:
2
Answers
There are several ways to do this. Using the
merge
function off the dataframe is the most efficient.I think
pandas.merge()
is the function you are looking for. Likepd.merge(df_sum, df_mean, on = "keys")
. Besides, this result can also be summarized on oneagg
function as following: