OK, this is driving me crazy. I can't figure out what is going on with what should be a very simple seaborn barplot. I have the following dataframe:
Users Count App Name
0 224244 Facebook
1 202253 Messenger – Text and Video Chat for Free
2 164995 Google Play Games
3 151713 Instagram
4 132949 Google Pay
5 126474 Netflix
6 122877 Pinterest
7 110071 Amazon Shopping
8 108515 Samsung Pay
9 108204 Pandora Music
10 105865 Snapchat
11 103858 Pinterest
12 101676 Uber
13 94741 Yelp: Food Shopping Services Nearby
14 83623 Wish - Shopping Made Fun
15 83327 Spotify Music
16 81678 WhatsApp Messenger
17 76064 Amazon Prime Video
18 68823 eBay: Shop Deals - Home Fashion & Electronics
19 64677 Amazon Shopping
20 59513 Walmart
All I’m doing is trying to plot a Seaborn barplot out of it:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(8,6))
sns.barplot(x='Users Count', y='App Name', data=pop_apps_df, alpha=0.8, ci=None)
plt.title('20 Most Popular Apps')
plt.ylabel('App Name', fontsize=12)
plt.xlabel('Number of Unique MAIDs', fontsize=12)
plt.show()
The plot, however, is somehow not showing the 'Users Count' values that correspond to each row's 'App Name' (note the bar for Amazon Shopping). What could be causing this, and what can I do about it? I've searched high and low and can't find an answer or a solution.
2 Answers
Amazon Shopping appears twice in your DataFrame (rows 7 and 19), with different values. Remove the duplicate row(s) and everything should be fine. Also, you may want to increase the height of the plot to display all rows.
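To check for repeats before deciding what to remove, something like the following may help. It is only a minimal sketch: it rebuilds a handful of rows from the question rather than the full frame, the names `dupes` and `deduped` are my own, and whether keeping the first row is the right cleanup depends on what the duplicate rows actually represent. On the data shown, it flags Pinterest (rows 6 and 11) as well as Amazon Shopping.

```python
import pandas as pd

# Assumption: a few rows copied from the question, keeping their original index labels.
pop_apps_df = pd.DataFrame(
    {
        "Users Count": [224244, 122877, 110071, 103858, 64677, 59513],
        "App Name": ["Facebook", "Pinterest", "Amazon Shopping",
                     "Pinterest", "Amazon Shopping", "Walmart"],
    },
    index=[0, 6, 7, 11, 19, 20],
)

# keep=False marks every occurrence of a repeated name, not just the later ones.
dupes = pop_apps_df[pop_apps_df["App Name"].duplicated(keep=False)]
print(dupes)

# One possible cleanup: keep only the first row per app
# (the frame is sorted descending, so that is the larger count).
deduped = pop_apps_df.drop_duplicates(subset="App Name", keep="first")
print(deduped)
```

If the duplicate rows are really separate counts that belong together, summing them is probably a better fit than dropping one.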
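Per the seaborn.barplot docs, each bar shows an estimate of central tendency, which is the mean by default. Because 'Amazon Shopping' appears twice, its bar reflects the mean of the lower and higher values: (110071 + 64677) / 2 = 87374. To resolve, consider two solutions: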
Adjust the default estimator argument with an aggregate/statistical function such as numpy.sum.
Aggregate before you plot.
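Both options could look roughly like the sketch below. It uses a small subset of the question's rows as an assumption (not the full frame), and `default_heights` and `agg_df` are names I made up for illustration. Summing is just one choice of aggregate; pick whatever matches what the duplicate rows mean.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Assumption: a small subset of the question's rows, enough to show the effect.
pop_apps_df = pd.DataFrame({
    "Users Count": [224244, 122877, 110071, 103858, 64677, 59513],
    "App Name": ["Facebook", "Pinterest", "Amazon Shopping",
                 "Pinterest", "Amazon Shopping", "Walmart"],
})

# What the default estimator (mean) would draw for each app.
default_heights = pop_apps_df.groupby("App Name")["Users Count"].mean()
print(default_heights)

# Solution 2: collapse duplicates yourself, then sort so the largest bar is on top.
agg_df = (
    pop_apps_df.groupby("App Name", as_index=False)["Users Count"]
    .sum()
    .sort_values("Users Count", ascending=False)
)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Solution 1: let seaborn sum the duplicate rows.
# ci=None matches the question; newer seaborn releases prefer errorbar=None.
sns.barplot(x="Users Count", y="App Name", data=pop_apps_df,
            estimator=np.sum, ci=None, ax=ax1)
ax1.set_title("estimator=np.sum")

# Solution 2 plotted: one row per app, so no aggregation happens inside seaborn.
sns.barplot(x="Users Count", y="App Name", data=agg_df, ci=None, ax=ax2)
ax2.set_title("aggregated before plotting")

plt.tight_layout()
plt.show()
```

Aggregating first has the side benefit that you control the bar order and can inspect the numbers before anything is drawn.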
Also, consider adding plt.tight_layout() before plt.show() so longer y-axis labels adequately fit in the plot.