OK, this is driving me crazy: I can’t figure out what is going on with what should be a very simple seaborn barplot. I have the following DataFrame:

Users Count App Name
0   224244  Facebook
1   202253  Messenger – Text and Video Chat for Free
2   164995  Google Play Games
3   151713  Instagram
4   132949  Google Pay
5   126474  Netflix
6   122877  Pinterest
7   110071  Amazon Shopping
8   108515  Samsung Pay
9   108204  Pandora Music
10  105865  Snapchat
11  103858  Pinterest
12  101676  Uber
13  94741   Yelp: Food Shopping Services Nearby
14  83623   Wish - Shopping Made Fun
15  83327   Spotify Music
16  81678   WhatsApp Messenger
17  76064   Amazon Prime Video
18  68823   eBay: Shop Deals - Home Fashion & Electronics
19  64677   Amazon Shopping
20  59513   Walmart

All I’m doing is trying to plot a Seaborn barplot out of it:

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(8, 6))
sns.barplot(x='Users Count', y='App Name', data=pop_apps_df, alpha=0.8, ci=None)
plt.title('20 Most Popular Apps')
plt.ylabel('App Name', fontsize=12)
plt.xlabel('Number of Unique MAIDs', fontsize=12)
plt.show()

The plot, however, is somehow not pulling the values from ‘Users Count’ that correspond to the rows in column ‘App Name’ (note the bar for Amazon Shopping). What on earth could be causing this, and what can I do? I’ve searched high and low and can’t find an answer or a solution.

2 Answers


  1. Amazon Shopping appears twice in your DataFrame (rows 7 and 19), with different values. Remove the duplicate row(s) and everything should be fine.

    Also, you may want to increase the height of the plot to display all rows.
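
    A minimal sketch of how the duplicates could be found and removed with pandas, using a hand-copied excerpt of the question's data. Note that Pinterest is also listed twice (rows 6 and 11). Keeping the first row per app is an assumption made here for illustration; whether dropping or combining the rows is right depends on what the duplicates represent.

```python
import pandas as pd

# Excerpt of the question's data (index labels match the original rows).
pop_apps_df = pd.DataFrame(
    {
        "Users Count": [126474, 122877, 110071, 108515, 103858, 64677, 59513],
        "App Name": [
            "Netflix", "Pinterest", "Amazon Shopping", "Samsung Pay",
            "Pinterest", "Amazon Shopping", "Walmart",
        ],
    },
    index=[5, 6, 7, 8, 11, 19, 20],
)

# Show every row whose app name occurs more than once.
dupes = pop_apps_df[pop_apps_df.duplicated(subset="App Name", keep=False)]
print(dupes)

# Assumption: keep only the first row per app. The data is sorted in
# descending order, so this keeps the larger count.
deduped = pop_apps_df.drop_duplicates(subset="App Name", keep="first")
print(deduped)
```

    Passing `deduped` as `data=` to the original `sns.barplot` call would then draw exactly one bar per app, each at the value of its remaining row.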

  2. Per seaborn.barplot docs:

    A bar plot represents an estimate of central tendency for a numeric variable…

    It is also important to keep in mind that a bar plot shows only the mean (or other estimator) value…

    Because ‘Amazon Shopping’ appears twice, its bar shows the mean of the two values, (110071 + 64677) / 2 = 87374, rather than either original count. To resolve, consider two solutions:

    1. Adjust default estimator argument with an aggregate/statistical function such as numpy.sum:

      import numpy as np

      sns.barplot(x='Users Count', y='App Name', data=pop_apps_df,
                  estimator=np.sum, alpha=0.8, ci=None)
      
    2. Aggregate before you plot:

      pop_apps_agg = pop_apps_df.groupby(['App Name'], as_index=False)["Users Count"].sum()
      
      sns.barplot(x='Users Count', y='App Name', data=pop_apps_agg, alpha=0.8, ci=None)
      

    Also, consider adding plt.tight_layout() before plt.show() so that the longer y-axis labels fit in the plot.
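
    As a rough check of the explanation above, the sketch below compares the per-app mean (what the default estimator plots) with the per-app sum (what either solution plots). The three-row frame is a hand-copied excerpt of the question's data, not the full dataset.

```python
import pandas as pd

# Excerpt of the question's data: two Amazon Shopping rows and one other app.
pop_apps_df = pd.DataFrame(
    {
        "Users Count": [110071, 108515, 64677],
        "App Name": ["Amazon Shopping", "Samsung Pay", "Amazon Shopping"],
    }
)

# Compute both statistics per app so they can be compared side by side.
per_app = pop_apps_df.groupby("App Name")["Users Count"].agg(["mean", "sum"])
print(per_app)

amazon_mean = per_app.loc["Amazon Shopping", "mean"]  # 87374.0
amazon_sum = per_app.loc["Amazon Shopping", "sum"]    # 174748
```

    Apps that appear only once, such as Samsung Pay here, have the same mean and sum, which is why only the duplicated apps look wrong in the original plot.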

    Plot Output
