
I’m trying to sort a list by frequency and then by name (pandas 1.3.2, Python 3.10).

First, I count the occurrences of each item in the list; then, where the counts are equal, the names must be ordered alphabetically.

I found out that everything works when len(list) < 19. Magic…

Code:

import pandas
        
df_data = pandas.DataFrame({
                'data':
                    ['14209adobepremiere', 'adobe-flash-player', 'adobe-flash-player-cis', 
                     'adobe-photoshop-cc-cis', 'discord', 'discord', 'driverpack', 
                     'freeoffice', 'freeoffice2018', 'generals',
                     'tiktok-for-pc-cis', 'tlauncher', 'utorrent', 'viber', 
                     'winrar', 'zoom', 'zoom', 'zoom-client-for-conferences', 
                     'zoom-client-for-conferences-cis']
            })

with pandas.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df_data['data'].value_counts().sort_index(
            ascending=True,
        ).sort_values(ascending=False))

Expected output (by count desc, then alphabetically asc):

discord                            2
zoom                               2
14209adobepremiere                 1
adobe-flash-player                 1
adobe-flash-player-cis             1
adobe-photoshop-cc-cis             1
driverpack                         1
freeoffice                         1
freeoffice2018                     1
generals                           1
tiktok-for-pc-cis                  1
tlauncher                          1
utorrent                           1
viber                              1
winrar                             1
zoom-client-for-conferences        1
zoom-client-for-conferences-cis    1
Name: data, dtype: int64

Real output (by count desc, but not alphabetically asc):

zoom                               2
discord                            2
14209adobepremiere                 1
tiktok-for-pc-cis                  1
zoom-client-for-conferences        1
winrar                             1
viber                              1
utorrent                           1
tlauncher                          1
generals                           1
adobe-flash-player                 1
freeoffice2018                     1
freeoffice                         1
driverpack                         1
adobe-photoshop-cc-cis             1
adobe-flash-player-cis             1
zoom-client-for-conferences-cis    1
Name: data, dtype: int64

Thanks in advance for any help.

2 Answers


  1. I don’t think you can chain a sort on the index with a sort on the values and expect the first ordering to survive. One method is to reset the index, sort on both columns, and then set the index again.

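    # pandas 1.x column names after reset_index(): 'index' holds the names, 'data' the counts
    # (pandas 2.x names them 'data' and 'count' instead, so adjust the keys there)
    (df_data['data'].value_counts()
            .reset_index()
            .sort_values(['data', 'index'], ascending=[False, True])
            .set_index('index'))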
    
                                    data
    index
    discord                             2
    zoom                                2
    14209adobepremiere                  1
    adobe-flash-player                  1
    adobe-flash-player-cis              1
    adobe-photoshop-cc-cis              1
    driverpack                          1
    freeoffice                          1
    freeoffice2018                      1
    generals                            1
    tiktok-for-pc-cis                   1
    tlauncher                           1
    utorrent                            1
    viber                               1
    winrar                              1
    zoom-client-for-conferences         1
    zoom-client-for-conferences-cis     1
    
  2. For counting frequencies only, you could use a collections.Counter on the list directly and, if needed, convert the result to a pandas.DataFrame:

    from collections import Counter
    import pandas

    data = ['14209adobepremiere', 'adobe-flash-player', 'adobe-flash-player-cis',
            'adobe-photoshop-cc-cis', 'discord', 'discord', 'driverpack',
            'freeoffice', 'freeoffice2018', 'generals',
            'tiktok-for-pc-cis', 'tlauncher', 'utorrent', 'viber',
            'winrar', 'zoom', 'zoom', 'zoom-client-for-conferences',
            'zoom-client-for-conferences-cis']
    # Sort by count (descending), then by name (ascending) to break ties,
    # so the result does not depend on the order of the input list.
    pairs = sorted(Counter(data).items(), key=lambda x: (-x[1], x[0]))
    pandas.DataFrame(pairs, columns=['index', 'data']).set_index('index')
    

    Output

                                     data
    index                                
    discord                             2
    zoom                                2
    14209adobepremiere                  1
    adobe-flash-player                  1
    adobe-flash-player-cis              1
    adobe-photoshop-cc-cis              1
    driverpack                          1
    freeoffice                          1
    freeoffice2018                      1
    generals                            1
    tiktok-for-pc-cis                   1
    tlauncher                           1
    utorrent                            1
    viber                               1
    winrar                              1
    zoom-client-for-conferences         1
    zoom-client-for-conferences-cis     1
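
A note on why the chained version in the question misbehaves: `sort_values` defaults to `kind='quicksort'`, which is not a stable sort, so the alphabetical order set up by `sort_index` is not guaranteed to survive among equal counts. Short inputs probably come out right only because NumPy's quicksort falls back to insertion sort on small arrays (that part is my guess, not something checked against the asker's setup). Below is a minimal sketch of the chained approach with a stable sort, assuming a recent pandas. The variable names `names`, `counts` and `result` are mine, and I have not run this on pandas 1.3.2.

```python
import pandas as pd

# Same items as in the question.
names = ['14209adobepremiere', 'adobe-flash-player', 'adobe-flash-player-cis',
         'adobe-photoshop-cc-cis', 'discord', 'discord', 'driverpack',
         'freeoffice', 'freeoffice2018', 'generals',
         'tiktok-for-pc-cis', 'tlauncher', 'utorrent', 'viber',
         'winrar', 'zoom', 'zoom', 'zoom-client-for-conferences',
         'zoom-client-for-conferences-cis']

counts = pd.Series(names).value_counts()

# 1) Order alphabetically by name (the index).
# 2) Order by count, descending, with a stable algorithm so that rows with
#    equal counts keep the alphabetical order from step 1.
result = counts.sort_index().sort_values(ascending=False, kind='mergesort')

print(result)
```

The only change from the question's code is `kind='mergesort'`; everything else is the same two-step chain.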
    