
I need to match the text in a pandas column against the keywords in a list and create a new column containing the matched keywords. Example:

my_list = ['machine learning', 'artificial intelligence', 'lasso']

Data:

listing                                         keyword_column
I am looking for machine learning expert        machine learning
Machine learning expert that knows lasso        machine learning, lasso
Need a web designer                              
Artificial Intelligence application on...       artificial intelligence

3 Answers


  1. Use Series.str.findall to get all values from the list, join them together with Series.str.join and, if necessary, convert to lowercase with Series.str.lower.

    Word boundaries (\b) are also used here so that only whole words from my_list are matched.

    my_list = ['machine learning', 'artificial intelligence', 'lasso']
    
    import re
    
    pat = '|'.join(r"\b{}\b".format(x) for x in my_list)
    df['new'] = df['listing'].str.findall(pat, flags=re.I).str.join(', ').str.lower()
    

    Or:

    df['new'] = df['listing'].str.lower().str.findall(pat).str.join(', ')
    

    print (df)
                                        listing           keyword_column  
    0  I am looking for machine learning expert         machine learning   
    1  Machine learning expert that knows lasso  machine learning, lasso   
    2                      Need a web designer                       NaN   
    3    Artificial Intelligence application on  artificial intelligence   
    
                           new  
    0         machine learning  
    1  machine learning, lasso  
    2                           
    3  artificial intelligence  
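
The pattern-building step above can be sketched without pandas, using only the standard `re` module. This is a minimal illustration, not the answer's exact code: the helper name `find_keywords` is hypothetical, and `re.escape` is an addition (an assumption that some keywords might contain regex metacharacters, which should then be treated literally).

```python
import re

my_list = ['machine learning', 'artificial intelligence', 'lasso']

# re.escape makes any regex metacharacters in a keyword literal;
# \b on both sides restricts each alternative to whole-word matches.
pattern = re.compile(
    '|'.join(r'\b{}\b'.format(re.escape(kw)) for kw in my_list),
    flags=re.IGNORECASE,
)

def find_keywords(text):
    """Return the matched keywords, lowercased and comma-separated."""
    return ', '.join(match.lower() for match in pattern.findall(text))

listings = [
    'I am looking for machine learning expert',
    'Machine learning expert that knows lasso',
    'Need a web designer',
    'Artificial Intelligence application on...',
]

results = [find_keywords(text) for text in listings]
for listing, found in zip(listings, results):
    print(repr(listing), '->', repr(found))
```

With a DataFrame, the same helper could presumably be applied row by row, for example `df['listing'].apply(find_keywords)`.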
    
  2. You can also use str.lower + str.findall + str.join to solve your problem:

    df['keyword_column'] = df['listing'].str.lower().str.findall('|'.join(my_list)).str.join(', ')
    

    Now:

    print(df)
    

    gives:

                                         listing           keyword_column
    0   I am looking for machine learning expert         machine learning
    1   Machine learning expert that knows lasso  machine learning, lasso
    2                        Need a web designer                         
    3  Artificial Intelligence application on...  artificial intelligence
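
One thing to keep in mind with a plain `'|'.join(my_list)` pattern is that it also matches keywords embedded in longer words. The sketch below compares it with a word-boundary pattern using the standard `re` module; the sample sentence is made up for illustration and is not part of the original data.

```python
import re

my_list = ['machine learning', 'artificial intelligence', 'lasso']

# Plain alternation: matches the keyword anywhere, even inside another word.
plain = '|'.join(my_list)

# Word-boundary alternation: matches whole words only.
bounded = '|'.join(r'\b{}\b'.format(re.escape(kw)) for kw in my_list)

# Hypothetical listing, lowercased as in the answer above.
text = 'he lassoed a machine learning job'

plain_hits = re.findall(plain, text)      # 'lasso' is found inside 'lassoed'
bounded_hits = re.findall(bounded, text)  # 'lassoed' is skipped

print(plain_hits)
print(bounded_hits)
```

Whether the substring match is a problem depends on the data; for short keywords it is more likely to produce false positives.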
    
  3. flashtext can also be used to extract keywords:

    import pandas as pd
    from flashtext import KeywordProcessor
    
    data = ['I am looking for machine learning expert','Machine learning expert that knows lasso ','Need a web designer','Artificial Intelligence application on...' ]
    
    df = pd.DataFrame(data, columns = ['listing'])
    my_list = ['machine learning', 'artificial intelligence', 'lasso']
    
    kp = KeywordProcessor()
    kp.add_keywords_from_list(my_list)
    
    df['keyword_columns'] = df['listing'].apply(lambda x: kp.extract_keywords(x))
    
    # output
    df['keyword_columns']
    Out[68]: 
    0           [machine learning]
    1    [machine learning, lasso]
    2                           []
    3    [artificial intelligence]
    