skip to Main Content

I’m trying to find all combinations of a substring found in a single column, and then explode the dataframe with all possible combinations of each word.

Example Dataframe

                     URL                Keyword
0  http://www.amazon.com  Amazon Lightning Sale
1   https://www.ebay.com        Shop eBay Today

Desired Output

                      URL                Keyword
0   http://www.amazon.com  Amazon Lightning Sale
1   http://www.amazon.com  Amazon Sale Lightning
2   http://www.amazon.com  Lightning Amazon Sale
3   http://www.amazon.com  Sale Amazon Lightning
4   http://www.amazon.com  Sale Lightning Amazon
5   http://www.amazon.com  Lightning Sale Amazon
6    https://www.ebay.com        Shop eBay Today
7    https://www.ebay.com        Shop Today eBay
8    https://www.ebay.com        eBay Shop Today
9    https://www.ebay.com        eBay Today Shop
10   https://www.ebay.com        Today eBay Shop
11   https://www.ebay.com        Today Shop eBay

Minimum Reproducable Example

import pandas as pd

# initialize data of lists.
data = {'URL': ['http://www.amazon.com', 'https://www.ebay.com'],
        'Keyword': ["Amazon Lightning Sale", "Shop eBay Today"]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)

I’ve tried the solution here: Pandas DataFrame Combinations and expand but it’s not quite what I need.

3

Answers


    1. create an ID for each row

       #create an id:
       df['ID'] = range(df.shape[0])
      
                        URL                Keyword      ID
       0  http://www.amazon.com  Amazon Lightning Sale  0
       1   https://www.ebay.com        Shop eBay Today  1
      
    2. Create the combinations for each row as a new DataFrame, or dict or …

      import re
      import itertools
      def create_combinations(id, kw):
      
          # split the keywords by word:
          words = re.split('W+', kw)
      
          return pd.DataFrame([ {'ID':id, 'Combinations' : ' '.join(x) } for x in itertools.permutations(words)])
      
      
      # create the combinations
      data = []
      for id, kw in zip(df.ID, df.Keyword):
          data.append( create_combinations(id, kw) )
      
    3. Join

      # join the data
      pd.merge(df, pd.concat(data), on = ['ID'])
      
      
               URL     Keyword     ID  Combinations
          0    http://www.amazon.com   Amazon Lightning Sale   0   Amazon Lightning Sale
          1    http://www.amazon.com   Amazon Lightning Sale   0   Amazon Sale Lightning
          2    http://www.amazon.com   Amazon Lightning Sale   0   Lightning Amazon Sale
          3    http://www.amazon.com   Amazon Lightning Sale   0   Lightning Sale Amazon
          4    http://www.amazon.com   Amazon Lightning Sale   0   Sale Amazon Lightning
          5    http://www.amazon.com   Amazon Lightning Sale   0   Sale Lightning Amazon
          6    https://www.ebay.com    Shop eBay Today     1   Shop eBay Today
          7    https://www.ebay.com    Shop eBay Today     1   Shop Today eBay
          8    https://www.ebay.com    Shop eBay Today     1   eBay Shop Today
          9    https://www.ebay.com    Shop eBay Today     1   eBay Today Shop
          10   https://www.ebay.com    Shop eBay Today     1   Today Shop eBay
          11   https://www.ebay.com    Shop eBay Today     1   Today eBay Shop
      

    Joost Döbken’s answer is a bit more elegant

    Login or Signup to reply.
  1. from itertools import permutations
    
    df['Keyword'] = df['Keyword'].apply(lambda x: list(permutations(x.split())))
    df.explode('Keyword', ignore_index=True)
    

    First the itertools.permutations method applied to the Keyword column will create all possible combinations of keywords as a list.

    Next you can use the pandas.DataFrame.explode function to create many items from the created lists.

    If you really want a full string instead of a tuple of keywords, you can replace the list(...) part with a string join: [" ".join(t) for t in permutations(x.split())]

    Login or Signup to reply.
  2. Here is an alternative way without using itertools:

    (df.assign(Keyword = df['Keyword'].str.split().map(lambda x: [[i,j,k] for i in x for j in x for k in x if len({i,j,k})==len(x)]))
     .explode('Keyword')
     .assign(Keyword = lambda x: x['Keyword'].str.join(' ')))
    

    Output:

                         URL                Keyword
    0  http://www.amazon.com  Amazon Lightning Sale
    0  http://www.amazon.com  Amazon Sale Lightning
    0  http://www.amazon.com  Lightning Amazon Sale
    0  http://www.amazon.com  Lightning Sale Amazon
    0  http://www.amazon.com  Sale Amazon Lightning
    0  http://www.amazon.com  Sale Lightning Amazon
    1   https://www.ebay.com        Shop eBay Today
    1   https://www.ebay.com        Shop Today eBay
    1   https://www.ebay.com        eBay Shop Today
    1   https://www.ebay.com        eBay Today Shop
    1   https://www.ebay.com        Today Shop eBay
    1   https://www.ebay.com        Today eBay Shop
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search