skip to Main Content

I have two columns with a list of each strings, named Tag 1, Tag 2. In simple terms I need to check if a string in a cell is found anywhere in the adjacent column and if so, swap the values.

Existing Dataframe

Tag 1    Tag 2
Amazon   Twitter
Amazon   Google
eBay     Amazon
Reddit   Facebook

Desired Output

Tag 1    Tag 2
Amazon   Twitter
Amazon   Google
Amazon   eBay
Reddit   Facebook

In the desired outcome, you can see that Amazon has switched places with eBay because it was found in the Tag 1 column.

Minimum Reproducible Example

import pandas as pd

data = {'Tag 1': ['Amazon', 'Amazon', 'eBay', 'Reddit'],
        'Tag 2': ['Twitter', 'Google', 'Amazon', 'Facebook']}

df = pd.DataFrame(data)

I’ve been researching similar posts but can’t quite seem to get it exactly right.

Here’s my code so far.

3

Answers


  1. This might do it

    import pandas as pd
    import numpy as np
    
    data = {'Tag 1': ['Amazon', 'Amazon', 'eBay', 'Reddit'],
            'Tag 2': ['Twitter', 'Google', 'Amazon', 'Facebook']}
    
    df = pd.DataFrame(data)
    
    values = np.unique(df['Tag 1'].values)  # All unique values in Tag 1
    swapped = []  # So we don't double swap
    
    for val in values:
        rows = df.loc[df['Tag 2'] == val]  # Get rows in Tag 2 that matches
        for idx, row in rows.iterrows():
            if idx in swapped:  # Ignore if already swapped
                continue
            # Swap the values
            temp = df.iloc[idx]['Tag 2']
            df.iloc[idx]['Tag 2'] = df.iloc[idx]['Tag 1']
            df.iloc[idx]['Tag 1'] = temp
            swapped.append(idx)
    

    There are probably more efficient solutions, but I think this is pretty clear. It goes through each unique value in the ‘Tag 1’ column and checks for matches in the ‘Tag 2’ column, and swaps them. I was not sure if you wanted to avoid double swaps, but that can be removed if it’s not a requirement.

    Login or Signup to reply.
  2. You could use pd.where to check when a value in ‘Tag 2’ existis in ‘Tag 1’ using isin and swap the values when you assign back:

    df[['Tag 1','Tag 2']] = df[['Tag 2','Tag 1']].where(
        df['Tag 2'].isin(df['Tag 1'].tolist()), df[['Tag 1','Tag 2']].values)
    

    prints back:

        Tag 1     Tag 2
    0  Amazon   Twitter
    1  Amazon    Google
    2  Amazon      eBay
    3  Reddit  Facebook
    
    Login or Signup to reply.
  3. The answer of @sophocles is probably the right.

    However if you just want to reorder columns in a lexicographical order for instance, try:

    tags = ['Tag 1', 'Tag 2']
    
    df[tags] = df[tags].apply(lambda x: sorted(tuple(x)), axis=1, result_type='expand')
    print(df)
    
    # Output
          Tag 1    Tag 2
    0    Amazon  Twitter
    1    Amazon   Google
    2    Amazon     eBay
    3  Facebook   Reddit
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search