skip to Main Content

I have two data frames.

DF1

isActive,trackedSearchId
True,53436615
True,53434228
True,53434229

DF2

trackedSearchIds,Primary Keyword Group(s)
"[53436613, 53436615, 53437436, 53436506]",SEO - Directory-Deployment
"[53435887, 53437509, 53437441, 53436615, 53438685, 53437636]",SEO - Other-Glossary
"[53437504, 53435090, 53435887, 53434228]",SEO - Other
"[53437504, 53435090, 53434229]",SEO - Glossary

I want to check for each row of DF1 for column trackedSearchId and check with each row in DF2 with each value of trackedSearchIds array and if the value of DF1 column is present in DF2 append it with DF1.

The output should look like:

isActive,trackedSearchId,Primary Keyword Group(s)
True,53436615,SEO - Directory-Deployment&SEO - Other-Glossary
True,53434228,SEO - Other
True,53434229,SEO - Glossary

2

Answers


  1. Assuming strings representations of lists in df2, use ast.literal_eval, then groupby.agg, explode, and merge:

    import ast
    
    df1['trackedSearchId'] = df1['trackedSearchId'].astype(str)
    
    out = df1.merge((df2.assign(trackedSearchIds=df2['trackedSearchIds'].str.findall(r'(d+)')).explode('trackedSearchIds')
                        .groupby('trackedSearchIds').agg('&'.join)
                   ),
                   left_on='trackedSearchId', right_on='trackedSearchIds')
    
    

    Output:

       isActive trackedSearchId                         Primary Keyword Group(s)
    0      True        53436615  SEO - Directory-Deployment&SEO - Other-Glossary
    1      True        53434228                                      SEO - Other
    2      True        53434229                                   SEO - Glossary
    
    Login or Signup to reply.
  2. You can use DataFrame.explode of converted values to list by ast.literal_eval if necessary, aggregate join in GroupBy.agg and append new column in DataFrame.join:

    import ast
    
    df = (df1.join(df2.assign(trackedSearchIds=df2['trackedSearchIds'].apply(ast.literal_eval))
               .explode('trackedSearchIds')
               .groupby('trackedSearchIds')['Primary Keyword Group(s)']
               .agg('&'.join), on='trackedSearchId'))
    print (df)
       isActive  trackedSearchId                         Primary Keyword Group(s)
    0      True         53436615  SEO - Directory-Deployment&SEO - Other-Glossary
    1      True         53434228                                      SEO - Other
    2      True         53434229                                   SEO - Glossary
    

    If values are lists:

    df = (df1.join(df2
               .explode('trackedSearchIds')
               .groupby('trackedSearchIds')['Primary Keyword Group(s)']
               .agg('&'.join), on='trackedSearchId'))
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search