
I would like to create a new column that records which keyword grep matched. I have a data frame and a list of keywords, and I want to check whether each row contains any of them. When a row does contain a keyword, the new column should show which one.

This is my data:

id // skills
1 // this is a skill for xoobst
2 // artificial intelligence
3 // logistic regression

I used the below code to grep words.

keyword <- "xoobst|logistic|intelligence"
result <- df[grep(keyword, df$skills, ignore.case = TRUE),]

This is the outcome I want:

id // skills // words
1 // this is a skill for xoobst // xoobst
2 // artificial intelligence // intelligence
3 // logistic regression // logistic

I tried the code below, but it returned the full sentence rather than the keyword that was matched.

keys <- sprintf(".*(%s).*", keyword)
df$words <- sub(keys, "\\1", df$skills)

What alternative approach should I use? Thank you in advance!

3 Answers


  1. You can use stringr:

    df <- data.frame(
      id = c(1, 2, 3), 
      skills = c("this is a skill for xoobst", "artificial intelligence", "logistic regression")
    )
    
    df |>
      dplyr::mutate(words = stringr::str_extract(skills, "xoobst|logistic|intelligence"))
    #>   id                     skills        words
    #> 1  1 this is a skill for xoobst       xoobst
    #> 2  2    artificial intelligence intelligence
    #> 3  3        logistic regression     logistic
    
  2. Using R base functions:

    > df$words <- gsub(".*(xoobst|logistic|intelligence).*", "\\1", df$skills)
    > df
      id                     skills        words
    1  1 this is a skill for xoobst       xoobst
    2  2    artificial intelligence intelligence
    3  3        logistic regression     logistic
    
  3. Using grep with sapply and strsplit.

    df$words <- sapply(strsplit(df$skills, " "), function(x) grep(keyword, x, value = TRUE))
    df
      id                     skills        words
    1  1 this is a skill for xoobst       xoobst
    2  2    artificial intelligence intelligence
    3  3        logistic regression     logistic
    

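    This assumes that no keyword contains a space and that every row matches exactly one keyword; otherwise sapply returns a list instead of a character vector.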

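The answers above all rely on every row containing exactly one keyword. As a minimal base R sketch of how the other cases could be handled, the example below uses regexpr/gregexpr with regmatches. The last two rows of the sample data and the all_words column are hypothetical additions for illustration, not part of the question. Note that sub and gsub return the input unchanged when the pattern does not match, which may be why the attempt in the question produced full sentences on real data.

```r
# Data from the question plus two hypothetical rows:
# one with two keywords in mixed case, one with no keyword at all.
df <- data.frame(
  id = 1:5,
  skills = c("this is a skill for xoobst", "artificial intelligence",
             "logistic regression", "Logistic models and XOOBST",
             "plain text"),
  stringsAsFactors = FALSE
)
keyword <- "xoobst|logistic|intelligence"

# First match per row: regexpr() gives the match position, or -1 if none.
m <- regexpr(keyword, df$skills, ignore.case = TRUE)

# regmatches() returns only the rows that matched, so start from NA
# and fill in the matched rows.
df$words <- NA_character_
df$words[m != -1] <- regmatches(df$skills, m)

# All matches per row, collapsed into one comma-separated string
# (an empty string when nothing matched).
all_m <- regmatches(df$skills, gregexpr(keyword, df$skills, ignore.case = TRUE))
df$all_words <- vapply(all_m, paste, character(1), collapse = ", ")

df
```

Rows without a keyword get NA in words instead of the original sentence, and the matched text keeps the casing it has in skills.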