
I have two tables of Twitter API data bound together, and I want a function that determines whether the text contains the word f150. If it does, it should return ford; if not, it should search the text for the word Silverado, and if Silverado is found, it should return chevy. All others should be null.

I saw this online, but it isn’t working for me. Also, are there wildcards in R like in SQL?

tweet_sentiments <-
 tweet_sentiments %>% 
 mutate(vehicle = if(text = "f150") {ford}
     else_if(text= "silverado"){Chevy})

2 Answers


  1. 1) We can use case_when

    tweet_sentiments %>%
          mutate(vehicle = case_when(text == 'f150' ~ 'ford',
                  text == 'silverado' ~ 'Chevy'))
    

    2) If it is a substring, then use str_detect

    library(stringr)
    tweet_sentiments %>%
          mutate(vehicle = case_when(str_detect(text, 'f150') ~ 'Ford',
                str_detect(text, 'silverado') ~ 'Chevy'))
    

    3) another option is %like%

     library(data.table)
     tweet_sentiments %>%
          mutate(vehicle = case_when(text %like% 'f150' ~ 'ford',
                          text %like% 'silverado' ~ 'Chevy'))
    

    4) another option is rowwise with if/else

    tweet_sentiments %>%
         rowwise %>%
         mutate(vehicle = if(str_detect(text, 'f150')) 'ford' 
            else if(str_detect(text, 'silverado')) 'Chevy' else NA_character_) %>%
         ungroup
    

    5) or we can use a regex join from fuzzyjoin

    keydat <- tibble(text = c('f150', 'silverado'), value = c('ford', 'Chevy'))
    library(fuzzyjoin)
    tweet_sentiments %>%
         regex_left_join(keydat, by = c('text')) %>%
         # value holds the matched label; assumes no vehicle column exists yet
         mutate(vehicle = value, value = NULL)
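
    To make the difference between options 1) and 2) concrete, here is a minimal sketch on a made-up character vector (the sample strings are invented for illustration, not taken from the question's data). It compares an exact comparison with a substring match, and shows stringr's regex(ignore_case = TRUE) for case-insensitive matching:

    ```r
    library(stringr)

    # Hypothetical sample text, standing in for the tweet text column
    text <- c("f150", "just bought an f150", "Silverado for sale")

    # == is TRUE only when the whole string equals "f150"
    exact_match <- text == "f150"

    # str_detect() is TRUE when the pattern occurs anywhere in the string
    substring_match <- str_detect(text, "f150")

    # regex(..., ignore_case = TRUE) makes the pattern match "Silverado" too
    any_case_match <- str_detect(text, regex("silverado", ignore_case = TRUE))
    ```

    For tweets, which rarely consist of a single word, the substring form is probably what is wanted.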
    
    1. Using if inside a mutate call is legal, but the way you use it here is wrong. if looks at a single condition, not a whole column, so since you want to condition on a vector, consider ifelse (base R) or if_else (in dplyr).

      The first change to your code is something like:

      tweet_sentiments %>% 
        mutate(
          vehicle = ifelse(...)
        )
      
    2. text = 'f150' is an assignment; you need a comparison, which is == for equality. Progressive code changes:

      tweet_sentiments %>% 
        mutate(
          vehicle = if_else(text == "f150", "Ford",
                            if_else(text == "silverado", "Chevy", ...))
        )
      
    3. You need a default value, one that is assigned if text is neither "f150" nor "silverado". Options include a literal string like "unknown", or the R-idiomatic NA (which means "not available", effectively "missing" or "could be anything"). Code progress:

      tweet_sentiments %>% 
        mutate(
          vehicle = if_else(text == "f150", "Ford",
                            if_else(text == "silverado", "Chevy", NA_character_))
        )
      

      (R has several typed variants of NA, such as NA_integer_, NA_real_ and NA_character_, and if_else is rather particular about its true= and false= arguments having the same type. If you used ifelse instead, you could have kept a plain NA, at the risk of several of the other problems that base::ifelse presents. It has baggage.)

    4. You mentioned wildcards, which suggests that you may want to find "f150" as a substring in the text, in which case we will want grepl. Code progress:

      tweet_sentiments %>% 
        mutate(
          vehicle = if_else(grepl("f150", text), "Ford",
                            if_else(grepl("silverado", text), "Chevy", NA_character_))
        )
      

      grepl supports ignore.case= as well, in case you want to consider case-insensitive comparisons.

    5. Lastly, working this back around to a dplyr-idiomatic way of doing things … whenever I see more than one nested ifelse (…), I immediately recommend dplyr::case_when. For instance, if you add another car type or two, it gets unwieldy:

      tweet_sentiments %>% 
        mutate(
          vehicle = if_else(grepl("f150", text), "Ford",
                            if_else(grepl("silverado", text), "Chevy",
                                    if_else(grepl("RAV4", text), "Toyota", NA_character_)))
        )
      

      but can be cleaned up (indents and parens) as:

      tweet_sentiments %>% 
        mutate(
          vehicle = case_when(
            grepl("f150", text) ~ "Ford",
            grepl("silverado", text) ~ "Chevy",
            grepl("RAV4", text) ~ "Toyota",
            TRUE ~ NA_character_
          )
        )
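
    The steps above can be tried end to end with a small self-contained sketch. The tibble below is a made-up stand-in for tweet_sentiments (the real table presumably has more columns), and ignore.case = TRUE is added on the assumption that tweets will not be consistently cased:

    ```r
    library(dplyr)

    # Hypothetical sample data; only the text column matters here
    tweet_sentiments <- tibble(
      text = c("Love my new F150!", "silverado towing test",
               "RAV4 road trip", "just a tweet")
    )

    result <- tweet_sentiments %>%
      mutate(
        vehicle = case_when(
          # conditions are checked in order; the first TRUE one wins
          grepl("f150", text, ignore.case = TRUE)      ~ "Ford",
          grepl("silverado", text, ignore.case = TRUE) ~ "Chevy",
          grepl("rav4", text, ignore.case = TRUE)      ~ "Toyota",
          # fallback for rows that matched nothing
          TRUE                                         ~ NA_character_
        )
      )

    result$vehicle
    ```

    The last row matches none of the patterns, so it falls through to the TRUE branch and gets NA.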
      

    Since you asked about "wildcards", if you don’t know about regular expressions, or don’t know the difference between regex and glob-style patterns, then I suggest you look at https://stackoverflow.com/a/22944075/3358272 (and perhaps ?glob2rx, for converting glob-style to regex, since grep* functions only deal with regex or fixed-strings).
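
    As a rough illustration of how SQL-style wildcards map onto what grepl expects, here is a small base-R sketch; the tweets vector is invented for the example. A regex matches anywhere in the string by default, so no surrounding wildcards are needed, and glob2rx can translate a glob-style pattern if that notation is more familiar:

    ```r
    # Hypothetical sample strings
    tweets <- c("my f150 truck", "Silverado at the lot", "f250 is bigger")

    # Plain regex: matches "f150" anywhere, much like SQL's LIKE '%f150%'
    has_f150 <- grepl("f150", tweets)

    # Glob-style pattern converted to a regex before matching
    has_f150_glob <- grepl(glob2rx("*f150*"), tweets)

    # "." is the regex single-character wildcard (SQL "_", glob "?")
    has_f_50 <- grepl("f.50", tweets)
    ```

    Here has_f150 and has_f150_glob agree, while has_f_50 also picks up the "f250" tweet.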
