
I use a dynamic variable (e.g. ID) to reference a column name that changes depending on which gene I am processing at the time. I then use case_when() within mutate() to create a new column whose values depend on the dynamic column.

I thought that !! (bang-bang) was what I needed to force evaluation of the variable's content; however, I did not get the expected output in my new column. Only !!as.name gave me the output I was expecting, and I do not fully understand why. Could someone explain why using only !! isn't appropriate in this case and what is happening in !!as.name?

Here is a simple reproducible example that I made up to demo what I am experiencing:

library(tidyverse)

ID <- "birth_year"

# Correct output
test <- starwars %>%
  mutate(FootballLeague = case_when(
    !!as.name(ID) < 10 ~ "U10",
    !!as.name(ID) >= 10 & !!as.name(ID) < 50 ~ "U50",
    !!as.name(ID) >= 50 & !!as.name(ID) < 100 ~ "U100",
    !!as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))

# Incorrect output
test2 <- starwars %>%
  mutate(FootballLeague = case_when(
    !!(ID) < 10 ~ "U10",
    !!(ID) >= 10 & !!(ID) < 50 ~ "U50",
    !!(ID) >= 50 & !!(ID) < 100 ~ "U100",
    !!(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))

# Incorrect output
test3 <- starwars %>%
  mutate(FootballLeague = case_when(
    as.name(ID) < 10 ~ "U10",
    as.name(ID) >= 10 & as.name(ID) < 50 ~ "U50",
    as.name(ID) >= 50 & as.name(ID) < 100 ~ "U100",
    as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))

identical(test, test2)
# FALSE

identical(test2, test3)
# TRUE

sessionInfo()
#R version 4.0.2 (2020-06-22)
#Platform: x86_64-centos7-linux-gnu (64-bit)
#Running under: CentOS Linux 7 (Core)

# tidyverse_1.3.0
# dplyr_1.0.2

Cheers!

2 Answers


  1. You can wrap your expressions in the function quo() to see the result of the operation after applying the !! operator. For simplicity I will use a shorter expression for demonstration:

    Preparations:

    library(tidyverse)
    ID <- "birth_year"
    
    ## Test without quasiquotation:
    starwars %>% 
      filter(birth_year < 50)
    

    Experiment 1:

    quo(
      starwars %>% 
        filter(ID < 50)
    )
    ## result: starwars %>% filter(ID < 50)
    

    We learn: the expression is captured exactly as written. ID stays the name ID and is never replaced by birth_year. So we need a mechanism to tell filter() that ID is a variable whose value should be used instead.

    So we can use the !! operator to tell filter() to evaluate ID first and substitute its value into the expression.

    Experiment 2:

    quo(
      starwars %>% 
        filter(!!ID < 50)
    ) 
    ## result: starwars %>% filter("birth_year" < 50)
    

    We learn: the !! operator did its job, and ID was replaced with its value. But that value is the string "birth_year" (note the quotes in the result), and a string inside an expression is just a character constant, not a reference to a column. Tidyverse functions such as filter() look up columns through bare names (symbols), without quotes. So here filter() never touches the birth_year column; it compares the constant "birth_year" with 50, which gives the same single logical value for every row.
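
    To see what that comparison amounts to outside of dplyr, here is a minimal base R sketch. The name cmp is only illustrative; the point is that the condition is a constant and has length 1, no matter how many rows the data has.

    ```r
    ID <- "birth_year"

    # After !!ID is substituted, the condition reads "birth_year" < 50.
    # R coerces 50 to the string "50" and compares two strings, so the
    # result is one logical value rather than one value per row.
    cmp <- "birth_year" < 50

    is.logical(cmp)
    length(cmp)   # 1
    ```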

    What does the function as.name() do?

    This is a base R function that takes a string (or a variable containing a string) and returns it as a symbol, i.e. a bare variable name.
    So if you call as.name(ID) in base R, the result is birth_year, this time without quotes, just like the tidyverse expects it. So let’s try it:
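
    A quick way to check this in plain R (sym_id is just an illustrative name):

    ```r
    ID <- "birth_year"

    # as.name() turns the string into a symbol, the same kind of object
    # you get by quoting the bare name birth_year.
    sym_id <- as.name(ID)

    is.character(ID)                      # TRUE: ID holds a string
    is.symbol(sym_id)                     # TRUE: sym_id is a bare name
    identical(sym_id, quote(birth_year))  # TRUE
    ```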

    Experiment 3:

    quo(
      starwars %>% 
        filter(as.name(ID) < 50)
    ) 
    ## result: starwars %>% filter(as.name(ID) < 50)
    

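    We learn: this did not work either, because nothing in the captured expression was replaced. When filter() later evaluates as.name(ID), the call returns the symbol birth_year as a value, but that symbol is not evaluated a second time as a column. For the comparison, R deparses it back to the string "birth_year", which puts us in the same situation as Experiment 2.

    So we need to combine the two things to make it work: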

    1. Use as.name() to convert the string to a variable name.
    2. Use !! to tell filter() it should not take things "as is", but substitute the real value.

    Experiment 4:

    quo(
      starwars %>% 
        filter(!!as.name(ID) < 50)
    ) 
    ## result: starwars %>% filter(birth_year < 50)
    

    Now it works! 🙂
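
    If it helps, the difference between substituting a string and substituting a symbol can also be sketched in base R with bquote(), where .() plays a role similar to !!. This is only an analogy, not how dplyr is implemented, and the data frame toy is made up for illustration.

    ```r
    ID <- "birth_year"

    # Substitute the string: the expression contains a character constant.
    with_string <- bquote(.(ID) < 50)
    # Substitute the symbol: the expression contains a bare column name.
    with_symbol <- bquote(.(as.name(ID)) < 50)

    deparse(with_string)  # "\"birth_year\" < 50"
    deparse(with_symbol)  # "birth_year < 50"

    # Only the symbol version is resolved against the data.
    toy <- data.frame(birth_year = c(19, 112, 33))
    eval(with_symbol, toy)  # TRUE FALSE TRUE
    ```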

    I have used filter() in my experiments, but it works exactly the same with mutate() and other tidyverse functions.
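
    For example, here is a rough sketch of the same pattern inside mutate(), wrapped in a function so the column name can change per call. The helper add_league() and the data frame toy are hypothetical, and rlang::sym() is used in place of as.name() (it does the same string-to-symbol conversion). Because case_when() stops at the first matching condition, the lower bounds from the question are left out.

    ```r
    library(dplyr)

    # Hypothetical helper: bin a numeric column whose name is given as a string.
    add_league <- function(data, col) {
      col_sym <- rlang::sym(col)  # string -> symbol, like as.name()
      data %>%
        mutate(FootballLeague = case_when(
          !!col_sym < 10 ~ "U10",
          !!col_sym < 50 ~ "U50",
          !!col_sym < 100 ~ "U100",
          !!col_sym >= 100 ~ "Senior",
          TRUE ~ "Others"          # reached when the value is NA
        ))
    }

    toy <- data.frame(birth_year = c(19, 112, NA), mass = c(77, 5, 136))
    add_league(toy, "birth_year")$FootballLeague  # "U50" "Senior" "Others"
    add_league(toy, "mass")$FootballLeague        # "U100" "U10" "Senior"
    ```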

  2. To make it easier, you can also use .data[[]], as suggested by @Lionel Henry in this comment. See also the rlang 0.4.0 release notes.

    library(tidyverse)
    
    ID <- "birth_year"
    
    # Correct output
    test <- starwars %>%
      mutate(FootballLeague = case_when(
        !!as.name(ID) < 10 ~ "U10",
        !!as.name(ID) >= 10 & !!as.name(ID) < 50 ~ "U50",
        !!as.name(ID) >= 50 & !!as.name(ID) < 100 ~ "U100",
        !!as.name(ID) >= 100 ~ "Senior",
        TRUE ~ "Others"
      ))
    test
    

    Using .data

    test2 <- starwars %>%
      mutate(FootballLeague = case_when(
        .data[[ID]]   < 10 ~ "U10",
        .data[[ID]]  >= 10 & .data[[ID]]  < 50 ~ "U50",
        .data[[ID]]  >= 50 & .data[[ID]]  < 100 ~ "U100",
        .data[[ID]]  >= 100 ~ "Senior",
        TRUE ~ "Others"
      ))
    test2
    #> # A tibble: 87 x 15
    #>    name               height  mass hair_color    skin_color  eye_color
    #>    <chr>               <int> <dbl> <chr>         <chr>       <chr>    
    #>  1 Luke Skywalker        172    77 blond         fair        blue     
    #>  2 C-3PO                 167    75 <NA>          gold        yellow   
    #>  3 R2-D2                  96    32 <NA>          white, blue red      
    #>  4 Darth Vader           202   136 none          white       yellow   
    #>  5 Leia Organa           150    49 brown         light       brown    
    #>  6 Owen Lars             178   120 brown, grey   light       blue     
    #>  7 Beru Whitesun lars    165    75 brown         light       blue     
    #>  8 R5-D4                  97    32 <NA>          white, red  red      
    #>  9 Biggs Darklighter     183    84 black         light       brown    
    #> 10 Obi-Wan Kenobi        182    77 auburn, white fair        blue-gray
    #> 11 Anakin Skywalker      188    84 blond         fair        blue     
    #> 12 Wilhuff Tarkin        180    NA auburn, grey  fair        blue     
    #> 13 Chewbacca             228   112 brown         unknown     blue     
    #> 14 Han Solo              180    80 brown         fair        brown    
    #> 15 Greedo                173    74 <NA>          green       black    
    #>    birth_year sex    gender    homeworld species films     vehicles  starships
    #>         <dbl> <chr>  <chr>     <chr>     <chr>   <list>    <list>    <list>   
    #>  1       19   male   masculine Tatooine  Human   <chr [5]> <chr [2]> <chr [2]>
    #>  2      112   none   masculine Tatooine  Droid   <chr [6]> <chr [0]> <chr [0]>
    #>  3       33   none   masculine Naboo     Droid   <chr [7]> <chr [0]> <chr [0]>
    #>  4       41.9 male   masculine Tatooine  Human   <chr [4]> <chr [0]> <chr [1]>
    #>  5       19   female feminine  Alderaan  Human   <chr [5]> <chr [1]> <chr [0]>
    #>  6       52   male   masculine Tatooine  Human   <chr [3]> <chr [0]> <chr [0]>
    #>  7       47   female feminine  Tatooine  Human   <chr [3]> <chr [0]> <chr [0]>
    #>  8       NA   none   masculine Tatooine  Droid   <chr [1]> <chr [0]> <chr [0]>
    #>  9       24   male   masculine Tatooine  Human   <chr [1]> <chr [0]> <chr [1]>
    #> 10       57   male   masculine Stewjon   Human   <chr [6]> <chr [1]> <chr [5]>
    #> 11       41.9 male   masculine Tatooine  Human   <chr [3]> <chr [2]> <chr [3]>
    #> 12       64   male   masculine Eriadu    Human   <chr [2]> <chr [0]> <chr [0]>
    #> 13      200   male   masculine Kashyyyk  Wookiee <chr [5]> <chr [1]> <chr [2]>
    #> 14       29   male   masculine Corellia  Human   <chr [4]> <chr [0]> <chr [2]>
    #> 15       44   male   masculine Rodia     Rodian  <chr [1]> <chr [0]> <chr [0]>
    #>    FootballLeague
    #>    <chr>         
    #>  1 U50           
    #>  2 Senior        
    #>  3 U50           
    #>  4 U50           
    #>  5 U50           
    #>  6 U100          
    #>  7 U50           
    #>  8 Others        
    #>  9 U50           
    #> 10 U100          
    #> 11 U50           
    #> 12 U100          
    #> 13 Senior        
    #> 14 U50           
    #> 15 U50           
    #> # ... with 72 more rows
    

    Check if they are the same

    identical(test, test2)
    #> [1] TRUE
    

    Created on 2020-11-26 by the reprex package (v0.3.0)
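
    Since the column name in the question changes per gene, the .data[[ ]] form also fits naturally in a loop over names. A small sketch, assuming made-up columns geneA and geneB and a hypothetical helper count_below():

    ```r
    library(dplyr)

    # Hypothetical helper: count rows where the named column is below a cutoff.
    count_below <- function(data, col, cutoff) {
      data %>%
        filter(.data[[col]] < cutoff) %>%
        nrow()
    }

    # Made-up data with one column per gene.
    toy <- data.frame(geneA = c(1, 60, 120), geneB = c(5, 7, 200))

    # The column name is passed as a plain string; no !! or as.name() needed.
    res <- sapply(c("geneA", "geneB"), count_below, data = toy, cutoff = 50)
    res  # geneA = 1, geneB = 2
    ```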
