I use a dynamic variable (e.g. ID) to reference a column name that changes depending on which gene I am processing at the time. I then use case_when within mutate to create a new column whose values depend on that dynamic column.
I thought that !! (bang-bang) was what I needed to force evaluation of the variable's content; however, I did not get the expected output in my new column. Only !!as.name gave me the output I was expecting, and I do not fully understand why. Could someone explain why using only !! isn't appropriate in this case, and what is happening in !!as.name?
Here is a simple reproducible example that I made up to demo what I am experiencing:
library(tidyverse)
ID <- "birth_year"
# Correct output
test <- starwars %>%
  mutate(FootballLeague = case_when(
    !!as.name(ID) < 10 ~ "U10",
    !!as.name(ID) >= 10 & !!as.name(ID) < 50 ~ "U50",
    !!as.name(ID) >= 50 & !!as.name(ID) < 100 ~ "U100",
    !!as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
# Incorrect output
test2 <- starwars %>%
  mutate(FootballLeague = case_when(
    !!(ID) < 10 ~ "U10",
    !!(ID) >= 10 & !!(ID) < 50 ~ "U50",
    !!(ID) >= 50 & !!(ID) < 100 ~ "U100",
    !!(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
# Incorrect output
test3 <- starwars %>%
  mutate(FootballLeague = case_when(
    as.name(ID) < 10 ~ "U10",
    as.name(ID) >= 10 & as.name(ID) < 50 ~ "U50",
    as.name(ID) >= 50 & as.name(ID) < 100 ~ "U100",
    as.name(ID) >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
identical(test, test2)
# FALSE
identical(test2, test3)
# TRUE
sessionInfo()
#R version 4.0.2 (2020-06-22)
#Platform: x86_64-centos7-linux-gnu (64-bit)
#Running under: CentOS Linux 7 (Core)
# tidyverse_1.3.0
# dplyr_1.0.2
Cheers!
2 Answers
You can wrap your expressions in the function quo() to see the result of the operation after applying the !! operator. For simplicity, I will use a shorter expression for demonstration.

Preparations:
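A minimal sketch of the setup these experiments rely on, assuming the same ID variable as in the question and that dplyr is attached (it provides the starwars data and re-exports quo() from rlang):

```r
library(dplyr)

# The column name is stored as a string, as in the question
ID <- "birth_year"
```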
Experiment 1:
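A minimal sketch of what this experiment could look like. The shortened condition ID < 10 is an assumption chosen for illustration, and rlang::quo_get_expr() is used here only to pull the captured expression out of the quosure:

```r
library(dplyr)

ID <- "birth_year"

# quo() captures the call without running it, so we can inspect
# what filter() would be handed
q1 <- quo(filter(starwars, ID < 10))
rlang::quo_get_expr(q1)
# filter(starwars, ID < 10)
```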
We learn: filter() does not treat ID as a variable, but takes it "as is". So we need a mechanism to tell filter() that it should treat ID as a variable and use its value.
–> The !! operator can be used to tell filter() that it should treat an expression as a variable and substitute its value.

Experiment 2:
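A minimal sketch of this experiment, under the same assumptions (illustrative condition < 10, rlang::quo_get_expr() to show the captured expression). The last line is an addition that hints at why the question's test2 ends up with "Others" everywhere:

```r
library(dplyr)

ID <- "birth_year"

# !! replaces ID with its value while the expression is captured
q2 <- quo(filter(starwars, !!ID < 10))
rlang::quo_get_expr(q2)
# filter(starwars, "birth_year" < 10)

# What is left is a constant string compared with a number.
# R coerces 10 to "10" and compares the two strings.
"birth_year" < 10
# FALSE
```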
We learn: The !! operator has indeed worked: ID was replaced with its value. But the value of ID is the string "birth_year". Note the quotes in the result. As you probably know, tidyverse functions don't take column names as strings here; they want the bare names, without quotes. Compare with Experiment 1: filter() takes everything "as is", so it now sees the literal string "birth_year" (including the quotes!) rather than a reference to the birth_year column, and the comparison is made against that constant string.

What does the function as.name() do? This is a base R function that takes a string (or a variable containing a string) and returns the content of the string as a variable name (a symbol).
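A short base R sketch of that conversion, using only as.name(), quote(), class() and is.symbol(). The variable name sym_id is made up for illustration:

```r
ID <- "birth_year"

# Turn the string into a symbol (an unquoted name)
sym_id <- as.name(ID)

class(sym_id)
# "name"
is.symbol(sym_id)
# TRUE

# It is the same object you get by quoting the bare name
identical(sym_id, quote(birth_year))
# TRUE
```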
So if you call as.name(ID) in base R, the result is birth_year, this time without quotes, just like the tidyverse expects it. So let's try it:

Experiment 3:
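A minimal sketch of this experiment, again with an illustrative condition < 10 and rlang::quo_get_expr() to show what was captured:

```r
library(dplyr)

ID <- "birth_year"

# Without !!, the call to as.name() is captured unevaluated
q3 <- quo(filter(starwars, as.name(ID) < 10))
rlang::quo_get_expr(q3)
# filter(starwars, as.name(ID) < 10)
```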
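We learn: This did not work, because, again, filter() takes everything "as is". The call as.name(ID) stays in the expression unchanged, so birth_year never appears there as a column name; as.name(ID) only runs later as an ordinary function call, and the symbol it returns is treated as a plain value rather than looked up as a column.
–> We need to combine the two things to make it work:
as.name() to convert the string to a variable name.
!! to tell filter() that it should not take things "as is", but substitute the real value.

Experiment 4: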
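A minimal sketch of the combined form, under the same assumptions as the earlier experiments. The comparison against a hand-written filter() call is an addition to confirm that the column is really being used:

```r
library(dplyr)

ID <- "birth_year"

# as.name() builds the symbol, !! splices it into the expression
q4 <- quo(filter(starwars, !!as.name(ID) < 10))
rlang::quo_get_expr(q4)
# filter(starwars, birth_year < 10)

# Running it gives the same rows as writing the column name by hand
dynamic <- filter(starwars, !!as.name(ID) < 10)
by_hand <- filter(starwars, birth_year < 10)
identical(dynamic, by_hand)
# TRUE
```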
Now it works! 🙂

I have used filter() in my experiments, but it works exactly the same with mutate() and other tidyverse functions.

To make it easier, you can also use .data[[]] as suggested by @Lionel Henry in this comment. See also the rlang 0.4.0 release notes.

Using .data
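A sketch of how the question's case_when() call might be written with the .data pronoun, assuming the same ID variable. The object name test_data is made up here:

```r
library(dplyr)

ID <- "birth_year"

# .data[[ID]] looks up the column whose name is stored in ID
test_data <- starwars %>%
  mutate(FootballLeague = case_when(
    .data[[ID]] < 10 ~ "U10",
    .data[[ID]] >= 10 & .data[[ID]] < 50 ~ "U50",
    .data[[ID]] >= 50 & .data[[ID]] < 100 ~ "U100",
    .data[[ID]] >= 100 ~ "Senior",
    TRUE ~ "Others"
  ))
```

Compared with !!as.name(ID), this needs no unquoting, and the string is used directly as the column name.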
Check if they are the same
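One way this check could be done is to build the new column both ways and compare the results. The helper name league() is hypothetical, introduced only to avoid repeating the case_when() call:

```r
library(dplyr)

ID <- "birth_year"

# Hypothetical helper: the league rules from the question, applied to a vector
league <- function(x) {
  case_when(
    x < 10 ~ "U10",
    x >= 10 & x < 50 ~ "U50",
    x >= 50 & x < 100 ~ "U100",
    x >= 100 ~ "Senior",
    TRUE ~ "Others"
  )
}

via_bang  <- starwars %>% mutate(FootballLeague = league(!!as.name(ID)))
via_pronoun <- starwars %>% mutate(FootballLeague = league(.data[[ID]]))

identical(via_bang$FootballLeague, via_pronoun$FootballLeague)
# TRUE
```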
Created on 2020-11-26 by the reprex package (v0.3.0)