skip to Main Content

I’m trying to assign colors to links within a Sankey diagram command in Rstudio. Specifically, I would like the links to be colored by their source node group (SOF_Data$Species_Binomial).

The links and nodes already display properly, but when the command to color links is included (sankeyNetwork::colourScale), the diagram displays blank. Additionally, the command to color nodes (sankeyNetwork::NodeGroup) displays as the default blue, regardless of the color input from nodes_SOF_Data$color.

Starting data frame, node list creation:

library(dplyr)
library(networkD3)
library(htmlwidgets)
library(data.table)

SOF_Data <- data.frame(
  Species_Binomial = c("C. artedi", "C. artedi", "C. artedi", "C. artedi", "C. artedi", "C. artedi", "C. fera", "C. fera"),
  Life_Stage = c("Larva/fry", "Embryotic/egg", "Embryotic/egg", "Embryotic/egg", "Embryotic/egg", "Larva/fry", "Embryotic/egg", "Larva/fry"),
  Effect_Category = c("Growth", "Growth", "Survival", "Growth", "Growth", "Growth", "Growth", "Growth"),
  Categorical_Effect = c("Growth rate", "Other - Specific", "Survival - Specific", "Development rate", "50% hatching time", "Growth rate", "Development rate", "Otolith growth"))
 
nodes_SOF_Data <- data.frame(name = unique(c(
  SOF_Data$Species_Binomial,
  SOF_Data$Life_Stage,
  SOF_Data$Effect_Category,
  SOF_Data$Categorical_Effect))) 
nodes_SOF_Data$color <- "#000"

Links creation:

links1_SOF_Data <- SOF_Data %>%
  group_by(Species_Binomial, Life_Stage) %>%
  summarize(value = n()) %>%
  ungroup() %>%
  mutate(source = match(Species_Binomial, nodes_SOF_Data$name) - 1,
         target = match(Life_Stage, nodes_SOF_Data$name) - 1,
         LinkGroup = Species_Binomial)

links2_SOF_Data <- SOF_Data %>% 
  group_by(Species_Binomial, Life_Stage, Effect_Category) %>% 
  summarize(value = n()) %>% 
  ungroup() %>% 
  mutate(source = match(Life_Stage, nodes_SOF_Data$name) - 1,
         target = match(Effect_Category, nodes_SOF_Data$name) - 1,
         LinkGroup = Species_Binomial) 

links3_SOF_Data <- SOF_Data %>% 
  group_by(Species_Binomial, Effect_Category, Categorical_Effect) %>% 
  summarize(value = n()) %>% 
  ungroup() %>% 
  mutate(source = match(Effect_Category, nodes_SOF_Data$name) - 1,
         target = match(Categorical_Effect, nodes_SOF_Data$name) - 1,
         LinkGroup = Species_Binomial)

links_SOF_Data <- bind_rows(links1_SOF_Data, links2_SOF_Data, links3_SOF_Data)

links_SOF_Data <- links_SOF_Data %>%
  mutate(color = case_when(Species_Binomial == "C. artedi" ~ "#66c2a5", 
Species_Binomial == "C. fera" ~ "#e78ac3"))

Sankey graph code:

colour_scale_species <- JS("function(d) { return d.color; }")

sankey_SOF_Data <- sankeyNetwork(Links = links_SOF_Data, 
                                 Nodes = nodes_SOF_Data, 
                                 Source = "source", 
                                 Target = "target", 
                                 Value = "value", 
                                 NodeID = "name", 
                                 units = "Count", 
                                 fontSize = 12, 
                                 nodeWidth = 30, 
                                 NodeGroup = "color", 
                                 LinkGroup = "LinkGroup", 
                                 colourScale = colour_scale_species)
sankey_SOF_Data

I’ve tried imbedding a color hex code column in the links_PaperData column and calling directly from that, although it doesn’t seem to fix the issue.

2

Answers


  1. The main problem is the way how the package parses the coloring function. If you look into the source code you will see that the package simply eval argument colourScale. Hence, you have to pass a string which returns a function.

    The easiest way to do so, is to use a construct like this:

    (function() {
      return function(d) {
       // do whatever
      }
    })()
    

    The outer function encapsulates the inner function and simply returns it. As it is directly invoked (note the brackets), eval will indeed receive a function which it can eventually use (this is a bit of a cumbersome interface, and the package author could simplify that by using a different approach).

    The second issue is that the color scale function (once we can correctly pass it to sankeyNetwork) will simply receive the value of column NodeGroup and LinkGroup respectively. So what you need to do is, to create a column with the link color in you link data.

    Code speaks a thousand words, so here’s a working example (N.B. I refactored your code a bit to avoid duplication and – to show the intended effect of links having the color of their source node – decided to color each node group sepearately):

    library(dplyr)
    library(networkD3)
    library(htmlwidgets)
    library(purrr)
    library(tidyr)
    
    SOF_Data <- tibble(
      Species_Binomial = c("C. artedi", "C. artedi", "C. artedi", 
                           "C. artedi", "C. artedi", 
                           "C. artedi", "C. fera", "C. fera"),
      Life_Stage = c("Larva/fry", "Embryotic/egg", "Embryotic/egg", "Embryotic/egg", 
                     "Embryotic/egg", "Larva/fry", "Embryotic/egg", "Larva/fry"),
      Effect_Category = c("Growth", "Growth", "Survival", "Growth", "Growth", "Growth", 
                          "Growth", "Growth"),
      Categorical_Effect = c("Growth rate", "Other - Specific", "Survival - Specific", 
                             "Development rate", "50% hatching time", "Growth rate", 
                             "Development rate", "Otolith growth"))
    
    col_pal <- viridisLite::viridis(4)
    
    nodes <- SOF_Data %>% 
      pivot_longer(everything(), names_to = "category", values_to = "name") %>% 
      distinct() %>% 
      mutate(order = match(category, names(SOF_Data)),
             color = col_pal[order]) %>% 
      arrange(order) %>% 
      select(name, color)
    
    lnks <- map2(
      head(names(SOF_Data), -1L),
      tail(names(SOF_Data), -1L),
      function(col_1, col_2) {
        col_1 <- sym(col_1)
        col_2 <- sym(col_2)
        SOF_Data %>% 
          group_by(!!col_1, !!col_2) %>% 
          summarize(value = n()) %>% 
          rename(source_name = 1L,
                 target_name = 2L)
      }
    ) %>% 
      list_rbind() %>% 
      mutate(source = match(source_name, nodes %>% pull(name)) -1L,
             target = match(target_name, nodes %>% pull(name)) -1L) %>% 
      inner_join(nodes, 
                 c(source_name = "name"))
    
    colour_identity <- JS("(function() {
      return function(d) {
        return d;
      }
    })()")
    
    sankeyNetwork(Links = lnks, 
                  Nodes = nodes, 
                  Source = "source", 
                  Target = "target", 
                  Value = "value", 
                  NodeID = "name", 
                  units = "Count", 
                  fontSize = 12, 
                  nodeWidth = 30, 
                  NodeGroup = "color", 
                  LinkGroup = "color", 
                  colourScale = colour_identity)
    

    Which produces the following graph:

    A sankey chart where nodes are colored according to their y-posiiton and links bear the color of their source node

    Login or Signup to reply.
  2. To process your data into the appropriate nodes and links data frames, here is my suggestion…

    SOF_Data <- data.frame(
      Species_Binomial = c("C. artedi", "C. artedi", "C. artedi", "C. artedi", "C. artedi", "C. artedi", "C. fera", "C. fera"),
      Life_Stage = c("Larva/fry", "Embryotic/egg", "Embryotic/egg", "Embryotic/egg", "Embryotic/egg", "Larva/fry", "Embryotic/egg", "Larva/fry"),
      Effect_Category = c("Growth", "Growth", "Survival", "Growth", "Growth", "Growth", "Growth", "Growth"),
      Categorical_Effect = c("Growth rate", "Other - Specific", "Survival - Specific", "Development rate", "50% hatching time", "Growth rate", "Development rate", "Otolith growth"))
    
    library(dplyr)
    library(tidyr)
    library(networkD3)
    
    links <-
      SOF_Data %>% 
      mutate(row = row_number()) %>% 
      pivot_longer(cols = -row, names_to = "column", values_to = "source") %>% 
      mutate(column = match(column, names(SOF_Data))) %>% 
      mutate(target = lead(source, order_by = column), .by = "row") %>% 
      filter(!is.na(target)) %>% 
      summarise(value = n(), .by = c(source, target))
    
    nodes <- data.frame(name = unique(c(links$source, links$target)))
    
    links$source_id <- match(links$source, nodes$name) - 1
    links$target_id <- match(links$target, nodes$name) - 1
    
    nodes
    #>                   name
    #> 1            C. artedi
    #> 2            Larva/fry
    #> 3               Growth
    #> 4        Embryotic/egg
    #> 5             Survival
    #> 6              C. fera
    #> 7          Growth rate
    #> 8     Other - Specific
    #> 9  Survival - Specific
    #> 10    Development rate
    #> 11   50% hatching time
    #> 12      Otolith growth
    
    links
    #> # A tibble: 13 × 5
    #>    source        target              value source_id target_id
    #>    <chr>         <chr>               <int>     <dbl>     <dbl>
    #>  1 C. artedi     Larva/fry               2         0         1
    #>  2 Larva/fry     Growth                  3         1         2
    #>  3 Growth        Growth rate             2         2         6
    #>  4 C. artedi     Embryotic/egg           4         0         3
    #>  5 Embryotic/egg Growth                  4         3         2
    #>  6 Growth        Other - Specific        1         2         7
    #>  7 Embryotic/egg Survival                1         3         4
    #>  8 Survival      Survival - Specific     1         4         8
    #>  9 Growth        Development rate        2         2         9
    #> 10 Growth        50% hatching time       1         2        10
    #> 11 C. fera       Embryotic/egg           1         5         3
    #> 12 C. fera       Larva/fry               1         5         1
    #> 13 Growth        Otolith growth          1         2        11
    

    With those, if you just want to make sure that the links match whatever color their source nodes are, you can simply set LinkGroup = "source" because the source columns in the links data frame should match the name of the source node in the nodes data frame (that is why I made the new numeric source_id column).

    sankeyNetwork(
      Links = links,
      Nodes = nodes,
      Source = "source_id",
      Target = "target_id",
      Value = "value",
      NodeID = "name",
      units = "Count",
      fontSize = 12,
      nodeWidth = 30,
      LinkGroup = "source"
    )
    

    If you want to set specific colors for the nodes and have the link colors follow, you can add a color column to the nodes and links data frames and use an identity JavaScript function for the colourScale argument.

    nodes$color <- viridisLite::viridis(n = nrow(nodes))
    links$color <- nodes$color[match(links$source, nodes$name)]
    
    nodes
    #>                   name     color
    #> 1            C. artedi #440154FF
    #> 2            Larva/fry #482173FF
    #> 3               Growth #433E85FF
    #> 4        Embryotic/egg #38598CFF
    #> 5             Survival #2D708EFF
    #> 6              C. fera #25858EFF
    #> 7          Growth rate #1E9B8AFF
    #> 8     Other - Specific #2BB07FFF
    #> 9  Survival - Specific #51C56AFF
    #> 10    Development rate #85D54AFF
    #> 11   50% hatching time #C2DF23FF
    #> 12      Otolith growth #FDE725FF
    
    links
    #> # A tibble: 13 × 6
    #>    source        target              value source_id target_id color    
    #>    <chr>         <chr>               <int>     <dbl>     <dbl> <chr>    
    #>  1 C. artedi     Larva/fry               2         0         1 #440154FF
    #>  2 Larva/fry     Growth                  3         1         2 #482173FF
    #>  3 Growth        Growth rate             2         2         6 #433E85FF
    #>  4 C. artedi     Embryotic/egg           4         0         3 #440154FF
    #>  5 Embryotic/egg Growth                  4         3         2 #38598CFF
    #>  6 Growth        Other - Specific        1         2         7 #433E85FF
    #>  7 Embryotic/egg Survival                1         3         4 #38598CFF
    #>  8 Survival      Survival - Specific     1         4         8 #2D708EFF
    #>  9 Growth        Development rate        2         2         9 #433E85FF
    #> 10 Growth        50% hatching time       1         2        10 #433E85FF
    #> 11 C. fera       Embryotic/egg           1         5         3 #25858EFF
    #> 12 C. fera       Larva/fry               1         5         1 #25858EFF
    #> 13 Growth        Otolith growth          1         2        11 #433E85FF
    
    sankeyNetwork(
      Links = links,
      Nodes = nodes,
      Source = "source_id",
      Target = "target_id",
      Value = "value",
      NodeID = "name",
      units = "Count",
      fontSize = 12,
      nodeWidth = 30,
      NodeGroup = "color",
      LinkGroup = "color",
      colourScale = "f => f"
    )
    

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search