skip to Main Content

I would like to build a "classical" circular dendrogram plot, such as those displayed in the answer of this post https://biology.stackexchange.com/questions/14152/how-should-i-put-a-large-phylogeny-into-a-scientific-paper using R (I am fairly used to R, but I have never done such plot until now).

I am using environmental metabarcoding datas (Multiple Operation Taxonomic Units, MOTUs). These are DNA sequences, each being identified to a certain taxonomic level, and assigned to it. The resulting data I am considering to build such a plot is the taxonomic "path" of each MOTU, hence the sequence of taxonomic assignation, e.g kingdom;order;class;family;genus;species (example below).

I have tried to different ways of building this plot using igraph package, both not quite satisfying. Big parts of this code are base on the ggtaxplot function from “`metabaR“ package (https://github.com/metabaRfactory/metabaR) package.

I think that I could get what I want from both methods, but I’m not sure any of them is really the effective way of doing it. Any help greatly appreciated !

I am not strictly willing to keep on using igraph, I would be fine with any other ggplot-like method (as I wish to add further information, e.g. labels, colors, on the plot).

Many thanks !

All this code were run on R 4.2.1 on Ubuntu 22.04.2

path <- c("root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Fungi incertae sedis@no rank:Mucoromycota@phylum:Mortierellomycotina@subphylum:Mortierellomycetes@class:Mortierellales@order:Mortierellaceae@family:Mortierella@genus:unclassified Mortierella@no rank",                                            
          "root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:sordariomyceta@clade:Sordariomycetes@class:Xylariomycetidae@subclass:Xylariales@order:unclassified Xylariales@no rank:Xylariales sp.@species",
          "root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Fungi incertae sedis@no rank:Mucoromycota@phylum:Mortierellomycotina@subphylum:Mortierellomycetes@class:Mortierellales@order:Mortierellaceae@family:Linnemannia@genus:Linnemannia zychae@species",                                                  
          "root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Fungi incertae sedis@no rank:Mucoromycota@phylum:Mortierellomycotina@subphylum:Mortierellomycetes@class:Mortierellales@order:Mortierellaceae@family",                                                                                               
          "root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:sordariomyceta@clade:Sordariomycetes@class:Hypocreomycetidae@subclass:Hypocreales@order",                                                     
          "root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:sordariomyceta@clade:Leotiomycetes@class:Helotiales@order",                                                                                   
          "root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:dothideomyceta@clade:Dothideomycetes@class:Pleosporomycetidae@subclass:Pleosporales@order",                                                   
          "root@no rank:Eukaryota@superkingdom:Opisthokonta@clade:Fungi@kingdom:Dikarya@subkingdom:Ascomycota@phylum:saccharomyceta@clade:Pezizomycotina@subphylum:leotiomyceta@clade:sordariomyceta@clade:Leotiomycetes@class:Helotiales@order"                                                                                  
)

I have tried to different ways of building this plot using igraph package, both not quite satisfying. Big parts of this code are base on the ggtaxplot function from “`metabaR“ package (https://github.com/metabaRfactory/metabaR) package :

library(metabaR)
library(magrittr)
library(igraph)
library(ggraph)

# Formatting taxonomic information from path
sep.level = ":"; sep.info = "@"
parse <- unname(taxoparser(path, sep.level, sep.info)) %>%
  lapply(X =., FUN = function(x){ x <- x[names(x) %in% c("kingdom", "phylum", "class", "order", "family", "genus", "species")]})
path <- sapply(parse, toString) 
parse <- strsplit(path, ", ")
parse.mat <- do.call(rbind, lapply(parse, `length<-`, max(lengths(parse))))

# Building edgelist
edgelist <- NULL
for (i in rev(2:ncol(parse.mat))) {
  idx <- which(!is.na(parse.mat[, i]))
  kid <- parse.mat[idx, i]
  parent <- parse.mat[idx, (i - 1)]
  kidfull <- apply(parse.mat[idx, 1:i, drop = F], 1, toString)
  parentfull <- apply(parse.mat[idx, 1:(i - 1), drop = F], 1, toString)
  
  edgelist <- rbind(
    edgelist,
    unique(cbind(
      parentfull,
      kidfull,
      parent, kid
    ))
  )
}

Here come my first attempt, using ggraph :

# first attempt, using ggraph
mygraph <- graph_from_data_frame( edgelist[,c("parent", "kid")] )
ggraph(mygraph, layout = 'dendrogram', circular = TRUE) +
  geom_edge_elbow() +
  geom_node_point() +
  theme_void()

I like the "look" of this plot, but all leaves of the dendrogram end up at the same y value, whereas there is different taxonomic levels, which I would like to show.

#second attemp, using igraph
g <- igraph::graph.edgelist(edgelist[rev(1:nrow(edgelist)),
                                     c("parentfull", "kidfull")], directed = F)
# Rename nodes with kid names
igraph::V(g)$name2 <- ifelse(igraph::V(g)$name %in% edgelist[, "parent"],
                             igraph::V(g)$name,
                             edgelist[match(igraph::V(g)$name, edgelist[, "kidfull"]), "kid"]
)
# extracting info from igraph object
coords <- layout_as_tree(g, root=1, circular = F, rootlevel = numeric(), mode = "out", flip.y = TRUE)
colnames(coords) <- c("x", "y")
vdf <- data.frame(as.data.frame(get.vertex.attribute(g)), coords)
#vdf$angle <- { 90 - as.numeric(rownames(vdf)) * 360 / nrow(vdf)}
# add segments
edf <- get.data.frame(g)

edf$from.x <- vdf$x[match(edf$from, as.vector(vdf$name))]
edf$from.y <- vdf$y[match(edf$from, as.vector(vdf$name))]
edf$to.x <- vdf$x[match(edf$to, as.vector(vdf$name))]
edf$to.y <- vdf$y[match(edf$to, as.vector(vdf$name))]

  ggplot(data = vdf, aes(x = .data$x, y = -.data$y)) +
  geom_segment(
    data = edf,
    aes(
      x = .data$from.x, xend = .data$to.x,
      y = -.data$from.y, yend = -.data$to.y
    ), size = 0.2, colour = "grey"
  ) +
  geom_point() +
  scale_color_viridis_c() +
  theme_void() + 
  coord_polar() +
  geom_text(aes(label = .data$name2),
            color = "darkgrey", show.legend = FALSE, hjust = 1)

Here, the structure of the plot is fine, but I don’t like the "networky" look of the plot (curve lines), which in my opinion make the relationship between levels unclear.

2

Answers


  1. Chosen as BEST ANSWER

    Thanks to A. Cameron advise, I kept looking for solution using igraph. Below is the code I ended up using. I post it here since I thought it might be usefull to others (who knows). I am pretty certain it is not as optimised as it could, but it did the job for me.

    # Formatting taxonomic information from path
    sep.level = ":"; sep.info = "@"
    parse <- unname(taxoparser(path, sep.level, sep.info)) %>%
      lapply(X =., FUN = function(x){ x <- x[names(x) %in% c("kingdom", "phylum", "class", "order", "family", "genus", "species")]})
    path <- sapply(parse, toString) 
    parse <- strsplit(path, ", ")
    parse.mat <- do.call(rbind, lapply(parse, `length<-`, max(lengths(parse))))
    
    # Building edgelist
    edgelist <- NULL
    for (i in rev(2:ncol(parse.mat))) {
      idx <- which(!is.na(parse.mat[, i]))
      kid <- parse.mat[idx, i]
      parent <- parse.mat[idx, (i - 1)]
      kidfull <- apply(parse.mat[idx, 1:i, drop = F], 1, toString)
      parentfull <- apply(parse.mat[idx, 1:(i - 1), drop = F], 1, toString)
      
      edgelist <- rbind(
        edgelist,
        unique(cbind(
          parentfull,
          kidfull,
          parent, kid
        ))
      )
    }
    
    # Work on leaves labels
    # https://r-graph-gallery.com/339-circular-dendrogram-with-ggraph.html
    vertices = data.frame(name = unique(c(
      as.character(edgelist[, c("parent")]), as.character(edgelist[, c("kid")])))) %>% 
      mutate(
      var1 = abs(rnorm(nrow(.), 0,1)), #add random variables
      var2 = abs(rnorm(nrow(.), 0,1)),
      var3 = abs(rnorm(nrow(.), 0,1))
    )
    
    #### plot ####
    graph_from_data_frame(edgelist[, c("parent", "kid")], vertices = vertices) %>%
      as_tbl_graph() %>%
      mutate(depth = node_distance_from(which(node_is_root())), # add the distance from roots
             max.depth = max(node_distance_from(which(node_is_root( # get the biggest distance from roots
             ))))) %>%
      ggraph(.,
             height = -depth,
             layout = 'dendrogram',
             circular = TRUE) +
      geom_edge_fan() +
      geom_node_text(
        aes(
          x = cos(node_angle(x, y, degrees = F)) * 1.35,
          y = sin(node_angle(x, y, degrees = F)) * 1.35,
          filter = leaf,
          label = name,
          angle = node_angle(x, y) %>% ifelse({
            . > 90 & . < 270
          }, . + 180, .),
          hjust = node_angle(x, y) %>% {
            ifelse({
              . > 90 & . < 270
            }, 1, 0)
          }
        ),
        size = 2,
        alpha = 1
      ) +
      geom_node_point() +
      geom_node_arc_bar(
        aes(
          x0 = 0,
          y0 = 0,
          r0 = 1.0,
          r = 1.1,
          start = 2 * pi / 360 * (90 - node_angle(x, y, degrees = T) + 360 /
                                    (length(leaf == "TRUE"))),
          end = 2 * pi / 360 * (90 - node_angle(x, y, degrees = T) - 360 /
                                  (length(leaf == "TRUE"))),
          fill = var1,
          filter = leaf
        )
      ) +
      geom_node_arc_bar(
        aes(
          x0 = 0,
          y0 = 0,
          r0 = 1.1,
          r = 1.2,
          start = 2 * pi / 360 * (90 - node_angle(x, y, degrees = T) + 360 /
                                    (length(leaf == "TRUE"))),
          end = 2 * pi / 360 * (90 - node_angle(x, y, degrees = T) - 360 /
                                  (length(leaf == "TRUE"))),
          fill = var2,
          filter = leaf
        )
      ) +
      geom_node_arc_bar(
        aes(
          x0 = 0,
          y0 = 0,
          r0 = 1.2,
          r = 1.3,
          start = 2 * pi / 360 * (90 - node_angle(x, y, degrees = T) + 360 /
                                    (length(leaf == "TRUE"))),
          end = 2 * pi / 360 * (90 - node_angle(x, y, degrees = T) - 360 /
                                  (length(leaf == "TRUE"))),
          fill = var3,
          filter = leaf
        )
      ) +
      scale_fill_viridis(discrete = F,
                         option = "B",
                         direction = -1) +
      coord_fixed() +
      theme_void() +
      expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))
    


  2. I don’t think you should give up on the ggraph approach. Your main issue seems to be that the leaf nodes you had were plotted at the wrong depth. However, it is straighforward to calculate the node depth and pass that to the height argument when creating the plot layout. To make the image a bit clearer, I have added in circles to indicate the toxonomic levels:

    library(tidygraph)
    
    as_tbl_graph(mygraph) %>%
      mutate(depth = node_distance_from(which(node_is_root()))) %>%
      ggraph(layout = 'dendrogram', height = -depth, circular = TRUE) +
      ggforce::geom_circle(aes(x0 = 0, y0 = 0 , r = r), color = 'gray90',
                           data = data.frame(r = seq(0, 1, 1/6))) +
      geom_edge_elbow() +
      geom_node_point() +
      theme_void()
    

    enter image description here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search