First of all, I’m still a beginner. I’m trying to interpret and draw a stacked bar plot with R. I have already looked at a number of answers, but some were not specific to my case and others I simply didn’t understand.

I’ve got a dataset dvl that has five columns, Variant, Region, Time, Person and PrecededByPrep. I’d like to make a multivariate comparison of Variant to the other four predictors. Every column can have one of two possible values:

  • Variant: elk or ieder.
  • Region: VL or NL.
  • Time: time or no time
  • Person: person or no person
  • PrecededByPrep: 1 or 0

Here’s the logistic regression

From the answers I gathered that the library ggplot2 might be the best drawing library to go with. I’ve read its documentation but for the life of me I can’t figure out how to plot this: how can I get a comparison of Variant with the other four factors?

It took me a while, but I made something similar in Photoshop to what I’d like (fictional values!).

graph

Dark gray/light gray: possible values of Variant
y-axis: frequency
x-axis: every column, subdivided into its possible values

I know how to make individual bar plots, both stacked and grouped, but I do not know how to make stacked, grouped bar plots. ggplot2 can be used, but if it can be done without it, I’d prefer that.

I think this can be seen as a sample dataset, though I’m not entirely sure. I am a beginner with R and I read about creating a sample set.

t <- data.frame(Variant = sample(c("iedere","elke"),size = 50, replace = TRUE),
            Region = sample(c("VL","NL"),size = 50, replace = TRUE),
            PrecededByPrep = sample(c("1","0"),size = 50, replace = TRUE),
            Person = sample(c("person","no person"),size = 50, replace = TRUE),
            Time = sample(c("time","no time"),size = 50, replace = TRUE))

I’d also like the plot to be aesthetically pleasing. What I had in mind:

  • Plot colours (i.e. for the bars): col=c("paleturquoise3", "palegreen3")
  • A bold font for the axis labels (font.lab=2) but not for the value labels (e.g. region in bold, but VL and NL not in bold)
  • #404040 as a colour for the font, axis and lines
  • Labels for the axes: x: factors, y: frequency

3 Answers


  1. Here is one possibility: start with the ‘un-tabulated’ data frame, melt it, and plot it with geom_bar in ggplot2 (which does the counting per group), separating the plot by variable with facet_wrap.

    Create toy data:

    set.seed(123)
    df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
               Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
               PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
               Person = sample(c("person", "no person"), size = 50, replace = TRUE),
               Time = sample(c("time", "no time"), size = 50, replace = TRUE))
    

    Reshape data:

    library(reshape2)
    df2 <- melt(df, id.vars = "Variant")
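
    Since reshape2 is retired, the same long format can also be produced with tidyr. This is only a sketch, assuming the tidyr package is installed; the column names variable and value are chosen here to match what melt returns.

    ```r
    library(tidyr)

    # same toy data as above
    set.seed(123)
    df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                     Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                     PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                     Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                     Time = sample(c("time", "no time"), size = 50, replace = TRUE))

    # one row per (observation, predictor) pair; Variant is kept as identifier
    df2_long <- pivot_longer(df, cols = -Variant,
                             names_to = "variable", values_to = "value")
    ```

    The rows come out in a different order than with melt, which does not matter for counting. Note that variable is a character column here, so facets are ordered alphabetically unless it is converted to a factor.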
    

    Plot:

    library(ggplot2)
    ggplot(data = df2, aes(factor(value), fill = Variant)) +
      geom_bar() +
      facet_wrap(~variable, nrow = 1, scales = "free_x") +
      scale_fill_grey(start = 0.5) +
      theme_bw()
    

    There are lots of opportunities to customize the plot, such as setting order of factor levels, rotating axis labels, wrapping facet labels on two lines (e.g. for the longer variable name “PrecededByPrep”), or changing spacing between facets.
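
    As an illustration of the first of these, the order of bars and fill groups follows the factor levels, so it can be set explicitly before plotting. A minimal sketch using a small made-up stand-in for the melted data frame (the rows below are illustrative, not the toy data above):

    ```r
    # small stand-in for the melted data frame (values made up for illustration)
    df2 <- data.frame(
      Variant  = c("iedere", "elke", "elke", "iedere"),
      variable = c("Region", "Region", "Time", "Time"),
      value    = c("VL", "NL", "time", "no time")
    )

    # desired left-to-right order of the bars; values not listed would become NA
    value_order <- c("VL", "NL", "time", "no time")
    df2$value <- factor(df2$value, levels = value_order)

    # desired order of the fill groups (and legend entries)
    df2$Variant <- factor(df2$Variant, levels = c("iedere", "elke"))

    levels(df2$value)
    ```

    Calling factor() on a column that is already a factor keeps the level order, so aes(factor(value), fill = Variant) can stay as it is.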

    Customization (following updates in question and comments by OP)

    # labeller used in facet_grid to wrap "PrecededByPrep" on two lines
    # see http://www.cookbook-r.com/Graphs/Facets_%28ggplot2%29/#modifying-facet-label-text
    # ggplot2 >= 2.0.0 expects a labeller object; as_labeller() wraps a
    # function that maps a character vector of facet values to labels
    my_lab <- as_labeller(function(value) {
      ifelse(value == "PrecededByPrep", "Preceded\nByPrep", value)
    })
    
    ggplot(data = df2, aes(factor(value), fill = Variant)) +
      geom_bar() +
      facet_grid(~variable, scales = "free_x", labeller = my_lab) + 
      scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
      theme_bw() +
      theme(axis.text = element_text(face = "bold"), # axis tick labels bold 
            axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
            line = element_line(colour = "gray25"), # line colour gray25 = #404040
            strip.text = element_text(face = "bold")) + # facet labels bold  
      xlab("factors") + # set axis labels
      ylab("frequency")
    

    Add counts to each bar (edit following comments from OP).

    The basic principles to calculate the y coordinates can be found in this Q&A. Here I use dplyr to calculate counts per bar (i.e. label in geom_text) and their y coordinates, but this could of course be done in base R, plyr or data.table.

    # calculate counts (i.e. labels for geom_text) and their y positions.
    library(dplyr)
    df3 <- df2 %>%
      group_by(variable, value, Variant) %>%
      summarise(n = n()) %>%
      mutate(y = cumsum(n) - (0.5 * n))
    
    # plot
    ggplot(data = df2, aes(x = factor(value), fill = Variant)) +
      geom_bar() +
      geom_text(data = df3, aes(y = y, label = n)) +
      facet_grid(~variable, scales = "free_x", labeller = my_lab) + 
      scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
      theme_bw() +
      theme(axis.text = element_text(face = "bold"), # axis tick labels bold 
            axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
            line = element_line(colour = "gray25"), # line colour gray25 = #404040
            strip.text = element_text(face = "bold")) + # facet labels bold  
      xlab("factors") + # set axis labels
      ylab("frequency")
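
    The manually computed y positions depend on the order in which ggplot2 stacks the fill groups, and that order has changed between ggplot2 versions, so the labels can end up in the wrong segment. A sketch of an alternative that lets ggplot2 count and position the labels itself, assuming ggplot2 3.3.0 or later (for after_stat()) and the same toy data as above:

    ```r
    library(reshape2)
    library(ggplot2)

    set.seed(123)
    df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                     Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                     PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                     Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                     Time = sample(c("time", "no time"), size = 50, replace = TRUE))
    df2 <- melt(df, id.vars = "Variant")

    p <- ggplot(data = df2, aes(x = factor(value), fill = Variant)) +
      geom_bar() +
      # stat = "count" recomputes the counts per bar segment;
      # position_stack(vjust = 0.5) centres each label in its own segment
      geom_text(stat = "count", aes(label = after_stat(count)),
                position = position_stack(vjust = 0.5)) +
      facet_grid(~variable, scales = "free_x")
    p
    ```

    Because the labels use the same stacking as the bars, they stay aligned regardless of the stacking order.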
    

  2. Here is my proposal for a solution using the barplot function from base R:

    1. calculate the counts

    l_count_df<-lapply(colnames(t)[-1],function(nomcol){table(t$Variant,t[,nomcol])})
    count_df<-l_count_df[[1]]
    for (i in 2:length(l_count_df)){
        count_df<-cbind(count_df,l_count_df[[i]])
    }
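
    The same count matrix can also be built without the loop. This is a sketch on a copy of the question's sample data; the name dat is only chosen here so that base R's t() function is not masked.

    ```r
    set.seed(123)
    dat <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                      Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                      PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                      Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                      Time = sample(c("time", "no time"), size = 50, replace = TRUE))

    # one 2 x 2 table of Variant against each predictor, bound column-wise
    tabs <- unname(lapply(dat[-1], function(col) table(dat$Variant, col)))
    count_df <- do.call(cbind, tabs)
    count_df
    ```

    Rows are the two values of Variant and there are two columns per predictor, which is the layout the barplot call expects.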
    

    2. draw the barplot without axis names, saving the bar coordinates

    par(las = 1, col.axis = "#404040", mar = c(5, 4.5, 4, 2), mgp = c(3.5, 1, 0))
    bp <- barplot(count_df, width = 1.2, space = rep(c(1, 0.3), 4),
                  col = c("paleturquoise3", "palegreen3"), border = "#404040",
                  axisnames = FALSE, ylab = "Frequency",
                  legend.text = row.names(count_df),
                  ylim = c(0, max(colSums(count_df)) * 1.2))
    

    3. label the bars

    mtext(side=1,line=0.8,at=bp,text=colnames(count_df))
    mtext(side=1,line=2,at=(bp[seq(1,8,by=2)]+bp[seq(2,8,by=2)])/2,text=colnames(t)[-1],font=2)
    

    4. add values inside the bars

    for(i in 1:ncol(count_df)){
        val_elke<-count_df[1,i]
        val_iedere<-count_df[2,i]
        text(bp[i],val_elke/2,val_elke)
        text(bp[i],val_elke+val_iedere/2,val_iedere)
    }
    

    Here is what I get (with my random data):

    enter image description here

  3. I’m basically answering a different question. I suppose this can be seen as perversity on my part, but I really dislike barplots of pretty much any sort. They have always seemed to waste space, because the numerical values they present are less useful than an appropriately constructed table. The vcd package offers an extended mosaic plot function that seems to me to be more accurately called a “multivariate barplot” than any of the ones I have seen so far. It does require that you first construct a contingency table, for which the xtabs function seems a perfect fit.

    install.packages("vcd")
    library(vcd)
    help(mosaic, package = "vcd")
    col <- c("paleturquoise3", "palegreen3")
    # ttt is the question's data frame (called t there)
    vcd::mosaic(xtabs(~ Variant + Region + PrecededByPrep + Time, data = ttt),
                highlighting = "Variant", highlighting_fill = col)
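
    To see what the contingency table passed to mosaic contains, it can be built and printed on its own with base R. A sketch on a copy of the question's sample data (the name dat is only used here for illustration):

    ```r
    set.seed(123)
    dat <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                      Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                      PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                      Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                      Time = sample(c("time", "no time"), size = 50, replace = TRUE))

    # 4-way table of counts: one cell per combination of the four factors
    tab <- xtabs(~ Variant + Region + PrecededByPrep + Time, data = dat)

    # flat view: predictors as rows, Variant as columns
    ftable(tab, row.vars = c("Region", "PrecededByPrep", "Time"))
    ```

    Each cell of tab corresponds to one tile of the mosaic plot, with the tile area proportional to the count.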
    

    That was the 4-way plot and this is the 5-way plot:

    png(); vcd::mosaic( xtabs(
                      ~Variant+Region + PrecededByPrep +   Person  +  Time, 
                       data=ttt) 
                    ,highlighting="Variant", highlighting_fill=col); dev.off()
    
