
I’ve simulated this data.frame:

library(plyr); library(ggplot2)
count <- rev(seq(0, 500, 20))
tide <- seq(0, 5, length.out = length(count))
df <- data.frame(count, tide)

count_sim <- unlist(llply(count, function(x) rnorm(20, x, 50)))
count_sim_df <- data.frame(tide=rep(tide,each=20), count_sim)

And it can be plotted like this:

ggplot(df, aes(tide, count)) +
  geom_jitter(data = count_sim_df, aes(tide, count_sim),
              position = position_jitter(width = 0.09)) +
  geom_line(color = "red")

enter image description here

I now want to split count_sim_df into two groups: high and low. When I plot the split count_sim_df, it should look like the image below (everything in green and blue is photoshopped). The part I’m finding tricky is getting overlap between high and low around the middle values of tide.

This is how I want to split count_sim_df into high and low:

  • assign half of count_sim_df to high and half of count_sim_df to low
  • reassign the values of count to create overlap between high and low around the middle values of tide

enter image description here

2 Answers


  1. Here’s my revised suggestion. I hope it helps.

    middle_tide <- mean(count_sim_df$tide)
    hilo_margin <- 0.3
    # Transition region: tide within +/- 30% of the mean tide
    middle_df <- subset(count_sim_df, tide > (middle_tide * (1 - hilo_margin)))
    middle_df <- subset(middle_df, tide < (middle_tide * (1 + hilo_margin)))
    # Points above or below the transition region belong to one group only
    upper_df <- count_sim_df[count_sim_df$tide > (middle_tide * (1 + hilo_margin)), ]
    lower_df <- count_sim_df[count_sim_df$tide < (middle_tide * (1 - hilo_margin)), ]
    # Randomly assign each transition-region point to group 1 (high) or 2 (low)
    idx <- sample(2, nrow(middle_df), replace = TRUE)
    count_sim_high <- rbind(middle_df[idx == 1, ], upper_df)
    count_sim_low <- rbind(middle_df[idx == 2, ], lower_df)
    p <- ggplot(df, aes(tide, count)) +
       geom_jitter(data = count_sim_high, aes(tide, count_sim), position = position_jitter(width = 0.09), alpha = 0.4, col = 3, size = 3) +
       geom_jitter(data = count_sim_low, aes(tide, count_sim), position = position_jitter(width = 0.09), alpha = 0.4, col = 4, size = 3) +
       geom_line(color = "red")
    print(p)  # draw the plot
    

    enter image description here

    I may still not have fully understood how you want to split the data into high and low, especially what you mean by “reassigning the values of count”. Here I have defined a transition region spanning ±30% of the mean tide (for this data, tide between 1.75 and 3.25) and randomly assigned each point inside it to either the “high” or the “low” set with equal probability, so roughly half end up in each. Points above the region all go to “high”, and points below it all go to “low”.
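
    As a quick check of the split described above, the following sketch rebuilds the simulated data, labels each row instead of creating separate data frames, and tabulates the groups per tide value. The fixed seed, the `group` column, and the names `lo_cut`, `hi_cut`, `in_middle`, and `coin` are additions for illustration only. Rows lying exactly on a cut-off would be assigned here rather than dropped, which makes no difference for this data because no tide value falls on a cut-off.

    ```r
    # Sketch only: the seed is an assumption, the original code sets none.
    set.seed(1)
    count <- rev(seq(0, 500, 20))
    tide <- seq(0, 5, length.out = length(count))
    count_sim_df <- data.frame(
      tide = rep(tide, each = 20),
      count_sim = rnorm(20 * length(count), rep(count, each = 20), 50)
    )

    middle_tide <- mean(count_sim_df$tide)
    hilo_margin <- 0.3
    lo_cut <- middle_tide * (1 - hilo_margin)  # lower edge of the transition region
    hi_cut <- middle_tide * (1 + hilo_margin)  # upper edge of the transition region

    # TRUE for rows inside the transition region
    in_middle <- count_sim_df$tide > lo_cut & count_sim_df$tide < hi_cut
    # One coin flip per row: 1 = high, 2 = low (only used inside the region)
    coin <- sample(2, nrow(count_sim_df), replace = TRUE)

    # High if above the region, or inside it with a "high" coin flip
    count_sim_df$group <- ifelse(
      count_sim_df$tide >= hi_cut | (in_middle & coin == 1),
      "high", "low"
    )

    # Rows per group at each tide value
    split_table <- table(tide = round(count_sim_df$tide, 2),
                         group = count_sim_df$group)
    print(split_table)
    ```

    Only the middle tide values should show counts in both columns; everything below the region should be all low and everything above it all high.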

  2. Here’s a way to generate the sample dataset and the groupings using relatively little code and just base R:

    library(ggplot2)
    count <- rev(seq(0, 500, 20))
    tide <- seq(0, 5, length.out = length(count))
    df <- data.frame(count, tide)
    
    count_sim_df <- data.frame(tide = rep(tide,each=20),
                               count = rnorm(20 * nrow(df), rep(count, each = 20), 50))
    margin <- 0.3
    count_sim_df$`tide level` <-
      with(count_sim_df,
        factor((tide >= quantile(tide, 0.5 + margin / 2) |
               (tide >= quantile(tide, 0.5 - margin / 2) & sample(0:1, length(tide), TRUE))),
               labels = c("Low", "High")))
    ggplot(df, aes(x = tide, y = count)) +
      geom_line(colour = "red") +
      geom_point(aes(colour = `tide level`), count_sim_df, position = "jitter") +
      scale_colour_manual(values = c(High = "green", Low = "blue"))
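
    To see where the overlap actually falls with this quantile-based rule, one could print the two cut points and cross-tabulate the levels by tide. This is only a sketch: the seed and the helper names `sim_tide`, `cuts`, `coin`, `is_high`, and `tide_level` are assumptions for illustration, and the grouping logic is meant to mirror the expression above rather than replace it.

    ```r
    # Sketch only: the seed is an assumption, the answer's code sets none.
    set.seed(42)
    count <- rev(seq(0, 500, 20))
    tide <- seq(0, 5, length.out = length(count))
    sim_tide <- rep(tide, each = 20)

    margin <- 0.3
    # Lower and upper cut points of the overlap region, as quantiles of tide
    cuts <- quantile(sim_tide, c(0.5 - margin / 2, 0.5 + margin / 2))
    # One coin flip per row, used only between the two cut points
    coin <- sample(0:1, length(sim_tide), TRUE) == 1

    # High if at or above the upper cut, or at or above the lower cut with a
    # successful coin flip; Low otherwise
    is_high <- sim_tide >= cuts[2] | (sim_tide >= cuts[1] & coin)
    tide_level <- factor(is_high, levels = c(FALSE, TRUE),
                         labels = c("Low", "High"))

    print(cuts)
    print(table(tide = round(sim_tide, 2), level = tide_level))
    ```

    Because the cut points are quantiles of the observed tide values, they can land exactly on a tide value, and the `>=` comparisons then decide which side that value falls on. Checking the table is a simple way to confirm the overlap covers the tide range you intended.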
    