I’ve simulated this data.frame
:
library(plyr); library(ggplot2)
count <- rev(seq(0, 500, 20))
tide <- seq(0, 5, length.out = length(count))
df <- data.frame(count, tide)
count_sim <- unlist(llply(count, function(x) rnorm(20, x, 50)))
count_sim_df <- data.frame(tide=rep(tide,each=20), count_sim)
And it can be plotted like this:
ggplot(df, aes(tide, count)) + geom_jitter(data = count_sim_df, aes(tide, count_sim), position = position_jitter(width = 0.09)) + geom_line(color = "red")
I now want to split count_sim_df
into two group: high
and low
. When I plot the split count_sim_df
, it should look like this (everything in green and blue is photoshopped). The bit that I’m finding tricky is getting overlap between high
and low
around the middle values of tide
.
This is how I want to split count_sim_df
into high and low:
- assign half of
count_sim_df
tohigh
and half ofcount_sim_df
tolow
- reassign the values of
count
to create overlap betweenhigh
andlow
around the middle values oftide
2
Answers
Here’s my revised suggestion. I hope it helps.
I might still not have fully understood your procedure to split into high and low, especially what you mean by “reassigning the value of count”. In this case here I have defined an overlap region of 30% around the middle value of
tide
and assigned randomly half of the points within this transition region to the “high” and the other half to the “low” set.Here’s a way to generate the sample dataset and the groupings using relatively little code and just base R: