
I have a problem with an ugly regression line that overlaps the plot frame when I plot my results with scatterplot() combined with abline():
[Image: the resulting scatterplot, with the red regression line running past the upper frame]

The bug seems to occur only when the line crosses the upper frame, which unfortunately is the case with my data. The following code reproduces the plot above:

# set a seed that reproduces the error
set.seed(684654)

# create the data frame
value1 <- rnorm(500,40,10)
value2 <- sqrt(rnorm(500,25,15)^2)
value3 <- sqrt(rnorm(500,10,15)^4)
df <- data.frame(value1, value2, value3)

# categorize value2 by its quartiles
library(dplyr)
q <- quantile(value2)
df <- df %>%
  mutate(cat=ifelse(value2 < q[2], "1st Qu.", NA),
         cat=ifelse(value2 >= q[2] & value2 < q[3], "2nd Qu.", cat),
         cat=ifelse(value2 >= q[3] & value2 < q[4], "3rd Qu.", cat),
         cat=ifelse(value2 >= q[4], "4th Qu.", cat))

# fit the model and save the fitted values to a vector
lmf <- lm(log(value3) ~ value1 + factor(cat) - 1, data=df)
y.hat <- fitted(lmf)

# scatterplot
library(car)
scatterplot(y.hat ~ value1 | cat, 
        smooth=FALSE, boxplots=FALSE, data=df,
        grid=FALSE)

# regression line
library(graphics)
abline(lm(y.hat ~ df$value1), lwd=2, col="red")

Oddly, the bug does not occur when I plot the following simpler example, so I assume it is some kind of cumulative problem:

a <- log(rnorm(100,25,9))
b <- rnorm(100,1,.5)
scatterplot(a ~ b, smooth=FALSE, boxplots=FALSE, grid=FALSE)
abline(a=1, b=2, lwd=2, col="red")

Does anybody have an idea how to fix this and get a tidy plot without having to edit it in Photoshop? Thanks!

PS: I’m using R 3.4.0 and car 2.1-4


Answers


  1. You can manually draw the regression line.

    # get regression coefficients
    cfs <- coef(lm(y.hat ~ df$value1)) 
    # draw regression line
    curve({cfs[1] + x*cfs[2]}, add = TRUE, lwd = 2, col = "red",
          from = 12, to = 70)
    

    Adjust the from and to values to extend the line further to the left or right.
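Instead of guessing from and to, they can be taken from the observed range of the predictor. A minimal sketch, assuming simulated data and base plot() in place of the question's df and car::scatterplot() so that it runs on its own:

```r
# Sketch: take curve()'s from/to from the observed range of the predictor
# instead of typing them in. Simulated data and base plot() stand in for
# the question's df and car::scatterplot() (assumption, for self-containment).
set.seed(684654)
value1 <- rnorm(500, 40, 10)
y.hat  <- 2 + 0.05 * value1 + rnorm(500, 0, 0.3)

cfs   <- coef(lm(y.hat ~ value1))  # intercept and slope
x_lim <- range(value1)             # smallest and largest observed value1

pdf(NULL)                          # off-screen device for a non-interactive run
plot(value1, y.hat)
curve(cfs[1] + x * cfs[2], add = TRUE, lwd = 2, col = "red",
      from = x_lim[1], to = x_lim[2])
dev.off()
```

With the question's objects, x_lim would be range(df$value1) and the curve() call goes right after scatterplot(). This keeps the line within the data's x range; it does not by itself guarantee that the line stays below the upper frame.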

  2. Well, that’s frustrating.
    I found a partial solution by setting xpd explicitly to FALSE, like so:

    par(xpd=FALSE)
    scatterplot(y.hat ~ value1 | cat, 
            smooth=FALSE, boxplots=FALSE, data=df,
            grid=FALSE)
    abline(lm(y.hat ~ df$value1), lwd=2, col="red")
    


    The only issue is that if you adjust the margins (e.g. by setting mar=c(2, 2, 2, 2)), the abline is again drawn outside the plotting area. I haven’t dug into it much, but since changing par breaks it again and the problem disappears with legend.plot=FALSE, my hunch is that it has something to do with

    top <- if (legend.plot && missing(legend.coords)) {
                if (missing(legend.columns)) 
                    legend.columns <- find.legend.columns(nlevels(groups))
                4 + ceiling(nlevels(groups))/legend.columns
    

    and

    par(mar = c(mar[1], 0, top, 0))
    if (ybox > 0) 
        vbox(.y)
    par(mar = c(0, mar[2], 0, mar[4]))
    if (xbox > 0) 
        hbox(.x)
    par(mar = c(mar[1:2], top, mar[4]))
    

    (both excerpts can be found by entering car:::scatterplot.default)
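A variant that does not depend on xpd or the margins is to intersect the fitted line with the current axis limits from par("usr") and draw only that piece with segments(). The sketch below assumes that par("usr") still describes the visible axes after scatterplot() returns; line_in_box() is a hypothetical helper written for this example, not part of car or graphics, and base plot() stands in for scatterplot():

```r
# Hypothetical helper (not from car or graphics): the part of the line
# y = a + b * x that lies inside usr = c(x1, x2, y1, y2), returned as
# c(x0, y0, x1, y1), or NULL if the line misses the rectangle.
line_in_box <- function(a, b, usr) {
  if (b == 0) {
    # horizontal line: visible only if a lies within the y limits
    if (a < usr[3] || a > usr[4]) return(NULL)
    return(c(usr[1], a, usr[2], a))
  }
  # x values where the line crosses the bottom and top edges
  x_at_y <- (usr[3:4] - a) / b
  # intersect that x interval with the x limits
  lo <- max(usr[1], min(x_at_y))
  hi <- min(usr[2], max(x_at_y))
  if (lo > hi) return(NULL)
  c(lo, a + b * lo, hi, a + b * hi)
}

# Usage with simulated data (assumption): fit, plot, then draw only the
# visible piece of the line with segments().
set.seed(684654)
value1 <- rnorm(500, 40, 10)
y <- 0.1 * value1 + rnorm(500)
cfs <- unname(coef(lm(y ~ value1)))

pdf(NULL)                        # off-screen device for a non-interactive run
plot(value1, y, ylim = c(0, 5))  # tight ylim so the line reaches the top edge
usr <- par("usr")                # current axis limits in user coordinates
seg <- line_in_box(cfs[1], cfs[2], usr)
if (!is.null(seg)) segments(seg[1], seg[2], seg[3], seg[4], lwd = 2, col = "red")
dev.off()
```

With the question's objects, cfs would come from coef(lm(y.hat ~ df$value1)) and the last three drawing lines would follow scatterplot() directly. Because both endpoints already lie within the axis limits, nothing is left for the clipping region to cut off.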
