skip to Main Content

For some reason, running a lm regression in R gave me different output. It is a simple lm and no randomness is involved. And because I only got this weird result in my computer (the code gave the same results on other computers), I did not attach the data here; instead, I attached the outputs.

Does anyone have suggestions?? Thanks!

d = read.csv("d.csv")
str(d)
## 'data.frame':    5000 obs. of  2 variables:
##  $ elevation_sd: num  -0.6123 1.3988 -0.111 -0.0763 0.4129 ...
##  $ pred_rich   : num  6.74 10.35 10.81 15.24 37.58 ...
coef(lm(pred_rich ~ elevation_sd + I(elevation_sd^2), data = d))
##       (Intercept)      elevation_sd I(elevation_sd^2) 
##       31.40967132      -14.51366622       -0.04686489
coef(lm(pred_rich ~ elevation_sd + I(elevation_sd^2), data = d))
##       (Intercept)      elevation_sd I(elevation_sd^2) 
##         24.791451        -22.377711          2.530645
coef(lm(pred_rich ~ elevation_sd + I(elevation_sd^2), data = d))
##       (Intercept)      elevation_sd I(elevation_sd^2) 
##         29.009348        -24.539297          2.184579
coef(lm(pred_rich ~ elevation_sd + I(elevation_sd^2), data = d))
##       (Intercept)      elevation_sd I(elevation_sd^2) 
##         17.615122        -36.893819          5.300316
coef(lm(pred_rich ~ elevation_sd + I(elevation_sd^2), data = d))
##       (Intercept)      elevation_sd I(elevation_sd^2) 
##         17.292353        -37.052702          5.461568

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.29   R6_2.5.1        jsonlite_1.8.0  magrittr_2.0.3 
##  [5] evaluate_0.15   stringi_1.7.8   cachem_1.0.6    rlang_1.0.4    
##  [9] cli_3.3.0       rstudioapi_0.13 jquerylib_0.1.4 bslib_0.4.0    
## [13] rmarkdown_2.14  tools_4.2.1     stringr_1.4.0   xfun_0.31      
## [17] yaml_2.3.5      fastmap_1.1.0   compiler_4.2.1  htmltools_0.5.3
## [21] knitr_1.39      sass_0.4.2

3

Answers


  1. Chosen as BEST ANSWER

    Here is what I got by running Dirk's code:

                       [,1]        [,2]         [,3]        [,4]        [,5]
    (Intercept)  0.01103854  0.01819024 -0.010350210  0.01828507  0.02242858
    x           -0.02816530 -0.05002762 -0.027738397 -0.02812795 -0.02775974
    I(x^2)      -0.01918489 -0.01735535 -0.005590314 -0.01720919 -0.01929609
    

    To make the output more different:

    set.seed(123)
    x <- rnorm(5000) 
    y <- x + x^2 + rnorm(5000)  
    replicate(5, lm(y ~ x + I(x^2))$coef)
    
                     [,1]      [,2]      [,3]      [,4]      [,5]
    (Intercept) -1.643336 -3.908215 -1.979385 -3.707418 -1.845149
    x            2.033474  2.001891  2.019949  2.007776  2.045030
    I(x^2)       1.834672  1.977397  1.995412  2.867048  1.936577
    

    See the fourth replication have very different coefs.

    And the output below is from my other laptop (which works as expected):

    (Intercept) -0.01371532 -0.01371532 -0.01371532 -0.01371532 -0.01371532
    x            0.99386937  0.99386937  0.99386937  0.99386937  0.99386937
    I(x^2)       1.00964436  1.00964436  1.00964436  1.00964436  1.00964436
    

  2. As alluded to in the comment I made, this is at most an anecdote, and likely a waste of your and our time. We cannot do anything meaningful without a reproducible example.

    But we can mock this. Here I make a random draw, also with N = 5000 as in your case, and run the regression five times. Needless to say, even with these two vectors with no association whatsover by design the purely numerically determined coefficients are of course identical.

    > set.seed(123)
    > x <- rnorm(5000); y <- rnorm(5000)  # your dimensions
    > replicate(5, lm(y ~ x + I(x^2))$coef)
                       [,1]        [,2]        [,3]        [,4]        [,5]
    (Intercept) -0.01371532 -0.01371532 -0.01371532 -0.01371532 -0.01371532
    x           -0.00613063 -0.00613063 -0.00613063 -0.00613063 -0.00613063
    I(x^2)       0.00964436  0.00964436  0.00964436  0.00964436  0.00964436
    > 
    

    Now, those three lines of code are reproducible so I invite you to run this on your system.

    Edit: My sessionInfo(), also on Ubuntu but 22.04, starts with two lines for BLAS and LAPACK. You only have one which does not seem right.

    > sessionInfo()
    R version 4.2.1 (2022-06-23)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 22.04.1 LTS
    
    Matrix products: default
    BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
    LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
    
    Login or Signup to reply.
  3. I have solved this issue by changing my BLAS/LAPACK.

    sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
    sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu
    

    I selected OpenBLAS and then restarted R, no issue anymore! I have also tried the default BLAS/LAPACK and the AtlasBLAS, both work too!

    So, the issue is from the Intel MLK libraries! Thanks @dirk-eddelbuettel for the hint.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search