This is a continuation from a previous question:
Rfast hd.eigen() returns NAs but base eigen() does not
I have been having a problem with .Internal(La_rs((x))
returning different results on different machines.
I suspect it may have something to do with number formatting, because on the same machine, if I save as a CSV and re-open, I don’t get negatives anymore:
On Clear Linux install:
> load("input_to_La_rs.Rdata")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 1
> write.csv(x, "test_for_internal.csv", row.names = FALSE)
> x <- read.csv("test_for_internal.csv")
> r <- .Internal(La_rs(as.matrix(x), only.values = FALSE))
> sum(r$values < 0)
[1] 0
However on my Windows install (and on a CentOS based HPC setup), I can open the rdata file directly and don’t get negative values:
> load("input_to_La_rs.Rdata")
> r <- .Internal(La_rs(x, only.values=TRUE))
> sum(r$values < 0)
[1] 0
Is this related to different R builds/library versions? Some setting I don’t know about? A bug?
Edit: here is an updated example. It seems to work inconsistently, even on the this particular install, sometimes I do get zero:
set.seed(123)
bigm <- matrix(rnorm(2000*2000,mean=0,sd = 3), 2000, 2000)
m <- Rfast::colmeans(bigm)
y <- t(bigm) - m
xx <- crossprod(y)
x <- unname(as.matrix(xx))
b <- .Internal(La_rs(x, TRUE))
sum(b$values < 0)
# [1] 1
Yet another update: It turns out that the first difference creeps in with Rfast
‘s colmeans
producing slightly different results than base colMeans.
set.seed(123)
bigm <- matrix(rnorm(2000*2000,mean=0,sd = 3), 2000, 2000)
m <- colMeans(bigm)
m <- colmeans(bigm)
y <- t(bigm) - m
xx <- crossprod(y)
x <- unname(as.matrix(xx))
b <- .Internal(La_rs(x, TRUE))
sum(b$values < 0)
# [1] 1
m <- colMeans(bigm)
y <- t(bigm) - m
xx <- crossprod(y)
x <- unname(as.matrix(xx))
b <- .Internal(La_rs(x, TRUE))
sum(b$values < 0)
2
Answers
I think the difference is rather negligible. Average difference of the matrix elements equal to 10^(-12) or less are actually zero.
The hd.eigen function in Rfast works only, only for the case of n < p, i.e. when the rows are less than the columns. In the help page of the hd.eigen function is the reference to the paper that suggested this algorithm. The algorithm I do not think works for any other case. Perhaps that is why you get NAs.
Rfast2 contains a function called “pca” that works for either case, np. Try that one also. Inside there, an SVD is effectively performed calling “svd” from R.