Running the same version of R on two machines, one a Linux R server and one on AWS, the RNG output is almost the same, but not always identical. Out of 1 million samples from the uniform, gamma, and Normal distributions respectively:
- `runif()` produces identical results.
- `rgamma()` produces 7 small differences; otherwise identical results.
- `rnorm()` also produces 7 small differences; otherwise identical results.
By small differences, I mean something like 1.4510448921274106 vs 1.4510448921274115.
What would be causing these differences? If a floating point issue, why only some distributions? If an OS/library/software issue, why only different on rare occasions?
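For concreteness, here is a rough sketch of how the samples could be generated and compared; the seed, the gamma shape parameter, and the file names below are placeholders, not the exact settings used:

```r
# Run the same script on each machine with the same seed, save the draws,
# then compare the two files on one machine.
set.seed(42)
n <- 1e6
draws <- list(
  unif  = runif(n),
  gamma = rgamma(n, shape = 2),
  norm  = rnorm(n)
)
saveRDS(draws, "draws_linux.rds")   # repeat on the AWS machine -> "draws_aws.rds"

# With both files copied to one machine:
# a <- readRDS("draws_linux.rds"); b <- readRDS("draws_aws.rds")
# sapply(names(a), function(nm) sum(a[[nm]] != b[[nm]]))  # mismatch count per distribution
```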
2 Answers
Besides seed choice, your issue might lie in the choice of the pseudo-random number generator (PRNG) that your R environment is using.
By default, R uses the Mersenne Twister to generate random numbers, which are then scaled to the range between 0 and 1, thus simulating uniform random variables. The other distributions can then be simulated via the inverse probability transform.
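You can check which generator and which transformation methods your session uses, and see the inverse-transform idea directly; the seed and parameters below are just for illustration:

```r
# Report the uniform generator, the normal method, and the sampling method
# in use for this session (defaults since R 3.6.0 are typically
# "Mersenne-Twister", "Inversion", "Rejection").
RNGkind()

# Inverse probability transform: push a Uniform(0,1) draw through the
# quantile function of the target distribution.
set.seed(1)
u <- runif(4)
qnorm(u)              # standard normal draws via inversion
qgamma(u, shape = 2)  # gamma(shape = 2) draws via inversion
```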
For an easier understanding of how PRNGs work, check out this Khan Academy video.
Additionally, you can check out the R documentation on this topic.
If you want to look at implementing different PRNGs to see what happens "under the hood" in R, I am in the process of developing an R package that allows users to implement different PRNGs via functions. Check it out here.
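As a toy illustration of what "under the hood" means, here is a textbook linear congruential generator written in plain R; this is not R's default generator and not the interface of the package mentioned above, just a sketch of the general mechanism:

```r
# Toy linear congruential generator: x_{n+1} = (a * x_n + c) mod m,
# with each state scaled to (0, 1) to mimic uniform draws.
make_lcg <- function(seed, a = 1664525, c = 1013904223, m = 2^32) {
  state <- seed %% m
  function(n) {
    out <- numeric(n)
    for (i in seq_len(n)) {
      state <<- (a * state + c) %% m   # advance the internal state
      out[i] <- state / m              # scale to (0, 1)
    }
    out
  }
}

my_runif <- make_lcg(seed = 123)
my_runif(5)   # five pseudo-uniform draws from the toy generator
```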
`runif()` is not really using floating point; it's doing integer arithmetic (I think it's a tricky/hacky 64-bit integer computation, although I might be misremembering that), and only converting to floating point at the last step. So it is not subject to cross-platform/cross-compiler floating-point artifacts.

As for "why only different on rare occasions": I assume that the `rgamma()` and `rnorm()` implementations are relatively numerically stable, so that the possibilities for floating-point/roundoff error are rare.
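A quick way to see how small such roundoff differences are, once you have the two vectors side by side; here the second vector is faked by bumping a few entries by one unit in the last place, just so the snippet is self-contained:

```r
# Stand-ins for the saved rnorm() draws from the two machines.
set.seed(42)
x_linux <- rnorm(1e6)
x_aws   <- x_linux
bump    <- sample.int(1e6, 7)
x_aws[bump] <- x_aws[bump] * (1 + .Machine$double.eps)  # perturb 7 entries by ~1 ulp

idx <- which(x_linux != x_aws)
length(idx)   # 7 mismatches out of 1e6
summary(abs(x_linux[idx] - x_aws[idx]) / abs(x_linux[idx]))
# Relative differences on the order of .Machine$double.eps (~2.2e-16),
# i.e. disagreement only in the last bit or two of the mantissa.
```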