
I am running the same version of R on two machines, a Linux R server and an AWS instance. The random number output is almost the same, but not always identical. Out of 1 million samples each from the uniform, gamma, and normal distributions:

  • runif() produces identical results.
  • rgamma() produces 7 small differences; otherwise identical results.
  • rnorm() also produces 7 small differences; otherwise identical results.

By small differences, I mean something like 1.4510448921274106 vs 1.4510448921274115.
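To put the size of that gap in perspective, here is a minimal sketch that measures it in units of `.Machine$double.eps`, which is the spacing between adjacent doubles in the interval [1, 2). The two values are the ones quoted above; the variable names are only for illustration.

```r
a <- 1.4510448921274106
b <- 1.4510448921274115

gap  <- abs(a - b)                  # absolute gap, roughly 1e-15
ulps <- gap / .Machine$double.eps   # gap in units of the last place for values in [1, 2)

print(ulps)                         # a handful of units in the last place
print(identical(a, b))              # FALSE: the bit patterns differ
print(isTRUE(all.equal(a, b)))      # TRUE: equal within all.equal()'s default tolerance
```

So the two values disagree only in the last few bits of the significand.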

What could be causing these differences? If it is a floating-point issue, why does it affect only some distributions? If it is an OS/library/software issue, why do the results differ only on rare occasions?
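For anyone who wants to reproduce the comparison, this is a rough sketch of how it could be set up. The helper names `draw_samples()` and `compare_samples()`, the seed, and the gamma `shape = 2` are assumptions made for illustration, not the exact settings used above.

```r
# Hypothetical helper: draw the three samples under a fixed seed and generator.
draw_samples <- function(n = 1e6, seed = 1) {
  set.seed(seed, kind = "Mersenne-Twister", normal.kind = "Inversion")
  list(unif = runif(n), gamma = rgamma(n, shape = 2), norm = rnorm(n))
}

# Hypothetical helper: count mismatches and report the largest absolute gap.
compare_samples <- function(a, b) {
  sapply(names(a), function(nm) {
    c(n_diff       = sum(a[[nm]] != b[[nm]]),
      max_abs_diff = max(abs(a[[nm]] - b[[nm]])))
  })
}

# Same-machine round trip through an .rds file, which stores doubles exactly.
path <- tempfile(fileext = ".rds")
saveRDS(draw_samples(n = 1000), path)
res <- compare_samples(readRDS(path), draw_samples(n = 1000))
print(res)  # all zeros when both runs come from the same machine
```

Across machines, the idea would be to call `saveRDS(draw_samples(), ...)` on each one, copy one file over, and pass the two `readRDS()` results to `compare_samples()`.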

2 Answers


  1. Besides the choice of seed, your issue might lie in the choice of pseudo-random number generator (PRNG) that your R environment is using.

    R uses the Mersenne Twister by default to generate random integers, which are then scaled to the interval between 0 and 1, thus simulating uniform random variables. The other distributions are derived from these uniforms: rnorm() uses the inverse probability transform (inversion) by default, while others, such as rgamma(), rely on different transformation or rejection-based algorithms.

    For an easier understanding of how PRNGs work, check out this Khan Academy video.

    You can also check the R documentation on this topic:

    1. https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Random
    2. https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Random.user

    If you want to experiment with different PRNGs to see what happens "under the hood" in R, I am in the process of developing an R package that lets users run different PRNGs via functions. Check it out here.
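    As a quick check of the generator settings discussed above, here is a minimal sketch using base R's `RNGkind()` and `set.seed()`. The seed value 123 is arbitrary. Since runif() already matched on both machines, the kinds most likely agree, but confirming it is cheap.

    ```r
    # Inspect the generators in use; run this on both machines and compare.
    kinds <- RNGkind()
    print(kinds)  # typically "Mersenne-Twister" "Inversion" "Rejection" on a default R >= 3.6.0

    # Pin the generators explicitly rather than relying on the defaults.
    set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
    x1 <- rnorm(5)
    set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
    x2 <- rnorm(5)

    print(identical(x1, x2))  # TRUE on a single machine
    ```

    If `RNGkind()` reports the same values on both machines and the seed is the same, any remaining differences come from later floating-point steps rather than from the underlying generator.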

  2. runif() is not really using floating point; it’s doing integer arithmetic (I think it’s a tricky/hacky 64-bit integer computation, although I might be misremembering that), and only converting to floating-point at the last step. So it is not subject to cross-platform/cross-compiler floating-point artifacts.

    As for "why only different on rare occasions": I assume that the rgamma() and rnorm() implementations are relatively numerically stable, so the opportunities for floating-point roundoff to change the final result are rare.
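    To illustrate the kind of roundoff artifact meant here, the sketch below shows two mathematically equal expressions that round to different doubles depending on the order of operations. This is a generic floating-point example, not R's actual rnorm() or rgamma() code; the assumption is that a different compiler or math library can shift the last bits in a similar way.

    ```r
    a <- (0.1 + 0.2) + 0.3
    b <- 0.1 + (0.2 + 0.3)

    print(sprintf("%.17g", c(a, b)))  # the two results differ in the last digit
    print(identical(a, b))            # FALSE: the orderings round differently
    print(isTRUE(all.equal(a, b)))    # TRUE: equal within a small tolerance
    ```

    When comparing samples across platforms, a tolerance-based check such as `all.equal()` is therefore usually more appropriate than `identical()` or `==`.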
