PHP Shuffle in PHP 7.1+ produces different results - PHP Versions

LawrenceCooke
January 31, 2021
168 views
0 votes
2 Answers

I use the shuffle() function to emulate a letter bag in a word game.

I recently upgraded from PHP 5 to PHP 7, and have found that on the newer version, the shuffle function isn’t quite as good for my purposes.

I used to start with the letters clumped and it would shuffle them nicely so there was a decent but unpredictable letter order. On 7.1+, it frequently produces clumps of the same letters, whereas getting those clumps was rare on 5.x

I saw that they changed the internal algorithm in 7.1.

Is there any way to emulate the older way shuffle worked in order to produce results more closely to how they were in PHP 5?

Answers

You could use the random_int function, which is suitable for cryptographic use.
Then you could use an implementation of the Fisher-Yates (aka Knuth) Shuffle algorithm:

function shuffle($array) {
    $random_array = array();
    $countArray = count($array) - 1;

    while (count($array) != 0) {
        $randomValue = random_int(0, $countArray);
        $random_array[] = $array[$randomValue];

        if(($randomValue + 1) == count($array)) {
            array_splice($array, $randomValue, ($randomValue - $countArray + 1));
        } else {
            array_splice($array, $randomValue, ($randomValue - $countArray));
        }

        $countArray--;
    }

    return $random_array;
}

- IMSoP
- February 1, 2021 at 3:33 pm
- 0 votes
0
What is Randomness?

One thing that computers have in common with the human brain is that they’re very bad at being random.

You can’t actually write down a set of steps that creates randomness, without some external input (throwing a dice, measuring radioactive decay, listening to an untuned radio, staring at lava lamps). All you can try to do is make an actually predictable sequence that looks random. Unfortunately, it’s easy to fool the human brain into thinking something’s random, or into spotting patterns where there are none, so you have to be really careful in how you design this "pseudo-randomness".

A relevant example of how the human brain perceives randomness is the fact that with only 23 people in a room, there’s a 50% chance that two of them will have the same birthday. This is so surprising that it’s often referred to as the "birthday paradox", even though it’s mathematically quite straight-forward. The clumps of letters in your word game are very similar.

What Changed?

For a long time, PHP had two different "pseudo-random number generators":
- A system-dependent algorithm, exposed to users as rand(), which had a bunch of problems.
- Something called the "Mersenne Twister", exposed to users as mt_rand(), which was generally better in all respects.
In PHP 7.1, a series of changes were made to basically eliminate the rand() version and use the Mersenne Twister algorithm everywhere. This included shuffle.

None of the rest of the algorithm changed: it was always intended to put the elements into a random order regardless of their starting positions, so your ability to affect it by clumping the starting positions was a bug, not a feature.

How random was shuffle in the different versions?

I wrote a quick script to test if shuffle had any biases:
1. Generate an array with 100 items, 0 to 99
2. Shuffle it
3. Mark down the new position of each element
4. Repeat lots of times
The result is a 100 x 100 grid showing how often, for instance, element 0 ends up in position 42.

If the shuffling is truly random, we should expect that as we repeat the shuffle enough times, every element ends up in every position just as often. In other words, our grid should have the same count for every combination.

I ran the same code on PHP 7.0 (when shuffle() used the old rand() implementation) and 7.1 (when it used the mt_rand() implementation) and visualised the results (script also included in the gist). The columns of the image represent the 100 elements in the array, and the rows the different positions they could end up in. Blue pixels represent combinations that happened less often than expected, and red pixels combinations that happened more often than expected.

For PHP 7.1, the image looks mostly black, because the numbers were pretty close to perfectly equal across all combinations:

For PHP 7.0, however, there are some very visible "hot" and "cold" areas!

So in the old algorithm, there are certain swaps that were much less likely to happen. These biases presumably happened to fit with the order you placed the items in to make it less likely than it should have been for certain letters to occur at certain times in the game.

What if you want to bias your shuffle?

The new algorithm should more accurately reflect what would happen in a real game – all the letters have an equal chance of appearing at any time in the game.

But maybe you find it more fun to "rig" the order, so that you don’t get the same letter several times in a row, or you’re likely to have a mix of consonants and vowels.

You could achieve that by re-implementing the shuffle algorithm, and then tweaking the probabilities of certain things happening.

The core of the algorithm is this (essentially what changed in 7.1 was the definition of RAND_RANGE):
```
while (--n_left) {
    RAND_RANGE(rnd_idx, 0, n_left, PHP_RAND_MAX);
    if (rnd_idx != n_left) {
        temp = hash->arData[n_left];
        hash->arData[n_left] = hash->arData[rnd_idx];
        hash->arData[rnd_idx] = temp;
    }
}
```
This picks items at random one by one, as you’d expect (but from last to first, because it happens to be easier), but to save re-allocating the array each time you "remove" an item, it uses a neat trick: it swaps the current element of the array with a random element earlier in the array.

In PHP code, it would look something like this (untested!):
```
$n_left = count($array);
while (--$n_left) {
    $rnd_idx = mt_rand(0, $n_left);
    // The if statement is just skipping the swap if our random choice is for it to stay where it is
    if ($rnd_idx != $n_left) {
        $temp = $array[$n_left];
        $array[$n_left] = $array[$rnd_idx];
        $array[$rnd_idx] = $temp;
    }
}
```
To add bias to this, you just need to change the way $rnd_idx is chosen. You could for instance say that if the chosen letter ($array[$rnd_idx]) is the same as the previously chosen one, pick again. You could have a whole function deciding how "good" a pick it is, and introducing bias that way.

Since "fun" is subjective, not mathematically definable, you’d have to just test a bunch of things and see which one you liked the results of.
Login or Signup to reply.