I use the shuffle() function to emulate a letter bag in a word game.
I recently upgraded from PHP 5 to PHP 7, and have found that on the newer version, the shuffle function isn’t quite as good for my purposes.
I used to start with the letters clumped and it would shuffle them nicely so there was a decent but unpredictable letter order. On 7.1+, it frequently produces clumps of the same letters, whereas getting those clumps was rare on 5.x
I saw that they changed the internal algorithm in 7.1.
Is there any way to emulate the older way shuffle worked in order to produce results more closely to how they were in PHP 5?
2
Answers
You could use the random_int function, which is suitable for cryptographic use.
Then you could use an implementation of the Fisher-Yates (aka Knuth) Shuffle algorithm:
What is Randomness?
One thing that computers have in common with the human brain is that they’re very bad at being random.
You can’t actually write down a set of steps that creates randomness, without some external input (throwing a dice, measuring radioactive decay, listening to an untuned radio, staring at lava lamps). All you can try to do is make an actually predictable sequence that looks random. Unfortunately, it’s easy to fool the human brain into thinking something’s random, or into spotting patterns where there are none, so you have to be really careful in how you design this "pseudo-randomness".
A relevant example of how the human brain perceives randomness is the fact that with only 23 people in a room, there’s a 50% chance that two of them will have the same birthday. This is so surprising that it’s often referred to as the "birthday paradox", even though it’s mathematically quite straight-forward. The clumps of letters in your word game are very similar.
What Changed?
For a long time, PHP had two different "pseudo-random number generators":
rand()
, which had a bunch of problems.mt_rand()
, which was generally better in all respects.In PHP 7.1, a series of changes were made to basically eliminate the
rand()
version and use the Mersenne Twister algorithm everywhere. This includedshuffle
.None of the rest of the algorithm changed: it was always intended to put the elements into a random order regardless of their starting positions, so your ability to affect it by clumping the starting positions was a bug, not a feature.
How random was shuffle in the different versions?
I wrote a quick script to test if
shuffle
had any biases:The result is a 100 x 100 grid showing how often, for instance, element 0 ends up in position 42.
If the shuffling is truly random, we should expect that as we repeat the shuffle enough times, every element ends up in every position just as often. In other words, our grid should have the same count for every combination.
I ran the same code on PHP 7.0 (when
shuffle()
used the oldrand()
implementation) and 7.1 (when it used themt_rand()
implementation) and visualised the results (script also included in the gist). The columns of the image represent the 100 elements in the array, and the rows the different positions they could end up in. Blue pixels represent combinations that happened less often than expected, and red pixels combinations that happened more often than expected.For PHP 7.1, the image looks mostly black, because the numbers were pretty close to perfectly equal across all combinations:
For PHP 7.0, however, there are some very visible "hot" and "cold" areas!
So in the old algorithm, there are certain swaps that were much less likely to happen. These biases presumably happened to fit with the order you placed the items in to make it less likely than it should have been for certain letters to occur at certain times in the game.
What if you want to bias your shuffle?
The new algorithm should more accurately reflect what would happen in a real game – all the letters have an equal chance of appearing at any time in the game.
But maybe you find it more fun to "rig" the order, so that you don’t get the same letter several times in a row, or you’re likely to have a mix of consonants and vowels.
You could achieve that by re-implementing the shuffle algorithm, and then tweaking the probabilities of certain things happening.
The core of the algorithm is this (essentially what changed in 7.1 was the definition of
RAND_RANGE
):This picks items at random one by one, as you’d expect (but from last to first, because it happens to be easier), but to save re-allocating the array each time you "remove" an item, it uses a neat trick: it swaps the current element of the array with a random element earlier in the array.
In PHP code, it would look something like this (untested!):
To add bias to this, you just need to change the way
$rnd_idx
is chosen. You could for instance say that if the chosen letter ($array[$rnd_idx]
) is the same as the previously chosen one, pick again. You could have a whole function deciding how "good" a pick it is, and introducing bias that way.Since "fun" is subjective, not mathematically definable, you’d have to just test a bunch of things and see which one you liked the results of.