Random numbers

A few years ago, I was reading through some code handed to us by the teacher at university, and one line caught my attention. It was a very compact way to generate random numbers, using something I hadn’t seen before: Mersenne Twisters!

What I had heard before was not to use the “default” way of generating random numbers in many of the popular programming languages. For Python, C/C++, JavaScript, and Go, the random numbers aren’t recommended for secure applications, and often not at all. Why is that?

The simple answer is that random numbers are mathematically complex, and many random number implementations are flawed in one way or another. And not every developer’s needs are the same, so a seemingly random number could be all that is needed for some.

The much worse way

Using C++’s standard rand function to generate random numbers is an old-time classic; every developer has done this at least once in their lifetime (in whatever language they started with), and it’s not a great way to generate a random number…

In the case of C languages, the numbers turn out to be very bad compared to some more modern ones like Rust, and you will see why soon. But for now, here’s an example:

#include <iostream>
#include <cstdlib>
#include <ctime>
int main()
{
for (int i = 0; i < 100; i++)
{
// Technically this line should be outside of the loop for the best random numbers, but to see how it performs in the worst case it's kept inside
srand(time(NULL) + i);
int dice = (rand() % 6) + 1;
std::cout << "Your dice throw: " << dice << std::endl;
}
return 0;
}

With the code above, we will get numbers like 1, 3, 4, 1, 3, 4, 6, 3, 4, 6, 3, 4, 6, 1, 4, 6, 1, 4, etc. printed. Notice here that a different seed is being used for every new dice throw, i.e., the random function is being reseeded for every throw.

This should give completely new and unpredictable numbers every time, right… Right?

It doesn’t, not even close. The rand function in C is one of the simplest algorithms, and has existed since 1958: a Linear Congruential Generator, or LCG. These do not play nice with the last few bits of the generated number.

In the example code, I tried my best to show the worst side of the rand function, and I’m feeling kind of bad about it. The rand function doesn’t deserve all the hate I’m giving it; when seeded only once, it’s quite useful, as long as the lowest bits aren’t needed and the starting seed is properly chosen. This does not make it a good choice; it’s just not as bad when used correctly.

The slightly better way

Since C++11, there has instead been a more robust random number generator that I did not hear about until a few years ago. Looking more deeply into why this new class exists shows that it has strong statistical properties that should almost always be used whenever high-quality random numbers are needed.

#include <iostream>
#include <random>
int main()
{
for (int i = 0; i < 100; i++)
{
// Technically this line should be outside of the loop for the best random numbers, but to see how it performs in the worst case it's kept inside
std::mt19937 twister{ std::random_device{}() };
std::uniform_int_distribution distribute{ 1, 6 };
std::cout << distribute(twister) << std::endl;
}
return 0;
}

With the same torture test above with the Mersenne Twister, we will instead get something like 6, 6, 3, 2, 2, 4, 5, 3, 5, 6, 4, 4, 3, 4, 3, 5, 6. This is much better randomness, even at lower bit counts, and it doesn’t break down even when the constructor is abused like this.

Why even care?

But then, why even care? It’s not like you will be going around needing random numbers all the time. And that is also true, but would you rather have results that look like the left side or the right side when picking a “random” number?

The left image shows LCG-low-bit issues; the right image shows the benefit of high-quality random numbers, in this case a Mersenne Twister.



Leave a comment