How to make your own randomness

By Gavin

Suppose you end up in a foreign country which, though beautiful and with great cafes, has incredibly odd currency. Whenever you flip their coins, they don’t land heads 50% of the time and tails 50% of the time; diabolically, they land heads 90% of the time and tails 10% of the time. You’ve found yourself in the country of “Determinish”, and they do not like randomness. 

But let’s say you really wanted to flip a 50-50 coin. Could you “make” a 50-50 coin flip using a single one of their 90-10 coins?   

You can!  

Thinking about symmetry 

We’ll solve this for any biased coin with probability of heads 𝑝, with which we want to construct a fair, 50-50 coin. By “construct”, I mean design a process that, once we start it, says “Heads” 50% of the time and “Tails” the other 50% of the time. That’s our abstracted version of “making” or “constructing” a 50-50 coin. Of course, there are many ways to construct a fair coin like this, and there’s active research on functions called “randomness extractors” that generalize this idea of constructing unbiased randomness from biased randomness. Our solution is one that the computer scientist John von Neumann discovered.

A key issue with our biased coin is that Heads and Tails are asymmetric. If we swapped the sides of the coin, like swapping the stickers on a Rubik’s cube, we’d have a different coin: instead of landing Heads with probability 𝑝, it would land Heads with probability 1−𝑝. The only way for the coin to be the same when we swap the sides, that is, to be symmetric, is if 𝑝 = 1−𝑝, which means 𝑝 = 1/2.

If symmetry has some essential relationship with the 50-50 coin, maybe we can make lemonade out of biased-coin lemons, and create something symmetric from our biased coin.

Here’s the clever idea: if we flip the biased coin twice, nothing about either flip influences the other (they are independent), so we have symmetry across time. We could say “Heads then Tails” or “Tails then Heads” and no one would, statistically, know the difference. We can see this mathematically with the following probability table:

(First flip, second flip)    Probability
(𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠)               𝑝²
(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠)               𝑝(1−𝑝)
(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠)               (1−𝑝)𝑝
(𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠)               (1−𝑝)²

Writing it out, we have that 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) = 𝑝(1−𝑝) = (1−𝑝)𝑝 = 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠). So we have something symmetric! We can even see it visually, looking at the expression. How do we use it to construct the fair coin?  
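To make this concrete with the Determinish coin, where 𝑝 = 9/10:

𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) = (9/10)(1/10) = 9/100 = (1/10)(9/10) = 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠)

Even though a single flip is badly lopsided, the two mixed orderings are exactly equally likely.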

First, we have to ask: is flipping twice completely symmetric?

Not quite. When 𝑝 ≠ 1/2, 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) = 𝑝² ≠ (1−𝑝)² = 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠), so no symmetry there.
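With the Determinish coin the gap is dramatic: 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) = (9/10)² = 81/100, while 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠) = (1/10)² = 1/100.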

Let’s just forge ahead and imagine what an algorithm/process might look like for making our fair coin out of two flips, and then see if we can fix it up, knowing that some problems will arise. 

I’m thinking about our algorithm like this: 

Flip the biased coin twice.
- If the flips were (𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠), say “Heads”.
- If the flips were (𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠), say “Tails”.
- Otherwise, say nothing.

So right now, we have that 𝑃(𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 𝑠𝑎𝑦𝑠 𝐻𝑒𝑎𝑑𝑠) = 𝑃(𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 𝑠𝑎𝑦𝑠 𝑇𝑎𝑖𝑙𝑠), which sort of sounds like our algorithm is a fair, 50-50 coin! But this relies on our algorithm saying nothing when it sees (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠). Our algorithm is a bit of a weird coin: sometimes it lands Heads, sometimes it lands Tails, and sometimes it just refuses to say either.
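Two quick checks on this weird coin, again with 𝑝 = 9/10. First, conditioned on the algorithm saying anything at all, the answer really is 50-50: 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) / (𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) + 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠)) = 𝑝(1−𝑝) / 2𝑝(1−𝑝) = 1/2, no matter what 𝑝 is. Second, the silence is common: the algorithm says nothing with probability 𝑝² + (1−𝑝)² = 81/100 + 1/100 = 82/100, so 82% of double-flips get discarded.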

Our solution is to just keep flipping until it says something! We can write this out like a procedure that a person or a computer could follow. 

1. Flip the coin twice.
2. If the flips were (𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠), say “Heads” and stop.
3. If the flips were (𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠), say “Tails” and stop.
4. Otherwise, go back to step 1.
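Here’s what that procedure might look like in code. This is a minimal sketch in Python; the function names (biased_flip, fair_flip) and the use of Python’s random module are my own choices, not part of von Neumann’s original recipe:

```python
import random

def biased_flip(p=0.9):
    """One flip of the biased coin: 'H' with probability p, 'T' otherwise.

    p = 0.9 matches the Determinish coin, which lands Heads 90% of the time.
    """
    return "H" if random.random() < p else "T"

def fair_flip(p=0.9):
    """Von Neumann's procedure: flip twice, and only accept unequal pairs.

    (H,T) and (T,H) are equally likely, so conditioned on stopping,
    the output is 50-50 for any p strictly between 0 and 1.
    """
    while True:
        first, second = biased_flip(p), biased_flip(p)
        if first != second:
            return "Heads" if first == "H" else "Tails"

# Sanity check: the outputs should split roughly 50-50.
results = [fair_flip() for _ in range(10_000)]
print(results.count("Heads") / len(results))  # prints something near 0.5
```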

Flipping forever? 

Notice that our algorithm might continue forever! We might keep getting (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠) forever, but of course, that would be incredibly unlikely. Though we can’t say beforehand exactly how many flips it will take, we can find the average (the expected value) of how many flips it will take. We’ll use linearity of expectation and the law of total probability to break this expectation, 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠], into two terms.

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 𝑃(𝑓𝑖𝑟𝑠𝑡 (𝐻,𝑇), (𝑇,𝐻)) ⋅ 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝑇), (𝑇,𝐻)] + 𝑃(𝑓𝑖𝑟𝑠𝑡 (𝐻,𝐻), (𝑇,𝑇)) ⋅ 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝐻), (𝑇,𝑇)]

But of course, 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝑇), (𝑇,𝐻)] is straightforward, because our algorithm would say something and stop! So we know that 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝑇), (𝑇,𝐻)] = 2.

If we see (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠), then we start fresh, and nothing about our initial flips affects the results of subsequent flips, because they are independent. So we can write:

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝐻), (𝑇,𝑇)] = 2 + 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠]

Statisticians call processes like this memoryless. Note that it’s easy to tie yourself into knots thinking about independence here. The knot is this question: if I see (𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠), then I won’t flip again, so doesn’t that mean the flips aren’t independent? The confusion comes from combining two different things into one: whether we flip again, and the results of the next flips. The first two coin flips are not independent of whether we flip again, but they are independent of the results of any later flips.

Going back to our equation and plugging in the probability table from earlier, it simplifies to

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = (𝑝(1−𝑝) + (1−𝑝)𝑝) ⋅ 2 + (𝑝² + (1−𝑝)²) ⋅ (2 + 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠])

We can solve this like any other algebraic equation; 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] is just a number, after all.

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] ⋅ (1 − 𝑝² − (1−𝑝)²) = 4𝑝(1−𝑝) + 2(𝑝² + (1−𝑝)²)

We can do some algebra to simplify the right-hand side, or look closer and notice that

4𝑝(1−𝑝) + 2(𝑝² + (1−𝑝)²) = 2(𝑝² + 2𝑝(1−𝑝) + (1−𝑝)²) = 2(𝑝 + (1−𝑝))² = 2

Likewise, on the left-hand side we see that 1 − 𝑝² − (1−𝑝)² = 1 − 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) − 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠) = 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) + 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠), leaving us with the nice expression:

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 2 / (𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) + 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠)) = 2 / (2𝑝(1−𝑝)) = 1 / (𝑝(1−𝑝))

Following this expression, the number of flips is minimized when 𝑝 = 1/2, which makes sense: making a fair coin out of fair coins would be easiest. In our example from the strange country “Determinish”, where the coin lands Heads with probability 𝑝 = 9/10, we have

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 1 / ((9/10) ⋅ (1/10)) = 100/9 ≈ 11.1

It takes a lot of not-so-random coin flips to produce one “maximally random”, 50-50 coin flip!  
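That ≈ 11.1 figure is easy to check empirically. Here’s a quick Monte Carlo sketch (again in Python, with made-up names) that counts how many biased flips each fair flip consumes:

```python
import random

def flips_for_one_fair_result(p=0.9):
    """Run von Neumann's procedure once; return how many biased flips it used."""
    count = 0
    while True:
        first = random.random() < p    # True means Heads
        second = random.random() < p
        count += 2
        if first != second:            # (H,T) or (T,H): the procedure stops
            return count

trials = 100_000
average = sum(flips_for_one_fair_result() for _ in range(trials)) / trials
print(average)  # hovers around 100/9, roughly 11.1
```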

Conclusion

Making randomness extractors like this more efficient is a broad topic in computer science, but our simple algorithm for the biased coin demonstrates an intuition that is fundamentally true: we can add together multiple not-so-random pieces to make a very-random whole, and the more random the pieces, the fewer of them we need. If you’re interested in how this works mathematically, the concept of “entropy” in information theory and statistics is a great place to start.
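As a teaser for that rabbit hole: entropy puts a number on how much randomness one Determinish flip actually carries, about 0.47 bits, versus a full 1 bit for a fair coin. In an idealized, asymptotic sense, a perfect extractor would need only about 1/0.47 ≈ 2.1 biased flips per fair bit, far fewer than the ≈ 11.1 our simple algorithm uses. A small sketch of that computation (this framing is mine, not part of the derivation above):

```python
import math

def entropy_bits(p):
    """Shannon entropy of a p-biased coin, in bits per flip."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.9
print(entropy_bits(p))      # ~0.469 bits of randomness per biased flip
print(1 / entropy_bits(p))  # ~2.13 flips per fair bit, the idealized limit
print(1 / (p * (1 - p)))    # ~11.1 flips per fair bit with von Neumann's trick
```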

Gavin holds a BA in Mathematics and Computer Science from Harvard College. After working as a software engineer in computational geometry at Markforged in Boston, he moved to Austria to work as a research assistant in quantitative social science at the Institute for Science and Technology Austria and Complexity Science Hub Vienna.
