Suppose you end up in a foreign country which, though beautiful and with great cafes, has incredibly odd currency. Whenever you flip their coins, they don’t land heads 50% of the time and tails 50% of the time; diabolically, they land heads 90% of the time and tails 10% of the time. You’ve found yourself in the country of “Determinish”, and they do not like randomness.
But let’s say you really wanted to flip a 50-50 coin. Could you “make” a 50-50 coin flip using a single one of their 90-10 coins?
You can!
Thinking about symmetry
We’ll solve this for any biased coin with probability of heads 𝑝, with which we want to construct a fair, 50-50 coin. By construct, I mean design a process that, once we start it, says “Heads” 50% of the time, and “Tails” the other 50% of the time. That’s our abstracted version of “making” or “constructing” a 50-50 coin. Of course, there are a lot of ways to construct a fair coin like this, and there’s active research on functions called “randomness extractors” that generalize this idea of “constructing unbiased randomness from biased randomness”. Our solution is one that the mathematician and computer scientist John von Neumann discovered.
A key issue with our biased coin is that Heads and Tails are asymmetric. If we swapped the sides of the coin, like swapping the stickers on a Rubik’s cube, we’d have a different coin: instead of being Heads with probability 𝑝, it would be Heads with probability 1−𝑝. The only way for the coin to be the same when we swap the sides, that is, to be symmetric, is if 𝑝 = 1−𝑝, which happens exactly when 𝑝 = (1/2).
If symmetry has some essential relationship with the 50-50 coin, maybe we can make lemonade out of our biased-coin lemons, and create something symmetric from our biased coin.
Here’s the clever idea: if we flip the biased coin twice, nothing about either coin flip influences the other (they are independent), so we have symmetry across time. We could say “Heads then Tails” or “Tails then Heads” and no one would, statistically, know the difference. We can see this mathematically with the following probability table:

𝑃(𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) = 𝑝 ⋅ 𝑝 = 𝑝²
𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) = 𝑝(1−𝑝)
𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠) = (1−𝑝)𝑝
𝑃(𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠) = (1−𝑝)²
Writing it out, we have that 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) = 𝑝(1−𝑝) = (1−𝑝)𝑝 = 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠). So we have something symmetric! We can even see it visually, looking at the expression. (For the 90-10 Determinish coin, for example, both orders have probability 0.9 × 0.1 = 0.09.) How do we use it to construct the fair coin?
First, we have to ask: is flipping twice completely symmetric?
Not quite. When 𝑝 ≠ (1/2), 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) = 𝑝² ≠ (1−𝑝)² = 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠), so no symmetry there.
Let’s just forge ahead and imagine what an algorithm/process might look like for making our fair coin out of two flips, and then see if we can fix it up, knowing that some problems will arise.
I’m thinking about our algorithm like this: flip the coin twice; if the flips come up (𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠), say “Heads”; if they come up (𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠), say “Tails”; and if they come up (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠), say nothing.
So right now, we have that 𝑃(𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 𝑠𝑎𝑦𝑠 𝐻𝑒𝑎𝑑𝑠) = 𝑝(1−𝑝) = 𝑃(𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 𝑠𝑎𝑦𝑠 𝑇𝑎𝑖𝑙𝑠), which sort of sounds like our algorithm is a fair, 50-50 coin! But this relies on our algorithm saying nothing when it sees (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠). Our algorithm is a bit of a weird coin: sometimes it lands Heads, sometimes it lands Tails, and sometimes it just refuses to say either.
Our solution is to just keep flipping until it says something! We can write this out like a procedure that a person or a computer could follow.
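Here’s a minimal sketch of that procedure in Python. This is just one way to write it down: the names biased_flip and fair_flip are my own choices, and random.random() stands in for a flip of the Determinish coin.

```python
import random

def biased_flip(p, rng):
    """One flip of the biased coin: 'H' with probability p, else 'T'."""
    return 'H' if rng.random() < p else 'T'

def fair_flip(p, rng):
    """Von Neumann's trick: build a fair flip out of a p-biased coin.

    Flip twice: (H, T) means Heads, (T, H) means Tails, and a matching
    pair means we say nothing and flip twice again.
    """
    while True:
        first, second = biased_flip(p, rng), biased_flip(p, rng)
        if first == 'H' and second == 'T':
            return 'Heads'
        if first == 'T' and second == 'H':
            return 'Tails'
        # (H, H) or (T, T): refuse to answer, start over

rng = random.Random(42)
results = [fair_flip(0.9, rng) for _ in range(100_000)]
print(results.count('Heads') / len(results))  # ≈ 0.5, despite p = 0.9
```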
Flipping forever?
Notice that our algorithm might continue forever! We might keep getting (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠) forever, but of course, that would be incredibly unlikely. Though we can’t say beforehand exactly how many flips it will take, we can find the average (expected value) of how many flips it will take. We’ll use linearity of expectation and the law of total probability to break this expectation, 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠], into two terms:

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 𝑃(𝑓𝑖𝑟𝑠𝑡 (𝐻,𝑇) 𝑜𝑟 (𝑇,𝐻)) ⋅ 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝑇), (𝑇,𝐻)] + 𝑃(𝑓𝑖𝑟𝑠𝑡 (𝐻,𝐻) 𝑜𝑟 (𝑇,𝑇)) ⋅ 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝐻), (𝑇,𝑇)]
But of course, 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝑇), (𝑇,𝐻)] is straightforward, because our algorithm would say something and stop! So we know that 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝑇), (𝑇,𝐻)] = 2.
If we see (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠), then we start fresh, and there’s nothing about our initial flips that affects the result of subsequent flips because they are independent. So we can write:

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 𝑖𝑓 𝑓𝑖𝑟𝑠𝑡 (𝐻,𝐻), (𝑇,𝑇)] = 2 + 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠]
Statisticians call processes like this memoryless. Note that it’s easy to tie yourself into knots thinking about independence here. The knot is this question: if I see (𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠), then I won’t flip again, so doesn’t that mean the flips aren’t independent? The trouble is that this question combines two different events into one: whether we flip again, and the results of the next flips. The first two coin flips are not independent of whether we flip again, but they are independent of the results of any subsequent flips.
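A quick simulation can make the memorylessness concrete. Here’s a sketch (with illustrative names; rounds_until_decision counts pairs of flips until the algorithm speaks): among runs that have already survived two rejected rounds, the chance of surviving two more matches the unconditional chance of surviving two rounds.

```python
import random

def rounds_until_decision(p, rng):
    """Count rounds of two flips until a mismatched pair lets the algorithm speak."""
    rounds = 0
    while True:
        rounds += 1
        first, second = rng.random() < p, rng.random() < p
        if first != second:  # (H, T) or (T, H): the algorithm decides
            return rounds

rng = random.Random(0)
samples = [rounds_until_decision(0.9, rng) for _ in range(200_000)]

# Unconditional: P(survive 2 rounds). Conditional: P(survive 4 | survived 2).
past_two = sum(r > 2 for r in samples)
past_four = sum(r > 4 for r in samples)
print(past_two / len(samples), past_four / past_two)  # both ≈ 0.82² ≈ 0.672
```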
Going back to our equation and using our probability table from earlier, the equation simplifies to

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 2𝑝(1−𝑝) ⋅ 2 + (𝑝² + (1−𝑝)²) ⋅ (2 + 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠])
We can solve this like any other algebraic equation; 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] is just a number, after all. Expanding the right hand side and collecting the 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] terms on the left gives

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] ⋅ (1 − 𝑝² − (1−𝑝)²) = 4𝑝(1−𝑝) + 2(𝑝² + (1−𝑝)²)
We can do some algebra to simplify the right hand side, or look closer and notice that it is just twice the total probability of all four outcomes in our table:

4𝑝(1−𝑝) + 2(𝑝² + (1−𝑝)²) = 2(𝑝² + 2𝑝(1−𝑝) + (1−𝑝)²) = 2(𝑝 + (1−𝑝))² = 2
Likewise, on the left hand side we see that 1 − 𝑝² − (1−𝑝)² = 1 − 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) − 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠) = 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) + 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠) = 2𝑝(1−𝑝), leaving us with the nice expression:

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 2 / (2𝑝(1−𝑝)) = 1 / (𝑝(1−𝑝))
Following this expression, the number of flips is minimized when 𝑝 = (1/2), which makes sense: making a fair coin out of fair coins would be easiest. In our example from the strange country “Determinish”, where 𝑝 = (9/10), we have

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 1 / ((9/10)(1/10)) = 100/9 ≈ 11.1
It takes a lot of not-so-random coin flips to produce one “maximally random”, 50-50 coin flip!
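As a sanity check on that ≈ 11.1, here’s a small Monte Carlo sketch (same illustrative setup as before) that counts the flips our algorithm actually consumes:

```python
import random

def flips_until_answer(p, rng):
    """Count biased flips used before the algorithm says Heads or Tails."""
    flips = 0
    while True:
        first, second = rng.random() < p, rng.random() < p
        flips += 2
        if first != second:  # mismatched pair: the algorithm answers
            return flips

rng = random.Random(0)
p = 0.9  # the Determinish coin
trials = [flips_until_answer(p, rng) for _ in range(200_000)]
print(sum(trials) / len(trials))  # ≈ 1 / (p * (1 - p)) = 100/9 ≈ 11.1
```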
Conclusion
Making randomness extractors like this more efficient is a broad topic in computer science, but our simple algorithm for the biased coin demonstrates an intuition which is fundamentally true: we can add together multiple not-so-random pieces to make a very-random whole, and the more random the pieces, the fewer of them we need. If you’re interested in how this works mathematically, the concept of “entropy” in information theory and statistics is a great place to start.