How to make your own randomness

By Gavin

Suppose you end up in a foreign country which, though beautiful and with great cafes, has incredibly odd currency. Whenever you flip their coins, they don’t land heads 50% of the time and tails 50% of the time; diabolically, they land heads 90% of the time and tails 10% of the time. You’ve found yourself in the country of “Determinish”, and they do not like randomness. 

But let’s say you really wanted to flip a 50-50 coin. Could you “make” a 50-50 coin flip using a single one of their 90-10 coins?   

You can!  

Thinking about symmetry 

We’ll solve this for any biased coin with probability of heads 𝑝, with which we want to construct a fair, 50-50 coin. By construct, I mean design a process that, once we start it, says “Heads” 50% of the time, and “Tails” the other 50% of the time. That’s our abstracted version of “making” or “constructing” a 50-50 coin. Of course, there are a lot of ways to construct a fair coin like this, and there’s active research on functions called “randomness extractors” that generalize this idea of “constructing unbiased randomness from biased randomness”. Our solution is one that the computer scientist John von Neumann discovered.

A key issue with our biased coin is that Heads and Tails are asymmetric. If we swapped the sides of the coin, like changing the stickers on a Rubik’s cube, then we’d have a different coin. Instead of being Heads with probability 𝑝, it would be Heads with probability 1−𝑝. The only way for the coin to be the same when we swap the sides, that is, to be symmetric, is if 𝑝 = 1−𝑝, which means 𝑝 = (1/2).

If symmetry has some essential relationship with the 50-50 coin, maybe we can make lemonade out of biased-coin lemons, and create something symmetric from our biased coin.

Here’s the clever idea: if we flip the biased coin twice, then, because nothing about either flip influences the other (they are independent), we have symmetry across time. We could say “Heads then Tails” or “Tails then Heads” and no one would, statistically, know the difference. We can see this mathematically with the following probability table:

First flip    Second flip    Probability
Heads         Heads          𝑝 · 𝑝 = 𝑝²
Heads         Tails          𝑝(1−𝑝)
Tails         Heads          (1−𝑝)𝑝
Tails         Tails          (1−𝑝)(1−𝑝) = (1−𝑝)²

Writing it out, we have that 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) = 𝑝(1−𝑝) = (1−𝑝)𝑝 = 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠). So we have something symmetric! We can even see it visually, looking at the expression. How do we use it to construct the fair coin?  

First, we have to ask: is flipping twice completely symmetric?

Not quite. When 𝑝 ≠ (1/2), 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) = 𝑝² ≠ (1−𝑝)² = 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠), so no symmetry there.
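As a quick numerical sanity check, here’s a short Python snippet (my own illustration, not from the original post) that computes all four two-flip probabilities for Determinish’s 𝑝 = 0.9 coin. The mixed outcomes match; the doubled ones don’t:

```python
p = 0.9  # Determinish's coin: Heads with probability 0.9

# Probabilities of the four ordered outcomes of two independent flips
p_hh = p * p              # ~0.81
p_ht = p * (1 - p)        # ~0.09 -- symmetric...
p_th = (1 - p) * p        # ~0.09 -- ...with this one
p_tt = (1 - p) * (1 - p)  # ~0.01

print(p_ht == p_th)  # True: (Heads, Tails) and (Tails, Heads) are equally likely
print(p_hh == p_tt)  # False: (Heads, Heads) and (Tails, Tails) are not
```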

Let’s just forge ahead and imagine what an algorithm/process might look like for making our fair coin out of two flips, and then see if we can fix it up, knowing that some problems will arise. 

I’m thinking about our algorithm like this: 

Flip the coin twice.
If we see (Heads, Tails): the algorithm says “Heads”.
If we see (Tails, Heads): the algorithm says “Tails”.
If we see (Heads, Heads) or (Tails, Tails): the algorithm says nothing.

So right now, we have that 𝑃(𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 𝑠𝑎𝑦𝑠 𝐻𝑒𝑎𝑑𝑠) = 𝑃(𝐴𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚 𝑠𝑎𝑦𝑠 𝑇𝑎𝑖𝑙𝑠), which sort of sounds like our algorithm is a fair, 50-50 coin! But this relies on our algorithm saying nothing when it sees (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠). Our algorithm is a bit of a weird coin: sometimes it lands Heads, sometimes it lands Tails, and sometimes it just refuses to say either.

Our solution is to just keep flipping until it says something! We can write this out like a procedure that a person or a computer could follow. 

1. Flip the biased coin twice.
2. If the flips were (Heads, Tails), say “Heads” and stop.
3. If the flips were (Tails, Heads), say “Tails” and stop.
4. Otherwise, ignore the flips and return to step 1.
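Here’s a minimal Python sketch of this procedure (the function names are my own, and `random.random` is standing in for a physical biased coin):

```python
import random

def biased_flip(p):
    """Flip a coin that lands Heads with probability p."""
    return "Heads" if random.random() < p else "Tails"

def fair_flip(p):
    """Von Neumann's trick: extract one fair flip from a p-biased coin."""
    while True:
        first, second = biased_flip(p), biased_flip(p)
        if first != second:
            # (Heads, Tails) -> "Heads", (Tails, Heads) -> "Tails"
            return first
        # (Heads, Heads) or (Tails, Tails): say nothing and flip again
```

Tallying, say, 100,000 calls to `fair_flip(0.9)` should give “Heads” close to 50% of the time, even though every underlying flip is 90-10.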

Flipping forever? 

Notice that our algorithm might continue forever! We might keep getting (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠) forever, but of course, that would be incredibly unlikely. Though we can’t say beforehand exactly how many flips it will take, we can find the average (expected) number of flips it will take. We’ll use linearity of expectation and the law of total probability to break this expectation, 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠], into two terms.

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 𝑃(first pair is (𝐻,𝑇) or (𝑇,𝐻)) · 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 if first pair is (𝐻,𝑇) or (𝑇,𝐻)]
             + 𝑃(first pair is (𝐻,𝐻) or (𝑇,𝑇)) · 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 if first pair is (𝐻,𝐻) or (𝑇,𝑇)]

But of course, 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 if first pair is (𝐻,𝑇) or (𝑇,𝐻)] is straightforward, because our algorithm would say something and stop right after those two flips! So we know that 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 if first pair is (𝐻,𝑇) or (𝑇,𝐻)] = 2.

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 𝑃(first pair is (𝐻,𝑇) or (𝑇,𝐻)) · 2 + 𝑃(first pair is (𝐻,𝐻) or (𝑇,𝑇)) · 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 if first pair is (𝐻,𝐻) or (𝑇,𝑇)]

If we see (𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) or (𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠), then we start fresh, and nothing about our initial flips affects the results of subsequent flips, because they are independent. So we can write:

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠 if first pair is (𝐻,𝐻) or (𝑇,𝑇)] = 2 + 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠]

Statisticians call processes like this memoryless. Note that it’s easy to tie yourself into knots thinking about independence here. The knot is this question: if I see (𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠), then I won’t flip again, so doesn’t that mean the flips aren’t independent? The resolution is that we’re conflating two different events: whether we flip again, and the results of the next flips. The first pair of flips is not independent of whether we flip again, but it is independent of the results of the next flips.

Going back to our equation and using our probability table from earlier, our equation simplifies to  

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 2𝑝(1−𝑝) · 2 + (𝑝² + (1−𝑝)²) · (2 + 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠])

We can solve this like any other algebraic equation; 𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] is just a number, after all.

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] · (1 − 𝑝² − (1−𝑝)²) = 2 · 2𝑝(1−𝑝) + 2 · (𝑝² + (1−𝑝)²)

We can do some algebra to simplify the right hand side, or look closer and notice that 

2 · 2𝑝(1−𝑝) + 2 · (𝑝² + (1−𝑝)²) = 2 · (2𝑝(1−𝑝) + 𝑝² + (1−𝑝)²) = 2 · (𝑝 + (1−𝑝))² = 2

Likewise, on the left hand side we see that 1 − 𝑝² − (1−𝑝)² = 1 − 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝐻𝑒𝑎𝑑𝑠) − 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝑇𝑎𝑖𝑙𝑠) = 𝑃(𝐻𝑒𝑎𝑑𝑠, 𝑇𝑎𝑖𝑙𝑠) + 𝑃(𝑇𝑎𝑖𝑙𝑠, 𝐻𝑒𝑎𝑑𝑠) = 2𝑝(1−𝑝), leaving us with the nice expression:

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 2 / (2𝑝(1−𝑝)) = 1 / (𝑝(1−𝑝))

Following this expression, the expected number of flips is minimized when 𝑝 = (1/2), which makes sense: making a fair coin out of fair coins would be easiest. In our example where 𝑝 = (9/10) in the strange country “Determinish”, we have

𝐸[# 𝑜𝑓 𝑓𝑙𝑖𝑝𝑠] = 1 / ((9/10) · (1/10)) = 100/9 ≈ 11.1

It takes a lot of not-so-random coin flips to produce one “maximally random”, 50-50 coin flip!  
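If you’d like to check this number empirically, here’s a small simulation (again my own sketch, not from the original post) that counts how many biased flips each fair flip consumes:

```python
import random

def flips_for_one_fair_flip(p):
    """Count how many biased flips the algorithm uses before it says something."""
    flips = 0
    while True:
        first = random.random() < p
        second = random.random() < p
        flips += 2
        if first != second:
            return flips

p, trials = 0.9, 100_000
average = sum(flips_for_one_fair_flip(p) for _ in range(trials)) / trials
print(average)  # should be close to 1 / (0.9 * 0.1) = 100/9, about 11.1
```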

Conclusion

Making randomness extractors like this more efficient is a broad topic in computer science, but our simple algorithm for the biased coin demonstrates an intuition that is fundamentally true: we can add together multiple not-so-random pieces to make a very-random whole, and the more random the pieces, the fewer of them we need. If you’re interested in how this works mathematically, the concept of “entropy” in information theory and statistics is a great place to start.

Gavin holds a BA in Mathematics and Computer Science from Harvard College. After working as a software engineer in computational geometry at Markforged in Boston, he moved to Austria to work as a research assistant in quantitative social science at the Institute for Science and Technology Austria and Complexity Science Hub Vienna.
