A math puzzle (for fun!)

brain teaser mathematics
By Max R.

I'm going to introduce you to my favorite math puzzle. It's a doozy, and I hope you'll find it as intriguing as I do. And maybe a bit more intuitive than I did when I first encountered it. 

Let’s play a game

For the purposes of the puzzle, let's say you and nine of your friends (10 people total) are offered a chance to play a game. You'll all line up and be numbered in order, from 1 to 10. In front of the line is a closed door, leading to a room that you'll each go into one at a time. In that room, the game master tells you, there are ten boxes. The boxes are lined up in order, also labeled 1-10 on the outside like you and your friends.

Screen Shot 2023-03-08 at 1.11.09 PM

Okay, but here's the twist. Each box also contains a number on the inside. The game master informs you that each box contains a number between 1 and 10, and no two boxes contain the same number. In other words, every number between 1 and 10 is inside exactly one box. The numbers inside the boxes have been assigned randomly, however, so knowing the outside number tells you exactly nothing about the inside number.

Each of you will go into the room and open 5 boxes of your choice. If all of you find the box that contains your number on the inside, you all win the game. If even one person doesn't find it, you all lose. Oh, and you have to close the boxes after you're done checking them out, and after you're done trying you go into a further room and can't talk to anyone else until the game is over.

Screen Shot 2023-03-08 at 1.11.26 PM

Knowing the rules of the game, what strategy should you and your friends agree on to make it as likely as possible that you win?

A shot in the dark

The first thing you might try, if you had absolutely no idea what to do, would be guessing at random. After all, the numbers in the boxes are assigned at random. Can you really hope to do better than random guessing?

If each player opens half the boxes at random, then each player has a ½ probability of finding their number. (Do you see why?) Since the players' choices are independent of each other (no player's choices can influence any others in any way), the overall probability of success is just ½ times itself 10 times. In other words: 2-10

That's pretty tiny! 210 is 1024, so the probability of winning using this strategy is only about 0.1%. Even worse, if you add more players and more boxes, the probability of victory dips even lower. If you play with 20 people and 20 boxes, and each person gets to open 10 of them, the probability of winning using this method drops all the way to 2-20 or about 0.001%, a literally one in a million chance. 

If you’d like to think more about this problem before seeing the solution (the best possible strategy in fact!), now is your moment. If not, continue reading.

Going in circles cycles

We’re going to use some of the only information we have at our disposal. At first, when a person goes into the room, they only know their number. So, since every box is equally likely to contain their number as every other box, why not start with the box that’s labeled on the outside with their number?

Screen Shot 2023-03-08 at 1.11.33 PM

At first, there isn’t any reason to believe that will work particularly well. There’s still only a 1/10 chance that their number is inside that box. But we can iterate this strategy. Let’s say the person is person number 4 and finds the number 2 in their box, like in the image above. Okay. So next, they can open the box that’s labeled 2 on the outside, and they can keep doing that until they’ve opened the 5 boxes they’re allowed to.

Screen Shot 2023-03-08 at 1.11.41 PM

Concretely, this might play out by the person finding 2 in their first box, opening the box labeled 2 and finding 1, then opening the one labeled 1 and finding 4. Aha, good luck! I generated the box assignment genuinely randomly, so it was by luck that this person found their number. (Can you find if any of the 10 people wouldn’t find their number in 5 steps of this process?)

Screen Shot 2023-03-08 at 1.11.52 PM

What’re the odds?

How likely is the cycle-following strategy to work for all 10 people? 

Well, first of all, let’s look at what all the cycles are. So we’ll take the same arrangement of numbers in boxes from before and visualize them as cycles.

Screen Shot 2023-03-08 at 1.11.58 PM

We have 1 → 4 because if we open the box labeled 1 outside, we find 4 on the inside. And 4 → 2 because if we open the box labeled 4 on the outside, we find 2, and so on. 8 loops to itself because the box labeled 8 contains the number 8. 

(If you know about permutations, by the way, you’ll probably recognize that we’re dealing with a permutation of the numbers 1 through 10, and if you know about cycles of permutations, you’ll see that we’re just mapping out the cycle structure. We’re gonna mostly avoid thinking about it that way in this post, to avoid extra prerequisites.)

Now, here’s the thing. Anyone whose number appears in a cycle of length 5 or shorter will find their number using the cycle-following method! And, conveniently, anyone whose number appears in a cycle of length more than 5 (so at least 6) will not find their number using this method!

So when we ask, what is the chance that everyone finds their number, we’re just asking: What’s the probability that all the cycles have length 5 or shorter? We’re now going to answer that question.

Remember that since we’re choosing arrangements of numbers in boxes uniformly at random - in other words, every arrangement is as likely as every other - the probability of something happening is just the fraction of arrangements where that things happens. So we’re asking what fraction of arrangements of numbers in boxes have no cycles longer than 5?

The number of total number arrangements when we have n boxes is given (as you might already know) by the following formula:

Screen Shot 2023-03-08 at 1.12.05 PM

Since we have 10 boxes, that formula gives us 

Screen Shot 2023-03-08 at 1.12.10 PM

Of these, we want to know how many have no cycles longer than 5. Instead of counting that directly, it turns out to be easier to count how many arrangements have a longest cycle of length 6, then how many have a longest cycle of length 7, all the way up to a longest cycle of length 10. Let’s start with counting arrangements with the longest cycle of length 6; the same approach works for all of them.

First of all, there can only be one cycle of length at least 6, because cycles don’t overlap and there are only 10 numbers total. So basically we want to pick 6 of the 10 numbers to put in the cycle, and decide how to link them up to each other, and then we can do whatever we want with the rest of the numbers - as long as there’s a cycle of length 6, the remaining 4 numbers that aren’t in it can be arranged however we want.

There’s a formula for the number of ways of picking 6 items out of 10, which you may also know. It is:

Screen Shot 2023-03-08 at 1.12.14 PM

which is a special case of the general for picking k items out of n:

Screen Shot 2023-03-08 at 1.12.50 PM

That tells us how many ways there are to pick the 6 numbers we’re gonna put in the cycle. How about how many ways to arrange those in a cycle? The answer to that is that we’re basically ordering those 6 elements - meaning there are 6! ways to do that - but then every cycle corresponds to 6 different orderings, because in the cycle which number goes “first” doesn’t actually matter. Check out this illustration to see how it works in a specific case:

Screen Shot 2023-03-08 at 1.12.22 PM

All of the orderings on the left give the same cycle, because unlike a sequence a cycle has no beginning and no ending - all that matters is which numbers are next to which other numbers.

So that means we get a total of 5! = (6!)/(6) ways of arranging those 6 items into a cycle. And as for the rest? Well we can arrange them however we want, and there are 4 items, so we end up with 4! possible arrangements of those items. 

Combining everything, we’ve figured out that the total number of arrangements with a cycle of length exactly 6 is:

Screen Shot 2023-03-08 at 1.12.32 PM

Whoa! There’re exactly 10! arrangements total, so the fraction of them that have a cycle of length exactly 6 is… exactly ⅙. It turns out the answer for 7 is 1/7 and the answer for 8 is ⅛ and so on all the way up to 1/10. 

Therefore, the total fraction of arrangements that have a cycle of length at least 6 is:

Screen Shot 2023-03-08 at 1.12.38 PM

When there’s a cycle of length at least 6, the players lose, so the probability of winning is 

Screen Shot 2023-03-08 at 1.12.41 PM

That means about a 35% chance of winning, which is a massive, massive improvement over the 0.1% we were looking at with random guessing.

Take it to the limit

At this point you might be wondering what happens if I play the game with more players. Did I use some trick that only applies to 10? Does the game get harder with more players and more boxes? In this section, I’m going to answer those questions.

In the interest of time and space, I’m going to move through the argument quickly here. Following this part might require some more comfort with combinatorics than we’ve used so far. So let’s put the answer right up front and leave the details for after. 

If we play the game with an even number n of players and boxes, and each player gets to open n/2 of those boxes, then the cycle-following strategy wins with this probability pn:

Screen Shot 2023-03-08 at 1.12.46 PM

In other words, about a 30% probability of winning no matter how large n gets!

The reason for this is that again there are n! total arrangements of numbers in boxes (corresponding to permutations). We have to count for every k > (n/2) how many permutations have a longest cycle of length k, and there’s always only one such cycle. Just like above we get that the total number of such permutations is

Screen Shot 2023-03-08 at 1.12.50 PM

Which gives us a 1/k. That means the total probability of losing is

Screen Shot 2023-03-08 at 1.12.54 PM

And here’s where things get interesting. There’s a special series you might know about known as the harmonic series

 

Screen Shot 2023-03-08 at 1.12.57 PM

 

So the probability of losing is 

Screen Shot 2023-03-08 at 1.13.00 PM

However, Hn for large n is approximately (up to an additive error that goes to 0 for large n) equal to

Screen Shot 2023-03-08 at 1.13.03 PM

where γ is a constant that doesn’t matter for our purposes. What matters is that we end up with

Screen Shot 2023-03-08 at 1.13.07 PM

And thus the probability of winning is 

Screen Shot 2023-03-08 at 1.13.10 PM

 

Max holds a PhD in Computer Science from UC Berkeley, prior to which he received his AB in Mathematics from Princeton University. He has done research across multiple areas of machine learning and statistics, notably natural language processing, fine tuning for neural networks, probabilistic modeling, and statistical theory.

Comments

topicTopics
academics study skills MCAT medical school admissions SAT expository writing college admissions English MD/PhD admissions strategy writing LSAT GMAT GRE physics chemistry math biology graduate admissions academic advice ACT interview prep law school admissions test anxiety language learning premed MBA admissions career advice personal statements homework help AP exams creative writing MD study schedules test prep computer science Common Application summer activities history mathematics philosophy organic chemistry secondary applications economics supplements research 1L PSAT admissions coaching grammar law psychology statistics & probability legal studies ESL CARS SSAT covid-19 dental admissions logic games reading comprehension engineering USMLE calculus PhD admissions Spanish mentorship parents Latin biochemistry case coaching verbal reasoning DAT English literature STEM excel medical school political science skills AMCAS French Linguistics MBA coursework Tutoring Approaches academic integrity chinese letters of recommendation Anki DO Social Advocacy admissions advice algebra art history artificial intelligence astrophysics business cell biology classics diversity statement gap year genetics geometry kinematics linear algebra mechanical engineering mental health presentations quantitative reasoning study abroad technical interviews time management work and activities 2L DMD IB exams ISEE MD/PhD programs Sentence Correction adjusting to college algorithms amino acids analysis essay athletics business skills careers cold emails data science dental school finance first generation student functions graphing information sessions international students internships logic networking poetry resume revising science social sciences software engineering tech industry trigonometry writer's block 3L AAMC Academic Interest EMT FlexMed Fourier Series Greek Health Professional Shortage Area Italian Lagrange multipliers London MD vs PhD MMI Montessori National Health Service Corps Pythagorean Theorem Python Shakespeare Step 2 TMDSAS Taylor Series Truss Analysis Zoom acids and bases active learning architecture argumentative writing art art and design schools art portfolios bacteriology bibliographies biomedicine brain teaser campus visits cantonese capacitors capital markets central limit theorem centrifugal force chemical engineering chess chromatography class participation climate change clinical experience community service constitutional law consulting cover letters curriculum dementia demonstrated interest dimensional analysis distance learning econometrics electric engineering electricity and magnetism escape velocity evolution executive function freewriting genomics harmonics health policy history of medicine history of science hybrid vehicles hydrophobic effect ideal gas law immunology induction infinite institutional actions integrated reasoning intermolecular forces intern investing investment banking lab reports linear maps mandarin chinese matrices mba medical physics meiosis microeconomics mitosis mnemonics music music theory nervous system neurology neuroscience object-oriented programming office hours operating systems