I'm going to introduce you to my favorite math puzzle. It's a doozy, and I hope you'll find it as intriguing as I do. And maybe a bit more intuitive than I did when I first encountered it.
Let’s play a game
For the purposes of the puzzle, let's say you and nine of your friends (10 people total) are offered a chance to play a game. You'll all line up and be numbered in order, from 1 to 10. In front of the line is a closed door, leading to a room that you'll each go into one at a time. In that room, the game master tells you, there are ten boxes. The boxes are lined up in order, also labeled 1-10 on the outside like you and your friends.
Okay, but here's the twist. Each box also contains a number on the inside. The game master informs you that each box contains a number between 1 and 10, and no two boxes contain the same number. In other words, every number between 1 and 10 is inside exactly one box. The numbers inside the boxes have been assigned randomly, however, so knowing the outside number tells you exactly nothing about the inside number.
Each of you will go into the room and open 5 boxes of your choice. If all of you find the box that contains your number on the inside, you all win the game. If even one person doesn't find it, you all lose. Oh, and you have to close the boxes after you're done checking them out, and after you're done trying you go into a further room and can't talk to anyone else until the game is over.
Knowing the rules of the game, what strategy should you and your friends agree on to make it as likely as possible that you win?
A shot in the dark
The first thing you might try, if you had absolutely no idea what to do, would be guessing at random. After all, the numbers in the boxes are assigned at random. Can you really hope to do better than random guessing?
If each player opens half the boxes at random, then each player has a ½ probability of finding their number. (Do you see why?) Since the players' choices are independent of each other (no player's choices can influence any others in any way), the overall probability of success is just ½ times itself 10 times. In other words: 2-10.
That's pretty tiny! 210 is 1024, so the probability of winning using this strategy is only about 0.1%. Even worse, if you add more players and more boxes, the probability of victory dips even lower. If you play with 20 people and 20 boxes, and each person gets to open 10 of them, the probability of winning using this method drops all the way to 2-20 or about 0.001%, a literally one in a million chance.
If you’d like to think more about this problem before seeing the solution (the best possible strategy in fact!), now is your moment. If not, continue reading.
Going in circles cycles
We’re going to use some of the only information we have at our disposal. At first, when a person goes into the room, they only know their number. So, since every box is equally likely to contain their number as every other box, why not start with the box that’s labeled on the outside with their number?
At first, there isn’t any reason to believe that will work particularly well. There’s still only a 1/10 chance that their number is inside that box. But we can iterate this strategy. Let’s say the person is person number 4 and finds the number 2 in their box, like in the image above. Okay. So next, they can open the box that’s labeled 2 on the outside, and they can keep doing that until they’ve opened the 5 boxes they’re allowed to.
Concretely, this might play out by the person finding 2 in their first box, opening the box labeled 2 and finding 1, then opening the one labeled 1 and finding 4. Aha, good luck! I generated the box assignment genuinely randomly, so it was by luck that this person found their number. (Can you find if any of the 10 people wouldn’t find their number in 5 steps of this process?)
What’re the odds?
How likely is the cycle-following strategy to work for all 10 people?
Well, first of all, let’s look at what all the cycles are. So we’ll take the same arrangement of numbers in boxes from before and visualize them as cycles.
We have 1 → 4 because if we open the box labeled 1 outside, we find 4 on the inside. And 4 → 2 because if we open the box labeled 4 on the outside, we find 2, and so on. 8 loops to itself because the box labeled 8 contains the number 8.
(If you know about permutations, by the way, you’ll probably recognize that we’re dealing with a permutation of the numbers 1 through 10, and if you know about cycles of permutations, you’ll see that we’re just mapping out the cycle structure. We’re gonna mostly avoid thinking about it that way in this post, to avoid extra prerequisites.)
Now, here’s the thing. Anyone whose number appears in a cycle of length 5 or shorter will find their number using the cycle-following method! And, conveniently, anyone whose number appears in a cycle of length more than 5 (so at least 6) will not find their number using this method!
So when we ask, what is the chance that everyone finds their number, we’re just asking: What’s the probability that all the cycles have length 5 or shorter? We’re now going to answer that question.
Remember that since we’re choosing arrangements of numbers in boxes uniformly at random - in other words, every arrangement is as likely as every other - the probability of something happening is just the fraction of arrangements where that things happens. So we’re asking what fraction of arrangements of numbers in boxes have no cycles longer than 5?
The number of total number arrangements when we have n boxes is given (as you might already know) by the following formula:
Since we have 10 boxes, that formula gives us
Of these, we want to know how many have no cycles longer than 5. Instead of counting that directly, it turns out to be easier to count how many arrangements have a longest cycle of length 6, then how many have a longest cycle of length 7, all the way up to a longest cycle of length 10. Let’s start with counting arrangements with the longest cycle of length 6; the same approach works for all of them.
First of all, there can only be one cycle of length at least 6, because cycles don’t overlap and there are only 10 numbers total. So basically we want to pick 6 of the 10 numbers to put in the cycle, and decide how to link them up to each other, and then we can do whatever we want with the rest of the numbers - as long as there’s a cycle of length 6, the remaining 4 numbers that aren’t in it can be arranged however we want.
There’s a formula for the number of ways of picking 6 items out of 10, which you may also know. It is:
which is a special case of the general for picking k items out of n:
That tells us how many ways there are to pick the 6 numbers we’re gonna put in the cycle. How about how many ways to arrange those in a cycle? The answer to that is that we’re basically ordering those 6 elements - meaning there are 6! ways to do that - but then every cycle corresponds to 6 different orderings, because in the cycle which number goes “first” doesn’t actually matter. Check out this illustration to see how it works in a specific case:
All of the orderings on the left give the same cycle, because unlike a sequence a cycle has no beginning and no ending - all that matters is which numbers are next to which other numbers.
So that means we get a total of 5! = (6!)/(6) ways of arranging those 6 items into a cycle. And as for the rest? Well we can arrange them however we want, and there are 4 items, so we end up with 4! possible arrangements of those items.
Combining everything, we’ve figured out that the total number of arrangements with a cycle of length exactly 6 is:
Whoa! There’re exactly 10! arrangements total, so the fraction of them that have a cycle of length exactly 6 is… exactly ⅙. It turns out the answer for 7 is 1/7 and the answer for 8 is ⅛ and so on all the way up to 1/10.
Therefore, the total fraction of arrangements that have a cycle of length at least 6 is:
When there’s a cycle of length at least 6, the players lose, so the probability of winning is
That means about a 35% chance of winning, which is a massive, massive improvement over the 0.1% we were looking at with random guessing.
Take it to the limit
At this point you might be wondering what happens if I play the game with more players. Did I use some trick that only applies to 10? Does the game get harder with more players and more boxes? In this section, I’m going to answer those questions.
In the interest of time and space, I’m going to move through the argument quickly here. Following this part might require some more comfort with combinatorics than we’ve used so far. So let’s put the answer right up front and leave the details for after.
If we play the game with an even number n of players and boxes, and each player gets to open n/2 of those boxes, then the cycle-following strategy wins with this probability pn:
In other words, about a 30% probability of winning no matter how large n gets!
The reason for this is that again there are n! total arrangements of numbers in boxes (corresponding to permutations). We have to count for every k > (n/2) how many permutations have a longest cycle of length k, and there’s always only one such cycle. Just like above we get that the total number of such permutations is
Which gives us a 1/k. That means the total probability of losing is
And here’s where things get interesting. There’s a special series you might know about known as the harmonic series
So the probability of losing is
However, Hn for large n is approximately (up to an additive error that goes to 0 for large n) equal to
where γ is a constant that doesn’t matter for our purposes. What matters is that we end up with
And thus the probability of winning is