A Simple Demonstration Which Does Not Clarify

When last we talked about the “clean sweep” of winning contestants coming from the same one of the four seats in Contestants Row for all six Items Up For Bid on The Price Is Right, we had established the pieces needed if we suppose this to be a binomial distribution problem. That is, we suppose that any given episode has a probability, p, of successfully having all six contestants from the same seat, and a probability 1 – p of failing to have all six contestants from the same seat. There are N episodes, and we are interested in the chance of x of them being clean sweeps. From the production schedule we know the number of episodes N is about 6,000. We supposed the probability of a clean sweep to be about p = 1/1000, on the assumption that the chance of winning isn’t any better or worse for any contestant. The probability of there not being a clean sweep is then 1 – p = 999/1000. And we expected x = 6 clean sweeps, while Drew Carey claimed there had been only 1.

The chance of finding x successes out of N attempts, according to the binomial distribution, is the probability of any one combination of x successes and N – x failures, which is equal to p^x * (1 – p)^(N – x), times the number of ways there are to select x items out of N candidates. Either of those is easy enough to calculate, up to the point where we try calculating it. Let’s start out by supposing x to be the expected 6, and later we’ll look at it being 1 or other numbers.
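To make that recipe concrete, here is a minimal Python sketch of it. Python is my choice rather than anything used in the original work, and the function names and the toy values N = 5 and p = 0.3 are made up for illustration. It checks the formula against a brute-force tally of every possible success-and-failure sequence, which is only practical because the toy N is tiny.

```python
import itertools
import math

def binomial_probability(x, N, p):
    """Chance of exactly x successes in N independent attempts."""
    one_configuration = p**x * (1 - p)**(N - x)   # any single arrangement
    ways_to_choose = math.comb(N, x)              # how many arrangements exist
    return ways_to_choose * one_configuration

def brute_force(x, N, p):
    """Add up every length-N sequence that has exactly x successes."""
    total = 0.0
    for outcome in itertools.product([True, False], repeat=N):
        if sum(outcome) == x:
            total += math.prod(p if success else 1 - p for success in outcome)
    return total

# Toy values, assumed purely for the check; not the game show numbers.
N_small, p_small = 5, 0.3
for x in range(N_small + 1):
    formula = binomial_probability(x, N_small, p_small)
    tally = brute_force(x, N_small, p_small)
    print(x, round(formula, 6), math.isclose(formula, tally))
```

The brute-force version would need to walk through 2^6000 sequences for the real problem, which is why the counting shortcut matters.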

If we work it out for x = 6 then the first of those terms is (1/1000)^6 * (999/1000)^(6000 – 6). That’s easy enough to have the calculator do, we’d very much like to think. Actually, the (1/1000) raised to the sixth power, that is, (1/1000) times (1/1000) times (1/1000) times (1/1000) times (1/1000) times (1/1000), might well make a calculator round the product off to zero, after which we can’t do anything. A scientific calculator or a computer running a mathematics package like Matlab (or Octave, its open-source imitator) or Maple would probably do better. That (999/1000) multiplied by itself 5,994 times isn’t so pleasant either. This has the potential to be really ugly.

Fortunately most scientific calculators would let you enter a number and a “raise to the power of” symbol, usually written with a button labelled either x^y or y^x. The order doesn’t matter, as I never remember straight whether I’m supposed to enter the lower or the upper number first. I suspect the calculator’s programmers didn’t either.

It does also leave the question of how people did this sort of problem before there were calculators. The obvious answer is, much more slowly, and only after thinking carefully about whether they wanted to do it at all. There’s sound reasoning behind this answer. Doing it by hand would be ridiculously long, not to mention tedious, and it’s almost certain we’d make an error somewhere along the line. I probably couldn’t enter it even on a calculator which lacked the “raise to the power of” symbol without making a typo somewhere along the line. But there were ways to do a problem like that, even before Matlab made it easy; in some later installment I’ll get to some of them.

Letting Matlab do the heavy lifting, we find out the probability of any one configuration of 6 clean sweeps out of 6,000 attempts is a modest 0.000 000 000 000 000 000 002 486 202 or thereabouts. That’s an intimidatingly tiny number, the kind where it’s tempting to ask whether it’s a number at all. To put it in physical terms that won’t help anyone understand how tiny that number is, it’s about how much of the distance between the Earth and Mars is spanned by a single atom. I told you that wouldn’t help, past showing that it’s a really tiny number. There had better be a lot of ways to pick 6 things out of 6,000.
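For anyone who would rather check that tiny number than take it on faith, here is a minimal Python sketch. Python is my substitution for Matlab, and the variable names are made up for illustration. It computes the single-configuration term directly and then again through logarithms. The logarithm route is one common way to dodge the rounding-to-zero worry; I am not claiming it is how the original figure was obtained.

```python
import math

p = 1 / 1000   # assumed chance of a clean sweep in any one episode
N = 6000       # approximate number of episodes
x = 6          # number of clean sweeps we are asking about

# Probability of one specific arrangement: x sweeps and N - x non-sweeps.
one_configuration = p**x * (1 - p)**(N - x)

# The same quantity by adding logarithms instead of multiplying tiny numbers.
# math.log1p(-p) is log(1 - p), computed accurately for small p.
log_value = x * math.log(p) + (N - x) * math.log1p(-p)
via_logs = math.exp(log_value)

print(f"{one_configuration:.6e}")
print(f"{via_logs:.6e}")
```

Ordinary double-precision arithmetic copes fine here; the logarithm version earns its keep only when the exponents get large enough to underflow.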

It turns out there are a lot of ways to pick 6 things out of 6,000. As worked out last time, to pick out six things in some particular order means we have 6,000 choices for the first item, then 5,999 choices for the second, then 5,998 for the third, then 5,997 for the fourth, then 5,996 choices for the fifth, and 5,995 choices for the sixth item. So there are 6,000 times 5,999 times 5,998 times 5,997 times 5,996 times 5,995 ways to select, in order, 6 episodes out of the 6,000 possible.

Since we don’t care what order they appear in (we can’t tell the difference if the clean sweep episodes were the first, fourth, hundredth and so on versus the hundredth, fourth, and first episodes) we have to divide this by the number of ways to arrange six things, which is 6 times 5 times 4 times 3 times 2 times 1. So, what is 6,000 x 5,999 x 5,998 x 5,997 x 5,996 x 5,995 divided by 6 x 5 x 4 x 3 x 2 x 1? A horribly big number we’re happy calculators can work out is what: it’s around 64,638,152,932,513,690,000. That’s a number almost as big as the first one was tiny. Multiply that really big number by that really small number and we have the probability of seeing 6 perfect sweeps out of 6,000: the probability is about 0.1607 or a touch under one in six.
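The counting argument translates nearly word for word into code. This is a sketch in Python (my choice of language, with illustrative variable names), using exact integer arithmetic for the count so nothing gets rounded until the final multiplication.

```python
import math

N = 6000
x = 6
p = 1 / 1000   # assumed chance of a clean sweep in any one episode

# Ordered selections: 6,000 choices, then 5,999, and so on for six picks.
ordered = 6000 * 5999 * 5998 * 5997 * 5996 * 5995

# Ways to arrange six things: 6 x 5 x 4 x 3 x 2 x 1.
arrangements = math.factorial(x)

# Unordered selections; the division is exact for integers.
ways = ordered // arrangements
print(ways == math.comb(N, x))   # the library's built-in count agrees

# Multiply the big count by the tiny single-configuration probability.
probability = ways * p**x * (1 - p)**(N - x)
print(f"{ways:.6e}")
print(round(probability, 4))
```

Python integers have no size limit, so the 20-digit count comes out exactly, not rounded to the sixteen or so digits a floating-point calculator keeps.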

That’s tolerably likely, particularly when we consider there are so many other things that might happen: 5 perfect sweeps out of 6,000, or 8 perfect sweeps, or 1 perfect sweep, or 1,000 perfect sweeps. We can figure out the likelihood of some of them, at least up until we get tired of ordering Matlab around. For example, if x = 7, then we have to evaluate the probability of any one arrangement of exactly seven clean sweeps, (1/1000)^7 x (999/1000)^(6000 – 7), times the ways to pick 7 episodes out of 6,000, or 6,000 x 5,999 x 5,998 x 5,997 x 5,996 x 5,995 x 5,994 divided by 7 x 6 x 5 x 4 x 3 x 2 x 1, which works out to be a probability of 0.1377 unless I’ve typed something in wrong. That still looks pretty likely, if not quite so likely as the expected 6 clean sweeps.

The probability of 5 clean sweeps is interesting, as it turns out to be 0.1607 again, making it look just as likely as 6 clean sweeps. That’s just a rounding-off difference, though; 6 clean sweeps has a slightly greater probability. Still, 5 and 6 clean sweeps look likely. 4 clean sweeps turns out to have a probability of 0.1339, and 8 clean sweeps a probability of 0.1033, which altogether has an interesting implication. The probability of having 4, 5, 6, 7, or 8 clean sweeps comes out to a total of 0.6963, so there’s better than two chances in three of seeing from 4 to 8 clean sweeps in 6,000 episodes, and less than one chance in three of seeing all the other possible numbers combined, including the chance of seeing just the claimed 1.
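One way to see why 5 and 6 come out nearly tied is to look at the ratio of neighboring probabilities. Dividing the binomial formula for x + 1 by the one for x leaves (N – x)/(x + 1) times p/(1 – p). That ratio is my own rearrangement of the formula, not a step taken in the original calculation. A short Python sketch, with made-up names, uses it to step from one probability to the next:

```python
N = 6000
p = 1 / 1000   # assumed chance of a clean sweep in any one episode

def ratio(x):
    """P(x + 1) / P(x) for the binomial distribution."""
    return (N - x) / (x + 1) * p / (1 - p)

# Start from P(0) = (1 - p)^N and step upward one x at a time.
prob = (1 - p) ** N
probs = [prob]
for x in range(0, 8):
    prob *= ratio(x)
    probs.append(prob)   # probs[k] now holds P(k)

print(ratio(5))                     # barely above 1, so P(6) edges out P(5)
print(round(sum(probs[4:9]), 4))    # chance of 4 through 8 clean sweeps
```

The ratio for x = 5 works out to 5995/5994, which is why the two probabilities agree to four decimal places.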

Rather than carry this on even more let me work out the probabilities of x clean sweeps for each x from 0 up to 15. There’s an obvious reason for starting at 0; I stopped at 15 because it makes for a fairly nice table and the probabilities are getting pretty tiny around there. It may be possible to see 1,000 clean sweeps out of 6,000, but it’s not going to happen.

 x   P(x)      x   P(x)      x   P(x)      x   P(x)
 0   0.0025    4   0.1339    8   0.1033   12   0.0112
 1   0.0148    5   0.1607    9   0.0688   13   0.0052
 2   0.0446    6   0.1607   10   0.0413   14   0.0022
 3   0.0892    7   0.1377   11   0.0225   15   0.0009
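A table like this can be rebuilt in a few lines. The following is a Python sketch and not the Matlab session behind the numbers above; the function name is made up for illustration, and N = 6,000 and p = 1/1000 are the same assumptions used throughout.

```python
import math

N = 6000
p = 1 / 1000   # assumed chance of a clean sweep in any one episode

def clean_sweep_probability(x):
    """Binomial probability of exactly x clean sweeps in N episodes."""
    return math.comb(N, x) * p**x * (1 - p)**(N - x)

# Probabilities for x = 0 through 15, rounded to four places like the table.
table = {x: round(clean_sweep_probability(x), 4) for x in range(16)}
for x, prob in table.items():
    print(f"{x:2d}  {prob:.4f}")
```

Entries that sit very near a rounding boundary could differ in the last digit from the table, depending on how the original figures were rounded.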

And standing out from that table is the difficult-to-contest data point: the probability of exactly 1 clean-sweep episode out of 6,000 attempts, if we’ve done all our work correctly, is 0.0148, or about a 1.5 percent shot of this happening.

Now we have the new hard problem: what does that mean?