We became suspicious of the number of clean sweeps in The Price Is Right when there were not the expected six of them in 6,000 episodes. The chance there would be only one was about one and a half percent, not very high. But are there so few clean sweeps that we should be suspicious? That is, is the difference between the expected number of sweeps and the observed number so large as to be significant? Is it too big to just result from chance?
This is significance testing: is whatever quantity we mean to observe dramatically less than what is expected? Is it dramatically more? Is it at least different? Are these differences bigger than what could be expected by mere chance? To take every statistician’s favorite example, a tossed fair coin will come up tails half the time; that means, of twenty flips, there are expected to be ten tails. But there being merely nine or as many as twelve is reasonable. Three or fifteen tails may be a little unlikely. Zero or twenty seem impossible. There’s a point where our observations are so different from what we expect that we have to reject the idea that our observations and our expectations agree.
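To put rough numbers on the coin example, here is a minimal sketch of the chance of seeing exactly some number of tails in twenty flips of a fair coin. The function name `prob_tails` is just an illustrative label, and the only assumption is that the flips are fair and independent.

```python
from math import comb

def prob_tails(k, flips=20):
    """Chance of exactly k tails in `flips` tosses of a fair coin."""
    # comb(flips, k) counts the arrangements with k tails; each of the
    # 2**flips equally likely sequences has the same probability.
    return comb(flips, k) / 2**flips

# The counts mentioned in the text, from "reasonable" to "seems impossible".
for k in (10, 9, 12, 3, 15, 0, 20):
    print(f"{k:2d} tails: {prob_tails(k):.7f}")
```

Ten tails comes out a bit under 18 percent, while zero or twenty tails each come out a bit under one in a million: not literally impossible, but close enough for most purposes.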
It’s not enough to say there’s a probability of only 1.5 percent that there should be exactly one clean sweep episode out of 6,000, though. It’s unlikely that should happen, but if we look at it, any particular outcome is unlikely. Even the most likely result of 6,000 episodes, six clean sweeps, has only about one chance in six of happening. That’s near the chance that the next person you meet will have a birthday in either September or November. That isn’t absurdly unlikely, but the person betting against it has the surer deal.
If we think “one” is too few clean sweeps, we have to work out the probability that we would see one or fewer clean sweeps. Since last time around we worked out — well, I worked out, but I’m sure everyone checked all the numbers themselves to see how poorly I do arithmetic — the probabilities of zero, one, two, or so on episodes, we don’t even need to do any further work. The probability of zero clean sweeps was 0.0025; the probability of one clean sweep was 0.0148; and since it’s not possible to have both exactly zero and exactly one clean sweep in 6,000 episodes simultaneously, the probability of one or fewer clean sweeps is 0.0173.
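For anyone who does want to check the arithmetic, here is a minimal sketch. It assumes each episode independently has a one-in-a-thousand chance of being a clean sweep, which is what six expected sweeps in 6,000 episodes implies; the names `prob_exactly` and `prob_at_most` are illustrative, not from any library.

```python
from math import comb

EPISODES = 6000
P_SWEEP = 1 / 1000  # assumed per-episode chance: six expected sweeps in 6,000

def prob_exactly(k, n=EPISODES, p=P_SWEEP):
    """Binomial chance of exactly k clean sweeps in n independent episodes."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_most(k, n=EPISODES, p=P_SWEEP):
    """Chance of k or fewer sweeps: the outcomes are mutually exclusive, so add."""
    return sum(prob_exactly(i, n, p) for i in range(k + 1))

print(round(prob_exactly(0), 4))  # 0.0025
print(round(prob_exactly(1), 4))  # 0.0148
print(round(prob_at_most(1), 4))  # 0.0173
```

Asking about two or fewer sweeps instead would just be `prob_at_most(2)`.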
I admit this isn’t too involved an example, but that’s what the observation said. If we had heard of two clean sweeps, and wanted to know if this was suspiciously few, we’d have to add the probabilities of zero, one, or two clean sweeps. If we had seen, say, ten clean sweeps and wanted to know if that was too many, we would have to add together the probabilities of ten and eleven and twelve and thirteen and fourteen clean sweeps … all the way up to, in principle, the chance of six thousand clean sweeps. That’s a mad prospect, and we would have two practical ways of avoiding doing all that work. The simpler is to declare that some cases were so improbable we could pretend they couldn’t happen — I’d guess that anything above twenty meets that — and just ignore them, and accept that our answer will be a little bit wrong.
The harder, but more honest, method is to note that since the probabilities of all possible numbers of clean sweeps add up to 1, we could say the probability of ten or more clean sweeps has to be equal to 1 minus the probability of between zero and nine clean sweeps. It’s tedious, but it’s not hard, and we (that is, I) worked it out already. That’s a different, but very similar, question, though. (Yet another question is whether the number of clean sweeps is just different from what’s expected; it will work out about the same way.)
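Both shortcuts for the "ten or more" question can be sketched side by side. As before, this assumes independent episodes with a one-in-a-thousand chance of a clean sweep each, and the function names are only illustrative. The cutoff of twenty is the guess made above for where the cases become too improbable to matter.

```python
from math import comb

EPISODES = 6000
P_SWEEP = 1 / 1000  # assumed per-episode chance: six expected sweeps in 6,000

def prob_exactly(k, n=EPISODES, p=P_SWEEP):
    """Binomial chance of exactly k clean sweeps in n independent episodes."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least_by_complement(k):
    """Honest method: 1 minus the chance of 0 through k - 1 sweeps."""
    return 1 - sum(prob_exactly(i) for i in range(k))

def prob_at_least_truncated(k, cutoff=20):
    """Simpler method: add up k through cutoff and ignore everything beyond."""
    return sum(prob_exactly(i) for i in range(k, cutoff + 1))

honest = prob_at_least_by_complement(10)
simple = prob_at_least_truncated(10)
print(honest, simple, honest - simple)
```

The two answers differ only by the probability of more than twenty clean sweeps, which is the "little bit wrong" the simpler method agrees to accept.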
What we’re left with is: there is a probability of 0.0173 that we should be at least this deficient in clean sweep episodes. That’s not much different from where we were last time. The chance of no more than one clean sweep in 6,000 episodes is a tiny bit better than the chance that at the end of this sentence it will be exactly 26 seconds after the minute, but is that too unlikely to have just happened?
We need to set some standards for how unlikely something can be before we suppose it’s suspiciously unlikely.