## How Much Can I Expect To Lose In Pinball?

This weekend, all going well, I’ll be going to the Michigan state pinball championship contest. There, I will lose in the first round.

I’m not trying to run myself down. But I know who I’m scheduled to play in the first round, and she’s quite a good player. She’s the state’s highest-ranked woman playing competitive pinball. So she starts off being better than me. And then the venue is one she gets to play in more than I do. Pinball, a physical thing, is idiosyncratic. The reflexes you build practicing on one table can betray you on a strange machine. She’s had more chance to practice on the games we have and that pretty well settles the question. I’m still showing up, of course, and doing my best. Stranger things have happened than my winning a game. But I’m going in with, I hope, realistic expectations.

That bit about having realistic expectations, though, makes me ask what are realistic expectations. The first round is a best-of-seven match. How many games should I expect to win? And that becomes a probability question. It’s a great question to learn on, too. Our match is straightforward to model: we play up to seven times. Each time we play one or the other wins.

So we can start calculating. There’s some probability I have of winning any particular game. Call that number ‘p’. It’s more than zero (I’m not sure to lose) but it’s less than one (I’m not sure to win). Let’s suppose the probability of my winning never changes over the course of seven games. I will come back to the card I palmed there. If we’re playing 7 games, and I have a chance ‘p’ of winning any one of them, then the number of games I can expect to win is 7 times ‘p’. This is the number of wins you might expect if you were called on in class and had no idea and bluffed the first thing that came to mind. Sometimes that works.

7 times p isn’t very enlightening. What number is ‘p’, after all? And I don’t know exactly. The International Flipper Pinball Association tracks how many times I’ve finished a tournament or league above her and vice-versa. We’ve played in 54 recorded events together, and I’ve won 23 and lost 29 of them. (We’ve tied twice.) But that isn’t all head-to-head play. It counts matches where I’m beaten by someone she goes on to beat as her beating me, and vice-versa. And it includes a lot of playing not at the venue. I lack statistics and must go with my feelings. I’d estimate my chance of beating her at about one in three. Let’s say ‘p’ is 1/3 until we get evidence to the contrary. It is “Flipper Pinball” because the earliest pinball machines had no flippers. You plunged the ball into play and nudged the machine a little to keep it going somewhere you wanted. (The game Simpsons Pinball Party has a moment where Grampa Simpson says, “back in my day we didn’t have flippers”. It’s the best kind of joke, the one that is factually correct.)

Seven times one-third is not a difficult problem. It comes out to two and a third, raising the question of how one wins one-third of a pinball game. Most games involve playing three rounds, called balls, is the obvious observation. But this one-third of a game is an average. Imagine the two of us playing three thousand seven-game matches, without either of us getting the least bit better or worse or collapsing of exhaustion. I would expect to win seven thousand of the games, or two and a third games per seven-game match.

Ah, but … that’s too high. I would expect to win two and a third games out of seven. But we probably won’t play seven. We’ll stop when she or I gets to four wins. This makes the problem hard. Hard is the wrong word. It makes the problem tedious. At least it threatens to. Things will get easy enough, but we have to go through some difficult parts first.

There are eight different ways that our best-of-seven match can end. She can win in four games. I can win in four games. She can win in five games. I can win in five games. She can win in six games. I can win in six games. She can win in seven games. I can win in seven games. There is some chance of each of those eight outcomes happening. And exactly one of those will happen; it’s not possible that she’ll win in four games and in five games, unless we lose track of how many games we’d played. They give us index cards to write results down. We won’t lose track.

It’s easy to calculate the probability that I win in four games, if the chance of my winning a game is the number ‘p’. The probability is $p^4$. Similarly it’s easy to calculate the probability that she wins in four games. If I have the chance ‘p’ of winning, then she has the chance ‘1 – p’ of winning. So her probability of winning in four games is $(1 - p)^4$.

The probability of my winning in five games is more tedious to work out. It’s going to be $p^4$ times $(1 - p)$ times 4. The 4 here is the number of different ways that she can win one of the first four games. Turns out there’s four ways to do that. She could win the first game, or the second, or the third, or the fourth. And in the same way the probability she wins in five games is $p$ times $(1 - p)^4$ times 4.

The probability of my winning in six games is going to be $p^4$ times $(1 - p)^2$ times 10. There are ten ways to scatter two wins by her among the first five games. The probability of her winning in six games is the strikingly parallel $p^2$ times $(1 - p)^4$ times 10.

The probability of my winning in seven games is going to be $p^4$ times $(1 - p)^3$ times 20, because there are 20 ways to scatter her three wins among the first six games. And the probability of her winning in seven games is $p^3$ times $(1 - p)^4$ times 20.

Add all those probabilities up, no matter what ‘p’ is, and you should get 1. Exactly one of those eight outcomes has to happen. And we can work out the probability that the series will end after four games: it’s the chance she wins in four games plus the chance I win in four games. The probability that the series goes to five games is the probability that she wins in five games plus the probability that I win in five games. And so on for six and for seven games.
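The eight outcome probabilities described above can be tabulated and summed as a check. What follows is a minimal sketch in Python, not anything from the original worksheet. The function name `outcome_probabilities` and the labels "me" and "her" are made up for illustration, and exact fractions are used so the total comes out to exactly 1 rather than nearly 1.

```python
from fractions import Fraction
from math import comb

def outcome_probabilities(p):
    """Return {(winner, games): probability} for a best-of-seven match."""
    q = 1 - p  # her chance of winning any one game
    outcomes = {}
    for games in range(4, 8):
        # The winner takes the last game, plus 3 of the games before it.
        ways = comb(games - 1, 3)   # 1, 4, 10, 20
        losses = games - 4          # games won by the eventual loser
        outcomes[("me", games)] = ways * p**4 * q**losses
        outcomes[("her", games)] = ways * q**4 * p**losses
    return outcomes

probs = outcome_probabilities(Fraction(1, 3))  # 1/3 is the guess used in the text
total = sum(probs.values())
print(total)  # 1
```

Swapping in any other value of `p` between 0 and 1 should leave the total at 1, which is a quick way to catch a miscounted arrangement.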

So that’s neat. We can figure out the probability of the match ending after four games, after five, after six, or after seven. And from that we can figure out the expected length of the match. This is the expectation value. Take the product of ‘4’ and the chance the match ends at four games. Take the product of ‘5’ and the chance the match ends at five games. Take the product of ‘6’ and the chance the match ends at six games. Take the product of ‘7’ and the chance the match ends at seven games. Add all those up. That’ll be, wonder of wonders, the number of games a match like this can be expected to run.
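That recipe is short enough to write out directly. This is a sketch under the same assumption as the text, a fixed per-game chance `p`. The function name `expected_match_length` is hypothetical, and the evenly matched case is used only as an example input.

```python
from fractions import Fraction
from math import comb

def expected_match_length(p):
    """Expected number of games in a best-of-seven, given per-game win chance p."""
    q = 1 - p
    expected = 0
    for games in range(4, 8):
        ways = comb(games - 1, 3)   # arrangements of the winner's first 3 wins
        losses = games - 4          # games taken by the eventual loser
        chance_ends_here = ways * (p**4 * q**losses + q**4 * p**losses)
        expected += games * chance_ends_here
    return expected

even_match = expected_match_length(Fraction(1, 2))
print(even_match, float(even_match))  # 93/16 5.8125
```

Even two perfectly matched players should, on average, finish in a bit under six games.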

Now it’s a matter of adding together all these combinations of all these different outcomes and you know what? I’m not doing that. I don’t know what the chance is that I’d do all this arithmetic correctly, but I know there’s no chance I’d do all this arithmetic correctly. This is the stuff we pirate Mathematica to do. (Mathematica is supernaturally good at working out mathematical expressions. A personal license costs all the money you will ever have in your life plus ten percent, which it will calculate for you.)

Happily I won’t have to work it out. A person appearing to be a high school teacher named B Kiggins has worked it out already. Kiggins put it and a bunch of other interesting worksheets on the web. (Look for the Voronoi Diagramas!)

There’s a lot of arithmetic involved. But it all simplifies out, somehow. Per Kiggins’ work, the expected number of games in a best-of-seven match, if one of the competitors has the chance ‘p’ of winning any given game, is:

$E(p) = 4 + 4\cdot p + 4\cdot p^2 + 4\cdot p^3 - 52\cdot p^4 + 60\cdot p^5 - 20\cdot p^6$

Whatever you want to say about that, it’s a polynomial. And it’s easy enough to evaluate it, especially if you let the computer evaluate it. Oh, I would say it seems like a shame all those coefficients of ‘4’ drop off and we get weird numbers like ’52’ after that. But there’s something beautiful in there being four 4’s, isn’t there? Good enough.

So. If the chance of my winning a game, ‘p’, is one-third, then we’d expect the series to go 5.5 games. This accords well with my intuition. I thought I would be likely to win one game. Winning two would be a moral victory akin to championship.
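Letting the computer evaluate the polynomial might look like the following. It is a small sketch using Horner's rule with the coefficients copied from the formula above. The names `COEFFICIENTS` and `expected_games` are invented here.

```python
from fractions import Fraction

# Coefficients of E(p), from the constant term up to p^6.
COEFFICIENTS = [4, 4, 4, 4, -52, 60, -20]

def expected_games(p):
    """Evaluate the polynomial E(p) by Horner's rule."""
    result = 0
    for coefficient in reversed(COEFFICIENTS):
        result = result * p + coefficient
    return result

one_third = expected_games(Fraction(1, 3))
print(one_third, round(float(one_third), 3))  # 4012/729 5.503
```

As a sanity check, `expected_games(Fraction(2, 3))` gives the same value: the match length should not care which of the two players is the one called ‘p’.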

Let me go back to my palmed card. This whole analysis is based on the idea that I have some fixed probability of winning and that it isn’t going to change from one game to the next. If the probability of winning is entirely based on my and my opponent’s abilities this is fair enough. Neither of us is likely to get significantly more or less skilled over the course of even seven games. We won’t even play long enough to get fatigued.

But our abilities aren’t everything. We’re going to be playing up to seven different tables. How each table reacts to our play is going to vary. Some tables may treat me better, some tables my opponent. Luck of the draw. And there’s an important psychological component. It’s easy to get thrown and to let a bad ball wreck the rest of one’s game. It’s hard to resist feeling nervous if you go into the last ball from way behind your opponent. And it seems as if a pinball knows you’re nervous and races out of play to help you calm down. (The best pinball players tend to have outstanding last balls, though. They don’t get rattled. And they spend the first several balls building up to high-value shots they can collect later on.) And there will be freak events. Last weekend I was saved from elimination in a tournament by the pinball machine spontaneously resetting. We had to replay the game. I did well in the tournament, but it was the freak event that kept me from being knocked out in the first round.

That’s some complicated stuff to fit together. I suppose with enough data we could possibly model how much the differences between pinball machines affects the outcome. That’s what sabermetrics is all about. Representing how severely I’ll build a little bad luck into a lot of bad luck? Oh, that’s hard.

Too hard to deal with, at least without much more sports psychology and modelling of pinball players than we have data for. The supposition that my chance of winning is fixed for the duration of the match may not be true. But we won’t be playing enough games to be able to tell the difference. It’s near enough, and it gets us some useful information. We have to know not to demand too much precision from our model.

And seven games isn’t statistically significant. Not when players are as closely matched as we are. I could be worse and still get a couple wins in when they count; I could play better than my average and still get creamed four games straight. I’ll be trying my best, of course. But I expect my best is one or two wins, then getting to the snack room and waiting for the side tournament to start. Shall let you know if something interesting happens.

I feel as though I ought to write something thoughtful about the lottery, since the Mega Millions draw has got to be an impressively high value, and there are fine things to be said about probability, expectation values, and what we really mean by the probability of an event when we can only take one attempt at doing it (since I can’t make myself think the lottery will go another week without a winner and it is likely to be years before such a big jackpot rises again).

## Proving Something With One Month’s Counting

One week, it seems, isn’t enough to tell the difference conclusively between the first bidder on Contestants Row having a 25 percent chance of winning — winning one out of four times — or a 17 percent chance of winning — winning one out of six times. But we’re not limited to watching just the one week of The Price Is Right, at least in principle. Some more episodes might help us, and we can test how many episodes are needed to be confident that we can tell the difference. I won’t be clever about this. I have a tool — Octave — which makes it very easy to figure out whether it’s plausible for something which happens 1/4 of the time to turn up only 1/6 of the time in a set number of attempts, and I’ll just keep trying larger numbers of attempts until I’m satisfied. Sometimes the easiest way to solve a problem is to keep trying numbers until something works.

In two weeks (or any ten episodes, really, as talked about above), with 60 items up for bids, a 25 percent chance of winning suggests the first bidder should win 15 times. A 17 percent chance of winning would be a touch over 10 wins. The chance of 10 or fewer successes out of 60 attempts, with a 25 percent chance of success each time, is about 8.6 percent, still none too compelling.
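The same kind of check can be sketched outside Octave. The Python below is an illustration, not the original calculation: the function name `chance_at_most` is hypothetical, and the 5 percent cutoff in the loop is an assumed threshold chosen only to show the keep-trying-bigger-numbers approach.

```python
from math import comb

def chance_at_most(successes, trials, p):
    """Binomial probability of seeing `successes` or fewer in `trials` attempts."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes + 1)
    )

# Ten episodes: 60 items up for bids, about 10 wins if the true rate is 1/6.
two_weeks = chance_at_most(10, 60, 0.25)
print(f"{two_weeks:.3f}")

# Keep adding weeks (30 items each) until a 1/6 showing would be
# surprising at the assumed 5 percent level under the 1/4 hypothesis.
weeks = 1
while chance_at_most(weeks * 30 // 6, weeks * 30, 0.25) >= 0.05:
    weeks += 1
print(weeks)
```

A stricter cutoff, such as 1 percent, would push the required number of weeks higher.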

Here we might turn to despair: 6,000 episodes — about 35 years of production — weren’t enough to give perfectly unambiguous answers about whether there were fewer clean sweeps than we expected. There were too few at the 5 percent significance level, but not too few at the 1 percent significance level. Do we really expect to do better with only 60 shows?

## What Can One Week Prove?

We have some reason to think the chance of winning an Item Up For Bids, if you’re the first one of the four to place bids (let’s call this the first bidder or first seat so there’s a name for it), is lower than the 25 percent which we’d expect if every contestant in The Price Is Right’s Contestants Row had an equal shot at it. Based on the assertion that only one time in about six thousand episodes had all six winning bids in one episode come from the same seat, we reasoned that the chance for the first bidder (the same seat as won the previous bid) could be around 17 percent. My next question is: how could we test this? The chance for the first bidder to win might be higher than 17 percent (around 1/6, which is near enough and easier to work with), or lower than 25 percent (exactly 1/4), or conceivably even outside that range.

The obvious thing to do is test: watch a couple episodes, and see whether nearer to 1/6 or to 1/4 of the winning bids come from the first seat. It’s easy to tally the number of items up for bid and how often the first bidder wins. However, there are only six items up for bid each episode, and there are five episodes per week, for 30 trials in all. I talk about a week’s worth of episodes because it’s a convenient unit, easy to record on the Tivo or an equivalent device, easy to watch at The Price Is Right’s online site, but it doesn’t have to be a single week. It could be any five episodes. But I’ll say a week just because it’s convenient to do so.

If the first seat has a chance of 25 percent of winning, we expect 30 times 1/4, or seven or eight, first-seat wins per week. If the first seat has a 17 percent chance of winning, we expect 30 times 1/6, or 5, first-seat wins per week. That’s not much difference. What’s the chance we see 5 first-seat wins if the first seat has a 25 percent chance of winning?
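The binomial distribution answers that question directly. As a worked equation, assuming each of the 30 items is an independent trial with a 1/4 chance of a first-seat win:

$P(X = 5) = \binom{30}{5}\left(\frac{1}{4}\right)^{5}\left(\frac{3}{4}\right)^{25} = 142506 \cdot \frac{3^{25}}{4^{30}} \approx 0.105$

So exactly five first-seat wins would turn up in roughly one week out of ten even if every seat were equally likely.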

## Figuring Out The Penalty Of Going First

Let’s accept the conclusion that the small number of clean sweeps of Contestants Row is statistically significant, that all six winning contestants on a single episode of The Price Is Right come from the same seat less often than we would expect from chance alone, and that the reason for this is that whichever seat won the last item up for bids is less likely to win the next. It seems natural to suppose the seat which won last time — and which is therefore bidding first this next time — is at a disadvantage. The irresistible question, to me anyway, is: how big is that disadvantage? If no seats had any advantage, the first, second, third, and fourth bidders would be expected to have a probability of 1/4 of winning any particular item. How much less a chance does the first bidder need to have to get the one clean sweep in 6,000 episodes reported?

Chiaroscuro came to an estimate that the first bidder had a probability of about 17.6 percent of winning the item up for bids, and I agree with that, at least if we make a couple of assumptions which I’m confident we are making together. But it’s worth saying what those assumptions are because if the assumptions do not hold, the answers come out different.

The first assumption was made explicitly in the first paragraph here: that the low number of clean sweeps is because the chance of a clean sweep is less than the 1 in 1000 (or to be exact, 1 in 1024) chance which supposes every seat has an equal probability of winning. After all, the probability that we saw so few clean sweeps for chance alone was only a bit under two percent; that’s unlikely but hardly unthinkable. We’re supposing there is something to explain.
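Here is a minimal sketch of one way to reach a number in that neighborhood. It is not necessarily the calculation Chiaroscuro did. The assumed model is that the first item can be won from any seat, and a clean sweep then needs the previous winner's seat, now bidding first, to win each of the next five items with the same unknown chance. The variable names are invented for illustration.

```python
# Assumed model: the first item can go to any seat; a clean sweep then needs
# the previous winner's seat (now bidding first) to win five more times.
observed_sweep_rate = 1 / 6000      # one clean sweep in about 6,000 episodes
fair_sweep_rate = (1 / 4) ** 5      # 1/1024 if every seat is equally likely

# Solve first_bidder_chance ** 5 == observed_sweep_rate.
first_bidder_chance = observed_sweep_rate ** (1 / 5)

print(f"fair seats: 1 in {1 / fair_sweep_rate:.0f}")          # fair seats: 1 in 1024
print(f"first bidder chance: {first_bidder_chance:.3f}")      # first bidder chance: 0.176
```

Treating the single reported sweep as the true long-run rate is itself a strong assumption; a different observed count would move the estimate.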

## Significance Intrudes on Contestants Row

We worked out the likelihood that there would be only one clean sweep, with all six contestants getting on stage coming from the same seat in Contestants Row, out of six thousand episodes of The Price Is Right. That turned out to be not terribly likely: it had about a one and a half percent chance of being the case. For a sense of scale, that’s around the same probability that the moment you finish reading this sentence will be exactly 26 seconds past the minute. It’s pretty safe to bet that it wasn’t.

However, it isn’t particularly outlandish to suppose that it was. I’d certainly hope at least some reader found that it was. Events which aren’t particularly likely do happen, all the time. Compare the likelihood of this single-clean-sweep or the 26-seconds-past-the-minute thing happening to the likelihood of any given hand of poker: any specific hand is phenomenally less likely, but something has to happen once you start dealing. So do we have any grounds for saying the particular outcome of one clean sweep in 6,000 shows is improbable? Or for saying that it’s reasonable?

## A Simple Demonstration Which Does Not Clarify

When last we talked about the “clean sweep” of winning contestants coming from the same one of the four seats in Contestants Row for all six Items Up For Bid on The Price Is Right, we had established the pieces needed if we suppose this to be a binomial distribution problem. That is, we suppose that any given episode has a probability, p, of successfully having all six contestants from the same seat, and a probability 1 – p of failing to have all six contestants from the same seat. There are N episodes, and we are interested in the chance of x of them being clean sweeps. From the production schedule we know the number of episodes N is about 6,000. We supposed the probability of a clean sweep to be about p = 1/1000, on the assumption that the chance of winning isn’t any better or worse for any contestant. The probability of there not being a clean sweep is then 1 – p = 999/1000. And we expected x = 6 clean sweeps, while Drew Carey claimed there had been only 1.

The chance of finding x successes out of N attempts, according to the binomial distribution, is the probability of any one combination of x successes and N – x failures, which is equal to $p^x (1 - p)^{N - x}$, times the number of ways there are to select x items out of N candidates. Either of those is easy enough to calculate, up to the point where we try calculating it. Let’s start out by supposing x to be the expected 6, and later we’ll look at it being 1 or other numbers.
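The trouble with calculating it naively is that the count of combinations is enormous while the probability of any one combination is minuscule. One common workaround, sketched here in Python as an illustration rather than as the method used in these posts, is to add logarithms and exponentiate only at the end. The function name `binomial_probability` is hypothetical.

```python
from math import exp, lgamma, log

def binomial_probability(x, n, p):
    """Chance of exactly x successes in n independent tries, via logarithms."""
    # Log of "n choose x", using lgamma(k + 1) == log(k!).
    log_ways = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    # Log of the probability of one particular arrangement.
    log_one_arrangement = x * log(p) + (n - x) * log(1 - p)
    return exp(log_ways + log_one_arrangement)

episodes = 6000
sweep_chance = 1 / 1000

print(binomial_probability(6, episodes, sweep_chance))  # the expected count
print(binomial_probability(1, episodes, sweep_chance))  # the reported count
```

Working in logarithms keeps every intermediate number a comfortable size, so nothing overflows or rounds to zero along the way.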

## Came On Down

On the December 15th episode of The Price Is Right, host Drew Carey mentioned, as the sixth Item Up For Bids began, that so far that show all the contestants who had won their Item Up For Bids (and so got on-stage for the pricing games) had come from the same spot, five out of six. He said that only once before on the show had all the contestants come from the same seat in Contestants Row. That seems awfully few, but how many should there be?

We can say roughly how many “clean sweep” shows we should expect. There’ve been just about 6,000 episodes of The Price Is Right played in the current hour-long format (the show was a half-hour its first few years after being revived in 1972; it was a very different show in previous decades). If we know the probability of all six contestants in one game winning their Item Up For Bids — properly speaking, it’s called the One-Bid, but nobody cares — and multiply the probability of six contestants in one show coming from the same seat by the number of shows, we have the number of shows we should expect to have had such a clean sweep. This product, the chance of something happening times the number of times it could happen, is termed the “expected value” or “expectation value”, or sometimes just the “mean”, as in the average number to be, well, expected.
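As a worked instance of that product, assuming every seat is equally likely to win each item (so the first winner can come from any seat and the next five must match it), with N the number of episodes and p the chance of a clean sweep in one episode:

$E = N \cdot p = 6000 \cdot \left(\frac{1}{4}\right)^{5} = \frac{6000}{1024} \approx 5.86$

Under that assumption, something close to six clean sweeps would be the naive expectation.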

This makes a couple of assumptions. All probability problems do. For example, it assumes the chance of a clean sweep in one show is unaffected by clean sweeps in other shows. That is, if everyone in the red seat won on Thursday, that wouldn’t make everyone in the blue seat winning Friday more or less likely. That condition is termed “independence”, and it is frequently relied upon to make probability problems work out. Unfortunately, it’s often hard to prove: how do you prove that one thing happening doesn’t affect the other?

## What is .19 of a bathroom?

I’ve had a little more time attempting to teach probability to my students and realized I had been overlooking something obvious in the communication of ideas such as the probability of events or the expectation value of a random variable. Students have a much easier time getting the abstract idea if the examples used for it are already ones they find interesting, and if the examples can avoid confusing interpretations. This is probably about 3,500 years behind the curve in educational discoveries, but at least I got there eventually.

A “random variable”, here, sounds a bit scary, but shouldn’t. It means that the variable, for which x is a popular name, is some quantity which might be any of a collection of possible values. We don’t know for any particular experiment what value it has, at least before the experiment is done, but we know how likely it is to be any of those. For example, the number of bathrooms in a house is going to be one of 1, 1.5, 2, 2.5, 3, 3.5, up to the limits of tolerance of the zoning committee.

The expectation value of a random variable is kind of the average value of that variable. You find it by taking the sum of each of the possible values of the random variable times the probability of the random variable having that value. This is at least for a discrete random variable, where the imaginable values are, er, discrete: there’s no continuous ranges of possible values. Number of bathrooms is clearly discrete; the number of seconds one spends in the bathroom is, at least in principle, continuous. For a continuous random variable you don’t take the sum, but instead take an integral, which is just a sum that handles the idea of infinitely many possible values quite well.
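For the discrete case, the sum is short enough to write out. The sketch below uses a made-up distribution of bathrooms per house; the probabilities are invented purely for illustration and are not survey data, and the names `bathroom_chances` and `expectation` are hypothetical.

```python
from fractions import Fraction

# Hypothetical distribution of bathrooms per house; these probabilities are
# invented for illustration and are not survey data.
bathroom_chances = {
    Fraction(1): Fraction(30, 100),
    Fraction(3, 2): Fraction(25, 100),
    Fraction(2): Fraction(25, 100),
    Fraction(5, 2): Fraction(15, 100),
    Fraction(3): Fraction(5, 100),
}

def expectation(distribution):
    """Sum of each possible value times the probability of that value."""
    return sum(value * chance for value, chance in distribution.items())

assert sum(bathroom_chances.values()) == 1  # probabilities must total 1
expected_bathrooms = expectation(bathroom_chances)
print(float(expected_bathrooms))  # 1.7
```

No house in this made-up collection has 1.7 bathrooms, which is the usual oddity of an expectation value: it describes the average over many houses, not any one of them.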