## My Answer For Who’s The Most Improved Pinball Player

Okay, so writing “this next essay right away” didn’t come to pass, because all sorts of other things got in the way. But to get back to where we had been: we hoped to figure out which of the players at the local pinball league had most improved over the season. I had data to work with. But data is always imperfect. We try to learn anyway.

What data I had was this. Each league night we selected five pinball games. Each player there played those five tables. We recorded their scores. Each player’s standing was based on, for each table, how many other players they beat. If you beat everyone on a particular table, you got 100 points. If you beat all but three people, you got 97 points. If ten people beat you, you got 90 points. And so on. Add together the points earned for all five games of that night. We didn’t play the same games week to week. And not everyone played every single week. These are some of the limits of the data.
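Here is a minimal Python sketch of that scoring rule. On each table, a player earns 100 points minus the number of players whose scores beat theirs. The night's standing is the sum of those points over the tables played. The function names and the sample scores are hypothetical. Treating tied scores as not beating each other is my own assumption, since the league's actual tie rule isn't described here.

```python
def table_points(scores):
    """Map each player's score on one table to league points:
    100 minus the number of players whose score beat theirs.
    Assumption: a tied score does not count as beating."""
    return {
        player: 100 - sum(1 for other in scores.values() if other > score)
        for player, score in scores.items()
    }


def night_total(tables):
    """Add up each player's league points over every table played that night."""
    totals = {}
    for scores in tables:
        for player, points in table_points(scores).items():
            totals[player] = totals.get(player, 0) + points
    return totals


# Hypothetical scores from five players on one table.
one_table = {"A": 9_100_000, "B": 4_200_000, "C": 12_500_000,
             "D": 700_000, "E": 3_300_000}
print(table_points(one_table))
# {'A': 99, 'B': 98, 'C': 100, 'D': 96, 'E': 97}
```

Under this rule, someone beaten by three players lands on 97, and someone beaten by ten lands on 90.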

My first approach was to look at a linear regression. That is, take a plot where the independent variable is the league night number and the dependent variable is each player’s nightly score. These points will almost certainly not fall on a straight line. There’s an excellent chance the best-fitting line won’t touch any of them. But there is some line that comes closer than any other line to touching all these data points. What is that line, and what is its slope? And that’s easy to calculate. Well, it’s tedious to calculate. But the formula for it is easy enough to make a computer do. And then it’s easy to look at the slope of the line approximating each player’s performance. The player whose performance line has the highest slope is, obviously, the most improved.
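Here is a minimal sketch of that first approach. It assumes each player's season is a list of nightly scores for nights 1, 2, 3, and so on, and it uses the standard least-squares slope formula. The "me" scores are my eight nightly totals, tabulated further down this page. The players "steady" and "flat" are invented for illustration. As the following paragraphs explain, this naive ranking can be badly misled by missed nights.

```python
def ls_slope(ys):
    """Least-squares slope of nightly scores ys against nights 1, 2, ..., n."""
    n = len(ys)
    xs = range(1, n + 1)
    x_bar = (n + 1) / 2
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def most_improved(season):
    """Name of the player whose best-fit line has the largest slope."""
    return max(season, key=lambda player: ls_slope(season[player]))


season = {
    "me": [467, 420, 472, 473, 472, 455, 479, 462],
    "steady": [450, 455, 460, 465, 470, 475, 480, 485],  # hypothetical
    "flat": [470] * 8,                                    # hypothetical
}
for player, scores in season.items():
    print(player, round(ls_slope(scores), 2))
print("most improved:", most_improved(season))
```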

And the answer I got was that the most improved player (the one whose score increased most, week to week) was a player I’ll call T. The thing is, T was already a good player. A great one, really. He’d just been unable to join the league until partway through. So the nights he didn’t play, for which he was retroactively given a minimal score, counted as “terrible early nights”. This made his play look like it was improving faster than it was. It’s not just a problem of one person, either. I had missed a night early on, and that weird outlier made my league performance look, to this regression, like it was improving pretty well. If we removed the missed night, my apparent improvement changed to a slight decline. If we pretend that my second-week absence happened on week eight instead, I had a calamitous fall over the season.

And that felt wrong, so I went back to re-think. This is dangerous stuff, by the way. You can fool yourself if you go back and change your methods because your answer looked wrong. But. An important part of finding answers is validating your answer. Getting a wrong-looking answer can be a warning that your method was wrong. This is especially so if you started out unsure how to find what you were looking for.

So what did that first answer, the one I didn’t believe, tell me? It told me I needed some better way to handle noisy data. I should be able to tell apart a person who’s steadily doing better week to week from a person who’s just had one lousy night. Or two lousy nights. Or someone who just had a lousy season, but enjoyed one outstanding night where they couldn’t be beaten. Is there a measure of consistency?

And there, well, there kind of is. I’m looking at Pearson’s Correlation Coefficient, also known as Pearson’s r, or r. Karl Pearson is a name you will know if you learn statistics, because he invented just about all of them except Student’s t-test. Or you will not know if you learn statistics, because we don’t talk much about the history of statistics. (A lot of the development of statistical ideas was done in the late 19th and early 20th centuries, often by people, like Pearson, who were eugenicists. When we talk about mathematics history we’re more likely to talk about, oh, this fellow published what he learned trying to do quality control at Guinness breweries. We move with embarrassed coughing past oh, this fellow was interested in showing which nationalities were dragging the average down.) I hope you’ll allow me to move on with just some embarrassed coughing about this.

Anyway, Pearson’s ‘r’ is a number between -1 and 1. It reflects how well a line actually describes your data. The closer this ‘r’ is to zero, the less like a line your data really is. And its square, $r^2$, has a great, easy interpretation. It tells you how much of the variation in your dependent variable (the rankings, here) can be explained by a linear function of the independent variable (the league night, here). The bigger $r^2$ is, the more line-like the original data is, and the less its result depends on fluke events.
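Here is a minimal sketch of Pearson's r and its square, using the textbook formula. The nightly scores are my eight league nights from the table further down this page. The printed $r^2$ should match the 0.105 quoted in the next paragraph. Python 3.10 and later also ship `statistics.correlation`, which computes the same r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


nights = list(range(1, 9))
my_scores = [467, 420, 472, 473, 472, 455, 479, 462]
r = pearson_r(nights, my_scores)
print(round(r, 3), round(r * r, 3))  # 0.324 0.105
```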

This is another tedious calculation, yes. Computers. They do great things for statistical study. They told me something unsurprising: $r^2$ for our putative most-improved player, T, was about 0.313. That is, about 31 percent of his score’s change could be attributed to improvement; 69 percent of it was noise, reflecting the missed nights. For me, $r^2$ was about 0.105. That is, about 90 percent of the variation in my standing was noise. This suggests, by the way, that I was playing pretty consistently, week to week, which matched how I felt about my season. And yes, we did have one player whose $r^2$ was 0.000. So he was consistent, and nearly all the change in his week-to-week score reflected noise. (I only looked at three digits past the decimal. That’s more precision than the data could support, though. I wouldn’t be willing to say whether he played more consistently than the person with $r^2$ of 0.005 or the one with 0.012.)

Now, looking at that... ah, here’s something much better. Here’s a player, L, with a Pearson’s r of 0.803. $r^2$ was about 0.645, the highest of anyone. The most nearly linear performance in the league. Only about 35 percent of L’s performance change could be attributed to random noise rather than to a linear change, week to week. And that change was the second-highest in the league, too. L’s standing improved by about 5.21 points per league night. Better than anyone but T.

This, then, was my nomination for the most improved player. L had a large positive slope, in looking at ranking over time. L also had a high correlation coefficient. This supports the argument that the improvement was consistent, and due to something besides L getting luckier later in the season.

Yes, I am fortunate that I didn’t have to decide between someone with a high $r^2$ and a mediocre slope and someone with a mediocre $r^2$ and a high slope. Maybe this season. I’ll let you know how it turns out.

## Who We Just Know Is Not The Most Improved Pinball Player

Back before everything suddenly got complicated, I was working on the question of who’s the most improved pinball player. This was specifically for our local league. The league meets, normally, twice a month for a four-month season. Everyone plays the same five pinball tables for the night. They get league points for each of the five tables. The points are based on how many of their fellow players their score on that table beat that night. (Most leagues don’t keep standings this way. It’s one that harmonizes well with the venue and the league’s history.) The highest score on a game earns its player 100 league points. Second-highest earns its scorer 99 league points. Third-highest earns 98, and so on. Setting the highest score to 100 and counting down makes the race for the top less dependent on how many people show up each night. A fantastic night when 20 people attended is as good as a fantastic night when only 12 could make it out.

Last season had a large number of new players join the league. The natural question this inspired was, who was most improved? One answer is to use linear regression. That is, look at the score each player had on each of the eight nights of the season. This will be a bunch of points (eight, in this league’s case) with x-coordinates from 1 through 8 and y-coordinates between about 400 and 500. There is some straight line that comes nearer to describing each player’s performance than any other straight line possibly can. Finding that straight line is the “linear regression”.

A straight line has a slope. This describes how the x- and y-coordinates of points on the line are related. Particularly, if you start from a point on the line and change the x-coordinate a bit, how much does the y-coordinate change? A positive slope means the y-coordinate increases as the x-coordinate increases. So a positive slope implies that with each successive league night (an increase in the x-coordinate) we expect an increase in the nightly score (the y-coordinate).

For me, I had a slope of about 2.48. That’s a positive number, so apparently I was on average getting better all season. Good to know. And with the data on each player and their nightly scores on hand, it was easy to calculate the slopes of all their performances. This is because I did not do it. I had the computer do it. Finding the slopes of these linear regressions is not hard; it’s just tedious. It takes these multiplications and additions and divisions and you know? This is what we have computing machines for. Setting up the problem and interpreting the results is what we have people for.

And with that work done we found the most improved player in the league was … ah-huh. No, that’s not right. The person with the highest slope, T, finished the season a quite good player, yes. Thing is he started the season that way too. He’d been playing pinball for years. Playing competitively very well, too, at least when he could. Work often kept him away from chances. Now that he’s retired, he’s a plausible candidate to make the state championship contest, even if his winning would be rather a surprise. Still. It’s possible he improved over the course of our eight meetings. But more than everyone else in the league, including people who came in as complete novices and finished as competent players?

So what happened?

T joined the league late, is what happened. After the first week. So he was retroactively scored at the bottom of the league for that first meeting. He also had to miss one of the league’s first several meetings after joining. The result is that he had two boat-anchor scores in the first half of the season, and then basically middle-to-good scores for the latter half. Numerically, yeah, T started the season lousy and ended great. That’s improvement. By this standard, he improved his standing by about 6.79 points per league meeting. That’s just not so.

This approach for measuring how a competitor improved is flawed. But then every scheme for measuring things is flawed. Anything actually interesting is complicated and multifaceted; measurements of it are, at best, a couple of discrete values. We hope that this tiny measurement can tell us something about a complicated system. To do that, we have to understand the ways in which we know the measurements to be flawed.

So treating a missed night as a bottomed-out score is bad. Also, the bottomed-out scores are a bit flaky. If you miss a night when ten people were at league, you get a score of 450. Miss a night when twenty people were at league, you get a score of 400. It’s daft for the same absence to cost fifty points more or less, depending on something that doesn’t reflect anything you did (except, maybe, spreading false information about what day league was).
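Here is the arithmetic behind those two numbers, as a small sketch. It assumes that a player who misses a night is ranked below everyone who attended, on every one of the five tables, which is what the 450 and 400 figures imply. The function name is hypothetical.

```python
def missed_night_score(attendees, tables=5):
    """Nightly total for an absent player, assuming every attendee
    counts as beating them on every table (100 - attendees points each)."""
    return tables * (100 - attendees)


print(missed_night_score(10), missed_night_score(20))  # 450 400
```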

Still, this is something we can compensate for. We can re-run the linear regression, for example, taking out the scores that represent missed nights. This done, T’s slope drops to 2.57. Still quite the improvement. T was getting used to the games, apparently. But it’s no longer a slope that dominates the league while feeling illogical. I’m not happy with this decision, though, not least because the same change for me drops my slope to -0.50. That is, I got appreciably worse over the season. But that’s sentiment. Someone looking at the plot of my scores, that anomalous second week aside, would probably say that yeah, my scores were dropping night to night. Ouch.

Or does it drop to -0.50? If we count league nights as the x-coordinate and league points as the y-coordinate, then yeah, omitting night two altogether gives me a slope of -0.50. What if the x-coordinate is instead the number of league nights I’ve been to, to get to that score? That is, if for night 2 I record, not a blank score, but the 472 points I got on league night number three? And for night 3 I record the 473 I got on league night number four? If I count by my improvement over the seven nights I played? … Then my slope is -0.68. I got worse even faster. I had a poor last night, and a lousy league night number six. They sank me.

And what if we pretend that for night two I got an average-for-me score? There are a couple kinds of averages, yes. The arithmetic mean for my other nights was a score of 468.57. The arithmetic mean is what normal people intend when they say average. Fill that in as a provisional night two score. My weekly decline in standing itself declines, to only -0.41. The other average that anyone might find convincing is my median score. For the rest of the season that was 472, the middle value when my other seven scores are put in order. Using this average makes my decline worse again. Then my slope is -0.62.

You see why I’m getting more dissatisfied. What was my performance like over the season? Depending on how you handle a missed night, I either got noticeably better, with a slope of 2.48. Or I got noticeably worse, with a slope of -0.68. Or maybe -0.62. Or I got modestly worse, with a slope of -0.41.

There’s something unsatisfying with a study of some data if handling one or two bad entries throws our answers this far off. More thought is needed. I’ll come back to this, but I mean to write this next essay right away so that I actually do.

## Can We Tell Whether A Pinball Player Is Improving?

The question posed for the pinball league was: can we say which of the players most improved over the season? I had data. I had the rankings of each of the players over the course of eight league nights. I had tools. I’ve taken statistics classes.

Could I say what a “most improved” pinball player looks like? Well, I can give a rough idea. A player’s improving if their rankings increase over the season. The most-improved person would show the biggest improvement. This definition might go awry; maybe there’s some important factor I overlooked. But it was a place to start looking.

So here’s the first problem. It’s the plot of my own data, my league scores over the season. Yes, league night 2 is dismal. I’d had to miss the night and so got the lowest score possible.

Is this getting better? Or worse? The obvious thing to do is to look for a curve that goes through these points. Then look at what that curve is doing. The thing is, it’s always possible to draw a curve through a bunch of data points, as long as there’s not something crazy like four data points for the same league night. As long as there’s one data point for each measurement, you can always connect those points with some curve. Worse, you can always fit more than one curve through those points. We need to think harder.
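To make that concrete, here is a sketch using Lagrange interpolation. This is a standard technique, not something the league analysis used. It builds a polynomial of degree at most seven that passes exactly through all eight of my nightly scores. Adding any multiple of $(x-1)(x-2)\cdots(x-8)$ gives a second curve that also hits every point, so the data alone can't pick between them.

```python
from fractions import Fraction
from math import prod


def lagrange(points):
    """Return the interpolating polynomial through `points` as a function,
    computed with exact Fraction arithmetic."""
    def p(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return p


nights = list(range(1, 9))
scores = [467, 420, 472, 473, 472, 455, 479, 462]
p = lagrange(list(zip(nights, scores)))


def q(x):
    """A different curve through the same eight points."""
    return p(x) + prod(x - night for night in nights)


print(all(p(x) == y and q(x) == y for x, y in zip(nights, scores)))  # True
print(p(9), q(9))  # the two curves disagree about a ninth night
```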

Here’s the thing about pinball league night results. Or any other data that comes from the real world. It’s got noise in it. There’s some amount of it that’s just random. We don’t need to look for a curve that matches every data point. Or any data point particularly. What if the actual data is “some easy-to-understand curve, plus some random noise”?

It’s a good thought. It’s a dangerous thought. You need to have an idea of what the “real” curve should be. There’s infinitely many possibilities. You can bias your answer by choosing what curve you think the data ought to represent. Or by not thinking before you make a choice. As ever, the hard part is not in doing a calculation. It’s choosing what calculation to do.

That said there’s a couple safe bets. One of them is straight lines. Why? … Well, they’re easy to work with. But we have deeper reasons. Lots of stuff, when it changes, looks like it’s changing in a straight line. Take any curve that hasn’t got a corner or a jump or a break in it. There’s a straight line that looks close enough to it. Maybe not for long, but at least for some stretch. In the absence of a better idea of what ought to be right, a line is at least a starting point. You might learn something even if a line doesn’t fit well, and get ideas for why to look at particular other shapes.

So there’s good, steady mathematics business to be found in doing “linear regression”. That is, find the line that best fits a set of data points. What do we mean by “best fits”?

The mathematical community has an answer. I agree with it, surely to the comfort of the mathematical community. Here’s the premise. You have a bunch of data points, with an independent variable ‘x’ and a dependent variable ‘y’. So the data points are a bunch of points, $\left(x_j, y_j\right)$ for a couple values of j. You want the line that “best” matches that. Fine. In my pinball league case here, j is the whole numbers from 1 to 8. $x_j$ is … just j again. All right, as happens, this is more mechanism than we need for this problem. But there are problems where it would be useful anyway. And for $y_j$, well, here:

| $j$ | $y_j$ |
|---|---|
| 1 | 467 |
| 2 | 420 |
| 3 | 472 |
| 4 | 473 |
| 5 | 472 |
| 6 | 455 |
| 7 | 479 |
| 8 | 462 |

For the linear regression, propose a line described by the equation $y = m\cdot x + b$. No idea what ‘m’ and ‘b’ are just yet. But. Calculate for each of the $x_j$ values what the projection would be, that is, what $m\cdot x_j + b$ is. How far are those from the actual $y_j$ data?

Are there choices for ‘m’ and ‘b’ that make the difference smaller? It’s easy to convince yourself there are. Suppose we started out with ‘m’ equal to 0 and ‘b’ equal to 472. That’s an okay fit. Suppose we started out with ‘m’ equal to 100,000,000 and ‘b’ equal to -2,038. That’s a crazy bad fit. So there must be some ‘m’ and ‘b’ that make for better fits.
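One way to put numbers on "okay fit" and "crazy bad fit", sketched here, is to add up the squared vertical gaps between each score and the line. Using squared gaps is the standard least-squares choice. The text hasn't committed to a measure yet, so treat that choice as an assumption of the sketch. The third line tested is the least-squares fit for my data.

```python
nights = range(1, 9)
scores = [467, 420, 472, 473, 472, 455, 479, 462]


def sse(m, b):
    """Sum of squared vertical gaps between the scores and the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(nights, scores))


print(sse(0, 472))                    # 3168
print(sse(100_000_000, -2_038))       # astronomically large
m_best = 104 / 42                     # least-squares slope for these scores
b_best = 462.5 - m_best * 4.5         # line passes through (mean night, mean score)
print(round(sse(m_best, b_best), 1))  # 2188.5
```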

Is there a best fit? If you don’t think much about mathematics the answer is obvious: of course there’s a best fit. If there’s some poor, some decent, some good fits there must be a best. If you’re a bit better-learned and have thought more about mathematics you might grow suspicious. That term ‘best’ is dangerous. Maybe there’s several fits that are all different but equally good. Maybe there’s an endless series of ever-better fits but no one best. (If you’re not clear how this could work, ponder: what’s the largest negative real number?)

Good suspicions. If you learn a bit more mathematics, you learn calculus, and in particular how to find the maxima or minima of a quantity that depends on several variables by studying how small changes in those variables change it. And that tells us that there is, indeed, a best choice for ‘m’ and ‘b’.
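For the record, the usual meaning of "best" here is least squares: choose ‘m’ and ‘b’ to make the total of the squared vertical distances between the data and the line as small as possible. Setting both partial derivatives of that total to zero gives a closed form. Here $\bar{x}$ and $\bar{y}$ are the means of the $x_j$ and the $y_j$:

$$E(m, b) = \sum_{j} \left(y_j - \left(m\cdot x_j + b\right)\right)^2$$

$$m = \frac{\sum_j \left(x_j - \bar{x}\right)\left(y_j - \bar{y}\right)}{\sum_j \left(x_j - \bar{x}\right)^2}, \qquad b = \bar{y} - m\cdot \bar{x}$$

The minimum is unique as long as the $x_j$ are not all the same.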

(Here I’m going to hedge. I’ve learned a bit more mathematics than that. I don’t think there’s some freaky set of data that will turn up multiple best-fit curves. But my gut won’t let me just declare that. There’s all kinds of crazy, intuition-busting stuff out there. But if there exists some data set that breaks linear regression you aren’t going to run into it by accident.)

So. How to find the best ‘m’ and ‘b’ for this? You’ve got choices. You can open up DuckDuckGo, search for ‘matlab linear regression’, and follow the instructions. Or ‘excel linear regression’, if you have an easier time entering data into spreadsheets. If you’re on the Mac, maybe ‘apple numbers linear regression’. Follow the directions on the second or third link returned. Oh, you can do the calculation yourself. It’s not hard. It’s just tedious. It’s a lot of multiplication and addition and you know what? We’ve already built tools that know how to do this. Use them. Not if your homework assignment is to do this by hand, but for stuff you care about, yes. (In Octave, an open-source clone of Matlab, you can do it by an admirably slick formula that might even be memorizable.)
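If you'd rather see the calculation than trust a search result, here is a minimal sketch of the standard closed-form least-squares formulas in plain Python, applied to my eight nights. `statistics.linear_regression` in Python 3.10 and later, or NumPy's `numpy.polyfit(x, y, 1)`, should agree with it.

```python
def fit_line(xs, ys):
    """Least-squares line through the points (xs[j], ys[j]).
    Returns (m, b) for the line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx  # ZeroDivisionError if every x is the same: no unique line
    return m, y_bar - m * x_bar


nights = list(range(1, 9))
scores = [467, 420, 472, 473, 472, 455, 479, 462]
m, b = fit_line(nights, scores)
print(f"m = {m:.2f}, b = {b:.2f}")  # m = 2.48, b = 451.36
```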

If you suspect that some shape other than a line is best, okay. Then you’ll want to look up and understand the formulas for these linear regression coefficients. That’ll guide you to finding a best-fit for these other shapes. Or you can do a quick, dirty hack. Like, if you think it should be an exponential curve, then try fitting a line to x and the logarithm of y. And then don’t listen to those doubts about whether this would be the best-fit exponential curve. It’s a calculation, it’s done, isn’t that enough?
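Here is that quick, dirty hack for an exponential, sketched under the assumption that every y-value is positive (logarithms need that). Note what it actually minimizes: squared error in $\ln y$, not in $y$. That is exactly why the doubt about "best-fit exponential" is warranted. The data here are hypothetical and noiseless, generated from $y = 3 e^{0.5 x}$, so the fit should recover those constants.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = A * exp(k * x) by a least-squares line through (x, ln y).
    Requires every y > 0. Minimizes error in ln y, not in y itself."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    k = (sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
         / sum((x - x_bar) ** 2 for x in xs))
    A = math.exp(l_bar - k * x_bar)
    return A, k


xs = [0, 1, 2, 3, 4]
ys = [3 * math.exp(0.5 * x) for x in xs]  # hypothetical, noise-free data
A, k = fit_exponential(xs, ys)
print(round(A, 6), round(k, 6))  # 3.0 0.5
```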

Back to lines, back to my data. I’ll spare you the calculations and show you the results.

Done. For me, this season, I ended up with a slope ‘m’ of about 2.48 and a ‘b’ of about 451.4. That is, the slightly diagonal black line here. The red circles are what my scores would have been if my performance exactly matched the line.

That seems like a claim that I’m improving over the season. Maybe not a compelling case. That missed night certainly dragged me down. But everybody had some outlier bad night, surely. Why not find the line that best fits everyone’s season, and declare the most-improved person to be the one with the largest positive slope?

## Who’s The Most Improved Pinball Player?

My love just completed a season as head of a competitive pinball league. People find this an enchanting fact. People find competitive pinball at all enchanting. Many didn’t know pinball was still around, much less big enough to have regular competitions.

Pinball’s in great shape compared to, say, the early 2000s. There’s one major manufacturer. There’s a couple of small manufacturers who are well-organized enough to make a string of games without (yet) collapsing from not knowing how to finance game-building. Many games go right to private collections. But the “barcade” model of a hipster bar with a bunch of pinball machines and, often, video games is working quite well right now. We’re fortunate to live in Michigan. All the major cities in the lower part of the state have pretty good venues and leagues in or near them. We’re especially fortunate to live in Lansing, so that most of these spots are within an hour’s drive, and all of them are within two hours’ drive.

Ah, but how do they work? Many ways, but there are a couple of popular ones. My love’s league uses a scheme that surely has a name. In this scheme everybody plays their own turn on a set of games. Then they get ranked for each game. So the person who puts up the highest score on the game Junkyard earns 100 league points. The person who puts up the second-highest score on Junkyard earns 99 league points. The person with the third-highest score on Junkyard earns 98 league points. And so on, like this. If 20 people showed up for the day, then the poor person who bottoms out earns a mere 81 league points for the game.

This is a relative ranking, yes. I don’t know any competitive-pinball scheme that uses more than one game that doesn’t rank players relative to each other. I’m not sure how an alternative could work. Different games have different scoring schemes. Some games try to dazzle with blazingly high numbers. Some hoard their points as if giving them away cost them anything. A score of 50 million points? If you had that on Attack From Mars you would earn sympathetic hugs and the promise that life will not always be like that. (I’m not sure it’s possible to get a score that low without tilting your game away.) 50 million points on Lord of the Rings would earn a bunch of nods that yeah, that’s doing respectably, but there’s other people yet to play. 50 million points on Scared Stiff would earn applause for the best game anyone had seen all year. 50 million points on The Wizard of Oz would get you named the Lord Mayor of Pinball, your every whim to be rapidly done.

And each individual manifestation of a table is different. It’s part of the fun of pinball. Each game is a real, physical thing, with its own idiosyncrasies. The flippers are a little different in strength. The rubber bands that guard most things are a little harder or softer. The table is a little more or less worn. The sensors are a little more or less sensitive. The tilt detector a little more forgiving, or a little more brutal. Really the least unfair way to rate play is comparing people to each other on a particular table played at approximately the same time.

It’s not perfectly fair. How could any real thing be? It’s maddening to put up the best game of your life on some table, and come in the middle of the pack because everybody else was having great games too. It’s some compensation that there’ll be times you have a mediocre game but everybody else has a lousy one so you’re third-place for the night.

Back to league. Players earn these points for every game played. So whoever has the highest score of all on, say, Attack From Mars gets 100 league points for that regardless of whatever they did on Junkyard. Whoever has the best score on Iron Maiden (a game so new we haven’t actually played it during league yet, and that somehow hasn’t got an entry on the Internet Pinball Database; give it time) gets their 100 points. And so on. A player’s standings for the night are based on all the league points earned on all the tables played. For us that’s usually five games. Five or six games seems about standard; that’s enough time playing and hanging out to feel worthwhile without seeming too long.

So each league night all the players earn between (about) 420 and 500 points. We have eight league nights. Add the scores up over those league nights and there we go. (Well, we drop the lowest nightly total for each player. This lets them miss a night for some responsibility, like work or travel or recovering from sickness or something, without penalizing them.)
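Here is the season total with the lowest night dropped, as a one-function sketch (the function name is hypothetical). Run on my nightly scores from the table earlier on this page, the dropped night is the missed night 2:

```python
def season_total(nightly):
    """Sum of a player's nightly totals with the single lowest night dropped."""
    return sum(nightly) - min(nightly)


my_nights = [467, 420, 472, 473, 472, 455, 479, 462]
print(season_total(my_nights))  # 3280
```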

As we got to the end of the season my love asked: is it possible to figure out which player showed the best improvement over time?

Well. I had everybody’s scores from every night played. And I’ve taken multiple classes in statistics. Why would I not be able to?

## Is A Basketball Tournament Interesting? My Thoughts

It’s a good weekend to bring this back. I have some essays about information theory and sports contests and maybe you missed them earlier. Here goes.

And then for a follow-up I started looking into actual scoring results from major sports. This let me estimate the information content of soccer, (US) football, and baseball scores, to match my estimate of basketball scores’ information content.

Don’t try to use this to pass your computer science quals. But I hope it gives you something interesting to talk about while sulking over your brackets, and maybe to read about after that.

## Dabbing and the Pythagorean Theorem

The picture explains itself nicely. Just a thought on an average day.

I enjoyed this article from Fox Sports. Apparently, a French Precalculus textbook created a homework problem asking if football (soccer) superstar Paul Pogba is doing the perfect dab by creating two right triangles.


## How Much Might I Have Lost At Pinball?

After the state pinball championship last month there was a second, side tournament. It was a sort-of marathon event in which I played sixteen games in short order. I won three of them and lost thirteen, a disheartening record. The question I can draw from this: was I hopelessly outclassed in the side tournament? Is it plausible that I could do so awfully?

The answer would be “of course not”. I was playing against, mostly, the same people who were in the state finals. (A few who didn’t qualify for the finals joined the side tournament.) In that I had done well enough, winning seven games in all out of fifteen played. It’s implausible that I got significantly worse at pinball between the main and the side tournament. But can I make a logically sound argument about this?

In full, probably not. It’s too hard. The question is, did I win way too few games compared to what I should have expected? But what should I have expected? I haven’t got any information on how likely it should have been that I’d win any of the games, especially not when I faced something like a dozen different opponents. (I played several opponents twice.)

But we can make a model. Suppose that I had a fifty percent chance of winning each match. This is a lie in detail. The model contains lies; all models do. The lies might let us learn something interesting. Some people there I could only beat with a stroke of luck on my side. Some people there I could fairly often expect to beat. If we pretend I had the same chance against everyone, though, we get something that we can model. It might tell us something about what really happened.

If I play 16 matches, and have a 50 percent chance of winning each of them, then I should expect to win eight matches. But there’s no reason I couldn’t win seven instead, or nine. I might win six, or ten, without that being too implausible. It’s even possible I might not win a single match, or that I might win all sixteen matches. How likely?

This calls for a creature from the field of probability that we call the binomial distribution. It’s “binomial” because it’s about stuff for which there are exactly two possible outcomes. This fits. Each match I can win or I can lose. (If we tie, or if the match is interrupted, we replay it, so there’s not another case.) It’s a “distribution” because we describe, for a set of some number of attempted matches, how the possible outcomes are distributed. The outcomes are: I win none of them. I win exactly one of them. I win exactly two of them. And so on, all the way up to “I win exactly all but one of them” and “I win all of them”.
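Here is a sketch of the binomial distribution for this case, using the simplifying 50-50 assumption from above. `math.comb` (Python 3.8 and later) gives the number of ways to choose which k of the n matches were wins. The loop reproduces the percentages tabulated below.

```python
from math import comb


def binomial_pmf(n, k, p=0.5):
    """Chance of exactly k wins in n independent matches, each won with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


for wins in range(17):
    print(f"{wins:2d}  {100 * binomial_pmf(16, wins):7.3f} %")
```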

To answer the question of whether it’s plausible I should have done so badly I need to know more than just how likely it is I would win only three games. I need to also know the chance I’d have done worse. If I had won only two games, or only one, or none at all. Why?

Here I admit: I’m not sure I can give a compelling reason, at least not in English. I’ve been reworking it all week without being happy at the results. Let me try pieces.

One part is that the question as I put it, “is it plausible that I could do so awfully?”, isn’t answered just by checking how likely it is I would win only three games out of sixteen. If that’s awful, then doing even worse must also be awful. I can’t exclude even-worse results from awfulness without losing a sense of what the word “awful” means. Fair enough, to answer that question. But I made up the question. Why did I make up that one? Why not just “is it plausible I’d get only three out of sixteen games”?

Habit, largely. Experience shows me that the probability of any particular result often turns out to be implausibly low. That isn’t quite the case here; there are only seventeen noticeably different outcomes of playing sixteen games. But there can be so many possible outcomes that even the most likely one isn’t likely.

Take an extreme case. (Extreme cases are often good ways to build an intuitive understanding of things.) Imagine I played 16,000 games, with a 50-50 chance of winning each one of them. It is most likely that I would win 8,000 of the games. But the probability of winning exactly 8,000 games is small: only about 0.6 percent. What’s going on there is that there’s almost the same chance of winning exactly 8,001 or 8,002 games. As the number of games increases, the number of possible different outcomes increases. If there are 16,000 games, there are 16,001 possible outcomes. It’s less likely that any of them will stand out. What saves our ability to predict the results of things is that the number of plausible outcomes increases more slowly. It’s plausible someone would win exactly three games out of sixteen. It’s all but impossible that someone would win exactly three thousand games out of sixteen thousand, even though that’s the same ratio of won games.
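The 16,000-game figures are easy to check with a sketch. The binomial coefficients here are enormous, so the sketch works with logarithms through `math.lgamma`, using $\ln\binom{n}{k} = \ln n! - \ln k! - \ln (n-k)!$ and $\ln m! = \texttt{lgamma}(m+1)$.

```python
from math import exp, lgamma, log


def log_binomial_pmf(n, k, p=0.5):
    """Natural log of the chance of exactly k wins in n matches,
    computed via lgamma so the huge coefficient is never formed."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


n = 16_000
print(f"{100 * exp(log_binomial_pmf(n, 8_000)):.2f} %")  # 0.63 %
# Winning 8,001 is nearly as likely as 8,000: the ratio is 8000/8001.
print(exp(log_binomial_pmf(n, 8_001) - log_binomial_pmf(n, 8_000)))
# Base-10 exponent of the chance of exactly 3,000 wins: a huge negative number.
print(round(log_binomial_pmf(n, 3_000) / log(10)))
```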

Card games offer another way to get comfortable with this idea. A bridge hand, for example, is thirteen cards drawn out of fifty-two. But the chance that you were dealt the hand you just got? Impossibly low: one in 635,013,559,600. Should we conclude from this that all bridge hands are hoaxes? No, but ask my mother sometime about the bridge class she took that one cruise. “Three of sixteen” is too particular; “at best three of sixteen” is a class I can study.

Unconvinced? I don’t blame you. I’m not sure I would be convinced of that, but I might allow the argument to continue. I hope you will. So here are the specifics. For sixteen matches, here is each possible number of wins and the chance of having exactly that many:

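| Wins | Chance of exactly that many wins |
|---:|---:|
| 0 | 0.002% |
| 1 | 0.024% |
| 2 | 0.183% |
| 3 | 0.854% |
| 4 | 2.777% |
| 5 | 6.665% |
| 6 | 12.219% |
| 7 | 17.456% |
| 8 | 19.638% |
| 9 | 17.456% |
| 10 | 12.219% |
| 11 | 6.665% |
| 12 | 2.777% |
| 13 | 0.854% |
| 14 | 0.183% |
| 15 | 0.024% |
| 16 | 0.002% |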

So the chance of doing as awfully as I did, winning zero or one or two or three games, is pretty dire. It’s a little above one percent.
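For anyone who wants to reproduce the table and that tail figure, here is a minimal sketch. It assumes, as above, a 50-50 chance of winning each game and independent games, and it uses exact fractions so that nothing gets lost to rounding. The names `prob_exactly`, `GAMES`, and `P_WIN` are illustrative.

```python
from fractions import Fraction
from math import comb

GAMES = 16
P_WIN = Fraction(1, 2)  # assumption: a 50-50 chance of winning each game


def prob_exactly(wins: int, games: int = GAMES, p: Fraction = P_WIN) -> Fraction:
    """Exact binomial chance of exactly `wins` wins, assuming independent games."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)


# Rebuild the sixteen-game table, rounded the same way.
for wins in range(GAMES + 1):
    print(f"{wins:>2}  {float(prob_exactly(wins)):.3%}")

# Chance of doing at least this badly: zero, one, two, or three wins.
tail = sum(prob_exactly(wins) for wins in range(4))
print(f"three or fewer wins: {tail} = {float(tail):.3%}")
```

The tail comes out to exactly 697/65536. Of the 65,536 equally likely win-loss sequences, 1 + 16 + 120 + 560 = 697 contain three or fewer wins.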

Is that implausibly low? Is there so small a chance that I’d do so badly that we have to figure I didn’t have a 50-50 chance of winning each game?

I hate to think that. I didn’t think I was outclassed. But here’s a problem. We need some standard for what counts as “too unlikely to have happened by chance alone”. If there were only one chance in a trillion that someone with a 50-50 chance of winning any game would put in the performance I did, we could suppose that I didn’t actually have a 50-50 chance of winning any game. If there were only one chance in a million of that performance, we might also suppose I didn’t actually have a 50-50 chance of winning any game. But here there was only about one chance in a hundred. Is that too unlikely?

It depends. We should have set a threshold for “too implausibly unlikely” before we started the research; deciding afterward is bad form. There are some commonly used thresholds. Five percent is often useful for stuff where it’s hard to do bigger experiments and the harm of guessing wrong (dismissing the idea I had a 50-50 chance of winning any given game, for example) isn’t so serious. One percent is another common threshold, again in stuff like psychological studies where it’s hard to get more and more data. In a field like physics, where experiments are relatively cheap to keep running, you can gather enough data to insist on fractions of a percent as your threshold.

In my defense, I thought (without doing the work) that I probably had something like a five percent chance of doing that badly by luck alone. The actual figure, a bit over one percent, suggests that I did have a much worse than 50 percent chance of winning any given game.

Is that credible? Well, yeah; I may have been in the top sixteen players in the state. But a lot of those people are incredibly good. Maybe I had only one chance in three, or something like that. That would make the chance I did that poorly something like one in six, likely enough.
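To see how much that “one in six” depends on the guess at ‘p’, here is a minimal sketch. It keeps the same assumption of independent games and computes the chance of three or fewer wins in sixteen for a few trial values of ‘p’. Those particular values and the name `chance_at_most` are illustrative choices.

```python
from fractions import Fraction
from math import comb


def chance_at_most(max_wins: int, games: int, p: Fraction) -> Fraction:
    """Chance of winning no more than `max_wins` of `games` independent games,
    each won with the same probability `p`."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k) for k in range(max_wins + 1)
    )


for p in (Fraction(1, 2), Fraction(2, 5), Fraction(1, 3), Fraction(1, 4)):
    chance = chance_at_most(3, 16, p)
    print(f"p = {p}: chance of three or fewer wins in sixteen is {float(chance):.1%}")
```

At p = 1/3 the exact value is 7143424/43046721, about 16.6 percent. That is where “something like one in six” comes from.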

It’s also plausible that games are not independent: that whether I win one game depends in some way on whether I won or lost the previous one. It does feel like it’s easier to win after a win, or after a close loss. And it feels harder to win a game after a string of losses. I don’t know that this can be proved, not on the meager evidence I have available. And you can almost always question the independence of a string of events like this. It’s the safe bet.

## How Interesting Is March Madness?

And now let me close the week with some other evergreen articles. A couple years back I mixed the NCAA men’s basketball tournament with information theory to produce a series of essays that fit the title I’ve given this recap. They also sprawl out into (US) football and baseball. Let me link you to them:

## How Much I Did Lose In Pinball

A follow-up for people curious how much I lost at the state pinball championships Saturday: I lost at the state pinball championships Saturday. As I expected I lost in the first round. I did beat my expectations, though. I’d figured I would win one, maybe two games in our best-of-seven contest. As it happened I won three games and I had a fighting chance in game seven.

I’d mentioned in the previous essay how much contingency there is, especially in a short series like this one. My opponent started with the game I expected her to pick. And she got an awful bounce on the first ball, while I got a very lucky bounce that started multiball on the last. So I won, but not because I was playing better. The seventh game was one that I had figured she might pick if she needed to crush me, and even if I had gotten a better bounce on the first ball I’d still have had an uphill struggle. Just less of one.

After the first round I got into a set of three “tie-breaking” rounds, used to sort out which of the sixteen players ranked as number 11 versus number 10. Each of those was a best-of-three series. I won one series and lost the other two, dropping me into 12th place. Over the three series I had four wins and four losses, so I can’t say that I was mismatched there.

Where I might have been mismatched is the side tournament. This was a two-hour marathon of playing a lot of games one after the other. I finished with three wins and 13 losses, enough to make me wonder whether I somehow went from competent to incompetent in the hour or so between the main and the side tournament. Of course I didn’t. But can I prove that, based on a record like that?

Meanwhile a friend pointed out The New York Times covering the New York State pinball championship:

The article is (at least for now) at https://www.nytimes.com/2017/02/12/nyregion/pinball-state-championship.html. What my friend couldn’t have known, and what shows how networked people are, is that I know one of the people featured in the article, Sean “The Storm” Grant. Well, I knew him, back in college. He was an awesome pinball player even then. And he’s only got more awesome since.

How awesome? Let me give you some background. The International Flipper Pinball Association (IFPA) gives players ranking points. These points are gathered by playing in leagues and tournaments. Each league or tournament has a certain point value. That point value is divided up among the players according to how they finish, with bigger shares going to higher finishers. How many points do the events have? That depends on how many people play and what their rankings are. So, yes, how much someone’s IFPA score increases depends on the events they go to, and the events they go to depend on their score. This might sound to you like there’s a differential equation describing all this. You’re close: it’s a difference equation, because these rankings change with the discrete number of events players go to. Either way, there’s an interesting, iterative system at work there.

(Points only expire with time. The system is designed to encourage people to play a lot of things and keep playing them. You can’t lose ranking points by playing, although it might hurt your player-versus-player rating. That’s calculated by a formula I don’t understand at all.)

Anyway, Sean Grant plays in the New York Superleague, a crime-fighting band of pinball players who figured out how to game the IFPA rankings system. They figured out how to turn the large number of people who might visit a Manhattan bar and casually play one or two games into a source of ranking points for the serious players. The IFPA, combatting this scheme, just this week recalculated the Superleague values and the rankings of everyone involved in it. It’s fascinating stuff, in that way a heated debate over an issue you aren’t emotionally invested in can be.

Anyway. Grant is such a skilled player that he lost more points in this nerfing than I have gathered in my whole competitive-pinball-playing career.

So while I knew I’d be knocked out in the first round of the Michigan State Championships I’ll admit I had fantasies of having an impossibly lucky run. In that case, I’d have gone to the nationals and been turned into a pale, silverball-covered paste by people like Grant.

Thanks again for all your good wishes, kind readers. Now we start the long road to the 2017 State Championships, to be held in February of next year. I’m already in 63rd place in the state for the year! (There haven’t been many events for the year yet, and the championship and side tournament haven’t posted their ranking scores yet.)

## How Much Can I Expect To Lose In Pinball?

This weekend, all going well, I’ll be going to the Michigan state pinball championship contest. There, I will lose in the first round.

I’m not trying to run myself down. But I know who I’m scheduled to play in the first round, and she’s quite a good player. She’s the state’s highest-ranked woman playing competitive pinball. So she starts off being better than me. And then the venue is one she gets to play in more than I do. Pinball, a physical thing, is idiosyncratic. The reflexes you build practicing on one table can betray you on a strange machine. She’s had more chance to practice on the games we have and that pretty well settles the question. I’m still showing up, of course, and doing my best. Stranger things have happened than my winning a game. But I’m going in with what I hope are realistic expectations.

That bit about having realistic expectations, though, makes me ask what realistic expectations are. The first round is a best-of-seven match. How many games should I expect to win? And that becomes a probability question. It’s a great question to learn on, too. Our match is straightforward to model: we play up to seven times. Each time we play one or the other wins.

So we can start calculating. There’s some probability I have of winning any particular game. Call that number ‘p’. It’s greater than zero (I’m not certain to lose) but less than one (I’m not certain to win). Let’s suppose the probability of my winning never changes over the course of seven games. I will come back to the card I palmed there. If we’re playing 7 games, and I have a chance ‘p’ of winning any one of them, then the number of games I can expect to win is 7 times ‘p’. This is the number of wins you might expect if you were called on in class and had no idea and bluffed the first thing that came to mind. Sometimes that works.

7 times p isn’t very enlightening. What number is ‘p’, after all? I don’t know exactly. The International Flipper Pinball Association tracks how many times I’ve finished a tournament or league above her and vice-versa. (It is “Flipper Pinball” because the earliest pinball machines had no flippers. You plunged the ball into play and nudged the machine a little to keep it going somewhere you wanted. The game Simpsons Pinball Party has a moment where Grampa Simpson says, “back in my day we didn’t have flippers”. It’s the best kind of joke, the one that is factually correct.) We’ve played in 54 recorded events together, and I’ve won 23 and lost 29 of them. (We’ve tied twice.) But that isn’t all head-to-head play. It counts matches where I’m beaten by someone she goes on to beat as her beating me, and vice-versa. And it includes a lot of playing not at the venue. I lack better statistics and must go with my feelings. I’d estimate my chance of beating her at about one in three. Let’s say ‘p’ is 1/3 until we get evidence to the contrary.

Seven times one-third is not a difficult problem. It comes out to two and a third, raising the question of how one wins one-third of a pinball game. The obvious observation is that most games involve playing three rounds, called balls. But this one-third of a game is an average. Imagine the two of us playing three thousand seven-game matches, without either of us getting the least bit better or worse or collapsing of exhaustion. I would expect to win seven thousand of the games, or two and a third games per seven-game match.
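The three-thousand-match thought experiment is easy to run on a computer. This is a minimal sketch, assuming a fixed chance ‘p’ = 1/3 per game, independent games, and all seven games always being played, which is the simplification in force at this point. The seed and the function name are arbitrary choices.

```python
import random


def simulate_all_seven(matches: int, p: float, seed: int = 1) -> float:
    """Average wins per match when all seven games are always played.

    Assumes a fixed chance `p` of winning each game, independent of every other game.
    """
    rng = random.Random(seed)
    total_wins = sum(
        sum(rng.random() < p for _ in range(7)) for _ in range(matches)
    )
    return total_wins / matches


print(7 * (1 / 3))                      # the expectation, 7p
print(simulate_all_seven(3000, 1 / 3))  # a simulated average, which should land near it
```

Different seeds give slightly different averages, clustered around two and a third.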

Ah, but … that’s too high. I would expect to win two and a third games out of seven. But we probably won’t play seven. We’ll stop when she or I gets to four wins. This makes the problem hard. Hard is the wrong word. It makes the problem tedious. At least it threatens to. Things will get easy enough, but we have to go through some difficult parts first.

There are eight different ways that our best-of-seven match can end. She can win in four games. I can win in four games. She can win in five games. I can win in five games. She can win in six games. I can win in six games. She can win in seven games. I can win in seven games. There is some chance of each of those eight outcomes happening. And exactly one of those will happen; it’s not possible that she’ll win in four games and in five games, unless we lose track of how many games we’d played. They give us index cards to write results down. We won’t lose track.

It’s easy to calculate the probability that I win in four games, if the chance of my winning a game is the number ‘p’. The probability is $p^4$. Similarly it’s easy to calculate the probability that she wins in four games. If I have the chance ‘p’ of winning, then she has the chance ‘1 – p’ of winning. So her probability of winning in four games is $(1 - p)^4$.

The probability of my winning in five games is more tedious to work out. It’s going to be $p^4$ times $(1 - p)$ times 4. The 4 here is the number of different ways that she can win one of the first four games. Turns out there’s four ways to do that. She could win the first game, or the second, or the third, or the fourth. And in the same way the probability she wins in five games is $p$ times $(1 - p)^4$ times 4.

The probability of my winning in six games is going to be $p^4$ times $(1 - p)^2$ times 10. There are ten ways to scatter her two wins among the first five games. The probability of her winning in six games is the strikingly parallel $p^2$ times $(1 - p)^4$ times 10.

The probability of my winning in seven games is going to be $p^4$ times $(1 - p)^3$ times 20, because there are 20 ways to scatter three wins among the first six games. And the probability of her winning in seven games is $p^3$ times $(1 - p)^4$ times 20.

Add all those probabilities up, no matter what ‘p’ is, and you should get 1. Exactly one of those eight outcomes has to happen. And we can work out the probability that the series will end after four games: it’s the chance she wins in four games plus the chance I win in four games. The probability that the series goes to five games is the probability that she wins in five games plus the probability that I win in five games. And so on for six and for seven games.
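The eight outcomes and their probabilities can be tabulated directly. This sketch uses exact fractions and assumes the fixed, independent per-game chance ‘p’ described above. The function name and the (winner, games) keys are illustrative choices, not anything from the original.

```python
from fractions import Fraction
from math import comb


def series_outcomes(p: Fraction) -> dict:
    """Probability of each way a best-of-seven series can end.

    Keys are (winner, games played). The winner takes the last game and exactly
    three of the earlier games, so the loser's games - 4 wins can fall anywhere
    among the first games - 1 games.
    """
    q = 1 - p
    outcomes = {}
    for games in range(4, 8):
        ways = comb(games - 1, games - 4)  # 1, 4, 10, 20 for 4, 5, 6, 7 games
        outcomes[("me", games)] = ways * p**4 * q ** (games - 4)
        outcomes[("her", games)] = ways * q**4 * p ** (games - 4)
    return outcomes


p = Fraction(1, 3)
outcomes = series_outcomes(p)
print(sum(outcomes.values()))  # all eight outcomes together: exactly 1
for games in range(4, 8):
    ends_here = outcomes[("me", games)] + outcomes[("her", games)]
    print(f"series ends after {games} games: {float(ends_here):.3f}")
```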

So that’s neat. We can figure out the probability of the match ending after four games, after five, after six, or after seven. And from that we can figure out the expected length of the match. This is the expectation value. Take the product of ‘4’ and the chance the match ends at four games. Take the product of ‘5’ and the chance the match ends at five games. Take the product of ‘6’ and the chance the match ends at six games. Take the product of ‘7’ and the chance the match ends at seven games. Add all those up. That’ll be, wonder of wonders, the number of games a match like this can be expected to run.
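That weighted sum can be written out directly. This is a minimal sketch with exact fractions, under the same fixed-‘p’, independent-games assumption. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def expected_series_length(p: Fraction) -> Fraction:
    """Expected number of games in a best-of-seven series, by the weighted sum.

    For each possible length n (4 through 7), multiply n by the chance the series
    ends after exactly n games, then add. The chance of ending after n games is the
    chance I win in n plus the chance she wins in n.
    """
    q = 1 - p
    total = Fraction(0)
    for n in range(4, 8):
        ways = comb(n - 1, n - 4)  # ways to place the loser's n - 4 wins in the first n - 1 games
        ends_after_n = ways * (p**4 * q ** (n - 4) + q**4 * p ** (n - 4))
        total += n * ends_after_n
    return total


print(expected_series_length(Fraction(1, 2)))  # evenly matched: 93/16, or 5.8125 games
print(expected_series_length(Fraction(1, 3)))  # 4012/729, a little over 5.5 games
```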

Now it’s a matter of adding together all these combinations of all these different outcomes and you know what? I’m not doing that. I don’t know what the chance is that I’d do all this arithmetic correctly, but I know there’s no chance I’d do all this arithmetic correctly. This is the stuff we pirate Mathematica to do. (Mathematica is supernaturally good at working out mathematical expressions. A personal license costs all the money you will ever have in your life plus ten percent, which it will calculate for you.)

Happily I won’t have to work it out. A person appearing to be a high school teacher named B Kiggins has worked it out already. Kiggins put it and a bunch of other interesting worksheets on the web. (Look for the Voronoi Diagrams!)

There’s a lot of arithmetic involved. But it all simplifies out, somehow. Per Kiggins’ work, the expected number of games in a best-of-seven match, if one of the competitors has the chance ‘p’ of winning any given game, is:

$E(p) = 4 + 4\cdot p + 4\cdot p^2 + 4\cdot p^3 - 52\cdot p^4 + 60\cdot p^5 - 20\cdot p^6$
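Kiggins’ polynomial can be checked against a second, independent calculation. This sketch is my own construction, not Kiggins’ method. It finds the expected length by recursion on the running score, using exact fractions so that any disagreement would be a real difference and not rounding noise. The sketch also checks that, writing s = p(1 – p), the same polynomial equals 4 + 4s + 8s² + 20s³. That form makes plain why the answer is the same for ‘p’ and ‘1 – p’.

```python
from fractions import Fraction
from functools import lru_cache


def kiggins_polynomial(p):
    """The closed form quoted above for the expected length of a best-of-seven series."""
    return 4 + 4 * p + 4 * p**2 + 4 * p**3 - 52 * p**4 + 60 * p**5 - 20 * p**6


def expected_length_by_recursion(p: Fraction) -> Fraction:
    """Expected series length found a second way: by recursion on the score.

    From a score of (my wins, her wins), one more game is played; it moves the score
    to (mine + 1, hers) with chance p or to (mine, hers + 1) with chance 1 - p.
    The series stops once either player reaches four wins.
    """

    @lru_cache(maxsize=None)
    def remaining(mine: int, hers: int) -> Fraction:
        if mine == 4 or hers == 4:
            return Fraction(0)
        return 1 + p * remaining(mine + 1, hers) + (1 - p) * remaining(mine, hers + 1)

    return remaining(0, 0)


for p in (Fraction(1, 10), Fraction(1, 3), Fraction(1, 2), Fraction(4, 5)):
    s = p * (1 - p)
    print(p, kiggins_polynomial(p), expected_length_by_recursion(p),
          4 + 4 * s + 8 * s**2 + 20 * s**3)
```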

Whatever you want to say about that, it’s a polynomial. And it’s easy enough to evaluate it, especially if you let the computer evaluate it. Oh, I would say it seems like a shame all those coefficients of ‘4’ drop off and we get weird numbers like ’52’ after that. But there’s something beautiful in there being four 4’s, isn’t there? Good enough.

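So. If the chance of my winning a game, ‘p’, is one-third, then we’d expect the series to go 5.5 games. Since I win each of those games with chance one-third, that works out to an expected 5.5 / 3, or a bit under two, wins for me. This accords well with my intuition. I thought I would be likely to win one game. Winning two would be a moral victory akin to championship.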
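A Monte Carlo run can check both the 5.5-game figure and the expected number of wins. This is a minimal sketch under the same fixed-‘p’, independent-games assumption. Because each game actually played is a win for me with chance ‘p’, expected wins should equal ‘p’ times expected length. Here that is 4012/2187, about 1.83. The seed and trial count are arbitrary.

```python
import random


def simulate_series(p: float, trials: int, seed: int = 2017) -> tuple:
    """Monte Carlo estimate of (average games played, average games I win).

    Assumes, as the essay does, a fixed chance `p` of my winning each game,
    independent of how earlier games went. Play stops at four wins for either side.
    """
    rng = random.Random(seed)
    total_games = total_my_wins = 0
    for _ in range(trials):
        mine = hers = 0
        while mine < 4 and hers < 4:
            if rng.random() < p:
                mine += 1
            else:
                hers += 1
        total_games += mine + hers
        total_my_wins += mine
    return total_games / trials, total_my_wins / trials


avg_games, avg_wins = simulate_series(1 / 3, 100_000)
print(f"average games: {avg_games:.2f}, average wins for me: {avg_wins:.2f}")
```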

Let me go back to my palmed card. This whole analysis is based on the idea that I have some fixed probability of winning and that it isn’t going to change from one game to the next. If the probability of winning is entirely based on my and my opponent’s abilities this is fair enough. Neither of us is likely to get significantly more or less skilled over the course of even seven games. We won’t even play long enough to get fatigued.

But our abilities aren’t everything. We’re going to be playing up to seven different tables. How each table reacts to our play is going to vary. Some tables may treat me better, some tables my opponent. Luck of the draw. And there’s an important psychological component. It’s easy to get thrown and to let a bad ball wreck the rest of one’s game. It’s hard to resist feeling nervous if you go into the last ball from way behind your opponent. And it seems as if a pinball knows you’re nervous and races out of play to help you calm down. (The best pinball players tend to have outstanding last balls, though. They don’t get rattled. And they spend the first several balls building up to high-value shots they can collect later on.) And there will be freak events. Last weekend I was saved from elimination in a tournament by the pinball machine spontaneously resetting. We had to replay the game. I did well in the tournament, but it was the freak event that kept me from being knocked out in the first round.

That’s some complicated stuff to fit together. I suppose with enough data we could model how much the differences between pinball machines affect the outcome. That’s what sabermetrics is all about. Representing how severely I’ll build a little bad luck into a lot of bad luck? Oh, that’s hard.

Too hard to deal with, at least without much more sports psychology and modelling of pinball players than we have data for. The assumption that my chance of winning doesn’t change over the course of the match may be false. But we won’t be playing enough games to be able to tell the difference, so it’s near enough, and it gets us some useful information. We have to know not to demand too much precision from our model.

And seven games aren’t enough to be statistically significant. Not when players are as closely matched as we are. I could play worse than my average and still get a couple wins in when they count; I could play better than my average and still get creamed four games straight. I’ll be trying my best, of course. But I expect my best is one or two wins, then getting to the snack room and waiting for the side tournament to start. Shall let you know if something interesting happens.

## Reading the Comics, October 14, 2016: Classics Edition

The mathematically-themed comic strips of the past week tended to touch on some classic topics and classic motifs. That’s enough for me to declare a title for these comics. Enjoy, won’t you please?

John McPherson’s Close To Home for the 9th uses the classic board full of mathematics to express deep thinking. And it’s deep thinking about sports. Nerds like to dismiss sports as trivial and so we get the punch line out of this. But models of sports have been one of the biggest growth fields in mathematics the past two decades. And they’ve shattered many longstanding traditional understandings of strategy. It’s not proper mathematics on the board, but that’s all right. It’s not proper sabermetrics either.

Vic Lee’s Pardon My Planet for the 10th is your classic joke about putting mathematics in marketable terms. There is an idea that, to be really good, a mathematical idea must be beautiful. And it’s hard to say exactly what beauty is, but “short” and “simple” seem to be parts of it. That’s a fine idea, as long as you don’t forget how context-laden these are. Whether an idea is short depends on what ideas and what concepts you have as background. Whether it’s simple depends on how much you’ve seen similar ideas before. π looks simple. “The smallest positive root of the solution to the differential equation y''(x) = -y(x) where y(0) = 0 and y'(0) = 1” looks hard, but however much mathematics you know, rhetoric alone tells you those are the same thing.
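Here is a numerical sketch of that equivalence. It integrates y'' = -y from y(0) = 0, y'(0) = 1, whose solution is sin x, using the classical fourth-order Runge-Kutta method, and it finds where y first comes back to zero. The step size and the linear interpolation at the crossing are my own choices.

```python
import math


def first_positive_root(step: float = 1e-4) -> float:
    """Estimate the smallest positive root of the solution to y''(x) = -y(x),
    y(0) = 0, y'(0) = 1.

    Integrates the equivalent first-order system y' = v, v' = -y with classical
    fourth-order Runge-Kutta, stops at the first step where y goes from positive
    to non-positive, and interpolates linearly within that step.
    """
    x, y, v = 0.0, 0.0, 1.0
    while True:
        # RK4 stages for the system (y, v)' = (v, -y)
        k1y, k1v = v, -y
        k2y, k2v = v + 0.5 * step * k1v, -(y + 0.5 * step * k1y)
        k3y, k3v = v + 0.5 * step * k2v, -(y + 0.5 * step * k2y)
        k4y, k4v = v + step * k3v, -(y + step * k3y)
        y_next = y + step * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        v_next = v + step * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        if y > 0 and y_next <= 0:
            # Linear interpolation for where y crosses zero within this step.
            return x + step * y / (y - y_next)
        x, y, v = x + step, y_next, v_next


print(first_positive_root(), math.pi)
```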

Scott Hilburn’s The Argyle Sweater for the 10th is your classic anthropomorphic-numerals joke. Well, anthropomorphic-symbols in this case. But it’s the same genre of joke.

Randy Glasbergen’s Glasbergen Cartoons rerun for the 10th is your classic sudoku-and-arithmetic-as-hard-work joke. And it’s neat to see “programming a VCR” used as an example of the difficult-to-impossible task for a comic strip drawn late enough that it’s into the era of flat-screen, flat-bodied desktop computers.

Bill Holbrook’s On The Fastrack for the 11th is your classic grumbling-about-how-mathematics-is-understood joke. Well, statistics, but most people consider that part of mathematics. (One could mount a strong argument that statistics is as independent of mathematics as physics or chemistry are.) Statistics offers many chances for intellectual mischief, whether deliberately or just from not thinking matters through. That may be inevitable. Sampling, as in political surveys, must talk about distributions, about ranges of possible results. It’s hard to be flawless about that.

That said I’m not sure I can agree with Fi in her example here. Take her example, a political poll with a three-point margin of error. If the poll says one candidate’s ahead by three points, Fi asserts, they’ll say it’s tied when it’s as likely the lead is six. I don’t see that’s quite true, though. When we sample something we estimate the value of something in a population based on what it is in the sample. Obviously we’ll be very lucky if the population and the sample have exactly the same value. But the margin of error gives us a range of how far from the sample value it’s plausible the whole population’s value is, or would be if we could measure it. Usually “plausible” means 95 percent; that is, 95 percent of the time the actual value will be within the margin of error of the sample’s value.

So here’s where I disagree with Fi. Let’s suppose that the first candidate, Kirk, polls at 43 percent. The second candidate, Picard, polls at 40 percent. (Undecided or third-party candidates make up the rest.) I agree that Kirk’s true, whole-population, support is equally likely to be 40 percent or 46 percent. But Picard’s true, whole-population, support is equally likely to be 37 percent or 43 percent. Kirk’s lead is actually six points if his support was under-represented in the sample and Picard’s was over-represented, by the same measures. But suppose Kirk was just as over-represented and Picard just as under-represented as they were in the previous case. This puts Kirk at 40 percent and Picard at 43 percent, a Kirk-lead of minus three percentage points.

So what’s the actual chance these two candidates are tied? Well, you have to say what a tie is. It’s vanishingly unlikely they have precisely the same true support, and we can’t really calculate that. Don’t blame statisticians. You tell me an election in which one candidate gets three more votes than the other isn’t really tied, if there are more than seven votes cast. We can work on “what’s the chance their support differs by less than some margin?” And then you’d have all the possible chances where Kirk gets a lower-than-surveyed return while Picard gets a higher-than-surveyed return. I can’t say what that is offhand. We haven’t said what this margin-of-tying is, for one thing.

But it is certainly higher than the chance the lead is actually six; that only happens if the actual vote is different from the poll in one particular way. A tie can happen if the actual vote is different from the poll in many different ways.

Doing a quick and dirty little numerical simulation suggests to me that, if we assume the sampling errors follow a normal distribution, then in this situation Kirk probably is ahead of Picard. Given a three-point lead and a three-point margin of error, Kirk would be expected to beat Picard about 92 percent of the time, while Picard would win about 8 percent of the time.
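Here is one way such a quick-and-dirty check might look. It’s a sketch under assumptions I’m supplying: the three-point margin of error is a 95 percent interval for each candidate’s share, sampling errors are normal, and Kirk’s and Picard’s errors are independent, as the next paragraph describes. It computes the chance exactly with `statistics.NormalDist` and then again by simulation.

```python
import random
from statistics import NormalDist

# Assumptions (matching the essay's setup): a 3-point margin of error at 95 percent
# confidence for each candidate, normal sampling error, and the two candidates'
# errors independent of each other.
MARGIN = 3.0
Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96
SIGMA = MARGIN / Z_95               # standard error for each candidate's share

kirk_poll, picard_poll = 43.0, 40.0

# Exact: the true lead is normal, centred on the polled lead, with spread sqrt(2) * SIGMA.
lead = NormalDist(mu=kirk_poll - picard_poll, sigma=SIGMA * 2**0.5)
print(f"Kirk ahead: {1 - lead.cdf(0):.1%}")

# The same thing by quick-and-dirty simulation.
rng = random.Random(0)
trials = 200_000
kirk_wins = sum(
    rng.gauss(kirk_poll, SIGMA) > rng.gauss(picard_poll, SIGMA) for _ in range(trials)
)
print(f"Kirk ahead (simulated): {kirk_wins / trials:.1%}")
```

If the two candidates’ errors were tied together instead, as in a two-way race with no undecided voters, the spread of the lead would change and so would these figures.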

Here I have been making the assumption that Kirk’s and Picard’s support are to an extent independent. That is, a vote might be for Kirk or for Picard or for neither. There’s this bank of voting-for-neither-candidate that either could draw on. If there are no undecided voters, so that every voter is for either Kirk or Picard, then all of this breaks down: Kirk can be up by six only if Picard is down by six. But I don’t know of surveys that work like that.

Not to keep attacking this particular strip, which doesn’t deserve harsh treatment, but it gives me so much to think about. Assuming by “they” Fi means news anchors — and from what we get on panel, it’s not actually clear she does — I’m not sure they actually do “say the poll is tied”. What I more often remember hearing is that the difference is equal to, or less than, the survey’s margin of error. That might get abbreviated to “a statistical tie”, a usage that I think is fair. But Fi might mean the candidates or their representatives in saying “they”. I can’t fault the campaigns for interpreting data in ways useful for their purposes. The underdog needs to argue that the election can yet be won. The leading candidate needs to argue against complacency. In either case a tie is a viable selling point and a reasonable interpretation of the data.

Gene Weingarten, Dan Weingarten, and David Clark’s Barney and Clyde for the 12th is a classic use of Einstein and general relativity to explain human behavior. Everyone’s tempted by this. Usually it’s thermodynamics that inspires thoughts that society could be explained mathematically. There’s good reason for this. Thermodynamics builds great and powerful models of complicated systems by supposing that we never know, or need to know, what any specific particle of gas or fluid is doing. We care only about aggregate data. That statistics shows we can understand much about humanity without knowing fine details reinforces this idea. The Weingartens and Clark probably shifted from thermodynamics to general relativity because Einstein is recognizable to normal people. And we’ve all at least heard of mass warping space and can follow the metaphor to money warping law.

In vintage comics, Dan Barry’s Flash Gordon for the 14th (originally run the 28th of November, 1961) uses the classic idea that sufficient mathematics talent will outwit games of chance. Many believe it. I remember my grandmother’s disappointment that she couldn’t bring the underaged me into the casinos in Atlantic City. This did save her the disappointment of learning I haven’t got any gambling skill besides occasionally buying two lottery tickets if the jackpot is high enough. I admit that’s an irrational move on my part, but I can spare two dollars for foolishness once or twice a year. The idea of beating a roulette wheel, at least a fair wheel, isn’t absurd. In principle if you knew enough about how the wheel was set up and how the ball was weighted and how it was launched into the spin you could predict where it would land. In practice, good luck. I wouldn’t be surprised if a good roulette wheel were chaotic, or close to it. If it’s chaotic then while the outcome could be predicted if the wheel’s spin and the ball’s initial speed were known well enough, they can’t be measured well enough for a prediction to be meaningful. The comic also uses the classic word balloon full of mathematical symbols to suggest deep reasoning. I spotted Einstein’s famous quote there.

## Some End-Of-August Mathematics Reading

I’ve found a good way to procrastinate on the next essay in the Why Stuff Can Orbit series. (I’m considering explaining all of differential calculus, or as much as anyone really needs, to save myself a little work later on.) In the meanwhile, though, here’s some interesting reading that’s come to my attention the last few weeks and that you might procrastinate your own projects with. (Remember Benchley’s Principle!)

First is Jeremy Kun’s essay Habits of highly mathematical people. I think it’s right in describing some of the worldview mathematics training instills, or that encourages people to become mathematicians. It does seem to me, though, that most everything Kun describes is also true of philosophers. I’m less certain, but strongly suspect, that it’s also true of lawyers. These concentrations all tend to encourage thinking about what we mean by things, and to test those definitions by thought experiments. If we suppose this to be true, then what implications would it have? What would we have to conclude is also true? Does it include anything that would be absurd to say? And are the results useful enough that we can accept a bit of apparent absurdity?

New York magazine had an essay: Jesse Singal’s How Researchers Discovered the Basketball “Hot Hand”. The “Hot Hand” phenomenon is one every sports enthusiast, and most casual fans, know: sometimes someone is just playing really, really well. The problem has always been figuring out whether it exists. Do anything that isn’t a sure bet long enough and there will be streaks. There’ll be a stretch where it always happens; there’ll be a stretch where it never does. That’s how randomness works.
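A small simulation shows how easily streaks come out of pure chance. This sketch assumes a hypothetical shooter with a fixed 50 percent chance on every shot and no hot hand at all. It counts the longest run of consecutive makes or consecutive misses in 100 shots. The shooter, the seed, and the function name are all illustrative.

```python
import random


def longest_run(outcomes) -> int:
    """Length of the longest stretch of identical consecutive outcomes."""
    best = current = 0
    previous = None
    for outcome in outcomes:
        current = current + 1 if outcome == previous else 1
        previous = outcome
        best = max(best, current)
    return best


# Hypothetical shooter who makes each shot with a fixed 50 percent chance,
# with no "hot hand" at all: each shot is independent of the others.
rng = random.Random(42)
runs = [longest_run(rng.random() < 0.5 for _ in range(100)) for _ in range(10_000)]
print("average longest streak in 100 shots:", sum(runs) / len(runs))
print("share of 100-shot sequences with a streak of 6 or more:",
      sum(r >= 6 for r in runs) / len(runs))
```

Long streaks turn up routinely even though, by construction, nothing here is ever “hot”. That is why a hot-hand study has to compare against what chance alone would produce.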

But it’s hard to show that. The messiness of the real world interferes. A chance of making a basketball shot is not some fixed thing over the course of a career, or over a season, or even over a game. Sometimes players do seem to be hot. Certainly anyone who plays anything competitively experiences a feeling of being in the zone, during which stuff seems to just keep going right. It’s hard to disbelieve something that you witness, even experience.

So the essay describes some of the challenges of this: coming up with a definition of a “hot hand”, for one. Coming up with a way to test whether a player has a hot hand. Seeing whether they’re observed in the historical record. Singal’s essay writes about some of the history of studying hot hands. There is a lot of probability, and of psychology, and of experimental design in it.

And then there’s this intriguing question Analysis Fact Of The Day linked to: did Gaston Julia ever see a computer-generated image of a Julia Set? There are many Julia Sets; they and their relative, the Mandelbrot Set, became trendy in the fractals boom of the 1980s. If you knew a mathematics major back then, there was at least one on her wall. It typically looks like a craggly, lightning-rimmed cloud. Its shapes are not easy to imagine. It’s almost designed for the computer to render. Gaston Julia died in March of 1978. Could he have seen a depiction?

It’s not clear. The linked discussion digs up early computer renderings. It also brings up an example of a late-19th-century hand-drawn depiction of a Julia-like set, and compares it to a modern digital rendition of the thing. Numerical simulation saves a lot of tedious work; but it’s always breathtaking to see how much can be done by reason.

## Reading the Comics, August 5, 2016: Word Problems Edition

And now to close out the rest of last week’s comics, those from between the 1st and the 6th of the month. It’s a smaller set. Take it up with the traffic division of Comic Strip Master Command.

Mason Mastroianni, Mick Mastroianni, and Perri Hart’s B.C. for the 2nd is mostly a word problem joke. It’s boosted some by melting into it a teacher complaining about her pay. It does make me think some about what the point of a story problem is. That is, why is the story interesting? Often it isn’t. The story is just an attempt to make a computation problem look like the sort of thing someone might wonder in the real world. This is probably why so many word problems are awful as stories and as incentive to do a calculation. There’s a natural interest that one might have in, say, the total distance travelled by a rubber ball dropped and bouncing until it finally comes to a rest. But that’s only really good for testing how one understands a geometric series. It takes more storytelling to work out why you might want to find a cube root of x² minus eight.
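For the bouncing ball, assume (as such problems usually do) that each bounce rises to a fixed fraction r of the previous height h. The ball falls h, then goes up and down rh, then r²h, and so on, for a total of h + 2(rh + r²h + …) = h(1 + r)/(1 − r). This small sketch computes both the closed form and a running partial sum, so you can watch the series settle toward its limit.

```python
def total_bounce_distance(height, ratio, bounces=None):
    """Distance travelled by a ball dropped from `height` whose every bounce
    rises to `ratio` times the previous height (0 <= ratio < 1).
    With bounces=None, use the closed form of the infinite geometric series."""
    if bounces is None:
        return height * (1 + ratio) / (1 - ratio)
    distance = height          # the initial drop
    rebound = height
    for _ in range(bounces):
        rebound *= ratio
        distance += 2 * rebound  # up, then back down
    return distance

print(total_bounce_distance(2.0, 0.5))      # 6.0
print(total_bounce_distance(2.0, 0.5, 30))  # just shy of 6.0
```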

Dave Whamond’s Reality Check for the 3rd uses mathematics on the blackboard as symbolic for all the problems one might have. Also a solution, if you call it that. It wouldn’t read so clearly if Ms Haversham had an English problem on the board.

Mark Anderson’s Andertoons for the 5th keeps getting funnier to me. At first reading I didn’t connect the failed mathematics problem of 2 × 0 with the caption. Once I did, I realized how snugly the comic fits together.

Greg Curfman’s Meg Classics for the 5th ran originally the 23rd of May, 1998. The application of mathematics to everyday sports was a much less developed thing back then. It’s often worthwhile to methodically study what you do, though, to see what affects the results. Here Mike has found the team apparently makes twelve missed shots for each goal. This might not seem like much of a formula, but these are kids. We shouldn’t expect formulas with a lot of variables under consideration. Since Meg suggests Mike needed to account for “the whiff factor” I have to suppose she doesn’t understand the meaning of the formula. Or perhaps she wonders why missed kicks before getting to the goal don’t matter. Well, every successful model starts out as a very simple thing to which we add complexity, and realism, as we’re able to handle them. If lucky we end up with a good balance between a model that describes what we want to know and yet is simple enough to understand.

## How Interesting Is A Baseball Score? Some Further Results

While researching for my post about the information content of baseball scores I found some tantalizing links. I had wanted to know how often each score came up. From this I could calculate the entropy, the amount of information in the score. That’s the sum, taken over every outcome, of minus one times the frequency of that score times the base-two logarithm of the frequency of the outcome. And I couldn’t find that.
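That sum is short to write as code. This minimal sketch turns raw tallies into frequencies, treats each frequency as a probability, and adds up −p·log₂ p. The tallies in the usage lines are toy numbers, not anything read off a real chart; they show that nudging one count a little, as when reading a bar chart twice, moves the entropy only a little.

```python
import math

def entropy_from_counts(counts):
    """Shannon entropy, in bits, of the distribution implied by raw tallies.
    Each tally becomes a frequency, used as a stand-in for a probability."""
    total = sum(counts)
    return sum(-(n / total) * math.log2(n / total) for n in counts if n > 0)

# Toy tallies, with the second list differing by one count in one bin.
first_reading = [2, 3, 5, 5, 4, 3, 2, 1]
second_reading = [2, 3, 5, 6, 4, 3, 2, 1]
print(entropy_from_counts(first_reading), entropy_from_counts(second_reading))
```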

An article in The Washington Post had a fine lead, though. It offers, per the title, “the score of every basketball, football, and baseball game in league history visualized”. And as promised it gives charts of how often each number of runs has turned up in a game. The most common single-team score in a game is 3, with 4 and 2 almost as common. I’m not sure of the date range for these scores. The chart says it includes (and highlights) data from “a century ago”. And as the article was posted in December 2014 it can hardly use data from after that. I can’t imagine that the 2015 season has changed much, though. And whether they start their baseball statistics at either 1871, 1876, 1883, 1891, or 1901 (each a defensible choice) should only change details.

Which is fine. I can’t get precise frequency data from the chart. The chart offers how many thousands of times a particular score has come up. But there aren’t reference lines to say definitely whether a zero was scored closer to 21,000 or 22,000 times. I will accept a rough estimate, since I can’t do any better.

I made my best guess at the frequency, from the chart. And then made a second-best guess. My best guess gave the information content of a single team’s score as a touch more than 3.5 bits. My second-best guess gave the information content as a touch less than 3.5 bits. So I feel safe in saying a single team’s score is about three and a half bits of information.

So the score of a baseball game, with two teams scoring, is probably somewhere around twice that, or about seven bits of information.

I have to say “around”. This is because the two teams aren’t scoring runs independently of one another. Baseball doesn’t allow for tie games except in rare circumstances. (It would usually be a game interrupted for some reason, and then never finished because the season ended with neither team in a position where winning or losing could affect their standing. I’m not sure that would technically count as a “game” for Major League Baseball statistical purposes. But I could easily see a roster of game scores counting that.) So if one team’s scored three runs in a game, we have the information that the other team almost certainly didn’t score three runs.
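Information theory makes the “around” precise. For any two quantities the joint entropy is at most the sum of their separate entropies, H(X, Y) ≤ H(X) + H(Y), with equality only when they are independent. This sketch builds a toy model: each team scores according to a made-up single-team distribution, then every tied outcome is thrown out and the rest renormalized. The distribution and the tie-discarding model are assumptions for illustration only; the inequality is what they demonstrate.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical single-team run distribution (not real baseball data).
runs = {0: 0.08, 1: 0.12, 2: 0.15, 3: 0.16, 4: 0.15, 5: 0.12, 6: 0.09, 7: 0.07, 8: 0.06}

def no_tie_joint(single):
    """Joint (visitor, home) distribution: independent scoring, ties discarded, renormalized."""
    weights = {(a, b): single[a] * single[b]
               for a, b in product(single, repeat=2) if a != b}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

def marginal(joint, index):
    """One team's run distribution, read back out of the joint distribution."""
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], 0.0) + p
    return out

joint = no_tie_joint(runs)
h_joint = entropy(joint.values())
h_sum = entropy(marginal(joint, 0).values()) + entropy(marginal(joint, 1).values())
print(f"joint: {h_joint:.3f} bits; sum of the two teams' entropies: {h_sum:.3f} bits")
```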

This estimate, though, does fit within my range estimate from 3.76 to 9.25 bits. And as I expected, it’s closer to nine bits than to four bits. The entropy seems to be a bit less than that of (American) football scores (somewhere around 8.7 bits) and college basketball (probably somewhere around 10.8 bits), which is probably fair. There are a lot of numbers that make for plausible college basketball scores. There are slightly fewer pairs of numbers that make for plausible football scores. There are fewer still pairs of scores that make for plausible baseball scores. So there’s less information conveyed in knowing what the game’s score is.

## Reading the Comics, May 12, 2016: No Pictures Again Edition

I’ve hardly stopped reading the comics. I doubt I could even if I wanted at this point. But all the comics this bunch are from GoComics, which as far as I’m aware doesn’t turn off access to comic strips after a couple of weeks. So I don’t quite feel justified including the images of the comics when you can just click links to them instead.

It feels a bit barren, I admit. I wonder if I shouldn’t commission some pictures so I have something for visual appeal. There’s people I know who do comics online. They might be able to think of something to go alongside every “Student has snarky answer for a word problem” strip.

Brian and Ron Boychuk’s The Chuckle Brothers for the 8th of May drops in an absolute zero joke. Absolute zero’s a neat concept. People became aware of it partly by simple extrapolation. Given that the volume of a gas drops as the temperature drops, is there a temperature at which the volume drops to zero? (It’s complicated. But that’s the thread I use to justify pointing out this strip here.) And people also expected there should be an absolute temperature scale because it seemed like we should be able to describe temperature without tying it to a particular method of measuring it. That is, it would be a temperature “absolute” in that it’s not explicitly tied to what’s convenient for Western Europeans in the 19th century to measure. That zero and that instrument-independent temperature idea get conflated, and reasonably so. Hasok Chang’s Inventing Temperature: Measurement and Scientific Progress is well-worth the read for people who want to understand absolute temperature better.
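The extrapolation step can be shown in a few lines: draw a straight line through two (temperature, volume) readings for a gas at constant pressure and ask where that line reaches zero volume. The readings below are idealized numbers generated from the ideal-gas relationship (volume proportional to Celsius temperature plus 273.15), not historical measurements, so this sketch shows only the arithmetic of extrapolation and not how the value was actually found.

```python
def extrapolate_to_zero_volume(t1, v1, t2, v2):
    """Temperature at which the straight line through (t1, v1) and (t2, v2)
    reaches zero volume, in the same units as t1 and t2."""
    slope = (v2 - v1) / (t2 - t1)
    return t1 - v1 / slope

def ideal_volume(t_celsius, v_at_zero=22.4):
    """Idealized, made-up reading: volume proportional to (Celsius temperature + 273.15)."""
    return v_at_zero * (t_celsius + 273.15) / 273.15

# Two readings, at the freezing and boiling points of water.
print(extrapolate_to_zero_volume(0.0, ideal_volume(0.0), 100.0, ideal_volume(100.0)))
```

Real gases liquefy long before reaching that point, which is part of why it’s complicated.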

Gene Weingarten, Dan Weingarten & David Clark’s Barney and Clyde for the 9th is another strip that seems like it might not belong here. While it’s true that accidents sometimes lead to great scientific discoveries, what has that to do with mathematics? And the first thread is that there are mathematical accidents and empirical discoveries. Many of them are computer-assisted. There is something that feels experimental about doing a simulation. Modern chaos theory, the study of deterministic yet unpredictable systems, has as its founding myth Edward Lorenz discovering that tiny changes in a crude weather simulation program mattered almost right away. (By founding myth I don’t mean that it didn’t happen. I just mean it’s become the stuff of mathematics legend.)
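Lorenz’s program was a weather model, but the same effect shows up in much simpler systems. This sketch swaps in the logistic map x → 4x(1 − x), a standard toy example of chaos, and starts two runs that differ by one part in ten billion. Nothing about the rule is random, yet after a few dozen steps the two runs have nothing to do with each other.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), returning every value visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)  # a change far smaller than any measurement error
for step in (0, 10, 20, 30, 40, 50, 60):
    print(step, abs(a[step] - b[step]))
```

With r = 4 the gap roughly doubles each step, so a starting gap of 10⁻¹⁰ grows to order one in something like thirty-odd steps.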

But there are other ways that “accidents” can be useful. Monte Carlo methods are often used to find extreme (maximum or minimum) solutions to complicated systems. These are good if it’s hard to find a best possible answer, but it’s easy to compare whether one solution is better or worse than another. We can get close to the best possible answer by picking an answer at random, and fiddling with it at random. If we improve things, good: keep the change. You can see why this should get us pretty close to a best-possible-answer soon enough. And if we make things worse then … we usually, but not always, reject the change. Sometimes we take this “accident”. And that’s because if we only take improvements we might get caught at a local extreme. An even better extreme might be available but only by going down an initially unpromising direction. So it’s worth allowing for some “mistakes”.
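One well-known method of this kind is simulated annealing, sketched here for a single variable. The objective function, step size, and cooling schedule are made-up choices for illustration. Improvements are always kept; a worse answer is kept with probability exp(−(worsening)/t), where the “temperature” t shrinks over the run, so early on the search takes plenty of “mistakes” and later it settles down.

```python
import math
import random

def bumpy(x):
    """A made-up objective with many local minima, chosen only for illustration."""
    return x * x / 10 + 3 * math.sin(3 * x)

def anneal(f, x0, steps=5000, step_size=0.5, t_start=5.0, t_end=0.01, seed=0):
    """Simulated annealing in one dimension; returns the best (x, f(x)) seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for k in range(steps):
        # The "temperature" falls geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        candidate = x + rng.uniform(-step_size, step_size)
        fc = f(candidate)
        # Always keep an improvement; keep a worse answer with probability
        # exp(-(fc - fx) / t), which is likelier early on while t is high.
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

best_x, best_f = anneal(bumpy, x0=8.0)
print(best_x, best_f)
```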

Mark Anderson’s Andertoons for the 10th of May is some wordplay on volume. The volume of boxes is an easy formula to remember and maybe it’s a boring one. It’s enough, though. You can work out the volume of any shape using just the volume of boxes. But you do need integral calculus to tell how to do it. So maybe it’s easier to memorize the formula for volumes of a pyramid and a sphere.
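Here is the idea of building a volume out of boxes, applied to a square pyramid. Slice the pyramid into thin horizontal slabs, replace each slab with a square box as wide as the pyramid at the slab’s middle, and add up the boxes. As the slabs get thinner the total closes in on the familiar base² × height / 3; integral calculus is what makes “closes in” precise. (With this midpoint choice the error for a unit pyramid works out to exactly 1/(12n²) for n slabs.)

```python
def pyramid_volume_from_boxes(base, height, slabs):
    """Approximate a square pyramid's volume by stacking `slabs` thin square boxes,
    each as wide as the pyramid at the middle of its slab."""
    thickness = height / slabs
    total = 0.0
    for k in range(slabs):
        mid = (k + 0.5) * thickness        # height of this slab's midline
        side = base * (1 - mid / height)   # the pyramid narrows linearly toward the tip
        total += side * side * thickness   # volume of one box
    return total

for n in (10, 100, 1000):
    print(n, pyramid_volume_from_boxes(1.0, 1.0, n))
print("formula:", 1.0 * 1.0 * 1.0 / 3)
```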

Berkeley Breathed’s Bloom County for the 10th of May is a rerun from 1981. And it uses a legitimate bit of mathematics for Milo to insult Freida. He calls her a “log 10 times 10 to the derivative of 10,000”. The “log 10” is going to be 1. A reference to logarithm, without a base attached, means either base ten or base e. “log” by itself used to invariably mean base ten, back when logarithms were needed to do ordinary multiplication and division and exponentiation. Now that we have calculators for this, mathematicians have started reclaiming “log” to mean the natural logarithm, base e, which is normally written “ln”, but that’s still an eccentric use. Anyway, the logarithm base ten of ten is 1: 10 is equal to 10 to the first power.

10 to the derivative of 10,000 … well, that’s 10 raised to whatever number “the derivative of 10,000” is. Derivatives take us into calculus. They describe how much a quantity changes as one or more variables change. 10,000 is just a number; it doesn’t change. It’s called a “constant”, in another bit of mathematics lingo that reminds us not all mathematics lingo is hard to understand. However anything else changes, the constant 10,000 does not, so its derivative is zero. And 10 to the zeroth power is 1.

So, one times one is … one. And it’s rather neat that kids Milo’s age understand derivatives well enough to calculate that.
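Milo’s arithmetic can be checked numerically. This sketch estimates the derivative with a central difference, (f(x + h) − f(x − h)) / 2h, which for a constant function is exactly zero at every x because the numerator is 10,000 − 10,000. The step size h and the point x = 3 are arbitrary choices.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

constant = lambda x: 10_000                   # 10,000 doesn't depend on x at all
d = numerical_derivative(constant, x=3.0)     # any x gives the same answer
insult = math.log10(10) * 10 ** d
print(d, insult)                              # 0.0 1.0
```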

Ruben Bolling’s Super-Fun-Pak Comix rerun for the 10th happens to have a bit of graph theory in it. One of Uncle Cap’n’s Puzzle Pontoons is a challenge to trace out a figure without retracing a line or lifting your pencil. You can’t, not this figure. One of the first things you learn in graph theory teaches how to tell, and why. And thanks to a Twitter request I’m figuring to describe some of that for the upcoming Theorem Thursdays project. Watch this space!
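The test in question goes back to Euler: a connected figure can be traced in one stroke, without lifting the pencil or retracing a line, exactly when zero or two of its corners have an odd number of lines meeting there. This sketch applies that test to a figure given as a list of line segments. The comic’s own figure isn’t reproduced here; the test figures below are generic examples.

```python
from collections import defaultdict

def can_draw_without_lifting(edges):
    """True if the figure, given as (a, b) line segments, can be traced in one
    stroke without lifting the pencil or going over any segment twice."""
    if not edges:
        return True
    degree = defaultdict(int)
    neighbors = defaultdict(set)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Every corner must be reachable from every other: a depth-first connectivity check.
    start = edges[0][0]
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    if len(seen) != len(degree):
        return False
    odd_corners = sum(1 for d in degree.values() if d % 2 == 1)
    return odd_corners in (0, 2)

# A square with both diagonals: all four corners have three lines, so it can't be done.
print(can_draw_without_lifting([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C"), ("B", "D")]))
```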

Charles Schulz’s Peanuts Begins for the 11th, a rerun from the 6th of February, 1952, is cute enough. It’s one of those jokes about how a problem seems intractable until you’ve found the right way to describe it. I can’t fault Charlie Brown’s thinking here. Finding a way to make the problems familiar and easy is great.

Shaenon K Garrity and Jeffrey C Wells’s Skin Horse for the 12th is a “see, we use mathematics in the real world” joke. In this case it’s triangles and triangulation. That’s probably the part of geometry it’s easiest to demonstrate a real-world use for, and that makes me realize I don’t remember mathematics class making use of that. I remember it coming up some, particularly in what must have been science class when we built and launched model rockets. We used a measure of how high an angle the rocket reached, and knowledge of how far the observing station was from the launchpad. But that wasn’t mathematics class for some reason, which is peculiar.
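The model-rocket calculation is a single line of trigonometry. Assuming the rocket rises straight up from the pad, the ground is level, and the angle is sighted from ground level a known distance away, the height is that distance times the tangent of the angle. Real observing stations also correct for the observer’s eye height and for rockets that drift; this sketch ignores both.

```python
import math

def altitude_from_angle(baseline, elevation_degrees):
    """Height of a rocket directly above the pad, sighted from `baseline` away on
    level ground at the given elevation angle: h = baseline * tan(angle)."""
    return baseline * math.tan(math.radians(elevation_degrees))

print(altitude_from_angle(100.0, 45.0))  # about 100: a 45-degree sight line
print(altitude_from_angle(100.0, 60.0))  # about 173.2
```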

## How Interesting Is A Baseball Score? Some Partial Results

Meanwhile I have the slight ongoing quest to work out the information-theory content of sports scores. For college basketball scores I made up some plausible-looking score distributions and used that. For professional (American) football I found a record of all the score outcomes that’ve happened, and how often. I could use experimental results. And I’ve wanted to do other sports. Soccer was asked for. I haven’t been able to find the scoring data I need for that. Baseball, maybe the supreme example of sports as a way to generate statistics … has been frustrating.

The raw data is available. Retrosheet.org has logs of pretty much every baseball game, going back to the forming of major leagues in the 1870s. What they don’t have, as best I can figure, is a list of all the times each possible baseball score has turned up. That I could probably work out, when I feel up to writing the scripts necessary, but “work”? Ugh.
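The scripts wouldn’t need to be long. This sketch reads visitor and home run totals out of a comma-separated game log and counts each final score as (winner’s runs, loser’s runs). The column positions are an assumption about the log’s layout, recalled rather than checked, so verify them against Retrosheet’s own description of its game logs; the file name in the usage comment is a placeholder.

```python
import csv
from collections import Counter

def read_game_scores(path, visitor_col=9, home_col=10):
    """Yield (visitor_runs, home_runs) from a comma-separated game log.
    ASSUMPTION: zero-based columns 9 and 10 hold the two teams' runs; check
    this against the log's documentation before trusting the output."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield int(row[visitor_col]), int(row[home_col])

def tally_final_scores(games):
    """Count each final score as (winner's runs, loser's runs), ignoring which team was home."""
    return Counter(tuple(sorted(pair, reverse=True)) for pair in games)

# Hypothetical usage; the file name is a placeholder:
# counts = tally_final_scores(read_game_scores("gamelog_2015.txt"))
# print(counts.most_common(10))
```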

Some people have done the work, although they haven’t shared all the results. I don’t blame them; the full results make for a boring sort of page. “The Most Popular Scores In Baseball History”, at ValueOverReplacementGrit.com, reports the top ten most common scores from 1871 through 2010. The essay also mentions that as of then there were 611 unique final scores. And that lets me give some partial results, if we trust that a blog post from people I’d never heard of before is accurate and true. I will make that assumption over and over here.

There’s, in principle, no limit to how many scores are possible. Baseball contains many implied infinities, and it’s not impossible that a game could end, say, 580 to 578. But it seems likely that after 139 seasons of play there can’t be all that many more scores practically achievable.

Suppose then there are 611 possible baseball score outcomes, and that each of them is equally likely. Then the information-theory content of a score’s outcome is negative one times the logarithm, base two, of 1/611. That’s a number a little bit over nine and a quarter. You could deduce the score for a given game by asking usually nine, sometimes ten, yes-or-no questions from a source that knew the outcome. That’s a little higher than the 8.7 I worked out for football. And it’s a bit less than the 10.8 I estimate for college basketball.
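The “usually nine, sometimes ten” can be counted exactly. If every question splits the remaining possibilities as evenly as possible, then with 2⁹ < 611 ≤ 2¹⁰ some outcomes are pinned down after nine questions and the rest need a tenth. Solving a + b = 611 and 2a + b = 1024 for the a outcomes settled in nine questions and the b settled in ten gives the split this sketch prints.

```python
import math

outcomes = 611
bits = math.log2(outcomes)            # a little over 9.25
nine = 2 ** 10 - outcomes             # outcomes identified after nine questions
ten = outcomes - nine                 # outcomes that need a tenth question
average = (9 * nine + 10 * ten) / outcomes
print(f"{bits:.3f} bits; {nine} outcomes need 9 questions, {ten} need 10; average {average:.3f}")
```

So 413 of the 611 scores take nine questions and 198 take ten, for an average a bit over 9.3 questions, just above the 9.25 bits.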

And there’s obvious rubbish there. In no way are all 611 possible outcomes equally likely. “The Most Popular Scores In Baseball History” says that right there in the essay title. The most common outcome was a score of 3-2, with 4-3 barely less popular. Meanwhile it seems only once, on the 28th of June, 1871, has a baseball game ended with a score of 49-33. Some scores are so rare we can ignore them as possibilities.

(You may wonder how incompetent baseball players of the 1870s were that a game could get to 49-33. Not so bad as you imagine. But the equipment and conditions they were playing with were unspeakably bad by modern standards. Notably, the playing field couldn’t be counted on to be flat and level and well-mowed. There would be unexpected divots or irregularities. This makes even simple ground balls hard to field. The baseball, instead of being replaced with every batter, would stay in the game. It would get beaten until it was a little smashed shell of unpredictable dynamics and barely any structural integrity. People were playing without gloves. If a game ran long enough, they would play at dusk, without lights, with a muddy ball on a dusty field. And sometimes you just have four innings that get out of control.)

What’s needed is a guide to what are the common scores and what are the rare scores. And I haven’t found that, nor worked up the energy to make the list myself. But I found some promising partial results. In a September 2008 post on Baseball-Fever.com, user weskelton listed the 24 most common scores and their frequency. This was for games from 1993 to 2008. One might gripe that the list only covers fifteen years. True enough, but if the years are representative that’s fine. And the top scores for the fifteen-year survey look to be pretty much the same as the 139-year tally. The 24 most common scores add up to just over sixty percent of all baseball games, which leaves a lot of scores unaccounted for. I am amazed that about three in five games will have a score that’s one of these 24 choices though.

But that’s something. We can calculate the information content for the 25 outcomes, one each of the 24 particular scores and one for “other”. This will under-estimate the information content. That’s because “other” is any of 587 possible outcomes that we’re not distinguishing. But if we have a lower bound and an upper bound, then we’ve learned something about what the number we want can actually be. The upper bound is that 9.25, above.
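Why lumping gives a lower bound: merging outcomes into a single category never increases entropy, and no distribution over N outcomes can exceed log₂ N bits. So the 25-category figure sits below the true value, which sits below the 611-outcome ceiling. This sketch checks that ordering on a made-up distribution over 611 outcomes whose probabilities fall off geometrically; the 0.97 ratio is arbitrary, not fit to baseball.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def lump_tail(probs, keep):
    """Keep the `keep` most likely outcomes and merge all the rest into one 'other'."""
    ordered = sorted(probs, reverse=True)
    return ordered[:keep] + [sum(ordered[keep:])]

# Hypothetical distribution over 611 outcomes, each a bit less likely than the last.
weights = [0.97 ** k for k in range(611)]
total = sum(weights)
probs = [w / total for w in weights]
lumped = lump_tail(probs, 24)

print(entropy(lumped), "<=", entropy(probs), "<=", math.log2(611))
```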

The information content, the entropy, we calculate from the probability of each outcome. We don’t know what that is. Not really. But we can suppose that the frequency of each outcome is close to its probability. If there’ve been a lot of games played, then the frequency of a score and the probability of a score should be close. At least they’ll be close if games are independent, if the score of one game doesn’t affect another’s. I think that’s close to true. (Some games at the end of pennant races might affect each other: why try so hard to score if you’re already out for the year? But there are few of them.)

The entropy then we find by calculating, for each outcome, a product. It’s minus one times the probability of that outcome times the base-two logarithm of the probability of that outcome. Then add up all those products. There are good reasons for doing it this way and in the college-basketball link above I give some rough explanations of what the reasons are. Or you can just trust that I’m not lying or getting things wrong on purpose.

So let’s suppose I have calculated this right, using the 24 distinct outcomes and the one “other” outcome. That makes out the information content of a baseball score’s outcome to be a little over 3.76 bits.

As said, that’s a low estimate. Lumping about two-fifths of all games into the single category “other” drags the entropy down.

But that gives me a range, at least. A baseball game’s score seems to be somewhere between about 3.76 and 9.25 bits of information. I expect that it’s closer to nine bits than it is to four bits, but will have to do a little more work to make the case for it.

## How Interesting Is A Football Score?

Last month, Sarcastic Goat asked me how interesting a soccer game was. This is “interesting” in the information theory sense. I describe what that is in a series of posts, linked to from above. That had been inspired by the NCAA “March Madness” basketball tournament. I’d been wondering about the information-theory content of knowing the outcome of the tournament, and of each game.

This measure, called the entropy, we can work out from knowing how likely all the possible outcomes of something — anything — are. If there’s two possible outcomes and they’re equally likely, the entropy is 1. If there’s two possible outcomes and one is a sure thing while the other can’t happen, the entropy is 0. If there’s four possible outcomes and they’re all equally likely, the entropy is 2. If there’s eight possible outcomes, all equally likely, the entropy is 3. If there’s eight possible outcomes and some are likely while some are long shots, the entropy is … smaller than 3, but bigger than 0. The entropy grows with the number of possible outcomes and shrinks with the number of unlikely outcomes.

But it’s easy to calculate. List all the possible outcomes. Find the probability of each of those possible outcomes happening. Then, for each outcome, calculate minus one times its probability times the logarithm, base two, of that probability. That has to be done for every outcome, so yes, this might take a while. Then add up all those products.
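Here is that recipe as a few lines of code, checked against the examples above: two equally likely outcomes give 1 bit, a sure thing gives 0, four equally likely outcomes give 2, eight give 3, and a lopsided eight-way split lands between 0 and 3. The lopsided probabilities are made up for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the sum of -p * log2(p) over every outcome."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))     # two equally likely outcomes: 1.0
print(entropy([1.0, 0.0]))     # a sure thing and an impossibility: 0.0
print(entropy([0.25] * 4))     # four equally likely outcomes: 2.0
print(entropy([0.125] * 8))    # eight equally likely outcomes: 3.0

# Eight outcomes, some likely and some long shots (hypothetical probabilities).
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.01, 0.01]
print(entropy(skewed))         # somewhere between 0 and 3
```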

I’d estimated the outcome of the 63-game basketball tournament was somewhere around 48 bits of information. There’s a fair number of foregone, or almost foregone, conclusions in the tournament, after all. And I guessed, based on a toy model of what kinds of scores often turn up in college basketball games, that the game’s score had an information content of a little under 11 bits of information.
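Those foregone conclusions are why the tournament comes in under 63 bits. A game with two outcomes carries at most one bit, reached only when it’s a coin flip; a lopsided matchup carries much less. This sketch computes the information in one game’s winner for a given chance that the favorite wins. The 0.99 figure is a hypothetical stand-in for a heavy mismatch, not a measured probability.

```python
import math

def game_entropy(p_favorite):
    """Bits of information in one game's winner, if the favorite wins with probability p."""
    return sum(-p * math.log2(p) for p in (p_favorite, 1 - p_favorite) if p > 0)

print(game_entropy(0.5))        # a coin-flip game: 1.0 bit
print(game_entropy(0.99))       # a near-foregone game: about 0.08 bits
print(63 * game_entropy(0.5))   # the ceiling if all 63 games were coin flips: 63.0
```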

Sarcastic Goat, as I say, asked about soccer scores. I don’t feel confident that I could make up a plausible model of soccer score distributions. So I went looking for historical data. Surely, a history of actual professional soccer scores over a couple decades would show all the possible, plausible, outcomes and how likely each was to turn out.

I didn’t find one. My search for soccer scores kept getting contaminated with (American) football scores. But that turned up something interesting anyway. Sports Reference LLC has a table which purports to list the results of all professional football games played from 1920 through the start of 2016. There’ve been, apparently, some 1,026 different score outcomes, from 0-0 through to 73-0.

As you’d figure, there are a lot of freakish scores; only once in professional football history has the game ended 62-28. (Although it’s ended 62-14 twice, somehow.) There hasn’t been a 2-0 game since the second week of the 1938 season. Some scores turn up a lot; 248 games (as of this writing) have ended 20-17. That’s the most common score, in its records. 27-24 and 17-14 are the next most common scores. If I’m not making a dumb mistake, 7-0 is the 21st most common score. 93 games have ended with that tally. But it hasn’t actually been a game’s final score since the 14th week of the 1983 season, somehow. 98 games have ended 21-17; only ten have ended 21-18. Weird.

Anyway, there’s 1,026 recorded outcomes. That’s surely as close to “all the possible outcomes” as we can expect to get, at least until the Jets manage to lose 74-0 in their home opener. But if all 1,026 outcomes were equally likely then the information content of the game’s score would be a touch over 10 bits. But these outcomes aren’t all equally likely. It’s vastly more likely that a game ended 16-13 than that it ended 16-8.

Let’s suppose I didn’t make any stupid mistakes in working out the frequency of all the possible outcomes. Then the information content of a football game’s outcome is a little over 8.72 bits.

Don’t be too hypnotized by the digits past the decimal. It’s approximate. But it suggests that if you were asking a source that would only answer ‘yes’ or ‘no’, then you could expect to get the score for any particular football game with about nine well-chosen questions.
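The link between entropy and counting questions is Huffman coding. If you know every outcome’s probability, Huffman’s algorithm builds the best possible question strategy, and its average number of yes-or-no questions always lands between the entropy H and H + 1. This sketch computes how many questions each outcome needs under that strategy, using a small made-up four-outcome distribution rather than the football data.

```python
import heapq
import math

def huffman_question_counts(probs):
    """Yes-or-no questions each outcome needs under a Huffman code,
    the optimal strategy when every outcome's probability is known."""
    # Heap entries: (probability, unique tie-breaker, outcome indices in this group).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    depth = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:
            depth[i] += 1      # one more question is needed to tell these apart
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return depth

probs = [0.4, 0.3, 0.2, 0.1]   # hypothetical outcome probabilities
depths = huffman_question_counts(probs)
average = sum(p * d for p, d in zip(probs, depths))
h = sum(-p * math.log2(p) for p in probs if p > 0)
print(depths, average, h)
```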

I’m not surprised this is less than my estimated information content of a basketball game’s score. I think basketball games see a wider range of likely scores than football games do.

If someone has a reference for the outcomes of soccer games — or other sports — over a reasonably long time please let me know. I can run the same sort of calculation. We might even manage the completely pointless business of ranking all major sports by the information content of their scores.

## Reading the Comics, October 29, 2015: Spherical Squirrel Edition

John Zakour and Scott Roberts’s Maria’s Day is going to Sunday-only publication. A shame, but I understand Zakour and Roberts choosing to focus their energies on better-paying venues. That those venues are “writing science fiction novels” says terrifying things about the economic logic of web comics.

This installment, from the 23rd, is a variation on the joke about the lawyer, or accountant, or consultant, or economist, who carefully asks “what do you want the answer to be?” before giving it. Sports are a rich mine of numbers, though. Mostly they’re statistics, and we might wonder: why does anyone care about sports statistics? Once the score of a game is done counted, what else matters? A sociologist and a sports historian are probably needed to give true, credible answers. My suspicion is that it amounts to money, as it ever does. If one wants to gamble on the outcomes of sporting events, one has to have a good understanding of what is likely to happen, and how likely it is to happen. And I suppose if one wants to manage a sporting event, one wants to spend money and time and other resources to best effect. That requires data, and that we see in numbers. And there are so many things that can be counted in any athletic event, aren’t there? All those numbers carry with them a hypnotic pull.

In Darrin Bell’s Candorville for the 24th of October, Lemont mourns how he’s forgotten how to do long division. It’s an easy thing to forget. For one, we have calculators, as Clyde points out. For another, long division ultimately requires we guess at and then try to improve an answer. It can’t be reduced to an operation that will never require back-tracking and trying some part of it again. That back-tracking — say, trying to put 28 into the number seven times, and finding it actually goes at least eight times — feels like a mistake. It feels like the sort of thing a real mathematician would never do.

And that’s completely wrong. Trying an answer, and finding it’s not quite right, and improving on it is perfectly sound mathematics. Arguably it’s the whole field of numerical mathematics. Perhaps students would find long division less haunting if they were assured that it is fine to get a wrong-but-close answer as long as you make it better.
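The guess-and-improve step can be made explicit. In this sketch of schoolbook long division each quotient digit is first guessed using the divisor rounded to one significant figure (so 28 is treated as about 30), then nudged down if it was too high or up if it was too low. The rounding rule is one convenient way to guess, chosen here for illustration; the function also counts how many nudges were needed.

```python
def long_division(dividend, divisor):
    """Schoolbook long division of a non-negative integer by a positive one.
    Returns (quotient, remainder, corrections), where `corrections` counts how
    often a guessed quotient digit had to be nudged up or down."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects a non-negative dividend and a positive divisor")
    scale = 10 ** (len(str(divisor)) - 1)
    rough = round(divisor / scale) * scale     # e.g. 28 is treated as about 30
    quotient, remainder, corrections = 0, 0, 0
    for digit in str(dividend):
        remainder = remainder * 10 + int(digit)   # bring down the next digit
        guess = min(9, remainder // rough)        # first guess, from the rough divisor
        while guess * divisor > remainder:        # too high: back off
            guess -= 1
            corrections += 1
        while (guess + 1) * divisor <= remainder: # too low: it goes at least once more
            guess += 1
            corrections += 1
        remainder -= guess * divisor
        quotient = quotient * 10 + guess
    return quotient, remainder, corrections

print(long_division(224, 28))   # (8, 0, 1): guessed 7 from "about 30", corrected to 8
```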

John Graziano’s Ripley’s Believe It or Not for the 25th of October talks about the Rubik’s Cube, and all the ways it can be configured. I grant it sounds like 43,252,003,274,489,856,000 is a bit high a count of possible combinations. But it is about what I hear from proper mathematics texts, the ones that talk about group theory, so let’s let it pass.
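The count is easy to check. Using the standard counting argument: the 8 corners can be arranged 8! ways and twisted 3⁷ ways (the last corner’s twist is forced), the 12 edges arranged 12! ways and flipped 2¹¹ ways (the last edge’s flip is forced), and only half of what remains is reachable, because the corner and edge arrangements must be both even or both odd.

```python
from math import factorial

corner_arrangements = factorial(8) * 3 ** 7   # last corner's twist is forced by the others
edge_arrangements = factorial(12) * 2 ** 11   # last edge's flip is forced by the others
# Only half of those combinations can be reached by turning the faces.
positions = corner_arrangements * edge_arrangements // 2
print(f"{positions:,}")   # 43,252,003,274,489,856,000
```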

The Rubik’s Cube gets talked about in group theory, the study of things that work kind of like arithmetic. In this case, turning one of the faces — well, one of the thirds of a face — clockwise or counterclockwise by 90 degrees, so the whole thing stays a cube, works like adding or subtracting one, modulo 4. That is, we pretend the only numbers are 0, 1, 2, and 3, and the numbers wrap around. 3 plus 1 is 0; 3 plus 2 is 1. 1 minus 2 is 3; 1 minus 3 is 2. There are several separate rotations that can be done, each turning a third of each face of the cube. That each face of the cube starts a different color means it’s easy to see how these different rotations interact and create different color patterns. And rotations look easy to understand. We can at least imagine rotating most anything. In the Rubik’s Cube we can look at a lot of abstract mathematics in a handheld and friendly-looking package. It’s a neat thing.
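The arithmetic for a single slice is just addition modulo 4, which this sketch spells out with the examples above. It treats one slice in isolation; the full cube’s group is far richer, since turns of different slices generally don’t commute, and plain modular arithmetic doesn’t capture that.

```python
def turn(position, quarter_turns):
    """Net orientation of one slice after some clockwise (+) or counterclockwise (-)
    quarter turns; four quarter turns bring it back where it started."""
    return (position + quarter_turns) % 4

print(turn(3, 1), turn(3, 2), turn(1, -2), turn(1, -3))   # 0 1 3 2
```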

Scott Hilburn’s The Argyle Sweater for the 26th of October is really a physics joke. But it uses (gibberish) mathematics as the signifier of “a fully thought-out theory” and that’s good enough for me. Also the talk of a “big boing” made me giggle and I hope it does you too.

Izzy Ehnes’s The Best Medicine Cartoon makes, I believe, its debut for Reading the Comics posts with its entry for the 26th. It’s also the anthropomorphic-numerals joke for the week.

Frank Page’s Bob the Squirrel is struggling under his winter fur this week. On the 27th Bob tries to work out the Newtonian forces involved in rolling about in his condition. And this gives me the chance to share a traditional mathematicians’ joke and a cliche punchline.

The story goes that a dairy farmer knew he could be milking his cows better. He could surely get more milk, and faster, if only the operations of his farm were arranged better. So he hired a mathematician, to find the optimal way to configure everything. The mathematician toured every part of the pastures, the milking barn, the cows, everything relevant. And then the mathematician set to work devising a plan for the most efficient possible cow-milking operation. The mathematician declared, “First, assume a spherical cow.”

The punch line has become a traditional joke in the mathematics and science fields. As a joke it comments on the folkloric disconnection between mathematicians and practicality. It also comments on the absurd assumptions that mathematicians and scientists will make for the sake of producing a model, and for getting an answer.

The joke within the joke is that it’s actually fine to make absurd assumptions. We do it all the time. All models are simplifications of the real world, tossing away things that may be important to the people involved but that just complicate the work we mean to do. We may assume cows are spherical because that reflects, in a not too complicated way, that while they might choose to get near one another they will also, given the chance, leave one another some space. We may pretend a fluid has no viscosity, because we are interested in cases where the viscosity does not affect the behavior much. We may pretend people are fully aware of the costs, risks, and benefits of any action they wish to take, at least when they are trying to decide which route to take to work today.

That an assumption is ridiculous does not mean the work built on it is ridiculous. We must defend why we expect those assumptions to make our work practical without introducing too much error. We must test whether the conclusions drawn from the assumption reflect what we wanted to model reasonably well. We can still learn something from a spherical cow. Or a spherical squirrel, if that’s the case.

Keith Tutt and Daniel Saunders’s Lard’s World Peace Tips for the 28th of October is a binary numbers joke. It’s the other way to tell the joke about there being 10 kinds of people in the world. (I notice that joke made in the comments on Gocomics.com. That was inevitable.)

Eric the Circle for the 29th of October, this one by “Gilly” again, jokes about mathematics being treated as if quite subject to law. The truth of mathematical facts isn’t subject to law, of course. But the use of mathematics is. It’s obvious, for example, in the setting of educational standards. What things a member of society must know to be a functioning part of it are, western civilization has decided, a subject governments may speak about. Thus what mathematics everyone should know is a subject of legislation, or at least legislation in the attenuated form of regulated standards.

But mathematics is subject to parliament (or congress, or the diet, or what have you) in subtler ways. Mathematics is how we measure debt, that great force holding society together. And measurement again has been (at least in western civilization) a matter for governments. We accept the principle that a government may establish a fundamental unit of weight or fundamental unit of distance. So too may it decide what is a unit of currency, and into how many pieces the unit may be divided. And from this it can decide how to calculate with that currency: if the “proper” price of a thing would be, say, five-ninths of the smallest available bit of currency, then what should the buyer give the seller?

Who cares, you might ask, and fairly enough. I can’t get worked up about the risk that I might overpay four-ninths of a penny for something, nor feel bad that I might cheat a merchant out of five-ninths of a penny. But consider: when Arabic numerals first made their way to the west they were viewed with suspicion. Everyone at the market or the moneylenders’ knew how Roman numerals worked, and could follow addition and subtraction with ease. Multiplication was harder, but it could be followed. Division was a disaster and I wouldn’t swear that anyone has ever successfully divided using Roman numerals, but at least everything else was nice and familiar.
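Those four-ninths and five-ninths come straight from the two obvious rounding rules. This sketch uses exact fractions, with the smallest coin as the unit, to show who comes out behind under each rule. The rules are generic illustrations, not any particular government’s regulation.

```python
import math
from fractions import Fraction

price = Fraction(5, 9)          # the "proper" price, in units of the smallest coin
paid_up = math.ceil(price)      # round up: the buyer hands over 1 coin
paid_down = math.floor(price)   # round down: the buyer hands over nothing
print(paid_up - price, price - paid_down)   # 4/9 5/9
```

Round up and the buyer overpays by four-ninths of a coin; round down and the merchant is shorted five-ninths.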

But then suddenly there was this influx of new symbols, only one of them something that had ever been a number before. One of them at least looked like the letter O, but it was supposed to represent a missing quantity. And every calculation in this system was some strange gibberish where one unfamiliar symbol plus another unfamiliar symbol turned into yet another unfamiliar symbol or maybe even two symbols. Sure, the merchant or the moneylender said it was easier, once you learned the system. But they were also the only ones who understood the system, and the ones who would profit by making “errors” that could not be detected.

Thus we see governments, even in worldly, trade-friendly city-states like Venice, prohibiting the use of Arabic numerals. Roman numerals may be inferior by every measure, but they were familiar. They stood at least until enough generations passed that the average person could feel “1 + 1 = 2” contained no trickery.

If one sees in this parallels to the problem of reforming mathematics education, all I can offer is that people are absurd, and we must love the absurdness of them.

One last note, so I can get this essay above two thousand words somehow. In the 1910s Alfred North Whitehead and Bertrand Russell published the awesome and menacing Principia Mathematica. This was a project to build arithmetic, and all mathematics, on sound logical grounds utterly divorced from the great but fallible resource of human intuition. They did probably as well as human beings possibly could. They used a bewildering array of symbols and such a high level of abstraction that a needy science fiction movie could put up any random page of the text and pass it off as Ancient High Martian.

But they were mathematicians and philosophers, and so could not avoid a few wry jokes, and one of them comes in Volume II, around page 86 (it’ll depend on the edition you use). There, in Proposition *110.643, Whitehead and Russell establish “1 + 1 = 2” and remark, “the above proposition is occasionally useful”. They note at least three uses in their text alone. (Of course this took so long because they were building a lot of machinery before getting to mere work like this.)

Back in my days as a graduate student I thought it would be funny to put up a mock political flyer, demanding people say “NO ON PROP *110.643”. I was wrong. But the joke is strong enough if you don’t go to the trouble of making up the sign. I didn’t make up the sign anyway.

And to murder my own weak joke: arguably “1 + 1 = 2” is established much earlier, around page 380 of the first volume, in proposition *54.43. The thing is, that proposition warns that “it will follow, when mathematical addition has been defined”, which it hasn’t been at that point. But if you want to say it’s Proposition *54.43 instead go ahead; it will not get you any better laugh.

If you’d like to see either proof rendered as non-head-crushingly as possible, the Metamath Proof Explorer shows the reasoning for Proposition *54.43 as well as that for *110.643. And it contains hyperlinks so that you can try to understand the exact chain of reasoning which comes to that point. Good luck. I come from a mathematical heritage that looks at the Principia Mathematica and steps backward, quickly, before it has the chance to notice us and attack.

## At The Home Field

There was a neat little fluke in baseball the other day. All fifteen of the Major League Baseball games on Tuesday were won by the home team. This appears to be the first time it’s happened since the league expanded to thirty teams in 1998. As best as the Elias Sports Bureau can work out, the last time every game was won by the home team was on the 23rd of May, 1914, when all four games in each of the National League, American League, and Federal League were home-team wins.

This produced talk about the home field advantage never having it so good, naturally. Also at least one article claimed the odds of fifteen home-team wins were one in 32,768. I can’t find that article now that I need it; please just trust me that it existed.

The thing is this claim is correct, if you assume there is no home-field advantage. That is, if you suppose the home team has exactly one chance in two of winning, then the chance of fifteen home teams winning is one-half raised to the fifteenth power. And that is one in 32,768.
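The arithmetic, assuming fifteen independent games, each a fair coin flip for the home team. Python’s `fractions.Fraction` gives the exact answer rather than a floating-point approximation; the function name is just for illustration.

```python
from fractions import Fraction

def chance_all_home_wins(p_home, games=15):
    """Chance every home team wins, if games are independent with home-win chance p_home."""
    return p_home ** games

fair = chance_all_home_wins(Fraction(1, 2))
print(fair)  # 1/32768
```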

This also assumes the games are independent, that is, that the outcome of one has no effect on the outcome of another. This seems likely, at least as long as we’re far enough away from the end of the season. In a pennant race a team might credibly relax once another game decided whether they had secured a position in the postseason. That might affect whether they win the game under way. Whether results are independent is always important for a probability question.

But stadium designers and the groundskeeping crew would not be doing their job if the home team had only the same chance of winning as the visiting team does. It’s been understood since the early days of organized professional baseball that the state of the field can offer advantages to the team that plays most of its games there.

Jack Jones, at Betfirm.com, estimated that for the five seasons from 2010 to 2014, the home team won about 53.7 percent of all games. Suppose we take this as accurate and representative of the home field advantage in general. Then the chance of fifteen home-team wins is 0.537 raised to the fifteenth power. That is approximately one divided by 11,230.

That’s a good bit more probable than the one in 32,768 you’d expect from the home team having exactly a 50 percent chance of winning; it’s nearly three times as likely. I think that’s a dramatic difference considering the home team wins only 3.7 percentage points more often than it would at 50-50.
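The same calculation with the 53.7 percent figure, again assuming the games are independent. Floating point is fine here, since the input is itself only an estimate:

```python
home_win = 0.537  # Jack Jones's 2010-2014 estimate, as quoted above
games = 15

p_all_home = home_win ** games  # every home team wins
p_fair = 0.5 ** games           # the 50-50 case, for comparison

print(f"about 1 in {1 / p_all_home:,.0f}")           # about 1 in 11,230
print(f"{p_all_home / p_fair:.2f} times as likely")  # relative to 50-50
```

The ratio is just (0.537 / 0.5) raised to the fifteenth power, which is why a small edge per game compounds into a large one over fifteen games.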

The follow-up question, and one that’s good for a probability homework, would be to work out the odds that we’d see at least one day with fifteen home-team wins in the mere eighteen years since it became possible.
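A sketch of how one might set up that homework problem. It assumes every day is independent of every other, that each day carries the same chance of a clean home sweep, and, as a purely hypothetical placeholder, that a season has 100 days on which all fifteen games are played. The real count isn’t given here and would have to be looked up.

```python
def chance_of_at_least_one(p_day, days):
    """Chance that at least one of `days` independent days is an all-home-wins day."""
    return 1 - (1 - p_day) ** days  # one minus the chance of never seeing it

p_day = 0.537 ** 15               # using the 53.7 percent home-win estimate
full_slate_days_per_season = 100  # HYPOTHETICAL placeholder, not from the text
seasons = 18

days = full_slate_days_per_season * seasons
print(f"{chance_of_at_least_one(p_day, days):.1%}")
```

Swapping in the real number of full-slate days, or the 50-50 chance of one in 32,768, changes the answer; the structure, one minus the chance of it never happening, stays the same.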

## Reading the Comics, July 12, 2015: Chuckling At Hart Edition

I haven’t had the chance to read the Gocomics.com comics yet today, but I have enough strips to bring up anyway. And I might need something to talk about on Tuesday. Two of today’s strips are from the legacy of Johnny Hart. Hart’s last decades, especially at B.C., when he most often wrote about his fundamentalist religious views, hurt his reputation and obscured the fact that his comics were really, really funny when they started. His heirs and successors have been doing fairly well at reviving the deliberately anachronistic and lightly satirical edge that made the strips funny to begin with, and one of them’s a perennial around here. The other, Wizard of Id Classics, is literally reprints from the earliest days of the comic strip’s run. That shows the strip when it was earning its place on every comics page everywhere, and makes a good case for it.

Mason Mastroianni, Mick Mastroianni, and Perri Hart’s B.C. (July 8) shows how a compass, without straightedge, can be used to ensure one’s survival. I suppose it’s really only loosely mathematical but I giggled quite a bit.

Ken Cursoe’s Tiny Sepuku (July 9) talks about luck as being just the result of probability. That’s fair enough. Random chance will produce strings of particularly good, or bad, results. Those strings of results can look so long or impressive that we suppose they have to represent something real. Look to any sport and the argument about whether there are “hot hands” or “clutch performers”. And Maneki-Neko is right that a probability manipulator would help. You can get a string of ten tails in a row on a fair coin, but you’ll get many more if the coin has an eighty percent chance of coming up tails.
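To put numbers on Maneki-Neko’s point: the chance that a particular run of ten flips all land tails, for a fair coin and for a hypothetical coin loaded to land tails 80 percent of the time, assuming the flips are independent.

```python
from fractions import Fraction

def chance_of_streak(p_tails, length=10):
    """Chance that a given run of `length` independent flips comes up all tails."""
    return p_tails ** length

fair = chance_of_streak(Fraction(1, 2))
loaded = chance_of_streak(Fraction(4, 5))
print(fair)                  # 1/1024
print(float(loaded))         # 0.1073741824
print(float(loaded / fair))  # the loaded coin's streak is about 110 times as likely
```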

Brant Parker and Johnny Hart’s Wizard of Id Classics (July 9, rerun from July 12, 1965) is a fun bit of volume-guessing and logic. So, yes, I giggled pretty solidly at both B.C. and The Wizard of Id this week.

Mell Lazarus’s Momma (July 11) identifies “long division” as the first thing a person has to master to be an engineer. I don’t know that this is literally true. It’s certainly true that liking doing arithmetic helps one in a career that depends on calculation, though. But you can be a skilled songwriter without being any good at writing sheet music. I wouldn’t be surprised if there are skilled engineers who are helpless at dividing fourteen into 588.
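For anyone who’d like the check: here is schoolbook long division written out as a short Python function, bringing down one digit at a time. It’s a sketch that assumes a non-negative whole-number dividend and a positive whole-number divisor.

```python
def long_division(dividend, divisor):
    """Schoolbook long division. Returns (quotient, remainder)."""
    quotient_digits = []
    remainder = 0
    for digit in str(dividend):
        remainder = remainder * 10 + int(digit)  # bring down the next digit
        q = remainder // divisor                 # how many times does it go in? (0 to 9)
        quotient_digits.append(str(q))
        remainder -= q * divisor                 # subtract, carry what's left
    return int("".join(quotient_digits)), remainder

print(long_division(588, 14))  # (42, 0)
```

Each step only ever asks for a single digit, which is what makes the method workable by hand.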

Bunny Hoest and John Reiner’s Lockhorns (July 12) includes an example of using “adding up” to mean “make sense”. It’s a slight thing. But the same idiom was used last week, in Eric Teitelbaum and Bill Teitelbaum’s Bottomliners. I don’t think Comic Strip Master Command is ordering this punch line yet, but you never know.

And finally, I do want to try something a tiny bit new, and explicitly invite you-the-readers to say what strip most amused you. Please feel free to comment about your choices, or warn me that I set up the poll wrong. I haven’t tried this before.