## Let Me Tell You How Interesting March Madness Could Possibly Be

I read something alarming in the daily “Best of GoComics” e-mail this morning. It was a panel of Dave Whamond’s Reality Check. It’s a panel comic, although it stands out from the pack by having a squirrel character in the margins. And here’s the panel.

Certainly a solid enough pun to rate a mention. I don’t know of anyone actually doing a March Mathness bracket, but it’s not a bad idea. Rating mathematical terms for their importance or usefulness or just beauty might be fun. And might give a reason to talk about their meaning some. It’s a good angle to discuss what’s interesting about mathematical terms.

And that lets me segue into talking about a set of essays. The next few weeks see the NCAA college basketball tournament, March Madness. I’ve used that to write some stuff about information theory, as it applies to the question: is a basketball game interesting?

Along the way here I got to looking up actual scoring results from major sports. This let me estimate the information-theory content of soccer, (US) football, and baseball scores, to match my estimate of basketball scores’ information content.

• How Interesting Is A Football Score? Football scoring is a complicated thing. But I was able to find a trove of historical data to give me an estimate of the information theory content of a score.
• How Interesting Is A Baseball Score? Some Partial Results I found some summaries of actual historical baseball scores. Somehow I couldn’t find the detail I wanted for baseball, a sport that since 1845 has kept track of every possible bit of information, including how long the games ran, about every game ever. I made do, though.
• How Interesting Is A Baseball Score? Some Further Results Since I found some more detailed summaries and refined the estimate a little.
• How Interesting Is A Low-Scoring Game? And here, well, I start making up scores. It’s meant to represent low-scoring games such as soccer, hockey, or baseball to draw some conclusions. This includes the question: just because a distribution of small whole numbers is good for mathematicians, is that a good match for what sports scores are like?

## Reading the Comics, April 25, 2018: Coronet Blue Edition

You know what? Sometimes there just isn’t any kind of theme for the week’s strips. I can use an arbitrary name.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 21st of April, 2018 would have gone in last week if I weren’t preoccupied on Saturday. The joke is aimed at freshman calculus students and then intro Real Analysis students. The talk about things being “arbitrarily small” turns up a lot in these courses. Why? Well, in them we usually want to show that one thing equals another. But it’s hard to do that. What we can show is some estimate of how different the first thing can be from the second. And if you can show that the difference can be made as small as you like by calculating carefully enough, great. You’ve shown the two things are equal.

Delta and epsilon turn up in these a lot. In the generic proof of this you say you want to show the difference between the thing you can calculate and the thing you want is smaller than epsilon. So you have the thing you can calculate parameterized by delta. Then your problem becomes showing that if delta is small enough, the difference between what you can do and what you want is smaller than epsilon. This is why it’s an appropriately-formed joke to show someone squeezed by a delta and an epsilon. These are the lower-case delta and epsilon, which is why it’s not a triangle on the left there.

For example, suppose you want to know how long the perimeter of an ellipse is. But all you can calculate is the perimeter of a polygon. I would expect to make a proof of it look like this. Give me an epsilon that’s how much error you’ll tolerate between the polygon’s perimeter and the ellipse’s perimeter. I would then try to find, for epsilon, a corresponding delta. And then I’d show that if the edges of a polygon are never farther than delta from a point on the ellipse, then the perimeter of the polygon and that of the ellipse are less than epsilon away from each other. And that’s Calculus and Real Analysis.
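To watch that squeeze happen numerically, here is a minimal Python sketch. It is not a proof, and the details are assumptions made for illustration: the polygon’s corners are placed on the ellipse at evenly spaced parameter values, and the semi-axes 2 and 1 are arbitrary. The function name is made up for this example.

```python
import math


def inscribed_polygon_perimeter(a: float, b: float, n: int) -> float:
    """Perimeter of the n-gon with corners on the ellipse x = a cos t, y = b sin t."""
    corners = [
        (a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]
    # Add up the straight-line distance from each corner to the next, wrapping around.
    return sum(math.dist(corners[k], corners[(k + 1) % n]) for k in range(n))


# Assumed semi-axes of 2 and 1; each row has eight times as many corners as the last.
for n in (8, 64, 512, 4096):
    print(n, inscribed_polygon_perimeter(2.0, 1.0, n))
```

Each row changes the printed perimeter by less than the row before, which is the behavior a delta-and-epsilon argument would have to establish for every tolerance, not only the few tried here.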

John Zakour and Scott Roberts’s Maria’s Day for the 22nd is the anthropomorphic numerals joke for this week. I’m curious whether the 1 had a serif that could be wrestled or whether the whole number had to be flopped over, as though it were a ruler or a fat noodle.

Anthony Blades’s Bewley for the 23rd offers advice for what to do if you’ve not got your homework. This strip’s already been run, and mentioned here. I might drop this from my reading if it turns out the strip is done and I’ve exhausted all the topics it inspires.

Dave Whamond’s Reality Check for the 23rd is designed for the doors of mathematics teachers everywhere. It does incidentally express one of those truths you barely notice: that statisticians and mathematicians don’t seem to be quite in the same field. They’ve got a lot of common interest, certainly. But they’re often separate departments in a college or university. When they do share a department it’s named the Department of Mathematics and Statistics, itself an acknowledgement that they’re not quite the same thing. (Also it seems to me it’s always Mathematics-and-Statistics. If there’s a Department of Statistics-and-Mathematics somewhere I don’t know of it and would be curious.) This has to reflect historical influence. Statistics, for all that it uses the language of mathematics and that logical rigor and ideas about proofs and all, comes from a very practical, applied, even bureaucratic source. It grew out of asking questions about the populations of nations and the reliable manufacture of products. Mathematics, even the mathematics that is about real-world problems, is different. A mathematician might specialize in the equations that describe fluid flows, for example. But it could plausibly be because they have interesting and strange analytical properties. It’d be only incidental that they might also say something enlightening about why the plumbing is stopped up.

Neal Rubin and Rod Whigham’s Gil Thorp for the 24th seems to be setting out the premise for the summer storyline. It’s sabermetrics. Or at least the idea that sports performance can be quantized, measured, and improved. The principle behind that is sound enough. The trick is figuring out what are the right things to measure, and what can be done to improve them. Also another trick is don’t be a high school student trying to lecture classmates about geometry. Seriously. They are not going to thank you. Even if you turn out to be right. I’m not sure how you would have much control of the angle your ball comes off the bat, but that’s probably my inexperience. I’ve learned a lot about how to control a pinball hitting the flipper. I’m not sure I could quantize any of it, but I admit I haven’t made a serious attempt to try either. Also, when you start doing baseball statistics you run a roughly 45% chance of falling into a deep well of calculation and acronyms of up to twelve letters from which you never emerge. Be careful. (This is a new comic strip tag.)

Randy Glasbergen’s Glasbergen Cartoons rerun for the 25th feels a little like a slight against me. Well, no matter. Use the things that get you in the mood you need to do well. (Not a new comic strip tag because I’m filing it under ‘Randy Glasbergen’ which I guess I used before?)

## Is A Basketball Tournament Interesting? My Thoughts

It’s a good weekend to bring this back. I have some essays about information theory and sports contests and maybe you missed them earlier. Here goes.

And then for a follow-up I started looking into actual scoring results from major sports. This let me estimate the information-theory content of soccer, (US) football, and baseball scores, to match my estimate of basketball scores’ information content.

Don’t try to use this to pass your computer science quals. But I hope it gives you something interesting to talk about while sulking over your brackets, and maybe to read about after that.

## Reading the Comics, November 18, 2017: Story Problems and Equation Blackboards Edition

It was a normal-paced week at Comic Strip Master Command. It was also one of those weeks that didn’t have anything from Comics Kingdom or Creators.Com. So I’m afraid you’ll all just have to click the links for strips you want to actually see. Sorry.

Bill Amend’s FoxTrot for the 12th has Jason and Marcus creating “mathic novels”. They, being a couple of mathematically-gifted smart people, credit mathematics knowledge with smartness. A “chiliagon” is a thousand-sided regular polygon that’s mostly of philosophical interest. A regular polygon with a thousand equal sides and a thousand equal angles looks like a circle. There’s really no way to draw one so that the human eye could see the whole figure and tell it apart from a circle. But if you can understand the idea of a regular polygon it seems like you can imagine a chiliagon and see how that’s not a circle. So there’s some really easy geometry things that can’t be visualized, or at least not truly visualized, and just have to be reasoned with.

Rick Detorie’s One Big Happy for the 12th is a story-problem-subversion joke. The joke’s good enough as it is, but the supposition of the problem is that the driving does cover fifty miles in an hour. This may not be the speed the car travels at the whole time of the problem. Mister Green is maybe speeding to make up for all the time spent travelling slower.

Brandon Sheffield and Dami Lee’s Hot Comics for Cool People for the 13th uses a blackboard full of equations to represent the deep thinking being done on a silly subject.

Shannon Wheeler’s Too Much Coffee Man for the 15th also uses a blackboard full of equations to represent the deep thinking being done on a less silly subject. It’s a really good-looking blackboard full of equations, by the way. Beyond the appearance of our old friend $E = mc^2$ there’s a lot of stuff that looks like legitimate quantum mechanics symbols there. They’re at least not obvious nonsense, as best I can tell without the ability to zoom the image in. I wonder if Wheeler didn’t find a textbook and use some problems from it for the feeling of authenticity.

Samson’s Dark Side of the Horse for the 16th is a story-problem subversion joke.

Jef Mallett’s Frazz for the 18th talks about making a bet on the World Series, which wrapped up a couple weeks ago. It raises the question: can you bet on an already known outcome? Well, sure, you can bet on anything you like, given a willing partner. But there does seem to be something fundamentally different between betting on something whose outcome isn’t in principle knowable, such as the winner of the next World Series, and betting on something that could be known but happens not to be, such as the winner of the last. We see this expressed in questions like “is it true the 13th of a month is more likely to be Friday than any other day of the week?” If you know which month and year is under discussion the chance the 13th is Friday is either 1 or 0. But we mean something more like, if we don’t know what month and year it is, what’s the chance this is a month with a Friday the 13th? Something like this is at work in this World Series bet. (The Astros won the recently completed World Series.)
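The “we don’t know what month and year it is” reading can be made concrete by counting. The sketch below is one way to do it, assuming every month in the calendar’s cycle is equally likely to be the one under discussion. The Gregorian calendar repeats exactly every 400 years, so tallying the weekday of the 13th over any 400 consecutive years covers every case; the choice of 2001 through 2400 is arbitrary.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Tally which weekday the 13th lands on for every month of one full
# 400-year Gregorian cycle. date.weekday() returns 0 for Monday.
counts = Counter(
    date(year, month, 13).weekday()
    for year in range(2001, 2401)
    for month in range(1, 13)
)

total = sum(counts.values())  # 4800 months in the cycle
for weekday, n in counts.most_common():
    print(f"{DAY_NAMES[weekday]:9s} {n:4d}  {n / total:.4f}")
```

Friday comes out on top, with 688 of the 4800 months, a hair above the one-in-seven you might guess. So the question has a sensible answer even though any particular month’s answer is just yes or no.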

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 18th is also featured on some underemployed philosopher’s “Reading the Comics” WordPress blog and fair enough. Utilitarianism exists in an odd triple point, somewhere on the borders of ethics, economics, and mathematics. The idea that one could quantize the good or the utility or the happiness of society, and study how actions affect it, is a strong one. It fits very well the modern mindset that holds everything can be quantified even if we don’t know how to do it well just yet. And it appeals strongly to a mathematically-minded person since it sounds like pure reason. It’s not, of course, any more than any ethical scheme can be. But it sounds like the ethics a Vulcan would come up with and that appeals to a certain kind of person. (The comic is built on one of the implications of utilitarianism that makes it seem like the idea’s gone off the rails.)

There’s some mathematics symbols on The Utilitarian’s costume. The capital U on his face is probably too obvious to need explanation. The $\sum u$ on his chest relies on some mathematical convention. For maybe a quarter-millennium now, since Euler in the 1750s, mathematicians have been using the capital sigma to mean “take a sum of things”. The things are whatever the expression after that symbol is. Usually, the sigma will have something below and above which carries meaning. It says what the index is for the thing after the symbol, and what the bounds of the index are. Here, it’s not set. This is common enough, though, if this is understood from context. Or if it’s obvious. The small ‘u’ to the right suggests the utility of whatever’s thought about. (“Utility” being the name for the thing measured and maximized; it might be happiness, it might be general well-being, it might be the number of people alive.) So the symbols would suggest “take the sum of all the relevant utilities”. Which is the calculation that would be done in this case.

## How Interesting Is March Madness?

And now let me close the week with some other evergreen articles. A couple years back I mixed the NCAA men’s basketball tournament with information theory to produce a series of essays that fit the title I’ve given this recap. They also sprawl out into (US) football and baseball. Let me link you to them:

## Reading the Comics, July 13, 2016: Catching Up On Vacation Week Edition

I confess I spent the last week on vacation, away from home and without the time to write about the comics. And it was another of those curiously busy weeks that happens when it’s inconvenient. I’ll try to get caught up ahead of the weekend. No promises.

Art and Chip Sansom’s The Born Loser for the 10th talks about the statistics of body measurements. Measuring bodies is one of the foundations of modern statistics. Adolphe Quetelet, in the mid-19th century, found a rough relationship between body mass and the square of a person’s height, used today as the base for the body mass index. Francis Galton spent much of the late 19th century developing the tools of statistics and how they might be used to understand human populations with work I will describe as “problematic” because I don’t have the time to get into how much trouble the right mind at the right idea can be.

No attempt to measure people’s health with a few simple measurements and derived quantities can be fully successful. Health is too complicated a thing for one or two or even ten quantities to describe. Measures like height-to-waist ratios and body mass indices and the like should be understood as filters, the way temperature and blood pressure are. If one or more of these measurements are in dangerous ranges there’s reason to think there’s a health problem worth investigating here. It doesn’t mean there is; it means there’s reason to think it’s worth spending resources on tests that are more expensive in time and money and energy. And similarly just because all the simple numbers are fine doesn’t mean someone is perfectly healthy. But it suggests that the person is more likely all right than not. They’re guides to setting priorities, easy to understand and requiring no training to use. They’re not a replacement for thought; no guides are.

Jeff Harris’s Shortcuts educational panel for the 10th is about zero. It’s got a mix of facts and trivia and puzzles with a few jokes on the side.

I don’t have a strong reason to discuss Ashleigh Brilliant’s Pot-Shots rerun for the 11th. It only mentions odds in a way that doesn’t open up to discussing probability. But I do like Brilliant’s “Embrace-the-Doom” tone and I want to share that when I can.

John Hambrock’s The Brilliant Mind of Edison Lee for the 13th of July riffs on the world’s leading exporter of statistics, baseball. Organized baseball has always been a statistics-keeping game. The Olympic Ball Club of Philadelphia’s 1837 rules set out what statistics to keep. I’m not sure why the game is so statistics-friendly. It must be in part that the game lends itself to representation as a series of identical events — pitcher throws ball at batter, while runners wait on up to three bases — with so many different outcomes.

Alan Schwarz’s book The Numbers Game: Baseball’s Lifelong Fascination With Statistics describes much of the sport’s statistics and record-keeping history. The things recorded have varied over time, with the list of things mostly growing. The number of statistics kept has also tended to grow. Sometimes they get dropped. Runs Batted In were first calculated in 1880, then dropped as an inherently unfair statistic to keep; leadoff hitters were necessarily cheated of chances to get someone else home. How people’s idea of what is worth measuring changes is interesting. It speaks to how we change the ways we look at the same event.

Dana Summers’s Bound And Gagged for the 13th uses the old joke about computers being abacuses and the like. I suppose it’s properly true that anything you could do on a real computer could be done on the abacus, just, with a lot more time and manual labor involved. At some point it’s not worth it, though.

Nate Fakes’s Break of Day for the 13th uses the whiteboard full of mathematics to denote intelligence. Cute birds, though. But any animal in eyeglasses looks good. Lab coats are almost as good as eyeglasses.

David L Hoyt and Jeff Knurek’s Jumble for the 13th is about one of geometry’s great applications, measuring how large the Earth is. It’s something that can be worked out through ingenuity and a bit of luck. Once you have that, some clever argument lets you work out the distance to the Moon, and its size. And that will let you work out the distance to the Sun, and its size. The Ancient Greeks had worked out all of this reasoning. But they had to make observations with the unaided eye, without good timekeeping — time and position are conjoined ideas — and without photographs or other instantly-made permanent records. So their numbers are, to our eyes, lousy. No matter. The reasoning is brilliant and deserves respect.

## How Interesting Is A Low-Scoring Game?

I’m still curious about the information-theory content, the entropy, of sports scores. I haven’t found the statistics I need about baseball or soccer game outcomes. I’d also like hockey score outcomes if I could get them. If anyone knows a reference I’d be glad to know of it.

But there’s still stuff I can talk about without knowing details of every game ever. One of them suggested itself when I looked at the Washington Post’s graphic. I mean the one giving how many times each score came up in baseball’s history.

By “distribution” mathematicians mean almost what you would imagine. Suppose we have something that might hold any of a range of values. This we call a “random variable”. How likely is it to hold any particular value? That’s what the distribution tells us. The higher the distribution, the more likely it is we’ll see that value. In baseball terms, that means we’re reasonably likely to see a game with a team scoring three runs. We’re not likely to see a game with a team scoring twenty runs.

There are many families of distributions. Feloni Mayhem suggested the baseball scores look like one called the Beta Distribution. I can’t quite agree, on technical grounds. Beta Distributions describe continuously-valued variables. They’re good for stuff like the time it takes to do something, or the height of a person, or the weight of a produced thing. They’re for measurements that can, in principle, go on forever after the decimal point. A baseball score isn’t like that. A team can score zero points, or one, or 46, but it can’t score four and two-thirds points. Baseball scores are “discrete” variables.

But there are good distributions for discrete variables. Almost everything you encounter taking an Intro to Probability class will be about discrete variables. So will most any recreational mathematics puzzle. The distribution of a tossed die’s outcomes is discrete. So is the number of times tails comes up in a set number of coin tosses. So are the birth dates of people in a room, or the number of cars passed on the side of the road during your ride, or the number of runs scored by a baseball team in a full game.

I suspected that, of the simpler distributions, the best model for baseball should be the Poisson distribution. It also seems good for any other low-scoring game, such as soccer or hockey. The Poisson distribution turns up whenever you have a large number of times that some discrete event can happen. But that event can happen only once each chance. And it has a constant chance of happening. That is, happening this chance doesn’t make it more likely or less likely it’ll happen next chance.

I have reasons to think baseball scoring should be well-modelled this way. There are hundreds of pitches in a game. Each of them is in principle a scoring opportunity. (Well, an intentional walk takes four pitches without offering any chance for scoring. And there’s probably some other odd case where a pitched ball can’t even in principle let someone score. But these are minor fallings-away from the ideal.) This is part of the appeal of baseball, at least for some: the chance is always there.
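The “many chances, at most one event per chance, constant probability” description is that of a binomial count, and the Poisson distribution is what the binomial settles into when the chances are many and each one is unlikely to pay off. Here is a small Python sketch of that. Both numbers are assumptions for illustration only: 300 pitches per team per game, and an average of 3.4 runs spread evenly over those pitches.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Chance of exactly k successes in n independent tries, each with chance p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, mean: float) -> float:
    """Chance of exactly k events when the average count is `mean`."""
    return mean**k * math.exp(-mean) / math.factorial(k)


PITCHES = 300     # assumed number of scoring chances; not a measured figure
MEAN_RUNS = 3.4   # assumed average number of runs

p = MEAN_RUNS / PITCHES  # chance that any one pitch scores a run
print(" k  binomial  Poisson")
for k in range(9):
    print(f"{k:2d}  {binomial_pmf(k, PITCHES, p):.4f}    {poisson_pmf(k, MEAN_RUNS):.4f}")
```

With these assumed numbers the two columns agree to within a couple of parts in a thousand, which is why the simpler one-parameter Poisson form is a reasonable stand-in.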

We only need one number to work out the Poisson distribution of something. That number is the mean, the arithmetic mean of all the possible values. Let me call the mean μ, which is the Greek version of m and so a good name for a mean. The probability that you’ll see the thing happen n times is $\frac{\mu^n e^{-\mu}}{n!}$. Here e is the base of the natural logarithm, that 2.71828 et cetera number. n! is the factorial. That’s n times (n – 1) times (n – 2) times (n – 3) and so on all the way down to times 2 times 1.

And here is the Poisson distribution for getting numbers from 0 through 20, if we take the mean to be 3.4. I can defend using the Poisson distribution much more than I can defend picking 3.4 as the mean. Why not 3.2, or 3.8? Mostly, I tried a couple means around the three-to-four runs range and picked one that looked about right. Given the lack of better data, what else can I do?
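For anyone who wants the numbers behind that picture, this short Python sketch tabulates the formula for 0 through 20 with the mean set to 3.4. The function name and the crude text bar chart are just conveniences for this example.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """Probability of exactly n events: mean^n * e^(-mean) / n!"""
    return mean**n * math.exp(-mean) / math.factorial(n)


MEAN_RUNS = 3.4  # the eyeballed mean from the text

table = [(n, poisson_pmf(n, MEAN_RUNS)) for n in range(21)]
for n, prob in table:
    # One '#' per half a percentage point of probability.
    print(f"{n:2d}  {prob:.4f}  {'#' * round(prob * 200)}")
```

Swapping 3.4 for 3.2 or 3.8 shifts the hump a little left or right without changing its general shape.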

I don’t think it’s a bad fit. The shape looks about right, to me. But the Poisson distribution suggests fewer zero- and one-run games than the actual data offers. And there are more high-scoring games in the real data than in the Poisson distribution. Maybe there’s something that needs tweaking.

And there are several plausible causes for this. A Poisson distribution, for example, supposes that there are a lot of chances for a distinct event. That would be scoring on a pitch. But in an actual baseball game there might be up to four runs scored on one pitch. It’s less likely to score four runs than to score one, sure, but it does happen. This I imagine boosts the number of high-scoring games.

I suspect this could be salvaged by a model that’s kind of a chain of Poisson distributions. That is, have one distribution that represents the chance of scoring on any given pitch. Then use another distribution to say whether the scoring was one, two, three, or four runs.
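Here is one way that two-stage idea might be sketched in Python. Everything numeric in it is hypothetical: the chances that a scoring play brings in one, two, three, or four runs are invented, and the number of scoring plays is taken to be Poisson with its mean chosen so that the overall average stays at 3.4 runs. It is meant to show the mechanics, not to claim anything about real baseball.

```python
import math

# All numbers below are hypothetical, chosen only to illustrate the idea.
RUNS_PER_PLAY = {1: 0.70, 2: 0.20, 3: 0.07, 4: 0.03}  # runs brought in by one scoring play
TARGET_MEAN = 3.4  # average runs per team per game, as eyeballed in the text
MAX_RUNS = 40      # totals above this are left out of the table


def poisson_pmf(n: int, mean: float) -> float:
    return mean**n * math.exp(-mean) / math.factorial(n)


def chained_distribution(mean_plays, runs_per_play, max_runs=MAX_RUNS):
    """P(total runs = s) for s = 0..max_runs under the two-stage model."""
    size = max_runs + 1
    totals = [0.0] * size
    after_n_plays = [1.0] + [0.0] * max_runs  # zero plays means zero runs
    # Every play scores at least one run, so more than max_runs plays
    # can never leave the total at max_runs or below.
    for n in range(size):
        weight = poisson_pmf(n, mean_plays)
        for s in range(size):
            totals[s] += weight * after_n_plays[s]
        # Account for one more scoring play by convolving with RUNS_PER_PLAY.
        nxt = [0.0] * size
        for s, prob in enumerate(after_n_plays):
            if prob:
                for runs, q in runs_per_play.items():
                    if s + runs < size:
                        nxt[s + runs] += prob * q
        after_n_plays = nxt
    return totals


mean_runs_per_play = sum(r * q for r, q in RUNS_PER_PLAY.items())
chained = chained_distribution(TARGET_MEAN / mean_runs_per_play, RUNS_PER_PLAY)
plain = [poisson_pmf(s, TARGET_MEAN) for s in range(MAX_RUNS + 1)]

print("P(10 or more runs), chained:", round(sum(chained[10:]), 4))
print("P(10 or more runs), plain:  ", round(sum(plain[10:]), 4))
```

With these invented weights the chained model gives noticeably more ten-plus-run games than the plain Poisson distribution at the same mean. That is the direction the real data leans, though matching it properly would take real frequencies.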

Low-scoring games I have a harder time accounting for. My suspicion is that each pitch isn’t quite an independent event. Experience shows that pitchers lose control of their game the more they pitch. This results in the modern close watching of pitch counts. We see pitchers replaced at something like a hundred pitches even if they haven’t lost control of the game yet.

If we ignore reasons to doubt this distribution, then, it suggests an entropy of about 2.9 bits for a single team’s score. That’s lower than the 3.5 bits I estimated last time, using score frequencies. I think that’s because of the multiple-runs problem. Scores are spread out across more values than the Poisson distribution suggests.
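The 2.9 figure can be checked with a few lines of Python. This sketch sums minus p times the base-two logarithm of p over the Poisson probabilities for a mean of 3.4. Cutting the sum off at 60 runs is an assumption of convenience; the probabilities beyond that are far too small to matter.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    return mean**n * math.exp(-mean) / math.factorial(n)


def entropy_bits(probabilities) -> float:
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


MEAN_RUNS = 3.4
probs = [poisson_pmf(n, MEAN_RUNS) for n in range(61)]  # truncated at 60 runs
poisson_entropy = entropy_bits(probs)
print(round(poisson_entropy, 2))
```

The printed value lands a little under 2.9 bits, in line with the estimate above.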

If I am right this says we might model games like soccer and hockey, with many chances to score a single goal each, with a Poisson distribution. A game like baseball, or basketball, with many chances to score one or more points at once needs a more complicated model.

## How Interesting Is A Baseball Score? Some Further Results

While researching for my post about the information content of baseball scores I found some tantalizing links. I had wanted to know how often each score came up. From this I could calculate the entropy, the amount of information in the score. That’s the sum, taken over every outcome, of minus one times the frequency of that score times the base-two logarithm of the frequency of the outcome. And I couldn’t find that.
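Written out in symbols, that recipe is the usual Shannon entropy. The notation here is assumed for this note rather than taken from the earlier essays: $s$ runs over the possible scores and $p_s$ is the fraction of games in which score $s$ came up.

$$H = -\sum_{s} p_s \log_2 p_s$$

Since every $p_s$ is between 0 and 1, each logarithm is negative or zero, and the leading minus sign makes $H$ come out positive, measured in bits.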

An article in The Washington Post had a fine lead, though. It offers, per the title, “the score of every basketball, football, and baseball game in league history visualized”. And as promised it gives charts of how often each number of runs has turned up in a game. The most common single-team score in a game is 3, with 4 and 2 almost as common. I’m not sure of the date range for these scores. The chart says it includes (and highlights) data from “a century ago”. And as the article was posted in December 2014 it can hardly use data from after that. I can’t imagine that the 2015 season has changed much, though. And whether they start their baseball statistics at either 1871, 1876, 1883, 1891, or 1901 (each a defensible choice) should only change details.

Which is fine. I can’t get precise frequency data from the chart. The chart offers how many thousands of times a particular score has come up. But there aren’t the reference lines to say definitely whether a zero was scored closer to 21,000 or 22,000 times. I will accept a rough estimate, since I can’t do any better.

I made my best guess at the frequency, from the chart. And then made a second-best guess. My best guess gave the information content of a single team’s score as a touch more than 3.5 bits. My second-best guess gave the information content as a touch less than 3.5 bits. So I feel safe in saying a single team’s score is about three and a half bits of information.

So the score of a baseball game, with two teams scoring, is probably somewhere around twice that, or about seven bits of information.

I have to say “around”. This is because the two teams aren’t scoring runs independently of one another. Baseball doesn’t allow for tie games except in rare circumstances. (It would usually be a game interrupted for some reason, and then never finished because the season ended with neither team in a position where winning or losing could affect their standing. I’m not sure that would technically count as a “game” for Major League Baseball statistical purposes. But I could easily see a roster of game scores counting that.) So if one team’s scored three runs in a game, we have the information that the other team almost certainly didn’t score three runs.
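The standard identity behind that “around” is worth having on hand. Writing $A$ and $B$ for the two teams’ scores (labels assumed for this note) and $I(A; B)$ for their mutual information, which is never negative:

$$H(A, B) = H(A) + H(B) - I(A; B) \le H(A) + H(B)$$

So twice the single-team figure is a ceiling. Anything one team’s score tells you about the other’s, such as “not the same number”, can only pull the combined entropy below it.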

This estimate, though, does fit within my range estimate from 3.76 to 9.25 bits. And as I expected, it’s closer to nine bits than to four bits. The entropy seems to be a bit less than that of (American) football scores, somewhere around 8.7 bits, and of college basketball scores, probably somewhere around 10.8 bits, which is probably fair. There are a lot of pairs of numbers that make for plausible college basketball scores. There are slightly fewer pairs of numbers that make for plausible football scores. There are fewer still pairs of numbers that make for plausible baseball scores. So there’s less information conveyed in knowing what the game’s score is.

## How Interesting Is A Baseball Score? Some Partial Results

Meanwhile I have the slight ongoing quest to work out the information-theory content of sports scores. For college basketball scores I made up some plausible-looking score distributions and used that. For professional (American) football I found a record of all the score outcomes that’ve happened, and how often. I could use experimental results. And I’ve wanted to do other sports. Soccer was asked for. I haven’t been able to find the scoring data I need for that. Baseball, maybe the supreme example of sports as a way to generate statistics … has been frustrating.

The raw data is available. Retrosheet.org has logs of pretty much every baseball game, going back to the forming of major leagues in the 1870s. What they don’t have, as best I can figure, is a list of all the times each possible baseball score has turned up. That I could probably work out, when I feel up to writing the scripts necessary, but “work”? Ugh.

Some people have done the work, although they haven’t shared all the results. I don’t blame them; the full results make for a boring sort of page. “The Most Popular Scores In Baseball History”, at ValueOverReplacementGrit.com, reports the top ten most common scores from 1871 through 2010. The essay also mentions that as of then there were 611 unique final scores. And that lets me give some partial results, if we trust that blog posts from people I never heard of before are accurate and true. I will make that assumption over and over here.

There’s, in principle, no limit to how many scores are possible. Baseball contains many implied infinities, and it’s not impossible that a game could end, say, 580 to 578. But it seems likely that after 139 seasons of play there can’t be all that many more scores practically achievable.

Suppose then there are 611 possible baseball score outcomes, and that each of them is equally likely. Then the information-theory content of a score’s outcome is negative one times the logarithm, base two, of 1/611. That’s a number a little bit over nine and a quarter. You could deduce the score for a given game by asking usually nine, sometimes ten, yes-or-no questions from a source that knew the outcome. That’s a little higher than the 8.7 I worked out for football. And it’s a bit less than the 10.8 I estimate for college basketball.
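The arithmetic for the equally-likely case is quick to check in Python. The only input is the count of 611 unique scores reported in the essay cited above.

```python
import math

UNIQUE_SCORES = 611  # count reported in the essay cited above

# Entropy of one outcome among 611 equally likely ones.
bits = -math.log2(1 / UNIQUE_SCORES)
print(round(bits, 3))

# Yes-or-no questions that are always enough to single out one of 611 outcomes.
print(math.ceil(bits))
```

The first number is a shade over nine and a quarter. The second is ten, which is why nine questions usually do the job and a tenth is sometimes needed.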

And there’s obvious rubbish there. In no way are all 611 possible outcomes equally likely. “The Most Popular Scores In Baseball History” says that right there in the essay title. The most common outcome was a score of 3-2, with 4-3 barely less popular. Meanwhile it seems only once, on the 28th of June, 1871, has a baseball game ended with a score of 49-33. Some scores are so rare we can ignore them as possibilities.

(You may wonder how incompetent baseball players of the 1870s were that a game could get to 49-33. Not so bad as you imagine. But the equipment and conditions they were playing with were unspeakably bad by modern standards. Notably, the playing field couldn’t be counted on to be flat and level and well-mowed. There would be unexpected divots or irregularities. This makes even simple ground balls hard to field. The baseball, instead of being replaced with every batter, would stay in the game. It would get beaten until it was a little smashed shell of unpredictable dynamics and barely any structural integrity. People were playing without gloves. If a game ran long enough, they would play at dusk, without lights, with a muddy ball on a dusty field. And sometimes you just have four innings that get out of control.)

What’s needed is a guide to what are the common scores and what are the rare scores. And I haven’t found that, nor worked up the energy to make the list myself. But I found some promising partial results. In a September 2008 post on Baseball-Fever.com, user weskelton listed the 24 most common scores and their frequency. This was for games from 1993 to 2008. One might gripe that the list only covers fifteen years. True enough, but if the years are representative that’s fine. And the top scores for the fifteen-year survey look to be pretty much the same as the 139-year tally. The 24 most common scores add up to just over sixty percent of all baseball games, which leaves a lot of scores unaccounted for. I am amazed that about three in five games will have a score that’s one of these 24 choices though.

But that’s something. We can calculate the information content for the 25 outcomes, one each of the 24 particular scores and one for “other”. This will under-estimate the information content. That’s because “other” is any of 587 possible outcomes that we’re not distinguishing. But if we have a lower bound and an upper bound, then we’ve learned something about what the number we want can actually be. The upper bound is that 9.25, above.

The information content, the entropy, we calculate from the probability of each outcome. We don’t know what that is. Not really. But we can suppose that the frequency of each outcome is close to its probability. If there’ve been a lot of games played, then the frequency of a score and the probability of a score should be close. At least they’ll be close if games are independent, if the score of one game doesn’t affect another’s. I think that’s close to true. (Some games at the end of pennant races might affect each other: why try so hard to score if you’re already out for the year? But there’s few of them.)

The entropy then we find by calculating, for each outcome, a product. It’s minus one times the probability of that outcome times the base-two logarithm of the probability of that outcome. Then add up all those products. There’s good reasons for doing it this way and in the college-basketball link above I give some rough explanations of what the reasons are. Or you can just trust that I’m not lying or getting things wrong on purpose.
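To make that recipe concrete, here is a minimal sketch in Python. The function name `entropy_bits` is my own label for illustration, not anything from the original calculation. As a sanity check it reproduces the upper bound, where all 611 outcomes are pretended to be equally likely.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits: add up -p * log2(p) over every outcome."""
    # An outcome with probability zero contributes nothing, so skip it.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Upper bound: treat all 611 possible scores as equally likely.
uniform = [1 / 611] * 611
print(entropy_bits(uniform))   # the same value as log2(611), a bit over 9.25

# A fair coin flip, for comparison, carries exactly one bit.
print(entropy_bits([0.5, 0.5]))
```

Feeding in the 24 listed frequencies plus one "other" probability, in place of `uniform`, would give the lower-bound estimate.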

So let’s suppose I have calculated this right, using the 24 distinct outcomes and the one “other” outcome. That makes out the information content of a baseball score’s outcome to be a little over 3.76 bits.

As said, that’s a low estimate. Lumping about two-fifths of all games into the single category “other” drags the entropy down.
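A toy example shows why lumping lowers the number. The probabilities below are made up for illustration and are not baseball data: three "named" outcomes and four rare ones, first kept separate and then merged into a single "other".

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits of a list of outcome probabilities."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical distribution, NOT real score frequencies.
named = [0.30, 0.20, 0.10]
rare = [0.10, 0.10, 0.10, 0.10]

full = entropy_bits(named + rare)             # every outcome distinguished
lumped = entropy_bits(named + [sum(rare)])    # rare outcomes merged into "other"

print(full, lumped)
print(lumped < full)   # merging outcomes can only lose information
```

The gap between the two is the information needed to say which of the merged outcomes actually happened, weighted by how often "other" comes up.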

But that gives me a range, at least. A baseball game’s score seems to be somewhere between about 3.76 and 9.25 bits of information. I expect that it’s closer to nine bits than it is to four bits, but will have to do a little more work to make the case for it.

## At The Home Field

There was a neat little fluke in baseball the other day. All fifteen of the Major League Baseball games on Tuesday were won by the home team. This appears to be the first time it’s happened since the league expanded to thirty teams in 1998. As best as the Elias Sports Bureau can work out, the last time every game was won by the home team was on the 23rd of May, 1914, when all four games in each of the National League, American League, and Federal League were home-team wins.

This produced talk about the home field advantage never having it so good, naturally. Also at least one article claimed the odds of fifteen home-team wins were one in 32,768. I can’t find that article now that I need it; please just trust me that it existed.

The thing is this claim is correct, if you assume there is no home-field advantage. That is, if you suppose the home team has exactly one chance in two of winning, then the chance of fifteen home teams winning is one-half raised to the fifteenth power. And that is one in 32,768.

This also assumes the games are independent, that is, that the outcome of one has no effect on the outcome of another. This seems likely, at least as long as we’re far enough away from the end of the season. In a pennant race a team might credibly relax once another game decided whether they had secured a position in the postseason. That might affect whether they win the game under way. Whether results are independent is always important for a probability question.

But stadium designers and the groundskeeping crew would not be doing their job if the home team had an equal chance of winning as the visiting team does. It’s been understood since the early days of organized professional baseball that the state of the field can offer advantages to the team that plays most of its games there.

Jack Jones, at Betfirm.com, estimated that for the five seasons from 2010 to 2014, the home team won about 53.7 percent of all games. Suppose we take this as accurate and representative of the home field advantage in general. Then the chance of fifteen home-team wins is 0.537 raised to the fifteenth power. That is approximately one divided by 11,230.

That’s a good bit more probable than the one in 32,768 you’d expect from the home team having exactly a 50 percent chance of winning. I think that’s a dramatic difference considering the home team wins a bit less than four percentage points more often than 50-50.
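The two figures are quick to check. This sketch assumes, as above, fifteen independent games and the 53.7 percent home-win rate taken from the Betfirm.com estimate; the variable names are mine.

```python
GAMES = 15

# No home-field advantage: each home team wins with probability one-half.
p_even = 0.5 ** GAMES

# Home team wins 53.7 percent of the time (the 2010-2014 estimate quoted above).
p_home = 0.537 ** GAMES

print(1 / p_even)        # 32768.0
print(1 / p_home)        # roughly 11,230
print(p_home / p_even)   # how many times likelier the sweep becomes
```

A small edge per game compounds quickly when it has to hold fifteen times in a row.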

The follow-up question and one that’s good for a probability homework would be to work out what are the odds that we’d see one day with fifteen home-team wins in the mere eighteen years since it became possible.
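One way to set up that homework is sketched here. It needs a count of how many days actually had a full fifteen-game slate, which I don't have. The `FULL_SLATE_DAYS_PER_SEASON` figure below is a placeholder I made up, so treat the printed number as an illustration of the method and not as an answer.

```python
def chance_of_at_least_one(p_single_day, days):
    """Chance an event of probability p_single_day happens on at least one of `days` independent days."""
    # Easier to find the chance it never happens, then subtract from one.
    return 1 - (1 - p_single_day) ** days

p_sweep = 0.537 ** 15             # all fifteen home teams win on a given day

SEASONS = 18                      # "the mere eighteen years" from the text
FULL_SLATE_DAYS_PER_SEASON = 60   # hypothetical placeholder, not a researched count

days = SEASONS * FULL_SLATE_DAYS_PER_SEASON
print(chance_of_at_least_one(p_sweep, days))
```

Swapping in a real count of full-slate days, or a different home-win rate, only means changing the constants.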

## Reading the Comics, April 6, 2015: Little Infinite Edition

As I warned, there were a lot of mathematically-themed comic strips the last week, and here I can at least get us through the start of April. This doesn’t include the strips that ran today, the 7th of April by my calendar, because I have to get some serious-looking men to look at my car and I just know they’re going to disapprove of what my CV joint covers look like, even though I’ve done nothing to them. But I won’t be reading most of today’s comic strips until after that’s done, and so commenting on them later.

Mark Anderson’s Andertoons (April 3) makes its traditional appearance in my roundup, in this case with a business-type guy declaring infinity to be “the loophole of all loopholes!” I think that’s overstating things a fair bit, but strange and very counter-intuitive things do happen when you try to work out a problem in which infinities turn up. For example: in ordinary arithmetic, the order in which you add together a bunch of real numbers makes no difference. If you want to add together infinitely many real numbers, though, it is possible to have them add to different numbers depending on what order you add them in. Most unsettlingly, it’s possible to have infinitely many real numbers add up to literally any real number you like, depending on the order in which you add them. And then things get really weird.
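The reordering trick can be watched in action. The sketch below uses the alternating harmonic series 1 - 1/2 + 1/3 - 1/4 + …, which is my choice of example and not one the strip names, and a greedy rule: add unused positive terms while the running total is at or below the target, otherwise add unused negative terms. Every term still gets used exactly once, only the order changes.

```python
def rearranged_partial_sum(target, n_terms):
    """Partial sum of the alternating harmonic series, reordered to chase `target`."""
    next_odd = 1    # denominator of the next unused positive term, +1/1, +1/3, ...
    next_even = 2   # denominator of the next unused negative term, -1/2, -1/4, ...
    total = 0.0
    for _ in range(n_terms):
        if total <= target:
            total += 1 / next_odd
            next_odd += 2
        else:
            total -= 1 / next_even
            next_even += 2
    return total

# Same numbers, different orders, different sums.
for target in (2.0, 0.5, -1.0):
    print(target, rearranged_partial_sum(target, 100_000))
```

This works because the positive terms alone, and the negative terms alone, each add up without bound, while the individual terms shrink toward zero.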

Keith Tutt and Daniel Saunders’s Lard’s World Peace Tips (April 3) is the other strip in this roundup to at least name-drop infinity. I confess I don’t see how “being infinite” would help in bringing about world peace, but I suppose being finite hasn’t managed the trick just yet so we might want to think outside the box.

## Gaussian distribution of NBA scores

The Prior Probability blog points out an interesting graph, showing the most common scores in basketball teams, based on the final scores of every NBA game. It’s actually got three sets of data there, one for all basketball games, one for games this decade, and one for basketball games of the 1950s. Unsurprisingly there’s many more results for this decade — the seasons are longer, and there are thirty teams in the league today, as opposed to eight or nine in 1954. (The Baltimore Bullets played fourteen games before folding, and the games were expunged from the record. The league dropped from eleven teams in 1950 to eight for 1954-1959.)

I’m fascinated by this just as a depiction of probability distributions: any team can, in principle, reach most any non-negative score in a game, but it’s most likely to be around 102. Surely there’s a maximum possible score, based on the fact a team has to get the ball and get into position before it can score; I’m a little curious what that would be.

Prior Probability itself links to another blog which reviews the distribution of scores for other major sports, and the interesting result of what the most common basketball score has been, per decade. It’s increased from the 1940s and 1950s, but it’s considerably down from the 1960s.

You can see the most common scores in such sports as basketball, football, and baseball in Philip Bump’s fun Wonkblog post here. Mr Bump writes: “Each sport follows a rough bell curve … Teams that regularly fall on the left side of that curve do poorly. Teams that land on the right side do well.” Read more about Gaussian distributions here.

## Reading The Comics, October 20, 2014: No Images This Edition

Since I started including Comics Kingdom strips in my roundups of mathematically-themed strips I’ve been including images of those, because I’m none too confident that Comics Kingdom’s pages are accessible to normal readers after some time has passed. Gocomics.com has — as far as I’m aware, and as far as anyone has told me — no such problems, so I haven’t bothered doing more than linking to them. So this is the first roundup in a long while I remember that has only Gocomics strips, with nothing from Comics Kingdom. It’s also the first roundup for which I’m fairly sure I’ve done one of these strips before.

Guy Endore-Kaiser and Rodd Perry and Dan Thompson’s Brevity (October 15, but a rerun) is an entry in the anthropomorphic-numbers line of mathematics comics, and I believe it’s one that I’ve already mentioned in the past. This particular strip is a rerun; in modern times the apparently indefatigable Dan Thompson has added this strip to the estimated fourteen he does by himself. In any event it stands out in the anthropomorphic-numbers subgenre for featuring non-integers that aren’t pi.

Ralph Hagen’s The Barn (October 16) ponders how aliens might communicate with Earthlings, and as with pretty much everyone who’s considered the question, mathematics is supposed to be the way they’d do it. It’s easy to see why mathematics is plausible as a universal language: a mathematical truth should be true anywhere that deductive logic holds, and it’s difficult to conceive of a universe existing in which it could not hold true. I have somewhere around here a mention of a late-19th-century proposal to try contacting Martians by planting trees in Siberia which, in bloom, would show a proof of the Pythagorean theorem.

In modern times we tend to think of contact with aliens as more likely being done by radio (or at least some modulated-light signal), which makes a signal like a series of pulses counting out prime numbers sound likely. It’s easy to see why prime numbers should be interesting too: any species that has understood multiplication has almost certainly noticed them, and you can send enough prime numbers in a short time to make clear that there is a deliberate signal being sent. For comparison, perfect numbers — whose proper divisors add up to the original number — are also almost surely noticed by any species that understands multiplication, but the first several of those are 6, 28, 496, and 8,128; by the time 8,128 pulses of anything have been sent the whole point of the message has been lost.

And yet finding prime numbers is still not really quite universal. You or I might see prime numbers as key, but why not triangular numbers, like the sequence 1, 3, 6, 10, 15? Why not square or cube numbers? The only good answer is, well, we have to pick something, so to start communicating let’s hope we find something that everyone will be able to recognize. But there’s an arbitrariness that can’t be fully shed from the process.

John Zakour and Scott Roberts’s Maria’s Day (October 17) reminds us of the value of having a tutor for mathematics problems — if you’re having trouble in class, go to one — and of paying them appropriately.

Steve Melcher’s That Is Priceless (October 17) puts comic captions to classic paintings and so presented Jusepe de Ribera’s 1630 Euclid, Letting Me Copy His Math Homework. I confess I have a broad-based ignorance of art history, but if I’m using search engines correctly the correct title was actually … Euclid. Hm. It seems like Melcher usually has to work harder at these things. Well, I admit it doesn’t quite match my mental picture of Euclid, but that would have mostly involved some guy in a toga wielding a compass. Ribera seems to have had a series of Greek Mathematician pictures from about 1630, including Pythagoras and Archimedes, with similar poses that I’ll take as stylized representations of the great thinkers.

Mark Anderson’s Andertoons (October 18) plays around with statistical ideas that include expectation values and the gambler’s fallacy, but it’s a good puzzle: has the doctor done the procedure hundreds of times without a problem because he’s better than average at it, or because he’s been lucky? For an alternate formulation, baseball offers a fine question: Ted Williams is the most recent Major League Baseball player to have a season batting average over .400, getting a hit in at least two-fifths of his at-bats over the course of the season. Was he actually good enough to get a hit that often, though, or did he just get lucky? Consider that a .250 hitter — with a 25 percent chance of a hit at any at-bat — could quite plausibly get hits in three out of his four chances in one game, or for that matter even two or three games. Why not a whole season?

Well, because at some point it becomes ridiculous, rather the way we would suspect something was up if a tossed coin came up tails thirty times in a row. Yes, possibly it’s just luck, but there’s good reason to suspect this coin doesn’t have a fifty percent chance of coming up heads, or that the hitter is likely to do better than one hit for every four at-bats, or, to the original comic, that the doctor is just better at getting through the procedure without complications.
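How ridiculous it gets can be put in numbers with a binomial tail. This is a rough sketch assuming every at-bat is an independent 25-percent chance. The 450 at-bats in a season is a round figure I picked for illustration, not Williams’s actual total.

```python
import math

def prob_at_least(n, k, p):
    """Chance of at least k successes in n independent tries, each succeeding with probability p."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# A .250 hitter getting three or more hits in four at-bats: unusual, not shocking.
print(prob_at_least(4, 3, 0.25))      # 0.05078125, about one game in twenty

# The same hitter batting .400 or better over a hypothetical 450 at-bat season.
print(prob_at_least(450, 180, 0.25))  # vanishingly small

# And a fair coin coming up tails thirty times in a row.
print(0.5 ** 30)                      # a bit under one in a billion
```

One lucky game is easy to believe. A lucky season is not, which is the reason to credit the hitter, or the doctor, with skill.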

Ryan North’s quasi-clip-art Dinosaur Comics (October 20) thrilled the part of me that secretly wanted to study language instead by discussing “light verb constructions”, a grammatical touch I hadn’t paid attention to before. The strip is dubbed “Compressed Thesis Comics”, though, from the notion that the Tyrannosaurus Rex is inspired to study “computationally” what forms of light verb construction are more and what are less acceptable. The impulse is an almost perfect thesis project, really: notice a thing and wonder how to quantify it. A good piece of this thesis would probably be just working out how to measure acceptability of a particular verb construction. I imagine the linguistics community has a rough idea how to measure these or else T Rex is taking on way too big a project for a thesis, since that’d be an obvious point for the thesis to crash against.

Well, I still like the punch line.

## How weird is it that three pairs of same-market teams made the playoffs this year?

The “God Plays Dice” blog has a nice little baseball-themed post built on the coincidence that a number of the teams in the postseason this year are from the same or at least neighboring markets — two from Los Angeles, a pair from San Francisco and Oakland, and another pair from Washington and Baltimore. It can’t be likely that this should happen much, but, how unlikely is it? Michael Lugo works it out in what’s probably the easiest way to do it.

The Major League Baseball postseason is starting just as I write this.

From the National League, we have Washington, St. Louis, Pittsburgh, Los Angeles, and San Francisco.
From the American League, we have Baltimore, Kansas City, Detroit, Los Angeles (Anaheim), and Oakland.

These match up pretty well geographically, and this hasn’t gone unnoticed: see for example the New York Times blog post “the 2014 MLB playoffs have a neighborly feel” (apologies for not providing a link; I’m out of NYT views for the month, and I saw this back when I wasn’t); a couple mathematically inclined Facebook friends of mine have mentioned it as well.

In particular there are three pairs of “same-market” teams in here: Washington/Baltimore, Los Angeles/Los Angeles, San Francisco/Oakland. How likely is that?

(People have pointed out St. Louis/Kansas City as being both in Missouri, but that’s a bit more of a judgment call, and St. Louis…

## Reading the Comics, June 11, 2014: Unsound Edition

I can tell the school year is getting near the end: it took a full week to get enough mathematics-themed comic strips to put together a useful bundle of them this time. I don’t know what I’m going to do this summer when there’s maybe two comic strips I can talk about per week and I have to go finding my own initiative to write about things.

Jef Mallett’s Frazz (June 6) is a pun strip, yeah, although it’s one that’s more or less legitimate for a word problem. The reason I have to say “more or less” is that it’s not clear to me whether, per Caulfield’s specification, the amount of ore lost across each Great Lake is three percent of the original cargo or three percent of the remaining cargo. But writing a word problem so that there’s only the one correct solution is a skill that needs development no less than solving word problems is, and probably if we imagine Caulfield grading he’d realize there was an ambiguity when a substantial number of the papers make the opposite assumption to what he’d had in his mind.
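The two readings do give different answers, which a short sketch can show. The starting cargo and the number of lakes crossed are numbers I made up for illustration; they are not the figures from the strip.

```python
def ore_left_percent_of_original(cargo, lakes, loss_rate=0.03):
    """Each lake costs the same fixed amount: loss_rate times the ORIGINAL cargo."""
    return cargo * (1 - loss_rate * lakes)

def ore_left_percent_of_remaining(cargo, lakes, loss_rate=0.03):
    """Each lake costs loss_rate times whatever is STILL aboard, so losses shrink."""
    return cargo * (1 - loss_rate) ** lakes

CARGO = 1000.0   # hypothetical tons of ore
LAKES = 4        # hypothetical number of lakes crossed

print(ore_left_percent_of_original(CARGO, LAKES))    # about 880 tons
print(ore_left_percent_of_remaining(CARGO, LAKES))   # about 885.3 tons
```

The answers are close but not equal, so a grader would see two distinct clusters of results on the papers.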

Ruben Bolling’s Tom the Dancing Bug (June 6, and I believe it’s a rerun) steps into some of the philosophically heady waters that one gets into when looking seriously at probability, and that get outright silly when you add omniscience to the mix. The Supreme Planner has worked out what he concludes to be a plan certain of success, but: does that actually mean one will succeed? Even if we assume that the Supreme Planner is able to successfully know and account for every factor which might affect his success — well, for a less criminal plan, consider: one is certain to toss heads at least once, if one flips a fair coin infinitely many times. And yet it would not actually be impossible to flip a fair coin infinitely many times and have it turn up tails every time. That something can have a probability of 1 (or 100%) of happening and nevertheless not happen — or equivalently, that something can have a probability of 0 (0%) of happening and still happen — is exactly analogous to how a concept can be true almost everywhere, that is, it can be true with exceptions that in some sense don’t matter. Ruben Bolling tosses in the troublesome notion of the multiverse, the idea that everything which might conceivably happen does happen “somewhere”, to make these impossible events all the more imminent. I’m impressed Bolling is able to touch on so much, with a taste of how unsettling the implications are, in a dozen panels and stay funny about it.

Bud Grace’s The Piranha Club (June 9) gives us Enos cheating with perfectly appropriate formulas for a mathematics exam. I’m kind of surprised the Pythagorean Theorem would rate cheat-sheet knowledge, actually, as I thought that had reached the popular culture at least as well as Einstein’s E = mc² had, although perhaps it’s reached it much as Einstein’s has, as a charming set of sounds without any particular meaning behind them. I admit my tendency in giving exams, too, has been to allow students to bring their own sheet of notes, or even to have open-book exams, on the grounds that I don’t really care whether they’ve memorized formulas and am more interested in whether they can find and apply the relevant formulas. But that doesn’t make me right; I agree there’s value in being able to identify what the important parts of the course are and to remember them well, and even more value in being able to figure out the area of a triangle or a trapezoid from thinking hard about the subject on your own.

Jason Poland’s Robbie and Bobbie (June 10) is looking for philosophy and mathematics majors, so, here’s hoping it’s found a couple more. The joke here is about the classification of logical arguments. A valid argument is one in which the conclusion does indeed follow from the premises according to the rules of deductive logic. A sound argument is a valid argument in which the premises are also true. The reason these aren’t exactly the same thing is that whether a conclusion follows from the premise depends on the structure of the argument; the content is irrelevant. This means we can do a great deal of work, reasoning out things which follow if we suppose that proposition A being true implies B is false, or that we know B and C cannot both be false, or whatnot. But this means we may fill in, Mad-Libs-style, whatever we like to those propositions and come away with some funny-sounding arguments.

So this is how we can have an argument that’s valid yet not sound. It is valid to say that, if baseball is a form of band organ always found in amusement parks, and if amusement parks are always found in the cubby-hole under my bathroom sink, then, baseball is always found in the cubby-hole under my bathroom sink. But as none of the premises going into that argument are true, the argument’s not sound, which is how you can have anything be “valid but not sound”. Identifying arguments that are valid but not sound is good for a couple questions on your logic exam, so, be ready for that.

John Hambrock’s The Brilliant Mind of Edison Lee (June 11) has the brilliant yet annoying Edison trying to prove his genius by calculating precisely where the baseball will drop. This is a legitimate mathematics/physics problem, of course: one could argue that the modern history of mathematical physics comes from the study of falling balls, albeit more of cannonballs than baseballs. If there’s no air resistance and if gravity is uniform, the problem is easy and you get to show off your knowledge of parabolas. If gravity isn’t uniform, you have to show off your knowledge of ellipses. Either way, you can get into some fine differential equations work, and that work gets all the more impressive if you do have to pay attention to the fact that a ball moving through the air loses some of its speed to the air molecules. That said, it’s amazing that people are able to, in effect, work out approximate solutions to “where is this ball going” in their heads, not to mention to act on it and get to the roughly correct spot, at least when they’ve had some practice.
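The easy version of Edison’s problem fits in a few lines. This sketch assumes no air resistance, uniform gravity, and a ball launched from ground level, none of which a real fly ball obeys. The launch speed and angles are arbitrary illustration values.

```python
import math

def landing_distance(speed, angle_degrees, gravity=9.81):
    """Horizontal distance (metres) a ball travels before returning to launch height."""
    angle = math.radians(angle_degrees)
    # Vertical motion fixes how long the ball is aloft...
    time_aloft = 2 * speed * math.sin(angle) / gravity
    # ...and the horizontal speed never changes without air resistance.
    return speed * math.cos(angle) * time_aloft

# Launch angles that add up to 90 degrees land in the same spot.
print(landing_distance(40.0, 30.0))
print(landing_distance(40.0, 60.0))
```

Adding drag turns this into a differential equation with no tidy closed form, which is where the numerical work starts.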

## Reading the Comics, May 13, 2014: Good Class Problems Edition

Someone in Comic Strip Master Command must be readying for the end of term, as there’s been enough comic strips mentioning mathematics themes to justify another of these entries, and that’s before I even start reading Wednesday’s comics. I can’t say that there seem to be any overarching themes in the past week’s grab-bag of strips, but, there are a bunch of pretty good problems that would fit well in a mathematics class here.

Darrin Bell’s Candorville (May 6) comes back around to the default application of probability, questions in coin-flipping. You could build a good swath of a probability course just from the questions the strip implies: how many coins have to come up heads before it becomes reasonable to suspect that something funny is going on? Two is obviously too few; two thousand is likely too many. But improbable things do happen, without it signifying anything. So what’s the risk of supposing something’s up when it isn’t? What’s the risk of dismissing the hints that something is happening?
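The first of those questions has a simple starting point if the coin is assumed fair and the flips independent. The sketch below asks how long a run of heads has to be before a fair coin would produce it less often than some tolerated false-alarm rate. The thresholds are arbitrary choices of mine.

```python
def chance_of_all_heads(flips):
    """Chance a fair coin comes up heads on every one of `flips` tosses."""
    return 0.5 ** flips

def flips_before_suspicion(false_alarm_rate):
    """Shortest all-heads run that a fair coin produces less often than false_alarm_rate."""
    flips = 1
    while chance_of_all_heads(flips) >= false_alarm_rate:
        flips += 1
    return flips

print(flips_before_suspicion(0.05))    # 5
print(flips_before_suspicion(0.001))   # 10
```

The false-alarm rate is exactly the risk of supposing something’s up when it isn’t. Lowering it demands longer runs, and so raises the risk of dismissing a coin that really is rigged.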

Mark Anderson’s Andertoons (May 8) is another entry in the wiseacre schoolchild genre (I wonder if I’ve actually been consistent in describing this kind of comic, but, you know what I mean) and suggesting that arithmetic just be done on the computer. I’m sympathetic, however much fun it is doing arithmetic by hand.

Justin Boyd’s Invisible Bread (May 9) is honestly a marginal inclusion here, but it does show a mathematics problem that’s correctly formed and would reasonably be included on a precalculus or calculus class’s worksheets. It is a problem that’s a no-brainer, really, but that fits the comic’s theme of poorly functioning.

Steve Moore’s In The Bleachers (May 12) uses baseball scores and the start of a series. A series, at least once you’re into calculus, is the sum of a sequence of numbers, and if there’s only finitely many of them, here, there’s not much that’s interesting to say. Each sequence of numbers has some sum and that’s it. But if you have an infinite series — well, there, all sorts of amazing things become possible (or at least logically justified), including integral calculus and numerical computing. The series from the panel, if carried out, would come to a pair of infinitely large sums — this is called divergence, and is why your mathematician friends on Facebook or Twitter are passing around that movie poster with a math formula for a divergent series on it — and you can probably get a fair argument going about whether the sum of all the even numbers would be equal to the sum of all the odd numbers. (My advice: if pressed to give an answer, point to the other side of the room, yell, “Look, a big, distracting thing!” and run off.)

Samson’s Dark Side Of The Horse (May 13) is something akin to a pun, playing as it does on the difference between a number and a numeral and shifting between the ways we might talk about “three”. Also, I notice for the first time that apparently the little bird sometimes seen in the comic is named “Sine”, which is probably why it flies in such a wavy pattern. I don’t know how I’d missed that before.

Rick Detorie’s One Big Happy (May 13, rerun) is also a strip that plays on the difference between a number and its representation as a numeral, really. Come to think of it, it’s a bit surprising that in Arabic numerals there aren’t any relationships between the representations for numbers; one could easily imagine a system in which, say, the symbol for “four” were a pair of whatever represents “two”. In A History Of Mathematical Notations Florian Cajori notes that there really isn’t any system behind why a particular numeral has any particular shape, and he takes a section (Section 96 in Book 1) to get engagingly catty about people who do. I’d like to quote it because it’s appealing, in that way:

A problem as fascinating as the puzzle of the origin of language relates to the evolution of the forms of our numerals. Proceeding on the tacit assumption that each of our numerals contains within itself, as a skeleton so to speak, as many dots, strokes, or angles as it represents units, imaginative writers of different countries and ages have advanced hypotheses as to their origin. Nor did these writers feel that they were indulging simply in pleasing pastimes or merely contributing to mathematical recreations. With perhaps only one exception, they were as convinced of the correctness of their explanations as are circle-squarers of the soundness of their quadratures.

Cajori goes on to describe attempts to rationalize the Arabic numerals as “merely … entertaining illustrations of the operation of a pseudo-scientific imagination, uncontrolled by all the known facts”, which gives some idea why Cajori’s engaging reading for seven hundred pages about stuff like where the plus sign comes from.

## Reading the Comics, May 4, 2014: Summing the Series Edition

Before I get to today’s round of mathematics comics, a legend-or-joke, traditionally starring John Von Neumann as the mathematician.

The recreational word problem goes like this: two bicyclists, twenty miles apart, are pedaling toward each other, each at a steady ten miles an hour. A fly takes off from the first bicyclist, heading straight for the second at fifteen miles per hour (ground speed); when it touches the second bicyclist it instantly turns around and returns to the first at again fifteen miles per hour, at which point it turns around again and heads for the second, and back to the first, and so on. By the time the bicyclists reach one another, the fly — having made, incidentally, infinitely many trips between them — has travelled some distance. What is it?

And this is not a hard problem to set up, inherently: each leg of the fly’s trip is going to be a certain ratio of the previous leg, which means that formulas for a geometric infinite series can be used. You just need to work out what the lengths of those legs are to start with, and what that ratio is, and then work out the formula in your head. This is a bit tedious and people given the problem may need some time and a couple sheets of paper to make it work.

Von Neumann, who was an expert in pretty much every field of mathematics and a good number of those in physics, allegedly heard the problem and immediately answered: 15 miles! And the problem-giver said, oh, he saw the trick. (Since the bicyclists will spend one hour pedaling before meeting, and the fly is travelling fifteen miles per hour all that time, it travels a total of fifteen miles. Most people don’t think of that, and try to sum the infinite series instead.) And von Neumann said, “What trick? All I did was sum the infinite series.”
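Both routes to the answer are easy to check by machine. This sketch uses the numbers from the problem as stated here. With them the first leg is twelve miles and each later leg is one-fifth of the one before, so the series is 12 + 12/5 + 12/25 + … = 12 / (1 - 1/5) = 15. The function names are my own.

```python
def fly_distance_by_series(gap=20.0, bike_speed=10.0, fly_speed=15.0, legs=60):
    """Add up the fly's trip leg by leg, the hard way."""
    total = 0.0
    for _ in range(legs):
        # The fly and the bicyclist it is heading for close at their combined speed.
        leg_time = gap / (fly_speed + bike_speed)
        total += fly_speed * leg_time
        # Meanwhile both bicyclists keep pedaling, shrinking the gap.
        gap -= 2 * bike_speed * leg_time
    return total

def fly_distance_by_trick(gap=20.0, bike_speed=10.0, fly_speed=15.0):
    """The shortcut: the fly's speed times the time until the bicyclists meet."""
    return fly_speed * gap / (2 * bike_speed)

print(fly_distance_by_series())   # creeps up on 15
print(fly_distance_by_trick())    # 15.0
```

Sixty legs stand in for infinitely many. By then the remaining gap is far below anything a floating-point number would notice.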

Did this charming story of a mathematician being all mathematicky happen? Wikipedia’s description of the event credits Paul Halmos’s recounting of Nicholas Metropolis’s recounting of the story, which as a source seems only marginally better than “I heard it on the Internet somewhere”. (Other versions of the story give different distances for the bicyclists and different speeds for the fly.) But it’s a wonderful legend and can be linked to a Herb and Jamaal comic strip from this past week.

Paul Trap’s Thatababy (April 29) has the baby “blame entropy”, which fits as a mathematical concept, it seems to me. Entropy as a concept was developed in the mid-19th century as a thermodynamical concept, and it’s one of those rare mathematical constructs which becomes a superstar of pop culture. It’s become something of a fancy word for disorder or chaos or just plain messes, and the notion that the entropy of a system is ever-increasing is probably the only bit of statistical mechanics an average person can be expected to know. (And the situation is more complicated than that; for example, it’s just more probable that the entropy is increasing in time.)

Entropy is a great concept, though, as besides capturing very well an idea that’s almost universally present, it also turns out to be meaningful in surprising new places. The most powerful of those is in information theory, which is just what the label suggests; the field grew out of the problem of making messages understandable even though the telegraph or telephone lines or radio beams on which they were sent would garble the messages some, even if people sent or received the messages perfectly, which they would not. The most captivating (to my mind) new place is in black holes: the event horizon of a black hole has a surface area which is (proportional to) its entropy, and consideration of such things as the conservation of energy and the link between entropy and surface area allow one to understand something of the way black holes ought to interact with matter and with one another, without the mathematics involved being nearly as complicated as I might have imagined a priori.

Meanwhile, Lincoln Peirce’s Big Nate (April 30) mentions how Nate’s Earned Run Average has changed over the course of two innings. Baseball is maybe the archetypical record-keeping statistics-driven sport; Alan Schwarz’s The Numbers Game: Baseball’s Lifelong Fascination With Statistics notes that the keeping of some statistical records was required at least as far back as 1837 (in the Constitution of the Olympic Ball Club of Philadelphia). Earned runs — along with nearly every other baseball statistic the non-stathead has heard of other than batting averages — were developed as a concept by the baseball evangelist and reporter Henry Chadwick, who presented them from 1867 as an attempt to measure the effectiveness of batting and fielding. (The idea of the pitcher as an active player, as opposed to a convenient way to get the ball into play, was still developing.) But — and isn’t this typical? — he would come to oppose the earned run average as a measure of pitching performance, because things that were really outside the pitcher’s control, such as stolen bases, contributed to it.

It seems to me there must be some connection between the record-keeping of baseball and the development of statistics as a concept in the 19th century. Granted, the 19th century was a century of statistics, starting with nation-states measuring their populations, their demographics, their economies, and projecting what this would imply for future needs; and then with science, as statistical mechanics found it possible to quite well understand the behavior of millions of particles despite it being impossible to perfectly understand four; and in business, as manufacturing and money were made less individual and more standard. There was plenty to drive the field without an amusing game, but I can’t help thinking of sports as a gateway into the field.

The Disney Company’s Donald Duck (May 2, rerun) suggests that Ludwig von Drake is continuing to have problems with his computing machine. Indeed, he’s apparently still having the same problem. I’d like to know when these strips originally ran, but the host site of creators.com doesn’t give any hint.

Stephen Bentley’s Herb and Jamaal (May 3) has the kid whose name I don’t really know fret how he spent “so much time” on an equation which would’ve been easy if he’d used “common sense” instead. But that’s not a rare phenomenon mathematically: it’s quite possible to set up an equation, or a process, or a something which does indeed inevitably get you to a correct answer but which demands a lot of time and effort to finish, when a stroke of insight or recasting of the problem would remove that effort, as in the von Neumann legend. The commenter Dartpaw86, on the Comics Curmudgeon site, brought up another excellent example, from Katie Tiedrich’s Awkward Zombie web comic. (I didn’t use the insight shown in the comic to solve it, but I’m happy to say, I did get it right without going to pages of calculations, whether or not you believe me.)

However, having insights is hard. You can learn many of the tricks people use for different problems, but, say, no amount of studying the Awkward Zombie puzzle about a square inscribed in a circle inscribed in a square inscribed in a circle inscribed in a square will help you in working out the area left behind when a cylindrical tube is drilled out of a sphere. Setting up an approach that will, given enough work, get you a correct solution is worth knowing how to do, especially if you can give the boring part of actually doing the calculations to a computer, which is indefatigable and, certain duck-based operating systems aside, pretty reliable. That doesn’t mean you don’t feel dumb for missing the recasting.

Rick DeTorie’s One Big Happy (May 3) puns a little on the meaning of whole numbers. It might sound a little silly to have a name for only a handful of numbers, but there’s no reason not to if the group is interesting enough. It’s possible (although I’d be surprised if it were the case) that there are only 47 Mersenne primes (a prime number, such as 7 or 31, that is one less than a whole power of 2), and we have the concept of the “odd perfect number”, when there might well not be any such thing.
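To make the Mersenne prime definition concrete, here is a minimal sketch in Python that finds the small ones. It uses the Lucas-Lehmer test, the standard check for whether 2^p - 1 is prime when p is itself prime. The function names are my own, and the trial-division helper is only sensible for the small exponents used here.

```python
def is_prime(n):
    """Plain trial division; fine for the small exponents tested here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def lucas_lehmer(p):
    """True if 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True  # 2**2 - 1 = 3; the recurrence below only applies to odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_exponents(limit):
    """Exponents p <= limit for which 2**p - 1 is a Mersenne prime."""
    return [p for p in range(2, limit + 1) if is_prime(p) and lucas_lehmer(p)]


print(mersenne_exponents(31))            # exponents
print([2**p - 1 for p in mersenne_exponents(13)])  # the primes themselves
```

The exponents 3 and 5 give the 7 and 31 mentioned above. The list thins out quickly as the exponent grows, which is part of why nobody knows whether it ever stops.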

## Reading the Comics, September 21, 2013

It must have been the summer vacation making comic strip artists take time off from mathematics-themed jokes: there’s a fresh batch of them a mere ten days after my last roundup.

John Zakour and Scott Roberts’s Maria’s Day (September 12) tells the basic “not understanding fractions” joke. I suspect that Zakour and Roberts (who’re pretty well-steeped in nerd culture, as their panel strip Working Daze shows) were summoning one of those warmly familiar old jokes. Well, Sidney Harris got away with the same punch line; why not them?

Brett Koth’s Diamond Lil (September 14) also mentions fractions, but as an example of one of those inexplicably complicated mathematics things that’ll haunt you rather than be useful or interesting or even understandable. I choose not to be offended by this insult of my preferred profession and won’t even point out that Koth totally redrew the panel three times over so it’s not a static shot of immobile talking heads.

## Distribution of the batting order slot that ends a baseball game

The God Plays Dice blog has a nice piece attempting to model a baseball question. Baseball is wonderful for all kinds of mathematics questions, partly because the game has since its creation kept data about the plays made, partly because the game breaks its action neatly into discrete units with well-defined outcomes.

Here, Dr Michael Lugo ponders whether games are more likely to end at any particular spot in the batting order. Lugo points out that certainly we could just count where games actually end, since baseball records are complete enough to make an estimate by that route possible. But that’s tedious, and it’s easier to work out a simple model and see what that suggests. Lugo also uses the number of perfect games as a test of whether the model is remotely plausible, and a test like this (a simple check of whether the scheme could possibly tell us something meaningful) is worth doing whenever one builds a model of something interesting.

Tom Tango, while writing about lineup construction in baseball, pointed out that batters batting closer to the top of the batting order have a greater chance of setting records that are based on counting something – for example, Chris Davis’ chase for 62 home runs. (It’s interesting that enough people see Roger Maris’ 61 as the “real” record that 62 is a big deal.) He observes that over a 162-game season, each slot further down in the batting order (of 9) means 18 fewer plate appearances.

Implicitly this means that every slot in the batting order is equally likely to end the game — that is, that the number of plate appearances for a team in a game, mod 9, is uniformly distributed over {0, 1, …, 8}.

Can we check this? There are two ways to check it:

• 1. find the number of plate appearances in every game…

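The simple-model route can be sketched in a few lines of Python. This is my own toy version and not necessarily the model Lugo builds. It assumes every plate appearance independently ends in an out unless the batter reaches base, that a team bats until it has made 27 outs, and that the on-base probability of 0.32 is a made-up round figure. It ignores double plays, runners caught on the bases, extra innings, and home teams that skip the bottom of the ninth.

```python
import random
from collections import Counter


def plate_appearances(rng, on_base=0.32, outs_needed=27):
    """Count one team's plate appearances in a game under the toy model."""
    outs = 0
    appearances = 0
    while outs < outs_needed:
        appearances += 1
        if rng.random() >= on_base:  # batter fails to reach base: one out
            outs += 1
    return appearances


def last_slot_distribution(games=50000, seed=0, on_base=0.32):
    """Estimate the distribution of plate appearances mod 9.

    A remainder of 1 means the leadoff slot batted last; 0 means the ninth slot did.
    """
    rng = random.Random(seed)
    counts = Counter(plate_appearances(rng, on_base) % 9 for _ in range(games))
    return {r: counts[r] / games for r in range(9)}


def perfect_game_probability(on_base=0.32, outs_needed=27):
    """Chance, under the same model, that a team sends up 27 batters and all make outs."""
    return (1 - on_base) ** outs_needed


if __name__ == "__main__":
    for remainder, share in last_slot_distribution().items():
        print(remainder, round(share, 3))
    print(perfect_game_probability())
```

Each share can be compared against the 1/9 that a uniform distribution would give, and the perfect-game figure against how often perfect games have really happened, which is the plausibility check described above.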

## Reblog: Lawler’s Log

I don’t intend to transform my writings here into a low-key sports mathematics blog. I just happen to have run across a couple of interesting problems and, after all, sports do offer a lot of neat questions about probability and statistics.

benperreira here makes mention of “Lawler’s Law”, something I had not previously noticed. The “Law” is the observation that the first basketball team to make it to 100 points wins the game just about 90 percent of the time. It was apparently first observed by Los Angeles Clippers announcer Ralph Lawler and has been supported by a review of the statistics of NBA teams over the decades.

benperreira is unimpressed with the law, regarding it as just an unduly wise-sounding restatement of the principle that a team that scores more than the league average number of points per game will tend to have a winning record. I’m inclined to agree the Law doesn’t seem to amount to much, though I was caught by the implication that the team which lets the other get to 100 points first still pulls out a victory one time out of ten.

To underscore his point benperreira includes a diagram purporting to show the likelihood of victory against points scored, although it’s pretty obviously meant to be a quick joke extrapolating from the data that both teams start with a 50 percent chance of victory and zero points, and apparently 100 points gives a nearly 90 percent chance of victory. I am curious about a more precise chart, one showing how often the first team to make 10, or 25, or 50, or so points goes on to victory, but I certainly haven’t got time to compile that data.

Well, perhaps I do, but my reading in baseball history and my brushes with people with SABR connections make it very clear I have every possible risk factor for getting lost in the world of sports statistics, so I want to stay far from the meat of actual games.

Still, there are good probability questions to be asked about things like how big a lead is effectively unbeatable, and I’ll leave this post and reblog as a way to nag myself into maybe thinking about it later.
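Short of compiling real data, a toy simulation can at least show what a chart of "first team to reach N points" against eventual victory might look like. The following is a minimal sketch in Python, and everything in it is an assumption of mine, not NBA data: two evenly matched teams, 100 possessions apiece, and made-up chances of scoring two or three points on any possession. Tied games are thrown out, since I don't model overtime.

```python
import random

THRESHOLDS = (10, 25, 50, 100)


def play_game(rng, possessions=100, p_two=0.38, p_three=0.12):
    """Simulate one game; return (winner, first_to).

    winner is 0 or 1, or None for a tie. first_to maps each threshold
    to the team that reached it first, if either team reached it at all.
    """
    scores = [0, 0]
    first_to = {}
    for _ in range(possessions):
        for team in (0, 1):  # team 0 always has the ball first in each pair
            roll = rng.random()
            if roll < p_three:
                scores[team] += 3
            elif roll < p_three + p_two:
                scores[team] += 2
            for t in THRESHOLDS:
                if t not in first_to and scores[team] >= t:
                    first_to[t] = team
    if scores[0] == scores[1]:
        return None, first_to
    return (0 if scores[0] > scores[1] else 1), first_to


def first_to_win_rates(games=5000, seed=1):
    """Fraction of decided games won by the first team to reach each threshold."""
    rng = random.Random(seed)
    reached = {t: 0 for t in THRESHOLDS}
    won = {t: 0 for t in THRESHOLDS}
    for _ in range(games):
        winner, first_to = play_game(rng)
        if winner is None:
            continue
        for t, team in first_to.items():
            reached[t] += 1
            if team == winner:
                won[t] += 1
    return {t: won[t] / reached[t] for t in THRESHOLDS if reached[t]}


if __name__ == "__main__":
    for threshold, rate in first_to_win_rates().items():
        print(threshold, round(rate, 3))
```

Whatever numbers come out say more about the scoring probabilities I invented than about basketball, but the shape of the exercise is the same one a review of real box scores would follow.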

Lawler’s Law states that the NBA team that reaches 100 points first will win the game. It is based on Lawler’s observations and confirmed by looking back at NBA statistics that show it is true over 90% of the time.

Its brilliance lies in its uselessness. Like NyQuil helps us sleep but does little to help our immune systems make us well, Lawler’s Law soothes us by making us think it means something more than it does.

Why is it so useless, one may venture to ask?

This is a graphical representation of Lawler’s Law. Point A represents the beginning of a game. This team (which ultimately wins this game) has roughly a 50% chance of winning at that point. As the game goes on, and more points are scored, the team depicted here increases its chance of victory based on the number of points it has scored. Point B…


## Trivial Little Baseball Puzzle

I’ve been reading a book about the innovations of baseball so that’s probably why it’s on my mind. And this isn’t important and I don’t expect it to go anywhere, but it did cross my mind, so, why not give it 200 words where they won’t do any harm?

Imagine one half-inning in a baseball game; imagine that there are no substitutions or injuries or anything requiring the replacement of a batter. Also suppose there are none of those freak events like when a batter hits out of order and the other team doesn’t notice (or pretends not to notice), the sort of things which launch one into the wonderful and strange world of stuff baseball does because they did it that way in 1835 when everyone playing was striving to be a Gentleman.

What’s the maximum number of runs that could be scored while still having at least one player not get a run?

## From Drew Carey To An Imaginary Baseball Player

So, we calculated that on any given episode of The Price Is Right there’s around one chance in a thousand of all six winners of the Item Up For Bid coming from the same seat. And we know there have been about six thousand episodes with six Items Up For Bid. So we expect there to have been about six clean sweep episodes; yet if Drew Carey is to be believed, there has been just the one. What’s wrong?

Possibly, nothing. Just because there is a certain probability of a thing happening does not mean it happens all that often. Consider an analogous situation: a baseball batter might hit safely one time out of every three at-bats; but there would be nothing particularly odd in the batter going hitless in four at-bats during a single game, however much we would expect him to get at least one. There wouldn’t be much very peculiar in his hitting all four times, either. Our expected value, the number of times something could happen times the probability of it happening each time, is not necessarily what we actually see. (We might get suspicious if we always saw the expected value turn up.)
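The batter's four at-bats can be worked out exactly. This is a minimal sketch in Python, assuming each at-bat is an independent try with a one-in-three chance of a hit, so that the number of hits follows a binomial distribution. The function name is mine.

```python
from math import comb


def hit_distribution(at_bats=4, p_hit=1 / 3):
    """Probability of getting exactly k hits, for k = 0 .. at_bats."""
    return [
        comb(at_bats, k) * p_hit**k * (1 - p_hit) ** (at_bats - k)
        for k in range(at_bats + 1)
    ]


dist = hit_distribution()
# Expected value: each count of hits weighted by its probability.
expected_hits = sum(k * p for k, p in enumerate(dist))

for k, p in enumerate(dist):
    print(k, round(p, 4))
print("expected hits:", round(expected_hits, 4))
```

Going hitless comes out to 16 chances in 81, or about one game in five, and hitting all four times to 1 chance in 81. The expected value is four-thirds of a hit, a number the batter can never actually record in a game.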

Still, there must be some limits. We might accept a batter who hits one time out of every three getting no hits in four at-bats. If he got no hits in four hundred at-bats, we’d be inclined to say he’s not a decent hitter having some bad luck. More likely he’s failing to bring the bat with him to the plate. We need a tool to say whether some particular outcome is tolerably likely or so improbable that something must be up.
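To put a rough size on "so improbable", here is a small Python sketch comparing the two hitless streaks. It again assumes independent at-bats with a one-in-three chance of a hit, and it works with base-10 logarithms because the longer streak's probability is too small to read comfortably as a decimal. The function name is my own.

```python
from math import log10


def log10_prob_hitless(at_bats, p_hit=1 / 3):
    """Base-10 logarithm of the chance of no hits in the given number of at-bats."""
    # The chance of missing every time is (1 - p_hit) ** at_bats.
    return at_bats * log10(1 - p_hit)


for at_bats in (4, 400):
    print(at_bats, round(log10_prob_hitless(at_bats), 2))
```

For four at-bats the logarithm is about -0.7, the one-in-five chance again. For four hundred it is a little below -70, a probability with seventy zeroes after the decimal point, which is the arithmetic behind suspecting the missing bat.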