## Let Me Remind You How Interesting a Basketball Tournament Is

Several years ago I stumbled into a nice sequence. All my nice sequences have been things I stumbled upon. This one looked at the most basic elements of information theory by what they tell us about the NCAA College Basketball tournament. This is (in the main) a 64-team single-elimination playoff. It’s been a few years since I ran through the sequence. But it’s been a couple years since the tournament could be run with a reasonably clear conscience too. So here’s my essays:

And this spins off to questions about other sports events.

And I still figure to get to this year’s Pi Day comic strips. Soon. It’s been a while since I felt I had so much to write up.

## Let Me Tell You How Interesting March Madness Could Possibly Be

I read something alarming in the daily “Best of GoComics” e-mail this morning. It was a panel of Dave Whamond’s Reality Check. It’s a panel comic, although it stands out from the pack by having a squirrel character in the margins. And here’s the panel.

Certainly a solid enough pun to rate a mention. I don’t know of anyone actually doing a March Mathness bracket, but it’s not a bad idea. Rating mathematical terms for their importance or usefulness or just beauty might be fun. And might give a reason to talk about their meaning some. It’s a good angle to discuss what’s interesting about mathematical terms.

And that lets me segue into talking about a set of essays. The next few weeks see the NCAA college basketball tournament, March Madness. I’ve used that to write some stuff about information theory, as it applies to the question: is a basketball game interesting?

Along the way here I got to looking up actual scoring results from major sports. This let me estimate the information-theory content of soccer, (US) football, and baseball scores, to match my estimate of basketball scores’ information content.

• How Interesting Is A Football Score? Football scoring is a complicated thing. But I was able to find a trove of historical data to give me an estimate of the information theory content of a score.
• How Interesting Is A Baseball Score? Some Partial Results: I found some summaries of actual historical baseball scores. Somehow I couldn’t find the detail I wanted for baseball, a sport that since 1845 has kept track of every possible bit of information, including how long the games ran, about every game ever. I made do, though.
• How Interesting Is A Baseball Score? Some Further Results: Since then I found some more detailed summaries and refined the estimate a little.
• How Interesting Is A Low-Scoring Game? And here, well, I start making up scores. It’s meant to represent low-scoring games such as soccer, hockey, or baseball to draw some conclusions. This includes the question: just because a distribution of small whole numbers is good for mathematicians, is that a good match for what sports scores are like?

## Reading the Comics, January 5, 2019: Start of the Year Edition

With me wrapping up the mathematically-themed comic strips that ran the first of the year, you can see how far behind I’m falling keeping everything current. In my defense, Monday was busier than I hoped it would be, so everything ran late. Next week is looking quite slow for comics, so maybe I can catch up then. I will never catch up on anything the rest of my life, ever.

Scott Hilburn’s The Argyle Sweater for the 2nd is a bit of wordplay about regular and irregular polygons. Many mathematical constructs, in geometry and elsewhere, come in “regular” and “irregular” forms. The regular form usually has symmetries that make it stand out. For polygons, this is each side having the same length, and each interior angle being congruent. Irregular is everything else. The symmetries which constrain the regular version of anything often mean we can prove things we otherwise can’t. But most of anything is the irregular. We might know fewer interesting things about them, or have a harder time proving them.

I’m not sure what the teacher would be asking for in how to “make an irregular polygon regular”. I mean if we pretend that it’s not setting up the laxative joke. I can think of two alternatives that would make sense. One is to draw a polygon with the same number of sides and the same perimeter as the original. The other is to draw a polygon with the same number of sides and the same area as the original. I’m not sure of the point of either. I suppose polygons of the same area have some connection to quadrature, that is, integration. But that seems like it’s higher-level stuff than this class should be doing. I hate to question the reality of a comic strip but that’s what I’m forced to do.

Bud Fisher’s Mutt and Jeff rerun for the 4th is a gambler’s fallacy joke. Superficially the gambler’s fallacy seems to make perfect sense: the chance of twelve bad things in a row has to be less than the chance of eleven bad things in a row. So after eleven bad things, the twelfth has to come up good, right? But there’s two ways this can go wrong.

Suppose each attempted thing is independent. In this case, what if each patient is equally likely to live or die, regardless of what’s come before? And in that case, the eleven deaths don’t make it more likely that the next will live.

Suppose each attempted thing is not independent, though. This is easy to imagine. Each surgery, for example, is a chance for the surgeon to learn what to do, or not do. He could be getting better, that is, more likely to succeed, each operation. Or the failures could reflect the surgeon’s skills declining, perhaps from overwork or age or a loss of confidence. Impossible to say without more data. Eleven deaths on what context suggests are low-risk operations suggest poor chances of surviving any given surgery, though. I’m on Jeff’s side here.
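For the independent case, here is a small sketch in Python that checks the claim by brute force. It is only an illustration: the function name and the choice to enumerate every possible run of outcomes are mine, and it assumes every patient has the same chance of living regardless of what came before.

```python
from fractions import Fraction
from itertools import product


def prob_next_lives_given_all_died(n_prior, p_live):
    """P(patient n_prior + 1 lives | the first n_prior all died).

    Assumes independent trials, each with the same chance p_live of living.
    Works by listing every possible sequence of outcomes and adding up
    the probabilities of the ones that match.
    """
    p_die = 1 - p_live
    p_condition = Fraction(0)  # P(first n_prior patients all died)
    p_both = Fraction(0)       # P(first n_prior died AND the next one lives)
    for outcome in product((True, False), repeat=n_prior + 1):
        p = Fraction(1)
        for lived in outcome:
            p *= p_live if lived else p_die
        if not any(outcome[:n_prior]):
            p_condition += p
            if outcome[n_prior]:
                p_both += p
    return p_both / p_condition


# Eleven deaths in a row, each patient a coin flip: the twelfth is still a coin flip.
print(prob_next_lives_given_all_died(11, Fraction(1, 2)))  # 1/2
```

The answer comes back as whatever the single-patient chance was, however long the losing streak. That is all "independent" means here.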

Mark Anderson’s Andertoons for the 5th is a welcome return of Wavehead. It’s about ratios. My impression is that ratios don’t get much attention in themselves anymore, except to dunk on stupid Twitter comments. It’s too easy to jump right into fractions, and division. Ratios underlie this, at least historically. It’s even in the name, ‘rational numbers’.

Wavehead’s got a point in literally comparing apples and oranges. It’s at least weird to compare directly different kinds of things. This is one of those conceptual gaps between ancient mathematics and modern mathematics. We’re comfortable stripping the units off of numbers, and working with them as abstract entities. But that does mean we can calculate things that don’t make sense. This produces the occasional bit of fun on social media where we see something like Google trying to estimate a movie’s box office per square inch of land in Australia. Just because numbers can be combined doesn’t mean they should be.

Larry Wright’s Motley rerun for the 5th has the form of a story problem. And one timely to the strip’s original appearance in 1987, during the National Football League players strike. The setup, talking about the difference in weekly pay between the real players and the scabs, seems like it’s about the payroll difference. The punchline jumps to another bit of mathematics, the point spread. Which is an estimate of the expected difference in scoring between teams. I don’t know for a fact, but would imagine the scab teams had nearly meaningless point spreads. The teams were thrown together extremely quickly, without much training time. The tools to forecast what a team might do wouldn’t have the data to rely on.

The at-least-weekly appearances of Reading the Comics in these pages are at this link.

## Is A Basketball Tournament Interesting? My Thoughts

It’s a good weekend to bring this back. I have some essays about information theory and sports contests and maybe you missed them earlier. Here goes.

And then for a follow-up I started looking into actual scoring results from major sports. This let me estimate the information-theory content of soccer, (US) football, and baseball scores, to match my estimate of basketball scores’ information content.

Don’t try to use this to pass your computer science quals. But I hope it gives you something interesting to talk about while sulking over your brackets, and maybe to read about after that.

## How Interesting Is March Madness?

And now let me close the week with some other evergreen articles. A couple years back I mixed the NCAA men’s basketball tournament with information theory to produce a series of essays that fit the title I’ve given this recap. They also sprawl out into (US) football and baseball. Let me link you to them:

## How Interesting Is A Football Score?

Last month, Sarcastic Goat asked me how interesting a soccer game was. This is “interesting” in the information theory sense. I describe what that is in a series of posts, linked to from above. That had been inspired by the NCAA “March Madness” basketball tournament. I’d been wondering about the information-theory content of knowing the outcome of the tournament, and of each game.

This measure, called the entropy, we can work out from knowing how likely all the possible outcomes of something — anything — are. If there’s two possible outcomes and they’re equally likely, the entropy is 1. If there’s two possible outcomes and one is a sure thing while the other can’t happen, the entropy is 0. If there’s four possible outcomes and they’re all equally likely, the entropy is 2. If there’s eight possible outcomes, all equally likely, the entropy is 3. If there’s eight possible outcomes and some are likely while some are long shots, the entropy is … smaller than 3, but bigger than 0. The entropy grows with the number of possible outcomes and shrinks with the number of unlikely outcomes.

But it’s easy to calculate. List all the possible outcomes. Find the probability of each of those possible outcomes happening. Then, calculate minus one times the probability of each outcome times the logarithm, base two, of that outcome’s probability. For each outcome, so yes, this might take a while. Then add up all those products.
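The recipe can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind any of these estimates; the function name `entropy_bits` and the lopsided example distribution are made up for the occasion. The logarithm is taken of each outcome's probability, and outcomes that can't happen are skipped, since they contribute nothing.

```python
import math


def entropy_bits(probabilities):
    """Entropy, in bits, of a list of outcome probabilities that sum to 1."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities should add up to 1")
    # Minus one times the probability times the base-two log of the probability,
    # added up over every outcome. Impossible outcomes (p == 0) are skipped.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


print(entropy_bits([0.5, 0.5]))    # two equally likely outcomes: 1.0
print(entropy_bits([0.25] * 4))    # four equally likely outcomes: 2.0
print(entropy_bits([0.125] * 8))   # eight equally likely outcomes: 3.0

# Eight outcomes, some likely and some long shots (made-up numbers).
lopsided = [0.5, 0.2, 0.1, 0.1, 0.04, 0.03, 0.02, 0.01]
print(0 < entropy_bits(lopsided) < 3)  # True
```

The same function works for any list of outcomes, whether that is two teams in one game or every final score on record, as long as you can put a probability on each one.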

I’d estimated the outcome of the 63-game basketball tournament was somewhere around 48 bits of information. There’s a fair number of foregone, or almost foregone, conclusions in the game, after all. And I guessed, based on a toy model of what kinds of scores often turn up in college basketball games, that the game’s score had an information content of a little under 11 bits of information.

Sarcastic Goat, as I say, asked about soccer scores. I don’t feel confident that I could make up a plausible model of soccer score distributions. So I went looking for historical data. Surely, a history of actual professional soccer scores over a couple decades would show all the possible, plausible, outcomes and how likely each was to turn out.

I didn’t find one. My search for soccer scores kept getting contaminated with (American) football scores. But that turned up something interesting anyway. Sports Reference LLC has a table which purports to list the results of all professional football games played from 1920 through the start of 2016. There’ve been, apparently, some 1,026 different score outcomes, from 0-0 through to 73-0.

As you’d figure, there are a lot of freakish scores; only once in professional football history has the game ended 62-28. (Although it’s ended 62-14 twice, somehow.) There hasn’t been a 2-0 game since the second week of the 1938 season. Some scores turn up a lot; 248 games (as of this writing) have ended 20-17. That’s the most common score, in its records. 27-24 and 17-14 are the next most common scores. If I’m not making a dumb mistake, 7-0 is the 21st most common score. 93 games have ended with that tally. But it hasn’t actually been a game’s final score since the 14th week of the 1983 season, somehow. 98 games have ended 21-17; only ten have ended 21-18. Weird.

Anyway, there’s 1,026 recorded outcomes. That’s surely as close to “all the possible outcomes” as we can expect to get, at least until the Jets manage to lose 74-0 in their home opener. If all 1,026 outcomes were equally likely then the information content of the game’s score would be a touch over 10 bits. But these outcomes aren’t all equally likely. It’s vastly more likely that a game ended 16-13 than it is likely it ended 16-8.

Let’s suppose I didn’t make any stupid mistakes in working out the frequency of all the possible outcomes. Then the information content of a football game’s outcome is a little over 8.72 bits.

Don’t be too hypnotized by the digits past the decimal. It’s approximate. But it suggests that if you were asking a source that would only answer ‘yes’ or ‘no’, then you could expect to get the score for any particular football game with about nine well-chosen questions.
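The arithmetic in the last few paragraphs is quick to check. Here is a rough sketch in Python using only the two figures quoted above, the 1,026 recorded outcomes and the 8.72-bit estimate; the variable names are mine. Rounding up to a whole number of questions is a rough reading, since the 8.72 figure is an average over all games.

```python
import math

RECORDED_OUTCOMES = 1026       # distinct final scores, as quoted in the text
MEASURED_ENTROPY_BITS = 8.72   # the estimate quoted in the text

# If every recorded score were equally likely, the entropy would be log2(count).
uniform_bits = math.log2(RECORDED_OUTCOMES)
print(f"{uniform_bits:.3f}")   # 10.003

# Roughly how many yes-or-no questions the estimate suggests.
questions = math.ceil(MEASURED_ENTROPY_BITS)
print(questions)               # 9

# How many equally likely scores would carry the same information.
effective_outcomes = 2 ** MEASURED_ENTROPY_BITS
print(round(effective_outcomes))
```

That last number is one way to read the gap between 10 bits and 8.72 bits: the lopsided distribution of real scores behaves like a much shorter list of equally likely ones.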

I’m not surprised this is less than my estimated information content of a basketball game’s score. I think basketball games see a wider range of likely scores than football games do.

If someone has a reference for the outcomes of soccer games — or other sports — over a reasonably long time please let me know. I can run the same sort of calculation. We might even manage the completely pointless business of ranking all major sports by the information content of their scores.

## Gaussian distribution of NBA scores

The Prior Probability blog points out an interesting graph, showing the most common scores for basketball teams, based on the final scores of every NBA game. It’s actually got three sets of data there, one for all basketball games, one for games this decade, and one for basketball games of the 1950s. Unsurprisingly there’s many more results for this decade: the seasons are longer, and there are thirty teams in the league today, as opposed to eight or nine in 1954. (The Baltimore Bullets played fourteen games before folding, and the games were expunged from the record. The league dropped from eleven teams in 1950 to eight for 1954-1959.)

I’m fascinated by this just as a depiction of probability distributions: any team can, in principle, reach most any non-negative score in a game, but it’s most likely to be around 102. Surely there’s a maximum possible score, based on the fact a team has to get the ball and get into position before it can score; I’m a little curious what that would be.

Prior Probability itself links to another blog which reviews the distribution of scores for other major sports, and the interesting result of what the most common basketball score has been, per decade. It’s increased from the 1940s and 1950s, but it’s considerably down from the 1960s.

You can see the most common scores in such sports as basketball, football, and baseball in Philip Bump’s fun Wonkblog post here. Mr Bump writes: “Each sport follows a rough bell curve … Teams that regularly fall on the left side of that curve do poorly. Teams that land on the right side do well.” Read more about Gaussian distributions here.
