Last month, Sarcastic Goat asked me how interesting a soccer game was. This is “interesting” in the information theory sense. I described what that is in a series of posts, linked to from above. That series had been inspired by the NCAA “March Madness” basketball tournament. I’d been wondering about the information-theory content of knowing the outcome of the tournament, and of each game.
This measure, called the entropy, we can work out from knowing how likely all the possible outcomes of something, anything, are. If there’s two possible outcomes and they’re equally likely, the entropy is 1 bit. If there’s two possible outcomes and one is a sure thing while the other can’t happen, the entropy is 0. If there’s four possible outcomes and they’re all equally likely, the entropy is 2. If there’s eight possible outcomes, all equally likely, the entropy is 3. If there’s eight possible outcomes and some are likely while some are long shots, the entropy is … smaller than 3, but bigger than 0. The entropy grows with the number of possible outcomes and shrinks as the outcomes get more lopsided, with a few favorites and a lot of long shots.
But it’s easy to calculate. List all the possible outcomes. Find the probability of each of those possible outcomes happening. Then, calculate minus one times the probability of each outcome times the logarithm, base two, of that outcome’s probability. For each outcome, so yes, this might take a while. Then add up all those products.
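Here is a minimal sketch of that recipe in Python, run against the small examples from earlier. The function name `entropy_bits` and the lopsided eight-outcome probabilities are made up for illustration; nothing here comes from real data.

```python
import math

def entropy_bits(probabilities):
    """Entropy, in bits, of a list of outcome probabilities that sum to 1."""
    # For each outcome: minus one, times its probability, times the base-two
    # logarithm of that probability. Impossible outcomes (p = 0) are skipped,
    # since they contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy_bits([0.5, 0.5]))    # two equally likely outcomes
print(entropy_bits([1.0, 0.0]))    # a sure thing and an impossibility
print(entropy_bits([0.25] * 4))    # four equally likely outcomes
print(entropy_bits([0.125] * 8))   # eight equally likely outcomes

# Eight outcomes, some likely and some long shots (made-up probabilities).
lopsided = [0.4, 0.3, 0.1, 0.1, 0.04, 0.03, 0.02, 0.01]
print(entropy_bits(lopsided))      # lands somewhere between 0 and 3
```

The first four lines print 1, 0, 2, and 3 bits, matching the cases described above.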
I’d estimated the outcome of the 63-game basketball tournament was somewhere around 48 bits of information. There’s a fair number of foregone, or almost foregone, conclusions in the tournament, after all. And I guessed, based on a toy model of what kinds of scores often turn up in college basketball games, that a single game’s score had an information content of a little under 11 bits.
Sarcastic Goat, as I say, asked about soccer scores. I don’t feel confident that I could make up a plausible model of soccer score distributions. So I went looking for historical data. Surely, a history of actual professional soccer scores over a couple decades would show all the possible, plausible, outcomes and how likely each was to turn out.
I didn’t find one. My search for soccer scores kept getting contaminated with (American) football scores. But that turned up something interesting anyway. Sports Reference LLC has a table which purports to list the results of all professional football games played from 1920 through the start of 2016. There’ve been, apparently, some 1,026 different score outcomes, from 0-0 through to 73-0.
As you’d figure, there are a lot of freakish scores; only once in professional football history has the game ended 62-28. (Although it’s ended 62-14 twice, somehow.) There hasn’t been a 2-0 game since the second week of the 1938 season. Some scores turn up a lot; 248 games (as of this writing) have ended 20-17. That’s the most common score, in its records. 27-24 and 17-14 are the next most common scores. If I’m not making a dumb mistake, 7-0 is the 21st most common score. 93 games have ended with that tally. But it hasn’t actually been a game’s final score since the 14th week of the 1983 season, somehow. 98 games have ended 21-17; only ten have ended 21-18. Weird.
Anyway, there’s 1,026 recorded outcomes. That’s surely as close to “all the possible outcomes” as we can expect to get, at least until the Jets manage to lose 74-0 in their home opener. If all 1,026 outcomes were equally likely then the information content of the game’s score would be a touch over 10 bits. But these outcomes aren’t all equally likely. It’s vastly more likely that a game ended 16-13 than that it ended 16-8.
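The equally-likely figure is quick to check: when every one of N outcomes has probability 1/N, the entropy sum collapses to the base-two logarithm of N. A short sketch, where the variable names are mine:

```python
import math

n_outcomes = 1026  # distinct final scores in the table
uniform_bits = math.log2(n_outcomes)
print(uniform_bits)  # just over 10, since 2**10 = 1024
```

This is the ceiling. Any uneven spread of probabilities over the same 1,026 scores has to come in below it.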
Let’s suppose I didn’t make any stupid mistakes in working out the frequency of all the possible outcomes. Then the information content of a football game’s outcome is a little over 8.72 bits.
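A minimal sketch of how a calculation like this might go, assuming the table has been boiled down to a count of games per final score. The function name and the dictionary layout are hypothetical. The input is only the four tallies quoted in this post, not the full table, so the number it prints is not the 8.72-bit figure.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits, treating each score's share of all games as its probability."""
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total)
               for n in counts.values() if n > 0)

# Toy input: just the four tallies mentioned in this post.
# A real run would need one entry for every score in the table.
score_counts = {"20-17": 248, "21-17": 98, "7-0": 93, "21-18": 10}
toy_bits = entropy_from_counts(score_counts)
print(toy_bits)  # under 2 bits: four outcomes, not equally likely
```

Using each score's historical frequency as its probability is itself an assumption. It treats a 1938 game and a 2015 game as draws from the same distribution.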
Don’t be too hypnotized by the digits past the decimal. It’s approximate. But it suggests that if you were asking a source that would only answer ‘yes’ or ‘no’, then you could expect to get the score for any particular football game with about nine well-chosen questions.
I’m not surprised this is less than my estimated information content of a basketball game’s score. I think basketball games see a wider range of likely scores than football games do.
If someone has a reference for the outcomes of soccer games — or other sports — over a reasonably long time please let me know. I can run the same sort of calculation. We might even manage the completely pointless business of ranking all major sports by the information content of their scores.