Doesn’t The Other Team Count? How Much?

I’d worked out an estimate of how much information content there is in a basketball score, by which I was careful to say the score that one team manages in a game. I wasn’t able to find out what the actual distribution of real-world scores was like, unfortunately, so I made up a plausible-sounding guess: that college basketball scores would be distributed among the imaginable numbers (whole numbers from zero through … well, infinitely large numbers, though in practice probably not more than 150) according to a very common distribution called the “Gaussian” or “normal” distribution, that the arithmetic mean score would be about 65, and that the standard deviation, a measure of how spread out the distribution of scores is, would be about 10.

If those assumptions are true, or are at least close enough to true, then there are something like 5.4 bits of information in a single team’s score. Put another way, if you were trying to divine the score by asking someone who knew it a series of carefully-chosen questions, like, “is the score less than 65?” or “is the score more than 39?”, with at each stage each question equally likely to be answered yes or no, you could expect to hit the exact score with usually five, sometimes six, such questions.
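As a rough check on that figure, here is a minimal sketch that puts the assumed bell curve (mean 65, standard deviation 10) onto the whole-number scores from 0 through 150 and adds up the information content. The function names `score_distribution` and `entropy_bits` are made up for this illustration, and cutting the scores off at 150 is the practical ceiling guessed at above, not anything measured.

```python
import math

def score_distribution(mean=65.0, sd=10.0, max_score=150):
    """Probabilities for whole-number scores 0..max_score, shaped like a Gaussian.

    The mean, standard deviation, and ceiling are the guesses from the text,
    not values fitted to real game data.
    """
    weights = [math.exp(-0.5 * ((s - mean) / sd) ** 2) for s in range(max_score + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_bits(probs):
    """Information content: minus the sum of p * log2(p) over all outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

single = entropy_bits(score_distribution())
print(f"one team's score: about {single:.1f} bits")
```

Under these assumptions the result should land near the 5.4 bits quoted above. A wider spread of scores would push the number up, a narrower one would pull it down.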

That’s only the information about a single team’s score, though. What about the scores of both teams? The easy answer to give is that it’s 10.8 bits — twice the information of a single score — and if I had to give an answer I would settle on that. But I’m not quite happy with that answer. I suspect that it’s close to correct without actually being right, and let me talk about why.

When I declare that I expect a team’s score to be represented by a Gaussian distribution I am making a bunch of assumptions about how team scores are distributed. That is, I’m not assuming anything much about one score, but I am assuming that if I looked at ten scores, or a hundred scores, or a million scores, they would follow some rules. One of the most important of these rules, and the trickiest, is known as independence: knowing the previous score, or scores, doesn’t give me any information about what the next score will be. For the scores of one team across many games that’s probably fair enough: that a team reached 68 points last game doesn’t tell you whether it’s going to score 68 points this game.

(Well. If a team consistently scores, say, between 40 and 50 points, that indicates that it’s not likely that it’s going to score 68 points next game. That reflects another assumption, that scores are identically distributed. That is, the way the scores are spread out doesn’t depend on which game in the season you’re looking at. This assumption of identical distribution is not precisely true when we look at a real team playing, since players are getting better or worse, getting injured or recovered, forming better strategies or holding on to strategies they should be dumping, playing stronger or weaker opponents. But the assumption is probably close enough to true for the difference not to hurt us much in this context: we assume a team will score an average of 45 points or 65 points in a game consistently.)

The score of one team may be independent. But the scores of both teams in a single game? If I know that one team scored 68 points, then I do know something about the other team’s score: that it is almost certainly not 68 points. College basketball does not ordinarily allow for tie games — it certainly doesn’t allow them in tournament play — and while it seems possible that sometimes a game will be unimportant enough or overtime extended long enough that the game will be allowed to expire with a tie score, that’s very rare. The scores of the two teams are not independent.

Even putting aside that the two teams will not (ordinarily) end the game with identical scores, the scores of two teams in a single game still won’t be independent. There are social factors that discourage, particularly, complete runaway games. If one team is beating the other by thirty points at halftime, the tendency will be for that team to play less overwhelmingly in the second half. For one, crushing an opponent too much is seen as bad sportsmanship: winning a game by 25 points marks your team as well-managed and highly skilled; winning by 50 points marks your team as bullies. For another, when the score difference is too great the players on the dominant team have good reason to take it a little easier, as there’s no point smashing one’s body further to win by 48 points when 18 is just as good in the standings. And a runaway score is a chance to substitute in second- and third-tier players and give them some playing time and real game experience without running the risk of defeat, or a chance for all the players to test out new strategies or techniques, similarly without real risk. While, again, I don’t have the results of all the college basketball games I might like, I’m willing to guess that there are fewer 90-to-30 wins than would be expected from the assumption of normal distributions.
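To put rough sizes on these two objections, here is a small sketch of what the naive model predicts if both teams’ scores are drawn independently from the same guessed-at distribution (mean 65, standard deviation 10, scores 0 through 150). The helper `score_distribution` and the variable names are invented for the illustration, and nothing here comes from real game results.

```python
import math

def score_distribution(mean=65.0, sd=10.0, max_score=150):
    """Gaussian-shaped probabilities for whole-number scores 0..max_score.

    All three parameters are guesses from the text, not fitted values.
    """
    weights = [math.exp(-0.5 * ((s - mean) / sd) ** 2) for s in range(max_score + 1)]
    total = sum(weights)
    return [w / total for w in weights]

probs = score_distribution()

# Chance that two independent draws land on the same score, that is, a tie.
tie_chance = sum(p * p for p in probs)

# Chance of a 90-to-30 final, counting either team as the winner.
runaway_chance = 2 * probs[90] * probs[30]

print(f"predicted ties: {tie_chance:.1%} of games")
print(f"predicted 90-to-30 games: {runaway_chance:.1e} of games")
```

With these made-up parameters the independent model hands out ties in something like three percent of games, which real college basketball essentially never allows, while a 90-to-30 result comes out at well under one game in a million. So of the two mismatches, the ties are the one the model gets noticeably wrong.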

So that is why I don’t quite believe I can properly model the results of basketball scores exactly as if the two teams were drawing from the same Gaussian distribution of imaginable scores. Why do I do it anyway? And do I have a better reason than just “I haven’t got any better ideas”?

While I don’t believe the distribution of scores of both teams is exactly what I’d get from a Gaussian distribution, the real question is how wrong I am if I assume it is a Gaussian distribution anyway. If I do, I’m supposing that some runaway scores happen more often than they really do, although the Gaussian distribution would still forecast very few 90-to-30 (or similar enormous runaway scores) games anyway. Perhaps that’s few enough that I shouldn’t worry about this error. Similarly, I’m assuming that ties happen a lot more than they actually could; that seems like a more important error. On the other hand, if I know that one team has scored 65, I know that the other team got some score, any score, other than 65.

The chance of its scoring 64, or 68, or any other number is increased a tiny bit. The total increase in the chance of scoring 64, or 68, or any number other than 65 has to be exactly however likely scoring 65 was, although it’s distributed across a bunch of other scores. It seems plausible that, at least to a first approximation, the increase in the probability of every other score means that the information content (the negative of the sum, over all scores, of the probability of a given score times the logarithm-base-two of that probability) doesn’t actually change by very much, at least not when all the possible alternate scores have been added up.
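That hunch can be tested numerically. The sketch below rules out the tie, spreads the lost probability over the remaining scores in proportion to how likely each already was, and recomputes the information content. The proportional spreading is an assumption of this sketch (the argument above only says the probability goes somewhere), and `score_distribution`, `entropy_bits`, and `without_tie` are names invented for the illustration.

```python
import math

def score_distribution(mean=65.0, sd=10.0, max_score=150):
    """Gaussian-shaped probabilities for whole-number scores 0..max_score."""
    weights = [math.exp(-0.5 * ((s - mean) / sd) ** 2) for s in range(max_score + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_bits(probs):
    """Information content: minus the sum of p * log2(p) over all outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def without_tie(probs, known_score):
    """Second team's distribution once a tie with known_score is ruled out.

    Assumption: the removed probability is shared out proportionally.
    """
    remaining = [0.0 if s == known_score else p for s, p in enumerate(probs)]
    total = sum(remaining)
    return [p / total for p in remaining]

probs = score_distribution()
single = entropy_bits(probs)

# Information left in the second score when the first is known to be 65.
given_65 = entropy_bits(without_tie(probs, 65))

# The same quantity averaged over every score the first team might have had.
conditional = sum(p * entropy_bits(without_tie(probs, s)) for s, p in enumerate(probs))

print(f"second score alone: {single:.2f} bits")
print(f"second score, first known to be 65: {given_65:.2f} bits")
print(f"both scores, no ties allowed: {single + conditional:.2f} bits")
```

Under these assumptions, forbidding ties shaves only a few hundredths of a bit off the second score, so the two-score total stays within about a tenth of a bit of simply doubling the single-score figure. This says nothing about the runaway-game effect, which would need real data to size.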

And so that’s why while I don’t believe that the second team’s score has an information content exactly the same as the first team’s score, I do suspect that it’s probably close. I am wary of my answer, but 10.8 bits is probably near enough right.