And now let me close the week with some other evergreen articles. A couple years back I mixed the NCAA men’s basketball tournament with information theory to produce a series of essays that fit the title I’ve given this recap. They also sprawl out into (US) football and baseball. Let me link you to them:
Got another little flood of mathematically-themed comic strips last week and so once again I’ll split them along something that looks kind of middle-ish. Also this is another bunch of GoComics.com-only posts. Since those seem to be accessible to anyone whether or not they’re subscribers indefinitely far into the future I don’t feel like I can put the comics directly up and will trust you all to click on the links that you find interesting. Which is fine; the new GoComics.com design makes it annoyingly hard to download a comic strip. I don’t think that was their intention. But that’s one of the two nagging problems I have with their new design. So you know.
Tony Cochran’s Agnes for the 5th sees a brand-new mathematics. Always dangerous stuff. But mathematicians do invent, or discover, new things in mathematics all the time. Part of the task is naming the things in it. That’s something which takes talent. Some people, such as Leonhard Euler, had the knack a great novelist has for putting names to things. The rest of us muddle along. Often if there’s any real-world inspiration, or resemblance to anything, we’ll rely on that. And we look for terminology that evokes similar ideas in other fields. … And, Agnes would like to know, there is mathematics that’s about approximate answers, being “right around” the desired answer. Unfortunately, that’s hard. (It’s all hard, if you’re going to take it seriously, much like everything else people do.)
Carol Lay’s Lay Lines for the 6th depicts the hazards of thinking deeply and hard about the infinitely large and the infinitesimally small. They’re hard. Our intuition seems well-suited to handing a modest bunch of household-sized things. Logic guides us when thinking about the infinitely large or small, but it takes a long time to get truly conversant and comfortable with it all.
Paul Gilligan’s Pooch Cafe for the 6th sees Poncho try to argue there’s thermodynamical reasons for not being kind. Reasoning about why one should be kind (or not) is the business of philosophers and I won’t overstep my expertise. Poncho’s mathematics, that’s something I can write about. He argues “if you give something of yourself, you inherently have less”. That seems to be arguing for a global conservation of self-ness, that the thing can’t be created or lost, merely transferred around. That’s fair enough as a description of what the first law of thermodynamics tells us about energy. The equation he reads off reads, “the change in the internal energy (Δ U) equals the heat added to the system (U) minus the work done by the system (W)”. Conservation laws aren’t unique to thermodynamics. But Poncho may be aware of just how universal and powerful thermodynamics is. I’m open to an argument that it’s the most important field of physics.
Jonathan Lemon’s Rabbits Against Magic for the 6th is another strip Intro to Calculus instructors can use for their presentation on instantaneous versus average velocities. There’s been a bunch of them recently. I wonder if someone at Comic Strip Master Command got a speeding ticket.
Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 6th is about numeric bases. They’re fun to learn about. There’s an arbitrariness in the way we represent concepts. I think we can understand better what kinds of problems seem easy and what kinds seem harder if we write them out different ways. But base eleven is only good for jokes.
And now I can close out last week’s mathematically-themed comic strips. Two of them are even about counting, which is enough for me to make that the name of this set.
John Allen’s Nest Heads for the 2nd mentions a probability and statistics class and something it’s supposed to be good for. I would agree that probability and statistics are probably (I can’t find a better way to write this) the most practically useful mathematics one can learn. At least once you’re past arithmetic. They’re practical by birth; humans began studying them because they offer guidance in uncertain situations. And one can use many of their tools without needing more than arithmetic.
I’m not so staunchly anti-lottery as many mathematics people are. I’ll admit I play it myself, when the jackpot is large enough. When the expectation value of the prize gets to be positive, it’s harder to rationalize not playing. This happens only once or twice a year, but it’s fun to watch and see when it happens. I grant it’s a foolish way to use two dollars (two tickets are my limit), but you know? My budget is not so tight I can’t spend four dollars foolishly a year. Besides, I don’t insist on winning one of those half-billion-dollar prizes. I imagine I’d be satisfied if I brought in a mere $10,000.
Rick Detorie’s One Big Happy for the 3rd continues my previous essay’s bit of incompetence at basic mathematics, here, counting. But working out that her age is between 22 an a gazillion may be worth doing. It’s a common mathematical challenge to find a correct number starting from little information about it. Usually we find it by locating bounds: the number must be larger than this and smaller than that. And then get the bounds closer together. Stop when they’re close enough for our needs, if we’re numerical mathematicians. Stop when the bounds are equal to each other, if we’re analytic mathematicians. That can take a lot of work. Many problems in number theory amount to “improve our estimate of the lowest (or highest) number for which this is true”. We have to start somewhere.
Samson’s Dark Side of the Horse for the 3rd is a counting-sheep joke and I was amused that the counting went so awry here. On looking over the strip again for this essay, though, I realize I read it wrong. It’s the fences that are getting counted, not the sheep. Well, it’s a cute little sheep having the same problems counting that Horace has. We don’t tend to do well counting more than around seven things at a glance. We can get a bit farther if we can group things together and spot that, say, we have four groups of four fences each. That works and it’s legitimate; we’re counting and we get the right count out of it. But it does feel like we’re doing something different from how we count, say, three things at a glance.
Mick Mastroianni and Mason MastroianniDogs of C Kennel for the 3rd is about the world’s favorite piece of statistical mechanics, entropy. There’s room for quibbling about what exactly we mean by thermodynamics saying all matter is slowly breaking down. But the gist is fair enough. It’s still mysterious, though. To say that the disorder of things is always increasing forces us to think about what we mean by disorder. It’s easy to think we have an idea what we mean by it. It’s hard to make that a completely satisfying definition. In this way it’s much like randomness, which is another idea often treated as the same as disorder.
Bill Amend’s FoxTrot Classics for the 3rd reprinted the comic from the 10th of February, 2006. Mathematics teachers always want to see how you get your answers. Why? … Well, there are different categories of mistakes someone can make. One can set out trying to solve the wrong problem. One can set out trying to solve the right problem in a wrong way. One can set out solving the right problem in the right way and get lost somewhere in the process. Or one can be doing just fine and somewhere along the line change an addition to a subtraction and get what looks like the wrong answer. Each of these is a different kind of mistake. Knowing what kinds of mistakes people make is key to helping them not make these mistakes. They can get on to making more exciting mistakes.
I know it seems like when I write these essays I spend the most time on the first comic in the bunch and give the last ones a sentence, maybe two at most. I admit when there’s a lot of comics I have to write up at once my energy will droop. But Comic Strip Master Command apparently wants the juiciest topics sent out earlier in the week. I have to follow their lead.
Stephen Beals’s Adult Children for the 14th uses mathematics to signify deep thinking. In this case Claremont, the dog, is thinking of the Riemann Zeta function. It’s something important in number theory, so longtime readers should know this means it leads right to an unsolved problem. In this case it’s the Riemann Hypothesis. That’s the most popular candidate for “what is the most important unsolved problem in mathematics right now?” So you know Claremont is a deep-thinking dog.
The big Σ ordinary people might recognize as representing “sum”. The notation means to evaluate, for each legitimate value of the thing underneath — here it’s ‘n’ — the value of the expression to the right of the Sigma. Here that’s . Then add up all those terms. It’s not explicit here, but context would make clear, n is positive whole numbers: 1, 2, 3, and so on. s would be a positive number, possibly a whole number.
The big capital Pi is more mysterious. It’s Sigma’s less popular brother. It means “product”. For each legitimate value of the thing underneath it — here it’s “p” — evaluate the expression on the right. Here that’s . Then multiply all that together. In the context of the Riemann Zeta function, “p” here isn’t just any old number, or even any old whole number. It’s only the prime numbers. Hence the “p”. Good notation, right? Yeah.
This particular equation, once shored up with the context the symbols live in, was proved by Leonhard Euler, who proved so much you sometimes wonder if later mathematicians were needed at all. It ties in to how often whole numbers are going to be prime, and what the chances are that some set of numbers are going to have no factors in common. (Other than 1, which is too boring a number to call a factor.) But even if Claremont did know that Euler got there first, it’s almost impossible to do good new work without understanding the old.
Charlos Gary’s Working It Out for the 14th is this essay’s riff on pie charts. Or bar charts. Somewhere around here the past week I read that a French idiom for the pie chart is the “cheese chart”. That’s a good enough bit I don’t want to look more closely and find out whether it’s true. If it turned out to be false I’d be heartbroken.
Ryan North’s Dinosaur Comics for the 15th talks about everyone’s favorite physics term, entropy. Everyone knows that it tends to increase. Few advanced physics concepts feel so important to everyday life. I almost made one expression of this — Boltzmann’s H-Theorem — a Theorem Thursday post. I might do a proper essay on it yet. Utahraptor describes this as one of “the few statistical laws of physics”, which I think is a bit unfair. There’s a lot about physics that is statistical; it’s often easier to deal with averages and distributions than the mass of real messy data.
Utahraptor’s right to point out that it isn’t impossible for entropy to decrease. It can be expected not to, in time. Indeed decent scientists thinking as philosophers have proposed that “increasing entropy” might be the only way to meaningfully define the flow of time. (I do not know how decent the philosophy of this is. This is far outside my expertise.) However: we would expect at least one tails to come up if we simultaneously flipped infinitely many coins fairly. But there is no reason that it couldn’t happen, that infinitely many fairly-tossed coins might all come up heads. The probability of this ever happening is zero. If we try it enough times, it will happen. Such is the intuition-destroying nature of probability and of infinitely large things.
Tony Cochran’s Agnes on the 16th proposes to decode the Voynich Manuscript. Mathematics comes in as something with answers that one can check for comparison. It’s a familiar role. As I seem to write three times a month, this is fair enough to say to an extent. Coming up with an answer to a mathematical question is hard. Checking the answer is typically easier. Well, there are many things we can try to find an answer. To see whether a proposed answer works usually we just need to go through it and see if the logic holds. This might be tedious to do, especially in those enormous brute-force problems where the proof amounts to showing there are a hundred zillion special cases and here’s an answer for each one of them. But it’s usually a much less hard thing to do.
Johnny Hart and Brant Parker’s Wizard of Id Classics for the 17th uses what seems like should be an old joke about bad accountants and nepotism. Well, you all know how important bookkeeping is to the history of mathematics, even if I’m never that specific about it because it never gets mentioned in the histories of mathematics I read. And apparently sometime between the strip’s original appearance (the 20th of August, 1966) and my childhood the Royal Accountant character got forgotten. That seems odd given the comic potential I’d imagine him to have. Sometimes a character’s only good for a short while is all.
Mark Anderson’s Andertoons for the 18th is the Andertoons representative for this essay. Fair enough. The kid speaks of exponents as a kind of repeating oneself. This is how exponents are inevitably introduced: as multiplying a number by itself many times over. That’s a solid way to introduce raising a number to a whole number. It gets a little strained to describe raising a number to a rational number. It’s a confusing mess to describe raising a number to an irrational number. But you can make that logical enough, with effort. And that’s how we do make the idea rigorous. A number raised to (say) the square root of two is something greater than the number raised to 1.4, but less than the number raised to 1.5. More than the number raised to 1.41, less than the number raised to 1.42. More than the number raised to 1.414, less than the number raised to 1.415. This takes work, but it all hangs together. And then we ask about raising numbers to an imaginary or complex-valued number and we wave that off to a higher-level mathematics class.
Lachowski’s Get A Life for the 18th is the sudoku joke for this essay. It’s also a representative of the idea that any mathematical thing is some deep, complicated puzzle at least as challenging as calculating one’s taxes. I feel like this is a rerun, but I don’t see any copyright dates. Sudoku jokes like this feel old, but comic strips have been known to make dated references before.
Samson’s Dark Side Of The Horse for the 19th is this essay’s Dark Side Of The Horse gag. I thought initially this was a counting-sheep in a lab coat. I’m going to stick to that mistaken interpretation because it’s more adorable that way.
I’m still curious about the information-theory content, the entropy, of sports scores. I haven’t found the statistics I need about baseball or soccer game outcomes that I need. I’d also like hockey score outcomes if I could get them. If anyone knows a reference I’d be glad to know of it.
But there’s still stuff I can talk about without knowing details of every game ever. One of them suggested itself when I looked at the Washington Post‘s graphic. I mean the one giving how many times each score came up in baseball’s history.
I had planned to write about this when one of my Twitter friends wrote —
@nebusj Cool, looks sort of like a beta distribution. Most movie grosses over time follow a similar curve.
By “distribution” mathematicians mean almost what you would imagine. Suppose we have something that might hold any of a range of values. This we call a “random variable”. How likely is it to hold any particular value? That’s what the distribution tells us. The higher the distribution, the more likely it is we’ll see that value. In baseball terms, that means we’re reasonably likely to see a game with a team scoring three runs. We’re not likely to see a game with a team scoring twenty runs.
There are many families of distributions. Feloni Mayhem suggested the baseball scores look like one called the Beta Distribution. I can’t quite agree, on technical grounds. Beta Distributions describe continuously-valued variables. They’re good for stuff like the time it takes to do something, or the height of a person, or the weight of a produced thing. They’re for measurements that can, in principle, go on forever after the decimal point. A baseball score isn’t like that. A team can score zero points, or one, or 46, but it can’t score four and two-thirds points. Baseball scores are “discrete” variables.
But there are good distributions for discrete variables. Almost everything you encounter taking an Intro to Probability class will be about discrete variables. So will most any recreational mathematics puzzle. The distribution of a tossed die’s outcomes is discrete. So is the number of times tails comes up in a set number of coin tosses. So are the birth dates of people in a room, or the number of cars passed on the side of the road during your ride, or the number of runs scored by a baseball team in a full game.
I suspected that, of the simpler distributions, the best model for baseball should be the Poisson distribution. It also seems good for any other low-scoring game, such as soccer or hockey. The Poisson distribution turns up whenever you have a large number of times that some discrete event can happen. But that event can happen only once each chance. And it has a constant chance of happening. That is, happening this chance doesn’t make it more likely or less likely it’ll happen next chance.
I have reasons to think baseball scoring should be well-modelled this way. There are hundreds of pitches in a game. Each of them is in principle a scoring opportunity. (Well, an intentional walk takes three pitches without offering any chance for scoring. And there’s probably some other odd case where a pitched ball can’t even in principle let someone score. But these are minor fallings-away from the ideal.) This is part of the appeal of baseball, at least for some: the chance is always there.
We only need one number to work out the Poisson distribution of something. That number is the mean, the arithmetic mean of all the possible values. Let me call the mean μ, which is the Greek version of m and so a good name for a mean. The probability that you’ll see the thing happen n times is . Here e is that base of the natural logarithm, that 2.71828 et cetera number. n! is the factorial. That’s n times (n – 1) times (n – 2) times (n – 3) and so on all the way down to times 2 times 1.
And here is the Poisson distribution for getting numbers from 0 through 20, if we take the mean to be 3.4. I can defend using the Poisson distribution much more than I can defend picking 3.4 as the mean. Why not 3.2, or 3.8? Mostly, I tried a couple means around the three-to-four runs range and picked one that looked about right. Given the lack of better data, what else can I do?
I don’t think it’s a bad fit. The shape looks about right, to me. But the Poisson distribution suggests fewer zero- and one-run games than the actual data offers. And there are more high-scoring games in the real data than in the Poisson distribution. Maybe there’s something that needs tweaking.
And there are several plausible causes for this. A Poisson distribution, for example, supposes that there are a lot of chances for a distinct event. That would be scoring on a pitch. But in an actual baseball game there might be up to four runs scored on one pitch. It’s less likely to score four runs than to score one, sure, but it does happen. This I imagine boosts the number of high-scoring games.
I suspect this could be salvaged by a model that’s kind of a chain of Poisson distributions. That is, have one distribution that represents the chance of scoring on any given pitch. Then use another distribution to say whether the scoring was one, two, three, or four runs.
Low-scoring games I have a harder time accounting for. My suspicion is that each pitch isn’t quite an independent event. Experience shows that pitchers lose control of their game the more they pitch. This results in the modern close watching of pitch counts. We see pitchers replaced at something like a hundred pitches even if they haven’t lost control of the game yet.
If I am right this says we might model games like soccer and hockey, with many chances to score a single run each, with a Poisson distribution. A game like baseball, or basketball, with many chances to score one or more points at once needs a more complicated model.
Last week’s Reading The Comics was a bunch of Gocomics.com strips. And I don’t feel the need to post the images for those, since they’re reasonably stable links. Today’s is also a bunch of Gocomics.com strips. I know how every how-to-bring-in-readers post ever says you should include images. Maybe I will commission someone to do some icons. It couldn’t hurt.
Someone looking close at the title, with responsible eye protection, might notice it’s dated the 17th, a day this is not. There haven’t been many mathematically-themed comic strips since the 17th is all. And I’m thinking to try out, at least for a while, making the day on which a Reading the Comics post is issued regular. Maybe Monday. This might mean there are some long and some short posts, but being a bit more scheduled might help my writing.
Mark Anderson’s Andertoons for the 14th is the charting joke for this essay. Also the Mark Anderson joke for this essay. I was all ready to start explaining ways that the entropy of something can decrease. The easiest way is by expending energy, which we can see as just increasing entropy somewhere else in the universe. The one requiring the most patience is simply waiting: entropy almost always increases, or at least doesn’t decrease. But “almost always” isn’t the same as “always”. But I have to pass. I suspect Anderson drew the chart going down because of the sense of entropy being a winding-down of useful stuff. Or because of down having connotations of failure, and the increase of entropy suggesting the failing of the universe. And we can also read this as a further joke: things are falling apart so badly that even entropy isn’t working like it ought. Anderson might not have meant for a joke that sophisticated, but if he wants to say he did I won’t argue it.
Scott Adams’s Dilbert Classics for the 14th reprinted the comic of the 20th of March, 1993. I admit I do this sort of compulsive “change-simplifying” paying myself. It’s easy to do if you have committed to memory pairs of numbers separated by five: 0 and 5, 1 and 6, 2 and 7, and so on. So if I get a bill for (say) $4.18, I would look for whether I have three cents in change. If I have, have I got 23 cents? That would give me back a nickel. 43 cents would give me back a quarter in change. And a quarter is great because I can use that for pinball.
Sometimes the person at the cash register doesn’t want a ridiculous bunch of change. I don’t blame them. It’s easy to suppose that someone who’s given you $5.03 for a $4.18 charge misunderstood what the bill was. Some folks will take this as a chance to complain mightily about how kids don’t learn even the basics of mathematics anymore and the world is doomed because the young will follow their job training and let machines that are vastly better at arithmetic than they are do arithmetic. This is probably what Adams was thinking, since, well, look at the clerk’s thought balloon in the final panel.
But consider this: why would Dilbert have handed over $7.14? Or, specifically, how could he give $7.14 to the clerk but not have been able to give $2.14, which would make things easier on everybody? There’s no combination of bills — in United States or, so far as I’m aware, any major world currency — in which you can give seven dollars but not two dollars. He had to be handing over five dollars he was getting right back. The clerk would be right to suspect this. It looks like the start of a change scam, begun by giving a confusing amount of money.
Had Adams written it so that the charge was $6.89, and Dilbert “helpfully” gave $12.14, then Dilbert wouldn’t be needlessly confusing things.
Dave Whamond’s Reality Check for the 15th is that pirate-based find-x joke that feels like it should be going around Facebook, even though I don’t think it has been. I can’t say the combination of jokes quite makes logical sense, but I’m amused. It might be from the Reality Check squirrel in the corner.
Nate Fakes’s Break of Day for the 16th is the anthropomorphized shapes joke for this essay. It’s not the only shapes joke, though.
Rick Detorie’s One Big Happy rerun for the 17th is another shapes joke. Ruthie has strong ideas about what distinguishes a pyramid from a triangle. In this context I can’t say she’s wrong to assert what a pyramid is.
While researching for my post about the information content of baseball scores I found some tantalizing links. I had wanted to know how often each score came up. From this I could calculate the entropy, the amount of information in the score. That’s the sum, taken over every outcome, of minus one times the frequency of that score times the base-two logarithm of the frequency of the outcome. And I couldn’t find that.
An article in The Washington Post had a fine lead, though. It offers, per the title, “the score of every basketball, football, and baseball game in league history visualized”. And as promised it gives charts of how often each number of runs has turned up in a game. The most common single-team score in a game is 3, with 4 and 2 almost as common. I’m not sure the date range for these scores. The chart says it includes (and highlights) data from “a century ago”. And as the article was posted in December 2014 it can hardly use data from after that. I can’t imagine that the 2015 season has changed much, though. And whether they start their baseball statistics at either 1871, 1876, 1883, 1891, or 1901 (each a defensible choice) should only change details.
Which is fine. I can’t get precise frequency data from the chart. The chart offers how many thousands of times a particular score has come up. But there’s not the reference lines to say definitely whether a zero was scored closer to 21,000 or 22,000 times. I will accept a rough estimate, since I can’t do any better.
I made my best guess at the frequency, from the chart. And then made a second-best guess. My best guess gave the information content of a single team’s score as a touch more than 3.5 bits. My second-best guess gave the information content as a touch less than 3.5 bits. So I feel safe in saying a single team’s score is about three and a half bits of information.
So the score of a baseball game, with two teams scoring, is probably somewhere around twice that, or about seven bits of information.
I have to say “around”. This is because the two teams aren’t scoring runs independently of one another. Baseball doesn’t allow for tie games except in rare circumstances. (It would usually be a game interrupted for some reason, and then never finished because the season ended with neither team in a position where winning or losing could affect their standing. I’m not sure that would technically count as a “game” for Major League Baseball statistical purposes. But I could easily see a roster of game scores counting that.) So if one team’s scored three runs in a game, we have the information that the other team almost certainly didn’t score three runs.
This estimate, though, does fit within my range estimate from 3.76 to 9.25 bits. And as I expected, it’s closer to nine bits than to four bits. The entropy seems to be a bit less than (American) football scores — somewhere around 8.7 bits — and college basketball — probably somewhere around 10.8 bits — which is probably fair. There are a lot of numbers that make for plausible college basketball scores. There are slightly fewer pairs of numbers that make for plausible football scores. There are fewer still pairs of scores that make for plausible baseball scores. So there’s less information conveyed in knowing that the game’s score is.
The raw data is available. Retrosheet.org has logs of pretty much every baseball game, going back to the forming of major leagues in the 1870s. What they don’t have, as best I can figure, is a list of all the times each possible baseball score has turned up. That I could probably work out, when I feel up to writing the scripts necessary, but “work”? Ugh.
Some people have done the work, although they haven’t shared all the results. I don’t blame them; the full results make for a boring sort of page. “The Most Popular Scores In Baseball History”, at ValueOverReplacementGrit.com, reports the top ten most common scores from 1871 through 2010. The essay also mentions that as of then there were 611 unique final scores. And that lets me give some partial results, if we trust that blogger post from people I never heard of before are accurate and true. I will make that assumption over and over here.
There’s, in principle, no limit to how many scores are possible. Baseball contains many implied infinities, and it’s not impossible that a game could end, say, 580 to 578. But it seems likely that after 139 seasons of play there can’t be all that many more scores practically achievable.
Suppose then there are 611 possible baseball score outcomes, and that each of them is equally likely. Then the information-theory content of a score’s outcome is negative one times the logarithm, base two, of 1/611. That’s a number a little bit over nine and a quarter. You could deduce the score for a given game by asking usually nine, sometimes ten, yes-or-no questions from a source that knew the outcome. That’s a little higher than the 8.7 I worked out for football. And it’s a bit less than the 10.8 I estimate for college basketball.
(You may wonder how incompetent baseball players of the 1870s were that a game could get to 49-33. Not so bad as you imagine. But the equipment and conditions they were playing with were unspeakably bad by modern standards. Notably, the playing field couldn’t be counted on to be flat and level and well-mowed. There would be unexpected divots or irregularities. This makes even simple ground balls hard to field. The baseball, instead of being replaced with every batter, would stay in the game. It would get beaten until it was a little smashed shell of unpredictable dynamics and barely any structural integrity. People were playing without gloves. If a game ran long enough, they would play at dusk, without lights, with a muddy ball on a dusty field. And sometimes you just have four innings that get out of control.)
What’s needed is a guide to what are the common scores and what are the rare scores. And I haven’t found that, nor worked up the energy to make the list myself. But I found some promising partial results. In a September 2008 post on Baseball-Fever.com, user weskelton listed the 24 most common scores and their frequency. This was for games from 1993 to 2008. One might gripe that the list only covers fifteen years. True enough, but if the years are representative that’s fine. And the top scores for the fifteen-year survey look to be pretty much the same as the 139-year tally. The 24 most common scores add up to just over sixty percent of all baseball games, which leaves a lot of scores unaccounted for. I am amazed that about three in five games will have a score that’s one of these 24 choices though.
But that’s something. We can calculate the information content for the 25 outcomes, one each of the 24 particular scores and one for “other”. This will under-estimate the information content. That’s because “other” is any of 587 possible outcomes that we’re not distinguishing. But if we have a lower bound and an upper bound, then we’ve learned something about what the number we want can actually be. The upper bound is that 9.25, above.
The information content, the entropy, we calculate from the probability of each outcome. We don’t know what that is. Not really. But we can suppose that the frequency of each outcome is close to its probability. If there’ve been a lot of games played, then the frequency of a score and the probability of a score should be close. At least they’ll be close if games are independent, if the score of one game doesn’t affect another’s. I think that’s close to true. (Some games at the end of pennant races might affect each other: why try so hard to score if you’re already out for the year? But there’s few of them.)
The entropy then we find by calculating, for each outcome, a product. It’s minus one times the probability of that outcome times the base-two logarithm of the probability of that outcome. Then add up all those products. There’s good reasons for doing it this way and in the college-basketball link above I give some rough explanations of what the reasons are. Or you can just trust that I’m not lying or getting things wrong on purpose.
So let’s suppose I have calculated this right, using the 24 distinct outcomes and the one “other” outcome. That makes out the information content of a baseball score’s outcome to be a little over 3.76 bits.
As said, that’s a low estimate. Lumping about two-fifths of all games into the single category “other” drags the entropy down.
But that gives me a range, at least. A baseball game’s score seems to be somewhere between about 3.76 and 9.25 bits of information. I expect that it’s closer to nine bits than it is to four bits, but will have to do a little more work to make the case for it.
Last month, Sarcastic Goat asked me how interesting a soccer game was. This is “interesting” in the information theory sense. I describe what that is in a series of posts, linked to from above. That had been inspired by the NCAA “March Madness” basketball tournament. I’d been wondering about the information-theory content of knowing the outcome of the tournament, and of each game.
This measure, called the entropy, we can work out from knowing how likely all the possible outcomes of something — anything — are. If there’s two possible outcomes and they’re equally likely, the entropy is 1. If there’s two possible outcomes and one is a sure thing while the other can’t happen, the entropy is 0. If there’s four possible outcomes and they’re all equally likely, the entropy is 2. If there’s eight possible outcomes, all equally likely, the entropy is 3. If there’s eight possible outcomes and some are likely while some are long shots, the entropy is … smaller than 3, but bigger than 0. The entropy grows with the number of possible outcomes and shrinks with the number of unlikely outcomes.
But it’s easy to calculate. List all the possible outcomes. Find the probability of each of those possible outcomes happening. Then, calculate minus one times the probability of each outcome times the logarithm, base two, of that outcome. For each outcome, so yes, this might take a while. Then add up all those products.
I’d estimated the outcome of the 63-game basketball tournament was somewhere around 48 bits of information. There’s a fair number of foregone, or almost foregone, conclusions in the game, after all. And I guessed, based on a toy model of what kinds of scores often turn up in college basketball games, that the game’s score had an information content of a little under 11 bits of information.
Sarcastic Goat, as I say, asked about soccer scores. I don’t feel confident that I could make up a plausible model of soccer score distributions. So I went looking for historical data. Surely, a history of actual professional soccer scores over a couple decades would show all the possible, plausible, outcomes and how likely each was to turn out.
As you’d figure, there are a lot of freakish scores; only once in professional football history has the game ended 62-28. (Although it’s ended 62-14 twice, somehow.) There hasn’t been a 2-0 game since the second week of the 1938 season. Some scores turn up a lot; 248 games (as of this writing) have ended 20-17. That’s the most common score, in its records. 27-24 and 17-14 are the next most common scores. If I’m not making a dumb mistake, 7-0 is the 21st most common score. 93 games have ended with that tally. But it hasn’t actually been a game’s final score since the 14th week of the 1983 season, somehow. 98 games have ended 21-17; only ten have ended 21-18. Weird.
Anyway, there’s 1,026 recorded outcomes. That’s surely as close to “all the possible outcomes” as we can expect to get, at least until the Jets manage to lose 74-0 in their home opener. But if all 1,026 outcomes were equally likely then the information content of the game’s score would be a touch over 10 bits. But these outcomes aren’t all equally likely. It’s vastly more likely that a game ended 16-13 than it is likely it ended 16-8.
Let’s suppose I didn’t make any stupid mistakes in working out the frequency of all the possible outcomes. Then the information content of a football game’s outcome is a little over 8.72 bits.
Don’t be too hypnotized by the digits past the decimal. It’s approximate. But it suggests that if you were asking a source that would only answer ‘yes’ or ‘no’, then you could expect to get the score for any particular football game with about nine well-chosen questions.
I’m not surprised this is less than my estimated information content of a basketball game’s score. I think basketball games see a wider range of likely scores than football games do.
If someone has a reference for the outcomes of soccer games — or other sports — over a reasonably long time please let me know. I can run the same sort of calculation. We might even manage the completely pointless business of ranking all major sports by the information content of their scores.
The Kullback-Leibler Divergence comes to us from information theory. It’s also known as “information divergence” or “relative entropy”. Entropy is by now a familiar friend. We got to know it through, among other things, the “How interesting is a basketball tournament?” question. In this context, entropy is a measure of how surprising it would be to know which of several possible outcomes happens. A sure thing has an entropy of zero; there’s no potential surprise in it. If there are two equally likely outcomes, then the entropy is 1. If there are four equally likely outcomes, then the entropy is 2. If there are four possible outcomes, but one is very likely and the other three mediocre, the entropy might be low, say, 0.5 or so. It’s mostly but not perfectly predictable.
Suppose we have a set of possible outcomes for something. (Pick anything you like. It could be the outcomes of a basketball tournament. It could be how much a favored stock rises or falls over the day. It could be how long your ride into work takes. As long as there are different possible outcomes, we have something workable.) If we have a probability, a measure of how likely each of the different outcomes is, then we have a probability distribution. More likely things have probabilities closer to 1. Less likely things have probabilities closer to 0. No probability is less than zero or more than 1. All the probabilities added together sum up to 1. (These are the rules which make something a probability distribution, not just a bunch of numbers we had in the junk drawer.)
The Kullback-Leibler Divergence describes how similar two probability distributions are to one another. Let me call one of these probability distributions p. I’ll call the other one q. We have some number of possible outcomes, and we’ll use k as an index for them. pk is how likely, in distribution p, that outcome number k is. qk is how likely, in distribution q, that outcome number k is.
To calculate this divergence, we work out, for each k, the number pk times the logarithm of pk divided by qk. Here the logarithm is base two. Calculate all this for every one of the possible outcomes, and add it together. This will be some number that’s at least zero, but it might be larger.
The closer that distribution p and distribution q are to each other, the smaller this number is. If they’re exactly the same, this number will be zero. The less that distribution p and distribution q are like each other, the bigger this number is.
And that’s all good fun, but, why bother with it? And at least one answer I can give is that it lets us measure how good a model of something is.
Suppose we think we have an explanation for how something varies. We can say how likely it is we think there’ll be each of the possible different outcomes. This gives us a probability distribution which let’s call q. We can compare that to actual data. Watch whatever it is for a while, and measure how often each of the different possible outcomes actually does happen. This gives us a probability distribution which let’s call p.
If our model is a good one, then the Kullback-Leibler Divergence between p and q will be small. If our model’s a lousy one, then this divergence will be large. If we have a couple different models, we can see which ones make for smaller divergences and which ones make for larger divergences. Probably we’ll want smaller divergences.
Here you might ask: why do we need a model? Isn’t the actual data the best model we might have? It’s a fair question. But no, real data is kind of lousy. It’s all messy. It’s complicated. We get extraneous little bits of nonsense clogging it up. And the next batch of results is going to be different from the old ones anyway, because real data always varies.
Furthermore, one of the purposes of a model is to be simpler than reality. A model should do away with complications so that it is easier to analyze, easier to make predictions with, and easier to teach than the reality is. But a model mustn’t be so simple that it can’t represent important aspects of the thing we want to study.
The Kullback-Leibler Divergence is a tool that we can use to quantify how much better one model or another fits our data. It also lets us quantify how much of the grit of reality we lose in our model. And this is at least some of the use of this quantity.
The United States is about to spend a good bit of time worrying about the NCAA men’s basketball tournament. It’s a good distraction from the women’s basketball tournament and from the National Invitational Tournament. Last year I used this to write a couple essays that stepped into information theory. Nobody knowledgeable in information theory has sent me threatening letters since. So since the inspiration is back in season I’d like to bring them to your attention again:
But How Interesting Is A Real Basketball Tournament? Because I started out assuming that games were perfectly even match ups either team was likely to win. This isn’t so. If we grant that a number-16 seed is almost sure to lose to a number-1 seed, how does the information content change?
As anyone who tries to be funny knows, some words just are funnier than others. Or at least they sound funnier. (This brings us into the problem of whether something is actually funny or whether we just think it is.) Westbury, Shaoul, Moroschan, and Ramscar try testing whether a common measure of how unpredictable something is — the entropy, a cornerstone of information theory — can tell us how funny a word might be.
We’ve encountered entropy in these parts before. I used it in that series earlier this year about how interesting a basketball tournament was. Entropy, in this context, is low if something is predictable. It gets higher the more unpredictable the thing being studied is. You see this at work in auto-completion: if you have typed in ‘th’, it’s likely your next letter is going to be an ‘e’. This reflects the low entropy of ‘the’ as an english word. It’s rather unlikely the next letter will be ‘j’, because English has few contexts that need ‘thj’ to be written out. So it will suggest words that start ‘the’ (and ‘tha’, and maybe even ‘thi’), while giving ‘thj’ and ‘thq’ and ‘thd’ a pass.
Westbury, Shaoul, Moroschan, and Ramscar found that the entropy of a word, how unlikely that collection of letters is to appear in an English word, matches quite well how funny people unfamiliar with it find it. This fits well with one of the more respectable theories of comedy, Arthur Schopenhauer’s theory that humor comes about from violating expectations. That matches well with unpredictability.
Of course it isn’t just entropy that makes words funny. Anyone trying to be funny learns that soon enough, since a string of perfect nonsense is also boring. But this is one of the things that can be measured and that does have an influence.
(I doubt there is any one explanation for why things are funny. My sense is that there are many different kinds of humor, not all of them perfectly compatible. It would be bizarre if any one thing could explain them all. But explanations for pieces of them are plausible enough.)
This month Peter Mander’s CarnotCycle blog talks about the interesting world of statistical equilibriums. And particularly it talks about stable equilibriums. A system’s in equilibrium if it isn’t going to change over time. It’s in a stable equilibrium if being pushed a little bit out of equilibrium isn’t going to make the system unpredictable.
For simple physical problems these are easy to understand. For example, a marble resting at the bottom of a spherical bowl is in a stable equilibrium. At the exact bottom of the bowl, the marble won’t roll away. If you give the marble a little nudge, it’ll roll around, but it’ll stay near where it started. A marble sitting on the top of a sphere is in an equilibrium — if it’s perfectly balanced it’ll stay where it is — but it’s not a stable one. Give the marble a nudge and it’ll roll away, never to come back.
In statistical mechanics we look at complicated physical systems, ones with thousands or millions or even really huge numbers of particles interacting. But there are still equilibriums, some stable, some not. In these, stuff will still happen, but the kind of behavior doesn’t change. Think of a steadily-flowing river: none of the water is staying still, or close to it, but the river isn’t changing.
CarnotCycle describes how to tell, from properties like temperature and pressure and entropy, when systems are in a stable equilibrium. These are properties that don’t tell us a lot about what any particular particle is doing, but they can describe the whole system well. The essay is higher-level than usual for my blog. But if you’re taking a statistical mechanics or thermodynamics course this is just the sort of essay you’ll find useful.
In terms of simplicity, purely mechanical systems have an advantage over thermodynamic systems in that stability and instability can be defined solely in terms of potential energy. For example the center of mass of the tower at Pisa, in its present state, must be higher than in some infinitely near positions, so we can conclude that the structure is not in stable equilibrium. This will only be the case if the tower attains the condition of metastability by returning to a vertical position or absolute stability by exceeding the tipping point and falling over.
Thermodynamic systems lack this simplicity, but in common with purely mechanical systems, thermodynamic equilibria are always metastable or stable, and never unstable. This is equivalent to saying that every spontaneous (observable) process proceeds towards an equilibrium state, never away from it.
If we restrict our attention to a thermodynamic system of unchanging composition and apply…
I do like looking for thematic links among the comic strips that mention mathematical topics that I gather for these posts. This time around all I can find is a theme of “nothing big going on”. I’m amused by some of them but don’t think there’s much depth to the topics. But I like them anyway.
Mark Anderson’s Andertoons gets its appearance here with the May 25th strip. And it’s a joke about the hatred of fractions. It’s a suitable one for posting in mathematics classes too, since it is right about naming three famous irrational numbers — pi, the “golden ratio” phi, and the square root of two — and the fact they can’t be written as fractions which use only whole numbers in the numerator and denominator. Pi is, well, pi. The square root of two is nice and easy to find, and has that famous legend about the Pythagoreans attached to it. And it’s probably the easiest number to prove is irrational.
The “golden ratio” is that number that’s about 1.618. It’s interesting because 1 divided by phi is about 0.618, and who can resist a symmetry like that? That’s about all there is to say for it, though. The golden ratio is otherwise a pretty boring number, really. It’s gained celebrity as an “ideal” ratio — that a rectangle with one side about 1.6 times as long as the other is somehow more appealing than other choices — but that’s rubbish. It’s a novelty act among numbers. Novelty acts are fun, but we should appreciate them for what they are.
Entropy is hard to understand. It’s deceptively easy to describe, and the concept is popular, but to understand it is challenging. In this month’s entry CarnotCycle talks about thermodynamic entropy and where it comes from. I don’t promise you will understand it after this essay, but you will be closer to understanding it.
Reversible change is a key concept in classical thermodynamics. It is important to understand what is meant by the term as it is closely allied to other important concepts such as equilibrium and entropy. But reversible change is not an easy idea to grasp – it helps to be able to visualize it.
Reversibility and mechanical systems
The simple mechanical system pictured above provides a useful starting point. The aim of the experiment is to see how much weight can be lifted by the fixed weight M1. Experience tells us that if a small weight M2 is attached – as shown on the left – then M1 will fall fast while M2 is pulled upwards at the same speed.
Experience also tells us that as the weight of M2 is increased, the lifting speed will decrease until a limit is reached when the weight difference between M2 and M1 becomes…
In this information-theory context, an experiment is just anything that could have different outcomes. A team can win or can lose or can tie in a game; that makes the game an experiment. The outcomes are the team wins, or loses, or ties. A team can get a particular score in the game; that makes that game a different experiment. The possible outcomes are the team scores zero points, or one point, or two points, or so on up to whatever the greatest possible score is.
If you know the probability p of each of the different outcomes, and since this is a mathematics thing we suppose that you do, then we have what I was calling the information content of the outcome of the experiment. That’s a number, measured in bits, and given by the formula
The sigma summation symbol means to evaluate the expression to the right of it for every value of some index j. The pj means the probability of outcome number j. And the logarithm may be that of any base, although if we use base two then we have an information content measured in bits. Those are the same bits as are in the bytes that make up the megabytes and gigabytes in your computer. You can see this number as an estimate of how many well-chosen yes-or-no questions you’d have to ask to pick the actual result out of all the possible ones.
I’d called this the information content of the experiment’s outcome. That’s an idiosyncratic term, chosen because I wanted to hide what it’s normally called. The normal name for this is the “entropy”.
To be more precise, it’s known as the “Shannon entropy”, after Claude Shannon, pioneer of the modern theory of information. However, the equation defining it looks the same as one that defines the entropy of statistical mechanics, that thing everyone knows is always increasing and somehow connected with stuff breaking down. Well, almost the same. The statistical mechanics one multiplies the sum by a constant number called the Boltzmann constant, after Ludwig Boltzmann, who did so much to put statistical mechanics in its present and very useful form. We aren’t thrown by that. The statistical mechanics entropy describes energy that is in a system but that can’t be used. It’s almost background noise, present but nothing of interest.
Is this Shannon entropy the same entropy as in statistical mechanics? This gets into some abstract grounds. If two things are described by the same formula, are they the same kind of thing? Maybe they are, although it’s hard to see what kind of thing might be shared by “how interesting the score of a basketball game is” and “how much unavailable energy there is in an engine”.
The legend has it that when Shannon was working out his information theory he needed a name for this quantity. John von Neumann, the mathematician and pioneer of computer science, suggested, “You should call it entropy. In the first place, a mathematical development very much like yours already exists in Boltzmann’s statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage.” There are variations of the quote, but they have the same structure and punch line. The anecdote appears to trace back to an April 1961 seminar at MIT given by one Myron Tribus, who claimed to have heard the story from Shannon. I am not sure whether it is literally true, but it does express a feeling about how people understand entropy that is true.
Well, these entropies have the same form. And they’re given the same name, give or take a modifier of “Shannon” or “statistical” or some other qualifier. They’re even often given the same symbol; normally a capital S or maybe an H is used as the quantity of entropy. (H tends to be more common for the Shannon entropy, but your equation would be understood either way.)
I’m not comfortable saying they’re the same thing, though. After all, we use the same formula to calculate a batting average and to work out the average time of a commute. But we don’t think those are the same thing, at least not more generally than “they’re both averages”. These entropies measure different kinds of things. They have different units that just can’t be sensibly converted from one to another. And the statistical mechanics entropy has many definitions that not just don’t have parallels for information, but wouldn’t even make sense for information. I would call these entropies siblings, with strikingly similar profiles, but not more than that.
But let me point out something about the Shannon entropy. It is low when an outcome is predictable. If the outcome is unpredictable, presumably knowing the outcome will be interesting, because there is no guessing what it might be. This is where the entropy is maximized. But an absolutely random outcome also has a high entropy. And that’s boring. There’s no reason for the outcome to be one option instead of another. Somehow, as looked at by the measure of entropy, the most interesting of outcomes and the most meaningless of outcomes blur together. There is something wondrous and strange in that.
I’d worked out an estimate of how much information content there is in a basketball score, by which I was careful to say the score that one team manages in a game. I wasn’t able to find out what the actual distribution of real-world scores was like, unfortunately, so I made up a plausible-sounding guess: that college basketball scores would be distributed among the imaginable numbers (whole numbers from zero through … well, infinitely large numbers, though in practice probably not more than 150) according to a very common distribution called the “Gaussian” or “normal” distribution, that the arithmetic mean score would be about 65, and that the standard deviation, a measure of how spread out the distribution of scores is, would be about 10.
If those assumptions are true, or are at least close enough to true, then there are something like 5.4 bits of information in a single team’s score. Put another way, if you were trying to divine the score by asking someone who knew it a series of carefully-chosen questions, like, “is the score less than 65?” or “is the score more than 39?”, with at each stage each question equally likely to be answered yes or no, you could expect to hit the exact score with usually five, sometimes six, such questions.
When I worked out how interesting, in an information-theory sense, a basketball game — and from that, a tournament — might be, I supposed there was only one thing that might be interesting about the game: who won? Or to be exact, “did (this team) win”? But that isn’t everything we might want to know about a game. For example, we might want to know what a team scored. People often do. So how to measure this?
When I wrote about how interesting the results of a basketball tournament were, and came to the conclusion that it was 63 (and filled in that I meant 63 bits of information), I was careful to say that the outcome of a basketball game between two evenly-matched opponents has an information content of 1 bit. If the game is a foregone conclusion, then the game hasn’t got so much information about it. If the game really is foregone, the information content is 0 bits; you already know what the result will be. If the game is an almost sure thing, there’s very little information to be had from actually seeing the game. An upset might be thrilling to watch, but you would hardly count on that, if you’re being rational. But most games aren’t sure things; we might expect the higher-seed to win, but it’s plausible they don’t. How does that affect how much information there is in the results of a tournament?
Last year, the NCAA College Men’s Basketball tournament inspired me to look up what the outcomes of various types of matches were, and which teams were more likely to win than others. If some person who wrote something for statistics.about.com is correct, based on 27 years of March Madness outcomes, the play between a number one and a number 16 seed is a foregone conclusion — the number one seed always wins — while number two versus number 15 is nearly sure. So while the first round of play will involve 32 games — four regions, each region having eight games — there’ll be something less than 32 bits of information in all these games, since many of them are so predictable.
So if the eight contests in a single region were all evenly matched, the information content of that region would be 8 bits. But there’s one sure and one nearly-sure game in there, and there’s only a couple games where the two teams are close to evenly matched. As a result, I make out the information content of a single region to be about 5.392 bits of information. Since there’s four regions, that means the first round of play — the first 32 games — have altogether about 21.567 bits of information.
Warning: I used three digits past the decimal point just because three is a nice comfortable number. Do not by hypnotized into thinking this is a more precise measure than it really is. I don’t know what the precise chance of, say, a number three seed beating a number fourteen seed is; all I know is that in a 27-year sample, it happened the higher-seed won 85 percent of the time, so the chance of the higher-seed winning is probably close to 85 percent. And I only know that if whoever it was wrote this article actually gathered and processed and reported the information correctly. I would not be at all surprised if the first round turned out to have only 21.565 bits of information, or as many as 21.568.
A statistical analysis of the tournaments which I dug up last year indicated that in the last three rounds — the Elite Eight, Final Four, and championship game — the higher- and lower-seeded teams are equally likely to win, and therefore those games have an information content of 1 bit per game. The last three rounds therefore have 7 bits of information total.
Unfortunately, experimental data seems to fall short for the second round — 16 games, where the 32 winners in the first round play, producing the Sweet Sixteen teams — and the third round — 8 games, producing the Elite Eight. If someone’s done a study of how often the higher-seeded team wins I haven’t run across it.
There are six of these games in each of the four regions, for 24 games total. Presumably the higher-seeded is more likely than the lower-seeded to win, but I don’t know how much more probable it is the higher-seed will win. I can come up with some bounds: the 24 games total in the second and third rounds can’t have an information content less than 0 bits, since they’re not all foregone conclusions. The higher-ranked seed won’t win all the time. And they can’t have an information content of more than 24 bits, since that’s how much there would be if the games were perfectly even matches.
So, then: the first round carries about 21.567 bits of information. The second and third rounds carry between 0 and 24 bits. The fourth through sixth rounds (the sixth round is the championship game) carry seven bits. Overall, the 63 games of the tournament carry between 28.567 and 52.567 bits of information. I would expect that many of the second-round and most of the third-round games are pretty close to even matches, so I would expect the higher end of that range to be closer to the true information content.
Let me make the assumption that in this second and third round the higher-seed has roughly a chance of 75 percent of beating the lower seed. That’s a number taken pretty arbitrarily as one that sounds like a plausible but not excessive advantage the higher-seeded teams might have. (It happens it’s close to the average you get of the higher-seed beating the lower-seed in the first round of play, something that I took as confirming my intuition about a plausible advantage the higher seed has.) If, in the second and third rounds, the higher-seed wins 75 percent of the time and the lower-seed 25 percent, then the outcome of each game is about 0.8113 bits of information. Since there are 24 games total in the second and third rounds, that suggests the second and third rounds carry about 19.471 bits of information.
Warning: Again, I went to three digits past the decimal just because three digits looks nice. Given that I do not actually know the chance a higher-seed beats a lower-seed in these rounds, and that I just made up a number that seems plausible you should not be surprised if the actual information content turns out to be 19.468 or even 19.472 bits of information.
Taking all these numbers, though — the first round with its something like 21.567 bits of information; the second and third rounds with something like 19.471 bits; the fourth through sixth rounds with 7 bits — the conclusion is that the win/loss results of the entire 63-game tournament are about 48 bits of information. It’s a bit higher the more unpredictable the games involving the final 32 and the Sweet 16 are; it’s a bit lower the more foregone those conclusions are. But 48 bits sounds like a plausible enough answer to me.
When I wrote last weekend’s piece about how interesting a basketball tournament was, I let some terms slide without definition, mostly so I could explain what ideas I wanted to use and how they should relate. My love, for example, read the article and looked up and asked what exactly I meant by “interesting”, in the attempt to measure how interesting a set of games might be, even if the reasoning that brought me to a 63-game tournament having an interest level of 63 seemed to satisfy.
When I spoke about something being interesting, what I had meant was that it’s something whose outcome I would like to know. In mathematical terms this “something whose outcome I would like to know” is often termed an `experiment’ to be performed or, even better, a `message’ that presumably I wil receive; and the outcome is the “information” of that experiment or message. And information is, in this context, something you do not know but would like to.
So the information content of a foregone conclusion is low, or at least very low, because you already know what the result is going to be, or are pretty close to knowing. The information content of something you can’t predict is high, because you would like to know it but there’s no (accurately) guessing what it might be.
This seems like a straightforward idea of what information should mean, and it’s a very fruitful one; the field of “information theory” and a great deal of modern communication theory is based on them. This doesn’t mean there aren’t some curious philosophical implications, though; for example, technically speaking, this seems to imply that anything you already know is by definition not information, and therefore learning something destroys the information it had. This seems impish, at least. Claude Shannon, who’s largely responsible for information theory as we now know it, was renowned for jokes; I recall a Time Life science-series book mentioning how he had built a complex-looking contraption which, turned on, would churn to life, make a hand poke out of its innards, and turn itself off, which makes me smile to imagine. Still, this definition of information is a useful one, so maybe I’m imagining a prank where there’s not one intended.
And something I hadn’t brought up, but which was hanging awkwardly loose, last time was: granted that the outcome of a single game might have an interest level, or an information content, of 1 unit, what’s the unit? If we have units of mass and length and temperature and spiciness of chili sauce, don’t we have a unit of how informative something is?
We have. If we measure how interesting something is — how much information there is in its result — using base-two logarithms the way we did last time, then the unit of information is a bit. That is the same bit that somehow goes into bytes, which go on your computer into kilobytes and megabytes and gigabytes, and onto your hard drive or USB stick as somehow slightly fewer gigabytes than the label on the box says. A bit is, in this sense, the amount of information it takes to distinguish between two equally likely outcomes. Whether that’s a piece of information in a computer’s memory, where a 0 or a 1 is a priori equally likely, or whether it’s the outcome of a basketball game between two evenly matched teams, it’s the same quantity of information to have.
So a March Madness-style tournament has an information content of 63 bits, if all you’re interested in is which teams win. You could communicate the outcome of the whole string of matches by indicating whether the “home” team wins or loses for each of the 63 distinct games. You could do it with 63 flashes of light, or a string of dots and dashes on a telegraph, or checked boxes on a largely empty piece of graphing paper, coins arranged tails-up or heads-up, or chunks of memory on a USB stick. We’re quantifying how much of the message is independent of the medium.