Let Me Remind You How Interesting a Basketball Tournament Is

Several years ago I stumbled into a nice sequence. All my nice sequences have been things I stumbled upon. This one looked at the most basic elements of information theory by what they tell us about the NCAA College Basketball tournament. This is (in the main) a 64-team single-elimination playoff. It’s been a few years since I ran through the sequence. But it’s been a couple years since the tournament could be run with a reasonably clear conscience too. So here’s my essays:

How Interesting Is A Basketball Tournament? 63.
What We Talk About When We Talk About How Interesting What We’re Talking About Is, that is, 63 what.
But How Interesting Is A Real Basketball Tournament? considering it’s not really 63 exactly.
But How Interesting Is A Basketball Score? I make it out at about 5 ¹/₃.
Doesn’t The Other Team Count? How Much? I figure about 10.8.
A Little More Talk About What We Talk About When We Talk About How Interesting What We Talk About Is to put some technical terms in, and ask a question I admit I can’t answer.

And this spins off to questions about other sports events.

How Interesting Is A Football Score? Something like 8.72.
How Interesting Is A Baseball Score? Some Partial Results The historical record suggests it’s between 3.76 and 9.25.
How Interesting Is A Baseball Score? Some Further Results I have reasons to say it’s about 7.
How Interesting Is A Low-Scoring Game? If I make up numbers, then I come to say it’s about 5.8.

And I still figure to get to this year’s Pi Day comic strips. Soon. It’s been a while since I felt I had so much to write up.

My 2019 Mathematics A To Z: Encryption schemes

Today’s A To Z term is encryption schemes. It’s another suggested by aajohannas. It’s a chance to dip into information theory.

Mr Wu, author of the Mathtuition88 blog, suggested the Extreme Value Theorem. I was tempted and then realized that I had written this in the 2018 A-to-Z, as the “X” letter. The end of the alphabet has a shortage of good mathematics words. Sometimes we have to work around problems.

Cartoony banner illustration of a coati, a raccoon-like animal, flying a kite in the clear autumn sky. A skywriting plane has written 'MATHEMATIC A TO Z'; the kite, with the letter 'S' on it to make the word 'MATHEMATICS'. — Art by Thomas K Dye, creator of the web comics **Projection Edge**, **Newshounds**, **Infinity Refugees**, and **Something Happens**. He’s on Twitter as @projectionedge. You can get to read **Projection Edge** six months early by subscribing to his Patreon.

Encryption schemes.

Why encrypt anything?

The oldest reason is to hide a message, at least from all but select recipients. Ancient encryption methods will substitute one letter for another, or will mix up the order of letters in a message. This won’t hide a message forever. But it will slow down a person trying to decrypt the message until they decide they don’t need to know what it says. Or decide to bludgeon the message-writer into revealing the secret.

Substituting one letter for another won’t stop an eavesdropper from working out the message. Not indefinitely, anyway. There are patterns in the language. Any language, but take English as an example. A single-letter word is either ‘I’ or ‘A’. A two-letter word has a great chance of being ‘in’, ‘on’, ‘by’, ‘of’, ‘an’, or a couple other choices. Solving this is a fun pastime, for people who like this. If you need it done fast, let a computer work it out.

To hide the message better requires being cleverer. For example, you could substitue letters according to a slightly different scheme for each letter in the original message. The Vignère cipher is an example of this. I remember some books from my childhood, written in the second person. They had programs that you-the-reader could type in to live the thrill of being a child secret agent computer programmer. This encryption scheme was one of the programs used for passing on messages. We can make the plans more complicated yet, but that won’t give us better insight yet.

The objective is to turn the message into something less predictable. An encryption which turns, say, ‘the’ into ‘rgw’ will slow the reader down. But if they pay attention and notice, oh, the text also has the words ‘rgwm’, ‘rgey’, and rgwb’ turn up a lot? It’s hard not to suspect these are ‘the’, ‘them’, ‘they’, and ‘then’. If a different three-letter code is used for every appearance of ‘the’, good. If there’s a way to conceal the spaces as something else, that’s even better, if we want it harder to decrypt the message.

So the messages hardest to decrypt should be the most random. We can give randomness a precise definition. We owe it to information theory, which is the study of how to encode and successfully transmit and decode messages. In this, the information content of a message is its entropy. Yes, the same word as used to describe broken eggs and cream stirred into coffee. The entropy measures how likely each possible message is. Encryption matches the message you really want with a message of higher entropy. That is, one that’s harder to predict. Decrypting reverses that matching.

So what goes into a message? We call them words, or codewords, so we have a clear noun to use. A codeword is a string of letters from an agreed-on alphabet. The terminology draws from common ordinary language. Cryptography grew out of sending sentences.

But anything can be the letters of the alphabet. Any string of them can be a codeword. An unavoidable song from my childhood told the story of a man asking his former lover to tie a yellow ribbon around an oak tree. This is a tiny alphabet, but it only had to convey two words, signalling whether she was open to resuming their relationship. Digital computers use an alphabet of two memory states. We label them ‘0’ and ‘1’, although we could as well label them +5 and -5, or A and B, or whatever. It’s not like actual symbols are scrawled very tight into the chips. Morse code uses dots and dashes and short and long pauses. Naval signal flags have a set of shapes and patterns to represent the letters of the alphabet, as well as common or urgent messages. There is not a single universally correct number of letters or length of words for encryption. It depends on what the code will be used for, and how.

Naval signal flags help me to my next point. There’s a single pattern which, if shown, communicates the message “I require a pilot”. Another, “I am on fire and have dangerous cargo”. Still another, “All persons should report on board as the vessel is about to set to sea”. These are whole sentences; they’re encrypted into a single letter.

And this is the second great use of encryption. English — any human language — has redundancy to it. Think of the sentence “No, I’d rather not go out this evening”. It’s polite, but is there anything in it not communicated by texting back “N”? An encrypted message is, often, shorter than the original. To send a message costs something. Time, if nothing else. To send it more briefly is typically better.

There are dangers to this. Strike out any word from “No, I’d rather not go out this evening”. Ask someone to guess what belongs there. Only the extroverts will have trouble. I guess if you strike out “evening” people might guess “time” or “weekend” or something. The sentiment of the sentence endures.

But strike out a letter from “N” and ask someone to guess what was meant. And this is a danger of encryption. The encrypted message has a higher entropy, a higher unpredictability. If some mistake happens in transmission, we’re lost.

We can fight this. It’s possible to build checks into an encryption. To carry a bit of extra information that lets one know that the message was garbled. These are “error-detecting codes”. It’s even possible to carry enough extra information to correct some errors. These are “error-correcting codes”. There are limits, of course. This kind of error-correcting takes calculation time and message space. We lose some economy but gain reliability. There is a general lesson in this.

And not everything can compress. There are (if I’m reading this right) 26 letter, 10 numeral, and four repeater flags used under the International Code of Symbols. So there are at most 40 signals that could be reduced to a single flag. If we need to communicate “I am on fire but have no dangerous cargo” we’re at a loss. We have to spell things out more. It’s a quick proof, by way of the pigeonhole principle, which tells us that not every message can compress. But this is all right. There are many messages we will never need to send. (“I am on fire and my cargo needs updates on Funky Winkerbean.”) If it’s mostly those that have no compressed version, who cares?

Encryption schemes are almost as flexible as language itself. There are families of kinds of schemes. This lets us fit schemes to needs: how many different messages do we need to be able to send? How sure do we need to be that errors are corrected? Or that errors are detected? How hard do we want it to be for eavesdroppers to decode the message? Are we able to set up information with the intended recipients separately? What we need, and what we are willing to do without, guide the scheme we use.

Thank you again for reading. All of Fall 2019 A To Z posts should be at this link. I hope to have a letter F piece on Thursday. All of the A To Z essays should be at this link and if I can sort out some trouble with the first two, they will be soon. And if you’d like to nominate topics for essays, I’m asking for the letters I through N at this link.

Let Me Tell You How Interesting March Madness Could Possibly Be

I read something alarming in the daily “Best of GoComics” e-mail this morning. It was a panel of Dave Whamond’s Reality Check. It’s a panel comic, although it stands out from the pack by having a squirrel character in the margins. And here’s the panel.

Three mathematicians standing around chalkboards. One says: 'My pick is (9 +/- sqrt(5)). What's your bracket?' '(x + 3).' '(n - 1).' Caption: 'March Mathness'. — Dave Whamond’s **Reality Check** for the 2nd of March, 2019. **Edge City** inspires discussions in the essays at this link. I don’t know when or how **Reality Check** dropped off my comics page. It must have been after October 2018. Here’s essays including **Reality Check**, to serve as proof. I probably won’t go reading five months’ worth of the strip to get all the strips I’d missed. But the strip should return to my regular Reading the Comics posts now.

Certainly a solid enough pun to rate a mention. I don’t know of anyone actually doing a March Mathness bracket, but it’s not a bad idea. Rating mathematical terms for their importance or usefulness or just beauty might be fun. And might give a reason to talk about their meaning some. It’s a good angle to discuss what’s interesting about mathematical terms.

And that lets me segue into talking about a set of essays. The next few weeks see the NCAA college basketball tournament, March Madness. I’ve used that to write some stuff about information theory, as it applies to the question: is a basketball game interesting?

How Interesting Is A Basketball Tournament? leads the series off. From it I answer: 63.
What We Talk About When We Talk About How Interesting What We’re Talking About Is fills in a bit of context. I answer 63 what.
But How Interesting Is A Real Basketball Tournament? starts to shade the answer, reflecting things like how the number 1 seed nearly always beats the number 16 seed. It had never happened, when I wrote this essay. The number 16 beat the number 1 seed for the first time, I think, last year.
But How Interesting Is A Basketball Score? This doesn’t relate to the essay, but a few weeks ago I read a book about the New York Original Celtics, who played — and invented — much of professional basketball in the 1910s and 1920s, and it’s all fascinating but it also mentions newspaper clippings of, like, the greatest game anyone had ever seen and the score was 28 to 25. (The game had much less offense and much more defense back then, plus you couldn’t necessarily count on the hoop having features like a backboard or stuff.)
Doesn’t The Other Team Count? How Much? Earlier I put up an answer about how interesting one team’s score was. The tricky part is the other team has a score, too, and you know it’s not the same as the first team’s. So, how to account for that?
A Little More Talk About What We Talk About When We Talk About How Interesting What We Talk About Is as I circle back around to the 63 or some other number that’s appeared before, and I name just what the 63 of things are.

Along the way here I got to looking up actual scoring results from major sports. This let me estimate the information-theory content of the scores of soccer, (US) football, and baseball scores, to match my estimate of basketball scores’ information content.

How Interesting Is A Football Score? Football scoring is a complicated thing. But I was able to find a trove of historical data to give me an estimate of the information theory content of a score.
How Interesting Is A Baseball Score? Some Partial Results I found some summaries of actual historical baseball scores. Somehow I couldn’t find the detail I wanted for baseball, a sport that since 1845 has kept track of every possible bit of information, including how long the games ran, about every game ever. I made do, though.
How Interesting Is A Baseball Score? Some Further Results Since I found some more detailed summaries and refined the estimate a little.
How Interesting Is A Low-Scoring Game? And here, well, I start making up scores. It’s meant to represent low-scoring games such as soccer, hockey, or baseball to draw some conclusions. This includes the question: just because a distribution of small whole numbers is good for mathematicians, is that a good match for what sports scores are like?

Is A Basketball Tournament Interesting? My Thoughts

It’s a good weekend to bring this back. I have some essays about information theory and sports contests and maybe you missed them earlier. Here goes.

How Interesting Is A Basketball Tournament? I say: 63.
What We Talk About When We Talk About How Interesting What We’re Talking About Is and so I say 63 what.
But How Interesting Is A Real Basketball Tournament? in which I say: Nah, 63 is too interesting. There’s some games you really don’t have to watch to know how they turned out. They’re the ones where the number 1 seed beats up the number 16. And nearly all the ones where the number 2 seed plays the number 15.
But How Interesting Is A Basketball Score? If you want to know more than just who won and who lost ? I say: oh, something like 5.4.
Doesn’t The Other Team Count? How Much? Since you can be pretty sure the two teams in a basketball game won’t tie, what’s that tell you about the two teams’ score? Nothing much, turns out.
A Little More Talk About What We Talk About When We Talk About How Interesting What We Talk About Is and I finally say what these 63 or 5.4 or 10.8 things are. You know the name of them.

And then for a follow-up I started looking into actual scoring results from major sports. This let me estimate the information-theory content of the scores of soccer, (US) football, and baseball scores, to match my estimate of basketball scores’ information content.

How Interesting Is A Football Score? US football has an annoyingly complicated set of scoring rules. But it’s also got enough historical data that I can make an estimate: about 8.7.
How Interesting Is A Baseball Score? Some Partial Results Baseball is even better about keeping statistics than football is, but I couldn’t get at convenient summaries. So I made do with what I could find.
How Interesting Is A Baseball Score? Some Further Results With the help of someone else’s aggregation of data I made a slightly more precise estimate.
How Interesting Is A Low-Scoring Game? This uses distributions, estimates of what kinds of results are likely for low-scoring games such as soccer, hockey, or baseball to draw some conclusions, including questioning whether distributions that mathematicians like are actually good fits for sports.

Don’t try to use this to pass your computer science quals. But I hope it gives you something interesting to talk about while sulking over your brackets, and maybe to read about after that.

The Arthur Christmas Season

I don’t know how you spend your December, but part of it really ought to be done watching the Aardman Animation film Arthur Christmas. It inspired me to ponder a mathematical-physics question that got into some heady territory and this is a good time to point people back to that.

The first piece is Could `Arthur Christmas’ Happen In Real Life? At one point in the movie Arthur and Grand-Santa are stranded on a Caribbean island while the reindeer and sleigh, without them, go flying off in a straight line. This raises the question of what is a straight line if you’re on the surface of something spherical like the Earth. Also, Grand-Santa is such a fantastic idea for the Santa canon it’s hard to believe that Rankin-Bass never did it.

Returning To Arthur Christmas was titled that because I’d left the subject for a couple weeks. You know how it gets. Here the discussion becomes more spoiler-y. And it has to address the question of what kind of straight line the reindeer might move in. There’s several possible answers and they’re all interesting.

Arthur Christmas And The Least Common Multiple supposes that reindeer move as way satellites do. By making some assumptions about the speed of the reindeer and the path they’re taking, I get to see how long Arthur and Grand-Santa would need to wait before the reindeer and sled are back if they’re lucky enough to be waiting on the equator.

Six Minutes Off makes the problem of Arthur and Grand-Santa waiting for the return of flying reindeer more realistic. This involves supposing that they’re not on the equator, which makes meeting up the reindeer a much nastier bit of timing. If they get unlucky it could make their rescue take over five thousand years, which would complicate the movie’s plot some.

And finally Arthur Christmas and the End of Time gets into one of those staggering thoughts. This would be recurrence, an idea that weaves into statistical mechanics and that seems to require that we accept how the conservation of energy and the fact of entropy are, together, a paradox. So we get into considerations of the long-term fate of the universe. Maybe.

How Interesting Is March Madness?

And now let me close the week with some other evergreen articles. A couple years back I mixed the NCAA men’s basketball tournament with information theory to produce a series of essays that fit the title I’ve given this recap. They also sprawl out into (US) football and baseball. Let me link you to them:

How Interesting Is A Basketball Tournament? in which I consider 63 single games and whether one team or the other wins, which is what usually happens.
What We Talk About When We Talk About How Interesting What We’re Talking About Is as I fill in some terminology.
But How Interesting Is A Real Basketball Tournament? noting that sometimes you know who’s going to win a game before the game even starts.
But How Interesting Is A Basketball Score? in which I open information theory to points-shaving.
Doesn’t The Other Team Count? How Much? in which I ponder how to extend the information content of a single score to cover the case of two teams being in the game.
A Little More Talk About What We Talk About When We Talk About How Interesting What We Talk About Is which fills in some more of the terminology and historical content.
How Interesting Is A Football Score? as I try to figure out (US) football as an information-theory puzzle.
How Interesting Is A Baseball Score? Some Partial Results that are based on historic data, so far as I could find, and that should extend to any low-scoring sport.
How Interesting Is A Baseball Score? Some Further Results as I got some other historical data and refined my estimate based on what scores actually turn up a lot, as opposed to that time a game ended with a score of 49 to 33.
How Interesting Is A Low-Scoring Game? in which I address baseball and any other low-scoring sport, such as soccer or hockey, by the simple process of making up data and seeing what those imply.

Reading the Comics, February 6, 2017: Another Pictureless Half-Week Edition

Got another little flood of mathematically-themed comic strips last week and so once again I’ll split them along something that looks kind of middle-ish. Also this is another bunch of GoComics.com-only posts. Since those seem to be accessible to anyone whether or not they’re subscribers indefinitely far into the future I don’t feel like I can put the comics directly up and will trust you all to click on the links that you find interesting. Which is fine; the new GoComics.com design makes it annoyingly hard to download a comic strip. I don’t think that was their intention. But that’s one of the two nagging problems I have with their new design. So you know.

Tony Cochran’s Agnes for the 5th sees a brand-new mathematics. Always dangerous stuff. But mathematicians do invent, or discover, new things in mathematics all the time. Part of the task is naming the things in it. That’s something which takes talent. Some people, such as Leonhard Euler, had the knack a great novelist has for putting names to things. The rest of us muddle along. Often if there’s any real-world inspiration, or resemblance to anything, we’ll rely on that. And we look for terminology that evokes similar ideas in other fields. … And, Agnes would like to know, there is mathematics that’s about approximate answers, being “right around” the desired answer. Unfortunately, that’s hard. (It’s all hard, if you’re going to take it seriously, much like everything else people do.)

Scott Hilburn’s The Argyle Sweater for the 5th is the anthropomorphic numerals joke for this essay.

Carol Lay’s Lay Lines for the 6th depicts the hazards of thinking deeply and hard about the infinitely large and the infinitesimally small. They’re hard. Our intuition seems well-suited to handing a modest bunch of household-sized things. Logic guides us when thinking about the infinitely large or small, but it takes a long time to get truly conversant and comfortable with it all.

Paul Gilligan’s Pooch Cafe for the 6th sees Poncho try to argue there’s thermodynamical reasons for not being kind. Reasoning about why one should be kind (or not) is the business of philosophers and I won’t overstep my expertise. Poncho’s mathematics, that’s something I can write about. He argues “if you give something of yourself, you inherently have less”. That seems to be arguing for a global conservation of self-ness, that the thing can’t be created or lost, merely transferred around. That’s fair enough as a description of what the first law of thermodynamics tells us about energy. The equation he reads off reads, “the change in the internal energy (Δ U) equals the heat added to the system (U) minus the work done by the system (W)”. Conservation laws aren’t unique to thermodynamics. But Poncho may be aware of just how universal and powerful thermodynamics is. I’m open to an argument that it’s the most important field of physics.

Jonathan Lemon’s Rabbits Against Magic for the 6th is another strip Intro to Calculus instructors can use for their presentation on instantaneous versus average velocities. There’s been a bunch of them recently. I wonder if someone at Comic Strip Master Command got a speeding ticket.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 6th is about numeric bases. They’re fun to learn about. There’s an arbitrariness in the way we represent concepts. I think we can understand better what kinds of problems seem easy and what kinds seem harder if we write them out different ways. But base eleven is only good for jokes.

Reading the Comics, February 3, 2017: Counting Edition

And now I can close out last week’s mathematically-themed comic strips. Two of them are even about counting, which is enough for me to make that the name of this set.

John Allen’s Nest Heads for the 2nd mentions a probability and statistics class and something it’s supposed to be good for. I would agree that probability and statistics are probably (I can’t find a better way to write this) the most practically useful mathematics one can learn. At least once you’re past arithmetic. They’re practical by birth; humans began studying them because they offer guidance in uncertain situations. And one can use many of their tools without needing more than arithmetic.

I’m not so staunchly anti-lottery as many mathematics people are. I’ll admit I play it myself, when the jackpot is large enough. When the expectation value of the prize gets to be positive, it’s harder to rationalize not playing. This happens only once or twice a year, but it’s fun to watch and see when it happens. I grant it’s a foolish way to use two dollars (two tickets are my limit), but you know? My budget is not so tight I can’t spend four dollars foolishly a year. Besides, I don’t insist on winning one of those half-billion-dollar prizes. I imagine I’d be satisfied if I brought in a mere $10,000.

'Hey, Ruthie's Granny, how old are you?' 'You can't count that high, James.' 'I can too!' 'Fine! Start at one and I'll tell you when you get to my age.' '1, 2, 3, 4, 11, 22, 88, 99, 200, a gazillion!' 'Very good! It's somewhere between 22 and a gazillion!' 'Gazowie!' — Rick Detorie’s **One Big Happy** for the 3rd of February, 2017. A ‘gazillion’ is actually a surprisingly low number, hovering as it does somewhere around 212. Fun fact!

Rick Detorie’s One Big Happy for the 3rd continues my previous essay’s bit of incompetence at basic mathematics, here, counting. But working out that her age is between 22 an a gazillion may be worth doing. It’s a common mathematical challenge to find a correct number starting from little information about it. Usually we find it by locating bounds: the number must be larger than this and smaller than that. And then get the bounds closer together. Stop when they’re close enough for our needs, if we’re numerical mathematicians. Stop when the bounds are equal to each other, if we’re analytic mathematicians. That can take a lot of work. Many problems in number theory amount to “improve our estimate of the lowest (or highest) number for which this is true”. We have to start somewhere.

Samson’s Dark Side of the Horse for the 3rd is a counting-sheep joke and I was amused that the counting went so awry here. On looking over the strip again for this essay, though, I realize I read it wrong. It’s the fences that are getting counted, not the sheep. Well, it’s a cute little sheep having the same problems counting that Horace has. We don’t tend to do well counting more than around seven things at a glance. We can get a bit farther if we can group things together and spot that, say, we have four groups of four fences each. That works and it’s legitimate; we’re counting and we get the right count out of it. But it does feel like we’re doing something different from how we count, say, three things at a glance.

Mick Mastroianni and Mason MastroianniDogs of C Kennel for the 3rd is about the world’s favorite piece of statistical mechanics, entropy. There’s room for quibbling about what exactly we mean by thermodynamics saying all matter is slowly breaking down. But the gist is fair enough. It’s still mysterious, though. To say that the disorder of things is always increasing forces us to think about what we mean by disorder. It’s easy to think we have an idea what we mean by it. It’s hard to make that a completely satisfying definition. In this way it’s much like randomness, which is another idea often treated as the same as disorder.

Bill Amend’s FoxTrot Classics for the 3rd reprinted the comic from the 10th of February, 2006. Mathematics teachers always want to see how you get your answers. Why? … Well, there are different categories of mistakes someone can make. One can set out trying to solve the wrong problem. One can set out trying to solve the right problem in a wrong way. One can set out solving the right problem in the right way and get lost somewhere in the process. Or one can be doing just fine and somewhere along the line change an addition to a subtraction and get what looks like the wrong answer. Each of these is a different kind of mistake. Knowing what kinds of mistakes people make is key to helping them not make these mistakes. They can get on to making more exciting mistakes.

Reading the Comics, August 19, 2016: Mathematics Signifier Edition

I know it seems like when I write these essays I spend the most time on the first comic in the bunch and give the last ones a sentence, maybe two at most. I admit when there’s a lot of comics I have to write up at once my energy will droop. But Comic Strip Master Command apparently wants the juiciest topics sent out earlier in the week. I have to follow their lead.

Stephen Beals’s Adult Children for the 14th uses mathematics to signify deep thinking. In this case Claremont, the dog, is thinking of the Riemann Zeta function. It’s something important in number theory, so longtime readers should know this means it leads right to an unsolved problem. In this case it’s the Riemann Hypothesis. That’s the most popular candidate for “what is the most important unsolved problem in mathematics right now?” So you know Claremont is a deep-thinking dog.

The big Σ ordinary people might recognize as representing “sum”. The notation means to evaluate, for each legitimate value of the thing underneath — here it’s ‘n’ — the value of the expression to the right of the Sigma. Here that’s $\frac{1}{n^s}$ . Then add up all those terms. It’s not explicit here, but context would make clear, n is positive whole numbers: 1, 2, 3, and so on. s would be a positive number, possibly a whole number.

The big capital Pi is more mysterious. It’s Sigma’s less popular brother. It means “product”. For each legitimate value of the thing underneath it — here it’s “p” — evaluate the expression on the right. Here that’s $\frac{1}{1 - \frac{1}{p^s}}$ . Then multiply all that together. In the context of the Riemann Zeta function, “p” here isn’t just any old number, or even any old whole number. It’s only the prime numbers. Hence the “p”. Good notation, right? Yeah.

This particular equation, once shored up with the context the symbols live in, was proved by Leonhard Euler, who proved so much you sometimes wonder if later mathematicians were needed at all. It ties in to how often whole numbers are going to be prime, and what the chances are that some set of numbers are going to have no factors in common. (Other than 1, which is too boring a number to call a factor.) But even if Claremont did know that Euler got there first, it’s almost impossible to do good new work without understanding the old.

Charlos Gary’s Working It Out for the 14th is this essay’s riff on pie charts. Or bar charts. Somewhere around here the past week I read that a French idiom for the pie chart is the “cheese chart”. That’s a good enough bit I don’t want to look more closely and find out whether it’s true. If it turned out to be false I’d be heartbroken.

Ryan North’s Dinosaur Comics for the 15th talks about everyone’s favorite physics term, entropy. Everyone knows that it tends to increase. Few advanced physics concepts feel so important to everyday life. I almost made one expression of this — Boltzmann’s H-Theorem — a Theorem Thursday post. I might do a proper essay on it yet. Utahraptor describes this as one of “the few statistical laws of physics”, which I think is a bit unfair. There’s a lot about physics that is statistical; it’s often easier to deal with averages and distributions than the mass of real messy data.

Utahraptor’s right to point out that it isn’t impossible for entropy to decrease. It can be expected not to, in time. Indeed decent scientists thinking as philosophers have proposed that “increasing entropy” might be the only way to meaningfully define the flow of time. (I do not know how decent the philosophy of this is. This is far outside my expertise.) However: we would expect at least one tails to come up if we simultaneously flipped infinitely many coins fairly. But there is no reason that it couldn’t happen, that infinitely many fairly-tossed coins might all come up heads. The probability of this ever happening is zero. If we try it enough times, it will happen. Such is the intuition-destroying nature of probability and of infinitely large things.

Tony Cochran’s Agnes on the 16th proposes to decode the Voynich Manuscript. Mathematics comes in as something with answers that one can check for comparison. It’s a familiar role. As I seem to write three times a month, this is fair enough to say to an extent. Coming up with an answer to a mathematical question is hard. Checking the answer is typically easier. Well, there are many things we can try to find an answer. To see whether a proposed answer works usually we just need to go through it and see if the logic holds. This might be tedious to do, especially in those enormous brute-force problems where the proof amounts to showing there are a hundred zillion special cases and here’s an answer for each one of them. But it’s usually a much less hard thing to do.

Johnny Hart and Brant Parker’s Wizard of Id Classics for the 17th uses what seems like should be an old joke about bad accountants and nepotism. Well, you all know how important bookkeeping is to the history of mathematics, even if I’m never that specific about it because it never gets mentioned in the histories of mathematics I read. And apparently sometime between the strip’s original appearance (the 20th of August, 1966) and my childhood the Royal Accountant character got forgotten. That seems odd given the comic potential I’d imagine him to have. Sometimes a character’s only good for a short while is all.

Mark Anderson’s Andertoons for the 18th is the Andertoons representative for this essay. Fair enough. The kid speaks of exponents as a kind of repeating oneself. This is how exponents are inevitably introduced: as multiplying a number by itself many times over. That’s a solid way to introduce raising a number to a whole number. It gets a little strained to describe raising a number to a rational number. It’s a confusing mess to describe raising a number to an irrational number. But you can make that logical enough, with effort. And that’s how we do make the idea rigorous. A number raised to (say) the square root of two is something greater than the number raised to 1.4, but less than the number raised to 1.5. More than the number raised to 1.41, less than the number raised to 1.42. More than the number raised to 1.414, less than the number raised to 1.415. This takes work, but it all hangs together. And then we ask about raising numbers to an imaginary or complex-valued number and we wave that off to a higher-level mathematics class.

Nate Fakes’s Break of Day for the 18th is the anthropomorphic-numerals joke for this essay.

Lachowski’s Get A Life for the 18th is the sudoku joke for this essay. It’s also a representative of the idea that any mathematical thing is some deep, complicated puzzle at least as challenging as calculating one’s taxes. I feel like this is a rerun, but I don’t see any copyright dates. Sudoku jokes like this feel old, but comic strips have been known to make dated references before.

Samson’s Dark Side Of The Horse for the 19th is this essay’s Dark Side Of The Horse gag. I thought initially this was a counting-sheep in a lab coat. I’m going to stick to that mistaken interpretation because it’s more adorable that way.

How Interesting Is A Low-Scoring Game?

I’m still curious about the information-theory content, the entropy, of sports scores. I haven’t found the statistics I need about baseball or soccer game outcomes that I need. I’d also like hockey score outcomes if I could get them. If anyone knows a reference I’d be glad to know of it.

But there’s still stuff I can talk about without knowing details of every game ever. One of them suggested itself when I looked at the Washington Post‘s graphic. I mean the one giving how many times each score came up in baseball’s history.

I had planned to write about this when one of my Twitter friends wrote —

@nebusj Cool, looks sort of like a beta distribution. Most movie grosses over time follow a similar curve.

— Feloni Mayhem (@felonimayhem) May 19, 2016

By “distribution” mathematicians mean almost what you would imagine. Suppose we have something that might hold any of a range of values. This we call a “random variable”. How likely is it to hold any particular value? That’s what the distribution tells us. The higher the distribution, the more likely it is we’ll see that value. In baseball terms, that means we’re reasonably likely to see a game with a team scoring three runs. We’re not likely to see a game with a team scoring twenty runs.

Frequency (in thousands) of various baseball scores. I think I know what kind of distribution this is and I mean to follow up about that. — Philip Bump writes for The Washington Post on the scores of all basketball, football, and baseball games in (United States) major league history. Also I have thoughts about what this looks like.

There are many families of distributions. Feloni Mayhem suggested the baseball scores look like one called the Beta Distribution. I can’t quite agree, on technical grounds. Beta Distributions describe continuously-valued variables. They’re good for stuff like the time it takes to do something, or the height of a person, or the weight of a produced thing. They’re for measurements that can, in principle, go on forever after the decimal point. A baseball score isn’t like that. A team can score zero points, or one, or 46, but it can’t score four and two-thirds points. Baseball scores are “discrete” variables.

But there are good distributions for discrete variables. Almost everything you encounter taking an Intro to Probability class will be about discrete variables. So will most any recreational mathematics puzzle. The distribution of a tossed die’s outcomes is discrete. So is the number of times tails comes up in a set number of coin tosses. So are the birth dates of people in a room, or the number of cars passed on the side of the road during your ride, or the number of runs scored by a baseball team in a full game.

I suspected that, of the simpler distributions, the best model for baseball should be the Poisson distribution. It also seems good for any other low-scoring game, such as soccer or hockey. The Poisson distribution turns up whenever you have a large number of times that some discrete event can happen. But that event can happen only once each chance. And it has a constant chance of happening. That is, happening this chance doesn’t make it more likely or less likely it’ll happen next chance.

I have reasons to think baseball scoring should be well-modelled this way. There are hundreds of pitches in a game. Each of them is in principle a scoring opportunity. (Well, an intentional walk takes three pitches without offering any chance for scoring. And there’s probably some other odd case where a pitched ball can’t even in principle let someone score. But these are minor fallings-away from the ideal.) This is part of the appeal of baseball, at least for some: the chance is always there.

We only need one number to work out the Poisson distribution of something. That number is the mean, the arithmetic mean of all the possible values. Let me call the mean μ, which is the Greek version of m and so a good name for a mean. The probability that you’ll see the thing happen n times is $\mu^n e^{-\mu} \div (n!)$ . Here e is that base of the natural logarithm, that 2.71828 et cetera number. n! is the factorial. That’s n times (n – 1) times (n – 2) times (n – 3) and so on all the way down to times 2 times 1.

And here is the Poisson distribution for getting numbers from 0 through 20, if we take the mean to be 3.4. I can defend using the Poisson distribution much more than I can defend picking 3.4 as the mean. Why not 3.2, or 3.8? Mostly, I tried a couple means around the three-to-four runs range and picked one that looked about right. Given the lack of better data, what else can I do?

The Poisson distribution starts pretty low, with zero, then rises up high at three runs and dwindles down for ever-higher scores. — A simulation of baseball, or other low-scoring games, based on a Poisson distribution with mean of 3.4.

I don’t think it’s a bad fit. The shape looks about right, to me. But the Poisson distribution suggests fewer zero- and one-run games than the actual data offers. And there are more high-scoring games in the real data than in the Poisson distribution. Maybe there’s something that needs tweaking.

And there are several plausible causes for this. A Poisson distribution, for example, supposes that there are a lot of chances for a distinct event. That would be scoring on a pitch. But in an actual baseball game there might be up to four runs scored on one pitch. It’s less likely to score four runs than to score one, sure, but it does happen. This I imagine boosts the number of high-scoring games.

I suspect this could be salvaged by a model that’s kind of a chain of Poisson distributions. That is, have one distribution that represents the chance of scoring on any given pitch. Then use another distribution to say whether the scoring was one, two, three, or four runs.

Low-scoring games I have a harder time accounting for. My suspicion is that each pitch isn’t quite an independent event. Experience shows that pitchers lose control of their game the more they pitch. This results in the modern close watching of pitch counts. We see pitchers replaced at something like a hundred pitches even if they haven’t lost control of the game yet.

If we ignore reasons to doubt this distribution, then, it suggests an entropy of about 2.9 for a single team’s score. That’s lower than the 3.5 bits I estimated last time, using score frequencies. I think that’s because of the multiple-runs problem. Scores are spread out across more values than the Poisson distribution suggests.

If I am right this says we might model games like soccer and hockey, with many chances to score a single run each, with a Poisson distribution. A game like baseball, or basketball, with many chances to score one or more points at once needs a more complicated model.

Reading the Comics, May 17, 2016: Again, No Pictures Edition

Last week’s Reading The Comics was a bunch of Gocomics.com strips. And I don’t feel the need to post the images for those, since they’re reasonably stable links. Today’s is also a bunch of Gocomics.com strips. I know how every how-to-bring-in-readers post ever says you should include images. Maybe I will commission someone to do some icons. It couldn’t hurt.

Someone looking close at the title, with responsible eye protection, might notice it’s dated the 17th, a day this is not. There haven’t been many mathematically-themed comic strips since the 17th is all. And I’m thinking to try out, at least for a while, making the day on which a Reading the Comics post is issued regular. Maybe Monday. This might mean there are some long and some short posts, but being a bit more scheduled might help my writing.

Mark Anderson’s Andertoons for the 14th is the charting joke for this essay. Also the Mark Anderson joke for this essay. I was all ready to start explaining ways that the entropy of something can decrease. The easiest way is by expending energy, which we can see as just increasing entropy somewhere else in the universe. The one requiring the most patience is simply waiting: entropy almost always increases, or at least doesn’t decrease. But “almost always” isn’t the same as “always”. But I have to pass. I suspect Anderson drew the chart going down because of the sense of entropy being a winding-down of useful stuff. Or because of down having connotations of failure, and the increase of entropy suggesting the failing of the universe. And we can also read this as a further joke: things are falling apart so badly that even entropy isn’t working like it ought. Anderson might not have meant for a joke that sophisticated, but if he wants to say he did I won’t argue it.

Scott Adams’s Dilbert Classics for the 14th reprinted the comic of the 20th of March, 1993. I admit I do this sort of compulsive “change-simplifying” paying myself. It’s easy to do if you have committed to memory pairs of numbers separated by five: 0 and 5, 1 and 6, 2 and 7, and so on. So if I get a bill for (say) $4.18, I would look for whether I have three cents in change. If I have, have I got 23 cents? That would give me back a nickel. 43 cents would give me back a quarter in change. And a quarter is great because I can use that for pinball.

Sometimes the person at the cash register doesn’t want a ridiculous bunch of change. I don’t blame them. It’s easy to suppose that someone who’s given you $5.03 for a $4.18 charge misunderstood what the bill was. Some folks will take this as a chance to complain mightily about how kids don’t learn even the basics of mathematics anymore and the world is doomed because the young will follow their job training and let machines that are vastly better at arithmetic than they are do arithmetic. This is probably what Adams was thinking, since, well, look at the clerk’s thought balloon in the final panel.

But consider this: why would Dilbert have handed over $7.14? Or, specifically, how could he give $7.14 to the clerk but not have been able to give $2.14, which would make things easier on everybody? There’s no combination of bills — in United States or, so far as I’m aware, any major world currency — in which you can give seven dollars but not two dollars. He had to be handing over five dollars he was getting right back. The clerk would be right to suspect this. It looks like the start of a change scam, begun by giving a confusing amount of money.

Had Adams written it so that the charge was $6.89, and Dilbert “helpfully” gave $12.14, then Dilbert wouldn’t be needlessly confusing things.

Dave Whamond’s Reality Check for the 15th is that pirate-based find-x joke that feels like it should be going around Facebook, even though I don’t think it has been. I can’t say the combination of jokes quite makes logical sense, but I’m amused. It might be from the Reality Check squirrel in the corner.

Nate Fakes’s Break of Day for the 16th is the anthropomorphized shapes joke for this essay. It’s not the only shapes joke, though.

Doug Bratton’s Pop Culture Shock Therapy for the 16th is the Einstein joke for this essay.

Rick Detorie’s One Big Happy rerun for the 17th is another shapes joke. Ruthie has strong ideas about what distinguishes a pyramid from a triangle. In this context I can’t say she’s wrong to assert what a pyramid is.

How Interesting Is A Baseball Score? Some Further Results

While researching for my post about the information content of baseball scores I found some tantalizing links. I had wanted to know how often each score came up. From this I could calculate the entropy, the amount of information in the score. That’s the sum, taken over every outcome, of minus one times the frequency of that score times the base-two logarithm of the frequency of the outcome. And I couldn’t find that.

An article in The Washington Post had a fine lead, though. It offers, per the title, “the score of every basketball, football, and baseball game in league history visualized”. And as promised it gives charts of how often each number of runs has turned up in a game. The most common single-team score in a game is 3, with 4 and 2 almost as common. I’m not sure the date range for these scores. The chart says it includes (and highlights) data from “a century ago”. And as the article was posted in December 2014 it can hardly use data from after that. I can’t imagine that the 2015 season has changed much, though. And whether they start their baseball statistics at either 1871, 1876, 1883, 1891, or 1901 (each a defensible choice) should only change details.

Which is fine. I can’t get precise frequency data from the chart. The chart offers how many thousands of times a particular score has come up. But there’s not the reference lines to say definitely whether a zero was scored closer to 21,000 or 22,000 times. I will accept a rough estimate, since I can’t do any better.

I made my best guess at the frequency, from the chart. And then made a second-best guess. My best guess gave the information content of a single team’s score as a touch more than 3.5 bits. My second-best guess gave the information content as a touch less than 3.5 bits. So I feel safe in saying a single team’s score is about three and a half bits of information.

So the score of a baseball game, with two teams scoring, is probably somewhere around twice that, or about seven bits of information.

I have to say “around”. This is because the two teams aren’t scoring runs independently of one another. Baseball doesn’t allow for tie games except in rare circumstances. (It would usually be a game interrupted for some reason, and then never finished because the season ended with neither team in a position where winning or losing could affect their standing. I’m not sure that would technically count as a “game” for Major League Baseball statistical purposes. But I could easily see a roster of game scores counting that.) So if one team’s scored three runs in a game, we have the information that the other team almost certainly didn’t score three runs.

This estimate, though, does fit within my range estimate from 3.76 to 9.25 bits. And as I expected, it’s closer to nine bits than to four bits. The entropy seems to be a bit less than (American) football scores — somewhere around 8.7 bits — and college basketball — probably somewhere around 10.8 bits — which is probably fair. There are a lot of numbers that make for plausible college basketball scores. There are slightly fewer pairs of numbers that make for plausible football scores. There are fewer still pairs of scores that make for plausible baseball scores. So there’s less information conveyed in knowing that the game’s score is.

How Interesting Is A Baseball Score? Some Partial Results

Meanwhile I have the slight ongoing quest to work out the information-theory content of sports scores. For college basketball scores I made up some plausible-looking score distributions and used that. For professional (American) football I found a record of all the score outcomes that’ve happened, and how often. I could use experimental results. And I’ve wanted to do other sports. Soccer was asked for. I haven’t been able to find the scoring data I need for that. Baseball, maybe the supreme example of sports as a way to generate statistics … has been frustrating.

The raw data is available. Retrosheet.org has logs of pretty much every baseball game, going back to the forming of major leagues in the 1870s. What they don’t have, as best I can figure, is a list of all the times each possible baseball score has turned up. That I could probably work out, when I feel up to writing the scripts necessary, but “work”? Ugh.

Some people have done the work, although they haven’t shared all the results. I don’t blame them; the full results make for a boring sort of page. “The Most Popular Scores In Baseball History”, at ValueOverReplacementGrit.com, reports the top ten most common scores from 1871 through 2010. The essay also mentions that as of then there were 611 unique final scores. And that lets me give some partial results, if we trust that blogger post from people I never heard of before are accurate and true. I will make that assumption over and over here.

There’s, in principle, no limit to how many scores are possible. Baseball contains many implied infinities, and it’s not impossible that a game could end, say, 580 to 578. But it seems likely that after 139 seasons of play there can’t be all that many more scores practically achievable.

Suppose then there are 611 possible baseball score outcomes, and that each of them is equally likely. Then the information-theory content of a score’s outcome is negative one times the logarithm, base two, of 1/611. That’s a number a little bit over nine and a quarter. You could deduce the score for a given game by asking usually nine, sometimes ten, yes-or-no questions from a source that knew the outcome. That’s a little higher than the 8.7 I worked out for football. And it’s a bit less than the 10.8 I estimate for college basketball.

And there’s obvious rubbish there. In no way are all 611 possible outcomes equally likely. “The Most Popular Scores In Baseball History” says that right there in the essay title. The most common outcome was a score of 3-2, with 4-3 barely less popular. Meanwhile it seems only once, on the 28th of June, 1871, has a baseball game ended with a score of 49-33. Some scores are so rare we can ignore them as possibilities.

(You may wonder how incompetent baseball players of the 1870s were that a game could get to 49-33. Not so bad as you imagine. But the equipment and conditions they were playing with were unspeakably bad by modern standards. Notably, the playing field couldn’t be counted on to be flat and level and well-mowed. There would be unexpected divots or irregularities. This makes even simple ground balls hard to field. The baseball, instead of being replaced with every batter, would stay in the game. It would get beaten until it was a little smashed shell of unpredictable dynamics and barely any structural integrity. People were playing without gloves. If a game ran long enough, they would play at dusk, without lights, with a muddy ball on a dusty field. And sometimes you just have four innings that get out of control.)

What’s needed is a guide to what are the common scores and what are the rare scores. And I haven’t found that, nor worked up the energy to make the list myself. But I found some promising partial results. In a September 2008 post on Baseball-Fever.com, user weskelton listed the 24 most common scores and their frequency. This was for games from 1993 to 2008. One might gripe that the list only covers fifteen years. True enough, but if the years are representative that’s fine. And the top scores for the fifteen-year survey look to be pretty much the same as the 139-year tally. The 24 most common scores add up to just over sixty percent of all baseball games, which leaves a lot of scores unaccounted for. I am amazed that about three in five games will have a score that’s one of these 24 choices though.

But that’s something. We can calculate the information content for the 25 outcomes, one each of the 24 particular scores and one for “other”. This will under-estimate the information content. That’s because “other” is any of 587 possible outcomes that we’re not distinguishing. But if we have a lower bound and an upper bound, then we’ve learned something about what the number we want can actually be. The upper bound is that 9.25, above.

The information content, the entropy, we calculate from the probability of each outcome. We don’t know what that is. Not really. But we can suppose that the frequency of each outcome is close to its probability. If there’ve been a lot of games played, then the frequency of a score and the probability of a score should be close. At least they’ll be close if games are independent, if the score of one game doesn’t affect another’s. I think that’s close to true. (Some games at the end of pennant races might affect each other: why try so hard to score if you’re already out for the year? But there’s few of them.)

The entropy then we find by calculating, for each outcome, a product. It’s minus one times the probability of that outcome times the base-two logarithm of the probability of that outcome. Then add up all those products. There’s good reasons for doing it this way and in the college-basketball link above I give some rough explanations of what the reasons are. Or you can just trust that I’m not lying or getting things wrong on purpose.

So let’s suppose I have calculated this right, using the 24 distinct outcomes and the one “other” outcome. That makes out the information content of a baseball score’s outcome to be a little over 3.76 bits.

As said, that’s a low estimate. Lumping about two-fifths of all games into the single category “other” drags the entropy down.

But that gives me a range, at least. A baseball game’s score seems to be somewhere between about 3.76 and 9.25 bits of information. I expect that it’s closer to nine bits than it is to four bits, but will have to do a little more work to make the case for it.

How Interesting Is A Football Score?

Last month, Sarcastic Goat asked me how interesting a soccer game was. This is “interesting” in the information theory sense. I describe what that is in a series of posts, linked to from above. That had been inspired by the NCAA “March Madness” basketball tournament. I’d been wondering about the information-theory content of knowing the outcome of the tournament, and of each game.

This measure, called the entropy, we can work out from knowing how likely all the possible outcomes of something — anything — are. If there’s two possible outcomes and they’re equally likely, the entropy is 1. If there’s two possible outcomes and one is a sure thing while the other can’t happen, the entropy is 0. If there’s four possible outcomes and they’re all equally likely, the entropy is 2. If there’s eight possible outcomes, all equally likely, the entropy is 3. If there’s eight possible outcomes and some are likely while some are long shots, the entropy is … smaller than 3, but bigger than 0. The entropy grows with the number of possible outcomes and shrinks with the number of unlikely outcomes.

But it’s easy to calculate. List all the possible outcomes. Find the probability of each of those possible outcomes happening. Then, calculate minus one times the probability of each outcome times the logarithm, base two, of that outcome. For each outcome, so yes, this might take a while. Then add up all those products.

I’d estimated the outcome of the 63-game basketball tournament was somewhere around 48 bits of information. There’s a fair number of foregone, or almost foregone, conclusions in the game, after all. And I guessed, based on a toy model of what kinds of scores often turn up in college basketball games, that the game’s score had an information content of a little under 11 bits of information.

Sarcastic Goat, as I say, asked about soccer scores. I don’t feel confident that I could make up a plausible model of soccer score distributions. So I went looking for historical data. Surely, a history of actual professional soccer scores over a couple decades would show all the possible, plausible, outcomes and how likely each was to turn out.

I didn’t find one. My search for soccer scores kept getting contaminated with (American) football scores. But that turned up something interesting anyway. Sports Reference LLC has a table which purports to list the results of all professional football games played from 1920 through the start of 2016. There’ve been, apparently, some 1,026 different score outcomes, from 0-0 through to 73-0.

As you’d figure, there are a lot of freakish scores; only once in professional football history has the game ended 62-28. (Although it’s ended 62-14 twice, somehow.) There hasn’t been a 2-0 game since the second week of the 1938 season. Some scores turn up a lot; 248 games (as of this writing) have ended 20-17. That’s the most common score, in its records. 27-24 and 17-14 are the next most common scores. If I’m not making a dumb mistake, 7-0 is the 21st most common score. 93 games have ended with that tally. But it hasn’t actually been a game’s final score since the 14th week of the 1983 season, somehow. 98 games have ended 21-17; only ten have ended 21-18. Weird.

Anyway, there’s 1,026 recorded outcomes. That’s surely as close to “all the possible outcomes” as we can expect to get, at least until the Jets manage to lose 74-0 in their home opener. But if all 1,026 outcomes were equally likely then the information content of the game’s score would be a touch over 10 bits. But these outcomes aren’t all equally likely. It’s vastly more likely that a game ended 16-13 than it is likely it ended 16-8.

Let’s suppose I didn’t make any stupid mistakes in working out the frequency of all the possible outcomes. Then the information content of a football game’s outcome is a little over 8.72 bits.

Don’t be too hypnotized by the digits past the decimal. It’s approximate. But it suggests that if you were asking a source that would only answer ‘yes’ or ‘no’, then you could expect to get the score for any particular football game with about nine well-chosen questions.

I’m not surprised this is less than my estimated information content of a basketball game’s score. I think basketball games see a wider range of likely scores than football games do.

If someone has a reference for the outcomes of soccer games — or other sports — over a reasonably long time please let me know. I can run the same sort of calculation. We might even manage the completely pointless business of ranking all major sports by the information content of their scores.

A Leap Day 2016 Mathematics A To Z: Kullbach-Leibler Divergence

Today’s mathematics glossary term is another one requested by Jacob Kanev. Kaven, I learned last time, has got a blog, “Some Unconsidered Trifles”, for those interested in having more things to read. Kanev’s request this time was a term new to me. But learning things I didn’t expect to consider is part of the fun of this dance.

Kullback-Leibler Divergence.

The Kullback-Leibler Divergence comes to us from information theory. It’s also known as “information divergence” or “relative entropy”. Entropy is by now a familiar friend. We got to know it through, among other things, the “How interesting is a basketball tournament?” question. In this context, entropy is a measure of how surprising it would be to know which of several possible outcomes happens. A sure thing has an entropy of zero; there’s no potential surprise in it. If there are two equally likely outcomes, then the entropy is 1. If there are four equally likely outcomes, then the entropy is 2. If there are four possible outcomes, but one is very likely and the other three mediocre, the entropy might be low, say, 0.5 or so. It’s mostly but not perfectly predictable.

Suppose we have a set of possible outcomes for something. (Pick anything you like. It could be the outcomes of a basketball tournament. It could be how much a favored stock rises or falls over the day. It could be how long your ride into work takes. As long as there are different possible outcomes, we have something workable.) If we have a probability, a measure of how likely each of the different outcomes is, then we have a probability distribution. More likely things have probabilities closer to 1. Less likely things have probabilities closer to 0. No probability is less than zero or more than 1. All the probabilities added together sum up to 1. (These are the rules which make something a probability distribution, not just a bunch of numbers we had in the junk drawer.)

The Kullback-Leibler Divergence describes how similar two probability distributions are to one another. Let me call one of these probability distributions p. I’ll call the other one q. We have some number of possible outcomes, and we’ll use k as an index for them. p_k is how likely, in distribution p, that outcome number k is. q_k is how likely, in distribution q, that outcome number k is.

To calculate this divergence, we work out, for each k, the number p_k times the logarithm of p_k divided by q_k. Here the logarithm is base two. Calculate all this for every one of the possible outcomes, and add it together. This will be some number that’s at least zero, but it might be larger.

The closer that distribution p and distribution q are to each other, the smaller this number is. If they’re exactly the same, this number will be zero. The less that distribution p and distribution q are like each other, the bigger this number is.

And that’s all good fun, but, why bother with it? And at least one answer I can give is that it lets us measure how good a model of something is.

Suppose we think we have an explanation for how something varies. We can say how likely it is we think there’ll be each of the possible different outcomes. This gives us a probability distribution which let’s call q. We can compare that to actual data. Watch whatever it is for a while, and measure how often each of the different possible outcomes actually does happen. This gives us a probability distribution which let’s call p.

If our model is a good one, then the Kullback-Leibler Divergence between p and q will be small. If our model’s a lousy one, then this divergence will be large. If we have a couple different models, we can see which ones make for smaller divergences and which ones make for larger divergences. Probably we’ll want smaller divergences.

Here you might ask: why do we need a model? Isn’t the actual data the best model we might have? It’s a fair question. But no, real data is kind of lousy. It’s all messy. It’s complicated. We get extraneous little bits of nonsense clogging it up. And the next batch of results is going to be different from the old ones anyway, because real data always varies.

Furthermore, one of the purposes of a model is to be simpler than reality. A model should do away with complications so that it is easier to analyze, easier to make predictions with, and easier to teach than the reality is. But a model mustn’t be so simple that it can’t represent important aspects of the thing we want to study.

The Kullback-Leibler Divergence is a tool that we can use to quantify how much better one model or another fits our data. It also lets us quantify how much of the grit of reality we lose in our model. And this is at least some of the use of this quantity.

How Interesting Can A Basketball Tournament Be?

The United States is about to spend a good bit of time worrying about the NCAA men’s basketball tournament. It’s a good distraction from the women’s basketball tournament and from the National Invitational Tournament. Last year I used this to write a couple essays that stepped into information theory. Nobody knowledgeable in information theory has sent me threatening letters since. So since the inspiration is back in season I’d like to bring them to your attention again:

How Interesting Is A Basketball Tournament?, the starting point, laying out how we measure one game’s interestingness and go on to 63 games from that.
What We Talk About When We Talk About How Interesting What We’re Talking About Is, which wasn’t just written because of how much fun a title that was. It introduced this whole “bit” concept.
But How Interesting Is A Real Basketball Tournament? Because I started out assuming that games were perfectly even match ups either team was likely to win. This isn’t so. If we grant that a number-16 seed is almost sure to lose to a number-1 seed, how does the information content change?
But How Interesting Is A Basketball Score? We’re not just interested in who wins a basketball game. We’re interested in how much they win by. So what’s that?
Doesn’t The Other Team Count? How Much? The previous essay worked out the information content of one team’s score. What about the second team’s score?
A Little More Talk About What We Talk About When We Talk About How Interesting What We Talk About Is, because who can resist a title construction like that? Anyway, this puts in some more vocabulary.

Making A Joke Of Entropy

This entered into my awareness a few weeks back. Of course I’ve lost where I got it from. But the headline is of natural interest to me. Kristy Condon’s “Researchers establish the world’s first mathematical theory of humor” describes the results of an interesting paper studying the phenomenon of funny words.

The original paper is by Chris Westbury, Cyrus Shaoul, Gail Moroschan, and Michael Ramscar, titled “Telling the world’s least funny jokes: On the quantification of humor as entropy”. It appeared in The Journal of Memory and Language. The thing studied was whether it’s possible to predict how funny people are likely to find a made-up non-word.

As anyone who tries to be funny knows, some words just are funnier than others. Or at least they sound funnier. (This brings us into the problem of whether something is actually funny or whether we just think it is.) Westbury, Shaoul, Moroschan, and Ramscar try testing whether a common measure of how unpredictable something is — the entropy, a cornerstone of information theory — can tell us how funny a word might be.

We’ve encountered entropy in these parts before. I used it in that series earlier this year about how interesting a basketball tournament was. Entropy, in this context, is low if something is predictable. It gets higher the more unpredictable the thing being studied is. You see this at work in auto-completion: if you have typed in ‘th’, it’s likely your next letter is going to be an ‘e’. This reflects the low entropy of ‘the’ as an english word. It’s rather unlikely the next letter will be ‘j’, because English has few contexts that need ‘thj’ to be written out. So it will suggest words that start ‘the’ (and ‘tha’, and maybe even ‘thi’), while giving ‘thj’ and ‘thq’ and ‘thd’ a pass.

Westbury, Shaoul, Moroschan, and Ramscar found that the entropy of a word, how unlikely that collection of letters is to appear in an English word, matches quite well how funny people unfamiliar with it find it. This fits well with one of the more respectable theories of comedy, Arthur Schopenhauer’s theory that humor comes about from violating expectations. That matches well with unpredictability.

Of course it isn’t just entropy that makes words funny. Anyone trying to be funny learns that soon enough, since a string of perfect nonsense is also boring. But this is one of the things that can be measured and that does have an influence.

(I doubt there is any one explanation for why things are funny. My sense is that there are many different kinds of humor, not all of them perfectly compatible. It would be bizarre if any one thing could explain them all. But explanations for pieces of them are plausible enough.)

Anyway, I recommend looking at the Kristy Condon report. It explains the paper and the research in some more detail. And if you feel up to reading an academic paper, try Westbury, Shaoul, Moroschan, and Ramscar’s report. I thought it readable, even though so much of it is outside my field. And if all else fails there’s a list of two hundred made-up words used in field tests for funniness. Some of them look pretty solid to me.

Conditions of equilibrium and stability

This month Peter Mander’s CarnotCycle blog talks about the interesting world of statistical equilibriums. And particularly it talks about stable equilibriums. A system’s in equilibrium if it isn’t going to change over time. It’s in a stable equilibrium if being pushed a little bit out of equilibrium isn’t going to make the system unpredictable.

For simple physical problems these are easy to understand. For example, a marble resting at the bottom of a spherical bowl is in a stable equilibrium. At the exact bottom of the bowl, the marble won’t roll away. If you give the marble a little nudge, it’ll roll around, but it’ll stay near where it started. A marble sitting on the top of a sphere is in an equilibrium — if it’s perfectly balanced it’ll stay where it is — but it’s not a stable one. Give the marble a nudge and it’ll roll away, never to come back.

In statistical mechanics we look at complicated physical systems, ones with thousands or millions or even really huge numbers of particles interacting. But there are still equilibriums, some stable, some not. In these, stuff will still happen, but the kind of behavior doesn’t change. Think of a steadily-flowing river: none of the water is staying still, or close to it, but the river isn’t changing.

CarnotCycle describes how to tell, from properties like temperature and pressure and entropy, when systems are in a stable equilibrium. These are properties that don’t tell us a lot about what any particular particle is doing, but they can describe the whole system well. The essay is higher-level than usual for my blog. But if you’re taking a statistical mechanics or thermodynamics course this is just the sort of essay you’ll find useful.

carnotcycle

In terms of simplicity, purely mechanical systems have an advantage over thermodynamic systems in that stability and instability can be defined solely in terms of potential energy. For example the center of mass of the tower at Pisa, in its present state, must be higher than in some infinitely near positions, so we can conclude that the structure is not in stable equilibrium. This will only be the case if the tower attains the condition of metastability by returning to a vertical position or absolute stability by exceeding the tipping point and falling over.

Thermodynamic systems lack this simplicity, but in common with purely mechanical systems, thermodynamic equilibria are always metastable or stable, and never unstable. This is equivalent to saying that every spontaneous (observable) process proceeds towards an equilibrium state, never away from it.

If we restrict our attention to a thermodynamic system of unchanging composition and apply…

View original post 2,534 more words

Reading the Comics, June 4, 2015: Taking It Easy Edition

I do like looking for thematic links among the comic strips that mention mathematical topics that I gather for these posts. This time around all I can find is a theme of “nothing big going on”. I’m amused by some of them but don’t think there’s much depth to the topics. But I like them anyway.

Mark Anderson’s Andertoons gets its appearance here with the May 25th strip. And it’s a joke about the hatred of fractions. It’s a suitable one for posting in mathematics classes too, since it is right about naming three famous irrational numbers — pi, the “golden ratio” phi, and the square root of two — and the fact they can’t be written as fractions which use only whole numbers in the numerator and denominator. Pi is, well, pi. The square root of two is nice and easy to find, and has that famous legend about the Pythagoreans attached to it. And it’s probably the easiest number to prove is irrational.

The “golden ratio” is that number that’s about 1.618. It’s interesting because 1 divided by phi is about 0.618, and who can resist a symmetry like that? That’s about all there is to say for it, though. The golden ratio is otherwise a pretty boring number, really. It’s gained celebrity as an “ideal” ratio — that a rectangle with one side about 1.6 times as long as the other is somehow more appealing than other choices — but that’s rubbish. It’s a novelty act among numbers. Novelty acts are fun, but we should appreciate them for what they are.

Continue reading “Reading the Comics, June 4, 2015: Taking It Easy Edition”

Reversible and irreversible change

Entropy is hard to understand. It’s deceptively easy to describe, and the concept is popular, but to understand it is challenging. In this month’s entry CarnotCycle talks about thermodynamic entropy and where it comes from. I don’t promise you will understand it after this essay, but you will be closer to understanding it.

carnotcycle

Reversible change is a key concept in classical thermodynamics. It is important to understand what is meant by the term as it is closely allied to other important concepts such as equilibrium and entropy. But reversible change is not an easy idea to grasp – it helps to be able to visualize it.

Reversibility and mechanical systems

The simple mechanical system pictured above provides a useful starting point. The aim of the experiment is to see how much weight can be lifted by the fixed weight M1. Experience tells us that if a small weight M2 is attached – as shown on the left – then M1 will fall fast while M2 is pulled upwards at the same speed.

Experience also tells us that as the weight of M2 is increased, the lifting speed will decrease until a limit is reached when the weight difference between M2 and M1 becomes…

View original post 692 more words

A Little More Talk About What We Talk About When We Talk About How Interesting What We Talk About Is

I had been talking about how much information there is in the outcome of basketball games, or tournaments, or the like. I wanted to fill in at least one technical term, to match some of the others I’d given.

In this information-theory context, an experiment is just anything that could have different outcomes. A team can win or can lose or can tie in a game; that makes the game an experiment. The outcomes are the team wins, or loses, or ties. A team can get a particular score in the game; that makes that game a different experiment. The possible outcomes are the team scores zero points, or one point, or two points, or so on up to whatever the greatest possible score is.

If you know the probability p of each of the different outcomes, and since this is a mathematics thing we suppose that you do, then we have what I was calling the information content of the outcome of the experiment. That’s a number, measured in bits, and given by the formula

$\sum_{j} - p_j \cdot \log\left(p_j\right)$

The sigma summation symbol means to evaluate the expression to the right of it for every value of some index j. The p_j means the probability of outcome number j. And the logarithm may be that of any base, although if we use base two then we have an information content measured in bits. Those are the same bits as are in the bytes that make up the megabytes and gigabytes in your computer. You can see this number as an estimate of how many well-chosen yes-or-no questions you’d have to ask to pick the actual result out of all the possible ones.

I’d called this the information content of the experiment’s outcome. That’s an idiosyncratic term, chosen because I wanted to hide what it’s normally called. The normal name for this is the “entropy”.

To be more precise, it’s known as the “Shannon entropy”, after Claude Shannon, pioneer of the modern theory of information. However, the equation defining it looks the same as one that defines the entropy of statistical mechanics, that thing everyone knows is always increasing and somehow connected with stuff breaking down. Well, almost the same. The statistical mechanics one multiplies the sum by a constant number called the Boltzmann constant, after Ludwig Boltzmann, who did so much to put statistical mechanics in its present and very useful form. We aren’t thrown by that. The statistical mechanics entropy describes energy that is in a system but that can’t be used. It’s almost background noise, present but nothing of interest.

Is this Shannon entropy the same entropy as in statistical mechanics? This gets into some abstract grounds. If two things are described by the same formula, are they the same kind of thing? Maybe they are, although it’s hard to see what kind of thing might be shared by “how interesting the score of a basketball game is” and “how much unavailable energy there is in an engine”.

The legend has it that when Shannon was working out his information theory he needed a name for this quantity. John von Neumann, the mathematician and pioneer of computer science, suggested, “You should call it entropy. In the first place, a mathematical development very much like yours already exists in Boltzmann’s statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage.” There are variations of the quote, but they have the same structure and punch line. The anecdote appears to trace back to an April 1961 seminar at MIT given by one Myron Tribus, who claimed to have heard the story from Shannon. I am not sure whether it is literally true, but it does express a feeling about how people understand entropy that is true.

Well, these entropies have the same form. And they’re given the same name, give or take a modifier of “Shannon” or “statistical” or some other qualifier. They’re even often given the same symbol; normally a capital S or maybe an H is used as the quantity of entropy. (H tends to be more common for the Shannon entropy, but your equation would be understood either way.)

I’m not comfortable saying they’re the same thing, though. After all, we use the same formula to calculate a batting average and to work out the average time of a commute. But we don’t think those are the same thing, at least not more generally than “they’re both averages”. These entropies measure different kinds of things. They have different units that just can’t be sensibly converted from one to another. And the statistical mechanics entropy has many definitions that not just don’t have parallels for information, but wouldn’t even make sense for information. I would call these entropies siblings, with strikingly similar profiles, but not more than that.

But let me point out something about the Shannon entropy. It is low when an outcome is predictable. If the outcome is unpredictable, presumably knowing the outcome will be interesting, because there is no guessing what it might be. This is where the entropy is maximized. But an absolutely random outcome also has a high entropy. And that’s boring. There’s no reason for the outcome to be one option instead of another. Somehow, as looked at by the measure of entropy, the most interesting of outcomes and the most meaningless of outcomes blur together. There is something wondrous and strange in that.

Doesn’t The Other Team Count? How Much?

I’d worked out an estimate of how much information content there is in a basketball score, by which I was careful to say the score that one team manages in a game. I wasn’t able to find out what the actual distribution of real-world scores was like, unfortunately, so I made up a plausible-sounding guess: that college basketball scores would be distributed among the imaginable numbers (whole numbers from zero through … well, infinitely large numbers, though in practice probably not more than 150) according to a very common distribution called the “Gaussian” or “normal” distribution, that the arithmetic mean score would be about 65, and that the standard deviation, a measure of how spread out the distribution of scores is, would be about 10.

If those assumptions are true, or are at least close enough to true, then there are something like 5.4 bits of information in a single team’s score. Put another way, if you were trying to divine the score by asking someone who knew it a series of carefully-chosen questions, like, “is the score less than 65?” or “is the score more than 39?”, with at each stage each question equally likely to be answered yes or no, you could expect to hit the exact score with usually five, sometimes six, such questions.

==>

But How Interesting Is A Basketball Score?

When I worked out how interesting, in an information-theory sense, a basketball game — and from that, a tournament — might be, I supposed there was only one thing that might be interesting about the game: who won? Or to be exact, “did (this team) win”? But that isn’t everything we might want to know about a game. For example, we might want to know what a team scored. People often do. So how to measure this?

The answer was given, in embryo, in my first piece about how interesting a game might be. If you can list all the possible outcomes of something that has multiple outcomes, and how probable each of those outcomes is, then you can describe how much information there is in knowing the result. It’s the sum, for all of the possible results, of the quantity negative one times the probability of the result times the logarithm-base-two of the probability of the result. When we were interested in only whether a team won or lost there were just the two outcomes possible, which made for some fairly simple calculations, and indicates that the information content of a game can be as high as 1 — if the team is equally likely to win or to lose — or as low as 0 — if the team is sure to win, or sure to lose. And the units of this measure are bits, the same kind of thing we use to measure (in groups of bits called bytes) how big a computer file is.

Continue reading “But How Interesting Is A Basketball Score?”

But How Interesting Is A Real Basketball Tournament?

When I wrote about how interesting the results of a basketball tournament were, and came to the conclusion that it was 63 (and filled in that I meant 63 bits of information), I was careful to say that the outcome of a basketball game between two evenly-matched opponents has an information content of 1 bit. If the game is a foregone conclusion, then the game hasn’t got so much information about it. If the game really is foregone, the information content is 0 bits; you already know what the result will be. If the game is an almost sure thing, there’s very little information to be had from actually seeing the game. An upset might be thrilling to watch, but you would hardly count on that, if you’re being rational. But most games aren’t sure things; we might expect the higher-seed to win, but it’s plausible they don’t. How does that affect how much information there is in the results of a tournament?

Last year, the NCAA College Men’s Basketball tournament inspired me to look up what the outcomes of various types of matches were, and which teams were more likely to win than others. If some person who wrote something for statistics.about.com is correct, based on 27 years of March Madness outcomes, the play between a number one and a number 16 seed is a foregone conclusion — the number one seed always wins — while number two versus number 15 is nearly sure. So while the first round of play will involve 32 games — four regions, each region having eight games — there’ll be something less than 32 bits of information in all these games, since many of them are so predictable.

If we take the results from that statistics.about.com page as accurate and reliable as a way of predicting the outcomes of various-seeded teams, then we can estimate the information content of the first round of play at least.

Here’s how I work it out, anyway:

Contest	Probability the Higher Seed Wins	Information Content of this Outcome
#1 seed vs #16 seed	100%	0 bits
#2 seed vs #15 seed	96%	0.2423 bits
#3 seed vs #14 seed	85%	0.6098 bits
#4 seed vs #13 seed	79%	0.7415 bits
#5 seed vs #12 seed	67%	0.9149 bits
#6 seed vs #11 seed	67%	0.9149 bits
#7 seed vs #10 seed	60%	0.9710 bits
#8 seed vs #9 seed	47%	0.9974 bits

So if the eight contests in a single region were all evenly matched, the information content of that region would be 8 bits. But there’s one sure and one nearly-sure game in there, and there’s only a couple games where the two teams are close to evenly matched. As a result, I make out the information content of a single region to be about 5.392 bits of information. Since there’s four regions, that means the first round of play — the first 32 games — have altogether about 21.567 bits of information.

Warning: I used three digits past the decimal point just because three is a nice comfortable number. Do not by hypnotized into thinking this is a more precise measure than it really is. I don’t know what the precise chance of, say, a number three seed beating a number fourteen seed is; all I know is that in a 27-year sample, it happened the higher-seed won 85 percent of the time, so the chance of the higher-seed winning is probably close to 85 percent. And I only know that if whoever it was wrote this article actually gathered and processed and reported the information correctly. I would not be at all surprised if the first round turned out to have only 21.565 bits of information, or as many as 21.568.

A statistical analysis of the tournaments which I dug up last year indicated that in the last three rounds — the Elite Eight, Final Four, and championship game — the higher- and lower-seeded teams are equally likely to win, and therefore those games have an information content of 1 bit per game. The last three rounds therefore have 7 bits of information total.

Unfortunately, experimental data seems to fall short for the second round — 16 games, where the 32 winners in the first round play, producing the Sweet Sixteen teams — and the third round — 8 games, producing the Elite Eight. If someone’s done a study of how often the higher-seeded team wins I haven’t run across it.

There are six of these games in each of the four regions, for 24 games total. Presumably the higher-seeded is more likely than the lower-seeded to win, but I don’t know how much more probable it is the higher-seed will win. I can come up with some bounds: the 24 games total in the second and third rounds can’t have an information content less than 0 bits, since they’re not all foregone conclusions. The higher-ranked seed won’t win all the time. And they can’t have an information content of more than 24 bits, since that’s how much there would be if the games were perfectly even matches.

So, then: the first round carries about 21.567 bits of information. The second and third rounds carry between 0 and 24 bits. The fourth through sixth rounds (the sixth round is the championship game) carry seven bits. Overall, the 63 games of the tournament carry between 28.567 and 52.567 bits of information. I would expect that many of the second-round and most of the third-round games are pretty close to even matches, so I would expect the higher end of that range to be closer to the true information content.

Let me make the assumption that in this second and third round the higher-seed has roughly a chance of 75 percent of beating the lower seed. That’s a number taken pretty arbitrarily as one that sounds like a plausible but not excessive advantage the higher-seeded teams might have. (It happens it’s close to the average you get of the higher-seed beating the lower-seed in the first round of play, something that I took as confirming my intuition about a plausible advantage the higher seed has.) If, in the second and third rounds, the higher-seed wins 75 percent of the time and the lower-seed 25 percent, then the outcome of each game is about 0.8113 bits of information. Since there are 24 games total in the second and third rounds, that suggests the second and third rounds carry about 19.471 bits of information.

Warning: Again, I went to three digits past the decimal just because three digits looks nice. Given that I do not actually know the chance a higher-seed beats a lower-seed in these rounds, and that I just made up a number that seems plausible you should not be surprised if the actual information content turns out to be 19.468 or even 19.472 bits of information.

Taking all these numbers, though — the first round with its something like 21.567 bits of information; the second and third rounds with something like 19.471 bits; the fourth through sixth rounds with 7 bits — the conclusion is that the win/loss results of the entire 63-game tournament are about 48 bits of information. It’s a bit higher the more unpredictable the games involving the final 32 and the Sweet 16 are; it’s a bit lower the more foregone those conclusions are. But 48 bits sounds like a plausible enough answer to me.

What We Talk About When We Talk About How Interesting What We’re Talking About Is

When I wrote last weekend’s piece about how interesting a basketball tournament was, I let some terms slide without definition, mostly so I could explain what ideas I wanted to use and how they should relate. My love, for example, read the article and looked up and asked what exactly I meant by “interesting”, in the attempt to measure how interesting a set of games might be, even if the reasoning that brought me to a 63-game tournament having an interest level of 63 seemed to satisfy.

When I spoke about something being interesting, what I had meant was that it’s something whose outcome I would like to know. In mathematical terms this “something whose outcome I would like to know” is often termed an `experiment’ to be performed or, even better, a `message’ that presumably I wil receive; and the outcome is the “information” of that experiment or message. And information is, in this context, something you do not know but would like to.

So the information content of a foregone conclusion is low, or at least very low, because you already know what the result is going to be, or are pretty close to knowing. The information content of something you can’t predict is high, because you would like to know it but there’s no (accurately) guessing what it might be.

This seems like a straightforward idea of what information should mean, and it’s a very fruitful one; the field of “information theory” and a great deal of modern communication theory is based on them. This doesn’t mean there aren’t some curious philosophical implications, though; for example, technically speaking, this seems to imply that anything you already know is by definition not information, and therefore learning something destroys the information it had. This seems impish, at least. Claude Shannon, who’s largely responsible for information theory as we now know it, was renowned for jokes; I recall a Time Life science-series book mentioning how he had built a complex-looking contraption which, turned on, would churn to life, make a hand poke out of its innards, and turn itself off, which makes me smile to imagine. Still, this definition of information is a useful one, so maybe I’m imagining a prank where there’s not one intended.

And something I hadn’t brought up, but which was hanging awkwardly loose, last time was: granted that the outcome of a single game might have an interest level, or an information content, of 1 unit, what’s the unit? If we have units of mass and length and temperature and spiciness of chili sauce, don’t we have a unit of how informative something is?

We have. If we measure how interesting something is — how much information there is in its result — using base-two logarithms the way we did last time, then the unit of information is a bit. That is the same bit that somehow goes into bytes, which go on your computer into kilobytes and megabytes and gigabytes, and onto your hard drive or USB stick as somehow slightly fewer gigabytes than the label on the box says. A bit is, in this sense, the amount of information it takes to distinguish between two equally likely outcomes. Whether that’s a piece of information in a computer’s memory, where a 0 or a 1 is a priori equally likely, or whether it’s the outcome of a basketball game between two evenly matched teams, it’s the same quantity of information to have.

So a March Madness-style tournament has an information content of 63 bits, if all you’re interested in is which teams win. You could communicate the outcome of the whole string of matches by indicating whether the “home” team wins or loses for each of the 63 distinct games. You could do it with 63 flashes of light, or a string of dots and dashes on a telegraph, or checked boxes on a largely empty piece of graphing paper, coins arranged tails-up or heads-up, or chunks of memory on a USB stick. We’re quantifying how much of the message is independent of the medium.

How Interesting Is A Basketball Tournament?

Yes, I can hear people snarking, “not even the tiniest bit”. These are people who think calling all athletic contests “sportsball” is still a fresh and witty insult. No matter; what I mean to talk about applies to anything where there are multiple possible outcomes. If you would rather talk about how interesting the results of some elections are, or whether the stock market rises or falls, whether your preferred web browser gains or loses market share, whatever, read it as that instead. The work is all the same.

'I didn't think much of college basketball till I discovered online betting.' — Mark Litzler’s Joe Vanilla for the 14th of March, 2015. We all make events interesting in our own ways.

To talk about quantifying how interesting the outcome of a game (election, trading day, whatever) means we have to think about what “interesting” qualitatively means. A sure thing, a result that’s bound to happen, is not at all interesting, since we know going in that it’s the result. A result that’s nearly sure but not guaranteed is at least a bit interesting, since after all, it might not happen. An extremely unlikely result would be extremely interesting, if it could happen.

Continue reading “How Interesting Is A Basketball Tournament?”

Reading the Comics, February 14, 2015: Valentine’s Eve Edition

I haven’t had the chance to read today’s comics, what with it having snowed just enough last night that we have to deal with it instead of waiting for the sun to melt it, so, let me go with what I have. There’s a sad lack of strips I feel justified including the images of, since they’re all Gocomics.com representatives and I’m used to those being reasonably stable links. Too bad.

Eric the Circle has a pair of strips by Griffinetsabine, the first on the 7th of February, and the next on February 13, both returning to “the Shape Single’s Bar” and both working on “complementary angles” for a pun. That all may help folks remember the difference between complementary angles — those add up to a right angle — and supplementary angles — those add up to two right angles, a straight line — although what it makes me wonder is the organization behind the Eric the Circle art collective. It hasn’t got any nominal author, after all, and there’s what appear to be different people writing and often drawing it, so, who does the scheduling so that the same joke doesn’t get repeated too frequently? I suppose there’s some way of finding that out for myself, but this is the Internet, so it’s easier to admit my ignorance and let the answer come up to me.

Mark Anderson’s Andertoons (February 10) surprised me with a joke about the Dewey decimal system that I hadn’t encountered before. I don’t know how that happened; it just did. This is, obviously, a use of decimal that’s distinct from the number system, but it’s so relatively rare to think of decimals as apart from representations of numbers that pointing it out has the power to surprise me at least.

Continue reading “Reading the Comics, February 14, 2015: Valentine’s Eve Edition”

The Geometry of Thermodynamics (Part 2)

I should mention — I should have mentioned earlier, but it has been a busy week — that CarnotCycle has published the second part of “The Geometry of Thermodynamics”. This is a bit of a tougher read than the first part, admittedly, but it’s still worth reading. The essay reviews how James Clerk Maxwell — yes, that Maxwell — developed the thermodynamic relationships that would have made him famous in physics if it weren’t for his work in electromagnetism that ultimately overthrew the Newtonian paradigm of space and time.

The ingenious thing is that the best part of this work is done on geometric grounds, on thinking of the spatial relationships between quantities that describe how a system moves heat around. “Spatial” may seem a strange word to describe this since we’re talking about things that don’t have any direct physical presence, like “temperature” and “entropy”. But if you draw pictures of how these quantities relate to one another, you have curves and parallelograms and figures that follow the same rules of how things fit together that you’re used to from ordinary everyday objects.

A wonderful side point is a touch of human fallibility from a great mind: in working out his relations, Maxwell misunderstood just what was meant by “entropy”, and needed correction by the at-least-as-great Josiah Willard Gibbs. Many people don’t quite know what to make of entropy even today, and Maxwell was working when the word was barely a generation away from being coined, so it’s quite reasonable he might not understand a term that was relatively new and still getting its precise definition. It’s surprising nevertheless to see.

carnotcycle

James Clerk Maxwell and the geometrical figure with which he proved his famous thermodynamic relations

Historical background

Every student of thermodynamics sooner or later encounters the Maxwell relations – an extremely useful set of statements of equality among partial derivatives, principally involving the state variables P, V, T and S. They are general thermodynamic relations valid for all systems.

The four relations originally stated by Maxwell are easily derived from the (exact) differential relations of the thermodynamic potentials:

dU = TdS – PdV   ⇒   (∂T/∂V)_S = –(∂P/∂S)_V
dH = TdS + VdP   ⇒   (∂T/∂P)_S = (∂V/∂S)_P
dG = –SdT + VdP   ⇒   –(∂S/∂P)_T = (∂V/∂T)_P
dA = –SdT – PdV   ⇒   (∂S/∂V)_T = (∂P/∂T)_V

This is how we obtain these Maxwell relations today, but it disguises the history of their discovery. The thermodynamic state functions H, G and A were yet to…

View original post 1,262 more words

Reading the Comics, August 29, 2014: Recurring Jokes Edition

Well, I did say we were getting to the end of summer. It’s taken only a couple days to get a fresh batch of enough mathematics-themed comics to include here, although the majority of them are about mathematics in ways that we’ve seen before, sometimes many times. I suppose that’s fair; it’s hard to keep thinking of wholly original mathematics jokes, after all. When you’ve had one killer gag about “537”, it’s tough to move on to “539” and have it still feel fresh.

Tom Toles’s Randolph Itch, 2 am (August 27, rerun) presents Randolph suffering the nightmare of contracting a case of entropy. Entropy might be the 19th-century mathematical concept that’s most achieved popular recognition: everyone knows it as some kind of measure of how disorganized things are, and that it’s going to ever increase, and if pressed there’s maybe something about milk being stirred into coffee that’s linked with it. The mathematical definition of entropy is tied to the probability one will find whatever one is looking at in a given state. Work out the probability of finding a system in a particular state — having particles in these positions, with these speeds, maybe these bits of magnetism, whatever — and multiply that by the logarithm of that probability. Work out that product for all the possible ways the system could possibly be configured, however likely or however improbable, just so long as they’re not impossible states. Then add together all those products over all possible states. (This is when you become grateful for learning calculus, since that makes it imaginable to do all these multiplications and additions.) That’s the entropy of the system. And it applies to things with stunning universality: it can be meaningfully measured for the stirring of milk into coffee, to heat flowing through an engine, to a body falling apart, to messages sent over the Internet, all the way to the outcomes of sports brackets. It isn’t just body parts falling off.

Stanley's old algebra teacher insists there is yet hope for him. — Randy Glasbergen’s _The Better Half_ For the 28th of August, 2014.

Randy Glasbergen’s The Better Half (August 28) does the old joke about not giving up on algebra someday being useful. Do teachers in other subjects get this? “Don’t worry, someday your knowledge of the Panic of 1819 will be useful to you!” “Never fear, someday they’ll all look up to you for being able to diagram a sentence!” “Keep the faith: you will eventually need to tell someone who only speaks French that the notebook of your uncle is on the table of your aunt!”

Eric the Circle (August 28, by “Gilly” this time) sneaks into my pages again by bringing a famous mathematical symbol into things. I’d like to make a mention of the links between mathematics and music which go back at minimum as far as the Ancient Greeks and the observation that a lyre string twice as long produced the same note one octave lower, but lyres and strings don’t fit the reference Gilly was going for here. Too bad.

Zach Weinersmith’s Saturday Morning Breakfast Cereal (August 28) is another strip to use a “blackboard full of mathematical symbols” as visual shorthand for “is incredibly smart stuff going on”. The symbols look to me like they at least started out as being meaningful — they’re the kinds of symbols I expect in describing the curvature of space, and which you can find by opening up a book about general relativity — though I’m not sure they actually stay sensible. (It’s not the kind of mathematics I’ve really studied.) However, work in progress tends to be sloppy, the rough sketch of an idea which can hopefully be made sound.

Anthony Blades’s Bewley (August 29) has the characters stare into space pondering the notion that in the vastness of infinity there could be another of them out there. This is basically the same existentially troublesome question of the recurrence of the universe in enough time, something not actually prohibited by the second law of thermodynamics and the way entropy tends to increase with the passing of time, but we have already talked about that.

Reading the Comics, August 16, 2014: Saturday Morning Breakfast Cereal Edition

Zach Weinersmith’s Saturday Morning Breakfast Cereal is a long-running and well-regarded web comic that I haven’t paid much attention to because I don’t read many web comics. XKCD, Newshounds, and a couple others are about it. I’m not opposed to web comics, mind you, I just don’t get around to following them typically. But Saturday Morning Breakfast Cereal started running on Gocomics.com recently, and Gocomics makes it easy to start adding comics, and I did, and that’s served me well for the mathematical comics collections since it’s been a pretty dry spell. I bet it’s the summer vacation.

Saturday Morning Breakfast Cereal (July 30) seems like a reach for inclusion in mathematical comics since its caption is “Physicists make lousy firemen” and it talks about the action of a fire — and of the “living things” caught in the fire — as processes producing wobbling and increases in disorder. That’s an effort at describing a couple of ideas, the first that the temperature of a thing is connected to the speed at which the molecules making it up are moving, and the second that the famous entropy is a never-decreasing quantity. We get these notions from thermodynamics and particularly the attempt to understand physically important quantities like heat and temperature in terms of particles — which have mass and position and momentum — and their interactions. You could write an entire blog about entropy and probably someone does.

Randy Glasbergen’s Glasbergen Cartoons (August 2) uses the word-problem setup for a strip of “Dog Math” and tries to remind everyone teaching undergraduates the quotient rule that it really could be worse, considering.

Nate Fakes’s Break of Day (August 4) takes us into an anthropomorphized world that isn’t numerals for a change, to play on the idea that skill in arithmetic is evidence of particular intelligence.

Jiggs tries to explain addition to his niece, and learns his brother-in-law is his brother-in-law. — George McManus’s _Bringing Up Father_, originally run the 12th of April, 1949.

George McManus’s Bringing Up Father (August 11, rerun from April 12, 1949) goes to the old motif of using money to explain addition problems. It’s not a bad strategy, of course: in a way, arithmetic is one of the first abstractions one does, in going from the idea that a hundred of something added to a hundred fifty of something will yield two hundred fifty of that thing, and it doesn’t matter what that something is: you’ve abstracted out the ideas of “a hundred plus a hundred fifty”. In algebra we start to think about whether we can add together numbers without knowing what one or both of the numbers are — “x plus y” — and later still we look at adding together things that aren’t necessarily numbers.

And back to Saturday Morning Breakfast Cereal (August 13), which has a physicist type building a model of his “lack of dates” based on random walks and, his colleague objects, “only works if we assume you’re an ideal gas molecule”. But models are often built on assumptions that might, taken literally, be nonsensical, like imagining the universe to have exactly three elements in it, supposing that people never act against their maximal long-term economic gain, or — to summon a traditional mathematics/physics joke — assuming a spherical cow. The point of a model is to capture some interesting behavior, and avoid the complicating factors that can’t be dealt with precisely or which don’t relate to the behavior being studied. Choosing how to simplify is the skill and art that earns mathematicians the big money.

And then for August 16, Saturday Morning Breakfast Cereal does a binary numbers joke. I confess my skepticism that there are any good alternate-base-number jokes, but you might like them.

The Geometry of Thermodynamics (Part 1)

I should mention that Peter Mander’s Carnot Cycle blog has a fine entry, “The Geometry of Thermodynamics (Part I)” which admittedly opens with a diagram that looks like the sort of thing you create when you want to present a horrifying science diagram. That’s a bit of flavor.

Mander writes about part of what made J Willard Gibbs probably the greatest theoretical physicist that the United States has yet produced: Gibbs put much of thermodynamics into a logically neat system, the kind we still basically use today, and all the better saw represent it and understand it as a matter of surface geometries. This is an abstract kind of surface — looking at the curve traced out by, say, mapping the energy of a gas against its volume, or its temperature versus its entropy — but if you can accept the idea that we can draw curves representing these quantities then you get to use your understanding how how solid objects (and Gibbs even got made solid objects — James Clerk Maxwell, of Maxwell’s Equations fame, even sculpted some) look and feel.

This is a reblogging of only part one, although as Mander’s on summer holiday you haven’t missed part two.

carnotcycle

Volume One of the Scientific Papers of J. Willard Gibbs, published posthumously in 1906, is devoted to Thermodynamics. Chief among its content is the hugely long and desperately difficult “On the equilibrium of heterogeneous substances (1876, 1878)”, with which Gibbs single-handedly laid the theoretical foundations of chemical thermodynamics.

In contrast to James Clerk Maxwell’s textbook Theory of Heat (1871), which uses no calculus at all and hardly any algebra, preferring geometry as the means of demonstrating relationships between quantities, Gibbs’ magnum opus is stuffed with differential equations. Turning the pages of this calculus-laden work, one could easily be drawn to the conclusion that the writer was not a visual thinker.

But in Gibbs’ case, this is far from the truth.

The first two papers on thermodynamics that Gibbs published, in 1873, were in fact visually-led. Paper I deals with indicator diagrams and their comparative properties, while Paper II

View original post 1,490 more words

Reading the Comics, May 4, 2014: Summing the Series Edition

Before I get to today’s round of mathematics comics, a legend-or-joke, traditionally starring John Von Neumann as the mathematician.

The recreational word problem goes like this: two bicyclists, twenty miles apart, are pedaling toward each other, each at a steady ten miles an hour. A fly takes off from the first bicyclist, heading straight for the second at fifteen miles per hour (ground speed); when it touches the second bicyclist it instantly turns around and returns to the first at again fifteen miles per hour, at which point it turns around again and head for the second, and back to the first, and so on. By the time the bicyclists reach one another, the fly — having made, incidentally, infinitely many trips between them — has travelled some distance. What is it?

And this is not hard problem to set up, inherently: each leg of the fly’s trip is going to be a certain ratio of the previous leg, which means that formulas for a geometric infinite series can be used. You just need to work out what the lengths of those legs are to start with, and what that ratio is, and then work out the formula in your head. This is a bit tedious and people given the problem may need some time and a couple sheets of paper to make it work.

Von Neumann, who was an expert in pretty much every field of mathematics and a good number of those in physics, allegedly heard the problem and immediately answered: 15 miles! And the problem-giver said, oh, he saw the trick. (Since the bicyclists will spend one hour pedaling before meeting, and the fly is travelling fifteen miles per hour all that time, it travels a total of a fifteen miles. Most people don’t think of that, and try to sum the infinite series instead.) And von Neumann said, “What trick? All I did was sum the infinite series.”

Did this charming story of a mathematician being all mathematicky happen? Wikipedia’s description of the event credits Paul Halmos’s recounting of Nicholas Metropolis’s recounting of the story, which as a source seems only marginally better than “I heard it on the Internet somewhere”. (Other versions of the story give different distances for the bicyclists and different speeds for the fly.) But it’s a wonderful legend and can be linked to a Herb and Jamaal comic strip from this past week.

Paul Trap’s Thatababy (April 29) has the baby “blame entropy”, which fits as a mathematical concept, it seems to me. Entropy as a concept was developed in the mid-19th century as a thermodynamical concept, and it’s one of those rare mathematical constructs which becomes a superstar of pop culture. It’s become something of a fancy word for disorder or chaos or just plain messes, and the notion that the entropy of a system is ever-increasing is probably the only bit of statistical mechanics an average person can be expected to know. (And the situation is more complicated than that; for example, it’s just more probable that the entropy is increasing in time.)

Entropy is a great concept, though, as besides capturing very well an idea that’s almost universally present, it also turns out to be meaningful in surprising new places. The most powerful of those is in information theory, which is just what the label suggests; the field grew out of the problem of making messages understandable even though the telegraph or telephone lines or radio beams on which they were sent would garble the messages some, even if people sent or received the messages perfectly, which they would not. The most captivating (to my mind) new place is in black holes: the event horizon of a black hole has a surface area which is (proportional to) its entropy, and consideration of such things as the conservation of energy and the link between entropy and surface area allow one to understand something of the way black holes ought to interact with matter and with one another, without the mathematics involved being nearly as complicated as I might have imagined a priori.

Meanwhile, Lincoln Pierce’s Big Nate (April 30) mentions how Nate’s Earned Run Average has changed over the course of two innings. Baseball is maybe the archetypical record-keeping statistics-driven sport; Alan Schwarz’s The Numbers Game: Baseball’s Lifelong Fascination With Statistics notes that the keeping of some statistical records were required at least as far back as 1837 (in the Constitution of the Olympic Ball Club of Philadelphia). Earned runs — along with nearly every other baseball statistic the non-stathead has heard of other than batting averages — were developed as a concept by the baseball evangelist and reporter Henry Chadwick, who presented them from 1867 as an attempt to measure the effectiveness of batting and fielding. (The idea of the pitcher as an active player, as opposed to a convenient way to get the ball into play, was still developing.) But — and isn’t this typical? — he would come to oppose the earned run average as a measure of pitching performance, because things that were really outside the pitcher’s control, such as stolen bases, contributed to it.

It seems to me there must be some connection between the record-keeping of baseball and the development of statistics as a concept in the 19th century. Granted the 19th century was a century of statistics, starting with nation-states measuring their populations, their demographics, their economies, and projecting what this would imply for future needs; and then with science, as statistical mechanics found it possible to quite well understand the behavior of millions of particles despite it being impossible to perfectly understand four; and in business, as manufacturing and money were made less individual and more standard. There was plenty to drive the field without an amusing game, but, I can’t help thinking of sports as a gateway into the field.

Creator.com's _Donald Duck_ for 2 May 2014: Ludwig von Drake orders his computer to stop with the thinking.

The Disney Company’s Donald Duck (May 2, rerun) suggests that Ludwig von Drake is continuing to have problems with his computing machine. Indeed, he’s apparently having the same problem yet. I’d like to know when these strips originally ran, but the host site of creators.com doesn’t give any hint.

Stephen Bentley’s Herb and Jamaal (May 3) has the kid whose name I don’t really know fret how he spent “so much time” on an equation which would’ve been easy if he’d used “common sense” instead. But that’s not a rare phenomenon mathematically: it’s quite possible to set up an equation, or a process, or a something which does indeed inevitably get you to a correct answer but which demands a lot of time and effort to finish, when a stroke of insight or recasting of the problem would remove that effort, as in the von Neumann legend. The commenter Dartpaw86, on the Comics Curmudgeon site, brought up another excellent example, from Katie Tiedrich’s Awkward Zombie web comic. (I didn’t use the insight shown in the comic to solve it, but I’m happy to say, I did get it right without going to pages of calculations, whether or not you believe me.)

However, having insights is hard. You can learn many of the tricks people use for different problems, but, say, no amount of studying the Awkward Zombie puzzle about a square inscribed in a circle inscribed in a square inscribed in a circle inscribed in a square will help you in working out the area left behind when a cylindrical tube is drilled out of a sphere. Setting up an approach that will, given enough work, get you a correct solution is worth knowing how to do, especially if you can give the boring part of actually doing the calculations to a computer, which is indefatigable and, certain duck-based operating systems aside, pretty reliable. That doesn’t mean you don’t feel dumb for missing the recasting.

Rick Detorie's _One Big Happy_ for 3 May 2014: Joe names the whole numbers.

Rick DeTorie’s One Big Happy (May 3) puns a little on the meaning of whole numbers. It might sound a little silly to have a name for only a handful of numbers, but, there’s no reason not to if the group is interesting enough. It’s possible (although I’d be surprised if it were the case) that there are only 47 Mersenne primes (a number, such as 7 or 31, that is one less than a whole power of 2), and we have the concept of the “odd perfect number”, when there might well not be any such thing.

CarnotCycle on the Gibbs-Helmholtz Equation

I’m a touch late discussing this and can only plead that it has been December after all. Over on the CarnotCycle blog — which is focused on thermodynamics in a way I rather admire — was recently a discussion of the Gibbs-Helmholtz Equation, which turns up in thermodynamics classes, and goes a bit better than the class I remember by showing a couple examples of actually using it to understand how chemistry works. Well, it’s so easy in a class like this to get busy working with symbols and forget that thermodynamics is a supremely practical science [1].

The Gibbs-Helmholtz Equation — named for Josiah Willard Gibbs and for Hermann von Helmholtz, both of whom developed it independently (Helmholtz first) — comes in a couple of different forms, which CarnotCycle describes. All these different forms are meant to describe whether a particular change in a system is likely to happen. CarnotCycle’s discussion gives a couple of examples of actually working out the numbers, including for the Haber process, which I don’t remember reading about in calculative detail before. So I wanted to recommend it as a bit of practical mathematics or physics.

[1] I think it was Stephen Brush pointed out many of the earliest papers in thermodynamics appeared in railroad industry journals, because the problems of efficiently getting power from engines, and of how materials change when they get below freezing, are critically important to turning railroads from experimental contraptions into a productive industry. The observation might not be original to him. The observation also might have been Wolfgang Schivelbusch’s instead.

Reblog: Mixed-Up Views Of Entropy

The blog CarnotCycle, which tries to explain thermodynamics — a noble goal, since thermodynamics is a really big, really important, and criminally unpopularized part of science and mathematics — here starts from an “Unpublished Fragment” by the great Josiah Willard Gibbs to talk about entropy.

Gibbs — a strong candidate for the greatest scientist the United States ever produced, complete with fascinating biographical quirks to make him seem accessibly peculiar — gave to statistical mechanics much of the mathematical form and power that it now has. Gibbs had planned to write something about “On entropy as mixed-up-ness”, which certainly puts in one word what people think of entropy as being. The concept is more powerful and more subtle than that, though, and CarnotCycle talks about some of the subtleties.

carnotcycle

Tucked away at the back of Volume One of The Scientific Papers of J. Willard Gibbs, is a brief chapter headed ‘Unpublished Fragments’. It contains a list of nine subject headings for a supplement that Professor Gibbs was planning to write to his famous paper “On the Equilibrium of Heterogeneous Substances”. Sadly, he completed his notes for only two subjects before his death in April 1903, so we will never know what he had in mind to write about the sixth subject in the list: On entropy as mixed-up-ness.

View original post 686 more words

Kindly share this:

Encryption schemes.

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kullback-Leibler Divergence.

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this:

Kindly share this: