## Let Me Remind You How Interesting a Basketball Tournament Is

Several years ago I stumbled into a nice sequence. All my nice sequences have been things I stumbled upon. This one looked at the most basic elements of information theory by what they tell us about the NCAA College Basketball tournament. This is (in the main) a 64-team single-elimination playoff. It’s been a few years since I ran through the sequence. But it’s been a couple years since the tournament could be run with a reasonably clear conscience too. So here’s my essays:

And this spins off to questions about other sports events.

And I still figure to get to this year’s Pi Day comic strips. Soon. It’s been a while since I felt I had so much to write up.

## Let Me Tell You How Interesting March Madness Could Possibly Be

I read something alarming in the daily “Best of GoComics” e-mail this morning. It was a panel of Dave Whamond’s Reality Check. It’s a panel comic, although it stands out from the pack by having a squirrel character in the margins. And here’s the panel.

Certainly a solid enough pun to rate a mention. I don’t know of anyone actually doing a March Mathness bracket, but it’s not a bad idea. Rating mathematical terms for their importance or usefulness or just beauty might be fun. And might give a reason to talk about their meaning some. It’s a good angle to discuss what’s interesting about mathematical terms.

And that lets me segue into talking about a set of essays. The next few weeks see the NCAA college basketball tournament, March Madness. I’ve used that to write some stuff about information theory, as it applies to the question: is a basketball game interesting?

Along the way here I got to looking up actual scoring results from major sports. This let me estimate the information-theory content of the scores of soccer, (US) football, and baseball scores, to match my estimate of basketball scores’ information content.

• How Interesting Is A Football Score? Football scoring is a complicated thing. But I was able to find a trove of historical data to give me an estimate of the information theory content of a score.
• How Interesting Is A Baseball Score? Some Partial Results I found some summaries of actual historical baseball scores. Somehow I couldn’t find the detail I wanted for baseball, a sport that since 1845 has kept track of every possible bit of information, including how long the games ran, about every game ever. I made do, though.
• How Interesting Is A Baseball Score? Some Further Results Since I found some more detailed summaries and refined the estimate a little.
• How Interesting Is A Low-Scoring Game? And here, well, I start making up scores. It’s meant to represent low-scoring games such as soccer, hockey, or baseball to draw some conclusions. This includes the question: just because a distribution of small whole numbers is good for mathematicians, is that a good match for what sports scores are like?

## Is A Basketball Tournament Interesting? My Thoughts

It’s a good weekend to bring this back. I have some essays about information theory and sports contests and maybe you missed them earlier. Here goes.

And then for a follow-up I started looking into actual scoring results from major sports. This let me estimate the information-theory content of the scores of soccer, (US) football, and baseball scores, to match my estimate of basketball scores’ information content.

Don’t try to use this to pass your computer science quals. But I hope it gives you something interesting to talk about while sulking over your brackets, and maybe to read about after that.

## How Interesting Is March Madness?

And now let me close the week with some other evergreen articles. A couple years back I mixed the NCAA men’s basketball tournament with information theory to produce a series of essays that fit the title I’ve given this recap. They also sprawl out into (US) football and baseball. Let me link you to them:

I’ve found a good way to procrastinate on the next essay in the Why Stuff Can Orbit series. (I’m considering explaining all of differential calculus, or as much as anyone really needs, to save myself a little work later on.) In the meanwhile, though, here’s some interesting reading that’s come to my attention the last few weeks and that you might procrastinate your own projects with. (Remember Benchley’s Principle!)

First is Jeremy Kun’s essay Habits of highly mathematical people. I think it’s right in describing some of the worldview mathematics training instills, or that encourages people to become mathematicians. It does seem to me, though, that most everything Kun describes is also true of philosophers. I’m less certain, but I strongly suspect, that it’s also true of lawyers. These concentrations all tend to encourage thinking about what we mean by things, and to test those definitions by thought experiments. If we suppose this to be true, then what implications would it have? What would we have to conclude is also true? Does it include anything that would be absurd to say? And are the results useful enough we can accept a bit of apparent absurdity?

New York magazine had an essay: Jesse Singal’s How Researchers Discovered the Basketball “Hot Hand”. The “Hot Hand” phenomenon is one every sports enthusiast, and most casual fans, know: sometimes someone is just playing really, really well. The problem has always been figuring out whether it exists. Do anything that isn’t a sure bet long enough and there will be streaks. There’ll be a stretch where it always happens; there’ll be a stretch where it never does. That’s how randomness works.

But it’s hard to show that. The messiness of the real world interferes. A chance of making a basketball shot is not some fixed thing over the course of a career, or over a season, or even over a game. Sometimes players do seem to be hot. Certainly anyone who plays anything competitively experiences a feeling of being in the zone, during which stuff seems to just keep going right. It’s hard to disbelieve something that you witness, even experience.

So the essay describes some of the challenges of this: coming up with a definition of a “hot hand”, for one. Coming up with a way to test whether a player has a hot hand. Seeing whether they’re observed in the historical record. Singal’s essay writes about some of the history of studying hot hands. There is a lot of probability, and of psychology, and of experimental design in it.
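The claim that plain randomness produces streaks is easy to check with a quick simulation. Here is a minimal Python sketch, assuming a hypothetical shooter who makes every shot independently with the same 50 percent chance (exactly the simplification the real world refuses to honor). The function names, the shot count, and the seed are illustrative choices of mine, not anything from Singal’s essay:

```python
import random

def longest_streak(shots):
    """Length of the longest run of consecutive made shots (True values)."""
    best = current = 0
    for made in shots:
        current = current + 1 if made else 0
        best = max(best, current)
    return best

def simulate_shooter(n_shots, p_make, seed=None):
    """Simulate independent shots, each made with the fixed chance p_make."""
    rng = random.Random(seed)
    return [rng.random() < p_make for _ in range(n_shots)]

# A hypothetical player with no "hot hand" at all: every shot is a coin flip.
shots = simulate_shooter(n_shots=1000, p_make=0.5, seed=2015)
print("longest run of makes:", longest_streak(shots))
```

Changing the seed changes the exact answer, but a shooter like this one will typically show runs long enough to look like a hot hand to anyone watching.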

And then there’s this intriguing question Analysis Fact Of The Day linked to: did Gaston Julia ever see a computer-generated image of a Julia Set? There are many Julia Sets; they and their relative, the Mandelbrot Set, became trendy in the fractals boom of the 1980s. If you knew a mathematics major back then, there was at least one on her wall. It typically looks like a craggly, lightning-rimmed cloud. Its shapes are not easy to imagine. It’s almost designed for the computer to render. Gaston Julia died in March of 1978. Could he have seen a depiction?

It’s not clear. The linked discussion digs up early computer renderings. It also brings up an example of a late-19th-century hand-drawn depiction of a Julia-like set, and compares it to a modern digital rendition of the thing. Numerical simulation saves a lot of tedious work; but it’s always breathtaking to see how much can be done by reason.

## How Interesting Can A Basketball Tournament Be?

The United States is about to spend a good bit of time worrying about the NCAA men’s basketball tournament. It’s a good distraction from the women’s basketball tournament and from the National Invitational Tournament. Last year I used this to write a couple essays that stepped into information theory. Nobody knowledgeable in information theory has sent me threatening letters since. So since the inspiration is back in season I’d like to bring them to your attention again:

## Reading the Comics, January 4, 2015: An Easy New Year Edition

It looks like Comic Strip Master Command wanted to give me a nice, easy start of the year. The first group of mathematics-themed comic strips doesn’t get into deep waters and so could be written up with just a few moments. I foiled them by not having even a few moments to write things up, so that I’m behind on 2016 already. I’m sure I kind of win.

Dan Thompson’s Brevity for the 1st of January starts us off with icons of counting and computing. The abacus, of course, is one of the longest-used tools for computing. The calculator was a useful stopgap between the slide rule and the smart phone. The Count infects numerals with such contagious joy. And the whiteboard is where a lot of good mathematics work gets done. And yes, I noticed the sequence of numbers on the board. The prime numbers are often cited as the sort of message an alien entity would recognize. I suppose it’s likely an intelligence alert enough to pick up messages across space would be able to recognize prime numbers. Whether they’re certain to see them as important building blocks to the ways numbers work, the way we do? I don’t know. But I would expect someone to know the sequence, at least.

Ryan Pagelow’s Buni for New Year’s Day qualifies as the anthropomorphic-numerals joke for this essay.

Scott Hilburn’s The Argyle Sweater for the 2nd of January qualifies as the Roman numerals joke for this essay. It does prompt me to wonder about the way people who used Roman numerals as their primary system thought, though. Obviously, “XCIX red balloons” should be pronounced as “ninety-nine red balloons”. But would someone scan it as “ninety-nine” or would it be read as the characters, “x-c-i-x” and then that converted to a number? I’m not sure I’m expressing the thing I wonder.

Steve Moore’s In The Bleachers for the 4th of January shows a basketball player overthinking the problem of getting a ball in the basket. The overthinking includes a bundle of equations which are all relevant to the problem, though. They’re the kinds of things you get in describing an object tossed up and falling without significant air resistance. I had thought I’d featured this strip — a rerun — before, but it seems not. Moore has used the same kind of joke a couple of other times, though, and he does like getting the equations right where possible.

Justin Boyd’s absurdist Invisible Bread for the 4th of January has Mom clean up a messy hard drive by putting all the 1’s together and all the 0’s together. And, yes, that’s not how data works. We say we represent data, on a computer, with 1’s and 0’s, but those are just names. We need to call them something. They’re in truth — oh, they’re positive or negative electric charges, or magnetic fields pointing one way or another, or they’re switches that are closed or open, or whatever. That’s for the person building the computer to worry about. Our description of what a computer does doesn’t care about the physical manifestation of our data. We could be as right if we say we’re representing data with A’s and purples, or with stop signs and empty cups of tea. What’s important is the pattern, and how likely it is that a 1 will follow a 0, or a 0 will follow a 1. If that sounds reminiscent of my information-theory talk about entropy, well, good: it is. Sweeping all the data into homogenous blocks of 1’s and of 0’s, alas, wipes out the interesting stuff. Information is hidden, somehow, in the ways we line up 1’s and 0’s, whatever we call them.
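To see how much Mom’s tidying throws away, here is a small Python sketch. It assumes a toy “hard drive” holding only eight bits (my choice, to keep the count small) and compares how many distinct messages exist before and after sweeping the 0’s and 1’s into blocks:

```python
from itertools import product

n_bits = 8  # toy drive size, chosen only to keep the enumeration small

# Every possible message of n_bits ones and zeroes.
messages = ["".join(bits) for bits in product("01", repeat=n_bits)]

# "Cleaning up" a message means sorting it: all the 0's, then all the 1's.
swept = {"".join(sorted(message)) for message in messages}

print(len(messages), "possible messages before sweeping")
print(len(swept), "possible messages after sweeping")
```

The 256 original messages collapse to just 9 tidy ones, since all that survives the sweeping is how many 1’s there were.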

Steve Boreman’s Little Dog Lost for the 4th of January does a bit of comic wordplay with ones, zeroes, and twos. I like this sort of comic interplay.

And finally, John Deering and John Newcombe saw that Facebook meme about algebra just a few weeks ago, then drew the Zack Hill for the 4th of January.

## Doesn’t The Other Team Count? How Much?

I’d worked out an estimate of how much information content there is in a basketball score, by which I was careful to say the score that one team manages in a game. I wasn’t able to find out what the actual distribution of real-world scores was like, unfortunately, so I made up a plausible-sounding guess: that college basketball scores would be distributed among the imaginable numbers (whole numbers from zero through … well, infinitely large numbers, though in practice probably not more than 150) according to a very common distribution called the “Gaussian” or “normal” distribution, that the arithmetic mean score would be about 65, and that the standard deviation, a measure of how spread out the distribution of scores is, would be about 10.

If those assumptions are true, or are at least close enough to true, then there are something like 5.4 bits of information in a single team’s score. Put another way, if you were trying to divine the score by asking someone who knew it a series of carefully-chosen questions, like, “is the score less than 65?” or “is the score more than 39?”, with at each stage each question equally likely to be answered yes or no, you could expect to hit the exact score with usually five, sometimes six, such questions.
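The 5.4-bit figure can be reproduced numerically. This is a rough Python sketch under the same made-up assumptions as the text (mean 65, standard deviation 10, scores capped at 150). Sampling the bell curve at each whole-number score and renormalizing is my own shortcut for turning the continuous distribution into a discrete one:

```python
import math

def normal_pdf(x, mean, sd):
    """Density of the Gaussian (normal) distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def score_entropy_bits(mean=65.0, sd=10.0, max_score=150):
    """Entropy, in bits, of whole-number scores 0..max_score weighted by a bell curve."""
    weights = [normal_pdf(score, mean, sd) for score in range(max_score + 1)]
    total = sum(weights)
    probabilities = [w / total for w in weights]  # renormalize so they sum to 1
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(round(score_entropy_bits(), 2), "bits in one team's score")
```

A wider spread of scores raises the number and a narrower one lowers it. The mean hardly matters so long as it sits well away from zero.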

## But How Interesting Is A Basketball Score?

When I worked out how interesting, in an information-theory sense, a basketball game — and from that, a tournament — might be, I supposed there was only one thing that might be interesting about the game: who won? Or to be exact, “did (this team) win”? But that isn’t everything we might want to know about a game. For example, we might want to know what a team scored. People often do. So how to measure this?

The answer was given, in embryo, in my first piece about how interesting a game might be. If you can list all the possible outcomes of something that has multiple outcomes, and how probable each of those outcomes is, then you can describe how much information there is in knowing the result. It’s the sum, for all of the possible results, of the quantity negative one times the probability of the result times the logarithm-base-two of the probability of the result. When we were interested in only whether a team won or lost there were just the two outcomes possible, which made for some fairly simple calculations, and indicates that the information content of a game can be as high as 1 — if the team is equally likely to win or to lose — or as low as 0 — if the team is sure to win, or sure to lose. And the units of this measure are bits, the same kind of thing we use to measure (in groups of bits called bytes) how big a computer file is.
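That recipe translates almost word for word into code. A minimal Python sketch follows; the function names are mine, and outcomes of probability zero are skipped because they contribute nothing to the sum:

```python
import math

def entropy_bits(probabilities):
    """Sum over outcomes of -1 * p * log2(p), skipping impossible outcomes."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

def game_information(p_win):
    """Information, in bits, in learning whether a team with chance p_win won."""
    return entropy_bits([p_win, 1 - p_win])

for p_win in (0.5, 0.75, 0.96, 1.0):
    print(p_win, round(game_information(p_win), 4))
```

The evenly matched game comes out at exactly 1 bit and the sure thing at 0 bits, matching the two extremes described above.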

## My Mathematics Blog, As March 2015 Would Have It

And now for my monthly review of publication statistics. This is a good month to do it with, since it was a record month: I had 1,022 pages viewed around these parts, the first time (according to WordPress) that I’ve had more than a thousand in a month. In January I’d had 944, and in February a mere 859, which I was willing to blame on the shortness of that month. March’s is a clean record, though, more views per day than either of those months.

The total number of visitors was up, too, to 468. That’s compared to 438 in January and 407 in short February, although it happens it’s not a record; that’s still held by January 2013 and its 473 visitors. The number of views per visitor keeps holding about steady: from 2.16 in January to 2.11 in February to 2.18 in March. It appears that I’m getting a little better at finding people who like to read what I like to write, but haven’t caught that thrilling transition from linear to exponential growth.

The new WordPress statistics tell me I had a record 265 likes in March, up from January’s 196 and February’s 179. The number of comments rose from January’s 51 and February’s 56 to a full 93 for March. I take all this as supporting evidence that I’m better at reaching people lately. (Although I do wonder if it counts backlinks from one of my articles to another as a comment.)

The mathematics blog starts the month at 22,837 total views, and with 454 WordPress followers.

The most popular articles in March, though, were the set you might have guessed without actually reading things around here:

I admit I thought the “how interesting is a basketball tournament?” thing would be more popular, but it’s hampered by having started out in the middle of the month. I might want to start looking at the most popular articles of the past 30 days in the middle of the month too.

The countries sending me the greatest number of readers were the usual set: the United States at 658 in first place, and Canada in second at 66. The United Kingdom was a strong third at 57, and Austria in fourth place at 30.

Sending me a single reader each were Belgium, Ecuador, Israel, Japan, Lebanon, Mexico, Nepal, Norway, Portugal, Romania, Samoa, Saudi Arabia, Slovakia, Thailand, the United Arab Emirates, Uruguay, and Venezuela. The repeats from February were Japan, Mexico, Romania, and Venezuela. Japan is on a three-month streak, while Mexico has sent me a solitary reader four months in a row. India’s declined slightly in reading me, from 6 to 5. Ah well.

Among the interesting search terms were:

• right trapezoid 5 (I loved this anime as a kid)
• a short comic strip on reminding people on how to order decimals correctly (I hope they found what they were looking for)
• are there other ways to draw a trapezoid (try with food dye on the back of your pet rabbit!)
• motto of ideal gas (veni vidi v = nRT/P ?)
• rectangular states (the majority of United States states are pretty rectangular, when you get down to it)
• what is the definition of rerun (I don’t think this has come up before)
• what are the chances of consecutive friday the 13th’s in a year (I make it out at 3/28, or a touch under 11 percent; anyone have another opinion?)
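For anyone who does have another opinion on that last one, here is a brute-force check in Python. It is only a sketch: it assumes “consecutive” means back-to-back months within one calendar year, and it counts over a 400-year stretch of the Gregorian calendar (the range 2000 through 2399 is an arbitrary choice of mine). The 3/28 figure comes from pretending every fourth year is a leap year, so the Gregorian count may differ slightly:

```python
from datetime import date

def has_consecutive_friday_13ths(year):
    """True if two back-to-back months of this year both have a Friday the 13th."""
    # date.weekday() numbers Monday as 0, so Friday is 4.
    friday_months = [m for m in range(1, 13) if date(year, m, 13).weekday() == 4]
    return any(b - a == 1 for a, b in zip(friday_months, friday_months[1:]))

years = range(2000, 2400)  # one full 400-year Gregorian cycle
hits = sum(has_consecutive_friday_13ths(year) for year in years)
print(hits, "of", len(years), "years, or", hits / len(years))
```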

Well, with luck, I should have a fresh comic strips post soon and some more writing in the curious mix between information theory and college basketball.

## But How Interesting Is A Real Basketball Tournament?

When I wrote about how interesting the results of a basketball tournament were, and came to the conclusion that it was 63 (and filled in that I meant 63 bits of information), I was careful to say that the outcome of a basketball game between two evenly-matched opponents has an information content of 1 bit. If the game is a foregone conclusion, then the game hasn’t got so much information about it. If the game really is foregone, the information content is 0 bits; you already know what the result will be. If the game is an almost sure thing, there’s very little information to be had from actually seeing the game. An upset might be thrilling to watch, but you would hardly count on that, if you’re being rational. But most games aren’t sure things; we might expect the higher-seed to win, but it’s plausible they don’t. How does that affect how much information there is in the results of a tournament?

Last year, the NCAA College Men’s Basketball tournament inspired me to look up what the outcomes of various types of matches were, and which teams were more likely to win than others. If some person who wrote something for statistics.about.com is correct, based on 27 years of March Madness outcomes, the play between a number one and a number 16 seed is a foregone conclusion — the number one seed always wins — while number two versus number 15 is nearly sure. So while the first round of play will involve 32 games — four regions, each region having eight games — there’ll be something less than 32 bits of information in all these games, since many of them are so predictable.

If we take the results from that statistics.about.com page as accurate and reliable as a way of predicting the outcomes of various-seeded teams, then we can estimate the information content of the first round of play at least.

Here’s how I work it out, anyway:

| Contest | Probability the Higher Seed Wins | Information Content of this Outcome |
|---|---|---|
| #1 seed vs #16 seed | 100% | 0 bits |
| #2 seed vs #15 seed | 96% | 0.2423 bits |
| #3 seed vs #14 seed | 85% | 0.6098 bits |
| #4 seed vs #13 seed | 79% | 0.7415 bits |
| #5 seed vs #12 seed | 67% | 0.9149 bits |
| #6 seed vs #11 seed | 67% | 0.9149 bits |
| #7 seed vs #10 seed | 60% | 0.9710 bits |
| #8 seed vs #9 seed | 47% | 0.9974 bits |

So if the eight contests in a single region were all evenly matched, the information content of that region would be 8 bits. But there’s one sure and one nearly-sure game in there, and there’s only a couple games where the two teams are close to evenly matched. As a result, I make out the information content of a single region to be about 5.392 bits of information. Since there’s four regions, that means the first round of play, the first 32 games, has altogether about 21.567 bits of information.
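The regional and first-round totals can be rebuilt from the win percentages alone. A short Python sketch, taking the eight percentages from the statistics.about.com figures quoted above at face value; the dictionary and function names are my own:

```python
import math

def game_information(p_win):
    """Bits of information in a win/loss result when one side wins with chance p_win."""
    return sum(-q * math.log2(q) for q in (p_win, 1 - p_win) if q > 0)

# Chance the higher seed wins each first-round pairing, as quoted in the text.
higher_seed_wins = {
    "1 vs 16": 1.00, "2 vs 15": 0.96, "3 vs 14": 0.85, "4 vs 13": 0.79,
    "5 vs 12": 0.67, "6 vs 11": 0.67, "7 vs 10": 0.60, "8 vs 9": 0.47,
}

region_bits = sum(game_information(p) for p in higher_seed_wins.values())
first_round_bits = 4 * region_bits  # four regions with the same eight pairings

print(round(region_bits, 3), "bits per region")
print(round(first_round_bits, 3), "bits in the first round")
```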

Warning: I used three digits past the decimal point just because three is a nice comfortable number. Do not be hypnotized into thinking this is a more precise measure than it really is. I don’t know what the precise chance of, say, a number three seed beating a number fourteen seed is; all I know is that in a 27-year sample, it happened the higher-seed won 85 percent of the time, so the chance of the higher-seed winning is probably close to 85 percent. And I only know that if whoever it was wrote this article actually gathered and processed and reported the information correctly. I would not be at all surprised if the first round turned out to have only 21.565 bits of information, or as many as 21.568.

A statistical analysis of the tournaments which I dug up last year indicated that in the last three rounds — the Elite Eight, Final Four, and championship game — the higher- and lower-seeded teams are equally likely to win, and therefore those games have an information content of 1 bit per game. The last three rounds therefore have 7 bits of information total.

Unfortunately, experimental data seems to fall short for the second round — 16 games, where the 32 winners in the first round play, producing the Sweet Sixteen teams — and the third round — 8 games, producing the Elite Eight. If someone’s done a study of how often the higher-seeded team wins I haven’t run across it.

There are six of these games in each of the four regions, for 24 games total. Presumably the higher-seeded is more likely than the lower-seeded to win, but I don’t know how much more probable it is the higher-seed will win. I can come up with some bounds: the 24 games total in the second and third rounds can’t have an information content less than 0 bits, since they’re not all foregone conclusions. The higher-ranked seed won’t win all the time. And they can’t have an information content of more than 24 bits, since that’s how much there would be if the games were perfectly even matches.

So, then: the first round carries about 21.567 bits of information. The second and third rounds carry between 0 and 24 bits. The fourth through sixth rounds (the sixth round is the championship game) carry seven bits. Overall, the 63 games of the tournament carry between 28.567 and 52.567 bits of information. I would expect that many of the second-round and most of the third-round games are pretty close to even matches, so I would expect the higher end of that range to be closer to the true information content.

Let me make the assumption that in this second and third round the higher-seed has roughly a chance of 75 percent of beating the lower seed. That’s a number taken pretty arbitrarily as one that sounds like a plausible but not excessive advantage the higher-seeded teams might have. (It happens it’s close to the average you get of the higher-seed beating the lower-seed in the first round of play, something that I took as confirming my intuition about a plausible advantage the higher seed has.) If, in the second and third rounds, the higher-seed wins 75 percent of the time and the lower-seed 25 percent, then the outcome of each game is about 0.8113 bits of information. Since there are 24 games total in the second and third rounds, that suggests the second and third rounds carry about 19.471 bits of information.

Warning: Again, I went to three digits past the decimal just because three digits looks nice. Given that I do not actually know the chance a higher-seed beats a lower-seed in these rounds, and that I just made up a number that seems plausible, you should not be surprised if the actual information content turns out to be 19.468 or even 19.472 bits of information.

Taking all these numbers, though — the first round with its something like 21.567 bits of information; the second and third rounds with something like 19.471 bits; the fourth through sixth rounds with 7 bits — the conclusion is that the win/loss results of the entire 63-game tournament are about 48 bits of information. It’s a bit higher the more unpredictable the games involving the final 32 and the Sweet 16 are; it’s a bit lower the more foregone those conclusions are. But 48 bits sounds like a plausible enough answer to me.
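To see how much the total leans on that made-up 75 percent, here is a Python sketch that rebuilds it for any assumed chance of the higher seed winning in the second and third rounds. The 21.567 and 7 are the figures worked out above, and the function and constant names are mine:

```python
import math

def game_information(p_win):
    """Bits of information in a win/loss result when one side wins with chance p_win."""
    return sum(-q * math.log2(q) for q in (p_win, 1 - p_win) if q > 0)

FIRST_ROUND_BITS = 21.567  # from the first-round table
LATE_ROUND_BITS = 7.0      # Elite Eight onward, treated as even matches
MIDDLE_GAMES = 24          # second and third rounds together

def tournament_bits(p_higher_seed):
    """Total bits, assuming every middle-round game has the same higher-seed chance."""
    middle_bits = MIDDLE_GAMES * game_information(p_higher_seed)
    return FIRST_ROUND_BITS + middle_bits + LATE_ROUND_BITS

for p in (1.0, 0.75, 0.5):
    print(p, round(tournament_bits(p), 3))
```

The two extremes reproduce the bounds of 28.567 and 52.567 bits, and the 75 percent guess lands at about 48.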

When I wrote last weekend’s piece about how interesting a basketball tournament was, I let some terms slide without definition, mostly so I could explain what ideas I wanted to use and how they should relate. My love, for example, read the article and looked up and asked what exactly I meant by “interesting”, in the attempt to measure how interesting a set of games might be, even if the reasoning that brought me to a 63-game tournament having an interest level of 63 seemed to satisfy.

When I spoke about something being interesting, what I had meant was that it’s something whose outcome I would like to know. In mathematical terms this “something whose outcome I would like to know” is often termed an “experiment” to be performed or, even better, a “message” that presumably I will receive; and the outcome is the “information” of that experiment or message. And information is, in this context, something you do not know but would like to.

So the information content of a foregone conclusion is low, or at least very low, because you already know what the result is going to be, or are pretty close to knowing. The information content of something you can’t predict is high, because you would like to know it but there’s no (accurately) guessing what it might be.

This seems like a straightforward idea of what information should mean, and it’s a very fruitful one; the field of “information theory” and a great deal of modern communication theory is based on it. This doesn’t mean there aren’t some curious philosophical implications, though; for example, technically speaking, this seems to imply that anything you already know is by definition not information, and therefore learning something destroys the information it had. This seems impish, at least. Claude Shannon, who’s largely responsible for information theory as we now know it, was renowned for jokes; I recall a Time Life science-series book mentioning how he had built a complex-looking contraption which, turned on, would churn to life, make a hand poke out of its innards, and turn itself off, which makes me smile to imagine. Still, this definition of information is a useful one, so maybe I’m imagining a prank where there’s not one intended.

And something I hadn’t brought up, but which was hanging awkwardly loose, last time was: granted that the outcome of a single game might have an interest level, or an information content, of 1 unit, what’s the unit? If we have units of mass and length and temperature and spiciness of chili sauce, don’t we have a unit of how informative something is?

We have. If we measure how interesting something is — how much information there is in its result — using base-two logarithms the way we did last time, then the unit of information is a bit. That is the same bit that somehow goes into bytes, which go on your computer into kilobytes and megabytes and gigabytes, and onto your hard drive or USB stick as somehow slightly fewer gigabytes than the label on the box says. A bit is, in this sense, the amount of information it takes to distinguish between two equally likely outcomes. Whether that’s a piece of information in a computer’s memory, where a 0 or a 1 is a priori equally likely, or whether it’s the outcome of a basketball game between two evenly matched teams, it’s the same quantity of information to have.

So a March Madness-style tournament has an information content of 63 bits, if all you’re interested in is which teams win. You could communicate the outcome of the whole string of matches by indicating whether the “home” team wins or loses for each of the 63 distinct games. You could do it with 63 flashes of light, or a string of dots and dashes on a telegraph, or checked boxes on a largely empty piece of graphing paper, coins arranged tails-up or heads-up, or chunks of memory on a USB stick. We’re quantifying how much of the message is independent of the medium.
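Here is what that looks like with the medium being a single whole number. A minimal Python sketch, assuming the 63 games are listed in some agreed-upon order and that each result is recorded as whether the “home” team won; the function names are mine:

```python
def encode_results(home_team_won):
    """Pack a list of win/loss flags into one integer, one bit per game."""
    value = 0
    for won in home_team_won:
        value = (value << 1) | int(bool(won))
    return value

def decode_results(value, n_games=63):
    """Unpack the integer back into the original list of win/loss flags."""
    return [bool((value >> shift) & 1) for shift in range(n_games - 1, -1, -1)]

# A made-up tournament in which the "home" team wins every third game.
results = [game % 3 == 0 for game in range(63)]
message = encode_results(results)

print(message.bit_length() <= 63)          # the message fits in 63 bits
print(decode_results(message) == results)  # and nothing was lost
```

Flashes of light or coins on a table would do as well; the integer is just one more medium for the same 63 bits.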

## Reading the Comics, March 15, 2015: Pi Day Edition

I had kind of expected the 14th of March — the Pi Day Of The Century — would produce a flurry of mathematics-themed comics. There were some, although they were fewer and less creatively diverse than I had expected. Anyway, between that, and the regular pace of comics, there’s plenty for me to write about. Recently featured, mostly on Gocomics.com, a little bit on Creators.com, have been:

Brian Anderson’s Dog Eat Doug (March 11) features a cat who claims to be “pondering several quantum equations” to prove something about a parallel universe. It’s an interesting thing to claim because, really, how can the results of an equation prove something about reality? We’re extremely used to the idea that equations can model reality, and that the results of equations predict real things, to the point that it’s easy to forget that there is a difference. A model’s predictions still need some kind of validation, reason to think that these predictions are meaningful and correct when done correctly, and it’s quite hard to think of a meaningful way to validate a prediction about “another” universe.

## Gaussian distribution of NBA scores

The Prior Probability blog points out an interesting graph, showing the most common scores in basketball teams, based on the final scores of every NBA game. It’s actually got three sets of data there, one for all basketball games, one for games this decade, and one for basketball games of the 1950s. Unsurprisingly there’s many more results for this decade — the seasons are longer, and there are thirty teams in the league today, as opposed to eight or nine in 1954. (The Baltimore Bullets played fourteen games before folding, and the games were expunged from the record. The league dropped from eleven teams in 1950 to eight for 1954-1959.)

I’m fascinated by this just as a depiction of probability distributions: any team can, in principle, reach most any non-negative score in a game, but it’s most likely to be around 102. Surely there’s a maximum possible score, based on the fact a team has to get the ball and get into position before it can score; I’m a little curious what that would be.

Prior Probability itself links to another blog which reviews the distribution of scores for other major sports, and the interesting result of what the most common basketball score has been, per decade. It’s increased from the 1940s and 1950s, but it’s considerably down from the 1960s.

You can see the most common scores in such sports as basketball, football, and baseball in Philip Bump’s fun Wonkblog post here. Mr Bump writes: “Each sport follows a rough bell curve … Teams that regularly fall on the left side of that curve do poorly. Teams that land on the right side do well.” Read more about Gaussian distributions here.


## The Math Blog Statistics, March 2014

It’s the start of a fresh month, so let me carry on my blog statistics reporting. In February 2014, apparently, there were a mere 423 pages viewed around here, with 209 unique visitors. That’s increased a bit, to 453 views from 257 visitors, my second-highest number of views since last June and second-highest number of visitors since last April. I can make that depressing, though: it means views per visitor dropped from 2.02 to 1.76, but then, they were at 1.76 in January anyway. And I reached my 14,000th page view, which is fun, but I’d need an extraordinary bit of luck to get to 15,000 this month.

March’s most popular articles were a mix of the evergreens — trapezoids and comics — with a bit of talk about March Madness serving as obviously successful clickbait:

1. How Many Trapezoids I Can Draw, and again, nobody’s found one I overlooked.
2. Calculating March Madness, and the tricky problem of figuring out the chance of getting a perfect bracket.
3. Reading The Comics, March 1, 2014: Isn’t It One-Half X Squared Plus C? Edition, showing how well an alleged joke will make comic strips popular.
4. Reading The Comics, March 26, 2014: Kitchen Science Department, showing that maybe it’s just naming the comics installments that matters.
5. What Are The Chances Of An Upset, which introduces some of the interesting quirks of the bracket and seed system of playoffs, such as the apparent advantage an eleventh seed has over an eighth seed.

There’s a familiar set of countries sending me the most readers: as ever the United States up top (277), with Denmark in second (26) and Canada in third (17). That’s almost a tie, though, as the United Kingdom (16), Austria (15), and the Philippines (13) could have taken third easily. I don’t want to explicitly encourage international rivalries to drive up my page count here, I’m just pointing it out. Singapore is in range too. The single-visitor countries this past month were the Bahamas, Belgium, Brazil, Colombia, Hungary, Mexico, Peru, Rwanda, Saudi Arabia, Spain, Sri Lanka, Sweden, Syria, and Taiwan. Hungary, Peru, and Saudi Arabia are the only repeat visitors from February, and nobody’s got a three-month streak going.

There wasn’t any good search-term poetry this month; mostly it was questions about trapezoids, but there were a couple interesting ones:

So, that’s where things stand: I need to get back to writing about trapezoids and comic strips.

## What Are The Chances Of An Upset?

I’d wondered idly the other day if a number-16 seed had ever beaten a number-one seed in the NCAA Men’s Basketball tournament. This finally made me go and actually try looking it up; a page on statistics.about.com has what it claims are the first-round results from 1985 (when the current 64-team format was adopted) to 2012. This lets us work out roughly the probability of, for example, the number-three seed beating the number-14, at least by what’s termed the “frequentist” interpretation of probability. In that interpretation, the probability of something happening is roughly the number of times the thing you’re interested in happens divided by the number of times it could happen. From 1985 to 2012 each of the various first-round possibilities was played 112 times (28 tournaments with four divisions each); if we make some plausible assumptions about games being independent events (how one seed did last year doesn’t affect how it does this year), we should have a decent rough idea of the probability of each seed winning.

According to its statistics, and remarkably to me, the number-one seed has apparently never been beaten by the number-16. I’m surprised; I’d have guessed the bottom team had at least a one percent chance of victory. I’m also surprised that the Internet seems to have only the one page that’s gathered explicitly how often the first rounds go to the various seeds, although perhaps I’m just not searching for the right terms.
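The frequentist estimate described above is easy to sketch in Python. This is only an illustration: the function name `frequentist_estimate` is hypothetical, and the only figures used are the ones already quoted, namely 112 games per first-round pairing and the page’s claim that the number-one seed won every one of them.

```python
from fractions import Fraction


def frequentist_estimate(successes: int, trials: int) -> Fraction:
    """Estimate a probability as (times it happened) / (times it could have)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    return Fraction(successes, trials)


tournaments = 2012 - 1985 + 1      # 1985 through 2012, inclusive
regions = 4                        # four one-versus-16 games per tournament
games = tournaments * regions

# Per the page cited in the post, the number-one seed won all of these games.
one_seed_wins = games

p_one_seed = frequentist_estimate(one_seed_wins, games)
p_sixteen_seed = 1 - p_one_seed

print(f"games played: {games}")
print(f"estimated chance the one seed wins: {float(p_one_seed):.3f}")
print(f"estimated chance the 16 seed wins:  {float(p_sixteen_seed):.3f}")
```

An estimate of zero here only says the upset was not observed in this sample. It does not show the upset is impossible.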

From http://bracketodds.cs.illinois.edu I learn that Dr Sheldon Jacobson and Dr Douglas M King of the University of Illinois (Urbana) published an interesting paper “Seeding In The NCAA Men’s Basketball Tournament: When is A Higher Seed Better?” which runs a variety of statistical tests on the outcomes of March Madness tournaments and finds that the seeding does seem to correspond to the stronger team in the first few rounds, but that after the Elite Eight round there’s not the evidence that a higher seed is more likely to win than the lower; effectively, after the first few rounds you might as well make a random pick.

Jacobson and King, along with Dr Alexander Nikolaev at SUNY/Buffalo and Dr Adrian J Lee, Central Illinois Technology and Education Research Institute, also wrote “Seed Distributions for the NCAA Men’s Basketball Tournament” which tries to model the tournament’s outcomes as random variables, and checks these random-variable projections against what actually happened between 1985 and 2010. This includes some interesting projections about how often we might expect the various seeds to appear in the Sweet Sixteen, Elite Eight, or Final Four. It brings out some surprises (which make sense when you look back at the brackets), such as that the number-eight or number-nine seed has a worse chance of getting to the Sweet Sixteen than the eleventh- or twelfth-seed does.

(The eighth or ninth seed, if they win, have to play whoever wins the sixteen-versus-one contest, which will be the number-one seed. The eleventh seed has to beat first the number-six seed, and then either the number-three or the number-14 seed, either of which is a more beatable opponent than a number-one seed.)
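The bracket argument in that parenthetical can be worked through numerically. Here is a rough Python sketch. Every win probability in it is a made-up placeholder chosen only to show the mechanics, not a historical figure, and the variable names are hypothetical. It also assumes the number-one seed always survives the first round, as the first-round results quoted above suggest.

```python
# All of these probabilities are illustrative placeholders, NOT real data.
P_EIGHT_BEATS_NINE = 0.50
P_EIGHT_BEATS_ONE = 0.15          # assumes the one seed always beats the 16
P_ELEVEN_BEATS_SIX = 0.35
P_THREE_BEATS_FOURTEEN = 0.85
P_ELEVEN_BEATS_THREE = 0.30
P_ELEVEN_BEATS_FOURTEEN = 0.60

# The eight seed must win its first game and then beat the one seed.
p_eight_sweet_sixteen = P_EIGHT_BEATS_NINE * P_EIGHT_BEATS_ONE

# The eleven seed must beat the six seed, then whichever of the three
# or 14 seed came through the other first-round game.
p_eleven_second_game = (
    P_THREE_BEATS_FOURTEEN * P_ELEVEN_BEATS_THREE
    + (1 - P_THREE_BEATS_FOURTEEN) * P_ELEVEN_BEATS_FOURTEEN
)
p_eleven_sweet_sixteen = P_ELEVEN_BEATS_SIX * p_eleven_second_game

print(f"eight seed reaches the Sweet Sixteen:  {p_eight_sweet_sixteen:.4f}")
print(f"eleven seed reaches the Sweet Sixteen: {p_eleven_sweet_sixteen:.4f}")
```

With these placeholder numbers the eleventh seed comes out ahead, but the conclusion depends entirely on the probabilities fed in. The point is only that an easier second game can outweigh a harder first one.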

Meanwhile, it turns out that in my brackets I had picked Connecticut to beat Villanova, which has me doing well in my group — we get bonus points for calling upsets — apart from the accusations of witchcraft.

I did join a little group of people competing to try calling the various NCAA basketball tournament brackets. It’s a silly pastime and way to commiserate with other people about how badly we’re doing forecasting the outcome of the 63 games in the tournament. We’re competing just for points and the glory of doing a little better than our friends, but there are some actual betting pools out there, and some contests that offer, for perfect brackets, a billion dollars (Warren Buffett, if I have that right), or maybe even a new car (WLNS-TV, channel 6, Lansing).

Working out what the odds are of getting all 63 games right is more interesting than it might seem at first. The natural (it seems to me) first guess at working out the odds is to say, well, there are 63 games, and whatever team you pick has a 50 percent chance of winning that game, so the chance of getting all 63 games right is $\left(\frac{1}{2}\right)^{63}$, or one chance in 9,223,372,036,854,775,808.
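That figure is easy to check with exact integer arithmetic. A minimal Python sketch, under the same coin-flip assumption as the paragraph above (the variable names are only for illustration):

```python
from fractions import Fraction

GAMES = 63                                   # games in a 64-team single-elimination bracket

# If every pick is a fair coin flip, each game is right with probability 1/2.
p_perfect = Fraction(1, 2) ** GAMES
possible_brackets = 2 ** GAMES               # number of distinct ways to fill in the bracket

print(f"one chance in {possible_brackets:,}")
print(f"as a decimal: {float(p_perfect):.3e}")
```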

But it’s not quite so, and the reason is buried in the assumption that every team has a 50 percent chance of winning any given game. And that’s just not so: it’s plausible (as of this writing) to think that the final game will be Michigan State playing the University of Michigan. It’s just ridiculous to think that the final game will be SUNY/Albany (16th seeded) playing Wofford (15th).

The thing is that not all the matches are equally likely to be won by either team. The contest starts out with the number one seed playing the number 16, the number two seed playing the number 15, and so on. The seeding order roughly approximates the order of how good the teams are. It doesn’t take any great stretch to imagine the number ten seed beating the number nine seed; but, has a number 16 seed ever beaten the number one?

To really work out the probability of getting all the brackets right turns into a fairly involved problem. We can probably assume that the chance of, say, number-one seed Virginia beating number-16 seed Coastal Carolina is close to how frequently number-one seeds have beaten number-16 seeds in the past, and similarly that number-four seed Michigan State’s chances over number-13 Delaware are close to that historical average. But there are some 9,223,372,036,854,775,808 possible ways that the tournament could, in principle, go, and they’ve all got different probabilities of happening.

So there isn’t a unique answer to what is the chance that you’ve picked a perfect bracket set. It’s higher if you’ve picked a lot of higher-ranking seeds, certainly, at least assuming that this year’s tournament is much like previous years’, and that seeds reflect reasonably well how likely teams are to win. At some point it starts to be easier to accept “one chance in 9,223,372,036,854,775,808” as close enough. Me, I’ll be gloating for the whole tournament thanks to my guess that Ohio State would lose to Dayton.
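To see how much the answer depends on the picks, here is a small Python sketch. It assumes the games are independent, so the chance of a perfect bracket is just the product of the chances of each individual pick. The function name `perfect_bracket_probability` and the flat 70 percent figure are hypothetical, chosen only to contrast with the coin-flip case. A real calculation would need a separate probability for every matchup.

```python
import math


def perfect_bracket_probability(pick_probabilities):
    """Chance that every pick is right, assuming independent games."""
    return math.prod(pick_probabilities)


GAMES = 63

# Every pick treated as a coin flip.
coin_flip = perfect_bracket_probability([0.5] * GAMES)

# Hypothetical: every pick is a favorite with a 70 percent chance of winning.
favorites = perfect_bracket_probability([0.7] * GAMES)

print(f"coin flips:           one chance in {1 / coin_flip:,.0f}")
print(f"70 percent favorites: one chance in {1 / favorites:,.0f}")
```

Even with generous odds on every game the chance stays tiny, but it is many orders of magnitude better than the coin-flip figure.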

[Edit: first paragraph originally read “games in the match”, which doesn’t quite parse.]

## Reblog: Lawler’s Log

I don’t intend to transform my writings here into a low-key sports mathematics blog. I just happen to have run across a couple of interesting problems and, after all, sports do offer a lot of neat questions about probability and statistics.

benperreira here makes mention of “Lawler’s Law”, something I had not previously noticed. The “Law” is the observation that the first basketball team to make it to 100 points wins the game just about 90 percent of the time. It was apparently first observed by Los Angeles Clippers announcer Ralph Lawler and has been supported by a review of the statistics of NBA teams over the decades.

benperreira is unimpressed with the law, regarding it as just an unduly wise-sounding restatement of the principle that a team that scores more than the league average number of points per game will tend to have a winning record. I’m inclined to agree the Law doesn’t seem to be particularly much, though I was caught by the implication that the team which lets the other get to 100 points first still pulls out a victory one time out of ten.

To underscore his point benperreira includes a diagram purporting to show the likelihood of victory to points scored, although it’s pretty obviously meant to be a quick joke extrapolating from the data that both teams start with a 50 percent chance of victory and zero points, and apparently 100 points gives a nearly 90 percent chance of victory. I am curious about a more precise chart — showing how often the first team to make 10, or 25, or 50, or so points goes on to victory, but I certainly haven’t got time to compile that data.

Well, perhaps I do, but my reading in baseball history and my brushes against people with SABR connections make it very clear I have every possible risk factor for getting lost in the world of sports statistics, so I want to stay far from the meat of actual games.

Still, there are good probability questions to be asked about things like how big a lead is effectively unbeatable, and I’ll leave this post and reblog as a way to nag myself into maybe thinking about it later.
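Short of compiling real game data, the question of how often the first team to reach 10, or 25, or 50, or 100 points goes on to win can at least be played with in a toy simulation. The Python sketch below is emphatically not NBA data. It assumes two evenly matched teams, a fixed number of scoring plays per game, and a made-up mix of one-, two-, and three-point plays. The function name and every parameter are hypothetical.

```python
import random


def first_to_threshold_win_rate(threshold, games=5000, plays=100, seed=0):
    """Fraction of simulated games won by the first team to reach `threshold`.

    Toy model: each scoring play goes to either team with equal chance and
    is worth 1, 2, or 3 points (2 being most common). Tied games keep going
    until the tie is broken. Games where nobody reaches the threshold are
    left out of the average.
    """
    rng = random.Random(seed)
    reached = 0
    won = 0
    for _ in range(games):
        score = [0, 0]
        first = None
        play = 0
        while play < plays or score[0] == score[1]:
            team = rng.randrange(2)
            score[team] += rng.choice((1, 2, 2, 2, 3))
            if first is None and score[team] >= threshold:
                first = team
            play += 1
        if first is not None:
            reached += 1
            winner = 0 if score[0] > score[1] else 1
            if winner == first:
                won += 1
    return won / reached if reached else float("nan")


for threshold in (10, 25, 50, 100):
    rate = first_to_threshold_win_rate(threshold)
    print(f"first to {threshold:3d} points wins {rate:.1%} of simulated games")
```

Since the two simulated teams are equally good, whatever edge shows up comes purely from being ahead at the moment the threshold is reached. Real teams are not evenly matched, so actual figures would differ.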

Lawler’s Law states that the NBA team that reaches 100 points first will win the game. It is based on Lawler’s observations and confirmed by looking back at NBA statistics that show it is true over 90% of the time.

Its brilliance lies in its uselessness. Like NyQuil helps us sleep but does little to help our immune systems make us well, Lawler’s Law soothes us by making us think it means something more than it does.

Why is it so useless, one may venture to ask?

This is a graphical representation of Lawler’s Law. Point A represents the beginning of a game. This team (which ultimately wins this game) has roughly a 50% chance of winning at that point. As the game goes on, and more points are scored, the team depicted here increases its chance of victory based on the number of points it has scored. Point B…
