## Can We Tell Whether A Pinball Player Is Improving?

The question posed for the pinball league was: can we say which of the players most improved over the season? I had data. I had the rankings of each of the players over the course of eight league nights. I had tools. I’ve taken statistics classes.

Could I say what a “most improved” pinball player looks like? Well, I can give a rough idea. A player’s improving if their rankings increase over the season. The most-improved person would show the biggest improvement. This definition might go awry; maybe there’s some important factor I overlooked. But it was a place to start looking.

So here’s the first problem. It’s the plot of my own data, my league scores over the season. Yes, league night 2 is dismal. I’d had to miss the night and so got the lowest score possible.

Is this getting better? Or worse? The obvious thing to do is to look for a curve that goes through these points. Then look at what that curve is doing. The thing is, it’s always possible to draw a curve through a bunch of data points. As long as there’s not something crazy like there’s four data points for the same league night. As long as there’s one data point for each measurement you can always connect those points to some curve. Worse, you can always fit more than one curve through those points. We need to think harder.

Here’s the thing about pinball league night results. Or any other data that comes from the real world. It’s got noise in it. There’s some amount of it that’s just random. We don’t need to look for a curve that matches every data point. Or any data point particularly. What if the actual data is “some easy-to-understand curve, plus some random noise”?

It’s a good thought. It’s a dangerous thought. You need to have an idea of what the “real” curve should be. There’s infinitely many possibilities. You can bias your answer by choosing what curve you think the data ought to represent. Or by not thinking before you make a choice. As ever, the hard part is not in doing a calculation. It’s choosing what calculation to do.

That said there’s a couple safe bets. One of them is straight lines. Why? … Well, they’re easy to work with. But we have deeper reasons. Lots of stuff, when it changes, looks like it’s changing in a straight line. Take any curve that hasn’t got a corner or a jump or a break in it. There’s a straight line that looks close enough to it. Maybe not for long, but at least for some stretch. In the absence of a better idea of what ought to be right, a line is at least a starting point. You might learn something even if a line doesn’t fit well, and get ideas for why to look at particular other shapes.

So there’s good, steady mathematics business to be found in doing “linear regression”. That is, find the line that best fits a set of data points. What do we mean by “best fits”?

The mathematical community has an answer. I agree with it, surely to the comfort of the mathematical community. Here’s the premise. You have a bunch of data points, with an independent variable ‘x’ and a dependent variable ‘y’. So the data points are a bunch of points, $\left(x_j, y_j\right)$ for a couple values of j. You want the line that “best” matches that. Fine. In my pinball league case here, j is the whole numbers from 1 to 8. $x_j$ is … just j again. All right, as happens, this is more mechanism than we need for this problem. But there’s problems where it would be useful anyway. And for $y_j$, well, here:

| $j$ | $y_j$ |
|---|---|
| 1 | 467 |
| 2 | 420 |
| 3 | 472 |
| 4 | 473 |
| 5 | 472 |
| 6 | 455 |
| 7 | 479 |
| 8 | 462 |

For the linear regression, propose a line described by the equation $y = m\cdot x + b$. No idea what ‘m’ and ‘b’ are just yet. But. Calculate for each of the $x_j$ values what the projection would be, that is, what $m\cdot x_j + b$ is. How far are those from the actual $y_j$ data?

Are there choices for ‘m’ and ‘b’ that make the difference smaller? It’s easy to convince yourself there are. Suppose we started out with ‘m’ equal to 0 and ‘b’ equal to 472. That’s an okay fit. Suppose we started out with ‘m’ equal to 100,000,000 and ‘b’ equal to -2,038. That’s a crazy bad fit. So there must be some ‘m’ and ‘b’ that make for better fits.
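To put numbers on “okay fit” and “crazy bad fit”, here’s a minimal sketch in Python. It assumes we measure misfit the usual least-squares way, by adding up the squares of the differences between the line and the data. The function name `sum_squared_error` is made up for this illustration.

```python
# League-night totals listed earlier: night number and score.
nights = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [467, 420, 472, 473, 472, 455, 479, 462]


def sum_squared_error(m, b, xs, ys):
    """Add up (m*x + b - y)^2 over every data point.

    Assumption: squared differences are the measure of misfit,
    which is the conventional least-squares choice.
    """
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))


okay_fit = sum_squared_error(0, 472, nights, scores)
crazy_fit = sum_squared_error(100_000_000, -2_038, nights, scores)

print(f"m = 0, b = 472:              {okay_fit}")
print(f"m = 100,000,000, b = -2,038: {crazy_fit:.3e}")
```

The smaller the total, the better the line hugs the data. The second line’s total is astronomically larger than the first’s, which is all that “crazy bad” needs to mean.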

Is there a best fit? If you don’t think much about mathematics the answer is obvious: of course there’s a best fit. If there’s some poor, some decent, some good fits there must be a best. If you’re a bit better-learned and have thought more about mathematics you might grow suspicious. That term ‘best’ is dangerous. Maybe there’s several fits that are all different but equally good. Maybe there’s an endless series of ever-better fits but no one best. (If you’re not clear how this could work, ponder: what’s the largest negative real number?)

Good suspicions. If you learn a bit more mathematics you learn the calculus of variations. This is the study of how small changes in one quantity change something that depends on it; and it’s all about finding the maxima or minima of stuff. And that tells us that there is, indeed, a best choice for ‘m’ and ‘b’.

(Here I’m going to hedge. I’ve learned a bit more mathematics than that. I don’t think there’s some freaky set of data that will turn up multiple best-fit curves. But my gut won’t let me just declare that. There’s all kinds of crazy, intuition-busting stuff out there. But if there exists some data set that breaks linear regression you aren’t going to run into it by accident.)

So. How to find the best ‘m’ and ‘b’ for this? You’ve got choices. You can open up DuckDuckGo and search for ‘matlab linear regression’ and follow the instructions. Or ‘excel linear regression’, if you have an easier time entering data into spreadsheets. If you’re on the Mac, maybe ‘apple numbers linear regression’. Follow the directions on the second or third link returned. Oh, you can do the calculation yourself. It’s not hard. It’s just tedious. It’s a lot of multiplication and addition and you know what? We’ve already built tools that know how to do this. Use them. Not if your homework assignment is to do this by hand, but for stuff you care about, yes. (In Octave, an open-source clone of Matlab, you can do it by an admirably slick formula that might even be memorizable.)
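If you’d like to see the tedious multiplication and addition spelled out once, here’s a sketch in plain Python. It uses the textbook closed-form least-squares formulas for the slope and intercept. It’s not the Octave one-liner, and the helper name `fit_line` is just something I picked for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The best-fit line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# League-night totals listed earlier: night number and score.
nights = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [467, 420, 472, 473, 472, 455, 479, 462]

m, b = fit_line(nights, scores)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
```

Run on my eight league nights this should give a slope near 2.48 and an intercept a little above 451.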

If you suspect that some shape other than a line is best, okay. Then you’ll want to look up and understand the formulas for these linear regression coefficients. That’ll guide you to finding a best-fit for these other shapes. Or you can do a quick, dirty hack. Like, if you think it should be an exponential curve, then try fitting a line to x and the logarithm of y. And then don’t listen to those doubts about whether this would be the best-fit exponential curve. It’s a calculation, it’s done, isn’t that enough?
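Here’s roughly what that quick, dirty hack looks like in Python. The data are hypothetical, invented so that they lie exactly on $y = 3\cdot e^{0.5 x}$; the names `fit_line`, `rate`, and `scale` are my own. Fitting a line to x and the logarithm of y recovers the exponential’s two constants.

```python
from math import exp, log


def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Hypothetical data, made up for illustration: exactly y = 3 * e^(0.5 x).
xs = [0, 1, 2, 3, 4, 5]
ys = [3.0 * exp(0.5 * x) for x in xs]

# If y = scale * e^(rate * x), then log(y) = log(scale) + rate * x,
# which is a straight line in x.
slope, intercept = fit_line(xs, [log(y) for y in ys])
rate = slope
scale = exp(intercept)
print(f"y is roughly {scale:.3f} * exp({rate:.3f} * x)")
```

The doubts are fair ones. This minimizes the squared error in the logarithm of y, not in y itself, so with noisy data the answer can differ from the true best-fit exponential. It also needs every y to be positive.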

Back to lines, back to my data. I’ll spare you the calculations and show you the results.

Done. For me, this season, I ended up with a slope ‘m’ of about 2.48 and a ‘b’ of about 451.4. That is, the slightly diagonal black line here. The red circles are what my scores would have been if my performance exactly matched the line.

That seems like a claim that I’m improving over the season. Maybe not a compelling case. That missed night certainly dragged me down. But everybody had some outlier bad night, surely. Why not find the line that best fits everyone’s season, and declare the most-improved person to be the one with the largest positive slope?

## Who’s The Most Improved Pinball Player?

My love just completed a season as head of a competitive pinball league. People find this an enchanting fact. People find competitive pinball at all enchanting. Many didn’t know pinball was still around, much less big enough to have regular competitions.

Pinball’s in great shape compared to, say, the early 2000s. There’s one major manufacturer. There’s a couple of small manufacturers who are well-organized enough to make a string of games without (yet) collapsing from not knowing how to finance game-building. Many games go right to private collections. But the “barcade” model of a hipster bar with a bunch of pinball machines and, often, video games is working quite well right now. We’re fortunate to live in Michigan. All the major cities in the lower part of the state have pretty good venues and leagues in or near them. We’re especially fortunate to live in Lansing, so that most of these spots are within an hour’s drive, and all of them are within two hours’ drive.

Ah, but how do they work? Many ways, but there are a couple of popular ones. My love’s league uses a scheme that surely has a name. In this scheme everybody plays their own turn on a set of games. Then they get ranked for each game. So the person who puts up the highest score on the game Junkyard earns 100 league points. The person who puts up the second-highest score on Junkyard earns 99 league points. The person with the third-highest score on Junkyard earns 98 league points. And so on, like this. If 20 people showed up for the day, then the poor person who bottoms out earns a mere 81 league points for the game.
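The ranking scheme is simple enough to sketch in a few lines of Python. The player names and scores here are invented, and the function name `league_points` is mine. I haven’t said how the league breaks ties, so this sketch just assumes nobody ties.

```python
def league_points(game_scores, top_points=100):
    """Turn one game's raw scores into league points.

    Highest score earns top_points, the next earns one fewer, and so on.
    Assumption: no two players have the same score.
    """
    ranked = sorted(game_scores, key=game_scores.get, reverse=True)
    return {player: top_points - place for place, player in enumerate(ranked)}


# Hypothetical Junkyard scores for three made-up players.
junkyard = {"Ann": 12_000_000, "Bo": 9_500_000, "Cy": 4_200_000}
points = league_points(junkyard)
print(points)

# Twenty made-up players: the one who bottoms out should earn 81.
twenty = {f"player{i}": 1_000 * i for i in range(1, 21)}
lowest = min(league_points(twenty).values())
print(lowest)
```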

This is a relative ranking, yes. I don’t know any competitive-pinball scheme that uses more than one game that doesn’t rank players relative to each other. I’m not sure how an alternative could work. Different games have different scoring schemes. Some games try to dazzle with blazingly high numbers. Some hoard their points as if giving them away cost them anything. A score of 50 million points? If you had that on Attack From Mars you would earn sympathetic hugs and the promise that life will not always be like that. (I’m not sure it’s possible to get a score that low without tilting your game away.) 50 million points on Lord of the Rings would earn a bunch of nods that yeah, that’s doing respectably, but there’s other people yet to play. 50 million points on Scared Stiff would earn applause for the best game anyone had seen all year. 50 million points on The Wizard of Oz would get you named the Lord Mayor of Pinball, your every whim to be rapidly done.

And each individual manifestation of a table is different. It’s part of the fun of pinball. Each game is a real, physical thing, with its own idiosyncrasies. The flippers are a little different in strength. The rubber bands that guard most things are a little harder or softer. The table is a little more or less worn. The sensors are a little more or less sensitive. The tilt detector a little more forgiving, or a little more brutal. Really the least unfair way to rate play is comparing people to each other on a particular table played at approximately the same time.

It’s not perfectly fair. How could any real thing be? It’s maddening to put up the best game of your life on some table, and come in the middle of the pack because everybody else was having great games too. It’s some compensation that there’ll be times you have a mediocre game but everybody else has a lousy one so you’re third-place for the night.

Back to league. Players earn these points for every game played. So whoever has the highest score of all on, say, Attack From Mars gets 100 league points for that regardless of whatever they did on Junkyard. Whoever has the best score on Iron Maiden (a game so new we haven’t actually played it during league yet, and that somehow hasn’t got an entry on the Internet Pinball Database; give it time) gets their 100 points. And so on. A player’s standings for the night are based on all the league points earned on all the tables played. For us that’s usually five games. Five or six games seems about standard; that’s enough time playing and hanging out to feel worthwhile without seeming too long.

So each league night all the players earn between (about) 420 and 500 points. We have eight league nights. Add the scores up over those league nights and there we go. (Well, we drop the lowest nightly total for each player. This lets them miss a night for some responsibility, like work or travel or recovering from sickness or something, without penalizing them.)
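As a small sketch of the season scoring, here it is in Python applied to my own eight nightly totals. The function name `season_total` is made up for illustration.

```python
def season_total(nightly_totals):
    """Add up a player's nightly totals, dropping the single lowest night."""
    return sum(nightly_totals) - min(nightly_totals)


# My own eight nightly totals; night 2 is the one I missed.
scores = [467, 420, 472, 473, 472, 455, 479, 462]
total = season_total(scores)
print(total)
```

Subtracting the minimum drops exactly one night, even if a player’s two worst nights happen to have the same total.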

As we got to the end of the season my love asked: is it possible to figure out which player showed the best improvement over time?

Well. I had everybody’s scores from every night played. And I’ve taken multiple classes in statistics. Why would I not be able to?

## Reading the Comics, April 25, 2018: Coronet Blue Edition

You know what? Sometimes there just isn’t any kind of theme for the week’s strips. I can use an arbitrary name.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 21st of April, 2018 would have gone in last week if I weren’t preoccupied on Saturday. The joke is aimed at freshman calculus students and then intro Real Analysis students. The talk about things being “arbitrarily small” turns up a lot in these courses. Why? Well, in them we usually want to show that one thing equals another. But it’s hard to do that. What we can show is some estimate of how different the first thing can be from the second. And if you can show that that difference can be made small enough by calculating it correctly, great. You’ve shown the two things are equal.

Delta and epsilon turn up in these a lot. In the generic proof of this you say you want to show the difference between the thing you can calculate and the thing you want is smaller than epsilon. So you have the thing you can calculate parameterized by delta. Then your problem becomes showing that if delta is small enough, the difference between what you can do and what you want is smaller than epsilon. This is why it’s an appropriately-formed joke to show someone squeezed by a delta and an epsilon. These are the lower-case delta and epsilon, which is why it’s not a triangle on the left there.
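For reference, here is the standard textbook form of the squeeze, written for the limit of a function. The symbols $f$, $a$, and $L$ are the generic ones from a calculus course, not anything in the strip. To say $f(x)$ approaches $L$ as $x$ approaches $a$ means:

$$\text{for every } \epsilon > 0 \text{ there is a } \delta > 0 \text{ such that } 0 < \left|x - a\right| < \delta \text{ implies } \left|f(x) - L\right| < \epsilon$$

You hand me the epsilon. My job is to find a delta that works for it.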

For example, suppose you want to know how long the perimeter of an ellipse is. But all you can calculate is the perimeter of a polygon. I would expect to make a proof of it look like this. Give me an epsilon that’s how much error you’ll tolerate between the polygon’s perimeter and the ellipse’s perimeter. I would then try to find, for epsilon, a corresponding delta. And that if the edges of a polygon are never farther than delta from a point on the ellipse, then the perimeter of the polygon and that of the ellipse are less than epsilon away from each other. And that’s Calculus and Real Analysis.

John Zakour and Scott Roberts’s Maria’s Day for the 22nd is the anthropomorphic numerals joke for this week. I’m curious whether the 1 had a serif that could be wrestled or whether the whole number had to be flopped over, as though it were a ruler or a fat noodle.

Anthony Blades’s Bewley for the 23rd offers advice for what to do if you’ve not got your homework. This strip’s already been run, and mentioned here. I might drop this from my reading if it turns out the strip is done and I’ve exhausted all the topics it inspires.

Dave Whamond’s Reality Check for the 23rd is designed for the doors of mathematics teachers everywhere. It does incidentally express one of those truths you barely notice: that statisticians and mathematicians don’t seem to be quite in the same field. They’ve got a lot of common interest, certainly. But they’re often separate departments in a college or university. When they do share a department it’s named the Department of Mathematics and Statistics, itself an acknowledgement that they’re not quite the same thing. (Also it seems to me it’s always Mathematics-and-Statistics. If there’s a Department of Statistics-and-Mathematics somewhere I don’t know of it and would be curious.) This has to reflect historical influence. Statistics, for all that it uses the language of mathematics and that logical rigor and ideas about proofs and all, comes from a very practical, applied, even bureaucratic source. It grew out of asking questions about the populations of nations and the reliable manufacture of products. Mathematics, even the mathematics that is about real-world problems, is different. A mathematician might specialize in the equations that describe fluid flows, for example. But it could plausibly be because they have interesting and strange analytical properties. It’d be only incidental that they might also say something enlightening about why the plumbing is stopped up.

Neal Rubin and Rod Whigham’s Gil Thorp for the 24th seems to be setting out the premise for the summer storyline. It’s sabermetrics. Or at least the idea that sports performance can be quantified, measured, and improved. The principle behind that is sound enough. The trick is figuring out what are the right things to measure, and what can be done to improve them. Also another trick is don’t be a high school student trying to lecture classmates about geometry. Seriously. They are not going to thank you. Even if you turn out to be right. I’m not sure how you would have much control of the angle your ball comes off the bat, but that’s probably my inexperience. I’ve learned a lot about how to control a pinball hitting the flipper. I’m not sure I could quantify any of it, but I admit I haven’t made a serious attempt to try either. Also, when you start doing baseball statistics you run a roughly 45% chance of falling into a deep well of calculation and acronyms of up to twelve letters from which you never emerge. Be careful. (This is a new comic strip tag.)

Randy Glasbergen’s Glasbergen Cartoons rerun for the 25th feels a little like a slight against me. Well, no matter. Use the things that get you in the mood you need to do well. (Not a new comic strip tag because I’m filing it under ‘Randy Glasbergen’ which I guess I used before?)

## Reading the Comics, May 2, 2017: Puzzle Week

If there was a theme this week, it was puzzles. So many strips had little puzzles to work out. You’ll see. Thank you.

Bill Amend’s FoxTrot for the 30th of April tries to address my loss of Jumble panels. Thank you, whoever at Comic Strip Master Command passed along word of my troubles. I won’t spoil your fun. As sometimes happens with a Jumble you can work out the joke punchline without doing any of the earlier ones. 64 in binary would be written 1000000. And from this you know what fits in all the circles of the unscrambled numbers. This reduces a lot of the scrambling you have to do: just test whether 341 or 431 is a prime number. Check whether 8802, 8208, or 2808 is divisible by 117. The integer cubed you just have to keep trying possibilities. But only one combination is the cube of an integer. The factorial of 12, just, ugh. At least the circles let you know you’ve done your calculations right.
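If you’d rather let a computer do the checking, here’s a sketch in Python of the tests described above. Fair warning: running it spoils the puzzle. The helper `is_prime` is my own plain trial-division routine, nothing clever.

```python
from math import factorial


def is_prime(n):
    """Trial division: fine for three-digit numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


binary_64 = format(64, "b")  # 64 written in base two
prime_candidates = {n: is_prime(n) for n in (341, 431)}
multiples_of_117 = [n for n in (8802, 8208, 2808) if n % 117 == 0]
twelve_factorial = factorial(12)

print(binary_64)
print(prime_candidates)
print(multiples_of_117)
print(twelve_factorial)
```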

Steve McGarry’s activity feature Kidtown for the 30th plays with numbers some. And a puzzle that’ll let you check how well you can recognize multiples of four that are somewhere near one another. You can use diagonals too; that’s important to remember.

Mac King and Bill King’s Magic in a Minute feature for the 30th is also a celebration of numerals. Enjoy the brain teaser about why the encoding makes sense. I don’t believe the hype about NASA engineers needing days to solve a puzzle kids got in minutes. But if it’s believable, is it really hype?

Marty Links’s Emmy Lou from the 29th of October, 1963 was rerun the 2nd of May. It’s a reminder that mathematics teachers of the early 60s also needed something to tape to their doors.

Mel Henze’s Gentle Creatures rerun for the 2nd of May is another example of the conflating of “can do arithmetic” with “intelligence”.

Mark Litzler’s Joe Vanilla for the 2nd name-drops the Null Hypothesis. I’m not sure what Litzler is going for exactly. The Null Hypothesis, though, comes to us from statistics and from inference testing. It turns up everywhere when we sample stuff. It turns up in medicine, in manufacturing, in psychology, in economics. Everywhere we might see something too complicated to run the sorts of unambiguous and highly repeatable tests that physics and chemistry can do — things that are about immediately practical questions — we get to testing inferences. What we want to know is, is this data set something that could plausibly happen by chance? Or is it too far out of the ordinary to be mere luck? The Null Hypothesis is the explanation that nothing’s going on. If your sample is weird in some way, well, everything is weird. What’s special about your sample? You hope to find data that will let you reject the Null Hypothesis, showing that the data you have is so extreme it just can’t plausibly be chance. Or to conclude that you fail to reject the Null Hypothesis, showing that the data is not so extreme that it couldn’t be chance. We don’t accept the Null Hypothesis. We just allow that more data might come in sometime later.

I don’t know what Litzler is going for with this. I feel like I’m missing a reference and I’ll defer to a finance blogger’s Reading the Comics post.

Keith Tutt and Daniel Saunders’s Lard’s World Peace Tips for the 3rd is another in the string of jokes using arithmetic as source of indisputably true facts. And once again it’s “2 + 2 = 5”. Somehow one plus one never rates in this use.

Aaron Johnson’s W T Duck rerun for the 3rd is the Venn Diagram joke for this week. It’s got some punch to it, too.

Jef Mallett’s Frazz for the 5th took me some time to puzzle out. I’ll allow it.

## Did This German Retiree Solve A Decades-Old Conjecture?

And then this came across my desktop (my iPad’s too old to work with the Twitter client anymore):

The underlying news is that one Thomas Royen, a Frankfurt (Germany)-area retiree, seems to have proven the Gaussian Correlation Inequality. It wasn’t a conjecture that sounded familiar to me, but the sidebar (on the Quanta Magazine article to which I’ve linked there) explains it and reminds me that I had heard about it somewhere or other. It’s about random variables. That is, things that can take on one of a set of different values. If you think of them as the measurements of something that’s basically consistent but never homogenous you’re doing well.

Suppose you have two random variables, two things that can be measured. There’s a probability the first variable is in a particular range, greater than some minimum and less than some maximum. There’s a probability the second variable is in some other particular range. What’s the probability that both variables are simultaneously in these particular ranges? This is easy to answer for some specific cases. For example if the two variables have nothing to do with each other then everybody who’s taken a probability class knows. The probability of both variables being in their ranges is the probability the first is in its range times the probability the second is in its range. The challenge is telling whether it’s always true, whether the variables are related to each other or not. Or telling when it’s true if it isn’t always.
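In symbols, and hedging that this is my reading rather than anything quoted from the article, the easy case of independent variables $X_1$ and $X_2$ with ranges $(a_1, b_1)$ and $(a_2, b_2)$ is:

$$P\left(a_1 < X_1 < b_1 \text{ and } a_2 < X_2 < b_2\right) = P\left(a_1 < X_1 < b_1\right)\cdot P\left(a_2 < X_2 < b_2\right)$$

The Gaussian Correlation Inequality, as I understand the usual statement, trades the equals sign for an inequality. Take $X$ to be a bundle of jointly Gaussian variables with mean zero, and take $A$ and $B$ to be convex regions symmetric about the origin. Then:

$$P\left(X \in A \cap B\right) \ge P\left(X \in A\right)\cdot P\left(X \in B\right)$$

So being in both regions at once is at least as likely as it would be if the two events had nothing to do with each other.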

The article (and pop reporting on this) is largely about how the proof has gone unnoticed. There’s some interesting social dynamics going on there. Royen published in an obscure-for-the-field journal, one he was an editor for; this makes it look dodgy, at least. And the conjecture’s drawn “proofs” that were just wrong; this discourages people from looking for obscurely-published proofs.

Some of the articles I’ve seen on this make Royen out to be an amateur. And I suppose there is a bias against amateurs in professional mathematics. There is in every field. It’s true that mathematics doesn’t require professional training the way that, say, putting out oil rig fires does. Anyone capable of thinking through an argument rigorously is capable of doing important original work. But there are a lot of tricks to thinking an argument through that are important, and I’d bet on the person with training.

In any case, Royen isn’t a newcomer to the field who just heard of an interesting puzzle. He’d been a statistician, first for a pharmaceutical company and then for a technical university. He may not have a position or tie to a mathematics department or a research organization but he’s someone who would know roughly what to do.

So did he do it? I don’t know; I’m not versed enough in the field to say. It’ll be interesting to see if he has.

## Reading the Comics, March 4, 2017: Frazz, Christmas Trees, and Weddings Edition

It was another of those curious weeks when Comic Strip Master Command didn’t send quite enough comics my way. Among those they did send were a couple of strips in pairs. I can work with that.

Samson’s Dark Side Of The Horse for the 26th is the Roman Numerals joke for this essay. I apologize to Horace for being so late in writing about Roman Numerals but I did have to wait for Cecil Adams to publish first.

In Jef Mallett’s Frazz for the 26th Caulfield ponders what we know about Pythagoras. It’s hard to say much about the historical figure: he built a cult that sounds outright daft around himself. But it’s hard to say how much of their craziness was actually their craziness, how much was just that any ancient society had a lot of what seems nutty to us, and how much was jokes (or deliberate slander) directed against some weirdos. What does seem certain is that Pythagoras’s followers attributed many of their discoveries to him. And what’s certain is that the Pythagorean Theorem was known, at least as a thing that could be used to measure things, long before Pythagoras was on the scene. I’m not sure if it was proved as a theorem or whether it was just known that making triangles with the right relative lengths meant you had a right triangle.

Greg Evans’s Luann Againn for the 28th of February — reprinting the strip from the same day in 1989 — uses a bit of arithmetic as generic homework. It’s an interesting change of pace that the mathematics homework is what keeps one from sleep. I don’t blame Luann or Puddles for not being very interested in this, though. Those sorts of complicated-fraction-manipulation problems, at least when I was in middle school, were always slogs of shuffling stuff around. They rarely got to anything we’d like to know.

Jef Mallett’s Frazz for the 1st of March is one of those little revelations that statistics can give one. Myself, I was always haunted by the line in Carl Sagan’s Cosmos about how, in the future, with the Sun ageing and (presumably) swelling in size and heat, the Earth would see one last perfect day. That there would most likely be quite fine days after that didn’t matter, and that different people might disagree on what made a day perfect didn’t matter. Setting out the idea of a “perfect day” and realizing there would someday be a last gave me chills. It still does.

Richard Thompson’s Poor Richard’s Almanac for the 1st and the 2nd of March have appeared here before. But I like the strip so I’ll reuse them too. They’re from the strip’s guide to types of Christmas trees. The Cubist Fir is described as “so asymmetrical it no longer inhabits Euclidean space”. Properly neither do we, but we can’t tell by eye the difference between our space and a Euclidean space. “Non-Euclidean” has picked up connotations of being so bizarre or even horrifying that we can’t hope to understand it. In practice, it means we have to go a little slower and think about, like, what would it look like if we drew a triangle on a ball instead of a sheet of paper. The Platonic Fir, in the 2nd of March strip, looks like a geometry diagram and I doubt that’s coincidental. It’s very hard to avoid thoughts of Platonic Ideals when one does any mathematics with a diagram. We know our drawings aren’t very good triangles or squares or circles especially. And three-dimensional shapes are worse, as see every ellipsoid ever done on a chalkboard. But we know what we mean by them. And then we can get into a good argument about what we mean by saying “this mathematical construct exists”.

Mark Litzler’s Joe Vanilla for the 3rd uses a chalkboard full of mathematics to represent the deep thinking behind a silly little thing. I can’t make any of the symbols out to mean anything specific, but I do like the way it looks. It’s quite well-done in looking like the shorthand that, especially, physicists would use while roughing out a problem. That there are subscripts with forms like “12” and “22” with a bar over them reinforces that. I would, knowing nothing else, expect this to represent some interaction between particles 1 and 2, and 2 with itself, and that the bar means some kind of complement. This doesn’t mean much to me, but with luck, it means enough to the scientist working it out that it could be turned into a coherent paper.

Bill Holbrook’s On The Fastrack is this week about the wedding of the accounting-minded Fi. And she’s having last-minute doubts, which is why the strip of the 3rd brings in irrational and anthropomorphized numerals. π gets called in to serve as emblematic of the irrational numbers. Can’t fault that. I think the only more famously irrational number is the square root of two, and π anthropomorphizes more easily. Well, you can draw an established character’s face onto π. The square root of 2 is, necessarily, at least two disconnected symbols and you don’t want to raise distracting questions about whether the root sign or the 2 gets the face.

That said, it’s a lot easier to prove that the square root of 2 is irrational. Even the Pythagoreans knew it, and a bright child can follow the proof. A really bright child could create a proof of it. To prove that π is irrational is not at all easy; it took mathematicians until the 18th century. And the best proof I know of the fact does it by a roundabout method. We prove that if a number (other than zero) is rational then the tangent of that number must be irrational. Turn that around: if the tangent of a nonzero number is rational, the number itself must be irrational. And the tangent of π/4 is 1, so therefore π/4 must be irrational, so therefore π must be irrational. I know you’ll all trust me on that argument, but I wouldn’t want to sell it to a bright child.
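The square root of 2, though, I would sell to a bright child. Here is a sketch of the classic argument by contradiction. The letters $p$, $q$, and $k$ are just names for whole numbers. Suppose $\sqrt{2} = p/q$, a fraction in lowest terms. Then:

$$\begin{aligned} p^2 &= 2 q^2 && \text{so } p^2 \text{ is even, which makes } p \text{ even; write } p = 2k \\ 4 k^2 &= 2 q^2 && \text{substituting } p = 2k \\ q^2 &= 2 k^2 && \text{so } q^2 \text{ is even, which makes } q \text{ even too} \end{aligned}$$

But if $p$ and $q$ are both even the fraction wasn’t in lowest terms after all. That’s the contradiction, so no such fraction exists.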

Holbrook continues the thread on the 4th, extending the anthropomorphic-mathematics-stuff to call people variables. There’s ways that this is fair. We use a variable for a number whose value we don’t know or don’t care about. A “random variable” is one that could take on any of a set of values. We don’t know which one it does, in any particular case. But we do know, or we can find out, how likely each of the possible values is. We can use this to understand the behavior of systems even if we never actually know what any one part of it does. You see how I’m going to defend this metaphor, then, especially if we allow that what people are likely or unlikely to do will depend on context and evolve in time.

## Reading the Comics, February 23, 2017: The Week At Once Edition

For the first time in ages there aren’t enough mathematically-themed comic strips to justify my cutting the week’s roundup in two. No, I have no idea what I’m going to write about for Thursday. Let’s find out together.

Jenny Campbell’s Flo and Friends for the 19th faintly irritates me. Flo wants to make sure her granddaughter understands that just because it takes people on average 14 minutes to fall asleep doesn’t mean that anyone actually does, by listing all sorts of reasons that a person might need more than fourteen minutes to fall asleep. It makes me think of a behavior John Allen Paulos notes in Innumeracy, wherein the statistically wise points out that someone has, say, a one-in-a-hundred-million chance of being killed by a terrorist (or whatever) and is answered, “ah, but what if you’re that one?” That is, it’s a response that has the form of wisdom without the substance. I notice Flo doesn’t mention the many reasons someone might fall asleep in less than fourteen minutes.

But there is something wise in there nevertheless. For most stuff, the average is the most common value. By “the average” I mean the arithmetic mean, because that is what anyone means by “the average” unless they’re being difficult. (Mathematicians acknowledge the existence of an average called the mode, which is the most common value (or values), and that’s most common by definition.) But just because something is the most common result does not mean that it must be common. Toss a coin fairly a hundred times and it’s most likely to come up tails 50 times. But you shouldn’t be surprised if it actually turns up tails 51 or 49 or 45 times. This doesn’t make 50 a poor estimate for the average number of times something will happen. It just means that it’s not a guarantee.
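To see how uncommon the most common result can be, here’s a short Python sketch of the coin-toss numbers. It assumes a fair coin and independent tosses; the function name `prob_exact_tails` is my own.

```python
from math import comb


def prob_exact_tails(k, n=100):
    """Chance of exactly k tails in n tosses of a fair coin."""
    return comb(n, k) / 2 ** n


for k in (50, 51, 49, 45):
    print(f"exactly {k} tails: {prob_exact_tails(k):.4f}")

# Chance of landing anywhere from 45 through 55 tails.
near_fifty = sum(prob_exact_tails(k) for k in range(45, 56))
print(f"45 through 55 tails: {near_fifty:.4f}")
```

Exactly 50 tails is the single likeliest count, and it still turns up a bit less than 8 percent of the time.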

Gary Wise and Lance Aldrich’s Real Life Adventures for the 19th shows off an unusually dynamic camera angle. It’s in service for a class of problem you get in freshman calculus: find the longest pole that can fit around a corner. Oh, a box-spring mattress up a stairwell is a little different, what with box-spring mattresses being three-dimensional objects. It’s the same kind of problem. I want to say the most astounding furniture-moving event I’ve ever seen was when I moved a fold-out couch down one and a half flights of stairs single-handed. But that overlooks the caged mouse we had one winter, who moved a Chinese finger-trap full of crinkle paper up the tight curved plastic to his nest by sheer determination. The trap was far longer than could possibly be curved around the tube. We have no idea how he managed it.

J R Faulkner’s Promises, Promises for the 20th jokes that one could use Roman numerals to obscure calculations. So you could. Roman numerals are terrible things for doing arithmetic, at least past addition and subtraction. This is why accountants and mathematicians abandoned them pretty soon after learning there were alternatives.

Mark Anderson’s Andertoons for the 21st is the Mark Anderson’s Andertoons for the week. Probably anything would do for the blackboard problem, but something geometry reads very well.

Jef Mallett’s Frazz for the 21st makes some comedy out of the sort of arithmetic error we all make. It’s so easy to pair up, like, 7 and 3 make 10 and 8 and 2 make 10. It takes a moment, or experience, to realize 78 and 32 will not make 100. Forgive casual mistakes.

Bud Fisher’s Mutt and Jeff rerun for the 22nd is a similar-in-tone joke built on arithmetic errors. It’s got the form of a vaudeville-style sketch compressed way down, which is probably why the third panel could be made into a satisfying final panel too.

Bud Blake’s Tiger rerun for the 23rd just name-drops mathematics; it could be any subject. But I need some kind of picture around here, don’t I?

Mike Baldwin’s Cornered for the 23rd is the anthropomorphic numerals joke for the week.

## Reading the Comics, February 11, 2017: Trivia Edition

And now to wrap up last week’s mathematically-themed comic strips. It’s not a set that let me get into any really deep topics however hard I tried overthinking it. Maybe something will turn up for Sunday.

Mason Mastroianni, Mick Mastroianni, and Perri Hart’s B.C. for the 7th tries setting arithmetic versus celebrity trivia. It’s for the old joke about what everyone should know versus what everyone does know. One might question whether Kardashian pet eating habits are actually things everyone knows. But the joke needs some hyperbole in it to have any vitality and that’s the only available spot for it. It’s easy also to rate stuff like arithmetic as trivia since, you know, calculators. But it is worth knowing that seven squared is pretty close to 50. It comes up when you do a lot of estimates of calculations in your head. The square root of 10 is pretty near 3. The square root of 50 is near 7. The cube root of 10 is a little more than 2. The cube root of 50 a little more than three and a half. The cube root of 100 is a little more than four and a half. When you see ways to rewrite a calculation in estimates like this, suddenly, a lot of amazing tricks become possible.
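If you'd like to see how good those head-estimates are, this small sketch compares each rough figure with the computed value. The list and the helper `relative_error` are just my scaffolding for the check, nothing standard.

```python
# (description, computed value, the rough estimate quoted above)
estimates = [
    ("seven squared", 7 ** 2, 50),
    ("square root of 10", 10 ** 0.5, 3),
    ("square root of 50", 50 ** 0.5, 7),
    ("cube root of 10", 10 ** (1 / 3), 2),
    ("cube root of 50", 50 ** (1 / 3), 3.5),
    ("cube root of 100", 100 ** (1 / 3), 4.5),
]

def relative_error(exact: float, rough: float) -> float:
    """How far off the rough figure is, as a fraction of the exact value."""
    return abs(exact - rough) / exact

for name, exact, rough in estimates:
    print(f"{name}: exact {exact:.3f}, estimate {rough}, "
          f"off by {100 * relative_error(exact, rough):.1f}%")
```

Every one of them lands within a few percent, which is plenty for working out in your head whether an answer is plausible.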

Leigh Rubin’s Rubes for the 7th is a “mathematics in the real world” joke. It could be done with any mythological animals, although I suppose unicorns have the advantage of being relatively easy to draw recognizably. Mermaids would do well too. Dragons would also read well, but they’re more complicated to draw.

Mark Pett’s Mr Lowe rerun for the 8th has the kid resisting the mathematics book. Quentin’s grounds are that how can he know a dated book is still relevant. There’s truth to Quentin’s excuse. A mathematical truth may be universal. Whether we find it interesting is a matter of culture and even fashion. There are many ways to present any fact, and the question of why we want to know this fact has as many potential answers as it has people pondering the question.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 8th is a paean to one of the joys of numbers. There is something wonderful in counting, in measuring, in tracking. I suspect it’s nearly universal. We see it reflected in people passing around, say, the number of rivets used in the Chrysler Building or how long a person’s nervous system would reach if stretched out into a line or ever-more-fanciful measures of stuff. Is it properly mathematics? It’s delightful, isn’t that enough?

Scott Hilburn’s The Argyle Sweater for the 10th is a Fibonacci Sequence joke. That’s a good one for taping to the walls of a mathematics teacher’s office.

Bill Rechin’s Crock rerun for the 11th is a name-drop of mathematics. Really anybody’s homework would be sufficiently boring for the joke. But I suppose mathematics adds the connotation that whatever you’re working on hasn’t got a human story behind it, the way English or History might, and that it hasn’t got the potential to eat, explode, or knock a steel ball into you the way Biology, Chemistry, or Physics have. Fair enough.

## The End 2016 Mathematics A To Z: Hat

I was hoping to pick a term that was a quick and easy one to dash off. I learned better.

## Hat.

This is a simple one. It’s about notation. Notation is never simple. But it’s important. Good symbols organize our thoughts. They tell us what are the common ordinary bits of our problem, and what are the unique bits we need to pay attention to here. We like them to be easy to write. Easy to type is nice, too, but in my experience mathematicians work by hand first. Typing is tidying-up, and we accept that being sluggish. Unique would be nice, so that anyone knows what kind of work we’re doing just by looking at the symbols. I don’t think anything manages that. But at least some notation has alternate uses rare enough we don’t have to worry about it.

“Hat” has two major uses I know of. And we call it “hat”, although our friends in the languages department would point out this is a caret. The little pointy corner that goes above a letter, like so: $\hat{i}$. $\hat{x}$. $\hat{e}$. It’s not something we see on its own. It’s always above some variable.

The first use of the hat like this comes up in statistics. It’s a way of marking that something is an estimate. By “estimate” here we mean what anyone might mean by “estimate”. Statistics is full of uses for this sort of thing. For example, we often want to know what the arithmetic mean of some quantity is. The average height of people. The average temperature for the 18th of November. The average weight of a loaf of bread. We have some letter that we use to mean “the value this has for any one example”. By some letter we mean ‘x’, maybe sometimes ‘y’. We can use any and maybe the problem begs for something. But it’s ‘x’, maybe sometimes ‘y’.

For the arithmetic mean of ‘x’ for the whole population we write the letter with a horizontal bar over it. (The arithmetic mean is the thing everybody in the world except mathematicians calls the average. Also, it’s what mathematicians mean when they say the average. We just get fussy because we know if we don’t say “arithmetic mean” someone will come along and point out there are other averages.) That arithmetic mean is $\bar{x}$. Maybe $\bar{y}$ if we must. Must be some number. But what is it? If we can’t measure whatever it is for every single example of our group, the whole population, then we have to make an estimate. We do that by taking a sample, ideally one that isn’t biased in some way. (This is so hard to do, or at least be sure you’ve done.) We can find the mean for this sample, though, because that’s how we picked it. The mean of this sample is probably close to the mean of the whole population. It’s an estimate. So we can write $\hat{x}$ and understand. This is not $\bar{x}$ but it does give us a good idea what $\bar{x}$ should be.
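Here is a rough sketch of that idea, with everything about the data invented for the illustration: a made-up "population" of 100,000 heights, and a random sample of 500 of them. In the notation above, `population_mean` plays the part of $\bar{x}$ and `estimate` plays the part of $\hat{x}$.

```python
import random
from statistics import mean

random.seed(2016)  # fixed seed so the sketch gives the same numbers every run

# Hypothetical population: 100,000 invented heights, in centimeters.
population = [random.gauss(170, 10) for _ in range(100_000)]
population_mean = mean(population)  # usually unknowable in real life

# What we can actually afford to measure: a simple random sample.
sample = random.sample(population, 500)
estimate = mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample estimate: {estimate:.2f}")
```

The sample here is drawn uniformly at random, which is the easy case. Real samples are where the bias sneaks in.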

(We don’t always use the caret ^ for this. Sometimes we use a tilde ~ instead. ~ has the advantage that it’s often used for “approximately equal to”. So it will carry that suggestion over to its new context.)

The other major use of the hat comes in vectors. Mathematics types do a lot of work with vectors. It turns out a lot of mathematical structures work the way that pointing and moving in directions in ordinary space do. That’s why back when I talked about what vectors were I didn’t say “they’re like arrows pointing some length in some direction”. Arrows pointing some length in some direction are vectors, yes, but there are many more things that are vectors. Thinking of moving in particular directions gives us good intuition for how to work with vectors, and for stuff that turns out to be vectors. But they’re not everything.

If we need to highlight that something is a vector we put a little arrow over its name. $\vec{x}$. $\vec{e}$. That sort of thing. (Or if we’re typing, we might put the letter in boldface: **x**. This was good back before computers let us put in mathematics without giving the typesetters hazard pay.) We don’t always do that. By the time we do a lot of stuff with vectors we don’t always need the reminder. But we will include it if we need a warning. Like if we want to have both $\vec{r}$ telling us where something is and to use a plain old $r$ to tell us how big the vector $\vec{r}$ is. That turns up a lot in physics problems.

Every vector has some length. Even vectors that don’t seem to have anything to do with distances do. We can make a perfectly good vector out of “polynomials defined for the domain of numbers between -2 and +2”. Those polynomials are vectors, and they have lengths.
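What "length" means for a polynomial depends on which norm you pick, and I haven't said which. As one common choice, and it is only my assumption for this sketch, take the length of $p$ to be $\sqrt{\int_{-2}^{2} p(x)^2 \, dx}$. Polynomials are stored as coefficient lists, lowest power first; all the function names are made up for the example.

```python
from fractions import Fraction
from math import sqrt

def multiply(p, q):
    """Multiply two polynomials given as coefficient lists, lowest power first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out

def integrate(p, lo, hi):
    """Exact integral of a polynomial over the interval [lo, hi]."""
    def antiderivative(x):
        return sum(c * Fraction(x) ** (k + 1) / (k + 1) for k, c in enumerate(p))
    return antiderivative(hi) - antiderivative(lo)

def length(p, lo=-2, hi=2):
    """Length of p under the assumed norm: square root of the integral of p squared."""
    return sqrt(integrate(multiply(p, p), lo, hi))

print(length([1]))     # the constant polynomial 1
print(length([0, 1]))  # the polynomial x
```

Under this norm the constant polynomial 1 has length 2, and the polynomial $x$ has length $\sqrt{16/3}$. A different norm would give different lengths, but each polynomial would still have one.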

There’s a special class of vectors, ones that we really like in mathematics. They’re the “unit vectors”. Those are vectors with a length of 1. And we are always glad to see them. They’re usually good choices for a basis. Basis vectors are useful things. They give us, in a way, a representative slate of cases to solve. Then we can use that representative slate to give us whatever our specific problem’s solution is. So mathematicians learn to look instinctively to them. We want basis vectors, and we really like them to have a length of 1. Even if we aren’t putting the arrow over our variables we’ll put the caret over the unit vectors.

There are some unit vectors we use all the time. One is just the directions in space. That’s $\hat{e}_1$ and $\hat{e}_2$ and for that matter $\hat{e}_3$ and I bet you have an idea what the next one in the set might be. You might be right. These are basis vectors for normal, Euclidean space, which is why they’re labelled “e”. We have as many of them as we have dimensions of space. We have as many dimensions of space as we need for whatever problem we’re working on. If we need a basis vector and aren’t sure which one, we summon one of the letters used as indices all the time. $\hat{e}_i$, say, or $\hat{e}_j$. If we have an n-dimensional space, then we have unit vectors all the way up to $\hat{e}_n$.
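A small sketch of what those $\hat{e}_i$ look like as lists of numbers, and of the "representative slate" idea: any vector in the space is a sum of the basis vectors, each multiplied by a coefficient. The function names are mine, chosen for the illustration.

```python
from math import sqrt

def standard_basis(n):
    """The n unit vectors e_1 .. e_n of n-dimensional Euclidean space."""
    return [[1.0 if row == col else 0.0 for col in range(n)] for row in range(n)]

def length(v):
    """Ordinary Euclidean length of a vector."""
    return sqrt(sum(c * c for c in v))

def combine(coefficients, basis):
    """Add up the basis vectors, each scaled by its coefficient."""
    n = len(basis[0])
    return [sum(a * e[k] for a, e in zip(coefficients, basis)) for k in range(n)]

basis = standard_basis(3)
print([length(e) for e in basis])    # every basis vector has length 1
print(combine([4, -2, 7], basis))    # rebuilds the vector with components 4, -2, 7
```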

We also use the hat a lot if we’re writing quaternions. You remember quaternions, vaguely. They’re complex-valued numbers for people who’re bored with complex-valued numbers and want some thrills again. We build them as a quartet of numbers, each added together. Three of them are multiplied by the mysterious numbers ‘i’, ‘j’, and ‘k’. Each ‘i’, ‘j’, or ‘k’ multiplied by itself is equal to -1. But ‘i’ doesn’t equal ‘j’. Nor does ‘j’ equal ‘k’. Nor does ‘k’ equal ‘i’. And ‘i’ times ‘j’ is ‘k’, while ‘j’ times ‘i’ is minus ‘k’. That sort of thing. Easy to look up. You don’t need to know all the rules just now.
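If you would rather check the rules than look them up, here is a sketch of quaternion multiplication. I'm representing a quaternion $a + bi + cj + dk$ as the plain tuple `(a, b, c, d)`; that representation and the name `qmul` are choices for this example only.

```python
def qmul(p, q):
    """Product of two quaternions, each given as a tuple (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

print(qmul(i, i))  # (-1, 0, 0, 0): i times i is -1
print(qmul(i, j))  # (0, 0, 0, 1): i times j is k
print(qmul(j, i))  # (0, 0, 0, -1): j times i is minus k
```

The order of multiplication matters, which is the thing that most sets quaternions apart from the complex numbers.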

But we often end up writing a quaternion as a number like $4 + 2\hat{i} - 3\hat{j} + 1 \hat{k}$. OK, that’s just the one number. But we will write numbers like $a + b\hat{i} + c\hat{j} + d\hat{k}$. Here a, b, c, and d are all real numbers. This is kind of sloppy; the pieces of a quaternion aren’t in fact vectors added together. But it is hard not to look at a quaternion and see something pointing in some direction, like the first vectors we ever learn about. And there are some problems in pointing-in-a-direction vectors that quaternions handle so well. (Mostly how to rotate one direction around another axis.) So a bit of vector notation seeps in where it isn’t appropriate.

I suppose there’s some value in pointing out that the ‘i’ and ‘j’ and ‘k’ in a quaternion are fixed and set numbers. They’re unlike an ‘a’ or an ‘x’ we might see in the expression. I’m not sure anyone was thinking they were, though. Notation is a tricky thing. It’s as hard to get sensible and consistent and clear as it is to make words and grammar sensible. But the hat is a simple one. It’s good to have something like that to rely on.

## Something To Read: Galton Boards

I do need to take another light week of writing I’m afraid. There’ll be the Theorem Thursday post and all that. But today I’d like to point over to Gaurish4Math’s WordPress Blog, and a discussion of the Galton Board. I’m not familiar with it by that name, but it is a very familiar concept. You see it as Plinko boards on The Price Is Right and as a Boardwalk or amusement-park game. Set an array of pins on a vertical board and drop a ball or a round chip or something that can spin around freely on it. Where will it fall?

It’s random luck, it seems. At least it is incredibly hard to predict where, underneath all the pins, the ball will come to rest. Some of that is ignorance: we just don’t know the weight distribution of the ball, the exact way it’s dropped, the precise spacing of pins well enough to predict it all. We don’t care enough to do that. But some of it is real randomness. Ideally we make the ball bounce so many times that however well we estimated its drop, the tiny discrepancy between where the ball is and where we predict it is, and where it is going and where we predict it is going, will grow larger than the Plinko board and our prediction will be meaningless.

(I am not sure that this literally happens. It is possible, though. It seems more likely the more rows of pins there are on the board. But I don’t know how tall a board really needs to be to be a chaotic system, deterministic but unpredictable.)

But here is the wonder. We cannot predict what any ball will do. But we can predict something about what every ball will do, if we have enormously many of them. Gaurish writes some about the logic of why that is, and the theorems in probability that tell us why that should be so.
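To see that wonder without building a board, here is a toy simulation. It assumes an idealized board where at every row of pins the ball goes left or right with equal chance, independently of everything before. A real Plinko board is surely messier. The names `drop_ball` and `run_board` are invented for the sketch.

```python
import random
from collections import Counter

def drop_ball(rows, rng):
    """Return the bin a ball lands in: the number of bounces that went right."""
    return sum(rng.random() < 0.5 for _ in range(rows))

def run_board(balls, rows, seed=0):
    """Drop many balls and count how many end up in each bin."""
    rng = random.Random(seed)
    return Counter(drop_ball(rows, rng) for _ in range(balls))

counts = run_board(balls=10_000, rows=12)
for bin_index in range(13):
    # One '#' for every 50 balls in the bin: a crude sideways histogram.
    print(f"{bin_index:2d} {'#' * (counts[bin_index] // 50)}")
```

No single ball is predictable, but the pile as a whole heaps up in the middle bins and thins out toward the edges, in the bell-like shape the theorems promise.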

## Reading the Comics, July 13, 2016: Catching Up On Vacation Week Edition

I confess I spent the last week on vacation, away from home and without the time to write about the comics. And it was another of those curiously busy weeks that happens when it’s inconvenient. I’ll try to get caught up ahead of the weekend. No promises.

Art and Chip Samson’s The Born Loser for the 10th talks about the statistics of body measurements. Measuring bodies is one of the foundations of modern statistics. Adolphe Quetelet, in the mid-19th century, found a rough relationship between body mass and the square of a person’s height, used today as the base for the body mass index. Francis Galton spent much of the late 19th century developing the tools of statistics and how they might be used to understand human populations with work I will describe as “problematic” because I don’t have the time to get into how much trouble the right mind at the right idea can be.

No attempt to measure people’s health with a few simple measurements and derived quantities can be fully successful. Health is too complicated a thing for one or two or even ten quantities to describe. Measures like height-to-waist ratios and body mass indices and the like should be understood as filters, the way temperature and blood pressure are. If one or more of these measurements are in dangerous ranges there’s reason to think there’s a health problem worth investigating here. It doesn’t mean there is; it means there’s reason to think it’s worth spending resources on tests that are more expensive in time and money and energy. And similarly just because all the simple numbers are fine doesn’t mean someone is perfectly healthy. But it suggests that the person is more likely all right than not. They’re guides to setting priorities, easy to understand and requiring no training to use. They’re not a replacement for thought; no guides are.

Jeff Harris’s Shortcuts educational panel for the 10th is about zero. It’s got a mix of facts and trivia and puzzles with a few jokes on the side.

I don’t have a strong reason to discuss Ashleigh Brilliant’s Pot-Shots rerun for the 11th. It only mentions odds in a way that doesn’t open up to discussing probability. But I do like Brilliant’s “Embrace-the-Doom” tone and I want to share that when I can.

John Hambrock’s The Brilliant Mind of Edison Lee for the 13th of July riffs on the world’s leading exporter of statistics, baseball. Organized baseball has always been a statistics-keeping game. The Olympic Ball Club of Philadelphia’s 1837 rules set out what statistics to keep. I’m not sure why the game is so statistics-friendly. It must be in part that the game lends itself to representation as a series of identical events — pitcher throws ball at batter, while runners wait on up to three bases — with so many different outcomes.

Alan Schwarz’s book The Numbers Game: Baseball’s Lifelong Fascination With Statistics describes much of the sport’s statistics and record-keeping history. The things recorded have varied over time, with the list of things mostly growing. The number of statistics kept has also tended to grow. Sometimes they get dropped. Runs Batted In were first calculated in 1880, then dropped as an inherently unfair statistic to keep; leadoff hitters were necessarily cheated of chances to get someone else home. How people’s idea of what is worth measuring changes is interesting. It speaks to how we change the ways we look at the same event.

Dana Summers’s Bound And Gagged for the 13th uses the old joke about computers being abacuses and the like. I suppose it’s properly true that anything you could do on a real computer could be done on the abacus, just, with a lot more time and manual labor involved. At some point it’s not worth it, though.

Nate Fakes’s Break of Day for the 13th uses the whiteboard full of mathematics to denote intelligence. Cute birds, though. But any animal in eyeglasses looks good. Lab coats are almost as good as eyeglasses.

David L Hoyt and Jeff Knurek’s Jumble for the 13th is about one of geometry’s great applications, measuring how large the Earth is. It’s something that can be worked out through ingenuity and a bit of luck. Once you have that, some clever argument lets you work out the distance to the Moon, and its size. And that will let you work out the distance to the Sun, and its size. The Ancient Greeks had worked out all of this reasoning. But they had to make observations with the unaided eye, without good timekeeping — time and position are conjoined ideas — and without photographs or other instantly-made permanent records. So their numbers are, to our eyes, lousy. No matter. The reasoning is brilliant and deserves respect.

## A Leap Day 2016 Mathematics A To Z: Z-score

And we come to the last of the Leap Day 2016 Mathematics A To Z series! Z is a richer letter than x or y, but it’s still not so rich as you might expect. This is why I’m using a term that everybody figured I’d use the last time around, when I went with z-transforms instead.

## Z-Score

You get an exam back. You get an 83. Did you do well?

Hard to say. It depends on so much. If you expected to barely pass and maybe get as high as a 70, then you’ve done well. If you took the Preliminary SAT, with a composite score that ranges from 60 to 240, an 83 is catastrophic. If the instructor gave an easy test, you maybe scored right in the middle of the pack. If the instructor sees tests as a way to weed out the undeserving, you maybe had the best score in the class. It’s impossible to say whether you did well without context.

The z-score is a way to provide that context. It draws that context by comparing a single score to all the other values. And underlying that comparison is the assumption that whatever it is we’re measuring fits a pattern. Usually it does. The pattern we suppose stuff we measure will fit is the Normal Distribution. Sometimes it’s called the Standard Distribution. Sometimes it’s called the Standard Normal Distribution, so that you know we mean business. Sometimes it’s called the Gaussian Distribution. I wouldn’t rule out someone writing the Gaussian Normal Distribution. It’s also called the bell curve distribution. As the names suggest by throwing around “normal” and “standard” so much, it shows up everywhere.

A normal distribution means that whatever it is we’re measuring follows some rules. One is that there’s a well-defined arithmetic mean of all the possible results. And that arithmetic mean is the most common value to turn up. That’s called the mode. Also, this arithmetic mean, and mode, is also the median value. There’s as many data points less than it as there are greater than it. Most of the data values are pretty close to the mean/mode/median value. There’s some more as you get farther from this mean. But the number of data values far away from it is pretty tiny. You can, in principle, get a value that’s way far away from the mean, but it’s unlikely.

We call this standard because it might as well be. Measure anything that varies at all. Draw a chart with the horizontal axis all the values you could measure. The vertical axis is how many times each of those values comes up. It’ll be a standard distribution uncannily often. The standard distribution appears when the thing we measure satisfies some quite common conditions. Almost everything satisfies them, or nearly satisfies them. So we see bell curves so often when we plot how frequently data points come up. It’s easy to forget that not everything is a bell curve.

The normal distribution has a mean, and median, and mode, of 0. It’s tidy that way. And it has a standard deviation of exactly 1. The standard deviation is a way of measuring how spread out the bell curve is. About 95 percent of all observed results are less than two standard deviations away from the mean. About 99.7 percent of all observed results are less than three standard deviations away. Nearly everything, all but about two results in a billion, is less than six standard deviations away. That last might sound familiar to those who’ve worked in manufacturing. At least it does once you know that the Greek letter sigma is the common shorthand for a standard deviation. “Six Sigma” is a quality-control approach. It’s meant to make sure one understands all the factors that influence a product and controls them. The usual Six Sigma target allows for the process mean drifting by a standard deviation and a half, so the aim is a product that falls outside the design specifications only about 0.0003 percent of the time.
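For anyone who wants to check the shares for one, two, and three standard deviations, here is a short sketch. It uses the standard fact that, for a normal distribution, the fraction of results within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. The function name `share_within` is my own.

```python
from math import erf, sqrt

def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.2f}%")
```

That works out to roughly 68, 95, and 99.7 percent.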

This is the normal distribution. It has a standard deviation of 1 and a mean of 0, by definition. And then people using statistics go and muddle the definition. It is always so, with the stuff people actually use. Forgive them. It doesn’t really change the shape of the curve if we scale it, so that the standard deviation is, say, two, or ten, or π, or any positive number. It just changes where the tick marks are on the x-axis of our plot. And it doesn’t really change the shape of the curve if we translate it, adding (or subtracting) some number to it. That makes the mean, oh, 80. Or -15. Or eπ. Or some other number. That just changes what value we write underneath the tick marks on the plot’s x-axis. We can find a scaling and translation of the normal distribution that fits whatever data we’re observing.

When we find the z-score for a particular data point we’re undoing this translation and scaling. We figure out what number on the standard distribution maps onto the original data set’s value. About two-thirds of all data points are going to have z-scores between -1 and 1. About nineteen out of twenty will have z-scores between -2 and 2. About 99 out of 100 will have z-scores between -3 and 3. If we don’t see this, and we have a lot of data points, then that suggests our data isn’t normally distributed.

I don’t know why the letter ‘z’ is used for this instead of, say, ‘y’ or ‘w’ or something else. ‘x’ is out, I imagine, because we use that for the original data. And ‘y’ is a natural pick for a second measured variable. ‘z’, I expect, is just far enough from ‘x’ it isn’t needed for some more urgent duty, while being close enough to ‘x’ to suggest it’s some measured thing.

The z-score gives us a way to compare how interesting or unusual scores are. If the exam on which we got an 83 has a mean of, say, 74, and a standard deviation of 5, then we can say this 83 is a pretty solid score. If it has a mean of 78 and a standard deviation of 10, then the score is better-than-average but not exceptional. If the exam has a mean of 70 and a standard deviation of 4, then the score is fantastic. We get to meaningfully compare scores from the measurements of different things. And so it’s one of the tools with which statisticians build their work.
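Undoing the translation and scaling means subtracting the mean and dividing by the standard deviation, $z = (x - \text{mean}) / \text{standard deviation}$. Here is that as a sketch, run on the three imagined exams above. The function and variable names are mine.

```python
def z_score(value: float, mean: float, standard_deviation: float) -> float:
    """How many standard deviations `value` sits above (or below) the mean."""
    return (value - mean) / standard_deviation

score = 83
exams = [(74, 5), (78, 10), (70, 4)]  # (mean, standard deviation) for each exam

for mean, sd in exams:
    print(f"mean {mean}, standard deviation {sd}: z = {z_score(score, mean, sd):.2f}")
```

The same 83 comes out as a z-score of 1.8 on the first exam, 0.5 on the second, and 3.25 on the third, which matches "pretty solid", "better than average", and "fantastic".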

## Reading the Comics, March 14, 2016: Pi Day Comics Event

Comic Strip Master Command had the regular pace of mathematically-themed comic strips the last few days. But it remembered what the 14th would be. You’ll see that when we get there.

Ray Billingsley’s Curtis for the 11th of March is a student-resists-the-word-problem joke. But it’s a more interesting word problem than usual. It’s your classic problem of two trains meeting, but rather than ask when they’ll meet it asks where. It’s just an extra little step once the time of meeting is made, but that’s all right by me. Anything to freshen the scenario up.

Tony Carrillo’s F Minus for the 11th was apparently our Venn Diagram joke for the week. I’m amused.

Mason Mastroianni, Mick Mastroianni, and Perri Hart’s B.C. for the 12th of March name-drops statisticians. Statisticians are almost expected to produce interesting pictures of their results. It is the field that gave us bar charts, pie charts, scatter plots, and many more. Statistics is, in part, about understanding a complicated set of data with a few numbers. It’s also about turning those numbers into recognizable pictures, all in the hope of finding meaning in a confusing world (ours).

Brian Anderson’s Dog Eat Doug for the 13th of March uses walls full of mathematical scrawl as signifier for “stuff thought deeply about”. I don’t recognize any of the symbols specifically, although some of them look plausibly like calculus. I would not be surprised if Anderson had copied equations from a book on string theory. I’d do it to tell this joke.

And then came the 14th of March. That gave us a bounty of Pi Day comics. Among them:

John Hambrock’s The Brilliant Mind of Edison Lee trusts that the name of the day is wordplay enough.

Scott Hilburn’s The Argyle Sweater is also a wordplay joke, although it’s a bit more advanced.

Tim Rickard’s Brewster Rockit fuses the pun with one of its running, or at least rolling, gags.

Bill Whitehead’s Free Range makes an urban legend out of the obsessive calculation of digits of π.

And Missy Meyer’s informational panel cartoon Holiday Doodles mentions that besides “National” Pi Day it was also “National” Potato Chip Day, “National” Children’s Craft Day, and “International” Ask A Question Day. My question: for the first three days, which nation?

Edited To Add: And I forgot to mention, after noting to myself that I ought to mention it. The Price Is Right (the United States edition) hopped onto the Pi Day fuss. It used the day as a thematic link for its Showcase prize packages, noting how you could work out π from the circumference of your new bicycles, or how π was a letter from your vacation destination of Greece, and if you think there weren’t brand-new cars in both Showcases you don’t know the game show well. Did anyone learn anything mathematical from this? I am skeptical. Do people come away thinking mathematics is more fun after this? … Conceivably. At least it was a day fairly free of people declaring they Hate Math and Can Never Do It.

## How 2015 Treated My Mathematics Blog

Oh yeah, I also got one of these. WordPress put together a review of what all went on around here last year. The most startling thing to me is that I had 188 posts over the course of the year. A lot of that is thanks to the A To Z project, which gave me something to post each day for 31 days in a row. If I’d been thinking just a tiny bit harder I’d have come up with two more posts and made a clean sweep of June.

The unit of comparison for my readership this year was the Sydney Opera House. That’s a great comparison because everybody thinks they know how big an opera house is. It reminds me of a bit in Carl Sagan and Ann Druyan’s Comet in which they compare the speed of an Oort cloud comet puttering around the sun to the speed of a biplane. We may have only a foggy idea how fast that is (I guess maybe a hundred miles per hour?) but it sounds nice and homey.

## Reading the Comics, December 30, 2015: Seeing Out The Year Edition

There’s just enough comic strips with mathematical themes that I feel comfortable doing a last Reading the Comics post for 2015. And as maybe fits that slow week between Christmas and New Year’s, there’s not a lot of deep stuff to write about. But there is a Jumble puzzle.

Keith Tutt and Daniel Saunders’s Lard’s World Peace Tips gives us someone so wrapped up in measuring data as to not notice the obvious. The obvious, though, isn’t always right. This is why statistics is a deep and useful field. It’s why measurement is a powerful tool. Careful measurement and statistical tools give us ways to not fool ourselves. But it takes a lot of sampling, a lot of study, to give those tools power. It can be easy to get lost in the problems of gathering data. Plus numbers have this hypnotic power over human minds. I understand Lard’s problem.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 27th of December messes with a kid’s head about the way we know 1 + 1 equals 2. The classic Principia Mathematica construction builds it out of pure logic. We come up with an idea that we call “one”, and another that we call “plus one”, and an idea we call “two”. If we don’t do anything weird with “equals”, then it follows that “one plus one equals two” must be true. But does the logic mean anything to the real world? Or might we be setting up a game with no relation to anything observable? The punchy way I learned this question was “one cup of popcorn added to one cup of water doesn’t give you two cups of soggy popcorn”. So why should the logical rules that say “one plus one equals two” tell us anything we might want to know about how many apples one has?

David L Hoyt and Jeff Knurek’s Jumble for the 28th of December features a mathematics teacher. That’s enough to include here. (You might have an easier time getting the third and fourth words if you reason what the surprise-answer word must be. You can use that to reverse-engineer what letters have to be in the circles.)

Richard Thompson’s Richard’s Poor Almanac for the 28th of December repeats the Platonic Fir Christmas Tree joke. It’s in color this time. Does the color add to the perfection of the tree, or take away from it? I don’t know how to judge.

Hilary Price’s Rhymes With Orange for the 29th of December gives its panel over to Rina Piccolo. Price often has guest-cartoonist weeks, which is a generous use of her space. Piccolo already has one and a sixth strips — she’s one of the Six Chix cartoonists, and also draws the charming Tina’s Groove — but what the heck. Anyway, this is a comic strip about the butterfly effect. That’s the strangeness by which a deterministic system can still be unpredictable. This counter-intuitive conclusion dates back to the 1890s, when Henri Poincaré was trying to solve the big planetary mechanics question. That question is: is the solar system stable? Is the Earth going to remain in about its present orbit indefinitely far into the future? Or might the accumulated perturbations from Jupiter and the lesser planets someday pitch it out of the solar system? Or, less likely, into the Sun? And the sad truth is, the best we can say is we can’t tell.
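The solar system is far too big a problem for a blog-sized example, so here is a stand-in. The logistic map, $x \mapsto 4x(1-x)$, is a standard toy model of a deterministic but unpredictable system. It is not Poincaré's problem, just a sketch of the same strangeness. Two starting values that differ by one part in ten billion are followed for sixty steps; the function name `logistic_orbit` is my own.

```python
def logistic_orbit(x0: float, steps: int, r: float = 4.0) -> list:
    """Follow x -> r * x * (1 - x) for the given number of steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)

# The gap between the two orbits at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]
for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The rule has no randomness in it at all. Yet the tiny initial difference keeps growing until the two orbits have nothing to do with each other, which is why a small error in what we measure today ruins the long-range forecast.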

In Brian Anderson’s Dog Eat Doug for the 30th of December, Sophie ponders some deep questions. Most of them are purely philosophical questions and outside my competence. “What are numbers?” is also a philosophical question, but it feels like something a mathematician ought to have a position on. I’m not sure I can offer a good one, though. Numbers seem to me to be these things which we imagine. They have some properties, and they obey certain rules when we combine them with other numbers. The most familiar of these numbers and properties correspond with some intuition many animals have about discrete objects. Many times over we’ve expanded the idea of what kinds of things might be numbers without losing the sense of how numbers can interact, somehow. And those expansions have generally been useful. They strangely match things we would like to know about the real world. And we can discover truths about these numbers and these relations that don’t seem to be obviously built into the definitions. It’s almost as if the numbers were real objects with the capacity to surprise and to hold secrets.

Why should that be? The lazy answer is that if we came up with a construct that didn’t tell us anything interesting about the real world, we wouldn’t bother studying it. A truly irrelevant concept would be a couple forgotten papers tucked away in an unread journal. But that is missing the point. It’s like answering “why is there something rather than nothing” with “because if there were nothing we wouldn’t be here to ask the question”. That doesn’t satisfy. Why should it be possible to take some ideas about quantity that ravens, raccoons, and chimpanzees have, then abstract some concepts like “counting” and “addition” and “multiplication” from that, and then modify those concepts, and finally have the modification be anything we can see reflected in the real world? There is a mystery here. I can’t fault Sophie for not having an answer.

## A Summer 2015 Mathematics A to Z Roundup

Since I’ve run out of letters there’s little dignified to do except end the Summer 2015 Mathematics A to Z. I’m still organizing my thoughts about the experience. I’m quite glad to have done it, though.

For the sake of good organization, here’s the set of pages that this project’s seen created:

## Reading the Comics, July 12, 2015: Chuckling At Hart Edition

I haven’t had the chance to read the Gocomics.com comics yet today, but I’d had enough strips to bring up anyway. And I might need something to talk about on Tuesday. Two of today’s strips are from the legacy of Johnny Hart. Hart’s last decades, especially at B.C., when he most often wrote about his fundamentalist religious views, hurt his reputation and obscured the fact that his comics were really, really funny when they started. His heirs and successors have been doing fairly well at reviving the deliberately anachronistic and lightly satirical edge that made the strips funny to begin with, and one of them’s a perennial around here. The other, Wizard of Id Classics, is literally reprints from the earliest days of the comic strip’s run. That shows the strip when it was earning its place on every comics page everywhere, and makes a good case for it.

Mason Mastroianni, Mick Mastroianni, and Perri Hart’s B.C. (July 8) shows how a compass, without straightedge, can be used to ensure one’s survival. I suppose it’s really only loosely mathematical but I giggled quite a bit.

Ken Cursoe’s Tiny Sepuku (July 9) talks about luck as being just the result of probability. That’s fair enough. Random chance will produce strings of particularly good, or bad, results. Those strings of results can look so long or impressive that we suppose they have to represent something real. Look to any sport and the argument about whether there are “hot hands” or “clutch performers”. And Maneki-Neko is right that a probability manipulator would help. You can get a string of ten tails in a row on a fair coin, but you’ll get many more if the coin has an eighty percent chance of coming up tails.
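To put rough numbers on that last claim, here is a minimal sketch. It assumes independent flips and looks only at one particular stretch of ten flips, which is a simplification of "a string of ten tails somewhere in a long sequence". The function name `run_probability` is made up for this illustration.

```python
from fractions import Fraction


def run_probability(p_tails, run_length=10):
    """Chance that `run_length` independent flips all come up tails."""
    return Fraction(p_tails) ** run_length


# A fair coin versus the eighty-percent-tails coin from the strip discussion.
fair = run_probability(Fraction(1, 2))
loaded = run_probability(Fraction(4, 5))

print(f"fair coin:   {float(fair):.6f}")
print(f"loaded coin: {float(loaded):.6f}")
print(f"ratio:       {float(loaded / fair):.1f}")
```

Under those assumptions the fair coin manages ten tails in a given stretch about once in 1,024 tries, while the loaded coin does it a bit more than one time in ten, roughly 110 times as often.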

Brant Parker and Johnny Hart’s Wizard of Id Classics (July 9, rerun from July 12, 1965) is a fun bit of volume-guessing and logic. So, yes, I giggled pretty solidly at both B.C. and The Wizard of Id this week.

Mell Lazarus’s Momma (July 11) identifies “long division” as the first thing a person has to master to be an engineer. I don’t know that this is literally true. It’s certainly true that liking doing arithmetic helps one in a career that depends on calculation, though. But you can be a skilled songwriter without being any good at writing sheet music. I wouldn’t be surprised if there are skilled engineers who are helpless at dividing fourteen into 588.

Bunny Hoest and John Reiner’s Lockhorns (July 12) includes an example of using “adding up” to mean “make sense”. It’s a slight thing. But the same idiom was used last week, in Eric Teitelbaum and Bill Teitelbaum’s Bottomliners. I don’t think Comic Strip Master Command is ordering this punch line yet, but you never know.

And finally, I do want to try something a tiny bit new, and explicitly invite you-the-readers to say what strip most amused you. Please feel free to comment about your choices, or warn me that I set up the poll wrong. I haven’t tried this before.

## Quintile.

Why is there statistics?

There are many reasons statistics got organized as a field of study mostly in the late 19th and early 20th century. Mostly they reflect wanting to be able to say something about big collections of data. People can only keep track of so much information at once. Even if we could keep track of more information, we’re usually interested in relationships between pieces of data. When there’s enough data there are so many possible relationships that we can’t see what’s interesting.

One of the things statistics gives us is a way of representing lots of data with fewer numbers. We trust there’ll be few enough numbers we can understand them all simultaneously, and so understand something about the whole data.

Quintiles are one of the tools we have. They’re a lesser tool, I admit, but that makes them sound more exotic. They’re descriptions of how the values of a set of data are distributed. Distributions are interesting. They tell us what kinds of values are likely and which are rare. They tell us also how variable the data is, or how reliably we are measuring data. These are things we often want to know: what is normal for the thing we’re measuring, and what’s a normal range?

We get quintiles from imagining the data set placed in ascending order. There’s some value that one-fifth of the data points are smaller than, and four-fifths are greater than. That’s your first quintile. Suppose we had the values 269, 444, 525, 745, and 1284 as our data set. The first quintile would be the arithmetic mean of the 269 and 444, that is, 356.5.

The second quintile is some value that two-fifths of your data points are smaller than, and that three-fifths are greater than. With that data set we started with that would be the mean of 444 and 525, or 484.5.

The third quintile is a value that three-fifths of the data set is less than, and two-fifths greater than; in this case, that’s 635.

And the fourth quintile is a value that four-fifths of the data set is less than, and one-fifth greater than. That’s the mean of 745 and 1284, or 1014.5.

From looking at the quintiles we can say … well, not much, because this is a silly made-up problem that demonstrates how quintiles are calculated rather than why we’d want to do anything with them. At least the numbers come from real data. They’re the word counts of my first five A-to-Z definitions. But the existence of the quintiles at 356.5, 484.5, 635, and 1014.5, along with the minimum and maximum data points at 269 and 1284, tells us something. Mostly that numbers are bunched up in the three and four hundreds, but there could be some weird high numbers. If we had a bigger data set the results would be less obvious.
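The quintile calculation can be sketched in a few lines of Python. This is only one convention, the "split the difference between neighbors" rule used in the worked example; the function name `quantile_cuts` and its `parts` parameter are invented for this sketch, and statistics libraries commonly use interpolation rules that give somewhat different numbers for so small a data set.

```python
def quantile_cuts(values, parts=5):
    """Return the parts-1 cut points dividing the sorted data into `parts` groups.

    Assumes a non-empty list of numbers. When a cut falls exactly between
    two data points, take their arithmetic mean; otherwise take the data
    point the cut lands on.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, parts):
        below, remainder = divmod(k * n, parts)
        if remainder == 0:
            # Exactly `below` points sit under the cut: average the neighbors.
            cuts.append((ordered[below - 1] + ordered[below]) / 2)
        else:
            cuts.append(ordered[below])
    return cuts


word_counts = [269, 444, 525, 745, 1284]
quintiles = quantile_cuts(word_counts)           # four cut points
median = quantile_cuts(word_counts, parts=2)[0]  # same routine, two groups
print(quintiles, median)
```

Asking for two groups instead of five gives the median, and four groups gives the quartiles, which is the sense in which these are all the same idea.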

If the calculating of quintiles sounds much like the way we work out the median, that’s because it is. The median is the value that half the data is less than, and half the data is greater than. There are other ways of breaking down distributions. The first quartile is the value one-quarter of the data is less than. The second quartile is a value two-quarters of the data is less than (so, yes, that’s the median all over again). The third quartile is a value three-quarters of the data is less than.

Percentiles are another variation on this. The (say) 26th percentile is a value that 26 percent, or 26 hundredths, of the data is less than. The 72nd percentile is a value greater than 72 percent of the data.

Are quintiles useful? Well, that’s a loaded question. They are used less than quartiles are. And I’m not sure knowing them is better than looking at a spreadsheet’s plot of the data. A plot of the data with the quintiles, or quartiles if you prefer, drawn in is better than either separately. But these are among the tools we have to tell what data values are likely, and how tightly bunched-up they are.

## How May 2015 Treated My Mathematics Blog

For May 2015 I tried a new WordPress theme — P2 Classic — and I find I rather like it. Unfortunately it seems to be rubbish on mobile devices and I’m not WordPress Theme-equipped-enough to figure out how to fix that. I’m sorry, mobile readers. I’m honestly curious whether the theme change affected my readership, which was down appreciably over May.

According to WordPress, the number of pages viewed here dropped to 936 in May, down just over ten percent from April’s 1047 and also below March’s 1022. Perhaps the less-mobile-friendly theme was shooing people away. Maybe not, though: in March and April I’d posted 14 articles each, while in May there were a mere twelve. The number of views per post increased steadily, from 73 in March to just under 75 in April to 78 in May. I’m curious if this signifies anything. I may get some better idea next month. June should have at least 13 posts from the Mathematics A To Z gimmick, plus this statistics post, and there’ll surely be at least two Reading The Comics posts, or at least sixteen posts. And who knows what else I’ll feel like throwing in? It’ll be an interesting experiment at least.

Anyway, the number of unique visitors rose to 415 in May, up from April’s 389 but still below March’s 468. The number of views per visitor dropped to 2.26, far below April’s 2.69, but closer in line with March’s 2.18. And 2.26 is close to the normal count for this sort of thing.

The number of likes on posts dropped to 259. In April it was 296 likes and in March 265. That may just reflect the lower number of posts, though. Divide the number of likes by the number of posts and March saw an average of 18.9, April 21.14, and May 21.58. That’s all at least consistent, although there’s not much reason to suppose that only things from the current month were liked.

The number of comments recovered also. May saw 83 comments, up from April’s 64, but not quite back to March’s 93. That comes to, for May, 6.9 comments for each post, but that’s got to be counting links to other posts, including pingbacks and maybe the occasional reblogging. I’ve been getting chattier with folks around here, but not seven comments per post chatty.
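For anyone who wants to check the per-post and per-visitor arithmetic, here is a small sketch. The totals are the ones quoted in this report; the dictionary layout and the helper name `per` are just choices made for the illustration.

```python
# Monthly totals as quoted in this report.
months = {
    "March": {"views": 1022, "visitors": 468, "posts": 14, "likes": 265, "comments": 93},
    "April": {"views": 1047, "visitors": 389, "posts": 14, "likes": 296, "comments": 64},
    "May":   {"views": 936,  "visitors": 415, "posts": 12, "likes": 259, "comments": 83},
}


def per(total, count):
    """Ratio rounded to two decimal places."""
    return round(total / count, 2)


summary = {
    name: {
        "views_per_post": per(m["views"], m["posts"]),
        "views_per_visitor": per(m["views"], m["visitors"]),
        "likes_per_post": per(m["likes"], m["posts"]),
        "comments_per_post": per(m["comments"], m["posts"]),
    }
    for name, m in months.items()
}

for name, ratios in summary.items():
    print(name, ratios)
```

Dividing by the number of posts published in a month is a crude measure, since older posts collect views, likes, and comments too, but it is the calculation used here.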

June starts at 24,820 views, and 485 people following specifically through WordPress.

I’ve got a healthy number of popular posts the past month; all of these got at least 37 page views each. I cut off at 37 because that’s where the Trapezoids one came in and we already know that’s popular. More popular than that were:

I have the suspicion that comics fans are catching on, quietly, to all this stuff.

Now the countries report. The nations sending me at least twenty page views were the United States (476), the United Kingdom (85), Canada (65), Italy (53), and Austria (20).

Sending just a single reader were Belgium, Bulgaria, Colombia, Nigeria, Norway, Pakistan, Romania, and Vietnam. Romania is on a three-month single-reader streak; Vietnam, two. India sent me a mere two readers, down from six last month. The European Union sent me three.

And among the interesting search terms this past month were:

• origin is the gateway to your entire gaming universe.
• how to do a cube box (the cube is easy enough, it’s getting the boxing gloves on that’s hard)
• popeye “computer king” (Remember that comic?)
• google can you show me in 1 trapezoid how many cat how many can you make of 2 (?, although I like the way Google is named at the start of the query, like someone on Next Generation summoning the computer)
• plato “divided line” “arthur cayley” (I believe that mathematics comes in on the lower side of the upper half of Plato’s divided line)
• where did negative numbers originate from

Someday I must work out that “origin is the gateway” thing.

## How April 2015 Treated My Mathematics Blog

(I apologize if the formatting is messed up. For some reason preview is not working, and I will not be trying the new page for entering posts if I can at all help it. I will fix when I can, if it needs fixing.)

As it’s the start of the month I want to try understanding the readership of my blogs, as WordPress gives me statistics. It’s been a more confusing month than usual, though. One thing is easy to say: the number of pages read was 1,047, an all-time high around these parts for a single month. It’s up from 1,022 in March, and 859 in February. And it’s the second month in a row there’ve been more than a thousand page views. That part’s easy.

The number of visitors has dropped. It was down to 389 in April, from a record 468 in March and a still-higher 407 in February. This is, if WordPress doesn’t lead me awry, my fifth-highest number of viewers. This does mean the number of views per visitor was my highest since June of 2013. The blog had 2.69 views per visitor, compared to 2.18 in March and 2.11 in February. It’s one of my highest views-per-visitor on record anyway. Perhaps people quite like what they see and are archive-binging. I approve of this. I’m curious why the number of readers dropped so, though, particularly when I look at my humor blog statistics (to be posted later).

I’m confident the readers are there, though. The number of likes on my mathematics blog was 297, up from March’s 265 and February’s 179. It’s the highest on record far as WordPress will tell me. So readers are more engaged, or else they’re clicking like from the WordPress Reader or an RSS feed. Neither gets counted as a page view or a visitor. That’s another easy part. The number of comments is down to 64, from March’s record 93, but March seems to have been an exceptional month. February had 56 comments so I’m not particularly baffled by April’s drop.

May starts out with 23,884 total views, and 472 people following specifically through WordPress.

It’s a truism that my most popular posts are the trapezoids one and the Reading The Comics posts, but for April that was incredibly true. Most popular the past thirty days were:

I am relieved that I started giving all these Comics posts their own individual “Edition” titles. Otherwise there’d be no way to tell them apart.

The nations sending me the most readers were, as ever, the United States (662), Canada (82), and the United Kingdom (47), with Slovenia once again strikingly high (36). Hong Kong came in with 24 readers, Italy 23, and Austria a mere 18. Elke Stangl’s had a busy month, I know.

This month’s single-reader countries were Czech Republic, Morocco, the Netherlands, Puerto Rico, Romania, Taiwan, and Vietnam. Romania’s the only one that sent me a single reader last month. India bounced back from five readers to six.

Among the search terms bringing people to me were no poems. Among the interesting phrases were:

• what point is driving the area difference between two triangles (A good question!)
• how do you say 1,898,600,000,000,000,000,000,000,000 (I almost never do.)
• is julie larson still drawing the dinette set (Yes, to the best of my knowledge.)
• jpe fast is earth spinning? (About once per day, although the answer can be surprisingly difficult to say! But also figure about 465 times the cosine of your latitude meters per second, roughly.)
• origin is the gateway to your entire gaming universe. (Again, I don’t know what this means, and I’m a little scared to find out.)
• i hate maths 2015 photos (Well, that just hurts.)
• getting old teacher jokes (Again, that hurts, even if it’s not near my birthday.)
• two trapezoids make a (This could be a poem, actually.)
• how to draw 2 trapezoids (I’d never thought about that one. Shall have to consider writing it.)

I don’t know quite what it all means, other than that I need to write about comic strips and trapezoids more somehow.