My All 2020 Mathematics A to Z: Statistics


I owe Mr Wu, author of the Singapore Maths Tuition blog, thanks for another topic for this A-to-Z. Statistics is a big field of mathematics, and so I won’t try to give you a course’s worth in 1500 words. But I have to start with a question. I seem to have ended at two thousand words.

Color cartoon illustration of a coati in a beret and neckerchief, holding up a director's megaphone and looking over the Hollywood hills. The megaphone has the symbols + x (division obelus) and = on it. The Hollywood sign is, instead, the letters MATHEMATICS. In the background are spotlights, with several of them crossing so as to make the letters A and Z; one leg of the spotlights has 'TO' in it, so the art reads out, subtly, 'Mathematics A to Z'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Statistics.

Is statistics mathematics?

The answer seems obvious at first. Look at a statistics textbook. It’s full of algebra. And graphs of great sloped mounds. There’s tables full of four-digit numbers in back. The first couple chapters are about probability. They’re full of questions about rolling dice and dealing cards and guessing whether the sibling who just entered is the younger.

But then, why does Rutgers University have a Department of Mathematics and also a Department of Statistics? And considered so distinct as to have an interdisciplinary mathematics-and-statistics track? It’s not an idiosyncrasy of Rutgers. Many schools have the same division between mathematics and statistics. Some join them into a Department of Mathematics and Statistics. But the name hints at something just different about the field. Not too different, though. Physics and Chemistry and important threads of Economics and History are full of mathematics. But you never see a Department of Mathematics and History.

Thinking of the field’s history, though, and its use, tells us more. Some of the earliest work we now recognize as statistics was Arab mathematicians deciphering messages. This cryptanalysis is the observation that (in English) a three-letter word is very likely to be ‘the’, mildly likely to be ‘one’, and not likely to be ‘pyx’. A more modern forerunner is the Republic of Venice supposedly calculating that war with Milan would not be worth the winning. Or the gatherings of mortality tables, recording how many people of what age can be expected to die any year, and what from. (Mortality tables are another of Edmond Halley’s claims to fame, though it won’t displace his comet work.) Florence Nightingale’s charts explaining how more soldiers die of disease than in fighting the Crimean War. William Sealy Gosset sharing sample-testing methods developed at the Guinness brewery.

You see a difference in kind to a mathematical question like finding a square with the same area as this trapezoid. It’s not that mathematics is not practical; it’s always been. And it’s not that statistics lacks abstraction and pure mathematics content. But statistics wears practicality in a way that number theory won’t.

Practical about what? History and etymology tip us off. The early uses of things we now see as statistics are about things of interest to the State. Decoding messages. Counting the population. Following — in the study of annuities — the flow of money between peoples. With the industrial revolution, statistics sneaks into the factory. To have an economy of scale you need a reliable product. How do you know whether the product is reliable, without testing every piece? How can you test every beer brewed without drinking it all?

One great leg of statistics — it’s tempting to call it the first leg, but the history is not so neat as to make that work — is descriptive. This gives us things like mean and median and mode and standard deviation and quartiles and quintiles. These try to let us represent more data than we can really understand in a few words. We lose information in doing so. But if we are careful to remember the difference between the descriptive statistics we have and the original population? (nb, a word of the State) We might not do ourselves much harm.
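Here is a small sketch of what that loss of information looks like, using Python's standard `statistics` module. The two samples are made up for illustration; they are not data from anywhere. The helper name `describe` is my own invention too.

```python
import statistics

# Two hypothetical samples, invented for this illustration.
sample_a = [2, 4, 4, 4, 6]
sample_b = [0, 4, 4, 4, 8]

def describe(sample):
    """Return a handful of descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "mode": statistics.mode(sample),
        # Population standard deviation: treats the sample as the whole population.
        "stdev": statistics.pstdev(sample),
        # The three cut points that split the data into quarters.
        "quartiles": statistics.quantiles(sample, n=4),
    }

summary_a = describe(sample_a)
summary_b = describe(sample_b)

for name in ("mean", "median", "mode", "stdev", "quartiles"):
    print(name, summary_a[name], summary_b[name])
```

The mean, median, and mode agree for both samples, even though the samples are different. Only the measures of spread tell them apart, and even those would not let you rebuild the original numbers.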

Another great leg is inferential statistics. This uses tools with names like z-score and the Student t distribution. It talks about things like p-values and confidence intervals, and uses terms like correlation and regression. This is about looking for causes in complex scenarios. We want to believe there is a cause to, say, a person’s lung cancer. But there is no tracking down what that is; there are too many things that could start a cancer, and too many of them will go unobserved. But we can notice that people who smoke have lung cancer more often than those who don’t. We can’t say why a person recovered from the influenza in five days. But we can say people who were vaccinated got fewer influenzas, and ones that passed quicker, than those who did not. We can get the dire warning that “correlation is not causation”, uttered by people who don’t like what the correlation suggests may be a cause.

Also by people being honest, though. In the 1980s geologists wondered if the sun might have a not-yet-noticed companion star. Its orbit would explain an apparent periodicity in meteor bombardments of the Earth. But completely random bombardments would produce apparent periodicity sometimes. It’s much the same way trees in a forest will sometimes seem to line up. Or imagine finding there is a neighborhood in your city with a high number of arrests. Is this because it has the highest rate of street crime? Or is the rate of street crime the same as any other spot and there are simply more cops here? But then why are there more cops to be found here? Perhaps they’re attracted by the neighborhood’s reputation for high crime. It is difficult to see through randomness, to untangle complex causes, and to root out biases.

The tools of statistics, as we recognize them, largely came together in the 19th and early 20th century. Adolphe Quetelet, a Flemish scientist, set out much early work, including introducing the concept of the “average man”, a figure representative of societal matters. He studied the crime statistics of Paris for five years and noticed how regular the numbers were. The implication, to Quetelet, was that crime is a societal problem. It’s something we can control by mindfully organizing society, without infringing anyone’s autonomy. Put like that, the study of statistics seems an obvious and indisputable good, a way for governments to better serve their public.

So here is the dispute. It’s something mathematicians understate when sharing the stories of important pioneers like Francis Galton or Karl Pearson. They were eugenicists. Part of what drove their interest in studying human populations was to find out which populations were the best. And how to help them overcome their more-populous lessers.

I don’t have the space, or depth of knowledge, to fully recount the 19th century’s racial politics, popular scientific understanding, and international relations. Please accept this as a loose cartoon of the situation. Do not forget the full story is more complex and more ambiguous than I write.

One of the 19th century’s greatest scientific discoveries was evolution. That populations change in time, in size and in characteristics, even budding off new species, is breathtaking. Another of the great discoveries was entropy. This incorporated into science the nostalgic romantic notion that things used to be better. I write that figuratively, but to express the way the notion is felt.

There are implications. If the Sun itself will someday wear out, how long can the Tories last? It was easy for the aristocracy to feel that everything was quite excellent as it was now and dread the inevitable change. This is true for the aristocracy of any country, although the United Kingdom had a special position here. The United Kingdom enjoyed a privileged position among the Great Powers and the Imperial Powers through the 19th century. Note we still call it the Victorian era, when Louis Napoleon or Giuseppe Garibaldi or Otto von Bismarck are more significant European figures. (Granting Victoria had the longer presence on the world stage; “the 19th century” had a longer presence still.) But it could rarely feel secure, always aware that France or Germany or Russia was ready to displace it.

And even internally: if Darwin was right and reproductive success all that matters in the long run, what does it say that so many poor people breed so much? How long could the world hold good things? Would the eternal famines and poverty of the “overpopulated” Irish or Indian colonial populations become all that was left? During the Crimean War, the British military found a shocking number of recruits from the cities were physically unfit for service. In the 1850s this was only an inconvenience; there were plenty of strong young farm workers to recruit. But the British population was already majority-urban, and becoming more so. What would happen by 1880? 1910?

One can follow the reasoning, even if we freeze at the racist conclusions. And we have the advantage of a century-plus hindsight. We can see how the eugenic attitude leads quickly to horrors. And also that it turns out “overpopulated” Ireland and India stopped having famines once they evicted their colonizers.

Does this origin of statistics matter? The utility of a hammer does not depend on the moral standing of its maker. The Central Limit Theorem has an even stronger pretense to objectivity. Why not build as best we can with the crooked timbers of mathematics?

It is in my lifetime that a popular racist book claimed science proved that Black people were intellectual inferiors to White people. This on the basis of supposedly significant differences in the populations’ IQ scores. It proposed that racism wasn’t a thing, or at least nothing to do anything about. It would be mere “realism”. Intelligence Quotients, incidentally, are another idea we can trace to Francis Galton. But an IQ test is not objective. The best we can say is it might be standardized. This says nothing about the biases built into the test, though, or of the people evaluating the results.

So what if some publisher 25 years ago got suckered into publishing a bad book? And racist chumps bought it because they liked its conclusion?

The past is never fully past. In the modern environment of surveillance capitalism we have abundant data on any person. We have abundant computing power. We can find many correlations. This gives people wild ideas for “artificial intelligence”. Something to make predictions. Who will lose a job soon? Who will get sick, and from what? Who will commit a crime? Who will fail their A-levels? At least, who is most likely to?

These seem like answerable questions. One can imagine an algorithm that would answer them fairly. And make for a better world, one which concentrates support around the people most likely to need it. If we were wise, we would ask our friends in the philosophy department about how to do this. Or we might just plunge ahead and trust that since an algorithm runs automatically it must be fair. Our friends in the philosophy department might have some advice there too.

Consider, for example, the body mass index. It was developed by our friend Adolphe Quetelet, as he tried to understand the kinds of bodies in the population. It is now used to judge whether someone is overweight. Weight is treated as though it were a greater threat to health than actual illnesses are. Your diagnosis for the same condition with the same symptoms will be different — and on average worse — if your number says 25.2 rather than 24.8.

We must do better. We can hope that learning how tools were used to injure people will teach us to use them better, to reduce or to avoid harm. We must fight our tendency to latch on to simple ideas as the things we can understand in the world. We must not mistake the greater understanding we have from the statistics for complete understanding. To do this we must have empathy, and we must have humility, and we must understand what we have done badly in the past. We must catch ourselves when we repeat the patterns that brought us to past evils. We must do more than only calculate.


This and the rest of the 2020 A-to-Z essays should be at this link. All the essays from every A-to-Z series should be gathered at this link. And I am looking for V, W, and X topics to write about. Thanks for your thoughts, and thank you for reading.

Reading the Comics, April 1, 2017: Connotations Edition


Last week ended with another little string of mathematically-themed comic strips. Most of them invited, to me, talk about the cultural significance of mathematics and what connotations they have. So, this title for an artless essay.

Berkeley Breathed’s Bloom County 2017 for the 28th of March uses “two plus two equals” as the definitive, inarguable truth. It always seems to be “two plus two”, doesn’t it? Never “two plus three”, never “three plus three”. I suppose I’ve sometimes seen “one plus one” or “two times two”. It’s easy to see why it should be a simple arithmetic problem, nothing with complicated subtraction or division or numbers as big as six. Maybe the percussive alliteration of those repeated two’s drives the phrase’s success. But then why doesn’t “two times two” show up nearly as often? Maybe the phrase isn’t iambic enough. “Two plus two” allows (to my ear) the “plus” to sink in emphasis, while “times” stays a little too prominent. We need a wordsmith in to explore it. (I’m open to other hypotheses, including that “two times two” gets used more than my impression says.)

Christiann MacAuley’s Sticky Comics for the 28th uses mathematics as the generic “more interesting than people” thing that nerds think about. The thing being thought of there is the Mandelbrot Set. It’s built on complex-valued numbers. Pick a complex number, any you like; that’s called ‘C’. Square the number and add ‘C’ to the result. This will be some new complex-valued number. Square that new number and add the original ‘C’ back to it again. Square that new number and add the original ‘C’ back once more. And keep at this. There are two things that might happen. These squared numbers might keep growing infinitely large. They might be negative, or imaginary, or (most likely) complex-valued, but their size keeps growing. Or these squared numbers might not grow arbitrarily large. The Mandelbrot Set is the collection of ‘C’ values for which the numbers don’t just keep growing in size. That’s the sort of lumpy kidney bean shape with circles and lightning bolts growing off it that you saw on every pop mathematics book during the Great Fractal Boom of the 80s and 90s. There’s almost no point working it out in your head; the great stuff about fractals almost requires a computer. They take a lot of computation. But if you’re just avoiding conversation, well, anything will do.
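If you would rather let the computer avoid conversation for you, the square-and-add-C recipe can be sketched in a few lines of Python. The function name `stays_bounded` and the cutoff of 100 steps are my own choices for illustration. The test against size 2 leans on a standard fact: once one of these numbers is bigger than 2 in size, it is certain to keep growing. A computer can only try finitely many steps, so a "yes" here really means "did not escape within the steps we tried".

```python
def stays_bounded(c, max_steps=100, escape_radius=2.0):
    """Approximate test for whether c belongs to the Mandelbrot Set.

    Returns False as soon as the size of the iterate exceeds escape_radius,
    and True if that never happens within max_steps iterations.
    """
    z = c  # start from C itself, as in the description above
    for _ in range(max_steps):
        if abs(z) > escape_radius:
            return False  # the numbers are growing without bound
        z = z * z + c  # square the current number and add the original C
    return True  # never escaped within the steps we tried


# A few values of C to try out.
for c in (0, -1, 1j, 1, 0.5 + 0.5j):
    print(c, stays_bounded(c))
```

Picking C of 0 or -1 or the imaginary unit gives numbers that cycle around forever without growing. Picking C of 1 gives 1, 2, 5, 26, and so on, which runs off quickly.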

Olivia Walch’s Imogen Quest for the 29th riffs on the universe-as-simulation hypothesis. It’s one of those ideas that catches the mind and is hard to refute as long as we don’t talk to the people in the philosophy department, which we’re secretly scared of. Anyway the comic shows one of the classic uses of statistical modeling: try out a number of variations of a model in the hopes of understanding real-world behavior. This is an often-useful way to balance how the real world has stuff going on that’s important and that we don’t know about, or don’t know how to handle exactly.

Mason Mastroianni’s The Wizard of Id for the 31st uses a sprawl of arithmetic as symbol of … well, of status, really. The sort of thing that marks someone a white-collar criminal. I suppose it also fits with the suggestion of magic that accompanies huge sprawls of mathematical reasoning. Bundle enough symbols together and it looks like something only the intellectual aristocracy, or at least secret cabal, could hope to read.

Bob Shannon’s Tough Town for the 1st name-drops arithmetic. And shows off the attitude that anyone we find repulsive must also be stupid, as proven by their being bad at arithmetic. I admit to having no discernable feelings about the Kardashians; but I wouldn’t be so foolish as to conflate intelligence and skill-at-arithmetic.

Reading the Comics, August 16, 2014: Saturday Morning Breakfast Cereal Edition


Zach Weinersmith’s Saturday Morning Breakfast Cereal is a long-running and well-regarded web comic that I haven’t paid much attention to because I don’t read many web comics. XKCD, Newshounds, and a couple others are about it. I’m not opposed to web comics, mind you, I just don’t get around to following them typically. But Saturday Morning Breakfast Cereal started running on Gocomics.com recently, and Gocomics makes it easy to start adding comics, and I did, and that’s served me well for the mathematical comics collections since it’s been a pretty dry spell. I bet it’s the summer vacation.

Saturday Morning Breakfast Cereal (July 30) seems like a reach for inclusion in mathematical comics since its caption is “Physicists make lousy firemen” and it talks about the action of a fire — and of the “living things” caught in the fire — as processes producing wobbling and increases in disorder. That’s an effort at describing a couple of ideas, the first that the temperature of a thing is connected to the speed at which the molecules making it up are moving, and the second that the famous entropy is a never-decreasing quantity. We get these notions from thermodynamics and particularly the attempt to understand physically important quantities like heat and temperature in terms of particles — which have mass and position and momentum — and their interactions. You could write an entire blog about entropy and probably someone does.

Randy Glasbergen’s Glasbergen Cartoons (August 2) uses the word-problem setup for a strip of “Dog Math” and tries to remind everyone teaching undergraduates the quotient rule that it really could be worse, considering.

Nate Fakes’s Break of Day (August 4) takes us into an anthropomorphized world that isn’t numerals for a change, to play on the idea that skill in arithmetic is evidence of particular intelligence.

Jiggs tries to explain addition to his niece, and learns his brother-in-law is his brother-in-law.
George McManus’s _Bringing Up Father_, originally run the 12th of April, 1949.

George McManus’s Bringing Up Father (August 11, rerun from April 12, 1949) goes to the old motif of using money to explain addition problems. It’s not a bad strategy, of course: in a way, arithmetic is one of the first abstractions one does, in going from the idea that a hundred of something added to a hundred fifty of something will yield two hundred fifty of that thing, and it doesn’t matter what that something is: you’ve abstracted out the ideas of “a hundred plus a hundred fifty”. In algebra we start to think about whether we can add together numbers without knowing what one or both of the numbers are — “x plus y” — and later still we look at adding together things that aren’t necessarily numbers.

And back to Saturday Morning Breakfast Cereal (August 13), which has a physicist type building a model of his “lack of dates” based on random walks and, his colleague objects, “only works if we assume you’re an ideal gas molecule”. But models are often built on assumptions that might, taken literally, be nonsensical, like imagining the universe to have exactly three elements in it, supposing that people never act against their maximal long-term economic gain, or — to summon a traditional mathematics/physics joke — assuming a spherical cow. The point of a model is to capture some interesting behavior, and avoid the complicating factors that can’t be dealt with precisely or which don’t relate to the behavior being studied. Choosing how to simplify is the skill and art that earns mathematicians the big money.

And then for August 16, Saturday Morning Breakfast Cereal does a binary numbers joke. I confess my skepticism that there are any good alternate-base-number jokes, but you might like them.

Reading the Comics, July 24, 2014: Math Is Just Hard Stuff, Right? Edition


Maybe there is no pattern to how Comic Strip Master Command directs the making of mathematics-themed comic strips. It hasn’t quite been a week since I had enough to gather up again. But it’s clearly the summertime anyway; the most common theme this time seems to be just that mathematics is some hard stuff, without digging much into particular subjects. I can work with that.

Pab Sungenis’s The New Adventures of Queen Victoria (July 19) brings in Erwin Schrödinger and his in-strip cat Barfly for a knock-knock joke about proof, with Andrew Wiles’s name dropped probably because he’s the only person who’s gotten to be famous for a mathematical proof. Wiles certainly deserves fame for proving Fermat’s Last Theorem and opening up what I understand to be a useful new field for mathematical research (Fermat’s Last Theorem by itself is nice but unimportant; the tools developed to prove it, though, that’s worthwhile), but remembering only Wiles does slight Richard Taylor, whose help Wiles needed to close a flaw in his proof.

Incidentally I don’t know why the cat is named Barfly. It has the feel to me of a name that was a punchline for one strip and then Sungenis felt stuck with it. As Thomas Dye of the web comic Newshounds said, “Joke names’ll kill you”. (I’m inclined to think that funny names can work, as the Marx Brothers, Fred Allen, and Vic and Sade did well with them, but they have to be a less demanding kind of funny.)

John Deering’s Strange Brew (July 19) uses a panel full of mathematical symbols scrawled out as the representation of “this is something really hard being worked out”. I suppose this one could also be filed under “rocket science themed comics”, but it comes from almost the first problem of mathematical physics: if you shoot something straight up, how long will it take to fall back down? The faster the thing starts up, the longer it takes to fall back, until at some speed — the escape velocity — it never comes back. This is because the size of the gravitational attraction between two things decreases as they get farther apart. At or above the escape velocity, the thing has enough speed that all the pulling of gravity, from the planet or moon or whatever you’re escaping from, will not suffice to slow the thing down to a stop and make it fall back down.

The escape velocity depends on the mass of the planet or moon or sun or galaxy or whatever you’re escaping from, of course, and how close to the surface (or center) you start from. It also assumes you’re talking about the speed when the thing starts flying away, that is, that the thing doesn’t fire rockets or get a speed boost by flying past another planet or anything like that. And things don’t have to reach the escape velocity to be useful. Nothing that’s in Earth orbit has reached the Earth’s escape velocity, for example. I suppose that last case is akin to how you can still get some stuff done without getting out of the recliner.
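For anyone who wants numbers, here is a rough sketch in Python. It uses the textbook Newtonian formulas for a lone spherical body: escape speed is the square root of 2GM/r, and the speed of a circular orbit at the same distance is the square root of GM/r. The constants are rounded reference values, the function names are my own, and the sketch ignores air resistance and everything else that makes real rocketry hard.

```python
import math

G = 6.674e-11           # gravitational constant, m^3 kg^-1 s^-2 (rounded)
EARTH_MASS = 5.972e24   # kg (rounded)
EARTH_RADIUS = 6.371e6  # mean radius in m (rounded)

def escape_speed(mass, distance):
    """Speed needed at `distance` from the center to never fall back."""
    return math.sqrt(2 * G * mass / distance)

def circular_orbit_speed(mass, distance):
    """Speed of a circular orbit at `distance` from the center."""
    return math.sqrt(G * mass / distance)

v_escape = escape_speed(EARTH_MASS, EARTH_RADIUS)
v_orbit = circular_orbit_speed(EARTH_MASS, EARTH_RADIUS)

# Roughly 11.2 km/s to escape and 7.9 km/s to orbit, starting at the surface.
print(f"escape: {v_escape / 1000:.1f} km/s, orbit: {v_orbit / 1000:.1f} km/s")
```

The orbital speed is always the escape speed divided by the square root of two, at whatever distance you pick. So a satellite has done about 70 percent of the work of leaving and is content to stay in the recliner.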

Mel Henze’s Gentle Creatures (July 21) uses mathematics as the standard for proving intelligence exists. I’ve got a vested interest in supporting that proposition, but I can’t bring myself to say more than that it shows a particular kind of intelligence exists. I appreciate the equation of the final panel, though, as it can be pretty well generalized.

To disguise a sports venue it’s labelled “Math Arena”, with “lectures on the actual odds of beating the casino”.
Bill Holbrook’s _Safe Havens_ for the 22nd of July, 2014.

Bill Holbrook’s Safe Havens (July 22) plays on mathematics’ reputation of being not very much a crowd-pleasing activity. That’s all right, although I think Holbrook makes a mistake by having the arena claim to offer a “lecture on the actual odds of beating the casino”, since the mathematics of gambling is just the sort of mathematics I think would draw a crowd. Probability enjoys a particular sweet spot for popular treatment: many problems don’t require great amounts of background to understand, and have results that are surprising, but which have reasons that are easy to follow and don’t require sophisticated arguments, and are about problems that are easy to imagine or easy to find interesting: cards being drawn, dice being rolled, coincidences being found, or secrets being revealed. I understand Holbrook’s editorial cartoon-type point behind the lecture notice he put up, but the venue would have better scared off audiences if it offered a lecture on, say, “Chromatic polynomials for rigidly achiral graphs: new work on Yamada’s invariant”. I’m not sure I could even explain that title in 1200 words.

Missy Meyer’s Holiday Doodles (July 22) reveals to me that apparently the 22nd of July was “Casual Pi Day”. Yeah, I suppose that passes. I didn’t see much about it in my Twitter feed, but maybe I need some more acquaintances who don’t write dates American-fashion.

Thom Bluemel’s Birdbrains (July 24) again uses mathematics — particularly, Calculus — as not just the marker for intelligence but also as The Thing which will decide whether a kid goes on to success in life. I think the dolphin (I guess it’s a dolphin?) parent is being particularly horrible here, as it’s not as if a “B+” is in any way a grade to be ashamed of, and telling kids it is either drives them to give up on caring about grades, or makes them send whiny e-mails to their instructors about how they need this grade and don’t understand why they can’t just do some make-up work for it. Anyway, it makes the kid miserable, it makes the kid’s teachers or professors miserable, and for crying out loud, it’s a B+.

(I’m also not sure whether a dolphin would consider a career at Sea World success in life, but that’s a separate and very sad issue.)