My All 2020 Mathematics A to Z: Statistics


I owe Mr Wu, author of the Singapore Maths Tuition blog, thanks for another topic for this A-to-Z. Statistics is a big field of mathematics, and so I won’t try to give you a course’s worth in 1500 words. But I have to start with a question. I seem to have ended at two thousand words.

Color cartoon illustration of a coati in a beret and neckerchief, holding up a director's megaphone and looking over the Hollywood hills. The megaphone has the symbols + x (division obelus) and = on it. The Hollywood sign is, instead, the letters MATHEMATICS. In the background are spotlights, with several of them crossing so as to make the letters A and Z; one leg of the spotlights has 'TO' in it, so the art reads out, subtly, 'Mathematics A to Z'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Statistics.

Is statistics mathematics?

The answer seems obvious at first. Look at a statistics textbook. It’s full of algebra. And graphs of great sloped mounds. There’s tables full of four-digit numbers in back. The first couple chapters are about probability. They’re full of questions about rolling dice and dealing cards and guessing whether the sibling who just entered is the younger.

But then, why does Rutgers University have a Department of Mathematics and also a Department of Statistics? And considered so distinct as to have an interdisciplinary mathematics-and-statistics track? It’s not an idiosyncrasy of Rutgers. Many schools have the same division between mathematics and statistics. Some join them into a Department of Mathematics and Statistics. But the name hints at something just different about the field. Not too different, though. Physics and Chemistry and important threads of Economics and History are full of mathematics. But you never see a Department of Mathematics and History.

Thinking of the field’s history, though, and its use, tells us more. Some of the earliest work we now recognize as statistics was Arab mathematicians deciphering messages. This cryptanalysis is the observation that (in English) a three-letter word is very likely to be ‘the’, mildly likely to be ‘one’, and not likely to be ‘pyx’. A more modern forerunner is the Republic of Venice supposedly calculating that war with Milan would not be worth the winning. Or the gatherings of mortality tables, recording how many people of what age can be expected to die any year, and what from. (Mortality tables are another of Edmond Halley’s claims to fame, though it won’t displace his comet work.) Florence Nightingale’s charts explaining how more soldiers die of disease than in fighting the Crimean War. William Sealy Gosset sharing sample-testing methods developed at the Guinness brewery.

You see a difference in kind to a mathematical question like finding a square with the same area as this trapezoid. It’s not that mathematics is not practical; it’s always been. And it’s not that statistics lacks abstraction and pure mathematics content. But statistics wears practicality in a way that number theory won’t.

Practical about what? History and etymology tip us off. The early uses of things we now see as statistics are about things of interest to the State. Decoding messages. Counting the population. Following — in the study of annuities — the flow of money between peoples. With the industrial revolution, statistics sneaks into the factory. To have an economy of scale you need a reliable product. How do you know whether the product is reliable, without testing every piece? How can you test every beer brewed without drinking it all?

One great leg of statistics — it’s tempting to call it the first leg, but the history is not so neat as to make that work — is descriptive. This gives us things like mean and median and mode and standard deviation and quartiles and quintiles. These try to let us represent more data than we can really understand in a few words. We lose information in doing so. But if we are careful to remember the difference between the descriptive statistics we have and the original population? (nb, a word of the State) We might not do ourselves much harm.
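To make the descriptive leg concrete, here is a minimal sketch using Python's standard `statistics` module. The list of ages is a made-up sample, invented only for illustration; nothing about it comes from a real population. It shows how eleven numbers collapse into a handful of summaries, and so how much gets left out.

```python
import statistics

# Hypothetical sample: the ages of eleven people in a made-up population.
ages = [23, 25, 25, 31, 34, 38, 41, 47, 52, 60, 85]

summary = {
    "mean": statistics.mean(ages),          # arithmetic average
    "median": statistics.median(ages),      # middle value of the sorted data
    "mode": statistics.mode(ages),          # most common value
    "stdev": statistics.stdev(ages),        # sample standard deviation
    "quartiles": statistics.quantiles(ages, n=4),  # three cut points
    "quintiles": statistics.quantiles(ages, n=5),  # four cut points
}

for name, value in summary.items():
    print(name, value)
```

The mean here is pulled upward by the one 85-year-old while the median is not, which is the kind of difference between the summary and the original population worth keeping in mind.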

Another great leg is inferential statistics. This uses tools with names like z-score and the Student t distribution. And talks about things like p-values and confidence intervals. Terms like correlation and regression. This is about looking for causes in complex scenarios. We want to believe there is a cause to, say, a person’s lung cancer. But there is no tracking down what that is; there are too many things that could start a cancer, and too many of them will go unobserved. But we can notice that people who smoke have lung cancer more often than those who don’t. We can’t say why a person recovered from the influenza in five days. But we can say people who were vaccinated got fewer influenzas, and ones that passed quicker, than those who did not. We can get the dire warning that “correlation is not causation”, uttered by people who don’t like what the correlation suggests may be a cause.

Also by people being honest, though. In the 1980s geologists wondered if the sun might have a not-yet-noticed companion star. Its orbit would explain an apparent periodicity in meteor bombardments of the Earth. But completely random bombardments would produce apparent periodicity sometimes. It’s much the same way trees in a forest will sometimes seem to line up. Or imagine finding there is a neighborhood in your city with a high number of arrests. Is this because it has the highest rate of street crime? Or is the rate of street crime the same as any other spot and there are simply more cops here? But then why are there more cops to be found here? Perhaps they’re attracted by the neighborhood’s reputation for high crime. It is difficult to see through randomness, to untangle complex causes, and to root out biases.

The tools of statistics, as we recognize them, largely came together in the 19th and early 20th century. Adolphe Quetelet, a Flemish scientist, set out much early work, including introducing the concept of the “average man”, representative of societal matters. He studied the crime statistics of Paris for five years and noticed how regular the numbers were. The implication, to Quetelet, was that crime is a societal problem. It’s something we can control by mindfully organizing society, without infringing anyone’s autonomy. Put like that, the study of statistics seems an obvious and indisputable good, a way for governments to better serve their public.

So here is the dispute. It’s something mathematicians understate when sharing the stories of important pioneers like Francis Galton or Karl Pearson. They were eugenicists. Part of what drove their interest in studying human populations was to find out which populations were the best. And how to help them overcome their more-populous lessers.

I don’t have the space, or depth of knowledge, to fully recount the 19th century’s racial politics, popular scientific understanding, and international relations. Please accept this as a loose cartoon of the situation. Do not forget the full story is more complex and more ambiguous than I write.

One of the 19th century’s greatest scientific discoveries was evolution. That populations change in time, in size and in characteristics, even budding off new species, is breathtaking. Another of the great discoveries was entropy. This incorporated into science the nostalgic romantic notion that things used to be better. I write that figuratively, but to express the way the notion is felt.

There are implications. If the Sun itself will someday wear out, how long can the Tories last? It was easy for the aristocracy to feel that everything was quite excellent as it was now and dread the inevitable change. This is true for the aristocracy of any country, although the United Kingdom had a special position here. The United Kingdom enjoyed a privileged position among the Great Powers and the Imperial Powers through the 19th century. Note we still call it the Victorian era, when Louis Napoleon or Giuseppe Garibaldi or Otto von Bismarck are more significant European figures. (Granting Victoria had the longer presence on the world stage; “the 19th century” had a longer presence still.) But it could rarely feel secure, always aware that France or Germany or Russia was ready to displace it.

And even internally: if Darwin was right and reproductive success is all that matters in the long run, what does it say that so many poor people breed so much? How long could the world hold good things? Would the eternal famines and poverty of the “overpopulated” Irish or Indian colonial populations become all that was left? During the Crimean War, the British military found a shocking number of recruits from the cities were physically unfit for service. In the 1850s this was only an inconvenience; there were plenty of strong young farm workers to recruit. But the British population was already majority-urban, and becoming more so. What would happen by 1880? 1910?

One can follow the reasoning, even if we freeze at the racist conclusions. And we have the advantage of a century-plus hindsight. We can see how the eugenic attitude leads quickly to horrors. And also that it turns out “overpopulated” Ireland and India stopped having famines once they evicted their colonizers.

Does this origin of statistics matter? The utility of a hammer does not depend on the moral standing of its maker. The Central Limit Theorem has an even stronger pretense to objectivity. Why not build as best we can with the crooked timbers of mathematics?

It is in my lifetime that a popular racist book claimed science proved that Black people were intellectual inferiors to White people. This on the basis of supposedly significant differences in the populations’ IQ scores. It proposed that racism wasn’t a thing, or at least nothing to do anything about. It would be mere “realism”. Intelligence Quotients, incidentally, are another idea we can trace to Francis Galton. But an IQ test is not objective. The best we can say is it might be standardized. This says nothing about the biases built into the test, though, or of the people evaluating the results.

So what if some publisher 25 years ago got suckered into publishing a bad book? And racist chumps bought it because they liked its conclusion?

The past is never fully past. In the modern environment of surveillance capitalism we have abundant data on any person. We have abundant computing power. We can find many correlations. This gives people wild ideas for “artificial intelligence”. Something to make predictions. Who will lose a job soon? Who will get sick, and from what? Who will commit a crime? Who will fail their A-levels? At least, who is most likely to?

These seem like answerable questions. One can imagine an algorithm that would answer them fairly. And make for a better world, one which concentrates support around the people most likely to need it. If we were wise, we would ask our friends in the philosophy department about how to do this. Or we might just plunge ahead and trust that since an algorithm runs automatically it must be fair. Our friends in the philosophy department might have some advice there too.

Consider, for example, the body mass index. It was developed by our friend Adolphe Quetelet, as he tried to understand the kinds of bodies in the population. It is now used to judge whether someone is overweight. Weight is treated as though it were a greater threat to health than actual illnesses are. Your diagnosis for the same condition with the same symptoms will be different — and on average worse — if your number says 25.2 rather than 24.8.

We must do better. We can hope that learning how tools were used to injure people will teach us to use them better, to reduce or to avoid harm. We must fight our tendency to latch on to simple ideas as the things we can understand in the world. We must not mistake the greater understanding we have from the statistics for complete understanding. To do this we must have empathy, and we must have humility, and we must understand what we have done badly in the past. We must catch ourselves when we repeat the patterns that brought us to past evils. We must do more than only calculate.


This and the rest of the 2020 A-to-Z essays should be at this link. All the essays from every A-to-Z series should be gathered at this link. And I am looking for V, W, and X topics to write about. Thanks for your thoughts, and thank you for reading.

Reading the Comics, July 13, 2016: Catching Up On Vacation Week Edition


I confess I spent the last week on vacation, away from home and without the time to write about the comics. And it was another of those curiously busy weeks that happens when it’s inconvenient. I’ll try to get caught up ahead of the weekend. No promises.

Art and Chip Samson’s The Born Loser for the 10th talks about the statistics of body measurements. Measuring bodies is one of the foundations of modern statistics. Adolphe Quetelet, in the mid-19th century, found a rough relationship between body mass and the square of a person’s height, used today as the base for the body mass index. Francis Galton spent much of the late 19th century developing the tools of statistics and how they might be used to understand human populations with work I will describe as “problematic” because I don’t have the time to get into how much trouble the right mind at the right idea can be.

No attempt to measure people’s health with a few simple measurements and derived quantities can be fully successful. Health is too complicated a thing for one or two or even ten quantities to describe. Measures like height-to-waist ratios and body mass indices and the like should be understood as filters, the way temperature and blood pressure are. If one or more of these measurements are in dangerous ranges there’s reason to think there’s a health problem worth investigating here. It doesn’t mean there is; it means there’s reason to think it’s worth spending resources on tests that are more expensive in time and money and energy. And similarly just because all the simple numbers are fine doesn’t mean someone is perfectly healthy. But it suggests that the person is more likely all right than not. They’re guides to setting priorities, easy to understand and requiring no training to use. They’re not a replacement for thought; no guides are.
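As a rough sketch of what such a filter looks like in practice, here is the body mass index computed as mass divided by the square of height. The units (kilograms and meters), the cutoff of 25, the function names, and the two example people are all assumptions made for illustration, not anything the comic or Quetelet specifies.

```python
def body_mass_index(mass_kg: float, height_m: float) -> float:
    """Quetelet's ratio: body mass divided by the square of height."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return mass_kg / height_m ** 2


# Conventional cutoff above which the index gets labelled "overweight".
# Assumed here for illustration only.
OVERWEIGHT_CUTOFF = 25.0


def worth_a_closer_look(mass_kg: float, height_m: float) -> bool:
    """A filter, not a diagnosis: True only suggests further tests."""
    return body_mass_index(mass_kg, height_m) >= OVERWEIGHT_CUTOFF


# Two hypothetical people of the same height, 1.2 kg apart.
person_a = body_mass_index(76.0, 1.75)
person_b = body_mass_index(77.2, 1.75)
print(round(person_a, 1), round(person_b, 1))
```

The two people land on opposite sides of the cutoff despite being nearly identical, which is why the number works better as a prompt for further thought than as a verdict.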

Jeff Harris’s Shortcuts educational panel for the 10th is about zero. It’s got a mix of facts and trivia and puzzles with a few jokes on the side.

I don’t have a strong reason to discuss Ashleigh Brilliant’s Pot-Shots rerun for the 11th. It only mentions odds in a way that doesn’t open up to discussing probability. But I do like Brilliant’s “Embrace-the-Doom” tone and I want to share that when I can.

John Hambrock’s The Brilliant Mind of Edison Lee for the 13th of July riffs on the world’s leading exporter of statistics, baseball. Organized baseball has always been a statistics-keeping game. The Olympic Ball Club of Philadelphia’s 1837 rules set out what statistics to keep. I’m not sure why the game is so statistics-friendly. It must be in part that the game lends itself to representation as a series of identical events — pitcher throws ball at batter, while runners wait on up to three bases — with so many different outcomes.

'Edison, let's discuss stats while we wait for the opening pitch.' 'Statistics? I have plenty of those. A hot dog has 400 calories and costs five dollars. A 12-ounce root beer has 38 grams of sugar.' 'I mean *player* stats.' 'Oh'. (To his grandfather instead) 'Did you know the average wait time to buy nachos is eight minutes and six seconds?'
John Hambrock’s The Brilliant Mind of Edison Lee for the 13th of July, 2016. Properly speaking, the waiting time to buy nachos isn’t a player statistic, but I guess Edison Lee did choose to stop talking to his father for it. Which is strange considering his father’s totally natural and human-like word emission ‘Edison, let’s discuss stats while we wait for the opening pitch’.

Alan Schwarz’s book The Numbers Game: Baseball’s Lifelong Fascination With Statistics describes much of the sport’s statistics and record-keeping history. The things recorded have varied over time, with the list of things mostly growing. The number of statistics kept has also tended to grow. Sometimes they get dropped. Runs Batted In were first calculated in 1880, then dropped as an inherently unfair statistic to keep; leadoff hitters were necessarily cheated of chances to get someone else home. How people’s idea of what is worth measuring changes is interesting. It speaks to how we change the ways we look at the same event.

Dana Summers’s Bound And Gagged for the 13th uses the old joke about computers being abacuses and the like. I suppose it’s properly true that anything you could do on a real computer could be done on the abacus, just, with a lot more time and manual labor involved. At some point it’s not worth it, though.

Nate Fakes’s Break of Day for the 13th uses the whiteboard full of mathematics to denote intelligence. Cute birds, though. But any animal in eyeglasses looks good. Lab coats are almost as good as eyeglasses.

LERBE ( O O - O - ), GIRDI ( O O O - - ), TACNAV ( O - O - O - ), ULDNOA ( O O O - O - ). When it came to measuring the Earth's circumference, there was a ( - - - - - - - - ) ( - - - - - ).
David L Hoyt and Jeff Knurek’s Jumble for the 13th of July, 2016. The link will be gone sometime after mid-August I figure. I hadn’t thought of a student being baffled by using the same formula for an orange and a planet’s circumference because of their enormous difference in size. It feels authentic, though.

David L Hoyt and Jeff Knurek’s Jumble for the 13th is about one of geometry’s great applications, measuring how large the Earth is. It’s something that can be worked out through ingenuity and a bit of luck. Once you have that, some clever argument lets you work out the distance to the Moon, and its size. And that will let you work out the distance to the Sun, and its size. The Ancient Greeks had worked out all of this reasoning. But they had to make observations with the unaided eye, without good timekeeping — time and position are conjoined ideas — and without photographs or other instantly-made permanent records. So their numbers are, to our eyes, lousy. No matter. The reasoning is brilliant and deserves respect.

The Least Pleasant Thing About WiiFit


We got a WiiFit, and a Wii, for Christmas in 2008, and for me, at that time, it was just what I needed to lose an extraordinary amount of weight. As part of the daily weighing-in routine it offers a set of challenges to your mental and physical agility. This is a pair drawn from, in the original release, five exercises. One is the Balance Test, measuring whether you can shift a certain percentage of your weight to the left or right and hold it for three seconds; the balance board, used for each of these tests, measures how much of your weight is where, left or right, front or back of the board. One is the Steadiness Test, about how still you can stand for thirty seconds, which is trickier than it looks. (Breathe slowly, is my advice.) One is the Single Leg Balance Test, trying to keep your balance within a certain range of centered for thirty seconds (and the range narrows at ten, twenty, and twenty-five seconds in). One, the most fun, is the Agility Test, in which you swing your body forward and back, left and right to hit as many targets as possible. And the most agonizing of them is the Walking Test, which is simply to take twenty footfalls, left and right, and which reports back how incredibly far from balanced your walk is. The game almost shakes its head and sighs, at least, at how imbalanced I am.
