## Quintile.

Why is there statistics?

There are many reasons statistics got organized as a field of study, mostly in the late 19th and early 20th centuries. For the most part they reflect wanting to be able to say something about big collections of data. People can only keep track of so much information at once. Even if we could keep track of more information, we’re usually interested in relationships between pieces of data. When there’s enough data there are so many possible relationships that we can’t see what’s interesting.

One of the things statistics gives us is a way of representing lots of data with fewer numbers. We trust there’ll be few enough numbers we can understand them all simultaneously, and so understand something about the whole data.

Quintiles are one of the tools we have. They’re a lesser tool, I admit, but that makes them sound more exotic. They’re descriptions of how the values of a set of data are distributed. Distributions are interesting. They tell us what kinds of values are likely and which are rare. They tell us also how variable the data is, or how reliably we are measuring data. These are things we often want to know: what is normal for the thing we’re measuring, and what’s a normal range?

We get quintiles from imagining the data set placed in ascending order. There’s some value that one-fifth of the data points are smaller than, and four-fifths are greater than. That’s your first quintile. Suppose we had the values 269, 444, 525, 745, and 1284 as our data set. The first quintile would be the arithmetic mean of the 269 and 444, that is, 356.5.

The second quintile is some value that two-fifths of your data points are smaller than, and that three-fifths are greater than. With the data set we started with, that would be the mean of 444 and 525, or 484.5.

The third quintile is a value that three-fifths of the data set is less than, and two-fifths greater than; in this case, that’s 635.

And the fourth quintile is a value that four-fifths of the data set is less than, and one-fifth greater than. That’s the mean of 745 and 1284, or 1014.5.
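The arithmetic above can be sketched in a few lines of Python. This is only a minimal sketch of the midpoint rule used in this example. The function name `midpoint_quintiles` is made up for illustration, and it assumes the number of data points is a multiple of five, so that each cut falls neatly between two neighboring values.

```python
def midpoint_quintiles(values):
    """Return the four quintile cut points using the midpoint rule.

    Assumption (not a general definition): the count of values is a
    positive multiple of 5, so each cut lands exactly between two
    neighboring values in the sorted list.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or n % 5 != 0:
        raise ValueError("this sketch needs a count that is a multiple of 5")
    cuts = []
    for k in range(1, 5):
        below = k * n // 5  # how many values sit below the k-th cut
        # Average the last value below the cut and the first value above it.
        cuts.append((ordered[below - 1] + ordered[below]) / 2)
    return cuts


word_counts = [269, 444, 525, 745, 1284]
quintiles = midpoint_quintiles(word_counts)
print(quintiles)  # [356.5, 484.5, 635.0, 1014.5]
```

Statistics packages often use other interpolation rules for small data sets, so a library routine may report somewhat different quintiles for the same five numbers.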

From looking at the quintiles we can say … well, not much, because this is a silly made-up problem that demonstrates how quintiles are calculated rather than why we’d want to do anything with them. At least the numbers come from real data. They’re the word counts of my first five A-to-Z definitions. But the existence of the quintiles at 356.5, 484.5, 635, and 1014.5, along with the minimum and maximum data points at 269 and 1284, tells us something. Mostly that numbers are bunched up in the three and four hundreds, but there could be some weird high numbers. If we had a bigger data set the results would be less obvious.

If the calculating of quintiles sounds much like the way we work out the median, that’s because it is. The median is the value that half the data is less than, and half the data is greater than. There are other ways of breaking down distributions. The first quartile is the value one-quarter of the data is less than. The second quartile is a value two-quarters of the data is less than (so, yes, that’s the median all over again). The third quartile is a value three-quarters of the data is less than.

Percentiles are another variation on this. The (say) 26th percentile is a value that 26 percent, or 26 hundredths, of the data is less than. The 72nd percentile is a value greater than 72 percent of the data.
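If you'd rather not do the sorting and cutting by hand, Python's standard `statistics` module (version 3.8 or later) has a `quantiles` function that splits data into any number of equal-probability groups. Here is a rough sketch using the same five word counts. The library interpolates between data points by its own rule, so its cut points need not match a hand calculation exactly, and I've left most of the outputs unstated for that reason.

```python
import statistics

word_counts = [269, 444, 525, 745, 1284]

# The median: half the data below, half above.
median = statistics.median(word_counts)

# n=4 asks for quartiles, which means three cut points.
quartiles = statistics.quantiles(word_counts, n=4)

# n=100 asks for percentiles, which means 99 cut points.
# Index 25 holds the 26th percentile, index 71 the 72nd.
percentiles = statistics.quantiles(word_counts, n=100)

print(median)                      # 525
print(quartiles[1] == median)      # the second quartile is the median again
print(percentiles[25], percentiles[71])
```

With only five data points the percentiles are not very meaningful; the point is just that median, quartiles, quintiles, and percentiles are all the same idea with a different number of slices.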

Are quintiles useful? Well, that’s a loaded question. They are used less than quartiles are. And I’m not sure knowing them is better than looking at a spreadsheet’s plot of the data. A plot of the data with the quintiles, or quartiles if you prefer, drawn in is better than either separately. But these are among the tools we have to tell what data values are likely, and how tightly bunched-up they are.

## Reading The Comics, December 22, 2015: National Mathematics Day Edition

It was a busy week — well, it’s a season for busy weeks, after all — which is why the mathematics comics pile grew so large before I could do anything about it this time around. I’m not sure which I’d pick as my favorite; the Truth Facts tickles me by playing symbols up for confusion and ambiguity, but Quincy is surely the best-drawn of this collection, and good comic strip art deserves attention. Happily that’s a vintage strip from King Features so I feel comfortable including the comic strip for you to see easily.

Tony Murphy’s It’s All About You (December 15), a comic strip about people not being quite couples, tells a “what happens in Vegas” joke themed to mathematics. The particular topic — a “seminar on gap unification theory” — is something that might actually be a mathematics department specialty. The topic appears in number theory, and particularly in the field of partitions, the study of ways to subdivide collections of discrete things. At this point the subject starts getting specialized enough I can’t say very much intelligible about it; apparently there’s a way of studying these divisions by looking at the distances (the gaps) between where divisions are made (the partitions), but my attempts to find a clear explanation for this all turn up papers in number theory journals that I haven’t got access to and that, I confess, would take me a long while to understand. If anyone from the number theory group wanted to explain things I’d be glad to offer the space.