## A Leap Day 2016 Mathematics A To Z: Z-score

And we come to the last of the Leap Day 2016 Mathematics A To Z series! Z is a richer letter than x or y, but it’s still not so rich as you might expect. This is why I’m using a term that everybody figured I’d use the last time around, when I went with z-transforms instead.

## Z-Score

You get an exam back. You get an 83. Did you do well?

Hard to say. It depends on so much. If you expected to barely pass and maybe get as high as a 70, then you’ve done well. If you took the Preliminary SAT, with a composite score that ranges from 60 to 240, an 83 is catastrophic. If the instructor gave an easy test, you maybe scored right in the middle of the pack. If the instructor sees tests as a way to weed out the undeserving, you maybe had the best score in the class. It’s impossible to say whether you did well without context.

The z-score is a way to provide that context. It draws that context by comparing a single score to all the other values. And underlying that comparison is the assumption that whatever it is we’re measuring fits a pattern. Usually it does. The pattern we suppose stuff we measure will fit is the Normal Distribution. Sometimes it’s called the Standard Distribution. Sometimes it’s called the Standard Normal Distribution, so that you know we mean business. Sometimes it’s called the Gaussian Distribution. I wouldn’t rule out someone writing the Gaussian Normal Distribution. It’s also called the bell curve distribution. As the names suggest by throwing around “normal” and “standard” so much, it shows up everywhere.

A normal distribution means that whatever it is we’re measuring follows some rules. One is that there’s a well-defined arithmetic mean of all the possible results. And that arithmetic mean is the most common value to turn up. That’s called the mode. Also, this arithmetic mean, and mode, is also the median value. There’s as many data points less than it as there are greater than it. Most of the data values are pretty close to the mean/mode/median value. There’s some more as you get farther from this mean. But the number of data values far away from it is pretty tiny. You can, in principle, get a value that’s way far away from the mean, but it’s unlikely.

We call this standard because it might as well be. Measure anything that varies at all. Draw a chart with the horizontal axis all the values you could measure. The vertical axis is how many times each of those values comes up. It’ll be a standard distribution uncannily often. The standard distribution appears when the thing we measure satisfies some quite common conditions. Almost everything satisfies them, or nearly satisfies them. So we see bell curves so often when we plot how frequently data points come up. It’s easy to forget that not everything is a bell curve.

The normal distribution has a mean, and median, and mode, of 0. It’s tidy that way. And it has a standard deviation of exactly 1. The standard deviation is a way of measuring how spread out the bell curve is. About 95 percent of all observed results are less than two standard deviations away from the mean. About 99 percent of all observed results are less than three standard deviations away. 99.9997 percent of all observed results are less than six standard deviations away. That last might sound familiar to those who’ve worked in manufacturing. At least it does once you know that the Greek letter sigma is the common shorthand for a standard deviation. “Six Sigma” is a quality-control approach. It’s meant to make sure one understands all the factors that influence a product and controls them. This is so the product falls outside the design specifications only 0.0003 percent of the time. (That figure follows the Six Sigma convention of letting the mean drift by a standard deviation and a half. A perfectly normal distribution puts even less than that outside six standard deviations.)
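Those “about” figures are easy to check. Here is a minimal sketch in Python (my choice of language; the helper name `share_within` is made up for this example). It uses the fact that the share of a normal distribution within k standard deviations of the mean is erf(k/√2).

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution less than k standard deviations from the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.2f} percent")
```

This prints 68.27, 95.45, and 99.73 percent, which is where “about two-thirds”, “about 95 percent”, and “about 99 percent” come from.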

This is the normal distribution. It has a standard deviation of 1 and a mean of 0, by definition. And then people using statistics go and muddle the definition. It is always so, with the stuff people actually use. Forgive them. It doesn’t really change the shape of the curve if we scale it, so that the standard deviation is, say, two, or ten, or π, or any positive number. It just changes where the tick marks are on the x-axis of our plot. And it doesn’t really change the shape of the curve if we translate it, adding (or subtracting) some number to it. That makes the mean, oh, 80. Or -15. Or $e^{\pi}$. Or some other number. That just changes what value we write underneath the tick marks on the plot’s x-axis. We can find a scaling and translation of the normal distribution that fits whatever data we’re observing.

When we find the z-score for a particular data point we’re undoing this translation and scaling. We figure out what number on the standard distribution maps onto the original data set’s value. About two-thirds of all data points are going to have z-scores between -1 and 1. About nineteen out of twenty will have z-scores between -2 and 2. About 99 out of 100 will have z-scores between -3 and 3. If we don’t see this, and we have a lot of data points, then that suggests our data isn’t normally distributed.
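To see the undoing in action, here is a rough sketch in Python. The data is made up: ten thousand pseudo-random draws from a bell curve I picked to have mean 80 and standard deviation 5. The helper `share_between` is a name invented for this example. Each z-score is the data point minus the mean, divided by the standard deviation.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
# Made-up data: 10,000 draws from a bell curve with mean 80, standard deviation 5.
scores = [rng.gauss(80, 5) for _ in range(10_000)]

mean = statistics.fmean(scores)
spread = statistics.pstdev(scores)

# Undo the translation (subtract the mean) and the scaling (divide by the spread).
z_scores = [(x - mean) / spread for x in scores]

def share_between(zs, limit):
    """Fraction of z-scores strictly between -limit and +limit."""
    return sum(1 for z in zs if -limit < z < limit) / len(zs)

for limit in (1, 2, 3):
    print(limit, round(share_between(z_scores, limit), 3))
```

The three shares should land near two-thirds, nineteen-twentieths, and 99-hundredths. Feed in data that isn’t bell-shaped and they generally won’t.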

I don’t know why the letter ‘z’ is used for this instead of, say, ‘y’ or ‘w’ or something else. ‘x’ is out, I imagine, because we use that for the original data. And ‘y’ is a natural pick for a second measured variable. ‘z’, I expect, is just far enough from ‘x’ it isn’t needed for some more urgent duty, while being close enough to ‘x’ to suggest it’s some measured thing.

The z-score gives us a way to compare how interesting or unusual scores are. If the exam on which we got an 83 has a mean of, say, 74, and a standard deviation of 5, then we can say this 83 is a pretty solid score. If it has a mean of 78 and a standard deviation of 10, then the score is better-than-average but not exceptional. If the exam has a mean of 70 and a standard deviation of 4, then the score is fantastic. We get to meaningfully compare scores from the measurements of different things. And so it’s one of the tools with which statisticians build their work.
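The three hypothetical exams above work out like this. It’s a small Python sketch; the function name `z_score` is my own, and it just subtracts the mean and divides by the standard deviation.

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """How many standard deviations `value` sits above (or below) the mean."""
    return (value - mean) / std_dev

# (mean, standard deviation) for the three imagined exams
exams = [(74, 5), (78, 10), (70, 4)]
for mean, std_dev in exams:
    print(f"mean {mean}, standard deviation {std_dev}: z = {z_score(83, mean, std_dev):.2f}")
```

That gives z-scores of 1.80, 0.50, and 3.25: solid, better than average, and fantastic, in that order.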

## Reading the Comics, January 4, 2016: An Easy New Year Edition

It looks like Comic Strip Master Command wanted to give me a nice, easy start of the year. The first group of mathematics-themed comic strips doesn’t get into deep waters and so could be written up with just a few moments. I foiled them by not having even a few moments to write things up, so that I’m behind on 2016 already. I’m sure I kind of win.

Dan Thompson’s Brevity for the 1st of January starts us off with icons of counting and computing. The abacus, of course, is one of the longest-used tools for computing. The calculator was a useful stopgap between the slide rule and the smart phone. The Count infects numerals with such contagious joy. And the whiteboard is where a lot of good mathematics work gets done. And yes, I noticed the sequence of numbers on the board. The prime numbers are often cited as the sort of message an alien entity would recognize. I suppose it’s likely an intelligence alert enough to pick up messages across space would be able to recognize prime numbers. Whether they’re certain to see them as important building blocks to the ways numbers work, the way we do? I don’t know. But I would expect someone to know the sequence, at least.

Ryan Pagelow’s Buni for New Year’s Day qualifies as the anthropomorphic-numerals joke for this essay.

Scott Hilburn’s The Argyle Sweater for the 2nd of January qualifies as the Roman numerals joke for this essay. It does prompt me to wonder about the way people who used Roman numerals as their primary system thought, though. Obviously, “XCIX red balloons” should be pronounced as “ninety-nine red balloons”. But would someone scan it as “ninety-nine” or would it be read as the characters, “x-c-i-x” and then that converted to a number? I’m not sure I’m expressing the thing I wonder.

Steve Moore’s In The Bleachers for the 4th of January shows a basketball player overthinking the problem of getting a ball in the basket. The overthinking includes a bundle of equations which are all relevant to the problem, though. They’re the kinds of things you get in describing an object tossed up and falling without significant air resistance. I had thought I’d featured this strip — a rerun — before, but it seems not. Moore has used the same kind of joke a couple of other times, though, and he does like getting the equations right where possible.

Justin Boyd’s absurdist Invisible Bread for the 4th of January has Mom clean up a messy hard drive by putting all the 1’s together and all the 0’s together. And, yes, that’s not how data works. We say we represent data, on a computer, with 1’s and 0’s, but those are just names. We need to call them something. They’re in truth — oh, they’re positive or negative electric charges, or magnetic fields pointing one way or another, or they’re switches that are closed or open, or whatever. That’s for the person building the computer to worry about. Our description of what a computer does doesn’t care about the physical manifestation of our data. We could be as right if we say we’re representing data with A’s and purples, or with stop signs and empty cups of tea. What’s important is the pattern, and how likely it is that a 1 will follow a 0, or a 0 will follow a 1. If that sounds reminiscent of my information-theory talk about entropy, well, good: it is. Sweeping all the data into homogenous blocks of 1’s and of 0’s, alas, wipes out the interesting stuff. Information is hidden, somehow, in the ways we line up 1’s and 0’s, whatever we call them.
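For a crude illustration of what the tidying costs, here is a Python sketch. The sixteen-symbol message is made up, and the measure is just one simple choice of mine: the average uncertainty, in bits, about the next symbol given the current one. The function name `next_symbol_entropy` is invented for this example.

```python
import math
from collections import Counter

def next_symbol_entropy(bits: str) -> float:
    """Average uncertainty, in bits, about the next symbol given the current one."""
    pairs = Counter(zip(bits, bits[1:]))   # how often each symbol follows each symbol
    starts = Counter(bits[:-1])            # how often each symbol has a successor
    total = len(bits) - 1
    entropy = 0.0
    for (first, _second), count in pairs.items():
        p_pair = count / total
        p_next_given_first = count / starts[first]
        entropy -= p_pair * math.log2(p_next_given_first)
    return entropy

message = "0110100110010110"        # an arbitrary made-up pattern
tidied = "".join(sorted(message))   # all the 0's swept together, then all the 1's

print(round(next_symbol_entropy(message), 2))
print(round(next_symbol_entropy(tidied), 2))
```

The tidied string scores far lower. Once you’ve seen a 0 you can nearly always guess the next symbol, so each one tells you almost nothing new.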

Steve Boreman’s Little Dog Lost for the 4th of January does a bit of comic wordplay with ones, zeroes, and twos. I like this sort of comic interplay.

And finally, John Deering and John Newcombe saw that Facebook meme about algebra just a few weeks ago, then drew the Zack Hill for the 4th of January.

## z-transform.

The z-transform comes to us from signal processing. The signal we take to be a sequence of numbers, all representing something sampled at uniformly spaced times. The temperature at noon. The power being used, second-by-second. The number of customers in the store, once a month. Anything. The sequence of numbers we take to stretch back into the infinitely great past, and to stretch forward into the infinitely distant future. If it doesn’t, then we pad the sequence with zeroes, or some other safe number that we know means “nothing”. (That’s another classic mathematician’s trick.)

It’s convenient to have a name for this sequence. “a” is a good one. The different sampled values are denoted by an index. $a_0$ represents whatever value we have at the “start” of the sample. That might represent the present. That might represent where sampling began. That might represent just some convenient reference point. It’s the equivalent of mileage marker zero; we have to have something be the start.

$a_1$, $a_2$, $a_3$, and so on are the first, second, third, and so on samples after the reference start. $a_{-1}$, $a_{-2}$, $a_{-3}$, and so on are the first, second, third, and so on samples from before the reference start. That might be the last couple of values before the present.

So for example, suppose the temperatures the last several days were 77, 81, 84, 81, 78. Then we would probably represent this as $a_{-4} = 77$, $a_{-3} = 81$, $a_{-2} = 84$, $a_{-1} = 81$, $a_0 = 78$. We’ll hope this is Fahrenheit or that we are remotely sensing a temperature.

The z-transform of a sequence of numbers is something that looks a lot like a polynomial, based on these numbers. For this five-day temperature sequence the z-transform would be the polynomial $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0$. ($z^1$ is the same as $z$. $z^0$ is the same as the number “1”. I wrote it this way to make the pattern more clear.)

I would not be surprised if you protested that this doesn’t merely look like a polynomial but actually is one. You’re right, of course, for this set, where all our samples are from negative (and zero) indices. If we had positive indices then we’d lose the right to call the transform a polynomial. Suppose we trust our weather forecaster completely, and add in $a_1 = 83$ and $a_2 = 76$. Then the z-transform for this set of data would be $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$. You’d probably agree that’s not a polynomial, although it looks a lot like one.
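If it helps to see the pattern as something you can run, here is a minimal Python sketch. The function name `z_transform` and the dictionary layout (index mapped to sample) are my own choices for illustration. The sample with index n gets multiplied by z raised to the power -n, which is how the index -4 ends up paired with $z^4$.

```python
def z_transform(samples: dict[int, float], z: complex) -> complex:
    """Evaluate the sum of a_n * z**(-n) over the samples we actually have."""
    return sum(value * z ** (-index) for index, value in samples.items())

# The seven temperatures whose transform is written out above, keyed by index.
temperatures = {-4: 77, -3: 81, -2: 84, -1: 81, 0: 78, 1: 83, 2: 76}

print(z_transform(temperatures, 2))    # 2516.5
print(z_transform(temperatures, 1))    # 560.0, just the sum of the samples
print(z_transform(temperatures, 1j))   # a complex-valued z works too
```

The one value to stay away from here is z = 0, since the terms with positive indices divide by z.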

The use of z for these polynomials is basically arbitrary. The main reason to use z instead of x is that we can learn interesting things if we imagine letting z be a complex-valued number. And z carries connotations of “a possibly complex-valued number”, especially if it’s used in ways that suggest we aren’t looking at coordinates in space. It’s not that there’s anything in the symbol x that refuses the possibility of it being complex-valued. It’s just that z appears so often in the study of complex-valued numbers that it reminds a mathematician to think of them.

A sound question you might have is: why do this? And there’s not much advantage in going from a list of temperatures “77, 81, 84, 81, 78, 83, 76” over to a polynomial-like expression $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$.

Where this starts to get useful is when we have an infinitely long sequence of numbers to work with. Yes, it does too. It will often turn out that an interesting sequence transforms into a polynomial that itself is equivalent to some easy-to-work-with function. My little temperature example there won’t do it, no. But consider the sequence that’s zero for all negative indices, and 1 for the zero index and all positive indices. This gives us the polynomial-like structure $\cdots + 0z^2 + 0z^1 + 1 + 1\left(\frac{1}{z}\right)^1 + 1\left(\frac{1}{z}\right)^2 + 1\left(\frac{1}{z}\right)^3 + 1\left(\frac{1}{z}\right)^4 + \cdots$. And that turns out to be the same as $1 \div \left(1 - \left(\frac{1}{z}\right)\right)$, at least for any $z$ bigger than 1 in absolute value. That’s much shorter to write down, at least.
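You don’t have to take the short form on faith. Here is a quick numerical check in Python, with function names of my own invention. It adds up the first couple hundred terms of the series and compares that to the short form, for a few values of z that I picked arbitrarily. All of them are bigger than 1 in absolute value, which is what the infinite sum needs in order to settle down.

```python
def partial_sum(z: complex, terms: int) -> complex:
    """Add up (1/z)**k for k = 0, 1, ..., terms - 1."""
    return sum((1 / z) ** k for k in range(terms))

def closed_form(z: complex) -> complex:
    """The short way of writing the same thing: 1 / (1 - 1/z)."""
    return 1 / (1 - 1 / z)

for z in (2, 3, 1 + 1j):
    print(z, partial_sum(z, 200), closed_form(z))
```

The two columns agree to many decimal places. Try a z smaller than 1 in absolute value and the partial sums run off instead of settling.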

Probably you’ll grant that, but still wonder what the point of doing that is. Remember that we started by thinking of signal processing. A processed signal is a matter of transforming your initial signal. By this we mean multiplying your original signal by something, or adding something to it. For example, suppose we want a five-day running average temperature. This we can find by taking one-fifth today’s temperature, $a_0$, and adding to that one-fifth of yesterday’s temperature, $a_{-1}$, and one-fifth of the day before’s temperature $a_{-2}$, and one-fifth $a_{-3}$, and one-fifth $a_{-4}$.
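As a sketch of that running average, in Python, with a function name and list layout that are just my choices: take the seven temperatures that appear as coefficients in the transform above, oldest first, and weight each of five consecutive days by one-fifth.

```python
def running_average(values: list[float], window: int = 5) -> list[float]:
    """Average each run of `window` consecutive values, oldest run first."""
    weight = 1 / window
    return [
        sum(weight * v for v in values[end - window:end])
        for end in range(window, len(values) + 1)
    ]

# Oldest first; the last two entries are the forecaster's numbers.
temperatures = [77, 81, 84, 81, 78, 83, 76]
print(running_average(temperatures))
```

Up to floating-point fuzz, that gives 80.2 for today, then 81.4 and 80.4 for the two forecast days.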

The effect of processing a signal is equivalent to manipulating its z-transform. By studying properties of the z-transform, such as where its values are zero or where they are imaginary or where they are undefined, we learn things about what the processing is like. We can tell whether the processing is stable — does it keep a small error in the original signal small, or does it magnify it? Does it serve to amplify parts of the signal and not others? Does it dampen unwanted parts of the signal while keeping the main intact?
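Here is one way to watch that equivalence happen, as a Python sketch under my own naming and with an arbitrarily chosen test value of z. It treats the five-day running average as its own little sequence of five weights, each one-fifth. It applies those weights to the temperature signal, with missing days counted as zero. Then it checks that the z-transform of the result is the product of the two z-transforms.

```python
def z_transform(samples, z):
    """Evaluate the sum of a_n * z**(-n) over the samples given."""
    return sum(value * z ** (-index) for index, value in samples.items())

def apply_filter(signal, weights):
    """Output y_n = sum over k of weights[k] * signal[n - k]; missing samples count as zero."""
    output = {}
    for n, a in signal.items():
        for k, w in weights.items():
            output[n + k] = output.get(n + k, 0.0) + w * a
    return output

signal = {-4: 77, -3: 81, -2: 84, -1: 81, 0: 78, 1: 83, 2: 76}
five_day_average = {k: 1 / 5 for k in range(5)}   # one-fifth each of today and the four days before

smoothed = apply_filter(signal, five_day_average)

z = 1.5 + 0.5j   # an arbitrary test point, nothing special about it
left = z_transform(smoothed, z)
right = z_transform(five_day_average, z) * z_transform(signal, z)
print(abs(left - right))   # tiny: rounding error only
```

So the processing shows up in the z-transform as multiplication by one fixed function, the transform of the weights. Studying that function is how one gets at questions like stability.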

We can understand how data will be changed by understanding the z-transform of the way we manipulate it. That z-transform turns a signal-processing idea into a complex-valued function. And we have a lot of tools for studying complex-valued functions. So we become able to say a lot about the processing. And that is what the z-transform gets us.

## A Neat Accident

I wanted to point folks over to a blog post by Rick Wicklin, on the web site for SAS. It’s the company that makes, well, SAS, software designed for data management and analysis.

The incident behind this was an accident, as his daughter spilled a bottle of black nail polish, and it splattered on the wall in an interesting spiral. Dr Wicklin wondered if it might be a logarithmic spiral and gathered data to work out whether it might plausibly be. There’s a nice description for how to go from the messiness of a real world splatter to a clearly defined mathematical problem, and how to try fitting a curve to the messy reality of data.

Curve-fitting real-world data is a challenging field. Curves are always members of families, groups of curves that look similar. For example, circles may have any point as their center and have any radius. Lines may pass through any point you like and be as horizontal or vertical or diagonal as you like. (Yes, a straight line isn’t much of a curve, but it’s too wordy to talk of “line or curve fitting” if you don’t have to. In this context, a line is a kind of curve in the same way a square is a kind of parallelogram.) There are many, many more kinds of curves, parabolas and hyperbolas and cubics and quartics and trigonometric functions and, oh yes, we can add them together, or multiply them, or even compose them (anyone up for the sine of a logarithm?).

So you start with the kind of curve you think your data really is, and try to find the set of parameters that make the curve and the data look like they’re representations of the same thing. The drawing of your curve and the drawing of your data points will never exactly overlap, though. Your data, coming from the real world, will be messy. Some of the nail polish spots will be in the ‘wrong’ place, or it’ll be ambiguous what the ‘real’ location of a point should be. (After all, what is the real location of a spot? Its center? How do you know where the exact center is? What if the spot is a smeared raindrop-shape rather than a circle?)

It’s not just an artistic eye that judges whether the parameters you’ve picked are a good fit. We can quantify how “good” a fit the curve is to the data, and find the parameters that make the best possible, or the best findable, fit. But there is still an artistic eye involved: there are infinitely many imaginable curves. If you start from the wrong kind of curve, you might get a tolerable fit. But it won’t give insight into the reasons the data looks like this, or what it might look like as more data comes in. Happily, computers make it easy to try out many different kinds of curves, but having a sense of what curves are plausible makes for better work.
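To make “quantify” a little more concrete, here is a minimal Python sketch of one common approach, least squares. It is not Dr Wicklin’s data and not necessarily his method. The points are made up, and fitting a straight line to the logarithm of the radius is my own simplifying assumption. A logarithmic spiral has radius $r = a e^{b\theta}$, so $\log r$ is a straight line in $\theta$, and the best-fit line gives the parameters $a$ and $b$.

```python
import math

def fit_log_spiral(thetas, radii):
    """Least-squares fit of r = a * exp(b * theta), done as a straight-line fit to log(r)."""
    logs = [math.log(r) for r in radii]
    n = len(thetas)
    mean_t = sum(thetas) / n
    mean_l = sum(logs) / n
    # Slope and intercept of the best-fit line through (theta, log r).
    b = sum((t - mean_t) * (l - mean_l) for t, l in zip(thetas, logs)) / sum(
        (t - mean_t) ** 2 for t in thetas
    )
    a = math.exp(mean_l - b * mean_t)
    # Root-mean-square miss, in log(r): the number that says how "good" the fit is.
    residuals = [l - (math.log(a) + b * t) for t, l in zip(thetas, logs)]
    rms_error = math.sqrt(sum(e * e for e in residuals) / n)
    return a, b, rms_error

# Made-up points: a spiral with a = 2 and b = 0.3, nudged by a small deterministic wobble.
thetas = [4 * math.pi * i / 29 for i in range(30)]
radii = [2 * math.exp(0.3 * t + 0.02 * math.sin(5 * i)) for i, t in enumerate(thetas)]

a, b, rms_error = fit_log_spiral(thetas, radii)
print(f"a = {a:.3f}, b = {b:.3f}, rms error in log(r) = {rms_error:.3f}")
```

The fit recovers values close to the 2 and 0.3 the points were built from, and the small error says the curve family was a sensible guess. Data that isn’t spiral-shaped would still get “best” parameters from this, just with a much bigger miss. That’s the part where the artistic eye comes back in.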