## From my First A-to-Z: Z-transform

Back in the day I taught in a Computational Science department, which threw me out to exciting and new-to-me subjects more than once. One quite fun semester I was learning, and teaching, signal processing. This set me up for the triumphant conclusion of my first A-to-Z.

One of the things you can see in my style is mentioning the connotations implied by whether one uses x or z as a variable. Any letter will do, for the use it’s put to. But to use the name ‘z’ suggests an openness to something that ‘x’ doesn’t.

There’s a mention here about stability in algorithms, and the note that we can process data in ways that are stable or are unstable. I don’t mention why one would want or not want stability. Wanting stability hardly seems to need explaining; isn’t that the good option? And, often, yes, we want stable systems because they correct and wipe away error. But there are reasons we might want instability, or at least less stability. Too stable a system will obscure weak trends, or the starts of trends. Your weight flutters day by day in ways that don’t mean much, which is why it’s better to consider a seven-day average. If you took instead a 700-day running average, these meaningless fluctuations would be invisible. But you also would take a year or more to notice whether you were losing or gaining weight. That’s one of the things stability costs.
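
A toy illustration of that cost, with made-up numbers (no real data, and the day-to-day flutter left out so the trend is easy to see): a weight that holds steady at 180 for a thousand days, then starts dropping a tenth of a pound a day. The helper name `trailing_average` is my own. After thirty days of the trend, the seven-day average has moved by about 2.7 pounds, while the 700-day average has moved by less than a tenth of a pound.

```python
def trailing_average(values, width):
    """Average of the last `width` entries of `values`."""
    window = values[-width:]
    return sum(window) / len(window)


# Made-up weights: steady for 1000 days, then losing 0.1 per day for 30 days.
weights = [180.0] * 1000 + [180.0 - 0.1 * k for k in range(1, 31)]

for width in (7, 700):
    drop = 180.0 - trailing_average(weights, width)
    print(f"{width}-day average has dropped by {drop:.3f}")
```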

## z-transform.

The z-transform comes to us from signal processing. The signal we take to be a sequence of numbers, all representing something sampled at uniformly spaced times. The temperature at noon. The power being used, second-by-second. The number of customers in the store, once a month. Anything. The sequence of numbers we take to stretch back into the infinitely great past, and to stretch forward into the infinitely distant future. If it doesn’t, then we pad the sequence with zeroes, or some other safe number that we know means “nothing”. (That’s another classic mathematician’s trick.)

It’s convenient to have a name for this sequence. “a” is a good one. The different sampled values are denoted by an index. $a_0$ represents whatever value we have at the “start” of the sample. That might represent the present. That might represent where sampling began. That might represent just some convenient reference point. It’s the equivalent of mileage marker zero; we have to have something be the start.

$a_1$, $a_2$, $a_3$, and so on are the first, second, third, and so on samples after the reference start. $a_{-1}$, $a_{-2}$, $a_{-3}$, and so on are the first, second, third, and so on samples from before the reference start. That might be the last couple of values before the present.

So for example, suppose the temperatures the last several days were 77, 81, 84, 81, 78. Then we would probably represent this as $a_{-4} = 77$, $a_{-3} = 81$, $a_{-2} = 84$, $a_{-1} = 81$, $a_0 = 78$. We’ll hope this is Fahrenheit or that we are remotely sensing a temperature.

The z-transform of a sequence of numbers is something that looks a lot like a polynomial, based on these numbers. For this five-day temperature sequence the z-transform would be the polynomial $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0$. ($z^1$ is the same as $z$. $z^0$ is the same as the number “1”. I wrote it this way to make the pattern clearer.)

I would not be surprised if you protested that this doesn’t merely look like a polynomial but actually is one. You’re right, of course, for this set, where all our samples are from negative (and zero) indices. If we had positive indices then we’d lose the right to call the transform a polynomial. Suppose we trust our weather forecaster completely, and add in a1 = 83 and a2 = 76. Then the z-transform for this set of data would be $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$. You’d probably agree that’s not a polynomial, although it looks a lot like one.
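
To make the pattern concrete, here’s a minimal Python sketch. It assumes the sequence is stored as a dictionary from index to sample, and it uses the convention the examples above follow: sample $a_n$ gets multiplied by $z^{-n}$, so $a_{-4}$ pairs with $z^4$ and $a_2$ pairs with $\left(\frac{1}{z}\right)^2$. The function name `z_transform` is just a label for this illustration.

```python
def z_transform(samples, z):
    """Evaluate the z-transform of a finite sequence at the number z.

    `samples` maps each index n to the value a_n; any index not present
    counts as a padded zero. Each a_n is weighted by z**(-n). z may be
    real or complex, but must not be zero if positive indices are present.
    """
    return sum(a_n * z ** (-n) for n, a_n in samples.items())


# The five days of temperatures plus the two forecast days.
temps = {-4: 77, -3: 81, -2: 84, -1: 81, 0: 78, 1: 83, 2: 76}

print(z_transform(temps, 1))    # 560.0: at z = 1 every power is 1, so this is the plain sum
print(z_transform(temps, 2))    # 2516.5
print(z_transform(temps, 1j))   # z can just as well be complex
```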

The use of z for these polynomials is basically arbitrary. The main reason to use z instead of x is that we can learn interesting things if we imagine letting z be a complex-valued number. And z carries connotations of “a possibly complex-valued number”, especially if it’s used in ways that suggest we aren’t looking at coordinates in space. It’s not that there’s anything in the symbol x that refuses the possibility of it being complex-valued. It’s just that z appears so often in the study of complex-valued numbers that it reminds a mathematician to think of them.

A sound question you might have is: why do this? And there’s not much advantage in going from a list of temperatures “77, 81, 84, 81, 78, 83, 76” over to a polynomial-like expression $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$.

Where this starts to get useful is when we have an infinitely long sequence of numbers to work with. Yes, it does too. It will often turn out that an interesting sequence transforms into a polynomial that itself is equivalent to some easy-to-work-with function. My little temperature example there won’t do it, no. But consider the sequence that’s zero for all negative indices, and 1 for the zero index and all positive indices. This gives us the polynomial-like structure $\cdots + 0z^2 + 0z^1 + 1 + 1\left(\frac{1}{z}\right)^1 + 1\left(\frac{1}{z}\right)^2 + 1\left(\frac{1}{z}\right)^3 + 1\left(\frac{1}{z}\right)^4 + \cdots$. And that turns out to be the same as $1 \div \left(1 - \left(\frac{1}{z}\right)\right)$, provided $|z| > 1$, so that the terms shrink fast enough for the infinite sum to settle on a value. That’s much shorter to write down, at least.
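
A quick numerical check of that claim, as a sketch: add up the first couple hundred terms of $1 + \frac{1}{z} + \left(\frac{1}{z}\right)^2 + \cdots$ and compare with $1 \div \left(1 - \frac{1}{z}\right)$. The agreement only holds when $|z| > 1$; for smaller $z$ the terms grow and the sum never settles. The helper names are hypothetical.

```python
def step_partial_sum(z, terms):
    """Sum the first `terms` nonzero entries of the unit step's z-transform:
    1 + (1/z) + (1/z)**2 + ... + (1/z)**(terms - 1)."""
    return sum((1 / z) ** k for k in range(terms))


def step_closed_form(z):
    """The closed form 1 / (1 - 1/z), valid when |z| > 1."""
    return 1 / (1 - 1 / z)


for z in (2, -3, 1.5 + 1.5j):
    approx = step_partial_sum(z, 200)
    exact = step_closed_form(z)
    print(z, approx, exact, abs(approx - exact))

# With |z| < 1 the partial sums keep growing instead of settling down.
print(step_partial_sum(0.5, 10), step_partial_sum(0.5, 20))   # 1023.0 1048575.0
```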

Probably you’ll grant that, but still wonder what the point of doing that is. Remember that we started by thinking of signal processing. A processed signal is a matter of transforming your initial signal. By this we mean multiplying your original signal by something, or adding something to it. For example, suppose we want a five-day running average temperature. This we can find by taking one-fifth of today’s temperature, $a_0$, and adding to that one-fifth of yesterday’s temperature, $a_{-1}$, and one-fifth of the day before’s temperature, $a_{-2}$, and one-fifth of $a_{-3}$, and one-fifth of $a_{-4}$.
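
Here’s a sketch of that five-day running average in Python. Each output is one-fifth of the current sample plus one-fifth of each of the four before it. Under the convention used above, stepping one sample back in time multiplies the transform by $\frac{1}{z}$, so this averaging amounts to multiplying the signal’s z-transform by $H(z) = \frac{1}{5}\left(1 + \frac{1}{z} + \left(\frac{1}{z}\right)^2 + \left(\frac{1}{z}\right)^3 + \left(\frac{1}{z}\right)^4\right)$. The function names are my own, for illustration.

```python
from fractions import Fraction


def running_average(signal, width=5):
    """Return running averages of `signal` over `width` consecutive samples.

    The k-th output averages signal[k], ..., signal[k + width - 1], so an
    output appears only once a full window is available. Fractions keep
    the arithmetic exact.
    """
    return [
        sum(Fraction(x) for x in signal[k:k + width]) / width
        for k in range(len(signal) - width + 1)
    ]


def averaging_response(z, width=5):
    """Evaluate H(z) = (1/width) * (1 + 1/z + ... + (1/z)**(width - 1))."""
    return sum((1 / z) ** k for k in range(width)) / width


temps = [77, 81, 84, 81, 78, 83, 76]
print([float(v) for v in running_average(temps)])   # [80.2, 81.4, 80.4]
print(averaging_response(1))    # 1.0: a steady signal passes through unchanged
print(averaging_response(-1))   # 0.2: a day-to-day flip-flop is cut to a fifth
```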

The effect of processing a signal is equivalent to manipulating its z-transform. By studying properties of the z-transform, such as where its values are zero or where they are imaginary or where they are undefined, we learn things about what the processing is like. We can tell whether the processing is stable — does it keep a small error in the original signal small, or does it magnify it? Does it serve to amplify parts of the signal and not others? Does it dampen unwanted parts of the signal while keeping the main part intact?
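
To see stability in action, here’s a hedged sketch with a different, hypothetical process: each output is the new input plus $r$ times the previous output, $y_n = x_n + r\,y_{n-1}$. Working through the transform, this multiplies a signal’s z-transform by $1 \div \left(1 - r\left(\frac{1}{z}\right)\right)$, which is undefined at $z = r$. When that troublesome point sits inside the unit circle, $|r| < 1$, a one-time error fades away; when $|r| > 1$, it grows without limit.

```python
def recursive_filter(signal, r):
    """Apply y[n] = x[n] + r * y[n - 1], taking y = 0 before the signal starts."""
    out = []
    previous = 0.0
    for x in signal:
        previous = x + r * previous
        out.append(previous)
    return out


# A one-time error of size 1 at the start, then nothing.
blip = [1.0] + [0.0] * 30

damped = recursive_filter(blip, 0.5)   # undefined point z = 0.5, inside the unit circle
grown = recursive_filter(blip, 1.5)    # undefined point z = 1.5, outside the unit circle

print(damped[-1])   # 0.5**30, about 9.3e-10: the error has faded
print(grown[-1])    # 1.5**30, about 1.9e5: the error has been magnified
```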

We can understand how data will be changed by understanding the z-transform of the way we manipulate it. That z-transform turns a signal-processing idea into a complex-valued function. And we have a lot of tools for studying complex-valued functions. So we become able to say a lot about the processing. And that is what the z-transform gets us.

## From my First A-to-Z: Tensor

Of course I can’t just take a break for the sake of having a break. I feel like I have to do something of interest. So why not make better use of my several thousand past entries and repost one? I’d just reblog it except WordPress’s system for that is kind of rubbish. So here’s what I wrote, when I was first doing A-to-Z’s, back in summer of 2015. Somehow I was able to post three of these a week. I don’t know how.

I had remembered this essay as mostly describing the boring part of tensors, that we usually represent them as grids of numbers and then symbols with subscripts and superscripts. I’m glad to rediscover that I got at why we do such things to numbers and subscripts and superscripts.

## Tensor.

The true but unenlightening answer first: a tensor is a regular, rectangular grid of numbers. The most common kind is a two-dimensional grid, so that it looks like a matrix, or like the times tables. It might be square, with as many rows as columns, or it might be rectangular.

It can also be one-dimensional, looking like a row or a column of numbers. Or it could be three-dimensional, rows and columns and whole levels of numbers. We don’t try to visualize that. It can be what we call zero-dimensional, in which case it just looks like a solitary number. It might be four- or more-dimensional, although I confess I’ve never heard of anyone who actually writes out such a thing. It’s just so hard to visualize.

You can add and subtract tensors if they’re of compatible sizes. You can also do something like multiplication. And this does mean that tensors of compatible sizes will form a ring. Of course, that doesn’t say why they’re interesting.
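
A minimal sketch of the “grid of numbers” view, using plain nested Python lists rather than any tensor library. It finds a grid’s shape, adds two grids of the same shape entry by entry, and multiplies two-dimensional grids the way matrices multiply. The function names are my own, and the code assumes every grid is rectangular and nonempty.

```python
def shape(t):
    """Return the dimensions of a rectangular, nonempty nested-list grid.

    A bare number has shape (), a list of numbers has shape (n,),
    a list of equal-length rows has shape (rows, columns), and so on.
    """
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


def add(s, t):
    """Add two grids of the same shape, entry by entry."""
    if shape(s) != shape(t):
        raise ValueError("grids must have compatible sizes")
    if isinstance(s, list):
        return [add(a, b) for a, b in zip(s, t)]
    return s + t


def matmul(s, t):
    """Multiply a p-by-q grid by a q-by-r grid, the way matrices multiply."""
    if len(s[0]) != len(t):
        raise ValueError("inner dimensions must match")
    return [[sum(s[i][k] * t[k][j] for k in range(len(t)))
             for j in range(len(t[0]))]
            for i in range(len(s))]


scalar = 7                        # zero-dimensional
vector = [1, 2, 3]                # one-dimensional
matrix = [[1, 2, 3], [4, 5, 6]]   # two-dimensional: 2 rows by 3 columns

print(shape(scalar), shape(vector), shape(matrix))   # () (3,) (2, 3)
print(add(matrix, [[10, 20, 30], [40, 50, 60]]))     # [[11, 22, 33], [44, 55, 66]]
print(matmul([[1, 2], [3, 4]], [[0, 1], [1, 0]]))    # [[2, 1], [4, 3]]
```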

Tensors are useful because they can describe spatial relationships efficiently. The word comes from the same Latin root as “tension”, a hint about how we can imagine it. A common use of tensors is in describing the stress in an object. Applying stress in different directions to an object often produces different effects. The classic example there is a newspaper. Rip it in one direction and you get a smooth, clean tear. Rip it perpendicularly and you get a raggedy mess. The stress tensor represents this: it gives some idea of how a force put on the paper will create a tear.

Tensors show up a lot in physics, and so in mathematical physics. Technically they show up everywhere, since vectors and even plain old numbers (scalars, in the lingo) are kinds of tensors, but that’s not what I mean. Tensors can describe efficiently things whose magnitude and direction change based on where something is and where it’s looking. So they are a great tool to use if one wants to represent stress, or how well magnetic fields pass through objects, or how electrical fields are distorted by the objects they move in. And they describe space, as well: general relativity is built on tensors. The mathematics of tensors allows one to describe how space is shaped, based on how to measure the distance between two points in space.
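
As an illustration of that last point, here’s a sketch under the simplifying assumption of a flat plane. A two-dimensional grid $g$, a metric tensor, turns a small step $(dx^1, dx^2)$ into a squared length, $ds^2 = \sum_{i,j} g_{ij}\,dx^i\,dx^j$. With ordinary $x$ and $y$ coordinates, $g$ is the identity grid and this is just Pythagoras. In polar coordinates $(r, \theta)$ the same plane gets the grid with $1$ and $r^2$ on the diagonal, because a small change in angle covers more ground the farther out you are. The function names are hypothetical.

```python
import math


def squared_length(g, step):
    """Compute ds^2 = sum over i, j of g[i][j] * step[i] * step[j]."""
    n = len(step)
    return sum(g[i][j] * step[i] * step[j] for i in range(n) for j in range(n))


# Cartesian coordinates (x, y): the metric is the identity grid.
cartesian = [[1, 0], [0, 1]]
print(squared_length(cartesian, [3, 4]))   # 25, so the step has length 5


def polar_metric(r):
    """Metric for polar coordinates (r, theta) on the flat plane."""
    return [[1, 0], [0, r ** 2]]


# A tiny step in angle only, taken at two distances from the origin.
d_theta = 0.001
near = math.sqrt(squared_length(polar_metric(1.0), [0, d_theta]))
far = math.sqrt(squared_length(polar_metric(10.0), [0, d_theta]))
print(near, far)   # the same change in angle covers ten times the distance at r = 10
```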

My own mathematical education happened to be pretty tensor-light. I never happened to have courses that forced me to get good with them, and I confess to feeling intimidated when a mathematical argument gets deep into tensor mathematics. Joseph C. Kolecki, with NASA’s Glenn (Lewis) Research Center, published in 2002 a nice little booklet, “An Introduction to Tensors for Students of Physics and Engineering”. This I think nicely bridges some of the gap between mathematical structures like vectors and matrices, which mathematics and physics majors know well, and the kinds of tensors that get called tensors and that can be intimidating.

## A Summer 2015 Mathematics A to Z Roundup

Since I’ve run out of letters there’s little dignified to do except end the Summer 2015 Mathematics A to Z. I’m still organizing my thoughts about the experience. I’m quite glad to have done it, though.

For the sake of good organization, here’s the set of pages that this project’s seen created:

## y-axis.

It’s easy to tell where you are on a line. At least it is if you have a couple tools. One is a reference point. Another is the ability to say how far away things are. Then if you say something is a specific distance from the reference point you can pin down its location to one of at most two points. If we add to the distance some idea of direction we can pin that down to at most one point. Real numbers give us a good sense of distance. Positive and negative numbers fit the idea of orientation pretty well.

To tell where you are on a plane, though, that gets tricky. A reference point and a sense of how far things are help. Knowing something is a set distance from the reference point tells you something about its position. But there’s still an infinite number of possible places the thing could be, unless it’s at the reference point.

The classic way to solve this is to divide space into a couple of directions. René Descartes made a name for himself — well, with many things. But one of them, in mathematics, was to describe the positions of things by components. One component describes how far something is in one direction from the reference point. The next component describes how far the thing is in another direction.

This sort of scheme we see as laying down axes. One, conventionally taken to be the horizontal or left-right axis, we call the x-axis. The other direction — one perpendicular, or orthogonal, to the x-axis — we call the y-axis. Usually this gets drawn as the vertical axis, the one running up and down the sheet of paper. That’s not required; it’s just convention.

We surely call it the x-axis in echo of the use of x as the name for a number whose value we don’t know right away. (That, too, is a convention Descartes gave us.) x carries with it connotations of the unknown, the sought-after, the mysterious thing to be understood. The next axis we name y because … well, that’s a letter near x and we don’t much need it for anything else, I suppose. If we need another direction yet, if we want something in space rather than a plane, then the third axis we dub the z-axis. It’s perpendicular to the x- and the y-axis directions.

These aren’t the only names for these directions, though. It’s common and often convenient to describe positions of things using vector notation. A vector describes the relative distance and orientation of things. It’s compact symbolically. It lets one think of the position of things as a single variable, a single concept. Then we can talk about a position being a certain distance in the direction of the x-axis plus a certain distance in the direction of the y-axis. And, if need be, plus some distance in the direction of the z-axis.

The direction of the x-axis is often written as $\hat{i}$, and the direction of the y-axis as $\hat{j}$. The direction of the z-axis if needed gets written $\hat{k}$. The circumflex there indicates two things. First is that the thing underneath it is a vector. Second is that it’s a vector one unit long. A vector might have any length, including zero. It’s convenient to make some mention when it’s a nice one unit long.

Another popular notation is to write the direction of the x-axis as the vector $\hat{e}_1$, and the y-axis as the vector $\hat{e}_2$, and so on. This method offers several advantages. One is that we can talk about the vector $\hat{e}_j$, that is, some particular direction without pinning down just which one. That’s the equivalent of writing “x” or “y” for a number we don’t want to commit ourselves to just yet. Another is that we can talk about axes going off in two, or three, or four, or more directions without having to pin down how many there are. And then we don’t have to think of what to call them. x- and y- and z-axes make sense. w-axis sounds a little odd but some might accept it. v-axis? u-axis? Nobody wants that, trust me.

Sometimes people start the numbering from $\hat{e}_0$ so that the y-axis is the direction $\hat{e}_1$. Usually it’s either clear from context or else it doesn’t matter.
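
A small sketch of the $\hat{e}_j$ idea, assuming vectors are stored as tuples of components. Here `unit_vector(j, n)` builds $\hat{e}_j$ in $n$ dimensions, numbering from 1, and a position is rebuilt as so much of $\hat{e}_1$ plus so much of $\hat{e}_2$, and so on. The helper names are my own, and nothing in them cares whether $n$ is 2, 3, or 40.

```python
def unit_vector(j, n):
    """Return e_j, the unit vector along the j-th axis of n-dimensional space.

    Axes are numbered from 1, so in the plane e_1 is the x-direction and
    e_2 is the y-direction.
    """
    return tuple(1 if k == j else 0 for k in range(1, n + 1))


def scale(c, v):
    """Multiply every component of v by the number c."""
    return tuple(c * x for x in v)


def vector_sum(vectors, n):
    """Add up a list of n-component vectors, component by component."""
    total = (0,) * n
    for v in vectors:
        total = tuple(a + b for a, b in zip(total, v))
    return total


# The point 3 units along the x-axis and 4 units along the y-axis:
# 3 e_1 + 4 e_2, the same thing as 3 i-hat + 4 j-hat.
n = 2
position = vector_sum([scale(3, unit_vector(1, n)), scale(4, unit_vector(2, n))], n)
print(position)   # (3, 4)

# Unit vectors really are one unit long, in any number of dimensions.
print(sum(x * x for x in unit_vector(3, 5)) ** 0.5)   # 1.0
```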

## Xor.

Xor comes to us from logic. In this field we look at propositions, which can be either true or false. Propositions serve the same role here that variables like “x” and “y” serve in algebra. They have some value. We might know what the value is to start with. We might be hoping to deduce what the value is. We might not actually care what the value is, but need a placeholder for it while we do other work.

A variable, or a proposition, can carry some meaning. The variable “x” may represent “the longest straight board we can fit around this corner”. The proposition “A” may represent “The blue house is the one for sale”. (Logic has a couple of conventions. In one we use capital letters from the start of the alphabet for propositions. In the other we use lowercase p’s and q’s and r’s and letters from that patch of the alphabet. This is a difference in dialect, not in content.) That’s convenient, since it can help us understand the meaning of a problem we’re working on, but it’s not essential. The process of solving an equation is the same whether or not the equation represents anything in the real world. So it is with logic.

We can combine propositions to make more interesting statements. If we know whether the propositions are true or false we know whether the statements are true. If we know starting out only that the statements are true (or false) we might be able to work out whether the propositions are true or false.

Xor, the exclusive or, is one of the common combinations. Start with the propositions A and B, both of which may be true or may be false. A Xor B is a true statement when A is true while B is false, or when A is false while B is true. It’s false when A and B are simultaneously false. It’s also false when A and B are simultaneously true.
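
The definition can be checked by listing all four cases. Here’s a sketch in Python, where `xor` is a helper written straight from the definition above; Python’s `^` operator on booleans gives the same results, which the last line checks.

```python
from itertools import product


def xor(a, b):
    """Exclusive or: true when exactly one of a, b is true."""
    return (a and not b) or (not a and b)


for a, b in product([True, False], repeat=2):
    print(f"A={a!s:5}  B={b!s:5}  A xor B={xor(a, b)}")

# Python's ^ operator on booleans agrees in every case.
print(all(xor(a, b) == (a ^ b) for a, b in product([True, False], repeat=2)))   # True
```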

It’s the logic of whether a light bulb on a two-way switch is on. If one switch is on and the other off, the bulb is on. If both switches are on, or both switches off, the bulb is off. This is also the logic of what’s offered when the menu says you can have french fries or onion rings with your sandwich. You can get both, but it’ll cost an extra 95 cents.
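
The two-way switch follows the same pattern. A sketch, wired as described above so the bulb is lit exactly when the switches disagree; for true-or-false values, “not equal” is exactly exclusive or. The assertions check the property that makes such a circuit useful: flipping either switch by itself always changes the bulb. The function name is hypothetical.

```python
from itertools import product


def bulb_is_lit(switch1, switch2):
    """Lit exactly when the two switches are in different positions."""
    return switch1 != switch2


for s1, s2 in product([True, False], repeat=2):
    # Flipping either switch by itself always changes the bulb's state.
    assert bulb_is_lit(not s1, s2) != bulb_is_lit(s1, s2)
    assert bulb_is_lit(s1, not s2) != bulb_is_lit(s1, s2)
    print(f"switch 1 on: {s1!s:5}  switch 2 on: {s2!s:5}  bulb on: {bulb_is_lit(s1, s2)}")
```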

## Well-Posed Problem.

This is another mathematical term almost explained by what the words mean in English. Probably you’d guess a well-posed problem to be a question whose answer you can successfully find. This also implies that there is an answer, and that it can be found by some method other than guessing luckily.

Mathematicians demand three things of a problem to call it “well-posed”. The first is that a solution exists. The second is that a solution has to be unique. It’s imaginable there might be several answers to a problem. In that case we weren’t specific enough about what we’re looking for. Or we should have been looking for a set of answers instead of a single answer.

The third requirement takes some time to understand. It’s that the solution has to vary continuously with the initial conditions. That is, suppose we started with a slightly different problem. If the answer would look about the same, then the problem was well-posed to begin with. Suppose we’re looking at the problem of how a block of ice gets melted by a heater set in its center. The way that melts won’t change much if the heater is a little bit hotter, or if it’s moved a little bit off center. This heating problem is well-posed.

There are problems that don’t have this continuous variation, though. Typically these are “inverse problems”. That is, they’re problems in which you look at the outcome of something and try to say what caused it. That would be looking at the puddle of melted water and the heater and trying to say what the original block of ice looked like. There are a lot of blocks of ice that all look about the same once melted, and there’s no way of telling which was the one you started with.

You might think of these conditions as “there’s an answer, there’s only one answer, and you can find it”. That’s good enough as a memory aid, but it isn’t quite so. A problem’s solution might have this continuous variation, but still be “numerically unstable”. This is a difficulty you can run across when you try doing calculations on a computer.

You know the thing where on a calculator you type in 1 / 3 and get back 0.333333? And you multiply that by three and get 0.999999 instead of exactly 1? That’s the thing that underlies numerical instability. We want to work with numbers, but the calculator or computer will let us work with only an approximation to them. 0.333333 is close to 1/3, but isn’t exactly that.
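
Python’s standard `decimal` module can imitate a calculator that keeps six significant digits, as a sketch. Ordinary floating-point numbers have the same kind of problem, just with more digits and in binary, as the last lines show.

```python
from decimal import Decimal, getcontext

# Imitate a calculator that keeps six significant digits.
getcontext().prec = 6

third = Decimal(1) / Decimal(3)
print(third)            # 0.333333
print(third * 3)        # 0.999999, not 1
print(third * 3 == 1)   # False

# Binary floating point approximates too: 0.1 and 0.2 aren't stored exactly.
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False
```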

For many calculations the difference doesn’t matter. 0.999999 is really quite close to 1. If you lost 0.000001 parts of every dollar you earned there’s a fine chance you’d never even notice. But in some calculations, numerically unstable ones, that difference matters. It gets magnified until the error created by the difference between the number you want and the number you can calculate with is too big to ignore. In that case we call the calculation we’re doing “ill-conditioned”.

And it’s possible for a problem to be well-posed but ill-conditioned. This is annoying and is why numerical mathematicians earn the big money, or will tell you they should. Trying to calculate the answer will be so likely to give something meaningless that we can’t trust the work that’s done. But often it’s possible to rework a calculation into something equivalent but well-conditioned. And a well-posed, well-conditioned problem is great. Not only can we find its solution, but we can usually have a computer do the calculations, and that’s a great breakthrough.
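
A standard example of reworking a calculation into an equivalent one that behaves better on a computer, offered as a sketch: the smaller root of $x^2 - bx + c = 0$ when $b$ is huge. The textbook formula subtracts two nearly equal numbers and loses most of its digits. Since the two roots multiply to $c$, dividing $c$ by the larger root gives the same answer with no cancellation. (Strictly, a numerical analyst would call the first formula an unstable algorithm for a problem that is fine in itself, but it shows the kind of rework meant here.) The specific numbers are my own choice.

```python
import math

# The roots of x**2 - b*x + c = 0 are (b ± sqrt(b**2 - 4*c)) / 2.
b, c = 1e8, 1.0
root_disc = math.sqrt(b * b - 4 * c)

# Textbook formula for the smaller root: subtracts two nearly equal numbers,
# so most of the significant digits cancel away.
naive_small = (b - root_disc) / 2

# Equivalent rearrangement: the two roots multiply to c, so the small root
# is c divided by the large root, and no cancellation happens.
large = (b + root_disc) / 2
stable_small = c / large

print(naive_small)    # noticeably off from the true value of about 1e-8
print(stable_small)   # very close to 1e-8
```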

## Vertex.

I mentioned graph theory several weeks back, when this Mathematics A To Z project was barely begun. It’s a fun field. It’s a great one for doodlers, and it’s one that has surprising links to other problems.

Graph theory divides the conceptual universe into “things that could be connected” and “ways they are connected”. The “things that could be connected” we call vertices. The “ways they are connected” are the edges. Vertices might have an obvious physical interpretation. They might represent the corners of a cube or a pyramid or some other common shape. That, I imagine, is why these things were ever called vertices. A diagram of a graph can look a lot like a drawing of a solid object. It doesn’t have to, though. Many graphs will have vertices and edges connected in ways that no solid object could have. They will usually be ones that you could build in wireframe. Use gumdrops for the vertices and strands of wire or plastic or pencils for the edges.

Vertices might stand in for the houses that need to be connected to sources of water and electricity and Internet. They might be the way we represent devices connected on the Internet. They might represent all the area within a state’s boundaries. The Königsberg bridge problem, held up as the ancestor of graph theory, has its vertices represent the islands and river banks one gets to by bridges. Vertices are, as I say, the things that might be connected.
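
A sketch of the Königsberg graph as a Python list of edges. The vertex labels are my own shorthand: `N` and `S` for the two river banks, `K` for the central island (Kneiphof), and `E` for the land mass to the east; each tuple is one of the seven bridges. Euler’s observation was that, in a connected graph like this one, a walk crossing every edge exactly once needs zero or two vertices touched by an odd number of edges. Counting shows all four are odd.

```python
from collections import Counter

# Each tuple is one bridge joining two land masses (labels are shorthand).
bridges = [
    ("K", "N"), ("K", "N"),
    ("K", "S"), ("K", "S"),
    ("K", "E"), ("N", "E"), ("S", "E"),
]

# The degree of a vertex is the number of edge ends that touch it.
degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd = [vertex for vertex, d in degree.items() if d % 2 == 1]
print(dict(degree))   # {'K': 5, 'N': 3, 'S': 3, 'E': 3}
print("walk crossing each bridge once possible:", len(odd) in (0, 2))   # False
```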

“Things that might be connected” is a broader category than you might imagine. For example, an important practical use of mathematics is making error-detecting and error-correcting codes. This is how you might send a message that gets garbled — in sending, in transmitting, or in reception — and still understand what was meant. You can model error-detecting or correcting codes as a graph. In this case every possible message is a vertex. Edges connect together the messages that could plausibly be misinterpreted as one another. How many edges you draw — how much misunderstanding you allow for — depends on how many errors you want to be able to detect, or to correct.
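
Here’s a toy version, as a sketch. It assumes messages are three-bit strings and that “plausibly misinterpreted” means “differs in exactly one bit”. Every message is a vertex, with an edge to each message one flipped bit away, and the result has the vertices and edges of a cube. If we only ever send `000` or `111` (a simple repetition code), their neighborhoods don’t overlap, so any single garbled bit can be traced back to the intended message. The helper names are hypothetical.

```python
from itertools import product


def one_bit_apart(m1, m2):
    """True when two equal-length bit strings differ in exactly one position."""
    return sum(a != b for a, b in zip(m1, m2)) == 1


# Every possible three-bit message is a vertex.
messages = ["".join(bits) for bits in product("01", repeat=3)]

# Edges join messages that a single flipped bit could confuse.
edges = [(m1, m2) for i, m1 in enumerate(messages) for m2 in messages[i + 1:]
         if one_bit_apart(m1, m2)]
print(len(messages), len(edges))   # 8 12: the corners and edges of a cube

# Only 000 and 111 are ever sent; a received message is decoded as the
# allowed message it equals or is adjacent to.
codewords = ["000", "111"]


def decode(received):
    for word in codewords:
        if received == word or one_bit_apart(received, word):
            return word
    return None


print([decode(m) for m in messages])
```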

When we draw a vertex on paper or a chalkboard or the like we usually draw it as a + or an x or maybe a *. How much we draw depends on how afraid we are of losing sight of it as we keep working. In publication it’s often drawn as a simple dot. This is because printers are able to draw dots that don’t get muddied up by edges being drawn in or eraser marks removing edges.

## Unbounded.

Something is unbounded if it is not bounded. To summon a joke from my college newspaper days, all things considered, this wasn’t too tough a case for Inspector Bazalo.

Admittedly that doesn’t tell us much until we know what “bounded” means. But that means nearly what you might expect from common everyday English. A set of numbers is bounded if you can identify a value that the set never gets larger than, or smaller than. Specifically it’s bounded above if there’s some number that nothing in the set is bigger than. It’s bounded below if there’s some number that nothing in the set is smaller than. If someone just says bounded, she might mean that the set is bounded above and below simultaneously. Or she might mean there’s just an upper or a lower bound. The context should make it clear. If she says something is unbounded, she means that it’s not bounded below, or it’s not bounded above, or it’s bounded on neither side.

We speak of a function being unbounded if its smallest possible range is unbounded. For example, think of a function with domain of all the real numbers. Give it the rule “match every number in the domain with its square”. In high school algebra you’d write this “$f(x) = x^2$”. Then the range has to be the real numbers from 0 up to … well, just keep going up. It’s unbounded above, although it is bounded below. 0 or any negative number is a valid lower bound.

That’s a fairly obvious example, though. Functions can be more intricate and still be unbounded. For example, consider a function whose domain is all the counting numbers — 1, 2, 3, and so on. (This domain is an unbounded set.) Let the rule be that you match every number in the domain with one divided by its sine. That is, “f(x) = 1 / sin(x)”. There’s no highest, or lowest, number in this set. Pick any possible bound and you can find at least one x for which f(x) is bigger, or smaller.
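
You can watch this happen numerically, as a sketch. No finite search can prove a set is unbounded, but the values do reach large sizes in both directions. At $n = 355$, for instance, $n$ is extremely close to a whole multiple of $\pi$ (since $\frac{355}{113}$ is a famously good approximation to $\pi$), so $\sin(n)$ is tiny and its reciprocal is huge.

```python
import math


def f(n):
    """The rule from the text: one divided by the sine of n."""
    return 1 / math.sin(n)


values = [f(n) for n in range(1, 1001)]
print(max(values), min(values))   # both far from zero
print(f(355))                     # roughly -33,000
```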

Regions of space can be bounded or unbounded, too. A region of space is what it sounds like, some blotch on the map. The blotch doesn’t have to be contiguous. If it’s possible to draw a circle that the whole region fits within, then the region is bounded. If it’s impossible to do this, then the region is unbounded. I write blotches on maps and circles as if I’m necessarily talking about two-dimensional spaces. That’s a good way to get a feeling for bounded and unbounded regions. It appeals to our sense of drawing stuff out on paper and of looking at maps. But there’s no reason it has to be two-dimensional. The same ideas apply for one-dimensional spaces and three-dimensional ones. They also apply for higher dimensions. Just change “circles” to “spheres” or “hyperspheres” and the idea carries over.

You might remember the talk about measure, and how it gives an idea of how big a set is. And in that case you might expect an unbounded region has to have an infinitely large measure. After all, imagine a rectangle that’s one unit wide, starts at the left side of your paper, and goes off forever to the right. That’s obviously got infinitely large area. But it’s not so. You can have regions that are unbounded, but have finite — even zero — measure.
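
A worked example, sketched: take the region between the $x$-axis and the curve $y = \frac{1}{x^2}$, for every $x$ from 1 onward. It runs off to the right forever, so no circle can contain it. Yet its area up to any cutoff $R$ is

$$\int_1^R \frac{1}{x^2}\,dx = \left[-\frac{1}{x}\right]_1^R = 1 - \frac{1}{R},$$

which never exceeds 1 and approaches 1 as $R$ grows, so the whole unbounded region has area exactly 1. For zero measure, the $x$-axis itself will do: it is unbounded, but as a line in the plane it has no area at all.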

It’s often possible to swap a bounded set (function, region) for an unbounded one, or vice-versa. For example, if your set was the range of “1 / sin(x)”, you might match that up with “sin(x)”, its reciprocal. That’s obviously bounded. It’s less obvious how you might make a bounded set out of the range of “$x^2$”. One way would be to match it with the function whose rule is “$1 / (x^2 + 1)$”, which is bounded, above and below. As with duals, this is a way we can turn one problem into another, that we might be able to solve more easily.
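
A sketch of that swap in Python, with helper names of my own. It maps each square $s = x^2$ to $1 / (s + 1)$, which always lands between 0 and 1, and it also undoes the map, which is what lets the swap turn one problem into another without losing anything.

```python
def squashed(s):
    """Map a value s >= 0 (such as x**2) to 1 / (s + 1), which lies in (0, 1]."""
    return 1 / (s + 1)


def unsquashed(y):
    """Undo squashed: recover s from y = 1 / (s + 1), for 0 < y <= 1."""
    return 1 / y - 1


squares = [x * x for x in (0, 0.5, 2, 10, 1000)]
print(squares)                          # grows without limit as x does
print([squashed(s) for s in squares])   # always between 0 and 1

# Nothing is lost in the swap: each squashed value leads back to its square.
print(all(abs(unsquashed(squashed(s)) - s) <= 1e-9 * max(1, s) for s in squares))   # True
```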

## Tensor.

The true but unenlightening answer first: a tensor is a regular, rectangular grid of numbers. The most common kind is a two-dimensional grid, so that it looks like a matrix, or like the times tables. It might be square, with as many rows as columns, or it might be rectangular.

It can also be one-dimensional, looking like a row or a column of numbers. Or it could be three-dimensional, rows and columns and whole levels of numbers. We don’t try to visualize that. It can be what we call zero-dimensional, in which case it just looks like a solitary number. It might be four- or more-dimensional, although I confess I’ve never heard of anyone who actually writes out such a thing. It’s just so hard to visualize.

You can add and subtract tensors if they’re of compatible sizes. You can also do something like multiplication. This means some collections of tensors form a ring; square matrices of one fixed size, for example, do. Of course, that doesn’t say why they’re interesting.

Tensors are useful because they can describe spatial relationships efficiently. The word comes from the same Latin root as “tension”, a hint about how we can imagine it. A common use of tensors is in describing the stress in an object. Applying stress in different directions to an object often produces different effects. The classic example there is a newspaper. Rip it in one direction and you get a smooth, clean tear. Rip it perpendicularly and you get a raggedy mess. The stress tensor represents this: it gives some idea of how a force put on the paper will create a tear.
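To see how a tensor turns a direction into an effect, here is a small sketch of a two-dimensional stress tensor. It assumes the standard Cauchy relation: the force per unit area on a cut with unit normal n is σ·n. The stress values are invented for illustration, and the function names are hypothetical. Cuts in different directions feel different normal stresses. That is the kind of direction-dependence the newspaper shows.

```python
import math

def traction(stress, normal):
    """Cauchy's relation t = sigma . n: force per unit area on a cut with unit normal n."""
    return [sum(stress[i][j] * normal[j] for j in range(2)) for i in range(2)]

def normal_stress(stress, angle):
    """Part of the traction pointing along the cut's own normal, for a cut at `angle` radians."""
    n = [math.cos(angle), math.sin(angle)]
    t = traction(stress, n)
    return t[0] * n[0] + t[1] * n[1]

# Hypothetical 2-D stress state (arbitrary units): pulled hard along x, gently along y.
sigma = [[10.0, 2.0],
         [2.0, 1.0]]

for degrees in (0, 45, 90):
    print(degrees, round(normal_stress(sigma, math.radians(degrees)), 3))
```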

Tensors show up a lot in physics, and so in mathematical physics. Technically they show up everywhere, since vectors and even plain old numbers (scalars, in the lingo) are kinds of tensors, but that’s not what I mean. Tensors efficiently describe things whose magnitude and direction change depending on where something is and where it’s looking. So they are a great tool to use if one wants to represent stress, or how well magnetic fields pass through objects, or how electrical fields are distorted by the objects they move in. And they describe space, as well: general relativity is built on tensors. The mathematics of tensors lets one describe how space is shaped, based on how to measure the distance between two points in space.

My own mathematical education happened to be pretty tensor-light. I never had courses that forced me to get good with them, and I confess to feeling intimidated when a mathematical argument gets deep into tensor mathematics. Joseph C. Kolecki, with NASA’s Glenn (Lewis) Research Center, published a nice little booklet in 2002, “An Introduction to Tensors for Students of Physics and Engineering”. I think it nicely bridges some of the gap between structures that mathematics and physics majors know well, like vectors and matrices, and the kinds of tensors that get called tensors and that can be intimidating.

## Step.

On occasion a friend or relative who’s got schoolkids asks me how horrified I am by some bit of Common Core mathematics. This is a good chance for me to disappoint the friend or relative. Usually I’m just sincerely not horrified. Much of what raises horror is students being asked to estimate and approximate answers, instead of calculating the answer directly. But I like estimation and approximation. If I want an exact answer I’ll do better to use a calculator. What I need is assurance the thing I’m calculating can sensibly be the thing I want to know. Nearly all my feats of mental arithmetic amount to making an estimate. If I must, I improve it until someone’s impressed.

The other horror-raising examples I get amount to “look at how many steps it takes to do this simple problem!” The ones that cross my desk are usually subtraction problems. Someone’s offended that the student is told to work out 107 minus 18 (say) by counting by ones from 18 up to 20, then by tens from 20 up to 100, and then by ones again up to 107. And this when they could just write one number above another and do some borrowing and get 89 right away, no steps needed. Assuring my acquaintance that the other method is really just the way you might count change, and that I do subtraction that way much of the time, doesn’t change minds. (More often I do that to double-check my answer. This raises the question of why I don’t do it that way the first time.) Though it does make the acquaintance conclude I’m some crazy person with no idea how to teach kids.
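The counting-up method can be written out as a short procedure. This is a sketch under my own assumptions about how the hops get chosen. First count by ones up to the next multiple of ten. Then count by tens up to the last multiple of ten that doesn't pass the larger number. Then count by ones to finish. The function name `count_up_subtract` is hypothetical.

```python
def count_up_subtract(minuend, subtrahend):
    """Find minuend - subtrahend by counting up from the subtrahend, the way one counts change.

    Returns the difference and the list of (start, end, step size) hops taken.
    """
    assert minuend >= subtrahend >= 0
    hops = []
    position = subtrahend
    # Ones, up to the next multiple of ten (but never past the minuend).
    next_ten = min(((position + 9) // 10) * 10, minuend)
    if next_ten > position:
        hops.append((position, next_ten, 1))
        position = next_ten
    # Tens, up to the last multiple of ten not past the minuend.
    last_ten = (minuend // 10) * 10
    if last_ten > position:
        hops.append((position, last_ten, 10))
        position = last_ten
    # Ones again, to finish.
    if minuend > position:
        hops.append((position, minuend, 1))
    difference = sum(end - start for start, end, _ in hops)
    return difference, hops

print(count_up_subtract(107, 18))  # (89, [(18, 20, 1), (20, 100, 10), (100, 107, 1)])
```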

That’s probably fair. I’ve never taught elementary school students, and haven’t any training for it. I’ve only taught college students. For that my entire training consisted of a single one-credit course my first semester as a Teaching Assistant, plus whatever I happened to pick up while TAing for professors who wanted me to sit in on lecture. From the first I learned there is absolutely no point to saying anything while I face the chalkboard because it will be unheard except by the board, which has already been through this class forty times. From the second I learned to toss hard candies as reward to anyone who would say anything, anything, in class. Both are timeless pedagogical truths.

But the worry about the number of steps it takes to do some arithmetic calculation stays with me. After all, what is a step? How much work is it? How hard is a step?

I don’t think there is a concrete measure of hardness. I’m not sure there could be. If I needed to, I’d work out 107 minus 18 by noticing it’s just about 110 minus 20, so it’s got to be about 90, and a 7 minus 8 has to end in a 9 so the answer must be 89. How many steps was that? I guess there are maybe three thoughts involved there. But I don’t do that, at least not deliberately, when I look at the problem. 89 just appears, and if I stay interested in the question, the reasons why that’s right follow in short order. So how many steps did I take? Three? One?

On the other hand, I know that in elementary school I would have had to work it out by looking at 7 minus 8. And then I’d need to borrow from the tens column. And oh dear there’s a 0 to the left of the 7 so I have to borrow from the hundreds and … That’s the procedure as it was taught back then. Now, I liked that. I understood it. And I was taught with appeals to breaking dollars into dimes and pennies, which worked for my imagination. But it’s obviously a bunch of steps. How many? I’m not sure; probably around ten or so. And, if we’re being honest, borrowing from a zero in the tens column is a deeply weird thing to do. I can understand people freezing up rather than do that.
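For comparison, here is a sketch of the borrowing procedure as I remember it being taught. How the borrows get recorded is my own choice, and `column_subtract` is a hypothetical name. The inner loop that turns zeros into nines is the borrowing-from-a-zero step.

```python
def column_subtract(minuend, subtrahend):
    """Schoolbook subtraction with borrowing, column by column from the right.

    Returns the difference and a log of (column that needed a borrow, column borrowed from),
    where column 0 is the ones column, 1 the tens, 2 the hundreds, and so on.
    """
    assert minuend >= subtrahend >= 0
    top = [int(d) for d in str(minuend)][::-1]        # ones digit first
    bottom = [int(d) for d in str(subtrahend)][::-1]
    bottom += [0] * (len(top) - len(bottom))          # pad the shorter number with zeros
    result, log = [], []
    for col in range(len(top)):
        if top[col] < bottom[col]:
            # Walk left to the first nonzero digit and borrow from it.
            src = col + 1
            while top[src] == 0:
                src += 1
            top[src] -= 1
            # Every zero passed over along the way becomes a 9.
            for k in range(src - 1, col, -1):
                top[k] = 9
            top[col] += 10
            log.append((col, src))
        result.append(top[col] - bottom[col])
    difference = int("".join(str(d) for d in reversed(result)))
    return difference, log

print(column_subtract(107, 18))  # (89, [(0, 2)]): the ones column borrowed from the hundreds
```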

Similarly, I know that if I needed to differentiate the logarithm of the cosine of x, I would have the answer in a flash. It’d be at most one step. If I were still in high school, in my calculus class, I’d need longer. I’d struggle through the chain rule and some simplifications after that. Call it maybe four or five steps. If I were in elementary school I’d need infinitely many steps. I couldn’t even understand the problem except in the most vague, metaphoric way.
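For the record, the chain-rule route the high school version of me would have taken runs about like this:

$$\frac{d}{dx}\ln(\cos x) = \frac{1}{\cos x}\cdot\frac{d}{dx}\cos x = \frac{1}{\cos x}\cdot(-\sin x) = -\tan x$$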

This leads me to my suggestion for what a “step” is, at least for problems you work out by hand. (Numerical computing has a more rigorous definition of a step; that’s when you do one of the numerical processing operations.) A step is “the most work you can do in your head without a significant chance of making a mistake”. I think that’s a definition that clarifies the problem of counting steps. It will be different for different people. It will be different for the same person, depending on how experienced she is. The steps a newcomer to a subject can take are smaller than the ones an expert can take. And it’s not just that a newcomer takes more steps to get to the same conclusion than the expert does. The expert might imagine the problem breaks down into different steps from the ones a newcomer can do. Possibly the most important skill a teacher has is being able to work out what steps the newcomer can take. These will not always be what the expert thinks the smaller steps would be.

But what to do with problem-solving approaches that require lots of steps? And here I recommend one of the wisest pieces of advice I’ve ever run across. It’s from the 1954 Printer 1 & C United States Navy Training Course manual, NavPers 10458. I apologize if I’m citing it wrong, but I hope people can follow that to the exact document. I have it because I’m interested in Linotype operation. From page 308, the section “Don’t Overlook Instructions” in Chapter 7:

> When starting on a new piece of copy, or “take” as it is called, be sure to read all instructions, such as the style and size of type, the measure to be set, whether it is to be leaded, indented, and so on.
>
> Then go slowly. Try to develop even, rhythmic strokes, rather than quick, sporadic motions. Strive for accuracy rather than speed. Speed will come with practice.

As with Linotype operations, so it is with arithmetic. Be certain you are doing what you mean to do, and strive to do it accurately. I don’t know how many steps you need, but you probably won’t get a wrong answer if you take more than the minimum number of steps. If you take fewer steps than you need the results will be wretched. Speed will come with practice.

## Ring.

Early on in her undergraduate career a mathematics major will take a class called Algebra. Actually, Introduction to Algebra is more likely, but another Algebra will follow. She will have to explain to her friends and parents that no, it’s not more of that stuff they didn’t understand in high school about expanding binomial terms and solving quadratic equations. The class is the study of constructs that work much like numbers do, but that aren’t necessarily numbers.

The first structure studied is the group. That’s made of two components. One is a set of elements. There might be infinitely many of them — the real numbers, say, or the whole numbers. Or there might be finitely many — the whole numbers from 0 up to 11, or even just the numbers 0 and 1. The other component is an operation that works like addition. What we mean by “works like addition” is that you can take two of the things in the set, “add” them together, and get something else that’s in the set. It has to be associative: something plus the sum of two other things has to equal the sum of the first two things plus the third thing. That is, 1 + (2 + 3) is the same as (1 + 2) + 3.

Also, for the groups that go into building a ring, the addition has to commute. First thing plus second thing has to be the same as second thing plus first thing. That is, 1 + 2 has the same value as 2 + 1 does. (Groups in general needn’t do this; the ones that do are called abelian, or commutative, groups.) Furthermore, there has to be something called the additive identity. It works like zero does in ordinary arithmetic. Anything plus the additive identity is that original thing again. And finally, everything in the group has something that’s its additive inverse. The thing plus the additive inverse is the additive identity, our zero.

If you’re lost, that’s all right. A mathematics major spends as much as four weeks in Intro to Algebra feeling lost here. But here is an example. Suppose we have a group made up of the elements 0, 1, 2, and 3. 0 will be the additive identity: 0 plus anything is that original thing. So 1 plus 0 is 1. 1 plus 1 is 2. 1 plus 2 will be 3. 1 plus 3 will be … well, make that 0 again. 2 plus 0 is 2. 2 plus 1 will be 3. 2 plus 2 will be 0. 2 plus 3 will be 1. 3 plus 0 will be 3. 3 plus 1 will be 0. 3 plus 2 will be 1. 3 plus 3 will be 2. Plus will look like a very strange word at this point.

All the elements in this have an additive inverse. Add 3 to 1 and you get 0. Add 2 to 2 and you get 0. Add 1 to 3 and you get 0. And, yes, add 0 to 0 and you get 0. This means you get to do subtraction just as well as you get to do addition.
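The little group above is ordinary addition followed by taking the remainder after dividing by 4. That's usually called arithmetic modulo 4, though the text doesn't use the name. A short sketch can print the whole addition table and check the group rules mechanically, including that every element has an inverse. The names `plus` and `ELEMENTS` are just for illustration.

```python
from itertools import product

ELEMENTS = range(4)

def plus(a, b):
    """'Addition' in the four-element group: the ordinary sum, then the remainder mod 4."""
    return (a + b) % 4

# Print the addition table from the text, one row per first element.
for a in ELEMENTS:
    print(a, [plus(a, b) for b in ELEMENTS])

closed = all(plus(a, b) in ELEMENTS for a, b in product(ELEMENTS, repeat=2))
associative = all(plus(a, plus(b, c)) == plus(plus(a, b), c)
                  for a, b, c in product(ELEMENTS, repeat=3))
commutative = all(plus(a, b) == plus(b, a) for a, b in product(ELEMENTS, repeat=2))
identity = all(plus(a, 0) == a for a in ELEMENTS)
# For each element, the thing you add to it to get back to 0.
inverses = {a: next(b for b in ELEMENTS if plus(a, b) == 0) for a in ELEMENTS}
print(closed, associative, commutative, identity, inverses)
```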

We’re halfway there. A “ring”, introduced just as the mathematics major has got the hang of groups, is a group with a second operation. Besides being a collection of elements and an addition-like operation, a ring also has a multiplication-like operation. It doesn’t have to do much, as a multiplication. It has to be associative. That is, something times the product of two other things has to be the same as the product of the first two things times the third. You’ve seen that, though. 1 x (2 x 3) is the same as (1 x 2) x 3. And it has to distribute: something times the sum of two other things has to be the same as the sum of the something times the first thing and the something times the second. That is, 2 x (3 + 4) is the same as 2 x 3 plus 2 x 4.

For example, in the group we had before, 0 times anything will be 0. 1 times anything will be what we started with: 1 times 0 is 0, 1 times 1 is 1, 1 times 2 is 2, and 1 times 3 is 3. 2 times 0 is 0, 2 times 1 is 2, 2 times 2 will be 0 again, and 2 times 3 will be 2 again. 3 times 0 is 0, 3 times 1 is 3, 3 times 2 is 2, and 3 times 3 is 1. Believe it or not, this all works out. And “times” doesn’t get to look nearly so weird as “plus” does.
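The multiplication can be checked the same way. This is again a sketch. Here "times" is ordinary multiplication followed by taking the remainder after dividing by 4, which reproduces every product listed above. The checks cover the two ring rules the text names, associativity and distributing over addition.

```python
from itertools import product

ELEMENTS = range(4)

def plus(a, b):
    return (a + b) % 4      # the group's addition

def times(a, b):
    return (a * b) % 4      # the new, multiplication-like operation

# Print the multiplication table from the text.
for a in ELEMENTS:
    print(a, [times(a, b) for b in ELEMENTS])

associative = all(times(a, times(b, c)) == times(times(a, b), c)
                  for a, b, c in product(ELEMENTS, repeat=3))
distributive = all(times(a, plus(b, c)) == plus(times(a, b), times(a, c))
                   for a, b, c in product(ELEMENTS, repeat=3))
print(associative, distributive)
```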

And that’s all you need: a collection of things, an operation that looks a bit like addition, and an operation that looks even more vaguely like multiplication.

Now the controversy. How much does something have to look like multiplication? Some people insist that a ring has to have a multiplicative identity, something that works like 1. The ring I described has one, but one could imagine a ring that hasn’t, such as the even numbers and ordinary addition and multiplication. People who want rings to have multiplicative identity sometimes use “rng” to speak — well, write — of rings that haven’t.

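Some people want rings to have multiplicative inverses too. That is, anything except zero has something you can multiply it by to get 1. The little ring I built there hasn’t got one for 2, because there’s nothing you can multiply 2 by to get 1. Some insist on multiplication commuting, that 2 times 3 equals 3 times 2. (A ring where everything except zero has an inverse usually gets a name of its own, though: a division ring, or a field if the multiplication also commutes.)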
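Searching for multiplicative inverses in the same four-element ring shows the gap the text describes. 1 and 3 each have one, and each is its own inverse, but 2 has none. A minimal sketch, using the same remainder-after-dividing-by-4 multiplication:

```python
def times(a, b):
    return (a * b) % 4  # multiplication in the four-element ring

# For each nonzero element, list everything that multiplies with it to give 1.
inverses = {a: [b for b in range(4) if times(a, b) == 1] for a in range(1, 4)}
print(inverses)  # {1: [1], 2: [], 3: [3]}
```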

Who’s right? It depends what you want to do. Everybody agrees that a ring has to have elements, and addition, and multiplication, and that the multiplication has to distribute across addition. The rest depends on the author, and the tradition the author works in. Mathematical constructs are things humans find interesting to study. The details of how they’re made will depend on what work we want to do.

If a mathematician wishes to make clear that she expects a ring to have multiplication that commutes and to have a multiplicative identity she can say so. She would write that something is a commutative ring with identity. Or the context may make things clear. If you’re not sure, then you can suppose she uses the definition of “ring” that was in the textbook from her Intro to Algebra class sophomore year.

It may seem strange to think that mathematicians don’t all agree on what a ring is. After all, don’t mathematicians deal in universal, eternal truths? … And they do; things that are proven by rigorous deduction are inarguably true. But the parts of these truths that are interesting are a matter of human judgement. We choose the bunches of ideas that are convenient to work with, and give names to those. That’s much of what makes this glossary an interesting project.

## Quintile.

Why is there statistics?

Statistics got organized as a field of study mostly in the late 19th and early 20th centuries, and there are many reasons for that. Most of them reflect wanting to be able to say something about big collections of data. People can only keep track of so much information at once. Even if we could keep track of more information, we’re usually interested in relationships between pieces of data. When there’s enough data there are so many possible relationships that we can’t see what’s interesting.

One of the things statistics gives us is a way of representing lots of data with fewer numbers. We trust there’ll be few enough numbers we can understand them all simultaneously, and so understand something about the whole data.

Quintiles are one of the tools we have. They’re a lesser tool, I admit, but that makes them sound more exotic. They’re descriptions of how the values of a set of data are distributed. Distributions are interesting. They tell us what kinds of values are likely and which are rare. They tell us also how variable the data is, or how reliably we are measuring data. These are things we often want to know: what is normal for the thing we’re measuring, and what’s a normal range?

We get quintiles from imagining the data set placed in ascending order. There’s some value that one-fifth of the data points are smaller than, and four-fifths are greater than. That’s your first quintile. Suppose we had the values 269, 444, 525, 745, and 1284 as our data set. The first quintile would be the arithmetic mean of 269 and 444, that is, 356.5.

The second quintile is some value that two-fifths of your data points are smaller than, and that three-fifths are greater than. With that data set we started with that would be the mean of 444 and 525, or 484.5.

The third quintile is a value three-fifths of the data set is less than, and two-fifths greater than; in this case, that’s the mean of 525 and 745, or 635.

And the fourth quintile is a value that four-fifths of the data set is less than, and one-fifth greater than. That’s the mean of 745 and 1284, or 1014.5.
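The calculation above can be sketched in a few lines. This assumes the convention the text uses: when a cut falls exactly between two data points, take their arithmetic mean, the same way we find the median of an even number of values. The function name `cut_points` is hypothetical.

```python
def cut_points(data, parts):
    """Values splitting the sorted data into `parts` groups of equal count.

    When a cut falls exactly between two data points, use their arithmetic mean,
    the same rule as for the median of an even number of values.
    """
    values = sorted(data)
    n = len(values)
    cuts = []
    for k in range(1, parts):
        below, leftover = divmod(k * n, parts)   # how many points lie below the k-th cut
        if leftover == 0:
            cuts.append((values[below - 1] + values[below]) / 2)
        else:
            cuts.append(values[below])           # the cut lands right on this data point
    return cuts

word_counts = [269, 444, 525, 745, 1284]
print(cut_points(word_counts, 5))   # quintiles: [356.5, 484.5, 635.0, 1014.5]
print(cut_points(word_counts, 2))   # the median: [525]
```

Passing `parts=4` or `parts=100` gives quartiles or percentiles under the same convention. Library routines don't all agree on the convention, though. Python's `statistics.quantiles`, for one, interpolates by default and returns different cut points for these same five numbers.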

From looking at the quintiles we can say … well, not much, because this is a silly made-up problem that demonstrates how quintiles are calculated rather than why we’d want to do anything with them. At least the numbers come from real data. They’re the word counts of my first five A-to-Z definitions. But the existence of the quintiles at 356.5, 484.5, 635, and 1014.5, along with the minimum and maximum data points at 269 and 1284, tells us something. Mostly that the numbers are bunched up in the low-to-middle hundreds, but there could be some weird high numbers. If we had a bigger data set the results would be less obvious.

If the calculating of quintiles sounds much like the way we work out the median, that’s because it is. The median is the value that half the data is less than, and half the data is greater than. There are other ways of breaking down distributions. The first quartile is the value one-quarter of the data is less than. The second quartile a value two-quarters of the data is less than (so, yes, that’s the median all over again). The third quartile is a value three-quarters of the data is less than.

Percentiles are another variation on this. The (say) 26th percentile is a value that 26 percent — 26 hundredths — of the data is less than. The 72nd percentile a value greater than 72 percent of the data.

Are quintiles useful? Well, that’s a loaded question. They are used less than quartiles are. And I’m not sure knowing them is better than looking at a spreadsheet’s plot of the data. A plot of the data with the quintiles, or quartiles if you prefer, drawn in is better than either separately. But these are among the tools we have to tell what data values are likely, and how tightly bunched-up they are.

## Proper.

So there’s this family of mathematical jokes. They run about like this:

A couple people are in a hot air balloon that’s drifted off course. They’re floating towards a hill, and they can barely make out a person on the hill. They cry out, “Where are we?” And the person stares at them, and thinks, and watches the balloon sail aimlessly on. Just as the balloon is about to leave shouting range, the person cries out, “You are in a balloon!” And one of the balloonists says, “Great, we would have to get a mathematician.” “How do you know that was a mathematician?” “The person gave us an answer that’s perfectly true, is completely useless, and took a long time to produce.”

(There are equivalent jokes told about lawyers and consultants and many other sorts of people.)

A lot of mathematical questions have multiple answers. Factoring is a nice familiar example. If I ask “what’s a factor of 5,280”, you can answer “1” or “2” or “55” or “1,320” or some 44 other answers, each of them right. But some answers are boring. For example, 1 is a factor of every whole number. And any number is a factor of itself; you can divide 5,280 by 5,280 and get 1. The answers are right, yes, but they don’t tell you anything interesting. You know these two answers before you’ve even heard the question. So a boring answer like that we often write off as trivial.

A proper solution, then, is one that isn’t boring. The word runs through mathematics, attaching to many concepts. What exactly it means depends on the concept, but the general idea is the same: it means “not one of the obvious, useless answers”. A proper factor, for example, excludes the original number. Sometimes it excludes “1”, sometimes not. Depends on who’s writing the textbook. For another example, consider sets, which are collections of things. A subset is a collection of things all of which are already in a set. Every set is therefore a subset of itself. To be a proper subset, there has to be at least one thing in the original set that isn’t in the proper subset.
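To make the 5,280 example concrete, here is a brute-force sketch that lists its factors and then drops the trivial ones. The names `factors` and `nontrivial` are just for illustration. Python's built-in set comparisons happen to draw the same subset and proper-subset line the text describes.

```python
def factors(n):
    """All positive whole numbers that divide n evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]

all_factors = factors(5280)
proper = [d for d in all_factors if d != 5280]   # exclude the number itself
nontrivial = [d for d in proper if d != 1]       # some textbooks also exclude 1
print(len(all_factors), len(proper), len(nontrivial))   # 48 47 46

# <= asks "is a subset of"; < asks "is a proper subset of".
s = {1, 2, 3}
print(s <= s, s < s, {1, 2} < s)   # True False True
```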

## Orthogonal.

Orthogonal is another word for perpendicular. So why do we need another word for that?

It helps to think about why “perpendicular” is a useful way to organize things. For example, we can describe the directions to a place in terms of how far it is north-south and how far it is east-west, and describe how fast something is travelling in terms of its speed heading north or south and its speed heading east or west. We can separate the north-south motion from the east-west motion. If we’re lucky these motions separate entirely, and we turn a complicated two- or three-dimensional problem into two or three simpler problems. If they can’t be fully separated, they can often be largely separated. We turn a complicated problem into a set of simpler problems with a nice and easy part plus an annoying yet small hard part.

And this is why we like perpendicular directions. We can often turn a problem into several simpler ones describing each direction separately, or nearly so.

And now the amazing thing. We can separate these motions because the north-south and the east-west directions are at right angles to one another. But we can describe something that works like an angle between things that aren’t necessarily directions. For example, we can describe an angle between things like functions that have the same domain. And once we can describe the angle between two functions, we can describe functions that make right angles between each other.

This means we can describe functions as being perpendicular to one another. An example. On the domain of real numbers from -1 to 1, the function $f(x) = x$ is perpendicular to the function $g(x) = x^2$. And when we want to study a more complicated function we can separate the part that’s in the “direction” of f(x) from the part that’s in the “direction” of g(x). We can treat functions, even functions we don’t know, as if they were locations in space. And we can study and even solve for the different parts of the function as if we were pinning down the north-south and the east-west movements of a thing.
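Here is a numerical sketch. It assumes the usual way of defining the "angle" between functions on an interval, the inner product $\langle f, g \rangle = \int_{-1}^{1} f(x)\,g(x)\,dx$, which the text doesn't spell out. Two functions are orthogonal when that integral is zero, and for $x$ and $x^2$ it is $\int_{-1}^{1} x^3\,dx = 0$. The code approximates the integrals with a simple midpoint rule. It then splits $x + x^2$ into its "$x$ direction" and "$x^2$ direction" parts.

```python
def inner(f, g, a=-1.0, b=1.0, steps=10_000):
    """Approximate <f, g>, the integral of f(x) * g(x) from a to b, by the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) * g(a + (i + 0.5) * width)
                       for i in range(steps))

def f(x):
    return x

def g(x):
    return x ** 2

def one(x):
    return 1.0

print(inner(f, g))     # essentially 0: x and x^2 are orthogonal on [-1, 1]
print(inner(one, g))   # about 2/3: the constant 1 is not orthogonal to x^2

# Split mix(x) = x + x^2 into its part along f and its part along g.
def mix(x):
    return x + x ** 2

print(inner(mix, f) / inner(f, f), inner(mix, g) / inner(g, g))  # both about 1
```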

So if we want to study, say, how heat flows through a body, we can work out a series of “directions” for functions, and work out the flow in each of those “directions”. These don’t have anything to do with left-right or up-down directions, but the concepts and the convenience are similar.

I’ve spoken about this only in terms of functions, but that’s because functions are more familiar than many of the mathematical structures that have orthogonality. We can define the “angle” between things for many kinds of mathematical structures. Once we can do that, we can have “perpendicular” pairs of things.

Ah, but why call it “orthogonal” rather than “perpendicular”? And I don’t know. The best I can work out is that it feels weird to speak of, say, the cosine function being “perpendicular” to the sine function when you can’t really say either is in any particular direction. “Orthogonal” seems to appeal less directly to physical intuition while still meaning something. But that’s my guess, rather than the verdict of a skilled etymologist.

## N-tuple.

We use numbers to represent things we want to think about. Sometimes the numbers represent real-world things: the area of our backyard, the number of pets we have, the time until we have to go back to work. Sometimes the numbers mean something more abstract: an index of all the stuff we’re tracking, or how its importance compares to other things we worry about.

Often we’ll want to group together several numbers. Each of these numbers may measure a different kind of thing, but we want to keep straight what kind of thing it is. For example, we might want to keep track of how many people are in each house on the block. The houses have an obvious index number — the street number — and the number of people in each house is just what it says. So instead of just keeping track of, say, “32” and “34” and “36”, and “3” and “2” and “3”, we would keep track of pairs: “32, 3”, and “34, 2”, and “36, 3”. These are called ordered pairs.

They’re not called ordered because the numbers are in order. They’re called ordered because the order in which the numbers are recorded contains information about what the numbers mean. In this case, the first number is the street address, and the second number is the count of people in the house, and woe to our data set if we get that mixed up.

And there’s no reason the ordering has to stop at pairs of numbers. You can have ordered triplets of numbers — (32, 3, 2), say, giving the house number, the number of people in the house, and the number of bathrooms. Or you can have ordered quadruplets — (32, 3, 2, 6), say, house number, number of people, bathroom count, room count. And so on.

An n-tuple is an ordered set of some collection of numbers. How many? We don’t care, or we don’t care to say right now. There are two popular ways to pronounce it. One is to say it the way you say “multiple” only with the first syllable changed to “enn”. Others say it about the same, but with a long u vowel, so, “enn-too-pull”. I believe everyone worries that everyone else says it the other way and that they sound like they’re the weird ones.

You might care to specify what your n is for your n-tuple. In that case you can plug in a value for that n right in the symbol: a 3-tuple is an ordered triplet. A 4-tuple is that ordered quadruplet. A 26-tuple seems like rather a lot but I’ll trust that you know what you’re trying to study. A 1-tuple is just a number. We might use that if we’re trying to make our notation consistent with something else in the discussion.

If you’re familiar with vectors you might ask: so, an n-tuple is just a vector? It’s not quite. A vector is an n-tuple, but in the same way a square is a rectangle. It has to meet some extra requirements. To be a vector we have to be able to add corresponding numbers together and get something meaningful out of it. The ordered pair (32, 3) representing “32 blocks north and 3 blocks east” can be a vector. (32, 3) plus (34, 2) can give us (66, 5). This makes sense because we can say, “32 blocks north, 3 blocks east, 34 more blocks north, 2 more blocks east gives us 66 blocks north, 5 blocks east.” At least it makes sense if we don’t run out of city. But to add together (32, 3) plus (34, 2) meaning “house number 32 with 3 people plus house number 34 with 2 people gives us house number 66 with 5 people”? That’s not good, whatever town you’re in.
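In code the difference shows up as which ordered pairs get an addition. Here is a sketch using Python's `typing.NamedTuple`, where each position carries a name for what it means. The class names `Displacement` and `House` are made up for this example. Only the displacement gets component-wise addition.

```python
from typing import NamedTuple

class Displacement(NamedTuple):
    """An ordered pair that also behaves like a vector."""
    north: int  # blocks north
    east: int   # blocks east

    def __add__(self, other: "Displacement") -> "Displacement":
        # Component-wise addition is what makes this pair a vector.
        return Displacement(self.north + other.north, self.east + other.east)

class House(NamedTuple):
    """An ordered pair whose positions mean 'street number' and 'people living there'."""
    number: int
    people: int
    # No __add__ defined on purpose: summing street numbers means nothing.

trip = Displacement(32, 3) + Displacement(34, 2)
print(trip)  # Displacement(north=66, east=5)
```

Because `House` is still a tuple underneath, `House(32, 3) + House(34, 2)` quietly glues the two pairs together into `(32, 3, 34, 2)`, which is no more meaningful.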

I think the commonest use of n-tuples is to talk about vectors, though. Vectors are such useful things.

## Measure.

Before painting a room you should spackle the walls. This fills up small holes and cracks. My father is notorious for using enough spackle to appreciably diminish the room’s volume. (So says my mother. My father disagrees.) I put spackle on as if I were paying for it myself, using so little my father has sometimes asked when I’m going to put any on. I’ll get to mathematics in the next paragraph.

One of the natural things to wonder about a set — a collection of things — is how big it is. The “measure” of a set is how we describe how big a set is. If we’re looking at a set that’s a line segment within a longer line, the measure pretty much matches our idea of length. If we’re looking at a shape on the plane, the measure matches our idea of area. A solid in space we expect has a measure that’s like the volume.

We might say the cracks and holes in a wall are as big as the amount of spackle it takes to fill them. Specifically, we mean it’s the least bit of spackle needed to fill them. And similarly we describe the measure of a set in terms of how much it takes to cover it. We even call this “covering”.

We use the tool of “cover sets”. These are sets with a measure — a length, a volume, a hypervolume, whatever — that we know. If we look at regular old normal space, these cover sets are typically circles or spheres or similar nice, round sets. They’re familiar. They’re easy to work with. We don’t have to worry about how to orient them, the way we might if we had square or triangular covering sets. These covering sets can be as small or as large as you need. And we suppose that we have some standard reference. This is a covering set with measure 1, this with measure 1/2, this with measure 24, this with measure 1/72.04, and so on. (If you want to know what units these measures are in, they’re “units of measure”. What we’re interested in is unchanged whether we measure in “inches” or “square kilometers” or “cubic parsecs” or something else. It’s just longer to say.)

You can imagine this as a game. I give you a set; you try to cover it. You can cover it with circles (or spheres, or whatever fits the space we’re in) that are big, or small, or whatever size you like. You can use as many as you like. You can cover more than just the things in the set I gave you. The only absolute rule is you must not miss anything, even one point, in the set I give you. Find the smallest total area of the covering circles you use. That smallest total area that covers the whole set is the measure of that set. (Strictly, there may be no single smallest total, only a value the totals can get as close to as you like. That value is the measure.)

Generally, measure matches pretty well the intuitive feel we might have for length or area or volume. And the idea extends to things that don’t really have areas. For example, we can study the probability of events by thinking of the space of all possible outcomes of an experiment, like all the ways twenty coins might come up. We find the measure of the set of outcomes we’re interested in, like all the outcomes that have exactly ten tails. The probability of the outcome we’re interested in is the measure of the set we’re interested in divided by the measure of the set of all possible outcomes. (There’s more work to do to make this quite true. In an advanced probability course we do this work. Please trust me that we could do it if we had to. Also you see why we stride briskly past the discussion of units. What unit would make sense for measuring “the space of all possible outcomes of an experiment” anyway?)
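When there are finitely many equally likely outcomes, the "measure" can simply be a count of outcomes. That assumption fits fair coins but not every probability space. Under it, the chance of exactly ten tails among twenty coins is the number of such outcomes divided by all $2^{20}$ outcomes. The small four-coin case is listed out by brute force, to show the counting directly.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Twenty coins: count outcomes by formula rather than listing all of them.
all_outcomes = 2 ** 20
ten_tails = comb(20, 10)        # ways to choose which 10 of the 20 coins show tails
probability = Fraction(ten_tails, all_outcomes)
print(ten_tails, all_outcomes, float(probability))

# Four coins: small enough to list every outcome and count directly.
four_coins = list(product("HT", repeat=4))
two_tails = [outcome for outcome in four_coins if outcome.count("T") == 2]
print(len(two_tails), len(four_coins))   # 6 16
```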

But there are surprises. For example, there’s the Cantor set. The easiest way to make the Cantor set is to start with a line of length 1 (of measure 1) and take out the middle third. This produces two line segments of length, measure, 1/3 each. Take out the middle third of each of those segments. This leaves four segments each of length 1/9. Take out the middle third of each of those four segments, producing eight segments, and so on. If you do this infinitely many times you’ll create a set that has measure zero; it fills no volume, it has no length. And yet you can prove there are just as many points in this set as there are in the original line segment. Somehow merely having a lot of points doesn’t mean they fill space.
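The shrinking total length is easy to follow with exact fractions. This sketch carries out the first few middle-third removals. After step n there are $2^n$ segments with total length $(2/3)^n$, which heads to zero. The helper name `cantor_step` is hypothetical.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third from every closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

intervals = [(Fraction(0), Fraction(1))]
for n in range(1, 6):
    intervals = cantor_step(intervals)
    total = sum(b - a for a, b in intervals)
    print(n, len(intervals), total)   # total length is (2/3)**n
```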

Measure is useful not just because it can give us paradoxes like that. We often want to say how big sets, or subsets, of whatever we’re interested in are. And using measure lets us adapt things like calculus to become more powerful. We’re able to say what the integral is for functions that are much more discontinuous, more chopped up, than ones that high school or freshman calculus can treat, for example. The idea of measure takes length and area and such and makes it more abstract, giving it great power and applicability.

## Locus.

A locus is a collection of points that all satisfy some property. For example, the locus of points that are all equally distant from some center point is a circle. Or maybe it’ll be a sphere, or even a hypersphere. That depends whether we’re looking at points in a plane, in three-dimensional space, or something more. When we draw lines and parabolas and other figures like that in algebra we’re drawing locuses. Those locuses are the points that satisfy the property “the values of the coordinates of this point make that equation true”.
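A locus is just “keep the points that satisfy the property,” and that is easy to sketch in Python. We can't list every point in the plane, so this sketch assumes the candidate points are the integer grid from -6 to 6. The helper name `locus` is hypothetical. With the property “distance 5 from the origin,” what survives are the grid points on the circle of radius 5.

```python
def locus(candidates, has_property):
    """Keep exactly the candidate points that satisfy the property."""
    return [p for p in candidates if has_property(p)]


# Candidate points: the integer grid from -6 to 6 in each coordinate.
grid = [(x, y) for x in range(-6, 7) for y in range(-6, 7)]

# Property: distance from the origin is exactly 5, i.e. x**2 + y**2 == 25.
circle_points = locus(grid, lambda p: p[0] ** 2 + p[1] ** 2 == 25)
print(sorted(circle_points))
```

Swapping in a different property gives a different locus. For example, three coordinates with the condition equal to 25 give the grid points on a sphere.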

The idea is a bit different in connotation from “the curve of an equation”. We might not be talking about points that can be conveniently, or sensibly, described by an equation. We might want something like “the shape made by the reflection of this rectangle across this cylindrical mirror”. Or we might want “the points in space from which a space probe will crash into the moon, instead of crashing into Earth”. It’s convenient to have a shorthand way of talking about that idea. Using this word avoids necessarily tying ourselves to drawings or figures we might not be able to produce even in theory.

## Knot.

It’s a common joke that mathematicians shun things that have anything to do with the real world. You can see where the impression comes from, though. Even common mathematical constructs, such as “functions”, are otherworldly abstractions once a mathematician is done defining them precisely. It can look like mathematicians find real stuff to be too dull to study.

Knot theory goes against the stereotype. A mathematician’s knot is just about what you would imagine: threads of something that get folded and twisted back around themselves. Every now and then a knot theorist will get a bit of human-interest news going for the department by announcing a new way to tie a tie, or to tie a shoelace, or maybe something about why the Christmas tree lights get so tangled up. These are really parts of the field, and applications that almost leap off the page as one studies. It’s a bit silly, admittedly. The only way anybody needs to tie a tie is to go see my father and have him do it for you, and then just loosen and tighten the knot for the two or three times you’ll need it. And there are at most two ways of tying a shoelace anybody needs. Christmas tree lights are a bigger problem but nobody can really help with getting them untangled. But studying the field encourages a lot of sketches of knots, and they almost cry out to be done out of some real material.

One amazing thing about knots is that they can be described as mathematical expressions. There are multiple ways to encode a description for how a knot looks as a polynomial. An expression like $t + t^3 - t^4$ contains enough information to tell one knot apart from a great many others that might exist. (In this case it’s a very simple knot, one known as the right-hand trefoil knot. A trefoil knot is a knot with a trefoil-like pattern.) Indeed, it’s possible to describe knots with polynomials that let you distinguish between a knot and its mirror-image reflection.
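The polynomial $t + t^3 - t^4$ is the Jones polynomial of a trefoil. Which trefoil gets which polynomial depends on the sign convention. A standard property of the Jones polynomial is that a knot's mirror image has the same polynomial with $t$ replaced by $t^{-1}$. The Python sketch below stores a polynomial as an exponent-to-coefficient dictionary, a representation chosen just for this illustration. It shows that the substitution gives a different polynomial, which is how this polynomial tells the trefoil from its mirror image.

```python
def mirror(poly: dict[int, int]) -> dict[int, int]:
    """Replace t by 1/t: the term c * t**e becomes c * t**(-e)."""
    return {-exponent: coeff for exponent, coeff in poly.items()}


def evaluate(poly: dict[int, int], t: float) -> float:
    """Plug a number in for t."""
    return sum(coeff * t ** exponent for exponent, coeff in poly.items())


trefoil = {1: 1, 3: 1, 4: -1}     # t + t^3 - t^4
mirror_trefoil = mirror(trefoil)  # t^-1 + t^-3 - t^-4

print(mirror_trefoil)
print(trefoil == mirror_trefoil)  # False: the polynomials differ
print(evaluate(trefoil, 2.0), evaluate(mirror_trefoil, 2.0))
```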

Biology, life, is knots. The DNA molecules that carry and transmit genes tangle up on themselves, creating knots. The molecules that DNA encodes, proteins and enzymes and all the other basic tools of cells, can be represented as knots. Since at this level the field is about how molecules interact you probably would expect that much of chemistry can be seen as the ways knots interact. Statistical mechanics, the study of unspeakably large numbers of particles, does as well. A field you can be introduced to by studying your sneaker runs through the most useful arteries of science.

That said, mathematicians do make their knots of unreal stuff. The mathematical knot is, normally, a one-dimensional thread rather than a cylinder of stuff like a string or rope or shoelace. No matter; just imagine you’ve got a very thin string. And we assume that it’s frictionless; the knot doesn’t get stuck on itself. As a result a mathematician just learning knot theory would snootily point out that however tightly wound up your extension cord is, it’s not actually knotted. You could in principle push one of the ends of the cord all the way through the knot and so loosen it into an untangled string, if you could push the cord from one end and if the cord didn’t get stuck on itself. So, yes, real-world knots are mathematically not knots. After all, something that just falls apart with a little push hardly seems worth the name “knot”.

My point is that mathematically a knot has to be a closed loop. And it’s got to wrap around itself in some sufficiently complicated way. A simple circle of string is not a knot. If “not a knot” sounds a bit childish you might use instead the Lewis Carrollian term “unknot”.

We can fix that, though, using a surprisingly common mathematical trick. Take the shoelace or rope or extension cord you want to study. And extend it: draw lines from either end of the cord out to the edge of your paper. (This is a great field for doodlers.) And then pretend that the lines go out and loop around, touching each other somewhere off the sheet of paper, as simply as possible. What had been an unknot is now not an unknot. Study wisely.

## Jump discontinuity.

Analysis is one of the major subjects in mathematics. That’s the study of functions. These usually have numbers as the domain and the range. The domain and range might be the real numbers, or complex numbers, or they might be sets of real or complex numbers. But they’re all numbers. If you asked for an example of one of these functions you’d get something that looked more or less like a function out of high school.

Continuity is one of the things mathematicians look for in functions. To a mathematician continuity means almost what you’d imagine from the everyday definition of the term. You could draw a sketch of a continuous function without having to lift your pen off the paper. (Typically. If you want to, you can define functions that meet the proper mathematical definition of “continuous” but that you really can’t draw. Mathematicians use these functions to keep one another humble.)

Continuous functions tend to be nice ones to work with. Continuity usually makes it easier to prove a function has whatever other properties you’d like. Mathematicians will even talk about continuous functions as being nice and well-behaved and even normal, as though the functions being easier to work with bestowed on them some moral virtue. However, not every function is continuous. Properly speaking, most functions aren’t continuous. This is the same way that most numbers aren’t whole numbers.

There are different ways that a function can be discontinuous. One of the easiest to understand and to work with is called a jump discontinuity. If you draw a plot representing a function with a jump discontinuity, it looks rather like the plot of a nice, well-behaved, continuous function except that at the discontinuity it jumps. From one side of the discontinuity to the other the function suddenly hops upward, or drops downward.

If a function only has jump discontinuities we aren’t badly off. We can write a function with jump discontinuities as the sum of a continuous function and a function made up only of jumps. The continuous function will be easy to work with, since it’s continuous. The function made of jumps isn’t continuous, by definition, but it’s going to be “flat” — it’ll have the same value in-between any two jumps. That’s usually easy to work with, and while the details of these jump functions will be different they’ll all look about the same. They’ll have different heights and jump up or down at different points, but if you know how to understand a function that jumps from being equal to 0 to being equal to 1 when the input goes from just below to just above 2, then you know how to understand a function that jumps from being equal to 0 to being equal to 3 when the input goes from just below 2.5 to just above 2.5.
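The decomposition can be sketched in Python for a hypothetical function: $x^2$ plus a jump of height 3 at $x = 2.5$, echoing the example above. The helper names are made up for this illustration. Subtracting the flat jump function leaves something that no longer leaps at 2.5.

```python
def make_jump_function(jumps):
    """Build a step function from (location, height) pairs: it is flat
    between jumps and rises by `height` once the input reaches `location`."""
    def jump_part(x):
        return sum(height for location, height in jumps if x >= location)
    return jump_part


# A hypothetical function with one jump: x**2, plus 3 once x reaches 2.5.
def f(x):
    return x ** 2 + (3 if x >= 2.5 else 0)


jump_part = make_jump_function([(2.5, 3)])


def continuous_part(x):
    """What is left after subtracting the jumps from f."""
    return f(x) - jump_part(x)


eps = 1e-9
print(f(2.5 - eps), f(2.5 + eps))                              # jumps by about 3
print(continuous_part(2.5 - eps), continuous_part(2.5 + eps))  # nearly equal
```

A function with several jumps works the same way: list every (location, height) pair and the step function absorbs them all.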

This won’t let us work with every function. Most functions are going to be discontinuous in ways that we can’t resolve with jump functions. But a lot of the functions we’re naturally interested in, because they model interesting problems, can be. And so we can divide tricky functions into sets of functions that are easier to deal with.

## Into.

The definition of “into” will call back to my A to Z piece on “bijections”. It particularly calls on what mathematicians mean by a function. When a mathematician talks about a function she means a combination of three components. The first is a set called the domain. The second is a set called the range. The last is a rule that matches up things in the domain to things in the range.

We said the function was “onto” if absolutely everything which was in the range got used. That is, if everything in the range has at least one thing in the domain that the rule matches to it. The function that has domain of -3 to 3, and range of -27 to 27, and the rule that matches a number $x$ in the domain to the number $x^3$ in the range is “onto”.
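A computer can't list every real number from -3 to 3, so this Python sketch makes an assumption: it uses finite sets of integers instead. A function is treated as a domain, a range, and a rule, and the helper names are hypothetical. With integers, cubing is *not* onto the integers from -27 to 27, since 2, for example, is nobody's cube. That differs from the real-number case in the text, where every number from -27 to 27 has a cube root between -3 and 3.

```python
def image(domain, rule):
    """Everything in the range that the rule actually reaches."""
    return {rule(x) for x in domain}


def is_function(domain, rng, rule):
    """The rule must send every domain element to something in the range."""
    return image(domain, rule) <= set(rng)


def is_onto(domain, rng, rule):
    """Onto: every element of the range is matched by some domain element."""
    return set(rng) <= image(domain, rule)


def cube(x):
    return x ** 3


domain = range(-3, 4)                   # the integers -3, -2, ..., 3
small_range = {x ** 3 for x in domain}  # {-27, -8, -1, 0, 1, 8, 27}
big_range = range(-27, 28)              # every integer from -27 to 27

print(is_function(domain, big_range, cube), is_onto(domain, big_range, cube))
print(is_onto(domain, small_range, cube))
```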

## Hypersphere.

If you asked someone to say what mathematicians do, there are, I think, three answers you’d get. One would be “they write out lots of decimal places”. That’s fair enough; that’s what numerical mathematics is about. One would be “they write out complicated problems in calculus”. That’s also fair enough; say “analysis” instead of “calculus” and you’re not far off. The other answer I’d expect is “they draw really complicated shapes”. And that’s geometry. All fair enough; this is stuff real mathematicians do.

Geometry has always been with us. You may hear jokes about never using algebra or calculus or such in real life. You never hear that about geometry, though. The study of shapes and how they fill space is so obviously useful that you sound like a fool saying you never use it. That would be like claiming you never use floors.

There are different kinds of geometry, though. The geometry we learn in school first is usually plane geometry, that is, how shapes on a two-dimensional surface like a sheet of paper or a computer screen work. Here we see squares and triangles and trapezoids and theorems with names like “side-angle-side congruence”. The geometry we learn as infants, and perhaps again in high school, is solid geometry, how shapes in three-dimensional spaces work. Here we see spheres and cubes and cones and something called “ellipsoids”. And there’s spherical geometry, the way shapes on the surface of a sphere work. This gives us great circle routes and loxodromes and tales of land surveyors trying to work out what Vermont’s northern border should be.
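As one taste of spherical geometry, this Python sketch computes the length of a great circle route between two points given by latitude and longitude. It uses the haversine formula, a standard way to do this calculation. The function name and the choice of degrees for input are assumptions of this sketch.

```python
import math


def great_circle_distance(lat1, lon1, lat2, lon2, radius=1.0):
    """Length of the shortest path along a sphere's surface between two
    points given in degrees, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing h a hair above 1.
    return 2 * radius * math.asin(min(1.0, math.sqrt(h)))


# On a unit sphere, a point on the equator is a quarter circle from the pole.
print(great_circle_distance(0, 0, 90, 0))  # about 1.5708, i.e. pi/2
```

Pass Earth's radius as `radius` to get distances in the same units as that radius.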

## Graph. (As in Graph Theory)

When I started this A to Z I figured it would be a nice string of quick, two-to-four paragraph compositions. So far each one has been a monster post instead. I’m hoping to get to some easier ones. For today I mean to talk about a graph, as in graph theory. That’s not the kind of graph that’s a plot of some function or a pie chart or a labelled map or something like that.

This kind of graph we do study as pictures, though. Specifically, they’re pictures with two essential pieces: a bunch of points or dots and a bunch of curves connecting dots. The dots we call vertices. The curves we call edges. My mental model tends to be of a bunch of points in a pegboard connected by wire or string. That might not work for you, but the idea of connecting things is what graphs, and graph theory, are good for studying.
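The pegboard picture translates into code directly. This Python sketch stores a hypothetical four-vertex graph as adjacency sets: each vertex maps to the vertices it is connected to. It then checks a basic fact. Every edge has two ends, so the numbers of edges at each vertex (the degrees) add up to twice the number of edges. The sketch assumes there are no repeated edges and no edge looping from a vertex back to itself.

```python
from collections import defaultdict


def build_graph(edges):
    """Adjacency sets: each vertex maps to the set of vertices it connects to."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)  # an edge connects both ways
        adjacency[v].add(u)
    return adjacency


# A hypothetical pegboard: four pegs, four strings.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
graph = build_graph(edges)

degrees = {vertex: len(neighbors) for vertex, neighbors in graph.items()}
print(degrees)
# Each edge has two ends, so the degrees add up to twice the number of edges.
print(sum(degrees.values()) == 2 * len(edges))
```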

## Fallacy.

Mathematics is built out of arguments. These are normally logical arguments, sequences of things which we say are true. We know they’re true because either they start from something we assume to be true or because they follow from logical deduction from things we assumed were true. Even calculations are a string of arguments. We start out with an expression we’re interested in, and do things which change the way it looks but which we can prove don’t change whether it’s true.

A fallacy is an argument that isn’t deductively sound. By deductively sound we mean that the premises we start with are true, and the reasoning we follow obeys the rules of deductive logic (omitted for clarity). If we’ve done that, then the conclusion at the end of the reasoning is, and must be, true.
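One half of soundness can be checked by machine for simple propositional arguments: does the reasoning follow the rules? The other half, whether the premises are true, can't. This Python sketch brute-forces a truth table and looks for an assignment that makes every premise true and the conclusion false. It compares modus ponens with affirming the consequent, a classic fallacy. The helper names are made up for this illustration.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'if p then q' is false only when p is true and q false."""
    return (not p) or q


def is_valid(premises, conclusion, n_vars):
    """An argument form is valid if no assignment of truth values makes
    every premise true while the conclusion is false."""
    for values in product([True, False], repeat=n_vars):
        if all(premise(*values) for premise in premises) and not conclusion(*values):
            return False  # found a counterexample
    return True


# Modus ponens: if P then Q; P; therefore Q.
modus_ponens = is_valid([lambda p, q: implies(p, q), lambda p, q: p], lambda p, q: q, 2)

# Affirming the consequent: if P then Q; Q; therefore P.
affirming_consequent = is_valid([lambda p, q: implies(p, q), lambda p, q: q], lambda p, q: p, 2)

print(modus_ponens, affirming_consequent)  # True False
```

The counterexample to affirming the consequent is P false and Q true: both premises hold, yet the conclusion fails.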

## Error

This is one of my A to Z words that everyone knows. An error is some mistake, evidence of our human failings, to be minimized at all costs. That’s … well, it’s an attitude that doesn’t let you use error as a tool.

An error is the difference between what we would like to know and what we do know. Usually, what we would like to know is something hard to work out. Sometimes it requires complicated work. Sometimes it requires an infinite amount of work to get exactly right. Who has the time for that?

This is how we use errors. We look for methods that approximate the thing we want, and that estimate how much of an error that method makes. Usually, the method involves doing some basic step some large number of times. And usually, if we did the step more times, the estimate of the error we make will be smaller. My essay “Calculating Pi Less Terribly” shows an example of this. If we add together more terms from that Leibniz formula we get a running total that’s closer to the actual value of π.
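This Python sketch uses the Leibniz formula, $\pi = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\right)$, to show a method paired with an error estimate. The terms alternate in sign and shrink, so after adding $n$ terms the error is no bigger than the first term left out, $\frac{4}{2n+1}$. The function names are illustrative only.

```python
import math


def leibniz_pi(n_terms):
    """4 * (1 - 1/3 + 1/5 - ...), summing the first n_terms terms."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total


def error_bound(n_terms):
    """For an alternating series with shrinking terms, the error is at most
    the size of the first term left out."""
    return 4 / (2 * n_terms + 1)


for n in (10, 100, 1000):
    approx = leibniz_pi(n)
    print(n, approx, abs(math.pi - approx), error_bound(n))
```

The bound is what makes the error a tool. It tells us how many terms we need before we start adding, without knowing π in advance.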

## A Summer 2015 Mathematics A To Z: dual

And now to start my second week of this summer mathematics A to Z challenge. This time I’ve got another word that just appears all over the mathematics world.

## Dual.

The word “dual” turns up in a lot of fields. The details of what the dual is depend on which field of mathematics we’re talking about. But the general idea is the same. Start with some mathematical construct. The dual is some new mathematical thing, which is based on the thing you started with.

For example, for the box (or die) you create the dual this way. At the center of each of the flat surfaces (the faces, in the lingo) put a dot. That’s a corner (a vertex) of a new shape. You should have six of them when you’re done. Now imagine drawing in new edges between the corners. The rule is that you put an edge in from one corner to another only if the surfaces those corners come from were adjacent. And on your new shape you put in a surface, a face, for each corner of the old shape, spanning the new corners that came from the faces meeting at that old corner. If you’ve done this right, you should get out of it an eight-sided shape, with triangular surfaces, and six corners. It’s known as an octahedron, although you might know it better as an eight-sided die.
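The recipe can be followed purely by bookkeeping, with no drawing needed. This Python sketch labels the cube's corners by coordinates that are each 0 or 1, a labeling chosen for the illustration. Each face is the set of corners that share a fixed coordinate. The sketch then builds the dual's corners, edges, and faces by the rules above.

```python
from itertools import combinations, product

# Corners of a cube: every point whose coordinates are each 0 or 1.
cube_vertices = list(product((0, 1), repeat=3))

# Each face of the cube holds one coordinate fixed at 0 or at 1.
cube_faces = [
    frozenset(v for v in cube_vertices if v[axis] == side)
    for axis in range(3)
    for side in (0, 1)
]

# Dual corners: one per cube face.
dual_vertices = cube_faces

# Dual edges: join two faces that are adjacent, i.e. share a whole cube edge
# (two corners). Opposite faces share no corners and get no edge.
dual_edges = [(f, g) for f, g in combinations(cube_faces, 2) if len(f & g) == 2]

# Dual faces: one per cube corner, made of the faces that meet at that corner.
dual_faces = [frozenset(f for f in cube_faces if v in f) for v in cube_vertices]

print(len(dual_vertices), len(dual_edges), len(dual_faces))  # 6 12 8
print({len(face) for face in dual_faces})                    # {3}: all triangles
```

Six corners, twelve edges, and eight triangular faces are the counts of an octahedron.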

## Characteristic function. (Not the probability one.)

Today’s entry in my mathematical A-To-Z challenge is easier than the bijection function was. This is the characteristic function. Its domain is any set, any collection of things you like. This can be real numbers, it can be regions of space, it can be houses in a neighborhood. Its range, however, is just the two numbers 0 and 1. Its rule — well, that’s the trick. It’s not right to say there’s “the” characteristic function. There are many characteristic functions. It’s just they all look alike. This is the way they look.

To define a characteristic function we need some subset of the domain. A subset is just a collection of things that are also all in another set. So we want a subset, let me give it the name D, of the domain. This subset D can have one or a couple of things in it; it could have everything in the domain in it. The one rule is that D can’t have something in it which isn’t also in the domain. Otherwise, anything goes. (It’s even fine if D doesn’t have anything in it.)

Now, the rule for the characteristic function for D is that the function for any given item in the domain (use x as a name for that) is equal to 1 if x is in D, and is equal to 0 if x is not in D. The function is usually written as the Greek letter chi ($\chi$), or the letter I, or the number 1 put in some kind of fancy heavy font, with the D as a subscript so we know which characteristic function it is.

For example. Suppose the domain is the counting numbers. Suppose the subset D is the prime numbers: 2, 3, 5, 7, 11, 13, and so on. Then the characteristic function looks like this:

| For the number $x$ | $\chi_D(x)$ is |
|---|---|
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |
| 5 | 1 |
| 6 | 0 |
| 7 | 1 |
| 8 | 0 |
| 9 | 0 |
| 10 | 0 |

… and so on.
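Here is a minimal Python sketch of the table, assuming the subset D is described by a membership test. Here that test is a simple trial-division prime check. The names `characteristic_function` and `is_prime` are made up for this illustration. Adding up the characteristic function's values counts how many things are in D.

```python
def characteristic_function(in_subset):
    """Return chi_D: 1 for items in the subset D, 0 for everything else.
    The subset is described by a membership test."""
    return lambda x: 1 if in_subset(x) else 0


def is_prime(n):
    """Trial division; fine for small counting numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


chi_primes = characteristic_function(is_prime)
print([chi_primes(x) for x in range(1, 11)])  # [0, 1, 1, 0, 1, 0, 1, 0, 0, 0]
# Adding up the characteristic function counts the subset: 25 primes up to 100.
print(sum(chi_primes(x) for x in range(1, 101)))
```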

Some might ask why create, much less care about, such a boring function? These are people who’ve never had to count how many rows on a large spreadsheet satisfied some complicated set of conditions. That trick where you create a column with a rule like `IF((C$2 > 80 AND C$2 < 90) AND (C$4 > '01/01/2013' AND C$4 < '05/01/2013'), 1, 0)`, and then add up the column, to find out how many things had a value between 80 and 90 and a date between the start of January and the start of May, 2013? That’s using a characteristic function to figure out how large a collection of things is.
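The spreadsheet trick looks like this in Python. The rows are invented for the illustration, each holding a value and a date. The function `chi` plays the role of the helper column's formula, and adding up the column counts the rows that meet every condition.

```python
from datetime import date

# Hypothetical spreadsheet rows: (value, date).
rows = [
    (85, date(2013, 2, 14)),
    (92, date(2013, 3, 1)),
    (81, date(2013, 6, 5)),
    (88, date(2013, 4, 30)),
    (70, date(2013, 1, 20)),
]


def chi(row):
    """1 if the value is between 80 and 90 and the date is between the
    start of January and the start of May, 2013; otherwise 0."""
    value, when = row
    in_value_range = 80 < value < 90
    in_date_range = date(2013, 1, 1) < when < date(2013, 5, 1)
    return 1 if in_value_range and in_date_range else 0


helper_column = [chi(row) for row in rows]
print(helper_column)       # [1, 0, 0, 1, 0]
print(sum(helper_column))  # 2 rows meet every condition
```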

Characteristic functions offer ways of breaking down a complicated set into smaller ones all of which share some property. This can be used just to work out how large are the collections of things that share different properties. It can also be a way to break a big problem into multiple smaller problems. We hope those smaller problems are simple enough that we’re making overall less work for ourselves despite increasing the number of problems. And that’s a good trick, one mathematicians rely on a lot.

## Bijection.

To explain this second term in my mathematical A to Z challenge I have to describe yet another term. That’s function. A non-mathematician’s idea of a function is something like “a line with a bunch of x’s in it, and maybe also a cosine or something”. That’s fair enough, although it’s a bit like defining chemistry as “mixing together colored, bubbling liquids until something explodes”.

By a function a mathematician means a rule describing how to pair up things found in one set, called the domain, with the things found in another set, called the range. The domain and the range can be collections of anything. They can be counting numbers, real numbers, letters, shoes, even collections of numbers or sets of shoes. They can be the same kinds of thing. They can be different kinds of thing.

## A Summer 2015 Mathematics A To Z: ansatz

Sue Archer at the Doorway Between Worlds blog recently completed an A to Z challenge. I decided to follow her model and take up the challenge, and intend to do a little tour of some mathematical terms through the alphabet. My intent is to focus on some that are interesting terms of art that I feel non-mathematicians never hear. Or that they never hear clearly. Indeed, my first example is one I’m not sure I ever heard clearly described.

## Ansatz.

I first encountered this term in grad school. I can’t tell you when. I just realized that every couple sessions in differential equations the professor mentioned the ansatz for this problem. By then it felt too late to ask what it was I’d missed. In hindsight I’m not sure the professor ever made it clear. My research suggests the word is still a dialect rather than part of the universal language of mathematicians, and that it isn’t quite precisely defined.

What a mathematician means by the “ansatz” is the collection of ideas that go into solving a problem. This may be an assumption of what the solution should look like. This might be the assumptions of physical or mathematical properties a solution has to have. This might be a listing of properties that a valid solution would have to have. It could be the set of things you judge should be included, or ignored, in constructing a mathematical model of something. In short the ansatz is the set of possibly ad hoc assumptions you have to bring to a topic to make it something answerable. It’s different from the axioms of the field or the postulates for a problem. An axiom or postulate is assumed to be true by definition. The ansatz is a bunch of ideas we suppose are true because they seem likely to bring us to a solution.

An ansatz is good for getting an answer. It doesn’t do anything to verify that the answer means anything, though. The ansatz contains assumptions you the mathematician brought to the problem. You need to argue that the assumptions are reasonable, and reflect the actual problem you’re studying. You also should prove that the answer ultimately derived matches the actual behavior of whatever you were studying. Validating a solution can be the hardest part of mathematics, other than all the other parts of mathematics.
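A textbook example shows both halves: getting an answer, then checking it. For the equation $y'' - 3y' + 2y = 0$, chosen only for illustration, the ansatz $y = e^{rt}$ turns the differential equation into $r^2 - 3r + 2 = 0$, with roots 1 and 2. This Python sketch finds those roots. It then checks each candidate separately by estimating the equation's left side with finite differences, which shows that a guess outside the ansatz fails the check. The function names and step size are assumptions of the sketch.

```python
import math


def ansatz_exponents(a, b, c):
    """Plugging y = exp(r*t) into a*y'' + b*y' + c*y = 0 leaves
    a*r**2 + b*r + c = 0; return its roots (real, distinct case only)."""
    disc = b * b - 4 * a * c
    if disc <= 0:
        raise ValueError("this sketch only handles two distinct real roots")
    s = math.sqrt(disc)
    return ((-b - s) / (2 * a), (-b + s) / (2 * a))


def residual(y, t, a, b, c, h=1e-4):
    """Check a candidate: estimate a*y'' + b*y' + c*y at t with
    central finite differences. Near zero means the candidate works."""
    d1 = (y(t + h) - y(t - h)) / (2 * h)
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / h ** 2
    return a * d2 + b * d1 + c * y(t)


a, b, c = 1, -3, 2  # y'' - 3y' + 2y = 0
roots = ansatz_exponents(a, b, c)
print(roots)  # (1.0, 2.0)

for r in roots:
    print(r, residual(lambda t: math.exp(r * t), 0.5, a, b, c))  # tiny
# A guess outside the ansatz fails the check.
print(residual(lambda t: t ** 2, 0.5, a, b, c))  # about -0.5
```

Passing the check at one point is evidence, not proof. Validating the solution properly still means showing it works everywhere and matches the problem being modeled.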