Some context. Since 2015 I’ve run a series of A-to-Z essays. This is writing a short glossary about various mathematics terms. Most years they were a complete pass through the alphabet, with some fudging to allow for there being fewer terms that start with ‘X’ than you’d think. For 2021, I changed the format up a little, writing instead one for each letter in the phrase “Mathematics A to Z”. You can read all 198 of the completed essays at this link.

Hard as I thought 2021 was, 2022 has been worse, in wearing down my energy and enthusiasm for big projects. Or projects at all. And an A-to-Z is a big, worthwhile but exhausting project. I am already too late in the year to run one essay a week and complete the full alphabet. Given the lead times necessary even a partial alphabet would be hard to fit in. (I don’t want to run these projects across years, although I accepted that for the 2021 project.) In 2015 and 2016 I was able to write three essays a week, a commitment of time and energy I can’t imagine making now.

So rather than sit idly hoping something will turn up, I accept the situation. I am not up for an A-to-Z this year. I hope that 2023 will be a happier year, one in which I can do a project of that magnitude. I suppose we all hope for that.

I try, at the end of each of these A-to-Z sessions, to think about what I’ve learned from the experience. The challenge is reliably interesting, thanks to the kind readers who suggest topics. While I reserve the right to choose my own subject for any letter, I usually go for what of the suggestions sounds most interesting. That nudges me out of my comfortable, familiar thoughts and into topics I know less well. I would never have written about cohomologies if I waited to think I had something to say about them.

I didn’t have any deep experiences like that this time, although I did get a better handle on tangent spaces and why we like them. Most of what I did learn was about process, and about how to approach writing here.

For example, I started appealing for topics more letters ahead than I had previous projects. The goal was to let myself build a reserve, so that I would have a week or more to let an essay sit while I re-thought what I’d said. Early on, this worked well and I liked the results. It also made it easier to tie essays together; multiplication and addition could complement one another. This is something I could expand on.

And varying from the strict alphabetical order seems to have worked too. The advantage of doing every letter in order is that I’m pushed into some unpromising letters, like ‘Q’ or ‘Y’. It’s fantastic when I get a good essay out of that. But that’s harder work. This time around I did three topics starting with A, and three with T, and there’s so many more I could write.

The biggest and hardest thing I learned was related to how my plans went awry. How I lost the several-weeks lead time I started with, and how I had to put the project on hold for nearly three months.

2021 was a hard year, after another hard year, after a succession of hard years. Mostly, these were hard years because the world had been hard. Wearying, which is why I started out doing a mere 15 essays instead of the full 26. But not things that too directly hit my personal comfort. During the Little 2021 A-to-Z, though, the hard got intimate. Personal disasters hit starting in mid-August, and kept progressing — or dragging out — through to the new year. Just in time for the world-hardness of the first Omicron wave of the pandemic.

I have always thought of myself as a Sabbath-is-made-for-Man person. That is, schedules are ways to help you get done what you want or need; they’re not of value in themselves. Yet I do value them. I like their hold, and I thrive within them. Part of my surviving the pandemic, when all normal activities stopped, was the schedule of things I write here and on my humor blog. They offered a reason to do something particular. If I were not living up to this commitment, then what was I doing?

The answer is I would be not stressing myself past what I can do. I like these A-to-Z essays, and all the writing I do, or I wouldn’t do it. It’s nourishing and often exciting. But it is labor, and it is stress. Exercising a bit longer or a bit harder than one feels able to helps one build endurance and strength. But there are times one’s muscles are exhausted, or one’s joints are worked too much, and you must rest. Not just stick to the routine exercise, but take a break so that you can recover. I had not taken a serious break since starting this blog, and hadn’t realized I would need to. Over the course of this A-to-Z I learned I sometimes need to, and I should.

I need also to think of what I will do next. I’m not sure when I will feel confident that I can do a full A-to-Z, or even a truncated version. My hunch is I need to do more mathematical projects here that are fun and playful. This implies thinking of fun and playful projects, and thinking is the hard part again. But I understand, in a way I had not before, that I can let go.

It’s good to have an index of the topics I wrote about for each of my A-to-Z sequences. It’s good for me, at least. It makes my future work much easier. And it might help people find past essays. I hope to have my essay about what I learned from a project that was supposed to be nearly one-third shorter, and ended up sprawling past its designated year, next week.

The joke to which I alluded last week was a quick pun. The setup is, “What is yellow and equivalent to the Axiom of Choice?” It’s the topic for this week, and the conclusion of the Little 2021 Mathematics A-to-Z. I again thank Mr Wu, of Singapore Maths Tuition, for a delightful topic.

The lemma is about partially ordered sets. A set’s partially ordered if it has a relationship between pairs of items in it. You will sometimes see a partially ordered set called a “poset”, a term of mathematical art which make me smile too. If we don’t know anything about the ordering relationship we’ll use the ≤ symbol, just like this was ordinary numbers. To be partially ordered, whenever x ≤ y and y ≤ x, we know that x and y must be equal. And the converse: if x = y then x ≤ y and y ≤ x. What makes this partial is that we’re not guaranteed that every x and y relate in some way. It’s a totally ordered set if we’re guaranteed that at least one of x ≤ y and y ≤ x is always true. And then there is such a thing as a well-ordered set. This is a totally ordered set for which every subset (unless it’s empty) has a minimal element.

If we have a couple elements, each of which we can put in some order, then we can create a chain. If x ≤ y and y ≤ z, then we can write x ≤ y ≤ z and we have at least three things all relating to one another. This seems like stuff too basic to notice, if we think too literally about the relationship being “is less than or equal to”. If the relationship is, say, “divides wholly into”, then we get some interesting different chains. Like, 2 divides into 4, which divides into 8, which divides into 24. And 3 divides into 6 which divides into 24. But 2 doesn’t divide into 3, nor 3 into 2. 4 doesn’t divide into 6, nor 6 into either 8 or 4.

So what Zorn’s Lemma says is, if all the chains in a partially ordered set each have an upper bound, then, the partially ordered set has a maximal element. “Maximal element” here means an element that doesn’t have a bigger comparable element. (That is, m is maximal if there’s no other element b for which m ≤ b. It’s possible that m and b can’t be compared, though, the way 6 doesn’t divide 8 and 8 doesn’t divide 6.) This is a little different from a “maximum” . It’s possible for there to be several maximal elements. But if you parse this as “if you can always find a maximum in a string of elements, there’s some maximum element”? And remember there could be many maximums? Then you’re getting the point.

You may also ask how this could be interesting. Zorn’s Lemma is an existence proof. Most existence proofs assure us a thing we thought existed does, but don’t tell us how to find it. This is all right. We tend to rely on an existence proof when we want to talk about some mathematical item but don’t care about fussy things like what it is. It is much the way we might talk about “an odd perfect number N”. We can describe interesting things that follow from having such a number even before we know what value N has.

A classic example, the one you find in any discussion of using Zorn’s Lemma, is about the basis for a vector space. This is like deciding how to give directions to a point in space. But vector spaces include some quite abstract things. One vector space is “the set of all functions you can integrate”. Another is “matrices whose elements are all four-dimensional rotations”. There might be literally infinitely many “directions” to go. How do we know we can find a set of directions that work as well as, for guiding us around a city, the north-south-east-west compass rose does? So there’s the answer. There are other things done all the time, too. A nontrivial ring-with-identity, for example, has to have a maximal ideal. (An ideal is a subset of the ring that’s still a ring.) This is handy to know if you’re working with rings a lot.

The joke in my prologue was built on the claim Zorn’s Lemma is equivalent to the Axiom of Choice. The Axiom of Choice is a piece of set theory that surprised everyone by being independent of the Zermelo-Fraenkel axioms. The Axiom says that, if you have a collection of disjoint nonempty sets, then there must exist at least one set with exactly one element from each of those sets. That is, you can pick one thing out of each of a set of bins. It’s easy to see how this has in common with Zorn’s Lemma being too obvious to imagine proving. That’s the sort of thing that makes a good axiom. Thing about a lemma, though, is we do prove it. That’s how we know it’s a lemma. How can a lemma be equivalent to an axiom?

I’l argue by analogy. In Euclidean geometry one of the axioms is this annoying statement about on which side of a line two other lines that intersect it will meet. If you have this axiom, you can prove some nice results, like, the interior angles of a triangle add up to two right angles. If you decide you’d rather make your axiom that bit about the interior angles adding up? You can go from that to prove the thing about two lines crossing a third line.

So it is here. If you suppose the Axiom of Choice is true, you can get Zorn’s Lemma: you can pick an element in your set, find a chain for which that’s the minimum, and find your maximal element from that. If you make Zorn’s Lemma your axiom? You can use x ≤ y to mean “x is a less desirable element to pick out of this set than is y”. And then you can choose a maximal element out of your set. (It’s a bit more work than that, but it’s that kind of work.)

There’s another theorem, or principle, that’s (with reservations) equivalent to both Zorn’s Lemma and the Axiom of Choice. It’s another piece that seems so obvious it should defy proof. This is the well-ordering theorem, which says that every set can be well-ordered. That is, so that every non-empty subset has some minimum element. Finally, a mathematical excuse for why we have alphabetical order, even if there’s no clear reason that “j” should come after “i”.

(I said “with reservations” above. This is because whether these are equivalent depends on what, precisely, kind of deductive logic you’re using. If you are not using ordinary propositional logic, and are using a “second-order logic” instead, they differ.)

Ermst Zermelo introduced the Axiom of Choice to set theory so that he could prove this in a way that felt reasonable. I bet you can imagine how you’d go from “every non-empty set has a minimum element” right back to “you can always pick one element of every set”, though. And, maybe if you squint, can see how to get from “there’s always a minimum” to “there has to be a maximum”. I’m speaking casually here because proving it precisely is more work than we need to do.

I mentioned how Zorn did not name his lemma after himself. Mathematicians typically don’t name things for themselves. Nor did he even think of it as a lemma. His name seems to have adhered to the principle in the late 30s. Credit the nonexistent mathematician Bourbaki writing about “le théorème de Zorn”. By 1940 John Tukey, celebrated for the Fast Fourier Transform, wrote of “Zorn’s Lemma”. Tukey’s impression was that this is how people in Princeton spoke of it at the time. He seems to have been the first to put the words “Zorn’s Lemma” in print, though. Zorn isn’t the first to have stated this. Kazimierez Kuratowski, in 1922, described what is clearly Zorn’s Lemma in a different form. Zorn remembered being aware of Kuratowski’s publication but did not remember noticing the property. The Hausdorff Maximal Principle, of Felix Hausdorff, has much the same content. Zorn said he did not know about Hausdorff’s 1927 paper until decades later.

Zorn’s lemma, the Axiom of Choice, the well-ordering theorem, and Hausdorff’s Maximal Principle all date to the early 20th century. So do a handful of other ideas that turn out to be equivalent. This was an era when set theory saw an explosive development of new and powerful ideas. The point of describing this chain is to emphasize that great concepts often don’t have a unique presentation. Part of the development of mathematics is picking through several quite similar expressions of a concept. Which one do we enshrine as an axiom, or at least the canonical presentation of the idea?

Mr Wu, my Singapore Maths Tuition friend, has offered many fine ideas for A-to-Z topics. This week’s is another of them, and I’m grateful for it.

Ordinary Differential Equations

As a rule, if you can do something with a number, you can do the same thing with a function. Not always, of course, but the exceptions are fewer than you might imagine. I’ll start with one of those things you can do to both.

A powerful thing we learn in (high school) algebra is that we can use a number without knowing what it is. We give it a name like ‘x’ or ‘y’ and describe what we find interesting about it. If we want to know what it is, we (usually) find some equation or set of equations and find what value of x could make that true. If we study enough (college) mathematics we learn its equivalent in functions. We give something a name like f or g or Ψ and describe what we know about it. And then try to find functions which make that true.

There are a couple common types of equation for these not-yet-known functions. The kind you expect to learn as a mathematics major involves differential equations. These are ones where your equation (or equations) involve derivatives of the not-yet-known f. A derivative describes the rate at which something changes. If we imagine the original f is a position, the derivative is velocity. Derivatives can have derivatives also; this second derivative would be the acceleration. And then second derivatives can have derivatives also, and so on, into infinity. When an equation involves a function and its derivatives we have a differential equation.

(The second common type is the integral equation, using a function and its integrals. And a third involves both derivatives and integrals. That’s known as an integro-differential equation, and isn’t life complicated enough? )

Differential equations themselves naturally divide into two kinds, ordinary and partial. They serve different roles. Usually an ordinary differential equation we can describe the change for from knowing only the current situation. (This may include velocities and accelerations and stuff. We could ask what the velocity at an instant means. But never mind that here.) Usually a partial differential equation bases the change where you are on the neighborhood of where your location. If you see holes you can pick in that, you’re right. The precise difference is about the independent variables. If the function f has more than one independent variable, it’s possible to take a partial derivative. This describes how f changes if one variable changes while the others stay fixed. If the function f has only the one independent variable, you can only take ordinary derivatives. So you get an ordinary differential equation.

But let’s speak casually here. If what you’re studying can be fully represented with a dashboard readout? Like, an ordered list of positions and velocities and stuff? You probably have an ordinary differential equation. If you need a picture with a three-dimensional surface or a color map to understand it? You probably have a partial differential equation.

One more metaphor. If you can imagine the thing you’re modeling as a marble rolling around on a hilly table? Odds are that’s an ordinary differential equation. And that representation covers a lot of interesting problems. Marbles on hills, obviously. But also rigid pendulums: we can treat the angle a pendulum makes and the rate at which those change as dimensions of space. The pendulum’s swinging then matches exactly a marble rolling around the right hilly table. Planets in space, too. We need more dimensions — three space dimensions and three velocity dimensions — for each planet. So, like, the Sun-Earth-and-Moon would be rolling around a hilly table with 18 dimensions. That’s all right. We don’t have to draw it. The mathematics works about the same. Just longer.

[ To be precise we need three momentum dimensions for each orbiting body. If they’re not changing mass appreciably, and not moving too near the speed of light, velocity is just momentum times a constant number, so we can use whichever is easier to visualize. ]

We mostly work with ordinary differential equations of either the first or the second order. First order means we have first derivatives in the equation, but never have to deal with more than the original function and its first derivative. Second order means we have second derivatives in the equation, but never have to deal with more than the original function or its first or second derivatives. You’ll never guess what a “third order” differential equation is unless you have experience in reading words. There are some reasons we stick to these low orders like first and second, though. One is that we know of good techniques for solving most first- and second-order ordinary differential equations. For higher-order differential equations we often use techniques that find a related normal old polynomial. Its solution helps with the thing we want. Or we break a high-order differential equation into a set of low-order ones. So yes, again, we search for answers where the light is good. But the good light covers many things we like to look at.

There’s simple harmonic motion, for example. It covers pendulums and springs and perturbations around stable equilibriums and all. This turns out to cover so many problems that, as a physics major, you get a little sick of simple harmonic motion. There’s the Airy function, which started out to describe the rainbow. It turns out to describe particles trapped in a triangular quantum well. The van der Pol equation, about systems where a small oscillation gets energy fed into it while a large oscillation gets energy drained. All kinds of exponential growth and decay problems. Very many functions where pairs of particles interact.

This doesn’t cover everything we would like to do. That’s all right. Ordinary differential equations lend themselves to numerical solutions. It requires considerable study and thought to do these numerical solutions well. But this doesn’t make the subject unapproachable. Few of us could animate the “Pink Elephants on Parade” scene from Dumbo. But could you draw a flip book of two stick figures tossing a ball back and forth? If you’ve had a good rest, a hearty breakfast, and have not listened to the news yet today, so you’re in a good mood?

The flip book ball is a decent example here, too. The animation will look good if the ball moves about the “right” amount between pages. A little faster when it’s first thrown, a bit slower as it reaches the top of its arc, a little faster as it falls back to the catcher. The ordinary differential equation tells us how fast our marble is rolling on this hilly table, and in what direction. So we can calculate how far the marble needs to move, and in what direction, to make the next page in the flip book.

Almost. The rate at which the marble should move will change, in the interval between one flip-book page and the next. The difference, the error, may not be much. But there is a difference between the exact and the numerical solution. Well, there is a difference between a circle and a regular polygon. We have many ways of minimizing and estimating and controlling the error. Doing that is what makes numerical mathematics the high-paid professional industry it is. Our game of catch we can verify by flipping through the book. The motion of four dozen planets and moons attracting one another is harder to be sure we calculate it right.

I said at the top that most anything one can do with numbers one can do with functions also. I would like to close the essay with some great parallel. Like, the way that trying to solve cubic equations made people realize complex numbers were good things to have. I don’t have a good example like that for ordinary differential equations, where the study expanded our ideas of what functions could be. Part of that is that complex numbers are more accessible than the stranger functions. Part of that is that complex numbers have a story behind them. The story features titanic figures like Gerolamo Cardano, Niccolò Tartaglia and Ludovico Ferrari. We see some awesome and weird personalities in 19th century mathematics. But their fights are generally harder to watch from the sidelines and cheer on. And part is that it’s easier to find pop historical treatments of the kinds of numbers. The historiography of what a “function” is is a specialist occupation.

But I can think of a possible case. A tool that’s sometimes used in solving ordinary differential equations is the “Dirac delta function”. Yes, that Paul Dirac. It’s a weird function, written as . It’s equal to zero everywhere, except where is zero. When is zero? It’s … we don’t talk about what it is. Instead we talk about what it can do. The integral of that Dirac delta function times some other function can equal that other function at a single point. It strains credibility to call this a function the way we speak of, like, or being functions. Many will classify it as a distribution instead. But it is so useful, for a particular kind of problem, that it’s impossible to throw away.

So perhaps the parallels between numbers and functions extend that far. Ordinary differential equations can make us notice kinds of functions we would not have seen otherwise.

And with this — I can see the much-postponed end of the Little 2021 Mathematics A-to-Z! You can read all my entries for 2021 at this link, and if you’d like can find all my A-to-Z essays here. How will I finish off the shortest yet most challenging sequence I’ve done yet? Will it be yellow and equivalent to the Axiom of Choice? Answers should come, in a week, if all starts going well.

And now, finally, I resume and hopefully finish what was meant to be a simpler and less stressful A-to-Z for last year. I’m feeling much better about my stress loads now and hope that I can soon enjoy the feeling of having a thing accomplished.

This topic is one of many suggestions that Elkement, one of my longest blog-friendships here, offered. It’s a creation that sent me back to my grad school textbooks, some of those slender paperback volumes with tiny, close-set type that turn out to be far more expensive than you imagine. Though not in this case: my most useful reference here was V I Arnold’s Ordinary Differential Equations, stamped inside as costing $18.75. The field is full of surprises. Another wonderful reference was this excellent set of notes prepared by Jodin Morey. They would have done much to help me through that class.

Tangent Space

Stand in midtown Manhattan, holding a map of midtown Manhattan. You have — not a tangent space, not yet. A tangent plane, representing the curved surface of the Earth with the flat surface of your map, though. But the tangent space is near: see how many blocks you must go, along the streets and the avenues, to get somewhere. Four blocks north, three west. Two blocks south, ten east. And so on. Those directions, of where you need to go, are the tangent space around you.

There is the first trick in tangent spaces. We get accustomed, early in learning calculus, to think of tangent lines and then of tangent planes. These are nice, flat approximations to some original curve. But while we’re introduced to the tangent space, and first learn examples of it, as tangent planes, we don’t stay there. There are several ways to define tangent spaces. One recasts tangent spaces in group theory terms, describing them as a ring based on functions that are equal to zero at the tangent point. (To be exact, it’s an ideal, based on a quotient group, based on two sets of such functions.)

That’s a description mathematicians are inclined to like, not only because it’s far harder to imagine than a map of the city is. But this ring definition describes the tangent space in terms of what we can do with it, rather than how to calculate finding it. That tends to appeal to mathematicians. And it offers surprising insights. Cleverer mathematicians than I am notice how this makes tangent spaces very close to Lagrange multipliers. Lagrange multipliers are a technique to find the maximum of a function subject to a constraint from another function. They seem to work by magic, and tangent spaces will echo that.

I’ll step back from the abstraction. There’s relevant observations to make from this map of midtown. The directions “four blocks north, three west” do not represent any part of Manhattan. It describes a way you might move in Manhattan, yes. But you could move in that direction from many places in the city. And you could go four blocks north and three west if you were in any part of any city with a grid of streets. It is a vector space, with elements that are velocities at a tangent point.

The tangent space is less a map showing where things are and more one of how to get to other places, closer to a subway map than a literal one. Still, the topic is steeped in the language of maps. I’ll find it a useful metaphor too. We do not make a map unless we want to know how to find something. So the interesting question is what do we try to find in these tangent spaces?

There are several routes to tangent spaces. The one I’m most familiar with is through dynamical systems. These are typically physics-driven, sometimes biology-driven, problems. They describe things that change in time according to ordinary differential equations. Physics problems particularly are often about things moving in space. Space, in dynamical systems, becomes “phase space”, an abstract universe spanned by all of the possible values of the variables. The variables are, usually, the positions and momentums of the particles (for a physics problem). Sometimes time and energy appear as variables. In biology variables are often things that represent populations. The role the Earth served in my first paragraph is now played by a manifold. The manifold represents whatever constraints are relevant to the problem. That’s likely to be conservation laws or limits on how often arctic hares can breed or such.

The evolution in time of this system, though, is now the tracing out of a path in phase space. An understandable and much-used system is the rigid pendulum. A stick, free to swing around a point. There are two useful coordinates here. There’s the angle the stick makes, relative to the vertical axis, . And there’s how fast the stick is changing, . You can draw these axes; I recommend as the horizontal and as the vertical axis but, you know, you do you.

If you give the pendulum a little tap, it’ll swing back and forth. It rises and moves to the right, then falls while moving to the left, then rises and moves to the left, then falls and moves to the right. In phase space, this traces out an ellipse. It’s your choice whether it’s going clockwise or anticlockwise. If you give the pendulum a huge tap, it’ll keep spinning around and around. It’ll spin a little slower as it gets nearly upright, but it speeds back up again. So in phase space that’s a wobbly line, moving either to the right or the left, depending what direction you hit it.

You can even imagine giving the pendulum just the right tap, exactly hard enough that it rises to vertical and balances there, perfectly aligned so it doesn’t fall back down. This is a special path, the dividing line between those ellipses and that wavy line. Or setting it vertically there to start with and trusting no truck driving down the street will rattle it loose. That’s a very precise dot, where is exactly zero. These paths, the trajectories, match whatever walking you did in the first paragraph to get to some spot in midtown Manhattan. And now let’s look again at the map, and the tangent space.

Within the tangent space we see what changes would change the system’s behavior. How much of a tap we would need, say, to launch our swinging pendulum into never-ending spinning. Or how much of a tap to stop a spinning pendulum. Every point on a trajectory of a dynamical system has a tangent space. And, for many interesting systems, the tangent space will be separable into two pieces. One of them will be perturbations that don’t go far from the original trajectory. One of them will be perturbations that do wander far from the original.

These regions may have a complicated border, with enclaves and enclaves within enclaves, and so on. This can be where we get (deterministic) chaos from. But what we usually find interesting is whether the perturbation keeps the old behavior intact or destroys it altogether. That is, how we can change where we are going.

That said, in practice, mathematicians don’t use tangent spaces to send pendulums swinging. They tend to come up when one is past studying such petty things as specific problems. They’re more often used in studying the ways that dynamical systems can behave. Tangent spaces themselves often get wrapped up into structures with names like tangent bundles. You’ll see them proving the existence of some properties, describing limit points and limit cycles and invariants and quite a bit of set theory. These can take us surprising places. It’s possible to use a tangent-space approach to prove the fundamental theorem of algebra, that every polynomial has at least one root. This seems to me the long way around to get there. But it is amazing to learn that is a place one can go.

Here I stand at the end of the pause I took in 2021’s Little Mathematics A-to-Z, in the hopes of building the time and buffer space to write its last three essays. Have I succeeded? We’ll see next week, but I will say that I feel myself in a much better place than I was in December.

The Zero Devisor closed out my big project for the first plague year. It let me get back to talking about abstract algebra, one of the cores of a mathematics major’s education. And it let me get into graph theory, the unrequited love of my grad school life. The subject also let me tie back to Michael Atiyah, the start of that year’s A-to-Z. Often a sequence will pick up a theme and 2020’s gave a great illusion of being tightly constructed.

Jacob Siehler had several suggestions for this last of the A-to-Z essays for 2020. Zorn’s Lemma was an obvious choice. It’s got an important place in set theory, it’s got some neat and weird implications. It’s got a great name. The zero divisor is one of those technical things mathematics majors have deal with. It never gets any pop-mathematics attention. I picked the less-travelled road and found a delightful scenic spot.

Zero Divisor.

3 times 4 is 12. That’s a clear, unambiguous, and easily-agreed-upon arithmetic statement. The thing to wonder is what kind of mathematics it takes to mess that up. The answer is algebra. Not the high school kind, with x’s and quadratic formulas and all. The college kind, with group theory and rings.

A ring is a mathematical construct that lets you do a bit of arithmetic. Something that looks like arithmetic, anyway. It has a set of elements. (An element is just a thing in a set. We say “element” because it feels weird to call it “thing” all the time.) The ring has an addition operation. The ring has a multiplication operation. Addition has an identity element, something you can add to any element without changing the original element. We can call that ‘0’. The integers, or to use the lingo , are a ring (among other things).

Among the rings you learn, after the integers, is the integers modulo … something. This can be modulo any counting number. The integers modulo 10, for example, we write as for short. There are different ways to think of what this means. The one convenient for this essay is that it’s the integers 0, 1, 2, up through 9. And that the result of any calculation is “how much more than a whole multiple of 10 this calculation would otherwise be”. So then 3 times 4 is now 2. 3 times 5 is 5; 3 times 6 is 8. 3 times 7 is 1, and doesn’t that seem peculiar? That’s part of how modulo arithmetic warns us that groups and rings can be quite strange things.

We can do modulo arithmetic with any of the counting numbers. Look, for example, at instead. In the integers modulo 5, 3 times 4 is … 2. This doesn’t seem to get us anything new. How about ? In this, 3 times 4 is 4. That’s interesting. It doesn’t make 3 the multiplicative identity for this ring. 3 times 3 is 1, for example. But you’d never see something like that for regular arithmetic.

How about ? Now we have 3 times 4 equalling 0. And that’s a dramatic break from how regular numbers work. One thing we know about regular numbers is that if a times b is 0, then either a is 0, or b is zero, or they’re both 0. We rely on this so much in high school algebra. It’s what lets us pick out roots of polynomials. Now? Now we can’t count on that.

When this does happen, when one thing times another equals zero, we have “zero divisors”. These are anything in your ring that can multiply by something else to give 0. Is zero, the additive identity, always a zero divisor? … That depends on what the textbook you first learned algebra from said. To avoid ambiguity, you can write a “nonzero zero divisor”. This clarifies your intentions and slows down your copy editing every time you read “nonzero zero”. Or call it a “nontrivial zero divisor” or “proper zero divisor” instead. My preference is to accept 0 as always being a zero divisor. We can disagree on this. What of zero divisors other than zero?

Your ring might or might not have them. It depends on the ring. The ring of integers , for example, doesn’t have any zero divisors except for 0. The ring of integers modulo 12 , though? Anything that isn’t relatively prime to 12 is a zero divisor. So, 2, 3, 6, 8, 9, and 10 are zero divisors here. The ring of integers modulo 13 ? That doesn’t have any zero divisors, other than zero itself. In fact any ring of integers modulo a prime number, , lacks zero divisors besides 0.

Focusing too much on integers modulo something makes zero divisors sound like some curious shadow of prime numbers. There are some similarities. Whether a number is prime depends on your multiplication rule and what set of things it’s in. Being a zero divisor in one ring doesn’t directly relate to whether something’s a zero divisor in any other. Knowing what the zero divisors are tells you something about the structure of the ring.

It’s hard to resist focusing on integers-modulo-something when learning rings. They work very much like regular arithmetic does. Even the strange thing about them, that every result is from a finite set of digits, isn’t too alien. We do something quite like it when we observe that three hours after 10:00 is 1:00. But many sets of elements can create rings. Square matrixes are the obvious extension. Matrixes are grids of elements, each of which … well, they’re most often going to be numbers. Maybe integers, or real numbers, or complex numbers. They can be more abstract things, like rotations or whatnot, but they’re hard to typeset. It’s easy to find zero divisors in matrixes of numbers. Imagine, like, a matrix that’s all zeroes except for one element, somewhere. There are a lot of matrices which, multiplied by that, will be a zero matrix, one with nothing but zeroes in it. Another common kind of ring is the polynomials. For these you need some constraint like the polynomial coefficients being integers-modulo-something. You can make that work.

In 1988 Istvan Beck tried to establish a link between graph theory and ring theory. We now have a usable standard definition of one. If is any ring, then is the zero-divisor graph of . (I know some of you think is the real numbers. No; that’s a bold-faced instead. Unless that’s too much bother to typeset.) You make the graph by putting in a vertex for the elements in . You connect two vertices a and b if the product of the corresponding elements is zero. That is, if they’re zero divisors for one other. (In Beck’s original form, this included all the elements. In modern use, we don’t bother including the elements that are not zero divisors.)

Drawing this graph makes tools from graph theory available to study rings. We can measure things like the distance between elements, or what paths from one vertex to another exist. What cycles — paths that start and end at the same vertex — exist, and how large they are. Whether the graphs are bipartite. A bipartite graph is one where you can divide the vertices into two sets, and every edge connects one thing in the first set with one thing in the second. What the chromatic number — the minimum number of colors it takes to make sure no two adjacent vertices have the same color — is. What shape does the graph have?

It’s easy to think that zero divisors are just a thing which emerges from a ring. The graph theory connection tells us otherwise. You can make a potential zero divisor graph and ask whether any ring could fit that. And, from that, what we can know about a ring from its zero divisors. Mathematicians are drawn as if by an occult hand to things that let you answer questions about a thing from its “shape”.

And this lets me complete a cycle in this year’s A-to-Z, to my delight. There is an important question in topology which group theory could answer. It’s a generalization of the zero-divisors conjecture, a hypothesis about what fits in a ring based on certain types of groups. This hypothesis — actually, these hypotheses. There are a bunch of similar questions about invariants called the L^{2}-Betti numbers can be. These we call the Atiyah Conjecture. This because of work Michael Atiyah did in the cohomology of manifolds starting in the 1970s. It’s work, I admit, I don’t understand well enough to summarize, and hope you’ll forgive me for that. I’m still amazed that one can get to cutting-edge mathematics research this. It seems, at its introduction, to be only a subversion of how we find x for which .

I suspect it is impossible to say enough about Zeno’s Paradoxes. To close out my 2019 A-to-Z, though, I tried saying something. There are four particularly famous paradoxes and I discuss what are maybe the second and third-most-popular ones here. (The paradox of the Dichotomy is surely most famous.) The problems presented are about motion and may seem to be about physics, or at least about perception. But calculus is built on differentials, on the idea that we can describe how fast a thing is changing at an instant. Mathematicians have worked out a way to define this that we’re satisfied with and that doesn’t require (obvious) nonsense. But to claim we’ve solved Zeno’s Paradoxes — as unwary STEM majors sometimes do — is unwarranted.

Also I was able to work in a picture from an amusement park trip I took, the closing weekend of Kings Island park in 2019 and the last day that The Vortex roller coaster would run.

Today’s A To Z term was nominated by Dina Yagodich, who runs a YouTube channel with a host of mathematics topics. Zeno’s Paradoxes exist in the intersection of mathematics and philosophy. Mathematics majors like to declare that they’re all easy. The Ancient Greeks didn’t understand infinite series or infinitesimals like we do. Now they’re no challenge at all. This reflects a belief that philosophers must be silly people who haven’t noticed that one can, say, exit a room.

This is your classic STEM-attitude of missing the point. We may suppose that Zeno of Elea occasionally exited rooms himself. That is a supposition, though. Zeno, like most philosophers who lived before Socrates, we know from other philosophers making fun of him a century after he died. Or at least trying to explain what they thought he was on about. Modern philosophers are expected to present others’ arguments as well and as strongly as possible. This even — especially — when describing an argument they want to say is the stupidest thing they ever heard. Or, to use the lingo, when they wish to refute it. Ancient philosophers had no such compulsion. They did not mind presenting someone else’s argument sketchily, if they supposed everyone already knew it. Or even badly, if they wanted to make the other philosopher sound ridiculous. Between that and the sparse nature of the record, we have to guess a bit about what Zeno precisely said and what he meant. This is all right. We have some idea of things that might reasonably have bothered Zeno.

And they have bothered philosophers for thousands of years. They are about change. The ones I mean to discuss here are particularly about motion. And there are things we do not understand about change. This essay will not answer what we don’t understand. But it will, I hope, show something about why that’s still an interesting thing to ponder.

Zeno’s Paradoxes.

When we capture a moment by photographing it we add lies to what we see. We impose a frame on its contents, discarding what is off-frame. We rip an instant out of its context. And that before considering how we stage photographs, making people smile and stop tilting their heads. We forgive many of these lies. The things excluded from or the moments around the one photographed might not alter what the photograph represents. Making everyone smile can convey the emotional average of the event in a way that no individual moment represents. Arranging people to stand in frame can convey the participation in the way a candid photograph would not.

But there remains the lie that a photograph is “a moment”. It is no such thing. We notice this when the photograph is blurred. It records all the light passing through the lens while the shutter is open. A photograph records an eighth of a second. A thirtieth of a second. A thousandth of a second. But still, some time. There is always the ghost of motion in a picture. If we do not see it, it is because our photograph’s resolution is too coarse. If we could photograph something with infinite fidelity we would see, even in still life, the wobbling of the molecules that make up a thing.

Which implies something fascinating to me. Think of a reel of film. Here I mean old-school pre-digital film, the thing that’s a great strip of pictures, a new one shown 24 times per second. Each frame of film is a photograph, recording some split-second of time. How much time is actually in a film, then? How long, cumulatively, was a camera shutter open during a two-hour film? I use pre-digital, strip-of-film movies for convenience. Digital films offer the same questions, but with different technical points. And I do not want the writing burden of describing both analog and digital film technologies. So I will stick to the long sequence of analog photographs model.

Let me imagine a movie. One of an ordinary everyday event; an actuality, to use the terminology of 1898. A person overtaking a walking tortoise. Look at the strip of film. There are many frames which show the person behind the tortoise. There are many frames showing the person ahead of the tortoise. When are the person and the tortoise at the same spot?

We have to put in some definitions. Fine; do that. Say we mean when the leading edge of the person’s nose overtakes the leading edge of the tortoise’s, as viewed from our camera. Or, since there must be blur, when the center of the blur of the person’s nose overtakes the center of the blur of the tortoise’s nose.

Do we have the frame when that moment happened? I’m sure we have frames from the moments before, and frames from the moments after. But the exact moment? Are you positive? If we zoomed in, would it actually show the person is a millimeter behind the tortoise? That the person is a hundredth of a millimeter ahead? A thousandth of a hair’s width behind? Suppose that our camera is very good. It can take frames representing as small a time as we need. Does it ever capture that precise moment? To the point that we know, no, it’s not the case that the tortoise is one-trillionth the width of a hydrogen atom ahead of the person?

If we can’t show the frame where this overtaking happened, then how do we know it happened? To put it in terms a STEM major will respect, how can we credit a thing we have not observed with happening? … Yes, we can suppose it happened if we suppose continuity in space and time. Then it follows from the intermediate value theorem. But then we are begging the question. We impose the assumption that there is a moment of overtaking. This does not prove that the moment exists.

Fine, then. What if time is not continuous? If there is a smallest moment of time? … If there is, then, we can imagine a frame of film that photographs only that one moment. So let’s look at its footage.

One thing stands out. There’s finally no blur in the picture. There can’t be; there’s no time during which to move. We might not catch the moment that the person overtakes the tortoise. It could “happen” in-between moments. But at least we have a moment to observe at leisure.

So … what is the difference between a picture of the person overtaking the tortoise, and a picture of the person and the tortoise standing still? A movie of the two walking should be different from a movie of the two pretending to be department store mannequins. What, in this frame, is the difference? If there is no observable difference, how does the universe tell whether, next instant, these two should have moved or not?

A mathematical physicist may toss in an answer. Our photograph is only of positions. We should also track momentum. Momentum carries within it the information of how position changes over time. We can’t photograph momentum, not without getting blurs. But analytically? If we interpret a photograph as “really” tracking the positions of a bunch of particles? To the mathematical physicist, momentum is as good a variable as position, and it’s as measurable. We can imagine a hyperspace photograph that gives us an image of positions and momentums. So, STEM types show up the philosophers finally, right?

Hold on. Let’s allow that somehow we get changes in position from the momentum of something. Hold off worrying about how momentum gets into position. Where does a change in momentum come from? In the mathematical physics problems we can do, the change in momentum has a value that depends on position. In the mathematical physics problems we have to deal with, the change in momentum has a value that depends on position and momentum. But that value? Put it in words. That value is the change in momentum. It has the same relationship to acceleration that momentum has to velocity. For want of a real term, I’ll call it acceleration. We need more variables. An even more hyperspatial film camera.

… And does acceleration change? Where does that change come from? That is going to demand another variable, the change-in-acceleration. (The “jerk”, according to people who want to tell you that “jerk” is a commonly used term for the change-in-acceleration, and no one else.) And the change-in-change-in-acceleration. Change-in-change-in-change-in-acceleration. We have to invoke an infinite regression of new variables. We got here because we wanted to suppose it wasn’t possible to divide a span of time infinitely many times. This seems like a lot to build into the universe to distinguish a person walking past a tortoise from a person standing near a tortoise. And then we still must admit not knowing how one variable propagates into another. That a person is wide is not usually enough explanation of how they are growing taller.

Numerical integration can model this kind of system with time divided into discrete chunks. It teaches us some ways that this can make logical sense. It also shows us that our projections will (generally) be wrong. At least unless we do things like have an infinite number of steps of time factor into each projection of the next timestep. Or use the forecast of future timesteps to correct the current one. Maybe use both. These are … not impossible. But being “ … not impossible” is not to say satisfying. (We allow numerical integration to be wrong by quantifying just how wrong it is. We call this an “error”, and have techniques that we can use to keep the error within some tolerated margin.)

So where has the movement happened? The original scene had movement to it. The movie seems to represent that movement. But that movement doesn’t seem to be in any frame of the movie. Where did it come from?

We can have properties that appear in a mass which don’t appear in any component piece. No molecule of a substance has a color, but a big enough mass does. No atom of iron is ferromagnetic, but a chunk might be. No grain of sand is a heap, but enough of them are. The Ancient Greeks knew this; we call it the Sorites paradox, after Eubulides of Miletus. (“Sorites” means “heap”, as in heap of sand. But if you had to bluff through a conversation about ancient Greek philosophers you could probably get away with making up a quote you credit to Sorites.) Could movement be, in the term mathematical physicists use, an intensive property? But intensive properties are obvious to the outside observer of a thing. We are not outside observers to the universe. It’s not clear what it would mean for there to be an outside observer to the universe. Even if there were, what space and time are they observing in? And aren’t their space and their time and their observations vulnerable to the same questions? We’re in danger of insisting on an infinite regression of “universes” just so a person can walk past a tortoise in ours.

We can say where movement comes from when we watch a movie. It is a trick of perception. Our eyes take some time to understand a new image. Our brains insist on forming a continuous whole story even out of disjoint ideas. Our memory fools us into remembering a continuous line of action. That a movie moves is entirely an illusion.

You see the implication here. Surely Zeno was not trying to lead us to understand all motion, in the real world, as an illusion? … Zeno seems to have been trying to support the work of Parmenides of Elea. Parmenides is another pre-Socratic philosopher. So we have about four words that we’re fairly sure he authored, and we’re not positive what order to put them in. Parmenides was arguing about the nature of reality, and what it means for a thing to come into or pass out of existence. He seems to have been arguing something like that there was a true reality that’s necessary and timeless and changeless. And there’s an apparent reality, the thing our senses observe. And in our sensing, we add lies which make things like change seem to happen. (Do not use this to get through your PhD defense in philosophy. I’m not sure I’d use it to get through your Intro to Ancient Greek Philosophy quiz.) That what we perceive as movement is not what is “really” going on is, at least, imaginable. So it is worth asking questions about what we mean for something to move. What difference there is between our intuitive understanding of movement and what logic says should happen.

(I know someone wishes to throw down the word Quantum. Quantum mechanics is a powerful tool for describing how many things behave. It implies limits on what we can simultaneously know about the position and the time of a thing. But there is a difference between “what time is” and “what we can know about a thing’s coordinates in time”. Quantum mechanics speaks more to the latter. There are also people who would like to say Relativity. Relativity, special and general, implies we should look at space and time as a unified set. But this does not change our questions about continuity of time or space, or where to find movement in both.)

And this is why we are likely never to finish pondering Zeno’s Paradoxes. In this essay I’ve only discussed two of them: Achilles and the Tortoise, and The Arrow. There are two other particularly famous ones: the Dichotomy, and the Stadium. The Dichotomy is the one about how to get somewhere, you have to get halfway there. But to get halfway there, you have to get a quarter of the way there. And an eighth of the way there, and so on. The Stadium is the hardest of the four great paradoxes to explain. This is in part because the earliest writings we have about it don’t make clear what Zeno was trying to get at. I can think of something which seems consistent with what’s described, and contrary-to-intuition enough to be interesting. I’m satisfied to ponder that one. But other people may have different ideas of what the paradox should be.

There are a handful of other paradoxes which don’t get so much love, although one of them is another version of the Sorites Paradox. Some of them the Stanford Encyclopedia of Philosophy dubs “paradoxes of plurality”. These ask how many things there could be. It’s hard to judge just what he was getting at with this. We know that one argument had three parts, and only two of them survive. Trying to fill in that gap is a challenge. We want to fill in the argument we would make, projecting from our modern idea of this plurality. It’s not Zeno’s idea, though, and we can’t know how close our projection is.

I don’t have the space to make a thematically coherent essay describing these all, though. The set of paradoxes have demanded thought, even just to come up with a reason to think they don’t demand thought, for thousands of years. We will, perhaps, have to keep trying again to fully understand what it is we don’t understand.

Thank you, all who’ve been reading, and who’ve offered topics, comments on the material, or questions about things I was hoping readers wouldn’t notice I was shorting. I’ll probably do this again next year, after I’ve had some chance to rest.

My final glossary term for this year’s A To Z sequence was suggested by aajohannas, who’d also suggested “randomness” and “tiling”. I don’t know of any blogs or other projects they’re behind, but if I do hear, I’ll pass them on.

Zugzwang.

Some areas of mathematics struggle against the question, “So what is this useful for?” As though usefulness were a particular merit — or demerit — for a field of human study. Most mathematics fields discover some use, though, even if it takes centuries. Others are born useful. Probability, for example. Statistics. Know what the fields are and you know why they’re valuable.

Game theory is another of these. The subject, as often happens, we can trace back centuries. Usually as the study of some particular game. Occasionally in the study of some political science problem. But game theory developed a particular identity in the early 20th century. Some of this from set theory experts. Some from probability experts. Some from John von Neumann, because it was the 20th century and all that. Calling it “game theory” explains why anyone might like to study it. Who doesn’t like playing games? Who, studying a game, doesn’t want to play it better?

But why it might be interesting is different from why it might be important. Think of what a game is. It is a string of choices made by one or more parties. The point of the choices is to achieve some goal. Put that way you realize: this is everything. All life is making choices, all in the pursuit of some goal, even if that goal is just “not end up any worse off”. I don’t know that the earliest researchers in game theory as a field realized what a powerful subject they had touched on. But by the 1950s they were doing serious work in strategic planning, and by 1964 were even giving us Stanley Kubrick movies.

This is taking me away from my glossary term. The field of games is enormous. If we narrow the field some we can discuss specific kinds of games. And say more involved things about these games. So first we’ll limit things by thinking only of sequential games. These are ones where there are a set number of players, and they take turns making choices. I’m not sure whether the field expects the order of play to be the same every time. My understanding is that much of the focus is on two-player games. What’s important is that at any one step there’s only one party making a choice.

The other thing narrowing the field is to think of information. There are many things that can affect the state of the game. Some of them might be obvious, like where the pieces are on the game board. Or how much money a player has. We’re used to that. But there can be hidden information. A player might conceal some game money so as to make other players underestimate her resources. Many card games have one or more cards concealed from the other players. There can be information unknown to any party. No one can make a useful prediction what the next throw of the game dice will be. Or what the next event card will be.

But there are games where there’s none of this ambiguity. These are called games with “perfect information”. In them all the players know the past moves every player has made. Or at least should know them. Players are allowed to forget what they ought to know.

There’s a separate but similar-sounding idea called “complete information”. In a game with complete information, players know everything that affects the gameplay. At least, probably, apart from what their opponents intend to do. This might sound like an impossibly high standard, at first. All games with shuffled decks of cards and with dice to roll are out. There’s no concealing or lying about the state of affairs.

Set complete-information aside; we don’t need it here. Think only of perfect-information games. What are they? Some ancient games, certainly. Tic-tac-toe, for example. Some more modern versions, like Connect Four and its variations. Some that are actually deep, like checkers and chess and go. Some that are, arguably, more puzzles than games, as in sudoku. Some that hardly seem like games, like several people agreeing how to cut a cake fairly. Some that seem like tests to prove people are fundamentally stupid, like when you auction off a dollar. (The rules are set so players can easily end up paying more then a dollar.) But that’s enough for me, at least. You can see there are games of clear, tangible interest here.

The last restriction: think only of two-player games. Or at least two parties. Any of these two-party sequential games with perfect information are a part of “combinatorial game theory”. It doesn’t usually allow for incomplete-information games. But at least the MathWorld glossary doesn’t demand they be ruled out. So I will defer to this authority. I’m not sure how the name “combinatorial” got attached to this kind of game. My guess is that it seems like you should be able to list all the possible combinations of legal moves. That number may be enormous, as chess and go players are always going on about. But you could imagine a vast book which lists every possible game. If your friend ever challenged you to a game of chess the two of you could simply agree, oh, you’ll play game number 2,038,940,949,172 and then look up to see who won. Quite the time-saver.

Most games don’t have such a book, though. Players have to act on what they understand of the current state, and what they think the other player will do. This is where we get strategies from. Not just what we plan to do, but what we imagine the other party plans to do. When working out a strategy we often expect the other party to play perfectly. That is, to make no mistakes, to not do anything that worsens their position. Or that reduces their chance of winning.

… And yes, arguably, the word “chance” doesn’t belong there. These are games where the rules are known, every past move is known, every future move is in principle computable. And if we suppose everyone is making the best possible move then we can imagine forecasting the whole future of the game. One player has a “chance” of winning in the same way Christmas day of the year 2038 has a “chance” of being on a Tuesday. That is, the probability is just an expression of our ignorance, that we don’t happen to be able to look it up.

But what choice do we have? I’ve never seen a reference that lists all the possible games of tic-tac-toe. And that’s about the simplest combinatorial-game-theory game anyone might actually play. What’s possible is to look at the current state of the game. And evaluate which player seems to be closer to her goal. And then look at all the possible moves.

There are three things a move can do. It can put the party closer to the goal. It can put the party farther from the goal. Or it can do neither. On her turn the other party might do something that moves you farther from your goal, moves you closer to your goal, or doesn’t affect your status at all. It seems like this makes strategy obvious. On every step take the available move that takes one closest to the goal. This is known as a “greedy” strategy. As the name suggests it isn’t automatically bad. If you expect the game to be a short one, greed might be the best approach. The catch is that moves that seem less good — even ones that seem to hurt you initially — might set up other, even better moves. So strategy requires some thinking beyond the current step. Properly, it requires thinking through to the end of the game. Or at least until the end of the game seems obvious.

We should like a strategy that leaves us no choice but to win. Next-best would be one that leaves the game undecided, since something might happen like the other player needing to catch a bus and so resigning. This is how I got my solitary win in the two months I spent in the college chess club. Worst would be the games that leave us no choice but to lose.

It can be that there are no good moves. That is, that every move available makes it a little less likely that we win. Sometimes a game offers the chance to pass, preserving the state of the game but giving the other party the turn. Then maybe the other party will do something that creates a better opportunity for us. But if we are allowed to pass, there’s a good chance the game lets the other party pass, too, and we end up in the same fix. And it may be the rules of the game don’t allow passing anyway. One must move.

The phenomenon of having to make a move when it’s impossible to make a good move has prominence in chess. I don’t have the chess knowledge to say how common the situation is. But it seems to be a situation people who study chess problems love. I suppose it appeals to a love of lost causes and the hope that you can be brilliant enough to see what everyone else has overlooked. German chess literates gave it a name 160 years ago, “zugzwang”, “compulsion to move”. Somehow I never encountered the term when I was briefly a college chess player. Perhaps because I was never in zugzwang and was just too incompetent a player to find my good moves. I first encountered the term in Michael Chabon’s The Yiddish Policeman’s Union. The protagonist picked up on the term as he investigated the murder of a chess player and then felt himself in one.

Combinatorial game theorists have picked up the word, and sharpened its meaning. If I understand correctly chess players allow the term to be used for any case where a player hurts her position by moving at all. Game theorists make it more dire. This may reflect their knowledge that an optimal strategy might require taking some dismal steps along the way. The game theorist formally grants the term only to the situation where the compulsion to move changes what should be a win into a loss. This seems terrible, but then, we’ve all done this in play. We all feel terrible about it.

I’d like here to give examples. But in searching the web I can find only either courses in game theory. These are a bit too much for even me to sumarize. Or chess problems, which I’m not up to understanding. It seems hard to set out an example: I need to not just set out the game, but show that what had been a win is now, by any available move, turned into a loss. Chess is looser. It even allows, I discover, a double zugzwang, where both players are at a disadvantage if they have to move.

It’s a quite relatable problem. You see why game theory has this reputation as mathematics that touches all life.

I did not remember how long a buildup there was to my Summer 2017 writings about the Zeta function. But it’s something that takes a lot of setup. I don’t go into why the Riemann Hypothesis is interesting. I might have been saving that for a later A-to-Z. Or I might have trusted that since every pop mathematics blog has a good essay about the Riemann Hypothesis already there wasn’t much I could add.

I realize on re-reading that one might take me to have said that the final exam for my Intro to Complex Analysis course was always in the back of my textbook. I’d meant that after the final, I tucked it into my book and left it there. Probably nobody was confused by this.

Today Gaurish, of For the love of Mathematics, gives me the last subject for my Summer 2017 A To Z sequence. And also my greatest challenge: the Zeta function. The subject comes to all pop mathematics blogs. It comes to all mathematics blogs. It’s not difficult to say something about a particular zeta function. But to say something at all original? Let’s watch.

Zeta Function.

The spring semester of my sophomore year I had Intro to Complex Analysis. Monday Wednesday 7:30; a rare evening class, one of the few times I’d eat dinner and then go to a lecture hall. There I discovered something strange and wonderful. Complex Analysis is a far easier topic than Real Analysis. Both are courses about why calculus works. But why calculus for complex-valued numbers works is a much easier problem than why calculus for real-valued numbers works. It’s dazzling. Part of this is that Complex Analysis, yes, builds on Real Analysis. So Complex can take for granted some things that Real has to prove. I didn’t mind. Given the way I crashed through Intro to Real Analysis I was glad for a subject that was, relatively, a breeze.

As we worked through Complex Variables and Applications so many things, so very many things, got to be easy. The basic unit of complex analysis, at least as we young majors learned it, was in contour integrals. These are integrals whose value depends on the values of a function on a closed loop. The loop is in the complex plane. The complex plane is, well, your ordinary plane. But we say the x-coordinate and the y-coordinate are parts of the same complex-valued number. The x-coordinate is the real-valued part. The y-coordinate is the imaginary-valued part. And we call that summation ‘z’. In complex-valued functions ‘z’ serves the role that ‘x’ does in normal mathematics.

So a closed loop is exactly what you think. Take a rubber band and twist it up and drop it on the table. That’s a closed loop. Suppose you want to integrate a function, ‘f(z)’. If you can always take its derivative on this loop and on the interior of that loop, then its contour integral is … zero. No matter what the function is. As long as it’s “analytic”, as the terminology has it. Yeah, we were all stunned into silence too. (Granted, mathematics classes are usually quiet, since it’s hard to get a good discussion going. Plus many of us were in post-dinner digestive lulls.)

Integrating regular old functions of real-valued numbers is this tedious process. There’s sooooo many rules and possibilities and special cases to consider. There’s sooooo many tricks that get you the integrals of some functions. And then here, with complex-valued integrals for analytic functions, you know the answer before you even look at the function.

As you might imagine, since this is only page 113 of a 341-page book there’s more to it. Most functions that anyone cares about aren’t analytic. At least they’re not analytic everywhere inside regions that might be interesting. There’s usually some points where an interesting function ‘f(z)’ is undefined. We call these “singularities”. Yes, like starships are always running into. Only we rarely get propelled into other universes or other times or turned into ghosts or stuff like that.

So much of the rest of the course turns into ways to avoid singularities. Sometimes you can spackle them over. This is when the function happens not to be defined somewhere, but you can see what it ought to be. Sometimes you have to do something more. This turns into a search for “removable” singularities. And this does something so brilliant it looks illicit. You modify your closed loop, so that it comes up very close, as close as possible, to the singularity, but studiously avoids it. Follow this game of I’m-not-touching-you right and you can turn your integral into two parts. One is the part that’s equal to zero. The other is the part that’s a constant times whatever the function is at the singularity you’re removing. And that ought to be easy to find the value for. (Being able to find a function’s value doesn’t mean you can find its derivative.)

Those tricks were hard to master. Not because they were hard. Because they were easy, in a context where we expected hard. But after that we got into how to move singularities. That is, how to do a change of variables that moved the singularities to where they’re more convenient for some reason. How could this be more convenient? Because of chapter five, “Series”. In regular old calculus we learn how to approximate well-behaved functions with polynomials. In complex-variable calculus, we learn the same thing all over again. They’re polynomials of complex-valued variables, but it’s the same sort of thing. And not just polynomials, but things that look like polynomials except they’re powers of instead. These open up new ways to approximate functions, and to remove singularities from functions.

And then we get into transformations. These are about turning a problem that’s hard into one that’s easy. Or at least different. They’re a change of variable, yes. But they also change what exactly the function is. This reshuffles the problem. Makes for a change in singularities. Could make ones that are easier to work with.

One of the useful, and so common, transforms is called the Laplace-Stieltjes Transform. (“Laplace” is said like you might guess. “Stieltjes” is said, or at least we were taught to say it, like “Stilton cheese” without the “ton”.) And it tends to create functions that look like a series, the sum of a bunch of terms. Infinitely many terms. Each of those terms looks like a number times another number raised to some constant times ‘z’. As the course came to its conclusion, we were all prepared to think about these infinite series. Where singularities might be. Which of them might be removable.

These functions, these results of the Laplace-Stieltjes Transform, we collectively call ‘zeta functions’. There are infinitely many of them. Some of them are relatively tame. Some of them are exotic. One of them is world-famous. Professor Walsh — I don’t mean to name-drop, but I discovered the syllabus for the course tucked in the back of my textbook and I’m delighted to rediscover it — talked about it.

That world-famous one is, of course, the Riemann Zeta function. Yes, that same Riemann who keeps turning up, over and over again. It looks simple enough. Almost tame. Take the counting numbers, 1, 2, 3, and so on. Take your ‘z’. Raise each of the counting numbers to that ‘z’. Take the reciprocals of all those numbers. Add them up. What do you get?

A mass of fascinating results, for one. Functions you wouldn’t expect are concealed in there. There’s strips where the real part is zero. There’s strips where the imaginary part is zero. There’s points where both the real and imaginary parts are zero. We know infinitely many of them. If ‘z’ is -2, for example, the sum is zero. Also if ‘z’ is -4. -6. -8. And so on. These are easy to show, and so are dubbed ‘trivial’ zeroes. To say some are ‘trivial’ is to say that there are others that are not trivial. Where are they?

Professor Walsh explained. We know of many of them. The nontrivial zeroes we know of all share something in common. They have a real part that’s equal to 1/2. There’s a zero that’s at about the number . Also at . There’s one at about . Also about . (There’s a symmetry, you maybe guessed.) Every nontrivial zero we’ve found has a real component that’s got the same real-valued part. But we don’t know that they all do. Nobody does. It is the Riemann Hypothesis, the great unsolved problem of mathematics. Much more important than that Fermat’s Last Theorem, which back then was still merely a conjecture.

What a prospect! What a promise! What a way to set us up for the final exam in a couple of weeks.

I had an inspiration, a kind of scheme of showing that a nontrivial zero couldn’t be within a given circular contour. Make the size of this circle grow. Move its center farther away from the z-coordinate to match. Show there’s still no nontrivial zeroes inside. And therefore, logically, since I would have shown nontrivial zeroes couldn’t be anywhere but on this special line, and we know nontrivial zeroes exist … I leapt enthusiastically into this project. A little less enthusiastically the next day. Less so the day after. And on. After maybe a week I went a day without working on it. But came back, now and then, prodding at my brilliant would-be proof.

The Riemann Zeta function was not on the final exam, which I’ve discovered was also tucked into the back of my textbook. It asked more things like finding all the singular points and classifying what kinds of singularities they were for functions like instead. If the syllabus is accurate, we got as far as page 218. And I’m surprised to see the professor put his e-mail address on the syllabus. It was merely “bwalsh@math”, but understand, the Internet was a smaller place back then.

I finished the course with an A-, but without answering any of the great unsolved problems of mathematics.

The close of my End 2016 A-to-Z let me show off one of my favorite modes, that of amateur historian of mathematics who doesn’t check his primary references enough. So far as I know I don’t have any serious errors here, but then, how would I know? … But keep in mind that the full story is more complicated and more ambiguous than presented. (This is true of all histories.) That I could fit some personal history in was also a delight.

I don’t know why Thoralf Skolem’s name does not attach to the Zermelo-Fraenkel Axioms. Mathematical things are named with a shocking degree of arbitrariness. Skolem did well enough for himself.

gaurish gave me a choice for the Z-term to finish off the End 2016 A To Z. I appreciate it. I’m picking the more abstract thing because I’m not sure that I can explain zero briefly. The foundations of mathematics are a lot easier.

Zermelo-Fraenkel Axioms

I remember the look on my father’s face when I asked if he’d tell me what he knew about sets. He misheard what I was asking about. When we had that straightened out my father admitted that he didn’t know anything particular. I thanked him and went off disappointed. In hindsight, I kind of understand why everyone treated me like that in middle school.

My father’s always quick to dismiss how much mathematics he knows, or could understand. It’s a common habit. But in this case he was probably right. I knew a bit about set theory as a kid because I came to mathematics late in the “New Math” wave. Sets were seen as fundamental to why mathematics worked without being so exotic that kids couldn’t understand them. Perhaps so; both my love and I delighted in what we got of set theory as kids. But if you grew up before that stuff was popular you probably had a vague, intuitive, and imprecise idea of what sets were. Mathematicians had only a vague, intuitive, and imprecise idea of what sets were through to the late 19th century.

And then came what mathematics majors hear of as the Crisis of Foundations. (Or a similar name, like Foundational Crisis. I suspect there are dialect differences here.) It reflected mathematics taking seriously one of its ideals: that everything in it could be deduced from clearly stated axioms and definitions using logically rigorous arguments. As often happens, taking one’s ideals seriously produces great turmoil and strife.

Before about 1900 we could get away with saying that a set was a bunch of things which all satisfied some description. That’s how I would describe it to a new acquaintance if I didn’t want to be treated like I was in middle school. The definition is fine if we don’t look at it too hard. “The set of all roots of this polynomial”. “The set of all rectangles with area 2”. “The set of all animals with four-fingered front paws”. “The set of all houses in Central New Jersey that are yellow”. That’s all fine.

And then if we try to be logically rigorous we get problems. We always did, though. They’re embodied by ancient jokes like the person from Crete who declared that all Cretans always lie; is the statement true? Or the slightly less ancient joke about the barber who shaves only the men who do not shave themselves; does he shave himself? If not jokes these should at least be puzzles faced in fairy-tale quests. Logicians dressed this up some. Bertrand Russell gave us the quite respectable “The set consisting of all sets which are not members of themselves”, and asked us to stare hard into that set. To this we have only one logical response, which is to shout, “Look at that big, distracting thing!” and run away. This satisfies the problem only for a while.

The while ended in — well, that took a while too. But between 1908 and the early 1920s Ernst Zermelo, Abraham Fraenkel, and Thoralf Skolem paused from arguing whose name would also be the best indie rock band name long enough to put set theory right. Their structure is known as Zermelo-Fraenkel Set Theory, or ZF. It gives us a reliable base for set theory that avoids any contradictions or catastrophic pitfalls. Or does so far as we have found in a century of work.

It’s built on a set of axioms, of course. Most of them are uncontroversial, things like declaring two sets are equivalent if they have the same elements. Declaring that the union of sets is itself a set. Obvious, sure, but it’s the obvious things that we have to make axioms. Maybe you could start an argument about whether we should just assume there exists some infinitely large set. But if we’re aware sets probably have something to teach us about numbers, and that numbers can get infinitely large, then it seems fair to suppose that there must be some infinitely large set. The axioms that aren’t simple obvious things like that are too useful to do without. They assume stuff like that no set is an element of itself. Or that every set has a “power set”, a new set comprising all the subsets of the original set. Good stuff to know.

There is one axiom that’s controversial. Not controversial the way Euclid’s Parallel Postulate was. That’s the ugly one about lines crossing another line meeting on the same side they make angles smaller than something something or other. That axiom was controversial because it read so weird, so needlessly complicated. (It isn’t; it’s exactly as complicated as it must be. Or for a more instructive view, it’s as simple as it could be and still be useful.) The controversial axiom of Zermelo-Fraenkel Set Theory is known as the Axiom of Choice. It says if we have a collection of mutually disjoint sets, each with at least one thing in them, then it’s possible to pick exactly one item from each of the sets.

It’s impossible to dispute this is what we have axioms for. It’s about something that feels like it should be obvious: we can always pick something from a set. How could this not be true?

If it is true, though, we get some unsavory conclusions. For example, it becomes possible to take a ball the size of an orange and slice it up. We slice using mathematical blades. They’re not halted by something as petty as the desire not to slice atoms down the middle. We can reassemble the pieces. Into two balls. And worse, it doesn’t require we do something like cut the orange into infinitely many pieces. We expect crazy things to happen when we let infinities get involved. No, though, we can do this cut-and-duplicate thing by cutting the orange into five pieces. When you hear that it’s hard to know whether to point to the big, distracting thing and run away. If we dump the Axiom of Choice we don’t have that problem. But can we do anything useful without the ability to make a choice like that?

And we’ve learned that we can. If we want to use the Zermelo-Fraenkel Set Theory with the Axiom of Choice we say we were working in “ZFC”, Zermelo-Fraenkel-with-Choice. We don’t have to. If we don’t want to make any assumption about choices we say we’re working in “ZF”. Which to use depends on what one wants to use.

Either way Zermelo and Fraenkel and Skolem established set theory on the foundation we use to this day. We’re not required to use them, no; there’s a construction called von Neumann-Bernays-Gödel Set Theory that’s supposed to be more elegant. They didn’t mention it in my logic classes that I remember, though.

And still there’s important stuff we would like to know which even ZFC can’t answer. The most famous of these is the continuum hypothesis. Everyone knows — excuse me. That’s wrong. Everyone who would be reading a pop mathematics blog knows there are different-sized infinitely-large sets. And knows that the set of integers is smaller than the set of real numbers. The question is: is there a set bigger than the integers yet smaller than the real numbers? The Continuum Hypothesis says there is not.

Zermelo-Fraenkel Set Theory, even though it’s all about the properties of sets, can’t tell us if the Continuum Hypothesis is true. But that’s all right; it can’t tell us if it’s false, either. Whether the Continuum Hypothesis is true or false stands independent of the rest of the theory. We can assume whichever state is more useful for our work.

Back to the ideals of mathematics. One question that produced the Crisis of Foundations was consistency. How do we know our axioms don’t contain a contradiction? It’s hard to say. Typically a set of axioms we can prove consistent are also a set too boring to do anything useful in. Zermelo-Fraenkel Set Theory, with or without the Axiom of Choice, has a lot of interesting results. Do we know the axioms are consistent?

No, not yet. We know some of the axioms are mutually consistent, at least. And we have some results which, if true, would prove the axioms to be consistent. We don’t know if they’re true. Mathematicians are generally confident that these axioms are consistent. Mostly on the grounds that if there were a problem something would have turned up by now. It’s withstood all the obvious faults. But the universe is vaster than we imagine. We could be wrong.

It’s hard to live up to our ideals. After a generation of valiant struggling we settle into hoping we’re doing good enough. And waiting for some brilliant mind that can get us a bit closer to what we ought to be.

When I first published this I mentioned not knowing why ‘z’ got picked as a variable name. Any letter besides ‘x’ would make sense. As happens when I toss this sort of question out, I haven’t learned anything about why ‘z’ and not, oh, ‘y’ or ‘t’ or even ‘d’. My best guess is that we don’t want to confuse references to the original data with references to the transformed. And while you can write a ‘z’ so badly it looks like an ‘x’, it’s much easier to write a ‘y’ that looks like an ‘x’. I don’t know whether the Preliminary SAT is still a thing.

And we come to the last of the Leap Day 2016 Mathematics A To Z series! Z is a richer letter than x or y, but it’s still not so rich as you might expect. This is why I’m using a term that everybody figured I’d use the last time around, when I went with z-transforms instead.

Z-Score

You get an exam back. You get an 83. Did you do well?

Hard to say. It depends on so much. If you expected to barely pass and maybe get as high as a 70, then you’ve done well. If you took the Preliminary SAT, with a composite score that ranges from 60 to 240, an 83 is catastrophic. If the instructor gave an easy test, you maybe scored right in the middle of the pack. If the instructor sees tests as a way to weed out the undeserving, you maybe had the best score in the class. It’s impossible to say whether you did well without context.

The z-score is a way to provide that context. It draws that context by comparing a single score to all the other values. And underlying that comparison is the assumption that whatever it is we’re measuring fits a pattern. Usually it does. The pattern we suppose stuff we measure will fit is the Normal Distribution. Sometimes it’s called the Standard Distribution. Sometimes it’s called the Standard Normal Distribution, so that you know we mean business. Sometimes it’s called the Gaussian Distribution. I wouldn’t rule out someone writing the Gaussian Normal Distribution. It’s also called the bell curve distribution. As the names suggest by throwing around “normal” and “standard” so much, it shows up everywhere.

A normal distribution means that whatever it is we’re measuring follows some rules. One is that there’s a well-defined arithmetic mean of all the possible results. And that arithmetic mean is the most common value to turn up. That’s called the mode. Also, this arithmetic mean, and mode, is also the median value. There’s as many data points less than it as there are greater than it. Most of the data values are pretty close to the mean/mode/median value. There’s some more as you get farther from this mean. But the number of data values far away from it are pretty tiny. You can, in principle, get a value that’s way far away from the mean, but it’s unlikely.

We call this standard because it might as well be. Measure anything that varies at all. Draw a chart with the horizontal axis all the values you could measure. The vertical axis is how many times each of those values comes up. It’ll be a standard distribution uncannily often. The standard distribution appears when the thing we measure satisfies some quite common conditions. Almost everything satisfies them, or nearly satisfies them. So we see bell curves so often when we plot how frequently data points come up. It’s easy to forget that not everything is a bell curve.

The normal distribution has a mean, and median, and mode, of 0. It’s tidy that way. And it has a standard deviation of exactly 1. The standard deviation is a way of measuring how spread out the bell curve is. About 95 percent of all observed results are less than two standard deviations away from the mean. About 99 percent of all observed results are less than three standard deviations away. 99.9997 percent of all observed results are less than six standard deviations away. That last might sound familiar to those who’ve worked in manufacturing. At least it des once you know that the Greek letter sigma is the common shorthand for a standard deviation. “Six Sigma” is a quality-control approach. It’s meant to make sure one understands all the factors that influence a product and controls them. This is so the product falls outside the design specifications only 0.0003 percent of the time.

This is the normal distribution. It has a standard deviation of 1 and a mean of 0, by definition. And then people using statistics go and muddle the definition. It is always so, with the stuff people actually use. Forgive them. It doesn’t really change the shape of the curve if we scale it, so that the standard deviation is, say, two, or ten, or π, or any positive number. It just changes where the tick marks are on the x-axis of our plot. And it doesn’t really change the shape of the curve if we translate it, adding (or subtracting) some number to it. That makes the mean, oh, 80. Or -15. Or e^{π}. Or some other number. That just changes what value we write underneath the tick marks on the plot’s x-axis. We can find a scaling and translation of the normal distribution that fits whatever data we’re observing.

When we find the z-score for a particular data point we’re undoing this translation and scaling. We figure out what number on the standard distribution maps onto the original data set’s value. About two-thirds of all data points are going to have z-scores between -1 and 1. About nineteen out of twenty will have z-scores between -2 and 2. About 99 out of 100 will have z-scores between -3 and 3. If we don’t see this, and we have a lot of data points, then that’s suggests our data isn’t normally distributed.

I don’t know why the letter ‘z’ is used for this instead of, say, ‘y’ or ‘w’ or something else. ‘x’ is out, I imagine, because we use that for the original data. And ‘y’ is a natural pick for a second measured variable. z’, I expect, is just far enough from ‘x’ it isn’t needed for some more urgent duty, while being close enough to ‘x’ to suggest it’s some measured thing.

The z-score gives us a way to compare how interesting or unusual scores are. If the exam on which we got an 83 has a mean of, say, 74, and a standard deviation of 5, then we can say this 83 is a pretty solid score. If it has a mean of 78 and a standard deviation of 10, then the score is better-than-average but not exceptional. If the exam has a mean of 70 and a standard deviation of 4, then the score is fantastic. We get to meaningfully compare scores from the measurements of different things. And so it’s one of the tools with which statisticians build their work.

Back in the day I taught in a Computational Science department, which threw me out to exciting and new-to-me subjects more than once. One quite fun semester I was learning, and teaching, signal processing. This set me up for the triumphant conclusion of my first A-to-Z.

One of the things you can see in my style is mentioning the connotations implied by whether one uses x or z as a variable. Any letter will do, for the use it’s put to. But to use the name ‘z’ suggests an openness to something that ‘x’ doesn’t.

There’s a mention here about stability in algorithms, and the note that we can process data in ways that are stable or are unstable. I don’t mention why one would want or not want stability. Wanting stability hardly seems to need explaining; isn’t that the good option? And, often, yes, we want stable systems because they correct and wipe away error. But there are reasons we might want instability, or at least less stability. Too stable a system will obscure weak trends, or the starts of trends. Your weight flutters day by day in ways that don’t mean much, which is why it’s better to consider a seven-day average. If you took instead a 700-day running average, these meaningless fluctuations would be invisible. But you also would take a year or more to notice whether you were losing or gaining weight. That’s one of the things stability costs.

z-transform.

The z-transform comes to us from signal processing. The signal we take to be a sequence of numbers, all representing something sampled at uniformly spaced times. The temperature at noon. The power being used, second-by-second. The number of customers in the store, once a month. Anything. The sequence of numbers we take to stretch back into the infinitely great past, and to stretch forward into the infinitely distant future. If it doesn’t, then we pad the sequence with zeroes, or some other safe number that we know means “nothing”. (That’s another classic mathematician’s trick.)

It’s convenient to have a name for this sequence. “a” is a good one. The different sampled values are denoted by an index. a_{0} represents whatever value we have at the “start” of the sample. That might represent the present. That might represent where sampling began. That might represent just some convenient reference point. It’s the equivalent of mileage maker zero; we have to have something be the start.

a_{1}, a_{2}, a_{3}, and so on are the first, second, third, and so on samples after the reference start. a_{-1}, a_{-2}, a_{-3}, and so on are the first, second, third, and so on samples from before the reference start. That might be the last couple of values before the present.

So for example, suppose the temperatures the last several days were 77, 81, 84, 82, 78. Then we would probably represent this as a_{-4} = 77, a_{-3} = 81, a_{-2} = 84, a_{-1} = 82, a_{0} = 78. We’ll hope this is Fahrenheit or that we are remotely sensing a temperature.

The z-transform of a sequence of numbers is something that looks a lot like a polynomial, based on these numbers. For this five-day temperature sequence the z-transform would be the polynomial . (z^{1} is the same as z. z^{0} is the same as the number “1”. I wrote it this way to make the pattern more clear.)

I would not be surprised if you protested that this doesn’t merely look like a polynomial but actually is one. You’re right, of course, for this set, where all our samples are from negative (and zero) indices. If we had positive indices then we’d lose the right to call the transform a polynomial. Suppose we trust our weather forecaster completely, and add in a_{1} = 83 and a_{2} = 76. Then the z-transform for this set of data would be . You’d probably agree that’s not a polynomial, although it looks a lot like one.

The use of z for these polynomials is basically arbitrary. The main reason to use z instead of x is that we can learn interesting things if we imagine letting z be a complex-valued number. And z carries connotations of “a possibly complex-valued number”, especially if it’s used in ways that suggest we aren’t looking at coordinates in space. It’s not that there’s anything in the symbol x that refuses the possibility of it being complex-valued. It’s just that z appears so often in the study of complex-valued numbers that it reminds a mathematician to think of them.

A sound question you might have is: why do this? And there’s not much advantage in going from a list of temperatures “77, 81, 84, 81, 78, 83, 76” over to a polynomial-like expression .

Where this starts to get useful is when we have an infinitely long sequence of numbers to work with. Yes, it does too. It will often turn out that an interesting sequence transforms into a polynomial that itself is equivalent to some easy-to-work-with function. My little temperature example there won’t do it, no. But consider the sequence that’s zero for all negative indices, and 1 for the zero index and all positive indices. This gives us the polynomial-like structure . And that turns out to be the same as . That’s much shorter to write down, at least.

Probably you’ll grant that, but still wonder what the point of doing that is. Remember that we started by thinking of signal processing. A processed signal is a matter of transforming your initial signal. By this we mean multiplying your original signal by something, or adding something to it. For example, suppose we want a five-day running average temperature. This we can find by taking one-fifth today’s temperature, a_{0}, and adding to that one-fifth of yesterday’s temperature, a_{-1}, and one-fifth of the day before’s temperature a_{-2}, and one-fifth a_{-3}, and one-fifth a_{-4}.

The effect of processing a signal is equivalent to manipulating its z-transform. By studying properties of the z-transform, such as where its values are zero or where they are imaginary or where they are undefined, we learn things about what the processing is like. We can tell whether the processing is stable — does it keep a small error in the original signal small, or does it magnify it? Does it serve to amplify parts of the signal and not others? Does it dampen unwanted parts of the signal while keeping the main intact?

We can understand how data will be changed by understanding the z-transform of the way we manipulate it. That z-transform turns a signal-processing idea into a complex-valued function. And we have a lot of tools for studying complex-valued functions. So we become able to say a lot about the processing. And that is what the z-transform gets us.

Mr Wu, author of the Singapore Maths Tuition blog, asked me to explain a technical term today. I thought that would be a fun, quick essay. I don’t learn very fast, do I?

A note on style. I make reference here to “Big-O” and “Little-O”, capitalizing and hyphenating them. This is to give them visual presence as a name. In casual discussion they’re just read, or said, as the two words or word-and-a-letter. Often the Big- or Little- gets dropped and we just talk about O. An O, without further context, in my experience means Big-O.

The part of me that wants smooth consistency in prose urges me to write “Little-o”, as the thing described is represented with a lowercase ‘o’. But Little-o sounds like a midway game or an Eyerly Aircraft Company amusement park ride. And I never achieve consistency in my prose anyway. Maybe for the book publication. Until I’m convinced another is better, though, “Little-O” it is.

Big-O and Little-O Notation.

When I first went to college I had a campus post office box. I knew my box number. I also knew the length of the sluggish line for the combination lock code. The lock was a dial, lettered A through J. Being a young STEM-class idiot I thought, boy, would it actually be quicker to pick the lock than wait for the line? A three-letter combination, of ten options? That’s 1,000 possibilities. If I could try five a minute that’s, at worst, three hours 20 minutes. Combination might be anywhere in that set; I might get lucky. I could expect to spend 80 minutes picking my lock.

I decided to wait in line instead, and good that I did. I was unaware lock settings might not be a letter, like ‘A’. It could be the midway point between adjacent letters, like ‘AB’. That meant there were eight times as many combinations as I estimated, and I could expect to spend over ten hours. Even the slow line was faster than that. It transpired that my combination had two of these midway letters.

But that’s a little demonstration of algorithmic complexity. Also in cracking passwords by trial-and-error. Doubling the set of possible combination codes octuples the time it takes to break into the set. Making the combination longer would also work; each extra letter would multiply the cracking time by twenty. So you understand why your password should include “special characters” like punctuation, but most of all should be long.

We’re often interested in how long to expect a task to take. Sometimes we’re interested in the typical time it takes. Often we’re interested in the longest it could ever take. If we have a deterministic algorithm, we can say. We can count how many steps it takes. Sometimes this is easy. If we want to add two two-digit numbers together we know: it will be, at most, three single-digit additions plus, maybe, writing down a carry. (To add 98 and 37 is adding 8 + 7 to get 15, to add 9 + 3 to get 12, and to take the carry from the 15, so, 1 + 12 to get 13, so we have 135.) We can get a good quarrel going about what “a single step” is. We can argue whether that carry into the hundreds column is really one more addition. But we can agree that there is some smallest bit of arithmetic work, and proceed from that.

For any algorithm we have something that describes how big a thing we’re working on. It’s often ‘n’. If we need more than one variable to describe how big it is, ‘m’ gets called up next. If we’re estimating how long it takes to work on a number, ‘n’ is the number of digits in the number. If we’re thinking about a square matrix, ‘n’ is the number of rows and columns. If it’s a not-square matrix, then ‘n’ is the number of rows and ‘m’ the number of columns. Or vice-versa; it’s your matrix. If we’re looking for an item in a list, ‘n’ is the number of items in the list. If we’re looking to evaluate a polynomial, ‘n’ is the order of the polynomial.

In normal circumstances we don’t work out how many steps some operation does take. It’s more useful to know that multiplying these two long numbers would take about 900 steps than that it would need only 816. And so this gives us an asymptotic estimate. We get an estimate of how much longer cracking the combination lock will take if there’s more letters to pick from. This allowing that some poor soul will get the combination A-B-C.

There are a couple ways to describe how long this will take. The more common is the Big-O. This is just the letter, like you find between N and P. Since that’s easy, many have taken to using a fancy, vaguely cursive O, one that looks like . I agree it looks nice. Particularly, though, we write , where f is some function. In practice, we’ll see functions like or or . Usually something simple like that. It can be tricky. There’s a scheme for multiplying large numbers together that’s . What you will not see is something like , or or such. This comes to what we mean by the Big-O.

It’ll be convenient for me to have a name for the actual number of steps the algorithm takes. Let me call the function describing that g(n). Then g(n) is if once n gets big enough, g(n) is always less than C times f(n). Here c is some constant number. Could be 1. Could be 1,000,000. Could be 0.00001. Doesn’t matter; it’s some positive number.

There’s some neat tricks to play here. For example, the function ‘‘ is . It’s also and and . The function ‘‘ is also and those later terms, but it is not . And you can see why is right out.

There is also a Little-O notation. It, too, is an upper bound on the function. But it is a stricter bound, setting tighter restrictions on what g(n) is like. You ask how it is the stricter bound gets the minuscule letter. That is a fine question. I think it’s a quirk of history. Both symbols come to us through number theory. Big-O was developed first, published in 1894 by Paul Bachmann. Little-O was published in 1909 by Edmund Landau. Yes, the one with the short Hilbert-like list of number theory problems. In 1914 G H Hardy and John Edensor Littlewood would work on another measure and they used Ω to express it. (If you see the letter used for Big-O and Little-O as the Greek omicron, then you see why a related concept got called omega.)

What makes the Little-O measure different is its sternness. g(n) is if, for every positive number C, whenever n is large enough g(n) is less than or equal to C times f(n). I know that sounds almost the same. Here’s why it’s not.

If g(n) is , then you can go ahead and pick a C and find that, eventually, . If g(n) is , then I, trying to sabotage you, can go ahead and pick a C, trying my best to spoil your bounds. But I will fail. Even if I pick, like a C of one millionth of a billionth of a trillionth, eventually f(n) will be so big that . I can’t find a C small enough that f(n) doesn’t eventually outgrow it, and outgrow g(n).

This implies some odd-looking stuff. Like, that the function n is not . But the function n is at least , and and those other fun variations. Being Little-O compels you to be Big-O. Big-O is not compelled to be Little-O, although it can happen.

These definitions, for Big-O and Little-O, I’ve laid out from algorithmic complexity. It’s implicitly about functions defined on the counting numbers. But there’s no reason I have to limit the ideas to that. I could define similar ideas for a function g(x), with domain the real numbers, and come up with an idea of being on the order of f(x).

We make some adjustments to this. The important one is that, with algorithmic complexity, we assumed g(n) had to be a positive number. What would it even mean for something to take minus four steps to complete? But a regular old function might be zero or negative or change between negative and positive. So we look at the absolute value of g(x). Is there some value of C so that, when x is big enough, the absolute value of g(x) stays less than C times f(x)? If it does, then g(x) is . Is it the case that for every positive number C it’s true that g(x) is less than C times f(x), once x is big enough? Then g(x) is .

Fine, but why bother defining this?

A compelling answer is that it gives us a way to describe how different a function is from an approximation to that function. We are always looking for approximations to functions because most functions are hard. We have a small set of functions we like to work with. Polynomials are great numerically. Exponentials and trig functions are great analytically. That’s about all the functions that are easy to work with. Big-O notation particularly lets us estimate how bad an error we make using the approximation.

For example, the Runge-Kutta method numerically approximates solutions to ordinary differential equations. It does this by taking the information we have about the function at some point x to approximate its value at a point x + h. ‘h’ is some number. The difference between the actual answer and the Runge-Kutta approximation is . We use this knowledge to make sure our error is tolerable. Also, we don’t usually care what the function is at x + h. It’s just what we can calculate. What we want is the function at some point a fair bit away from x, call it x + L. So we use our approximate knowledge of conditions at x + h to approximate the function at x + 2h. And use x + 2h to tell us about x + 3h, and from that x + 4h and so on, until we get to x + L. We’d like to have as few of these uninteresting intermediate points as we can, so look for as big an h as is safe.

That context may be the more common one. We see it, particularly, in Taylor Series and other polynomial approximations. For example, the sine of a number is approximately:

This has consequences. It tells us, for example, that if x is about 0.1, this approximation is probably pretty good. So it is: the sine of 0.1 (radians) is about 0.0998334166468282 and that’s exactly what five terms here gives us. But it also warns that if x is about 10, this approximation may be gibberish. And so it is: the sine of 10.0 is about -0.5440 and the polynomial is about 1448.27.

The connotation in using Big-O notation here is that we look for small h’s, and for to be a tiny number. It seems odd to use the same notation with a large independent variable and with a small one. The concept carries over, though, and helps us talk efficiently about this different problem.

One of the many small benefits of these essays is getting myself clearly grounded on terms that I had accepted without thinking much about. Operator, like functional (mentioned in here), is one of them. I’m sure that when these were first introduced my instructors gave them clear definitions. Buut when they’re first introduced it’s not clear why these are important, or that we are going to spend the rest of grad school talking about them. So this piece from 2019’s A-to-Z sequence secured my footing on a term I had a fair understanding of. You get some idea of what has to be intended from the context in which the term is used. Also from knowing how terms like this tend to be defined. But having it down to where I could certainly pass a true-false test about “is this an operator”? That was new.

Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.

Operator.

An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same sapce but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Everything in a particular function space is a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.

So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.

My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.

I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.

Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. But both the differential and the integral operators are linear, which lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.

I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.

One of the conceptual leaps in high school algebra is that we start analyzing the things we do with numbers. Like, we don’t just take the number three, square it, multiply that by two and add to that the number three times four and add to that the number 1. We think about what if we take any number, call it x, and think of . And what if we make equations based on doing this ; what values of x make those equations true? Or tell us something interesting?

Operators represent a similar leap. We can think of functions as things we manipulate, and think of those manipulations as a particular thing to do. For example, let me come up with a differential expression. For some function u(x) work out the value of this:

Let me join in the convention of using ‘D’ for the differential operator. Then we can rewrite this expression like so:

Suddenly the differential equation looks a lot like a polynomial. Of course it does. Remember that everything in mathematics is polynomials. We get new tools to solve differential equations by rewriting them as operators. That’s nice. It also scratches that itch that I think everyone in Intro to Calculus gets, of wanting to somehow see as if it were a square of . It’s not, and is not the square of . It’s composing with itself. But it looks close enough to squaring to feel comfortable.

Nobody needs to do except to learn some stuff about operators. But you might imagine a world where we did this process all the time. If we did, then we’d develop shorthand for it. Maybe a new operator, call it T, and define it that . You see the grammar of treating functions as if they were real numbers becoming familiar. You maybe even noticed the ‘1’ sitting there, serving as the “identity operator”. You know how you’d write out if you needed to write it in full.

But there are operators that we use all the time. These do get special names, and often shorthand. For example, there’s the gradient operator. This applies to any function with several independent variables. The gradient has a great physical interpretation if the variables represent coordinates of space. If they do, the gradient of a function at a point gives us a vector that describes the direction in which the function increases fastest. And the size of that gradient — a functional on this operator — describes how fast that increase is.

The gradient itself defines more operators. These have names you get very familiar with in Vector Calculus, with names like divergence and curl. These have compelling physical interpretations if we think of the function we operate on as describing a moving fluid. A positive divergence means fluid is coming into the system; a negative divergence, that it is leaving. The curl, in fluids, describe how nearby streams of fluid move at different rate.

Physical interpretations are common in operators. This probably reflects how much influence physics has on mathematics and vice-versa. Anyone studying quantum mechanics gets familiar with a host of operators. These have comfortable names like “position operator” or “momentum operator” or “spin operator”. These are operators that apply to the wave function for a problem. They transform the wave function into a probability distribution. That distribution describes what positions or momentums or spins are likely, how likely they are. Or how unlikely they are.

They’re not all physical, though. Or not purely physical. Many operators are useful because they are powerful mathematical tools. There is a variation of the Fourier series called the Fourier transform. We can interpret this as an operator. Suppose the original function started out with time or space as its independent variable. This often happens. The Fourier transform operator gives us a new function, one with frequencies as independent variable. This can make the function easier to work with. The Fourier transform is an integral operator, by the way, so don’t go thinking everything is a complicated set of derivatives.

Another integral-based operator that’s important is the Laplace transform. This is a great operator because it turns differential equations into algebraic equations. Often, into polynomials. You saw that one coming.

This is all a lot of good press for operators. Well, they’re powerful tools. They help us to see that we can manipulate functions in the ways that functions let us manipulate numbers. It should sound good to realize there is much new that you can do, and you already know most of what’s needed to do it.

My grad-student career took me into Monte Carlo methods and viscosity-free fluid flow. It’s a respectable path. But I could have ended up in graph theory; I got a couple courses in it in grad school and loved it. I just could not find a problem I could work on that was both solvable and interesting. But hints of that alternative path for me turn up now and then, such as in this piece from 2018.

I am surprised to have had no suggestions for an ‘O’ letter. I’m glad to take a free choice, certainly. It let me get at one of those fields I didn’t specialize in, but could easily have. And let me mention that while I’m still taking suggestions for the letters P through T, each other letter has gotten at least one nomination. I can be swayed by a neat term, though, so if you’ve thought of something hard to resist, try me. And later this month I’ll open up the letters U through Z. Might want to start thinking right away about what X, Y, and Z could be.

Oriented Graph.

This is another term from graph theory, one of the great mathematical subjects for doodlers. A graph, here, is made of two sets of things. One is a bunch of fixed points, called ‘vertices’. The other is a bunch of curves, called ‘edges’. Every edge starts at one vertex and ends at one vertex. We don’t require that every vertex have an edge grow from it.

Already you can see why this is a fun subject. It models some stuff really well. Like, anything where you have a bunch of sources of stuff, that come together and spread out again? Chances are there’s a graph that describes this. There’s a compelling all-purpose interpretation. Have vertices represent the spots where something accumulates, or rests, or changes, or whatever. Have edges represent the paths along which something can move. This covers so much.

The next step is a “directed graph”. This comes from making the edges different. If we don’t say otherwise we suppose that stuff can move along an edge in either direction. But suppose otherwise. Suppose there are some edges that can be used in only one direction. This makes a “directed edge”. It’s easy to see in graph theory networks of stuff like city streets. Once you ponder that, one-way streets follow close behind. If every edge in a graph is directed, then you have a directed graph. Moving from a regular old undirected graph to a directed graph changes everything you’d learned about graph theory. Mostly it makes things harder. But you get some good things in trade. We become able to model sources, for example. This is where whatever might move comes from. Also sinks, which is where whatever might move disappears from our consideration.

You might fear that by switching to a directed graph there’s no way to have a two-way connection between a pair of vertices. Or that if there is you have to go through some third vertex. I understand your fear, and wish to reassure you. We can get a two-way connection even in a directed graph: just have the same two vertices be connected by two edges. One goes one way, one goes the other. I hope you feel some comfort.

What if we don’t have that, though? What if the directed graph doesn’t have any vertices with a pair of opposite-directed edges? And that, then, is an oriented graph. We get the orientation from looking at pairs of vertices. Each pair either has no edge connecting them, or has a single directed edge between them.

There’s a lot of potential oriented graphs. If you have three vertices, for example, there’s seven oriented graphs to make of that. You’re allowed to have a vertex not connected to any others. You’re also allowed to have the vertices grouped into a couple of subsets, and connect only to other vertices in their own subset. This is part of why four vertices can give you 42 different oriented graphs. Five vertices can give you 582 different oriented graphs. You can insist on a connected oriented graph.

A connected graph is what you guess. It’s a graph where there’s no vertices off on their own, unconnected to anything. There’s no subsets of vertices connected only to each other. This doesn’t mean you can always get from any one vertex to any other vertex. The directions might not allow you to that. But if you’re willing to break the laws, and ignore the directions of these edges, you could then get from any vertex to any other vertex. Limiting yourself to connected graphs reduces the number of oriented graphs you can get. But not by as much as you might guess, at least not to start. There’s only one connected oriented graph for two vertices, instead of two. Three vertices have five connected oriented graphs, rather than seven. Four vertices have 34, rather than 42. Five vertices, 535 rather than 582. The total number of lost graphs grows, of course. The percentage of lost graphs dwindles, though.

There’s something more. What if there are no unconnected vertices? That is, every pair of vertices has an edge? If every pair of vertices in a graph has a direct connection we call that a “complete” graph. This is true whether the graph is directed or not. If you do have a complete oriented graph — every pair of vertices has a direct connection, and only the one direction — then that’s a “tournament”. If that seems like a whimsical name, consider one interpretation of it. Imagine a sports tournament in which every team played every other team once. And that there’s no ties. Each vertex represents one team. Each edge is the match played by the two teams. The direction is, let’s say, from the losing team to the winning team. (It’s as good if the direction is from the winning team to the losing team.) Then you have a complete, oriented, directed graph. And it represents your tournament.

And that delights me. A mathematician like me might talk a good game about building models. How one can represent things with mathematical constructs. Here, it’s done. You can make little dots, for vertices, and curved lines with arrows, for edges. And draw a picture that shows how a round-robin tournament works. It can be that direct.

It’s quite funny to notice the first paragraph’s shame at missing my self-imposed schedule. I still have not found confirmation of my hunch that “open” and “closed”, as set properties, were named independently. I haven’t found evidence I’m wrong, though, either.

Today’s glossary entry is another request from Elke Stangl, author of the Elkemental Force blog. I’m hoping this also turns out to be a well-received entry. Half of that is up to you, the kind reader. At least I hope you’re a reader. It’s already gone wrong, as it was supposed to be Friday’s entry. I discovered I hadn’t actually scheduled it while I was too far from my laptop to do anything about that mistake. This spoils the nice Monday-Wednesday-Friday routine of these glossary entries that dates back to the first one I ever posted and just means I have to quit forever and not show my face ever again. Sorry, Ulam Spiral. Someone else will have to think of you.

Open Set.

Mathematics likes to present itself as being universal truths. And it is. At least if we allow that the rules of logic by which mathematics works are universal. Suppose them to be true and the rest follows. But we start out with intuition, with things we observe in the real world. We’re happy when we can remove the stuff that’s clearly based on idiosyncratic experience. We find something that’s got to be universal.

Sets are pretty abstract things, as mathematicians use the term. They get to be hard to talk about; we run out of simpler words that we can use. A set is … a bunch of things. The things are … stuff that could be in a set, or else that we’d rule out of a set. We can end up better understanding things by drawing a picture. We draw the universe, which is a rectangular block, sometimes with dashed lines as the edges. The set is some blotch drawn on the inside of it. Some shade it in to emphasize which stuff we want in the set. If we need to pick out a couple things in the universe we drop in dots or numerals. If we’re rigorous about the drawing we could create a Venn Diagram.

When we do this, we’re giving up on the pure mathematical abstraction of the set. We’re replacing it with a territory on a map. Several territories, if we have several sets. The territories can overlap or be completely separate. We’re subtly letting our sense of geography, our sense of the spaces in which we move, infiltrate our understanding of sets. That’s all right. It can give us useful ideas. Later on, we’ll try to separate out the ideas that are too bound to geography.

A set is open if whenever you’re in it, you can’t be on its boundary. We never quite have this in the real world, with territories. The border between, say, New Jersey and New York becomes this infinitesimally slender thing, as wide in space as midnight is in time. But we can, with some effort, imagine the state. Imagine being as tiny in every direction as the border between two states. Then we can imagine the difference between being on the border and being away from it.

And not being on the border matters. If we are not on the border we can imagine the problem of getting to the border. Pick any direction; we can move some distance while staying inside the set. It might be a lot of distance, it might be a tiny bit. But we stay inside however we might move. If we are on the border, then there’s some direction in which any movement, however small, drops us out of the set. That’s a difference in kind between a set that’s open and a set that isn’t.

I say “a set that’s open and a set that isn’t”. There are such things as closed sets. A set doesn’t have to be either open or closed. It can be neither, a set that includes some of its borders but not other parts of it. It can even be both open and closed simultaneously. The whole universe, for example, is both an open and a closed set. The empty set, with nothing in it, is both open and closed. (This looks like a semantic trick. OK, if you’re in the empty set you’re not on its boundary. But you can’t be in the empty set. So what’s going on? … The usual. It makes other work easier if we call the empty set ‘open’. And the extra work we’d have to do to rule out the empty set doesn’t seem to get us anything interesting. So we accept what might be a trick.) The definitions of ‘open’ and ‘closed’ don’t exclude one another.

I’m not sure how this confusing state of affairs developed. My hunch is that the words ‘open’ and ‘closed’ evolved independent of each other. Why do I think this? An open set has its openness from, well, not containing its boundaries; from the inside there’s always a little more to it. A closed set has its closedness from sequences. That is, you can consider a string of points inside a set. Are these points leading somewhere? Is that point inside your set? If a string of points always leads to somewhere, and that somewhere is inside the set, then you have closure. You have a closed set. I’m not sure that the terms were derived with that much thought. But it does explain, at least in terms a mathematician might respect, why a set that isn’t open isn’t necessarily closed.

Back to open sets. What does it mean to not be on the boundary of the set? How do we know if we’re on it? We can define sets by all sorts of complicated rules: complex-valued numbers of size less than five, say. Rational numbers whose denominator (in lowest form) is no more than ten. Points in space from which a satellite dropped would crash into the moon rather than into the Earth or Sun. If we have an idea of distance we could measure how far it is from a point to the nearest part of the boundary. Do we need distance, though?

No, it turns out. We can get the idea of open sets without using distance. Introduce a neighborhood of a point. A neighborhood of a point is an open set that contains that point. It doesn’t have to be small, but that’s the connotation. And we get to thinking of little N-balls, circle or sphere-like constructs centered on the target point. It doesn’t have to be N-balls. But we think of them so much that we might as well say it’s necessary. If every point in a set has a neighborhood around it that’s also inside the set, then the set’s open.

You’re going to accuse me of begging the question. Fair enough. I was using open sets to define open sets. This use is all right for an intuitive idea of what makes a set open, but it’s not rigorous. We can give in and say we have to have distance. Then we have N-balls and we can build open sets out of balls that don’t contain the edges. Or we can try to drive distance out of our idea of open sets.

We can do it this way. Start off by saying the whole universe is an open set. Also that the union of any number of open sets is also an open set. And that the intersection of any finite number of open sets is also an open set. Does this sound weak? So it sounds weak. It’s enough. We get the open sets we were thinking of all along from this.

This works for the sets that look like territories on a map. It also works for sets for which we have some idea of distance, however strange it is to our everyday distances. It even works if we don’t have any idea of distance. This lets us talk about topological spaces, and study what geometry looks like if we can’t tell how far apart two points are. We can, for example, at least tell that two points are different. Can we find a neighborhood of one that doesn’t contain the other? Then we know they’re some distance apart, even without knowing what distance is.

That we reached so abstract an idea of what an open set is without losing the idea’s usefulness suggests we’re doing well. So we are. It also shows why Nicholas Bourbaki, the famous nonexistent mathematician, thought set theory and its related ideas were the core of mathematics. Today category theory is a more popular candidate for the core of mathematics. But set theory is still close to the core, and much of analysis is about what we can know from the fact of sets being open. Open sets let us explain a lot.

With the third A-to-Z choice for the letter O, I finally set ortho-ness down. I had thought the letter might become a reference for everything described as ortho-. It has to be acknowledged that two or three examples gets you the general idea of what’s got at when something is named ortho-, though.

Must admit, I haven’t that I remember ever solved a differential equation using osculating circles instead of, you know, polynomials or sine functions (Fourier series). But references I trust say that would be a way to go.

I’m happy to say it’s another request today. This one’s from HowardAt58, author of the Saving School Math blog. He’s given me some great inspiration in the past.

Osculating Circle.

It’s right there in the name. Osculating. You know what that is from that one Daffy Duck cartoon where he cries out “Greetings, Gate, let’s osculate” while wearing a moustache. Daffy’s imitating somebody there, but goodness knows who. Someday the mystery drives the young you to a dictionary web site. Osculate means kiss. This doesn’t seem to explain the scene. Daffy was imitating Jerry Colonna. That meant something in 1943. You can find him on old-time radio recordings. I think he’s funny, in that 40s style.

Make the substitution. A kissing circle. Suppose it’s not some playground antic one level up from the Kissing Bandit that plagues recess yet one or two levels down what we imagine we’d do in high school. It suggests a circle that comes really close to something, that touches it a moment, and then goes off its own way.

But then touching. We know another word for that. It’s the root behind “tangent”. Tangent is a trigonometry term. But it appears in calculus too. The tangent line is a line that touches a curve at one specific point and is going in the same direction as the original curve is at that point. We like this because … well, we do. The tangent line is a good approximation of the original curve, at least at the tangent point and for some region local to that. The tangent touches the original curve, and maybe it does something else later on. What could kissing be?

The osculating circle is about approximating an interesting thing with a well-behaved thing. So are similar things with names like “osculating curve” or “osculating sphere”. We need that a lot. Interesting things are complicated. Well-behaved things are understood. We move from what we understand to what we would like to know, often, by an approximation. This is why we have tangent lines. This is why we build polynomials that approximate an interesting function. They share the original function’s value, and its derivative’s value. A polynomial approximation can share many derivatives. If the function is nice enough, and the polynomial big enough, it can be impossible to tell the difference between the polynomial and the original function.

The osculating circle, or sphere, isn’t so concerned with matching derivatives. I know, I’m as shocked as you are. Well, it matches the first and the second derivatives of the original curve. Anything past that, though, it matches only by luck. The osculating circle is instead about matching the curvature of the original curve. The curvature is what you think it would be: it’s how much a function curves. If you imagine looking closely at the original curve and an osculating circle they appear to be two arcs that come together. They must touch at one point. They might touch at others, but that’s incidental.

Osculating circles, and osculating spheres, sneak out of mathematics and into practical work. This is because we often want to work with things that are almost circles. The surface of the Earth, for example, is not a sphere. But it’s only a tiny bit off. It’s off in ways that you only notice if you are doing high-precision mapping. Or taking close measurements of things in the sky. Sometimes we do this. So we map the Earth locally as if it were a perfect sphere, with curvature exactly what its curvature is at our observation post.

Or we might be observing something moving in orbit. If the universe had only two things in it, and they were the correct two things, all orbits would be simple: they would be ellipses. They would have to be “point masses”, things that have mass without any volume. They never are. They’re always shapes. Spheres would be fine, but they’re never perfect spheres even. The slight difference between a perfect sphere and whatever the things really are affects the orbit. Or the other things in the universe tug on the orbiting things. Or the thing orbiting makes a course correction. All these things make little changes in the orbiting thing’s orbit. The actual orbit of the thing is a complicated curve. The orbit we could calculate is an osculating — well, an osculating ellipse, rather than an osculating circle. Similar idea, though. Call it an osculating orbit if you’d rather.

That osculating circles have practical uses doesn’t mean they aren’t respectable mathematics. I’ll concede they’re not used as much as polynomials or sine curves are. I suppose that’s because polynomials and sine curves have nicer derivatives than circles do. But osculating circles do turn up as ways to try solving nonlinear differential equations. We need the help. Linear differential equations anyone can solve. Nonlinear differential equations are pretty much impossible. They also turn up in signal processing, as ways to find the frequencies of a signal from a sampling of data. This, too, we would like to know.

We get the name “osculating circle” from Gottfried Wilhelm Leibniz. This might not surprise. Finding easy-to-understand shapes that approximate interesting shapes is why we have calculus. Isaac Newton described a way of making them in the Principia Mathematica. This also might not surprise. Of course they would on this subject come so close together without kissing.

For early 2016 — dubbed “Leap Day 2016” as that’s when it started — I got a request to explain orthogonal. I went in a different direction, although not completely different. This essay does get a bit more into specifics of how mathematicians use the idea, like, showing some calculations and such. I put in a casual description of vectors here. For book publication I’d want to rewrite that to be clearer that, like, ordered sets of numbers are just one (very common) way to represent vectors.

“Orthogonal” is another word for “perpendicular”. Mathematicians use it for reasons I’m not precisely sure of. My belief is that it’s because “perpendicular” sounds like we’re talking about directions. And we want to extend the idea to things that aren’t necessarily directions. As majors, mathematicians learn orthogonality for vectors, things pointing in different directions. Then we extend it to other ideas. To functions, particularly, but we can also define it for spaces and for other stuff.

I was vague, last summer, about how we do that. We do it by creating a function called the “inner product”. That takes in two of whatever things we’re measuring and gives us a real number. If the inner product of two things is zero, then the two things are orthogonal.

The first example mathematics majors learn of this, before they even hear the words “inner product”, are dot products. These are for vectors, ordered sets of numbers. The dot product we find by matching up numbers in the corresponding slots for the two vectors, multiplying them together, and then adding up the products. For example. Give me the vector with values (1, 2, 3), and the other vector with values (-6, 5, -4). The inner product will be 1 times -6 (which is -6) plus 2 times 5 (which is 10) plus 3 times -4 (which is -12). So that’s -6 + 10 – 12 or -8.

So those vectors aren’t orthogonal. But how about the vectors (1, -1, 0) and (0, 0, 1)? Their dot product is 1 times 0 (which is 0) plus -1 times 0 (which is 0) plus 0 times 1 (which is 0). The vectors are perpendicular. And if you tried drawing this you’d see, yeah, they are. The first vector we’d draw as being inside a flat plane, and the second vector as pointing up, through that plane, like a thumbtack.

So that’s orthogonal. What about this orthonormal stuff?

Well … the inner product can tell us something besides orthogonality. What happens if we take the inner product of a vector with itself? Say, (1, 2, 3) with itself? That’s going to be 1 times 1 (which is 1) plus 2 times 2 (4, according to rumor) plus 3 times 3 (which is 9). That’s 14, a tidy sum, although, so what?

The inner product of (-6, 5, -4) with itself? Oh, that’s some ugly numbers. Let’s skip it. How about the inner product of (1, -1, 0) with itself? That’ll be 1 times 1 (which is 1) plus -1 times -1 (which is positive 1) plus 0 times 0 (which is 0). That adds up to 2. And now, wait a minute. This might be something.

Start from somewhere. Move 1 unit to the east. (Don’t care what the unit is. Inches, kilometers, astronomical units, anything.) Then move -1 units to the north, or like normal people would say, 1 unit o the south. How far are you from the starting point? … Well, you’re the square root of 2 units away.

Now imagine starting from somewhere and moving 1 unit east, and then 2 units north, and then 3 units straight up, because you found a convenient elevator. How far are you from the starting point? This may take a moment of fiddling around with the Pythagorean theorem. But you’re the square root of 14 units away.

And what the heck, (0, 0, 1). The inner product of that with itself is 0 times 0 (which is zero) plus 0 times 0 (still zero) plus 1 times 1 (which is 1). That adds up to 1. And, yeah, if we go one unit straight up, we’re one unit away from where we started.

The inner product of a vector with itself gives us the square of the vector’s length. At least if we aren’t using some freak definition of inner products and lengths and vectors. And this is great! It means we can talk about the length — maybe better to say the size — of things that maybe don’t have obvious sizes.

Some stuff will have convenient sizes. For example, they’ll have size 1. The vector (0, 0, 1) was one such. So is (1, 0, 0). And you can think of another example easily. Yes, it’s . (Go ahead, check!)

So by “orthonormal” we mean a collection of things that are orthogonal to each other, and that themselves are all of size 1. It’s a description of both what things are by themselves and how they relate to one another. A thing can’t be orthonormal by itself, for the same reason a line can’t be perpendicular to nothing in particular. But a pair of things might be orthogonal, and they might be the right length to be orthonormal too.

Why do this? Well, the same reasons we always do this. We can impose something like direction onto a problem. We might be able to break up a problem into simpler problems, one in each direction. We might at least be able to simplify the ways different directions are entangled. We might be able to write a problem’s solution as the sum of solutions to a standard set of representative simple problems. This one turns up all the time. And an orthogonal set of something is often a really good choice of a standard set of representative problems.

This sort of thing turns up a lot when solving differential equations. And those often turn up when we want to describe things that happen in the real world. So a good number of mathematicians develop a habit of looking for orthonormal sets.

I haven’t had the space yet to finish my Little 2021 A-to-Z, so let me resume playing the hits of past ones. For my first, Summer 2015, one, I picked all the topics myself. This one, Orthogonal, I remember as one of the challenging ones. The challenge was the question put in the first paragraph: why do we have this term, which is so nearly a synonym for “perpendicular”? I didn’t find an answer, then, or since. But I was able to think about how we use “orthogonal” and what it might do that “perpendicular ” doesn’t..

Orthogonal.

Orthogonal is another word for perpendicular. So why do we need another word for that?

It helps to think about why “perpendicular” is a useful way to organize things. For example, we can describe the directions to a place in terms of how far it is north-south and how far it is east-west, and talk about how fast it’s travelling in terms of its speed heading north or south and its speed heading east or west. We can separate the north-south motion from the east-west motion. If we’re lucky these motions separate entirely, and we turn a complicated two- or three-dimensional problem into two or three simpler problems. If they can’t be fully separated, they can often be largely separated. We turn a complicated problem into a set of simpler problems with a nice and easy part plus an annoying yet small hard part.

And this is why we like perpendicular directions. We can often turn a problem into several simpler ones describing each direction separately, or nearly so.

And now the amazing thing. We can separate these motions because the north-south and the east-west directions are at right angles to one another. But we can describe something that works like an angle between things that aren’t necessarily directions. For example, we can describe an angle between things like functions that have the same domain. And once we can describe the angle between two functions, we can describe functions that make right angles between each other.

This means we can describe functions as being perpendicular to one another. An example. On the domain of real numbers from -1 to 1, the function is perpendicular to the function . And when we want to study a more complicated function we can separate the part that’s in the “direction” of f(x) from the part that’s in the “direction” of g(x). We can treat functions, even functions we don’t know, as if they were locations in space. And we can study and even solve for the different parts of the function as if we were pinning down the north-south and the east-west movements of a thing.

So if we want to study, say, how heat flows through a body, we can work out a series of “direction” for functions, and work out the flow in each of those “directions”. These don’t have anything to do with left-right or up-down directions, but the concepts and the convenience is similar.

I’ve spoken about this in terms of functions. But we can define the “angle” between things for many kinds of mathematical structures. Once we can do that, we can have “perpendicular” pairs of things. I’ve spoken only about functions, but that’s because functions are more familiar than many of the mathematical structures that have orthogonality.

Ah, but why call it “orthogonal” rather than “perpendicular”? And I don’t know. The best I can work out is that it feels weird to speak of, say, the cosine function being “perpendicular” to the sine function when you can’t really say either is in any particular direction. “Orthogonal” seems to appeal less directly to physical intuition while still meaning something. But that’s my guess, rather than the verdict of a skilled etymologist.

For the 2020 A-to-Z I took the suggestion to write about tiling. It’s a fun field with many interesting wrinkles. And I realized after publishing that I had already written about Tiling, just two years before. There was no scrambling together a replacement essay, so I had to let it stand as is.

The accidental remake allows for some interesting studies, though. The two essays have very similar structures, which probably reflects that I came to both essays with similar rough ideas what to write, and went to similar sources to fill in details. The second essay turned out longer. Also, I think, better. I did a bit more tracking down specifics, such as trying to find Hao Wang’s paper and see just what it says. And rewriting is often key to good writing. This offers lessons in preparing these essays for book publication.

Mr Wu, author of the Singapore Maths Tuition blog, had an interesting suggestion for the letter T: Talent. As in mathematical talent. It’s a fine topic but, in the end, too far beyond my skills. I could share some of the legends about mathematical talent I’ve received. But what that says about the culture of mathematicians is a deeper and more important question.

So I picked my own topic for the week. I do have topics for next week — U — and the week after — V — chosen. But the letters W and X? I’m still open to suggestions. I’m open to creative or wild-card interpretations of the letters. Especially for X and (soon) Z. Thanks for sharing any thoughts you care to.

Tiling.

Think of a floor. Imagine you are bored. What do you notice?

What I hope you notice is that it is covered. Perhaps by carpet, or concrete, or something homogeneous like that. Let’s ignore that. My floor is covered in small pieces, repeated. My dining room floor is slats of wood, about three and a half feet long and two inches wide. The slats are offset from the neighbors so there’s a pleasant strong line in one direction and stippled lines in the other. The kitchen is squares, one foot on each side. This is a grid we could plot high school algebra functions on. The bathroom is more elaborate. It has white rectangles about two inches long, tan rectangles about two inches long, and black squares. Each rectangle is perpendicular to ones of the other color, and arranged to bisect those. The black squares fill the gaps where no rectangle would fit.

Move from my house to pure mathematics. It’s easy to turn the floor of a room into abstract mathematics. We start with something to tile. Usually this is the infinite, two-dimensional plane. The thing you get if you have a house and forget the walls. Sometimes we look to tile the hyperbolic plane, a different geometry that we of course represent with a finite circle. (Setting particular rules about how to measure distance makes this equivalent to a funny-shaped plane.) Or the surface of a sphere, or of a torus, or something like that. But if we don’t say otherwise, it’s the plane.

What to cover it with? … Smaller shapes. We have a mathematical tiling if we have a collection of not-overlapping open sets. And if those open sets, plus their boundaries, cover the whole plane. “Cover” here means what “cover” means in English, only using more technical words. These sets — these tiles — can be any shape. We can have as many or as few of them as we like. We can even add markings to the tiles, give them colors or patterns or such, to add variety to the puzzles.

(And if we want, we can do this in other dimensions. There are good “tiling” questions to ask about how to fill a three-dimensional space, or a four-dimensional one, or more.)

Having an unlimited collection of tiles is nice. But mathematicians learn to look for how little we need to do something. Here, we look for the smallest number of distinct shapes. As with tiling an actual floor, we can get all the tiles we need. We can rotate them, too, to any angle. We can flip them over and put the “top” side “down”, something kitchen tiles won’t let us do. Can we reflect them? Use the shape we’d get looking at the mirror image of one? That’s up to whoever’s writing this paper.

What shapes will work? Well, squares, for one. We can prove that by looking at a sheet of graph paper. Rectangles would work too. We can see that by drawing boxes around the squares on our graph paper. Two-by-one blocks, three-by-two blocks, 40-by-1 blocks, these all still cover the paper and we can imagine covering the plane. If we like, we can draw two-by-two squares. Squares made up of smaller squares. Or repeat this: draw two-by-one rectangles, and then group two of these rectangles together to make two-by-two squares.

We can take it on faith that, oh, rectangles π long by e wide would cover the plane too. These can all line up in rows and columns, the way our squares would. Or we can stagger them, like bricks or my dining room’s wood slats are.

How about parallelograms? Those, it turns out, tile exactly as well as rectangles or squares do. Grids or staggered, too. Ah, but how about trapezoids? Surely they won’t tile anything. Not generally, anyway. The slanted sides will, most of the time, only fit in weird winding circle-like paths.

Unless … take two of these trapezoid tiles. We’ll set them down so the parallel sides run horizontally in front of you. Rotate one of them, though, 180 degrees. And try setting them — let’s say so the longer slanted line of both trapezoids meet, edge to edge. These two trapezoids come together. They make a parallelogram, although one with a slash through it. And we can tile parallelograms, whether or not they have a slash.

OK, but if you draw some weird quadrilateral shape, and it’s not anything that has a more specific name than “quadrilateral”? That won’t tile the plane, will it?

It will! In one of those turns that surprises and impresses me every time I run across it again, any quadrilateral can tile the plane. It opens up so many home decorating options, if you get in good with a tile maker.

That’s some good news for quadrilateral tiles. How about other shapes? Triangles, for example? Well, that’s good news too. Take two of any identical triangle you like. Turn one of them around and match sides of the same length. The two triangles, bundled together like that, are a quadrilateral. And we can use any quadrilateral to tile the plane, so we’re done.

How about pentagons? … With pentagons, the easy times stop. It turns out not every pentagon will tile the plane. The pentagon has to be of the right kind to make it fit. If the pentagon is in one of these kinds, it can tile the plane. If not, not. There are fifteen families of tiling known. The most recent family was discovered in 2015. It’s thought that there are no other convex pentagon tilings. I don’t know whether the proof of that is generally accepted in tiling circles. And we can do more tilings if the pentagon doesn’t need to be convex. For example, we can cut any parallelogram into two identical pentagons. So we can make as many pentagons as we want to cover the plane. But we can’t assume any pentagon we like will do it.

And there the good times end. There are no convex heptagons or octagons or any other shape with more sides that tile the plane.

Not by themselves, anyway. If we have more than one tile shape we can start doing fine things again. Octagons assisted by squares, for example, will tile the plane. I’ve lived places with that tiling. Or something that looks like it. It’s easier to install if you have square tiles with an octagon pattern making up the center, and triangle corners a different color. These squares come together to look like octagons and squares.

And this leads to a fun avenue of tiling. Hao Wang, in the early 60s, proposed a sort of domino-like tiling. You may have seen these in mathematics puzzles, or in toys. Each of these Wang Tiles, or Wang Dominoes, is a square. But the square is cut along the diagonals, into four quadrants. Each quadrant is a right triangle. Each quadrant, each triangle, is one of a finite set of colors. Adjacent triangles can have the same color. You can place down tiles, subject only to the rule that the tile edge has to have the same color on both sides. So a tile with a blue right-quadrant has to have on its right a tile with a blue left-quadrant. A tile with a white upper-quadrant on its top has, above it, a tile with a white lower-quadrant.

In 1961 Wang conjectured that if a finite set of these tiles will tile the plane, then there must be a periodic tiling. That is, if you picked up the plane and slid it a set horizontal and vertical distance, it would all look the same again. This sort of translation is common. All my floors do that. If we ignore things like the bounds of their rooms, or the flaws in their manufacture or installation or where a tile broke in some mishap.

This is not to say you couldn’t arrange them aperiodically. You don’t even need Wang Tiles for that. Get two colors of square tile, a white and a black, and lay them down based on whether the next decimal digit of π is odd or even. No; Wang’s conjecture was that if you had tiles that you could lay down aperiodically, then you could also arrange them to set down periodically. With the black and white squares, lay down alternate colors. That’s easy.

In 1964, Robert Berger proved Wang’s conjecture was false. He found a collection of Wang Tiles that could only tile the plane aperiodically. In 1966 he published this in the Memoirs of the American Mathematical Society. The 1964 proof was for his thesis. 1966 was its general publication. I mention this because while doing research I got irritated at how different sources dated this to 1964, 1966, or sometimes 1961. I want to have this straightened out. It appears Berger had the proof in 1964 and the publication in 1966.

I would like to share details of Berger’s proof, but haven’t got access to the paper. What fascinates me about this is that Berger’s proof used a set of 20,426 different tiles. I assume he did not work this all out with shards of construction paper, but then, how to get 20,426 of anything? With computer time as expensive as it was in 1964? The mystery of how he got all these tiles is worth an essay of its own and regret I can’t write it.

Berger conjectured that a smaller set might do. Quite so. He himself reduced the set to 104 tiles. Donald Knuth in 1968 modified the set down to 92 tiles. In 2015 Emmanuel Jeandel and Michael Rao published a set of 11 tiles, using four colors. And showed by computer search that a smaller set of tiles, or fewer colors, would not force some aperiodic tiling to exist. I do not know whether there might be other sets of 11, four-colored, tiles that work. You can see the set at the top of Wikipedia’s page on Wang Tiles.

These Wang Tiles, all squares, inspired variant questions. Could there be other shapes that only aperiodically tile the plane? What if they don’t have to be squares? Raphael Robinson, in 1971, came up with a tiling using six shapes. The shapes have patterns on them too, usually represented as colored lines. Tiles can be put down only in ways that fit and that make the lines match up.

Among my readers are people who have been waiting, for 1800 words now, for Roger Penrose. It’s now that time. In 1974 Penrose published an aperiodic tiling, one based on pentagons and using a set of six tiles. You’ve never heard of that either, because soon after he found a different set, based on a quadrilateral cut into two shapes. The shapes, as with Wang Tiles or Robinson’s tiling, have rules about what edges may be put against each other. Penrose — and independently Robert Ammann — also developed another set, this based on a pair of rhombuses. These have rules about what edges may tough one another, and have patterns on them which must line up.

To show that the rhombus-based Penrose tiling is aperiodic takes some arguing. But it uses tools already used in this essay. Remember drawing rectangles around several squares? And then drawing squares around several of these rectangles? We can do that with these Penrose-Ammann rhombuses. From the rhombus tiling we can draw bigger rhombuses. Ones which, it turns out, follow the same edge rules that the originals do. So that we can go again, grouping these bigger rhombuses into even-bigger rhombuses. And into even-even-bigger rhombuses. And so on.

What this gets us is this: suppose the rhombus tiling is periodic. Then there’s some finite-distance horizontal-and-vertical move that leaves the pattern unchanged. So, the same finite-distance move has to leave the bigger-rhombus pattern unchanged. And this same finite-distance move has to leave the even-bigger-rhombus pattern unchanged. Also the even-even-bigger pattern unchanged.

Keep bundling rhombuses together. You get eventually-big-enough-rhombuses. Now, think of how far you have to move the tiles to get a repeat pattern. Especially, think how many eventually-big-enough-rhombuses it is. This distance, the move you have to make, is less than one eventually-big-enough rhombus. (If it’s not you aren’t eventually-big-enough yet. Bundle them together again.) And that doesn’t work. Moving one tile over without changing the pattern makes sense. Moving one-half a tile over? That doesn’t. So the eventually-big-enough pattern can’t be periodic, and so, the original pattern can’t be either. This is explained in graphic detail a nice Powerpoint slide set from Professor Alexander F Ritter, A Tour Of Tilings In Thirty Minutes.

It’s possible to do better. In 2010 Joshua E S Socolar and Joan M Taylor published a single tile that can force an aperiodic tiling. As with the Wang Tiles, and Robinson shapes, and the Penrose-Ammann rhombuses, markings are part of it. They have to line up so that the markings — in two colors, in the renditions I’ve seen — make sense. With the Penrose tilings, you can get away from the pattern rules for the edges by replacing them with little notches. The Socolar-Taylor shape can make a similar trade. Here the rules are complex enough that it would need to be a three-dimensional shape, one that looks like the dilithium housing of the warp core. You can see the tile — in colored, marked form, and also in three-dimensional tile shape — at the PDF here. It’s likely not coming to the flooring store soon.

It’s all wonderful, but is it useful? I could go on a few hundred words about, particularly, crystals and quasicrystals. These are important for materials science. Especially these days as we have harnessed slightly-imperfect crystals to be our computers. I don’t care. These are lovely to look at. If you see nothing appealing in a great heap of colors and polygons spread over the floor there are things we cannot communicate about. Tiling is a delight; what more do you need?

By the time of 2019 and my sixth A-to-Z series , I had some standard narrative tricks I could deploy. My insistence that everything is polynomials, for example. Anecdotes from my slight academic career. A prose style that emphasizes what we do with the idea of something rather than instructions. That last comes from the idea that if you wanted to know how to compute a Taylor series you’d just look it up on Mathworld or Wikipedia or whatnot. The thing a pop mathematics blog can do is give some reason that you’d want to know how to compute a Taylor series. I regret talking about functions that break Taylor series, though. I have to treat these essays as introducing the idea of a Taylor series to someone who doesn’t know anything about them. And it’s bad form to teach how stuff doesn’t work too close to teaching how it does work. Readers tend to blur what works and what doesn’t together. Still, is a really neat weird function and it’d be a shame to let it go completely unmentioned.

Today’s A To Z term was nominated by APMA, author of the Everybody Makes DATA blog. It was a topic that delighted me to realize I could explain. Then it started to torment me as I realized there is a lot to explain here, and I had to pick something. So here’s where things ended up.

Taylor Series.

In the mid-2000s I was teaching at a department being closed down. In its last semester I had to teach Computational Quantum Mechanics. The person who’d normally taught it had transferred to another department. But a few last majors wanted the old department’s version of the course, and this pressed me into the role. Teaching a course you don’t really know is a rush. It’s a semester of learning, and trying to think deeply enough that you can convey something to students. This while all the regular demands of the semester eat your time and working energy. And this in the leap of faith that the syllabus you made up, before you truly knew the subject, will be nearly enough right. And that you have not committed to teaching something you do not understand.

So around mid-course I realized I needed to explain finding the wave function for a hydrogen atom with two electrons. The wave function is this probability distribution. You use it to find things like the probability a particle is in a certain area, or has a certain momentum. Things like that. A proton with one electron is as much as I’d ever done, as a physics major. We treat the proton as the center of the universe, immobile, and the electron hovers around that somewhere. Two electrons, though? A thing repelling your electron, and repelled by your electron, and neither of those having fixed positions? What the mathematics of that must look like terrified me. When I couldn’t procrastinate it farther I accepted my doom and read exactly what it was I should do.

It turned out I had known what I needed for nearly twenty years already. Got it in high school.

Of course I’m discussing Taylor Series. The equations were loaded down with symbols, yes. But at its core, the important stuff, was this old and trusted friend.

The premise behind a Taylor Series is even older than that. It’s universal. If you want to do something complicated, try doing the simplest thing that looks at all like it. And then make that a little bit more like you want. And then a bit more. Keep making these little improvements until you’ve got it as right as you truly need. Put that vaguely, the idea describes Taylor series just as well as it describes making a video game or painting a state portrait. We can make it more specific, though.

A series, in this context, means the sum of a sequence of things. This can be finitely many things. It can be infinitely many things. If the sum makes sense, we say the series converges. If the sum doesn’t, we say the series diverges. When we first learn about series, the sequences are all numbers. , for example, which diverges. (It adds to a number bigger than any finite number.) Or , which converges. (It adds to .)

In a Taylor Series, the terms are all polynomials. They’re simple polynomials. Let me call the independent variable ‘x’. Sometimes it’s ‘z’, for the reasons you would expect. (‘x’ usually implies we’re looking at real-valued functions. ‘z’ usually implies we’re looking at complex-valued functions. ‘t’ implies it’s a real-valued function with an independent variable that represents time.) Each of these terms is simple. Each term is the distance between x and a reference point, raised to a whole power, and multiplied by some coefficient. The reference point is the same for every term. What makes this potent is that we use, potentially, many terms. Infinitely many terms, if need be.

Call the reference point ‘a’. Or if you prefer, x_{0}. z_{0} if you want to work with z’s. You see the pattern. This ‘a’ is the “point of expansion”. The coefficients of each term depend on the original function at the point of expansion. The coefficient for the term that has is the first derivative of f, evaluated at a. The coefficient for the term that has is the second derivative of f, evaluated at a (times a number that’s the same for the squared-term for every Taylor Series). The coefficient for the term that has is the third derivative of f, evaluated at a (times a different number that’s the same for the cubed-term for every Taylor Series).

You’ll never guess what the coefficient for the term with is. Nor will you ever care. The only reason you would wish to is to answer an exam question. The instructor will, in that case, have a function that’s either the sine or the cosine of x. The point of expansion will be 0, , , or .

Otherwise you will trust that this is one of the terms of , ‘n’ representing some counting number too great to be interesting. All the interesting work will be done with the Taylor series either truncated to a couple terms, or continued on to infinitely many.

What a Taylor series offers is the chance to approximate a function we’re genuinely interested in with a polynomial. This is worth doing, usually, because polynomials are easier to work with. They have nice analytic properties. We can automate taking their derivatives and integrals. We can set a computer to calculate their value at some point, if we need that. We might have no idea how to start calculating the logarithm of 1.3. We certainly have an idea how to start calculating . (Yes, it’s 0.3. I’m using a Taylor series with a = 1 as the point of expansion.)

The first couple terms tell us interesting things. Especially if we’re looking at a function that represents something physical. The first two terms tell us where an equilibrium might be. The next term tells us whether an equilibrium is stable or not. If it is stable, it tells us how perturbations, points near the equilibrium, behave.

The first couple terms will describe a line, or a quadratic, or a cubic, some simple function like that. Usually adding more terms will make this Taylor series approximation a better fit to the original. There might be a larger region where the polynomial and the original function are close enough. Or the difference between the polynomial and the original function will be closer together on the same old region.

We would really like that region to eventually grow to the whole domain of the original function. We can’t count on that, though. Roughly, the interval of convergence will stretch from ‘a’ to wherever the first weird thing happens. Weird things are, like, discontinuities. Vertical asymptotes. Anything you don’t like dealing with in the original function, the Taylor series will refuse to deal with. Outside that interval, the Taylor series diverges and we just can’t use it for anything meaningful. Which is almost supernaturally weird of them. The Taylor series uses information about the original function, but it’s all derivatives at a single point. Somehow the derivatives of, say, the logarithm of x around x = 1 give a hint that the logarithm of 0 is undefinable. And so they won’t help us calculate the logarithm of 3.

Things can be weirder. There are functions that just break Taylor series altogether. Some are obvious. A function needs lots of derivatives at a point to have a good Taylor series approximation. So, many fractal curves won’t have a Taylor series approximation. These curves are all corners, points where they aren’t continuous or where derivatives don’t exist. Some are obviously designed to break Taylor series approximations. We can make a function that follows different rules if x is rational than if x is irrational. There’s no approximating that, and you’d blame the person who made such a function, not the Taylor series. It can be subtle. The function defined by the rule , with the note that if x is zero then f(x) is 0, seems to satisfy everything we’d look for. It’s a function that’s mostly near 1, that drops down to being near zero around where x = 0. But its Taylor series expansion around a = 0 is a horizontal line always at 0. The interval of convergence can be a single point, challenging our idea of what an interval is.

That’s all right. If we can trust that we’re avoiding weird parts, Taylor series give us an outstanding new tool. Grant that the Taylor series describes a function with the same rule as our original function. The Taylor series is often easier to work with, especially if we’re working on differential equations. We can automate, or at least find formulas for, taking the derivative of a polynomial. Or adding together derivatives of polynomials. Often we can attack a differential equation too hard to solve otherwise by supposing the answer is a polynomial. This is essentially what that quantum mechanics problem used, and why the tool was so familiar when I was in a strange land.

Roughly. What I was actually doing was treating the function I wanted as a power series. This is, like the Taylor series, the sum of a sequence of terms, all of which are times some coefficient. What makes it not a Taylor series is that the coefficients weren’t the derivatives of any function I knew to start. But the experience of Taylor series trained me to look at functions as things which could be approximated by polynomials.

This gives us the hint to look at other series that approximate interesting functions. We get a host of these, with names like Laurent series and Fourier series and Chebyshev series and such. Laurent series look like Taylor series but we allow powers to be negative integers as well as positive ones. Fourier series do away with polynomials. They instead use trigonometric functions, sines and cosines. Chebyshev series build on polynomials, but not on pure powers. They’ll use orthogonal polynomials. These behave like perpendicular directions do. That orthogonality makes many numerical techniques behave better.

The Taylor series is a great introduction to these tools. Its first several terms have good physical interpretations. Its calculation requires tools we learn early on in calculus. The habits of thought it teaches guides us even in unfamiliar territory.

And I feel very relieved to be done with this. I often have a few false starts to an essay, but those are mostly before I commit words to text editor. This one had about four branches that now sit in my scrap file. I’m glad to have a deadline forcing me to just publish already.

I keep saying in picking A-to-Z topics that just because I don’t take a suggestion now doesn’t mean I won’t in the future. 2018’s A-to-Z I notice includes Mr Wu’s suggestion of “torus”. I didn’t take it then, but did get to it in this year’s little project. I’m glad to have the proof my word is good. I have thought sometime I might fill a gap in my inspiration by taking topics I hadn’t used in A-to-Z’s (I’ve kept lists) and doing them. I’d just need a catchy name for the set of essays.

Here is a surprising thought for the next time you consider remodeling the kitchen. It’s common to tile the floor. Perhaps some of the walls behind the counter. What patterns could you use? And there are infinitely many possibilities. You might leap ahead of me and say, yes, but they’re all boring. A tile that’s eight inches square is different from one that’s twelve inches square and different from one that’s 12.01 inches square. Fine. Let’s allow that all square tiles are “really” the same pattern. The only difference between a square two feet on a side and a square half an inch on a side is how much grout you have to deal with. There are still infinitely many possibilities.

You might still suspect me of being boring. Sure, there’s a rectangular tile that’s, say, six inches by eight inches. And one that’s six inches by nine inches. Six inches by ten inches. Six inches by one millimeter. Yes, I’m technically right. But I’m not interested in that. Let’s allow that all rectangular tiles are “really” the same pattern. So we have “squares” and “rectangles”. There are still infinitely many tile possibilities.

Let me shorten the discussion here. Draw a quadrilateral. One that doesn’t intersect itself. That is, there’s four corners, four lines, and there’s no X crossings. If you have that, then you have a tiling. Get enough of these tiles and arrange them correctly and you can cover the plane. Or the kitchen floor, if you have a level floor. It might not be obvious how to do it. You might have to rotate alternating tiles, or set them in what seem like weird offsets. But you can do it. You’ll need someone to make the tiles for you, if you pick some weird pattern. I hope I live long enough to see it become part of the dubious kitchen package on junk home-renovation shows.

Let me broaden the discussion here. What do I mean by a tiling if I’m allowing any four-sided figure to be a tile? We start with a surface. Usually the plane, a flat surface stretching out infinitely far in two dimensions. The kitchen floor, or any other mere mortal surface, approximates this. But the floor stops at some point. That’s all right. The ideas we develop for the plane work all right for the kitchen. There’s some weird effects for the tiles that get too near the edges of the room. We don’t need to worry about them here. The tiles are some collection of open sets. No two tiles overlap. The tiles, plus their boundaries, cover the whole plane. That is, every point on the plane is either inside exactly one of the open sets, or it’s on the boundary between one (or more) sets.

There isn’t a requirement that all these sets have the same shape. We usually do, and will limit our tiles to one or two shapes endlessly repeated. It seems to appeal to our aesthetics and our installation budget. Using a single pattern allows us to cover the plane with triangles. Any triangle will do. Similarly any quadrilateral will do. For convex pentagonal tiles — here things get weird. There are fourteen known families of pentagons that tile the plane. Each member of the family looks about the same, but there’s some room for variation in the sides. Plus there’s one more special case that can tile the plane, but only that one shape, with no variation allowed. We don’t know if there’s a sixteenth pattern. But then until 2015 we didn’t know there was a 15th, and that was the first pattern found in thirty years. Might be an opening for someone with a good eye for doodling.

There are also exciting opportunities in convex hexagons. Anyone who plays strategy games knows a regular hexagon will tile the plane. (Regular hexagonal tilings fit a certain kind of strategy game well. Particularly they imply an equal distance between the centers of any adjacent tiles. Square and triangular tiles don’t guarantee that. This can imply better balance for territory-based games.) Irregular hexagons will, too. There are three known families of irregular hexagons that tile the plane. You can treat the regular hexagon as a special case of any of these three families. No one knows if there’s a fourth family. Ready your notepad at the next overlong, agenda-less meeting.

There aren’t tilings for identical convex heptagons, figures with seven sides. Nor eight, nor nine, nor any higher figure. You can cover them if you have non-convex figures. See any Tetris game where you keep getting the ‘s’ or ‘t’ shapes. And you can cover them if you use several shapes.

There’s some guidance if you want to create your own periodic tilings. I see it called the Conway Criterion. I don’t know the field well enough to say whether that is a common term. It could be something one mathematics popularizer thought of and that other popularizers imitated. (I don’t find “Conway Criterion” on the Mathworld glossary, but that isn’t definitive.) Suppose your polygon satisfies a couple of rules about the shapes of the edges. The rules are given in that link earlier this paragraph. If your shape does, then it’ll be able to tile the plane. If you don’t satisfy the rules, don’t despair! It might yet. The Conway Criterion tells you when some shape will tile the plane. It won’t tell you that something won’t.

(The name “Conway” may nag at you as familiar from somewhere. This criterion is named for John H Conway, who’s famous for a bunch of work in knot theory, group theory, and coding theory. And in popular mathematics for the “Game of Life”. This is a set of rules on a grid of numbers. The rules say how to calculate a new grid, based on this first one. Iterating them, creating grid after grid, can make patterns that seem far too complicated to be implicit in the simple rules. Conway also developed an algorithm to calculate the day of the week, in the Gregorian calendar. It is difficult to explain to the non-calendar fan how great this sort of thing is.)

This has all gotten to periodic tilings. That is, these patterns might be complicated. But if need be, we could get them printed on a nice square tile and cover the floor with that. Almost as beautiful and much easier to install. Are there tilings that aren’t periodic? Aperiodic tilings?

Well, sure. Easily. Take a bunch of tiles with a right angle, and two 45-degree angles. Put any two together and you have a square. So you’re “really” tiling squares that happen to be made up of a pair of triangles. Each pair, toss a coin to decide whether you put the diagonal as a forward or backward slash. Done. That’s not a periodic tiling. Not unless you had a weird run of luck on your coin tosses.

All right, but is that just a technicality? We could have easily installed this periodically and we just added some chaos to make it “not work”. Can we use a finite number of different kinds of tiles, and have it be aperiodic however much we try to make it periodic? And through about 1966 mathematicians would have mostly guessed that no, you couldn’t. If you had a set of tiles that would cover the plane aperiodically, there was also some way to do it periodically.

And then in 1966 came a surprising result. No, not Penrose tiles. I know you want me there. I’ll get there. Not there yet though. In 1966 Robert Berger — who also attended Rensselaer Polytechnic Institute, thank you — discovered such a tiling. It’s aperiodic, and it can’t be made periodic. Why do we know Penrose Tiles rather than Berger Tiles? Couple reasons, including that Berger has to use 20,426 distinct tile shapes. In 1971 Raphael M Robinson simplified matters a bit and got that down to six shapes. Roger Penrose in 1974 squeezed the set down to two, although by adding some rules about what edges may and may not touch one another. (You can turn this into a pure edges thing by putting notches into the shapes.) That really caught the public imagination. It’s got simplicity and accessibility to combine with beauty. Aperiodic tiles seem to relate to “quasicrystals”, which are what the name suggests and do happen in some materials. And they’ve got beauty. Aperiodic tiling embraces our need to have not too much order in our order.

I’ve discussed, in all this, tiling the plane. It’s an easy surface to think about and a popular one. But we can form tiling questions about other shapes. Cylinders, spheres, and toruses seem like they should have good tiling questions available. And we can imagine “tiling” stuff in more dimensions too. If we can fill a volume with cubes, or rectangles, it’s natural to wonder what other shapes we can fill it with. My impression is that fewer definite answers are known about the tiling of three- and four- and higher-dimensional space. Possibly because it’s harder to sketch out ideas and test them. Possibly because the spaces are that much stranger. I would be glad to hear more.

In 2017 I reverted to just one A-to-Z per year. And I got banner art for the first time. It’s a small bit of polish that raised my apparent professionalism a whole order of magnitude. And for the letter T, I did something no pop mathematics blog had ever done before. I wrote about topology without starting from stretchy rubber doughnuts and coffee cups. Let me prove that to you now.

Today’s glossary entry comes from Elke Stangl, author of the Elkemental Force blog. I’ll do my best, although it would have made my essay a bit easier if I’d had the chance to do another topic first. We’ll get there.

Topology.

Start with a universe. Nice thing to have around. Call it ‘M’. I’ll get to why that name.

I’ve talked a fair bit about weird mathematical objects that need some bundle of traits to be interesting. So this will change the pace some. Here, I request only that the universe have a concept of “sets”. OK, that carries a little baggage along with it. We have to have intersections and unions. Those come about from having pairs of sets. The intersection of two sets is all the things that are in both sets simultaneously. The union of two sets is all the things that are in one set, or the other, or both simultaneously. But it’s hard to think of something that could have sets that couldn’t have intersections and unions.

So from your universe ‘M’ create a new collection of things. Call it ‘T’. I’ll get to why that name. But if you’ve formed a guess about why, then you know. So I suppose I don’t need to say why, now. ‘T’ is a collection of subsets of ‘M’. Now let’s suppose these four things are true.

First. ‘M’ is one of the sets in ‘T’.

Second. The empty set ∅ (which has nothing at all in it) is one of the sets in ‘T’.

Third. Whenever two sets are in ‘T’, their intersection is also in ‘T’.

Fourth. Whenever two (or more) sets are in ‘T’, their union is also in ‘T’.

Got all that? I imagine a lot of shrugging and head-nodding out there. So let’s take that. Your universe ‘M’ and your collection of sets ‘T’ are a topology. And that’s that.

Yeah, that’s never that. Let me put in some more text. Suppose we have a universe that consists of two symbols, say, ‘a’ and ‘b’. There’s four distinct topologies you can make of that. Take the universe plus the collection of sets {∅}, {a}, {b}, and {a, b}. That’s a topology. Try it out. That’s the first collection you would probably think of.

Here’s another collection. Take this two-thing universe and the collection of sets {∅}, {a}, and {a, b}. That’s another topology and you might want to double-check that. Or there’s this one: the universe and the collection of sets {∅}, {b}, and {a, b}. Last one: the universe and the collection of sets {∅} and {a, b} and nothing else. That one barely looks legitimate, but it is. Not a topology: the universe and the collection of sets {∅}, {a}, and {b}.

The number of toplogies grows surprisingly with the number of things in the universe. Like, if we had three symbols, ‘a’, ‘b’, and ‘c’, there would be 29 possible topologies. The universe of the three symbols and the collection of sets {∅}, {a}, {b, c}, and {a, b, c}, for example, would be a topology. But the universe and the collection of sets {∅}, {a}, {b}, {c}, and {a, b, c} would not. It’s a good thing to ponder if you need something to occupy your mind while awake in bed.

With four symbols, there’s 355 possibilities. Good luck working those all out before you fall asleep. Five symbols have 6,942 possibilities. You realize this doesn’t look like any expected sequence. After ‘4’ the count of topologies isn’t anything obvious like “two to the number of symbols” or “the number of symbols factorial” or something.

Are you getting ready to call me on being inconsistent? In the past I’ve talked about topology as studying what we can know about geometry without involving the idea of distance. How’s that got anything to do with this fiddling about with sets and intersections and stuff?

So now we come to that name ‘M’, and what it’s finally mnemonic for. I have to touch on something Elke Stangl hoped I’d write about, but a letter someone else bid on first. That would be a manifold. I come from an applied-mathematics background so I’m not sure I ever got a proper introduction to manifolds. They appeared one day in the background of some talk about physics problems. I think they were introduced as “it’s a space that works like normal space”, and that was it. We were supposed to pretend we had always known about them. (I’m translating. What we were actually told would be that it “works like R^{3}”. That’s how mathematicians say “like normal space”.) That was all we needed.

Properly, a manifold is … eh. It’s something that works kind of like normal space. That is, it’s a set, something that can be a universe. And it has to be something we can define “open sets” on. The open sets for the manifold follow the rules I gave for a topology above. You can make a collection of these open sets. And the empty set has to be in that collection. So does the whole universe. The intersection of two open sets in that collection is itself in that collection. The union of open sets in that collection is in that collection. If all that’s true, then we have a manifold.

And now the piece that makes every pop mathematics article about topology talk about doughnuts and coffee cups. It’s possible that two topologies might be homeomorphic to each other. “Homeomorphic” is a term of art. But you understand it if you remember that “morph” means shape, and suspect that “homeo” is probably close to “homogenous”. Two things being homeomorphic means you can match their parts up. In the matching there’s nothing left over in the first thing or the second. And the relations between the parts of the first thing are the same as the relations between the parts of the second thing.

So. Imagine the snippet of the number line for the numbers larger than -π and smaller than π. Think of all the open sets you can use to cover that. It will have a set like “the numbers bigger than 0 and less than 1”. A set like “the numbers bigger than -π and smaller than 2.1”. A set like “the numbers bigger than 0.01 and smaller than 0.011”. And so on.

Now imagine the points that exist on a circle, if you’ve omitted one point. Let’s say it’s the unit circle, centered on the origin, and that what we’re leaving out is the point that’s exactly to the left of the origin. The open sets for this are the arcs that cover some part of this punctured circle. There’s the arc that corresponds to the angles from 0 to 1 radian measure. There’s the arc that corresponds to the angles from -π to 2.1 radians. There’s the arc that corresponds to the angles from 0.01 to 0.011 radians. You see where this is going. You see why I say we can match those sets on the number line to the arcs of this punctured circle. There’s some details to fill in here. But you probably believe me this could be done if I had to.

There’s two (or three) great branches of topology. One is called “algebraic topology”. It’s the one that makes for fun pop mathematics articles about imaginary rubber sheets. It’s called “algebraic” because this field makes it natural to study the holes in a sheet. And those holes tend to form groups and rings, basic pieces of Not That Algebra. The field (I’m told) can be interpreted as looking at functors on groups and rings. This makes for some neat tying-together of subjects this A To Z round.

The other branch is called “differential topology”, which is a great field to study because it sounds like what Mister Spock is thinking about. It inspires awestruck looks where saying you study, like, Bayesian probability gets blank stares. Differential topology is about differentiable functions on manifolds. This gets deep into mathematical physics.

As you study mathematical physics, you stop worrying about ever solving specific physics problems. Specific problems are petty stuff. What you like is solving whole classes of problems. A steady trick for this is to try to find some properties that are true about the problem regardless of what exactly it’s doing at the time. This amounts to finding a manifold that relates to the problem. Consider a central-force problem, for example, with planets orbiting a sun. A planet can’t move just anywhere. It can only be in places and moving in directions that give the system the same total energy that it had to start. And the same linear momentum. And the same angular momentum. We can match these constraints to manifolds. Whatever the planet does, it does it without ever leaving these manifolds. To know the shapes of these manifolds — how they are connected — and what kinds of functions are defined on them tells us something of how the planets move.

The maybe-third branch is “low-dimensional topology”. This is what differential topology is for two- or three- or four-dimensional spaces. You know, shapes we can imagine with ease in the real world. Maybe imagine with some effort, for four dimensions. This kind of branches out of differential topology because having so few dimensions to work in makes a lot of problems harder. We need specialized theoretical tools that only work for these cases. Is that enough to count as a separate branch? It depends what topologists you want to pick a fight with. (I don’t want a fight with any of them. I’m over here in numerical mathematics when I’m not merely blogging. I’m happy to provide space for anyone wishing to defend her branch of topology.)

But each grows out of this quite general, quite abstract idea, also known as “point-set topology”, that’s all about sets and collections of sets. There is much that we can learn from thinking about how to collect the things that are possible.

It’s difficult to remember but there was a time I didn’t just post three A-to-Z essays in a week, but I did two such sequences in a year. It’s hard to imagine having that much energy now. The End 2016 A-to-Z got that name, rather than “End Of 2016”, because — hard as this may be to believe now — 2016 seemed like a particularly brutal year that we could not wait to finish. Unfortunately it turned out to be one of those years that will get pop-histories with subtitles like “Twelve Months That Changed The World” or “The Crisis Of Our Times”. Still, this piece shows off some of what I think characteristic of my writing: an interest in the legends that accrue around mathematical fields, and my reasons to be skeptical of the legends.

Graph theory arises whenever we have a bunch of things that can be connected. We call the things “vertices”, because that’s a good corner-type word. The connections we call “edges”, because that’s a good connection-type word. It’s easy to create graphs that look like the edges of a crystal, especially if you draw edges as straight as much as possible. You don’t have to. You can draw them curved. Then they look like the scary tangles of wire around your wireless router complex.

Graph theory really got organized in the 19th century, and went crazy in the 20th. It turns out there’s lots of things that connect to other things. Networks, whether computers or social or thematically linked concepts. Anything that has to be delivered from one place to another. All the interesting chemicals. Anything that could be put in a pipe or taken on a road has some graph theory thing applicable to it.

A lot of graph theory ponders loops. The original problem was about how to use every bridge, every edge, exactly one time. Look at a tangled mass of a graph and it’s hard not to start looking for loops. They’re often interesting. It’s not easy to tell if there’s a loop that lets you get to every vertex exactly once.

What if there aren’t loops? What if there aren’t any vertices you can step away from and get back to by another route? Well, then you have a tree.

A tree’s a graph where all the vertices are connected so that there aren’t any closed loops. We normally draw them with straight lines, the better to look like actual trees. We then stop trying to make them look like actual trees by doing stuff like drawing them as a long horizontal spine with a couple branches sticking off above and below, or as * type stars, or H shapes. They still correspond to real-world things. If you’re not sure how consider the layout of one of those long, single-corridor hallways as in a hotel or dormitory. The rooms connect to one another as a tree once again, as long as no room opens to anything but its own closet or bathroom or the central hallway.

We can talk about the radius of a graph. That’s how many edges away any point can be from the center of the tree. And every tree has a center. Or two centers. If it has two centers they share an edge between the two. And that’s one of the quietly amazing things about trees to me. However complicated and messy the tree might be, we can find its center. How many things allow us that?

A tree might have some special vertex. That’s called the ‘root’. It’s what the vertices and the connections represent that make a root; it’s not something inherent in the way trees look. We pick one for some special reason and then we highlight it. Maybe put it at the bottom of the drawing, making ‘root’ for once a sensible name for a mathematics thing. Often we put it at the top of the drawing, because I guess we’re just being difficult. Well, we do that because we were modelling stuff where a thing’s properties depend on what it comes from. And that puts us into thoughts of inheritance and of family trees. And weird as it is to put the root of a tree at the top, it’s also weird to put the eldest ancestors at the bottom of a family tree. People do it, but in those illuminated drawings that make a literal tree out of things. You don’t see it in family trees used for actual work, like filling up a couple pages at the start of a king or a queen’s biography.

Trees give us neat new questions to ponder, like, how many are there? I mean, if you have a certain number of vertices then how many ways are there to arrange them? One or two or three vertices all have just the one way to arrange them. Four vertices can be hooked up a whole two ways. Five vertices offer a whole three different ways to connect them. Six vertices offer six ways to connect and now we’re finally getting something interesting. There’s eleven ways to connect seven vertices, and 23 ways to connect eight vertices. The number keeps on rising, but it doesn’t follow the obvious patterns for growth of this sort of thing.

And if that’s not enough to idly ponder then think of destroying trees. Draw a tree, any shape you like. Pick one of the vertices. Imagine you obliterate that. How many separate pieces has the tree been broken into? It might be as few as two. It might be as many as the number of remaining vertices. If graph theory took away the pastime of wandering around Köningsburg’s bridges, it has given us this pastime we can create anytime we have pen, paper, and a long meeting.

The second time I did one of these A-to-Z’s, I hit on the idea of asking people for suggestions. It was a good move as it opened up subjects I had not come close to considering. I didn’t think to include the instructions for making your own transcendental number, though. You never get craft projects in mathematics, not after you get past the stage of making construction-paper rhombuses or something. I am glad to see my schtick of including a warning about using this stuff at your thesis defense was established by then.

Take a huge bag and stuff all the real numbers into it. Give the bag a good solid shaking. Stir up all the numbers until they’re thoroughly mixed. Reach in and grab just the one. There you go: you’ve got a transcendental number. Enjoy!

OK, I detect some grumbling out there. The first is that you tried doing this in your head because you somehow don’t have a bag large enough to hold all the real numbers. And you imagined pulling out some number like “2” or “37” or maybe “one-half”. And you may not be exactly sure what a transcendental number is. But you’re confident the strangest number you extracted, “minus 8”, isn’t it. And you’re right. None of those are transcendental numbers.

I regret saying this, but that’s your own fault. You’re lousy at picking random numbers from your head. So am I. We all are. Don’t believe me? Think of a positive whole number. I predict you probably picked something between 1 and 10. Almost surely something between 1 and 100. Surely something less than 10,000. You didn’t even consider picking something between 10,012,002,214,473,325,937,775 and 10,012,002,214,473,325,937,785. Challenged to pick a number, people will select nice and familiar ones. The nice familiar numbers happen not to be transcendental.

I detect some secondary grumbling there. Somebody picked π. And someone else picked e. Very good. Those are transcendental numbers. They’re also nice familiar numbers, at least to people who like mathematics a lot. So they attract attention.

Still haven’t said what they are. What they are traces back, of course, to polynomials. Take a polynomial that’s got one variable, which we call ‘x’ because we don’t want to be difficult. Suppose that all the coefficients of the polynomial, the constant numbers we presumably know or could find out, are integers. What are the roots of the polynomial? That is, for what values of x is the polynomial a complicated way of writing ‘zero’?

For example, try the polynomial x^{2} – 6x + 5. If x = 1, then that polynomial is equal to zero. If x = 5, the polynomial’s equal to zero. Or how about the polynomial x^{2} + 4x + 4? That’s equal to zero if x is equal to -2. So a polynomial with integer coefficients can certainly have positive and negative integers as roots.

How about the polynomial 2x – 3? Yes, that is so a polynomial. This is almost easy. That’s equal to zero if x = 3/2. How about the polynomial (2x – 3)(4x + 5)(6x – 7)? It’s my polynomial and I want to write it so it’s easy to find the roots. That polynomial will be zero if x = 3/2, or if x = -5/4, or if x = 7/6. So a polynomial with integer coefficients can have positive and negative rational numbers as roots.

How about the polynomial x^{2} – 2? That’s equal to zero if x is the square root of 2, about 1.414. It’s also equal to zero if x is minus the square root of 2, about -1.414. And the square root of 2 is irrational. So we can certainly have irrational numbers as roots.

So if we can have whole numbers, and rational numbers, and irrational numbers as roots, how can there be anything else? Yes, complex numbers, I see you raising your hand there. We’re not talking about complex numbers just now. Only real numbers.

It isn’t hard to work out why we can get any whole number, positive or negative, from a polynomial with integer coefficients. Or why we can get any rational number. The irrationals, though … it turns out we can only get some of them this way. We can get square roots and cube roots and fourth roots and all that. We can get combinations of those. But we can’t get everything. There are irrational numbers that are there but that even polynomials can’t reach.

It’s all right to be surprised. It’s a surprising result. Maybe even unsettling. Transcendental numbers have something peculiar about them. The 19th Century French mathematician Joseph Liouville first proved the things must exist, in 1844. (He used continued fractions to show there must be such things.) It would be seven years later that he gave an example of one in nice, easy-to-understand decimals. This is the number 0.110 001 000 000 000 000 000 001 000 000 (et cetera). This number is zero almost everywhere. But there’s a 1 in the n-th digit past the decimal if n is the factorial of some number. That is, 1! is 1, so the 1st digit past the decimal is a 1. 2! is 2, so the 2nd digit past the decimal is a 1. 3! is 6, so the 6th digit past the decimal is a 1. 4! is 24, so the 24th digit past the decimal is a 1. The next 1 will appear in spot number 5!, which is 120. After that, 6! is 720 so we wait for the 720th digit to be 1 again.

And what is this Liouville number 0.110 001 000 000 000 000 000 001 000 000 (et cetera) used for, besides showing that a transcendental number exists? Not a thing. It’s of no other interest. And this plagued the transcendental numbers until 1873. The only examples anyone had of transcendental numbers were ones built to show that they existed. In 1873 Charles Hermite showed finally that e, the base of the natural logarithm, was transcendental. e is a much more interesting number; we have reasons to care about it. Every exponential growth or decay or oscillating process has e lurking in it somewhere. In 1882 Ferdinand von Lindemann showed that π was transcendental, and that’s an even more interesting number.

That bit about π has interesting implications. One goes back to the ancient Greeks. Is it possible, using straightedge and compass, to create a square that’s exactly the same size as a given circle? This is equivalent to saying, if I give you a line segment, can you create another line segment that’s exactly the square root of π times as long? This geometric problem is equivalent to an algebraic one. That problem: can you create a polynomial, with integer coefficients, that has the square root of π as a root? (WARNING: I’m skipping some important points for the sake of clarity. DO NOT attempt to use this to pass your thesis defense without putting those points back in.) We want the square root of π because … well, what’s the area of a square whose sides are the square root of π long? That’s right. So we start with a line segment that’s equal to the radius of the circle and we can do that, surely. Once we have the radius, can’t we make a line that’s the square root of π times the radius, and from that make a square with area exactly π times the radius squared? Since π is transcendental, then, no. We can’t. Sorry. One of the great problems of ancient mathematics, and one that still has the power to attract the casual mathematician, got its final answer in 1882.

Georg Cantor is a name even non-mathematicians might recognize. He showed there have to be some infinite sets bigger than others, and that there must be more real numbers than there are rational numbers. Four years after showing that, he proved there are as many transcendental numbers as there are real numbers.

They’re everywhere. They permeate the real numbers so much that we can understand the real numbers as the transcendental numbers plus some dust. They’re almost the dark matter of mathematics. We don’t actually know all that many of them. Wolfram MathWorld has a table listing numbers proven to be transcendental, and the fact we can list that on a single web page is remarkable. Some of them are large sets of numbers, yes, like for every positive whole number d. And we can infer many more from them; if π is transcendental then so is 2π, and so is 5π, and so is -20.38π, and so on. But the table of numbers proven to be irrational is still just 25 rows long.

There are even mysteries about obvious numbers. π is transcendental. So is e. We know that at least one of π times e and π plus e is transcendental. Perhaps both are. We don’t know which one is, or if both are. We don’t know whether π^{π} is transcendental. We don’t know whether e^{e} is, either. Don’t even ask if π^{e} is.

How, by the way, does this fit with my claim that everything in mathematics is polynomials? — Well, we found these numbers in the first place by looking at polynomials. The set is defined, even to this day, by how a particular kind of polynomial can’t reach them. Thinking about a particular kind of polynomial makes visible this interesting set.

Of course I can’t just take a break for the sake of having a break. I feel like I have to do something of interest. So why not make better use of my several thousand past entries and repost one? I’d just reblog it except WordPress’s system for that is kind of rubbish. So here’s what I wrote, when I was first doing A-to-Z’s, back in summer of 2015. Somehow I was able to post three of these a week. I don’t know how.

I had remembered this essay as mostly describing the boring part of tensors, that we usually represent them as grids of numbers and then symbols with subscripts and superscripts. I’m glad to rediscover that I got at why we do such things to numbers and subscripts and superscripts.

Tensor.

The true but unenlightening answer first: a tensor is a regular, rectangular grid of numbers. The most common kind is a two-dimensional grid, so that it looks like a matrix, or like the times tables. It might be square, with as many rows as columns, or it might be rectangular.

It can also be one-dimensional, looking like a row or a column of numbers. Or it could be three-dimensional, rows and columns and whole levels of numbers. We don’t try to visualize that. It can be what we call zero-dimensional, in which case it just looks like a solitary number. It might be four- or more-dimensional, although I confess I’ve never heard of anyone who actually writes out such a thing. It’s just so hard to visualize.

You can add and subtract tensors if they’re of compatible sizes. You can also do something like multiplication. And this does mean that tensors of compatible sizes will form a ring. Of course, that doesn’t say why they’re interesting.

Tensors are useful because they can describe spatial relationships efficiently. The word comes from the same Latin root as “tension”, a hint about how we can imagine it. A common use of tensors is in describing the stress in an object. Applying stress in different directions to an object often produces different effects. The classic example there is a newspaper. Rip it in one direction and you get a smooth, clean tear. Rip it perpendicularly and you get a raggedy mess. The stress tensor represents this: it gives some idea of how a force put on the paper will create a tear.

Tensors show up a lot in physics, and so in mathematical physics. Technically they show up everywhere, since vectors and even plain old numbers (scalars, in the lingo) are kinds of tensors, but that’s not what I mean. Tensors can describe efficiently things whose magnitude and direction changes based on where something is and where it’s looking. So they are a great tool to use if one wants to represent stress, or how well magnetic fields pass through objects, or how electrical fields are distorted by the objects they move in. And they describe space, as well: general relativity is built on tensors. The mathematics of a tensor allow one to describe how space is shaped, based on how to measure the distance between two points in space.

My own mathematical education happened to be pretty tensor-light. I never happened to have courses that forced me to get good with them, and I confess to feeling intimidated when a mathematical argument gets deep into tensor mathematics. Joseph C Kolecki, with NASA’s Glenn (Lewis) Research Center, published in 2002 a nice little booklet “An Introduction to Tensors for Students of Physics and Engineering”. This I think nicely bridges some of the gap between mathematical structures like vectors and matrices, that mathematics and physics majors know well, and the kinds of tensors that get called tensors and that can be intimidating.

Mathematics is like every field in having jargon. Some jargon is unique to the field; there is no lay meaning of a “homeomorphism”. Some jargon is words plucked from the common language, such as “smooth”. The common meaning may guide you to what mathematicians want in it. A smooth function has a graph with no gaps, no discontinuities, no sharp corners; you can see smoothness in it. Sometimes the common meaning is an ambiguous help. A “series” is the sum of a sequence of numbers, that is, it is one number. Mathematicians study the series, but by looking at properties of the sequence.

So what sort of jargon is “atlas”? In common English, an atlas is a book of maps. Each map represents something different. Perhaps a different region of space. Perhaps a different scale, or a different projection altogether. The maps may show different features, or show them at different times. The maps must be about the same sort of thing. No slipping a map of Narnia in with the map of an amusement park, unless you warn of that in the title. The maps must not contradict one another. (So far as human-made things can be consistent, anyway.) And that’s the important stuff.

Atlas is the first kind of common-word jargon. Mathematicians use it to mean a collection of things. Those collected things aren’t mathematical maps. “Map” is the second type of jargon. The collected things are coordinate charts. “Coordinate chart” is a pairing of words not likely to appear in common English. But if you did encounter them? The meaning you might guess from their common use is not far off their mathematical use.

A coordinate chart is a matching of the points in an open set to normal coordinates. Euclidean coordinates, to be precise. But, you know, latitude and longitude, if it’s two dimensional. Add in the altitude if it’s three dimensions. Your x-y-z coordinates. It still counts if this is one dimension, or four dimensions, or sixteen dimensions. You’re less likely to draw a sketch of those. (In practice, you draw a sketch of a three-dimensional blob, and put N = 16 off in the corner, maybe in a box.)

These coordinate charts are on a manifold. That’s the second type of common-language jargon. Manifold, to pick the least bad of its manifold common definitions, is a “complicated object or subject”. The mathematical manifold is a surface. The things on that surface are connected by relationships that could be complicated. But the shape can be as simple as a plane or a sphere or a torus.

Every point on a coordinate chart needs some unique set of coordinates. And if a point appears on two coordinate charts, they have to be consistent. Consistent here is the matching between charts being a homeomorphism. A homeomorphism is a map, in the jargon sense. So it’s a function matching open sets on one chart to ope sets in the other chart. There’s more to it (there always is). But the important thing is that, away from the edges of the chart, we don’t create any new gaps or punctures or missing sections.

Some manifolds are easy to spot. The surface of the Earth, for example. Many are easy to come up with charts for. Think of any map of the Earth. Each point on the surface of the Earth matches some point on the sheet of paper. The coordinate chart is … let’s say how far your point is from the upper left corner of the page. (Pretend that you can measure those points precisely enough to match them to, like, the town you’re in.) Could be how far you are from the center, or the lower right corner, or whatever. These are all as good, and even count as other coordinate charts.

It’s easy to imagine that as latitude and longitude. We see maps of the world arranged by latitude and longitude so often. And that’s fine; latitude and longitude makes a good chart. But we have a problem in giving coordinates to the north and south pole. The latitude is easy but the longitude? So we have two points that can’t be covered on the map. We can save our atlas by having a couple charts. For the Earth this can be a map of most of the world arranged by latitude and longitude, and then two insets showing a disc around the north and the south poles. Thus we have an atlas of three charts.

We can make this a little tighter, reducing this to two charts. Have one that’s your normal sort of wall map, centered on the equator. Have the other be a transverse Mercator map. Make its center the great circle going through the prime meridian and the 180-degree antimeridian. Then every point on the planet, including the poles, has a neat unambiguous coordinate in at least one chart. A good chunk of the world will be on both charts. We can throw in more charts if we like, but two is enough.

The requirements to be an atlas aren’t hard to meet. So a lot of geometric structures end up being atlases. Theodore Frankel’s wonderful The Geometry of Physics introduces them on page 15. But that’s also the last appearance of “atlas”, at least in the index. The idea gets upstaged. The manifolds that the atlas charts end up being more interesting. Many problems about things in motion are easy to describe as paths traced out on manifolds. A large chunk of mathematical physics is then looking at this problem and figuring out what the space of possible behaviors looks like. What its topology is.

In a sense, the mathematical physicist might survey a problem, like a scout exploring new territory, more than solve it. This exploration brings us to directional derivatives. To tangent bundles. To other terms, jargon only partially informed by the common meanings.

All right, I can be a little more clear. By the inverse I mean subtraction is the name the name we give to adding the additive inverse of something. It’s what lets addition be a group action. That is, we write to mean we find whatever number, added to b, gives us 0. Then we add that to a. We do this pretty often, so it’s convenient to have a name for it. The word “subtraction” appears in English from about 1400. It grew from the Latin for “taking away”. By about 1425 the word has its mathematical meaning. I imagine this wasn’t too radical a linguistic evolution.

All right, so some other thoughts. What’s so interesting about subtraction that it’s worth a name? We don’t have a particular word for reversing, say, a permutation. But don’t go very far in school not thinking about inverting an addition. Must come down to subtraction’s practical use in finding differences between things. Often in figuring out change. Debts at least. Nobody needs the inverse of a permutation unless they’re putting a deck of cards back in order.

Subtraction has other roles, though. Not so much in mathematics, but in teaching us how to learn about mathematics. For example, subtraction gives us a good reason to notice zero. Zero, the additive identity, is implicit to addition. But if you’re learning addition, and you think of it as “put these two piles of things together into one larger pile”? What good does an empty pile do you there? It’s easy to not notice there’s a concept there. But subtraction, taking stuff away from a pile? You can imagine taking everything away, and wanting a word for that. This isn’t the only way to notice zero is worth some attention. It’s a good way, though.

There’s more, though. Learning subtraction teaches us limits of what we can do, mathematically. We can add 3 to 7 or, if it’s more convenient, 7 to 3. But we learn from the start that while we can subtract 3 from 7, there’s no subtracting 7 from 3. This is true when we’re learning arithmetic and numbers are all positive. Some time later we ask, what happens if we go ahead and do this anyway? And figure out a number that makes sense as the answer to “what do you get subtracting 7 from 3”? This introduces us to the negative numbers. It’s a richer idea of what it is to have numbers. We can start to see addition and subtraction as expressions of the same operation.

But we also notice they’re not quite the same. As mentioned, addition can be done in any order. If I need to do 7 + 4 + 3 + 6 I can decide I’d rather do 4 + 6 + 7 + 3 and make that 10 + 10 before getting to 20. This all simplifies my calculating. If I need to do 7 – 4 – 3 – 6 I get into a lot of trouble if I simplify my work by writing 4 – 6 – 7 – 3 instead. Even if I decide I’d rather take the 3 – 6 and turn that into a negative 3 first, I’ve made a mess of things.

The first property this teaches us to notice we call “commutativity”. Most mathematical operations don’t have that. But a lot of the ones we find useful do. The second property this points out is “associativity”, which more of the operations we find useful have. It’s not essential that someone learning how to calculate know this is a way to categorize mathematics operations. (I’ve read that before the New Math educational reforms of the 1960s, American elementary school mathematics textbooks never mentioned commutativity or associativity.) But I suspect it is essential that someone learning mathematics learn the things you can do come in families.

So let me mention division, the inverse of multiplication. (And that my chosen theme won’t let me get to in sequence.) Like subtraction, division refuses to be commutative or associative. Subtraction prompts us to treat the negative numbers as something useful. In parallel, division prompts us to accept fractions as numbers. (We accepted fractions as numbers long before we accepted negative numbers, mind. Anyone with a pie and three friends has an interest in “one-quarter” that they may not have with “negative four”.) When we start learning about numbers raised to powers, or exponentials, we have questions ready to ask. How do the operations behave? Do they encourage us to find other kinds of number?

And we also think of how to patch up subtraction’s problems. If we want subtraction to be a kind of addition, we have to get precise about what that little subtraction sign means. What we’ve settled on is that is shorthand for , where is the additive inverse of .

Once we do that all subtraction’s problems with commutativity and associativity go away. 7 – 4 – 3 – 6 becomes 7 + (-4) + (-3) + (-6), and that we can shuffle around however convenient. Say, to 7 + (-3) + (-4) + (-6), then to 7 + (-3) + (-10), then to 4 + (-10), and so -6. Thus do we domesticate a useful, wild operation like subtraction.

Any individual subtraction has one right answer. There are many ways to get there, though. I had learned, for example, to do a problem such as 738 minus 451 by subtracting one column of numbers at a time. Right to left, so, subtracting 8 minus 1, and then 3 minus 5, and after the borrowing then 6 minus 4. I remember several elementary school textbooks explaining borrowing as unwrapping rolls of dimes. It was a model well-suited to me.

We don’t need to, though. We can go from the left to the right, doing 7 minus 4 first and 8 minus 1 last. We can go through and figure out all the possible carries before doing any work. There’s a slick method called partial differences which skips all the carrying. But it demands writing out several more intermediate terms. This uses more paper, but if there isn’t a paper shortage, so what?

There are more ways to calculate. If we turn things over to a computer, we’re likely to do subtraction using a complements technique. When I say computer you likely think electronic computer, or did right up to the adjective there. But mechanical computers were a thing too. Blaise Pascal’s computing device of the 1650s used nines’ complements to subtract on the gears that did addition. Explaining the trick would take me farther afield than I want to go now. But, you know how, like, 6 plus 3 is 9? So you can turn a subtraction of 6 into an addition of 3. Or a subtraction of 3 into an addition of 6. Plus some bookkeeping.

A digital computer is likely to use ones’ complements, representing every number as a string of 0’s and 1’s. This has great speed advantages. The complement of 0 is 1 and vice-versa, and it’s very quick for a computer to swap between 0 and 1. Subtraction by complements is different and, to my eye, takes more steps. But they might be steps you do better.

One more thought subtraction gives us, though. In a previous paragraph I wrote out 7 – 4, and also wrote 7 + (-4). We use the symbol – for two things. Do those two uses of – mean the same thing? You may think I’m being fussy here. After all, the value of -4 is the same as the value of 0 – 4. And even a fussy mathematician says whichever of “minus four” and “negative four” better fits the meter of the sentence. But our friends in the philosophy department would agree this is a fair question. Are we collapsing two related ideas together by using the same symbol for them?

My inclination is to say that the – of -4 is different from the – in 0 – 4, though. The – in -4 is a unary operation: it means “give me the inverse of the number on the right”. The – in 0 – 4 is a binary operation: it means “subtract the number on the right from the number on the left”. So I would say these are different things sharing a symbol. Unfortunately our friends in the philosophy department can’t answer the question for us. The university laid them off four years ago, part of society’s realignment away from questions like “how can we recognize when a thing is true?” and towards “how can we teach proto-laborers to use Excel macros?”. We have to use subtraction to expand our thinking on our own.

Jacob Siehler, a friend from Mathstodon, and Assistant Professor at Gustavus Adolphus College, offered several good topics for the letter ‘C’. I picked the one that seemed to connect to the greatest number of other topics I’ve covered recently.

Convex

It’s easy to say what convex is, if we’re talking about shapes in ordinary space. A convex shape is one where the line connecting any two points inside the shape always stays inside the shape. Circles are convex. Triangles and rectangles too. Star shapes are not. Is a torus? That depends. If it’s a doughnut shape sitting in some bigger space, then it’s not convex. If the doughnut shape is all the space there is to consider, then it is. There’s a parallel here to prime numbers. Whether 5 is a prime depends on whether you think 5 is an integer, a real number, or a complex number.

Still, this seems easy to the point of boring. So how does Wolfram Mathworld match 337 items for ‘convex’? For a sense of scale, it has only 112 matches for ‘quadrilateral’. This is a word used almost as much as ‘quadratic’, with 370 items. Why?

Why is that it’s one of those terms that sneaks in everywhere. Some of it is obvious. There’s a concept called “star-convex”, where two points only need a connection by some path. It doesn’t have to be a straight line. That’s a familiar mathematical trick, coming up with a less-demanding version of a property. There’s the “convex hull”, which is the smallest convex set that contains a given set of points. We even come up with “convex functions”, functions of real numbers. A function’s convex if, the space above the graph of a function is convex. This seems like stretching the idea of convexity rather a bit.

Still, we wouldn’t coin such a term if we couldn’t use it. Well, if someone couldn’t use it. The saving thing here is the idea of “space”. We get it from our idea of what space is from looking around rooms and walking around hills and stuff. But what makes something a space? When we look at what’s essential? What we need is traits like, there are things. We can measure how far apart things are. We have some idea of paths between things. That’s not asking a lot.

So many things become spaces. And so convexity sneaks in everywhere. A convex function has nice properties if you’re looking for minimums. Or maximums; that’s as easy to do. And we look for minimums a lot. A large, practical set of mathematics is the search for optimum values, the set of values that maximize, or minimize, something. You may protest that not everything we’re intersted in is a convex function. This is true. But a lot of what we are interested in is, or is approximately.

This gets into some surprising corners. Economics, for example. The mathematics of economics is often interested in how much of a thing you can make. But you have to put things in to make it. You expect, at least once the system is set up, that if you halve the components you put in you get half the thing out. Or double the components in and get double the thing out. But you can run out of the components. Or related stuff, like, floor space to store partly-complete product. Or transport available to send this stuff to the customer. Or time to get things finished. For our needs these are all “things you can run out of”.

And so we have a problem of linear programming. We have something or other we want to optimize. Call it . It depends on a whole range of variables, which we describe as a vector . And we have constraints. Each of these is an inequality; we can represent that as demanding some functions of these variables be at most some numbers. We can bundle those functions together as a matrix called . We can bundle those maximum numbers together as a vector called . So the problem is finding . Also, we demand that none of these values be smaller than some minimum we might as well call 0. The range of all the possible values of these variables is a space. These constraints chop up that space, into a shape. Into a convex shape, of course, or this paragraph wouldn’t belong in this essay. If you need to be convinced of this, imagine taking a wedge of cheese and hacking away slices all the way through it. How do you cut a cave or a tunnel in it?

So take this convex shape, called a polytope. That’s what we call a polygon or polyhedron if we don’t want to commit to any particular number of dimensions of space. (If we’re being careful. My suspicion is ‘polyhedron’ is more often said.) This makes a shape. Some point in that shape has the best possible value of . (Also the worst, if that’s your thing.) Where is it? There is an answer, and it gives a pretext to share a fun story. The answer is that it’s on the outside, on one of the faces of the polytope. And you can find it following along the edges of those polytopes. This we know as the simplex method, or Dantzig’s Simplex Method if we must be more particular, for George Dantzig. Its success relies on looking at convex functions in convex spaces and how much this simplifies finding things.

Usually. The simplex method is one of polynomial-order complexity for normal, typical problems. That’s a measure of how much longer it takes to find an answer as you get more variables, more constraints, more work. Polynomial is okay, growing about the way it takes longer to multiply when you have more digits in the numbers. But there’s a worst case, in which the complexity grows exponentially. We shy away from exponential-complexity because … you know, exponentials grow fast, given a chance. What saves us is that that’s a worst case, not a typical case. The convexity lets us set up our problem and, rather often, solve it well enough.

Now the story, a mutation of which it’s likely you encountered. George Dantzig, as a student in Jerzy Neyman’s statistics class, arrived late one day to find a couple problems on the board. He took these to be homework, and struggled with the harder-than-usual set. But turned them in, apologizing for them being late. Neyman accepted the work, and eventually got around to looking at it. This wasn’t the homework. This was some unsolved problems in statistics. Six weeks later Neyman had prepared them for publication. A year later, Neyman explained to Dantzig that all he needed to earn his PhD was put these two papers together in a nice binder.

This cute story somehow escaped into the wild. It became an inspirational tale for more than mathematics grad students. That part’s easy to see; it has most everything inspiration needs. It mutated further, into the movie Good Will Hunting. I do not know that the unsolved problems, work done in the late 1930s, related to Dantzig’s simplex method, proved after World War II. It may be that they are simply connected in their originator. But perhaps it is more than I realize now.

I’m approaching the end of this year’s little Mathematics A-to-Z. The project’s been smaller, as I’d hoped, although I’m not sure I managed to make it any less hard on myself. Still, I’m glad to be doing it and glad to have the suggestions of you kind readers for topics. This quartet should wrap up the year, and the project.

So please let me know of any topics you’d like to see me try taking on. The topic should be anything mathematics-related, although I tend to take a broad view of mathematics-related. (I’m also open to biographical sketches.) To suggest something, please, say so in a comment. If you do, please also let me know about any projects you have — blogs, YouTube channels, real-world projects — that I should mention at the top of that essay.

I am happy to revisit a subject I think I have more to write about, so don’t be shy about suggesting those. Past essays for these letters include:

I owe Iva Sallay thanks for the suggestion of today’s topic. Sallay is a longtime friend of my blog here. And runs the Find the Factors recreational mathematics puzzle site. If you haven’t been following, or haven’t visited before, this is a fun week to step in again. The puzzles this week include (American) Thanksgiving-themed pictures.

Inverse.

When we visit the museum made of a visual artist’s studio we often admire the tools. The surviving pencils and crayons, pens, brushes and such. We don’t often notice the eraser, the correction tape, the unused white-out, or the pages cut into scraps to cover up errors. To do something is to want to undo it. This is as true for the mathematics of a circle as it is for the drawing of one.

If not to undo something, we do often want to know where something comes from. A classic paper asks can one hear the shape of a drum? You hear a sound. Can you say what made that sound? Fine, dismiss the drum shape as idle curiosity. The same question applies to any sensory data. If our hand feels cooler here, where is the insulation of the building damaged? If we have this electrocardiogram reading, what can we say about the action of the heart producing that? If we see the banks of a river, what can we know about how the river floods?

And this is the point, and purpose, of inverses. We can understand them as finding the causes of what we observe.

The first inverse we meet is usually the inverse function. It’s introduced as a way to undo what a function does. That’s an odd introduction, if you’re comfortable with what a function is. A function is a mathematical construct. It’s two sets — a domain and a range — and a rule that links elements in the domain to the range. To “undo” a function is like “undoing” a rectangle. But a function has a compelling “physical” interpretation. It’s routine to introduce functions as machines that take some numbers in and give numbers out. We think of them as ways to transform the domain into the range. In functional analysis get to thinking of domains as the most perfect putty. We expect functions to stretch and rotate and compress and slide along as though they were drawing a Betty Boop cartoon.

So we’re trained to speak of a function as a verb, acting on pieces of the domain. An element or point, or a region, or the whole domain. We think the function “maps”, or “takes”, or “transforms” this into its image in the range. And if we can turn one thing into another, surely we can turn it back.

Some things it’s obvious we can turn back. Suppose our function adds 2 to whatever we give it. We can get the original back by subtracting 2. If the function subtracts 32 and divides by 1.8, we can reverse it by multiplying by 1.8 and adding 32. If the function takes the reciprocal, we can take the reciprocal again. We have a bit of a problem if we started out taking the reciprocal of 0, but who would want to do such a thing anyway? If the function squares a number, we can undo that by taking the square root. Unless we started from a negative number. Then we have trouble.

The trouble is not every function has an inverse. Which we could have realized by thinking how to undo “multiply by zero”. To be a well-defined function, the rule part has to match elements in the domain to exactly one element in the range. This makes the function, in the impenetrable jargon of the mathematician, a “one-to-one function”. Or you can describe it with the more intuitive label of “bijective”.

But there’s no reason more than one thing in the domain can’t match to the same thing in the range. If I know the cosine of my angle is , my angle might be 30 degrees. Or -30 degrees. Or 390 degrees. Or 330 degrees. You may protest there’s no difference between a 30 degree and a 390 degree angle. I agree those angles point in the same direction. But a gear rotated 390 degrees has done something that a gear rotated 30 degrees hasn’t. If all I know is where the dot I’ve put on the gear is, how can I know how much it’s rotated?

So what we do is shift from the actual cosine into one branch of the cosine. By restricting the domain we can create a function that has the same rule as the one we want, but that’s also one-to-one and so has an inverse. What restriction to use? That depends on what you want. But mathematicians have some that come up so often they might as well be defaults. So the square root is the inverse of the square of nonnegative numbers. The inverse Cosine is the inverse of the cosine of angles from 0 to 180 degrees. The inverse Sine is the inverse of the sine of angles from -90 to 90 degrees. The capital letters are convention to say we’re doing this. If we want a different range, we write out that we’re looking for an inverse cosine from -180 to 0 degrees or whatever. (Yes, the mathematician will default to using radians, rather than degrees, for angles. That’s a different essay.) It’s an imperfect solution, but it often works well enough.

The trouble we had with cosines, and functions, continues through all inverses. There are almost always alternate causes. Many shapes of drums sound alike. Take two metal bars. Heat both with a blowtorch, one on the end and one in the center. Not to the point of melting, only to the point of being too hot to touch. Let them cool in insulated boxes for a couple weeks. There’ll be no measurement you can do on the remaining heat that tells you which one was heated on the end and which the center. That’s not because your thermometers are no good or the flow of heat is not deterministic or anything. It’s that both starting cases settle to the same end. So here there is no usable inverse.

This is not to call inverses futile. We can look for what we expect to find useful. We are inclined to find inverses of the cosine between 0 and 180 degrees, even though 4140 through 4320 degrees is as legitimate. We may not know what is wrong with a heart, but have some idea what a heart could do and still beat. And there’s a famous example in 19th-century astronomy. After the discovery of Uranus came the discovery it did not move right. For a while it moved across the sky too fast for its distance from the sun. Then it started moving too slow. The obvious supposition was that there was another, not-yet-seen, planet, affecting its orbit.

The trouble is finding it. Calculating the orbit from what data they had required solving equations with 13 unknown quantities. John Couch Adams and Urbain Le Verrier attempted this anyway, making suppositions about what they could not measure. They made great suppositions. Le Verrier made the better calculations, and persuaded an astronomer (Johann Gottfried Galle, assisted by Heinrich Louis d’Arrest) to go look. Took about an hour of looking. They also made lucky suppositions. Both, for example, supposed the trans-Uranian planet would obey “Bode’s Law”, a seeming pattern in the size of planetary radiuses. The actual Neptune does not. It was near enough in the sky to where the calculated planet would be, though. The world is vaster than our imaginations.

That there are many ways to draw Betty Boop does not mean there’s nothing to learn about how this drawing was done. And so we keep having inverses as a vibrant field of mathematics.

And I have another topic suggested by John Golden, author of Math Hombre. It’s one of the basic bits of mathematics, and so is hard to think about.

Triangle.

Edward Brisse assembled a list of 2,001 things to call a “center” of a triangle. I’d have run out around three. We don’t need most of them. I mention them because the list speaks of how interesting we find triangles. Nobody’s got two thousand thoughts about enneadecagons (19-sided figures).

As always with mathematics it’s hard to say whether triangles are all that interesting or whether we humans are obsessed. They’ve got great publicity. The Pythagorean Theorem may be the only bit of interesting mathematics an average person can be assumed to recognize. The kinds of triangles — acute, obtuse, right, equilateral, isosceles, scalene — are fit questions for trivia games. An ordinary mathematics education can end in trigonometry. This ends up being about circles, but we learn it through triangles. The art and science of determining where a thing is we call “triangulation”.

But triangles do seem to stand out. They’re the simplest polygon, only three vertices and three edges. So we can slice any other polygon into triangles. Any triangle can tile the plane. Even quadrilaterals may need reflections of themselves. One of the first geometry facts we learn is the interior angles of a triangle add up to two right angles. And one of the first geometry facts we learn, discovering there are non-Euclidean geometries, is that they don’t have to.

Triangles have to be convex, that is, they don’t have any divots. This property sounds boring. But it’s a good boring; it makes other work easier. It tells us that the length of any two sides of a triangle add together to something longer than the third side. And that’s a powerful idea.

There are many ways to define “distance”. Mathematicians have tried to find the most abstract version of the concept. This inequality is one of the few pieces that every definition of “distance” must respect. This idea of distance leaps out of shapes drawn on paper. Last week I mentioned a triangle inequality, in discussing functions and . We can define operators that describe a distance between functions. And the distances between trios of functions behave like the distances between points on the triangle. Thus does geometry sneak in to abstract concepts like “piecewise continuous functions”.

And they serve in curious blends of the abstract and the concrete. For example, numerical solutions to partial differential equations. A partial differential equation is one where we want to know a function of two or more variables, and only have information about how the function changes as those variables change. These turn up all the time in any study of things in bulk. Heat flowing through space. Waves passing through fluids. Fluids running through channels. So any classical physics problem that isn’t, like, balls bouncing against each other or planets orbiting stars. We can solve these if they’re linear. Linear here is a term of art meaning “easy”. I kid; “linear” means more like “manageable”. All the good problems are nonlinear and we can exactly solve about two of them.

So, numerical solutions. We make approximations by putting down a mesh on the differential equation’s domain. And then, using several graduate-level courses’ worth of tricks, approximating the equation we want with one that we can solve here. That mesh, though? … It can be many things. One powerful technique is “finite elements”. An element is a small piece of space. Guess what the default shape for these elements are. There are times, and reasons, to use other shapes as elements. You learn those once you have the hang of triangles. (Dividing the space of your variables up into elements lets you look for an approximate solution using tools easier to manage than you’d have without. This is a bit like looking for one’s keys over where the light is better. But we can find something that’s as close as we need to our keys.)

If we need finite elements for, oh, three dimensions of space, or four, then triangles fail us. We can’t fill a volume with two-dimensional shapes like triangles. But the triangle has its analog. The tetrahedron, in some sense four triangles joined together, has all the virtues of the triangle for three dimensions. We can look for a similar shape in four and five and more dimensions. If we’re looking for the thing most like an equilateral triangle, we’re looking for a “simplex”.

These simplexes, or these elements, sprawl out across the domain we want to solve problems for. They look uncannily like the triangles surveyors draw across the chart of a territory, as they show us where things are.

Analysis is about proving why the rest of mathematics works. It’s a hard field. My experience, a typical one, included crashing against real analysis as an undergraduate and again as a graduate student. It turns out mathematics works by throwing a lot of symbols around.

Let me give an example. If you read pop mathematics blogs you know about the number represented by . You’ve seen proofs, some of them even convincing, that this number equals 1. Not a tiny bit less than 1, but exactly 1. Here’s a real-analysis treatment. And — I may regret this — I recommend you don’t read it. Not closely, at least. Instead, look at its shape. Look at the words and symbols as graphic design elements, and trust that what I say is not nonsense. Resume reading after the horizontal rule.

It’s convenient to have a name for the number . I’ll call that , for “repeating”. 1 we’ll call 1. I think you’ll grant that whatever r is, it can’t be more than 1. I hope you’ll accept that if the difference between 1 and r is zero, then r equals 1. So what is the difference between 1 and r?

Give me some number . It has to be a positive number. The implication in the letter is that it’s a small number. This isn’t actually required in general. We expect it. We feel surprise and offense if it’s ever not the case.

I can show that the difference between 1 and r is less than . I know there is some smallest counting number N so that . For example, say is 0.125. Then we can let N = 1, and . Or suppose is 0.00625. But then if N = 3, . (If is bigger than 1, let N = 1.) Now we have to ask why I want this N.

Whatever the value of r is, I know that it is more than 0.9. And that it is more than 0.99. And that it is more than 0.999. In fact, it’s more than the number you get by truncating r after any whole number N of digits. Let me call the number you get by truncating r after N digits. So, and and and so on.

Since , it has to be true that . And since we know what is, we can say exactly what is. It's . And we picked N so that . So . But all we know of is that it's a positive number. It can be any positive number. So has to be smaller than each and every positive number. The biggest number that’s smaller than every positive number is zero. So the difference between 1 and r must be zero and so they must be equal.

That is a compelling argument. Granted, it compels much the way your older brother kneeling on your chest and pressing your head into the ground compels. But this argument gives the flavor of what much of analysis is like.

For one, it is fussy, leaning to technical. You see why the subject has the reputation of driving off all but the most intent mathematics majors. If you get comfortable with this sort of argument it’s hard to notice anymore.

For another, the argument shows that the difference between two things is less than every positive number. Therefore the difference is zero and so the things are equal. This is one of mathematics’ most important tricks. And another point, there’s a lot of talk about . And about finding differences that are, it usually turns out, smaller than some . (As an undergraduate I found something wasteful in how the differences were so often so much less than . We can’t exhaust the small numbers, though. It still feels uneconomic.)

Something this misses is another trick, though. That’s adding zero. I couldn’t think of a good way to use that here. What we often get is the need to show that, say, function and function are equal. That is, that they are less than apart. What we can often do is show that is close to some related function, which let me call .

I know what you’re suspecting: must be a polynomial. Good thought! Although in my experience, it’s actually more likely to be a piecewise constant function. That is, it’s some number, eg, “2”, for part of the domain, and then “2.5” in some other region, with no transition between them. Some other values, even values not starting with “2”, in other parts of the domain. Usually this is easier to prove stuff about than even polynomials are.

But get back to . It’s got the same deal as , some approximation easier to prove stuff about. Then we want to show that is close to some . And then show that is close to . So — watch this trick. Or, again, watch the shape of this trick. Read again after the horizontal rule.

The difference is equal to since adding zero, that is, adding the number , can’t change a quantity. And is equal to . Same reason: is zero. So:

Now we use the “triangle inequality”. If a, b, and c are the lengths of a triangle’s sides, the sum of any two of those numbers is larger than the third. And that tells us:

And then if you can show that is less than ? And that is also ? And you see where this is going for ? Then you’ve shown that . With luck, each of these little pieces is something you can prove.

Don’t worry about what all this means. It’s meant to give a flavor of what you do in an analysis course. It looks hard, but most of that is because it’s a different sort of work than you’d done before. If you hadn’t seen the adding-zero and triangle-inequality tricks? I don’t know how long you’d need to imagine them.

There are other tricks too. An old reliable one is showing that one thing is bounded by the other. That is, that . You use this trick all the time because if you can also show that , then those two have to be equal.

The good thing — and there is good — is that once you get the hang of these tricks analysis starts to come together. And even get easier. The first course you take as a mathematics major is real analysis, all about functions of real numbers. The next course in this track is complex analysis, about functions of complex-valued numbers. And it is easy. Compared to what comes before, yes. But also on its own. Every theorem in complex analysis named after Augustin-Louis Cauchy. They all show that the integral of your function, calculated along a closed loop, is zero. I exaggerate by .

In grad school, if you make it, you get to functional analysis, which examines functions on functions and other abstractions like that. This, too, is easy, possibly because all the basic approaches you’ve seen several courses over. Or it feels easy after all that mucking around with the real numbers.

This is not the entirety of explaining how mathematics works. Since all these proofs depend on how numbers work, we need to show how numbers work. How logic works. But those are subjects we can leave for grad school, for someone who’s survived this gauntlet.

This week’s topic is one of several suggested again by Mr Wu, blogger and Singaporean mathematics tutor. He’d suggested several topics, overlapping in their subject matter, and I was challenged to pick one.

Monte Carlo.

The reputation of mathematics has two aspects: difficulty and truth. Put “difficulty” to the side. “Truth” seems inarguable. We expect mathematics to produce sound, deductive arguments for everything. And that is an ideal. But we often want to know things we can’t do, or can’t do exactly. We can handle that often. If we can show that a number we want must be within some error range of a number we can calculate, we have a “numerical solution”. If we can show that a number we want must be within every error range of a number we can calculate, we have an “analytic solution”.

There are many things we’d like to calculate and can’t exactly. Many of them are integrals, which seem like they should be easy. We can represent any integral as finding the area, or volume, of a shape. The trick is that there’s only a few shapes with volumes we can find exact formulas for. You may remember the area of a triangle or a parallelogram. You have no idea what the area of a regular nonagon is. The trick we rely on is to approximate the shape we want with shapes we know formulas for. This usually gives us a numerical solution.

If you’re any bit devious you’ve had the impulse to think of a shape that can’t be broken up like that. There are such things, and a good swath of mathematics in the late 19th and early 20th centuries was arguments about how to handle them. I don’t mean to discuss them here. I’m more interested in the practical problems of breaking complicated shapes up into simpler ones and adding them all together.

One catch, an obvious one, is that if the shape is complicated you need a lot of simpler shapes added together to get a decent approximation. Less obvious is that you need way more shapes to do a three-dimensional volume well than you need for a two-dimensional area. That’s important because you need even way-er more to do a four-dimensional hypervolume. And more and more and more for a five-dimensional hypervolume. And so on.

That matters because many of the integrals we’d like to work out represent things like the energy of a large number of gas particles. Each of those particles carries six dimensions with it. Three dimensions describe its position and three dimensions describe its momentum. Worse, each particle has its own set of six dimensions. The position of particle 1 tells you nothing about the position of particle 2. So you end up needing ridiculously,impossibly many shapes to get even a rough approximation.

With no alternative, then, we try wisdom instead. We train ourselves to think of deductive reasoning as the only path to certainty. By the rules of deductive logic it is. But there are other unshakeable truths. One of them is randomness.

We can show — by deductive logic, so we trust the conclusion — that the purely random is predictable. Not in the way that lets us say how a ball will bounce off the floor. In the way that we can describe the shape of a great number of grains of sand dropped slowly on the floor.

The trick is one we might get if we were bad at darts. If we toss darts at a dartboard, badly, some will land on the board and some on the wall behind. How many hit the dartboard, compared to the total number we throw? If we’re as likely to hit every spot of the wall, then the fraction that hit the dartboard, times the area of the wall, should be about the area of the dartboard.

So we can do something equivalent to this dart-throwing to find the volumes of these complicated, hyper-dimensional shapes. It’s a kind of numerical integration. It isn’t particularly sensitive to how complicated the shape is, though. It takes more work to find the volume of a shape with more dimensions, yes. But it takes less more-work than the breaking-up-into-known-shapes method does. There are wide swaths of mathematics and mathematical physics where this is the best way to calculate the integral.

This bit that I’ve described is called “Monte Carlo integration”. The “integration” part of the name because that’s what we started out doing. To call it “Monte Carlo” implies either the method was first developed there or the person naming it was thinking of the famous casinos. The case is the latter. Monte Carlo methods as we know them come from Stanislaw Ulam, mathematical physicist working on atomic weapon design. While ill, he got to playing the game of Canfield solitaire, about which I know nothing except that Stanislaw Ulam was playing it in 1946 while ill. He wondered what the chance was that a given game was winnable. The most practical approach was sampling: set a computer to play a great many games and see what fractions of them were won. (The method comes from Ulam and John von Neumann. The name itself comes from their colleague Nicholas Metropolis.)

There are many Monte Carlo methods, with integration being only one very useful one. They hold in common that they’re build on randomness. We try calculations — often simple ones — many times over with many different possible values. And the regularity, the predictability, of randomness serves us. The results come together to an average that is close to the thing we do want to know.