While I’m not necessarily going to continue highlighting old A to Z essays every Friday and Saturday, it is a fact I’ve now got six pages listing all the topics for the six A to Z’s that I have completed. So let me share them here. This may be convenient for you, the reader, to see what kinds of things I’ve written up. It’s certainly convenient for me, since someday I’ll want all this stuff organized. The past A to Z sequences have been:
Summer 2015. Featuring ansatz, into, and well-posed problem.
Leap Day 2016. With continued fractions, polynomials, quaternions, and transcendental numbers.
End 2016. Featuring the Fredholm alternative, general covariance, normal numbers, and the Monster Group.
The most important thing I learned this time around was that I should have started a week or two earlier. Not that this should have been a Summer A to Z. It would be true for any season. It’s more that I started soliciting subjects for the first letters of the alphabet about two weeks ahead of publication. I didn’t miss a deadline this time around, and I didn’t hit the dread of starting an essay the day of publication. But the great thing about an A to Z sequence like this is being able to write well ahead of publication and I never got near that.
The Reading the Comics posts are already, necessarily, done close to publication. The only way to alter that is to make the Reading the Comics posts go even more than a week past the comics’ publication. Or lean on syndicated cartoonists to send me heads-ups. Anyway, if neither Reading the Comics nor A to Zs can give me breathing room, then what’s going wrong? So probably having topics picked as much as a month ahead of publication is the way I should go.
Picking topics is always the hardest part of writing things here. The A to Z gimmick makes it easy to get topics, though. The premise is both open and structured. I’m not sure I’d have as fruitful a response if I tossed out “explainer Fridays” or something and hoped people had ideas. A structured choice tends to be easier to make.
The biggest structural experiment this time around is that I put in two “recap” posts each week. These were little one- and two-paragraph things pointing to past A to Z essays. I’ve occasionally reblogged a piece, or done a post that points to old posts. Never systematically, though. Two recap posts a week seemed to work well enough. Some old stuff got new readers and nobody seemed upset. I even got those, at least, done comfortably ahead of deadline. When I finished a Thursday post I could feel like I was luxuriating in a long weekend, until I remembered the comics needed to be read.
Also, this now completes the sixth of my A to Z sequences. I’ve got enough that if I really wanted, I could drop to one new post a week, and do nothing but recaps the rest of the time. It would give me six months posting something every day. I have got nearly nine years’ worth of material here. Much of it is Reading the Comics posts, which date instantly. But the rest of the stuff in principle hasn’t aged, except in how my prose style has changed.
Another thing learned, and a bit of a surprise, was that I found a lot of fundamentals this time around. Things like “differential equations” or “Fourier series” or “Taylor series”. These are things that any mathematics major would know. These are even things that anyone a bit curious about mathematics might know. There is a place for very specific, technical terms. But some big-picture essays turn out to be comfortable too.
One of the things I wanted to write about and couldn’t was the Yang-Mills Equation. It would have taken too many words for me to write. If I’d used earlier essays as lemmas, to set up parts of this, I might have made it. In past A to Z sequences some essays built on one another. But by the time I was considering Y, the prerequisite letters had already been filled. This is an argument for soliciting for the whole alphabet from the start, rather than breaking it up into several requests for topics. But even then I’d have had to be planning Y, at a time when I know I’d be trying to think about D’s and E’s. I’m not sure that’s plausible. It does imply, as I started out thinking, that I need to work farther ahead of deadline anyway.
And I have made it to the end! As is traditional, I mean to write a few words about what I learned in doing all of this. Also as is traditional, I need to collapse after the work of thirteen weeks of two essays per week describing a small glossary of terms mostly suggested by kind readers. So while I wait to do that, let me gather in one bundle a list of all the essays from this project. If this seems to you like a lazy use of old content to fill a publication hole let me assure you: this will make my life so much easier next time I do an A-to-Z. I’ve learned that, at least, over the years.
Today’s A To Z term was nominated by Dina Yagodich, who runs a YouTube channel with a host of mathematics topics. Zeno’s Paradoxes exist in the intersection of mathematics and philosophy. Mathematics majors like to declare that they’re all easy. The Ancient Greeks didn’t understand infinite series or infinitesimals like we do. Now they’re no challenge at all. This reflects a belief that philosophers must be silly people who haven’t noticed that one can, say, exit a room.
This is your classic STEM-attitude of missing the point. We may suppose that Zeno of Elea occasionally exited rooms himself. That is a supposition, though. Zeno, like most philosophers who lived before Socrates, we know from other philosophers making fun of him a century after he died. Or at least trying to explain what they thought he was on about. Modern philosophers are expected to present others’ arguments as well and as strongly as possible. This even — especially — when describing an argument they want to say is the stupidest thing they ever heard. Or, to use the lingo, when they wish to refute it. Ancient philosophers had no such compulsion. They did not mind presenting someone else’s argument sketchily, if they supposed everyone already knew it. Or even badly, if they wanted to make the other philosopher sound ridiculous. Between that and the sparse nature of the record, we have to guess a bit about what Zeno precisely said and what he meant. This is all right. We have some idea of things that might reasonably have bothered Zeno.
And they have bothered philosophers for thousands of years. They are about change. The ones I mean to discuss here are particularly about motion. And there are things we do not understand about change. This essay will not answer what we don’t understand. But it will, I hope, show something about why that’s still an interesting thing to ponder.
When we capture a moment by photographing it we add lies to what we see. We impose a frame on its contents, discarding what is off-frame. We rip an instant out of its context. And that before considering how we stage photographs, making people smile and stop tilting their heads. We forgive many of these lies. The things excluded from or the moments around the one photographed might not alter what the photograph represents. Making everyone smile can convey the emotional average of the event in a way that no individual moment represents. Arranging people to stand in frame can convey the participation in the way a candid photograph would not.
But there remains the lie that a photograph is “a moment”. It is no such thing. We notice this when the photograph is blurred. It records all the light passing through the lens while the shutter is open. A photograph records an eighth of a second. A thirtieth of a second. A thousandth of a second. But still, some time. There is always the ghost of motion in a picture. If we do not see it, it is because our photograph’s resolution is too coarse. If we could photograph something with infinite fidelity we would see, even in still life, the wobbling of the molecules that make up a thing.
Which implies something fascinating to me. Think of a reel of film. Here I mean old-school pre-digital film, the thing that’s a great strip of pictures, a new one shown 24 times per second. Each frame of film is a photograph, recording some split-second of time. How much time is actually in a film, then? How long, cumulatively, was a camera shutter open during a two-hour film? I use pre-digital, strip-of-film movies for convenience. Digital films offer the same questions, but with different technical points. And I do not want the writing burden of describing both analog and digital film technologies. So I will stick to the long sequence of analog photographs model.
Let me imagine a movie. One of an ordinary everyday event; an actuality, to use the terminology of 1898. A person overtaking a walking tortoise. Look at the strip of film. There are many frames which show the person behind the tortoise. There are many frames showing the person ahead of the tortoise. When are the person and the tortoise at the same spot?
We have to put in some definitions. Fine; do that. Say we mean when the leading edge of the person’s nose overtakes the leading edge of the tortoise’s, as viewed from our camera. Or, since there must be blur, when the center of the blur of the person’s nose overtakes the center of the blur of the tortoise’s nose.
Do we have the frame when that moment happened? I’m sure we have frames from the moments before, and frames from the moments after. But the exact moment? Are you positive? If we zoomed in, would it actually show the person is a millimeter behind the tortoise? That the person is a hundredth of a millimeter ahead? A thousandth of a hair’s width behind? Suppose that our camera is very good. It can take frames representing as small a time as we need. Does it ever capture that precise moment? To the point that we know, no, it’s not the case that the tortoise is one-trillionth the width of a hydrogen atom ahead of the person?
If we can’t show the frame where this overtaking happened, then how do we know it happened? To put it in terms a STEM major will respect, how can we credit a thing we have not observed with happening? … Yes, we can suppose it happened if we suppose continuity in space and time. Then it follows from the intermediate value theorem. But then we are begging the question. We impose the assumption that there is a moment of overtaking. This does not prove that the moment exists.
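To make the hidden assumption concrete, here is the intermediate value theorem as it would apply to the overtaking. The symbols are mine, not anything Zeno needed: g(t) is the person's position minus the tortoise's at time t, and it is assumed to be continuous from a time t_0 when the person is behind to a time t_1 when the person is ahead.

```latex
\[
g(t) = x_{\text{person}}(t) - x_{\text{tortoise}}(t), \qquad
g \text{ continuous on } [t_0, t_1], \qquad
g(t_0) < 0 < g(t_1)
\;\Longrightarrow\;
\exists\, t^{*} \in (t_0, t_1) \text{ with } g(t^{*}) = 0 .
\]
```

The moment of overtaking, t^*, comes out of the continuity assumption. Drop that assumption and the theorem promises nothing.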
Fine, then. What if time is not continuous? If there is a smallest moment of time? … If there is, then, we can imagine a frame of film that photographs only that one moment. So let’s look at its footage.
One thing stands out. There’s finally no blur in the picture. There can’t be; there’s no time during which to move. We might not catch the moment that the person overtakes the tortoise. It could “happen” in-between moments. But at least we have a moment to observe at leisure.
So … what is the difference between a picture of the person overtaking the tortoise, and a picture of the person and the tortoise standing still? A movie of the two walking should be different from a movie of the two pretending to be department store mannequins. What, in this frame, is the difference? If there is no observable difference, how does the universe tell whether, next instant, these two should have moved or not?
A mathematical physicist may toss in an answer. Our photograph is only of positions. We should also track momentum. Momentum carries within it the information of how position changes over time. We can’t photograph momentum, not without getting blurs. But analytically? If we interpret a photograph as “really” tracking the positions of a bunch of particles? To the mathematical physicist, momentum is as good a variable as position, and it’s as measurable. We can imagine a hyperspace photograph that gives us an image of positions and momentums. So, STEM types show up the philosophers finally, right?
Hold on. Let’s allow that somehow we get changes in position from the momentum of something. Hold off worrying about how momentum gets into position. Where does a change in momentum come from? In the mathematical physics problems we can do, the change in momentum has a value that depends on position. In the mathematical physics problems we have to deal with, the change in momentum has a value that depends on position and momentum. But that value? Put it in words. That value is the change in momentum. It has the same relationship to acceleration that momentum has to velocity. For want of a real term, I’ll call it acceleration. We need more variables. An even more hyperspatial film camera.
… And does acceleration change? Where does that change come from? That is going to demand another variable, the change-in-acceleration. (The “jerk”, according to people who want to tell you that “jerk” is a commonly used term for the change-in-acceleration, and no one else.) And the change-in-change-in-acceleration. Change-in-change-in-change-in-acceleration. We have to invoke an infinite regression of new variables. We got here because we wanted to suppose it wasn’t possible to divide a span of time infinitely many times. This seems like a lot to build into the universe to distinguish a person walking past a tortoise from a person standing near a tortoise. And then we still must admit not knowing how one variable propagates into another. That a person is wide is not usually enough explanation of how they are growing taller.
Numerical integration can model this kind of system with time divided into discrete chunks. It teaches us some ways that this can make logical sense. It also shows us that our projections will (generally) be wrong. At least unless we do things like have an infinite number of steps of time factor into each projection of the next timestep. Or use the forecast of future timesteps to correct the current one. Maybe use both. These are … not impossible. But being “ … not impossible” is not to say satisfying. (We allow numerical integration to be wrong by quantifying just how wrong it is. We call this an “error”, and have techniques that we can use to keep the error within some tolerated margin.)
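Here is a minimal sketch of what dividing time into discrete chunks looks like, and of how wrong the projection gets. Everything in it is an assumption chosen for illustration: the system is a unit mass on a unit spring, chosen only because its exact position is known to be cos(t), and the method is the plain Euler method, the crudest of the lot. The function names are made up for this example.

```python
import math


def euler_position(steps, t_end=1.0):
    """Step a unit mass on a unit spring forward with the plain Euler method.

    The mass starts at position 1 with momentum 0, so the exact position
    at time t is cos(t). This toy system is an assumption for illustration.
    """
    dt = t_end / steps
    x, p = 1.0, 0.0
    for _ in range(steps):
        # Position changes according to momentum; momentum changes according
        # to position. Both updates use the values from the start of the step.
        x, p = x + p * dt, p - x * dt
    return x


def euler_error(steps, t_end=1.0):
    """Size of the gap between the Euler projection and the exact answer."""
    return abs(euler_position(steps, t_end) - math.cos(t_end))


step_counts = (10, 100, 1000)
errors = [euler_error(n) for n in step_counts]
for n, err in zip(step_counts, errors):
    print(f"{n:5d} steps: error {err:.6f}")
```

Every one of these projections is wrong. More, smaller timesteps make it less wrong, which is the sense in which the error can be kept within a tolerated margin.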
So where has the movement happened? The original scene had movement to it. The movie seems to represent that movement. But that movement doesn’t seem to be in any frame of the movie. Where did it come from?
We can have properties that appear in a mass which don’t appear in any component piece. No molecule of a substance has a color, but a big enough mass does. No atom of iron is ferromagnetic, but a chunk might be. No grain of sand is a heap, but enough of them are. The Ancient Greeks knew this; we call it the Sorites paradox, after Eubulides of Miletus. (“Sorites” means “heap”, as in heap of sand. But if you had to bluff through a conversation about ancient Greek philosophers you could probably get away with making up a quote you credit to Sorites.) Could movement be, in the term mathematical physicists use, an intensive property? But intensive properties are obvious to the outside observer of a thing. We are not outside observers to the universe. It’s not clear what it would mean for there to be an outside observer to the universe. Even if there were, what space and time are they observing in? And aren’t their space and their time and their observations vulnerable to the same questions? We’re in danger of insisting on an infinite regression of “universes” just so a person can walk past a tortoise in ours.
We can say where movement comes from when we watch a movie. It is a trick of perception. Our eyes take some time to understand a new image. Our brains insist on forming a continuous whole story even out of disjoint ideas. Our memory fools us into remembering a continuous line of action. That a movie moves is entirely an illusion.
You see the implication here. Surely Zeno was not trying to lead us to understand all motion, in the real world, as an illusion? … Zeno seems to have been trying to support the work of Parmenides of Elea. Parmenides is another pre-Socratic philosopher. So we have about four words that we’re fairly sure he authored, and we’re not positive what order to put them in. Parmenides was arguing about the nature of reality, and what it means for a thing to come into or pass out of existence. He seems to have been arguing something like that there was a true reality that’s necessary and timeless and changeless. And there’s an apparent reality, the thing our senses observe. And in our sensing, we add lies which make things like change seem to happen. (Do not use this to get through your PhD defense in philosophy. I’m not sure I’d use it to get through your Intro to Ancient Greek Philosophy quiz.) That what we perceive as movement is not what is “really” going on is, at least, imaginable. So it is worth asking questions about what we mean for something to move. What difference there is between our intuitive understanding of movement and what logic says should happen.
(I know someone wishes to throw down the word Quantum. Quantum mechanics is a powerful tool for describing how many things behave. It implies limits on what we can simultaneously know about the position and the time of a thing. But there is a difference between “what time is” and “what we can know about a thing’s coordinates in time”. Quantum mechanics speaks more to the latter. There are also people who would like to say Relativity. Relativity, special and general, implies we should look at space and time as a unified set. But this does not change our questions about continuity of time or space, or where to find movement in both.)
And this is why we are likely never to finish pondering Zeno’s Paradoxes. In this essay I’ve only discussed two of them: Achilles and the Tortoise, and The Arrow. There are two other particularly famous ones: the Dichotomy, and the Stadium. The Dichotomy is the one about how to get somewhere, you have to get halfway there. But to get halfway there, you have to get a quarter of the way there. And an eighth of the way there, and so on. The Stadium is the hardest of the four great paradoxes to explain. This is in part because the earliest writings we have about it don’t make clear what Zeno was trying to get at. I can think of something which seems consistent with what’s described, and contrary-to-intuition enough to be interesting. I’m satisfied to ponder that one. But other people may have different ideas of what the paradox should be.
There are a handful of other paradoxes which don’t get so much love, although one of them is another version of the Sorites Paradox. Some of them the Stanford Encyclopedia of Philosophy dubs “paradoxes of plurality”. These ask how many things there could be. It’s hard to judge just what he was getting at with this. We know that one argument had three parts, and only two of them survive. Trying to fill in that gap is a challenge. We want to fill in the argument we would make, projecting from our modern idea of this plurality. It’s not Zeno’s idea, though, and we can’t know how close our projection is.
I don’t have the space to make a thematically coherent essay describing these all, though. The set of paradoxes have demanded thought, even just to come up with a reason to think they don’t demand thought, for thousands of years. We will, perhaps, have to keep trying again to fully understand what it is we don’t understand.
Thank you, all who’ve been reading, and who’ve offered topics, comments on the material, or questions about things I was hoping readers wouldn’t notice I was shorting. I’ll probably do this again next year, after I’ve had some chance to rest.
Today’s A To Z term is … well, my second choice. Goldenoj suggested Yang-Mills and I was so interested. Yang-Mills describes a class of mathematical structures. They particularly offer insight into how to do quantum mechanics. Especially particle physics. It’s of great importance. But, on thinking out what I would have to explain I realized I couldn’t write a coherent essay about it. Getting to what the theory is made of would take explaining a bunch of complicated mathematical structures. If I’d scheduled the A-to-Z differently, setting up matters like Lie algebras, maybe I could do it, but this time around? No such help. And I don’t feel comfortable enough in my knowledge of Yang-Mills to describe it without describing its technical points.
That said I hope that Jacob Siehler, who suggested the Game of ‘Y’, does not feel slighted. I hadn’t known anything of the game going in to the essay-writing. When I started research I was delighted. I have yet to actually play a for-real game of this. But I like what I see, and what I think I can write about it.
Game of ‘Y’.
This is, as the name implies, a game. It has two players. They have the same objective: to create a ‘y’. Here, they do it by laying down tokens representing their side. They take turns, each laying down one token in a turn. They do this on a shape with three edges. The ‘y’ is created when there’s a continuous path of their tokens that reaches all three edges. Yes, it counts to have just a single line running along one edge of the board. This makes a pretty sorry ‘y’ but it suggests your opponent isn’t trying.
There are details of implementation. The board is a mesh of, mostly, hexagons. I take this to be for the same reason that so many conquest-type strategy games use hexagons. They tile space well, they give every space a good number of neighbors, and the distance from the center of one space to the center of any neighbor is constant. In a square grid, the centers of diagonal neighbors are farther apart than the centers of left-right or up-down neighbors. Hexagons do well for this kind of game, where the goal is to fill space, or at least fill paths in space. There’s even a game named Hex, slightly older than Y, with similar rules. In that the goal is to draw a continuous path from one end of the rectangular grid to another. The commercial boards that I see are around nine hexagons on a side. This probably reflects a desire to have a big enough board that games go on a while, but not so big that they go on forever.
Mathematicians have things to say about this game. It fits nicely in game theory. It’s well-designed to show some things about game theory. It’s a game of perfect information, for example. Each player knows, at all times, the moves all the players have made. Just look at the board and see where they’ve placed their tokens. A player might have forgotten the order the tokens were placed in, but that’s the player’s problem, not the game’s. Anyway in Y, the order of token-placing doesn’t much matter.
It’s also a game of complete information. Every player knows, at every step, what the other player could do. And what objective they’re working towards. One party, thinking enough, could forecast the other’s entire game. This comes close to the joke about the prisoners telling each other jokes by shouting numbers out to one another.
It is also a game in which a draw is impossible. Play long enough and someone must win. This even if both parties are for some reason trying to lose. There are ingenious proofs of this, but we can show it by considering a really simple game. Imagine playing Y on a tiny board, one that’s just one hex on each side. Definitely want to be the first player there.
So now imagine playing a slightly bigger board. Augment this one-by-one-by-one board by one row. That is, here, add two hexes along one of the sides of the original board. So there’s two pieces here; one is the original territory, and one is this one-row augmented territory. Look first at the original territory. Suppose that one of the players has gotten a ‘Y’ for the original territory. Will that player win the full-size board? … Well, sure. The other player can put a token down on either hex in the augmented territory. But there’s two hexes, either of which would make a path that connects the three edges of the board. The first player can put a token down on the other hex in the augmented territory, and now connects all three of the new sides again. First player wins.
All right, but how about a slightly bigger board? So take that two-by-two-by-two board and augment it, adding three hexes along one of the sides. Imagine a player’s won the original territory board. Do they have to win the full-size board? … Sure. The second player can put something in the augmented territory. But there’s again two hexes that would make the path connecting all three sides of the full board. The second player can put a token in one of those hexes. But the first player can put a token in the other of those. First player wins again.
How about a slightly bigger board yet? … Same logic holds. Really the only reason that the first player doesn’t always win is that, at some point, the first player screws up. And this is an existence proof, showing that the first player can always win. It doesn’t give any guidance into how to play, though. If the first player plays perfectly, she’s compelled to win. This is something which happens in many two-player, symmetric games. A symmetric game is one where either player has the same set of available moves, and can make the same moves with the same results. This proof needs to be tightened up to really hold. But it should convince you, at least, that the first player has an advantage.
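The claim that a draw is impossible can at least be checked by brute force on tiny boards. This is a sketch under assumptions of my own: the board is a plain triangle of hexes, with row r holding r + 1 spaces, and none of the pentagon spaces that commercial boards use. The coordinates and the function names are made up for this example. It fills the board every possible way and checks that exactly one side ends up with a ‘y’.

```python
from itertools import product


def cells(n):
    """All spaces on a triangular board with n hexes per side."""
    return [(r, c) for r in range(n) for c in range(r + 1)]


def neighbors(cell, n):
    """The up-to-six spaces touching a given space."""
    r, c = cell
    candidates = [(r, c - 1), (r, c + 1), (r - 1, c - 1),
                  (r - 1, c), (r + 1, c), (r + 1, c + 1)]
    return [(rr, cc) for rr, cc in candidates if 0 <= rr < n and 0 <= cc <= rr]


def has_y(tokens, n):
    """True if some connected group of tokens touches all three edges."""
    tokens = set(tokens)
    seen = set()
    for start in tokens:
        if start in seen:
            continue
        # Flood-fill outward from this token to find its connected group.
        group, stack = {start}, [start]
        while stack:
            for nb in neighbors(stack.pop(), n):
                if nb in tokens and nb not in group:
                    group.add(nb)
                    stack.append(nb)
        seen |= group
        touches_left = any(c == 0 for r, c in group)
        touches_right = any(c == r for r, c in group)
        touches_bottom = any(r == n - 1 for r, c in group)
        if touches_left and touches_right and touches_bottom:
            return True
    return False


def every_full_board_has_one_winner(n):
    """Try every way of filling the board; exactly one side should have a 'y'."""
    board = cells(n)
    for filling in product((0, 1), repeat=len(board)):
        first = [cell for cell, who in zip(board, filling) if who == 0]
        second = [cell for cell, who in zip(board, filling) if who == 1]
        if has_y(first, n) == has_y(second, n):
            return False
    return True


for size in range(1, 5):
    print(size, every_full_board_has_one_winner(size))
```

This is not a proof. It only confirms the claim for boards up to four hexes on a side, and the number of fillings doubles with every space added, so it will not scale to a nine-hex commercial board.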
So given that, the question becomes why play this game after you’ve decided who’ll go first? The reason you might if you were playing a game is, what, you have something else to do? And maybe you think you’ll make fewer mistakes than your opponent. One approach often used in symmetric games like this is the “pie rule”. The name comes from the story about how to slice a pie so you and your sibling don’t fight over the results. One cuts the pie, the other gets first pick of the slice, and then you fight anyway. In this game, though, one player makes a tentative first move. The other decides whether they will be Player One with that first move made or whether they’ll be Player Two, responding.
There are some neat quirks in the commercial Y games. One is that they don’t actually show hexes, and you don’t put tokens in the middle of hexes. Instead you put tokens on the spots that would be the center of the hex. On the board are lines pointing to the neighbors. This makes the board actually a string of triangles. This is the dual to the hex grid. It shows a set of vertices, and their connections, instead of hexes and their neighbors. Whether you think the hex grid or this dual makes it easier to tell when you’ve connected all three edges is a matter of taste. It does make the edges less jagged all around.
Another is that there will be three vertices that don’t connect to six others. They connect to five others, instead. Their spaces would be pentagons. As I understand the literature on this, this is a concession to game balance. It makes it easier for one side to fend off a path coming from the center.
It has geometric significance, though. A pure hexagonal grid is a structure that tiles the plane. A mostly hexagonal grid, with a couple of pentagons, though? That can tile the sphere. To cover the whole sphere you need something like at least twelve irregular spots. But this? With the three pentagons? That gives you a space that’s topologically equivalent to a hemisphere, or at least a slice of the sphere. If we do imagine the board to be a hemisphere covered, then the result of the handful of pentagon spaces is to make the “pole” closer to the equator.
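Where the twelve comes from can be sketched with Euler's formula for polyhedra. The assumptions here are mine: the sphere is covered only by pentagons and hexagons, and exactly three spaces meet at every corner. Write P for the number of pentagons and H for the number of hexagons, with V, E, and F the counts of corners, edges, and spaces.

```latex
\[
F = P + H, \qquad E = \frac{5P + 6H}{2}, \qquad V = \frac{5P + 6H}{3},
\]
\[
V - E + F = 2
\;\Longrightarrow\;
\frac{5P + 6H}{3} - \frac{5P + 6H}{2} + P + H = \frac{P}{6} = 2
\;\Longrightarrow\;
P = 12 .
\]
```

The hexagons cancel out entirely. Under these assumptions you can have as many or as few of them as you like, but the pentagons must number twelve.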
So as I say the game seems fun enough to play. And it shows off some of the ways that game theorists classify games. And the questions they ask about games. Is the game always won by someone? Does one party have an advantage? Can one party always force a win? It also shows the kinds of approach game theorists can use to answer these questions. This before they consider whether they’d enjoy playing it.
Today’s A To Z term is another from Mr Wu, of mathtuition88.com. The term does not, technically, start with X. But the Greek letter χ certainly looks like an X. And the modern English letter X traces back to that χ. So that’s near enough for my needs.
The χ² test is a creature of statistics. Creatures, really. But if one just says “the χ² test” without qualification they mean Pearson’s χ² test. Pearson here is a familiar name to anyone reading the biographical sidebar in their statistics book. He was Karl Pearson, who in the late 19th and early 20th century developed pretty much every tool of inferential statistics.
Pearson was, besides a ferocious mathematical talent, a white supremacist and eugenicist. This is something to say about many pioneers of statistics. Many of the important basics of statistics were created to prove that some groups of humans were inferior to the kinds of people who get offered an OBE. They were created at a time that white society was very afraid that it might be out-bred by Italians or something even worse. This is not to say the tools of statistics are wrong, or bad. It is to say that anyone telling you mathematics is a socially independent, politically neutral thing is a fool or a liar.
Inferential statistics is the branch of statistics used to test hypotheses. The hypothesis, generally, is about whether one sample of things is really distinguishable from a population of things. It is different from descriptive statistics, which is that thing I do each month when I say how many pages got a view and from how many countries. Descriptive statistics give us a handful of numbers with which to approximate a complicated thing. Both do valuable work, although I agree it seems like descriptive statistics are the boring part. Without them, though, inferential statistics has nothing to study.
The χ² test works like many hypothesis-testing tools do. It takes two parts. One of these is observations. We start with something that comes in two or more categories. Categories come in many kinds: the postal code where a person comes from. The color of a car. The number of years of schooling someone has had. The species of flower. What is important is that the categories be mutually exclusive. One has either been a smoker for more than one year or else one has not.
Count the number of observations of … whatever is interesting … for each category. There is some fraction of observations that belong to the first category, some fraction that belong to the second, some to the third, and so on. Find those fractions. This is all easy enough stuff, really. Counting and dividing by the total number of observations. Which is a hallmark of many inferential statistics tools. They are often tedious, involving a lot of calculation. But they rarely involve difficult calculations. Square roots are often where they top out.
That covers observations. What we also need are expectations. This is our hypothesis for what fraction “ought” to be in each category. How do you know what there “ought” to be? … This is the hard part of inferential statistics. Often we are interested in showing that some class is more likely than another to have whatever we’ve observed happen. So we can use as a hypothesis that the thing is observed just as much in one case as another. If we want to test whether one sample is indistinguishable from another, we use the proportions from the other sample. If we want to test whether one sample matches a theoretical ideal, we use that theoretical ideal. People writing probability and statistics problems love throwing dice. Let me make that my example. We hypothesize that on throwing a six-sided die a thousand times, each number comes up exactly one-sixth of the time.
It’s impossible that each number will come up exactly one-sixth of the time, in a thousand throws. We could only hope to achieve this if we tossed some ridiculous number like a thousand and two times. But even if we went to that much extra work, it’s impossible that each number would come up exactly the 167 times. Here I mean it’s “impossible” in the same way it’s impossible I could drop eight coins from my pocket and have them all come up tails. Undoubtedly, some number will be unlucky and just not turn up the full 167 times. Some other number will come up a bit too much. But it’s not required; it’s just like that. Some coin lands heads.
This doesn’t necessarily mean the die is biased. The question is whether the observations are too far off from the prediction. How far is that? For each category, take the difference between the observed frequency and the expected frequency. Square that. Divide it by the expected frequency. Once you’ve done that for every category, add up all these numbers. This is χ². Do all this and you’ll get some nice nonnegative number like, oh, 5.094 or 11.216 or, heck, 20.482.
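The recipe is short enough to write out directly. Here is a minimal Python sketch of it. The function name `chi_squared` and the die counts are made up for illustration; they are not from any real experiment.

```python
def chi_squared(observed, expected):
    """Return the sum over categories of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for faces 1 through 6 over 1,002 throws (invented data).
observed = [160, 175, 158, 172, 169, 168]
# A fair die predicts the same count, 167, for every face.
expected = [sum(observed) / 6] * 6

statistic = chi_squared(observed, expected)
print(f"chi-squared = {statistic:.3f}")
```

Counting, subtracting, squaring, dividing, adding: tedious by hand, but nothing here is harder than that.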
The χ² test works like many inferential-statistics tests do. It tells us how likely it is, if the hypothetical expected values were right, that random chance would give us the observed data. The farther χ² is from zero, the less likely it is this was pure chance. Which, all right. But how big does it have to be?
It depends on two important things. First is the number of categories that you have. Or, to use the lingo, the degrees of freedom in your problem. This is the total number of categories, minus one. The greater the number of degrees of freedom, the bigger χ² can be without it saying this difference can’t just be chance.
The second important thing is called the alpha level. This is a judgement call. This is how unlikely you want a result to be before you’ll declare that it couldn’t be chance. We have an instinctive idea of this. If you toss a coin twenty times and it comes up tails every time, you’ll declare that was impossible and the coin must be rigged. But it isn’t impossible. Start a run of twenty coin flips right now. You have a 0.000 095 37% chance of it being all tails. But I would be comfortable, on the 20th tail, to say something is up. I accept that I am ascribing to malice what is in fact just one of those things.
So the choice of alpha level is a measure of how willing we are to make a mistake in our conclusions. In a simple science like particle physics we can set very stringent standards. There are many particles around and we can smash them as long as the budget holds out. In more difficult sciences, such as epidemiology, we must let alpha be larger. We often accept an alpha of five-percent or one-percent.
What we must do, then, is find for an alpha level and a number of degrees of freedom, what the threshold χ² is. If the sample’s χ² is below that threshold, OK. The observations are consistent with the hypothesis. If the sample’s χ² is larger than that threshold, OK. It’s less-than-the-alpha-level percent likely that the observations are consistent with the hypothesis. This is what most statistical inference tests are like. You calculate a number and check whether it is above or below a threshold. If it’s below the threshold, the observation is consistent with the hypothesis. If it’s above the threshold, there’s less than the alpha-level chance that the observation is consistent with the hypothesis.
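The check itself is a single comparison. As a sketch, suppose there are six categories, so five degrees of freedom, and take the thresholds from a standard printed table: 11.070 for an alpha of five percent and 15.086 for one percent. The three sample statistics are the numbers tossed out earlier; pairing them with five degrees of freedom is an assumption made only for this illustration, as are the names `CRITICAL_VALUES_DF5` and `is_consistent`.

```python
# Thresholds copied from a standard chi-squared table for 5 degrees of freedom.
# Keys are alpha levels; values are the critical chi-squared values.
CRITICAL_VALUES_DF5 = {0.05: 11.070, 0.01: 15.086}


def is_consistent(statistic, alpha, table=CRITICAL_VALUES_DF5):
    """True when the statistic does not exceed the threshold for this alpha."""
    return statistic <= table[alpha]


for statistic in (5.094, 11.216, 20.482):
    for alpha in (0.05, 0.01):
        verdict = "consistent" if is_consistent(statistic, alpha) else "too far off"
        print(f"chi-squared {statistic:6.3f} at alpha {alpha}: {verdict}")
```

Notice that 11.216 fails at the five-percent level but passes at the one-percent level. That is the judgement call about alpha showing up in the answer.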
How do we find these threshold values? … Well, under no circumstances do we try to calculate those. They’re based on a thing called the χ² distributions, the name you’d expect. They’re hard to calculate. There is no earthly reason for you to calculate them. You can find them in the back of your statistics textbook. Or do a web search for χ² test tables. I’m sure Matlab has a function to give you this. If it doesn’t, there’s a function you can download from somebody to work it out. There’s no need to calculate that yourself. Which is again common to inferential statistics tests. You find the thresholds by just looking them up.
χ² tests are just one of the hypothesis-testing tools of inferential statistics. They are a good example of such. They’re designed for observations that can be fit into several categories, and comparing those to an expected forecast. But the calculations one does, and the way one interprets them, are typical for these tests. Even the way they are more tedious than hard is typical. It’s a good example of the family of tools.
Today’s A To Z term was suggested by Dina Yagodich, whose YouTube channel features many topics, including calculus and differential equations, statistics, discrete math, and Matlab. Matlab is especially valuable to know as a good quick calculation can answer many questions.
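The Wallis named here is John Wallis, an English clergyman and mathematician and cryptographer. His most tweetable work is how we follow his lead in using the symbol ∞ to represent infinity. But he did much in calculus. And it’s a piece of that which brings us to today. He particularly noticed this:

$$\frac{1}{2}\pi = \frac{2\cdot 2}{1\cdot 3} \cdot \frac{4\cdot 4}{3\cdot 5} \cdot \frac{6\cdot 6}{5\cdot 7} \cdot \frac{8\cdot 8}{7\cdot 9} \cdots$$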
This is an infinite product. It’s multiplication’s answer to the infinite series. It always amazes me when an infinite product works. There are dangers when you do anything with an infinite number of terms. Even the basics of arithmetic, like that you can change the order in which you calculate but still get the same result, break down. Series, in which you add together infinitely many things, are risky, but I’m comfortable with the rules to know when the sum can be trusted. Infinite products seem more mysterious. Then you learn an infinite product converges if and only if the series made from the logarithms of the terms in it also converges. Then infinite products seem less exciting.
There are many infinite products that give us π. Some work quite efficiently, giving us lots of digits for a few terms’ work. Wallis’s formula does not. We need about a thousand terms for it to get us a π of about 3.141. This is a bit much to calculate even today. In 1656, when he published it in Arithmetica Infinitorum, a book I have never read? Wallis was able to do mental arithmetic well. His biography at St Andrews says once when having trouble sleeping he calculated the square root of a 53-digit number in his head, and in the morning, remembered it, and was right. Still, this would be a lot of work. How could Wallis possibly do it? And what work could possibly convince anyone else that he was right?
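To get a feel for how slowly this goes, here is a small Python sketch of the partial products. It assumes the usual modern statement of Wallis’s product, in which the k-th factor is (2k·2k)/((2k − 1)(2k + 1)) and the whole product is one-half π. The name `wallis_pi` is made up for this example.

```python
import math


def wallis_pi(n_factors):
    """Approximate pi as twice the product of the first n_factors Wallis factors."""
    product = 1.0
    for k in range(1, n_factors + 1):
        product *= (2 * k) * (2 * k) / ((2 * k - 1) * (2 * k + 1))
    return 2.0 * product


for n in (1, 10, 100, 1000):
    estimate = wallis_pi(n)
    print(f"{n:5d} factors: {estimate:.6f} (short of pi by {math.pi - estimate:.6f})")
```

Every partial product comes in a little under π, and each tenfold increase in work buys only about one more digit.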
As is common to striking discoveries it was a mixture of insight and luck and persistence and pattern recognition. He seems to have started with pondering the value of

$$\int_0^1 \left(1 - x^2\right)^{\frac{1}{2}} dx$$
Happily, he knew exactly what this was: $\frac{1}{4}\pi$. He knew this because of a bit of insight. We can interpret the integral here as asking for the area that’s enclosed, on a Cartesian coordinate system, by the positive x-axis, the positive y-axis, and the set of points which makes true the equation $y = \left(1 - x^2\right)^{\frac{1}{2}}$. This curve is the upper half of a circle with radius 1 and centered on the origin. The area enclosed by all this is one-fourth the area of a circle of radius 1. So that’s how he could know the value of the integral, without doing any symbol manipulation.
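The question, in modern notation, would be whether he could do that integral. And, for this? He couldn’t. But, unable to do the problem he wanted, he tried doing the most similar problem he could, to see what that proved. $\int_0^1 \left(1 - x^2\right)^{\frac{1}{2}} dx$ was beyond his power to integrate; but what if he swapped those exponents? Worked on $\int_0^1 \left(1 - x^{\frac{1}{2}}\right)^2 dx$ instead? This would not, could not, give him what he was interested in. But it would give him something he could calculate. So can we:

$$\int_0^1 \left(1 - x^{\frac{1}{2}}\right)^2 dx = \int_0^1 \left(1 - 2x^{\frac{1}{2}} + x\right) dx = 1 - \frac{4}{3} + \frac{1}{2} = \frac{1}{6}$$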
And now here comes persistence. What if it’s not $x^{\frac{1}{2}}$ inside the parentheses there? If it’s x raised to some other unit fraction instead? What if the parentheses aren’t raised to the second power, but to some other whole number? Might that reveal something useful? Each of these integrals is calculable, and he calculated them. He worked out a table for many values of

$$\int_0^1 \left(1 - x^{\frac{1}{p}}\right)^q dx$$
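for different sets of whole numbers p and q. He trusted that if he kept this up, he’d find some interesting pattern. And he did. The integral, for example, always turns out to be a unit fraction. And there’s a deeper pattern. Let me share results for different values of p and q; the integral is the reciprocal of the number inside the table. The topmost row is values of q; the leftmost column is values of p.

| p \ q | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| 2 | 1 | 3 | 6 | 10 | 15 | 21 |
| 3 | 1 | 4 | 10 | 20 | 35 | 56 |
| 4 | 1 | 5 | 15 | 35 | 70 | 126 |
| 5 | 1 | 6 | 21 | 56 | 126 | 252 |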
There is a deep pattern here, although I’m not sure Wallis noticed that one. Look along the diagonals, running from lower-left to upper-right. These are the coefficients of the binomial expansion. Yang Hui’s triangle, if you prefer. Pascal’s triangle, if you prefer that. Let me call the term in row p, column q of this table $a_{p, q}$. Then

$$a_{p, q} = \frac{(p + q)!}{p! \cdot q!}$$
Great material, anyway. The trouble is that it doesn’t help Wallis with the original problem, which, in this notation, would have $p = \frac{1}{2}$ and $q = \frac{1}{2}$. What he really wanted was the Binomial Theorem, but western mathematicians didn’t know it yet. Here a bit of luck comes in. He had noticed there’s a relationship between terms in one column and terms in another, particularly, that

$$a_{p, q} = \frac{p + q}{q} a_{p, q - 1}$$
So why shouldn’t that hold if p and q aren’t whole numbers? … We would today say why should they hold? But Wallis was working with a different idea of mathematical rigor. He made assumptions that it turned out in this case were correct. Of course, had he been wrong, we wouldn’t have heard of any of this and I would have an essay on some other topic.
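For whole numbers, at least, nothing has to be taken on trust. Here is a Python sketch that works the integrals out exactly and checks both patterns. It assumes the integral in question is that of (1 − x^(1/p))^q from 0 to 1, expands it with the binomial theorem, and integrates term by term. The function names are invented for this example.

```python
from fractions import Fraction
from math import comb


def wallis_integral(p, q):
    """Exact integral of (1 - x**(1/p))**q over [0, 1] for whole p >= 1, q >= 0."""
    # Expand the q-th power with the binomial theorem, then integrate each
    # term: the integral of x**(k/p) from 0 to 1 is p / (k + p).
    return sum(Fraction((-1) ** k * comb(q, k) * p, k + p) for k in range(q + 1))


def next_in_row(p, q, previous):
    """The column-to-column relation: entry (p, q) is (p + q)/q times entry (p, q - 1)."""
    return Fraction(p + q, q) * previous


for p in range(1, 6):
    # Table entries are the reciprocals of the integrals.
    row = [1 / wallis_integral(p, q) for q in range(6)]
    print(p, [int(entry) for entry in row])
```

Each printed row should match the table of reciprocals, and `next_in_row` steps from one entry to its right-hand neighbor. What Wallis gambled on was that the same stepping rule still works when p and q are halves.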
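With luck in Wallis’s favor we can go back to making a table. What would the row for $p = \frac{1}{2}$ look like? We’ll need both whole and half-integers. $a_{\frac{1}{2}, 0}$ is easy; its reciprocal is 1. $a_{\frac{1}{2}, \frac{1}{2}}$ is also easy; that’s the insight Wallis had to start with. Its reciprocal is $\frac{\pi}{4}$, so the entry itself is $\frac{4}{\pi}$. What about the rest? Use the equation just up above, relating $a_{p, q}$ to $a_{p, q - 1}$; then we can start to fill in:

| q | 0 | 1/2 | 1 | 3/2 | 2 | 5/2 | 3 | 7/2 |
|---|---|---|---|---|---|---|---|---|
| p = 1/2 | $1$ | $\frac{4}{\pi}$ | $\frac{3}{2}$ | $\frac{4}{3}\cdot\frac{4}{\pi}$ | $\frac{15}{8}$ | $\frac{8}{5}\cdot\frac{4}{\pi}$ | $\frac{35}{16}$ | $\frac{64}{35}\cdot\frac{4}{\pi}$ |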
Anything we can learn from this? … Well, sure. For one, as we go left to right, all these entries are increasing. So, like, the second column is less than the third which is less than the fourth. Here’s a triple inequality for you:

$$\frac{4}{\pi} < \frac{3}{2} < \frac{4}{3}\cdot\frac{4}{\pi}$$

Multiply all that through by, oh, $\frac{\pi}{4}$. And then divide it all through by $\frac{3}{2}$. What have we got?

$$\frac{2\cdot 2}{1\cdot 3} < \frac{\pi}{2} < \frac{4}{3}\cdot\frac{2\cdot 2}{1\cdot 3}$$

I did some rearranging of terms, but, that’s the pattern. One-half π has to be between $\frac{2\cdot 2}{1\cdot 3}$ and four-thirds that.
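Move over a little. Start from the entry in that row where $q = \frac{3}{2}$. This starts us out with

$$\frac{4}{3}\cdot\frac{4}{\pi} < \frac{15}{8} < \frac{6}{5}\cdot\frac{4}{3}\cdot\frac{4}{\pi}$$

Multiply everything by $\frac{\pi}{4}$, and divide everything by $\frac{15}{8}$ and follow with some symbol manipulation. And here’s a tip which would have saved me some frustration working out my notes: $4 = 2\cdot 2$. Also, 6 equals 2 times 3. Later on, you may want to remember that 8 equals 2 times 4. All this gets us eventually to

$$\frac{2\cdot 2\cdot 4\cdot 4}{1\cdot 3\cdot 3\cdot 5} < \frac{\pi}{2} < \frac{6}{5}\cdot\frac{2\cdot 2\cdot 4\cdot 4}{1\cdot 3\cdot 3\cdot 5}$$

Move over to the next terms, starting from $q = \frac{5}{2}$. This will get us eventually to

$$\frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6}{1\cdot 3\cdot 3\cdot 5\cdot 5\cdot 7} < \frac{\pi}{2} < \frac{8}{7}\cdot\frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6}{1\cdot 3\cdot 3\cdot 5\cdot 5\cdot 7}$$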
You see the pattern here. Whatever the value of $\frac{\pi}{2}$, it’s squeezed between some number, on the left side of this triple inequality, and that same number times … uh … something like $\frac{10}{9}$ or $\frac{12}{11}$ or $\frac{14}{13}$ or $\frac{1002}{1001}$. That last one is a number very close to 1. So the conclusion is that $\frac{\pi}{2}$ has to equal whatever that pattern is making for the number on the left there.
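The squeeze is easy to watch numerically. This Python sketch assumes the pattern continues the way the first steps suggest: the left-hand number after n steps is the product of the first n factors (2k·2k)/((2k − 1)(2k + 1)), and the right-hand number is that times (2n + 2)/(2n + 1), which is four-thirds when n is 1. The name `wallis_bounds` is made up here.

```python
import math


def wallis_bounds(n):
    """Lower and upper bounds on pi/2 from the first n factors of the product."""
    lower = 1.0
    for k in range(1, n + 1):
        lower *= (2 * k) ** 2 / ((2 * k - 1) * (2 * k + 1))
    # The upper bound is the lower bound times a factor that shrinks toward 1.
    upper = lower * (2 * n + 2) / (2 * n + 1)
    return lower, upper


for n in (1, 2, 3, 10, 500):
    lower, upper = wallis_bounds(n)
    print(f"n = {n:3d}: {lower:.6f} < {math.pi / 2:.6f} < {upper:.6f}")
```

The gap between the two sides closes as n grows, which is the informal version of the argument.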
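We can make this more rigorous. Like, we don’t have to just talk about squeezing the number we want between two nearly-equal values. We can rely on the use of the … Squeeze Theorem … to prove this is okay. And there’s much we have to straighten out. Particularly, we really don’t want to write out expressions like

$$\frac{\pi}{2} = \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8\cdot 10\cdot 10 \cdots}{1\cdot 3\cdot 3\cdot 5\cdot 5\cdot 7\cdot 7\cdot 9\cdot 9\cdot 11 \cdots}$$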
Put that way, it looks like, well, we can divide each 3 in the denominator into a 6 in the numerator to get a 2, each 5 in the denominator to a 10 in the numerator to get a 2, and so on. We get a product that’s infinitely large, instead of anything to do with π. This is that problem where arithmetic on infinitely long strings of things becomes dangerous. To be rigorous, we need to write this product as the limit of a sequence, with finite numerator and denominator, and be careful about how to compose the numerators and denominators.
But this is all right. Wallis found a lovely result and in a way that’s common to much work in mathematics. It used a combination of insight and persistence, with pattern recognition and luck making a great difference. Often when we first find something the proof of it is rough, and we need considerable work to make it rigorous. The path that got Wallis to these products is one we still walk.
Jacob Siehler suggested the term for today’s A to Z essay. The letter V turned up a great crop of subjects: velocity, suggested by Dina Yagodich, and variable, from goldenoj, were also great suggestions. But Siehler offered something almost designed to appeal to me: an obscure function that shone in the days before electronic computers could do work for us. There was no chance of my resisting.
A story about the comeuppance of a know-it-all who was not me. It was in mathematics class in high school. The teacher was explaining logic, and showing off diagrams. These would compute propositions very interesting to logic-diagram-class connecting symbols. These symbols meant logical AND and OR and NOT and so on. One of the students pointed out, you know, the only symbol you actually need is NAND. The teacher nodded; this was so. By the clever arrangement of enough NAND operations you could get the result of all the standard logic operations. He said he’d wait while the know-it-all tried it for any realistic problem. If we are able to do NAND we can construct an XOR. But we will understand what we are trying to do more clearly if we have an XOR in the kit.
So the versine. It’s a (spherical) trigonometric function. The versine of an angle is the same value as 1 minus the cosine of the angle. This seems like a confused name; shouldn’t something called “versine” have, you know, a sine in its rule? Sure, and if you don’t like that 1 minus the cosine thing, you can instead use this. The versine of an angle is two times the square of the sine of half the angle. There is a vercosine, so you don’t need to worry about that. The vercosine is two times the square of the cosine of half the angle. That’s also equal to 1 plus the cosine of the angle.
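Both rules for each function are easy to compare numerically. A minimal Python sketch, with function names chosen just for this example:

```python
import math


def versine(theta):
    """1 minus the cosine of the angle (angle in radians)."""
    return 1.0 - math.cos(theta)


def versine_half_angle(theta):
    """Two times the square of the sine of half the angle."""
    return 2.0 * math.sin(theta / 2.0) ** 2


def vercosine(theta):
    """1 plus the cosine of the angle."""
    return 1.0 + math.cos(theta)


def vercosine_half_angle(theta):
    """Two times the square of the cosine of half the angle."""
    return 2.0 * math.cos(theta / 2.0) ** 2


for degrees in (0, 30, 90, 180):
    theta = math.radians(degrees)
    print(degrees, versine(theta), versine_half_angle(theta),
          vercosine(theta), vercosine_half_angle(theta))
```

The two rules in each pair agree to within rounding for any angle you care to try.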
This is all fine, but what’s the point? We can see why it might be easier to say “versine of θ” than to say “2 sin²(1/2 θ)”. But how is “versine of θ” easier than “one minus cosine of θ”?
The strongest answer, at the risk of sounding old, is to ask back, you know we haven’t always done things the way we do them now, right?
We have, these days, settled on an idea of what the important trigonometric functions are. Start with Cartesian coordinates on some flat space. Draw a circle of radius 1 and with center at the origin. Draw a horizontal line starting at the origin and going off in the positive-x-direction. Draw another line from the center and making an angle with respect to the horizontal line. That line intersects the circle somewhere. The x-coordinate of that point is the cosine of the angle. The y-coordinate of that point is the sine of the angle. What could be more sensible?
That depends what you think sensible. We’re so used to drawing circles and making lines inside that we forget we can do other things. Here’s one.
Start with a circle. Again with radius 1. Now chop an arc out of it. Pick up that arc and drop it down on the ground. How far does this arc reach, left to right? How high does it reach, top to bottom?
Well, the arc you chopped out has some length. Let me call that length 2θ. That two makes it easier to put this in terms of familiar trig functions. How much space does this chopped and dropped arc cover, horizontally? That’s twice the sine of θ. How tall is this chopped and dropped arc? That’s the versine of θ.
We are accustomed to thinking of the relationships between pieces of a circle like this in terms of angles inside the circle. Or of the relationships of the legs of triangles. It seems obviously useful. We even know many formulas relating sines and cosines and other major functions to each other. But it’s no less valid to look at arcs plucked out of a circle and the length of that arc and its width and its height. This might be more convenient, especially if we are often thinking about the outsides of circular things. Indeed, the oldest tables we in the Western tradition have of trigonometric functions list sines and versines. Cosines would come later.
This partly answers why there should have ever been a versine. But we’ve had the cosine since Arabian mathematicians started thinking seriously about triangles. Why had we needed versine the last 1200 years? And why don’t we need it anymore?
One answer here is that mention about the oldest tables of trigonometric functions. Or of less-old tables. Until recently, as things go, anyone who wanted to do much computing needed tables of common functions at many different values. These tables might not have the sine we really need of, say, 1.17 degrees. But if the table had 1.1 and 1.2 we could get pretty close.
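Getting pretty close usually meant linear interpolation between the two nearest entries. A Python sketch, assuming a hypothetical table that lists sines to five decimal places:

```python
import math


def interpolate(x, x0, y0, x1, y1):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# A made-up two-line table of sines, rounded the way a printed table would be.
sine_table = {
    1.1: round(math.sin(math.radians(1.1)), 5),
    1.2: round(math.sin(math.radians(1.2)), 5),
}

estimate = interpolate(1.17, 1.1, sine_table[1.1], 1.2, sine_table[1.2])
actual = math.sin(math.radians(1.17))
print(f"interpolated: {estimate:.6f}   computed directly: {actual:.6f}")
```

The interpolated value is good to about as many digits as the table carries, which is all a table user could ask for.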
But trigonometry will be needed. One of the great fields of practical mathematics has long been navigation. We locate points on the globe using latitude and longitude. To find the distance between points is a messy calculation. The calculation becomes less longwinded, and more clear, written using versines. (Properly, it uses the haversine, which is one-half times the versine. It will not surprise you that a 19th-century English mathematician coined that name.)
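For the curious, here is a Python sketch of the haversine form of that distance calculation as it is commonly stated: the haversine of the central angle equals hav(Δφ) + cos(φ₁) cos(φ₂) hav(Δλ), where φ is latitude and λ is longitude. The function names and the mean Earth radius of 6371 km are assumptions of this example, and the globe is treated as a perfect sphere.

```python
import math


def haversine(theta):
    """One-half the versine: the square of the sine of half the angle."""
    return math.sin(theta / 2.0) ** 2


def great_circle_distance(lat1, lon1, lat2, lon2, radius=6371.0):
    """Distance along a sphere between two points given in degrees.

    radius defaults to an assumed mean Earth radius in kilometers.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    h = (haversine(phi2 - phi1)
         + math.cos(phi1) * math.cos(phi2) * haversine(delta_lambda))
    # Guard against rounding pushing h a hair above 1 before the square root.
    h = min(1.0, h)
    return 2.0 * radius * math.asin(math.sqrt(h))


print(great_circle_distance(0.0, 0.0, 0.0, 90.0))  # a quarter of the way around
```

Everything under the square root is a sum of non-negative pieces, which is part of why this form behaves itself.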
Having a neat formula is pleasant, but, you know. It’s navigators and surveyors using these formulas. They can deal with a lengthy formula. The typesetters publishing their books are already getting hazard pay. Why change a bunch of references to $1 - \cos\theta$ into $\operatorname{versin}\theta$ instead?
We get a difference when it comes time to calculate. Like, pencil on paper. The cosine (sine, versine, haversine, whatever) of 1.17 degrees is a transcendental number. We do not have the paper to write that number out. We’ll write down instead enough digits until we get tired. 0.99979, say. Maybe 0.9998. To take 1 minus that number? That’s 0.00021. Maybe 0.0002. What’s the difference?
It’s in the precision. 1.17 degrees is a measure that has three significant digits. 0.00021? That’s only two digits. 0.0002? That’s got only one digit. We’ve lost precision, and without even noticing it. Whatever calculations we’re making on this have grown error margins. Maybe we’re doing calculations for which this won’t matter. Do we know that, though?
This reflects what we call numerical instability. You may have made only a slight error. But your calculation might magnify that error until it overwhelms your calculation. There isn’t any one fix for numerical instability. But there are some good general practices. Like, don’t divide a large number by a small one. Don’t add a tiny number to a large one. And don’t subtract two very-nearly-equal numbers. Calculating 1 minus the cosines of a small angle is subtracting a number that’s quite close to 1 from a number that is 1. You’re not guaranteed danger, but you are at greater risk.
A table of versines, rather than one of cosines, can compensate for this. If you’re making a table of versines it’s because you know people need the versine of 1.17 degrees with some precision. You can list it as 2.08488 times 10⁻⁴, and trust them to use as much precision as they need. For the cosine table, 0.999792 will give cosine-users the same number of significant digits. But it shortchanges versine-users.
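The loss is easy to reproduce. This Python sketch pretends we only have a cosine table good to five decimal places, which is an assumption of the example, and compares subtracting from 1 against computing the versine by the half-angle rule.

```python
import math

theta = math.radians(1.17)

# What a five-decimal cosine table would list for 1.17 degrees.
cosine_from_table = round(math.cos(theta), 5)

# Subtracting two nearly-equal numbers: most of the digits cancel away.
versine_by_subtraction = 1.0 - cosine_from_table

# The half-angle rule never subtracts nearly-equal numbers.
versine_direct = 2.0 * math.sin(theta / 2.0) ** 2

relative_error = abs(versine_by_subtraction - versine_direct) / versine_direct
print(f"table cosine:   {cosine_from_table}")
print(f"1 - cosine:     {versine_by_subtraction:.8f}")
print(f"direct versine: {versine_direct:.8f}")
print(f"relative error: {relative_error:.2%}")
```

A five-digit cosine turns into a versine that is off in its third significant digit.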
And here we see a reason for the versine to go from minor but useful function to obscure function. Any modern computer calculates with floating point numbers. You can get fifteen or thirty or, if you really need, sixty digits of precision for the cosine of 1.17 degrees. So you can get twelve or twenty-seven or fifty-seven digits for the versine of 1.17 degrees. We don’t need to look this up in a table constructed by someone working out formulas carefully.
This, I have to warn, doesn’t always work. Versine formulas for things like distance work pretty well with small angles. There are other angles for which they work badly. You have to calculate in different orders and maybe use other functions in that case. Part of numerical computing is selecting the way to compute the thing you want to do. Versines are for some kinds of problems a good way.
There are other advantages versines offered back when computing was difficult. In spherical trigonometry calculations they can let one skip steps demanding squares and square roots. If you do need to take a square root, you have the assurance that the versine will be non-negative. You don’t have to check that you aren’t slipping complex-valued numbers into your computation. If you need to take a logarithm, similarly, you know you don’t have to deal with the log of a negative number. (You still have to do something to avoid taking the logarithm of zero, but we can’t have everything.)
So this is what the versine offered. You could get precision that just using a cosine table wouldn’t necessarily offer. You could dodge numerical instabilities. You could save steps, in calculations and in thinking what to calculate. These are all good things. We can respect that. We enjoy now a computational abundance, which makes the things versine gave us seem like absurd penny-pinching. If computing were hard again, we might see the versine recovered from obscurity to, at least, having more special interest.
Wikipedia tells me that there are still specialized uses for the versine, or for the haversine. Recent decades, apparently, have found useful tools for calculating lunar distances, and sight reductions. The lunar distance is the angular separation between the Moon and some other body in the sky. Sight reduction is calculating positions from the apparent positions of reference objects. These are not problems that I work on often. But I would appreciate that we are still finding ways to do them well.
I mentioned that besides the versine there was a vercosine and a haversine. There’s also a havercosine, and then some even more obscure functions with no less wonderful names like the exsecant. I cannot imagine needing a hacovercosine, except maybe to someday meet an obscure poetic meter, but I am happy to know such a thing is out there in case. A person on Wikipedia’s Talk page about the versine wished to know if we could define a vertangent and some other terms. We can, of course, but apparently no one has found a need for such a thing. If we find a problem that we would like to solve that they do well, this may change.
Goldenoj suggested my topic for today’s essay. It delighted me because I had no idea what it was. It wasn’t even listed on Mathworld, where I start all my research for these essays. It turned out to be something that I use all the time, but that I learned so long ago that it’s faded to invisibility. I didn’t even know that the concept had a name. So that makes it a great topic for an essay like this. I hope.
I once interviewed for a job I didn’t expect to get (or take). I would have taught for a university that provided courses for United States armed forces dependents. One bit of small talk that I thought went well had my potential department head mention a weird little quirk. United States-raised children were unusually good in multiplying stuff by 25. I had a ready hypothesis: the United States (and Canada) have a quarter-dollar coin. Many other countries just don’t, making do with 20-cent and 50-cent pieces instead. The potential department head said that was a good observation. United States-raised kids got practice turning four 25’s into a block of 100.
And this is the thing labelled as unitizing. A unit is, in this context, the thing we think of as “one thing”. This can be dollars, or feet of distance, or loaves of bread, or weeks of paid vacation. Whatever we need to measure. A unit often is made up of tinier pieces, cents or inches or slices or days. It can often be bundled up into bigger ones. Unitizing is about finding the bundle of things that makes the work one wants to do easy to understand.
This is a difficult topic for me to write about. I find it hard to notice myself doing it. But, for example, consider counting. Most people have a fair time counting up to five or six things at a glance. Eighteen things? There’s no telling that at a glance. What you can do, though, is notice that they group together, a block of six things here, another six here, another six there. Then the mass of things has turned into a manageable several collections of manageable counts of things. And, if we need to reverse the process, we can do that. Recognize that the 36 little triangular-wedge game tokens can be given out nine each to the four players. They can in turn arrange six of the tokens into an attractive complete wheel, and make do with the three remainder.
Slices of things turn up a good bit in thought about unitizing. One of particular delight that I found is this paper, by Susan J Lamon. It’s The Development of Unitizing: Its Role in Children’s Partitioning Strategies. Lamon investigated how children understand quantity, and the paper describes several experiments. A typical example is asking children how to evenly divide four pizzas among six people. And how their strategies change if all the pizzas are cut beforehand, versus whether they have to make the cuts themselves. Or how the question changes if things that are not pizza are considered. One child had different cutting strategies for four pizzas versus four cookies. The good reason: cookies are harder to slice than pizzas. You need to be more economical with your cuts so you don’t ruin the food.
And what kids found to be units depended on what was being divided. Four pizzas with different toppings would be divided differently from four identical pizzas. Four Chinese dinners were split by different strategies too. One child explained it just didn’t seem right to call what each person got four-sixths of each dinner. Lamon speculates this reflects cultural conventions about meals that are often eaten in common, and that feels right to me.
There are obvious uses to this unitizing, in figuring how to divide pizzas and cases of 24 pop cans. There are subtler uses. Positional notation depends on unitizing. We group ten individual things into a new block, and denote it as something in a tens column. Or ten individual blocks-of-ten, which we denote as something in a hundreds column. And we go the other way as we need, when subtracting or dividing.
When I was learning base-ten (and other) arithmetic, they taught me to think of exchanging ten pennies for a dime, or ten dimes for a dollar, or back the other way. To someone hoarding pennies so as to afford things from the bookmobile the practice working out units worked well.
With that context you see why it’s hard to point out what’s happening. You aren’t reading a pop mathematics blog unless you’re quite at ease with calculation. That there is a particular skill done becomes invisible due to its ubiquity. It takes special circumstances to see it again.
Today’s A To Z term was nominated by APMA, author of the Everybody Makes DATA blog. It was a topic that delighted me to realize I could explain. Then it started to torment me as I realized there is a lot to explain here, and I had to pick something. So here’s where things ended up.
In the mid-2000s I was teaching at a department being closed down. In its last semester I had to teach Computational Quantum Mechanics. The person who’d normally taught it had transferred to another department. But a few last majors wanted the old department’s version of the course, and this pressed me into the role. Teaching a course you don’t really know is a rush. It’s a semester of learning, and trying to think deeply enough that you can convey something to students. This while all the regular demands of the semester eat your time and working energy. And this in the leap of faith that the syllabus you made up, before you truly knew the subject, will be nearly enough right. And that you have not committed to teaching something you do not understand.
So around mid-course I realized I needed to explain finding the wave function for a hydrogen atom with two electrons. The wave function is this probability distribution. You use it to find things like the probability a particle is in a certain area, or has a certain momentum. Things like that. A proton with one electron is as much as I’d ever done, as a physics major. We treat the proton as the center of the universe, immobile, and the electron hovers around that somewhere. Two electrons, though? A thing repelling your electron, and repelled by your electron, and neither of those having fixed positions? What the mathematics of that must look like terrified me. When I couldn’t procrastinate it farther I accepted my doom and read exactly what it was I should do.
It turned out I had known what I needed for nearly twenty years already. Got it in high school.
Of course I’m discussing Taylor Series. The equations were loaded down with symbols, yes. But at its core, the important stuff, was this old and trusted friend.
The premise behind a Taylor Series is even older than that. It’s universal. If you want to do something complicated, try doing the simplest thing that looks at all like it. And then make that a little bit more like you want. And then a bit more. Keep making these little improvements until you’ve got it as right as you truly need. Put that vaguely, the idea describes Taylor series just as well as it describes making a video game or painting a state portrait. We can make it more specific, though.
A series, in this context, means the sum of a sequence of things. This can be finitely many things. It can be infinitely many things. If the sum makes sense, we say the series converges. If the sum doesn’t, we say the series diverges. When we first learn about series, the sequences are all numbers. $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$, for example, which diverges. (It adds to a number bigger than any finite number.) Or $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$, which converges. (It adds to $2$.)
In a Taylor Series, the terms are all polynomials. They’re simple polynomials. Let me call the independent variable ‘x’. Sometimes it’s ‘z’, for the reasons you would expect. (‘x’ usually implies we’re looking at real-valued functions. ‘z’ usually implies we’re looking at complex-valued functions. ‘t’ implies it’s a real-valued function with an independent variable that represents time.) Each of these terms is simple. Each term is the distance between x and a reference point, raised to a whole power, and multiplied by some coefficient. The reference point is the same for every term. What makes this potent is that we use, potentially, many terms. Infinitely many terms, if need be.
Call the reference point ‘a’. Or if you prefer, $x_0$. $z_0$ if you want to work with z’s. You see the pattern. This ‘a’ is the “point of expansion”. The coefficients of each term depend on the original function at the point of expansion. The coefficient for the term that has $(x - a)$ is the first derivative of f, evaluated at a. The coefficient for the term that has $(x - a)^2$ is the second derivative of f, evaluated at a, divided by 2. The coefficient for the term that has $(x - a)^3$ is the third derivative of f, evaluated at a, divided by 6.
You’ll never guess what the coefficient for the term with $(x - a)^{122}$ is. Nor will you ever care. The only reason you would wish to is to answer an exam question. The instructor will, in that case, have a function that’s either the sine or the cosine of x. The point of expansion will be 0, $\frac{\pi}{2}$, $\pi$, or $\frac{3\pi}{2}$.
Otherwise you will trust that this is one of the terms of $(x - a)^n$, ‘n’ representing some counting number too great to be interesting. All the interesting work will be done with the Taylor series either truncated to a couple terms, or continued on to infinitely many.
What a Taylor series offers is the chance to approximate a function we’re genuinely interested in with a polynomial. This is worth doing, usually, because polynomials are easier to work with. They have nice analytic properties. We can automate taking their derivatives and integrals. We can set a computer to calculate their value at some point, if we need that. We might have no idea how to start calculating the logarithm of 1.3. We certainly have an idea how to start calculating $0.3 - \frac{1}{2}\left(0.3^2\right) + \frac{1}{3}\left(0.3^3\right)$. (Yes, it’s 0.3. I’m using a Taylor series with a = 1 as the point of expansion.)
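Here is that calculation as a Python sketch. It uses the standard Taylor series for the natural logarithm around a = 1, whose terms are (x − 1) − (x − 1)²/2 + (x − 1)³/3 and so on. The name `log_taylor` is made up for this example.

```python
import math


def log_taylor(x, n_terms):
    """Sum the first n_terms of the Taylor series for log(x) expanded around 1."""
    u = x - 1.0
    return sum((-1) ** (k + 1) * u ** k / k for k in range(1, n_terms + 1))


for n_terms in (1, 2, 3, 5, 10):
    print(f"{n_terms:2d} terms: {log_taylor(1.3, n_terms):.8f}")
print(f"math.log: {math.log(1.3):.8f}")
```

Three terms already agree with the true logarithm to a couple of decimal places, and each further term tightens the agreement.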
The first couple terms tell us interesting things. Especially if we’re looking at a function that represents something physical. The first two terms tell us where an equilibrium might be. The next term tells us whether an equilibrium is stable or not. If it is stable, it tells us how perturbations, points near the equilibrium, behave.
The first couple terms will describe a line, or a quadratic, or a cubic, some simple function like that. Usually adding more terms will make this Taylor series approximation a better fit to the original. There might be a larger region where the polynomial and the original function are close enough. Or the polynomial and the original function will be closer together on the same old region.
We would really like that region to eventually grow to the whole domain of the original function. We can’t count on that, though. Roughly, the interval of convergence will stretch from ‘a’ to wherever the first weird thing happens. Weird things are, like, discontinuities. Vertical asymptotes. Anything you don’t like dealing with in the original function, the Taylor series will refuse to deal with. Outside that interval, the Taylor series diverges and we just can’t use it for anything meaningful. Which is almost supernaturally weird of them. The Taylor series uses information about the original function, but it’s all derivatives at a single point. Somehow the derivatives of, say, the logarithm of x around x = 1 give a hint that the logarithm of 0 is undefinable. And so they won’t help us calculate the logarithm of 3.
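A small sketch can show that refusal numerically. It uses the standard series for the natural logarithm around a = 1; the function name `log_partial_sum` and the sample points 1.5 and 3 are my own choices for illustration.

```python
import math


def log_partial_sum(x, terms):
    """Sum of the first `terms` terms of the series for ln(x) about a = 1."""
    h = x - 1.0
    return sum((-1) ** (n + 1) * h ** n / n for n in range(1, terms + 1))


if __name__ == "__main__":
    print("terms, series at 1.5, series at 3.0")
    for terms in (5, 10, 20, 40):
        print(terms, log_partial_sum(1.5, terms), log_partial_sum(3.0, terms))
    print("library values:", math.log(1.5), math.log(3.0))
```

At 1.5, inside the interval of convergence, the partial sums settle down. At 3 they swing ever more wildly as terms are added.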
Things can be weirder. There are functions that just break Taylor series altogether. Some are obvious. A function needs lots of derivatives at a point to have a good Taylor series approximation. So, many fractal curves won’t have a Taylor series approximation. These curves are all corners, points where they aren’t continuous or where derivatives don’t exist. Some are obviously designed to break Taylor series approximations. We can make a function that follows different rules if x is rational than if x is irrational. There’s no approximating that, and you’d blame the person who made such a function, not the Taylor series. It can be subtle. The function defined by the rule $f(x) = e^{-1/x^2}$, with the note that if x is zero then f(x) is 0, seems to satisfy everything we’d look for. It’s a function that’s mostly near 1, that drops down to being near zero around where x = 0. But its Taylor series expansion around a = 0 is a horizontal line always at 0. The interval of convergence can be a single point, challenging our idea of what an interval is.
That’s all right. If we can trust that we’re avoiding weird parts, Taylor series give us an outstanding new tool. Grant that the Taylor series describes a function with the same rule as our original function. The Taylor series is often easier to work with, especially if we’re working on differential equations. We can automate, or at least find formulas for, taking the derivative of a polynomial. Or adding together derivatives of polynomials. Often we can attack a differential equation too hard to solve otherwise by supposing the answer is a polynomial. This is essentially what that quantum mechanics problem used, and why the tool was so familiar when I was in a strange land.
Roughly. What I was actually doing was treating the function I wanted as a power series. This is, like the Taylor series, the sum of a sequence of terms, all of which are $(x - a)^n$ times some coefficient. What makes it not a Taylor series is that the coefficients weren’t the derivatives of any function I knew to start. But the experience of Taylor series trained me to look at functions as things which could be approximated by polynomials.
This gives us the hint to look at other series that approximate interesting functions. We get a host of these, with names like Laurent series and Fourier series and Chebyshev series and such. Laurent series look like Taylor series but we allow powers to be negative integers as well as positive ones. Fourier series do away with polynomials. They instead use trigonometric functions, sines and cosines. Chebyshev series build on polynomials, but not on pure powers. They’ll use orthogonal polynomials. These behave like perpendicular directions do. That orthogonality makes many numerical techniques behave better.
The Taylor series is a great introduction to these tools. Its first several terms have good physical interpretations. Its calculation requires tools we learn early on in calculus. The habits of thought it teaches guide us even in unfamiliar territory.
And I feel very relieved to be done with this. I often have a few false starts to an essay, but those are mostly before I commit words to text editor. This one had about four branches that now sit in my scrap file. I’m glad to have a deadline forcing me to just publish already.
Today’s A To Z term is another from goldenoj. It’s one important to probability, and it’s one at the center of the field.
The sample space is a tool for probability questions. We need them. Humans are bad at probability questions. Thinking of sample spaces helps us. It’s a way to recast probability questions so that our intuitions about space — which are pretty good — will guide us to probabilities.
A sample space collects the possible results of some experiment. “Experiment” means what mathematicians intend, so, not something with test tubes and colorful liquids that might blow up. Instead it’s things like tossing coins and dice and pulling cards out of reduced decks. At least while we’re learning. In real mathematical work this turns into more varied stuff. Fluid flows or magnetic field strengths or economic forecasts. The experiment is the doing of something which gives us information. This information is the result of flipping this coin or drawing this card or measuring this wind speed. Once we know the information, that’s the outcome.
So each possible outcome we represent as a point in the sample space. Describing it as a “space” might cause trouble. “Space” carries connotations of something three-dimensional and continuous and contiguous. This isn’t necessarily so. We can be interested in discrete outcomes. A coin’s toss has two possible outcomes. Three, if we count losing the coin. The day of the week on which someone’s birthday falls has seven possible outcomes. We can also be interested in continuous outcomes. The amount of rain over the day is some nonnegative real number. The amount of time spent waiting at this traffic light is some nonnegative real number. We’re often interested in discrete representations of something continuous. We did not have inches of rain overnight, even if we did. We recorded 0.71 inches after the storm.
We don’t demand every point in the sample space to be equally probable. There seems to be a circularity to requiring that. What we do demand is that the sample space come with a “sigma algebra”, or σ-algebra to write it briefly. I don’t know how σ came to be the shorthand for this kind of algebra. Here “algebra” means a thing with a bunch of rules. These rules are about what you’d guess if you read pop mathematics blogs and had to bluff your way through a conversation of rules about sets. The algebra’s this collection of sets made up of the elements of the sample space, which I’ll call X. X itself has to be in the collection. The complements of sets in the collection are also sets in the collection. The unions of sets in the collection have to be in the collection.
So the sample space is a set. All the possible outcomes of the experiment we’re thinking about are its elements. Every experiment must have some outcome that’s inside the sample space. And any two different outcomes have to be mutually exclusive. That is, if outcome A has happened, then outcome B has not happened. And vice-versa; I’m not so fond of A that I would refuse B.
I see your protest. You’ve worked through probability homework problems where you’re asked the chance a card drawn from this deck is either a face card or a diamond. The jack of diamonds is both. This is true; but it’s not what we’re looking at. The outcome of this experiment is the card that’s drawn, which might be any of 52 options.
If you like treating it that way. You might build the sample space differently, like saying that it’s an ordered pair. One part of the pair is the suit of the card. The other part is the value. This might be better for the problem you’re doing. This is part of why the probability department commands such high wages. There are many sample spaces that can describe the problem you’re interested in. This does include one where one event is “draw a card that’s a face card or diamond” and the other is “draw one that isn’t”. (These events don’t have an equal probability.) The work is finding a sample space that clarifies your problem.
Working out the sample space that clarifies the problem is the hard part, usually. Not being rigorous about the space gives us many probability paradoxes. You know, like the puzzle where you’re told someone’s two children are either boys or girls. One walks in and it’s a girl. You’re told the probability the other is a boy is two-thirds. And you get mad. Or the Monty Hall Paradox, where you’re asked to pick which of three doors has the grand prize behind it. You’re shown one that you didn’t pick which hasn’t. You’re given the chance to switch to the remaining door. You’re told the probability that the grand prize is behind that other door is two-thirds, and you get mad. There are probability paradoxes that don’t involve a chance of two-thirds. Having a clear idea of the sample space avoids getting the answers wrong, at least. There’s not much to do about not getting mad.
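Both two-thirds answers can be checked by writing the sample space out, as in this Python sketch. The names `probability`, `children`, and `doors` are mine. The assumptions are also mine: for the children, that the information given amounts to “at least one is a girl”; for Monty Hall, that the host always opens a door that is neither your pick nor the prize.

```python
from fractions import Fraction
from itertools import product


def probability(event, space):
    """Probability of `event` when every point of `space` is equally likely."""
    favorable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favorable), len(space))


# Two children: each point is (first child, second child).
children = list(product("BG", repeat=2))

# Assumption: all we learn is that at least one child is a girl.
at_least_one_girl = [c for c in children if "G" in c]
p_other_is_boy = probability(lambda c: "B" in c, at_least_one_girl)

# A different reading: we learn that the first child in particular is a girl.
first_is_girl = [c for c in children if c[0] == "G"]
p_second_is_boy = probability(lambda c: c[1] == "B", first_is_girl)

# Monty Hall: each point is (prize door, door you picked first).
# Assumption: the host always reveals a losing door you did not pick,
# so switching wins exactly when the first pick was wrong.
doors = list(product(range(3), repeat=2))
p_switch_wins = probability(lambda d: d[0] != d[1], doors)

if __name__ == "__main__":
    print(p_other_is_boy, p_second_is_boy, p_switch_wins)
```

The two readings of the children puzzle give different answers because they are different sample spaces, which is the point about being rigorous.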
Like I said, we don’t insist that every point in the sample space have an equal probability of being the outcome. Or, if it’s a continuous space, that every region of the same area has the same probability. It is certainly easier if it does. Then finding the probability of some result becomes easy. You count the number of outcomes that satisfy that result, and divide by the total number of outcomes. You see this in problems about throwing two dice and asking the chance the total is seven, or five, or twelve.
For a continuous sample space, you’d find the area of all the results that satisfy the result. Divide that by the area of the sample space and there’s the probability of that result. (It’s possible for a result to have an area of zero, which implies that the thing cannot happen. This presents a paradox. A thing is in the sample space because it is a possible outcome. What these measure-zero results are, typically, is something like every one of infinitely many tossed coins coming up tails. That can’t happen, but it’s not like there’s any reason it can’t.)
If every outcome isn’t equally likely, though? Sometimes we can redesign the sample space to something that is. The result of rolling two dice is a familiar example. The chance of the dice totalling 2 is different from the chance of them totalling 4. So a sample space that’s just the sums, the numbers 2 through 12, is annoying to deal with. But rewrite the space as the ordered pairs, the result of die one and die two? Then we have something nice. The chance of die one being 1 and die two being 1 is the same as the chance of die one being 2 and die two being 2. There happen to be other die combinations that add up to 4 is all.
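The ordered-pair sample space is small enough to list outright. This sketch, with variable names of my own choosing, counts how many of the 36 equally likely pairs give each total.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (die one, die two), all 36 equally likely.
rolls = list(product(range(1, 7), repeat=2))

# Count how many ordered pairs produce each total.
ways = Counter(a + b for a, b in rolls)

# Probability of each total: favorable pairs over all pairs.
chance = {total: Fraction(count, len(rolls)) for total, count in ways.items()}

if __name__ == "__main__":
    for total in sorted(chance):
        print(total, ways[total], chance[total])
```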
Sometimes there’s no finding a sample space which describes what you’re interested in and that makes every point equally probable. Or nearly enough. The world is vast and complicated. That’s all right. We can have a function that describes, for each point in the sample space, the probability of its turning up. Really we had that already, for equally-probable outcomes. It’s just that was all the same number. But this function is called the probability measure. If we combine together a sample space, and a collection of all the events we’re interested in, and a probability measure for all these events, then this triad is a probability space.
And probability spaces give us all sorts of great possibilities. Dearest to my own work is Monte Carlo methods, in which we look for particular points inside the sample space. We do this by starting out anywhere, picking a point at random. And then try moving to a different point, picking the “direction” of the change at random. We decide whether that move succeeds by a rule that depends in part on the probability measure, and in part on how well whatever we’re looking for holds true. This is a scheme that demands a lot of calculation. You won’t be surprised that it only became a serious tool once computing power was abundant.
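Here is a toy sketch of that kind of scheme, in the style of the Metropolis algorithm. It is not the method from my own work. The sample space is a handful of states arranged in a ring, the probability measure is proportional to a list of weights I made up, and the acceptance rule depends only on the ratio of those weights. The name `metropolis` and every parameter are assumptions for illustration.

```python
import random
from collections import Counter


def metropolis(weights, steps, seed=0):
    """Random-walk Metropolis sampling over states 0..len(weights)-1.

    Returns the fraction of steps spent in each state, which should
    approach weights[s] / sum(weights) for a long enough run.
    """
    rng = random.Random(seed)   # seeded so runs are repeatable
    n = len(weights)
    state = rng.randrange(n)    # start out anywhere
    visits = Counter()
    for _ in range(steps):
        # Pick the "direction" of the change at random, wrapping at the ends.
        proposal = (state + rng.choice((-1, 1))) % n
        # Accept with probability min(1, weight ratio); otherwise stay put.
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        visits[state] += 1
    return {s: visits[s] / steps for s in range(n)}


if __name__ == "__main__":
    print(metropolis([1, 2, 4, 2, 1], 200_000, seed=1))
```

The walk never needs the whole sample space listed, only the weights of the point it is at and the point it might move to.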
So for many problems there is no actually listing all the sample space. A real problem might include, say, the up-or-down orientation of millions of magnets. This is a sample space of unspeakable vastness. But thinking out this space, and what it must look like, helps these probability questions become ones that our intuitions help us with instead. If you do not know what to do with a probability question, think to the sample spaces.
And now the most challenging part of doing an A to Z series: the time after the end of Daylight Saving, when I absolutely positively have to have my final copy ready to go at 1 pm, rather than 2 pm. I’m looking for nominations for what to write about for the last half-dozen letters of the alphabet.
These letters do include X. There’s no getting around that. After about two iterations of this I was running out of candidates for ‘X’ on Mathworld’s dictionary of topics. Last year I opened up ‘X’ as a wild card topic, taking subjects from other letters. It’s just coincidence that we then went with ‘extreme’, like it was the 90s or something.
And I do thank everyone who makes a suggestion. As much as I sometimes feel crushed by the attempt to write two 800-word essays that both blow up to 1900 words each week, they get me to learn things, and to practice thinking about things, and that’s such fantastic fun.
Please nominate topics in comments here. I have a better chance of keeping nominations organized if they’re all together. Also please, if you do suggest something, let me know if you have a blog or YouTube channel or Twitter or Mathstodon account, or even a good old-fashioned web site, that you’d like to show off. I do try to credit ideas and let folks know what the people who give me ideas are doing that’s worth showing off, too.
Here’s the essays I’ve written in past years for the letters U through Z.
I have another subject nominated by goldenoj today. And it even lets me get into number theory, the field of mathematics questions that everybody understands and nobody can prove.
I was once a young grad student working as a teaching assistant and unaware of the principles of student privacy. Near the end of semesters I would e-mail students their grades. This so they could correct any mistakes and know what they’d have to get on the finals. I was learning Perl, which was an acceptable pastime in the 1990s. So I wrote scripts that would take my spreadsheet of grades and turn it into e-mails that were automatically sent. And then I got all fancy.
It seemed boring to send out completely identical form letters, even if any individual would see it once. Maybe twice if they got me for another class. So I started writing variants of the boilerplate sentences. My goal was that every student would get a mass-produced yet unique e-mail. To best the chances of this I had to make sure of something about all these variant sentences and paragraphs.
So you see the trick. I needed a set of relatively prime numbers. That way, it would be the greatest possible number of students before I had a completely repeated text. We know what prime numbers are. They’re the numbers that, in your field, have exactly two factors. In the counting numbers the primes are numbers like 2, 3, 5, 7 and so on. In the Gaussian integers, these are numbers like 3 and 7 and . But not 2 or 5. We can look to primes among the polynomials. Among polynomials with rational coefficients, is prime. So is . is not.
The idea of relative primes appears wherever primes appear. We can say without contradiction that 4 and 9 are relative primes, among the whole numbers. Though neither’s prime, in the whole numbers, they have no prime factor in common. This is an obvious way to look at it. We can use that definition for any field that has a concept of primes. There are others, though. We can say two things are relatively prime if there’s a linear combination of them that adds to the identity element. You get a linear combination by multiplying each of the things by a scalar and adding these together. Multiply 4 by -2 and 9 by 1 and add them and look what you get. Or, if the least common multiple of a set of elements is equal to their product, then the elements are relatively prime. Some make sense only for the whole numbers. Imagine the first quadrant of a plane, marked in Cartesian coordinates, with a dot at every point whose coordinates are both whole numbers. Draw the line segment connecting the point at (0, 0) and the point with coordinates (m, n). If that line segment touches no dots between (0, 0) and (m, n), then the whole numbers m and n are relatively prime.
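These tests are easy to try on whole numbers. The sketch below is my own illustration, with hypothetical helper names `bezout` and `lattice_points_strictly_between`. It checks the shared-factor, linear-combination, least-common-multiple, and line-segment descriptions against each other for positive whole numbers.

```python
from math import gcd


def bezout(m, n):
    """Extended Euclid: return (g, s, t) with s*m + t*n == g == gcd(m, n)."""
    old_r, r = m, n
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def lattice_points_strictly_between(m, n):
    """Count whole-number points on the segment (0, 0)-(m, n), ends excluded.

    Assumes m is a positive whole number.
    """
    # The point above x lies on the segment at height x*n/m.
    return sum(1 for x in range(1, m) if (x * n) % m == 0)


if __name__ == "__main__":
    for m, n in ((4, 9), (4, 6)):
        g, s, t = bezout(m, n)
        lcm = m * n // gcd(m, n)
        print(m, n, "gcd", g, "combination", (s, t),
              "lcm equals product:", lcm == m * n,
              "dots touched:", lattice_points_strictly_between(m, n))
```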
We start looking at relative primes as pairs of things. We can be interested in larger sets of relative primes, though. My little e-mail generator, for example, wouldn’t work so well if any pair of sentence replacements were not relatively prime. So, like, the set of numbers 2, 6, 9 is relatively prime; all three numbers share no prime factors. But neither the pair 2, 6 nor the pair 6, 9 is relatively prime. 2, 9 is, at least there’s that. I forget how many replaceable sentences were in my form e-mails. I’m sure I did the cowardly thing, coming up with a prime number of alternate ways to phrase as many sentences as possible. As an undergraduate I covered the student government for four years’ worth of meetings. I learned a lot of ways to say the same thing.
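The difference between a relatively prime set and a set whose every pair is relatively prime is quick to check by machine. This sketch is a reconstruction for illustration, not my old Perl. The function names are made up, and `repeat_period` assumes each replaceable sentence simply cycles through its variants in order, one step per e-mail.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def setwise_coprime(numbers):
    """True if no prime divides every number in the collection."""
    return reduce(gcd, numbers) == 1


def pairwise_coprime(numbers):
    """True if every pair in the collection is relatively prime."""
    return all(gcd(a, b) == 1 for a, b in combinations(numbers, 2))


def repeat_period(variant_counts):
    """E-mails generated before the whole text repeats.

    Assumption: each sentence slot cycles through its variants in step,
    so the period is the least common multiple of the counts.
    """
    return reduce(lambda a, b: a * b // gcd(a, b), variant_counts)


if __name__ == "__main__":
    for counts in ([2, 6, 9], [4, 9], [2, 3, 5]):
        print(counts, setwise_coprime(counts), pairwise_coprime(counts),
              repeat_period(counts))
```

Under that cycling assumption, only the pairwise relatively prime sets get the full product of their counts before a repeat.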
Which is all right, but are relative primes important? Relative primes turn up all over the place in number theory, and in corners of group theory. There are some things that are easier to calculate in modulo arithmetic if we have relatively prime numbers to work with. I know when I see modulo arithmetic I expect encryption schemes to follow close behind. Here I admit I’m ignorant whether these imply things which make encryption schemes easier or harder.
Some of the results are neat, certainly. Suppose that the function f is a polynomial. Then, if its first derivative f’ is relatively prime to f, it turns out f has no repeated roots. And vice-versa: if f has no repeated roots, then it and its first derivative are relatively prime. You remember repeated roots. They’re factors like $(x - r)^2$, that foiled your attempt to test a couple points and figure roughly where a polynomial crossed the x-axis.
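The repeated-roots test can be tried with Euclid’s algorithm on polynomials with rational coefficients. This is a minimal sketch of my own. Polynomials are lists of coefficients, highest power first, with a nonzero leading coefficient, and the helper names are hypothetical.

```python
from fractions import Fraction


def derivative(p):
    """Derivative of a polynomial given as coefficients, highest power first."""
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])] or [Fraction(0)]


def poly_mod(p, q):
    """Remainder of p divided by q, working over the rationals."""
    p = [Fraction(c) for c in p]
    while len(p) >= len(q) and any(p):
        factor = p[0] / Fraction(q[0])
        for i, c in enumerate(q):
            p[i] -= factor * c
        p.pop(0)  # the leading coefficient is now zero
    while len(p) > 1 and p[0] == 0:
        p.pop(0)  # strip any leftover leading zeros
    return p or [Fraction(0)]


def poly_gcd(p, q):
    """Monic greatest common divisor, by Euclid's algorithm."""
    p, q = [Fraction(c) for c in p], [Fraction(c) for c in q]
    while any(q):
        p, q = q, poly_mod(p, q)
    return [c / p[0] for c in p]


def has_repeated_root(p):
    """True when p and its derivative share a non-constant factor."""
    return len(poly_gcd(p, derivative(p))) > 1


if __name__ == "__main__":
    f = [1, 0, -3, 2]   # x^3 - 3x + 2, which is (x - 1)^2 (x + 2)
    g = [1, 0, -1]      # x^2 - 1, which is (x - 1)(x + 1)
    print(poly_gcd(f, derivative(f)), has_repeated_root(f))
    print(poly_gcd(g, derivative(g)), has_repeated_root(g))
```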
I mentioned that primeness depends on the field. This is true of relative primeness. Polynomials really show this off. (Here I’m using an example explained in a 2007 Ask Dr Math essay.) Is the polynomial relatively prime to ?
It is, if we are interested in polynomials with integer coefficients. There’s no linear combination of and which gets us to 1. Go ahead and try.
It is not, if we are interested in polynomials with rational coefficients. Multiply by and multiply by . Then add those up.
Tell me what polynomials you want to deal with today and I will tell you which answer is right.
This may all seem cute if, perhaps, petty. A bunch of anonymous theorems dotting the center third of an abstract algebra text will inspire that. The most important relative-primes thing I know of is the abc conjecture, posed in the mid-80s by Joseph Oesterlé and David Masser. Start with three counting numbers, a, b, and c. Require that a + b = c, and that a, b, and c be relatively prime.
There is a product of the unique prime factors of a, b, and c. That is, let’s say a is 36. This is 2 times 2 times 3 times 3. Let’s say b is 5. This is prime. c is 41; it’s prime. Their unique prime factors are 2, 3, 5, and 41; the product of all these is 1,230.
The conjecture deals with this product of unique prime factors for this relatively prime triplet. Almost always, c is going to be smaller than this unique prime factors product. The conjecture says that there will be, for every positive real number $\epsilon$, at most finitely many cases where c is larger than this product raised to the power $1 + \epsilon$. I do not know why raising this product to this power is so important. I assume it rules out some case where this product raised to the first power would be too easy a condition.
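For the party guests writing down numbers, here is a sketch that computes the product of unique prime factors by trial division. The function name `radical` follows the usual term for this product. The second triple, 1 + 8 = 9, is my own added example of one of the rare cases where c beats the product.

```python
def radical(n):
    """Product of the distinct prime factors of n, found by trial division."""
    product = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            product *= p          # count this prime once
            while n % p == 0:
                n //= p           # then strip every copy of it
        p += 1
    if n > 1:                     # whatever is left over is itself prime
        product *= n
    return product


if __name__ == "__main__":
    for a, b in ((36, 5), (1, 8)):
        c = a + b
        rad = radical(a * b * c)
        print(a, b, c, rad, "c is larger" if c > rad else "c is smaller")
```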
Apart from that bit, though, this is a classic sort of number theory conjecture. Like, it involves some technical terms, but nothing too involved. You could almost explain it at a party and expect to be understood, and to get some people writing down numbers, testing out specific cases. Nobody will go away solving the problem, but they’ll have some good exercise and that’s worthwhile.
And it has consequences. We do not know whether the abc conjecture is true. We do know that if it is true, then a bunch of other things follow. The one that a non-mathematician would appreciate is that Fermat’s Last Theorem would be provable by an alternate route. The abc conjecture would only prove the cases for Fermat’s Last Theorem for powers greater than 5. But that’s all right. We can separately work out the cases for the third, fourth, and fifth powers, and then cover everything else at once. (That we know Fermat’s Last Theorem is true doesn’t let us conclude the abc conjecture is true, unfortunately.)
There are other implications. Some are about problems that seem like fun to play with. If the abc conjecture is true, then for every integer A, there are finitely many values of n for which $n! + A$ is a perfect square. Some are of specialist interest: Lang’s conjecture, about elliptic curves, would be true. This is a lower bound for the height of non-torsion rational points. I’d stick to the stuff at a party. A host of conjectures about Diophantine equations, (high school) algebra problems where only integers may be solutions, become theorems. Also coming true: the Fermat-Catalan conjecture. This is a neat problem; it claims that the equation
$a^m + b^n = c^k$
where a, b, and c are relatively prime, and m, n, and k are positive integers satisfying the constraint
$\frac{1}{m} + \frac{1}{n} + \frac{1}{k} < 1$
has only finitely many solutions with distinct triplets $(a^m, b^n, c^k)$. The inequality about reciprocals of m, n, and k is needed so we don’t have boring small-exponent solutions clogging us up. The bit about distinct triplets is so we don’t clog things up with a or b being 1 and then technically every possible m or n giving us a “different” set. To date we know something like ten solutions, one of them having a equal to 1.
Another implication is Pillai’s Conjecture. This one asks whether every positive integer occurs only finitely many times as the difference between perfect powers. Perfect powers are numbers like 32 (two to the fifth power) or 81 (three to the fourth power) or such.
So as often happens when we stumble into a number theory thing, the idea of relative primes is easy. And there are deep implications to them. But those in turn give us things that seem like fun arithmetic puzzles.
I got a good nomination for a Q topic, thanks again to goldenoj. It was for Qualitative/Quantitative. Either would be a good topic, but they make a natural pairing. They describe the things mathematicians look for when modeling things. But ultimately I couldn’t find an angle that I liked. So rather than carry on with an essay that wasn’t working I went for a topic of my own. Might come back around to it, though, especially if nothing good presents itself for the letter X, which will probably need to be a wild card topic anyway.
We like comparing sizes. I talked about that some with norms. We do the same with shapes, though. We’d like to know which one is bigger than another, and by how much. We rely on squares to do this for us. It could be any shape, but we in the western tradition chose squares. I don’t know why.
My guess, unburdened by knowledge, is the ancient Greek tradition of looking at the shapes one can make with straightedge and compass. The easiest shape these tools make is, of course, circles. But it’s hard to find a circle with the same area as, say, any old triangle. Squares are probably a next-best thing. I don’t know why not equilateral triangles or hexagons. Again I would guess that the ancient Greeks had more rectangular or square rooms than they did triangular or hexagonal ones, and went with what they knew.
So that’s what lurks behind that word “quadrature”. It may be hard for us to judge whether this pentagon is bigger than that octagon. But if we find squares that are the same size as the pentagon and the octagon, great. We can spot which of the squares is bigger, and by how much.
Straightedge-and-compass lets you find the quadrature for many shapes. Like, take a rectangle. Let me call that ABCD. Let’s say that AB is one of the long sides and BC one of the short sides. OK. Extend AB, outwards, to another point that I’ll call E. Pick E so that the length of BE is the same as the length of BC.
Next, bisect the line segment AE. Call that point F. F is going to be the center of a new semicircle, one with radius FE. Draw that in, on the side of AE that’s opposite the point C. Because we are almost there.
Extend the line segment CB upwards, until it touches this semicircle. Call the point where it touches G. The line segment BG is the side of a square with the same area as the original rectangle ABCD. If you know enough straightedge-and-compass geometry to do that bisection, you know enough to turn BG into a square. If you’re not sure why that’s the correct length, you can get there quickly. Use a little algebra and the Pythagorean theorem.
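For anyone who wants that little bit of algebra spelled out, here is one way it can go. The symbols w and h, for the lengths of AB and BC, are my own labels. F is the center of the semicircle, so FG and FE are both radii, and the triangle FBG has its right angle at B.

```latex
\begin{aligned}
FG &= FE = \tfrac{1}{2}(w + h), \qquad FB = FE - BE = \tfrac{1}{2}(w - h), \\
BG^2 &= FG^2 - FB^2 = \tfrac{1}{4}(w + h)^2 - \tfrac{1}{4}(w - h)^2 = w h .
\end{aligned}
```

So the square on BG has area w times h, which is the area of the rectangle ABCD.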
Neat, yeah, I agree. Also neat is that you can use the same trick to find the area of a parallelogram. A parallelogram has the same area as a rectangle with the same base and height, you remember. So take your parallelogram, draw in some perpendiculars to shear that off into a rectangle, and find the quadrature of that rectangle. You’ve got the quadrature of your parallelogram.
Having the quadrature of a parallelogram lets you find the quadrature of any triangle. Pick one of the sides of the triangle as the base. You have a third point not on that base. Draw in the parallel to that base that goes through that third point. Then choose one of the other two sides. Draw the parallel to that side which goes through the other point. Look at that: you’ve got a parallelogram with twice the area of your original triangle. Bisect either the base or the height of this parallelogram, as you like. Then follow the rules for the quadrature of a parallelogram, and you have the quadrature of your triangle. Yes, you’re doing a lot of steps in-between the triangle you started with and the square you ended with. Those steps don’t count, not by this measure. Getting the results right matters.
And here’s some more beauty. You can find the quadrature for any polygon. Remember how you can divide any polygon into triangles? Go ahead and do that. Find the quadrature for every one of those triangles then. And you can create a square that has an area as large as all those squares put together. I’ll refrain from saying quite how, because realizing how is such a delight, one of those moments that at least made me laugh at how of course that’s how. It’s through one of those things that even people who don’t know mathematics know about.
With that background you understand why people thought the quadrature of the circle ought to be possible. Moreso when you know that the lune, a particular crescent-moon-like shape, can be squared. It looks so close to a half-circle that it’s obvious the rest should be possible. It’s not, and it took two thousand years and a completely different idea of geometry to prove it. But it sure looks like it should be possible.
Along the way to modernity quadrature picked up a new role. This is as part of calculus. One of the legs of calculus is integration. There is an interpretation of what the (definite) integral of a function means so common that we sometimes forget it doesn’t have to be that. This is to say that the integral of a function is the area “underneath” the curve. That is, it’s the area bounded by the limits of integration, by the horizontal axis, and by the curve represented by the function. If the function is sometimes less than zero, within the limits of integration, we’ll say that the integral represents the “net area”. Then we allow that the net area might be less than zero. Then we ignore the scolding looks of the ancient Greek mathematicians.
No matter. We love being able to find “the” integral of a function. This is a new function, and evaluating it tells us what this net area bounded by the limits of integration is. Finding this is “integration by quadrature”. At least in books published back when they wrote words like “to-day” or “coördinate”. My experience is that the term’s passed out of the vernacular, at least in North American Mathematician’s English.
Anyway the real flaw is that there are, like, six functions we can find the integral for. For the rest, we have to make do with approximations. This gives us “numerical quadrature”, a phrase which still has some currency.
And with my prologue about compass-and-straightedge quadrature you can see why it’s called that. Numerical integration schemes often rely on finding a polygon with a part that looks like a graph of the function you’re interested in. The other edges look like the limits of the integration. Then the area of that polygon should be close to the area “underneath” this function. So it should be close to the integral of the function you want. And we’re old hands at the quadrature of polygons, since we talked that out like five hundred words ago.
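One familiar scheme of this polygon-fitting kind is the trapezoid rule, sketched below. The polygon’s top edge is a chain of straight segments joining points on the curve. The function name `trapezoid` and the test integrand are my own choices for illustration.

```python
import math


def trapezoid(f, a, b, pieces):
    """Approximate the integral of f from a to b using `pieces` trapezoids."""
    width = (b - a) / pieces
    # The two end heights each belong to one trapezoid; the rest to two.
    total = 0.5 * (f(a) + f(b))
    for k in range(1, pieces):
        total += f(a + k * width)
    return total * width


if __name__ == "__main__":
    # The integral of sine from 0 to pi is exactly 2.
    for pieces in (4, 16, 64):
        print(pieces, trapezoid(math.sin, 0.0, math.pi, pieces))
```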
Now, no person ever has or ever will do numerical quadrature by compass-and-straightedge on some function. So why call it “numerical quadrature” instead of just “numerical integration”? Style, for one. “Quadrature” as a word has a nice tone, clearly jargon but not threateningly alien. Also “numerical integration” often connotes the solving of differential equations numerically. So it can clarify whether you’re evaluating integrals or solving differential equations. If you think that’s a distinction worth making. Evaluating integrals and solving differential equations are similar anyway.
And there is another adjective that often attaches to quadrature. This is Gaussian Quadrature. Gaussian Quadrature is, in principle, a fantastic way to do numerical integration perfectly. For some problems. For some cases. The insight which justifies it to me is one of those boring little theorems you run across in the chapter introducing How To Integrate. It runs something like this. Suppose ‘f’ is a continuous function, with domain the real numbers and range the real numbers. Suppose a and b are the limits of integration. Then there’s at least one point c, between a and b, for which:
$\int_a^b f(x) \, dx = f(c) \cdot (b - a)$
So if you could pick the right c, any integration would be so easy. Evaluate the function for one point and multiply it by whatever b minus a is. The catch is, you don’t know what c is.
Except there’s some cases where you kinda do. Like, if f is a line, rising or falling with a constant slope from a to b? Then have c be the midpoint of a and b.
That won’t always work. Like, if f is a parabola on the region from a to b, then c is not going to be the midpoint. If f is a cubic, then the midpoint is probably not c. And so on. And if you don’t know what kind of function f is? There’s no guessing where c will be.
But. If you decide you’re only trying to integrate certain kinds of functions? Then you can do all right. If you decide you only want to integrate polynomials, for example, then … well, you’re not going to find a single point c for this. But what you can find is a set of points between a and b. Evaluate the function for those points. And then find a weighted average by rules I’m not getting into here. And that weighted average will be exactly that integral.
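As one concrete instance, here is a sketch of the standard two-point Gauss-Legendre rule. It is only one example of the idea, and the function name `gauss2` is mine. Its two points sit at plus and minus one over the square root of three on the interval from -1 to 1, each with weight 1, and it integrates every polynomial up to degree three exactly.

```python
import math

# Two-point Gauss-Legendre rule on [-1, 1]: both weights are 1.
NODES = (-1 / math.sqrt(3), 1 / math.sqrt(3))
WEIGHTS = (1.0, 1.0)


def gauss2(f, a, b):
    """Two-point Gauss-Legendre quadrature of f on the interval [a, b]."""
    half = 0.5 * (b - a)   # scale factor from [-1, 1] to [a, b]
    mid = 0.5 * (a + b)    # center of [a, b]
    return half * sum(w * f(mid + half * x) for w, x in zip(WEIGHTS, NODES))


if __name__ == "__main__":
    def cubic(x):
        return x ** 3 - 2 * x ** 2 + x + 5

    # The exact integral of the cubic from 0 to 2 is 32/3.
    print(gauss2(cubic, 0.0, 2.0), 32 / 3)
    # A quartic is beyond what two points can handle exactly.
    print(gauss2(lambda x: x ** 4, -1.0, 1.0), 2 / 5)
```

Two function evaluations give the cubic’s integral up to rounding error. The quartic shows what happens when the function is not the kind you supposed.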
Of course there’s limits. The Gaussian Quadrature of a function is only possible if you can evaluate the function at arbitrary points. If you’re trying to integrate, like, a set of sample data it’s inapplicable. The points you pick, and the weighting to use, depend on what kind of function you want to integrate. The results will be worse the less your function is like what you supposed. It’s tedious to find what these points are for a particular assumption of function. But you only have to do that once, or look it up, if you know (say) you’re going to use polynomials of degree up to six or something like that.
And there are variations on this. They have names like the Chebyshev-Gauss Quadrature, or the Hermite-Gauss Quadrature, or the Jacobi-Gauss Quadrature. There are even some that don’t have Gauss’s name in them at all.
Despite that, you can get through a lot of mathematics not talking about quadrature. The idea implicit in the name, that we’re looking to compare areas of different things by looking at squares, is obsolete. It made sense when we worked with numbers that depended on units. One would write about a shape’s area being four times another shape’s, or the length of its side some multiple of a reference length.
We’ve grown comfortable thinking of raw numbers. It makes implicit the step where we divide the polygon’s area by the area of some standard reference unit square. This has advantages. We don’t need different vocabulary to think about integrating functions of one or two or ten independent variables. We don’t need wordy descriptions like “the area of this square is to the area of that as the second power of this square’s side is to the second power of that square’s side”. But it does mean we don’t see squares as intermediaries to understanding different shapes anymore.
Today’s A To Z term is another from goldenoj. It was just the proposal “Platonic”. Most people, prompted, would follow that adjective with one of three words. There’s relationship, ideal, and solid. Relationship is a little too far off of mathematics for me to go into here. Platonic ideals run very close to mathematics. Probably the default philosophy of western mathematics is Platonic. At least a folk Platonism, where the rest of us follow what the people who’ve taken the study of mathematical philosophy seriously seem to be doing. The idea that mathematical constructs are “real things” and have some “existence” that we can understand even if we will never see a true circle or an unadulterated four. Platonic solids, though, those are nice and familiar things. Many of them we can find around the house. That’s one direction to go.
Before I get to the Platonic Solids, though, I’d like to think a little more about Platonic Ideals. What do they look like? I gather our friends in the philosophy department have debated this question a while. So I won’t pretend to speak as if I had actual knowledge. I just have an impression. That impression is … well, something simple. My reasoning is that the Platonic ideal of, say, a chair has to have all the traits that every chair ever has. And there’s not a lot that every chair has. Whatever’s in the Platonic Ideal chair has to be just the things that every chair has, and to omit things that non-chairs do not.
That’s comfortable to me, thinking like a mathematician, though. I think mathematicians train to look for stuff that’s very generally true. This will tend to be things that have few properties to satisfy. Things that look, in some way, simple.
So what is simple in a shape? There’s no avoiding aesthetic judgement here. We can maybe use two-dimensional shapes as a guide, though. Polygons seem nice. They’re made of line segments which join at vertices. Regular polygons even nicer. Each vertex in a regular polygon connects to two edges. Each edge connects to exactly two vertices. Each edge has the same length. The interior angles are all congruent. And if you get many many sides, the regular polygon looks like a circle.
So there’s some things we might look for in solids. Shapes where every edge is the same length. Shapes where every edge connects exactly two vertices. Shapes where every vertex connects to the same number of edges. Shapes where the interior angles are all constant. Shapes where each face is the same polygon as every other face. Look for that and, in three-dimensional space, we find nine shapes.
Yeah, you want that to be five also. The four extra ones are “star polyhedrons”. They look like spikey versions of normal shapes. What keeps these from being Platonic solids isn’t a lack of imagination on Plato’s part. It’s that they’re not convex shapes. There’s no pair of points in a convex shape for which the line segment connecting them goes outside the shape. For the star polyhedrons, well, look at the ends of any two spikes. If we decide that part of this beautiful simplicity is convexity, then we’re down to five shapes. They’re famous. Tetrahedron, cube, octahedron, icosahedron, and dodecahedron.
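One way to see why the convex list stops at five, sketched in Python. The criterion used here is not in the essay; it is the usual angle-counting argument. If q regular p-sided faces meet at every vertex, their angles have to total less than a full turn, and that works out to 1/p + 1/q > 1/2. The function and variable names are hypothetical.

```python
from fractions import Fraction

NAMES = {
    (3, 3): "tetrahedron",
    (4, 3): "cube",
    (3, 4): "octahedron",
    (5, 3): "dodecahedron",
    (3, 5): "icosahedron",
}


def convex_regular_candidates(limit=20):
    """List (p, q): q regular p-gons around each vertex, leaving an angular gap."""
    found = []
    for p in range(3, limit):
        for q in range(3, limit):
            # Interior angle of a regular p-gon is (1 - 2/p) * 180 degrees; q of them
            # must total less than 360, which simplifies to 1/p + 1/q > 1/2.
            if Fraction(1, p) + Fraction(1, q) > Fraction(1, 2):
                found.append((p, q))
    return found


candidates = convex_regular_candidates()
solids = [NAMES[pq] for pq in candidates]
```

Only five (p, q) pairs pass the test, however large `limit` is made. The inequality is only a necessary condition; that each of the five can actually be built is a separate fact.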
I’m not sure why they’re named the Platonic Solids, though. Before you explain to me that they were named by Plato in the dialogue Timaeus, let me say something. They were named by Plato in the dialogue Timaeus. That isn’t the same thing as why they have the name Platonic Solids. I trust Plato didn’t name them “the me solids”, since if I know anything about Plato he would have called them “the Socratic solids”. It’s not that Plato was the first to group them either. At least some of the solids were known long before Plato. I don’t know of anyone who thinks Plato particularly advanced human understanding of the solids.
But he did write about them, and in things that many people remembered. It’s natural for a name to attach to the most famous person writing them. Still, someone had the thought which we follow to group these solids together under Plato’s name. I’m curious who, and when. Naming is often a more arbitrary thing than you’d think. The Fibonacci sequence has been known at latest since Fibonacci wrote about it in 1202. But it could not have that name before 1838, when historian Guillaume Libri gave Leonardo of Pisa the name Fibonacci. I’m not saying that the name “Platonic Solid” was invented in, like, 2002. But traditions that seem age-old can be surprisingly recent.
What is an age-old tradition is looking for physical significance in the solids. Plato himself cleverly matched the solids to the ancient concept of four elements plus a quintessence. Johannes Kepler, whom we thank for noticing the star polyhedrons, tried to match them to the orbits of the planets around the sun. Wikipedia tells me of a 1980s attempt to understand the atomic nucleus using Platonic solids. The attempt even touches me. Along the way to my thesis I looked at uniform charges free to move on the surface of a sphere. It was obvious if there were four charges they’d move to the vertices of a tetrahedron on the sphere. Similarly, eight charges would go to the vertices of the cube. Twelve charges to the vertices of the icosahedron. And so on. The Platonic Solids seem not just attractive but also of some deep physical significance.
There are not the four (or five) elements of ancient Greek atomism. Attractive as it is to think that fire is a bunch of four-sided dice. The orbits of the planets have nothing to do with the Platonic solids. I know too little about the physics of the atomic nucleus to say whether that panned out. However, that it doesn’t even get its own Wikipedia entry suggests something to me. And, in fact, eight charges on the sphere will not settle at the vertices of a cube. They’ll settle on a staggered pattern, two squares turned 45 degrees relative to each other. The shape is called a “square antiprism”. I was as surprised as you to learn that. It’s possible that the Platonic Solids are, ultimately, pleasant to us but not a key to the universe.
The example of the Platonic Solids does give us the cue to look for other families of solids. There are many such. The Archimedean Solids, for example, are again convex polyhedrons. They have faces of two or more regular polygons, rather than the lone one of Platonic Solids. There are 13 of these, with names of great beauty like the snub cube or the small rhombicuboctahedron. The Archimedean Solids have duals. The dual of a polyhedron represents a face of the original shape with a vertex. Faces that meet in the original polyhedron have an edge between their dual’s vertices. The duals to the Archimedean Solids get the name Catalan Solids. This for the Belgian mathematician Eugène Catalan, who described them in 1865. These attract names like “deltoidal icositetrahedron”. (The Platonic Solids have duals too, but those are all Platonic solids too. The tetrahedron is even its own dual.) The star polyhedrons hint us to look at stellations. These are shapes we get by stretching out the edges or faces of a polyhedron until we get a new polyhedron. It becomes a dizzying taxonomy of shapes, many of them with pointed edges.
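The claim that every Platonic solid has a Platonic dual can be checked with nothing more than counting. A dual trades faces for vertices and keeps the edges, so the sketch below swaps the counts and looks for a match. The table holds the standard vertex, edge, and face counts; the function names are made up for illustration. Matching counts is bookkeeping, not a geometric construction of the dual.

```python
# (vertices, edges, faces) for the five Platonic solids.
COUNTS = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}


def dual_counts(counts):
    """A dual trades faces for vertices and keeps the number of edges."""
    vertices, edges, faces = counts
    return (faces, edges, vertices)


def find_dual(name):
    """Return the Platonic solid whose counts match the dual of `name`."""
    target = dual_counts(COUNTS[name])
    return next(other for other, counts in COUNTS.items() if counts == target)


dual_pairs = {name: find_dual(name) for name in COUNTS}
```

Here `dual_pairs` matches the cube with the octahedron, the dodecahedron with the icosahedron, and the tetrahedron with itself.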
There are things that look like Platonic Solids in more than three dimensions of space. In four dimensions of space there are six of these, five of which look like versions of the Platonic Solids we all know. The sixth is this novel shape called the 24-cell, or hyperdiamond, or icositetrachoron, or some other wild names. In five dimensions of space? … it turns out there are only three things that look like Platonic Solids. There’s versions of the tetrahedron, the cube, and the octahedron. In six dimensions? … Three shapes, again versions of the tetrahedron, cube, and octahedron. And it carries on like this for seven, eight, nine, any number of dimensions of space. Which is an interesting development. If I hadn’t looked up the answer I’d have expected more dimensions of space to allow for more Platonic Solid-like shapes. Well, our experience with two and three dimensions guides us to thinking about more dimensions of space. It doesn’t mean that they’re just regular space with a note in the corner that “N = 8”. Shapes hold surprises.
Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.
An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same space but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Everything in a particular function space is a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.
So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.
My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.
I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.
Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. And both the differential and the integral operators are linear, which lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function $af + bg$ is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of $af + bg$ is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.
I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.
One of the conceptual leaps in high school algebra is that we start analyzing the things we do with numbers. Like, we don’t just take the number three, square it, multiply that by two and add to that the number three times four and add to that the number 1. We think about what if we take any number, call it x, and think of $2x^2 + 4x + 1$. And what if we make equations based on doing this $2x^2 + 4x + 1$; what values of x make those equations true? Or tell us something interesting?
Operators represent a similar leap. We can think of functions as things we manipulate, and think of those manipulations as a particular thing to do. For example, let me come up with a differential expression. For some function u(x) work out the value of this:
$2\frac{d^2 u(x)}{dx^2} + 4\frac{d u(x)}{dx} + u(x)$
Let me join in the convention of using ‘D’ for the differential operator. Then we can rewrite this expression like so:
$2D^2 u + 4D u + u$
Suddenly the differential equation looks a lot like a polynomial. Of course it does. Remember that everything in mathematics is polynomials. We get new tools to solve differential equations by rewriting them as operators. That’s nice. It also scratches that itch that I think everyone in Intro to Calculus gets, of wanting to somehow see $\frac{d^2}{dx^2}$ as if it were a square of $\frac{d}{dx}$. It’s not, and $D^2$ is not the square of $D$. It’s composing $D$ with itself. But it looks close enough to squaring to feel comfortable.
Nobody needs to do $2D^2 u + 4D u + u$ except to learn some stuff about operators. But you might imagine a world where we did this process all the time. If we did, then we’d develop shorthand for it. Maybe a new operator, call it T, and define it that $T = 2D^2 + 4D + 1$. You see the grammar of treating functions as if they were real numbers becoming familiar. You maybe even noticed the ‘1’ sitting there, serving as the “identity operator”. You know how you’d write out $T u$ if you needed to write it in full.
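A small sketch of that grammar in Python, under some loudly stated assumptions. Polynomials stand in for the functions, stored as coefficient lists. The operator T is taken to be 2D² + 4D + 1, patterned on the polynomial 2x² + 4x + 1 from the algebra example. The helper names `D`, `add`, `scale`, and `T` are invented for this illustration.

```python
def D(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] for c0 + c1 x + c2 x^2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]


def add(p, q):
    """Add two coefficient lists, padding the shorter with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [x + y for x, y in zip(p, q)]


def scale(a, p):
    """Multiply every coefficient by the scalar a."""
    return [a * c for c in p]


def T(u):
    """Apply T = 2 D^2 + 4 D + 1; D^2 means D applied twice, and 1 leaves u alone."""
    return add(add(scale(2, D(D(u))), scale(4, D(u))), u)


u = [0, 0, 0, 1]          # u(x) = x^3
Tu = T(u)                 # 2*(6x) + 4*(3x^2) + x^3 = 12x + 12x^2 + x^3

f, g = [1, 2, 3], [0, 5]  # f(x) = 1 + 2x + 3x^2, g(x) = 5x
lhs = D(add(scale(7, f), scale(-2, g)))      # derivative of 7f - 2g
rhs = add(scale(7, D(f)), scale(-2, D(g)))   # 7 f' - 2 g'
```

The last two lines are the linearity rule in miniature: differentiating the combination and combining the derivatives give the same coefficient list. Note that `D(D(u))` is D composed with itself, not anything squared.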
But there are operators that we use all the time. These do get special names, and often shorthand. For example, there’s the gradient operator. This applies to any function with several independent variables. The gradient has a great physical interpretation if the variables represent coordinates of space. If they do, the gradient of a function at a point gives us a vector that describes the direction in which the function increases fastest. And the size of that gradient — a functional on this operator — describes how fast that increase is.
The gradient itself defines more operators. These have names you get very familiar with in Vector Calculus, with names like divergence and curl. These have compelling physical interpretations if we think of the function we operate on as describing a moving fluid. A positive divergence means fluid is coming into the system; a negative divergence, that it is leaving. The curl, in fluids, describes how nearby streams of fluid move at different rates.
Physical interpretations are common in operators. This probably reflects how much influence physics has on mathematics and vice-versa. Anyone studying quantum mechanics gets familiar with a host of operators. These have comfortable names like “position operator” or “momentum operator” or “spin operator”. These are operators that apply to the wave function for a problem. They transform the wave function into a probability distribution. That distribution describes what positions or momentums or spins are likely, how likely they are. Or how unlikely they are.
They’re not all physical, though. Or not purely physical. Many operators are useful because they are powerful mathematical tools. There is a variation of the Fourier series called the Fourier transform. We can interpret this as an operator. Suppose the original function started out with time or space as its independent variable. This often happens. The Fourier transform operator gives us a new function, one with frequencies as independent variable. This can make the function easier to work with. The Fourier transform is an integral operator, by the way, so don’t go thinking everything is a complicated set of derivatives.
Another integral-based operator that’s important is the Laplace transform. This is a great operator because it turns differential equations into algebraic equations. Often, into polynomials. You saw that one coming.
This is all a lot of good press for operators. Well, they’re powerful tools. They help us to see that we can manipulate functions in the ways that functions let us manipulate numbers. It should sound good to realize there is much new that you can do, and you already know most of what’s needed to do it.
Today’s A To Z term is another free choice. So I’m picking a term from the world of … mathematics. There are a lot of norms out there. Many are specialized to particular roles, such as looking at complex-valued numbers, or vectors, or matrices, or polynomials.
Still they share things in common, and that’s what this essay is for. And I’ve brushed up against the topic before.
The norm, also, has nothing particular to do with “normal”. “Normal” is an adjective which attaches to every noun in mathematics. This is security for me as while these A-To-Z sequences may run out of X and Y and W letters, I will never be short of N’s.
A “norm” is the size of whatever kind of thing you’re working with. You can see where this is something we look for. It’s easy to look at two things and wonder which is the smaller.
There are many norms, even for one set of things. Some seem compelling. For the real numbers, we usually let the absolute value do this work. By “usually” I mean “I don’t remember ever seeing a different one except from someone introducing the idea of other norms”. For a complex-valued number, it’s usually the square root of the sum of the square of the real part and the square of the imaginary coefficient. For a vector, it’s usually the square root of the vector dot-product with itself. (Dot product is this binary operation that is like multiplication, if you squint, for vectors.) Again, for these, the “usually” means “always except when someone’s trying to make a point”.
Which is why we have the convention that there is a “the norm” for a kind of operation. The norm dignified as “the” is usually the one that looks as much as possible like the way we find distances between two points on a plane. I assume this is because we bring our intuition about everyday geometry to mathematical structures. You know how it is. Given an infinity of possible choices we take the one that seems least difficult.
Every sort of thing which can have a norm, that I can think of, is a vector space. This might be my failing imagination. It may also be that it’s quite easy to have a vector space. A vector space is a collection of things with some rules. Those rules are about adding the things inside the vector space, and multiplying the things in the vector space by scalars. These rules are not difficult requirements to meet. So a lot of mathematical structures are vector spaces, and the things inside them are vectors.
A norm is a function that has these vectors as its domain, and the non-negative real numbers as its range. And there are three rules that it has to meet. So. Give me a vector ‘u’ and a vector ‘v’. I’ll also need a scalar, ‘a’. Then the function f is a norm when:
$f(u + v) \le f(u) + f(v)$. This is a famous rule, called the triangle inequality. You know how in a triangle, the sum of the lengths of any two legs is greater than the length of the third leg? That’s the rule at work here.
$f(a \cdot u) = |a| \cdot f(u)$. This doesn’t have so snappy a name. Sorry. It’s something about being homogeneous, at least.
If $f(u) = 0$ then u has to be the additive identity, the vector that works like zero does.
Norms take on many shapes. They depend on the kind of thing we measure, and what we find interesting about those things. Some are familiar. Look at a Euclidean space, with Cartesian coordinates, so that we might write something like (3, 4) to describe a point. The “the norm” for this, called the Euclidean norm or the L2 norm, is the square root of the sum of the squares of the coordinates. So, 5. But there are other norms. The L1 norm is the sum of the absolute values of all the coefficients; here, 7. The L∞ norm is the largest single absolute value of any coefficient; here, 4.
A polynomial, meanwhile? Write it out as $a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$. Take the absolute value of each of these coefficients. Then … you have choices. You could take those absolute values and add them up. That’s the L1 polynomial norm. Take those absolute values and square them, then add those squares, and take the square root of that sum. That’s the L2 norm. Take the largest absolute value of any of these coefficients. That’s the L∞ norm.
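The three norms described above are short enough to write out. This is a minimal sketch with made-up function names; it treats a point and a polynomial alike as a plain sequence of numbers, coordinates in one case and coefficients in the other.

```python
import math


def l1_norm(v):
    """Sum of the absolute values of the entries."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Square root of the sum of the squares: the Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))


def linf_norm(v):
    """Largest absolute value of any entry."""
    return max(abs(x) for x in v)


point = (3, 4)
point_norms = (l1_norm(point), l2_norm(point), linf_norm(point))  # (7, 5.0, 4)

# Coefficients of the (hypothetical) polynomial 1 - 2x + 2x^2, lowest power first.
coefficients = (1, -2, 2)
poly_norms = (l1_norm(coefficients), l2_norm(coefficients), linf_norm(coefficients))
```

For the point (3, 4) this reproduces the 7, 5, and 4 quoted for the L1, L2, and L∞ norms. The same three functions, fed the coefficient list, give 5, 3, and 2 for the example polynomial.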
These don’t look so different, even though points in space and polynomials seem to be different things. We designed the tool. We want it not to be weirder than it has to be. When we try to put a norm on a new kind of thing, we look for a norm that resembles the old kind of thing. For example, when we want to define the norm of a matrix, we’ll typically rely on a norm we’ve already found for a vector. At least to set up the matrix norm; in practice, we might do a calculation that doesn’t explicitly use a vector’s norm, but gives us the same answer.
If we have a norm for some vector space, then we have an idea of distance. We can say how far apart two vectors are. It’s the norm of the difference between the vectors. This is called defining a metric on the vector space. A metric is that sense of how far apart two things are. What keeps a norm and a metric from being the same thing is that it’s possible to come up with a metric that doesn’t match any sensible norm.
It’s always possible to use a norm to define a metric, though. Doing that promotes our normed vector space to the dignified status of a “metric space”. Many of the spaces we find interesting enough to work in are such metric spaces. It’s hard to think of doing without some idea of size.
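A sketch of both halves of that relationship. The first function builds a distance out of the Euclidean norm. The second is the discrete metric, which is my example rather than the essay's: it is a perfectly good metric, but it cannot come from any norm, because doubling a vector has to double its norm and the discrete distance refuses to budge. All names are invented for illustration.

```python
import math


def euclidean_norm(v):
    """Square root of the sum of the squares of the entries."""
    return math.sqrt(sum(x * x for x in v))


def norm_distance(u, v):
    """Metric induced by a norm: the norm of the difference of the vectors."""
    return euclidean_norm([a - b for a, b in zip(u, v)])


def discrete_distance(u, v):
    """Discrete metric: 0 for identical points, 1 for any two distinct points."""
    return 0 if tuple(u) == tuple(v) else 1


origin, p, doubled = (0, 0), (3, 4), (6, 8)

norm_scales = norm_distance(origin, doubled) == 2 * norm_distance(origin, p)
discrete_scales = discrete_distance(origin, doubled) == 2 * discrete_distance(origin, p)
```

Here `norm_scales` comes out true and `discrete_scales` false, which is the homogeneity rule separating the two.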
Today’s A To Z term was nominated again by @aajohannas. The other compelling nomination was from Vayuputrii, for the Mittag-Leffler function. I was tempted. But I realized I could not think of a clear way to describe why the function was interesting. Or even where it comes from that avoided being a heap of technical terms. There’s no avoiding technical terms in writing about mathematics, but there’s only so much I want to put in at once either. It also makes me realize I don’t understand the Mittag-Leffler function, but it is after all something I haven’t worked much with.
The Mittag-Leffler function looks like it’s one of those things named for several contributors, like Runge-Kutta Integration or Cauchy-Kovalevskaya Theorem or something. Not so here; this was one person, Gösta Mittag-Leffler. His name’s all over the theory of functions. And he was one of the people helping Sofia Kovalevskaya, whom you know from every list of pioneering women in mathematics, secure her professorship.
A martingale is how mathematicians prove you can’t get rich gambling.
Well, that exaggerates. Some people will be lucky, of course. But there’s no strategy that works. The only strategy that works is to rig the game. You can do this openly, by setting rules that give you a slight edge. You usually have to be the house to do this. Or you can do it covertly, using tricks like card-counting (in blackjack) or weighted dice or other tricks. But a fair game? Meaning one not biased towards or against any player? There’s no strategy to guarantee winning that.
We can make this more technical. Martingales arise from the world of stochastic processes. A stochastic process is an indexed set of random variables. A random variable is some variable with a value that depends on the result of some phenomenon. A tossed coin. Rolled dice. Number of people crossing a particular walkway over a day. Engine temperature. Value of a stock being traded. Whatever. We can’t forecast what the next value will be. But we know the distribution, which values are more likely and which ones are unlikely and which ones impossible.
The field grew out of studying real-world phenomena. Things we could sample and do statistics on. So it’s hard to think of an index that isn’t time, or some proxy for time like “rolls of the dice”. Stochastic processes turn up all over the place. A lot of what we want to know is impossible, or at least impractical, to exactly forecast. Think of the work needed to forecast how many people will cross this particular walk four days from now. But it’s practical to describe what are more and less likely outcomes. What the average number of walk-crossers will be. What the most likely number will be. Whether to expect tomorrow to be a busier or a slower day.
And this is what the martingale is for. Start with a sequence of your random variables. How many people have crossed that street each day since you started studying. What is the expectation value, the best guess, for the next result? Your best guess for how many will cross tomorrow? Keeping in mind your knowledge of all these past values. That’s an important piece. It’s not a martingale if the history of results isn’t a factor.
Every probability question has to deal with knowledge. Sometimes it’s easy. The probability of a coin coming up tails next toss? That’s one-half. The probability of a coin coming up tails next toss, given that it came up tails last time? That’s still one-half. The probability of a coin coming up tails next toss, given that it came up tails the last 40 tosses? That’s … starting to make you wonder if this is a fair coin. I’d bet tails, but I’d also ask to examine both sides, for a start.
So a martingale is a stochastic process where we can make forecasts about the future. Particularly, the expectation value. The expectation value is the sum of the products of every possible value and how probable they are. In a martingale, the expected value for all time to come is just the current value. So if whatever it was you’re measuring was, say, 40 this time? That’s your expectation for the whole future. Specific values might be above 40, or below 40, but on average, 40 is it.
Put it that way and you’d think, well, how often does that ever happen? Maybe some freak process will give you that, but most stuff?
Well, here’s one. The random walk. Set a value. At each step, it can increase or decrease by some fixed value. It’s as likely to increase as to decrease. This is a martingale. And it turns out a lot of stuff is random walks. Or can be processed into random walks. Even if the original walk is unbalanced — say it’s more likely to increase than decrease. Then we can do a transformation, and find a new random variable based on the original. Then that one is as likely to increase as decrease. That one is a martingale.
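The balanced random walk is small enough to check by brute force. This sketch, with hypothetical function names, averages the final position over every equally likely sequence of ups and downs, and separately works out the expected next value given a history. It assumes a fixed step size and a fifty-fifty chance each way.

```python
from fractions import Fraction
from itertools import product


def expected_position(start, step, n_steps):
    """Average the final position over all equally likely up/down paths."""
    paths = list(product((+step, -step), repeat=n_steps))
    total = sum(start + sum(path) for path in paths)
    return Fraction(total, len(paths))


def expected_next_given_history(history, step):
    """Expectation of the next value given the path so far: only the last value matters."""
    current = history[-1]
    return Fraction(1, 2) * (current + step) + Fraction(1, 2) * (current - step)


ten_steps = expected_position(start=40, step=1, n_steps=10)
next_value = expected_next_given_history([40, 41, 42, 41], step=1)
```

Starting from 40, the average over all 1,024 ten-step paths is still exactly 40. And after a history ending at 41, the expected next value is 41: the current value, whatever came before.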
It’s not just random walks. Poisson processes are things where the chance of something happening is tiny, but it has lots of chances to happen. So this measures things like how many car accidents happen on this stretch of road each week. Or where a couple plants will grow together into a forest, as opposed to lone trees. How often a store will have too many customers for the cashiers on hand. These processes by themselves aren’t often martingales. But we can use them to make a new stochastic process, and that one is a martingale.
Where this all comes to gambling is in stopping times. This is a random variable that’s based on the stochastic process you started with. Its value at each index represents the probability that the random variable in that process has reached some particular value by this index. The language evokes a gambler’s decision: when do you stop? There are two obvious stopping times for any game. One is to stop when you’ve won enough money. The other is to stop when you’ve lost your whole stake.
So there is something interesting about a martingale that has bounds. It will almost certainly hit at least one of those bounds, in a finite time. (“Almost certainly” has a technical meaning. It’s the same thing I mean when I say if you flip a fair coin infinitely many times then “almost certainly” it’ll come up tails at least once. Like, it’s not impossible that it doesn’t. It just won’t happen.) And for the gambler? The boundary of “runs out of money” is a lot closer than “makes the house run out of money”.
Oh, if you just want a little payoff, that’s fine. If you’re happy to walk away from the table with a one percent profit? You can probably do that. You’re closer to that boundary than to the runs-out-of-money one. A ten percent profit? Maybe so. Making an unlimited amount of money, like you’d want to live on your gambling winnings? No, that just doesn’t happen.
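To put rough numbers on that, here is a sketch of the classic gambler's ruin calculation. It assumes a fair game with bets of one unit at a time, which the essay does not specify, and the function names are made up. In a fair game the chance of hitting the target before going broke, starting from stake k, must be the average of the chances from k - 1 and k + 1; the code solves that recurrence exactly.

```python
from fractions import Fraction


def win_probabilities(target):
    """Chance of reaching `target` before 0 in a fair +/-1 game, for each starting stake.

    A fair game means p[k] = (p[k-1] + p[k+1]) / 2, with p[0] = 0 and p[target] = 1.
    """
    # Solve the recurrence with a trial value p[1] = 1, then rescale so p[target] = 1.
    trial = [Fraction(0), Fraction(1)]
    for _ in range(2, target + 1):
        trial.append(2 * trial[-1] - trial[-2])
    return [value / trial[target] for value in trial]


def chance_of_reaching(stake, target):
    """Probability of growing `stake` to `target` before losing everything."""
    return win_probabilities(target)[stake]


one_percent = chance_of_reaching(stake=100, target=101)     # Fraction(100, 101)
ten_percent = chance_of_reaching(stake=100, target=110)     # Fraction(10, 11)
hundredfold = chance_of_reaching(stake=100, target=10_000)  # Fraction(1, 100)
```

Under these assumptions the chance of success is just the stake divided by the target. A one percent profit comes off about 99 times in 100, a ten percent profit about 91 times in 100, and multiplying the stake a hundredfold only once in a hundred tries.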
This gets controversial when we turn from gambling to the stock market. Or a lot of financial mathematics. Look at the value of a stock over time. I write “stock” for my convenience. It can be anything with a price that’s constantly open for renegotiation. Stocks, bonds, exchange funds, used cars, fish at the market, anything. The price over time looks like it’s random, at least hour-by-hour. So how can you reliably make money if the fluctuations of the price of a stock are random?
Well, if I knew, I’d have smaller student loans outstanding. But martingales seem like they should offer some guidance. Much of modern finance builds on not dealing with a stock price varying. Instead, buy the right to buy the stock at a set price. Or buy the right to sell the stock at a set price. This lets you pay to secure a certain profit, or a worst-possible loss, in case the price reaches some level. And now you see the martingale. Is it likely that the stock will reach a certain price within this set time? How likely? This can, in principle, guide you to a fair price for this right-to-buy.
The mathematical reasoning behind that is fine, so far as I understand it. Trouble arises because pricing correctly means having a good understanding of how likely it is prices will reach different levels. Fortunately, there are few things humans are better at than estimating probabilities. Especially the probabilities of complicated situations, with abstract and remote dangers.
So martingales are an interesting corner of mathematics. They apply to purely abstract problems like random walks. Or to good mathematical physics problems like Brownian motion and the diffusion of particles. And they’re lurking behind the scenes of the finance news. Exciting stuff.
I’m hopefully going to pass the halfway point on this year’s mathematics A-To-Z. This makes it a good time to panel for topics for the next several letters in the alphabet. It’s easier for me to keep my notes straight if you post requests as comments on this thread, but I’ll try to keep up if you do comment on other threads.
As ever, I’m happy to consider most mathematical topics, including ones that I’ve written about in the past if I think I can better an old essay. If there’s several suggestions for the same letter, I’ll pick the one that I think I can do most interestingly. If several seem interesting I might try rephrasing, if the subject allows for that.
And I do thank everyone who makes a suggestion, especially if it’s one that surprises me and that makes me learn something along the way.
Here’s the essays I’ve written in past years for the letters O through T.
I couldn’t find a place to fit this in the essay proper. But it’s too good to leave out. The simplex method, discussed within, traces to George Dantzig. He’d been planning methods for the US Army Air Force during the Second World War. Dantzig is a person you have heard about, if you’ve heard any mathematical urban legends. In 1939 he was late to Jerzy Neyman’s class. He took two statistics problems on the board to be homework. He found them “harder than usual”, but solved them in a couple days and turned in the late homework hoping Neyman would be understanding. They weren’t homework. They were examples of famously unsolved problems. Within weeks Neyman had written one of the solutions up for publication. When Dantzig needed a thesis topic Neyman advised him to just put what he already had in a binder. It’s the stuff every grad student dreams of. The story mutated. It picked up some glurge to become a narrative about positive thinking. And mutated further, into the movie Good Will Hunting.
Every three days one of the comic strips I read has the elderly main character talk about how they never used algebra. This is my hyperbole. But mathematics has got the reputation for being difficult and inapplicable to everyday life. We’ll concede using arithmetic, when we get angry at the fast food cashier who hands back our two pennies before giving change for our $6.77 hummus wrap. But otherwise, who knows what an elliptic integral is, and whether it’s working properly?
Linear programming does not have this problem. In part, this is because it lacks a reputation. But those who have heard of it acknowledge it as immensely practical mathematics. It is about something a particular kind of human always finds compelling. That is how to do a thing best.
There are several kinds of “best”. There is doing a thing in as little time as possible. Or for as little effort as possible. For the greatest profit. For the highest capacity. For the best score. For the least risk. The goals have a thousand names, none of which we need to know. They all mean the same thing. They mean “the thing we wish to optimize”. To optimize has two directions, which are one. The optimum is either the maximum or the minimum. To be good at finding a maximum is to be good at finding a minimum.
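To see why the two directions are one, here is a minimal Python sketch. The helper names `argmin` and `argmax` and the sample function are made up for illustration; the only point is that a routine for minimizing also maximizes, once you flip the sign of what it measures.

```python
def argmin(f, candidates):
    """Return the candidate at which f is smallest."""
    return min(candidates, key=f)


def argmax(f, candidates):
    """Return the candidate at which f is largest, by minimizing -f."""
    return argmin(lambda x: -f(x), candidates)


def profit(x):
    # Hypothetical score to optimize; it peaks at x = 2.
    return -(x - 2) ** 2


print(argmax(profit, range(5)))
```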
It’s obvious why we call this “programming”; obviously, we leave the work of finding answers to a computer. It’s a spurious reason. The “programming” here comes from an independent sense of the word. It means more about finding a plan. Think of “programming” a night’s entertainment, so that every performer gets their turn, all scene changes have time to be done, you don’t put two comedians right after each other, and you accommodate the performer who has to leave early and the performer who’ll get in an hour late. Linear programming problems are often about finding how to do as well as possible given various priorities.
All right. At least the “linear” part is obvious. A mathematics problem is “linear” when it’s something we can reasonably expect to solve. This is not the technical meaning. Technically what it means is we’re looking at a function something like:
$$ax + by + cz$$
Here, x, y, and z are the independent variables. We don’t know their values but wish to. a, b, and c are coefficients. These values are set to some constant for the problem, but they might be something else for other problems. They’re allowed to be positive or negative or even zero. If a coefficient is zero, then the matching variable doesn’t affect matters at all. The corresponding value can be anything at all, within the constraints.
I’ve written this for three variables, as an example and because ‘x’ and ‘y’ and ‘z’ are comfortable, familiar variables. There can be fewer. There can be more. There almost always are. Two- and three-variable problems will teach you how to do this kind of problem. They’re too simple to be interesting, usually. To avoid committing to a particular number of variables we can use indices: $x_j$ for values of j from 1 up to N. Or we can bundle all these values together into a vector, and write everything as $\vec{x}$. This has a particular advantage, since then we can write the coefficients as a vector too. Then we use the notation of linear algebra, and write that we hope to maximize the value of:
$$\vec{c}^T \vec{x}$$
(The superscript T means “transpose”. As a linear algebra problem we’d usually think of writing a vector as a tall column of things. By transposing that we write a long row of things. By transposing we can use the notation of matrix multiplication.)
This is the objective function. Objective here in the sense of goal; it’s the thing we want to find the best possible value of.
We have constraints. These represent limits on the variables. The variables are always things that come in limited supply. There’s no allocating more money than the budget allows, nor putting more people on staff than work for the company. Often these constraints interact. Perhaps not only is there only so much staff, but no one person can work more than a set number of days in a row. Something like that. That’s all right. We can write all these constraints as a matrix equation. An inequality, properly. We can bundle all the constraints into a big matrix named A, and demand:
$$A\vec{x} \le \vec{b}$$
Here $\vec{b}$ is the vector of limits, one for each constraint.
Also, traditionally, we suppose that every component of $\vec{x}$ is non-negative. That is, positive, or at lowest, zero. This reflects the field’s core problems of figuring how to allocate resources. There’s no allocating less than zero of something.
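To make the notation concrete, here is a minimal Python sketch of the pieces so far: an objective that is the coefficients times the variables, a matrix A of constraints checked row by row against its limits, and the demand that no variable be negative. The numbers, and the names `c`, `A`, `b`, `objective`, and `is_feasible`, are made up for illustration and come from no particular problem.

```python
def objective(c, x):
    """Value of the linear objective: the sum of c_j * x_j."""
    return sum(c_j * x_j for c_j, x_j in zip(c, x))


def is_feasible(A, b, x, tol=1e-9):
    """True if every row of A times x is at most its limit in b, and x >= 0."""
    if any(x_j < -tol for x_j in x):
        return False
    return all(objective(row, x) <= b_i + tol for row, b_i in zip(A, b))


# Hypothetical two-variable problem, for illustration only.
c = [3, 2]              # objective coefficients
A = [[1, 1], [1, 3]]    # one row per constraint
b = [4, 6]              # the limit for each constraint

print(objective(c, [1, 1]))       # 3*1 + 2*1
print(is_feasible(A, b, [1, 1]))  # satisfies both rows
print(is_feasible(A, b, [4, 1]))  # breaks the first row: 4 + 1 > 4
```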
But we need some bounds. This is easiest to see with a two-dimensional problem. Try it yourself: draw a pair of axes on a sheet of paper. Now put in a constraint. Doesn’t matter what. The constraint’s edge is a straight line, which you can draw at any position and any angle you like. This includes horizontal and vertical. Shade in one side of the constraint. Whatever you shade in is the “feasible region”, the sets of values allowed under the constraint. Now draw in another line, another constraint. Shade in one side or the other of that. Draw in yet another line, another constraint. Shade in one side or another of that. The “feasible region” is whatever points have taken on all these shades. If you were lucky, this is a bounded region, a triangle. If you weren’t lucky, it’s not bounded. It’s maybe got some corners but goes off to the edge of the page where you stopped shading things in.
So adding that every component of $\vec{x}$ is at least as big as zero is a backstop. It means we’ll usually get a feasible region with a finite volume. What was the last project you worked on that had no upper limits for anything, just minimums you had to satisfy? Anyway if you know you need something to be allowed less than zero go ahead. We’ll work it out. The important thing is there’s finite bounds on all the variables.
I didn’t see the bounds you drew. It’s possible you have a triangle with all three shades inside. But it’s also possible you picked the other sides to shade, and you have an annulus, with no region having more than two shades in it. This can happen. It means it’s impossible to satisfy all the constraints at once. At least one of them has to give. You may be reminded of the sign taped to the wall of your mechanics’ about picking two of good-fast-cheap.
But impossibility is at least easy. What if there is a feasible region?
Well, we have reason to hope. The optimum has to be somewhere inside the region, that’s clear enough. And it even has to be on the edge of the region. If you’re not seeing why, think of a simple example, like, finding the maximum of , inside the square where x is between 0 and 2 and y is between 0 and 3. Suppose you had a putative maximum on the inside, like, where x was 1 and y was 2. What happens if you increase x a tiny bit? If you increase y by twice that? No, it’s only on the edges you can get a maximum that can’t be locally bettered. And only on the corners of the edges, at that.
(This doesn’t prove the case. But it is what the proof gets at.)
So the problem sounds simple then! We just have to try out all the vertices and pick the maximum (or minimum) from them all.
OK, and here’s where we start getting into trouble. With two variables and, like, three constraints? That’s easy enough. That’s like five points to evaluate? We can do that.
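For a problem that small, trying every corner really is a workable plan. Here is a rough Python sketch of it, under assumptions of my own: two variables, each constraint written as a made-up triple (p, q, r) meaning p·x + q·y ≤ r, and the non-negativity demands written as constraints too. It crosses every pair of boundary lines, keeps the crossings that satisfy everything, and picks the best one.

```python
from fractions import Fraction
from itertools import combinations


def vertices(constraints):
    """Corners of the region where p*x + q*y <= r for every (p, q, r)."""
    found = set()
    for (a1, a2, b), (d1, d2, e) in combinations(constraints, 2):
        det = a1 * d2 - a2 * d1
        if det == 0:
            continue  # parallel boundary lines never cross
        # Cramer's rule for the point where the two boundary lines cross.
        x = Fraction(b * d2 - a2 * e, det)
        y = Fraction(a1 * e - b * d1, det)
        if all(p * x + q * y <= r for p, q, r in constraints):
            found.add((x, y))
    return found


def best_vertex(c, constraints):
    """The corner with the largest value of c[0]*x + c[1]*y."""
    return max(vertices(constraints), key=lambda v: c[0] * v[0] + c[1] * v[1])


# Hypothetical problem: three constraints plus x >= 0 and y >= 0.
constraints = [(1, 1, 4), (1, 3, 9), (1, 0, 3), (-1, 0, 0), (0, -1, 0)]
c = (3, 2)

corners = vertices(constraints)
print(len(corners))                 # five corners to evaluate
print(best_vertex(c, constraints))  # the corner at x = 3, y = 1
```

The count of line pairs to cross grows quickly as constraints and variables pile up, which is the trouble.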
We never need to do that. If someone’s hiring you to test five combinations I admire your hustle and need you to start getting me consulting work. A real problem will have many variables and many constraints. The feasible region will most often look like a multifaceted gemstone. It’ll extend into more than three dimensions, usually. It’s all right if you just imagine the three, as long as the gemstone is complicated enough.
Because now we’ve got lots of vertices. Maybe more than we really want to deal with. So what’s there to do?
The basic approach, the one that’s foundational to the field, is the simplex method. A “simplex” is a triangle. In two dimensions, anyway. In three dimensions it’s a tetrahedron. In one dimension it’s a line segment. Generally, however many dimensions of space you have? The simplex is the simplest thing that fills up volume in your space.
You know how you can turn any polygon into a bunch of triangles? Just by connecting enough vertices together? You can turn a polyhedron into a bunch of tetrahedrons, by adding faces that connect trios of vertices. And for polyhedron-like shapes in more dimensions? We call those polytopes. Polytopes we can turn into a bunch of simplexes. So this is why it’s the “simplex method”. Any one simplex it’s easy to check the vertices on. And we can turn the polytope into a bunch of simplexes. And we can ignore all the interior vertices of the simplexes.
So here’s the simplex method. First, break your polytope up into simplexes. Next, pick any simplex; doesn’t matter which. Pick any outside vertex of that simplex. This is the first viable possible solution. It’s most likely wrong. That’s okay. We’ll make it better.
Because there are other vertices on this simplex. And there are other simplexes, adjacent to that first, which share this vertex. Test the vertices that share an edge with this one. Is there one that improves the objective function? Probably. Is there a best one of those in this simplex? Sure. So now that’s our second viable possible solution. If we had to give an answer right now, that would be our best guess.
But this new vertex, this new tentative solution? It shares edges with other vertices, across several simplexes. So look at these new neighbors. Are any of them an improvement? Which one of them is the best improvement? Move over there. That’s our next tentative solution.
You see where this is going. Keep at this. Eventually it’ll wind to a conclusion. Usually this works great. If you have, like, 8 constraints, you can usually expect to get your answer in from 16 to 24 iterations. If you have 20 constraints, expect an answer in from 40 to 60 iterations. This is doing pretty well.
But it might take a while. It’s possible for the method to “stall” a while, often because one or more of the variables is at its constraint boundary. Or the division of polytope into simplexes got unlucky, and it’s hard to get to better solutions. Or there might be a string of vertices that are all at, or near, the same value, so the simplex method can’t resolve where to “go” next. In the worst possible case, the simplex method takes a number of iterations that grows exponentially with the number of constraints. This, yes, is very bad. It doesn’t typically happen. It’s a numerical algorithm. There’s some problem to spoil any numerical algorithm.
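The vertex-to-vertex walk described above can be sketched in two dimensions, where the feasible region is a convex polygon and each corner has exactly two neighbors. This is a toy of my own to show the idea of moving to a better neighbor until none is better. It is not the tableau bookkeeping that real solvers use, and the polygon and the name `walk_to_optimum` are made up for illustration.

```python
def walk_to_optimum(c, polygon, start=0):
    """Walk the corners of a convex polygon, always moving to the better neighbor.

    polygon lists the corners in order around the boundary, so the
    neighbors of corner i are corners i - 1 and i + 1, wrapping around.
    Returns the final corner and the list of corners visited.
    """
    def value(i):
        x, y = polygon[i]
        return c[0] * x + c[1] * y

    n = len(polygon)
    current = start
    path = [polygon[current]]
    while True:
        neighbors = [(current - 1) % n, (current + 1) % n]
        best = max(neighbors, key=value)
        if value(best) <= value(current):
            # No neighbor improves the objective, so stop here.
            return polygon[current], path
        current = best
        path.append(polygon[current])


# Hypothetical feasible region, corners listed counterclockwise.
polygon = [(0, 0), (3, 0), (3, 1), (1.5, 2.5), (0, 3)]
c = (3, 2)

corner, path = walk_to_optimum(c, polygon)
print(corner)  # where the walk stops
print(path)    # the tentative solutions along the way
```

Each step strictly improves the objective, so the walk can’t circle back on itself and has to end. Ties between corners, the “stall” case, are where a real implementation needs more care than this.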
You may have complaints. Like, the world is complicated. Why are we only looking at linear objective functions? Or, why only look at linear constraints? Well, if you really need to do that? Go ahead, but that’s not linear programming anymore. Think hard about whether you really need that, though. Linear anything is usually simpler than nonlinear anything. I mean, if your optimization function absolutely has to have $y^2$ in it? Could we just say you have a new variable that just happens to be equal to the square of y? Will that work? If you have to have the sine of z? Are you sure that z isn’t going to get outside the region where the sine of z is pretty close to just being z? Can you check?
Maybe you have, and there’s just nothing for it. That’s all right. This is why optimization is a living field of study. It demands judgement and thought and all that hard work.
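As for checking whether the sine of z stays close to z, that much is easy to test. Here is a small Python sketch, with a made-up helper name and sample ranges of my own choosing, that reports the worst gap between sin(z) and z over a range of z.

```python
import math


def max_linearization_error(z_max, samples=1000):
    """Largest |sin(z) - z| over evenly spaced z from -z_max to z_max."""
    zs = (-z_max + 2 * z_max * k / samples for k in range(samples + 1))
    return max(abs(math.sin(z) - z) for z in zs)


# Hypothetical ranges for z, in radians.
print(max_linearization_error(0.1))  # tiny: treating sin(z) as z is safe here
print(max_linearization_error(1.0))  # much larger: the linear stand-in is strained
```

Whether the gap is small enough is still the judgement call. It depends on how sensitive the rest of the problem is.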