Goldenoj suggested my topic for today’s essay. It delighted me because I had no idea what it was. It wasn’t even listed on Mathworld, where I start all my research for these essays. It turned out to be something that I use all the time, but that I learned so long ago that it’s faded to invisibility. I didn’t even know that the concept had a name. So that makes it a great topic for an essay like this. I hope.
I once interviewed for a job I didn’t expect to get (or take). I would have taught for a university that provided courses for United States armed forces dependents. One bit of small talk that I thought went well had my potential department head mention a weird little quirk. United States-raised children were unusually good in multiplying stuff by 25. I had a ready hypothesis: the United States (and Canada) have a quarter-dollar coin. Many other countries just don’t, making do with 20-cent and 50-cent pieces instead. The potential department head said that was a good observation. United States-raised kids got practice turning four 25’s into a block of 100.
And this is the thing labelled as unitizing. A unit is, in this context, the thing we think of as “one thing”. This can be dollars, or feet of distance, or loaves of bread, or weeks of paid vacation. Whatever we need to measure. A unit often is made up of tinier pieces, cents or inches or slices or days. It can often be bundled up into bigger ones. Unitizing is about finding the bundle of things that makes the work one wants to do easy to understand.
This is a difficult topic for me to write about. I find it hard to notice myself doing it. But, for example, consider counting. Most people have a fair time counting up to five or six things at a glance. Eighteen things? There’s no telling that at a glance. What you can do, though, is notice that they group together, a block of six things here, another six here, another six there. Then the mass of things has turned into a manageable several collections of manageable counts of things. And, if we need to reverse the process, we can do that. Recognize that the 36 little triangular-wedge game tokens can be given out nine each to the four players. They can in turn arrange six of the tokens into an attractive complete wheel, and make do with the three remainder.
Slices of things turn up a good bit in thought about unitizing. One of particular delight that I found is this paper, by Susan J Lamon. It’s The Development of Unitizing: Its Role in Children’s Partitioning Strategies. Lamon investigated how children understand quantity, and the paper describes several experiments. A typical example is asking children how to evenly divide four pizzas among six people. And how their strategies change if all the pizzas are cut beforehand, versus whether they have to make the cuts themselves. Or how the question changes if things that are not pizza are considered. One child had different cutting strategies for four pizzas versus four cookies. The good reason: cookies are harder to slice than pizzas. You need to be more economical with your cuts so you don’t ruin the food.
And what kids found to be units depended on what was being divided. Four pizzas with different toppings would be divided differently from four identical pizzas. Four Chinese dinners were split by different strategies too. One child explained it just didn’t seem right to call what each person got four-sixths of each dinners. Lamon speculates this reflects cultural conventions about meals that are often eaten in common, and that feels right to me.
There’s obvious uses to this unitizing, in figuring how to divide pizzas and cases of 24 pop cans. There are subtler uses. Positional notation depends on unitizing. We group ten individual things into a new block, and denote it as something in a tens column. Or ten individual blocks-of-ten, which we denote as something in a hundreds column. And we go the other way as we need, when subtracting or dividing.
When I was learning base-ten (and other) arithmetic, they taught me to think of exchanging ten pennies for a dime, or ten dimes for a dollar, or back the other way. To someone hoarding pennies so as to afford things from the bookmobile the practice working out units worked well.
With that context you see why it’s hard to point out what’s happening. You aren’t reading a pop mathematics blog unless you’re quite at ease with calculation. That there is a particular skill done becomes invisible due to its ubiquity. It takes special circumstances to see it again.
Today’s A To Z term was nominated by APMA, author of the Everybody Makes DATA blog. It was a topic that delighted me to realize I could explain. Then it started to torment me as I realized there is a lot to explain here, and I had to pick something. So here’s where things ended up.
In the mid-2000s I was teaching at a department being closed down. In its last semester I had to teach Computational Quantum Mechanics. The person who’d normally taught it had transferred to another department. But a few last majors wanted the old department’s version of the course, and this pressed me into the role. Teaching a course you don’t really know is a rush. It’s a semester of learning, and trying to think deeply enough that you can convey something to students. This while all the regular demands of the semester eat your time and working energy. And this in the leap of faith that the syllabus you made up, before you truly knew the subject, will be nearly enough right. And that you have not committed to teaching something you do not understand.
So around mid-course I realized I needed to explain finding the wave function for a hydrogen atom with two electrons. The wave function is this probability distribution. You use it to find things like the probability a particle is in a certain area, or has a certain momentum. Things like that. A proton with one electron is as much as I’d ever done, as a physics major. We treat the proton as the center of the universe, immobile, and the electron hovers around that somewhere. Two electrons, though? A thing repelling your electron, and repelled by your electron, and neither of those having fixed positions? What the mathematics of that must look like terrified me. When I couldn’t procrastinate it farther I accepted my doom and read exactly what it was I should do.
It turned out I had known what I needed for nearly twenty years already. Got it in high school.
Of course I’m discussing Taylor Series. The equations were loaded down with symbols, yes. But at its core, the important stuff, was this old and trusted friend.
The premise behind a Taylor Series is even older than that. It’s universal. If you want to do something complicated, try doing the simplest thing that looks at all like it. And then make that a little bit more like you want. And then a bit more. Keep making these little improvements until you’ve got it as right as you truly need. Put that vaguely, the idea describes Taylor series just as well as it describes making a video game or painting a state portrait. We can make it more specific, though.
A series, in this context, means the sum of a sequence of things. This can be finitely many things. It can be infinitely many things. If the sum makes sense, we say the series converges. If the sum doesn’t, we say the series diverges. When we first learn about series, the sequences are all numbers. , for example, which diverges. (It adds to a number bigger than any finite number.) Or , which converges. (It adds to .)
In a Taylor Series, the terms are all polynomials. They’re simple polynomials. Let me call the independent variable ‘x’. Sometimes it’s ‘z’, for the reasons you would expect. (‘x’ usually implies we’re looking at real-valued functions. ‘z’ usually implies we’re looking at complex-valued functions. ‘t’ implies it’s a real-valued function with an independent variable that represents time.) Each of these terms is simple. Each term is the distance between x and a reference point, raised to a whole power, and multiplied by some coefficient. The reference point is the same for every term. What makes this potent is that we use, potentially, many terms. Infinitely many terms, if need be.
Call the reference point ‘a’. Or if you prefer, x0. z0 if you want to work with z’s. You see the pattern. This ‘a’ is the “point of expansion”. The coefficients of each term depend on the original function at the point of expansion. The coefficient for the term that has is the first derivative of f, evaluated at a. The coefficient for the term that has is the second derivative of f, evaluated at a. The coefficient for the term that has is the third derivative of f, evaluated at a.
You’ll never guess what the coefficient for the term with is. Nor will you ever care. The only reason you would wish to is to answer an exam question. The instructor will, in that case, have a function that’s either the sine or the cosine of x. The point of expansion will be 0, , , or .
Otherwise you will trust that this is one of the terms of , ‘n’ representing some counting number too great to be interesting. All the interesting work will be done with the Taylor series either truncated to a couple terms, or continued on to infinitely many.
What a Taylor series offers is the chance to approximate a function we’re genuinely interested in with a polynomial. This is worth doing, usually, because polynomials are easier to work with. They have nice analytic properties. We can automate taking their derivatives and integrals. We can set a computer to calculate their value at some point, if we need that. We might have no idea how to start calculating the logarithm of 1.3. We certainly have an idea how to start calculating . (Yes, it’s 0.3. I’m using a Taylor series with a = 1 as the point of expansion.)
The first couple terms tell us interesting things. Especially if we’re looking at a function that represents something physical. The first two terms tell us where an equilibrium might be. The next term tells us whether an equilibrium is stable or not. If it is stable, it tells us how perturbations, points near the equilibrium, behave.
The first couple terms will describe a line, or a quadratic, or a cubic, some simple function like that. Usually adding more terms will make this Taylor series approximation a better fit to the original. There might be a larger region where the polynomial and the original function are close enough. Or the difference between the polynomial and the original function will be closer together on the same old region.
We would really like that region to eventually grow to the whole domain of the original function. We can’t count on that, though. Roughly, the interval of convergence will stretch from ‘a’ to wherever the first weird thing happens. Weird things are, like, discontinuities. Vertical asymptotes. Anything you don’t like dealing with in the original function, the Taylor series will refuse to deal with. Outside that interval, the Taylor series diverges and we just can’t use it for anything meaningful. Which is almost supernaturally weird of them. The Taylor series uses information about the original function, but it’s all derivatives at a single point. Somehow the derivatives of, say, the logarithm of x around x = 1 give a hint that the logarithm of 0 is undefinable. And so they won’t help us calculate the logarithm of 3.
Things can be weirder. There are functions that just break Taylor series altogether. Some are obvious. A function needs lots of derivatives at a point to have a good Taylor series approximation. So, many fractal curves won’t have a Taylor series approximation. These curves are all corners, points where they aren’t continuous or where derivatives don’t exist. Some are obviously designed to break Taylor series approximations. We can make a function that follows different rules if x is rational than if x is irrational. There’s no approximating that, and you’d blame the person who made such a function, not the Taylor series. It can be subtle. The function defined by the rule , with the note that if x is zero then f(x) is 0, seems to satisfy everything we’d look for. It’s a function that’s mostly near 1, that drops down to being near zero around where x = 0. But its Taylor series expansion around a = 0 is a horizontal line always at 0. The interval of convergence can be a single point, challenging our idea of what an interval is.
That’s all right. If we can trust that we’re avoiding weird parts, Taylor series give us an outstanding new tool. Grant that the Taylor series describes a function with the same rule as our original function. The Taylor series is often easier to work with, especially if we’re working on differential equations. We can automate, or at least find formulas for, taking the derivative of a polynomial. Or adding together derivatives of polynomials. Often we can attack a differential equation too hard to solve otherwise by supposing the answer is a polynomial. This is essentially what that quantum mechanics problem used, and why the tool was so familiar when I was in a strange land.
Roughly. What I was actually doing was treating the function I wanted as a power series. This is, like the Taylor series, the sum of a sequence of terms, all of which are times some coefficient. What makes it not a Taylor series is that the coefficients weren’t the derivatives of any function I knew to start. But the experience of Taylor series trained me to look at functions as things which could be approximated by polynomials.
This gives us the hint to look at other series that approximate interesting functions. We get a host of these, with names like Laurent series and Fourier series and Chebyshev series and such. Laurent series look like Taylor series but we allow powers to be negative integers as well as positive ones. Fourier series do away with polynomials. They instead use trigonometric functions, sines and cosines. Chebyshev series build on polynomials, but not on pure powers. They’ll use orthogonal polynomials. These behave like perpendicular directions do. That orthogonality makes many numerical techniques behave better.
The Taylor series is a great introduction to these tools. Its first several terms have good physical interpretations. Its calculation requires tools we learn early on in calculus. The habits of thought it teaches guides us even in unfamiliar territory.
And I feel very relieved to be done with this. I often have a few false starts to an essay, but those are mostly before I commit words to text editor. This one had about four branches that now sit in my scrap file. I’m glad to have a deadline forcing me to just publish already.
Today’s A To Z term is another from goldenoj. It’s one important to probability, and it’s one at the center of the field.
The sample space is a tool for probability questions. We need them. Humans are bad at probability questions. Thinking of sample spaces helps us. It’s a way to recast probability questions so that our intuitions about space — which are pretty good — will guide us to probabilities.
A sample space collects the possible results of some experiment. “Experiment” means what way mathematicians intend, so, not something with test tubes and colorful liquids that might blow up. Instead it’s things like tossing coins and dice and pulling cards out of reduced decks. At least while we’re learning. In real mathematical work this turns into more varied stuff. Fluid flows or magnetic field strengths or economic forecasts. The experiment is the doing of something which gives us information. This information is the result of flipping this coin or drawing this card or measuring this wind speed. Once we know the information, that’s the outcome.
So each possible outcome we represent as a point in the sample space. Describing it as a “space” might cause trouble. “Space” carries connotations of something three-dimensional and continuous and contiguous. This isn’t necessarily so. We can be interested in discrete outcomes. A coin’s toss has two possible outcomes. Three, if we count losing the coin. The day of the week on which someone’s birthday falls has seven possible outcomes. We can also be interested in continuous outcomes. The amount of rain over the day is some nonnegative real number. The amount of time spent waiting at this traffic light is some nonnegative real number. We’re often interested in discrete representations of something continuous. We did not have inches of rain overnight, even if we did. We recorded 0.71 inches after the storm.
We don’t demand every point in the sample space to be equally probable. There seems to be a circularity to requiring that. What we do demand is that the sample space be a “sigma algebra”, or σ-algebra to write it briefly. I don’t know how σ came to be the shorthand for this kind of algebra. Here “algebra” means a thing with a bunch of rules. These rules are about what you’d guess if you read pop mathematics blogs and had to bluff your way through a conversation of rules about sets. The algebra’s this collection of sets made up of the elements of X. Subsets of this algebra have to be contained in this collection. Their complements are also sets in the collection. The unions of sets have to be in the collection.
So the sample space is a set. All the possible outcomes of the experiment we’re thinking about are its elements. Every experiment must have some outcome that’s inside the sample space. And any two different outcomes have to be mutually exclusive. That is, if outcome A has happened, then outcome B has not happened. And vice-versa; I’m not so fond of A that I would refuse B.
I see your protest. You’ve worked through probability homework problems where you’re asked the chance a card drawn from this deck is either a face card or a diamond. The jack of diamonds is both. This is true; but it’s not what we’re looking at. The outcome of this experiment is the card that’s drawn, which might be any of 52 options.
If you like treating it that way. You might build the sample space differently, like saying that it’s an ordered pair. One part of the pair is the suit of the card. The other part is the value. This might be better for the problem you’re doing. This is part of why the probability department commands such high wages. There are many sample spaces that can describe the problem you’re interested in. This does include one where one event is “draw a card that’s a face card or diamond” and the other is “draw one that isn’t”. (These events don’t have an equal probability.) The work is finding a sample space that clarifies your problem.
Working out the sample space that clarifies the problem is the hard part, usually. Not being rigorous about the space gives us many probability paradoxes. You know, like the puzzle where you’re told someone’s two children are either boys or girls. One walks in and it’s a girl. You’re told the probability the other is a boy is two-thirds. And you get mad. Or the Monty Hall Paradox, where you’re asked to pick which of three doors has the grand prize behind it. You’re shown one that you didn’t pick which hasn’t. You’re given the chance to switch to the remaining door. You’re told the probability that the grand prize is behind that other door is two-thirds, and you get mad. There are probability paradoxes that don’t involve a chance of two-thirds. Having a clear idea of the sample space avoids getting the answers wrong, at least. There’s not much to do about not getting mad.
Like I said, we don’t insist that every point in the sample space have an equal probability of being the outcome. Or, if it’s a continuous space, that every region of the same area has the same probability. It is certainly easier if it does. Then finding the probability of some result becomes easy. You count the number of outcomes that satisfy that result, and divide by the total number of outcomes. You see this in problems about throwing two dice and asking the chance the total is seven, or five, or twelve.
For a continuous sample space, you’d find the area of all the results that satisfy the result. Divide that by the area of the sample space and there’s the probability of that result. (It’s possible for a result to have an area of zero, which implies that the thing cannot happen. This presents a paradox. A thing is in the sample space because it is a possible outcome. What these measure-zero results are, typically, is something like every one of infinitely many tossed coins coming up tails. That can’t happen, but it’s not like there’s any reason it can’t.)
If every outcome isn’t equally likely, though? Sometimes we can redesign the sample space to something that is. The result of rolling two dice is a familiar example. The chance of the dice totalling 2 is different from the chance of them totalling 4. So a sample space that’s just the sums, the numbers 2 through 12, is annoying to deal with. But rewrite the space as the ordered pairs, the result of die one and die two? Then we have something nice. The chance of die one being 1 and die two being 1 is the same as the chance of die one being 2 and die two being 2. There happen to be other die combinations that add up to 4 is all.
Sometimes there’s no finding a sample space which describes what you’re interested in and that makes every point equally probable. Or nearly enough. The world is vast and complicated. That’s all right. We can have a function that describes, for each point in the sample space, the probability of its turning up. Really we had that already, for equally-probable outcomes. It’s just that was all the same number. But this function is called the probability measure. If we combine together a sample space, and a collection of all the events we’re interested in, and a probability measure for all these events, then this triad is a probability space.
And probability spaces give us all sorts of great possibilities. Dearest to my own work is Monte Carlo methods, in which we look for particular points inside the sample space. We do this by starting out anywhere, picking a point at random. And then try moving to a different point, picking the “direction” of the change at random. We decide whether that move succeeds by a rule that depends in part on the probability measure, and in part on how well whatever we’re looking for holds true. This is a scheme that demands a lot of calculation. You won’t be surprised that it only became a serious tool once computing power was abundant.
So for many problems there is no actually listing all the sample space. A real problem might include, say, the up-or-down orientation of millions of magnets. This is a sample space of unspeakable vastness. But thinking out this space, and what it must look like, helps these probability questions become ones that our intuitions help us with instead. If you do not know what to do with a probability question, think to the sample spaces.
And now the most challenging part of doing an A to Z series: the time after the end of Daylight Saving, when I absolutely positively have to have my final copy ready to go at 1 pm, rather than 2 pm. I’m looking for nominations for what to write about for the last half-dozen letters of the alphabet.
These letters do include X. There’s no getting around that. After about two iterations of this the choices for ‘X’ I was running out of candidates on Mathworld’s dictionary of topics. Last year I opened up ‘X’ as a wild card topic, taking subjects from other letters. It’s just coincidence that we then went with ‘extreme’, like it was the 90s or something.
And I do thank everyone who makes a suggestion. As much as I sometimes feel crushed by the attempt to write two 800-word essays that both blow up to 1900 words each week, they get me to learn things, and to practice thinking about things, and that’s such fantastic fun.
Please nominate topics in comments here. I have a better chance of keeping nominations organized if they’re all together. Also please, if you do suggest something, let me know if you have a blog or YouTube channel or Twitter or Mathstodon account, or even a good old-fashioned web site, that you’d like to show off. I do try to credit ideas and let folks know what the people who give me ideas are doing that’s worth showing off, too.
Here’s the essays I’ve written in past years for the letters U through Z.
I have another subject nominated by goldenoj today. And it even lets me get into number theory, the field of mathematics questions that everybody understands and nobody can prove.
I was once a young grad student working as a teaching assistant and unaware of the principles of student privacy. Near the end of semesters I would e-mail students their grades. This so they could correct any mistakes and know what they’d have to get on the finals. I was learning Perl, which was an acceptable pastime in the 1990s. So I wrote scripts that would take my spreadsheet of grades and turn it into e-mails that were automatically sent. And then I got all fancy.
It seemed boring to send out completely identical form letters, even if any individual would see it once. Maybe twice if they got me for another class. So I started writing variants of the boilerplate sentences. My goal was that every student would get a mass-produced yet unique e-mail. To best the chances of this I had to make sure of something about all these variant sentences and paragraphs.
So you see the trick. I needed a set of relatively prime numbers. That way, it would be the greatest possible number of students before I had a completely repeated text. We know what prime numbers are. They’re the numbers that, in your field, have exactly two factors. In the counting numbers the primes are numbers like 2, 3, 5, 7 and so on. In the Gaussian integers, these are numbers like 3 and 7 and . But not 2 or 5. We can look to primes among the polynomials. Among polynomials with rational coefficients, is prime. So is . is not.
The idea of relative primes appears wherever primes appears. We can say without contradiction that 4 and 9 are relative primes, among the whole numbers. Though neither’s prime, in the whole numbers, neither has a prime factor in common. This is an obvious way to look at it. We can use that definition for any field that has a concept of primes. There are others, though. We can say two things are relatively prime if there’s a linear combination of them that adds to the identity element. You get a linear combination by multiplying each of the things by a scalar and adding these together. Multiply 4 by -2 and 9 by 1 and add them and look what you get. Or, if the least common multiple of a set of elements is equal to their product, then the elements are relatively prime. Some make sense only for the whole numbers. Imagine the first quadrant of a plane, marked in Cartesian coordinates. Draw the line segment connecting the point at (0, 0) and the point with coordinates (m, n). If that line segment touches no dots between (0, 0) and (m, n), then the whole numbers m and n are relatively prime.
We start looking at relative primes as pairs of things. We can be interested in larger sets of relative primes, though. My little e-mail generator, for example, wouldn’t work so well if any pair of sentence replacements were not relatively prime. So, like, the set of numbers 2, 6, 9 is relatively prime; all three numbers share no prime factors. But neither the pair 2, 6 and the pair 6, 9 are not relatively prime. 2, 9 is, at least there’s that. I forget how many replaceable sentences were in my form e-mails. I’m sure I did the cowardly thing, coming up with a prime number of alternate ways to phrase as many sentences as possible. As an undergraduate I covered the student government for four years’ worth of meetings. I learned a lot of ways to say the same thing.
Which is all right, but are relative primes important? Relative primes turn up all over the place in number theory, and in corners of group theory. There are some thing that are easier to calculate in modulo arithmetic if we have relatively prime numbers to work with. I know when I see modulo arithmetic I expect encryption schemes to follow close behind. Here I admit I’m ignorant whether these imply things which make encryption schemes easier or harder.
Some of the results are neat, certainly. Suppose that the function f is a polynomial. Then, if its first derivative f’ is relatively prime to f, it turns out f has no repeated roots. And vice-versa: if f has no repeated roots, then it and its first derivative are relatively prime. You remember repeated roots. They’re factors like , that foiled your attempt to test a couple points and figure roughly where a polynomial crossed the x-axis.
I mentioned that primeness depends on the field. This is true of relative primeness. Polynomials really show this off. (Here I’m using an example explained in a 2007 Ask Dr Math essay.) Is the polynomial relatively prime to ?
It is, if we are interested in polynomials with integer coefficients. There’s no linear combination of and which gets us to 1. Go ahead and try.
It is not, if we are interested in polynomials with rational coefficients. Multiply by and multiply by . Then add those up.
Tell me what polynomials you want to deal with today and I will tell you which answer is right.
This may all seem cute if, perhaps, petty. A bunch of anonymous theorems dotting the center third of an abstract algebra text will inspire that. The most important relative-primes thing I know of is the abc conjecture, posed in the mid-80s by Joseph Oesterlé and David Masser. Start with three counting numbers, a, b, and c. Require that a + b = c.
There is a product of the unique prime factors of a, b, and c. That is, let’s say a is 36. This is 2 times 2 times 3 times 3. Let’s say b is 5. This is prime. c is 41; it’s prime. Their unique prime factors are 2, 3, 5, and 41; the product of all these is 1,230.
The conjecture deals with this product of unique prime factors for this relatively prime triplet. Almost always, c is going to be smaller than this unique prime factors product. The conjecture says that there will be, for every positive real number , at most finitely many cases where c is larger than this product raised to the power . I do not know why raising this product to this power is so important. I assume it rules out some case where this product raised to the first power would be too easy a condition.
Apart from that bit, though, this is a classic sort of number theory conjecture. Like, it involves some technical terms, but nothing too involved. You could almost explain it at a party and expect to be understood, and to get some people writing down numbers, testing out specific cases. Nobody will go away solving the problem, but they’ll have some good exercise and that’s worthwhile.
And it has consequences. We do not know whether the abc conjecture is true. We do know that if it is true, then a bunch of other things follow. The one that a non-mathematician would appreciate is that Fermat’s Last Theorem would be provable by an alterante route. The abc conjecture would only prove the cases for Fermat’s Last Theorem for powers greater than 5. But that’s all right. We can separately work out the cases for the third, fourth, and fifth powers, and then cover everything else at once. (That we know Fermat’s Last Theorem is true doesn’t let us conclude the abc conjecture is true, unfortunately.)
There are other implications. Some are about problems that seem like fun to play with. If the abc conjecture is true, then for every integer A, there are finitely many values of n for which is a perfect square. Some are of specialist interest: Lang’s conjecture, about elliptic curves, would be true. This is a lower bound for the height of non-torsion rational points. I’d stick to the stuff at a party. A host of conjectures about Diophantine equations — (high school) algebra problems where only integers may be solutions — become theorems. Also coming true: the Fermat-Catalan conjecture. This is a neat problem; it claims that the equation
where a, b, and c are relatively prime, and m, n, and k are positive integers satisfying the constraint
has only finitely many solutions with distinct triplets . The inequality about reciprocals of m, n, and k is needed so we don’t have boring solutions like clogging us up. The bit about distinct triplets is so we don’t clog things up with a or b being 1 and then technically every possible m or n giving us a “different” set. To date we know something like ten solutions, one of them having a equal to 1.
Another implication is Pillai’s Conjecture. This one asks whether every positive integer occurs only finitely many times as the difference between perfect powers. Perfect powers are, like 32 (two to the fifth power) or 81 (three to the fourth power) or such.
So as often happens when we stumble into a number theory thing, the idea of relative primes is easy. And there are deep implications to them. But those in turn give us things that seem like fun arithmetic puzzles.
I got a good nomination for a Q topic, thanks again to goldenoj. It was for Qualitative/Quantitative. Either would be a good topic, but they make a natural pairing. They describe the things mathematicians look for when modeling things. But ultimately I couldn’t find an angle that I liked. So rather than carry on with an essay that wasn’t working I went for a topic of my own. Might come back around to it, though, especially if nothing good presents itself for the letter X, which will probably need to be a wild card topic anyway.
We like comparing sizes. I talked about that some with norms. We do the same with shapes, though. We’d like to know which one is bigger than another, and by how much. We rely on squares to do this for us. It could be any shape, but we in the western tradition chose squares. I don’t know why.
My guess, unburdened by knowledge, is the ancient Greek tradition of looking at the shapes one can make with straightedge and compass. The easiest shape these tools make is, of course, circles. But it’s hard to find a circle with the same area as, say, any old triangle. Squares are probably a next-best thing. I don’t know why not equilateral triangles or hexagons. Again I would guess that the ancient Greeks had more rectangular or square rooms than the did triangles or hexagons, and went with what they knew.
So that’s what lurks behind that word “quadrature”. It may be hard for us to judge whether this pentagon is bigger than that octagon. But if we find squares that are the same size as the pentagon and the octagon, great. We can spot which of the squares is bigger, and by how much.
Straightedge-and-compass lets you find the quadrature for many shapes. Like, take a rectangle. Let me call that ABCD. Let’s say that AB is one of the long sides and BC one of the short sides. OK. Extend AB, outwards, to another point that I’ll call E. Pick E so that the length of BE is the same as the length of BC.
Next, bisect the line segment AE. Call that point F. F is going to be the center of a new semicircle, one with radius FE. Draw that in, on the side of AE that’s opposite the point C. Because we are almost there.
Extend the line segment CB upwards, until it touches this semicircle. Call the point where it touches G. The line segment BG is the side of a square with the same area as the original rectangle ABCD. If you know enough straightedge-and-compass geometry to do that bisection, you know enough to turn BG into a square. If you’re not sure why that’s the correct length, you can get there quickly. Use a little algebra and the Pythagorean theorem.
Neat, yeah, I agree. Also neat is that you can use the same trick to find the area of a parallelogram. A parallelogram has the same area as a square with the same bases and height between them, you remember. So take your parallelogram, draw in some perpendiculars to share that off into a rectangle, and find the quadrature of that rectangle. you’ve got the quadrature of your parallelogram.
Having the quadrature of a parallelogram lets you find the quadrature of any triangle. Pick one of the sides of the triangle as the base. You have a third point not on that base. Draw in the parallel to that base that goes through that third point. Then choose one of the other two sides. Draw the parallel to that side which goes through the other point. Look at that: you’ve got a parallelogram with twice the area of your original triangle. Bisect either the base or the height of this parallelogram, as you like. Then follow the rules for the quadrature of a parallelogram, and you have the quadrature of your triangle. Yes, you’re doing a lot of steps in-between the triangle you started with and the square you ended with. Those steps don’t count, not by this measure. Getting the results right matters.
And here’s some more beauty. You can find the quadrature for any polygon. Remember how you can divide any polygon into triangles? Go ahead and do that. Find the quadrature for every one of those triangles then. And you can create a square that has an area as large as all those squares put together. I’ll refrain from saying quite how, because realizing how is such a delight, one of those moments that at least made me laugh at how of course that’s how. It’s through one of those things that even people who don’t know mathematics know about.
With that background you understand why people thought the quadrature of the circle ought to be possible. Moreso when you know that the lune, a particular crescent-moon-like shape, can be squared. It looks so close to a half-circle that it’s obvious the rest should be possible. It’s not, and it took two thousand years and a completely different idea of geometry to prove it. But it sure looks like it should be possible.
Along the way to modernity quadrature picked up a new role. This is as part of calculus. One of the legs of calculus is integration. There is an interpretation of what the (definite) integral of a function means so common that we sometimes forget it doesn’t have to be that. This is to say that the integral of a function is the area “underneath” the curve. That is, it’s the area bounded by the limits of integration, by the horizontal axis, and by the curve represented by the function. If the function is sometimes less than zero, within the limits of integration, we’ll say that the integral represents the “net area”. Then we allow that the net area might be less than zero. Then we ignore the scolding looks of the ancient Greek mathematicians.
No matter. We love being able to find “the” integral of a function. This is a new function, and evaluating it tells us what this net area bounded by the limits of integration is. Finding this is “integration by quadrature”. At least in books published back when they wrote words like “to-day” or “coördinate”. My experience is that the term’s passed out of the vernacular, at least in North American Mathematician’s English.
Anyway the real flaw is that there are, like, six functions we can find the integral for. For the rest, we have to make do with approximations. This gives us “numerical quadrature”, a phrase which still has some currency.
And with my prologue about compass-and-straightedge quadrature you can see why it’s called that. Numerical integration schemes often rely on finding a polynomial with a part that looks like a graph of the function you’re interested in. The other edges look like the limits of the integration. Then the area of that polygon should be close to the area “underneath” this function. So it should be close to the integral of the function you want. And we’re old hands at how the quadrature of polygons, since we talked that out like five hundred words ago.
Now, no person ever has or ever will do numerical quadrature by compass-and-straightedge on some function. So why call it “numerical quadrature” instead of just “numerical integration”? Style, for one. “Quadrature” as a word has a nice tone, clearly jargon but not threateningly alien. Also “numerical integration” often connotes the solving differential equations numerically. So it can clarify whether you’re evaluating integrals or solving differential equations. If you think that’s a distinction worth making. Evaluating integrals and solving differential equations are similar together anyway.
And there is another adjective that often attaches to quadrature. This is Gaussian Quadrature. Gaussian Quadrature is, in principle, a fantastic way to do numerical integration perfectly. For some problems. For some cases. The insight which justifies it to me is one of those boring little theorems you run across in the chapter introducing How To Integrate. It runs something like this. Suppose ‘f’ is a continuous function, with domain the real numbers and range the real numbers. Suppose a and b are the limits of integration. Then there’s at least one point c, between a and b, for which:
So if you could pick the right c, any integration would be so easy. Evaluate the function for one point and multiply it by whatever b minus a is. The catch is, you don’t know what c is.
Except there’s some cases where you kinda do. Like, if f is a line, rising or falling with a constant slope from a to b? Then have c be the midpoint of a and b.
That won’t always work. Like, if f is a parabola on the region from a to b, then c is not going to be the midpoint. If f is a cubic, then the midpoint is probably not c. And so on. And if you don’t know what kind of function f is? There’s no guessing where c will be.
But. If you decide you’re only trying to certain kinds of functions? Then you can do all right. If you decide you only want to integrate polynomials, for example, then … well, you’re not going to find a single point c for this. But what you can find is a set of points between a and b. Evaluate the function for those points. And then find a weighted average by rules I’m not getting into here. And that weighted average will be exactly that integral.
Of course there’s limits. The Gaussian Quadrature of a function is only possible if you can evaluate the function at arbitrary points. If you’re trying to integrate, like, a set of sample data it’s inapplicable. The points you pick, and the weighting to use, depend on what kind of function you want to integrate. The results will be worse the less your function is like what you supposed. It’s tedious to find what these points are for a particular assumption of function. But you only have to do that once, or look it up, if you know (say) you’re going to use polynomials of degree up to six or something like that.
And there are variations on this. They have names like the Chevyshev-Gauss Quadrature, or the Hermite-Gauss Quadrature, or the Jacobi-Gauss Quadrature. There are even some that don’t have Gauss’s name in them at all.
Despite that, you can get through a lot of mathematics not talking about quadrature. The idea implicit in the name, that we’re looking to compare areas of different things by looking at squares, is obsolete. It made sense when we worked with numbers that depended on units. One would write about a shape’s area being four times another shape’s, or the length of its side some multiple of a reference length.
We’ve grown comfortable thinking of raw numbers. It makes implicit the step where we divide the polygon’s area by the area of some standard reference unit square. This has advantages. We don’t need different vocabulary to think about integrating functions of one or two or ten independent variables. We don’t need wordy descriptions like “the area of this square is to the area of that as the second power of this square’s side is to the second power of that square’s side”. But it does mean we don’t see squares as intermediaries to understanding different shapes anymore.
Today’s A To Z term is another from goldenoj. It was just the proposal “Platonic”. Most people, prompted, would follow that adjective with one of three words. There’s relationship, ideal, and solid. Relationship is a little too far off of mathematics for me to go into here. Platonic ideals run very close to mathematics. Probably the default philosophy of western mathematics is Platonic. At least a folk Platonism, where the rest of us follow what the people who’ve taken the study of mathematical philosophy seriously seem to be doing. The idea that mathematical constructs are “real things” and have some “existence” that we can understand even if we will never see a true circle or an unadulterated four. Platonic solids, though, those are nice and familiar things. Many of them we can find around the house. That’s one direction to go.
Before I get to the Platonic Solids, though, I’d like to think a little more about Platonic Ideals. What do they look like? I gather our friends in the philosophy department have debated this question a while. So I won’t pretend to speak as if I had actual knowledge. I just have an impression. That impression is … well, something simple. My reasoning is that the Platonic ideal of, say, a chair has to have all the traits that every chair ever has. And there’s not a lot that every chair has. Whatever’s in the Platonic Ideal chair has to be just the things that every chair has, and to omit things that non-chairs do not.
That’s comfortable to me, thinking like a mathematician, though. I think mathematicians train to look for stuff that’s very generally true. This will tend to be things that have few properties to satisfy. Things that look, in some way, simple.
So what is simple in a shape? There’s no avoiding aesthetic judgement here. We can maybe use two-dimensional shapes as a guide, though. Polygons seem nice. They’re made of line segments which join at vertices. Regular polygons even nicer. Each vertex in a regular polygon connects to two edges. Each edge connects to exactly two vertices. Each edge has the same length. The interior angles are all congruent. And if you get many many sides, the regular polygon looks like a circle.
So there’s some things we might look for in solids. Shapes where every edge is the same length. Shapes where every edge connects exactly two vertices. Shapes where every vertex connects to the same number of edges. Shapes where the interior angles are all constant. Shapes where each face is the same polygon as every other face. Look for that and, in three-dimensional space, we find nine shapes.
Yeah, you want that to be five also. The four extra ones are “star polyhedrons”. They look like spikey versions of normal shapes. What keeps these from being Platonic solids isn’t a lack of imagination on Plato’s part. It’s that they’re not convex shapes. There’s no pair of points in a convex shape for which the line segment connecting them goes outside the shape. For the star polyhedrons, well, look at the ends of any two spikes. If we decide that part of this beautiful simplicity is convexity, then we’re down to five shapes. They’re famous. Tetrahedron, cube, octahedron, icosahedron, and dodecahedron.
I’m not sure why they’re named the Platonic Solids, though. Before you explain to me that they were named by Plato in the dialogue Timaeus, let me say something. They were named by Plato in the dialogue Timaeus. That isn’t the same thing as why they have the name Platonic Solids. I trust Plato didn’t name them “the me solids”, since if I know anything about Plato he would have called them “the Socratic solids”. It’s not that Plato was the first to group them either. At least some of the solids were known long before Plato. I don’t know of anyone who thinks Plato particularly advanced human understanding of the solids.
But he did write about them, and in things that many people remembered. It’s natural for a name to attach to the most famous person writing them. Still, someone had the thought which we follow to group these solids together under Plato’s name. I’m curious who, and when. Naming is often a more arbitrary thing than you’d think. The Fibonacci sequence has been known at latest since Fibonacci wrote about it in 1204. But it could not have that name before 1838, when historian Guillaume Libri gave Leonardo of Pisa the name Fibonacci. I’m not saying that the name “Platonic Solid” was invented in, like, 2002. But traditions that seem age-old can be surprisingly recent.
What is an age-old tradition is looking for physical significance in the solids. Plato himself cleverly matched the solids to the ancient concept of four elements plus a quintessence. Johannes Kepler, whom we thank for noticing the star polyhedrons, tried to match them to the orbits of the planets around the sun. Wikipedia tells me of a 1980s attempt to understand the atomic nucleus using Platonic solids. The attempt even touches me. Along the way to my thesis I looked at uniform charges free to move on the surface of a sphere. It was obvious if there were four charges they’d move to the vertices of a tetrahedron on the sphere. Similarly, eight charges would go to the vertices of the cube. 20 charges to the vertices of the icosahedron. And so on. The Platonic Solids seem not just attractive but also of some deep physical significance.
There are not the four (or five) elements of ancient Greek atomism. Attractive as it is to think that fire is a bunch of four-sided dice. The orbits of the planets have nothing to do with the Platonic solids. I know too little about the physics of the atomic nucleus to say whether that panned out. However, that it doesn’t even get its own Wikipedia entry suggests something to me. And, in fact, eight charges on the sphere will not settle at the vertices of a cube. They’ll settle on a staggered pattern, two squares turned 45 degrees relative to each other. The shape is called a “square antiprism”. I was as surprised as you to learn that. It’s possible that the Platonic Solids are, ultimately, pleasant to us but not a key to the universe.
The example of the Platonic Solids does give us the cue to look for other families of solids. There are many such. The Archimedean Solids, for example, are again convex polyhedrons. They have faces of two or more regular polygons, rather than the lone one of Platonic Solids. There are 13 of these, with names of great beauty like the snub cube or the small rhombicuboctahedron. The Archimedean Solids have duals. The dual of a polyhedron represents a face of the original shape with a vertex. Faces that meet in the original polyhedron have an edge between their dual’s vertices. The duals to the Archimedean Solids get the name Catalan Solids. This for the Belgian mathematician Eugène Catalan, who described them in 1865. These attract names like “deltoidal icositetrahedron”. (The Platonic Solids have duals too, but those are all Platonic solids too. The tetrahedron is even its own dual.) The star polyhedrons hint us to look at stellations. These are shapes we get by stretching out the edges or faces of a polyhedron until we get a new polyhedron. It becomes a dizzying taxonomy of shapes, many of them with pointed edges.
There are things that look like Platonic Solids in more than three dimensions of space. In four dimensions of space there are six of these, five of which look like versions of the Platonic Solids we all know. The sixth is this novel shape called the 24-cell, or hyperdiamond, or icositetrachoron, or some other wild names. In five dimensions of space? … it turns out there are only three things that look like Platonic Solids. There’s versions of the tetrahedron, the cube, and the octahedron. In six dimensions? … Three shapes, again versions of the tetrahedron, cube, and octahedron. And it carries on like this for seven, eight, nine, any number of dimensions of space. Which is an interesting development. If I hadn’t looked up the answer I’d have expected more dimensions of space to allow for more Platonic Solid-like shapes. Well, our experience with two and three dimensions guides us to thinking about more dimensions of space. It doesn’t mean that they’re just regular space with a note in the corner that “N = 8”. Shapes hold surprises.
Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.
An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same sapce but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Everything in a particular function space is a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.
So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.
My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.
I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.
Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. But both the differential and the integral operators are linear, which lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.
I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.
One of the conceptual leaps in high school algebra is that we start analyzing the things we do with numbers. Like, we don’t just take the number three, square it, multiply that by two and add to that the number three times four and add to that the number 1. We think about what if we take any number, call it x, and think of . And what if we make equations based on doing this latex 2x^2 + 4x + 1 $; what values of x make those equations true? Or tell us something interesting?
Operators represent a similar leap. We can think of functions as things we manipulate, and think of those manipulations as a particular thing to do. For example, let me come up with a differential expression. For some function u(x) work out the value of this:
Let me join in the convention of using ‘D’ for the differential operator. Then we can rewrite this expression like so:
Suddenly the differential equation looks a lot like a polynomial. Of course it does. Remember that everything in mathematics is polynomials. We get new tools to solve differential equations by rewriting them as operators. That’s nice. It also scratches that itch that I think everyone in Intro to Calculus gets, of wanting to somehow see as if it were a square of . It’s not, and is not the square of . It’s composing with itself. But it looks close enough to squaring to feel comfortable.
Nobody needs to do except to learn some stuff about operators. But you might imagine a world where we did this process all the time. If we did, then we’d develop shorthand for it. Maybe a new operator, call it T, and define it that . You see the grammar of treating functions as if they were real numbers becoming familiar. You maybe even noticed the ‘1’ sitting there, serving as the “identity operator”. You know how you’d write out if you needed to write it in full.
But there are operators that we use all the time. These do get special names, and often shorthand. For example, there’s the gradient operator. This applies to any function with several independent variables. The gradient has a great physical interpretation if the variables represent coordinates of space. If they do, the gradient of a function at a point gives us a vector that describes the direction in which the function increases fastest. And the size of that gradient — a functional on this operator — describes how fast that increase is.
The gradient itself defines more operators. These have names you get very familiar with in Vector Calculus, with names like divergence and curl. These have compelling physical interpretations if we think of the function we operate on as describing a moving fluid. A positive divergence means fluid is coming into the system; a negative divergence, that it is leaving. The curl, in fluids, describe how nearby streams of fluid move at different rate.
Physical interpretations are common in operators. This probably reflects how much influence physics has on mathematics and vice-versa. Anyone studying quantum mechanics gets familiar with a host of operators. These have comfortable names like “position operator” or “momentum operator” or “spin operator”. These are operators that apply to the wave function for a problem. They transform the wave function into a probability distribution. That distribution describes what positions or momentums or spins are likely, how likely they are. Or how unlikely they are.
They’re not all physical, though. Or not purely physical. Many operators are useful because they are powerful mathematical tools. There is a variation of the Fourier series called the Fourier transform. We can interpret this as an operator. Suppose the original function started out with time or space as its independent variable. This often happens. The Fourier transform operator gives us a new function, one with frequencies as independent variable. This can make the function easier to work with. The Fourier transform is an integral operator, by the way, so don’t go thinking everything is a complicated set of derivatives.
Another integral-based operator that’s important is the Laplace transform. This is a great operator because it turns differential equations into algebraic equations. Often, into polynomials. You saw that one coming.
This is all a lot of good press for operators. Well, they’re powerful tools. They help us to see that we can manipulate functions in the ways that functions let us manipulate numbers. It should sound good to realize there is much new that you can do, and you already know most of what’s needed to do it.
Today’s A To Z term is another free choice. So I’m picking a term from the world of … mathematics. There are a lot of norms out there. Many are specialized to particular roles, such as looking at complex-valued numbers, or vectors, or matrices, or polynomials.
Still they share things in common, and that’s what this essay is for. And I’ve brushed up against the topic before.
The norm, also, has nothing particular to do with “normal”. “Normal” is an adjective which attaches to every noun in mathematics. This is security for me as while these A-To-Z sequences may run out of X and Y and W letters, I will never be short of N’s.
A “norm” is the size of whatever kind of thing you’re working with. You can see where this is something we look for. It’s easy to look at two things and wonder which is the smaller.
There are many norms, even for one set of things. Some seem compelling. For the real numbers, we usually let the absolute value do this work. By “usually” I mean “I don’t remember ever seeing a different one except from someone introducing the idea of other norms”. For a complex-valued number, it’s usually the square root of the sum of the square of the real part and the square of the imaginary coefficient. For a vector, it’s usually the square root of the vector dot-product with itself. (Dot product is this binary operation that is like multiplication, if you squint, for vectors.) Again, these, the “usually” means “always except when someone’s trying to make a point”.
Which is why we have the convention that there is a “the norm” for a kind of operation. The norm dignified as “the” is usually the one that looks as much as possible like the way we find distances between two points on a plane. I assume this is because we bring our intuition about everyday geometry to mathematical structures. You know how it is. Given an infinity of possible choices we take the one that seems least difficult.
Every sort of thing which can have a norm, that I can think of, is a vector space. This might be my failing imagination. It may also be that it’s quite easy to have a vector space. A vector space is a collection of things with some rules. Those rules are about adding the things inside the vector space, and multiplying the things in the vector space by scalars. These rules are not difficult requirements to meet. So a lot of mathematical structures are vector spaces, and the things inside them are vectors.
A norm is a function that has these vectors as its domain, and the non-negative real numbers as its range. And there are three rules that it has to meet. So. Give me a vector ‘u’ and a vector ‘v’. I’ll also need a scalar, ‘a. Then the function f is a norm when:
. This is a famous rule, called the triangle inequality. You know how in a triangle, the sum of the lengths of any two legs is greater than the length of the third leg? That’s the rule at work here.
. This doesn’t have so snappy a name. Sorry. It’s something about being homogeneous, at least.
If then u has to be the additive identity, the vector that works like zero does.
Norms take on many shapes. They depend on the kind of thing we measure, and what we find interesting about those things. Some are familiar. Look at a Euclidean space, with Cartesian coordinates, so that we might write something like (3, 4) to describe a point. The “the norm” for this, called the Euclidean norm or the L2 norm, is the square root of the sum of the squares of the coordinates. So, 5. But there are other norms. The L1 norm is the sum of the absolute values of all the coefficients; here, 7. The L∞ norm is the largest single absolute value of any coefficient; here, 4.
A polynomial, meanwhile? Write it out as . Take the absolute value of each of these terms. Then … you have choices. You could take those absolute values and add them up. That’s the L1 polynomial norm. Take those absolute values and square them, then add those squares, and take the square root of that sum. That’s the L2 norm. Take the largest absolute value of any of these coefficients. That’s the L∞ norm.
These don’t look so different, even though points in space and polynomials seem to be different things. We designed the tool. We want it not to be weirder than it has to be. When we try to put a norm on a new kind of thing, we look for a norm that resembles the old kind of thing. For example, when we want to define the norm of a matrix, we’ll typically rely on a norm we’ve already found for a vector. At least to set up the matrix norm; in practice, we might do a calculation that doesn’t explicitly use a vector’s norm, but gives us the same answer.
If we have a norm for some vector space, then we have an idea of distance. We can say how far apart two vectors are. It’s the norm of the difference between the vectors. This is called defining a metric on the vector space. A metric is that sense of how far apart two things are. What keeps a norm and a metric from being the same thing is that it’s possible to come up with a metric that doesn’t match any sensible norm.
It’s always possible to use a norm to define a metric, though. Doing that promotes our normed vector space to the dignified status of a “metric space”. Many of the spaces we find interesting enough to work in are such metric spaces. It’s hard to think of doing without some idea of size.
Today’s A To Z term was nominated again by @aajohannas. The other compelling nomination was from Vayuputrii, for the Mittag-Leffler function. I was tempted. But I realized I could not think of a clear way to describe why the function was interesting. Or even where it comes from that avoided being a heap of technical terms. There’s no avoiding technical terms in writing about mathematics, but there’s only so much I want to put in at once either. It also makes me realize I don’t understand the Mittag-Leffler function, but it is after all something I haven’t worked much with.
The Mittag-Leffler function looks like it’s one of those things named for several contributors, like Runge-Kutta Integration or Cauchy-Kovalevskaya Theorem or something. Not so here; this was one person, Gösta Mittag-Leffler. His name’s all over the theory of functions. And he was one of the people helping Sofia Kovalevskaya, whom you know from every list of pioneering women in mathematics, secure her professorship.
A martingale is how mathematicians prove you can’t get rich gambling.
Well, that exaggerates. Some people will be lucky, of course. But there’s no strategy that works. The only strategy that works is to rig the game. You can do this openly, by setting rules that give you a slight edge. You usually have to be the house to do this. Or you can do it covertly, using tricks like card-counting (in blackjack) or weighted dice or other tricks. But a fair game? Meaning one not biased towards or against any player? There’s no strategy to guarantee winning that.
We can make this more technical. Martingales arise form the world of stochastic processes. This is an indexed set of random variables. A random variable is some variable with a value that depends on the result of some phenomenon. A tossed coin. Rolled dice. Number of people crossing a particular walkway over a day. Engine temperature. Value of a stock being traded. Whatever. We can’t forecast what the next value will be. But we now the distribution, which values are more likely and which ones are unlikely and which ones impossible.
The field grew out of studying real-world phenomena. Things we could sample and do statistics on. So it’s hard to think of an index that isn’t time, or some proxy for time like “rolls of the dice”. Stochastic processes turn up all over the place. A lot of what we want to know is impossible, or at least impractical, to exactly forecast. Think of the work needed to forecast how many people will cross this particular walk four days from now. But it’s practical to describe what are more and less likely outcomes. What the average number of walk-crossers will be. What the most likely number will be. Whether to expect tomorrow to be a busier or a slower day.
And this is what the martingale is for. Start with a sequence of your random variables. How many people have crossed that street each day since you started studying. What is the expectation value, the best guess, for the next result? Your best guess for how many will cross tomorrow? Keeping in mind your knowledge of how all these past values. That’s an important piece. It’s not a martingale if the history of results isn’t a factor.
Every probability question has to deal with knowledge. Sometimes it’s easy. The probability of a coin coming up tails next toss? That’s one-half. The probability of a coin coming up tails next toss, given that it came up tails last time? That’s still one-half. The probability of a coin coming up tails next toss, given that it came up tails the last 40 tosses? That’s … starting to make you wonder if this is a fair coin. I’d bet tails, but I’d also ask to examine both sides, for a start.
So a martingale is a stochastic process where we can make forecasts about the future. Particularly, the expectation value. The expectation value is the sum of the products of every possible value and how probable they are. In a martingale, the expected value for all time to come is just the current value. So if whatever it was you’re measuring was, say, 40 this time? That’s your expectation for the whole future. Specific values might be above 40, or below 40, but on average, 40 is it.
Put it that way and you’d think, well, how often does that ever happen? Maybe some freak process will give you that, but most stuff?
Well, here’s one. The random walk. Set a value. At each step, it can increase or decrease by some fixed value. It’s as likely to increase as to decrease. This is a martingale. And it turns out a lot of stuff is random walks. Or can be processed into random walks. Even if the original walk is unbalanced — say it’s more likely to increase than decrease. Then we can do a transformation, and find a new random variable based on the original. Then that one is as likely to increase as decrease. That one is a martingale.
It’s not just random walks. Poisson processes are things where the chance of something happening is tiny, but it has lots of chances to happen. So this measures things like how many car accidents happen on this stretch of road each week. Or where a couple plants will grow together into a forest, as opposed to lone trees. How often a store will have too many customers for the cashiers on hand. These processes by themselves aren’t often martingales. But we can use them to make a new stochastic process, and that one is a martingale.
Where this all comes to gambling is in stopping times. This is a random variable that’s based on the stochastic process you started with. Its value at each index represents the probability that the random variable in that has reached some particular value by this index. The language evokes a gambler’s decision: when do you stop? There are two obvious stopping times for any game. One is to stop when you’ve won enough money. The other is to stop when you’ve lost your whole stake.
So there is something interesting about a martingale that has bounds. It will almost certainly hit at least one of those bounds, in a finite time. (“Almost certainly” has a technical meaning. It’s the same thing I mean when I say if you flip a fair coin infinitely many times then “almost certainly” it’ll come up tails at least once. Like, it’s not impossible that it doesn’t. It just won’t happen.) And for the gambler? The boundary of “runs out of money” is a lot closer than “makes the house run out of money”.
Oh, if you just want a little payoff, that’s fine. If you’re happy to walk away from the table with a one percent profit? You can probably do that. You’re closer to that boundary than to the runs-out-of-money one. A ten percent profit? Maybe so. Making an unlimited amount of money, like you’d want to live on your gambling winnings? No, that just doesn’t happen.
This gets controversial when we turn from gambling to the stock market. Or a lot of financial mathematics. Look at the value of a stock over time. I write “stock” for my convenience. It can be anything with a price that’s constantly open for renegotiation. Stocks, bonds, exchange funds, used cars, fish at the market, anything. The price over time looks like it’s random, at least hour-by-hour. So how can you reliably make money if the fluctuations of the price of a stock are random?
Well, if I knew, I’d have smaller student loans outstanding. But martingales seem like they should offer some guidance. Much of modern finance builds on not dealing with a stock price varying. Instead, buy the right to buy the stock at a set price. Or buy the right to sell the stock at a set price. This lets you pay to secure a certain profit, or a worst-possible loss, in case the price reaches some level. And now you see the martingale. Is it likely that the stock will reach a certain price within this set time? How likely? This can, in principle, guide you to a fair price for this right-to-buy.
The mathematical reasoning behind that is fine, so far as I understand it. Trouble arises because pricing correctly means having a good understanding of how likely it is prices will reach different levels. Fortunately, there are few things humans are better at than estimating probabilities. Especially the probabilities of complicated situations, with abstract and remote dangers.
So martingales are an interesting corner of mathematics. They apply to purely abstract problems like random walks. Or to good mathematical physics problems like Brownian motion and the diffusion of particles. And they’re lurking behind the scenes of the finance news. Exciting stuff.
I’m hopefully going to pass the halfway point on this year’s mathematics A-To-Z. This makes it a good time to panel for topics for the next several letters in the alphabet. It’s easier for me to keep my notes straight if you post requests as comments on this thread, but I’ll try to keep up if you do comment on other threads.
As ever, I’m happy to consider most mathematical topics, including ones that I’ve written about in the past if I think I can better an old essay. If there’s several suggestions for the same letter, I’ll pick the one that I think I can do most interestingly. If several seem interesting I might try rephrasing, if the subject allows for that.
And I do thank everyone who makes a suggestion, especially if it’s one that surprises me and that makes me learn something along the way.
Here’s the essays I’ve written in past years for the letters O through T.
I couldn’t find a place to fit this in the essay proper. But it’s too good to leave out. The simplex method, discussed within, traces to George Dantzig. He’d been planning methods for the US Army Air Force during the Second World War. Dantzig is a person you have heard about, if you’ve heard any mathematical urban legends. In 1939 he was late to Jerzy Neyman’s class. He took two statistics problems on the board to be homework. He found them “harder than usual”, but solved them in a couple days and turned in the late homework hoping Neyman would be understanding. They weren’t homework. They were examples of famously unsolved problems. Within weeks Neyman had written one of the solutions up for publication. When he needed a thesis topic Neyman advised him to just put what he already had in a binder. It’s the stuff every grad student dreams of. The story mutated. It picked up some glurge to become a narrative about positive thinking. And mutated further, into the movie Good Will Hunting.
Every three days one of the comic strips I read has the elderly main character talk about how they never used algebra. This is my hyperbole. But mathematics has got the reputation for being difficult and inapplicable to everyday life. We’ll concede using arithmetic, when we get angry at the fast food cashier who hands back our two pennies before giving change for our $6.77 hummus wrap. But otherwise, who knows what an elliptic integral is, and whether it’s working properly?
Linear programming does not have this problem. In part, this is because it lacks a reputation. But those who have heard of it, acknowledge it as immensely practical mathematics. It is about something a particular kind of human always finds compelling. That is how to do a thing best.
There are several kinds of “best”. There is doing a thing in as little time as possible. Or for as little effort as possible. For the greatest profit. For the highest capacity. For the best score. For the least risk. The goals have a thousand names, none of which we need to know. They all mean the same thing. They mean “the thing we wish to optimize”. To optimize has two directions, which are one. The optimum is either the maximum or the minimum. To be good at finding a maximum is to be good at finding a minimum.
It’s obvious why we call this “programming”; obviously, we leave the work of finding answers to a computer. It’s a spurious reason. The “programming” here comes from an independent sense of the word. It means more about finding a plan. Think of “programming” a night’s entertainment, so that every performer gets their turn, all scene changes have time to be done, you don’t put two comedians right after the other, and you accommodate the performer who has to leave early and the performer who’ll get in an hour late. Linear programming problems are often about finding how to do as well as possible given various priorities. All right. At least the “linear” part is obvious. A mathematics problem is “linear” when it’s something we can reasonably expect to solve. This is not the technical meaning. Technically what it means is we’re looking at a function something like:
Here, x, y, and z are the independent variables. We don’t know their values but wish to. a, b, and c are coefficients. These values are set to some constant for the problem, but they might be something else for other problems. They’re allowed to be positive or negative or even zero. If a coefficient is zero, then the matching variable doesn’t affect matters at all. The corresponding value can be anything at all, within the constraints.
I’ve written this for three variables, as an example and because ‘x’ and ‘y’ and ‘z’ are comfortable, familiar variables. There can be fewer. There can be more. There almost always are. Two- and three-variable problems will teach you how to do this kind of problem. They’re too simple to be interesting, usually. To avoid committing to a particular number of variables we can use indices. for values of j from 1 up to N. Or we can bundle all these values together into a vector, and write everything as . This has a particular advantage since when we can write the coefficients as a vector too. Then we use the notation of linear algebra, and write that we hope to maximize the value of:
(The superscript T means “transpose”. As a linear algebra problem we’d usually think of writing a vector as a tall column of things. By transposing that we write a long row of things. By transposing we can use the notation of matrix multiplication.)
This is the objective function. Objective here in the sense of goal; it’s the thing we want to find the best possible value of.
We have constraints. These represent limits on the variables. The variables are always things that come in limited supply. There’s no allocating more money than the budget allows, nor putting more people on staff than work for the company. Often these constraints interact. Perhaps not only is there only so much staff, but no one person can work more than a set number of days in a row. Something like that. That’s all right. We can write all these constraints as a matrix equation. An inequality, properly. We can bundle all the constraints into a big matrix named A, and demand:
Also, traditionally, we suppose that every component of is non-negative. That is, positive, or at lowest, zero. This reflects the field’s core problems of figuring how to allocate resources. There’s no allocating less than zero of something.
But we need some bounds. This is easiest to see with a two-dimensional problem. Try it yourself: draw a pair of axes on a sheet of paper. Now put in a constraint. Doesn’t matter what. The constraint’s edge is a straight line, which you can draw at any position and any angle you like. This includes horizontal and vertical. Shade in one side of the constraint. Whatever you shade in is the “feasible region”, the sets of values allowed under the constraint. Now draw in another line, another constraint. Shade in one side or the other of that. Draw in yet another line, another constraint. Shade in one side or another of that. The “feasible region” is whatever points have taken on all these shades. If you were lucky, this is a bounded region, a triangle. If you weren’t lucky, it’s not bounded. It’s maybe got some corners but goes off to the edge of the page where you stopped shading things in.
So adding that every component of is at least as big as zero is a backstop. It means we’ll usually get a feasible region with a finite volume. What was the last project you worked on that had no upper limits for anything, just minimums you had to satisfy? Anyway if you know you need something to be allowed less than zero go ahead. We’ll work it out. The important thing is there’s finite bounds on all the variables.
I didn’t see the bounds you drew. It’s possible you have a triangle with all three shades inside. But it’s also possible you picked the other sides to shade, and you have an annulus, with no region having more than two shades in it. This can happen. It means it’s impossible to satisfy all the constraints at once. At least one of them has to give. You may be reminded of the sign taped to the wall of your mechanics’ about picking two of good-fast-cheap.
But impossibility is at least easy. What if there is a feasible region?
Well, we have reason to hope. The optimum has to be somewhere inside the region, that’s clear enough. And it even has to be on the edge of the region. If you’re not seeing why, think of a simple example, like, finding the maximum of , inside the square where x is between 0 and 2 and y is between 0 and 3. Suppose you had a putative maximum on the inside, like, where x was 1 and y was 2. What happens if you increase x a tiny bit? If you increase y by twice that? No, it’s only on the edges you can get a maximum that can’t be locally bettered. And only on the corners of the edges, at that.
(This doesn’t prove the case. But it is what the proof gets at.)
So the problem sounds simple then! We just have to try out all the vertices and pick the maximum (or minimum) from them all.
OK, and here’s where we start getting into trouble. With two variables and, like, three constraints? That’s easy enough. That’s like five points to evaluate? We can do that.
We never need to do that. If someone’s hiring you to test five combinations I admire your hustle and need you to start getting me consulting work. A real problem will have many variables and many constraints. The feasible region will most often look like a multifaceted gemstone. It’ll extend into more than three dimensions, usually. It’s all right if you just imagine the three, as long as the gemstone is complicated enough.
Because now we’ve got lots of vertices. Maybe more than we really want to deal with. So what’s there to do?
The basic approach, the one that’s foundational to the field, is the simplex method. A “simplex” is a triangle. In three dimensions, anyway. In four dimensions it’s a tetrahedron. In two dimensions it’s a line segment. Generally, however many dimensions of space you have? The simplex is the simplest thing that fills up volume in your space.
You know how you can turn any polygon into a bunch of triangles? Just by connecting enough vertices together? You can turn a polyhedron into a bunch of tetrahedrons, by adding faces that connect trios of vertices. And for polyhedron-like shapes in more dimensions? We call those polytopes. Polytopes we can turn into a bunch of simplexes. So this is why it’s the “simplex method”. Any one simplex it’s easy to check the vertices on. And we can turn the polytope into a bunch of simplexes. And we can ignore all the interior vertices of the simplexes.
So here’s the simplex method. First, break your polytope up into simplexes. Next, pick any simplex; doesn’t matter which. Pick any outside vertex of that simplex. This is the first viable possible solution. It’s most likely wrong. That’s okay. We’ll make it better.
Because there are other vertices on this simplex. And there are other simplexes, adjacent to that first, which share this vertex. Test the vertices that share an edge with this one. Is there one that improves the objective function? Probably. Is there a best one of those in this simplex? Sure. So now that’s our second viable possible solution. If we had to give an answer right now, that would be our best guess.
But this new vertex, this new tentative solution? It shares edges with other vertices, across several simplexes. So look at these new neighbors. Are any of them an improvement? Which one of them is the best improvement? Move over there. That’s our next tentative solution.
You see where this is going. Keep at this. Eventually it’ll wind to a conclusion. Usually this works great. If you have, like, 8 constraints, you can usually expect to get your answer in from 16 to 24 iterations. If you have 20 constraints, expect an answer in from 40 to 60 iterations. This is doing pretty well.
But it might take a while. It’s possible for the method to “stall” a while, often because one or more of the variables is at its constraint boundary. Or the division of polytope into simplexes got unlucky, and it’s hard to get to better solutions. Or there might be a string of vertices that are all at, or near, the same value, so the simplex method can’t resolve where to “go” next. In the worst possible case, the simplex method takes a number of iterations that grows exponentially with the number of constraints. This, yes, is very bad. It doesn’t typically happen. It’s a numerical algorithm. There’s some problem to spoil any numerical algorithm.
You may have complaints. Like, the world is complicated. Why are we only looking at linear objective functions? Or, why only look at linear constraints? Well, if you really need to do that? Go ahead, but that’s not linear programming anymore. Think hard about whether you really need that, though. Linear anything is usually simpler than nonlinear anything. I mean, if your optimization function absolutely has to have in it? Could we just say you have a new variable that just happens to be equal to the square of y? Will that work? If you have to have the sine of z? Are you sure that z isn’t going to get outside the region where the sine of z is pretty close to just being z? Can you check?
Maybe you have, and there’s just nothing for it. That’s all right. This is why optimization is a living field of study. It demands judgement and thought and all that hard work.
I’m more fluent in graph theory, and my writing will reflect that. But its critical insight involves looking at spaces and ignoring things like distance and area and angle. It is amazing that one can discard so much of geometry and still have anything to consider. What we do learn then applies to very many problems.
Königsberg Bridge Problem.
Once upon a time there was a city named Königsberg. It no longer is. It is Kaliningrad now. It’s no longer in that odd non-contiguous chunk of Prussia facing the Baltic Sea. It’s now in that odd non-contiguous chunk of Russia facing the Baltic Sea.
I put it this way because what the city evokes, to mathematicians, is a story. I do not have specific reason to think the story untrue. But it is a good story, and as I think more about history I grow more skeptical of good stories. A good story teaches, though not always the thing it means to convey.
The story is this. The city is on two sides of the Pregel river, now the Pregolya River. Two large islands are in the river. For several centuries these four land masses were connected by a total of seven bridges. And we are told that people in the city would enjoy free time with an idle puzzle. Was there a way to walk all seven bridges one and only one time? If no one did something fowl like taking a boat to cross the river, or not going the whole way across a bridge, anyway? There were enough bridges, though, and enough possible ways to cross them, that trying out every option was hopeless.
Then came Leonhard Euler. Who is himself a preposterous number of stories. Pick any major field of mathematics; there is an Euler’s Theorem at its center. Or an Euler’s Formula. Euler’s Method. Euler’s Function. Likely he brought great new light to it.
And in 1736 he solved the Königsberg Bridge Problem. The answer was to look at what would have to be true for a solution to exist. He noticed something so obvious it required genius not to dismiss it. It seems too simple to be useful. In a successful walk you enter each land mass (river bank or island) the same number of times you leave it. So if you cross each bridge exactly once, you use an even number of bridges per land mass. The exceptions are that you must start at one land mass, and end at a land mass. Maybe a different one. How you get there doesn’t count for the problem. How you leave doesn’t either. So the land mass you start from may have an odd number of bridges. So may the one you end on. So there are up to two land masses that may have an odd number of bridges.
Once this is observed, it’s easy to tell that Königsberg’s Bridges did not match that. All four land masses in Königsberg have an odd number of bridges. And so we could stop looking. It’s impossible to walk the seven bridges exactly once each in a tour, not without cheating.
Graph theoreticians, like the topologists of my prologue, now consider this foundational to their field. To look at a geographic problem and not concern oneself with areas and surfaces and shapes? To worry only about how sets connect? This guides graph theory in how to think about networks.
The city exists, as do the islands, and the bridges existed as described. So does Euler’s solution. And his reasoning is sound. The reasoning is ingenious, too. Everything hard about the problem evaporates. So what do I doubt about this fine story?
Teo Paoletti, author of that web page, says Danzig mayor Carl Leonhard Gottlieb Ehler wrote Euler, asking for a solution. This falls short of proving that the bridges were a common subject of speculation. It does show at least that Ehler thought it worth pondering. Euler apparently did not think it was even mathematics. Not that he thought it was hard; he simply thought it didn’t depend on mathematical principles. It took only reason. But he did find something interesting: why was it not mathematics? Paoletti quotes Euler as writing:
This question is so banal, but seemed to me worthy of attention in that [neither] geometry, nor algebra, nor even the art of counting was sufficient to solve it.
I am reminded of a mathematical joke. It’s about the professor who always went on at great length about any topic, however slight. I have no idea why this should stick with me. Finally one day the professor admitted of something, “This problem is not interesting.” The students barely had time to feel relief. The professor went on: “But the reasons why it is not interesting are very interesting. So let us explore that.”
The Königsberg Bridge Problem is in the first chapter of every graph theory book ever. And it is a good graph theory problem. It may not be fair to say it created graph theory, though. Euler seems to have treated this as a little side bit of business, unrelated to his real mathematics. Graph theory as we know it — as a genre — formed in the 19th century. So did topology. In hindsight we can see how studying these bridges brought us good questions to ask, and ways to solve them. But for something like a century after Euler published this, it was just the clever solution to a recreational mathematics puzzle. It was as important as finding knight’s tours of chessboards.
That we take it as the introduction to graph theory, and maybe topology, tells us something. It is an easy problem to pose. Its solution is clever, but not obscure. It takes no long chains of complex reasoning. Many people approach mathematics problems with fear. By telling this story, we promise mathematics that feels as secure as a stroll along the riverfront. This promise is good through about chapter three, section four, where there are four definitions on one page and the notation summons obscure demons of LaTeX.
Still. Look at what the story of the bridges tells us. We notice something curious about our environment. The problem seems mathematical, or at least geographic. The problem is of no consequence. But it lingers in the mind. The obvious approaches to solving it won’t work. But think of the problem differently. The problem becomes simple. And better than simple. It guides one to new insights. In a century it gives birth to two fields of mathematics. In two centuries these are significant fields. They’re things even non-mathematicians have heard of. It’s almost a mathematician’s fantasy of insight and accomplishment.
But this does happen. The world suggests no end of little mathematics problems. Sometimes they are wonderful. Richard Feynman’s memoirs tell of his imagination being captured by a plate spinning in the air. Solving that helped him resolve a problem in developing Quantum Electrodynamics. There are more mundane problems. One of my professors in grad school remembered tossing and catching a tennis racket and realizing he didn’t know why sometimes it flipped over and sometimes didn’t. His specialty was in dynamical systems, and he could work out the mechanics of what a tennis racket should do, and when. And I know that within me is the ability to work out when a pile of books becomes too tall to stand on its own. I just need to work up to it.
The story of the Königsberg Bridge Problem is about this. Even if nobody but the mayor of Danzig pondered how to cross the bridges, and he only got an answer because he infected Euler with the need to know? It is a story of an important piece of mathematics. Good stories will tell us things that are true, which are not necessarily the things that happen in them.
Today’s A To Z term is my pick again. So I choose the Julia Set. This is named for Gaston Julia, one of the pioneers in chaos theory and fractals. He was born earlier than you imagine. No, earlier than that: he was born in 1893.
The early 20th century saw amazing work done. We think of chaos theory and fractals as modern things, things that require vast computing power to understand. The computers help, yes. But the foundational work was done more than a century ago. Some of these pioneering mathematicians may have been able to get some numerical computing done. But many did not. They would have to do the hard work of thinking about things which they could not visualize. Things which surely did not look like they imagined.
We think of things as moving. Even static things we consider as implying movement. Else we’d think it odd to ask, “Where does that road go?” This carries over to abstract things, like mathematical functions. A function is a domain, a range, and a rule matching things in the domain to things in the range. It “moves” things as much as a dictionary moves words.
Yet we still think of a function as expressing motion. A common way for mathematicians to write functions uses little arrows, and describes what’s done as “mapping”. We might write . This is a general idea. We’re expressing that it maps things in the set D to things in the set R. We can use the notation to write something more specific. If ‘z’ is in the set D, we might write . This describes the rule that matches things in the domain to things in the range. represents the evaluation of this rule at a specific point, the one where the independent variable has the value ‘2’. represents the evaluation of this rule at a specific point without committing to what that point is. represents a collection of points. It’s the set you get by evaluating the rule at every point in D.
And it’s not bad to think of motion. Many functions are models of things that move. Particles in space. Fluids in a room. Populations changing in time. Signal strengths varying with a sensor’s position. Often we’ll calculate the development of something iteratively, too. If the domain and the range of a function are the same set? There’s no reason that we can’t take our z, evaluate f(z), and then take whatever that thing is and evaluate f(f(z)). And again. And again.
My age cohort, at least, learned to do this almost instinctively when we discovered you could take the result on a calculator and hit a function again. Calculate something and keep hitting square root; you get a string of numbers that eventually settle on 1. Or you started at zero. Calculate something and keep hitting square; you settle at either 0, 1, or grow to infinity. Hitting sine over and over … well, that was interesting since you might settle on 0 or some other, weird number. Same with tangent. Cosine you wouldn’t settle down to zero.
Serious mathematicians look at this stuff too, though. Take any set ‘D’, and find what its image is, f(D). Then iterate this, figuring out what f(f(D)) is. Then f(f(f(D))). f(f(f(f(D)))). And so on. What happens if you keep doing this? Like, forever?
We can say some things, at least. Even without knowing what f is. There could be a part of D that all these many iterations of f will send out to infinity. There could be a part of D that all these many iterations will send to some fixed point. And there could be a part of D that just keeps getting shuffled around without ever finishing.
Some of these might not exist. Like, doesn’t have any fixed points or shuffled-around points. It sends everything off to infinity. has only a fixed point; nothing from it goes off to infinity and nothing’s shuffled back and forth. has a fixed point and a lot of points that shuffle back and forth.
Thinking about these fixed points and these shuffling points gets us Julia Sets. These sets are the fixed points and shuffling-around points for certain kinds of functions. These functions are ones that have domain and range of the complex-valued numbers. Complex-valued numbers are the sum of a real number plus an imaginary number. A real number is just what it says on the tin. An imaginary number is a real number multiplied by . What is ? It’s the imaginary unit. It has the neat property that . That’s all we need to know about it.
Oh, also, zero times is zero again. So if you really want, you can say all real numbers are complex numbers; they’re just themselves plus . Complex-valued functions are worth a lot of study in their own right. Better, they’re easier to study (at the introductory level) than real-valued functions are. This is such a relief to the mathematics major.
And now let me explain some little nagging weird thing. I’ve been using ‘z’ to represent the independent variable here. You know, using it as if it were ‘x’. This is a convention mathematicians use, when working with complex-valued numbers. An arbitrary complex-valued number tends to be called ‘z’. We haven’t forgotten x, though. We just in this context use ‘x’ to mean “the real part of z”. We also use “y” to carry information about the imaginary part of z. When we write ‘z’ we hold in trust an ‘x’ and ‘y’ for which . This all comes in handy.
But we still don’t have Julia Sets for every complex-valued function. We need it to be a rational function. The name evokes rational numbers, but that doesn’t seem like much guidance. is a rational function. It seems too boring to be worth studying, though, and it is. A “rational function” is a function that’s one polynomial divided by another polynomial. This whether they’re real-valued or complex-valued polynomials.
So. Start with an ‘f’ that’s one complex-valued polynomial divided by another complex-valued polynomial. Start with the domain D, all of the complex-valued numbers. Find f(D). And f(f(D)). And f(f(f(D))). And so on. If you iterated this ‘f’ without limit, what’s the set of points that never go off to infinity? That’s the Julia Set for that function ‘f’.
There are some famous Julia sets, though. There are the Julia sets that we heard about during the great fractal boom of the 1980s. This was when computers got cheap enough, and their graphic abilities good enough, to automate the calculation of points in these sets. At least to approximate the points in these sets. And these are based on some nice, easy-to-understand functions. First, you have to pick a constant C. This C is drawn from the complex-valued numbers. But that can still be, like, ½, if that’s what interests you. For whatever your C is? Define this function:
And that’s it. Yes, this is a rational function. The numerator function is . The denominator function is .
This produces many different patterns. If you picked C = 0, you get a circle. Good on you for starting out with something you could double-check. If you picked C = -2? You get a long skinny line, again, easy enough to check. If you picked C = -1? Well, now you have a nice interesting weird shape, several bulging ovals with peninsulas of other bulging ovals all over. Pick other numbers. Pick numbers with interesting imaginary components. You get pinwheels. You get jagged streaks of lightning. You can even get separate islands, whole clouds of disjoint threatening-looking blobs.
There is some guessing what you’ll get. If you work out a Julia Set for a particular C, you’ll see a similar-looking Julia Set for a different C that’s very close to it. This is a comfort.
You can create a Julia Set for any rational function. I’ve only ever seen anyone actually do it for functions that look like what we already had. . Sometimes . I suppose once, in high school, I might have tried but I don’t remember what it looked like. If someone’s done, say, please write in and let me know what it looks like.
The Julia Set has a famous partner. Maybe the most famous fractal of them all, the Mandelbrot Set. That’s the strange blobby sea surrounded by lightning bolts that you see on the cover of every pop mathematics book from the 80s and 90s. If a C gives us a Julia Set that’s one single, contiguous patch? Then that C is in the Mandelbrot Set. Also vice-versa.
The ideas behind these sets are old. Julia’s paper about the iterations of rational functions first appeared in 1918. Julia died in 1978, the same year that the first computer rendering of the Mandelbrot set was done. I haven’t been able to find whether that rendering existed before his death. Nor have I decided which I would think the better sequence.
Today’s A To Z term is a free pick. I didn’t notice any suggestions for a mathematics term starting with this letter. I apologize if you did submit one and I missed it. I don’t mean any insult.
What I’ve picked is a concept from analysis. I’ve described this casually as the study of why calculus works. That’s a good part of what it is. Analysis is also about why real numbers work. Later on you also get to why complex numbers and why functions work. But it’s in the courses about Real Analysis where a mathematics major can expect to find the infimum, and it’ll stick around on the analysis courses after that.
The infimum is the thing you mean when you say “lower bound”. It applies to a set of things that you can put in order. The order has to work the way less-than-or-equal-to works with whole numbers. You don’t have to have numbers to put a number-like order on things. Otherwise whoever made up the Alphabet Song was fibbing to us all. But starting out with numbers can let you get confident with the idea, and we’ll trust you can go from numbers to other stuff, in case you ever need to.
A lower bound would start out meaning what you’d imagine if you spoke English. Let me call it L. It’ll make my sentences so much easier to write. Suppose that L is less than or equal to all the elements in your set. Then, great! L is a lower bound of your set.
You see the loophole here. It’s in the article “a”. If L is a lower bound, then what about L – 1? L – 10? L – 1,000,000,000½? Yeah, they’re all lower bounds, too. There’s no end of lower bounds. And that is not what you mean be a lower bound, in everyday language. You mean “the smallest thing you have to deal with”.
But you can’t just say “well, the lower bound of a set is the smallest thing in the set”. There’s sets that don’t have a smallest thing. The iconic example is positive numbers. No positive number can be a lower bound of this. All the negative numbers are lowest bounds of this. Zero can be a lower bound of this.
For the postive numbers, it’s obvious: zero is the lower bound we want. It’s smaller than all of the positive numbers. And there’s no greater number that’s also smaller than all the positive numbers. So this is the infimum of the positive numbers. It’s the greatest lower bound of the set.
The infimum of a set may or may not be part of the original set. But. Between the infimum of a set and the infimum plus any positive number, however tiny that is? There’s always at least one thing in the set.
And there isn’t always an infimum. This is obvious if your set is, like, the set of all the integers. If there’s no lower bound at all, there can’t be a greatest lower bound. So that’s obvious enough.
Infimums turn up in a good number of proofs. There are a couple reasons they do. One is that we want to prove a boundary between two kinds of things exist. It’s lurking in the proof, for example, of the intermediate value theorem. This is the proposition that if you have a continuous function on the domain [a, b], and range of real numbers, and pick some number g that’s between f(a) and f(b)? There’ll be at least one point c, between a and b, where f(c) equals g. You can structure this: look at the set of numbers x in the domain [a, b] whose f(x) is larger than g. So what’s the infimum of this set? What does f have to be for that infimum?
It also turns up a lot in proofs about calculus. Proofs about functions, particularly, especially integrating functions. A proof like this will, generically, not deal with the original function, which might have all kinds of unpleasant aspects. Instead it’ll look at a sequence of approximations of the original function. Each approximation is chosen so it has no unpleasant aspect. And then prove that we could make arbitrarily tiny the difference between the result for the function we want and the result for the sequence of functions we make. Infimums turn up in this, since we’ll want a minimum function without being sure that the minimum is in the sequence we work with.
This is the terminology of stuff to work as lower bounds. There’s a similar terminology to work with upper bounds. The upper-bound equivalent of the infimum is the supremum. They’re abbreviated as inf and sup. The supremum turns up most every time an infimum does, and for the reasons you’d expect.
If an infimum does exist, it’s unique; there can’t be two different ones. Same with the supremum.
And things can get weird. It’s possible to have lower bounds but no infimum. This seems bizarre. This is because we’ve been relying on the real numbers to guide our intuition. And the real numbers have a useful property called being “complete”. So let me break the real numbers. Imagine the real numbers except for zero. Call that the set R’. Now look at the set of positive numbers inside R’. What’s the infimum of the positive numbers, within R’? All we can do is shrug and say there is none, even though there are plenty of lower bounds. The infimum of a set depends on the set. It also depends on what bigger set that the set is within. That something depends both on a set and what the bigger set of things is, is another thing that turns up all the time in analysis. It’s worth becoming familiar with.
Today’s A To Z term is another I drew from Mr Wu, of the Singapore Math Tuition blog. It gives me more chances to discuss differential equations and mathematical physics, too.
The Hamiltonian we name for Sir William Rowan Hamilton, the 19th century Irish mathematical physicists who worked on everything. You might have encountered his name from hearing about quaternions. Or for coining the terms “scalar” and “tensor”. Or for work in graph theory. There’s more. He did work in Fourier analysis, which is what you get into when you feel at ease with Fourier series. And then wild stuff combining matrices and rings. He’s not quite one of those people where there’s a Hamilton’s Theorem for every field of mathematics you might be interested in. It’s close, though.
When you first learn about physics you learn about forces and accelerations and stuff. When you major in physics you learn to avoid dealing with forces and accelerations and stuff. It’s not explicit. But you get trained to look, so far as possible, away from vectors. Look to scalars. Look to single numbers that somehow encode your problem.
A great example of this is the Lagrangian. It’s built on “generalized coordinates”, which are not necessarily, like, position and velocity and all. They include the things that describe your system. This can be positions. It’s often angles. The Lagrangian shines in problems where it matters that something rotates. Or if you need to work with polar coordinates or spherical coordinates or anything non-rectangular. The Lagrangian is, in your general coordinates, equal to the kinetic energy minus the potential energy. It’ll be a function. It’ll depend on your coordinates and on the derivative-with-respect-to-time of your coordinates. You can take partial derivatives of the Lagrangian. This tells how the coordinates, and the change-in-time of your coordinates should change over time.
The Hamiltonian is a similar way of working out mechanics problems. The Hamiltonian function isn’t anything so primitive as the kinetic energy minus the potential energy. No, the Hamiltonian is the kinetic energy plus the potential energy. Totally different in idea.
From that description you maybe guessed you can transfer from the Lagrangian to the Hamiltonian. Maybe vice-versa. Yes, you can, although we use the term “transform”. Specifically a “Legendre transform”. We can use any coordinates we like, just as with Lagrangian mechanics. And, as with the Lagrangian, we can find how coordinates change over time. The change of any coordinate depends on the partial derivative of the Hamiltonian with respect to a particular other coordinate. This other coordinate is its “conjugate”. (It may either be this derivative, or minus one times this derivative. By the time you’re doing work in the field you’ll know which.)
That conjugate coordinate is the important thing. It’s why we muck around with Hamiltonians when Lagrangians are so similar. In ordinary, common coordinate systems these conjugate coordinates form nice pairs. In Cartesian coordinates, the conjugate to a particle’s position is its momentum, and vice-versa. In polar coordinates, the conjugate to the angular velocity is the angular momentum. These are nice-sounding pairs. But that’s our good luck. These happen to match stuff we already think is important. In general coordinates one or more of a pair can be some fusion of variables we don’t have a word for and would never care about. Sometimes it gets weird. In the problem of vortices swirling around each other on an infinitely great plane? The horizontal position is conjugate to the vertical position. Velocity doesn’t enter into it. For vortices on the sphere the longitude is conjugate to the cosine of the latitude.
What’s valuable about these pairings is that they make a “symplectic manifold”. A manifold is a patch of space where stuff works like normal Euclidean geometry does. In this case, the space is in “phase space”. This is the collection of all the possible combinations of all the variables that could ever turn up. Every particular moment of a mechanical system matches some point in phase space. Its evolution over time traces out a path in that space. Call it a trajectory or an orbit as you like.
We get good things from looking at the geometry that this symplectic manifold implies. For example, if we know that one variable doesn’t appear in the Hamiltonian, then its conjugate’s value never changes. This is almost the kindest thing you can do for a mathematical physicist. But more. A famous theorem by Emmy Noether tells us that symmetries in the Hamiltonian match with conservation laws in the physics. Time-invariance, for example — time not appearing in the Hamiltonian — gives us the conservation of energy. If only distances between things, not absolute positions, matter, then we get conservation of linear momentum. Stuff like that. To find conservation laws in physics problems is the kindest thing you can do for a mathematical physicist.
The Hamiltonian was born out of planetary physics. These are problems easy to understand and, apart from the case of one star with one planet orbiting each other, impossible to solve exactly. That’s all right. The formalism applies to all kinds of problems. They’re very good at handling particles that interact with each other and maybe some potential energy. This is a lot of stuff.
More, the approach extends naturally to quantum mechanics. It takes some damage along the way. We can’t talk about “the” position or “the” momentum of anything quantum-mechanical. But what we get when we look at quantum mechanics looks very much like what Hamiltonians do. We can calculate things which are quantum quite well by using these tools. This though they came from questions like why Saturn’s rings haven’t fallen part and whether the Earth will stay around its present orbit.
It holds surprising power, too. Notice that the Hamiltonian is the kinetic energy of a system plus its potential energy. For a lot of physics problems that’s all the energy there is. That is, the value of the Hamiltonian for some set of coordinates is the total energy of the system at that time. And, if there’s no energy lost to friction or heat or whatever? Then that’s the total energy of the system for all time.
Here’s where this becomes almost practical. We often want to do a numerical simulation of a physics problem. Generically, we do this by looking up what all the values of all the coordinates are at some starting time t0. Then we calculate how fast these coordinates are changing with time. We pick a small change in time, Δ t. Then we say that at time t0 plus Δ t, the coordinates are whatever they started at plus Δ t times that rate of change. And then we repeat, figuring out how fast the coordinates are changing now, at this position and time.
The trouble is we always make some mistake, and once we’ve made a mistake, we’re going to keep on making mistakes. We can do some clever stuff to make the smallest error possible figuring out where to go, but it’ll still happen. Usually, we stick to calculations where the error won’t mess up our results.
But when we look at stuff like whether the Earth will stay around its present orbit? We can’t make each step good enough for that. Unless we get to thinking about the Hamiltonian, and our symplectic variables. The actual system traces out a path in phase space. Everyone on that path the Hamiltonian is a particular value, the energy of the system. So use the regular methods to project most of the variables to the new time, t0 + Δ t. But the rest? Pick the values that makes the Hamiltonian work out right. Also momentum and angular momentum and other stuff we know get conserved. We’ll still make an error. But it’s a different kind of error. It’ll project to a point that’s maybe in the wrong place on the trajectory. But it’s on the trajectory.
(OK, it’s near the trajectory. Suppose the real energy is, oh, the square root of 5. The computer simulation will have an energy of 2.23607. This is close but not exactly the same. That’s all right. Each step will stay close to the real energy.)
So what we’ll get is a projection of the Earth’s orbit that maybe puts it in the wrong place in its orbit. Putting the planet on the opposite side of the sun from Venus when we ought to see Venus transiting the Sun. That’s all right, if what we’re interested in is whether Venus and Earth are still in the solar system.
There’s a special cost for this. If there weren’t we’d use it all the time. The cost is computational complexity. It’s pricey enough that you haven’t heard about these “symplectic integrators” before. That’s all right. These are the kinds of things open to us once we look long into the Hamiltonian.
These are named for George Green, an English mathematician of the early 19th century. He’s one of those people who gave us our idea of mathematical physics. He’s credited with coining the term “potential”, as in potential energy, and in making people realize how studying this simplified problems. Mostly problems in electricity and magnetism, which were so very interesting back then. On the side also came work in multivariable calculus. His work most famous to mathematics and physics majors connects integrals over the surface of a shape with (different) integrals over the entire interior volume. In more specific problems, he did work on the behavior of water in canals.
There’s a patch of (high school) algebra where you solve systems of equations in a couple variables. Like, you have to do one system where you’re solving, say,
And then maybe later on you get a different problem, one that looks like:
If you solve both of them you notice you’re doing a lot of the same work. All the same hard work. It’s only the part on the right-hand side of the equals signs that are different. Even then, the series of steps you follow on the right-hand-side are the same. They have different numbers is all. What makes the problem distinct is the stuff on the left-hand-side. It’s the set of what coefficients times what variables add together. If you get enough about matrices and vectors you get in the habit of writing this set of equations as one matrix equation, as
Here holds all the unknown variables, your x and y and z and anything else that turns up. Your holds the right-hand side. Do enough of these problems and you notice something. You can describe how to find the solution for these equations before you even know what the right-hand-side is. You can do all the hard work of solving this set of equations for a generic set of right-hand-side constants. Fill them in when you need a particular answer.
I mentioned, while writing about Fourier series, how it turns out most of what you do to numbers you can also do to functions. This really proves itself in differential equations. Both partial and ordinary differential equations. A differential equation works with some not-yet-known function u(x). For what I’m discussing here it doesn’t matter whether ‘x’ is a single variable or a whole set of independent variables, like, x and y and z. I’ll use ‘x’ as shorthand for all that. The differential equation takes u(x) and maybe multiplies it by something, and adds to that some derivatives of u(x) multiplied by something. Those somethings can be constants. They can be other, known, functions with independent variable x. They can be functions that depend on u(x) also. But if they are, then this is a nonlinear differential equation and there’s no solving that.
So suppose we have a linear differential equation. Partial or ordinary, whatever you like. There’s terms that have u(x) or its derivatives in them. Move them all to the left-hand-side. Move everything else to the right-hand-side. This right-hand-side might be constant. It might depend on x. Doesn’t matter. This right-hand-side is some function which I’ll call f(x). This f(x) might be constant; that’s fine. That’s still a legitimate function.
Put this way, every differential equation looks like:
That stuff with u(x) and its derivatives we can call an operator. An operator’s a function which has a domain of functions and a range of functions. So we can give give that a name. ‘L’ is a good name here, because if it’s not the operator for a linear differential equation — a linear operator — then we’re done anyway. So whatever our differential equation was we can write it:
Writing it makes it look like we’re multiplying L by u(x). We’re not. We’re really not. This is more like if ‘L’ is the predicate of a sentence and ‘u(x)’ is the object. Read it like, to make up an example, ‘L’ means ‘three times the second derivative plus two x times’ and ‘u(x)’ as ‘u(x)’.
Still, looking at and then back up at tells you what I’m thinking. We can find some set of instructions to, for any , find the that makes true. So why can’t we find some set of instructions to, for any , find the that makes true?
This is where a Green’s function comes in. Or, like everybody says, “the” Green’s function. “The” here we use like we might talk about “the” roots of a polynomial. Every polynomial has different roots. So, too, does every differential equation have a different Green’s function. What the Green’s function is depends on the equation. It can also depend on what domain the differential equation applies to. It can also depend on some extra information called initial values or boundary values.
The Green’s function for a differential equation has twice as many independent variables as the differential equation has. This seems like we’re making a mess of things. It’s all right. These new variables are the falsework, the scaffolding. Once they’ve helped us get our work done they disappear. This kind of thing we call a “dummy variable”. If x is the actual independent variable, then pick something else — s is a good choice — for the dummy variable. It’s from the same domain as the original x, though. So the Green’s function is some . All right, but how do you find it?
To get this, you have to solve a particular special case of the differential equation. You have to solve:
This may look like we’re not getting anywhere. It may even look like we’re getting in more trouble. What is this , for example? Well, this is a particular and famous thing called the Dirac delta function. It’s called a function as a courtesy to our mathematical physics friends, who don’t care about whether it truly is a function. Dirac is Paul Dirac, from over in physics. The one whose biography is called The Strangest Man. His delta function is a strange function. Let me say that its independent variable is t. Then is zero, unless t is itself zero. If t is zero then is … something. What is that something? … Oh … something big. It’s … just … don’t look directly at it. What’s important is the integral of this function:
I write it this way because there’s delta functions for two-dimensional spaces, three-dimensional spaces, everything. If you integrate over a region that includes the origin, the integral of the delta function is 1. If you integrate over a region that doesn’t, the integral of the delta function is 0.
The delta function has a neat property sometimes called filtering. This is what happens if you integrate some function times the Dirac delta function. Then …
This may look dumb. That’s fine. This scheme is so good at getting rid of integrals where you don’t want them. Or at getting integrals in where it’d be convenient to have.
So, I have a mental model of what the Dirac delta function does. It might help you. Think of beating a drum. It can sound like many different things. It depends on how hard you hit it, how fast you hit it, what kind of stick you use, where exactly you hit it. I think of each differential equation as a different drumhead. The Green’s function is then the sound of a specific, uniform, reference hit at a reference position. This produces a sound. I can use that sound to extrapolate how every different sort of drumming would sound on this particular drumhead.
So solving this one differential equation, to find the Green’s function for a particular case, may be hard. Maybe not. Often it’s easier than some particular f(x) because the Dirac delta function is so weird that it becomes kinda easy-ish. But you do have to find one solution to this differential equation, somehow.
Once you do, though? Once you have this ? That is glorious. Because then, whatever your f is? The solution to is:
Here the integral is over whatever the domain of the differential equation is, and whatever the domain of f is. This last integral is where the dummy variable finally evaporates. All that remains is x, as we want.
A little bit of … arithmetic isn’t the right word. But symbol manipulation will convince you this is right, if you need convincing. (The trick is remembering that ‘x’ and ‘s’ are different variables. When you differentiate with respect to ‘x’, ‘s’ acts like a constant. When you integrate with respect to ‘s’, ‘x’ acts like a constant.)
What can make a Green’s function worth finding is that we do a lot of the same kinds of differential equations. We do a lot of diffusion problems. A lot of wave transmission problems. A lot of wave-transmission-with-losses problems. So there are many problems that can all use the same tools to solve.
Consider remote detection problems. This can include things like finding things underground. It also includes, like, medical sensors. We would like to know “what kind of thing produces a signal like this?” We can detect the signal easily enough. We can model how whatever it is between the thing and our sensors changes what we could detect. (This kind of thing we call an “inverse problem”, finding the thing that could produce what we know.) Green’s functions are one of the ways we can get at the source of what we can see.
Now, Green’s functions are a powerful and useful idea. They sprawl over a lot of mathematical applications. As they do, they pick up regional dialects. Things like deciding that , for example. None of these are significant differences. But before you go poking into someone else’s field and solving their problems, take a moment. Double-check that their symbols do mean precisely what you think they mean. It’ll save you some petty quarrels.
Also, I really don’t like how those systems of equations turned out up at the top of this essay. But I couldn’t work out how to do arrays of equations all lined up along the equals sign, or other mildly advanced LaTeX stuff like doing a function-definition-by-cases. If someone knows of the Real Official Proper List of what you can and can’t do with the LaTeX that comes from a standard free WordPress.com blog I’d appreciate a heads-up. Thank you.
Fourier series are named for Jean-Baptiste Joseph Fourier, and are maybe the greatest example of the theory that’s brilliantly wrong. Anyone can be wrong about something. There’s genius in being wrong in a way that gives us good new insights into things. Fourier series were developed to understand how the fluid we call “heat” flows through and between objects. Heat is not a fluid. So what? Pretending it’s a fluid gives us good, accurate results. More, you don’t need to use Fourier series to work with a fluid. Or a thing you’re pretending is a fluid. It works for lots of stuff. The Fourier series method challenged assumptions mathematicians had made about how functions worked, how continuity worked, how differential equations worked. These problems could be sorted out. It took a lot of work. It challenged and expended our ideas of functions.
Fourier also managed to hold political offices in France during the Revolution, the Consulate, the Empire, the Bourbon Restoration, the Hundred Days, and the Second Bourbon Restoration without getting killed for his efforts. If nothing else this shows the depth of his talents.
The weirdness of the setup: you want to think of functions as points in space. The allegory is rather close. Think of the common association between a point in space and the coordinates that describe that point. Pretend those are the same thing. Then you can do stuff like add points together. That is, take the coordinates of both points. Add the corresponding coordinates together. Match that sum-of-coordinates to a point. This gives us the “sum” of two points. You can subtract points from one another, again by going through their coordinates. Multiply a point by a constant and get a new point. Find the angle between two points. (This is the angle formed by the line segments connecting the origin and both points.)
Functions can work like this. You can add functions together and get a new function. Subtract one function from another. Multiply a function by a constant. It’s even possible to describe an “angle” between two functions. Mathematicians usually call that the dot product or the inner product. But we will sometimes call two functions “orthogonal”. That means the ordinary everyday meaning of “orthogonal”, if anyone said “orthogonal” in ordinary everyday life.
We can take equations of a bunch of variables and solve them. Call the values of that solution the coordinates of a point. Then we talk about finding the point where something interesting happens. Or the points where something interesting happens. We can do the same with differential equations. This is finding a point in the space of functions that makes the equation true. Maybe a set of points. So we can find a function or a family of functions solving the differential equation.
You have reasons for skepticism, even if you’ll grant me treating functions as being like points in space. You might remember solving systems of equations. You need as many equations as there are dimensions of space; a two-dimensional space needs two equations. A three-dimensional space needs three equations. You might have worked four equations in four variables. You were threatened with five equations in five variables if you didn’t all settle down. You’re not sure how many dimensions of space “all the possible functions” are. It’s got to be more than the one differential equation we started with.
This is fair. The approach I’m talking about uses the original differential equation, yes. But it breaks it up into a bunch of linear equations. Enough linear equations to match the space of functions. We turn a differential equation into a set of linear equations, a matrix problem, like we know how to solve. So that settles that.
So suppose solves the differential equation. Here I’m going to pretend that the function has one independent variable. Many functions have more than this. Doesn’t matter. Everything I say here extends into two or three or more independent variables. It takes longer and uses more symbols and we don’t need that. The thing about is that we don’t know what it is, but would quite like to.
What we’re going to do is choose a reference set of functions that we do know. Let me call them going on to however many we need. It can be infinitely many. It certainly is at least up to some for some big enough whole number N. These are a set of “basis functions”. For any function we want to represent we can find a bunch of constants, called coefficients. Let me use to represent them. Any function we want is the sum of the coefficient times the matching basis function. That is, there’s some coefficients so that
is true. That summation goes on until we run out of basis functions. Or it runs on forever. This is a great way to solve linear differential equations. This is because we know the basis functions. We know everything we care to know about them. We know their derivatives. We know everything on the right-hand side except the coefficients. The coefficients matching any particular function are constants. So the derivatives of , written as the sum of coefficients times basis functions, are easy to work with. If we need second or third or more derivatives? That’s no harder to work with.
You may know something about matrix equations. That is that solving them takes freaking forever. The bigger the equation, the more forever. If you have to solve eight equations in eight unknowns? If you start now, you might finish in your lifetime. For this function space? We need dozens, hundreds, maybe thousands of equations and as many unknowns. Maybe infinitely many. So we seem to have a solution that’s great apart from how we can’t use it.
Except. What if the equations we have to solve are all easy? If we have to solve a bunch that looks like, oh, and and … well, that’ll take some time, yes. But not forever. Great idea. Is there any way to guarantee that?
It’s in the basis functions. If we pick functions that are orthogonal, or are almost orthogonal, to each other? Then we can turn the differential equation into an easy matrix problem. Not as easy as in the last paragraph. But still, not hard.
So what’s a good set of basis functions?
And here, about 800 words later than everyone was expecting, let me introduce the sine and cosine functions. Sines and cosines make great basis functions. They don’t grow without bounds. They don’t dwindle to nothing. They’re easy to differentiate. They’re easy to integrate, which is really special. Most functions are hard to integrate. We even know what they look like. They’re waves. Some have long wavelengths, some short wavelengths. But waves. And … well, it’s easy to make sets of them orthogonal.
We have to set some rules. The first is that each of these sine and cosine basis functions have a period. That is, after some time (or distance), they repeat. They might repeat before that. Most of them do, in fact. But we’re guaranteed a repeat after no longer than some period. Call that period ‘L’.
Each of these sine and cosine basis functions has to have a whole number of complete oscillations within the period L. So we can say something about the sine and cosine functions. They have to look like these:
Here ‘j’ and ‘k’ are some whole numbers. I have two sets of basis functions at work here. Don’t let that throw you. We could have labelled them all as , with some clever scheme that told us for a given k whether it represents a sine or a cosine. It’s less hard work if we have s’s and c’s. And if we have coefficients of both a’s and b’s. That is, we suppose the function is:
This, at last, is the Fourier series. Each function has its own series. A “series” is a summation. It can be of finitely many terms. It can be of infinitely many. Often infinitely many terms give more interesting stuff. Like this, for example. Oh, and there’s a bare there, not multiplied by anything more complicated. It makes life easier. It lets us see that the Fourier series for, like, 3 + f(x) is the same as the Fourier series for f(x), except for the leading term. The ½ before that makes easier some work that’s outside the scope of this essay. Accept it as one of the merry, wondrous appearances of ‘2’ in mathematics expressions.
It’s great for solving differential equations. It’s also great for encryption. The sines and the cosines are standard functions, after all. We can send all the information we need to reconstruct a function by sending the coefficients for it. This can also help us pick out signal from noise. Noise has a Fourier series that looks a particular way. If you take the coefficients for a noisy signal and remove that? You can get a good approximation of the original, noiseless, signal.
This all seems great. That’s a good time to feel skeptical. First, like, not everything we want to work with looks like waves. Suppose we need a function that looks like a parabola. It’s silly to think we can add a bunch of sines and cosines and get a parabola. Like, a parabola isn’t periodic, to start with.
So it’s not. To use Fourier series methods on something that’s not periodic, we use a clever technique: we tell a fib. We declare that the period is something bigger than we care about. Say the period is, oh, ten million years long. A hundred light-years wide. Whatever. We trust that the difference between the function we do want, and the function that we calculate, will be small. We trust that if someone ten million years from now and a hundred light-years away wishes to complain about our work, we will be out of the office that day. Letting the period L be big enough is a good reliable tool.
The other thing? Can we approximate any function as a Fourier series? Like, at least chunks of parabolas? Polynomials? Chunks of exponential growths or decays? What about sawtooth functions, that rise and fall? What about step functions, that are constant for a while and then jump up or down?
The answer to all these questions is “yes,” although drawing out the word and raising a finger to say there are some issues we have to deal with. One issue is that most of the time, we need an infinitely long series to represent a function perfectly. TThis is fine if we’re trying to prove things about functions in general rather than solve some specific problem. It’s no harder to write the sum of infinitely many terms than the sum of finitely many terms. You write an &infty; symbol instead of an N in some important places. But if we want to solve specific problems? We probably want to deal with finitely many terms. (I hedge that statement on purpose. Sometimes it turns out we can find a formula for all the infinitely many coefficients.) This will usually give us an approximation of the we want. The approximation can be as good as we want, but to get a better approximation we need more terms. Fair enough. This kind of tradeoff doesn’t seem too weird.
Another issue is in discontinuities. If jumps around? If it has some point where it’s undefined? If it has corners? Then the Fourier series has problems. Summing up sines and cosines can’t give us a sudden jump or a gap or anything. Near a discontinuity, the Fourier series will get this high-frequency wobble. A bigger jump, a bigger wobble. You may not blame the series for not representing a discontinuity. But it does mean that what is, otherwise, a pretty good match for the you want gets this region where it stops being so good a match.
That’s all right. These issues aren’t bad enough, or unpredictable enough, to keep Fourier series from being powerful tools. Even when we find problems for which sines and cosines are poor fits, we use this same approach. Describe a function we would like to know as the sums of functions we choose to work with. Fourier series are one of those ideas that helps us solve problems, and guides us to new ways to solve problems.
The oldest reason to hide a message, at least from all but select recipients. Ancient encryption methods will substitute one letter for another, or will mix up the order of letters in a message. This won’t hide a message forever. But it will slow down a person trying to decrypt the message until they decide they don’t need to know what it says. Or decide to bludgeon the message-writer into revealing the secret.
Substituting one letter for another won’t stop an eavesdropper from working out the message. Not indefinitely, anyway. There are patterns in the language. Any language, but take English as an example. A single-letter word is either ‘I’ or ‘A’. A two-letter word has a great chance of being ‘in’, ‘on’, ‘by’, ‘of’, ‘an’, or a couple other choices. Solving this is a fun pastime, for people who like this. If you need it done fast, let a computer work it out.
To hide the message better requires being cleverer. For example, you could substitue letters according to a slightly different scheme for each letter in the original message. The Vignère cipher is an example of this. I remember some books from my childhood, written in the second person. They had programs that you-the-reader could type in to live the thrill of being a child secret agent computer programmer. This encryption scheme was one of the programs used for passing on messages. We can make the plans more complicated yet, but that won’t give us better insight yet.
The objective is to turn the message into something less predictable. An encryption which turns, say, ‘the’ into ‘rgw’ will slow the reader down. But if they pay attention and notice, oh, the text also has the words ‘rgwm’, ‘rgey’, and rgwb’ turn up a lot? It’s hard not to suspect these are ‘the’, ‘them’, ‘they’, and ‘then’. If a different three-letter code is used for every appearance of ‘the’, good. If there’s a way to conceal the spaces as something else, that’s even better, if we want it harder to decrypt the message.
So the messages hardest to decrypt should be the most random. We can give randomness a precise definition. We owe it to information theory, which is the study of how to encode and successfully transmit and decode messages. In this, the information content of a message is its entropy. Yes, the same word as used to describe broken eggs and cream stirred into coffee. The entropy measures how likely each possible message is. Encryption matches the message you really want with a message of higher entropy. That is, one that’s harder to predict. Decrypting reverses that matching.
So what goes into a message? We call them words, or codewords, so we have a clear noun to use. A codeword is a string of letters from an agreed-on alphabet. The terminology draws from common ordinary language. Cryptography grew out of sending sentences.
But anything can be the letters of the alphabet. Any string of them can be a codeword. An unavoidable song from my childhood told the story of a man asking his former lover to tie a yellow ribbon around an oak tree. This is a tiny alphabet, but it only had to convey two words, signalling whether she was open to resuming their relationship. Digital computers use an alphabet of two memory states. We label them ‘0’ and ‘1’, although we could as well label them +5 and -5, or A and B, or whatever. It’s not like actual symbols are scrawled very tight into the chips. Morse code uses dots and dashes and short and long pauses. Naval signal flags have a set of shapes and patterns to represent the letters of the alphabet, as well as common or urgent messages. There is not a single universally correct number of letters or length of words for encryption. It depends on what the code will be used for, and how.
Naval signal flags help me to my next point. There’s a single pattern which, if shown, communicates the message “I require a pilot”. Another, “I am on fire and have dangerous cargo”. Still another, “All persons should report on board as the vessel is about to set to sea”. These are whole sentences; they’re encrypted into a single letter.
And this is the second great use of encryption. English — any human language — has redundancy to it. Think of the sentence “No, I’d rather not go out this evening”. It’s polite, but is there anything in it not communicated by texting back “N”? An encrypted message is, often, shorter than the original. To send a message costs something. Time, if nothing else. To send it more briefly is typically better.
There are dangers to this. Strike out any word from “No, I’d rather not go out this evening”. Ask someone to guess what belongs there. Only the extroverts will have trouble. I guess if you strike out “evening” people might guess “today” or “tomorrow” or something. The sentiment of the sentence remains.
But strike out a letter from “N” and ask someone to guess what was meant. And this is a danger of encryption. The encrypted message has a higher entropy, a higher unpredictability. If some mistake happens in transmission, we’re lost.
We can fight this. It’s possible to build checks into an encryption. To carry a bit of extra information that lets one know that the message was garbled. These are “error-detecting codes”. It’s even possible to carry enough extra information to correct some errors. These are “error-correcting codes”. There are limits, of course. This kind of error-correcting takes calculation time and message space. We lose some economy but gain reliability. There is a general lesson in this.
And not everything can compress. There are (if I’m reading this right) 26 letter, 10 numeral, and four repeater flags used under the International Code of Symbols. So there are at most 40 signals that could be reduced to a single flag. If we need to communicate “I am on fire but have no dangerous cargo” we’re at a loss. We have to spell things out more. It’s a quick proof, by way of the pigeonhole principle, which tells us that not every message can compress. But this is all right. There are many messages we will never need to send. (“I am on fire and my cargo needs updates on Funky Winkerbean.”) If it’s mostly those that have no compressed version, who cares?
Encryption schemes are almost as flexible as language itself. There are families of kinds of schemes. This lets us fit schemes to needs: how many different messages do we need to be able to send? How sure do we need to be that errors are corrected? Or that errors are detected? How hard do we want it to be for eavesdroppers to decode the message? Are we able to set up information with the intended recipients separately? What we need, and what we are willing to do without, guide the scheme we use.
We’re only in the third week of the Fall 2019 Mathematics A-to-Z, but this is when I should be nailing down topics for the next several letters. So again, I ask you kind readers for suggestions. I’ve done five A-to-Z sequences before, from 2015 through 2018, and am listing the essays I’ve already written for the middle part of the alphabet. I’m open to revisiting topics, if I think I can improve on what I already wrote. But I reserve the right to use whatever topic feels most interesting to me.
To suggest anything for the letters I through N please leave the comment here. Also do please let me know if you have a mathematics blog, a Twitter or Mathstodon account, a YouTube channel, or anything else that you’d like to share.
I thank you again you for any thoughts you have. Please ask if there are any questions. I hope to be open to topics in any field of mathematics, including ones I don’t really know. The fun and terror of writing about a thing I’m only learning about is part of what I get from this kind of project.