Today’s A To Z term is another free choice. So I’m picking a term from the world of … mathematics. There are a lot of norms out there. Many are specialized to particular roles, such as looking at complex-valued numbers, or vectors, or matrices, or polynomials.
Still they share things in common, and that’s what this essay is for. And I’ve brushed up against the topic before.
The norm, also, has nothing particular to do with “normal”. “Normal” is an adjective which attaches to every noun in mathematics. This is security for me as while these A-To-Z sequences may run out of X and Y and W letters, I will never be short of N’s.
A “norm” is the size of whatever kind of thing you’re working with. You can see where this is something we look for. It’s easy to look at two things and wonder which is the smaller.
There are many norms, even for one set of things. Some seem compelling. For the real numbers, we usually let the absolute value do this work. By “usually” I mean “I don’t remember ever seeing a different one except from someone introducing the idea of other norms”. For a complex-valued number, it’s usually the square root of the sum of the square of the real part and the square of the imaginary coefficient. For a vector, it’s usually the square root of the vector dot-product with itself. (Dot product is this binary operation that is like multiplication, if you squint, for vectors.) Again, these, the “usually” means “always except when someone’s trying to make a point”.
Which is why we have the convention that there is a “the norm” for a kind of operation. The norm dignified as “the” is usually the one that looks as much as possible like the way we find distances between two points on a plane. I assume this is because we bring our intuition about everyday geometry to mathematical structures. You know how it is. Given an infinity of possible choices we take the one that seems least difficult.
Every sort of thing which can have a norm, that I can think of, is a vector space. This might be my failing imagination. It may also be that it’s quite easy to have a vector space. A vector space is a collection of things with some rules. Those rules are about adding the things inside the vector space, and multiplying the things in the vector space by scalars. These rules are not difficult requirements to meet. So a lot of mathematical structures are vector spaces, and the things inside them are vectors.
A norm is a function that has these vectors as its domain, and the non-negative real numbers as its range. And there are three rules that it has to meet. So. Give me a vector ‘u’ and a vector ‘v’. I’ll also need a scalar, ‘a. Then the function f is a norm when:
. This is a famous rule, called the triangle inequality. You know how in a triangle, the sum of the lengths of any two legs is greater than the length of the third leg? That’s the rule at work here.
. This doesn’t have so snappy a name. Sorry. It’s something about being homogeneous, at least.
If then u has to be the additive identity, the vector that works like zero does.
Norms take on many shapes. They depend on the kind of thing we measure, and what we find interesting about those things. Some are familiar. Look at a Euclidean space, with Cartesian coordinates, so that we might write something like (3, 4) to describe a point. The “the norm” for this, called the Euclidean norm or the L2 norm, is the square root of the sum of the squares of the coordinates. So, 5. But there are other norms. The L1 norm is the sum of the absolute values of all the coefficients; here, 7. The L∞ norm is the largest single absolute value of any coefficient; here, 4.
A polynomial, meanwhile? Write it out as . Take the absolute value of each of these terms. Then … you have choices. You could take those absolute values and add them up. That’s the L1 polynomial norm. Take those absolute values and square them, then add those squares, and take the square root of that sum. That’s the L2 norm. Take the largest absolute value of any of these coefficients. That’s the L∞ norm.
These don’t look so different, even though points in space and polynomials seem to be different things. We designed the tool. We want it not to be weirder than it has to be. When we try to put a norm on a new kind of thing, we look for a norm that resembles the old kind of thing. For example, when we want to define the norm of a matrix, we’ll typically rely on a norm we’ve already found for a vector. At least to set up the matrix norm; in practice, we might do a calculation that doesn’t explicitly use a vector’s norm, but gives us the same answer.
If we have a norm for some vector space, then we have an idea of distance. We can say how far apart two vectors are. It’s the norm of the difference between the vectors. This is called defining a metric on the vector space. A metric is that sense of how far apart two things are. What keeps a norm and a metric from being the same thing is that it’s possible to come up with a metric that doesn’t match any sensible norm.
It’s always possible to use a norm to define a metric, though. Doing that promotes our normed vector space to the dignified status of a “metric space”. Many of the spaces we find interesting enough to work in are such metric spaces. It’s hard to think of doing without some idea of size.
Today’s A To Z term was nominated again by @aajohannas. The other compelling nomination was from Vayuputrii, for the Mittag-Leffler function. I was tempted. But I realized I could not think of a clear way to describe why the function was interesting. Or even where it comes from that avoided being a heap of technical terms. There’s no avoiding technical terms in writing about mathematics, but there’s only so much I want to put in at once either. It also makes me realize I don’t understand the Mittag-Leffler function, but it is after all something I haven’t worked much with.
The Mittag-Leffler function looks like it’s one of those things named for several contributors, like Runge-Kutta Integration or Cauchy-Kovalevskaya Theorem or something. Not so here; this was one person, Gösta Mittag-Leffler. His name’s all over the theory of functions. And he was one of the people helping Sofia Kovalevskaya, whom you know from every list of pioneering women in mathematics, secure her professorship.
A martingale is how mathematicians prove you can’t get rich gambling.
Well, that exaggerates. Some people will be lucky, of course. But there’s no strategy that works. The only strategy that works is to rig the game. You can do this openly, by setting rules that give you a slight edge. You usually have to be the house to do this. Or you can do it covertly, using tricks like card-counting (in blackjack) or weighted dice or other tricks. But a fair game? Meaning one not biased towards or against any player? There’s no strategy to guarantee winning that.
We can make this more technical. Martingales arise form the world of stochastic processes. This is an indexed set of random variables. A random variable is some variable with a value that depends on the result of some phenomenon. A tossed coin. Rolled dice. Number of people crossing a particular walkway over a day. Engine temperature. Value of a stock being traded. Whatever. We can’t forecast what the next value will be. But we now the distribution, which values are more likely and which ones are unlikely and which ones impossible.
The field grew out of studying real-world phenomena. Things we could sample and do statistics on. So it’s hard to think of an index that isn’t time, or some proxy for time like “rolls of the dice”. Stochastic processes turn up all over the place. A lot of what we want to know is impossible, or at least impractical, to exactly forecast. Think of the work needed to forecast how many people will cross this particular walk four days from now. But it’s practical to describe what are more and less likely outcomes. What the average number of walk-crossers will be. What the most likely number will be. Whether to expect tomorrow to be a busier or a slower day.
And this is what the martingale is for. Start with a sequence of your random variables. How many people have crossed that street each day since you started studying. What is the expectation value, the best guess, for the next result? Your best guess for how many will cross tomorrow? Keeping in mind your knowledge of how all these past values. That’s an important piece. It’s not a martingale if the history of results isn’t a factor.
Every probability question has to deal with knowledge. Sometimes it’s easy. The probability of a coin coming up tails next toss? That’s one-half. The probability of a coin coming up tails next toss, given that it came up tails last time? That’s still one-half. The probability of a coin coming up tails next toss, given that it came up tails the last 40 tosses? That’s … starting to make you wonder if this is a fair coin. I’d bet tails, but I’d also ask to examine both sides, for a start.
So a martingale is a stochastic process where we can make forecasts about the future. Particularly, the expectation value. The expectation value is the sum of the products of every possible value and how probable they are. In a martingale, the expected value for all time to come is just the current value. So if whatever it was you’re measuring was, say, 40 this time? That’s your expectation for the whole future. Specific values might be above 40, or below 40, but on average, 40 is it.
Put it that way and you’d think, well, how often does that ever happen? Maybe some freak process will give you that, but most stuff?
Well, here’s one. The random walk. Set a value. At each step, it can increase or decrease by some fixed value. It’s as likely to increase as to decrease. This is a martingale. And it turns out a lot of stuff is random walks. Or can be processed into random walks. Even if the original walk is unbalanced — say it’s more likely to increase than decrease. Then we can do a transformation, and find a new random variable based on the original. Then that one is as likely to increase as decrease. That one is a martingale.
It’s not just random walks. Poisson processes are things where the chance of something happening is tiny, but it has lots of chances to happen. So this measures things like how many car accidents happen on this stretch of road each week. Or where a couple plants will grow together into a forest, as opposed to lone trees. How often a store will have too many customers for the cashiers on hand. These processes by themselves aren’t often martingales. But we can use them to make a new stochastic process, and that one is a martingale.
Where this all comes to gambling is in stopping times. This is a random variable that’s based on the stochastic process you started with. Its value at each index represents the probability that the random variable in that has reached some particular value by this index. The language evokes a gambler’s decision: when do you stop? There are two obvious stopping times for any game. One is to stop when you’ve won enough money. The other is to stop when you’ve lost your whole stake.
So there is something interesting about a martingale that has bounds. It will almost certainly hit at least one of those bounds, in a finite time. (“Almost certainly” has a technical meaning. It’s the same thing I mean when I say if you flip a fair coin infinitely many times then “almost certainly” it’ll come up tails at least once. Like, it’s not impossible that it doesn’t. It just won’t happen.) And for the gambler? The boundary of “runs out of money” is a lot closer than “makes the house run out of money”.
Oh, if you just want a little payoff, that’s fine. If you’re happy to walk away from the table with a one percent profit? You can probably do that. You’re closer to that boundary than to the runs-out-of-money one. A ten percent profit? Maybe so. Making an unlimited amount of money, like you’d want to live on your gambling winnings? No, that just doesn’t happen.
This gets controversial when we turn from gambling to the stock market. Or a lot of financial mathematics. Look at the value of a stock over time. I write “stock” for my convenience. It can be anything with a price that’s constantly open for renegotiation. Stocks, bonds, exchange funds, used cars, fish at the market, anything. The price over time looks like it’s random, at least hour-by-hour. So how can you reliably make money if the fluctuations of the price of a stock are random?
Well, if I knew, I’d have smaller student loans outstanding. But martingales seem like they should offer some guidance. Much of modern finance builds on not dealing with a stock price varying. Instead, buy the right to buy the stock at a set price. Or buy the right to sell the stock at a set price. This lets you pay to secure a certain profit, or a worst-possible loss, in case the price reaches some level. And now you see the martingale. Is it likely that the stock will reach a certain price within this set time? How likely? This can, in principle, guide you to a fair price for this right-to-buy.
The mathematical reasoning behind that is fine, so far as I understand it. Trouble arises because pricing correctly means having a good understanding of how likely it is prices will reach different levels. Fortunately, there are few things humans are better at than estimating probabilities. Especially the probabilities of complicated situations, with abstract and remote dangers.
So martingales are an interesting corner of mathematics. They apply to purely abstract problems like random walks. Or to good mathematical physics problems like Brownian motion and the diffusion of particles. And they’re lurking behind the scenes of the finance news. Exciting stuff.
I’m hopefully going to pass the halfway point on this year’s mathematics A-To-Z. This makes it a good time to panel for topics for the next several letters in the alphabet. It’s easier for me to keep my notes straight if you post requests as comments on this thread, but I’ll try to keep up if you do comment on other threads.
As ever, I’m happy to consider most mathematical topics, including ones that I’ve written about in the past if I think I can better an old essay. If there’s several suggestions for the same letter, I’ll pick the one that I think I can do most interestingly. If several seem interesting I might try rephrasing, if the subject allows for that.
And I do thank everyone who makes a suggestion, especially if it’s one that surprises me and that makes me learn something along the way.
Here’s the essays I’ve written in past years for the letters O through T.
I couldn’t find a place to fit this in the essay proper. But it’s too good to leave out. The simplex method, discussed within, traces to George Dantzig. He’d been planning methods for the US Army Air Force during the Second World War. Dantzig is a person you have heard about, if you’ve heard any mathematical urban legends. In 1939 he was late to Jerzy Neyman’s class. He took two statistics problems on the board to be homework. He found them “harder than usual”, but solved them in a couple days and turned in the late homework hoping Neyman would be understanding. They weren’t homework. They were examples of famously unsolved problems. Within weeks Neyman had written one of the solutions up for publication. When he needed a thesis topic Neyman advised him to just put what he already had in a binder. It’s the stuff every grad student dreams of. The story mutated. It picked up some glurge to become a narrative about positive thinking. And mutated further, into the movie Good Will Hunting.
Every three days one of the comic strips I read has the elderly main character talk about how they never used algebra. This is my hyperbole. But mathematics has got the reputation for being difficult and inapplicable to everyday life. We’ll concede using arithmetic, when we get angry at the fast food cashier who hands back our two pennies before giving change for our $6.77 hummus wrap. But otherwise, who knows what an elliptic integral is, and whether it’s working properly?
Linear programming does not have this problem. In part, this is because it lacks a reputation. But those who have heard of it, acknowledge it as immensely practical mathematics. It is about something a particular kind of human always finds compelling. That is how to do a thing best.
There are several kinds of “best”. There is doing a thing in as little time as possible. Or for as little effort as possible. For the greatest profit. For the highest capacity. For the best score. For the least risk. The goals have a thousand names, none of which we need to know. They all mean the same thing. They mean “the thing we wish to optimize”. To optimize has two directions, which are one. The optimum is either the maximum or the minimum. To be good at finding a maximum is to be good at finding a minimum.
It’s obvious why we call this “programming”; obviously, we leave the work of finding answers to a computer. It’s a spurious reason. The “programming” here comes from an independent sense of the word. It means more about finding a plan. Think of “programming” a night’s entertainment, so that every performer gets their turn, all scene changes have time to be done, you don’t put two comedians right after the other, and you accommodate the performer who has to leave early and the performer who’ll get in an hour late. Linear programming problems are often about finding how to do as well as possible given various priorities. All right. At least the “linear” part is obvious. A mathematics problem is “linear” when it’s something we can reasonably expect to solve. This is not the technical meaning. Technically what it means is we’re looking at a function something like:
Here, x, y, and z are the independent variables. We don’t know their values but wish to. a, b, and c are coefficients. These values are set to some constant for the problem, but they might be something else for other problems. They’re allowed to be positive or negative or even zero. If a coefficient is zero, then the matching variable doesn’t affect matters at all. The corresponding value can be anything at all, within the constraints.
I’ve written this for three variables, as an example and because ‘x’ and ‘y’ and ‘z’ are comfortable, familiar variables. There can be fewer. There can be more. There almost always are. Two- and three-variable problems will teach you how to do this kind of problem. They’re too simple to be interesting, usually. To avoid committing to a particular number of variables we can use indices. for values of j from 1 up to N. Or we can bundle all these values together into a vector, and write everything as . This has a particular advantage since when we can write the coefficients as a vector too. Then we use the notation of linear algebra, and write that we hope to maximize the value of:
(The superscript T means “transpose”. As a linear algebra problem we’d usually think of writing a vector as a tall column of things. By transposing that we write a long row of things. By transposing we can use the notation of matrix multiplication.)
This is the objective function. Objective here in the sense of goal; it’s the thing we want to find the best possible value of.
We have constraints. These represent limits on the variables. The variables are always things that come in limited supply. There’s no allocating more money than the budget allows, nor putting more people on staff than work for the company. Often these constraints interact. Perhaps not only is there only so much staff, but no one person can work more than a set number of days in a row. Something like that. That’s all right. We can write all these constraints as a matrix equation. An inequality, properly. We can bundle all the constraints into a big matrix named A, and demand:
Also, traditionally, we suppose that every component of is non-negative. That is, positive, or at lowest, zero. This reflects the field’s core problems of figuring how to allocate resources. There’s no allocating less than zero of something.
But we need some bounds. This is easiest to see with a two-dimensional problem. Try it yourself: draw a pair of axes on a sheet of paper. Now put in a constraint. Doesn’t matter what. The constraint’s edge is a straight line, which you can draw at any position and any angle you like. This includes horizontal and vertical. Shade in one side of the constraint. Whatever you shade in is the “feasible region”, the sets of values allowed under the constraint. Now draw in another line, another constraint. Shade in one side or the other of that. Draw in yet another line, another constraint. Shade in one side or another of that. The “feasible region” is whatever points have taken on all these shades. If you were lucky, this is a bounded region, a triangle. If you weren’t lucky, it’s not bounded. It’s maybe got some corners but goes off to the edge of the page where you stopped shading things in.
So adding that every component of is at least as big as zero is a backstop. It means we’ll usually get a feasible region with a finite volume. What was the last project you worked on that had no upper limits for anything, just minimums you had to satisfy? Anyway if you know you need something to be allowed less than zero go ahead. We’ll work it out. The important thing is there’s finite bounds on all the variables.
I didn’t see the bounds you drew. It’s possible you have a triangle with all three shades inside. But it’s also possible you picked the other sides to shade, and you have an annulus, with no region having more than two shades in it. This can happen. It means it’s impossible to satisfy all the constraints at once. At least one of them has to give. You may be reminded of the sign taped to the wall of your mechanics’ about picking two of good-fast-cheap.
But impossibility is at least easy. What if there is a feasible region?
Well, we have reason to hope. The optimum has to be somewhere inside the region, that’s clear enough. And it even has to be on the edge of the region. If you’re not seeing why, think of a simple example, like, finding the maximum of , inside the square where x is between 0 and 2 and y is between 0 and 3. Suppose you had a putative maximum on the inside, like, where x was 1 and y was 2. What happens if you increase x a tiny bit? If you increase y by twice that? No, it’s only on the edges you can get a maximum that can’t be locally bettered. And only on the corners of the edges, at that.
(This doesn’t prove the case. But it is what the proof gets at.)
So the problem sounds simple then! We just have to try out all the vertices and pick the maximum (or minimum) from them all.
OK, and here’s where we start getting into trouble. With two variables and, like, three constraints? That’s easy enough. That’s like five points to evaluate? We can do that.
We never need to do that. If someone’s hiring you to test five combinations I admire your hustle and need you to start getting me consulting work. A real problem will have many variables and many constraints. The feasible region will most often look like a multifaceted gemstone. It’ll extend into more than three dimensions, usually. It’s all right if you just imagine the three, as long as the gemstone is complicated enough.
Because now we’ve got lots of vertices. Maybe more than we really want to deal with. So what’s there to do?
The basic approach, the one that’s foundational to the field, is the simplex method. A “simplex” is a triangle. In three dimensions, anyway. In four dimensions it’s a tetrahedron. In two dimensions it’s a line segment. Generally, however many dimensions of space you have? The simplex is the simplest thing that fills up volume in your space.
You know how you can turn any polygon into a bunch of triangles? Just by connecting enough vertices together? You can turn a polyhedron into a bunch of tetrahedrons, by adding faces that connect trios of vertices. And for polyhedron-like shapes in more dimensions? We call those polytopes. Polytopes we can turn into a bunch of simplexes. So this is why it’s the “simplex method”. Any one simplex it’s easy to check the vertices on. And we can turn the polytope into a bunch of simplexes. And we can ignore all the interior vertices of the simplexes.
So here’s the simplex method. First, break your polytope up into simplexes. Next, pick any simplex; doesn’t matter which. Pick any outside vertex of that simplex. This is the first viable possible solution. It’s most likely wrong. That’s okay. We’ll make it better.
Because there are other vertices on this simplex. And there are other simplexes, adjacent to that first, which share this vertex. Test the vertices that share an edge with this one. Is there one that improves the objective function? Probably. Is there a best one of those in this simplex? Sure. So now that’s our second viable possible solution. If we had to give an answer right now, that would be our best guess.
But this new vertex, this new tentative solution? It shares edges with other vertices, across several simplexes. So look at these new neighbors. Are any of them an improvement? Which one of them is the best improvement? Move over there. That’s our next tentative solution.
You see where this is going. Keep at this. Eventually it’ll wind to a conclusion. Usually this works great. If you have, like, 8 constraints, you can usually expect to get your answer in from 16 to 24 iterations. If you have 20 constraints, expect an answer in from 40 to 60 iterations. This is doing pretty well.
But it might take a while. It’s possible for the method to “stall” a while, often because one or more of the variables is at its constraint boundary. Or the division of polytope into simplexes got unlucky, and it’s hard to get to better solutions. Or there might be a string of vertices that are all at, or near, the same value, so the simplex method can’t resolve where to “go” next. In the worst possible case, the simplex method takes a number of iterations that grows exponentially with the number of constraints. This, yes, is very bad. It doesn’t typically happen. It’s a numerical algorithm. There’s some problem to spoil any numerical algorithm.
You may have complaints. Like, the world is complicated. Why are we only looking at linear objective functions? Or, why only look at linear constraints? Well, if you really need to do that? Go ahead, but that’s not linear programming anymore. Think hard about whether you really need that, though. Linear anything is usually simpler than nonlinear anything. I mean, if your optimization function absolutely has to have in it? Could we just say you have a new variable that just happens to be equal to the square of y? Will that work? If you have to have the sine of z? Are you sure that z isn’t going to get outside the region where the sine of z is pretty close to just being z? Can you check?
Maybe you have, and there’s just nothing for it. That’s all right. This is why optimization is a living field of study. It demands judgement and thought and all that hard work.
I’m more fluent in graph theory, and my writing will reflect that. But its critical insight involves looking at spaces and ignoring things like distance and area and angle. It is amazing that one can discard so much of geometry and still have anything to consider. What we do learn then applies to very many problems.
Königsberg Bridge Problem.
Once upon a time there was a city named Königsberg. It no longer is. It is Kaliningrad now. It’s no longer in that odd non-contiguous chunk of Prussia facing the Baltic Sea. It’s now in that odd non-contiguous chunk of Russia facing the Baltic Sea.
I put it this way because what the city evokes, to mathematicians, is a story. I do not have specific reason to think the story untrue. But it is a good story, and as I think more about history I grow more skeptical of good stories. A good story teaches, though not always the thing it means to convey.
The story is this. The city is on two sides of the Pregel river, now the Pregolya River. Two large islands are in the river. For several centuries these four land masses were connected by a total of seven bridges. And we are told that people in the city would enjoy free time with an idle puzzle. Was there a way to walk all seven bridges one and only one time? If no one did something fowl like taking a boat to cross the river, or not going the whole way across a bridge, anyway? There were enough bridges, though, and enough possible ways to cross them, that trying out every option was hopeless.
Then came Leonhard Euler. Who is himself a preposterous number of stories. Pick any major field of mathematics; there is an Euler’s Theorem at its center. Or an Euler’s Formula. Euler’s Method. Euler’s Function. Likely he brought great new light to it.
And in 1736 he solved the Königsberg Bridge Problem. The answer was to look at what would have to be true for a solution to exist. He noticed something so obvious it required genius not to dismiss it. It seems too simple to be useful. In a successful walk you enter each land mass (river bank or island) the same number of times you leave it. So if you cross each bridge exactly once, you use an even number of bridges per land mass. The exceptions are that you must start at one land mass, and end at a land mass. Maybe a different one. How you get there doesn’t count for the problem. How you leave doesn’t either. So the land mass you start from may have an odd number of bridges. So may the one you end on. So there are up to two land masses that may have an odd number of bridges.
Once this is observed, it’s easy to tell that Königsberg’s Bridges did not match that. All four land masses in Königsberg have an odd number of bridges. And so we could stop looking. It’s impossible to walk the seven bridges exactly once each in a tour, not without cheating.
Graph theoreticians, like the topologists of my prologue, now consider this foundational to their field. To look at a geographic problem and not concern oneself with areas and surfaces and shapes? To worry only about how sets connect? This guides graph theory in how to think about networks.
The city exists, as do the islands, and the bridges existed as described. So does Euler’s solution. And his reasoning is sound. The reasoning is ingenious, too. Everything hard about the problem evaporates. So what do I doubt about this fine story?
Teo Paoletti, author of that web page, says Danzig mayor Carl Leonhard Gottlieb Ehler wrote Euler, asking for a solution. This falls short of proving that the bridges were a common subject of speculation. It does show at least that Ehler thought it worth pondering. Euler apparently did not think it was even mathematics. Not that he thought it was hard; he simply thought it didn’t depend on mathematical principles. It took only reason. But he did find something interesting: why was it not mathematics? Paoletti quotes Euler as writing:
This question is so banal, but seemed to me worthy of attention in that [neither] geometry, nor algebra, nor even the art of counting was sufficient to solve it.
I am reminded of a mathematical joke. It’s about the professor who always went on at great length about any topic, however slight. I have no idea why this should stick with me. Finally one day the professor admitted of something, “This problem is not interesting.” The students barely had time to feel relief. The professor went on: “But the reasons why it is not interesting are very interesting. So let us explore that.”
The Königsberg Bridge Problem is in the first chapter of every graph theory book ever. And it is a good graph theory problem. It may not be fair to say it created graph theory, though. Euler seems to have treated this as a little side bit of business, unrelated to his real mathematics. Graph theory as we know it — as a genre — formed in the 19th century. So did topology. In hindsight we can see how studying these bridges brought us good questions to ask, and ways to solve them. But for something like a century after Euler published this, it was just the clever solution to a recreational mathematics puzzle. It was as important as finding knight’s tours of chessboards.
That we take it as the introduction to graph theory, and maybe topology, tells us something. It is an easy problem to pose. Its solution is clever, but not obscure. It takes no long chains of complex reasoning. Many people approach mathematics problems with fear. By telling this story, we promise mathematics that feels as secure as a stroll along the riverfront. This promise is good through about chapter three, section four, where there are four definitions on one page and the notation summons obscure demons of LaTeX.
Still. Look at what the story of the bridges tells us. We notice something curious about our environment. The problem seems mathematical, or at least geographic. The problem is of no consequence. But it lingers in the mind. The obvious approaches to solving it won’t work. But think of the problem differently. The problem becomes simple. And better than simple. It guides one to new insights. In a century it gives birth to two fields of mathematics. In two centuries these are significant fields. They’re things even non-mathematicians have heard of. It’s almost a mathematician’s fantasy of insight and accomplishment.
But this does happen. The world suggests no end of little mathematics problems. Sometimes they are wonderful. Richard Feynman’s memoirs tell of his imagination being captured by a plate spinning in the air. Solving that helped him resolve a problem in developing Quantum Electrodynamics. There are more mundane problems. One of my professors in grad school remembered tossing and catching a tennis racket and realizing he didn’t know why sometimes it flipped over and sometimes didn’t. His specialty was in dynamical systems, and he could work out the mechanics of what a tennis racket should do, and when. And I know that within me is the ability to work out when a pile of books becomes too tall to stand on its own. I just need to work up to it.
The story of the Königsberg Bridge Problem is about this. Even if nobody but the mayor of Danzig pondered how to cross the bridges, and he only got an answer because he infected Euler with the need to know? It is a story of an important piece of mathematics. Good stories will tell us things that are true, which are not necessarily the things that happen in them.
Today’s A To Z term is my pick again. So I choose the Julia Set. This is named for Gaston Julia, one of the pioneers in chaos theory and fractals. He was born earlier than you imagine. No, earlier than that: he was born in 1893.
The early 20th century saw amazing work done. We think of chaos theory and fractals as modern things, things that require vast computing power to understand. The computers help, yes. But the foundational work was done more than a century ago. Some of these pioneering mathematicians may have been able to get some numerical computing done. But many did not. They would have to do the hard work of thinking about things which they could not visualize. Things which surely did not look like they imagined.
We think of things as moving. Even static things we consider as implying movement. Else we’d think it odd to ask, “Where does that road go?” This carries over to abstract things, like mathematical functions. A function is a domain, a range, and a rule matching things in the domain to things in the range. It “moves” things as much as a dictionary moves words.
Yet we still think of a function as expressing motion. A common way for mathematicians to write functions uses little arrows, and describes what’s done as “mapping”. We might write . This is a general idea. We’re expressing that it maps things in the set D to things in the set R. We can use the notation to write something more specific. If ‘z’ is in the set D, we might write . This describes the rule that matches things in the domain to things in the range. represents the evaluation of this rule at a specific point, the one where the independent variable has the value ‘2’. represents the evaluation of this rule at a specific point without committing to what that point is. represents a collection of points. It’s the set you get by evaluating the rule at every point in D.
And it’s not bad to think of motion. Many functions are models of things that move. Particles in space. Fluids in a room. Populations changing in time. Signal strengths varying with a sensor’s position. Often we’ll calculate the development of something iteratively, too. If the domain and the range of a function are the same set? There’s no reason that we can’t take our z, evaluate f(z), and then take whatever that thing is and evaluate f(f(z)). And again. And again.
My age cohort, at least, learned to do this almost instinctively when we discovered you could take the result on a calculator and hit a function again. Calculate something and keep hitting square root; you get a string of numbers that eventually settle on 1. Or you started at zero. Calculate something and keep hitting square; you settle at either 0, 1, or grow to infinity. Hitting sine over and over … well, that was interesting since you might settle on 0 or some other, weird number. Same with tangent. Cosine you wouldn’t settle down to zero.
Serious mathematicians look at this stuff too, though. Take any set ‘D’, and find what its image is, f(D). Then iterate this, figuring out what f(f(D)) is. Then f(f(f(D))). f(f(f(f(D)))). And so on. What happens if you keep doing this? Like, forever?
We can say some things, at least. Even without knowing what f is. There could be a part of D that all these many iterations of f will send out to infinity. There could be a part of D that all these many iterations will send to some fixed point. And there could be a part of D that just keeps getting shuffled around without ever finishing.
Some of these might not exist. Like, doesn’t have any fixed points or shuffled-around points. It sends everything off to infinity. has only a fixed point; nothing from it goes off to infinity and nothing’s shuffled back and forth. has a fixed point and a lot of points that shuffle back and forth.
Thinking about these fixed points and these shuffling points gets us Julia Sets. These sets are the fixed points and shuffling-around points for certain kinds of functions. These functions are ones that have domain and range of the complex-valued numbers. Complex-valued numbers are the sum of a real number plus an imaginary number. A real number is just what it says on the tin. An imaginary number is a real number multiplied by . What is ? It’s the imaginary unit. It has the neat property that . That’s all we need to know about it.
Oh, also, zero times is zero again. So if you really want, you can say all real numbers are complex numbers; they’re just themselves plus . Complex-valued functions are worth a lot of study in their own right. Better, they’re easier to study (at the introductory level) than real-valued functions are. This is such a relief to the mathematics major.
And now let me explain some little nagging weird thing. I’ve been using ‘z’ to represent the independent variable here. You know, using it as if it were ‘x’. This is a convention mathematicians use, when working with complex-valued numbers. An arbitrary complex-valued number tends to be called ‘z’. We haven’t forgotten x, though. We just in this context use ‘x’ to mean “the real part of z”. We also use “y” to carry information about the imaginary part of z. When we write ‘z’ we hold in trust an ‘x’ and ‘y’ for which . This all comes in handy.
But we still don’t have Julia Sets for every complex-valued function. We need it to be a rational function. The name evokes rational numbers, but that doesn’t seem like much guidance. is a rational function. It seems too boring to be worth studying, though, and it is. A “rational function” is a function that’s one polynomial divided by another polynomial. This whether they’re real-valued or complex-valued polynomials.
So. Start with an ‘f’ that’s one complex-valued polynomial divided by another complex-valued polynomial. Start with the domain D, all of the complex-valued numbers. Find f(D). And f(f(D)). And f(f(f(D))). And so on. If you iterated this ‘f’ without limit, what’s the set of points that never go off to infinity? That’s the Julia Set for that function ‘f’.
There are some famous Julia sets, though. There are the Julia sets that we heard about during the great fractal boom of the 1980s. This was when computers got cheap enough, and their graphic abilities good enough, to automate the calculation of points in these sets. At least to approximate the points in these sets. And these are based on some nice, easy-to-understand functions. First, you have to pick a constant C. This C is drawn from the complex-valued numbers. But that can still be, like, ½, if that’s what interests you. For whatever your C is? Define this function:
And that’s it. Yes, this is a rational function. The numerator function is . The denominator function is .
This produces many different patterns. If you picked C = 0, you get a circle. Good on you for starting out with something you could double-check. If you picked C = -2? You get a long skinny line, again, easy enough to check. If you picked C = -1? Well, now you have a nice interesting weird shape, several bulging ovals with peninsulas of other bulging ovals all over. Pick other numbers. Pick numbers with interesting imaginary components. You get pinwheels. You get jagged streaks of lightning. You can even get separate islands, whole clouds of disjoint threatening-looking blobs.
There is some guessing what you’ll get. If you work out a Julia Set for a particular C, you’ll see a similar-looking Julia Set for a different C that’s very close to it. This is a comfort.
You can create a Julia Set for any rational function. I’ve only ever seen anyone actually do it for functions that look like what we already had. . Sometimes . I suppose once, in high school, I might have tried but I don’t remember what it looked like. If someone’s done, say, please write in and let me know what it looks like.
The Julia Set has a famous partner. Maybe the most famous fractal of them all, the Mandelbrot Set. That’s the strange blobby sea surrounded by lightning bolts that you see on the cover of every pop mathematics book from the 80s and 90s. If a C gives us a Julia Set that’s one single, contiguous patch? Then that C is in the Mandelbrot Set. Also vice-versa.
The ideas behind these sets are old. Julia’s paper about the iterations of rational functions first appeared in 1918. Julia died in 1978, the same year that the first computer rendering of the Mandelbrot set was done. I haven’t been able to find whether that rendering existed before his death. Nor have I decided which I would think the better sequence.
Today’s A To Z term is a free pick. I didn’t notice any suggestions for a mathematics term starting with this letter. I apologize if you did submit one and I missed it. I don’t mean any insult.
What I’ve picked is a concept from analysis. I’ve described this casually as the study of why calculus works. That’s a good part of what it is. Analysis is also about why real numbers work. Later on you also get to why complex numbers and why functions work. But it’s in the courses about Real Analysis where a mathematics major can expect to find the infimum, and it’ll stick around on the analysis courses after that.
The infimum is the thing you mean when you say “lower bound”. It applies to a set of things that you can put in order. The order has to work the way less-than-or-equal-to works with whole numbers. You don’t have to have numbers to put a number-like order on things. Otherwise whoever made up the Alphabet Song was fibbing to us all. But starting out with numbers can let you get confident with the idea, and we’ll trust you can go from numbers to other stuff, in case you ever need to.
A lower bound would start out meaning what you’d imagine if you spoke English. Let me call it L. It’ll make my sentences so much easier to write. Suppose that L is less than or equal to all the elements in your set. Then, great! L is a lower bound of your set.
You see the loophole here. It’s in the article “a”. If L is a lower bound, then what about L – 1? L – 10? L – 1,000,000,000½? Yeah, they’re all lower bounds, too. There’s no end of lower bounds. And that is not what you mean be a lower bound, in everyday language. You mean “the smallest thing you have to deal with”.
But you can’t just say “well, the lower bound of a set is the smallest thing in the set”. There’s sets that don’t have a smallest thing. The iconic example is positive numbers. No positive number can be a lower bound of this. All the negative numbers are lowest bounds of this. Zero can be a lower bound of this.
For the postive numbers, it’s obvious: zero is the lower bound we want. It’s smaller than all of the positive numbers. And there’s no greater number that’s also smaller than all the positive numbers. So this is the infimum of the positive numbers. It’s the greatest lower bound of the set.
The infimum of a set may or may not be part of the original set. But. Between the infimum of a set and the infimum plus any positive number, however tiny that is? There’s always at least one thing in the set.
And there isn’t always an infimum. This is obvious if your set is, like, the set of all the integers. If there’s no lower bound at all, there can’t be a greatest lower bound. So that’s obvious enough.
Infimums turn up in a good number of proofs. There are a couple reasons they do. One is that we want to prove a boundary between two kinds of things exist. It’s lurking in the proof, for example, of the intermediate value theorem. This is the proposition that if you have a continuous function on the domain [a, b], and range of real numbers, and pick some number g that’s between f(a) and f(b)? There’ll be at least one point c, between a and b, where f(c) equals g. You can structure this: look at the set of numbers x in the domain [a, b] whose f(x) is larger than g. So what’s the infimum of this set? What does f have to be for that infimum?
It also turns up a lot in proofs about calculus. Proofs about functions, particularly, especially integrating functions. A proof like this will, generically, not deal with the original function, which might have all kinds of unpleasant aspects. Instead it’ll look at a sequence of approximations of the original function. Each approximation is chosen so it has no unpleasant aspect. And then prove that we could make arbitrarily tiny the difference between the result for the function we want and the result for the sequence of functions we make. Infimums turn up in this, since we’ll want a minimum function without being sure that the minimum is in the sequence we work with.
This is the terminology of stuff to work as lower bounds. There’s a similar terminology to work with upper bounds. The upper-bound equivalent of the infimum is the supremum. They’re abbreviated as inf and sup. The supremum turns up most every time an infimum does, and for the reasons you’d expect.
If an infimum does exist, it’s unique; there can’t be two different ones. Same with the supremum.
And things can get weird. It’s possible to have lower bounds but no infimum. This seems bizarre. This is because we’ve been relying on the real numbers to guide our intuition. And the real numbers have a useful property called being “complete”. So let me break the real numbers. Imagine the real numbers except for zero. Call that the set R’. Now look at the set of positive numbers inside R’. What’s the infimum of the positive numbers, within R’? All we can do is shrug and say there is none, even though there are plenty of lower bounds. The infimum of a set depends on the set. It also depends on what bigger set that the set is within. That something depends both on a set and what the bigger set of things is, is another thing that turns up all the time in analysis. It’s worth becoming familiar with.
Today’s A To Z term is another I drew from Mr Wu, of the Singapore Math Tuition blog. It gives me more chances to discuss differential equations and mathematical physics, too.
The Hamiltonian we name for Sir William Rowan Hamilton, the 19th century Irish mathematical physicists who worked on everything. You might have encountered his name from hearing about quaternions. Or for coining the terms “scalar” and “tensor”. Or for work in graph theory. There’s more. He did work in Fourier analysis, which is what you get into when you feel at ease with Fourier series. And then wild stuff combining matrices and rings. He’s not quite one of those people where there’s a Hamilton’s Theorem for every field of mathematics you might be interested in. It’s close, though.
When you first learn about physics you learn about forces and accelerations and stuff. When you major in physics you learn to avoid dealing with forces and accelerations and stuff. It’s not explicit. But you get trained to look, so far as possible, away from vectors. Look to scalars. Look to single numbers that somehow encode your problem.
A great example of this is the Lagrangian. It’s built on “generalized coordinates”, which are not necessarily, like, position and velocity and all. They include the things that describe your system. This can be positions. It’s often angles. The Lagrangian shines in problems where it matters that something rotates. Or if you need to work with polar coordinates or spherical coordinates or anything non-rectangular. The Lagrangian is, in your general coordinates, equal to the kinetic energy minus the potential energy. It’ll be a function. It’ll depend on your coordinates and on the derivative-with-respect-to-time of your coordinates. You can take partial derivatives of the Lagrangian. This tells how the coordinates, and the change-in-time of your coordinates should change over time.
The Hamiltonian is a similar way of working out mechanics problems. The Hamiltonian function isn’t anything so primitive as the kinetic energy minus the potential energy. No, the Hamiltonian is the kinetic energy plus the potential energy. Totally different in idea.
From that description you maybe guessed you can transfer from the Lagrangian to the Hamiltonian. Maybe vice-versa. Yes, you can, although we use the term “transform”. Specifically a “Legendre transform”. We can use any coordinates we like, just as with Lagrangian mechanics. And, as with the Lagrangian, we can find how coordinates change over time. The change of any coordinate depends on the partial derivative of the Hamiltonian with respect to a particular other coordinate. This other coordinate is its “conjugate”. (It may either be this derivative, or minus one times this derivative. By the time you’re doing work in the field you’ll know which.)
That conjugate coordinate is the important thing. It’s why we muck around with Hamiltonians when Lagrangians are so similar. In ordinary, common coordinate systems these conjugate coordinates form nice pairs. In Cartesian coordinates, the conjugate to a particle’s position is its momentum, and vice-versa. In polar coordinates, the conjugate to the angular velocity is the angular momentum. These are nice-sounding pairs. But that’s our good luck. These happen to match stuff we already think is important. In general coordinates one or more of a pair can be some fusion of variables we don’t have a word for and would never care about. Sometimes it gets weird. In the problem of vortices swirling around each other on an infinitely great plane? The horizontal position is conjugate to the vertical position. Velocity doesn’t enter into it. For vortices on the sphere the longitude is conjugate to the cosine of the latitude.
What’s valuable about these pairings is that they make a “symplectic manifold”. A manifold is a patch of space where stuff works like normal Euclidean geometry does. In this case, the space is in “phase space”. This is the collection of all the possible combinations of all the variables that could ever turn up. Every particular moment of a mechanical system matches some point in phase space. Its evolution over time traces out a path in that space. Call it a trajectory or an orbit as you like.
We get good things from looking at the geometry that this symplectic manifold implies. For example, if we know that one variable doesn’t appear in the Hamiltonian, then its conjugate’s value never changes. This is almost the kindest thing you can do for a mathematical physicist. But more. A famous theorem by Emmy Noether tells us that symmetries in the Hamiltonian match with conservation laws in the physics. Time-invariance, for example — time not appearing in the Hamiltonian — gives us the conservation of energy. If only distances between things, not absolute positions, matter, then we get conservation of linear momentum. Stuff like that. To find conservation laws in physics problems is the kindest thing you can do for a mathematical physicist.
The Hamiltonian was born out of planetary physics. These are problems easy to understand and, apart from the case of one star with one planet orbiting each other, impossible to solve exactly. That’s all right. The formalism applies to all kinds of problems. They’re very good at handling particles that interact with each other and maybe some potential energy. This is a lot of stuff.
More, the approach extends naturally to quantum mechanics. It takes some damage along the way. We can’t talk about “the” position or “the” momentum of anything quantum-mechanical. But what we get when we look at quantum mechanics looks very much like what Hamiltonians do. We can calculate things which are quantum quite well by using these tools. This though they came from questions like why Saturn’s rings haven’t fallen part and whether the Earth will stay around its present orbit.
It holds surprising power, too. Notice that the Hamiltonian is the kinetic energy of a system plus its potential energy. For a lot of physics problems that’s all the energy there is. That is, the value of the Hamiltonian for some set of coordinates is the total energy of the system at that time. And, if there’s no energy lost to friction or heat or whatever? Then that’s the total energy of the system for all time.
Here’s where this becomes almost practical. We often want to do a numerical simulation of a physics problem. Generically, we do this by looking up what all the values of all the coordinates are at some starting time t0. Then we calculate how fast these coordinates are changing with time. We pick a small change in time, Δ t. Then we say that at time t0 plus Δ t, the coordinates are whatever they started at plus Δ t times that rate of change. And then we repeat, figuring out how fast the coordinates are changing now, at this position and time.
The trouble is we always make some mistake, and once we’ve made a mistake, we’re going to keep on making mistakes. We can do some clever stuff to make the smallest error possible figuring out where to go, but it’ll still happen. Usually, we stick to calculations where the error won’t mess up our results.
But when we look at stuff like whether the Earth will stay around its present orbit? We can’t make each step good enough for that. Unless we get to thinking about the Hamiltonian, and our symplectic variables. The actual system traces out a path in phase space. Everyone on that path the Hamiltonian is a particular value, the energy of the system. So use the regular methods to project most of the variables to the new time, t0 + Δ t. But the rest? Pick the values that makes the Hamiltonian work out right. Also momentum and angular momentum and other stuff we know get conserved. We’ll still make an error. But it’s a different kind of error. It’ll project to a point that’s maybe in the wrong place on the trajectory. But it’s on the trajectory.
(OK, it’s near the trajectory. Suppose the real energy is, oh, the square root of 5. The computer simulation will have an energy of 2.23607. This is close but not exactly the same. That’s all right. Each step will stay close to the real energy.)
So what we’ll get is a projection of the Earth’s orbit that maybe puts it in the wrong place in its orbit. Putting the planet on the opposite side of the sun from Venus when we ought to see Venus transiting the Sun. That’s all right, if what we’re interested in is whether Venus and Earth are still in the solar system.
There’s a special cost for this. If there weren’t we’d use it all the time. The cost is computational complexity. It’s pricey enough that you haven’t heard about these “symplectic integrators” before. That’s all right. These are the kinds of things open to us once we look long into the Hamiltonian.
These are named for George Green, an English mathematician of the early 19th century. He’s one of those people who gave us our idea of mathematical physics. He’s credited with coining the term “potential”, as in potential energy, and in making people realize how studying this simplified problems. Mostly problems in electricity and magnetism, which were so very interesting back then. On the side also came work in multivariable calculus. His work most famous to mathematics and physics majors connects integrals over the surface of a shape with (different) integrals over the entire interior volume. In more specific problems, he did work on the behavior of water in canals.
There’s a patch of (high school) algebra where you solve systems of equations in a couple variables. Like, you have to do one system where you’re solving, say,
And then maybe later on you get a different problem, one that looks like:
If you solve both of them you notice you’re doing a lot of the same work. All the same hard work. It’s only the part on the right-hand side of the equals signs that are different. Even then, the series of steps you follow on the right-hand-side are the same. They have different numbers is all. What makes the problem distinct is the stuff on the left-hand-side. It’s the set of what coefficients times what variables add together. If you get enough about matrices and vectors you get in the habit of writing this set of equations as one matrix equation, as
Here holds all the unknown variables, your x and y and z and anything else that turns up. Your holds the right-hand side. Do enough of these problems and you notice something. You can describe how to find the solution for these equations before you even know what the right-hand-side is. You can do all the hard work of solving this set of equations for a generic set of right-hand-side constants. Fill them in when you need a particular answer.
I mentioned, while writing about Fourier series, how it turns out most of what you do to numbers you can also do to functions. This really proves itself in differential equations. Both partial and ordinary differential equations. A differential equation works with some not-yet-known function u(x). For what I’m discussing here it doesn’t matter whether ‘x’ is a single variable or a whole set of independent variables, like, x and y and z. I’ll use ‘x’ as shorthand for all that. The differential equation takes u(x) and maybe multiplies it by something, and adds to that some derivatives of u(x) multiplied by something. Those somethings can be constants. They can be other, known, functions with independent variable x. They can be functions that depend on u(x) also. But if they are, then this is a nonlinear differential equation and there’s no solving that.
So suppose we have a linear differential equation. Partial or ordinary, whatever you like. There’s terms that have u(x) or its derivatives in them. Move them all to the left-hand-side. Move everything else to the right-hand-side. This right-hand-side might be constant. It might depend on x. Doesn’t matter. This right-hand-side is some function which I’ll call f(x). This f(x) might be constant; that’s fine. That’s still a legitimate function.
Put this way, every differential equation looks like:
That stuff with u(x) and its derivatives we can call an operator. An operator’s a function which has a domain of functions and a range of functions. So we can give give that a name. ‘L’ is a good name here, because if it’s not the operator for a linear differential equation — a linear operator — then we’re done anyway. So whatever our differential equation was we can write it:
Writing it makes it look like we’re multiplying L by u(x). We’re not. We’re really not. This is more like if ‘L’ is the predicate of a sentence and ‘u(x)’ is the object. Read it like, to make up an example, ‘L’ means ‘three times the second derivative plus two x times’ and ‘u(x)’ as ‘u(x)’.
Still, looking at and then back up at tells you what I’m thinking. We can find some set of instructions to, for any , find the that makes true. So why can’t we find some set of instructions to, for any , find the that makes true?
This is where a Green’s function comes in. Or, like everybody says, “the” Green’s function. “The” here we use like we might talk about “the” roots of a polynomial. Every polynomial has different roots. So, too, does every differential equation have a different Green’s function. What the Green’s function is depends on the equation. It can also depend on what domain the differential equation applies to. It can also depend on some extra information called initial values or boundary values.
The Green’s function for a differential equation has twice as many independent variables as the differential equation has. This seems like we’re making a mess of things. It’s all right. These new variables are the falsework, the scaffolding. Once they’ve helped us get our work done they disappear. This kind of thing we call a “dummy variable”. If x is the actual independent variable, then pick something else — s is a good choice — for the dummy variable. It’s from the same domain as the original x, though. So the Green’s function is some . All right, but how do you find it?
To get this, you have to solve a particular special case of the differential equation. You have to solve:
This may look like we’re not getting anywhere. It may even look like we’re getting in more trouble. What is this , for example? Well, this is a particular and famous thing called the Dirac delta function. It’s called a function as a courtesy to our mathematical physics friends, who don’t care about whether it truly is a function. Dirac is Paul Dirac, from over in physics. The one whose biography is called The Strangest Man. His delta function is a strange function. Let me say that its independent variable is t. Then is zero, unless t is itself zero. If t is zero then is … something. What is that something? … Oh … something big. It’s … just … don’t look directly at it. What’s important is the integral of this function:
I write it this way because there’s delta functions for two-dimensional spaces, three-dimensional spaces, everything. If you integrate over a region that includes the origin, the integral of the delta function is 1. If you integrate over a region that doesn’t, the integral of the delta function is 0.
The delta function has a neat property sometimes called filtering. This is what happens if you integrate some function times the Dirac delta function. Then …
This may look dumb. That’s fine. This scheme is so good at getting rid of integrals where you don’t want them. Or at getting integrals in where it’d be convenient to have.
So, I have a mental model of what the Dirac delta function does. It might help you. Think of beating a drum. It can sound like many different things. It depends on how hard you hit it, how fast you hit it, what kind of stick you use, where exactly you hit it. I think of each differential equation as a different drumhead. The Green’s function is then the sound of a specific, uniform, reference hit at a reference position. This produces a sound. I can use that sound to extrapolate how every different sort of drumming would sound on this particular drumhead.
So solving this one differential equation, to find the Green’s function for a particular case, may be hard. Maybe not. Often it’s easier than some particular f(x) because the Dirac delta function is so weird that it becomes kinda easy-ish. But you do have to find one solution to this differential equation, somehow.
Once you do, though? Once you have this ? That is glorious. Because then, whatever your f is? The solution to is:
Here the integral is over whatever the domain of the differential equation is, and whatever the domain of f is. This last integral is where the dummy variable finally evaporates. All that remains is x, as we want.
A little bit of … arithmetic isn’t the right word. But symbol manipulation will convince you this is right, if you need convincing. (The trick is remembering that ‘x’ and ‘s’ are different variables. When you differentiate with respect to ‘x’, ‘s’ acts like a constant. When you integrate with respect to ‘s’, ‘x’ acts like a constant.)
What can make a Green’s function worth finding is that we do a lot of the same kinds of differential equations. We do a lot of diffusion problems. A lot of wave transmission problems. A lot of wave-transmission-with-losses problems. So there are many problems that can all use the same tools to solve.
Consider remote detection problems. This can include things like finding things underground. It also includes, like, medical sensors. We would like to know “what kind of thing produces a signal like this?” We can detect the signal easily enough. We can model how whatever it is between the thing and our sensors changes what we could detect. (This kind of thing we call an “inverse problem”, finding the thing that could produce what we know.) Green’s functions are one of the ways we can get at the source of what we can see.
Now, Green’s functions are a powerful and useful idea. They sprawl over a lot of mathematical applications. As they do, they pick up regional dialects. Things like deciding that , for example. None of these are significant differences. But before you go poking into someone else’s field and solving their problems, take a moment. Double-check that their symbols do mean precisely what you think they mean. It’ll save you some petty quarrels.
Also, I really don’t like how those systems of equations turned out up at the top of this essay. But I couldn’t work out how to do arrays of equations all lined up along the equals sign, or other mildly advanced LaTeX stuff like doing a function-definition-by-cases. If someone knows of the Real Official Proper List of what you can and can’t do with the LaTeX that comes from a standard free WordPress.com blog I’d appreciate a heads-up. Thank you.
Fourier series are named for Jean-Baptiste Joseph Fourier, and are maybe the greatest example of the theory that’s brilliantly wrong. Anyone can be wrong about something. There’s genius in being wrong in a way that gives us good new insights into things. Fourier series were developed to understand how the fluid we call “heat” flows through and between objects. Heat is not a fluid. So what? Pretending it’s a fluid gives us good, accurate results. More, you don’t need to use Fourier series to work with a fluid. Or a thing you’re pretending is a fluid. It works for lots of stuff. The Fourier series method challenged assumptions mathematicians had made about how functions worked, how continuity worked, how differential equations worked. These problems could be sorted out. It took a lot of work. It challenged and expended our ideas of functions.
Fourier also managed to hold political offices in France during the Revolution, the Consulate, the Empire, the Bourbon Restoration, the Hundred Days, and the Second Bourbon Restoration without getting killed for his efforts. If nothing else this shows the depth of his talents.
The weirdness of the setup: you want to think of functions as points in space. The allegory is rather close. Think of the common association between a point in space and the coordinates that describe that point. Pretend those are the same thing. Then you can do stuff like add points together. That is, take the coordinates of both points. Add the corresponding coordinates together. Match that sum-of-coordinates to a point. This gives us the “sum” of two points. You can subtract points from one another, again by going through their coordinates. Multiply a point by a constant and get a new point. Find the angle between two points. (This is the angle formed by the line segments connecting the origin and both points.)
Functions can work like this. You can add functions together and get a new function. Subtract one function from another. Multiply a function by a constant. It’s even possible to describe an “angle” between two functions. Mathematicians usually call that the dot product or the inner product. But we will sometimes call two functions “orthogonal”. That means the ordinary everyday meaning of “orthogonal”, if anyone said “orthogonal” in ordinary everyday life.
We can take equations of a bunch of variables and solve them. Call the values of that solution the coordinates of a point. Then we talk about finding the point where something interesting happens. Or the points where something interesting happens. We can do the same with differential equations. This is finding a point in the space of functions that makes the equation true. Maybe a set of points. So we can find a function or a family of functions solving the differential equation.
You have reasons for skepticism, even if you’ll grant me treating functions as being like points in space. You might remember solving systems of equations. You need as many equations as there are dimensions of space; a two-dimensional space needs two equations. A three-dimensional space needs three equations. You might have worked four equations in four variables. You were threatened with five equations in five variables if you didn’t all settle down. You’re not sure how many dimensions of space “all the possible functions” are. It’s got to be more than the one differential equation we started with.
This is fair. The approach I’m talking about uses the original differential equation, yes. But it breaks it up into a bunch of linear equations. Enough linear equations to match the space of functions. We turn a differential equation into a set of linear equations, a matrix problem, like we know how to solve. So that settles that.
So suppose solves the differential equation. Here I’m going to pretend that the function has one independent variable. Many functions have more than this. Doesn’t matter. Everything I say here extends into two or three or more independent variables. It takes longer and uses more symbols and we don’t need that. The thing about is that we don’t know what it is, but would quite like to.
What we’re going to do is choose a reference set of functions that we do know. Let me call them going on to however many we need. It can be infinitely many. It certainly is at least up to some for some big enough whole number N. These are a set of “basis functions”. For any function we want to represent we can find a bunch of constants, called coefficients. Let me use to represent them. Any function we want is the sum of the coefficient times the matching basis function. That is, there’s some coefficients so that
is true. That summation goes on until we run out of basis functions. Or it runs on forever. This is a great way to solve linear differential equations. This is because we know the basis functions. We know everything we care to know about them. We know their derivatives. We know everything on the right-hand side except the coefficients. The coefficients matching any particular function are constants. So the derivatives of , written as the sum of coefficients times basis functions, are easy to work with. If we need second or third or more derivatives? That’s no harder to work with.
You may know something about matrix equations. That is that solving them takes freaking forever. The bigger the equation, the more forever. If you have to solve eight equations in eight unknowns? If you start now, you might finish in your lifetime. For this function space? We need dozens, hundreds, maybe thousands of equations and as many unknowns. Maybe infinitely many. So we seem to have a solution that’s great apart from how we can’t use it.
Except. What if the equations we have to solve are all easy? If we have to solve a bunch that looks like, oh, and and … well, that’ll take some time, yes. But not forever. Great idea. Is there any way to guarantee that?
It’s in the basis functions. If we pick functions that are orthogonal, or are almost orthogonal, to each other? Then we can turn the differential equation into an easy matrix problem. Not as easy as in the last paragraph. But still, not hard.
So what’s a good set of basis functions?
And here, about 800 words later than everyone was expecting, let me introduce the sine and cosine functions. Sines and cosines make great basis functions. They don’t grow without bounds. They don’t dwindle to nothing. They’re easy to differentiate. They’re easy to integrate, which is really special. Most functions are hard to integrate. We even know what they look like. They’re waves. Some have long wavelengths, some short wavelengths. But waves. And … well, it’s easy to make sets of them orthogonal.
We have to set some rules. The first is that each of these sine and cosine basis functions have a period. That is, after some time (or distance), they repeat. They might repeat before that. Most of them do, in fact. But we’re guaranteed a repeat after no longer than some period. Call that period ‘L’.
Each of these sine and cosine basis functions has to have a whole number of complete oscillations within the period L. So we can say something about the sine and cosine functions. They have to look like these:
Here ‘j’ and ‘k’ are some whole numbers. I have two sets of basis functions at work here. Don’t let that throw you. We could have labelled them all as , with some clever scheme that told us for a given k whether it represents a sine or a cosine. It’s less hard work if we have s’s and c’s. And if we have coefficients of both a’s and b’s. That is, we suppose the function is:
This, at last, is the Fourier series. Each function has its own series. A “series” is a summation. It can be of finitely many terms. It can be of infinitely many. Often infinitely many terms give more interesting stuff. Like this, for example. Oh, and there’s a bare there, not multiplied by anything more complicated. It makes life easier. It lets us see that the Fourier series for, like, 3 + f(x) is the same as the Fourier series for f(x), except for the leading term. The ½ before that makes easier some work that’s outside the scope of this essay. Accept it as one of the merry, wondrous appearances of ‘2’ in mathematics expressions.
It’s great for solving differential equations. It’s also great for encryption. The sines and the cosines are standard functions, after all. We can send all the information we need to reconstruct a function by sending the coefficients for it. This can also help us pick out signal from noise. Noise has a Fourier series that looks a particular way. If you take the coefficients for a noisy signal and remove that? You can get a good approximation of the original, noiseless, signal.
This all seems great. That’s a good time to feel skeptical. First, like, not everything we want to work with looks like waves. Suppose we need a function that looks like a parabola. It’s silly to think we can add a bunch of sines and cosines and get a parabola. Like, a parabola isn’t periodic, to start with.
So it’s not. To use Fourier series methods on something that’s not periodic, we use a clever technique: we tell a fib. We declare that the period is something bigger than we care about. Say the period is, oh, ten million years long. A hundred light-years wide. Whatever. We trust that the difference between the function we do want, and the function that we calculate, will be small. We trust that if someone ten million years from now and a hundred light-years away wishes to complain about our work, we will be out of the office that day. Letting the period L be big enough is a good reliable tool.
The other thing? Can we approximate any function as a Fourier series? Like, at least chunks of parabolas? Polynomials? Chunks of exponential growths or decays? What about sawtooth functions, that rise and fall? What about step functions, that are constant for a while and then jump up or down?
The answer to all these questions is “yes,” although drawing out the word and raising a finger to say there are some issues we have to deal with. One issue is that most of the time, we need an infinitely long series to represent a function perfectly. TThis is fine if we’re trying to prove things about functions in general rather than solve some specific problem. It’s no harder to write the sum of infinitely many terms than the sum of finitely many terms. You write an &infty; symbol instead of an N in some important places. But if we want to solve specific problems? We probably want to deal with finitely many terms. (I hedge that statement on purpose. Sometimes it turns out we can find a formula for all the infinitely many coefficients.) This will usually give us an approximation of the we want. The approximation can be as good as we want, but to get a better approximation we need more terms. Fair enough. This kind of tradeoff doesn’t seem too weird.
Another issue is in discontinuities. If jumps around? If it has some point where it’s undefined? If it has corners? Then the Fourier series has problems. Summing up sines and cosines can’t give us a sudden jump or a gap or anything. Near a discontinuity, the Fourier series will get this high-frequency wobble. A bigger jump, a bigger wobble. You may not blame the series for not representing a discontinuity. But it does mean that what is, otherwise, a pretty good match for the you want gets this region where it stops being so good a match.
That’s all right. These issues aren’t bad enough, or unpredictable enough, to keep Fourier series from being powerful tools. Even when we find problems for which sines and cosines are poor fits, we use this same approach. Describe a function we would like to know as the sums of functions we choose to work with. Fourier series are one of those ideas that helps us solve problems, and guides us to new ways to solve problems.
The oldest reason to hide a message, at least from all but select recipients. Ancient encryption methods will substitute one letter for another, or will mix up the order of letters in a message. This won’t hide a message forever. But it will slow down a person trying to decrypt the message until they decide they don’t need to know what it says. Or decide to bludgeon the message-writer into revealing the secret.
Substituting one letter for another won’t stop an eavesdropper from working out the message. Not indefinitely, anyway. There are patterns in the language. Any language, but take English as an example. A single-letter word is either ‘I’ or ‘A’. A two-letter word has a great chance of being ‘in’, ‘on’, ‘by’, ‘of’, ‘an’, or a couple other choices. Solving this is a fun pastime, for people who like this. If you need it done fast, let a computer work it out.
To hide the message better requires being cleverer. For example, you could substitue letters according to a slightly different scheme for each letter in the original message. The Vignère cipher is an example of this. I remember some books from my childhood, written in the second person. They had programs that you-the-reader could type in to live the thrill of being a child secret agent computer programmer. This encryption scheme was one of the programs used for passing on messages. We can make the plans more complicated yet, but that won’t give us better insight yet.
The objective is to turn the message into something less predictable. An encryption which turns, say, ‘the’ into ‘rgw’ will slow the reader down. But if they pay attention and notice, oh, the text also has the words ‘rgwm’, ‘rgey’, and rgwb’ turn up a lot? It’s hard not to suspect these are ‘the’, ‘them’, ‘they’, and ‘then’. If a different three-letter code is used for every appearance of ‘the’, good. If there’s a way to conceal the spaces as something else, that’s even better, if we want it harder to decrypt the message.
So the messages hardest to decrypt should be the most random. We can give randomness a precise definition. We owe it to information theory, which is the study of how to encode and successfully transmit and decode messages. In this, the information content of a message is its entropy. Yes, the same word as used to describe broken eggs and cream stirred into coffee. The entropy measures how likely each possible message is. Encryption matches the message you really want with a message of higher entropy. That is, one that’s harder to predict. Decrypting reverses that matching.
So what goes into a message? We call them words, or codewords, so we have a clear noun to use. A codeword is a string of letters from an agreed-on alphabet. The terminology draws from common ordinary language. Cryptography grew out of sending sentences.
But anything can be the letters of the alphabet. Any string of them can be a codeword. An unavoidable song from my childhood told the story of a man asking his former lover to tie a yellow ribbon around an oak tree. This is a tiny alphabet, but it only had to convey two words, signalling whether she was open to resuming their relationship. Digital computers use an alphabet of two memory states. We label them ‘0’ and ‘1’, although we could as well label them +5 and -5, or A and B, or whatever. It’s not like actual symbols are scrawled very tight into the chips. Morse code uses dots and dashes and short and long pauses. Naval signal flags have a set of shapes and patterns to represent the letters of the alphabet, as well as common or urgent messages. There is not a single universally correct number of letters or length of words for encryption. It depends on what the code will be used for, and how.
Naval signal flags help me to my next point. There’s a single pattern which, if shown, communicates the message “I require a pilot”. Another, “I am on fire and have dangerous cargo”. Still another, “All persons should report on board as the vessel is about to set to sea”. These are whole sentences; they’re encrypted into a single letter.
And this is the second great use of encryption. English — any human language — has redundancy to it. Think of the sentence “No, I’d rather not go out this evening”. It’s polite, but is there anything in it not communicated by texting back “N”? An encrypted message is, often, shorter than the original. To send a message costs something. Time, if nothing else. To send it more briefly is typically better.
There are dangers to this. Strike out any word from “No, I’d rather not go out this evening”. Ask someone to guess what belongs there. Only the extroverts will have trouble. I guess if you strike out “evening” people might guess “today” or “tomorrow” or something. The sentiment of the sentence remains.
But strike out a letter from “N” and ask someone to guess what was meant. And this is a danger of encryption. The encrypted message has a higher entropy, a higher unpredictability. If some mistake happens in transmission, we’re lost.
We can fight this. It’s possible to build checks into an encryption. To carry a bit of extra information that lets one know that the message was garbled. These are “error-detecting codes”. It’s even possible to carry enough extra information to correct some errors. These are “error-correcting codes”. There are limits, of course. This kind of error-correcting takes calculation time and message space. We lose some economy but gain reliability. There is a general lesson in this.
And not everything can compress. There are (if I’m reading this right) 26 letter, 10 numeral, and four repeater flags used under the International Code of Symbols. So there are at most 40 signals that could be reduced to a single flag. If we need to communicate “I am on fire but have no dangerous cargo” we’re at a loss. We have to spell things out more. It’s a quick proof, by way of the pigeonhole principle, which tells us that not every message can compress. But this is all right. There are many messages we will never need to send. (“I am on fire and my cargo needs updates on Funky Winkerbean.”) If it’s mostly those that have no compressed version, who cares?
Encryption schemes are almost as flexible as language itself. There are families of kinds of schemes. This lets us fit schemes to needs: how many different messages do we need to be able to send? How sure do we need to be that errors are corrected? Or that errors are detected? How hard do we want it to be for eavesdroppers to decode the message? Are we able to set up information with the intended recipients separately? What we need, and what we are willing to do without, guide the scheme we use.
We’re only in the third week of the Fall 2019 Mathematics A-to-Z, but this is when I should be nailing down topics for the next several letters. So again, I ask you kind readers for suggestions. I’ve done five A-to-Z sequences before, from 2015 through 2018, and am listing the essays I’ve already written for the middle part of the alphabet. I’m open to revisiting topics, if I think I can improve on what I already wrote. But I reserve the right to use whatever topic feels most interesting to me.
To suggest anything for the letters I through N please leave the comment here. Also do please let me know if you have a mathematics blog, a Twitter or Mathstodon account, a YouTube channel, or anything else that you’d like to share.
I thank you again you for any thoughts you have. Please ask if there are any questions. I hope to be open to topics in any field of mathematics, including ones I don’t really know. The fun and terror of writing about a thing I’m only learning about is part of what I get from this kind of project.
The thing most important to know about differential equations is that for short, we call it “diff eq”. This is pronounced “diffy q”. It’s a fun name. People who aren’t taking mathematics smile when they hear someone has to get to “diffy q”.
Sometimes we need to be more exact. Then the less exciting names “ODE” and “PDE” get used. The meaning of the “DE” part is an easy guess. The meaning of “O” or “P” will be clear by the time this essay’s finished. We can find approximate answers to differential equations by computer. This is known generally as “numerical solutions”. So you will encounter talk about, say, “NSPDE”. There’s an implied “of” between the S and the P there. I don’t often see “NSODE”. For some reason, probably a quite arbitrary historical choice, this is just called “numerical integration” instead.
One of algebra’s unsettling things is the idea that we can work with numbers without knowing their values. We can give them names, like ‘x’ or ‘a’ or ‘t’. We can know things about them. Often it’s equations telling us these things. We can make collections of numbers based on them all sharing some property. Often these things are solutions to equations. We can even describe changing those collections according to some rule, even before we know whether any of the numbers is 2. Often these things are functions, here matching one set of numbers to another.
One of analysis’s unsettling things is the idea that most things we can do with numbers we can also do with functions. We can give them names, like ‘f’ and ‘g’ and … ‘F’. That’s easy enough. We can add and subtract them. Multiply and divide. This is unsurprising. We can measure their sizes. This is odd but, all right. We can know things about functions even without knowing exactly what they are. We can group together collections of functions based on some properties they share. This is getting wild. We can even describe changing these collections according to some rule. This change is itself a function, but it is usually called an “operator”, saving us some confusion.
So we can describe a function in an equation. We may not know what f is, but suppose we know is true. We can suppose that if we cared we could find what function, or functions, f made that equation true. There is shorthand here. A function has a domain, a range, and a rule. The equation part helps us find the rule. The domain and range we get from the problem. Or we take the implicit rule that both are the biggest sets of real-valued numbers for which the rule parses. Sometimes biggest sets of complex-valued numbers. We get so used to saying “the function” to mean “the rule for the function” that we’ll forget to say that’s what we’re doing.
There are things we can do with functions that we can’t do with numbers. Or at least that are too boring to do with numbers. The most important here is taking derivatives. The derivative of a function is another function. One good way to think of a derivative is that it describes how a function changes when its variables change. (The derivative of a number is zero, which is boring except when it’s also useful.) Derivatives are great. You learn them in Intro Calculus, and there are a bunch of rules to follow. But follow them and you can pretty much take the derivative of any function even if it’s complicated. Yes, you might have to look up what the derivative of the arc-hyperbolic-secant is. Nobody has ever used the arc-hyperbolic-secant, except to tease a student.
And the derivative of a function is itself a function. So you can take a derivative again. Mathematicians call this the “second derivative”, because we didn’t expect someone would ask what to call it and we had to say something. We can take the derivative of the second derivative. This is the “third derivative” because by then changing the scheme would be awkward. If you need to talk about taking the derivative some large but unspecified number of times, this is the n-th derivative. Or m-th, if you’ve already used ‘n’ to mean something else.
And now we get to differential equations. These are equations in which we describe a function using at least one of its derivatives. The original function, that is, f, usually appears in the equation. It doesn’t have to, though.
We divide the earth naturally (we think) into two pairs of hemispheres, northern and southern, eastern and western. We divide differential equations naturally (we think) into two pairs of two kinds of differential equations.
The first division is into linear and nonlinear equations. I’ll describe the two kinds of problem loosely. Linear equations are the kind you don’t need a mathematician to solve. If the equation has solutions, we can write out procedures that find them, like, all the time. A well-programmed computer can solve them exactly. Nonlinear equations, meanwhile, are the kind no mathematician can solve. They’re just too hard. There’s no processes that are sure to find an answer.
You may ask. We don’t need mathematicians to solve linear equations. Mathematicians can’t solve nonlinear ones. So what do we need mathematicians for? The answer is that I exaggerate. Linear equations aren’t quite that simple. Nonlinear equations aren’t quite that hopeless. There are nonlinear equations we can solve exactly, for example. This usually involves some ingenious transformation. We find a linear equation whose solution guides us to the function we do want.
And that is what mathematicians do in such a field. A nonlinear differential equation may, generally, be hopeless. But we can often find a linear differential equation which gives us insight to what we want. Finding that equation, and showing that its answers are relevant, is the work.
The other hemispheres we call ordinary differential equations and partial differential equations. In form, the difference between them is the kind of derivative that’s taken. If the function’s domain is more than one dimension, then there are different kinds of derivative. Or as normal people put it, if the function has more than one independent variable, then there are different kinds of derivatives. These are partial derivatives and ordinary (or “full”) derivatives. Partial derivatives give us partial differential equations. Ordinary derivatives give us ordinary differential equations. I think it’s easier to understand a partial derivative.
Suppose a function depends on three variables, imaginatively named x, y, and z. There are three partial first derivatives. One describes how the function changes if we pretend y and z are constants, but let x change. This is the “partial derivative with respect to x”. Another describes how the function changes if we pretend x and z are constants, but let y change. This is the “partial derivative with respect to y”. The third describes how the function changes if we pretend x and y are constants, but let z change. You can guess what we call this.
In an ordinary differential equation we would still like to know how the function changes when x changes. But we have to admit that a change in x might cause a change in y and z. So we have to account for that. If you don’t see how such a thing is possible don’t worry. The differential equations textbook has an example in which you wish to measure something on the surface of a hill. Temperature, usually. Maybe rainfall or wind speed. To move from one spot to another a bit east of it is also to move up or down. The change in (let’s say) x, how far east you are, demands a change in z, how far above sea level you are.
That’s structure, though. What’s more interesting is the meaning. What kinds of problems do ordinary and partial differential equations usually represent? Partial differential equations are great for describing surfaces and flows and great bulk masses of things. If you see an equation about how heat transmits through a room? That’s a partial differential equation. About how sound passes through a forest? Partial differential equation. About the climate? Partial differential equations again.
Ordinary differential equations are great for describing a ball rolling on a lumpy hill. It’s given an initial push. There are some directions (downhill) that it’s easier to roll in. There’s some directions (uphill) that it’s harder to roll in, but it can roll if the push was hard enough. There’s maybe friction that makes it roll to a stop.
Put that way it’s clear all the interesting stuff is partial differential equations. Balls on lumpy hills are nice but who cares? Miniature golf course designers and that’s all. This is because I’ve presented it to look silly. I’ve got you thinking of a “ball” and a “hill” as if I meant balls and hills. Nah. It’s usually possible to bundle a lot of information about a physical problem into something that looks like a ball. And then we can bundle the ways things interact into something that looks like a hill.
Like, suppose we have two blocks on a shared track, like in a high school physics class. We can describe their positions as one point in a two-dimensional space. One axis is where on the track the first block is, and the other axis is where on the track the second block is. Physics problems like this also usually depend on momentum. We can toss these in too, an axis that describes the momentum of the first block, and another axis that describes the momentum of the second block.
We’re already up to four dimensions, and we only have two things, both confined to one track. That’s all right. We don’t have to draw it. If we do, we draw something that looks like a two- or three-dimensional sketch, maybe with a note that says “D = 4” to remind us. There’s some point in this four-dimensional space that describes these blocks on the track. That’s the “ball” for this differential equation.
The things that the blocks can do? Like, they can collide? They maybe have rubber tips so they bounce off each other? Maybe someone’s put magnets on them so they’ll draw together or repel? Maybe there’s a spring connecting them? These possible interactions are the shape of the hills that the ball representing the system “rolls” over. An impenetrable barrier, like, two things colliding, is a vertical wall. Two things being attracted is a little divot. Two things being repulsed is a little hill. Things like that.
Now you see why an ordinary differential equation might be interesting. It can capture what happens when many separate things interact.
I write this as though ordinary and partial differential equations are different continents of thought. They’re not. When you model something you make choices and they can guide you to ordinary or to partial differential equations. My own research work, for example, was on planetary atmospheres. Atmospheres are fluids. Representing how fluids move usually calls for partial differential equations. But my own interest was in vortices, swirls like hurricanes or Jupiter’s Great Red Spot. Since I was acting as if the atmosphere was a bunch of storms pushing each other around, this implied ordinary differential equations.
There are more hemispheres of differential equations. They have names like homogenous and non-homogenous. Coupled and decoupled. Separable and nonseparable. Exact and non-exact. Elliptic, parabolic, and hyperbolic partial differential equations. Don’t worry about those labels. They relate to how difficult the equations are to solve. What ways they’re difficult. In what ways they break computers trying to approximate their solutions.
What’s interesting about these, besides that they represent many physical problems, is that they capture the idea of feedback. Of control. If a system’s current state affects how it’s going to change, then it probably has a differential equation describing it. Many systems change based on their current state. So differential equations have long been near the center of professional mathematics. They offer great and exciting pure questions while still staying urgent and relevant to real-world problems. They’re great things.
Today’s A To Z term is category theory. It was suggested by aajohannas, on Twitter as @aajohannas. It’s a topic I have long wanted to know better, and that every year or so I make a new attempt to try learning without ever feeling like I’ve made progress.
The language of it is beautiful, though. Much of its work is attractive just to see, too, as the field’s developed notation that could be presented as visual art. Much of mathematics could be visual art, yes, but these are art you can almost create in ASCII. It’s amazing.
What is the most important part of mathematics? Well, the part you wish you understood, yes. But what’s the fundamental part? The piece of mathematics that we could feel most sure an alien intelligence would agree is mathematics?
There’s idle curiosity behind this, yes. It’s a question implicit in some ideals of the Enlightenment. The notion that we should be able to find truths that all beings capable of reason would agree upon, and find themselves. Mathematics seems particularly good for this. If we have something proven by deductive logic from clearly stated axioms and definitions, then we know something true.
There’s practicality too. In the late 19th and early 20th century (western) mathematics tried to find logically rigorous foundations. You might have thought we always had that and, uh, not so much. It turns out a complete rigorous logical proof of even simple stuff takes forever. Mathematicians compose enough of an argument to convince other mathematicians that we could fill in the details. But we still trusted there was a rigorous foundation. The question is, what is it?
A great candidate for this was set theory. This was a great breakthrough. The basic modern idea of set theory builds on bunches of things, called elements. And collections of those things, called sets. And we build rigorous ideas of what it means for elements to be members of sets. This doesn’t sound like much. Powerful ideas never do.
I don’t know that everyone’s intuition is like this. But my gut wants to think a “powerful” result is, like, a great rocket. Some enormous and prominent and mighty thing that blasts through a problem like gravity or an atmosphere. This is almost the opposite of what mathematics means by “powerful”. A rocket is a fiddly, delicate thing. It has millions of components made to tight specifications. It can only launch when lots of conditions are exactly right. A theorem that gives a great result, but has a long list of prerequisites and lemmas that feed into it resembles this. A powerful mathematical result is more like the gravity that the rocket overcomes. It tends to suppose little about the situation, and so it provides results that are applicable over the whole field. Or over a wide field, or a surprising breadth of topics. And, really, mighty as a rocket might be, the gravity it fights is moreso.
So set theory is powerful. It can explain many things. Most amazing is that we can represent arithmetic with it. At least we can get to integers, and all that we do with integers, and that in not too much work. It makes sense that mathematicians latched onto this as critical. It fueled much of the thinking behind the New Math, the infamous attempted United States educational reform of the 1960s and 70s. I grew up in the tail end of this, learning unions and intersections and complements along with times tables and delighted in it.
But even before New Math became a coherent idea there was a better idea. Emmy Noether, mentioned yesterday, is not a part of it. But a part of her insight into physics, and into group theory, was an understanding of structure. That important mathematics results from considering what we can do with sets of things. And what things we can do that produce invariants, things that don’t change. Saunders Mac Lane, one of Noether’s students, and Samuel Eilenberg in the 1940s used what looks to me like this principle. They organized category theory.
Category Theory looks at first like set theory only made terrifying. I’m not very comfortable with it myself. It’s an abstract field, and I’m more at home with stuff I can write a quick Octave program to double-check. Many results in category theory are described, or even proved, with beautiful directed-graph lattices. They show how things relate to one another. This is definitely the field to study if you like drawing arrows.
Just as set theory does, category theory starts with things, called objects. And these objects get piled together into collections. And then there’s another collection of relationships between these collections. These relationships you call maps or morphisms or arrows, based on whatever the first book you kind of understood called them. I’m partial to “maps”. And then we have rules by which these maps compose, that is, where two maps reduce to a single map. This bundle of things — the objects, the collections, and the maps — is a category.
These objects can start out looking like elements, and the collections like sets, and the maps like functions. This gives me, at least, a patch of ground where I feel like I know what I’m doing. But what we need of things to be objects and collections and maps is very little. The result is great power. We can describe set theory in the language of categories. So we can describe arithmetic in category theory. There’s a bit of a hike from the start of category theory to, like, knowing what 18 plus 7 is.
But we’re not bound to anything that concrete. We can describe, for example, groups as categories. This gives us results like when we can factor polynomials. Or whether compass and straightedge can trisect an arbitrary angle. (There’s some work behind this too.) We can describe vector spaces as categories. Heady results like the idea that one function might be orthogonal to another lurk within this field. Manifolds, spaces that work like normal space, are part of the field. So are topological spaces, which tell us about shapes.
If you aren’t yet dizzy then consider this. A category is itself an object. So we can define maps between categories. These we call functors. Which themselves have use in computer science, as a way some kinds of software can be programmed well. More, maps themselves are objects. We can define mappings between maps. These we call natural transformations. Which are the things that Eilenberg and Mac Lane were particularly interested in, to start with. Category theory grew in part out of needing a better understanding of natural transformations.
I do not know what to recommend for people who want to really learn category theory. I haven’t found the textbook or the blog that makes me feel like I am mastering the subject. Writing this essay has introduced me to Dr Tom Leinster’s Basic Category Theory, which I’ve enjoyed skimming. Exercise 3.3.1, for example, seems like exactly the sort of problem I would pose if I knew category theory well enough to write a book on it.
Is this, finally, the mathematics we could be sure an alien would recognize? I’m skeptical, but I always am. It seems to me we build mathematics on arithmetic and geometry. Category theory, seeming to offer explanations of both, is a natural foundation for that. But we are evolved to see the world in terms of number and shape. Of course we see arithmetic and geometry as mathematics. Can we count on every being capable of reason seeing the same things as important? … I admit I can’t imagine a being we might communicate with not recognizing both. But this may say more about the limits of my imagination than about the limits of what could be mathematics.
Today’s A To Z term was suggested by Peter Mander. Mander authors CarnotCycle, which when I first joined WordPress was one of the few blogs discussing thermodynamics in any detail. When I last checked it still was, which is a shame. Thermodynamics is a fascinating field. It’s as deeply weird and counter-intuitive and important as quantum mechanics. Yet its principles are as familiar as a mug of warm tea on a chilly day. Mander writes at a more technical level than I usually do. But if you’re comfortable with calculus, or if you’re comfortable nodding at a line and agreeing that he wouldn’t fib to you about a thing like calculus, it’s worth reading.
I’ve written of my fondness for boredom. A bored mind is not one lacking stimulation. It is one stimulated by anything, however petty. And in petty things we can find great surprises.
I do not know what caused Georges-Louis Leclerc, Comte de Buffon, to discover the needle problem named for him. It seems like something born of a bored but active mind. Buffon had an active mind: he was one of Europe’s most important naturalists of the 1700s. He also worked in mathematics, and astronomy, and optics. It shows what one can do with an engaged mind and a large inheritance from one’s childless uncle who’s the tax farmer for all Sicily.
The problem, though. Imagine dropping a needle on a floor that has equally spaced parallel lines. What is the probability that the needle will land on any of the lines? It could occur to anyone with a wood floor who’s dropped a thing. (There is a similar problem which would occur to anyone with a tile floor.) They have only to be ready to ask the question. Buffon did this in 1733. He had it solved by 1777. We, with several centuries’ insight into probability and calculus, need less than 44 years to solve the question.
Let me use L as the length of the needle. And d as the spacing of the parallel lines. If the needle’s length is less than the spacing then this is an easy formula to write, and not too hard to calculate. The probability, P, of the needle crossing some line is:
I won’t derive it rigorously. You don’t need me for that. The interesting question is whether this formula makes sense. That L and d are in it? Yes, that makes sense. The length of the needle and the gap between lines have to be in there. More, the probability has to have the ratio between the two. There’s different ways to argue this. Dimensional analysis convinces me, at least. Probability is a pure number. L is a measurement of length; d is a measurement of length. To get a pure number starting with L and d means one of them has to divide into the other. That L is in the numerator and d the denominator makes sense. A tiny needle has a tiny chance of crossing a line. A large needle has a large chance. That is raised to the first power, rather than the second or third or such … well, that’s fair. A needle twice as long having twice the chance of crossing a line? That sounds more likely than a needle twice as long having four times the chance, or eight times the chance.
Does the 2 belong there? Hard to say. 2 seems like a harmless enough number. It appears in many respectable formulas. That π, though …
That π …
π comes to us from circles. We see it in calculations about circles and spheres all the time. We’re doing a problem with lines and line segments. What business does π have showing up?
We can find reasons. One way is to look at a similar problem. Imagine dropping a disc on these lines. What’s the chance the disc falls across some line? That’s the chance that the center of the disc is less than one radius from any of the lines. What if the disc has an equal chance of landing anywhere on the floor? Then it has a probability of of crossing a line. If the radius is smaller than the distance between lines, anyway. If the radius is larger than that, the probability is 1.
Now draw a diameter line on this disc. What’s the chance that this diameter line crosses this floor line? That depends on a couple things. Whether the center of the disc is near enough a floor line. And what angle the diameter line makes with respect to the floor lines. If the diameter line is parallel the floor line there’s almost no chance. If the diameter line is perpendicular to the floor line there’s the best possible chance. But that angle might be anything.
Let me call that angle θ. The diameter line crosses the floor line if the diameter times the sine of θ is less than half the distance between floor lines. … Oh. Sine. Sine and cosine and all the trigonometry functions we get from studying circles, and how to draw triangles within circles. And this diameter-line problem looks the same as the needle problem. So that’s where π comes from.
I’m being figurative. I don’t think one can make a rigorous declaration that the π in the probability formula “comes from” this sine, any more than you can declare that the square-ness of a shape comes from any one side. But it gives a reason to believe that π belongs in the probability.
If the needle’s longer than the gap between floor lines, if , there’s still a probability that the needle crosses at least one line. It never becomes certain. No matter how long the needle is it could fall parallel to all the floor lines and miss them all. The probability is instead:
Here is the world-famous arcsecant function. That is, it’s whatever angle has as its secant the number . I don’t mean to insult you. I’m being kind to the person reading this first thing in the morning. I’m not going to try justifying this formula. You can play with numbers, though. You’ll see that if is a little bit bigger than 1, the probability is a little more than what you get if is a little smaller than 1. This is reassuring.
The exciting thing is arithmetic, though. Use the probability of a needle crossing a line, for short needles. You can re-write it as this:
L and d you can find by measuring needles and the lines. P you can estimate. Drop a needle many times over. Count how many times you drop it, and how many times it crosses a line. P is roughly the number of crossings divided by the number of needle drops. Doing this gives you a way to estimate π. This gives you something to talk about on Pi Day.
It’s a rubbish way to find π. It’s a lot of work, plus you have to sweep needles off the floor. Well, you can do it in simulation and avoid the risk of stepping on an overlooked needle. But it takes a lot of needle-drops to get good results. To be certain you’ve calculated the first two decimal points correctly requires 3,380,000 needle-drops. Yes, yes. You could get lucky and happen to hit on an estimate of 3.14 for π with fewer needle-drops. But if you were sincerely trying to calculate the digits of π this way? If you did not know what they were? You would need the three and a third million tries to be confident you had the number correct.
So this result is, as a practical matter, useless. It’s a heady concept, though. We think casually of randomness as … randomness. Unpredictability. Sometimes we will speak of the Law of Large Numbers. This is several theorems in probability. They all point to the same result. That if some event has (say) a probability of one-third of happening, then given 30 million chances, it will happen quite close to 10 million times.
This π result is another casting of the Law of Large Numbers, and of the apparent paradox that true unpredictability is itself predictable. There is no way to predict whether any one dropped needle will cross any line. It doesn’t even matter whether any one needle crosses any line. An enormous number of needles, tossed without fear or favor, will fall in ways that embed π. The same π you get from comparing the circumference of a circle to its diameter. The same π you get from looking at the arc-cosine of a negative one.
I suppose we could use this also to calculate the value of 2, but that somehow seems to touch lesser majesties.
Today’s A To Z term is the Abacus. It was suggested by aajohannas, on Twitter as @aajohannas. Particularly asked for was how to use an abacus. The abacus has been used by a great many cultures over thousands of years. So it’s hard to say that there is any one right way to use it. I’m going to get into a way to use it to compute, any more than there is a right way to use a hammer. There are many hammers, and many things to hammer. But there are similarities between all hammers, and the ways to use them as hammers are similar. So learning one kind, and one way to use that kind, can be a useful start.
I taught at the National University of Singapore in the first half of the 2000s. At the student union was this sheltered overhang formed by a stairwell. Underneath it, partly exposed to the elements (a common building style there) was a convenience store. Up front were the things with high turnover, snacks and pop and daily newspapers, that sort of thing. In the back, beyond the register, in the areas that the rain, the only non-gentle element, couldn’t reach even whipped by wind, were other things. Miscellaneous things. Exam bluebooks faded with age and dust. Good-luck cat statues colonized by spiderwebs. Unlabelled power cables for obsolete electronics. Once when browsing through this I encountered two things that I bought as badges of office.
One was a slide rule, a proper twelve-inch one. I’d had one already, a $2 six-inch-long one I’d gotten as an undergraduate from a convenience store the university had already decided to evict. The NUS one was a slide rule you could do actual work on. Another was a soroban, a compact Japanese abacus, in a patterned cardboard box a half-inch too short to hold it. I got both. For the novelty, yes. Also, I taught Computational Science. I felt it appropriate to have these iconic human computing devices.
But do I use them? Other than for decoration? … No, not really. I have too many calculators to need them. Also I am annoyed that while I can lay my hands on the slide rule I have put the soroban somewhere so logical and safe I can’t find it. A couple photographs would improve this essay. Too bad.
Do I know how to use them? If I find them? The slide rule, sure. If you know that a slide rule works via logarithms, and you play with it a little? You know how to use a slide rule. At least a little, after a bit of experimentation and playing with the three times table.
The abacus, though? How do you use that?
In childhood I heard about abacuses. That there’s a series of parallel rods, each with beads on them. Four placed below a center beam, one placed above. Sometimes two placed above. That the lower beads on a rod represent one each. That the upper bead represents five. That some people can do arithmetic on that faster than others can an electric calculator. And that was about all I got, or at least retained. How to do this arithmetic never penetrated my brain. I imagined, well, addition must be easy. Say you wanted to do three plus six, well, move three lower beads up to the center bar. Then slide one lower and one upper bead, for six, to the center bar, and read that off. Right?
The bizarre thing is my naive childhood idea is right. At least in the big picture. Let each rod represent one of the numbers in base-ten style. It’s anachronistic to the abacus’s origins to speak of a ones rod, a tens rod, a hundreds rod, and so on. So what? We’re using this tool today. We can use the ideas of base ten to make our understanding easier.
Pick a row of beads that you want to represent the ones. The row to the left of that represents tens. To the left of that, hundreds. To the right of the ones is the one-tenths, and the one-hundredths, and so on. This goes on to however far your need and however big your abacus is.
Move beads to the center to represent numbers you want. If you want ’21’, slide two lower beads up in the tens column and one lower bead in the ones column. If you want ’38’, slide three lower beads up in the tends column and one upper and three lower beads in the ones column.
To add two numbers, slide more beads representing those numbers toward the center bar. To subtract, slide beads away. Multiplication and division were beyond my young imagination. I’ll let them wait a bit.
There are complications. The complications are for good reason. When you master them, they make computation swifter. But you pay for that later speed with more time spent learning. This is a trade we make, and keep making, in computational mathematics. We make a process more reliable, more speedy, by making it less obvious.
Some of this isn’t too difficult. Like, work in one direction so far as possible. It’s easy to suppose this is better than jumping around from, say, the thousands digit to the tens to the hundreds to the ones. The advice I’ve read says work from the left to the right, that is, from the highest place to the lowest. Arithmetic as I learned it works from the ones to the tens to the hundreds, but this seems wiser. The most significant digits get calculated first this way. It’s usually more important to know the answer is closer to 2,000 than to 3,000 than to know that the answer ends in an 8 rather than a 6.
Some of this is subtle. This is to cope with practical problems. Suppose you want to add 5 to 6? There aren’t that many beads on any row. A Chinese abacus, which has two beads on the upper part, could cope with this particular problem. It’s going to be in trouble when you want to add 8 to 9, though. That’s not unique to an abacus. Any numerical computing technique can be broken by some problem. This is why it’s never enough to calculate; we still have to think. For example, thinking will let us handle this five plus six difficulty.
Consider this: four plus one is five. So four and one are “complementary numbers”, with respect to five. Similarly, three and two are five’s complementary numbers. So if we need to add four to a number, that’s equivalent to adding five and subtracting one. If we need to add two, that’s equivalent to adding five and subtracting three. This will get us through some shortages in bead count.
And consider this: four plus six is ten. So four and six are ten-complementary numbers. Similarly, three and seven are ten’s complementary numbers. Two and eight. One and nine. This gets us through much of the rest of the shortage.
So here’s how this works. Suppose we have 35, and wish to add 6 to it. There aren’t the beads to add six to the ones column. So? Instead subtract the complement of six. That is, add ten and subtract four. In moving across the rows, when you reach the tens, slide one lower bead up, making the abacus represent 45. Then from the ones column subtract four. that is, slide the upper bead away from the center bar, and slide the complement to four, one bead, up to the center. And now the abacus represents 41, just like it ought.
If you’re experienced enough you can reduce some of these operations, sliding the beads above and below the center bar at once. Or sliding a bead in the tens and another in the ones column at once. Don’t fret doing this. Worry about making correct steps. You’ll speed up with practice. I remember advice from a typesetting manual I collected once: “strive for consistent, regular keystrokes. Speed comes with practice. Errors are time-consuming to correct”. This is, mutatis mutandis, always good advice.
Subtraction works like addition. This should surprise few. If you have the beads in place, just remove them: four minus two takes no particular insight. If there aren’t enough beads? Fall back on complements. If you wish to do 35 minus 6? Set up 35, and calculate 35 minus 10 plus 4. When you get to the tens rod, slide one bead down; this leaves you with 25. Then in the ones column, slide four beads up. This leaves you with 29. I’m so glad these seem to be working out.
Doing longer additions and subtractions takes more rows, but not actually more work. 35.2 plus 6.4 is the same work as 35 plus 6 and 2 plus 4, each of which you, in principle, know how to do. 35.2 minus 6.4 is a bit more fuss. When you get to the 2 minus 4 bit you have to do that addition-of-complements stuff. But that’s not any new work.
Where the decimal point goes is something you have to keep track of. As with the slide rule, the magnitude of these numbers is notional. Your fingers move the same way to add 352 and 64 as they will 0.352 and 0.064. That’s convenient.
Multiplication gets more tedious. It demands paying attention to where the decimal point is. Just like the slide rule demands, come to think of it. You’ll need columns on the abacus for both the multiplicands and the product. And you’ll do a lot of adding up. But at heart? 2038 times 3.7 amounts to doing eight multiplications. 8 times 7, 3 times 7, 0 times 7 (OK, that one’s easy), 2 times 7, 3 times 7, 3 times 3, 0 times 3 (again, easy), and 2 times 3. And then add up these results in the correct columns. This may be tedious, but it’s not hard. It does mean the abacus doesn’t spare you having to know some times tables. I mean, you could use the abacus to work out 8 times 7 by adding seven to itself over and over, but that’s time-consuming. You can save time, and calculation steps, by memorization. By knowing some answers ahead of time.
Totton Heffelfinger and Gary Flom’s page, from which I’m drawing almost all my practical advice, offers a good notation of lettering the rods of the abacus, A, B, C, D, and so on. To multiply, say, 352 by 64 start by putting the 64 on rods BC. Set the 352 on rods EFG. We’ll get the answer with rod K as the ones column. 2 times 4 is 8; put that on rod K. 5 times 4 is 20; add that to rods IJ. 3 times 4 is 12; add that to rods HI. 2 times 6 is 12; add that to rods IJ. 5 times 6 is 30; add that to rods HI. 3 times 6 is 18; add that to rods GH. All going well this should add up to 22,528, spread out along rods GHIJK. I can see right away at least the 8 is correct.
You would do the same physical steps to multiply, oh, 3.52 by 0.0064. You have to take care of the decimal place yourself, though.
I see you, in the back there, growing suspicious. I’ll come around to this. Don’t worry.
Division is … oh, I have to fess up. Division is not something I feel comfortable with. I can read the instructions and repeat the examples given. I haven’t done it enough to have that flash where I understand the point of things. I recognize what’s happening. It’s the work of division as done by hand. You know, 821 divided by 56 worked out by, well, 56 goes into 82 once with a remainder of 26. Then drop down the 1 to make this 261. 56 goes into 261 … oh, it would be so nice if it went five times, but it doesn’t. It goes in four times, with a remainder of 37. I can walk you through the steps but all I am truly doing is trying to keep up with Totton Heffelfinger and Gary Flom’s instructions here.
There are, I read, also schemes to calculate square roots on the abacus. I don’t know that there are cube-root schemes also. I would bet on there being such, though.
Never mind, though. The suspicious thing I expect you’ve noticed is the steps being done. They’re represented as sliding beads along rods, yes. But the meaning of these steps? They’re the same steps you would do by doing arithmetic on paper. Sliding two beads and then two more beads up to the center bar isn’t any different from looking at 2 + 2 and representing that as 4. All this ten’s-complement stuff to subtract one number from another is just … well, I learned it as subtraction by “borrowing”. I don’t know the present techniques but I’m sure they’re at heart the same. But the work is eerily like what you would do on paper, using Arabic numerals.
The slide rule uses a logarithm-based ruler. This makes the addition of distances along the slides match the multiplication of the values of the rulers. What does the abacus do to help us compute?
Why use an abacus?
What an abacus gives us is memory. It stores numbers. It lets us break a big problem into a series of small problems. It lets us keep a partial computation while we work through those steps. We don’t add 35.2 to 6.4. We add 3 to 0 and 5 to 6 and 2 to 4. We don’t multiply 2038 by 3.7. We multiply 8 by 7, and 8 by 3, and 3 by 7, and 3 by 3, and so on.
And this is most of numerical computing, even today. We describe what we want to do as these high-level operations. But the computation is a lot of calculations, each one of them simple. We use some memory to hold partially completed results. Memory, the ability to store results, lets us change hard problems into long strings of simple ones.
We do more things the way the abacus encourages. We even use those complementary numbers. Not five’s or ten’s complements, not with binary arithmetic computers. Two’s complement arithmetic makes it possible to subtract, or write negative numbers, in ways that are easy to calculate. That there are a set number of rods even has its parallel in modern computing. When representing a real number on the computer we have only so many decimal places. (Yes, yes, binary digit places.) At least unless we use a weird data structure. This affects our calculations. There are numbers we can’t represent perfectly, such as one-third. We need to think about whether this affects what we are using our calculation for.
There are major differences between a digital computer and a person using the abacus. But the processes are similar. This may help us to understand why computational science works the way it does. It may at least help us understand those contests in the 1950s where the abacus user was faster than the calculator user.
But no, I confess, I only use mine for decoration, or will when I find it again.
And a good late August to all my readers. I’m as ready as can be for my Fall 2019 Mathematics A-To-Z. For this I hope to explore one word or concept for each letter in the alphabet, one essay for each. I’m trying, as I did last year, to publish just two essays per week. I like to think this will keep my writing load from being too much. I’m fooling only myself.
For topics, though, I like to ask readers for suggestions. And I’ll be asking just for parts of the alphabet at a time. I’ve found this makes it easier for me to track suggestions. It also makes it easier for me to think about which subjects I feel I can write the most interesting essay about. This is in case I get more than one nomination for a particular letter. That’s hardly guaranteed, but I do like thinking this might happen.
If you do leave a suggestion, please also mention whether you host your own blog or YouTube channel or Twitter or Mathstodon account. Anything that you’d like people to know about.
I’ve done five of these A To Z sequences before, from 2015 through to last year. I won’t necessarily refuse to re-explore something I’ve already written up. There are certainly ones I could improve, given another chance. But I’d probably look to write about a fresh topic.
I hope to start the first week of September, mostly so that I end by (United States) Thanksgiving. The letters that I would like to finish by September are the first eight, A through H. Covered in past years from this have been:
Thank you for any thoughts you have. Please ask if there are any questions. And I intend for this to be open to topics in any field of mathematics, including the ones I don’t really know. Writing about something I’m just learning about is terrifying and fun. It’s a large part of why I do these things every year, and also why I don’t do them more than once a year.
I have a tradition at the end of an A To Z sequence of looking back and considering what I learned doing it. Sometimes this is mathematics I’ve learned. At the risk of spoiling the magic, I don’t know as much as I present myself as knowing. I’ll often take an essay topic and study up before writing, and hope that I look competent enough that nobody seriously questions me. Yes, I thought I was a pretty good student journalist back in the day, and harbored fantasies of doing that for a career. This before I went on to fantasize about doing mathematics. Still; I also learn things about writing in doing a big writing project like this. And now I’ve had some breathing space to sit and think. I can try finding out what I thought.
The major differences between this A To Z and those of past years was scheduling. Through 2017 I’d done A To Z’s posted three days a week. This is a thrilling schedule. It makes it easy to have full weeks, even full months, with some original posting every day. I can tell myself that the number of posts doesn’t matter. It’s the quality that does. It doesn’t work. I tend toward compulsive behavior and a-post-a-day is so gratifying.
But I also knew that the last quarter of 2018 would be busy and I had to cut something down. Some of that is Reading the Comics posts. Including pictures of each comic I discuss adds considerably to the production time. This is not least because I feel I can’t reasonably claim Fair Use of the comics without writing something of more substance. Moving files around and writing alt-text for the images also takes time. But the time can be worth it. Doing that has sometimes made me think longer, or even better, about the comics.
So switching to two-a-week seemed the thing to do. It would spread the A To Z over three months, but that’s not bad. I figured to prove out whether that schedule worked. If I could find one that let me do possibly several A To Z sequences in a year that’d be wonderful too. I’ve only done that once, so far, in 2016, but I think the exercise is always good if exhausting for me.
This completely failed to save time. I had fewer things to write, but somehow, I only wrote more. All my writing’s getting longer as I go on, yes. Last year my average post exceeded a thousand words, though, and that counts Reading the Comics and pointers to things I’ve been reading and all that. There’s a similar steady expansion going on in my humor blog. Possibly having more time between essays encouraged me to write longer for each. Work famously expands to fit the time available, and having as many as three full days to write, rather than one, might be dangerous. For the first time in an A To Z I never got ahead of myself. I would, at best, be researching and making notes for the next essay while waiting for the current one to post. There’s value in dangerous writing. But I don’t like that as a habit.
Another scheduling change was in how I took topic suggestions. In the past I’d thrown the whole alphabet open at once. This time I broke the alphabet into a couple of pieces, and asked for about one-quarter at a time. This, overall, worked. For one, it gave me more chances to talk about the A To Z. Talking about something is one of the non-annoying ways to advertise a thing. And I think it helped a greater variety of people suggest topics. I did have more collisions this time around, letters for which several people suggested different ideas. That’s a happy situation to have. Thinking of what to write is the hard part; going on about a topic someone else named? That’s easy. So I’ll certainly keep that.
I did write some about maybe doing supplementary pieces, based on topics I didn’t use for the main line. Might yet do that, perhaps under rules where I do one a week, or limit myself to 700 words, or something like that. It might be worth doing a couple just to have a buffer against weeks that there’s no comic strips worth discussing. Or to head off gaps next time around, although that would spoil some letters for people.
Also I completely ran out of ‘X’ topics, and went with the 90s alternative of “extreme”. There are plenty of “extreme” things in mathematics I could write about. But that feels a bit chintzy to do too often.
This time around I changed focus on many of my essays. It wasn’t a conscious thing, not to start with. But I got to writing more about the meaning and significance and cultural import of topics, rather than definitions or descriptions of the use of things. This is a natural direction to go for a topic like Fermat’s Last Theorem, or the Infinite Monkey Theorem, or mathematics jokes. I liked the way those pieces turned out, though, and tried doing more of it. This likely helped the essays grow so long. Context demands space, after all. And more thinking. Thinking’s the hard part of writing, but it’s also fun, because when you’re thinking about a subject you aren’t typing any specific words.
But it’s probably a worthwhile shift. For a pop mathematics blog to describe what makes something a “smooth” function is all right. But it’s not a story unless it says why we should care. That’s more about context than about definitions, which anyone could get by typing ‘mathworld smooth’ into DuckDuckGo. For all the trouble this causes me, it’s the way to go.
Every good lesson carries its opposite along, though. One of the requests this time around was about Lord Kelvin. There’s no end of things you can write about him: he did important work in basically every field of science and mathematics as the 19th century knew it. It’s easy to start writing about his work and never stop. I did the opposite, taking one tiny and often-overlooked piece and focusing on that. I’m not sure it alone would convince anyone of Kelvin’s exhaustive greatness. But I don’t imagine anyone interested in reading a single essay on Kelvin would never read a second one. It seems to me a couple narrow-focus essays help in that context. Seeing more of one detail gives scale to the big picture.
I’ve done just the one A To Z the last two years. There’s surely an optimal rate for doing these. The sequences are usually good for my readership. My experience, tracking monthly readership figures, suggests that just posting more often is good for my readership. They’re also the thing I write that most directly solicits reader responses. They’re also exhausting. The last several letters are always a challenge. The weeks after a sequence is completed I collapse into a little recuperative bubble. So I want to do these as much as I can without burning out on the idea. Also without overloading Thomas Dye, who’s been so good as to make the snappy banners for these pieces. He has his own projects, including the web comic Projection Edge, to worry about. More than once a year is probably sustainable. I may also want to stack this with hosting the Playful Mathematics Education Blog Carnival again, if I’m able to this year.
Deep down, though, I think the best moment of my Fall 2018 A To Z might have been in the first. I wrote about asymptotes and realized I could put in ordinary words why they were a thing worth having. If I could have three insights like that a year I’d be a great mathematics writer.
I have reached the end! Thirteen weeks at two essays per week to describe a neat sampling of mathematics. I hope to write a few words about what I learned by doing all this. In the meanwhile, though, I want to gather together the list of all the essays I did put into this project.
Some areas of mathematics struggle against the question, “So what is this useful for?” As though usefulness were a particular merit — or demerit — for a field of human study. Most mathematics fields discover some use, though, even if it takes centuries. Others are born useful. Probability, for example. Statistics. Know what the fields are and you know why they’re valuable.
Game theory is another of these. The subject, as often happens, we can trace back centuries. Usually as the study of some particular game. Occasionally in the study of some political science problem. But game theory developed a particular identity in the early 20th century. Some of this from set theory experts. Some from probability experts. Some from John von Neumann, because it was the 20th century and all that. Calling it “game theory” explains why anyone might like to study it. Who doesn’t like playing games? Who, studying a game, doesn’t want to play it better?
But why it might be interesting is different from why it might be important. Think of what a game is. It is a string of choices made by one or more parties. The point of the choices is to achieve some goal. Put that way you realize: this is everything. All life is making choices, all in the pursuit of some goal, even if that goal is just “not end up any worse off”. I don’t know that the earliest researchers in game theory as a field realized what a powerful subject they had touched on. But by the 1950s they were doing serious work in strategic planning, and by 1964 were even giving us Stanley Kubrick movies.
This is taking me away from my glossary term. The field of games is enormous. If we narrow the field some we can discuss specific kinds of games. And say more involved things about these games. So first we’ll limit things by thinking only of sequential games. These are ones where there are a set number of players, and they take turns making choices. I’m not sure whether the field expects the order of play to be the same every time. My understanding is that much of the focus is on two-player games. What’s important is that at any one step there’s only one party making a choice.
The other thing narrowing the field is to think of information. There are many things that can affect the state of the game. Some of them might be obvious, like where the pieces are on the game board. Or how much money a player has. We’re used to that. But there can be hidden information. A player might conceal some game money so as to make other players underestimate her resources. Many card games have one or more cards concealed from the other players. There can be information unknown to any party. No one can make a useful prediction what the next throw of the game dice will be. Or what the next event card will be.
But there are games where there’s none of this ambiguity. These are called games with “perfect information”. In them all the players know the past moves every player has made. Or at least should know them. Players are allowed to forget what they ought to know.
There’s a separate but similar-sounding idea called “complete information”. In a game with complete information, players know everything that affects the gameplay. At least, probably, apart from what their opponents intend to do. This might sound like an impossibly high standard, at first. All games with shuffled decks of cards and with dice to roll are out. There’s no concealing or lying about the state of affairs.
Set complete-information aside; we don’t need it here. Think only of perfect-information games. What are they? Some ancient games, certainly. Tic-tac-toe, for example. Some more modern versions, like Connect Four and its variations. Some that are actually deep, like checkers and chess and go. Some that are, arguably, more puzzles than games, as in sudoku. Some that hardly seem like games, like several people agreeing how to cut a cake fairly. Some that seem like tests to prove people are fundamentally stupid, like when you auction off a dollar. (The rules are set so players can easily end up paying more then a dollar.) But that’s enough for me, at least. You can see there are games of clear, tangible interest here.
The last restriction: think only of two-player games. Or at least two parties. Any of these two-party sequential games with perfect information are a part of “combinatorial game theory”. It doesn’t usually allow for incomplete-information games. But at least the MathWorld glossary doesn’t demand they be ruled out. So I will defer to this authority. I’m not sure how the name “combinatorial” got attached to this kind of game. My guess is that it seems like you should be able to list all the possible combinations of legal moves. That number may be enormous, as chess and go players are always going on about. But you could imagine a vast book which lists every possible game. If your friend ever challenged you to a game of chess the two of you could simply agree, oh, you’ll play game number 2,038,940,949,172 and then look up to see who won. Quite the time-saver.
Most games don’t have such a book, though. Players have to act on what they understand of the current state, and what they think the other player will do. This is where we get strategies from. Not just what we plan to do, but what we imagine the other party plans to do. When working out a strategy we often expect the other party to play perfectly. That is, to make no mistakes, to not do anything that worsens their position. Or that reduces their chance of winning.
… And yes, arguably, the word “chance” doesn’t belong there. These are games where the rules are known, every past move is known, every future move is in principle computable. And if we suppose everyone is making the best possible move then we can imagine forecasting the whole future of the game. One player has a “chance” of winning in the same way Christmas day of the year 2038 has a “chance” of being on a Tuesday. That is, the probability is just an expression of our ignorance, that we don’t happen to be able to look it up.
But what choice do we have? I’ve never seen a reference that lists all the possible games of tic-tac-toe. And that’s about the simplest combinatorial-game-theory game anyone might actually play. What’s possible is to look at the current state of the game. And evaluate which player seems to be closer to her goal. And then look at all the possible moves.
There are three things a move can do. It can put the party closer to the goal. It can put the party farther from the goal. Or it can do neither. On her turn the other party might do something that moves you farther from your goal, moves you closer to your goal, or doesn’t affect your status at all. It seems like this makes strategy obvious. On every step take the available move that takes one closest to the goal. This is known as a “greedy” strategy. As the name suggests it isn’t automatically bad. If you expect the game to be a short one, greed might be the best approach. The catch is that moves that seem less good — even ones that seem to hurt you initially — might set up other, even better moves. So strategy requires some thinking beyond the current step. Properly, it requires thinking through to the end of the game. Or at least until the end of the game seems obvious.
We should like a strategy that leaves us no choice but to win. Next-best would be one that leaves the game undecided, since something might happen like the other player needing to catch a bus and so resigning. This is how I got my solitary win in the two months I spent in the college chess club. Worst would be the games that leave us no choice but to lose.
It can be that there are no good moves. That is, that every move available makes it a little less likely that we win. Sometimes a game offers the chance to pass, preserving the state of the game but giving the other party the turn. Then maybe the other party will do something that creates a better opportunity for us. But if we are allowed to pass, there’s a good chance the game lets the other party pass, too, and we end up in the same fix. And it may be the rules of the game don’t allow passing anyway. One must move.
The phenomenon of having to make a move when it’s impossible to make a good move has prominence in chess. I don’t have the chess knowledge to say how common the situation is. But it seems to be a situation people who study chess problems love. I suppose it appeals to a love of lost causes and the hope that you can be brilliant enough to see what everyone else has overlooked. German chess literate gave it a name 160 years ago, “zugzwang”, “compulsion to move”. Somehow I never encountered the term when I was briefly a college chess player. Perhaps because I was never in zugzwang and was just too incompetent a player to find my good moves. I first encountered the term in Michael Chabon’s The Yiddish Policeman’s Union. The protagonist picked up on the term as he investigated the murder of a chess player and then felt himself in one.
Combinatorial game theorists have picked up the word, and sharpened its meaning. If I understand correctly chess players allow the term to be used for any case where a player hurts her position by moving at all. Game theorists make it more dire. This may reflect their knowledge that an optimal strategy might require taking some dismal steps along the way. The game theorist formally grants the term only to the situation where the compulsion to move changes what should be a win into a loss. This seems terrible, but then, we’ve all done this in play. We all feel terrible about it.
I’d like here to give examples. But in searching the web I can find only either courses in game theory. These are a bit too much for even me to sumarize. Or chess problems, which I’m not up to understanding. It seems hard to set out an example: I need to not just set out the game, but show that what had been a win is now, by any available move, turned into a loss. Chess is looser. It even allows, I discover, a double zugzwang, where both players are at a disadvantage if they have to move.
It’s a quite relatable problem. You see why game theory has this reputation as mathematics that touches all life.