## My Little 2021 Mathematics A-to-Z: Ordinary Differential Equations

Mr Wu, my Singapore Maths Tuition friend, has offered many fine ideas for A-to-Z topics. This week’s is another of them, and I’m grateful for it.

# Ordinary Differential Equations

As a rule, if you can do something with a number, you can do the same thing with a function. Not always, of course, but the exceptions are fewer than you might imagine. I’ll start with one of those things you can do to both.

A powerful thing we learn in (high school) algebra is that we can use a number without knowing what it is. We give it a name like ‘x’ or ‘y’ and describe what we find interesting about it. If we want to know what it is, we (usually) find some equation or set of equations and find what value of x could make that true. If we study enough (college) mathematics we learn its equivalent in functions. We give something a name like f or g or Ψ and describe what we know about it. And then try to find functions which make that true.

There are a couple common types of equation for these not-yet-known functions. The kind you expect to learn as a mathematics major involves differential equations. These are ones where your equation (or equations) involve derivatives of the not-yet-known f. A derivative describes the rate at which something changes. If we imagine the original f is a position, the derivative is velocity. Derivatives can have derivatives also; this second derivative would be the acceleration. And then second derivatives can have derivatives also, and so on, into infinity. When an equation involves a function and its derivatives we have a differential equation.

(The second common type is the integral equation, using a function and its integrals. And a third involves both derivatives and integrals. That’s known as an integro-differential equation, and isn’t life complicated enough? )

Differential equations themselves naturally divide into two kinds, ordinary and partial. They serve different roles. Usually an ordinary differential equation we can describe the change for from knowing only the current situation. (This may include velocities and accelerations and stuff. We could ask what the velocity at an instant means. But never mind that here.) Usually a partial differential equation bases the change where you are on the neighborhood of where your location. If you see holes you can pick in that, you’re right. The precise difference is about the independent variables. If the function f has more than one independent variable, it’s possible to take a partial derivative. This describes how f changes if one variable changes while the others stay fixed. If the function f has only the one independent variable, you can only take ordinary derivatives. So you get an ordinary differential equation.

But let’s speak casually here. If what you’re studying can be fully represented with a dashboard readout? Like, an ordered list of positions and velocities and stuff? You probably have an ordinary differential equation. If you need a picture with a three-dimensional surface or a color map to understand it? You probably have a partial differential equation.

One more metaphor. If you can imagine the thing you’re modeling as a marble rolling around on a hilly table? Odds are that’s an ordinary differential equation. And that representation covers a lot of interesting problems. Marbles on hills, obviously. But also rigid pendulums: we can treat the angle a pendulum makes and the rate at which those change as dimensions of space. The pendulum’s swinging then matches exactly a marble rolling around the right hilly table. Planets in space, too. We need more dimensions — three space dimensions and three velocity dimensions — for each planet. So, like, the Sun-Earth-and-Moon would be rolling around a hilly table with 18 dimensions. That’s all right. We don’t have to draw it. The mathematics works about the same. Just longer.

[ To be precise we need three momentum dimensions for each orbiting body. If they’re not changing mass appreciably, and not moving too near the speed of light, velocity is just momentum times a constant number, so we can use whichever is easier to visualize. ]

We mostly work with ordinary differential equations of either the first or the second order. First order means we have first derivatives in the equation, but never have to deal with more than the original function and its first derivative. Second order means we have second derivatives in the equation, but never have to deal with more than the original function or its first or second derivatives. You’ll never guess what a “third order” differential equation is unless you have experience in reading words. There are some reasons we stick to these low orders like first and second, though. One is that we know of good techniques for solving most first- and second-order ordinary differential equations. For higher-order differential equations we often use techniques that find a related normal old polynomial. Its solution helps with the thing we want. Or we break a high-order differential equation into a set of low-order ones. So yes, again, we search for answers where the light is good. But the good light covers many things we like to look at.

There’s simple harmonic motion, for example. It covers pendulums and springs and perturbations around stable equilibriums and all. This turns out to cover so many problems that, as a physics major, you get a little sick of simple harmonic motion. There’s the Airy function, which started out to describe the rainbow. It turns out to describe particles trapped in a triangular quantum well. The van der Pol equation, about systems where a small oscillation gets energy fed into it while a large oscillation gets energy drained. All kinds of exponential growth and decay problems. Very many functions where pairs of particles interact.

This doesn’t cover everything we would like to do. That’s all right. Ordinary differential equations lend themselves to numerical solutions. It requires considerable study and thought to do these numerical solutions well. But this doesn’t make the subject unapproachable. Few of us could animate the “Pink Elephants on Parade” scene from Dumbo. But could you draw a flip book of two stick figures tossing a ball back and forth? If you’ve had a good rest, a hearty breakfast, and have not listened to the news yet today, so you’re in a good mood?

The flip book ball is a decent example here, too. The animation will look good if the ball moves about the “right” amount between pages. A little faster when it’s first thrown, a bit slower as it reaches the top of its arc, a little faster as it falls back to the catcher. The ordinary differential equation tells us how fast our marble is rolling on this hilly table, and in what direction. So we can calculate how far the marble needs to move, and in what direction, to make the next page in the flip book.

Almost. The rate at which the marble should move will change, in the interval between one flip-book page and the next. The difference, the error, may not be much. But there is a difference between the exact and the numerical solution. Well, there is a difference between a circle and a regular polygon. We have many ways of minimizing and estimating and controlling the error. Doing that is what makes numerical mathematics the high-paid professional industry it is. Our game of catch we can verify by flipping through the book. The motion of four dozen planets and moons attracting one another is harder to be sure we calculate it right.

I said at the top that most anything one can do with numbers one can do with functions also. I would like to close the essay with some great parallel. Like, the way that trying to solve cubic equations made people realize complex numbers were good things to have. I don’t have a good example like that for ordinary differential equations, where the study expanded our ideas of what functions could be. Part of that is that complex numbers are more accessible than the stranger functions. Part of that is that complex numbers have a story behind them. The story features titanic figures like Gerolamo Cardano, Niccolò Tartaglia and Ludovico Ferrari. We see some awesome and weird personalities in 19th century mathematics. But their fights are generally harder to watch from the sidelines and cheer on. And part is that it’s easier to find pop historical treatments of the kinds of numbers. The historiography of what a “function” is is a specialist occupation.

But I can think of a possible case. A tool that’s sometimes used in solving ordinary differential equations is the “Dirac delta function”. Yes, that Paul Dirac. It’s a weird function, written as $\delta(x)$. It’s equal to zero everywhere, except where $x$ is zero. When $x$ is zero? It’s … we don’t talk about what it is. Instead we talk about what it can do. The integral of that Dirac delta function times some other function can equal that other function at a single point. It strains credibility to call this a function the way we speak of, like, $sin(x)$ or $\sqrt{x^2 + 4}$ being functions. Many will classify it as a distribution instead. But it is so useful, for a particular kind of problem, that it’s impossible to throw away.

So perhaps the parallels between numbers and functions extend that far. Ordinary differential equations can make us notice kinds of functions we would not have seen otherwise.

And with this — I can see the much-postponed end of the Little 2021 Mathematics A-to-Z! You can read all my entries for 2021 at this link, and if you’d like can find all my A-to-Z essays here. How will I finish off the shortest yet most challenging sequence I’ve done yet? Will it be yellow and equivalent to the Axiom of Choice? Answers should come, in a week, if all starts going well.

## My Little 2021 Mathematics A-to-Z: Tangent Space

And now, finally, I resume and hopefully finish what was meant to be a simpler and less stressful A-to-Z for last year. I’m feeling much better about my stress loads now and hope that I can soon enjoy the feeling of having a thing accomplished.

This topic is one of many suggestions that Elkement, one of my longest blog-friendships here, offered. It’s a creation that sent me back to my grad school textbooks, some of those slender paperback volumes with tiny, close-set type that turn out to be far more expensive than you imagine. Though not in this case: my most useful reference here was V I Arnold’s Ordinary Differential Equations, stamped inside as costing \$18.75. The field is full of surprises. Another wonderful reference was this excellent set of notes prepared by Jodin Morey. They would have done much to help me through that class.

# Tangent Space

Stand in midtown Manhattan, holding a map of midtown Manhattan. You have — not a tangent space, not yet. A tangent plane, representing the curved surface of the Earth with the flat surface of your map, though. But the tangent space is near: see how many blocks you must go, along the streets and the avenues, to get somewhere. Four blocks north, three west. Two blocks south, ten east. And so on. Those directions, of where you need to go, are the tangent space around you.

There is the first trick in tangent spaces. We get accustomed, early in learning calculus, to think of tangent lines and then of tangent planes. These are nice, flat approximations to some original curve. But while we’re introduced to the tangent space, and first learn examples of it, as tangent planes, we don’t stay there. There are several ways to define tangent spaces. One recasts tangent spaces in group theory terms, describing them as a ring based on functions that are equal to zero at the tangent point. (To be exact, it’s an ideal, based on a quotient group, based on two sets of such functions.)

That’s a description mathematicians are inclined to like, not only because it’s far harder to imagine than a map of the city is. But this ring definition describes the tangent space in terms of what we can do with it, rather than how to calculate finding it. That tends to appeal to mathematicians. And it offers surprising insights. Cleverer mathematicians than I am notice how this makes tangent spaces very close to Lagrange multipliers. Lagrange multipliers are a technique to find the maximum of a function subject to a constraint from another function. They seem to work by magic, and tangent spaces will echo that.

I’ll step back from the abstraction. There’s relevant observations to make from this map of midtown. The directions “four blocks north, three west” do not represent any part of Manhattan. It describes a way you might move in Manhattan, yes. But you could move in that direction from many places in the city. And you could go four blocks north and three west if you were in any part of any city with a grid of streets. It is a vector space, with elements that are velocities at a tangent point.

The tangent space is less a map showing where things are and more one of how to get to other places, closer to a subway map than a literal one. Still, the topic is steeped in the language of maps. I’ll find it a useful metaphor too. We do not make a map unless we want to know how to find something. So the interesting question is what do we try to find in these tangent spaces?

There are several routes to tangent spaces. The one I’m most familiar with is through dynamical systems. These are typically physics-driven, sometimes biology-driven, problems. They describe things that change in time according to ordinary differential equations. Physics problems particularly are often about things moving in space. Space, in dynamical systems, becomes “phase space”, an abstract universe spanned by all of the possible values of the variables. The variables are, usually, the positions and momentums of the particles (for a physics problem). Sometimes time and energy appear as variables. In biology variables are often things that represent populations. The role the Earth served in my first paragraph is now played by a manifold. The manifold represents whatever constraints are relevant to the problem. That’s likely to be conservation laws or limits on how often arctic hares can breed or such.

The evolution in time of this system, though, is now the tracing out of a path in phase space. An understandable and much-used system is the rigid pendulum. A stick, free to swing around a point. There are two useful coordinates here. There’s the angle the stick makes, relative to the vertical axis, $\theta$. And there’s how fast the stick is changing, $\dot{\theta}$. You can draw these axes; I recommend $\theta$ as the horizontal and $\dot{\theta}$ as the vertical axis but, you know, you do you.

If you give the pendulum a little tap, it’ll swing back and forth. It rises and moves to the right, then falls while moving to the left, then rises and moves to the left, then falls and moves to the right. In phase space, this traces out an ellipse. It’s your choice whether it’s going clockwise or anticlockwise. If you give the pendulum a huge tap, it’ll keep spinning around and around. It’ll spin a little slower as it gets nearly upright, but it speeds back up again. So in phase space that’s a wobbly line, moving either to the right or the left, depending what direction you hit it.

You can even imagine giving the pendulum just the right tap, exactly hard enough that it rises to vertical and balances there, perfectly aligned so it doesn’t fall back down. This is a special path, the dividing line between those ellipses and that wavy line. Or setting it vertically there to start with and trusting no truck driving down the street will rattle it loose. That’s a very precise dot, where $\dot{\theta}$ is exactly zero. These paths, the trajectories, match whatever walking you did in the first paragraph to get to some spot in midtown Manhattan. And now let’s look again at the map, and the tangent space.

Within the tangent space we see what changes would change the system’s behavior. How much of a tap we would need, say, to launch our swinging pendulum into never-ending spinning. Or how much of a tap to stop a spinning pendulum. Every point on a trajectory of a dynamical system has a tangent space. And, for many interesting systems, the tangent space will be separable into two pieces. One of them will be perturbations that don’t go far from the original trajectory. One of them will be perturbations that do wander far from the original.

These regions may have a complicated border, with enclaves and enclaves within enclaves, and so on. This can be where we get (deterministic) chaos from. But what we usually find interesting is whether the perturbation keeps the old behavior intact or destroys it altogether. That is, how we can change where we are going.

That said, in practice, mathematicians don’t use tangent spaces to send pendulums swinging. They tend to come up when one is past studying such petty things as specific problems. They’re more often used in studying the ways that dynamical systems can behave. Tangent spaces themselves often get wrapped up into structures with names like tangent bundles. You’ll see them proving the existence of some properties, describing limit points and limit cycles and invariants and quite a bit of set theory. These can take us surprising places. It’s possible to use a tangent-space approach to prove the fundamental theorem of algebra, that every polynomial has at least one root. This seems to me the long way around to get there. But it is amazing to learn that is a place one can go.

I am so happy to be finally finishing Little 2021 Mathematics A-to-Z. All of this project’s essays should be at this link. And all my glossary essays from every year should be at this link. Thank you for reading.

## From my Sixth A-to-Z: Operator

One of the many small benefits of these essays is getting myself clearly grounded on terms that I had accepted without thinking much about. Operator, like functional (mentioned in here), is one of them. I’m sure that when these were first introduced my instructors gave them clear definitions. Buut when they’re first introduced it’s not clear why these are important, or that we are going to spend the rest of grad school talking about them. So this piece from 2019’s A-to-Z sequence secured my footing on a term I had a fair understanding of. You get some idea of what has to be intended from the context in which the term is used. Also from knowing how terms like this tend to be defined. But having it down to where I could certainly pass a true-false test about “is this an operator”? That was new.

Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.

# Operator.

An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same sapce but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Everything in a particular function space is a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.

So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.

My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.

I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.

Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. But both the differential and the integral operators are linear, which lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function $af + bg$ is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of $af + bg$ is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.

I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.

One of the conceptual leaps in high school algebra is that we start analyzing the things we do with numbers. Like, we don’t just take the number three, square it, multiply that by two and add to that the number three times four and add to that the number 1. We think about what if we take any number, call it x, and think of $2x^2 + 4x + 1$. And what if we make equations based on doing this $2x^2 + 4x + 1$; what values of x make those equations true? Or tell us something interesting?

Operators represent a similar leap. We can think of functions as things we manipulate, and think of those manipulations as a particular thing to do. For example, let me come up with a differential expression. For some function u(x) work out the value of this:

$2\frac{d^2 u(x)}{dx^2} + 4 \frac{d u(x)}{dx} + u(x)$

Let me join in the convention of using ‘D’ for the differential operator. Then we can rewrite this expression like so:

$2D^2 u + 4D u + u$

Suddenly the differential equation looks a lot like a polynomial. Of course it does. Remember that everything in mathematics is polynomials. We get new tools to solve differential equations by rewriting them as operators. That’s nice. It also scratches that itch that I think everyone in Intro to Calculus gets, of wanting to somehow see $\frac{d^2}{dx^2}$ as if it were a square of $\frac{d}{dx}$. It’s not, and $D^2$ is not the square of $D$. It’s composing $D$ with itself. But it looks close enough to squaring to feel comfortable.

Nobody needs to do $2D^2 u + 4D u + u$ except to learn some stuff about operators. But you might imagine a world where we did this process all the time. If we did, then we’d develop shorthand for it. Maybe a new operator, call it T, and define it that $T = 2D^2 + 4D + 1$. You see the grammar of treating functions as if they were real numbers becoming familiar. You maybe even noticed the ‘1’ sitting there, serving as the “identity operator”. You know how you’d write out $Tv(x) = 3$ if you needed to write it in full.

But there are operators that we use all the time. These do get special names, and often shorthand. For example, there’s the gradient operator. This applies to any function with several independent variables. The gradient has a great physical interpretation if the variables represent coordinates of space. If they do, the gradient of a function at a point gives us a vector that describes the direction in which the function increases fastest. And the size of that gradient — a functional on this operator — describes how fast that increase is.

The gradient itself defines more operators. These have names you get very familiar with in Vector Calculus, with names like divergence and curl. These have compelling physical interpretations if we think of the function we operate on as describing a moving fluid. A positive divergence means fluid is coming into the system; a negative divergence, that it is leaving. The curl, in fluids, describe how nearby streams of fluid move at different rate.

Physical interpretations are common in operators. This probably reflects how much influence physics has on mathematics and vice-versa. Anyone studying quantum mechanics gets familiar with a host of operators. These have comfortable names like “position operator” or “momentum operator” or “spin operator”. These are operators that apply to the wave function for a problem. They transform the wave function into a probability distribution. That distribution describes what positions or momentums or spins are likely, how likely they are. Or how unlikely they are.

They’re not all physical, though. Or not purely physical. Many operators are useful because they are powerful mathematical tools. There is a variation of the Fourier series called the Fourier transform. We can interpret this as an operator. Suppose the original function started out with time or space as its independent variable. This often happens. The Fourier transform operator gives us a new function, one with frequencies as independent variable. This can make the function easier to work with. The Fourier transform is an integral operator, by the way, so don’t go thinking everything is a complicated set of derivatives.

Another integral-based operator that’s important is the Laplace transform. This is a great operator because it turns differential equations into algebraic equations. Often, into polynomials. You saw that one coming.

This is all a lot of good press for operators. Well, they’re powerful tools. They help us to see that we can manipulate functions in the ways that functions let us manipulate numbers. It should sound good to realize there is much new that you can do, and you already know most of what’s needed to do it.

This and all the other Fall 2019 A To Z posts should be gathered here. And once I have the time to fiddle with tags I’ll have all past A to Z essays gathered at this link.

## From my Second A-to-Z: Orthonormal

For early 2016 — dubbed “Leap Day 2016” as that’s when it started — I got a request to explain orthogonal. I went in a different direction, although not completely different. This essay does get a bit more into specifics of how mathematicians use the idea, like, showing some calculations and such. I put in a casual description of vectors here. For book publication I’d want to rewrite that to be clearer that, like, ordered sets of numbers are just one (very common) way to represent vectors.

Jacob Kanev had requested “orthogonal” for this glossary. I’d be happy to oblige. But I used the word in last summer’s Mathematics A To Z. And I admit I’m tempted to just reprint that essay, since it would save some needed time. But I can do something more.

## Orthonormal.

“Orthogonal” is another word for “perpendicular”. Mathematicians use it for reasons I’m not precisely sure of. My belief is that it’s because “perpendicular” sounds like we’re talking about directions. And we want to extend the idea to things that aren’t necessarily directions. As majors, mathematicians learn orthogonality for vectors, things pointing in different directions. Then we extend it to other ideas. To functions, particularly, but we can also define it for spaces and for other stuff.

I was vague, last summer, about how we do that. We do it by creating a function called the “inner product”. That takes in two of whatever things we’re measuring and gives us a real number. If the inner product of two things is zero, then the two things are orthogonal.

The first example mathematics majors learn of this, before they even hear the words “inner product”, are dot products. These are for vectors, ordered sets of numbers. The dot product we find by matching up numbers in the corresponding slots for the two vectors, multiplying them together, and then adding up the products. For example. Give me the vector with values (1, 2, 3), and the other vector with values (-6, 5, -4). The inner product will be 1 times -6 (which is -6) plus 2 times 5 (which is 10) plus 3 times -4 (which is -12). So that’s -6 + 10 – 12 or -8.

So those vectors aren’t orthogonal. But how about the vectors (1, -1, 0) and (0, 0, 1)? Their dot product is 1 times 0 (which is 0) plus -1 times 0 (which is 0) plus 0 times 1 (which is 0). The vectors are perpendicular. And if you tried drawing this you’d see, yeah, they are. The first vector we’d draw as being inside a flat plane, and the second vector as pointing up, through that plane, like a thumbtack.

Well … the inner product can tell us something besides orthogonality. What happens if we take the inner product of a vector with itself? Say, (1, 2, 3) with itself? That’s going to be 1 times 1 (which is 1) plus 2 times 2 (4, according to rumor) plus 3 times 3 (which is 9). That’s 14, a tidy sum, although, so what?

The inner product of (-6, 5, -4) with itself? Oh, that’s some ugly numbers. Let’s skip it. How about the inner product of (1, -1, 0) with itself? That’ll be 1 times 1 (which is 1) plus -1 times -1 (which is positive 1) plus 0 times 0 (which is 0). That adds up to 2. And now, wait a minute. This might be something.

Start from somewhere. Move 1 unit to the east. (Don’t care what the unit is. Inches, kilometers, astronomical units, anything.) Then move -1 units to the north, or like normal people would say, 1 unit o the south. How far are you from the starting point? … Well, you’re the square root of 2 units away.

Now imagine starting from somewhere and moving 1 unit east, and then 2 units north, and then 3 units straight up, because you found a convenient elevator. How far are you from the starting point? This may take a moment of fiddling around with the Pythagorean theorem. But you’re the square root of 14 units away.

And what the heck, (0, 0, 1). The inner product of that with itself is 0 times 0 (which is zero) plus 0 times 0 (still zero) plus 1 times 1 (which is 1). That adds up to 1. And, yeah, if we go one unit straight up, we’re one unit away from where we started.

The inner product of a vector with itself gives us the square of the vector’s length. At least if we aren’t using some freak definition of inner products and lengths and vectors. And this is great! It means we can talk about the length — maybe better to say the size — of things that maybe don’t have obvious sizes.

Some stuff will have convenient sizes. For example, they’ll have size 1. The vector (0, 0, 1) was one such. So is (1, 0, 0). And you can think of another example easily. Yes, it’s $\left(\frac{1}{\sqrt{2}}, -\frac{1}{2}, \frac{1}{2}\right)$. (Go ahead, check!)

So by “orthonormal” we mean a collection of things that are orthogonal to each other, and that themselves are all of size 1. It’s a description of both what things are by themselves and how they relate to one another. A thing can’t be orthonormal by itself, for the same reason a line can’t be perpendicular to nothing in particular. But a pair of things might be orthogonal, and they might be the right length to be orthonormal too.

Why do this? Well, the same reasons we always do this. We can impose something like direction onto a problem. We might be able to break up a problem into simpler problems, one in each direction. We might at least be able to simplify the ways different directions are entangled. We might be able to write a problem’s solution as the sum of solutions to a standard set of representative simple problems. This one turns up all the time. And an orthogonal set of something is often a really good choice of a standard set of representative problems.

This sort of thing turns up a lot when solving differential equations. And those often turn up when we want to describe things that happen in the real world. So a good number of mathematicians develop a habit of looking for orthonormal sets.

## From my Sixth A-to-Z: Taylor Series

By the time of 2019 and my sixth A-to-Z series , I had some standard narrative tricks I could deploy. My insistence that everything is polynomials, for example. Anecdotes from my slight academic career. A prose style that emphasizes what we do with the idea of something rather than instructions. That last comes from the idea that if you wanted to know how to compute a Taylor series you’d just look it up on Mathworld or Wikipedia or whatnot. The thing a pop mathematics blog can do is give some reason that you’d want to know how to compute a Taylor series. I regret talking about functions that break Taylor series, though. I have to treat these essays as introducing the idea of a Taylor series to someone who doesn’t know anything about them. And it’s bad form to teach how stuff doesn’t work too close to teaching how it does work. Readers tend to blur what works and what doesn’t together. Still, $f(x) = \exp(-\frac{1}{x^2})$ is a really neat weird function and it’d be a shame to let it go completely unmentioned.

Today’s A To Z term was nominated by APMA, author of the Everybody Makes DATA blog. It was a topic that delighted me to realize I could explain. Then it started to torment me as I realized there is a lot to explain here, and I had to pick something. So here’s where things ended up.

# Taylor Series.

In the mid-2000s I was teaching at a department being closed down. In its last semester I had to teach Computational Quantum Mechanics. The person who’d normally taught it had transferred to another department. But a few last majors wanted the old department’s version of the course, and this pressed me into the role. Teaching a course you don’t really know is a rush. It’s a semester of learning, and trying to think deeply enough that you can convey something to students. This while all the regular demands of the semester eat your time and working energy. And this in the leap of faith that the syllabus you made up, before you truly knew the subject, will be nearly enough right. And that you have not committed to teaching something you do not understand.

So around mid-course I realized I needed to explain finding the wave function for a hydrogen atom with two electrons. The wave function is this probability distribution. You use it to find things like the probability a particle is in a certain area, or has a certain momentum. Things like that. A proton with one electron is as much as I’d ever done, as a physics major. We treat the proton as the center of the universe, immobile, and the electron hovers around that somewhere. Two electrons, though? A thing repelling your electron, and repelled by your electron, and neither of those having fixed positions? What the mathematics of that must look like terrified me. When I couldn’t procrastinate it farther I accepted my doom and read exactly what it was I should do.

It turned out I had known what I needed for nearly twenty years already. Got it in high school.

Of course I’m discussing Taylor Series. The equations were loaded down with symbols, yes. But at its core, the important stuff, was this old and trusted friend.

The premise behind a Taylor Series is even older than that. It’s universal. If you want to do something complicated, try doing the simplest thing that looks at all like it. And then make that a little bit more like you want. And then a bit more. Keep making these little improvements until you’ve got it as right as you truly need. Put that vaguely, the idea describes Taylor series just as well as it describes making a video game or painting a state portrait. We can make it more specific, though.

A series, in this context, means the sum of a sequence of things. This can be finitely many things. It can be infinitely many things. If the sum makes sense, we say the series converges. If the sum doesn’t, we say the series diverges. When we first learn about series, the sequences are all numbers. $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$, for example, which diverges. (It adds to a number bigger than any finite number.) Or $1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots$, which converges. (It adds to $\frac{1}{6}\pi^2$.)

In a Taylor Series, the terms are all polynomials. They’re simple polynomials. Let me call the independent variable ‘x’. Sometimes it’s ‘z’, for the reasons you would expect. (‘x’ usually implies we’re looking at real-valued functions. ‘z’ usually implies we’re looking at complex-valued functions. ‘t’ implies it’s a real-valued function with an independent variable that represents time.) Each of these terms is simple. Each term is the distance between x and a reference point, raised to a whole power, and multiplied by some coefficient. The reference point is the same for every term. What makes this potent is that we use, potentially, many terms. Infinitely many terms, if need be.

Call the reference point ‘a’. Or if you prefer, x0. z0 if you want to work with z’s. You see the pattern. This ‘a’ is the “point of expansion”. The coefficients of each term depend on the original function at the point of expansion. The coefficient for the term that has $(x - a)$ is the first derivative of f, evaluated at a. The coefficient for the term that has $(x - a)^2$ is the second derivative of f, evaluated at a (times a number that’s the same for the squared-term for every Taylor Series). The coefficient for the term that has $(x - a)^3$ is the third derivative of f, evaluated at a (times a different number that’s the same for the cubed-term for every Taylor Series).

You’ll never guess what the coefficient for the term with $(x - a)^{122,743}$ is. Nor will you ever care. The only reason you would wish to is to answer an exam question. The instructor will, in that case, have a function that’s either the sine or the cosine of x. The point of expansion will be 0, $\frac{\pi}{2}$, $\pi$, or $\frac{3\pi}{2}$.

Otherwise you will trust that this is one of the terms of $(x - a)^n$, ‘n’ representing some counting number too great to be interesting. All the interesting work will be done with the Taylor series either truncated to a couple terms, or continued on to infinitely many.

What a Taylor series offers is the chance to approximate a function we’re genuinely interested in with a polynomial. This is worth doing, usually, because polynomials are easier to work with. They have nice analytic properties. We can automate taking their derivatives and integrals. We can set a computer to calculate their value at some point, if we need that. We might have no idea how to start calculating the logarithm of 1.3. We certainly have an idea how to start calculating $0.3 - \frac{1}{2}(0.3^2) + \frac{1}{3}(0.3^3)$. (Yes, it’s 0.3. I’m using a Taylor series with a = 1 as the point of expansion.)

The first couple terms tell us interesting things. Especially if we’re looking at a function that represents something physical. The first two terms tell us where an equilibrium might be. The next term tells us whether an equilibrium is stable or not. If it is stable, it tells us how perturbations, points near the equilibrium, behave.

The first couple terms will describe a line, or a quadratic, or a cubic, some simple function like that. Usually adding more terms will make this Taylor series approximation a better fit to the original. There might be a larger region where the polynomial and the original function are close enough. Or the difference between the polynomial and the original function will be closer together on the same old region.

We would really like that region to eventually grow to the whole domain of the original function. We can’t count on that, though. Roughly, the interval of convergence will stretch from ‘a’ to wherever the first weird thing happens. Weird things are, like, discontinuities. Vertical asymptotes. Anything you don’t like dealing with in the original function, the Taylor series will refuse to deal with. Outside that interval, the Taylor series diverges and we just can’t use it for anything meaningful. Which is almost supernaturally weird of them. The Taylor series uses information about the original function, but it’s all derivatives at a single point. Somehow the derivatives of, say, the logarithm of x around x = 1 give a hint that the logarithm of 0 is undefinable. And so they won’t help us calculate the logarithm of 3.

Things can be weirder. There are functions that just break Taylor series altogether. Some are obvious. A function needs lots of derivatives at a point to have a good Taylor series approximation. So, many fractal curves won’t have a Taylor series approximation. These curves are all corners, points where they aren’t continuous or where derivatives don’t exist. Some are obviously designed to break Taylor series approximations. We can make a function that follows different rules if x is rational than if x is irrational. There’s no approximating that, and you’d blame the person who made such a function, not the Taylor series. It can be subtle. The function defined by the rule $f(x) = \exp{-\frac{1}{x^2}}$, with the note that if x is zero then f(x) is 0, seems to satisfy everything we’d look for. It’s a function that’s mostly near 1, that drops down to being near zero around where x = 0. But its Taylor series expansion around a = 0 is a horizontal line always at 0. The interval of convergence can be a single point, challenging our idea of what an interval is.

That’s all right. If we can trust that we’re avoiding weird parts, Taylor series give us an outstanding new tool. Grant that the Taylor series describes a function with the same rule as our original function. The Taylor series is often easier to work with, especially if we’re working on differential equations. We can automate, or at least find formulas for, taking the derivative of a polynomial. Or adding together derivatives of polynomials. Often we can attack a differential equation too hard to solve otherwise by supposing the answer is a polynomial. This is essentially what that quantum mechanics problem used, and why the tool was so familiar when I was in a strange land.

Roughly. What I was actually doing was treating the function I wanted as a power series. This is, like the Taylor series, the sum of a sequence of terms, all of which are $(x - a)^n$ times some coefficient. What makes it not a Taylor series is that the coefficients weren’t the derivatives of any function I knew to start. But the experience of Taylor series trained me to look at functions as things which could be approximated by polynomials.

This gives us the hint to look at other series that approximate interesting functions. We get a host of these, with names like Laurent series and Fourier series and Chebyshev series and such. Laurent series look like Taylor series but we allow powers to be negative integers as well as positive ones. Fourier series do away with polynomials. They instead use trigonometric functions, sines and cosines. Chebyshev series build on polynomials, but not on pure powers. They’ll use orthogonal polynomials. These behave like perpendicular directions do. That orthogonality makes many numerical techniques behave better.

The Taylor series is a great introduction to these tools. Its first several terms have good physical interpretations. Its calculation requires tools we learn early on in calculus. The habits of thought it teaches guides us even in unfamiliar territory.

And I feel very relieved to be done with this. I often have a few false starts to an essay, but those are mostly before I commit words to text editor. This one had about four branches that now sit in my scrap file. I’m glad to have a deadline forcing me to just publish already.

Thank you, though. This and the essays for the Fall 2019 A to Z should be at this link. Next week: the letters U and V. And all past A to Z essays ought to be at this link.

## My All 2020 Mathematics A to Z: Wronskian

Today’s is another topic suggested by Mr Wu, author of the Singapore Maths Tuition blog. The Wronskian is named for Józef Maria Hoëne-Wroński, a Polish mathematician, born in 1778. He served in General Tadeusz Kosciuszko’s army in the 1794 Kosciuszko Uprising. After being captured and forced to serve in the Russian army, he moved to France. He kicked around Western Europe and its mathematical and scientific circles. I’d like to say this was all creative and insightful, but, well. Wikipedia describes him trying to build a perpetual motion machine. Trying to square the circle (also impossible). Building a machine to predict the future. The St Andrews mathematical biography notes his writing a summary of “the general solution of the fifth degree [polynomial] equation”. This doesn’t exist.

Both sources, though, admit that for all that he got wrong, there were flashes of insight and brilliance in his work. The St Andrews biography particularly notes that Wronski’s tables of logarithms were well-designed. This is a hard thing to feel impressed by. But it’s hard to balance information so that it’s compact yet useful. He wrote about the Wronskian in 1812; it wouldn’t be named for him until 1882. This was 29 years after his death, but it does seem likely he’d have enjoyed having a familiar thing named for him. I suspect he wouldn’t enjoy my next paragraph, but would enjoy the fight with me about it.

# Wronskian.

The Wronskian is a thing put into Introduction to Ordinary Differential Equations courses because students must suffer in atonement for their sins. Those who fail to reform enough must go on to the Hessian, in Partial Differential Equations.

To be more precise, the Wronskian is the determinant of a matrix. The determinant you find by adding and subtracting products of the elements in a matrix together. It’s not hard, but it is tedious, and gets more tedious pretty fast as the matrix gets bigger. (In Big-O notation, it’s the order of the cube of the matrix size. This is rough, for things humans do, although not bad as algorithms go.) The matrix here is made up of a bunch of functions and their derivatives. The functions need to be ones of a single variable. The derivatives, you need first, second, third, and so on, up to one less than the number of functions you have.

If you have two functions, $f$ and $g$, you need their first derivatives, $f'$ and $g'$. If you have three functions, $f$, $g$, and $h$, you need first derivatives, $f'$, $g'$, and $h'$, as well as second derivatives, $f''$, $g''$, and $h''$. If you have $N$ functions and here I’ll call them $f_1, f_2, f_3, \cdots f_N$, you need $N-1$ derivatives, $f'_1, f''_1, f'''_1, \cdots f^{(N-1)}_1$ and so on through $f^{(N-1)}_N$. You see right away this is a fun and exciting thing to calculate. Also why in intro to differential equations you only work this out with two or three functions. Maybe four functions if the class has been really naughty.

Go through your $N$ functions and your $N-1$ derivatives and make a big square matrix. And then you go through calculating the derivative. This involves a lot of multiplying strings of these derivatives together. It’s a lot of work. But at least doing all this work gets you older.

So one will ask why do all this? Why fit it into every Intro to Ordinary Differential Equations textbook and why slip it in to classes that have enough stuff going on?

One answer is that if the Wronskian is not zero for some values of the independent variable, then the functions that went into it are linearly independent. Mathematicians learn to like sets of linearly independent functions. We can treat functions like directions in space. Linear independence assures us none of these functions are redundant, pointing a way we already can describe. (Real people see nothing wrong in having north, east, and northeast as directions. But mathematicians would like as few directions in our set as possible.) The Wronskian being zero for every value of the independent variable seems like it should tell us the functions are linearly dependent. It doesn’t, not without some more constraints on the functions.

This is fine, but who cares? And, unfortunately, in Intro it’s hard to reach a strong reason to care. To this major, the emphasis on linearly independent functions felt misplaced. It’s the sort of thing we care about in linear algebra. Or some course where we talk about vector spaces. Differential equations do lead us into vector spaces. It’s hard to find a corner of analysis that doesn’t.

Every ordinary differential equation has a secret picture. This is a vector field. One axis in the field is the independent variable of the function. The other axes are the value of the function. And maybe its derivatives, depending on how many derivatives are used in the ordinary differential equation. To solve one particular differential equation is to find one path in this field. People who just use differential equations will want to find one path.

Mathematicians tend to be fine with finding one path. But they want to find what kinds of paths there can be. Are there paths which the differential equation picks out, by making paths near it stay near? Or by making paths that run away from it? And here is the value of the Wronskian. The Wronskian tells us about the divergence of this vector field. This gives us insight to how these paths behave. It’s in the same way that knowing where high- and low-pressure systems are describes how the weather will change. The Wronskian, by way of a thing called Liouville’s Theorem that I haven’t the strength to describe today, ties in to the Hamiltonian. And the Hamiltonian we see in almost every mechanics problem of note.

You can see where the mathematics PhD, or the physicist, would find this interesting. But what about the student, who would look at the symbols evoked by those paragraphs above with reasonable horror?

And here’s the second answer for what the Wronskian is good for. It helps us solve ordinary differential equations. Like, particular ones. An ordinary differential equation will (normally) have several linearly independent solutions. If you know all but one of those solutions, it’s possible to calculate the Wronskian and, from that, the last of the independent solutions. Since a big chunk of mathematics — particularly for science or engineering — is solving differential equations you see why this is something valuable. Allow that it’s tedious. Tedious work we can automate, or give to research assistant to do.

One then asks what kind of differential equation would have all-but-one answer findable, and yield that last one only by long efforts of hard work. So let me show you an example ordinary differential equation:

$y'' + a(x) y' + b(x) y = g(x)$

Here $a(x)$, $b(x)$, and $g(x)$ are some functions that depend only on the independent variable, $x$. Don’t know what they are; don’t care. The differential equation is a lot easier of $a(x)$ and $b(x)$ are constants, but we don’t insist on that.

This equation has a close cousin, and one that’s easier to solve than the original. Is cousin is called a homogeneous equation:

$y'' + a(x) y' + b(x) y = 0$

The left-hand-side, the parts with the function $y$ that we want to find, is the same. It’s the right-hand-side that’s different, that’s a constant zero. This is what makes the new equation homogenous. This homogenous equation is easier and we can expect to find two functions, $y_1$ and $y_2$, that solve it. If $a(x)$ and $b(x)$ are constant this is even easy. Even if they’re not, if you can find one solution, the Wronskian lets you generate the second.

That’s nice for the homogenous equation. But if we care about the original, inhomogenous one? The Wronskian serves us there too. Imagine that the inhomogenous solution has any solution, which we’ll call $y_p$. (The ‘p’ stands for ‘particular’, as in “the solution for this particular $g(x)$”.) But $y_p + y_1$ also has to solve that inhomogenous differential equation. It seems startling but if you work it out, it’s so. (The key is the derivative of the sum of functions is the same as the sum of the derivative of functions.) $y_p + y_2$ also has to solve that inhomogenous differential equation. In fact, for any constants $C_1$ and $C_2$, it has to be that $y_p + C_1 y_1 + C_2 y_2$ is a solution.

I’ll skip the derivation; you have Wikipedia for that. The key is that knowing these homogenous solutions, and the Wronskian, and the original $g(x)$, will let you find the $y_p$ that you really want.

My reading is that this is more useful in proving things true about differential equations, rather than particularly solving them. It takes a lot of paper and I don’t blame anyone not wanting to do it. But it’s a wonder that it works, and so well.

Don’t make your instructor so mad you have to do the Wronskian for four functions.

This and all the others in My 2020 A-to-Z essays should be at this link. All the essays from every A-to-Z series should be at this link. Thank you for reading.

## My All 2020 Mathematics A to Z: Velocity

I’m happy to be back with long-form pieces. This week’s is another topic suggested by Mr Wu, of the Singapore Maths Tuition blog.

# Velocity.

This is easy. The velocity is the first derivative of the position. First derivative with respect to time, if you must know. That hardly needed an extra week to write.

Yes, there’s more. There is always more. Velocity is important by itself. It’s also important for guiding us into new ideas. There are many. One idea is that it’s often the first good example of vectors. Many things can be vectors, as mathematicians see them. But the ones we think of most often are “some magnitude, in some direction”.

The position of things, in space, we describe with vectors. But somehow velocity, the changes of positions, seems more significant. I suspect we often find static things below our interest. I remember as a physics major that my Intro to Mechanics instructor skipped Statics altogether. There are many important things, like bridges and roofs and roller coaster supports, that we find interesting because they don’t move. But the real Intro to Mechanics is stuff in motion. Balls rolling down inclined planes. Pendulums. Blocks on springs. Also planets. (And bridges and roofs and roller coaster supports wouldn’t work if they didn’t move a bit. It’s not much though.)

So velocity shows us vectors. Anything could, in principle, be moving in any direction, with any speed. We can imagine a thing in motion inside a room that’s in motion, its net velocity being the sum of two vectors.

And they show us derivatives. A compelling answer to “what does differentiation mean?” is “it’s the rate at which something changes”. Properly, we can take the derivative of any quantity with respect to any variable. But there are some that make sense to do, and position with respect to time is one. Anyone who’s tried to catch a ball understands the interest in knowing.

We take derivatives with respect to time so often we have shorthands for it, by putting a ‘ mark after, or a dot above, the variable. So if x is the position (and it often is), then $x'$ is the velocity. If we want to emphasize we think of vectors, $\vec{x}$ is the position and $\vec{x}'$ the velocity.

Velocity has another common shorthand. This is $v$, or if we want to emphasize its vector nature, $\vec{v}$. Why a name besides the good enough $\vec{x}'$? It helps us avoid misplacing a ‘ mark in our work, for one. And giving velocity a separate symbol encourages us to think of the velocity as independent from the position. It’s not — not exactly — independent. But knowing that a thing is in the lawn outside tells us nothing about how it’s moving. Velocity affects position, in a process so familiar we rarely consider how there’s parts we don’t understand about it. But velocity is also somehow also free of the position at an instant.

Velocity also guides us into a first understanding of how to take derivatives. Thinking of the change in position over smaller and smaller time intervals gets us to the “instantaneous” velocity by doing only things we can imagine doing with a ruler and a stopwatch.

Velocity has a velocity. $\vec{v}'$, also known as $\vec{a}$. Or, if we’re sure we won’t lose a ‘ mark, $\vec{x}''$. Once we are comfortable thinking of how position changes in time we can think of other changes. Velocity’s change in time we call acceleration. This is also a vector, more abstract than position or velocity. Multiply the acceleration by the mass of the thing accelerating and we have a vector called the “force”. That, we at least feel we understand, and can work with.

Acceleration has a velocity too, a rate of change in time. It’s called the “jerk” by people telling you the change in acceleration in time is called the “jerk”. (I don’t see the term used in the wild, but admit my experience is limited.) And so on. We could, in principle, keep taking derivatives of the position and keep finding new changes. But most physics problems we find interesting use just a couple of derivatives of the position. We can label them, if we need, $\vec{x}^{(n)}$, where n is some big enough number like 4.

We can bundle them in interesting ways, though. Come back to that mention of treating position and velocity of something as though they were independent coordinates. It’s a useful perspective. Imagine the rules about how particles interacting with one another and with their environment. These usually have explicit roles for position and velocity. (Granting this may reflect a selection bias. But these do cover enough interesting problems to fill a career.)

So we create a new vector. It’s made of the positition and the velocity. We’d write it out as $(x, v)^T$. The superscript-T there, “transposition”, lets us use the tools of matrix algebra. This vector describes a point in phase space. Phase space is the collection of all the physically possible positions and velocities for the system.

What’s the derivative, in time, of this point in phase space? Glad to say we can do this piece by piece. The derivative of a vector is the derivative of each component of a vector. So the derivative of $(x, v)^T$ is $(x', v')^T$, or, $(v, a)^T$. This acceleration itself depends on, normally, the positions and velocities. So we can describe this as $(v, f(x, v))^T$ for some function $f(x, v)$. You are surely impressed with this symbol-shuffling. You are less sure why this bother.

The bother is a trick of ordinary differential equations. All differential equations are about how a function-to-be-determined and its derivatives relate to one another. In ordinary differential equations, the function-to-be-determined depends on a single variable. Usually it’s called x or t. There may be many derivatives of f. This symbol-shuffling rewriting takes away those higher-order derivatives. We rewrite the equation as a vector equation of just one order. There’s some point in phase space, and we know what its velocity is. That we do because in this form many problems can be written as a matrix problem: $\vec{x}' = A\vec{x}$. Or approximate our problem as a matrix problem. This lets us bring in linear algebra tools, and that’s worthwhile.

It also lets us bring in numerical tools. Numerical mathematics has developed many methods to solve the ordinary differential equation $x' = f(x)$. Most of them extend to $\vec{x}' = f(\vec{x})$. The result is a classic mathematician’s trick. We can recast a problem as one we have better tools to solve.

It calls on a more abstract idea of what a “velocity” might be. We can explain what the thing that’s “moving” and what it’s moving through are, given time. But the instincts we develop from watching ordinary things move help us in these new territories. This is also a classic mathematician’s trick. It may seem like all mathematicians do is develop tricks to extend what they already do. I can’t say this is wrong.

Thank you all for reading and for putting up with my gap week. This and all of my 2020 A-to-Z essays should be at this link. All the essays from every A-to-Z series should be at this link.

## My All 2020 Mathematics A to Z: Delta

I have Dina Yagodich to thank for my inspiration this week. As will happen with these topics about something fundamental, this proved to be a hard topic to think about. I don’t know of any creative or professional projects Yagodich would like me to mention. I’ll pass them on if I learn of any.

# Delta.

In May 1962 Mercury astronaut Deke Slayton did not orbit the Earth. He had been grounded for (of course) a rare medical condition. Before his grounding he had selected his flight’s callsign and capsule name: Delta 7. His backup, Wally Schirra, who did not fly in Slayton’s place, named his capsule the Sigma 7. Schirra chose sigma for its mathematical and scientific meaning, representing the sum of (in principle) many parts. Slayton said he chose Delta only because he would have been the fourth American into space and Δ is the fourth letter of the Greek alphabet. I believe it, but do notice how D is so prominent a letter in Slayton’s name. And S, Σ, prominent in both Slayton and Schirra’s.

Δ is also a prominent mathematics and engineering symbol. It has several meanings, with several of the most useful ones escaping mathematics and becoming vaguely known things. They blur together, as ideas that are useful and related and not identical will do.

If “Δ” evokes anything mathematical to a person it is “change”. This probably owes to space in the popular imagination. Astronauts talking about the delta-vee needed to return to Earth is some of the most accessible technical talk of Apollo 13, to pick one movie. After that it’s easy to think of pumping the car’s breaks as shedding some delta-vee. It secondarily owes to school, high school algebra classes testing people on their ability to tell how steep a line is. This gets described as the change-in-y over the change-in-x, or the delta-y over delta-x.

Δ prepended to a variable like x or y or v we read as “the change in”. It fits the astronaut and the algebra uses well. The letter Δ by itself means as much as the words “the change in” do. It describes what we’re thinking about, but waits for a noun to complete. We say “the” rather than “a”, I’ve noticed. The change in velocity needed to reach Earth may be one thing. But “the” change in x and y coordinates to find the slope of a line? We can use infinitely many possible changes and get a good result. We must say “the” because we consider one at a time.

Used like this Δ acts like an operator. It means something like “a difference between two values of the variable ” and lets us fill in the blank. How to pick those two values? Sometimes there’s a compelling choice. We often want to study data sampled at some schedule. The Δ then is between one sample’s value and the next. Or between the last sample value and the current one. Which is correct? Ask someone who specializes in difference equations. These are the usually numeric approximations to differential equations. They turn up often in signal processing or in understanding the flows of fluids or the interactions of particles. We like those because computers can solve them.

Δ, as this operator, can even be applied to itself. You read ΔΔ x as “the change in the change in x”. The prose is stilted, but we can understand it. It’s how the change in x has itself changed. We can imagine being interested in this Δ2 x. We can see this as a numerical approximation to the second derivative of x, and this gets us back to differential equations. There are similar results for ΔΔΔ x even if we don’t wish to read it all out.

In principle, Δ x can be any number. In practice, at least for an independent variable, it’s a small number, usually real. Often we’re lured into thinking of it as positive, because a phrase like “x + Δ x” looks like we’re making a number a little bigger than x. When you’re a mathematician or a quality-control tester you remember to consider “what if Δ x is negative”. From testing that learn you wrote your computer code wrong. We’re less likely to assume this positive-ness for the dependent variable. By the time we do enough mathematics to have opinions we’ve seen too many decreasing functions to overlook that Δ y might be negative.

Notice that in that last paragraph I faithfully wrote Δ x and Δ y. Never Δ bare, unless I forgot and cannot find it in copy-editing. I’ve said that Δ means “the change in”; to write it without some variable is like writing √ by itself. We can understand wishing to talk about “the square root of”, as a concept. Still it means something else than √ x does.

We do write Δ by itself. Even professionals do. Written like this we don’t mean “the change in [ something ]”. We instead mean “a number”. In this role the symbol means the same thing as x or y or t might, a way to refer to a number whose value we might not know. We might not care about. The implication is that it’s small, at least if it’s something to add to the independent variable. We use it when we ponder how things would be different if there were a small change in something.

Small but not tiny. Here we step into mathematics as a language, which can be as quirky and ambiguous as English. Because sometimes we use the lower-case δ. And this also means “a small number”. It connotes a smaller number than Δ. Is 0.01 a suitable value for Δ? Or for δ? Maybe. My inclination would be to think of that as Δ, reserving δ for “a small number of value we don’t care to specify”. This may be my quirk. Others might see it different.

We will use this lowercase δ as an operator too, thinking of things like “x + δ x”. As you’d guess, δ x connotes a small change in x. Smaller than would earn the title Δ x. There is no declaring how much smaller. It’s contextual. As with δ bare, my tendency is to think that Δ x might be a specific number but that δ x is “a perturbation”, the general idea of a small number. We can understand many interesting problems as a small change from something we already understand. That small change often earns such a δ operator.

There are smaller changes than δ x. There are infinitesimal differences. This is our attempt to make sense of “a number as close to zero as you can get without being zero”. We forego the Greek letters for this and revert to Roman letters: dx and dy and dt and the other marks of differential calculus. These are difficult numbers to discuss. It took more than a century of mathematicians’ work to find a way our experience with Δ x could inform us about dx. (We do not use ‘d’ alone to mean an even smaller change than δ. Sometimes we will in analysis write d with a space beside it, waiting for a variable to have its differential taken. I feel unsettled when I see it.)

Much of the completion of work we can credit to Augustin Cauchy, who’s credited with about 800 publications. It’s an intimidating record, even before considering its importance. Cauchy is, per Florian Cajori’s History Mathematical Notations, one of the persons we can credit with the use of Δ as symbol for “the change in”. (Section 610.) He’s not the only one. Leonhardt Euler and Johann Bernoulli (section 640) used Δ to represent a finite difference, the difference between two values.

I’m not aware of an explicit statement why Δ got the pick, as opposed to other letters. It’s hard to imagine a reason besides “difference starts with d”. That an etymology seems obvious does not make it so. It does seem to have a more compelling explanation than the use of “m” for the slope of a line, or $\frac{\Delta y}{\Delta x}$, though.

Slayton’s Mercury flight, performed by Scott Carpenter, did not involve any appreciable changes in orbit, a Δ v. No crewed spacecraft would until Gemini III. The Mercury flight did involve tests in orienting the spacecraft, in Δ θ and Δ φ on the angles of the spacecraft’s direction. These might have been in Slayton’s mind. He eventually flew into space on the Apollo-Soyuz Test Project, when an accident during landing exposed the crew to toxic gases. The investigation discovered a lesion on Slayton’s lung. A tiny thing, ultimately benign, which discovered earlier could have kicked him off the mission and altered his life so.

Thank you all for reading. I’m gathering all my 2020 A-to-Z essays at this link, and have all my A-to-Z essays of any kind at this link. Here is hoping there’s a good week ahead.

## My 2019 Mathematics A To Z: Taylor Series

Today’s A To Z term was nominated by APMA, author of the Everybody Makes DATA blog. It was a topic that delighted me to realize I could explain. Then it started to torment me as I realized there is a lot to explain here, and I had to pick something. So here’s where things ended up.

# Taylor Series.

In the mid-2000s I was teaching at a department being closed down. In its last semester I had to teach Computational Quantum Mechanics. The person who’d normally taught it had transferred to another department. But a few last majors wanted the old department’s version of the course, and this pressed me into the role. Teaching a course you don’t really know is a rush. It’s a semester of learning, and trying to think deeply enough that you can convey something to students. This while all the regular demands of the semester eat your time and working energy. And this in the leap of faith that the syllabus you made up, before you truly knew the subject, will be nearly enough right. And that you have not committed to teaching something you do not understand.

So around mid-course I realized I needed to explain finding the wave function for a hydrogen atom with two electrons. The wave function is this probability distribution. You use it to find things like the probability a particle is in a certain area, or has a certain momentum. Things like that. A proton with one electron is as much as I’d ever done, as a physics major. We treat the proton as the center of the universe, immobile, and the electron hovers around that somewhere. Two electrons, though? A thing repelling your electron, and repelled by your electron, and neither of those having fixed positions? What the mathematics of that must look like terrified me. When I couldn’t procrastinate it farther I accepted my doom and read exactly what it was I should do.

It turned out I had known what I needed for nearly twenty years already. Got it in high school.

Of course I’m discussing Taylor Series. The equations were loaded down with symbols, yes. But at its core, the important stuff, was this old and trusted friend.

The premise behind a Taylor Series is even older than that. It’s universal. If you want to do something complicated, try doing the simplest thing that looks at all like it. And then make that a little bit more like you want. And then a bit more. Keep making these little improvements until you’ve got it as right as you truly need. Put that vaguely, the idea describes Taylor series just as well as it describes making a video game or painting a state portrait. We can make it more specific, though.

A series, in this context, means the sum of a sequence of things. This can be finitely many things. It can be infinitely many things. If the sum makes sense, we say the series converges. If the sum doesn’t, we say the series diverges. When we first learn about series, the sequences are all numbers. $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$, for example, which diverges. (It adds to a number bigger than any finite number.) Or $1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots$, which converges. (It adds to $\frac{1}{6}\pi^2$.)

In a Taylor Series, the terms are all polynomials. They’re simple polynomials. Let me call the independent variable ‘x’. Sometimes it’s ‘z’, for the reasons you would expect. (‘x’ usually implies we’re looking at real-valued functions. ‘z’ usually implies we’re looking at complex-valued functions. ‘t’ implies it’s a real-valued function with an independent variable that represents time.) Each of these terms is simple. Each term is the distance between x and a reference point, raised to a whole power, and multiplied by some coefficient. The reference point is the same for every term. What makes this potent is that we use, potentially, many terms. Infinitely many terms, if need be.

Call the reference point ‘a’. Or if you prefer, x0. z0 if you want to work with z’s. You see the pattern. This ‘a’ is the “point of expansion”. The coefficients of each term depend on the original function at the point of expansion. The coefficient for the term that has $(x - a)$ is the first derivative of f, evaluated at a. The coefficient for the term that has $(x - a)^2$ is the second derivative of f, evaluated at a (times a number that’s the same for the squared-term for every Taylor Series). The coefficient for the term that has $(x - a)^3$ is the third derivative of f, evaluated at a (times a different number that’s the same for the cubed-term for every Taylor Series).

You’ll never guess what the coefficient for the term with $(x - a)^{122,743}$ is. Nor will you ever care. The only reason you would wish to is to answer an exam question. The instructor will, in that case, have a function that’s either the sine or the cosine of x. The point of expansion will be 0, $\frac{\pi}{2}$, $\pi$, or $\frac{3\pi}{2}$.

Otherwise you will trust that this is one of the terms of $(x - a)^n$, ‘n’ representing some counting number too great to be interesting. All the interesting work will be done with the Taylor series either truncated to a couple terms, or continued on to infinitely many.

What a Taylor series offers is the chance to approximate a function we’re genuinely interested in with a polynomial. This is worth doing, usually, because polynomials are easier to work with. They have nice analytic properties. We can automate taking their derivatives and integrals. We can set a computer to calculate their value at some point, if we need that. We might have no idea how to start calculating the logarithm of 1.3. We certainly have an idea how to start calculating $0.3 - \frac{1}{2}(0.3^2) + \frac{1}{3}(0.3^3)$. (Yes, it’s 0.3. I’m using a Taylor series with a = 1 as the point of expansion.)

The first couple terms tell us interesting things. Especially if we’re looking at a function that represents something physical. The first two terms tell us where an equilibrium might be. The next term tells us whether an equilibrium is stable or not. If it is stable, it tells us how perturbations, points near the equilibrium, behave.

The first couple terms will describe a line, or a quadratic, or a cubic, some simple function like that. Usually adding more terms will make this Taylor series approximation a better fit to the original. There might be a larger region where the polynomial and the original function are close enough. Or the difference between the polynomial and the original function will be closer together on the same old region.

We would really like that region to eventually grow to the whole domain of the original function. We can’t count on that, though. Roughly, the interval of convergence will stretch from ‘a’ to wherever the first weird thing happens. Weird things are, like, discontinuities. Vertical asymptotes. Anything you don’t like dealing with in the original function, the Taylor series will refuse to deal with. Outside that interval, the Taylor series diverges and we just can’t use it for anything meaningful. Which is almost supernaturally weird of them. The Taylor series uses information about the original function, but it’s all derivatives at a single point. Somehow the derivatives of, say, the logarithm of x around x = 1 give a hint that the logarithm of 0 is undefinable. And so they won’t help us calculate the logarithm of 3.

Things can be weirder. There are functions that just break Taylor series altogether. Some are obvious. A function needs lots of derivatives at a point to have a good Taylor series approximation. So, many fractal curves won’t have a Taylor series approximation. These curves are all corners, points where they aren’t continuous or where derivatives don’t exist. Some are obviously designed to break Taylor series approximations. We can make a function that follows different rules if x is rational than if x is irrational. There’s no approximating that, and you’d blame the person who made such a function, not the Taylor series. It can be subtle. The function defined by the rule $f(x) = \exp{-\frac{1}{x^2}}$, with the note that if x is zero then f(x) is 0, seems to satisfy everything we’d look for. It’s a function that’s mostly near 1, that drops down to being near zero around where x = 0. But its Taylor series expansion around a = 0 is a horizontal line always at 0. The interval of convergence can be a single point, challenging our idea of what an interval is.

That’s all right. If we can trust that we’re avoiding weird parts, Taylor series give us an outstanding new tool. Grant that the Taylor series describes a function with the same rule as our original function. The Taylor series is often easier to work with, especially if we’re working on differential equations. We can automate, or at least find formulas for, taking the derivative of a polynomial. Or adding together derivatives of polynomials. Often we can attack a differential equation too hard to solve otherwise by supposing the answer is a polynomial. This is essentially what that quantum mechanics problem used, and why the tool was so familiar when I was in a strange land.

Roughly. What I was actually doing was treating the function I wanted as a power series. This is, like the Taylor series, the sum of a sequence of terms, all of which are $(x - a)^n$ times some coefficient. What makes it not a Taylor series is that the coefficients weren’t the derivatives of any function I knew to start. But the experience of Taylor series trained me to look at functions as things which could be approximated by polynomials.

This gives us the hint to look at other series that approximate interesting functions. We get a host of these, with names like Laurent series and Fourier series and Chebyshev series and such. Laurent series look like Taylor series but we allow powers to be negative integers as well as positive ones. Fourier series do away with polynomials. They instead use trigonometric functions, sines and cosines. Chebyshev series build on polynomials, but not on pure powers. They’ll use orthogonal polynomials. These behave like perpendicular directions do. That orthogonality makes many numerical techniques behave better.

The Taylor series is a great introduction to these tools. Its first several terms have good physical interpretations. Its calculation requires tools we learn early on in calculus. The habits of thought it teaches guides us even in unfamiliar territory.

And I feel very relieved to be done with this. I often have a few false starts to an essay, but those are mostly before I commit words to text editor. This one had about four branches that now sit in my scrap file. I’m glad to have a deadline forcing me to just publish already.

Thank you, though. This and the essays for the Fall 2019 A to Z should be at this link. Next week: the letters U and V. And all past A to Z essays ought to be at this link.

## My 2019 Mathematics A To Z: Operator

Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.

# Operator.

An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same space but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Everything in a particular function space is a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.

So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.

My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.

I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.

Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. But both the differential and the integral operators are linear, which lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function $af + bg$ is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of $af + bg$ is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.

I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.

One of the conceptual leaps in high school algebra is that we start analyzing the things we do with numbers. Like, we don’t just take the number three, square it, multiply that by two and add to that the number three times four and add to that the number 1. We think about what if we take any number, call it x, and think of $2x^2 + 4x + 1$. And what if we make equations based on doing this $2x^2 + 4x + 1$; what values of x make those equations true? Or tell us something interesting?

Operators represent a similar leap. We can think of functions as things we manipulate, and think of those manipulations as a particular thing to do. For example, let me come up with a differential expression. For some function u(x) work out the value of this:

$2\frac{d^2 u(x)}{dx^2} + 4 \frac{d u(x)}{dx} + u(x)$

Let me join in the convention of using ‘D’ for the differential operator. Then we can rewrite this expression like so:

$2D^2 u + 4D u + u$

Suddenly the differential equation looks a lot like a polynomial. Of course it does. Remember that everything in mathematics is polynomials. We get new tools to solve differential equations by rewriting them as operators. That’s nice. It also scratches that itch that I think everyone in Intro to Calculus gets, of wanting to somehow see $\frac{d^2}{dx^2}$ as if it were a square of $\frac{d}{dx}$. It’s not, and $D^2$ is not the square of $D$. It’s composing $D$ with itself. But it looks close enough to squaring to feel comfortable.

Nobody needs to do $2D^2 u + 4D u + u$ except to learn some stuff about operators. But you might imagine a world where we did this process all the time. If we did, then we’d develop shorthand for it. Maybe a new operator, call it T, and define it that $T = 2D^2 + 4D + 1$. You see the grammar of treating functions as if they were real numbers becoming familiar. You maybe even noticed the ‘1’ sitting there, serving as the “identity operator”. You know how you’d write out $Tv(x) = 3$ if you needed to write it in full.

But there are operators that we use all the time. These do get special names, and often shorthand. For example, there’s the gradient operator. This applies to any function with several independent variables. The gradient has a great physical interpretation if the variables represent coordinates of space. If they do, the gradient of a function at a point gives us a vector that describes the direction in which the function increases fastest. And the size of that gradient — a functional on this operator — describes how fast that increase is.

The gradient itself defines more operators. These have names you get very familiar with in Vector Calculus, with names like divergence and curl. These have compelling physical interpretations if we think of the function we operate on as describing a moving fluid. A positive divergence means fluid is coming into the system; a negative divergence, that it is leaving. The curl, in fluids, describe how nearby streams of fluid move at different rate.

Physical interpretations are common in operators. This probably reflects how much influence physics has on mathematics and vice-versa. Anyone studying quantum mechanics gets familiar with a host of operators. These have comfortable names like “position operator” or “momentum operator” or “spin operator”. These are operators that apply to the wave function for a problem. They transform the wave function into a probability distribution. That distribution describes what positions or momentums or spins are likely, how likely they are. Or how unlikely they are.

They’re not all physical, though. Or not purely physical. Many operators are useful because they are powerful mathematical tools. There is a variation of the Fourier series called the Fourier transform. We can interpret this as an operator. Suppose the original function started out with time or space as its independent variable. This often happens. The Fourier transform operator gives us a new function, one with frequencies as independent variable. This can make the function easier to work with. The Fourier transform is an integral operator, by the way, so don’t go thinking everything is a complicated set of derivatives.

Another integral-based operator that’s important is the Laplace transform. This is a great operator because it turns differential equations into algebraic equations. Often, into polynomials. You saw that one coming.

This is all a lot of good press for operators. Well, they’re powerful tools. They help us to see that we can manipulate functions in the ways that functions let us manipulate numbers. It should sound good to realize there is much new that you can do, and you already know most of what’s needed to do it.

This and all the other Fall 2019 A To Z posts should be gathered here. And once I have the time to fiddle with tags I’ll have all past A to Z essays gathered at this link. Thank you for reading. I should be back on Thursday with the letter P.

## My 2019 Mathematics A To Z: Green’s function

Today’s A To Z term is Green’s function. Vayuputrii nominated the topic, and once again I went for one close to my own interests.

These are named for George Green, an English mathematician of the early 19th century. He’s one of those people who gave us our idea of mathematical physics. He’s credited with coining the term “potential”, as in potential energy, and in making people realize how studying this simplified problems. Mostly problems in electricity and magnetism, which were so very interesting back then. On the side also came work in multivariable calculus. His work most famous to mathematics and physics majors connects integrals over the surface of a shape with (different) integrals over the entire interior volume. In more specific problems, he did work on the behavior of water in canals.

# Green’s function.

There’s a patch of (high school) algebra where you solve systems of equations in a couple variables. Like, you have to do one system where you’re solving, say,

$6x + 1y - 2z = 1 \\ 7x + 3y + z = 4 \\ -2x - y + 2z = -2$

And then maybe later on you get a different problem, one that looks like:

$6x + 1y - 2z = 14 \\ 7x + 3y + z = -4 \\ -2x - y + 2z = -6$

If you solve both of them you notice you’re doing a lot of the same work. All the same hard work. It’s only the part on the right-hand side of the equals signs that are different. Even then, the series of steps you follow on the right-hand-side are the same. They have different numbers is all. What makes the problem distinct is the stuff on the left-hand-side. It’s the set of what coefficients times what variables add together. If you get enough about matrices and vectors you get in the habit of writing this set of equations as one matrix equation, as

$A\vec{x} = \vec{b}$

Here $\vec{x}$ holds all the unknown variables, your x and y and z and anything else that turns up. Your $\vec{b}$ holds the right-hand side. Do enough of these problems and you notice something. You can describe how to find the solution for these equations before you even know what the right-hand-side is. You can do all the hard work of solving this set of equations for a generic set of right-hand-side constants. Fill them in when you need a particular answer.

I mentioned, while writing about Fourier series, how it turns out most of what you do to numbers you can also do to functions. This really proves itself in differential equations. Both partial and ordinary differential equations. A differential equation works with some not-yet-known function u(x). For what I’m discussing here it doesn’t matter whether ‘x’ is a single variable or a whole set of independent variables, like, x and y and z. I’ll use ‘x’ as shorthand for all that. The differential equation takes u(x) and maybe multiplies it by something, and adds to that some derivatives of u(x) multiplied by something. Those somethings can be constants. They can be other, known, functions with independent variable x. They can be functions that depend on u(x) also. But if they are, then this is a nonlinear differential equation and there’s no solving that.

So suppose we have a linear differential equation. Partial or ordinary, whatever you like. There’s terms that have u(x) or its derivatives in them. Move them all to the left-hand-side. Move everything else to the right-hand-side. This right-hand-side might be constant. It might depend on x. Doesn’t matter. This right-hand-side is some function which I’ll call f(x). This f(x) might be constant; that’s fine. That’s still a legitimate function.

Put this way, every differential equation looks like:

$(\mbox{stuff with } u(x) \mbox{ and its derivatives}) = f(x)$

That stuff with u(x) and its derivatives we can call an operator. An operator’s a function which has a domain of functions and a range of functions. So we can give give that a name. ‘L’ is a good name here, because if it’s not the operator for a linear differential equation — a linear operator — then we’re done anyway. So whatever our differential equation was we can write it:

$Lu(x) = f(x)$

Writing it $Lu(x)$ makes it look like we’re multiplying L by u(x). We’re not. We’re really not. This is more like if ‘L’ is the predicate of a sentence and ‘u(x)’ is the object. Read it like, to make up an example, ‘L’ means ‘three times the second derivative plus two x times’ and ‘u(x)’ as ‘u(x)’.

Still, looking at $Lu(x) = f(x)$ and then back up at $A\vec{x} = \vec{b}$ tells you what I’m thinking. We can find some set of instructions to, for any $\vec{b}$, find the $\vec{x}$ that makes $A\vec{x} = \vec{b}$ true. So why can’t we find some set of instructions to, for any $f(x)$, find the $u(x)$ that makes $Lu(x) = f(x)$ true?

This is where a Green’s function comes in. Or, like everybody says, “the” Green’s function. “The” here we use like we might talk about “the” roots of a polynomial. Every polynomial has different roots. So, too, does every differential equation have a different Green’s function. What the Green’s function is depends on the equation. It can also depend on what domain the differential equation applies to. It can also depend on some extra information called initial values or boundary values.

The Green’s function for a differential equation has twice as many independent variables as the differential equation has. This seems like we’re making a mess of things. It’s all right. These new variables are the falsework, the scaffolding. Once they’ve helped us get our work done they disappear. This kind of thing we call a “dummy variable”. If x is the actual independent variable, then pick something else — s is a good choice — for the dummy variable. It’s from the same domain as the original x, though. So the Green’s function is some $G(f, s)$. All right, but how do you find it?

To get this, you have to solve a particular special case of the differential equation. You have to solve:

$L G(f, s) = \delta(x - s)$

This may look like we’re not getting anywhere. It may even look like we’re getting in more trouble. What is this $\delta(x - s)$, for example? Well, this is a particular and famous thing called the Dirac delta function. It’s called a function as a courtesy to our mathematical physics friends, who don’t care about whether it truly is a function. Dirac is Paul Dirac, from over in physics. The one whose biography is called The Strangest Man. His delta function is a strange function. Let me say that its independent variable is t. Then $\delta(t)$ is zero, unless t is itself zero. If t is zero then $\delta(t)$ is … something. What is that something? … Oh … something big. It’s … just … don’t look directly at it. What’s important is the integral of this function:

$\int_D\delta(t) dt = 0, \mbox{ if 0 is not in D} \\ \int_D\delta(t) dt = 1, \mbox{ if 0 is in D}$

I write it this way because there’s delta functions for two-dimensional spaces, three-dimensional spaces, everything. If you integrate over a region that includes the origin, the integral of the delta function is 1. If you integrate over a region that doesn’t, the integral of the delta function is 0.

The delta function has a neat property sometimes called filtering. This is what happens if you integrate some function times the Dirac delta function. Then …

$\int_D f(t)\delta(t) dt = 0, \mbox{ if 0 is not in D} \\ \int_D f(t)\delta(t) dt = f(0), \mbox{ if 0 is in D}$

This may look dumb. That’s fine. This scheme is so good at getting rid of integrals where you don’t want them. Or at getting integrals in where it’d be convenient to have.

So, I have a mental model of what the Dirac delta function does. It might help you. Think of beating a drum. It can sound like many different things. It depends on how hard you hit it, how fast you hit it, what kind of stick you use, where exactly you hit it. I think of each differential equation as a different drumhead. The Green’s function is then the sound of a specific, uniform, reference hit at a reference position. This produces a sound. I can use that sound to extrapolate how every different sort of drumming would sound on this particular drumhead.

So solving this one differential equation, to find the Green’s function for a particular case, may be hard. Maybe not. Often it’s easier than some particular f(x) because the Dirac delta function is so weird that it becomes kinda easy-ish. But you do have to find one solution to this differential equation, somehow.

Once you do, though? Once you have this $G(x, s)$? That is glorious. Because then, whatever your f is? The solution to $Lu(x) = f(x)$ is:

$u(x) = \int G(x, s) f(s) ds$

Here the integral is over whatever the domain of the differential equation is, and whatever the domain of f is. This last integral is where the dummy variable finally evaporates. All that remains is x, as we want.

A little bit of … arithmetic isn’t the right word. But symbol manipulation will convince you this is right, if you need convincing. (The trick is remembering that ‘x’ and ‘s’ are different variables. When you differentiate with respect to ‘x’, ‘s’ acts like a constant. When you integrate with respect to ‘s’, ‘x’ acts like a constant.)

What can make a Green’s function worth finding is that we do a lot of the same kinds of differential equations. We do a lot of diffusion problems. A lot of wave transmission problems. A lot of wave-transmission-with-losses problems. So there are many problems that can all use the same tools to solve.

Consider remote detection problems. This can include things like finding things underground. It also includes, like, medical sensors. We would like to know “what kind of thing produces a signal like this?” We can detect the signal easily enough. We can model how whatever it is between the thing and our sensors changes what we could detect. (This kind of thing we call an “inverse problem”, finding the thing that could produce what we know.) Green’s functions are one of the ways we can get at the source of what we can see.

Now, Green’s functions are a powerful and useful idea. They sprawl over a lot of mathematical applications. As they do, they pick up regional dialects. Things like deciding that $LG(x, s) = - \delta(x - s)$, for example. None of these are significant differences. But before you go poking into someone else’s field and solving their problems, take a moment. Double-check that their symbols do mean precisely what you think they mean. It’ll save you some petty quarrels.

I should have the ‘H’ essay in the Fall 2019 series on Thursday. That and all other Fall 2019 A To Z posts should be at this link.

Also, I really don’t like how those systems of equations turned out up at the top of this essay. But I couldn’t work out how to do arrays of equations all lined up along the equals sign, or other mildly advanced LaTeX stuff like doing a function-definition-by-cases. If someone knows of the Real Official Proper List of what you can and can’t do with the LaTeX that comes from a standard free WordPress.com blog I’d appreciate a heads-up. Thank you.

## My 2019 Mathematics A To Z: Fourier series

Today’s A To Z term came to me from two nominators. One was @aajohannas, again offering a great topic. Another was Mr Wu, author of the Singapore Maths Tuition blog. I hope neither’s disappointed here.

Fourier series are named for Jean-Baptiste Joseph Fourier, and are maybe the greatest example of the theory that’s brilliantly wrong. Anyone can be wrong about something. There’s genius in being wrong in a way that gives us good new insights into things. Fourier series were developed to understand how the fluid we call “heat” flows through and between objects. Heat is not a fluid. So what? Pretending it’s a fluid gives us good, accurate results. More, you don’t need to use Fourier series to work with a fluid. Or a thing you’re pretending is a fluid. It works for lots of stuff. The Fourier series method challenged assumptions mathematicians had made about how functions worked, how continuity worked, how differential equations worked. These problems could be sorted out. It took a lot of work. It challenged and expended our ideas of functions.

Fourier also managed to hold political offices in France during the Revolution, the Consulate, the Empire, the Bourbon Restoration, the Hundred Days, and the Second Bourbon Restoration without getting killed for his efforts. If nothing else this shows the depth of his talents.

# Fourier series.

So, how do you solve differential equations? As long as they’re linear? There’s usually something we can do. This is one approach. It works well. It has a bit of a weird setup.

The weirdness of the setup: you want to think of functions as points in space. The allegory is rather close. Think of the common association between a point in space and the coordinates that describe that point. Pretend those are the same thing. Then you can do stuff like add points together. That is, take the coordinates of both points. Add the corresponding coordinates together. Match that sum-of-coordinates to a point. This gives us the “sum” of two points. You can subtract points from one another, again by going through their coordinates. Multiply a point by a constant and get a new point. Find the angle between two points. (This is the angle formed by the line segments connecting the origin and both points.)

Functions can work like this. You can add functions together and get a new function. Subtract one function from another. Multiply a function by a constant. It’s even possible to describe an “angle” between two functions. Mathematicians usually call that the dot product or the inner product. But we will sometimes call two functions “orthogonal”. That means the ordinary everyday meaning of “orthogonal”, if anyone said “orthogonal” in ordinary everyday life.

We can take equations of a bunch of variables and solve them. Call the values of that solution the coordinates of a point. Then we talk about finding the point where something interesting happens. Or the points where something interesting happens. We can do the same with differential equations. This is finding a point in the space of functions that makes the equation true. Maybe a set of points. So we can find a function or a family of functions solving the differential equation.

You have reasons for skepticism, even if you’ll grant me treating functions as being like points in space. You might remember solving systems of equations. You need as many equations as there are dimensions of space; a two-dimensional space needs two equations. A three-dimensional space needs three equations. You might have worked four equations in four variables. You were threatened with five equations in five variables if you didn’t all settle down. You’re not sure how many dimensions of space “all the possible functions” are. It’s got to be more than the one differential equation we started with.

This is fair. The approach I’m talking about uses the original differential equation, yes. But it breaks it up into a bunch of linear equations. Enough linear equations to match the space of functions. We turn a differential equation into a set of linear equations, a matrix problem, like we know how to solve. So that settles that.

So suppose $f(x)$ solves the differential equation. Here I’m going to pretend that the function has one independent variable. Many functions have more than this. Doesn’t matter. Everything I say here extends into two or three or more independent variables. It takes longer and uses more symbols and we don’t need that. The thing about $f(x)$ is that we don’t know what it is, but would quite like to.

What we’re going to do is choose a reference set of functions that we do know. Let me call them $g_0(x), g_1(x), g_2(x), g_3(x), \cdots$ going on to however many we need. It can be infinitely many. It certainly is at least up to some $g_N(x)$ for some big enough whole number N. These are a set of “basis functions”. For any function we want to represent we can find a bunch of constants, called coefficients. Let me use $a_0, a_1, a_2, a_3, \cdots$ to represent them. Any function we want is the sum of the coefficient times the matching basis function. That is, there’s some coefficients so that

$f(x) = a_0\cdot g_0(x) + a_1\cdot g_1(x) + a_2\cdot g_2(x) + a_3\cdot g_3(x) + \cdots$

is true. That summation goes on until we run out of basis functions. Or it runs on forever. This is a great way to solve linear differential equations. This is because we know the basis functions. We know everything we care to know about them. We know their derivatives. We know everything on the right-hand side except the coefficients. The coefficients matching any particular function are constants. So the derivatives of $f(x)$, written as the sum of coefficients times basis functions, are easy to work with. If we need second or third or more derivatives? That’s no harder to work with.

You may know something about matrix equations. That is that solving them takes freaking forever. The bigger the equation, the more forever. If you have to solve eight equations in eight unknowns? If you start now, you might finish in your lifetime. For this function space? We need dozens, hundreds, maybe thousands of equations and as many unknowns. Maybe infinitely many. So we seem to have a solution that’s great apart from how we can’t use it.

Except. What if the equations we have to solve are all easy? If we have to solve a bunch that looks like, oh, $2a_0 = 4$ and $3a_1 = -9$ and $2a_2 = 10$ … well, that’ll take some time, yes. But not forever. Great idea. Is there any way to guarantee that?

It’s in the basis functions. If we pick functions that are orthogonal, or are almost orthogonal, to each other? Then we can turn the differential equation into an easy matrix problem. Not as easy as in the last paragraph. But still, not hard.

So what’s a good set of basis functions?

And here, about 800 words later than everyone was expecting, let me introduce the sine and cosine functions. Sines and cosines make great basis functions. They don’t grow without bounds. They don’t dwindle to nothing. They’re easy to differentiate. They’re easy to integrate, which is really special. Most functions are hard to integrate. We even know what they look like. They’re waves. Some have long wavelengths, some short wavelengths. But waves. And … well, it’s easy to make sets of them orthogonal.

We have to set some rules. The first is that each of these sine and cosine basis functions have a period. That is, after some time (or distance), they repeat. They might repeat before that. Most of them do, in fact. But we’re guaranteed a repeat after no longer than some period. Call that period ‘L’.

Each of these sine and cosine basis functions has to have a whole number of complete oscillations within the period L. So we can say something about the sine and cosine functions. They have to look like these:

$s_j(x) = \sin\left(\frac{2\pi j}{L} x\right)$

$c_k(x) = \cos\left(\frac{2\pi k}{L} x\right)$

Here ‘j’ and ‘k’ are some whole numbers. I have two sets of basis functions at work here. Don’t let that throw you. We could have labelled them all as $g_k(x)$, with some clever scheme that told us for a given k whether it represents a sine or a cosine. It’s less hard work if we have s’s and c’s. And if we have coefficients of both a’s and b’s. That is, we suppose the function $f(x)$ is:

$f(x) = \frac{1}{2}a_0 + b_1 s_1(x) + a_1 c_1(x) + b_2 s_2(x) + a_2 s_2(x) + b_3 s_3(x) + a_3 c_3(x) + \cdots$

This, at last, is the Fourier series. Each function has its own series. A “series” is a summation. It can be of finitely many terms. It can be of infinitely many. Often infinitely many terms give more interesting stuff. Like this, for example. Oh, and there’s a bare $\frac{1}{2}a_0$ there, not multiplied by anything more complicated. It makes life easier. It lets us see that the Fourier series for, like, 3 + f(x) is the same as the Fourier series for f(x), except for the leading term. The ½ before that makes easier some work that’s outside the scope of this essay. Accept it as one of the merry, wondrous appearances of ‘2’ in mathematics expressions.

It’s great for solving differential equations. It’s also great for encryption. The sines and the cosines are standard functions, after all. We can send all the information we need to reconstruct a function by sending the coefficients for it. This can also help us pick out signal from noise. Noise has a Fourier series that looks a particular way. If you take the coefficients for a noisy signal and remove that? You can get a good approximation of the original, noiseless, signal.

This all seems great. That’s a good time to feel skeptical. First, like, not everything we want to work with looks like waves. Suppose we need a function that looks like a parabola. It’s silly to think we can add a bunch of sines and cosines and get a parabola. Like, a parabola isn’t periodic, to start with.

So it’s not. To use Fourier series methods on something that’s not periodic, we use a clever technique: we tell a fib. We declare that the period is something bigger than we care about. Say the period is, oh, ten million years long. A hundred light-years wide. Whatever. We trust that the difference between the function we do want, and the function that we calculate, will be small. We trust that if someone ten million years from now and a hundred light-years away wishes to complain about our work, we will be out of the office that day. Letting the period L be big enough is a good reliable tool.

The other thing? Can we approximate any function as a Fourier series? Like, at least chunks of parabolas? Polynomials? Chunks of exponential growths or decays? What about sawtooth functions, that rise and fall? What about step functions, that are constant for a while and then jump up or down?

The answer to all these questions is “yes,” although drawing out the word and raising a finger to say there are some issues we have to deal with. One issue is that most of the time, we need an infinitely long series to represent a function perfectly. This is fine if we’re trying to prove things about functions in general rather than solve some specific problem. It’s no harder to write the sum of infinitely many terms than the sum of finitely many terms. You write an ∞ symbol instead of an N in some important places. But if we want to solve specific problems? We probably want to deal with finitely many terms. (I hedge that statement on purpose. Sometimes it turns out we can find a formula for all the infinitely many coefficients.) This will usually give us an approximation of the $f(x)$ we want. The approximation can be as good as we want, but to get a better approximation we need more terms. Fair enough. This kind of tradeoff doesn’t seem too weird.

Another issue is in discontinuities. If $f(x)$ jumps around? If it has some point where it’s undefined? If it has corners? Then the Fourier series has problems. Summing up sines and cosines can’t give us a sudden jump or a gap or anything. Near a discontinuity, the Fourier series will get this high-frequency wobble. A bigger jump, a bigger wobble. You may not blame the series for not representing a discontinuity. But it does mean that what is, otherwise, a pretty good match for the $f(x)$ you want gets this region where it stops being so good a match.

That’s all right. These issues aren’t bad enough, or unpredictable enough, to keep Fourier series from being powerful tools. Even when we find problems for which sines and cosines are poor fits, we use this same approach. Describe a function we would like to know as the sums of functions we choose to work with. Fourier series are one of those ideas that helps us solve problems, and guides us to new ways to solve problems.

This is my last big essay for the week. All of Fall 2019 A To Z posts should be at this link. The letter G should get its chance on Tuesday and H next Thursday. I intend to have A To Z essays should be available at this link. If you’d like to nominate topics for essays, I’m asking for the letters I through N at this link. Thank you.

## My 2019 Mathematics A To Z: Differential Equations

The thing most important to know about differential equations is that for short, we call it “diff eq”. This is pronounced “diffy q”. It’s a fun name. People who aren’t taking mathematics smile when they hear someone has to get to “diffy q”.

Sometimes we need to be more exact. Then the less exciting names “ODE” and “PDE” get used. The meaning of the “DE” part is an easy guess. The meaning of “O” or “P” will be clear by the time this essay’s finished. We can find approximate answers to differential equations by computer. This is known generally as “numerical solutions”. So you will encounter talk about, say, “NSPDE”. There’s an implied “of” between the S and the P there. I don’t often see “NSODE”. For some reason, probably a quite arbitrary historical choice, this is just called “numerical integration” instead.

To write about “differential equations” was suggested by aajohannas, who is on Twitter as @aajohannas.

# Differential Equations.

One of algebra’s unsettling things is the idea that we can work with numbers without knowing their values. We can give them names, like ‘x’ or ‘a’ or ‘t’. We can know things about them. Often it’s equations telling us these things. We can make collections of numbers based on them all sharing some property. Often these things are solutions to equations. We can even describe changing those collections according to some rule, even before we know whether any of the numbers is 2. Often these things are functions, here matching one set of numbers to another.

One of analysis’s unsettling things is the idea that most things we can do with numbers we can also do with functions. We can give them names, like ‘f’ and ‘g’ and … ‘F’. That’s easy enough. We can add and subtract them. Multiply and divide. This is unsurprising. We can measure their sizes. This is odd but, all right. We can know things about functions even without knowing exactly what they are. We can group together collections of functions based on some properties they share. This is getting wild. We can even describe changing these collections according to some rule. This change is itself a function, but it is usually called an “operator”, saving us some confusion.

So we can describe a function in an equation. We may not know what f is, but suppose we know $\sqrt{f(x) - 2} = x$ is true. We can suppose that if we cared we could find what function, or functions, f made that equation true. There is shorthand here. A function has a domain, a range, and a rule. The equation part helps us find the rule. The domain and range we get from the problem. Or we take the implicit rule that both are the biggest sets of real-valued numbers for which the rule parses. Sometimes biggest sets of complex-valued numbers. We get so used to saying “the function” to mean “the rule for the function” that we’ll forget to say that’s what we’re doing.

There are things we can do with functions that we can’t do with numbers. Or at least that are too boring to do with numbers. The most important here is taking derivatives. The derivative of a function is another function. One good way to think of a derivative is that it describes how a function changes when its variables change. (The derivative of a number is zero, which is boring except when it’s also useful.) Derivatives are great. You learn them in Intro Calculus, and there are a bunch of rules to follow. But follow them and you can pretty much take the derivative of any function even if it’s complicated. Yes, you might have to look up what the derivative of the arc-hyperbolic-secant is. Nobody has ever used the arc-hyperbolic-secant, except to tease a student.

And the derivative of a function is itself a function. So you can take a derivative again. Mathematicians call this the “second derivative”, because we didn’t expect someone would ask what to call it and we had to say something. We can take the derivative of the second derivative. This is the “third derivative” because by then changing the scheme would be awkward. If you need to talk about taking the derivative some large but unspecified number of times, this is the n-th derivative. Or m-th, if you’ve already used ‘n’ to mean something else.

And now we get to differential equations. These are equations in which we describe a function using at least one of its derivatives. The original function, that is, f, usually appears in the equation. It doesn’t have to, though.

We divide the earth naturally (we think) into two pairs of hemispheres, northern and southern, eastern and western. We divide differential equations naturally (we think) into two pairs of two kinds of differential equations.

The first division is into linear and nonlinear equations. I’ll describe the two kinds of problem loosely. Linear equations are the kind you don’t need a mathematician to solve. If the equation has solutions, we can write out procedures that find them, like, all the time. A well-programmed computer can solve them exactly. Nonlinear equations, meanwhile, are the kind no mathematician can solve. They’re just too hard. There’s no processes that are sure to find an answer.

You may ask. We don’t need mathematicians to solve linear equations. Mathematicians can’t solve nonlinear ones. So what do we need mathematicians for? The answer is that I exaggerate. Linear equations aren’t quite that simple. Nonlinear equations aren’t quite that hopeless. There are nonlinear equations we can solve exactly, for example. This usually involves some ingenious transformation. We find a linear equation whose solution guides us to the function we do want.

And that is what mathematicians do in such a field. A nonlinear differential equation may, generally, be hopeless. But we can often find a linear differential equation which gives us insight to what we want. Finding that equation, and showing that its answers are relevant, is the work.

The other hemispheres we call ordinary differential equations and partial differential equations. In form, the difference between them is the kind of derivative that’s taken. If the function’s domain is more than one dimension, then there are different kinds of derivative. Or as normal people put it, if the function has more than one independent variable, then there are different kinds of derivatives. These are partial derivatives and ordinary (or “full”) derivatives. Partial derivatives give us partial differential equations. Ordinary derivatives give us ordinary differential equations. I think it’s easier to understand a partial derivative.

Suppose a function depends on three variables, imaginatively named x, y, and z. There are three partial first derivatives. One describes how the function changes if we pretend y and z are constants, but let x change. This is the “partial derivative with respect to x”. Another describes how the function changes if we pretend x and z are constants, but let y change. This is the “partial derivative with respect to y”. The third describes how the function changes if we pretend x and y are constants, but let z change. You can guess what we call this.

In an ordinary differential equation we would still like to know how the function changes when x changes. But we have to admit that a change in x might cause a change in y and z. So we have to account for that. If you don’t see how such a thing is possible don’t worry. The differential equations textbook has an example in which you wish to measure something on the surface of a hill. Temperature, usually. Maybe rainfall or wind speed. To move from one spot to another a bit east of it is also to move up or down. The change in (let’s say) x, how far east you are, demands a change in z, how far above sea level you are.

That’s structure, though. What’s more interesting is the meaning. What kinds of problems do ordinary and partial differential equations usually represent? Partial differential equations are great for describing surfaces and flows and great bulk masses of things. If you see an equation about how heat transmits through a room? That’s a partial differential equation. About how sound passes through a forest? Partial differential equation. About the climate? Partial differential equations again.

Ordinary differential equations are great for describing a ball rolling on a lumpy hill. It’s given an initial push. There are some directions (downhill) that it’s easier to roll in. There’s some directions (uphill) that it’s harder to roll in, but it can roll if the push was hard enough. There’s maybe friction that makes it roll to a stop.

Put that way it’s clear all the interesting stuff is partial differential equations. Balls on lumpy hills are nice but who cares? Miniature golf course designers and that’s all. This is because I’ve presented it to look silly. I’ve got you thinking of a “ball” and a “hill” as if I meant balls and hills. Nah. It’s usually possible to bundle a lot of information about a physical problem into something that looks like a ball. And then we can bundle the ways things interact into something that looks like a hill.

Like, suppose we have two blocks on a shared track, like in a high school physics class. We can describe their positions as one point in a two-dimensional space. One axis is where on the track the first block is, and the other axis is where on the track the second block is. Physics problems like this also usually depend on momentum. We can toss these in too, an axis that describes the momentum of the first block, and another axis that describes the momentum of the second block.

We’re already up to four dimensions, and we only have two things, both confined to one track. That’s all right. We don’t have to draw it. If we do, we draw something that looks like a two- or three-dimensional sketch, maybe with a note that says “D = 4” to remind us. There’s some point in this four-dimensional space that describes these blocks on the track. That’s the “ball” for this differential equation.

The things that the blocks can do? Like, they can collide? They maybe have rubber tips so they bounce off each other? Maybe someone’s put magnets on them so they’ll draw together or repel? Maybe there’s a spring connecting them? These possible interactions are the shape of the hills that the ball representing the system “rolls” over. An impenetrable barrier, like, two things colliding, is a vertical wall. Two things being attracted is a little divot. Two things being repulsed is a little hill. Things like that.

Now you see why an ordinary differential equation might be interesting. It can capture what happens when many separate things interact.

I write this as though ordinary and partial differential equations are different continents of thought. They’re not. When you model something you make choices and they can guide you to ordinary or to partial differential equations. My own research work, for example, was on planetary atmospheres. Atmospheres are fluids. Representing how fluids move usually calls for partial differential equations. But my own interest was in vortices, swirls like hurricanes or Jupiter’s Great Red Spot. Since I was acting as if the atmosphere was a bunch of storms pushing each other around, this implied ordinary differential equations.

There are more hemispheres of differential equations. They have names like homogenous and non-homogenous. Coupled and decoupled. Separable and nonseparable. Exact and non-exact. Elliptic, parabolic, and hyperbolic partial differential equations. Don’t worry about those labels. They relate to how difficult the equations are to solve. What ways they’re difficult. In what ways they break computers trying to approximate their solutions.

What’s interesting about these, besides that they represent many physical problems, is that they capture the idea of feedback. Of control. If a system’s current state affects how it’s going to change, then it probably has a differential equation describing it. Many systems change based on their current state. So differential equations have long been near the center of professional mathematics. They offer great and exciting pure questions while still staying urgent and relevant to real-world problems. They’re great things.

Thanks again for reading. All Fall 2019 A To Z posts should be at this link. I should get to the letter E for Tuesday. All of the A To Z essays should be at this link. If you have thoughts about other topics I might cover, please offer suggestions for the letters G and H.

## What I’ve Been Reading, Mid-March 2018

So here’s some of the stuff I’ve noticed while being on the Internet and sometimes noticing interesting mathematical stuff.

Here from the end of January is a bit of oddball news. A story problem for 11-year-olds in one district of China set up a problem that couldn’t be solved. Not exactly, anyway. The question — “if a ship had 26 sheep and 10 goats onboard, how old is the ship’s captain?” — squares nicely with that Gil comic strip I discussed the other day. After seeing 26 (something) and 10 (something else) it’s easy to think of what answers might be wanted: 36 (total animals) or 16 (how many more sheep there are than goats) or maybe 104 (how many hooves there are, if they all have the standard four hooves). That the question doesn’t ask anything that the given numbers matter for barely registers unless you read the question again. I like the principle of reminding people not to calculate until you know what you want to do and why that. And it’s possible to give partial answers: the BBC News report linked above includes a mention from one commenter that allowed a reasonable lower bound to be set on the ship’s captain’s age.

In something for my mathematics majors, here’s A Regiment of Monstrous Functions as assembled by Rob J Low. This is about functions with a domain and a range that are both real numbers. There’s many kinds of these functions. They match nicely to the kinds of curves you can draw on a sheet of paper. So take a sheet of paper and draw a curve. You’ve probably drawn a continuous curve, one that can be drawn without lifting your pencil off the paper. Good chance you drew a differentiable one, one without corners. But most functions aren’t continuous. And aren’t differentiable. Of those few exceptions that are, many of them are continuous or differentiable only in weird cases. Low reviews some of the many kinds of functions out there. Functions discontinuous at a point. Functions continuous only on one point, and why that’s not a crazy thing to say. Functions continuous on irrational numbers but discontinuous on rational numbers. This is where mathematics majors taking real analysis feel overwhelmed. And then there’s stranger stuff out there.

Here’s a neat one. It’s about finding recognizable, particular, interesting pictures in long enough prime numbers. The secret to it is described in the linked paper. The key is that the eye is very forgiving of slightly imperfect images. This fact should reassure people learning to draw, but will not. And there’s a lot of prime numbers out there. If an exactly-correct image doesn’t happen to be a prime number that’s all right. There’s a number close enough to it that will be. That latter point is something that anyone interested in number theory “knows”, in that we know some stuff about the biggest possible gaps between prime numbers. But that fact isn’t the same as seeing it.

And finally there’s something for mathematics majors. Differential equations are big and important. They appear whenever you want to describe something that changes based on its current state. And this is so much stuff. Finding solutions to differential equations is a whole major field of mathematics. The linked PDF is a slideshow of notes about one way to crack these problems: find symmetries. The only trouble is it’s a PDF of a Powerpoint presentation, one of those where each of the items gets added on in sequence. So each slide appears like eight times, each time with one extra line on it. It’s still good, interesting stuff.

## Everything Interesting There Is To Say About Springs

I need another supplemental essay to get to the next part in Why Stuff Can Orbit. (Here’s the last part.) You probably guessed it’s about springs. They’re useful to know about. Why? That one killer Mystery Science Theater 3000 short, yes. But also because they turn up everywhere.

Not because there are literally springs in everything. Not with the rise in anti-spring political forces. But what makes a spring is a force that pushes something back where it came from. It pushes with a force that grows just as fast as the distance from where it came grows. Most anything that’s stable, that has some normal state which it tends to look like, acts like this. A small nudging away from the normal state gets met with some resistance. A bigger nudge meets bigger resistance. And most stuff that we see is stable. If it weren’t stable it would have broken before we got there.

(There are exceptions. Stable is, sometimes, about perspective. It can be that something is unstable but it takes so long to break that we don’t have to worry about it. Uranium, for example, is dying, turning slowly into stable elements like lead and helium. There will come a day there’s none left in the Earth. But it takes so long to break down that, barring surprises, the Earth will have broken down into something else first. And it may be that something is unstable, but it’s created by something that’s always going on. Oxygen in the atmosphere is always busy combining with other chemicals. But oxygen stays in the atmosphere because life keeps breaking it out of other chemicals.)

Now I need to put in some terms. Start with your thing. It’s on a spring, literally or metaphorically. Don’t care. If it isn’t being pushed in any direction then it’s at rest. Or it’s at an equilibrium. I don’t want to call this the ideal or natural state. That suggests some moral superiority to one way of existing over another, and how do I know what’s right for your thing? I can tell you what it acts like. It’s your business whether it should. Anyway, your thing has an equilibrium.

Next term is the displacement. It’s how far your thing is from the equilibrium. If it’s really a block of wood on a spring, like it is in high school physics, this displacement is how far the spring is stretched out. In equations I’ll represent this as ‘x’ because I’m not going to go looking deep for letters for something like this. What value ‘x’ has will change with time. This is what makes it a physics problem. If we want to make clear that ‘x’ does depend on time we might write ‘x(t)’. We might go all the way and start at the top of the page with ‘x = x(t)’, just in case.

If ‘x’ is a positive number it means your thing is displaced in one direction. If ‘x’ is a negative number it was displaced in the opposite direction. By ‘one direction’ I mean ‘to the right, or else up’. By ‘the opposite direction’ I mean ‘to the left, or else down’. Yes, you can pick any direction you like but why are you making life harder for everyone? Unless there’s something compelling about the setup of your thing that makes another choice make sense just go along with what everyone else is doing. Apply your creativity and iconoclasm where it’ll make your life better instead.

Also, we only have to worry about one direction. This might surprise you. If you’ve played much with springs you might have noticed how they’re three-dimensional objects. You can set stuff swinging back and forth in two directions at once. That’s all right. We can describe a two-dimensional displacement as a displacement in one direction plus a displacement perpendicular to that. And if there’s no such thing as friction, they won’t interact. We can pretend they’re two problems that happen to be running on the same spring at the same time. So here I declare: we can ignore friction and pretend it doesn’t matter. We don’t have to deal with more than one direction at a time.

(It’s not only friction. There’s problems about how energy gets transmitted between ways the thing can oscillate. This is what causes what starts out as a big whack in one direction to turn into a middling little circular wobbling. That’s a higher level physics than I want to do right now. So here I declare: we can ignore that and pretend it doesn’t matter.)

Whether your thing is displaced or not it’s got some potential energy. This can be as large or as small as you like, down to some minimum when your thing is at equilibrium. The potential energy we represent as a number named ‘U’ because of good reasons that somebody surely had. The potential energy of a spring depends on the square of the displacement. We can write its value as ‘U = ½ k x2‘. Here ‘k’ is a number known as the spring constant. It describes how strongly the spring reacts; the bigger ‘k’ is, the more any displacement’s met with a contrary force. It’ll be a positive number. ½ is that same old one-half that you know from ideas being half-baked or going-off being half-cocked.

Potential energy is great. If you can describe a physics problem with its energy you’re in good shape. It lets us bring physical intuition into understanding things. Imagine a bowl or a Habitrail-type ramp that’s got the cross-section of your potential energy. Drop a little marble into it. How the marble rolls? That’s what your thingy does in that potential energy.

Also we have mathematics. Calculus, particularly differential equations, lets us work out how the position of your thing will change. We need one more piece for this. That’s the momentum of your thing. Momentum is traditionally represented with the letter ‘p’. And now here’s how stuff moves when you know the potential energy ‘U’:

$\frac{dp}{dt} = - \frac{\partial U}{\partial x}$

Let me unpack that. $\frac{dp}{dt}$ — also known as $\frac{d}{dt}p$ if that looks better — is “the derivative of p with respect to t”. It means “how the value of the momentum changes as the time changes”. And that is equal to minus one times …

You might guess that $\frac{\partial U}{\partial x}$ — also written as $\frac{\partial}{\partial x} U$ — is some kind of derivative. The $\partial$ looks kind of like a cursive d, after all. It’s known as the partial derivative, because it means we look at how ‘U’ changes as ‘x’ and nothing else at all changes. With the normal, ‘d’ style full derivative, we have to track how all the variables change as the ‘t’ we’re interested in changes. In this particular problem the difference doesn’t matter. But there are problems where it does matter and that’s why I’m careful about the symbols.

So now we fall back on how to take derivatives. This gives us the equation that describes how the physics of your thing on a spring works:

$\frac{dp}{dt} = - k x$

You’re maybe underwhelmed. This is because we haven’t got any idea how the momentum ‘p’ relates to the displacement ‘x’. Well, we do, because I know and if you’re still reading at this point you know full well what momentum is. But let me make it official. Momentum is, for this kind of thing, the mass ‘m’ of your thing times how its position is changing, which is $\frac{dx}{dt}$. The mass of your thing isn’t changing. If you’re going to let it change then we’re doing some screwy rocket problem and that’s a different article. So its easy to get the momentum out of that problem. We get instead the second derivative of the displacement with respect to time:

$m\frac{d^2 x}{dt^2} = - kx$

Fine, then. Does that tell us anything about what ‘x(t)’ is? Not yet, but I will now share with you one of the top secrets that only real mathematicians know. We will take a guess to what the answer probably is. Then we’ll see in what circumstances that answer could possibly be right. Does this seem ad hoc? Fine, so it’s ad hoc. Here is the secret of mathematicians:

It’s fine if you get your answer by any stupid method you like, including guessing and getting lucky, as long as you check that your answer is right.

Oh, sure, we’d rather you get an answer systematically, since a system might give us ideas how to find answers in new problems. But if all we want is an answer then, by definition, we don’t care where it came from. Anyway, we’re making a particular guess, one that’s very good for this sort of problem. Indeed, this guess is our system. A lot of guesses at solving differential equations use exactly this guess. Are you ready for my guess about what solves this? Because here it is.

We should expect that

$x(t) = C e^{r t}$

Here ‘C’ is some constant number, not yet known. And ‘r’ is some constant number, not yet known. ‘t’ is time. ‘e’ is that number 2.71828(etc) that always turns up in these problems. Why? Because its derivative is very easy to take, and if we have to take derivatives we want them to be easy to take. The first derivative of $Ce^{rt}$ with respect to ‘t’ is $r Ce^{rt}$. The second derivative with respect to ‘t’ is $r^2 Ce^{rt}$. so here’s what we have:

$m r^2 Ce^{rt} = - k Ce^{rt}$

What we’d like to find are the values for ‘C’ and ‘r’ that make this equation true. It’s got to be true for every value of ‘t’, yes. But this is actually an easy equation to solve. Why? Because the $C e^{rt}$ on the left side has to equal the $C e^{rt}$ on the right side. As long as they’re not equal to zero and hey, what do you know? $C e^{rt}$ can’t be zero unless ‘C’ is zero. So as long as ‘C’ is any number at all in the world except zero we can divide this ugly lump of symbols out of both sides. (If ‘C’ is zero, then this equation is 0 = 0 which is true enough, I guess.) What’s left?

$m r^2 = -k$

OK, so, we have no idea what ‘C’ is and we’re not going to have any. That’s all right. We’ll get it later. What we can get is ‘r’. You’ve probably got there already. There’s two possible answers:

$r = \pm\sqrt{-\frac{k}{m}}$

You might not like that. You remember that ‘k’ has to be positive, and if mass ‘m’ isn’t positive something’s screwed up. So what are we doing with the square root of a negative number? Yes, we’re getting imaginary numbers. Two imaginary numbers, in fact:

$r = \imath \sqrt{\frac{k}{m}}, r = - \imath \sqrt{\frac{k}{m}}$

Which is right? Both. In some combination, too. It’ll be a bit with that first ‘r’ plus a bit with that second ‘r’. In the differential equations trade this is called superposition. We’ll have information that tells us how much uses the first ‘r’ and how much uses the second.

You might still be upset. Hey, we’ve got these imaginary numbers here describing how a spring moves and while you might not be one of those high-price physicists you see all over the media you know springs aren’t imaginary. I’ve got a couple responses to that. Some are semantic. We only call these numbers “imaginary” because when we first noticed they were useful things we didn’t know what to make of them. The label is an arbitrary thing that doesn’t make any demands of the numbers. If we had called them, oh, “Cardanic numbers” instead would you be upset that you didn’t see any Cardanos in your springs?

My high-class semantic response is to ask in exactly what way is the “square root of minus one” any less imaginary than “three”? Can you give me a handful of three? No? Didn’t think so.

And then the practical response is: don’t worry. Exponentials raised to imaginary numbers do something amazing. They turn into sine waves. Well, sine and cosine waves. I’ll spare you just why. You can find it by looking at the first twelve or so posts of any pop mathematics blog and its article about how amazing Euler’s Formula is. Given that Euler published, like, 2,038 books and papers through his life and the fifty years after his death it took to clear the backlog you might think, “Euler had a lot of Formulas, right? Identities too?” Yes, he did, but you’ll know this one when you see it.

What’s important is that the displacement of your thing on a spring will be described by a function which looks like this:

$x(t) = C_1 e^{\sqrt{\frac{k}{m}} t} + C_2 e^{-\sqrt{\frac{k}{m}} t}$

for two constants, ‘C1‘ and ‘C2‘. These were the things we called ‘C’ back when we thought the answer might be $Ce^{rt}$; there’s two of them because there’s two r’s. I give you my word this is equivalent to a formula like this, but you can make me show my work if you must:

$x(t) = A cos\left(\sqrt{\frac{k}{m}} t\right) + B sin\left(\sqrt{\frac{k}{m}} t\right)$

for some (other) constants ‘A’ and ‘B’. Cosine and sine are the old things you remember from learning about cosine and sine.

OK, but what are ‘A’ and ‘B’?

Generically? We don’t care. Some numbers. Maybe zero. Maybe not. The pattern, how the displacement changes over time, will be the same whatever they are. It’ll be regular oscillation. At one time your thing will be as far from the equilibrium as it gets, and not moving toward or away from the center. At one time it’ll be back at the center and moving as fast as it can. At another time it’ll be as far away from the equilibrium as it gets, but on the other side. At another time it’ll be back at the equilibrium and moving as fast as it ever does, but the other way. How far is that maximum? What’s the fastest it travels?

The answer’s in how we started. If we start at the equilibrium without any kind of movement we’re never going to leave the equilibrium. We have to get nudged out of it. But what kind of nudge? There’s three ways you can do to nudge something out.

You can tug it out some and let it go from rest. This is the easiest: then ‘A’ is however big your tug was and ‘B’ is zero.

You can let it start from equilibrium but give it a good whack so it’s moving at some initial velocity. This is the next-easiest: ‘A’ is zero, and ‘B’ is … no, not the initial velocity. You need to look at what the velocity of your thing is at the start. That’s the first derivative:

$\frac{dx}{dt} = -\sqrt{\frac{k}{m}}A sin\left(\sqrt{\frac{k}{m}} t\right) + \sqrt{\frac{k}{m}} B sin\left(\sqrt{\frac{k}{m}} t\right)$

The start is when time is zero because we don’t need to be difficult. when ‘t’ is zero the above velocity is $\sqrt{\frac{k}{m}} B$. So that product has to be the initial velocity. That’s not much harder.

The third case is when you start with some displacement and some velocity. A combination of the two. Then, ugh. You have to figure out ‘A’ and ‘B’ that make both the position and the velocity work out. That’s the simultaneous solutions of equations, and not even hard equations. It’s more work is all. I’m interested in other stuff anyway.

Because, yeah, the spring is going to wobble back and forth. What I’d like to know is how long it takes to get back where it started. How long does a cycle take? Look back at that position function, for example. That’s all we need.

$x(t) = A cos\left(\sqrt{\frac{k}{m}} t\right) + B sin\left(\sqrt{\frac{k}{m}} t\right)$

Sine and cosine functions are periodic. They have a period of 2π. This means if you take the thing inside the parentheses after a sine or a cosine and increase it — or decrease it — by 2π, you’ll get the same value out. What’s the first time that the displacement and the velocity will be the same as their starting values? If they started at t = 0, then, they’re going to be back there at a time ‘T’ which makes true the equation

$\sqrt{\frac{k}{m}} T = 2\pi$

And that’s going to be

$T = 2\pi\sqrt{\frac{m}{k}}$

Maybe surprising thing about this: the period doesn’t depend at all on how big the displacement is. That’s true for perfect springs, which don’t exist in the real world. You knew that. Imagine taking a Junior Slinky from the dollar store and sticking a block of something on one end. Imagine stretching it out to 500,000 times the distance between the Earth and Jupiter and letting go. Would it act like a spring or would it break? Yeah, we know. It’s sad. Think of the animated-cartoon joy a spring like that would produce.

But this period not depending on the displacement is true for small enough displacements, in the real world. Or for good enough springs. Or things that work enough like springs. By “true” I mean “close enough to true”. We can give that a precise mathematical definition, which turns out to be what you would mean by “close enough” in everyday English. The difference is it’ll have Greek letters included.

So to sum up: suppose we have something that acts like a spring. Then we know qualitatively how it behaves. It oscillates back and forth in a sine wave around the equilibrium. Suppose we know what the spring constant ‘k’ is. Suppose we also know ‘m’, which represents the inertia of the thing. If it’s a real thing on a real spring it’s mass. Then we know quantitatively how it moves. It has a period, based on this spring constant and this mass. And we can say how big the oscillations are based on how big the starting displacement and velocity are. That’s everything I care about in a spring. At least until I get into something wild like several springs wired together, which I am not doing now and might never do.

And, as we’ll see when we get back to orbits, a lot of things work close enough to springs.

## Excuses, But Classed Up Some

Afraid I’m behind on resuming Why Stuff Can Orbit, mostly as a result of a power outage yesterday. It wasn’t a major one, but it did reshuffle all the week’s chores to yesterday when we could be places that had power, and kept me from doing as much typing as I wanted. I’m going to be riding this excuse for weeks.

So instead, here, let me pass this on to you.

It links to a post about the Legendre Transform, which is one of those cool advanced tools you get a couple years into a mathematics or physics major. It is, like many of these cool advanced tools, about solving differential equations. Differential equations turn up anytime the current state of something affects how it’s going to change, which is to say, anytime you’re looking at something not boring. It’s one of mathematics’s uses of “duals”, letting you swap between the function you’re interested in and what you know about how the function you’re interested in changes.

On the linked page, Jonathan Manton tries to present reasons behind the Legendre transform, in ways he likes better. It might not explain the idea in a way you like, especially if you haven’t worked with it before. But I find reading multiple attempts to explain an idea helpful. Even if one perspective doesn’t help, having a cluster of ideas often does.

## Reading the Comics, January 14, 2017: Redeye and Reruns Edition

So for all I worried about the Gocomics.com redesign it’s not bad. The biggest change is it’s removed a side panel and given the space over to the comics. And while it does show comics you haven’t been reading, it only shows one per day. One week in it apparently sticks with the same comic unless you choose to dismiss that. So I’ve had it showing me The Comic Strip That Has A Finale Every Day as a strip I’m not “reading”. I’m delighted how thisbreaks the logic about what it means to “not read” an “ongoing comic strip”. (That strip was a Super-Fun-Pak Comix offering, as part of Ruben Bolling’s Tom the Dancing Bug. It was turned into a regular Gocomics.com feature by someone who got the joke.)

Comic Strip Master Command responded to the change by sending out a lot of comic strips. I’m going to have to divide this week’s entry into two pieces. There’s not deep things to say about most of these comics, but I’ll make do, surely.

Julie Larson’s Dinette Set rerun for the 8th is about one of the great uses of combinatorics. That use is working out how the number of possible things compares to the number of things there are. What’s always staggering is that the number of possible things grows so very very fast. Here one of Larson’s characters claims a science-type show made an assertion about the number of possible ideas a brain could hold. I don’t know if that’s inspired by some actual bit of pop science. I can imagine someone trying to estimate the number of possible states a brain might have.

And that has to be larger than the number of atoms in the universe. Consider: there’s something less than a googol of atoms in the universe. But a person can certainly have the idea of the number 1, or the idea of the number 2, or the idea of the number 3, or so on. I admit a certain sameness seems to exist between the ideas of the numbers 2,038,412,562,593,604 and 2,038,412,582,593,604. But there is a difference. We can out-number the atoms in the universe even before we consider ideas like rabbits or liberal democracy or jellybeans or board games. The universe never had a chance.

Or did it? Is it possible for a number to be too big for the human brain to ponder? If there are more digits in the number than there are atoms in the universe we can’t form any discrete representation of it, after all. … Except that we kind of can. For example, “the largest prime number less than one googolplex” is perfectly understandable. We can’t write it out in digits, I think. But you now have thought of that number, and while you may not know what its millionth decimal digit is, you also have no reason to care what that digit is. This is stepping into the troubled waters of algorithmic complexity.

Bob Weber Jr’s Slylock Fox and Comics for Kids for the 9th is built on soap bubbles. The link between the wand and the soap bubble vanishes quickly once the bubble breaks loose of the wand. But soap films that keep adhered to the wand or mesh can be quite strangely shaped. Soap films are a practical example of a kind of partial differential equations problem. Partial differential equations often appear when we want to talk about shapes and surfaces and materials that tug or deform the material near them. The shape of a soap bubble will be the one that minimizes the torsion stresses of the bubble’s surface. It’s a challenge to solve analytically. It’s still a good challenge to solve numerically. But you can do that most wonderful of things and solve a differential equation experimentally, if you must. It’s old-fashioned. The computer tools to do this have gotten so common it’s hard to justify going to the engineering lab and getting soapy water all over a mathematician’s fingers. But the option is there.

Gordon Bess’s Redeye rerun from the 28th of August, 1970, is one of a string of confused-student jokes. (The strip had a Generic Comedic Western Indian setting, putting it in the vein of Hagar the Horrible and other comic-anachronism comics.) But I wonder if there are kids baffled by numbers getting made several different ways. Experience with recipes and assembly instructions and the like might train someone to thinking there’s one correct way to make something. That could build a bad intuition about what additions can work.

Corey Pandolph’s Barkeater Lake rerun for the 9th just name-drops algebra. And that as a word that starts with the “alj” sound. So far as I’m aware there’s not a clear etymological link between Algeria and algebra, despite both being modified Arabic words. Algebra comes from “al-jabr”, about reuniting broken things. Algeria comes from Algiers, which Wikipedia says derives from `al-jaza’ir”, “the Islands [of the Mazghanna tribe]”.

Guy Gilchrist’s Nancy for the 9th is another mathematics-cameo strip. But it was also the first strip I ran across this week that mentioned mathematics and wasn’t a rerun. I’ll take it.

Donna A Lewis’s Reply All for the 9th has Lizzie accuse her boyfriend of cheating by using mathematics in Scrabble. He seems to just be counting tiles, though. I think Lizzie suspects something like Blackjack card-counting is going on. Since there are only so many of each letter available knowing just how many tiles remain could maybe offer some guidance how to play? But I don’t see how. In Blackjack a player gets to decide whether to take more cards or not. Counting cards can suggest whether it’s more likely or less likely that another card will make the player or dealer bust. Scrabble doesn’t offer that choice. One has to refill up to seven tiles until the tile bag hasn’t got enough left. Perhaps I’m overlooking something; I haven’t played much Scrabble since I was a kid.

Perhaps we can take the strip as portraying the folk belief that mathematicians get to know secret, barely-explainable advantages on ordinary folks. That itself reflects a folk belief that experts of any kind are endowed with vaguely cheating knowledge. I’ll admit being able to go up to a blackboard and write with confidence a bunch of integrals feels a bit like magic. This doesn’t help with Scrabble.

Gordon Bess’s Redeye continued the confused-student thread on the 29th of August, 1970. This one’s a much older joke about resisting word problems.

Ryan North’s Dinosaur Comics rerun for the 10th talks about multiverses. If we allow there to be infinitely many possible universes that would suggest infinitely many different Shakespeares writing enormously many variations of everything. It’s an interesting variant on the monkeys-at-typewriters problem. I noticed how T-Rex put Shakespeare at typewriters too. That’ll have many of the same practical problems as monkeys-at-typewriters do, though. There’ll be a lot of variations that are just a few words or a trivial scene different from what we have, for example. Or there’ll be variants that are completely uninteresting, or so different we can barely recognize them as relevant. And that’s if it’s actually possible for there to be an alternate universe with Shakespeare writing his plays differently. That seems like it should be possible, but we lack evidence that it is.

## What I Learned Doing The End 2016 Mathematics A To Z

The slightest thing I learned in the most recent set of essays is that I somehow slid from the descriptive “End Of 2016” title to the prescriptive “End 2016” identifier for the series. My unscientific survey suggests that most people would agree that we had too much 2016 and would have been better off doing without it altogether. So it goes.

The most important thing I learned about this is I have to pace things better. The A To Z essays have been creeping up in length. I didn’t keep close track of their lengths but I don’t think any of them came in under a thousand words. 1500 words was more common. And that’s fine enough, but at three per week, plus the Reading the Comics posts, that’s 5500 or 6000 words of mathematics alone. And that before getting to my humor blog, which even on a brief week will be a couple thousand words. I understand in retrospect why November and December felt like I didn’t have any time outside the word mines.

I’m not bothered by writing longer essays, mind. I can apparently go on at any length on any subject. And I like the words I’ve been using. My suspicion is between these A To Zs and the Theorem Thursdays over the summer I’ve found a mode for writing pop mathematics that works for me. It’s just a matter of how to balance workloads. The humor blog has gotten consistently better readership, for the obvious reasons (lately I’ve been trying to explain what the story comics are doing), but the mathematics more satisfying. If I should have to cut back on either it’d be the humor blog that gets the cut first.

Another little discovery is that I can swap out equations and formulas and the like for historical discussion. That’s probably a useful tradeoff for most of my readers. And it plays to my natural tendencies. It is very easy to imagine me having gone into history than into mathematics or science. It makes me aware how mediocre my knowledge of mathematics history is, though. For example, several times in the End 2016 A To Z the Crisis of Foundations came up, directly or in passing. But I’ve never read a proper history, not even a basic essay, about the Crisis. I don’t even know of a good description of this important-to-the-field event. Most mathematics history focuses around biographies of a few figures, often cribbed from Eric Temple Bell’s great but unreliable book, or a couple of famous specific incidents. (Newton versus Leibniz, the bridges of Köningsburg, Cantor’s insanity, Gödel’s citizenship exam.) Plus Bourbaki.

That’s not enough for someone taking the subject seriously, and I do mean to. So if someone has a suggestion for good histories of, for example, how Fourier series affected mathematicians’ understanding of what functions are, I’d love to know it. Maybe I should set that as a standing open request.

In looking over the subjects I wrote about I find a pretty strong mix of group theory and real analysis. Maybe that shouldn’t surprise. Those are two of the maybe three legs that form a mathematics major’s education. So anyone wanting to understand mathematicians would see this stuff and have questions about it. (There are more things mathematics majors learn, but there are a handful of things almost any mathematics major is sure to spend a year being baffled by.)

The third leg, I’d say, is differential equations. That’s a fantastic field, but it’s hard to describe without equations. Also pictures of what the equations imply. I’ve tended towards essays with few equations and pictures. That’s my laziness. Equations are best written in LaTeX, a typesetting tool that might as well be the standard for mathematicians writing papers and books. While WordPress supports a bit of LaTeX it isn’t quite effortless. That comes back around to balancing my workload. I do that a little better and I can explain solving first-order differential equations by integrating factors. (This is a prank. Nobody has ever needed to solve a first-order differential equation by integrating factors except for mathematics majors being taught the method.) But maybe I could make a go of that.

I’m not setting any particular date for the next A-To-Z, or similar, project. I need some time to recuperate. And maybe some time to think of other running projects that would be fun or educational for me. There’ll be something, though.

## The End 2016 Mathematics A To Z: Boundary Value Problems

I went to a grad school, Rensselaer Polytechnic Institute. The joke at the school is that the mathematics department has two tracks, “Applied Mathematics” and “More Applied Mathematics”. So I got to know the subject of today’s A To Z very well. It’s worth your knowing too.

## Boundary Value Problems.

I’ve talked about differential equations before. I’ll talk about them again. They’re important. They might be the most directly useful sort of higher mathematics. They turn up naturally whenever you have a system whose changes depend on the current state of things.

There are many kinds of differential equations problems. The ones that come first to mind, and that students first learn, are “initial value problems”. In these, you’re given some system, told how it changes in time, and told what things are at a start. There’s good reasons to do that. It’s conceptually easy. It describes all sorts of systems where something moves. Think of your classic physics problems of a ball being tossed in the air, or a weight being put on a spring, or a planet orbiting a sun. These are classic initial value problems. They almost look like natural experiments. Set a thing up and watch what happens.

They’re not everything. There’s another class of problems at least as important. Maybe more important. In these we’re given how the parts of a system affect one another. And we’re told some information about the edges of the system. The boundaries, that is. And these are “boundary value problems”.

Mathematics majors learn them after getting thoroughly trained in and sick of initial value problems. There’s reasons for that. First is that they almost need to be about problems with multiple variables. You can set one up for, like, a ball tossed in the air. But they’re rarer. Differential equations for multiple variables are harder than differential equations for a single variable, because of course. We have to learn the tools of “partial differential equations”. In these we work out how the system changes if we pretend all but one of the variables is fixed. We combine information about all those changes for each individual changing variable. Lots more, and lots stranger, stuff can happen.

The partial differential equation describes some region. It involves maybe some space, maybe some time, maybe both. There’s a region, called the “domain”, for which the differential equation is true.

For example, maybe we’re interested in the amount of heat in a metal bar as it’s warmed on one end and cooled on another. The domain here is the length of the bar and the time it’s subjected to the heat and cool. Or maybe we’re interested in the amount of water flowing through a section of a river bed. The domain here is the length and width and depth of the river, if we suppose the river isn’t swelling or shrinking or changing much. Maybe we’re intersted in the electric field created by putting a bit of charge on a metal ball. Then the domain is the entire universe except the metal ball and the space inside it. We’re comfortable with boundlessly large domains.

But what makes this a boundary value problem is that we know something about the boundary looks like. Once again a mathematics term is less baffling than you might figure. The boundary is just what it sounds like: the edge of the domain, the part that divides the domain from not-the-domain. The metal bar being heated up has boundaries on either end. The river bed has boundaries at the surface of the water, the banks of the river, and the start and the end of wherever we’re observing. The metal ball has boundaries of the ball’s surface and … uh … the limits of space and time, somewhere off infinitely far away.

There’s all kinds of information we might get about a boundary. What we actually get is one of four kinds. The first kind is “we get told what values the solution should be at the boundary”. Mathematics majors love this because it lets us know we at least have the boundary’s values right. It’s certainly what we learn on first. And it might be most common. If we’re measuring, say, temperature or fluid speed or something like that we feel like we can know what these are. If we need a name we call this “Dirichlet Boundary Conditions”. That’s named for Peter Gustav Lejune Dirichlet. He’s one of those people mathematics majors keep running across. We get stuff named for him in mathematical physics, in probability, in heat, in Fourier series.

The second kind is “we get told what the derivative of the solution should be at the boundary”. Mathematics majors hate this because we’re having a hard enough time solving this already and you want us to worry about the derivative of the solution on the boundary? Give us something we can check, please. But this sort of boundary condition keeps turning up. It comes up, for instance, in the electric field around a conductive metal box, or ball, or plate. The electric field will be, near the metal plate, perpendicular to the conductive metal. Goodness knows what the electric field’s value is, but we know something about how it changes. If we need a name we call this “Neumann Boundary Conditions”. This is not named for the applied mathematician/computer scientist/physicist John von Neumann. Nobody remembers the Neumann it is named for, who was Carl Neumann.

The third kind is called “Robin boundary conditions” if someone remembers the name for it. It’s slightly named for Victor Gustave Robin. In these we don’t necessarily know the value the solution should have on the boundary. And we don’t know what the derivative of the solution on the boundary should be. But we do know some linear combination of them. That is, we know some number times the original value plus some (possibly other) number times the derivative. Mathematics majors loathe this one because the Neumann boundary conditions were hard enough and now we have this? They turn up in heat and diffusion problems, when there’s something limiting the flow of whatever you’re studying into and out of the region.

And the last kind is called “mixed boundary conditions” as, I don’t know, nobody seems to have got their name attached to it. In this we break up the boundary. For some of it we get, say, Dirichlet boundary conditions. For some of the boundary we get, say, Neumann boundary conditions. Or maybe we have Robin boundary conditions for some of the edge and Dirichlet for others. Whatever. This mathematics majors get once or twice, as punishment for their sinful natures, and then we try never to think of them again because of the pain. Sometimes it’s the only approach that fits the problem. Still hurts.

We see boundary value problems when we do things like blow a soap bubble using weird wireframes and ponder the shape. Or when we mix hot coffee and cold milk in a travel mug and ponder how the temperatures mix. Or when we see a pipe squeezing into narrower channels and wonder how this affects the speed of water flowing into and out of it. Often these will be problems about how stuff over a region, maybe of space and maybe of time, will settle down to some predictable, steady pattern. This is why it turns up all over applied mathematics problems, and why in grad school we got to know them so very well.

## How Differential Calculus Works

I’m at a point in my Why Stuff Can Orbit essays where I need to talk about derivatives. If I don’t then I’m going to be stuck with qualitative talk that’s too hard to focus on anything. But it’s a big subject. It’s about two-fifths of a Freshman Calculus course and one of the two main legs of calculus. So I want to pull out a quick essay explaining the important stuff.

Derivatives, also called differentials, are about how things change. By “things” I mean “functions”. And particularly I mean functions which have as a domain that’s in the real numbers and a range that’s in the real numbers. That’s the starting point. We can define derivatives for functions with domains or ranges that are vectors of real numbers, or that are complex-valued numbers, or are even more exotic kinds of numbers. But those ideas are all built on the derivatives for real-valued functions. So if you get really good at them, everything else is easy.

Derivatives are the part of calculus where we study how functions change. They’re blessed with some easy-to-visualize interpretations. Suppose you have a function that describes where something is at a given time. Then its derivative with respect to time is the thing’s velocity. You can build a pretty good intuitive feeling for derivatives that way. I won’t stop you from it.

A function’s derivative is itself a function. The derivative doesn’t have to be constant, although it’s possible. Sometimes the derivative is positive. This means the original function increases as whatever its variable is increases. Sometimes the derivative is negative. This means the original function decreases as its variable increases. If the derivative is a big number this means it’s increasing or decreasing fast. If the derivative is a small number it’s increasing or decreasing slowly. If the derivative is zero then the function isn’t increasing or decreasing, at least not at this particular value of the variable. This might be a local maximum, where the function’s got a larger value than it has anywhere else nearby. It might be a local minimum, where the function’s got a smaller value than it has anywhere else nearby. Or it might just be a little pause in the growth or shrinkage of the function. No telling, at least not just from knowing the derivative is zero.

Derivatives tell you something about where the function is going. We can, and in practice often do, make approximations to a function that are built on derivatives. Many interesting functions are real pains to calculate. Derivatives let us make polynomials that are as good an approximation as we need and that a computer can do for us instead.

Derivatives can change rapidly. That’s all right. They’re functions in their own right. Sometimes they change more rapidly and more extremely than the original function did. Sometimes they don’t. Depends what the original function was like.

Not every function has a derivative. Functions that have derivatives don’t necessarily have them at every point. A function has to be continuous to have a derivative. A function that jumps all over the place doesn’t really have a direction, not that differential calculus will let us suss out. But being continuous isn’t enough. The function also has to be … well, we call it “differentiable”. The word at least tells us what the property is good for, even if it doesn’t say how to tell if a function is differentiable. The function has to not have any corners, points where the direction suddenly and unpredictably changes.

Otherwise differentiable functions can have some corners, some non-differentiable points. For instance the height of a bouncing ball, over time, looks like a bunch of upside-down U-like shapes that suddenly rebound off the floor and go up again. Those points interrupting the upside-down U’s aren’t differentiable, even though the rest of the curve is.

It’s possible to make functions that are nothing but corners and that aren’t differentiable anywhere. These are done mostly to win quarrels with 19th century mathematicians about the nature of the real numbers. We use them in Real Analysis to blow students’ minds, and occasionally after that to give a fanciful idea a hard test. In doing real-world physics we usually don’t have to think about them.

So why do we do them? Because they tell us how functions change. They can tell us where functions momentarily don’t change, which are usually important. And equations with differentials in them, known in the trade as “differential equations”. They’re also known to mathematics majors as “diffy Q’s”, a name which delights everyone. Diffy Q’s let us describe physical systems where there’s any kind of feedback. If something interacts with its surroundings, that interaction’s probably described by differential equations.

So how do we do them? We start with a function that we’ll call ‘f’ because we’re not wasting more of our lives figuring out names for functions and we’re feeling smugly confident Turkish authorities aren’t interested in us.

f’s domain is real numbers, for example, the one we’ll call ‘x’. Its range is also real numbers. f(x) is some number in that range. It’s the one that the function f matches with the domain’s value of x. We read f(x) aloud as “the value of f evaluated at the number x”, if we’re pretending or trying to scare Freshman Calculus classes. Really we say “f of x” or maybe “f at x”.

There’s a couple ways to write down the derivative of f. First, for example, we say “the derivative of f with respect to x”. By that we mean how does the value of f(x) change if there’s a small change in x. That difference matters if we have a function that depends on more than one variable, if we had an “f(x, y)”. Having several variables for f changes stuff. Mostly it makes the work longer but not harder. We don’t need that here.

But there’s a couple ways to write this derivative. The best one if it’s a function of one variable is to put a little ‘ mark: f'(x). We pronounce that “f-prime of x”. That’s good enough if there’s only the one variable to worry about. We have other symbols. One we use a lot in doing stuff with differentials looks for all the world like a fraction. We spend a lot of time in differential calculus explaining why it isn’t a fraction, and then in integral calculus we do some shady stuff that treats it like it is. But that’s $\frac{df}{dx}$. This also appears as $\frac{d}{dx} f$. If you encounter this do not cross out the d’s from the numerator and denominator. In this context the d’s are not numbers. They’re actually operators, which is what we call a function whose domain is functions. They’re notational quirks is all. Accept them and move on.

How do we calculate it? In Freshman Calculus we introduce it with this expression involving limits and fractions and delta-symbols (the Greek letter that looks like a triangle). We do that to scare off students who aren’t taking this stuff seriously. Also we use that formal proper definition to prove some simple rules work. Then we use those simple rules. Here’s some of them.

1. The derivative of something that doesn’t change is 0.
2. The derivative of xn, where n is any constant real number — positive, negative, whole, a fraction, rational, irrational, whatever — is n xn-1.
3. If f and g are both functions and have derivatives, then the derivative of the function f plus g is the derivative of f plus the derivative of g. This probably has some proper name but it’s used so much it kind of turns invisible. I dunno. I’d call it something about linearity because “linearity” is what happens when adding stuff works like adding numbers does.
4. If f and g are both functions and have derivatives, then the derivative of the function f times g is the derivative of f times the original g plus the original f times the derivative of g. This is called the Product Rule.
5. If f and g are both functions and have derivatives we might do something called composing them. That is, for a given x we find the value of g(x), and then we put that value in to f. That we write as f(g(x)). For example, “the sine of the square of x”. Well, the derivative of that with respect to x is the derivative of f evaluated at g(x) times the derivative of g. That is, f'(g(x)) times g'(x). This is called the Chain Rule and believe it or not this turns up all the time.
6. If f is a function and C is some constant number, then the derivative of C times f(x) is just C times the derivative of f(x), that is, C times f'(x). You don’t really need this as a separate rule. If you know the Product Rule and that first one about the derivatives of things that don’t change then this follows. But who wants to go through that many steps for a measly little result like this?
7. Calculus textbooks say there’s this Quotient Rule for when you divide f by g but you know what? The one time you’re going to need it it’s easier to work out by the Product Rule and the Chain Rule and your life is too short for the Quotient Rule. Srsly.
8. There’s some functions with special rules for the derivative. You can work them out from the Real And Proper Definition. Or you can work them out from infinitely-long polynomial approximations that somehow make sense in context. But what you really do is just remember the ones anyone actually uses. The derivative of ex is ex and we love it for that. That’s why e is that 2.71828(etc) number and not something else. The derivative of the natural log of x is 1/x. That’s what makes it natural. The derivative of the sine of x is the cosine of x. That’s if x is measured in radians, which it always is in calculus, because it makes that work. The derivative of the cosine of x is minus the sine of x. Again, radians. The derivative of the tangent of x is um the square of the secant of x? I dunno, look that sucker up. The derivative of the arctangent of x is $\frac{1}{1 + x^2}$ and you know what? The calculus book lists this in a table in the inside covers. Just use that. Arctangent. Sheesh. We just made up the “secant” and the “cosecant” to haze Freshman Calculus students anyway. We don’t get a lot of fun. Let us have this. Or we’ll make you hear of the inverse hyperbolic cosecant.

So. What’s this all mean for central force problems? Well, here’s what the effective potential energy V(r) usually looks like:

$V_{eff}(r) = C r^n + \frac{L^2}{2 m r^2}$

So, first thing. The variable here is ‘r’. The variable name doesn’t matter. It’s just whatever name is convenient and reminds us somehow why we’re talking about this number. We adapt our rules about derivatives with respect to ‘x’ to this new name by striking out ‘x’ wherever it appeared and writing in ‘r’ instead.

Second thing. C is a constant. So is n. So is L and so is m. 2 is also a constant, which you never think about until a mathematician brings it up. That second term is a little complicated in form. But what it means is “a constant, which happens to be the number $\frac{L^2}{2m}$, multiplied by r-2”.

So the derivative of Veff, with respect to r, is the derivative of the first term plus the derivative of the second. The derivatives of each of those terms is going to be some constant time r to another number. And that’s going to be:

$V'_{eff}(r) = C n r^{n - 1} - 2 \frac{L^2}{2 m r^3}$

And there you can divide the 2 in the numerator and the 2 in the denominator. So we could make this look a little simpler yet, but don’t worry about that.

OK. So that’s what you need to know about differential calculus to understand orbital mechanics. Not everything. Just what we need for the next bit.

## Reading the Comics, June 25, 2016: Busy Week Edition

I had meant to cut the Reading The Comics posts back to a reasonable one a week. Then came the 23rd, which had something like six hundred mathematically-themed comic strips. So I could post another impossibly long article on Sunday or split what I have. And splitting works better for my posting count, so, here we are.

Charles Brubaker’s Ask A Cat for the 19th is a soap-bubbles strip. As ever happens with comic strips, the cat blows bubbles that can’t happen without wireframes and skillful bubble-blowing artistry. It happens that a few days ago I linked to a couple essays showing off some magnificent surfaces that the right wireframe boundary might inspire. The mathematics describing how a soap bubbles’s shape should be made aren’t hard; I’m confident I could’ve understood the equations as an undergraduate. Finding exact solutions … I’m not sure I could have done. (I’d still want someone looking over my work if I did them today.) But numerical solutions, that I’d be confident in doing. And the real thing is available when you’re ready to get your hands … dirty … with soapy water.

Rick Stromoski’s Soup To Nutz for the 19th Shows RoyBoy on the brink of understanding symmetry. To lose at rock-paper-scissors is indeed just as hard as winning is. Suppose we replaced the names of the things thrown with letters. Suppose we replace ‘beats’ and ‘loses to’ with nonsense words. Then we could describe the game: A flobs B. B flobs C. C flobs A. A dostks C. C dostks B. B dostks A. There’s no way to tell, from this, whether A is rock or paper or scissors, or whether ‘flob’ or ‘dostk’ is a win.

Bill Whitehead’s Free Range for the 20th is the old joke about tipping being the hardest kind of mathematics to do. Proof? There’s an enormous blackboard full of symbols and the three guys in lab coats are still having trouble with it. I have long wondered why tips are used as the model of impossibly difficult things to compute that aren’t taxes. I suppose the idea of taking “fifteen percent” (or twenty, or whatever) of something suggests a need for precision. And it’ll be fifteen percent of a number chosen without any interest in making the calculation neat. So it looks like the worst possible kind of arithmetic problem. But the secret, of course, is that you don’t have to have “the” right answer. You just have to land anywhere in an acceptable range. You can work out a fraction — a sixth, a fifth, or so — of a number that’s close to the tab and you’ll be right. So, as ever, it’s important to know how to tell whether you have a correct answer before worrying about calculating it.

Allison Barrows’s Preeteena rerun for the 20th is your cheerleading geometry joke for this week.

I am sure Bill Holbrook’s On The Fastrack for the 22nd is not aimed at me. He hangs around Usenet group rec.arts.comics.strips some, as I do, and we’ve communicated a bit that way. But I can’t imagine he thinks of me much or even at all once he’s done with racs for the day. Anyway, Dethany does point out how a clear identity helps one communicate mathematics well. (Fi is to talk with elementary school girls about mathematics careers.) And bitterness is always a well-received pose. Me, I’m aware that my pop-mathematics brand identity is best described as “I guess he writes a couple things a week, doesn’t he?” and I could probably use some stronger hook, somewhere. I just don’t feel curmudgeonly most of the time.

Darby Conley’s Get Fuzzy rerun for the 22nd is about arithmetic as a way to be obscure. We’ve all been there. I had, at first, read Bucky’s rating as “out of 178 1/3 π” and thought, well, that’s not too bad since one-third of π is pretty close to 1. But then, Conley being a normal person, probably meant “one-hundred seventy-eight and a third”, and π times that is a mess. Well, it’s somewhere around 550 or so. Octave tells me it’s more like 560.251 and so on.

## A Leap Day 2016 Mathematics A To Z: Orthonormal

Jacob Kanev had requested “orthogonal” for this glossary. I’d be happy to oblige. But I used the word in last summer’s Mathematics A To Z. And I admit I’m tempted to just reprint that essay, since it would save some needed time. But I can do something more.

## Orthonormal.

“Orthogonal” is another word for “perpendicular”. Mathematicians use it for reasons I’m not precisely sure of. My belief is that it’s because “perpendicular” sounds like we’re talking about directions. And we want to extend the idea to things that aren’t necessarily directions. As majors, mathematicians learn orthogonality for vectors, things pointing in different directions. Then we extend it to other ideas. To functions, particularly, but we can also define it for spaces and for other stuff.

I was vague, last summer, about how we do that. We do it by creating a function called the “inner product”. That takes in two of whatever things we’re measuring and gives us a real number. If the inner product of two things is zero, then the two things are orthogonal.

The first example mathematics majors learn of this, before they even hear the words “inner product”, are dot products. These are for vectors, ordered sets of numbers. The dot product we find by matching up numbers in the corresponding slots for the two vectors, multiplying them together, and then adding up the products. For example. Give me the vector with values (1, 2, 3), and the other vector with values (-6, 5, -4). The inner product will be 1 times -6 (which is -6) plus 2 times 5 (which is 10) plus 3 times -4 (which is -12). So that’s -6 + 10 – 12 or -8.

So those vectors aren’t orthogonal. But how about the vectors (1, -1, 0) and (0, 0, 1)? Their dot product is 1 times 0 (which is 0) plus -1 times 0 (which is 0) plus 0 times 1 (which is 0). The vectors are perpendicular. And if you tried drawing this you’d see, yeah, they are. The first vector we’d draw as being inside a flat plane, and the second vector as pointing up, through that plane, like a thumbtack.

Well … the inner product can tell us something besides orthogonality. What happens if we take the inner product of a vector with itself? Say, (1, 2, 3) with itself? That’s going to be 1 times 1 (which is 1) plus 2 times 2 (4, according to rumor) plus 3 times 3 (which is 9). That’s 14, a tidy sum, although, so what?

The inner product of (-6, 5, -4) with itself? Oh, that’s some ugly numbers. Let’s skip it. How about the inner product of (1, -1, 0) with itself? That’ll be 1 times 1 (which is 1) plus -1 times -1 (which is positive 1) plus 0 times 0 (which is 0). That adds up to 2. And now, wait a minute. This might be something.

Start from somewhere. Move 1 unit to the east. (Don’t care what the unit is. Inches, kilometers, astronomical units, anything.) Then move -1 units to the north, or like normal people would say, 1 unit o the south. How far are you from the starting point? … Well, you’re the square root of 2 units away.

Now imagine starting from somewhere and moving 1 unit east, and then 2 units north, and then 3 units straight up, because you found a convenient elevator. How far are you from the starting point? This may take a moment of fiddling around with the Pythagorean theorem. But you’re the square root of 14 units away.

And what the heck, (0, 0, 1). The inner product of that with itself is 0 times 0 (which is zero) plus 0 times 0 (still zero) plus 1 times 1 (which is 1). That adds up to 1. And, yeah, if we go one unit straight up, we’re one unit away from where we started.

The inner product of a vector with itself gives us the square of the vector’s length. At least if we aren’t using some freak definition of inner products and lengths and vectors. And this is great! It means we can talk about the length — maybe better to say the size — of things that maybe don’t have obvious sizes.

Some stuff will have convenient sizes. For example, they’ll have size 1. The vector (0, 0, 1) was one such. So is (1, 0, 0). And you can think of another example easily. Yes, it’s $\left(\frac{1}{\sqrt{2}}, -\frac{1}{2}, \frac{1}{2}\right)$. (Go ahead, check!)

So by “orthonormal” we mean a collection of things that are orthogonal to each other, and that themselves are all of size 1. It’s a description of both what things are by themselves and how they relate to one another. A thing can’t be orthonormal by itself, for the same reason a line can’t be perpendicular to nothing in particular. But a pair of things might be orthogonal, and they might be the right length to be orthonormal too.

Why do this? Well, the same reasons we always do this. We can impose something like direction onto a problem. We might be able to break up a problem into simpler problems, one in each direction. We might at least be able to simplify the ways different directions are entangled. We might be able to write a problem’s solution as the sum of solutions to a standard set of representative simple problems. This one turns up all the time. And an orthogonal set of something is often a really good choice of a standard set of representative problems.

This sort of thing turns up a lot when solving differential equations. And those often turn up when we want to describe things that happen in the real world. So a good number of mathematicians develop a habit of looking for orthonormal sets.

## The Set Tour, Part 7: Matrices

I feel a bit odd about this week’s guest in the Set Tour. I’ve been mostly concentrating on sets that get used as the domains or ranges for functions a lot. The ones I want to talk about here don’t tend to serve the role of domain or range. But they are used a great deal in some interesting functions. So I loosen my rule about what to talk about.

## Rm x n and Cm x n

Rm x n might explain itself by this point. If it doesn’t, then this may help: the “x” here is the multiplication symbol. “m” and “n” are positive whole numbers. They might be the same number; they might be different. So, are we done here?

Maybe not quite. I was fibbing a little when I said “x” was the multiplication symbol. R2 x 3 is not a longer way of saying R6, an ordered collection of six real-valued numbers. The x does represent a kind of product, though. What we mean by R2 x 3 is an ordered collection, two rows by three columns, of real-valued numbers. Say the “x” here aloud as “by” and you’re pronouncing it correctly.

What we get is called a “matrix”. If we put into it only real-valued numbers, it’s a “real matrix”, or a “matrix of reals”. Sometimes mathematical terminology isn’t so hard to follow. Just as with vectors, Rn, it matters just how the numbers are organized. R2 x 3 means something completely different from what R3 x 2 means. And swapping which positions the numbers in the matrix occupy changes what matrix we have, as you might expect.

You can add together matrices, exactly as you can add together vectors. The same rules even apply. You can only add together two matrices of the same size. They have to have the same number of rows and the same number of columns. You add them by adding together the numbers in the corresponding slots. It’s exactly what you would do if you went in without preconceptions.

You can also multiply a matrix by a single number. We called this scalar multiplication back when we were working with vectors. With matrices, we call this scalar multiplication. If it strikes you that we could see vectors as a kind of matrix, yes, we can. Sometimes that’s wise. We can see a vector as a matrix in the set R1 x n or as one in the set Rn x 1, depending on just what we mean to do.

It’s trickier to multiply two matrices together. As with vectors multiplying the numbers in corresponding positions together doesn’t give us anything. What we do instead is a time-consuming but not actually hard process. But according to its rules, something in Rm x n we can multiply by something in Rn x k. “k” is another whole number. The second thing has to have exactly as many rows as the first thing has columns. What we get is a matrix in Rm x k.

I grant you maybe didn’t see that coming. Also a potential complication: if you can multiply something in Rm x n by something in Rn x k, can you multiply the thing in Rn x k by the thing in Rm x n? … No, not unless k and m are the same number. Even if they are, you can’t count on getting the same product. Matrices are weird things this way. They’re also gateways to weirder things. But it is a productive weirdness, and I’ll explain why in a few paragraphs.

A matrix is a way of organizing terms. Those terms can be anything. Real matrices are surely the most common kind of matrix, at least in mathematical usage. Next in common use would be complex-valued matrices, much like how we get complex-valued vectors. These are written Cm x n. A complex-valued matrix is different from a real-valued matrix. The terms inside the matrix can be complex-valued numbers, instead of real-valued numbers. Again, sometimes, these mathematical terms aren’t so tricky.

I’ve heard occasionally of people organizing matrices of other sets. The notation is similar. If you’re building a matrix of “m” rows and “n” columns out of the things you find inside a set we’ll call H, then you write that as Hm x n. I’m not saying you should do this, just that if you need to, that’s how to tell people what you’re doing.

Now. We don’t really have a lot of functions that use matrices as domains, and I can think of fewer that use matrices as ranges. There are a couple of valuable ones, ones so valuable they get special names like “eigenvalue” and “eigenvector”. (Don’t worry about what those are.) They take in Rm x n or Cm x n and return a set of real- or complex-valued numbers, or real- or complex-valued vectors. Not even those, actually. Eigenvectors and eigenfunctions are only meaningful if there are exactly as many rows as columns. That is, for Rm x m and Cm x m. These are known as “square” matrices, just as you might guess if you were shaken awake and ordered to say what you guessed a “square matrix” might be.

They’re important functions. There are some other important functions, with names like “rank” and “condition number” and the like. But they’re not many. I believe they’re not even thought of as functions, any more than we think of “the length of a vector” as primarily a function. They’re just properties of these matrices, that’s all.

So why are they worth knowing? Besides the joy that comes of knowing something, I mean?

Here’s one answer, and the one that I find most compelling. There is cultural bias in this: I come from an applications-heavy mathematical heritage. We like differential equations, which study how stuff changes in time and in space. It’s very easy to go from differential equations to ordered sets of equations. The first equation may describe how the position of particle 1 changes in time. It might describe how the velocity of the fluid moving past point 1 changes in time. It might describe how the temperature measured by sensor 1 changes as it moves. It doesn’t matter. We get a set of these equations together and we have a majestic set of differential equations.

Now, the dirty little secret of differential equations: we can’t solve them. Most interesting physical phenomena are nonlinear. Linear stuff is easy. Small change 1 has effect A; small change 2 has effect B. If we make small change 1 and small change 2 together, this has effect A plus B. Nonlinear stuff, though … it just doesn’t work. Small change 1 has effect A; small change 2 has effect B. Small change 1 and small change 2 together has effect … A plus B plus some weird A times B thing plus some effect C that nobody saw coming and then C does something with A and B and now maybe we’d best hide.

There are some nonlinear differential equations we can solve. Those are the result of heroic work and brilliant insights. Compared to all the things we would like to solve there’s not many of them. Methods to solve nonlinear differential equations are as precious as ways to slay krakens.

But here’s what we can do. What we usually like to know about in systems are equilibriums. Those are the conditions in which the system stops changing. Those are interesting. We can usually find those points by boring but not conceptually challenging calculations. If we can’t, we can declare x0 represents the equilibrium. If we still care, we leave calculating its actual values to the interested reader or hungry grad student.

But what’s really interesting is: what happens if we’re near but not exactly at the equilibrium? Sometimes, we stay near it. Think of pushing a swing. However good a push you give, it’s going to settle back to the boring old equilibrium of dangling straight down. Sometimes, we go racing away from it. Think of trying to balance a pencil on its tip; if we did this perfectly it would stay balanced. It never does. We’re never perfect, or there’s some wind or somebody walks by and the perfect balance is foiled. It falls down and doesn’t bounce back up. Sometimes, whether it it stays near or goes away depends on what way it’s away from the equilibrium.

And now we finally get back to matrices. Suppose we are starting out near an equilibrium. We can, usually, approximate the differential equations that describe what will happen. The approximation may only be good if we’re just a tiny bit away from the equilibrium, but that might be all we really want to know. That approximation will be some linear differential equations. (If they’re not, then we’re just wasting our time.) And that system of linear differential equations we can describe using matrices.

If we can write what we are interested in as a set of linear differential equations, then we have won. We can use the many powerful tools of matrix arithmetic — linear algebra, specifically — to tell us everything we want to know about the system. We can say whether a small push away from the equilibrium stays small, or whether it grows, or whether it depends. We can say how fast the small push shrinks, or grows (for a while). We can say how the system will change, approximately.

This is what I love in matrices. It’s not everything there is to them. But it’s enough to make matrices important to me.

## Reading the Comics, July 29, 2015: Not Entirely Reruns Edition

Zach Weinersmith’s Saturday Morning Breakfast Cereal (July 25) gets its scheduled appearance here with a properly formed Venn Diagram joke. I’m unqualified to speak for rap musicians. When mathematicians speak of something being “for reals” they mean they’re speaking about a variable that might be any of the real numbers. This is as opposed to limiting the variable to being some rational or irrational number, or being a whole number. It’s also as opposed to letting the variable be some complex-valued number, or some more exotic kind of number. It’s a way of saying what kind of thing we want to find true statements about.

I don’t know when the Saturday Morning Breakfast Cereal first ran, but I know I’ve seen it appear in my Twitter feed. I believe all the Gocomics.com postings of this strip are reruns, but I haven’t read the strip long enough to say.

Steve Sicula’s Home And Away (July 26) is built on the joke of kids wise to mathematics during summer vacation. I don’t think this is a rerun, although we’ve seen the joke this summer before.

Daniel Beyer’s Offbeat Comics (July 27) depicts an angel with a square halo because “I was good2.” The association between squaring a number and squares goes back a long time. Well, it’s right there in the name, isn’t it? Florian Cajori’s A History Of Mathematical Notations cites the term “latus” and the abbreviation “l” to represent the side of a square being used by the Roman surveyor Junius Nipsus in the second century; for centuries this would be as good a term as anyone had for the thing to be calculated. (Res, meaning “thing”, was also popular.) Once you’ve taken the idea of calculating based on the length of a square, the jump to “square” for “length times itself” seems like a tiny one. But Cajori doesn’t seem to have examples of that being written until the 16th century.

The square of the quantity you’re interested in might be written q, for quadratus. The cube would be c, for cubus. The fourth power would be b or bq, for biquadratus, and so on. This is tolerable if you only have to work with a single unknown quantity, but the notation turns into gibberish the moment you want two variables in the mix. So it collapsed in the 17th century, replaced by the familiar x2 and x3 and so on. Many authors developed notations close to this: James Hume would write xii or xiii; Pierre Hérigone x2 or x3, all in one line. Rene Descartes would write x2 or x3 or so, and many, many followed him. Still, quite a few people — including Rene Descartes, Isaac Newton, and even as late a figure as Carl Gauss, in the early 19th century — would resist “x2”. They’d prefer “xx”. Gauss defended this on the grounds that “x2” takes up just as much space as “xx” and so fails the biggest point of having notation.

Corey Pandolph’s Toby, Robot Satan (July 27, rerun) uses sudoku as an example of the logic and reasoning problems that one would expect a robot should be able to do. It is weird to encounter one that’s helpless before them.

Cory Thomas’s Watch Your Head (July 27, rerun from 2007) mentions “Chebyshev grids” and “infinite boundaries” as things someone doing mathematics on the computer would do. And it does so correctly. Differential equations describe how things change on some domain over space and time. They can be very hard to solve exactly, but can be put on the computer very well. For this, we pick a representative set of points which we call a mesh. And we find an approximate representation of the original differential question, which we call a discretization or a difference equation. We can then solve this difference equation on the mesh, and if we’ve done our work right, this approximation will let us get a good estimate for the solution to the original problem over the whole original domain.

A Chebyshev grid is a particular arrangement of mesh points. It’s not uniform; it tends to clump up, becoming more common near the ends of the boundary. This is useful if you have reason to expect that the boundaries are more interesting than the middle of the domain. There’s no sense wasting good computing power calculating boring stuff. The mesh is named for Pafnuty Chebyshev, a 19th Century Russian mathematician whose name is all over mathematics. Unfortunately since he was a 19th Century Russian mathematician, his name is transcribed into English all sorts of ways. Chebyshev seems to be most common today, though Tchebychev used to be quite popular, which is why polynomials of his might be abbreviated as T. There are many alternatives.

Ah, but how do you represent infinite boundaries with the finitely many points of any calculatable mesh? There are many approaches. One is to just draw a really wide mesh and trust that all the action is happening near the center so omitting the very farthest things doesn’t hurt too much. Or you might figure what the average of things far away is, and make a finite boundary that has whatever that value is. Another approach is to make the boundaries repeating: go far enough to the right and you loop back around to the left, go far enough up and you loop back around to down. Another approach is to create a mesh that is bundled up tight around the center, but that has points which do represent going off very, very far, maybe in principle infinitely far away. You’re allowed to create meshes that don’t space points uniformly, and that even move points as you compute. That’s harder work, but it’s legitimate numerical mathematics.

So, the mathematical work being described here is — so far as described — legitimate. I’m not competent to speak about the monkey side of the research.

Greg Evans’s Luann Againn (July 29; rerun from July 29, 1987) name-drops the Law of Averages. There are actually multiple Laws of Averages, with slightly different assumptions and implications, but they all come to about the same meaning. You can expect that if some experiment is run repeatedly, the average value of the experiments will be close to the true value of whatever you’re measuring. An important step in proving this law was done by Pafnuty Chebyshev.

## Into.

The definition of “into” will call back to my A to Z piece on “bijections”. It particularly call on what mathematicians mean by a function. When a mathematician talks about a functions she means a combination three components. The first is a set called the domain. The second is a set called the range. The last is a rule that matches up things in the domain to things in the range.

We said the function was “onto” if absolutely everything which was in the range got used. That is, if everything in the range has at least one thing in the domain that the rule matches to it. The function that has domain of -3 to 3, and range of -27 to 27, and the rule that matches a number x in the domain to the number x3 in the range is “onto”.

## A Summer 2015 Mathematics A To Z: ansatz

Sue Archer at the Doorway Between Worlds blog recently completed an A to Z challenge. I decided to follow her model and challenge and intend to do a little tour of some mathematical terms through the alphabet. My intent is to focus on some that are interesting terms of art that I feel non-mathematicians never hear. Or that they never hear clearly. Indeed, my first example is one I’m not sure I ever heard clearly described.

## Ansatz.

I first encountered this term in grad school. I can’t tell you when. I just realized that every couple sessions in differential equations the professor mentioned the ansatz for this problem. By then it felt too late to ask what it was I’d missed. In hindsight I’m not sure the professor ever made it clear. My research suggests the word is still a dialect rather than part of the universal language of mathematicians, and that it isn’t quite precisely defined.

What a mathematician means by the “ansatz” is the collection of ideas that go into solving a problem. This may be an assumption of what the solution should look like. This might be the assumptions of physical or mathematical properties a solution has to have. This might be a listing of properties that a valid solution would have to have. It could be the set of things you judge should be included, or ignored, in constructing a mathematical model of something. In short the ansatz is the set of possibly ad hoc assumptions you have to bring to a topic to make it something answerable. It’s different from the axioms of the field or the postulates for a problem. An axiom or postulate is assumed to be true by definition. The ansatz is a bunch of ideas we suppose are true because they seem likely to bring us to a solution.

An ansatz is good for getting an answer. It doesn’t do anything to verify that the answer means anything, though. The ansatz contains assumptions you the mathematician brought to the problem. You need to argue that the assumptions are reasonable, and reflect the actual problem you’re studying. You also should prove that the answer ultimately derived matches the actual behavior of whatever you were studying. Validating a solution can be the hardest part of mathematics, other than all the other parts of mathematics.

## What is Physics all about?

Over on the Reading Penrose blog, Jean Louis Van Belle (and I apologize if I’ve got the name capitalized or punctuated wrong but I couldn’t find the author’s name except in a run-together, uncapitalized form) is trying to understand Roger Penrose’s Road to Reality, about the various laws of physics as we understand them. In the entry for the 6th of December, “Ordinary Differential equations (II)”, he gets to the question “What’s Physics All About?” and comes to what I have to agree is the harsh fact: a lot of physics is about solving differential equations.

Some of them are ordinary differential equations, some of them are partial differential equations, but really, a lot of it is differential equations. Some of it is setting up models for differential equations. Here, though, he looks at a number of ordinary differential equations and how they can be classified. The post is a bit cryptic — he intends the blog to be his working notes while he reads a challenging book — but I think it’s still worth recommending as a quick tour through some of the most common, physics-relevant, kinds of ordinary differential equation.

## George Boole’s Birthday

The Maths History feed on Twitter reminded me that the second of November is the birthday of George Boole, one of a handful of people who’s managed to get a critically important computer data type named for him (others, of course, include Arthur von Integer and the Lady Annabelle String). Reminded is the wrong word; actually, I didn’t have any idea when his birthday was, other than that it was in the first half of the 19th century. To that extent I was right (it was 1815).

He’s famous, to the extent anyone in mathematics who isn’t Newton or Leibniz is, for his work in logic. “Boolean algebra” is even almost the default term for the kind of reasoning done on variables that may have either of exactly two possible values, which match neatly to the idea of propositions being either true or false. He’d also publicized how neatly the study of logic and the manipulation of algebraic symbols could parallel one another, which is a familiar enough notion that it takes some imagination to realize that it isn’t obviously so.

Boole also did work on linear differential equations, which are important because differential equations are nearly inevitably the way one describes a system in which the current state of the system affects how it is going to change, and linear differential equations are nearly the only kinds of differential equations that can actually be exactly solved. (There are some nonlinear differential equations that can be solved, but more commonly, we’ll find a linear differential equation that’s close enough to the original. Many nonlinear differential equations can also be approximately solved numerically, but that’s also quite difficult.)

His MacTutor History of Mathematics biography notes that Boole (when young) spent five years trying to teach himself differential and integral calculus — money just didn’t allow for him to attend school or hire a tutor — although given that he was, before the age of fourteen, able to teach himself ancient Greek I can certainly understand his supposition that he just needed the right books and some hard work. Apparently, at age fourteen he translated a poem by Meleager — I assume the poet from the first century BCE, though MacTutor doesn’t specify; there was also a Meleager who was briefly king of Macedon in 279 BCE, and another some decades before that who was a general serving Alexander the Great — so well that when it was published a local schoolmaster argued that a 14-year-old could not possibly have done that translation. He’d also, something I didn’t know until today, married Mary Everest, niece of the fellow whose name is on that tall mountain.

## From ElKement: On The Relation Of Jurassic Park and Alien Jelly Flowing Through Hyperspace

I’m frightfully late on following up on this, but ElKement has another entry in the series regarding quantum field theory, this one engagingly titled “On The Relation Of Jurassic Park and Alien Jelly Flowing Through Hyperspace”. The objective is to introduce the concept of phase space, a way of looking at physics problems that marks maybe the biggest thing one really needs to understand if one wants to be not just a physics major (or, for many parts of the field, a mathematics major) and a grad student.

As an undergraduate, it’s easy to get all sorts of problems in which, to pick an example, one models a damped harmonic oscillator. A good example of this is how one models the way a car bounces up and down after it goes over a bump, when the shock absorbers are working. You as a student are given some physical properties — how easily the car bounces, how well the shock absorbers soak up bounces — and how the first bounce went — how far the car bounced upward, how quickly it started going upward — and then work out from that what the motion will be ever after. It’s a bit of calculus and you might do it analytically, working out a complicated formula, or you might do it numerically, letting one of many different computer programs do the work and probably draw a picture showing what happens. That’s shown in class, and then for homework you do a couple problems just like that but with different numbers, and for the exam you get another one yet, and one more might turn up on the final exam.

## On exact and inexact differentials

The CarnotCycle blog recently posted a nice little article titled “On Exact And Inexact Differentials” and I’m bringing it to people’s attention because its the sort of thing which would have been extremely useful to me at a time when I was reading calculus-heavy texts that just assumed you knew what exact differentials were, without being aware that you probably missed the day in intro differential equations when they were explained. (That was by far my worst performance in a class. I have no excuse.)

So this isn’t going to be the most accessible article you run across on my blog here, until I finish making the switch to a full-on advanced statistical mechanics course. But if you start getting into, particularly, thermodynamics and wonder where this particular and slightly funky string of symbols comes from, this is a nice little warmup. For extra help, CarnotCycle also explains what makes something an inexact differential.

From the search term phrases that show up on this blog’s stats, CarnotCycle detects that a significant segment of visitors are studying foundation level thermodynamics  at colleges and universities around the world. So what better than a post that tackles that favorite test topic – exact and inexact differentials.

When I was an undergraduate, back in the time of Noah, we were first taught the visual approach to these things. Later we dispensed with diagrams and got our answers purely through the operations of calculus, but either approach is equally instructive. CarnotCycle herewith presents them both.

– – – –

The visual approach

Ok, let’s start off down the visual track by contemplating the following pair of pressure-volume diagrams:

The points A and B have identical coordinates on both diagrams, with A and B respectively representing the initial and final states of a closed PVT system, such as an…

View original post 782 more words

## Reading the Comics, September 11, 2012

Since the last installment of these mathematics-themed comic strips there’s been a steady drizzle of syndicated comics touching on something mathematical. This probably reflects the back-to-school interests that are naturally going to interest the people drawing either Precocious Children strips or Three Generations And A Dog strips.

## Why The Slope Is Too Interesting

After we have the intercept, the other thing we need is the slope. This is a very easy thing to start calculating and it’s extremely testable, but the idea weaves its way deep into all mathematics. It’s got an obvious physical interpretation. Imagine the x-coordinates are how far we are from some reference point in a horizontal direction, and the y-coordinates are how far we are from some reference point in the vertical direction. Then the slope is just the grade of the line: how much we move up or down for a given movement forward or back. It’s easy to calculate, it’s kind of obvious, so here’s what’s neat about it.

## Did King George III pay too little for astronomers or too much for tea?

In the opening pages of his 1998 biography George III: A Personal History, Christopher Hibbert tosses a remarkable statement into a footnote just after describing the allowance of Frederick, Prince of Wales, at George III’s birth:

Because of the fluctuating rate of inflation and other reasons it is not really practicable to translate eighteen-century sums into present-day equivalents. Multiplying the figures in this book by about sixty should give a very rough guide for the years before 1793. For the years of war between 1793 and 1815 the reader should multiply by about thirty, and thereafter by about forty.

“Not really practical” is wonderful understatement: it’s barely possible to compare the prices of things today to those of a half-century ago, and the modern economy at least existed in cartoon back then. I could conceivably have been paid for programming computers back then, but it would be harder for me to get into the field. To go back 250 years — before electricity, mass markets, public education, mass production, general incorporation laws, and nearly every form of transportation not muscle or wind-powered — and try to compare prices is nonsense. We may as well ask how many haikus it would take to tell Homer’s Odyssey, or how many limericks Ovid’s Metamorphoses would be.
Continue reading “Did King George III pay too little for astronomers or too much for tea?”