From my First A-to-Z: Z-transform


Back in the day I taught in a Computational Science department, which threw me into exciting and new-to-me subjects more than once. One quite fun semester I was learning, and teaching, signal processing. This set me up for the triumphant conclusion of my first A-to-Z.

One of the things you can see in my style is mentioning the connotations implied by whether one uses x or z as a variable. Any letter will do, for the use it’s put to. But to use the name ‘z’ suggests an openness to something that ‘x’ doesn’t.

There’s a mention here about stability in algorithms, and the note that we can process data in ways that are stable or are unstable. I don’t mention why one would want or not want stability. Wanting stability hardly seems to need explaining; isn’t that the good option? And, often, yes, we want stable systems because they correct and wipe away error. But there are reasons we might want instability, or at least less stability. Too stable a system will obscure weak trends, or the starts of trends. Your weight flutters day by day in ways that don’t mean much, which is why it’s better to consider a seven-day average. If you took instead a 700-day running average, these meaningless fluctuations would be invisible. But you also would take a year or more to notice whether you were losing or gaining weight. That’s one of the things stability costs.


z-transform.

The z-transform comes to us from signal processing. The signal we take to be a sequence of numbers, all representing something sampled at uniformly spaced times. The temperature at noon. The power being used, second-by-second. The number of customers in the store, once a month. Anything. The sequence of numbers we take to stretch back into the infinitely great past, and to stretch forward into the infinitely distant future. If it doesn’t, then we pad the sequence with zeroes, or some other safe number that we know means “nothing”. (That’s another classic mathematician’s trick.)

It’s convenient to have a name for this sequence. “a” is a good one. The different sampled values are denoted by an index. a_0 represents whatever value we have at the “start” of the sample. That might represent the present. That might represent where sampling began. That might represent just some convenient reference point. It’s the equivalent of mileage marker zero; we have to have something be the start.

a_1, a_2, a_3, and so on are the first, second, third, and so on samples after the reference start. a_{-1}, a_{-2}, a_{-3}, and so on are the first, second, third, and so on samples from before the reference start. That might be the last couple of values before the present.

So for example, suppose the temperatures the last several days were 77, 81, 84, 82, and 78. Then we would probably represent this as a_{-4} = 77, a_{-3} = 81, a_{-2} = 84, a_{-1} = 82, a_0 = 78. We’ll hope this is Fahrenheit or that we are remotely sensing a temperature.

The z-transform of a sequence of numbers is something that looks a lot like a polynomial, based on these numbers. For this five-day temperature sequence the z-transform would be the polynomial 77 z^4 + 81 z^3 + 84 z^2 + 82 z^1 + 78 z^0 . (z^1 is the same as z. z^0 is the same as the number “1”. I wrote it this way to make the pattern more clear.)

I would not be surprised if you protested that this doesn’t merely look like a polynomial but actually is one. You’re right, of course, for this set, where all our samples are from negative (and zero) indices. If we had positive indices then we’d lose the right to call the transform a polynomial. Suppose we trust our weather forecaster completely, and add in a_1 = 83 and a_2 = 76. Then the z-transform for this set of data would be 77 z^4 + 81 z^3 + 84 z^2 + 82 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2 . You’d probably agree that’s not a polynomial, although it looks a lot like one.
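If you’d like to experiment with this, here is a minimal sketch of my own, in Python. The function name and the index-to-value dictionary are my choices; it evaluates the transform at whatever z you like:

```python
# Evaluate the z-transform of a finite sample set at a point z.
# Sample index n contributes a_n * z^(-n), so negative indices, the past,
# give positive powers of z, matching the polynomial above.

def z_transform(samples, z):
    return sum(a_n * z ** (-n) for n, a_n in samples.items())

temps = {-4: 77, -3: 81, -2: 84, -1: 82, 0: 78, 1: 83, 2: 76}
print(z_transform(temps, 2.0))   # evaluated at the real point z = 2
print(z_transform(temps, 1j))    # or at a complex-valued point, z = i
```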

The use of z for these polynomials is basically arbitrary. The main reason to use z instead of x is that we can learn interesting things if we imagine letting z be a complex-valued number. And z carries connotations of “a possibly complex-valued number”, especially if it’s used in ways that suggest we aren’t looking at coordinates in space. It’s not that there’s anything in the symbol x that refuses the possibility of it being complex-valued. It’s just that z appears so often in the study of complex-valued numbers that it reminds a mathematician to think of them.

A sound question you might have is: why do this? And, I grant, there’s not much advantage in going from a list of temperatures “77, 81, 84, 82, 78, 83, 76” over to a polynomial-like expression 77 z^4 + 81 z^3 + 84 z^2 + 82 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2 .

Where this starts to get useful is when we have an infinitely long sequence of numbers to work with. Yes, it does too. It will often turn out that an interesting sequence transforms into a polynomial that itself is equivalent to some easy-to-work-with function. My little temperature example there won’t do it, no. But consider the sequence that’s zero for all negative indices, and 1 for the zero index and all positive indices. This gives us the polynomial-like structure \cdots + 0z^2 + 0z^1 + 1 + 1\left(\frac{1}{z}\right)^1 + 1\left(\frac{1}{z}\right)^2 + 1\left(\frac{1}{z}\right)^3 + 1\left(\frac{1}{z}\right)^4 + \cdots . And that turns out to be the same as 1 \div \left(1 - \left(\frac{1}{z}\right)\right) . That’s much shorter to write down, at least.
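If you’d like to see that identity in action, here’s a little numeric check of my own. Pick a sample point with |z| bigger than 1, where the sum converges, and compare the partial sums to the closed form:

```python
# Numeric check: for |z| > 1 the sum of z^(-n), n = 0, 1, 2, ...,
# converges to 1/(1 - 1/z).

z = 1.5                                      # any point with |z| > 1 will do
partial = sum(z ** (-n) for n in range(200))
closed_form = 1 / (1 - 1 / z)
print(partial, closed_form)                  # both print 3.0 (to rounding)
```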

Probably you’ll grant that, but still wonder what the point of doing that is. Remember that we started by thinking of signal processing. A processed signal is a matter of transforming your initial signal. By this we mean multiplying your original signal by something, or adding something to it. For example, suppose we want a five-day running average temperature. This we can find by taking one-fifth today’s temperature, a_0, and adding to that one-fifth of yesterday’s temperature, a_{-1}, and one-fifth of the day before’s temperature, a_{-2}, and one-fifth of a_{-3}, and one-fifth of a_{-4}.
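Here’s a minimal sketch of that averaging, with function names of my own choosing. The running average is the filter whose z-transform is one-fifth of z^0 + (1/z)^1 + (1/z)^2 + (1/z)^3 + (1/z)^4:

```python
# A five-sample running average: each output is the mean of the current
# sample and the four before it.

def running_average(signal, width=5):
    return [sum(signal[i - width + 1:i + 1]) / width
            for i in range(width - 1, len(signal))]

temps = [77, 81, 84, 82, 78, 83, 76]
print(running_average(temps))   # [80.4, 81.6, 80.6]
```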

The effect of processing a signal is equivalent to manipulating its z-transform. By studying properties of the z-transform, such as where its values are zero or where they are imaginary or where they are undefined, we learn things about what the processing is like. We can tell whether the processing is stable — does it keep a small error in the original signal small, or does it magnify it? Does it serve to amplify parts of the signal and not others? Does it dampen unwanted parts of the signal while keeping the main signal intact?
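Here’s a toy version of that stability question, a sketch of my own. Suppose the processing is the simple recursive filter y[n] = c·y[n-1] + x[n]. Its z-transform is 1/(1 - c/z), which is undefined — has a pole — at z = c. Whether that pole sits inside or outside the circle of radius 1 decides whether a small error dies away or blows up:

```python
# A hypothetical first-order filter: y[n] = c*y[n-1] + x[n].
# Its z-transform, 1/(1 - c/z), has a pole at z = c. Feed the filter one
# unit impulse of "error" and watch what the pole's position does to it.

def impulse_response(c, steps=40):
    y, out = 0.0, []
    for n in range(steps):
        y = c * y + (1.0 if n == 0 else 0.0)
        out.append(y)
    return out

print(impulse_response(0.9)[-1])   # about 0.016: pole inside, error fades
print(impulse_response(1.1)[-1])   # about 41: pole outside, error explodes
```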

We can understand how data will be changed by understanding the z-transform of the way we manipulate it. That z-transform turns a signal-processing idea into a complex-valued function. And we have a lot of tools for studying complex-valued functions. So we become able to say a lot about the processing. And that is what the z-transform gets us.

Why not make an iceberg?


A post on Mathstodon made me aware there’s a bit of talk about iceberg shapes. Particularly that one of the iconic photographs of an iceberg above-and-below water is an imaginative work. A real iceberg wouldn’t be stable in that orientation. Which, I’ll admit, isn’t something I had thought about. I also hadn’t thought about the photography challenge of getting a clear picture of something in sunlight and in water at once. There was a lot I hadn’t thought about. In my defense, I spend a lot of time noticing when comic strips have a character complain about the New Math.

But this all leads me to a fun little play tool: Iceberger, designed to let you sketch in a potential iceberg and see what it does. Often, that’s roll over to a more stable orientation. It’s fun to play with, and to watch shapes tilt over, gradually or rapidly. And playing with it may help one develop a sense for what kinds of shapes should be stable in water, and what kinds should not.

My 2019 Mathematics A To Z: Versine


Jacob Siehler suggested the term for today’s A to Z essay. The letter V turned up a great crop of subjects: velocity, suggested by Dina Yagodich, and variable, from goldenoj, were fine candidates too. But Siehler offered something almost designed to appeal to me: an obscure function that shone in the days before electronic computers could do work for us. There was no chance of my resisting.

Cartoony banner illustration of a coati, a raccoon-like animal, flying a kite in the clear autumn sky. A skywriting plane has written 'MATHEMATIC A TO Z'; the kite, with the letter 'S' on it to make the word 'MATHEMATICS'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Versine.

A story about the comeuppance of a know-it-all who was not me. It was in mathematics class in high school. The teacher was explaining logic, and showing off diagrams. These diagrams would compute propositions, connecting symbols that meant logical AND and OR and NOT and so on. One of the students pointed out, you know, the only symbol you actually need is NAND. The teacher nodded; this was so. By the clever arrangement of enough NAND operations you could get the result of all the standard logic operations. He said he’d wait while the know-it-all tried it for any realistic problem. If we are able to do NAND we can construct an XOR. But we will understand what we are trying to do more clearly if we have an XOR in the kit.
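If you’d like to see the construction the know-it-all was gesturing at, here’s a sketch in Python. This four-NAND arrangement is the standard one:

```python
# Building XOR out of nothing but NAND, the standard four-gate construction.

def NAND(a, b):
    return not (a and b)

def XOR(a, b):
    c = NAND(a, b)
    return NAND(NAND(a, c), NAND(b, c))

for a in (False, True):
    for b in (False, True):
        print(a, b, XOR(a, b))   # True exactly when a and b differ
```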

So the versine. It’s a (spherical) trigonometric function. The versine of an angle is the same value as 1 minus the cosine of the angle. This seems like a confused name; shouldn’t something called “versine” have, you know, a sine in its rule? Sure, and if you don’t like that 1-minus-the-cosine thing, you can instead use this: the versine of an angle is two times the square of the sine of half the angle. There is a vercosine as well, if you were wondering. The vercosine is two times the square of the cosine of half the angle. That’s also equal to 1 plus the cosine of the angle.
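If you’d rather check those identities than take my word, here’s a quick numeric sketch; the angle is an arbitrary choice:

```python
# Checking: versin(t) = 1 - cos(t) = 2 sin^2(t/2), and
#           vercos(t) = 1 + cos(t) = 2 cos^2(t/2).

from math import cos, sin, radians

t = radians(40)                        # any angle works
print(1 - cos(t), 2 * sin(t / 2)**2)   # agree to rounding
print(1 + cos(t), 2 * cos(t / 2)**2)   # agree to rounding
```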

This is all fine, but what’s the point? We can see why it might be easier to say “versine of θ” than to say “2 sin^2(1/2 θ)”. But how is “versine of θ” easier than “one minus cosine of θ”?

The strongest answer, at the risk of sounding old, is to ask back, you know we haven’t always done things the way we do them now, right?

We have, these days, settled on an idea of what the important trigonometric functions are. Start with Cartesian coordinates on some flat space. Draw a circle of radius 1 and with center at the origin. Draw a horizontal line starting at the origin and going off in the positive-x-direction. Draw another line from the center and making an angle with respect to the horizontal line. That line intersects the circle somewhere. The x-coordinate of that point is the cosine of the angle. The y-coordinate of that point is the sine of the angle. What could be more sensible?

That depends what you think sensible. We’re so used to drawing circles and making lines inside that we forget we can do other things. Here’s one.

Start with a circle. Again with radius 1. Now chop an arc out of it. Pick up that arc and drop it down on the ground. How far does this arc reach, left to right? How high does it reach, top to bottom?

Well, the arc you chopped out has some length. Let me call that length 2θ. That two makes it easier to put this in terms of familiar trig functions. How much space does this chopped and dropped arc cover, horizontally? That’s twice the sine of θ. How tall is this chopped and dropped arc? That’s the versine of θ.

We are accustomed to thinking of the relationships between pieces of a circle like this in terms of angles inside the circle. Or of the relationships of the legs of triangles. It seems obviously useful. We even know many formulas relating sines and cosines and other major functions to each other. But it’s no less valid to look at arcs plucked out of a circle and the length of that arc and its width and its height. This might be more convenient, especially if we are often thinking about the outsides of circular things. Indeed, the oldest tables we in the Western tradition have of trigonometric functions list sines and versines. Cosines would come later.

This partly answers why there should have ever been a versine. But we’ve had the cosine since Arabian mathematicians started thinking seriously about triangles. Why did we need the versine these last 1200 years? And why don’t we need it anymore?

One answer here is that mention about the oldest tables of trigonometric functions. Or of less-old tables. Until recently, as things go, anyone who wanted to do much computing needed tables of common functions at many different values. These tables might not have the sine we really need of, say, 1.17 degrees. But if the table had 1.1 and 1.2 degrees we could get pretty close.

So a table of versines could make computation easier. You can, for example, use it to find square roots of numbers. (This essay actually, implicitly, uses vercosines. But it’ll give you the hint how to find them using versines.) Which is great if we have a table of versines but not, somehow, exponentials and logarithms. Well, if we could only take one chart in and we know trigonometry is needed, we might focus on that.

But trigonometry will be needed. One of the great fields of practical mathematics has long been navigation. We locate points on the globe using latitude and longitude. To find the distance between points is a messy calculation. The calculation becomes less longwinded, and more clear, written using versines. (Properly, it uses the haversine, which is one-half times the versine. It will not surprise you that a 19th-century English mathematician coined that name.)
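Here’s roughly what the navigator’s calculation looks like, sketched in Python. The function names are mine, and the 6371-kilometer Earth radius is a round figure, not a precise one:

```python
# Great-circle distance by the haversine formula.
# hav(t) = sin^2(t/2), which is one-half the versine of t.

from math import radians, sin, cos, asin, sqrt

def hav(t):
    return sin(t / 2) ** 2

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    p1, p2 = radians(lat1), radians(lat2)
    h = hav(p2 - p1) + cos(p1) * cos(p2) * hav(radians(lon2 - lon1))
    return 2 * radius_km * asin(sqrt(h))

print(great_circle_km(40.7, -74.0, 51.5, -0.1))   # New York to London, ~5570 km
```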

Having a neat formula is pleasant, but, you know. It’s navigators and surveyors using these formulas. They can deal with a lengthy formula. The typesetters publishing their books are already getting hazard pay. Why change a bunch of \left(1 - \cos \left(\theta\right)\right) references to hav \left(\theta\right) instead?

We get a difference when it comes time to calculate. Like, pencil on paper. The cosine (sine, versine, haversine, whatever) of 1.17 degrees is an irrational number. We do not have the paper to write that number out in full. We’ll write down instead enough digits until we get tired. 0.99979, say. Maybe 0.9998. To take 1 minus that number? That’s 0.00021. Maybe 0.0002. What’s the difference?

It’s in the precision. 1.17 degrees is a measure that has three significant digits. 0.00021? That’s only two digits. 0.0002? That’s got only one digit. We’ve lost precision, and without even noticing it. Whatever calculations we’re making on this have grown error margins. Maybe we’re doing calculations for which this won’t matter. Do we know that, though?

This reflects what we call numerical instability. You may have made only a slight error. But your calculation might magnify that error until it overwhelms your calculation. There isn’t any one fix for numerical instability. But there are some good general practices. Like, don’t divide a large number by a small one. Don’t add a tiny number to a large one. And don’t subtract two very-nearly-equal numbers. Calculating 1 minus the cosines of a small angle is subtracting a number that’s quite close to 1 from a number that is 1. You’re not guaranteed danger, but you are at greater risk.
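Here is the danger in miniature, as a little sketch of my own. Take an angle small enough and the subtraction loses every digit, while the versine form keeps them all:

```python
# For a tiny angle t, 1 - cos(t) cancels catastrophically in floating point,
# while the equivalent 2 sin^2(t/2) keeps full precision.

from math import cos, sin

t = 1e-8                       # a very small angle, in radians
print(1 - cos(t))              # 0.0 -- all significant digits lost
print(2 * sin(t / 2) ** 2)     # 5e-17 -- the correct value
```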

A table of versines, rather than one of cosines, can compensate for this. If you’re making a table of versines it’s because you know people need the versine of 1.17 degrees with some precision. You can list it as 2.08488 times 10^-4, and trust them to use as much precision as they need. For the cosine table, 0.999792 will give cosine-users the same number of significant digits. But it shortchanges versine-users.

And here we see a reason for the versine to go from minor but useful function to obscure function. Any modern computer calculates with floating point numbers. You can get fifteen or thirty or, if you really need, sixty digits of precision for the cosine of 1.17 degrees. So you can get twelve or twenty-seven or fifty-seven digits for the versine of 1.17 degrees. We don’t need to look this up in a table constructed by someone working out formulas carefully.

This, I have to warn, doesn’t always work. Versine formulas for things like distance work pretty well with small angles. There are other angles for which they work badly. You have to calculate in different orders and maybe use other functions in that case. Part of numerical computing is selecting the way to compute the thing you want to do. Versines are for some kinds of problems a good way.

There are other advantages versines offered back when computing was difficult. In spherical trigonometry calculations they can let one skip steps demanding squares and square roots. If you do need to take a square root, you have the assurance that the versine will be non-negative. You don’t have to check that you aren’t slipping complex-valued numbers into your computation. If you need to take a logarithm, similarly, you know you don’t have to deal with the log of a negative number. (You still have to do something to avoid taking the logarithm of zero, but we can’t have everything.)

So this is what the versine offered. You could get precision that just using a cosine table wouldn’t necessarily offer. You could dodge numerical instabilities. You could save steps, in calculations and in thinking what to calculate. These are all good things. We can respect that. We enjoy now a computational abundance, which makes the things versine gave us seem like absurd penny-pinching. If computing were hard again, we might see the versine recovered from obscurity to, at least, having more special interest.

Wikipedia tells me that there are still specialized uses for the versine, or for the haversine. Recent decades, apparently, have found it a useful tool for calculating lunar distances and for sight reductions. The lunar distance is the angular separation between the Moon and some other body in the sky. Sight reduction is calculating positions from the apparent positions of reference objects. These are not problems that I work on often. But I appreciate that we are still finding ways to do them well.

I mentioned that besides the versine there is a vercosine and a haversine. There’s also a havercosine, and then some even more obscure functions with no less wonderful names, like the exsecant. I cannot imagine needing a hacovercosine, except maybe to someday meet an obscure poetic meter, but I am happy to know such a thing is out there in case I ever do. A person on Wikipedia’s Talk page about the versine wished to know if we could define a vertangent and some other terms. We can, of course, but apparently no one has found a need for such a thing. If we find a problem that such functions would solve well, this may change.



Thank you for reading. This and the other essays for the Fall 2019 A to Z should appear at this link. We are coming up to the final four essays and I’m certainly excited by that. All the past A to Z essays ought to be at this link, and when I have a free afternoon to fix some things, they will be.

Why Stuff Can Orbit, Part 13: To Close A Loop


Why Stuff Can Orbit, featuring a dazed-looking coati (it's a raccoon-like creature from Latin America) and a starry background.
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work. He’s also open for commissions, starting from US$10.

Previously:

And the supplemental reading:


Today’s is one of the occasional essays in the Why Stuff Can Orbit sequence that just has a lot of equations. I’ve tried not to write everything around equations because I know what they’re like to read. They’re pretty to look at and after about four of them you might as well replace them with a big grey box that reads “just let your eyes glaze over and move down to the words”. It’s even more glaze-y than that for non-mathematicians.

But we do need them. Equations are wonderfully compact, efficient ways to write about things that are true. Especially things that are only true if exacting conditions are met. They’re so good that I’ll often find myself checking a textbook for an explanation of something and looking only at the equations, letting my eyes glaze over the words. That’s a chilling thing to catch yourself doing. Especially when you’ve written some obscure textbooks and a slightly read mathematics blog.

What I had been looking at was a perturbed central-force orbit. We have something, generically called a planet, that orbits the center of the universe. It’s attracted to the center of the universe by some potential energy, which we describe as ‘U(r)’. It’s some number that changes with the distance ‘r’ the planet has from the center of the universe. It usually depends on other stuff too, like some kind of mass of the planet or some constants or stuff. The planet has some angular momentum, which we can call ‘L’ and pretend is a simple number. It’s in truth a complicated number, but we’ve set up the problem where we can ignore the complicated stuff. This angular momentum implies the potential energy allows for a circular orbit at some distance which we’ll call ‘a’ from the center of the universe.

From ‘U(r)’ and ‘L’ we can say whether this is a stable orbit. If it’s stable, a little perturbation, a nudging, from the circular orbit will stay small. If it’s unstable, a little perturbation will keep growing and never stop. If we perturb this circular orbit the planet will wobble back and forth around the circular orbit. Sometimes the radius will be a little smaller than ‘a’, and sometimes it’ll be a little larger than ‘a’. And now I want to see whether we get a stable closed orbit.

The orbit will be closed if the planet ever comes back to the same position and same momentum that it started with. ‘Started’ is a weird idea in this case. But it’s common vocabulary. By it we mean “whatever properties the thing had when we started paying attention to it”. Usually in a problem like this we suppose there’s some measure of time. It’s typically given the name ‘t’ because we don’t want to make this hard on ourselves. The start is some convenient reference time, often ‘t = 0’. That choice usually makes the equations look simplest.

The position of the planet we can describe with two variables. One is the distance from the center of the universe, ‘r’, which we know changes with time: ‘r(t)’. Another is the angle the planet makes with respect to some reference line. The angle we might call ‘θ’ and often do. This will also change in time, then, ‘θ(t)’. We can pick other variables to describe where something is. But they’re going to involve more algebra, more symbol work, than this choice does so who needs it?

Momentum, now, that’s another set of variables we need to worry about. But we don’t need to worry about them. This particular problem is set up so that if we know the position of the planet we also have the momentum. We won’t be able to get both ‘r(t)’ and ‘θ(t)’ back to their starting values without also getting the momentum there. So we don’t have to worry about that. This won’t always work, as we’ll see in my future series, ‘Why Statistical Mechanics Works’.

So. We know, because it’s not that hard to work out, how long it takes for ‘r(t)’ to get back to its original, ‘r(0)’, value. It’ll take a time we worked out to be (first big equation here, although we found it a couple essays back):

T_r = 2\pi\sqrt{ \frac{m}{ -F'(a) - \frac{3}{a} F(a) }}

Here ‘m’ is the mass of the planet. And ‘F’ is a useful little auxiliary function. It’s the force that the planet feels when it’s a distance r from the origin. It’s defined as F(r) = -\frac{dU}{dr} . It’s convenient to have around. It makes equations like this one simpler, for one. And it’s weird to think of a central force problem where we never, ever see forces. The peculiar thing is we define ‘F’ for every distance the planet might be from the center of the universe. But all we care about is its value at the equilibrium, circular orbit distance of ‘a’. We also care about its first derivative, also evaluated at the distance ‘a’, which is that F'(a) term early on in that denominator.
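To make that concrete, suppose, just as an illustration of my own, the familiar inverse-square attraction F(r) = -\frac{k}{r^2} . Then F'(a) = \frac{2k}{a^3} , and the denominator works out to

-F'(a) - \frac{3}{a}F(a) = -\frac{2k}{a^3} + \frac{3k}{a^3} = \frac{k}{a^3}

so that T_r = 2\pi\sqrt{\frac{ma^3}{k}} . The period squared growing as the cube of the orbit radius is Kepler’s third law peeking through.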

So in the time between time ‘0’ and time ‘T_r’ the perturbed radius will complete a full loop. It’ll reach its biggest value and its smallest value and get back to the original. (It is so much easier to suppose the perturbation starts at its biggest value at time ‘0’ that we often assume it has. It doesn’t have to be. But if we don’t have something forcing the choice of what time to call ‘0’ on us, why not pick one that’s convenient?) The question is whether ‘θ(t)’ completes a full loop in that time. If it does then we’ve gotten back to the starting position exactly and we have a closed orbit.

Thing is that the angle will never get back to its starting value. The angle ‘θ(t)’ is always increasing at a rate we call ‘ω’, the angular velocity. This number is constant, at least approximately. Last time we found out what this number was:

\omega = \frac{L}{ma^2}

So the angle, over time, is going to look like:

\theta(t) = \frac{L}{ma^2} t

And ‘θ(Tr)’ will never equal ‘θ(0)’ again, not unless ‘ω’ is zero. And if ‘ω’ is zero then the planet is racing away from the center of the universe never to be seen again. Or it’s plummeting into the center of the universe to be gobbled up by whatever resides there. In either case, not what we traditionally think of as orbits. Even if we allow these as orbits, these would be nudges too big to call perturbations.

So here’s the resolution. Angles are right awful pains in mathematical physics. This is because increasing an angle by 2π — or decreasing it by 2π — has no visible effect. In the language of the hew-mon, adding 360 degrees to a turn leaves you back where you started. A 45 degree angle is indistinguishable from a 405 degree angle, or a 765 degree angle, or a -315 degree angle, or so on. This makes for all sorts of irritating alternate cases to consider when you try solving for where one thing meets another. But it allows us to have closed orbits.

Because we can have a closed orbit, now, if the radius ‘r(t)’ completes a full oscillation in the time it takes ‘θ(t)’ to grow by 2π. Or to grow by π. Or to grow by ½π. Or a third of π. Or so on.

So. Last time we worked out that the angular velocity had to be this number:

\omega = \frac{L}{ma^2}

And that looked weird because the central force doesn’t seem to be there. It’s in there. It’s just implicit. We need to know what the central force is to work out what ‘a’ is. But we can make it explicit by using that auxiliary little function ‘F(r)’. In particular, at the circular orbit radius of ‘a’ we have that:

F(a) = -\frac{L^2}{ma^3}

I am going to use this to work out what ‘L’ has to be, in terms of ‘F’ and ‘m’ and ‘a’. First, multiply both sides of this equation by ‘ma^3’:

F(a) \cdot ma^3 = -L^2

And then both sides by -1:

-ma^3 F(a) = L^2

Take the square root — don’t worry, that it will turn out that ‘F(a)’ is a negative number so we’re not doing anything suspicious —

\sqrt{-ma^3 F(a)} = L

Now, take that ‘L’ we’ve got and put it back into the equation for angular velocity:

\omega = \frac{L}{ma^2} = \frac{\sqrt{-ma^3 F(a)}}{ma^2}

We might look stuck, and at what seems like an even worse position. It’s not. When you do enough of these problems you get used to some tricks. For example, that ‘ma^2’ in the denominator we could move under the square root if we liked. This we know because ma^2 = \sqrt{ \left(ma^2\right)^2 } at least as long as ‘ma^2’ is positive. It is.

So. We fall back on the trick of squaring and square-rooting the denominator and so generate this mess:

\omega = \sqrt{\frac{-ma^3 F(a)}{\left(ma^2\right)^2}} \\ \omega = \sqrt{\frac{-ma^3 F(a)}{m^2 a^4}} \\ \omega = \sqrt{\frac{-F(a)}{ma}}

That’s getting nice and simple. Let me go complicate matters. I’ll want to know the angle that the planet sweeps out as the radius goes from its largest to its smallest value. Or vice-versa. This time is going to be half of ‘T_r’, the time it takes to do a complete oscillation. The oscillation might have started at time ‘t’ of zero, maybe not. But how long it takes will be the same. I’m going to call this angle ‘ψ’, because I’ve written “the angle that the planet sweeps out as the radius goes from its largest to its smallest value” enough times this essay. If ‘ψ’ is equal to π, or one-half π, or one-third π, or some other nice rational multiple of π we’ll get a closed orbit. If it isn’t, we won’t.

So. ‘ψ’ will be one-half times the oscillation time times that angular velocity. This is easy:

\psi = \frac{1}{2} \cdot T_r \cdot \omega

Put in the formulas we have for ‘T_r’ and for ‘ω’. Now it’ll be complicated.

\psi = \frac{1}{2} 2\pi \sqrt{\frac{m}{-F'(a) - \frac{3}{a} F(a)}} \sqrt{\frac{-F(a)}{ma}}

Now we’ll make this a little simpler again. We have two square roots of fractions multiplied by each other. That’s the same as the square root of the two fractions multiplied by each other. So we can take numerator times numerator and denominator times denominator, all underneath the square root sign. See if I don’t. Oh yeah and one-half of two π is π but you saw that coming.

\psi = \pi \sqrt{ \frac{-m F(a)}{-\left(F'(a) + \frac{3}{m}F(a)\right)\cdot ma} }

OK, so there’s some minus signs in the numerator and denominator worth getting rid of. There’s an ‘m’ in the numerator and the denominator that we can divide out of both sides. There’s an ‘a’ in the denominator that can multiply into a term that has a denominator inside the denominator and you know this would be easier if I could use little cross-out symbols in WordPress LaTeX. If you’re not following all this, try writing it out by hand and seeing what makes sense to cancel out.

\psi = \pi \sqrt{ \frac{F(a)}{aF'(a) + 3F(a)} }

This is getting not too bad. Start from a potential energy ‘U(r)’. Use an angular momentum ‘L’ to figure out the circular orbit radius ‘a’. From the potential energy find the force ‘F(r)’. And then, based on what ‘F’ and the first derivative of ‘F’ happen to be, at the radius ‘a’, we can see whether a closed orbit can be there.
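As a quick check, suppose once more the inverse-square force F(r) = -\frac{k}{r^2} , my example rather than anything the essay commits to. Then aF'(a) + 3F(a) = \frac{2k}{a^2} - \frac{3k}{a^2} = -\frac{k}{a^2} , and

\psi = \pi \sqrt{ \frac{-k/a^2}{-k/a^2} } = \pi

The radius sweeps from largest to smallest in exactly half a revolution. That is what the closed, elliptical orbits of gravity do.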

I’ve gotten to some pretty abstract territory here. Next time I hope to make things simpler again.

Why Stuff Can Orbit, Part 12: How Fast Is An Orbit?


Why Stuff Can Orbit, featuring a dazed-looking coati (it's a raccoon-like creature from Latin America) and a starry background.
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work. He’s also open for commissions, starting from US$10.

Previously:

And the supplemental reading:


On to the next piece of looking for stable, closed orbits of a central force. We start from a circular orbit of something around the sun or the mounting point or whatever. The center. I would have saved myself so much foggy writing if I had just decided to make this a sun-and-planet problem. But I had wanted to write the general problem. In this the force attracting the something towards the center has a strength that’s some constant times the distance to the center raised to a power. This is easy to describe in symbols. It’s cluttered to describe in words. This is why symbols are so nice.

The perturbed orbit, the one I want to see close up, looks like an oscillation around that circle. The fact it is a perturbation, a small nudge away from the equilibrium, means how big the perturbation is will oscillate in time. How far the planet (whatever) is from the center will make a sine wave in time. Whether it closes depends on what it does in space.

Part of what it does in space is easy. I just said what the distance from the planet to the center does. But to say where the planet is we need to know how far it is from the center and what angle it makes with respect to some reference direction. That’s a little harder. We also need to know where it is in the third dimension, but that’s so easy. An orbit like this is always in one plane, so we picked that plane to be that of our paper or whiteboard or tablet or whatever we’re using to sketch this out. That’s so easy to answer we don’t even count it as solved.

The angle, though. Here, I mean the angle made by looking at the planet, the center, and some reference direction. This angle can be any real number, although a lot of those angles are going to point to the same direction in space. We’re coming at this from a mathematical view, or a physics view. Or a mathematical physics view. It means we measure this angle as radians instead of degrees. That is, a right angle is \frac{\pi}{2} , not 90 degrees, thank you. A full circle is 2\pi and not 360 degrees. We aren’t doing this to be difficult. There are good reasons to use radians. They make the mathematics simpler. What else could matter?

We use \theta as the symbol for this angle. It’s a popular choice. \theta is going to change in time. We’ll want to know how fast it changes over time. This concept we call the angular velocity. For this there are a bunch of different possible notations. The one that I snuck in here two essays ago was ω.

We came at the physics of this orbiting planet from a weird direction. Well, I came at it, and you followed along, and thank you for that. But I never did something like set the planet at a particular distance from the center of the universe and give it a set speed so it would have a circular enough orbit. I set up that we should have some potential energy. That energy implies a central force. It attracts things to the center of the universe. And that there should be some angular momentum that the planet has in its movement. And from that, that there would be some circular orbit. That circular orbit is one with just the right radius and just the right change in angle over time.

From the potential energy and the angular momentum we can work out the radius of the circular orbit. Suppose your potential energy obeys a rule like V(r) = Cr^n for some number ‘C’ and some power, another number, ‘n’. Suppose your planet has the mass ‘m’. Then you’ll get a circular orbit when the planet’s a distance ‘a’ from the center, if a^{n + 2} = \frac{L^2}{n C m} . And it turns out we can also work out the angular velocity of this circular orbit. It’s all implicit in the amount of angular momentum that the planet has. This is part of why a mathematical physicist looks for concepts like angular momentum. They’re easy to work with, and they yield all sorts of interesting information, given the chance.
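Here’s a minimal numeric sketch of that, in Python, with made-up values for ‘C’, ‘n’, ‘m’, and ‘L’. The angular-velocity formula is the one this essay goes on to derive below:

```python
# Circular-orbit radius from a^(n+2) = L^2/(n C m), and the angular
# velocity there, omega = L/(m a^2).

n, C, m, L = 2.0, 0.5, 1.0, 3.0   # illustration values, not any real system

a = (L ** 2 / (n * C * m)) ** (1.0 / (n + 2.0))
omega = L / (m * a ** 2)
print(a, omega)                   # about 1.732 and 1.0
```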

I first introduced angular momentum as this number that was how much of something that our something had. It’s got physical meaning, though, reflecting how much … uh … our something would like to keep rotating around the way it has. And this can be written as a formula. The angular momentum ‘L’ is equal to the moment of inertia ‘I’ times the angular velocity ‘ω’. ‘L’ and ‘ω’ are really vectors, and ‘I’ is really a tensor. But we don’t have to worry about this because this kind of problem is easy. We can pretend these are all real numbers and nothing more.

The moment of inertia depends on how the mass of the thing rotating is distributed in space. And it depends on how far the mass is from whatever axis it’s rotating around. For real bodies this can be challenging to work out. It’s almost always a multidimensional integral, haunting students in Calculus III. For a mass in a central force problem, though, it’s easy once again. Please tell me you’re not surprised. If it weren’t easy I’d have some more supplemental reading pieces here first.

For a planet of mass ‘m’ that’s a distance ‘r’ from the axis of rotation, the moment of inertia ‘I’ is equal to ‘mr^2’. I’m fibbing. Slightly. This is for a point mass, that is, something that doesn’t occupy volume. We always look at point masses in this sort of physics. At least when we start. It’s easier, for one thing. And it’s not far off. The Earth’s orbit has a radius just under 150,000,000 kilometers. The difference between the Earth’s actual radius of just over 6,000 kilometers and a point-mass radius of 0 kilometers is a minor correction.

So since we know L = I\omega , and we know I = mr^2 , we have L = mr^2\omega and from this:

\omega = \frac{L}{mr^2}

We know that ‘r’ changes in time. It oscillates from a maximum to a minimum value like any decent sine wave. So ‘r^2’ is going to oscillate too, like a … sine-squared wave. And then dividing the constant ‘L’ by something oscillating like a sine-squared wave … this implies ω changes in time. So it does. In a possibly complicated and annoying way. So it does. I don’t want to deal with that. So I don’t.

Instead, I am going to summon the great powers of approximation. This perturbed orbit is a tiny change from a circular orbit with radius ‘a’. Tiny. The difference between the actual radius ‘r’ and the circular-orbit radius ‘a’ should be small enough we don’t notice it at first glance. So therefore:

\omega = \frac{L}{ma^2}

And this is going to be close enough. You may protest: what if it isn’t? Why can’t the perturbation be so big that ‘a’ is a lousy approximation to ‘r’? To this I say: if the perturbation is that big it’s not a perturbation anymore. It might be an interesting problem. But it’s a different problem from what I’m doing here. It needs different techniques. The Earth’s orbit is different from Halley’s Comet’s orbit in ways we can’t ignore. I hope this answers your complaint. Maybe it doesn’t. I’m on your side there. A lot of mathematical physics, and of analysis, is about making approximations. We need to find perturbations big enough to give interesting results. But not so big they need harder mathematics than you can do. It’s a strange art. I’m not sure I know how to describe how to do it. What I know I’ve learned from doing a lot of problems. You start to learn what kinds of approaches usually pan out.

But what we’re relying on is the same trick we use in analysis. We suppose there is some error margin in the orbit’s radius and angle that’s tolerable. Then if the perturbation means we’d fall outside that error margin, we just look instead at a smaller perturbation. If there is no perturbation small enough to stay within our error margin then the orbit isn’t stable. And we already know it is. Here, we’re looking for closed orbits. People could in good faith argue about whether some particular observed orbit is a small enough perturbation from the circular equilibrium. But they can’t argue about whether there exist some small enough perturbations.

Let me suppose that you’re all right with my answer about big perturbations. There’s at least one more good objection to have here. It’s this: where is the central force? The mass of the planet (or whatever) is there. The angular momentum is there. The equilibrium orbit is there. But where’s the force? Where’s the potential energy we started with? Shouldn’t that appear somewhere in the description of how fast this planet moves around the center?

It should. And it is there, in an implicit form. We get the radius of the circular, equilibrium orbit, ‘a’, from knowing the potential energy. But we’ll do well to tease it out more explicitly. I hope to get there next time.

Why Stuff Can Orbit, Part 11: In Search Of Closure


Why Stuff Can Orbit, featuring a dazed-looking coati (it's a raccoon-like creature from Latin America) and a starry background.
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work.

Previously:

And the supplemental reading:


I’m not ready to finish the series off yet. But I am getting closer to wrapping up perturbed orbits. So I want to say something about what I’m looking for.

In some ways I’m done already. I showed how to set up a central force problem, where some mass gets pulled towards the center of the universe. It can be pulled by a force that follows any rule you like. The rule has to follow some rules. The strength of the pull changes with how far the mass is from the center. It can’t depend on what angle the mass makes with respect to some reference meridian. Once we know how much angular momentum the mass has we can find whether it can have a circular orbit. And we can work out whether that orbit is stable. If the orbit is stable, then for a small nudge, the mass wobbles around that equilibrium circle. It spends some time closer to the center of the universe and some time farther away from it.

I want something a little more, else I can’t carry on this series. I mean, we can make central force problems with more things in them. What we have now is a two-body problem. A three-body problem is more interesting. It’s pretty near impossible to give exact, generally true answers about. We can save things by only looking at very specific cases. Fortunately one is a sun, planet, and moon, where each object is much more massive than the next one. We see a lot of things like that. Four bodies is even more impossible. Things start to clear up if we look at, like, a million bodies, because our idea of what “clear” is changes. I don’t want to do that right now.

Instead I’m going to look for closed orbits. Closed orbits are what normal people would call “orbits”. We’re used to thinking of orbits as, like, satellites going around and around the Earth. We know those go in circles, or ellipses, over and over again. They don’t, but the difference between a closed orbit and what they do is small enough we don’t need to care.

Here, “orbit” means something very close to but not exactly what normal people mean by orbits. Maybe I should have said something about that before. But the difference hasn’t counted for much before.

Start off by thinking of what we need to completely describe what a particular mass is doing. You need to know the central force law that the mass obeys. You need to know, for some reference time, where it is. You also need to know, for that same reference time, what its momentum is. Once you have that, you can predict where it should go for all time to come. You can also work out where it must have been before that reference time. (This we call “retrodicting”. Or “predicting the past”. With this kind of physics problem time has an unnerving symmetry. The tools which forecast what the mass will do in the future are exactly the same as those which tell us what the mass has done in the past.)

Now imagine knowing all the sets of positions and momentums that the mass has had. Don’t look just at the reference time. Look at all the time before the reference time, and look at all the time after the reference time. Imagine highlighting all the sets of positions and momentums the mass ever took on or ever takes on. We highlight them against the universe of all the positions and momentums that the mass could have had if this were a different problem.

What we get is this ribbon-y thread that passes through the universe of every possible setup. This universe of every possible setup we call a “phase space”. It’s easy to explain the “space” part of that name. The phase space obeys the rules we’d expect from a vector space. It also acts in a lot of ways like the regular old space that we live in. The “phase” part I’m less sure how to justify. I suspect we get it because this way of looking at physics problems comes from statistical mechanics. And in that field we’re looking, often, at the different ways a system can behave. This mathematics looks a lot like that of different phases of matter. The changes between solids and liquids and gases are some of what we developed this kind of mathematics to understand, in fact. But this is speculation on my part. I’m not sure why “phase” has attached to this name. I can think of other, harder-to-popularize reasons why the name would make sense too. Maybe it’s the convergence of several reasons. I’d love to hear if someone has a good etymology. If one exists; remember that we still haven’t got the story straight about why ‘m’ stands for the slope of a line.

Anyway, this ribbon of all the arrangements of position and momentum that the mass does ever at any point have we call a “trajectory”. We call it a trajectory because it looks like a trajectory. Sometimes mathematics terms aren’t so complicated. We also call it an “orbit” since very often the problems we like involve trajectories that loop around some interesting area. It looks like a planet orbiting a sun.

A “closed orbit” is an orbit that gets back to where it started. This means you can take some reference time, and wait. Eventually the mass comes back to the same position and the same momentum that you saw at that reference time. This might seem unavoidable. Wouldn’t it have to get back there? And it turns out, no, it doesn’t. A trajectory might wander all over phase space. This doesn’t take much imagination. But even if it doesn’t, if it stays within a bounded region, it could still wander forever without repeating itself. If you’re not sure about that, please consider an old sequence I wrote inspired by the Aardman Animation film Arthur Christmas. Also please consider seeing the Aardman Animation film Arthur Christmas. It is one of the best things this decade has offered us. The short version is, though, that there is a lot of room even in the smallest bit of space. A trajectory is, in a way, a one-dimensional thing that might get all coiled up. But phase space has got plenty of room for that.

And sometimes we will get a closed orbit. The mass can wander around the center of the universe and come back to wherever we first noticed it with the same momentum it first had. At that point it’s locked into doing that same thing again, forever. If it could ever break out of the closed orbit it would have had to the first time around, after all.

Closed orbits, I admit, don’t exist in the real world. Well, the real world is complicated. It has more than a single mass and a single force at work. Energy and momentum are conserved. But we effectively lose both to friction. We call the shortage “entropy”. Never mind. No person has ever seen a circle, and no person ever will. They are still useful things to study. So it is with closed orbits.

An equilibrium orbit, the circular orbit of a mass that’s at exactly the right radius for its angular momentum, is closed. A perturbed orbit, wobbling around the equilibrium, might be closed. It might not. I mean next time to discuss what has to be true to close an orbit.

Why Stuff Can Orbit, Part 10: Where Time Comes From And How It Changes Things


Why Stuff Can Orbit, featuring a dazed-looking coati (it's a raccoon-like creature from Latin America) and a starry background.
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work.

Previously:

And the supplemental reading:


And again my thanks to Thomas K Dye, creator of the web comic Newshounds, for the banner art. He has a Patreon to support his creative habit.

In the last installment I introduced perturbations. These are orbits that are a little off from the circles that make equilibriums. And they introduce something that’s been lurking, unnoticed, in all the work done before. That’s time.

See, how do we know time exists? … Well, we feel it, so, it’s hard for us not to notice time exists. Let me rephrase it then, and put it in contemporary technology terms. Suppose you’re looking at an animated GIF. How do you know it’s started animating? Or that it hasn’t stalled out on some frame?

If the picture changes, then you know. It has to be going. But if it doesn’t change? … Maybe it’s stalled out. Maybe it hasn’t. You don’t know. You know there’s time when you can see change. And that’s one of the little practical insights of physics. You can build an understanding of special relativity by thinking hard about that. Also think about the observation that the speed of light (in vacuum) doesn’t change.

When something physical’s in equilibrium, it isn’t changing. That’s how we found equilibriums to start with. And that means we stop keeping track of time. It’s one more thing to keep track of that doesn’t tell us anything new. Who needs it?

For the planet orbiting a sun, in a perfect circle, or its other little variations, we do still need time. At least some. How far the planet is from the sun doesn’t change, no, but where it is on the orbit will change. We can track where it is by setting some reference point. Where the planet is at the start of our problem. How big is the angle between where the planet is now, the sun (the center of our problem’s universe), and that origin point? That will change over time.

But it’ll change in a boring way. The angle will keep increasing in magnitude at a constant speed. Suppose it takes five time units for the angle to grow from zero degrees to ten degrees. Then it’ll take ten time units for the angle to grow from zero to twenty degrees. It’ll take twenty time units for the angle to grow from zero to forty degrees. Nice to know if you want to know when the planet is going to be at a particular spot, and how long it’ll take to get back to the same spot. At this rate it’ll be one hundred eighty time units before the angle grows to 360 degrees, which looks the same as zero degrees. But it’s not anything interesting happening.

We’ll label this sort of change, where time passes, yeah, but it’s too dull to notice as a “dynamic equilibrium”. There’s change, but it’s so steady and predictable it’s not all that exciting. And I’d set up the circular orbits so that we didn’t even have to notice it. If the radius of the planet’s orbit doesn’t change, then the rate at which its apsidal angle changes, its “angular velocity”, also doesn’t change.

Now, with perturbations, the distance between the planet and the center of the universe will change in time. That was the stuff at the end of the last installment. But also the apsidal angle is going to change. I’ve used ‘r(t)’ to represent the radial distance between the planet and the sun before, and to note that what value it is depends on the time. I need some more symbols.

There’s two popular symbols to use for angles. Both are Greek letters because, I dunno, they’ve always been. (Florian Cajori’s A History of Mathematical Notations doesn’t seem to have anything. And when my default go-to for explaining mathematicians’ choices tells me nothing, what can I do? Look at Wikipedia? Sure, but that doesn’t enlighten me either.) One is to use theta, θ. The other is to use phi, φ. Both are good, popular choices, and in three-dimensional problems we’ll often need both. We don’t need both. The orbit of something moving under a central force might be complicated, but it’s going to be in a single plane of movement. The conservation of angular momentum gives us that. It’s not the last thing angular momentum will give us. The orbit might happen not to be in a horizontal plane. But that’s all right. We can tilt our heads until it is.

So I’ll reach deep into the universe of symbols for angles and call on θ for the apsidal angle. θ will change with time, so, ‘θ(t)’ is the angular counterpart to ‘r(t)’.

I’d said before the apsidal angle is the angle made between the planet, the center of the universe, and some reference point. What is my reference point? I dunno. It’s wherever θ(0) is, that is, where the planet is when my time ‘t’ is zero. There’s probably a bootstrapping fallacy here. I’ll cover it up by saying, you know, the reference point doesn’t matter. It’s like the choice of prime meridian. We have to have one, but we can pick whatever one is convenient. So why not pick one that gives us the nice little identity that ‘θ(0) = 0’? If you don’t buy that and insist I pick a reference point first, fine, go ahead. But you know what? The labels on my time axis are arbitrary. There’s no difference in the way physics works whether ‘t’ is ‘0’ or ‘2017’ or ‘21350’. (At least as long as I adjust any time-dependent forces, which there aren’t here.) So we get back to ‘θ(0) = 0’.

For a circular orbit, the dynamic equilibrium case, these are pretty boring, but at least they’re easy to write. They’re:

r(t) = a \\ \theta(t) = \omega t

Here ‘a’ is the radius of the circular orbit. And ω is a constant number, the angular velocity. It’s how much a bit of time changes the apsidal angle. And this set of equations is pretty dull. You can see why it barely rates a mention.

The perturbed case gets more interesting. We know how ‘r(t)’ looks. We worked that out last time. It’s some function like:

r(t) = a + A \cos\left(\sqrt{\frac{k}{m}} t\right) + B \sin\left(\sqrt{\frac{k}{m}} t\right)

Here ‘A’ and ‘B’ are some numbers telling us how big the perturbation is, and ‘m’ is the mass of the planet, and ‘k’ is something related to how strong the central force is. And ‘a’ is that radius of the circular orbit, the thing we’re perturbed around.

What about ‘θ(t)’? How’s that look? … We don’t seem to have a lot to go on. We could go back to Newton and all that force equalling the change in momentum over time stuff. We can always do that. It’s tedious, though. We have something better. It’s another gift from the conservation of angular momentum. When we can turn a forces-over-time problem into a conservation-of-something problem we’re usually doing the right thing. The conservation-of-something is typically a lot easier to set up and to track. We’ve used it in the conservation of energy, before, and we’ll use it again. The conservation of ordinary, ‘linear’, momentum helps other problems, though not, I’ll grant, this one. The conservation of angular momentum will help us here.

So what is angular momentum? … It’s something about ice skaters twirling around and your high school physics teacher sitting on a bar stool spinning a bike wheel. All right. But it’s also a quantity. We can get some idea of it by looking at the formula for calculating linear momentum:

\vec{p} = m\vec{v}

The linear momentum of a thing is its inertia times its velocity. This is if the thing isn’t moving so fast that we have to notice relativity. Also if it isn’t, like, an electric or a magnetic field, so that we have to notice it’s not precisely a thing. Also if it isn’t a massless particle like a photon, because see previous sentence. I’m talking about ordinary things like planets and blocks of wood on springs and stuff. The inertia, ‘m’, is rather happily the same thing as its mass. The velocity is how fast something is travelling and which direction it’s going in.

Angular momentum, meanwhile, we calculate with this radically different-looking formula:

\vec{L} = I\vec{\omega}

Here, again, talking about stuff that isn’t moving so fast we have to notice relativity. That isn’t electric or magnetic fields. That isn’t massless particles. And so on. Here ‘I’ is the “moment of inertia” and \vec{\omega} is the angular velocity. The angular velocity is a vector that describes for us how fast the spinning is and what direction the axis around which the thing spins is. The moment of inertia describes how easy or hard it is to make the thing spin around each axis. It’s a tensor because real stuff can be easier to spin in some directions than in others. If you’re not sure that’s actually so, try tossing some stuff in the air so it spins in each of the three major directions. You’ll see.

We’re fortunate. For central force problems the moment of inertia is easy to calculate. We don’t need the tensor stuff. And we don’t even need to notice that the angular velocity is a vector. We know what axis the planet’s rotating around; it’s the one pointing out of the plane of motion. We can focus on the size of the angular velocity, the number ‘ω’. See how they’re different, what with one not having an arrow over the symbol. The arrow-less version is easier. For a planet, or other object, with mass ‘m’ that’s orbiting a distance ‘r’ from the sun, the moment of inertia is:

I = mr^2

So we know this number is going to be constant:

L = mr^2\omega

The mass ‘m’ doesn’t change. We’re not doing those kinds of problem. So however ‘r’ changes in time, the angular velocity ‘ω’ has to change with it, so that this product stays constant. The angular velocity is how the apsidal angle ‘θ’ changes over time. So since we know ‘L’ doesn’t change, and ‘m’ doesn’t change, then the way ‘r’ changes must tell us something about how ‘θ’ changes. We’ll get into that next time.
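Here’s a sketch of that bookkeeping, with invented numbers for the constants, and with the perturbed ‘r(t)’ from the formula above, taking ‘B’ to be zero:

```python
# With L and m fixed, however r changes in time, omega must change so that
# L = m r^2 omega stays constant.

from math import cos, sqrt

m, L, k = 1.0, 3.0, 2.0      # illustration values only
a, A = 1.5, 0.1              # equilibrium radius and a small perturbation

def r(t):
    return a + A * cos(sqrt(k / m) * t)   # the perturbed radius, with B = 0

def omega(t):
    return L / (m * r(t) ** 2)            # forced by conservation of L

for t in (0.0, 1.0, 2.0):
    print(t, r(t), omega(t))
```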

Why Stuff Can Orbit, Part 9: How The Spring In The Cosmos Behaves


Why Stuff Can Orbit, featuring a dazed-looking coati (it's a raccoon-like creature from Latin America) and a starry background.
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work.

First, I thank Thomas K Dye for the banner art I have for this feature! Thomas is the creator of the long-running web comic Newshounds. He’s hoping soon to finish up special editions of some of the strip’s stories and to publish a definitive edition of the comic’s history. He’s also got a Patreon account to support his art habit. Please give his creations some of your time and attention.

Now back to central forces. I’ve run out of obvious fun stuff to say about a mass that’s in a circular orbit around the center of the universe. Before you question my sense of fun, remember that I own multiple pop histories about the containerized cargo industry and last month I read another one that’s changed my mind about some things. These sorts of problems cover a lot of stuff. They cover planets orbiting a sun and blocks of wood connected to springs. That’s about all we do in high school physics anyway. Well, there are spheres colliding, but there’s no making a central force problem out of those. You can also make some things that look like bad quantum mechanics models out of central forces. The mathematics is interesting even if the results don’t match anything in the real world.

But I’m sticking with central forces that look like powers. These have potential energy functions with rules that look like V(r) = C r^n. So far, ‘n’ can be any real number. It turns out ‘n’ has to be larger than -2 for a circular orbit to be stable, but that’s all right. There are lots of numbers larger than -2. ‘n’ carries the connotation of being an integer, a whole (positive or negative) number. But if we want to let it be any old real number like 0.1 or π or 18 and three-sevenths that’s fine. We make a note of that fact and remember it right up to the point we stop pretending to care about non-integer powers. I estimate that’s like two entries off.

We get a circular orbit by setting the thing that orbits in … a circle. This sounded smarter before I wrote it out like that. Well. We set it moving perpendicular to the “radial direction”, which is the line going from wherever it is straight to the center of the universe. This perpendicular motion means there’s a non-zero angular momentum, which we write as ‘L’ for some reason. For each angular momentum there’s a particular radius that allows for a circular orbit. Which radius? It’s whatever one is a minimum for the effective potential energy:

V_{eff}(r) = Cr^n + \frac{L^2}{2m}r^{-2}

This we can find by taking the first derivative of ‘Veff‘ with respect to ‘r’ and finding where that first derivative is zero. This is standard mathematics stuff, quite routine. We can do it with any function, whether it represents something physical or not. So:

\frac{dV_{eff}}{dr} = nCr^{n-1} - 2\frac{L^2}{2m}r^{-3} = 0

And after some work, this gets us to the circular orbit’s radius:

r = \left(\frac{L^2}{nCm}\right)^{\frac{1}{n + 2}}
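
If you’d like a numerical check that this formula does what I claim: here’s a minimal sketch in Python, comparing it against a brute-force search for the minimum of the effective potential. All the numbers in it are arbitrary stand-ins.

# Pick some arbitrary power-law potential and angular momentum.
n, C, m, L = 2.0, 1.5, 1.0, 3.0      # a spring-like case: V(r) = C * r**n

def V_eff(r):
    return C * r**n + L**2 / (2 * m) * r**-2

r_formula = (L**2 / (n * C * m))**(1.0 / (n + 2))
# Brute force: try every radius on a fine grid and keep the smallest V_eff.
r_search = min((0.001 * k for k in range(1, 10000)), key=V_eff)
print(r_formula, r_search)           # these agree to the grid's resolution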

What I’d like to talk about is if we’re not quite at that radius. If we set the planet (or whatever) a little bit farther from the center of the universe. Or a little closer. Same angular momentum though, so the equilibrium, the circular orbit, should be in the same spot. It happens there isn’t a planet there.

This enters us into the world of perturbations, which is where most of the big money in mathematical physics is. A perturbation is a little nudge away from an equilibrium. What happens in response to the little nudge is interesting stuff. And here we already know, qualitatively, what’s going to happen: the planet is going to rock around the equilibrium. This is because the circular orbit is a stable equilibrium. I’d described that qualitatively last time. So now I want to talk quantitatively about how the perturbation changes in time.

Before I get there I need to introduce another bit of notation. It is so convenient to be able to talk about the radius of the circular orbit that would be the equilibrium. I’d called that ‘r’ up above. But I also need to be able to talk about how far the perturbed planet is from the center of the universe. That’s also really hard not to call ‘r’. Something has to give. Since the radius of the circular orbit is not going to change I’m going to give that a new name. I’ll call it ‘a’. There’s several reasons for this. One is that ‘a’ is commonly used for describing the size of ellipses, which turn up in actual real-world planetary orbits. That’s something we know because this is like the thirteenth part of an essay series about the mathematics of orbits. You aren’t reading this if you haven’t picked up a couple things about orbits on your own. Also we’ve used ‘a’ before, in these sorts of approximations. It was handy in the last supplemental as the point of expansion’s name. So let me make that unmistakable:

a \equiv r = \left(\frac{L^2}{nCm}\right)^{\frac{1}{n + 2}}

The \equiv there means “defined to be equal to”. You might ask how this is different from “equals”. It seems like more emphasis to me. Also, there are other names for the circular orbit’s radius that I could have used. ‘re‘ would be good enough, as the subscript would suggest “radius of equilibrium”. Or ‘r0‘ would be another popular choice, the 0 suggesting that this is something of key, central importance and also looking kind of like a circle. (That’s probably coincidence.) I like the ‘a’ better there because I know how easy it is to drop a subscript. If you’re working on a problem for yourself that’s easy to fix, with enough cursing and redoing your notes. On a board in front of class it’s even easier to fix since someone will ask about the lost subscript within three lines. In a post like this? It would be a mess.

So now I’m going to look at possible values of the radius ‘r’ that are close to ‘a’. How close? Close enough that ‘Veff‘, the effective potential energy, looks like a parabola. If it doesn’t look much like a parabola then I look at values of ‘r’ that are even closer to ‘a’. (Do you see how the game is played? If you don’t, look closer. Yes, this is actually valid.) If ‘r’ is that close to ‘a’, then we can get away with this polynomial expansion:

V_{eff}(r) \approx V_{eff}(a) + m\cdot(r - a) + \frac{1}{2} m_2 (r - a)^2

where

m = \frac{dV_{eff}}{dr}\left(a\right)	\\ m_2  = \frac{d^2V_{eff}}{dr^2}\left(a\right)

The “approximate” there is because this is an approximation. V_{eff}(r) is in truth equal to the thing on the right-hand-side there plus something that isn’t (usually) zero, but that is small.
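
If you want to see how small: here’s a minimal sketch in Python comparing the real ‘Veff‘ to the parabola, for nudges of different sizes. The potential and all the numbers are my arbitrary picks, and the ‘m’ in the code is the mass, not the slope coefficient above.

n, C, m, L = 2.0, 1.5, 1.0, 3.0

def V_eff(r):
    return C * r**n + L**2 / (2 * m) * r**-2

a = (L**2 / (n * C * m))**(1.0 / (n + 2))    # the equilibrium radius
h = 1e-4
m2 = (V_eff(a + h) - 2 * V_eff(a) + V_eff(a - h)) / h**2   # second derivative, numerically

for r in [1.01 * a, 1.05 * a, 1.20 * a]:
    parabola = V_eff(a) + 0.5 * m2 * (r - a)**2   # the first-derivative term is zero at a
    print(r, V_eff(r), parabola)   # close for tiny nudges, drifting apart as r leaves a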

I am sorry beyond my ability to describe that I didn’t make that ‘m’ and ‘m2‘ consistent last week. That’s all right. One of these is going to disappear right away.

Now, what is V_{eff}(a)? Well, that’s whatever you get from putting in ‘a’ wherever you start out seeing ‘r’ in the expression for V_{eff}(r) . I’m not going to bother with that. Call it math, fine, but that’s just a search-and-replace on the character ‘r’. Also, where I’m going next, it’s going to disappear, never to be seen again, so who cares? What’s important is that this is a constant number. If ‘r’ changes, the value of V_{eff}(a) does not, because ‘r’ doesn’t appear anywhere in V_{eff}(a) .

How about ‘m’? That’s the value of the first derivative of ‘Veff‘ with respect to ‘r’, evaluated when ‘r’ is equal to ‘a’. That might be something. It’s not, because of what ‘a’ is. It’s the value of ‘r’ which would make \frac{dV_{eff}}{dr}(r) equal to zero. That’s why ‘a’ has that value instead of some other, any other.

So we’ll have a constant part ‘Veff(a)’, plus a zero part, plus a part that’s a parabola. This is normal, by the way, when we do expansions around an equilibrium. At least it’s common. Good to see it. To find ‘m2‘ we have to take the second derivative of ‘Veff(r)’ and then evaluate it when ‘r’ is equal to ‘a’ and ugh but here it is.

\frac{d^2V_{eff}}{dr^2}(r) = n (n - 1) C r^{n - 2} + 3\cdot\frac{L^2}{m}r^{-4}

And at the point of approximation, where ‘r’ is equal to ‘a’, it’ll be:

m_2 = \frac{d^2V_{eff}}{dr^2}(a) = n (n - 1) C a^{n - 2} + 3\cdot\frac{L^2}{m}a^{-4}

We know exactly what ‘a’ is so we could write that out in a nice big expression. You don’t want to. I don’t want to. It’s a bit of a mess. I mean, it’s not hard, but it has a lot of symbols in it and oh all right. Here. Look fast because I’m going to get rid of that as soon as I can.

m_2 = \frac{d^2V_{eff}}{dr^2}(a) = n (n - 1) C \left(\frac{L^2}{n C m}\right)^{\frac{n - 2}{n + 2}} + 3\cdot\frac{L^2}{m}\left(\frac{L^2}{n C m}\right)^{-\frac{4}{n + 2}}

For the values of ‘n’ that we actually care about because they turn up in real actual physics problems this expression simplifies some. Enough, anyway. If we pretend we know nothing about ‘n’ besides that it is a number bigger than -2 then … ugh. We don’t have a lot that can clean it up.

Here’s how. I’m going to define an auxiliary little function. Its role is to contain our symbolic sprawl. It has a legitimate role too, though; at least it represents something that it makes sense to give a name. It will be a new function, named ‘F’, that depends on the radius ‘r’:

F(r) \equiv -\frac{dV}{dr}

Notice that’s the derivative of the original ‘V’, not the angular-momentum-equipped ‘Veff‘. This is the secret of its power. It doesn’t do anything to make V_{eff}(r) easier to work with. It starts being good when we take its derivatives, though:

\frac{dV_{eff}}{dr} = -F(r) - \frac{L^2}{m}r^{-3}

That already looks nicer, doesn’t it? It’s going to be really slick when you think about what ‘F(a)’ is. Remember that ‘a’ is the value for ‘r’ which makes the derivative of ‘Veff‘ equal to zero. So … I may not know much, but I know this:

0 = \frac{dV_{eff}}{dr}(a) = -F(a) - \frac{L^2}{m}a^{-3} \\ F(a) = -\frac{L^2}{m a^3}

I’m not going to say what value F(r) has for other values of ‘r’ because I don’t care. But now look at what it does for the second derivative of ‘Veff‘:

\frac{d^2 V_{eff}}{dr^2}(r) = -F'(r) + 3\frac{L^2}{mr^4}

Here the ‘F'(r)’ is a shorthand way of writing ‘the derivative of F with respect to r’. You can do that when there’s only the one free variable to consider. And now something magic happens when we look at the second derivative of ‘Veff‘ when ‘r’ is equal to ‘a’ …

\frac{d^2 V_{eff}}{dr^2}(a) = -F'(a) - \frac{3}{a} F(a)

We get away with this because we happen to know that ‘F(a)’ is equal to -\frac{L^2}{m a^3} and doesn’t that work out great? We’ve turned a symbolic mess into a … less symbolic mess.
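
Skeptical? Fair enough. Here’s a minimal check in Python that the slick form matches a brute-force second derivative. The particular potential and the numbers are my arbitrary choices.

n, C, m, L = 2.0, 1.5, 1.0, 3.0

def V_eff(r):
    return C * r**n + L**2 / (2 * m) * r**-2

def F(r):          # minus the derivative of the plain potential V = C * r**n
    return -n * C * r**(n - 1)

def Fprime(r):     # derivative of F with respect to r
    return -n * (n - 1) * C * r**(n - 2)

a = (L**2 / (n * C * m))**(1.0 / (n + 2))
h = 1e-4
brute = (V_eff(a + h) - 2 * V_eff(a) + V_eff(a - h)) / h**2
slick = -Fprime(a) - (3.0 / a) * F(a)
print(brute, slick)   # same number, up to finite-difference error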

Now why do I say it’s legitimate to introduce ‘F(r)’ here? It’s because minus the derivative of the potential energy with respect to the position of something can be something of actual physical interest. It’s the amount of force exerted on the particle by that potential energy at that point. The amount of force on a thing is something that we could imagine being interested in. Indeed, we’d have used that except potential energy is usually so much easier to work with. I’ve avoided it up to this point because it wasn’t giving me anything I needed. Here, I embrace it because it will save me from some awful lines of symbols.

Because with this expression in place I can write the approximation to the effective potential energy as:

V_{eff}(r) \approx V_{eff}(a) + \frac{1}{2} \left( -F'(a) - \frac{3}{a}F(a) \right) (r - a)^2

So if ‘r’ is close to ‘a’, then the polynomial on the right is a good enough approximation to the effective potential energy. And that potential energy has the shape of a spring’s potential energy. We can use what we know about springs to describe its motion. Particularly, we’ll have this be true:

\frac{dp}{dt} = -\frac{dV_{eff}}{dr}(r) = \left( F'(a) + \frac{3}{a} F(a)\right)\left(r - a\right)

Here, ‘p’ is the (linear) momentum of whatever’s orbiting, which we can treat as equal to ‘m\frac{dr}{dt}’, the mass of the orbiting thing times the rate at which its distance from the center changes. You may sense in me some reluctance about doing this, what with that ‘we can treat as equal to’ talk. There’s reasons for this and I’d have to get deep into geometry to explain why. I can get away with specifically this use because the problem allows it. If you’re trying to do your own original physics problem inspired by this thread, and it’s not orbits like this, be warned. This is a spot that could open up to a gigantic danger pit, lined at the bottom with sharp spikes and angry poison-clawed mathematical tigers and I bet it’s raining down there too.

So we can rewrite all this as

m\frac{d^2r}{dt^2} = -\frac{dV_{eff}}{dr}(r) = \left( F'(a) + \frac{3}{a} F(a)\right)\left(r - a\right)

And when we learned everything interesting there was to know about springs we learned what the solutions to this look like. Oh, in that essay the variable that changed over time was called ‘x’ and here its role is played by the displacement ‘r - a’, but that’s not an actual difference. ‘r’ will be some sinusoidal curve wobbling around ‘a’:

r(t) = a + A \cos\left(\sqrt{\frac{k}{m}} t\right) + B \sin\left(\sqrt{\frac{k}{m}} t\right)

where, here, ‘k’ is equal to that whole mass of constants on the right-hand side:

k = -\left( F'(a) + \frac{3}{a} F(a)\right)

I don’t know what ‘A’ and ‘B’ are. It’ll depend on just what the perturbation is like, how far the planet is from the circular orbit. But I can tell you what the behavior is like. The planet will wobble back and forth around the circular orbit, sometimes closer to the center, sometimes farther away. It’ll spend as much time closer to the center than the circular orbit as it does farther away. And the period of that oscillation will be

T = 2\pi\sqrt{\frac{m}{k}} = 2\pi\sqrt{\frac{m}{-\left(F'(a) + \frac{3}{a}F(a)\right)}}

This tells us something about what the orbit of a thing not in a circular orbit will be like. Yes, I see you in the back there, quivering with excitement about how we’ve got to elliptical orbits. You’re moving too fast. We haven’t got that. There will be elliptical orbits, yes, but only for a very particular power ‘n’ for the potential energy. Not for most of them. We’ll see.

It might strike you there’s something in that square root. We need to take the square root of a positive number, so maybe this will tell us something about what kinds of powers we’re allowed. It’s a good thought. It turns out not to tell us anything useful, though. Suppose we started with V(r) = Cr^n . Then F(r) = -nCr^{n - 1}, and F'(r) = -n(n - 1)Cr^{n - 2} . Sad to say, this leads us to a journey which reveals that we need ‘n’ to be larger than -2 or else we don’t get oscillations around a circular orbit. We already knew that, though. We already found we needed it to have a stable equilibrium before. We can see there not being a period for these oscillations around the circular orbit as another expression of the circular orbit not being stable. So we haven’t got something new out of this.
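
Here’s that journey compressed into a minimal Python sketch, if you’d like it: compute ‘k’ for a few powers and see which give a real period. The helper name and the numbers are mine, and I’m matching the signs of ‘n’ and ‘C’ the way circular orbits require.

import math

m, L = 1.0, 3.0

def k_for(n, C):
    a = (L**2 / (n * C * m))**(1.0 / (n + 2))   # circular-orbit radius
    F = -n * C * a**(n - 1)
    Fprime = -n * (n - 1) * C * a**(n - 2)
    return -(Fprime + (3.0 / a) * F)

for n, C in [(-2.5, -1.0), (-1.0, -1.0), (2.0, 1.0)]:
    k = k_for(n, C)
    if k > 0:
        print(n, "oscillates with period", 2 * math.pi * math.sqrt(m / k))
    else:
        print(n, "no oscillation; k =", k)   # happens once n drops past -2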

We will get to new stuff, though. Maybe even ellipses.

Everything Interesting There Is To Say About Springs


I need another supplemental essay to get to the next part in Why Stuff Can Orbit. (Here’s the last part.) You probably guessed it’s about springs. They’re useful to know about. Why? That one killer Mystery Science Theater 3000 short, yes. But also because they turn up everywhere.

Not because there are literally springs in everything. Not with the rise in anti-spring political forces. But what makes a spring is a force that pushes something back where it came from. It pushes with a force that grows just as fast as the distance from where it came grows. Most anything that’s stable, that has some normal state which it tends to look like, acts like this. A small nudging away from the normal state gets met with some resistance. A bigger nudge meets bigger resistance. And most stuff that we see is stable. If it weren’t stable it would have broken before we got there.

(There are exceptions. Stable is, sometimes, about perspective. It can be that something is unstable but it takes so long to break that we don’t have to worry about it. Uranium, for example, is dying, turning slowly into stable elements like lead and helium. There will come a day there’s none left in the Earth. But it takes so long to break down that, barring surprises, the Earth will have broken down into something else first. And it may be that something is unstable, but it’s created by something that’s always going on. Oxygen in the atmosphere is always busy combining with other chemicals. But oxygen stays in the atmosphere because life keeps breaking it out of other chemicals.)

Now I need to put in some terms. Start with your thing. It’s on a spring, literally or metaphorically. Don’t care. If it isn’t being pushed in any direction then it’s at rest. Or it’s at an equilibrium. I don’t want to call this the ideal or natural state. That suggests some moral superiority to one way of existing over another, and how do I know what’s right for your thing? I can tell you what it acts like. It’s your business whether it should. Anyway, your thing has an equilibrium.

Next term is the displacement. It’s how far your thing is from the equilibrium. If it’s really a block of wood on a spring, like it is in high school physics, this displacement is how far the spring is stretched out. In equations I’ll represent this as ‘x’ because I’m not going to go looking deep for letters for something like this. What value ‘x’ has will change with time. This is what makes it a physics problem. If we want to make clear that ‘x’ does depend on time we might write ‘x(t)’. We might go all the way and start at the top of the page with ‘x = x(t)’, just in case.

If ‘x’ is a positive number it means your thing is displaced in one direction. If ‘x’ is a negative number it was displaced in the opposite direction. By ‘one direction’ I mean ‘to the right, or else up’. By ‘the opposite direction’ I mean ‘to the left, or else down’. Yes, you can pick any direction you like but why are you making life harder for everyone? Unless there’s something compelling about the setup of your thing that makes another choice make sense just go along with what everyone else is doing. Apply your creativity and iconoclasm where it’ll make your life better instead.

Also, we only have to worry about one direction. This might surprise you. If you’ve played much with springs you might have noticed how they’re three-dimensional objects. You can set stuff swinging back and forth in two directions at once. That’s all right. We can describe a two-dimensional displacement as a displacement in one direction plus a displacement perpendicular to that. And if there’s no such thing as friction, they won’t interact. We can pretend they’re two problems that happen to be running on the same spring at the same time. So here I declare: we can ignore friction and pretend it doesn’t matter. We don’t have to deal with more than one direction at a time.

(It’s not only friction. There’s problems about how energy gets transmitted between ways the thing can oscillate. This is what causes what starts out as a big whack in one direction to turn into a middling little circular wobbling. That’s a higher level physics than I want to do right now. So here I declare: we can ignore that and pretend it doesn’t matter.)

Whether your thing is displaced or not it’s got some potential energy. This can be as large or as small as you like, down to some minimum when your thing is at equilibrium. The potential energy we represent as a number named ‘U’ because of good reasons that somebody surely had. The potential energy of a spring depends on the square of the displacement. We can write its value as ‘U = ½ k x^2‘. Here ‘k’ is a number known as the spring constant. It describes how strongly the spring reacts; the bigger ‘k’ is, the more any displacement’s met with a contrary force. It’ll be a positive number. ½ is that same old one-half that you know from ideas being half-baked or going-off being half-cocked.

Potential energy is great. If you can describe a physics problem with its energy you’re in good shape. It lets us bring physical intuition into understanding things. Imagine a bowl or a Habitrail-type ramp that’s got the cross-section of your potential energy. Drop a little marble into it. How the marble rolls? That’s what your thingy does in that potential energy.

Also we have mathematics. Calculus, particularly differential equations, lets us work out how the position of your thing will change. We need one more piece for this. That’s the momentum of your thing. Momentum is traditionally represented with the letter ‘p’. And now here’s how stuff moves when you know the potential energy ‘U’:

\frac{dp}{dt} = - \frac{\partial U}{\partial x}

Let me unpack that. \frac{dp}{dt} — also known as \frac{d}{dt}p if that looks better — is “the derivative of p with respect to t”. It means “how the value of the momentum changes as the time changes”. And that is equal to minus one times …

You might guess that \frac{\partial U}{\partial x} — also written as \frac{\partial}{\partial x} U — is some kind of derivative. The \partial looks kind of like a cursive d, after all. It’s known as the partial derivative, because it means we look at how ‘U’ changes as ‘x’ and nothing else at all changes. With the normal, ‘d’ style full derivative, we have to track how all the variables change as the ‘t’ we’re interested in changes. In this particular problem the difference doesn’t matter. But there are problems where it does matter and that’s why I’m careful about the symbols.

So now we fall back on how to take derivatives. This gives us the equation that describes how the physics of your thing on a spring works:

\frac{dp}{dt} = - k x

You’re maybe underwhelmed. This is because we haven’t got any idea how the momentum ‘p’ relates to the displacement ‘x’. Well, we do, because I know and if you’re still reading at this point you know full well what momentum is. But let me make it official. Momentum is, for this kind of thing, the mass ‘m’ of your thing times how its position is changing, which is \frac{dx}{dt} . The mass of your thing isn’t changing. If you’re going to let it change then we’re doing some screwy rocket problem and that’s a different article. So it’s easy to get the momentum out of that problem. We get instead the second derivative of the displacement with respect to time:

m\frac{d^2 x}{dt^2} = - kx
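
If you’d like to watch the spring move before we solve it on paper, here’s a minimal simulation sketch in Python. The step-by-step integration scheme (half-kick, drift, half-kick) is my own choice of method, not anything the essay demands.

m, k = 1.0, 4.0
x, v = 1.0, 0.0        # tugged out to x = 1 and released from rest
dt = 0.001

for step in range(3001):
    if step % 500 == 0:
        print(round(step * dt, 3), x)   # watch x swing back and forth
    v += (-k * x / m) * (dt / 2)        # half-kick from the spring force
    x += v * dt                         # drift at the updated velocity
    v += (-k * x / m) * (dt / 2)        # second half-kick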

Fine, then. Does that tell us anything about what ‘x(t)’ is? Not yet, but I will now share with you one of the top secrets that only real mathematicians know. We will take a guess to what the answer probably is. Then we’ll see in what circumstances that answer could possibly be right. Does this seem ad hoc? Fine, so it’s ad hoc. Here is the secret of mathematicians:

It’s fine if you get your answer by any stupid method you like, including guessing and getting lucky, as long as you check that your answer is right.

Oh, sure, we’d rather you get an answer systematically, since a system might give us ideas how to find answers in new problems. But if all we want is an answer then, by definition, we don’t care where it came from. Anyway, we’re making a particular guess, one that’s very good for this sort of problem. Indeed, this guess is our system. A lot of guesses at solving differential equations use exactly this guess. Are you ready for my guess about what solves this? Because here it is.

We should expect that

x(t) = C e^{r t}

Here ‘C’ is some constant number, not yet known. And ‘r’ is some constant number, not yet known. ‘t’ is time. ‘e’ is that number 2.71828(etc) that always turns up in these problems. Why? Because its derivative is very easy to take, and if we have to take derivatives we want them to be easy to take. The first derivative of Ce^{rt} with respect to ‘t’ is r Ce^{rt} . The second derivative with respect to ‘t’ is r^2 Ce^{rt} . So here’s what we have:

m r^2 Ce^{rt} = - k Ce^{rt}

What we’d like to find are the values for ‘C’ and ‘r’ that make this equation true. It’s got to be true for every value of ‘t’, yes. But this is actually an easy equation to solve. Why? Because the C e^{rt} on the left side has to equal the C e^{rt} on the right side. As long as they’re not equal to zero and hey, what do you know? C e^{rt} can’t be zero unless ‘C’ is zero. So as long as ‘C’ is any number at all in the world except zero we can divide this ugly lump of symbols out of both sides. (If ‘C’ is zero, then this equation is 0 = 0 which is true enough, I guess.) What’s left?

m r^2 = -k

OK, so, we have no idea what ‘C’ is and we’re not going to have any. That’s all right. We’ll get it later. What we can get is ‘r’. You’ve probably got there already. There’s two possible answers:

r = \pm\sqrt{-\frac{k}{m}}

You might not like that. You remember that ‘k’ has to be positive, and if mass ‘m’ isn’t positive something’s screwed up. So what are we doing with the square root of a negative number? Yes, we’re getting imaginary numbers. Two imaginary numbers, in fact:

r = \imath \sqrt{\frac{k}{m}}, r = - \imath \sqrt{\frac{k}{m}}

Which is right? Both. In some combination, too. It’ll be a bit with that first ‘r’ plus a bit with that second ‘r’. In the differential equations trade this is called superposition. We’ll have information that tells us how much uses the first ‘r’ and how much uses the second.

You might still be upset. Hey, we’ve got these imaginary numbers here describing how a spring moves and while you might not be one of those high-price physicists you see all over the media you know springs aren’t imaginary. I’ve got a couple responses to that. Some are semantic. We only call these numbers “imaginary” because when we first noticed they were useful things we didn’t know what to make of them. The label is an arbitrary thing that doesn’t make any demands of the numbers. If we had called them, oh, “Cardanic numbers” instead would you be upset that you didn’t see any Cardanos in your springs?

My high-class semantic response is to ask in exactly what way is the “square root of minus one” any less imaginary than “three”? Can you give me a handful of three? No? Didn’t think so.

And then the practical response is: don’t worry. Exponentials raised to imaginary numbers do something amazing. They turn into sine waves. Well, sine and cosine waves. I’ll spare you just why. You can find it by looking at the first twelve or so posts of any pop mathematics blog and its article about how amazing Euler’s Formula is. Given that Euler published, like, 2,038 books and papers through his life and the fifty years after his death it took to clear the backlog you might think, “Euler had a lot of Formulas, right? Identities too?” Yes, he did, but you’ll know this one when you see it.

What’s important is that the displacement of your thing on a spring will be described by a function which looks like this:

x(t) = C_1 e^{\imath\sqrt{\frac{k}{m}} t} + C_2 e^{-\imath\sqrt{\frac{k}{m}} t}

for two constants, ‘C1‘ and ‘C2‘. These were the things we called ‘C’ back when we thought the answer might be Ce^{rt} ; there’s two of them because there’s two r’s. I give you my word this is equivalent to a formula like this, but you can make me show my work if you must:

x(t) = A \cos\left(\sqrt{\frac{k}{m}} t\right) + B \sin\left(\sqrt{\frac{k}{m}} t\right)

for some (other) constants ‘A’ and ‘B’. Cosine and sine are the old things you remember from learning about cosine and sine.
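
You can make me show my work, or you can make a computer check it. A minimal sketch in Python: given ‘A’ and ‘B’, the matching constants turn out to be C1 = (A - iB)/2 and C2 = (A + iB)/2 (that pairing is the work I’m skipping), and the two forms give the same numbers.

import cmath, math

k, m = 4.0, 1.0
w = math.sqrt(k / m)
A, B = 0.7, -1.2
C1 = (A - 1j * B) / 2
C2 = (A + 1j * B) / 2

for t in [0.0, 0.5, 1.3]:
    exp_form = C1 * cmath.exp(1j * w * t) + C2 * cmath.exp(-1j * w * t)
    trig_form = A * math.cos(w * t) + B * math.sin(w * t)
    print(exp_form.real, trig_form)   # same values; the imaginary part is zero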

OK, but what are ‘A’ and ‘B’?

Generically? We don’t care. Some numbers. Maybe zero. Maybe not. The pattern, how the displacement changes over time, will be the same whatever they are. It’ll be regular oscillation. At one time your thing will be as far from the equilibrium as it gets, and not moving toward or away from the center. At one time it’ll be back at the center and moving as fast as it can. At another time it’ll be as far away from the equilibrium as it gets, but on the other side. At another time it’ll be back at the equilibrium and moving as fast as it ever does, but the other way. How far is that maximum? What’s the fastest it travels?

The answer’s in how we started. If we start at the equilibrium without any kind of movement we’re never going to leave the equilibrium. We have to get nudged out of it. But what kind of nudge? There are three ways you can nudge something out.

You can tug it out some and let it go from rest. This is the easiest: then ‘A’ is however big your tug was and ‘B’ is zero.

You can let it start from equilibrium but give it a good whack so it’s moving at some initial velocity. This is the next-easiest: ‘A’ is zero, and ‘B’ is … no, not the initial velocity. You need to look at what the velocity of your thing is at the start. That’s the first derivative:

\frac{dx}{dt} = -\sqrt{\frac{k}{m}}A \sin\left(\sqrt{\frac{k}{m}} t\right) + \sqrt{\frac{k}{m}} B \cos\left(\sqrt{\frac{k}{m}} t\right)

The start is when time is zero because we don’t need to be difficult. When ‘t’ is zero the above velocity is \sqrt{\frac{k}{m}} B . So that product has to be the initial velocity. That’s not much harder.

The third case is when you start with some displacement and some velocity. A combination of the two. Then, ugh. You have to figure out ‘A’ and ‘B’ that make both the position and the velocity work out. That’s the simultaneous solutions of equations, and not even hard equations. It’s more work is all. I’m interested in other stuff anyway.
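
For the record, here are those three cases as a tiny Python function. The function name is mine, made up for this.

import math

def amplitudes(x0, v0, k, m):
    """A and B for a spring started at displacement x0 with velocity v0."""
    w = math.sqrt(k / m)
    A = x0        # at t = 0 the cosine is 1 and the sine is 0, so x(0) = A
    B = v0 / w    # and the velocity at t = 0 works out to w * B
    return A, B

print(amplitudes(1.0, 0.0, 4.0, 1.0))   # tugged out and let go: B is zero
print(amplitudes(0.0, 2.0, 4.0, 1.0))   # whacked from equilibrium: A is zero
print(amplitudes(1.0, 2.0, 4.0, 1.0))   # both kinds of nudge at once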

Because, yeah, the spring is going to wobble back and forth. What I’d like to know is how long it takes to get back where it started. How long does a cycle take? Look back at that position function, for example. That’s all we need.

x(t) = A \cos\left(\sqrt{\frac{k}{m}} t\right) + B \sin\left(\sqrt{\frac{k}{m}} t\right)

Sine and cosine functions are periodic. They have a period of 2π. This means if you take the thing inside the parentheses after a sine or a cosine and increase it — or decrease it — by 2π, you’ll get the same value out. What’s the first time that the displacement and the velocity will be the same as their starting values? If they started at t = 0, then, they’re going to be back there at a time ‘T’ which makes true the equation

\sqrt{\frac{k}{m}} T = 2\pi

And that’s going to be

T = 2\pi\sqrt{\frac{m}{k}}

A maybe-surprising thing about this: the period doesn’t depend at all on how big the displacement is. That’s true for perfect springs, which don’t exist in the real world. You knew that. Imagine taking a Junior Slinky from the dollar store and sticking a block of something on one end. Imagine stretching it out to 500,000 times the distance between the Earth and Jupiter and letting go. Would it act like a spring or would it break? Yeah, we know. It’s sad. Think of the animated-cartoon joy a spring like that would produce.

But this period not depending on the displacement is true for small enough displacements, in the real world. Or for good enough springs. Or things that work enough like springs. By “true” I mean “close enough to true”. We can give that a precise mathematical definition, which turns out to be what you would mean by “close enough” in everyday English. The difference is it’ll have Greek letters included.

So to sum up: suppose we have something that acts like a spring. Then we know qualitatively how it behaves. It oscillates back and forth in a sine wave around the equilibrium. Suppose we know what the spring constant ‘k’ is. Suppose we also know ‘m’, which represents the inertia of the thing. If it’s a real thing on a real spring it’s mass. Then we know quantitatively how it moves. It has a period, based on this spring constant and this mass. And we can say how big the oscillations are based on how big the starting displacement and velocity are. That’s everything I care about in a spring. At least until I get into something wild like several springs wired together, which I am not doing now and might never do.

And, as we’ll see when we get back to orbits, a lot of things work close enough to springs.

Why Stuff Can Orbit, Part 8: Introducing Stability


I bet you imagined I’d forgot this series, or that I’d quietly dropped it. Not so. I’ve just been finding the energy for this again. 2017 has been an exhausting year.

With the last essay I finished the basic goal of “Why Stuff Can Orbit”. I’d described some of the basic stuff for central forces. These involve something — a planet, a mass on a spring, whatever — being pulled by the … center. Well, you can call anything the origin, the center of your coordinate system. Why put that anywhere but the place everything’s pulled towards? The key thing about a central force is it’s always in the direction of the center. It can be towards the center or away from the center, but it’s always going to be towards the center because the “away from” case is boring. (The thing gets pushed away from the center and goes far off, never to be seen again.) How strongly it’s pulled toward the center changes only with the distance from the center.

Since the force only changes with the distance between the thing and the center it’s easy to think this is a one-dimensional sort of problem. You only need the coordinate describing this distance. We call that ‘r’, because we end up finding orbits that are circles. Since the distance between the center of a circle and its edge is the radius, it would be a shame to use any other letter.

Forces are hard to work with. At least for a lot of stuff. We can represent central forces instead as potential energy. This is easier because potential energy doesn’t have any direction. It’s a lone number. When we can shift something complicated into one number chances are we’re doing well.

But we are describing something in space. Something in three-dimensional space, although it turns out we’ll only need two. We don’t care about stuff that plunges right into the center; that’s boring. We like stuff that loops around and around the center. Circular orbits. We’ve seen that second dimension in the angular momentum, which we represent as ‘L’ for reasons I dunno. I don’t think I’ve ever met anyone who did. Maybe it was the first letter that came to mind when someone influential wrote a good textbook. Angular momentum is a vector, but for these problems we don’t need to care about that. We can use an ordinary number to carry all the information we need about it.

We get that information from the potential energy plus a term that’s based on the square of the angular momentum divided by the square of the radius. This “effective potential energy” lets us find whether there can be a circular orbit at all, and where it’ll be. And it lets us get some other nice stuff like how the size of the orbit and the time it takes to complete an orbit relate to each other. See the earlier stuff for details. In short, though, we get an equilibrium, a circular orbit, whenever the effective potential energy is flat, neither rising nor falling. That happens when the effective potential energy changes from rising to falling, or changes from falling to rising. Well, if it isn’t rising and if it isn’t falling, what else can it be doing? It only does this for an infinitesimal moment, but that’s all we need. It also happens when the effective potential energy is flat for a while, but that like never happens.

Where I want to go next is into closed orbits. That is, as the planet orbits a sun (or whatever it is goes around whatever it’s going around), does it come back around to exactly where it started? Moving with the same speed in the same direction? That is, does the thing orbit like a planet does?

(Planets don’t orbit like this. When you have three, or more, things in the universe the mathematics of orbits gets way too complicated to do exactly. But this is the thing they’re approximating, we hope, well.)

To get there I’ll have to put back a second dimension. Sorry. Won’t need a third, though. That’ll get named θ because that’s our first choice for an angle. And it makes too much sense to describe a planet’s position as its distance from the center and the angle it makes with respect to some reference line. Which reference line? Whatever works for you. It’s like measuring longitude. We could measure degrees east and west of some point other than Greenwich as well, and as correctly, as we do. We use the one we use because it was convenient.

Along the way to closed orbits I have to talk about stability. There are many kinds of mathematical stability. My favorite is called Lyapunov Stability, because it’s such a mellifluous sound. They all circle around the same concept. It’s what you’d imagine from how we use the word in English. Start with an equilibrium, a system that isn’t changing. Give it a nudge. This disrupts it in some way. Does the disruption stay bounded? That is, does the thing still look somewhat like it did before? Or does the disruption grow so crazy big we have no idea what it’ll ever look like again? (A small nudge, by the way. You can break anything with a big enough nudge; that’s not interesting. It’s whether you can break it with a small nudge that we’d like to know.)

One of the ways we can study this is by looking at the effective potential energy. By its shape we can say whether a central-force equilibrium is stable or not. It’s easy, too, as we’ve got this set up. (Warning before you go passing yourself off as a mathematical physicist: it is not always easy!) Look at the effective potential energy versus the radius. If it has a part that looks like a bowl, cupped upward, it’s got a stable equilibrium. If it doesn’t, it doesn’t have a stable equilibrium. If you aren’t sure, imagine the potential energy was a track, like for a toy car. And imagine you dropped a marble on it. If you give the marble a nudge, does it roll to a stop? If it does, stable. If it doesn’t, unstable.

The sort of wiggly shape that serves as every mathematical physicist's generic potential energy curve to show off the different kinds of equilibrium.
A phony effective potential energy. Most are a lot less exciting than this; see some of the earlier pieces in this series. But some weird-shaped functions like this were toyed with by physicists in the 19th century who were hoping to understand chemistry. Why should gases behave differently at different temperatures? Why should some combinations of elements make new compounds while others don’t? We needed statistical mechanics and quantum mechanics to explain those, but we couldn’t get there without a lot of attempts and failures at explaining it with potential energies and classical mechanics.

Stable is more interesting. We look at cases where there is this little bowl cupped upward. If we have a tiny nudge we only have to look at a small part of that cup. And that cup is going to look an awful lot like a parabola. If you don’t remember what a parabola is, think back to algebra class. Remember that curvey shape that was the only thing drawn on the board when you were dealing with the quadratic formula? That shape is a parabola.

Who cares about parabolas? We care because we know something good about them. In this context, anyway. The potential energy for a mass on a spring is also a parabola. And we know everything there is to know about masses on springs. Seriously. You’d think it was all physics was about from like 1678 through 1859. That’s because it’s something calculus lets us solve exactly. We don’t need books of complicated integrals or computers to do the work for us.

So here’s what we do. It’s something I did not get clearly when I was first introduced to these concepts. This left me badly confused and feeling lost in my first physics and differential equations courses. We are taking our original physics problem and building a new problem based on it. This new problem looks at how big our nudge away from the equilibrium is. How big the nudge is, how fast it grows, how it changes in time will follow rules. Those rules will look a lot like those for a mass on a spring. We started out with a radius that gives us a perfectly circular orbit. Now we get a secondary problem about how the difference between the nudged and the circular orbit changes in time.

That secondary problem has the same shape, the same equations, as a mass on a spring does. A mass on a spring is a central force problem. All the tools we had for studying central-force problems are still available. There is a new central-force problem, hidden within our original one. Here the “center” is the equilibrium we’re nudged around. It will let us answer a new set of questions.

Why Stuff Can Orbit, Part 7: ALL the Circles


Last time around I showed how to do a central-force problem for normal gravity. That’s one where a planet, or moon, or satellite, or whatever is drawn towards the center of space. It’s drawn by a potential energy that equals some constant times the inverse of the distance from the origin. That is, V(r) = C r^-1. With a little bit of fussing around we could find out what distance from the center lets a circular orbit happen. And even Kepler’s Third Law, connecting how long an orbit takes to how big it must be.

There are two natural follow-up essays. One is to work out elliptical orbits. We know there are such things; all real planets and moons have them, and nearly all satellites do. The other is to work out circular orbits for another easy-to-understand example, like a mass on a spring. That’s something with a potential energy that looks like V(r) = C r^2.

I want to do the elliptical orbits later on. The mass-on-a-spring I could do now. So could you, if you follow last week’s essay and just change the numbers a little. But, you know, why bother working out one problem? Why not work out a lot of them? Why not work out every central-force problem, all at once?

Because we can’t. I mean, I can describe how to do that, but it isn’t going to save us much time. Like, the quadratic formula is great because it’ll give you the roots of a quadratic polynomial in one step. You don’t have to do anything but a little arithmetic. We can’t get a formula that easy if we try to solve for every possible potential energy.

But we can work out a lot of central-force potential energies all at once. That is, we can solve for a big set of similar problems, a “family” as we call them. The obvious family is potential energies that are powers of the planet’s distance from the center. That is, they’re potential energies that follow the rule

V(r) = C r^n

Here ‘C’ is some number. It might depend on the planet’s mass, or the sun’s mass. Doesn’t matter. All that’s important is that it not change over the course of the problem. So, ‘C’ for Constant. And ‘n’ is another constant number. Some numbers turn up a lot in useful problems. If ‘n’ is -1 then this can describe gravitational attraction. If ‘n’ is 2 then this can describe a mass on a spring. This ‘n’ can be any real number. That’s not an ideal choice of letter. ‘n’ usually designates a whole number. By using that letter I’m biasing people to think of numbers like ‘2’ at the expense of perfectly legitimate alternatives such as ‘2.1’. But now that I’ve made that explicit maybe we won’t make a casual mistake.

So what I want is to find where there are stable circular orbits for an arbitrary radius-to-a-power force. I don’t know what ‘C’ and ‘n’ are, but they’re some numbers. To find where a planet can have a circular orbit I need to suppose the planet has some mass, ‘m’. And that its orbit has some angular momentum, a number called ‘L’. From this we get the effective potential energy. That’s what the potential energy looks like when we remember that angular momentum has to be conserved.

V_{eff}(r) = C r^n + \frac{L^2}{2m} r^{-2}

To find where a circular orbit can be we have to take the first derivative of Veff with respect to ‘r’. The circular orbit can happen at a radius for which this first derivative equals zero. So we need to solve this:

\frac{dV_{eff}}{dr} = n C r^{n-1} - 2\frac{L^2}{2m} r^{-3} = 0

That derivative we know from the rules of how to take derivatives. And from this point on we have to do arithmetic. We want to get something which looks like ‘r = (some mathematics stuff here)’. Hopefully it’ll be something not too complicated. And hey, in the second term there, the one with L2 in it, we have a 2 in the numerator and a 2 in the denominator. So those cancel out and that’s simpler. That’s hopeful, isn’t it?

n C r^{n-1} - \frac{L^2}{m}r^{-3} = 0

OK. Add \frac{L^2}{m}r^{-3} to both sides of the equation; we’re used to doing that. At least in high school algebra we are.

n C r^{n-1} = \frac{L^2}{m}r^{-3}

Not looking much better? Try multiplying both left and right sides by ‘r^3‘. This gets rid of all the ‘r’ terms on the right-hand side of the equation.

n C r^{n+2} = \frac{L^2}{m}

Now we’re getting close to the ideal of ‘r = (some mathematics stuff)’. Divide both sides by the constant number ‘n times C’.

r^{n+2} = \frac{L^2}{n C m}

I know how much everybody likes taking (n+2)-nd roots of a quantity. I’m sure you occasionally just pick an object at random — your age, your telephone number, a potato, a wooden block — and find its (n+2)-nd root. I know. I’ll spoil some of the upcoming paragraphs to say that it’s going to be more useful knowing ‘r^(n+2)‘ than it is knowing ‘r’. But I’d like to have the radius of a circular orbit on the record. Here it is.

r = \left(\frac{L^2}{n C m}\right)^{\frac{1}{n + 2}}

Can we check that this is right? Well, we can at least check that things aren’t wrong. We can check against the example we already know. That’s the gravitational potential energy problem. For that one, ‘C’ is the number ‘-G M m’. That’s minus the gravitational constant of the universe times the mass of the sun times the mass of the planet. And for gravitational potential energy, ‘n’ is equal to -1. So ‘n C’ works out to ‘G M m’, a positive number. This implies that, for a gravitational potential energy problem, we get a circular orbit when

r_{grav} = \left(\frac{L^2}{G M m^2}\right)^{\frac{1}{1}} = \frac{L^2}{G M m^2}

I’m labelling it ‘rgrav‘ to point out it’s the radius of a circular orbit for gravitational problems. Might or might not need that in the future, but the label won’t hurt anything.

Go ahead and guess whether that agrees with last week’s work. I’m feeling confident.

OK, so, we know where a circular orbit might turn up for an arbitrary power function potential energy. Is it stable? We know from the third “Why Stuff Can Orbit” essay that it’s not a sure thing. We can have potential energies that don’t have any circular orbits. So it must be possible there are unstable orbits.

Whether our circular orbit is stable demands we do the same work we did last time. It will look a little harder to start, because there’s one more variable in it. What had been ‘-1’ last time is now an ‘n’, and stuff like ‘-2’ becomes ‘n-1’. Is that actually harder? Really?

So here’s the second derivative of the effective potential:

\frac{d^2V_{eff}}{dr^2} = (n-1)nCr^{n - 2} + 3\frac{L^2}{m}r^{-4}

My first impulse when I worked this out was to take the ‘r’ for a circular orbit, the thing worked out five paragraphs above, and plug it in to that expression. This is madness. Don’t do it. Or, you know, go ahead and start doing it and see how long it takes before you regret the errors of your ways.

The non-madness-inducing way to work out if this is a positive number? It involves noticing r^{n-2} is the same number as r^{n+2}\cdot r^{-4} . So we have this bit of distribution-law magic:

\frac{d^2V_{eff}}{dr^2} = (n-1)nCr^{n + 2}r^{-4} + 3\frac{L^2}{m}r^{-4}

\frac{d^2V_{eff}}{dr^2} = \left((n-1)nCr^{n + 2} + 3\frac{L^2}{m}\right) \cdot r^{-4}

I’m sure we all agree that’s better, right? No, honestly, let me tell you why this is better. When will this expression be true?

\left((n-1)nCr^{n + 2} + 3\frac{L^2}{m}\right) \cdot r^{-4} > 0

That’s the product of two expressions. One of them is ‘r^-4‘. ‘r’ is the radius of the planet’s orbit. That has to be a positive number. It’s how far the planet is from the origin. The number can’t be anything but positive. So we don’t have to worry about that.

SPOILER: I just palmed a card there. Did you see me palm a card there? Because I totally did. Watch for where that card turns up. It’ll be after this next bit.

So let’s look at the non-card-palmed part of this. We’re going to have a stable equilibrium when the other factor of that mess up above is positive. We need to know when this is true:

(n-1)nCr^{n + 2} + 3\frac{L^2}{m}  > 0

OK. Well. We do know what ‘r^(n+2)‘ is. Worked that out … uhm … twelve(?) paragraphs ago. I’ll say twelve and hope I don’t mess that up in editing. Anyway, what’s important is r^{n+2} = \frac{L^2}{n C m} . So we put that in where ‘r^(n+2)‘ appeared in that above expression.

(n-1)nC\frac{L^2}{n C m} + 3 \frac{L^2}{m} > 0

This is going to simplify down some. Look at that first term, with an ‘n C’ in the numerator and again in the denominator. We’re going to be happier soon as we cancel those out.

(n-1)\frac{L^2}{m} + 3\frac{L^2}{m} > 0

And now we get to some fine distributive-law action, the kind everyone likes:

\left( (n-1) + 3 \right)\frac{L^2}{m} > 0

Well, we know \frac{L^2}{m} has to be positive. The angular momentum ‘L’ might be positive or might be negative but its square is certainly positive. The mass ‘m’ has to be a positive number. So we’ll get a stable equilibrium whenever (n - 1) + 3 is greater than 0. That is, whenever n > -2 . Done.

No we’re not done. That’s nonsense. We knew that going in. We saw that a couple essays ago. If your potential energy were something like, say, V(r) = -2 r^3 you wouldn’t have any orbits at all, never mind stable orbits. But 3 is certainly greater than -2. So what’s gone wrong here?

Let’s go back to that palmed card. Remember I mentioned how the radius of our circular orbit was a positive number. This has to be true, if there is a circular orbit. What if there isn’t one? Do we know there is a radius ‘r’ that the planet can orbit the origin? Here’s the formula giving us that circular orbit’s radius once again:

r = \left(\frac{L^2}{n C m}\right)^{\frac{1}{n + 2}}

Do we know that’s going to exist? … Well, sure. That’s going to be some meaningful number as long as we avoid obvious problems. Like, we can’t have the power ‘n’ be equal to zero, because dividing by zero is all sorts of bad. Also we can’t have the constant ‘C’ be zero, again because dividing by zero is bad.

Not a problem, though. If either ‘C’ or ‘n’ were zero, or if both were, then the original potential energy would be a constant number. V(r) would be equal to ‘C’ (if ‘n’ were zero), or ‘0’ (if ‘C’ were zero). It wouldn’t change with the radius ‘r’. This is a case called the ‘free particle’. There’s no force pushing the planet in one direction or another. So if the planet were not moving it would never start. If the planet were already moving, it would keep moving in the same direction in a straight line. No circular orbits.

Similarly if ‘n’ were equal to ‘-2’ there’d be problems because the power we raise that parenthetical expression to would be equal to one divided by zero, which is bad. Is there anything else that could be trouble there?

What if the thing inside parentheses is a negative number? I may not know what ‘n’ is. I don’t. We started off by supposing we didn’t know beyond that it was a number. But I do know that the (n+2)-nd root of a negative number is going to be trouble. It might be negative. It might be complex-valued. But it won’t be a positive number. And we need a radius that’s a positive number. So that’s the palmed card. To have a circular orbit at all, whatever the signs of ‘n’ and ‘C’, we have to have:

\frac{L^2}{n C m} > 0

‘L’ is a regular old number, maybe positive, maybe negative. So ‘L^2‘ is a positive number. And the mass ‘m’ is a positive number. We don’t know what ‘n’ and ‘C’ are. But as long as their product is positive we’re good. The whole equation will be true. So ‘n’ and ‘C’ can both be negative numbers. We saw that with gravity: V(r) = -\frac{GMm}{r} . ‘G’ is the gravitational constant of the universe, a positive number. ‘M’ and ‘m’ are masses, also positive.

Or ‘n’ and ‘C’ can both be positive numbers. That turns up with spring problems: V(r) = K r^2 , where ‘K’ is the ‘spring constant’. That’s some positive number again.

That time we found potential energies that didn’t have orbits? They were ones that had a positive ‘C’ and negative ‘n’, or a negative ‘C’ and positive ‘n’. Those mixed-sign cases are exactly the ones this condition rules out: no circular orbits at all. It’s nice to have that sorted out at least.

So what does it mean that we can’t have a stable orbit if ‘n’ is less than or equal to -2? Even if ‘C’ is negative? It turns out that if you have a negative ‘C’ and big negative ‘n’, like say -5, the potential energy drops way down to something infinitely large and negative at smaller and smaller radiuses. If you have a positive ‘C’, the potential energy goes way up at smaller and smaller radiuses. For large radiuses the potential drops to zero. But there’s never the little U-shaped dip in the middle, the way you get for gravity-like potentials or spring potentials or normal stuff like that. Yeah, who would have guessed?

What if we do have a stable orbit? How long does an orbit take? How does that relate to the radius of the orbit? We used this radius expression to work out Kepler’s Third Law for the gravity problem last week. We can do that again here.

Last week we worked out what the angular momentum ‘L’ had to be in terms of the radius of the orbit and the time it takes to complete one orbit. The radius of the orbit we called ‘r’. The time an orbit takes we call ‘T’. The formula for angular momentum doesn’t depend on what problem we’re doing. It just depends on the mass ‘m’ of what’s spinning around and how it’s spinning. So:

L = 2\pi m \frac{r^2}{T}

And from this we know what ‘L^2‘ is.

L^2 = 4\pi^2 m^2 \frac{r^4}{T^2}

That’s convenient because we have an ‘L^2‘ term in the formula for what the radius is. I’m going to stick with the formula we got for ‘r^(n+2)‘ because that is so, so much easier to work with than ‘r’ by itself. So we go back to that starting point and then substitute what we know ‘L^2‘ to be in there.

r^{n + 2} = \frac{L^2}{n C m}

This we rewrite as:

r^{n + 2} = \frac{4 \pi^2 m^2}{n C m}\frac{r^4}{T^2}

Some stuff starts cancelling out again. One ‘m’ in the numerator and one in the denominator. Small thing but it makes our lives a bit better. We can multiply the left side and the right side by T^2. That’s more obviously an improvement. We can divide the left side and the right side by ‘r^(n+2)‘. And yes that is too an improvement. Watch all this:

r^{n + 2} = \frac{4 \pi^2 m}{n C}\frac{r^4}{T^2}

T^2 \cdot r^{n + 2} = \frac{4 \pi^2 m}{n C}r^4

T^2  = \frac{4 \pi^2 m}{n C}r^{2 - n}

And that last bit is the equivalent of Kepler’s Third Law for our arbitrary power-law style force.

Are we right? Hard to say offhand. We can check that we aren’t wrong, at least. We can check against the gravitational potential energy. For this ‘n’ is equal to -1. ‘C’ is equal to ‘-G M m’. Make those substitutions; what do we get?

T^2  = \frac{4 \pi^2 m}{(-1) (-G M m)}r^{2 - (-1)}

T^2  = \frac{4 \pi^2}{G M}r^{3}

Well, that is what we expected for this case. So the work looks good, this far. Comforting.
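
And since one check is cheap, here’s a second, numerical one: a minimal Python sketch that computes ‘T’ once from this generalized third law and once from the angular momentum formula L = 2πm r²/T. The parameters are arbitrary picks of mine.

import math

n, C, m, L = 2.0, 1.5, 1.0, 3.0
r = (L**2 / (n * C * m))**(1.0 / (n + 2))   # the circular-orbit radius

T_from_law = math.sqrt(4 * math.pi**2 * m / (n * C) * r**(2 - n))
T_from_L = 2 * math.pi * m * r**2 / L       # straight from the angular momentum
print(T_from_law, T_from_L)                 # these agree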

Why Stuff Can Orbit, Part 6: Circles and Where To Find Them


So now we can work out orbits. At least orbits for a central force problem. Those are ones where a particle — it’s easy to think of it as a planet — is pulled towards the center of the universe. How strong that pull is depends on some constants. But it only changes as the distance the planet is from the center changes.

What we’d like to know is whether there are circular orbits. By “we” I mean “mathematical physicists”. And I’m including you in that “we”. If you’re reading this far you’re at least interested in knowing how mathematical physicists think about stuff like this.

It’s easiest describing when these circular orbits exist if we start with the potential energy. That’s a function named ‘V’. We write it as ‘V(r)’ to show it’s an energy that changes as ‘r’ changes. By ‘r’ we mean the distance from the center of the universe. We’d use ‘d’ for that except we’re so used to thinking of distance from the center as ‘radius’. So ‘r’ seems more compelling. Sorry.

Besides the potential energy we need to know the angular momentum of the planet (or whatever it is) moving around the center. The amount of angular momentum is a number we call ‘L’. It might be positive, it might be negative. Also we need the planet’s mass, which we call ‘m’. The angular momentum and mass let us write a function called the effective potential energy, ‘Veff(r)’.

And we’ll need to take derivatives of ‘Veff(r)’. Fortunately that “How Differential Calculus Works” essay explains all the symbol-manipulation we need to get started. That part is calculus, but the easy part. We can just follow the rules already there. So here’s what we do:

  • The planet (or whatever) can have a circular orbit around the center at any radius which makes the equation \frac{dV_{eff}}{dr} = 0 true.
  • The circular orbit will be stable if the radius of its orbit makes the second derivative of the effective potential, \frac{d^2V_{eff}}{dr^2} , some number greater than zero. (There’s a little numerical sketch of both conditions just after this list.)
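
Here’s that numerical sketch. It’s my own illustration, assuming numpy-era tools (scipy) are handy, and it borrows the gravitational effective potential that shows up later in this essay, with every constant set to a made-up value of 1.

```python
# A numerical sketch of the two conditions, with toy constants (not physical).
from scipy.optimize import brentq

G, M, m, L = 1.0, 1.0, 1.0, 1.0   # made-up values, just for illustration

def V_eff(r):
    return -G*M*m/r + L**2/(2*m*r**2)

def dV_eff(r, h=1e-6):            # centered finite-difference first derivative
    return (V_eff(r + h) - V_eff(r - h)) / (2*h)

def d2V_eff(r, h=1e-4):           # finite-difference second derivative
    return (V_eff(r + h) - 2*V_eff(r) + V_eff(r - h)) / h**2

r0 = brentq(dV_eff, 0.1, 10.0)    # radius where dV_eff/dr = 0
print(r0)                         # 1.0, which is L**2/(G*M*m**2) here
print(d2V_eff(r0) > 0)            # True, so this circular orbit is stable
```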

We’re interested in stable orbits because usually unstable orbits are boring. They might exist but any little perturbation breaks them down. The mathematician, ordinarily, sees this as a useless solution except in how it describes different kinds of orbits. The physicist might point out that sometimes it can take a long time, possibly millions of years, before the perturbation becomes big enough to stand out. Indeed, it’s an open question whether our solar system is stable. While it seems to have gone millions of years without any planet changing its orbit very much we haven’t got the evidence to say it’s impossible that, say, Saturn will be kicked out of the solar system anytime soon. Or worse, that Earth might be. “Soon” here means geologically soon, like, in the next million years.

(If it takes so long for the instability to matter then the mathematician might allow that as “metastable”. There are a lot of interesting metastable systems. But right now, I don’t care.)

I realize now I didn’t explain the notation for the second derivative before. It looks funny because that’s just the best we can work out. In that fraction \frac{d^2V_{eff}}{dr^2} the ‘d’ isn’t a number so we can’t cancel it out. And the superscript ‘2’ doesn’t mean squaring, at least not the way we square numbers. There’s a functional analysis essay in there somewhere. Again I’m sorry about this but there’s a lot of things mathematicians want to write out and sometimes we can’t find a way that avoids all confusion. Roll with it.

So that explains the whole thing clearly and easily and now nobody could be confused and yeah I know. If my Classical Mechanics professor left it at that we’d have open rebellion. Let’s do an example.

There are two and a half good examples. That is, they’re central force problems with answers we know. One is gravitation: we have a planet orbiting a star that’s at the origin. Another is springs: we have a mass that’s connected by a spring to the origin. And the half is electric: put a positive electric charge at the center and have a negative charge orbit that. The electric case is only half a problem because it’s the same as the gravitation problem except for what the constants involved are. Electric charges attract each other crazy way stronger than gravitational masses do. But that doesn’t change the work we do.

This is a lie. Electric charges accelerating, and just orbiting counts as accelerating, cause electromagnetic effects to happen. They give off light. That’s important, but it’s also complicated. I’m not going to deal with that.

I’m going to do the gravitation problem. After all, we know the answer! By Kepler’s something law, something something radius cubed something G M … something … squared … After all, we can look up the answer!

The potential energy for a planet orbiting a sun looks like this:

V(r) = - G M m \frac{1}{r}

Here ‘G’ is a constant, called the Gravitational Constant. It’s how strong gravity in the universe is. It’s not very strong. ‘M’ is the mass of the sun. ‘m’ is the mass of the planet. To make sense ‘M’ should be a lot bigger than ‘m’. ‘r’ is how far the planet is from the sun. And yes, that’s one-over-r, not one-over-r-squared. This is the potential energy of the planet being at a given distance from the sun. One-over-r-squared gives us how strong the force attracting the planet towards the sun is. Different thing. Related thing, but different thing. Just listing all these quantities one after the other means ‘multiply them together’, because mathematicians multiply things together a lot and get bored writing multiplication symbols all the time.

Now for the effective potential we need to toss in the angular momentum. That’s ‘L’. The effective potential energy will be:

V_{eff}(r) = - G M m \frac{1}{r} + \frac{L^2}{2 m r^2}

I’m going to rewrite this in a way that means the same thing, but that makes it easier to take derivatives. At least easier to me. You’re on your own. But here’s what looks easier to me:

V_{eff}(r) = - G M m r^{-1} + \frac{L^2}{2 m} r^{-2}

I like this because it makes every term here look like “some constant number times r to a power”. That’s easy to take the derivative of. Check back on that “How Differential Calculus Works” essay. The first derivative of this ‘Veff(r)’, taken with respect to ‘r’, looks like this:

\frac{dV_{eff}}{dr} = -(-1) G M m r^{-2} -2\frac{L^2}{2m} r^{-3}

We can tidy that up a little bit: -(-1) is another way of writing 1. The second term has two times something divided by 2. We don’t need to be that complicated. In fact, when I worked out my notes I went directly to this simpler form, because I wasn’t going to be thrown by that. I imagine I’ve got people reading along here who are watching these equations warily, if at all. They’re ready to bolt at the first sign of something terrible-looking. There’s nothing terrible-looking coming up. All we’re doing from this point on is really arithmetic. It’s multiplying or adding or otherwise moving around numbers to make the equation prettier. It happens we only know those numbers by cryptic names like ‘G’ or ‘L’ or ‘M’. You can go ahead and pretend they’re ‘4’ or ‘5’ or ‘7’ if you like. You know how to do the steps coming up.

So! We allegedly can have a circular orbit when this first derivative is equal to zero. What values of ‘r’ make true this equation?

G M m r^{-2} - \frac{L^2}{m} r^{-3} = 0

Not so helpful there. What we want is to have something like ‘r = (mathematics stuff here)’. We have to do some high school algebra moving-stuff-around to get that. So one thing we can do to get closer is add the quantity \frac{L^2}{m} r^{-3} to both sides of this equation. This gets us:

G M m r^{-2} = \frac{L^2}{m} r^{-3}

Things are getting better. Now multiply both sides by the same number. Which number? 'r^3'. That’s because 'r^{-3}' times 'r^3' is going to equal 1, while 'r^{-2}' times 'r^3' will equal 'r^1', which normal people call ‘r’. I kid; normal people don’t think of such a thing at all, much less call it anything. But if they did, they’d call it ‘r’. We’ve got:

G M m r = \frac{L^2}{m}

And now we’re getting there! Divide both sides by whatever number ‘G M m’ is, as long as it isn’t zero. And then we have our circular orbit! It’s at the radius

r = \frac{L^2}{G M m^2}

Very good. I’d even say pretty. It’s got all those capital letters and one little lowercase. Something squared in the numerator and the denominator. Aesthetically pleasant. Stinks a little that it doesn’t look like anything we remember from Kepler’s Laws once we’ve looked them up. We can fix that, though.
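
If you don’t trust my algebra (fair), here’s a sympy sketch of my own that re-derives the radius:

```python
# Assuming sympy: differentiate the effective potential, set it to zero,
# and solve for r.
import sympy as sp

G, M, m, L, r = sp.symbols('G M m L r', positive=True)
V_eff = -G*M*m/r + L**2/(2*m*r**2)

print(sp.solve(sp.diff(V_eff, r), r))   # [L**2/(G*M*m**2)]
```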

The key is the angular momentum ‘L’ there. I haven’t said anything about how that number relates to anything. It’s just been some constant of the universe. In a sense that’s fair enough. Angular momentum is conserved, exactly the same way energy is conserved, or the way linear momentum is conserved. Why not just let it be whatever number it happens to be?

(A note for people who skipped earlier essays: Angular momentum is not a number. It’s really a three-dimensional vector. But in a central force problem with just one planet moving around we aren’t doing any harm by pretending it’s just a number. We set it up so that the angular momentum is pointing directly out of, or directly into, the sheet of paper we pretend the planet’s orbiting in. Since we know the direction before we even start work, all we have to care about is the size. That’s the number I’m talking about.)

The angular momentum of a thing is its moment of inertia times its angular velocity. I’m glad to have cleared that up for you. The moment of inertia of a thing describes how easy it is to start it spinning, or stop it spinning, or change its spin. It’s a lot like inertia. What it is depends on the mass of the thing spinning, and how that mass is distributed, and what it’s spinning around. It’s the first part of physics that makes the student really have to know volume integrals.

We don’t have to know volume integrals. A single point mass spinning at a constant speed at a constant distance from the origin is the easy case to figure out. A mass ‘m’ at a fixed distance ‘r’ from the center of rotation moving at constant speed ‘v’ has an angular momentum of ‘m’ times ‘r’ times ‘v’.

So great; we’ve turned ‘L’ which we didn’t know into ‘m r v’, where we know ‘m’ and ‘r’ but don’t know ‘v’. We’re making progress, I promise. The planet’s tracing out a circle in some amount of time. It’s a circle with radius ‘r’. So it traces out a circle with perimeter ‘2 π r’. And it takes some amount of time to do that. Call that time ‘T’. So its speed will be the distance travelled divided by the time it takes to travel. That’s \frac{2 \pi r}{T} . Again we’ve changed one unknown number ‘L’ for another unknown number ‘T’. But at least ‘T’ is an easy familiar thing: it’s how long the orbit takes.

Let me show you how this helps. Start off with what ‘L’ is:

L = m r v = m r \frac{2\pi r}{T} = 2\pi m \frac{r^2}{T}

Now let’s put that into the equation I got eight paragraphs ago:

r = \frac{L^2}{G M m^2}

Remember that one? Now put what I just said ‘L’ was, in where ‘L’ shows up in that equation.

r = \frac{\left(2\pi m \frac{r^2}{T}\right)^2}{G M m^2}

I agree, this looks like a mess and possibly a disaster. It’s not so bad. Do some cleaning up on that numerator.

r = \frac{4 \pi^2 m^2}{G M m^2} \frac{r^4}{T^2}

That’s looking a lot better, isn’t it? We even have something we can divide out: the mass of the planet is just about to disappear. This sounds bizarre, but remember Kepler’s laws: the mass of the planet never figures into things. We may be on the right path yet.

r = \frac{4 \pi^2}{G M} \frac{r^4}{T^2}

OK. Now I’m going to multiply both sides by 'T^2' because that’ll get that out of the denominator. And I’ll divide both sides by ‘r’ so that I only have the radius of the circular orbit on one side of the equation. Here’s what we’ve got now:

T^2 = \frac{4 \pi^2}{G M} r^3

And hey! That looks really familiar. A circular orbit’s radius cubed is some multiple of the square of the orbit’s time. Yes. This looks right. At least it looks reasonable. Someone else can check if it’s right. I like the look of it.
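
Since the essay leaves the checking to someone else, here’s one way to be that someone, plugging in rounded solar-system constants (my numbers, good enough for a sanity check, not precision work):

```python
# Plug the Sun's mass and Earth's orbital radius into T^2 = 4 pi^2 r^3 / (G M).
import math

G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2
M = 1.989e30     # mass of the Sun, kg
r = 1.496e11     # Earth's mean orbital radius, m

T = math.sqrt(4 * math.pi**2 * r**3 / (G * M))
print(T / 86400)   # roughly 365 days, which is reassuring
```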

So this is the process you’d use to start understanding orbits for your own arbitrary potential energy. You can find the equivalent of Kepler’s Third Law, the one connecting orbit times and orbit radiuses. And it isn’t really hard. You need to know enough calculus to differentiate one function, and then you need to be willing to do a pile of arithmetic on letters. It’s not actually hard. Next time I hope to talk about more and different … um …

I’d like to talk about the different … oh, dear. Yes. You’re going to ask about that, aren’t you?

Ugh. All right. I’ll do it.

How do we know this is a stable orbit? Well, it just is. If it weren’t, the Earth wouldn’t have a Moon after all this. Heck, the Sun wouldn’t have an Earth. At least it wouldn’t have a Jupiter. If the solar system is unstable, Jupiter is probably the most stable part. But that isn’t convincing. I’ll do this right, though, and show what the second derivative tells us. It tells us this is too a stable orbit.

So. The thing we have to do is find the second derivative of the effective potential. This we do by taking the derivative of the first derivative. Then we have to evaluate this second derivative and see what value it has for the radius of our circular orbit. If that’s a positive number, then the orbit’s stable. If that’s a negative number, then the orbit’s not stable. This isn’t hard to do, but it isn’t going to look pretty.

First the pretty part, though. Here’s the first derivative of the effective potential:

\frac{dV_{eff}}{dr} = G M m r^{-2} - \frac{L^2}{m} r^{-3}

OK. So the derivative of this with respect to ‘r’ isn’t hard to evaluate again. This is again a function with a bunch of terms that are all a constant times r to a power. That’s the easiest sort of thing to differentiate that isn’t just something that never changes.

\frac{d^2 V_{eff}}{dr^2} = -2 G M m r^{-3} - (-3)\frac{L^2}{m} r^{-4}

Now the messy part. We need to work out what that line above is when our planet’s in our circular orbit. That circular orbit happens when r = \frac{L^2}{G M m^2} . So we have to substitute that mess in for ‘r’ wherever it appears in that above equation and you’re going to love this. Are you ready? It’s:

-2 G M m \left(\frac{L^2}{G M m^2}\right)^{-3} + 3\frac{L^2}{m}\left(\frac{L^2}{G M m^2}\right)^{-4}

This will get a bit easier promptly. That’s because something raised to a negative power is the same as its reciprocal raised to the positive of that power. So that terrible, terrible expression is the same as this terrible, terrible expression:

-2 G M m \left(\frac{G M m^2}{L^2}\right)^3 + 3 \frac{L^2}{m}\left(\frac{G M m^2}{L^2}\right)^4

Yes, yes, I know. Only thing to do is start hacking through all this because I promise it’s going to get better. Putting all those third- and fourth-powers into their parentheses turns this mess into:

-2 G M m \frac{G^3 M^3 m^6}{L^6} + 3 \frac{L^2}{m} \frac{G^4 M^4 m^8}{L^8}

Yes, my gut reaction when I see multiple things raised to the eighth power is to say I don’t want any part of this either. Hold on another line, though. Things are going to start cancelling out and getting shorter. Group all those things-to-powers together:

-2 \frac{G^4 M^4 m^7}{L^6} + 3 \frac{G^4 M^4 m^7}{L^6}

Oh. Well, now this is different. The second derivative of the effective potential, at this point, is the number

\frac{G^4 M^4 m^7}{L^6}

And I admit I don’t know what number that is. But here’s what I do know: ‘G’ is a positive number. ‘M’ is a positive number. ‘m’ is a positive number. ‘L’ might be positive or might be negative, but 'L^6' is a positive number either way. So this is a bunch of positive numbers multiplied and divided together.

So this second derivative, whatever it is, must be a positive number. And so this circular orbit is stable. Give the planet a little nudge and that’s all right. It’ll stay near its orbit. I’m sorry to put you through that but some people raised the, honestly, fair question.
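
For anyone who’d rather make a computer do that slog, here’s a sympy sketch (mine again, not part of the original essay) confirming the sign:

```python
# Assuming sympy: evaluate the second derivative at the circular-orbit radius.
import sympy as sp

G, M, m, L, r = sp.symbols('G M m L r', positive=True)
V_eff = -G*M*m/r + L**2/(2*m*r**2)

r0 = L**2 / (G*M*m**2)                      # the circular-orbit radius
second = sp.diff(V_eff, r, 2).subs(r, r0)
print(sp.simplify(second))                  # G**4*M**4*m**7/L**6, positive
```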

So this is the process you’d use to start understanding orbits for your own arbitrary potential energy. You can find the equivalent of Kepler’s Third Law, the one connecting orbit times and orbit radiuses. And it isn’t really hard. You need to know enough calculus to differentiate one function, and then you need to be willing to do a pile of arithmetic on letters. It’s not actually hard. Next time I hope to talk about the other kinds of central forces that you might get. We only solved one problem here. We can solve way more than that.

How Mathematical Physics Works: Another Course In 2200 Words


OK, I need some more background stuff before returning to the Why Stuff Can Orbit series. Last week I explained how to take derivatives, which is one of the three legs of a Calculus I course. Now I need to say something about why we take derivatives. This essay won’t really qualify you to do mathematical physics, but it’ll at least let you bluff your way through a meeting with one.

We care about derivatives because we’re doing physics a smart way. This involves thinking not about forces but instead potential energy. We have a function, called V or sometimes U, that changes based on where something is. If we need to know the forces on something we can take the derivative, with respect to position, of the potential energy.

The way I’ve set up these central force problems makes it easy to shift between physical intuition and calculus. Draw a scribbly little curve, something going up and down as you like, as long as it doesn’t loop back on itself. Also, don’t take the pen from paper. Also, no corners. That’s just cheating. Smooth curves. That’s your potential energy function. Take any point on this scribbly curve. If you go to the right a little from that point, is the curve going up? Then your function has a positive derivative at that point. Is the curve going down? Then your function has a negative derivative. Find some other point where the curve is going in the other direction. If it was going up to start, find a point where it’s going down. Somewhere in-between there must be a point where the curve isn’t going up or going down. The Intermediate Value Theorem says you’re welcome.

These points where the potential energy isn’t increasing or decreasing are the interesting ones. At least if you’re a mathematical physicist. They’re equilibriums. If whatever might be moving happens to be exactly there, then it’s not going to move. It’ll stay right there. Mathematically: the force is some fixed number times the derivative of the potential energy there. The potential energy’s derivative is zero there. So the force is zero and without a force nothing’s going to change. Physical intuition: imagine you laid out a track with exactly the shape of your curve. Put a marble at this point where the track isn’t rising and isn’t falling. Does the marble move? No, but if you’re not so sure about that read on past the next paragraph.

Mathematical physicists learn to look for these equilibriums. We’re taught to not bother with what will happen if we release this particle at this spot with this velocity. That is, you know, not looking at any particular problem someone might want to know. We look instead at equilibriums because they help us describe all the possible behaviors of a system. Mathematicians are sometimes characterized as lazy in spirit. This is fair. Mathematicians will start out with a problem looking to see if it’s just like some other problem someone already solved. But the flip side is if one is going to go to the trouble of solving a new problem, she’s going to really solve it. We’ll work out not just what happens from some one particular starting condition. We’ll try to describe all the different kinds of thing that could happen, and how to tell which of them does happen for your measly little problem.

If you actually do have a curvy track and put a marble down on its equilibrium it might yet move. Suppose the track rises a while and then falls back again; put the marble at the top and it’s likely to roll one way or the other. If it doesn’t it’s probably because of friction; the track sticks a little. If it were a really smooth track and the marble perfectly round then it’d fall. Give me this. But even with a perfectly smooth track and perfectly frictionless marble it’ll still roll one way or another. Unless you put it exactly at the spot that’s the top of the hill, not a bit to the left or the right. Good luck.

What’s happening here is the difference between a stable and an unstable equilibrium. This is again something we all have a physical intuition for. Imagine you have something that isn’t moving. Give it a little shove. Does it stay about like it was? Then it’s stable. Does it break? Then it’s unstable. The marble at the top of the track is at an unstable equilibrium; a little nudge and it’ll roll away. If you had a marble at the bottom of a track, inside a valley, then it’s a stable equilibrium. A little nudge will make the marble rock back and forth but it’ll stay nearby.

Yes, if you give it a crazy big whack the marble will go flying off, never to be seen again. We’re talking about small nudges. No, smaller than that. This maybe sounds like question-begging to you. But what makes for an unstable equilibrium is that no nudge is too small. The nudge — perturbation, in the trade — will just keep growing. In a stable equilibrium there’s nudges small enough that they won’t keep growing. They might not shrink, but they won’t grow either.

So how to tell which is which? Well, look at your potential energy and imagine it as a track with a marble again. Where are the unstable equilibriums? They’re the ones at tops of hills. Near them the curve looks like a cup pointing down, to use the metaphor every Calculus I class takes. Where are the stable equilibriums? They’re the ones at bottoms of valleys. Near them the curve looks like a cup pointing up. Again, see Calculus I.

We may be able to tell the difference between these kinds of equilibriums without drawing the potential energy. We can use the second derivative. To find the second derivative of a function you take the derivative of a function and then — you may want to think this one over — take the derivative of that. That is, you take the derivative of the original function a second time. Sometimes higher mathematics gives us terms that aren’t too hard.

So if you have a spot where you know there’s an equilibrium, look at what the second derivative at that spot is. If it’s positive, you have a stable equilibrium. If it’s negative, you have an unstable equilibrium. This is called the “Second Derivative Test”, as it was named by a committee that figured it was close enough to 5 pm and why cause trouble?

If the second derivative is zero there, um, we can’t say anything right now. The equilibrium may also be an inflection point. That’s where the growth of something pauses a moment before resuming. Or where the decline of something pauses a moment before resuming. In either case that’s still an unstable equilibrium. But it doesn’t have to be. It could still be a stable equilibrium. It might just have a very smoothly flat base. No telling just from that one piece of information and this is why we have to go on to other work.
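
Here’s the test at work on a made-up potential, a sketch assuming sympy. The double-well V(x) = x^4 - 2x^2 has three equilibriums, and the second derivative sorts them out:

```python
# Find equilibriums of V(x) = x**4 - 2*x**2 and classify each one.
import sympy as sp

x = sp.symbols('x')
V = x**4 - 2*x**2

for x0 in sp.solve(sp.diff(V, x), x):       # equilibriums: -1, 0, 1
    curvature = sp.diff(V, x, 2).subs(x, x0)
    if curvature > 0:
        print(x0, "stable")                 # the valleys at -1 and 1
    elif curvature < 0:
        print(x0, "unstable")               # the hilltop at 0
    else:
        print(x0, "no telling from this")   # the test stays silent
```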

But this gets at how we’d like to look at a system. We look for its equilibriums. We figure out which equilibriums are stable and which ones are unstable. With a little more work we can say, if the system starts out like this it’ll stay near that equilibrium. If it starts out like that it’ll stay near this whole other equilibrium. If it starts out this other way, it’ll go flying off to the end of the universe. We can solve every possible problem at once and never have to bother with a particular case. This feels good.

It also gives us a little something more. You maybe have heard of a tangent line. That’s a line that’s, er, tangent to a curve. Again with the not-too-hard terms. What this means is there’s a point, called the “point of tangency”, again named by a committee that wanted to get out early. And the line just touches the original curve at that point, and it’s going in exactly the same direction as the original curve at that point. Typically this means the line just grazes the curve, at least around there. If you’ve ever rolled a pencil until it just touched the edge of your coffee cup or soda can, you’ve set up a tangent line to the curve of your beverage container. You just didn’t think of it as that because you’re not daft. Fair enough.

Mathematicians will use tangents because a tangent line has values that are so easy to calculate. The function describing a tangent line is a polynomial and we llllllllove polynomials, correctly. The tangent line is always easy to understand, however hard the original function was. Its value, at the equilibrium, is exactly what the original function’s was. Its first derivative, at the equilibrium, is exactly what the original function’s was at that point. Its second derivative is zero, which might or might not be true of the original function. We don’t care.

We don’t use tangent lines when we look at equilibriums. This is because in this case they’re boring. If it’s an equilibrium then its tangent line is a horizontal line. No matter what the original function was. It’s trivial: you know the answer before you’ve heard the question.

Ah, but, there is something mathematical physicists do like. The tangent line is boring. Fine. But how about, using the second derivative, building a tangent … well, “parabola” is the proper term. This is a curve that’s a quadratic, that looks like an open bowl. It exactly matches the original function at the equilibrium. Its derivative exactly matches the original function’s derivative at the equilibrium. Its second derivative also exactly matches the original function’s second derivative, though. Third derivative we don’t care about. It’s so not important here I can’t even finish this sentence in a

What this second-derivative-based approximation gives us is a parabola. It will look very much like the original function if we’re close to the equilibrium. And this gives us something great. The great thing is this is the same potential energy shape of a weight on a spring, or anything else that oscillates back and forth. It’s the potential energy for “simple harmonic motion”.

And that’s great. We start studying simple harmonic motion, oh, somewhere in high school physics class because it’s so much fun to play with slinkies and springs and accidentally dropping weights on our lab partners. We never stop. The mathematics behind it is simple. It turns up everywhere. If you understand the mathematics of a mass on a spring you have a tool that’s relevant to pretty much every problem you ever have. This approximation is part of that. Close to a stable equilibrium, whatever system you’re looking at has the same behavior as a weight on a spring.
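
To watch the tangent parabola turn into a spring, here’s a little sympy sketch of my own, using the gravity effective potential with every constant set to 1:

```python
# Build the second-derivative-based parabola at the stable equilibrium.
import sympy as sp

r = sp.symbols('r', positive=True)
V = -1/r + 1/(2*r**2)              # toy constants: G*M*m = 1, L**2/(2m) = 1

r0 = sp.solve(sp.diff(V, r), r)[0]          # equilibrium at r = 1
k = sp.diff(V, r, 2).subs(r, r0)            # effective spring constant, 1 here
parabola = V.subs(r, r0) + sp.Rational(1, 2)*k*(r - r0)**2
print(parabola)    # -1/2 + (r - 1)**2/2: a mass-on-a-spring potential
```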

It may strike you that a mass on a spring is itself a central force. And now I’m saying that within the central force problem I started out doing, stuff that orbits, there’s another central force problem. This is true. You’ll see that in a few Why Stuff Can Orbit essays.

So far, by the way, I’ve talked entirely about a potential energy with a single variable. This is for a good reason: two or more variables is harder. Well of course it is. But the basic dynamics are still there. There’s equilibriums. They can be stable or unstable. They might have inflection points. There is a new kind of behavior. Mathematicians call it a “saddle point”. This is where in one direction the potential energy makes it look like a stable equilibrium while in another direction the potential energy makes it look unstable. Examples of it kind of look like the shape of a saddle, if you haven’t looked at an actual saddle recently. (If you really want to know, get your computer to plot the function z = x^2 - y^2 and look at the origin, where x = 0 and y = 0.) Well, there’s points on an actual saddle that would be saddle points to a mathematician. It’s unstable, because there’s that direction where it’s definitely unstable.
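
If you do have your computer handy, a sympy sketch makes the saddle’s split personality visible: the matrix of second derivatives (the Hessian) at the origin has one positive and one negative eigenvalue.

```python
# The Hessian of x**2 - y**2, assuming sympy.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 - y**2

H = sp.hessian(f, (x, y))
print(H.eigenvals())   # {2: 1, -2: 1}: stable one way, unstable the other
```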

So everything about multivariable functions is longer, and a couple bits of it are harder. There’s more chances for weird stuff to happen. I think I can get through most of Why Stuff Can Orbit without having to know that. But do some reading up on that before you take a job as a mathematical physicist.

The Set Tour, Part 7: Matrices


I feel a bit odd about this week’s guest in the Set Tour. I’ve been mostly concentrating on sets that get used as the domains or ranges for functions a lot. The ones I want to talk about here don’t tend to serve the role of domain or range. But they are used a great deal in some interesting functions. So I loosen my rule about what to talk about.

Rm x n and Cm x n

Rm x n might explain itself by this point. If it doesn’t, then this may help: the “x” here is the multiplication symbol. “m” and “n” are positive whole numbers. They might be the same number; they might be different. So, are we done here?

Maybe not quite. I was fibbing a little when I said “x” was the multiplication symbol. R2 x 3 is not a longer way of saying R6, an ordered collection of six real-valued numbers. The x does represent a kind of product, though. What we mean by R2 x 3 is an ordered collection, two rows by three columns, of real-valued numbers. Say the “x” here aloud as “by” and you’re pronouncing it correctly.

What we get is called a “matrix”. If we put into it only real-valued numbers, it’s a “real matrix”, or a “matrix of reals”. Sometimes mathematical terminology isn’t so hard to follow. Just as with vectors, Rn, it matters just how the numbers are organized. R2 x 3 means something completely different from what R3 x 2 means. And swapping which positions the numbers in the matrix occupy changes what matrix we have, as you might expect.

You can add together matrices, exactly as you can add together vectors. The same rules even apply. You can only add together two matrices of the same size. They have to have the same number of rows and the same number of columns. You add them by adding together the numbers in the corresponding slots. It’s exactly what you would do if you went in without preconceptions.

You can also multiply a matrix by a single number. We called this scalar multiplication back when we were working with vectors. With matrices, we call this scalar multiplication. If it strikes you that we could see vectors as a kind of matrix, yes, we can. Sometimes that’s wise. We can see a vector as a matrix in the set R1 x n or as one in the set Rn x 1, depending on just what we mean to do.

It’s trickier to multiply two matrices together. As with vectors multiplying the numbers in corresponding positions together doesn’t give us anything. What we do instead is a time-consuming but not actually hard process. But according to its rules, something in Rm x n we can multiply by something in Rn x k. “k” is another whole number. The second thing has to have exactly as many rows as the first thing has columns. What we get is a matrix in Rm x k.

I grant you maybe didn’t see that coming. Also a potential complication: if you can multiply something in Rm x n by something in Rn x k, can you multiply the thing in Rn x k by the thing in Rm x n? … No, not unless k and m are the same number. Even if they are, you can’t count on getting the same product. Matrices are weird things this way. They’re also gateways to weirder things. But it is a productive weirdness, and I’ll explain why in a few paragraphs.
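A quick numpy sketch of those size rules, with matrices I made up for the occasion:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])     # an element of R^(2 x 3): two rows, three columns
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])        # an element of R^(3 x 2)

print(A + A)           # addition: same-shaped matrices, slot by slot
print((A @ B).shape)   # (2, 2): a 2-by-3 times a 3-by-2 lands in R^(2 x 2)
print((B @ A).shape)   # (3, 3): legal here too, but a different-sized product
```
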

A matrix is a way of organizing terms. Those terms can be anything. Real matrices are surely the most common kind of matrix, at least in mathematical usage. Next in common use would be complex-valued matrices, much like how we get complex-valued vectors. These are written Cm x n. A complex-valued matrix is different from a real-valued matrix. The terms inside the matrix can be complex-valued numbers, instead of real-valued numbers. Again, sometimes, these mathematical terms aren’t so tricky.

I’ve heard occasionally of people organizing matrices of other sets. The notation is similar. If you’re building a matrix of “m” rows and “n” columns out of the things you find inside a set we’ll call H, then you write that as Hm x n. I’m not saying you should do this, just that if you need to, that’s how to tell people what you’re doing.

Now. We don’t really have a lot of functions that use matrices as domains, and I can think of fewer that use matrices as ranges. There are a couple of valuable ones, ones so valuable they get special names like “eigenvalue” and “eigenvector”. (Don’t worry about what those are.) They take in Rm x n or Cm x n and return a set of real- or complex-valued numbers, or real- or complex-valued vectors. Not even those, actually. Eigenvalues and eigenvectors are only meaningful if there are exactly as many rows as columns. That is, for Rm x m and Cm x m. These are known as “square” matrices, just as you might guess if you were shaken awake and ordered to say what you guessed a “square matrix” might be.
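
Numpy will happily produce eigenvalues and eigenvectors for a square matrix. A sketch, with a matrix chosen just for show:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])    # a square (2-by-2) real matrix

values, vectors = np.linalg.eig(S)
print(values)                 # eigenvalues 3 and 1 (order may vary)
print(vectors[:, 0])          # the eigenvector paired with the first eigenvalue
```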

They’re important functions. There are some other important functions, with names like “rank” and “condition number” and the like. But they’re not many. I believe they’re not even thought of as functions, any more than we think of “the length of a vector” as primarily a function. They’re just properties of these matrices, that’s all.

So why are they worth knowing? Besides the joy that comes of knowing something, I mean?

Here’s one answer, and the one that I find most compelling. There is cultural bias in this: I come from an applications-heavy mathematical heritage. We like differential equations, which study how stuff changes in time and in space. It’s very easy to go from differential equations to ordered sets of equations. The first equation may describe how the position of particle 1 changes in time. It might describe how the velocity of the fluid moving past point 1 changes in time. It might describe how the temperature measured by sensor 1 changes as it moves. It doesn’t matter. We get a set of these equations together and we have a majestic set of differential equations.

Now, the dirty little secret of differential equations: we can’t solve them. Most interesting physical phenomena are nonlinear. Linear stuff is easy. Small change 1 has effect A; small change 2 has effect B. If we make small change 1 and small change 2 together, this has effect A plus B. Nonlinear stuff, though … it just doesn’t work. Small change 1 has effect A; small change 2 has effect B. Small change 1 and small change 2 together has effect … A plus B plus some weird A times B thing plus some effect C that nobody saw coming and then C does something with A and B and now maybe we’d best hide.

There are some nonlinear differential equations we can solve. Those are the result of heroic work and brilliant insights. Compared to all the things we would like to solve there’s not many of them. Methods to solve nonlinear differential equations are as precious as ways to slay krakens.

But here’s what we can do. What we usually like to know about in systems are equilibriums. Those are the conditions in which the system stops changing. Those are interesting. We can usually find those points by boring but not conceptually challenging calculations. If we can’t, we can declare x0 represents the equilibrium. If we still care, we leave calculating its actual values to the interested reader or hungry grad student.

But what’s really interesting is: what happens if we’re near but not exactly at the equilibrium? Sometimes, we stay near it. Think of pushing a swing. However good a push you give, it’s going to settle back to the boring old equilibrium of dangling straight down. Sometimes, we go racing away from it. Think of trying to balance a pencil on its tip; if we did this perfectly it would stay balanced. It never does. We’re never perfect, or there’s some wind or somebody walks by and the perfect balance is foiled. It falls down and doesn’t bounce back up. Sometimes, whether it stays near or goes away depends on what way it’s away from the equilibrium.

And now we finally get back to matrices. Suppose we are starting out near an equilibrium. We can, usually, approximate the differential equations that describe what will happen. The approximation may only be good if we’re just a tiny bit away from the equilibrium, but that might be all we really want to know. That approximation will be some linear differential equations. (If they’re not, then we’re just wasting our time.) And that system of linear differential equations we can describe using matrices.

If we can write what we are interested in as a set of linear differential equations, then we have won. We can use the many powerful tools of matrix arithmetic — linear algebra, specifically — to tell us everything we want to know about the system. We can say whether a small push away from the equilibrium stays small, or whether it grows, or whether it depends. We can say how fast the small push shrinks, or grows (for a while). We can say how the system will change, approximately.
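
A sketch of that payoff, assuming numpy. Near a stable equilibrium a mass on a damped spring linearizes to a system dx/dt = A x, and whether a small push dies away comes down to every eigenvalue of A having a negative real part. The constants here are toy values of my own.

```python
import numpy as np

k, c, m = 1.0, 0.5, 1.0             # spring constant, damping, mass: made up
A = np.array([[0.0,   1.0],
              [-k/m, -c/m]])        # the linearized system near equilibrium

eigenvalues = np.linalg.eigvals(A)
print(eigenvalues)                   # a complex pair with real part -0.25
print(np.all(eigenvalues.real < 0))  # True: small pushes shrink away
```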

This is what I love in matrices. It’s not everything there is to them. But it’s enough to make matrices important to me.

A Summer 2015 Mathematics A To Z: z-transform


z-transform.

The z-transform comes to us from signal processing. The signal we take to be a sequence of numbers, all representing something sampled at uniformly spaced times. The temperature at noon. The power being used, second-by-second. The number of customers in the store, once a month. Anything. The sequence of numbers we take to stretch back into the infinitely great past, and to stretch forward into the infinitely distant future. If it doesn’t, then we pad the sequence with zeroes, or some other safe number that we know means “nothing”. (That’s another classic mathematician’s trick.)

It’s convenient to have a name for this sequence. “a” is a good one. The different sampled values are denoted by an index. a0 represents whatever value we have at the “start” of the sample. That might represent the present. That might represent where sampling began. That might represent just some convenient reference point. It’s the equivalent of mileage marker zero; we have to have something be the start.

a1, a2, a3, and so on are the first, second, third, and so on samples after the reference start. a-1, a-2, a-3, and so on are the first, second, third, and so on samples from before the reference start. That might be the last couple of values before the present.

So for example, suppose the temperatures the last several days were 77, 81, 84, 82, 78. Then we would probably represent this as a-4 = 77, a-3 = 81, a-2 = 84, a-1 = 82, a0 = 78. We’ll hope this is Fahrenheit or that we are remotely sensing a temperature.

The z-transform of a sequence of numbers is something that looks a lot like a polynomial, based on these numbers. For this five-day temperature sequence the z-transform would be the polynomial 77 z^4 + 81 z^3 + 84 z^2 + 82 z^1 + 78 z^0 . (z^1 is the same as z. z^0 is the same as the number “1”. I wrote it this way to make the pattern more clear.)

I would not be surprised if you protested that this doesn’t merely look like a polynomial but actually is one. You’re right, of course, for this set, where all our samples are from negative (and zero) indices. If we had positive indices then we’d lose the right to call the transform a polynomial. Suppose we trust our weather forecaster completely, and add in a1 = 83 and a2 = 76. Then the z-transform for this set of data would be 77 z^4 + 81 z^3 + 84 z^2 + 82 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2 . You’d probably agree that’s not a polynomial, although it looks a lot like one.

The use of z for these polynomials is basically arbitrary. The main reason to use z instead of x is that we can learn interesting things if we imagine letting z be a complex-valued number. And z carries connotations of “a possibly complex-valued number”, especially if it’s used in ways that suggest we aren’t looking at coordinates in space. It’s not that there’s anything in the symbol x that refuses the possibility of it being complex-valued. It’s just that z appears so often in the study of complex-valued numbers that it reminds a mathematician to think of them.

A sound question you might have is: why do this? There’s not much advantage in going from a list of temperatures “77, 81, 84, 82, 78, 83, 76” over to a polynomial-like expression 77 z^4 + 81 z^3 + 84 z^2 + 82 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2 .

Where this starts to get useful is when we have an infinitely long sequence of numbers to work with. Yes, it does too. It will often turn out that an interesting sequence transforms into a polynomial that itself is equivalent to some easy-to-work-with function. My little temperature example there won’t do it, no. But consider the sequence that’s zero for all negative indices, and 1 for the zero index and all positive indices. This gives us the polynomial-like structure \cdots + 0z^2 + 0z^1 + 1 + 1\left(\frac{1}{z}\right)^1 + 1\left(\frac{1}{z}\right)^2 + 1\left(\frac{1}{z}\right)^3 + 1\left(\frac{1}{z}\right)^4 + \cdots . And that turns out to be the same as 1 \div \left(1 - \left(\frac{1}{z}\right)\right) . That’s much shorter to write down, at least.
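
You can watch that happen numerically. A tiny sketch, mine and not part of the original essay, for a z big enough that the sum settles down:

```python
# Partial sums of 1 + 1/z + 1/z**2 + ... against the closed form, at z = 2.
z = 2.0

partial = sum((1/z)**k for k in range(200))
closed_form = 1 / (1 - 1/z)
print(partial, closed_form)   # both print 2.0
```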

Probably you’ll grant that, but still wonder what the point of doing that is. Remember that we started by thinking of signal processing. A processed signal is a matter of transforming your initial signal. By this we mean multiplying your original signal by something, or adding something to it. For example, suppose we want a five-day running average temperature. This we can find by taking one-fifth today’s temperature, a0, and adding to that one-fifth of yesterday’s temperature, a-1, and one-fifth of the day before’s temperature a-2, and one-fifth a-3, and one-fifth a-4.

The effect of processing a signal is equivalent to manipulating its z-transform. By studying properties of the z-transform, such as where its values are zero or where they are imaginary or where they are undefined, we learn things about what the processing is like. We can tell whether the processing is stable — does it keep a small error in the original signal small, or does it magnify it? Does it serve to amplify parts of the signal and not others? Does it dampen unwanted parts of the signal while keeping the main signal intact?
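
For the five-day average, that z-transform works out, in this essay’s convention, to \frac{1}{5}\left(1 + z + z^2 + z^3 + z^4\right) . Here’s a numpy sketch of mine showing where its zeros sit, and running the average over the week of temperatures from above:

```python
import numpy as np

# Zeros of the averaging transform: the roots of z**4 + z**3 + z**2 + z + 1.
zeros = np.roots([1, 1, 1, 1, 1])
print(np.abs(zeros))    # all 1.0, up to round-off: the zeros sit on the unit circle

# The averaging itself, applied to the week of temperatures above.
temps = np.array([77, 81, 84, 82, 78, 83, 76], dtype=float)
print(np.convolve(temps, np.ones(5)/5, mode='valid'))   # the running averages
```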

We can understand how data will be changed by understanding the z-transform of the way we manipulate it. That z-transform turns a signal-processing idea into a complex-valued function. And we have a lot of tools for studying complex-valued functions. So we become able to say a lot about the processing. And that is what the z-transform gets us.

A Summer 2015 Mathematics A To Z: well-posed problem


Well-Posed Problem.

This is another mathematical term almost explained by what the words mean in English. Probably you’d guess a well-posed problem to be a question whose answer you can successfully find. This also implies that there is an answer, and that it can be found by some method other than guessing luckily.

Mathematicians demand three things of a problem to call it “well-posed”. The first is that a solution exists. The second is that a solution has to be unique. It’s imaginable there might be several answers that answer a problem. In that case we weren’t specific enough about what we’re looking for. Or we should have been looking for a set of answers instead of a single answer.

The third requirement takes some time to understand. It’s that the solution has to vary continuously with the initial conditions. That is, suppose we started with a slightly different problem. If the answer would look about the same, then the problem was well-posed to begin with. Suppose we’re looking at the problem of how a block of ice gets melted by a heater set in its center. The way that melts won’t change much if the heater is a little bit hotter, or if it’s moved a little bit off center. This heating problem is well-posed.

There are problems that don’t have this continuous variation, though. Typically these are “inverse problems”. That is, they’re problems in which you look at the outcome of something and try to say what caused it. That would be looking at the puddle of melted water and the heater and trying to say what the original block of ice looked like. There are a lot of blocks of ice that all look about the same once melted, and there’s no way of telling which was the one you started with.

You might think of these conditions as “there’s an answer, there’s only one answer, and you can find it”. That’s good enough as a memory aid, but it isn’t quite so. A problem’s solution might have this continuous variation, but still be “numerically unstable”. This is a difficulty you can run across when you try doing calculations on a computer.

You know the thing where on a calculator you type in 1 / 3 and get back 0.333333? And you multiply that by three and get 0.999999 instead of exactly 1? That’s the thing that underlies numerical instability. We want to work with numbers, but the calculator or computer will let us work with only an approximation to them. 0.333333 is close to 1/3, but isn’t exactly that.
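
You can reenact the calculator story in Python, which lets us cap the precision at six digits. A sketch of mine, using the standard decimal module:

```python
from decimal import Decimal, getcontext

getcontext().prec = 6           # keep six significant digits, calculator-style

third = Decimal(1) / Decimal(3)
print(third)                    # 0.333333
print(third * 3)                # 0.999999, not exactly 1
```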

For many calculations the difference doesn’t matter. 0.999999 is really quite close to 1. If you lost 0.000001 parts of every dollar you earned there’s a fine chance you’d never even notice. But in some calculations, numerically unstable ones, that difference matters. It gets magnified until the error created by the difference between the number you want and the number you can calculate with is too big to ignore. In that case we call the calculation we’re doing “ill-conditioned”.
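
Here’s an ill-conditioned calculation you can poke at, a sketch assuming numpy and scipy. The Hilbert matrix is a famous troublemaker: its condition number is enormous, and solving a linear system with it magnifies round-off far beyond the last digit.

```python
import numpy as np
from scipy.linalg import hilbert

H = hilbert(10)                    # the 10-by-10 Hilbert matrix
print(np.linalg.cond(H))           # around 1.6e13: a huge magnification factor

x_true = np.ones(10)
b = H @ x_true                     # a problem whose answer we know exactly
x_computed = np.linalg.solve(H, b)
print(np.max(np.abs(x_computed - x_true)))   # error far above round-off size
```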

And it’s possible for a problem to be well-posed but ill-conditioned. This is annoying and is why numerical mathematicians earn the big money, or will tell you they should. Trying to calculate the answer will be so likely to give something meaningless that we can’t trust the work that’s done. But often it’s possible to rework a calculation into something equivalent but well-conditioned. And a well-posed, well-conditioned problem is great. Not only can we find its solution, but we can usually have a computer do the calculations, and that’s a great breakthrough.

Conditions of equilibrium and stability


This month Peter Mander’s CarnotCycle blog talks about the interesting world of statistical equilibriums. And particularly it talks about stable equilibriums. A system’s in equilibrium if it isn’t going to change over time. It’s in a stable equilibrium if being pushed a little bit out of equilibrium isn’t going to make the system unpredictable.

For simple physical problems these are easy to understand. For example, a marble resting at the bottom of a spherical bowl is in a stable equilibrium. At the exact bottom of the bowl, the marble won’t roll away. If you give the marble a little nudge, it’ll roll around, but it’ll stay near where it started. A marble sitting on the top of a sphere is in an equilibrium — if it’s perfectly balanced it’ll stay where it is — but it’s not a stable one. Give the marble a nudge and it’ll roll away, never to come back.

In statistical mechanics we look at complicated physical systems, ones with thousands or millions or even really huge numbers of particles interacting. But there are still equilibriums, some stable, some not. In these, stuff will still happen, but the kind of behavior doesn’t change. Think of a steadily-flowing river: none of the water is staying still, or close to it, but the river isn’t changing.

CarnotCycle describes how to tell, from properties like temperature and pressure and entropy, when systems are in a stable equilibrium. These are properties that don’t tell us a lot about what any particular particle is doing, but they can describe the whole system well. The essay is higher-level than usual for my blog. But if you’re taking a statistical mechanics or thermodynamics course this is just the sort of essay you’ll find useful.


In terms of simplicity, purely mechanical systems have an advantage over thermodynamic systems in that stability and instability can be defined solely in terms of potential energy. For example the center of mass of the tower at Pisa, in its present state, must be higher than in some infinitely near positions, so we can conclude that the structure is not in stable equilibrium. This will only be the case if the tower attains the condition of metastability by returning to a vertical position or absolute stability by exceeding the tipping point and falling over.

Thermodynamic systems lack this simplicity, but in common with purely mechanical systems, thermodynamic equilibria are always metastable or stable, and never unstable. This is equivalent to saying that every spontaneous (observable) process proceeds towards an equilibrium state, never away from it.

If we restrict our attention to a thermodynamic system of unchanging composition and apply…


A Second Way To Fall Over


I admit not being perfectly satisfied with my answer, about whether a box is easier to tip over by pushing on the middle of one of its top edges or by pushing on its corner, which I got just by comparing the energy each approach needs to raise the box’s center of mass above the pivot. It’s straightforward enough, but I don’t do this sort of calculation often, so maybe I’m looking at the wrong things. Can I find another, independent, line of argument? If I can, does that get to the same answer? If it does, good. If it doesn’t, then I get to wonder which line of argument I believe in more. So here’s one.


One Way To Fall Over


[ Huh. My statistics page says that someone came to me yesterday looking for the “mathematics behind rap music”. I don’t doubt there is such mathematics, but I’ve never written anything approaching it. I admit that despite the long intertwining of mathematics and music, and my own childhood of playing a three-quarter size violin in a way that must be characterized as “technically playing”, I don’t know anything nontrivial about the mathematics of any music. So, whoever was searching for that, I’m sorry to have disappointed you. ]

Now, let me try my first guess at saying whether it’s easier to tip the cube over by pushing along the middle of the edge or by pushing at the corner. Last time around I laid out the ground rules, and particularly the symbols used for the size of the box (it’s of length a) and how far the center of mass (the dead center of the box) is from the edges and the corners. Here’s my first thought about what has to be done to tip the box over: we have to make the box pivot on some point — along one edge, if we’re pushing on the edge; along one corner, if we’re pushing on the corner — and so make it start to roll. If we can raise the center of mass above the pivot then we can drop the box back down with some other face to the floor, which has to count as tipping the box over. If we don’t raise the center of mass we aren’t tipping the box at all, we’re just shoving it.
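
For a cube of side a, the pivot geometry is easy enough to set down in code. This is a sketch of my own, not the calculation the post goes on to do, so take it as a preview of the quantities involved rather than the verdict:

```python
import math

a = 1.0                        # side of the cube; the comparison doesn't depend on it

start = a / 2                              # resting height of the center of mass
to_edge = math.hypot(a/2, a/2)             # center's distance to a bottom edge
to_corner = math.sqrt(3) * a / 2           # center's distance to a bottom corner

print(to_edge - start)      # rise needed pivoting on an edge:  about 0.207 * a
print(to_corner - start)    # rise needed pivoting on a corner: about 0.366 * a
```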


Tipping The Toy


My brother phoned to remind me how much more generally nervous I should be about things, as well as to ask my opinion in an utterly pointless dispute he was having with his significant other. The dispute was over no stakes whatsoever and had no consequences of any practical value so I can see why it’d call for an outside expert. It’s more one of physics, but I did major in physics long ago, and it’s easier to treat mathematically anyway, and it was interesting enough that I spent the rest of the night working it out and I’m still not positive I’m unambiguously right. I could probably find out for certain with some simple experiments, but that would be precariously near trying, and so is right out. Let me set up the problem, though, since it’s interesting and should offer room for people to argue I’m completely wrong.

