## My best tea-refilling strategy

The problem I’d set out last week: I have a teapot good for about three cups of tea. I want to put milk in the once, before the first cup. How much should I drink before topping up the cup, to have the most milk at the end?

I have expectations. Some of this I know from experience, doing other problems where things get replaced at random. Here, tea or milk particles get swallowed at random, and replaced with tea particles. Yes, ‘particle’ is a strange word to apply to “a small bit of tea”. But it’s not like I can call them tea molecules. “Particle” will do and stop seeming weird someday.

Random replacement problems tend to be exponential decays. That I know from experience doing problems like this. So if I get an answer that doesn’t look like an exponential decay I’ll doubt it. I might be right, but I’ll need more convincing.

I also get some insight from extreme cases. We can call them reductios. Here “reductio” as in the word we usually follow with “ad absurdum”. Make the case ridiculous and see if that offers insight. The first reductio is to suppose I drink the entire first cup down to the last particle, then pour new tea in. By the second cup, there’s no milk left. The second reductio is to suppose I drink not a bit of the first cup of milk-with-tea. Then I have the most milk preserved. It’s not a satisfying break. But it leads me to suppose the most milk makes it through to the end if I have a lot of small sips and replacements of tea. And to look skeptically if my work suggests otherwise.

So that’s what I expect. What actually happens? Here, I do a bit of reasoning. Suppose that I have a mug. It can hold up to 1 unit of tea-and-milk. And the teapot, which holds up to 2 more units of tea-and-milk. What units? For the mathematics, I don’t care.

I’m going to suppose that I start with some amount — call it $a$ — of milk. $a$ is some number between 0 and 1. I fill the cup up to full, that is, 1 unit of tea-and-milk. And I drink some amount of the mixture. Call the amount I drink $x$. It, too, is between 0 and 1. After this, I refill the mug up to full, so, putting in $x$ units of tea. And I repeat this until I empty the teapot. So I can do this $\frac{2}{x}$ times.

I know you noticed that I’m short on tea here. The teapot should hold 3 units of tea. I’m only pouring out $3 - a$. I could be more precise by refilling the mug $\frac{2 + a}{x}$ times. I’m also going to suppose that I refill the mug with $x$ amount of tea a whole number of times. This sounds necessarily true. But consider: what if I drank and re-filled three-quarters of a cup of tea each time? How much tea is poured that third time?

I make these simplifications for good reasons. They reduce the complexity of the calculations I do without, I trust, making the result misleading. I can justify it too. I don’t drink tea from a graduated cylinder. It’s a false precision to pretend I do. I drink (say) about half my cup and refill it. How much tea I get in the teapot is variable too. Also, I don’t want to do that much work for this problem.

In fact, I’m going to do most of the work of this problem with a single drawing of a square. Here it is.

So! I start out with $a$ units of tea in the mixture. After drinking $x$ units of milk-and-tea, what’s left is $a\cdot(1 - x)$ units of milk in the mixture.

How about the second refill? The process is the same as the first refill. But where, before, there had been $a$ units of milk in the tea, now there are only $a\cdot(1 - x)$ units in. So that horizontal strip is a little narrower is all. The same reasoning applies and so, after the second refill, there’s $a\cdot(1 - x)\cdot(1 - x)$ milk in the mixture.

If you nodded to that, you’d agree that after the third refill there’s $a\cdot(1 - x)\cdot(1 - x)\cdot(1 - x)$. And are pretty sure what happens at the fourth and fifth and so on. If you didn’t nod to that, it’s all right. If you’re willing to take me on faith we can continue. If you’re not, that’s good too. Try doing a couple drawings yourself and you may convince yourself. If not, I don’t know. Maybe try, like, getting six white and 24 brown beads, stir them up, take out four at random. Replace all four with brown beads and count, and do that several times over. If you’re short on beads, cut up some paper into squares and write ‘B’ and ‘W’ on each square.

But anyone comfortable with algebra can see how to reduce this. The amount of milk remaining after j refills is going to be

$a\cdot(1 - x)^j$

How many refills does it take to run out of tea? That we knew from above: it’s $\frac{2}{j}$ refills. So my last full mug of tea will have left in it

$a\cdot(1 - x)^{\frac{2}{x}}$

units of milk.

Anyone who does differential equations recognizes this. It’s the discrete approximation of the exponential decay curve. Discrete, here, because we take out some finite but nonzero amount of milk-and-tea, $x$, and replace it with the same amount of pure tea.

Now, again, I’ve seen this before so I know its conclusions. The most milk will make it to the end of $x$ is as small as possible. The best possible case would be if I drink and replace an infinitesimal bit of milk-and-tea each time. Then the last mug would end with $a\cdot e^{-2}$ of milk. That’s $e$ as in the base of the natural logarithm. Every mathematics problem has an $e$ somewhere in it and I’m not exaggerating much. All told this would be about 13 and a half percent of the original milk.

Drinking more realistic amounts, like, half the mug before refilling, makes the milk situation more dire. Replacing half the mug at a time means the last full mug has only one-sixteenth what I started with. Drinking a quarter of the mug and replacing it lets about one-tenth the original milk survive.

But all told the lesson is clear. If I want milk in the last mug, I should put some in each refill. Putting all the milk in at the start and letting it dissolve doesn’t work.

## My All 2020 Mathematics A to Z: Big-O and Little-O Notation

Mr Wu, author of the Singapore Maths Tuition blog, asked me to explain a technical term today. I thought that would be a fun, quick essay. I don’t learn very fast, do I?

A note on style. I make reference here to “Big-O” and “Little-O”, capitalizing and hyphenating them. This is to give them visual presence as a name. In casual discussion they’re just read, or said, as the two words or word-and-a-letter. Often the Big- or Little- gets dropped and we just talk about O. An O, without further context, in my experience means Big-O.

The part of me that wants smooth consistency in prose urges me to write “Little-o”, as the thing described is represented with a lowercase ‘o’. But Little-o sounds like a midway game or an Eyerly Aircraft Company amusement park ride. And I never achieve consistency in my prose anyway. Maybe for the book publication. Until I’m convinced another is better, though, “Little-O” it is.

# Big-O and Little-O Notation.

When I first went to college I had a campus post office box. I knew my box number. I also knew the length of the sluggish line for the combination lock code. The lock was a dial, lettered A through J. Being a young STEM-class idiot I thought, boy, would it actually be quicker to pick the lock than wait for the line? A three-letter combination, of ten options? That’s 1,000 possibilities. If I could try five a minute that’s, at worst, three hours 20 minutes. Combination might be anywhere in that set; I might get lucky. I could expect to spend 80 minutes picking my lock.

I decided to wait in line instead, and good that I did. I was unaware combination might not be a letter, like ‘A’. It could be the midway point between adjacent letters, like ‘AB’. That meant there were eight times as many combinations as I estimated, and I could expect to spend over ten hours. Even the slow line was faster than that. It transpired that my combination had two of these midway letters.

But that’s a little demonstration of algorithmic complexity. Also in cracking passwords by trial-and-error. Doubling the set of possible combination codes octuples the time it takes to break into the set. Making the combination longer would also work; each extra letter would multiply the cracking time by twenty. So you understand why your password should include “special characters” like punctuation, but most of all should be long.

We’re often interested in how long to expect a task to take. Sometimes we’re interested in the typical time it takes. Often we’re interested in the longest it could ever take. If we have a deterministic algorithm, we can say. We can count how many steps it takes. Sometimes this is easy. If we want to add two two-digit numbers together we know: it will be, at most, three single-digit additions plus, maybe, writing down a carry. (To add 98 and 37 is adding 8 + 7 to get 15, to add 9 + 3 to get 12, and to take the carry from the 15, so, 1 + 12 to get 13, so we have 135.) We can get a good quarrel going about what “a single step” is. We can argue whether that carry into the hundreds column is really one more addition. But we can agree that there is some smallest bit of arithmetic work, and work from that.

For any algorithm we have something that describes how big a thing we’re working on. It’s often ‘n’. If we need more than one variable to describe how big it is, ‘m’ gets called up next. If we’re estimating how long it takes to work on a number, ‘n’ is the number of digits in the number. If we’re thinking about a square matrix, ‘n’ is the number of rows and columns. If it’s a not-square matrix, then ‘n’ is the number of rows and ‘m’ the number of columns. Or vice-versa; it’s your matrix. If we’re looking for an item in a list, ‘n’ is the number of items in the list. If we’re looking to evaluate a polynomial, ‘n’ is the order of the polynomial.

In normal circumstances we don’t work out how many steps some operation does take. It’s more useful to know that multiplying these two long numbers would take about 900 steps than that it would need only 816. And so this gives us an asymptotic estimate. We get an estimate of how much longer cracking the combination lock will take if there’s more letters to pick from. This allowing that some poor soul will get the combination A-B-C.

There are a couple ways to describe how long this will take. The more common is the Big-O. This is just the letter, like you find between N and P. Since that’s easy, many have taken to using a fancy, vaguely cursive O, one that looks like $\mathcal{O}$. I agree it looks nice. Particularly, though, we write $\mathcal{O}(f(n))$, where f is some function. In practice, we’ll see functions like $\mathcal{O}(n)$ or $\mathcal{O}(n^2 \log(n))$ or $\mathcal{O}(n^3)$. Usually something simple like that. It can be tricky. There’s a scheme for multiplying large numbers together that’s $\mathcal{O}(n \cdot 2^{\sqrt{2 log (n)}} \cdot log(n))$. What you will not see is something like $\mathcal{O}(\sin (n))$, or $\mathcal{O}(n^3 - n^4)$ or such. This comes to what we mean by the Big-O.

It’ll be convenient for me to have a name for the actual number of steps the algorithm takes. Let me call the function describing that g(n). Then g(n) is $\mathcal{O}(f(n))$ if once n gets big enough, g(n) is always less than C times f(n). Here c is some constant number. Could be 1. Could be 1,000,000. Could be 0.00001. Doesn’t matter; it’s some positive number.

There’s some neat tricks to play here. For example, the function ‘$n$‘ is $\mathcal{O}(n)$. It’s also $\mathcal{O}(n^2)$ and $\mathcal{O}(n^9)$ and $\mathcal{O}(e^{n})$. The function ‘$n^2$ is also $\mathcal{O}(n^2)$ and those later terms, but it is not $\mathcal{O}(n)$. And you can see why $\mathcal{O}(\sin(n))$ is right out.

There is also a Little-O notation. It, too, is an upper bound on the function. But it is a stricter bound, setting tighter restrictions on what g(n) is like. You ask how it is the stricter bound gets the minuscule letter. That is a fine question. I think it’s a quirk of history. Both symbols come to us through number theory. Big-O was developed first, published in 1894 by Paul Bachmann. Little-O was published in 1909 by Edmund Landau. Yes, the one with the short Hilbert-like list of number theory problems. In 1914 G H Hardy and John Edensor Littlewood would work on another measure and they used Ω to express it. (If you see the letter used for Big-O and Little-O as the Greek omicron, then you see why a related concept got called omega.)

What makes the Little-O measure different is its sternness. g(n) is $o(f(n))$ if, for every positive number C, whenever n is large enough g(n) is less than or equal to C times f(n). I know that sounds almost the same. Here’s why it’s not.

If g(n) is $\mathcal{O}(f(n))$, then you can go ahead and pick a C and find that, eventually, $g(n) \le C f(n)$. If g(n) is $o(f(n))$, then I, trying to sabotage you, can go ahead and pick a C, trying my best to spoil your bounds. But I will fail. Even if I pick, like a C of one millionth of a billionth of a trillionth, eventually f(n) will be so big that $g(n) \le C f(n)$. I can’t find a C small enough that f(n) doesn’t eventually outgrow it, and outgrow g(n).

This implies some odd-looking stuff. Like, that the function n is not $o(n)$. But the function n is at least $o(n^2)$, and $o(n^9)$ and those other fun variations. Being Little-O compels you to be Big-O. Big-O is not compelled to be Little-O, although it can happen.

These definitions, for Big-O and Little-O, I’ve laid out from algorithmic complexity. It’s implicitly about functions defined on the counting numbers. But there’s no reason I have to limit the ideas to that. I could define similar ideas for a function g(x), with domain the real numbers, and come up with an idea of being on the order of f(x).

We make some adjustments to this. The important one is that, with algorithmic complexity, we assumed g(n) had to be a positive number. What would it even mean for something to take minus four steps to complete? But a regular old function might be zero or negative or change between negative and positive. So we look at the absolute value of g(x). Is there some value of C so that, when x is big enough, the absolute value of g(x) stays less than C times f(x)? If it does, then g(x) is $\mathcal{O}(f(x))$. Is it the case that for every positive number C it’s true that g(x) is less than C times f(x), once x is big enough? Then g(x) is $o(f(x))$.

Fine, but why bother defining this?

A compelling answer is that it gives us a way to describe how different a function is from an approximation to that function. We are always looking for approximations to functions because most functions are hard. We have a small set of functions we like to work with. Polynomials are great numerically. Exponentials and trig functions are great analytically. That’s about all the functions that are easy to work with. Big-O notation particularly lets us estimate how bad an error we make using the approximation.

For example, the Runge-Kutta method numerically approximates solutions to ordinary differential equations. It does this by taking the information we have about the function at some point x to approximate its value at a point x + h. ‘h’ is some number. The difference between the actual answer and the Runge-Kutta approximation is $\mathcal{O}(h^4)$. We use this knowledge to make sure our error is tolerable. Also, we don’t usually care what the function is at x + h. It’s just what we can calculate. What we want is the function at some point a fair bit away from x, call it x + L. So we use our approximate knowledge of conditions at x + h to approximate the function at x + 2h. And use x + 2h to tell us about x + 3h, and from that x + 4h and so on, until we get to x + L. We’d like to have as few of these uninteresting intermediate points as we can, so look for as big an h as is safe.

That context may be the more common one. We see it, particularly, in Taylor Series and other polynomial approximations. For example, the sine of a number is approximately:

$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} + \mathcal{O}(x^{11})$

This has consequences. It tells us, for example, that if x is about 0.1, this approximation is probably pretty good. So it is: the sine of 0.1 (radians) is about 0.0998334166468282 and that’s exactly what five terms here gives us. But it also warns that if x is about 10, this approximation may be gibberish. And so it is: the sine of 10.0 is about -0.5440 and the polynomial is about 1448.27.

The connotation in using Big-O notation here is that we look for small h’s, and for $\mathcal{O}(x)$ to be a tiny number. It seems odd to use the same notation with a large independent variable and with a small one. The concept carries over, though, and helps us talk efficiently about this different problem.

I hope this week to post the Playful Math Education Blog Carnival for September. Any educational or recreational or fun mathematics sites you know about would be greatly helpful to me and them. Thanks for your help.

Lastly, I am open for mathematics topics starting with P, Q, and R to write about next month. I’ve basically chosen my ‘P’ subject, though I’d be happy to hear alternatives for ‘Q’ and ‘R’ yet.

## My All 2020 Mathematics A to Z: Jacobi Polynomials

Mr Wu, author of the Singapore Maths Tuition blog, gave me a good nomination for this week’s topic: the j-function of number theory. Unfortunately I concluded I didn’t understand the function well enough to write about it. So I went to a topic of my own choosing instead.

The Jacobi Polynomials discussed here are named for Carl Gustav Jacob Jacobi. Jacobi lived in Prussia in the first half of the 19th century. Though his career was short, it was influential. I’ve already discussed the Jacobian, which describes how changes of variables change volume. He has a host of other things named for him, most of them in matrices or mathematical physics. He was also a pioneer in those elliptic curves you hear so much about these days.

# Jacobi Polynomials.

Jacobi Polynomials are a family of functions. Polynomials, it happens; this is a happy case where the name makes sense. “Family” is the name mathematicians give to a bunch of functions that have some similarity. This often means there’s a parameter, and each possible value of the parameter describes a different function in the family. For example, we talk about the family of sine functions, $S_n(z)$. For every integer n we have the function $S_n(z) = \sin(n z)$ where z is a real number between -π and π.

We like a family because every function in it gives us some nice property. Often, the functions play nice together, too. This is often something like mutual orthogonality. This means two different representatives of the family are orthogonal to one another. “Orthogonal” means “perpendicular”. We can talk about functions being perpendicular to one another through a neat mechanism. It comes from vectors. It’s easy to use vectors to represent how to get from one point in space to another. From vectors we define a dot product, a way of multiplying them together. A dot product has to meet a couple rules that are pretty easy to do. And if you don’t do anything weird? Then the dot product between two vectors is the cosine of the angle made by the end of the first vector, the origin, and the end of the second vector.

Functions, it turns out, meet all the rules for a vector space. (There are not many rules to make a vector space.) And we can define something that works like a dot product for two functions. Take the integral, over the whole domain, of the first function times the second. This meets all the rules for a dot product. (There are not many rules to make a dot product.) Did you notice me palm that card? When I did not say “the dot product is take the integral …”? That card will come back. That’s for later. For now: we have a vector space, we have a dot product, we can take arc-cosines, so why not define the angle between functions?

Mostly we don’t because we don’t care. Where we do care? We do like functions that are at right angles to one another. As with most things mathematicians do, it’s because it makes life easier. We’ll often want to describe properties of a function we don’t yet know. We can describe the function we don’t yet know as the sum of coefficients — some fixed real number — times basis functions that we do know. And then our problem of finding the function changes to one of finding the coefficients. If we picked a set of basis functions that are all orthogonal to one another, the finding of these coefficients gets easier. Analytically and numerically: we can often turn each coefficient into its own separate problem. Let a different computer, or at least computer process, work on each coefficient and get the full answer much faster.

The Jacobi Polynomials have three coefficients. I see them most often labelled α, β, and n. Likely you imagine this means it’s a huge family. It is huger than that. A zoologist would call this a superfamily, at least. Probably an order, possibly a class.

It turns out different relationships of these coefficients give you families of functions. Many of these families are noteworthy enough to have their own names. For example, if α and β are both zero, then the Jacobi functions are a family also known as the Legendre Polynomials. This is a great set of orthogonal polynomials. And the roots of the Legendre Polynomials give you information needed for Gaussian quadrature. Gaussian quadrature is a neat trick for numerically integrating a function. Take a weighted sum of the function you’re integrating evaluated at a set of points. This can get a very good — maybe even perfect — numerical estimate of the integral. The points to use, and the weights to use, come from a Legendre polynomial.

If α and β are both $-\frac{1}{2}$ then the Jacobi Polynomials are the Chebyshev Polynomials of the first kind. (There’s also a second kind.) These are handy in approximation theory, describing ways to better interpolate a polynomial from a set of data. They also have a neat, peculiar relationship to the multiple-cosine formulas. Like, $\cos(2\theta) = 2\cos^2(\theta) - 1$. And the second Chebyshev polynomial is $T_2(x) = 2x^2 - 1$. Imagine sliding between x and $cos(\theta)$ and you see the relationship. $cos(3\theta) = 4 \cos^3(\theta) - 3\cos(\theta)$ and $T_3(x) = 4x^3 - 3x$. And so on.

Chebyshev Polynomials have some superpowers. One that’s most amazing is accelerating convergence. Often a numerical process, such as finding the solution of an equation, is an iterative process. You can’t find the answer all at once. You instead find an approximation and do something that improves it. Each time you do the process, you get a little closer to the true answer. This can be fine. But, if the problem you’re working on allows it, you can use the first couple iterations of the solution to figure out where this is going. The result is that you can get very good answers using the same amount of computer time you needed to just get decent answers. The trade, of course, is that you need to understand Chebyshev Polynomials and accelerated convergence. We always have to make trades like that.

Back to the Jacobi Polynomials family. If α and β are the same number, then the Jacobi functions are a family called the Gegenbauer Polynomials. These are great in mathematical physics, in potential theory. You can turn the gravitational or electrical potential function — that one-over-the-distance-squared force — into a sum of better-behaved functions. And they also describe zonal spherical harmonics. These let you represent functions on the surface of a sphere as the sum of coefficients times basis functions. They work in much the way the terms of a Fourier series do.

If β is zero and there’s a particular relationship between α and n that I don’t want to get into? The Jacobi Polynomials become the Zernike Polynomials, which I never heard of before this paragraph either. I read they are the tools you need to understand optics, and particularly how lenses will alter the light passing through.

Since the Jacobi Polynomials have a greater variety of form than even poison ivy has, you’ll forgive me not trying to list them. Or even listing a representative sample. You might also ask how they’re related at all.

Well, they all solve the same differential equation, for one. Not literally a single differential equation. A family of differential equations, where α and β and n turn up in the coefficients. The formula using these coefficients is the same in all these differential equations. That’s a good reason to see a relationship. Or we can write the Jacobi Polynomials as a series, a function made up of the sum of terms. The coefficients for each of the terms depends on α and β and n, always in the same way. I’ll give you that formula. You won’t like it and won’t ever use it. The Jacobi Polynomial for a particular α, β, and n is the polynomial

$P_n^{(\alpha, \beta)}(z) = (n+\alpha)!(n + \beta)!\sum_{s=0}^n \frac{1}{s!(n + \alpha - s)!(\beta + s)!(n - s)!}\left(\frac{z-1}{2}\right)^{n-s}\left(\frac{z + 1}{2}\right)^s$

Its domain, by the way, is the real numbers from -1 to 1. We need something for the domain. It turns out there’s nothing you can do on the real numbers that you can’t fit into the domain from -1 to 1 anyway. (If you have to do something on, say, the interval from 10 to 54? Do a change of variable, scaling things down and moving them, and use -1 to 1. Then undo that change when you’re done.) The range is the real numbers, as you’d expect.

(You maybe noticed I used ‘z’ for the independent variable there, rather than ‘x’. Usually using ‘z’ means we expect this to be a complex number. But ‘z’ here is definitely a real number. This is because we can also get to the Jacobi Polynomials through the hypergeometric series, a function I don’t want to get into. But for the hypergeometric series we are open to the variable being a complex number. So many references carry that ‘z’ back into Jacobi Polynomials.)

Another thing which links these many functions is recurrence. If you know the Jacobi Polynomial for one set of parameters — and you do; $P_0^{(\alpha, \beta)}(z) = 1$ — you can find others. You do this in a way rather like how you find new terms in the Fibonacci series by adding together terms you already know. These formulas can be long. Still, if you know $P_{n-1}^{(\alpha, \beta)}$ and $P_{n-2}^{(\alpha, \beta)}$ for the same α and β? Then you can calculate $P_n^{(\alpha, \beta)}$ with nothing more than pen, paper, and determination. If it helps,

$P_1^{(\alpha, \beta)}(z) = (\alpha + 1) + (\alpha + \beta + 2)\frac{z - 1}{2}$

and this is true for any α and β. You’ll never do anything with that. This is fine.

There is another way that all these many polynomials are related. It goes back to their being orthogonal. We measured orthogonality by a dot product. Back when I palmed that card I told you was the integral of the two functions multiplied together. This is indeed a dot product. We can define others. We make those others by taking a weighted integral of the product of these two functions. That is, integrate the two functions times a third, a weight function. Of course there’s reasons to do this; they amount to deciding that some parts of the domain are more important than others. The weight function can be anything that meets a few rules. If you want to get the Jacobi Polynomials out of them, you start with the function $P_0^{(\alpha, \beta)}(z) = 1$ and the weight function

$w_n(z) = (1 - z)^{\alpha} (1 + z)^{\beta}$

As I say, though, you’ll never use that. If you’re eager and ready to leap into this work you can use this to build a couple Legendre Polynomials. Or Chebyshev Polynomials. For the full Jacobi Polynomials, though? Use, like, the command JacobiP[n, a, b, z] in Mathematica, or jacobiP(n, a, b, z) in Matlab. Other people have programmed this for you. Enjoy their labor.

In my work I have not used the full set of Jacobi Polynomials much. There’s more of them than I need. I do rely on the Legendre Polynomials, and the Chebyshev Polynomials. Other mathematicians use other slices regularly. It is stunning to sometimes look and realize that these many functions, different as they look, are reflections of one another, though. Mathematicians like to generalize, and find one case that covers as many things as possible. It’s rare that we are this successful.

I thank you for reading this. All of this year’s A-to-Z essays should be available at this link. The essays from every A-to-Z sequence going back to 2015 should be at this link. And I’m already looking ahead to the M, N, and O essays that I’ll be writing the day before publication instead of the week before like I want! I appreciate any nominations you have, even ones I can’t cover fairly.

## My 2018 Mathematics A To Z: e

I’m back to requests! Today’s comes from commenter Dina Yagodich. I don’t know whether Yagodich has a web site, YouTube channel, or other mathematics-discussion site, but am happy to pass along word if I hear of one.

# e.

Let me start by explaining integral calculus in two paragraphs. One of the things done in it is finding a `definite integral’. This is itself a function. The definite integral has as its domain the combination of a function, plus some boundaries, and its range is numbers. Real numbers, if nobody tells you otherwise. Complex-valued numbers, if someone says it’s complex-valued numbers. Yes, it could have some other range. But if someone wants you to do that they’re obliged to set warning flares around the problem and precede and follow it with flag-bearers. And you get at least double pay for the hazardous work. The function that gets definite-integrated has its own domain and range. The boundaries of the definite integral have to be within the domain of the integrated function.

For real-valued functions this definite integral has a great physical interpretation. A real-valued function means the domain and range are both real numbers. You see a lot of these. Call the function ‘f’, please. Call its independent variable ‘x’ and its dependent variable ‘y’. Using Euclidean coordinates, or as normal people call it “graph paper”, draw the points that make true the equation “y = f(x)”. Then draw in the x-axis, that is, the points where “y = 0”. The boundaries of the definite integral are going to be two values of ‘x’, a lower and an upper bound. Call that lower bound ‘a’ and the upper bound ‘b’. And heck, call that a “left boundary” and a “right boundary”, because … I mean, look at them. Draw the vertical line at “x = a” and the vertical line at “x = b”. If ‘f(x)’ is always a positive number, then there’s a shape bounded below by “y = 0”, on the left by “x = a”, on the right by “x = b”, and above by “y = f(x)”. And the definite integral is the area of that enclosed space. If ‘f(x)’ is sometimes zero, then there’s several segments, but their combined area is the definite integral. If ‘f(x)’ is sometimes below zero, then there’s several segments. The definite integral is the sum of the areas of parts above “y = 0” minus the area of the parts below “y = 0”.

(Why say “left boundary” instead of “lower boundary”? Taste, pretty much. But I look at the words “lower boundary” and think about the lower edge, that is, the line where “y = 0” here. And “upper boundary” makes sense as a way to describe the curve where “y = f(x)” as well as “x = b”. I’m confusing enough without making the simple stuff ambiguous.)

Don’t try to pass your thesis defense on this alone. But it’s what you need to understand ‘e’. Start out with the function ‘f’, which has domain of the positive real numbers and range of the positive real numbers. For every ‘x’ in the domain, ‘f(x)’ is the reciprocal, one divided by x. This is a shape you probably know well. It’s a hyperbola. Its asymptotes are the x-axis and the y-axis. It’s a nice gentle curve. Its plot passes through such famous points as (1, 1), (2, 1/2), (1/3, 3), and pairs like that. (10, 1/10) and (1/100, 100) too. ‘f(x)’ is always positive on this domain. Use as left boundary the line “x = 1”. And then — let’s think about different right boundaries.

If the right boundary is close to the left boundary, then this area is tiny. If it’s at, like, “x = 1.1” then the area can’t be more than 0.1. (It’s less than that. If you don’t see why that’s so, fit a rectangle of height 1 and width 0.1 around this curve and these boundaries. See?) But if the right boundary is farther out, this area is more. It’s getting bigger if the right boundary is “x = 2” or “x = 3”. It can get bigger yet. Give me any positive number you like. I can find a right boundary so the area inside this is bigger than your number.

Is there a right boundary where the area is exactly 1? … Well, it’s hard to see how there couldn’t be. If a quantity (“area between x = 1 and x = b”) changes from less than one to greater than one, it’s got to pass through 1, right? … Yes, it does, provided some technical points are true, and in this case they are. So that’s nice.

And there is. It’s a number (settle down, I see you quivering with excitement back there, waiting for me to unveil this) a slight bit more than 2.718. It’s a neat number. Carry it out a couple more digits and it turns out to be 2.718281828. So it looks like a great candidate to memorize. It’s not. It’s an irrational number. The digits go off without repeating or falling into obvious patterns after that. It’s a transcendental number, which has to do with polynomials. Nobody knows whether it’s a normal number, because remember, a normal number is just any real number that you never heard of. To be a normal number, every finite string of digits has to appear in the decimal expansion, just as often as every other string of digits of the same length. We can show by clever counting arguments that roughly every number is normal. Trick is it’s hard to show that any particular number is.

So let me do another definite integral. Set the left boundary to this “x = 2.718281828(etc)”. Set the right boundary a little more than that. The enclosed area is less than 1. Set the right boundary way off to the right. The enclosed area is more than 1. What right boundary makes the enclosed area ‘1’ again? … Well, that will be at about “x = 7.389”. That is, at the square of 2.718281828(etc).

Repeat this. Set the left boundary at “x = (2.718281828etc)2”. Where does the right boundary have to be so the enclosed area is 1? … Did you guess “x = (2.718281828etc)3”? Yeah, of course. You know my rhetorical tricks. What do you want to guess the area is between, oh, “x = (2.718281828etc)3” and “x = (2.718281828etc)5”? (Notice I put a ‘5’ in the superscript there.)

Now, relationships like this will happen with other functions, and with other left- and right-boundaries. But if you want it to work with a function whose rule is as simple as “f(x) = 1 / x”, and areas of 1, then you’re going to end up noticing this 2.718281828(etc). It stands out. It’s worthy of a name.

Which is why this 2.718281828(etc) is a number you’ve heard of. It’s named ‘e’. Leonhard Euler, whom you will remember as having written or proved the fundamental theorem for every area of mathematics ever, gave it that name. He used it first when writing for his own work. Then (in November 1731) in a letter to Christian Goldbach. Finally (in 1763) in his textbook Mechanica. Everyone went along with him because Euler knew how to write about stuff, and how to pick symbols that worked for stuff.

Once you know ‘e’ is there, you start to see it everywhere. In Western mathematics it seems to have been first noticed by Jacob (I) Bernoulli, who noticed it in toy compound interest problems. (Given this, I’d imagine it has to have been noticed by the people who did finance. But I am ignorant of the history of financial calculations. Writers of the kind of pop-mathematics history I read don’t notice them either.) Bernoulli and Pierre Raymond de Montmort noticed the reciprocal of ‘e’ turning up in what we’ve come to call the ‘hat check problem’. A large number of guests all check one hat each. The person checking hats has no idea who anybody is. What is the chance that nobody gets their correct hat back? … That chance is the reciprocal of ‘e’. The number’s about 0.368. In a connected but not identical problem, suppose something has one chance in some number ‘N’ of happening each attempt. And it’s given ‘N’ attempts given for it to happen. What’s the chance that it doesn’t happen? The bigger ‘N’ gets, the closer the chance it doesn’t happen gets to the reciprocal of ‘e’.

It comes up in peculiar ways. In high school or freshman calculus you see it defined as what you get if you take $\left(1 + \frac{1}{x}\right)^x$ for ever-larger real numbers ‘x’. (This is the toy-compound-interest problem Bernoulli found.) But you can find the number other ways. You can calculate it — if you have the stamina — by working out the value of

$1 + 1 + \frac12\left( 1 + \frac13\left( 1 + \frac14\left( 1 + \frac15\left( 1 + \cdots \right)\right)\right)\right)$

There’s a simpler way to write that. There always is. Take all the nonnegative whole numbers — 0, 1, 2, 3, 4, and so on. Take their factorials. That’s 1, 1, 2, 6, 24, and so on. Take the reciprocals of all those. That’s … 1, 1, one-half, one-sixth, one-twenty-fourth, and so on. Add them all together. That’s ‘e’.

This ‘e’ turns up all the time. Any system whose rate of growth depends on its current value has an ‘e’ lurking in its description. That’s true if it declines, too, as long as the decline depends on its current value. It gets stranger. Cross ‘e’ with complex-valued numbers and you get, not just growth or decay, but oscillations. And many problems that are hard to solve to start with become doable, even simple, if you rewrite them as growths and decays and oscillations. Through ‘e’ problems too hard to do become problems of polynomials, or even simpler things.

Simple problems become that too. That property about the area underneath “f(x) = 1/x” between “x = 1” and “x = b” makes ‘e’ such a natural base for logarithms that we call it the base for natural logarithms. Logarithms let us replace multiplication with addition, and division with subtraction, easier work. They change exponentiation problems to multiplication, again easier. It’s a strange touch, a wondrous one.

There are some numbers interesting enough to attract books about them. π, obviously. 0. The base of imaginary numbers, $\imath$, has a couple. I only know one pop-mathematics treatment of ‘e’, Eli Maor’s e: The Story Of A Number. I believe there’s room for more.

Oh, one little remarkable thing that’s of no use whatsoever. Mathworld’s page about approximations to ‘e’ mentions this. Work out, if you can coax your calculator into letting you do this, the number:

$\left(1 + 9^{-(4^{(42)})}\right)^{\left(3^{(2^{85})}\right)}$

You know, the way anyone’s calculator will let you raise 2 to the 85th power. And then raise 3 to whatever number that is. Anyway. The digits of this will agree with the digits of ‘e’ for the first 18,457,734,525,360,901,453,873,570 decimal digits. One Richard Sabey found that, by what means I do not know, in 2004. The page linked there includes a bunch of other, no less amazing, approximations to numbers like ‘e’ and π and the Euler-Mascheroni Constant.

## What Second Derivatives Are And What They Can Do For You

Previous supplemental reading for Why Stuff Can Orbit:

This is another supplemental piece because it’s too much to include in the next bit of Why Stuff Can Orbit. I need some more stuff about how a mathematical physicist would look at something.

This is also a story about approximations. A lot of mathematics is really about approximations. I don’t mean numerical computing. We all know that when we compute we’re making approximations. We use 0.333333 instead of one-third and we use 3.141592 instead of π. But a lot of precise mathematics, what we call analysis, is also about approximations. We do this by a logical structure that works something like this: take something we want to prove. Now for every positive number ε we can find something — a point, a function, a curve — that’s no more than ε away from the thing we’re really interested in, and which is easier to work with. Then we prove whatever we want to with the easier-to-work-with thing. And since ε can be as tiny a positive number as we want, we can suppose ε is a tinier difference than we can hope to measure. And so the difference between the thing we’re interested in and the thing we’ve proved something interesting about is zero. (This is the part that feels like we’re pulling a scam. We’re not, but this is where it’s worth stopping and thinking about what we mean by “a difference between two things”. When you feel confident this isn’t a scam, continue.) So we proved whatever we proved about the thing we’re interested in. Take an analysis course and you will see this all the time.

When we get into mathematical physics we do a lot of approximating functions with polynomials. Why polynomials? Yes, because everything is polynomials. But also because polynomials make so much mathematical physics easy. Polynomials are easy to calculate, if you need numbers. Polynomials are easy to integrate and differentiate, if you need analysis. Here that’s the calculus that tells you about patterns of behavior. If you want to approximate a continuous function you can always do it with a polynomial. The polynomial might have to be infinitely long to approximate the entire function. That’s all right. You can chop it off after finitely many terms. This finite polynomial is still a good approximation. It’s just good for a smaller region than the infinitely long polynomial would have been.

Necessary qualifiers: pages 65 through 82 of any book on real analysis.

So. Let me get to functions. I’m going to use a function named ‘f’ because I’m not wasting my energy coming up with good names. (When we get back to the main Why Stuff Can Orbit sequence this is going to be ‘U’ for potential energy or ‘E’ for energy.) It’s got a domain that’s the real numbers, and a range that’s the real numbers. To express this in symbols I can write $f: \Re \rightarrow \Re$. If I have some number called ‘x’ that’s in the domain then I can tell you what number in the domain is matched by the function ‘f’ to ‘x’: it’s the number ‘f(x)’. You were expecting maybe 3.5? I don’t know that about ‘f’, not yet anyway. The one thing I do know about ‘f’, because I insist on it as a condition for appearing, is that it’s continuous. It hasn’t got any jumps, any gaps, any regions where it’s not defined. You could draw a curve representing it with a single, if wriggly, stroke of the pen.

I mean to build an approximation to the function ‘f’. It’s going to be a polynomial expansion, a set of things to multiply and add together that’s easy to find. To make this polynomial expansion this I need to choose some point to build the approximation around. Mathematicians call this the “point of expansion” because we froze up in panic when someone asked what we were going to name it, okay? But how are we going to make an approximation to a function if we don’t have some particular point we’re approximating around?

(One answer we find in grad school when we pick up some stuff from linear algebra we hadn’t been thinking about. We’ll skip it for now.)

I need a name for the point of expansion. I’ll use ‘a’. Many mathematicians do. Another popular name for it is ‘x0‘. Or if you’re using some other variable name for stuff in the domain then whatever that variable is with subscript zero.

So my first approximation to the original function ‘f’ is … oh, shoot, I should have some new name for this. All right. I’m going to use ‘F0‘ as the name. This is because it’s one of a set of approximations, each of them a little better than the old. ‘F1‘ will be better than ‘F0‘, but ‘F2‘ will be even better, and ‘F2038‘ will be way better yet. I’ll also say something about what I mean by “better”, although you’ve got some sense of that already.

I start off by calling the first approximation ‘F0‘ by the way because you’re going to think it’s too stupid to dignify with a number as big as ‘1’. Well, I have other reasons, but they’ll be easier to see in a bit. ‘F0‘, like all its sibling ‘Fn‘ functions, has a domain of the real numbers and a range of the real numbers. The rule defining how to go from a number ‘x’ in the domain to some real number in the range?

$F^0(x) = f(a)$

That is, this first approximation is simply whatever the original function’s value is at the point of expansion. Notice that’s an ‘x’ on the left side of the equals sign and an ‘a’ on the right. This seems to challenge the idea of what an “approximation” even is. But it’s legit. Supposing something to be constant is often a decent working assumption. If you failed to check what the weather for today will be like, supposing that it’ll be about like yesterday will usually serve you well enough. If you aren’t sure where your pet is, you look first wherever you last saw the animal. (Or, yes, where your pet most loves to be. A particular spot, though.)

We can make this rigorous. A mathematician thinks this is rigorous: you pick any margin of error you like. Then I can find a region near enough to the point of expansion. The value for ‘f’ for every point inside that region is ‘f(a)’ plus or minus your margin of error. It might be a small region, yes. Doesn’t matter. It exists, no matter how tiny your margin of error was.

But yeah, that expansion still seems too cheap to work. My next approximation, ‘F1‘, will be a little better. I mean that we can expect it will be closer than ‘F0‘ was to the original ‘f’. Or it’ll be as close for a bigger region around the point of expansion ‘a’. What it’ll represent is a line. Yeah, ‘F0‘ was a line too. But ‘F0‘ is a horizontal line. ‘F1‘ might be a line at some completely other angle. If that works better. The second approximation will look like this:

$F^1(x) = f(a) + m\cdot\left(x - a\right)$

Here ‘m’ serves its traditional yet poorly-explained role as the slope of a line. What the slope of that line should be we learn from the derivative of the original ‘f’. The derivative of a function is itself a new function, with the same domain and the same range. There’s a couple ways to denote this. Each way has its strengths and weaknesses about clarifying what we’re doing versus how much we’re writing down. And trying to write down almost anything can inspire confusion in analysis later on. There’s a part of analysis when you have to shift from thinking of particular problems to how problems work then.

So I will define a new function, spoken of as f-prime, this way:

$f'(x) = \frac{df}{dx}\left(x\right)$

If you look closely you realize there’s two different meanings of ‘x’ here. One is the ‘x’ that appears in parentheses. It’s the value in the domain of f and of f’ where we want to evaluate the function. The other ‘x’ is the one in the lower side of the derivative, in that $\frac{df}{dx}$. That’s my sloppiness, but it’s not uniquely mine. Mathematicians keep this straight by using the symbols $\frac{df}{dx}$ so much they don’t even see the ‘x’ down there anymore so have no idea there’s anything to find confusing. Students keep this straight by guessing helplessly about what their instructors want and clinging to anything that doesn’t get marked down. Sorry. But what this means is to “take the derivative of the function ‘f’ with respect to its variable, and then, evaluate what that expression is for the value of ‘x’ that’s in parentheses on the left-hand side”. We can do some things that avoid the confusion in symbols there. They all require adding some more variables and some more notation in, and it looks like overkill for a measly definition like this.

Anyway. We really just want the deriviate evaluated at one point, the point of expansion. That is:

$m = f'(a) = \frac{df}{dx}\left(a\right)$

which by the way avoids that overloaded meaning of ‘x’ there. Put this together and we have what we call the tangent line approximation to the original ‘f’ at the point of expansion:

$F^1(x) = f(a) + f'(a)\cdot\left(x - a\right)$

This is also called the tangent line, because it’s a line that’s tangent to the original function. A plot of ‘F1‘ and the original function ‘f’ are guaranteed to touch one another only at the point of expansion. They might happen to touch again, but that’s luck. The tangent line will be close to the original function near the point of expansion. It might happen to be close again later on, but that’s luck, not design. Most stuff you might want to do with the original function you can do with the tangent line, but the tangent line will be easier to work with. It exactly matches the original function at the point of expansion, and its first derivative exactly matches the original function’s first derivative at the point of expansion.

We can do better. We can find a parabola, a second-order polynomial that approximates the original function. This will be a function ‘F2(x)’ that looks something like:

$F^2(x) = f(a) + f'(a)\cdot\left(x - a\right) + \frac12 m_2 \left(x - a\right)^2$

What we’re doing is adding a parabola to the approximation. This is that curve that looks kind of like a loosely-drawn U. The ‘m2‘ there measures how spread out the U is. It’s not quite the slope, but it’s kind of like that, which is why I’m using the letter ‘m’ for it. Its value we get from the second derivative of the original ‘f’:

$m_2 = f''(a) = \frac{d^2f}{dx^2}\left(a\right)$

We find the second derivative of a function ‘f’ by evaluating the first derivative, and then, taking the derivative of that. We can denote it with two ‘ marks after the ‘f’ as long as we aren’t stuck wrapping the function name in ‘ marks to set it out. And so we can describe the function this way:

$F^2(x) = f(a) + f'(a)\cdot\left(x - a\right) + \frac12 f''(a) \left(x - a\right)^2$

This will be a better approximation to the original function near the point of expansion. Or it’ll make larger the region where the approximation is good.

If the first derivative of a function at a point is zero that means the tangent line is horizontal. In physics stuff this is an equilibrium. The second derivative can tell us whether the equilibrium is stable or not. If the second derivative at the equilibrium is positive it’s a stable equilibrium. The function looks like a bowl open at the top. If the second derivative at the equilibrium is negative then it’s an unstable equilibrium.

We can make better approximations yet, by using even more derivatives of the original function ‘f’ at the point of expansion:

$F^3(x) = f(a) + f'(a)\cdot\left(x - a\right) + \frac12 f''(a) \left(x - a\right)^2 + \frac{1}{3\cdot 2} f'''(a) \left(x - a\right)^3$

There’s better approximations yet. You can probably guess what the next, fourth-degree, polynomial would be. Or you can after I tell you the fraction in front of the new term will be $\frac{1}{4\cdot 3\cdot 2}$. The only big difference is that after about the third derivative we give up on adding ‘ marks after the function name ‘f’. It’s just too many little dots. We start writing, like, ‘f(iv)‘ instead. Or if the Roman numerals are too much then ‘f(2038)‘ instead. Or if we don’t want to pin things down to a specific value ‘f(j)‘ with the understanding that ‘j’ is some whole number.

We don’t need all of them. In physics problems we get equilibriums from the first derivative. We get stability from the second derivative. And we get springs in the second derivative too. And that’s what I hope to pick up on in the next installment of the main series.

## The End 2016 Mathematics A To Z: Osculating Circle

I’m happy to say it’s another request today. This one’s from HowardAt58, author of the Saving School Math blog. He’s given me some great inspiration in the past.

## Osculating Circle.

It’s right there in the name. Osculating. You know what that is from that one Daffy Duck cartoon where he cries out “Greetings, Gate, let’s osculate” while wearing a moustache. Daffy’s imitating somebody there, but goodness knows who. Someday the mystery drives the young you to a dictionary web site. Osculate means kiss. This doesn’t seem to explain the scene. Daffy was imitating Jerry Colonna. That meant something in 1943. You can find him on old-time radio recordings. I think he’s funny, in that 40s style.

Make the substitution. A kissing circle. Suppose it’s not some playground antic one level up from the Kissing Bandit that plagues recess yet one or two levels down what we imagine we’d do in high school. It suggests a circle that comes really close to something, that touches it a moment, and then goes off its own way.

But then touching. We know another word for that. It’s the root behind “tangent”. Tangent is a trigonometry term. But it appears in calculus too. The tangent line is a line that touches a curve at one specific point and is going in the same direction as the original curve is at that point. We like this because … well, we do. The tangent line is a good approximation of the original curve, at least at the tangent point and for some region local to that. The tangent touches the original curve, and maybe it does something else later on. What could kissing be?

The osculating circle is about approximating an interesting thing with a well-behaved thing. So are similar things with names like “osculating curve” or “osculating sphere”. We need that a lot. Interesting things are complicated. Well-behaved things are understood. We move from what we understand to what we would like to know, often, by an approximation. This is why we have tangent lines. This is why we build polynomials that approximate an interesting function. They share the original function’s value, and its derivative’s value. A polynomial approximation can share many derivatives. If the function is nice enough, and the polynomial big enough, it can be impossible to tell the difference between the polynomial and the original function.

The osculating circle, or sphere, isn’t so concerned with matching derivatives. I know, I’m as shocked as you are. Well, it matches the first and the second derivatives of the original curve. Anything past that, though, it matches only by luck. The osculating circle is instead about matching the curvature of the original curve. The curvature is what you think it would be: it’s how much a function curves. If you imagine looking closely at the original curve and an osculating circle they appear to be two arcs that come together. They must touch at one point. They might touch at others, but that’s incidental.

Osculating circles, and osculating spheres, sneak out of mathematics and into practical work. This is because we often want to work with things that are almost circles. The surface of the Earth, for example, is not a sphere. But it’s only a tiny bit off. It’s off in ways that you only notice if you are doing high-precision mapping. Or taking close measurements of things in the sky. Sometimes we do this. So we map the Earth locally as if it were a perfect sphere, with curvature exactly what its curvature is at our observation post.

Or we might be observing something moving in orbit. If the universe had only two things in it, and they were the correct two things, all orbits would be simple: they would be ellipses. They would have to be “point masses”, things that have mass without any volume. They never are. They’re always shapes. Spheres would be fine, but they’re never perfect spheres even. The slight difference between a perfect sphere and whatever the things really are affects the orbit. Or the other things in the universe tug on the orbiting things. Or the thing orbiting makes a course correction. All these things make little changes in the orbiting thing’s orbit. The actual orbit of the thing is a complicated curve. The orbit we could calculate is an osculating — well, an osculating ellipse, rather than an osculating circle. Similar idea, though. Call it an osculating orbit if you’d rather.

That osculating circles have practical uses doesn’t mean they aren’t respectable mathematics. I’ll concede they’re not used as much as polynomials or sine curves are. I suppose that’s because polynomials and sine curves have nicer derivatives than circles do. But osculating circles do turn up as ways to try solving nonlinear differential equations. We need the help. Linear differential equations anyone can solve. Nonlinear differential equations are pretty much impossible. They also turn up in signal processing, as ways to find the frequencies of a signal from a sampling of data. This, too, we would like to know.

We get the name “osculating circle” from Gottfried Wilhelm Leibniz. This might not surprise. Finding easy-to-understand shapes that approximate interesting shapes is why we have calculus. Isaac Newton described a way of making them in the Principia Mathematica. This also might not surprise. Of course they would on this subject come so close together without kissing.

## How Mathematical Physics Works: Another Course In 2200 Words

OK, I need some more background stuff before returning to the Why Stuff Can Orbit series. Last week I explained how to take derivatives, which is one of the three legs of a Calculus I course. Now I need to say something about why we take derivatives. This essay won’t really qualify you to do mathematical physics, but it’ll at least let you bluff your way through a meeting with one.

We care about derivatives because we’re doing physics a smart way. This involves thinking not about forces but instead potential energy. We have a function, called V or sometimes U, that changes based on where something is. If we need to know the forces on something we can take the derivative, with respect to position, of the potential energy.

The way I’ve set up these central force problems makes it easy to shift between physical intuition and calculus. Draw a scribbly little curve, something going up and down as you like, as long as it doesn’t loop back on itself. Also, don’t take the pen from paper. Also, no corners. That’s just cheating. Smooth curves. That’s your potential energy function. Take any point on this scribbly curve. If you go to the right a little from that point, is the curve going up? Then your function has a positive derivative at that point. Is the curve going down? Then your function has a negative derivative. Find some other point where the curve is going in the other direction. If it was going up to start, find a point where it’s going down. Somewhere in-between there must be a point where the curve isn’t going up or going down. The Intermediate Value Theorem says you’re welcome.

These points where the potential energy isn’t increasing or decreasing are the interesting ones. At least if you’re a mathematical physicist. They’re equilibriums. If whatever might be moving happens to be exactly there, then it’s not going to move. It’ll stay right there. Mathematically: the force is some fixed number times the derivative of the potential energy there. The potential energy’s derivative is zero there. So the force is zero and without a force nothing’s going to change. Physical intuition: imagine you laid out a track with exactly the shape of your curve. Put a marble at this point where the track isn’t rising and isn’t falling. Does the marble move? No, but if you’re not so sure about that read on past the next paragraph.

Mathematical physicists learn to look for these equilibriums. We’re taught to not bother with what will happen if we release this particle at this spot with this velocity. That is, you know, not looking at any particular problem someone might want to know. We look instead at equilibriums because they help us describe all the possible behaviors of a system. Mathematicians are sometimes characterized as lazy in spirit. This is fair. Mathematicians will start out with a problem looking to see if it’s just like some other problem someone already solved. But the flip side is if one is going to go to the trouble of solving a new problem, she’s going to really solve it. We’ll work out not just what happens from some one particular starting condition. We’ll try to describe all the different kinds of thing that could happen, and how to tell which of them does happen for your measly little problem.

If you actually do have a curvy track and put a marble down on its equilibrium it might yet move. Suppose the track is rising a while and then falls back again; putting the marble at top and it’s likely to roll one way or the other. If it doesn’t it’s probably because of friction; the track sticks a little. If it were a really smooth track and the marble perfectly round then it’d fall. Give me this. But even with a perfectly smooth track and perfectly frictionless marble it’ll still roll one way or another. Unless you put it exactly at the spot that’s the top of the hill, not a bit to the left or the right. Good luck.

What’s happening here is the difference between a stable and an unstable equilibrium. This is again something we all have a physical intuition for. Imagine you have something that isn’t moving. Give it a little shove. Does it stay about like it was? Then it’s stable. Does it break? Then it’s unstable. The marble at the top of the track is at an unstable equilibrium; a little nudge and it’ll roll away. If you had a marble at the bottom of a track, inside a valley, then it’s a stable equilibrium. A little nudge will make the marble rock back and forth but it’ll stay nearby.

Yes, if you give it a crazy big whack the marble will go flying off, never to be seen again. We’re talking about small nudges. No, smaller than that. This maybe sounds like question-begging to you. But what makes for an unstable equilibrium is that no nudge is too small. The nudge — perturbation, in the trade — will just keep growing. In a stable equilibrium there’s nudges small enough that they won’t keep growing. They might not shrink, but they won’t grow either.

So how to tell which is which? Well, look at your potential energy and imagine it as a track with a marble again. Where are the unstable equilibriums? They’re the ones at tops of hills. Near them the curve looks like a cup pointing down, to use the metaphor every Calculus I class takes. Where are the stable equilibriums? They’re the ones at bottoms of valleys. Near them the curve looks like a cup pointing up. Again, see Calculus I.

We may be able to tell the difference between these kinds of equilibriums without drawing the potential energy. We can use the second derivative. To find the second derivative of a function you take the derivative of a function and then — you may want to think this one over — take the derivative of that. That is, you take the derivative of the original function a second time. Sometimes higher mathematics gives us terms that aren’t too hard.

So if you have a spot where you know there’s an equilibrium, look at what the second derivative at that spot is. If it’s positive, you have a stable equilibrium. If it’s negative, you have an unstable equilibrium. This is called “Second Derivative Test”, as it was named by a committee that figured it was close enough to 5 pm and why cause trouble?

If the second derivative is zero there, um, we can’t say anything right now. The equilibrium may also be an inflection point. That’s where the growth of something pauses a moment before resuming. Or where the decline of something pauses a moment before resuming. In either case that’s still an unstable equilibrium. But it doesn’t have to be. It could still be a stable equilibrium. It might just have a very smoothly flat base. No telling just from that one piece of information and this is why we have to go on to other work.

But this gets at how we’d like to look at a system. We look for its equilibriums. We figure out which equilibriums are stable and which ones are unstable. With a little more work we can say, if the system starts out like this it’ll stay near that equilibrium. If it starts out like that it’ll stay near this whole other equilibrium. If it starts out this other way, it’ll go flying off to the end of the universe. We can solve every possible problem at once and never have to bother with a particular case. This feels good.

It also gives us a little something more. You maybe have heard of a tangent line. That’s a line that’s, er, tangent to a curve. Again with the not-too-hard terms. What this means is there’s a point, called the “point of tangency”, again named by a committee that wanted to get out early. And the line just touches the original curve at that point, and it’s going in exactly the same direction as the original curve at that point. Typically this means the line just grazes the curve, at least around there. If you’ve ever rolled a pencil until it just touched the edge of your coffee cup or soda can, you’ve set up a tangent line to the curve of your beverage container. You just didn’t think of it as that because you’re not daft. Fair enough.

Mathematicians will use tangents because a tangent line has values that are so easy to calculate. The function describing a tangent line is a polynomial and we llllllllove polynomials, correctly. The tangent line is always easy to understand, however hard the original function was. Its value, at the equilibrium, is exactly what the original function’s was. Its first derivative, at the equilibrium, is exactly what the original function’s was at that point. Its second derivative is zero, which might or might not be true of the original function. We don’t care.

We don’t use tangent lines when we look at equilibriums. This is because in this case they’re boring. If it’s an equilibrium then its tangent line is a horizontal line. No matter what the original function was. It’s trivial: you know the answer before you’ve heard the question.

Ah, but, there is something mathematical physicists do like. The tangent line is boring. Fine. But how about, using the second derivative, building a tangent … well, “parabola” is the proper term. This is a curve that’s a quadratic, that looks like an open bowl. It exactly matches the original function at the equilibrium. Its derivative exactly matches the original function’s derivative at the equilibrium. Its second derivative also exactly matches the original function’s second derivative, though. Third derivative we don’t care about. It’s so not important here I can’t even finish this sentence in a

What this second-derivative-based approximation gives us is a parabola. It will look very much like the original function if we’re close to the equilibrium. And this gives us something great. The great thing is this is the same potential energy shape of a weight on a spring, or anything else that oscillates back and forth. It’s the potential energy for “simple harmonic motion”.

And that’s great. We start studying simple harmonic motion, oh, somewhere in high school physics class because it’s so much fun to play with slinkies and springs and accidentally dropping weights on our lab partners. We never stop. The mathematics behind it is simple. It turns up everywhere. If you understand the mathematics of a mass on a spring you have a tool that relevant to pretty much every problem you ever have. This approximation is part of that. Close to a stable equilibrium, whatever system you’re looking at has the same behavior as a weight on a spring.

It may strike you that a mass on a spring is itself a central force. And now I’m saying that within the central force problem I started out doing, stuff that orbits, there’s another central force problem. This is true. You’ll see that in a few Why Stuff Can Orbit essays.

So far, by the way, I’ve talked entirely about a potential energy with a single variable. This is for a good reason: two or more variables is harder. Well of course it is. But the basic dynamics are still open. There’s equilibriums. They can be stable or unstable. They might have inflection points. There is a new kind of behavior. Mathematicians call it a “saddle point”. This is where in one direction the potential energy makes it look like a stable equilibrium while in another direction the potential energy makes it look unstable. Examples of it kind of look like the shape of a saddle, if you haven’t looked at an actual saddle recently. (If you really want to know, get your computer to plot the function z = x2 – y2 and look at the origin, where x = 0 and y = 0.) Well, there’s points on an actual saddle that would be saddle points to a mathematician. It’s unstable, because there’s that direction where it’s definitely unstable.

So everything about multivariable functions is longer, and a couple bits of it are harder. There’s more chances for weird stuff to happen. I think I can get through most of Why Stuff Can Orbit without having to know that. But do some reading up on that before you take a job as a mathematical physicist.

## Anatomizing An Error

Though it’s the summer months I’m happy to say the Carnot Cycle thermodynamics blog is still posting. He had been writing about Jacobus Henricus van ‘t Hoff, first recipient of the Nobel Prize in Chemistry. In the 1880s van ‘t Hoff was studying the osmosis. In April’s essay Carnot Cycle described the problem, and how van ‘t Hoff passed up a correct formula describing osmotic pressure in favor of an attractive but wrong alternative.

In this month’s essay Carnot Cycle continues the topic. It particularly goes over just how van ‘t Hoff got to his mistaken idea. It’s not that he started out wrong. He began from a good start and derived a mistaken formula. The derivation involved a string of assumptions and simplifications and approximations, of the kind that must be made to go from starting principles to a specific problem. He was guided by an idea of what the answer ought to look like, though, and that led him astray. The blog describes what he did and why it would look reasonable in the circumstance. It’s worth reading to see what actual mathematics, the kind that doesn’t have known answers, is like.

## Theorem Thursday: Liouville’s Approximation Theorem And How To Make Your Own Transcendental Number

As I get into the second month of Theorem Thursdays I have, I think, the whole roster of weeks sketched out. Today, I want to dive into some real analysis, and the study of numbers. It’s the sort of thing you normally get only if you’re willing to be a mathematics major. I’ll try to be readable by people who aren’t. If you carry through to the end and follow directions you’ll have your very own mathematical construct, too, so enjoy.

# Liouville’s Approximation Theorem

It all comes back to polynomials. Of course it does. Polynomials aren’t literally everything in mathematics. They just come close. Among the things we can do with polynomials is divide up the real numbers into different sets. The tool we use is polynomials with integer coefficients. Integers are the positive and the negative whole numbers, stuff like ‘4’ and ‘5’ and ‘-12’ and ‘0’.

A polynomial is the sum of a bunch of products of coefficients multiplied by a variable raised to a power. We can use anything for the variable’s name. So we use ‘x’. Sometimes ‘t’. If we want complex-valued polynomials we use ‘z’. Some people trying to make a point will use ‘y’ or ‘s’ but they’re just showing off. Coefficients are just numbers. If we know the numbers, great. If we don’t know the numbers, or we want to write something that doesn’t commit us to any particular numbers, we use letters from the start of the alphabet. So we use ‘a’, maybe ‘b’ if we must. If we need a lot of numbers, we use subscripts: a0, a1, a2, and so on, up to some an for some big whole number n. To talk about one of these without committing ourselves to a specific example we use a subscript of i or j or k: aj, ak. It’s possible that aj and ak equal each other, but they don’t have to, unless j and k are the same whole number. They might also be zero, but they don’t have to be. They can be any numbers. Or, for this essay, they can be any integers. So we’d write a generic polynomial f(x) as:

$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_{n - 1}x^{n - 1} + a_n x^n$

(Some people put the coefficients in the other order, that is, $a_n + a_{n - 1}x + a_{n - 2}x^2$ and so on. That’s not wrong. The name we give a number doesn’t matter. But it makes it harder to remember what coefficient matches up with, say, x14.)

A zero, or root, is a value for the variable (‘x’, or ‘t’, or what have you) which makes the polynomial equal to zero. It’s possible that ‘0’ is a zero, but don’t count on it. A polynomial of degree n — meaning the highest power to which x is raised is n — can have up to n different real-valued roots. All we’re going to care about is one.

Rational numbers are what we get by dividing one whole number by another. They’re numbers like 1/2 and 5/3 and 6. They’re numbers like -2.5 and 1.0625 and negative a billion. Almost none of the real numbers are rational numbers; they’re exceptional freaks. But they are all the numbers we actually compute with, once we start working out digits. Thus we remember that to live is to live paradoxically.

And every rational number is a root of a first-degree polynomial. That is, there’s some polynomial f(x) = a_0 + a_1 x that’s made zero for your polynomial. It’s easy to tell you what it is, too. Pick your rational number. You can write that as the integer p divided by the integer q. Now look at the polynomial f(x) = p – q x. Astounded yet?

That trick will work for any rational number. It won’t work for any irrational number. There’s no first-degree polynomial with integer coefficients that has the square root of two as a root. There are polynomials that do, though. There’s f(x) = 2 – x2. You can find the square root of two as the zero of a second-degree polynomial. You can’t find it as the zero of any lower-degree polynomials. So we say that this is an algebraic number of the second degree.

This goes on higher. Look at the cube root of 2. That’s another irrational number, so no first-degree polynomials have it as a root. And there’s no second-degree polynomials that have it as a root, not if we stick to integer coefficients. Ah, but f(x) = 2 – x3? That’s got it. So the cube root of two is an algebraic number of degree three.

We can go on like this, although I admit examples for higher-order algebraic numbers start getting hard to justify. Most of the numbers people have heard of are either rational or are order-two algebraic numbers. I can tell you truly that the eighth root of two is an eighth-degree algebraic number. But I bet you don’t feel enlightened. At best you feel like I’m setting up for something. The number r(5), the smallest radius a disc can have so that five of them will completely cover a disc of radius 1, is eighth-degree and that’s interesting. But you never imagined the number before and don’t have any idea how big that is, other than “I guess that has to be smaller than 1”. (It’s just a touch less than 0.61.) I sound like I’m wasting your time, although you might start doing little puzzles trying to make smaller coins cover larger ones. Do have fun.

Liouville’s Approximation Theorem is about approximating algebraic numbers with rational ones. Almost everything we ever do is with rational numbers. That’s all right because we can make the difference between the number we want, even if it’s r(5), and the numbers we can compute with, rational numbers, as tiny as we need. We trust that the errors we make from this approximation will stay small. And then we discover chaos science. Nothing is perfect.

For example, suppose we need to estimate π. Everyone knows we can approximate this with the rational number 22/7. That’s about 3.142857, which is all right but nothing great. Some people know we can approximate it as 333/106. (I didn’t until I started writing this paragraph and did some research.) That’s about 3.141509, which is better. Then there’s 355/113, which is not as famous as 22/7 but is a celebrity compared to 333/106. That’s about 3.141529. Then we get into some numbers only mathematics hipsters know: 103993/33102 and 104348/33215 and so on. Fine.

The Liouville Approximation Theorem is about sequences that converge on an irrational number. So we have our first approximation x1, that’s the integer p1 divided by the integer q1. So, 22 and 7. Then there’s the next approximation x2, that’s the integer p2 divided by the integer q2. So, 333 and 106. Then there’s the next approximation yet, x3, that’s the integer p3 divided by the integer q3. As we look at more and more approximations, xj‘s, we get closer and closer to the actual irrational number we want, in this case π. Also, the denominators, the qj‘s, keep getting bigger.

The theorem speaks of having an algebraic number, call it x, of some degree n greater than 1. Then we have this limit on how good an approximation can be. The difference between the number x that we want, and our best approximation p / q, has to be larger than the number (1/q)n + 1. The approximation might be higher than x. It might be lower than x. But it will be off by at least the n-plus-first power of 1/q.

Polynomials let us separate the real numbers into infinitely many tiers of numbers. They also let us say how well the most accessible tier of numbers, rational numbers, can approximate these more exotic things.

One of the things we learn by looking at numbers through this polynomial screen is that there are transcendental numbers. These are numbers that can’t be the root of any polynomial with integer coefficients. π is one of them. e is another. Nearly all numbers are transcendental. But the proof that any particular number is one is hard. Joseph Liouville showed that transcendental numbers must exist by using continued fractions. But this approximation theorem tells us how to make our own transcendental numbers. This won’t be any number you or anyone else has ever heard of, unless you pick a special case. But it will be yours.

You will need:

1. a1, an integer from 1 to 9, such as ‘1’, ‘9’, or ‘5’.
2. a2, another integer from 1 to 9. It may be the same as a1 if you like, but it doesn’t have to be.
3. a3, yet another integer from 1 to 9. It may be the same as a1 or a2 or, if it so happens, both.
4. a4, one more integer from 1 to 9 and you know what? Let’s summarize things a bit.
5. A whopping great big gob of integers aj, every one of them from 1 to 9, for every possible integer ‘j’ so technically this is infinitely many of them.
6. Comfort with the notation n!, which is the factorial of n. For whole numbers that’s the product of every whole number from 1 to n, so, 2! is 1 times 2, or 2. 3! is 1 times 2 times 3, or 6. 4! is 1 times 2 times 3 times 4, or 24. And so on.
7. Not to be thrown by me writing -n!. By that I mean work out n! and then multiply that by -1. So -2! is -2. -3! is -6. -4! is -24. And so on.

Now, assemble them into your very own transcendental number z, by this formula:

$z = a_1 \cdot 10^{-1} + a_2 \cdot 10^{-2!} + a_3 \cdot 10^{-3!} + a_4 \cdot 10^{-4!} + a_5 \cdot 10^{-5!} + a_6 \cdot 10^{-6!} \cdots$

If you’ve done it right, this will look something like:

$z = 0.a_{1}a_{2}000a_{3}00000000000000000a_{4}0000000 \cdots$

Ah, but, how do you know this is transcendental? We can prove it is. The proof is by contradiction, which is how a lot of great proofs are done. We show nonsense follows if the thing isn’t true, so the thing must be true. (There are mathematicians that don’t care for proof-by-contradiction. They insist on proof by charging straight ahead and showing a thing is true directly. That’s a matter of taste. I think every mathematician feels that way sometimes, to some extent or on some issues. The proof-by-contradiction is easier, at least in this case.)

Suppose that your z here is not transcendental. Then it’s got to be an algebraic number of degree n, for some finite number n. That’s what it means not to be transcendental. I don’t know what n is; I don’t care. There is some n and that’s enough.

Now, let’s let zm be a rational number approximating z. We find this approximation by taking the first m! digits after the decimal point. So, z1 would be just the number 0.a1. z2 is the number 0.a1a2. z3 is the number 0.a1a2000a3. I don’t know what m you like, but that’s all right. We’ll pick a nice big m.

So what’s the difference between z and zm? Well, it can’t be larger than 10 times 10-(m + 1)!. This is for the same reason that π minus 3.14 can’t be any bigger than 0.01.

Now suppose we have the best possible rational approximation, p/q, of your number z. Its first m! digits are going to be p / 10m!. This will be zm And by the Liouville Approximation Theorem, then, the difference between z and zm has to be at least as big as (1/10m!)(n + 1).

So we know the difference between z and zm has to be larger than one number. And it has to be smaller than another. Let me write those out.

$\frac{1}{10^{m! (n + 1)}} < |z - z_m | < \frac{10}{10^{(m + 1)!}}$

We don’t need the z – zm anymore. That thing on the rightmost side we can write what I’ll swear is a little easier to use. What we have left is:

$\frac{1}{10^{m! (n + 1)}} < \frac{1}{10^{(m + 1)! - 1}}$

And this will be true whenever the number m! (n + 1) is greater than (m + 1)! – 1 for big enough numbers m.

But there’s the thing. This isn’t true whenever m is greater than n. So the difference between your alleged transcendental number and its best-possible rational approximation has to be simultaneously bigger than a number and smaller than that same number without being equal to it. Supposing your number is anything but transcendental produces nonsense. Therefore, congratulations! You have a transcendental number.

If you chose all 1’s for your aj‘s, then you have what is sometimes called the Liouville Constant. If you didn’t, you may have a transcendental number nobody’s ever noticed before. You can name it after someone if you like. That’s as meaningful as naming a star for someone and cheaper. But you can style it as weaving someone’s name into the universal truth of mathematics. Enjoy!

I’m glad to finally give you a mathematics essay that lets you make something you can keep.

## Theorem Thursday: A First Fixed Point Theorem

I’m going to let the Mean Value Theorem slide a while. I feel more like a Fixed Point Theorem today. As with the Mean Value Theorem there’s several of these. Here I’ll start with an easy one.

# The Fixed Point Theorem.

Back when the world and I were young I would play with electronic calculators. They encouraged play. They made it so easy to enter a number and hit an operation, and then hit that operation again, and again and again. Patterns appeared. Start with, say, ‘2’ and hit the ‘squared’ button, the smaller ‘2’ raised up from the key’s baseline. You got 4. And again: 16. And again: 256. And again and again and you got ever-huger numbers. This happened whenever you started from a number bigger than 1. Start from something smaller than 1, however tiny, and it dwindled down to zero, whatever you tried. Start at ‘1’ and it just stays there. The results were similar if you started with negative numbers. The first squaring put you in positive numbers and everything carried on as before.

This sort of thing happened a lot. Keep hitting the mysterious ‘exp’ and the numbers would keep growing forever. Keep hitting ‘sqrt’; if you started above 1, the numbers dwindled to 1. Start below and the numbers rise to 1. Or you started at zero, but who’s boring enough to do that? ‘log’ would start with positive numbers and keep dropping until it turned into a negative number. The next step was the calculator’s protest we were unleashing madness on the world.

But you didn’t always get zero, one, infinity, or madness, from repeatedly hitting the calculator button. Sometimes, some functions, you’d get an interesting number. If you picked any old number and hit cosine over and over the digits would eventually settle down to around 0.739085. Or -0.739085. Cosine’s great. Tangent … tangent is weird. Tangent does all sorts of bizarre stuff. But at least cosine is there, giving us this interesting number.

(Something you might wonder: this is the cosine of an angle measured in radians, which is how mathematicians naturally think of angles. Normal people measure angles in degrees, and that will have a different fixed point. We write both the cosine-in-radians and the cosine-in-degrees using the shorthand ‘cos’. We get away with this because people who are confused by this are too embarrassed to call us out on it. If we’re thoughtful we write, say, ‘cos x’ for radians and ‘cos x°’ for degrees. This makes the difference obvious. It doesn’t really, but at least we gave some hint to the reader.)

This all is an example of a fixed point theorem. Fixed point theorems turn up in a lot of fields. They were most impressed upon me in dynamical systems, studying how a complex system changes in time. A fixed point, for these problems, is an equilibrium. It’s where things aren’t changed by a process. You can see where that’s interesting.

In this series I haven’t stated theorems exactly much, and I haven’t given them real proofs. But this is an easy one to state and to prove. Start off with a function, which I’ll name ‘f’, because yes that is exactly how much effort goes in to naming functions. It has as a domain the interval [a, b] for some real numbers ‘a’ and ‘b’. And it has as rang the same interval, [a, b]. It might use the whole range; it might use only a subset of it. And we have to require that f is continuous.

Then there has to be at least one fixed point. There must be at last one number ‘c’, somewhere in the interval [a, b], for which f(c) equals c. There may be more than one; we don’t say anything about how many there are. And it can happen that c is equal to a. Or that c equals b. We don’t know that it is or that it isn’t. We just know there’s at least one ‘c’ that makes f(c) equal c.

You get that in my various examples. If the function f has the rule that any given x is matched to x2, then we do get two fixed points: f(0) = 02 = 0, and, f(1) = 12 = 1. Or if f has the rule that any given x is matched to the square root of x, then again we have: $f(0) = \sqrt{0} = 0$ and $f(1) = \sqrt{1} = 1$. Same old boring fixed points. The cosine is a little more interesting. For that we have $f(0.739085...) = \cos\left(0.739085...\right) = 0.739085...$.

How to prove it? The easiest way I know is to summon the Intermediate Value Theorem. Since I wrote a couple hundred words about that a few weeks ago I can assume you to understand it perfectly and have no question about how it makes this problem easy. I don’t even need to go on, do I?

… Yeah, fair enough. Well, here’s how to do it. We’ll take the original function f and create, based on it, a new function. We’ll dig deep in the alphabet and name that ‘g’. It has the same domain as f, [a, b]. Its range is … oh, well, something in the real numbers. Don’t care. The wonder comes from the rule we use.

The rule for ‘g’ is this: match the given number ‘x’ with the number ‘f(x) – x’. That is, g(a) equals whatever f(a) would be, minus a. g(b) equals whatever f(b) would be, minus b. We’re allowed to define a function in terms of some other function, as long as the symbols are meaningful. But we aren’t doing anything wrong like dividing by zero or taking the logarithm of a negative number or asking for f where it isn’t defined.

You might protest that we don’t know what the rule for f is. We’re told there is one, and that it’s a continuous function, but nothing more. So how can I say I’ve defined g in terms of a function I don’t know?

In the first place, I already know everything about f that I need to. I know it’s a continuous function defined on the interval [a, b]. I won’t use any more than that about it. And that’s great. A theorem that doesn’t require knowing much about a function is one that applies to more functions. It’s like the difference between being able to say something true of all living things in North America, and being able to say something true of all persons born in Redbank, New Jersey, on the 18th of February, 1944, who are presently between 68 and 70 inches tall and working on their rock operas. Both things may be true, but one of those things you probably use more.

In the second place, suppose I gave you a specific rule for f. Let me say, oh, f matches x with the arccosecant of x. Are you feeling any more enlightened now? Didn’t think so.

Back to g. Here’s some things we can say for sure about it. g is a function defined on the interval [a, b]. That’s how we set it up. Next point: g is a continuous function on the interval [a, b]. Remember, g is just the function f, which was continuous, minus x, which is also continuous. The difference of two continuous functions is still going to be continuous. (This is obvious, although it may take some considered thinking to realize why it is obvious.)

Now some interesting stuff. What is g(a)? Well, it’s whatever number f(a) is minus a. I can’t tell you what number that is. But I can tell you this: it’s not negative. Remember that f(a) has to be some number in the interval [a, b]. That is, it’s got to be no smaller than a. So the smallest f(a) can be is equal to a, in which case f(a) minus a is zero. And f(a) might be larger than a, in which case f(a) minus a is positive. So g(a) is either zero or a positive number.

(If you’ve just realized where I’m going and gasped in delight, well done. If you haven’t, don’t worry. You will. You’re just out of practice.)

What about g(b)? Since I don’t know what f(b) is, I can’t tell you what specific number it is. But I can tell you it’s not a positive number. The reasoning is just like above: f(b) is some number on the interval [a, b]. So the biggest number f(b) can equal is b. And in that case f(b) minus b is zero. If f(b) is any smaller than b, then f(b) minus b is negative. So g(b) is either zero or a negative number.

(Smiling at this? Good job. If you aren’t, again, not to worry. This sort of argument is not the kind of thing you do in Boring Algebra. It takes time and practice to think this way.)

And now the Intermediate Value Theorem works. g(a) is a positive number. g(b) is a negative number. g is continuous from a to b. Therefore, there must be some number ‘c’, between a and b, for which g(c) equals zero. And remember what g(c) means: f(c) – c equals 0. Therefore f(c) has to equal c. There has to be a fixed point.

And some tidying up. Like I said, g(a) might be positive. It might also be zero. But if g(a) is zero, then f(a) – a = 0. So a would be a fixed point. And similarly if g(b) is zero, then f(b) – b = 0. So then b would be a fixed point. The important thing is there must be at least some fixed point.

Now that calculator play starts taking on purposeful shape. Squaring a number could find a fixed point only if you started with a number from -1 to 1. The square of a number outside this range, such as ‘2’, would be bigger than you started with, and the Fixed Point Theorem doesn’t apply. Similarly with exponentials. But square roots? The square root of any number from 0 to a positive number ‘b’ is a number between 0 and ‘b’, at least as long as b was bigger than 1. So there was a fixed point, at 1. The cosine of a real number is some number between -1 and 1, and the cosines of all the numbers between -1 and 1 are themselves between -1 and 1. The Fixed Point Theorem applies. Tangent isn’t a continuous function. And the calculator play never settles on anything.

As with the Intermediate Value Theorem, this is an existence proof. It guarantees there is a fixed point. It doesn’t tell us how to find one. Calculator play does, though. Start from any old number that looks promising and work out f for that number. Then take that and put it back into f. And again. And again. This is known as “fixed point iteration”. It won’t give you the exact answer.

Not usually, anyway. In some freak cases it will. But what it will give, provided some extra conditions are satisfied, is a sequence of values that get closer and closer to the fixed point. When you’re close enough, then you stop calculating. How do you know you’re close enough? If you know something about the original f you can work out some logically rigorous estimates. Or you just keep calculating until all the decimal points you want stop changing between iterations. That’s not logically sound, but it’s easy to program.

That won’t always work. It’ll only work if the function f is differentiable on the interval (a, b). That is, it can’t have corners. And there have to be limits on how fast the function changes on the interval (a, b). If the function changes too fast, iteration can’t be guaranteed to work. But often if we’re interested in a function at all then these conditions will be true, or we can think of a related function that for which they are true.

And even if it works it won’t always work well. It can take an enormous pile of calculations to get near the fixed point. But this is why we have computers, and why we can leave them to work overnight.

And yet such a simple idea works. It appears in ancient times, in a formula for finding the square root of an arbitrary positive number ‘N’. (Find the fixed point for $f(x) = \frac{1}{2}\left(\frac{N}{x} + x\right)$). It creeps into problems that don’t look like fixed points. Calculus students learn of something called the Newton-Raphson Iteration. It finds roots, points where a function f(x) equals zero. Mathematics majors learn of numerical methods to solve ordinary differential equations. The most stable of these are again fixed-point iteration schemes, albeit in disguise.

## Why Someone Should Take That Deal

Let me start answering my Deal or No Deal-based question by just pointing to Chiaroscuro’s answer, which does the arithmetic exactly right and comes to a quite sensible conclusion from it. This leaves me feeling like I’m not quite earning my pay here, so let me go into further depth and ask that someone pay me.

## Some More Comic Strips

I might turn this into a regular feature. A couple more comic strips, all this week on gocomics.com, ran nice little mathematically-linked themes, and as far as I can tell I’m the only one who reads any of them so I might spread the word some.

Grant Snider’s Incidental Comics returns again with the Triangle Circus, in his strip of the 12th of March. This strip is also noteworthy for making use of “scalene”, which is also known as “that other kind of triangle” which nobody can remember the name for. (He’s had several other math-panel comic strips, and I really enjoy how full he stuffs the panels with drawings and jokes in most strips.)

Dave Blazek’s Loose Parts from the 15th of March puts up a version of the Cretan Paradox that amused me much more than I thought it would at first glance. I kept thinking back about it and grinning. (This blurs the line between mathematics and philosophy, but those lines have always been pretty blurred, particularly in the hotly disputed territory of Logic.)

Bud Fisher’s Mutt and Jeff is in reruns, of course, and shows a random scattering of strips from the 1930s and 1940s and, really, seem to show off how far we’ve advanced in efficiency in setup-and-punchline since the early 20th century. But the rerun from the 17th of March (I can’t make out the publication date, although the figures in the article probably could be used to guess at the year) does demonstrate the sort of estimating-a-value that’s good mental exercise too.

I note that where Mutt divides 150,000,000 into 700,000,000 I would instead have divided the 150 million into 750,000,000, because that’s a much easier problem, and he just wanted an estimate anyway. It would get to the estimate of ten cents a week later in the word balloon more easily that way, too. But making estimates and approximations are in part an art. But I don’t think of anything that gives me 2/3ds of a cent as an intermediate value on the way to what I want as being a good approximation.

There’s nothing fresh from Bill Whitehead’s Free Range, though I’m still reading just in case.