## How To Find Logarithms Without Using Powerful Computers

I got to remembering an old sequence of mine, and wanted to share it for my current audience. A couple years ago I read a 1949-published book about numerical computing. And it addressed a problem I knew existed but hadn’t put much thought into. That is, how to calculate the logarithm of a number? Logarithms … well, we maybe don’t need them so much now. But they were indispensable for computing for a very long time. They turn the difficult work of multiplication and division into the easier work of addition and subtraction. They turn the really hard work of exponentiation into the easier work of multiplication. So they’re great to have. But how to get them? And, particularly, how to get them if you have a computing device that’s able to do work, but not very much work?

Machines That Think About Logarithms sets out the question, including mentioning Edmund Callis Berkeley’s book that got me started on this. And some talk about the kinds of logarithms and why we use each of them.

Machines That Do Something About Logarithms sets out some principles. These are all things that are generically true about logarithms, including about calculating logarithms. They’re just the principles that were put into clever play by Harvard’s IBM Automatic Sequence-Controlled Calculator in the 1940s.

Machines That Give You Logarithms explains how to use those tools. And lays out how to get the base-ten logarithm for most numbers that you would like with a tiny bit of computing work. I showed off an example of getting the logarithm of 47.2286 using only three divisions, four additions, and a little bit of looking up stuff.

Without Machines That Think About Logarithms closes out the cycle. One catch with the algorithm described is that you need to work out some logarithms ahead of time and have them on hand, ready to look up. They’re not ones that you care about particularly for any problem, but they make it easier to find the logarithm you do want. This essay talks about which logarithms to calculate, in order to get the most accurate results for the logarithm you want, using the least custom work possible.

And there we go. Logarithms are still indispensable for mathematical work, although I realize not so much because we ever care what the logarithm of 47.2286 or any other arbitrary number is. Logarithms have some nice analytic properties, though, and they make other work easier to do. So they’re still in use, but for different problems.

Advertisements

## My 2018 Mathematics A To Z: Witch of Agnesi

Nobody had a suggested topic starting with ‘W’ for me! So I’ll take that as a free choice, and get lightly autobiogrpahical. Art by Thomas K Dye, creator of the web comics Newshounds, Something Happens, and Infinity Refugees. His current project is Projection Edge. And you can get Projection Edge six months ahead of public publication by subscribing to his Patreon. And he’s on Twitter as @Newshoundscomic.

# Witch of Agnesi.

I know I encountered the Witch of Agnesi while in middle school. Eighth grade, if I’m not mistaken. It was a footnote in a textbook. I don’t remember much of the textbook. What I mostly remember of the course was how much I did not fit with the teacher. The only relief from boredom that year was the month we had a substitute and the occasional interesting footnote.

It was in a chapter about graphing equations. That is, finding curves whose points have coordinates that satisfy some equation. In a bit of relief from lines and parabolas the footnote offered this: $y = \frac{8a^3}{x^2 + 4a^2}$

In a weird tantalizing moment the footnote didn’t offer a picture. Or say what an ‘a’ was doing in there. In retrospect I recognize ‘a’ as a parameter, and that different values of it give different but related shapes. No hint what the ‘8’ or the ‘4’ were doing there. Nor why ‘a’ gets raised to the third power in the numerator or the second in the denominator. I did my best with the tools I had at the time. Picked a nice easy boring ‘a’. Picked out values of ‘x’ and found the corresponding ‘y’ which made the equation true, and tried connecting the dots. The result didn’t look anything like a witch. Nor a witch’s hat.

It was one of a handful of biographical notes in the book. These were a little attempt to add some historical context to mathematics. It wasn’t much. But it was an attempt to show that mathematics came from people. Including, here, from Maria Gaëtana Agnesi. She was, I’m certain, the only woman mentioned in the textbook I’ve otherwise completely forgotten.

We have few names of ancient mathematicians. Those we have are often compilers like Euclid whose fame obliterated the people whose work they explained. Or they’re like Pythagoras, credited with discoveries by people who obliterated their own identities. In later times we have the mathematics done by, mostly, people whose social positions gave them time to write mathematics results. So we see centuries where every mathematician is doing it as their side hustle to being a priest or lawyer or physician or combination of these. Women don’t get the chance to stand out here.

Today of course we can name many women who did, and do, mathematics. We can name Emmy Noether, Ada Lovelace, and Marie-Sophie Germain. Challenged to do a bit more, we can offer Florence Nightingale and Sofia Kovalevskaya. Well, and also Grace Hopper and Margaret Hamilton if we decide computer scientists count. Katherine Johnson looks likely to make that cut. But in any case none of these people are known for work understandable in a pre-algebra textbook. This must be why Agnesi earned a place in this book. She’s among the earliest women we can specifically credit with doing noteworthy mathematics. (Also physics, but that’s off point for me.) Her curve might be a little advanced for that textbook’s intended audience. But it’s not far off, and pondering questions like “why $8a^3$? Why not $a^3$?” is more pleasant, to a certain personality, than pondering what a directrix might be and why we might use one.

The equation might be a lousy way to visualize the curve described. The curve is one of that group of interesting shapes you get by constructions. That is, following some novel process. Constructions are fun. They’re almost a craft project.

For this we start with a circle. And two parallel tangent lines. Without loss of generality, suppose they’re horizontal, so, there’s lines at the top and the bottom of the curve.

Take one of the two tangent points. Again without loss of generality, let’s say the bottom one. Draw a line from that point over to the other line. Anywhere on the other line. There’s a point where the line you drew intersects the circle. There’s another point where it intersects the other parallel line. We’ll find a new point by combining pieces of these two points. The point is on the same horizontal as wherever your line intersects the circle. It’s on the same vertical as wherever your line intersects the other parallel line. This point is on the Witch of Agnesi curve.

Now draw another line. Again, starting from the lower tangent point and going up to the other parallel line. Again it intersects the circle somewhere. This gives another point on the Witch of Agnesi curve. Draw another line. Another intersection with the circle, another intersection with the opposite parallel line. Another point on the Witch of Agnesi curve. And so on. Keep doing this. When you’ve drawn all the lines that reach from the tangent point to the other line, you’ll have generated the full Witch of Agnesi curve. This takes more work than writing out $y = \frac{8a^3}{x^2 + 4a^2}$, yes. But it’s more fun. It makes for neat animations. And I think it prepares us to expect the shape of the curve.

It’s a neat curve. Between it and the lower parallel line is an area four times that of the circle that generated it. The shape is one we would get from looking at the derivative of the arctangent. So there’s some reasons someone working in calculus might find it interesting. And people did. Pierre de Fermat studied it, and found this area. Isaac Newton and Luigi Guido Grandi studied the shape, using this circle-and-parallel-lines construction. Maria Agnesi’s name attached to it after she published a calculus textbook which examined this curve. She showed, according to people who present themselves as having read her book, the curve and how to find it. And she showed its equation and found the vertex and asymptote line and the inflection points. The inflection points, here, are where the curve chances from being cupped upward to cupping downward, or vice-versa.

It’s a neat function. It’s got some uses. It’s a natural smooth-hill shape, for example. So this makes a good generic landscape feature if you’re modeling the flow over a surface. I read that solitary waves can have this curve’s shape, too.

And the curve turns up as a probability distribution. Take a fixed point. Pick lines at random that pass through this point. See where those lines reach a separate, straight line. Some regions are more likely to be intersected than are others. Chart how often any particular line is the new intersection point. That chart will (given some assumptions I ask you to pretend you agree with) be a Witch of Agnesi curve. This might not surprise you. It seems inevitable from the circle-and-intersecting-line construction process. And that’s nice enough. As a distribution it looks like the usual Gaussian bell curve.

It’s different, though. And it’s different in strange ways. Like, for a probability distribution we can find an expected value. That’s … well, what it sounds like. But this is the strange probability distribution for which the law of large numbers does not work. Imagine an experiment that produces real numbers, with the frequency of each number given by this distribution. Run the experiment zillions of times. What’s the mean value of all the zillions of generated numbers? And it … doesn’t … have one. I mean, we know it ought to, it should be the center of that hill. But the calculations for that don’t work right. Taking a bigger sample makes the sample mean jump around more, not less, the way every other distribution should work. It’s a weird idea.

Imagine carving a block of wood in the shape of this curve, with a horizontal lower bound and the Witch of Agnesi curve as the upper bound. Where would it balance? … The normal mathematical tools don’t say, even though the shape has an obvious line of symmetry. And a finite area. You don’t get this kind of weirdness with parabolas.

(Yes, you’ll get a balancing point if you actually carve a real one. This is because you work with finitely-long blocks of wood. Imagine you had a block of wood infinite in length. Then you would see some strange behavior.)

It teaches us more strange things, though. Consider interpolations, that is, taking a couple data points and fitting a curve to them. We usually start out looking for polynomials when we interpolate data points. This is because everything is polynomials. Toss in more data points. We need a higher-order polynomial, but we can usually fit all the given points. But sometimes polynomials won’t work. A problem called Runge’s Phenomenon can happen, where the more data points you have the worse your polynomial interpolation is. The Witch of Agnesi curve is one of those. Carl Runge used points on this curve, and trying to fit polynomials to those points, to discover the problem. More data and higher-order polynomials make for worse interpolations. You get curves that look less and less like the original Witch. Runge is himself famous to mathematicians, known for “Runge-Kutta”. That’s a family of techniques to solve differential equations numerically. I don’t know whether Runge came to the weirdness of the Witch of Agnesi curve from considering how errors build in numerical integration. I can imagine it, though. The topics feel related to me.

I understand how none of this could fit that textbook’s slender footnote. I’m not sure any of the really good parts of the Witch of Agnesi could even fit thematically in that textbook. At least beyond the fact of its interesting name, which any good blog about the curve will explain. That there was no picture, and that the equation was beyond what the textbook had been describing, made it a challenge. Maybe not seeing what the shape was teased the mathematician out of this bored student.

And next is ‘X’. Will I take Mr Wu’s suggestion and use that to describe something “extreme”? Or will I take another topic or suggestion? We’ll see on Friday, barring unpleasant surprises. Thanks for reading.

## I Don’t Have Any Good Ideas For Finding Cube Roots By Trigonometry

So I did a bit of thinking. There’s a prosthaphaeretic rule that lets you calculate square roots using nothing more than trigonometric functions. Is there one that lets you calculate cube roots?

And I don’t know. I don’t see where there is one. I may be overlooking an approach, though. Let me outline what I’ve thought out.

First is square roots. It’s possible to find the square root of a number between 0 and 1 using arc-cosine and cosine functions. This is done by using a trigonometric identity called the double-angle formula. This formula, normally, you use if you know the cosine of a particular angle named θ and want the cosine of double that angle: $\cos\left(2\theta\right) = 2 \cos^2\left(\theta\right) - 1$

If we suppose the number whose square we want is $\cos^2\left(\theta\right)$ then we can find $\cos\left(\theta\right)$. The calculation on the right-hand side of this is easy; double your number and subtract one. Then to the lookup table; find the angle whose cosine is that number. That angle is two times θ. So divide that angle in two. Cosine of that is, well, $\cos\left(\theta\right)$ and most people would agree that’s a square root of $\cos^2\left(\theta\right)$ without any further work.

Why can’t I do the same thing with a triple-angle formula? … Well, here’s my choices among the normal trig functions: $\cos\left(3\theta\right) = 4 \cos^3\left(\theta\right) - 3\cos\left(\theta\right)$ $\sin\left(3\theta\right) = 3 \sin\left(\theta\right) - 4\sin^3\left(\theta\right)$ $\tan\left(3\theta\right) = \frac{3 \tan\left(\theta\right) - \tan^3\left(\theta\right)}{1 - 3 \tan^2\left(\theta\right)}$

Yes, I see you in the corner, hopping up and down and asking about the cosecant. It’s not any better. Trust me.

So you see the problem here. The number whose cube root I want has to be the $\cos^3\left(\theta\right)$. Or the cube of the sine of theta, or the cube of the tangent of theta. Whatever. The trouble is I don’t see a way to calculate cosine (sine, tangent) of 3θ, or 3 times the cosine (etc) of θ. Nor to get some other simple expression out of that. I can get mixtures of the cosine of 3θ plus the cosine of θ, sure. But that doesn’t help me figure out what θ is.

Can it be worked out? Oh, sure, yes. There’s absolutely approximation schemes that would let me find a value of θ which makes true, say, $4 \cos^3\left(\theta\right) - 3 \cos\left(\theta\right) = 0.5$

But: is there a way takes less work than some ordinary method of calculating a cube root? Even if you allow some work to be done by someone else ahead of time, such as by computing a table of trig functions? … If there is, I don’t see it. So there’s another point in favor of logarithms. Finding a cube root using a logarithm table is no harder than finding a square root, or any other root.

If you’re using trig tables, you can find a square root, or a fourth root, or an eighth root. Cube roots, if I’m not missing something, are beyond us. So are, I imagine, fifth roots and sixth roots and seventh roots and so on. I could protest that I have never in my life cared what the seventh root of a thing is, but it would sound like a declaration of sour grapes. Too bad.

If I have missed something, it’s probably obvious. Please go ahead and tell me what it is.

## How To Calculate A Square Root By A Method You Will Never Actually Use

Sunday’s comics post got me thinking about ways to calculate square roots besides using the square root function on a calculator. I wondered if I could find my own little approach. Maybe something that isn’t iterative. Iterative methods are great in that they tend to forgive numerical errors. All numerical calculations carry errors with them. But they can involve a lot of calculation and, in principle, never finish. You just give up when you think the answer is good enough. A non-iterative method carries the promise that things will, someday, end.

And I found one! It’s a neat little way to find the square root of a number between 0 and 1. Call the number ‘S’, as in square. I’ll give you the square root from it. Here’s how.

First, take S. Multiply S by two. Then subtract 1 from this.

Next. Find the angle — I shall call it 2A — whose cosine is this number 2S – 1.

You have 2A? Great. Divide that in two, so that you get the angle A.

Now take the cosine of A. This will be the (positive) square root of S. (You can find the negative square root by taking minus this.)

Let me show it in action. Let’s say you want the square root of 0.25. So let S = 0.25. And then 2S – 1 is two times 0.25 (which is 0.50) minus 1. That’s -0.50. What angle has cosine of -0.50? Well, that’s an angle of 2 π / 3 radians. Mathematicians think in radians. People think in degrees. And you can do that too. This is 120 degrees. Divide this by two. That’s an angle of π / 3 radians, or 60 degrees. The cosine of π / 3 is 0.5. And, indeed, 0.5 is the square root of 0.25.

I hear you protesting already: what if we want the square root of something larger than 1? Like, how is this any good in finding the square root of 81? Well, if we add a little step before and after this work, we’re in good shape. Here’s what.

So we start with some number larger than 1. Say, 81. Fine. Divide it by 100. If it’s still larger than 100, divide it again, and again, until you get a number smaller than 1. Keep track of how many times you did this. In this case, 81 just has to be divided by 100 the one time. That gives us 0.81, a number which is smaller than 1.

Twice 0.81 minus 1 is equal to 0.62. The angle which has 0.81 as cosine is roughly 0.90205. Half this angle is about 0.45103. And the cosine of 0.45103 is 0.9. This is looking good, but obviously 0.9 is no square root of 81.

Ah, but? We divided 81 by 100 to get it smaller than 1. So we balance that by multiplying 0.9 by 10 to get it back larger than 1. If we had divided by 100 twice to start with, we’d multiply by 10 twice to finish. If we had divided by 100 six times to start with, we’d multiply by 10 six times to finish. Yes, 10 is the square root of 100. You see what’s going on here.

(And if you want the square root of a tiny number, something smaller than 0.01, it’s not a bad idea to multiply it by 100, maybe several times over. Then calculate the square root, and divide the result by 10 a matching number of times. It’s hard to calculate with very big or with very small numbers. If you must calculate, do it on very medium numbers. This is one of those little things you learn in numerical mathematics.)

So maybe now you’re convinced this works. You may not be convinced of why this works. What I’m using here is a trigonometric identity, one of the angle-doubling formulas. Its heart is this identity. It’s familiar to students whose Intro to Trigonometry class is making them finally, irrecoverably hate mathematics: $\cos\left(2\theta\right) = 2 \cos^2\left(\theta\right) - 1$

Here, I let ‘S’ be the squared number, $\cos^2\left(\theta\right)$. So then anything I do to find $\cos\left(\theta\right)$ gets me the square root. The algebra here is straightforward. Since ‘S’ is that cosine-squared thing, all I have to do is double it, subtract one, and then find what angle 2θ has that number as cosine. Then the cosine of θ has to be the square root.

Oh, yeah, all right. There’s an extra little objection. In what world is it easier to take an arc-cosine (to figure out what 2θ is) and then later to take a cosine? … And the answer is, well, any world where you’ve already got a table printed out of cosines of angles and don’t have a calculator on hand. This would be a common condition through to about 1975. And not all that ridiculous through to about 1990.

This is an example of a prosthaphaeretic rule. These are calculation tools. They’re used to convert multiplication or division problems into addition and subtraction. The idea is exactly like that of logarithms and exponents. Using trig functions predates logarithms. People knew about sines and cosines long before they knew about logarithms and exponentials. But the impulse is the same. And you might, if you squint, see in my little method here an echo of what you’d do more easily with a logarithm table. If you had a log table, you’d calculate $\exp\left(\frac{1}{2}\log\left(S\right)\right)$ instead. But if you don’t have a log table, and only have a table of cosines, you can calculate $\cos\left(\frac{1}{2}\arccos\left(2 S - 1 \right)\right)$ at least.

Is this easier than normal methods of finding square roots? … If you have a table of cosines, yes. Definitely. You have to scale the number into range (divide by 100 some) do an easy multiplication (S times 2), an easy subtraction (minus 1), a table lookup (arccosine), an easy division (divide by 2), another table lookup (cosine), and scale the number up again (multiply by 10 some). That’s all. Seven steps, and two of them are reading. Two of the rest are multiplying or dividing by 10’s. Using logarithm tables has it beat, yes, at five steps (two that are scaling, two that are reading, one that’s dividing by 2). But if you can’t find your table of logarithms, and do have a table of cosines, you’re set.

This may not be practical, since who has a table of cosines anymore? Who hasn’t also got a calculator that does square roots faster? But it delighted me to work this scheme out. Give me a while and maybe I’ll think about cube roots.

## Reading the Comics, October 4, 2016: Split Week Edition Part 1

The last week in mathematically themed comics was a pleasant one. By “a pleasant one” I mean Comic Strip Master Command sent enough comics out that I feel comfortable splitting them across two essays. Look for the other half of the past week’s strips in a couple days at a very similar URL.

Mac King and Bill King’s Magic in a Minute feature for the 2nd shows off a bit of number-pattern wonder. Set numbers in order on a four-by-four grid and select four as directed and add them up. You get the same number every time. It’s a cute trick. I would not be surprised if there’s some good group theory questions underlying this, like about what different ways one could arrange the numbers 1 through 16. Or for what other size grids the pattern will work for: 2 by 2? (Obviously.) 3 by 3? 5 by 5? 6 by 6? I’m not saying I actually have been having fun doing this. I just sense there’s fun to be had there.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 2nd is based on one of those weirdnesses of the way computers add. I remember in the 90s being on a Java mailing list. Routinely it would draw questions from people worried that something was very wrong, as adding 0.01 to a running total repeatedly wouldn’t get to exactly 1.00. Java was working correctly, in that it was doing what the specifications said. It’s just the specifications didn’t work quite like new programmers expected.

What’s going on here is the same problem you get if you write down 1/3 as 0.333. You know that 1/3 plus 1/3 plus 1/3 ought to be 1 exactly. But 0.333 plus 0.333 plus 0.333 is 0.999. 1/3 is really a little bit more than 0.333, but we skip that part because it’s convenient to use only a few points past the decimal. Computers normally represent real-valued numbers with a scheme called floating point representation. At heart, that’s representing numbers with a couple of digits. Enough that we don’t normally see the difference between the number we want and the number the computer represents.

Every number base has some rational numbers it can’t represent exactly using finitely many digits. Our normal base ten, for example, has “one-third” and “two-third”. Floating point arithmetic is built on base two, and that has some problems with tenths and hundredths and thousandths. That’s embarrassing but in the main harmless. Programmers learn about these problems and how to handle them. And if they ask the mathematicians we tell them how to write code so as to keep these floating-point errors from growing uncontrollably. If they ask nice.

Random Acts of Nancy for the 3rd is a panel from Ernie Bushmiller’s Nancy. That panel’s from the 23rd of November, 1946. And it just uses mathematics in passing, arithmetic serving the role of most of Nancy’s homework. There’s a bit of spelling (I suppose) in there too, which probably just represents what’s going to read most cleanly. Random Acts is curated by Ernie Bushmiller fans Guy Gilchrist (who draws the current Nancy) and John Lotshaw.

Thom Bluemel’s Birdbrains for the 4th depicts the discovery of a new highest number. When humans discovered ‘1’ is, I would imagine, probably unknowable. Given the number sense that animals have it’s probably something that predates humans, that it’s something we’re evolved to recognize and understand. A single stroke for 1 seems to be a common symbol for the number. I’ve read histories claiming that a culture’s symbol for ‘1’ is often what they use for any kind of tally mark. Obviously nothing in human cultures is truly universal. But when I look at number symbols other than the Arabic and Roman schemes I’m used to, it is usually the symbol for ‘1’ that feels familiar. Then I get to the Thai numeral and shrug at my helplessness.

Bill Amend’s FoxTrot Classics for the 4th is a rerun of the strip from the 11th of October, 2005. And it’s made for mathematics people to clip out and post on the walls. Jason and Marcus are in their traditional nerdly way calling out sequences of numbers. Jason’s is the Fibonacci Sequence, which is as famous as mathematics sequences get. That’s the sequence of numbers in which every number is the sum of the previous two terms. You can start that sequence with 0 and 1, or with 1 and 1, or with 1 and 2. It doesn’t matter.

Marcus calls out the Perrin Sequence, which I neve heard of before either. It’s like the Fibonacci Sequence. Each term in it is the sum of two other terms. Specifically, each term is the sum of the second-previous and the third-previous terms. And it starts with the numbers 3, 0, and 2. The sequence is named for François Perrin, who described it in 1899, and that’s as much as I know about him. The sequence describes some interesting stuff. Take n points and put them in a ‘cycle graph’, which looks to the untrained eye like a polygon with n corners and n sides. You can pick subsets of those points. A maximal independent set is the biggest subset you can make that doesn’t fit into another subset. And the number of these maximal independent sets in a cyclic graph is the n-th number in the Perrin sequence. I admit this seems like a nice but not compelling thing to know. But I’m not a cyclic graph kind of person so what do I know?

Jeffrey Caulfield and Alexandre Rouillard’s Mustard and Boloney for the 4th is the anthropomorphic numerals joke for this essay and I was starting to worry we wouldn’t get one.

## Theorem Thursday: A First Fixed Point Theorem

I’m going to let the Mean Value Theorem slide a while. I feel more like a Fixed Point Theorem today. As with the Mean Value Theorem there’s several of these. Here I’ll start with an easy one.

# The Fixed Point Theorem.

Back when the world and I were young I would play with electronic calculators. They encouraged play. They made it so easy to enter a number and hit an operation, and then hit that operation again, and again and again. Patterns appeared. Start with, say, ‘2’ and hit the ‘squared’ button, the smaller ‘2’ raised up from the key’s baseline. You got 4. And again: 16. And again: 256. And again and again and you got ever-huger numbers. This happened whenever you started from a number bigger than 1. Start from something smaller than 1, however tiny, and it dwindled down to zero, whatever you tried. Start at ‘1’ and it just stays there. The results were similar if you started with negative numbers. The first squaring put you in positive numbers and everything carried on as before.

This sort of thing happened a lot. Keep hitting the mysterious ‘exp’ and the numbers would keep growing forever. Keep hitting ‘sqrt’; if you started above 1, the numbers dwindled to 1. Start below and the numbers rise to 1. Or you started at zero, but who’s boring enough to do that? ‘log’ would start with positive numbers and keep dropping until it turned into a negative number. The next step was the calculator’s protest we were unleashing madness on the world.

But you didn’t always get zero, one, infinity, or madness, from repeatedly hitting the calculator button. Sometimes, some functions, you’d get an interesting number. If you picked any old number and hit cosine over and over the digits would eventually settle down to around 0.739085. Or -0.739085. Cosine’s great. Tangent … tangent is weird. Tangent does all sorts of bizarre stuff. But at least cosine is there, giving us this interesting number.

(Something you might wonder: this is the cosine of an angle measured in radians, which is how mathematicians naturally think of angles. Normal people measure angles in degrees, and that will have a different fixed point. We write both the cosine-in-radians and the cosine-in-degrees using the shorthand ‘cos’. We get away with this because people who are confused by this are too embarrassed to call us out on it. If we’re thoughtful we write, say, ‘cos x’ for radians and ‘cos x°’ for degrees. This makes the difference obvious. It doesn’t really, but at least we gave some hint to the reader.)

This all is an example of a fixed point theorem. Fixed point theorems turn up in a lot of fields. They were most impressed upon me in dynamical systems, studying how a complex system changes in time. A fixed point, for these problems, is an equilibrium. It’s where things aren’t changed by a process. You can see where that’s interesting.

In this series I haven’t stated theorems exactly much, and I haven’t given them real proofs. But this is an easy one to state and to prove. Start off with a function, which I’ll name ‘f’, because yes that is exactly how much effort goes in to naming functions. It has as a domain the interval [a, b] for some real numbers ‘a’ and ‘b’. And it has as rang the same interval, [a, b]. It might use the whole range; it might use only a subset of it. And we have to require that f is continuous.

Then there has to be at least one fixed point. There must be at last one number ‘c’, somewhere in the interval [a, b], for which f(c) equals c. There may be more than one; we don’t say anything about how many there are. And it can happen that c is equal to a. Or that c equals b. We don’t know that it is or that it isn’t. We just know there’s at least one ‘c’ that makes f(c) equal c.

You get that in my various examples. If the function f has the rule that any given x is matched to x2, then we do get two fixed points: f(0) = 02 = 0, and, f(1) = 12 = 1. Or if f has the rule that any given x is matched to the square root of x, then again we have: $f(0) = \sqrt{0} = 0$ and $f(1) = \sqrt{1} = 1$. Same old boring fixed points. The cosine is a little more interesting. For that we have $f(0.739085...) = \cos\left(0.739085...\right) = 0.739085...$.

How to prove it? The easiest way I know is to summon the Intermediate Value Theorem. Since I wrote a couple hundred words about that a few weeks ago I can assume you to understand it perfectly and have no question about how it makes this problem easy. I don’t even need to go on, do I?

… Yeah, fair enough. Well, here’s how to do it. We’ll take the original function f and create, based on it, a new function. We’ll dig deep in the alphabet and name that ‘g’. It has the same domain as f, [a, b]. Its range is … oh, well, something in the real numbers. Don’t care. The wonder comes from the rule we use.

The rule for ‘g’ is this: match the given number ‘x’ with the number ‘f(x) – x’. That is, g(a) equals whatever f(a) would be, minus a. g(b) equals whatever f(b) would be, minus b. We’re allowed to define a function in terms of some other function, as long as the symbols are meaningful. But we aren’t doing anything wrong like dividing by zero or taking the logarithm of a negative number or asking for f where it isn’t defined.

You might protest that we don’t know what the rule for f is. We’re told there is one, and that it’s a continuous function, but nothing more. So how can I say I’ve defined g in terms of a function I don’t know?

In the first place, I already know everything about f that I need to. I know it’s a continuous function defined on the interval [a, b]. I won’t use any more than that about it. And that’s great. A theorem that doesn’t require knowing much about a function is one that applies to more functions. It’s like the difference between being able to say something true of all living things in North America, and being able to say something true of all persons born in Redbank, New Jersey, on the 18th of February, 1944, who are presently between 68 and 70 inches tall and working on their rock operas. Both things may be true, but one of those things you probably use more.

In the second place, suppose I gave you a specific rule for f. Let me say, oh, f matches x with the arccosecant of x. Are you feeling any more enlightened now? Didn’t think so.

Back to g. Here’s some things we can say for sure about it. g is a function defined on the interval [a, b]. That’s how we set it up. Next point: g is a continuous function on the interval [a, b]. Remember, g is just the function f, which was continuous, minus x, which is also continuous. The difference of two continuous functions is still going to be continuous. (This is obvious, although it may take some considered thinking to realize why it is obvious.)

Now some interesting stuff. What is g(a)? Well, it’s whatever number f(a) is minus a. I can’t tell you what number that is. But I can tell you this: it’s not negative. Remember that f(a) has to be some number in the interval [a, b]. That is, it’s got to be no smaller than a. So the smallest f(a) can be is equal to a, in which case f(a) minus a is zero. And f(a) might be larger than a, in which case f(a) minus a is positive. So g(a) is either zero or a positive number.

(If you’ve just realized where I’m going and gasped in delight, well done. If you haven’t, don’t worry. You will. You’re just out of practice.)

What about g(b)? Since I don’t know what f(b) is, I can’t tell you what specific number it is. But I can tell you it’s not a positive number. The reasoning is just like above: f(b) is some number on the interval [a, b]. So the biggest number f(b) can equal is b. And in that case f(b) minus b is zero. If f(b) is any smaller than b, then f(b) minus b is negative. So g(b) is either zero or a negative number.

(Smiling at this? Good job. If you aren’t, again, not to worry. This sort of argument is not the kind of thing you do in Boring Algebra. It takes time and practice to think this way.)

And now the Intermediate Value Theorem works. g(a) is a positive number. g(b) is a negative number. g is continuous from a to b. Therefore, there must be some number ‘c’, between a and b, for which g(c) equals zero. And remember what g(c) means: f(c) – c equals 0. Therefore f(c) has to equal c. There has to be a fixed point.

And some tidying up. Like I said, g(a) might be positive. It might also be zero. But if g(a) is zero, then f(a) – a = 0. So a would be a fixed point. And similarly if g(b) is zero, then f(b) – b = 0. So then b would be a fixed point. The important thing is there must be at least some fixed point.

Now that calculator play starts taking on purposeful shape. Squaring a number could find a fixed point only if you started with a number from -1 to 1. The square of a number outside this range, such as ‘2’, would be bigger than you started with, and the Fixed Point Theorem doesn’t apply. Similarly with exponentials. But square roots? The square root of any number from 0 to a positive number ‘b’ is a number between 0 and ‘b’, at least as long as b was bigger than 1. So there was a fixed point, at 1. The cosine of a real number is some number between -1 and 1, and the cosines of all the numbers between -1 and 1 are themselves between -1 and 1. The Fixed Point Theorem applies. Tangent isn’t a continuous function. And the calculator play never settles on anything.

As with the Intermediate Value Theorem, this is an existence proof. It guarantees there is a fixed point. It doesn’t tell us how to find one. Calculator play does, though. Start from any old number that looks promising and work out f for that number. Then take that and put it back into f. And again. And again. This is known as “fixed point iteration”. It won’t give you the exact answer.

Not usually, anyway. In some freak cases it will. But what it will give, provided some extra conditions are satisfied, is a sequence of values that get closer and closer to the fixed point. When you’re close enough, then you stop calculating. How do you know you’re close enough? If you know something about the original f you can work out some logically rigorous estimates. Or you just keep calculating until all the decimal points you want stop changing between iterations. That’s not logically sound, but it’s easy to program.

That won’t always work. It’ll only work if the function f is differentiable on the interval (a, b). That is, it can’t have corners. And there have to be limits on how fast the function changes on the interval (a, b). If the function changes too fast, iteration can’t be guaranteed to work. But often if we’re interested in a function at all then these conditions will be true, or we can think of a related function that for which they are true.

And even if it works it won’t always work well. It can take an enormous pile of calculations to get near the fixed point. But this is why we have computers, and why we can leave them to work overnight.

And yet such a simple idea works. It appears in ancient times, in a formula for finding the square root of an arbitrary positive number ‘N’. (Find the fixed point for $f(x) = \frac{1}{2}\left(\frac{N}{x} + x\right)$). It creeps into problems that don’t look like fixed points. Calculus students learn of something called the Newton-Raphson Iteration. It finds roots, points where a function f(x) equals zero. Mathematics majors learn of numerical methods to solve ordinary differential equations. The most stable of these are again fixed-point iteration schemes, albeit in disguise.

They all share this almost playful backbone.

## Theorem Thursday: One Mean Value Theorem Of Many

For this week I have something I want to follow up on. We’ll see if I make it that far.

# The Mean Value Theorem.

My subject line disagrees with the header just above here. I want to talk about the Mean Value Theorem. It’s one of those things that turns up in freshman calculus and then again in Analysis. It’s introduced as “the” Mean Value Theorem. But like many things in calculus it comes in several forms. So I figure to talk about one of them here, and another form in a while, when I’ve had time to make up drawings.

Calculus can split effortlessly into two kinds of things. One is differential calculus. This is the study of continuity and smoothness. It studies how a quantity changes if someting affecting it changes. It tells us how to optimize things. It tells us how to approximate complicated functions with simpler ones. Usually polynomials. It leads us to differential equations, problems in which the rate at which something changes depends on what value the thing has.

The other kind is integral calculus. This is the study of shapes and areas. It studies how infinitely many things, all infinitely small, add together. It tells us what the net change in things are. It tells us how to go from information about every point in a volume to information about the whole volume.

They aren’t really separate. Each kind informs the other, and gives us tools to use in studying the other. And they are almost mirrors of one another. Differentials and integrals are not quite inverses, but they come quite close. And as a result most of the important stuff you learn in differential calculus has an echo in integral calculus. The Mean Value Theorem is among them.

The Mean Value Theorem is a rule about functions. In this case it’s functions with a domain that’s an interval of the real numbers. I’ll use ‘a’ as the name for the smallest number in the domain and ‘b’ as the largest number. People talking about the Mean Value Theorem often do. The range is also the real numbers, although it doesn’t matter which ones.

I’ll call the function ‘f’ in accord with a longrunning tradition of not working too hard to name functions. What does matter is that ‘f’ is continuous on the interval [a, b]. I’ve described what ‘continuous’ means before. It means that here too.

And we need one more thing. The function f has to be differentiable on the interval (a, b). You maybe noticed that before I wrote [a, b], and here I just wrote (a, b). There’s a difference here. We need the function to be continuous on the “closed” interval [a, b]. That is, it’s got to be continuous for ‘a’, for ‘b’, and for every point in-between.

But we only need the function to be differentiable on the “open” interval (a, b). That is, it’s got to be continuous for all the points in-between ‘a’ and ‘b’. If it happens to be differentiable for ‘a’, or for ‘b’, or for both, that’s great. But we won’t turn away a function f for not being differentiable at those points. Only the interior. That sort of distinction between stuff true on the interior and stuff true on the boundaries is common. This is why mathematicians have words for “including the boundaries” (“closed”) and “never minding the boundaries” (“open”).

As to what “differentiable” is … A function is differentiable at a point if you can take its derivative at that point. I’m sure that clears everything up. There are many ways to describe what differentiability is. One that’s not too bad is to imagine zooming way in on the curve representing a function. If you start with a big old wobbly function it waves all around. But pick a point. Zoom in on that. Does the function stay all wobbly, or does it get more steady, more straight? Keep zooming in. Does it get even straighter still? If you zoomed in over and over again on the curve at some point, would it look almost exactly like a straight line?

If it does, then the function is differentiable at that point. It has a derivative there. The derivative’s value is whatever the slope of that line is. The slope is that thing you remember from taking Boring Algebra in high school. That rise-over-run thing. But this derivative is a great thing to know. You could approximate the original function with a straight line, with slope equal to that derivative. Close to that point, you’ll make a small enough error nobody has to worry about it.

That there will be this straight line approximation isn’t true for every function. Here’s an example. Picture a line that goes up and then takes a 90-degree turn to go back down again. Look at the corner. However close you zoom in on the corner, there’s going to be a corner. It’s never going to look like a straight line; there’s a 90-degree angle there. It can be a smaller angle if you like, but any sort of corner breaks this differentiability. This is a point where the function isn’t differentiable.

There are functions that are nothing but corners. They can be differentiable nowhere, or only at a tiny set of points that can be ignored. (A set of measure zero, as the dialect would put it.) Mathematicians discovered this over the course of the 19th century. They got into some good arguments about how that can even make sense. It can get worse. Also found in the 19th century were functions that are continuous only at a single point. This smashes just about everyone’s intuition. But we can’t find a definition of continuity that’s as useful as the one we use now and avoids that problem. So we accept that it implies some pathological conclusions and carry on as best we can.

Now I get to the Mean Value Theorem in its differential calculus pelage. It starts with the endpoints, ‘a’ and ‘b’, and the values of the function at those points, ‘f(a)’ and ‘f(b)’. And from here it’s easiest to figure what’s going on if you imagine the plot of a generic function f. I recommend drawing one. Just make sure you draw it without lifting the pen from paper, and without including any corners anywhere. Something wiggly.

Draw the line that connects the ends of the wiggly graph. Formally, we’re adding the line segment that connects the points with coordinates (a, f(a)) and (b, f(b)). That’s coordinate pairs, not intervals. That’s clear in the minds of the mathematicians who don’t see why not to use parentheses over and over like this. (We are short on good grouping symbols like parentheses and brackets and braces.)

Per the Mean Value Theorem, there is at least one point whose derivative is the same as the slope of that line segment. If you were to slide the line up or down, without changing its orientation, you’d find something wonderful. Most of the time this line intersects the curve, crossing from above to below or vice-versa. But there’ll be at least one point where the shifted line is “tangent”, where it just touches the original curve. Close to that touching point, the “tangent point”, the shifted line and the curve blend together and can’t be easily told apart. As long as the function is differentiable on the open interval (a, b), and continuous on the closed interval [a, b], this will be true. You might convince yourself of it by drawing a couple of curves and taking a straightedge to the results.

This is an existence theorem. Like the Intermediate Value Theorem, it doesn’t tell us which point, or points, make the thing we’re interested in true. It just promises us that there is some point that does it. So it gets used in other proofs. It lets us mix information about intervals and information about points.

It’s tempting to try using it numerically. It looks as if it justifies a common differential-calculus trick. Suppose we want to know the value of the derivative at a point. We could pick a little interval around that point and find the endpoints. And then find the slope of the line segment connecting the endpoints. And won’t that be close enough to the derivative at the point we care about?

Well. Um. No, we really can’t be sure about that. We don’t have any idea what interval might make the derivative of the point we care about equal to this line-segment slope. The Mean Value Theorem won’t tell us. It won’t even tell us if there exists an interval that would let that trick work. We can’t invoke the Mean Value Theorem to let us get away with that.

Often, though, we can get away with it. Differentiable functions do have to follow some rules. Among them is that if you do pick a small enough interval then approximations that look like this will work all right. If the function flutters around a lot, we need a smaller interval. But a lot of the functions we’re interested in don’t flutter around that much. So we can get away with it. And there’s some grounds to trust in getting away with it. The Mean Value Theorem isn’t any part of the grounds. It just looks so much like it ought to be.

I hope on a later Thursday to look at an integral-calculus form of the Mean Value Theorem.

## What’s The Shortest Proof I’ve Done?

I didn’t figure to have a bookend for last week’s “What’s The Longest Proof I’ve Done? question. I don’t keep track of these things, after all. And the length of a proof must be a fluid concept. If I show something is a direct consequence of a previous theorem, is the proof’s length the two lines of new material? Or is it all the proof of the previous theorem plus two new lines?

I would think the shortest proof I’d done was showing that the logarithm of 1 is zero. This would be starting from the definition of the natural logarithm of a number x as the definite integral of 1/t on the interval from 1 to x. But that requires a bunch of analysis to support the proof. And the Intermediate Value Theorem. Does that stuff count? Why or why not?

But this happened to cross my desk: The Shortest-Known Paper Published in a Serious Math Journal: Two Succinct Sentences, an essay by Dan Colman. It reprints a paper by L J Lander and T R Parkin which appeared in the Bulletin of the American Mathematical Society in 1966.

It’s about Euler’s Sums of Powers Conjecture. This is a spinoff of Fermat’s Last Theorem. Leonhard Euler observed that you need at least two whole numbers so that their squares add up to a square. And you need three cubes of whole numbers to add up to the cube of a whole number. Euler speculated you needed four whole numbers so that their fourth powers add up to a fourth power, five whole numbers so that their fifth powers add up to a fifth power, and so on.

And it’s not so. Lander and Parkin found that this conjecture is false. They did it the new old-fashioned way: they set a computer to test cases. And they found four whole numbers whose fifth powers add up to a fifth power. So the quite short paper answers a long-standing question, and would be hard to beat for accessibility.

There is another famous short proof sometimes credited as the most wordless mathematical presentation. Frank Nelson Cole gave it on the 31st of October, 1903. It was about the Mersenne number 267-1, or in human notation, 147,573,952,589,676,412,927. It was already known the number wasn’t prime. (People wondered because numbers of the form 2n-1 often lead us to perfect numbers. And those are interesting.) But nobody knew which factors it was. Cole gave his talk by going up to the board, working out 267-1, and then moving to the other side of the board. There he wrote out 193,707,721 × 761,838,257,287, and showed what that was. Then, per legend, he sat down without ever saying a word, and took in the standing ovation.

I don’t want to cast aspersions on a great story like that. But mathematics is full of great stories that aren’t quite so. And I notice that one of Cole’s doctoral students was Eric Temple Bell. Bell gave us a great many tales of mathematics history that are grand and great stories that just weren’t so. So I want it noted that I don’t know where we get this story from, or how it may have changed in the retellings. But Cole’s proof is correct, at least according to Octave.

So not every proof is too long to fit in the universe. But then I notice that Mathworld’s page regarding the Euler Sum of Powers Conjecture doesn’t cite the 1966 paper. It cites instead Lander and Parkin’s “A Counterexample to Euler’s Sum of Powers Conjecture” from Mathematics of Computation volume 21, number 97, of 1967. There the paper has grown to three pages, although it’s only a couple paragraphs of one page and three lines of citation on the third. It’s not so easy to read either, but it does explain how they set about searching for counterexamples. But it may give you some better idea of how numerical mathematicians find things.

## Theorem Thursday: What Is Cramer’s Rule?

KnotTheorist asked for this one during my appeal for theorems to discuss. And I’m taking an open interpretation of what a “theorem” is. I can do a rule.

# Cramer’s Rule

I first learned of Cramer’s Rule in the way I expect most people do. It was an algebra course. I mean high school algebra. By high school algebra I mean you spend roughly eight hundred years learning ways to solve for x or to plot y versus x. Then take a pause for polar coordinates and matrices. Then you go back to finding both x and y.

Cramer’s Rule came up in the context of solving simultaneous equations. You have more than one variable. So x and y. Maybe z. Maybe even a w, before whoever set up the problem gives up and renames everything x1 and x2 and x62 and all that. You also have more than one equation. In fact, you have exactly as many equations as you have variables. Are there any sets of values those variables can have which make all those variable true simultaneously? Thus the imaginative name “simultaneous equations” or the search for “simultaneous solutions”.

If all the equations are linear then we can always say whether there’s simultaneous solutions. By “linear” we mean what we always mean in mathematics, which is, “something we can handle”. But more exactly it means the equations have x and y and whatever other variables only to the first power. No x-squared or square roots of y or tangents of z or anything. (The equations are also allowed to omit a variable. That is, if you have one equation with x, y, and z, and another with just x and z, and another with just y and z, that’s fine. We pretend the missing variable is there and just multiplied by zero, and proceed as before.) One way to find these solutions is with Cramer’s Rule.

Cramer’s Rule sets up some matrices based on the system of equations. If the system has two equations, it sets up three matrices. If the system has three equations, it sets up four matrices. If the system has twelve equations, it sets up thirteen matrices. You see the pattern here. And then you can take the determinant of each of these matrices. Dividing the determinant of one of these matrices by another one tells you what value of x makes all the equations true. Dividing the determinant of another matrix by the determinant of one of these matrices tells you which values of y makes all the equations true. And so on. The Rule tells you which determinants to use. It also says what it means if the determinant you want to divide by equals zero. It means there’s either no set of simultaneous solutions or there’s infinitely many solutions.

This gets dropped on us students in the vain effort to convince us knowing how to calculate determinants is worth it. It’s not that determinants aren’t worth knowing. It’s just that they don’t seem to tell us anything we care about. Not until we get into mappings and calculus and differential equations and other mathematics-major stuff. We never see it in high school.

And the hard part of determinants is that for all the cool stuff they tell us, they take forever to calculate. The determinant for a matrix with two rows and two columns isn’t bad. Three rows and three columns is getting bad. Four rows and four columns is awful. The determinant for a matrix with five rows and five columns you only ever calculate if you’ve made your teacher extremely cross with you.

So there’s the genius and the first problem with Cramer’s Rule. It takes a lot of calculating. Many any errors along the way with the calculation and your work is wrong. And worse, it won’t be wrong in an obvious way. You can find the error only by going over every single step and hoping to catch the spot where you, somehow, got 36 times -7 minus 21 times -8 wrong.

The second problem is nobody in high school algebra mentions why systems of linear equations should be interesting to solve. Oh, maybe they’ll explain how this is the work you do to figure out where two straight lines intersect. But that just shifts the “and we care because … ?” problem back one step. Later on we might come to understand the lines represent cases where something we’re interested in is true, or where it changes from true to false.

This sort of simultaneous-solution problem turns up naturally in optimization problems. These are problems where you try to find a maximum subject to some constraints. Or find a minimum. Maximums and minimums are the same thing when you think about them long enough. If all the constraints can be satisfied at once and you get a maximum (or minimum, whatever), great! If they can’t … Well, you can study how close it’s possible to get, and what happens if you loosen one or more constraint. That’s worth knowing about.

The third problem with Cramer’s Rule is that, as a method, it kind of sucks. We can be convinced that simultaneous linear equations are worth solving, or at least that we have to solve them to get out of High School Algebra. And we have computers. They can grind away and work out thirteen determinants of twelve-row-by-twelve-column matrices. They might even get an answer back before the end of the term. (The amount of work needed for a determinant grows scary fast as the matrix gets bigger.) But all that work might be meaningless.

The trouble is that Cramer’s Rule is numerically unstable. Before I even explain what that is you already sense it’s a bad thing. Think of all the good things in your life you’ve heard described as unstable. Fair enough. But here’s what we mean by numerically unstable.

Is 1/3 equal to 0.3333333? No, and we know that. But is it close enough? Sure, most of the time. Suppose we need a third of sixty million. 0.3333333 times 60,000,000 equals 19,999,998. That’s a little off of the correct 20,000,000. But I bet you wouldn’t even notice the difference if nobody pointed it out to you. Even if you did notice it you might write off the difference. “If we must, make up the difference out of petty cash”, you might declare, as if that were quite sensible in the context.

And that’s so because this multiplication is numerically stable. Make a small error in either term and you get a proportional error in the result. A small mistake will — well, maybe it won’t stay small, necessarily. But it’ll not grow too fast too quickly.

So now you know intuitively what an unstable calculation is. This is one in which a small error doesn’t necessarily stay proportionally small. It might grow huge, arbitrarily huge, and in few calculations. So your answer might be computed just fine, but actually be meaningless.

This isn’t because of a flaw in the computer per se. That is, it’s working as designed. It’s just that we might need, effectively, infinitely many digits of precision for the result to be correct. You see where there may be problems achieving that.

Cramer’s Rule isn’t guaranteed to be nonsense, and that’s a relief. But it is vulnerable to this. You can set up problems that look harmless but which the computer can’t do. And that’s surely the worst of all worlds, since we wouldn’t bother calculating them numerically if it weren’t too hard to do by hand.

(Let me direct the reader who’s unintimidated by mathematical jargon, and who likes seeing a good Wikipedia Editors quarrel, to the Cramer’s Rule Talk Page. Specifically to the section “Cramer’s Rule is useless.”)

I don’t want to get too down on Cramer’s Rule. It’s not like the numerical instability hurts every problem you might use it on. And you can, at the cost of some more work, detect whether a particular set of equations will have instabilities. That requires a lot of calculation but if we have the computer to do the work fine. Let it. And a computer can limit its numerical instabilities if it can do symbolic manipulations. That is, if it can use the idea of “one-third” rather than 0.3333333. The software package Mathematica, for example, does symbolic manipulations very well. You can shed many numerical-instability problems, although you gain the problem of paying for a copy of Mathematica.

If you just care about, or just need, one of the variables then what the heck. Cramer’s Rule lets you solve for just one or just some of the variables. That seems like a niche application to me, but it is there.

And the Rule re-emerges in pure analysis, where numerical instability doesn’t matter. When we look to differential equations, for example, we often find solutions are combinations of several independent component functions. Bases, in fact. Testing whether we have found independent bases can be done through a thing called the Wronskian. That’s a way that Cramer’s Rule appears in differential equations.

Wikipedia also asserts the use of Cramer’s Rule in differential geometry. I believe that’s a true statement, and that it will be reflected in many mechanics problems. In these we can use our knowledge that, say, energy and angular momentum of a system are constant values to tell us something of how positions and velocities depend on each other. But I admit I’m not well-read in differential geometry. That’s something which has indeed caused me pain in my scholarly life. I don’t know whether differential geometers thank Cramer’s Rule for this insight or whether they’re just glad to have got all that out of the way. (See the above Wikipedia Editors quarrel.)

I admit for all this talk about Cramer’s Rule I haven’t said what it is. Not in enough detail to pass your high school algebra class. That’s all right. It’s easy to find. MathWorld has the rule in pretty simple form. Mathworld does forget to define what it means by the vector d. (It’s the vector with components d1, d2, et cetera.) But that’s enough technical detail. If you need to calculate something using it, you can probably look closer at the problem and see if you can do it another way instead. Or you’re in high school algebra and just have to slog through it. It’s all right. Eventually you can put x and y aside and do geometry.

## A Leap Day 2016 Mathematics A To Z: Polynomials

I have another request for today’s Leap Day Mathematics A To Z term. Gaurish asked for something exciting. This should be less challenging than Dedekind Domains. I hope.

## Polynomials.

Polynomials are everything. Everything in mathematics, anyway. If humans study it, it’s a polynomial. If we know anything about a mathematical construct, it’s because we ran across it while trying to understand polynomials.

I exaggerate. A tiny bit. Maybe by three percent. But polynomials are big.

They’re easy to recognize. We can get them in pre-algebra. We make them out of a set of numbers called coefficients and one or more variables. The coefficients are usually either real numbers or complex-valued numbers. The variables we usually allow to be either real or complex-valued numbers. We take each coefficient and multiply it by some power of each variable. And we add all that up. So, polynomials are things that look like these things: $x^2 - 2x + 1$ $12 x^4 + 2\pi x^2 y^3 - 4x^3 y - \sqrt{6}$ $\ln(2) + \frac{1}{2}\left(x - 2\right) - \frac{1}{2 \cdot 2^2}\left(x - 2\right)^2 + \frac{1}{2 \cdot 2^3}\left(x - 2\right)^3 - \frac{1}{2 \cdot 2^4}\left(x - 2\right)^4 + \cdots$ $a_n x^n + a_{n - 1}x^{n - 1} + a_{n - 2}x^{n - 2} + \cdots + a_2 x^2 + a_1 x^1 + a_0$

The first polynomial maybe looks nice and comfortable. The second may look a little threatening, what with it having two variables and a square root in it, but it’s not too weird. The third is an infinitely long polynomial; you’re supposed to keep going on in that pattern, adding even more terms. The last is a generic representation of a polynomial. Each number a0, a1, a2, et cetera is some coefficient that we in principle know. It’s a good way of representing a polynomial when we want to work with it but don’t want to tie ourselves down to a particular example. The highest power we raise a variable to we call the degree of the polynomial. A second-degree polynomial, for example, has an x2 in it, but not an x3 or x4 or x18 or anything like that. A third-degree polynomial has an x3, but not x to any higher powers. Degree is a useful way of saying roughly how long a polynomial is, so it appears all over discussions of polynomials.

But why do we like polynomials? Why like them so much that MathWorld lists 1,163 pages that mention polynomials?

It’s because they’re great. They do everything we’d ever want to do and they’re great at it. We can add them together as easily as we add regular old numbers. We can subtract them as well. We can multiply and divide them. There’s even prime polynomials, just like there are prime numbers. They take longer to work out, but they’re not harder.

And they do great stuff in advanced mathematics too. In calculus we want to take derivatives of functions. Polynomials, we always can. We get another polynomial out of that. So we can keep taking derivatives, as many as we need. (We might need a lot of them.) We can integrate too. The integration produces another polynomial. So we can keep doing that as long as we need too. (We need to do this a lot, too.) This lets us solve so many problems in calculus, which is about how functions work. It also lets us solve so many problems in differential equations, which is about systems whose change depends on the current state of things.

That’s great for analyzing polynomials, but what about things that aren’t polynomials?

Well, if a function is continuous, then it might as well be a polynomial. To be a little more exact, we can set a margin of error. And we can always find polynomials that are less than that margin of error away from the original function. The original function might be annoying to deal with. The polynomial that’s as close to it as we want, though, isn’t.

Not every function is continuous. Most of them aren’t. But most of the functions we want to do work with are, or at least are continuous in stretches. Polynomials let us understand the functions that describe most real stuff.

Nice for mathematicians, all right, but how about for real uses? How about for calculations?

Oh, polynomials are just magnificent. You know why? Because you can evaluate any polynomial as soon as you can add and multiply. (Also subtract, but we think of that as addition.) Remember, x4 just means “x times x times x times x”, four of those x’s in the product. All these polynomials are easy to evaluate.

Even better, we don’t have to evaluate them. We can automate away the evaluation. It’s easy to set a calculator doing this work, and it will do it without complaint and with few unforeseeable mistakes.

Now remember that thing where we can make a polynomial close enough to any continuous function? And we can always set a calculator to evaluate a polynomial? Guess that this means about continuous functions. We have a tool that lets us calculate stuff we would want to know. Things like arccosines and logarithms and Bessel functions and all that. And we get nice easy to understand numbers out of them. For example, that third polynomial I gave you above? That’s not just infinitely long. It’s also a polynomial that approximates the natural logarithm. Pick a positive number x that’s between 0 and 4 and put it in that polynomial. Calculate terms and add them up. You’ll get closer and closer to the natural logarithm of that number. You’ll get there faster if you pick a number near 2, but you’ll eventually get there for whatever number you pick. (Calculus will tell us why x has to be between 0 and 4. Don’t worry about it for now.)

So through polynomials we can understand functions, analytically and numerically.

And they keep revealing things to us. We discovered complex-valued numbers because we wanted to find roots, values of x that make a polynomial of x equal to zero. Some formulas worked well for third- and fourth-degree polynomials. (They look like the quadratic formula, which solves second-degree polynomials. The big difference is nobody remembers what they are without looking them up.) But the formulas sometimes called for things that looked like square roots of negative numbers. Absurd! But if you carried on as if these square roots of negative numbers meant something, you got meaningful answers. And correct answers.

We wanted formulas to solve fifth- and higher-degree polynomials exactly. We can do this with second and third and fourth-degree polynomials, after all. It turns out we can’t. Oh, we can solve some of them exactly. The attempt to understand why, though, helped us create and shape group theory, the study of things that look like but aren’t numbers.

Polynomials go on, sneaking into everything. We can look at a square matrix and discover its characteristic polynomial. This allows us to find beautifully-named things like eigenvalues and eigenvectors. These reveal secrets of the matrix’s structure. We can find polynomials in the formulas that describe how many ways to split up a group of things into a smaller number of sets. We can find polynomials that describe how networks of things are connected. We can find polynomials that describe how a knot is tied. We can even find polynomials that distinguish between a knot and the knot’s reflection in the mirror.

Polynomials are everything.

## Terrible And Less-Terrible Things with Pi

We are coming around “Pi Day”, the 14th of March, again. I don’t figure to have anything thematically appropriate for the day. I figure to continue the Leap Day 2016 Mathematics A To Z, and I don’t tend to do a whole two posts in a single day. Two just seems like so many, doesn’t it?

But I would like to point people who’re interested in some π-related stuff to what I posted last year. Those posts were:

• Calculating Pi Terribly, in which I show a way to work out the value of π that’s fun and would take forever. I mean, yes, properly speaking they all take forever, but this takes forever just to get a couple of digits right. It might be fun to play with but don’t use this to get your digits of π. Really.
• Calculating Pi Less Terribly, in which I show a way to do better. This doesn’t lend itself to any fun side projects. It’s just calculations. But it gets you accurate digits a lot faster.

## Spaghetti Mathematics

Let’s ease into Monday. Win Smith with the Well Tempered Spreadsheet blog encountered one of those idle little puzzles that captures the imagination and doesn’t let go. It starts as many will with spaghetti.

I’m sure you’re intrigued too. It’s not the case that any old splitting of a strand of spaghetti will give you three pieces you can make into a triangle. You need the lengths of the three pieces to satisfy what’s imaginatively called the Triangle Inequality. That inequality demands the lengths of any two sides have to be greater than the length of the third side. So, suppose we start with spaghetti that’s 12 inches long, and we have it cut into pieces 5, 4, and 3 inches long: that’s fine. If we have it cut into pieces 9, 2, and 1 inches long, we’re stuck.

The Triangle Inequality is often known as the Cauchy Inequality, or the Cauchy-Schwarz Inequality, or the Cauchy-Bunyakovsky-Schwarz Inequality, or if that’s getting too long the CBS Inequality. And some pranksters reorder it to the Cauchy-Schwarz-Bunyakovsky Inequality. The Cauchy (etc) Inequality isn’t quite the same thing as the Triangle Inequality. But it’s an important and useful idea, and the Cauchy (etc) Inequality has the Triangle Inequality as one of its consequences. The name of it so overflows with names because mathematics history is complicated. Augustin-Louis Cauchy published the first proof of it, but for the special case of sums of series. Viktor Bunyakovsky proved a similar version of it for integrals, and has a name that’s so very much fun to say. Hermann Amandus Schwarz first put the proof into its modern form. So who deserves credit for it? Good question. If it influences your decision, know that Cauchy was incredibly prolific and has plenty of things named for him already. He’s got, without exaggeration, about eight hundred papers to his credit. Collecting all his work into a definitive volume took from 1882 to 1974.

Back to the spaghetti. The problem’s a fun one and if you follow the Twitter link above you’ll see the gritty work of mathematicians reasoning out the problem. As with every probability problem ever, the challenge is defining exactly what you’re looking for the probability of. This we call finding the “sample space”, the set of all the possible outcomes and how likely each of them are. Subtle changes in how you think of the problem will change whether you are right.

Smith cleans things up a bit, but preserves the essence of how the answer worked out. The answer that looks most likely correct was developed partly by reasoning and partly by numerical simulation. Numerical simulation is a great blessing for probability problems. Often the easiest way to figure out how likely something is will be trying it. But this does require working out the sample space, and what parts of the sample space represent what you’re interested in. With the information the numerical simulation provided, Smith was able to go back and find an analytic, purely reason-based, answer that looks plausible.

## Reading the Comics, January 15, 2015: Electric Brains and Klein Bottles Edition

I admit I don’t always find a theme running through Comic Strip Master Command’s latest set of mathematically-themed comics. The edition names are mostly so that I can tell them apart when I see a couple listed in the Popular Posts roundup anyway. Jimmy Hatlo’s Little Iodine for the 12th of January, 2016. Originally run the 7th of November, 1954.

Jimmy Hatlo’s Little Iodine is a vintage comic strip from the 1950s. It strikes me as an unlicensed adaptation of Baby Schnooks, but that’s not something for me to worry about. The particular strip, originally from the 7th of November, 1954 (and just run the 12th of January this year) interests me for its ancient views of computers. It’s from the days they were called “electric brains”. I’m also impressed that the machine on display early on is able to work out the “square root of 7921 x2 y2”. The square root of 7921 is no great feat. Being able to work with the symbols of x and y without knowing what they stand for, though, does impress me. I’m not sure there were computers which could handle that sort of symbolic manipulation in 1954. That sort of ability to work with a quantity by name rather than value is what we would buy Mathematica for, if we could afford it. It’s also at least a bit impressive that someone knows the square of 89 offhand. All told, I think this is my favorite of this essay’s set of strips. But it’s a weak field considering none of them are “students giving a snarky reply to a homework/exam/blackboard question”.

Joe Martin’s Willy and Ethel for the 13th of January is a percentages joke. Some might fault it for talking about people giving 110 percent, but of course, what is “100 percent”? If it’s the standard amount of work being done then it does seem like ten people giving 110 percent gets the job done as quickly as eleven people doing 100 percent. If work worked like that. Joe Martin’s Willy and Ethel for the 13th of January, 2016. The link will likely expire in mid-February.

Steve Sicula’s Home and Away for the 13th (a rerun from the 8th of October, 2004) gives a wrongheaded application of a decent principle. The principle is that of taking several data points and averaging their value. The problem with data is that it’s often got errors in it. Something weird happened and it doesn’t represent what it’s supposed to. Or it doesn’t represent it well. By averaging several data points together we can minimize the influence of a fluke reading. Or if we’re measuring something that changes in time, we might use a running average of the last several sampled values. In this way a short-term spike or a meaningless flutter will be minimized. We can avoid wasting time reacting to something that doesn’t matter. (The cost of this, though, is that if a trend is developing we will notice it later than we otherwise would.) Still, sometimes a data point is obviously wrong.

Zach Weinersmith’s Saturday Morning Breakfast Cereal wanted my attention, and so on the 13th it did a joke about Zeno’s Paradox. There are actually four classic Zeno’s Paradoxes, although the one riffed on here I think is the most popular. This one — the idea that you can’t finish something (leaving a room is the most common form) because you have to get halfway done, and have to get halfway to being halfway done, and halfway to halfway to halfway to being done — is often resolved by people saying that Zeno just didn’t understand that an infinite series could converge. That is, that you can add together infinitely many numbers and get a finite number. I’m inclined to think Zeno did not, somehow, think it was impossible to leave rooms. What the paradoxes as a whole get to are questions about space and time: they’re either infinitely divisible or they’re not. And either way produces effects that don’t seem to quite match our intuitions.

The next day Saturday Morning Breakfast Cereal does a joke about Klein bottles. These are famous topological constructs. At least they’re famous in the kinds of places people talk about topological constructs. It’s much like the Möbius strip, a ribbon given a twist and joined back to its edge. The Klein bottle similarly you can imagine as a cylinder stretched out into the fourth dimension, given a twist, then joined back to itself. We can’t really do this, what with it being difficult to craft four-dimensional objects. But we can imagine this, and it creates an object that doesn’t have a boundary, and has only one side. There’s not an inside or an outside. There’s no making this in the real world, but we can make nice-looking approximations, usually as bottles.

Ruben Bolling’s Super-Fun-Pak Comix for the 13th of January is an extreme installment of Chaos Butterfly. The trouble with touching Chaos Butterfly to cause disasters is that you don’t know — you can’t know — what would have happened had you not touched the butterfly. You change your luck, but there’s no way to tell whether for the better or worse. One of the commenters at Gocomics.com alludes to this problem.

Jon Rosenberg’s Scenes From A Multiverse for the 13th of January makes quite literal quantum mechanics talk about probability waves and quantum foam and the like. The wave formulation of quantum mechanics, the most popular and accessible one, describes what’s going on in equations that look much like the equations for things diffusing into space. And quantum mechanical problems are often solved by supposing that the probability distribution we’re interested in can be broken up into a series of sinusoidal waves. Representing a complex function as a set of waves is a common trick, not just in quantum mechanics, because it works so well so often. Sinusoidal waves behave in nice, predictable ways for most differential equations. So converting a hard differential equation problem into a long string of relatively easy differential equation problems is usually a good trade.

Tom Thaves’s Frank and Ernest for the 14th of January ties together the baffling worlds of grammar and negative numbers. It puts Frank and Ernest on panel with Euclid, who’s a fair enough choice to represent the foundation of (western) mathematics. He’s famous for the geometry we now call Euclidean. That’s the common everyday kind of blackboards and tabletops and solid cubes and spheres. But among his writings are compilations of arithmetic, as understood at the time. So if we know anyone in Ancient Greece to have credentials to talk about negative numbers it’s him. But the choice of Euclid traps the panel into an anachronism: the Ancient Greeks just didn’t think of negative numbers. They could work through “a lack of things” or “a shortage of something”, but a negative? That’s a later innovation. But it’s hard to think of a good rewriting of the joke. You might have Isaac Newton be consulted, but Newton makes normal people think of gravity and physics, confounding the mathematics joke. There’s a similar problem with Albert Einstein. Leibniz or Gauss should be good, but I suspect they’re not the household names that even Euclid is. And if we have to go “less famous mathematician than Gauss” we’re in real trouble. (No, not Andrew Wiles. Normal people know him as “the guy that proved Fermat’s thing”, and that’s too many words to fit on panel.) Perhaps the joke can’t be made to read cleanly and make good historic sense.

## The Set Tour, Part 11: Doughnuts And Lots Of Them

I’ve been slow getting back to my tour of commonly-used domains for several reasons. It’s been a busy season. It’s so much easier to plan out writing something than it is to write something. The usual. But one of my excuses this time is that I’m not sure the set I want to talk about is that common. But I like it, and I imagine a lot of people will like it. So that’s enough.

## T and Tn

T stands for the torus. Or the toroid, if you prefer. It’s a fun name. You know the shape. It’s a doughnut. Take a cylindrical tube and curl it around back on itself. Don’t rip it or fold it. That’s hard to do with paper or a sheet of clay or other real-world stuff. But we can imagine it easily enough. I suppose we can make a computer animation of it, if by ‘we’ we mean ‘you’.

We don’t use the whole doughnut shape for T. And no, we don’t use the hole either. What we use is the surface of the doughnut, the part that could get glazed. We ignore the inside, just the same way we had S represent the surface of a sphere (or the edge of a circle, or the boundary of a hypersphere). If there is a common symbol for the torus including the interior I don’t know it. I’d be glad to hear if someone had.

What good is the surface of a torus, though? Well, it’s a neat shape. Slice it in one direction, the way you’d cut a bagel in half, and at the slice you get the shape of a washer, the kind you fit around a nut and bolt. (An annulus, to use the trade term.) Slice it perpendicular to that, the way you’d cut it if you’re one of those people who eats half doughnuts to the amazement of the rest of us, and at the slice you get two detached circles. If you start from any point on the torus shape you can go in one direction and make a circle that loops around the doughnut’s central hole. You can go the perpendicular direction and make a circle that brushes up against but doesn’t go around the central hole. There’s some neat topology in it.

There’s also video games in it. The topology of this is just like old-fashioned video games where if you go off the edge of the screen to the right you come back around on the left, and if you go off the top you come back from the bottom. (And if you go off to the left you come back around the right, and off the bottom you come back to the top.) To go from the flat screen to the surface of a doughnut requires imagining some stretching and scrunching up of the surface, but that’s all right. (OK, in an old video game it was a kind-of flat screen.) We can imagine a nice flexible screen that just behaves.

This is a common trick to deal with boundaries. (I first wrote “to avoid having to deal with boundaries”. But this is dealing with them, by a method that often makes sense.) You just make each boundary match up with a logical other boundary. It’s not just useful in video games. Often we’ll want to study some phenomenon where the current state of things depends on the immediate neighborhood, but it’s hard to say what a logical boundary ought to be. This particularly comes up if we want to model an infinitely large surface without dealing with infinitely large things. The trick will turn up a lot in numerical simulations for that reason. (In that case, we’re in truth working with a numerical approximation of T, but that’ll be close enough.)

Tn, meanwhile, is a vector of things, each of which is a point on a torus. It’s akin to Rn or S2 x n. They’re ordered sets of things that are themselves things. There can be as many as you like. n, here, is whatever positive whole number you need.

You might wonder how big the doughnut is. When we talked about the surface of the sphere, S2, or the surface and interior, B3, we figured on a sphere with radius of 1 unless we heard otherwise. Toruses would seem to have two parameters. There’s how big the outer diameter is and how big the inner diameter is. Which do we pick?

We don’t actually care. It’s much the way we can talk about a point on the surface of a planet by the latitude and longitude of the point, and never care about how big the planet is. We can describe a point on the surface of the torus without needing to refer to how big the whole shape is or how big the hole in the middle is. A popular scheme to describe points is one that looks a lot like latitude and longitude.

Imagine the torus sitting as flat as it gets on the table. Pick a point that you find interesting.

We use some reference point that’s as good as an equator and a prime meridian. One coordinate is the angle you make going horizontally, possibly around the hole in the middle, from the reference point to the point we’re interested in. The other coordinate is the angle you make vertically, going in a loop that doesn’t go around the hole in the middle, from the reference point to the point we’re interested in. The reference point has coordinates 0, 0, as it must. If this sounds confusing it’s because I’m not using a picture. I thought making some pictures would be too much work. I’m a fool. But if you think of real torus-shaped objects it’ll come to you.

In this scheme the coordinates are both angles. Normal people would measure that in degrees, from 0 to 360, or maybe from -180 to 180. Mathematicians would measure as radians, from 0 to 2π, or from -π to +π. Whatever it is, it’s the same as the coordinates of a point on the edge of the circle, what we called S1 a few essays back. So it’s fair to say you can think of T as S1 x S1, an ordered set of points on circles.

I’ve written of these toruses as three-dimensional things. Well, two dimensional-surfaces wrapped up to suggest three-dimensional objects. You don’t have to stick with these dimensions if you don’t want or if your problem needs something else. You can make a torus that’s a three-dimensional shape in four dimensions. For me that’s easiest to imagine as a cube where the left edge and the right edge loop back and meet up, the lower and the upper edges meet up, and the front and the back edges meet up. This works well to model an infinitely large space with a nice and small block.

I like to think I can imagine a four-dimensional doughnut where every cross-section is a sphere. I may be kidding myself. There could also be a five-dimensional torus and you’re on your own working that out, or working out what to do with it.

I’m not sure there is a common standard notation for that, though. Probably the mathematician wanting to make clear she’s working with a torus in four dimensions just says so in text, and trusts that the context of her mathematics makes it clear this is no ordinary torus.

I’ve also written of these toruses as circular, as rounded shapes. That’s the most familiar torus. It’s a doughnut shape, or an O-ring shape, or an inner tube’s shape. It’s the shape you produce by taking a circle and looping it around an axis not on the ring. That’s common and that’s usually all we need.

But if you need some other torus, produced by rotating some other shape around an axis not inside it, go ahead. You’ll need to make clear what that original shape, the generator, is. You’ve seen examples of this in, for example, the washers that fit around nuts and bolts. They’re typically rectangles in cross-section. Or you might have seen that image of someone who fit together a couple dozen iMac boxes to make a giant wheel. I don’t know why you would need this, but it’s your problem, not mine. If these shapes are useful for your work, by all means, use them.

I’m not sure there is a standard notation for that sort of shape. My hunch is to say you’d define your generating shape and give it a name such as A or D. Then name the torus based on that as T(A) or T(D). But I would recommend spelling it out in text before you start using symbols like this.

## Reading the Comics, December 5, 2015: Awkward Break Edition

I confess I’m dissatisfied with this batch of Reading the Comics posts. I like having something like six to eight comics for one of these roundups. But there was this small flood of mathematically-themed comics on the 6th of December. I could either make do with a slightly short edition, or have an overstuffed edition. I suppose it’s possible to split one day’s comics across two Reading the Comics posts, but that’s crazy talk. So, a short edition today.

Jef Mallett’s Frazz for the 4th of December was part of a series in which Caulfield resists learning about reciprocals. The 4th offers a fair example of the story. At heart the joke is just the student-resisting-class, or student-resisting-story-problems. It certainly reflects a lack of motivation to learn what they are.

We use reciprocals most often to write division problems as multiplication. “a ÷ b” is the same as “a times the reciprocal of b”. But where do we get the reciprocal of b from? … Well, we can say it’s the multiplicative inverse of b. That is, it’s whatever number you have to multiply ‘b’ by in order to get ‘1’. But we’re almost surely going to find that taking 1 and dividing it by b. So we’ve swapped out one division problem for a slightly different one. This doesn’t seem to be getting us anywhere.

But we have gotten a new idea. If we can define the multiplication of things, we might be able to get division for almost free. Could we divide one matrix by another? We can certainly multiply a matrix by the inverse of another. (There are complications at work here. We’ll save them for another time.) A lot of sets allow us to define things that make sense as addition and multiplication. And if we can define a complicated operation in terms of addition and multiplication … If we follow this path, we get to do things like define the cosine of a matrix. Then we just have to figure out why we’d want have a cosine of a matrix.

There’s a simpler practical use of reciprocals. This relates to numerical mathematics, computer work. Computer chips do addition (and subtraction) really fast. They do multiplication a little slower. They do division a lot slower. Division is harder than multiplication, as anyone who’s done both knows. However, dividing by (say) 4 is the same thing as multiplying by 0.25. So if you know you need to divide by a number a lot, then it might make for a faster program to change division into multiplication by a reciprocal. You have to work out the reciprocal, but if you only have to do that once instead of many times over, this might make for faster code. Reciprocals are one of the tools we can use to change a mathematical process into something faster.

(In practice, you should never do this. You have a compiler that does this, and you should let it do its work. But it’s enlightening to know these are the sorts of things your compiler is looking for when it turns your code into something the computer does. And looking for ways to do the same work in less time is a noble side of mathematics.)

Charles Schulz’s Peanuts for the 4th of December (originally from 1968, on the same day) sees Peppermint Patty’s education crash against a word problem. It’s another problem in motivating a student to do a word problem. I admit when I was a kid I’d have been enchanted by this puzzle. But I was a weird one.

Dave Coverly’s Speed Bump for the 4th of December is a mathematics-symbols joke as applied to toast. I think you could probably actually sell those. At least the greater-than and the less-than signs. The approximately-equal-to signs would be hard to use. And people would think they were for bacon anyway.

Ruben Bolling’s Super-Fun-Pak Comix for the 4th of December showcases Young Albert Einstein. That counts as mathematical content, doesn’t it? The strip does make me wonder if they’re still selling music CDs and other stuff for infant or even prenatal development. I’m skeptical that they ever did any good, but it isn’t a field I’ve studied.

Bill Whitehead’s Free Range for the 5th of December uses a blackboard full of mathematical and semi-mathematical symbols to denote “stuff too complicated to understand”. The symbols don’t parse as anything. It is authentic to mathematical work to sometimes skip writing all the details of a thing and write in instead a few words describing it. Or to put in an abbreviation for the thing. That often gets circled or boxed or in some way marked off. That keeps us from later on mistaking, say, “MUB” as the product of M and U and B, whatever that would mean. Then we just have to remember we meant “minimum upper bound” by that.

## Reading the Comics, July 29, 2015: Not Entirely Reruns Edition

Zach Weinersmith’s Saturday Morning Breakfast Cereal (July 25) gets its scheduled appearance here with a properly formed Venn Diagram joke. I’m unqualified to speak for rap musicians. When mathematicians speak of something being “for reals” they mean they’re speaking about a variable that might be any of the real numbers. This is as opposed to limiting the variable to being some rational or irrational number, or being a whole number. It’s also as opposed to letting the variable be some complex-valued number, or some more exotic kind of number. It’s a way of saying what kind of thing we want to find true statements about.

I don’t know when the Saturday Morning Breakfast Cereal first ran, but I know I’ve seen it appear in my Twitter feed. I believe all the Gocomics.com postings of this strip are reruns, but I haven’t read the strip long enough to say.

Steve Sicula’s Home And Away (July 26) is built on the joke of kids wise to mathematics during summer vacation. I don’t think this is a rerun, although we’ve seen the joke this summer before. Daniel Beyer’s Offbeat Comics for the 27th of July, 2015.

Daniel Beyer’s Offbeat Comics (July 27) depicts an angel with a square halo because “I was good2.” The association between squaring a number and squares goes back a long time. Well, it’s right there in the name, isn’t it? Florian Cajori’s A History Of Mathematical Notations cites the term “latus” and the abbreviation “l” to represent the side of a square being used by the Roman surveyor Junius Nipsus in the second century; for centuries this would be as good a term as anyone had for the thing to be calculated. (Res, meaning “thing”, was also popular.) Once you’ve taken the idea of calculating based on the length of a square, the jump to “square” for “length times itself” seems like a tiny one. But Cajori doesn’t seem to have examples of that being written until the 16th century.

The square of the quantity you’re interested in might be written q, for quadratus. The cube would be c, for cubus. The fourth power would be b or bq, for biquadratus, and so on. This is tolerable if you only have to work with a single unknown quantity, but the notation turns into gibberish the moment you want two variables in the mix. So it collapsed in the 17th century, replaced by the familiar x2 and x3 and so on. Many authors developed notations close to this: James Hume would write xii or xiii; Pierre Hérigone x2 or x3, all in one line. Rene Descartes would write x2 or x3 or so, and many, many followed him. Still, quite a few people — including Rene Descartes, Isaac Newton, and even as late a figure as Carl Gauss, in the early 19th century — would resist “x2”. They’d prefer “xx”. Gauss defended this on the grounds that “x2” takes up just as much space as “xx” and so fails the biggest point of having notation.

Corey Pandolph’s Toby, Robot Satan (July 27, rerun) uses sudoku as an example of the logic and reasoning problems that one would expect a robot should be able to do. It is weird to encounter one that’s helpless before them.

Cory Thomas’s Watch Your Head (July 27, rerun from 2007) mentions “Chebyshev grids” and “infinite boundaries” as things someone doing mathematics on the computer would do. And it does so correctly. Differential equations describe how things change on some domain over space and time. They can be very hard to solve exactly, but can be put on the computer very well. For this, we pick a representative set of points which we call a mesh. And we find an approximate representation of the original differential question, which we call a discretization or a difference equation. We can then solve this difference equation on the mesh, and if we’ve done our work right, this approximation will let us get a good estimate for the solution to the original problem over the whole original domain.

A Chebyshev grid is a particular arrangement of mesh points. It’s not uniform; it tends to clump up, becoming more common near the ends of the boundary. This is useful if you have reason to expect that the boundaries are more interesting than the middle of the domain. There’s no sense wasting good computing power calculating boring stuff. The mesh is named for Pafnuty Chebyshev, a 19th Century Russian mathematician whose name is all over mathematics. Unfortunately since he was a 19th Century Russian mathematician, his name is transcribed into English all sorts of ways. Chebyshev seems to be most common today, though Tchebychev used to be quite popular, which is why polynomials of his might be abbreviated as T. There are many alternatives.

Ah, but how do you represent infinite boundaries with the finitely many points of any calculatable mesh? There are many approaches. One is to just draw a really wide mesh and trust that all the action is happening near the center so omitting the very farthest things doesn’t hurt too much. Or you might figure what the average of things far away is, and make a finite boundary that has whatever that value is. Another approach is to make the boundaries repeating: go far enough to the right and you loop back around to the left, go far enough up and you loop back around to down. Another approach is to create a mesh that is bundled up tight around the center, but that has points which do represent going off very, very far, maybe in principle infinitely far away. You’re allowed to create meshes that don’t space points uniformly, and that even move points as you compute. That’s harder work, but it’s legitimate numerical mathematics.

So, the mathematical work being described here is — so far as described — legitimate. I’m not competent to speak about the monkey side of the research.

Greg Evans’s Luann Againn (July 29; rerun from July 29, 1987) name-drops the Law of Averages. There are actually multiple Laws of Averages, with slightly different assumptions and implications, but they all come to about the same meaning. You can expect that if some experiment is run repeatedly, the average value of the experiments will be close to the true value of whatever you’re measuring. An important step in proving this law was done by Pafnuty Chebyshev.

## A Summer 2015 Mathematics A to Z Roundup

Since I’ve run out of letters there’s little dignified to do except end the Summer 2015 Mathematics A to Z. I’m still organizing my thoughts about the experience. I’m quite glad to have done it, though.

For the sake of good organization, here’s the set of pages that this project’s seen created:

## z-transform.

The z-transform comes to us from signal processing. The signal we take to be a sequence of numbers, all representing something sampled at uniformly spaced times. The temperature at noon. The power being used, second-by-second. The number of customers in the store, once a month. Anything. The sequence of numbers we take to stretch back into the infinitely great past, and to stretch forward into the infinitely distant future. If it doesn’t, then we pad the sequence with zeroes, or some other safe number that we know means “nothing”. (That’s another classic mathematician’s trick.)

It’s convenient to have a name for this sequence. “a” is a good one. The different sampled values are denoted by an index. a0 represents whatever value we have at the “start” of the sample. That might represent the present. That might represent where sampling began. That might represent just some convenient reference point. It’s the equivalent of mileage maker zero; we have to have something be the start.

a1, a2, a3, and so on are the first, second, third, and so on samples after the reference start. a-1, a-2, a-3, and so on are the first, second, third, and so on samples from before the reference start. That might be the last couple of values before the present.

So for example, suppose the temperatures the last several days were 77, 81, 84, 82, 78. Then we would probably represent this as a-4 = 77, a-3 = 81, a-2 = 84, a-1 = 82, a0 = 78. We’ll hope this is Fahrenheit or that we are remotely sensing a temperature.

The z-transform of a sequence of numbers is something that looks a lot like a polynomial, based on these numbers. For this five-day temperature sequence the z-transform would be the polynomial $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0$. (z1 is the same as z. z0 is the same as the number “1”. I wrote it this way to make the pattern more clear.)

I would not be surprised if you protested that this doesn’t merely look like a polynomial but actually is one. You’re right, of course, for this set, where all our samples are from negative (and zero) indices. If we had positive indices then we’d lose the right to call the transform a polynomial. Suppose we trust our weather forecaster completely, and add in a1 = 83 and a2 = 76. Then the z-transform for this set of data would be $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$. You’d probably agree that’s not a polynomial, although it looks a lot like one.

The use of z for these polynomials is basically arbitrary. The main reason to use z instead of x is that we can learn interesting things if we imagine letting z be a complex-valued number. And z carries connotations of “a possibly complex-valued number”, especially if it’s used in ways that suggest we aren’t looking at coordinates in space. It’s not that there’s anything in the symbol x that refuses the possibility of it being complex-valued. It’s just that z appears so often in the study of complex-valued numbers that it reminds a mathematician to think of them.

A sound question you might have is: why do this? And there’s not much advantage in going from a list of temperatures “77, 81, 84, 81, 78, 83, 76” over to a polynomial-like expression $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$.

Where this starts to get useful is when we have an infinitely long sequence of numbers to work with. Yes, it does too. It will often turn out that an interesting sequence transforms into a polynomial that itself is equivalent to some easy-to-work-with function. My little temperature example there won’t do it, no. But consider the sequence that’s zero for all negative indices, and 1 for the zero index and all positive indices. This gives us the polynomial-like structure $\cdots + 0z^2 + 0z^1 + 1 + 1\left(\frac{1}{z}\right)^1 + 1\left(\frac{1}{z}\right)^2 + 1\left(\frac{1}{z}\right)^3 + 1\left(\frac{1}{z}\right)^4 + \cdots$. And that turns out to be the same as $1 \div \left(1 - \left(\frac{1}{z}\right)\right)$. That’s much shorter to write down, at least.

Probably you’ll grant that, but still wonder what the point of doing that is. Remember that we started by thinking of signal processing. A processed signal is a matter of transforming your initial signal. By this we mean multiplying your original signal by something, or adding something to it. For example, suppose we want a five-day running average temperature. This we can find by taking one-fifth today’s temperature, a0, and adding to that one-fifth of yesterday’s temperature, a-1, and one-fifth of the day before’s temperature a-2, and one-fifth a-3, and one-fifth a-4.

The effect of processing a signal is equivalent to manipulating its z-transform. By studying properties of the z-transform, such as where its values are zero or where they are imaginary or where they are undefined, we learn things about what the processing is like. We can tell whether the processing is stable — does it keep a small error in the original signal small, or does it magnify it? Does it serve to amplify parts of the signal and not others? Does it dampen unwanted parts of the signal while keeping the main intact?

We can understand how data will be changed by understanding the z-transform of the way we manipulate it. That z-transform turns a signal-processing idea into a complex-valued function. And we have a lot of tools for studying complex-valued functions. So we become able to say a lot about the processing. And that is what the z-transform gets us.

## Well-Posed Problem.

This is another mathematical term almost explained by what the words mean in English. Probably you’d guess a well-posed problem to be a question whose answer you can successfully find. This also implies that there is an answer, and that it can be found by some method other than guessing luckily.

Mathematicians demand three things of a problem to call it “well-posed”. The first is that a solution exists. The second is that a solution has to be unique. It’s imaginable there might be several answers that answer a problem. In that case we weren’t specific enough about what we’re looking for. Or we should have been looking for a set of answers instead of a single answer.

The third requirement takes some time to understand. It’s that the solution has to vary continuously with the initial conditions. That is, suppose we started with a slightly different problem. If the answer would look about the same, then the problem was well-posed to begin with. Suppose we’re looking at the problem of how a block of ice gets melted by a heater set in its center. The way that melts won’t change much if the heater is a little bit hotter, or if it’s moved a little bit off center. This heating problem is well-posed.

There are problems that don’t have this continuous variation, though. Typically these are “inverse problems”. That is, they’re problems in which you look at the outcome of something and try to say what caused it. That would be looking at the puddle of melted water and the heater and trying to say what the original block of ice looked like. There are a lot of blocks of ice that all look about the same once melted, and there’s no way of telling which was the one you started with.

You might think of these conditions as “there’s an answer, there’s only one answer, and you can find it”. That’s good enough as a memory aid, but it isn’t quite so. A problem’s solution might have this continuous variation, but still be “numerically unstable”. This is a difficulty you can run across when you try doing calculations on a computer.

You know the thing where on a calculator you type in 1 / 3 and get back 0.333333? And you multiply that by three and get 0.999999 instead of exactly 1? That’s the thing that underlies numerical instability. We want to work with numbers, but the calculator or computer will let us work with only an approximation to them. 0.333333 is close to 1/3, but isn’t exactly that.

For many calculations the difference doesn’t matter. 0.999999 is really quite close to 1. If you lost 0.000001 parts of every dollar you earned there’s a fine chance you’d never even notice. But in some calculations, numerically unstable ones, that difference matters. It gets magnified until the error created by the difference between the number you want and the number you can calculate with is too big to ignore. In that case we call the calculation we’re doing “ill-conditioned”.

And it’s possible for a problem to be well-posed but ill-conditioned. This is annoying and is why numerical mathematicians earn the big money, or will tell you they should. Trying to calculate the answer will be so likely to give something meaningless that we can’t trust the work that’s done. But often it’s possible to rework a calculation into something equivalent but well-conditioned. And a well-posed, well-conditioned problem is great. Not only can we find its solution, but we can usually have a computer do the calculations, and that’s a great breakthrough.

## Step.

On occasion a friend or relative who’s got schoolkids asks me how horrified I am by some bit of Common Core mathematics. This is a good chance for me to disappoint the friend or relative. Usually I’m just sincerely not horrified. Much of what raises horror is students being asked to estimate and approximate answers. This is instead of calculating the answer directly. But I like estimation and approximation. If I want an exact answer I’ll do better to use a calculator. What I need is assurance the thing I’m calculating can sensibly be the thing I want to know. Nearly all my feats of mental arithmetic amount to making an estimate. If I must I improve it until someone’s impressed.

The other horror-raising examples I get amount to “look at how many steps it takes to do this simple problem!” The ones that cross my desk are usually subtraction problems. Someone’s offended the student is told to work out 107 minus 18 (say) by counting by ones from 18 up to 20, then by tens from 20 up to 100, and then by ones again up to 107. And this when they could just write one number above another and do some borrowing and get 89 right away, no steps needed. Assuring my acquaintance that the other method is really just the way you might count change, and that I do subtraction that way much of the time, doesn’t change minds. (More often I do that to double-check my answer. This raises the question of why I don’t do it that way the first time.) Though it does make the acquaintance conclude I’m some crazy person with no idea how to teach kids.

That’s probably fair. I’ve never taught elementary school students, and haven’t any training for it. I’ve only taught college students. For that my entire training consisted of a single one-credit course my first semester as a Teaching Assistant, plus whatever I happened to pick up while TAing for professors who wanted me to sit in on lecture. From the first I learned there is absolutely no point to saying anything while I face the chalkboard because it will be unheard except by the board, which has already been through this class forty times. From the second I learned to toss hard candies as reward to anyone who would say anything, anything, in class. Both are timeless pedagogical truths.

But the worry about the number of steps it takes to do some arithmetic calculation stays with me. After all, what is a step? How much work is it? How hard is a step?

I don’t think there is a concrete measure of hardness. I’m not sure there could be. If I needed to, I’d work out 107 minus 18 by noticing it’s just about 110 minus 20, so it’s got to be about 90, and a 7 minus 8 has to end in a 9 so the answer must be 89. How many steps was that? I guess there are maybe three thoughts involved there. But I don’t do that, at least not deliberately, when I look at the problem. 89 just appears, and if I stay interested in the question, the reasons why that’s right follow in short order. So how many steps did I take? Three? One?

On the other hand, I know that in elementary school I would have had to work it out by looking at 7 minus 8. And then I’d need to borrow from the tens column. And oh dear there’s a 0 to the left of the 7 so I have to borrow from the hundreds and … That’s the procedure as it was taught back then. Now, I liked that. I understood it. And I was taught with appeals to breaking dollars into dimes and pennies, which worked for my imagination. But it’s obviously a bunch of steps. How many? I’m not sure; probably around ten or so. And, if we’re being honest, borrowing from a zero in the tens column is a deeply weird thing to do. I can understand people freezing up rather than do that.

Similarly, I know that if I needed to differentiate the logarithm of the cosine of x, I would have the answer in a flash. It’d be at most one step. If I were still in high school, in my calculus class, I’d need longer. I’d struggle through the chain rule and some simplifications after that. Call it maybe four or five steps. If I were in elementary school I’d need infinitely many steps. I couldn’t even understand the problem except in the most vague, metaphoric way.

This leads me to my suggestion for what a “step” is, at least for problems you work out by hand. (Numerical computing has a more rigorous definition of a step; that’s when you do one of the numerical processing operations.) A step is “the most work you can do in your head without a significant chance of making a mistake”. I think that’s a definition that clarifies the problem of counting steps. It will be different for different people. It will be different for the same person, depending on how experienced she is. The steps a newcomer has to a subject are smaller than the ones an expert has. And it’s not just that newcomer takes more steps to get to the same conclusion than the expert does. The expert might imagine the problem breaks down into different steps from the ones a newcomer can do. Possibly the most important skill a teacher has is being able to work out what the steps the newcomer can take are. These will not always be what the expert thinks the smaller steps would be.

But what to do with problem-solving approaches that require lots of steps? And here I recommend one of the wisest pieces of advice I’ve ever run across. It’s from the 1954 Printer 1 & C United States Navy Training Course manual, NavPers 10458. I apologize if I’m citing it wrong, but I hope people can follow that to the exact document. I have it because I’m interested in Linotype operation is why. From page 308, the section “Don’t Overlook Instructions” in Chapter 7:

When starting on a new piece of copy, or “take” is it is called, be sure to read all instructions, such as the style and size of type, the measure to be set, whether it is to be leaded, indented, and so on.

Then go slowly. Try to develop even, rhythmic strokes, rather than quick, sporadic motions. Strive for accuracy rather than speed. Speed will come with practice.

As with Linotype operations, so it is with arithmetic. Be certain you are doing what you mean to do, and strive to do it accurately. I don’t know how many steps you need, but you probably won’t get a wrong answer if you take more than the minimum number of steps. If you take fewer steps than you need the results will be wretched. Speed will come with practice.