My 2019 Mathematics A To Z: Linear Programming


Today’s A To Z term is another proposed by @aajohannas.

I couldn’t find a place to fit this in the essay proper. But it’s too good to leave out. The simplex method, discussed within, traces to George Dantzig. He’d been planning methods for the US Army Air Force during the Second World War. Dantzig is a person you have heard about, if you’ve heard any mathematical urban legends. In 1939 he was late to Jerzy Neyman’s class. He took two statistics problems on the board to be homework. He found them “harder than usual”, but solved them in a couple days and turned in the late homework hoping Neyman would be understanding. They weren’t homework. They were examples of famously unsolved problems. Within weeks Neyman had written one of the solutions up for publication. When he needed a thesis topic Neyman advised him to just put what he already had in a binder. It’s the stuff every grad student dreams of. The story mutated. It picked up some glurge to become a narrative about positive thinking. And mutated further, into the movie Good Will Hunting.

The story gets better, for my purposes. The simplex method can be understood as one of those homework problems. Dantzig describes some of that in this 1987 essay about the origins of the method. The essay is worth reading to understand some of how people come to think of good mathematics.

Cartoony banner illustration of a coati, a raccoon-like animal, flying a kite in the clear autumn sky. A skywriting plane has written 'MATHEMATIC A TO Z'; the kite, with the letter 'S' on it to make the word 'MATHEMATICS'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Linear Programming.

Every three days one of the comic strips I read has the elderly main character talk about how they never used algebra. This is my hyperbole. But mathematics has got the reputation for being difficult and inapplicable to everyday life. We’ll concede using arithmetic, when we get angry at the fast food cashier who hands back our two pennies before giving change for our $6.77 hummus wrap. But otherwise, who knows what an elliptic integral is, and whether it’s working properly?

Linear programming does not have this problem. In part, this is because it lacks a reputation. But those who have heard of it, acknowledge it as immensely practical mathematics. It is about something a particular kind of human always finds compelling. That is how to do a thing best.

There are several kinds of “best”. There is doing a thing in as little time as possible. Or for as little effort as possible. For the greatest profit. For the highest capacity. For the best score. For the least risk. The goals have a thousand names, none of which we need to know. They all mean the same thing. They mean “the thing we wish to optimize”. To optimize has two directions, which are one. The optimum is either the maximum or the minimum. To be good at finding a maximum is to be good at finding a minimum.

It’s obvious why we call this “programming”; obviously, we leave the work of finding answers to a computer. It’s a spurious reason. The “programming” here comes from an independent sense of the word. It means more about finding a plan. Think of “programming” a night’s entertainment, so that every performer gets their turn, all scene changes have time to be done, you don’t put two comedians right after the other, and you accommodate the performer who has to leave early and the performer who’ll get in an hour late. Linear programming problems are often about finding how to do as well as possible given various priorities. All right. At least the “linear” part is obvious. A mathematics problem is “linear” when it’s something we can reasonably expect to solve. This is not the technical meaning. Technically what it means is we’re looking at a function something like:

ax + by + cz

Here, x, y, and z are the independent variables. We don’t know their values but wish to. a, b, and c are coefficients. These values are set to some constant for the problem, but they might be something else for other problems. They’re allowed to be positive or negative or even zero. If a coefficient is zero, then the matching variable doesn’t affect matters at all. The corresponding value can be anything at all, within the constraints.

I’ve written this for three variables, as an example and because ‘x’ and ‘y’ and ‘z’ are comfortable, familiar variables. There can be fewer. There can be more. There almost always are. Two- and three-variable problems will teach you how to do this kind of problem. They’re too simple to be interesting, usually. To avoid committing to a particular number of variables we can use indices. x_j for values of j from 1 up to N. Or we can bundle all these values together into a vector, and write everything as \vec{x} . This has a particular advantage since when we can write the coefficients as a vector too. Then we use the notation of linear algebra, and write that we hope to maximize the value of:

\vec{c}^T\vec{x}

(The superscript T means “transpose”. As a linear algebra problem we’d usually think of writing a vector as a tall column of things. By transposing that we write a long row of things. By transposing we can use the notation of matrix multiplication.)

This is the objective function. Objective here in the sense of goal; it’s the thing we want to find the best possible value of.

We have constraints. These represent limits on the variables. The variables are always things that come in limited supply. There’s no allocating more money than the budget allows, nor putting more people on staff than work for the company. Often these constraints interact. Perhaps not only is there only so much staff, but no one person can work more than a set number of days in a row. Something like that. That’s all right. We can write all these constraints as a matrix equation. An inequality, properly. We can bundle all the constraints into a big matrix named A, and demand:

A\vec{x} \le \vec{b}

Also, traditionally, we suppose that every component of \vec{x} is non-negative. That is, positive, or at lowest, zero. This reflects the field’s core problems of figuring how to allocate resources. There’s no allocating less than zero of something.

But we need some bounds. This is easiest to see with a two-dimensional problem. Try it yourself: draw a pair of axes on a sheet of paper. Now put in a constraint. Doesn’t matter what. The constraint’s edge is a straight line, which you can draw at any position and any angle you like. This includes horizontal and vertical. Shade in one side of the constraint. Whatever you shade in is the “feasible region”, the sets of values allowed under the constraint. Now draw in another line, another constraint. Shade in one side or the other of that. Draw in yet another line, another constraint. Shade in one side or another of that. The “feasible region” is whatever points have taken on all these shades. If you were lucky, this is a bounded region, a triangle. If you weren’t lucky, it’s not bounded. It’s maybe got some corners but goes off to the edge of the page where you stopped shading things in.

So adding that every component of \vec{x} is at least as big as zero is a backstop. It means we’ll usually get a feasible region with a finite volume. What was the last project you worked on that had no upper limits for anything, just minimums you had to satisfy? Anyway if you know you need something to be allowed less than zero go ahead. We’ll work it out. The important thing is there’s finite bounds on all the variables.

I didn’t see the bounds you drew. It’s possible you have a triangle with all three shades inside. But it’s also possible you picked the other sides to shade, and you have an annulus, with no region having more than two shades in it. This can happen. It means it’s impossible to satisfy all the constraints at once. At least one of them has to give. You may be reminded of the sign taped to the wall of your mechanics’ about picking two of good-fast-cheap.

But impossibility is at least easy. What if there is a feasible region?

Well, we have reason to hope. The optimum has to be somewhere inside the region, that’s clear enough. And it even has to be on the edge of the region. If you’re not seeing why, think of a simple example, like, finding the maximum of 2x + y , inside the square where x is between 0 and 2 and y is between 0 and 3. Suppose you had a putative maximum on the inside, like, where x was 1 and y was 2. What happens if you increase x a tiny bit? If you increase y by twice that? No, it’s only on the edges you can get a maximum that can’t be locally bettered. And only on the corners of the edges, at that.

(This doesn’t prove the case. But it is what the proof gets at.)

So the problem sounds simple then! We just have to try out all the vertices and pick the maximum (or minimum) from them all.

OK, and here’s where we start getting into trouble. With two variables and, like, three constraints? That’s easy enough. That’s like five points to evaluate? We can do that.

We never need to do that. If someone’s hiring you to test five combinations I admire your hustle and need you to start getting me consulting work. A real problem will have many variables and many constraints. The feasible region will most often look like a multifaceted gemstone. It’ll extend into more than three dimensions, usually. It’s all right if you just imagine the three, as long as the gemstone is complicated enough.

Because now we’ve got lots of vertices. Maybe more than we really want to deal with. So what’s there to do?

The basic approach, the one that’s foundational to the field, is the simplex method. A “simplex” is a triangle. In three dimensions, anyway. In four dimensions it’s a tetrahedron. In two dimensions it’s a line segment. Generally, however many dimensions of space you have? The simplex is the simplest thing that fills up volume in your space.

You know how you can turn any polygon into a bunch of triangles? Just by connecting enough vertices together? You can turn a polyhedron into a bunch of tetrahedrons, by adding faces that connect trios of vertices. And for polyhedron-like shapes in more dimensions? We call those polytopes. Polytopes we can turn into a bunch of simplexes. So this is why it’s the “simplex method”. Any one simplex it’s easy to check the vertices on. And we can turn the polytope into a bunch of simplexes. And we can ignore all the interior vertices of the simplexes.

So here’s the simplex method. First, break your polytope up into simplexes. Next, pick any simplex; doesn’t matter which. Pick any outside vertex of that simplex. This is the first viable possible solution. It’s most likely wrong. That’s okay. We’ll make it better.

Because there are other vertices on this simplex. And there are other simplexes, adjacent to that first, which share this vertex. Test the vertices that share an edge with this one. Is there one that improves the objective function? Probably. Is there a best one of those in this simplex? Sure. So now that’s our second viable possible solution. If we had to give an answer right now, that would be our best guess.

But this new vertex, this new tentative solution? It shares edges with other vertices, across several simplexes. So look at these new neighbors. Are any of them an improvement? Which one of them is the best improvement? Move over there. That’s our next tentative solution.

You see where this is going. Keep at this. Eventually it’ll wind to a conclusion. Usually this works great. If you have, like, 8 constraints, you can usually expect to get your answer in from 16 to 24 iterations. If you have 20 constraints, expect an answer in from 40 to 60 iterations. This is doing pretty well.

But it might take a while. It’s possible for the method to “stall” a while, often because one or more of the variables is at its constraint boundary. Or the division of polytope into simplexes got unlucky, and it’s hard to get to better solutions. Or there might be a string of vertices that are all at, or near, the same value, so the simplex method can’t resolve where to “go” next. In the worst possible case, the simplex method takes a number of iterations that grows exponentially with the number of constraints. This, yes, is very bad. It doesn’t typically happen. It’s a numerical algorithm. There’s some problem to spoil any numerical algorithm.

You may have complaints. Like, the world is complicated. Why are we only looking at linear objective functions? Or, why only look at linear constraints? Well, if you really need to do that? Go ahead, but that’s not linear programming anymore. Think hard about whether you really need that, though. Linear anything is usually simpler than nonlinear anything. I mean, if your optimization function absolutely has to have y^2 in it? Could we just say you have a new variable w that just happens to be equal to the square of y? Will that work? If you have to have the sine of z? Are you sure that z isn’t going to get outside the region where the sine of z is pretty close to just being z? Can you check?

Maybe you have, and there’s just nothing for it. That’s all right. This is why optimization is a living field of study. It demands judgement and thought and all that hard work.


Thank you for reading. This and all the other Fall 2019 A To Z posts should be at this link. They should be joined next week by letters ‘M’ and ‘N’. Also next week I hope to open for nominations for the next set of letters. All of my past A To Z essays should be available at this link.

Advertisements

My 2019 Mathematics A To Z: Julia set


Today’s A To Z term is my pick again. So I choose the Julia Set. This is named for Gaston Julia, one of the pioneers in chaos theory and fractals. He was born earlier than you imagine. No, earlier than that: he was born in 1893.

The early 20th century saw amazing work done. We think of chaos theory and fractals as modern things, things that require vast computing power to understand. The computers help, yes. But the foundational work was done more than a century ago. Some of these pioneering mathematicians may have been able to get some numerical computing done. But many did not. They would have to do the hard work of thinking about things which they could not visualize. Things which surely did not look like they imagined.

Cartoony banner illustration of a coati, a raccoon-like animal, flying a kite in the clear autumn sky. A skywriting plane has written 'MATHEMATIC A TO Z'; the kite, with the letter 'S' on it to make the word 'MATHEMATICS'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Julia set.

We think of things as moving. Even static things we consider as implying movement. Else we’d think it odd to ask, “Where does that road go?” This carries over to abstract things, like mathematical functions. A function is a domain, a range, and a rule matching things in the domain to things in the range. It “moves” things as much as a dictionary moves words.

Yet we still think of a function as expressing motion. A common way for mathematicians to write functions uses little arrows, and describes what’s done as “mapping”. We might write f: D \rightarrow R . This is a general idea. We’re expressing that it maps things in the set D to things in the set R. We can use the notation to write something more specific. If ‘z’ is in the set D, we might write f : z \rightarrow z^2 + \frac{1}{2} . This describes the rule that matches things in the domain to things in the range. f(2) represents the evaluation of this rule at a specific point, the one where the independent variable has the value ‘2’. f(z) represents the evaluation of this rule at a specific point without committing to what that point is. f(D) represents a collection of points. It’s the set you get by evaluating the rule at every point in D.

And it’s not bad to think of motion. Many functions are models of things that move. Particles in space. Fluids in a room. Populations changing in time. Signal strengths varying with a sensor’s position. Often we’ll calculate the development of something iteratively, too. If the domain and the range of a function are the same set? There’s no reason that we can’t take our z, evaluate f(z), and then take whatever that thing is and evaluate f(f(z)). And again. And again.

My age cohort, at least, learned to do this almost instinctively when we discovered you could take the result on a calculator and hit a function again. Calculate something and keep hitting square root; you get a string of numbers that eventually settle on 1. Or you started at zero. Calculate something and keep hitting square; you settle at either 0, 1, or grow to infinity. Hitting sine over and over … well, that was interesting since you might settle on 0 or some other, weird number. Same with tangent. Cosine you wouldn’t settle down to zero.

Serious mathematicians look at this stuff too, though. Take any set ‘D’, and find what its image is, f(D). Then iterate this, figuring out what f(f(D)) is. Then f(f(f(D))). f(f(f(f(D)))). And so on. What happens if you keep doing this? Like, forever?

We can say some things, at least. Even without knowing what f is. There could be a part of D that all these many iterations of f will send out to infinity. There could be a part of D that all these many iterations will send to some fixed point. And there could be a part of D that just keeps getting shuffled around without ever finishing.

Some of these might not exist. Like, f: z \rightarrow z + 4 doesn’t have any fixed points or shuffled-around points. It sends everything off to infinity. f: z \rightarrow \frac{1}{10} z has only a fixed point; nothing from it goes off to infinity and nothing’s shuffled back and forth. f: z \rightarrow -z has a fixed point and a lot of points that shuffle back and forth.

Thinking about these fixed points and these shuffling points gets us Julia Sets. These sets are the fixed points and shuffling-around points for certain kinds of functions. These functions are ones that have domain and range of the complex-valued numbers. Complex-valued numbers are the sum of a real number plus an imaginary number. A real number is just what it says on the tin. An imaginary number is a real number multiplied by \imath . What is \imath ? It’s the imaginary unit. It has the neat property that \imath^2 = -1 . That’s all we need to know about it.

Oh, also, zero times \imath is zero again. So if you really want, you can say all real numbers are complex numbers; they’re just themselves plus 0 \imath . Complex-valued functions are worth a lot of study in their own right. Better, they’re easier to study (at the introductory level) than real-valued functions are. This is such a relief to the mathematics major.

And now let me explain some little nagging weird thing. I’ve been using ‘z’ to represent the independent variable here. You know, using it as if it were ‘x’. This is a convention mathematicians use, when working with complex-valued numbers. An arbitrary complex-valued number tends to be called ‘z’. We haven’t forgotten x, though. We just in this context use ‘x’ to mean “the real part of z”. We also use “y” to carry information about the imaginary part of z. When we write ‘z’ we hold in trust an ‘x’ and ‘y’ for which z = x + y\imath . This all comes in handy.

But we still don’t have Julia Sets for every complex-valued function. We need it to be a rational function. The name evokes rational numbers, but that doesn’t seem like much guidance. f:z \rightarrow \frac{3}{5} is a rational function. It seems too boring to be worth studying, though, and it is. A “rational function” is a function that’s one polynomial divided by another polynomial. This whether they’re real-valued or complex-valued polynomials.

So. Start with an ‘f’ that’s one complex-valued polynomial divided by another complex-valued polynomial. Start with the domain D, all of the complex-valued numbers. Find f(D). And f(f(D)). And f(f(f(D))). And so on. If you iterated this ‘f’ without limit, what’s the set of points that never go off to infinity? That’s the Julia Set for that function ‘f’.

There are some famous Julia sets, though. There are the Julia sets that we heard about during the great fractal boom of the 1980s. This was when computers got cheap enough, and their graphic abilities good enough, to automate the calculation of points in these sets. At least to approximate the points in these sets. And these are based on some nice, easy-to-understand functions. First, you have to pick a constant C. This C is drawn from the complex-valued numbers. But that can still be, like, ½, if that’s what interests you. For whatever your C is? Define this function:

f_C: z \rightarrow z^2 + C

And that’s it. Yes, this is a rational function. The numerator function is z^2 + C . The denominator function is 1 .

This produces many different patterns. If you picked C = 0, you get a circle. Good on you for starting out with something you could double-check. If you picked C = -2? You get a long skinny line, again, easy enough to check. If you picked C = -1? Well, now you have a nice interesting weird shape, several bulging ovals with peninsulas of other bulging ovals all over. Pick other numbers. Pick numbers with interesting imaginary components. You get pinwheels. You get jagged streaks of lightning. You can even get separate islands, whole clouds of disjoint threatening-looking blobs.

There is some guessing what you’ll get. If you work out a Julia Set for a particular C, you’ll see a similar-looking Julia Set for a different C that’s very close to it. This is a comfort.

You can create a Julia Set for any rational function. I’ve only ever seen anyone actually do it for functions that look like what we already had. z^3 + C . Sometimes z^4 + C . I suppose once, in high school, I might have tried z^5 + C but I don’t remember what it looked like. If someone’s done, say, \frac{1}{z^2 + C} please write in and let me know what it looks like.

The Julia Set has a famous partner. Maybe the most famous fractal of them all, the Mandelbrot Set. That’s the strange blobby sea surrounded by lightning bolts that you see on the cover of every pop mathematics book from the 80s and 90s. If a C gives us a Julia Set that’s one single, contiguous patch? Then that C is in the Mandelbrot Set. Also vice-versa.

The ideas behind these sets are old. Julia’s paper about the iterations of rational functions first appeared in 1918. Julia died in 1978, the same year that the first computer rendering of the Mandelbrot set was done. I haven’t been able to find whether that rendering existed before his death. Nor have I decided which I would think the better sequence.


Thanks for reading. All of Fall 2019 A To Z posts should be at this link. And next week I hope to get to the letters ‘K’ and ‘L’. Sunday, yes, I hope to get back to the comics.

My 2019 Mathematics A To Z: Hamiltonian


Today’s A To Z term is another I drew from Mr Wu, of the Singapore Math Tuition blog. It gives me more chances to discuss differential equations and mathematical physics, too.

The Hamiltonian we name for Sir William Rowan Hamilton, the 19th century Irish mathematical physicists who worked on everything. You might have encountered his name from hearing about quaternions. Or for coining the terms “scalar” and “tensor”. Or for work in graph theory. There’s more. He did work in Fourier analysis, which is what you get into when you feel at ease with Fourier series. And then wild stuff combining matrices and rings. He’s not quite one of those people where there’s a Hamilton’s Theorem for every field of mathematics you might be interested in. It’s close, though.

Cartoony banner illustration of a coati, a raccoon-like animal, flying a kite in the clear autumn sky. A skywriting plane has written 'MATHEMATIC A TO Z'; the kite, with the letter 'S' on it to make the word 'MATHEMATICS'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Hamiltonian.

When you first learn about physics you learn about forces and accelerations and stuff. When you major in physics you learn to avoid dealing with forces and accelerations and stuff. It’s not explicit. But you get trained to look, so far as possible, away from vectors. Look to scalars. Look to single numbers that somehow encode your problem.

A great example of this is the Lagrangian. It’s built on “generalized coordinates”, which are not necessarily, like, position and velocity and all. They include the things that describe your system. This can be positions. It’s often angles. The Lagrangian shines in problems where it matters that something rotates. Or if you need to work with polar coordinates or spherical coordinates or anything non-rectangular. The Lagrangian is, in your general coordinates, equal to the kinetic energy minus the potential energy. It’ll be a function. It’ll depend on your coordinates and on the derivative-with-respect-to-time of your coordinates. You can take partial derivatives of the Lagrangian. This tells how the coordinates, and the change-in-time of your coordinates should change over time.

The Hamiltonian is a similar way of working out mechanics problems. The Hamiltonian function isn’t anything so primitive as the kinetic energy minus the potential energy. No, the Hamiltonian is the kinetic energy plus the potential energy. Totally different in idea.

From that description you maybe guessed you can transfer from the Lagrangian to the Hamiltonian. Maybe vice-versa. Yes, you can, although we use the term “transform”. Specifically a “Legendre transform”. We can use any coordinates we like, just as with Lagrangian mechanics. And, as with the Lagrangian, we can find how coordinates change over time. The change of any coordinate depends on the partial derivative of the Hamiltonian with respect to a particular other coordinate. This other coordinate is its “conjugate”. (It may either be this derivative, or minus one times this derivative. By the time you’re doing work in the field you’ll know which.)

That conjugate coordinate is the important thing. It’s why we muck around with Hamiltonians when Lagrangians are so similar. In ordinary, common coordinate systems these conjugate coordinates form nice pairs. In Cartesian coordinates, the conjugate to a particle’s position is its momentum, and vice-versa. In polar coordinates, the conjugate to the angular velocity is the angular momentum. These are nice-sounding pairs. But that’s our good luck. These happen to match stuff we already think is important. In general coordinates one or more of a pair can be some fusion of variables we don’t have a word for and would never care about. Sometimes it gets weird. In the problem of vortices swirling around each other on an infinitely great plane? The horizontal position is conjugate to the vertical position. Velocity doesn’t enter into it. For vortices on the sphere the longitude is conjugate to the cosine of the latitude.

What’s valuable about these pairings is that they make a “symplectic manifold”. A manifold is a patch of space where stuff works like normal Euclidean geometry does. In this case, the space is in “phase space”. This is the collection of all the possible combinations of all the variables that could ever turn up. Every particular moment of a mechanical system matches some point in phase space. Its evolution over time traces out a path in that space. Call it a trajectory or an orbit as you like.

We get good things from looking at the geometry that this symplectic manifold implies. For example, if we know that one variable doesn’t appear in the Hamiltonian, then its conjugate’s value never changes. This is almost the kindest thing you can do for a mathematical physicist. But more. A famous theorem by Emmy Noether tells us that symmetries in the Hamiltonian match with conservation laws in the physics. Time-invariance, for example — time not appearing in the Hamiltonian — gives us the conservation of energy. If only distances between things, not absolute positions, matter, then we get conservation of linear momentum. Stuff like that. To find conservation laws in physics problems is the kindest thing you can do for a mathematical physicist.

The Hamiltonian was born out of planetary physics. These are problems easy to understand and, apart from the case of one star with one planet orbiting each other, impossible to solve exactly. That’s all right. The formalism applies to all kinds of problems. They’re very good at handling particles that interact with each other and maybe some potential energy. This is a lot of stuff.

More, the approach extends naturally to quantum mechanics. It takes some damage along the way. We can’t talk about “the” position or “the” momentum of anything quantum-mechanical. But what we get when we look at quantum mechanics looks very much like what Hamiltonians do. We can calculate things which are quantum quite well by using these tools. This though they came from questions like why Saturn’s rings haven’t fallen part and whether the Earth will stay around its present orbit.

It holds surprising power, too. Notice that the Hamiltonian is the kinetic energy of a system plus its potential energy. For a lot of physics problems that’s all the energy there is. That is, the value of the Hamiltonian for some set of coordinates is the total energy of the system at that time. And, if there’s no energy lost to friction or heat or whatever? Then that’s the total energy of the system for all time.

Here’s where this becomes almost practical. We often want to do a numerical simulation of a physics problem. Generically, we do this by looking up what all the values of all the coordinates are at some starting time t0. Then we calculate how fast these coordinates are changing with time. We pick a small change in time, Δ t. Then we say that at time t0 plus Δ t, the coordinates are whatever they started at plus Δ t times that rate of change. And then we repeat, figuring out how fast the coordinates are changing now, at this position and time.

The trouble is we always make some mistake, and once we’ve made a mistake, we’re going to keep on making mistakes. We can do some clever stuff to make the smallest error possible figuring out where to go, but it’ll still happen. Usually, we stick to calculations where the error won’t mess up our results.

But when we look at stuff like whether the Earth will stay around its present orbit? We can’t make each step good enough for that. Unless we get to thinking about the Hamiltonian, and our symplectic variables. The actual system traces out a path in phase space. Everyone on that path the Hamiltonian is a particular value, the energy of the system. So use the regular methods to project most of the variables to the new time, t0 + Δ t. But the rest? Pick the values that makes the Hamiltonian work out right. Also momentum and angular momentum and other stuff we know get conserved. We’ll still make an error. But it’s a different kind of error. It’ll project to a point that’s maybe in the wrong place on the trajectory. But it’s on the trajectory.

(OK, it’s near the trajectory. Suppose the real energy is, oh, the square root of 5. The computer simulation will have an energy of 2.23607. This is close but not exactly the same. That’s all right. Each step will stay close to the real energy.)

So what we’ll get is a projection of the Earth’s orbit that maybe puts it in the wrong place in its orbit. Putting the planet on the opposite side of the sun from Venus when we ought to see Venus transiting the Sun. That’s all right, if what we’re interested in is whether Venus and Earth are still in the solar system.

There’s a special cost for this. If there weren’t we’d use it all the time. The cost is computational complexity. It’s pricey enough that you haven’t heard about these “symplectic integrators” before. That’s all right. These are the kinds of things open to us once we look long into the Hamiltonian.


This wraps up my big essay-writing for the week. I will pluck some older essays out of obscurity to re-share tomorrow and Saturday. All of Fall 2019 A To Z posts should be at this link. Next week should have the letter I on Tuesday and J on Thursday. All of my A To Z essays should be available at this link. And I am still interested in topics I might use for the letters K through N. Thank you.

My 2019 Mathematics A To Z: Buffon’s Needle


Today’s A To Z term was suggested by Peter Mander. Mander authors CarnotCycle, which when I first joined WordPress was one of the few blogs discussing thermodynamics in any detail. When I last checked it still was, which is a shame. Thermodynamics is a fascinating field. It’s as deeply weird and counter-intuitive and important as quantum mechanics. Yet its principles are as familiar as a mug of warm tea on a chilly day. Mander writes at a more technical level than I usually do. But if you’re comfortable with calculus, or if you’re comfortable nodding at a line and agreeing that he wouldn’t fib to you about a thing like calculus, it’s worth reading.

Cartoony banner illustration of a coati, a raccoon-like animal, flying a kite in the clear autumn sky. A skywriting plane has written 'MATHEMATIC A TO Z'; the kite, with the letter 'S' on it to make the word 'MATHEMATICS'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Buffon’s Needle.

I’ve written of my fondness for boredom. A bored mind is not one lacking stimulation. It is one stimulated by anything, however petty. And in petty things we can find great surprises.

I do not know what caused Georges-Louis Leclerc, Comte de Buffon, to discover the needle problem named for him. It seems like something born of a bored but active mind. Buffon had an active mind: he was one of Europe’s most important naturalists of the 1700s. He also worked in mathematics, and astronomy, and optics. It shows what one can do with an engaged mind and a large inheritance from one’s childless uncle who’s the tax farmer for all Sicily.

The problem, though. Imagine dropping a needle on a floor that has equally spaced parallel lines. What is the probability that the needle will land on any of the lines? It could occur to anyone with a wood floor who’s dropped a thing. (There is a similar problem which would occur to anyone with a tile floor.) They have only to be ready to ask the question. Buffon did this in 1733. He had it solved by 1777. We, with several centuries’ insight into probability and calculus, need less than 44 years to solve the question.

Let me use L as the length of the needle. And d as the spacing of the parallel lines. If the needle’s length is less than the spacing then this is an easy formula to write, and not too hard to calculate. The probability, P, of the needle crossing some line is:

P = \frac{2}{\pi}\frac{L}{d}

I won’t derive it rigorously. You don’t need me for that. The interesting question is whether this formula makes sense. That L and d are in it? Yes, that makes sense. The length of the needle and the gap between lines have to be in there. More, the probability has to have the ratio between the two. There’s different ways to argue this. Dimensional analysis convinces me, at least. Probability is a pure number. L is a measurement of length; d is a measurement of length. To get a pure number starting with L and d means one of them has to divide into the other. That L is in the numerator and d the denominator makes sense. A tiny needle has a tiny chance of crossing a line. A large needle has a large chance. That \frac{L}{d} is raised to the first power, rather than the second or third or such … well, that’s fair. A needle twice as long having twice the chance of crossing a line? That sounds more likely than a needle twice as long having four times the chance, or eight times the chance.

Does the 2 belong there? Hard to say. 2 seems like a harmless enough number. It appears in many respectable formulas. That π, though …

That π …

π comes to us from circles. We see it in calculations about circles and spheres all the time. We’re doing a problem with lines and line segments. What business does π have showing up?

We can find reasons. One way is to look at a similar problem. Imagine dropping a disc on these lines. What’s the chance the disc falls across some line? That’s the chance that the center of the disc is less than one radius from any of the lines. What if the disc has an equal chance of landing anywhere on the floor? Then it has a probability of \frac{L}{d} of crossing a line. If the radius is smaller than the distance between lines, anyway. If the radius is larger than that, the probability is 1.

Now draw a diameter line on this disc. What’s the chance that this diameter line crosses this floor line? That depends on a couple things. Whether the center of the disc is near enough a floor line. And what angle the diameter line makes with respect to the floor lines. If the diameter line is parallel the floor line there’s almost no chance. If the diameter line is perpendicular to the floor line there’s the best possible chance. But that angle might be anything.

Let me call that angle θ. The diameter line crosses the floor line if the diameter times the sine of θ is less than half the distance between floor lines. … Oh. Sine. Sine and cosine and all the trigonometry functions we get from studying circles, and how to draw triangles within circles. And this diameter-line problem looks the same as the needle problem. So that’s where π comes from.

I’m being figurative. I don’t think one can make a rigorous declaration that the π in the probability formula “comes from” this sine, any more than you can declare that the square-ness of a shape comes from any one side. But it gives a reason to believe that π belongs in the probability.

If the needle’s longer than the gap between floor lines, if L > d , there’s still a probability that the needle crosses at least one line. It never becomes certain. No matter how long the needle is it could fall parallel to all the floor lines and miss them all. The probability is instead:

P = \frac{2}{\pi}\left(\frac{L}{d} - \sqrt{\left(\frac{L}{d}\right)^2 - 1} + \sec^{-1}\left(\frac{L}{d}\right)\right)

Here \sec^{-1} is the world-famous arcsecant function. That is, it’s whatever angle has as its secant the number \frac{L}{d} . I don’t mean to insult you. I’m being kind to the person reading this first thing in the morning. I’m not going to try justifying this formula. You can play with numbers, though. You’ll see that if \frac{L}{d} is a little bit bigger than 1, the probability is a little more than what you get if \frac{L}{d} is a little smaller than 1. This is reassuring.

The exciting thing is arithmetic, though. Use the probability of a needle crossing a line, for short needles. You can re-write it as this:

\pi = 2\frac{L}{d}\frac{1}{P}

L and d you can find by measuring needles and the lines. P you can estimate. Drop a needle many times over. Count how many times you drop it, and how many times it crosses a line. P is roughly the number of crossings divided by the number of needle drops. Doing this gives you a way to estimate π. This gives you something to talk about on Pi Day.

It’s a rubbish way to find π. It’s a lot of work, plus you have to sweep needles off the floor. Well, you can do it in simulation and avoid the risk of stepping on an overlooked needle. But it takes a lot of needle-drops to get good results. To be certain you’ve calculated the first two decimal points correctly requires 3,380,000 needle-drops. Yes, yes. You could get lucky and happen to hit on an estimate of 3.14 for π with fewer needle-drops. But if you were sincerely trying to calculate the digits of π this way? If you did not know what they were? You would need the three and a third million tries to be confident you had the number correct.

So this result is, as a practical matter, useless. It’s a heady concept, though. We think casually of randomness as … randomness. Unpredictability. Sometimes we will speak of the Law of Large Numbers. This is several theorems in probability. They all point to the same result. That if some event has (say) a probability of one-third of happening, then given 30 million chances, it will happen quite close to 10 million times.

This π result is another casting of the Law of Large Numbers, and of the apparent paradox that true unpredictability is itself predictable. There is no way to predict whether any one dropped needle will cross any line. It doesn’t even matter whether any one needle crosses any line. An enormous number of needles, tossed without fear or favor, will fall in ways that embed π. The same π you get from comparing the circumference of a circle to its diameter. The same π you get from looking at the arc-cosine of a negative one.

I suppose we could use this also to calculate the value of 2, but that somehow seems to touch lesser majesties.


Thank you again for reading. All of the Fall 2019 A To Z posts should be at this link. This year’s and all past A To Z sequences should be at this link. I’ve made my picks for next week’s topics, and am fooling myself into thinking I have a rough outline for them already. But I’m still open for suggestions for the letters E through H and appreciate suggestions.

My 2019 Mathematics A To Z: Abacus


Today’s A To Z term is the Abacus. It was suggested by aajohannas, on Twitter as @aajohannas. Particularly asked for was how to use an abacus. The abacus has been used by a great many cultures over thousands of years. So it’s hard to say that there is any one right way to use it. I’m going to get into a way to use it to compute, any more than there is a right way to use a hammer. There are many hammers, and many things to hammer. But there are similarities between all hammers, and the ways to use them as hammers are similar. So learning one kind, and one way to use that kind, can be a useful start.

Cartoony banner illustration of a coati, a raccoon-like animal, flying a kite in the clear autumn sky. A skywriting plane has written 'MATHEMATIC A TO Z'; the kite, with the letter 'S' on it to make the word 'MATHEMATICS'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Abacus.

I taught at the National University of Singapore in the first half of the 2000s. At the student union was this sheltered overhang formed by a stairwell. Underneath it, partly exposed to the elements (a common building style there) was a convenience store. Up front were the things with high turnover, snacks and pop and daily newspapers, that sort of thing. In the back, beyond the register, in the areas that the rain, the only non-gentle element, couldn’t reach even whipped by wind, were other things. Miscellaneous things. Exam bluebooks faded with age and dust. Good-luck cat statues colonized by spiderwebs. Unlabelled power cables for obsolete electronics. Once when browsing through this I encountered two things that I bought as badges of office.

One was a slide rule, a proper twelve-inch one. I’d had one already, a $2 six-inch-long one I’d gotten as an undergraduate from a convenience store the university had already decided to evict. The NUS one was a slide rule you could do actual work on. Another was a soroban, a compact Japanese abacus, in a patterned cardboard box a half-inch too short to hold it. I got both. For the novelty, yes. Also, I taught Computational Science. I felt it appropriate to have these iconic human computing devices.

But do I use them? Other than for decoration? … No, not really. I have too many calculators to need them. Also I am annoyed that while I can lay my hands on the slide rule I have put the soroban somewhere so logical and safe I can’t find it. A couple photographs would improve this essay. Too bad.

Do I know how to use them? If I find them? The slide rule, sure. If you know that a slide rule works via logarithms, and you play with it a little? You know how to use a slide rule. At least a little, after a bit of experimentation and playing with the three times table.

The abacus, though? How do you use that?

In childhood I heard about abacuses. That there’s a series of parallel rods, each with beads on them. Four placed below a center beam, one placed above. Sometimes two placed above. That the lower beads on a rod represent one each. That the upper bead represents five. That some people can do arithmetic on that faster than others can an electric calculator. And that was about all I got, or at least retained. How to do this arithmetic never penetrated my brain. I imagined, well, addition must be easy. Say you wanted to do three plus six, well, move three lower beads up to the center bar. Then slide one lower and one upper bead, for six, to the center bar, and read that off. Right?

The bizarre thing is my naive childhood idea is right. At least in the big picture. Let each rod represent one of the numbers in base-ten style. It’s anachronistic to the abacus’s origins to speak of a ones rod, a tens rod, a hundreds rod, and so on. So what? We’re using this tool today. We can use the ideas of base ten to make our understanding easier.

Pick a row of beads that you want to represent the ones. The row to the left of that represents tens. To the left of that, hundreds. To the right of the ones is the one-tenths, and the one-hundredths, and so on. This goes on to however far your need and however big your abacus is.

Move beads to the center to represent numbers you want. If you want ’21’, slide two lower beads up in the tens column and one lower bead in the ones column. If you want ’38’, slide three lower beads up in the tends column and one upper and three lower beads in the ones column.

To add two numbers, slide more beads representing those numbers toward the center bar. To subtract, slide beads away. Multiplication and division were beyond my young imagination. I’ll let them wait a bit.

There are complications. The complications are for good reason. When you master them, they make computation swifter. But you pay for that later speed with more time spent learning. This is a trade we make, and keep making, in computational mathematics. We make a process more reliable, more speedy, by making it less obvious.

Some of this isn’t too difficult. Like, work in one direction so far as possible. It’s easy to suppose this is better than jumping around from, say, the thousands digit to the tens to the hundreds to the ones. The advice I’ve read says work from the left to the right, that is, from the highest place to the lowest. Arithmetic as I learned it works from the ones to the tens to the hundreds, but this seems wiser. The most significant digits get calculated first this way. It’s usually more important to know the answer is closer to 2,000 than to 3,000 than to know that the answer ends in an 8 rather than a 6.

Some of this is subtle. This is to cope with practical problems. Suppose you want to add 5 to 6? There aren’t that many beads on any row. A Chinese abacus, which has two beads on the upper part, could cope with this particular problem. It’s going to be in trouble when you want to add 8 to 9, though. That’s not unique to an abacus. Any numerical computing technique can be broken by some problem. This is why it’s never enough to calculate; we still have to think. For example, thinking will let us handle this five plus six difficulty.

Consider this: four plus one is five. So four and one are “complementary numbers”, with respect to five. Similarly, three and two are five’s complementary numbers. So if we need to add four to a number, that’s equivalent to adding five and subtracting one. If we need to add two, that’s equivalent to adding five and subtracting three. This will get us through some shortages in bead count.

And consider this: four plus six is ten. So four and six are ten-complementary numbers. Similarly, three and seven are ten’s complementary numbers. Two and eight. One and nine. This gets us through much of the rest of the shortage.

So here’s how this works. Suppose we have 35, and wish to add 6 to it. There aren’t the beads to add six to the ones column. So? Instead subtract the complement of six. That is, add ten and subtract four. In moving across the rows, when you reach the tens, slide one lower bead up, making the abacus represent 45. Then from the ones column subtract four. that is, slide the upper bead away from the center bar, and slide the complement to four, one bead, up to the center. And now the abacus represents 41, just like it ought.

If you’re experienced enough you can reduce some of these operations, sliding the beads above and below the center bar at once. Or sliding a bead in the tens and another in the ones column at once. Don’t fret doing this. Worry about making correct steps. You’ll speed up with practice. I remember advice from a typesetting manual I collected once: “strive for consistent, regular keystrokes. Speed comes with practice. Errors are time-consuming to correct”. This is, mutatis mutandis, always good advice.

Subtraction works like addition. This should surprise few. If you have the beads in place, just remove them: four minus two takes no particular insight. If there aren’t enough beads? Fall back on complements. If you wish to do 35 minus 6? Set up 35, and calculate 35 minus 10 plus 4. When you get to the tens rod, slide one bead down; this leaves you with 25. Then in the ones column, slide four beads up. This leaves you with 29. I’m so glad these seem to be working out.

Doing longer additions and subtractions takes more rows, but not actually more work. 35.2 plus 6.4 is the same work as 35 plus 6 and 2 plus 4, each of which you, in principle, know how to do. 35.2 minus 6.4 is a bit more fuss. When you get to the 2 minus 4 bit you have to do that addition-of-complements stuff. But that’s not any new work.

Where the decimal point goes is something you have to keep track of. As with the slide rule, the magnitude of these numbers is notional. Your fingers move the same way to add 352 and 64 as they will 0.352 and 0.064. That’s convenient.

Multiplication gets more tedious. It demands paying attention to where the decimal point is. Just like the slide rule demands, come to think of it. You’ll need columns on the abacus for both the multiplicands and the product. And you’ll do a lot of adding up. But at heart? 2038 times 3.7 amounts to doing eight multiplications. 8 times 7, 3 times 7, 0 times 7 (OK, that one’s easy), 2 times 7, 3 times 7, 3 times 3, 0 times 3 (again, easy), and 2 times 3. And then add up these results in the correct columns. This may be tedious, but it’s not hard. It does mean the abacus doesn’t spare you having to know some times tables. I mean, you could use the abacus to work out 8 times 7 by adding seven to itself over and over, but that’s time-consuming. You can save time, and calculation steps, by memorization. By knowing some answers ahead of time.

Totton Heffelfinger and Gary Flom’s page, from which I’m drawing almost all my practical advice, offers a good notation of lettering the rods of the abacus, A, B, C, D, and so on. To multiply, say, 352 by 64 start by putting the 64 on rods BC. Set the 352 on rods EFG. We’ll get the answer with rod K as the ones column. 2 times 4 is 8; put that on rod K. 5 times 4 is 20; add that to rods IJ. 3 times 4 is 12; add that to rods HI. 2 times 6 is 12; add that to rods IJ. 5 times 6 is 30; add that to rods HI. 3 times 6 is 18; add that to rods GH. All going well this should add up to 22,528, spread out along rods GHIJK. I can see right away at least the 8 is correct.

You would do the same physical steps to multiply, oh, 3.52 by 0.0064. You have to take care of the decimal place yourself, though.

I see you, in the back there, growing suspicious. I’ll come around to this. Don’t worry.

Division is … oh, I have to fess up. Division is not something I feel comfortable with. I can read the instructions and repeat the examples given. I haven’t done it enough to have that flash where I understand the point of things. I recognize what’s happening. It’s the work of division as done by hand. You know, 821 divided by 56 worked out by, well, 56 goes into 82 once with a remainder of 26. Then drop down the 1 to make this 261. 56 goes into 261 … oh, it would be so nice if it went five times, but it doesn’t. It goes in four times, with a remainder of 37. I can walk you through the steps but all I am truly doing is trying to keep up with Totton Heffelfinger and Gary Flom’s instructions here.

There are, I read, also schemes to calculate square roots on the abacus. I don’t know that there are cube-root schemes also. I would bet on there being such, though.

Never mind, though. The suspicious thing I expect you’ve noticed is the steps being done. They’re represented as sliding beads along rods, yes. But the meaning of these steps? They’re the same steps you would do by doing arithmetic on paper. Sliding two beads and then two more beads up to the center bar isn’t any different from looking at 2 + 2 and representing that as 4. All this ten’s-complement stuff to subtract one number from another is just … well, I learned it as subtraction by “borrowing”. I don’t know the present techniques but I’m sure they’re at heart the same. But the work is eerily like what you would do on paper, using Arabic numerals.

The slide rule uses a logarithm-based ruler. This makes the addition of distances along the slides match the multiplication of the values of the rulers. What does the abacus do to help us compute?

Why use an abacus?

What an abacus gives us is memory. It stores numbers. It lets us break a big problem into a series of small problems. It lets us keep a partial computation while we work through those steps. We don’t add 35.2 to 6.4. We add 3 to 0 and 5 to 6 and 2 to 4. We don’t multiply 2038 by 3.7. We multiply 8 by 7, and 8 by 3, and 3 by 7, and 3 by 3, and so on.

And this is most of numerical computing, even today. We describe what we want to do as these high-level operations. But the computation is a lot of calculations, each one of them simple. We use some memory to hold partially completed results. Memory, the ability to store results, lets us change hard problems into long strings of simple ones.

We do more things the way the abacus encourages. We even use those complementary numbers. Not five’s or ten’s complements, not with binary arithmetic computers. Two’s complement arithmetic makes it possible to subtract, or write negative numbers, in ways that are easy to calculate. That there are a set number of rods even has its parallel in modern computing. When representing a real number on the computer we have only so many decimal places. (Yes, yes, binary digit places.) At least unless we use a weird data structure. This affects our calculations. There are numbers we can’t represent perfectly, such as one-third. We need to think about whether this affects what we are using our calculation for.

There are major differences between a digital computer and a person using the abacus. But the processes are similar. This may help us to understand why computational science works the way it does. It may at least help us understand those contests in the 1950s where the abacus user was faster than the calculator user.

But no, I confess, I only use mine for decoration, or will when I find it again.


Thank you for reading. All the Fall 2019 A To Z posts should be at this link. Furthermore, both this year’s and all past A To Z sequences should be at this link. And I am still soliciting subjects for the first third of the alphabet.

How To Find Logarithms Without Using Powerful Computers


I got to remembering an old sequence of mine, and wanted to share it for my current audience. A couple years ago I read a 1949-published book about numerical computing. And it addressed a problem I knew existed but hadn’t put much thought into. That is, how to calculate the logarithm of a number? Logarithms … well, we maybe don’t need them so much now. But they were indispensable for computing for a very long time. They turn the difficult work of multiplication and division into the easier work of addition and subtraction. They turn the really hard work of exponentiation into the easier work of multiplication. So they’re great to have. But how to get them? And, particularly, how to get them if you have a computing device that’s able to do work, but not very much work?

Machines That Think About Logarithms sets out the question, including mentioning Edmund Callis Berkeley’s book that got me started on this. And some talk about the kinds of logarithms and why we use each of them.

Machines That Do Something About Logarithms sets out some principles. These are all things that are generically true about logarithms, including about calculating logarithms. They’re just the principles that were put into clever play by Harvard’s IBM Automatic Sequence-Controlled Calculator in the 1940s.

Machines That Give You Logarithms explains how to use those tools. And lays out how to get the base-ten logarithm for most numbers that you would like with a tiny bit of computing work. I showed off an example of getting the logarithm of 47.2286 using only three divisions, four additions, and a little bit of looking up stuff.

Without Machines That Think About Logarithms closes out the cycle. One catch with the algorithm described is that you need to work out some logarithms ahead of time and have them on hand, ready to look up. They’re not ones that you care about particularly for any problem, but they make it easier to find the logarithm you do want. This essay talks about which logarithms to calculate, in order to get the most accurate results for the logarithm you want, using the least custom work possible.

And there we go. Logarithms are still indispensable for mathematical work, although I realize not so much because we ever care what the logarithm of 47.2286 or any other arbitrary number is. Logarithms have some nice analytic properties, though, and they make other work easier to do. So they’re still in use, but for different problems.

My 2018 Mathematics A To Z: Witch of Agnesi


Nobody had a suggested topic starting with ‘W’ for me! So I’ll take that as a free choice, and get lightly autobiogrpahical.

Cartoon of a thinking coati (it's a raccoon-like animal from Latin America); beside him are spelled out on Scrabble titles, 'MATHEMATICS A TO Z', on a starry background. Various arithmetic symbols are constellations in the background.
Art by Thomas K Dye, creator of the web comics Newshounds, Something Happens, and Infinity Refugees. His current project is Projection Edge. And you can get Projection Edge six months ahead of public publication by subscribing to his Patreon. And he’s on Twitter as @Newshoundscomic.

Witch of Agnesi.

I know I encountered the Witch of Agnesi while in middle school. Eighth grade, if I’m not mistaken. It was a footnote in a textbook. I don’t remember much of the textbook. What I mostly remember of the course was how much I did not fit with the teacher. The only relief from boredom that year was the month we had a substitute and the occasional interesting footnote.

It was in a chapter about graphing equations. That is, finding curves whose points have coordinates that satisfy some equation. In a bit of relief from lines and parabolas the footnote offered this:

y = \frac{8a^3}{x^2 + 4a^2}

In a weird tantalizing moment the footnote didn’t offer a picture. Or say what an ‘a’ was doing in there. In retrospect I recognize ‘a’ as a parameter, and that different values of it give different but related shapes. No hint what the ‘8’ or the ‘4’ were doing there. Nor why ‘a’ gets raised to the third power in the numerator or the second in the denominator. I did my best with the tools I had at the time. Picked a nice easy boring ‘a’. Picked out values of ‘x’ and found the corresponding ‘y’ which made the equation true, and tried connecting the dots. The result didn’t look anything like a witch. Nor a witch’s hat.

It was one of a handful of biographical notes in the book. These were a little attempt to add some historical context to mathematics. It wasn’t much. But it was an attempt to show that mathematics came from people. Including, here, from Maria Gaëtana Agnesi. She was, I’m certain, the only woman mentioned in the textbook I’ve otherwise completely forgotten.

We have few names of ancient mathematicians. Those we have are often compilers like Euclid whose fame obliterated the people whose work they explained. Or they’re like Pythagoras, credited with discoveries by people who obliterated their own identities. In later times we have the mathematics done by, mostly, people whose social positions gave them time to write mathematics results. So we see centuries where every mathematician is doing it as their side hustle to being a priest or lawyer or physician or combination of these. Women don’t get the chance to stand out here.

Today of course we can name many women who did, and do, mathematics. We can name Emmy Noether, Ada Lovelace, and Marie-Sophie Germain. Challenged to do a bit more, we can offer Florence Nightingale and Sofia Kovalevskaya. Well, and also Grace Hopper and Margaret Hamilton if we decide computer scientists count. Katherine Johnson looks likely to make that cut. But in any case none of these people are known for work understandable in a pre-algebra textbook. This must be why Agnesi earned a place in this book. She’s among the earliest women we can specifically credit with doing noteworthy mathematics. (Also physics, but that’s off point for me.) Her curve might be a little advanced for that textbook’s intended audience. But it’s not far off, and pondering questions like “why 8a^3 ? Why not a^3 ?” is more pleasant, to a certain personality, than pondering what a directrix might be and why we might use one.

The equation might be a lousy way to visualize the curve described. The curve is one of that group of interesting shapes you get by constructions. That is, following some novel process. Constructions are fun. They’re almost a craft project.

For this we start with a circle. And two parallel tangent lines. Without loss of generality, suppose they’re horizontal, so, there’s lines at the top and the bottom of the curve.

Take one of the two tangent points. Again without loss of generality, let’s say the bottom one. Draw a line from that point over to the other line. Anywhere on the other line. There’s a point where the line you drew intersects the circle. There’s another point where it intersects the other parallel line. We’ll find a new point by combining pieces of these two points. The point is on the same horizontal as wherever your line intersects the circle. It’s on the same vertical as wherever your line intersects the other parallel line. This point is on the Witch of Agnesi curve.

Now draw another line. Again, starting from the lower tangent point and going up to the other parallel line. Again it intersects the circle somewhere. This gives another point on the Witch of Agnesi curve. Draw another line. Another intersection with the circle, another intersection with the opposite parallel line. Another point on the Witch of Agnesi curve. And so on. Keep doing this. When you’ve drawn all the lines that reach from the tangent point to the other line, you’ll have generated the full Witch of Agnesi curve. This takes more work than writing out y = \frac{8a^3}{x^2 + 4a^2} , yes. But it’s more fun. It makes for neat animations. And I think it prepares us to expect the shape of the curve.

It’s a neat curve. Between it and the lower parallel line is an area four times that of the circle that generated it. The shape is one we would get from looking at the derivative of the arctangent. So there’s some reasons someone working in calculus might find it interesting. And people did. Pierre de Fermat studied it, and found this area. Isaac Newton and Luigi Guido Grandi studied the shape, using this circle-and-parallel-lines construction. Maria Agnesi’s name attached to it after she published a calculus textbook which examined this curve. She showed, according to people who present themselves as having read her book, the curve and how to find it. And she showed its equation and found the vertex and asymptote line and the inflection points. The inflection points, here, are where the curve chances from being cupped upward to cupping downward, or vice-versa.

It’s a neat function. It’s got some uses. It’s a natural smooth-hill shape, for example. So this makes a good generic landscape feature if you’re modeling the flow over a surface. I read that solitary waves can have this curve’s shape, too.

And the curve turns up as a probability distribution. Take a fixed point. Pick lines at random that pass through this point. See where those lines reach a separate, straight line. Some regions are more likely to be intersected than are others. Chart how often any particular line is the new intersection point. That chart will (given some assumptions I ask you to pretend you agree with) be a Witch of Agnesi curve. This might not surprise you. It seems inevitable from the circle-and-intersecting-line construction process. And that’s nice enough. As a distribution it looks like the usual Gaussian bell curve.

It’s different, though. And it’s different in strange ways. Like, for a probability distribution we can find an expected value. That’s … well, what it sounds like. But this is the strange probability distribution for which the law of large numbers does not work. Imagine an experiment that produces real numbers, with the frequency of each number given by this distribution. Run the experiment zillions of times. What’s the mean value of all the zillions of generated numbers? And it … doesn’t … have one. I mean, we know it ought to, it should be the center of that hill. But the calculations for that don’t work right. Taking a bigger sample makes the sample mean jump around more, not less, the way every other distribution should work. It’s a weird idea.

Imagine carving a block of wood in the shape of this curve, with a horizontal lower bound and the Witch of Agnesi curve as the upper bound. Where would it balance? … The normal mathematical tools don’t say, even though the shape has an obvious line of symmetry. And a finite area. You don’t get this kind of weirdness with parabolas.

(Yes, you’ll get a balancing point if you actually carve a real one. This is because you work with finitely-long blocks of wood. Imagine you had a block of wood infinite in length. Then you would see some strange behavior.)

It teaches us more strange things, though. Consider interpolations, that is, taking a couple data points and fitting a curve to them. We usually start out looking for polynomials when we interpolate data points. This is because everything is polynomials. Toss in more data points. We need a higher-order polynomial, but we can usually fit all the given points. But sometimes polynomials won’t work. A problem called Runge’s Phenomenon can happen, where the more data points you have the worse your polynomial interpolation is. The Witch of Agnesi curve is one of those. Carl Runge used points on this curve, and trying to fit polynomials to those points, to discover the problem. More data and higher-order polynomials make for worse interpolations. You get curves that look less and less like the original Witch. Runge is himself famous to mathematicians, known for “Runge-Kutta”. That’s a family of techniques to solve differential equations numerically. I don’t know whether Runge came to the weirdness of the Witch of Agnesi curve from considering how errors build in numerical integration. I can imagine it, though. The topics feel related to me.

I understand how none of this could fit that textbook’s slender footnote. I’m not sure any of the really good parts of the Witch of Agnesi could even fit thematically in that textbook. At least beyond the fact of its interesting name, which any good blog about the curve will explain. That there was no picture, and that the equation was beyond what the textbook had been describing, made it a challenge. Maybe not seeing what the shape was teased the mathematician out of this bored student.


And next is ‘X’. Will I take Mr Wu’s suggestion and use that to describe something “extreme”? Or will I take another topic or suggestion? We’ll see on Friday, barring unpleasant surprises. Thanks for reading.

I Don’t Have Any Good Ideas For Finding Cube Roots By Trigonometry


So I did a bit of thinking. There’s a prosthaphaeretic rule that lets you calculate square roots using nothing more than trigonometric functions. Is there one that lets you calculate cube roots?

And I don’t know. I don’t see where there is one. I may be overlooking an approach, though. Let me outline what I’ve thought out.

First is square roots. It’s possible to find the square root of a number between 0 and 1 using arc-cosine and cosine functions. This is done by using a trigonometric identity called the double-angle formula. This formula, normally, you use if you know the cosine of a particular angle named θ and want the cosine of double that angle:

\cos\left(2\theta\right) = 2 \cos^2\left(\theta\right) - 1

If we suppose the number whose square we want is \cos^2\left(\theta\right) then we can find \cos\left(\theta\right) . The calculation on the right-hand side of this is easy; double your number and subtract one. Then to the lookup table; find the angle whose cosine is that number. That angle is two times θ. So divide that angle in two. Cosine of that is, well, \cos\left(\theta\right) and most people would agree that’s a square root of \cos^2\left(\theta\right) without any further work.

Why can’t I do the same thing with a triple-angle formula? … Well, here’s my choices among the normal trig functions:

\cos\left(3\theta\right) = 4 \cos^3\left(\theta\right) - 3\cos\left(\theta\right)

\sin\left(3\theta\right) = 3 \sin\left(\theta\right) - 4\sin^3\left(\theta\right)

\tan\left(3\theta\right) = \frac{3 \tan\left(\theta\right) - \tan^3\left(\theta\right)}{1 - 3 \tan^2\left(\theta\right)}

Yes, I see you in the corner, hopping up and down and asking about the cosecant. It’s not any better. Trust me.

So you see the problem here. The number whose cube root I want has to be the \cos^3\left(\theta\right) . Or the cube of the sine of theta, or the cube of the tangent of theta. Whatever. The trouble is I don’t see a way to calculate cosine (sine, tangent) of 3θ, or 3 times the cosine (etc) of θ. Nor to get some other simple expression out of that. I can get mixtures of the cosine of 3θ plus the cosine of θ, sure. But that doesn’t help me figure out what θ is.

Can it be worked out? Oh, sure, yes. There’s absolutely approximation schemes that would let me find a value of θ which makes true, say,

4 \cos^3\left(\theta\right) - 3 \cos\left(\theta\right) = 0.5

But: is there a way takes less work than some ordinary method of calculating a cube root? Even if you allow some work to be done by someone else ahead of time, such as by computing a table of trig functions? … If there is, I don’t see it. So there’s another point in favor of logarithms. Finding a cube root using a logarithm table is no harder than finding a square root, or any other root.

If you’re using trig tables, you can find a square root, or a fourth root, or an eighth root. Cube roots, if I’m not missing something, are beyond us. So are, I imagine, fifth roots and sixth roots and seventh roots and so on. I could protest that I have never in my life cared what the seventh root of a thing is, but it would sound like a declaration of sour grapes. Too bad.

If I have missed something, it’s probably obvious. Please go ahead and tell me what it is.

How To Calculate A Square Root By A Method You Will Never Actually Use


Sunday’s comics post got me thinking about ways to calculate square roots besides using the square root function on a calculator. I wondered if I could find my own little approach. Maybe something that isn’t iterative. Iterative methods are great in that they tend to forgive numerical errors. All numerical calculations carry errors with them. But they can involve a lot of calculation and, in principle, never finish. You just give up when you think the answer is good enough. A non-iterative method carries the promise that things will, someday, end.

And I found one! It’s a neat little way to find the square root of a number between 0 and 1. Call the number ‘S’, as in square. I’ll give you the square root from it. Here’s how.

First, take S. Multiply S by two. Then subtract 1 from this.

Next. Find the angle — I shall call it 2A — whose cosine is this number 2S – 1.

You have 2A? Great. Divide that in two, so that you get the angle A.

Now take the cosine of A. This will be the (positive) square root of S. (You can find the negative square root by taking minus this.)

Let me show it in action. Let’s say you want the square root of 0.25. So let S = 0.25. And then 2S – 1 is two times 0.25 (which is 0.50) minus 1. That’s -0.50. What angle has cosine of -0.50? Well, that’s an angle of 2 π / 3 radians. Mathematicians think in radians. People think in degrees. And you can do that too. This is 120 degrees. Divide this by two. That’s an angle of π / 3 radians, or 60 degrees. The cosine of π / 3 is 0.5. And, indeed, 0.5 is the square root of 0.25.

I hear you protesting already: what if we want the square root of something larger than 1? Like, how is this any good in finding the square root of 81? Well, if we add a little step before and after this work, we’re in good shape. Here’s what.

So we start with some number larger than 1. Say, 81. Fine. Divide it by 100. If it’s still larger than 100, divide it again, and again, until you get a number smaller than 1. Keep track of how many times you did this. In this case, 81 just has to be divided by 100 the one time. That gives us 0.81, a number which is smaller than 1.

Twice 0.81 minus 1 is equal to 0.62. The angle which has 0.81 as cosine is roughly 0.90205. Half this angle is about 0.45103. And the cosine of 0.45103 is 0.9. This is looking good, but obviously 0.9 is no square root of 81.

Ah, but? We divided 81 by 100 to get it smaller than 1. So we balance that by multiplying 0.9 by 10 to get it back larger than 1. If we had divided by 100 twice to start with, we’d multiply by 10 twice to finish. If we had divided by 100 six times to start with, we’d multiply by 10 six times to finish. Yes, 10 is the square root of 100. You see what’s going on here.

(And if you want the square root of a tiny number, something smaller than 0.01, it’s not a bad idea to multiply it by 100, maybe several times over. Then calculate the square root, and divide the result by 10 a matching number of times. It’s hard to calculate with very big or with very small numbers. If you must calculate, do it on very medium numbers. This is one of those little things you learn in numerical mathematics.)

So maybe now you’re convinced this works. You may not be convinced of why this works. What I’m using here is a trigonometric identity, one of the angle-doubling formulas. Its heart is this identity. It’s familiar to students whose Intro to Trigonometry class is making them finally, irrecoverably hate mathematics:

\cos\left(2\theta\right) = 2 \cos^2\left(\theta\right) - 1

Here, I let ‘S’ be the squared number, \cos^2\left(\theta\right) . So then anything I do to find \cos\left(\theta\right) gets me the square root. The algebra here is straightforward. Since ‘S’ is that cosine-squared thing, all I have to do is double it, subtract one, and then find what angle 2θ has that number as cosine. Then the cosine of θ has to be the square root.

Oh, yeah, all right. There’s an extra little objection. In what world is it easier to take an arc-cosine (to figure out what 2θ is) and then later to take a cosine? … And the answer is, well, any world where you’ve already got a table printed out of cosines of angles and don’t have a calculator on hand. This would be a common condition through to about 1975. And not all that ridiculous through to about 1990.

This is an example of a prosthaphaeretic rule. These are calculation tools. They’re used to convert multiplication or division problems into addition and subtraction. The idea is exactly like that of logarithms and exponents. Using trig functions predates logarithms. People knew about sines and cosines long before they knew about logarithms and exponentials. But the impulse is the same. And you might, if you squint, see in my little method here an echo of what you’d do more easily with a logarithm table. If you had a log table, you’d calculate \exp\left(\frac{1}{2}\log\left(S\right)\right) instead. But if you don’t have a log table, and only have a table of cosines, you can calculate \cos\left(\frac{1}{2}\arccos\left(2 S - 1 \right)\right) at least.

Is this easier than normal methods of finding square roots? … If you have a table of cosines, yes. Definitely. You have to scale the number into range (divide by 100 some) do an easy multiplication (S times 2), an easy subtraction (minus 1), a table lookup (arccosine), an easy division (divide by 2), another table lookup (cosine), and scale the number up again (multiply by 10 some). That’s all. Seven steps, and two of them are reading. Two of the rest are multiplying or dividing by 10’s. Using logarithm tables has it beat, yes, at five steps (two that are scaling, two that are reading, one that’s dividing by 2). But if you can’t find your table of logarithms, and do have a table of cosines, you’re set.

This may not be practical, since who has a table of cosines anymore? Who hasn’t also got a calculator that does square roots faster? But it delighted me to work this scheme out. Give me a while and maybe I’ll think about cube roots.

Reading the Comics, October 4, 2016: Split Week Edition Part 1


The last week in mathematically themed comics was a pleasant one. By “a pleasant one” I mean Comic Strip Master Command sent enough comics out that I feel comfortable splitting them across two essays. Look for the other half of the past week’s strips in a couple days at a very similar URL.

Mac King and Bill King’s Magic in a Minute feature for the 2nd shows off a bit of number-pattern wonder. Set numbers in order on a four-by-four grid and select four as directed and add them up. You get the same number every time. It’s a cute trick. I would not be surprised if there’s some good group theory questions underlying this, like about what different ways one could arrange the numbers 1 through 16. Or for what other size grids the pattern will work for: 2 by 2? (Obviously.) 3 by 3? 5 by 5? 6 by 6? I’m not saying I actually have been having fun doing this. I just sense there’s fun to be had there.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 2nd is based on one of those weirdnesses of the way computers add. I remember in the 90s being on a Java mailing list. Routinely it would draw questions from people worried that something was very wrong, as adding 0.01 to a running total repeatedly wouldn’t get to exactly 1.00. Java was working correctly, in that it was doing what the specifications said. It’s just the specifications didn’t work quite like new programmers expected.

What’s going on here is the same problem you get if you write down 1/3 as 0.333. You know that 1/3 plus 1/3 plus 1/3 ought to be 1 exactly. But 0.333 plus 0.333 plus 0.333 is 0.999. 1/3 is really a little bit more than 0.333, but we skip that part because it’s convenient to use only a few points past the decimal. Computers normally represent real-valued numbers with a scheme called floating point representation. At heart, that’s representing numbers with a couple of digits. Enough that we don’t normally see the difference between the number we want and the number the computer represents.

Every number base has some rational numbers it can’t represent exactly using finitely many digits. Our normal base ten, for example, has “one-third” and “two-third”. Floating point arithmetic is built on base two, and that has some problems with tenths and hundredths and thousandths. That’s embarrassing but in the main harmless. Programmers learn about these problems and how to handle them. And if they ask the mathematicians we tell them how to write code so as to keep these floating-point errors from growing uncontrollably. If they ask nice.

Random Acts of Nancy for the 3rd is a panel from Ernie Bushmiller’s Nancy. That panel’s from the 23rd of November, 1946. And it just uses mathematics in passing, arithmetic serving the role of most of Nancy’s homework. There’s a bit of spelling (I suppose) in there too, which probably just represents what’s going to read most cleanly. Random Acts is curated by Ernie Bushmiller fans Guy Gilchrist (who draws the current Nancy) and John Lotshaw.

Thom Bluemel’s Birdbrains for the 4th depicts the discovery of a new highest number. When humans discovered ‘1’ is, I would imagine, probably unknowable. Given the number sense that animals have it’s probably something that predates humans, that it’s something we’re evolved to recognize and understand. A single stroke for 1 seems to be a common symbol for the number. I’ve read histories claiming that a culture’s symbol for ‘1’ is often what they use for any kind of tally mark. Obviously nothing in human cultures is truly universal. But when I look at number symbols other than the Arabic and Roman schemes I’m used to, it is usually the symbol for ‘1’ that feels familiar. Then I get to the Thai numeral and shrug at my helplessness.

Bill Amend’s FoxTrot Classics for the 4th is a rerun of the strip from the 11th of October, 2005. And it’s made for mathematics people to clip out and post on the walls. Jason and Marcus are in their traditional nerdly way calling out sequences of numbers. Jason’s is the Fibonacci Sequence, which is as famous as mathematics sequences get. That’s the sequence of numbers in which every number is the sum of the previous two terms. You can start that sequence with 0 and 1, or with 1 and 1, or with 1 and 2. It doesn’t matter.

Marcus calls out the Perrin Sequence, which I neve heard of before either. It’s like the Fibonacci Sequence. Each term in it is the sum of two other terms. Specifically, each term is the sum of the second-previous and the third-previous terms. And it starts with the numbers 3, 0, and 2. The sequence is named for François Perrin, who described it in 1899, and that’s as much as I know about him. The sequence describes some interesting stuff. Take n points and put them in a ‘cycle graph’, which looks to the untrained eye like a polygon with n corners and n sides. You can pick subsets of those points. A maximal independent set is the biggest subset you can make that doesn’t fit into another subset. And the number of these maximal independent sets in a cyclic graph is the n-th number in the Perrin sequence. I admit this seems like a nice but not compelling thing to know. But I’m not a cyclic graph kind of person so what do I know?

Jeffrey Caulfield and Alexandre Rouillard’s Mustard and Boloney for the 4th is the anthropomorphic numerals joke for this essay and I was starting to worry we wouldn’t get one.

Theorem Thursday: A First Fixed Point Theorem


I’m going to let the Mean Value Theorem slide a while. I feel more like a Fixed Point Theorem today. As with the Mean Value Theorem there’s several of these. Here I’ll start with an easy one.

The Fixed Point Theorem.

Back when the world and I were young I would play with electronic calculators. They encouraged play. They made it so easy to enter a number and hit an operation, and then hit that operation again, and again and again. Patterns appeared. Start with, say, ‘2’ and hit the ‘squared’ button, the smaller ‘2’ raised up from the key’s baseline. You got 4. And again: 16. And again: 256. And again and again and you got ever-huger numbers. This happened whenever you started from a number bigger than 1. Start from something smaller than 1, however tiny, and it dwindled down to zero, whatever you tried. Start at ‘1’ and it just stays there. The results were similar if you started with negative numbers. The first squaring put you in positive numbers and everything carried on as before.

This sort of thing happened a lot. Keep hitting the mysterious ‘exp’ and the numbers would keep growing forever. Keep hitting ‘sqrt’; if you started above 1, the numbers dwindled to 1. Start below and the numbers rise to 1. Or you started at zero, but who’s boring enough to do that? ‘log’ would start with positive numbers and keep dropping until it turned into a negative number. The next step was the calculator’s protest we were unleashing madness on the world.

But you didn’t always get zero, one, infinity, or madness, from repeatedly hitting the calculator button. Sometimes, some functions, you’d get an interesting number. If you picked any old number and hit cosine over and over the digits would eventually settle down to around 0.739085. Or -0.739085. Cosine’s great. Tangent … tangent is weird. Tangent does all sorts of bizarre stuff. But at least cosine is there, giving us this interesting number.

(Something you might wonder: this is the cosine of an angle measured in radians, which is how mathematicians naturally think of angles. Normal people measure angles in degrees, and that will have a different fixed point. We write both the cosine-in-radians and the cosine-in-degrees using the shorthand ‘cos’. We get away with this because people who are confused by this are too embarrassed to call us out on it. If we’re thoughtful we write, say, ‘cos x’ for radians and ‘cos x°’ for degrees. This makes the difference obvious. It doesn’t really, but at least we gave some hint to the reader.)

This all is an example of a fixed point theorem. Fixed point theorems turn up in a lot of fields. They were most impressed upon me in dynamical systems, studying how a complex system changes in time. A fixed point, for these problems, is an equilibrium. It’s where things aren’t changed by a process. You can see where that’s interesting.

In this series I haven’t stated theorems exactly much, and I haven’t given them real proofs. But this is an easy one to state and to prove. Start off with a function, which I’ll name ‘f’, because yes that is exactly how much effort goes in to naming functions. It has as a domain the interval [a, b] for some real numbers ‘a’ and ‘b’. And it has as rang the same interval, [a, b]. It might use the whole range; it might use only a subset of it. And we have to require that f is continuous.

Then there has to be at least one fixed point. There must be at last one number ‘c’, somewhere in the interval [a, b], for which f(c) equals c. There may be more than one; we don’t say anything about how many there are. And it can happen that c is equal to a. Or that c equals b. We don’t know that it is or that it isn’t. We just know there’s at least one ‘c’ that makes f(c) equal c.

You get that in my various examples. If the function f has the rule that any given x is matched to x2, then we do get two fixed points: f(0) = 02 = 0, and, f(1) = 12 = 1. Or if f has the rule that any given x is matched to the square root of x, then again we have: f(0) = \sqrt{0} = 0 and f(1) = \sqrt{1} = 1 . Same old boring fixed points. The cosine is a little more interesting. For that we have f(0.739085...) = \cos\left(0.739085...\right) = 0.739085... .

How to prove it? The easiest way I know is to summon the Intermediate Value Theorem. Since I wrote a couple hundred words about that a few weeks ago I can assume you to understand it perfectly and have no question about how it makes this problem easy. I don’t even need to go on, do I?

… Yeah, fair enough. Well, here’s how to do it. We’ll take the original function f and create, based on it, a new function. We’ll dig deep in the alphabet and name that ‘g’. It has the same domain as f, [a, b]. Its range is … oh, well, something in the real numbers. Don’t care. The wonder comes from the rule we use.

The rule for ‘g’ is this: match the given number ‘x’ with the number ‘f(x) – x’. That is, g(a) equals whatever f(a) would be, minus a. g(b) equals whatever f(b) would be, minus b. We’re allowed to define a function in terms of some other function, as long as the symbols are meaningful. But we aren’t doing anything wrong like dividing by zero or taking the logarithm of a negative number or asking for f where it isn’t defined.

You might protest that we don’t know what the rule for f is. We’re told there is one, and that it’s a continuous function, but nothing more. So how can I say I’ve defined g in terms of a function I don’t know?

In the first place, I already know everything about f that I need to. I know it’s a continuous function defined on the interval [a, b]. I won’t use any more than that about it. And that’s great. A theorem that doesn’t require knowing much about a function is one that applies to more functions. It’s like the difference between being able to say something true of all living things in North America, and being able to say something true of all persons born in Redbank, New Jersey, on the 18th of February, 1944, who are presently between 68 and 70 inches tall and working on their rock operas. Both things may be true, but one of those things you probably use more.

In the second place, suppose I gave you a specific rule for f. Let me say, oh, f matches x with the arccosecant of x. Are you feeling any more enlightened now? Didn’t think so.

Back to g. Here’s some things we can say for sure about it. g is a function defined on the interval [a, b]. That’s how we set it up. Next point: g is a continuous function on the interval [a, b]. Remember, g is just the function f, which was continuous, minus x, which is also continuous. The difference of two continuous functions is still going to be continuous. (This is obvious, although it may take some considered thinking to realize why it is obvious.)

Now some interesting stuff. What is g(a)? Well, it’s whatever number f(a) is minus a. I can’t tell you what number that is. But I can tell you this: it’s not negative. Remember that f(a) has to be some number in the interval [a, b]. That is, it’s got to be no smaller than a. So the smallest f(a) can be is equal to a, in which case f(a) minus a is zero. And f(a) might be larger than a, in which case f(a) minus a is positive. So g(a) is either zero or a positive number.

(If you’ve just realized where I’m going and gasped in delight, well done. If you haven’t, don’t worry. You will. You’re just out of practice.)

What about g(b)? Since I don’t know what f(b) is, I can’t tell you what specific number it is. But I can tell you it’s not a positive number. The reasoning is just like above: f(b) is some number on the interval [a, b]. So the biggest number f(b) can equal is b. And in that case f(b) minus b is zero. If f(b) is any smaller than b, then f(b) minus b is negative. So g(b) is either zero or a negative number.

(Smiling at this? Good job. If you aren’t, again, not to worry. This sort of argument is not the kind of thing you do in Boring Algebra. It takes time and practice to think this way.)

And now the Intermediate Value Theorem works. g(a) is a positive number. g(b) is a negative number. g is continuous from a to b. Therefore, there must be some number ‘c’, between a and b, for which g(c) equals zero. And remember what g(c) means: f(c) – c equals 0. Therefore f(c) has to equal c. There has to be a fixed point.

And some tidying up. Like I said, g(a) might be positive. It might also be zero. But if g(a) is zero, then f(a) – a = 0. So a would be a fixed point. And similarly if g(b) is zero, then f(b) – b = 0. So then b would be a fixed point. The important thing is there must be at least some fixed point.

Now that calculator play starts taking on purposeful shape. Squaring a number could find a fixed point only if you started with a number from -1 to 1. The square of a number outside this range, such as ‘2’, would be bigger than you started with, and the Fixed Point Theorem doesn’t apply. Similarly with exponentials. But square roots? The square root of any number from 0 to a positive number ‘b’ is a number between 0 and ‘b’, at least as long as b was bigger than 1. So there was a fixed point, at 1. The cosine of a real number is some number between -1 and 1, and the cosines of all the numbers between -1 and 1 are themselves between -1 and 1. The Fixed Point Theorem applies. Tangent isn’t a continuous function. And the calculator play never settles on anything.

As with the Intermediate Value Theorem, this is an existence proof. It guarantees there is a fixed point. It doesn’t tell us how to find one. Calculator play does, though. Start from any old number that looks promising and work out f for that number. Then take that and put it back into f. And again. And again. This is known as “fixed point iteration”. It won’t give you the exact answer.

Not usually, anyway. In some freak cases it will. But what it will give, provided some extra conditions are satisfied, is a sequence of values that get closer and closer to the fixed point. When you’re close enough, then you stop calculating. How do you know you’re close enough? If you know something about the original f you can work out some logically rigorous estimates. Or you just keep calculating until all the decimal points you want stop changing between iterations. That’s not logically sound, but it’s easy to program.

That won’t always work. It’ll only work if the function f is differentiable on the interval (a, b). That is, it can’t have corners. And there have to be limits on how fast the function changes on the interval (a, b). If the function changes too fast, iteration can’t be guaranteed to work. But often if we’re interested in a function at all then these conditions will be true, or we can think of a related function that for which they are true.

And even if it works it won’t always work well. It can take an enormous pile of calculations to get near the fixed point. But this is why we have computers, and why we can leave them to work overnight.

And yet such a simple idea works. It appears in ancient times, in a formula for finding the square root of an arbitrary positive number ‘N’. (Find the fixed point for f(x) = \frac{1}{2}\left(\frac{N}{x} + x\right) ). It creeps into problems that don’t look like fixed points. Calculus students learn of something called the Newton-Raphson Iteration. It finds roots, points where a function f(x) equals zero. Mathematics majors learn of numerical methods to solve ordinary differential equations. The most stable of these are again fixed-point iteration schemes, albeit in disguise.

They all share this almost playful backbone.

Theorem Thursday: One Mean Value Theorem Of Many


For this week I have something I want to follow up on. We’ll see if I make it that far.

The Mean Value Theorem.

My subject line disagrees with the header just above here. I want to talk about the Mean Value Theorem. It’s one of those things that turns up in freshman calculus and then again in Analysis. It’s introduced as “the” Mean Value Theorem. But like many things in calculus it comes in several forms. So I figure to talk about one of them here, and another form in a while, when I’ve had time to make up drawings.

Calculus can split effortlessly into two kinds of things. One is differential calculus. This is the study of continuity and smoothness. It studies how a quantity changes if someting affecting it changes. It tells us how to optimize things. It tells us how to approximate complicated functions with simpler ones. Usually polynomials. It leads us to differential equations, problems in which the rate at which something changes depends on what value the thing has.

The other kind is integral calculus. This is the study of shapes and areas. It studies how infinitely many things, all infinitely small, add together. It tells us what the net change in things are. It tells us how to go from information about every point in a volume to information about the whole volume.

They aren’t really separate. Each kind informs the other, and gives us tools to use in studying the other. And they are almost mirrors of one another. Differentials and integrals are not quite inverses, but they come quite close. And as a result most of the important stuff you learn in differential calculus has an echo in integral calculus. The Mean Value Theorem is among them.

The Mean Value Theorem is a rule about functions. In this case it’s functions with a domain that’s an interval of the real numbers. I’ll use ‘a’ as the name for the smallest number in the domain and ‘b’ as the largest number. People talking about the Mean Value Theorem often do. The range is also the real numbers, although it doesn’t matter which ones.

I’ll call the function ‘f’ in accord with a longrunning tradition of not working too hard to name functions. What does matter is that ‘f’ is continuous on the interval [a, b]. I’ve described what ‘continuous’ means before. It means that here too.

And we need one more thing. The function f has to be differentiable on the interval (a, b). You maybe noticed that before I wrote [a, b], and here I just wrote (a, b). There’s a difference here. We need the function to be continuous on the “closed” interval [a, b]. That is, it’s got to be continuous for ‘a’, for ‘b’, and for every point in-between.

But we only need the function to be differentiable on the “open” interval (a, b). That is, it’s got to be continuous for all the points in-between ‘a’ and ‘b’. If it happens to be differentiable for ‘a’, or for ‘b’, or for both, that’s great. But we won’t turn away a function f for not being differentiable at those points. Only the interior. That sort of distinction between stuff true on the interior and stuff true on the boundaries is common. This is why mathematicians have words for “including the boundaries” (“closed”) and “never minding the boundaries” (“open”).

As to what “differentiable” is … A function is differentiable at a point if you can take its derivative at that point. I’m sure that clears everything up. There are many ways to describe what differentiability is. One that’s not too bad is to imagine zooming way in on the curve representing a function. If you start with a big old wobbly function it waves all around. But pick a point. Zoom in on that. Does the function stay all wobbly, or does it get more steady, more straight? Keep zooming in. Does it get even straighter still? If you zoomed in over and over again on the curve at some point, would it look almost exactly like a straight line?

If it does, then the function is differentiable at that point. It has a derivative there. The derivative’s value is whatever the slope of that line is. The slope is that thing you remember from taking Boring Algebra in high school. That rise-over-run thing. But this derivative is a great thing to know. You could approximate the original function with a straight line, with slope equal to that derivative. Close to that point, you’ll make a small enough error nobody has to worry about it.

That there will be this straight line approximation isn’t true for every function. Here’s an example. Picture a line that goes up and then takes a 90-degree turn to go back down again. Look at the corner. However close you zoom in on the corner, there’s going to be a corner. It’s never going to look like a straight line; there’s a 90-degree angle there. It can be a smaller angle if you like, but any sort of corner breaks this differentiability. This is a point where the function isn’t differentiable.

There are functions that are nothing but corners. They can be differentiable nowhere, or only at a tiny set of points that can be ignored. (A set of measure zero, as the dialect would put it.) Mathematicians discovered this over the course of the 19th century. They got into some good arguments about how that can even make sense. It can get worse. Also found in the 19th century were functions that are continuous only at a single point. This smashes just about everyone’s intuition. But we can’t find a definition of continuity that’s as useful as the one we use now and avoids that problem. So we accept that it implies some pathological conclusions and carry on as best we can.

Now I get to the Mean Value Theorem in its differential calculus pelage. It starts with the endpoints, ‘a’ and ‘b’, and the values of the function at those points, ‘f(a)’ and ‘f(b)’. And from here it’s easiest to figure what’s going on if you imagine the plot of a generic function f. I recommend drawing one. Just make sure you draw it without lifting the pen from paper, and without including any corners anywhere. Something wiggly.

Draw the line that connects the ends of the wiggly graph. Formally, we’re adding the line segment that connects the points with coordinates (a, f(a)) and (b, f(b)). That’s coordinate pairs, not intervals. That’s clear in the minds of the mathematicians who don’t see why not to use parentheses over and over like this. (We are short on good grouping symbols like parentheses and brackets and braces.)

Per the Mean Value Theorem, there is at least one point whose derivative is the same as the slope of that line segment. If you were to slide the line up or down, without changing its orientation, you’d find something wonderful. Most of the time this line intersects the curve, crossing from above to below or vice-versa. But there’ll be at least one point where the shifted line is “tangent”, where it just touches the original curve. Close to that touching point, the “tangent point”, the shifted line and the curve blend together and can’t be easily told apart. As long as the function is differentiable on the open interval (a, b), and continuous on the closed interval [a, b], this will be true. You might convince yourself of it by drawing a couple of curves and taking a straightedge to the results.

This is an existence theorem. Like the Intermediate Value Theorem, it doesn’t tell us which point, or points, make the thing we’re interested in true. It just promises us that there is some point that does it. So it gets used in other proofs. It lets us mix information about intervals and information about points.

It’s tempting to try using it numerically. It looks as if it justifies a common differential-calculus trick. Suppose we want to know the value of the derivative at a point. We could pick a little interval around that point and find the endpoints. And then find the slope of the line segment connecting the endpoints. And won’t that be close enough to the derivative at the point we care about?

Well. Um. No, we really can’t be sure about that. We don’t have any idea what interval might make the derivative of the point we care about equal to this line-segment slope. The Mean Value Theorem won’t tell us. It won’t even tell us if there exists an interval that would let that trick work. We can’t invoke the Mean Value Theorem to let us get away with that.

Often, though, we can get away with it. Differentiable functions do have to follow some rules. Among them is that if you do pick a small enough interval then approximations that look like this will work all right. If the function flutters around a lot, we need a smaller interval. But a lot of the functions we’re interested in don’t flutter around that much. So we can get away with it. And there’s some grounds to trust in getting away with it. The Mean Value Theorem isn’t any part of the grounds. It just looks so much like it ought to be.

I hope on a later Thursday to look at an integral-calculus form of the Mean Value Theorem.

What’s The Shortest Proof I’ve Done?


I didn’t figure to have a bookend for last week’s “What’s The Longest Proof I’ve Done? question. I don’t keep track of these things, after all. And the length of a proof must be a fluid concept. If I show something is a direct consequence of a previous theorem, is the proof’s length the two lines of new material? Or is it all the proof of the previous theorem plus two new lines?

I would think the shortest proof I’d done was showing that the logarithm of 1 is zero. This would be starting from the definition of the natural logarithm of a number x as the definite integral of 1/t on the interval from 1 to x. But that requires a bunch of analysis to support the proof. And the Intermediate Value Theorem. Does that stuff count? Why or why not?

But this happened to cross my desk: The Shortest-Known Paper Published in a Serious Math Journal: Two Succinct Sentences, an essay by Dan Colman. It reprints a paper by L J Lander and T R Parkin which appeared in the Bulletin of the American Mathematical Society in 1966.

It’s about Euler’s Sums of Powers Conjecture. This is a spinoff of Fermat’s Last Theorem. Leonhard Euler observed that you need at least two whole numbers so that their squares add up to a square. And you need three cubes of whole numbers to add up to the cube of a whole number. Euler speculated you needed four whole numbers so that their fourth powers add up to a fourth power, five whole numbers so that their fifth powers add up to a fifth power, and so on.

And it’s not so. Lander and Parkin found that this conjecture is false. They did it the new old-fashioned way: they set a computer to test cases. And they found four whole numbers whose fifth powers add up to a fifth power. So the quite short paper answers a long-standing question, and would be hard to beat for accessibility.

There is another famous short proof sometimes credited as the most wordless mathematical presentation. Frank Nelson Cole gave it on the 31st of October, 1903. It was about the Mersenne number 267-1, or in human notation, 147,573,952,589,676,412,927. It was already known the number wasn’t prime. (People wondered because numbers of the form 2n-1 often lead us to perfect numbers. And those are interesting.) But nobody knew which factors it was. Cole gave his talk by going up to the board, working out 267-1, and then moving to the other side of the board. There he wrote out 193,707,721 × 761,838,257,287, and showed what that was. Then, per legend, he sat down without ever saying a word, and took in the standing ovation.

I don’t want to cast aspersions on a great story like that. But mathematics is full of great stories that aren’t quite so. And I notice that one of Cole’s doctoral students was Eric Temple Bell. Bell gave us a great many tales of mathematics history that are grand and great stories that just weren’t so. So I want it noted that I don’t know where we get this story from, or how it may have changed in the retellings. But Cole’s proof is correct, at least according to Octave.

So not every proof is too long to fit in the universe. But then I notice that Mathworld’s page regarding the Euler Sum of Powers Conjecture doesn’t cite the 1966 paper. It cites instead Lander and Parkin’s “A Counterexample to Euler’s Sum of Powers Conjecture” from Mathematics of Computation volume 21, number 97, of 1967. There the paper has grown to three pages, although it’s only a couple paragraphs of one page and three lines of citation on the third. It’s not so easy to read either, but it does explain how they set about searching for counterexamples. But it may give you some better idea of how numerical mathematicians find things.

Theorem Thursday: What Is Cramer’s Rule?


KnotTheorist asked for this one during my appeal for theorems to discuss. And I’m taking an open interpretation of what a “theorem” is. I can do a rule.

Cramer’s Rule

I first learned of Cramer’s Rule in the way I expect most people do. It was an algebra course. I mean high school algebra. By high school algebra I mean you spend roughly eight hundred years learning ways to solve for x or to plot y versus x. Then take a pause for polar coordinates and matrices. Then you go back to finding both x and y.

Cramer’s Rule came up in the context of solving simultaneous equations. You have more than one variable. So x and y. Maybe z. Maybe even a w, before whoever set up the problem gives up and renames everything x1 and x2 and x62 and all that. You also have more than one equation. In fact, you have exactly as many equations as you have variables. Are there any sets of values those variables can have which make all those variable true simultaneously? Thus the imaginative name “simultaneous equations” or the search for “simultaneous solutions”.

If all the equations are linear then we can always say whether there’s simultaneous solutions. By “linear” we mean what we always mean in mathematics, which is, “something we can handle”. But more exactly it means the equations have x and y and whatever other variables only to the first power. No x-squared or square roots of y or tangents of z or anything. (The equations are also allowed to omit a variable. That is, if you have one equation with x, y, and z, and another with just x and z, and another with just y and z, that’s fine. We pretend the missing variable is there and just multiplied by zero, and proceed as before.) One way to find these solutions is with Cramer’s Rule.

Cramer’s Rule sets up some matrices based on the system of equations. If the system has two equations, it sets up three matrices. If the system has three equations, it sets up four matrices. If the system has twelve equations, it sets up thirteen matrices. You see the pattern here. And then you can take the determinant of each of these matrices. Dividing the determinant of one of these matrices by another one tells you what value of x makes all the equations true. Dividing the determinant of another matrix by the determinant of one of these matrices tells you which values of y makes all the equations true. And so on. The Rule tells you which determinants to use. It also says what it means if the determinant you want to divide by equals zero. It means there’s either no set of simultaneous solutions or there’s infinitely many solutions.

This gets dropped on us students in the vain effort to convince us knowing how to calculate determinants is worth it. It’s not that determinants aren’t worth knowing. It’s just that they don’t seem to tell us anything we care about. Not until we get into mappings and calculus and differential equations and other mathematics-major stuff. We never see it in high school.

And the hard part of determinants is that for all the cool stuff they tell us, they take forever to calculate. The determinant for a matrix with two rows and two columns isn’t bad. Three rows and three columns is getting bad. Four rows and four columns is awful. The determinant for a matrix with five rows and five columns you only ever calculate if you’ve made your teacher extremely cross with you.

So there’s the genius and the first problem with Cramer’s Rule. It takes a lot of calculating. Many any errors along the way with the calculation and your work is wrong. And worse, it won’t be wrong in an obvious way. You can find the error only by going over every single step and hoping to catch the spot where you, somehow, got 36 times -7 minus 21 times -8 wrong.

The second problem is nobody in high school algebra mentions why systems of linear equations should be interesting to solve. Oh, maybe they’ll explain how this is the work you do to figure out where two straight lines intersect. But that just shifts the “and we care because … ?” problem back one step. Later on we might come to understand the lines represent cases where something we’re interested in is true, or where it changes from true to false.

This sort of simultaneous-solution problem turns up naturally in optimization problems. These are problems where you try to find a maximum subject to some constraints. Or find a minimum. Maximums and minimums are the same thing when you think about them long enough. If all the constraints can be satisfied at once and you get a maximum (or minimum, whatever), great! If they can’t … Well, you can study how close it’s possible to get, and what happens if you loosen one or more constraint. That’s worth knowing about.

The third problem with Cramer’s Rule is that, as a method, it kind of sucks. We can be convinced that simultaneous linear equations are worth solving, or at least that we have to solve them to get out of High School Algebra. And we have computers. They can grind away and work out thirteen determinants of twelve-row-by-twelve-column matrices. They might even get an answer back before the end of the term. (The amount of work needed for a determinant grows scary fast as the matrix gets bigger.) But all that work might be meaningless.

The trouble is that Cramer’s Rule is numerically unstable. Before I even explain what that is you already sense it’s a bad thing. Think of all the good things in your life you’ve heard described as unstable. Fair enough. But here’s what we mean by numerically unstable.

Is 1/3 equal to 0.3333333? No, and we know that. But is it close enough? Sure, most of the time. Suppose we need a third of sixty million. 0.3333333 times 60,000,000 equals 19,999,998. That’s a little off of the correct 20,000,000. But I bet you wouldn’t even notice the difference if nobody pointed it out to you. Even if you did notice it you might write off the difference. “If we must, make up the difference out of petty cash”, you might declare, as if that were quite sensible in the context.

And that’s so because this multiplication is numerically stable. Make a small error in either term and you get a proportional error in the result. A small mistake will — well, maybe it won’t stay small, necessarily. But it’ll not grow too fast too quickly.

So now you know intuitively what an unstable calculation is. This is one in which a small error doesn’t necessarily stay proportionally small. It might grow huge, arbitrarily huge, and in few calculations. So your answer might be computed just fine, but actually be meaningless.

This isn’t because of a flaw in the computer per se. That is, it’s working as designed. It’s just that we might need, effectively, infinitely many digits of precision for the result to be correct. You see where there may be problems achieving that.

Cramer’s Rule isn’t guaranteed to be nonsense, and that’s a relief. But it is vulnerable to this. You can set up problems that look harmless but which the computer can’t do. And that’s surely the worst of all worlds, since we wouldn’t bother calculating them numerically if it weren’t too hard to do by hand.

(Let me direct the reader who’s unintimidated by mathematical jargon, and who likes seeing a good Wikipedia Editors quarrel, to the Cramer’s Rule Talk Page. Specifically to the section “Cramer’s Rule is useless.”)

I don’t want to get too down on Cramer’s Rule. It’s not like the numerical instability hurts every problem you might use it on. And you can, at the cost of some more work, detect whether a particular set of equations will have instabilities. That requires a lot of calculation but if we have the computer to do the work fine. Let it. And a computer can limit its numerical instabilities if it can do symbolic manipulations. That is, if it can use the idea of “one-third” rather than 0.3333333. The software package Mathematica, for example, does symbolic manipulations very well. You can shed many numerical-instability problems, although you gain the problem of paying for a copy of Mathematica.

If you just care about, or just need, one of the variables then what the heck. Cramer’s Rule lets you solve for just one or just some of the variables. That seems like a niche application to me, but it is there.

And the Rule re-emerges in pure analysis, where numerical instability doesn’t matter. When we look to differential equations, for example, we often find solutions are combinations of several independent component functions. Bases, in fact. Testing whether we have found independent bases can be done through a thing called the Wronskian. That’s a way that Cramer’s Rule appears in differential equations.

Wikipedia also asserts the use of Cramer’s Rule in differential geometry. I believe that’s a true statement, and that it will be reflected in many mechanics problems. In these we can use our knowledge that, say, energy and angular momentum of a system are constant values to tell us something of how positions and velocities depend on each other. But I admit I’m not well-read in differential geometry. That’s something which has indeed caused me pain in my scholarly life. I don’t know whether differential geometers thank Cramer’s Rule for this insight or whether they’re just glad to have got all that out of the way. (See the above Wikipedia Editors quarrel.)

I admit for all this talk about Cramer’s Rule I haven’t said what it is. Not in enough detail to pass your high school algebra class. That’s all right. It’s easy to find. MathWorld has the rule in pretty simple form. Mathworld does forget to define what it means by the vector d. (It’s the vector with components d1, d2, et cetera.) But that’s enough technical detail. If you need to calculate something using it, you can probably look closer at the problem and see if you can do it another way instead. Or you’re in high school algebra and just have to slog through it. It’s all right. Eventually you can put x and y aside and do geometry.

A Leap Day 2016 Mathematics A To Z: Polynomials


I have another request for today’s Leap Day Mathematics A To Z term. Gaurish asked for something exciting. This should be less challenging than Dedekind Domains. I hope.

Polynomials.

Polynomials are everything. Everything in mathematics, anyway. If humans study it, it’s a polynomial. If we know anything about a mathematical construct, it’s because we ran across it while trying to understand polynomials.

I exaggerate. A tiny bit. Maybe by three percent. But polynomials are big.

They’re easy to recognize. We can get them in pre-algebra. We make them out of a set of numbers called coefficients and one or more variables. The coefficients are usually either real numbers or complex-valued numbers. The variables we usually allow to be either real or complex-valued numbers. We take each coefficient and multiply it by some power of each variable. And we add all that up. So, polynomials are things that look like these things:

x^2 - 2x + 1
12 x^4 + 2\pi x^2 y^3 - 4x^3 y - \sqrt{6}
\ln(2) + \frac{1}{2}\left(x - 2\right) - \frac{1}{2 \cdot 2^2}\left(x - 2\right)^2 + \frac{1}{2 \cdot 2^3}\left(x - 2\right)^3 - \frac{1}{2 \cdot 2^4}\left(x - 2\right)^4  + \cdots
a_n x^n + a_{n - 1}x^{n - 1} + a_{n - 2}x^{n - 2} + \cdots + a_2 x^2 + a_1 x^1 + a_0

The first polynomial maybe looks nice and comfortable. The second may look a little threatening, what with it having two variables and a square root in it, but it’s not too weird. The third is an infinitely long polynomial; you’re supposed to keep going on in that pattern, adding even more terms. The last is a generic representation of a polynomial. Each number a0, a1, a2, et cetera is some coefficient that we in principle know. It’s a good way of representing a polynomial when we want to work with it but don’t want to tie ourselves down to a particular example. The highest power we raise a variable to we call the degree of the polynomial. A second-degree polynomial, for example, has an x2 in it, but not an x3 or x4 or x18 or anything like that. A third-degree polynomial has an x3, but not x to any higher powers. Degree is a useful way of saying roughly how long a polynomial is, so it appears all over discussions of polynomials.

But why do we like polynomials? Why like them so much that MathWorld lists 1,163 pages that mention polynomials?

It’s because they’re great. They do everything we’d ever want to do and they’re great at it. We can add them together as easily as we add regular old numbers. We can subtract them as well. We can multiply and divide them. There’s even prime polynomials, just like there are prime numbers. They take longer to work out, but they’re not harder.

And they do great stuff in advanced mathematics too. In calculus we want to take derivatives of functions. Polynomials, we always can. We get another polynomial out of that. So we can keep taking derivatives, as many as we need. (We might need a lot of them.) We can integrate too. The integration produces another polynomial. So we can keep doing that as long as we need too. (We need to do this a lot, too.) This lets us solve so many problems in calculus, which is about how functions work. It also lets us solve so many problems in differential equations, which is about systems whose change depends on the current state of things.

That’s great for analyzing polynomials, but what about things that aren’t polynomials?

Well, if a function is continuous, then it might as well be a polynomial. To be a little more exact, we can set a margin of error. And we can always find polynomials that are less than that margin of error away from the original function. The original function might be annoying to deal with. The polynomial that’s as close to it as we want, though, isn’t.

Not every function is continuous. Most of them aren’t. But most of the functions we want to do work with are, or at least are continuous in stretches. Polynomials let us understand the functions that describe most real stuff.

Nice for mathematicians, all right, but how about for real uses? How about for calculations?

Oh, polynomials are just magnificent. You know why? Because you can evaluate any polynomial as soon as you can add and multiply. (Also subtract, but we think of that as addition.) Remember, x4 just means “x times x times x times x”, four of those x’s in the product. All these polynomials are easy to evaluate.

Even better, we don’t have to evaluate them. We can automate away the evaluation. It’s easy to set a calculator doing this work, and it will do it without complaint and with few unforeseeable mistakes.

Now remember that thing where we can make a polynomial close enough to any continuous function? And we can always set a calculator to evaluate a polynomial? Guess that this means about continuous functions. We have a tool that lets us calculate stuff we would want to know. Things like arccosines and logarithms and Bessel functions and all that. And we get nice easy to understand numbers out of them. For example, that third polynomial I gave you above? That’s not just infinitely long. It’s also a polynomial that approximates the natural logarithm. Pick a positive number x that’s between 0 and 4 and put it in that polynomial. Calculate terms and add them up. You’ll get closer and closer to the natural logarithm of that number. You’ll get there faster if you pick a number near 2, but you’ll eventually get there for whatever number you pick. (Calculus will tell us why x has to be between 0 and 4. Don’t worry about it for now.)

So through polynomials we can understand functions, analytically and numerically.

And they keep revealing things to us. We discovered complex-valued numbers because we wanted to find roots, values of x that make a polynomial of x equal to zero. Some formulas worked well for third- and fourth-degree polynomials. (They look like the quadratic formula, which solves second-degree polynomials. The big difference is nobody remembers what they are without looking them up.) But the formulas sometimes called for things that looked like square roots of negative numbers. Absurd! But if you carried on as if these square roots of negative numbers meant something, you got meaningful answers. And correct answers.

We wanted formulas to solve fifth- and higher-degree polynomials exactly. We can do this with second and third and fourth-degree polynomials, after all. It turns out we can’t. Oh, we can solve some of them exactly. The attempt to understand why, though, helped us create and shape group theory, the study of things that look like but aren’t numbers.

Polynomials go on, sneaking into everything. We can look at a square matrix and discover its characteristic polynomial. This allows us to find beautifully-named things like eigenvalues and eigenvectors. These reveal secrets of the matrix’s structure. We can find polynomials in the formulas that describe how many ways to split up a group of things into a smaller number of sets. We can find polynomials that describe how networks of things are connected. We can find polynomials that describe how a knot is tied. We can even find polynomials that distinguish between a knot and the knot’s reflection in the mirror.

Polynomials are everything.

Terrible And Less-Terrible Things with Pi


We are coming around “Pi Day”, the 14th of March, again. I don’t figure to have anything thematically appropriate for the day. I figure to continue the Leap Day 2016 Mathematics A To Z, and I don’t tend to do a whole two posts in a single day. Two just seems like so many, doesn’t it?

But I would like to point people who’re interested in some π-related stuff to what I posted last year. Those posts were:

  • Calculating Pi Terribly, in which I show a way to work out the value of π that’s fun and would take forever. I mean, yes, properly speaking they all take forever, but this takes forever just to get a couple of digits right. It might be fun to play with but don’t use this to get your digits of π. Really.
  • Calculating Pi Less Terribly, in which I show a way to do better. This doesn’t lend itself to any fun side projects. It’s just calculations. But it gets you accurate digits a lot faster.

Spaghetti Mathematics


Let’s ease into Monday. Win Smith with the Well Tempered Spreadsheet blog encountered one of those idle little puzzles that captures the imagination and doesn’t let go. It starts as many will with spaghetti.

I’m sure you’re intrigued too. It’s not the case that any old splitting of a strand of spaghetti will give you three pieces you can make into a triangle. You need the lengths of the three pieces to satisfy what’s imaginatively called the Triangle Inequality. That inequality demands the lengths of any two sides have to be greater than the length of the third side. So, suppose we start with spaghetti that’s 12 inches long, and we have it cut into pieces 5, 4, and 3 inches long: that’s fine. If we have it cut into pieces 9, 2, and 1 inches long, we’re stuck.

The Triangle Inequality is often known as the Cauchy Inequality, or the Cauchy-Schwarz Inequality, or the Cauchy-Bunyakovsky-Schwarz Inequality, or if that’s getting too long the CBS Inequality. And some pranksters reorder it to the Cauchy-Schwarz-Bunyakovsky Inequality. The Cauchy (etc) Inequality isn’t quite the same thing as the Triangle Inequality. But it’s an important and useful idea, and the Cauchy (etc) Inequality has the Triangle Inequality as one of its consequences. The name of it so overflows with names because mathematics history is complicated. Augustin-Louis Cauchy published the first proof of it, but for the special case of sums of series. Viktor Bunyakovsky proved a similar version of it for integrals, and has a name that’s so very much fun to say. Hermann Amandus Schwarz first put the proof into its modern form. So who deserves credit for it? Good question. If it influences your decision, know that Cauchy was incredibly prolific and has plenty of things named for him already. He’s got, without exaggeration, about eight hundred papers to his credit. Collecting all his work into a definitive volume took from 1882 to 1974.

Back to the spaghetti. The problem’s a fun one and if you follow the Twitter link above you’ll see the gritty work of mathematicians reasoning out the problem. As with every probability problem ever, the challenge is defining exactly what you’re looking for the probability of. This we call finding the “sample space”, the set of all the possible outcomes and how likely each of them are. Subtle changes in how you think of the problem will change whether you are right.

Smith cleans things up a bit, but preserves the essence of how the answer worked out. The answer that looks most likely correct was developed partly by reasoning and partly by numerical simulation. Numerical simulation is a great blessing for probability problems. Often the easiest way to figure out how likely something is will be trying it. But this does require working out the sample space, and what parts of the sample space represent what you’re interested in. With the information the numerical simulation provided, Smith was able to go back and find an analytic, purely reason-based, answer that looks plausible.

Reading the Comics, January 15, 2015: Electric Brains and Klein Bottles Edition


I admit I don’t always find a theme running through Comic Strip Master Command’s latest set of mathematically-themed comics. The edition names are mostly so that I can tell them apart when I see a couple listed in the Popular Posts roundup anyway.

Little Iodine and her parents see an electronic brain capable of solving any problem; her father offers 'square root of 7,921 x^2 y^2'. It gets it correct, 89 xy. Little Iodine, inspired, makes her own. 'Where are you getting all the money for those ice cream cones and stuff?' her father demands. 'I made a 'lectric brain --- the kids pay me a nickel when they got homework --- here --- give me a problem.' He offers 9 times 16. The electric brain writes out 'Dere teecher, Plees xcuse my chidl for not doing his homwork'. 'And then these letters come out --- the kid gives it to the teacher and everything's okey --- '
Jimmy Hatlo’s Little Iodine for the 12th of January, 2016. Originally run the 7th of November, 1954.

Jimmy Hatlo’s Little Iodine is a vintage comic strip from the 1950s. It strikes me as an unlicensed adaptation of Baby Schnooks, but that’s not something for me to worry about. The particular strip, originally from the 7th of November, 1954 (and just run the 12th of January this year) interests me for its ancient views of computers. It’s from the days they were called “electric brains”. I’m also impressed that the machine on display early on is able to work out the “square root of 7921 x2 y2”. The square root of 7921 is no great feat. Being able to work with the symbols of x and y without knowing what they stand for, though, does impress me. I’m not sure there were computers which could handle that sort of symbolic manipulation in 1954. That sort of ability to work with a quantity by name rather than value is what we would buy Mathematica for, if we could afford it. It’s also at least a bit impressive that someone knows the square of 89 offhand. All told, I think this is my favorite of this essay’s set of strips. But it’s a weak field considering none of them are “students giving a snarky reply to a homework/exam/blackboard question”.

Joe Martin’s Willy and Ethel for the 13th of January is a percentages joke. Some might fault it for talking about people giving 110 percent, but of course, what is “100 percent”? If it’s the standard amount of work being done then it does seem like ten people giving 110 percent gets the job done as quickly as eleven people doing 100 percent. If work worked like that.

Willy asks his kid: 'OK, here's a question of my own that involves math and principles. If you're on an 11 man crew and 10 of them are giving 110%, do you have to show up for work?
Joe Martin’s Willy and Ethel for the 13th of January, 2016. The link will likely expire in mid-February.

Steve Sicula’s Home and Away for the 13th (a rerun from the 8th of October, 2004) gives a wrongheaded application of a decent principle. The principle is that of taking several data points and averaging their value. The problem with data is that it’s often got errors in it. Something weird happened and it doesn’t represent what it’s supposed to. Or it doesn’t represent it well. By averaging several data points together we can minimize the influence of a fluke reading. Or if we’re measuring something that changes in time, we might use a running average of the last several sampled values. In this way a short-term spike or a meaningless flutter will be minimized. We can avoid wasting time reacting to something that doesn’t matter. (The cost of this, though, is that if a trend is developing we will notice it later than we otherwise would.) Still, sometimes a data point is obviously wrong.

Zach Weinersmith’s Saturday Morning Breakfast Cereal wanted my attention, and so on the 13th it did a joke about Zeno’s Paradox. There are actually four classic Zeno’s Paradoxes, although the one riffed on here I think is the most popular. This one — the idea that you can’t finish something (leaving a room is the most common form) because you have to get halfway done, and have to get halfway to being halfway done, and halfway to halfway to halfway to being done — is often resolved by people saying that Zeno just didn’t understand that an infinite series could converge. That is, that you can add together infinitely many numbers and get a finite number. I’m inclined to think Zeno did not, somehow, think it was impossible to leave rooms. What the paradoxes as a whole get to are questions about space and time: they’re either infinitely divisible or they’re not. And either way produces effects that don’t seem to quite match our intuitions.

The next day Saturday Morning Breakfast Cereal does a joke about Klein bottles. These are famous topological constructs. At least they’re famous in the kinds of places people talk about topological constructs. It’s much like the Möbius strip, a ribbon given a twist and joined back to its edge. The Klein bottle similarly you can imagine as a cylinder stretched out into the fourth dimension, given a twist, then joined back to itself. We can’t really do this, what with it being difficult to craft four-dimensional objects. But we can imagine this, and it creates an object that doesn’t have a boundary, and has only one side. There’s not an inside or an outside. There’s no making this in the real world, but we can make nice-looking approximations, usually as bottles.

Ruben Bolling’s Super-Fun-Pak Comix for the 13th of January is an extreme installment of Chaos Butterfly. The trouble with touching Chaos Butterfly to cause disasters is that you don’t know — you can’t know — what would have happened had you not touched the butterfly. You change your luck, but there’s no way to tell whether for the better or worse. One of the commenters at Gocomics.com alludes to this problem.

Jon Rosenberg’s Scenes From A Multiverse for the 13th of January makes quite literal quantum mechanics talk about probability waves and quantum foam and the like. The wave formulation of quantum mechanics, the most popular and accessible one, describes what’s going on in equations that look much like the equations for things diffusing into space. And quantum mechanical problems are often solved by supposing that the probability distribution we’re interested in can be broken up into a series of sinusoidal waves. Representing a complex function as a set of waves is a common trick, not just in quantum mechanics, because it works so well so often. Sinusoidal waves behave in nice, predictable ways for most differential equations. So converting a hard differential equation problem into a long string of relatively easy differential equation problems is usually a good trade.

Tom Thaves’s Frank and Ernest for the 14th of January ties together the baffling worlds of grammar and negative numbers. It puts Frank and Ernest on panel with Euclid, who’s a fair enough choice to represent the foundation of (western) mathematics. He’s famous for the geometry we now call Euclidean. That’s the common everyday kind of blackboards and tabletops and solid cubes and spheres. But among his writings are compilations of arithmetic, as understood at the time. So if we know anyone in Ancient Greece to have credentials to talk about negative numbers it’s him. But the choice of Euclid traps the panel into an anachronism: the Ancient Greeks just didn’t think of negative numbers. They could work through “a lack of things” or “a shortage of something”, but a negative? That’s a later innovation. But it’s hard to think of a good rewriting of the joke. You might have Isaac Newton be consulted, but Newton makes normal people think of gravity and physics, confounding the mathematics joke. There’s a similar problem with Albert Einstein. Leibniz or Gauss should be good, but I suspect they’re not the household names that even Euclid is. And if we have to go “less famous mathematician than Gauss” we’re in real trouble. (No, not Andrew Wiles. Normal people know him as “the guy that proved Fermat’s thing”, and that’s too many words to fit on panel.) Perhaps the joke can’t be made to read cleanly and make good historic sense.

The Set Tour, Part 11: Doughnuts And Lots Of Them


I’ve been slow getting back to my tour of commonly-used domains for several reasons. It’s been a busy season. It’s so much easier to plan out writing something than it is to write something. The usual. But one of my excuses this time is that I’m not sure the set I want to talk about is that common. But I like it, and I imagine a lot of people will like it. So that’s enough.

T and Tn

T stands for the torus. Or the toroid, if you prefer. It’s a fun name. You know the shape. It’s a doughnut. Take a cylindrical tube and curl it around back on itself. Don’t rip it or fold it. That’s hard to do with paper or a sheet of clay or other real-world stuff. But we can imagine it easily enough. I suppose we can make a computer animation of it, if by ‘we’ we mean ‘you’.

We don’t use the whole doughnut shape for T. And no, we don’t use the hole either. What we use is the surface of the doughnut, the part that could get glazed. We ignore the inside, just the same way we had S represent the surface of a sphere (or the edge of a circle, or the boundary of a hypersphere). If there is a common symbol for the torus including the interior I don’t know it. I’d be glad to hear if someone had.

What good is the surface of a torus, though? Well, it’s a neat shape. Slice it in one direction, the way you’d cut a bagel in half, and at the slice you get the shape of a washer, the kind you fit around a nut and bolt. (An annulus, to use the trade term.) Slice it perpendicular to that, the way you’d cut it if you’re one of those people who eats half doughnuts to the amazement of the rest of us, and at the slice you get two detached circles. If you start from any point on the torus shape you can go in one direction and make a circle that loops around the doughnut’s central hole. You can go the perpendicular direction and make a circle that brushes up against but doesn’t go around the central hole. There’s some neat topology in it.

There’s also video games in it. The topology of this is just like old-fashioned video games where if you go off the edge of the screen to the right you come back around on the left, and if you go off the top you come back from the bottom. (And if you go off to the left you come back around the right, and off the bottom you come back to the top.) To go from the flat screen to the surface of a doughnut requires imagining some stretching and scrunching up of the surface, but that’s all right. (OK, in an old video game it was a kind-of flat screen.) We can imagine a nice flexible screen that just behaves.

This is a common trick to deal with boundaries. (I first wrote “to avoid having to deal with boundaries”. But this is dealing with them, by a method that often makes sense.) You just make each boundary match up with a logical other boundary. It’s not just useful in video games. Often we’ll want to study some phenomenon where the current state of things depends on the immediate neighborhood, but it’s hard to say what a logical boundary ought to be. This particularly comes up if we want to model an infinitely large surface without dealing with infinitely large things. The trick will turn up a lot in numerical simulations for that reason. (In that case, we’re in truth working with a numerical approximation of T, but that’ll be close enough.)

Tn, meanwhile, is a vector of things, each of which is a point on a torus. It’s akin to Rn or S2 x n. They’re ordered sets of things that are themselves things. There can be as many as you like. n, here, is whatever positive whole number you need.

You might wonder how big the doughnut is. When we talked about the surface of the sphere, S2, or the surface and interior, B3, we figured on a sphere with radius of 1 unless we heard otherwise. Toruses would seem to have two parameters. There’s how big the outer diameter is and how big the inner diameter is. Which do we pick?

We don’t actually care. It’s much the way we can talk about a point on the surface of a planet by the latitude and longitude of the point, and never care about how big the planet is. We can describe a point on the surface of the torus without needing to refer to how big the whole shape is or how big the hole in the middle is. A popular scheme to describe points is one that looks a lot like latitude and longitude.

Imagine the torus sitting as flat as it gets on the table. Pick a point that you find interesting.

We use some reference point that’s as good as an equator and a prime meridian. One coordinate is the angle you make going horizontally, possibly around the hole in the middle, from the reference point to the point we’re interested in. The other coordinate is the angle you make vertically, going in a loop that doesn’t go around the hole in the middle, from the reference point to the point we’re interested in. The reference point has coordinates 0, 0, as it must. If this sounds confusing it’s because I’m not using a picture. I thought making some pictures would be too much work. I’m a fool. But if you think of real torus-shaped objects it’ll come to you.

In this scheme the coordinates are both angles. Normal people would measure that in degrees, from 0 to 360, or maybe from -180 to 180. Mathematicians would measure as radians, from 0 to 2π, or from -π to +π. Whatever it is, it’s the same as the coordinates of a point on the edge of the circle, what we called S1 a few essays back. So it’s fair to say you can think of T as S1 x S1, an ordered set of points on circles.

I’ve written of these toruses as three-dimensional things. Well, two dimensional-surfaces wrapped up to suggest three-dimensional objects. You don’t have to stick with these dimensions if you don’t want or if your problem needs something else. You can make a torus that’s a three-dimensional shape in four dimensions. For me that’s easiest to imagine as a cube where the left edge and the right edge loop back and meet up, the lower and the upper edges meet up, and the front and the back edges meet up. This works well to model an infinitely large space with a nice and small block.

I like to think I can imagine a four-dimensional doughnut where every cross-section is a sphere. I may be kidding myself. There could also be a five-dimensional torus and you’re on your own working that out, or working out what to do with it.

I’m not sure there is a common standard notation for that, though. Probably the mathematician wanting to make clear she’s working with a torus in four dimensions just says so in text, and trusts that the context of her mathematics makes it clear this is no ordinary torus.

I’ve also written of these toruses as circular, as rounded shapes. That’s the most familiar torus. It’s a doughnut shape, or an O-ring shape, or an inner tube’s shape. It’s the shape you produce by taking a circle and looping it around an axis not on the ring. That’s common and that’s usually all we need.

But if you need some other torus, produced by rotating some other shape around an axis not inside it, go ahead. You’ll need to make clear what that original shape, the generator, is. You’ve seen examples of this in, for example, the washers that fit around nuts and bolts. They’re typically rectangles in cross-section. Or you might have seen that image of someone who fit together a couple dozen iMac boxes to make a giant wheel. I don’t know why you would need this, but it’s your problem, not mine. If these shapes are useful for your work, by all means, use them.

I’m not sure there is a standard notation for that sort of shape. My hunch is to say you’d define your generating shape and give it a name such as A or D. Then name the torus based on that as T(A) or T(D). But I would recommend spelling it out in text before you start using symbols like this.

Reading the Comics, December 5, 2015: Awkward Break Edition


I confess I’m dissatisfied with this batch of Reading the Comics posts. I like having something like six to eight comics for one of these roundups. But there was this small flood of mathematically-themed comics on the 6th of December. I could either make do with a slightly short edition, or have an overstuffed edition. I suppose it’s possible to split one day’s comics across two Reading the Comics posts, but that’s crazy talk. So, a short edition today.

Jef Mallett’s Frazz for the 4th of December was part of a series in which Caulfield resists learning about reciprocals. The 4th offers a fair example of the story. At heart the joke is just the student-resisting-class, or student-resisting-story-problems. It certainly reflects a lack of motivation to learn what they are.

We use reciprocals most often to write division problems as multiplication. “a ÷ b” is the same as “a times the reciprocal of b”. But where do we get the reciprocal of b from? … Well, we can say it’s the multiplicative inverse of b. That is, it’s whatever number you have to multiply ‘b’ by in order to get ‘1’. But we’re almost surely going to find that taking 1 and dividing it by b. So we’ve swapped out one division problem for a slightly different one. This doesn’t seem to be getting us anywhere.

But we have gotten a new idea. If we can define the multiplication of things, we might be able to get division for almost free. Could we divide one matrix by another? We can certainly multiply a matrix by the inverse of another. (There are complications at work here. We’ll save them for another time.) A lot of sets allow us to define things that make sense as addition and multiplication. And if we can define a complicated operation in terms of addition and multiplication … If we follow this path, we get to do things like define the cosine of a matrix. Then we just have to figure out why we’d want have a cosine of a matrix.

There’s a simpler practical use of reciprocals. This relates to numerical mathematics, computer work. Computer chips do addition (and subtraction) really fast. They do multiplication a little slower. They do division a lot slower. Division is harder than multiplication, as anyone who’s done both knows. However, dividing by (say) 4 is the same thing as multiplying by 0.25. So if you know you need to divide by a number a lot, then it might make for a faster program to change division into multiplication by a reciprocal. You have to work out the reciprocal, but if you only have to do that once instead of many times over, this might make for faster code. Reciprocals are one of the tools we can use to change a mathematical process into something faster.

(In practice, you should never do this. You have a compiler that does this, and you should let it do its work. But it’s enlightening to know these are the sorts of things your compiler is looking for when it turns your code into something the computer does. And looking for ways to do the same work in less time is a noble side of mathematics.)

Charles Schulz’s Peanuts for the 4th of December (originally from 1968, on the same day) sees Peppermint Patty’s education crash against a word problem. It’s another problem in motivating a student to do a word problem. I admit when I was a kid I’d have been enchanted by this puzzle. But I was a weird one.

Dave Coverly’s Speed Bump for the 4th of December is a mathematics-symbols joke as applied to toast. I think you could probably actually sell those. At least the greater-than and the less-than signs. The approximately-equal-to signs would be hard to use. And people would think they were for bacon anyway.

Ruben Bolling’s Super-Fun-Pak Comix for the 4th of December showcases Young Albert Einstein. That counts as mathematical content, doesn’t it? The strip does make me wonder if they’re still selling music CDs and other stuff for infant or even prenatal development. I’m skeptical that they ever did any good, but it isn’t a field I’ve studied.

Bill Whitehead’s Free Range for the 5th of December uses a blackboard full of mathematical and semi-mathematical symbols to denote “stuff too complicated to understand”. The symbols don’t parse as anything. It is authentic to mathematical work to sometimes skip writing all the details of a thing and write in instead a few words describing it. Or to put in an abbreviation for the thing. That often gets circled or boxed or in some way marked off. That keeps us from later on mistaking, say, “MUB” as the product of M and U and B, whatever that would mean. Then we just have to remember we meant “minimum upper bound” by that.