I admit I had not heard of Tutte’s Theorem before Gaurish’s request and I had to spend time working up to knowing what it was and why it was useful. I also admit I’m not completely sure I have it. But I’m happy to try to accept with grace corrections from the people in graph theory who know better.
It comes back to graphs. These are a bunch of points, “vertices”, which have curves, “edges” connecting pairs of them. This describes a lot of systems. Bunches of things that have to connect are naturally graphs. Connecting utilities to houses gave the first example I called up last week. The Internet’s made people vaguely familiar with the idea of bunches of computers linked to bunches of other computers. Social networks can be seen as graphs; each person is a vertex, her friendships edges connecting to other vertices.
In a graph not every vertex has to be connected to every other vertex. Not everyone needs to be friends with everyone else, which is a relief. I don’t want to point fingers but some of your friends are insufferable. I don’t know how you put up with them. You, you’re great. But the friends of a friend … yeeeeeeesh.
So we — mathematicians, anyway — get to wondering. Give me a graph. It’s got some vertices and some edges. Let’s look at only some of these edges. Each edge links to two vertices. Is there a subset of the edges that touch every one of the vertices exactly once? Sometimes there are; sometimes there aren’t. If there are, we have a “perfect matching”.
We look at this sort of thing because mathematicians learn to look at coverings. Coverings are what they sound like. What’s the smallest set of some standard item you need to … cover … all the stuff you’re interested in for this problem? I think we’re bred to look for coverings in Real Analysis, because the idea of covering space with discs gives us measure. This gives us a rigorous idea of what length is, and what dimensions are, and lets us see there have to be more irrational than rational numbers and other interesting results like that. Mathematicians get used to looking for this sort of thing in other fields.
Tutte’s Theorem is about perfect matchings. It says what conditions a graph has to have to have a perfect matching. It’s based on striking subsets of the vertices from the original graph. The accompanying edges go too. What’s left might be connected, which means just what you think. You can get from any vertex in the decimated graph to any other vertex by following its edges. Or it might be disconnected, which means again just what you think. Mathematics is not always about complicated lingo.
Take the survivors. Count how many of the remaining connected components have an odd number of vertices. Is that less than or equal to the number of vertices you struck out? If it is, and if it is no matter how many vertices you struck out, and no matter how you arranged those vertices, then there’s a perfect matching.
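The test just described can be sketched in a few lines, if only to show how blunt it is. This is my own brute-force illustration, not anyone's practical algorithm; the function names and the dictionary-of-neighbors layout are assumptions made for the example.

```python
from itertools import combinations

def odd_components(adjacency, removed):
    """Count the connected components with an odd number of vertices
    that survive once the vertices in `removed` are struck out."""
    remaining = set(adjacency) - set(removed)
    seen = set()
    odd = 0
    for start in remaining:
        if start in seen:
            continue
        # Walk everything reachable from `start` among the survivors.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            vertex = stack.pop()
            size += 1
            for neighbor in adjacency[vertex]:
                if neighbor in remaining and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        odd += size % 2
    return odd

def satisfies_tutte(adjacency):
    """Try every subset of vertices, the empty one included."""
    vertices = list(adjacency)
    for k in range(len(vertices) + 1):
        for removed in combinations(vertices, k):
            if odd_components(adjacency, removed) > k:
                return False
    return True

# A square passes; a hub with three spokes fails once the hub is struck.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(satisfies_tutte(square), satisfies_tutte(star))
```

The double loop visits every one of the 2^n subsets of an n-vertex graph, which is why this is fine for doodles and hopeless for anything large.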
This isn’t one for testing. I mean, consider what’s easier to do: start from your original graph and test out coloring in some edges to see if you can touch every vertex the one time? Take every possible sub-graph of the original graph and count the number of connected sub-pieces with an odd number of vertices? … Well, maybe that isn’t so bad, if you set a computer to do the boring work. It’s going to take forever for your graph with 102 vertices, though, unless it’s a boring network. You have to test every possible removal all the way up to striking 102 vertices. (OK, it’s easy to show it’s true if 102 of 102 vertices are removed from the graph. Also if 101 of 102 vertices are removed. And 100 of 102 is also not hard. But that’s only a couple easy cases left.)
I don’t know the real work the theorem does. It has some neat implications, good for pop mathematics. Take a standard deck of 52 well-shuffled cards. Deal them out into thirteen piles of four cards each. Is it possible to select exactly one card from each pile so that, when done, there’s exactly one of each face value (Ace, Two, Three, Four, up through Queen and King) in the selected set? Indeed it is. I leave it to the Magic In A Minute cartoonists to turn this into a way to annoy a compliant monkey.
Another appealing use for it is in marriage problems. Well, marriage looked to mathematicians like a good source of bipartite graph problems. Society’s since noticed that nobody really hit on a compelling reason for “why does a woman have to marry a man, exactly”. I imagine the change will filter into mathematics textbooks sometime in the next seventy years. But please accept that this problem was formed and solved in the 1930s.
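If you'd rather see the card trick than trust it, here is a sketch that deals the piles and hunts for the selection. The approach is a standard augmenting-path matching between piles and face values, which is my choice for the illustration and not something the theorem itself prescribes. The names `deal_piles` and `pick_one_of_each` are made up, and suits are ignored since only face values matter.

```python
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

def deal_piles(seed=None):
    """Shuffle four copies of each face value and deal thirteen piles of four."""
    deck = [rank for rank in RANKS for _ in range(4)]
    random.Random(seed).shuffle(deck)
    return [deck[i:i + 4] for i in range(0, 52, 4)]

def pick_one_of_each(piles):
    """Return {pile index: face value} using every face value once, or None."""
    owner = {}  # face value -> index of the pile currently supplying it

    def try_assign(pile, seen):
        for rank in piles[pile]:
            if rank in seen:
                continue
            seen.add(rank)
            # Take the rank if it is free, or if its current pile can switch.
            if rank not in owner or try_assign(owner[rank], seen):
                owner[rank] = pile
                return True
        return False

    for pile in range(len(piles)):
        if not try_assign(pile, set()):
            return None
    return {pile: rank for rank, pile in owner.items()}

piles = deal_piles(seed=2016)
choice = pick_one_of_each(piles)
print(sorted(choice.items()))
```

The reason it never comes back empty-handed: any k piles hold 4k cards, and since each face value appears only four times in the deck, those cards must show at least k different face values.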
Suppose you have a group of women and a group of men, of equal size. Each woman would be happy married to some of the men in that group. Each man would be happy married to some of the women in that group. Is it possible to match women and men up in a way that everybody is married to at least someone they’d be happy with? We’re not guaranteeing that anyone gets their best possible pairing. We promise just that everyone marries someone they’ll be glad to.
It depends. Maybe this is best done by testing. Work through every possible subset of women. That is, look at every group of one woman, every group of two women, every group of three women, and so on. By group I mean what normal people mean by group, not what mathematicians mean. So look at your group of women. Count how many of the men at least one woman in the group would be content marrying. Is that number equal to or larger than the number of women in the group? Is it equal to or larger, however many women you picked and whichever women you did pick? If it is, great: everybody can marry someone they’re happy with.
Parlor tricks, I admit, but pleasant ones. What are its real uses? At this point I am really synthesizing my readings on the subject rather than speaking from things I’m confident about. If I understand right, though, Tutte’s Theorem is a foundation for the study of graph toughness. That is what you’d guess from the name. It’s about how easy it is to break up a graph into disconnected pieces. It’s easy to imagine real networks with strength or weakness. Imagine the person that holds a complicated group of friends together, for example, and whose removal devastates it. Consider the electrical network with a few vulnerable points that allow small problems to become nationwide blackouts. I believe it’s relevant to the study of proteins. Those long strands of blocks of molecules that then fold back on themselves. (I haven’t seen a source that says this, but I can’t imagine why it shouldn’t. I am open to correction from sneering protein researchers.) I would be surprised if the theorem has nothing to say about when a strand of Christmas tree lights will get too tangled to fix.
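That subset-by-subset test is short enough to write down. A minimal sketch, assuming the preferences arrive as a dictionary from each woman to the set of men she and they would both be content with; the function name and the sample names are invented for the example.

```python
from itertools import combinations

def everyone_can_be_matched(acceptable):
    """Check every group of women: do they, between them, have at least
    as many acceptable men as there are women in the group?"""
    women = list(acceptable)
    for size in range(1, len(women) + 1):
        for group in combinations(women, size):
            candidates = set().union(*(acceptable[w] for w in group))
            if len(candidates) < size:
                return False
    return True

workable = {
    "Ann": {"Xavier", "Yusuf"},
    "Bea": {"Xavier"},
    "Cat": {"Yusuf", "Zed"},
}
stuck = {
    "Ann": {"Xavier"},
    "Bea": {"Xavier"},
    "Cat": {"Xavier", "Yusuf", "Zed"},
}
print(everyone_can_be_matched(workable), everyone_can_be_matched(stuck))
```

In the second case Ann and Bea are a group of two with only Xavier between them, and that single shortfall sinks the whole arrangement no matter how flexible Cat is.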
Let me close with a puzzle, a delightful little one. It regards one of the graphs we met last week. K5 is a complete graph with five vertices. Each vertex is connected to all four of its siblings. It can’t have a perfect matching. Only a graph with an even number of vertices can. Each edge connects to two vertices, after all. So — what is the subset that breaks Tutte’s theorem? It must be possible to strike some set of vertices from K5 so that the number of stricken vertices is smaller than the number of remaining connected components with an odd number of vertices. What’s that set?
It is obvious once you’ve heard what the subset of vertices is, and it is not K5. The rest of this paragraph is padding so that the size of the spoiler doesn’t give matters away. And by the way I’d like to thank the world at large for doing such a great job not spoiling Star Wars: The Force Awakens. So why could you not give some similar consideration for Star Trek Beyond? I stopped reading TrekBBS for a month partly to avoid spoilers and then I started getting them everywhere I looked. Not cool, world.
I’m doing that thing again. Sometimes for my A To Z posts I’ve used one essay to explain something, but also to introduce ideas and jargon that will come up later. Here, I want to do a theorem in graph theory. But I don’t want to overload one essay. So I’m going to do a theorem that’s interesting and neat in itself. But my real interest is in next week’s piece. This is just so recurring readers are ready for it.
The Kuratowski Reduction Theorem.
It starts with a children’s activity book-type puzzle. A lot of real mathematics does. In the traditional form, which I have only a faint memory of ever actually seeing, it’s posed as a problem of hooking up utilities to houses. There are three utilities, usually gas, water, and electricity. There are three houses. Is it possible to connect pipes from each of the three utility starting points to each of the three houses?
Of course. We do it all the time. We do this with many more utilities and many more than three buildings. The underground of Manhattan island is a network of pipes and tunnels and subways and roads and basements and buildings so complicated I doubt humans can understand it all. But the problem isn’t about that. The problem is about connecting these pipes all in the same plane, by drawing lines on a sheet of paper, without ever going into the third dimension. Nor making that little semicircular hop that denotes one line going over the other.
That’s a little harder. By that I mean it’s impossible. You can try and it’s fun to try a while. Draw three dots that are the houses and three dots that are the utilities. Try drawing the nine lines, one from each utility to each of the houses. Or one leading into each house that comes from each of the utilities. The lines don’t have to be straight. They can have extra jogs, too. Soon you’ll hit on the possibilities of lines that go way out, away from the dots, in the quest to avoid crossing over one another. It doesn’t matter. The attempt’s doomed to failure.
This is a problem in graph theory. I’ve talked about graph theory before. It’s the field of mathematics most comfortable to people who like doodling. A graph is a bunch of points, which we call vertices, connected by arcs or lines, which we call edges. For this utilities graph problem, the houses and the utilities are the vertices. The pipes are the edges. An edge has to start at one vertex and end at a vertex. These may be the same vertex. We’re not judging. A vertex can have one edge connecting it to something else, or two edges, or three edges. It can have no edges. It can have any number of edges. We’re even allowed to have two or more edges connecting a vertex to the same vertex. My experience is we think of that last, forgetting that it is a possibility, but it’s there.
This is a “nonplanar” graph. This means you can’t draw it in a plane, like a sheet of paper, without having at least two edges crossing each other. We draw this on paper by making one of the lines wiggle in a little half-circle to show it’s going over, or to fade out and back in again to show it’s going under. There are planar graphs. Try the same problem with two houses and two utilities, for example. Or three houses and two utilities. Or three houses and three utilities, but one of the houses doesn’t get one of the utilities. Your choice which. It can be a little surprise to the homeowners.
This utilities graph is an example of a “bipartite” graph. The “bi” maybe suggests where things are going. You can always divide the vertices in a graph into two groups for the same reason you can always divide a pile of change into two piles. As long as you have at least two vertices or pieces of change. But a graph is bipartite if, once you’ve divided the vertices up, each edge has one vertex in the first set and the other vertex in the second. For the utilities graph these sets are easy to find. Each edge, each pipe, connects one utility to one house. There’s our division: vertices representing houses and vertices representing utilities.
This graph turns up a lot. Graph theorists have a shorthand way of writing it. It’s written as K3,3. This means it’s a bipartite graph. It has three vertices in the first set. There’s three vertices in the second set. There’s an edge connecting everything in the first set to everything in the second. Go ahead now and guess what K2, 2 is. Or K3,5. The K — I’ve never heard what the K stands for, although while writing this essay I started to wonder if it’s for “Kuratowski”. That seems too organized, somehow.
Not every graph is bipartite. You might say “of course; why else would we have a name `bipartite’ if there weren’t such a thing as `non-bipartite’?” Well, we have the name “graph” for everything that’s a graph in graph theory. But there are non-bipartite graphs. They just don’t look like the utility-graph problem. Imagine three vertices, each of them connected to the other two. If you aren’t imagining a triangle you’re overthinking this. But this is a perfectly good non-bipartite graph. There’s no way to split the vertices into two sets with every edge connecting something in one set to something in the other. No, that isn’t inevitable once you have an odd number of vertices. Look above at the utilities problem where there’s three houses and two utilities. That’s nice and bipartite.
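Whether a graph is bipartite is something a computer can settle by trying to sort vertices onto two sides as it walks the edges. Here is a sketch of that idea, a plain breadth-first two-coloring; the name `two_sides` and the dictionary layout are assumptions for the illustration.

```python
from collections import deque

def two_sides(adjacency):
    """Return {vertex: 0 or 1} with every edge crossing between the sides,
    or None if no such split exists."""
    side = {}
    for start in adjacency:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            for neighbor in adjacency[vertex]:
                if neighbor not in side:
                    # Put each new neighbor on the opposite side.
                    side[neighbor] = 1 - side[vertex]
                    queue.append(neighbor)
                elif side[neighbor] == side[vertex]:
                    return None  # an edge with both ends on one side
    return side

triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
three_houses_two_utilities = {
    "house 1": ["gas", "water"],
    "house 2": ["gas", "water"],
    "house 3": ["gas", "water"],
    "gas": ["house 1", "house 2", "house 3"],
    "water": ["house 1", "house 2", "house 3"],
}
print(two_sides(triangle))
print(two_sides(three_houses_two_utilities))
```

The triangle comes back as `None`; the five-vertex utilities graph comes back split into houses and utilities, odd vertex count and all.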
Non-bipartite graphs can be planar. The one with three vertices, each connected to each other, is. The one with four vertices, each vertex connected to each other, is also planar. But if you have five vertices, each connected to each other — well, that’s a lovely star-in-pentagon shape. It’s also not planar. There’s no connecting each vertex to each other one without some line crossing another or jumping out of the plane.
This shape, five vertices each connected to one another, shows up a lot too. And it has a shorthand notation. It’s K5. That is, it’s five points, all connected to each other. This makes it a “complete” graph: every set of two vertices has an edge connecting them. If you’ve leapt to the supposition that K3 is that triangle and K4 is that square with diagonals drawn in you’re right. K6 is six vertices, each one connected to the five other vertices.
It may seem intolerably confusing that we have two kinds of graphs and describe them both with K and a subscript. But they’re completely different. The bipartite graphs have a subscript that’s two numbers separated by a comma: p, q. The p is the number of vertices in one of the subsets. The q is the number of vertices in the other subset. There’s an edge connecting every point in p to every point in q, and vice-versa. The points in the p subset aren’t connected to one another, though. And the points in the q subset aren’t connected to one another. That they aren’t means this isn’t a complete graph.
The others, K with a single number r in the subscript, are complete graphs, ones that aren’t bipartite. They have r vertices, and each vertex is connected to the (r – 1) other vertices. So there’s (1/2) times r times (r – 1) edges all told.
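A couple of worked counts may make the formula feel less abstract. The |E(...)| notation for "number of edges" is my own shorthand here. The bipartite count isn't stated above, but it follows the same way: each of the p vertices on one side gets one edge to each of the q on the other.

```latex
\[
  |E(K_r)| = \frac{r(r-1)}{2},
  \qquad
  |E(K_5)| = \frac{5 \cdot 4}{2} = 10,
  \qquad
  |E(K_6)| = \frac{6 \cdot 5}{2} = 15 .
\]
\[
  |E(K_{p,q})| = pq,
  \qquad
  |E(K_{3,3})| = 3 \cdot 3 = 9 .
\]
```

The half in the first formula is there because counting (r – 1) edges at each of r vertices counts every edge twice, once from each end.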
Not every graph is either Kp, q or Kr. There’s a lot of kinds of graphs out there. Some are planar, some are not. But here’s an amazing thing, and it’s Kuratowski’s Reduction Theorem. If a graph is not planar, then it has to have, somewhere inside it, K3, 3 or K5 or both. Maybe several of them.
A graph that’s hidden within another is called a “subgraph”. This follows the same etymological reasoning that gives us “subsets” and “subgroups” and many other mathematics words beginning with “sub”. And these subgraphs turn up whenever you have a nonplanar graph. A subgraph uses some set of the vertices and edges of the original graph; it doesn’t need all of them. A nonplanar graph has a subgraph that’s K3, 3 or K5 or both.
Sometimes it’s easy to find one of these. K4, 4 obviously has K3, 3 inside it. Pick three of the four vertices on one side and three of the four vertices on the other, and look at the edges connecting them up. There’s your K3, 3. Or on the other side, K6 obviously has K5 inside it. Pick any five of the vertices inside K6 and the edges connecting those vertices. There’s your K5.
Sometimes it’s hard to find one of these. We can make a graph look more complicated without changing whether it’s planar or not. Take your K3, 3 again. Go to each edge and draw another dot, another vertex, inside it. Well, now it’s a graph that’s got fifteen vertices in it: the original six, plus one for each of the nine edges. It’s not obvious whether this is bipartite still. (Play with it a while.) But it hasn’t become planar, not because of this. It won’t be.
This is because we can make graphs more complicated, or more simple, without changing whether they’re planar. The process is a lot like what we did last week with the five-color map theorem, making a map simpler until it was easy enough to color. Suppose there’s a little section of the graph that’s a vertex connected by one edge to a middle vertex connected by one edge to a third vertex. Do we actually need that middle vertex for anything? Besides padding our vertex count? Nah. We can drop that whole edge-middle vertex-edge sequence and replace it all with a single edge. And there’s other rules that let us turn a vertex-edge-vertex set into a single vertex. That sort of thing. It won’t change a planar graph to a nonplanar one or vice-versa.
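The drop-the-middle-vertex step is mechanical enough to sketch. This covers only that one rule, under my own assumptions: the graph is a plain list of edges, the function name is invented, and a middle vertex is left alone if removing it would leave an edge from a vertex to itself. It is not a planarity test.

```python
def smooth_degree_two(edges):
    """Repeatedly replace edge, middle vertex, edge with a single edge."""
    edges = [tuple(edge) for edge in edges]
    while True:
        # Record which edges touch each vertex.
        incident = {}
        for index, (a, b) in enumerate(edges):
            incident.setdefault(a, []).append(index)
            incident.setdefault(b, []).append(index)
        for vertex, touching in incident.items():
            if len(touching) != 2 or touching[0] == touching[1]:
                continue  # not a middle vertex with exactly two edges
            ends = [a if b == vertex else b
                    for a, b in (edges[i] for i in touching)]
            if ends[0] == ends[1]:
                continue  # smoothing would create a self-loop; skip it
            edges = [e for i, e in enumerate(edges) if i not in touching]
            edges.append((ends[0], ends[1]))
            break  # the edge list changed, so rebuild the table
        else:
            return edges

# K3,3 with an extra dot drawn in the middle of every edge.
houses = ["h1", "h2", "h3"]
utilities = ["gas", "water", "power"]
dotted = []
for h in houses:
    for u in utilities:
        middle = h + "-" + u
        dotted += [(h, middle), (middle, u)]

restored = smooth_degree_two(dotted)
print(len(dotted), "edges before,", len(restored), "after")
```

Run on the dotted-up utilities graph, the extra vertices melt away and what remains is each house joined to each utility again.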
So it can be hard to find the K3, 3 or the K5 hiding inside a nonplanar graph. A big enough graph can have so much going on it’s hard to find the pattern. But they’ll be there, in some group of five or six vertices with the right paths between them.
It would make a good activity puzzle, if you could explain what to look for to kids.
People think mathematics is mostly counting and arithmetic. It’s what we get at when we say “do the math[s]”. It’s why the mathematician in the group is the one called on to work out what the tip should be. Heck, I attribute part of my love for mathematics to a Berenstain Bears book which implied being a mathematician was mostly about adding up sums in a base on the Moon, which is an irresistible prospect. In fact, usually counting and arithmetic are, at least, minor influences on real mathematics. There are legends of how catastrophically bad at figuring mathematical genius can be. But usually isn’t always, and this week I’d like to show off a case where counting things and adding things up lets us prove something interesting.
The Five-Color Map Theorem.
No, not four. I imagine anyone interested enough to read a mathematics blog knows the four-color map theorem. It says that you only need four colors to color a map. That’s true, given some qualifiers. No discontiguous chunks that need the same color. Two regions with the same color can touch at a point, they just can’t share a line or curve. The map is on a plane or the surface of a sphere. Probably some other requirements. I’m not going to prove that. Nobody has time for that. The best proofs we’ve figured out for it amount to working out how every map fits into one of a huge number of cases, and trying out each case. It’s possible to color each of those cases with only four colors, so, we’re done. Nice but unenlightening and way too long to deal with.
The five-color map theorem is a lot like the four-color map theorem, with this difference: it says that you only need five colors to color a map. Same qualifiers as before. Yes, it’s true because the four-color map theorem is true and because five is more than four. We can do better than that. We can prove five colors are enough even without knowing whether four colors will do. And it’s easy. The ease of the five-color map theorem gave people reason to think four colors would be maybe harder but still manageable.
The proof I want to show uses one of mathematicians’ common tricks. It employs the same principle which Hercules used to slay the Hydra, although it has less cauterizing lake-monster flesh with flaming torches, as that’s considered beneath the dignity of the Academy anymore except when grading finals for general-requirements classes. The part of the idea we do use is to take a problem which we might not be able to do and cut it down to one we can do. Properly speaking this is a kind of induction proof. In those we start from problems we can do and show that if we can do those, we can do all the complicated problems. But we come at it by cutting down complicated problems and making them simple ones.
Please enjoy my little map of the place. It gives all the states a single color because I don’t really know how to use QGIS and it would probably make my day job easier if I did. (Well, QGIS is open-source software, so its interface is a disaster and its tutorials gibberish. The only way to do something with it is to take flaming torches to it.)
There’s eight regions here, eight states, so it’s not like we’re at the point we can’t figure how to color this with five different colors. That’s all right. I’m using this for a demonstration. Pretend the Dominion of New England is so complicated we can’t tell whether five colors are enough. Oh, and a spot of lingo: if five colors are enough to color the map we say the map is “colorable”. We say it’s “5-colorable” if we want to emphasize five is enough colors.
So imagine that we erase the border between Maine and New Hampshire. Combine them into a single state over the loud protests of the many proud, scary Mainers. But if this simplified New England is colorable, so is the real thing. There’s at least one color not used for Greater New Hampshire, Vermont, or Massachusetts. We give that color to a restored Maine. If the simplified map can be 5-colored, so can the original.
Maybe we can’t tell. Suppose the simplified map is still too complicated to make it obvious. OK, then. Cut out another border. How about we offend Roger Williams partisans and merge Rhode Island into Massachusetts? Massachusetts started out touching five other states, which makes it a good candidate for a state that needed a sixth color. With Rhode Island reduced to being a couple counties of the Bay State, Greater Massachusetts only touches four other states. It can’t need a sixth color. There’s at least one of our original five that’s free.
OK, but, how does that help us find a color for Rhode Island? Maine it’s easy to see why there’s a free color. But Rhode Island?
Well, it’ll have to be the same color as either Greater New Hampshire or Vermont or New York. At least one of them has to be available. Rhode Island doesn’t touch them. Connecticut’s color is out because Rhode Island shares a border with it. Same with Greater Massachusetts’s color. But we’ve got three colors for the taking.
But is our reduced map 5-colorable? Even with Maine part of New Hampshire and Rhode Island part of Massachusetts it might still be too hard to tell. There’s six territories in it, after all. We can simplify things a little. Let’s reverse the treason of 1777 and put Vermont back into New York, dismissing New Hampshire’s claim on the territory as obvious absurdity. I am never going to be allowed back into New England. This Greater New York needs one color for itself, yes. And it touches four other states. But these neighboring states don’t touch each other. A restored Vermont could use the same color as New Jersey or Connecticut. Greater Massachusetts and Greater New Hampshire are unavailable, but there’s still two choices left.
And now look at the map we have remaining. There’s five states in it: Greater New Hampshire, Greater Massachusetts, Greater New York, Regular Old Connecticut, and Regular old New Jersey. We have five colors. Obviously we can give the five territories different colors.
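Here is the simplify-then-restore routine as a sketch. It makes two simplifications I should own up to: it deletes a state outright rather than merging it into a neighbor, and it gives up if every remaining state has five or more neighbors, the hard case that needs more care. The border list is my reading of the map as described here, and the function name is invented.

```python
NEIGHBORS = {
    "Maine": {"New Hampshire"},
    "New Hampshire": {"Maine", "Vermont", "Massachusetts"},
    "Vermont": {"New Hampshire", "Massachusetts", "New York"},
    "Massachusetts": {"New Hampshire", "Vermont", "New York",
                      "Connecticut", "Rhode Island"},
    "Rhode Island": {"Massachusetts", "Connecticut"},
    "Connecticut": {"Massachusetts", "Rhode Island", "New York"},
    "New York": {"Vermont", "Massachusetts", "Connecticut", "New Jersey"},
    "New Jersey": {"New York"},
}

def color_by_reduction(neighbors, colors=5):
    """Strip away the least-connected state until none are left,
    then restore them in reverse, giving each a free color."""
    remaining = {state: set(adj) for state, adj in neighbors.items()}
    removed = []
    while remaining:
        state = min(sorted(remaining), key=lambda s: len(remaining[s]))
        if len(remaining[state]) >= colors:
            raise ValueError("every state has too many neighbors for this sketch")
        removed.append(state)
        for other in remaining.pop(state):
            remaining[other].discard(state)
    coloring = {}
    for state in reversed(removed):
        # Only neighbors restored earlier have colors; there are at most four.
        used = {coloring[n] for n in neighbors[state] if n in coloring}
        coloring[state] = min(c for c in range(colors) if c not in used)
    return coloring

for state, color in sorted(color_by_reduction(NEIGHBORS).items()):
    print(state, color)
```

Each state had fewer than five neighbors left at the moment it was stripped away, so when it comes back there is always a color going spare.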
This is one case, one example map. That’s all we need. A proper proof makes things more abstract, but uses the same pattern. Any map of a bunch of territories is going to have at least one territory that’s got at most five neighbors. Maybe it will have several. Look for one of them. If you find a territory with just one neighbor, such as Maine had, remove that border. You’ve got a simpler map and there must be a color free for the restored territory.
If you find a territory with just two neighbors, such as Rhode Island, take your pick. Merge it with either neighbor. You’ll still have at least one color free for the restored territory. With three neighbors, such as Vermont or Connecticut, again you have your choice. Merge it with any of the three neighbors. You’ll have a simpler map and there’ll be at least one free color.
If you have four neighbors, the way New York has, again pick a border you like and eliminate that. There is a catch. You can imagine one of the neighboring territories reaching out and wrapping around to touch the original state on more than one side. Imagine if Massachusetts ran far out to sea, looped back through Canada, and came back to touch New Jersey, Vermont from the north, and New York from the west. That’s more of a Connecticut stunt to pull, I admit. But that’s still all right. Most of the colonies tried this sort of stunt. And even if Massachusetts did that, we would have colors available. It would be impossible for Vermont and New Jersey to touch. We’ve got a theorem that proves it.
If you have five neighbors, the way Massachusetts has, well, maybe you’re lucky. We are here. None of its neighboring states touches more than two others. We can cut out a border easily and have colors to spare. But we could be in trouble. We could have a map in which all the bordering states touch three or four neighbors and that seems like it would run out of colors. Let me show a picture of that.
So this map looks dire even when you ignore that line that looks like it isn’t connected where C and D come together. Flood fill didn’t run past it, so it must be connected. It just doesn’t look right. Everybody has four neighbors except the province of B, which has three. The province of A has got five. What can we do?
Call on the Jordan Curve Theorem again. At least one of the provinces has to be landlocked, relative to the others. In this case, the borders of provinces A, D, and C come together to make a curve that keeps B in the inside and E on the outside. So we’re free to give B and E the same color. We treat this in the proof by doing a double merger. Erase the boundary between provinces A and B, and also that between provinces A and E. (Or you might merge B, A, and F together. It doesn’t matter. The Jordan Curve Theorem promises us there’ll be at least one choice and that’s all we need.)
So there we have it. As long as we have a map that has some provinces with up to five neighbors, we can reduce the map. And reduce it again, if need be, and again and again. Eventually we’ll get to a map with only five provinces and that has to be 5-colorable.
Just … now … one little nagging thing. We’re relying on there always being some province with at most five neighbors. Why can’t there be some horrible map where every province has six or more neighbors?
Counting will tell us. Arithmetic will finish the job. But we have to get there by way of polygons.
That is, the easiest way to prove this depends on a map with boundaries that are all polygons. That’s all right. Polygons are almost the polynomials of geometry. You can make a polygon that looks so much like the original shape the eye can’t tell the difference. Look at my Dominion of New England map. That’s computer-rendered, so it’s all polygons, and yet all those shore and river boundaries look natural.
But what makes up a polygon? Well, it’s a bunch of straight lines. We call those ‘edges’. Each edge starts and ends at a corner. We call those ‘vertices’. These edges come around and close together to make a ‘face’, a territory like we’ve been talking about. We’re going to count all the regions that have a certain number of neighboring other regions.
Specifically, F2 will represent however many faces there are that have two sides. F3 will represent however many faces there are that have three sides. F4 will represent however many faces there are that have four sides. F10 … yeah, you got this.
One thing you didn’t get. The outside counts as a face. We need this to make the count come out right, so we can use some solid-geometry results. In my map that’s the vast white space that represents the Atlantic Ocean, the other United States, the other parts of Canada, the Great Lakes, all the rest of the world. So Maine, for example, belongs to F2 because it touches New Hampshire and the great unknown void of the rest of the universe. Rhode Island belongs to F3 similarly. New Hampshire’s in F4.
Any map has to have at least one thing that’s in F2, F3, F4, or F5. They touch at most two, three, four or five neighbors. (If they touched more, they’d represent a face that was a polygon of even more sides.)
How do we know? It comes from Euler’s Formula, which starts out describing the ways corners and edges and faces of a polyhedron fit together. Our map, with its polygon on the surface of the sphere, turns out to be just as good as a polyhedron. It looks a little less blocky, but that doesn’t show.
By Euler’s Formula, there’s this neat relationship between the number of vertices, the number of edges, and the number of faces in a polyhedron. (This is the same Leonhard Euler famous for … well, everything in mathematics, really. But in this case it’s for his work with shapes.) It holds for our map too. Call the number of vertices V. Call the number of edges E. Call the number of faces F. Then:
$$V - E + F = 2$$
Always true. Try drawing some maps yourself, using simple straight lines, and see if it works. For that matter, look at my Really Really Simplified map and see if it doesn’t hold true still.
Here’s one of those insights that’s so obvious it’s hard to believe. Every edge ends in two vertices. Three edges meet at every vertex. (We don’t have more than three territories come together at a point. If that were to happen, we’d change the map a little to find our coloring and then put it back afterwards. Pick one of the territories and give it a disc of area from the four or five or more corners. The troublesome corner is gone. Once we’ve done with our proof, shrink the disc back down to nothing. Coloring done!) And therefore $2E = 3V$.
A polygon has the same number of edges as vertices, and if you don’t believe that then draw some and count. Every edge touches exactly two regions. Every vertex touches exactly three edges. So we can rework Euler’s formula. Multiply it by six and we get $6V - 6E + 6F = 12$. And from doubling the equation about edges and vertices in the last paragraph, $4E = 6V$. So if we break up that 6E into 4E and 2E we can rewrite that Euler’s formula again. It becomes $6V - 4E - 2E + 6F = 12$. 6V – 4E is zero, so, $-2E + 6F = 12$.
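If you want a sanity check on that bookkeeping, a cube makes a handy test map. That's my choice of example: think of it as six square provinces wrapped around a globe, with three edges meeting at every corner. It has V = 8, E = 12, and F = 6.

```latex
\begin{align*}
  V - E + F &= 8 - 12 + 6 = 2, \\
  3V &= 3 \cdot 8 = 24 = 2 \cdot 12 = 2E, \\
  -2E + 6F &= -24 + 36 = 12 .
\end{align*}
```

All three relations hold, the last one being the reworked form of Euler's formula with the vertices eliminated.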
Do we know anything about F itself?
Well, yeah. $F = F_2 + F_3 + F_4 + F_5 + F_6 + \cdots$. The number of faces has to equal the sum of the number of faces of two edges, and of three edges, and of four edges, and of five edges, and of six edges, and on and on. Counting!
Do we know anything about how E and F relate?
Well, yeah. A polygon in F2 has two edges. A polygon in F3 has three edges. A polygon in F4 has four edges. And each edge runs up against two faces. So therefore $2E = 2F_2 + 3F_3 + 4F_4 + 5F_5 + 6F_6 + \cdots$. This goes on forever but that’s all right. We don’t need all these terms.
Because here’s what we do have. We know that $-2E + 6F = 12$. And we know how to write both E and F in terms of F2, F3, F4, and so on. We’re going to show at least one of these low-subscript F-somethings has to be positive, that is, there has to be at least one of them.
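Start by just shoving our long sum expressions into the modified Euler’s Formula we had. That gives us this:
$$-(2F_2 + 3F_3 + 4F_4 + 5F_5 + 6F_6 + 7F_7 + 8F_8 + \cdots) + 6(F_2 + F_3 + F_4 + F_5 + F_6 + F_7 + F_8 + \cdots) = 12$$
Doesn’t look like we’ve got anywhere, does it? That’s all right. Multiply that -1 and that 6 into their parentheses. And then move the terms around, so that we group all the terms with F2 together, and all the terms with F3 together, and all the terms with F4 together, and so on. This gets us to:
$$(-2 + 6)F_2 + (-3 + 6)F_3 + (-4 + 6)F_4 + (-5 + 6)F_5 + (-6 + 6)F_6 + (-7 + 6)F_7 + (-8 + 6)F_8 + \cdots = 12$$
I know, that’s a lot of parentheses. And it adds negative numbers to positive which I guess we’re allowed to do but who wants to do that? Simplify things a little more:
$$4F_2 + 3F_3 + 2F_4 + F_5 + 0F_6 - F_7 - 2F_8 - \cdots = 12$$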
And now look at that. Each F-with-a-subscript has to be zero or a positive number. You can’t have a negative number of shapes. If you can I don’t want to hear about it. Most of those subscripted F’s get multiplied by a negative number before they’re added up. But the sum has to be a positive number.
There’s only one way that this sum can be a positive number. At least one of F2, F3, F4, or F5 has to be a positive number. So there must be at least one region with at most five neighbors. And that’s true without knowing anything about our map. So it’s true about the original map, and it’s true about a simplified map, and about a simplified-more map, and on and on.
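If you’d like to see that counting on maps you can hold in your hand, here is a small sketch. It weights each region with k edges by 6 − k and adds everything up; for a map on a sphere where every corner is the meeting of exactly three edges, the total should come to 12. The function name and the choice of examples (a few familiar solids, read as maps on a sphere) are mine, not anything from the proof itself.

```python
def weighted_face_sum(face_counts):
    """Add up (6 - k) * F_k, where face_counts maps k (edges per region) to F_k."""
    return sum((6 - k) * count for k, count in face_counts.items())


# Each solid is a map on a sphere with exactly three edges at every corner.
examples = {
    "tetrahedron": {3: 4},          # four triangles
    "cube": {4: 6},                 # six squares
    "dodecahedron": {5: 12},        # twelve pentagons
    "soccer ball": {5: 12, 6: 20},  # twelve pentagons, twenty hexagons
}

for name, counts in examples.items():
    print(name, weighted_face_sum(counts))  # 12 every time
```

Notice the soccer ball: its twenty hexagons contribute nothing at all, so the twelve pentagons have to carry the whole total. Regions with seven or more edges would only drag the sum down, which is why something with five or fewer edges must always turn up.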
And that is why this hydra-style attack method always works. We can always simplify a map until it obviously can be colored with five colors. And we can go from that simplified map back to the original map, and color it in just fine. Formally, this is an existence proof: it shows there must be a way to color a map with five colors. But it does so the devious way, by showing a way to color the map. We don’t get enough existence proofs like that. And, at its critical point, we know the proof is true because we can count the number of regions and the number of edges and the number of corners they have. And we can add and subtract those numbers in the right way. Just like people imagine mathematicians do all day.
Properly this works only on the surface of a sphere. Euler’s Formula, which we use for the proof, depends on that. We get away with it on a piece of paper because we can pretend this is just a part of the globe so small we don’t see how flat it is. The vast white edge we suppose wraps around the whole world. And that’s fine since we mostly care about maps on flat surfaces or on globes. If we had a map that needed three dimensions, like one that looked at mining and water and overflight and land-use rights, things wouldn’t be so easy. Nor would they work at all if the map turned out to be on an exotic shape like a torus, a doughnut shape.
But this does lead to a staggering thought. Suppose we drew boundary lines. And suppose we found an arrangement of them so that we needed more than five colors. This would tell us that we have to be living on a surface such as a torus, the doughnut shape. We could learn something about the way space is curved by way of an experiment that never looks at more than where two regions come together. That we can find information about the whole of space, global information, by looking only at local stuff amazes me. I hope it at least surprises you.
From fiddling with this you probably figure the four-color map theorem should follow right away. Maybe involve a little more arithmetic but nothing too crazy. I agree, it so should. It doesn’t. Sorry.
There are many theorems that you have to get fairly far into mathematics to even hear of. Often they involve things that are so abstract and abstruse that it’s hard to parse just what we’re studying. This week’s entry is not one of them.
The Jordan Curve Theorem.
There are a couple of ways to write this. I’m going to fall back on the version that Richard Courant and Herbert Robbins put in the great book What Is Mathematics?. It’s a theorem in the field of topology, the study of how shapes interact. In particular it’s about simple, closed curves on a plane. A curve is just what you figure it should be. It’s closed if it … uh … closes, makes a complete loop. It’s simple if it doesn’t cross itself or have any disconnected bits. So, something you could draw without lifting pencil from paper and without crossing back over yourself. Have all that? Good. Here’s the theorem:
A simple closed curve in the plane divides that plane into exactly two domains, an inside and an outside.
It’s named for Camille Jordan, a French mathematician who lived from 1838 to 1922, and who’s renowned for work in group theory and topology. It’s a different Jordan from the one named in Gauss-Jordan Elimination, which is a matrix thing that’s important but tedious. It’s also a different Jordan from Jordan Algebras, which I remember hearing about somewhere.
The Jordan Curve Theorem is proved by reading its proposition and then saying, “Duh”. This is compelling, although it lacks rigor. It’s obvious if your curve is a circle, or a slightly squished circle, or a rectangle or something like that. It’s less obvious if your curve is a complicated labyrinth-type shape.
It gets downright hard if the curve has a lot of corners. This is why a completely satisfying rigorous proof took decades to find. There are curves that are nowhere differentiable, that are nothing but corners, and those are hard to deal with. If you think there’s no such thing, then remember the Koch Snowflake. That’s that triangle sticking up from the middle of a straight line, that itself has triangles sticking up in the middle of its straight lines, and littler triangles still sticking up from the straight lines. Carry that on forever and you have a shape that’s continuous but always changing direction, and this is hard to deal with.
Still, you can have a good bit of fun drawing a complicated figure, then picking a point and trying to work out whether it’s inside or outside the curve. The challenging way to do that is to view your figure as a maze and look for a path leading outside. The easy way is to draw a new line. I recommend doing that in a different color.
In particular, draw a line from your target point to the outside. Some definitely outside point. You need the line to not be parallel to any of the curve’s line segments. And it’s easier if you don’t happen to intersect any vertices, but if you must, we’ll deal with that two paragraphs down.
So draw your testing line here from the point to something definitely outside. And count how many times your testing line crosses the original curve. If the testing line crosses the original curve an even number of times then the original point was outside the curve. If the testing line crosses the original an odd number of times then the original point was inside of the curve. Done.
If your testing line touches a vertex, well, then it gets fussy. It depends whether the two edges of the curve that go into that vertex stay on the same side as your testing line. If the original curve’s edges stay on the same side of your testing line, then don’t count that as a crossing. If the edges go on opposite sides of the testing line, then that does count as one crossing. With that in mind, carry on like you did before. An even number of crossings means your point was outside. An odd number of crossings means your point was inside.
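If you’d rather have a computer do the doodling, the crossing count can be sketched like this. It’s a minimal version under some assumptions of mine: the curve is a polygon given as a list of (x, y) corners, the testing line is always a horizontal ray heading right, and the function name is made up for the occasion. Instead of picking a ray that avoids being parallel to anything, it simply ignores horizontal edges, and it counts an edge only when exactly one of its endpoints is strictly above the ray. That last rule does the vertex fussiness for us: two edges that stay on the same side of the ray count zero or two times, and two edges that go to opposite sides count once.

```python
def point_in_polygon(point, polygon):
    """Even-odd test: count crossings of a rightward horizontal ray from point."""
    px, py = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the curve
        # Count the edge only if exactly one endpoint is strictly above the ray.
        # Horizontal edges never pass this test, so there is no division by zero.
        if (y1 > py) != (y2 > py):
            # x-coordinate where this edge reaches the height of the ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                inside = not inside  # each crossing flips inside/outside
    return inside


# A U-shaped "labyrinth": two arms with a notch cut out between them.
u_shape = [(0, 0), (6, 0), (6, 6), (4, 6), (4, 2), (2, 2), (2, 6), (0, 6)]
print(point_in_polygon((1, 4), u_shape))  # left arm: True
print(point_in_polygon((3, 4), u_shape))  # in the notch: False
print(point_in_polygon((1, 2), u_shape))  # ray runs through two corners: True
```

Points sitting exactly on the curve aren’t handled in any principled way here; the theorem doesn’t call them inside or outside either.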
So go ahead and do this a couple times with a few labyrinths and sample points. It’s fun and elevates your doodling to the heights of 19th-century mathematics. Also once you’ve done that a couple times you’ve proved the Jordan curve theorem.
Well, no, not quite. But you are most of the way to proving it for a special case. If the curve is a polygon, a shape made up of a finite number of line segments, then you’ve got almost all the proof done. You have to finish it off by choosing a ray, a direction, that isn’t parallel to any of the polygon’s line segments. (This is one reason this method only works for polygons, and fails for stuff like the Koch Snowflake. It also doesn’t work well with space-filling curves, which are things that exist. Yes, those are what they sound like: lines that squiggle around so much they fill up area. Some can fill volume. I swear. It’s fractal stuff.) Imagine all the lines that are parallel to that ray. There’s definitely some point along that line that’s outside the curve. You’ll need that for reference. Classify all the points on that line by whether there’s an even or an odd number of crossings between a starting point and your reference definitely-outside point. Keep doing that for all these many parallel lines.
And that’s it. The mess of points that have an odd number of intersections are the inside. The mess of points that have an even number of intersections are the outside.
You won’t be surprised to know there’s versions of the Jordan curve theorem for solid objects in three-dimensional space. And for hyperdimensional spaces too. You can always work out an inside and an outside, as long as space isn’t being all weird. But it might sound like it’s not much of a theorem. So you can work out an inside and an outside; so what?
But it’s one of those great utility theorems. It pops in to places, the perfect tool for a problem you were just starting to notice existed. If I can get my rhetoric organized I hope to show that off next week, when I figure to do the Five-Color Map Theorem.
As I get into the second month of Theorem Thursdays I have, I think, the whole roster of weeks sketched out. Today, I want to dive into some real analysis, and the study of numbers. It’s the sort of thing you normally get only if you’re willing to be a mathematics major. I’ll try to be readable by people who aren’t. If you carry through to the end and follow directions you’ll have your very own mathematical construct, too, so enjoy.
Liouville’s Approximation Theorem
It all comes back to polynomials. Of course it does. Polynomials aren’t literally everything in mathematics. They just come close. Among the things we can do with polynomials is divide up the real numbers into different sets. The tool we use is polynomials with integer coefficients. Integers are the positive and the negative whole numbers, stuff like ‘4’ and ‘5’ and ‘-12’ and ‘0’.
A polynomial is the sum of a bunch of products of coefficients multiplied by a variable raised to a power. We can use anything for the variable’s name. So we use ‘x’. Sometimes ‘t’. If we want complex-valued polynomials we use ‘z’. Some people trying to make a point will use ‘y’ or ‘s’ but they’re just showing off. Coefficients are just numbers. If we know the numbers, great. If we don’t know the numbers, or we want to write something that doesn’t commit us to any particular numbers, we use letters from the start of the alphabet. So we use ‘a’, maybe ‘b’ if we must. If we need a lot of numbers, we use subscripts: $a_0$, $a_1$, $a_2$, and so on, up to some $a_n$ for some big whole number n. To talk about one of these without committing ourselves to a specific example we use a subscript of i or j or k: $a_j$, $a_k$. It’s possible that $a_j$ and $a_k$ equal each other, but they don’t have to, unless j and k are the same whole number. They might also be zero, but they don’t have to be. They can be any numbers. Or, for this essay, they can be any integers. So we’d write a generic polynomial f(x) as:
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} + a_n x^n$$
(Some people put the coefficients in the other order, that is, $a_n + a_{n-1} x + a_{n-2} x^2$ and so on. That’s not wrong. The name we give a number doesn’t matter. But it makes it harder to remember what coefficient matches up with, say, $x^{14}$.)
A zero, or root, is a value for the variable (‘x’, or ‘t’, or what have you) which makes the polynomial equal to zero. It’s possible that ‘0’ is a zero, but don’t count on it. A polynomial of degree n — meaning the highest power to which x is raised is n — can have up to n different real-valued roots. All we’re going to care about is one.
Rational numbers are what we get by dividing one whole number by another. They’re numbers like 1/2 and 5/3 and 6. They’re numbers like -2.5 and 1.0625 and negative a billion. Almost none of the real numbers are rational numbers; they’re exceptional freaks. But they are all the numbers we actually compute with, once we start working out digits. Thus we remember that to live is to live paradoxically.
And every rational number is a root of a first-degree polynomial. That is, there’s some polynomial $f(x) = a_0 + a_1 x$ that’s made zero by your rational number. It’s easy to tell you what it is, too. Pick your rational number. You can write that as the integer p divided by the integer q. Now look at the polynomial $f(x) = p - q x$. Astounded yet?
That trick will work for any rational number. It won’t work for any irrational number. There’s no first-degree polynomial with integer coefficients that has the square root of two as a root. There are polynomials that do, though. There’s $f(x) = 2 - x^2$. You can find the square root of two as the zero of a second-degree polynomial. You can’t find it as the zero of any lower-degree polynomials. So we say that this is an algebraic number of the second degree.
This goes on higher. Look at the cube root of 2. That’s another irrational number, so no first-degree polynomials have it as a root. And there’s no second-degree polynomials that have it as a root, not if we stick to integer coefficients. Ah, but $f(x) = 2 - x^3$? That’s got it. So the cube root of two is an algebraic number of degree three.
We can go on like this, although I admit examples for higher-order algebraic numbers start getting hard to justify. Most of the numbers people have heard of are either rational or are order-two algebraic numbers. I can tell you truly that the eighth root of two is an eighth-degree algebraic number. But I bet you don’t feel enlightened. At best you feel like I’m setting up for something. The number r(5), the smallest radius a disc can have so that five of them will completely cover a disc of radius 1, is eighth-degree and that’s interesting. But you never imagined the number before and don’t have any idea how big that is, other than “I guess that has to be smaller than 1”. (It’s just a touch less than 0.61.) I sound like I’m wasting your time, although you might start doing little puzzles trying to make smaller coins cover larger ones. Do have fun.
Liouville’s Approximation Theorem is about approximating algebraic numbers with rational ones. Almost everything we ever do is with rational numbers. That’s all right because we can make the difference between the number we want, even if it’s r(5), and the numbers we can compute with, rational numbers, as tiny as we need. We trust that the errors we make from this approximation will stay small. And then we discover chaos science. Nothing is perfect.
For example, suppose we need to estimate π. Everyone knows we can approximate this with the rational number 22/7. That’s about 3.142857, which is all right but nothing great. Some people know we can approximate it as 333/106. (I didn’t until I started writing this paragraph and did some research.) That’s about 3.141509, which is better. Then there’s 355/113, which is not as famous as 22/7 but is a celebrity compared to 333/106. That’s about 3.141593. Then we get into some numbers only mathematics hipsters know: 103993/33102 and 104348/33215 and so on. Fine.
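If you want to check those decimals yourself, a short sketch like this will do it. It leans on Python’s standard `fractions` and `math` modules; the list name and the output format are just my choices.

```python
from fractions import Fraction
import math

approximations = [
    Fraction(22, 7),
    Fraction(333, 106),
    Fraction(355, 113),
    Fraction(103993, 33102),
    Fraction(104348, 33215),
]

# How far each fraction lands from pi, as an ordinary floating-point number.
errors = [abs(float(frac) - math.pi) for frac in approximations]

for frac, error in zip(approximations, errors):
    print(f"{frac.numerator}/{frac.denominator} = {float(frac):.6f}  (off by {error:.2e})")
```

Each fraction in the list lands closer to π than the one before it, and each has a bigger denominator than the one before it.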
The Liouville Approximation Theorem is about sequences that converge on an irrational number. So we have our first approximation x1, that’s the integer p1 divided by the integer q1. So, 22 and 7. Then there’s the next approximation x2, that’s the integer p2 divided by the integer q2. So, 333 and 106. Then there’s the next approximation yet, x3, that’s the integer p3 divided by the integer q3. As we look at more and more approximations, xj‘s, we get closer and closer to the actual irrational number we want, in this case π. Also, the denominators, the qj‘s, keep getting bigger.
The theorem speaks of having an algebraic number, call it x, of some degree n greater than 1. Then we have this limit on how good an approximation can be. The difference between the number x that we want, and our best approximation p / q, has to be larger than the number $(1/q)^{n+1}$, at least once the denominator q is big enough. The approximation might be higher than x. It might be lower than x. But it will be off by at least the n-plus-first power of 1/q.
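Here’s a rough numerical look at that limit, using the square root of two, which is algebraic of degree 2. For each denominator q the sketch takes the nearest numerator p and asks whether the gap really is bigger than 1/q to the third power. The function name is mine, and this is floating-point poking-around, not a proof.

```python
import math


def liouville_gap_holds(x, n, q):
    """Is |x - p/q| bigger than (1/q)**(n + 1) for the numerator p nearest x*q?"""
    p = round(x * q)  # the best numerator for this denominator
    return abs(x - p / q) > (1 / q) ** (n + 1)


x = math.sqrt(2)
failures = [q for q in range(1, 1001) if not liouville_gap_holds(x, 2, q)]
print(failures)  # [1, 2]
```

The only denominators that break the rule are the two tiniest ones. The full statement of the theorem carries a constant that takes care of cases like that; from q = 3 on, the simple form used here holds for this number.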
Polynomials let us separate the real numbers into infinitely many tiers of numbers. They also let us say how well the most accessible tier of numbers, rational numbers, can approximate these more exotic things.
One of the things we learn by looking at numbers through this polynomial screen is that there are transcendental numbers. These are numbers that can’t be the root of any polynomial with integer coefficients. π is one of them. e is another. Nearly all numbers are transcendental. But the proof that any particular number is one is hard. Joseph Liouville showed that transcendental numbers must exist by using continued fractions. But this approximation theorem tells us how to make our own transcendental numbers. This won’t be any number you or anyone else has ever heard of, unless you pick a special case. But it will be yours.
You will need:
a1, an integer from 1 to 9, such as ‘1’, ‘9’, or ‘5’.
a2, another integer from 1 to 9. It may be the same as a1 if you like, but it doesn’t have to be.
a3, yet another integer from 1 to 9. It may be the same as a1 or a2 or, if it so happens, both.
a4, one more integer from 1 to 9 and you know what? Let’s summarize things a bit.
A whopping great big gob of integers aj, every one of them from 1 to 9, for every possible integer ‘j’ so technically this is infinitely many of them.
Comfort with the notation n!, which is the factorial of n. For whole numbers that’s the product of every whole number from 1 to n, so, 2! is 1 times 2, or 2. 3! is 1 times 2 times 3, or 6. 4! is 1 times 2 times 3 times 4, or 24. And so on.
Not to be thrown by me writing -n!. By that I mean work out n! and then multiply that by -1. So -2! is -2. -3! is -6. -4! is -24. And so on.
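Now, assemble them into your very own transcendental number z, by this formula:
$$z = \sum_{j=1}^{\infty} a_j \cdot 10^{-j!} = a_1 \cdot 10^{-1!} + a_2 \cdot 10^{-2!} + a_3 \cdot 10^{-3!} + a_4 \cdot 10^{-4!} + \cdots$$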
If you’ve done it right, this will look something like:
$$z = 0.a_1 a_2 000 a_3 00000000000000000 a_4 000\ldots$$
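If you’d like to see the first stretch of your number without counting zeroes by hand, here is a minimal sketch. It assumes the recipe means what the z3 example further down suggests: digit number j goes in decimal place j!, and every other place holds a zero. The function name and the `choose_digit` argument are inventions of mine.

```python
from math import factorial


def liouville_digits(choose_digit, terms):
    """Decimal string for the first `terms` chosen digits, a_j at place j!."""
    length = factorial(terms)  # the last chosen digit sits at place terms!
    digits = ["0"] * length
    for j in range(1, terms + 1):
        a_j = choose_digit(j)
        if not 1 <= a_j <= 9:
            raise ValueError("each digit must be an integer from 1 to 9")
        digits[factorial(j) - 1] = str(a_j)  # place j! is index j! - 1
    return "0." + "".join(digits)


print(liouville_digits(lambda j: 1, 3))  # 0.110001
print(liouville_digits(lambda j: 1, 4))  # 24 decimal places this time
```

Don’t ask for too many terms. The fifth digit lives at decimal place 120 and the sixth at place 720, and it only gets worse from there.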
Ah, but, how do you know this is transcendental? We can prove it is. The proof is by contradiction, which is how a lot of great proofs are done. We show nonsense follows if the thing isn’t true, so the thing must be true. (There are mathematicians that don’t care for proof-by-contradiction. They insist on proof by charging straight ahead and showing a thing is true directly. That’s a matter of taste. I think every mathematician feels that way sometimes, to some extent or on some issues. The proof-by-contradiction is easier, at least in this case.)
Suppose that your z here is not transcendental. Then it’s got to be an algebraic number of degree n, for some finite number n. That’s what it means not to be transcendental. I don’t know what n is; I don’t care. There is some n and that’s enough.
Now, let’s let zm be a rational number approximating z. We find this approximation by taking the first m! digits after the decimal point. So, z1 would be just the number 0.a1. z2 is the number 0.a1a2. z3 is the number 0.a1a2000a3. I don’t know what m you like, but that’s all right. We’ll pick a nice big m.
So what’s the difference between z and zm? Well, it can’t be larger than 10 times $10^{-(m+1)!}$. This is for the same reason that π minus 3.14 can’t be any bigger than 0.01.
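Where that ceiling comes from, as a sketch: everything z has that zm lacks sits at decimal place (m + 1)! or later, and no digit is bigger than 9. So the worst that could happen is a solid run of 9s from that place onward:

```latex
z - z_m = \sum_{j = m+1}^{\infty} a_j \cdot 10^{-j!}
        \le \sum_{k = (m+1)!}^{\infty} 9 \cdot 10^{-k}
        = 9 \cdot \frac{10^{-(m+1)!}}{1 - \frac{1}{10}}
        = 10 \cdot 10^{-(m+1)!}
```

The real difference is smaller still, since most of those decimal places hold a 0 rather than a 9.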
Now suppose we have the best possible rational approximation, p/q, of your number z. Its first m! digits are going to be $p / 10^{m!}$. This will be zm. And by the Liouville Approximation Theorem, then, the difference between z and zm has to be at least as big as $(1/10^{m!})^{n+1}$.
So we know the difference between z and zm has to be larger than one number. And it has to be smaller than another. Let me write those out.
$$\left(\frac{1}{10^{m!}}\right)^{n+1} < |z - z_m| < 10 \cdot 10^{-(m+1)!}$$
We don’t need the z – zm anymore. That thing on the rightmost side we can write in a way that I’ll swear is a little easier to use. What we have left is:
$$\left(\frac{1}{10^{m!}}\right)^{n+1} < 10^{-(m+1)! + 1}$$
And this will be true whenever the number m! (n + 1) is greater than (m + 1)! – 1 for big enough numbers m.
But there’s the thing. This isn’t true whenever m is greater than n. So the difference between your alleged transcendental number and its best-possible rational approximation has to be simultaneously bigger than a number and smaller than that same number without being equal to it. Supposing your number is anything but transcendental produces nonsense. Therefore, congratulations! You have a transcendental number.
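If you don’t trust factorials to behave, you can have a computer compare the two exponents directly. This sketch checks whether m! (n + 1) is greater than (m + 1)! – 1, working in whole numbers so nothing rounds off. The function name and the sample degree n = 5 are my own picks.

```python
from math import factorial


def lower_below_upper(m, n):
    """True when m!(n + 1) > (m + 1)! - 1, so the two bounds leave room between them."""
    return factorial(m) * (n + 1) > factorial(m + 1) - 1


n = 5  # pretend z were algebraic of degree 5
print([m for m in range(1, 12) if lower_below_upper(m, n)])  # [1, 2, 3, 4, 5]
```

Once m passes n the comparison fails and keeps failing, and that is where the contradiction comes from.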
If you chose all 1’s for your aj‘s, then you have what is sometimes called the Liouville Constant. If you didn’t, you may have a transcendental number nobody’s ever noticed before. You can name it after someone if you like. That’s as meaningful as naming a star for someone and cheaper. But you can style it as weaving someone’s name into the universal truth of mathematics. Enjoy!
I’m glad to finally give you a mathematics essay that lets you make something you can keep.
I’m going to let the Mean Value Theorem slide a while. I feel more like a Fixed Point Theorem today. As with the Mean Value Theorem there’s several of these. Here I’ll start with an easy one.
The Fixed Point Theorem.
Back when the world and I were young I would play with electronic calculators. They encouraged play. They made it so easy to enter a number and hit an operation, and then hit that operation again, and again and again. Patterns appeared. Start with, say, ‘2’ and hit the ‘squared’ button, the smaller ‘2’ raised up from the key’s baseline. You got 4. And again: 16. And again: 256. And again and again and you got ever-huger numbers. This happened whenever you started from a number bigger than 1. Start from something smaller than 1, however tiny, and it dwindled down to zero, whatever you tried. Start at ‘1’ and it just stays there. The results were similar if you started with negative numbers. The first squaring put you in positive numbers and everything carried on as before.
This sort of thing happened a lot. Keep hitting the mysterious ‘exp’ and the numbers would keep growing forever. Keep hitting ‘sqrt’; if you started above 1, the numbers dwindled to 1. Start below and the numbers rise to 1. Or you started at zero, but who’s boring enough to do that? ‘log’ would start with positive numbers and keep dropping until it turned into a negative number. The next step was the calculator’s protest we were unleashing madness on the world.
But you didn’t always get zero, one, infinity, or madness, from repeatedly hitting the calculator button. Sometimes, some functions, you’d get an interesting number. If you picked any old number and hit cosine over and over the digits would eventually settle down to around 0.739085. Cosine’s great. Tangent … tangent is weird. Tangent does all sorts of bizarre stuff. But at least cosine is there, giving us this interesting number.
(Something you might wonder: this is the cosine of an angle measured in radians, which is how mathematicians naturally think of angles. Normal people measure angles in degrees, and that will have a different fixed point. We write both the cosine-in-radians and the cosine-in-degrees using the shorthand ‘cos’. We get away with this because people who are confused by this are too embarrassed to call us out on it. If we’re thoughtful we write, say, ‘cos x’ for radians and ‘cos x°’ for degrees. This makes the difference obvious. It doesn’t really, but at least we gave some hint to the reader.)
This all is an example of a fixed point theorem. Fixed point theorems turn up in a lot of fields. They were most impressed upon me in dynamical systems, studying how a complex system changes in time. A fixed point, for these problems, is an equilibrium. It’s where things aren’t changed by a process. You can see where that’s interesting.
In this series I haven’t stated theorems exactly much, and I haven’t given them real proofs. But this is an easy one to state and to prove. Start off with a function, which I’ll name ‘f’, because yes that is exactly how much effort goes in to naming functions. It has as a domain the interval [a, b] for some real numbers ‘a’ and ‘b’. And it has as range the same interval, [a, b]. It might use the whole range; it might use only a subset of it. And we have to require that f is continuous.
Then there has to be at least one fixed point. There must be at least one number ‘c’, somewhere in the interval [a, b], for which f(c) equals c. There may be more than one; we don’t say anything about how many there are. And it can happen that c is equal to a. Or that c equals b. We don’t know that it is or that it isn’t. We just know there’s at least one ‘c’ that makes f(c) equal c.
You get that in my various examples. If the function f has the rule that any given x is matched to $x^2$, then we do get two fixed points: $f(0) = 0^2 = 0$, and, $f(1) = 1^2 = 1$. Or if f has the rule that any given x is matched to the square root of x, then again we have: $f(0) = \sqrt{0} = 0$ and $f(1) = \sqrt{1} = 1$. Same old boring fixed points. The cosine is a little more interesting. For that we have $f(0.739085\ldots) = \cos(0.739085\ldots) = 0.739085\ldots$.
… Yeah, fair enough. Well, here’s how to do it. We’ll take the original function f and create, based on it, a new function. We’ll dig deep in the alphabet and name that ‘g’. It has the same domain as f, [a, b]. Its range is … oh, well, something in the real numbers. Don’t care. The wonder comes from the rule we use.
The rule for ‘g’ is this: match the given number ‘x’ with the number ‘f(x) – x’. That is, g(a) equals whatever f(a) would be, minus a. g(b) equals whatever f(b) would be, minus b. We’re allowed to define a function in terms of some other function, as long as the symbols are meaningful. And we aren’t doing anything wrong like dividing by zero or taking the logarithm of a negative number or asking for f where it isn’t defined.
You might protest that we don’t know what the rule for f is. We’re told there is one, and that it’s a continuous function, but nothing more. So how can I say I’ve defined g in terms of a function I don’t know?
In the first place, I already know everything about f that I need to. I know it’s a continuous function defined on the interval [a, b]. I won’t use any more than that about it. And that’s great. A theorem that doesn’t require knowing much about a function is one that applies to more functions. It’s like the difference between being able to say something true of all living things in North America, and being able to say something true of all persons born in Redbank, New Jersey, on the 18th of February, 1944, who are presently between 68 and 70 inches tall and working on their rock operas. Both things may be true, but one of those things you probably use more.
In the second place, suppose I gave you a specific rule for f. Let me say, oh, f matches x with the arccosecant of x. Are you feeling any more enlightened now? Didn’t think so.
Back to g. Here’s some things we can say for sure about it. g is a function defined on the interval [a, b]. That’s how we set it up. Next point: g is a continuous function on the interval [a, b]. Remember, g is just the function f, which was continuous, minus x, which is also continuous. The difference of two continuous functions is still going to be continuous. (This is obvious, although it may take some considered thinking to realize why it is obvious.)
Now some interesting stuff. What is g(a)? Well, it’s whatever number f(a) is minus a. I can’t tell you what number that is. But I can tell you this: it’s not negative. Remember that f(a) has to be some number in the interval [a, b]. That is, it’s got to be no smaller than a. So the smallest f(a) can be is equal to a, in which case f(a) minus a is zero. And f(a) might be larger than a, in which case f(a) minus a is positive. So g(a) is either zero or a positive number.
(If you’ve just realized where I’m going and gasped in delight, well done. If you haven’t, don’t worry. You will. You’re just out of practice.)
What about g(b)? Since I don’t know what f(b) is, I can’t tell you what specific number it is. But I can tell you it’s not a positive number. The reasoning is just like above: f(b) is some number on the interval [a, b]. So the biggest number f(b) can equal is b. And in that case f(b) minus b is zero. If f(b) is any smaller than b, then f(b) minus b is negative. So g(b) is either zero or a negative number.
(Smiling at this? Good job. If you aren’t, again, not to worry. This sort of argument is not the kind of thing you do in Boring Algebra. It takes time and practice to think this way.)
And now the Intermediate Value Theorem works. g(a) is a positive number. g(b) is a negative number. g is continuous from a to b. Therefore, there must be some number ‘c’, between a and b, for which g(c) equals zero. And remember what g(c) means: f(c) – c equals 0. Therefore f(c) has to equal c. There has to be a fixed point.
And some tidying up. Like I said, g(a) might be positive. It might also be zero. But if g(a) is zero, then f(a) – a = 0. So a would be a fixed point. And similarly if g(b) is zero, then f(b) – b = 0. So then b would be a fixed point. The important thing is there must be at least some fixed point.
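The proof only promises that a fixed point exists. But the same g can be used to hunt one down, if we swap in a technique the proof never mentions: bisection, which keeps halving the interval while holding on to the sign change in g. This is a minimal sketch under that assumption. The function name is mine, and it takes for granted that f is continuous and sends [a, b] into [a, b], as the theorem requires.

```python
import math


def fixed_point_by_bisection(f, a, b, tol=1e-12):
    """Find c in [a, b] with f(c) = c, assuming f is continuous and maps [a, b] into itself."""
    def g(x):
        return f(x) - x

    # The tidying-up cases: an endpoint may already be a fixed point.
    if g(a) == 0:
        return a
    if g(b) == 0:
        return b

    lo, hi = a, b  # by the argument above, g(lo) > 0 and g(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        value = g(mid)
        if value > 0:
            lo = mid   # the sign change is in the right half
        elif value < 0:
            hi = mid   # the sign change is in the left half
        else:
            return mid
    return (lo + hi) / 2


c = fixed_point_by_bisection(math.cos, 0.0, 1.0)
print(c, math.cos(c))  # both about 0.739085
```

Cosine qualifies here because the cosine of anything from 0 to 1 is itself between 0 and 1.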
Now that calculator play starts taking on purposeful shape. Squaring a number could find a fixed point only if you started with a number from -1 to 1. The square of a number outside this range, such as ‘2’, would be bigger than you started with, and the Fixed Point Theorem doesn’t apply. Similarly with exponentials. But square roots? The square root of any number from 0 to a positive number ‘b’ is a number between 0 and ‘b’, at least as long as b was bigger than 1. So there was a fixed point, at 1. The cosine of a real number is some number between -1 and 1, and the cosines of all the numbers between -1 and 1 are themselves between -1 and 1. The Fixed Point Theorem applies. Tangent isn’t a continuous function. And the calculator play never settles on anything.
As with the Intermediate Value Theorem, this is an existence proof. It guarantees there is a fixed point. It doesn’t tell us how to find one. Calculator play does, though. Start from any old number that looks promising and work out f for that number. Then take that and put it back into f. And again. And again. This is known as “fixed point iteration”. It won’t give you the exact answer.
Not usually, anyway. In some freak cases it will. But what it will give, provided some extra conditions are satisfied, is a sequence of values that get closer and closer to the fixed point. When you’re close enough, then you stop calculating. How do you know you’re close enough? If you know something about the original f you can work out some logically rigorous estimates. Or you just keep calculating until all the decimal points you want stop changing between iterations. That’s not logically sound, but it’s easy to program.
That won’t always work. It’ll only work if the function f is differentiable on the interval (a, b). That is, it can’t have corners. And there have to be limits on how fast the function changes on the interval (a, b). If the function changes too fast, iteration can’t be guaranteed to work. But often if we’re interested in a function at all then these conditions will be true, or we can think of a related function for which they are true.
And even if it works it won’t always work well. It can take an enormous pile of calculations to get near the fixed point. But this is why we have computers, and why we can leave them to work overnight.
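For the curious, the calculator play translates into a few lines. This is a minimal sketch of fixed point iteration using the easy-to-program stopping rule: quit when two successive values agree to within a tolerance. The function name, the tolerance, and the cap on the number of steps are all my own assumptions.

```python
import math


def iterate_to_fixed_point(f, start, tol=1e-10, max_steps=10_000):
    """Apply f over and over until successive values differ by less than tol."""
    x = start
    for step in range(1, max_steps + 1):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt, step
        x = nxt
    raise RuntimeError("the values never settled down")


value, steps = iterate_to_fixed_point(math.cos, 2.0)
print(value, steps)   # value is about 0.739085

value, steps = iterate_to_fixed_point(math.sqrt, 5.0)
print(value, steps)   # value is about 1
```

The cap on steps is there because, as noted, nothing promises the values will settle. Try it with the exponential and you get an error rather than a fixed point.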
And yet such a simple idea works. It appears in ancient times, in a formula for finding the square root of an arbitrary positive number ‘N’. (Find the fixed point for $f(x) = \frac{1}{2}\left(\frac{N}{x} + x\right)$.) It creeps into problems that don’t look like fixed points. Calculus students learn of something called the Newton-Raphson Iteration. It finds roots, points where a function f(x) equals zero. Mathematics majors learn of numerical methods to solve ordinary differential equations. The most stable of these are again fixed-point iteration schemes, albeit in disguise.
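As an illustration of the ancient square-root trick, here is a sketch that assumes the rule in question is the classic one: replace your guess x with the average of x and N/x, and repeat. The function name, the starting guess, and the stopping rule are my choices.

```python
def babylonian_sqrt(N, start=1.0, tol=1e-12, max_steps=100):
    """Approximate the square root of a positive N by iterating x -> (N/x + x) / 2."""
    x = start
    for _ in range(max_steps):
        nxt = 0.5 * (N / x + x)         # the average of the guess and N over the guess
        if abs(nxt - x) <= tol * nxt:   # stop once the digits quit changing
            return nxt
        x = nxt
    return x


print(babylonian_sqrt(2.0))  # about 1.414214
print(babylonian_sqrt(9.0))  # about 3
```

A fixed point of that rule is a number x equal to the average of x and N/x, which forces x to equal N/x, which makes x squared equal N. It also settles down fast: a handful of steps is enough for all the digits a calculator shows.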
For this week I have something I want to follow up on. We’ll see if I make it that far.
The Mean Value Theorem.
My subject line disagrees with the header just above here. I want to talk about the Mean Value Theorem. It’s one of those things that turns up in freshman calculus and then again in Analysis. It’s introduced as “the” Mean Value Theorem. But like many things in calculus it comes in several forms. So I figure to talk about one of them here, and another form in a while, when I’ve had time to make up drawings.
Calculus can split effortlessly into two kinds of things. One is differential calculus. This is the study of continuity and smoothness. It studies how a quantity changes if something affecting it changes. It tells us how to optimize things. It tells us how to approximate complicated functions with simpler ones. Usually polynomials. It leads us to differential equations, problems in which the rate at which something changes depends on what value the thing has.
The other kind is integral calculus. This is the study of shapes and areas. It studies how infinitely many things, all infinitely small, add together. It tells us what the net change in things is. It tells us how to go from information about every point in a volume to information about the whole volume.
They aren’t really separate. Each kind informs the other, and gives us tools to use in studying the other. And they are almost mirrors of one another. Differentials and integrals are not quite inverses, but they come quite close. And as a result most of the important stuff you learn in differential calculus has an echo in integral calculus. The Mean Value Theorem is among them.
The Mean Value Theorem is a rule about functions. In this case it’s functions with a domain that’s an interval of the real numbers. I’ll use ‘a’ as the name for the smallest number in the domain and ‘b’ as the largest number. People talking about the Mean Value Theorem often do. The range is also the real numbers, although it doesn’t matter which ones.
I’ll call the function ‘f’ in accord with a longrunning tradition of not working too hard to name functions. What does matter is that ‘f’ is continuous on the interval [a, b]. I’ve described what ‘continuous’ means before. It means that here too.
And we need one more thing. The function f has to be differentiable on the interval (a, b). You maybe noticed that before I wrote [a, b], and here I just wrote (a, b). There’s a difference here. We need the function to be continuous on the “closed” interval [a, b]. That is, it’s got to be continuous for ‘a’, for ‘b’, and for every point in-between.
But we only need the function to be differentiable on the “open” interval (a, b). That is, it’s got to be differentiable for all the points in-between ‘a’ and ‘b’. If it happens to be differentiable for ‘a’, or for ‘b’, or for both, that’s great. But we won’t turn away a function f for not being differentiable at those points. Only the interior. That sort of distinction between stuff true on the interior and stuff true on the boundaries is common. This is why mathematicians have words for “including the boundaries” (“closed”) and “never minding the boundaries” (“open”).
As to what “differentiable” is … A function is differentiable at a point if you can take its derivative at that point. I’m sure that clears everything up. There are many ways to describe what differentiability is. One that’s not too bad is to imagine zooming way in on the curve representing a function. If you start with a big old wobbly function it waves all around. But pick a point. Zoom in on that. Does the function stay all wobbly, or does it get more steady, more straight? Keep zooming in. Does it get even straighter still? If you zoomed in over and over again on the curve at some point, would it look almost exactly like a straight line?
If it does, then the function is differentiable at that point. It has a derivative there. The derivative’s value is whatever the slope of that line is. The slope is that thing you remember from taking Boring Algebra in high school. That rise-over-run thing. But this derivative is a great thing to know. You could approximate the original function with a straight line, with slope equal to that derivative. Close to that point, you’ll make a small enough error nobody has to worry about it.
That there will be this straight line approximation isn’t true for every function. Here’s an example. Picture a line that goes up and then takes a 90-degree turn to go back down again. Look at the corner. However close you zoom in on the corner, there’s going to be a corner. It’s never going to look like a straight line; there’s a 90-degree angle there. It can be a smaller angle if you like, but any sort of corner breaks this differentiability. This is a point where the function isn’t differentiable.
There are functions that are nothing but corners. They can be differentiable nowhere, or only at a tiny set of points that can be ignored. (A set of measure zero, as the dialect would put it.) Mathematicians discovered this over the course of the 19th century. They got into some good arguments about how that can even make sense. It can get worse. Also found in the 19th century were functions that are continuous only at a single point. This smashes just about everyone’s intuition. But we can’t find a definition of continuity that’s as useful as the one we use now and avoids that problem. So we accept that it implies some pathological conclusions and carry on as best we can.
Now I get to the Mean Value Theorem in its differential calculus pelage. It starts with the endpoints, ‘a’ and ‘b’, and the values of the function at those points, ‘f(a)’ and ‘f(b)’. And from here it’s easiest to figure what’s going on if you imagine the plot of a generic function f. I recommend drawing one. Just make sure you draw it without lifting the pen from paper, and without including any corners anywhere. Something wiggly.
Draw the line that connects the ends of the wiggly graph. Formally, we’re adding the line segment that connects the points with coordinates (a, f(a)) and (b, f(b)). That’s coordinate pairs, not intervals. That’s clear in the minds of the mathematicians who don’t see why not to use parentheses over and over like this. (We are short on good grouping symbols like parentheses and brackets and braces.)
Per the Mean Value Theorem, there is at least one point in (a, b) at which the derivative is the same as the slope of that line segment. If you were to slide the line up or down, without changing its orientation, you’d find something wonderful. Most of the time this line intersects the curve, crossing from above to below or vice-versa. But there’ll be at least one point where the shifted line is “tangent”, where it just touches the original curve. Close to that touching point, the “tangent point”, the shifted line and the curve blend together and can’t be easily told apart. As long as the function is differentiable on the open interval (a, b), and continuous on the closed interval [a, b], this will be true. You might convince yourself of it by drawing a couple of curves and taking a straightedge to the results.
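If you would rather check one case with algebra than with a straightedge, here is a small worked example. The function x^3 and the interval [0, 2] are my own picks for illustration; nothing in the theorem singles them out. First comes the slope of the line segment connecting the endpoints, then the point where the derivative matches it.

```latex
f(x) = x^3 \text{ on } [0, 2]: \qquad
\frac{f(2) - f(0)}{2 - 0} = \frac{8 - 0}{2} = 4 .

f'(c) = 3c^2 = 4
\;\Longrightarrow\;
c = \frac{2}{\sqrt{3}} \approx 1.1547, \qquad 0 < c < 2 .
```

Here we could solve for the point because the function is simple. The theorem promises the point exists even when we can’t.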
This is an existence theorem. Like the Intermediate Value Theorem, it doesn’t tell us which point, or points, make the thing we’re interested in true. It just promises us that there is some point that does it. So it gets used in other proofs. It lets us mix information about intervals and information about points.
It’s tempting to try using it numerically. It looks as if it justifies a common differential-calculus trick. Suppose we want to know the value of the derivative at a point. We could pick a little interval around that point and find the endpoints. And then find the slope of the line segment connecting the endpoints. And won’t that be close enough to the derivative at the point we care about?
Well. Um. No, we really can’t be sure about that. We don’t have any idea what interval might make the derivative at the point we care about equal to this line-segment slope. The Mean Value Theorem won’t tell us. It won’t even tell us if there exists an interval that would let that trick work. We can’t invoke the Mean Value Theorem to let us get away with that.
Often, though, we can get away with it. Differentiable functions do have to follow some rules. Among them is that if you do pick a small enough interval then approximations that look like this will work all right. If the function flutters around a lot, we need a smaller interval. But a lot of the functions we’re interested in don’t flutter around that much. So we can get away with it. And there’s some grounds to trust in getting away with it. The Mean Value Theorem isn’t any part of the grounds. It just looks so much like it ought to be.
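Here is a rough sketch of getting away with it. It takes the slope of the line segment across a little interval centered on the point, and compares that to the derivative we already know. The sine function, the point x = 1, and the interval half-widths are all assumptions I picked because the true answer, cos(1), is easy to check against.

```python
import math


def central_difference(f, x, h):
    """Slope of the segment joining (x - h, f(x - h)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    true_slope = math.cos(1.0)  # derivative of sin at x = 1
    for h in (1e-1, 1e-3, 1e-5):
        estimate = central_difference(math.sin, 1.0, h)
        print(h, estimate, abs(estimate - true_slope))
```

For a tame function like this the error shrinks as the interval does. Push the interval much smaller and the computer’s rounding starts to fight back, so smaller is not automatically better.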
I hope on a later Thursday to look at an integral-calculus form of the Mean Value Theorem.
I first learned of Cramer’s Rule in the way I expect most people do. It was an algebra course. I mean high school algebra. By high school algebra I mean you spend roughly eight hundred years learning ways to solve for x or to plot y versus x. Then take a pause for polar coordinates and matrices. Then you go back to finding both x and y.
Cramer’s Rule came up in the context of solving simultaneous equations. You have more than one variable. So x and y. Maybe z. Maybe even a w, before whoever set up the problem gives up and renames everything x1 and x2 and x62 and all that. You also have more than one equation. In fact, you have exactly as many equations as you have variables. Are there any sets of values those variables can have which make all those equations true simultaneously? Thus the imaginative name “simultaneous equations” or the search for “simultaneous solutions”.
If all the equations are linear then we can always say whether there’s simultaneous solutions. By “linear” we mean what we always mean in mathematics, which is, “something we can handle”. But more exactly it means the equations have x and y and whatever other variables only to the first power. No x-squared or square roots of y or tangents of z or anything. (The equations are also allowed to omit a variable. That is, if you have one equation with x, y, and z, and another with just x and z, and another with just y and z, that’s fine. We pretend the missing variable is there and just multiplied by zero, and proceed as before.) One way to find these solutions is with Cramer’s Rule.
Cramer’s Rule sets up some matrices based on the system of equations. If the system has two equations, it sets up three matrices. If the system has three equations, it sets up four matrices. If the system has twelve equations, it sets up thirteen matrices. You see the pattern here. And then you can take the determinant of each of these matrices. Dividing the determinant of one of these matrices by another one tells you what value of x makes all the equations true. Dividing the determinant of another matrix by the determinant of one of these matrices tells you what value of y makes all the equations true. And so on. The Rule tells you which determinants to use. It also says what it means if the determinant you want to divide by equals zero. It means there’s either no set of simultaneous solutions or there’s infinitely many solutions.
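For the two-equation case the three matrices are small enough to write out. What follows is a sketch of the standard rule for a two-by-two system, with exact fractions so rounding stays out of the picture. The function names and the sample system (2x + y = 5 and x - y = 1) are mine, made up for illustration.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2-by-2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def cramer_2x2(coeffs, rhs):
    """Solve coeffs * (x, y) = rhs using three determinants."""
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in coeffs]
    e, f = (Fraction(v) for v in rhs)
    main = det2([[a, b], [c, d]])
    if main == 0:
        # Either no solutions or infinitely many; the rule cannot say which values.
        raise ValueError("determinant of the coefficient matrix is zero")
    x = det2([[e, b], [f, d]]) / main  # right-hand side replaces the x column
    y = det2([[a, e], [c, f]]) / main  # right-hand side replaces the y column
    return x, y


if __name__ == "__main__":
    print(cramer_2x2([[2, 1], [1, -1]], [5, 1]))
```

The pattern is that each variable gets its own matrix, made by swapping the right-hand-side numbers into that variable’s column. Every one of them is divided by the same determinant of the plain coefficient matrix.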
This gets dropped on us students in the vain effort to convince us knowing how to calculate determinants is worth it. It’s not that determinants aren’t worth knowing. It’s just that they don’t seem to tell us anything we care about. Not until we get into mappings and calculus and differential equations and other mathematics-major stuff. We never see it in high school.
And the hard part of determinants is that for all the cool stuff they tell us, they take forever to calculate. The determinant for a matrix with two rows and two columns isn’t bad. Three rows and three columns is getting bad. Four rows and four columns is awful. The determinant for a matrix with five rows and five columns you only ever calculate if you’ve made your teacher extremely cross with you.
So there’s the genius and the first problem with Cramer’s Rule. It takes a lot of calculating. Make any errors along the way with the calculation and your work is wrong. And worse, it won’t be wrong in an obvious way. You can find the error only by going over every single step and hoping to catch the spot where you, somehow, got 36 times -7 minus 21 times -8 wrong.
The second problem is nobody in high school algebra mentions why systems of linear equations should be interesting to solve. Oh, maybe they’ll explain how this is the work you do to figure out where two straight lines intersect. But that just shifts the “and we care because … ?” problem back one step. Later on we might come to understand the lines represent cases where something we’re interested in is true, or where it changes from true to false.
This sort of simultaneous-solution problem turns up naturally in optimization problems. These are problems where you try to find a maximum subject to some constraints. Or find a minimum. Maximums and minimums are the same thing when you think about them long enough. If all the constraints can be satisfied at once and you get a maximum (or minimum, whatever), great! If they can’t … Well, you can study how close it’s possible to get, and what happens if you loosen one or more constraints. That’s worth knowing about.
The third problem with Cramer’s Rule is that, as a method, it kind of sucks. We can be convinced that simultaneous linear equations are worth solving, or at least that we have to solve them to get out of High School Algebra. And we have computers. They can grind away and work out thirteen determinants of twelve-row-by-twelve-column matrices. They might even get an answer back before the end of the term. (The amount of work needed for a determinant grows scary fast as the matrix gets bigger.) But all that work might be meaningless.
The trouble is that Cramer’s Rule is numerically unstable. Before I even explain what that is you already sense it’s a bad thing. Think of all the good things in your life you’ve heard described as unstable. Fair enough. But here’s what we mean by numerically unstable.
Is 1/3 equal to 0.3333333? No, and we know that. But is it close enough? Sure, most of the time. Suppose we need a third of sixty million. 0.3333333 times 60,000,000 equals 19,999,998. That’s a little off of the correct 20,000,000. But I bet you wouldn’t even notice the difference if nobody pointed it out to you. Even if you did notice it you might write off the difference. “If we must, make up the difference out of petty cash”, you might declare, as if that were quite sensible in the context.
And that’s so because this multiplication is numerically stable. Make a small error in either term and you get a proportional error in the result. A small mistake will — well, maybe it won’t stay small, necessarily. But it’ll not grow too fast too quickly.
So now you know intuitively what an unstable calculation is. This is one in which a small error doesn’t necessarily stay proportionally small. It might grow huge, arbitrarily huge, and in few calculations. So your answer might be computed just fine, but actually be meaningless.
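Here is a small sketch of how a harmless-looking problem can blow a small error up. It solves a pair of equations with Cramer’s Rule, then nudges one right-hand-side number by 0.0001 and solves again. The system is one I made up for the purpose. I use exact fractions so that every bit of the change comes from the nudge and none from the computer’s rounding. Strictly, this shows how sensitive the problem itself is; that sensitivity is what lets rounding errors inside the determinants do real damage.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f by Cramer's Rule, exactly."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("determinant is zero")
    return (e * d - b * f) / det, (a * f - e * c) / det


one = Fraction(1)
nearly_one = Fraction("1.0001")

# x + y = 2 and x + 1.0001*y = 2.0001 has the solution x = 1, y = 1.
original = solve_2x2(one, one, one, nearly_one, Fraction(2), Fraction("2.0001"))

# Change 2.0001 to 2.0002 and the solution jumps to x = 0, y = 2.
nudged = solve_2x2(one, one, one, nearly_one, Fraction(2), Fraction("2.0002"))

if __name__ == "__main__":
    print(original, nudged)
```

The determinant we divide by is 0.0001, so an error of one part in twenty thousand in the input became a change of a whole unit in each answer.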
This isn’t because of a flaw in the computer per se. That is, it’s working as designed. It’s just that we might need, effectively, infinitely many digits of precision for the result to be correct. You see where there may be problems achieving that.
Cramer’s Rule isn’t guaranteed to be nonsense, and that’s a relief. But it is vulnerable to this. You can set up problems that look harmless but which the computer can’t do. And that’s surely the worst of all worlds, since we wouldn’t bother calculating them numerically if it weren’t too hard to do by hand.
I don’t want to get too down on Cramer’s Rule. It’s not like the numerical instability hurts every problem you might use it on. And you can, at the cost of some more work, detect whether a particular set of equations will have instabilities. That requires a lot of calculation but if we have the computer to do the work fine. Let it. And a computer can limit its numerical instabilities if it can do symbolic manipulations. That is, if it can use the idea of “one-third” rather than 0.3333333. The software package Mathematica, for example, does symbolic manipulations very well. You can shed many numerical-instability problems, although you gain the problem of paying for a copy of Mathematica.
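A tiny sketch of the difference between the idea of “one-third” and 0.3333333 follows. It uses Python’s standard fractions and decimal modules as a stand-in for real symbolic software. It is not Mathematica and does far less, but it is free. The sixty million is the same illustrative figure used earlier.

```python
from decimal import Decimal
from fractions import Fraction

# The idea of one-third, kept as an exact ratio.
exact = Fraction(1, 3) * 60_000_000

# One-third chopped off after seven decimal places.
truncated = Decimal("0.3333333") * 60_000_000

if __name__ == "__main__":
    print(exact, truncated, exact - Fraction(truncated))
```

The exact version comes to 20,000,000 on the nose. The truncated one lands 2 short, and no later calculation can recover what the truncation threw away.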
If you just care about, or just need, one of the variables then what the heck. Cramer’s Rule lets you solve for just one or just some of the variables. That seems like a niche application to me, but it is there.
And the Rule re-emerges in pure analysis, where numerical instability doesn’t matter. When we look to differential equations, for example, we often find solutions are combinations of several independent component functions. Bases, in fact. Testing whether we have found independent bases can be done through a thing called the Wronskian. That’s a way that Cramer’s Rule appears in differential equations.
Wikipedia also asserts the use of Cramer’s Rule in differential geometry. I believe that’s a true statement, and that it will be reflected in many mechanics problems. In these we can use our knowledge that, say, energy and angular momentum of a system are constant values to tell us something of how positions and velocities depend on each other. But I admit I’m not well-read in differential geometry. That’s something which has indeed caused me pain in my scholarly life. I don’t know whether differential geometers thank Cramer’s Rule for this insight or whether they’re just glad to have got all that out of the way. (See the above Wikipedia Editors quarrel.)
I admit for all this talk about Cramer’s Rule I haven’t said what it is. Not in enough detail to pass your high school algebra class. That’s all right. It’s easy to find. MathWorld has the rule in pretty simple form. MathWorld does forget to define what it means by the vector d. (It’s the vector with components d1, d2, et cetera.) But that’s enough technical detail. If you need to calculate something using it, you can probably look closer at the problem and see if you can do it another way instead. Or you’re in high school algebra and just have to slog through it. It’s all right. Eventually you can put x and y aside and do geometry.
I am still taking requests for this Theorem Thursdays sequence. I intend to post each Thursday in June and July an essay talking about some theorem and what it means and why it’s important. I have gotten a couple of requests in, but I’m happy to take more; please just give me a little lead time. But I want to start with one that delights me.
The Intermediate Value Theorem
I own a Scion tC. It’s a pleasant car, about 2400 percent more sporty than I am in real life. I got it because it met my most important criteria: it wasn’t expensive and it had a sun roof. That it looks stylish is an unsought bonus.
But being a car, and a black one at that, it has a common problem. Leave it parked a while, then get inside. In the winter, it gets so cold that snow can fall inside it. In the summer, it gets so hot that the interior, never mind the passengers, risks melting. While pondering this slight inconvenience I wondered, isn’t there any outside temperature that leaves my car comfortable?
Of course there is. We know this before thinking about it. The sun heats the car, yes. When the outside temperature is low enough, there’s enough heat flowing out that the car gets cold. When the outside temperature’s high enough, not enough heat flows out. The car stays warm. There must be some middle temperature where just enough heat flows out that the interior doesn’t get particularly warm or cold. Not just one middle temperature, come to that. There is a range of temperatures that are comfortable to sit in. But that just means there’s a range of outside temperatures for which the car’s interior stays comfortable. We know this range as late April, early May, here. Most years, anyway.
The reasoning that lets us know there is a comfort-producing outside temperature we can see as a use of the Intermediate Value Theorem. It addresses a function f with domain [a, b], and range of the real numbers. The domain is closed; that is, the numbers we call ‘a’ and ‘b’ are both in the set. And f has to be a continuous function. If you want to draw it, you can do so without having to lift pen from paper. (WARNING: Do not attempt to pass your Real Analysis course with that definition. But that’s what the proper definition means.)
So look at the numbers f(a) and f(b). Pick some number between them, and I’ll call that number ‘g’. There must be at least one number ‘c’, that’s between ‘a’ and ‘b’, and for which f(c) equals g.
Bernard Bolzano, an early-19th century mathematician/logician/theologist/priest, gets the credit for first proving this theorem. Bolzano’s version was a little different. It supposes that f(a) and f(b) are of opposite sign. That is, f(a) is a positive and f(b) a negative number. Or f(a) is negative and f(b) is positive. And Bolzano’s theorem says there must be some number ‘c’ for which f(c) is zero.
You can prove this by drawing any wiggly curve at all and then a horizontal line in the middle of it. Well, that doesn’t prove it to a mathematician’s satisfaction. But it will prove the matter in the sense that you’ll be convinced. It’ll also convince anyone you try explaining this to.
You might wonder why anyone needed this proved at all. It’s a bit like proving that as you pour water into the sink there’ll come a time the last dish gets covered with water. So it is. The need for a proof came about from the ongoing attempt to make mathematics rigorous. We have an intuitive idea of what it means for functions to be continuous; see my above comment about lifting pens from paper. Can that be put in terms that don’t depend on physical intuition? … Yes, it can. And we can divorce the Intermediate Value Theorem from our physical intuitions. We can know something that’s true even if we never see a car or a sink.
This theorem might leave you feeling a little hollow inside. Proving that there is some ‘c’ for which f(c) equals g, or even equals zero, doesn’t seem to tell us much about how to find it. It doesn’t even tell us that there’s only one ‘c’, rather than two or three or a hundred million candidates that meet our criteria. Fair enough. The Intermediate Value Theorem is more about proving the existence of solutions, rather than how to find them.
But knowing there is a solution can help us find it. The Intermediate Value Theorem as we know it grew out of finding roots for polynomials. One numerical method, easy to set up for any problem, is the bisection method. If you know that somewhere between ‘a’ and ‘b’ the function goes from positive to negative, then find the midpoint, ‘c’. The function is equal to zero either between ‘a’ and ‘c’, or between ‘c’ and ‘b’. Pick the side that it’s on, and bisect that. Pick the half of that which the zero must be in. Bisect that half. And repeat until you get close enough to the answer for your needs. (The same reasoning applies to a lot of problems in which you divide the search range in two each time until the answer appears.)
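The bisection method is short enough to sketch in full. This version assumes the function is continuous, that ‘a’ is less than ‘b’, and that the function has opposite signs at the two ends, which is just what Bolzano’s version of the theorem asks for. The function name, the tolerance, and the test function x*x - 2 are my own choices.

```python
def bisect(f, a, b, tol=1e-10):
    """Find a zero of a continuous f between a and b, given a sign change."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        c = (a + b) / 2.0
        fc = f(c)
        if fc == 0:
            return c
        if fa * fc < 0:
            # The sign change, and so a zero, lies between a and c.
            b, fb = c, fc
        else:
            # Otherwise it lies between c and b.
            a, fa = c, fc
    return (a + b) / 2.0


if __name__ == "__main__":
    # The zero of x*x - 2 between 0 and 2 is the square root of 2.
    print(bisect(lambda x: x * x - 2.0, 0.0, 2.0))
```

Each pass halves the interval, so the answer gains about one binary digit per step. That is slow next to cleverer methods, but it cannot fail so long as the sign change is there.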
We can get some pretty heady results from the Intermediate Value Theorem, too, even if we don’t know where any of them are. An example you’ll see everywhere is that there must be spots on the opposite sides of the globe with the exact same temperature. Or humidity, or daily rainfall, or any other quantity like that. I had thought everyone was ripping that example off from Richard Courant and Herbert Robbins’s masterpiece What Is Mathematics?. But I can’t find this particular example in there. I wonder what we are all ripping it off from.
So here’s a neat example that is ripped off from them. Draw two blobs on the plane. Is there a straight line that bisects both of them at once? Bisecting here means there’s exactly as much of one blob on one side of the line as on the other. There certainly is. The trick is that there are any number of lines that will bisect one blob; then look at what each of those does to the other.
A similar ripped-off result you can do with a single blob of any shape you like. Draw any line that bisects it. There are a lot of candidates. Can you draw a line perpendicular to that so that the blob gets quartered, divided into four spots of equal area? Yes. Try it.
But surely the best use of the Intermediate Value Theorem is in the problem of wobbly tables. If the table has four legs, all the same length, and the problem is the floor isn’t level it’s all right. There is some way to adjust the table so it won’t wobble. (Well, the ground can’t be angled more than a bit over 35 degrees, but that’s all right. If the ground has a 35 degree angle you aren’t setting a table on it. You’re rolling down it.) Finally a mathematical proof can save us from despair!
Except that the proof doesn’t work if the table legs are uneven which, alas, they often are. But we can’t get everything.
Courant and Robbins put forth one more example that’s fantastic, although it doesn’t quite work. But it’s a train problem unlike those you’ve seen before. Let me give it to you as they set it out:
Suppose a train travels from station A to station B along a straight section of track. The journey need not be of uniform speed or acceleration. The train may act in any manner, speeding up, slowing down, coming to a halt, or even backing up for a while, before reaching B. But the exact motion of the train is supposed to be known in advance; that is, the function s = f(t) is given, where s is the distance of the train from station A, and t is the time, measured from the instant of departure.
On the floor of one of the cars a rod is pivoted so that it may move without friction either forward or backward until it touches the floor. If it does touch the floor, we assume that it remains on the floor henceforth; this will be the case if the rod does not bounce.
Is it possible to place the rod in such a position that, if it is released at the instant when the train starts and allowed to move solely under the influence of gravity and the motion of the train, it will not fall to the floor during the entire journey from A to B?
They argue it is possible, and use the Intermediate Value Theorem to show it. They admit the range of angles it’s safe to start the rod from may be too small to be useful.
But they’re not quite right. Ian Stewart, in the revision of What Is Mathematics?, includes an appendix about this. Stewart credits Tim Poston with pointing out, in 1976, the flaw. It’s possible to imagine a path which causes the rod, from one angle, to just graze tipping over, let’s say forward, and then get yanked back and fall over flat backwards. This would leave no room for any starting angles that avoid falling over entirely.
It’s a subtle flaw. You might expect so. Nobody mentioned it between the book’s original publication in 1941, after which everyone liking mathematics read it, and 1976. And it is one that touches on the complications of spaces. This little Intermediate Value Theorem problem draws us close to chaos theory. It’s one of those ideas that weaves through all mathematics.