Elkement, who’s been a longtime support of my blogging here, has been thinking about stereographic projection recently. This comes from playing with complex-valued numbers. It’s hard to start thinking about something like “what is and not get into the projection. The projection itself Elkement describes a bit in this post, from early in August. It’s one of the ways to try to match the points on a sphere to the points on the entire, infinite plane. One common way to imagine it, and to draw it, is to imagine setting the sphere on the plane. Imagine sitting on the top of the sphere. Draw the line connecting the top of the sphere with whatever point you find interesting on the sphere, and then extend that line until it intersects the plane. Match your point on the sphere with that point on the plane. You can use this to trace out shapes on the sphere and find their matching shapes on the plane.
This distorts the shapes, as you’d expect. Well, the sphere has a finite area, the plane an infinite one. We can’t possibly preserve the areas of shapes in this transformation. But this transformation does something amazing that offends students when they first encounter it. It preserves circles: a circle on the original sphere becomes a circle on the plane, and vice-versa. I know, you want it to turn something into ellipses, at least. She takes a turn at thinking out reasons why this should be reasonable. There are abundant proofs of this, but it helps the intuition to see different ways to make the argument. And to have rough proofs, that outline the argument you mean to make. We need rigorous proofs, yes, but a good picture that makes the case convincing helps a good deal.
I assume that last week I disappointed Mr Wu, of the Singapore Maths Tuition blog, last week when I passed on a topic he suggested to unintentionally rewrite a good enough essay. I hope to make it up this week with a piece of linear algebra.
A Unitary Matrix — note the article; there is not a singular the Unitary Matrix — starts with a matrix. This is an ordered collection of scalars. The scalars we call elements. I can’t think of a time I ever saw a matrix represented except as a rectangular grid of elements, or as a capital letter for the name of a matrix. Or a block inside a matrix. In principle the elements can be anything. In practice, they’re almost always either real numbers or complex numbers. To speak of Unitary Matrixes invokes complex-valued numbers. If a matrix that would be Unitary has only real-valued elements, we call that an Orthogonal Matrix. It’s not wrong to call an Orthogonal matrix “Unitary”. It’s like pointing to a known square, though, and calling it a parallelogram. Your audience will grant that’s true. But it wonder what you’re getting at, unless you’re talking about a bunch of parallelograms and some of them happen to be squares.
As with polygons, though, there are many names for particular kinds of matrices. The flurry of them settles down on the Intro to Linear Algebra student and it takes three or four courses before most of them feel like familiar names. I will try to keep the flurry clear. First, we’re talking about square matrices, ones with the same number of rows as columns.
Start with any old square matrix. Give it the name U because you see where this is going. There are a couple of new matrices we can derive from it. One of them is the complex conjugate. This is the matrix you get by taking the complex conjugate of every term. So, if one element is , in the complex conjugate, that element would be . Reverse the plus or minus sign of the imaginary component. The shorthand for “the complex conjugate to matrix U” is . Also we’ll often just say “the conjugate”, taking the “complex” part as implied.
Start back with any old square matrix, again called U. Another thing you can do with it is take the transposition. This matrix, U-transpose, you get by keeping the order of elements but changing rows and columns. That is, the elements in the first row become the elements in the first column. The elements in the second row become the elements in the second column. Third row becomes the third column, and so on. The diagonal — first row, first column; second row, second column; third row, third column; and so on — stays where it was. The shorthand for “the transposition of U” is .
You can chain these together. If you start with U and take both its complex-conjugate and its transposition, you get the adjoint. We write that with a little dagger: . For a wonder, as matrices go, it doesn’t matter whether you take the transpose or the conjugate first. It’s the same . You may ask how people writing this out by hand never mistake for . This is a good question and I hope to have an answer someday. (I would write it as in my notes.)
And the last thing you can maybe do with a square matrix is take its inverse. This is like taking the reciprocal of a number. When you multiply a matrix by its inverse, you get the Identity Matrix. Not every matrix has an inverse, though. It’s worse than real numbers, where only zero doesn’t have a reciprocal. You can have a matrix that isn’t all zeroes and that doesn’t have an inverse. This is part of why linear algebra mathematicians command the big money. But if a matrix U has an inverse, we write that inverse as .
The Identity Matrix is one of a family of square matrices. Every element in an identity matrix is zero, except on the diagonal. That is, the element at row one, column one, is the number 1. The element at row two, column two is also the number 1. Same with row three, column three: another one. And so on. This is the “identity” matrix because it works like the multiplicative identity. Pick any matrix you like, and multiply it by the identity matrix; you get the original matrix right back. We use the name for an identity matrix. If we have to be clear how many rows and columns the matrix has, we write that as a subscript: or or or so on.
So this, finally, lets me say what a Unitary Matrix is. It’s any square matrix U where the adjoint, is the same matrix as the inverse, . It’s wonderful to learn you have a Unitary Matrix. Not just because, most of the time, finding the inverse of a matrix is a long and tedious procedure. Here? You have to write the elements in a different order and change the plus-or-minus sign on the imaginary numbers. The only way it would be easier if you had only real numbers, and didn’t have to take the conjugates.
That’s all a nice heap of terms. What makes any of them important, other than so Intro to Linear Algebra professors can test their students?
Well, you know mathematicians. If we like something like this, it’s usually because it holds out the prospect of turning a hard problems into easier ones. So it is. Start out with any old matrix. Call it A. Then there exist some unitary matrixes, call them U and V. And their product does something wonderful: is a “diagonal” matrix. A diagonal matrix has zeroes for every element except the diagonal ones. That is, row one, column one; row two, column two; row three, column three; and so on. The elements that trace a path from the upper-left to the lower-right corner of the matrix. (The diagonal from the upper-right to the lower-left we have nothing to do with.) Everything we might do with matrices is easier on a diagonal matrix. So we process our matrix A into this diagonal matrix D. Process it by whatever the heck we’re doing. If we then multiply this by the inverses of U and V? If we calculate ? We get whatever our process would have given us had we done it to A. And, since U and V are unitary matrices, it’s easy to find these inverses. Wonderful!
Also this sounds like I just said Unitary Matrixes are great because they solve a problem you never heard of before.
The 20th Century’s first great use for Unitary Matrixes, and I imagine the impulse for Mr Wu’s suggestion, was quantum mechanics. (A later use would be data compression.) Unitary Matrixes help us calculate how quantum systems evolve. This should be a little easier to understand if I use a simple physics problem as demonstration.
So imagine three blocks, all the same mass. They’re connected in a row, left to right. There’s two springs, one between the left and the center mass, one between the center and the right mass. The springs have the same strength. The blocks can only move left-to-right. But, within those bounds, you can do anything you like with the blocks. Move them wherever you like and let go. Let them go with a kick moving to the left or the right. The only restraint is they can’t pass through one another; you can’t slide the center block to the right of the right block.
This is not quantum mechanics, by the way. But it’s not far, either. You can turn this into a fine toy of a molecule. For now, though, think of it as a toy. What can you do with it?
A bunch of things, but there’s two really distinct ways these blocks can move. These are the ways the blocks would move if you just hit it with some energy and let the system do what felt natural. One is to have the center block stay right where it is, and the left and right blocks swinging out and in. We know they’ll swing symmetrically, the left block going as far to the left as the right block goes to the right. But all these symmetric oscillations look about the same. They’re one mode.
The other is … not quite antisymmetric. In this mode, the center block moves in one direction and the outer blocks move in the other, just enough to keep momentum conserved. Eventually the center block switches direction and swings the other way. But the outer blocks switch direction and swing the other way too. If you’re having trouble imagining this, imagine looking at it from the outer blocks’ point of view. To them, it’s just the center block wobbling back and forth. That’s the other mode.
And it turns out? It doesn’t matter how you started these blocks moving. The movement looks like a combination of the symmetric and the not-quite-antisymmetric modes. So if you know how the symmetric mode evolves, and how the not-quite-antisymmetric mode evolves? Then you know how every possible arrangement of this system evolves.
So here’s where we get to quantum mechanics. Suppose we know the quantum mechanics description of a system at some time. This we can do as a vector. And we know the Hamiltonian, the description of all the potential and kinetic energy, for how the system evolves. The evolution in time of our quantum mechanics description we can see as a unitary matrix multiplied by this vector.
The Hamiltonian, by itself, won’t (normally) be a Unitary Matrix. It gets the boring name H. It’ll be some complicated messy thing. But perhaps we can find a Unitary Matrix U, so that is a diagonal matrix. And then that’s great. The original H is hard to work with. The diagonalized version? That one we can almost always work with. And then we can go from solutions on the diagonalized version back to solutions on the original. (If the function describes the evolution of , then describes the evolution of .) The work that U (and ) does to H is basically what we did with that three-block, two-spring model. It’s picking out the modes, and letting us figure out their behavior. Then put that together to work out the behavior of what we’re interested in.
There are other uses, besides time-evolution. For instance, an important part of quantum mechanics and thermodynamics is that we can swap particles of the same type. Like, there’s no telling an electron that’s on your nose from an electron that’s in one of the reflective mirrors the Apollo astronauts left on the Moon. If they swapped positions, somehow, we wouldn’t know. It’s important for calculating things like entropy that we consider this possibility. Two particles swapping positions is a permutation. We can describe that as multiplying the vector that describes what every electron on the Earth and Moon is doing by a Unitary Matrix. Here it’s a matrix that does nothing but swap the descriptions of these two electrons. I concede this doesn’t sound thrilling. But anything that goes into calculating entropy is first-rank important.
As with time-evolution and with permutation, though, any symmetry matches a Unitary Matrix. This includes obvious things like reflecting across a plane. But it also covers, like, being displaced a set distance. And some outright obscure symmetries too, such as the phase of the state function . I don’t have a good way to describe what this is, physically; we can’t observe it directly. This symmetry, though, manifests as the conservation of electric charge, a thing we rather like.
This, then, is the sort of problem that draws Unitary Matrixes to our attention.
I have another topic today suggested by Beth, of the I Didn’t Have My Glasses On …. inspiration blog. It overlaps a bit with other essays I’ve posted this A-to-Z sequence, but that’s all right. We get a better understanding of things by considering them from several perspectives. This one will be a bit more historical.
Pop science writer Isaac Asimov told a story he was proud of about his undergraduate days. A friend’s philosophy professor held court after class. One day he declared mathematicians were mystics, believing in things they even admit are “imaginary numbers”. Young Asimov, taking offense, offered to prove the reality of the square root of minus one, if the professor gave him one-half pieces of chalk. The professor snapped a piece of chalk in half and gave one piece to him. Asimov said this is one piece of chalk. The professor answered it was half the length of a piece of chalk and Asimov said that’s not what he asked for. Even if we accept “half the length” is okay, how do we know this isn’t 48 percent the length of a standard piece of chalk? If the professor was that bad on “one-half” how could he have opinions on “imaginary numbers”?
This story is another “STEM undergraduates outwitting the philosophy expert” legend. (Even if it did happen. What we know is the story Asimov spun it into, in which a plucky young science fiction fan out-argued someone whose job is forming arguments.) Richard Feynman tells a similar story, befuddling a philosophy class with the question of how we can prove a brick has a interior. It helps young mathematicians and science majors feel better about their knowledge. But Asimov’s story does get at a couple points. First, that “imaginary” is a terrible name for a class of numbers. The square root of minus one is as “real” as one-half is. Second, we’ve decided that one-half is “real” in some way. What the philosophy professor would have baffled Asimov to explain is: in what way is one-half real? Or minus one?
We’re introduced to imaginary numbers through polynomials. I mean in education. It’s usually right after getting into quadratics, looking for solutions to equations like . That quadratic has two solutions, but it’s possible to have a quadratic with only one, such as . Or to have a quadratic with no solutions, such as, iconically, . We might underscore that by plotting the curve whose x- and y-coordinates makes true the equation . There’s no point on the curve with a y-coordinate of zero, so, there we go.
Having established that has no solutions, the course then asks “what if we go ahead and say there was one”? Two solutions, in fact, and . This is all right for introducing the idea that mathematics is a tool. If it doesn’t do something we need, we can alter it.
But I see trouble in teaching someone how you can’t take square roots of negative numbers and then teaching them how to take square roots of negative numbers. It’s confusing at least. It needs some explanation about what changed. We might do better introducing them in a more historical method.
Historically, imaginary numbers (in the West) come from polynomials, yes. Different polynomials. Cubics, and quartics. Mathematicians still liked finding roots of them. Mathematicians would challenge one another to solve sets of polynomials. This seems hard to believe, but many sources agree on this. I hope we’re not all copying Eric Temple Bell here. (Bell’s Men of Mathematics is an inspiring collection of biographical sketches. But it’s not careful differentiating legends from documented facts.) And there are enough nerd challenges today that I can accept people daring one another to find solutions of .
Quadratics, equations we can write as for some real numbers a, b, and c, we’ve known about forever. Euclid solved these kinds of equations using geometric reasoning. Chinese mathematicians 2200 years ago described rules for how to find roots. The Indian mathematician Brahmagupta, by the early 7th century, described the quadratic formula to find at least one root. Both possible roots were known to Indian mathematicians a thousand years ago. We’ve reduced the formula today to
With that filtering into Western Europe, the search was on for similar formulas for other polynomials. This turns into several interesting threads. One is a tale of intrigue and treachery involving Gerolamo Cardano, Niccolò Tartaglia, and Ludovico Ferrari. I’ll save that for another essay because I have to cut something out, so of course I skip the dramatic thing. Another thread is the search for quadratic-like formulas for other polynomials. They exist for third-power and fourth-power polynomials. Not (generally) for the fifth- or higher-powers. That is, there are individual polynomials you can solve by formulas, like, . But stare at it and you can see where that’s “really” a quadratic pretending to be sixth-power. Finding there was no formula to find, though, lead people to develop group theory. And group theory underlies much of mathematics and modern physics.
The first great breakthrough solving the general cubic, , came near the end of the 14th century in some manuscripts out of Florence. It’s built on a transformation. Transformations are key to mathematics. The point of a transformation is to turn a problem you don’t know how to do into one you do. As I write this, MathWorld lists 543 pages as matching “transformation”. That’s about half what “polynomial” matches (1,199) and about three times “trigonometric” (184). So that can help you judge importance.
Here, the transformation to make is to write a related polynomial in terms of a new variable. You can call that new variable x’ if you like, or z. I’ll use z so as to not have too many superscript marks flying around. This will be a “depressed polynomial”. “Depressed” here means that at least one of the coefficients in the new polynomial is zero. (Here, for this problem, it means we won’t have a squared term in the new polynomial.) I suspect the term is old-fashioned.
Let z be the new variable, related to x by the equation . And then figure out what and are. Using all that, and the knowledge that , and a lot of arithmetic, you get to one of these three equations:
where p and q are some new coefficients. They’re positive numbers, or possibly zeros. They’re both derived from a, b, c, and d. And so in the 15th Century the search was on to solve one or more of these equations.
From our perspective in the 21st century, our first question is: what three equations? How are these not all the same equation? And today, yes, we would write this as one depressed equation, most likely . We would allow that p or q or both might be negative numbers.
And there is part of the great mysterious historical development. These days we generally learn about negative numbers. Once we are comfortable, our teachers hope, with those we get imaginary numbers. But in the Western tradition mathematicians noticed both, and approached both, at roughly the same time. With roughly similar doubts, too. It’s easy to point to three apples; who can point to “minus three” apples? We can arrange nine apples into a neat square. How big a square can we set “minus nine” apples in?
Hesitation and uncertainty about negative numbers would continue quite a long while. At least among Western mathematicians. Indian mathematicians seem to have been more comfortable with them sooner. And merchants, who could model a negative number as a debt, seem to have gotten the idea better.
But even seemingly simple questions could be challenging. John Wallis, in the 17th century, postulated that negative numbers were larger than infinity. Leonhard Euler seems to have agreed. (The notion may seem odd. It has echoes today, though. Computers store numbers as bit patterns. The normal scheme represents negative numbers by making the first bit in a pattern 1. These bit patterns make the negative numbers look bigger than the biggest positive numbers. And thermodynamics gives us a temperature defined by the relationship of energy to entropy. That definition implies there can be negative temperatures. Those are “hotter” — higher-energy, at least — than infinitely-high positive temperatures.) In the 18th century we see temperature scales designed so that the weather won’t give negative numbers too often. Augustus De Morgan wrote in 1831 that a negative number “occurring as the solution of a problem indicates some inconsistency or absurdity”. De Morgan was not an amateur. He coded the rules for deductive logic so well we still call them De Morgan’s laws. He put induction on a logical footing. And he found negative numbers (and imaginary numbers) a sign of defective work. In 1831. 1831!
But back to cubic equations. Allow that we’ve gotten comfortable enough with negative numbers we only want to solve the one depressed equation of . How to do it? … Another transformation, then. There are a couple you can do. Modern mathematicians would likely define a new variable w, set so that . This turns the depressed equation into
And this, believe it or not, is a disguised quadratic. Multiply everything in it by and move things around a little. You get
From there, quadratic formula to solve for . Then from that, take cube roots and you get three values of z. From that, you get your three values of x.
You see why nobody has taught this in high school algebra since 1959. Also why I am not touching the quartic formula, the equivalent of this for polynomials of degree four.
There are other approaches. And they can work out easier for particular problems. Take, for example, which I introduced in the first act. It’s past the time we set it off.
Rafael Bombelli, in the 1570s, pondered this particular equation. Notice it’s already depressed. A formula developed by Cardano addressed this, in the form . Notice that’s the second of the three sorts of depressed polynomial. Cardano’s formula says that one of the roots will be at
Put to this problem, we get something that looks like a compelling reason to stop:
Bombelli did not stop with that, though. He carried on as though these expressions of the square root of -121 made sense. And, if he did that he found these terms added up. You get an x of 4.
Which is true. It’s easy to check that it’s right. And here is the great surprising thing. Start from the respectable enough equation. It has nothing suspicious in it, not even negative numbers. Follow it through and you need to use negative numbers. Worse, you need to use the square roots of negative numbers. But keep going, as though you were confident in this, and you get a correct answer. And a real number.
We can get the other roots. Divide out of . What’s left is . You can use the quadratic formula for this. The other two roots are , about -0.268, and , about -3.732.
So here we have good reasons to work with negative numbers, and with imaginary numbers. We may not trust them. But they get us to correct answers. And this brings up another little secret of mathematics. If all you care about is an answer, then it’s all right to use a dubious method to get an answer.
There is a logical rigor missing in “we got away with it, I guess”. The name “imaginary numbers” tells of the disapproval of its users. We get the name from René Descartes, who was more generally discussing complex numbers. He wrote something like “in many cases no quantity exists which corresponds to what one imagines”.
John Wallis, taking a break from negative numbers and his other projects and quarrels, thought of how to represent imaginary numbers as branches off a number line. It’s a good scheme that nobody noticed at the time. Leonhard Euler envisioned matching complex numbers with points on the plane, but didn’t work out a logical basis for this. In 1797 Caspar Wessel presented a paper that described using vectors to represent complex numbers. It’s a good approach. Unfortunately that paper too sank without a trace, undiscovered for a century.
In 1806 Jean-Robert Argand wrote an “Essay on the Geometrical Interpretation of Imaginary Quantities”. Jacques Français got a copy, and published a paper describing the basics of complex numbers. He credited the essay, but noted that there was no author on the title page and asked the author to identify himself. Argand did. We started to get some good rigor behind the concept.
In 1831 William Rowan Hamilton, of Hamiltonian fame, described complex numbers using ordered pairs. Once we can define their arithmetic using the arithmetic of real numbers we have a second solid basis. More reason to trust them. Augustin-Louis Cauchy, who proved about four billion theorems of complex analysis, published a new construction of them. This used a group theory approach, a polynomial ring we denote as . I don’t have the strength to explain all that today. Matrices give us another approach. This matches complex numbers with particular two-row, two-column matrices. This turns the addition and multiplication of numbers into what Hamilton described.
And here we have some idea why mathematicians use negative numbers, and trust imaginary numbers. We are pushed toward them by convenience. Negative numbers let us work with one equation, , rather than three. (Or more than three equations, if we have to work with an x we know to be negative.) Imaginary numbers we can start with, and find answers we know to be true. And this encourages us to find reasons to trust the results. Having one line of reasoning is good. Having several lines — Argand’s geometric, Hamilton’s coordinates, Cauchy’s rings — is reassuring. We may not be able to point to an imaginary number of anything. But if we can trust our arithmetic on real numbers we can trust our arithmetic on imaginary numbers.
As I mentioned Descartes gave the name “imaginary number” to all of what we would now call “complex numbers”. Gauss published a geometric interpretation of complex numbers in 1831. And gave us the term “complex number”. Along the way he complained about the terminology, though. He noted “had +1, -1, and , instead of being called positive, negative, and imaginary (or worse still, impossible) unity, been given the names say, of direct, inverse, and lateral unity, there would hardly have been any scope for such obscurity”. I’ve never heard them term “impossible numbers”, except as an adjective.
The name of a thing doesn’t affect what it is. It can affect how we think about it, though. We can ask whether Asimov’s professor would dismiss “lateral numbers” as mysticism. Or at least as more mystical than “three” is. We can, in context, understand why Descartes thought of these as “imaginary numbers”. He saw them as something to use for the length of a calculation, and that would disappear once its use was done. We still have such concepts, things like “dummy variables” in a calculus problem. We can’t think of a use for dummy variables except to let a calculation proceed. But perhaps we’ll see things differently in four hundred years. Shall have to come back and check.
GoldenOj suggested the exponential as a topic. It seemed like a good important topic, but one that was already well-explored by other people. Then I realized I could spend time thinking about something which had bothered me.
In here I write about “the” exponential, which is a bit like writing about “the” multiplication. We can talk about and and many other such exponential functions. One secret of algebra, not appreciated until calculus (or later), is that all these different functions are a single family. Understanding one exponential function lets you understand them all. Mathematicians pick one, the exponential with base e, because we find that convenient. e itself isn’t a convenient number — it’s a bit over 2.718 — but it has some wonderful properties. When I write “the exponential” here, I am looking at this function where we look at .
This piece will have a bit more mathematics, as in equations, than usual. If you like me writing about mathematics more than reading equations, you’re hardly alone. I recommend letting your eyes drop to the next sentence, or at least the next sentence that makes sense. You should be fine.
My professor for real analysis, in grad school, gave us one of those brilliant projects. Starting from the definition of the logarithm, as an integral, prove at least thirty things. They could be as trivial as “the log of 1 is 0”. They could be as subtle as how to calculate the log of one number in a different base. It was a great project for testing what we knew about why calculus works.
And it gives me the structure to write about the exponential function. Anyone reading a pop-mathematics blog about exponentials knows them. They’re these functions that, as the independent variable grows, grow ever-faster. Or that decay asymptotically to zero. Some readers know that, if the independent variable is an imaginary number, the exponential is a complex number too. As the independent variable grows, becoming a bigger imaginary number, the exponential doesn’t grow. It oscillates, a sine wave.
That’s weird. I’d like to see why that makes sense.
To say “why” this makes sense is doomed. It’s like explaining “why” 36 is divisible by three and six and nine but not eight. It follows from what the words we have mean. The “why” I’ll offer is reasons why this strange behavior is plausible. It’ll be a mix of deductive reasoning and heuristics. This is a common blend when trying to understand why a result happens, or why we should accept it.
I’ll start with the definition of the logarithm, as used in real analysis. The natural logarithm, if you’re curious. It has a lot of nice properties. You can use this to prove over thirty things. Here it is:
The “s” is a dummy variable. You’ll never see it in actual use.
So now let me summon into existence a new function. I want to call it g. This is because I’ve worked this out before and I want to label something else as f. There is something coming ahead that’s a bit of a syntactic mess. This is the best way around it that I can find.
Here, ‘c’ is a constant. It might be real. It might be imaginary. It might be complex. I’m using ‘c’ rather than ‘a’ or ‘b’ so that I can later on play with possibilities.
So the alert reader noticed that g(x) here means “take the logarithm of x, and divide it by a constant”. So it does. I’ll need two things built off of g(x), though. The first is its derivative. That’s taken with respect to x, the only variable. Finding the derivative of an integral sounds intimidating but, happy to say, we have a theorem to make this easy. It’s the Fundamental Theorem of Calculus, and it tells us:
We can use the ‘ to denote “first derivative” if a function has only one variable. Saves time to write and is easier to type.
The other thing that I need, and the thing I really want, is the inverse of g. I’m going to call this function f(t). A more common notation would be to write but we already have in the works here. There is a limit to how many little one-stroke superscripts we need above g. This is the tradeoff to using ‘ for first derivatives. But here’s the important thing:
Here, we have some extratextual information. We know the inverse of a logarithm is an exponential. We even have a standard notation for that. We’d write
in any context besides this essay as I’ve set it up.
What I would like to know next is: what is the derivative of f(t)? This sounds impossible to know, if we’re thinking of “the inverse of this integration”. It’s not. We have the Inverse Function Theorem to come to our aid. We encounter the Inverse Function Theorem briefly, in freshman calculus. There we use it to do as many as two problems and then hide away forever from the Inverse Function Theorem. (This is why it’s not mentioned in my quick little guide to how to take derivatives.) It reappears in real analysis for this sort of contingency. The inverse function theorem tells us, if f the inverse of g, that:
That g'(f(t)) means, use the rule for g'(x), with f(t) substituted in place of ‘x’. And now we see something magic:
And that is the wonderful thing about the exponential. Its derivative is a constant times its original value. That alone would make the exponential one of mathematics’ favorite functions. It allows us, for example, to transform differential equations into polynomials. (If you want everlasting fame, albeit among mathematicians, invent a new way to turn differential equations into polynomials.) Because we could turn, say,
by supposing that f(t) has to be for the correct value of c. Then all you need do is find a value of ‘c’ that makes that last equation true.
Supposing that the answer has this convenient form may remind you of searching for the lost keys over here where the light is better. But we find so many keys in this good light. If you carry on in mathematics you will never stop seeing this trick, although it may be disguised.
In part because it’s so easy to work with. In part because exponentials like this cover so much of what we might like to do. Let’s go back to looking at the derivative of the exponential function.
There are many ways to understand what a derivative is. One compelling way is to think of it as the rate of change. If you make a tiny change in t, how big is the change in f(t)? So what is the rate of change here?
We can pose this as a pretend-physics problem. This lets us use our physical intuition to understand things. This also is the transition between careful reasoning and ad-hoc arguments. Imagine a particle that, at time ‘t’, is at the position . What is its velocity? That’s the first derivative of its position, so, .
If we are using our physics intuition to understand this it helps to go all the way. Where is the particle? Can we plot that? … Sure. We’re used to matching real numbers with points on a number line. Go ahead and do that. Not to give away spoilers, but we will want to think about complex numbers too. Mathematicians are used to matching complex numbers with points on the Cartesian plane, though. The real part of the complex number matches the horizontal coordinate. The imaginary part matches the vertical coordinate.
So how is this particle moving?
To say for sure we need some value of t. All right. Pick your favorite number. That’s our t. f(t) follows from whatever your t was. What’s interesting is that the change also depends on c. There’s a couple possibilities. Let me go through them.
First, what if c is zero? Well, then the definition of g(t) was gibberish and we can’t have that. All right.
What if c is a positive real number? Well, then, f'(t) is some positive multiple of whatever f(t) was. The change is “away from zero”. The particle will push away from the origin. As t increases, f(t) increases, so it pushes away faster and faster. This is exponential growth.
What if c is a negative real number? Well, then, f'(t) is some negative multiple of whatever f(t) was. The change is “towards zero”. The particle pulls toward the origin. But the closer it gets the more slowly it approaches. If t is large enough, f(t) will be so tiny that is too small to notice. The motion declines into imperceptibility.
What if c is an imaginary number, though?
So let’s suppose that c is equal to some real number b times , where .
I need some way to describe what value f(t) has, for whatever your pick of t was. Let me say it’s equal to , where and are some real numbers whose value I don’t care about. What’s important here is that .
And, then, what’s the first derivative? The magnitude and direction of motion? That’s easy to calculate; it’ll be . This is an interesting complex number. Do you see what’s interesting about it? I’ll get there next paragraph.
So f(t) matches some point on the Cartesian plane. But f'(t), the direction our particle moves with a small change in t, is another poiat whatever complex number f'(t) is as another point on the plane. The line segment connecting the origin to f(t) is perpendicular to the one connecting the origin to f'(t). The ‘motion’ of this particle is perpendicular to its position. And it always is. There’s several ways to show this. An easy one is to just pick some values for and and b and try it out. This proof is not rigorous, but it is quick and convincing.
If your direction of motion is always perpendicular to your position, then what you’re doing is moving in a circle around the origin. This we pick up in physics, but it applies to the pretend-particle moving here. The exponentials of and and will all be points on a locus that’s a circle centered on the origin. The values will look like the cosine of an angle plus times the sine of an angle.
And there, I think, we finally get some justification for the exponential of an imaginary number being a complex number. And for why exponentials might have anything to do with cosines and sines.
You might ask what if c is a complex number, if it’s equal to for some real numbers a and b. In this case, you get spirals as t changes. If a is positive, you get points spiralling outward as t increases. If a is negative, you get points spiralling inward toward zero as t increases. If b is positive the spirals go counterclockwise. If b is negative the spirals go clockwise. is the same as .
This does depend on knowing the exponential of a sum of terms, such as of , is equal to the product of the exponential of those terms. This is a good thing to have in your portfolio. If I remember right, it comes in around the 25th thing. It’s an easy result to have if you already showed something about the logarithms of products.
A throwaway joke somewhere in The Hitchhiker’s Guide To The Galaxy has Marvin The Paranoid Android grumble that he’s invented a square root for minus one. Marvin’s gone and rejiggered all of mathematics while waiting for something better to do. Nobody cares. It reminds us while Douglas Adams established much of a particular generation of nerd humor, he was not himself a nerd. The nerds who read The Hitchhiker’s Guide To The Galaxy obsessively know we already did that, centuries ago. Marvin’s creation was as novel as inventing “one-half”. (It may be that Adams knew, and intended Marvin working so hard on the already known as the joke.)
Anyone who’d read a pop mathematics blog like this likely knows the rough story of complex numbers in Western mathematics. The desire to find roots of polynomials. The discovery of formulas to find roots. Polynomials with numbers whose formulas demanded the square roots of negative numbers. And the discovery that sometimes, if you carried on as if the square root of a negative number made sense, the ugly terms vanished. And you got correct answers in the end. And, eventually, mathematicians relented. These things were unsettling enough to get unflattering names. To call a number “imaginary” may be more pejorative than even “negative”. It hints at the treatment of these numbers as falsework, never to be shown in the end. To call the sum of a “real” number and an “imaginary” “complex” is to warn. An expert might use these numbers only with care and deliberation. But we can count them as numbers.
I mentioned when writing about quaternions how when I learned of complex numbers I wanted to do the same trick again. My suspicion is many mathematicians do. The example of complex numbers teases us with the possibilites of other numbers. If we’ve defined to be “a number that, squared, equals minus one”, what next? Could we define a ? How about a ? Maybe something else? An arc-cosine of ?
You can try any of these. They turn out to be redundant. The real numbers and already let you describe any of those new numbers. You might have a flash of imagination: what if there were another number that, squared, equalled minus one, and that wasn’t equal to ? Numbers that look like ? Here, and later on, a and b and c are some real numbers. means “multiply the real number b by whatever is”, and we trust that this makes sense. There’s a similar setup for c and . And if you just try that, with , you get some interesting new mathematics. Then you get stuck on what the product of these two different square roots should be.
If you think of that. If all you think of is addition and subtraction and maybe multiplication by a real number? works fine. You only spot trouble if you happen to do multiplication. Granted, multiplication is to us not an exotic operation. Take that as a warning, though, of how trouble could develop. How do we know, say, that complex numbers are fine as long as you don’t try to take the log of the haversine of one of them, or some other obscurity? And that then they produce gibberish? Or worse, produce that most dread construct, a contradiction?
Here I am indebted to an essay that ten minutes ago I would have sworn was in one of the two books I still have out from the university library. I’m embarrassed to learn my error. It was about the philosophy of complex numbers and it gave me fresh perspectives. When the university library reopens for lending I will try to track back through my borrowing and find the original. I suspect, without confirming, that it may have been in Reuben Hersh’s What Is Mathematics, Really?.
The insight is that we can think of complex numbers in several ways. One fruitful way is to match complex numbers with points in a two-dimensional space. It’s common enough to pair, for example, the number with the point at Cartesian coordinates . Mathematicians do this so often it can take a moment to remember that is just a convention. And there is a common matching between points in a Cartesian coordinate system and vectors. Chaining together matches like this can worry. Trust that we believe our matches are sound. Then we notice that adding two complex numbers does the same work as adding ordered coordinate pairs. If we trust that adding coordinate pairs makes sense, then we need to accept that adding complex numbers makes sense. Adding coordinate pairs is the same work as adding real numbers. It’s just a lot of them. So we’re lead to trust that if addition for real numbers works then addition for complex numbers does.
Multiplication looks like a mess. A different perspective helps us. A different way to look at where point are on the plane is to use polar coordinates. That is, the distance a point is from the origin, and the angle between the positive x-axis and the line segment connecting the origin to the point. In this format, multiplying two complex numbers is easy. Let the first complex number have polar coordinates . Let the second have polar coordinates . Their product, by the rules of complex numbers, is a point with polar coordinates . These polar coordinates are real numbers again. If we trust addition and multiplication of real numbers, we can trust this for complex numbers.
If we’re confident in adding complex numbers, and confident in multiplying them, then … we’re in quite good shape. If we can add and multiply, we can do polynomials. And everything is polynomials.
We might feel suspicious yet. Going from complex numbers to points in space is calling on our geometric intuitions. That might be fooling ourselves. Can we find a different rationalization? The same result by several different lines of reasoning makes the result more believable. Is there a rationalization for complex numbers that never touches geometry?
Adding matrices is compelling. It’s the same work as adding ordered pairs of numbers. Multiplying matrices is tedious, though it’s not so bad for matrices this small. And it’s all done with real-number multiplication and addition. If we trust that the real numbers work, we can trust complex numbers do. If we can show that our new structure can be understood as a configuration of the old, we convince ourselves the new structure is meaningful.
The process by which we learn to trust them as numbers, guides us to learning how to trust any new mathematical structure. So here is a new thing that complex numbers can teach us, years after we have learned how to divide them. Do not attempt to divide complex numbers. That’s too much work.
Today’s A To Z term is my pick again. So I choose the Julia Set. This is named for Gaston Julia, one of the pioneers in chaos theory and fractals. He was born earlier than you imagine. No, earlier than that: he was born in 1893.
The early 20th century saw amazing work done. We think of chaos theory and fractals as modern things, things that require vast computing power to understand. The computers help, yes. But the foundational work was done more than a century ago. Some of these pioneering mathematicians may have been able to get some numerical computing done. But many did not. They would have to do the hard work of thinking about things which they could not visualize. Things which surely did not look like they imagined.
We think of things as moving. Even static things we consider as implying movement. Else we’d think it odd to ask, “Where does that road go?” This carries over to abstract things, like mathematical functions. A function is a domain, a range, and a rule matching things in the domain to things in the range. It “moves” things as much as a dictionary moves words.
Yet we still think of a function as expressing motion. A common way for mathematicians to write functions uses little arrows, and describes what’s done as “mapping”. We might write . This is a general idea. We’re expressing that it maps things in the set D to things in the set R. We can use the notation to write something more specific. If ‘z’ is in the set D, we might write . This describes the rule that matches things in the domain to things in the range. represents the evaluation of this rule at a specific point, the one where the independent variable has the value ‘2’. represents the evaluation of this rule at a specific point without committing to what that point is. represents a collection of points. It’s the set you get by evaluating the rule at every point in D.
And it’s not bad to think of motion. Many functions are models of things that move. Particles in space. Fluids in a room. Populations changing in time. Signal strengths varying with a sensor’s position. Often we’ll calculate the development of something iteratively, too. If the domain and the range of a function are the same set? There’s no reason that we can’t take our z, evaluate f(z), and then take whatever that thing is and evaluate f(f(z)). And again. And again.
My age cohort, at least, learned to do this almost instinctively when we discovered you could take the result on a calculator and hit a function again. Calculate something and keep hitting square root; you get a string of numbers that eventually settle on 1. Or you started at zero. Calculate something and keep hitting square; you settle at either 0, 1, or grow to infinity. Hitting sine over and over … well, that was interesting since you might settle on 0 or some other, weird number. Same with tangent. Cosine you wouldn’t settle down to zero.
Serious mathematicians look at this stuff too, though. Take any set ‘D’, and find what its image is, f(D). Then iterate this, figuring out what f(f(D)) is. Then f(f(f(D))). f(f(f(f(D)))). And so on. What happens if you keep doing this? Like, forever?
We can say some things, at least. Even without knowing what f is. There could be a part of D that all these many iterations of f will send out to infinity. There could be a part of D that all these many iterations will send to some fixed point. And there could be a part of D that just keeps getting shuffled around without ever finishing.
Some of these might not exist. Like, doesn’t have any fixed points or shuffled-around points. It sends everything off to infinity. has only a fixed point; nothing from it goes off to infinity and nothing’s shuffled back and forth. has a fixed point and a lot of points that shuffle back and forth.
Thinking about these fixed points and these shuffling points gets us Julia Sets. These sets are the fixed points and shuffling-around points for certain kinds of functions. These functions are ones that have domain and range of the complex-valued numbers. Complex-valued numbers are the sum of a real number plus an imaginary number. A real number is just what it says on the tin. An imaginary number is a real number multiplied by . What is ? It’s the imaginary unit. It has the neat property that . That’s all we need to know about it.
Oh, also, zero times is zero again. So if you really want, you can say all real numbers are complex numbers; they’re just themselves plus . Complex-valued functions are worth a lot of study in their own right. Better, they’re easier to study (at the introductory level) than real-valued functions are. This is such a relief to the mathematics major.
And now let me explain some little nagging weird thing. I’ve been using ‘z’ to represent the independent variable here. You know, using it as if it were ‘x’. This is a convention mathematicians use, when working with complex-valued numbers. An arbitrary complex-valued number tends to be called ‘z’. We haven’t forgotten x, though. We just in this context use ‘x’ to mean “the real part of z”. We also use “y” to carry information about the imaginary part of z. When we write ‘z’ we hold in trust an ‘x’ and ‘y’ for which . This all comes in handy.
But we still don’t have Julia Sets for every complex-valued function. We need it to be a rational function. The name evokes rational numbers, but that doesn’t seem like much guidance. is a rational function. It seems too boring to be worth studying, though, and it is. A “rational function” is a function that’s one polynomial divided by another polynomial. This whether they’re real-valued or complex-valued polynomials.
So. Start with an ‘f’ that’s one complex-valued polynomial divided by another complex-valued polynomial. Start with the domain D, all of the complex-valued numbers. Find f(D). And f(f(D)). And f(f(f(D))). And so on. If you iterated this ‘f’ without limit, what’s the set of points that never go off to infinity? That’s the Julia Set for that function ‘f’.
There are some famous Julia sets, though. There are the Julia sets that we heard about during the great fractal boom of the 1980s. This was when computers got cheap enough, and their graphic abilities good enough, to automate the calculation of points in these sets. At least to approximate the points in these sets. And these are based on some nice, easy-to-understand functions. First, you have to pick a constant C. This C is drawn from the complex-valued numbers. But that can still be, like, ½, if that’s what interests you. For whatever your C is? Define this function:
And that’s it. Yes, this is a rational function. The numerator function is . The denominator function is .
This produces many different patterns. If you picked C = 0, you get a circle. Good on you for starting out with something you could double-check. If you picked C = -2? You get a long skinny line, again, easy enough to check. If you picked C = -1? Well, now you have a nice interesting weird shape, several bulging ovals with peninsulas of other bulging ovals all over. Pick other numbers. Pick numbers with interesting imaginary components. You get pinwheels. You get jagged streaks of lightning. You can even get separate islands, whole clouds of disjoint threatening-looking blobs.
There is some guessing what you’ll get. If you work out a Julia Set for a particular C, you’ll see a similar-looking Julia Set for a different C that’s very close to it. This is a comfort.
You can create a Julia Set for any rational function. I’ve only ever seen anyone actually do it for functions that look like what we already had. . Sometimes . I suppose once, in high school, I might have tried but I don’t remember what it looked like. If someone’s done, say, please write in and let me know what it looks like.
The Julia Set has a famous partner. Maybe the most famous fractal of them all, the Mandelbrot Set. That’s the strange blobby sea surrounded by lightning bolts that you see on the cover of every pop mathematics book from the 80s and 90s. If a C gives us a Julia Set that’s one single, contiguous patch? Then that C is in the Mandelbrot Set. Also vice-versa.
The ideas behind these sets are old. Julia’s paper about the iterations of rational functions first appeared in 1918. Julia died in 1978, the same year that the first computer rendering of the Mandelbrot set was done. I haven’t been able to find whether that rendering existed before his death. Nor have I decided which I would think the better sequence.
I remain fascinated with Józef Maria Hoëne-Wronski’s attempted definition of π. It had started out like this:
And I’d translated that into something that modern mathematicians would accept without flinching. That is to evaluate the limit of a function that looks like this:
So. I don’t want to deal with that f(x) as it’s written. I can make it better. One thing that bothers me is seeing the complex number raised to a power. I’d like to work with something simpler than that. And I can’t see that number without also noticing that I’m subtracting from it raised to the same power. and are a “conjugate pair”. It’s usually nice to see those. It often hints at ways to make your expression simpler. That’s one of those patterns you pick up from doing a lot of problems as a mathematics major, and that then look like magic to the lay audience.
Here’s the first way I figure to make my life simpler. It’s in rewriting that and stuff so it’s simpler. It’ll be simpler by using exponentials. Shut up, it will too. I get there through Gauss, Descartes, and Euler.
At least I think it was Gauss who pointed out how you can match complex-valued numbers with points on the two-dimensional plane. On a sheet of graph paper, if you like. The number matches to the point with x-coordinate 1, y-coordinate 1. The number matches to the point with x-coordinate 1, y-coordinate -1. Yes, yes, this doesn’t sound like much of an insight Gauss had, but his work goes on. I’m leaving it off here because that’s all that I need for right now.
So these two numbers that offended me I can think of as points. They have Cartesian coordinates (1, 1) and (1, -1). But there’s never only one coordinate system for something. There may be only one that’s good for the problem you’re doing. I mean that makes the problem easier to study. But there are always infinitely many choices. For points on a flat surface like a piece of paper, and where the points don’t represent any particular physics problem, there’s two good choices. One is the Cartesian coordinates. In it you refer to points by an origin, an x-axis, and a y-axis. How far is the point from the origin in a direction parallel to the x-axis? (And in which direction? This gives us a positive or a negative number) How far is the point from the origin in a direction parallel to the y-axis? (And in which direction? Same positive or negative thing.)
The other good choice is polar coordinates. For that we need an origin and a positive x-axis. We refer to points by how far they are from the origin, heedless of direction. And then to get direction, what angle the line segment connecting the point with the origin makes with the positive x-axis. The first of these numbers, the distance, we normally label ‘r’ unless there’s compelling reason otherwise. The other we label ‘θ’. ‘r’ is always going to be a positive number or, possibly, zero. ‘θ’ might be any number, positive or negative. By convention, we measure angles so that positive numbers are counterclockwise from the x-axis. I don’t know why. I guess it seemed less weird for, say, the point with Cartesian coordinates (0, 1) to have a positive angle rather than a negative angle. That angle would be , because mathematicians like radians more than degrees. They make other work easier.
So. The point corresponds to the polar coordinates and . The point corresponds to the polar coordinates and . Yes, the θ coordinates being negative one times each other is common in conjugate pairs. Also, if you have doubts about my use of the word “the” before “polar coordinates”, well-spotted. If you’re not sure about that thing where ‘r’ is not negative, again, well-spotted. I intend to come back to that.
With the polar coordinates ‘r’ and ‘θ’ to describe a point I can go back to complex numbers. I can match the point to the complex number with the value given by , where ‘e’ is that old 2.71828something number. Superficially, this looks like a big dumb waste of time. I had some problem with imaginary numbers raised to powers, so now, I’m rewriting things with a number raised to imaginary powers. Here’s why it isn’t dumb.
It’s easy to raise a number written like this to a power. raised to the n-th power is going to be equal to . (Because and we’re going to go ahead and assume this stays true if ‘b’ is a complex-valued number. It does, but you’re right to ask how we know that.) And this turns into raising a real-valued number to a power, which we know how to do. And it involves dividing a number by that power, which is also easy.
And we can get back to something that looks like too. That is, something that’s a real number plus times some real number. This is through one of the many Euler’s Formulas. The one that’s relevant here is that for any real number ‘φ’. So, that’s true also for ‘θ’ times ‘n’. Or, looking to where everybody knows we’re going, also true for ‘θ’ divided by ‘x’.
OK, on to the people so anxious about all this. I talked about the angle made between the line segment that connects a point and the origin and the positive x-axis. “The” angle. “The”. If that wasn’t enough explanation of the problem, mention how your thinking’s done a 360 degree turn and you see it different now. In an empty room, if you happen to be in one. Your pedantic know-it-all friend is explaining it now. There’s an infinite number of angles that correspond to any given direction. They’re all separated by 360 degrees or, to a mathematician, 2π.
And more. What’s the difference between going out five units of distance in the direction of angle 0 and going out minus-five units of distance in the direction of angle -π? That is, between walking forward five paces while facing east and walking backward five paces while facing west? Yeah. So if we let ‘r’ be negative we’ve got twice as many infinitely many sets of coordinates for each point.
This complicates raising numbers to powers. θ times n might match with some point that’s very different from θ-plus-2-π times n. There might be a whole ring of powers. This seems … hard to work with, at least. But it’s, at heart, the same problem you get thinking about the square root of 4 and concluding it’s both plus 2 and minus 2. If you want “the” square root, you’d like it to be a single number. At least if you want to calculate anything from it. You have to pick out a preferred θ from the family of possible candidates.
For me, that’s whatever set of coordinates has ‘r’ that’s positive (or zero), and that has ‘θ’ between -π and π. Or between 0 and 2π. It could be any strip of numbers that’s 2π wide. Pick what makes sense for the problem you’re doing. It’s going to be the strip from -π to π. Perhaps the strip from 0 to 2π.
What this all amounts to is that I can turn this:
without changing its meaning any. Raising a number to the one-over-x power looks different from raising it to the n power. But the work isn’t different. The function I wrote out up there is the same as this function:
I can’t look at that number, , sitting there, multiplied by two things added together, and leave that. (OK, subtracted, but same thing.) I want to something something distributive law something and that gets us here:
Also, yeah, that square root of two raised to a power looks weird. I can turn that square root of two into “two to the one-half power”. That gets to this rewrite:
And then. Those parentheses. e raised to an imaginary number minus e raised to minus-one-times that same imaginary number. This is another one of those magic tricks that mathematicians know because they see it all the time. Part of what we know from Euler’s Formula, the one I waved at back when I was talking about coordinates, is this:
That’s good for any real-valued φ. For example, it’s good for the number . And that means we can rewrite that function into something that, finally, actually looks a little bit simpler. It looks like this:
And that’s the function whose limit I want to take at ∞. No, really.
A couple weeks ago I shared a fascinating formula for π. I got it from Carl B Boyer’s The History of Calculus and its Conceptual Development. He got it from Józef Maria Hoëne-Wronski, early 19th-century Polish mathematician. His idea was that an absolute, culturally-independent definition of π would come not from thinking about circles and diameters but rather this formula:
Now, this formula is beautiful, at least to my eyes. It’s also gibberish. At least it’s ungrammatical. Mathematicians don’t like to write stuff like “four times infinity”, at least not as more than a rough draft on the way to a real thought. What does it mean to multiply four by infinity? Is arithmetic even a thing that can be done on infinitely large quantities? Among Wronski’s problems is that they didn’t have a clear answer to this. We’re a little more advanced in our mathematics now. We’ve had a century and a half of rather sound treatment of infinitely large and infinitely small things. Can we save Wronski’s work?
Start with the easiest thing. I’m offended by those bits. Well, no, I’m more unsettled by them. I would rather have in there. The difference? … More taste than anything sound. I prefer, if I can get away with it, using the square root symbol to mean the positive square root of the thing inside. There is no positive square root of -1, so, pfaugh, away with it. Mere style? All right, well, how do you know whether those terms are meant to be or its additive inverse, ? How do you know they’re all meant to be the same one? See? … As with all style preferences, it’s impossible to be perfectly consistent. I’m sure there are times I accept a big square root symbol over a negative or a complex-valued quantity. But I’m not forced to have it here so I’d rather not. First step:
Also dividing by is the same as multiplying by so the second easy step gives me:
Now the hard part. All those infinities. I don’t like multiplying by infinity. I don’t like dividing by infinity. I really, really don’t like raising a quantity to the one-over-infinity power. Most mathematicians don’t. We have a tool for dealing with this sort of thing. It’s called a “limit”.
Mathematicians developed the idea of limits over … well, since they started doing mathematics. In the 19th century limits got sound enough that we still trust the idea. Here’s the rough way it works. Suppose we have a function which I’m going to name ‘f’ because I have better things to do than give functions good names. Its domain is the real numbers. Its range is the real numbers. (We can define functions for other domains and ranges, too. Those definitions look like what they do here.)
I’m going to use ‘x’ for the independent variable. It’s any number in the domain. I’m going to use ‘a’ for some point. We want to know the limit of the function “at a”. ‘a’ might be in the domain. But — and this is genius — it doesn’t have to be. We can talk sensibly about the limit of a function at some point where the function doesn’t exist. We can say “the limit of f at a is the number L”. I hadn’t introduced ‘L’ into evidence before, but … it’s a number. It has some specific set value. Can’t say which one without knowing what ‘f’ is and what its domain is and what ‘a’ is. But I know this about it.
Pick any error margin that you like. Call it ε because mathematicians do. However small this (positive) number is, there’s at least one neighborhood in the domain of ‘f’ that surrounds ‘a’. Check every point in that neighborhood other than ‘a’. The value of ‘f’ at all those points in that neighborhood other than ‘a’ will be larger than L – ε and smaller than L + ε.
Yeah, pause a bit there. It’s a tricky definition. It’s a nice common place to crash hard in freshman calculus. Also again in Intro to Real Analysis. It’s not just you. Perhaps it’ll help to think of it as a kind of mutual challenge game. Try this.
You draw whatever error bar, as big or as little as you like, around ‘L’.
But Ialways respond by drawing some strip around ‘a’.
You then pick absolutely any ‘x’ inside my strip, other than ‘a’.
Is f(x) always within the error bar you drew?
Suppose f(x) is. Suppose that you can pick any error bar however tiny, and I can answer with a strip however tiny, and every single ‘x’ inside my strip has an f(x) within your error bar … then, L is the limit of f at a.
Again, yes, tricky. But mathematicians haven’t found a better definition that doesn’t break something mathematicians need.
To write “the limit of f at a is L” we use the notation:
The ‘lim’ part probably makes perfect sense. And you can see where ‘f’ and ‘a’ have to enter into it. ‘x’ here is a “dummy variable”. It’s the falsework of the mathematical expression. We need some name for the independent variable. It’s clumsy to do without. But it doesn’t matter what the name is. It’ll never appear in the answer. If it does then the work went wrong somewhere.
What I want to do, then, is turn all those appearances of ‘∞’ in Wronski’s expression into limits of something at infinity. And having just said what a limit is I have to do a patch job. In that talk about the limit at ‘a’ I talked about a neighborhood containing ‘a’. What’s it mean to have a neighborhood “containing ∞”?
The answer is exactly what you’d think if you got this question and were eight years old. The “neighborhood of infinity” is “all the big enough numbers”. To make it rigorous, it’s “all the numbers bigger than some finite number that let’s just call N”. So you give me an error bar around ‘L’. I’ll give you back some number ‘N’. Every ‘x’ that’s bigger than ‘N’ has f(x) inside your error bars. And note that I don’t have to say what ‘f(∞)’ is or even commit to the idea that such a thing can be meaningful. I only ever have to think directly about values of ‘f(x)’ where ‘x’ is some real number.
So! First, let me rewrite Wronski’s formula as a function, defined on the real numbers. Then I can replace each ∞ with the limit of something at infinity and … oh, wait a minute. There’s three ∞ symbols there. Do I need three limits?
Ugh. Yeah. Probably. This can be all right. We can do multiple limits. This can be well-defined. It can also be a right pain. The challenge-and-response game needs a little modifying to work. You still draw error bars. But I have to draw multiple strips. One for each of the variables. And every combination of values inside all those strips has give an ‘f’ that’s inside your error bars. There’s room for great mischief. You can arrange combinations of variables that look likely to break ‘f’ outside the error bars.
So. Three independent variables, all taking a limit at ∞? That’s not guaranteed to be trouble, but I’d expect trouble. At least I’d expect something to keep the limit from existing. That is, we could find there’s no number ‘L’ so that this drawing-neighborhoods thing works for all three variables at once.
Let’s try. One of the ∞ will be a limit of a variable named ‘x’. One of them a variable named ‘y’. One of them a variable named ‘z’. Then:
Without doing the work, my hunch is: this is utter madness. I expect it’s probably possible to make this function take on many wildly different values by the judicious choice of ‘x’, ‘y’, and ‘z’. Particularly ‘y’ and ‘z’. You maybe see it already. If you don’t, you maybe see it now that I’ve said you maybe see it. If you don’t, I’ll get there, but not in this essay. But let’s suppose that it’s possible to make f(x, y, z) take on wildly different values like I’m getting at. This implies that there’s not any limit ‘L’, and therefore Wronski’s work is just wrong.
Thing is, Wronski wouldn’t have thought that. Deep down, I am certain, he thought the three appearances of ∞ were the same “value”. And that to translate him fairly we’d use the same name for all three appearances. So I am going to do that. I shall use ‘x’ as my variable name, and replace all three appearances of ∞ with the same variable and a common limit. So this gives me the single function:
And then I need to take the limit of this at ∞. If Wronski is right, and if I’ve translated him fairly, it’s going to be π. Or something easy to get π from.
Today Gaurish, of For the love of Mathematics, gives me the last subject for my Summer 2017 A To Z sequence. And also my greatest challenge: the Zeta function. The subject comes to all pop mathematics blogs. It comes to all mathematics blogs. It’s not difficult to say something about a particular zeta function. But to say something at all original? Let’s watch.
The spring semester of my sophomore year I had Intro to Complex Analysis. Monday Wednesday 7:30; a rare evening class, one of the few times I’d eat dinner and then go to a lecture hall. There I discovered something strange and wonderful. Complex Analysis is a far easier topic than Real Analysis. Both are courses about why calculus works. But why calculus for complex-valued numbers works is a much easier problem than why calculus for real-valued numbers works. It’s dazzling. Part of this is that Complex Analysis, yes, builds on Real Analysis. So Complex can take for granted some things that Real has to prove. I didn’t mind. Given the way I crashed through Intro to Real Analysis I was glad for a subject that was, relatively, a breeze.
As we worked through Complex Variables and Applications so many things, so very many things, got to be easy. The basic unit of complex analysis, at least as we young majors learned it, was in contour integrals. These are integrals whose value depends on the values of a function on a closed loop. The loop is in the complex plane. The complex plane is, well, your ordinary plane. But we say the x-coordinate and the y-coordinate are parts of the same complex-valued number. The x-coordinate is the real-valued part. The y-coordinate is the imaginary-valued part. And we call that summation ‘z’. In complex-valued functions ‘z’ serves the role that ‘x’ does in normal mathematics.
So a closed loop is exactly what you think. Take a rubber band and twist it up and drop it on the table. That’s a closed loop. Suppose you want to integrate a function, ‘f(z)’. If you can always take its derivative on this loop and on the interior of that loop, then its contour integral is … zero. No matter what the function is. As long as it’s “analytic”, as the terminology has it. Yeah, we were all stunned into silence too. (Granted, mathematics classes are usually quiet, since it’s hard to get a good discussion going. Plus many of us were in post-dinner digestive lulls.)
Integrating regular old functions of real-valued numbers is this tedious process. There’s sooooo many rules and possibilities and special cases to consider. There’s sooooo many tricks that get you the integrals of some functions. And then here, with complex-valued integrals for analytic functions, you know the answer before you even look at the function.
As you might imagine, since this is only page 113 of a 341-page book there’s more to it. Most functions that anyone cares about aren’t analytic. At least they’re not analytic everywhere inside regions that might be interesting. There’s usually some points where an interesting function ‘f(z)’ is undefined. We call these “singularities”. Yes, like starships are always running into. Only we rarely get propelled into other universes or other times or turned into ghosts or stuff like that.
So much of the rest of the course turns into ways to avoid singularities. Sometimes you can spackle them over. This is when the function happens not to be defined somewhere, but you can see what it ought to be. Sometimes you have to do something more. This turns into a search for “removable” singularities. And this does something so brilliant it looks illicit. You modify your closed loop, so that it comes up very close, as close as possible, to the singularity, but studiously avoids it. Follow this game of I’m-not-touching-you right and you can turn your integral into two parts. One is the part that’s equal to zero. The other is the part that’s a constant times whatever the function is at the singularity you’re removing. And that ought to be easy to find the value for. (Being able to find a function’s value doesn’t mean you can find its derivative.)
Those tricks were hard to master. Not because they were hard. Because they were easy, in a context where we expected hard. But after that we got into how to move singularities. That is, how to do a change of variables that moved the singularities to where they’re more convenient for some reason. How could this be more convenient? Because of chapter five, “Series”. In regular old calculus we learn how to approximate well-behaved functions with polynomials. In complex-variable calculus, we learn the same thing all over again. They’re polynomials of complex-valued variables, but it’s the same sort of thing. And not just polynomials, but things that look like polynomials except they’re powers of instead. These open up new ways to approximate functions, and to remove singularities from functions.
And then we get into transformations. These are about turning a problem that’s hard into one that’s easy. Or at least different. They’re a change of variable, yes. But they also change what exactly the function is. This reshuffles the problem. Makes for a change in singularities. Could make ones that are easier to work with.
One of the useful, and so common, transforms is called the Laplace-Stieltjes Transform. (“Laplace” is said like you might guess. “Stieltjes” is said, or at least we were taught to say it, like “Stilton cheese” without the “ton”.) And it tends to create functions that look like a series, the sum of a bunch of terms. Infinitely many terms. Each of those terms looks like a number times another number raised to some constant times ‘z’. As the course came to its conclusion, we were all prepared to think about these infinite series. Where singularities might be. Which of them might be removable.
These functions, these results of the Laplace-Stieltjes Transform, we collectively call ‘zeta functions’. There are infinitely many of them. Some of them are relatively tame. Some of them are exotic. One of them is world-famous. Professor Walsh — I don’t mean to name-drop, but I discovered the syllabus for the course tucked in the back of my textbook and I’m delighted to rediscover it — talked about it.
That world-famous one is, of course, the Riemann Zeta function. Yes, that same Riemann who keeps turning up, over and over again. It looks simple enough. Almost tame. Take the counting numbers, 1, 2, 3, and so on. Take your ‘z’. Raise each of the counting numbers to that ‘z’. Take the reciprocals of all those numbers. Add them up. What do you get?
A mass of fascinating results, for one. Functions you wouldn’t expect are concealed in there. There’s strips where the real part is zero. There’s strips where the imaginary part is zero. There’s points where both the real and imaginary parts are zero. We know infinitely many of them. If ‘z’ is -2, for example, the sum is zero. Also if ‘z’ is -4. -6. -8. And so on. These are easy to show, and so are dubbed ‘trivial’ zeroes. To say some are ‘trivial’ is to say that there are others that are not trivial. Where are they?
Professor Walsh explained. We know of many of them. The nontrivial zeroes we know of all share something in common. They have a real part that’s equal to 1/2. There’s a zero that’s at about the number . Also at . There’s one at about . Also about . (There’s a symmetry, you maybe guessed.) Every nontrivial zero we’ve found has a real component that’s got the same real-valued part. But we don’t know that they all do. Nobody does. It is the Riemann Hypothesis, the great unsolved problem of mathematics. Much more important than that Fermat’s Last Theorem, which back then was still merely a conjecture.
What a prospect! What a promise! What a way to set us up for the final exam in a couple of weeks.
I had an inspiration, a kind of scheme of showing that a nontrivial zero couldn’t be within a given circular contour. Make the size of this circle grow. Move its center farther away from the z-coordinate to match. Show there’s still no nontrivial zeroes inside. And therefore, logically, since I would have shown nontrivial zeroes couldn’t be anywhere but on this special line, and we know nontrivial zeroes exist … I leapt enthusiastically into this project. A little less enthusiastically the next day. Less so the day after. And on. After maybe a week I went a day without working on it. But came back, now and then, prodding at my brilliant would-be proof.
The Riemann Zeta function was not on the final exam, which I’ve discovered was also tucked into the back of my textbook. It asked more things like finding all the singular points and classifying what kinds of singularities they were for functions like instead. If the syllabus is accurate, we got as far as page 218. And I’m surprised to see the professor put his e-mail address on the syllabus. It was merely “bwalsh@math”, but understand, the Internet was a smaller place back then.
I finished the course with an A-, but without answering any of the great unsolved problems of mathematics.
Once more do I have Gaurish to thank for the day’s topic. (There’ll be two more chances this week, providing I keep my writing just enough ahead of deadline.) This one doesn’t touch category theory or topology.
I keep touching on group theory here. It’s a field that’s about what kinds of things can work like arithmetic does. A group is a set of things that you can add together. At least, you can do something that works like adding regular numbers together does. A ring is a set of things that you can add and multiply together.
There are many interesting rings. Here’s one. It’s called the Gaussian Integers. They’re made of numbers we can write as , where ‘a’ and ‘b’ are some integers. is what you figure, that number that multiplied by itself is -1. These aren’t the complex-valued numbers, you notice, because ‘a’ and ‘b’ are always integers. But you add them together the way you add complex-valued numbers together. That is, plus is the number . And you multiply them the way you multiply complex-valued numbers together. That is, times is the number .
We created something that has addition and multiplication. It picks up subtraction for free. It doesn’t have division. We can create rings that do, but this one won’t, any more than regular old integers have division. But we can ask what other normal-arithmetic-like stuff these Gaussian integers do have. For instance, can we factor numbers?
This isn’t an obvious one. No, we can’t expect to be able to divide one Gaussian integer by another. But we can’t expect to divide a regular old integer by another, not and get an integer out of it. That doesn’t mean we can’t factor them. It means we divide the regular old integers into a couple classes. There’s prime numbers. There’s composites. There’s the unit, the number 1. There’s zero. We know prime numbers; they’re 2, 3, 5, 7, and so on. Composite numbers are the ones you get by multiplying prime numbers together: 4, 6, 8, 9, 10, and so on. 1 and 0 are off on their own. Leave them there. We can’t divide any old integer by any old integer. But we can say an integer is equal to this string of prime numbers multiplied together. This gives us a handle by which we can prove a lot of interesting results.
We can do the same with Gaussian integers. We can divide them up into Gaussian primes, Gaussian composites, units, and zero. The words mean what they mean for regular old integers. A Gaussian composite can be factored into the multiples of Gaussian primes. Gaussian primes can’t be factored any further.
If we know what the prime numbers are for regular old integers we can tell whether something’s a Gaussian prime. Admittedly, knowing all the prime numbers is a challenge. But a Gaussian integer will be prime whenever a couple simple-to-test conditions are true. First is if ‘a’ and ‘b’ are both not zero, but is a prime number. So, for example, is a Gaussian prime.
You might ask, hey, would also be a Gaussian prime? That’s also got components that are integers, and the squares of them add up to a prime number (41). Well-spotted. Gaussian primes appear in quartets. If is a Gaussian prime, so is . And so are and .
There’s another group of Gaussian primes. These are the numbers where either ‘a’ or ‘b’ is zero. Then the other one is, if positive, three more than a whole multiple of four. If it’s negative, then it’s three less than a whole multiple of four. So ‘3’ is a Gaussian prime, as is -3, and as is and so is .
This has strange effects. Like, ‘3’ is a prime number in the regular old scheme of things. It’s also a Gaussian prime. But familiar other prime numbers like ‘2’ and ‘5’? Not anymore. Two is equal to ; both of those terms are Gaussian primes. Five is equal to . There are similar shocking results for 13. But, roughly, the world of composites and prime numbers translates into Gaussian composites and Gaussian primes. In this slightly exotic structure we have everything familiar about factoring numbers.
You might have some nagging thoughts. Like, sure, two is equal to . But isn’t it also equal to ? One of the important things about prime numbers is that every composite number is the product of a unique string of prime numbers. Do we have to give that up for Gaussian integers?
Good nag. But no; the doubt is coming about because you’ve forgotten the difference between “the positive integers” and “all the integers”. If we stick to positive whole numbers then, yeah, (say) ten is equal to two times five and no other combination of prime numbers. But suppose we have all the integers, positive and negative. Then ten is equal to either two times five or it’s equal to negative two times negative five. Or, better, it’s equal to negative one times two times negative one times five. Or suffix times any even number of negative ones.
Remember that bit about separating ‘one’ out from the world of primes and composites? That’s because the number one screws up these unique factorizations. You can always toss in extra factors of one, to taste, without changing the product of something. If we have positive and negative integers to use, then negative one does almost the same trick. We can toss in any even number of extra negative ones without changing the product. This is why we separate “units” out of the numbers. They’re not part of the prime factorization of any numbers.
For the Gaussian integers there are four units. 1 and -1, and . They are neither primes nor composites, and we don’t worry about how they would otherwise multiply the number of factorizations we get.
But let me close with a neat, easy-to-understand puzzle. It’s called the moat-crossing problem. In the regular old integers it’s this: imagine that the prime numbers are islands in a dangerous sea. You start on the number ‘2’. Imagine you have a board that can be set down and safely crossed, then picked up to be put down again. Could you get from the start and go off to safety, which is infinitely far away? If your board is some, fixed, finite length?
No, you can’t. The problem amounts to how big the gap between one prime number and the next largest prime number can be. It turns out there’s no limit to that. That is, you give me a number, as small or as large as you like. I can find some prime number that’s more than your number less than its successor. There are infinitely large gaps between prime numbers.
Gaussian primes, though? Since a Gaussian prime might have nearest neighbors in any direction? Nobody knows. We know there are arbitrarily large gaps. Pick a moat size; we can (eventually) find a Gaussian prime that’s at least that far away from its nearest neighbors. But this does not say whether it’s impossible to get from the smallest Gaussian primes — and its companions and on — infinitely far away. We know there’s a moat of width 6 separating the origin of things from infinity. We don’t know that there’s bigger ones.
You’re not going to solve this problem. Unless I have more brilliant readers than I know about; if I have ones who can solve this problem then I might be too intimidated to write anything more. But there is surely a pleasant pastime, maybe a charming game, to be made from this. Try finding the biggest possible moats around some set of Gaussian prime island.
Learning of imaginary numbers, things created to be the square roots of negative numbers, inspired me. It probably inspires anyone who’s the sort of person who’d become a mathematician. The trick was great. I wondered could I do it? Could I find some other useful expansion of the number system?
The square root of a complex-valued number sounded like the obvious way to go, until a little later that week when I learned that’s just some other complex-valued numbers. The next thing I hit on: how about the logarithm of a negative number? Couldn’t that be a useful expansion of numbers?
No. It turns out you can make a sensible logarithm of negative, and complex-valued, numbers using complex-valued numbers. Same with trigonometric and inverse trig functions, tangents and arccosines and all that. There isn’t anything we can do with the normal mathematical operations that needs something bigger than the complex-valued numbers to play with. It’s possible to expand on the complex-valued numbers. We can make quaternions and some more elaborate constructs there. They don’t solve any particular shortcoming in complex-valued numbers, but they’ve got their uses. I never got anywhere near reinventing them. I don’t regret the time spent on that. There’s something useful in trying to invent something even if it fails.
One problem with mathematics — with all intellectual fields, really — is that it’s easy, when teaching, to give the impression that this stuff is the Word of God, built into the nature of the universe and inarguable. It’s so not. The stuff we find interesting and how we describe those things are the results of human thought, attempts to say what is interesting about a thing and what is useful. And what best approximates our ideas of what we would like to know. So I was happy to see this come across my Twitter feed:
Some background on Euler, Leibniz, Bernoulli and the controversy about logs of negative numbers. https://t.co/MZAJ6LkTwX
Also: it turns out there’s not “the” logarithm of a complex-valued number. There’s infinitely many logarithms. But they’re a family, all strikingly similar, so we can pick one that’s convenient and just use that. Ask if you’re really interested.
I have today another request from gaurish, who’s also been good enough to give me requests for ‘Y’ and ‘Z’. I apologize for coming to this a day late. But it was Christmas and many things demanded my attention.
We start with complex-valued numbers. People discovered them because they were useful tools to solve polynomials. They turned out to be more than useful fictions, if numbers are anything more than useful fictions. We can add and subtract them easily. Multiply and divide them less easily. We can even raise them to powers, or raise numbers to them.
If you become a mathematics major then somewhere in Intro to Complex Analysis you’re introduced to an exotic, infinitely large sum. It’s spoken of reverently as the Riemann Zeta Function, and it connects to something named the Riemann Hypothesis. Then you remember that you’ve heard of this, because if you’re willing to become a mathematics major you’ve read mathematics popularizations. And you know the Riemann Hypothesis is an unsolved problem. It proposes something that might be true or might be false. Either way has astounding implications for the way numbers fit together.
Riemann here is Bernard Riemann, who’s turned up often in these A To Z sequences. We saw him in spheres and in sums, leading to integrals. We’ll see him again. Riemann just covered so much of 19th century mathematics; we can’t talk about calculus without him. Zeta, Xi, and later on, Gamma are the famous Greek letters. Mathematicians fall back on them because the Roman alphabet just hasn’t got enough letters for our needs. I’m writing them out as English words instead because if you aren’t familiar with them they look like an indistinct set of squiggles. Even if you are familiar, sometimes. I got confused in researching this some because I did slip between a lowercase-xi and a lowercase-zeta in my mind. All I can plead is it’s been a hard week.
Riemann’s Zeta function is famous. It’s easy to approach. You can write it as a sum. An infinite sum, but still, those are easy to understand. Pick a complex-valued number. I’ll call it ‘s’ because that’s the standard. Next take each of the counting numbers: 1, 2, 3, and so on. Raise each of them to the power ‘s’. And take the reciprocal, one divided by those numbers. Add all that together. You’ll get something. Might be real. Might be complex-valued. Might be zero. We know many values of ‘s’ what would give us a zero. The Riemann Hypothesis is about characterizing all the possible values of ‘s’ that give us a zero. We know some of them, so boring we call them trivial: -2, -4, -6, -8, and so on. (This looks crazy. There’s another way of writing the Riemann Zeta function which makes it obvious instead.) The Riemann Hypothesis is about whether all the proper, that is, non-boring values of ‘s’ that give us a zero are 1/2 plus some imaginary number.
It’s a rare thing mathematicians have only one way of writing. If something’s been known and studied for a long time there are usually variations. We find different ways to write the problem. Or we find different problems which, if solved, would solve the original problem. The Riemann Xi function is an example of this.
I’m going to spare you the formula for it. That’s in self-defense. I haven’t found an expression of the Xi function that isn’t a mess. The normal ways to write it themselves call on the Zeta function, as well as the Gamma function. The Gamma function looks like factorials, for the counting numbers. It does its own thing for other complex-valued numbers.
That said, I’m not sure what the advantages are in looking at the Xi function. The one that people talk about is its symmetry. Its value at a particular complex-valued number ‘s’ is the same as its value at the number ‘1 – s’. This may not seem like much. But it gives us this way of rewriting the Riemann Hypothesis. Imagine all the complex-valued numbers with the same imaginary part. That is, all the numbers that we could write as, say, ‘x + 4i’, where ‘x’ is some real number. If the size of the value of Xi, evaluated at ‘x + 4i’, always increases as ‘x’ starts out equal to 1/2 and increases, then the Riemann hypothesis is true. (This has to be true not just for ‘x + 4i’, but for all possible imaginary numbers. So, ‘x + 5i’, and ‘x + 6i’, and even ‘x + 4.1 i’ and so on. But it’s easier to start with a single example.)
Or another way to write it. Suppose the size of the value of Xi, evaluated at ‘x + 4i’ (or whatever), always gets smaller as ‘x’ starts out at a negative infinitely large number and keeps increasing all the way to 1/2. If that’s true, and true for every imaginary number, including ‘x – i’, then the Riemann hypothesis is true.
And it turns out if the Riemann hypothesis is true we can prove the two cases above. We’d write the theorem about this in our papers with the start ‘The Following Are Equivalent’. In our notes we’d write ‘TFAE’, which is just as good. Then we’d take which ever of them seemed easiest to prove and find out it isn’t that easy after all. But if we do get through we declare ourselves fortunate, sit back feeling triumphant, and consider going out somewhere to celebrate. But we haven’t got any of these alternatives solved yet. None of the equivalent ways to write it has helped so far.
We know some some things. For example, we know there are infinitely many roots for the Xi function with a real part that’s 1/2. This is what we’d need for the Riemann hypothesis to be true. But we don’t know that all of them are.
The Xi function isn’t entirely about what it can tell us for the Zeta function. The Xi function has its own exotic and wonderful properties. In a 2009 paper on arxiv.org, for example, Drs Yang-Hui He, Vishnu Jejjala, and Djordje Minic describe how if the zeroes of the Xi function are all exactly where we expect them to be then we learn something about a particular kind of string theory. I admit not knowing just what to say about a genus-one free energy of the topological string past what I have read in this paper. In another paper they write of how the zeroes of the Xi function correspond to the description of the behavior for a quantum-mechanical operator that I just can’t find a way to describe clearly in under three thousand words.
But mathematicians often speak of the strangeness that mathematical constructs can match reality so well. And here is surely a powerful one. We learned of the Riemann Hypothesis originally by studying how many prime numbers there are compared to the counting numbers. If it’s true, then the physics of the universe may be set up one particular way. Is that not astounding?
I’ve got another request from Gaurish today. And it’s a word I had been thinking to do anyway. When one looks for mathematical terms starting with ‘q’ this is one that stands out. I’m a little surprised I didn’t do it for last summer’s A To Z. But here it is at last:
I remember the seizing of my imagination the summer I learned imaginary numbers. If we could define a number i, so that i-squared equalled negative 1, and work out arithmetic which made sense out of that, why not do it again? Complex-valued numbers are great. Why not something more? Maybe we could also have some other non-real number. I reached deep into my imagination and picked j as its name. It could be something else. Maybe the logarithm of -1. Maybe the square root of i. Maybe something else. And maybe we could build arithmetic with a whole second other non-real number.
My hopes of this brilliant idea petered out over the summer. It’s easy to imagine a super-complex number, something that’s “1 + 2i + 3j”. And it’s easy to work out adding two super-complex numbers like this together. But multiplying them together? What should i times j be? I couldn’t solve the problem. Also I learned that we didn’t need another number to be the logarithm of -1. It would be π times i. (Or some other numbers. There’s some surprising stuff in logarithms of negative or of complex-valued numbers.) We also don’t need something special to be the square root of i, either. will do. (So will another number.) So I shelved the project.
Even if I hadn’t given up, I wouldn’t have invented something. Not along those lines. Finer minds had done the same work and had found a way to do it. The most famous of these is the quaternions. It has a famous discovery. Sir William Rowan Hamilton — the namesake of “Hamiltonian mechanics”, so you already know what a fantastic mind he was — had a flash of insight that’s come down in the folklore and romance of mathematical history. He had the idea on the 16th of October, 1843, while walking with his wife along the Royal Canal, in Dublin, Ireland. While walking across the bridge he saw what was missing. It seems he lacked pencil and paper. He carved it into the bridge:
And they are a mysterious three! i, j, and k are somehow not the same number. But each of them, multiplied by themselves, gives us -1. And the product of the three is -1. They are even more mysterious. To work sensibly, i times j can’t be the same thing as j times i. Instead, i times j equals minus j times i. And j times k equals minus k times j. And k times i equals minus i times k. We must give up commutivity, the idea that the order in which we multiply things doesn’t matter.
But if we’re willing to accept that the order matters, then quaternions are well-behaved things. We can add and subtract them just as we would think to do if we didn’t know they were strange constructs. If we keep the funny rules about the products of i and j and k straight, then we can multiply them as easily as we multiply polynomials together. We can even divide them. We can do all the things we do with real numbers, only with these odd sets of four real numbers.
The way they look, that pattern of 1 + 2i + 3j + 4k, makes them look a lot like vectors. And we can use them like vectors pointing to stuff in three-dimensional space. It’s not quite a comfortable fit, though. That plain old real number at the start of things seems like it ought to signify something, but it doesn’t. In practice, it doesn’t give us anything that regular old vectors don’t. And vectors allow us to ponder not just three- or maybe four-dimensional spaces, but as many as we need. You might wonder why we need more than four dimensions, even allowing for time. It’s because if we want to track a lot of interacting things, it’s surprisingly useful to put them all into one big vector in a very high-dimension space. It’s hard to draw, but the mathematics is nice. Hamiltonian mechanics, particularly, almost beg for it.
That’s not to call them useless, or even a niche interest. They do some things fantastically well. One of them is rotations. We can represent rotating a point around an arbitrary axis by an arbitrary angle as the multiplication of quaternions. There are many ways to calculate rotations. But if we need to do three-dimensional rotations this is a great one because it’s easy to understand and easier to program. And as you’d imagine, being able to calculate what rotations do is useful in all sorts of applications.
They’ve got good uses in number theory too, as they correspond well to the different ways to solve problems, often polynomials. They’re also popular in group theory. They might be the simplest rings that work like arithmetic but that don’t commute. So they can serve as ways to learn properties of more exotic ring structures.
Knowing of these marvelous exotic creatures of the deep mathematics your imagination might be fired. Can we do this again? Can we make something with, say, four unreal numbers? No, no we can’t. Four won’t work. Nor will five. If we keep going, though, we do hit upon success with seven unreal numbers.
This is a set called the octonions. Hamilton had barely worked out the scheme for quaternions when John T Graves, a friend of his at least up through the 16th of December, 1843, wrote of this new scheme. (Graves didn’t publish before Arthur Cayley did. Cayley’s one of those unspeakably prolific 19th century mathematicians. He has at least 967 papers to his credit. And he was a lawyer doing mathematics on the side for about 250 of those papers. This depresses every mathematician who ponders it these days.)
But where quaternions are peculiar, octonions are really peculiar. Let me call a couple quaternions p, q, and r. p times q might not be the same thing as q times r. But p times the product of q and r will be the same thing as the product of p and q itself times r. This we call associativity. Octonions don’t have that. Let me call a couple quaternions s, t, and u. s times the product of t times u may be either positive or negative the product of s and t times u. (It depends.)
Octonions have some neat mathematical properties. But I don’t know of any general uses for them that are as catchy as understanding rotations. Not rotations in the three-dimensional world, anyway.
Yes, yes, we can go farther still. There’s a construct called “sedenions”, which have fifteen non-real numbers on them. That’s 16 terms in each number. Where octonions are peculiar, sedenions are really peculiar. They work even less like regular old numbers than octonions do. With octonions, at least, when you multiply s by the product of s and t, you get the same number as you would multiplying s by s and then multiplying that by t. Sedenions don’t even offer that shred of normality. Besides being a way to learn about abstract algebra structures I don’t know what they’re used for.
I also don’t know of further exotic terms along this line. It would seem to fit a pattern if there’s some 32-term construct that we can define something like multiplication for. But it would presumably be even less like regular multiplication than sedenion multiplication is. If you want to fiddle about with that please do enjoy yourself. I’d be interested to hear if you turn up anything, but I don’t expect it’ll revolutionize the way I look at numbers. Sorry. But the discovery might be the fun part anyway.
I feel a bit odd about this week’s guest in the Set Tour. I’ve been mostly concentrating on sets that get used as the domains or ranges for functions a lot. The ones I want to talk about here don’t tend to serve the role of domain or range. But they are used a great deal in some interesting functions. So I loosen my rule about what to talk about.
Rm x n and Cm x n
Rm x n might explain itself by this point. If it doesn’t, then this may help: the “x” here is the multiplication symbol. “m” and “n” are positive whole numbers. They might be the same number; they might be different. So, are we done here?
Maybe not quite. I was fibbing a little when I said “x” was the multiplication symbol. R2 x 3 is not a longer way of saying R6, an ordered collection of six real-valued numbers. The x does represent a kind of product, though. What we mean by R2 x 3 is an ordered collection, two rows by three columns, of real-valued numbers. Say the “x” here aloud as “by” and you’re pronouncing it correctly.
What we get is called a “matrix”. If we put into it only real-valued numbers, it’s a “real matrix”, or a “matrix of reals”. Sometimes mathematical terminology isn’t so hard to follow. Just as with vectors, Rn, it matters just how the numbers are organized. R2 x 3 means something completely different from what R3 x 2 means. And swapping which positions the numbers in the matrix occupy changes what matrix we have, as you might expect.
You can add together matrices, exactly as you can add together vectors. The same rules even apply. You can only add together two matrices of the same size. They have to have the same number of rows and the same number of columns. You add them by adding together the numbers in the corresponding slots. It’s exactly what you would do if you went in without preconceptions.
You can also multiply a matrix by a single number. We called this scalar multiplication back when we were working with vectors. With matrices, we call this scalar multiplication. If it strikes you that we could see vectors as a kind of matrix, yes, we can. Sometimes that’s wise. We can see a vector as a matrix in the set R1 x n or as one in the set Rn x 1, depending on just what we mean to do.
It’s trickier to multiply two matrices together. As with vectors multiplying the numbers in corresponding positions together doesn’t give us anything. What we do instead is a time-consuming but not actually hard process. But according to its rules, something in Rm x n we can multiply by something in Rn x k. “k” is another whole number. The second thing has to have exactly as many rows as the first thing has columns. What we get is a matrix in Rm x k.
I grant you maybe didn’t see that coming. Also a potential complication: if you can multiply something in Rm x n by something in Rn x k, can you multiply the thing in Rn x k by the thing in Rm x n? … No, not unless k and m are the same number. Even if they are, you can’t count on getting the same product. Matrices are weird things this way. They’re also gateways to weirder things. But it is a productive weirdness, and I’ll explain why in a few paragraphs.
A matrix is a way of organizing terms. Those terms can be anything. Real matrices are surely the most common kind of matrix, at least in mathematical usage. Next in common use would be complex-valued matrices, much like how we get complex-valued vectors. These are written Cm x n. A complex-valued matrix is different from a real-valued matrix. The terms inside the matrix can be complex-valued numbers, instead of real-valued numbers. Again, sometimes, these mathematical terms aren’t so tricky.
I’ve heard occasionally of people organizing matrices of other sets. The notation is similar. If you’re building a matrix of “m” rows and “n” columns out of the things you find inside a set we’ll call H, then you write that as Hm x n. I’m not saying you should do this, just that if you need to, that’s how to tell people what you’re doing.
Now. We don’t really have a lot of functions that use matrices as domains, and I can think of fewer that use matrices as ranges. There are a couple of valuable ones, ones so valuable they get special names like “eigenvalue” and “eigenvector”. (Don’t worry about what those are.) They take in Rm x n or Cm x n and return a set of real- or complex-valued numbers, or real- or complex-valued vectors. Not even those, actually. Eigenvectors and eigenfunctions are only meaningful if there are exactly as many rows as columns. That is, for Rm x m and Cm x m. These are known as “square” matrices, just as you might guess if you were shaken awake and ordered to say what you guessed a “square matrix” might be.
They’re important functions. There are some other important functions, with names like “rank” and “condition number” and the like. But they’re not many. I believe they’re not even thought of as functions, any more than we think of “the length of a vector” as primarily a function. They’re just properties of these matrices, that’s all.
So why are they worth knowing? Besides the joy that comes of knowing something, I mean?
Here’s one answer, and the one that I find most compelling. There is cultural bias in this: I come from an applications-heavy mathematical heritage. We like differential equations, which study how stuff changes in time and in space. It’s very easy to go from differential equations to ordered sets of equations. The first equation may describe how the position of particle 1 changes in time. It might describe how the velocity of the fluid moving past point 1 changes in time. It might describe how the temperature measured by sensor 1 changes as it moves. It doesn’t matter. We get a set of these equations together and we have a majestic set of differential equations.
Now, the dirty little secret of differential equations: we can’t solve them. Most interesting physical phenomena are nonlinear. Linear stuff is easy. Small change 1 has effect A; small change 2 has effect B. If we make small change 1 and small change 2 together, this has effect A plus B. Nonlinear stuff, though … it just doesn’t work. Small change 1 has effect A; small change 2 has effect B. Small change 1 and small change 2 together has effect … A plus B plus some weird A times B thing plus some effect C that nobody saw coming and then C does something with A and B and now maybe we’d best hide.
There are some nonlinear differential equations we can solve. Those are the result of heroic work and brilliant insights. Compared to all the things we would like to solve there’s not many of them. Methods to solve nonlinear differential equations are as precious as ways to slay krakens.
But here’s what we can do. What we usually like to know about in systems are equilibriums. Those are the conditions in which the system stops changing. Those are interesting. We can usually find those points by boring but not conceptually challenging calculations. If we can’t, we can declare x0 represents the equilibrium. If we still care, we leave calculating its actual values to the interested reader or hungry grad student.
But what’s really interesting is: what happens if we’re near but not exactly at the equilibrium? Sometimes, we stay near it. Think of pushing a swing. However good a push you give, it’s going to settle back to the boring old equilibrium of dangling straight down. Sometimes, we go racing away from it. Think of trying to balance a pencil on its tip; if we did this perfectly it would stay balanced. It never does. We’re never perfect, or there’s some wind or somebody walks by and the perfect balance is foiled. It falls down and doesn’t bounce back up. Sometimes, whether it it stays near or goes away depends on what way it’s away from the equilibrium.
And now we finally get back to matrices. Suppose we are starting out near an equilibrium. We can, usually, approximate the differential equations that describe what will happen. The approximation may only be good if we’re just a tiny bit away from the equilibrium, but that might be all we really want to know. That approximation will be some linear differential equations. (If they’re not, then we’re just wasting our time.) And that system of linear differential equations we can describe using matrices.
If we can write what we are interested in as a set of linear differential equations, then we have won. We can use the many powerful tools of matrix arithmetic — linear algebra, specifically — to tell us everything we want to know about the system. We can say whether a small push away from the equilibrium stays small, or whether it grows, or whether it depends. We can say how fast the small push shrinks, or grows (for a while). We can say how the system will change, approximately.
This is what I love in matrices. It’s not everything there is to them. But it’s enough to make matrices important to me.
The square root of negative one. Everybody knows it doesn’t exist; there’s no real number you can multiply by itself and get negative one out. But then sometime in algebra, deep in a section about polynomials, suddenly we come out and declare there is such a thing. It’s an “imaginary number” that we call “i”. It’s hard to blame students for feeling betrayed by this. To make it worse, we throw real and imaginary numbers together and call the result “complex numbers”. It’s as if we’re out to tease them for feeling confused.
It’s an important set of things, though. It turns up as the domain, or the range, of functions so often that one of the major fields of analysis is called, “Complex Analysis”. If the course listing allows for more words, it’s called “Analysis of Functions of a Complex Variable” or something like that. Despite the connotations of the word “complex”, though, the field is a delight. It’s considerably easier to understand than Real Analysis, the study of functions of mere real numbers. When there is a theorem that has a version in Real Analysis and a version in Complex Analysis, the Complex Analysis side is usually easier to prove and easier to understand. It’s uncanny.
The set of all complex numbers is denoted C, in parallel to the set of real numbers, R. To make it clear that we mean this set, and not some piddling little common set that might happen to share the name C, add a vertical stroke to the left of the letter. This is just as we add a vertical stroke to R to emphasize we mean the Real Numbers. We should approach the set with respect, removing our hats, thinking seriously about great things. It would look silly to add a second curve to C though, so we just add a straight vertical stroke on the left side of the letter C. This makes it look a bit like it’s an Old English typeface (the kind you call Gothic until you learn that means “sans serif”) pared down to its minimum.
Why do we teach people there’s no such thing as a square root of minus one, and then one day, teach them there is? Part of it is that whether there is a square root depends on your context. If you are interested only in the real numbers, there’s nothing that, squared, gives you minus one. This is exactly the way that it’s not possible to equally divide five objects between two people if you aren’t allowed to cut the objects in half. But if you are willing to allow half-objects to be things, then you can do what was previously forbidden. What you can do depends on what the rules you set out are.
And there’s surely some echo of the historical discovery of imaginary and complex numbers at work here. They were noticed when working out the roots of third- and fourth-degree polynomials. These can be done by way of formulas that nobody ever remembers because there are so many better things to remember. These formulas would sometimes require one to calculate a square root of a negative number, a thing that obviously didn’t exist. Except that if you pretended it did, you could get out correct answers, just as if these were ordinary numbers. You can see why this may be dubbed an “imaginary” number. The name hints at the suspicion with which it’s viewed. It’s much as “negative” numbers look like some trap to people who’re just getting comfortable with fractions.
It goes against the stereotype of mathematicians to suppose they’d accept working with something they don’t understand because the results are all right, afterwards. But, actually, mathematicians are willing to accept getting answers by any crazy method. If you have a plausible answer, you can test whether it’s right, and if all you really need this minute is the right answer, good.
But we do like having methods; they’re more useful than mere answers. And we can imagine this set called the complex numbers. They contain … well, all the possible roots, the solutions, of all polynomials. (The polynomials might have coefficients — the numbers in front of the variable — of integers, or rational numbers, or irrational numbers. If we already accept the idea of complex numbers, the coefficients can be complex numbers too.)
It’s exceedingly common to think of the complex numbers by starting off with a new number called “i”. This is a number about which we know nothing except that i times i equals minus one. Then we tend to think of complex numbers as “a real number plus i times another real number”. The first real number gets called “the real component”, and is usually denoted as either “a” or “x”. The second real number gets called “the imaginary component”, and is usually denoted as either “b” or “y”. Then the complex number is written “a + i*b” or “x + i*y”. Sometimes it’s written “a + b*i” or “x + y*i”; that’s a mere matter of house style. Don’t let it throw you.
Writing a complex number this way has advantages. Particularly, it makes it easy to see how one would add together (or subtract) complex numbers: “a + b*i + x + y*i” almost suggests that the sum should be “(a + x) + (b + y)*i”. What we know from ordinary arithmetic gives us guidance. And if we’re comfortable with binomials, then we know how to multiply complex numbers. Start with “(a + b*i) * (x + y*i)” and follow the distributive law. We get, first, “a*x + a*y*i + b*i*x + b*y*i*i”. But “i*i” equals minus one, so this is the same as “a*x + a*y*i + b*i*x – b*y”. Move the real components together, and move the imaginary components together, and we have “(a*x – b*y) + (a*y + b*x)*i”.
It’s surprisingly natural to think of the real component as how far to the right or left of an origin your complex number is, and to think of the imaginary component as how far above or below the origin it is. Much complex-number work makes sense if you think of complex numbers as points in space, or directions in space. The language of vectors trips us up only a little bit here. We speak of a complex number as corresponding to a point on the “complex plane”, just as we might speak of a real number as a point on the “(real) number line”.
But there are other descriptions yet. We can represent complex numbers as a pair of numbers with a scheme that looks like polar coordinates. Pick a point on the complex plane. We can say where that is by two points of information. The first is the amplitude, or magnitude: how far the point is from the origin. The second is the phase, or angle: draw the line segment connecting the origin and your point. What angle does that make with the positive horizontal axis?
This representation is called the “phasor” representation. It’s tolerably popular in physics and I hear tell of engineers liking it. We represent numbers then not as “x + i*y” but instead as “r * eiθ”, with r the magnitude and θ the angle. “e” is the base of the natural logarithm, which you get very comfortable with if you do much mathematics or physics. And “i” is just what we’ve been talking about here. This is a pretty natural way to write about complex numbers that represent stuff that oscillates, such as alternating current or the probability function in quantum mechanics. A lot of stuff oscillates, if you study it through the right lens. So numbers that look like this keep creeping in, and into unexpected places. It’s quite easy to multiply numbers in phasor form — just multiply the magnitude parts, and add the angle parts — although addition and subtraction become a pain.
Mathematicians generally use the letter “z” to represent a complex-valued number whose identity is not known. As best I can tell, this is because we do think so much of a complex number as the sum “x + y*i”. So if we used familiar old “x” for an unknown number, it would carry the connotations of “the real component of our complex-valued number” and mislead the unwary mathematician. The connection is so common that a mathematician might carelessly switch between “z” and the real and imaginary components “x” and “y” without specifying that “z” is another way of writing “x + y*i”. A good copy editor or an alert student should catch this.
Complex numbers work very much like real numbers do. They add and multiply in natural-looking ways, and you can do subtraction and division just as well. You can take exponentials, and can define all the common arithmetic functions — sines and cosines, square roots and logarithms, integrals and differentials — on them just as well as you can with real numbers. And you can embed the real numbers within the complex numbers: if you have a real number x, you can match that perfectly with the complex number “x + 0*i”.
But that doesn’t mean complex numbers are exactly like the real numbers. For example, it’s possible to order the real numbers. You can say that the number “a” is less than the number “b”, and have that mean something. That’s not possible to do with complex numbers. You can’t say that “a + b*i” is less than, or greater than, “x + y*i” in a logically consistent way. You can say the magnitude of one complex-valued number is greater than the magnitude of another. But the magnitudes are real numbers. For all that complex numbers give us there are things they’re not good for.
This will be a hastily-written installment since I married just this weekend and have other things occupying me. But there’s still comics mentioning math subjects so let me summarize them for you. The first since my last collection of these, on the 13th of June, came on the 15th, with Dave Whamond’s Reality Check, which goes into one of the minor linguistic quirks that bothers me: the claim that one can’t give “110 percent,” since 100 percent is all there is. I don’t object to phrases like “110 percent”, though, since it seems to me the baseline, the 100 percent, must be to some standard reference performance. For example, the Space Shuttle Main Engines routinely operated at around 104 percent, not because they were exceeding their theoretical limits, but because the original design thrust was found to be not quite enough, and the engines were redesigned to deliver more thrust, and it would have been far too confusing to rewrite all the documentation so that the new design thrust was the new 100 percent. Instead 100 percent was the design capacity of an engine which never flew but which existed in paper form. So I’m forgiving of “110 percent” constructions, is the important thing to me.
Since I suspect that the comics roundup posts are the most popular ones I post, I’m very glad to see there was a bumper crop of strips among the ones I read regularly (from King Features Syndicate and from gocomics.com) this past week. Some of those were from cancelled strips in perpetual reruns, but that’s fine, I think: there aren’t any particular limits on how big an electronic comics page one can have, after all, and while it’s possible to read a short-lived strip long enough that you see all its entries, it takes a couple go-rounds to actually have them all memorized.