My 2018 Mathematics A To Z: Extreme Value Theorem


The letter ‘X’ is a problem. For all that the letter ‘x’ is important to mathematics there aren’t many mathematical terms starting with it. Mr Wu, mathematics tutor and author of the MathTuition88 blog, had a suggestion. Why not 90s it up a little and write about an Extreme theorem? I’m game.

The Extreme Value Theorem, which I chose to write about, is a fundamental bit of analysis. There is also a similarly-named but completely unrelated Extreme Value Theory. This exists in the world of statistics. That’s about outliers, and about how likely it is you’ll find an even more extreme outlier if you continue sampling. This is valuable in risk assessment: put another way, it’s the question of what neighborhoods you expect to flood based on how the river’s overflowed the last hundred years. Or be in a wildfire, or be hit by a major earthquake, or whatever. The more I think about it the more I realize that’s worth discussing too. Maybe in the new year, if I decide to do some A To Z extras.

Cartoon of a thinking coati (it's a raccoon-like animal from Latin America); beside him are spelled out on Scrabble tiles, 'MATHEMATICS A TO Z', on a starry background. Various arithmetic symbols are constellations in the background.
Art by Thomas K Dye, creator of the web comics Newshounds, Something Happens, and Infinity Refugees. His current project is Projection Edge. And you can get Projection Edge six months ahead of public publication by subscribing to his Patreon. And he’s on Twitter as @Newshoundscomic.

Extreme Value Theorem.

There are some mathematical theorems which defy intuition. You can encounter one and conclude that can’t be so. This can inspire one to study mathematics, to understand how it could be. Famously, the philosopher Thomas Hobbes encountered the Pythagorean Theorem and disbelieved it. He then fell into a controversial love with the subject. Some you can encounter, and study, and understand, and never come to believe. This would be the Banach-Tarski Paradox. It’s the realization that one can split a ball into as few as five pieces, and reassemble the pieces, and have two complete balls. They can even be wildly larger or smaller than the one you started with. It’s dazzling.

And then there are theorems that seem the opposite. Ones that seem so obvious, and so obviously true, that they hardly seem like mathematics. If they’re not axioms, they might as well be. The extreme value theorem is one of these.

It’s a theorem about functions. Here, functions that have a domain and a range that are both real numbers. Even more specifically, about continuous functions. “Continuous” is a tricky idea to make precise, but we don’t have to do it. A century of mathematicians worked out meanings that correspond pretty well to what you’d imagine it should mean. It means you can draw a graph representing the function without lifting the pen. (Do not attempt to use this definition at your thesis defense. I’m skipping a century’s worth of hard thinking about the subject.)

And it’s a theorem about “extreme” values. “Extreme” is a convenient word. It means “maximum or minimum”. We’re often interested in the greatest or least value of a function. Having a scheme to find the maximum is as good as having one to find a minimum. So there’s little point talking about them as separate things. But that forces us to use a bunch of syllables. Or to adopt a convention that “by maximum we always mean maximum or minimum”. We could say we mean that, but I’ll bet a good number of mathematicians, and 95% of mathematics students, would forget the “or minimum” within ten minutes. “Extreme”, then. It’s short and punchy and doesn’t commit us to a maximum or a minimum. It’s simply the most outstanding value we can find.

The Extreme Value Theorem doesn’t help us find them. It only proves to us there is an extreme to find. Particularly, it says that if a continuous function has a domain that’s a closed interval, then it has to have a maximum and a minimum. And it has to attain the maximum and the minimum at least once each. That is, something in the domain matches to the maximum. And something in the domain matches to the minimum. Could be multiple times, yes.
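To make that concrete, here’s a small Python sketch of mine, with an example function I picked, not anything from the theorem itself. The Extreme Value Theorem promises a continuous function on a closed interval attains a maximum and a minimum, so a dense sampling of the interval is chasing values that really exist.

```python
# Sample a continuous function densely on the closed interval [-1, 2].
# The theorem guarantees a max and a min exist and are attained;
# here the max is at the endpoint x = 2 and the min at the interior point x = 0.
def f(x):
    return x * x

a, b, n = -1.0, 2.0, 300_001
xs = [a + (b - a) * i / (n - 1) for i in range(n)]
vals = [f(x) for x in xs]
print(max(vals))  # 4.0, attained at x = 2
print(min(vals))  # 0.0, attained at x = 0
```

Nothing here proves the theorem, of course. The theorem is what licenses the hunt.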

This might not seem like much of a theorem. Existence proofs rarely do. It’s a bias, I suppose. We like to think we’re out looking for solutions. So we suppose there’s a solution to find. Checking that there is an answer before we start looking? That seems excessive. Before heading to the airport we might check the flight wasn’t delayed. But we almost never check that there is still a Newark to fly to. I’m not sure, in working out problems, that we check it explicitly. We decide early on that we’re working with continuous functions and so we can try out the usual approaches. That we use the theorem becomes invisible.

And that’s sort of the history of this theorem. The Extreme Value Theorem, for example, is part of how we now prove Rolle’s Theorem. Rolle’s Theorem is about functions continuous and differentiable on the interval from a to b. And functions that have the same value for a and for b. The conclusion is the function has a local maximum or minimum in-between these. It’s the theorem depicted in that xkcd comic you maybe didn’t check out a few paragraphs ago. Rolle’s Theorem is named for Michel Rolle, who proved the theorem (for polynomials) in 1691. The Indian mathematician Bhaskara II, in the 12th century, is credited with stating the theorem too. The Extreme Value Theorem was proven around 1860. (There was an earlier proof, by Bernard Bolzano, whose name you’ll find all over talk about limits and functions and continuity and all. But that was unpublished until 1930. The proofs known about at the time were Karl Weierstrass’s. His is the other name you’ll find all over talk about limits and functions and continuity and all. Go on, now, guess who it was who proved the Extreme Value Theorem. And guess what theorem, bearing the names of two important 19th-century mathematicians, is at the core of proving that. You need at most two chances!) That is, mathematicians were comfortable using the theorem before it had a clear identity.

Once you know that it’s there, though, the Extreme Value Theorem’s a great one. It’s useful. Rolle’s Theorem I just went through. There’s also the quite similar Mean Value Theorem. This one is about functions continuous and differentiable on an interval. It tells us there’s at least one point where the derivative is equal to the mean slope of the function on that interval. This is another theorem that’s a quick proof once you have the Extreme Value Theorem. Or we can get more esoteric. There’s a technique known as Lagrange Multipliers. It’s a way to find where on a constrained surface a function is at its maximum or minimum. It’s a clever technique, one that I needed time to accept as a thing that could possibly work. And why should it work? Go ahead, guess what the centerpiece of at least one method of proving it is.
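Here’s a numerical poke at the Mean Value Theorem, a sketch of mine with an example function I picked: f(x) = x³ on [0, 2] has mean slope 4, and the theorem promises a point c in between where the derivative equals 4. Bisection hunts it down.

```python
# Mean Value Theorem illustration: find c in (0, 2) with f'(c) equal to the
# mean slope of f over [0, 2].
def f(x):
    return x ** 3

a, b = 0.0, 2.0
mean_slope = (f(b) - f(a)) / (b - a)   # 4.0

def deriv(x, h=1e-6):
    # central-difference estimate of the derivative
    return (f(x + h) - f(x - h)) / (2 * h)

lo, hi = a, b                          # deriv - mean_slope changes sign here
for _ in range(60):
    mid = (lo + hi) / 2
    if deriv(mid) < mean_slope:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2
print(c)   # about 1.1547, which is 2/sqrt(3)
```

Again, nothing here proves anything. The theorem is what assures us the search can’t come up empty.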

Step back from calculus and into real analysis. That’s the study of why calculus works, and how real numbers work. The Extreme Value Theorem turns up again and again. Like, one technique for defining the integral itself is to approximate a function with a “stepwise” function. This is one that looks like a pixellated, rectangular approximation of the function. The definition depends on having a stepwise rectangular approximation that’s as close as you can get to a function while always staying less than it. And another stepwise rectangular approximation that’s as close as you can get while always staying greater than it.
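Those pinching staircase functions are easy to play with. A Python sketch of mine, for a function and interval of my choosing: f(x) = x² is increasing on [0, 1], so on each little piece the staircase-from-below sits at the left endpoint and the staircase-from-above at the right. Both areas squeeze toward the integral, 1/3.

```python
def f(x):
    return x * x

def step_sums(a, b, n):
    # Lower and upper staircase (stepwise rectangular) approximations of the
    # integral of f over [a, b], using n equal pieces. Since f is increasing
    # here, the infimum on each piece is at the left end, the supremum at the right.
    dx = (b - a) / n
    lower = sum(f(a + i * dx) * dx for i in range(n))
    upper = sum(f(a + (i + 1) * dx) * dx for i in range(n))
    return lower, upper

for n in (10, 100, 1000):
    print(n, step_sums(0.0, 1.0, n))   # both columns creep toward 1/3
```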

And then other results. Often in real analysis we want to know about whether sets are closed and bounded. The Extreme Value Theorem has a neat corollary. Start with a continuous function with domain that’s a closed and bounded interval. Then, this theorem demonstrates, the range is also a closed and bounded interval. I know this sounds like a technical point. But it is the sort of technical point that makes life easier.

The Extreme Value Theorem even takes on meaning when we don’t look at real numbers. We can rewrite it in topological spaces. These are sets of points for which we have an idea of a “neighborhood” of points. We don’t demand that we know what distance is exactly, though. What had been a closed and bounded interval becomes a mathematical construct called a “compact set”. The idea of a continuous function changes into one about the pre-image of an open set being another open set. And there is still something recognizably the Extreme Value Theorem. It tells us about things called the supremum and infimum, which are slightly different from the maximum and minimum. Just enough to confuse the student taking real analysis the first time through.

Topological spaces are an abstracted concept. Real numbers are topological spaces, yes. But many other things also are. Neighborhoods and compact sets and open sets are also abstracted concepts. And so this theorem has its same quiet utility in these many spaces. It’s just there quietly supporting more challenging work.


And now I get to really relax: I already have a Reading the Comics post ready for tomorrow, and Sunday’s is partly written. Now I just have to find a mathematical term starting with ‘Y’ that’s interesting enough to write about.

Some Mathematics Things I Read On Twitter


I had thought I’d culled some more pieces from my Twitter and other mathematics-writing-reading the last couple weeks and I’m not sure where it all went. I think I might be baffled by the repostings of things on Quanta Magazine (which has a lot of good mathematics articles, but not, like, a 3,000-word piece every day, and they showcase their archive just as anyone ought).

So, here, first.

It reviews Kim Plofker’s 2008 text Mathematics In India, a subject that I both know is important — I love to teach with historic context included — and something that I very much bluff my way through. I mean, I do research things I expect I’ll mention, but I don’t learn enough of the big picture and a determined questioner could prove how fragile my knowledge was. So Plofker’s book should go on my reading list at least.

These are lecture notes about analysis. In the 19th century mathematicians tried to tighten up exactly what we meant by things like “functions” and “limits” and “integrals” and “numbers” and all that. It was a lot of good solid argument, and a lot of surprising, intuition-defying results. This isn’t something that a lay reader’s likely to appreciate, and I’m sorry for that, but if you do know the difference between Riemann and Lebesgue integrals the notes are likely to help.

And this, Daniel Grieser and Svenja Maronna’s Hearing The Shape Of A Triangle, follows up on a classic mathematics paper, Mark Kac’s Can One Hear The Shape Of A Drum? This is part of a class of problems in which you try to reconstruct what kinds of things can produce a signal. It turns out to be impossible to perfectly say what shape and material of a drum produced a certain sound of a drum. But. A triangle — the instrument, that is, but also the shape — has a simpler structure. Could we go from the way a triangle sounds to knowing what it looks like?

And I mentioned this before but if you want to go reading every Calvin and Hobbes strip to pick out the ones that mention mathematics, you can be doing someone a favor too.

As I Try To Make Wronski’s Formula For Pi Into Something I Like


Previously:

I remain fascinated with Józef Maria Hoëne-Wronski’s attempted definition of π. It had started out like this:

\pi = \frac{4\infty}{\sqrt{-1}}\left\{ \left(1 + \sqrt{-1}\right)^{\frac{1}{\infty}} -  \left(1 - \sqrt{-1}\right)^{\frac{1}{\infty}} \right\}

And I’d translated that into something that modern mathematicians would accept without flinching. That is to evaluate the limit of a function that looks like this:

\displaystyle \lim_{x \to \infty} f(x)

where

f(x) = -4 \imath x \left\{ \left(1 + \imath\right)^{\frac{1}{x}} -  \left(1 - \imath\right)^{\frac{1}{x}} \right\}

So. I don’t want to deal with that f(x) as it’s written. I can make it better. One thing that bothers me is seeing the complex number 1 + \imath raised to a power. I’d like to work with something simpler than that. And I can’t see that number without also noticing that I’m subtracting from it 1 - \imath raised to the same power. 1 + \imath and 1 - \imath are a “conjugate pair”. It’s usually nice to see those. It often hints at ways to make your expression simpler. That’s one of those patterns you pick up from doing a lot of problems as a mathematics major, and that then look like magic to the lay audience.

Here’s the first way I figure to make my life simpler. It’s in rewriting that 1 + \imath and 1 - \imath stuff so it’s simpler. It’ll be simpler by using exponentials. Shut up, it will too. I get there through Gauss, Descartes, and Euler.

At least I think it was Gauss who pointed out how you can match complex-valued numbers with points on the two-dimensional plane. On a sheet of graph paper, if you like. The number 1 + \imath matches to the point with x-coordinate 1, y-coordinate 1. The number 1 - \imath matches to the point with x-coordinate 1, y-coordinate -1. Yes, yes, this doesn’t sound like much of an insight Gauss had, but his work goes on. I’m leaving it off here because that’s all that I need for right now.

So these two numbers that offended me I can think of as points. They have Cartesian coordinates (1, 1) and (1, -1). But there’s never only one coordinate system for something. There may be only one that’s good for the problem you’re doing. I mean that makes the problem easier to study. But there are always infinitely many choices. For points on a flat surface like a piece of paper, and where the points don’t represent any particular physics problem, there’s two good choices. One is the Cartesian coordinates. In it you refer to points by an origin, an x-axis, and a y-axis. How far is the point from the origin in a direction parallel to the x-axis? (And in which direction? This gives us a positive or a negative number) How far is the point from the origin in a direction parallel to the y-axis? (And in which direction? Same positive or negative thing.)

The other good choice is polar coordinates. For that we need an origin and a positive x-axis. We refer to points by how far they are from the origin, heedless of direction. And then to get direction, what angle the line segment connecting the point with the origin makes with the positive x-axis. The first of these numbers, the distance, we normally label ‘r’ unless there’s compelling reason otherwise. The other we label ‘θ’. ‘r’ is always going to be a positive number or, possibly, zero. ‘θ’ might be any number, positive or negative. By convention, we measure angles so that positive numbers are counterclockwise from the x-axis. I don’t know why. I guess it seemed less weird for, say, the point with Cartesian coordinates (0, 1) to have a positive angle rather than a negative angle. That angle would be \frac{\pi}{2} , because mathematicians like radians more than degrees. They make other work easier.

So. The point 1 + \imath corresponds to the polar coordinates r = \sqrt{2} and \theta = \frac{\pi}{4} . The point 1 - \imath corresponds to the polar coordinates r = \sqrt{2} and \theta = -\frac{\pi}{4} . Yes, the θ coordinates being negative one times each other is common in conjugate pairs. Also, if you have doubts about my use of the word “the” before “polar coordinates”, well-spotted. If you’re not sure about that thing where ‘r’ is not negative, again, well-spotted. I intend to come back to that.
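Python’s standard cmath module can do this bookkeeping, if you want to check my arithmetic (a sketch; the post does it by hand):

```python
import cmath
import math

# Cartesian-to-polar for the conjugate pair 1+i and 1-i
r, theta = cmath.polar(1 + 1j)    # r = sqrt(2), theta = pi/4
print(r, theta)
r2, theta2 = cmath.polar(1 - 1j)  # same r, theta = -pi/4
print(r2, theta2)
# and back again, from polar coordinates to the complex number
print(cmath.rect(r, theta))       # 1+1j, up to rounding
```

Note that cmath.polar hands back angles in the strip from -π to π, which is one convention among the many I’ll grumble about below.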

With the polar coordinates ‘r’ and ‘θ’ to describe a point I can go back to complex numbers. I can match the point to the complex number with the value given by r e^{\imath\theta} , where ‘e’ is that old 2.71828something number. Superficially, this looks like a big dumb waste of time. I had some problem with imaginary numbers raised to powers, so now, I’m rewriting things with a number raised to imaginary powers. Here’s why it isn’t dumb.

It’s easy to raise a number written like this to a power. r e^{\imath\theta} raised to the n-th power is going to be equal to r^n e^{\imath\theta \cdot n} . (Because (a \cdot b)^n = a^n \cdot b^n and we’re going to go ahead and assume this stays true if ‘b’ is a complex-valued number. It does, but you’re right to ask how we know that.) And this turns into raising a real-valued number to a power, which we know how to do. And it involves multiplying the angle by that power, which is also easy. (For the one-over-x powers we’re heading toward, that’s just dividing the angle by x.)
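A quick check of that power rule, in a Python sketch of mine with a power I picked arbitrarily:

```python
import cmath

# (r e^{i theta})^n should equal r^n e^{i theta n}; compare with Python's
# own complex exponentiation.
z = 1 + 1j
n = 5
r, theta = cmath.polar(z)
via_polar = (r ** n) * cmath.exp(1j * (theta * n))
print(z ** n)       # (-4-4j)
print(via_polar)    # the same, up to rounding
```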

And we can get back to something that looks like 1 + \imath too. That is, something that’s a real number plus \imath times some real number. This is through one of the many Euler’s Formulas. The one that’s relevant here is that e^{\imath \phi} = \cos(\phi) + \imath \sin(\phi) for any real number ‘φ’. So, that’s true also for ‘θ’ times ‘n’. Or, looking to where everybody knows we’re going, also true for ‘θ’ divided by ‘x’.

OK, on to the people so anxious about all this. I talked about the angle made between the line segment that connects a point and the origin and the positive x-axis. “The” angle. “The”. If that wasn’t enough explanation of the problem, mention how your thinking’s done a 360 degree turn and you see it differently now. In an empty room, if you happen to be in one. Your pedantic know-it-all friend is explaining it now. There’s an infinite number of angles that correspond to any given direction. They’re all separated by 360 degrees or, to a mathematician, 2π.

And more. What’s the difference between going out five units of distance in the direction of angle 0 and going out minus-five units of distance in the direction of angle -π? That is, between walking forward five paces while facing east and walking backward five paces while facing west? Yeah. So if we let ‘r’ be negative we’ve got twice as many infinitely many sets of coordinates for each point.

This complicates raising numbers to powers. θ times n might match with some point that’s very different from θ-plus-2-π times n. There might be a whole ring of powers. This seems … hard to work with, at least. But it’s, at heart, the same problem you get thinking about the square root of 4 and concluding it’s both plus 2 and minus 2. If you want “the” square root, you’d like it to be a single number. At least if you want to calculate anything from it. You have to pick out a preferred θ from the family of possible candidates.

For me, that’s whatever set of coordinates has ‘r’ that’s positive (or zero), and that has ‘θ’ between -π and π. Or between 0 and 2π. It could be any strip of numbers that’s 2π wide. Pick what makes sense for the problem you’re doing. It’s going to be the strip from -π to π. Perhaps the strip from 0 to 2π.

What this all amounts to is that I can turn this:

f(x) = -4 \imath x \left\{ \left(1 + \imath\right)^{\frac{1}{x}} -  \left(1 - \imath\right)^{\frac{1}{x}} \right\}

into this:

f(x) = -4 \imath x \left\{ \left(\sqrt{2} e^{\imath \frac{\pi}{4}}\right)^{\frac{1}{x}} -  \left(\sqrt{2} e^{-\imath \frac{\pi}{4}} \right)^{\frac{1}{x}} \right\}

without changing its meaning any. Raising a number to the one-over-x power looks different from raising it to the n power. But the work isn’t different. The function I wrote out up there is the same as this function:

f(x) = -4 \imath x \left\{ \sqrt{2}^{\frac{1}{x}} e^{\imath \frac{\pi}{4}\cdot\frac{1}{x}} - \sqrt{2}^{\frac{1}{x}} e^{-\imath \frac{\pi}{4}\cdot\frac{1}{x}} \right\}

I can’t look at that number, \sqrt{2}^{\frac{1}{x}} , sitting there, multiplied by two things added together, and leave that. (OK, subtracted, but same thing.) I want to something something distributive law something and that gets us here:

f(x) = -4 \imath x \sqrt{2}^{\frac{1}{x}} \left\{ e^{\imath \frac{\pi}{4}\cdot\frac{1}{x}} -  e^{- \imath \frac{\pi}{4}\cdot\frac{1}{x}} \right\}

Also, yeah, that square root of two raised to a power looks weird. I can turn that square root of two into “two to the one-half power”. That gets to this rewrite:

f(x) = -4 \imath x 2^{\frac{1}{2}\cdot \frac{1}{x}} \left\{ e^{\imath \frac{\pi}{4}\cdot\frac{1}{x}} -  e^{- \imath \frac{\pi}{4}\cdot\frac{1}{x}} \right\}

And then. Those parentheses. e raised to an imaginary number minus e raised to minus-one-times that same imaginary number. This is another one of those magic tricks that mathematicians know because they see it all the time. Part of what we know from Euler’s Formula, the one I waved at back when I was talking about coordinates, is this:

\sin\left(\phi\right) = \frac{e^{\imath \phi} - e^{-\imath \phi}}{2\imath }

That’s good for any real-valued φ. For example, it’s good for the number \frac{\pi}{4}\cdot\frac{1}{x} . And that means we can rewrite that function into something that, finally, actually looks a little bit simpler. It looks like this:

f(x) = 8 x 2^{\frac{1}{2}\cdot \frac{1}{x}} \sin\left(\frac{\pi}{4}\cdot \frac{1}{x}\right)

And that’s the function whose limit I want to take at ∞. No, really.
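If you want to peek ahead numerically, here’s a sketch of mine, not part of the original derivation: evaluate the translated f(x), in its original complex form, for ever-larger x and watch where it’s heading. The values settle toward 6.2831…, which is 2π, a hint at how far the expression as written lands from π itself.

```python
import math

# Wronski's expression translated to a function of x; watch it as x grows
def f(x):
    return -4j * x * ((1 + 1j) ** (1 / x) - (1 - 1j) ** (1 / x))

for x in (10.0, 1_000.0, 100_000.0):
    print(x, f(x))       # imaginary parts vanish; real parts approach 2*pi
print(2 * math.pi)
```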

The Summer 2017 Mathematics A To Z: Zeta Function


Today Gaurish, of For the love of Mathematics, gives me the last subject for my Summer 2017 A To Z sequence. And also my greatest challenge: the Zeta function. The subject comes to all pop mathematics blogs. It comes to all mathematics blogs. It’s not difficult to say something about a particular zeta function. But to say something at all original? Let’s watch.

Summer 2017 Mathematics A to Z, featuring a coati (it's kind of the Latin American raccoon) looking over alphabet blocks, with a lot of equations in the background.
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work. He’s also open for commissions, starting from US$10.

Zeta Function.

The spring semester of my sophomore year I had Intro to Complex Analysis. Monday Wednesday 7:30; a rare evening class, one of the few times I’d eat dinner and then go to a lecture hall. There I discovered something strange and wonderful. Complex Analysis is a far easier topic than Real Analysis. Both are courses about why calculus works. But why calculus for complex-valued numbers works is a much easier problem than why calculus for real-valued numbers works. It’s dazzling. Part of this is that Complex Analysis, yes, builds on Real Analysis. So Complex can take for granted some things that Real has to prove. I didn’t mind. Given the way I crashed through Intro to Real Analysis I was glad for a subject that was, relatively, a breeze.

As we worked through Complex Variables and Applications so many things, so very many things, got to be easy. The basic unit of complex analysis, at least as we young majors learned it, was in contour integrals. These are integrals whose value depends on the values of a function on a closed loop. The loop is in the complex plane. The complex plane is, well, your ordinary plane. But we say the x-coordinate and the y-coordinate are parts of the same complex-valued number. The x-coordinate is the real-valued part. The y-coordinate is the imaginary-valued part. And we call that summation ‘z’. In complex-valued functions ‘z’ serves the role that ‘x’ does in normal mathematics.

So a closed loop is exactly what you think. Take a rubber band and twist it up and drop it on the table. That’s a closed loop. Suppose you want to integrate a function, ‘f(z)’. If you can always take its derivative on this loop and on the interior of that loop, then its contour integral is … zero. No matter what the function is. As long as it’s “analytic”, as the terminology has it. Yeah, we were all stunned into silence too. (Granted, mathematics classes are usually quiet, since it’s hard to get a good discussion going. Plus many of us were in post-dinner digestive lulls.)
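You can watch that stunning zero happen numerically. Here’s a Python sketch of mine, not course material: approximate the contour integral of exp, which is analytic everywhere, around the unit circle.

```python
import cmath
import math

def contour_integral(f, n=2000):
    # Riemann-sum approximation of the integral of f around the unit circle,
    # parametrized as z(t) = e^{it}, so dz = i e^{it} dt
    total = 0j
    dt = 2 * math.pi / n
    for k in range(n):
        z = cmath.exp(1j * k * dt)
        total += f(z) * 1j * z * dt
    return total

print(abs(contour_integral(cmath.exp)))  # effectively zero
```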

Integrating regular old functions of real-valued numbers is this tedious process. There’s sooooo many rules and possibilities and special cases to consider. There’s sooooo many tricks that get you the integrals of some functions. And then here, with complex-valued integrals for analytic functions, you know the answer before you even look at the function.

As you might imagine, since this is only page 113 of a 341-page book there’s more to it. Most functions that anyone cares about aren’t analytic. At least they’re not analytic everywhere inside regions that might be interesting. There’s usually some points where an interesting function ‘f(z)’ is undefined. We call these “singularities”. Yes, like starships are always running into. Only we rarely get propelled into other universes or other times or turned into ghosts or stuff like that.

So much of the rest of the course turns into ways to avoid singularities. Sometimes you can spackle them over. This is when the function happens not to be defined somewhere, but you can see what it ought to be. Sometimes you have to do something more. This turns into a search for “removable” singularities. And this does something so brilliant it looks illicit. You modify your closed loop, so that it comes up very close, as close as possible, to the singularity, but studiously avoids it. Follow this game of I’m-not-touching-you right and you can turn your integral into two parts. One is the part that’s equal to zero. The other is the part that’s a constant times whatever the function is at the singularity you’re removing. And that ought to be easy to find the value for. (Being able to find a function’s value doesn’t mean you can find its derivative.)
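A numerical peek at that payoff, in a sketch of mine. The constant turns out to be 2πi, this being what’s called the Cauchy integral formula, a name the paragraph above only gestures at. Integrate e^z / z around the unit circle, with its singularity at zero inside, and you get 2πi times e^0 = 1.

```python
import cmath
import math

def contour_integral(f, n=2000):
    # Riemann-sum approximation around the unit circle, z(t) = e^{it}
    total = 0j
    dt = 2 * math.pi / n
    for k in range(n):
        z = cmath.exp(1j * k * dt)
        total += f(z) * 1j * z * dt
    return total

val = contour_integral(lambda z: cmath.exp(z) / z)
print(val)                 # very nearly 2*pi*i
print(2j * math.pi)
```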

Those tricks were hard to master. Not because they were hard. Because they were easy, in a context where we expected hard. But after that we got into how to move singularities. That is, how to do a change of variables that moved the singularities to where they’re more convenient for some reason. How could this be more convenient? Because of chapter five, series. In regular old calculus we learn how to approximate well-behaved functions with polynomials. In complex-variable calculus, we learn the same thing all over again. They’re polynomials of complex-valued variables, but it’s the same sort of thing. And not just polynomials, but things that look like polynomials except they’re powers of \frac{1}{z} instead. These open up new ways to approximate functions, and to remove singularities from functions.

And then we get into transformations. These are about turning a problem that’s hard into one that’s easy. Or at least different. They’re a change of variable, yes. But they also change what exactly the function is. This reshuffles the problem. Makes for a change in singularities. Could make ones that are easier to work with.

One of the useful, and so common, transforms is called the Laplace-Stieltjes Transform. (“Laplace” is said like you might guess. “Stieltjes” is said, or at least we were taught to say it, like “Stilton cheese” without the “ton”.) And it tends to create functions that look like a series, the sum of a bunch of terms. Infinitely many terms. Each of those terms looks like a number times another number raised to some constant times ‘z’. As the course came to its conclusion, we were all prepared to think about these infinite series. Where singularities might be. Which of them might be removable.

These functions, these results of the Laplace-Stieltjes Transform, we collectively call ‘zeta functions’. There are infinitely many of them. Some of them are relatively tame. Some of them are exotic. One of them is world-famous. Professor Walsh — I don’t mean to name-drop, but I discovered the syllabus for the course tucked in the back of my textbook and I’m delighted to rediscover it — talked about it.

That world-famous one is, of course, the Riemann Zeta function. Yes, that same Riemann who keeps turning up, over and over again. It looks simple enough. Almost tame. Take the counting numbers, 1, 2, 3, and so on. Take your ‘z’. Raise each of the counting numbers to that ‘z’. Take the reciprocals of all those numbers. Add them up. What do you get?
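Before answering, you can just try the recipe, at least for values of ‘z’ where the sum converges outright, which needs the real part of ‘z’ bigger than 1. (Getting at the interesting zeroes below takes what’s called analytic continuation, which I’m not showing here.) A Python sketch of mine, with z = 2 as my pick:

```python
import math

def zeta_partial(z, terms=200_000):
    # Partial sum of the Riemann zeta series, 1/1^z + 1/2^z + 1/3^z + ...
    # Legitimate as it stands for real z > 1.
    return sum(1 / n ** z for n in range(1, terms + 1))

print(zeta_partial(2.0))      # creeps up on pi^2 / 6 = 1.6449...
print(math.pi ** 2 / 6)
```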

A mass of fascinating results, for one. Functions you wouldn’t expect are concealed in there. There’s strips where the real part is zero. There’s strips where the imaginary part is zero. There’s points where both the real and imaginary parts are zero. We know infinitely many of them. If ‘z’ is -2, for example, the sum is zero. Also if ‘z’ is -4. -6. -8. And so on. These are easy to show, and so are dubbed ‘trivial’ zeroes. To say some are ‘trivial’ is to say that there are others that are not trivial. Where are they?

Professor Walsh explained. We know of many of them. The nontrivial zeroes we know of all share something in common. They have a real part that’s equal to 1/2. There’s a zero that’s at about the number \frac{1}{2} - \imath 14.13 . Also at \frac{1}{2} + \imath 14.13 . There’s one at about \frac{1}{2} - \imath 21.02 . Also about \frac{1}{2} + \imath 21.02 . (There’s a symmetry, you maybe guessed.) Every nontrivial zero we’ve found has the same real part. But we don’t know that they all do. Nobody does. It is the Riemann Hypothesis, the great unsolved problem of mathematics. Much more important than that Fermat’s Last Theorem, which back then was still merely a conjecture.

What a prospect! What a promise! What a way to set us up for the final exam in a couple of weeks.

I had an inspiration, a kind of scheme of showing that a nontrivial zero couldn’t be within a given circular contour. Make the size of this circle grow. Move its center farther away from the z-coordinate \frac{1}{2} + \imath 0 to match. Show there’s still no nontrivial zeroes inside. And therefore, logically, since I would have shown nontrivial zeroes couldn’t be anywhere but on this special line, and we know nontrivial zeroes exist … I leapt enthusiastically into this project. A little less enthusiastically the next day. Less so the day after. And on. After maybe a week I went a day without working on it. But came back, now and then, prodding at my brilliant would-be proof.

The Riemann Zeta function was not on the final exam, which I’ve discovered was also tucked into the back of my textbook. It asked more things like finding all the singular points and classifying what kinds of singularities they were for functions like e^{-\frac{1}{z}} instead. If the syllabus is accurate, we got as far as page 218. And I’m surprised to see the professor put his e-mail address on the syllabus. It was merely “bwalsh@math”, but understand, the Internet was a smaller place back then.

I finished the course with an A-, but without answering any of the great unsolved problems of mathematics.

The Summer 2017 Mathematics A To Z: N-Sphere/N-Ball


Today’s glossary entry is a request from Elke Stangl, author of the Elkemental Force blog, which among other things has made me realize how much there is interesting to say about heat pumps. Well, you never know what’s interesting before you give it serious thought.


N-Sphere/N-Ball.

I’ll start with space. Mathematics uses a lot of spaces. They’re inspired by geometry, by the thing that fills up our room. Sometimes we make them different by simplifying them, by thinking of the surface of a table, or what geometry looks like along a thread. Sometimes we make them bigger, imagining a space with more directions than we have. Sometimes we make them very abstract. We realize that we can think of polynomials, or functions, or shapes as if they were points in space. We can describe things that work like distance and direction and angle that work for these more abstract things.

What are useful things we know about space? Many things. Whole books full of things. Let me pick one of them. Start with a point. Suppose we have a sense of distance, of how far one thing is from another. Then we can have an idea of the neighborhood. We can talk about some chunk of space that’s near our starting point.

So let’s agree on a space, and on some point in that space. You give me a distance. I give back to you — well, two obvious choices. One of them is all the points in that space that are exactly that distance from our agreed-on point. We know what this is, at least in the two kinds of space we grow up comfortable with. In three-dimensional space, this is a sphere. A shell, at least, centered around whatever that first point was. In two-dimensional space, on our desktop, it’s a circle. We know it can look a little weird: if we started out in a one-dimensional space, there’d be only two points, one on either side of the original center point. But it won’t look too weird. Imagine a four-dimensional space. Then we can speak of a hypersphere. And we can imagine that as being somehow a ball that’s extremely spherical. Maybe it pokes out of the rendering we try making of it, like a cartoon character falling out of the movie screen. We can imagine a five-dimensional space, or a ten-dimensional one, or something with even more dimensions. And we can conclude there’s a sphere for even that much space. If that’s a little dizzying, well, let it be.

What are spheres good for? Well, they’re nice familiar shapes. Even if they’re in a weird number of dimensions. They’re useful, too. A lot of what we do in calculus, and in analysis, is about dealing with difficult points. Points where a function is discontinuous. Points where the function doesn’t have a value. One of calculus’s reliable tricks, though, is that we can swap information about the edge of things for information about the interior. We can replace a point with a sphere and find our work is easier.

The other thing I could give you. It’s a ball. That’s all the points that aren’t more than your distance away from our point. It’s the inside, the whole planet rather than just the surface of the Earth.

And here’s an ambiguity. Is the surface a part of the ball? Should we include the edge, or do we just want the inside? And that depends on what we want to do. Either might be right. If we don’t need the edge, then we have an open set (stick around for Friday). This gives us the open ball. If we do need the edge, then we have a closed set, and so, the closed ball.
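If you’ll tolerate a bit of code, these definitions turn into one-line membership tests. This is a sketch of my own (none of these function names are standard), with the ordinary straight-line distance standing in for whatever distance function we’ve agreed on:

```python
import math

def euclidean(p, q):
    """Ordinary straight-line distance, in however many dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def in_open_ball(point, center, radius, dist=euclidean):
    """Strictly inside: the edge is left out."""
    return dist(point, center) < radius

def in_closed_ball(point, center, radius, dist=euclidean):
    """Inside or on the edge."""
    return dist(point, center) <= radius

def on_sphere(point, center, radius, dist=euclidean, tol=1e-9):
    """On the shell itself, to within a small numerical tolerance."""
    return abs(dist(point, center) - radius) < tol
```

So the point (1, 0, 0) is on the unit sphere around the origin of three-dimensional space, which means it belongs to the closed unit ball but not to the open one.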

Balls are so useful. Take a chunk of space that you find interesting for whatever reason. We can represent that space as the joining together (the “union”) of a bunch of balls. Probably not all the same size, but that’s all right. We might need infinitely many of these balls to get the chunk precisely right, or as close to right as can be. But that’s all right. We can still do it. Most anything we want to analyze is easier to prove on any one of these balls. And since we can describe the complicated shape as this combination of balls, then we can know things about the whole complicated shape. It’s much the way we can know things about polygons by breaking them into triangles, and showing things are true about triangles.

Sphere or ball, whatever you like. We can describe how many dimensions of space the thing occupies with the prefix. The 3-ball is everything close enough to a point that’s in a three-dimensional space. The 2-ball is everything close enough in a two-dimensional space. The 10-ball is everything close enough to a point in a ten-dimensional space. The 3-sphere is … oh, all right. Here we have a little squabble. People doing geometry prefer this to be the sphere in three dimensions. People doing topology prefer this to be the sphere whose surface has three dimensions, that is, the sphere in four dimensions. Usually which you mean will be clear from context: are you reading a geometry or a topology paper? If you’re not sure, oh, look for anything hinting at the number of spatial dimensions. If nothing gives you a hint maybe it doesn’t matter.

Either way, we do want to talk about the family of shapes without committing ourselves to any particular number of dimensions. And so that’s why we fall back on ‘N’. ‘N’ is a good name for “the number of dimensions we’re working in”, and so we use it. Then we have the N-sphere and the N-ball, a sphere-like shape, or a ball-like shape, that’s in however much space we need for the problem.

I mentioned something early on that I bet you paid no attention to. That was that we need a space, and a point inside the space, and some idea of distance. One of the surprising things mathematics teaches us about distance is … there’s a lot of ideas of distance out there. We have what I’ll call an instinctive idea of distance. It’s the one that matches what holding a ruler up to stuff tells us. But we don’t have to have that.

I sense the grumbling already. Yes, sure, we can define distance by some screwball idea, but do we ever need it? To which the mathematician answers, well, what if you’re trying to figure out how far away something in midtown Manhattan is? Where you can only walk along streets or avenues and we pretend Broadway doesn’t exist? Huh? How about that? Oh, fine, the skeptic might answer. Grant that there can be weird cases where the straight-line ruler distance is less enlightening than some other scheme is.

Well, there are. There exists a whole universe of different ideas of distance. There’s a handful of useful ones. The ordinary straight-line ruler one, the Euclidean distance, you get by a method so familiar it’s worth spelling out. You find the coordinates of your two given points. Take the pairs of corresponding coordinates: the x-coordinates of the two points, the y-coordinates of the two points, the z-coordinates, and so on. Find the differences between corresponding coordinates. Take the absolute value of those differences. Square all those absolute-value differences. Add up all those squares. Take the square root of that. Fine enough.
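That recipe transcribes directly into code, if you like seeing it that way (the function name is my own):

```python
import math

def euclidean_distance(p, q):
    # Differences between corresponding coordinates.
    differences = [a - b for a, b in zip(p, q)]
    # Absolute values; harmless before squaring, but the habit matters
    # for the other powers of distance out there.
    absolutes = [abs(d) for d in differences]
    # Square them all, add them up, take the square root.
    return math.sqrt(sum(d ** 2 for d in absolutes))

print(euclidean_distance((0, 0), (3, 4)))  # the 3-4-5 right triangle: 5.0
```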

There’s a lot of novelty acts. For example, do that same thing, only instead of raising the differences to the second power, raise them to the 26th power. When you get the sum, instead of the square root, take the 26th root. There. That’s a legitimate distance. No, you will never need this, but your analysis professor might give it to you as a homework problem sometime.

Some are useful, though. Raising to the first power, and then eventually taking the first root, gives us something useful. Yes, raising to a first power and taking a first root isn’t doing anything. We just say we’re doing that for the sake of consistency. Raising to an infinitely large power, and then taking an infinitely great root, inspires angry glares. But we can make that idea rigorous. When we do it gives us something useful.
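The whole family of these distances fits one little function: raise to the power p, sum, take the p-th root. Taking p = 2 gives the Euclidean distance, p = 1 the useful first-power one, p = 26 the novelty act, and the infinite-power case turns out to be simply the largest coordinate difference. A sketch, with a name of my own choosing:

```python
import math

def minkowski(p_point, q_point, power):
    """Distance of order `power`; math.inf gives the largest-difference limit."""
    diffs = [abs(a - b) for a, b in zip(p_point, q_point)]
    if power == math.inf:
        return max(diffs)
    return sum(d ** power for d in diffs) ** (1.0 / power)

# The same two points, measured four different ways:
pts = ((0, 0), (3, 4))
print(minkowski(*pts, 1))         # first power: 7.0
print(minkowski(*pts, 2))         # Euclidean: 5.0
print(minkowski(*pts, 26))        # the novelty act: just a hair over 4
print(minkowski(*pts, math.inf))  # the limit: 4
```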

And here’s a new, amazing thing. We can still make “spheres” for these other distances. On a two-dimensional space, the “sphere” with this first-power-based distance will look like a diamond. The “sphere” with this infinite-power-based distance will look like a square. On a three-dimensional space the “sphere” with the first-power-based distance looks like a … well, a more complicated, three-dimensional diamond, an octahedron. The “sphere” with the infinite-power-based distance looks like a box. The “balls” in all these cases look like what you expect from knowing the spheres.
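We can check those shapes numerically. Points on the diamond sit at first-power distance exactly 1 from the origin; points on the square sit at infinite-power distance exactly 1. A small sketch (the function name and the sample points are my own):

```python
import math

def dist(point, power):
    """Distance from the origin, of order `power`; math.inf means take the max."""
    diffs = [abs(c) for c in point]
    return max(diffs) if power == math.inf else sum(d ** power for d in diffs) ** (1 / power)

# The first-power "sphere" of radius 1 is the diamond |x| + |y| = 1.
diamond_points = [(1, 0), (0, 1), (-1, 0), (0.5, 0.5)]
print(all(abs(dist(pt, 1) - 1) < 1e-12 for pt in diamond_points))   # True

# The infinite-power "sphere" of radius 1 is the square max(|x|, |y|) = 1.
square_points = [(1, 1), (1, -1), (-1, 0.3), (0.7, 1)]
print(all(dist(pt, math.inf) == 1 for pt in square_points))         # True
```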

As with the ordinary ideas of spheres and balls these shapes let us understand space. Spheres offer a natural path to understanding difficult points. Balls offer a natural path to understanding complicated shapes. The different ideas of distance change how we represent these, and how complicated they are, but not the fact that we can do it. And it allows us to start thinking of what spheres and balls for more abstract spaces, universes made of polynomials or formed of trig functions, might be. They’re difficult to visualize. But we have the grammar that lets us speak about them now.

And for a postscript: I also wrote about spheres and balls as part of my Set Tour a couple years ago. Here’s the essay about the N-sphere, although I didn’t exactly call it that. And here’s the essay about the N-ball, again not quite called that.

The Summer 2017 Mathematics A To Z: Integration


One more mathematics term suggested by Gaurish for the A-To-Z today, and then I’ll move on to a couple of others. Today’s is a good one.

Summer 2017 Mathematics A to Z, featuring a coati (it's kind of the Latin American raccoon) looking over alphabet blocks, with a lot of equations in the background.
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work. He’s also open for commissions, starting from US$10.

Integration.

Stand on the edge of a plot of land. Walk along its boundary. As you walk the edge pay attention. Note how far you walk before changing direction, even in the slightest. When you return to where you started consult your notes. Contained within them is the area you circumnavigated.

If that doesn’t startle you perhaps you haven’t thought about how odd that is. You don’t ever touch the interior of the region. You never do anything like see how many standard-size tiles would fit inside. You walk a path that is as close to one-dimensional as your feet allow. And encoded in there somewhere is an area. Stare at that incongruity and you realize why integrals baffle the student so. They have a deep strangeness embedded in them.
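For a polygon there’s a concrete version of this boundary-only miracle, the shoelace formula: feed it just the corner points you pass, in order, and out drops the enclosed area. A sketch of it:

```python
def shoelace_area(vertices):
    """Area enclosed by a polygon, computed from its boundary corners alone."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around back to the start
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

print(shoelace_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # unit square: 1.0
print(shoelace_area([(0, 0), (3, 0), (0, 4)]))          # 3-4-5 triangle: 6.0
```

Nothing in the computation ever looks at the interior; it only watches where the walk turned.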

We who do mathematics have always liked integrals. They grow, in the western tradition, out of geometry. Given a shape, what is a square that has the same area? There are shapes it’s easy to find the area for, given only straightedge and compass: a rectangle? Easy. A triangle? Just as straightforward. A polygon? If you know triangles then you know polygons. A lune, the crescent-moon shape formed by taking a circular cut out of a circle? We can do that. (If the cut is the right size.) A circle? … All right, we can’t do that, but we spent two thousand years trying before we found that out for sure. And we can do some excellent approximations.

That bit of finding-a-square-with-the-same-area was called “quadrature”. The name survives, mostly in the phrase “numerical quadrature”. We use that to mean that we computed an integral’s approximate value, instead of finding a formula that would get it exactly. The otherwise obvious choice of “numerical integration” we use already. It describes computing the solution of a differential equation. We’re not trying to be difficult about this. Solving a differential equation is a kind of integration, and we need to do that a lot. We could recast a solving-a-differential-equation problem as a find-the-area problem, and vice-versa. But that’s bother, if we don’t need to, and so we talk about numerical quadrature and numerical integration.
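The distinction is easy to make concrete. Numerical quadrature estimates an area under a curve; numerical integration marches a differential equation forward. A minimal sketch of each, with names of my own:

```python
def quadrature_trapezoid(f, a, b, n=1000):
    """Numerical quadrature: approximate the area under f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

def integrate_euler(dydx, y0, a, b, n=1000):
    """Numerical integration: solve y' = dydx(x, y) forward from y(a) = y0."""
    h = (b - a) / n
    x, y = a, y0
    for _ in range(n):
        y += h * dydx(x, y)
        x += h
    return y
```

The quadrature of x^2 over [0, 1] comes out near 1/3; Euler’s method on y' = y, starting from y(0) = 1, gives something near e at x = 1. Related computations, but different problems.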

Integrals are built on two infinities. This is part of why it took so long to work out their logic. One is the infinity of number; we find an integral’s value, in principle, by adding together infinitely many things. The other is an infinity of smallness. The things we add together are infinitesimally small. That we need to take things, each smaller than any number yet somehow not zero, and in such quantity that they add up to something, seems paradoxical. Their geometric origins had to be merged into those of arithmetic and algebra, and it was not easy. Bishop George Berkeley made a steady name for himself in calculus textbooks by pointing this out. We have worked out several logically consistent schemes for evaluating integrals. They work, mostly, by showing that we can make the error caused by approximating the integral smaller than any margin we like. This is a standard trick, or at least it is, now that we know it.

That “in principle” above is important. We don’t actually work out an integral by finding the sum of infinitely many, infinitely tiny, things. It’s too hard. I remember in grad school the analysis professor working out by the proper definitions the integral of 1. This is as easy an integral as you can do without just integrating zero. He escaped with his life, but it was a close scrape. He offered the integral of x as a way to test our endurance, without actually doing it. I’ve never made it through that.
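What the proper definition asks for amounts to this: chop the interval into pieces, sample the function on each piece, add up value times width, and watch the sums settle as the pieces shrink. For f(x) = x on [0, 1] the sums close in on 1/2. (A sketch only; the real work my professor was dramatizing is handling arbitrary partitions rigorously.)

```python
def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f over [a, b] with n equal pieces."""
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(n))

# Finer and finer partitions for f(x) = x on [0, 1]; the sums head toward 1/2.
for n in (10, 100, 1000):
    print(n, riemann_sum(lambda x: x, 0.0, 1.0, n))
```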

But we do integrals anyway. We have tools on our side. We can show, for example, that if a function obeys some common rules then we can use simpler formulas. Ones that don’t demand so many symbols in such tight formation. Ones that we can use in high school. Also, ones we can adapt to numerical computing, so that we can let machines give us answers which are near enough right. We get to choose how near is “near enough”. But then the machines decide how long we’ll have to wait to get that answer.
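That bargain, where we pick the tolerance and the machine picks how long we wait, looks like this in miniature: keep doubling the number of slices until successive estimates agree to within our chosen margin. A sketch of my own, not a production integrator:

```python
def integrate_to_tolerance(f, a, b, tol=1e-8, max_doublings=30):
    """Trapezoid estimates at doubling resolution until they stabilize."""
    def trapezoid(n):
        h = (b - a) / n
        return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

    n, previous = 2, trapezoid(2)
    for _ in range(max_doublings):
        n *= 2
        current = trapezoid(n)
        if abs(current - previous) < tol:
            return current   # near enough right, by our own standard
        previous = current
    raise RuntimeError("did not converge to the requested tolerance")
```

A tighter tolerance means more doublings, which is exactly the machine deciding how long we wait.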

The greatest tool we have on our side is the Fundamental Theorem of Calculus. Even the name promises it’s the greatest tool we might have. This rule tells us how to connect integrating a function to differentiating another function. If we can find a function whose derivative is the thing we want to integrate, then we have a formula for the integral. It’s that function we found. What a fantastic result.
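We can watch the theorem work numerically. Take a toy example of my own, F(x) = x^3/3, whose derivative is x^2; the sum-of-slices estimate of the integral of x^2 over [1, 2] should match F(2) - F(1):

```python
def F(x):
    return x ** 3 / 3.0   # a function whose derivative is x**2

def riemann(f, a, b, n=100_000):
    """Crude sum-of-slices (midpoint) estimate of the integral of f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(riemann(lambda x: x * x, 1.0, 2.0))  # the hard way
print(F(2.0) - F(1.0))                     # the Fundamental Theorem way: 7/3
```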

The trouble is it’s so hard to find functions whose derivatives are the thing we wanted to integrate. There are a lot of functions we can find, mind you. If we want to integrate a polynomial it’s easy. Sine and cosine and even tangent? Yeah. Logarithms? A little tedious but all right. A constant number raised to the power x? Also tedious but doable. A constant number raised to the power x^2? Hold on there, that’s madness. No, we can’t do that.

There is a weird grab-bag of functions we can find these integrals for. They’re mostly ones we can find some integration trick for. An integration trick is some way to turn the integral we’re interested in into a couple of integrals we can do and then mix back together. A lot of a Freshman Calculus course is a heap of tricks we’ve learned. They have names like “u-substitution” and “integration by parts” and “trigonometric substitution”. Some of them are really exotic, such as turning a single integral into a double integral because that leads us to something we can do. And there’s something called “differentiation under the integral sign” that I don’t know of anyone actually using. People know of it because Richard Feynman, in his fun memoir What Do You Care What Other People Think: 250 Pages Of How Awesome I Was In Every Situation Ever, mentions how awesome it made him in so many situations. Mathematics, physics, and engineering nerds are required to read this at an impressionable age, so we fall in love with a technique no textbook ever mentions. Sorry.
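Here’s one trick caught in the act, with an example of my own: substituting u = x^2 turns the integral of 2x cos(x^2) over [0, 1] into the integral of cos(u) over [0, 1], and both come out to sin(1). A numerical spot-check:

```python
import math

def midpoint(f, a, b, n=200_000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

before = midpoint(lambda x: 2 * x * math.cos(x * x), 0.0, 1.0)  # the original integral
after = midpoint(lambda u: math.cos(u), 0.0, 1.0)               # after u = x**2
print(before, after, math.sin(1))   # all three agree
```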

I’ve written about all this as if we were interested just in areas. We’re not. We like calculating lengths and volumes and, if we dare venture into more dimensions, hypervolumes and the like. That’s all right. If we understand how to calculate areas, we have the tools we need. We can adapt them to as many or as few dimensions as we need. By weighting integrals we can do calculations that tell us about centers of mass and moments of inertia, about the most and least probable values of something, about all of quantum mechanics.

As often happens, this powerful tool starts with something anyone might ponder: what size square has the same area as this other shape? And then think seriously about it.

The Set Tour, Part 4: Complex Numbers


C

The square root of negative one. Everybody knows it doesn’t exist; there’s no real number you can multiply by itself and get negative one out. But then sometime in algebra, deep in a section about polynomials, suddenly we come out and declare there is such a thing. It’s an “imaginary number” that we call “i”. It’s hard to blame students for feeling betrayed by this. To make it worse, we throw real and imaginary numbers together and call the result “complex numbers”. It’s as if we’re out to tease them for feeling confused.

It’s an important set of things, though. It turns up as the domain, or the range, of functions so often that one of the major fields of analysis is called “Complex Analysis”. If the course listing allows for more words, it’s called “Analysis of Functions of a Complex Variable” or something like that. Despite the connotations of the word “complex”, though, the field is a delight. It’s considerably easier to understand than Real Analysis, the study of functions of mere real numbers. When there is a theorem that has a version in Real Analysis and a version in Complex Analysis, the Complex Analysis side is usually easier to prove and easier to understand. It’s uncanny.

The set of all complex numbers is denoted C, in parallel to the set of real numbers, R. To make it clear that we mean this set, and not some piddling little common set that might happen to share the name C, add a vertical stroke to the left of the letter. This is just as we add a vertical stroke to R to emphasize we mean the Real Numbers. We should approach the set with respect, removing our hats, thinking seriously about great things. It would look silly to add a second curve to C though, so we just add a straight vertical stroke on the left side of the letter C. This makes it look a bit like it’s an Old English typeface (the kind you call Gothic until you learn that means “sans serif”) pared down to its minimum.

Why do we teach people there’s no such thing as a square root of minus one, and then one day, teach them there is? Part of it is that whether there is a square root depends on your context. If you are interested only in the real numbers, there’s nothing that, squared, gives you minus one. This is exactly the way that it’s not possible to equally divide five objects between two people if you aren’t allowed to cut the objects in half. But if you are willing to allow half-objects to be things, then you can do what was previously forbidden. What you can do depends on what the rules you set out are.

And there’s surely some echo of the historical discovery of imaginary and complex numbers at work here. They were noticed when working out the roots of third- and fourth-degree polynomials. These can be done by way of formulas that nobody ever remembers because there are so many better things to remember. These formulas would sometimes require one to calculate a square root of a negative number, a thing that obviously didn’t exist. Except that if you pretended it did, you could get out correct answers, just as if these were ordinary numbers. You can see why this may be dubbed an “imaginary” number. The name hints at the suspicion with which it’s viewed. It’s much as “negative” numbers look like some trap to people who’re just getting comfortable with fractions.

It goes against the stereotype of mathematicians to suppose they’d accept working with something they don’t understand because the results are all right, afterwards. But, actually, mathematicians are willing to accept getting answers by any crazy method. If you have a plausible answer, you can test whether it’s right, and if all you really need this minute is the right answer, good.

But we do like having methods; they’re more useful than mere answers. And we can imagine this set called the complex numbers. They contain … well, all the possible roots, the solutions, of all polynomials. (The polynomials might have coefficients — the numbers in front of the variable — of integers, or rational numbers, or irrational numbers. If we already accept the idea of complex numbers, the coefficients can be complex numbers too.)

It’s exceedingly common to think of the complex numbers by starting off with a new number called “i”. This is a number about which we know nothing except that i times i equals minus one. Then we tend to think of complex numbers as “a real number plus i times another real number”. The first real number gets called “the real component”, and is usually denoted as either “a” or “x”. The second real number gets called “the imaginary component”, and is usually denoted as either “b” or “y”. Then the complex number is written “a + i*b” or “x + i*y”. Sometimes it’s written “a + b*i” or “x + y*i”; that’s a mere matter of house style. Don’t let it throw you.

Writing a complex number this way has advantages. Particularly, it makes it easy to see how one would add together (or subtract) complex numbers: “a + b*i + x + y*i” almost suggests that the sum should be “(a + x) + (b + y)*i”. What we know from ordinary arithmetic gives us guidance. And if we’re comfortable with binomials, then we know how to multiply complex numbers. Start with “(a + b*i) * (x + y*i)” and follow the distributive law. We get, first, “a*x + a*y*i + b*i*x + b*y*i*i”. But “i*i” equals minus one, so this is the same as “a*x + a*y*i + b*i*x - b*y”. Move the real components together, and move the imaginary components together, and we have “(a*x - b*y) + (a*y + b*x)*i”.
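Python’s built-in complex type follows exactly this rule, so we can spot-check the algebra with some sample numbers of my own:

```python
a, b = 2.0, 3.0    # first number: a + b*i
x, y = 5.0, -1.0   # second number: x + y*i

product = complex(a, b) * complex(x, y)

# The distributive-law result: (a*x - b*y) + (a*y + b*x)*i
by_hand = complex(a * x - b * y, a * y + b * x)
print(product, by_hand)   # the same number both ways: (13+13j)
```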

That’s the most common way of writing out complex numbers. It’s so common that Eric W Weisstein’s Mathworld encyclopedia even says that’s what complex numbers are. But it isn’t the only way to construct, or look at, complex numbers. A common alternate way to look at complex numbers is to match a complex number to a point on the plane, or if you prefer, a point in the set R^2.

It’s surprisingly natural to think of the real component as how far to the right or left of an origin your complex number is, and to think of the imaginary component as how far above or below the origin it is. Much complex-number work makes sense if you think of complex numbers as points in space, or directions in space. The language of vectors trips us up only a little bit here. We speak of a complex number as corresponding to a point on the “complex plane”, just as we might speak of a real number as a point on the “(real) number line”.

But there are other descriptions yet. We can represent complex numbers as a pair of numbers with a scheme that looks like polar coordinates. Pick a point on the complex plane. We can say where that is by two points of information. The first is the amplitude, or magnitude: how far the point is from the origin. The second is the phase, or angle: draw the line segment connecting the origin and your point. What angle does that make with the positive horizontal axis?

This representation is called the “phasor” representation. It’s tolerably popular in physics and I hear tell of engineers liking it. We represent numbers then not as “x + i*y” but instead as “r * e^{i*θ}”, with r the magnitude and θ the angle. “e” is the base of the natural logarithm, which you get very comfortable with if you do much mathematics or physics. And “i” is just what we’ve been talking about here. This is a pretty natural way to write about complex numbers that represent stuff that oscillates, such as alternating current or the probability function in quantum mechanics. A lot of stuff oscillates, if you study it through the right lens. So numbers that look like this keep creeping in, and into unexpected places. It’s quite easy to multiply numbers in phasor form — just multiply the magnitude parts, and add the angle parts — although addition and subtraction become a pain.
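If you want to play with this, Python’s cmath module does the conversions: polar gives the magnitude and angle, rect goes back. And multiplying in phasor form really is multiply-the-magnitudes, add-the-angles:

```python
import cmath

z1 = complex(1, 1)
z2 = complex(0, 2)

r1, theta1 = cmath.polar(z1)   # magnitude and angle of each factor
r2, theta2 = cmath.polar(z2)

# Multiply magnitudes, add angles, convert back to x + i*y form:
phasor_product = cmath.rect(r1 * r2, theta1 + theta2)
print(phasor_product, z1 * z2)   # the two answers agree
```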

Mathematicians generally use the letter “z” to represent a complex-valued number whose identity is not known. As best I can tell, this is because we do think so much of a complex number as the sum “x + y*i”. So if we used familiar old “x” for an unknown number, it would carry the connotations of “the real component of our complex-valued number” and mislead the unwary mathematician. The connection is so common that a mathematician might carelessly switch between “z” and the real and imaginary components “x” and “y” without specifying that “z” is another way of writing “x + y*i”. A good copy editor or an alert student should catch this.

Complex numbers work very much like real numbers do. They add and multiply in natural-looking ways, and you can do subtraction and division just as well. You can take exponentials, and can define all the common arithmetic functions — sines and cosines, square roots and logarithms, integrals and differentials — on them just as well as you can with real numbers. And you can embed the real numbers within the complex numbers: if you have a real number x, you can match that perfectly with the complex number “x + 0*i”.

But that doesn’t mean complex numbers are exactly like the real numbers. For example, it’s possible to order the real numbers. You can say that the number “a” is less than the number “b”, and have that mean something. That’s not possible to do with complex numbers. You can’t say that “a + b*i” is less than, or greater than, “x + y*i” in a logically consistent way. You can say the magnitude of one complex-valued number is greater than the magnitude of another. But the magnitudes are real numbers. For all that complex numbers give us there are things they’re not good for.
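Python happens to mirror both halves of this. The cmath module extends the familiar functions to complex arguments, a real number x embeds as complex(x, 0), and asking whether one complex number is less than another is simply an error, though comparing magnitudes is fine:

```python
import cmath

print(cmath.sqrt(-1))             # the famous square root: 1j
print(cmath.sin(complex(1, 2)))   # sine happily takes complex input
print(complex(5, 0) == 5)         # a real number embedded in the complexes: True

try:
    _ = complex(1, 2) < complex(2, 1)   # no consistent ordering exists
except TypeError as err:
    print("cannot order complex numbers:", err)

print(abs(complex(1, 2)) < abs(complex(2, 2)))  # magnitudes are real, so this works
```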

Do You Have To Understand This?


At least around here school is starting up again and that’s got me thinking about learning mathematics. Particularly, it’s got me on the question: what should you do if you get stuck?

You will get stuck. Much of mathematics is learning a series of arguments. They won’t all make sense, at least not at first. The arguments are almost certainly correct. If you’re reading something from a textbook, especially a textbook with a name like “Introductory” and that’s got into its seventh edition, the arguments can be counted on. (On the cutting edge of new mathematical discovery arguments might yet be uncertain.) But just because the arguments are right doesn’t mean you’ll see why they’re right, or even how they work at all.

So is it all right, if you’re stuck on a point, to just accept that this is something you don’t get, and move on, maybe coming back later?

Some will say no. Charles Dodgson — Lewis Carroll — took a rather hard line on this, insisting that one must study the argument until it makes sense. There are good reasons for this attitude. One is that while mathematics is made up of lots of arguments, it’s also made up of lots of very similar arguments. If you don’t understand the proof for (say) Green’s Theorem, it’s rather likely you won’t understand Stokes’s Theorem. And that’s coming in a couple of pages. Nor will you get a number of other theorems built on similar setups and using similar arguments. If you want to progress you have to get this.

Another strong argument is that much of mathematics is cumulative. Green’s Theorem is used as a building block to many other theorems. If you haven’t got an understanding of why that theorem works, then you probably also don’t have a clear idea why its follow-up theorems work. Before long the entire chapter is an indistinct mass of the not-quite-understood.

I’m less hard-line about this. I’m sure that shocks everyone who has never heard me express an opinion on anything, ever. But I have to judge the way I learn stuff to be the best possible way to learn stuff. And that includes, after a certain while of beating my head against the wall, moving on and coming back around later.

Why do I think that’s justified? Well, for one, because I’m not in school anymore. What mathematics I learn is because I find it beautiful or fun, and if I’m making myself miserable then I’m missing the point. This is a good attitude when all mathematics is recreational. It’s not so applicable when the exam is Monday, 9:50 am.

But sometimes it’s easier to understand something when you have experience using it. A simple statement of Green’s Theorem can make it sound too intimidating to be useful. When you see it in use, the “why” and “how” can be clearer. The motivation for the theorem can be compelling. The slightly grim joke we shared as majors was that we never really understood a course until we took its successor. This had dire implications for understanding what we would take senior year.

What about the cumulative nature of mathematical knowledge? That’s so and it’s not disputable. But it seems to me possible to accept “this statement is true, even if I’m not quite sure why” on the way to something that requires it. We always have to depend on things that are true that we can’t quite justify. I don’t even mean the axioms or the assumptions going into a theorem. I’m not sure how to characterize the kind of thing I mean.

I can give examples, though. When I was learning simple harmonic motion, the study of pendulums, I was hung up on a particular point. In describing how the pendulum swings, there’s a point where we replace the sine of the angle of the pendulum with the measure of the angle itself. If the angle is small enough these numbers are just about the same. But … why? What justifies going from the exact sine of the angle to the approximation of the angle? Why then and not somewhere else? How do you know to do it there and not somewhere else?
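For anyone who wants to see the approximation’s quality rather than take it on faith, a few sample angles (my own choices, in radians) show sine and angle agreeing to a fraction of a percent when the angle is small, and parting ways badly when it isn’t:

```python
import math

for theta in (0.01, 0.1, 0.5, 1.0):
    relative_error = abs(math.sin(theta) - theta) / math.sin(theta)
    print(f"theta = {theta}: sin differs from theta by {relative_error:.3%}")
```

The error grows roughly like the square of the angle, which is why the trick is safe for gentle swings and treacherous for wide ones.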

I couldn’t get satisfying answers as a student. If I had refused to move on until I understood the process? Well, I might have earlier had an understanding that these sorts of approximations defy rigor. They’re about judgement, when to approximate and when to not. And they come from experience. You learn that approximating this will give you a solvable interesting problem. But approximating that leaves you too simple a problem to be worth studying. But I would have been quite delayed in understanding simple harmonic motion, which is at least as important. Maybe more important if you’re studying physics problems. There have to be priorities.

Is that right, though? I did get to what I thought was more important at the time. But the making of approximations is important, and I didn’t really learn it then. I’d accepted that we would do this and move on, and I did fill in that gap later. But it is so easy to never get back to the gap.

There’s hope if you’re studying something well-developed. By “well-developed” I mean something like “there are several good textbooks someone teaching this might choose from”. If a subject gets several good textbooks it usually has several independent proofs of anything interesting. If you’re stuck on one point, you usually can find it explained by a different chain of reasoning.

Sometimes even just a different author will help. I survived Introduction to Real Analysis (the study of why calculus works) by accepting that I just didn’t speak the textbook’s language. I borrowed an intro to real analysis textbook that was written in French. I don’t speak or really read French, though I had a couple years of it in middle and high school. But the straightforward grammar of mathematical French, and the common vocabulary, meant I was able to work through at least the harder things to understand. Of course, the difference might have been that I had to slowly consider every sentence to turn it from French text to English reading.

Probably there can’t be a universally right answer. We learn by different methods, for different goals, at different times. Whether it’s all right to skip the difficult part and come back later will depend. But I’d like to know what other people think, and more, what they do.

Avoiding Monsters and Non-Monsters


R J Lipton has an engaging post which starts from something that rather horrified the mathematics community when it was discovered: it’s a function which is continuous everywhere, but it’s not differentiable anywhere, no matter where you look. Continuity and differentiability are important concepts in mathematics, and have very precise definitions — motivated, in part, by things like the difficult function Lipton discusses here — but they can be put into ordinary language tolerably well.

If you think of a continuous function as being one whose graph you could draw without having to lift the pencil from the paper you’re not doing badly. Similarly a function is differentiable at a point if, from that point, you know what way the curve is going. This function, found by Karl Weierstrass, is one example of the breed.

Lipton points out the somewhat unsettling point that it’s much more common for functions to be like this than to be neat and well-behaved functions like y = 4x - 3 or even y = e^{-\frac{1}{2}x^2} , in much the same way a real number is much more likely to be irrational than it is to be rational. He goes on to give an example, in an area of mathematics I’m not familiar with, of the “pathological” case being the vastly more common one. Fortunately, it turns out, we can usually approximate the “pathological” or “monster” function with something easier to work with — a very fortunate thing, or we could get done very few computations that reflected anything actually interesting — and that’s another thing we can credit Weierstrass with discovering.
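To see the trouble numerically — a sketch of my own, with parameters a = 1/2 and b = 13 chosen just to satisfy Weierstrass’s conditions — we can watch difference quotients of a partial sum of the series refuse to settle toward any slope:

```python
import math

def weierstrass_partial(x, a=0.5, b=13, n_terms=20):
    """Partial sum of Weierstrass's series: sum of a^k * cos(b^k * pi * x).

    The parameters a = 0.5 and b = 13 are my own choice for illustration;
    they satisfy 0 < a < 1, b odd, and ab = 6.5 > 1 + 3*pi/2, the
    conditions under which the full series is continuous everywhere yet
    differentiable nowhere."""
    return sum(a**k * math.cos(b**k * math.pi * x) for k in range(n_terms))

# Difference quotients at x = 0, with step sizes h = b^-n, grow roughly
# like (a*b)^n rather than converging to a slope -- a numerical hint
# that no derivative exists there.
quotients = []
for n in (2, 3, 4):
    h = 13.0 ** -n
    q = (weierstrass_partial(h) - weierstrass_partial(0.0)) / h
    quotients.append(q)
    print(f"h = 13^-{n}: difference quotient ~ {q:.0f}")
```

A differentiable function would have these quotients converging as h shrinks; here each shrink of h makes the quotient several times larger in magnitude.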

Gödel's Lost Letter and P=NP


Karl Weierstrass is often credited with the creation of modern analysis. In his quest for rigor and precision, he also created a shock when he presented his “monster” to the Berlin Academy in 1872.

Today I want to talk about the existence of strange and wonderful math objects—other monsters.

Weierstrass’s monster is defined as

f(x) = \sum_{k=1}^{\infty} a^{k} \cos\left(b^{k} \pi x\right),

where 0 < a < 1 , b is any odd integer, and ab > 1 + 3\pi/2 . This function is continuous everywhere, but is differentiable nowhere.

The shock was that most mathematicians at the time thought that a continuous function would have to be differentiable at a significant number of points. Some even had tried to prove this. While Weierstrass was the first to publish this, it was apparently known to others as early as 1830 that such functions existed.

This is a picture of the function—note its recursive structure, which is…


What Is True Almost Everywhere?


I was reading a thermodynamics book (C Truesdell and S Bharatha’s The Concepts and Logic of Classical Thermodynamics as a Theory of Heat Engines, which is a fascinating read, for the field, and includes a number of entertaining, for the field, snipes at the stuff textbook writers put in because they’re just passing on stuff without rethinking it carefully), and ran across a couple proofs which mentioned equations that were true “almost everywhere”. That’s a construction it might be surprising to know even exists in mathematics, so, let me take a couple hundred words to talk about it.

The idea isn’t really exotic. You’ve seen a kind of version of it when you see an equation containing the note that there’s an exception, such as, \frac{x^2 - 1}{x - 1} = x + 1 \mbox{ for } x \neq 1 . If the exceptions are tedious to list — because there are many of them to write down, or because they’re wordy to describe (the thermodynamics book mentioned the exceptions were where a particular set of conditions on several differential equations happened simultaneously, if it ever happened) — and if they’re unlikely to come up, then, we might just write whatever it is we want to say and add an “almost everywhere”, or for shorthand, put an “ae” after the line. This “almost everywhere” will, except in freak cases, propagate through the rest of the proof, but I only see people writing that when they’re students working through the concept. In publications, the “almost everywhere” gets put in where the condition first stops being true everywhere-everywhere and becomes only almost-everywhere, and taken as read after that.

I introduced this with an equation, but it can apply to any relationship: something is greater than something else, something is less than or equal to something else, even something is not equal to something else. (After all, “x \neq -x” is true almost everywhere, but there is that nagging exception at x = 0.) A mathematical proof is normally about things which are true. Whether one thing is equal to another is often incidental to that.

What’s meant by “unlikely to come up” is actually rigorously defined, which is why we can get away with this. It’s otherwise a bit daft to think we can just talk about things that are true except where they aren’t and not even post warnings about where they’re not true. If we say something is true “almost everywhere” on the real number line, for example, that means that the set of exceptions has a total length of zero. So if the only exception is where x equals 1, sure enough, that’s a set with no length. Similarly if the exceptions are where x equals positive 1 or negative 1, that’s still a total length of zero. But if the set of exceptions were all values of x from 0 to 4, well, that’s a set of total length 4 and we can’t say “almost everywhere” for that.
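A quick numerical sketch of my own, not from the original post: the identity \frac{x^2 - 1}{x - 1} = x + 1 fails only at x = 1 , a set of length zero, so a point sampled uniformly from an interval lands on the exception with probability zero:

```python
import math
import random

random.seed(42)  # fixed seed so the run is reproducible

def lhs(x):
    """(x^2 - 1) / (x - 1), defined everywhere except x = 1."""
    return (x**2 - 1) / (x - 1)

# Sample points uniformly from [0, 4].  The lone exception x = 1 has
# total length zero, so in practice no draw ever hits it, and the
# identity lhs(x) == x + 1 holds at every sampled point.  A generous
# rel_tol absorbs floating-point cancellation for samples near x = 1.
samples = [random.uniform(0.0, 4.0) for _ in range(10_000)]
hit_exception = sum(1 for x in samples if x == 1.0)
agree = all(math.isclose(lhs(x), x + 1, rel_tol=1e-6)
            for x in samples if x != 1.0)
print(hit_exception, agree)
```

Ten thousand draws, zero exceptions hit: that is “almost everywhere” behaving exactly as advertised.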

This is all quite like saying that if you flip a fair coin infinitely many times, it can’t come up tails every single time. It won’t, even though, properly speaking, there’s no reason it couldn’t: the event has probability zero without being strictly impossible. If something is true almost everywhere, then your chance of picking an exception out of all the possibilities is about like your chance of flipping that fair coin and getting tails infinitely many times over.

Augustin-Louis Cauchy’s birthday


The Maths History feed on Twitter mentioned that the 21st of August was the birthday of Augustin-Louis Cauchy, who lived from 1789 to 1857. His is one of those names you get to know very well when you’re a mathematics major, since he published 789 papers in his life, and did very well at publishing important papers, ones that established concepts people would actually use.

He’s got an intriguing biography, as he lived (mostly) in France during the time of the Revolution, the Directorate, Napoleon, the Bourbon Restoration, the July Monarchy, the Revolutions of 1848, the Second Republic, and the Second Empire, and had a career which got inextricably tangled with the political upheavals of the era. I note that, according to the MacTutor biography linked earlier in this paragraph, he followed the deposed King Charles X to Prague in order to tutor his grandson, but might not have had the right temperament for it: at least once he got annoyed at the grandson’s confusion and screamed and yelled, with the Queen, Marie Thérèse, sometimes telling him, “too loud, not so loud”. But we’ve all had students who frustrate us.

Cauchy’s name appears on many theorems and principles and definitions of interesting things — I just checked Mathworld and his name returned 124 different items — though I’ll admit I’m stumped how to describe what the Cauchy-Frobenius Lemma is without scaring readers off. So let me talk about something simpler.

Continue reading “Augustin-Louis Cauchy’s birthday”

Real Experiments with Grading Mathematics


[ On an unrelated note I see someone’s been going through and grading my essays. I thank you, whoever you are; I’ll take any stars I can get. And I’m also delighted to be near to my 9,500th page view; I’ll try to find something neat to do for either 9,999 or 10,000, whichever feels like the better number. ]

As a math major I staggered through a yearlong course in Real Analysis. My impression is this is the reaction most math majors have to it, as it’s the course in which you study why it is that Calculus works, so it’s everything that’s baffling about Calculus only more so. I’d be interested to know what courses math majors consider their most crushingly difficult; I’d think only Abstract Algebra could rival Real Analysis for the position.

While I didn’t fail, I did have to re-take Real Analysis in graduate school, since you can’t go on to many other important courses without mastering it. Remarkably, courses that sound like they should be harder — Complex Analysis, Functional Analysis, and their like — often feel easier. Possibly this is because the most important tricks to studying these fields are all introduced in Real Analysis, so that by the fourth semester the techniques are comfortably familiar. Or Functional Analysis really is easier than Real Analysis.

The second time around went quite well, possibly because a class really is easier the second time around (I don’t have the experience in re-taking classes to compare it to) or possibly because I clicked better with the professor, Dr Harry McLaughlin at Rensselaer Polytechnic Institute. Besides giving what I think might be the best homework assignment I ever received, he also used a grading scheme that I really responded to well, and that I’m sorry I haven’t been able to effectively employ when I’ve taught courses.

His concept — I believe he used it for all his classes, but certainly he put it to use in Real Analysis — came, as I remember it, from his being bored with the routine of grading weekly homeworks and monthly exams and a big final. Instead, students could put together a portfolio, showing their mastery of different parts of the course’s topics. The grade for the course was what he judged your mastery of the subject was, based on the breadth and depth of your portfolio work.

Any slightly different way of running a class is a source of anxiety, and he took some steps to keep it from being too terrifying a departure. The first was that you could turn in a portfolio for review whenever you liked mid-course, and he’d say what he felt was missing or inadequate or needed reworking. I believe his official policy was that you could turn it in as often as you liked for review, though I wonder what he would do with the most grade-grabby students, the ones who wrestle obsessively for every half-point on every assignment, and who might turn in portfolio revisions on an hourly basis. Maybe he had a rule about doing at most one review a week per student or something like that.

The other was that he still gave out homework assignments and offered exams, and if you wanted you could have them graded as in a normal course, letting the traditional course grade stand in for the portfolio grade. So if you were too afraid to try the portfolio scheme you could just pretend the whole thing was one of those odd jokes professors will offer and not worry.

I really liked this system and was sorry I didn’t have the chance to take more courses from him. The course work felt easier, no doubt partly because there was no particular need to do homework at the last minute or cram for an exam, and if you just couldn’t get around to one assignment you didn’t need to fear a specific and immediate grade penalty. Or at least the penalty as you estimated it was something you could make up by thinking about the material and working on a similar breadth of work to the assignments and exams offered.

I regret that I haven’t had the courage to try this system on a course I was teaching, although I have tried a couple of non-traditional grading schemes. I’m always interested in hearing of more, though, in case I do get back into teaching and feel secure enough to try something odd.

What Numbers Equal Zero?


I want to give some examples of showing numbers are equal by showing the difference between them is ε. It’s a fairly abstruse idea but when it works amazing things become possible.

The easy example, although one that produces strong resistance, is showing that the number 1 is equal to the number 0.9999…. But here I have to say what I mean by that second number. It’s obvious to me that I mean a number formed by putting a decimal point up, and then filling in a ‘9’ to every digit past the decimal, repeating forever and ever without end. That’s a description so easy to grasp it looks obvious. I can give a more precise, less intuitively obvious, description, though, which makes it easier to prove what I’m going to be claiming.
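As a sketch of my own, using Python’s exact rational arithmetic so no floating-point rounding muddies the picture: the number with n nines after the decimal point is exactly 1 - 10^{-n} , and the gap to 1 drops below any tolerance ε you care to name once n is large enough.

```python
from fractions import Fraction

def nines(n):
    """The rational number 0.9...9 with n nines after the decimal point."""
    return sum(Fraction(9, 10**k) for k in range(1, n + 1))

# The gap between 1 and the n-nines number is exactly 1/10^n, which
# shrinks below any tolerance eps > 0 once n is large enough.  That is
# the sense in which the infinite decimal 0.999... equals 1.
for n in (1, 3, 6):
    print(n, 1 - nines(n))

eps = Fraction(1, 10**12)   # an arbitrary tiny tolerance
assert 1 - nines(13) < eps  # any n with 10^-n < eps works
```

Since the gap can be made smaller than every positive ε, the difference between 1 and 0.9999… must be zero — which is the claim.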

Continue reading “What Numbers Equal Zero?”

Introducing a Very Small Number


Last time I talked mathematics I introduced the idea of using some little tolerated difference between quantities. This tolerated difference has an immediately obvious and useful real-world interpretation: if we measure two things and they differ by less than that amount, we’d say they’re equal, or close enough to equal for whatever it is we’re doing. And it has great use in the nice exact proofs of some sophisticated mathematical concepts, most of which I think I can get to without introducing equations, which will make everyone happy. Readers like reading things that don’t have equations (folklore has it that every equation, other than E = mc^2, cuts book sales in half, although I don’t remember seeing that long-established folklore before Stephen Hawking claimed it in A Brief History Of Time, which sold a hundred million billion trillion copies). Writers like not putting in equations because web standards have evolved so that there are not only no good ways of putting in equations, but there aren’t even ways that rate as only lousy. But we can make do.

The tolerated difference is usually written as ε, the Greek lower-case e, at least if we are working in calculus or analysis, and it’s typically taken to mean some small number. The use seems to go back to Augustin-Louis Cauchy, who lived from 1789 to 1857, and who paired it with the symbol δ to talk about small quantities. He seems to have meant δ, the Greek lowercase d, to be a small number representing a difference, and ε to be a small number representing an error, and the symbols have been with us ever since.
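In modern notation — the standard textbook formulation, not anything peculiar to Cauchy’s own writing — the pairing survives in the definition of a limit: \lim_{x \to c} f(x) = L means that for every \varepsilon > 0 there is a \delta > 0 such that whenever 0 < \left|x - c\right| < \delta we have \left|f(x) - L\right| < \varepsilon . The δ bounds the difference in the input; the ε bounds the resulting error in the output.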

Cauchy’s an interesting person, although it seems sometimes that every mathematician who lived in France anytime around the Revolution and the era of Napoleon was interesting. He was certainly prolific: the MacTutor biography credits him with 789 published papers, and they covered a wide swath of mathematics: solid geometry, polygonal numbers, waves, inelastic shocks, astronomy, differential equations, matrices, and a powerful tool called the Fourier transform. This is why mathematics majors spend about two years running across all sorts of new things named after Cauchy — the Cauchy-Schwarz inequality, Cauchy sequences, Cauchy convergence, Cauchy-Riemann equations, Cauchy-Kovalevskaya existence, Cauchy integrals, and more — until they almost get interested enough to look up something about who he was. For a while Cauchy was tutor to the grandson of France’s King Charles X, but apparently Cauchy had a tendency to get annoyed and start screaming at the uninterested prince. He has two lunar features (a crater and an escarpment) named for him, indicating, I suppose, that Charles X wasn’t asked for a reference.

Little Enough Differences


It’s as far from my workplace to home as it is from my workplace to my sister-in-law’s home. That’s a fair coincidence, but nobody thinks it’s precisely true. I don’t think it’s exactly true myself, but let me try to make it a little interesting. I’d be surprised if it were the same number of miles from work to either home. I’d be shocked if it were the same number of miles down to the tenth of the mile. To be precisely the same distance, down to the n-th decimal point, would be just impossibly unlikely. But I’d still make the claim, and most people would accept it, and everyone knows what the claim is supposed to mean and why it’s true. What I mean, and what I imagine anyone hearing the claim takes me to mean, is that the difference between these two quantities, the distance from work to home and the distance from work to my sister-in-law’s home, is smaller than some tolerable margin for error.
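This practical notion of equality — the difference falls below a tolerated margin — is exactly what a tolerance comparison in code expresses. Here’s a small sketch using Python’s math.isclose, with mileage figures that are entirely made up for illustration:

```python
import math

# "Equal" in the practical sense: the difference is below a tolerated
# margin of error.  The mileage figures are made up for illustration.
work_to_home = 14.2
work_to_in_laws = 14.6

# Within a two-mile margin the distances count as equal; within a
# tenth-of-a-mile margin they do not.
close_within_2_miles = math.isclose(work_to_home, work_to_in_laws, abs_tol=2.0)
close_within_tenth = math.isclose(work_to_home, work_to_in_laws, abs_tol=0.1)
print(close_within_2_miles, close_within_tenth)  # True False
```

The abs_tol parameter is the tolerated difference; change it and you change which pairs of numbers count as “equal”.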

That’s a good definition of equality between two things in the practical world. It applies mathematically as well. A good number of proofs, particularly the ones that go into proving calculus works, amount to showing that there is some number in which we are interested, and there is some number which we are actually able to calculate, and the difference between those two numbers is less than some tolerated difference. If we’re just looking for an approximate answer, that’s about where we stop. If we want to prove something rigorously and exactly, then we use a slightly different trick.

Instead of proving that the difference is smaller than some fixed tolerated error — say, that the distance to these two homes is the same plus or minus two miles, or that these two cups of soda have the same amount of drink plus or minus a half-ounce, or so — what we do is prove that we can pick an arbitrary small tolerated difference, and find that the difference between the number we want and the number we can calculate must be smaller than that tolerated difference. But that tolerated difference might be any positive number. We weren’t given it up front. If the difference is smaller than any positive number, then we can, at least in imagination, make sure the difference is smaller than every positive number, however tiny. The conclusion, then, is that if the difference between what-we-want and what-we-have is smaller than every positive number, then the difference must be zero. The two quantities have to be equal.
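Spelled out as the little lemma underneath all this: if a and b are numbers with \left|a - b\right| < \epsilon for every positive \epsilon , then a = b . For if they weren’t equal, we could take \epsilon = \left|a - b\right| itself, which is a positive number, and land in the absurd conclusion \left|a - b\right| < \left|a - b\right| .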

That probably read fairly smoothly. It’s worth going over and thinking about closely because, at least in my experience, that’s one of the spots where calculus and analysis gets really confusing. It’s going to deserve some examples.