## A Summer 2017 Mathematics A To Z Appendix: Are Colbert Numbers A Thing?

This is something I didn’t have space for in the proper A To Z sequence. And it’s more a question than exposition anyway. But what the heck: I like excuses to use my nice shiny art package.

I was looking for mathematics topics I might write about if I didn’t get requests for particular letters. ‘C’ came up ‘cohomology’, but what if it hadn’t? I found an interesting-looking listing at MathWorld’s dictionary. The Colbert Numbers sounded interesting. This is a collection of very long prime numbers. Each of them has at least a million decimal digits. They relate to a conjecture by Wacław Sierpiński, who’s gone months without a mention around here.

The conjecture is about whole numbers that are equal to $k \cdot 2^n + 1$ for some whole numbers ‘k’ and ‘n’. Are there choices of ‘k’ for which, no matter what positive whole number ‘n’ you pick, $k \cdot 2^n + 1$ is never a prime number? (‘k’ has to meet some extra conditions.) I’m not going to explain why this is interesting because I don’t know. It’s a number theory question. They’re all strange and interesting questions in their ways. If I were writing an essay about Colbert Numbers I’d have figured that out.

Thing is we believe we know what the smallest possible ‘k’ is. We think that the smallest possible Sierpiński number is 78,557. We don’t have this quite proved, though. There are some numbers that might be prime numbers of the form $k \cdot 2^n + 1$ for some ‘k’ and some ‘n’. There was a set of seventeen possible candidates of numbers smaller than 78,557 that might be Sierpiński numbers. If those candidates could be ruled out then we’d have proved 78,557 was it. That’s easy to imagine. Find for each of them a number ‘n’ so that the candidate times $2^n$ plus one was a prime number. But finding big prime numbers is hard. This turned into a distributed-computing search. This would evaluate these huge numbers and find whether they were prime numbers. (The project, “Seventeen Or Bust”, was destroyed by computer failure in 2016. Attempts to verify the work done, and continue it, are underway.) There are six possible candidates left.
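To make the ruling-out idea concrete, here’s a small sketch. It searches for the first ‘n’ that makes $k \cdot 2^n + 1$ prime for a small ‘k’, and it checks why 78,557 never gets one. The function names are my own. The “covering set” of primes (3, 5, 7, 13, 19, 37, 73) is not from the text above; it’s the standard argument, usually credited to John Selfridge, that every number $78557 \cdot 2^n + 1$ is divisible by at least one of those seven primes.

```python
# Assumption: the covering set below is the well-known one for k = 78557;
# it is not stated in the essay itself.
COVERING_PRIMES = (3, 5, 7, 13, 19, 37, 73)


def is_prime(m):
    """Plain trial division; fine for the small numbers tried here."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True


def smallest_exponent(k, max_n=20):
    """Return the first n >= 1 with k * 2**n + 1 prime, or None if none up to max_n."""
    for n in range(1, max_n + 1):
        if is_prime(k * 2**n + 1):
            return n
    return None


def covered_by_small_prime(k, n, primes=COVERING_PRIMES):
    """True if k * 2**n + 1 is divisible by one of the given primes."""
    return any((k * pow(2, n, p) + 1) % p == 0 for p in primes)


print(smallest_exponent(7))       # 7*2 + 1 = 15 is composite, 7*4 + 1 = 29 is prime
print(smallest_exponent(78557))   # nothing found in this range
print(all(covered_by_small_prime(78557, n) for n in range(1, 500)))
```

The real candidates need exponents in the millions, which is why trial division like this is only a toy and why the actual search had to be a distributed-computing project.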

MathWorld says that the seventeen cases that had to be checked were named Colbert Numbers. This was in honor of Stephen T Colbert, the screamingly brilliant character host of The Colbert Report. (Ask me sometime about the Watership Down anecdote.) It’s a plausible enough claim. Part of Stephen T Colbert’s persona was demanding things be named for him. And he’d take appropriate delight in having minor but interesting things named for him. Treadmills on the space station. Minor-league hockey team mascots. A class of numbers for proving a 60-year-old mathematical conjecture is exactly the sort of thing that would get his name and attention.

But here’s my problem. Who named them Colbert Numbers? MathWorld doesn’t say. Wikipedia doesn’t say. The best I can find with search engines doesn’t say. When were they named Colbert Numbers? Again, no answers. Poking around fan sites for The Colbert Report (where you’d expect the naming of stuff in his honor to be mentioned) doesn’t turn up anything. Does anyone call them Colbert Numbers? I mean outside people who’ve skimmed MathWorld’s glossary for topics with interesting names?

I don’t mean to sound overly skeptical here. But, like, I know there’s a class of science fiction fans who like to explain how telerobotics engineers name their hardware “waldoes”. This is in honor of a character in a Robert Heinlein story I never read either. I’d accepted that without much interest until Google’s US Patent search became a thing. One afternoon I noticed that if telerobotics engineers do call their hardware “waldoes” they never use the term in patent applications. Is it possible that someone might have slipped a joke in to Wikipedia or something and had it taken seriously? Certainly. What amounts to a Wikipedia prank briefly gave the coati — an obscure-to-the-United-States animal that I like — the nickname of “Brazilian aardvark”. There are surely other instances of Wikipedia-generated pranks becoming “real” things.

So I would like to know. Who named Colbert Numbers that, and when, and were they — as seems obvious, but you never know — named for Stephen T Colbert? Or is this an example of Wikiality, the sense that reality can be whatever enough people decide to believe is true, as described by … well, Stephen T Colbert?

## The Summer 2017 Mathematics A To Z: What I Learned

I’ve in the past done essays about what I’ve taken away from an A to Z project. Please indulge me with this.

The big thing I learned from the Summer 2017 A To Z, besides that it would have been a little better started two weeks earlier? (I couldn’t have started it two weeks earlier. July was a frightfully busy month. As it was I was writing much too close to deadline. Starting sooner would have been impossible.)

Category theory, mostly. Many of the topics requested had some category theory component. Next would be tensors and tensor-related subjects. This is exciting and dangerous. Neither’s a field I know well. Both are fields I want to know better. It’s a truism that to really learn an advanced subject you have to teach a course in it. That’s how I picked up what I know about signal processing and about numerical quantum mechanics. Still, it’s perilous, especially when I would realize the subject asked-for wasn’t what I faintly remembered had been asked for, and that I’d been composing an essay for in my head for a week already.

Also, scheduling. The past A To Z sequences were relatively low-stress things for me. I could get as many as six essays ahead of what I needed to post. That’s so comfortable a place to be. This time around, I was working much closer to deadline, with some pieces needing major rewriting as few as fifteen hours before my posting hour. More needed minor editing the day of posting. There’s several causes for this. But the biggest is that I wrote much longer this time. Past A To Z sequences could have at least a couple essays that were a few paragraphs. This time around I don’t think any piece came in at under a thousand words, and the default was getting to be around 1500 words. I don’t think I broke 2,000 words, but I came close.

That’s fine, because the essays came out great. This has been the A To Z sequence I’m proudest of, so far. They’re the ones that make me think my father’s ever-supportive assurance that I could put these into a book that people would give me actual money for can be right. Still, the combination of writing about stuff I had to research more first and writing longer pieces made the workload more than I’d figured on. When I get to doing this again — and I will, when the exhaustion’s faded enough from memory — I will need more lead time between asking for topics and starting to write. And will need to freeze topics farther in advance than I did this time. I still suspect my father’s too supportive to say I could get money for this. But it’s a less unrealistic thought than I had figured before.

Also learned: hire an artist! I got a better-banner-than-I-paid-for from Thomas K Dye for this series. His work added a snappy bit of visual appeal to my sentence heaps. I’d also gotten from him a banner for the Why Stuff Can Orbit sequence, which I mean to resume now that I have some more writing time. But the banners give a needed bit of unity to my writing, and the automatically-generated Twitter announcements of these posts, and that’s helped the look of the place. Something like nine-tenths of the people I know online are visual artists of one kind or another. (The rest are writers, my siblings, and my mother.) I should be making reasons to commission them. For example, if I want to describe something too complicated to do in words alone I should turn it over to them. Remember, I don’t do the few-pictures thing because I’m a good writer. It’s because I’m too lazy to make an illustration myself. A bit of money can be as good as effort.

Speaking of effort, between the A To Z essays and Reading the Comics posts, and a couple miscellaneous other pieces, I wrote five to six thousand words per week for two months. That’s probably not sustainable indefinitely, but a slightly lower pace? And for a specific big project? It’s good to know that’s something I can do, albeit possibly by putting this blog on hold.

Learned to my personal everlasting humiliation: I spelled “Klein Bottle” wrong. Fortunately, I only spelled it “Klien” in the title of the essay, so it sits there in my tweet publicizing the post and in the full-length URL to the post, forever. I’ll recover, I hope.

## The Summer 2017 Mathematics A To Z: What I Talked About

This is just a list of all the topics I covered in the Summer 2017 A To Z.

And if those aren’t enough essays for you, here’s a collection of all the topics from the three previous A To Z sequences that I’ve done. Thank you, and thanks for reading and for challenging me to write.

## The Summer 2017 Mathematics A To Z: Zeta Function

Today Gaurish, of For the love of Mathematics, gives me the last subject for my Summer 2017 A To Z sequence. And also my greatest challenge: the Zeta function. The subject comes to all pop mathematics blogs. It comes to all mathematics blogs. It’s not difficult to say something about a particular zeta function. But to say something at all original? Let’s watch.

# Zeta Function.

The spring semester of my sophomore year I had Intro to Complex Analysis. Monday Wednesday 7:30; a rare evening class, one of the few times I’d eat dinner and then go to a lecture hall. There I discovered something strange and wonderful. Complex Analysis is a far easier topic than Real Analysis. Both are courses about why calculus works. But why calculus for complex-valued numbers works is a much easier problem than why calculus for real-valued numbers works. It’s dazzling. Part of this is that Complex Analysis, yes, builds on Real Analysis. So Complex can take for granted some things that Real has to prove. I didn’t mind. Given the way I crashed through Intro to Real Analysis I was glad for a subject that was, relatively, a breeze.

As we worked through Complex Variables and Applications so many things, so very many things, got to be easy. The basic unit of complex analysis, at least as we young majors learned it, was in contour integrals. These are integrals whose value depends on the values of a function on a closed loop. The loop is in the complex plane. The complex plane is, well, your ordinary plane. But we say the x-coordinate and the y-coordinate are parts of the same complex-valued number. The x-coordinate is the real-valued part. The y-coordinate is the imaginary-valued part. And we call that summation ‘z’. In complex-valued functions ‘z’ serves the role that ‘x’ does in normal mathematics.

So a closed loop is exactly what you think. Take a rubber band and twist it up and drop it on the table. That’s a closed loop. Suppose you want to integrate a function, ‘f(z)’. If you can always take its derivative on this loop and on the interior of that loop, then its contour integral is … zero. No matter what the function is. As long as it’s “analytic”, as the terminology has it. Yeah, we were all stunned into silence too. (Granted, mathematics classes are usually quiet, since it’s hard to get a good discussion going. Plus many of us were in post-dinner digestive lulls.)

Integrating regular old functions of real-valued numbers is this tedious process. There’s sooooo many rules and possibilities and special cases to consider. There’s sooooo many tricks that get you the integrals of some functions. And then here, with complex-valued integrals for analytic functions, you know the answer before you even look at the function.
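If you’d like to see the zero turn up rather than take it on faith, here’s a rough numerical sketch. It assumes the closed loop is a plain circle rather than a crumpled rubber band, and it approximates the integral with an evenly spaced sum around that circle. The function name and its parameters are mine, not anything from the textbook.

```python
import cmath
from math import pi


def contour_integral(f, center=0j, radius=1.0, steps=2000):
    """Approximate the integral of f around a circle, traced counterclockwise."""
    total = 0j
    for j in range(steps):
        theta = 2 * pi * j / steps
        offset = radius * cmath.exp(1j * theta)
        z = center + offset
        # dz = i * (z - center) * dtheta along the circle
        dz = 1j * offset * (2 * pi / steps)
        total += f(z) * dz
    return total


# Two functions that are analytic everywhere: both integrals should be
# zero up to rounding error, whatever circle is chosen.
print(abs(contour_integral(cmath.exp)))
print(abs(contour_integral(lambda z: z**3 + 2 * z, center=1 + 1j, radius=2.5)))
```

Changing the center or the radius shouldn’t matter for functions like these, which is the stunning part.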

As you might imagine, since this is only page 113 of a 341-page book there’s more to it. Most functions that anyone cares about aren’t analytic. At least they’re not analytic everywhere inside regions that might be interesting. There’s usually some points where an interesting function ‘f(z)’ is undefined. We call these “singularities”. Yes, like starships are always running into. Only we rarely get propelled into other universes or other times or turned into ghosts or stuff like that.

So much of the rest of the course turns into ways to avoid singularities. Sometimes you can spackle them over. This is when the function happens not to be defined somewhere, but you can see what it ought to be. Sometimes you have to do something more. This turns into a search for “removable” singularities. And this does something so brilliant it looks illicit. You modify your closed loop, so that it comes up very close, as close as possible, to the singularity, but studiously avoids it. Follow this game of I’m-not-touching-you right and you can turn your integral into two parts. One is the part that’s equal to zero. The other is the part that’s a constant times whatever the function is at the singularity you’re removing. And that ought to be easy to find the value for. (Being able to find a function’s value doesn’t mean you can find its derivative.)
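The “constant times whatever the function is at the singularity” result reads to me like Cauchy’s integral formula, so that’s what this sketch assumes: for an analytic ‘g’, the integral of $\frac{g(z)}{z - a}$ around a loop enclosing ‘a’ is $2\pi\imath \cdot g(a)$. The loop is again a circle, and the names are my own.

```python
import cmath
from math import pi


def contour_integral(f, center=0j, radius=1.0, steps=2000):
    """Approximate the integral of f around a circle, traced counterclockwise."""
    total = 0j
    for j in range(steps):
        theta = 2 * pi * j / steps
        offset = radius * cmath.exp(1j * theta)
        dz = 1j * offset * (2 * pi / steps)
        total += f(center + offset) * dz
    return total


a = 0.3 + 0.2j   # an arbitrary point inside the unit circle


def integrand(z):
    # exp(z) is analytic everywhere; dividing by (z - a) adds one singularity at a
    return cmath.exp(z) / (z - a)


numeric = contour_integral(integrand)
predicted = 2j * pi * cmath.exp(a)
print(numeric, predicted)
```

The two printed numbers should agree to many decimal places, even though the integrand blows up at ‘a’.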

Those tricks were hard to master. Not because they were hard. Because they were easy, in a context where we expected hard. But after that we got into how to move singularities. That is, how to do a change of variables that moved the singularities to where they’re more convenient for some reason. How could this be more convenient? Because of chapter five, “Series”. In regular old calculus we learn how to approximate well-behaved functions with polynomials. In complex-variable calculus, we learn the same thing all over again. They’re polynomials of complex-valued variables, but it’s the same sort of thing. And not just polynomials, but things that look like polynomials except they’re powers of $\frac{1}{z}$ instead. These open up new ways to approximate functions, and to remove singularities from functions.

And then we get into transformations. These are about turning a problem that’s hard into one that’s easy. Or at least different. They’re a change of variable, yes. But they also change what exactly the function is. This reshuffles the problem. Makes for a change in singularities. Could make ones that are easier to work with.

One of the useful, and so common, transforms is called the Laplace-Stieltjes Transform. (“Laplace” is said like you might guess. “Stieltjes” is said, or at least we were taught to say it, like “Stilton cheese” without the “ton”.) And it tends to create functions that look like a series, the sum of a bunch of terms. Infinitely many terms. Each of those terms looks like a number times another number raised to some constant times ‘z’. As the course came to its conclusion, we were all prepared to think about these infinite series. Where singularities might be. Which of them might be removable.

These functions, these results of the Laplace-Stieltjes Transform, we collectively call ‘zeta functions’. There are infinitely many of them. Some of them are relatively tame. Some of them are exotic. One of them is world-famous. Professor Walsh — I don’t mean to name-drop, but I discovered the syllabus for the course tucked in the back of my textbook and I’m delighted to rediscover it — talked about it.

That world-famous one is, of course, the Riemann Zeta function. Yes, that same Riemann who keeps turning up, over and over again. It looks simple enough. Almost tame. Take the counting numbers, 1, 2, 3, and so on. Take your ‘z’. Raise each of the counting numbers to that ‘z’. Take the reciprocals of all those numbers. Add them up. What do you get?

A mass of fascinating results, for one. Functions you wouldn’t expect are concealed in there. There’s strips where the real part is zero. There’s strips where the imaginary part is zero. There’s points where both the real and imaginary parts are zero. We know infinitely many of them. If ‘z’ is -2, for example, the sum is zero. Also if ‘z’ is -4. -6. -8. And so on. These are easy to show, and so are dubbed ‘trivial’ zeroes. To say some are ‘trivial’ is to say that there are others that are not trivial. Where are they?

Professor Walsh explained. We know of many of them. The nontrivial zeroes we know of all share something in common. They have a real part that’s equal to 1/2. There’s a zero that’s at about the number $\frac{1}{2} - \imath 14.13$. Also at $\frac{1}{2} + \imath 14.13$. There’s one at about $\frac{1}{2} - \imath 21.02$. Also about $\frac{1}{2} + \imath 21.02$. (There’s a symmetry, you maybe guessed.) Every nontrivial zero we’ve found has that same real part, 1/2. But we don’t know that they all do. Nobody does. It is the Riemann Hypothesis, the great unsolved problem of mathematics. Much more important than that Fermat’s Last Theorem, which back then was still merely a conjecture.
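You can poke at those first two zeroes numerically. One catch: the plain add-up-the-reciprocals sum only converges when the real part of ‘z’ is bigger than 1, so it’s useless on the line where the real part is 1/2. The sketch below swaps in a different technique, an accelerated alternating series due to Peter Borwein, which is nothing we did in class. The function name, the `terms` parameter, and the more precise zero locations (14.134725 and 21.022040) are my additions, and I only trust this sketch for real parts of 1/2 or more.

```python
from fractions import Fraction
from math import factorial, pi


def zeta(s, terms=50):
    """Approximate the Riemann zeta function for Re(s) >= 1/2, s != 1."""
    # Borwein's coefficients d_0 .. d_n, kept exact with fractions.
    d = []
    running = Fraction(0)
    for i in range(terms + 1):
        running += Fraction(factorial(terms + i - 1) * 4**i,
                            factorial(terms - i) * factorial(2 * i))
        d.append(terms * running)

    total = 0j
    for k in range(terms):
        weight = float((d[k] - d[terms]) / d[terms])
        total += (-1) ** k * weight / (k + 1) ** s
    return -total / (1 - 2 ** (1 - s))


print(abs(zeta(2) - pi**2 / 6))          # sanity check against a known value
print(abs(zeta(0.5 + 14.134725j)))       # near the first nontrivial zero
print(abs(zeta(0.5 + 21.022040j)))       # near the second
print(abs(zeta(0.5 + 17j)))              # an in-between point, for contrast
```

The values near the zeroes should come out tiny. That shows where two zeroes are; it says nothing about where zeroes aren’t, which is the hard part.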

What a prospect! What a promise! What a way to set us up for the final exam in a couple of weeks.

I had an inspiration, a kind of scheme of showing that a nontrivial zero couldn’t be within a given circular contour. Make the size of this circle grow. Move its center farther away from the z-coordinate $\frac{1}{2} + \imath 0$ to match. Show there’s still no nontrivial zeroes inside. And therefore, logically, since I would have shown nontrivial zeroes couldn’t be anywhere but on this special line, and we know nontrivial zeroes exist … I leapt enthusiastically into this project. A little less enthusiastically the next day. Less so the day after. And on. After maybe a week I went a day without working on it. But came back, now and then, prodding at my brilliant would-be proof.

The Riemann Zeta function was not on the final exam, which I’ve discovered was also tucked into the back of my textbook. It asked more things like finding all the singular points and classifying what kinds of singularities they were for functions like $e^{-\frac{1}{z}}$ instead. If the syllabus is accurate, we got as far as page 218. And I’m surprised to see the professor put his e-mail address on the syllabus. It was merely “bwalsh@math”, but understand, the Internet was a smaller place back then.

I finished the course with an A-, but without answering any of the great unsolved problems of mathematics.

## The Summer 2017 Mathematics A To Z: Young Tableau

I never heard of today’s entry topic three months ago. Indeed, three weeks ago I was still making guesses about just what Gaurish, author of For the love of Mathematics, was asking about. It turns out to be maybe the grand union of everything that’s ever been in one of my A To Z sequences. I overstate, but barely.

# Young Tableau.

The specific thing that a Young Tableau is is beautiful in its simplicity. It could almost be a recreational mathematics puzzle, except that it isn’t challenging enough.

Start with a couple of boxes laid in a row. As many or as few as you like.

Now set another row of boxes. You can have as many as the first row did, or fewer. You just can’t have more. Set the second row of boxes — well, your choice. Either below the first row, or else above. I’m going to assume you’re going below the first row, and will write my directions accordingly. If you do things the other way you’re following a common enough convention. I’m leaving it on you to figure out what the directions should be, though.

Now add in a third row of boxes, if you like. Again, as many or as few boxes as you like. There can’t be more than there are in the second row. Set it below the second row.

And a fourth row, if you want four rows. Again, no more boxes in it than the third row had. Keep this up until you’ve got tired of adding rows of boxes.

How many boxes do you have? I don’t know. But take the numbers 1, 2, 3, 4, 5, and so on, up to whatever the count of your boxes is. Can you fill in one number for each box? So that the numbers are always increasing as you go left to right in a single row? And as you go top to bottom in a single column? Yes, of course. Go in order: ‘1’ for the first box you laid down, then ‘2’, then ‘3’, and so on, increasing up to the last box in the last row.

Can you do it in another way? Any other order?

Except for the simplest of arrangements, like a single row of four boxes or three rows of one box atop another, the answer is yes. There can be many of them, turns out. Seven boxes, arranged three in the first row, two in the second, one in the third, and one in the fourth, have 35 possible arrangements. It doesn’t take a very big diagram to get an enormous number of possibilities. Could be fun drawing an arbitrary stack of boxes and working out how many arrangements there are, if you have some time in a dull meeting to pass.
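If the meeting is too dull even for drawing, the count can be computed. This sketch uses the hook length formula, a standard result I haven’t otherwise mentioned: for each box, count the boxes to its right in the same row, the boxes below it in the same column, and the box itself; multiply all those counts together; divide the factorial of the number of boxes by that product. The function name and the way I describe a diagram (a list of row lengths, top row first) are my own choices.

```python
from math import factorial


def count_fillings(shape):
    """Count valid fillings of a diagram given as row lengths, longest row first."""
    n = sum(shape)
    product = 1
    for r, row_len in enumerate(shape):
        for c in range(row_len):
            arm = row_len - c - 1                                  # boxes to the right
            leg = sum(1 for lower in shape[r + 1:] if lower > c)   # boxes below
            product *= arm + leg + 1                               # plus the box itself
    return factorial(n) // product


print(count_fillings([3, 2, 1, 1]))   # the seven-box example above: 35
print(count_fillings([4]))            # a single row of four boxes: 1
print(count_fillings([1, 1, 1]))      # three rows of one box each: 1
```

Try something like `[5, 4, 3, 2, 1]` to see how fast the numbers get enormous.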

Let me step away from filling boxes. In one of its later, disappointing, seasons Futurama finally did a body-swap episode. The gimmick: two bodies could only swap the brains within them one time. So would it be possible to put Bender’s brain back in his original body, if he and Amy (or whoever) had already swapped once? The episode drew minor amusement in mathematics circles, and a lot of amazement in pop-culture circles. The writer, a mathematics major, found a proof that showed it was indeed always possible, even after many pairs of people had swapped bodies. The idea that a theorem was created for a TV show impressed many people who think theorems are rarer and harder to create than they necessarily are.

It was a legitimate theorem, and in a well-developed field of mathematics. It’s about permutation groups. These are the study of the ways you can swap pairs of things. I grant this doesn’t sound like much of a field. There is a surprising lot of interesting things to learn just from studying how stuff can be swapped, though. It’s even of real-world relevance. Most subatomic particles of a kind — electrons, top quarks, gluons, whatever — are identical to every other particle of the same kind. Physics wouldn’t work if they weren’t. What would happen if we swap the electron on the left for the electron on the right, and vice-versa? How would that change our physics?

A chunk of quantum mechanics studies what kinds of swaps of particles would produce an observable change, and what kind of swaps wouldn’t. When the swap doesn’t make a change we can describe this as a symmetric operation. When the swap does make a change, that’s an antisymmetric operation. And — the Young Tableau that’s a single row of two boxes? That matches up well with this symmetric operation. The Young Tableau that’s two rows of a single box each? That matches up with the antisymmetric operation.

How many ways could you set up three boxes, according to the rules of the game? A single row of three boxes, sure. One row of two boxes and a row of one box. Three rows of one box each. How many ways are there to assign the numbers 1, 2, and 3 to those boxes, and satisfy the rules? One way to do the single row of three boxes. Also one way to do the three rows of a single box. There’s two ways to do the one-row-of-two-boxes, one-row-of-one-box case.
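For diagrams this small you can check the counts by brute force: try every ordering of the numbers, pour it into the boxes row by row, and keep the ones that obey the rules. This is only a sketch, with names of my own invention, and it gets slow fast since it tries every permutation.

```python
from itertools import permutations


def valid_fillings(shape):
    """List every filling of the diagram that increases along rows and down columns."""
    n = sum(shape)
    results = []
    for perm in permutations(range(1, n + 1)):
        # Cut the permutation into rows of the requested lengths.
        rows, start = [], 0
        for length in shape:
            rows.append(perm[start:start + length])
            start += length
        rows_ok = all(row[i] < row[i + 1]
                      for row in rows for i in range(len(row) - 1))
        cols_ok = all(rows[r][c] < rows[r + 1][c]
                      for r in range(len(rows) - 1)
                      for c in range(len(rows[r + 1])))
        if rows_ok and cols_ok:
            results.append(rows)
    return results


for shape in [(3,), (2, 1), (1, 1, 1)]:
    print(shape, valid_fillings(shape))
```

The two fillings for the (2, 1) shape are the one with 1 and 2 in the top row and the one with 1 and 3 in the top row.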

What if we have three particles? How could they interact? Well, all three could be symmetric with each other. This matches the first case, the single row of three boxes. All three could be antisymmetric with each other. This matches the three rows of one box. Or you could have two particles that are symmetric with each other and antisymmetric with the third particle. Or two particles that are antisymmetric with each other but symmetric with the third particle. Two ways to do that. Two ways to fill in the one-row-of-two-boxes, one-row-of-one-box case.

This isn’t merely a neat, aesthetically interesting coincidence. I wouldn’t spend so much time on it if it were. There’s a matching here that’s built on something meaningful. The different ways to arrange numbers in a set of boxes like this pair up with a select, interesting set of matrices whose elements are complex-valued numbers. You might wonder who introduced complex-valued numbers, let alone matrices of them, into evidence. Well, who cares? We’ve got them. They do a lot of work for us. So much work they have a common name, the “symmetric group over the complex numbers”. As my leading example suggests, they’re all over the place in quantum mechanics. They’re good to have around in regular physics too, at least in the right neighborhoods.

These Young Tableaus turn up over and over in group theory. They match up with polynomials, because yeah, everything is polynomials. But they turn out to describe polynomial representations of some of the superstar groups out there. Groups with names like the General Linear Group (square matrices), or the Special Linear Group (square matrices with determinant equal to 1), or the Special Unitary Group (that thing where quantum mechanics says there have to be particles whose names are obscure Greek letters with superscripts of up to five + marks). If you’d care for more, here’s a chapter by Dr Frank Porter describing, in part, how you get from Young Tableaus to the obscure baryons.

Porter’s chapter also lets me tie this back to tensors. Tensors have varied ranks, the number of different indices you can have on the things. What happens when you swap pairs of indices in a tensor? How many ways can you swap them, and what does that do to what the tensor describes? Please tell me you already suspect this is going to match something in Young Tableaus. They do this by way of the symmetries and permutations mentioned above. But they are there.

As I say, three months ago I had no idea these things existed. If I ever ran across them it was from seeing the name at MathWorld’s list of terms that start with ‘Y’. The article shows some nice examples (with each row atop the previous one) but doesn’t make clear how much stuff this subject runs through. I can’t fit everything in to a reasonable essay. (For example: the number of ways to arrange, say, 20 boxes into rows meeting these rules is itself a partition problem. Partition problems are probability and statistical mechanics. Statistical mechanics is the flow of heat, and the movement of the stars in a galaxy, and the chemistry of life.) I am delighted by what does fit.
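That 20-box count is quick to work out. A sketch, assuming “arranging boxes into rows” means splitting 20 into row lengths that never increase, which is the same as counting the ways to write 20 as a sum of positive whole numbers when order doesn’t matter. The function name is mine.

```python
def partition_count(n):
    """Count the ways to write n as a sum of positive integers, ignoring order."""
    ways = [1] + [0] * n          # ways[t] = partitions of t using the parts allowed so far
    for part in range(1, n + 1):  # allow parts of size 1, then 2, then 3, ...
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


print(partition_count(3))    # 3: the three shapes with three boxes
print(partition_count(20))   # 627 different diagrams with twenty boxes
```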

## The Summer 2017 Mathematics A To Z: X

We come now almost to the end of the Summer 2017 A To Z. Possibly also the end of all these A To Z sequences. Gaurish, of For the love of Mathematics, proposed that I talk about the obvious logical choice. The last promising thing I hadn’t talked about. I have no idea what to do for future A To Z’s, if they’re even possible anymore. But that’s a problem for some later time.

# X.

Some good advice that I don’t always take. When starting a new problem, make a list of all the things that seem likely to be relevant. Problems that are worth doing are usually about things. They’ll be quantities like the radius or volume of some interesting surface. The amount of a quantity under consideration. The speed at which something is moving. The rate at which that speed is changing. The length something has to travel. The number of nodes something must go across. Whatever. This all sounds like stuff from story problems. But most interesting mathematics is from a story problem; we want to know what this property is like. Even if we stick to a purely mathematical problem, there’s usually a couple of things that we’re interested in and that we describe. If we’re attacking the four-color map theorem, we have the number of territories to color. We have, for each territory, the number of territories that touch it.

Next, select a name for each of these quantities. Write it down, in the table, next to the term. The volume of the tank is ‘V’. The radius of the tank is ‘r’. The height of the tank is ‘h’. The fluid is flowing in at a rate ‘r’. The fluid is flowing out at a rate, oh, let’s say ‘s’. And so on. You might take a moment to go through and think out which of these variables are connected to which other ones, and how. Volume, for example, is surely something to do with the radius times something to do with the height. It’s nice to have that stuff written down. You may not know the thing you set out to solve, but you at least know you’ve got this under control.

I recommend this. It’s a good way to organize your thoughts. It establishes what things you expect you could know, or could want to know, about the problem. It gives you some hint how these things relate to each other. It sets you up to think about what kinds of relationships you figure to study when you solve the problem. It gives you a lifeline, when you’re lost in a sea of calculation. It’s reassurance that these symbols do mean something. Better, it shows what those things are.

I don’t always do it. I have my excuses. If I’m doing a problem that’s very like one I’ve already recently done, the things affecting it are probably the same. The names to give these variables are probably going to be about the same. Maybe I’ll make a quick sketch to show how the parts of the problem relate. If it seems like less work to recreate my thoughts than to write them down, I skip writing them down. Not always good practice. I tell myself I can always go back and do things the fully right way if I do get lost. So far that’s been true.

So, the names. Suppose I am interested in, say, the length of the longest rod that will fit around this hallway corridor. Then I am in a freshman calculus book, yes. Fine. Suppose I am interested in whether this pinball machine can be angled up the flight of stairs that has a turn in it. Then I will measure things like the width of the pinball machine. And the width of the stairs, and of the landing. I will measure this carefully. Pinball machines are heavy and there are many hilarious sad stories of people wedging them into hallways and stairwells four and a half stories up from the street. But: once I have identified, say, ‘width of pinball machine’ as a quantity of interest, why would I ever refer to it as anything but?

This is no dumb question. It is always dangerous to lose the link between the thing we calculate and the thing we are interested in. Without that link we are less able to notice mistakes in either our calculations or the thing we mean to calculate. Without that link we can’t do a sanity check, that reassurance that it’s not plausible we just might fit something 96 feet long around the corner. Or that we estimated that we could fit something of six square feet around the corner. It is common advice in programming computers to always give variables meaningful names. Don’t write ‘T’ when ‘Total’ or, better, ‘Total_Value_Of_Purchase’ is available. Why do we disregard this in mathematics, and switch to ‘T’ instead?

First reason is, well, try writing this stuff out. Your hand (h) will fall off (foff) in about fifteen minutes, twenty seconds. (15′ 20”). If you’re writing a program, the programming environment you have will auto-complete the variable after one or two letters in. Or you can copy and paste the whole name. It’s still good practice to leave a comment about what the variable should represent, if the name leaves any reasonable ambiguity.

Another reason is that sure, we do specific problems for specific cases. But a mathematician is naturally drawn to thinking of general problems, in abstract cases. We see something in common between the problem “a length and a quarter of the length is fifteen feet; what is the length?” and the problem “a volume plus a quarter of the volume is fifteen gallons; what is the volume?”. That one is about lengths and the other about volumes doesn’t concern us. We see a saving in effort by separating the quantity of a thing from the kind of the thing. This restores danger. We must think, after we are done calculating, about whether the answer could make sense. But we can minimize that, we hope. At the least we can check once we’re done to see if our answer makes sense. Maybe even whether it’s right.
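Here’s that shared problem with the kind of thing stripped off. I’m calling the unknown quantity ‘x’, which is my choice of letter and not part of either story problem:

$$x + \frac{1}{4} x = 15 \quad\Longrightarrow\quad \frac{5}{4} x = 15 \quad\Longrightarrow\quad x = 12$$

So it’s twelve feet for the one problem and twelve gallons for the other. The sanity check is to put the kind back on: twelve feet plus three feet is fifteen feet.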

For centuries, as the things we now recognize as algebra developed, we would use words. We would talk about the “thing” or the “quantity” or “it”. Some impersonal name, or convenient pronoun. This would often get shortened because anything you write often you write shorter. “Re”, perhaps. In the late 16th century we start to see the “New Algebra”. Here mathematics starts looking like … you know … mathematics. We start to see stuff like “addition” represented with the + symbol instead of an abbreviation for “addition” or a p with a squiggle over it or some other shorthand. We get equals signs. You start to see decimals and exponents. And we start to see letters used in place of numbers whose value we don’t know.

There are a couple kinds of “numbers whose value we don’t know”. One is the number whose value we don’t know, but hope to learn. This is the classic variable we want to solve for. Another kind is the number whose value we don’t know because we don’t care. I mean, it has some value, and presumably it doesn’t change over the course of our problem. But it’s not like our work will be so different if, say, the tank is two feet high rather than four.

Is there a problem? If we pick our letters to fit a specific problem, no. Presumably all the things we want to describe have some clear name, and some letter that best represents the name. It’s annoying when we have to consider, say, the pinball machine width and the corridor width. But we can work something out.

Is $m b \cos(e) + b^2 \log(y) = \sqrt{e}$ an easy problem to solve?

If we want to figure what ‘m’ is, yes. Similarly ‘y’. If we want to know what ‘b’ is, it’s tedious, but we can do that. If we want to know what ‘e’ is? Run and hide, that stuff is crazy. If you have to, do it numerically and accept an estimate. Don’t try figuring what that is.
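To make the “easy, tedious, run and hide” ranking concrete, here is a sketch of the rearrangements. It assumes ‘log’ means the natural logarithm and that nothing we divide by is zero; neither assumption is stated in the equation itself.

$$m = \frac{\sqrt{e} - b^2 \log(y)}{b \cos(e)}, \qquad y = \exp\left(\frac{\sqrt{e} - m b \cos(e)}{b^2}\right)$$

For ‘b’ the equation is a quadratic, $\log(y)\, b^2 + m \cos(e)\, b - \sqrt{e} = 0$, so the quadratic formula does the tedious part:

$$b = \frac{-m \cos(e) \pm \sqrt{m^2 \cos^2(e) + 4 \log(y) \sqrt{e}}}{2 \log(y)}$$

For ‘e’, which sits inside both a cosine and a square root, there is no comparable rearrangement to write down.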

And so we’ve developed conventions. There are some letters that, except in weird circumstances, are coefficients. They’re numbers whose value we don’t know, but either don’t care about or could look up. And there are some that, by default, are variables. They’re the ones whose value we want to know.

These conventions started forming, as mentioned, in the late 16th century. François Viète here made a name that lasts to mathematics historians at least. His texts described how to do algebra problems in the sort of procedural methods that we would recognize as algebra today. And he had a great idea for these letters. Use the whole alphabet, if needed. Use the consonants to represent the coefficients, the numbers we know but don’t care what they are. Use the vowels to represent the variables, whose values we want to learn. So he would look at that equation and see right away: it’s a terrible mess. (I exaggerate. He doesn’t seem to have known the = sign, and I don’t know offhand when ‘log’ and ‘cos’ became common. But suppose the rest of the equation were translated into his terminology.)

It’s not a bad approach. Besides the mnemonic value of consonant-coefficient, vowel-variable, it’s true that we usually have fewer variables than anything else. The more variables in a problem the harder it is. If someone expects you to solve an equation with ten variables in it, you’re excused for refusing. So five or maybe six or possibly seven choices for variables is plenty.

But it’s not what we settled on. René Descartes had a better idea. He had a lot of them, but here’s one. Use the letters at the end of the alphabet for the unknowns. Use the letters at the start of the alphabet for coefficients. And that is, roughly, what we’ve settled on. In my example nightmare equation, we’d suppose ‘y’ to probably be the variable we want to solve for.

And so, and finally, x. It is almost the variable. It says “mathematics” in only two strokes. Even π takes more writing. Descartes used it. We follow him. It’s way off at the end of the alphabet. It starts few words, very few things, almost nothing we would want to measure. (Xylem … mass? Flow? What thing is the xylem anyway?) Even mathematical dictionaries don’t have much to say about it. The letter transports almost no connotations, no messy specific problems to it. If it suggests anything, it suggests the horizontal coordinate in a Cartesian system. It almost is mathematics. It signifies nothing in itself, but long use has given it an identity as the thing we hope to learn by study.

And pirate treasure maps. I don’t know when ‘X’ became the symbol of where to look for buried treasure. My casual reading suggests “never”. Treasure maps don’t really exist. Maps in general don’t work that way. Or at least didn’t before cartoons. X marking the spot seems to be the work of Robert Louis Stevenson, renowned for creating a fanciful map and then putting together a book to justify publishing it. (I jest. But according to Simon Garfield’s On The Map: A Mind-Expanding Exploration of the Way The World Looks, his map did get lost on the way to the publisher, and he had to re-create it from studying the text of Treasure Island. This delights me to no end.) It makes me wonder if Stevenson was thinking of x’s service in mathematics. But the advantages of x as a symbol are hard to ignore. It highlights a point clearly. It’s fast to write. Its use might be coincidence.

But it is a letter that does a needed job really well.

## The Summer 2017 Mathematics A To Z: Well-Ordering Principle

It’s the last full week of the Summer 2017 A To Z! Four more essays and I’ll have completed this project and curl up into a word coma. But I’m not there yet. Today’s request is another from Gaurish, who’s given me another delightful topic to write about. Gaurish hosts a fine blog, For the love of Mathematics, which I hope you’ve given a try.

# Well-Ordering Principle.

An old mathematics joke. Or paradox, if you prefer. What is the smallest whole number with no interesting properties?

Not one. That’s for sure. We could talk about one forever. It’s the first number we ever know. It’s the multiplicative identity. It divides into everything. It exists outside the realm of prime or composite numbers. It’s — all right, we don’t need to talk about one forever. Two? The smallest prime number. The smallest even number. The only even prime. The only — yeah, let’s move on. Three; the smallest odd prime number. Triangular number. One of only two prime numbers that aren’t one more or one less than a multiple of six. Let’s move on. Four. A square number. The smallest whole number that isn’t 1 or a prime. Five. Prime number. First sum of two different prime numbers. Part of the first prime pair. Six. Smallest perfect number. Smallest product of two different prime numbers. Let’s move on.

And so on. Somewhere around 22 or so, the imagination fails and we can’t think of anything not-boring about this number. So we’ve found the first number that hasn’t got any interesting properties! … Except that being the smallest boring number must be interesting. So we have to note that this is otherwise the smallest boring number except for that bit where it’s interesting. On to 23, which used to be the default funny number. 24. … Oh, carry on. Maybe around 31 things settle down again. Our first boring number! Except that, again, being the smallest boring number is interesting. We move on to 32, 33, 34. When we find one that couldn’t be interesting, we find that’s interesting. We’re left to conclude there is no such thing as a boring number.

This would be a nice thing to say for numbers that otherwise get no attention, if we pretend they can have hurt feelings. But we do have to admit, 1729 is actually only interesting because it’s a part of the legend of Srinivasa Ramanujan. Enjoy the silliness for a few paragraphs more.

(This is, if I’m not mistaken, a form of the heap paradox. Don’t remember that? Start with a heap of sand. Remove one grain; you’ve still got a heap of sand. Remove one grain again. Still a heap of sand. Remove another grain. Still a heap of sand. And yet if you did this enough you’d leave one or two grains, not a heap of sand. Where does that change?)

Another problem, something you might consider right after learning about fractions. What’s the smallest positive number? Not one-half, since one-third is smaller and still positive. Not one-third, since one-fourth is smaller and still positive. Not one-fourth, since one-fifth is smaller and still positive. Pick any number you like and there’s something smaller and still positive. This is a difference between the positive integers and the positive real numbers. (Or the positive rational numbers, if you prefer.) The thing positive integers have is obvious, but it is not a given.

The difference is that the positive integers are well-ordered, while the positive real numbers aren’t. Well-ordering we build on ordering. Ordering is exactly what you imagine it to be. Suppose you can say, for any two things in a set, which one is less than another. A set is well-ordered if whenever you have a non-empty subset you can pick out the smallest element. Smallest means exactly what you think, too.

The positive integers are well-ordered. And more. The way they’re set up, they have a property called the “well-ordering principle”. This means any non-empty set of positive integers has a smallest number in it.

This is one of those principles that seems so obvious and so basic that it can’t teach anything interesting. That it serves a role in some proofs, sure, that’s easy to imagine. But something important?

Look back to the joke/paradox I started with. It proves that every positive integer has to be interesting. Every number, including the ones we use every day. Including the ones that no one has ever used in any mathematics or physics or economics paper, and never will. We can avoid that paradox by attacking the vagueness of “interesting” as a word. Are you interested to know the 137th number you can write as the sum of cubes in two different ways? Before you say ‘yes’, consider whether you could name it ten days after you’ve heard the number.

(Granted, yes, it would be nice to know the 137th such number. But would you ever remember it? Would you trust that it’ll be on some Wikipedia page that somehow is never threatened with deletion for not being noteworthy? Be honest.)

But suppose we have some property that isn’t so mushy. Suppose that we can describe it in some way that’s indexed by the positive integers. Furthermore, suppose that we show that in any set of the positive integers it must be true for the smallest number in that set. What do we know?

— We know that it must be true for all the positive integers. There’s a smallest positive integer. The positive integers have this well-ordering principle. So any non-empty subset of the positive integers has some smallest member. And if we can show that something or other is always true for the smallest number in a subset of the positive integers, there you go.

This technique we call, when it’s introduced, induction. It’s usually a baffling subject because it’s usually taught like this: suppose the thing you want to show is indexed to the positive integers. Show that it’s true when the index is ‘1’. Show that if the thing is true for an arbitrary index ‘n’, then you know it’s true for ‘n + 1’. It’s baffling because that second part is hard to visualize. The student makes a lot of mistakes in learning, on examples of what the sum of the first ‘N’ whole numbers or their squares or cubes are. I don’t think induction is ever taught in this well-ordering principle method. But it does get used in proofs, once you get to the part of analysis where you don’t have to interact with actual specific numbers much anymore.

The well-ordering principle also gives us the method of infinite descent. You encountered this in learning proofs about, like, how the square root of two must be an irrational number. In this, you show that if something is true for some positive integer, then it must also be true for some other, smaller positive integer. And therefore some other, smaller positive integer again. And again, until you get into numbers small enough you can check by hand.
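Here is a sketch of the square-root-of-two descent, written out. The letters ‘a’, ‘b’, and ‘c’ are just labels I’m picking for the illustration. Suppose there were positive integers with

$$\sqrt{2} = \frac{a}{b} \quad\Longrightarrow\quad a^2 = 2 b^2$$

Then $a^2$ is even, so ‘a’ is even. Write $a = 2c$ and substitute:

$$4 c^2 = 2 b^2 \quad\Longrightarrow\quad b^2 = 2 c^2 \quad\Longrightarrow\quad \sqrt{2} = \frac{b}{c}, \qquad c < b$$

So any denominator ‘b’ that works hands us a smaller denominator ‘c’ that also works. The set of working denominators would then be a set of positive integers with no smallest member. The well-ordering principle says the only such set is the empty one, so no fraction equals the square root of two.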

It keeps creeping in. The Fundamental Theorem of Arithmetic says that every positive whole number larger than one is a product of a unique string of prime numbers. (Well, the order of the primes doesn’t matter. 2 times 3 times 5 is the same number as 3 times 2 times 5, and so on.) The well-ordering principle guarantees you can factor numbers into a product of primes. Watch this slick argument.

Suppose you have a set of whole numbers, each bigger than one, that aren’t products of prime numbers. Suppose that set isn’t empty. There must, by the well-ordering principle, be some smallest number in that set. Call that number ‘n’. We know that ‘n’ can’t be prime, because if it were, then that would be its prime factorization. So it must be the product of at least two other numbers. Let’s suppose it’s two numbers. Call them ‘a’ and ‘b’. So, ‘n’ is equal to ‘a’ times ‘b’.

Well, ‘a’ and ‘b’ have to be less than ‘n’. So they’re smaller than the smallest number that isn’t a product of primes. So, ‘a’ is the product of some set of primes. And ‘b’ is the product of some set of primes. And so, ‘n’ has to equal the primes that factor ‘a’ times the primes that factor ‘b’. … Which is the prime factorization of ‘n’. So, ‘n’ can’t be in the set of numbers that don’t have prime factorizations. And so there can’t be any numbers that don’t have prime factorizations. It’s for the same reason we worked out there aren’t any numbers with nothing interesting to say about them.
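The argument above is close to being a recipe, and it can be sketched as a short recursive routine. This is only an illustration; the function name `prime_factors` is my own invention. If ‘n’ has no divisor besides one and itself, it is its own factorization. Otherwise it splits as ‘a’ times ‘b’, both smaller than ‘n’, and we factor those.

```python
from math import isqrt

def prime_factors(n):
    """Return a list of primes whose product is n, for a whole number n >= 2."""
    if n < 2:
        raise ValueError("n must be a whole number bigger than one")
    # Look for a way to write n as a times b with both factors smaller than n.
    for a in range(2, isqrt(n) + 1):
        if n % a == 0:
            b = n // a
            # Both a and b are smaller than n, so the recursion must bottom out.
            return prime_factors(a) + prime_factors(b)
    # No such split exists: n is prime, and is its own factorization.
    return [n]

print(prime_factors(360))
```

The recursion has to stop for the same reason the proof works: every call is on a smaller positive integer, and you can’t keep getting smaller forever.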

And isn’t it delightful to find so simple a principle can prove such specific things?

## The Summer 2017 Mathematics A To Z: Volume Forms

I’ve been reading Elke Stangl’s Elkemental Force blog for years now. Sometimes I even feel social-media-caught-up enough to comment, or at least to like posts. This is relevant today as I discuss one of Stangl’s suggestions for my letter-V topic.

# Volume Forms.

So sometime in pre-algebra, or early in (high school) algebra, you start drawing equations. It’s a simple trick. Lay down a coordinate system, some set of axes for ‘x’ and ‘y’ and maybe ‘z’ or whatever letters are important. Look to the equation, made up of x’s and y’s and maybe z’s and so on. Highlight all the points with coordinates whose values make the equation true. This is the logical basis for saying (eg) that the straight line “is” $y = 2x + 1$.

A short while later, you learn about polar coordinates. Instead of using ‘x’ and ‘y’, you have ‘r’ and ‘θ’. ‘r’ is the distance from the center of the universe. ‘θ’ is the angle made with respect to some reference axis. It’s as legitimate a way of describing points in space. Some classrooms even have a part of the blackboard (whiteboard, whatever) with a polar-coordinates “grid” on it. This looks like the lines of a dartboard. And you learn that some shapes are easy to describe in polar coordinates. A circle, centered on the origin, is ‘r = 2’ or something like that. A line through the origin is ‘θ = 1’ or whatever. The line that we’d called $y = 2x + 1$ before? … That’s … some mess. And now $r = 2\theta + 1$ … that’s not even a line. That’s some kind of spiral. Two spirals, really. Kind of wild.

And something to bother you a while. $y = 2x + 1$ is an equation that looks the same as $r = 2\theta + 1$. You’ve changed the names of the variables, but not how they relate to each other. But one is a straight line and the other a spiral thing. How can that be?

The answer, ultimately, is that the letters in the equations aren’t these content-neutral labels. They carry meaning. ‘x’ and ‘y’ imply looking at space a particular way. ‘r’ and ‘θ’ imply looking at space a different way. A shape has different representations in different coordinate systems. Fair enough. That seems to settle the question.

But if you get to calculus the question comes back. You can integrate over a region of space that’s defined by Cartesian coordinates, x’s and y’s. Or you can integrate over a region that’s defined by polar coordinates, r’s and θ’s. The first time you try this, you find … well, that any region easy to describe in Cartesian coordinates is painful in polar coordinates. And vice-versa. Way too hard. But if you struggle through all that symbol manipulation, you get … different answers. Eventually the calculus teacher has mercy and explains. If you’re integrating in Cartesian coordinates you need to use “dx dy”. If you’re integrating in polar coordinates you need to use “r dr dθ”. If you’ve never taken calculus, never mind what this means. What is important is that “r dr dθ” looks like three things multiplied together, while “dx dy” is two.

We get this explained as a “change of variables”. If we want to go from one set of coordinates to a different one, we have to do something fiddly. The extra ‘r’ in “r dr dθ” is what we get going from Cartesian to polar coordinates. And we get formulas to describe what we should do if we need other kinds of coordinates. It’s some work that introduces us to the Jacobian, which looks like the most tedious possible calculation ever at that time. (In Intro to Differential Equations we learn we were wrong, and the Wronskian is the most tedious possible calculation ever. This is also wrong, but it might as well be true.) We typically move on after this and count ourselves lucky it got no worse than that.
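For the Cartesian-to-polar case the fiddly something is short enough to write out. The coordinates are related by $x = r \cos\theta$ and $y = r \sin\theta$. The Jacobian is the determinant of the matrix of partial derivatives:

$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{vmatrix} = r \cos^2\theta + r \sin^2\theta = r$$

That determinant is the extra factor, which is why “dx dy” turns into “r dr dθ”.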

None of this is wrong, even from the perspective of more advanced mathematics. It’s not even misleading, which is a refreshing change. But we can look a little deeper, and get something good from doing so.

The deeper perspective looks at “differential forms”. These are about how to encode information about how your coordinate system represents space. They’re tensors. I don’t blame you for wondering if they would be. A differential form uses interactions between some of the directions in a space. A volume form is a differential form that uses all the directions in a space. And satisfies some other rules too. I’m skipping those because some of the symbols involved I don’t even know how to look up, much less make WordPress present.

What’s important is the volume form carries information compactly. As symbols it tells us that this represents a chunk of space that’s constant no matter what the coordinates look like. This makes it possible to do analysis on how functions work. It also tells us what we would need to do to calculate specific kinds of problem. This makes it possible to describe, for example, how something moving in space would change.

The volume form, and the tools to do anything useful with it, demand a lot of supporting work. You can dodge having to explicitly work with tensors. But you’ll need a lot of tensor-related materials, like wedge products and exterior derivatives and stuff like that. If you’ve never taken freshman calculus don’t worry: the people who have taken freshman calculus never heard of those things either. So what makes this worthwhile?

Yes, person who called out “polynomials”. Good instinct. Polynomials are usually a reason for any mathematics thing. This is one of maybe four exceptions. I have to appeal to my other standard answer: “group theory”. These volume forms match up naturally with groups. There’s not only information about how coordinates describe a space to consider. There’s ways to set up coordinates that tell us things.

That isn’t all. These volume forms can give us new invariants. Invariants are what mathematicians say instead of “conservation laws”. They’re properties whose value for a given problem is constant. This can make it easier to work out how one variable depends on another, or to work out specific values of variables.

For example, classical physics problems like how a bunch of planets orbit a sun often have a “symplectic manifold” that matches the problem. This is a description of how the positions and momentums of all the things in the problem relate. The symplectic manifold has a volume form. That volume is going to be constant as time progresses. That is, there’s this way of representing the positions and speeds of all the planets that does not change, no matter what. It’s much like the conservation of energy or the conservation of angular momentum. And this has practical value. It’s the subject that brought my and Elke Stangl’s blogs into contact, years ago. It also has broader applicability.

There’s no way to provide an exact answer for the movement of, like, the sun and nine-ish planets and a couple major moons and all that. So there’s no known way to answer the question of whether the Earth’s orbit is stable. All the planets are always tugging one another, changing their orbits a little. Could this converge in a weird way suddenly, on geologic timescales? Might the planet go flying off out of the solar system? It doesn’t seem like the solar system could be all that unstable, or it would have already. But we can’t rule out that some freaky alignment of Jupiter, Saturn, and Halley’s Comet might tweak the Earth’s orbit just far enough for catastrophe to unfold. Granted there’s nothing we could do about the Earth flying out of the solar system, but it would be nice to know if we face it, we tell ourselves.

But we can answer this numerically. We can set a computer to simulate the movement of the solar system. But there will always be numerical errors. For example, we can’t use the exact value of π in a numerical computation. 3.141592 (and more digits) might be good enough for projecting stuff out a day, a week, a thousand years. But if we’re looking at millions of years? The difference can add up. We can imagine compensating for not having the value of π exactly right. But what about compensating for something we don’t know precisely, like, where Jupiter will be in 16 million years and two months?

Symplectic forms can help us. The volume form represented by this space has to be conserved. So we can rewrite our simulation so that these forms are conserved, by design. This does not mean we avoid making errors. But it means we avoid making certain kinds of errors. We’re more likely to make what we call “phase” errors. We predict Jupiter’s location in 16 million years and two months. Our simulation puts it thirty degrees farther in its circular orbit than it actually would be. This is a less serious mistake to make than putting Jupiter, say, eight-tenths as far from the Sun as it would really be.
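A toy version shows the effect. This is a sketch under heavy assumptions: a single mass on a spring stands in for the solar system, with position ‘q’, momentum ‘p’, and energy $(q^2 + p^2)/2$. The function names are mine. The ordinary Euler step stretches areas in the position-momentum plane a little every step. The “symplectic Euler” step, which updates the momentum first and then uses the new momentum to move the position, preserves those areas exactly.

```python
def explicit_euler(q, p, h):
    """Ordinary Euler step: both updates use the old values."""
    return q + h * p, p - h * q

def symplectic_euler(q, p, h):
    """Update the momentum first, then move the position with the new momentum."""
    p_new = p - h * q
    q_new = q + h * p_new
    return q_new, p_new

def energy(q, p):
    """Energy of a unit mass on a unit spring."""
    return 0.5 * (q * q + p * p)

def final_energy(step, steps=10000, h=0.01):
    """Run the chosen step rule from the same start and report the ending energy."""
    q, p = 1.0, 0.0  # start at rest, stretched one unit; true energy is 0.5 forever
    for _ in range(steps):
        q, p = step(q, p, h)
    return energy(q, p)

print("explicit Euler:  ", final_energy(explicit_euler))
print("symplectic Euler:", final_energy(symplectic_euler))
```

The explicit scheme multiplies the energy by $1 + h^2$ every step, so it drifts upward without limit. The symplectic scheme wobbles around the true value of 0.5 and never wanders off. It still gets the timing slightly wrong, which is the “phase” error.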

Volume forms seem, at first, a lot of mechanism for a small problem. And, unfortunately for students, they are. They’re more trouble than they’re worth for changing Cartesian to polar coordinates, or similar problems. You know, ones that the student already has some feel for. They pay off on more abstract problems. Tracking the movement of a dozen interacting things, say, or describing a space that’s very strangely shaped. Those make the effort to learn about forms worthwhile.
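For the curious, here is a sketch of what the volume-form version of the Cartesian-to-polar change looks like. It leans on two rules for wedge products that I’m asserting rather than explaining: $dr \wedge dr = 0$ and $d\theta \wedge dr = -dr \wedge d\theta$. Start from $x = r \cos\theta$ and $y = r \sin\theta$:

$$dx = \cos\theta \, dr - r \sin\theta \, d\theta, \qquad dy = \sin\theta \, dr + r \cos\theta \, d\theta$$

$$dx \wedge dy = r \cos^2\theta \, dr \wedge d\theta - r \sin^2\theta \, d\theta \wedge dr = r \, dr \wedge d\theta$$

It’s the same extra ‘r’ as before, reached with more machinery.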

## The Summer 2017 Mathematics A To Z: Ulam’s Spiral

Gaurish, of For the love of Mathematics, asked me about one of those modestly famous (among mathematicians) mathematical figures. Yeah, I don’t have a picture of it. Too much effort. It’s easier to write instead.

# Ulam’s Spiral.

Boredom is unfairly maligned in our society. I’ve said this before, but that was years ago, and I have some different readers today. We treat boredom as a terrible thing, something to eliminate. We treat it as a state in which nothing seems interesting. It’s not. Boredom is a state in which anything, however trivial, engages the mind. We would not count the tiles on the floor, or time the rocking of a chandelier, or wonder what fraction of solitaire games can be won if we were never bored. A bored mind is a mind ready to discover things. We should welcome the state.

Several times in the 20th century Stanislaw Ulam was bored. I mention solitaire games because, according to Ulam, he spent some time in 1946 bored, convalescent and playing a lot of solitaire. He got to wondering what’s the probability a particular solitaire game is winnable? (He was specifically playing Canfield solitaire. The game’s also called Demon, Chameleon, or Storehouse, if Wikipedia is right.) What’s the chance the cards can’t be played right, no matter how skilled the player is? It’s a problem impossible to do exactly. Ulam was one of the mathematicians designing and programming the computers of the day.

He, with John von Neumann, worked out how to get a computer to simulate many, many rounds of cards. They would get an answer that I have never seen given in any history of the field. The field is Monte Carlo simulations. It’s built on using random numbers to conduct experiments that approximate an answer. (They’re also what my specialty is in. I mention this for those who’ve wondered what, if any, mathematics field I do consider myself competent in. This is not it.) The chance of a winnable deal is about 71 to 72 percent, although actual humans can’t hope to do more than about 35 percent. My evening’s experience with this Canfield Solitaire game suggests the chance of winning is about zero.
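Here is a minimal sketch of the idea. It is not Canfield, and it is not anything Ulam or von Neumann wrote. It’s a made-up card game chosen because the rules fit in one line: shuffle a deck, and you win if no card ends up in the slot where it started. The function names and the seed are my own choices.

```python
import random

def no_card_in_place(deck_size, rng):
    """Shuffle a deck and report whether every card left its starting slot."""
    deck = list(range(deck_size))
    rng.shuffle(deck)
    return all(card != slot for slot, card in enumerate(deck))

def estimate_win_chance(trials, deck_size=52, seed=0):
    """Fraction of simulated shuffles that count as a win."""
    rng = random.Random(seed)  # fixed seed so the experiment can be repeated
    wins = sum(no_card_in_place(deck_size, rng) for _ in range(trials))
    return wins / trials

print(estimate_win_chance(20000))
```

This toy game happens to have an exact answer, very nearly one divided by Euler’s number, or about 0.368, so the simulation can be checked. The value of the method is that it works just as well on games where no exact answer is on offer.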

In 1963, Ulam told Martin Gardner, he was bored again during a paper’s presentation. Ulam doodled, and doodled something interesting enough to have a computer doodle more than mere pen and paper could. It was interesting enough to feature in Gardner’s Mathematical Games column for March 1964. It started with what the name suggested, a spiral.

Write down ‘1’ in the center. Write a ‘2’ next to it. This is usually done to the right of the ‘1’. If you want the ‘2’ to be on the left, or above, or below, fine, it’s your spiral. Write a ‘3’ above the ‘2’. (Or below if you want, or left or right if you’re doing your spiral that way. You’re tracing out a right angle from the “path” of numbers before that.) A ‘4’ to the left of that, a ‘5’ to the left of that, a ‘6’ under that, a ‘7’ under that, an ‘8’ to the right of that, and so on. A spiral, for as long as your paper or your patience lasts. Now draw a circle around the ‘2’. Or a box. Whatever. Highlight it. Also do this for the ‘3’, and the ‘5’, and the ‘7’ and all the other prime numbers on your spiral. And look at what’s highlighted.
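If your patience runs out before your paper does, the recipe can be sketched in a few lines. This assumes the usual layout, with the ‘2’ to the right of the ‘1’ and the spiral turning counterclockwise. The function names are mine, and the primality test is the plain “here’s a thing, try dividing it” kind.

```python
def is_prime(n):
    """Trial division: fine for the small numbers on a doodle."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def ulam_spiral(size):
    """Return a size-by-size grid (size odd) with 1 in the center, spiraling out."""
    grid = [[0] * size for _ in range(size)]
    x = y = size // 2          # column and row of the center
    dx, dy = 1, 0              # first step goes right
    n, step = 1, 1
    grid[y][x] = 1
    while n < size * size:
        for _ in range(2):     # two legs of each length: 1, 1, 2, 2, 3, 3, ...
            for _ in range(step):
                if n >= size * size:
                    break
                x, y = x + dx, y + dy
                n += 1
                grid[y][x] = n
            dx, dy = dy, -dx   # turn left (rows are numbered downward)
        step += 1
    return grid

def render(size):
    """Mark primes with '#' and everything else with '.'."""
    return "\n".join(
        "".join("#" if is_prime(n) else "." for n in row)
        for row in ulam_spiral(size)
    )

print(render(41))
```

A 41-by-41 grid is already enough for the diagonal streaks to start showing.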

It looks like …

It’s …

Well, it’s something.

It’s hard to say what exactly. There’s a lot of diagonal lines to it. Not uninterrupted lines. Every diagonal line has some spottiness to it. There are blank regions too. There are some long stretches of numbers not highlighted, many of them horizontal or vertical lines with no prime numbers in them. Those stop too. The eye can’t help seeing clumps, especially. Imperfect diagonal stitching across the fabric of the counting numbers.

Maybe seeing this is some fluke. Start with another number in the center. 2, if you like. 41, if you feel ambitious. Repeat the process. The details vary. But the pattern looks much the same. Regions of dense-packed broken diagonals, all over the plane.

It begs us to believe there’s some knowable pattern here. That we could get an artist to draw a figure, with each spot in the figure corresponding to a prime number. This would be great. We know many things about prime numbers, but we don’t really have any system to generate a lot of prime numbers. Not much better than “here’s a thing, try dividing it”. Back in the 80s and 90s we had the big Fractal Boom. Everybody got computers that could draw what passed for pictures. And we could write programs that drew them. The Ulam Spiral was a minor but exciting prospect there. Was it a fractal? I don’t know. I’m not sure if anyone knows. (The spiral like you’d draw on paper wouldn’t be. The spiral that went out to infinitely large numbers might conceivably be.) It seemed plausible enough for computing magazines to be interested in. Maybe we could describe the pattern by something as simple as the Koch curve (that wriggly triangular snowflake shape). Or as easy to program as the Mandelbrot set.

We haven’t found one. As keeps happening with prime numbers, the answers evade us. We can understand why diagonals should appear. Write a polynomial of the form $4n^2 + b n + c$. Evaluate it for n of 1, 2, 3, 4, and so on. Highlight those numbers. This will tend to highlight numbers that, in this spiral, are diagonal or horizontal or vertical lines. A lot of polynomials like this give a string of some prime numbers. But the polynomials all peter out. The lines all have interruptions.
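Here’s a small check of that claim, as a sketch. It assumes the usual spiral, with ‘1’ at the origin, ‘2’ to its right, and ‘3’ above the ‘2’. The function name `spiral_position` and the two particular polynomials are my own picks for the illustration.

```python
def spiral_position(n):
    """Coordinates (x, y) of n on the spiral: 1 at (0, 0), 2 at (1, 0), 3 at (1, 1)."""
    x = y = 0
    dx, dy = 1, 0              # first step goes right
    k, step = 1, 1
    while True:
        for _ in range(2):     # two legs of each length: 1, 1, 2, 2, 3, 3, ...
            for _ in range(step):
                if k == n:
                    return (x, y)
                x, y = x + dx, y + dy
                k += 1
            dx, dy = -dy, dx   # turn counterclockwise (y points up here)
        step += 1

# 4n^2 - 2n + 1 gives 3, 13, 31, 57, ... which run up the diagonal.
for n in range(1, 6):
    value = 4 * n * n - 2 * n + 1
    print(value, spiral_position(value))

# 4n^2 - 3n + 1 gives 2, 11, 28, 53, ... which run straight out to the right.
for n in range(1, 6):
    value = 4 * n * n - 3 * n + 1
    print(value, spiral_position(value))
```

The first list starts 3, 13, 31, all prime, and then hits 57, which is 3 times 19. There’s the interruption.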

There are other patterns. One, predating Ulam’s boring paper by thirty years, was made by Laurence Klauber. Klauber was a herpetologist of some renown, if Wikipedia isn’t misleading me. It claims his Rattlesnakes: Their Habits, Life Histories, and Influence on Mankind is still an authoritative text. I don’t know and will defer to people versed in the field. It also credits him with several patents in electrical power transmission.

Anyway, Klauber’s Triangle sets a ‘1’ at the top of the triangle. The numbers ‘2 3 4’ under that, with the ‘3’ directly beneath the ‘1’. The numbers ‘5 6 7 8 9’ beneath that, the ‘7’ directly beneath the ‘3’. ‘10 11 12 13 14 15 16’ beneath that, the ‘13’ underneath the ‘7’. And so on. Again highlight the prime numbers. You get again these patterns of dots and lines. Many vertical lines. Some lines in isometric view. It looks like strands of Morse Code.
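The triangle is even easier to sketch than the spiral, since row number ‘r’ just holds the numbers from $(r-1)^2 + 1$ through $r^2$. The function names here are mine, not Klauber’s.

```python
def is_prime(n):
    """Trial division, good enough for a small triangle."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def klauber_rows(num_rows):
    """Row r (counting from 1) holds the numbers (r-1)^2 + 1 through r^2."""
    return [list(range((r - 1) ** 2 + 1, r * r + 1)) for r in range(1, num_rows + 1)]

def render(num_rows):
    """Mark primes with '#' and other numbers with '.', centering each row."""
    lines = []
    for r, row in enumerate(klauber_rows(num_rows), start=1):
        marks = "".join("#" if is_prime(n) else "." for n in row)
        lines.append(" " * (num_rows - r) + marks)
    return "\n".join(lines)

print(render(30))
```

The middle entry of row ‘r’ is $r^2 - r + 1$, which is where the 1, 3, 7, 13 running down the center come from.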

In 1994 Robert Sacks created another variant. This one places the counting numbers on an Archimedean spiral. Space the numbers correctly and highlight the primes. The primes will trace out broken curves. Some are radial. Some spiral in (or out, if you rather). Some open up islands. The pattern looks like a Saul Bass logo for a “Nifty Fifty”-era telecommunications firm or maybe an airline.

You can do more. Draw a hexagonal spiral. Triangular ones. Other patterns of laying down numbers. You get patterns. The eye can’t help seeing order there. We can’t quite pin down what it is. Prime numbers keep evading our full understanding. Perhaps it would help to doodle a little during a tiresome conference call.

Stanislaw Ulam did enough fascinating numerical mathematics that I could probably do a sequence just on his work. I do want to mention one thing. It’s part of information theory. You know the game Twenty Questions. Play that, but allow for some lying. The game is still playable. Ulam did not invent this game; Alfréd Rényi did. (I do not know anything else about Rényi.) But Ulam ran across Rényi’s game, and pointed out how interesting it was, and mathematicians paid attention to him.

## The Summer 2017 Mathematics A To Z: Topology

Today’s glossary entry comes from Elke Stangl, author of the Elkemental Force blog. I’ll do my best, although it would have made my essay a bit easier if I’d had the chance to do another topic first. We’ll get there.

# Topology.

Start with a universe. Nice thing to have around. Call it ‘M’. I’ll get to why that name.

I’ve talked a fair bit about weird mathematical objects that need some bundle of traits to be interesting. So this will change the pace some. Here, I request only that the universe have a concept of “sets”. OK, that carries a little baggage along with it. We have to have intersections and unions. Those come about from having pairs of sets. The intersection of two sets is all the things that are in both sets simultaneously. The union of two sets is all the things that are in one set, or the other, or both simultaneously. But it’s hard to think of something that could have sets that couldn’t have intersections and unions.

So from your universe ‘M’ create a new collection of things. Call it ‘T’. I’ll get to why that name. But if you’ve formed a guess about why, then you know. So I suppose I don’t need to say why, now. ‘T’ is a collection of subsets of ‘M’. Now let’s suppose these four things are true.

First. ‘M’ is one of the sets in ‘T’.

Second. The empty set ∅ (which has nothing at all in it) is one of the sets in ‘T’.

Third. Whenever two sets are in ‘T’, their intersection is also in ‘T’.

Fourth. Whenever two (or more) sets are in ‘T’, their union is also in ‘T’.

Got all that? I imagine a lot of shrugging and head-nodding out there. So let’s take that. Your universe ‘M’ and your collection of sets ‘T’ are a topology. And that’s that.

Yeah, that’s never that. Let me put in some more text. Suppose we have a universe that consists of two symbols, say, ‘a’ and ‘b’. There’s four distinct topologies you can make of that. Take the universe plus the collection of sets ∅, {a}, {b}, and {a, b}. That’s a topology. Try it out. That’s the first collection you would probably think of.

Here’s another collection. Take this two-thing universe and the collection of sets ∅, {a}, and {a, b}. That’s another topology and you might want to double-check that. Or there’s this one: the universe and the collection of sets ∅, {b}, and {a, b}. Last one: the universe and the collection of sets ∅ and {a, b} and nothing else. That one barely looks legitimate, but it is. Not a topology: the universe and the collection of sets ∅, {a}, and {b}.

The number of topologies grows surprisingly with the number of things in the universe. Like, if we had three symbols, ‘a’, ‘b’, and ‘c’, there would be 29 possible topologies. The universe of the three symbols and the collection of sets ∅, {a}, {b, c}, and {a, b, c}, for example, would be a topology. But the universe and the collection of sets ∅, {a}, {b}, {c}, and {a, b, c} would not. It’s a good thing to ponder if you need something to occupy your mind while awake in bed.

With four symbols, there’s 355 possibilities. Good luck working those all out before you fall asleep. Five symbols have 6,942 possibilities. You realize this doesn’t look like any expected sequence. After ‘4’ the count of topologies isn’t anything obvious like “two to the number of symbols” or “the number of symbols factorial” or something.
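
If you’d rather let a computer do the pondering, here is a minimal brute-force sketch in Python. It stores each subset of the universe as a bitmask, tries every collection that contains both ∅ and the whole universe, and keeps the ones closed under pairwise intersection and union. For a finite universe, pairwise closure is all the third and fourth rules ask. The function names are my own, for illustration only.

```python
from itertools import combinations

def is_topology(collection, full):
    """Check the four rules for a collection of subsets stored as bitmasks.

    `full` is the bitmask of the whole universe; 0 is the empty set.
    """
    if 0 not in collection or full not in collection:
        return False
    for a in collection:
        for b in collection:
            # a & b is the intersection, a | b is the union.
            if (a & b) not in collection or (a | b) not in collection:
                return False
    return True

def count_topologies(n):
    """Count the topologies on a universe of n labelled symbols by brute force."""
    full = (1 << n) - 1
    middle = range(1, full)  # every subset except the empty set and the universe
    count = 0
    for size in range(len(middle) + 1):
        for extra in combinations(middle, size):
            if is_topology(frozenset((0, full) + extra), full):
                count += 1
    return count

for n in (2, 3, 4):
    print(n, count_topologies(n))
```

This gets through four symbols in a moment. Five symbols would mean over a billion candidate collections, so that case wants a smarter method than this one.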

Are you getting ready to call me on being inconsistent? In the past I’ve talked about topology as studying what we can know about geometry without involving the idea of distance. How’s that got anything to do with this fiddling about with sets and intersections and stuff?

So now we come to that name ‘M’, and what it’s finally mnemonic for. I have to touch on something Elke Stangl hoped I’d write about, but a letter someone else bid on first. That would be a manifold. I come from an applied-mathematics background so I’m not sure I ever got a proper introduction to manifolds. They appeared one day in the background of some talk about physics problems. I think they were introduced as “it’s a space that works like normal space”, and that was it. We were supposed to pretend we had always known about them. (I’m translating. What we were actually told would be that it “works like $R^3$”. That’s how mathematicians say “like normal space”.) That was all we needed.

Properly, a manifold is … eh. It’s something that works kind of like normal space. That is, it’s a set, something that can be a universe. And it has to be something we can define “open sets” on. The open sets for the manifold follow the rules I gave for a topology above. You can make a collection of these open sets. And the empty set has to be in that collection. So does the whole universe. The intersection of two open sets in that collection is itself in that collection. The union of open sets in that collection is in that collection. If all that’s true, and if every small enough piece of the universe looks like a piece of normal space, then we have a manifold.

And now the piece that makes every pop mathematics article about topology talk about doughnuts and coffee cups. It’s possible that two topologies might be homeomorphic to each other. “Homeomorphic” is a term of art. But you understand it if you remember that “morph” means shape, and suspect that “homeo” is probably close to “homogenous”. Two things being homeomorphic means you can match their parts up. In the matching there’s nothing left over in the first thing or the second. And the relations between the parts of the first thing are the same as the relations between the parts of the second thing.

So. Imagine the snippet of the number line for the numbers larger than -π and smaller than π. Think of all the open sets you can use to cover that. It will have a set like “the numbers bigger than 0 and less than 1”. A set like “the numbers bigger than -π and smaller than 2.1”. A set like “the numbers bigger than 0.01 and smaller than 0.011”. And so on.

Now imagine the points that exist on a circle, if you’ve omitted one point. Let’s say it’s the unit circle, centered on the origin, and that what we’re leaving out is the point that’s exactly to the left of the origin. The open sets for this are the arcs that cover some part of this punctured circle. There’s the arc that corresponds to the angles from 0 to 1 radian measure. There’s the arc that corresponds to the angles from -π to 2.1 radians. There’s the arc that corresponds to the angles from 0.01 to 0.011 radians. You see where this is going. You see why I say we can match those sets on the number line to the arcs of this punctured circle. There’s some details to fill in here. But you probably believe me this could be done if I had to.
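
One of the details to fill in is the matching itself. A natural candidate, in my own notation, sends each number θ on that snippet of the number line to the point of the circle at angle θ:

$f(\theta) = (\cos\theta, \sin\theta), \quad -\pi < \theta < \pi$

The omitted point, (-1, 0), is the one that would need θ to be exactly π or -π, so it never gets hit. Every other point of the unit circle gets hit exactly once. And an open interval of numbers goes to an open arc: the numbers bigger than 0 and less than 1 go to the arc from 0 to 1 radian.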

There’s two (or three) great branches of topology. One is called “algebraic topology”. It’s the one that makes for fun pop mathematics articles about imaginary rubber sheets. It’s called “algebraic” because this field makes it natural to study the holes in a sheet. And those holes tend to form groups and rings, basic pieces of Not That Algebra. The field (I’m told) can be interpreted as looking at functors on groups and rings. This makes for some neat tying-together of subjects this A To Z round.

The other branch is called “differential topology”, which is a great field to study because it sounds like what Mister Spock is thinking about. It inspires awestruck looks where saying you study, like, Bayesian probability gets blank stares. Differential topology is about differentiable functions on manifolds. This gets deep into mathematical physics.

As you study mathematical physics, you stop worrying about ever solving specific physics problems. Specific problems are petty stuff. What you like is solving whole classes of problems. A steady trick for this is to try to find some properties that are true about the problem regardless of what exactly it’s doing at the time. This amounts to finding a manifold that relates to the problem. Consider a central-force problem, for example, with planets orbiting a sun. A planet can’t move just anywhere. It can only be in places and moving in directions that give the system the same total energy that it had to start. And the same linear momentum. And the same angular momentum. We can match these constraints to manifolds. Whatever the planet does, it does it without ever leaving these manifolds. To know the shapes of these manifolds — how they are connected — and what kinds of functions are defined on them tells us something of how the planets move.

The maybe-third branch is “low-dimensional topology”. This is what differential topology is for two- or three- or four-dimensional spaces. You know, shapes we can imagine with ease in the real world. Maybe imagine with some effort, for four dimensions. This kind of branches out of differential topology because having so few dimensions to work in makes a lot of problems harder. We need specialized theoretical tools that only work for these cases. Is that enough to count as a separate branch? It depends what topologists you want to pick a fight with. (I don’t want a fight with any of them. I’m over here in numerical mathematics when I’m not merely blogging. I’m happy to provide space for anyone wishing to defend her branch of topology.)

But each grows out of this quite general, quite abstract idea, also known as “point-set topology”, that’s all about sets and collections of sets. There is much that we can learn from thinking about how to collect the things that are possible.

## The Summer 2017 Mathematics A To Z: Sárközy’s Theorem

Gaurish, of For the love of Mathematics, gives me another chance to talk number theory today. Let’s see how that turns out.

# Sárközy’s Theorem.

I have two pieces to assemble for this. One is in factors. We can take any counting number, a positive whole number, and write it as the product of prime numbers. 2038 is equal to the prime 2 times the prime 1019. 4312 is equal to 2 raised to the third power times 7 raised to the second times 11. 1040 is 2 to the fourth power times 5 times 13. 455 is 5 times 7 times 13.

There are many ways to divide up numbers like this. Here’s one. Is there a square number among its factors? 2038 and 455 don’t have any. They’re each a product of prime numbers that are never repeated. 1040 has a square among its factors. 2 times 2 divides into 1040. 4312, similarly, has a square: we can write it as 2 squared times 2 times 7 squared times 11. So that is my first piece. We can divide counting numbers into squarefree and not-squarefree.
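
Here is a small Python sketch of that sorting, using plain trial division. The names `prime_factors` and `is_squarefree` are mine, and this is only practical for modest numbers like the ones above.

```python
def prime_factors(n):
    """Return {prime: exponent} for a counting number n, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever is left over is itself a prime.
        factors[n] = factors.get(n, 0) + 1
    return factors

def is_squarefree(n):
    """True when no prime appears more than once among the factors of n."""
    return all(exponent == 1 for exponent in prime_factors(n).values())

for n in (2038, 4312, 1040, 455):
    print(n, prime_factors(n), is_squarefree(n))
```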

The other piece is in binomial coefficients. These are numbers, often quite big numbers, that get dumped on the high school algebra student as she tries to work with some expression like $(a + b)^n$. They’re also dumped on the poor student in calculus, as something about Newton’s binomial coefficient theorem. Which we hear is something really important. In my experience it wasn’t explained why this should rank up there with, like, the differential calculus. (Spoiler: it’s because of polynomials.) But it’s got some great stuff to it.

Binomial coefficients are among those utility players in mathematics. They turn up in weird places. In dealing with polynomials, of course. They also turn up in combinatorics, and through that, probability. If you run, for example, 10 experiments each of which could succeed or fail, the chance you’ll get exactly five successes is going to be proportional to one of these binomial coefficients. That they touch on polynomials and probability is a sign we’re looking at a thing woven into the whole universe of mathematics. We saw them some in talking, last A-To-Z around, about Yang Hui’s Triangle. That’s also known as Pascal’s Triangle. It has more names too, since it’s been found many times over.
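
To make that example concrete: suppose each experiment succeeds with probability ‘p’, independently of the others. (The independence is my assumption, not something stated above.) Then the chance of exactly five successes in ten tries is

$P(\mbox{five successes}) = {{10} \choose{5}} p^5 (1 - p)^5 = 252 \, p^5 (1 - p)^5$

For a coin-flip of an experiment, with ‘p’ equal to one-half, that works out to 252/1024, a little under one in four.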

The theorem under discussion is about central binomial coefficients. These are one specific coefficient in a row. The ones that appear, in the triangle, along the line of symmetry. They’re easy to describe in formulas. For a whole number ‘n’ that’s greater than or equal to zero, evaluate what we call 2n choose n:

${{2n} \choose{n}} = \frac{(2n)!}{(n!)^2}$

If ‘n’ is zero, this number is $\frac{0!}{(0!)^2}$ or 1. If ‘n’ is 1, this number is $\frac{2!}{(1!)^2}$ or 2. If ‘n’ is 2, this number is $\frac{4!}{(2!)^2}$ or 6. If ‘n’ is 3, this number is (sparing the formula) 20. The numbers keep growing. 70, 252, 924, 3432, 12870, and so on.

So. 1 and 2 and 6 are squarefree numbers. Not much arguing that. But 20? That’s 2 squared times 5. 70? 2 times 5 times 7. 252? 2 squared times 3 squared times 7. 924? That’s 2 squared times 3 times 7 times 11. 3432? 2 cubed times 3 times 11 times 13; there’s a 2 squared in there. 12870? 2 times 3 squared times it doesn’t matter anymore. It’s not a squarefree number.

There’s a bunch of not-squarefree numbers in there. The question: do we ever stop seeing squarefree numbers here?

So here’s Sárközy’s Theorem. It says that this central binomial coefficient ${{2n} \choose{n}}$ is never squarefree as long as ‘n’ is big enough. András Sárközy showed in 1985 that this was true. How big is big enough? … We have a bound, at least, for this theorem. If ‘n’ is larger than the number $2^{8000}$ then the corresponding coefficient can’t be squarefree. It might not surprise you that the formulas involved here feature the Riemann Zeta function. That always seems to turn up for questions about large prime numbers.

That’s a common state of affairs for number theory problems. Very often we can show that something is true for big enough numbers. I’m not sure there’s a clear reason why. When numbers get large enough it can be more convenient to deal with their logarithms, I suppose. And those look more like the real numbers than the integers. And real numbers are typically easier to prove stuff about. Maybe that’s it. This is vague, yes. But to ask ‘why’ some things are easy and some are hard to prove is a hard question. What is a satisfying ‘cause’ here?

It’s tempting to say that since we know this is true for all ‘n’ above a bound, we’re done. We can just test all the numbers below that bound, and the rest is done. You can do a satisfying proof this way: show that eventually the statement is true, and show all the special little cases before it is. This particular result is kind of useless, though. $2^{8000}$ is a number that’s something like 2,409 digits long. For comparison, the total number of things in the universe is something like a number about 80 digits long. Certainly not more than 90. It’d take too long to test all those cases.

That’s all right. Since Sárközy’s proof in 1985 there’ve been other breakthroughs. In 1988 P Goetgheluck proved it was true for a big range of numbers: every ‘n’ that’s larger than 4 and less than $2^{42,205,184}$. That’s a number something more than 12 million digits long. In 1991 I Vardi proved we had no squarefree central binomial coefficients for ‘n’ greater than 4 and less than $2^{774,840,978}$, which is a number about 233 million digits long. And then in 1996 Andrew Granville and Olivier Ramaré showed directly that this was so for all ‘n’ larger than 4.
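
Those digit counts can be checked without ever writing the numbers out. A power of two, $2^m$, has one more digit than the whole-number part of $m \log_{10} 2$. A quick Python sketch, with a function name of my own:

```python
import math

def digits_in_power_of_two(m):
    """Number of decimal digits of 2**m, without computing 2**m itself."""
    return math.floor(m * math.log10(2)) + 1

for m in (42_205_184, 774_840_978):
    print(m, digits_in_power_of_two(m))
```

Floating-point rounding could in principle put this off by one if $m \log_{10} 2$ landed extremely close to a whole number. For sizing up a bound, that hardly matters.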

So that 70 that turned up just a few lines in is the last squarefree one of these coefficients.
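
You can watch this happen for small ‘n’. The Python sketch below uses the standard library’s `math.comb` for the coefficient. For the squarefree test it leans on one observation: the coefficient divides (2n)!, so none of its prime factors can be bigger than 2n, and it’s enough to try the squares of numbers up to 2n. The function names are mine, and checking up to n = 199 is an arbitrary stopping point.

```python
import math

def central_binomial(n):
    """The central binomial coefficient, 2n choose n."""
    return math.comb(2 * n, n)

def is_squarefree_central(n):
    """True when no square bigger than 1 divides (2n choose n)."""
    c = central_binomial(n)
    return all(c % (d * d) != 0 for d in range(2, 2 * n + 1))

squarefree_n = [n for n in range(200) if is_squarefree_central(n)]
print(squarefree_n)
print([central_binomial(n) for n in squarefree_n])
```

The only values of ‘n’ that survive are 0, 1, 2, and 4, giving the coefficients 1, 2, 6, and 70.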

Is this surprising? Maybe, maybe not. I’ll bet most of you didn’t have an opinion on this topic twenty minutes ago. Let me share something that did surprise me, and continues to surprise me. In 1974 David Singmaster proved that any integer divides almost all the binomial coefficients out there. “Almost all” is here a term of art, but it means just about what you’d expect. Imagine the giant list of all the numbers that can be binomial coefficients. Then pick any positive integer you like. The number you picked will divide into so many of the giant list that the exceptions won’t be noticeable. So that square numbers like 4 and 9 and 16 and 25 should divide into most binomial coefficients? … That’s to be expected, suddenly. Into the central binomial coefficients? That’s not so obvious to me. But then so much of number theory is strange and surprising and not so obvious.

## The Summer 2017 Mathematics A To Z: Ricci Tensor

Today’s is technically a request from Elke Stangl, author of the Elkemental Force blog. I think it’s also me setting out my own petard for self-hoisting, as my recollection is that I tossed off a mention of “defining the Ricci Tensor” as the sort of thing that’s got a deep beauty that’s hard to share with people. And that set off the search for where I had written about the Ricci Tensor. I hadn’t, and now look what trouble I’m in. Well, here goes.

# Ricci Tensor.

Imagine if nothing existed.

You’re not doing that right, by the way. I expect what you’re thinking of is a universe that’s a big block of space that doesn’t happen to have any things clogging it up. Maybe you have a natural sense of volume in it, so that you know something is there. Maybe you even imagine something with grid lines or reticules or some reference points. What I imagine after a command like that is a sort of great rectangular expanse, dark and faintly purple-tinged, with small dots to mark its expanse. That’s fine. This is what I really want. But it’s not really imagining nothing existing. There’s space. There’s some sense of where things would be, if they happened to be in there. We’d have to get rid of the space to have “nothing” exist. And even then we have logical problems that sound like word games. (How can nothing have a property like “existing”? Or a property like “not existing”?) This is dangerous territory. Let’s not step there.

So take the empty space that’s what mathematics and physics people mean by “nothing”. What do we know about it? Unless we’re being difficult, it’s got some extent. There are points in it. There’s some idea of distance between these points. There’s probably more than one dimension of space. There’s probably some sense of time, too. At least we’re used to the expectation that things would change if we watched. It’s a tricky sense to have, though. It’s hard to say exactly what time is. We usually fall back on the idea that we know time has passed if we see something change. But if there isn’t anything to see change? How do we know there’s still time passing?

You maybe already answered. We know time is passing because we can see space changing. One of the legs of Modern Physics is geometry, how space is shaped and how its shape changes. This tells us how gravity works, and how electricity and magnetism propagate. If there were no matter, no energy, no things in the universe there would still be some kind of physics. And interesting physics, since the mathematics describing this stuff is even subtler and more challenging to the intuition than even normal Euclidean space. If you’re going to read a pop mathematics blog like this, you’re very used to this idea.

Probably haven’t looked very hard at the idea, though. How do you tell whether space is changing if there’s nothing in it? It’s all right to imagine a coordinate system put on empty space. Coordinates are our concept. They don’t affect the space any more than the names we give the squirrels in the yard affect their behavior. But how to make the coordinates move with the space? It seems question-begging at least.

We have a mathematical gimmick to resolve this. Of course we do. We call it a name like a “test mass” or a “test charge” or maybe just “test particle”. Imagine that we drop into space a thing. But it’s only barely a thing. It’s tiny in extent. It’s tiny in mass. It’s tiny in charge. It’s tiny in energy. It’s so slight in every possible trait that it can’t sully our nothingness. All it does is let us detect it. It’s a good question how. We have good eyes. But now, we could see the particle moving as the space it’s in moves.

But again we can ask how. Just one point doesn’t seem to tell us much. We need a bunch of test particles, a whole cloud of them. They don’t interact. They don’t carry energy or mass or anything. They just carry the sense of place. This is how we would perceive space changing in time. We can ask questions meaningfully.

Here’s an obvious question: how much volume does our cloud take up? If we’re going to be difficult about this, none at all, since it’s a finite number of particles that all have no extent. But you know what we mean. Draw a ball, or at least an ellipsoid, around the test particles. How big is that? Wait a while. Draw another ball around the now-moved test particles. How big is that now?

Here’s another question: has the cloud rotated any? The test particles, by definition, don’t have mass or anything. So they don’t have angular momentum. They aren’t pulling one another to the side any. If they rotate it’s because space has rotated, and that’s interesting to consider. And another question: might they swap positions? Could a pair of particles that go left-to-right swap so they go right-to-left? That I ask admits that I want to allow the possibility.

These are questions about coordinates. They’re about how one direction shifts to other directions. How it stretches or shrinks. That is to say, these are questions of tensors. Tensors are tools for many things, most of them about how things transmit through different directions. In this context, time is another direction.

All our questions about how space moves we can describe as curvature. How do directions fall away from being perpendicular to one another? From being parallel to themselves? How do their directions change in time? If we have three dimensions in space and one in time — a four-dimensional “manifold” — then there’s 20 different “directions” each with maybe their own curvature to consider. This may seem a lot. Every point on this manifold has this set of twenty numbers describing the curvature of space around it. There’s not much to do but accept that, though. If we could do with fewer numbers we would, but trying cheats us out of physics.

Ten of the numbers in that set are themselves a tensor. It’s known as the Weyl Tensor. It describes gravity’s equivalent to light waves. It’s about how the shape of our cloud will change as it moves. The other ten numbers form another tensor. That is, a thousand words into the essay, the Ricci Tensor. The Ricci Tensor describes how the volume of our cloud will change as the test particles move along. It may seem odd to need ten numbers for this, but that’s what we need. For three-dimensional space and one-dimensional time, anyway. We need fewer for two-dimensional space; more, for more dimensions of space.
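
The counting behind twenty-equals-ten-plus-ten can be written out. These are standard formulas for the number of independent components in ‘n’ dimensions; I’m quoting them, not deriving them. The first counts the full curvature (usually called the Riemann tensor), and the second counts a symmetric tensor like the Ricci Tensor. With n = 4:

$\frac{n^2 (n^2 - 1)}{12} = \frac{16 \cdot 15}{12} = 20, \qquad \frac{n (n + 1)}{2} = \frac{4 \cdot 5}{2} = 10$

The Weyl Tensor gets what’s left over, 20 - 10 = 10.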

The Ricci Tensor is a geometric construct. Most of us come to it, if we do, by way of physics. It’s a useful piece of general relativity. It has uses outside this, though. It appears in the study of Ricci Flows. Here space moves in ways akin to how heat flows. And the Ricci Tensor appears in projective geometry, in the study of what properties of shapes don’t depend on how we present them.

It’s still tricky stuff to get a feeling for. I’m not sure I have a good feel for it myself. There’s a long trail of mathematical symbols leading up to these tensors. The geometry of them becomes more compelling in four or more dimensions, which taxes the imagination. Yann Ollivier here has a paper that attempts to provide visual explanations for many of the curvatures and tensors that are part of the field. It might help.

## The Summer 2017 Mathematics A To Z: Quasirandom numbers

Gaurish, host of For the love of Mathematics, gives me the excuse to talk about amusement parks. You may want to brace yourself. Yes, this essay includes a picture. It would have included a video if I had enough WordPress privileges for that.

# Quasirandom numbers.

Think of a merry-go-round. Or carousel, if you prefer. I will venture a guess. You might like merry-go-rounds. They’re beautiful. They can evoke happy thoughts of childhood when they were a big ride it was safe to go on. But they don’t often make one think of thrills. They’re generally sedate things. They don’t need to be. There’s no great secret to making a carousel a thrill ride. They knew it a century ago, when all the great American carousels were carved. It’s simple. Make the thing spin fast enough, at the five or six rotations per minute the ride was made for. There are places that do this yet. There’s the Cedar Downs ride at Cedar Point, Sandusky, Ohio. There’s the antique carousel at Crossroads Village, a historical village/park just outside Flint, Michigan. There’s the Derby Racer at Playland in Rye, New York. There’s the carousel in the Merry-Go-Round Museum in Sandusky, Ohio. Any of them are great rides. Two of them have a special edge. I’ll come back to them.

Randomness is a valuable resource. We know it’s key to many things. We have major fields of mathematics built on it. We can understand the behavior of variables without ever knowing what value they have. All we need is to know the chance they might be in some particular range. This makes possible all kinds of problems too complicated to do otherwise. We know it’s critical. Quantum mechanics would not work without randomness. Without quantum mechanics, matter doesn’t work. And that’s true randomness, the kind where something is unpredictable. It’s not the kind of randomness we talk about when we ask, say, what’s the chance someone was born on a Tuesday. That’s mere hidden information: if we knew the month and date and year of a person’s birth we would know whether they were born Tuesday or not. We need more.

So the trouble is actually getting a random number. Well, a sequence of randomly drawn numbers. We rarely need this if we’re doing analysis. We can understand how some process changes the shape of a distribution without ever using the distribution. We can take derivatives of a function without ever evaluating the original function, after all.

But we do need randomly drawn numbers. We do too much numerical work with them. For example, it’s impossible to exactly integrate most functions. Numerical methods can take a ferociously long time to evaluate. A family of methods called Monte Carlo rely on randomly-drawn values to estimate the integral. The results are strikingly good for the work required. But they must have random numbers. The name “Monte Carlo” is not some cryptic code. It is an expression of how randomly drawn numbers make the tool work.
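
As a minimal sketch of the idea, here is a Monte Carlo estimate in Python. The integrand $x^2$ on the interval from 0 to 1 is my own choice of example; its exact integral is 1/3, so we can see how well the estimate does. The random values come from Python’s built-in generator, which is only pseudorandom, and the fixed seed is there just to make the run repeatable.

```python
import random

def monte_carlo_integral(f, a, b, samples, seed=2017):
    """Estimate the integral of f over [a, b] by averaging f at random points."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(samples))
    # Average height of the function, times the width of the interval.
    return (b - a) * total / samples

estimate = monte_carlo_integral(lambda x: x * x, 0.0, 1.0, 100_000)
print(estimate, abs(estimate - 1 / 3))
```

The work is nothing more than evaluating the function at a pile of randomly chosen points and averaging.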

It’s hard to get random numbers. Consider: we can’t write an algorithm to do it. If we were to write one, then we’d be able to predict what the sequence of numbers was. We have some recourse. We could set up instruments to rely on the randomness that seems to be in the world. Thermal fluctuations, for example, created by processes outside any computer’s control, can give us a pleasant dose of randomness. If we need higher-quality random numbers than that we can go to exotic equipment. Geiger counters watching the decay of a not-alarmingly-radioactive sample. Cosmic ray detectors watching the sky.

Or we can write something that produces numbers that look random enough. They won’t really be random, and if we wait long enough we’ll notice the sequence repeats itself. But if we only need, say, ten numbers, who cares if the sequence will repeat after ten million numbers? (We’ll surely need more than ten numbers. But we can postpone the repetition until we’ve drawn far more than ten million numbers.)
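
One classic recipe of this kind is the linear congruential generator: multiply the last number by a constant, add another constant, and keep the remainder after dividing by a modulus. The essay doesn’t name a method, so take this Python sketch as one illustration. The constants are toy values I picked so the repetition shows up right away; real generators use a far larger modulus.

```python
def lcg(seed, a=5, c=3, m=16):
    """Yield an endless stream of pseudorandom numbers from 0 to m - 1."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(seed, **constants):
    """Count the draws until the stream returns to its first value.

    Assumes the stream is purely periodic, which holds when a and m
    share no common factor (as with the toy defaults above).
    """
    stream = lcg(seed, **constants)
    first = next(stream)
    steps = 1
    while next(stream) != first:
        steps += 1
    return steps

gen = lcg(0)
first_draws = [next(gen) for _ in range(20)]
print(first_draws)
print(period(0))
```

With these constants every number from 0 to 15 turns up once, and then the whole sequence starts over.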

Two of the carousels I’ve mentioned have an astounding property. The horses in a file move. I mean, relative to each other. Some horse will start the race in front of its neighbors; some will start behind. The four move forward and back thanks to a mechanism of, I am assured, staggering complexity. There are only three carousels in the world that have it. There’s Cedar Downs at Cedar Point in Sandusky, Ohio; the Derby Racer at Playland in Rye, New York; and the Derby Racer at Blackpool Pleasure Beach in Blackpool, England. The mechanism in Blackpool’s hasn’t operated in years. The one at Playland’s had not run in years, but was restored for the 2017 season. My love and I made a trip specifically to ride that. (You may have heard of a fire at the carousel in Playland this summer. This was in part of the building for their other, non-racing, antique carousel. My last information was that the carousel itself was all right.)

These racing derbies have the horses in a file move forward and back in a “random” way. It’s not truly random. If you knew exactly which gears were underneath each horse, and where in their rotations they were, you could say which horse was about to gain on its partners and which was about to fall back. But all that is concealed from the rider. The horse patterns will eventually, someday, repeat. If the gear cycles aren’t interrupted by maintenance or malfunctions. But nobody’s going to ride any horse long enough to notice. We have in these rides a randomness as good as what your computer makes, at least for the purpose it serves.

What does it mean to look random? Some things seem obvious. All the possible numbers ought to come up, sooner or later. Any particular possible number shouldn’t repeat too often. Any particular possible number shouldn’t go too long without repeating. There shouldn’t be clumps of numbers; if, say, ‘4’ turns up, we shouldn’t see ‘5’ turn up right away all the time.

We can make the idea of “looking” random quite literal. Suppose we’re selecting numbers from 0 through 9. We can draw the random numbers we’ve picked. Use the numbers as coordinates. Say we pick four digits: 1, 3, 9, and 0. Then draw the point that’s at x-coordinate 13, y-coordinate 90. Then the next four digits. Let’s say they’re 4, 2, 3, and 8. Then draw the point that’s at x-coordinate 42, y-coordinate 38. And repeat. What will this look like?

If it clumps up, we probably don’t have good random numbers. If we see lines that points collect along, or avoid, there’s a good chance our numbers aren’t very random. If there’s whole blocks of space that they occupy, and others they avoid, we may have a defective source of random numbers. We should expect the points to cover a space pretty uniformly. (There are more rigorous, logically sound, methods. The eye can be fooled easily enough. But it’s the same principle. We have some test that notices clumps and gaps.) But …
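
Here is a rough Python sketch of that picture, with a crude clump-and-gap check to go with it. It turns a stream of digits into points four digits at a time, then counts how many points land in each cell of a coarse grid. The 10-by-10 grid and the function names are my own assumptions, and this isn’t one of the rigorous tests.

```python
import random
from collections import Counter

def digits_to_points(digits):
    """Group digits in fours: (d0, d1, d2, d3) becomes the point (10*d0 + d1, 10*d2 + d3)."""
    points = []
    for i in range(0, len(digits) - 3, 4):
        d0, d1, d2, d3 = digits[i:i + 4]
        points.append((10 * d0 + d1, 10 * d2 + d3))
    return points

def cell_counts(points, cell=10):
    """Count the points in each cell-by-cell square of the 100-by-100 plane."""
    return Counter((x // cell, y // cell) for x, y in points)

print(digits_to_points([1, 3, 9, 0, 4, 2, 3, 8]))

rng = random.Random(2017)  # fixed seed, only so the run is repeatable
digits = [rng.randrange(10) for _ in range(4 * 10_000)]
counts = cell_counts(digits_to_points(digits))
print(len(counts), min(counts.values()), max(counts.values()))
```

With 10,000 points and 100 cells we’d expect about 100 points per cell. A cell with 5, or with 400, would be worth a hard look.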

The thing is, there’s always going to be some clumps. There’ll always be some gaps. Part of randomness is that it forms patterns, or at least things that look like patterns to us. We can describe how big a clump (or gap; it’s the same thing, really) is for any particular quantity of randomly drawn numbers. If we see clumps bigger than that we can throw out the numbers as suspect. But … still …

Toss a coin fairly twenty times, and there’s no reason it can’t turn up tails sixteen times. This doesn’t happen often, but it will happen sometimes. Just luck. This surplus of tails should evaporate as we take more tosses. That is, we most likely won’t see 160 tails out of 200 tosses. We certainly will not see 1,600 tails out of 2,000 tosses. We know this as the Law of Large Numbers. Wait long enough and weird fluctuations will average out.
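
We can put numbers on “not often”. For a fair coin the chance of at least ‘k’ tails in ‘n’ tosses is a sum of binomial coefficients divided by $2^n$. A short Python sketch using exact fractions, with a function name of my own choosing:

```python
from fractions import Fraction
from math import comb

def chance_of_at_least(k, n):
    """Exact chance of k or more tails in n tosses of a fair coin."""
    return Fraction(sum(comb(n, j) for j in range(k, n + 1)), 2 ** n)

for k, n in ((16, 20), (160, 200), (1600, 2000)):
    print(k, n, float(chance_of_at_least(k, n)))
```

The first comes out a bit under six in a thousand. The second is already far smaller than one in a trillion, and the third is smaller again by a ridiculous margin.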

What if we don’t have time, though? For coin-tossing that’s silly; of course we have time. But for Monte Carlo integration? It could take too long to be confident we haven’t got too-large gaps or too-tight clusters.

This is why we take quasi-random numbers. We begin with what randomness we’re able to manage. But we massage it. Imagine our coins example. Suppose after ten fair tosses we noticed there had been eight tails turn up. Then we would start tossing less fairly, trying to make heads more common. We would be happier if there were 12 rather than 16 tails after twenty tosses.

Draw the results. We get now a pattern that looks still like randomness. But it’s a finer sorting; it looks like static tidied up some. The quasi-random numbers are not properly random. Knowing that, say, the last several numbers were odd means the next one is more likely to be even, the Gambler’s Fallacy put to work. But in aggregate, we trust, we’ll be able to enjoy the speed and power of randomly-drawn numbers. It shows its strengths when we don’t know just how finely we must sample a range of numbers to get good, reliable results.
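
One standard way to get such tidied-up points is the van der Corput sequence. I’m swapping it in here as an illustration; it is fully deterministic, and it isn’t the coin-massaging scheme described above. It writes the counting numbers in base 2 and mirrors their digits around the point, so each new value lands in one of the biggest gaps left by the earlier ones. A Python sketch, using the integral of $x^2$ from 0 to 1 (exactly 1/3) as a made-up test problem:

```python
def van_der_corput(i, base=2):
    """The i-th van der Corput value: the digits of i, mirrored around the point."""
    value, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        value += digit / denom
    return value

points = [van_der_corput(i) for i in range(1024)]
estimate = sum(x * x for x in points) / len(points)

print(points[:8])
print(estimate, abs(estimate - 1 / 3))
```

The first few values are 0, 1/2, 1/4, 3/4, 1/8, 5/8, and so on, each one splitting a gap the earlier ones left. Stop at any time and the interval is covered about as evenly as that many points can manage.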

To carousels. I don’t know whether the derby racers have quasirandom outcomes. I would find believable someone telling me that all the possible orderings of the four horses in any file are equally likely. To know would demand detailed knowledge of how the gearing works, though. Also probably simulations of how the system would work if it ran long enough. It might be easier to watch the ride for a couple of days and keep track of the outcomes. If someone wants to sponsor me doing a month-long research expedition to Cedar Point, drop me a note. Or just pay for my season pass. You folks would do that for me, wouldn’t you? Thanks.

## The Summer 2017 Mathematics A To Z: Prime Number

Gaurish, host of For the love of Mathematics, gives me another topic for today’s A To Z entry. I think the subject got away from me. But I also like where it got.

# Prime Number.

Something about ‘5’ that you only notice when you’re a kid first learning about numbers. You know that it’s a prime number because it’s equal to 1 times 5 and nothing else. You also know that once you introduce fractions, it’s equal to all kinds of things. It’s 10 times one-half and it’s 15 times one-third and it’s 2.5 times 2 and many other things. Why, you might ask the teacher, is it a prime number if it’s got a million billion trillion different factors? And when every other whole number has as many factors? If you get to the real numbers it’s even worse yet, although when you’re a kid you probably don’t realize that. If you ask, the teacher probably answers that it’s only the whole numbers that count for saying whether something is prime or not. And, like, 2.5 can’t be considered anything, prime or composite. This satisfies the immediate question. It doesn’t quite get at the underlying one, though. Why do integers have prime numbers while real numbers don’t?

To maybe have a prime number we need a ring. This is a creature of group theory, or what we call “algebra” once we get to college. A ring consists of a set of elements, and a rule for adding them together, and a rule for multiplying them together. And I want this ring to have a multiplicative identity. That’s some number which works like ‘1’: take something, multiply it by that, and you get that something back again. Also, I want this multiplication rule to commute. That is, the order of multiplication doesn’t affect what the result is. (If the order matters then everything gets too complicated to deal with.) Let me say the things in the set are numbers. It turns out (spoiler!) they don’t have to be. But that’s how we start out.

Whether the numbers in a ring are prime or not depends on the multiplication rule. Let’s take a candidate number that I’ll call ‘a’ to make my writing easier. If the only numbers whose product is ‘a’ are the pair of ‘a’ and the multiplicative identity, then ‘a’ is prime. If there’s some other pair of numbers that give you ‘a’, then ‘a’ is not prime.

The integers — the positive and negative whole numbers, including zero — are a ring. And they have prime numbers just like you’d expect, if we figure out some rule about how to deal with the number ‘-1’. There are many other rings. There’s a whole family of rings, in fact, so commonly used that they have shorthand. Mathematicians write them as “Zn”, where ‘n’ is some whole number. They’re the integers, modulo ‘n’. That is, they’re the whole numbers from ‘0’ up to the number ‘n-1’, whatever that is. Addition and multiplication work as they do with normal arithmetic, except that if the result is less than ‘0’ we add ‘n’ to it. If the result is more than ‘n-1’ we subtract ‘n’ from it. We repeat that until the result is something from ‘0’ to ‘n-1’, inclusive.

(We use the letter ‘Z’ because it’s from the German word for numbers, and a lot of foundational work was done by German-speaking mathematicians. Alternatively, we might write this set as “In”, where “I” stands for integers. If that doesn’t satisfy, we might write this set as “Jn”, where “J” stands for integers. This is because it’s only very recently that we’ve come to see “I” and “J” as different letters rather than different ways to write the same letter.)

These modulo arithmetics are legitimate ones, good reliable rings. They make us realize how strange prime numbers are, though. Consider the set Z4, where the only numbers are 0, 1, 2, and 3. 0 times anything is 0. 1 times anything is whatever you started with. 2 times 1 is 2. Obvious. 2 times 2 is … 0. All right. 2 times 3 is 2 again. 3 times 1 is 3. 3 times 2 is 2. 3 times 3 is 1. … So that’s a little weird. The only product that gives us 3 is 3 times 1. So 3’s a prime number here. 2 isn’t a prime number: 2 times 3 is 2. For that matter even 1 is a composite number, an unsettling consequence.

Or then Z5, where the only numbers are 0, 1, 2, 3, and 4. Here, there are no prime numbers. Each number is the product of at least one pair of other numbers. In Z6 we start to have prime numbers again. But Z7? Z8? I recommend these questions to a night when your mind is too busy to let you fall asleep.
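The hunt through Z4, Z5, and Z6 is easy to hand to a computer. Here’s a minimal Python sketch, assuming the definition given above: ‘a’ is prime when the only pair of elements multiplying to ‘a’ is ‘a’ and the multiplicative identity. The name `primes_in_zn` is mine, made up for illustration.

```python
def primes_in_zn(n):
    """Elements a of Z_n whose only factorization is a times 1.

    Uses the essay's definition: 'a' is prime if every pair (x, y)
    with x * y = a (mod n) is just 'a' and the identity, in either order.
    """
    primes = []
    for a in range(n):
        only_trivial = True
        for x in range(n):
            for y in range(x, n):  # y >= x, so each unordered pair appears once
                if (x * y) % n == a and {x, y} != {a, 1}:
                    only_trivial = False
        if only_trivial:
            primes.append(a)
    return primes


for n in (4, 5, 6):
    print(n, primes_in_zn(n))
```

This checks every pair, which is wasteful but hard to get wrong. Change the tuple in the last loop to try other values of ‘n’.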

Prime numbers depend on context. In the crowded universe of all the rational numbers, or all the real numbers, nothing is prime. In the more austere world of the Gaussian Integers, familiar friends like ‘3’ are prime again, although ‘5’ no longer is. We recognize that as the product of $2 + \imath$ and $2 - \imath$, themselves now prime numbers.
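The factoring of ‘5’ is an instance of a pattern: $(a + b\imath)(a - b\imath)$ multiplies out to $a^2 + b^2$. So a whole number splits this way exactly when it’s a sum of two squares. Here’s a small Python sketch that searches for such pairs. The function name is my own, and it only looks for this conjugate-pair kind of factoring. Showing ‘3’ has no other kind of factoring takes a further argument I’m not giving here.

```python
from math import isqrt

def conjugate_splittings(p):
    """Pairs (a, b) with 0 <= b <= a and a*a + b*b == p.

    Each pair gives p = (a + bi)(a - bi) in the Gaussian integers.
    """
    pairs = []
    for a in range(isqrt(p) + 1):
        for b in range(a + 1):
            if a * a + b * b == p:
                pairs.append((a, b))
    return pairs

print(conjugate_splittings(5))  # finds (2, 1): 5 = (2 + i)(2 - i)
print(conjugate_splittings(3))  # finds nothing
print((2 + 1j) * (2 - 1j))      # Python's complex arithmetic agrees
```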

So given that these things do depend on context. Should we care? Or let me put it another way. Suppose we contact a wholly separate culture, one that we can’t have influenced and one not influenced by us. It’s plausible that they should have a mathematics. Would they notice prime numbers as something worth study? Or would they notice them the way we notice, say, pentagonal numbers, a thing that allows for some pretty patterns and that’s about it?

Well, anything could happen, of course. I’m inclined to think that prime numbers would be noticed, though. They seem to follow naturally from pondering arithmetic. And if one has thought of rings, then prime numbers seem to stand out. The way that Zn behaves changes in important ways if ‘n’ is a prime number. Most notably, if ‘n’ is prime (among the whole numbers), then we can define something that works like division on Zn. If ‘n’ isn’t prime (again), we can’t. This stands out. There are a host of other intriguing results that all seem to depend on whether ‘n’ is a prime number among the whole numbers. It seems hard to believe someone could think of the whole numbers and not notice the prime numbers among them.
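“Something that works like division” here means that dividing by ‘a’ is multiplying by whatever element undoes ‘a’. A minimal Python sketch, assuming Python 3.8 or later, where the built-in `pow` accepts an exponent of -1 with a modulus. The function name `inverses_mod` is made up for this example.

```python
def inverses_mod(n):
    """Map each nonzero a in Z_n to its multiplicative inverse, if it has one."""
    table = {}
    for a in range(1, n):
        try:
            table[a] = pow(a, -1, n)  # modular inverse; needs Python 3.8+
        except ValueError:            # raised when a has no inverse mod n
            table[a] = None
    return table

print(inverses_mod(7))  # every nonzero element has an inverse
print(inverses_mod(8))  # 2, 4, and 6 do not
```

With ‘n’ equal to 7 you can divide by anything but zero. With 8 there are nonzero elements you can’t divide by, and that’s the difference that stands out.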

And they do stand out, as these reliably peculiar things. Many things about them (in the whole numbers) are easy to prove. That there are infinitely many, for example, you can prove to a child. And there are many things we have no idea how to prove. That there are infinitely many primes which are exactly two more than another prime, for example. Any child can understand the question. The one who can prove it will win what fame mathematicians enjoy. If it can be proved.

They turn up in strange, surprising places. Just in the whole numbers we find some patches where there are many prime numbers close together (forty percent of the numbers from 1 through 10 are prime!). We can find deserts; we know of a stretch of 1,113,106 numbers in a row without a single prime among them. We know it’s possible to find prime deserts as vast as we want. Say you want a gap between primes of at least size N. Then look at the numbers (N+1)! + 2, (N+1)! + 3, (N+1)! + 4, and so on, up to (N+1)! + (N+1). None of those can be prime numbers: (N+1)! + k is divisible by k for every k from 2 up to N+1. You must have a gap at least the size N. It may be larger; how do we know that (N+1)! + 1 is a prime number?

No telling. Well, we can check. See if any prime number divides into (N+1)! + 1. This takes a long time to do if N is all that big. There’s no formulas we know that will make this easy or quick.
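For a small N the whole thing can be checked by brute force. This is a sketch only, assuming plain trial division is good enough, and with function names of my own choosing. It builds the desert for N = 6 and then tests the number on its lower edge.

```python
from math import factorial

def smallest_factor(m):
    """Smallest divisor of m bigger than 1 (m itself when m is prime)."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m

def factorial_desert(N):
    """The N consecutive numbers (N+1)! + 2, ..., (N+1)! + (N+1)."""
    base = factorial(N + 1)
    return [base + k for k in range(2, N + 2)]

N = 6
for m in factorial_desert(N):
    print(m, "is divisible by", smallest_factor(m))

edge = factorial(N + 1) + 1
print(edge, "prime?", smallest_factor(edge) == edge)
```

Trial division is exactly the slow check described above. It’s fine for N = 6 and hopeless once N is all that big.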

We don’t call it a “prime number” if it’s in a ring that isn’t enough like the numbers. Fair enough. We shift the name to “prime element”. “Element” is a good generic name for a thing whose identity we don’t mean to pin down too closely. I’ve talked about the Gaussian Primes already, in an earlier essay and earlier in this essay. We can make a ring out of the polynomials whose coefficients are all integers. In that, $x^2 + 1$ is a prime. So is $x^2 - 2$. If this hasn’t given you some ideas what other polynomials might be primes, then you have something else to ponder while trying to sleep. Thinking of all the prime polynomials is likely harder than you can do, though.

Prime numbers seem to stand out, obvious and important. Humans have known about prime numbers for as long as we’ve known about multiplication. And yet there is something obscure about them. If there are cultures completely independent of our own, do they have insights which make prime numbers not such occult figures? How different would the world be if we knew all the things we now wonder about primes?

## The Summer 2017 Mathematics A To Z: Open Set

Today’s glossary entry is another request from Elke Stangl, author of the Elkemental Force blog. I’m hoping this also turns out to be a well-received entry. Half of that is up to you, the kind reader. At least I hope you’re a reader. It’s already gone wrong, as it was supposed to be Friday’s entry. I discovered I hadn’t actually scheduled it while I was too far from my laptop to do anything about that mistake. This spoils the nice Monday-Wednesday-Friday routine of these glossary entries that dates back to the first one I ever posted and just means I have to quit forever and not show my face ever again. Sorry, Ulam Spiral. Someone else will have to think of you.

# Open Set.

Mathematics likes to present itself as being universal truths. And it is. At least if we allow that the rules of logic by which mathematics works are universal. Suppose them to be true and the rest follows. But we start out with intuition, with things we observe in the real world. We’re happy when we can remove the stuff that’s clearly based on idiosyncratic experience. We find something that’s got to be universal.

Sets are pretty abstract things, as mathematicians use the term. They get to be hard to talk about; we run out of simpler words that we can use. A set is … a bunch of things. The things are … stuff that could be in a set, or else that we’d rule out of a set. We can end up better understanding things by drawing a picture. We draw the universe, which is a rectangular block, sometimes with dashed lines as the edges. The set is some blotch drawn on the inside of it. Some shade it in to emphasize which stuff we want in the set. If we need to pick out a couple things in the universe we drop in dots or numerals. If we’re rigorous about the drawing we could create a Venn Diagram.

When we do this, we’re giving up on the pure mathematical abstraction of the set. We’re replacing it with a territory on a map. Several territories, if we have several sets. The territories can overlap or be completely separate. We’re subtly letting our sense of geography, our sense of the spaces in which we move, infiltrate our understanding of sets. That’s all right. It can give us useful ideas. Later on, we’ll try to separate out the ideas that are too bound to geography.

A set is open if whenever you’re in it, you can’t be on its boundary. We never quite have this in the real world, with territories. The border between, say, New Jersey and New York becomes this infinitesimally slender thing, as wide in space as midnight is in time. But we can, with some effort, imagine the state. Imagine being as tiny in every direction as the border between two states. Then we can imagine the difference between being on the border and being away from it.

And not being on the border matters. If we are not on the border we can imagine the problem of getting to the border. Pick any direction; we can move some distance while staying inside the set. It might be a lot of distance, it might be a tiny bit. But we stay inside however we might move. If we are on the border, then there’s some direction in which any movement, however small, drops us out of the set. That’s a difference in kind between a set that’s open and a set that isn’t.
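Here’s the one-dimensional version of that as a Python sketch. It assumes the set is the numbers strictly between two endpoints, and it uses ordinary distance. The name `wiggle_room` is mine.

```python
def wiggle_room(x, low, high):
    """How far x can move either way and stay strictly between low and high.

    Returns 0.0 if x sits on an endpoint (or outside the interval).
    """
    return max(0.0, min(x - low, high - x))

for x in (0.5, 0.999, 1.0):
    print(x, wiggle_room(x, 0.0, 1.0))
```

Any point strictly inside has some room, even if only a tiny bit. A point on the border has none.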

I say “a set that’s open and a set that isn’t”. There are such things as closed sets. A set doesn’t have to be either open or closed. It can be neither, a set that includes some of its borders but not other parts of it. It can even be both open and closed simultaneously. The whole universe, for example, is both an open and a closed set. The empty set, with nothing in it, is both open and closed. (This looks like a semantic trick. OK, if you’re in the empty set you’re not on its boundary. But you can’t be in the empty set. So what’s going on? … The usual. It makes other work easier if we call the empty set ‘open’. And the extra work we’d have to do to rule out the empty set doesn’t seem to get us anything interesting. So we accept what might be a trick.) The definitions of ‘open’ and ‘closed’ don’t exclude one another.

I’m not sure how this confusing state of affairs developed. My hunch is that the words ‘open’ and ‘closed’ evolved independent of each other. Why do I think this? An open set has its openness from, well, not containing its boundaries; from the inside there’s always a little more to it. A closed set has its closedness from sequences. That is, you can consider a string of points inside a set. Are these points leading somewhere? Is that point inside your set? If a string of points always leads to somewhere, and that somewhere is inside the set, then you have closure. You have a closed set. I’m not sure that the terms were derived with that much thought. But it does explain, at least in terms a mathematician might respect, why a set that isn’t open isn’t necessarily closed.

Back to open sets. What does it mean to not be on the boundary of the set? How do we know if we’re on it? We can define sets by all sorts of complicated rules: complex-valued numbers of size less than five, say. Rational numbers whose denominator (in lowest form) is no more than ten. Points in space from which a satellite dropped would crash into the moon rather than into the Earth or Sun. If we have an idea of distance we could measure how far it is from a point to the nearest part of the boundary. Do we need distance, though?

No, it turns out. We can get the idea of open sets without using distance. Introduce a neighborhood of a point. A neighborhood of a point is an open set that contains that point. It doesn’t have to be small, but that’s the connotation. And we get to thinking of little N-balls, circle or sphere-like constructs centered on the target point. It doesn’t have to be N-balls. But we think of them so much that we might as well say it’s necessary. If every point in a set has a neighborhood around it that’s also inside the set, then the set’s open.

You’re going to accuse me of begging the question. Fair enough. I was using open sets to define open sets. This use is all right for an intuitive idea of what makes a set open, but it’s not rigorous. We can give in and say we have to have distance. Then we have N-balls and we can build open sets out of balls that don’t contain the edges. Or we can try to drive distance out of our idea of open sets.

We can do it this way. Start off by saying the whole universe is an open set. Also that the union of any number of open sets is also an open set. And that the intersection of any finite number of open sets is also an open set. Does this sound weak? So it sounds weak. It’s enough. We get the open sets we were thinking of all along from this.
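On a universe with only a few points these rules can be checked mechanically. A minimal Python sketch, with a function name of my own. It also insists that the empty set be listed as open, matching the convention accepted earlier. And it leans on the universe being finite, so that checking sets two at a time covers every union and intersection.

```python
from itertools import combinations

def is_topology(universe, open_sets):
    """Check the essay's rules on a finite collection of would-be open sets."""
    universe = frozenset(universe)
    opens = {frozenset(s) for s in open_sets}
    if universe not in opens or frozenset() not in opens:
        return False
    # For a finite collection, checking pairs is enough: larger unions and
    # intersections are built up two sets at a time.
    for a, b in combinations(opens, 2):
        if a | b not in opens or a & b not in opens:
            return False
    return True

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))  # follows the rules
print(is_topology(X, [set(), {1}, {2}, X]))     # {1} union {2} is missing
```

Nothing in the check mentions distance. It’s all unions and intersections.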

This works for the sets that look like territories on a map. It also works for sets for which we have some idea of distance, however strange it is to our everyday distances. It even works if we don’t have any idea of distance. This lets us talk about topological spaces, and study what geometry looks like if we can’t tell how far apart two points are. We can, for example, at least tell that two points are different. Can we find a neighborhood of one that doesn’t contain the other? Then we know they’re some distance apart, even without knowing what distance is.

That we reached so abstract an idea of what an open set is without losing the idea’s usefulness suggests we’re doing well. So we are. It also shows why Nicolas Bourbaki, the famous nonexistent mathematician, thought set theory and its related ideas were the core of mathematics. Today category theory is a more popular candidate for the core of mathematics. But set theory is still close to the core, and much of analysis is about what we can know from the fact of sets being open. Open sets let us explain a lot.

## The Summer 2017 Mathematics A To Z: N-Sphere/N-Ball

Today’s glossary entry is a request from Elke Stangl, author of the Elkemental Force blog, which among other things has made me realize how much interesting there is to say about heat pumps. Well, you never know what’s interesting before you give it serious thought.

# N-Sphere/N-Ball.

I’ll start with space. Mathematics uses a lot of spaces. They’re inspired by geometry, by the thing that fills up our room. Sometimes we make them different by simplifying them, by thinking of the surface of a table, or what geometry looks like along a thread. Sometimes we make them bigger, imagining a space with more directions than we have. Sometimes we make them very abstract. We realize that we can think of polynomials, or functions, or shapes as if they were points in space. We can describe things that work like distance and direction and angle that work for these more abstract things.

What are useful things we know about space? Many things. Whole books full of things. Let me pick one of them. Start with a point. Suppose we have a sense of distance, of how far one thing is from another. Then we can have an idea of the neighborhood. We can talk about some chunk of space that’s near our starting point.

So let’s agree on a space, and on some point in that space. You give me a distance. I give back to you — well, two obvious choices. One of them is all the points in that space that are exactly that distance from our agreed-on point. We know what this is, at least in the two kinds of space we grow up comfortable with. In three-dimensional space, this is a sphere. A shell, at least, centered around whatever that first point was. In two-dimensional space, on our desktop, it’s a circle. We know it can look a little weird: if we started out in a one-dimensional space, there’d be only two points, one on either side of the original center point. But it won’t look too weird. Imagine a four-dimensional space. Then we can speak of a hypersphere. And we can imagine that as being somehow a ball that’s extremely spherical. Maybe it pokes out of the rendering we try making of it, like a cartoon character falling out of the movie screen. We can imagine a five-dimensional space, or a ten-dimensional one, or something with even more dimensions. And we can conclude there’s a sphere for even that much space. Well, let it.

What are spheres good for? Well, they’re nice familiar shapes. Even if they’re in a weird number of dimensions. They’re useful, too. A lot of what we do in calculus, and in analysis, is about dealing with difficult points. Points where a function is discontinuous. Points where the function doesn’t have a value. One of calculus’s reliable tricks, though, is that we can swap information about the edge of things for information about the interior. We can replace a point with a sphere and find our work is easier.

The other thing I could give you. It’s a ball. That’s all the points that aren’t more than your distance away from our point. It’s the inside, the whole planet rather than just the surface of the Earth.

And here’s an ambiguity. Is the surface a part of the ball? Should we include the edge, or do we just want the inside? And that depends on what we want to do. Either might be right. If we don’t need the edge, then we have an open set (stick around for Friday). This gives us the open ball. If we do need the edge, then we have a closed set, and so, the closed ball.
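The sphere, the open ball, and the closed ball differ only in how they treat points at exactly the given distance. A small Python sketch, assuming ordinary straight-line distance (`math.dist`, which needs Python 3.8 or later). The function name `classify` is mine, and so is the use of a floating-point tolerance for “exactly”.

```python
from math import dist, isclose

def classify(point, center, radius):
    """Say where `point` sits relative to the sphere of `radius` about `center`."""
    d = dist(point, center)   # ordinary straight-line distance
    if isclose(d, radius):
        return "on the sphere"   # in the closed ball, not in the open ball
    if d < radius:
        return "inside"          # in both the open and the closed ball
    return "outside"             # in neither ball

print(classify((3, 4), (0, 0), 5))
print(classify((1, 1), (0, 0), 5))
print(classify((1, 2, 2), (0, 0, 0), 3))  # works in three dimensions too
```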

Balls are so useful. Take a chunk of space that you find interesting for whatever reason. We can represent that space as the joining together (the “union”) of a bunch of balls. Probably not all the same size, but that’s all right. We might need infinitely many of these balls to get the chunk precisely right, or as close to right as can be. But that’s all right. We can still do it. Most anything we want to analyze is easier to prove on any one of these balls. And since we can describe the complicated shape as this combination of balls, then we can know things about the whole complicated shape. It’s much the way we can know things about polygons by breaking them into triangles, and showing things are true about triangles.

Sphere or ball, whatever you like. We can describe how many dimensions of space the thing occupies with the prefix. The 3-ball is everything close enough to a point that’s in a three-dimensional space. The 2-ball is everything close enough in a two-dimensional space. The 10-ball is everything close enough to a point in a ten-dimensional space. The 3-sphere is … oh, all right. Here we have a little squabble. People doing geometry prefer this to be the sphere in three dimensions. People doing topology prefer this to be the sphere whose surface has three dimensions, that is, the sphere in four dimensions. Usually which you mean will be clear from context: are you reading a geometry or a topology paper? If you’re not sure, oh, look for anything hinting at the number of spatial dimensions. If nothing gives you a hint maybe it doesn’t matter.

Either way, we do want to talk about the family of shapes without committing ourselves to any particular number of dimensions. And so that’s why we fall back on ‘N’. ‘N’ is a good name for “the number of dimensions we’re working in”, and so we use it. Then we have the N-sphere and the N-ball, a sphere-like shape, or a ball-like shape, that’s in however much space we need for the problem.

I mentioned something early on that I bet you paid no attention to. That was that we need a space, and a point inside the space, and some idea of distance. One of the surprising things mathematics teaches us about distance is … there’s a lot of ideas of distance out there. We have what I’ll call an instinctive idea of distance. It’s the one that matches what holding a ruler up to stuff tells us. But we don’t have to have that.

I sense the grumbling already. Yes, sure, we can define distance by some screwball idea, but do we ever need it? To which the mathematician answers, well, what if you’re trying to figure out how far away something in midtown Manhattan is? Where you can only walk along streets or avenues and we pretend Broadway doesn’t exist? Huh? How about that? Oh, fine, the skeptic might answer. Grant that there can be weird cases where the straight-line ruler distance is less enlightening than some other scheme is.

Well, there are. There exists a whole universe of different ideas of distance. There’s a handful of useful ones. The ordinary straight-line ruler one, the Euclidean distance, you get in a method so familiar it’s worth saying what you do. You find the coordinates of your two given points. Take the pairs of corresponding coordinates: the x-coordinates of the two points, the y-coordinates of the two points, the z-coordinates, and so on. Find the differences between corresponding coordinates. Take the absolute value of those differences. Square all those absolute-value differences. Add up all those squares. Take the square root of that. Fine enough.

There’s a lot of novelty acts. For example, do that same thing, only instead of raising the differences to the second power, raise them to the 26th power. When you get the sum, instead of the square root, take the 26th root. There. That’s a legitimate distance. No, you will never need this, but your analysis professor might give you it as a homework problem sometime.

Some are useful, though. Raising to the first power, and then eventually taking the first root, gives us something useful. Yes, raising to a first power and taking a first root isn’t doing anything. We just say we’re doing that for the sake of consistency. Raising to an infinitely large power, and then taking an infinitely great root, inspires angry glares. But we can make that idea rigorous. When we do it gives us something useful.

And here’s a new, amazing thing. We can still make “spheres” for these other distances. On a two-dimensional space, the “sphere” with this first-power-based distance will look like a diamond. The “sphere” with this infinite-power-based distance will look like a square. On a three-dimensional space the “sphere” with the first-power-based distance looks like a … well, more complicated, three-dimensional diamond. The “sphere” with the infinite-power-based distance looks like a box. The “balls” in all these cases look like what you expect from knowing the spheres.
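The whole family of distances fits in one short function. This is a Python sketch, not anything official; the name `p_distance` is mine. It treats the “infinite power” case as the largest coordinate difference, which is what the rigorous version works out to.

```python
def p_distance(a, b, p):
    """Distance between points a and b using the p-th power recipe.

    p = 2 is the ruler (Euclidean) distance, p = 1 the street-grid one,
    and p = float("inf") the limiting "infinite power" case.
    """
    gaps = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(gaps)   # what the p-th root of the sum tends to as p grows
    return sum(g ** p for g in gaps) ** (1 / p)

origin = (0, 0)
for point in [(1, 0), (0.5, 0.5), (1, 1)]:
    print(point, [p_distance(origin, point, p) for p in (1, 2, 26, float("inf"))])
```

The point (0.5, 0.5) is at distance 1 from the origin only for the first-power distance; it’s the middle of one edge of the diamond. The point (1, 1) is at distance 1 only for the infinite-power distance; it’s a corner of the square.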

As with the ordinary ideas of spheres and balls these shapes let us understand space. Spheres offer a natural path to understanding difficult points. Balls offer a natural path to understanding complicated shapes. The different ideas of distance change how we represent these, and how complicated they are, but not the fact that we can do it. And it allows us to start thinking of what spheres and balls for more abstract spaces, universes made of polynomials or formed of trig functions, might be. They’re difficult to visualize. But we have the grammar that lets us speak about them now.

And for a postscript: I also wrote about spheres and balls as part of my Set Tour a couple years ago. Here’s the essay about the N-sphere, although I didn’t exactly call it that. And here’s the essay about the N-ball, again not quite called that.

## The Summer 2017 Mathematics A To Z: Morse Theory

Today’s A To Z entry is a change of pace. It dives deeper into analysis than this round has been. The term comes from Mr Wu, of the Singapore Maths Tuition blog, whom I thank for the request.

# Morse Theory.

An old joke, as most of my academia-related ones are. The young scholar says to his teacher how amazing it was in the old days, when people were foolish, and thought the Sun and the Stars moved around the Earth. How fortunate we are to know better. The elder says, ah yes, but what would it look like if it were the other way around?

There are many things to ponder packed into that joke. For one, the elder scholar’s awareness that our ancestors were no less smart or perceptive or clever than we are. For another, the awareness that there is a problem. We want to know about the universe. But we can only know what we perceive now, where we are at this moment. Even a note we’ve written in the past, or a message from a trusted friend, we can’t take uncritically. What we know is that we perceive this information in this way, now. When we pay attention to our friends in the philosophy department we learn that knowledge is even harder than we imagine. But I’ll stop there. The problem is hard enough already.

We can put it in a mathematical form, one that seems immune to many of the worst problems of knowledge. In this form it looks something like this: what can we know about the universe, if all we really know is what things in that universe are doing near us? The things that we look at are functions. The universe we’re hoping to understand is the domain of the functions. One filter we use to see the universe is Morse Theory.

We don’t look at every possible function. Functions are too varied and weird for that. We look at functions whose range is the real numbers. And they must be smooth. This is a term of art. It means the function has derivatives. It has to be continuous. It can’t have sharp corners. And it has to have lots of derivatives. The first derivative of a smooth function has to also be continuous, and has to also lack corners. And the derivative of that first derivative has to be continuous, and to lack corners. And the derivative of that derivative has to be the same. A smooth function we can differentiate over and over again, infinitely many times. None of those derivatives can have corners or jumps or missing patches or anything. This is what makes it smooth.

Most functions are not smooth, in much the same way most shapes are not circles. That’s all right. There are many smooth functions anyway, and they describe things we find interesting. Or we think they’re interesting, anyway. Smooth functions are easy for us to work with, and to know things about. There’s plenty of smooth functions. If you’re interested in something else there’s probably a smooth function that’s close enough for practical use.

Morse Theory builds on the “critical points” of these smooth functions. A critical point, in this context, is one where the derivative is zero. Derivatives being zero usually signal something interesting going on. Often they show where the function changes behavior. In freshman calculus they signal where a function changes from increasing to decreasing, so the critical point is a maximum. In physics they show where a moving body no longer has an acceleration, so the critical point is an equilibrium. Or where a system changes from one kind of behavior to another. And here — well, many things can happen.

So take a smooth function. And take a critical point that it’s got. (And, erg. Technical point. The derivative of your smooth function, at that critical point, shouldn’t be having its own critical point going on at the same spot. That makes stuff more complicated.) It’s possible to approximate your smooth function near that critical point with, of course, a polynomial. It’s always polynomials. The shape of these polynomials gives you an index for these points. And that can tell you something about the shape of the domain you’re on.

At least, it tells you something about what the shape is where you are. The universal model for this — based on skimming texts and papers and popularizations of this — is of a torus standing vertically. Like a doughnut that hasn’t tipped over, or like a tire on a car that’s working as normal. I suspect this is the best shape to use for teaching, as anyone can understand it while it still shows the different behaviors. I won’t resist.

Imagine slicing this tire horizontally. Slice it close to the bottom, below the central hole, and the part that drops down is a disc. At least, it could be flattened out tolerably well to a disc.

Slice it somewhere that intersects the hole, though, and you have a different shape. You can’t squash that down to a disc. You have a noodle shape. A cylinder at least. That’s different from what you got the first slice.

Slice the tire somewhere higher. Somewhere above the central hole, and you have … well, it’s still a tire. It’s got a hole in it, but you could imagine patching it and driving on. There’s another different shape that we’ve gotten from this.

Imagine we were confined to the surface of the tire, but did not know what surface it was. That we start at the lowest point on the tire and ascend it. From the way the smooth functions around us change we can tell how the surface we’re on has changed. We can see its change from “basically a disc” to “basically a noodle” to “basically a doughnut”. We could work out what the surface we’re on has to be, thanks to how these smooth functions around us change behavior.
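The tire can be put into numbers. What follows is a Python sketch under assumptions of my own: the radii, and a particular way of giving coordinates to the torus (two angles, `theta` around the tube and `phi` around the wheel). The smooth function is the height of each point. The “index” is the usual one, the number of independent directions in which the height curves downward at a critical point.

```python
from math import cos, pi

R, r = 2.0, 1.0   # assumed radii: wheel radius R and tube radius r, with R > r

def height(theta, phi):
    """Height of a point on a torus standing on edge."""
    return (R + r * cos(theta)) * cos(phi)

def index_at(theta, phi):
    """Count downhill directions at a critical point.

    At the four critical points the mixed second derivative vanishes, so
    the index is the number of negative pure second derivatives.
    """
    h_tt = -r * cos(theta) * cos(phi)          # second derivative in theta
    h_pp = -(R + r * cos(theta)) * cos(phi)    # second derivative in phi
    return sum(1 for second in (h_tt, h_pp) if second < 0)

# The four places where both first derivatives of the height are zero.
critical_points = [(0, pi), (pi, pi), (pi, 0), (0, 0)]
for theta, phi in sorted(critical_points, key=lambda tp: height(*tp)):
    print(f"height {height(theta, phi):+.1f}  index {index_at(theta, phi)}")
```

Climbing from the bottom, the indices run 0, 1, 1, 2. The first is the very bottom of the tire. The two 1’s are the bottom and the top of the central hole, where the slices change from disc to noodle and from noodle to tire-with-a-hole. The last is the very top.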

Occasionally we mathematical-physics types want to act as though we’re not afraid of our friends in the philosophy department. So we deploy the second thing we know about Immanuel Kant. He observed that knowing the force of gravity falls off as the square of the distance between two things implies that the things should exist in a three-dimensional space. (Source: I dunno, I never read his paper or book or whatever and dunno I ever heard anyone say they did.) It’s a good observation. Geometry tells us what physics can happen, but what physics does happen tells us what geometry they happen in. And it tells the philosophy department that we’ve heard of Immanuel Kant. This impresses them greatly, we tell ourselves.

Morse Theory is a manifestation of how observable physics teaches us the geometry they happen on. And in an urgent way, too. Some of Edward Witten’s pioneering work in superstring theory was in bringing Morse Theory to quantum field theory. He showed a set of problems called the Morse Inequalities gave us insight into supersymmetric quantum mechanics. The link between physics and doughnut-shapes may seem vague. This is because you’re not remembering that mathematical physics sees “stuff happening” as curves drawn on shapes which represent the kind of problem you’re interested in. Learning what the shapes representing the problem look like is solving the problem.

If you’re interested in the substance of this, the universally-agreed reference is J Milnor’s 1963 text Morse Theory. I confess it’s hard going to read, because it’s a symbols-heavy textbook written before the existence of LaTeX. Each page reminds one why typesetters used to get hazard pay, and not enough of it.

## The Summer 2017 Mathematics A To Z: L-function

I’m brought back to elliptic curves today thanks to another request from Gaurish, of the For The Love Of Mathematics blog. Interested in how that’s going to work out? Me too.

So stop me if you’ve heard this one before. We’re going to make something interesting. You bring to it a complex-valued number. Anything you like. Let me call it ‘s’ for the sake of convenience. I know, it’s weird not to call it ‘z’, but that’s how this field of mathematics developed. I’m going to make a series built on this. A series is the sum of all the terms in a sequence. I know, it seems weird for a ‘series’ to be a single number, but that’s how that field of mathematics developed. The underlying sequence? I’ll make it in three steps. First, I start with all the counting numbers: 1, 2, 3, 4, 5, and so on. Second, I take each one of those terms and raise them to the power of your ‘s’. Third, I take the reciprocal of each of them. That’s the sequence. And when we add —

Yes, that’s right, it’s the Riemann-Zeta Function. The one behind the Riemann Hypothesis. That’s the mathematical conjecture that everybody loves to cite as the biggest unsolved problem in mathematics now that we know someone did something about Fermat’s Last Theorem. The conjecture is about what the zeroes of this function are. What values of ‘s’ make this sum equal to zero? Some boring ones. Negative two, negative four, negative six, and so on. It has a lot of non-boring zeroes. All the ones we know of have an ‘s’ with a real part of ½. So far we know of at least 36 billion values of ‘s’ that make this add up to zero. They’re all ½ plus some imaginary number. We conjecture that this isn’t coincidence and all the non-boring zeroes are like that. We might be wrong. But it’s the way I would bet.

Anyone who’d be reading this far into a pop mathematics blog knows something of why the Riemann Hypothesis is interesting. It carries implications about prime numbers. It tells us things about a host of other theorems that are nice to have. Also they know it’s hard to prove. Really, really hard.

Ancient mathematical lore tells us there are a couple ways to solve a really, really hard problem. One is to narrow its focus. Try to find as simple a case of it as you can solve. Maybe a second simple case you can solve. Maybe a third. This could show you how, roughly, to solve the general problem. Not always. Individual cases of Fermat’s Last Theorem are easy enough to solve. You can show that $a^3 + b^3 = c^3$ doesn’t have any non-boring answers where a, b, and c are all positive whole numbers. Same with $a^5 + b^5 = c^5$, though it takes longer. That doesn’t help you with the general $a^n + b^n = c^n$.

There’s another approach. It sounds like the sort of crazy thing Captain Kirk would get away with. It’s to generalize, to make a bigger, even more abstract problem. Sometimes that makes it easier.

For the Riemann-Zeta Function there’s one compelling generalization. It fits into that sequence I described making. After taking the reciprocals of integers-raised-to-the-s-power, multiply each by some number. Which number? Well, that depends on what you like. It could be the same number every time, if you like. That’s boring, though. That’s just the Riemann-Zeta Function times your number. It’s more interesting if what number you multiply by depends on which integer you started with. (Do not let it depend on ‘s’; that’s more complicated than you want.) When you do that? Then you’ve created an L-Function.

Specifically, you’ve created a Dirichlet L-Function. Dirichlet here is Peter Gustav Lejeune Dirichlet, a 19th century German mathematician who got his name on like everything. He did major work on partial differential equations, on Fourier series, on topology, in algebra, and on number theory, which is what we’d call these L-functions. There are other L-Functions, with identifying names such as Artin and Hecke and Euler, which get more directly into group theory. They look much like the Dirichlet L-Function. In building the sequence I described in the top paragraph, they do something else for the second step.

The L-Function is going to look like this:

$L(s) = \sum_{n \ge 1}^{\infty} a_n \cdot \frac{1}{n^s}$

The sigma there means to evaluate the thing that comes after it for each value of ‘n’ starting at 1 and increasing, by 1, up to … well, something infinitely large, and to add all of those together. The $a_n$ are the numbers you’ve picked. They’re values that depend on the index ‘n’, but don’t depend on the power ‘s’. This may look funny but it’s a standard way of writing the terms in a sequence.
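Here’s a minimal sketch of that sum in Python, stopping after finitely many terms because a computer has to stop somewhere. The function names and the two choices of $a_n$ are mine, picked for illustration. The first choice, every $a_n$ equal to 1, is just the Riemann-Zeta Function again. The second, the repeating pattern 1, 0, -1, 0, is one simple example of the number depending on which integer you started with.

```python
import math

def dirichlet_partial_sum(a, s, terms):
    """Add up a(n) / n**s for n = 1, 2, ..., terms."""
    return sum(a(n) / n**s for n in range(1, terms + 1))

def all_ones(n):
    """Every a_n is 1: this gives back the Riemann-Zeta Function."""
    return 1

def period_four(n):
    """An illustrative pattern: 1, 0, -1, 0, 1, 0, -1, 0, ..."""
    return (0, 1, 0, -1)[n % 4]

# With s = 2 and all ones, the full sum is pi^2 / 6.
zeta_2 = dirichlet_partial_sum(all_ones, 2, 100_000)
print(zeta_2, math.pi**2 / 6)

# With s = 1 and the period-four pattern, the full sum is pi / 4.
l_1 = dirichlet_partial_sum(period_four, 1, 100_000)
print(l_1, math.pi / 4)
```

The partial sums only creep toward their limits, so the printed pairs agree to a few decimal places and no better. Taking more terms helps, slowly.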

An L-Function has to meet some particular criteria that I’m not going to worry about here. Look them up before you get too far into your research. These criteria give us ways to classify different L-Functions, though. We can describe them by degree, much as we describe polynomials. We can describe them by signature, part of those criteria I’m not getting into. We can describe them by properties of the extra numbers, the ones in that fourth step that you multiply the reciprocals by. And so on. LMFDB, an encyclopedia of L-Functions, lists eight or nine properties usable for a taxonomy of these things. (The ambiguity is in what things you consider to depend on what other things.)

What makes this interesting? For one, everything that makes the Riemann Hypothesis interesting. The Riemann-Zeta Function is a slice of the L-Functions. But there’s more. They merge into elliptic curves. Every elliptic curve corresponds to some L-Function. We can use the elliptic curve or the L-Function to prove what we wish to show. Elliptic curves are subject to group theory; so, we can bring group theory into these series.

And then it gets deeper. It always does. Go back to that formula for the L-Function, the one I put in mathematical symbols. I’m going to define a new function. It’s going to look a lot like a polynomial. Well, that L(s) already looked a lot like a polynomial, but this is going to look even more like one.

Pick a number τ. It’s complex-valued. Any number. All that I care is that its imaginary part be positive. In the trade we say that’s “in the upper half-plane”, because we often draw complex-valued numbers as points on a plane. The real part serves as the horizontal and the imaginary part serves as the vertical axis.

Now go back to your L-Function. Remember those $a_n$ numbers you picked? Good. I’m going to define a new function based on them. It looks like this:

$f(\tau) = \sum_{n \ge 1}^{\infty} a_n \left( e^{2 \pi \imath \tau}\right)^n$

You see what I mean about looking like a polynomial? If τ is a complex-valued number, then $e^{2 \pi \imath \tau}$ is just another complex-valued number. If we gave that a new name like ‘z’, this function would look like the sum of constants times z raised to positive powers. We’d never know it was any kind of weird polynomial.
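A rough sketch of that renaming, in Python. The names `q_of` and `f_partial` and the sample value of τ are made up for the example. It cuts the sum off after finitely many terms. The point to notice is that when τ is in the upper half-plane the new number has size less than 1, so its powers shrink and the late terms barely matter.

```python
import cmath

def q_of(tau):
    """The 'z' of the paragraph above: e^(2 pi i tau)."""
    return cmath.exp(2j * cmath.pi * tau)

def f_partial(a, tau, terms):
    """Add up a(n) * q**n for n = 1, ..., terms, where q = q_of(tau)."""
    q = q_of(tau)
    return sum(a(n) * q**n for n in range(1, terms + 1))

tau = 0.3 + 0.5j     # imaginary part positive: the upper half-plane
q = q_of(tau)
print(abs(q))        # smaller than 1, so the powers of q shrink

# Illustration only: with every a_n equal to 1 the sum is a geometric
# series, so it should land very near q / (1 - q).
print(f_partial(lambda n: 1, tau, 50), q / (1 - q))
```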

Anyway. This new function ‘f(τ)’ has some properties. It might be something called a weight-2 Hecke eigenform, a thing I am not going to explain without charging someone by the hour. But see the logic here: every elliptic curve matches with some kind of L-Function. Each L-Function matches with some ‘f(τ)’ kind of function. Those functions might or might not be these weight-2 Hecke eigenforms.

So here’s the thing. There was a big hypothesis formed in the 1950s that every rational elliptic curve matches to one of these ‘f(τ)’ functions that’s one of these eigenforms. It’s true. It took decades to prove. You may have heard of it, as the Taniyama-Shimura Conjecture. In the 1990s Wiles and Taylor proved this was true for a lot of elliptic curves, which is what proved Fermat’s Last Theorem after all that time. The rest of it was proved around 2000.

As I said, sometimes you have to make your problem bigger and harder to get something interesting out of it.

I mentioned this above. LMFDB is a fascinating site worth looking at. It’s got a lot of L-Function and Riemann-Zeta function-related materials.

## The Summer 2017 Mathematics A To Z: Jordan Canonical Form

I made a mistake! I thought we had got to the end of the block of A To Z topics suggested by Gaurish, of the For The Love Of Mathematics blog. Not so and, indeed, I wonder if it wouldn’t be a viable writing strategy around here for me to just ask Gaurish to throw out topics and I have two weeks to write about them. I don’t think there’s a single unpromising one in the set.

# Jordan Canonical Form.

Before you ask, yes, this is named for the Camille Jordan.

So this is a thing from algebra. Particularly, linear algebra. And more particularly, matrices. Matrices are so much of linear algebra that you could be forgiven thinking they’re all of linear algebra. The thing is, matrices are a really good way of describing linear transformations. That is, where you take a block of space and stretch it out, or squash it down, or rotate it, or do some combination of these things. And stretching and squashing and rotating is a lot of what you’d ever want to do. Refer to any book on how to draw animated cartoons. The only thing matrices can’t do is have their eyes bug out huge when an attractive region of space walks past.

Thing about a matrix is if you want to do something with it, you’re going to write it as a grid of numbers. It doesn’t have to be a grid of numbers. But about all the matrices anyone does anything with are grids of numbers. And that’s fine. They do an incredible lot of stuff. What’s not fine is that on looking at a huge block of numbers, the mind sees: huh. That’s a big block of numbers. Good luck finding what’s meaningful in them. To help find meaning we have a set of standard forms. We call them “canonical” or “normal” or some other approving term. They rearrange and change the terms in the matrix so that more interesting stuff is more obvious.

Now you’re justified asking: how can we rearrange and change the terms in a matrix without changing what the matrix is? We can get away with doing this because we can show some rearrangements don’t change what we’re interested in. That covers the “how dare we” part of “how”. We do it by using matrix multiplication. You might remember from high school algebra that matrix multiplication is this agonizing process of multiplying every pair of numbers that ever existed together, then adding them all up, and then maybe you multiply something by minus one because you’re thinking of determinants, and it all comes out wrong anyway and you have to do it over? Yeah. Well, matrix multiplication is defined hard because it makes stuff like this work out. So that covers the “by what technique” part of “how”. We start out with some matrix, let me imaginatively name it $A$. And then we find some transformation matrix for which, eh, let’s say $P$ is a good enough name. I’ll say why in a moment. Then we use that matrix and its multiplicative inverse $P^{-1}$. And we evaluate the product $P^{-1} A P$. This won’t just be the same old matrix we started with. Not usually. Promise. But what this will be, if we chose our matrix $P$ correctly, is some new matrix that’s easier to read.
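To make that product less abstract, here is a small sketch in plain Python. The matrices are ones I made up so the arithmetic stays in whole numbers, and `matmul` is a hypothetical helper, not anything from a library.

```python
def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[3, 1],
     [-1, 1]]
P = [[1, 1],
     [-1, 0]]
P_inv = [[0, -1],
         [1, 1]]

# Check that P_inv really is the multiplicative inverse of P.
assert matmul(P_inv, P) == [[1, 0], [0, 1]]

easier = matmul(matmul(P_inv, A), P)
print(easier)   # [[2, 1], [0, 2]]
```

The result is not the matrix we started with, but it has picked up a zero and put a repeated number down the diagonal. Finding a good $P$ is the hard part, and this sketch skips it entirely by handing you one.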

The matrices involved here have to follow some rules. Most important, they’re all going to be square matrices. There’ll be more rules that your linear algebra textbook will tell you. Or your instructor will, after checking the textbook.

So what makes a matrix easy to read? Zeroes. Lots and lots of zeroes. When we have a standardized form of a matrix it’s nearly all zeroes. This is for a good reason: zeroes are easy to multiply stuff by. And they’re easy to add stuff to. And almost everything we do with matrices, as a calculation, is a lot of multiplication and addition of the numbers in the matrix.

What also makes a matrix easy to read? Everything important being on the diagonal. The diagonal is one of the two things you would imagine if you were told “here’s a grid of numbers, pick out the diagonal”. In particular it’s the one that goes from the upper left to the bottom right, that is, row one column one, and row two column two, and row three column three, and so on up to row 86 column 86 (or whatever). If everything is on the diagonal the matrix is incredibly easy to work with. If it can’t all be on the diagonal at least everything should be close to it. As close as possible.

In the Jordan Canonical Form not everything is on the diagonal. I mean, it can be, but you shouldn’t count on that. But everything either will be on the diagonal or else it’ll be one row up from the diagonal. That is, row one column two, row two column three, row 85 column 86. Like that. There’s a few other important pieces.

First is the thing in the row above the diagonal will be either 1 or 0. Second is that on the diagonal you’ll have a sequence of all the same number. Like, you’ll get four instances of the number ‘2’ along this string of the diagonal. Third is that you’ll get a 1 in the row above every instance of this particular number except the first. Fourth is that you’ll get a 0 in the row above the first instance of this number.

Yeah, that’s fussy to visualize. This is one of those things easiest to show in a picture. A Jordan canonical form is a matrix that looks like this:

 2 1 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 -2 1 0 0 0 0 0 0 0 0 0 0 0 -2

This may have you dazzled. It dazzles mathematicians too. When we have to write a matrix that’s almost all zeroes like this we drop nearly all the zeroes. If we have to write anything we just write a really huge 0 in the upper-right and the lower-left corners.

What makes this the Jordan Canonical Form is that the matrix looks like it’s put together from what we call Jordan Blocks. Look around the diagonals. Here’s the first Jordan Block:

$\left[ \begin{array}{cccc}
2 & 1 & 0 & 0 \\
0 & 2 & 1 & 0 \\
0 & 0 & 2 & 1 \\
0 & 0 & 0 & 2
\end{array} \right]$

Here’s the second:

$\left[ \begin{array}{cc}
3 & 1 \\
0 & 3
\end{array} \right]$

Here’s the third:

$\left[ \begin{array}{ccc}
4 & 1 & 0 \\
0 & 4 & 1 \\
0 & 0 & 4
\end{array} \right]$

Here’s the fourth:

$\left[ \begin{array}{c} -1 \end{array} \right]$

And here’s the fifth:

$\left[ \begin{array}{cc}
-2 & 1 \\
0 & -2
\end{array} \right]$

And we can represent the whole matrix as this might-as-well-be-diagonal thing:

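$\left[ \begin{array}{ccccc}
\mbox{First Block} & 0 & 0 & 0 & 0 \\
0 & \mbox{Second Block} & 0 & 0 & 0 \\
0 & 0 & \mbox{Third Block} & 0 & 0 \\
0 & 0 & 0 & \mbox{Fourth Block} & 0 \\
0 & 0 & 0 & 0 & \mbox{Fifth Block}
\end{array} \right]$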

These blocks can be as small as a single number. They can be as big as however many rows and columns you like. Each individual block is some repeated number on the diagonal, and a repeated 1 in the row above the diagonal. You can call this the “superdiagonal”.
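If it helps to see the assembly done mechanically, here is a short sketch in plain Python. The helper names `jordan_block` and `block_diagonal` are my own inventions. It builds each block from its repeated number and its size, then lays the blocks along the diagonal of a grid of zeroes, using the same five blocks as the example matrix.

```python
def jordan_block(value, size):
    """A size-by-size block: `value` on the diagonal, 1 on the superdiagonal."""
    return [[value if j == i else 1 if j == i + 1 else 0
             for j in range(size)] for i in range(size)]

def block_diagonal(blocks):
    """Place square blocks along the diagonal and fill the rest with zeroes."""
    n = sum(len(b) for b in blocks)
    big = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                big[offset + i][offset + j] = entry
        offset += len(b)
    return big

blocks = [jordan_block(2, 4), jordan_block(3, 2), jordan_block(4, 3),
          jordan_block(-1, 1), jordan_block(-2, 2)]
J = block_diagonal(blocks)
for row in J:
    print(" ".join(f"{entry:2d}" for entry in row))
```

Shuffling the order of the list `blocks` gives a different matrix that is built from the same five blocks.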

(Mathworld, and Wikipedia, assert that sometimes the row below the diagonal — the “subdiagonal” — gets the 1’s instead of the superdiagonal. That’s fine if you like it that way, and it won’t change any of the real work. I have not seen these subdiagonal 1’s in the wild. But I admit I don’t do a lot of this field and maybe there’s times it’s more convenient.)

Using the Jordan Canonical Form for a matrix is a lot like putting an object in a standard reference pose for photographing. This is a good metaphor. We get a Jordan Canonical Form by matrix multiplication, which works like rotating and scaling volumes of space. You can view the Jordan Canonical Form for a matrix as how you represent the original matrix from a new viewing angle that makes it easy to recognize. And this is why $P$ is not a bad name for the matrix that does this work. We can see all this as “projecting” the matrix we started with into a new frame of reference. The new frame is maybe rotated and stretched and squashed and whatnot, compared to how we started. But it’s as valid a base. Projecting a mathematical object from one frame of reference to another usually involves calculating something that looks like $P^{-1} A P$ so, projection. That’s our name.

Mathematicians will speak of “the” Jordan Canonical Form for a matrix as if there were such a thing. I don’t mean that Jordan Canonical Forms don’t exist. They exist just as much as matrices do. It’s the “the” that misleads. You can put the Jordan Blocks in any order and have as valid, and as useful, a Jordan Canonical Form. But it’s easy to swap the orders of these blocks around — it’s another matrix multiplication, and a blessedly easy one — so it doesn’t matter which form you have. Get any one and you have them all.

I haven’t said anything about what these numbers on the diagonal are. They’re the eigenvalues of the original matrix. I hope that clears things up.

Yeah, not to anyone who didn’t know what a Jordan Canonical Form was to start with. Rather than get into calculations let me go to well-established metaphor. Take a sample of an unknown chemical and set it on fire. Put the light from this through a prism and photograph the spectrum. There will be lines, interruptions in the progress of colors. The locations of those lines and how intense they are tell you what the chemical is made of, and in what proportions. These are much like the eigenvectors and eigenvalues of a matrix. The eigenvectors tell you what the matrix is made of, and the eigenvalues how much of the matrix is those. This stuff gets you very far in proving a lot of great stuff. And part of what makes the Jordan Canonical Form great is that you get the eigenvalues right there in neat order, right where anyone can see them.

So! All that’s left is finding the things. The best way to find the Jordan Canonical Form for a given matrix is to become an instructor for a class on linear algebra and assign it as homework. The second-best way is to give the problem to your TA, who will type it in to Mathematica and return the result. It’s too much work to do most of the time. Almost all the stuff you could learn from having the thing in the Jordan Canonical Form you work out in the process of finding the matrix $P$ that would let you calculate what the Jordan Canonical Form is. And once you had that, why go on?

Where the Jordan Canonical Form shines is in doing proofs about what matrices can do. We can always put a square matrix into a Jordan Canonical Form, at least if we allow complex-valued numbers in it. So if we want to show something is true about matrices in general, we can show that it’s true for the simpler-to-work-with Jordan Canonical Form. Then show that shifting a matrix to or from the Jordan Canonical Form doesn’t change whether the thing we’re interested in is true. It exists in that strange space: it is quite useful, but never on a specific problem.

Oh, all right. Yes, it’s the same Camille Jordan of the Jordan Curve and also of the Jordan Curve Theorem. That fellow.

## The Summer 2017 Mathematics A To Z: Integration

One more mathematics term suggested by Gaurish for the A-To-Z today, and then I’ll move on to a couple of others. Today’s is a good one.

# Integration.

Stand on the edge of a plot of land. Walk along its boundary. As you walk the edge pay attention. Note how far you walk before changing direction, even in the slightest. When you return to where you started consult your notes. Contained within them is the area you circumnavigated.

If that doesn’t startle you perhaps you haven’t thought about how odd that is. You don’t ever touch the interior of the region. You never do anything like see how many standard-size tiles would fit inside. You walk a path that is as close to one-dimensional as your feet allow. And encoded in there somewhere is an area. Stare at that incongruity and you realize why integrals baffle the student so. They have a deep strangeness embedded in them.
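One concrete version of this, for a plot with straight edges, is sometimes called the shoelace formula. The sketch below assumes your walking notes have already been turned into the coordinates of each corner, in the order you reached them. That conversion is my assumption, as is the function name `polygon_area`. Notice the loop only ever looks at the boundary.

```python
def polygon_area(corners):
    """Area enclosed by a walk through `corners`, given as (x, y) pairs in order."""
    total = 0
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]   # wrap around to the starting corner
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

# A hypothetical plot: 4 units wide, 3 units deep.
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))   # 12.0
```

Walking the plot in the opposite direction flips the sign of the running total, which is why the sketch takes an absolute value at the end.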

We who do mathematics have always liked integration. Integrals grow, in the western tradition, out of geometry. Given a shape, what is a square that has the same area? There are shapes it’s easy to find the area for, given only straightedge and compass: a rectangle? Easy. A triangle? Just as straightforward. A polygon? If you know triangles then you know polygons. A lune, the crescent-moon shape formed by taking a circular cut out of a circle? We can do that. (If the cut is the right size.) A circle? … All right, we can’t do that, but we spent two thousand years trying before we found that out for sure. And we can do some excellent approximations.

That bit of finding-a-square-with-the-same-area was called “quadrature”. The name survives, mostly in the phrase “numerical quadrature”. We use that to mean that we computed an integral’s approximate value, instead of finding a formula that would get it exactly. The otherwise obvious choice of “numerical integration” we use already. It describes computing the solution of a differential equation. We’re not trying to be difficult about this. Solving a differential equation is a kind of integration, and we need to do that a lot. We could recast a solving-a-differential-equation problem as a find-the-area problem, and vice-versa. But that’s bother, if we don’t need to, and so we talk about numerical quadrature and numerical integration.

Integrals are built on two infinities. This is part of why it took so long to work out their logic. One is the infinity of number; we find an integral’s value, in principle, by adding together infinitely many things. The other is an infinity of smallness. The things we add together are infinitesimally small. That we need to take things, each smaller than any number yet somehow not zero, and in such quantity that they add up to something, seems paradoxical. Their geometric origins had to be merged into that of arithmetic, of algebra, and it is not easy. Bishop George Berkeley made a steady name for himself in calculus textbooks by pointing this out. We have worked out several logically consistent schemes for evaluating integrals. They work, mostly, by showing that we can make the error caused by approximating the integral smaller than any margin we like. This is a standard trick, or at least it is, now that we know it.

That “in principle” above is important. We don’t actually work out an integral by finding the sum of infinitely many, infinitely tiny, things. It’s too hard. I remember in grad school the analysis professor working out by the proper definitions the integral of 1. This is as easy an integral as you can do without just integrating zero. He escaped with his life, but it was a close scrape. He offered the integral of x as a way to test our endurance, without actually doing it. I’ve never made it through that.

But we do integrals anyway. We have tools on our side. We can show, for example, that if a function obeys some common rules then we can use simpler formulas. Ones that don’t demand so many symbols in such tight formation. Ones that we can use in high school. Also, ones we can adapt to numerical computing, so that we can let machines give us answers which are near enough right. We get to choose how near is “near enough”. But then the machines decide how long we’ll have to wait to get that answer.
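A minimal sketch of letting the machine do it, in Python. This uses the midpoint rule, which is one simple numerical quadrature scheme among many. The names and the default tolerance are my own choices. It chops the interval into thin rectangles, adds up their areas, and keeps doubling the number of rectangles until the answer changes by less than whatever we’ve declared to be “near enough”.

```python
def midpoint_rule(f, a, b, pieces):
    """Approximate the integral of f from a to b using `pieces` thin rectangles."""
    width = (b - a) / pieces
    return width * sum(f(a + (i + 0.5) * width) for i in range(pieces))

def integrate(f, a, b, near_enough=1e-8, max_pieces=2**22):
    """Double the number of rectangles until the estimate stops changing much."""
    pieces = 1
    estimate = midpoint_rule(f, a, b, pieces)
    while pieces < max_pieces:
        pieces *= 2
        refined = midpoint_rule(f, a, b, pieces)
        if abs(refined - estimate) < near_enough:
            return refined
        estimate = refined
    return estimate

print(integrate(lambda x: 1, 0, 3))       # the integral of 1
print(integrate(lambda x: x, 0, 1))       # the integral of x
print(integrate(lambda x: x * x, 0, 1))   # close to 1/3
```

Stopping when two successive estimates agree is a rule of thumb and not a guarantee. A badly-behaved function can fool it. The cap `max_pieces` is there so the machine gives up eventually instead of making us wait forever.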

The greatest tool we have on our side is the Fundamental Theorem of Calculus. Even the name promises it’s the greatest tool we might have. This rule tells us how to connect integrating a function to differentiating another function. If we can find a function whose derivative is the thing we want to integrate, then we have a formula for the integral. It’s that function we found. What a fantastic result.
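A small worked case, offered as one illustration and not the general rule. The function $\frac{1}{3}x^3$ has derivative $x^2$. So if we want to integrate $x^2$ from ‘a’ to ‘b’ the theorem hands us

$\int_a^b x^2 \, dx = \frac{1}{3}b^3 - \frac{1}{3}a^3$

with no adding up of infinitely many infinitely tiny things required. From 0 to 3, for example, that’s 9.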

The trouble is it’s so hard to find functions whose derivatives are the thing we wanted to integrate. There are a lot of functions we can find, mind you. If we want to integrate a polynomial it’s easy. Sine and cosine and even tangent? Yeah. Logarithms? A little tedious but all right. A constant number raised to the power x? Also tedious but doable. A constant number raised to the power $x^2$? Hold on there, that’s madness. No, we can’t do that.

There is a weird grab-bag of functions we can find these integrals for. They’re mostly ones we can find some integration trick for. An integration trick is some way to turn the integral we’re interested in into a couple of integrals we can do and then mix back together. A lot of a Freshman Calculus course is a heap of tricks we’ve learned. They have names like “u-substitution” and “integration by parts” and “trigonometric substitution”. Some of them are really exotic, such as turning a single integral into a double integral because that leads us to something we can do. And there’s something called “differentiation under the integral sign” that I don’t know of anyone actually using. People know of it because Richard Feynman, in his fun memoir Surely You’re Joking, Mr. Feynman!: 250 Pages Of How Awesome I Was In Every Situation Ever, mentions how awesome it made him in so many situations. Mathematics, physics, and engineering nerds are required to read this at an impressionable age, so we fall in love with a technique no textbook ever mentions. Sorry.

I’ve written about all this as if we were interested just in areas. We’re not. We like calculating lengths and volumes and, if we dare venture into more dimensions, hypervolumes and the like. That’s all right. If we understand how to calculate areas, we have the tools we need. We can adapt them to as many or as few dimensions as we need. By weighting integrals we can do calculations that tell us about centers of mass and moments of inertia, about the most and least probable values of something, about all of quantum mechanics.

As often happens, this powerful tool starts with something anyone might ponder: what size square has the same area as this other shape? And then think seriously about it.

## The Summer 2017 Mathematics A To Z: Height Function (elliptic curves)

I am one letter closer to the end of Gaurish’s main block of requests. They’re all good ones, mind you. This gets me back into elliptic curves and Diophantine equations. I might be writing about the wrong thing.

# Height Function.

My love’s father has a habit of asking us to rate our hobbies. This turned into a new running joke over a family vacation this summer. It’s a simple joke: I shuffled the comparables. “Which is better, Bon Jovi or a roller coaster?” It’s still a good question.

But as genial yet nasty as the spoof is, my love’s father asks natural questions. We always want to compare things. When we form a mathematical construct we look for ways to measure it. There’s typically something. We’ll put one together. We call this a height function.

We start with an elliptic curve. The coordinates of the points on this curve satisfy some equation. Well, there are many equations they satisfy. We pick one representation for convenience. The convenient thing is to have an easy-to-calculate height. We’ll write the equation for the curve as

$y^2 = x^3 + Ax + B$

Here both ‘A’ and ‘B’ are some integers. This form might be unique, depending on whether a slightly fussy condition on prime numbers holds. (Specifically, if ‘p’ is a prime number and $p^4$ divides into ‘A’, then $p^6$ must not divide into ‘B’. Yes, I know you realized that right away. But I write to a general audience, some of whom are learning how to see these things.) Then the height of this curve is whichever is the larger number, four times the cube of the absolute value of ‘A’, or 27 times the square of ‘B’. I ask you to just run with it. I don’t know the implications of the height function well enough to say why, oh, 25 times the square of ‘B’ wouldn’t do as well. The usual reason for something like that is that some obvious manipulation makes the 27 appear right away, or disappear right away.
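That recipe is short enough to write out directly. A sketch in Python, with a function name of my own choosing and sample values of ‘A’ and ‘B’ picked only to show the arithmetic:

```python
def curve_height(A, B):
    """Height of y^2 = x^3 + A x + B: the larger of 4|A|^3 and 27 B^2."""
    return max(4 * abs(A) ** 3, 27 * B ** 2)

print(curve_height(-1, 0))   # 4
print(curve_height(-2, 3))   # 243
```

This does not check the fussy condition on prime numbers. It assumes you have handed it ‘A’ and ‘B’ already in the convenient form.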

This idea of height feeds in to a measure called rank. “Rank” is a term the young mathematician encounters first while learning matrices. It’s the number of rows in a matrix that aren’t equal to some sum or multiple of other rows. That is, it’s how many different things there are among a set. You can see why we might find that interesting. So many topics have something called “rank” and it measures how many different things there are in a set of things. In elliptic curves, the rank is a measure of how complicated the curve is. We can imagine the rational points on the elliptic curve as things generated by some small set of starter points. The starter points have to be of infinite order. Starter points that don’t, don’t count for the rank. Please don’t worry about what “infinite order” means here. I only mention this infinite-order business because if I don’t then something I have to say about two paragraphs from here will sound daft. So, the rank is how many of these starter points you need to generate the elliptic curve. (WARNING: Call them “generating points” or “generators” during your thesis defense.)

There’s no known way of guessing what the rank is if you just know ‘A’ and ‘B’. There are algorithms that can calculate the rank given a particular ‘A’ and ‘B’. But it’s not something like the quadratic formula where you can just do a quick calculation and know what you’re looking for. We don’t even know if the algorithms we have will work for every elliptic curve.

We think that there’s no limit to the rank of elliptic curves. We don’t know this. We know there exist curves with ranks as high as 28. They seem to be rare [*]. I don’t know if that’s proven. But we do know there are elliptic curves with rank zero. A lot of them, in fact. (See what I meant two paragraphs back?) These are the elliptic curves that have only finitely many rational points on them.

And there’s a lot of those. There’s a well-respected conjecture that the average rank, of all the elliptic curves there are, is ½. It might be. What we have been able to prove is that the average rank is less than or equal to 1.17. Also that it should be larger than zero. So we’re maybe closing in on the ½ conjecture? At least we know something. I admit that in writing this essay I’ve started wondering what we do know of elliptic curves.

What do the height, and through it the rank, get us? I worry I’m repeating myself. By themselves they give us families of elliptic curves. Shapes that are similar in a particular and not-always-obvious way. And they feed into the Birch and Swinnerton-Dyer conjecture, which is the hipster’s Riemann Hypothesis. That is, it’s this big, unanswered, important problem that would, if answered, tell us things about a lot of questions that I’m not sure can be concisely explained. At least not why they’re interesting. We know some special cases, at least. Wikipedia tells me nothing’s proved for curves with rank greater than 1. Humanity’s ignorance on this point makes me feel slightly better pondering what I don’t know about elliptic curves.

(There are some other things within the field of elliptic curves called height functions. There’s particularly a height of individual points. I was unsure which height Gaurish found interesting so chose one. The other starts by measuring something different; it views, for example, $\frac{1}{2}$ as having a lower height than does $\frac{51}{101}$, even though the numbers are quite close in value. It develops along similar lines, trying to find classes of curves with similar behavior. And it gets into different unsolved conjectures. We have our ideas about how to think of fields.)
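For the curious, a sketch of that other measurement in Python. I’m assuming the usual simple definition: write the fraction in lowest terms and take whichever is bigger, the size of the numerator or of the denominator. The essay above doesn’t spell this out, so treat the definition and the name `rational_height` as my assumptions.

```python
from fractions import Fraction

def rational_height(r):
    """Larger of |numerator| and |denominator|, with r in lowest terms."""
    r = Fraction(r)   # Fraction reduces to lowest terms automatically
    return max(abs(r.numerator), abs(r.denominator))

print(rational_height(Fraction(1, 2)))      # 2
print(rational_height(Fraction(51, 101)))   # 101
```

So the two numbers sit within about a hundredth of each other on the number line while their heights differ by a factor of fifty.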

[*] Wikipedia seems to suggest we only know of one, provided by Professor Noam Elkies in 2006, and let me quote it in full. I apologize that it isn’t in the format I suggested at top was standard. Elkies way outranks me academically so we have to do things his way:

$y^2 + xy + y = x^3 - x^2 - 20,067,762,415,575,526,585,033,208,209,338,542,750,930,230,312,178,956,502 x + 34,481,611,795,030,556,467,032,985,690,390,720,374,855,944,359,319,180,361,266,008,296,291,939,448,732,243,429$

I can’t figure how to get WordPress to present that larger. I sympathize. I’m tired just looking at an equation like that. This page lists records of known elliptic curve ranks. I don’t know if the lack of any records more recent than 2006 reflects the page not having been updated or nobody having found a rank-29 curve. I fully accept the field might be more difficult than even doing maintenance on a web page’s content is.

## The Summer 2017 Mathematics A To Z: Gaussian Primes

Once more do I have Gaurish to thank for the day’s topic. (There’ll be two more chances this week, providing I keep my writing just enough ahead of deadline.) This one doesn’t touch category theory or topology.

# Gaussian Primes.

I keep touching on group theory here. It’s a field that’s about what kinds of things can work like arithmetic does. A group is a set of things that you can add together. At least, you can do something that works like adding regular numbers together does. A ring is a set of things that you can add and multiply together.

There are many interesting rings. Here’s one. It’s called the Gaussian Integers. They’re made of numbers we can write as $a + b\imath$, where ‘a’ and ‘b’ are some integers. $\imath$ is what you figure, that number that multiplied by itself is -1. These aren’t the complex-valued numbers, you notice, because ‘a’ and ‘b’ are always integers. But you add them together the way you add complex-valued numbers together. That is, $a + b\imath$ plus $c + d\imath$ is the number $(a + c) + (b + d)\imath$. And you multiply them the way you multiply complex-valued numbers together. That is, $a + b\imath$ times $c + d\imath$ is the number $(a\cdot c - b\cdot d) + (a\cdot d + b\cdot c)\imath$.
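Those two rules translate to code almost word for word. In this Python sketch a Gaussian integer $a + b\imath$ is stored as the pair `(a, b)`. That representation and the function names are my own choices for illustration.

```python
def g_add(z, w):
    """(a + bi) + (c + di) = (a + c) + (b + d)i."""
    a, b = z
    c, d = w
    return (a + c, b + d)

def g_mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

print(g_add((1, 2), (3, 4)))   # (4, 6)
print(g_mul((1, 2), (3, 4)))   # (-5, 10)
print(g_mul((0, 1), (0, 1)))   # (-1, 0)
```

Whole numbers go in and whole numbers come out, which is what lets these be a ring all their own.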

We created something that has addition and multiplication. It picks up subtraction for free. It doesn’t have division. We can create rings that do, but this one won’t, any more than regular old integers have division. But we can ask what other normal-arithmetic-like stuff these Gaussian integers do have. For instance, can we factor numbers?

This isn’t an obvious one. No, we can’t expect to be able to divide one Gaussian integer by another. But we can’t expect to divide a regular old integer by another, not and get an integer out of it. That doesn’t mean we can’t factor them. It means we divide the regular old integers into a couple classes. There’s prime numbers. There’s composites. There’s the unit, the number 1. There’s zero. We know prime numbers; they’re 2, 3, 5, 7, and so on. Composite numbers are the ones you get by multiplying prime numbers together: 4, 6, 8, 9, 10, and so on. 1 and 0 are off on their own. Leave them there. We can’t divide any old integer by any old integer. But we can say an integer is equal to this string of prime numbers multiplied together. This gives us a handle by which we can prove a lot of interesting results.

We can do the same with Gaussian integers. We can divide them up into Gaussian primes, Gaussian composites, units, and zero. The words mean what they mean for regular old integers. A Gaussian composite can be factored into a product of Gaussian primes. Gaussian primes can’t be factored any further.

If we know what the prime numbers are for regular old integers we can tell whether something’s a Gaussian prime. Admittedly, knowing all the prime numbers is a challenge. But a Gaussian integer $a + b\imath$ will be prime whenever a couple simple-to-test conditions are true. First is if ‘a’ and ‘b’ are both not zero, but $a^2 + b^2$ is a prime number. So, for example, $5 + 4\imath$ is a Gaussian prime.

You might ask, hey, would $-5 - 4\imath$ also be a Gaussian prime? That’s also got components that are integers, and the squares of them add up to a prime number (41). Well-spotted. Gaussian primes appear in quartets. If $a + b\imath$ is a Gaussian prime, so is $-a -b\imath$. And so are $-b + a\imath$ and $b - a\imath$.

There’s another group of Gaussian primes. These are the numbers $a + b\imath$ where either ‘a’ or ‘b’ is zero. Then the other one has to be a regular old prime number, give or take a minus sign. And if it’s positive, it has to be three more than a whole multiple of four. If it’s negative, then it’s three less than a whole multiple of four. So ‘3’ is a Gaussian prime, as is -3, and as is $3\imath$ and so is $-3\imath$.
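The two tests can be put together in a few lines of Python. This is a sketch, not anything clever: the function names are mine, the ordinary primality check is plain trial division, and the one-component-zero case is taken to require that the nonzero component be, up to sign, an ordinary prime that leaves remainder 3 when divided by 4.

```python
def is_prime(n):
    """Trial-division primality test for ordinary integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_gaussian_prime(a, b):
    """Test whether a + b*i is a Gaussian prime."""
    if a != 0 and b != 0:
        # Both parts nonzero: prime exactly when a^2 + b^2 is prime.
        return is_prime(a * a + b * b)
    # One part is zero, so a + b is just the other part.
    n = abs(a + b)
    return is_prime(n) and n % 4 == 3


for a, b in [(5, 4), (-5, -4), (3, 0), (0, -3), (2, 0), (5, 0)]:
    print((a, b), is_gaussian_prime(a, b))
```

Trial division is fine for numbers this small. For anything big you'd want a real primality test in its place.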

This has strange effects. Like, ‘3’ is a prime number in the regular old scheme of things. It’s also a Gaussian prime. But familiar other prime numbers like ‘2’ and ‘5’? Not anymore. Two is equal to $(1 + \imath) \cdot (1 - \imath)$; both of those terms are Gaussian primes. Five is equal to $(2 + \imath) \cdot (2 - \imath)$. There are similar shocking results for 13. But, roughly, the world of composites and prime numbers translates into Gaussian composites and Gaussian primes. In this slightly exotic structure we have everything familiar about factoring numbers.

You might have some nagging thoughts. Like, sure, two is equal to $(1 + \imath) \cdot (1 - \imath)$. But isn’t it also equal to $(1 + \imath) \cdot (1 - \imath) \cdot \imath \cdot (-\imath)$? One of the important things about prime numbers is that every composite number is the product of a unique string of prime numbers. Do we have to give that up for Gaussian integers?

Good nag. But no; the doubt is coming about because you’ve forgotten the difference between “the positive integers” and “all the integers”. If we stick to positive whole numbers then, yeah, (say) ten is equal to two times five and no other combination of prime numbers. But suppose we have all the integers, positive and negative. Then ten is equal to either two times five or it’s equal to negative two times negative five. Or, better, it’s equal to negative one times two times negative one times five. Or that, times any even number of extra negative ones.

Remember that bit about separating ‘one’ out from the world of primes and composites? That’s because the number one screws up these unique factorizations. You can always toss in extra factors of one, to taste, without changing the product of something. If we have positive and negative integers to use, then negative one does almost the same trick. We can toss in any even number of extra negative ones without changing the product. This is why we separate “units” out of the numbers. They’re not part of the prime factorization of any numbers.

For the Gaussian integers there are four units. 1 and -1, $\imath$ and $-\imath$. They are neither primes nor composites, and we don’t worry about how they would otherwise multiply the number of factorizations we get.

But let me close with a neat, easy-to-understand puzzle. It’s called the moat-crossing problem. In the regular old integers it’s this: imagine that the prime numbers are islands in a dangerous sea. You start on the number ‘2’. Imagine you have a board that can be set down and safely crossed, then picked up to be put down again. Could you get from the start and go off to safety, which is infinitely far away? If your board is some, fixed, finite length?

No, you can’t. The problem amounts to how big the gap between one prime number and the next largest prime number can be. It turns out there’s no limit to that. That is, you give me a number, as small or as large as you like. I can find some prime number whose successor is more than your number away from it. There are arbitrarily large gaps between prime numbers.
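One standard way to see that there's no limit, which is my addition and not something argued above: for any whole number n of at least 2, each of the numbers n! + 2, n! + 3, and on up to n! + n is composite, because n! + k is divisible by k. That's a run of n - 1 numbers with no prime island among them. A small Python sketch, with a function name of my own choosing:

```python
from math import factorial


def composite_run(n):
    """Return the n - 1 consecutive integers n! + 2, ..., n! + n."""
    base = factorial(n)
    return [base + k for k in range(2, n + 1)]


n = 6
for k, m in zip(range(2, n + 1), composite_run(n)):
    # n! is a multiple of k, so n! + k is a multiple of k as well.
    print(m, "is divisible by", k, "->", m % k == 0)
```

Pick n bigger than your board length plus one and the board can't span the run.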

Gaussian primes, though? Since a Gaussian prime might have nearest neighbors in any direction? Nobody knows. We know there are arbitrarily large gaps. Pick a moat size; we can (eventually) find a Gaussian prime that’s at least that far away from its nearest neighbors. But this does not say whether it’s impossible to get from the smallest Gaussian primes — $1 + \imath$ and its companions $-1 + \imath$ and on — infinitely far away. We know there’s a moat of width 6 separating the origin of things from infinity. We don’t know that there’s bigger ones.

You’re not going to solve this problem. Unless I have more brilliant readers than I know about; if I have ones who can solve this problem then I might be too intimidated to write anything more. But there is surely a pleasant pastime, maybe a charming game, to be made from this. Try finding the biggest possible moats around some set of Gaussian prime islands.

Ellen Gethner, Stan Wagon, and Brian Wick’s A Stroll Through the Gaussian Primes describes this moat problem. It also sports some fine pictures of where the Gaussian primes are and what kinds of moats you can find. If you don’t follow the reasoning, you can still enjoy the illustrations.

## The Summer 2017 Mathematics A To Z: Functor

Gaurish gives me another topic for today. I’m now no longer sure whether Gaurish hopes for me to become a topology blogger or a category theory blogger. I have the last laugh, though. I’ve wanted to get better-versed in both fields and there’s nothing like explaining something to learn about it.

# Functor.

So, category theory. It’s a foundational field. It talks about stuff that’s terribly abstract. This means it’s powerful, but it can be hard to think of interesting examples. I’ll try, though.

It starts with categories. These have three parts. The first part is a set of things. (There always is.) The second part is a collection of matches between pairs of things in the set. They’re called morphisms. The third part is a rule that lets us combine two morphisms into a new, third one. That is. Suppose ‘a’, ‘b’, and ‘c’ are things in the set. Suppose there’s a morphism that matches $a \rightarrow b$, and a morphism that matches $b \rightarrow c$. Then we can combine them into another morphism that matches $a \rightarrow c$. So we have a set of things, and a set of things we can do with those things. And the things we can do combine with each other in a way that works a lot like a group’s operation does.

This describes a lot of stuff. Group theory fits seamlessly into this description. Most of what we do with numbers is a kind of group theory. Vector spaces do too. Most of what we do with analysis has vector spaces underneath it. Topology does too. Most of what we do with geometry is an expression of topology. So you see why category theory is so foundational.

Functors enter our picture when we have two categories. Or more. They’re about the ways we can match up categories. But let’s start with two categories. One of them I’ll name ‘C’, and the other, ‘D’. A functor has to match everything that’s in the set of ‘C’ to something that’s in the set of ‘D’.

And it does more. It has to match every morphism between things in ‘C’ to some other morphism, between corresponding things in ‘D’. It’s got to do it in a way that satisfies that combining, too. That is, suppose that ‘f’ and ‘g’ are morphisms for ‘C’. And that ‘f’ and ‘g’ combine to make ‘h’. Then, the functor has to match ‘f’ and ‘g’ and ‘h’ to some morphisms for ‘D’. The combination of whatever ‘f’ matches to and whatever ‘g’ matches to has to be whatever ‘h’ matches to.

This might sound to you like a homomorphism. If it does, I admire your memory or mathematical prowess. Functors are about matching one thing to another in a way that preserves structure. Structure is the way that sets of things can interact. We naturally look for stuff made up of different things that have the same structure. Yes, functors are themselves a category. That is, you can make a brand-new category whose set of things are the functors between two other categories. This is a good spot to pause while the dizziness passes.

There are two kingdoms of functor. You tell them apart by what they do with the morphisms. Here again I’m going to need my categories ‘C’ and ‘D’. I need a morphism for ‘C’. I’ll call that ‘f’. ‘f’ has to match something in the set of ‘C’ to something in the set of ‘C’. Let me call the first something ‘a’, and the second something ‘b’. That’s all right so far? Thank you.

Let me call my functor ‘F’. ‘F’ matches all the elements in ‘C’ to elements in ‘D’. And it matches all the morphisms on the elements in ‘C’ to morphisms on the elements in ‘D’. So if I write ‘F(a)’, what I mean is look at the element ‘a’ in the set for ‘C’. Then look at what element in the set for ‘D’ the functor matches with ‘a’. If I write ‘F(b)’, what I mean is look at the element ‘b’ in the set for ‘C’. Then pick out whatever element in the set for ‘D’ gets matched to ‘b’. If I write ‘F(f)’, what I mean is to look at the morphism ‘f’ between elements in ‘C’. Then pick out whatever morphism between elements in ‘D’ that that gets matched with.

Here’s where I’m going with this. Suppose my morphism ‘f’ matches ‘a’ to ‘b’. Does the functor of that morphism, ‘F(f)’, match ‘F(a)’ to ‘F(b)’? Of course, you say, what else could it do? And the answer is: why couldn’t it match ‘F(b)’ to ‘F(a)’?

No, it doesn’t break everything. Not if you’re consistent about swapping the order of the matchings. The normal everyday order, the one you’d thought couldn’t have an alternative, is a “covariant functor”. The crosswise order, this second thought, is a “contravariant functor”. Covariant and contravariant are distinctions that weave through much of mathematics. They particularly appear through tensors and the geometry they imply. In that introduction they tend to be difficult, even mean, creations, since in regular old Euclidean space they don’t mean anything different. They’re different for non-Euclidean spaces, and that’s important and valuable. The covariant versus contravariant difference is easier to grasp here.

Functors work their way into computer science. The avenue here is in functional programming. That’s a method of programming in which instead of the normal long list of commands, you write a single line of code that holds like fourteen “->” symbols that makes the computer stop and catch fire when it encounters a bug. The advantage is that when you have the code debugged it’s quite speedy and memory-efficient. The disadvantage is if you have to alter the function later, it’s easiest to throw everything out and start from scratch, beginning from vacuum-tube-based computing machines. But it works well while it does. You just have to get the hang of it.

## The Summer 2017 Mathematics A To Z: Elliptic Curves

Gaurish, of the For The Love Of Mathematics blog, gives me another subject today. It’s one that isn’t about ellipses. Sad to say it’s also not about elliptic integrals. This is sad to me because I have a cute little anecdote about a time I accidentally gave my class an impossible problem. I did apologize. No, nobody solved it anyway.

# Elliptic Curves.

Elliptic Curves start, of course, with polynomials. Particularly, they’re polynomials with two variables. We call them ‘x’ and ‘y’ because we have no reason to be difficult. They’re of at most third degree. That is, we can have terms like $x$ and $y^2$ and $x^2 y$ and $y^3$. Something with higher powers, like $x^4$ or $x^2 y^2$ (a fourth power, all together), is right out. Doesn’t matter. Start from this and we can do some slick changes of variables so that we can rewrite it to look like this:

$y^2 = x^3 + Ax + B$

Here, ‘A’ and ‘B’ are some numbers that don’t change for this particular curve. Also, we need it to be true that $4A^3 + 27B^2$ doesn’t equal zero. It avoids problems. What we’ll be looking at are coordinates, values of ‘x’ and ‘y’ together which make this equation true. That is, it’s points on the curve. If you pick some real numbers ‘A’ and ‘B’ and draw all the values of ‘x’ and ‘y’ that make the equation true you get … well, there’s different shapes. They all look like those microscope photos of a water drop emerging and falling from a tap, only rotated clockwise ninety degrees.

So. Pick any of these curves that you like. Pick a point. I’m going to name your point ‘P’. Now pick a point once more. I’m going to name that point ‘Q’. Now draw a line from P through Q. Keep drawing it. It’ll cross the original elliptic curve again. And that point is … not actually special. What is special is the reflection of that point. That is, the same x-coordinate, but flip the plus or minus sign for the y-coordinate. (WARNING! Do not call it “the reflection” at your thesis defense! Call it the “conjugate” point. It means “reflection”.) Your elliptic curve will be symmetric around the x-axis. If, say, the point with x-coordinate 4 and y-coordinate 3 is on the curve, so is the point with x-coordinate 4 and y-coordinate -3. So that reflected point is … something special.

This lets us do something wonderful. We can think of this reflected point as the sum of your ‘P’ and ‘Q’. You can ‘add’ any two points on the curve and get a third point. This means we can do something that looks like addition for points on the elliptic curve. And this means the points on this curve are a group, and we can bring all our group-theory knowledge to studying them. It’s a commutative group, too; ‘P’ added to ‘Q’ leads to the same point as ‘Q’ added to ‘P’.

Let me head off some clever thoughts that make fair objections. What if ‘P’ and ‘Q’ are already reflections, so the line between them is vertical? That never touches the original elliptic curve again, right? Yeah, fair complaint. We patch this by saying that there’s one more point, ‘O’, that’s off “at infinity”. Where is infinity? It’s wherever your vertical lines end. Shut up, this can too be made rigorous. In any case it’s a common hack for this sort of problem. When we add that, everything’s nice. The ‘O’ serves the role in this group that zero serves in arithmetic: the sum of point ‘O’ and any point ‘P’ is going to be ‘P’ again.

Second clever thought to head off: what if ‘P’ and ‘Q’ are the same point? There’s infinitely many lines that go through a single point so how do we pick one to find an intersection with the elliptic curve? Huh? If you did that, then we pick the tangent line to the elliptic curve that touches ‘P’, and carry on as before.
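The whole recipe, special cases included, fits in a short Python sketch. The function name is mine, `None` stands in for the point ‘O’ at infinity, and the sample curve $y^2 = x^3 - 7x + 10$ with the points (1, 2) and (3, 4) is just one I picked because the arithmetic stays small. The slope formulas are the usual chord and tangent ones for a curve in this form.

```python
from fractions import Fraction

O = None  # stand-in for the point "at infinity"


def add_points(P, Q, A):
    """Add two points on y^2 = x^3 + A*x + B by the chord-and-tangent rule."""
    if P is O:
        return Q
    if Q is O:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 == -y2:
        # Vertical line: the points are reflections of each other
        # (this also covers doubling a point that has y = 0).
        return O
    if P == Q:
        slope = Fraction(3 * x1 * x1 + A, 2 * y1)   # tangent line at P
    else:
        slope = Fraction(y2 - y1, x2 - x1)          # line through P and Q
    x3 = slope * slope - x1 - x2
    y3 = slope * (x1 - x3) - y1   # this is already the reflected point
    return (x3, y3)


A = -7                      # the curve is y^2 = x^3 - 7x + 10
P, Q = (1, 2), (3, 4)
print(add_points(P, Q, A))  # P + Q
print(add_points(P, P, A))  # P + P, using the tangent line
```

Notice ‘B’ never appears in the formulas. It matters only for deciding which points are on the curve to begin with.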

There’s more. What kind of number is ‘x’? Or ‘y’? I’ll bet that you figured they were real numbers. You know, ordinary stuff. I didn’t say what they were, so left it to our instinct, and that usually runs toward real numbers. Those are what I meant, yes. But we didn’t have to. ‘x’ and ‘y’ could be in other sets of numbers too. They could be complex-valued numbers. They could be just the rational numbers. They could even be part of a finite collection of possible numbers. As long as the equation $y^2 = x^3 + Ax + B$ is something meaningful (and some technical points are met) we can carry on. The elliptic curves, and the points we “add” on them, might not look like the curves we started with anymore. They might not look like anything recognizable anymore. But the logic continues to hold. We still create these groups out of the points on these lines intersecting a curve.

By now you probably admit this is neat stuff. You may also think: so what? We can take this thing you never thought about, draw points and lines on it, and make it look very loosely kind of like just adding numbers together. Why is this interesting? No appreciation just for the beauty of the structure involved? Well, we live in a fallen world.

It comes back to number theory. The modern study of Diophantine equations grows out of studying elliptic curves on the rational numbers. It turns out the group of points you get for that looks like a finite collection of points with some collection of integers hanging on. How long that collection of numbers is is called the ‘rank’, and there are deep mysteries at work. We know there are elliptic equations that have a rank as big as 28. Nobody knows if the rank can be arbitrarily high, though. And I believe we don’t even know if there are any curves with rank of, like, 27, or 25.

Yeah, I’m still sensing skepticism out there. Fine. We’ll go back to the only part of number theory everybody agrees is useful. Encryption. We have roughly the same goals for every encryption scheme. We want it to be easy to encode a message. We want it to be easy to decode the message if you have the key. We want it to be hard to decode the message if you don’t have the key.

Take something inside one of these elliptic curve groups. Especially one that’s got a finite field. Let me call your thing ‘g’. It’s really easy for you, knowing what ‘g’ is and what your field is, to raise it to a power. You can pretty well impress me by sharing the value of ‘g’ raised to some whole number ‘m’. Call that ‘h’.

Why am I impressed? Because even if I know both ‘g’ and ‘h’, I have a heck of a time figuring out what ‘m’ is. Especially on these finite field groups there’s no obvious connection between how big ‘h’ is and how big ‘g’ is and how big ‘m’ is. Start with a big enough finite field and you can encode messages in ways that are crazy hard to crack.
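A toy version in Python, under assumptions I should be loud about. On an elliptic curve the group operation is the point “addition”, so “raising ‘g’ to the power ‘m’” means adding the point to itself ‘m’ times. The curve $y^2 = x^3 + 2x + 2$ over the integers modulo 17 and the starting point (5, 1) are a textbook-sized example I've chosen, absurdly too small for real encryption. The function names are mine, and `pow(x, -1, p)` for modular inverses needs Python 3.8 or later.

```python
def ec_add(P, Q, A, p):
    """Add points on y^2 = x^3 + A*x + B over the integers modulo a prime p."""
    if P is None:          # None plays the role of the point at infinity
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None        # vertical line
    if P == Q:
        slope = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p   # tangent
    else:
        slope = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord
    x3 = (slope * slope - x1 - x2) % p
    y3 = (slope * (x1 - x3) - y1) % p
    return (x3, y3)


def ec_mul(m, P, A, p):
    """Combine P with itself m times, by repeated doubling."""
    result = None
    addend = P
    while m > 0:
        if m & 1:
            result = ec_add(result, addend, A, p)
        addend = ec_add(addend, addend, A, p)
        m >>= 1
    return result


g = (5, 1)
for m in range(1, 8):
    print(m, ec_mul(m, g, 2, 17))
```

Going forward is quick even for enormous ‘m’, since the doubling trick needs only about as many steps as ‘m’ has binary digits. Going backward, from the output to ‘m’, has no comparably quick method that anyone has shared. On this tiny curve you could just try every ‘m’, which is why real curves use enormous primes.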

We trust. At least, if there are any ways to break the code quickly, nobody’s shared them. And there’s one of those enormous-money-prize awards waiting for someone who does know how to break such a code quickly. (I don’t know which. I’m going by what I expect from people.)

And then there’s fame. These were used to prove Fermat’s Last Theorem. Suppose there are some non-boring numbers ‘a’, ‘b’, and ‘c’, so that for some prime number ‘p’ that’s five or larger, it’s true that $a^p + b^p = c^p$. (We can separately prove Fermat’s Last Theorem for a power that isn’t a prime number, or a power that’s 3 or 4.) Then this implies properties about the elliptic curve:

$y^2 = x(x - a^p)(x + b^p)$

This is a convenient way of writing things since it showcases the $a^p$ and $b^p$. It’s equal to:

$y^2 = x^3 + \left(b^p - a^p\right)x^2 - a^p b^p x$

(I was so tempted to leave an arithmetic error in there so I could make sure someone commented.)

If there’s a solution to Fermat’s Last Theorem, then this elliptic equation can’t be modular. I don’t have enough words to explain what ‘modular’ means here. Andrew Wiles and Richard Taylor showed that the equation was modular. So there is no solution to Fermat’s Last Theorem except the boring ones. (Like, where ‘b’ is zero and ‘a’ and ‘c’ equal each other.) And it all comes from looking close at these neat curves, none of which looks like an ellipse.

They’re named elliptic curves because we first noticed them when Carl Jacobi (yes, that Carl Jacobi) was studying the length of arcs of an ellipse. That’s interesting enough on its own. But it is hard. Maybe I could have fit in that anecdote about giving my class an impossible problem after all.

## The Summer 2017 Mathematics A To Z: Diophantine Equations

I have another request from Gaurish, of the For The Love Of Mathematics blog, today. It’s another change of pace.

# Diophantine Equations

A Diophantine equation is a polynomial. Well, of course it is. It’s an equation, or a set of equations, setting one polynomial equal to another. Possibly equal to a constant. What makes this different from “any old equation” is the coefficients. These are the constant numbers that you multiply the variables, your $x$ and $y$ and $x^2$ and $z^8$ and so on, by. To make a Diophantine equation all these coefficients have to be integers. You know one well, because it’s that $x^n + y^n = z^n$ thing that Fermat’s Last Theorem is all about. And you’ve probably seen $ax + by = 1$. It turns up a lot because that’s a line, and we do a lot of stuff with lines.

Diophantine equations are interesting. There are a couple of cases that are easy to solve. I mean, at least that we can find solutions for. $ax + by = 1$, for example, that’s easy to solve. $x^n + y^n = z^n$ it turns out we can’t solve. Well, we can if n is equal to 1 or 2. Or if x or y or z are zero. These are obvious, that is, they’re quite boring. That one took about four hundred years to solve, and the solution was “there aren’t any solutions”. This may convince you of how interesting these problems are. What, from looking at it, tells you that $ax + by = 1$ is simple while $x^n + y^n = z^n$ is (most of the time) impossible?
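To back up “easy to solve” for $ax + by = 1$: the extended Euclidean algorithm finds whole numbers ‘x’ and ‘y’ whenever ‘a’ and ‘b’ share no common factor bigger than 1, and there's no whole-number solution when they do. A minimal Python sketch, with function names of my own:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g, where g is a gcd of a and b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def solve_linear(a, b):
    """Return integers (x, y) with a*x + b*y == 1, or None if there are none."""
    g, x, y = extended_gcd(a, b)
    if abs(g) != 1:
        return None
    # g is 1 or -1 here; multiplying through by g makes the right side 1.
    return (x * g, y * g)


x, y = solve_linear(7, 5)
print(x, y, 7 * x + 5 * y)
print(solve_linear(6, 4))   # 6 and 4 share the factor 2, so no solution
```

One solution gives you all the rest: add any multiple of ‘b’ to ‘x’ and subtract the same multiple of ‘a’ from ‘y’.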

I don’t know. Nobody really does. There are many kinds of Diophantine equation, all different-looking polynomials. Some of them are special one-off cases, like $x^n + y^n = z^n$. For example, there’s $x^4 + y^4 + z^4 = w^4$ for some integers x, y, z, and w. Leonhard Euler conjectured this equation had only boring solutions. You’ll remember Euler. He wrote the foundational work for every field of mathematics. It turns out he was wrong. It has infinitely many interesting solutions. But the first one anybody found is $2,682,440^4 + 15,365,639^4 + 18,796,760^4 = 20,615,673^4$. The smallest one, $95,800^4 + 217,519^4 + 414,560^4 = 422,481^4$, took a computer search to find. We can forgive Euler not noticing it.
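If you don't want to take a twenty-nine-digit sum on faith, Python's whole numbers have no size limit, so the identity with the eight-digit numbers can be checked exactly. This is only a check, not a search; finding such numbers is the hard part.

```python
# Exact integer arithmetic: no rounding, however many digits turn up.
lhs = 2682440**4 + 15365639**4 + 18796760**4
rhs = 20615673**4

print(lhs == rhs)
print(len(str(rhs)), "digits")
```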

Some are groups of equations that have similar shapes. There’s the Fermat’s Last Theorem formula, for example, which is a different equation for every different integer n. Then there’s what we call Pell’s Equation. This one is $x^2 - D y^2 = 1$ (or equals -1), for some counting number D. It’s named for the English mathematician John Pell, who did not discover the equation (even in the Western European tradition; Indian mathematicians were familiar with it for a millennium), did not solve the equation, and did not do anything particularly noteworthy in advancing human understanding of the solution. Pell owes his fame in this regard to Leonhard Euler, who misunderstood Pell’s revising a translation of a book discussing a solution for Pell’s authoring a solution. I confess Euler isn’t looking very good on Diophantine equations.
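A brute-force Python sketch for the $x^2 - D y^2 = 1$ form, to give a feel for it. The function name and the search limit are my own choices, and this is the naive approach, not how anyone seriously solves these (continued fractions do much better). It tries ‘y’ equal to 1, 2, 3, and so on, and checks whether $1 + D y^2$ is a perfect square.

```python
from math import isqrt


def smallest_pell_solution(D, y_limit=10**6):
    """Search for the smallest positive (x, y) with x*x - D*y*y == 1."""
    for y in range(1, y_limit + 1):
        x_squared = 1 + D * y * y
        x = isqrt(x_squared)
        if x * x == x_squared:
            return (x, y)
    return None   # nothing found below the limit


for D in (2, 3, 7):
    print(D, smallest_pell_solution(D))
```

Small ‘D’ can still have enormous smallest solutions, so coming back with `None` only means the search gave up. It doesn't mean there's nothing to find.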

But nobody looks very good on Diophantine equations. Make up a Diophantine equation of your own. Use whatever whole numbers, positive or negative, that you like for your equation. Use whatever powers of however many variables you like for your equation. So you get something that looks maybe like this:

$7x^2 - 20y + 18y^2 - 38z = 9$

Does it have any solutions? I don’t know. Nobody does. There isn’t a general all-around solution. You know how with a quadratic equation we have this formula where you recite some incantation about “b squared minus four a c” and get any roots that exist? Nothing like that exists for Diophantine equations in general. Specific ones, yes. But they’re all specialties, crafted to fit the equation that has just that shape.

So for each equation we have to ask: is there a solution? Is there any solution that isn’t obvious? Are there finitely many solutions? Are there infinitely many? Either way, can we find all the solutions? And we have to answer them anew. What answers these have? Whether answers are known to exist? Whether answers can exist? We have to discover anew for each kind of equation. Knowing answers for one kind doesn’t help us for any others, except as inspiration. If some trick worked before, maybe it will work this time.

There are a couple usually reliable tricks. Can the equation be rewritten in some way that it becomes the equation for a line? If it can we probably have a good handle on any solutions. Can we apply modulo arithmetic to the equation? If we can, we might be able to reduce the number of possible solutions that the equation has. In particular we might be able to reduce the number of possible solutions until we can just check every case. Can we use induction? That is, can we show there’s some parameter for the equations, and that knowing the solutions for one value of that parameter implies knowing solutions for larger values? And then find some small enough value we can test it out by hand? Or can we show that if there is a solution, then there must be a smaller solution, and smaller yet, until we can either find an answer or show there aren’t any? Sometimes. Not always. The field blends seamlessly into number theory. And number theory is all sorts of problems easy to pose and hard or impossible to solve.
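A small example of the modulo trick, with an equation I'm making up for the purpose: can $x^2 + y^2$ ever equal a number that's three more than a multiple of four, like 2023? Every square leaves remainder 0 or 1 when divided by 4, so there are only a few cases to check, and a short Python sketch (function name mine) can check them all:

```python
def possible_residues(modulus):
    """All the remainders x*x + y*y can leave on division by `modulus`."""
    squares = {(x * x) % modulus for x in range(modulus)}
    return {(s + t) % modulus for s in squares for t in squares}


residues = possible_residues(4)
print(sorted(residues))
print(2023 % 4, 2023 % 4 in residues)
```

The remainder 3 never turns up, so $x^2 + y^2 = 2023$ has no whole-number solutions, and we never had to look at a single actual ‘x’ or ‘y’. When the remainder does turn up, the trick tells you nothing either way. You just haven't ruled anything out.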

We name these equations after Diophantus of Alexandria, a 3rd century Greek mathematician. His writings, what we have of them, discuss how to solve equations. Not general solutions, the way we might want to solve $ax^2 + bx + c = 0$, but specific ones, like $1x^2 - 5x + 6 = 0$. His books are among those whose rediscovery shaped the rebirth of mathematics. Pierre de Fermat scribbled his famous note in the too-small margins of Diophantus’s Arithmetica. (Well, a popular translation.)

But the field predates Diophantus, at least if we look at specific problems. Of course it does. In mathematics, as in life, any search for a source ends in a vast, marshy ambiguity. The field stays vital. If we loosen ourselves to looking at inequalities — $x - Dy^2 < A$, let's say — then we start seeing optimization problems. What values of x and y will make this equation most nearly true? What values will come closest to satisfying this bunch of equations? The questions are about how to find the best possible fit to whatever our complicated sets of needs are. We can't always answer. We keep searching.

## The Summer 2017 Mathematics A To Z: Cohomology

Today’s A To Z topic is another request from Gaurish, of the For The Love Of Mathematics blog. Also part of what looks like a quest to make me become a topology blogger, at least for a little while. It’s going to be exciting and I hope not to faceplant as I try this.

Also, a note about Thomas K Dye, who’s drawn the banner art for this and for the Why Stuff Can Orbit series: the publisher for collections of his comic strip is having a sale this weekend.

# Cohomology.

The word looks intimidating, and faintly of technobabble. It’s less cryptic than it appears. We see parts of it in non-mathematical contexts. In biology class we would see “homology”, the sharing of structure in body parts that look superficially very different. We also see it in art class. The instructor points out that a dog’s leg looks like that because they stand on their toes. What looks like a backward-facing knee is just the ankle, and if we stand on our toes we see that in ourselves. We might see it in chemistry, as many interesting organic compounds differ only in how long or how numerous the boring parts are. The stuff that does work is the same, or close to the same. And this is a hint to what a mathematician means by cohomology. It’s something in shapes. It’s particularly something in how different things might have similar shapes. Yes, I am using a homology in language here.

I often talk casually about the “shape” of mathematical things. Or their “structures”. This sounds weird and abstract to start and never really gets better. We can get some footing if we think about drawing the thing we’re talking about. Could we represent the thing we’re working on as a figure? Often we can. Maybe we can draw a polygon, with the vertices of the shape matching the pieces of our mathematical thing. We get the structure of our thing from thinking about what we can do to that polygon without changing the way it looks. Or without changing the way we can do whatever our original mathematical thing does.

This leads us to homologies. We get them by looking for stuff that’s true even if we moosh up the original thing. The classic homology comes from polyhedrons, three-dimensional shapes. There’s a relationship between the number of vertices, the number of edges, and the number of faces of a polyhedron. It doesn’t change even if you stretch the shape out longer, or squish it down, or for that matter slice off a corner. It only changes if you punch a new hole through the middle of it. Or if you plug one up. That would be unsporting. A homology describes something about the structure of a mathematical thing. It might even be literal. Topology, the study of what we know about shapes without bringing distance into it, has the number of holes that go through a thing as a homology. This gets feeling like a comfortable, familiar idea now.
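The relationship, spelled out: vertices minus edges plus faces comes to 2 for any polyhedron without holes through it. A quick Python tally, using counts for some familiar shapes. The “picture frame” entry is my own addition, a square ring with one hole through the middle, counted as 16 vertices, 32 edges, and 16 flat faces.

```python
# (vertices, edges, faces) for each shape
shapes = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
    "cube with one corner sliced off": (10, 15, 7),
    "picture frame (one hole)": (16, 32, 16),
}


def euler_characteristic(vertices, edges, faces):
    """Vertices minus edges plus faces."""
    return vertices - edges + faces


for name, counts in shapes.items():
    print(name, euler_characteristic(*counts))
```

Slicing the corner off the cube changes all three counts and leaves the total alone. Punching the hole is what moves it.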

But that isn’t a cohomology. That ‘co’ prefix looks dangerous. At least it looks significant. When the ‘co’ prefix has turned up before it’s meant something is shaped by how it refers to something else. Coordinates aren’t just number lines; they’re collections of number lines that we can use to say where things are. If ‘a’ is a factor of the number ‘x’, its cofactor is the number you multiply ‘a’ by in order to get ‘x’. (For real numbers that’s just x divided by a. For other stuff it might be weirder.) A codomain is a set that a function maps a domain into (and must contain the range, at least). Cosets aren’t just sets; they’re ways we can divide (for example) the counting numbers into odds and evens.

So what’s the ‘co’ part for a homology? I’m sad to say we start losing that comfortable feeling now. We have to look at something we’re used to thinking of as a process as though it were a thing. These things are morphisms: what are the ways we can match one mathematical structure to another? Sometimes the morphisms are easy. We can match the even numbers up with all the integers: match 0 with 0, match 2 with 1, match -6 with -3, and so on. Addition on the even numbers matches with addition on the integers: 4 plus 6 is 10; 2 plus 3 is 5. For that matter, we can match the integers with the multiples of three: match 1 with 3, match -1 with -3, match 5 with 15. 1 plus -2 is -1; 3 plus -6 is -3.

What happens if we look at the sets of matchings that we can do as if that were a set of things? That is, not some human concept like ‘2’ but rather ‘match a number with one-half its value’? And ‘match a number with three times its value’? These can be the population of a new set of things.

And these things can interact. Suppose we “match a number with one-half its value” and then immediately “match a number with three times its value”. Can we do that? … Sure, easily. 4 matches to 2 which goes on to 6. 8 matches to 4 which goes on to 12. Can we write that as a single matching? Again, sure. 4 matches to 6. 8 matches to 12. -2 matches to -3. We can write this as “match a number with three-halves its value”. We’ve taken “match a number with one-half its value” and combined it with “match a number with three times its value”. And it’s given us the new “match a number with three-halves its value”. These things we can do to the integers are themselves things that can interact.
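In Python terms, the matchings are functions, and combining them is making a new function out of two old ones. A minimal sketch, with names of my own choosing and exact fractions so that nothing gets rounded:

```python
from fractions import Fraction


def halve(n):
    """Match a number with one-half its value."""
    return Fraction(n) / 2


def triple(n):
    """Match a number with three times its value."""
    return 3 * Fraction(n)


def compose(second, first):
    """Build the single matching 'do first, then do second'."""
    return lambda n: second(first(n))


three_halves = compose(triple, halve)

for n in (4, 8, -2):
    print(n, "matches to", three_halves(n))
```

`three_halves` is a thing in its own right now. We can hand it around, or combine it with still other matchings, without ever mentioning a particular number.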

This is a good moment to pause and let the dizziness pass.

It isn’t just you. There is something weird thinking of “doing stuff to a set” as a thing. And we have to get a touch more abstract than even this. We should be all right, but please do not try to use this to defend your thesis in category theory. Just use it to not look forlorn when talking to your friend who’s defending her thesis in category theory.

Now, we can take this collection of all the ways we can relate one set of things to another. And we can combine this with an operation that works kind of like addition. Some way to “add” one way-to-match-things to another and get a way-to-match-things. There’s also something that works kind of like multiplication. It’s a different way to combine these ways-to-match-things. This forms a ring, which is a kind of structure that mathematicians learn about in Introduction to Not That Kind Of Algebra. There are many constructs that are rings. The integers, for example, are also a ring, with addition and multiplication the same old processes we’ve always used.

And just as we can sort the integers into odds and evens — or into other groupings, like “multiples of three” and “one plus a multiple of three” and “two plus a multiple of three” — so we can sort the ways-to-match-things into new collections. And this is our cohomology. It’s the ways we can sort and classify the different ways to manipulate whatever we started on.

I apologize that this sounds so abstract as to barely exist. I admit we’re far from a nice solid example such as “six”. But the abstractness is what gives cohomologies explanatory power. We depend very little on the specifics of what we might talk about. And therefore what we can prove is true for very many things. It takes a while to get there, is all.

## The Summer 2017 Mathematics A To Z: Benford's Law

Today’s entry in the Summer 2017 Mathematics A To Z is one for myself. I couldn’t post this any later.

# Benford’s Law.

My car’s odometer first read 9 on my final test drive before buying it, in June of 2009. It flipped over to 10 barely a minute after that, somewhere near Jersey Freeze ice cream parlor at what used to be the Freehold Traffic Circle. Ask a Central New Jersey person of sufficient vintage about that place. Its odometer read 90 miles sometime that weekend, I think while I was driving to The Book Garden on Route 537. Ask a Central New Jersey person of sufficient reading habits about that place. It’s still there. It flipped over to 100 sometime when I was driving back later that day.

The odometer read 900 about two months after that, probably while I was driving to work, as I had a longer commute in those days. It flipped over to 1000 a couple days after that. The odometer first read 9,000 miles sometime in spring of 2010 and I don’t remember what I was driving to for that. It flipped over from 9,999 to 10,000 miles several weeks later, as I pulled into the car dealership for its scheduled servicing. Yes, this kind of impressed the dealer that I got there exactly on the round number.

The odometer first read 90,000 in late August of last year, as I was driving to some competitive pinball event in western Michigan. It’s scheduled to flip over to 100,000 miles sometime this week as I get to the dealer for its scheduled maintenance. While cars have gotten to be much more reliable and durable than they used to be, the odometer will never flip over to 900,000 miles. At least I can’t imagine owning it long enough, at my rate of driving the past eight years, that this would ever happen. It’s hard to imagine living long enough for the car to reach 900,000 miles. Thursday or Friday it should flip over to 100,000 miles. The leading digit on the odometer will be 1 or, possibly, 2 for the rest of my association with it.

The point of this little autobiography is this observation. Imagine all the days that I have owned this car, from sometime in June 2009 to whatever day I sell, lose, or replace it. Pick one. What is the leading digit of my odometer on that day? It could be anything from 1 to 9. But it’s more likely to be 1 than it is 9. Right now it’s about as likely to be any of the digits. But after this week the chance of ‘1’ being the leading digit will rise, and become rather more likely than that of ‘9’. And it’ll never lose that edge.

This is a reflection of Benford’s Law. It is named, as most mathematical things are, imperfectly. The law-namer was Frank Benford, a physicist, who in 1938 published a paper, “The Law Of Anomalous Numbers”. It confirmed the observation of Simon Newcomb. Newcomb was a 19th century astronomer and mathematician of an exhausting number of observations and developments. Newcomb observed the logarithm tables that anyone who needed to compute referred to often. The earlier pages were more worn-out and dirty and damaged than the later pages. People worked with numbers that start with ‘1’ more than they did numbers starting with ‘2’. And more those that start ‘2’ than start ‘3’. More that start with ‘3’ than start with ‘4’. And on. Benford showed this was not some fluke of calculations. It turned up in bizarre collections of data. The surface areas of rivers. The populations of thousands of United States municipalities. Molecular weights. The digits that turned up in an issue of Reader’s Digest. There is a bias in the world toward numbers that start with ‘1’.

And this is, prima facie, crazy. How can the surface areas of rivers somehow prefer to be, say, 100-199 hectares instead of 500-599 hectares? A hundred is a human construct. (Indeed, it’s many human constructs.) That we think ten is an interesting number is an artefact of our society. To think that 100 is a nice round number and that, say, 81 or 144 are not is a cultural choice. Grant that the digits of street addresses of people listed in American Men of Science — one of Benford’s data sources — have some cultural bias. How can another of his sources, molecular weights, possibly?

The bias sneaks in subtly. Don’t they all? It lurks at the edge of the table of data. The table header, perhaps, where it says “River Name” and “Surface Area (sq km)”. Or at the bottom where it says “Length (miles)”. Or it’s never explicit, because I take for granted people know my car’s mileage is measured in miles.

What would be different in my introduction if my car were Canadian, and the odometer measured kilometers instead? … Well, I’d not have driven the 9th kilometer; someone else doing a test-drive would have. The 90th through 99th kilometers would have come a little earlier that first weekend. The 900th through 999th kilometers too. I would have passed the 99,999th kilometer years ago. In kilometers my car has been in the 100,000s for something like four years now. It’s less absurd that it could reach the 900,000th kilometer in my lifetime, but that still won’t happen.

What would be different is the precise dates when my car reached its milestones, and the number of days it spent in the 1’s and the 2’s and the 3’s and so on. But the proportions? What fraction of its days it spends with a 1 as the leading digit versus a 2 or a 5? … Well, that’s changed a little bit. There is some final mile, or kilometer, my car will ever register and it makes a little difference whether that’s 239,000 or 385,000. But it’s only a little difference. It’s the difference in how many times a tossed coin comes up heads on the first 1,000 flips versus the second 1,000 flips. They’ll be different numbers, but not that different.

What’s the difference between a mile and a kilometer? A mile is longer than a kilometer, but that’s it. They measure the same kinds of things. You can convert a measurement in miles to one in kilometers by multiplying by a constant. We could as well measure my car’s odometer in meters, or inches, or parsecs, or lengths of football fields. The difference is what number we multiply the original measurement by. We call this “scaling”.

Whatever we measure, in whatever unit we measure, has to have a leading digit of something. So it’s got to have some chance of starting out with a ‘1’, some chance of starting out with a ‘2’, some chance of starting out with a ‘3’, and so on. But that chance can’t depend on the scale. Measuring something in smaller or larger units doesn’t change the proportion of how often each leading digit is there.
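Here is a rough way to watch that happen, as a sketch rather than a proof. The data set is an assumption of mine: the first thousand powers of two, standing in for measurements that sprawl over many orders of magnitude. The function name `leading_digit_shares` is likewise made up. I pretend the numbers are in miles and convert them to kilometers.

```python
from collections import Counter


def leading_digit_shares(values):
    """Return the share of values starting with each digit 1 through 9."""
    counts = Counter(int(str(v)[0]) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


# Stand-in data (an assumption for illustration): the first 1000 powers of two.
miles = [2 ** n for n in range(1000)]

# One mile is 1.609344 kilometers. Multiplying by 1609344 keeps everything
# an exact integer. That is 1.609344 times a million, and a power of ten
# only moves the decimal point, so the leading digits are unaffected.
kilometers = [m * 1609344 for m in miles]

in_miles = leading_digit_shares(miles)
in_km = leading_digit_shares(kilometers)

print("digit  miles  km")
for d in range(1, 10):
    print(d, f"{in_miles[d]:.3f}", f"{in_km[d]:.3f}")
```

The two columns won’t agree exactly, since this is a finite list. But they should be close, and that’s the scaling claim in miniature.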

These facts combine to imply that leading digits follow a logarithmic-scale law. The leading digit should be a ‘1’ something like 30 percent of the time. And a ‘2’ about 18 percent of the time. A ‘3’ about one-eighth of the time. And it decreases from there. ‘9’ gets to take the lead a meager 4.6 percent of the time.
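The percentages above come from the usual statement of the law, which I haven’t written out: the share of numbers with leading digit d is the base-ten logarithm of 1 + 1/d. A minimal sketch that tabulates it (the function name `benford_share` is just my label):

```python
import math


def benford_share(d):
    """Expected share of numbers whose leading digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(d, f"{benford_share(d):.1%}")
```

The nine shares add up to 1, as they have to, since every number leads with some digit.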

Roughly. It’s not going to be so all the time. Measure the heights of humans in meters and there’ll be far more leading digits of ‘1’ than we should expect, as most people are between 1 and 2 meters tall. Measure them in feet and ‘5’ and ‘6’ take a great lead. The law works best when data can sprawl over many orders of magnitude. If we lived in a world where people could as easily be two inches as two hundred feet tall, Benford’s Law would make more accurate predictions about their heights. That something is a mathematical truth does not mean it’s independent of all reason.

For example, the reader thinking back some may be wondering: granted that atomic weights and river areas and populations carry units with them that create this distribution. How do street addresses, one of Benford’s observed sources, carry any unit? Well, street addresses are, at least in the United States custom, a loose measure of distance. The 100 block (for example) of a street is within one … block … from whatever the more important street or river crossing that street is. The 900 block is farther away.

This extends further. Block numbers are proxies for distance from the major cross feature. House numbers on the block are proxies for distance from the start of the block. We have a better chance to see street number 419 than 1419, to see 419 than 489, or to see 419 than to see 1489. We can look at Benford’s Law in the second and third and other minor digits of numbers. But we have to be more cautious. There is more room for variation and quirk events. A block-filling building in the downtown area can take whatever street number the owners think most auspicious. Smaller samples of anything are less predictable.

Nevertheless, Benford’s Law has become famous to forensic accountants the past several decades, if we allow the use of the word “famous” in this context. But its fame is thanks to the economists Hal Varian and Mark Nigrini. They observed that real-world financial data should be expected to follow this same distribution. If they don’t, then there might be something suspicious going on. This is not an ironclad rule. There might be good reasons for the discrepancy. If your work trips are always to the same location, and always for one week, and there’s one hotel it makes sense to stay at, and you always learn you’ll need to make the trips about one month ahead of time, of course the hotel bill will be roughly the same. Benford’s Law is a simple, rough tool, a way to decide what data to scrutinize for mischief. With this in mind I trust none of my readers will make the obvious leading-digit mistake when padding their expense accounts anymore.

Since I’ve done you that favor, anyone out there think they can pick me up at the dealer’s Thursday, maybe Friday? Thanks in advance.

## The Summer 2017 Mathematics A To Z: Arithmetic

And now as summer (United States edition) reaches its closing months I plunge into the fourth of my A To Z mathematics-glossary sequences. I hope I know what I’m doing! Today’s request is one of several from Gaurish, who’s got to be my top requester for mathematical terms and whom I thank for it. It’s a lot easier writing these things when I don’t have to think up topics. Gaurish hosts a fine blog, For the love of Mathematics, which you might consider reading.

# Arithmetic.

Arithmetic is what people who aren’t mathematicians figure mathematicians do all day. I remember in my childhood a Berenstain Bears book about people’s jobs. Its mathematician was an adorable little bear adding up sums on the chalkboard, in an observatory, on the Moon. I liked every part of this. I wouldn’t say it’s the whole reason I became a mathematician but it did make the prospect look good early on.

People who aren’t mathematicians are right. At least, the bulk of what mathematics people do is arithmetic. If we work by volume. Arithmetic is about the calculations we do to evaluate or solve polynomials. And polynomials are everything that humans find interesting. Arithmetic is the work of adding and subtracting, of multiplication and division, of taking powers and taking roots. Arithmetic is changing the units of a thing, and breaking something into several smaller units, or merging several smaller units into one big one. Arithmetic’s role in commerce and in finance must overwhelm the higher mathematics. Higher mathematics offers cohomologies and Ricci tensors. Arithmetic offers a budget.

This is old mathematics. There’s evidence of humans twenty thousand years ago recording their arithmetic computations. My understanding is the evidence is ambiguous and interpretations vary. This seems fair. I assume that humans did such arithmetic then, granting that I do not know how to interpret archeological evidence. The thing is that arithmetic is older than humans. Animals are able to count, to do addition and subtraction, perhaps to do harder computations. (I crib this from The Number Sense: How the Mind Creates Mathematics, by Stanislas Dehaene.) We learn it first, refining our rough instinctively developed sense to something rigorous. At least we learn it at the same time we learn geometry, the other branch of mathematics that must predate human existence.

The primality of arithmetic governs how it becomes an adjective. We will have, for example, the “arithmetic progression” of terms in a sequence. This is a sequence of numbers such as 1, 3, 5, 7, 9, and so on. Or 4, 9, 14, 19, 24, 29, and so on. The difference between one term and its successor is the same as the difference between the predecessor and this term. Or we speak of the “arithmetic mean”. This is the one found by adding together all the numbers of a sample and dividing by the number of terms in the sample. These are important concepts, useful concepts. They are among the first concepts we have when we think of a thing. Their familiarity makes them easy tools to overlook.
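Both ideas are short enough to write down as code. This is a minimal sketch; the function names are mine, and the checks use the sequences mentioned above plus one non-example.

```python
def is_arithmetic_progression(terms):
    """True when every consecutive pair of terms differs by the same amount."""
    differences = {b - a for a, b in zip(terms, terms[1:])}
    return len(differences) <= 1


def arithmetic_mean(sample):
    """Add up the sample and divide by how many terms it has."""
    return sum(sample) / len(sample)


print(is_arithmetic_progression([1, 3, 5, 7, 9]))
print(is_arithmetic_progression([4, 9, 14, 19, 24, 29]))
print(is_arithmetic_progression([1, 2, 4, 8]))  # doubling, not a fixed step
print(arithmetic_mean([4, 9, 14, 19, 24, 29]))
```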

Consider the Fundamental Theorem of Arithmetic. There are many Fundamental Theorems; that of Algebra guarantees us the number of roots of a polynomial equation. That of Calculus guarantees us that derivatives and integrals are joined concepts. The Fundamental Theorem of Arithmetic tells us that every whole number greater than one is equal to one and only one product of prime numbers. If a number is equal to (say) two times two times thirteen times nineteen, it cannot also be equal to (say) five times eleven times seventeen. This may seem uncontroversial. The budding mathematician will convince herself it’s so by trying to work out all the ways to write 60 as the product of prime numbers. It’s hard to imagine mathematics for which it isn’t true.
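For the reader who’d rather let a machine do the budding mathematician’s homework, here is a minimal sketch using plain trial division. It’s one simple way to find the factors, fine for small numbers and hopeless for the giant ones number theorists care about. The function name `prime_factors` is my own.

```python
def prime_factors(n):
    """Return the prime factors of a whole number n > 1, smallest first."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        # Pull out every copy of this divisor before moving on.
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever is left over has no smaller factor, so it is prime.
        factors.append(n)
    return factors


print(prime_factors(60))
print(prime_factors(2 * 2 * 13 * 19))
print(prime_factors(5 * 11 * 17))
```

However you shuffle the order of multiplication, 60 only ever comes out as two, two, three, and five.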

But it needn’t be true. As we study why arithmetic works we discover many strange things. This mathematics that we know even without learning is sophisticated. To build a logical justification for it requires a theory of sets and hundreds of pages of tight reasoning. Or a theory of categories and I don’t even know how much reasoning. The thing that is obvious from putting a couple objects on a table and then a couple more is hard to prove.

As we continue studying arithmetic we start to ponder things like Goldbach’s Conjecture, about even numbers (other than two) being the sum of exactly two prime numbers. This brings us into number theory, a land of fascinating problems. Many of them are so accessible you could pose them to a person while waiting in a fast-food line. This befits a field that grows out of such simple stuff. Many of those are so hard to answer that no person knows whether they are true, or are false, or are even answerable.
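Goldbach’s Conjecture is easy to test for small cases, which is part of its charm. The sketch below hunts for a pair of primes adding up to a given even number. The names `is_prime` and `goldbach_pair` are mine, and checking a few hundred cases is evidence, not proof.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def goldbach_pair(even_number):
    """Return primes (p, q) with p + q == even_number, or None if none found."""
    for p in range(2, even_number // 2 + 1):
        if is_prime(p) and is_prime(even_number - p):
            return p, even_number - p
    return None


for n in (4, 28, 100):
    print(n, goldbach_pair(n))

# Every even number from 4 through 1000 turns up at least one pair.
print(all(goldbach_pair(n) is not None for n in range(4, 1001, 2)))
```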

And it splits off other ideas. Arithmetic starts, at least, with the counting numbers. It moves into the whole numbers and soon all the integers. With division we soon get rational numbers. With roots we soon get certain irrational numbers. A close study of this implies there are irrational numbers that must exist, at least as much as “four” exists. Yet they can’t be reached by studying polynomials. Not polynomials that don’t already use these exotic irrational numbers. These are transcendental numbers. If we were to say the transcendental numbers were the only real numbers we would be making only a very slight mistake. We learn they exist by thinking long enough and deep enough about arithmetic to realize there must be more there than we realized.

Thought compounds thought. The integers and the rational numbers and the real numbers have a structure. They interact in certain ways. We can look for things that are not numbers, but which follow rules like that for addition and for multiplication. Sometimes even for powers and for roots. Some of these can be strange: polynomials themselves, for example, follow rules like those of arithmetic. Matrices, which we can represent as grids of numbers, can have powers and even something like roots. Arithmetic is inspiration to finding mathematical structures that look little like our arithmetic. We can find things that follow mathematical operations but which don’t have a Fundamental Theorem of Arithmetic.
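A standard example of such a thing, and this is my choice of illustration, is the set of numbers a + b√-5, where ‘a’ and ‘b’ are integers. In the sketch below each such number is stored as the pair (a, b). The helper names `multiply` and `norm` are made up for this sketch. It shows six factoring two different ways. Then it checks that no element has norm 2 or 3. Norms multiply, and the four factors have norms 4, 9, 6, and 6, so that check is why none of them can be broken down further.

```python
def multiply(x, y):
    """Multiply a + b*sqrt(-5) by c + d*sqrt(-5), each stored as a pair."""
    a, b = x
    c, d = y
    # sqrt(-5) squared is -5, which is where the -5*b*d term comes from.
    return (a * c - 5 * b * d, a * d + b * c)


def norm(x):
    """The norm a^2 + 5*b^2; the norm of a product is the product of norms."""
    a, b = x
    return a * a + 5 * b * b


six_one_way = multiply((2, 0), (3, 0))        # 2 times 3
six_another_way = multiply((1, 1), (1, -1))   # (1 + sqrt(-5)) times (1 - sqrt(-5))
print(six_one_way, six_another_way)

# A proper factor of any of these four numbers would need norm 2 or 3.
# Anything with |a| > 2 or |b| > 2 has norm at least 9, so a small search
# covers every possibility.
small_norms = {norm((a, b)) for a in range(-2, 3) for b in range(-2, 3)}
print(2 in small_norms, 3 in small_norms)
```

So six has two genuinely different factorizations here, which is exactly what the Fundamental Theorem of Arithmetic forbids among the whole numbers.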

And there are more related ideas. These are often very useful. There’s modular arithmetic, in which we adjust the rules of addition and multiplication so that we can work with a finite set of numbers. There’s floating point arithmetic, in which we set machines to do our calculations. These calculations are no longer precise. But they are fast, and reliable, and that is often what we need.
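Both are easy to poke at. A minimal sketch: clock arithmetic with twelve hours for the modular case, and the well-known 0.1 plus 0.2 surprise for the floating point case. The clock is my choice of example.

```python
import math

# Modular arithmetic: a 12-hour clock keeps only the remainders 0 through 11.
print((9 + 5) % 12)   # five hours after nine o'clock
print((7 * 8) % 12)

# Floating point arithmetic: fast, but each result is rounded to the
# nearest value the machine can store, so tiny errors creep in.
total = 0.1 + 0.2
print(total == 0.3)              # exact comparison is fragile
print(math.isclose(total, 0.3))  # comparing within a tolerance is safer
```

The usual habit with floating point is the one in the last line: ask whether two numbers are close enough instead of whether they are equal.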

So arithmetic is what people who aren’t mathematicians figure mathematicians do all day. And they are mistaken, but not by much. Arithmetic gives us an idea of what mathematics we can hope to understand. So it structures the way we think about mathematics.