What I Learned Doing The End 2016 Mathematics A To Z


The slightest thing I learned in the most recent set of essays is that I somehow slid from the descriptive “End Of 2016” title to the prescriptive “End 2016” identifier for the series. My unscientific survey suggests that most people would agree that we had too much 2016 and would have been better off doing without it altogether. So it goes.

The most important thing I learned about this is I have to pace things better. The A To Z essays have been creeping up in length. I didn’t keep close track of their lengths but I don’t think any of them came in under a thousand words. 1500 words was more common. And that’s fine enough, but at three per week, plus the Reading the Comics posts, that’s 5500 or 6000 words of mathematics alone. And that before getting to my humor blog, which even on a brief week will be a couple thousand words. I understand in retrospect why November and December felt like I didn’t have any time outside the word mines.

I’m not bothered by writing longer essays, mind. I can apparently go on at any length on any subject. And I like the words I’ve been using. My suspicion is between these A To Zs and the Theorem Thursdays over the summer I’ve found a mode for writing pop mathematics that works for me. It’s just a matter of how to balance workloads. The humor blog has gotten consistently better readership, for the obvious reasons (lately I’ve been trying to explain what the story comics are doing), but the mathematics more satisfying. If I should have to cut back on either it’d be the humor blog that gets the cut first.

Another little discovery is that I can swap out equations and formulas and the like for historical discussion. That’s probably a useful tradeoff for most of my readers. And it plays to my natural tendencies. It is very easy to imagine me having gone into history than into mathematics or science. It makes me aware how mediocre my knowledge of mathematics history is, though. For example, several times in the End 2016 A To Z the Crisis of Foundations came up, directly or in passing. But I’ve never read a proper history, not even a basic essay, about the Crisis. I don’t even know of a good description of this important-to-the-field event. Most mathematics history focuses around biographies of a few figures, often cribbed from Eric Temple Bell’s great but unreliable book, or a couple of famous specific incidents. (Newton versus Leibniz, the bridges of Köningsburg, Cantor’s insanity, Gödel’s citizenship exam.) Plus Bourbaki.

That’s not enough for someone taking the subject seriously, and I do mean to. So if someone has a suggestion for good histories of, for example, how Fourier series affected mathematicians’ understanding of what functions are, I’d love to know it. Maybe I should set that as a standing open request.

In looking over the subjects I wrote about I find a pretty strong mix of group theory and real analysis. Maybe that shouldn’t surprise. Those are two of the maybe three legs that form a mathematics major’s education. So anyone wanting to understand mathematicians would see this stuff and have questions about it. (There are more things mathematics majors learn, but there are a handful of things almost any mathematics major is sure to spend a year being baffled by.)

The third leg, I’d say, is differential equations. That’s a fantastic field, but it’s hard to describe without equations. Also pictures of what the equations imply. I’ve tended towards essays with few equations and pictures. That’s my laziness. Equations are best written in LaTeX, a typesetting tool that might as well be the standard for mathematicians writing papers and books. While WordPress supports a bit of LaTeX it isn’t quite effortless. That comes back around to balancing my workload. I do that a little better and I can explain solving first-order differential equations by integrating factors. (This is a prank. Nobody has ever needed to solve a first-order differential equation by integrating factors except for mathematics majors being taught the method.) But maybe I could make a go of that.

I’m not setting any particular date for the next A-To-Z, or similar, project. I need some time to recuperate. And maybe some time to think of other running projects that would be fun or educational for me. There’ll be something, though.

The End 2016 Mathematics A To Z Roundup


As is my tradition for the end of these roundups (see Summer 2015 and then Leap Day 2016) I want to just put up a page listing the whole set of articles. It’s a chance for people who missed a piece to easily see what they missed. And it lets me recover that little bit extra from the experience. Run over the past two months were:

The End 2016 Mathematics A To Z: Zermelo-Fraenkel Axioms


gaurish gave me a choice for the Z-term to finish off the End 2016 A To Z. I appreciate it. I’m picking the more abstract thing because I’m not sure that I can explain zero briefly. The foundations of mathematics are a lot easier.

Zermelo-Fraenkel Axioms

I remember the look on my father’s face when I asked if he’d tell me what he knew about sets. He misheard what I was asking about. When we had that straightened out my father admitted that he didn’t know anything particular. I thanked him and went off disappointed. In hindsight, I kind of understand why everyone treated me like that in middle school.

My father’s always quick to dismiss how much mathematics he knows, or could understand. It’s a common habit. But in this case he was probably right. I knew a bit about set theory as a kid because I came to mathematics late in the “New Math” wave. Sets were seen as fundamental to why mathematics worked without being so exotic that kids couldn’t understand them. Perhaps so; both my love and I delighted in what we got of set theory as kids. But if you grew up before that stuff was popular you probably had a vague, intuitive, and imprecise idea of what sets were. Mathematicians had only a vague, intuitive, and imprecise idea of what sets were through to the late 19th century.

And then came what mathematics majors hear of as the Crisis of Foundations. (Or a similar name, like Foundational Crisis. I suspect there are dialect differences here.) It reflected mathematics taking seriously one of its ideals: that everything in it could be deduced from clearly stated axioms and definitions using logically rigorous arguments. As often happens, taking one’s ideals seriously produces great turmoil and strife.

Before about 1900 we could get away with saying that a set was a bunch of things which all satisfied some description. That’s how I would describe it to a new acquaintance if I didn’t want to be treated like I was in middle school. The definition is fine if we don’t look at it too hard. “The set of all roots of this polynomial”. “The set of all rectangles with area 2”. “The set of all animals with four-fingered front paws”. “The set of all houses in Central New Jersey that are yellow”. That’s all fine.

And then if we try to be logically rigorous we get problems. We always did, though. They’re embodied by ancient jokes like the person from Crete who declared that all Cretans always lie; is the statement true? Or the slightly less ancient joke about the barber who shaves only the men who do not shave themselves; does he shave himself? If not jokes these should at least be puzzles faced in fairy-tale quests. Logicians dressed this up some. Bertrand Russell gave us the quite respectable “The set consisting of all sets which are not members of themselves”, and asked us to stare hard into that set. To this we have only one logical response, which is to shout, “Look at that big, distracting thing!” and run away. This satisfies the problem only for a while.

The while ended in — well, that took a while too. But between 1908 and the early 1920s Ernst Zermelo, Abraham Fraenkel, and Thoralf Skolem paused from arguing whose name would also be the best indie rock band name long enough to put set theory right. Their structure is known as Zermelo-Fraenkel Set Theory, or ZF. It gives us a reliable base for set theory that avoids any contradictions or catastrophic pitfalls. Or does so far as we have found in a century of work.

It’s built on a set of axioms, of course. Most of them are uncontroversial, things like declaring two sets are equivalent if they have the same elements. Declaring that the union of sets is itself a set. Obvious, sure, but it’s the obvious things that we have to make axioms. Maybe you could start an argument about whether we should just assume there exists some infinitely large set. But if we’re aware sets probably have something to teach us about numbers, and that numbers can get infinitely large, then it seems fair to suppose that there must be some infinitely large set. The axioms that aren’t simple obvious things like that are too useful to do without. They assume stuff like that no set is an element of itself. Or that every set has a “power set”, a new set comprised of all the subsets of the original set. Good stuff to know.

There is one axiom that’s controversial. Not controversial the way Euclid’s Parallel Postulate was. That’s the ugly one about lines crossing another line meeting on the same side they make angles smaller than something something or other. That axiom was controversial because it read so weird, so needlessly complicated. (It isn’t; it’s exactly as complicated as it must be. Or better, it’s as simple as it could possibly be and still be useful.) The controversial axiom of Zermelo-Fraenkel Set Theory is known as the Axiom of Choice. It says if we have a collection of mutually disjoint sets, each with at least one thing in them, then it’s possible to pick exactly one item from each of the sets.

It’s impossible to dispute this is what we have axioms for. It’s about something that feels like it should be obvious: we can always pick something from a set. How could this not be true?

If it is true, though, we get some unsavory conclusions. For example, it becomes possible to take a ball the size of an orange and slice it up. We slice using mathematical blades. They’re not halted by something as petty as the desire not to slice atoms down the middle. We can reassemble the pieces. Into two balls. And worse, it doesn’t require we do something like cut the orange into infinitely many pieces. We expect crazy things to happen when we let infinities get involved. No, though, we can do this cut-and-duplicate thing by cutting the orange into five pieces. When you hear that it’s hard to know whether to point to the big, distracting thing and run away. If we dump the Axiom of Choice we don’t have that problem. But can we do anything useful without the ability to make a choice like that?

And we’ve learned that we can. If we want to use the Zermelo-Fraenkel Set Theory with the Axiom of Choice we say we were working in “ZFC”, Zermelo-Fraenkel-with-Choice. We don’t have to. If we don’t want to make any assumption about choices we say we’re working in “ZF”. Which to use depends on what one wants to use.

Either way Zermelo and Fraenkel and Skolem established set theory on the foundation we use to this day. We’re not required to use them, no; there’s a construction called von Neumann-Bernays-Gödel Set Theory that’s supposed to be more elegant. They didn’t mention it in my logic classes that I remember, though.

And still there’s important stuff we would like to know which even ZFC can’t answer. The most famous of these is the continuum hypothesis. Everyone knows — excuse me. That’s wrong. Everyone who would be reading a pop mathematics blog knows there are different-sized infinitely-large sets. And knows that the set of integers is smaller than the set of real numbers. The question is: is there a set bigger than the integers yet smaller than the real numbers? The Continuum Hypothesis says there is not.

Zermelo-Fraenkel Set Theory, even though it’s all about the properties of sets, can’t tell us if the Continuum Hypothesis is true. But that’s all right; it can’t tell us if it’s false, either. Whether the Continuum Hypothesis is true or false stands independent of the rest of the theory. We can assume whichever state is more useful for our work.

Back to the ideals of mathematics. One question that produced the Crisis of Foundations was consistency. How do we know our axioms don’t contain a contradiction? It’s hard to say. Typically a set of axioms we can prove consistent are also a set too boring to do anything useful in. Zermelo-Fraenkel Set Theory, with or without the Axiom of Choice, has a lot of interesting results. Do we know the axioms are consistent?

No, not yet. We know some of the axioms are mutually consistent, at least. And we have some results which, if true, would prove the axioms to be consistent. We don’t know if they’re true. Mathematicians are generally confident that these axioms are consistent. Mostly on the grounds that if there were a problem something would have turned up by now. It’s withstood all the obvious faults. But the universe is vaster than we imagine. We could be wrong.

It’s hard to live up to our ideals. After a generation of valiant struggling we settle into hoping we’re doing good enough. And waiting for some brilliant mind that can get us a bit closer to what we ought to be.

The End 2016 Mathematics A To Z: Yang Hui’s Triangle


Today’s is another request from gaurish and another I’m glad to have as it let me learn things too. That’s a particularly fun kind of essay to have here.

Yang Hui’s Triangle.

It’s a triangle. Not because we’re interested in triangles, but because it’s a particularly good way to organize what we’re doing and show why we do that. We’re making an arrangement of numbers. First we need cells to put the numbers in.

Start with a single cell in what’ll be the top middle of the triangle. It spreads out in rows beneath that. The rows are staggered. The second row has two cells, each one-half width to the side of the starting one. The third row has three cells, each one-half width to the sides of the row above, so that its center cell is directly under the original one. The fourth row has four cells, two of which are exactly underneath the cells of the second row. The fifth row has five cells, three of them directly underneath the third row’s cells. And so on. You know the pattern. It’s the one that pins in a plinko board take. Just trimmed down to a triangle. Make as many rows as you find interesting. You can always add more later.

In the top cell goes the number ‘1’. There’s also a ‘1’ in the leftmost cell of each row, and a ‘1’ in the rightmost cell of each row.

What of interior cells? The number for those we work out by looking to the row above. Take the cells to the immediate left and right of it. Add the values of those together. So for example the center cell in the third row will be ‘1’ plus ‘1’, commonly regarded as ‘2’. In the third row the leftmost cell is ‘1’; it always is. The next cell over will be ‘1’ plus ‘2’, from the row above. That’s ‘3’. The cell next to that will be ‘2’ plus ‘1’, a subtly different ‘3’. And the last cell in the row is ‘1’ because it always is. In the fourth row we get, starting from the left, ‘1’, ‘4’, ‘6’, ‘4’, and ‘1’. And so on.

It’s a neat little arithmetic project. It has useful application beyond the joy of making something neat. Many neat little arithmetic projects don’t have that. But the numbers in each row give us binomial coefficients, which we often want to know. That is, if we wanted to work out (a + b) to, say, the third power, we would know what it looks like from looking at the fourth row of Yanghui’s Triangle. It will be 1\cdot a^4 + 4\cdot a^3 \cdot b^1 + 6\cdot a^2\cdot b^2 + 4\cdot a^1\cdot b^3 + 1\cdot b^4 . This turns up in polynomials all the time.

Look at diagonals. By diagonal here I mean a line parallel to the line of ‘1’s. Left side or right side; it doesn’t matter. Yang Hui’s triangle is bilaterally symmetric around its center. The first diagonal under the edges is a bit boring but familiar enough: 1-2-3-4-5-6-7-et cetera. The second diagonal is more curious: 1-3-6-10-15-21-28 and so on. You’ve seen those numbers before. They’re called the triangular numbers. They’re the number of dots you need to make a uniformly spaced, staggered-row triangle. Doodle a bit and you’ll see. Or play with coins or pool balls.

The third diagonal looks more arbitrary yet: 1-4-10-20-35-56-84 and on. But these are something too. They’re the tetrahedronal numbers. They’re the number of things you need to make a tetrahedron. Try it out with a couple of balls. Oranges if you’re bored at the grocer’s. Four, ten, twenty, these make a nice stack. The fourth diagonal is a bunch of numbers I never paid attention to before. 1-5-15-35-70-126-210 and so on. This is — well. We just did tetrahedrons, the triangular arrangement of three-dimensional balls. Before that we did triangles, the triangular arrangement of two-dimensional discs. Do you want to put in a guess what these “pentatope numbers” are about? Sure, but you hardly need to. If we’ve got a bunch of four-dimensional hyperspheres and want to stack them in a neat triangular pile we need one, or five, or fifteen, or so on to make the pile come out neat. You can guess what might be in the fifth diagonal. I don’t want to think too hard about making triangular heaps of five-dimensional hyperspheres.

There’s more stuff lurking in here, waiting to be decoded. Add the numbers of, say, row four up and you get two raised to the third power. Add the numbers of row ten up and you get two raised to the ninth power. You see the pattern. Add everything in, say, the top five rows together and you get the fifth Mersenne number, two raised to the fifth power (32) minus one (31, when we’re done). Add everything in the top ten rows together and you get the tenth Mersenne number, two raised to the tenth power (1024) minus one (1023).

Or add together things on “shallow diagonals”. Start from a ‘1’ on the outer edge. I’m going to suppose you started on the left edge, but remember symmetry; it’ll be fine if you go from the right instead. Add to that ‘1’ the number you get by moving one cell to the right and going up-and-right. And then again, go one cell to the right and then one cell up-and-right. And again and again, until you run out of cells. You get the Fibonacci sequence, 1-1-2-3-5-8-13-21-and so on.

We can even make an astounding picture from this. Take the cells of Yang Hui’s triangle. Color them in. One shade if the cell has an odd number, another if the cell has an even number. It will create a pattern we know as the Sierpiński Triangle. (Wacław Sierpiński is proving to be the surprise special guest star in many of this A To Z sequence’s essays.) That’s the fractal of a triangle subdivided into four triangles with the center one knocked out, and the remaining triangles them subdivided into four triangles with the center knocked out, and on and on.

By now I imagine even my most skeptical readers agree this is an interesting, useful mathematical construct. Also that they’re wondering why I haven’t said the name “Blaise Pascal”. The Western mathematical tradition knows of this from Pascal’s work, particularly his 1653 Traité du triangle arithmétique. But mathematicians like to say their work is universal, and independent of the mere human beings who find it. Constructions like this triangle give support to this. Yang lived in China, in the 12th century. I imagine it possible Pascal had hard of his work or been influenced by it, by some chain, but I know of no evidence that he did.

And even if he had, there are other apparently independent inventions. The Avanti Indian astronomer-mathematician-astrologer Varāhamihira described the addition rule which makes the triangle work in commentaries written around the year 500. Omar Khayyám, who keeps appearing in the history of science and mathematics, wrote about the triangle in his 1070 Treatise on Demonstration of Problems of Algebra. Again so far as I am aware there’s not a direct link between any of these discoveries. They are things different people in different traditions found because the tools — arithmetic and aesthetically-pleasing orders of things — were ready for them.

Yang Hui wrote about his triangle in the 1261 book Xiangjie Jiuzhang Suanfa. In it he credits the use of the triangle (for finding roots) as invented around 1100 by mathematician Jia Xian. This reminds us that it is not merely mathematical discoveries that are found by many peoples at many times and places. So is Boyer’s Law, discovered by Hubert Kennedy.

The End 2016 Mathematics A To Z: Xi Function


I have today another request from gaurish, who’s also been good enough to give me requests for ‘Y’ and ‘Z’. I apologize for coming to this a day late. But it was Christmas and many things demanded my attention.

Xi Function.

We start with complex-valued numbers. People discovered them because they were useful tools to solve polynomials. They turned out to be more than useful fictions, if numbers are anything more than useful fictions. We can add and subtract them easily. Multiply and divide them less easily. We can even raise them to powers, or raise numbers to them.

If you become a mathematics major then somewhere in Intro to Complex Analysis you’re introduced to an exotic, infinitely large sum. It’s spoken of reverently as the Riemann Zeta Function, and it connects to something named the Riemann Hypothesis. Then you remember that you’ve heard of this, because if you’re willing to become a mathematics major you’ve read mathematics popularizations. And you know the Riemann Hypothesis is an unsolved problem. It proposes something that might be true or might be false. Either way has astounding implications for the way numbers fit together.

Riemann here is Bernard Riemann, who’s turned up often in these A To Z sequences. We saw him in spheres and in sums, leading to integrals. We’ll see him again. Riemann just covered so much of 19th century mathematics; we can’t talk about calculus without him. Zeta, Xi, and later on, Gamma are the famous Greek letters. Mathematicians fall back on them because the Roman alphabet just hasn’t got enough letters for our needs. I’m writing them out as English words instead because if you aren’t familiar with them they look like an indistinct set of squiggles. Even if you are familiar, sometimes. I got confused in researching this some because I did slip between a lowercase-xi and a lowercase-zeta in my mind. All I can plead is it’s been a hard week.

Riemann’s Zeta function is famous. It’s easy to approach. You can write it as a sum. An infinite sum, but still, those are easy to understand. Pick a complex-valued number. I’ll call it ‘s’ because that’s the standard. Next take each of the counting numbers: 1, 2, 3, and so on. Raise each of them to the power ‘s’. And take the reciprocal, one divided by those numbers. Add all that together. You’ll get something. Might be real. Might be complex-valued. Might be zero. We know many values of ‘s’ what would give us a zero. The Riemann Hypothesis is about characterizing all the possible values of ‘s’ that give us a zero. We know some of them, so boring we call them trivial: -2, -4, -6, -8, and so on. (This looks crazy. There’s another way of writing the Riemann Zeta function which makes it obvious instead.) The Riemann Hypothesis is about whether all the proper, that is, non-boring values of ‘s’ that give us a zero are 1/2 plus some imaginary number.

It’s a rare thing mathematicians have only one way of writing. If something’s been known and studied for a long time there are usually variations. We find different ways to write the problem. Or we find different problems which, if solved, would solve the original problem. The Riemann Xi function is an example of this.

I’m going to spare you the formula for it. That’s in self-defense. I haven’t found an expression of the Xi function that isn’t a mess. The normal ways to write it themselves call on the Zeta function, as well as the Gamma function. The Gamma function looks like factorials, for the counting numbers. It does its own thing for other complex-valued numbers.

That said, I’m not sure what the advantages are in looking at the Xi function. The one that people talk about is its symmetry. Its value at a particular complex-valued number ‘s’ is the same as its value at the number ‘1 – s’. This may not seem like much. But it gives us this way of rewriting the Riemann Hypothesis. Imagine all the complex-valued numbers with the same imaginary part. That is, all the numbers that we could write as, say, ‘x + 4i’, where ‘x’ is some real number. If the size of the value of Xi, evaluated at ‘x + 4i’, always increases as ‘x’ starts out equal to 1/2 and increases, then the Riemann hypothesis is true. (This has to be true not just for ‘x + 4i’, but for all possible imaginary numbers. So, ‘x + 5i’, and ‘x + 6i’, and even ‘x + 4.1 i’ and so on. But it’s easier to start with a single example.)

Or another way to write it. Suppose the size of the value of Xi, evaluated at ‘x + 4i’ (or whatever), always gets smaller as ‘x’ starts out at a negative infinitely large number and keeps increasing all the way to 1/2. If that’s true, and true for every imaginary number, including ‘x – i’, then the Riemann hypothesis is true.

And it turns out if the Riemann hypothesis is true we can prove the two cases above. We’d write the theorem about this in our papers with the start ‘The Following Are Equivalent’. In our notes we’d write ‘TFAE’, which is just as good. Then we’d take which ever of them seemed easiest to prove and find out it isn’t that easy after all. But if we do get through we declare ourselves fortunate, sit back feeling triumphant, and consider going out somewhere to celebrate. But we haven’t got any of these alternatives solved yet. None of the equivalent ways to write it has helped so far.

We know some some things. For example, we know there are infinitely many roots for the Xi function with a real part that’s 1/2. This is what we’d need for the Riemann hypothesis to be true. But we don’t know that all of them are.

The Xi function isn’t entirely about what it can tell us for the Zeta function. The Xi function has its own exotic and wonderful properties. In a 2009 paper on arxiv.org, for example, Drs Yang-Hui He, Vishnu Jejjala, and Djordje Minic describe how if the zeroes of the Xi function are all exactly where we expect them to be then we learn something about a particular kind of string theory. I admit not knowing just what to say about a genus-one free energy of the topological string past what I have read in this paper. In another paper they write of how the zeroes of the Xi function correspond to the description of the behavior for a quantum-mechanical operator that I just can’t find a way to describe clearly in under three thousand words.

But mathematicians often speak of the strangeness that mathematical constructs can match reality so well. And here is surely a powerful one. We learned of the Riemann Hypothesis originally by studying how many prime numbers there are compared to the counting numbers. If it’s true, then the physics of the universe may be set up one particular way. Is that not astounding?

The End 2016 Mathematics A To Z: Weierstrass Function


I’ve teased this one before.

Weierstrass Function.

So you know how the Earth is a sphere, but from our normal vantage point right up close to its surface it looks flat? That happens with functions too. Here I mean the normal kinds of functions we deal with, ones with domains that are the real numbers or a Euclidean space. And ranges that are real numbers. The functions you can draw on a sheet of paper with some wiggly bits. Let the function wiggle as much as you want. Pick a part of it and zoom in close. That zoomed-in part will look straight. If it doesn’t look straight, zoom in closer.

We rely on this. Functions that are straight, or at least straight enough, are easy to work with. We can do calculus on them. We can do analysis on them. Functions with plots that look like straight lines are easy to work with. Often the best approach to working with the function you’re interested in is to approximate it with an easy-to-work-with function. I bet it’ll be a polynomial. That serves us well. Polynomials are these continuous functions. They’re differentiable. They’re smooth.

That thing about the Earth looking flat, though? That’s a lie. I’ve never been to any of the really great cuts in the Earth’s surface, but I have been to some decent gorges. I went to grad school in the Hudson River Valley. I’ve driven I-80 over Pennsylvania’s scariest bridges. There’s points where the surface of the Earth just drops a great distance between your one footstep and your last.

Functions do that too. We can have points where a function isn’t differentiable, where it’s impossible to define the direction it’s headed. We can have points where a function isn’t continuous, where it jumps from one region of values to another region. Everyone knows this. We can’t dismiss those as abberations not worthy of the name “function”; too many of them are too useful. Typically we handle this by admitting there’s points that aren’t continuous and we chop the function up. We make it into a couple of functions, each stretching from discontinuity to discontinuity. Between them we have continuous region and we can go about our business as before.

Then came the 19th century when things got crazy. This particular craziness we credit to Karl Weierstrass. Weierstrass’s name is all over 19th century analysis. He had that talent for probing the limits of our intuition about basic mathematical ideas. We have a calculus that is logically rigorous because he found great counterexamples to what we had assumed without proving.

The Weierstrass function challenges this idea that any function is going to eventually level out. Or that we can even smooth a function out into basically straight, predictable chunks in-between sudden changes of direction. The function is continuous everywhere; you can draw it perfectly without lifting your pen from paper. But it always looks like a zig-zag pattern, jumping around like it was always randomly deciding whether to go up or down next. Zoom in on any patch and it still jumps around, zig-zagging up and down. There’s never an interval where it’s always moving up, or always moving down, or even just staying constant.

Despite being continuous it’s not differentiable. I’ve described that casually as it being impossible to predict where the function is going. That’s an abuse of words, yes. The function is defined. Its value at a point isn’t any more random than the value of “x2” is for any particular x. The unpredictability I’m talking about here is a side effect of ignorance. Imagine I showed you a plot of “x2” with a part of it concealed and asked you to fill in the gap. You’d probably do pretty well estimating it. The Weierstrass function, though? No; your guess would be lousy. My guess would be lousy too.

That’s a weird thing to have happen. A century and a half later it’s still weird. It gets weirder. The Weierstrass function isn’t differentiable generally. But there are exceptions. There are little dots of differentiability, where the rate at which the function changes is known. Not intervals, though. Single points. This is crazy. Derivatives are about how a function changes. We work out what they should even mean by thinking of a function’s value on strips of the domain. Those strips are small, but they’re still, you know, strips. But on almost all of that strip the derivative isn’t defined. It’s only at isolated points, a set with measure zero, that this derivative even exists. It evokes the medieval Mysteries, of how we are supposed to try, even though we know we shall fail, to understand how God can have contradictory properties.

It’s not quite that Mysterious here. Properties like this challenge our intuition, if we’ve gotten any. Once we’ve laid out good definitions for ideas like “derivative” and “continuous” and “limit” and “function” we can work out whether results like this make sense. And they — well, they follow. We can avoid weird conclusions like this, but at the cost of messing up our definitions for what a “function” and other things are. Making those useless. For the mathematical world to make sense, we have to change our idea of what quite makes sense.

That’s all right. When we look close we realize the Earth around us is never flat. Even reasonably flat areas have slight rises and falls. The ends of properties are marked with curbs or ditches, and bordered by streets that rise to a center. Look closely even at the dirt and we notice that as level as it gets there are still rocks and scratches in the ground, clumps of dirt an infinitesimal bit higher here and lower there. The flatness of the Earth around us is a useful tool, but we miss a lot by pretending it’s everything. The Weierstrass function is one of the ways a student mathematician learns that while smooth, predictable functions are essential, there is much more out there.

The End 2016 Mathematics A To Z: Voronoi Diagram


This is one I never heard of before grad school. And not my first year in grad school either; I was pretty well past the point I should’ve been out of grad school before I remember hearing of it, somehow. I can’t explain that.

Voronoi Diagram.

Take a sheet of paper. Draw two dots on it. Anywhere you like. It’s your paper. But here’s the obvious thing: you can divide the paper into the parts of it that are nearer to the first, or that are nearer to the second. Yes, yes, I see you saying there’s also a line that’s exactly the same distance between the two and shouldn’t that be a third part? Fine, go ahead. We’ll be drawing that in anyway. But here we’ve got a piece of paper and two dots and this line dividing it into two chunks.

Now drop in a third point. Now every point on your paper might be closer to the first, or closer to the second, or closer to the third. Or, yeah, it might be on an edge equidistant between two of those points. Maybe even equidistant to all three points. It’s not guaranteed there is such a “triple point”, but if you weren’t picking points to cause trouble there probably is. You get the page divided up into three regions that you say are coming together in a triangle before realizing that no, it’s a Y intersection. Or else the regions are three strips and they don’t come together at all.

What if you have four points … You should get four regions. They might all come together in one grand intersection. Or they might come together at weird angles, two and three regions touching each other. You might get a weird one where there’s a triangle in the center and three regions that go off to the edge of the paper. Or all sorts of fun little abstract flag icons, maybe. It’s hard to say. If we had, say, 26 points all sorts of weird things could happen.

These weird things are Voronoi Diagrams. They’re a partition of some surface. Usually it’s a plane or some well-behaved subset of the plane like a sheet of paper. The partitioning is into polygons. Exactly one of the points you start with is inside each of the polygons. And everything else inside that polygon is nearer to its one contained starting point than it is any other point. All you need for the diagram are your original points and the edges dividing spots between them. But the thing begs to be colored. Give in to it and you have your own, abstract, stained-glass window pattern. So I’m glad to give you some useful mathematics to play with.

Voronoi diagrams turn up naturally whenever you want to divide up space by the shortest route to get something. Sometimes this is literally so. For example, a radio picking up two FM signals will switch to the stronger of the two. That’s what the superheterodyne does. If the two signals are transmitted with equal strength, then the receiver will pick up on whichever the nearer signal is. And unless the other mathematicians who’ve talked about this were just as misinformed, cell phones pick which signal tower to communicate with by which one has the stronger signal. If you could look at what tower your cell phone communicates with as you move around, you would produce a Voronoi diagram of cell phone towers in your area.

Mathematicians hoping to get credit for a good thing may also bring up Dr John Snow’s famous halting of an 1854 cholera epidemic in London. He did this by tracking cholera outbreaks and measuring their proximity to public water pumps. He shut down the water pump at the center of the severest outbreak and the epidemic soon stopped. One could claim this as a triumph for Voronoi diagrams, although Snow can not have had this tool in mind. Georgy Voronoy (yes, the spelling isn’t consistent. Fashions in transliterating Eastern European names — Voronoy was Ukrainian and worked in Warsaw when Poland was part of the Russian Empire — have changed over the years) wasn’t even born until 1868. And it doesn’t require great mathematical insight to look for the things an infected population has in common. But mathematicians need some tales of heroism too. And it isn’t as though we’ve run out of epidemics with sources that need tracking down.

Voronoi diagrams turned out to be useful in my own meager research. I needed to model the flow of a fluid over a whole planet, but could only do so with a modest number of points to represent the whole thing. Scattering points over the planet was easy enough. To represent the fluid over the whole planet as a collection of single values at a couple hundred points required this Voronoi-diagram type division. … Well, it used them anyway. I suppose there might have been other ways. But I’d just learned about them and was happy to find a reason to use them. Anyway, this is the sort of technique often used to turn information about a single point into approximate information about a region.

(And I discover some amusing connections here. Voronoy’s thesis advisor was Andrey Markov, who’s the person being named by “Markov Chains”. You know those as those predictive-word things that are kind of amusing for a while. Markov Chains were part of the tool I used to scatter points over the whole planet. Also, Voronoy’s thesis was On A Generalization Of A Continuous Fraction, so, hi, Gaurish! … And one of Voronoy’s doctoral students was Wacław Sierpiński, famous for fractals and normal numbers.)

Voronoi diagrams have a lot of beauty to them. Some of it is subtle. Take a point inside its polygon and look to a neighboring polygon. Where is the representative point inside that neighbor polygon? … There’s only one place it can be. It’s got to be exactly as far as the original point is from the edge between them, and it’s got to be on the direction perpendicular to the edge between them. It’s where you’d see the reflection of the original point if the border between them were a mirror. And that has to apply to all the polygons and their neighbors.

From there it’s a short step to wondering: imagine you knew the edges. The mirrors. But you don’t know the original points. Could you figure out where the representative points must be to fit that diagram? … Or at least some points where they may be? This is the inverse problem, and it’s how I first encountered them. This inverse problem allows nice stuff like algorithm compression. Remember my description of the result of a Voronoi diagram being a stained glass window image? There’s no reason a stained glass image can’t be quite good, if we have enough points and enough gradations of color. And storing a bunch of points and the color for the region is probably less demanding than storing the color information for every point in the original image.

If we want images. Many kinds of data turn out to work pretty much like pictures, set up right.

The End 2016 Mathematics A To Z: Unlink


This is going to be a fun one. It lets me get into knot theory again.

Unlink.

An unlink is what knot theorists call that heap of loose rubber bands in that one drawer compartment.

The longer way around. It starts with knots. I love knots. If I were stronger on abstract reasoning and weaker on computation I’d have been a knot theorist. At least graph theory anyway. The mathematical idea of a knot is inspired by a string tied together. In making it a mathematical idea we perfect the string. It becomes as thin as a line, though it curves as much as we want. It can stretch out or squash down as much as we want. It slides frictionlessly against itself. Gravity doesn’t make it drop any. This removes the hassles of real-world objects from it. It also means actual strings or yarns or whatever can’t be knots anymore. Only something that’s a loop which closes back on itself can be a knot. The knot you might make in a shoelace, to use an example, could be undone by pushing the tip back through the ‘knot’. Since our mathematical string is frictionless we can do that, effortlessly. We’re left with nothing.

But you can create a pretty good approximation to a mathematical knot if you have some kind of cable that can be connected to its own end. Loop the thing around as you like, connect end to end, and you’ve got it. I recommend the glow sticks sold for people to take to parties or raves or the like. They’re fun. If you tie it up so that the string (rope, glow stick, whatever) can’t spread out into a simple O shape no matter how you shake it up (short of breaking the cable) then you have a knot. There are many of them. Trefoil knots are probably the easiest to get, but if you’re short on inspiration try looking at Celtic knot patterns.

But if the string can be shaken out until it’s a simple O shape, the sort of thing you can place flat on a table, then you have an unknot. Just from the vocabulary this you see why I like the subject so. Since this hasn’t quite got silly enough, let me assure you that an unknot is itself a kind of knot; we call it the trivial knot. It’s the knot that’s too simple to be a knot. I’m sure you were worried about that. I only hear people call it an unknot, but maybe there are heritages that prefer “trivial knot”.

So that’s knots. What happens if you have more than one thing, though? What if you have a couple of string-loops? Several cables. We know these things can happen in the real world, since we’ve looked behind the TV set or the wireless router and we know there’s somehow more cables there than there are even things to connect.

Even mathematicians wouldn’t want to ignore something that caught up with real world implications. And we don’t. We get to them after we’re pretty comfortable working with knots. Describing them, working out the theoretical tools we’d use to un-knot a proper knot (spoiler: we cut things), coming up with polynomials that describe them, that sort of thing. When we’re ready for a new trick there we consider what happens if we have several knots. We call this bundle of knots a “link”. Well, what would you call it?

A link is a collection of knots. By talking about a link we expect that at least some of the knots are going to loop around each other. This covers a lot of possibilities. We could picture one of those construction-paper chains, made of intertwined loops, that are good for elementary school craft projects to be a link. We can picture a keychain with a bunch of keys dangling from it to be a link. (Imagine each key is a knot, just made of a very fat, metal “string”. C’mon, you can give me that.) The mass of cables hiding behind the TV stand is not properly a link, since it’s not properly made out of knots. But if you can imagine taking the ends of each of those wires and looping them back to the origins, then the somehow vaster mess you get from that would be a link again.

And then we come to an “unlink”. This has two pieces. The first is that it’s a collection of knots, yes, but knots that don’t interlink. We can pull them apart without any of them tugging the others along. The second piece is that each of the knots is itself an unknot. Trivial knots. Whichever you like to call them.

The “unlink” also gets called the “trivial link”, since it’s as boring a link as you can imagine. Manifested in the real world, well, an unbroken rubber band is a pretty good unknot. And a pile of unbroken rubber bands will therefore be an unlink.

If you get into knot theory you end up trying to prove stuff about complicated knots, and complicated links. Often these are easiest to prove by chopping up the knot or the link into something simpler. Maybe you chop those smaller pieces up again. And you can’t get simpler than an unlink. If you can prove whatever you want to show for that then you’ve got a step done toward proving your whole actually interesting thing. This is why we see unknots and unlinks enough to give them names and attention.

The End 2016 Mathematics A To Z: Tree


Graph theory begins with a beautiful legend. I have no reason to suppose it’s false, except my natural suspicion of beautiful legends as origin stories. Its organization as a field is traced to 18th century Köningsburg, where seven bridges connected the banks of a river and a small island in the center. Whether it was possible to cross each bridge exactly once and get back where one started was, they say, a pleasant idle thought to ponder and path to try walking. Then Leonhard Euler solved the problem. It’s impossible.

Tree.

Graph theory arises whenever we have a bunch of things that can be connected. We call the things “vertices”, because that’s a good corner-type word. The connections we call “edges”, because that’s a good connection-type word. It’s easy to create graphs that look like the edges of a crystal, especially if you draw edges as straight as much as possible. You don’t have to. You can draw them curved. Then they look like the scary tangles of wire around your wireless router complex.

Graph theory really got organized in the 19th century, and went crazy in the 20th. It turns out there’s lots of things that connect to other things. Networks, whether computers or social or thematically linked concepts. Anything that has to be delivered from one place to another. All the interesting chemicals. Anything that could be put in a pipe or taken on a road has some graph theory thing applicable to it.

A lot of graph theory ponders loops. The original problem was about how to use every bridge, every edge, exactly one time. Look at a tangled mass of a graph and it’s hard not to start looking for loops. They’re often interesting. It’s not easy to tell if there’s a loop that lets you get to every vertex exactly once.

What if there aren’t loops? What if there aren’t any vertices you can step away from and get back to by another route? Well, then you have a tree.

A tree’s a graph where all the vertices are connected so that there aren’t any closed loops. We normally draw them with straight lines, the better to look like actual trees. We then stop trying to make them look like actual trees by doing stuff like drawing them as a long horizontal spine with a couple branches sticking off above and below, or as * type stars, or H shapes. They still correspond to real-world things. If you’re not sure how consider the layout of one of those long, single-corridor hallways as in a hotel or dormitory. The rooms connect to one another as a tree once again, as long as no room opens to anything but its own closet or bathroom or the central hallway.

We can talk about the radius of a graph. That’s how many edges away any point can be from the center of the tree. And every tree has a center. Or two centers. If it has two centers they share an edge between the two. And that’s one of the quietly amazing things about trees to me. However complicated and messy the tree might be, we can find its center. How many things allow us that?

A tree might have some special vertex. That’s called the ‘root’. It’s what the vertices and the connections represent that make a root; it’s not something inherent in the way trees look. We pick one for some special reason and then we highlight it. Maybe put it at the bottom of the drawing, making ‘root’ for once a sensible name for a mathematics thing. Often we put it at the top of the drawing, because I guess we’re just being difficult. Well, we do that because we were modelling stuff where a thing’s properties depend on what it comes from. And that puts us into thoughts of inheritance and of family trees. And weird as it is to put the root of a tree at the top, it’s also weird to put the eldest ancestors at the bottom of a family tree. People do it, but in those illuminated drawings that make a literal tree out of things. You don’t see it in family trees used for actual work, like filling up a couple pages at the start of a king or a queen’s biography.

Trees give us neat new questions to ponder, like, how many are there? I mean, if you have a certain number of vertices then how many ways are there to arrange them? One or two or three vertices all have just the one way to arrange them. Four vertices can be hooked up a whole two ways. Five vertices offer a whole three different ways to connect them. Six vertices offer six ways to connect and now we’re finally getting something interesting. There’s eleven ways to connect seven vertices, and 23 ways to connect eight vertices. The number keeps on rising, but it doesn’t follow the obvious patterns for growth of this sort of thing.

And if that’s not enough to idly ponder then think of destroying trees. Draw a tree, any shape you like. Pick one of the vertices. Imagine you obliterate that. How many separate pieces has the tree been broken into? It might be as few as two. It might be as many as the number of remaining vertices. If graph theory took away the pastime of wandering around Köningsburg’s bridges, it has given us this pastime we can create anytime we have pen, paper, and a long meeting.

The End 2016 Mathematics A To Z: Smooth


Mathematicians affect a pose of objectivity. We justify this by working on things whose truth we can know, and which must be true whenever we accept certain rules of deduction and certain definitions and axioms. This seems fair. But we choose to pay attention to things that interest us for particular reasons. We study things we like. My A To Z glossary term for today is about one of those things we like.

Smooth.

Functions. Not everything mathematicians do is functions. But functions turn up a lot. We need to set some rules. “A function” is so generic a thing we can’t handle it much. Narrow it down. Pick functions with domains that are numbers. Range too. By numbers I mean real numbers, maybe complex numbers. That gives us something.

There’s functions that are hard to work with. This is almost all of them, so we don’t touch them unless we absolutely must. But they’re functions that aren’t continuous. That means what you imagine. The value of the function at some point is wholly unrelated to its value at some nearby point. It’s hard to work with anything that’s unpredictable like that. Functions as well as people.

We like functions that are continuous. They’re predictable. We can make approximations. We can estimate the function’s value at some point using its value at some more convenient point. It’s easy to see why that’s useful for numerical mathematics, for calculations to approximate stuff. The dazzling thing is it’s useful analytically. We step into the Platonic-ideal world of pure mathematics. We have tools that let us work as if we had infinitely many digits of precision, for infinitely many numbers at once. And yet we use estimates and approximations and errors. We use them in ways to give us perfect knowledge; we get there by estimates.

Continuous functions are nice. Well, they’re nicer to us than functions that aren’t continuous. But there are even nicer functions. Functions nicer to us. A continuous function, for example, can have corners; it can change direction suddenly and without warning. A differentiable function is more predictable. It can’t have corners like that. Knowing the function well at one point gives us more information about what it’s like nearby.

The derivative of a function doesn’t have to be continuous. Grumble. It’s nice when it is, though. It makes the function easier to work with. It’s really nice for us when the derivative itself has a derivative. Nothing guarantees that the derivative of a derivative is continuous. But maybe it is. Maybe the derivative of the derivative has a derivative. That’s a function we can do a lot with.

A function is “smooth” if it has as many derivatives as we need for whatever it is we’re doing. And if those derivatives are continuous. If this seems loose that’s because it is. A proof for whatever we’re interested in might need only the original function and its first derivative. It might need the original function and its first, second, third, and fourth derivatives. It might need hundreds of derivatives. If we look through the details of the proof we might find exactly how many derivatives we need and how many of them need to be continuous. But that’s tedious. We save ourselves considerable time by saying the function is “smooth”, as in, “smooth enough for what we need”.

If we do want to specify how many continuous derivatives a function has we call it a “Ck function”. The C here means continuous. The ‘k’ means there are the number ‘k’ continuous derivatives of it. This is completely different from a “Ck function”, which would be one that’s a k-dimensional vector. Whether the “C” is boldface or not is important. A function might have infinitely many continuous derivatives. That we call a “C function”. That’s got wonderful properties, especially if the domain and range are complex-valued numbers. We couldn’t do Complex Analysis without it. Complex Analysis is the course students take after wondering how they’ll ever survive Real Analysis. It’s much easier than Real Analysis. Mathematics can be strange.

The End 2016 Mathematics A To Z: Riemann Sum


I see for the other A To Z I did this year I did something else named for Riemann. So I did. Bernhard Riemann did a lot of work that’s essential to how we see mathematics today. We name all kinds of things for him, and correctly so. Here’s one of his many essential bits of work.

Riemann Sum.

The Riemann Sum is a thing we learn in Intro to Calculus. It’s essential in getting us to definite integrals. We’re introduced to it in functions of a single variable. The functions have a domain that’s an interval of real numbers and a range that’s somewhere in the real numbers. The Riemann Sum — and from it, the integral — is a real number.

We get this number by following a couple steps. The first is we chop the interval up into a bunch of smaller intervals. That chopping-up we call a “partition” because it’s another of those times mathematicians use a word the way people might use the same word. From each one of those chopped-up pieces we pick a representative point. Now with each piece evaluate what the function is for that representative point. Multiply that by the width of the partition it was in. Then take those products for each of those pieces and add them all together. If you’ve done it right you’ve got a number.

You need a couple pieces in place to have “the” Riemann Sum for something. You need a function, which is fair enough. And you need a partitioning of the interval. And you need some representative point for each of the partitions. Change any of them — function, partition, or point — and you may change the sum you get. You expect that for changing the function. Changing the partition? That’s less obvious. But draw some wiggly curvy function on a sheet of paper. Draw a couple of partitions of the horizontal axis. (You’ll probably want to use different colors for different partitions.) That should coax you into it. And you’d probably take it on my word that different representative points give you different sums.

Very different? It’s possible. There’s nothing stopping it from happening. But if the results aren’t very different then we might just have an integrable function. That’s a function that gives us the same Riemann Sum no matter how we pick representative points, as long as we pick partitions that get finer and finer enough. We measure how fine a partition is by how big the widest chopped-up piece is. To be integrable the Riemann Sum for a function has to get to the same number whenever the partition’s size gets small enough and however we pick points inside. We get the lovely quiet paradox in which we add together infinitely many things, each of them infinitesimally tiny, and get a regular old number out of all that work.

We use the Riemann Sum for what we call numerical quadrature. That’s working out integrals on the computer. Or calculator. Or by hand. When we do it by evaluating numbers instead of using analysis. It’s very easy to program. And we can do some tricks based on the Riemann Sum to make the numerical estimate a closer match to the actual integral.

And we use the Riemann Sum to learn how the Riemann Integral works. It’s a blessedly straightforward thing. It appeals to intuition well. It lets us draw all sorts of curves with rectangular boxes overlaying them. It’s so easy to work out the area of a rectangular box. We can imagine adding up these areas without being confused.

We don’t use the Riemann Sum to actually do integrals, though. Numerical approximations to an integral, yes. For the actual integral it’s too hard to use. What makes it hard is you need to evaluate this for every possible partition and every possible pick of representative points. In grad school my analysis professor worked through — once — using this to integrate the number 1. This is the easiest possible thing to integrate and it was barely manageable. He gave a good try at integrating the function ‘f(x) = x’ but admitted he couldn’t do it. None of us could.

When you see the Riemann Sum in an Introduction to Calculus course you see it in simplified form. You get partitions that are very easy to work with. Like, you break the interval up into some number of equally-sized chunks. You get representative points that follow one of a couple good choices. The left end of the partition. The right end of the partition. The middle of the partition.

That’s fine, numerically. If the function is integrable it doesn’t matter what partition or representative points we pick. And it’s fine for learning about whether functions are integrable. If it matters whether you pick left or middle or right ends of the partition then the function isn’t integrable. The instructor can give functions that break integrability based on a given partition or endpoint choice or whatever.

But that isn’t every possible partition and every possible pick of representative points. I suppose it’s possible to work all that out for a couple of really, really simple functions. But it’s so much work. We’re better off using the Riemann Sum to get to formulas about integrals that don’t depend on actually using the Riemann Sum.

So that is the curious position the Riemann Sum has. It is a fundament of integral calculus. It is the way we first define the definite integral. We rely on it to learn what definite integrals are like. We use it all the time numerically. We never use it analytically. It’s too hard. I hope you appreciate the strange beauty of that.

The End 2016 Mathematics A To Z: Quotient Groups


I’ve got another request today, from the ever-interested and group-theory-minded gaurish. It’s another inspirational one.

Quotient Groups.

We all know about even and odd numbers. We don’t have to think about them. That’s why it’s worth discussing them some.

We do know what they are, though. The integers — whole numbers, positive and negative — we can split into two sets. One of them is the even numbers, two and four and eight and twelve. Zero, negative two, negative six, negative 2,038. The other is the odd numbers, one and three and nine. Negative five, negative nine, negative one.

What do we know about numbers, if all we look at is whether numbers are even or odd? Well, we know every integer is either an odd or an even number. It’s not both; it’s not neither.

We know that if we start with an even number, its negative is also an even number. If we start with an odd number, its negative is also an odd number.

We know that if we start with a number, even or odd, and add to it its negative then we get an even number. A specific number, too: zero. And that zero is interesting because any number plus zero is that same original number.

We know we can add odds or evens together. An even number plus an even number will be an even number. An odd number plus an odd number is an even number. An odd number plus an even number is an odd number. And subtraction is the same as addition, by these lights. One number minus an other number is just one number plus negative the other number. So even minus even is even. Odd minus odd is even. Odd minus even is odd.

We can pluck out some of the even and odd numbers as representative of these sets. We don’t want to deal with big numbers, nor do we want to deal with negative numbers if we don’t have to. So take ‘0’ as representative of the even numbers. ‘1’ as representative of the odd numbers. 0 + 0 is 0. 0 + 1 is 1. 1 + 0 is 1. The addition is the same thing we would do with the original set of integers. 1 + 1 would be 2, which is one of the even numbers, which we represent with 0. So 1 + 1 is 0. If we’ve picked out just these two numbers each is the minus of itself: 0 – 0 is 0 + 0. 1 – 1 is 1 + 1. All that gives us 0, like we should expect.

Two paragraphs back I said something that’s obvious, but deserves attention anyway. An even plus an even is an even number. You can’t get an odd number out of it. An odd plus an odd is an even number. You can’t get an odd number out of it. There’s something fundamentally different between the even and the odd numbers.

And now, kindly reader, you’ve learned quotient groups.

OK, I’ll do some backfilling. It starts with groups. A group is the most skeletal cartoon of arithmetic. It’s a set of things and some operation that works like addition. The thing-like-addition has to work on pairs of things in your set, and it has to give something else in the set. There has to be a zero, something you can add to anything without changing it. We call that the identity, or the additive identity, because it doesn’t change something else’s identity. It makes sense if you don’t stare at it too hard. Everything has an additive inverse. That is everything has a “minus”, that you can add to it to get zero.

With odd and even numbers the set of things is the integers. The thing-like-addition is, well, addition. I said groups were based on how normal arithmetic works, right?

And then you need a subgroup. A subgroup is … well, it’s a subset of the original group that’s itself a group. It has to use the same addition the original group does. The even numbers are such a subgroup of the integers. Formally they make something called a “normal subgroup”, which is a little too much for me to explain right now. If your addition works like it does for normal numbers, that is, “a + b” is the same thing as “b + a”, then all your subgroups are normal groups. Yes, it can happen that they’re not. If the addition is something like rotations in three-dimensional space, or swapping the order of things, then the order you “add” things in matters.

We make a quotient group by … OK, this isn’t going to sound like anything. It’s a group, though, like the name says. It uses the same addition that the original group does. Its set, though, that’s itself made up of sets. One of the sets is the normal subgroup. That’s the easy part.

Then there’s something called cosets. You make a coset by picking something from the original group and adding it to everything in the subgroup. If the thing you pick was from the original subgroup that’s just going to be the subgroup again. If you pick something outside the original subgroup then you’ll get some other set.

Starting from the subgroup of even numbers there’s not a lot to do. You can get the even numbers and you get the odd numbers. Doesn’t seem like much. We can do otherwise though. Suppose we start from the subgroup of numbers divisible by 4, though. That’s 0, 4, 8, 12, -4, -8, -12, and so on. Now there’s three cosets we can make from that. We can start with the original set of numbers. Or we have 1 plus that set: 1, 5, 9, 13, -3, -7, -11, and so on. Or we have 2 plus that set: 2, 6, 10, 14, -2, -6, -10, and so on. Or we have 3 plus that set: 3, 7, 11, 15, -1, -5, -9, and so on. None of these others are subgroups, which is why we don’t call them subgroups. We call them cosets.

These collections of cosets, though, they’re the pieces of a new group. The quotient group. One of them, the normal subgroup you started with, is the identity, the thing that’s as good as zero. And you can “add” the cosets together, in just the same way you can add “odd plus odd” or “odd plus even” or “even plus even”.

For example. Let me start with the numbers divisible by 4. I will have so much a better time if I give this a name. I’ll pick ‘Q’. This is because, you know, quarters, quartet, quadrilateral, this all sounds like four-y stuff. The integers — the integers have a couple of names. ‘I’, ‘J’, and ‘Z’ are the most common ones. We get ‘Z’ from German; a lot of important group theory was done by German-speaking mathematicians. I’m used to it so I’ll stick with that. The quotient group ‘Z / Q’, read “Z modulo Q”, has (it happens) four cosets. One of them is Q. One of them is “1 + Q”, that set 1, 5, 9, and so on. Another of them is “2 + Q”, that set 2, 6, 10, and so on. And the last is “3 + Q”, that set 3, 7, 11, and so on.

And you can add them together. 1 + Q plus 1 + Q turns out to be 2 + Q. Try it out, you’ll see. 1 + Q plus 2 + Q turns out to be 3 + Q. 2 + Q plus 2 + Q is Q again.

The quotient group uses the same addition as the original group. But it doesn’t add together elements of the original group, or even of the normal subgroup. It adds together sets made from the normal subgroup. We’ll denote them using some form that looks like “a + N”, or maybe “a N”, if ‘N’ was the normal subgroup and ‘a’ something that wasn’t in it. (Sometimes it’s more convenient writing the group operation like it was multiplication, because we do that by not writing anything at all, which saves us from writing stuff.)

If we’re comfortable with the idea that “odd plus odd is even” and “even plus odd is odd” then we should be comfortable with adding together quotient groups. We’re not, not without practice, but that’s all right. In the Introduction To Not That Kind Of Algebra course mathematics majors take they get a lot of practice, just in time to be thrown into rings.

Quotient groups land on the mathematics major as a baffling thing. They don’t actually turn up things from the original group. And they lead into important theorems. But to an undergraduate they all look like text huddling up to ladders of quotient groups. We’re told these are important theorems and they are. They also go along with beautiful diagrams of how these quotient groups relate to each other. But they’re hard going. It’s tough finding good examples and almost impossible to explain what a question is. It comes as a relief to be thrown into rings. By the time we come back around to quotient groups we’ve usually had enough time to get used to the idea that they don’t seem so hard.

Really, looking at odds and evens, they shouldn’t be so hard.

The End 2016 Mathematics A To Z: Principal


Functions. They’re at the center of so much mathematics. They have three pieces: a domain, a range, and a rule. The one thing functions absolutely must do is match stuff in the domain to one and only one thing in the range. So this is where it gets tricky.

Principal.

Thing with this one-and-only-one thing in the range is it’s not always practical. Sometimes it only makes sense to allow for something in the domain to match several things in the range. For example, suppose we have the domain of positive numbers. And we want a function that gives us the numbers which, squared, are whatever the original function was. For any positive real number there’s two numbers that do that. 4 should match to both +2 and -2.

You might ask why I want a function that tells me the numbers which, squared, equal something. I ask back, what business is that of yours? I want a function that does this and shouldn’t that be enough? We’re getting off to a bad start here. I’m sorry; I’ve been running ragged the last few days. I blame the flat tire on my car.

Anyway. I’d want something like that function because I’m looking for what state of things makes some other thing true. This turns up often in “inverse problems”, problems in which we know what some measurement is and want to know what caused the measurement. We do that sort of problem all the time.

We can handle these multi-valued functions. Of course we can. Mathematicians are as good at loopholes as anyone else is. Formally we declare that the range isn’t the real numbers but rather sets of real numbers. My what-number-squared function then matches ‘4’ in the domain to the set of numbers ‘+2 and -2’. The set has several things in it, but there’s just the one set. Clever, huh?

This sort of thing turns up a lot. There’s two numbers that, squared, give us any real number (except zero). There’s three numbers that, squared, give us any real number (again except zero). Polynomials might have a whole bunch of numbers that make some equation true. Trig functions are worse. The tangent of 45 degrees equals 1. So is the tangent of 225 degrees. Also 405 degrees. Also -45 degrees. Also -585 degrees. OK, a mathematician would use radians instead of degrees, but that just changes what the numbers are. Not that there’s infinitely many of them.

It’s nice to have options. We don’t always want options. Sometimes we just want one blasted simple answer to things. It’s coded into the language. We say “the square root of four”. We speak of “the arctangent of 1”, which is to say, “the angle with tangent of 1”. We only say “all square roots of four” if we’re making a point about overlooking options.

If we’ve got a set of things, then we can pick out one of them. This is obvious, which means it is so very hard to prove. We just have to assume we can. Go ahead; assume we can. Our pick of the one thing out of this set is the “principal”. It’s not any more inherently right than the other possibilities. It’s just the one we choose to grab first.

So. The principal square root of four is positive two. The principal arctangent of 1 is 45 degrees, or in the dialect of mathematicians π divided by four. We pick these values over other possibilities because they’re nice. What makes them nice? Well, they’re nice. Um. Most of their numbers aren’t that big. They use positive numbers if we have a choice in the matter. Deep down we still suspect negative numbers of being up to something.

If nobody says otherwise then the principal square root is the positive one, or the one with a positive number in front of the imaginary part. If nobody says otherwise the principal arcsine is between -90 and +90 degrees (-π/2 and π/2). The principal arccosine is between 0 and 180 degrees (0 and π), unless someone says otherwise. The principal arctangent is … between -90 and 90 degrees, unless it’s between 0 and 180 degrees. You can count on the 0 to 90 part. Use your best judgement and roll with whatever develops for the other half of the range there. There’s not one answer that’s right for every possible case. The point of a principal value is to pick out one answer that’s usually a good starting point.

When you stare at what it means to be a function you realize that there’s a difference between the original function and the one that returns the principal value. The original function has a range that’s “sets of values”. The principal-value version has a range that’s just one value. If you’re being kind to your audience you make some note of that. Usually we note this by capitalizing the start of the function: “arcsin z” gives way to “Arcsin z”. “Log z” would be the principal-value version of “log z”. When you start pondering logarithms for negative numbers or for complex-valued numbers you get multiple values. It’s the same way that the arcsine function does.

And it’s good to warn your audience which principal value you mean, especially for the arc-trigonometric-functions or logarithms. (I’ve never seen someone break the square root convention.) The principal value is about picking the most obvious and easy-to-work-with value out of a set of them. It’s just impossible to get everyone to agree on what the obvious is.

The End 2016 Mathematics A To Z: Osculating Circle


I’m happy to say it’s another request today. This one’s from HowardAt58, author of the Saving School Math blog. He’s given me some great inspiration in the past.

Osculating Circle.

It’s right there in the name. Osculating. You know what that is from that one Daffy Duck cartoon where he cries out “Greetings, Gate, let’s osculate” while wearing a moustache. Daffy’s imitating somebody there, but goodness knows who. Someday the mystery drives the young you to a dictionary web site. Osculate means kiss. This doesn’t seem to explain the scene. Daffy was imitating Jerry Colonna. That meant something in 1943. You can find him on old-time radio recordings. I think he’s funny, in that 40s style.

Make the substitution. A kissing circle. Suppose it’s not some playground antic one level up from the Kissing Bandit that plagues recess yet one or two levels down what we imagine we’d do in high school. It suggests a circle that comes really close to something, that touches it a moment, and then goes off its own way.

But then touching. We know another word for that. It’s the root behind “tangent”. Tangent is a trigonometry term. But it appears in calculus too. The tangent line is a line that touches a curve at one specific point and is going in the same direction as the original curve is at that point. We like this because … well, we do. The tangent line is a good approximation of the original curve, at least at the tangent point and for some region local to that. The tangent touches the original curve, and maybe it does something else later on. What could kissing be?

The osculating circle is about approximating an interesting thing with a well-behaved thing. So are similar things with names like “osculating curve” or “osculating sphere”. We need that a lot. Interesting things are complicated. Well-behaved things are understood. We move from what we understand to what we would like to know, often, by an approximation. This is why we have tangent lines. This is why we build polynomials that approximate an interesting function. They share the original function’s value, and its derivative’s value. A polynomial approximation can share many derivatives. If the function is nice enough, and the polynomial big enough, it can be impossible to tell the difference between the polynomial and the original function.

The osculating circle, or sphere, isn’t so concerned with matching derivatives. I know, I’m as shocked as you are. Well, it matches the first and the second derivatives of the original curve. Anything past that, though, it matches only by luck. The osculating circle is instead about matching the curvature of the original curve. The curvature is what you think it would be: it’s how much a function curves. If you imagine looking closely at the original curve and an osculating circle they appear to be two arcs that come together. They must touch at one point. They might touch at others, but that’s incidental.

Osculating circles, and osculating spheres, sneak out of mathematics and into practical work. This is because we often want to work with things that are almost circles. The surface of the Earth, for example, is not a sphere. But it’s only a tiny bit off. It’s off in ways that you only notice if you are doing high-precision mapping. Or taking close measurements of things in the sky. Sometimes we do this. So we map the Earth locally as if it were a perfect sphere, with curvature exactly what its curvature is at our observation post.

Or we might be observing something moving in orbit. If the universe had only two things in it, and they were the correct two things, all orbits would be simple: they would be ellipses. They would have to be “point masses”, things that have mass without any volume. They never are. They’re always shapes. Spheres would be fine, but they’re never perfect spheres even. The slight difference between a perfect sphere and whatever the things really are affects the orbit. Or the other things in the universe tug on the orbiting things. Or the thing orbiting makes a course correction. All these things make little changes in the orbiting thing’s orbit. The actual orbit of the thing is a complicated curve. The orbit we could calculate is an osculating — well, an osculating ellipse, rather than an osculating circle. Similar idea, though. Call it an osculating orbit if you’d rather.

That osculating circles have practical uses doesn’t mean they aren’t respectable mathematics. I’ll concede they’re not used as much as polynomials or sine curves are. I suppose that’s because polynomials and sine curves have nicer derivatives than circles do. But osculating circles do turn up as ways to try solving nonlinear differential equations. We need the help. Linear differential equations anyone can solve. Nonlinear differential equations are pretty much impossible. They also turn up in signal processing, as ways to find the frequencies of a signal from a sampling of data. This, too, we would like to know.

We get the name “osculating circle” from Gottfried Wilhelm Leibniz. This might not surprise. Finding easy-to-understand shapes that approximate interesting shapes is why we have calculus. Isaac Newton described a way of making them in the Principia Mathematica. This also might not surprise. Of course they would on this subject come so close together without kissing.

The End 2016 Mathematics A To Z: Normal Numbers


Today’s A To Z term is another of gaurish’s requests. It’s also a fun one so I’m glad to have reason to write about it.

Normal Numbers

A normal number is any real number you never heard of.

Yeah, that’s not what we say a normal number is. But that’s what a normal number is. If we could imagine the real numbers to be a stream, and that we could reach into it and pluck out a water-drop that was a single number, we know what we would likely pick. It would be an irrational number. It would be a transcendental number. And it would be a normal number.

We know normal numbers — or we would, anyway — by looking at their representation in digits. For example, π is a number that starts out 3.1415926535897931159979634685441851615905 and so on forever. Look at those digits. Some of them are 1’s. How many? How many are 2’s? How many are 3’s? Are there more than you would expect? Are there fewer? What would you expect?

Expect. That’s the key. What should we expect in the digits of any number? The numbers we work with don’t offer much help. A whole number, like 2? That has a decimal representation of a single ‘2’ and infinitely many zeroes past the decimal point. Two and a half? A single ‘2, a single ‘5’, and then infinitely many zeroes past the decimal point. One-seventh? Well, we get infinitely many 1’s, 4’s, 2’s, 8’s, 5’s, and 7’s. Never any 3’s, nor any 0’s, nor 6’s or 9’s. This doesn’t tell us anything about how often we would expect ‘8’ to appear in the digits of π.

In an normal number we get all the decimal digits. And we get each of them about one-tenth of the time. If all we had was a chart of how often digits turn up we couldn’t tell the summary of one normal number from the summary of any other normal number. Nor could we tell either from the summary of a perfectly uniform randomly drawn number.

It goes beyond single digits, though. Look at pairs of digits. How often does ’14’ turn up in the digits of a normal number? … Well, something like once for every hundred pairs of digits you draw from the random number. Look at triplets of digits. ‘141’ should turn up about once in every thousand sets of three digits. ‘1415’ should turn up about once in every ten thousand sets of four digits. Any finite string of digits should turn up, and exactly as often as any other finite string of digits.

That’s in the full representation. If you look at all the infinitely many digits the normal number has to offer. If all you have is a slice then some digits are going to be more common and some less common. That’s similar to how if you fairly toss a coin (say) forty times, there’s a good chance you’ll get tails something other than exactly twenty times. Look at the first 35 or so digits of π there’s not a zero to be found. But as you survey more digits you get closer and closer to the expected average frequency. It’s the same way coin flips get closer and closer to 50 percent tails. Zero is a rarity in the first 35 digits. It’s about one-tenth of the first 3500 digits.

The digits of a specific number are not random, not if we know what the number is. But we can be presented with a subset of its digits and have no good way of guessing what the next digit might be. That is getting into the same strange territory in which we can speak about the “chance” of a month having a Friday the 13th even though the appearances of Fridays the 13th have absolutely no randomness to them.

This has staggering implications. Some of them inspire an argument in science fiction Usenet newsgroup rec.arts.sf.written every two years or so. Probably it does so in other venues; Usenet is just my first home and love for this. In a minor point in Carl Sagan’s novel Cosmos possibly-imaginary aliens reveal there’s a pattern hidden in the digits of π. (It’s not in the movie version, which is a shame. But to include it would require people watching a computer. So that could not make for a good movie scene, we now know.) Look far enough into π, says the book, and there’s suddenly a string of digits that are nearly all zeroes, interrupted with a few ones. Arrange the zeroes and ones into a rectangle and it draws a pixel-art circle. And the aliens don’t know how something astounding like that could be.

Nonsense, respond the kind of science fiction reader that likes to identify what the nonsense in science fiction stories is. (Spoiler: it’s the science. In this case, the mathematics too.) In a normal number every finite string of digits appears. It would be truly astounding if there weren’t an encoded circle in the digits of π. Indeed, it would be impossible for there not to be infinitely many circles of every possible size encoded in every possible way in the digits of π. If the aliens are amazed by that they would be amazed to find how every triangle has three corners.

I’m a more forgiving reader. And I’ll give Sagan this amazingness. I have two reasons. The first reason is on the grounds of discoverability. Yes, the digits of a normal number will have in them every possible finite “message” encoded every possible way. (I put the quotes around “message” because it feels like an abuse to call something a message if it has no sender. But it’s hard to not see as a “message” something that seems to mean something, since we live in an era that accepts the Death of the Author as a concept at least.) Pick your classic cypher `1 = A, 2 = B, 3 = C’ and so on, and take any normal number. If you look far enough into its digits you will find every message you might ever wish to send, every book you could read. Every normal number holds Jorge Luis Borges’s Library of Babel, and almost every real number is a normal number.

But. The key there is if you look far enough. Look above; the first 35 or so digits of π have no 0’s, when you would expect three or four of them. There’s no 22’s, even though that number has as much right to appear as does 15, which gets in at least twice that I see. And we will only ever know finitely many digits of π. It may be staggeringly many digits, sure. It already is. But it will never be enough to be confident that a circle, or any other long enough “message”, must appear. It is staggering that a detectable “message” that long should be in the tiny slice of digits that we might ever get to see.

And it’s harder than that. Sagan’s book says the circle appears in whatever base π gets represented in. So not only does the aliens’ circle pop up in base ten, but also in base two and base sixteen and all the other, even less important bases. The circle happening to appear in the accessible digits of π might be an imaginable coincidence in some base. There’s infinitely many bases, one of them has to be lucky, right? But to appear in the accessible digits of π in every one of them? That’s staggeringly impossible. I say the aliens are correct to be amazed.

Now to my second reason to side with the book. It’s true that any normal number will have any “message” contained in it. So who says that π is a normal number?

We think it is. It looks like a normal number. We have figured out many, many digits of π and they’re distributed the way we would expect from a normal number. And we know that nearly all real numbers are normal numbers. If I had to put money on it I would bet π is normal. It’s the clearly safe bet. But nobody has ever proved that it is, nor that it isn’t. Whether π is normal or not is a fit subject for conjecture. A writer of science fiction may suppose anything she likes about its normality without current knowledge saying she’s wrong.

It’s easy to imagine numbers that aren’t normal. Rational numbers aren’t, for example. If you followed my instructions and made your own transcendental number then you made a non-normal number. It’s possible that π should be non-normal. The first thirty million digits or so look good, though, if you think normal is good. But what’s thirty million against infinitely many possible counterexamples? For all we know, there comes a time when π runs out of interesting-looking digits and turns into an unpredictable little fluttering between 6 and 8.

It’s hard to prove that any numbers we’d like to know about are normal. We don’t know about π. We don’t know about e, the base of the natural logarithm. We don’t know about the natural logarithm of 2. There is a proof that the square root of two (and other non-square whole numbers, like 3 or 5) is normal in base two. But my understanding is it’s a nonstandard approach that isn’t quite satisfactory to experts in the field. I’m not expert so I can’t say why it isn’t quite satisfactory. If the proof’s authors or grad students wish to quarrel with my characterization I’m happy to give space for their rebuttal.

It’s much the way transcendental numbers were in the 19th century. We understand there to be this class of numbers that comprises nearly every number. We just don’t have many examples. But we’re still short on examples of transcendental numbers. Maybe we’re not that badly off with normal numbers.

We can construct normal numbers. For example, there’s the Champernowne Constant. It’s the number you would make if you wanted to show you could make a normal number. It’s 0.12345678910111213141516171819202122232425 and I bet you can imagine how that develops from that point. (David Gawen Champernowne proved it was normal, which is the hard part.) There’s other ways to build normal numbers too, if you like. But those numbers aren’t of any interest except that we know them to be normal.

Mere normality is tied to a base. A number might be normal in base ten (the way normal people write numbers) but not in base two or base sixteen (which computers and people working on computers use). It might be normal in base twelve, used by nobody except mathematics popularizers of the 1960s explaining bases, but not normal in base ten. There can be numbers normal in every base. They’re called “absolutely normal”. Nearly all real numbers are absolutely normal. Wacław Sierpiński constructed the first known absolutely normal number in 1917. If you got in on the fractals boom of the 80s and 90s you know his name, although without the Polish spelling. He did stuff with gaskets and curves and carpets you wouldn’t believe. I’ve never seen Sierpiński’s construction of an absolutely normal number. From my references I’m not sure if we know how to construct any other absolutely normal numbers.

So that is the strange state of things. Nearly every real number is normal. Nearly every number is absolutely normal. We know a couple normal numbers. We know at least one absolutely normal number. But we haven’t (to my knowledge) proved any number that’s otherwise interesting is also a normal number. This is why I say: a normal number is any real number you never heard of.

The End 2016 Mathematics A To Z: Monster Group


Today’s is one of my requested mathematics terms. This one comes to us from group theory, by way of Gaurish, and as ever I’m thankful for the prompt.

Monster Group.

It’s hard to learn from an example. Examples are great, and I wouldn’t try teaching anything subtle without one. Might not even try teaching the obvious without one. But a single example is dangerous. The learner has trouble telling what parts of the example are the general lesson to learn and what parts are just things that happen to be true for that case. Having several examples, of different kinds of things, saves the student. The thing in common to many different examples is the thing to retain.

The mathematics major learns group theory in Introduction To Not That Kind Of Algebra, MAT 351. A group extracts the barest essence of arithmetic: a bunch of things and the ability to add them together. So what’s an example? … Well, the integers do nicely. What’s another example? … Well, the integers modulo two, where the only things are 0 and 1 and we know 1 + 1 equals 0. What’s another example? … The integers modulo three, where the only things are 0 and 1 and 2 and we know 1 + 2 equals 0. How about another? … The integers modulo four? Modulo five?

All true. All, also, basically the same thing. The whole set of integers, or of real numbers, are different. But as finite groups, the integers modulo anything are nice easy to understand groups. They’re known as Cyclic Groups for reasons I’ll explain if asked. But all the Cyclic Groups are kind of the same.

So how about another example? And here we get some good ones. There’s the Permutation Groups. These are fun. You start off with a set of things. You can label them anything you like, but you’re daft if you don’t label them the counting numbers. So, say, the set of things 1, 2, 3, 4, 5. Start with them in that order. A permutation is the swapping of any pair of those things. So swapping, say, the second and fifth things to get the list 1, 5, 3, 4, 2. The collection of all the swaps you can make is the Permutation Group on this set of things. The things in the group are not 1, 2, 3, 4, 5. The things in the permutation group are “swap the second and fifth thing” or “swap the third and first thing” or “swap the fourth and the third thing”. You maybe feel uneasy about this. That’s all right. I suggest playing with this until you feel comfortable because it is a lot of fun to play with. Playing in this case mean writing out all the ways you can swap stuff, which you can always do as a string of swaps of exactly two things.

(Some people may remember an episode of Futurama that involved a brain-swapping machine. Or a body-swapping machine, if you prefer. The gimmick of the episode is that two people could only swap bodies/brains exactly one time. The problem was how to get everybody back in their correct bodies. It turns out to be possible to do, and one of the show’s writers did write a proof of it. It’s shown on-screen for a moment. Many fans were awestruck by an episode of the show inspiring a Mathematical Theorem. They’re overestimating how rare theorems are. But it is fun when real mathematics gets done as a side effect of telling a good joke. Anyway, the theorem fits well in group theory and the study of these permutation groups.)

So the student wanting examples of groups can get the Permutation Group on three elements. Or the Permutation Group on four elements. The Permutation Group on five elements. … You kind of see, this is certainly different from those Cyclic Groups. But they’re all kind of like each other.

An “Alternating Group” is one where all the elements in it are an even number of permutations. So, “swap the second and fifth things” would not be in an alternating group. But “swap the second and fifth things, and swap the fourth and second things” would be. And so the student needing examples can look at the Alternating Group on two elements. Or the Alternating Group on three elements. The Alternating Group on four elements. And so on. It’s slightly different from the Permutation Group. It’s certainly different from the Cyclic Group. But still, if you’ve mastered the Alternating Group on five elements you aren’t going to see the Alternating Group on six elements as all that different.

Cyclic Groups and Alternating Groups have some stuff in common. Permutation Groups not so much and I’m going to leave them in the above paragraph, waving, since they got me to the Alternating Groups I wanted.

One is that they’re finite. At least they can be. I like finite groups. I imagine students like them too. It’s nice having a mathematical thing you can write out in full and know you aren’t missing anything.

The second thing is that they are, or they can be, “simple groups”. That’s … a challenge to explain. This has to do with the structure of the group and the kinds of subgroup you can extract from it. It’s very very loosely and figuratively and do not try to pass this off at your thesis defense kind of like being a prime number. In fact, Cyclic Groups for a prime number of elements are simple groups. So are Alternating Groups on five or more elements.

So we get to wondering: what are the finite simple groups? Turns out they come in four main families. One family is the Cyclic Groups for a prime number of things. One family is the Alternating Groups on five or more things. One family is this collection called the Chevalley Groups. Those are mostly things about projections: the ways to map one set of coordinates into another. We don’t talk about them much in Introduction To Not That Kind Of Algebra. They’re too deep into Geometry for people learning Algebra. The last family is this collection called the Twisted Chevalley Groups, or the Steinberg Groups. And they .. uhm. Well, I never got far enough into Geometry I’m Guessing to understand what they’re for. I’m certain they’re quite useful to people working in the field of order-three automorphisms of the whatever exactly D4 is.

And that’s it. That’s all the families there are. If it’s a finite simple group then it’s one of these. … Unless it isn’t.

Because there are a couple of stragglers. There are a few finite simple groups that don’t fit in any of the four big families. And it really is only a few. I would have expected an infinite number of weird little cases that don’t belong to a family that looks similar. Instead, there are 26. (27 if you decide a particular one of the Steinberg Groups doesn’t really belong in that family. I’m not familiar enough with the case to have an opinion.) Funny number to have turn up. It took ten thousand pages to prove there were just the 26 special cases. I haven’t read them all. (I haven’t read any of the pages. But my Algebra professors at Rutgers were proud to mention their department’s work in tracking down all these cases.)

Some of these cases have some resemblance to one another. But not enough to see them as a family the way the Cyclic Groups are. We bundle all these together in a wastebasket taxon called “the sporadic groups”. The first five of them were worked out in the 1860s. The last of them was worked out in 1980, seven years after its existence was first suspected.

The sporadic groups all have weird sizes. The smallest one, known as M11 (for “Mathieu”, who found it and four of its siblings in the 1860s) has 7,920 things in it. They get enormous soon after that.

The biggest of the sporadic groups, and the last one described, is the Monster Group. It’s known as M. It has a lot of things in it. In particular it’s got 808,017,424,794,512,875,886,459,904,961,710,757,005,754,368,000,000,000 things in it. So, you know, it’s not like we’ve written out everything that’s in it. We’ve just got descriptions of how you would write out everything in it, if you wanted to try. And you can get a good argument going about what it means for a mathematical object to “exist”, or to be “created”. There are something like 1054 things in it. That’s something like a trillion times a trillion times the number of stars in the observable universe. Not just the stars in our galaxy, but all the stars in all the galaxies we could in principle ever see.

It’s one of the rare things for which “Brobdingnagian” is an understatement. Everything about it is mind-boggling, the sort of thing that staggers the imagination more than infinitely large things do. We don’t really think of infinitely large things; we just picture “something big”. A number like that one above is definite, and awesomely big. Just read off the digits of that number; it sounds like what we imagine infinity ought to be.

We can make a chart, called the “character table”, which describes how subsets of the group interact with one another. The character table for the Monster Group is 194 rows tall and 194 columns wide. The Monster Group can be represented as this, I am solemnly assured, logical and beautiful algebraic structure. It’s something like a polyhedron in rather more than three dimensions of space. In particular it needs 196,884 dimensions to show off its particular beauty. I am taking experts’ word for it. I can’t quite imagine more than 196,883 dimensions for a thing.

And it’s a thing full of mystery. This creature of group theory makes us think of the number 196,884. The same 196,884 turns up in number theory, the study of how integers are put together. It’s the first non-boring coefficient in a thing called the j-function. It’s not coincidence. This bit of number theory and this bit of group theory are bound together, but it took some years for anyone to quite understand why.

There are more mysteries. The character table has 194 rows and columns. Each column implies a function. Some of those functions are duplicated; there are 171 distinct ones. But some of the distinct ones it turns out you can find by adding together multiples of others. There are 163 distinct ones. 163 appears again in number theory, in the study of algebraic integers. These are, of course, not integers at all. They’re things that look like complex-valued numbers: some real number plus some (possibly other) real number times the square root of some specified negative number. They’ve got neat properties. Or weird ones.

You know how with integers there’s just one way to factor them? Like, fifteen is equal to three times five and no other set of prime numbers? Algebraic integers don’t work like that. There’s usually multiple ways to do that. There are exceptions, algebraic integers that still have unique factorings. They happen only for a few square roots of negative numbers. The biggest of those negative numbers? Minus 163.

I don’t know if this 163 appearance means something. As I understand the matter, neither does anybody else.

There is some link to the mathematics of string theory. That’s an interesting but controversial and hard-to-experiment-upon model for how the physics of the universe may work. But I don’t know string theory well enough to say what it is or how surprising this should be.

The Monster Group creates a monster essay. I suppose it couldn’t do otherwise. I suppose I can’t adequately describe all its sublime mystery. Dr Mark Ronan has written a fine web page describing much of the Monster Group and the history of our understanding of it. He also has written a book, Symmetry and the Monster, to explain all this in greater depths. I’ve not read the book. But I do mean to, now.

The End 2016 Mathematics A To Z: Local


Today’s is another of those words that means nearly what you would guess. There are still seven letters left, by the way, which haven’t had any requested terms. If you’d like something described please try asking.

Local.

Stops at every station, rather than just the main ones.

OK, I’ll take it seriously.

So a couple years ago I visited Niagara Falls, and I stepped into the river, just above the really big drop.

A view (from the United States side) of the Niagara Falls. With a lot of falling water and somehow even more mist.
Niagara Falls, demonstrating some locally unsafe waters to be in. Background: Canada (left), United States (right).

I didn’t have any plans to go over the falls, and didn’t, but I liked the thrill of claiming I had. I’m not crazy, though; I picked a spot I knew was safe to step in. It’s only in the retelling I went into the Niagara River just above the falls.

Because yes, there is surely danger in certain spots of the Niagara River. But there are also spots that are perfectly safe. And not isolated spots either. I wouldn’t have been less safe if I’d stepped into the river a few feet closer to the edge. Nor if I’d stepped in a few feet farther away. Where I stepped in was locally safe.

Speedy but not actually turbulent waters on the Niagara River, above the falls.
The Niagara River, and some locally safe enough waters to be in. That’s not me in the picture; if you do know who it is, I have no way of challenging you. But it’s the area I stepped into and felt this lovely illicit thrill doing so.

Over in mathematics we do a lot of work on stuff that’s true or false depending on what some parameters are. We can look at bunches of those parameters, and they often look something like normal everyday space. There’s some values that are close to what we started from. There’s others that are far from that.

So, a “neighborhood” of some point is that point and some set of points containing it. It needs to be an “open” set, which means it doesn’t contain its boundary. So, like, everything less than one minute’s walk away, but not the stuff that’s precisely one minute’s walk away. (If we include boundaries we break stuff that we don’t want broken is why.) And certainly not the stuff more than one minute’s walk away. A neighborhood could have any shape. It’s easy to think of it as a little disc around the point you want. That’s usually the easiest to describe in a proof, because it’s “everything a distance less than (something) away”. (That “something” is either ‘δ’ or ‘ε’. Both Greek letters are called in to mean “a tiny distance”. They have different connotations about what the tiny distance is in.) It’s easiest to draw as little amoeba-like blob around a point, and contained inside a bigger amoeba-like blob.

Anyway, something is true “locally” to a point if it’s true in that neighborhood. That means true for everything in that neighborhood. Which is what you’d expect. “Local” means just that. It’s the stuff that’s close to where we started out.

Often we would like to know something “globally”, which means … er … everywhere. Universally so. But it’s usually easier to prove a thing locally. I suppose having a point where we know something is so makes it easier to prove things about what’s nearby. Distant stuff, who knows?

“Local” serves as an adjective for many things. We think of a “local maximum”, for example, or “local minimum”. This is where whatever we’re studying has a value bigger (or smaller) than anywhere else nearby has. Or we speak of a function being “locally continuous”, meaning that we know it’s continuous near this point and we make no promises away from it. It might be “locally differentiable”, meaning we can take derivatives of it close to some interesting point. We say nothing about what happens far from it.

Unless we do. We can talk about something being “local to infinity”. Your first reaction to that should probably be to slap the table and declare that’s it, we’re done. But we can make it sensible, at least to other mathematicians. We do it by starting with a neighborhood that contains the origin, zero, that point in the middle of everything. So, what’s the inverse of that? It’s everything that’s far enough away from the origin. (Don’t include the boundary, we don’t need those headaches.) So why not call that the “neighborhood of infinity”? Other than that it’s a weird set of words to put together? And if something is true in that “neighborhood of infinity”, what is that thing other than true “local to infinity”?

I don’t blame you for being skeptical.

The End 2016 Mathematics A To Z: Kernel


I told you that Image thing would reappear. Meanwhile I learned something about myself in writing this.

Kernel.

I want to talk about functions again. I’ve been keeping like a proper mathematician to a nice general idea of what a function is. The sort where a function’s this rule matching stuff in a set called the domain with stuff in a set called the range. And I’ve tried not to commit myself to saying anything about what that domain and range are. They could be numbers. They could be other functions. They could be the set of DVDs you own but haven’t watched in more than two years. They could be collections socks. Haven’t said.

But we know what functions anyone cares about. They’re stuff that have domains and ranges that are numbers. Preferably real numbers. Complex-valued numbers if we must. If we look at more exotic sets they’re ones that stick close to being numbers: vectors made up of an ordered set of numbers. Matrices of numbers. Functions that are themselves about numbers. Maybe we’ll get to something exotic like a rotation, but then what is a rotation but spinning something a certain number of degrees? There are a bunch of unavoidably common domains and ranges.

Fine, then. I’ll stick to functions with ranges that look enough like regular old numbers. By “enough” I mean they have a zero. That is, something that works like zero does. You know, add it to something else and that something else isn’t changed. That’s all I need.

A natural thing to wonder about a function — hold on. “Natural” is the wrong word. Something we learn to wonder about in functions, in pre-algebra class where they’re all polynomials, is where the zeroes are. They’re generally not at zero. Why would we say “zeroes” to mean “zero”? That could let non-mathematicians think they knew what we were on about. By the “zeroes” we mean the things in the domain that get matched to the zero in the range. It might be zero; no reason it couldn’t, until we know what the function’s rule is. Just we can’t count on that.

A polynomial we know has … well, it might have zero zeroes. Might have no zeroes. It might have one, or two, or so on. If it’s an n-th degree polynomial it can have up to n zeroes. And if it’s not a polynomial? Well, then it could have any conceivable number of zeroes and nobody is going to give you a nice little formula to say where they all are. It’s not that we’re being mean. It’s just that there isn’t a nice little formula that works for all possibilities. There aren’t even nice little formulas that work for all polynomials. You have to find zeroes by thinking about the problem. Sorry.

But! Suppose you have a collection of all the zeroes for your function. That’s all the points in the domain that match with zero in the range. Then we have a new name for the thing you have. And that’s the kernel of your function. It’s the biggest subset in the domain with an image that’s just the zero in the range.

So we have a name for the zeroes that isn’t just “the zeroes”. What does this get us?

If we don’t know anything about the kind of function we have, not much. If the function belongs to some common kinds of functions, though, it tells us stuff.

For example. Suppose the function has domain and range that are vectors. And that the function is linear, which is to say, easy to deal with. Let me call the function ‘f’. And let me pick out two things in the domain. I’ll call them ‘x’ and ‘y’ because I’m writing this after Thanksgiving dinner and can’t work up a cleverer name for anything. If f is linear then f(x + y) is the same thing as f(x) + f(y). And now something magic happens. If x and y are both in the kernel, then x + y has to be in the kernel too. Think about it. Meanwhile, if x is in the kernel but y isn’t, then f(x + y) is f(y). Again think about it.

What we can see is that the domain fractures into two directions. One of them, the direction of the kernel, is invisible to the function. You can move however much you like in that direction and f can’t see it. The other direction, perpendicular (“orthogonal”, we say in the trade) to the kernel, is visible. Everything that might change changes in that direction.

This idea threads through vector spaces, and we study a lot of things that turn out to look like vector spaces. It keeps surprising us by letting us solve problems, or find the best-possible approximate solutions. This kernel gives us room to match some fiddly conditions without breaking the real solution. The size of the null space alone can tell us whether some problems are solvable, or whether they’ll have infinitely large sets of solutions.

In this vector-space construct the kernel often takes on another name, the “null space”. This means the same thing. But it reminds us that superhero comics writers miss out on many excellent pieces of terminology by not taking advanced courses in mathematics.

Kernels also appear in group theory, whenever we get into rings. We’re always working with rings. They’re nearly as unavoidable as vector spaces.

You know how you can divide the whole numbers into odd and even? And you can do some neat tricks with that for some problems? You can do that with every ring, using the kernel as a dividing point. This gives us information about how the ring is shaped, and what other structures might look like the ring. This often lets us turn proofs that might be hard into a collection of proofs on individual cases that are, at least, doable. Tricks about odd and even numbers become, in trained hands, subtle proofs of surprising results.

We see vector spaces and rings all over the place in mathematics. Some of that’s selection bias. Vector spaces capture a lot of what’s important about geometry. Rings capture a lot of what’s important about arithmetic. We have understandings of geometry and arithmetic that transcend even our species. Raccoons understand space. Crows understand number. When we look to do mathematics we look for patterns we understand, and these are major patterns we understand. And there are kernels that matter to each of them.

Some mathematical ideas inspire metaphors to me. Kernels are one. Kernels feel to me like the process of holding a polarized lens up to a crystal. This lets one see how the crystal is put together. I realize writing this down that my metaphor is unclear: is the kernel the lens or the structure seen in the crystal? I suppose the function has to be the lens, with the kernel the crystallization planes made clear under it. It’s curious I had enjoyed this feeling about kernels and functions for so long without making it precise. Feelings about mathematical structures can be like that.

The End 2016 Mathematics A To Z: Jordan Curve


I realize I used this thing in one of my Theorem Thursday posts but never quite said what it was. Let me fix that.

Jordan Curve

Get a rubber band. Well, maybe you can’t just now, even if you wanted to after I gave orders like that. Imagine a rubber band. I apologize to anyone so offended by my imperious tone that they’re refusing. It’s the convention for pop mathematics or science.

Anyway, take your rubber band. Drop it on a table. Fiddle with it so it hasn’t got any loops in it and it doesn’t twist over any. I want the whole of one edge of the band touching the table. You can imagine the table too. That is a Jordan Curve, at least as long as the rubber band hasn’t broken.

This may not look much like a circle. It might be close, but I bet it’s got some wriggles in its curves. Maybe it even curves so much the thing looks more like a kidney bean than a circle. Maybe it pinches so much that it looks like a figure eight, a couple of loops connected by a tiny bridge on the interior. Doesn’t matter. You can bring out the circle. Put your finger inside the rubber band’s loops and spiral your finger around. Do this gently and the rubber band won’t jump off the table. It’ll round out to as perfect a circle as the limitations of matter allow.

And for that matter, if we wanted, we could take a rubber band laid down as a perfect circle. Then nudge it here and push it there and wrinkle it up into as complicated a figure as you like. Either way is as possible.

A Jordan Curve is a closed curve, a curve that loops around back to itself. And it’s simple. That is, it doesn’t cross over itself at any point. However weird and loopy this figure is, as long as it doesn’t cross over itself, it’s got in a sense the same shape as a circle. We can imagine a function that matches every point on a true circle to a point on the Jordan Curve. A set of points in order on the original circle will match to points in the same order on the Jordan Curve. There’s nothing missing and there’s no jumps or ambiguous points. And no point on the Jordan Curve matches to two or more on the original circle. (This is why we don’t let the curve to cross over itself.)

When I wrote about the Jordan Curve Theorem it was about how to tell how a curve divides a plane into two pieces, an inside and an outside. You can have some pretty complicated-looking figures. I have an example on the Jordan Curve Theorem essay, but you can make your own by doodling. And we can look at it as a circle, as a rubber band, twisted all around.

This all dips into topology, the study of how shapes connect when we don’t care about distance. But there are simple wondrous things to find about them. For example. Draw a Jordan Curve, please. Any that you like. Now draw a triangle. Again, any that you like.

There is some trio of points in your Jordan Curve which connect to a triangle the same shape as the one you drew. It may be bigger than your triangle, or smaller. But it’ll look similar. The angles inside will all be the same as the ones you started with. This should help make doodling during a dull meeting even more exciting.

There may be four points on your Jordan Curve that make a square. I don’t know. Nobody knows for sure. There certainly are if your curve is convex, that is, if no line between any two points on the curve goes outside the curve. And it’s true even for curves that aren’t complex if they are smooth enough. But generally? For an arbitrary curve? We don’t know. It might be true. It might be impossible to find a square in some Jordan Curve. It might be the Jordan Curve you drew. Good luck looking.

The End 2016 Mathematics A To Z: Image


It’s another free-choice entry. I’ve got something that I can use to make my Friday easier.

Image.

So remember a while back I talked about what functions are? I described them the way modern mathematicians like. A function’s got three components to it. One is a set of things called the domain. Another is a set of things called the range. And there’s some rule linking things in the domain to things in the range. In shorthand we’ll write something like “f(x) = y”, where we know that x is in the domain and y is in the range. In a slightly more advanced mathematics class we’ll write f: x \rightarrow y . That maybe looks a little more computer-y. But I bet you can read that already: “f matches x to y”. Or maybe “f maps x to y”.

We have a couple ways to think about what ‘y’ is here. One is to say that ‘y’ is the image of ‘x’, under ‘f’. The language evokes camera trickery, or at least the way a trick lens might make us see something different. Pretend that the domain is something you could gaze at. If the domain is, say, some part of the real line, or a two-dimensional plane, or the like that’s not too hard to do. Then we can think of the rule part of ‘f’ as some distorting filter. When we look to where ‘x’ would be, we see the thing in the range we know as ‘y’.

At this point you probably imagine this is a pointless word to have. And that it’s backed up by a useless analogy. So it is. As far as I’ve gone this addresses a problem we don’t need to solve. If we want “the thing f matches x to” we can just say “f(x)”. Well, we write “f(x)”. We say “f of x”. Maybe “f at x”, or “f evaluated at x” if we want to emphasize ‘f’ more than ‘x’ or ‘f(x)’.

Where it gets useful is that we start looking at subsets. Bunches of points, not just one. Call ‘D’ some interesting-looking subset of the domain. What would it mean if we wrote the expression ‘f(D)’? Could we make that meaningful?

We do mean something by it. We mean what you might imagine by it. If you haven’t thought about what ‘f(D)’ might mean, take a moment — a short moment — and guess what it might. Don’t overthink it and you’ll have it right. I’ll put the answer just after this little bit so you can ponder.

Close up view of a Flemish Giant rabbit looking at you from the corner of his eye.
Our pet rabbit on the beach in Omena, Michigan back in July this year. Which is a small town on the Traverse Bay, which is just off Lake Michigan where … oh, you have Google Maps, you don’t need me. Anyway we wondered what he would make of vast expanses of water, considering he doesn’t like water what with being a rabbit and all that. And he watched it for a while and then shuffled his way in to where the waves come up and could wash over his front legs, making us wonder what kind of crazy rabbit he is, exactly.

So. ‘f(D)’ is a set. We make that set by taking, in turn, every single thing that’s in ‘D’. And find everything in the range that’s matched by ‘f’ to those things in ‘D’. Collect them all together. This set, ‘f(D)’, is “the image of D under f”.

We use images a lot when we’re studying how functions work. A function that maps a simple lump into a simple lump of about the same size is one thing. A function that maps a simple lump into a cloud of disparate particles is a very different thing. A function that describes how physical systems evolve will preserve the volume and some other properties of these lumps of space. But it might stretch out and twist around that space, which is how we discovered chaos.

Properly speaking, the range of a function ‘f’ is just the image of the whole domain under that ‘f’. But we’re not usually that careful about defining ranges. We’ll say something like ‘the domain and range are the sets of real numbers’ even though we only need the positive real numbers in the range. Well, it’s not like we’re paying for unnecessary range. Let me call the whole domain ‘X’, because I went and used ‘D’ earlier. Then the range, let me call that ‘Y’, would be ‘Y = f(X)’.

Images will turn up again. They’re a handy way to let us get at some useful ideas.

The End 2016 Mathematics A To Z: Hat


I was hoping to pick a term that was a quick and easy one to dash off. I learned better.

Hat.

This is a simple one. It’s about notation. Notation is never simple. But it’s important. Good symbols organize our thoughts. They tell us what are the common ordinary bits of our problem, and what are the unique bits we need to pay attention to here. We like them to be easy to write. Easy to type is nice, too, but in my experience mathematicians work by hand first. Typing is tidying-up, and we accept that being sluggish. Unique would be nice, so that anyone knows what kind of work we’re doing just by looking at the symbols. I don’t think anything manages that. But at least some notation has alternate uses rare enough we don’t have to worry about it.

“Hat” has two major uses I know of. And we call it “hat”, although our friends in the languages department would point out this is a caret. The little pointy corner that goes above a letter, like so: \hat{i} . \hat{x} . \hat{e} . It’s not something we see on its own. It’s always above some variable.

The first use of the hat like this comes up in statistics. It’s a way of marking that something is an estimate. By “estimate” here we mean what anyone might mean by “estimate”. Statistics is full of uses for this sort of thing. For example, we often want to know what the arithmetic mean of some quantity is. The average height of people. The average temperature for the 18th of November. The average weight of a loaf of bread. We have some letter that we use to mean “the value this has for any one example”. By some letter we mean ‘x’, maybe sometimes ‘y’. We can use any and maybe the problem begs for something. But it’s ‘x’, maybe sometimes ‘y’.

For the arithmetic mean of ‘x’ for the whole population we write the letter with a horizontal bar over it. (The arithmetic mean is the thing everybody in the world except mathematicians calls the average. Also, it’s what mathematicians mean when they say the average. We just get fussy because we know if we don’t say “arithmetic mean” someone will come along and point out there are other averages.) That arithmetic mean is \bar{x} . Maybe \bar{y} if we must. Must be some number. But what is it? If we can’t measure whatever it is for every single example of our group — the whole population — then we have to make an estimate. We do that by taking a sample, ideally one that isn’t biased in some way. (This is so hard to do, or at least be sure you’ve done.) We can find the mean for this sample, though, because that’s how we picked it. The mean of this sample is probably close to the mean of the whole population. It’s an estimate. So we can write \hat{x} and understand. This is not \bar{x} but it does give us a good idea what \hat{x} should be.

(We don’t always use the caret ^ for this. Sometimes we use a tilde ~ instead. ~ has the advantage that it’s often used for “approximately equal to”. So it will carry that suggestion over to its new context.)

The other major use of the hat comes in vectors. Mathematics types do a lot of work with vectors. It turns out a lot of mathematical structures work the way that pointing and moving in directions in ordinary space do. That’s why back when I talked about what vectors were I didn’t say “they’re like arrows pointing some length in some direction”. Arrows pointing some length in some direction are vectors, yes, but there are many more things that are vectors. Thinking of moving in particular directions gives us good intuition for how to work with vectors, and for stuff that turns out to be vectors. But they’re not everything.

If we need to highlight that something is a vector we put a little arrow over its name. \vec{x} . \vec{e} . That sort of thing. (Or if we’re typing, we might put the letter in boldface: x. This was good back before computers let us put in mathematics without giving the typesetters hazard pay.) We don’t always do that. By the time we do a lot of stuff with vectors we don’t always need the reminder. But we will include it if we need a warning. Like if we want to have both \vec{r} telling us where something is and to use a plain old r to tell us how big the vector \vec{r} is. That turns up a lot in physics problems.

Every vector has some length. Even vectors that don’t seem to have anything to do with distances do. We can make a perfectly good vector out of “polynomials defined for the domain of numbers between -2 and +2”. Those polynomials are vectors, and they have lengths.

There’s a special class of vectors, ones that we really like in mathematics. They’re the “unit vectors”. Those are vectors with a length of 1. And we are always glad to see them. They’re usually good choices for a basis. Basis vectors are useful things. They give us, in a way, a representative slate of cases to solve. Then we can use that representative slate to give us whatever our specific problem’s solution is. So mathematicians learn to look instinctively to them. We want basis vectors, and we really like them to have a length of 1. Even if we aren’t putting the arrow over our variables we’ll put the caret over the unit vectors.

There are some unit vectors we use all the time. One is just the directions in space. That’s \hat{e}_1 and \hat{e}_2 and for that matter \hat{e}_3 and I bet you have an idea what the next one in the set might be. You might be right. These are basis vectors for normal, Euclidean space, which is why they’re labelled “e”. We have as many of them as we have dimensions of space. We have as many dimensions of space as we need for whatever problem we’re working on. If we need a basis vector and aren’t sure which one, we summon one of the letters used as indices all the time. \hat{e}_i , say, or \hat{e}_j . If we have an n-dimensional space, then we have unit vectors all the way up to \hat{e}_n .

We also use the hat a lot if we’re writing quaternions. You remember quaternions, vaguely. They’re complex-valued numbers for people who’re bored with complex-valued numbers and want some thrills again. We build them as a quartet of numbers, each added together. Three of them are multiplied by the mysterious numbers ‘i’, ‘j’, and ‘k’. Each ‘i’, ‘j’, or ‘k’ multiplied by itself is equal to -1. But ‘i’ doesn’t equal ‘j’. Nor does ‘j’ equal ‘k’. Nor does ‘k’ equal ‘i’. And ‘i’ times ‘j’ is ‘k’, while ‘j’ times ‘i’ is minus ‘k’. That sort of thing. Easy to look up. You don’t need to know all the rules just now.

But we often end up writing a quaternion as a number like 4 + 2\hat{i} - 3\hat{j} + 1 \hat{k} . OK, that’s just the one number. But we will write numbers like a + b\hat{i} + c\hat{j} + d\hat{k} . Here a, b, c, and d are all real numbers. This is kind of sloppy; the pieces of a quaternion aren’t in fact vectors added together. But it is hard not to look at a quaternion and see something pointing in some direction, like the first vectors we ever learn about. And there are some problems in pointing-in-a-direction vectors that quaternions handle so well. (Mostly how to rotate one direction around another axis.) So a bit of vector notation seeps in where it isn’t appropriate.

I suppose there’s some value in pointing out that the ‘i’ and ‘j’ and ‘k’ in a quaternion are fixed and set numbers. They’re unlike an ‘a’ or an ‘x’ we might see in the expression. I’m not sure anyone was thinking they were, though. Notation is a tricky thing. It’s as hard to get sensible and consistent and clear as it is to make words and grammar sensible. But the hat is a simple one. It’s good to have something like that to rely on.

The End 2016 Mathematics A To Z: General Covariance


Today’s term is another request, and another of those that tests my ability to make something understandable. I’ll try anyway. The request comes from Elke Stangl, whose “Research Notes on Energy, Software, Life, the Universe, and Everything” blog I first ran across years ago, when she was explaining some dynamical systems work.

General Covariance

So, tensors. They’re the things mathematicians get into when they figure vectors just aren’t hard enough. Physics majors learn about them too. Electrical engineers really get into them. Some material science types too.

You maybe notice something about those last three groups. They’re interested in subjects that are about space. Like, just, regions of the universe. Material scientists wonder how pressure exerted on something will get transmitted. The structure of what’s in the space matters here. Electrical engineers wonder how electric and magnetic fields send energy in different directions. And physicists — well, everybody who’s ever read a pop science treatment of general relativity knows. There’s something about the shape of space something something gravity something equivalent acceleration.

So this gets us to tensors. Tensors are this mathematical structure. They’re about how stuff that starts in one direction gets transmitted into other directions. You can see how that’s got to have something to do with transmitting pressure through objects. It’s probably not too much work to figure how that’s relevant to energy moving through space. That it has something to do with space as just volume is harder to imagine. But physics types have talked about it quite casually for over a century now. Science fiction writers have been enthusiastic about it almost that long. So it’s kind of like the Roman Empire. It’s an idea we hear about early and often enough we’re never really introduced to it. It’s never a big new idea we’re presented, the way, like, you get specifically told there was (say) a War of 1812. We just soak up a couple bits we overhear about the idea and carry on as best our lives allow.

But to think of space. Start from somewhere. Imagine moving a little bit in one direction. How far have you moved? If you started out in this one direction, did you somehow end up in a different one? Now imagine moving in a different direction. Now how far are you from where you started? How far is your direction from where you might have imagined you’d be? Our intuition is built around a Euclidean space, or one close enough to Euclidean. These directions and distances and combined movements work as they would on a sheet of paper, or in our living room. But there is a difference. Walk a kilometer due east and then one due north and you will not be in exactly the same spot as if you had walked a kilometer due north and then one due east. Tensors are efficient ways to describe those little differences. And they tell us something of the shape of the Earth from knowing these differences. And they do it using much of the form that matrices and vectors do, so they’re not so hard to learn as they might be.

That’s all prelude. Here’s the next piece. We go looking at transformations. We take a perfectly good coordinate system and a point in it. Now let the light of the full Moon shine upon it, so that it shifts to being a coordinate werewolf. Look around you. There’s a tensor that describes how your coordinates look here. What is it?

You might wonder why we care about transformations. What was wrong with the coordinates we started with? But that’s because mathematicians have lumped a lot of stuff into the same name of “transformation”. A transformation might be something as dull as “sliding things over a little bit”. Or “turning things a bit”. It might be “letting a second of time pass”. Or “following the flow of whatever’s moving”. Stuff we’d like to know for physics work.

“General covariance” is a term that comes up when thinking about transformations. Suppose we have a description of some physics problem. By this mostly we mean “something moving in space” or “a bit of light moving in space”. That’s because they’re good building blocks. A lot of what we might want to know can be understood as some mix of those two problems.

Put your description through the same transformation your coordinate system had. This will (most of the time) change the details of how your problem’s represented. But does it change the overall description? Is our old description no longer even meaningful?

I trust at this point you’ve nodded and thought something like “well, that makes sense”. Give it another thought. How could we not have a “generally covariant” description of something? Coordinate systems are our impositions on a problem. We create them to make our lives easier. They’re real things in exactly the same way that lines of longitude and latitude are real. If we increased the number describing the longitude of every point in the world by 14, we wouldn’t change anything real about where stuff was or how to navigate to it. We couldn’t.

Here I admit I’m stumped. I can’t think of a good example of a system that would look good but not be generally covariant. I’m forced to resort to metaphors and analogies that make this essay particularly unsuitable to use for your thesis defense.

So here’s the thing. Longitude is a completely arbitrary thing. Measuring where you are east or west of some prime meridian might be universal, or easy for anyone to tumble onto. But the prime meridian is a cultural choice. It’s changed before. It may change again. Indeed, Geographic Information Services people still work with many different prime meridians. Most of them are for specialized purposes. Stuff like mapping New Jersey in feet north and east of some reference, for which Greenwich would make the numbers too ugly. But if our planet is mapped in an alien’s records, that map has at its center some line almost surely not Greenwich.

But latitude? Latitude is, at least, less arbitrary. That we measure it from zero to ninety degrees, north or south, is a cultural choice. (Or from -90 to 90 degrees. Same thing.) But that there’s a north pole and a south pole? That’s true as long as the planet is rotating. And that’s forced on us. If we tried to describe the Earth as rotating on an axis between Paris and Mexico City, we would … be fighting an uphill struggle, at least. It’s hard to see any problem that might make easier, apart from getting between Paris and Mexico City.

In models of the laws of physics we don’t really care about the north or south pole. A planet might have them or might not. But it has got some privileged stuff that just has to be so. We can’t have stuff that makes the speed of light in a vacuum change. And we have to make sense of a block of space that hasn’t got anything in it, no matter, no light, no energy, no gravity. I think those are the important pieces actually. But I’ll defer, growling angrily, to an expert in general relativity or non-Euclidean coordinates if I’ve misunderstood.

It’s often put that “general covariance” is one of the requirements for a scheme to describe General Relativity. I shall risk sounding like I’m making a joke and say that depends on your perspective. One can use different philosophical bases for describing General Relativity. In some of them you can see general covariance as a result rather than use it as a basic assumption. Here’s a 1993 paper by Dr John D Norton that describes some of the different ways to understand the point of general covariance.

By the way the term “general covariance” comes from two pieces. The “covariance” is because it describes how changes in one coordinate system are reflected in another. It’s “general” because we talk about coordinate transformations without knowing much about them. That is, we’re talking about transformations in general, instead of some specific case that’s easy to work with. This is why the mathematics of this can be frightfully tricky; we don’t know much about the transformations we’re working with. For a parallel, it’s easy to tell someone how to divide 14 into 112. It’s harder to tell them how to divide absolutely any number into absolutely any other number.

Quite a bit of mathematical physics plays into geometry. Gravity physicists mostly see as a problem of geometry. People who like reading up on science take that as given too. But many problems can be understood as a point or a blob of points in some kind of space, and how that point moves or that blob evolves in time. We don’t see “general covariance” in these other fields exactly. But we do see things that resemble it. It’s an idea with considerable reach.


I’m not sure how I feel about this. For most of my essays I’ve kept away from equations, even for the Why Stuff Can Orbit sequence. But this is one of those subjects it’s hard to be exact about without equations. I might revisit this in a special all-symbols, calculus-included, edition. Depends what my schedule looks like.

The End 2016 Mathematics A To Z: The Fredholm Alternative


Some things are created with magnificent names. My essay today is about one of them. It’s one of my favorite terms and I get a strange little delight whenever it needs to be mentioned in a proof. It’s also the title I shall use for my 1970s Paranoid-Conspiracy Thriller.

The Fredholm Alternative.

So the Fredholm Alternative is about whether this supercomputer with the ability to monitor every commercial transaction in the country falls into the hands of the Parallax Corporation or whether — ahm. Sorry. Wrong one. OK.

The Fredholm Alternative comes from the world of functional analysis. In functional analysis we study sets of functions with tools from elsewhere in mathematics. Some you’d be surprised aren’t already in there. There’s adding functions together, multiplying them, the stuff of arithmetic. Some might be a bit surprising, like the stuff we draw from linear algebra. That’s ideas like functions having length, or being at angles to each other. Or that length and those angles changing when we take a function of those functions. This may sound baffling. But a mathematics student who’s got into functional analysis usually has a happy surprise waiting. She discovers the subject is easy. At least, it relies on a lot of stuff she’s learned already, applied to stuff that’s less difficult to work with than, like, numbers.

(This may be a personal bias. I found functional analysis a thoroughgoing delight, even though I didn’t specialize in it. But I got the impression from other grad students that functional analysis was well-liked. Maybe we just got the right instructor for it.)

I’ve mentioned in passing “operators”. These are functions that have a domain that’s a set of functions and a range that’s another set of functions. Suppose you come up to me with some function, let’s say f(x) = x^2 . I give you back some other function — say, F(x) = \frac{1}{3}x^3 - 4 . Then I’m acting as an operator.

Why should I do such a thing? Many operators correspond to doing interesting stuff. Taking derivatives of functions, for example. Or undoing the work of taking a derivative. Describing how changing a condition changes what sorts of outcomes a process has. We do a lot of stuff with these. Trust me.

Let me use the name `T’ for some operator. I’m not going to say anything about what it does. The letter’s arbitrary. We like to use capital letters for operators because it makes the operators look extra important. And we don’t want to use `O’ because that just looks like zero and we don’t need that confusion.

Anyway. We need two functions. One of them will be called ‘f’ because we always call functions ‘f’. The other we’ll call ‘v’. In setting up the Fredholm Alternative we have this important thing: we know what ‘f’ is. We don’t know what ‘v’ is. We’re finding out something about what ‘v’ might be. The operator doing whatever it does to a function we write down as if it were multiplication, that is, like ‘Tv’. We get this notation from linear algebra. There we multiple matrices by vectors. Matrix-times-vector multiplication works like operator-on-a-function stuff. So much so that if we didn’t use the same notation young mathematics grad students would rise in rebellion. “This is absurd,” they would say, in unison. “The connotations of these processes are too alike not to use the same notation!” And the department chair would admit they have a point. So we write ‘Tv’.

If you skipped out on mathematics after high school you might guess we’d write ‘T(v)’ and that would make sense too. And, actually, we do sometimes. But by the time we’re doing a lot of functional analysis we don’t need the parentheses so much. They don’t clarify anything we’re confused about, and they require all the work of parenthesis-making. But I do see it sometimes, mostly in older books. This makes me think mathematicians started out with ‘T(v)’ and then wrote less as people got used to what they were doing.

I admit we might not literally know what ‘f’ is. I mean we know what ‘f’ is in the same way that, for a quadratic equation, “ax2 + bx + c = 0”, we “know” what ‘a’, ‘b’, and ‘c’ are. Similarly we don’t know what ‘v’ is in the same way we don’t know what ‘x’ there is. The Fredholm Alternative tells us exactly one of these two things has to be true:

For operators that meet some requirements I don’t feel like getting into, either:

  1. There’s one and only one ‘v’ which makes the equation Tv  = f true.
  2. Or else Tv = 0 for some ‘v’ that isn’t just zero everywhere.

That is, either there’s exactly one solution, or else there’s no solving this particular equation. We can rule out there being two solutions (the way quadratic equations often have), or ten solutions (the way some annoying problems will), or infinitely many solutions (oh, it happens).

It turns up often in boundary value problems. Often before we try solving one we spend some time working out whether there is a solution. You can imagine why it’s worth spending a little time working that out before committing to a big equation-solving project. But it comes up elsewhere. Very often we have problems that, at their core, are “does this operator match anything at all in the domain to a particular function in the range?” When we try to answer we stumble across Fredholm’s Alternative over and over.

Fredholm here was Ivar Fredholm, a Swedish mathematician of the late 19th and early 20th centuries. He worked for Uppsala University, and for the Swedish Social Insurance Agency, and as an actuary for the Skandia insurance company. Wikipedia tells me that his mathematical work was used to calculate buyback prices. I have no idea how.

The End 2016 Mathematics A To Z: Ergodic


This essay follows up on distributions, mentioned back on Wednesday. This is only one of the ideas which distributions serve. Do you have a word you’d like to request? I figure to close ‘F’ on Saturday afternoon, and ‘G’ is already taken. But give me a request for a free letter soon and I may be able to work it in.

Ergodic.

There comes a time a physics major, or a mathematics major paying attention to one of the field’s best non-finance customers, first works on a statistical mechanics problem. Instead of keeping track of the positions and momentums of one or two or four particles she’s given the task of tracking millions of particles. It’s listed as a distribution of all the possible values they can have. But she still knows what it really is. And she looks at how to describe the way this distribution changes in time. If she’s the slightest bit like me, or anyone I knew, she freezes up this. Calculate the development of millions of particles? Impossible! She tries working out what happens to just one, instead, and hopes that gives some useful results.

And then it does.

It’s a bit much to call this luck. But it is because the student starts off with some simple problems. Particles of gas in a strong box, typically. They don’t interact chemically. Maybe they bounce off each other, but she’s never asked about that. She’s asked about how they bounce off the walls. She can find the relationship between the volume of the box and the internal gas pressure on the interior and the temperature of the gas. And it comes out right.

She goes on to some other problems and it suddenly fails. Eventually she re-reads the descriptions of how to do this sort of problem. And she does them again and again and it doesn’t feel useful. With luck there’s a moment, possibly while showering, that the universe suddenly changes. And the next time the problem works out. She’s working on distributions instead of toy little single-particle problems.

But the problem remains: why did it ever work, even for that toy little problem?

It’s because some systems of things are ergodic. It’s a property that some physics (or mathematics) problems have. Not all. It’s a bit hard to describe clearly. Part of what motivated me to take this topic is that I want to see if I can explain it clearly.

Every part of some system has a set of possible values it might have. A particle of gas can be in any spot inside the box holding it. A person could be in any of the buildings of her city. A pool ball could be travelling in any direction on the pool table. Sometimes that will change. Gas particles move. People go to the store. Pool balls bounce off the edges of the table.

These values will have some kind of distribution. Look at where the gas particle is now. And a second from now. And a second after that. And so on, to the limits of human knowledge. Or to when the box breaks open. Maybe the particle will be more often in some areas than in others. Maybe it won’t. Doesn’t matter. It has some distribution. Over time we can say how often we expect to find the gas particle in each of its possible places.

The same with whatever our system is. People in buildings. Balls on pool tables. Whatever.

Now instead of looking at one particle (person, ball, whatever) we have a lot of them. Millions of particle in the box. Tens of thousands of people in the city. A pool table that somehow supports ten thousand balls. Imagine they’re all settled to wherever they happen to be.

So where are they? The gas particle one is easy to imagine. At least for a mathematics major. If you’re stuck on it I’m sorry. I didn’t know. I’ve thought about boxes full of gas particles for decades now and it’s hard to remember that isn’t normal. Let me know if you’re stuck, and where you are. I’d like to know where the conceptual traps are.

But back to the gas particles in a box. Some fraction of them are in each possible place in the box. There’s a distribution here of how likely you are to find a particle in each spot.

How does that distribution, the one you get from lots of particles at once, compare to the first, the one you got from one particle given plenty of time? If they agree the system is ergodic. And that’s why my hypothetical physics major got the right answers from the wrong work. (If you are about to write me to complain I’m leaving out important qualifiers let me say I know. Please pretend those qualifiers are in place. If you don’t see what someone might complain about thank you, but it wouldn’t hurt to think of something I might be leaving out here. Try taking a shower.)

The person in a building is almost certainly not an ergodic system. There’s buildings any one person will never ever go into, however possible it might be. But nearly all buildings have some people who will go into them. The one-person-with-time distribution won’t be the same as the many-people-at-once distribution. Maybe there’s a way to qualify things so that it becomes ergodic. I doubt it.

The pool table, now, that’s trickier to say. For a real pool table no, of course not. An actual ball on an actual table rolls to a stop pretty soon, either from the table felt’s friction or because it drops into a pocket. Tens of thousands of balls would form an immobile heap on the table that would be pretty funny to see, now that I think of it. Well, maybe those are the same. But they’re a pretty boring same.

Anyway when we talk about “pool tables” in this context we don’t mean anything so sordid as something a person could play pool on. We mean something where the table surface hasn’t any friction. That makes the physics easier to model. It also makes the game unplayable, which leaves the mathematical physicist strangely unmoved. In this context anyway. We also mean a pool table that hasn’t got any pockets. This makes the game even more unplayable, but the physics even easier. (It makes it, really, like a gas particle in a box. Only without that difficult third dimension to deal with.)

And that makes it clear. The one ball on a frictionless, pocketless table bouncing around forever maybe we can imagine. A huge number of balls on that frictionless, pocketless table? Possibly trouble. As long as we’re doing imaginary impossible unplayable pool we could pretend the balls don’t collide with each other. Then the distributions of what ways the balls are moving could be equal. If they do bounce off each other, or if they get so numerous they can’t squeeze past one another, well, that’s different.

An ergodic system lets you do this neat, useful trick. You can look at a single example for a long time. Or you can look at a lot of examples at one time. And they’ll agree in their typical behavior. If one is easier to study than the other, good! Use the one that you can work with. Mathematicians like to do this sort of swapping between equivalent problems a lot.

The problem is it’s hard to find ergodic systems. We may have a lot of things that look ergodic, that feel like they should be ergodic. But proved ergodic, with a logic that we can’t shake? That’s harder to do. Often in practice we will include a note up top that we are assuming the system to be ergodic. With that “ergodic hypothesis” in mind we carry on with our work. It gives us a handle on a lot of problems that otherwise would be beyond us.

The End 2016 Mathematics A To Z: Distribution (statistics)


As I’ve done before I’m using one of my essays to set up for another essay. It makes a later essay easier. What I want to talk about is worth some paragraphs on its own.

Distribution (statistics)

The 19th Century saw the discovery of some unsettling truths about … well, everything, really. If there is an intellectual theme of the 19th Century it’s that everything has an unsettling side. In the 20th Century craziness broke loose. The 19th Century, though, saw great reasons to doubt that we knew what we knew.

But one of the unsettling truths grew out of mathematical physics. We start out studying physics the way Galileo or Newton might have, with falling balls. Ones that don’t suffer from air resistance. Then we move up to more complicated problems, like balls on a spring. Or two balls bouncing off each other. Maybe one ball, called a “planet”, orbiting another, called a “sun”. Maybe a ball on a lever swinging back and forth. We try a couple simple problems with three balls and find out that’s just too hard. We have to track so much information about the balls, about their positions and momentums, that we can’t solve any problems anymore. Oh, we can do the simplest ones, but we’re helpless against the interesting ones.

And then we discovered something. By “we” I mean people like James Clerk Maxwell and Josiah Willard Gibbs. And that is that we can know important stuff about how millions and billions and even vaster numbers of things move around. Maxwell could work out how the enormously many chunks of rock and ice that make up Saturn’s rings move. Gibbs could work out how the trillions of trillions of trillions of trillions of particles of gas in a room move. We can’t work out how four particles move. How is it we can work out how a godzillion particles move?

We do it by letting go. We stop looking for that precision and exactitude and knowledge down to infinitely many decimal points. Even though we think that’s what mathematicians and physicists should have. What we do instead is consider the things we would like to know. Where something is. What its momentum is. What side of a coin is showing after a toss. What card was taken off the top of the deck. What tile was drawn out of the Scrabble bag.

There are possible results for each of these things we would like to know. Perhaps some of them are quite likely. Perhaps some of them are unlikely. We track how likely each of these outcomes are. This is called the distribution of the values. This can be simple. The distribution for a fairly tossed coin is “heads, 1/2; tails, 1/2”. The distribution for a fairly tossed six-sided die is “1/6 chance of 1; 1/6 chance of 2; 1/6 chance of 3” and so on. It can be more complicated. The distribution for a fairly tossed pair of six-sided die starts out “1/36 chance of 2; 2/36 chance of 3; 3/36 chance of 4” and so on. If we’re measuring something that doesn’t come in nice discrete chunks we have to talk about ranges: the chance that a 30-year-old male weighs between 180 and 185 pounds, or between 185 and 190 pounds. The chance that a particle in the rings of Saturn is moving between 20 and 21 kilometers per second, or between 21 and 22 kilometers per second, and so on.

We may be unable to describe how a system evolves exactly. But often we’re able to describe how the distribution of its possible values evolves. And the laws by which probability work conspire to work for us here. We can get quite precise predictions for how a whole bunch of things behave even without ever knowing what any thing is doing.

That’s unsettling to start with. It’s made worse by one of the 19th Century’s late discoveries, that of chaos. That a system can be perfectly deterministic. That you might know what every part of it is doing as precisely as you care to measure. And you’re still unable to predict its long-term behavior. That’s unshakeable too, although statistical techniques will give you an idea of how likely different behaviors are. You can learn the distribution of what is likely, what is unlikely, and how often the outright impossible will happen.

Distributions follow rules. Of course they do. They’re basically the rules you’d imagine from looking at and thinking about something with a range of values. Something like a chart of how many students got what grades in a class, or how tall the people in a group are, or so on. Each possible outcome turns up some fraction of the time. That fraction’s never less than zero nor greater than 1. Add up all the fractions representing all the times every possible outcome happens and the sum is exactly 1. Something happens, even if we never know just what. But we know how often each outcome will.

There is something amazing to consider here. We can know and track everything there is to know about a physical problem. But we will be unable to do anything with it, except for the most basic and simple problems. We can choose to relax, to accept that the world is unknown and unknowable in detail. And this makes imaginable all sorts of problems that should be beyond our power. Once we’ve given up on this precision we get precise, exact information about what could happen. We can choose to see it as a moral about the benefits and costs and risks of how tightly we control a situation. It’s a surprising lesson to learn from one’s training in mathematics.

The End 2016 Mathematics A To Z: Cantor’s Middle Third


Today’s term is a request, the first of this series. It comes from HowardAt58, head of the Saving School Math blog. There are many letters not yet claimed; if you have a term you’d like to see my write about please head over to the “Any Requests?” page and pick a letter. Please not one I figure to get to in the next day or two.

Cantor’s Middle Third.

I think one could make a defensible history of mathematics by describing it as a series of ridiculous things that get discovered. And then, by thinking about these ridiculous things long enough, mathematicians come to accept them. Even rely on them. Sometime later the public even comes to accept them. I don’t mean to say getting people to accept ridiculous things is the point of mathematics. But there is a pattern which happens.

Consider. People doing mathematics came to see how a number could be detached from a count or a measure of things. That we can do work on, say, “three” whether it’s three people, three kilograms, or three square meters. We’re so used to this it’s only when we try teaching mathematics to the young we realize it isn’t obvious.

Or consider that we can have, rather than a whole number of things, a fraction. Some part of a thing, as if you could have one-half pieces of chalk or two-thirds a fruit. Counting is relatively obvious; fractions are something novel but important.

We have “zero”; somehow, the lack of something is still a number, the way two or five or one-half might be. For that matter, “one” is a number. How can something that isn’t numerous be a number? We’re used to it anyway. We can have not just fraction and one and zero but irrational numbers, ones that can’t be represented as a fraction. We have negative numbers, somehow a lack of whatever we were counting so great that we might add some of what we were counting to the pile and still have nothing.

That takes us up to about eight hundred years ago or something like that. The public’s gotten to accept all this as recently as maybe three hundred years ago. They’ve still got doubts. I don’t blame folks. Complex numbers mathematicians like; the public’s still getting used to the idea, but at least they’ve heard of them.

Cantor’s Middle Third is part of the current edge. It’s something mathematicians are aware of and that defies sense at least. But we’ve come to accept it. The public, well, they don’t know about it. Maybe some do; it turns up in pop mathematics books that like sharing the strangeness of infinities. Few people read them. Sometimes it feels like all those who do go online to tell mathematicians they’re crazy. It comes to us, as you might guess from the name, from Georg Cantor. Cantor established the modern mathematical concept of how to study infinitely large sets in the late 19th century. And he was repeatedly hospitalized for depression. It’s cruel to write all that off as “and he was crazy”. His work’s withstood a hundred and thirty-five years of extremely smart people looking at it skeptically.

The Middle Third starts out easily enough. Take a line segment. Then chop it into three equal pieces and throw away the middle third. You see where the name comes from. What do you have left? Some of the original line. Two-thirds of the original line length. A big gap in the middle.

Now take the two line segments. Chop each of them into three equal pieces. Throw away the middle thirds of the two pieces. Now we’re left with four chunks of line and four-ninths of the original length. One big and two little gaps in the middle.

Now take the four little line segments. Chop each of them into three equal pieces. Throw away the middle thirds of the four pieces. We’re left with eight chunks of line, about eight-twenty-sevenths of the original length. Lots of little gaps. Keep doing this, chopping up line segments and throwing away middle pieces. Never stop. Well, pretend you never stop and imagine what’s left.

What’s left is deeply weird. What’s left has no length, no measure. That’s easy enough to prove. But we haven’t thrown everything away. There are bits of the original line segment left over. The left endpoint of the original line is left behind. So is the right endpoint of the original line. The endpoints of the line segments after the first time we chopped out a third? Those are left behind. The endpoints of the line segments after chopping out a third the second time, the third time? Those have to be in the set. We have a dust, isolated little spots of the original line, none of them combining together to cover any length. And there are infinitely many of these isolated dots.

We’ve seen that before. At least we have if we’ve read anything about the Cantor Diagonal Argument. You can find that among the first ten posts of every mathematics blog. (Not this one. I was saving the subject until I had something good to say about it. Then I realized many bloggers have covered it better than I could.) Part of it is pondering how there can be a set of infinitely many things that don’t cover any length. The whole numbers are such a set and it seems reasonable they don’t cover any length. The rational numbers, though, are also an infinitely-large set that doesn’t cover any length. And there’s exactly as many rational numbers as there are whole numbers. This is unsettling but if you’re the sort of person who reads about infinities you come to accept it. Or you get into arguments with mathematicians online and never know you’ve lost.

Here’s where things get weird. How many bits of dust are there in this middle third set? It seems like it should be countable, the same size as the whole numbers. After all, we pick up some of these points every time we throw away a middle third. So we double the number of points left behind every time we throw away a middle third. That’s countable, right?

It’s not. We can prove it. The proof looks uncannily like that of the Cantor Diagonal Argument. That’s the one that proves there are more real numbers than there are whole numbers. There are points in this leftover set that were not endpoints of any of these middle-third excerpts. This dust has more points in it than there are rational numbers, but it covers no length.

(I don’t know if the dust has the same size as the real numbers. I suspect it’s unproved whether it has or hasn’t, because otherwise I’d surely be able to find the answer easily.)

It’s got other neat properties. It’s a fractal, which is why someone might have heard of it, back in the Great Fractal Land Rush of the 80s and 90s. Look closely at part of this set and it looks like the original set, with bits of dust edging gaps of bigger and smaller sizes. It’s got a fractal dimension, or “Hausdorff dimension” in the lingo, that’s the logarithm of two divided by the logarithm of three. That’s a number actually known to be transcendental, which is reassuring. Nearly all numbers are transcendental, but we only know a few examples of them.

HowardAt58 asked me about the Middle Third set, and that’s how I’ve referred to it here. It’s more often called the “Cantor set” or “Cantor comb”. The “comb” makes sense because if you draw successive middle-thirds-thrown-away, one after the other, you get something that looks kind of like a hair comb, if you squint.

You can build sets like this that aren’t based around thirds. You can, for example, develop one by cutting lines into five chunks and throw away the second and fourth. You get results that are similar, and similarly heady, but different. They’re all astounding. They’re all hard to believe in yet. They may get to be stuff we just accept as part of how mathematics works.

The End 2016 Mathematics A To Z: Boundary Value Problems


I went to a grad school, Rensselaer Polytechnic Institute. The joke at the school is that the mathematics department has two tracks, “Applied Mathematics” and “More Applied Mathematics”. So I got to know the subject of today’s A To Z very well. It’s worth your knowing too.

Boundary Value Problems.

I’ve talked about differential equations before. I’ll talk about them again. They’re important. They might be the most directly useful sort of higher mathematics. They turn up naturally whenever you have a system whose changes depend on the current state of things.

There are many kinds of differential equations problems. The ones that come first to mind, and that students first learn, are “initial value problems”. In these, you’re given some system, told how it changes in time, and told what things are at a start. There’s good reasons to do that. It’s conceptually easy. It describes all sorts of systems where something moves. Think of your classic physics problems of a ball being tossed in the air, or a weight being put on a spring, or a planet orbiting a sun. These are classic initial value problems. They almost look like natural experiments. Set a thing up and watch what happens.

They’re not everything. There’s another class of problems at least as important. Maybe more important. In these we’re given how the parts of a system affect one another. And we’re told some information about the edges of the system. The boundaries, that is. And these are “boundary value problems”.

Mathematics majors learn them after getting thoroughly trained in and sick of initial value problems. There’s reasons for that. First is that they almost need to be about problems with multiple variables. You can set one up for, like, a ball tossed in the air. But they’re rarer. Differential equations for multiple variables are harder than differential equations for a single variable, because of course. We have to learn the tools of “partial differential equations”. In these we work out how the system changes if we pretend all but one of the variables is fixed. We combine information about all those changes for each individual changing variable. Lots more, and lots stranger, stuff can happen.

The partial differential equation describes some region. It involves maybe some space, maybe some time, maybe both. There’s a region, called the “domain”, for which the differential equation is true.

For example, maybe we’re interested in the amount of heat in a metal bar as it’s warmed on one end and cooled on another. The domain here is the length of the bar and the time it’s subjected to the heat and cool. Or maybe we’re interested in the amount of water flowing through a section of a river bed. The domain here is the length and width and depth of the river, if we suppose the river isn’t swelling or shrinking or changing much. Maybe we’re intersted in the electric field created by putting a bit of charge on a metal ball. Then the domain is the entire universe except the metal ball and the space inside it. We’re comfortable with boundlessly large domains.

But what makes this a boundary value problem is that we know something about the boundary looks like. Once again a mathematics term is less baffling than you might figure. The boundary is just what it sounds like: the edge of the domain, the part that divides the domain from not-the-domain. The metal bar being heated up has boundaries on either end. The river bed has boundaries at the surface of the water, the banks of the river, and the start and the end of wherever we’re observing. The metal ball has boundaries of the ball’s surface and … uh … the limits of space and time, somewhere off infinitely far away.

There’s all kinds of information we might get about a boundary. What we actually get is one of four kinds. The first kind is “we get told what values the solution should be at the boundary”. Mathematics majors love this because it lets us know we at least have the boundary’s values right. It’s certainly what we learn on first. And it might be most common. If we’re measuring, say, temperature or fluid speed or something like that we feel like we can know what these are. If we need a name we call this “Dirichlet Boundary Conditions”. That’s named for Peter Gustav Lejune Dirichlet. He’s one of those people mathematics majors keep running across. We get stuff named for him in mathematical physics, in probability, in heat, in Fourier series.

The second kind is “we get told what the derivative of the solution should be at the boundary”. Mathematics majors hate this because we’re having a hard enough time solving this already and you want us to worry about the derivative of the solution on the boundary? Give us something we can check, please. But this sort of boundary condition keeps turning up. It comes up, for instance, in the electric field around a conductive metal box, or ball, or plate. The electric field will be, near the metal plate, perpendicular to the conductive metal. Goodness knows what the electric field’s value is, but we know something about how it changes. If we need a name we call this “Neumann Boundary Conditions”. This is not named for the applied mathematician/computer scientist/physicist John von Neumann. Nobody remembers the Neumann it is named for, who was Carl Neumann.

The third kind is called “Robin boundary conditions” if someone remembers the name for it. It’s slightly named for Victor Gustave Robin. In these we don’t necessarily know the value the solution should have on the boundary. And we don’t know what the derivative of the solution on the boundary should be. But we do know some linear combination of them. That is, we know some number times the original value plus some (possibly other) number times the derivative. Mathematics majors loathe this one because the Neumann boundary conditions were hard enough and now we have this? They turn up in heat and diffusion problems, when there’s something limiting the flow of whatever you’re studying into and out of the region.

And the last kind is called “mixed boundary conditions” as, I don’t know, nobody seems to have got their name attached to it. In this we break up the boundary. For some of it we get, say, Dirichlet boundary conditions. For some of the boundary we get, say, Neumann boundary conditions. Or maybe we have Robin boundary conditions for some of the edge and Dirichlet for others. Whatever. This mathematics majors get once or twice, as punishment for their sinful natures, and then we try never to think of them again because of the pain. Sometimes it’s the only approach that fits the problem. Still hurts.

We see boundary value problems when we do things like blow a soap bubble using weird wireframes and ponder the shape. Or when we mix hot coffee and cold milk in a travel mug and ponder how the temperatures mix. Or when we see a pipe squeezing into narrower channels and wonder how this affects the speed of water flowing into and out of it. Often these will be problems about how stuff over a region, maybe of space and maybe of time, will settle down to some predictable, steady pattern. This is why it turns up all over applied mathematics problems, and why in grad school we got to know them so very well.

The End 2016 Mathematics A To Z: Algebra


So let me start the End 2016 Mathematics A To Z with a word everybody figures they know. As will happen, everybody’s right and everybody’s wrong about that.

Algebra.

Everybody knows what algebra is. It’s the point where suddenly mathematics involves spelling. Instead of long division we’re on a never-ending search for ‘x’. Years later we pass along gifs of either someone saying “stop asking us to find your ex” or someone who’s circled the letter ‘x’ and written “there it is”. And make jokes about how we got through life without using algebra. And we know it’s the thing mathematicians are always doing.

Mathematicians aren’t always doing that. I expect the average mathematician would say she almost never does that. That’s a bit of a fib. We have a lot of work where we do stuff that would be recognizable as high school algebra. It’s just we don’t really care about that. We’re doing that because it’s how we get the problem we are interested in done. the most recent few pieces in my “Why Stuff can Orbit” series include a bunch of high school algebra-style work. But that was just because it was the easiest way to answer some calculus-inspired questions.

Still, “algebra” is a much-used word. It comes back around the second or third year of a mathematics major’s career. It comes in two forms in undergraduate life. One form is “linear algebra”, which is a great subject. That field’s about how stuff moves. You get to imagine space as this stretchy material. You can stretch it out. You can squash it down. You can stretch it in some directions and squash it in others. You can rotate it. These are simple things to build on. You can spend a whole career building on that. It becomes practical in surprising ways. For example, it’s the field of study behind finding equations that best match some complicated, messy real data.

The second form is “abstract algebra”, which comes in about the same time. This one is alien and baffling for a long while. It doesn’t help that the books all call it Introduction to Algebra or just Algebra and all your friends think you’re slumming. The mathematics major stumbles through confusing definitions and theorems that ought to sound comforting. (“Fermat’s Little Theorem”? That’s a good thing, right?) But the confusion passes, in time. There’s a beautiful subject here, one of my favorites. I’ve talked about it a lot.

We start with something that looks like the loosest cartoon of arithmetic. We get a bunch of things we can add together, and an ‘addition’ operation. This lets us do a lot of stuff that looks like addition modulo numbers. Then we go on to stuff that looks like picking up floor tiles and rotating them. Add in something that we call ‘multiplication’ and we get rings. This is a bit more like normal arithmetic. Add in some other stuff and we get ‘fields’ and other structures. We can keep falling back on arithmetic and on rotating tiles to build our intuition about what we’re doing. This trains mathematicians to look for particular patterns in new, abstract constructs.

Linear algebra is not an abstract-algebra sort of algebra. Sorry about that.

And there’s another kind of algebra that mathematicians talk about. At least once they get into grad school they do. There’s a huge family of these kinds of algebras. The family trait for them is that they share a particular rule about how you can multiply their elements together. I won’t get into that here. There are many kinds of these algebras. One that I keep trying to study on my own and crash hard against is Lie Algebra. That’s named for the Norwegian mathematician Sophus Lie. Pronounce it “lee”, as in “leaning”. You can understand quantum mechanics much better if you’re comfortable with Lie Algebras and so now you know one of my weaknesses. Another kind is the Clifford Algebra. This lets us create something called a “hypercomplex number”. It isn’t much like a complex number. Sorry. Clifford Algebra does lend to a construct called spinors. These help physicists understand the behavior of bosons and fermions. Every bit of matter seems to be either a boson or a fermion. So you see why this is something people might like to understand.

Boolean Algebra is the algebra of this type that a normal person is likely to have heard of. It’s about what we can build using two values and a few operations. Those values by tradition we call True and False, or 1 and 0. The operations we call things like ‘and’ and ‘or’ and ‘not’. It doesn’t sound like much. It gives us computational logic. Isn’t that amazing stuff?

So if someone says “algebra” she might mean any of these. A normal person in a non-academic context probably means high school algebra. A mathematician speaking without further context probably means abstract algebra. If you hear something about “matrices” it’s more likely that she’s speaking of linear algebra. But abstract algebra can’t be ruled out yet. If you hear a word like “eigenvector” or “eigenvalue” or anything else starting “eigen” (or “characteristic”) she’s more probably speaking of abstract algebra. And if there’s someone’s name before the word “algebra” then she’s probably speaking of the last of these. This is not a perfect guide. But it is the sort of context mathematicians expect other mathematicians notice.