When Is Leap Day Most Likely To Happen?


Anyone hoping for an answer besides the 29th of February either suspects I’m doing some clickbait thing, maybe talking about that time Sweden didn’t quite make the transition from the Julian to the Gregorian calendar, or realizes I’m talking about days of the week. Are 29ths of February more likely to be a Sunday, a Monday, what?

The reason this is a question at all is that the Gregorian calendar has a very slight, but real, bias. Some days of the week are more likely than others for a year to start on. This gives us the phenomenon where the 13th of a month is slightly more likely to be a Friday than any other day of the week. Here “likely” reflects that, if we do not know a specific month and year, then we can’t say which of the seven days the calendar’s rules give us for the date of the 13th.

Or, for that matter, if we don’t know which leap year we’re thinking of. There are 97 of them every 400 years. Since 97 things can’t be uniformly spread across the seven days of the week, how are they spread?

This is what computers are for. You’ve seen me do this for the date of Easter and for the date of (US) Thanksgiving. Using the ‘weekday’ function in Octave (a Matlab clone) I checked. In any 400-year span of the Gregorian calendar — and the calendar recycles every 400 years, so that’s as much as we need — we will see this distribution:

Leap Day will be a    this many times in 400 years
Sunday                13
Monday                15
Tuesday               13
Wednesday             15
Thursday              13
Friday                14
Saturday              14
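This check can also be repeated without Octave. Below is a minimal sketch in Python. It assumes the standard datetime module, which uses the proleptic Gregorian calendar, is a fair stand-in for Octave’s ‘weekday’. The helper weekday_counts and the DAY_NAMES tuple are my own illustrative names, not anything from Octave. The helper counts the weekdays a given date lands on across one 400-year cycle, so it can also check the Friday-the-13th bias.

```python
import datetime
from collections import Counter

# Monday-first, matching datetime.date.weekday() (Monday == 0).
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

def weekday_counts(month, day, start_year=2000):
    """Count the weekdays a calendar date falls on over one 400-year
    Gregorian cycle beginning at start_year.

    Dates that don't exist in a given year (February 29 in common
    years) are skipped; datetime.date raises ValueError for them.
    """
    counts = Counter()
    for year in range(start_year, start_year + 400):
        try:
            d = datetime.date(year, month, day)
        except ValueError:
            continue
        counts[DAY_NAMES[d.weekday()]] += 1
    return counts

leap_days = weekday_counts(2, 29)
# Thirteenths of every month, added up over the same 400 years.
thirteenths = sum((weekday_counts(m, 13) for m in range(1, 13)), Counter())

for name in ("Sunday",) + DAY_NAMES[:-1]:
    print(name, leap_days[name])
print(thirteenths.most_common(3))
```

Starting the window at another year, say 1601, should give the same counts, since the calendar repeats every 400 years.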

Through to 2100, though, the calendar is going to follow a 28-year period. So 2020’s leap day, a Saturday, will be the last Saturday leap day until 2048. The next several ones will be Thursday, Tuesday, Sunday, Friday, Wednesday, and Monday.
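Here is a short sketch, again in Python with datetime as the assumed source of weekdays. It lists the leap days from 2020 through 2048 and shows where the 28-year pattern breaks. The function leap_day_name is a hypothetical helper. Because 2100 is not a leap year, the weekday for 2104 does not match the one for 2076, twenty-eight years earlier.

```python
import datetime

# Monday-first, matching datetime.date.weekday() (Monday == 0).
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

def leap_day_name(year):
    """Day of the week of February 29 in a Gregorian leap year."""
    return DAY_NAMES[datetime.date(year, 2, 29).weekday()]

upcoming = [(year, leap_day_name(year)) for year in range(2020, 2049, 4)]
for year, name in upcoming:
    print(year, name)

# 2100 is skipped as a leap year, so the 28-year pattern breaks there.
print(2076, leap_day_name(2076), 2104, leap_day_name(2104))
```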

Reading the Comics, January 23, 2018: Adult Content Edition


I was all set to say how complaining about GoComics.com’s pages not loading had gotten them fixed. But they worked for Monday alone; today they’re broken again. Right. I haven’t tried sending another error report; we’ll see if that works. Meanwhile, I’m still not through last week’s comic strips, and one day’s strips were nearly enough to justify an installment of their own. I should finish off the rest of the week in the next essay, probably in time for next week.

Mark Leiknes’s Cow and Boy rerun for the 23rd circles around some of Zeno’s Paradoxes. At the heart of some of them is the question of whether a thing can be divided infinitely many times, or whether there must be some smallest amount of a thing. Zeno wonders about space and time, but you can do as well with substance, with matter. Mathematics majors like to say the problem is easy; Zeno just didn’t realize that a sum of infinitely many things could be a finite and nonzero number. This misses the good question: how can the sum of infinitely many things, none of which are zero, be anything but infinitely large? Or, put another way, what’s different between adding \frac11 + \frac12 + \frac13 + \frac14 + \cdots and adding \frac11 + \frac14 + \frac19 + \frac{1}{16} + \cdots that makes the one infinitely large and the other not?
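Here is a quick numerical comparison, as a sketch in Python. The function partial_sum is a hypothetical helper. The partial sums of \frac11 + \frac12 + \frac13 + \cdots keep creeping up, roughly like the natural logarithm of the number of terms. Those of \frac11 + \frac14 + \frac19 + \cdots settle toward \frac{\pi^2}{6} . A computer can’t prove divergence, of course. It only shows the first series growing with no sign of stopping.

```python
import math

def partial_sum(term, n):
    """Add term(k) for k = 1..n, using fsum to limit rounding error."""
    return math.fsum(term(k) for k in range(1, n + 1))

for n in (10, 1_000, 100_000):
    harmonic = partial_sum(lambda k: 1 / k, n)
    squares = partial_sum(lambda k: 1 / k**2, n)
    print(f"n={n:>7}  1/k sum={harmonic:.6f}  ln(n)={math.log(n):.6f}  "
          f"1/k^2 sum={squares:.6f}  pi^2/6={math.pi**2 / 6:.6f}")
```

The gap between the first sum and ln(n) levels off near 0.5772, the Euler-Mascheroni constant. The first sum still grows without bound, because ln(n) does.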

Or how about this. Pick your favorite string of digits. 23. 314. 271828. Whatever. Add together the series \frac11 + \frac12 + \frac13 + \frac14 + \cdots except that you omit any terms that have your favorite string there. So, if you picked 23, don’t add \frac{1}{23} , or \frac{1}{123} , or \frac{1}{802301} or such. That depleted series does converge. The heck is happening there? (Here’s why it’s true for a single digit being thrown out. Showing it’s true for longer strings of digits takes more work but not really different work.)
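For the single-digit case, the usual argument is a counting bound. Here is a sketch that takes the omitted digit to be 9 for concreteness; the same count works for any other nonzero digit. There are 8 \cdot 9^{k-1} whole numbers with exactly k digits and no 9 among them, and each is at least 10^{k-1} . So each group of k-digit terms adds at most \frac{8 \cdot 9^{k-1}}{10^{k-1}} :

```latex
\sum_{\substack{n \ge 1 \\ \text{no digit } 9 \text{ in } n}} \frac{1}{n}
  \;\le\; \sum_{k=1}^{\infty} \frac{8 \cdot 9^{k-1}}{10^{k-1}}
  \;=\; 8 \sum_{k=1}^{\infty} \left(\frac{9}{10}\right)^{k-1}
  \;=\; \frac{8}{1 - \frac{9}{10}}
  \;=\; 80
```

The depleted series is bounded above and all its terms are positive, so it converges.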

J C Duffy’s Lug Nuts for the 23rd is, I think, the first time I have to give a content warning for one of these. It’s a porn-movie advertisement spoof. But it mentions Einstein and Pi and has the tagline “she didn’t go for eggheads … until he showed her a new equation!”. So, you know, it’s using mathematics skill as a signifier of intelligence and riffing on the idea that nerds like sex too.

John Graziano’s Ripley’s Believe It or Not for the 23rd has a bit of trivia that made me initially think “not”. It notes Vince Parker, Senior and Junior, of Alabama were both born on Leap Day, the 29th of February. I’ll accept this without further proof because of the very slight harm that would befall me were I to accept this wrongly. But it also asserted this was a 1-in-2.1-million chance. That sounded wrong. Whether it is depends on what you think the chance is of.

Because what’s the remarkable thing here? That a father and son have the same birthday? Surely the chance of that is 1 in 365. The father could be born any day of the year; the son, also any day. Trusting there’s no influence of the father’s birthday on the son’s, then, 1 in 365 it is. Or, well, 1 in about 365.25, since there are leap days. There’s approximately one leap day every four years, so, surely that, right?

And not quite. In four years there’ll be 1,461 days. Four of them will be the 29th of January and four the 29th of September and four the 29th of August and so on. So if the father was born any day but leap day (a “non-bissextile day”, if you want to use a word that starts a good fight in a Scrabble match), the chance the son’s birth is the same is 4 chances in 1,461. 1 in 365.25. If the father was born on Leap Day, then the chance the son was born the same day is only 1 chance in 1,461. Still way short of 1-in-2.1-million. So, Graziano’s Ripley’s is wrong if that’s the chance we’re looking at.

Ah, but what if we’re looking at a different chance? What if we’re looking for the chance that the father is born the 29th of February and the son is also born the 29th of February? There’s a 1-in-1,461 chance the father’s born on Leap Day. And a 1-in-1,461 chance the son’s born on Leap Day. And if those events are independent, the father’s birth date not influencing the son’s, then the chance of both those together is indeed 1 in 2,134,521. So Graziano’s Ripley’s is right if that’s the chance we’re looking at.
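The arithmetic of the last two paragraphs can be done as a small Python sketch with exact fractions. It keeps the text’s simplifying assumptions: each of the 1,461 days in a four-year stretch is an equally likely birthday, and the father’s and son’s birthdays are independent. The variable names are mine. The sketch also works out the chance of a shared birthday of any kind, by combining the two cases.

```python
from fractions import Fraction

DAYS = 1461                       # days in a four-year stretch with one Leap Day
p_leap = Fraction(1, DAYS)        # chance someone is born on Leap Day
p_other = 1 - p_leap              # chance of any other birthday

# Chance the son matches, given the kind of birthday the father has.
match_if_father_ordinary = Fraction(4, DAYS)   # 1 in 365.25
match_if_father_leap = Fraction(1, DAYS)

both_leap_day = p_leap * p_leap
same_birthday = (p_other * match_if_father_ordinary
                 + p_leap * match_if_father_leap)

print("both born on Leap Day: 1 in", 1 / both_leap_day)
print("same birthday at all:  1 in", float(1 / same_birthday))
```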

Which is a good reminder: if you want to work out the probability of some event, work out precisely what the event is. Ordinary language is ambiguous. This is usually a good thing. But it’s fatal to discussing probability questions sensibly.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 23rd presents his mathematician discovering a new set of numbers. This will happen. Mathematics has had great success, historically, finding new sets of things that look only a bit like numbers as they were understood. And showing that if they follow rules that are, as much as possible, like the old numbers, we get useful stuff out of them. The mathematician claims to be a formalist, in the punch line. This is a philosophy that considers mathematical results to be the things you get by starting with some symbols and some rules for manipulating them. What this stuff means, and whether it reflects anything of interest in the real world, isn’t of interest. We can know the results are good because they follow the rules.

This sort of approach can be fruitful. It can force you to accept results that are true but intuition-defying. And it can give results impressive confidence. You can even, at least in principle, automate the creating and the checking of logical proofs. The disadvantages are that it takes forever to get anything done. And it’s hard to shake the idea that we ought to have some idea what any of this stuff means.
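Here is a tiny illustration of machine-checked proof, as a sketch in Lean 4. Lean is a proof assistant chosen for the example; nothing in the strip names one. The checker accepts each line only because the proof follows the rules for the symbols involved.

```lean
-- Accepted by computation: both sides reduce to the same natural number.
example : 2 + 2 = 4 := rfl

-- Accepted by citing a rule (commutativity of addition on Nat).
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```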

What I Learned Doing The Leap Day 2016 Mathematics A To Z


The biggest thing I learned in the recently concluded mathematics glossary is that continued fractions have enthusiasts. I hadn’t intended to cause controversy when I claimed they weren’t much used anymore. The most I have grounds to say is that the United States educational process as I experienced it doesn’t use them for more than a few special purposes. There is a general lesson there. While my experience may be typical, that doesn’t mean everyone’s is like it. There is a mystery to learn from in that.

The next big thing I learned was the Kullback-Leibler Divergence. I’m glad to know it now. And I would not have known it, I imagine, if it weren’t for my trying something novel and getting a fine result from it. That was throwing open the A To Z glossary to requests from readers. At least half the terms were ones that someone reading my original call had asked for.

And that was thrilling. The biggest point is that it gave me a greater feeling of communicating with specific people than most of the things I’ve written. I understand that I have readers, and occasionally chat with some. This was a rare chance to feel engaged, though.

And getting asked things I hadn’t thought of, or in some cases hadn’t heard of, was great. It foiled the idea of two months’ worth of easy postings, but it made me look up and learn and think about a variety of things. And also to re-think them. My first drafts of the Dedekind Domain and the Kullback-Leibler divergence essays were completely scrapped, and the Jacobian made it through only with a lot of rewriting. I’ve been inclined to write with few equations and even fewer drawings around here. Part of that’s to be less intimidating. Part of that’s because of laziness. Some stuff is wonderfully easy to express in a sketch, but transferring that to a digital form is the heavy work of getting out the scanner and plugging it in. Or drawing from scratch on my iPad. Cleaning it up is even more work. So better to spend a thousand extra words on the setup.

But that seemed to work! I’m especially surprised that the Jacobian and the Lagrangian essays seemed to make sense without pictures or equations. Homomorphisms and isomorphisms were only a bit less surprising. I feel like I’ve been writing better thanks to this.

I do figure on another A To Z for sometime this summer. Perhaps I should open nominations already, and with a better-organized scheme for knocking out letters. Some people were disappointed (I suppose) to find they’d picked letters that had already been assigned. And I could certainly use time and help finding more x- and y-words. Q isn’t an easy one either.

A Leap Day 2016 Mathematics A To Z: The Roundup


And with the conclusion of the alphabet I move now into posting about each of the counting numbers. … No, wait, that’s already being done. But I should gather together the A To Z posts in order that it’s easier to find them later on.

I mean to put together some thoughts about this A To Z. I haven’t had time yet. I can say that it’s been a lot of fun to write, even if after the first two weeks I was never as far ahead of deadline as I hoped to be. I do expect to run another one of these, although I don’t know when that will be. After I’ve had some chance to recuperate, though. It’s fun going two months without missing a day’s posting on my mathematics blog. But it’s also work and who wants that?

A Leap Day 2016 Mathematics A To Z: Z-score


And we come to the last of the Leap Day 2016 Mathematics A To Z series! Z is a richer letter than x or y, but it’s still not so rich as you might expect. This is why I’m using a term that everybody figured I’d use the last time around, when I went with z-transforms instead.

Z-Score

You get an exam back. You get an 83. Did you do well?

Hard to say. It depends on so much. If you expected to barely pass and maybe get as high as a 70, then you’ve done well. If you took the Preliminary SAT, with a composite score that ranges from 60 to 240, an 83 is catastrophic. If the instructor gave an easy test, you maybe scored right in the middle of the pack. If the instructor sees tests as a way to weed out the undeserving, you maybe had the best score in the class. It’s impossible to say whether you did well without context.

The z-score is a way to provide that context. It draws that context by comparing a single score to all the other values. And underlying that comparison is the assumption that whatever it is we’re measuring fits a pattern. Usually it does. The pattern we suppose stuff we measure will fit is the Normal Distribution. Sometimes it’s called the Standard Distribution. Sometimes it’s called the Standard Normal Distribution, so that you know we mean business. Sometimes it’s called the Gaussian Distribution. I wouldn’t rule out someone writing the Gaussian Normal Distribution. It’s also called the bell curve distribution. As the names suggest by throwing around “normal” and “standard” so much, it shows up everywhere.

A normal distribution means that whatever it is we’re measuring follows some rules. One is that there’s a well-defined arithmetic mean of all the possible results. And that arithmetic mean is the most common value to turn up. That’s called the mode. This arithmetic mean, and mode, is also the median value. There are as many data points less than it as there are greater than it. Most of the data values are pretty close to the mean/mode/median value. There are some more values farther from the mean, but fewer and fewer the farther out you go, and the number of data values far away from it is pretty tiny. You can, in principle, get a value that’s way far away from the mean, but it’s unlikely.

We call this standard because it might as well be. Measure anything that varies at all. Draw a chart with the horizontal axis all the values you could measure. The vertical axis is how many times each of those values comes up. It’ll be a standard distribution uncannily often. The standard distribution appears when the thing we measure satisfies some quite common conditions. Almost everything satisfies them, or nearly satisfies them. So we see bell curves so often when we plot how frequently data points come up. It’s easy to forget that not everything is a bell curve.

The normal distribution has a mean, and median, and mode, of 0. It’s tidy that way. And it has a standard deviation of exactly 1. The standard deviation is a way of measuring how spread out the bell curve is. About 95 percent of all observed results are less than two standard deviations away from the mean. About 99.7 percent of all observed results are less than three standard deviations away. Only about two results in a billion are more than six standard deviations away. That last might sound familiar to those who’ve worked in manufacturing. At least it does once you know that the Greek letter sigma is the common shorthand for a standard deviation. “Six Sigma” is a quality-control approach. It’s meant to make sure one understands all the factors that influence a product and controls them. The target is that the product falls outside the design specifications only about 3.4 times per million, or about 0.0003 percent of the time. (That figure conventionally allows for the process average drifting by 1.5 standard deviations, which leaves the nearer specification limit 4.5 standard deviations away.)
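Here is a sketch in Python of where these percentages come from. It assumes only the standard relation that the share of a normal distribution within k standard deviations of the mean is \mathrm{erf}(k/\sqrt{2}) . The helper names are mine. The last line reads the usual Six Sigma figure of 3.4 defects per million as a one-sided tail 4.5 standard deviations out, which is how that figure is conventionally derived.

```python
import math

def within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def outside(k):
    """Share more than k standard deviations from the mean, both sides together."""
    return math.erfc(k / math.sqrt(2))

def upper_tail(k):
    """Share more than k standard deviations above the mean (one side only)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (1, 2, 3, 6):
    print(f"{k} sd: within {within(k):.7%}, outside {outside(k):.3g}")

print(f"one-sided tail beyond 4.5 sd, per million: {upper_tail(4.5) * 1e6:.2f}")
```

Using erfc for the tails avoids subtracting two nearly equal numbers, which would lose precision at six standard deviations.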

This is the normal distribution. It has a standard deviation of 1 and a mean of 0, by definition. And then people using statistics go and muddle the definition. It is always so, with the stuff people actually use. Forgive them. It doesn’t really change the shape of the curve if we scale it, so that the standard deviation is, say, two, or ten, or π, or any positive number. It just changes where the tick marks are on the x-axis of our plot. And it doesn’t really change the shape of the curve if we translate it, adding (or subtracting) some number to it. That makes the mean, oh, 80. Or -15. Or eπ. Or some other number. That just changes what value we write underneath the tick marks on the plot’s x-axis. We can find a scaling and translation of the normal distribution that fits whatever data we’re observing.

When we find the z-score for a particular data point we’re undoing this translation and scaling. That is, we subtract the mean and divide by the standard deviation. We figure out what number on the standard distribution maps onto the original data set’s value. About two-thirds of all data points are going to have z-scores between -1 and 1. About nineteen out of twenty will have z-scores between -2 and 2. About 997 out of 1,000 will have z-scores between -3 and 3. If we don’t see this, and we have a lot of data points, then that suggests our data isn’t normally distributed.
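Here is a sketch of the check this paragraph describes, in Python with the standard statistics and random modules. The helper names, the fixed seed, and the two made-up data sets are illustrative assumptions: one set is normal, the other is deliberately skewed (exponential). For the normal data, about 68 percent of z-scores should land between -1 and 1. The skewed data should miss that mark noticeably.

```python
import random
import statistics

def z_scores(data):
    """Undo translation and scaling: subtract the mean, divide by the SD."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data, mu)
    return [(x - mu) / sigma for x in data]

def share_between(zs, k):
    """Fraction of z-scores strictly between -k and k."""
    return sum(1 for z in zs if -k < z < k) / len(zs)

rng = random.Random(2016)          # fixed seed, so runs repeat exactly
normal_data = [rng.gauss(80, 6) for _ in range(20_000)]
skewed_data = [rng.expovariate(1 / 10) for _ in range(20_000)]

for name, data in (("normal", normal_data), ("skewed", skewed_data)):
    zs = z_scores(data)
    shares = [round(share_between(zs, k), 3) for k in (1, 2, 3)]
    print(name, shares)
```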

I don’t know why the letter ‘z’ is used for this instead of, say, ‘y’ or ‘w’ or something else. ‘x’ is out, I imagine, because we use that for the original data. And ‘y’ is a natural pick for a second measured variable. ‘z’, I expect, is just far enough from ‘x’ that it isn’t needed for some more urgent duty, while being close enough to ‘x’ to suggest it’s some measured thing.

The z-score gives us a way to compare how interesting or unusual scores are. If the exam on which we got an 83 has a mean of, say, 74, and a standard deviation of 5, then we can say this 83 is a pretty solid score. If it has a mean of 78 and a standard deviation of 10, then the score is better-than-average but not exceptional. If the exam has a mean of 70 and a standard deviation of 4, then the score is fantastic. We get to meaningfully compare scores from the measurements of different things. And so it’s one of the tools with which statisticians build their work.
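Here are the three exams from this paragraph, worked as a short Python sketch. z_score subtracts the mean and divides by the standard deviation. share_below turns a z-score into the fraction of scores below it, which is only meaningful if the scores really are normally distributed. Both helper names are mine.

```python
import math

def z_score(x, mean, sd):
    """How many standard deviations x sits above (or below) the mean."""
    return (x - mean) / sd

def share_below(z):
    """Fraction of a normal distribution below z (the standard normal CDF)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# The three hypothetical exams from the text, each with a score of 83.
for mean, sd in ((74, 5), (78, 10), (70, 4)):
    z = z_score(83, mean, sd)
    print(f"mean {mean}, sd {sd}: z = {z:.2f}, "
          f"above roughly {share_below(z):.1%} of scores")
```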

A Leap Day 2016 Mathematics A To Z: Yukawa Potential


Yeah, ‘Y’ is a lousy letter in the Mathematics Glossary. I have a half-dozen mathematics books on the shelf by my computer. Some of it is semi-popular stuff like Richard Courant and Herbert Robbins’s What Is Mathematics? (the Ian Stewart revision). Some of it is fairly technical stuff, by which I mean Hidetoshi Nishimori’s Statistical Physics of Spin Glasses and Information Processing. There are just no ‘Y’ terms in any of them worth anything. But I can rope something into the field. For example …

Yukawa Potential

When you as a physics undergraduate first take mechanics it’s mostly about very simple objects doing things according to one rule. The objects are usually these indivisible chunks. They’re either perfectly solid or they’re points, too tiny to have a surface area or volume that might mess things up. We draw them as circles or as blocks because they’re too hard to see on the paper or board otherwise. We spend a little time describing how they fall in a room. This lends itself to demonstrations in which the instructor drops a rubber ball. Then we go on to a mass on a spring hanging from the ceiling. Then to a mass on a spring hanging from another mass.

Then we go on to two things sliding on a surface and colliding, which would really lend itself to bouncing pool balls against one another. Instead we use smaller solid balls. Sometimes those “Newton’s Cradle” things with the five balls that dangle from wires and just barely touch each other. They give a good reason to start talking about vectors. I mean the ordinary vectors, the ones that say “stuff moving this much in this direction”. Then we get into stars and planets and moons attracting each other by gravity. And then we get into the stuff that really needs calculus. The earlier stuff is helped by it, yes. It’s just that by this point we can’t do without it.

The “things colliding” and “balls dropped in a room” are the odd cases in this. Most of the interesting stuff in an introduction to mechanics course is about things attracting, or repelling, other things. And, particularly, they’re particles that interact by “central forces”. Their attraction or repulsion is along the line that connects the two particles. (Impossible for a force to do otherwise? Just wait until Intro to Mechanics II, when magnetism gets in the game. After that, somewhere in a fluid dynamics course, you’ll see how a vortex interacts with another vortex.) The potential energies for these all vary with distance between the points.

Yeah, they also depend on the mass, or charge, or some kind of strength-constant for the points. They also depend on some universal constant for the strength of the interacting force. But those are, well, constant. If you move the particles closer together or farther apart the potential changes just by how much you moved them, nothing else.

Particles hooked together by a spring have a potential that looks like \frac{1}{2}k r^2 . Here ‘r’ is how far the particles are from each other. ‘k’ is the spring constant; it’s just how strong the spring is. The one-half makes some other stuff neater. It doesn’t do anything much for us here. A particle attracted by another gravitationally has a potential that looks like -G M \frac{1}{r} . Again ‘r’ is how far the particles are from each other. ‘G’ is the gravitational constant of the universe. ‘M’ is the mass of the other particle. (The particle’s own mass doesn’t enter into it.) The electric potential looks like the gravitational potential but we have different symbols for stuff besides the \frac{1}{r} bit.

The spring potential and the gravitational/electric potential have an interesting property. You can have “closed orbits” with a pair of particles. You can set a particle orbiting another and, with time, get back to exactly the original positions and velocities. (With three or more particles you’re not guaranteed anything.) The curious thing is this doesn’t always happen for potentials that look like “something or other times r to a power”. In fact, for every bounded orbit to close, the potential has to be the spring potential or the gravitational/electric potential; that’s Bertrand’s theorem. Some other powers come tantalizingly near. For the potential k r^7 , orbits that are almost circular close up, to a first approximation, but wider orbits don’t. ‘k’ doesn’t mean anything there, and we don’t put a one-seventh or anything out front for convenience, because nobody knows anything that needs anything like that, ever. We can have stable orbits, ones that stay within a minimum and a maximum radius, for a potential k r^n whenever n is larger than -2, at least. And that’s it, for potentials that are nothing but r-to-a-power.
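Here is a rough numerical check of those claims, as a Python sketch. It assumes units in which k n / L^2 = 1 (L being the angular momentum per unit mass), so the circular orbit sits at r = 1. It uses Binet’s equation with u = 1/r , which becomes u'' = u^{-1-n} - u . The function apsidal_angle is a hypothetical helper that measures the angle swept from closest to farthest approach. A closed Kepler ellipse needs exactly \pi , and the spring’s centered ellipse needs \frac{\pi}{2} . For k r^7 , the angle is near \frac{\pi}{3} only when the orbit is nearly circular.

```python
import math

def apsidal_angle(n, eps, h=1e-4):
    """Angle from closest to farthest approach for V(r) = k * r**n.

    Hypothetical units with k*n/L**2 = 1, so Binet's equation reads
        u'' = u**(-1 - n) - u,   where u = 1/r and ' means d/dtheta.
    Start at closest approach, u = 1 + eps with u' = 0, and step with
    classical fourth-order Runge-Kutta until u' climbs back through zero.
    """
    if eps <= 0:
        raise ValueError("eps must be positive (start inside the circular orbit)")

    def accel(u):
        return u ** (-1 - n) - u

    theta, u, v = 0.0, 1.0 + eps, 0.0
    while theta < 4 * math.pi:
        k1u, k1v = v, accel(u)
        k2u, k2v = v + 0.5 * h * k1v, accel(u + 0.5 * h * k1u)
        k3u, k3v = v + 0.5 * h * k2v, accel(u + 0.5 * h * k2u)
        k4u, k4v = v + h * k3v, accel(u + h * k3u)
        u_new = u + h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6
        v_new = v + h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        if v < 0 <= v_new:
            # Farthest approach: interpolate the angle where u' == 0.
            return theta + h * (-v) / (v_new - v)
        theta, u, v = theta + h, u_new, v_new
    raise RuntimeError("no farthest approach found within two turns")

for n, eps in ((-1, 0.3), (2, 0.3), (7, 0.001), (7, 0.3)):
    print(f"n = {n:>2}, eps = {eps}: angle / pi = {apsidal_angle(n, eps) / math.pi:.5f}")
```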

Ah, but does the potential have to be r-to-a-power? And here we see Dr Hideki Yukawa’s potential energy. Like these springs and gravitational/electric potentials, it varies only with the distance between particles. Its strength isn’t just the radius to a power, though. It uses a more complicated expression:

-K \frac{e^{-br}}{r}

Here ‘K’ is a scaling constant for the strength of the whole force. It’s the kind of thing we have ‘G M’ for in the gravitational potential, or ‘k’ in the spring potential. The ‘b’ is a second kind of scaling. And that sets a kind of range. A range of what? It’ll help to look at this potential rewritten a little. It’s the same as -\left(K \frac{1}{r}\right) \cdot \left(e^{-br}\right) . That’s the gravitational/electric potential, times e^{-br} . That factor is close to 1 when r is small, but drops toward zero surprisingly quickly as r gets larger. How quickly will depend on b. The larger a number b is, the faster this drops to zero. The smaller a number b is, the slower this drops to zero. And if b is equal to zero, then e^{-br} is equal to 1, and we have the gravitational/electric potential all over again.
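Here is a small Python sketch of the rewriting above. Dividing the Yukawa potential by the plain -K/r potential leaves exactly the factor e^{-br} . That factor is 1 at b = 0 and falls off faster for larger b. The function names are mine.

```python
import math

def yukawa(r, K, b):
    """Yukawa potential energy -K * exp(-b*r) / r."""
    return -K * math.exp(-b * r) / r

def inverse_r(r, K):
    """The gravitational/electric form -K / r, for comparison."""
    return -K / r

# The ratio of the two is the screening factor exp(-b*r).
for b in (0.0, 0.5, 2.0):
    ratios = [yukawa(r, 1.0, b) / inverse_r(r, 1.0) for r in (0.1, 1.0, 5.0)]
    print(f"b = {b}: ratios at r = 0.1, 1, 5 -> {[round(x, 4) for x in ratios]}")
```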

Yukawa introduced this potential to physics in the 1930s. He was trying to model the forces which keep an atom’s nucleus together. It represents the potential we expect from particles that attract one another by exchanging some particles with a rest mass. This rest mass is hidden within that number ‘b’ there. If the rest mass is zero, the particles are exchanging something like light, and that’s just what we expect for the electric potential. For the gravitational potential … um. It’s complicated. It’s one of the reasons why we expect that gravitons, if they exist, have zero rest mass. But we don’t know that gravitons exist. We have a lot of trouble making theoretical gravitons and quantum mechanics work together. I’d rather be skeptical of the things until we need them.

Still, the Yukawa potential is an interesting mathematical creature even if we ignore its important role in modern physics. When I took my Introduction to Mechanics final one of the exam problems was deriving the equivalent of Kepler’s Laws of Motion for the Yukawa Potential. I thought then it was a brilliant problem. I still do. It struck me while writing this that I don’t remember whether it allows for closed orbits, except when b is zero. I’m a bit afraid to try to work out whether it does, lest I learn that I can’t follow the reasoning for that anymore. That would be a terrible thing to learn.
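Here is a sketch, in Python, of one way to poke at the closed-orbit question numerically. It assumes hypothetical units with K / L^2 = 1 (L the angular momentum per unit mass). It integrates Binet’s equation for the Yukawa force, u'' = e^{-b/u}\left(1 + \frac{b}{u}\right) - u with u = 1/r , from closest approach to farthest. At b = 0 the angle should come out as \pi , the Kepler ellipse. For b greater than zero it comes out different from \pi . So, in general, the orbit’s long axis swings around (precesses) instead of retracing the same path.

```python
import math

def yukawa_apsidal_angle(b, u0=1.2, h=1e-4):
    """Angle from closest to farthest approach for the attractive Yukawa
    potential -K * exp(-b*r) / r, in hypothetical units with K/L**2 = 1.

    With u = 1/r, Binet's equation reads
        u'' = exp(-b/u) * (1 + b/u) - u      (' means d/dtheta).
    Start at closest approach, u = u0 with u' = 0, and step with classical
    fourth-order Runge-Kutta until u' climbs back through zero.
    """
    def accel(u):
        return math.exp(-b / u) * (1 + b / u) - u

    if accel(u0) >= 0:
        raise ValueError("u0 must exceed the circular orbit's u (start nearer the center)")
    theta, u, v = 0.0, u0, 0.0
    while theta < 4 * math.pi:
        k1u, k1v = v, accel(u)
        k2u, k2v = v + 0.5 * h * k1v, accel(u + 0.5 * h * k1u)
        k3u, k3v = v + 0.5 * h * k2v, accel(u + 0.5 * h * k2u)
        k4u, k4v = v + h * k3v, accel(u + h * k3u)
        u_new = u + h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6
        v_new = v + h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        if v < 0 <= v_new:
            # Farthest approach: interpolate the angle where u' == 0.
            return theta + h * (-v) / (v_new - v)
        theta, u, v = theta + h, u_new, v_new
    raise RuntimeError("no farthest approach within two turns; orbit may be unbound")

for b in (0.0, 0.1, 0.2):
    print(f"b = {b}: angle / pi = {yukawa_apsidal_angle(b) / math.pi:.5f}")
```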

A Leap Day 2016 Mathematics A To Z: X-Intercept


Oh, x- and y-, why are you so poor in mathematics terms? I brave my way.

X-Intercept.

I did not get much out of my eighth-grade, pre-algebra, class. I didn’t connect with the teacher at all. There were a few little bits to get through my disinterest. One came in graphing. Not graph theory, of course, but the graphing we do in middle school and high school. That’s where we find points on the plane with coordinates that make some expression true. Two major terms kept coming up in drawing curves or lines. They’re the x-intercept and the y-intercept. They had this lovely, faintly technical, faintly science-y sound. I think the teacher emphasized a few times they were “intercepts”, not “intersects”. But it’s hard to explain to an eighth-grader why this is an important difference to make. I’m not sure I could explain it to myself.

An x-intercept is a point where the plot of a curve and the x-axis meet. So we’re assuming this is a Cartesian coordinate system, the kind marked off with a pair of lines meeting at right angles. It’s usually two-dimensional, sometimes three-dimensional. I don’t know anyone who’s worried about the x-intercept for a four-dimensional space. Even higher dimensions are right out. The thing that confused me the most, when learning this, is a small one. The x-axis is the set of points that have a y-coordinate of zero. Not an x-coordinate of zero. So in a two-dimensional space it makes sense to describe the x-intercept as a single value. That’ll be the x-coordinate, and the point with the x-coordinate of that and the y-coordinate of zero is the intercept.

If you have an expression and you want to find an x-intercept, you need to find values of x which make the expression equal to zero. We get the idea from studying lines. There are a couple of typical representations of lines. They almost always use x for the horizontal coordinate, and y for the vertical coordinate. The names are only different if the author is making a point about the arbitrariness of variable names. Sigh at such an author and move on. An x-intercept has a y-coordinate of zero, so, set any appearance of ‘y’ in the expression equal to zero and find out what value or values of x make this true. If the expression is an equation for a line there’ll be just the one point, unless the line is horizontal. (If the line is horizontal, then either every point on the x-axis is an intercept, or else none of them are. The line is either “y equals zero”, or it is “y equals something other than zero”. )
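Here is a minimal sketch in Python for lines written as a x + b y = c : set y to zero and solve. The function x_intercept is a hypothetical helper. It expects integer (or Fraction) coefficients so the answer comes out exact. It reports the two horizontal-line cases with plain strings.

```python
from fractions import Fraction

def x_intercept(a, b, c):
    """x-intercept of the line a*x + b*y = c, found by setting y = 0.

    Returns the x-coordinate as a Fraction, or "every point" / "none"
    when the line is horizontal (a == 0).
    """
    if a == 0 and b == 0:
        raise ValueError("a and b can't both be zero; that isn't a line")
    if a == 0:
        # Horizontal line y = c/b: it is the x-axis itself only when c == 0.
        return "every point" if c == 0 else "none"
    return Fraction(c, a)

print(x_intercept(2, 3, 6))    # 2x + 3y = 6 meets the x-axis at x = 3
print(x_intercept(0, 1, 5))    # y = 5 never meets it
```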

There’s also a y-intercept. It is exactly what you’d imagine once you know that. It’s usually easier to find what the y-intercept is. The equation describing a curve is typically written in the form “y = f(x)”. That is, y is by itself on one side, and some complicated expression involving x’s is on the other. Working out what y is for a given x is straightforward. Working out what x is for a given y is … not hard, for a line. For more complicated shapes it can be difficult. There might not be a unique answer. That’s all right. There may be several x-intercepts.

There are a couple names for the x-intercepts. The one that turns up most often away from the pre-algebra and high school algebra study of lines is a “zero”. It’s one of those bits in which mathematicians seem to be trying to make it hard for students. A “zero” of the function f(x) is generally not what you get when you evaluate it for x equalling zero. Sorry about that. It’s the values of x for which f(x) equals zero. We also call them “roots”.
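Here is a sketch, in Python, of how a computer might hunt for zeros when solving by hand is hard. find_zeros is a hypothetical helper. It scans a grid for places where f changes sign, then narrows each one down by bisection. It is neither the only method nor the best one. It will miss a zero where the curve touches the x-axis without crossing it.

```python
import math

def find_zeros(f, lo, hi, steps=1000, tol=1e-12):
    """Approximate the zeros (roots, x-intercepts) of f on [lo, hi].

    Scan a grid for sign changes, then bisect each bracketing interval.
    Can miss zeros where f touches zero without changing sign, or pairs
    of zeros closer together than the grid spacing.
    """
    zeros = []
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    for a, b in zip(xs, xs[1:]):
        fa, fb = f(a), f(b)
        if fa == 0:
            zeros.append(a)
        elif fa * fb < 0:
            while b - a > tol:
                m = (a + b) / 2
                fm = f(m)
                if fa * fm <= 0:
                    b = m            # sign change is in [a, m]
                else:
                    a, fa = m, fm    # sign change is in [m, b]
            zeros.append((a + b) / 2)
    if f(hi) == 0:
        zeros.append(hi)
    return zeros

f = lambda x: x**2 - 2
print(find_zeros(f, -3, 3))   # the zeros of f: near -1.41421 and 1.41421
print(f(0))                   # -2: f evaluated at zero is a different thing
```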

OK, but who cares?

Well, if you want to understand the shape of a curve, the way a function looks, it helps to plot it. Today, yeah, pull up Mathematica or Matlab or Octave or some other program and you get your plot. Fair enough. If you don’t have a computer that can plot like that, as I didn’t in middle school, you have to do it by hand. And then the intercepts are clues to how to sketch the function. They are, relatively, easy points which you can find, and which you know must be on the curve. We may form a very rough sketch of the curve. But that rough picture may be better than having nothing.

And we can learn about the behavior of functions even without plotting, or sketching a plot. Intercepts of expressions, or of parts of expressions, are points where the value might change from positive to negative. If the denominator of a part of the expression has an x-intercept, this could be a point where the function’s value is undefined. It may be a discontinuity in the function. The function’s values might jump wildly between one side and another. These are often the important things about understanding functions. Where are they positive? Where are they negative? Where are they continuous? Where are they not?
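Here is a sketch of the sign-tracking idea in Python, for a function built as a fraction. sign_chart is a hypothetical helper. It assumes you hand it every x-intercept of the numerator and of the denominator, and that the function is continuous everywhere else. Under those assumptions, one test value per interval tells the sign on that whole interval. The example has a zero at 1 and is undefined at 2.

```python
def sign_chart(f, critical_points):
    """Sign of f on each interval between its zeros and undefined points.

    critical_points must list every zero of the numerator and of the
    denominator; between neighbors f keeps one sign, so one test value
    per interval is enough.
    """
    pts = sorted(critical_points)
    tests = ([pts[0] - 1]
             + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
             + [pts[-1] + 1])
    return ["+" if f(x) > 0 else "-" if f(x) < 0 else "0" for x in tests]

# f(x) = (x - 1) / (x - 2): zero at x = 1, undefined at x = 2.
f = lambda x: (x - 1) / (x - 2)
print(sign_chart(f, [1, 2]))   # ['+', '-', '+']
```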

These are things we often want to know about functions. And we learn many of them by looking for the intercepts, x- and y-.

A Leap Day 2016 Mathematics A To Z: Wlog


Wait for it.

Wlog.

I’d like to say a good word for boredom. It needs the good words. The emotional state has an appalling reputation. We think it’s the sad state someone’s in when they can’t find anything interesting. It’s not. It’s the state in which we are so desperate for engagement that anything is interesting enough.

And that isn’t a bad thing! Finding something interesting enough is a precursor to noticing something curious. And curiosity is a precursor to discovery. And discovery is a precursor to seeing a fuller richness of the world.

Think of being stuck in a waiting room, deprived of reading materials or a phone to play with or much of anything to do. But there is a clock. Your classic analog-face clock. Its long minute hand sweeps out the full 360 degrees of the circle once every hour, 24 times a day. Its short hour hand sweeps out that same arc every twelve hours, only twice a day. Why is the big unit of time marked with the short hand? Good question, I don’t know. Probably, ultimately, because it changes so much less than the minute hand that it doesn’t need the attention of length drawn to it.

But let our waiting mathematician get a little more bored, and think more about the clock. The hour and minute hand must sometimes point in the same direction. They do at 12:00 by the clock, for example. And they will at … a little bit past 1:00, and a little more past 2:00, and a good while after 9:00, and so on. How many times during the day will they point the same direction?

Well, one easy way to do this is to work out how long it takes the hands, once they’ve met, to meet up again. Presumably we don’t want to wait the whole hour-and-some-more-time for it. But how long is that? Well, we know the hands start out pointing the same direction at 12:00. The first time after that will be after 1:00. At exactly 1:00 the hour hand is 30 degrees clockwise of the minute hand. The minute hand will need five minutes to catch up to that. In those five minutes the hour hand will have moved another 2.5 degrees clockwise. The minute hand needs about four-tenths of a minute to catch up to that. In that time the hour hand moves — OK, we’re starting to see why Zeno was not an idiot. He never was.

But we have this roughly worked out. It’s about one hour, five and a half minutes between one time the hands meet and the next. In the course of twelve hours there’ll be time for them to meet up … oh, of course, eleven times. Over the course of the day they’ll meet up 22 times and we can get into a fight over whether midnight counts as part of today, tomorrow, or both days, or neither. (The answer: pretend the day starts at 12:01.)
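Here is the clock arithmetic with exact fractions, as a Python sketch. It assumes the hands move steadily: the minute hand at 6 degrees per minute and the hour hand at half a degree per minute. The same interval falls out two ways. One is from the closing speed of the hands. The other is from summing the Zeno-style catch-up steps (5 minutes, then 5/12 of a minute, and so on), which form a geometric series with ratio 1/12.

```python
from fractions import Fraction

MINUTE_HAND = Fraction(6)       # degrees per minute: 360 degrees in 60 minutes
HOUR_HAND = Fraction(1, 2)      # degrees per minute: 360 degrees in 720 minutes

# Way 1: after a meeting, the minute hand must gain one full lap.
interval = 360 / (MINUTE_HAND - HOUR_HAND)

# Way 2: from 12:00, one full hour passes; then the minute hand needs
# 30/6 = 5 minutes to reach where the hour hand was, and each further
# catch-up step is 1/12 as long as the one before.
first_step = 30 / MINUTE_HAND
ratio = HOUR_HAND / MINUTE_HAND
zeno_total = 60 + first_step / (1 - ratio)

meetings_per_day = 24 * 60 / interval

print(interval, float(interval))   # 720/11 minutes, about 65.45
print(zeno_total == interval)
print(meetings_per_day)
```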

Hold on, though. How do we know that the time between the hands meeting up at 12:00 and the one at about 1:05 is the same as the time between the hands meeting up near 1:05 and the next one, sometime a little after 2:10? Or between that one and the one at a little past 3:15? What grounds do we have for saying this one interval is a fair representation of them all?

We can argue that it is, fairly enough. Imagine that all the markings were washed off the clock. It’s just two hands sweeping around in circles, one relatively fast, one relatively slow, forever. Give the clockface a spin. When the hands come together again rotate the clock so those two hands are vertical, the “12:00” position. Is this actually 12:00? … Well, we’ve got a one-in-eleven chance it is. It might be a little past 1:05; it might be the meeting a little past 6:30. The movement of the clock hands gives no hint what time it really is.

And that is why we’re justified taking this one interval as representative of them all. The rate at which the hands move, relative to each other, doesn’t depend on what the clock face behind them says. The rate is, if the clock isn’t broken, always the same. So we can use information about one special case that happens to be easy to work out to handle all the cases.

That’s the mathematics term for this essay. We can study the one specific case without loss of generality, or as it’s inevitably abbreviated, wlog. This is the trick of studying something possibly complicated, possibly abstract, by looking for a representative case. That representative case may tell us everything we need to know, at least about this particular problem. Generality means what you might figure from the ordinary English meaning of it: it means this answer holds in general, as opposed to in this specific instance.

Some thought has to go into choosing the representative case. We have to pick something that doesn’t, somehow, miss out on a class of problems we would want to solve. We mustn’t lose the generality. And it’s an easy mistake to make, especially as a mathematics student first venturing into more abstract waters. I remember coming up against that often when trying to prove properties of infinitely long series. It’s so hard to reason something about a bunch of numbers whose identities I have no idea about; why can’t I just use the sequence, oh, 1/1, 1/2, 1/3, 1/4, et cetera and let that be good enough? Maybe 1/1, 1/4, 1/9, 1/16, et cetera for a second test, just in case? It’s because it takes time to learn how to safely handle infinities.

It’s still worth doing. Few of us are good at manipulating things in the abstract. We have to spend more mental energy imagining the thing rather than asking the questions we want of it. Reducing that abstraction — even if it’s just a little bit, changing, say, from “an infinitely-differentiable function” to “a polynomial of high enough degree” — can rescue us. We can try out things we’re confident we understand, and derive from them things we don’t know.

I can’t say that a bored person observing a clock would deduce all this. Parts of it, certainly. Maybe all, if she thought long enough. I believe it’s worth noticing and thinking of these kinds of things. And it’s why I believe it’s fine to be bored sometimes.

A Leap Day 2016 Mathematics A To Z: Vector


And as we approach the last letters of the alphabet, my Leap Day A To Z gets to the last of Gaurish’s requests.

Vector.

A vector’s a thing you can multiply by a number and then add to another vector.

Oh, I know what you’re thinking. Wasn’t a vector one of those things that points somewhere? A direction and a length in that direction? (Maybe dressed up in more formal language. I’m glad to see that apparently New Jersey Tech’s student newspaper is still The Vector and still uses the motto “With Magnitude And Direction”.) Yeah, that’s how we’re always introduced to it. Pointing to stuff is a good introduction to vectors. Nearly everyone finds their way around places. And it’s a good learning model, to learn how to multiply vectors by numbers and to add vectors together.

But thinking too much about directions, either in real-world three-dimensional space, or in the two-dimensional space of the thing we’re writing notes on, can be limiting. We can get too hung up on a particular representation of a vector. Usually that’s an ordered set of numbers. That’s all right as far as it goes, but why limit ourselves? A particular representation can be easy to understand, but as the scary people in the philosophy department have been pointing out for 26 centuries now, a particular example of a thing and the thing are not identical.

And if we look at vectors as “things we can multiply by a number, then add another vector to”, then we see something grand. We see a commonality in many different kinds of things. We can do this multiply-and-add with those things that point somewhere. Call those coordinates. But we can also do this with matrices, grids of numbers or other stuff it’s convenient to have. We can also do this with ordinary old numbers. (Think about it.) We can do this with polynomials. We can do this with sets of linear equations. We can do this with functions, as long as they’re defined for compatible domains. We can even do this with differential equations. We can see a unity in things that seem, at first, to have nothing to do with one another.
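To make the “multiply by a number, then add” idea concrete, here is a small Python sketch. The helper names (`scale_and_add`, `scale_poly`, and so on) are invented for illustration. The point is that one recipe, a·v + w, works for ordinary numbers, coordinate tuples, polynomials stored as coefficient lists (constant term first), and functions, once we say how to scale and add each kind of thing.

```python
def scale_and_add(a, v, w, scale, add):
    """Compute a*v + w, given how to scale and add this kind of thing."""
    return add(scale(a, v), w)

# Coordinates: tuples of numbers, handled component by component.
def scale_tuple(a, v):
    return tuple(a * x for x in v)

def add_tuple(v, w):
    return tuple(x + y for x, y in zip(v, w))

# Polynomials: lists of coefficients, constant term first.
def scale_poly(a, p):
    return [a * c for c in p]

def add_poly(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))          # pad the shorter list with zeros
    q = q + [0] * (n - len(q))
    return [x + y for x, y in zip(p, q)]

# Functions: scale and add pointwise, giving back a new function.
def scale_func(a, f):
    return lambda x: a * f(x)

def add_func(f, g):
    return lambda x: f(x) + g(x)

print(scale_and_add(2, 5, 1, lambda a, v: a * v, lambda v, w: v + w))   # 11
print(scale_and_add(2, (1, 0, 3), (0, 1, 1), scale_tuple, add_tuple))   # (2, 1, 7)
print(scale_and_add(3, [1, 2], [0, 0, 5], scale_poly, add_poly))        # [3, 6, 5]
h = scale_and_add(2, lambda x: x * x, lambda x: 1, scale_func, add_func)
print(h(3))                                                              # 19
```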

We call these collections of things “vector spaces”. It’s a space much like the one you happen to exist in. Adding two things in the space together is much like moving from one place to another, then moving again. You can’t get out of the space. Multiplying a thing in the space by a real number is like going in one direction a short or a long or whatever great distance you want. Again you can’t get out of the space. This is called “being closed”.

(I know, you may be wondering if it isn’t question-begging to say a vector is a thing in a vector space, which is made up of vectors. It isn’t. We define a vector space as a set of things that satisfy a certain group of rules. The things in that set are the vectors.)

Vector spaces are nice things. They work much like ordinary space does. We can bring many of the ideas we know from spatial awareness to vector spaces. For example, we can usually define a “length” of things. And something that works like the “angle” between things. We can define bases, breaking down a particular element into a combination of standard reference elements. This helps us solve problems, by finding ways they’re shadows of things we already know how to solve. And it doesn’t take much to satisfy the rules of being a vector space. I think mathematicians studying new groups of objects look instinctively for how we might organize them into a vector space.
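For coordinate vectors, the familiar dot product gives us a length, an angle, and a way to break a vector down along a basis. A minimal Python sketch, assuming the ordinary dot product and an orthonormal basis; other vector spaces need their own definitions of these, and the function names here are illustrative.

```python
import math

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def length(v):
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle between two nonzero vectors, in radians."""
    c = dot(v, w) / (length(v) * length(w))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding drift

def coordinates_in_basis(v, basis):
    """Coefficients of v in an orthonormal basis: project onto each element."""
    return [dot(v, e) for e in basis]

v = (3.0, 4.0)
print(length(v))                                      # 5.0
print(math.degrees(angle((1.0, 0.0), (1.0, 1.0))))   # about 45
s = 1 / math.sqrt(2)
rotated_basis = [(s, s), (-s, s)]                     # the usual axes turned 45 degrees
print(coordinates_in_basis(v, rotated_basis))
```

Multiplying each basis element by its coefficient and adding the results rebuilds (3, 4), up to rounding.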

We can organize them further. A vector space that has a “norm”, which is pretty much a size, and that satisfies a rule about sequences of terms (any sequence whose terms keep bunching closer together has something in the space to converge to) becomes a Banach space. It works a little more like ordinary three-dimensional space. A Banach space whose norm comes from an inner product, a generalized version of the dot product, is a Hilbert space. These work even more like ordinary space, but they don’t need anything in common with it. For example, the functions that describe quantum mechanics are in a Hilbert space. There’s a thing called a Sobolev Space, a kind of vector space that also meets criteria I forget, but the name has stuck with me for decades because it is so wonderfully assonant.

I mentioned how vectors are stuff you can multiply by numbers, and add to other vectors. That’s true, but it’s a little limiting. The thing we multiply a vector by is called a scalar. And the scalar is a number (real or complex-valued) so often it’s easy to think that’s the default. But it doesn’t have to be. The scalar just has to be an element of some field. A ‘field’ is a set where you can do addition, subtraction, multiplication, and division by anything but zero, with the familiar rules of arithmetic holding. So numbers are the obvious choice. They’re not the only ones, though. The scalar has to be able to multiply with the vector, since otherwise the entire concept collapses into gibberish. But we wouldn’t go looking among the gibberish except to be funny anyway.
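One example of scalars that aren’t real or complex numbers: the integers modulo a prime form a field. Here’s a Python sketch using the prime 5, with vectors as tuples of remainders mod 5. The helper names are made up for illustration; `pow(a, -1, p)` is Python’s built-in modular inverse and needs Python 3.8 or later.

```python
P = 5  # a prime, so the integers mod P form a field

def scale_mod(a, v, p=P):
    """Multiply a vector of residues mod p by a scalar residue mod p."""
    return tuple((a * x) % p for x in v)

def add_mod(v, w, p=P):
    return tuple((x + y) % p for x, y in zip(v, w))

def inverse_mod(a, p=P):
    """Multiplicative inverse of a nonzero residue: the division a field needs."""
    return pow(a, -1, p)

v = (1, 2, 4)
w = (3, 3, 0)
print(add_mod(scale_mod(3, v), w))                   # (1, 4, 2)
print(scale_mod(inverse_mod(3), scale_mod(3, v)))    # back to (1, 2, 4)
```

Dividing by 3 here means multiplying by 2, since 3 times 2 is 6, which is 1 mod 5.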

The idea of the ‘vector’ is straightforward and powerful. So we see it all over a wide swath of mathematics. It’s one of the things that shapes how we expect mathematics to look.

A Leap Day 2016 Mathematics A To Z: Uncountable


I’m drawing closer to the end of the alphabet. While I have got choices for ‘V’ and ‘W’ set, I’ll admit that I’m still looking for something that inspires me in the last couple letters. Such inspiration might come from anywhere. HowardAt58, of that WordPress blog, gave me the notion for today’s entry.

Uncountable.

What are we doing when we count things?

Maybe nothing. We might be counting just to be doing something. Or we might be counting because we want to do nothing. Counting can be a good way into a restful state. Fair enough. Just because we do something doesn’t mean we care about the result.

Suppose we do care about the result of our counting. Then what is it we do when we count? The mechanism is straightforward enough. We pick out things and say, or imagine saying, “one, two, three, four,” and so on. Or we at least imagine the numbers along with the things being numbered. When we run out of things to count, we take whatever the last number was. That’s how many of the things there were. Why are there eight light bulbs in the chandelier fixture above the dining room table? Because there are not nine.

That’s how lay people count anyway. Mathematicians would naturally have a more sophisticated view of the business. A much more powerful counting scheme. Concepts in counting that go far beyond what you might work out in first grade.

Yeah, so that’s what most of us would figure. Things don’t get much more sophisticated than that, though. This probably is because the idea of counting is tied to the theory of sets. And the theory of sets grew, in part, to come up with a logically solid base for arithmetic. So many of the key ideas of set theory are so straightforward they hardly seem to need explaining.

We build the idea of “countable” off of the nice, familiar numbers 1, 2, 3, and so on. That set’s called the counting numbers. They’re the numbers that everybody seems to recognize as numbers. Not just people. Even animals seem to understand at least the first couple of counting numbers. Sometimes these are called the natural numbers.

Take a set of things we want to study. We’re interested in whether we can match the things in that set one-to-one with the things in the counting numbers. We don’t have to use all the counting numbers. But we can’t use the same counting number twice. If we’ve matched one chandelier light bulb with the number ‘4’, we mustn’t match a different bulb with the same number. Similarly, if we’ve got one bulb matched to the number ‘4’, we mustn’t match that same bulb with ‘5’ at the same time.

If we can do this, then our set’s countable. If we really wanted, we could pick the counting numbers in order, starting from 1, and match up all the things with counting numbers. If we run out of things, then we have a finitely large set. The last number we used to match anything up with anything is the size, or in the jargon, the cardinality of our set. We might not care about the cardinality, just whether the set is finite. Then we can pick counting numbers as we like in no particular order. Just use whatever’s convenient.

But what if we don’t run out of things? And it’s possible we won’t. Suppose our set is the negative whole numbers: -1, -2, -3, -4, -5, and so on. We can match each of those to a counting number many ways. We always can. But there’s an easy way. Match -1 to 1, match -2 to 2, match -3 to 3, and so on. Why work harder than that? We aren’t going to run out of negative whole numbers. And we aren’t going to find any we can’t match with some counting number. And we aren’t going to have to match two different negative numbers to the same counting number. So what we have here is an infinitely large, yet still countable, set.

So a set of things can be countable and finite. It can be countable and infinite. What else is there to be?

There must be something. It’d be peculiar to have a classification that everything was in, after all. At least it would be peculiar except for people studying what it means to exist or to not exist. And most of those people are in the philosophy department, where we’re scared of visiting. So we must mean there’s some such thing as an uncountable set.

The idea means just what you’d guess if you didn’t know enough mathematics to be tricky. Something is uncountable if it can’t be counted. It can’t be counted if there’s no way to match it up, one thing-to-one thing, with the counting numbers. We have to somehow run out of counting numbers.

It’s not obvious that we can do that. Some promising approaches don’t work. For example, the set of all the integers — 1, 2, 3, 4, 5, and all that, and 0, and the negative numbers -1, -2, -3, -4, -5, and so on — is still countable. Match the counting number 1 to 0. Match the counting number 2 to 1. Match the counting number 3 to -1. Match 4 to 2. Match 5 to -2. Match 6 to 3. Match 7 to -3. And so on.
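That zig-zag matching can be written as a rule, along with the rule that undoes it, which shows no integer gets missed or used twice. A Python sketch; the function names are illustrative.

```python
def counting_to_integer(n):
    """Match counting number n = 1, 2, 3, ... to 0, 1, -1, 2, -2, 3, -3, ..."""
    if n < 1:
        raise ValueError("counting numbers start at 1")
    return n // 2 if n % 2 == 0 else -(n // 2)

def integer_to_counting(z):
    """The matching run backwards: positive z to even numbers, the rest to odd."""
    return 2 * z if z > 0 else 1 - 2 * z

print([counting_to_integer(n) for n in range(1, 8)])   # [0, 1, -1, 2, -2, 3, -3]
```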

Even ordered pairs of the counting numbers don’t do it. We can match the counting number 1 to the pair (1, 1). Match the counting number 2 to the pair (2, 1). Match the counting number 3 to (1, 2). Match 4 to (3, 1). Match 5 to (2, 2). Match 6 to (1, 3). Match 7 to (4, 1). Match 8 to (3, 2). And so on. We can achieve similar staggering results with ordered triplets, quadruplets, and more. Ordered pairs of integers, positive and negative? Longer to do, yes, but just as doable.
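The pattern in that list is to walk the diagonals where the two entries add up to 2, then 3, then 4, and so on. A Python sketch of the listing, plus a formula for which counting number a given pair gets. Both names are made up for illustration.

```python
from itertools import count, islice

def pairs_in_order():
    """Yield pairs of counting numbers along diagonals: (1,1), (2,1), (1,2), (3,1), ..."""
    for total in count(2):                   # a + b = 2, 3, 4, ...
        for a in range(total - 1, 0, -1):
            yield (a, total - a)

def pair_to_counting(a, b):
    """The counting number matched to (a, b) in that order."""
    total = a + b
    earlier = (total - 2) * (total - 1) // 2   # pairs on all the earlier diagonals
    return earlier + b

print(list(islice(pairs_in_order(), 8)))
# [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), (4, 1), (3, 2)]
```

Every pair lands on exactly one diagonal, and each diagonal is finite, so the walk reaches every pair eventually.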

So are there any uncountable things?

Sure. Wouldn’t be here if there weren’t. For example: think about the set that’s all the ways to pick things from a set. I sense your confusion. Let me give you an example. Suppose we have the set of three things. They’re the numbers 1, 2, and 3. We can make a bunch of sets out of things from this set. We can make the set that just has ‘1’ in it. We can make the set that just has ‘2’ in it. Or the set that just has ‘3’ in it. We can also make the set that has just ‘1’ and ‘2’ in it. Or the set that just has ‘2’ and ‘3’ in it. Or the set that just has ‘3’ and ‘1’ in it. Or the set that has all of ‘1’, ‘2’, and ‘3’ in it. And we can make the set that hasn’t got any of these in it. (Yes, that does too count as a set.)

So from a set of three things, we were able to make a collection of eight sets. If we had a set of four things, we’d be able to make a collection of sixteen sets. With five things to start from, we’d be able to make a collection of thirty-two sets. This collection of sets we call the “power set” of our original set, and if there’s one thing we can say about it, it’s that it’s bigger than the set we start from.
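Each thing is either in a chosen set or not, so n things give 2 × 2 × … × 2 = 2^n sets. A short Python sketch that builds power sets of small sets and counts them, using the standard library’s `itertools.combinations`; the name `power_set` is just illustrative.

```python
from itertools import combinations

def power_set(items):
    """All subsets of items, from the empty set up to the whole set."""
    items = list(items)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

print(power_set([1, 2, 3]))
for n in (3, 4, 5):
    print(n, len(power_set(range(n))))   # 8, 16, 32: always 2 ** n
```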

The power set for a finite set, well, that’ll be much bigger. But it’ll still be finite. Still be countable. What about the power set for an infinitely large set?

And the power set of the counting numbers, the collection of all the ways you can make a set of counting numbers, is really big. Is it uncountably big?

Let’s step back. Remember when I said mathematicians don’t get “much more” sophisticated than matching up things to the counting numbers? Here’s a little bit of that sophistication. We don’t have to match stuff up to the counting numbers at all. We can match the things in one set to the things in another set. If it’s possible to match them up one-to-one, with nothing missing in either set, then the two sets have to be the same size. The same cardinality, in the jargon.

So. The set of the numbers 1, 2, 3, has to have a smaller cardinality than its power set. Want to prove it? Do this exactly the way you imagine. You run out of things in the original set before you run out of things in the power set, so there’s no making a one-to-one matchup between the two.

With the infinitely large yet countable set of the counting numbers … well, the same result holds. It’s harder to prove. You have to show that there’s no possible way to match the infinitely many things in the counting numbers to the infinitely many things in the power set of the counting numbers. (The easiest way to do this is by contradiction. Imagine that you have made such a matchup, pairing everything in your power set to everything in the counting numbers. Then you go through your matchup and put together a collection that isn’t accounted for: the set of every counting number that isn’t in the set it was matched with. That collection can’t be the set matched to any counting number n, since the two disagree about whether n belongs. Whoops! So you must not have matched everything up in the first place. Why not? Because you can’t.)
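The construction in that parenthetical can be tried on a small, finite stand-in. This Python sketch uses a made-up matchup of the numbers 1 through 5 to sets of numbers; the name `left_out_set` is hypothetical. Whatever matchup you put in, the set it builds differs from every matched set.

```python
def left_out_set(matchup):
    """Given a dict n -> set, build {n : n is not in matchup[n]}.

    This set can't equal matchup[n] for any n: it disagrees with
    matchup[n] about whether n itself is a member.
    """
    return {n for n, s in matchup.items() if n not in s}

# A hypothetical attempt to match each number 1..5 with a set of numbers.
matchup = {
    1: {1, 2},
    2: {3},
    3: {1, 3, 5},
    4: set(),
    5: {2, 4},
}

d = left_out_set(matchup)
print(d)                                      # {2, 4, 5}
print(any(d == s for s in matchup.values()))  # False
```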

But the result holds. The power set of the counting numbers is some other set. It’s infinitely large, yes. And it’s so infinitely large that it’s somehow bigger than the counting numbers. It is uncountable.

There’s more than one uncountably large set. Of course there are. We even know of some of them. For example, there’s the set of real numbers. Three-quarters of my readers have been sitting anxiously for the past eight paragraphs wondering if I’d ever get to them. There’s good reason for that. Everybody feels like they know what the real numbers are. And the proof that the real numbers are a larger set than the counting numbers is easy to understand. An eight-year-old could master it. You can find that proof well-explained within the first ten posts of pretty much every mathematics blog other than this one. (I was saving the subject. Then I finally decided I couldn’t explain it any better than everyone else has done.)

The real numbers are the same size, the same cardinality, as the power set of the counting numbers. That much can be proved. So here’s the harder question: is there any set that’s bigger than the counting numbers but smaller than the real numbers?

Sure, there is.

No, there isn’t.

Whichever you like. This is one of the many surprising mathematical results of the surprising 20th century. Starting from the common set of axioms about set theory, it’s undecidable whether there’s a set whose size falls strictly between the counting numbers and the real numbers. You can assume that there isn’t. This is known as the Continuum Hypothesis. And you can do fine mathematical work with it. You can assume that there is. This is known as the … uh … Rejecting the Continuum Hypothesis. And you can do fine mathematical work with that. What’s right depends on what work you want to do. Either is consistent with the starting axioms, provided those axioms are consistent to begin with. You are free to choose either, or if you like, neither.

My understanding is that most set theory finds it more productive to suppose the Continuum Hypothesis is false, that there are sizes in between. I don’t know why this is. I know enough set theory to lead you to this point, but not past it.

But that the question can exist tells you something fascinating. You can take the power set of the power set of the counting numbers. And this gives you another, even vaster, uncountably large set. As enormous as the collection of all the ways to pick things out of the counting numbers is, this power set of the power set is even vaster.

We’re not done. There’s the power set of the power set of the power set of the counting numbers. And the power set of that. Much as geology teaches us to see Deep Time, and astronomy Deep Space, so power sets teach us to see Deep … something. Deep Infinity, perhaps.

A Leap Day 2016 Mathematics A To Z: Transcendental Number


I’m down to the last seven letters in the Leap Day 2016 A To Z. It’s also the next-to-the-last of Gaurish’s requests. This was a fun one.

Transcendental Number.

Take a huge bag and stuff all the real numbers into it. Give the bag a good solid shaking. Stir up all the numbers until they’re thoroughly mixed. Reach in and grab just the one. There you go: you’ve got a transcendental number. Enjoy!

OK, I detect some grumbling out there. The first is that you tried doing this in your head because you somehow don’t have a bag large enough to hold all the real numbers. And you imagined pulling out some number like “2” or “37” or maybe “one-half”. And you may not be exactly sure what a transcendental number is. But you’re confident the strangest number you extracted, “minus 8”, isn’t it. And you’re right. None of those are transcendental numbers.

I regret saying this, but that’s your own fault. You’re lousy at picking random numbers from your head. So am I. We all are. Don’t believe me? Think of a positive whole number. I predict you probably picked something between 1 and 10. Almost surely something between 1 and 100. Surely something less than 10,000. You didn’t even consider picking something between 10,012,002,214,473,325,937,775 and 10,012,002,214,473,325,937,785. Challenged to pick a number, people will select nice and familiar ones. The nice familiar numbers happen not to be transcendental.

I detect some secondary grumbling there. Somebody picked π. And someone else picked e. Very good. Those are transcendental numbers. They’re also nice familiar numbers, at least to people who like mathematics a lot. So they attract attention.

Still haven’t said what they are. What they are traces back, of course, to polynomials. Take a polynomial that’s got one variable, which we call ‘x’ because we don’t want to be difficult. Suppose that all the coefficients of the polynomial, the constant numbers we presumably know or could find out, are integers. What are the roots of the polynomial? That is, for what values of x is the polynomial a complicated way of writing ‘zero’?

For example, try the polynomial x^2 – 6x + 5. If x = 1, then that polynomial is equal to zero. If x = 5, the polynomial’s equal to zero. Or how about the polynomial x^2 + 4x + 4? That’s equal to zero if x is equal to -2. So a polynomial with integer coefficients can certainly have positive and negative integers as roots.

How about the polynomial 2x – 3? Yes, that is so a polynomial. This is almost easy. That’s equal to zero if x = 3/2. How about the polynomial (2x – 3)(4x + 5)(6x – 7)? It’s my polynomial and I want to write it so it’s easy to find the roots. That polynomial will be zero if x = 3/2, or if x = -5/4, or if x = 7/6. So a polynomial with integer coefficients can have positive and negative rational numbers as roots.
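These roots can be checked with exact arithmetic, so no rounding hides a near-miss. A Python sketch using the standard `fractions` module; polynomials are lists of integer coefficients, highest degree first, and the helper names `evaluate` and `expand` are illustrative.

```python
from fractions import Fraction

def evaluate(coefficients, x):
    """Evaluate a polynomial at x by Horner's rule; highest-degree coefficient first."""
    result = 0
    for c in coefficients:
        result = result * x + c
    return result

def expand(*factors):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    product = [1]
    for f in factors:
        out = [0] * (len(product) + len(f) - 1)
        for i, a in enumerate(product):
            for j, b in enumerate(f):
                out[i + j] += a * b
        product = out
    return product

print(evaluate([1, -6, 5], 1), evaluate([1, -6, 5], 5))   # 0 0
print(evaluate([1, 4, 4], -2))                           # 0
cubic = expand([2, -3], [4, 5], [6, -7])                 # (2x - 3)(4x + 5)(6x - 7)
print(cubic)                                             # [48, -68, -76, 105]
print([evaluate(cubic, r) for r in (Fraction(3, 2), Fraction(-5, 4), Fraction(7, 6))])
```

Multiplied out, the cubic still has only integer coefficients, and each of the three rational roots makes it exactly zero.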

How about the polynomial x^2 – 2? That’s equal to zero if x is the square root of 2, about 1.414. It’s also equal to zero if x is minus the square root of 2, about -1.414. And the square root of 2 is irrational. So we can certainly have irrational numbers as roots.

So if we can have whole numbers, and rational numbers, and irrational numbers as roots, how can there be anything else? Yes, complex numbers, I see you raising your hand there. We’re not talking about complex numbers just now. Only real numbers.

It isn’t hard to work out why we can get any whole number, positive or negative, from a polynomial with integer coefficients. Or why we can get any rational number. The irrationals, though … it turns out we can only get some of them this way. We can get square roots and cube roots and fourth roots and all that. We can get combinations of those. But we can’t get everything. There are irrational numbers that are there but that even polynomials can’t reach.

It’s all right to be surprised. It’s a surprising result. Maybe even unsettling. Transcendental numbers have something peculiar about them. The 19th Century French mathematician Joseph Liouville first proved the things must exist, in 1844. (He used continued fractions to show there must be such things.) It would be seven years later that he gave an example of one in nice, easy-to-understand decimals. This is the number 0.110 001 000 000 000 000 000 001 000 000 (et cetera). This number is zero almost everywhere. But there’s a 1 in the n-th digit past the decimal if n is the factorial of some number. That is, 1! is 1, so the 1st digit past the decimal is a 1. 2! is 2, so the 2nd digit past the decimal is a 1. 3! is 6, so the 6th digit past the decimal is a 1. 4! is 24, so the 24th digit past the decimal is a 1. The next 1 will appear in spot number 5!, which is 120. After that, 6! is 720 so we wait for the 720th digit to be 1 again.
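The rule for Liouville’s number is easy to turn into digits. A Python sketch that writes out the first digits after the decimal point; `liouville_digits` is a made-up name, and `math.factorial` is from the standard library.

```python
from math import factorial

def liouville_digits(count):
    """First `count` decimal digits of Liouville's number: digit n is 1 exactly when n = k! for some k."""
    ones = set()
    k = 1
    while factorial(k) <= count:
        ones.add(factorial(k))
        k += 1
    return "".join("1" if n in ones else "0" for n in range(1, count + 1))

print("0." + liouville_digits(30))
print([n for n, d in enumerate(liouville_digits(800), start=1) if d == "1"])   # [1, 2, 6, 24, 120, 720]
```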

And what is this Liouville number 0.110 001 000 000 000 000 000 001 000 000 (et cetera) used for, besides showing that a transcendental number exists? Not a thing. It’s of no other interest. And this plagued the transcendental numbers until 1873. The only examples anyone had of transcendental numbers were ones built to show that they existed. In 1873 Charles Hermite showed finally that e, the base of the natural logarithm, was transcendental. e is a much more interesting number; we have reasons to care about it. Every exponential growth or decay or oscillating process has e lurking in it somewhere. In 1882 Ferdinand von Lindemann showed that π was transcendental, and that’s an even more interesting number.

That bit about π has interesting implications. One goes back to the ancient Greeks. Is it possible, using straightedge and compass, to create a square that’s exactly the same size as a given circle? This is equivalent to saying, if I give you a line segment, can you create another line segment that’s exactly the square root of π times as long? This geometric problem is equivalent to an algebraic one. That problem: can you create a polynomial, with integer coefficients, that has the square root of π as a root? (WARNING: I’m skipping some important points for the sake of clarity. DO NOT attempt to use this to pass your thesis defense without putting those points back in.) We want the square root of π because … well, what’s the area of a square whose sides are the square root of π long? That’s right. So we start with a line segment that’s equal to the radius of the circle and we can do that, surely. Once we have the radius, can’t we make a line that’s the square root of π times the radius, and from that make a square with area exactly π times the radius squared? Since π is transcendental, its square root is too: if the square root of π were a root of some such polynomial, π would be a root of another one. So, no. We can’t. Sorry. One of the great problems of ancient mathematics, and one that still has the power to attract the casual mathematician, got its final answer in 1882.

Georg Cantor is a name even non-mathematicians might recognize. He showed there have to be some infinite sets bigger than others, and that there must be more real numbers than there are rational numbers. Four years after showing that, he proved there are as many transcendental numbers as there are real numbers.

They’re everywhere. They permeate the real numbers so much that we can understand the real numbers as the transcendental numbers plus some dust. They’re almost the dark matter of mathematics. We don’t actually know all that many of them. Wolfram MathWorld has a table listing numbers proven to be transcendental, and the fact we can list that on a single web page is remarkable. Some of them are large sets of numbers, yes, like e^{\pi \sqrt{d}} for every positive whole number d. And we can infer many more from them; if π is transcendental then so is 2π, and so is 5π, and so is -20.38π, and so on. But the table of numbers proven to be transcendental is still just 25 rows long.

There are even mysteries about obvious numbers. π is transcendental. So is e. We know that at least one of π times e and π plus e is transcendental. Perhaps both are. We don’t know which one is, or if both are. We don’t know whether π^π is transcendental. We don’t know whether e^e is, either. Don’t even ask if π^e is.
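The claim about π + e and πe has a short reason behind it, sketched here. It leans on one standard fact not stated above: a root of a polynomial whose coefficients are themselves algebraic numbers is also algebraic. Multiply out:

```latex
(x - \pi)(x - e) = x^2 - (\pi + e)\,x + \pi e
```

If both π + e and πe were algebraic, the right-hand side would be a polynomial with algebraic coefficients, and its root π would be algebraic too. Since π is transcendental, at least one of π + e and πe has to be transcendental. The argument just can’t tell us which.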

How, by the way, does this fit with my claim that everything in mathematics is polynomials? — Well, we found these numbers in the first place by looking at polynomials. The set is defined, even to this day, by how a particular kind of polynomial can’t reach them. Thinking about a particular kind of polynomial makes visible this interesting set.

A Leap Day 2016 Mathematics A To Z: Surjective Map


Gaurish today gives me one more request for the Leap Day Mathematics A To Z. And it lets me step away from abstract algebra again, into the world of analysis and what makes functions work. It also hovers around some of my past talk about functions.

Surjective Map.

This request echoes one of the first terms from my Summer 2015 Mathematics A To Z. Then I’d spent some time on a bijection, or a bijective map. A surjective map is a less complicated concept. But if you understood bijective maps, you picked up surjective maps along the way.

By “map”, in this context, mathematicians don’t mean those diagrams that tell you where things are and how you might get there. Of course we don’t. By a “map” we mean that we have some rule that matches things in one set to things in another. If this sounds to you like what I’ve claimed a function is then you have a good ear. A mapping and a function are pretty much different names for one another. If there’s a difference in connotation I suppose it’s that a “mapping” makes a weaker suggestion that we’re necessarily talking about numbers.

(In some areas of mathematics, a mapping means a function with some extra properties, often some kind of continuity. Don’t worry about that. Someone will tell you when you’re doing mathematics deep enough to need this care. Mind, that person will tell you by way of a snarky follow-up comment picking on some minor point. It’s nothing personal. They just want you to appreciate that they’re very smart.)

So a function, or a mapping, has three parts. One is a set called the domain. One is a set called the range. And then there’s a rule matching things in the domain to things in the range. With functions we’re so used to the domain and range being the real numbers that we often forget to mention those parts. We go on thinking “the function” is just “the rule”. But the function is all three of these pieces.

A function has to match everything in the domain to something in the range. That’s by definition. There are no unused scraps in the domain. If it looks like there are, that’s because we were being sloppy in defining the domain. Or let’s be charitable. We assumed the reader understands the domain is only the set of things that make sense. And things make sense by being matched to something in the range.

Ah, but now, the range. The range could have unused bits in it. There’s nothing that inherently limits the range to “things matched by the rule to some thing in the domain”.

By now, then, you’ve probably spotted there have to be two kinds of functions. There are ones in which the whole range is used, and there are ones in which it’s not. Good eye. This is exactly so.

If a function only uses part of the range, if it leaves out anything, even if it’s just a single value out of infinitely many, then the function is called an “into” mapping. If you like, it takes the domain and stuffs it into the range without filling the range.

Ah, but if a function uses every scrap of the range, with nothing left out, then we have an “onto” mapping. The whole of the domain gets sent onto the whole of the range. And this is also known as a “surjective” mapping. We get the term “surjective” from Nicolas Bourbaki. Bourbaki is/was the renowned 20th century mathematics art-collective group which did so much to place rigor and intuition-free bases into mathematics.
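On finite sets we can just check whether every scrap of the range gets used. A Python sketch; `is_surjective` is an illustrative name, and it assumes the rule only ever lands inside the given range. The first two lines use the same rule with different ranges, which is why the range counts as part of the function.

```python
def is_surjective(rule, domain, range_):
    """True if every element of range_ is rule(x) for some x in domain.

    Assumes rule(x) always lands somewhere in range_.
    """
    hit = {rule(x) for x in domain}
    return set(range_) <= hit

domain = range(-3, 4)                                           # -3, -2, ..., 3
print(is_surjective(lambda x: x * x, domain, range(0, 10)))    # False: an "into" map
print(is_surjective(lambda x: x * x, domain, [0, 1, 4, 9]))    # True: same rule, smaller range
print(is_surjective(lambda x: x % 4, range(12), range(4)))     # True
```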

The term pairs up with the “injective” mapping. In this, each element in the range is matched with at most one thing in the domain. So if you know the function’s rule, and you know a thing in the range that the rule reaches, you also know the one and only thing in the domain matched to that. If you don’t feel very French, you might call this sort of function one-to-one. That might be a better name for saying why this kind of function is interesting.

Not every function is injective. But then not every function is surjective either. But if a function is both injective and surjective — if it’s both one-to-one and onto — then we have a bijection. It’s a mapping that can represent the way a system changes and that we know how to undo. That’s pretty comforting stuff.
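For finite sets we can test one-to-one-ness directly, and for a bijection we can build the map that undoes it. A Python sketch with made-up function names, using the integers 0 through 4 as both domain and range.

```python
def is_injective(rule, domain):
    """True if no two different domain elements are sent to the same place."""
    images = [rule(x) for x in domain]
    return len(images) == len(set(images))

def inverse(rule, domain, range_):
    """Build the undo map for a bijection, as a dict from range back to domain."""
    table = {rule(x): x for x in domain}
    if len(table) != len(domain) or set(table) != set(range_):
        raise ValueError("rule is not a bijection between these sets")
    return table

def double_plus_one(x):
    return (2 * x + 1) % 5

print(is_injective(double_plus_one, range(5)))        # True
print(is_injective(lambda x: x * x % 5, range(5)))    # False: 1 and 4 both go to 1
undo = inverse(double_plus_one, range(5), range(5))
print(undo[double_plus_one(3)])                       # 3
```

The first check catches a rule that isn’t one-to-one; comparing the keys against the range catches one that isn’t onto.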

If we use a mapping to describe how a process changes a system, then knowing whether it’s a surjective map tells us something about the process. If it isn’t surjective, if it’s only an into map, then the process makes the system settle into a subset of all the possible states. That doesn’t mean the thing is stable, that little jolts get worn down. And it doesn’t mean that the thing is settling to a fixed state. But it is a piece of information suggesting that’s possible. This may not seem like a strong conclusion. But considering how little we know about the function it’s impressive to be able to say that much.

A Leap Day 2016 Mathematics A To Z: Riemann Sphere


To my surprise nobody requested any terms beginning with ‘R’ for this A To Z. So I take this free day to pick on a concept I’d imagine nobody saw coming.

Riemann Sphere.

We need to start with the complex plane. This is just, well, a plane. All the points on the plane correspond to a complex-valued number. That’s a real number plus a real number times i. And i is one of those numbers which, squared, equals -1. It’s like the real number line, only in two directions at once.

Take that plane. Now put a sphere on it. The sphere has radius one-half. And it sits on top of the plane. Its lowest point, the south pole, sits on the origin. That’s whatever point corresponds to the number 0 + 0i, or as humans know it, “zero”.

We’re going to do something amazing with this. We’re going to make a projection, something that maps every point on the sphere (all but one, which I’ll get to) to a point on the plane, and vice-versa. In other words, we can match every complex-valued number to one point on the sphere. And every point on the sphere, except that one, to one complex-valued number. Here’s how.

Imagine sitting at the north pole. And imagine that you can see through the sphere. Pick any point on the plane. Look directly at it. Shine a laser beam, if that helps you pick the point out. The laser beam is going to go into the sphere — you’re squatting down to better look through the sphere — and come out somewhere on the sphere, before going on to the point in the plane. The point where the laser beam emerges? That’s the mapping of the point on the plane to the sphere.

There’s one point with an obvious match. The south pole is going to match zero. They touch, after all. Other points … it’s less obvious. But some are easy enough to work out. The equator of the sphere, for instance, is going to match all the points a distance of 1 from the origin. So it’ll have the point matching the number 1 on it. It’ll also have the point matching the number -1, and the point matching i, and the point matching -i. And some other numbers.

All the numbers that are less than 1 from the origin, in fact, will have matches somewhere in the southern hemisphere. If you don’t see why that is, draw some sketches and think about it. You’ll convince yourself. If you write down what convinced you and sprinkle the word “continuity” in here and there, you’ll convince a mathematician. (WARNING! Don’t actually try getting through your Intro to Complex Analysis class doing this. But this is what you’ll be doing.)

What about the numbers more than 1 from the origin? … Well, they all match to points on the northern hemisphere. And tell me that doesn’t stagger you. It’s one thing to match the southern hemisphere to all the points in a circle of radius 1 away from the origin. But we can match everything outside that little circle to the northern hemisphere. And it all fits in!
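For anyone who’d rather check than sketch, here’s a minimal Python version of the projection, assuming the setup above: a sphere of radius one-half resting on the origin, with the laser fired from the north pole at height 1. Following the beam from the north pole toward the plane point z, it meets the sphere a fraction 1/(1 + |z|^2) of the way down. That gives the formula in `to_sphere`. The height comes out to 1 - 1/(1 + |z|^2), which is below one-half exactly when |z| is less than 1. The function names are mine, for illustration.

```python
def to_sphere(z):
    """Map complex z to where the beam from the north pole (0, 0, 1) toward
    (Re z, Im z, 0) crosses the sphere of radius 1/2 centered at (0, 0, 1/2)."""
    t = 1 / (1 + abs(z) ** 2)  # fraction of the way from the north pole to the plane
    return (t * z.real, t * z.imag, 1 - t)


def to_plane(point):
    """Undo to_sphere; the north pole itself (height 1) has no match in the plane."""
    x, y, height = point
    return complex(x, y) / (1 - height)


for z in [0j, 1 + 0j, 1j, 0.3 + 0.2j, 2 + 1j, -30j]:
    x, y, height = to_sphere(z)
    hemisphere = "south" if height < 0.5 else "north" if height > 0.5 else "equator"
    print(z, round(height, 4), hemisphere)
```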

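Not amazed enough? How about this: draw a circle on the plane. Then look at the points on the Riemann sphere that match it. That set of points? It’s also a circle. A line on the plane? That becomes a circle on the sphere too, one that passes through the north pole. (If the line goes through the origin, that circle goes through both poles. Then it’s a great circle, a geodesic, the thing that looks like a line on spheres. Other lines give smaller circles.)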
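Here’s a rough numerical check of the circles claim, reusing the same assumed `to_sphere` formula for a sphere of radius one-half resting on the origin. Four points in space lie on one circle of the sphere when they lie on one plane, and a scalar triple product tests that. The sketch takes the circle of radius 1 around the number 2, and the line of numbers with real part 1. A line that misses the origin makes a good test case: its image shares a plane with the north pole, and that plane misses the sphere’s center, so it isn’t a great circle. The helper names are mine.

```python
def to_sphere(z):
    """Riemann sphere of radius 1/2 resting on the origin, projected from (0, 0, 1)."""
    t = 1 / (1 + abs(z) ** 2)
    return (t * z.real, t * z.imag, 1 - t)


def coplanar(p0, p1, p2, p3, tol=1e-12):
    """True when the scalar triple product of p1-p0, p2-p0, p3-p0 is (nearly) zero."""
    a, b, c = ([q[i] - p0[i] for i in range(3)] for q in (p1, p2, p3))
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(det) < tol


NORTH, CENTER = (0.0, 0.0, 1.0), (0.0, 0.0, 0.5)

line = [to_sphere(1 + s * 1j) for s in (0, 1, -2, 7)]             # real part 1
circle = [to_sphere(2 + w) for w in (1 + 0j, 1j, -1 + 0j, -1j)]   # |z - 2| = 1

print(coplanar(*circle))             # True: the circle's image is a circle
print(coplanar(NORTH, *line[:3]))    # True: the line's image runs through the north pole
print(coplanar(CENTER, *line[:3]))   # False: so it isn't a great circle
```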

How about this? Take a pair of intersecting lines or circles in the plane. Look at what they map to. That mapping, squashed as it might be to the northern hemisphere of the sphere? The projection of the lines or circles will intersect at the same angles as the original. As much as space gets stretched out (near the south pole) or squashed down (near the north pole), angles stay intact.
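The angle claim can be spot-checked too. This Python sketch, again assuming the same `to_sphere` formula, estimates the tangent directions of two crossing lines’ images with central differences and compares the angle between them to the angle in the plane. It’s a numerical spot check at a few points, not a proof, and the step size `h` is an arbitrary choice of mine.

```python
import cmath
import math


def to_sphere(z):
    """Riemann sphere of radius 1/2 resting on the origin, projected from (0, 0, 1)."""
    t = 1 / (1 + abs(z) ** 2)
    return (t * z.real, t * z.imag, 1 - t)


def image_tangent(z0, direction, h=1e-6):
    """Approximate tangent vector at to_sphere(z0) of the image of z0 + s*direction."""
    ahead = to_sphere(z0 + h * direction)
    behind = to_sphere(z0 - h * direction)
    return [(a - b) / (2 * h) for a, b in zip(ahead, behind)]


def angle_between(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(dot / norms)


plane_angle = math.pi / 3                      # two lines crossing at 60 degrees
d1, d2 = 1 + 0j, cmath.exp(1j * plane_angle)   # their directions in the plane
for z0 in [0.7 + 0.4j, -2 + 3j, 0.05j]:
    sphere_angle = angle_between(image_tangent(z0, d1), image_tangent(z0, d2))
    print(z0, math.degrees(sphere_angle))      # close to 60 each time
```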

OK, but besides being stunning, what good is all this?

Well, one is that it’s a good thing to learn on. Geometry gets interested in things that look, at least in places, like planes, but aren’t necessarily. These spheres are such things, and the way a sphere matches a plane is obvious. We can learn the tools for geometry on the Möbius strip or the Klein bottle or other exotic creations by proving them out on this.

And then physics comes in, being all weird. Much of quantum mechanics makes sense if you imagine it as things on the sphere. (I admit I don’t know exactly how. I went to grad school in mathematics, not in physics, and I didn’t get to the physics side of mathematics much at that time.) The strange ways distance can get mushed up or stretched out have echoes in relativity. They’ll continue having these echoes in other efforts to explain physics as geometry, the way string theory does.

Also important is that the sphere has a top, the north pole. That point matches … well, what? It’s got to be something infinitely far away from the origin. And this makes sense. We can use this projection to make a logically coherent, sensible description of things “approaching infinity”, the way we want to when we first learn about infinitely big things. Wrapping all the complex-valued numbers to this ball makes the vast manageable.

It’s also good numerical practice. Computer simulations have problems with infinitely large things, for the obvious reason. We have a couple of tools to handle this. One is to model a really big but not infinitely large space and hope we aren’t breaking anything. One is to create a “tiling”, making the space we are able to simulate repeat itself in a perfect grid forever and ever. But recasting the problem from the infinitely large plane onto the sphere can also work. This requires some ingenuity, to be sure we do the recasting correctly, but that’s all right. If we need to run a simulation over all of space, we can often get away with doing a simulation on a sphere. And isn’t that also grand?

The Riemann named here is Bernhard Riemann, yet another of those absurdly prolific 19th century mathematicians, especially considering how young he was when he died. His name is all over the fundamentals of analysis and geometry. When you take Introduction to Calculus you get introduced pretty quickly to the Riemann Sum, which is how we first learn how to calculate integrals. It’s that guy. General relativity, and much of modern physics, is based on advanced geometries that again fall back on principles Riemann noticed or set out or described so well that we still think of them as his discoveries.

A Leap Day 2016 Mathematics A To Z: Quaternion


I’ve got another request from Gaurish today. And it’s a word I had been thinking to do anyway. When one looks for mathematical terms starting with ‘q’ this is one that stands out. I’m a little surprised I didn’t do it for last summer’s A To Z. But here it is at last:

Quaternion.

I remember the seizing of my imagination the summer I learned imaginary numbers. If we could define a number i, so that i-squared equalled negative 1, and work out arithmetic which made sense out of that, why not do it again? Complex-valued numbers are great. Why not something more? Maybe we could also have some other non-real number. I reached deep into my imagination and picked j as its name. It could be something else. Maybe the logarithm of -1. Maybe the square root of i. Maybe something else. And maybe we could build arithmetic with a whole second other non-real number.

My hopes of this brilliant idea petered out over the summer. It’s easy to imagine a super-complex number, something that’s “1 + 2i + 3j”. And it’s easy to work out adding two super-complex numbers like this together. But multiplying them together? What should i times j be? I couldn’t solve the problem. Also I learned that we didn’t need another number to be the logarithm of -1. It would be π times i. (Or some other numbers. There’s some surprising stuff in logarithms of negative or of complex-valued numbers.) We also don’t need something special to be the square root of i, either. \frac{1}{2}\sqrt{2} + \frac{1}{2}\sqrt{2}\imath will do. (So will another number.) So I shelved the project.
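Python’s standard `cmath` module will confirm both of these, using its principal values. A quick sketch:

```python
import cmath

print(cmath.log(-1))            # pi times i (the principal logarithm)
print(cmath.sqrt(1j))           # about 0.7071 + 0.7071i

w = (2 ** 0.5) / 2 * (1 + 1j)   # (1/2)sqrt(2) + (1/2)sqrt(2) i
print(w * w, (-w) * (-w))       # both about 1j: the "other number" is -w
```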

Even if I hadn’t given up, I wouldn’t have invented something. Not along those lines. Finer minds had done the same work and had found a way to do it. The most famous of these is the quaternions, and they have a famous discovery story. Sir William Rowan Hamilton, the namesake of “Hamiltonian mechanics” (so you already know what a fantastic mind he was), had a flash of insight that’s come down in the folklore and romance of mathematical history. He had the idea on the 16th of October, 1843, while walking with his wife along the Royal Canal, in Dublin, Ireland. While crossing a bridge over the canal he saw what was missing. It seems he lacked pencil and paper. He carved it into the bridge:

i^2 = j^2 = k^2 = ijk = -1

The bridge now has a plaque commemorating the moment. You can’t make a sensible system with two non-real numbers. But three? Three works.

And they are a mysterious three! i, j, and k are somehow not the same number. But each of them, multiplied by itself, gives us -1. And the product of the three is -1. They are even more mysterious. To work sensibly, i times j can’t be the same thing as j times i. Instead, i times j equals minus j times i. And j times k equals minus k times j. And k times i equals minus i times k. We must give up commutativity, the idea that the order in which we multiply things doesn’t matter.
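To watch Hamilton’s rules at work, here’s a minimal Python sketch of quaternion multiplication. It stores a quaternion a + bi + cj + dk as the tuple (a, b, c, d) and expands the product term by term using those rules. The function `qmul` and the constant names are mine.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


MINUS_ONE = (-1, 0, 0, 0)
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

print(qmul(I, I), qmul(J, J), qmul(K, K))   # each is -1
print(qmul(qmul(I, J), K))                  # ijk is -1 too
print(qmul(I, J), qmul(J, I))               # k and -k: the order matters
```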

But if we’re willing to accept that the order matters, then quaternions are well-behaved things. We can add and subtract them just as we would think to do if we didn’t know they were strange constructs. If we keep the funny rules about the products of i and j and k straight, then we can multiply them as easily as we multiply polynomials together. We can even divide them. We can do all the things we do with real numbers, only with these odd sets of four real numbers.

Their form, that pattern of 1 + 2i + 3j + 4k, makes them look a lot like vectors. And we can use them like vectors pointing to stuff in three-dimensional space. It’s not quite a comfortable fit, though. That plain old real number at the start of things seems like it ought to signify something, but it doesn’t. In practice, it doesn’t give us anything that regular old vectors don’t. And vectors allow us to ponder not just three- or maybe four-dimensional spaces, but as many as we need. You might wonder why we need more than four dimensions, even allowing for time. It’s because if we want to track a lot of interacting things, it’s surprisingly useful to put them all into one big vector in a very high-dimension space. It’s hard to draw, but the mathematics is nice. Hamiltonian mechanics, particularly, almost begs for it.

That’s not to call them useless, or even a niche interest. They do some things fantastically well. One of them is rotations. We can represent rotating a point around an arbitrary axis by an arbitrary angle as the multiplication of quaternions. There are many ways to calculate rotations. But if we need to do three-dimensional rotations this is a great one because it’s easy to understand and easier to program. And as you’d imagine, being able to calculate what rotations do is useful in all sorts of applications.
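Here’s a sketch of the rotation trick, assuming the standard recipe. To turn a point by angle θ about a unit-length axis u, build the quaternion q = cos(θ/2) + sin(θ/2)(u_x i + u_y j + u_z k). Write the point as a quaternion with zero real part, and compute q times the point times q’s conjugate. The helper names are mine, and the block carries its own Hamilton product so it stands alone.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions stored as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def rotate(point, axis, angle):
    """Rotate a 3-D point by `angle` radians about the unit vector `axis`,
    computed as q * (0, point) * conj(q)."""
    s = math.sin(angle / 2)
    q = (math.cos(angle / 2), s * axis[0], s * axis[1], s * axis[2])
    q_conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = qmul(qmul(q, (0.0, *point)), q_conj)
    return (x, y, z)


print(rotate((1, 0, 0), (0, 0, 1), math.pi / 2))       # about (0, 1, 0)
r = 1 / math.sqrt(3)
print(rotate((1, 0, 0), (r, r, r), 2 * math.pi / 3))   # about (0, 1, 0) too
```

One point in the method’s favor: doing two rotations in a row is just multiplying their quaternions, and a quaternion carries four numbers where a rotation matrix carries nine.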

They’ve got good uses in number theory too, as they correspond well to the different ways to solve problems, often polynomials. They’re also popular in group theory. They might be the simplest rings that work like arithmetic but that don’t commute. So they can serve as ways to learn properties of more exotic ring structures.

Knowing of these marvelous exotic creatures of the deep mathematics your imagination might be fired. Can we do this again? Can we make something with, say, four unreal numbers? No, no we can’t. Four won’t work. Nor will five. If we keep going, though, we do hit upon success with seven unreal numbers.

This is a set called the octonions. Hamilton had barely worked out the scheme for quaternions when John T Graves, a friend of his at least up through the 16th of December, 1843, wrote of this new scheme. (Graves didn’t publish before Arthur Cayley did. Cayley’s one of those unspeakably prolific 19th century mathematicians. He has at least 967 papers to his credit. And he was a lawyer doing mathematics on the side for about 250 of those papers. This depresses every mathematician who ponders it these days.)

But where quaternions are peculiar, octonions are really peculiar. Let me call three quaternions p, q, and r. p times q might not be the same thing as q times p. But p times the product of q and r will be the same thing as the product of p and q, itself times r. This we call associativity. Octonions don’t have that. Let me call three octonions s, t, and u. s times the product of t and u need not be the same thing as the product of s and t, times u. (For the basic unit octonions, the two come out either equal or the negatives of each other. It depends.)

Octonions have some neat mathematical properties. But I don’t know of any general uses for them that are as catchy as understanding rotations. Not rotations in the three-dimensional world, anyway.

Yes, yes, we can go farther still. There’s a construct called “sedenions”, which have fifteen non-real numbers in them. That’s 16 terms in each number. Where octonions are peculiar, sedenions are really peculiar. They work even less like regular old numbers than octonions do. With octonions, at least, when you multiply s by the product of s and t, you get the same number as you would multiplying s by s and then multiplying that by t. Sedenions don’t even offer that shred of normality. Besides being a way to learn about abstract algebra structures I don’t know what they’re used for.
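Here’s a rough way to see these claims numerically. The sketch builds quaternions, octonions, and sedenions all at once with the Cayley-Dickson doubling construction. That’s a standard recipe, but I’m swapping it in; the essay doesn’t say how the multiplication tables get built. Numbers are Python lists of 4, 8, or 16 reals. It tests associativity, whether s(tu) equals (st)u, and the weaker rule that s(st) equals (ss)t, on randomly chosen numbers. Random tests suggest rather than prove, but for random inputs the failures show up essentially every time.

```python
import random


def add(x, y):
    return [a + b for a, b in zip(x, y)]


def sub(x, y):
    return [a - b for a, b in zip(x, y)]


def conj(x):
    """Conjugate: keep the real part, negate every non-real part."""
    return [x[0]] + [-a for a in x[1:]]


def mul(x, y):
    """Cayley-Dickson product for lists of length 1, 2, 4, 8, 16, ...

    Split each number into halves, x = (p, q) and y = (r, s), and use
    (p, q)(r, s) = (pr - s*q, sp + q r*), where * is conjugation.
    """
    if len(x) == 1:
        return [x[0] * y[0]]
    h = len(x) // 2
    p, q, r, s = x[:h], x[h:], y[:h], y[h:]
    return sub(mul(p, r), mul(conj(s), q)) + add(mul(s, p), mul(q, conj(r)))


def close(x, y, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(x, y))


random.seed(1)
results = {}
for size, name in [(4, "quaternions"), (8, "octonions"), (16, "sedenions")]:
    s, t, u = ([random.uniform(-1, 1) for _ in range(size)] for _ in range(3))
    associative = close(mul(s, mul(t, u)), mul(mul(s, t), u))
    alternative = close(mul(s, mul(s, t)), mul(mul(s, s), t))
    results[name] = (associative, alternative)
    print(name, "s(tu) == (st)u:", associative, " s(st) == (ss)t:", alternative)
```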

I also don’t know of further exotic terms along this line. It would seem to fit a pattern if there’s some 32-term construct that we can define something like multiplication for. But it would presumably be even less like regular multiplication than sedenion multiplication is. If you want to fiddle about with that please do enjoy yourself. I’d be interested to hear if you turn up anything, but I don’t expect it’ll revolutionize the way I look at numbers. Sorry. But the discovery might be the fun part anyway.

A Leap Day 2016 Mathematics A To Z: Polynomials


I have another request for today’s Leap Day Mathematics A To Z term. Gaurish asked for something exciting. This should be less challenging than Dedekind Domains. I hope.

Polynomials.

Polynomials are everything. Everything in mathematics, anyway. If humans study it, it’s a polynomial. If we know anything about a mathematical construct, it’s because we ran across it while trying to understand polynomials.

I exaggerate. A tiny bit. Maybe by three percent. But polynomials are big.

They’re easy to recognize. We can get them in pre-algebra. We make them out of a set of numbers called coefficients and one or more variables. The coefficients are usually either real numbers or complex-valued numbers. The variables we usually allow to be either real or complex-valued numbers. We take each coefficient and multiply it by some power of each variable. And we add all that up. So, polynomials are things that look like these things:

x^2 - 2x + 1
12 x^4 + 2\pi x^2 y^3 - 4x^3 y - \sqrt{6}
\ln(2) + \frac{1}{2}\left(x - 2\right) - \frac{1}{2 \cdot 2^2}\left(x - 2\right)^2 + \frac{1}{3 \cdot 2^3}\left(x - 2\right)^3 - \frac{1}{4 \cdot 2^4}\left(x - 2\right)^4  + \cdots
a_n x^n + a_{n - 1}x^{n - 1} + a_{n - 2}x^{n - 2} + \cdots + a_2 x^2 + a_1 x^1 + a_0

The first polynomial maybe looks nice and comfortable. The second may look a little threatening, what with it having two variables and a square root in it, but it’s not too weird. The third is an infinitely long polynomial; you’re supposed to keep going on in that pattern, adding even more terms. The last is a generic representation of a polynomial. Each number a_0, a_1, a_2, et cetera is some coefficient that we in principle know. It’s a good way of representing a polynomial when we want to work with it but don’t want to tie ourselves down to a particular example. The highest power we raise a variable to we call the degree of the polynomial. A second-degree polynomial, for example, has an x^2 in it, but not an x^3 or x^4 or x^18 or anything like that. A third-degree polynomial has an x^3, but not x to any higher powers. Degree is a useful way of saying roughly how long a polynomial is, so it appears all over discussions of polynomials.

But why do we like polynomials? Why like them so much that MathWorld lists 1,163 pages that mention polynomials?

It’s because they’re great. They do everything we’d ever want to do and they’re great at it. We can add them together as easily as we add regular old numbers. We can subtract them as well. We can multiply and divide them. There are even prime polynomials, just like there are prime numbers. They take longer to work out, but they’re not harder.
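To see how polynomial arithmetic parallels ordinary arithmetic, here’s a minimal Python sketch. It assumes a polynomial is stored as a list of coefficients, lowest power first, so [1, -2, 1] means 1 - 2x + x^2. Division is left out to keep it short.

```python
from itertools import zip_longest


def poly_add(p, q):
    """Add matching powers; the shorter list is padded with zeros."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]


def poly_mul(p, q):
    """Every term of p times every term of q; powers add, so index i + j."""
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product


x_minus_1 = [-1, 1]                      # x - 1
print(poly_mul(x_minus_1, x_minus_1))    # [1, -2, 1], that is, x^2 - 2x + 1
print(poly_add([1, 2], [3, 0, 4]))       # [4, 2, 4]
```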

And they do great stuff in advanced mathematics too. In calculus we want to take derivatives of functions. Polynomials, we always can. We get another polynomial out of that. So we can keep taking derivatives, as many as we need. (We might need a lot of them.) We can integrate too. The integration produces another polynomial. So we can keep doing that as long as we need too. (We need to do this a lot, too.) This lets us solve so many problems in calculus, which is about how functions work. It also lets us solve so many problems in differential equations, which is about systems whose change depends on the current state of things.
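In the same coefficient-list form (lowest power first, an assumption of this sketch), differentiating and integrating are each one line. The derivative of a_k x^k is k a_k x^(k-1), and an antiderivative of a_k x^k is a_k x^(k+1)/(k+1). Python’s `fractions.Fraction` keeps the division exact.

```python
from fractions import Fraction


def derivative(p):
    """d/dx of sum a_k x^k is sum k a_k x^(k-1)."""
    return [k * p[k] for k in range(1, len(p))]


def integral(p, constant=0):
    """Antiderivative: a_k x^k becomes a_k x^(k+1) / (k+1), plus a constant."""
    return [constant] + [Fraction(a) / (k + 1) for k, a in enumerate(p)]


p = [1, -2, 1]                  # x^2 - 2x + 1
print(derivative(p))            # [-2, 2], that is, 2x - 2
print(integral(p))              # x - x^2 + x^3/3, as Fractions
print(derivative(integral(p)))  # back to x^2 - 2x + 1
```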

That’s great for analyzing polynomials, but what about things that aren’t polynomials?

Well, if a function is continuous, then it might as well be a polynomial. To be a little more exact, we can set a margin of error. And, at least over a closed, finite interval, we can always find polynomials that are less than that margin of error away from the original function. The original function might be annoying to deal with. The polynomial that’s as close to it as we want, though, isn’t.
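That margin-of-error promise is the Weierstrass approximation theorem. One constructive way to meet it, swapped in here as an illustration, uses Bernstein polynomials on the interval from 0 to 1. This Python sketch approximates |x - 1/2|, which is continuous but has a corner, so it isn’t a polynomial itself. The largest error, measured on a grid of sample points (an assumption of the sketch, not a proof), shrinks as the degree grows.

```python
import math


def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))


def max_error(f, n, samples=201):
    """Largest gap between f and its Bernstein polynomial on an evenly spaced grid."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)


def kink(x):
    return abs(x - 0.5)   # continuous, but with a corner at 1/2


for n in (10, 40, 160):
    print(n, max_error(kink, n))
```

Bernstein polynomials converge slowly, so this is a demonstration that the approximation exists, not how a calculator would do it.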

Not every function is continuous. Most of them aren’t. But most of the functions we want to do work with are, or at least are continuous in stretches. Polynomials let us understand the functions that describe most real stuff.

Nice for mathematicians, all right, but how about for real uses? How about for calculations?

Oh, polynomials are just magnificent. You know why? Because you can evaluate any polynomial as soon as you can add and multiply. (Also subtract, but we think of that as addition.) Remember, x^4 just means “x times x times x times x”, four of those x’s in the product. All these polynomials are easy to evaluate.

Even better, we don’t have to evaluate them. We can automate away the evaluation. It’s easy to set a calculator doing this work, and it will do it without complaint and with few unforeseeable mistakes.
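One standard way to automate it is Horner’s method. It rewrites a_n x^n + ... + a_1 x + a_0 as (((a_n)x + a_(n-1))x + ...)x + a_0, so evaluating takes one multiply and one add per coefficient. A minimal Python sketch, assuming the coefficients are listed lowest power first:

```python
def horner(coefficients, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n with one multiply and one add per step.

    `coefficients` lists a_0 first; we walk it from the top power down.
    """
    total = 0
    for a in reversed(coefficients):
        total = total * x + a
    return total


print(horner([1, -2, 1], 3))         # x^2 - 2x + 1 at x = 3 gives 4
print(horner([-5, 0, 0, 0, 12], 2))  # 12x^4 - 5 at x = 2 gives 187
```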

Now remember that thing where we can make a polynomial close enough to any continuous function? And we can always set a calculator to evaluate a polynomial? Guess what this means about continuous functions. We have a tool that lets us calculate stuff we would want to know. Things like arccosines and logarithms and Bessel functions and all that. And we get nice easy to understand numbers out of them. For example, that third polynomial I gave you above? That’s not just infinitely long. It’s also a polynomial that approximates the natural logarithm. Pick a positive number x that’s between 0 and 4 and put it in that polynomial. Calculate terms and add them up. You’ll get closer and closer to the natural logarithm of that number. You’ll get there faster if you pick a number near 2, but you’ll eventually get there for whatever number you pick. (Calculus will tell us why x has to be between 0 and 4. Don’t worry about it for now.)
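Here’s a minimal Python sketch of that calculation. It assumes the general term of the series is (-1)^(n+1) (x - 2)^n / (n · 2^n), which is the Taylor series of the natural logarithm about 2. It adds up partial sums and compares them to Python’s own `math.log`.

```python
import math


def ln_series(x, terms):
    """Partial sum: ln(2) plus the first `terms` terms (-1)^(n+1) (x-2)^n / (n 2^n)."""
    total = math.log(2)   # the constant term is ln(2) itself
    for n in range(1, terms + 1):
        total += (-1) ** (n + 1) * (x - 2) ** n / (n * 2 ** n)
    return total


for x in (1, 3, 3.9):
    print(x, [round(ln_series(x, k), 6) for k in (2, 10, 40)], round(math.log(x), 6))
```

Running it shows the effect described above: at 1 and 3 the sums settle down quickly, while at 3.9, near the edge of the interval, they creep toward the answer much more slowly.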

So through polynomials we can understand functions, analytically and numerically.

And they keep revealing things to us. We discovered complex-valued numbers because we wanted to find roots, values of x that make a polynomial of x equal to zero. Some formulas worked well for third- and fourth-degree polynomials. (They look like the quadratic formula, which solves second-degree polynomials. The big difference is nobody remembers what they are without looking them up.) But the formulas sometimes called for things that looked like square roots of negative numbers. Absurd! But if you carried on as if these square roots of negative numbers meant something, you got meaningful answers. And correct answers.
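Here’s the classic case, usually credited to Rafael Bombelli: the cubic x^3 = 15x + 4, whose root 4 you can check by hand. The cubic formula for x^3 = px + q adds the cube roots of q/2 plus and minus the square root of q^2/4 - p^3/27. Here that square root is the square root of -121. This Python sketch carries on with complex numbers anyway. It leans on Python’s principal cube roots, which happen to pair up correctly here; in general one has to choose the two cube roots so their product is p/3.

```python
import cmath

p, q = 15, 4                                 # the cubic x^3 = 15x + 4
inner = cmath.sqrt(q ** 2 / 4 - p ** 3 / 27)  # the square root of -121, that is, 11i
u = (q / 2 + inner) ** (1 / 3)               # principal cube root of 2 + 11i
v = (q / 2 - inner) ** (1 / 3)               # principal cube root of 2 - 11i
root = u + v

print(u, v)    # about 2 + i and 2 - i
print(root)    # about 4: the imaginary parts cancel
```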

We wanted formulas to solve fifth- and higher-degree polynomials exactly. We can do this with second and third and fourth-degree polynomials, after all. It turns out we can’t, not in general, not with formulas built only from arithmetic and roots. Oh, we can solve some of them exactly. The attempt to understand why, though, helped us create and shape group theory, the study of things that look like but aren’t numbers.

Polynomials go on, sneaking into everything. We can look at a square matrix and discover its characteristic polynomial. This allows us to find beautifully-named things like eigenvalues and eigenvectors. These reveal secrets of the matrix’s structure. We can find polynomials in the formulas that describe how many ways to split up a group of things into a smaller number of sets. We can find polynomials that describe how networks of things are connected. We can find polynomials that describe how a knot is tied. We can even find polynomials that distinguish between a knot and the knot’s reflection in the mirror.
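For a 2-by-2 matrix the characteristic polynomial is short enough to write out. For the matrix with rows (a, b) and (c, d) it’s λ^2 - (a + d)λ + (ad - bc), and its roots are the eigenvalues. A minimal Python sketch, limited to the 2-by-2 case:

```python
import cmath


def characteristic_poly(m):
    """Coefficients [constant, linear, quadratic] of det(m - λI) for a 2x2 matrix."""
    (a, b), (c, d) = m
    return [a * d - b * c, -(a + d), 1]


def eigenvalues(m):
    """Roots of the characteristic polynomial, by the quadratic formula."""
    c0, c1, _ = characteristic_poly(m)
    root = cmath.sqrt(c1 * c1 - 4 * c0)
    return ((-c1 + root) / 2, (-c1 - root) / 2)


print(characteristic_poly([[2, 1], [1, 2]]))  # [3, -4, 1]: λ^2 - 4λ + 3
print(eigenvalues([[2, 1], [1, 2]]))          # 3 and 1
print(eigenvalues([[0, -1], [1, 0]]))         # i and -i: a quarter-turn has no real ones
```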

Polynomials are everything.

A Leap Day 2016 Mathematics A To Z: Orthonormal


Jacob Kanev had requested “orthogonal” for this glossary. I’d be happy to oblige. But I used the word in last summer’s Mathematics A To Z. And I admit I’m tempted to just reprint that essay, since it would save some needed time. But I can do something more.

Orthonormal.

“Orthogonal” is another word for “perpendicular”. Mathematicians use it for reasons I’m not precisely sure of. My belief is that it’s because “perpendicular” sounds like we’re talking about directions. And we want to extend the idea to things that aren’t necessarily directions. As majors, mathematicians learn orthogonality for vectors, things pointing in different directions. Then we extend it to other ideas. To functions, particularly, but we can also define it for spaces and for other stuff.

I was vague, last summer, about how we do that. We do it by creating a function called the “inner product”. That takes in two of whatever things we’re measuring and gives us a real number. If the inner product of two things is zero, then the two things are orthogonal.

The first example mathematics majors learn of this, before they even hear the words “inner product”, is the dot product. This is for vectors, ordered sets of numbers. We find the dot product by matching up numbers in the corresponding slots for the two vectors, multiplying them together, and then adding up the products. For example, take the vector with values (1, 2, 3), and the other vector with values (-6, 5, -4). The inner product will be 1 times -6 (which is -6) plus 2 times 5 (which is 10) plus 3 times -4 (which is -12). So that’s -6 + 10 - 12, or -8.

So those vectors aren’t orthogonal. But how about the vectors (1, -1, 0) and (0, 0, 1)? Their dot product is 1 times 0 (which is 0) plus -1 times 0 (which is 0) plus 0 times 1 (which is 0). The vectors are perpendicular. And if you tried drawing this you’d see, yeah, they are. The first vector we’d draw as being inside a flat plane, and the second vector as pointing up, through that plane, like a thumbtack.
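The dot product recipe is short enough to write out in Python. A minimal sketch, assuming both vectors have the same number of slots (`zip` would quietly drop any extras otherwise):

```python
def dot(u, v):
    """Multiply matching slots, then add up the products."""
    return sum(a * b for a, b in zip(u, v))


print(dot((1, 2, 3), (-6, 5, -4)))   # -8: not orthogonal
print(dot((1, -1, 0), (0, 0, 1)))    # 0: orthogonal
```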

So that’s orthogonal. What about this orthonormal stuff?

Well … the inner product can tell us something besides orthogonality. What happens if we take the inner product of a vector with itself? Say, (1, 2, 3) with itself? That’s going to be 1 times 1 (which is 1) plus 2 times 2 (4, according to rumor) plus 3 times 3 (which is 9). That’s 14, a tidy sum, although, so what?

The inner product of (-6, 5, -4) with itself? Oh, that’s some ugly numbers. Let’s skip it. How about the inner product of (1, -1, 0) with itself? That’ll be 1 times 1 (which is 1) plus -1 times -1 (which is positive 1) plus 0 times 0 (which is 0). That adds up to 2. And now, wait a minute. This might be something.

Start from somewhere. Move 1 unit to the east. (Don’t care what the unit is. Inches, kilometers, astronomical units, anything.) Then move -1 units to the north, or like normal people would say, 1 unit to the south. How far are you from the starting point? … Well, you’re the square root of 2 units away.

Now imagine starting from somewhere and moving 1 unit east, and then 2 units north, and then 3 units straight up, because you found a convenient elevator. How far are you from the starting point? This may take a moment of fiddling around with the Pythagorean theorem. But you’re the square root of 14 units away.

And what the heck, (0, 0, 1). The inner product of that with itself is 0 times 0 (which is zero) plus 0 times 0 (still zero) plus 1 times 1 (which is 1). That adds up to 1. And, yeah, if we go one unit straight up, we’re one unit away from where we started.

The inner product of a vector with itself gives us the square of the vector’s length. At least if we aren’t using some freak definition of inner products and lengths and vectors. And this is great! It means we can talk about the length — maybe better to say the size — of things that maybe don’t have obvious sizes.

Some stuff will have convenient sizes. For example, they’ll have size 1. The vector (0, 0, 1) was one such. So is (1, 0, 0). And you can think of another example easily. Yes, it’s \left(\frac{1}{\sqrt{2}}, -\frac{1}{2}, \frac{1}{2}\right) . (Go ahead, check!)

So by “orthonormal” we mean a collection of things that are orthogonal to each other, and that themselves are all of size 1. It’s a description of both what things are by themselves and how they relate to one another. A thing can’t be orthonormal by itself, for the same reason a line can’t be perpendicular to nothing in particular. But a pair of things might be orthogonal, and they might be the right length to be orthonormal too.
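Here’s a sketch that turns the last few paragraphs into Python: length as the square root of a vector’s inner product with itself, and a check that a collection is orthonormal. It uses the ordinary dot product, plus a small tolerance for rounding error; both are my assumptions, as are the helper names.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def length(v):
    """The square root of a vector's inner product with itself."""
    return math.sqrt(dot(v, v))


def is_orthonormal(vectors, tol=1e-12):
    """Each vector has length 1 and every pair has inner product 0 (within tol)."""
    for i, u in enumerate(vectors):
        if abs(dot(u, u) - 1) > tol:
            return False
        for w in vectors[i + 1:]:
            if abs(dot(u, w)) > tol:
                return False
    return True


print(length((1, 2, 3)), math.sqrt(14))
print(length((1 / math.sqrt(2), -1 / 2, 1 / 2)))   # 1, up to rounding
print(is_orthonormal([(1, -1, 0), (0, 0, 1)]))      # False: orthogonal, wrong length
r = 1 / math.sqrt(2)
print(is_orthonormal([(r, -r, 0), (0, 0, 1)]))      # True
```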

Why do this? Well, the same reasons we always do this. We can impose something like direction onto a problem. We might be able to break up a problem into simpler problems, one in each direction. We might at least be able to simplify the ways different directions are entangled. We might be able to write a problem’s solution as the sum of solutions to a standard set of representative simple problems. This one turns up all the time. And an orthogonal set of something is often a really good choice of a standard set of representative problems.
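Here’s the payoff in miniature. With an orthonormal set as the standard pieces, the amount of each piece a vector needs is just its inner product with that piece, with no system of equations to solve. This Python sketch assumes the ordinary dot product and one particular orthonormal set in three dimensions, partly built from the vectors above, and rebuilds (1, 2, 3) from it.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


r = 1 / math.sqrt(2)
basis = [(r, -r, 0.0), (0.0, 0.0, 1.0), (r, r, 0.0)]   # an orthonormal set
v = (1.0, 2.0, 3.0)

# How much of each basis vector v needs: just one inner product apiece.
coefficients = [dot(v, b) for b in basis]
rebuilt = tuple(sum(c * b[i] for c, b in zip(coefficients, basis)) for i in range(3))

print(coefficients)   # about [-0.7071, 3.0, 2.1213]
print(rebuilt)        # about (1.0, 2.0, 3.0)
```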

This sort of thing turns up a lot when solving differential equations. And those often turn up when we want to describe things that happen in the real world. So a good number of mathematicians develop a habit of looking for orthonormal sets.

A Leap Day 2016 Mathematics A To Z: Normal Subgroup


The Leap Day Mathematics A to Z term today is another abstract algebra term. This one again comes from Gaurish, chief author of the Gaurish4Math blog. Part of it is going to be easy. Part of it is going to need a running start.

Normal Subgroup.

The “subgroup” part of this is easy. Remember that a “group” means a collection of things and some operation that lets us combine them. We usually call that either addition or multiplication. We usually write it out like it’s multiplication. If a and b are things from the collection, we write “ab” to mean adding or multiplying them together. (If we had a ring, we’d have something like addition and something like multiplication, and we’d be able to do “a + b” or “ab” as needed.)

So with that in mind, the first thing you’d imagine a subgroup to be? That’s what it is. It’s a collection of things, all of which are in the original group, and that uses the same operation as the original group. For example, if the original group has a set that’s the whole numbers and the operation of addition, a subgroup would be the even numbers and the same old addition.

Now things will get much clearer if I have names. Let me use G to mean some group. This is a common generic name for a group. Let me use H as the name for a subgroup of G. This is a common generic name for a subgroup of G. You see how deeply we reach to find names for things. And we’ll still want names for elements inside groups. Those are almost always lowercase letters: a and b, for example. If we want to make clear it’s something from G’s set, we might use g. If we want to make clear it’s something from H’s set, we might use h.

I need to tax your imagination again. Suppose “g” is some element in G’s set. What would you imagine the symbol “gH” means? No, imagine something simpler.

Mathematicians call this “left-multiplying H by g”. What we mean is, take every single element h that’s in the set H, and find out what gh is. Then take all these products together. That’s the set “gH”. This might be a subgroup. It might not. No telling. Not without knowing what G is, what H is, what g is, and what the operation is. And we call it left-multiplying even if the operation is called addition or something else. It’s just easier to have a standard name even if the name doesn’t make perfect sense.

That we named something left-multiplying probably inspires a question. Is there right-multiplying? Yes, there is. We’d write that as “Hg”. And that means take every single element h that’s in the set H, and find out what hg is. Then take all these products together.

You see the subtle difference between left-multiplying and right-multiplying. In the one, you multiply everything in H on the left. In the other, you multiply everything in H on the right.

So. Take anything in G. Let me call that g. If it’s always, necessarily, true that the left-product, gH, is the same set as the right-product, Hg, then H is a normal subgroup of G.

The mistake mathematics majors make here is about what has to match: we need the set gH to be the same as the set Hg. That is, the whole collection of products has to be the same for left-multiplying as right-multiplying. Nobody cares whether, for any particular thing h inside H, gh is the same as hg. It doesn’t matter. It’s whether the whole collection of things is the same that counts. I assume every mathematics major makes this mistake. I did, anyway.

The natural thing to wonder here: how can the set gH ever not be the same as Hg? For that matter, how can a single product gh ever not be the same as hg? Do mathematicians just forget how multiplication works?

Technically speaking no, we don’t. We just want to be able to talk about operations where maybe the order does too matter. With ordinary regular-old-number addition and multiplication the order doesn’t matter. gh always equals hg. We say this “commutes”. And if the operation for a group commutes, then every subgroup is a normal subgroup.

But sometimes we’re interested in things that don’t commute. Or that we can’t assume commute. The example every algebra book uses for this is three-dimensional rotations. Set your algebra book down on a table. If you don’t have an algebra book you may use another one instead. I recommend Christopher Miller’s American Cornball: A Laffopedic Guide To The Formerly Funny. It’s a fine guide to all sorts of jokes that used to amuse and what was supposed to be amusing about them. If you don’t have a table then I don’t know what to suggest.

Spin the book clockwise on the table and then stand it up on the edge nearer you. Then try again. Put the book back where it started. Stand it up on the edge nearer you and then spin it clockwise on the table. The book faces a different way this time around. (If it doesn’t, you spun too much. Try again until you get the answer I said.)
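The book experiment can be done with rotation matrices instead of a book. This Python sketch makes some assumptions: the table is the x-y plane, the edge nearer you runs along the x-axis, “spin clockwise” is a quarter-turn about the vertical z-axis, and “stand it up” is a quarter-turn about the x-axis. It tracks which way the book’s cover faces.

```python
SPIN = [[0, 1, 0], [-1, 0, 0], [0, 0, 1]]    # quarter-turn clockwise, seen from above
STAND = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]   # quarter-turn about the near edge (x-axis)


def matmul(a, b):
    """3x3 product a*b, meaning: do b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


cover = (0, 0, 1)                           # the cover starts facing the ceiling
spin_then_stand = matmul(STAND, SPIN)
stand_then_spin = matmul(SPIN, STAND)
print(apply(spin_then_stand, cover))        # (0, -1, 0)
print(apply(stand_then_spin, cover))        # (-1, 0, 0)
print(spin_then_stand == stand_then_spin)   # False: the order matters
```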

Three-dimensional rotations like this form a group. The different ways you can turn something are the elements of its set. The operation between two rotations is just to do one and then the other, in order. But they don’t commute, not most of the time. So they can have a subgroup that isn’t normal.

You may believe me now that such things exist. Now you can move on to wondering why we should care.

Let me start by saying every group has at least two normal subgroups. (Every group with more than one element, anyway.) Whatever your group G is, there’s a subgroup that’s made up just of the identity element and the group’s operation. The identity element is the thing that acts like 1 does for multiplication. You can multiply stuff by it and you get the same thing you started with. The identity and the operation make a subgroup. And you’ll convince yourself that it’s a normal subgroup as soon as you write down g1 = 1g.

(Wait, you might ask! What if multiplying on the left has a different identity than multiplying on the right does? Great question. Very good insight. You’ve got a knack for asking good questions. If we have that then we’re working with a more exotic group-like mathematical object, so don’t worry.)

So the identity, ‘1’, makes a normal subgroup. Here’s another normal subgroup. The whole of G qualifies. (It’s OK if you feel uneasy. Think it over.)

So ‘1’ is a normal subgroup of G. G is a normal subgroup of G. They’re boring answers. We know them before we even know anything about G. But they qualify.

Does this sound familiar any? We have a thing. ‘1’ and the original thing subdivide it. It might be possible to subdivide it more, but maybe not.

Is this all … factoring?

Please here pretend I make a bunch of awkward faces while trying not to say either yes or no. But if H is a normal subgroup of G, then we can write something G/H, just like we might write 4/2, and that means something.

That G/H we call a quotient group. It’s a group, sure, though usually not a subgroup of G. As to what it is … well, let me go back to examples.

Let’s say that G is the set of whole numbers and the operation of ordinary old addition. And H is the set of whole numbers that are multiples of 4, again with addition. So the things in H are 0, 4, 8, 12, and so on. Also -4, -8, -12, and so on.

Suppose we pick something in G. And we use the group operation to combine it with each of the things in H. How many different sets can we get out of it? So for example we might pick the number 1 out of G. The set 1 + H is … well, list all the things that are in H, and add 1 to them. So that’s 1 + 0, 1 + 4, 1 + 8, 1 + 12, and 1 + -4, 1 + -8, 1 + -12, and so on. All told, it’s all the numbers that are one more than a whole multiple of 4.

Or we might pick the number 7 out of G. The set 7 + H is 7 + 0, 7 + 4, 7 + 8, 7 + 12, and so on. It’s also got 7 + -4, 7 + -8, 7 + -12, and all that. These are all the numbers that are three more than a whole multiple of 4.

We might pick the number 8 out of G. This happens to be in H, but so what? The set 8 + H is going to be 8 + 0, 8 + 4, 8 + 8 … you know, these are all going to be multiples of 4 again. So 8 + H is just H. Some of these are simple.

How about the number 3? 3 + H is 3 + 0, 3 + 4, 3 + 8, and so on. The thing is, the collection of numbers you get by 3 + H is the same as the collection of numbers you get by 7 + H. Both 3 and 7 do the same thing when we add them to H.

Fiddle around with this and you realize there are only four possible different sets you get out of this. You can get 0 + H, 1 + H, 2 + H, or 3 + H. Any other numbers in G give you a set that looks exactly like one of those. So we can speak of 0, 1, 2, and 3 as being a new group, the “quotient group” that you get by G/H. (This looks more like remainders to me, too, but that’s the terminology we have.)
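The fiddling around can be handed to a computer. This is a rough sketch, not a proof: since we can’t list infinitely many whole numbers, it only looks at the part of each set n + H that falls inside a finite window of numbers (the window size is an arbitrary choice). A number m belongs to n + H exactly when m - n is a multiple of 4, which keeps the comparison honest at the window’s edges.

```python
# Cosets n + H, where H is the multiples of 4 inside the whole numbers.
# We compare cosets on a finite window; WINDOW is an arbitrary choice.

WINDOW = range(-20, 21)

def coset(n, window=WINDOW):
    """Elements of the window lying in n + H, i.e. m with m - n in H."""
    return frozenset(m for m in window if (m - n) % 4 == 0)

# Try lots of choices of n and see how many different sets come out.
distinct = {coset(n) for n in range(-50, 51)}

print(len(distinct))          # 4
print(coset(3) == coset(7))   # True: 3 + H and 7 + H are the same set
print(coset(8) == coset(0))   # True: 8 + H is just H
```

Python’s % operator gives a non-negative remainder even for negative numbers, so -8 % 4 is 0, which is what makes the membership test work on both sides of zero.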

But we can do something like this with any group and any normal subgroup of that group. The normal subgroup gives us a way of picking out a representative set of the original group. That set shows off all the different ways we can manipulate the normal subgroup. It tells us things about the way the original group is put together.

Normal subgroups are not just “factors, but for groups”. They do give us a way to see groups as things built up of other groups. We can see structures in sets of things.

A Leap Day 2016 Mathematics A To Z: Matrix


I get to start this week with another request. Today’s Leap Day Mathematics A To Z term is a famous one, and one that I remember terrifying me in the earliest days of high school. The request comes from Gaurish, chief author of the Gaurish4Math blog.

Matrix.

Lewis Carroll didn’t like the matrix. Well, Charles Dodgson, anyway. And it isn’t that he disliked matrices particularly. He believed it was a bad use of a word. “Surely,” he wrote, “[ matrix ] means rather the mould, or form, into which algebraical quantities may be introduced, than an actual assemblage of such quantities”. He might have had etymology on his side. The word meant the place where something was developed, the source of something else. History has outvoted him, and his preferred “block”. The first mathematicians to use the word “matrix” were interested in things derived from the matrix. So for them, the matrix was the source of something else.

What we mean by a matrix is a collection of some number of rows and columns. Where each row meets each column sits some mathematical entity. We call this an element. Elements are almost always real numbers. When they’re not real numbers they’re complex-valued numbers. (I’m sure somebody, somewhere has created matrices with something else as elements. You’ll never see these freaks.)

Matrices work a lot like vectors do. We can add them together. We can multiply them by real- or complex-valued numbers, called scalars. But we can do other things with them. We can define multiplication, at least sometimes. The definition looks like a lot of work, but it represents something useful that way. And for square matrices, ones with equal numbers of rows and columns, we can find other useful stuff. We give that stuff wonderful names like traces and determinants and eigenvalues and eigenvectors and such.

One of the big uses of matrices is to represent a mapping. A matrix can describe how points in a domain map to points in a range. Properly, a matrix made up of real numbers can only describe what are called linear mappings. These are ones that turn the domain into the range by stretching or squeezing down or rotating the whole domain the same amount. A mapping might follow different rules in different regions, but that’s all right. We can write a matrix that approximates the original mapping, at least in some areas. We do this in the same way, and for pretty much the same reason, that we can approximate a real and complicated curve with a bunch of straight lines. Or the way we can approximate a complicated surface with a bunch of triangular plates.

We can compound mappings. That is, we can start with a domain and a mapping, and find the image of that domain. We can then use a second mapping and find the image of the image of that domain. The matrix that describes this mapping-of-a-mapping is the one you get by multiplying the matrix of the second mapping by the matrix of the first mapping, in that order. This is why we define matrix multiplication the odd way we do. Mappings are that useful, and matrices are that tied to them.
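Here’s a small sketch of that claim with two made-up linear mappings of the plane: one that doubles every x-coordinate and one that makes a quarter turn counterclockwise. Points are treated as column vectors, which is the convention under which the second mapping’s matrix goes on the left of the product.

```python
# Composing two linear maps of the plane equals multiplying their matrices.
# Matrices are lists of rows; points are (x, y) treated as column vectors.

def apply(m, p):
    """Apply the 2x2 matrix m to the point p."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)

def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

STRETCH = [[2, 0], [0, 1]]   # doubles the x-coordinate
ROTATE = [[0, -1], [1, 0]]   # quarter turn counterclockwise

p = (3, 1)
step_by_step = apply(ROTATE, apply(STRETCH, p))   # stretch first, then rotate
all_at_once = apply(matmul(ROTATE, STRETCH), p)   # one matrix for both steps

print(step_by_step, all_at_once)   # (-1, 6) (-1, 6)
```

Swapping the order, matmul(STRETCH, ROTATE), gives a different matrix, matching the rotating-book experiment from the normal subgroups essay.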

I wrote about some of the uses of matrices in a Set Tour essay. That was based on a use of matrices in physics. We can describe the changing of a physical system with a mapping. And we can understand equilibriums, states where a system doesn’t change, by looking at the matrix that approximates what the mapping does near but not exactly on the equilibrium.

But there are other uses of matrices. Many of them have nothing to do with mappings or physical systems or anything. For example, we have graph theory. A graph, here, means a bunch of points, “vertices”, connected by curves, “edges”. Many interesting properties of graphs depend on how many other vertices each vertex is connected to. And this is well-represented by a matrix. Index your vertices. Then create a matrix. If vertex number 1 connects to vertex number 2, put a ‘1’ in the first row, second column. If vertex number 1 connects to vertex number 3, put a ‘1’ in the first row, third column. If vertex number 2 isn’t connected to vertex number 3, put a ‘0’ in the second row, third column. And so on.

We don’t have to use ones and zeroes. A “network” is a kind of graph where there’s some cost associated with each edge. We can put that cost, that number, into the matrix. Studying the matrix of a graph or network can tell us things that aren’t obvious from looking at the drawing.
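A minimal sketch of building the matrix described above, for a small graph whose edge list is made up for illustration. The graph is assumed undirected, so a connection from vertex 1 to vertex 2 also goes from 2 to 1 and the matrix comes out symmetric. For a network you’d store each edge’s cost where this puts a 1.

```python
# Adjacency matrix of a small undirected graph, vertices numbered 1..4.
# The edge list is a made-up example.

edges = [(1, 2), (1, 3), (3, 4)]
n = 4

adj = [[0] * n for _ in range(n)]
for u, v in edges:
    adj[u - 1][v - 1] = 1   # row u, column v (shifted to 0-based indices)
    adj[v - 1][u - 1] = 1   # undirected, so the matrix is symmetric

for row in adj:
    print(row)

# Each row sum counts how many other vertices that vertex connects to.
degrees = [sum(row) for row in adj]
print(degrees)   # [2, 1, 2, 1]
```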

A Leap Day 2016 Mathematics A To Z: Lagrangian


It’s another of my handful of free choice days today. I’ll step outside the abstract algebra focus I’ve somehow gotten lately to look instead at mechanics.

Lagrangian.

So, you likely know Newton’s Laws of Motion. At least you know of them. We build physics out of them. So a lot of applied mathematics relies on them. There’s a law about bodies at rest staying at rest. There’s one about bodies in motion continuing in a straight line. There’s one about the force on a body changing its momentum. Something about F equalling m a. There’s something about equal and opposite forces. That’s all good enough, and that’s all correct. We don’t use them anyway.

I’m overstating for the sake of a good hook. They’re all correct. And if the problem’s simple enough there’s not much reason to go past this F and m a stuff. It’s just that once you start looking at complicated problems this gets to be an awkward tool. Sometimes a system is just hard to describe using forces and accelerations. Sometimes it’s impossible to say even where to start.

For example, imagine you have one of those pricey showpiece globes. The kind that’s a big ball that spins on an axis, and whose axis is on a ring that can tip forward or back. And it’s an expensive showpiece globe. That ring is itself in another ring that rotates clockwise and counterclockwise. Give the globe a good solid spin so it won’t slow down anytime soon. Then nudge the frame, so both the horizontal ring and the ring the axis is on wobble some. The whole shape is going to wobble and move in some way. We ought to be able to model that. How? Force and mass and acceleration barely seem to even exist.

The Lagrangian we get from Joseph-Louis Lagrange, who in the 18th century saw a brilliant new way to understand physics. It doesn’t describe how things move in response to forces, at least not directly. It describes how things move using energy. In particular, it uses potential energy and kinetic energy: the Lagrangian is usually the kinetic energy minus the potential energy.

This is brilliant on many counts. The biggest is in switching from forces to energy. Forces are vectors; they carry information about their size and their direction. Energy is a scalar; it’s just a number. A number is almost always easier to work with than a number alongside a direction.

The second big brilliance is that the Lagrangian gives us freedom in choosing coordinate systems. We have to know where things are and how they’re changing. The first obvious guess for how to describe things is their position in space. And that works fine until we look at stuff such as this spinning, wobbling globe. That never quite moves, although the spinning and the wobbling is some kind of motion. The problem begs us to think of the globe’s rotation around three different axes. Newton doesn’t help us with that. The Lagrangian, though —

The Lagrangian lets us describe physics using “generalized coordinates”. By this we mean coordinates that make sense for the problem even if they don’t directly relate to where something or other is in space. Any pick of coordinates is good, as long as we can describe the potential energy and the kinetic energy of the system using them.
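A standard textbook example, not one the essay works through, shows the idea in miniature. Take a simple pendulum of length ℓ and mass m, described by one generalized coordinate θ, its angle away from straight down, and measure potential energy from the height of the pivot. The Lagrangian is kinetic minus potential energy, and the Euler-Lagrange equation turns it into an equation of motion without our ever writing down a force vector:

```latex
\begin{aligned}
T &= \tfrac{1}{2} m \ell^2 \dot\theta^2, \qquad V = -m g \ell \cos\theta, \\
L &= T - V = \tfrac{1}{2} m \ell^2 \dot\theta^2 + m g \ell \cos\theta, \\
0 &= \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} - \frac{\partial L}{\partial \theta}
   = m \ell^2 \ddot\theta + m g \ell \sin\theta, \\
\ddot\theta &= -\frac{g}{\ell} \sin\theta .
\end{aligned}
```

The wobbling globe works the same way in spirit, just with three angles instead of one.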

I’ve been writing about this as if the Lagrangian were the cure for all hard work ever. It’s not, alas. For example, we often want to study big bunches of particles that all attract (or repel) each other. That attraction (or repulsion) we represent as potential energy. This is easier to deal with than forces, granted. But that’s easier, which is not the same as easy.

Still, the Lagrangian is great. We can do all the physics we used to. And we have a new freedom to set up problems in convenient ways. And the perspective of looking at energy instead of forces gives us a fruitful view on physics problems.

A Leap Day 2016 Mathematics A To Z: Kullback-Leibler Divergence


Today’s mathematics glossary term is another one requested by Jacob Kanev. Kanev, I learned last time, has got a blog, “Some Unconsidered Trifles”, for those interested in having more things to read. Kanev’s request this time was a term new to me. But learning things I didn’t expect to consider is part of the fun of this dance.

Kullback-Leibler Divergence.

The Kullback-Leibler Divergence comes to us from information theory. It’s also known as “information divergence” or “relative entropy”. Entropy is by now a familiar friend. We got to know it through, among other things, the “How interesting is a basketball tournament?” question. In this context, entropy is a measure of how surprising it would be to know which of several possible outcomes happens. A sure thing has an entropy of zero; there’s no potential surprise in it. If there are two equally likely outcomes, then the entropy is 1. If there are four equally likely outcomes, then the entropy is 2. If there are four possible outcomes, but one is very likely and the other three unlikely, the entropy might be low, say, 0.5 or so. It’s mostly but not perfectly predictable.
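The essay doesn’t spell out a formula for entropy, so this sketch assumes the standard Shannon entropy measured in bits: add up, over every outcome, its probability times the base-two logarithm of one over that probability. The lopsided four-outcome distribution is a made-up example.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy_bits([1.0]))                 # 0.0, a sure thing
print(entropy_bits([0.5, 0.5]))            # 1.0, two equally likely outcomes
print(entropy_bits([0.25] * 4))            # 2.0, four equally likely outcomes
print(round(entropy_bits([0.94, 0.02, 0.02, 0.02]), 2))  # about 0.42
```

Writing it as p times log2(1/p), rather than minus p times log2(p), keeps the sure thing from printing as -0.0.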

Suppose we have a set of possible outcomes for something. (Pick anything you like. It could be the outcomes of a basketball tournament. It could be how much a favored stock rises or falls over the day. It could be how long your ride into work takes. As long as there are different possible outcomes, we have something workable.) If we have a probability, a measure of how likely each of the different outcomes is, then we have a probability distribution. More likely things have probabilities closer to 1. Less likely things have probabilities closer to 0. No probability is less than zero or more than 1. All the probabilities added together sum up to 1. (These are the rules which make something a probability distribution, not just a bunch of numbers we had in the junk drawer.)

The Kullback-Leibler Divergence describes how similar two probability distributions are to one another. Let me call one of these probability distributions p. I’ll call the other one q. We have some number of possible outcomes, and we’ll use k as an index for them. p_k is how likely, in distribution p, that outcome number k is. q_k is how likely, in distribution q, that outcome number k is.

To calculate this divergence, we work out, for each k, the number p_k times the logarithm of the ratio p_k / q_k. Here the logarithm is base two. Calculate all this for every one of the possible outcomes, and add it together. This will be some number that’s at least zero, but it might be larger.

The closer that distribution p and distribution q are to each other, the smaller this number is. If they’re exactly the same, this number will be zero. The less that distribution p and distribution q are like each other, the bigger this number is.
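Here’s a minimal sketch of that recipe. Two conventions are assumed that the essay doesn’t mention: an outcome with p_k equal to zero contributes nothing, and if some q_k is zero where p_k isn’t, the divergence is taken to be infinite. The “observed” frequencies and the two candidate models are made-up numbers, with p standing for data and q for a model, as in the use described next.

```python
import math

def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence of q from p, in bits.

    Sums p_k * log2(p_k / q_k) over outcomes k. Terms with p_k == 0 add
    nothing; a q_k of 0 where p_k is positive makes the result infinite.
    """
    total = 0.0
    for pk, qk in zip(p, q):
        if pk == 0:
            continue
        if qk == 0:
            return math.inf
        total += pk * math.log2(pk / qk)
    return total

observed = [0.5, 0.3, 0.2]    # p: made-up observed frequencies
model_a = [0.45, 0.35, 0.2]   # q: a made-up model close to the data
model_b = [0.2, 0.3, 0.5]     # q: a made-up model far from the data

print(kl_divergence_bits(observed, observed))   # 0.0
print(kl_divergence_bits(observed, model_a) < kl_divergence_bits(observed, model_b))  # True
```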

And that’s all good fun, but, why bother with it? And at least one answer I can give is that it lets us measure how good a model of something is.

Suppose we think we have an explanation for how something varies. We can say how likely it is we think there’ll be each of the possible different outcomes. This gives us a probability distribution which let’s call q. We can compare that to actual data. Watch whatever it is for a while, and measure how often each of the different possible outcomes actually does happen. This gives us a probability distribution which let’s call p.

If our model is a good one, then the Kullback-Leibler Divergence between p and q will be small. If our model’s a lousy one, then this divergence will be large. If we have a couple different models, we can see which ones make for smaller divergences and which ones make for larger divergences. Probably we’ll want smaller divergences.

Here you might ask: why do we need a model? Isn’t the actual data the best model we might have? It’s a fair question. But no, real data is kind of lousy. It’s all messy. It’s complicated. We get extraneous little bits of nonsense clogging it up. And the next batch of results is going to be different from the old ones anyway, because real data always varies.

Furthermore, one of the purposes of a model is to be simpler than reality. A model should do away with complications so that it is easier to analyze, easier to make predictions with, and easier to teach than the reality is. But a model mustn’t be so simple that it can’t represent important aspects of the thing we want to study.

The Kullback-Leibler Divergence is a tool that we can use to quantify how much better one model or another fits our data. It also lets us quantify how much of the grit of reality we lose in our model. And this is at least some of the use of this quantity.

A Leap Day 2016 Mathematics A To Z: Jacobian


I don’t believe I got any requests for a mathematics term starting ‘J’. I’m as surprised as you. Well, maybe less surprised. I’ve looked at the alphabetical index for Wolfram MathWorld and noticed its relative poverty for ‘J’. It’s not as bad as ‘X’ or ‘Y’, though. But it gives me room to pick a word of my own.

Jacobian.

The Jacobian is named for Carl Gustav Jacob Jacobi, who lived in the first half of the 19th century. He’s renowned for work in mechanics, the study of mathematically modeling physics. He’s also renowned for matrices, rectangular grids of numbers which represent problems. There’s more, of course, but those are the points that bring me to the Jacobian I mean to talk about. There are other things named for Jacobi, including other things named “Jacobian”. But I mean to limit the focus to two, related, things.

I discussed mappings some while describing homomorphisms and isomorphisms. A mapping’s a relationship matching things in one set, a domain, to things in a set, the range. The domain and the range can be anything at all. They can even be the same thing, if you like.

A very common domain is … space. Like, the thing you move around in. It’s a region full of points that are all some distance and some direction from one another. There are almost always assumed to be multiple possible directions. We often call this “Euclidean space”. It’s the space that works like we expect for normal geometry. We might start with a two- or three-dimensional space. But it’s often convenient, especially for physics problems, to work with more dimensions. Four dimensions. Six dimensions. Incredibly huge numbers of dimensions. Honest, this often helps. It’s just harder to sketch out.

So we might for a problem need, say, 12-dimensional space. We can describe a point in that with an ordered set of twelve coordinates. Each describes how far you are from some standard reference point known as The Origin. If it doesn’t matter how many dimensions we’re working with, we call it an N-dimensional space. Or we use another letter if N is committed to something or other.

This is our stage. We are going to be interested in some N-dimensional Euclidean space. Let’s pretend N is 2; then our stage looks like the screen you’re reading now. We don’t need to pretend N is larger yet.

Our player is a mapping. It matches things in our N-dimensional space back to the same N-dimensional space. For example, maybe we have a mapping that takes the point with coordinates (3, 1) to the point (-3, -1). And it takes the point with coordinates (5.5, -2) to the point (-5.5, 2). And it takes the point with coordinates (-6, -π) to the point (6, π). You get the pattern. If we start from the point with coordinates (x, y) for some real numbers x and y, then the mapping gives us the point with coordinates (-x, -y).

One more step and then the play begins. Let’s not just think about a single point. Think about a whole region. If we look at the mapping of every point in that whole region, we get out … probably, some new region. We call this the “image” of the original region. With the mapping from the paragraph above, it’s easy to say what the image of a region is. It’ll look like the reflection in a corner mirror of the original region.

What if the mapping’s more complicated? What if we had a mapping that described how something was reflected in a cylindrical mirror? Or a mapping that describes how the points would move if they represent points of water flowing around a drain? — And that last explains why Jacobians appear in mathematical physics.

Many physics problems can be understood as describing how points that describe the system move in time. The dynamics of a system can be understood by how moving in time changes a region of starting conditions. A system might keep a region pretty much unchanged. Maybe it makes the region move, but it doesn’t change size or shape much. Or a system might change the region impressively. It might keep the area about the same, but stretch it out and fold it back, the way one might knead cookie dough.

The Jacobian, the one I’m interested in here, is a way of measuring these changes. The Jacobian matrix describes, for each point in the original domain, how a tiny change in one coordinate causes a change in the mapping’s coordinates. So if we have a mapping from an N-dimensional space to an N-dimensional space, there are going to be N times N values at work. Each one represents a different piece. How much does a tiny change in the first coordinate of the original point change the first coordinate of the mapping of the point? How much does a tiny change in the first coordinate of the original point change the second coordinate of the mapping of the point? How much does a tiny change in the first coordinate of the original point change the third coordinate of the mapping of the point? … how much does a tiny change in the second coordinate of the original point change the first coordinate of the mapping of the point? And on and on and now you know why mathematics majors are trained on Jacobians with two-by-two and three-by-three matrices. We do maybe a couple four-by-four matrices to remind us that we are born to suffer. We never actually work out bigger matrices. Life is just too short.
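Those “how much does a tiny change …” questions can be asked numerically, by nudging each coordinate a little and watching what the mapping does. This is a rough sketch for the N = 2 case using central finite differences; the step size h is an arbitrary choice, and the swirl mapping is made up only to show that the answers can differ from point to point.

```python
# Estimate the Jacobian matrix of a map from the plane to the plane
# with central finite differences. The step size h is an arbitrary choice.

def jacobian(f, x, y, h=1e-6):
    """Return [[dF1/dx, dF1/dy], [dF2/dx, dF2/dy]] at the point (x, y)."""
    fx_plus, fx_minus = f(x + h, y), f(x - h, y)
    fy_plus, fy_minus = f(x, y + h), f(x, y - h)
    return [[(fx_plus[i] - fx_minus[i]) / (2 * h),
             (fy_plus[i] - fy_minus[i]) / (2 * h)] for i in range(2)]

def flip_through_origin(x, y):
    """The essay's example mapping: (x, y) goes to (-x, -y)."""
    return (-x, -y)

def swirl(x, y):
    """A made-up nonlinear mapping, so the entries depend on the point."""
    return (x * x - y, x * y)

print(jacobian(flip_through_origin, 3.0, 1.0))  # about [[-1, 0], [0, -1]]
print(jacobian(swirl, 3.0, 1.0))                # about [[6, -1], [1, 3]]
```

Row i holds the changes in the mapping’s i-th coordinate; column j holds the changes caused by nudging the original point’s j-th coordinate.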

(I’ve been talking, by the way, about the mapping of an N-dimensional space to an N-dimensional space. This is because we’re about to get to something that requires it. But we can write a matrix like this for a mapping of an N-dimensional space to an M-dimensional space, a different-sized space. It has uses. Let’s not worry about that.)

If you have a square matrix, one that has as many rows as columns, then you can calculate something named the determinant. This involves a lot of work. It takes even more work the bigger the matrix is. This is why mathematics majors learn to calculate determinants on two-by-two and three-by-three matrices. We do a couple four-by-four matrices and maybe one five-by-five to again remind us about suffering.

Anyway, by calculating the determinant of a Jacobian matrix, we get the Jacobian determinant. Finally we have something simple. The Jacobian determinant says how the area of a region changes in the mapping. Suppose the Jacobian determinant at a point is 2. Then a small region containing that point has an image with twice the original area. Suppose the Jacobian determinant is 0.8. Then a small region containing that point has an image with area 0.8 times the original area. Suppose the Jacobian determinant is -1. Then —

Well, what would you imagine?

If the Jacobian determinant is -1, then a small region around that point gets mapped to something with the same area. What changes is called the handedness. The mapping doesn’t just stretch or squash the region, but it also flips it along at least one dimension. The Jacobian determinant can tell us that.
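To see the area and handedness story in numbers, here’s a small sketch using three made-up linear mappings. For a linear mapping the Jacobian matrix at every point is just the mapping’s own matrix, so its determinant, ad - bc for a two-by-two matrix, should equal the ratio of a triangle’s signed area after the mapping to its signed area before. Signed area comes out positive when the corners run counterclockwise and negative when the mapping has flipped them.

```python
# The 2x2 determinant ad - bc as an area scale factor, with its sign
# recording whether the mapping flips handedness. The maps are made up.

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def apply(m, p):
    return (m[0][0] * p[0] + m[0][1] * p[1], m[1][0] * p[0] + m[1][1] * p[1])

def signed_area(p0, p1, p2):
    """Signed triangle area; positive when the corners run counterclockwise."""
    return ((p1[0] - p0[0]) * (p2[1] - p0[1])
            - (p2[0] - p0[0]) * (p1[1] - p0[1])) / 2

triangle = [(0, 0), (1, 0), (0, 1)]   # area 1/2, counterclockwise

maps = {
    "stretch x by 2": [[2, 0], [0, 1]],
    "squeeze y to 0.8": [[1, 0], [0, 0.8]],
    "mirror across the x-axis": [[1, 0], [0, -1]],
}

results = {}
for name, m in maps.items():
    image = [apply(m, p) for p in triangle]
    ratio = signed_area(*image) / signed_area(*triangle)
    results[name] = (det2(m), ratio)
    print(f"{name}: determinant {det2(m)}, signed area ratio {ratio}")
```

The mirror keeps the area but turns the counterclockwise triangle clockwise, and both the determinant and the area ratio come out as -1.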

So the Jacobian matrix, and the Jacobian determinant, are ways to describe how mappings change areas. Mathematicians will often call either of them just “the Jacobian”. We trust context to make clear what we mean. Either one is a way of describing how mappings change space: how they expand or contract, how they rotate, how they reflect spaces. Some fields of mathematics, including a surprising amount of the study of physics, are about studying how space changes.

A Leap Day 2016 Mathematics A To Z: Isomorphism


Gillian B made the request that’s today’s A To Z word. I’d said it would be challenging. Many have been, so far. But I set up some of the work with “homomorphism” last time. As with “homomorphism” it’s a word that appears in several fields and about different kinds of mathematical structure. As with homomorphism, I’ll try describing what it is for groups. They seem least challenging to the imagination.

Isomorphism.

An isomorphism is a kind of homomorphism. And a homomorphism is a kind of thing we do with groups. A group is a mathematical construct made up of two things. One is a set of things. The other is an operation, like addition, where we take two of the things and get one of the things in the set. I think that’s as far as we need to go in this chain of defining things.

A homomorphism is a mapping, or if you like the word better, a function. The homomorphism matches everything in one group to things in a group. It might be the same group; it might be a different group. What makes it a homomorphism is that it preserves addition.

I gave an example last time, with groups I called G and H. G had as its set the whole numbers 0 through 3 and as operation addition modulo 4. H had as its set the whole numbers 0 through 7 and as operation addition modulo 8. And I defined a homomorphism φ which took a number in G and matched it to the number in H which was twice that. Then for any a and b which were in G’s set, φ(a + b) was equal to φ(a) + φ(b).
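Since G has only four elements, we can check every pair rather than trust the claim. This is a minimal sketch; the helper name preserves_addition is made up for this illustration. It tests the doubling map φ and also the lazy map φ1 that sends everything to 0, and both pass.

```python
# Check whether a map from G (addition mod 4) to H (addition mod 8)
# preserves addition, by testing every pair of elements of G.

G = range(4)

def add_g(a, b):
    return (a + b) % 4

def add_h(a, b):
    return (a + b) % 8

def preserves_addition(f):
    """True if f(a + b) == f(a) + f(b) for all a, b in G."""
    return all(f(add_g(a, b)) == add_h(f(a), f(b)) for a in G for b in G)

print(preserves_addition(lambda x: 2 * x))  # True: the doubling map phi
print(preserves_addition(lambda x: 0))      # True: phi1, everything to 0
```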

We can have all kinds of homomorphisms. For example, imagine my new φ1. It takes whatever you start with in G and maps it to the 0 inside H. φ1(1) = 0, φ1(2) = 0, φ1(3) = 0, φ1(0) = 0. It’s a legitimate homomorphism. Seems like it’s wasting a lot of what’s in H, though.

An isomorphism doesn’t waste anything that’s in H. It’s a homomorphism in which everything in G’s set matches to exactly one thing in H’s, and vice-versa. That is, it’s both a homomorphism and a bijection, to use one of the terms from the Summer 2015 A To Z. The key to remembering this is the “iso” prefix. It comes from the Greek “isos”, meaning “equal”. You can often understand an isomorphism from group G to group H as showing how they’re the same thing. They might be represented differently, but they’re equivalent in the lights you use.

I can’t make an isomorphism between the G and the H I started with. Their sets are different sizes. There’s no matching everything in H’s set to everything in G’s set without some duplication. But we can make other examples.

For instance, let me start with a new group G. It’s got as its set the positive real numbers. And it has as its operation ordinary multiplication, the kind you always do. And I want a new group H. It’s got as its set all the real numbers, positive and negative. It has as its operation ordinary addition, the kind you always do.

For an isomorphism φ, take the number x that’s in G’s set. Match it to the number that’s the logarithm of x, found in H’s set. This is a one-to-one pairing: if the logarithm of x equals the logarithm of y, then x has to equal y. And it covers everything: every real number, positive, negative, or zero, is the logarithm of some positive real number.

And this is a homomorphism. Take any x and y that are in G’s set. Their “addition”, the group operation, is to multiply them together. So “x + y”, in G, gives us the number xy. (I know, I know. But trust me.) φ(x + y) is equal to log(xy), which equals log(x) + log(y), which is the same number as φ(x) + φ(y). There’s a way to see the positive real numbers being multiplied together as equivalent to all the real numbers being added together.
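A random spot-check can’t prove anything about all the real numbers, but it’s a nice sanity check on both halves of the argument. This sketch uses natural logarithms (any base works the same way), picks its sample ranges arbitrarily, and compares with a tolerance because floating-point arithmetic rounds. The function name spot_check is made up for this illustration.

```python
import math
import random

def spot_check(samples, seed=0):
    """Randomly test the two properties argued for phi = log."""
    rng = random.Random(seed)
    for _ in range(samples):
        # Homomorphism: multiplying in G matches adding in H.
        x = rng.uniform(1e-3, 1e3)
        y = rng.uniform(1e-3, 1e3)
        if not math.isclose(math.log(x * y), math.log(x) + math.log(y),
                            abs_tol=1e-9):
            return False
        # Covers everything: any real r is the log of the positive number exp(r).
        r = rng.uniform(-20.0, 20.0)
        if not (math.exp(r) > 0
                and math.isclose(math.log(math.exp(r)), r, abs_tol=1e-12)):
            return False
    return True

print(spot_check(1000))  # True
```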

You might figure that the positive real numbers and all the real numbers aren’t very different-looking things. Perhaps so. Here’s another example I like, drawn from Wikipedia’s entry on Isomorphism. It has as sets things that don’t seem to have anything to do with one another.

Let me have another brand-new group G. It has as its set the whole numbers 0, 1, 2, 3, 4, and 5. Its operation is addition modulo 6. So 2 + 2 is 4, while 2 + 3 is 5, and 2 + 4 is 0, and 2 + 5 is 1, and so on. You get the pattern, I hope.

The brand-new group H, now, that has a more complicated-looking set. Its set is ordered pairs of whole numbers, which I’ll represent as (a, b). Here ‘a’ may be either 0 or 1. ‘b’ may be 0, 1, or 2. To describe its addition rule, let me say we have the elements (a, b) and (c, d). Find their sum first by adding together a and c, modulo 2. So 0 + 0 is 0, 1 + 0 is 1, 0 + 1 is 1, and 1 + 1 is 0. That result is the first number in the pair. The second number we find by adding together b and d, modulo 3. So 1 + 0 is 1, and 1 + 1 is 2, and 1 + 2 is 0, and so on.

So, for example, (0, 1) plus (1, 1) will be (1, 2). But (0, 1) plus (1, 2) will be (1, 0). (1, 2) plus (1, 0) will be (0, 2). (1, 2) plus (1, 2) will be (0, 1). And so on.

The isomorphism matches up things in G to things in H this way:

x, in G    φ(x), in H
0          (0, 0)
1          (1, 1)
2          (0, 2)
3          (1, 0)
4          (0, 1)
5          (1, 2)

I recommend playing with this a while. Pick any pair of numbers x and y that you like from G. And check their matching ordered pairs φ(x) and φ(y) in H. φ(x + y) is the same thing as φ(x) + φ(y) even though the things in G’s set don’t look anything like the things in H’s.
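If you’d rather have the computer do the playing, here’s a sketch that checks all 36 pairs at once. It leans on a pattern in the table that the essay doesn’t state but that holds for every row: φ(n) is the pair (n mod 2, n mod 3). The sketch checks both halves of being an isomorphism, that the pairing is one-to-one and covers all of H, and that it preserves addition.

```python
# Check that phi(n) = (n mod 2, n mod 3) is an isomorphism from
# G = {0..5} with addition mod 6 to H = pairs (a, b) with addition
# mod 2 in the first slot and mod 3 in the second.

G = range(6)
H = [(a, b) for a in range(2) for b in range(3)]

def add_g(x, y):
    return (x + y) % 6

def add_h(p, q):
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 3)

def phi(n):
    return (n % 2, n % 3)

table = {n: phi(n) for n in G}
print(table)  # should agree with the table above, row by row

is_bijection = sorted(table.values()) == sorted(H)
is_homomorphism = all(phi(add_g(x, y)) == add_h(phi(x), phi(y))
                      for x in G for y in G)
print(is_bijection, is_homomorphism)  # True True
```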

Isomorphisms exist for other structures. The idea extends the way homomorphisms do. A ring, for example, has two operations which we think of as addition and multiplication. An isomorphism matches two rings in ways that preserve the addition and multiplication, and which match everything in the first ring’s set to everything in the second ring’s set, one-to-one. The idea of the isomorphism is that two different things can be paired up so that they look, and work, remarkably like one another.

One of the common uses of isomorphisms is describing the evolution of systems. We often like to look at how some physical system develops from different starting conditions. If you make a little variation in how things start, does this produce a small change in how it develops, or does it produce a big change? How big? And the description of how time changes the system is, often, an isomorphism.

Isomorphisms also appear when we study the structures of groups. They turn up naturally when we look at things called “normal subgroups”. The name alone gives you a good idea what a “subgroup” is. “Normal”, well, that’ll be another essay.

A Leap Day 2016 Mathematics A To Z: Homomorphism


I’m not sure how, but many of my Mathematics A To Z essays seem to circle around algebra. I mean abstract algebra, not the kind that involves petty concerns like ‘x’ and ‘y’. In abstract algebra we worry about letters like ‘g’ and ‘h’. For special purposes we might even have ‘e’. Maybe it’s that the subject has a lot of familiar-looking words. For today’s term, I’m doing an algebra term, and one that wasn’t requested. But it’ll make my life a little easier when I get to a word that was requested.

Homomorphism.

Also, I lied when I said this was an abstract algebra word. At least I was imprecise. The word appears in a fairly wide swath of mathematics. But abstract algebra is where most mathematics majors first encounter it. And the other uses hearken back to this. If you understand what an algebraist means by “homomorphism” then you understand the essence of what someone else means by it.

One of the things mathematicians study a lot is mapping. This is matching the things in one set to things in another set. Most often we want this to be done by some easy-to-understand rule. Why? Well, we often want to understand how one group of things relates to another group. So we set up maps between them. These describe how to match the things in one set to the things in another set. You may think this sounds like it’s just a function. You’re right. I suppose the name “mapping” carries connotations of transforming things into other things that a “function” might not have. And “functions”, I think, suggest we’re working with numbers. “Mappings” sound more abstract, at least to my ear. But it’s just a difference in dialect, not substance.

A homomorphism is a mapping that obeys a couple of rules. What they are depends on the kind of things the homomorphism maps between. I want a simple example, so I’m going to use groups.

A group is made up of two things. One is a set, a collection of elements. For example, take the whole numbers 0, 1, 2, and 3. That’s a good enough set. The second thing in the group is an operation, something to work like addition. For example, we might use “addition modulo 4”. In this scheme, addition (and subtraction) work like they do with ordinary whole numbers. But if the result would be more than 3, we subtract 4 from the result, until we get something that’s 0, 1, 2, or 3. Similarly if the result would be less than 0, we add 4, until we get something that’s 0, 1, 2, or 3. The result is an addition table that looks like this:

+ 0 1 2 3
0 0 1 2 3
1 1 2 3 0
2 2 3 0 1
3 3 0 1 2

So let me call G the group that has as its elements 0, 1, 2, and 3, and that has addition be this modulo-4 addition.

Now I want another group. I’m going to name it H, because the alternative is calling it G2 and subscripts are tedious to put on web pages. H will have a set with the elements 0, 1, 2, 3, 4, 5, 6, and 7. Its addition will be modulo-8 addition, which works the way you might have guessed after looking at the above. But here’s the addition table:

+ 0 1 2 3 4 5 6 7
0 0 1 2 3 4 5 6 7
1 1 2 3 4 5 6 7 0
2 2 3 4 5 6 7 0 1
3 3 4 5 6 7 0 1 2
4 4 5 6 7 0 1 2 3
5 5 6 7 0 1 2 3 4
6 6 7 0 1 2 3 4 5
7 7 0 1 2 3 4 5 6
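Tables like these are tedious to type and easy to get wrong, so here is a minimal Python sketch that generates them. The function name `addition_table` is just an illustrative label. The layout copies the tables above.

```python
def addition_table(n):
    """Addition table for the whole numbers 0 to n - 1 under addition modulo n."""
    header = "+ " + " ".join(str(j) for j in range(n))
    rows = [str(i) + " " + " ".join(str((i + j) % n) for j in range(n)) for i in range(n)]
    return "\n".join([header] + rows)

print(addition_table(4))  # the table for G
print()
print(addition_table(8))  # the table for H
```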

G and H look a fair bit like each other. Their sets are made up of familiar numbers, anyway. And the addition rules look a lot like what we’re used to.

We can imagine mapping from one to the other pretty easily. At least it’s easy to imagine mapping from G to H. Just match a number in G’s set — say, ‘1’ — to a number in H’s set — say, ‘2’. Easy enough. We’ll do something just as daring in matching ‘0’ to ‘1’, and we’ll map ‘2’ to ‘3’. And ‘3’? Let’s match that to ‘4’. Let me call that mapping f.

But f is not a homomorphism. What makes a homomorphism an interesting map is that the group’s original addition rule carries through. This is easier to show than to explain.

In the original group G, what’s 1 + 2? … 3. That’s easy to work out. But in H, what’s f(1) + f(2)? f(1) is 2, and f(2) is 3. So f(1) + f(2) is 5. But what is f(3)? We set that to be 4. So in this mapping, f(1) + f(2) is not equal to f(3). And so f is not a homomorphism.

Could anything be? After all, G and H have different sets, sets that aren’t even the same size. And they have different addition rules, even if the addition rules look like they should be related. Why should we expect it’s possible to match the things in group G to the things in group H?

Let me show you how they could be. I’m going to define a mapping φ. The letter’s often used for homomorphisms. φ matches things in G’s set to things in H’s set. φ(0) I choose to be 0. φ(1) I choose to be 2. φ(2) I choose to be 4. φ(3) I choose to be 6.

And now look at this … φ(1) + φ(2) is equal to 2 + 4, which is 6 … which is φ(3). Was I lucky? Try some more. φ(2) + φ(2) is 4 + 4, which in the group H is 0. In the group G, 2 + 2 is 0, and φ(0) is … 0. We’re all right so far.

One more. φ(3) + φ(3) is 6 + 6, which in group H is 4. In group G, 3 + 3 is 2. φ(2) is 4.

If you want to test the other thirteen possibilities, go ahead. If you want to argue there are actually only seven other possibilities, do that, too. What makes φ a homomorphism is that if x and y are things from the set of G, then φ(x) + φ(y) equals φ(x + y). φ(x) + φ(y) uses the addition rule for group H. φ(x + y) uses the addition rule for group G. Some mappings keep the addition of things from breaking. We call this “preserving” addition.
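Checking all sixteen ordered pairs is the sort of thing computers are for. Here is a minimal Python sketch. It assumes we store each mapping as a dictionary from G's elements to H's. The helper `is_homomorphism` is a hypothetical name, not a library function.

```python
from itertools import product

def is_homomorphism(mapping, n_from, n_to):
    """True if mapping((x + y) mod n_from) == (mapping(x) + mapping(y)) mod n_to for every pair."""
    return all(
        mapping[(x + y) % n_from] == (mapping[x] + mapping[y]) % n_to
        for x, y in product(range(n_from), repeat=2)
    )

f = {0: 1, 1: 2, 2: 3, 3: 4}    # the mapping f from the text
phi = {0: 0, 1: 2, 2: 4, 3: 6}  # the mapping phi from the text

print(is_homomorphism(f, 4, 8))    # False
print(is_homomorphism(phi, 4, 8))  # True
```

The check uses G's addition inside `mapping[...]` and H's addition outside it. That is the same split between the two addition rules described above.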

This particular example is called a group homomorphism. That’s because it’s a homomorphism that starts with one group and ends with a group. There are other kinds of homomorphism. For example, a ring homomorphism is a homomorphism that maps a ring to a ring. A ring is like a group, but it has two operations. One works like addition and the other works like multiplication. A ring homomorphism preserves both the addition and the multiplication simultaneously.

And there are homomorphisms for other structures. What makes them homomorphisms is that they preserve whatever the important operations on the structures are. That’s typically what you might expect when you are introduced to a homomorphism, whatever the field.

A Leap Day 2016 Mathematics A To Z: Grammar


My next entry for this A To Z was another request, this one from Jacob Kanev, who doesn’t seem to have a WordPress or other blog. (If I’m mistaken, please, let me know.) Kanev’s given me several requests, some of them quite challenging. Some too challenging: I have to step back from describing “both context sensitive and not” kinds of grammar just now. I hope all will forgive me if I just introduce the base idea.

Grammar.

One of the ideals humans hold when writing a mathematical proof is to crush all humanity from the proof. It’s nothing personal. It reflects a desire to be certain we have proved things without letting any unstated assumptions or unnoticed biases interfere. The 19th century was a lousy century for mathematicians and their intuitions. Many ideas that seemed clear enough turned out to be paradoxical. It’s natural to want to not make those mistakes again. We can succeed.

We can do this by stripping out everything but the essentials. We can even do away with words. After all, if I say something is a “square”, that suggests I mean what we mean by “square” in English. Our mathematics might not have proved all the square-ness of the thing. And so we reduce the universe to symbols. Letters will do as symbols, if we want to be kind to our typesetters. We do want to be kind now that, thanks to LaTeX, we do our own typesetting.

This is called building a “formal language”. The “formal” here means “relating to the form” rather than “the way you address people when you can’t just say ‘heya, gang’.” A formal language has two important components. One is the symbols that can be operated on. The other is the operations you can do on the symbols.

If we’ve set it all up correctly then we get something wonderful. We have “statements”. They’re strings of the various symbols. Some of the statements are axioms; they’re assumed to be true without proof. We can turn a statement into another one by using a statement we have and one of the operations. If the operation requires, we can add in something else we already know to be true. Something we’ve already proven.

Any statement we build this way — starting from an axiom and building with the valid operations — is a new and true statement. It’s a theorem. The proof of the theorem? It’s the full sequence of symbols and operations that we’ve built. The line between advanced mathematics and magic is blurred. To give a theorem its full name is to give its proof. (And now you understand why the biographies of many of the pioneering logicians of the late 19th and early 20th centuries include a period of fascination with the Kabbalah and other forms of occult or gnostic mysticism.)

A grammar is what’s required to describe a language like this. It’s defined to be a quartet of properties. The first property is the collection of symbols that can’t be the end of a statement. These are called nonterminal symbols. The second property is the collection of symbols that can end a statement. These are called terminal symbols. (You see why we want to have those as separate lists.) The third property is the collection of rules that let you build new statements from old. The fourth property is the collection of things we take to be true to start. We only have finitely many options for each of these, at least for your typical grammar. I imagine someone has experimented with infinite grammars. But that hasn’t grown into a research field big enough that people have to pay attention to it. Not yet, anyway.

Now it’s reasonable to ask if we need mathematicians at all. If building up theorems is just a matter of applying the finitely many rules of inference on finitely many collections of symbols, finitely many times over, then what about this can’t be done by computer? And done better by a computer, since a computer doesn’t need coffee, or bathroom breaks an hour later, or the hope of moving to a tenure-track position?

Well, we do need mathematicians. I don’t say that just because I hope someone will give me money in exchange for doing mathematics. It’s because setting up a computer to just grind out every possible theorem will never turn up what you want to know now. There are several reasons for this.

Here’s a way to see why. It’s drawn from Douglas Hofstadter’s Gödel, Escher, Bach, a copy of which you can find in any college dorm room or student organization office. At least you could back when I was an undergraduate. I don’t know what the kids today use.

Anyway, this scheme has three nonterminal symbols: I, M, and U. As a terminal symbol … oh, let’s just use the space at the end of a string. That way everything looks like words. We will include a couple variables, lowercase letters like x and y and z. They stand for any string of nonterminal symbols. They’re falsework. They help us get work done, but must not appear in our final result.

There’s four rules of inference. The first: if xI is valid, then so is xIM. The second: if Mx is valid, then so is Mxx. The third: if MxIIIy is valid, then so is MxUy. The fourth: if MxUUy is valid, then so is Mxy.

We have one axiom, assumed without proof to be true: MI.

So let’s putter around some. MI is true. So by the second rule, so is MII. That’s a theorem. And since MII is true, by the second rule again, so is MIIII. That’s another theorem. Since MIIII is true, by the first rule, so is MIIIIM. We’ve got another theorem already. Since MIIIIM is true, by the third rule, so is MIUM. We’ve got another theorem. For that matter, since MIIIIM is true, again by the third rule, so is MUIM. Would you like MIUMIUM? That’s waiting there to be proved too.

And that will do. First question: what does any of this even mean? Nobody cares about whether MIUMIUM is a theorem in this system. Nobody cares about figuring out whether MUIUMUIUI might be a theorem. We care about questions like “what’s the smallest odd perfect number?” or “how many equally-strong vortices can be placed in a ring without the system becoming unstable?” With everything reduced to symbol-shuffling like this we’re safe from accidentally assuming something which isn’t justified. But we’re pretty far from understanding what these theorems even mean.

In this case, these strings don’t mean anything. They’re a toy so we can get comfortable with the idea of building theorems this way. We don’t expect them to do any more work than we expect Lincoln Logs to build usable housing. But you can see how we’re starting pretty far from most interesting mathematics questions.

Still, if we started from a system that meant something, we would get there in time, right? … Surely? …

Well, maybe. The thing is, even with this I, M, U scheme and its four rules there are a lot of things to try out. From the axiom, MI, we can produce either MII or MIM. From MII we can produce MIIM or MIIII. From MIIII we could produce MIIIIM, or MUI, or MIU, or MIIIIIIII. From each of those we can produce … quite a bit of stuff.
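The rules are mechanical enough to hand to a computer, and doing so shows how fast the theorems pile up. Here is a minimal Python sketch. It assumes statements are plain strings and follows the four rules exactly as given here, where the first rule appends an M. The names `successors` and `theorems_by_step` are hypothetical. The search is breadth-first: it applies every rule to every theorem found so far.

```python
def successors(s):
    """Every string reachable from s by a single application of one of the four rules."""
    found = set()
    if s.endswith("I"):                      # Rule 1: xI gives xIM
        found.add(s + "M")
    if s.startswith("M"):
        x = s[1:]
        found.add("M" + x + x)               # Rule 2: Mx gives Mxx
        for i in range(len(x) - 2):          # Rule 3: MxIIIy gives MxUy, at every III
            if x[i:i + 3] == "III":
                found.add("M" + x[:i] + "U" + x[i + 3:])
        for i in range(len(x) - 1):          # Rule 4: MxUUy gives Mxy, at every UU
            if x[i:i + 2] == "UU":
                found.add("M" + x[:i] + x[i + 2:])
    return found

def theorems_by_step(axiom="MI", steps=5):
    """Breadth-first search from the axiom, counting the new theorems found at each step."""
    seen = {axiom}
    frontier = {axiom}
    counts = []
    for _ in range(steps):
        frontier = {t for s in frontier for t in successors(s)} - seen
        seen |= frontier
        counts.append(len(frontier))
    return counts

print(sorted(successors("MIIII")))  # ['MIIIIIIII', 'MIIIIM', 'MIU', 'MUI']
print(theorems_by_step(steps=3))    # [2, 3, 6]
```

If you raise `steps`, the count climbs quickly. The strings get long quickly too, since the second rule doubles them.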

All of those are theorems in this scheme and that’s nice. But it’s a lot. Suppose we have set up symbols and axioms and rules that have clear interpretations that relate to something we care about. If we set the computer to produce every possible legitimate result we are going to produce an enormous number of results that we don’t care about. They’re not wrong, they’re just off-point. And there’s a lot more true things that are off-point than there are true things on-point. We need something with judgement to pick out results that have anything to do with what we want to know. And trying out combinations to see if we can produce the pattern we want is hard. Really hard.

And there’s worse. If we set up a formal language that matches real mathematics, then we need a lot of work to prove anything. Even simple statements can take forever. I seem to remember my logic professor needing 27 steps to work out the uncontroversial theorem “if x = y and y = z, then x = z”. (Granting he may have been taking the long way around for demonstration purposes.) We would have to look in theorems of unspeakably many symbols to find the good stuff.

Now it’s reasonable to ask what the point of all this is. Why create a scheme that lets us find everything that can be proved, only to have all we’re interested in buried in garbage?

There are some uses. To make us swear we’ve read Jorge Luis Borges, for one. Another is to study the theory of what we can prove. That is, what are we able to learn by logical deduction? And another is to design systems meant to let us solve particular kinds of problems. That approach makes the subject merge into computer science. Code for a computer is, in a sense, about how to change a string of data into another string of data. What are the legitimate data to start with? What are the rules by which to change the data? And these are the sorts of things grammars, and the study of grammars, are about.

A Leap Day 2016 Mathematics A To Z: Fractions (Continued)


Another request! I was asked to write about continued fractions for the Leap Day 2016 A To Z. The request came from Keilah, of the Knot Theorist blog. But I’d already had a c-word request in (conjecture). So you see my elegant workaround to talk about continued fractions anyway.

Fractions (continued).

There are fashions in mathematics. There are fashions in all human endeavors. But mathematics almost begs people to forget that it is a human endeavor. Sometimes a field of mathematics will be popular a while and then fade. Some fade almost to oblivion. Continued fractions are one of them.

A continued fraction comes from a simple enough starting point. Start with a whole number. Add a fraction to it. 1 + \frac{2}{3}. Everyone knows what that is. But then look at the denominator. In this case, that’s the ‘3’. Why couldn’t that be a sum, instead? No reason. Imagine then the number 1 + \frac{2}{3 + 4}. Is there a reason that we couldn’t, instead of the ‘4’ there, have a fraction instead? No reason beyond our own timidity. Let’s be courageous. Does 1 + \frac{2}{3 + \frac{4}{5}} even mean anything?

Well, sure. It’s getting a little hard to read, but 3 + \frac{4}{5} is a fine enough number. It’s 3.8. \frac{2}{3.8} is a less friendly number, but it’s a number anyway. It’s a little over 0.526. (It works out to 10/19, so its decimal expansion never ends. But trust me, after eighteen digits it starts repeating.) And we can add 1 to that easily. So 1 + \frac{2}{3 + \frac{4}{5}} means a number a slight bit more than 1.526.
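Python's standard `fractions` module can do this arithmetic exactly. It can also confirm the repeating decimal. Here is a small sketch. The `repeating_period` helper is a hypothetical illustration of long division: it tracks the remainders until one comes back around.

```python
from fractions import Fraction

inner = 3 + Fraction(4, 5)        # 3 + 4/5 = 19/5, that is, 3.8
middle = 2 / inner                # 2 / 3.8 = 10/19
value = 1 + middle                # 1 + 10/19 = 29/19
print(inner, middle, value)       # 19/5 10/19 29/19

def repeating_period(num, den):
    """Length of the repeating block in num/den's decimal expansion (0 if it terminates)."""
    seen = {}
    remainder, position = num % den, 0
    while remainder and remainder not in seen:
        seen[remainder] = position
        remainder = remainder * 10 % den
        position += 1
    return 0 if remainder == 0 else position - seen[remainder]

print(repeating_period(10, 19))   # 18
```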

Dare we replace the “5” in that expression with a sum? Better, with the sum of a whole number and a fraction? If we don’t fear being audacious, yes. Could we replace the denominator of that with another sum? Yes. Can we keep doing this forever, creating this never-ending stack of whole numbers plus fractions? … If we want an irrational number, anyway. If we want a rational number, this stack will eventually end. But suppose we feel like creating an infinitely long stack of continued fractions. Can we do it? Why not? Who dares, wins!

OK. Wins what, exactly?

Well … um. Continued fractions certainly had a fashionable time. John Wallis, the 17th century mathematician famous for introducing the ∞ symbol, and for an interminable quarrel with Thomas Hobbes over Hobbes’s attempts to reform mathematics, did much to establish continued fractions as a field of study. (He’s credited with inventing the field. But all claims to inventing something big are misleading. Real things are complicated and go back farther than people realize, and inventions are more ambiguous than people think.) The astronomer Christiaan Huygens showed how to use continued fractions to design better gear ratios. This may strike you as the dullest application of mathematics ever. Let it. It’s also important stuff. People who need to scale one movement to another need this.

In the 18th and 19th centuries continued fractions became interesting for higher mathematics. Continued fractions were the approach Leonhard Euler used to prove that e had to be irrational. That’s one of the superstar numbers of mathematics. Johann Heinrich Lambert used continued fractions to show that if θ is a rational number (other than zero) then the tangent of θ must be irrational. This is one path to showing that π must be irrational. Many of the astounding theorems of Srinivasa Ramanujan were about continued fractions, or ideas which built on continued fractions.

But since the early 20th century the field’s evaporated. I don’t have a good answer why. The best speculation I’ve heard is that the field seems to fit poorly into any particular topic. Continued fractions get interesting when you have an infinitely long stack of nesting denominators. You don’t want to work with infinitely long strings of things before you’ve studied calculus. You have to be comfortable with these things. But that means students don’t encounter it until college, at least. And at that point fractions seem beneath the grade level. There’s a handful of proofs best done by them. But those proofs can be shown as odd, novel approaches to these particular problems. Studying the whole field is hardly needed.

So, perhaps because it seems like an odd fit, the subject’s dried up and blown away. Even enthusiasts seem to be resigned to its oblivion. Professor Adam Van Tuyl, then at Queen’s University in Kingston, Ontario, composed a nice set of introductory pages about continued fractions. But the page is defunct. Dr Ron Knott has a more thorough page, though, and one with calculators that work well.

Will continued fractions make a comeback? Maybe. It might take the discovery of some interesting new results, or some better visualization tools, to reignite interest. Chaos theory, the study of deterministic yet unpredictable systems, first grew (we now recognize) in the 1890s. But it fell into obscurity. When we got some new theoretical papers and the ability to do computer simulations, it flowered again. For a time it looked ready to take over all mathematics, although we’ve got things under better control now. Could continued fractions do the same? I’m skeptical, but won’t rule it out.

Postscript: something you notice quickly with continued fractions is they’re a pain to typeset. We’re all right with 1 + \frac{2}{3 + \frac{4}{5}} . But after that the LaTeX engine that WordPress uses to render mathematical symbols is doomed. A real LaTeX engine gets another couple nested denominators in before the situation is hopeless. If you’re writing this out on paper, the way people did in the 19th century, that’s all right. But there’s no typing it out that way.

But notation is made for us, not us for notation. If we want to write a continued fraction in which the numerators are all 1, we have a brackets shorthand available. In this we would write 2 + \frac{1}{3 + \frac{1}{4 + \cdots }} as [2; 3, 4, … ]. The numbers are the whole numbers added to the next level of fractions. Another option, and one that lends itself to having numerators which aren’t 1, is to write out a string of fractions. In this we’d write 2 + \frac{1}{3 +} \frac{1}{4 +} \frac{1}{\cdots + }. We have to trust people notice the + sign is in the denominator there. But if people know we’re doing continued fractions then they know to look for the peculiar notation.
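Turning a rational number into the bracket notation is just the Euclidean algorithm: take the whole part, flip what's left over, and repeat. Here is a sketch in Python. It assumes positive whole-number terms, and the function names are made up for illustration. It also takes the earlier example, 1 + \frac{2}{3 + \frac{4}{5}}, which equals 29/19, and rewrites it with every numerator 1 as [1; 1, 1, 9].

```python
from fractions import Fraction

def to_continued_fraction(q):
    """Expand a positive rational into the terms [a0; a1, a2, ...] by the Euclidean algorithm."""
    terms = []
    num, den = q.numerator, q.denominator
    while den:
        whole, rest = divmod(num, den)
        terms.append(whole)
        num, den = den, rest          # flip the leftover fraction and repeat
    return terms

def from_continued_fraction(terms):
    """Evaluate [a0; a1, ..., ak] (all numerators 1) from the innermost level outward."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

print(to_continued_fraction(Fraction(29, 19)))  # [1, 1, 1, 9]
print(from_continued_fraction([1, 1, 1, 9]))    # 29/19
print(from_continued_fraction([2, 3, 4]))       # 30/13
```

For a rational number, the expansion stops, because the Euclidean algorithm always does.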

A Leap Day 2016 Mathematics A To Z: Energy


Another of the requests I got for this A To Z was for energy. It came from Dave Kingsbury, of the A Nomad In Cyberspace blog. He was particularly interested in E = mc² and how we might know that’s so. But we ended up threshing that out tolerably well in the original Any Requests post. So I’ll take the energy as my starting point again and go in a different direction.

Energy.

When I was in high school, back when the world was new, our physics teacher presented the class with a problem inspired by an Indiana Jones movie. There’s a scene where Jones is faced with dropping to sure death from a rope bridge. He cuts the bridge instead, swinging on it to the cliff face and safety. Our teacher asked: would that help any?

It’s easy to understand a person dropping the fifty feet we supposed it was. A high school physics class can do the mathematics involved and say how fast Jones would hit the water below. You don’t even need the little bit of calculus we could do then. At least if you’re willing to ignore air resistance. High school physics classes always are.

Swinging on the rope bridge, though, is harder. We could model it all right. We could pretend Jones was a weight on the end of a rigid pendulum. And we could describe the forces accelerating this weight as it swings down its arc. But we looked at the integrals we would have to work out to say how fast he would hit the cliff face. It wasn’t pretty. We had no idea how to even look up how to do these.

He spared us this work. His point in this was to revive our interest in physics by bringing in pop culture and to introduce the conservation of energy. We can ignore all these forces and positions and the path of a falling thing. We can look at the potential energy, the result of gravity, at the top of the bridge. Then look at how much less there is at the bottom. Where does that energy go? It goes into kinetic energy, increasing the momentum of the falling body. We can get what we are interested in — how fast Jones is moving at the end of his fall — with a lot less work.
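Here is the comparison in a minimal Python sketch. The assumptions are all mine: a fifty-foot drop, gravity of 32.17 feet per second squared, no air resistance, and the bridge treated as a rigid pendulum fifty feet long released from horizontal. The function names are hypothetical. The energy route takes one line, since setting m g h equal to ½ m v² cancels the mass. The pendulum route steps the equation of motion forward in tiny time steps. That is a numerical stand-in for the integrals we couldn't do, and it lands on the same speed.

```python
import math

G = 32.17  # assumed gravitational acceleration, feet per second squared

def speed_from_energy(drop_ft):
    """Speed after dropping drop_ft, from m*g*h = (1/2)*m*v**2; the mass cancels."""
    return math.sqrt(2 * G * drop_ft)

def speed_from_pendulum(length_ft, dt=1e-4):
    """Step a rigid pendulum released from horizontal until it hangs straight down.

    theta is the angle from straight down; the equation of motion is
    theta'' = -(G / length) * sin(theta).
    """
    theta, omega = math.pi / 2, 0.0
    while theta > 0:
        omega -= (G / length_ft) * math.sin(theta) * dt  # semi-implicit Euler:
        theta += omega * dt                              # update speed, then angle
    return abs(omega) * length_ft

print(speed_from_energy(50))    # about 56.7 feet per second
print(speed_from_pendulum(50))  # agrees to within a small fraction of a foot per second
```

I used the semi-implicit Euler step because it keeps the energy from drifting much over a long run. A plain Euler step would let the energy creep upward.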

Why is this less work? I doubt I can explain the deep philosophical implications of that well enough. I can point to the obvious. Forces and positions and velocities and all are vectors. They’re ordered sets of numbers. You have to keep the ordering consistent. You have to pay attention to paths. You have to keep track of the angles between, say, what direction gravity accelerates Jones, and where Jones is relative to his starting point, and in what direction he’s moving. We have notation that makes all this easy to follow. But there’s a lot of work hiding behind the symbols.

Energy, though … well, that’s just a number. It’s even a constant number, if energy is conserved. We can split off a potential energy. That’s still just a number. If it changes, we can tell how much it’s changed by subtraction. We’re comfortable with that.

Mathematicians call that a scalar. That just means that it’s a real number. It happens to represent something interesting. We can relate the scalar representing potential energy to the vectors of forces that describe how things move. (Spoiler: finding the relationship involves calculus. We go from vectors to a scalar by integration. We go from the scalar to the vector by a gradient, which is a kind of vector-valued derivative.) Once we know this relationship we have two ways of describing the same stuff. We can switch to whichever one makes our work easier. This is often the scalar. Solitary numbers are just so often easier than ordered sets of numbers.
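Here is a small numerical sketch of both directions. It assumes an 80-kilogram Jones, gravity of 9.81 meters per second squared, and a 15.24-meter (fifty-foot) pendulum. All the names are illustrative. Adding up force dotted with displacement along the curved swing (the vector side, integrated) gives the same number as subtracting potential energies (the scalar side). Taking the gradient of the potential gives back the force. For potential energy, the force is the negative of the gradient.

```python
import math

M, G, L = 80.0, 9.81, 15.24  # assumed mass (kg), gravity (m/s^2), pendulum length (m)

def potential(x, y):
    """The scalar: potential energy M*G*y, with y the height relative to the pivot."""
    return M * G * y

def force(x, y):
    """The vector: gravity pulls straight down with magnitude M*G everywhere."""
    return (0.0, -M * G)

def work_along_swing(steps=100_000):
    """Add up F . dr along the quarter-circle swing from horizontal to straight down."""
    total = 0.0
    for k in range(steps):
        t0 = (math.pi / 2) * k / steps
        t1 = (math.pi / 2) * (k + 1) / steps
        x0, y0 = L * math.cos(t0), -L * math.sin(t0)
        x1, y1 = L * math.cos(t1), -L * math.sin(t1)
        fx, fy = force((x0 + x1) / 2, (y0 + y1) / 2)
        total += fx * (x1 - x0) + fy * (y1 - y0)
    return total

def gradient_force(x, y, h=1e-6):
    """Recover the force as minus the gradient of the potential, by central differences."""
    fx = -(potential(x + h, y) - potential(x - h, y)) / (2 * h)
    fy = -(potential(x, y + h) - potential(x, y - h)) / (2 * h)
    return fx, fy

drop = potential(L, 0.0) - potential(0.0, -L)
print(work_along_swing(), drop)   # both about 11960 joules
print(gradient_force(5.0, -5.0))  # x part 0, y part close to -M*G = -784.8
```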

The energy, or the potential energy, of a physical system isn’t the only time we can change a vector problem into a scalar. And we can’t always do that anyway. If we have to keep track of things like air resistance or energy spent, say, melting the ice we’re skating over, then the change from vectors to a scalar loses information we can’t do without. But the trick often works. Potential energy is one of the most familiar ways this is used.

I assume Jones got through his bridge problem all right. Happens that I still haven’t seen the movies, but I have heard quite a bit about them and played the pinball game.

A Leap Day 2016 Mathematics A To Z: Dedekind Domain


When I tossed this season’s A To Z open to requests I figured I’d get some surprising ones. So I did. This one’s particularly challenging. It comes from Gaurish Korpal, author of the Gaurish4Math blog.

Dedekind Domain

A major field of mathematics is Algebra. By this mathematicians don’t mean algebra. They mean studying collections of things on which you can do stuff that looks like arithmetic. There’s good reasons why this field has that confusing name. Nobody knows what they are.

We’ve seen before the creation of things that look a bit like arithmetic. Rings are a collection of things for which we can do something that works like addition and something that works like multiplication. There are a lot of different kinds of rings. When a mathematics popularizer tries to talk about rings, she’ll talk a lot about the whole numbers. We can usually count on the audience to know what they are. If that won’t do for the particular topic, she’ll try the whole numbers modulo something. If she needs another example then she talks about the ways you can rotate or reflect a triangle, or square, or hexagon and get the original shape back. Maybe she calls on the sets of polynomials you can describe. Then she has to give up on words and make do with pictures of beautifully complicated things. And after that she has to give up because the structures get too abstract to describe without losing the audience.

Dedekind Domains are a kind of ring that meets a bunch of extra criteria. There’s no point my listing them all. It would take several hundred words and you would lose motivation to continue before I was done. If you need them anyway Eric W Weisstein’s MathWorld dictionary gives the exact criteria. It also has explanations for all the words in those criteria.

Dedekind Domains, also called Dedekind Rings, are aptly named for Richard Dedekind. He was a 19th century mathematician, the last doctoral student of Gauss, and one of the people who defined what we think of as algebra. He also gave us a rigorous foundation for what irrational numbers are.

Among the problems that fascinated Dedekind was Fermat’s Last Theorem. This can’t surprise you. Every person who would be a mathematician is fascinated by it. We take our innings fiddling with cases and ways to show aⁿ + bⁿ can’t equal cⁿ for interesting whole numbers a, b, c, and n. We usually go about this by saying, “Suppose we have the smallest a, b, and c for which this is true and for which n is bigger than 2”. Then we do a lot of scribbling that shows this implies something contradictory, like an even number equals an odd, or that there’s some set of smaller numbers making this true. This proves the original supposition was false. Mathematicians first learn that trick as a way to show the square root of two can’t be a rational number. We stick with it because it’s nice and familiar and looks relevant. Most of us get maybe as far as proving there aren’t any solutions for n = 3 or maybe n = 4 and go on to other work. Dedekind didn’t prove the theorem. But he did find new ways to look at numbers.

One problem with proving Fermat’s Last Theorem is that it’s all about integers. Integers are hard to prove things about. Real numbers are easier. Complex-valued numbers are easier still. This is weird but it’s so. So we have this promising approach: if we could prove something like Fermat’s Last Theorem for complex-valued numbers, we’d get it for integers too. Or at least we’d be a lot of the way there. The one flaw is that Fermat’s Last Theorem isn’t true for complex-valued numbers. It would be ridiculous if it were true.

But we can patch things up. We can construct rings of complex-valued numbers that act a lot like the integers. The best known are the Gaussian Integers, numbers like 3 + 2i whose real and imaginary parts are both whole numbers. These match up to integers in a compelling way. We could use the tools that work on complex-valued numbers to squeeze out a result about integers.

You know that this didn’t work. If it had, we wouldn’t have had to wait for the 1990s for the proof of Fermat’s Last Theorem. And that proof would have had something to do with this stuff. It doesn’t. One of the problems keeping this kind of proof from working is factoring. Whole numbers are either prime numbers or the product of prime numbers. Or they’re 1, ruled out of the universe of prime numbers for reasons I get to after the next paragraph. Prime numbers are those like 2, 5, 13, 37 and many others. They haven’t got any factors besides themselves and 1. The other whole numbers are the products of prime numbers. 12 is equal to 2 times 2 times 3. 35 is equal to 5 times 7. 165 is equal to 3 times 5 times 11.

If we stick to whole numbers, then, these all have unique prime factorizations. 24 is equal to 2 times 2 times 2 times 3. And there are no other combinations of prime numbers that multiply together to give us 24. We could rearrange the numbers — 2 times 3 times 2 times 2 works. But it will always be a combination of three 2’s and a single 3 that we multiply together to get 24.

(This is a reason we don’t consider 1 a prime number. If we did consider it a prime number, then “three 2’s and a single 3” would be a prime factorization of 24, but so would “three 2’s, a single 3, and two 1’s”. Also “three 2’s, a single 3, and fifteen 1’s”. Also “three 2’s, a single 3, and one 1”. We have a lot of theorems that depend on whole numbers having a unique prime factorization. We could add the phrase “except for the count of 1’s in the factorization” to every occurrence of the phrase “prime factorization”. Or we could say that 1 isn’t a prime number. It’s a lot less work to say 1 isn’t a prime number.)
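Trial division is the simplest way to get these factorizations by computer. Here is a sketch; the function name `prime_factors` is made up. It divides out each prime as many times as it goes in. So the result for 24 lists the three 2's and the single 3, and never any 1's.

```python
def prime_factors(n):
    """Prime factorization of a whole number n >= 2 by trial division, smallest primes first."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:      # divide out p as many times as it goes
            factors.append(p)
            n //= p
        p += 1
    if n > 1:                  # whatever is left over is itself prime
        factors.append(n)
    return factors

print(prime_factors(24))   # [2, 2, 2, 3]
print(prime_factors(165))  # [3, 5, 11]
```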

The trouble is that in many rings like these we don’t have that unique prime factorization anymore. (The Gaussian integers happen to keep it. The rings Fermat’s Last Theorem calls for, built from roots of unity, lose it for most exponents.) There are still prime numbers. But it’s possible to get some numbers as a product of different sets of prime numbers. And this point breaks a lot of otherwise promising attempts to prove Fermat’s Last Theorem. And there’s no getting around that, not for Fermat’s Last Theorem.
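The standard small example of a ring of complex-valued numbers without unique factorization is the set of numbers a + b√−5, where a and b are whole numbers. In that ring 6 = 2 × 3 = (1 + √−5)(1 − √−5), and none of those four factors can be broken down further. Here is a Python sketch that checks this. It stores a + b√−5 as the pair (a, b), and the helper names are made up. The "norm" a² + 5b² multiplies the way the numbers do. So a factor of 2, 3, or 1 ± √−5 would need a norm of 2 or 3, and no element has either.

```python
from itertools import product

def multiply(x, y):
    """Multiply (a + b*sqrt(-5)) by (c + d*sqrt(-5)), using sqrt(-5)**2 == -5."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)

def norm(a, b):
    """The norm a**2 + 5*b**2; the norm of a product is the product of the norms."""
    return a * a + 5 * b * b

print(multiply((2, 0), (3, 0)))    # (6, 0): 6 = 2 * 3
print(multiply((1, 1), (1, -1)))   # (6, 0): 6 = (1 + sqrt(-5)) * (1 - sqrt(-5))

# 2, 3 and 1 +/- sqrt(-5) have norms 4, 9, 6 and 6. Splitting any of them into two
# non-unit factors needs an element of norm 2 or 3. None exists: if b != 0 the norm
# is at least 5, and if b == 0 it is a perfect square. This just samples a few.
small_norms = {norm(a, b) for a, b in product(range(-3, 4), repeat=2)}
print(2 in small_norms, 3 in small_norms)  # False False
```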

Dedekind saw a good concept lurking under this, though. The concept is called an ideal. It’s a subset of a ring that itself satisfies the rules for being a ring. And if you take something from the original ring and multiply it by something in the ideal, you get something that’s still in the ideal. You might already have one in mind. Start with the ring of integers. The even numbers are an ideal of that. Add any two even numbers together and you get an even number. Multiply any two even numbers together and you get an even number. Take any integer, even or not, and multiply it by an even number. You get an even number.

(If you were wondering: I mean the ideal would be a “ring without identity”. It’s not required to have something that acts like 1 for the purpose of multiplication. If we insisted on looking at the even numbers and the number 1, then we couldn’t be sure that adding two things from the ideal would stay in the ideal. After all, 2 is in the ideal, and if 1 also is, then 2 + 1 is a peculiar thing to consider an even number.)

It’s not just even numbers that do this. The multiples of 3 make an ideal in the integers too. Add two multiples of 3 together and you get a multiple of 3. Multiply two multiples of 3 together and you get another multiple of 3. Multiply any integer by a multiple of 3 and you get a multiple of 3.

The multiples of 4 also make an ideal, as do the multiples of 5, or the multiples of 82, or of any whole number you like.

Odd numbers don’t make an ideal, though. Add two odd numbers together and you don’t get an odd number. Multiply an integer by an odd number and you might get an odd number, you might not.
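A computer can't check every integer, but it can spot-check these conditions on a stretch of them. Here is a sketch in Python. It assumes the window from −30 to 30 is a fair sample, so it illustrates the definition rather than proving anything. I've added a check that negatives stay in the set, since an ideal has to allow subtraction as well as addition. The helper name `looks_like_ideal` is hypothetical.

```python
from itertools import product

WINDOW = range(-30, 31)  # an arbitrary finite sample of the integers

def looks_like_ideal(member):
    """Spot-check, on WINDOW only, the conditions for {k : member(k)} to be an ideal."""
    elements = [k for k in WINDOW if member(k)]
    adds = all(member(a + b) for a, b in product(elements, repeat=2))
    negates = all(member(-a) for a in elements)
    absorbs = all(member(r * a) for r, a in product(WINDOW, elements))
    return adds and negates and absorbs

print(looks_like_ideal(lambda k: k % 2 == 0))  # True: the even numbers
print(looks_like_ideal(lambda k: k % 3 == 0))  # True: the multiples of 3
print(looks_like_ideal(lambda k: k % 5 == 0))  # True: the multiples of 5
print(looks_like_ideal(lambda k: k % 2 == 1))  # False: the odd numbers
```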

And not every ring has an interesting ideal lurking within it. Every ring has two dull ones: the set containing only 0, and the whole ring itself. For example, take the integers modulo 3. In this case there are only three numbers: 0, 1, and 2. 1 + 1 is 2, uncontroversially. But 1 + 2 is 0. 2 + 2 is 1. 2 times 1 is 2, but 2 times 2 is 1 again. This is self-consistent. But it hasn’t got any ideal within it besides those two dull ones. There isn’t a middle-sized set for which addition works.
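For a finite ring like this we can check every subset outright. Here is a Python sketch. It assumes an ideal here means a nonempty subset closed under addition and under multiplication by anything in the ring. In a finite ring, closure under addition already brings the negatives along. The function name `ideals_mod` is hypothetical. The integers modulo 3 turn up only the set {0} and the whole ring. For contrast, the integers modulo 4 also have {0, 2}.

```python
from itertools import combinations

def ideals_mod(n):
    """Every nonempty subset of the integers modulo n that is closed under addition
    and under multiplication by any element of the ring."""
    ring = range(n)
    found = []
    for size in range(1, n + 1):
        for subset in combinations(ring, size):
            s = set(subset)
            closed_under_addition = all((a + b) % n in s for a in s for b in s)
            absorbs_products = all((r * a) % n in s for r in ring for a in s)
            if closed_under_addition and absorbs_products:
                found.append(sorted(s))
    return found

print(ideals_mod(3))  # [[0], [0, 1, 2]]
print(ideals_mod(4))  # [[0], [0, 2], [0, 1, 2, 3]]
```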

The multiples of 4 make an interesting ideal in the integers. They’re not just an ideal of the integers. They’re also an ideal of the even numbers. Well, the even numbers make a ring. They couldn’t be an ideal of the integers if they couldn’t be a ring in their own right. And the multiples of 4 — well, multiply any even number by a multiple of 4. You get a multiple of 4 again. This keeps on going. The multiples of 8 are an ideal for the multiples of 4, the multiples of 2, and the integers. Multiples of 16 and 32 make for even deeper nestings of ideals.

The multiples of 6, now … that’s an ideal of the integers, for all the reasons the multiples of 2 and 3 and 4 were. But it’s also an ideal of the multiples of 2. And of the multiples of 3. We can see the collection of “things that are multiples of 6” as a product of “things that are multiples of 2” and “things that are multiples of 3”. Dedekind saw this before us.

You might want to pause a moment while considering the idea of multiplying whole sets of numbers together. It’s a heady concept. Trying to do proofs with the concept feels at first like being tasked with alphabetizing a cloud. But we’re not planning to prove anything so you can move on if you like with an unalphabetized cloud.

A Dedekind Domain is a ring that has ideals like this. And the ideals come in two categories. Some are “prime ideals”, which act like prime numbers do. The non-prime ideals are the products of prime ideals. And while we might not have unique prime factorizations of numbers, we do have unique prime factorizations of ideals. That is, if an ideal is a product of some set of prime ideals, then it can’t also be the product of some other set of prime ideals. We get back something like unique factors.
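For a concrete look at what goes wrong with numbers, here is a standard textbook example, which I'm supplying myself rather than taking from the essay: numbers of the form a + b√−5, with a and b integers. There, 6 factors two ways, as 2 × 3 and as (1 + √−5)(1 − √−5). This ring is a Dedekind Domain, so its ideals do factor uniquely even though its numbers don't. The Python sketch below stores a + b√−5 as the pair (a, b) and checks the arithmetic. It also uses the norm a² + 5b², which multiplies along with the numbers. Since nothing has norm 2 or 3, none of the four factors (norms 4, 9, 6, 6) can be split further.

```python
def mul(x, y):
    """Multiply (a + b*sqrt(-5)) by (c + d*sqrt(-5)); each number is a pair (a, b)."""
    a, b = x
    c, d = y
    # sqrt(-5) * sqrt(-5) = -5, which is where the -5*b*d term comes from.
    return (a * c - 5 * b * d, a * d + b * c)


def norm(x):
    """a^2 + 5b^2; the norm of a product is the product of the norms."""
    a, b = x
    return a * a + 5 * b * b


def some_number_has_norm(target):
    """Search every (a, b) small enough that a^2 + 5b^2 could equal target."""
    limit = int(target ** 0.5) + 1
    return any(
        norm((a, b)) == target
        for a in range(-limit, limit + 1)
        for b in range(-limit, limit + 1)
    )


two, three = (2, 0), (3, 0)
plus, minus = (1, 1), (1, -1)  # 1 + sqrt(-5) and 1 - sqrt(-5)
print(mul(two, three), mul(plus, minus))                 # (6, 0) (6, 0)
print([norm(x) for x in (two, three, plus, minus)])      # [4, 9, 6, 6]
print(some_number_has_norm(2), some_number_has_norm(3))  # False False
```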

This may sound abstract. But you know a Dedekind Domain. The integers are one. That wasn’t a given. Yes, we start algebra by looking for things that work like regular arithmetic does. But that doesn’t promise that regular old numbers will still satisfy us. We can, for instance, study things where the order matters in multiplication. Then multiplying one thing by a second gives us a different answer from multiplying the second thing by the first. Still, regular old integers are Dedekind domains and it’s hard to think of being more familiar than that.

Another example is the set of polynomials in one variable, with coefficients that are rational numbers, or real numbers. You might want to pause for a moment here. Mathematics majors need a pause to start thinking of polynomials as being something kind of like regular old numbers. But you can certainly add one polynomial to another, and you get a polynomial out of it. You can multiply one polynomial by another, and you get a polynomial out of that. Try it. After that the only surprise would be that there are prime polynomials. But if you try to think of two polynomials that multiply together to give you “x + 1”, you realize there have to be.
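Here is one way to “try it” by computer: a short Python sketch storing a polynomial as its list of coefficients, constant term first. The helper names are my own. Adding or multiplying two such lists always gives back another list of the same kind, which is the closure the paragraph describes.

```python
from itertools import zip_longest


def poly_add(p, q):
    """Add polynomials stored as coefficient lists, constant term first."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]


def poly_mul(p, q):
    """Multiply polynomials: the x**i term of p times the x**j term of q lands on x**(i + j)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


x_plus_1 = [1, 1]    # 1 + x
x_minus_1 = [-1, 1]  # -1 + x
print(poly_add(x_plus_1, x_minus_1))  # [0, 2], that is, 2x
print(poly_mul(x_plus_1, x_minus_1))  # [-1, 0, 1], that is, x^2 - 1
```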

Other examples start getting more exotic. They’re things like the Gaussian integers I mentioned before. Gaussian integers are themselves an example of a structure called algebraic integers. Algebraic integers are … well, think of all the polynomials you can make out of integer coefficients, and with a leading coefficient of 1. So, polynomials that look like “x³ – 4x² + 15x + 6” or the like. All of the roots of those, the values of x which make that expression equal to zero, are algebraic integers. Yes, almost none of them are integers. We know. But gather up the algebraic integers that live in a single number field, the kind of field you get by starting from the rational numbers and throwing in finitely many of those roots, and you get a Dedekind Domain.

I’d like to describe some more Dedekind Domains. I am foiled. I can find some more, but explaining them outside the dialect of mathematics is hard. It would take me more words than I am confident readers will give me.

I hope you are satisfied to know a bit of what a Dedekind Domain is. It is a kind of thing which works much like integers do. But a Dedekind Domain can be just different enough that we can’t count on factoring working like we are used to. We don’t lose factoring altogether, though. We are able to keep an attenuated version. It does take quite a few words to explain exactly how to set this up, however.

A Leap Day 2016 Mathematics A To Z: Conjecture


For today’s entry in the Leap Day 2016 Mathematics A To Z I have an actual request from Elke Stangl. I’d had another ‘c’ request, for ‘continued fractions’. I’ve decided to address that by putting ‘Fractions, continued’ on the roster. If you have other requests, for letters not already committed, please let me know. I’ve got some letters I can use yet.

Conjecture.

An old joke says a mathematician’s job is to turn coffee into theorems. I prefer tea, which may be why I’m not employed as a mathematician. A theorem is a logical argument that starts from something known to be true. Or we might start from something assumed to be true, if we think the setup interesting and plausible. And it uses laws of logical inference to draw a conclusion that’s also true and, hopefully, interesting. If it isn’t interesting, maybe it’s useful. If it isn’t either, maybe at least the argument is clever.

How does a mathematician know what theorems to try proving? We could assemble any combination of premises as the setup to a possible theorem. And we could imagine all sorts of possible conclusions. Most of them will be syntactically gibberish, the equivalent of our friends the monkeys banging away on keyboards. Of those that aren’t, most will be untrue, or at least impossible to argue. Of the rest, potential theorems that could be argued, many will be too long or too unfocused to follow. Only a tiny few potential combinations of premises and conclusions could form theorems of any value. How does a mathematician get a good idea where to spend her time?

She gets it from experience. In learning what theorems, what arguments, have been true in the past she develops a feeling for things that would plausibly be true. In playing with mathematical constructs she notices patterns that seem to be true. As she gains expertise she gets a sense for things that feel right. And she gets a feel for what would be a reasonable set of premises to bundle together. And what kinds of conclusions probably follow from an argument that people can follow.

This potential theorem, this thing that feels like it should be true, is a conjecture.

Properly, we don’t know whether a conjecture is true or false. The most we can say is that we don’t have evidence that it’s false. New information might show that we’re wrong and we would have to give up the conjecture. Finding new examples that it’s true might reinforce our idea that it’s true, but that doesn’t prove it’s true.

For example, we have the Goldbach Conjecture. According to it every even number greater than two can be written as the sum of exactly two prime numbers. The evidence for it is very good: every even number we’ve tried has worked out, up through at least 4,000,000,000,000,000,000. But it isn’t proven. It’s possible that it’s impossible to prove from the standard rules of arithmetic.
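Checking the conjecture for small numbers is easy to sketch. This Python version (function names mine) sieves out the primes up to a limit, then looks for a pair of primes adding up to each even number from 4 on. It can only gather evidence: a pair for every even number below some limit says nothing about the numbers beyond it.

```python
import math


def prime_sieve(limit):
    """Sieve of Eratosthenes: is_prime[k] is True exactly when k is prime, for 0 <= k <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return is_prime


def goldbach_pair(n, is_prime):
    """Return the pair (p, n - p) of primes with the smallest p, or None if there is none."""
    for p in range(2, n // 2 + 1):
        if is_prime[p] and is_prime[n - p]:
            return (p, n - p)
    return None


def first_goldbach_failure(limit):
    """Return the first even number in 4..limit with no Goldbach pair, or None."""
    is_prime = prime_sieve(limit)
    for n in range(4, limit + 1, 2):
        if goldbach_pair(n, is_prime) is None:
            return n
    return None


print(goldbach_pair(28, prime_sieve(28)))  # (5, 23)
print(first_goldbach_failure(100_000))     # None
```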

That’s a famous conjecture. It’s frustrated mathematicians for centuries. It’s easy to understand and nobody’s found a proof. Famous conjectures, the ones that get names, tend to do that. They looked nice and simple and had hidden depths.

Most conjectures aren’t so storied. They instead appear as notes at the end of a section in a journal article or a book chapter. Or they’re put on slides meant to refresh the audience’s interest where it’s needed. They are needed at the fifteen-minute mark of a presentation, just after four slides full of dense equations. They are also needed at the 35-minute mark, in the middle of a field of plots with too many symbols and not enough labels. And one’s needed just before the summary of the talk, so that the audience can try to remember what the presentation was about and why they thought they could understand it. If the deadline were not so tight, if the conference were a month or so later, perhaps the mathematician would find a proof for these conjectures.

Perhaps. As above, some conjectures turn out to be hard. Fermat’s Last Theorem stood for over three and a half centuries as a conjecture. Its first proof turned out to be nothing like anything Fermat could have had in mind. Mathematics popularizers lost an easy hook when that was proven. We used to be able to start an essay on Fermat’s Last Theorem by huffing about how it was properly a conjecture but the wrong term stuck to it because English is a perverse language. Now we have to start by saying how it used to be a conjecture instead.

But few are like that. Most conjectures are ideas that feel like they ought to be true. They appear because a curious mind will look for new ideas that resemble old ones, or will notice patterns that seem to resemble old patterns.

And sometimes conjectures turn out to be false. Something can look like it ought to be true, or maybe would be true, and yet be false. Often we can prove something isn’t true by finding an example, just as you might expect. But that doesn’t mean it’s easy. Here’s a false conjecture, one that was put forth by Goldbach. All odd numbers are either prime, or can be written as the sum of a prime and twice a square number. (He considered 1 to be a prime number.) It’s not true, but it took over a century to show that. If you want to find a counterexample go ahead and have fun trying.
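Spoiler ahead, for anyone who wanted to hunt by hand. Here is a brute-force search sketched in Python, with function names of my own. It follows Goldbach’s convention by letting 1 count as a prime, through the `one_is_prime` flag, and it tests odd numbers from 3 up. Below 10,000 it turns up exactly two counterexamples, and they are the same whether or not 1 counts as prime.

```python
import math


def is_prime(n):
    """Trial division; adequate for the small numbers searched here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def fits_conjecture(n, one_is_prime=True):
    """True if odd n is prime, or is a prime plus twice a nonzero square."""
    if is_prime(n):
        return True
    k = 1
    while 2 * k * k < n:
        p = n - 2 * k * k
        if is_prime(p) or (one_is_prime and p == 1):
            return True
        k += 1
    return False


def counterexamples(limit, one_is_prime=True):
    """All odd numbers from 3 up to (but not including) limit that break the conjecture."""
    return [n for n in range(3, limit, 2) if not fits_conjecture(n, one_is_prime)]


print(counterexamples(10_000))  # [5777, 5993]
```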

Still, if a mathematician turns coffee into theorems, it is through the step of finding conjectures, promising little paths in the forest of what is not yet known.

A Leap Day 2016 Mathematics A To Z: Basis


Today’s glossary term is one that turns up in many areas of mathematics. But these all share some connotations. So I mean to start with the easiest one to understand.

Basis.

Suppose you are somewhere. Most of us are. Where is something else?

That isn’t hard to answer if conditions are right. If we’re allowed to point and the something else is in sight, we’re done. It’s when pointing and following the line of sight breaks down that we’re in trouble. We’re also in trouble if we want to say how to get from that something to yet another spot. How can we guide someone from one point to another?

We have a good answer from everyday life. We can impose some order, some direction, on space. We’re familiar with this from the cardinal directions. We say where things on the surface of the Earth are by how far they are north or south, east or west, from something else. The scheme breaks down a bit if we’re at the North or the South pole exactly, but there we can fall back on pointing.

When we start using north and south and east and west as directions we are choosing basis vectors. Vectors are instructions for how far to move and in what direction. Suppose we have two vectors that don’t point along the same line. Then we can describe any two-dimensional movement using them. We can say “go this far in the direction of the first vector and also that far in the direction of the second vector”. With the cardinal directions, we consider north and east, or east and south, or south and west, or west and north to be a pair of vectors going in different directions.

(North and south, in this context, are the same thing. “Go twenty paces north” says the same thing as “go negative twenty paces south”. Most mathematicians don’t pull this sort of stunt when telling you how to get somewhere unless they’re trying to be funny without succeeding.)
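Finding how far to go along each of two basis vectors amounts to solving two equations in two unknowns. This Python sketch uses Cramer’s rule; the function name and the (east, north) pair layout are my own choices. It also shows the north-and-south trouble from the parenthetical above: two vectors along the same line give a zero determinant, and no answer.

```python
def coordinates_in_basis(target, b1, b2):
    """Find (s, t) with s*b1 + t*b2 == target, all vectors given as (east, north) pairs.

    Uses Cramer's rule to solve the two equations in two unknowns.
    """
    det = b1[0] * b2[1] - b1[1] * b2[0]
    if det == 0:
        raise ValueError("basis vectors lie along the same line")
    s = (target[0] * b2[1] - target[1] * b2[0]) / det
    t = (b1[0] * target[1] - b1[1] * target[0]) / det
    return s, t


east, north, south = (1, 0), (0, 1), (0, -1)
northeast = (1, 1)
print(coordinates_in_basis((3, 4), east, north))      # (3.0, 4.0)
print(coordinates_in_basis((3, 4), east, northeast))  # (-1.0, 4.0)
```

The second answer reads as “go one unit west, then four units northeast”. Calling `coordinates_in_basis((3, 4), north, south)` raises `ValueError`, since north and south can’t reach anything east of the start.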

A basis vector is just a direction, and distance in that direction, that we’ve decided to be a reference for telling different points in space apart. A basis set, or basis, is the collection of all the basis vectors we need. What do we need? We need enough basis vectors to get to all the points in whatever space we’re working with.

(If you are about to ask whether “east” points in different directions as we go around the surface of the Earth, you’re doing very well. Please pretend we never move so far from where we start that anyone could notice the difference. If you can’t do that, please pretend the Earth has been smooshed into a huge flat square with north at one end and we’re only just now noticing.)

We are free to choose whatever basis vectors we like. The worst that can happen if we choose a lousy basis is that we have to write out more things than we otherwise would. Our work won’t be less true, it’ll just be more tedious. But there are some properties that often make for a good basis.

One is that the basis should relate to the problem you’re doing. Suppose you were in one of mathematicians’ favorite places, midtown Manhattan. There is a compelling grid here of avenues running north-south and streets running east-west. (Broadway we ignore as an implementation error retained for reasons of backwards compatibility.) Well, we pretend they run north-south and east-west. They’re actually a good bit clockwise of north-south and east-west. They do that to better match the geography of the island. A “north” avenue runs about parallel to the way Manhattan’s long dimension runs. In the circumstances, it would be daft to describe directions by true north or true east. We would say to go so many streets “north” and so many avenues “east”.

Purely mathematical problems aren’t concerned with streets and avenues. But there will often be preferred directions. Mathematicians often look at the way a process alters shapes or redirects forces. There’ll be some directions where the alterations are biggest. There’ll be some where the alterations are smallest. Those directions are probably good choices for a basis. They stand out as important.
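One common way to make “biggest and smallest alterations” precise, which I’m assuming here rather than taking from the text, is a linear map given by a symmetric matrix. Its directions of greatest and least stretching are its eigenvectors, and the stretch factors are its eigenvalues. For a 2×2 matrix [[a, b], [b, d]] there’s a closed form, sketched in Python below with a function name of my own.

```python
import math


def principal_directions(a, b, d):
    """For the symmetric map [[a, b], [b, d]], return [(stretch, unit_direction), ...].

    The larger (signed) stretch factor comes first. These are the
    eigenvalues and unit eigenvectors of the matrix.
    """
    if b == 0:
        # Already diagonal: the coordinate axes are the special directions.
        pairs = [(float(a), (1.0, 0.0)), (float(d), (0.0, 1.0))]
        return sorted(pairs, key=lambda pair: pair[0], reverse=True)
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    results = []
    for lam in (mean + radius, mean - radius):
        vx, vy = b, lam - a  # solves (a - lam)*vx + b*vy == 0
        length = math.hypot(vx, vy)
        results.append((lam, (vx / length, vy / length)))
    return results


# Stretch 3 along (1, 1)/sqrt(2), and stretch 1 along (1, -1)/sqrt(2).
print(principal_directions(2, 1, 2))
```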

We also tend to like basis vectors that are a unit length. That is, their size is 1 in some convenient unit. That’s for the same reason it’s easier to say how expensive something is if it costs 45 dollars instead of nine five-dollar bills. Or if you’re told it was 180 quarter-dollars. The length of your basis vector is just a scaling factor. But the more factors you have to work with the more likely you are to misunderstand something.

And we tend to like basis vectors that are perpendicular to one another. They don’t have to be. But if they are then it’s easier to divide up our work. We can study each direction separately. Mathematicians tend to like techniques that let us divide problems up into smaller ones that we can study separately.
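Here is the payoff of perpendicular, unit-length basis vectors, sketched in Python. With such a basis, the amount to go along each basis vector is a single dot product, worked out for each direction separately, with no equations to solve together. For the example I use a Manhattan-style grid basis tilted clockwise from true north by 29 degrees; that angle is an approximate figure I’m assuming for illustration, not a surveyed value.

```python
import math


def dot(u, v):
    """Dot product of two 2D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def coordinates_orthonormal(target, basis):
    """With perpendicular unit basis vectors, each coordinate is one dot product."""
    return tuple(dot(target, e) for e in basis)


# Vectors are (true east, true north) pairs.
tilt = math.radians(29)  # assumed tilt of the street grid, clockwise from true north
grid_east = (math.cos(tilt), -math.sin(tilt))
grid_north = (math.sin(tilt), math.cos(tilt))

# Three units "east" and four units "north" on the grid, in true directions:
trip = (3 * grid_east[0] + 4 * grid_north[0], 3 * grid_east[1] + 4 * grid_north[1])
print(coordinates_orthonormal(trip, (grid_east, grid_north)))  # about (3.0, 4.0)
```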

I’ve described basis sets using vectors. They have intuitive appeal. It’s easy to understand directions of things in space. But the idea carries across into other things. For example, we can build functions out of other functions. So we can choose a set of basis functions. We can multiply them by real numbers (scalars) and add them together. This makes whatever function we’re interested in into a kind of weighted sum of basis functions.

Why do that? Well, again, we often study processes that change shapes and directions. If we choose a basis well, though, the process changes the basis vectors in easy to describe ways. And many interesting processes let us describe the changing of an arbitrary function as the weighted sum of the changes in the basis vectors. By solving a couple of simple problems we get the ability to solve every interesting problem.
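Differentiation makes a small example of this, sketched in Python. Take the basis functions 1, x, x², x³, and so on, and write a polynomial as the list of its weights on each. Differentiation sends xᵏ to k·xᵏ⁻¹, and because it respects sums and scalar multiples, the derivative of any polynomial is the same weighted sum of those simple changes. The choice of basis and the helper names are my own illustration.

```python
def derivative(coeffs):
    """coeffs[k] is the weight on the basis function x**k.

    Each x**k changes to k * x**(k - 1), so the weight on x**k,
    multiplied by k, becomes the weight on x**(k - 1).
    """
    return [k * coeffs[k] for k in range(1, len(coeffs))]


def evaluate(coeffs, x):
    """Add up the weighted basis functions at the point x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


p = [6, 15, -4, 1]  # 6 + 15x - 4x^2 + x^3
print(derivative(p))               # [15, -8, 3], that is, 15 - 8x + 3x^2
print(evaluate(derivative(p), 2))  # 11
```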

We can even define something that works like the angle between functions. And something that works a lot like perpendicularity for functions.
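The usual way to do this, for functions on an interval, is to multiply two functions together and integrate; that integral plays the role of the dot product. The Python sketch below approximates it with a midpoint-rule sum over 0 to 2π. The interval, the step count, and the function names are assumptions for illustration. With this product, sin x and sin 2x come out perpendicular, and sin x sits at 45 degrees to sin x + cos x.

```python
import math


def inner_product(f, g, a=0.0, b=2 * math.pi, steps=10_000):
    """Approximate the integral of f(x) * g(x) from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(steps))


def angle_between(f, g):
    """Angle in degrees, defined just as for vectors: cos(angle) = <f, g> / (|f| |g|)."""
    cos_angle = inner_product(f, g) / math.sqrt(inner_product(f, f) * inner_product(g, g))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def sin2(x):
    return math.sin(2 * x)


def sin_plus_cos(x):
    return math.sin(x) + math.cos(x)


print(round(angle_between(math.sin, sin2), 6))          # 90.0
print(round(angle_between(math.sin, sin_plus_cos), 6))  # 45.0
```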

And this carries on to other mathematical constructs. We look for ways to impose some order, some direction, on whatever structure we’re looking at. We’re often successful, and can work with unreal things using tools like those that let us find our place in a city.

A Leap Day 2016 Mathematics A To Z: Axiom


I had a great deal of fun last summer with an A To Z glossary of mathematics terms. To repeat a trick with some variation, I called for requests a couple weeks back. I think the requests have settled down so let me start. (However, if you’ve got a request for one of the latter alphabet letters, please let me know. There are ten letters not yet committed.) I’m going to call this a Leap Day 2016 Mathematics A To Z to mark when it sets off. This way I’m not committed to wrapping things up before a particular season ends. On, now, to the start and the first request, this one from Elke Stangl:

Axiom.

Mathematics is built of arguments. Ideally, these are all grounded in deductive logic. These would be arguments that start from things we know to be true, and use the laws of logical inference to conclude other things that are true. We want sound arguments, ones that start from true premises and make only correct inferences. In practice we accept some looseness about this, because it would just take forever to justify every single little step. But the structure is there. From some things we know to be true, deduce something we hadn’t before proven was true.

But where do we get things we know to be true? Well, we could ask the philosophy department. The question’s one of their specialties. But we might be scared of them, and they of us. After all, the mathematics department and the philosophy department are only usually both put in the College of Arts and Sciences. Sometimes philosophy is put in the College of Humanities instead. Let’s stay where we are.

We know to be true stuff we’ve already proved to be true. So we can use the results of arguments we’ve already finished. That’s comforting. Whatever work we, or our forerunners, have done was not in vain. But how did we know those results were true? Maybe they were the consequences of earlier stuff we knew to be true. Maybe they came from earlier valid arguments.

You see the regression problem. We don’t have anything we know to be true except the results of arguments, and the arguments depended on having something true to build from. We need to start somewhere.

The real world turns out to be a poor starting point, by the way. Oh, it’s got some good sides. Reality is useful in many ways, but it has a lot of problems to be resolved. Most things we could say about the real world are transitory: they were once untrue, became true, and will someday be false again. It’s hard to see how you can build a universal truth on a transitory foundation. And that’s even if we know what’s true in the real world. We have senses that seem to tell us things about the real world. But the philosophy department, if we eavesdrop on them, would remind us of some dreadful implications. The concept of “the real world” is hard to make precise. Even if we suppose we’ve done that, we don’t know that what we could perceive has anything to do with the real world. The folks in the psychology department and the people who study physiology reinforce the direness of the situation. Even if perceptions can tell us something relevant, and even if our senses aren’t deliberately deceived, they’re still bad at perceiving stuff. We need to start somewhere else if we want certainty.

That somewhere is the axiom. We declare some things to be a kind of basic law. Here are some things we need not prove true; they simply are.

(Sometimes mathematicians say “postulate” instead of “axiom”. This is because some things sound better called “postulates”. Meanwhile other things sound better called “axioms”. There is no functional difference.)

Most axioms tend to be straightforward things. We tend to like having uncontroversial foundations for our arguments. It may hardly seem necessary to say “all right angles are congruent”, but how would you prove that? It may seem obvious that, given a collection of sets of things, it’s possible to select exactly one thing from each of those sets. How do you know you can?

Well, they might follow from some other axioms, by some clever enough argument. This is possible. Mathematicians consider it elegant to have as few axioms as necessary for their work. (They’re not alone, or rare, in that preference.) I think that reflects a cultural desire to say as much as possible with as little work as possible. The more things we have to assume to show a thing is true, the more likely that in a new application one of those assumptions won’t hold. And that would spoil our knowledge of that conclusion. Sometimes we can show the interesting point of one axiom could be derived from some other axiom or axioms. We might replace an axiom with these alternates if that gives us more enlightening arguments.

Sometimes people seize on this whole axiom business to argue that mathematics (and science, dragged along behind) is a kind of religion. After all, you need to have faith that some things are true. This strikes me as bad theology and poor mathematics. The most obvious difference between an article of faith and an axiom must be that axioms are voluntary. They are things you assume to be true because you expect them to enlighten something you wish to study. If they don’t, you’re free to try other axioms.

The axiom I mentioned three paragraphs back, about selecting exactly one thing from each of a collection of sets? That’s known as the Axiom of Choice. It’s used in the theory of sets. But you don’t have to assume it’s true. Much of set theory stands independent of it. Many set theorists go about their work committing neither to the idea that it’s true nor to the idea that it’s false.

What makes a good set of axioms is rather like what makes a good set of rules for a sport. You do want to have a set that’s reasonably clear. You want them to provide for many interesting consequences. You want them to not have any contradictions. (You settle for them having no contradictions anyone’s found or suspects.) You want them to have as few ambiguities as possible. What makes up that set may evolve as the field, or as the sport, evolves. People do things that weren’t originally thought about. People get more experience and more perspective on the way the rules are laid out. People notice they had been assuming something without stating it. We revise and, we hope, improve the foundations with time.

There’s no guarantee that every set of axioms will produce something interesting. Well, you wouldn’t expect to necessarily get a playable game by throwing together some random collection of rules from several different sports, either. Most mathematicians stick to familiar groups of axioms, for the same reason most athletes stick to sports they didn’t make up. We know from long experience that this set will give us an interesting geometry, or calculus, or topology, or so on.

There’ll never be a standard universal set of axioms covering all mathematics. There are different sets of axioms that directly contradict each other but that are, to the best of our knowledge, internally self-consistent. The axioms that describe geometry on a flat surface, like a map, are inconsistent with those that describe geometry on a curved surface, like a globe. We need both maps and globes. So we have both flat and curved geometries, and we decide what kind fits the work we want to do.

And there’ll never be a complete list of axioms for any interesting field, either. One of the unsettling discoveries of 20th Century logic was of incompleteness. Any consistent set of axioms interesting enough to cover the ability to do arithmetic, and orderly enough that a machine could list them all out, will have statements that would be meaningful, but that can’t be proven true or false. We might add some of these undecidable things to the set of axioms, if they seem useful. But we’ll always have other things not provably true or provably false.