I was all set to say how complaining about GoComics.com’s pages not loading had gotten them fixed. But they only worked for Monday alone; today they’re broken again. Right. I haven’t tried sending an error report again; we’ll see if that works. Meanwhile, I’m still not through last week’s comic strips, and I had just enough from one day to nearly justify an installment for that one day. I should finish off the rest of the week next essay, probably in time for next week.

Mark Leiknes’s Cow and Boy rerun for the 23rd circles around some of Zeno’s Paradoxes. At the heart of some of them is the question of whether a thing can be divided infinitely many times, or whether there must be some smallest amount of a thing. Zeno wonders about space and time, but you can do as well with substance, with matter. Mathematics majors like to say the problem is easy; Zeno just didn’t realize that a sum of infinitely many things could be a finite and nonzero number. This misses the good question of how the sum of infinitely many things, none of which are zero, can be anything but infinitely large? Or, put another way, what’s different in adding $\frac11 + \frac12 + \frac13 + \frac14 + \cdots$ and adding $\frac11 + \frac14 + \frac19 + \frac{1}{16} + \cdots$ that the one is infinitely large and the other not?
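To get a feel for the difference between those two sums, here is a minimal Python sketch that just adds up the first so-many terms of each. The function names and the cutoffs are my own choices for illustration, and a finite partial sum proves nothing by itself; it only shows the one total creeping ever upward while the other stalls.

```python
import math


def partial_sum(term, n_terms):
    """Add term(1) + term(2) + ... + term(n_terms)."""
    return sum(term(k) for k in range(1, n_terms + 1))


def harmonic(k):
    """Terms 1/1, 1/2, 1/3, 1/4, ..."""
    return 1.0 / k


def inverse_square(k):
    """Terms 1/1, 1/4, 1/9, 1/16, ..."""
    return 1.0 / (k * k)


for n in (10, 1_000, 100_000):
    print(n, partial_sum(harmonic, n), partial_sum(inverse_square, n))

# The harmonic partial sums keep pace with ln(n), which grows without bound.
# The inverse-square partial sums never get past pi^2 / 6.
print(math.log(100_000), math.pi ** 2 / 6)
```

Each harmonic term is nonzero and so is each inverse-square term. What differs is how fast the terms shrink.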

Or how about this. Pick your favorite string of digits. 23. 314. 271828. Whatever. Add together the series $\frac11 + \frac12 + \frac13 + \frac14 + \cdots$ except that you omit any terms that have your favorite string there. So, if you picked 23, don’t add $\frac{1}{23}$, or $\frac{1}{123}$, or $\frac{1}{802301}$ or such. That depleted series does converge. The heck is happening there? (Here’s why it’s true for a single digit being thrown out. Showing it’s true for longer strings of digits takes more work but not really different work.)
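The single-digit case comes down to counting, and the counting is easy to check by brute force. This is a sketch of the usual argument, assuming the banned digit is 9 (my choice; nothing above fixes it). Among the numbers with d digits, only $8 \cdot 9^{d-1}$ avoid a 9, and each of those is at least $10^{d-1}$, so their reciprocals add up to no more than $8 \cdot (9/10)^{d-1}$. Those bounds form a geometric series that sums to 80.

```python
def count_without_digit(num_digits, banned="9"):
    """Brute-force count of num_digits-digit integers whose decimal form avoids `banned`."""
    lo, hi = 10 ** (num_digits - 1), 10 ** num_digits
    return sum(1 for n in range(lo, hi) if banned not in str(n))


def block_sum(num_digits, banned="9"):
    """Sum of 1/n over the num_digits-digit integers that avoid `banned`."""
    lo, hi = 10 ** (num_digits - 1), 10 ** num_digits
    return sum(1.0 / n for n in range(lo, hi) if banned not in str(n))


total = 0.0
for d in range(1, 6):
    survivors = count_without_digit(d)
    bound = 8 * (9 / 10) ** (d - 1)  # valid for a banned digit of 9
    total += block_sum(d)
    print(d, survivors, 8 * 9 ** (d - 1), block_sum(d), bound)

print("running total so far:", total, "(the bounds sum to 80)")
```

The surviving numbers get rarer fast enough that each block of them contributes less than the block before, and the whole thing stays finite.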

J C Duffy’s Lug Nuts for the 23rd is, I think, the first time I have to give a content warning for one of these. It’s a porn-movie advertisement spoof. But it mentions Einstein and Pi and has the tagline “she didn’t go for eggheads … until he showed her a new equation!”. So, you know, it’s using mathematics skill as a signifier of intelligence and riffing on the idea that nerds like sex too.

John Graziano’s Ripley’s Believe It or Not for the 23rd has a trivia that made me initially think “not”. It notes Vince Parker, Senior and Junior, of Alabama were both born on Leap Day, the 29th of February. I’ll accept this without further proof because of the very slight harm that would befall me were I to accept this wrongly. But it also asserted this was a 1-in-2.1-million chance. That sounded wrong. Whether it is depends on what you think the chance is of.

Because what’s the remarkable thing here? That a father and son have the same birthday? Surely the chance of that is 1 in 365. The father could be born any day of the year; the son, also any day. Trusting there’s no influence of the father’s birthday on the son’s, then, 1 in 365 it is. Or, well, 1 in about 365.25, since there are leap days. There’s approximately one leap day every four years, so, surely that, right?

And not quite. In four years there’ll be 1,461 days. Four of them will be the 29th of January and four the 29th of September and four the 29th of August and so on. So if the father was born any day but leap day (a “non-bissextile day”, if you want to use a word that starts a good fight in a Scrabble match), the chance the son’s birth is the same is 4 chances in 1,461. 1 in 365.25. If the father was born on Leap Day, then the chance the son was born the same day is only 1 chance in 1,461. Still way short of 1-in-2.1-million. So, Graziano’s Ripley’s is wrong if that’s the chance we’re looking at.

Ah, but what if we’re looking at a different chance? What if we’re looking for the chance that the father is born the 29th of February and the son is also born the 29th of February? There’s a 1-in-1,461 chance the father’s born on Leap Day. And a 1-in-1,461 chance the son’s born on Leap Day. And if those events are independent, the father’s birth date not influencing the son’s, then the chance of both those together is indeed 1 in 2,134,521. So Graziano’s Ripley’s is right if that’s the chance we’re looking at.
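The two readings are easy to keep straight with exact fractions. This is a small Python sketch under the same assumptions as above: every day in a four-year cycle is equally likely, and the two births are independent. The variable names are mine.

```python
from fractions import Fraction

DAYS_IN_FOUR_YEARS = 4 * 365 + 1  # 1,461 days, exactly one of them a leap day

p_leap_day = Fraction(1, DAYS_IN_FOUR_YEARS)
p_given_ordinary_date = Fraction(4, DAYS_IN_FOUR_YEARS)

# Reading 1: the father's birthday is already known; does the son match it?
print("father on an ordinary date:", p_given_ordinary_date)  # 4/1461
print("father on leap day:        ", p_leap_day)             # 1/1461

# Reading 2: nothing is known in advance; both land on leap day.
p_both_on_leap_day = p_leap_day * p_leap_day
print("both on leap day:          ", p_both_on_leap_day)     # 1/2134521
```

Same family, same birthdays, and the answer moves by a factor of 1,461 depending on which event you meant.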

Which is a good reminder: if you want to work out the probability of some event, work out precisely what the event is. Ordinary language is ambiguous. This is usually a good thing. But it’s fatal to discussing probability questions sensibly.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 23rd presents his mathematician discovering a new set of numbers. This will happen. Mathematics has had great success, historically, finding new sets of things that look only a bit like numbers as they were understood. And showing that if they follow rules that are, as much as possible, like the old numbers, we get useful stuff out of them. The mathematician claims to be a formalist, in the punch line. This is a philosophy that considers mathematical results to be the things you get by starting with some symbols and some rules for manipulating them. What this stuff means, and whether it reflects anything of interest in the real world, isn’t of interest. We can know the results are good because they follow the rules.

This sort of approach can be fruitful. It can force you to accept results that are true but intuition-defying. And it can give results impressive confidence. You can even, at least in principle, automate the creating and the checking of logical proofs. The disadvantages are that it takes forever to get anything done. And it’s hard to shake the idea that we ought to have some idea what any of this stuff means.

## What I Learned Doing The Leap Day 2016 Mathematics A To Z

The biggest thing I learned in the recently concluded mathematics glossary is that continued fractions have enthusiasts. I hadn’t intended to cause controversy when I claimed they weren’t much used anymore. The most I have grounds to say is that the United States educational process as I experienced it doesn’t use them for more than a few special purposes. There is a general lesson there. While my experience may be typical, that doesn’t mean everyone’s is like it. There is a mystery to learn from in that.

The next big thing I learned was the Kullback-Leibler Divergence. I’m glad to know it now. And I would not have known it, I imagine, if it weren’t for my trying something novel and getting a fine result from it. That was throwing open the A To Z glossary to requests from readers. At least half the terms were ones that someone reading my original call had asked for.

And that was thrilling. The biggest point is that it gave me a greater feeling that I was communicating with specific people than most of the things I’ve written have. I understand that I have readers, and occasionally chat with some. This was a rare chance to feel engaged, though.

And getting asked things I hadn’t thought of, or in some cases hadn’t heard of, was great. It foiled the idea of two months’ worth of easy postings, but it made me look up and learn and think about a variety of things. And also to re-think them. My first drafts of the Dedekind Domain and the Kullback-Leibler divergence essays were completely scrapped, and the Jacobian made it through only with a lot of rewriting. I’ve been inclined to write with few equations and even fewer drawings around here. Part of that’s to be less intimidating. Part of that’s because of laziness. Some stuff is wonderfully easy to express in a sketch, but transferring that to a digital form is the heavy work of getting out the scanner and plugging it in. Or drawing from scratch on my iPad. Cleaning it up is even more work. So better to spend a thousand extra words on the setup.

But that seemed to work! I’m especially surprised that the Jacobian and the Lagrangian essays seemed to make sense without pictures or equations. Homomorphisms and isomorphisms were only a bit less surprising. I feel like I’ve been writing better thanks to this.

I do figure on another A To Z for sometime this summer. Perhaps I should open nominations already, and with a better-organized scheme for knocking out letters. Some people were disappointed (I suppose) by picking letters that had already got assigned. And I could certainly use time and help finding more x- and y-words. Q isn’t an easy one either.

## A Leap Day 2016 Mathematics A To Z: The Roundup

And with the conclusion of the alphabet I move now into posting about each of the counting numbers. … No, wait, that’s already being done. But I should gather together the A To Z posts in order that it’s easier to find them later on.

I mean to put together some thoughts about this A To Z. I haven’t had time yet. I can say that it’s been a lot of fun to write, even if after the first two weeks I was never as far ahead of deadline as I hoped to be. I do expect to run another one of these, although I don’t know when that will be. After I’ve had some chance to recuperate, though. It’s fun going two months without missing a day’s posting on my mathematics blog. But it’s also work and who wants that?

## A Leap Day 2016 Mathematics A To Z: Z-score

And we come to the last of the Leap Day 2016 Mathematics A To Z series! Z is a richer letter than x or y, but it’s still not so rich as you might expect. This is why I’m using a term that everybody figured I’d use the last time around, when I went with z-transforms instead.

## Z-Score

You get an exam back. You get an 83. Did you do well?

Hard to say. It depends on so much. If you expected to barely pass and maybe get as high as a 70, then you’ve done well. If you took the Preliminary SAT, with a composite score that ranges from 60 to 240, an 83 is catastrophic. If the instructor gave an easy test, you maybe scored right in the middle of the pack. If the instructor sees tests as a way to weed out the undeserving, you maybe had the best score in the class. It’s impossible to say whether you did well without context.

The z-score is a way to provide that context. It draws that context by comparing a single score to all the other values. And underlying that comparison is the assumption that whatever it is we’re measuring fits a pattern. Usually it does. The pattern we suppose stuff we measure will fit is the Normal Distribution. Sometimes it’s called the Standard Distribution. Sometimes it’s called the Standard Normal Distribution, so that you know we mean business. Sometimes it’s called the Gaussian Distribution. I wouldn’t rule out someone writing the Gaussian Normal Distribution. It’s also called the bell curve distribution. As the names suggest by throwing around “normal” and “standard” so much, it shows up everywhere.

A normal distribution means that whatever it is we’re measuring follows some rules. One is that there’s a well-defined arithmetic mean of all the possible results. And that arithmetic mean is the most common value to turn up. That’s called the mode. Also, this arithmetic mean, and mode, is also the median value. There’s as many data points less than it as there are greater than it. Most of the data values are pretty close to the mean/mode/median value. There’s some more as you get farther from this mean. But the number of data values far away from it is pretty tiny. You can, in principle, get a value that’s way far away from the mean, but it’s unlikely.

We call this standard because it might as well be. Measure anything that varies at all. Draw a chart with the horizontal axis all the values you could measure. The vertical axis is how many times each of those values comes up. It’ll be a standard distribution uncannily often. The standard distribution appears when the thing we measure satisfies some quite common conditions. Almost everything satisfies them, or nearly satisfies them. So we see bell curves so often when we plot how frequently data points come up. It’s easy to forget that not everything is a bell curve.

The normal distribution has a mean, and median, and mode, of 0. It’s tidy that way. And it has a standard deviation of exactly 1. The standard deviation is a way of measuring how spread out the bell curve is. About 95 percent of all observed results are less than two standard deviations away from the mean. About 99.7 percent of all observed results are less than three standard deviations away. Essentially all of them, short of a couple in a billion, are less than six standard deviations away. That last might sound familiar to those who’ve worked in manufacturing. At least it does once you know that the Greek letter sigma is the common shorthand for a standard deviation. “Six Sigma” is a quality-control approach. It’s meant to make sure one understands all the factors that influence a product and controls them. This is so the product falls outside the design specifications only about 0.0003 percent of the time, a target that budgets for the process drifting a standard deviation and a half off center.

This is the normal distribution. It has a standard deviation of 1 and a mean of 0, by definition. And then people using statistics go and muddle the definition. It is always so, with the stuff people actually use. Forgive them. It doesn’t really change the shape of the curve if we scale it, so that the standard deviation is, say, two, or ten, or π, or any positive number. It just changes where the tick marks are on the x-axis of our plot. And it doesn’t really change the shape of the curve if we translate it, adding (or subtracting) some number to it. That makes the mean, oh, 80. Or -15. Or $e^{\pi}$. Or some other number. That just changes what value we write underneath the tick marks on the plot’s x-axis. We can find a scaling and translation of the normal distribution that fits whatever data we’re observing.

When we find the z-score for a particular data point we’re undoing this translation and scaling. We figure out what number on the standard distribution maps onto the original data set’s value. About two-thirds of all data points are going to have z-scores between -1 and 1. About nineteen out of twenty will have z-scores between -2 and 2. About 99 out of 100 will have z-scores between -3 and 3. If we don’t see this, and we have a lot of data points, then that suggests our data isn’t normally distributed.
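Those rough fractions can be checked with nothing more than Python's standard library. For a normal distribution the share of values within k standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. The function name here is my own.

```python
import math


def fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"|z| < {k}: {fraction_within(k):.4f}")
# |z| < 1: 0.6827
# |z| < 2: 0.9545
# |z| < 3: 0.9973
```

So "two-thirds", "nineteen out of twenty", and "99 out of 100" are all fair roundings, with the last one a little pessimistic.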

I don’t know why the letter ‘z’ is used for this instead of, say, ‘y’ or ‘w’ or something else. ‘x’ is out, I imagine, because we use that for the original data. And ‘y’ is a natural pick for a second measured variable. ‘z’, I expect, is just far enough from ‘x’ it isn’t needed for some more urgent duty, while being close enough to ‘x’ to suggest it’s some measured thing.

The z-score gives us a way to compare how interesting or unusual scores are. If the exam on which we got an 83 has a mean of, say, 74, and a standard deviation of 5, then we can say this 83 is a pretty solid score. If it has a mean of 78 and a standard deviation of 10, then the score is better-than-average but not exceptional. If the exam has a mean of 70 and a standard deviation of 4, then the score is fantastic. We get to meaningfully compare scores from the measurements of different things. And so it’s one of the tools with which statisticians build their work.
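Here are those three exams worked out, as a short Python sketch. The z-score is the usual one: subtract the mean, then divide by the standard deviation. The function and variable names are mine.

```python
def z_score(value, mean, std_dev):
    """How many standard deviations `value` sits above (or below) the mean."""
    return (value - mean) / std_dev


my_score = 83
exams = [(74, 5), (78, 10), (70, 4)]  # (mean, standard deviation) for each exam

for mean, std_dev in exams:
    print(f"mean {mean}, sd {std_dev}: z = {z_score(my_score, mean, std_dev):.2f}")
# mean 74, sd 5: z = 1.80
# mean 78, sd 10: z = 0.50
# mean 70, sd 4: z = 3.25
```

The same 83 is 1.8, 0.5, or 3.25 standard deviations above the mean, which is the "pretty solid", "not exceptional", and "fantastic" of the paragraph above.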

## A Leap Day 2016 Mathematics A To Z: Yukawa Potential

Yeah, ‘Y’ is a lousy letter in the Mathematics Glossary. I have a half-dozen mathematics books on the shelf by my computer. Some is semi-popular stuff like Richard Courant and Herbert Robbins’s What Is Mathematics? (the Ian Stewart revision). Some is fairly technical stuff, by which I mean Hidetoshi Nishimori’s Statistical Physics of Spin Glasses and Information Processing. There’s just no ‘Y’ terms in any of them worth anything. But I can rope something into the field. For example …

## Yukawa Potential

When you as a physics undergraduate first take mechanics it’s mostly about very simple objects doing things according to one rule. The objects are usually these indivisible chunks. They’re either perfectly solid or they’re points, too tiny to have a surface area or volume that might mess things up. We draw them as circles or as blocks because they’re too hard to see on the paper or board otherwise. We spend a little time describing how they fall in a room. This lends itself to demonstrations in which the instructor drops a rubber ball. Then we go on to a mass on a spring hanging from the ceiling. Then to a mass on a spring attached to another mass.

Then we go on to two things sliding on a surface and colliding, which would really lend itself to bouncing pool balls against one another. Instead we use smaller solid balls. Sometimes those “Newton’s Cradle” things with the five balls that dangle from wires and just barely touch each other. They give a good reason to start talking about vectors. I mean positional vectors, the ones that say “stuff moving this much in this direction”. Normal vectors, that is. Then we get into stars and planets and moons attracting each other by gravity. And then we get into the stuff that really needs calculus. The earlier stuff is helped by it, yes. It’s just by this point we can’t do without.

The “things colliding” and “balls dropped in a room” are the odd cases in this. Most of the interesting stuff in an introduction to mechanics course is about things attracting, or repelling, other things. And, particularly, they’re particles that interact by “central forces”. Their attraction or repulsion is along the line that connects the two particles. (Impossible for a force to do otherwise? Just wait until Intro to Mechanics II, when magnetism gets in the game. After that, somewhere in a fluid dynamics course, you’ll see how a vortex interacts with another vortex.) The potential energies for these all vary with distance between the points.

Yeah, they also depend on the mass, or charge, or some kind of strength-constant for the points. They also depend on some universal constant for the strength of the interacting force. But those are, well, constant. If you move the particles closer together or farther apart the potential changes just by how much you moved them, nothing else.

Particles hooked together by a spring have a potential that looks like $\frac{1}{2}k r^2$. Here ‘r’ is how far the particles are from each other. ‘k’ is the spring constant; it’s just how strong the spring is. The one-half makes some other stuff neater. It doesn’t do anything much for us here. A particle attracted by another gravitationally has a potential that looks like $-G M \frac{1}{r}$. Again ‘r’ is how far the particles are from each other. ‘G’ is the gravitational constant of the universe. ‘M’ is the mass of the other particle. (The particle’s own mass doesn’t enter into it.) The electric potential looks like the gravitational potential but we have different symbols for stuff besides the $\frac{1}{r}$ bit.

The spring potential and the gravitational/electric potential have an interesting property. You can have “closed orbits” with a pair of them. You can set a particle orbiting another and, with time, get back to exactly the original positions and velocities. (Three or more particles you’re not guaranteed of anything.) The curious thing is this doesn’t always happen for potentials that look like “something or other times r to a power”. In fact, it never happens, except for the spring potential, the gravitational/electric potential, and — peculiarly — for the potential $k r^7$. ‘k’ doesn’t mean anything there, and we don’t put a one-seventh or anything out front for convenience, because nobody knows anything that needs anything like that, ever. We can have stable orbits, ones that stay within a minimum and a maximum radius, for a potential $k r^n$ whenever n is larger than -2, at least. And that’s it, for potentials that are nothing but r-to-a-power.

Ah, but does the potential have to be r-to-a-power? And here we see Dr Hideki Yukawa’s potential energy. Like these springs and gravitational/electric potentials, it varies only with the distance between particles. Its strength isn’t just the radius to a power, though. It uses a more complicated expression:

$-K \frac{e^{-br}}{r}$

Here ‘K’ is a scaling constant for the strength of the whole force. It’s the kind of thing we have ‘G M’ for in the gravitational potential, or ‘k’ in the spring potential. The ‘b’ is a second kind of scaling. And that’s a kind of range. A range of what? It’ll help to look at this potential rewritten a little. It’s the same as $-\left(K \frac{1}{r}\right) \cdot \left(e^{-br}\right)$. That’s the gravitational/electric potential, times $e^{-br}$. The product is a number that will be very large in size when r is small, but will drop to zero surprisingly quickly as r gets larger. How quickly will depend on b. The larger a number b is, the faster this drops to zero. The smaller a number b is, the slower this drops to zero. And if b is equal to zero, then $e^{-br}$ is equal to 1, and we have the gravitational/electric potential all over again.
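A few numbers make the role of b plainer. This is a minimal Python sketch of the formula above. The symbols K, b, and r are the ones in the text, while the sample values (K of 1, b of 0, 1, and 3, and the handful of distances) are arbitrary choices of mine with no physical units attached.

```python
import math


def yukawa(r, K=1.0, b=1.0):
    """Yukawa potential -K * exp(-b r) / r at distance r > 0."""
    return -K * math.exp(-b * r) / r


def inverse_r(r, K=1.0):
    """The gravitational/electric-style potential -K / r, for comparison."""
    return -K / r


print("r, -K/r, Yukawa b=0, Yukawa b=1, Yukawa b=3")
for r in (0.1, 1.0, 5.0, 20.0):
    print(r, inverse_r(r), yukawa(r, b=0.0), yukawa(r, b=1.0), yukawa(r, b=3.0))
```

The b = 0 column matches the plain $-K/r$ column exactly. The other two start out nearly as deep at small r and then flatten toward zero, the b = 3 one sooner than the b = 1 one.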

Yukawa introduced this potential to physics in the 1930s. He was trying to model the forces which keep an atom’s nucleus together. It represents the potential we expect from particles that attract one another by exchanging some particles with a rest mass. This rest mass is hidden within that number ‘b’ there. If the rest mass is zero, the particles are exchanging something like light, and that’s just what we expect for the electric potential. For the gravitational potential … um. It’s complicated. It’s one of the reasons why we expect that gravitons, if they exist, have zero rest mass. But we don’t know that gravitons exist. We have a lot of trouble making theoretical gravitons and quantum mechanics work together. I’d rather be skeptical of the things until we need them.

Still, the Yukawa potential is an interesting mathematical creature even if we ignore its important role in modern physics. When I took my Introduction to Mechanics final one of the exam problems was deriving the equivalent of Kepler’s Laws of Motion for the Yukawa Potential. I thought then it was a brilliant problem. I still do. It struck me while writing this that I don’t remember whether it allows for closed orbits, except when b is zero. I’m a bit afraid to try to work out whether it does, lest I learn that I can’t follow the reasoning for that anymore. That would be a terrible thing to learn.

## A Leap Day 2016 Mathematics A To Z: X-Intercept

Oh, x- and y-, why are you so poor in mathematics terms? I brave my way.

## X-Intercept.

I did not get much out of my eighth-grade, pre-algebra, class. I didn’t connect with the teacher at all. There were a few little bits to get through my disinterest. One came in graphing. Not graph theory, of course, but the graphing we do in middle school and high school. That’s where we find points on the plane with coordinates that make some expression true. Two major terms kept coming up in drawing curves of lines. They’re the x-intercept and the y-intercept. They had this lovely, faintly technical, faintly science-y sound. I think the teacher emphasized a few times they were “intercepts”, not “intersects”. But it’s hard to explain to an eighth-grader why this is an important difference to make. I’m not sure I could explain it to myself.

An x-intercept is a point where the plot of a curve and the x-axis meet. So we’re assuming this is a Cartesian coordinate system, the kind marked off with a pair of lines meeting at right angles. It’s usually two-dimensional, sometimes three-dimensional. I don’t know anyone who’s worried about the x-intercept for a four-dimensional space. Even higher dimensions are right out. The thing that confused me the most, when learning this, is a small one. The x-axis is points that have a y-coordinate of zero. Not an x-coordinate of zero. So in a two-dimensional space it makes sense to describe the x-intercept as a single value. That’ll be the x-coordinate, and the point with the x-coordinate of that and the y-coordinate of zero is the intercept.

If you have an expression and you want to find an x-intercept, you need to find values of x which make the expression equal to zero. We get the idea from studying lines. There are a couple of typical representations of lines. They almost always use x for the horizontal coordinate, and y for the vertical coordinate. The names are only different if the author is making a point about the arbitrariness of variable names. Sigh at such an author and move on. An x-intercept has a y-coordinate of zero, so, set any appearance of ‘y’ in the expression equal to zero and find out what value or values of x make this true. If the expression is an equation for a line there’ll be just the one point, unless the line is horizontal. (If the line is horizontal, then either every point on the x-axis is an intercept, or else none of them are. The line is either “y equals zero”, or it is “y equals something other than zero”.)
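For a line the recipe is short enough to write out. This sketch assumes the line is given as y = slope · x + offset, which is one of those typical representations; the function name and the way it reports the two horizontal cases are my own choices.

```python
def line_x_intercept(slope, offset):
    """x-intercept of the line y = slope * x + offset.

    Returns the single x value for a non-horizontal line, the string
    "every x" for the line y = 0, and None for any other horizontal line.
    """
    if slope != 0:
        # Set y to zero: 0 = slope * x + offset, so x = -offset / slope.
        return -offset / slope
    return "every x" if offset == 0 else None


print(line_x_intercept(2, -6))  # 3.0, since 2 * 3 - 6 = 0
print(line_x_intercept(0, 0))   # every x
print(line_x_intercept(0, 5))   # None
```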

There’s also a y-intercept. It is exactly what you’d imagine once you know that. It’s usually easier to find what the y-intercept is. The equation describing a curve is typically written in the form “y = f(x)”. That is, y is by itself on one side, and some complicated expression involving x’s is on the other. Working out what y is for a given x is straightforward. Working out what x is for a given y is … not hard, for a line. For more complicated shapes it can be difficult. There might not be a unique answer. That’s all right. There may be several x-intercepts.

There are a couple names for the x-intercepts. The one that turns up most often away from the pre-algebra and high school algebra study of lines is a “zero”. It’s one of those bits in which mathematicians seem to be trying to make it hard for students. A “zero” of the function f(x) is generally not what you get when you evaluate it for x equalling zero. Sorry about that. It’s the values of x for which f(x) equals zero. We also call them “roots”.
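Since the naming invites confusion, here is a small Python sketch that keeps the two apart. The function f(x) = x² − 5x + 6 is just an example I picked, and the quadratic formula stands in for whatever root-finding a messier curve would need.

```python
import math


def f(x):
    """An example curve: y = x^2 - 5x + 6."""
    return x * x - 5 * x + 6


def quadratic_zeros(a, b, c):
    """Real zeros of a x^2 + b x + c, smallest first (assumes a is not 0)."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []  # the curve never touches the x-axis
    root = math.sqrt(discriminant)
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


print("f(0), the y-intercept:", f(0))                       # 6
print("zeros, the x-intercepts:", quadratic_zeros(1, -5, 6))  # [2.0, 3.0]
```

The value at zero is 6. The zeros are 2 and 3. Nothing about either one tells you the other without doing the work.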

OK, but who cares?

Well, if you want to understand the shape of a curve, the way a function looks, it helps to plot it. Today, yeah, pull up Mathematica or Matlab or Octave or some other program and you get your plot. Fair enough. If you don’t have a computer that can plot like that, as I didn’t in middle school, you have to do it by hand. And then the intercepts are clues to how to sketch the function. They are, relatively, easy points which you can find, and which you know must be on the curve. We may form a very rough sketch of the curve. But that rough picture may be better than having nothing.

And we can learn about the behavior of functions even without plotting, or sketching a plot. Intercepts of expressions, or of parts of expressions, are points where the value might change from positive to negative. If the denominator of a part of the expression has an x-intercept, this could be a point where the function’s value is undefined. It may be a discontinuity in the function. The function’s values might jump wildly between one side and another. These are often the important things about understanding functions. Where are they positive? Where are they negative? Where are they continuous? Where are they not?

These are things we often want to know about functions. And we learn many of them by looking for the intercepts, x- and y-.

Wait for it.

## Wlog.

I’d like to say a good word for boredom. It needs the good words. The emotional state has an appalling reputation. We think it’s the sad state someone’s in when they can’t find anything interesting. It’s not. It’s the state in which we are so desperate for engagement that anything is interesting enough.

And that isn’t a bad thing! Finding something interesting enough is a precursor to noticing something curious. And curiosity is a precursor to discovery. And discovery is a precursor to seeing a fuller richness of the world.

Think of being stuck in a waiting room, deprived of reading materials or a phone to play with or much of anything to do. But there is a clock. Your classic analog-face clock. Its long minute hand sweeps out the full 360 degrees of the circle once every hour, 24 times a day. Its short hour hand sweeps out that same arc every twelve hours, only twice a day. Why is the big unit of time marked with the short hand? Good question, I don’t know. Probably, ultimately, because it changes so much less than the minute hand that it doesn’t need the attention of length drawn to it.

But let our waiting mathematician get a little more bored, and think more about the clock. The hour and minute hand must sometimes point in the same direction. They do at 12:00 by the clock, for example. And they will at … a little bit past 1:00, and a little more past 2:00, and a good while after 9:00, and so on. How many times during the day will they point the same direction?

Well, one easy way to do this is to work out how long it takes the hands, once they’ve met, to meet up again. Presumably we don’t want to wait the whole hour-and-some-more-time for it. But how long is that? Well, we know the hands start out pointing the same direction at 12:00. The first time after that will be after 1:00. At exactly 1:00 the hour hand is 30 degrees clockwise of the minute hand. The minute hand will need five minutes to catch up to that. In those five minutes the hour hand will have moved another 2.5 degrees clockwise. The minute hand needs about four-tenths of a minute to catch up to that. In that time the hour hand moves — OK, we’re starting to see why Zeno was not an idiot. He never was.

But we have this roughly worked out. It’s about one hour, five and a half minutes between one time the hands meet and the next. In the course of twelve hours there’ll be time for them to meet up … oh, of course, eleven times. Over the course of the day they’ll meet up 22 times and we can get into a fight over whether midnight counts as part of today, tomorrow, or both days, or neither. (The answer: pretend the day starts at 12:01.)
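The catching-up can be skipped if we think about how fast the minute hand gains on the hour hand. Here is a minimal Python sketch of that, using exact fractions so there's no rounding to argue about. It assumes an ideal clock whose hands move smoothly; the names are mine.

```python
from fractions import Fraction

MINUTE_HAND_DEG_PER_MIN = Fraction(360, 60)     # 6 degrees a minute
HOUR_HAND_DEG_PER_MIN = Fraction(360, 12 * 60)  # half a degree a minute

# The minute hand gains on the hour hand at 5.5 degrees a minute,
# and has to gain a full 360 degrees to lap it.
closing_speed = MINUTE_HAND_DEG_PER_MIN - HOUR_HAND_DEG_PER_MIN
minutes_between_meetings = Fraction(360) / closing_speed
meetings_in_12_hours = Fraction(12 * 60) / minutes_between_meetings


def as_clock_time(minutes_after_twelve):
    """Format an exact number of minutes past 12:00 as h:mm:ss.s."""
    whole_hours, minutes = divmod(minutes_after_twelve, 60)
    whole_minutes, leftover = divmod(minutes, 1)
    seconds = float(leftover * 60)
    return f"{whole_hours % 12 or 12}:{whole_minutes:02d}:{seconds:04.1f}"


print(minutes_between_meetings, "minutes, or about", float(minutes_between_meetings))
print(meetings_in_12_hours, "meetings in twelve hours")
for k in range(11):
    print(as_clock_time(k * minutes_between_meetings))
```

The interval comes out to 720/11 minutes, a bit under an hour, five and a half minutes, and eleven of them fill the twelve hours exactly.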

Hold on, though. How do we know that the time between the hands meeting up at 12:00 and the one at about 1:05 is the same as the time between the hands meeting up near 1:05 and the next one, sometime a little after 2:10? Or between that one and the one at a little past 3:15? What grounds do we have for saying this one interval is a fair representation of them all?

We can argue, fairly enough, that it should be. Imagine that all the markings were washed off the clock. It’s just two hands sweeping around in circles, one relatively fast, one relatively slow, forever. Give the clockface a spin. When the hands come together again rotate the clock so those two hands are vertical, the “12:00” position. Is this actually 12:00? … Well, we’ve got a one-in-eleven chance it is. It might be a little past 1:05; it might be that time something past 6:30. The movement of the clock hands gives no hint what time it really is.

And that is why we’re justified taking this one interval as representative of them all. The rate at which the hands move, relative to each other, doesn’t depend on what the clock face behind it says. The rate is, if the clock isn’t broken, always the same. So we can use information about one special case that happens to be easy to work out to handle all the cases.

That’s the mathematics term for this essay. We can study the one specific case without loss of generality, or as it’s inevitably abbreviated, wlog. This is the trick of studying something possibly complicated, possibly abstract, by looking for a representative case. That representative case may tell us everything we need to know, at least about this particular problem. Generality means what you might figure from the ordinary English meaning of it: it means this answer holds in general, as opposed to in this specific instance.

Some thought has to go into choosing the representative case. We have to pick something that doesn’t, somehow, miss out on a class of problems we would want to solve. We mustn’t lose the generality. And it’s an easy mistake to make, especially as a mathematics student first venturing into more abstract waters. I remember coming up against that often when trying to prove properties of infinitely long series. It’s so hard to reason something about a bunch of numbers whose identities I have no idea about; why can’t I just use the sequence, oh, 1/1, 1/2, 1/3, 1/4, et cetera and let that be good enough? Maybe 1/1, 1/4, 1/9, 1/16, et cetera for a second test, just in case? It’s because it takes time to learn how to safely handle infinities.

It’s still worth doing. Few of us are good at manipulating things in the abstract. We have to spend more mental energy imagining the thing rather than asking the questions we want of it. Reducing that abstraction — even if it’s just a little bit, changing, say, from “an infinitely-differentiable function” to “a polynomial of high enough degree” — can rescue us. We can try out things we’re confident we understand, and derive from it things we don’t know.

I can’t say that a bored person observing a clock would deduce all this. Parts of it, certainly. Maybe all, if she thought long enough. I believe it’s worth noticing and thinking of these kinds of things. And it’s why I believe it’s fine to be bored sometimes.

## Vector.

A vector’s a thing you can multiply by a number and then add to another vector.

Oh, I know what you’re thinking. Wasn’t a vector one of those things that points somewhere? A direction and a length in that direction? (Maybe dressed up in more formal language. I’m glad to see that apparently New Jersey Tech’s student newspaper is still The Vector and still uses the motto “With Magnitude And Direction”.) Yeah, that’s how we’re always introduced to it. Pointing to stuff is a good introduction to vectors. Nearly everyone finds their way around places. And it’s a good learning model, to learn how to multiply vectors by numbers and to add vectors together.

But thinking too much about directions, either in real-world three-dimensional space, or in the two-dimensional space of the thing we’re writing notes on, can be limiting. We can get too hung up on a particular representation of a vector. Usually that’s an ordered set of numbers. That’s all right as far as it goes, but why limit ourselves? A particular representation can be easy to understand, but as the scary people in the philosophy department have been pointing out for 26 centuries now, a particular example of a thing and the thing are not identical.

And if we look at vectors as “things we can multiply by a number, then add another vector to”, then we see something grand. We see a commonality in many different kinds of things. We can do this multiply-and-add with those things that point somewhere. Call those coordinates. But we can also do this with matrices, grids of numbers or other stuff it’s convenient to have. We can also do this with ordinary old numbers. (Think about it.) We can do this with polynomials. We can do this with sets of linear equations. We can do this with functions, as long as they’re defined for compatible domains. We can even do this with differential equations. We can see a unity in things that seem, at first, to have nothing to do with one another.
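Here is a minimal sketch of that multiply-and-add idea applied to polynomials. It assumes a polynomial is stored as a plain list of coefficients with the constant term first; that representation and the names `scale` and `add` are choices made for this illustration, not anything standard.

```python
from itertools import zip_longest

def scale(c, p):
    """Multiply the polynomial p (coefficients, constant term first) by the number c."""
    return [c * a for a in p]

def add(p, q):
    """Add two polynomials, padding the shorter coefficient list with zeros."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

p = [1, 0, 2]   # 1 + 2x^2
q = [0, 3]      # 3x
result = add(scale(2, p), q)
print(result)   # the coefficients of 2 + 3x + 4x^2
```

The result is another list of coefficients, which is to say another polynomial. Nothing about arrows or directions was needed.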

We call these collections of things “vector spaces”. It’s a space much like the space you happen to exist in is. Adding two things in the space together is much like moving from one place to another, then moving again. You can’t get out of the space. Multiplying a thing in the space by a real number is like going in one direction a short or a long or whatever great distance you want. Again you can’t get out of the space. This is called “being closed”.

(I know, you may be wondering if it isn’t question-begging to say a vector is a thing in a vector space, which is made up of vectors. It isn’t. We define a vector space as a set of things that satisfy a certain group of rules. The things in that set are the vectors.)

Vector spaces are nice things. They work much like ordinary space does. We can bring many of the ideas we know from spatial awareness to vector spaces. For example, we can usually define a “length” of things. And something that works like the “angle” between things. We can define bases, breaking down a particular element into a combination of standard reference elements. This helps us solve problems, by finding ways they’re shadows of things we already know how to solve. And it doesn’t take much to satisfy the rules of being a vector space. I think mathematicians studying new groups of objects look instinctively for how we might organize them into a vector space.

We can organize them further. A vector space that satisfies some rules about sequences of terms, and that has a “norm” which is pretty much a size, becomes a Banach space. It works a little more like ordinary three-dimensional space. A Banach space that has a norm defined by a certain common method is a Hilbert space. These work even more like ordinary space, but they don’t need anything in common with it. For example, the functions that describe quantum mechanics are in a Hilbert space. There’s a thing called a Sobolev Space, a kind of vector space that also meets criteria I forget, but the name has stuck with me for decades because it is so wonderfully assonant.

I mentioned how vectors are stuff you can multiply by numbers, and add to other vectors. That’s true, but it’s a little limiting. The thing we multiply a vector by is called a scalar. And the scalar is a number — real or complex-valued — so often it’s easy to think that’s the default. But it doesn’t have to be. The scalar just has to be an element of some field. A ‘field’ is a ring that you can do addition, multiplication, and division on. So numbers are the obvious choice. They’re not the only ones, though. The scalar has to be able to multiply with the vector, since otherwise the entire concept collapses into gibberish. But we wouldn’t go looking among the gibberish except to be funny anyway.

The idea of the ‘vector’ is straightforward and powerful. So we see it all over a wide swath of mathematics. It’s one of the things that shapes how we expect mathematics to look.

## A Leap Day 2016 Mathematics A To Z: Uncountable

I’m drawing closer to the end of the alphabet. While I have got choices for ‘V’ and ‘W’ set, I’ll admit that I’m still looking for something that inspires me in the last couple letters. Such inspiration might come from anywhere. HowardAt58, of that WordPress blog, gave me the notion for today’s entry.

## Uncountable.

What are we doing when we count things?

Maybe nothing. We might be counting just to be doing something. Or we might be counting because we want to do nothing. Counting can be a good way into a restful state. Fair enough. Just because we do something doesn’t mean we care about the result.

Suppose we do care about the result of our counting. Then what is it we do when we count? The mechanism is straightforward enough. We pick out things and say, or imagine saying, “one, two, three, four,” and so on. Or we at least imagine the numbers along with the things being numbered. When we run out of things to count, we take whatever the last number was. That’s how many of the things there were. Why are there eight light bulbs in the chandelier fixture above the dining room table? Because there are not nine.

That’s how lay people count anyway. Mathematicians would naturally have a more sophisticated view of the business. A much more powerful counting scheme. Concepts in counting that go far beyond what you might work out in first grade.

Yeah, so that’s what most of us would figure. Things don’t get much more sophisticated than that, though. This probably is because the idea of counting is tied to the theory of sets. And the theory of sets grew, in part, to come up with a logically solid base for arithmetic. So many of the key ideas of set theory are so straightforward they hardly seem to need explaining.

We build the idea of “countable” off of the nice, familiar numbers 1, 2, 3, and so on. That set’s called the counting numbers. They’re the numbers that everybody seems to recognize as numbers. Not just people. Even animals seem to understand at least the first couple of counting numbers. Sometimes these are called the natural numbers.

Take a set of things we want to study. We’re interested in whether we can match the things in that set one-to-one with the things in the counting numbers. We don’t have to use all the counting numbers. But we can’t use the same counting number twice. If we’ve matched one chandelier light bulb with the number ‘4’, we mustn’t match a different bulb with the same number. Similarly, if we’ve got one bulb matched to the number ‘4’, we mustn’t match that same bulb with another number at the same time.

If we can do this, then our set’s countable. If we really wanted, we could pick the counting numbers in order, starting from 1, and match up all the things with counting numbers. If we run out of things, then we have a finitely large set. The last number we used to match anything up with anything is the size, or in the jargon, the cardinality of our set. We might not care about the cardinality, just whether the set is finite. Then we can pick counting numbers as we like in no particular order. Just use whatever’s convenient.

But what if we don’t run out of things? And it’s possible we won’t. Suppose our set is the negative whole numbers: -1, -2, -3, -4, -5, and so on. We can match each of those to a counting number many ways. We always can. But there’s an easy way. Match -1 to 1, match -2 to 2, match -3 to 3, and so on. Why work harder than that? We aren’t going to run out of negative whole numbers. And we aren’t going to find any we can’t match with some counting number. And we aren’t going to have to match two different negative numbers to the same counting number. So what we have here is an infinitely large, yet still countable, set.

So a set of things can be countable and finite. It can be countable and infinite. What else is there to be?

There must be something. It’d be peculiar to have a classification that everything was in, after all. At least it would be peculiar except for people studying what it means to exist or to not exist. And most of those people are in the philosophy department, where we’re scared of visiting. So we must mean there’s some such thing as an uncountable set.

The idea means just what you’d guess if you didn’t know enough mathematics to be tricky. Something is uncountable if it can’t be counted. It can’t be counted if there’s no way to match it up, one thing-to-one thing, with the counting numbers. We have to somehow run out of counting numbers.

It’s not obvious that we can do that. Some promising approaches don’t work. For example, the set of all the integers — 1, 2, 3, 4, 5, and all that, and 0, and the negative numbers -1, -2, -3, -4, -5, and so on — is still countable. Match the counting number 1 to 0. Match the counting number 2 to 1. Match the counting number 3 to -1. Match 4 to 2. Match 5 to -2. Match 6 to 3. Match 7 to -3. And so on.
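The matching just described can be written as a rule. This is a sketch only, with made-up function names, that follows the pattern above: 1 goes to 0, even counting numbers go to the positive integers, and odd ones go to the negatives.

```python
def integer_for(n):
    """Return the integer matched to the counting number n (n = 1, 2, 3, ...)."""
    if n < 1:
        raise ValueError("counting numbers start at 1")
    # 1 -> 0, then even n -> n/2 and odd n -> -(n-1)/2.
    return n // 2 if n % 2 == 0 else -(n // 2)

def counting_number_for(z):
    """Return the counting number matched to the integer z (the rule run backwards)."""
    return 2 * z if z > 0 else 1 - 2 * z

print([integer_for(n) for n in range(1, 8)])
```

That the rule can be run backwards is the point: every integer gets exactly one counting number, and none is left out.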

Even ordered pairs of the counting numbers don’t do it. We can match the counting number 1 to the pair (1, 1). Match the counting number 2 to the pair (2, 1). Match the counting number 3 to (1, 2). Match 4 to (3, 1). Match 5 to (2, 2). Match 6 to (1, 3). Match 7 to (4, 1). Match 8 to (3, 2). And so on. We can achieve similar staggering results with ordered triplets, quadruplets, and more. Ordered pairs of integers, positive and negative? Longer to do, yes, but just as doable.
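The pattern in that matching is to walk through the pairs one diagonal at a time, grouping them by the sum of their two entries. A short sketch, with a generator name invented for the purpose:

```python
from itertools import count, islice

def pairs():
    """Yield every ordered pair of counting numbers, one diagonal at a time."""
    for total in count(2):            # total = a + b, starting from 1 + 1
        for b in range(1, total):
            yield (total - b, b)

first_eight = list(islice(pairs(), 8))
print(first_eight)
```

Each diagonal is finite, so any pair you care to name turns up after finitely many steps. That is all a countable matching requires.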

So are there any uncountable things?

Sure. Wouldn’t be here if there weren’t. For example: think about the set that’s all the ways to pick things from a set. I sense your confusion. Let me give you an example. Suppose we have the set of three things. They’re the numbers 1, 2, and 3. We can make a bunch of sets out of things from this set. We can make the set that just has ‘1’ in it. We can make the set that just has ‘2’ in it. Or the set that just has ‘3’ in it. We can also make the set that has just ‘1’ and ‘2’ in it. Or the set that just has ‘2’ and ‘3’ in it. Or the set that just has ‘3’ and ‘1’ in it. Or the set that has all of ‘1’, ‘2’, and ‘3’ in it. And we can make the set that hasn’t got any of these in it. (Yes, that does too count as a set.)

So from a set of three things, we were able to make a collection of eight sets. If we had a set of four things, we’d be able to make a collection of sixteen sets. With five things to start from, we’d be able to make a collection of thirty-two sets. This collection of sets we call the “power set” of our original set, and if there’s one thing we can say about it, it’s that it’s bigger than the set we start from.
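Those counts are easy to check for small sets. Here is one way to build a power set in Python using `itertools.combinations`; the function name `power_set` is just a label for this sketch.

```python
from itertools import chain, combinations

def power_set(items):
    """Return a list of every subset of items, from the empty set up to the whole set."""
    items = list(items)
    every_size = (combinations(items, k) for k in range(len(items) + 1))
    return [set(subset) for subset in chain.from_iterable(every_size)]

for n in (3, 4, 5):
    print(n, len(power_set(range(1, n + 1))))
```

Each extra element doubles the count, since every existing subset either includes the newcomer or doesn’t.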

The power set for a finite set, well, that’ll be much bigger. But it’ll still be finite. Still be countable. What about the power set for an infinitely large set?

And the power set of the counting numbers, the collection of all the ways you can make a set of counting numbers, is really big. Is it uncountably big?

Let’s step back. Remember when I said mathematicians don’t get “much more” sophisticated than matching up things to the counting numbers? Here’s a little bit of that sophistication. We don’t have to match stuff up to counting numbers if we like. We can match the things in one set to the things in another set. If it’s possible to match them up one-to-one, with nothing missing in either set, then the two sets have to be the same size. The same cardinality, in the jargon.

So. The set of the numbers 1, 2, 3, has to have a smaller cardinality than its power set. Want to prove it? Do this exactly the way you imagine. You run out of things in the original set before you run out of things in the power set, so there’s no making a one-to-one matchup between the two.

With the infinitely large yet countable set of the counting numbers … well, the same result holds. It’s harder to prove. You have to show that there’s no possible way to match the infinitely many things in the counting numbers to the infinitely many things in the power set of the counting numbers. (The easiest way to do this is by contradiction. Imagine that you have made such a matchup, pairing everything in your power set to everything in the counting numbers. Then you go through your matchup and put together a collection that isn’t accounted for. Whoops! So you must not have matched everything up in the first place. Why not? Because you can’t.)
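The “put together a collection that isn’t accounted for” step can be tried out on a small scale. The sketch below takes any attempted matchup, written as a Python dict from elements to subsets, and builds a subset the matchup misses. It only runs on finite matchups, of course; the proof applies the same construction, usually called a diagonal argument, to an infinite one. The names here are invented for illustration.

```python
def unmatched_subset(matching):
    """Given a dict sending each element to a subset, return a subset nothing is matched to."""
    # Put n in the new subset exactly when n is NOT in the subset it was matched to.
    # The result then differs from every matched subset in at least one element.
    return {n for n, subset in matching.items() if n not in subset}

attempt = {1: {1, 2}, 2: set(), 3: {1, 3}}
missing = unmatched_subset(attempt)
print(missing, missing in attempt.values())
```

Whatever matchup you feed in, the subset that comes out disagrees with the subset matched to n about whether n belongs. So it can’t be any of them.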

But the result holds. The power set of the counting numbers is some other set. It’s infinitely large, yes. And it’s so infinitely large that it’s somehow bigger than the counting numbers. It is uncountable.

There’s more than one uncountably large set. Of course there are. We even know of some of them. For example, there’s the set of real numbers. Three-quarters of my readers have been sitting anxiously for the past eight paragraphs wondering if I’d ever get to them. There’s good reason for that. Everybody feels like they know what the real numbers are. And the proof that the real numbers are a larger set than the counting numbers is easy to understand. An eight-year-old could master it. You can find that proof well-explained within the first ten posts of pretty much every mathematics blog other than this one. (I was saving the subject. Then I finally decided I couldn’t explain it any better than everyone else has done.)

The real numbers turn out to be the same size, the same cardinality, as the power set of the counting numbers. That much can be proven. So here’s a harder question. Is there any set that’s bigger than the counting numbers but smaller than the real numbers?

Sure, there is.

No, there’s not.

Whichever you like. This is one of the many surprising mathematical results of the surprising 20th century. Starting from the common set of axioms about set theory, it’s undecidable whether any set has a cardinality strictly between that of the counting numbers and that of the real numbers. You can assume that there is no such set. This is known as the Continuum Hypothesis. And you can do fine mathematical work with it. You can assume that there is one. This is known as the … uh … Rejecting the Continuum Hypothesis. And you can do fine mathematical work with that. What’s right depends on what work you want to do. Either is consistent with the starting axioms. You are free to choose either, or if you like, neither.

My understanding is that most set theory finds it more productive to suppose that the Continuum Hypothesis is false. I don’t know why this is. I know enough set theory to lead you to this point, but not past it.

But that the question can exist tells you something fascinating. You can take the power set of the power set of the counting numbers. And this gives you another, even vaster, uncountably large set. As enormous as the collection of all the ways to pick things out of the counting numbers is, this power set of the power set is even vaster.

We’re not done. There’s the power set of the power set of the power set of the counting numbers. And the power set of that. Much as geology teaches us to see Deep Time, and astronomy Deep Space, so power sets teach us to see Deep … something. Deep Infinity, perhaps.

## A Leap Day 2016 Mathematics A To Z: Transcendental Number

I’m down to the last seven letters in the Leap Day 2016 A To Z. It’s also the next-to-the-last of Gaurish’s requests. This was a fun one.

## Transcendental Number.

Take a huge bag and stuff all the real numbers into it. Give the bag a good solid shaking. Stir up all the numbers until they’re thoroughly mixed. Reach in and grab just the one. There you go: you’ve got a transcendental number. Enjoy!

OK, I detect some grumbling out there. The first is that you tried doing this in your head because you somehow don’t have a bag large enough to hold all the real numbers. And you imagined pulling out some number like “2” or “37” or maybe “one-half”. And you may not be exactly sure what a transcendental number is. But you’re confident the strangest number you extracted, “minus 8”, isn’t it. And you’re right. None of those are transcendental numbers.

I regret saying this, but that’s your own fault. You’re lousy at picking random numbers from your head. So am I. We all are. Don’t believe me? Think of a positive whole number. I predict you probably picked something between 1 and 10. Almost surely something between 1 and 100. Surely something less than 10,000. You didn’t even consider picking something between 10,012,002,214,473,325,937,775 and 10,012,002,214,473,325,937,785. Challenged to pick a number, people will select nice and familiar ones. The nice familiar numbers happen not to be transcendental.

I detect some secondary grumbling there. Somebody picked π. And someone else picked e. Very good. Those are transcendental numbers. They’re also nice familiar numbers, at least to people who like mathematics a lot. So they attract attention.

Still haven’t said what they are. What they are traces back, of course, to polynomials. Take a polynomial that’s got one variable, which we call ‘x’ because we don’t want to be difficult. Suppose that all the coefficients of the polynomial, the constant numbers we presumably know or could find out, are integers. What are the roots of the polynomial? That is, for what values of x is the polynomial a complicated way of writing ‘zero’?

For example, try the polynomial $x^2 - 6x + 5$. If x = 1, then that polynomial is equal to zero. If x = 5, the polynomial’s equal to zero. Or how about the polynomial $x^2 + 4x + 4$? That’s equal to zero if x is equal to -2. So a polynomial with integer coefficients can certainly have positive and negative integers as roots.

How about the polynomial 2x – 3? Yes, that is so a polynomial. This is almost easy. That’s equal to zero if x = 3/2. How about the polynomial (2x – 3)(4x + 5)(6x – 7)? It’s my polynomial and I want to write it so it’s easy to find the roots. That polynomial will be zero if x = 3/2, or if x = -5/4, or if x = 7/6. So a polynomial with integer coefficients can have positive and negative rational numbers as roots.
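If you would rather check those roots than take them on faith, exact fractions do the job. This sketch uses Python’s `fractions.Fraction` so nothing gets rounded; the helper name `evaluate` and the choice to list coefficients from the highest power down are assumptions made for the example.

```python
from fractions import Fraction

def evaluate(coefficients, x):
    """Evaluate a polynomial by Horner's rule; coefficients run from the highest power down."""
    value = 0
    for c in coefficients:
        value = value * x + c
    return value

# (2x - 3)(4x + 5)(6x - 7) multiplied out is 48x^3 - 68x^2 - 76x + 105.
cubic = [48, -68, -76, 105]
for root in (Fraction(3, 2), Fraction(-5, 4), Fraction(7, 6)):
    print(root, evaluate(cubic, root))
```

The multiplied-out form hides the roots, which is why the factored form is so much friendlier to write down.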

How about the polynomial $x^2 - 2$? That’s equal to zero if x is the square root of 2, about 1.414. It’s also equal to zero if x is minus the square root of 2, about -1.414. And the square root of 2 is irrational. So we can certainly have irrational numbers as roots.

So if we can have whole numbers, and rational numbers, and irrational numbers as roots, how can there be anything else? Yes, complex numbers, I see you raising your hand there. We’re not talking about complex numbers just now. Only real numbers.

It isn’t hard to work out why we can get any whole number, positive or negative, from a polynomial with integer coefficients. Or why we can get any rational number. The irrationals, though … it turns out we can only get some of them this way. We can get square roots and cube roots and fourth roots and all that. We can get combinations of those. But we can’t get everything. There are irrational numbers that are there but that even polynomials can’t reach.

It’s all right to be surprised. It’s a surprising result. Maybe even unsettling. Transcendental numbers have something peculiar about them. The 19th Century French mathematician Joseph Liouville first proved the things must exist, in 1844. (He used continued fractions to show there must be such things.) It would be seven years later that he gave an example of one in nice, easy-to-understand decimals. This is the number 0.110 001 000 000 000 000 000 001 000 000 (et cetera). This number is zero almost everywhere. But there’s a 1 in the n-th digit past the decimal if n is the factorial of some number. That is, 1! is 1, so the 1st digit past the decimal is a 1. 2! is 2, so the 2nd digit past the decimal is a 1. 3! is 6, so the 6th digit past the decimal is a 1. 4! is 24, so the 24th digit past the decimal is a 1. The next 1 will appear in spot number 5!, which is 120. After that, 6! is 720 so we wait for the 720th digit to be 1 again.
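The digit rule is simple enough to write out. Here is a sketch that produces the first so-many digits past the decimal point, following the description above; the function name is made up.

```python
def liouville_digits(n_digits):
    """Return the first n_digits digits past the decimal point, as a string of 0s and 1s."""
    # Collect every factorial that is no bigger than n_digits: 1, 2, 6, 24, 120, ...
    factorials = set()
    k, f = 1, 1
    while f <= n_digits:
        factorials.add(f)
        k += 1
        f *= k
    # The digit in position i is 1 exactly when i is one of those factorials.
    return "".join("1" if i in factorials else "0" for i in range(1, n_digits + 1))

print("0." + liouville_digits(30))
```

Ask for 720 digits and you will find only six of them are 1s. The gaps between them grow very fast.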

And what is this Liouville number 0.110 001 000 000 000 000 000 001 000 000 (et cetera) used for, besides showing that a transcendental number exists? Not a thing. It’s of no other interest. And this plagued the transcendental numbers until 1873. The only examples anyone had of transcendental numbers were ones built to show that they existed. In 1873 Charles Hermite showed finally that e, the base of the natural logarithm, was transcendental. e is a much more interesting number; we have reasons to care about it. Every exponential growth or decay or oscillating process has e lurking in it somewhere. In 1882 Ferdinand von Lindemann showed that π was transcendental, and that’s an even more interesting number.

That bit about π has interesting implications. One goes back to the ancient Greeks. Is it possible, using straightedge and compass, to create a square that’s exactly the same size as a given circle? This is equivalent to saying, if I give you a line segment, can you create another line segment that’s exactly the square root of π times as long? This geometric problem is equivalent to an algebraic one. That problem: can you create a polynomial, with integer coefficients, that has the square root of π as a root? (WARNING: I’m skipping some important points for the sake of clarity. DO NOT attempt to use this to pass your thesis defense without putting those points back in.) We want the square root of π because … well, what’s the area of a square whose sides are the square root of π long? That’s right. So we start with a line segment that’s equal to the radius of the circle and we can do that, surely. Once we have the radius, can’t we make a line that’s the square root of π times the radius, and from that make a square with area exactly π times the radius squared? Since π is transcendental, then, no. We can’t. Sorry. One of the great problems of ancient mathematics, and one that still has the power to attract the casual mathematician, got its final answer in 1882.

Georg Cantor is a name even non-mathematicians might recognize. He showed there have to be some infinite sets bigger than others, and that there must be more real numbers than there are rational numbers. Four years after showing that, he proved there are as many transcendental numbers as there are real numbers.

They’re everywhere. They permeate the real numbers so much that we can understand the real numbers as the transcendental numbers plus some dust. They’re almost the dark matter of mathematics. We don’t actually know all that many of them. Wolfram MathWorld has a table listing numbers proven to be transcendental, and the fact we can list that on a single web page is remarkable. Some of them are large sets of numbers, yes, like $e^{\pi \sqrt{d}}$ for every positive whole number d. And we can infer many more from them; if π is transcendental then so is 2π, and so is 5π, and so is -20.38π, and so on. But the table of numbers proven to be transcendental is still just 25 rows long.

There are even mysteries about obvious numbers. π is transcendental. So is e. We know that at least one of π times e and π plus e is transcendental. Perhaps both are. We don’t know which one is, or if both are. We don’t know whether $\pi^\pi$ is transcendental. We don’t know whether $e^e$ is, either. Don’t even ask if $\pi^e$ is.

How, by the way, does this fit with my claim that everything in mathematics is polynomials? — Well, we found these numbers in the first place by looking at polynomials. The set is defined, even to this day, by how a particular kind of polynomial can’t reach them. Thinking about a particular kind of polynomial makes visible this interesting set.

## A Leap Day 2016 Mathematics A To Z: Surjective Map

Gaurish today gives me one more request for the Leap Day Mathematics A To Z. And it lets me step away from abstract algebra again, into the world of analysis and what makes functions work. It also hovers around some of my past talk about functions.

## Surjective Map.

This request echoes one of the first terms from my Summer 2015 Mathematics A To Z. Then I’d spent some time on a bijection, or a bijective map. A surjective map is a less complicated concept. But if you understood bijective maps, you picked up surjective maps along the way.

By “map”, in this context, mathematicians don’t mean those diagrams that tell you where things are and how you might get there. Of course we don’t. By a “map” we mean that we have some rule that matches things in one set to things in another. If this sounds to you like what I’ve claimed a function is then you have a good ear. A mapping and a function are pretty much different names for one another. If there’s a difference in connotation I suppose it’s that a “mapping” makes a weaker suggestion that we’re necessarily talking about numbers.

(In some areas of mathematics, a mapping means a function with some extra properties, often some kind of continuity. Don’t worry about that. Someone will tell you when you’re doing mathematics deep enough to need this care. Mind, that person will tell you by way of a snarky follow-up comment picking on some minor point. It’s nothing personal. They just want you to appreciate that they’re very smart.)

So a function, or a mapping, has three parts. One is a set called the domain. One is a set called the range. And then there’s a rule matching things in the domain to things in the range. With functions we’re so used to the domain and range being the real numbers that we often forget to mention those parts. We go on thinking “the function” is just “the rule”. But the function is all three of these pieces.

A function has to match everything in the domain to something in the range. That’s by definition. There’s no unused scraps in the domain. If it looks like there is, that’s because we’re being sloppy in defining the domain. Or let’s be charitable. We assumed the reader understands the domain is only the set of things that make sense. And things make sense by being matched to something in the range.

Ah, but now, the range. The range could have unused bits in it. There’s nothing that inherently limits the range to “things matched by the rule to some thing in the domain”.

By now, then, you’ve probably spotted there have to be two kinds of functions. There’s one in which the whole range is used, and there’s ones in which it’s not. Good eye. This is exactly so.

If a function only uses part of the range, if it leaves out anything, even if it’s just a single value out of infinitely many, then the function is called an “into” mapping. If you like, it takes the domain and stuffs it into the range without filling the range.

Ah, but if a function uses every scrap of the range, with nothing left out, then we have an “onto” mapping. The whole of the domain gets sent onto the whole of the range. And this is also known as a “surjective” mapping. We get the term “surjective” from Nicolas Bourbaki. Bourbaki is/was the renowned 20th century mathematics art-collective group which did so much to place rigor and intuition-free bases into mathematics.

The term pairs up with the “injective” mapping. In this, the elements in the range match up with one and only one thing in the domain. So if you know the function’s rule, then if you know a thing in the range, you also know the one and only thing in the domain matched to that. If you don’t feel very French, you might call this sort of function one-to-one. That might be a better name for saying why this kind of function is interesting.

Not every function is injective. But then not every function is surjective either. But if a function is both injective and surjective — if it’s both one-to-one and onto — then we have a bijection. It’s a mapping that can represent the way a system changes and that we know how to undo. That’s pretty comforting stuff.
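For small finite sets these properties can be checked directly. In this sketch a function’s rule is a Python dict whose keys are the domain, and the range is passed in separately as a set. It assumes every value the rule produces really does lie in that range. The function names are invented for illustration.

```python
def is_surjective(rule, range_set):
    """True if every element of the range is matched by something in the domain."""
    return set(rule.values()) == set(range_set)

def is_injective(rule):
    """True if no two things in the domain are matched to the same thing."""
    return len(set(rule.values())) == len(rule)

def is_bijective(rule, range_set):
    """True if the rule is both one-to-one and onto."""
    return is_injective(rule) and is_surjective(rule, range_set)

squaring = {x: x * x for x in (-2, -1, 0, 1, 2)}
shifting = {x: x + 1 for x in (1, 2, 3)}

print(is_surjective(squaring, {0, 1, 4}), is_injective(squaring))
print(is_bijective(shifting, {2, 3, 4}), is_bijective(shifting, {2, 3, 4, 5}))
```

Notice that the same rule, `shifting`, is a bijection with one range and merely an “into” mapping with a bigger one. The range really is part of the function.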

If we use a mapping to describe how a process changes a system, then knowing whether it’s a surjective map tells us something about the process. If it isn’t surjective, the process makes the system settle into a subset of all the possible states. That doesn’t mean the thing is stable, that little jolts get worn down. And it doesn’t mean that the thing is settling to a fixed state. But it is a piece of information suggesting that’s possible. This may not seem like a strong conclusion. But considering how little we know about the function it’s impressive to be able to say that much.

## A Leap Day 2016 Mathematics A To Z: Riemann Sphere

To my surprise nobody requested any terms beginning with ‘R’ for this A To Z. So I take this free day to pick on a concept I’d imagine nobody saw coming.

## Riemann Sphere.

We need to start with the complex plane. This is just, well, a plane. All the points on the plane correspond to a complex-valued number. That’s a real number plus a real number times i. And i is one of those numbers which, squared, equals -1. It’s like the real number line, only in two directions at once.

Take that plane. Now put a sphere on it. The sphere has radius one-half. And it sits on top of the plane. Its lowest point, the south pole, sits on the origin. That’s whatever point corresponds to the number 0 + 0i, or as humans know it, “zero”.

We’re going to do something amazing with this. We’re going to make a projection, something that maps every point on the sphere to every point on the plane, and vice-versa. In other words, we can match every complex-valued number to one point on the sphere. And every point on the sphere to one complex-valued number. Here’s how.

Imagine sitting at the north pole. And imagine that you can see through the sphere. Pick any point on the plane. Look directly at it. Shine a laser beam, if that helps you pick the point out. The laser beam is going to go into the sphere — you’re squatting down to better look through the sphere — and come out somewhere on the sphere, before going on to the point in the plane. The point where the laser beam emerges? That’s the mapping of the point on the plane to the sphere.

There’s one point with an obvious match. The south pole is going to match zero. They touch, after all. Other points … it’s less obvious. But some are easy enough to work out. The equator of the sphere, for instance, is going to match all the points a distance of 1 from the origin. So it’ll have the point matching the number 1 on it. It’ll also have the point matching the number -1, and the point matching i, and the point matching -i. And some other numbers.
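The laser-beam picture turns into a formula without much fuss. The sketch below is worked out from the setup described here, a sphere of radius one-half resting on the origin with its north pole at height 1. It is not taken from any particular textbook, and the function names are made up. The line from the north pole to the plane point (x, y, 0) crosses the sphere a fraction 1/(1 + x² + y²) of the way along.

```python
def plane_to_sphere(z):
    """Map a complex number to the point (X, Y, Z) on the sphere of radius 1/2 resting on the origin."""
    r2 = abs(z) ** 2
    # The line from the north pole (0, 0, 1) to (x, y, 0) meets the sphere
    # where the parameter t along the line equals 1 / (1 + |z|^2).
    t = 1 / (1 + r2)
    return (t * z.real, t * z.imag, r2 * t)

def sphere_to_plane(point):
    """Map a point on the sphere, other than the north pole, back to its complex number."""
    X, Y, Z = point
    if Z == 1:
        raise ValueError("the north pole has no matching complex number")
    return complex(X / (1 - Z), Y / (1 - Z))

for z in (0, 1, -1, 1j, -1j, 0.5, 3 + 4j):
    print(z, plane_to_sphere(z))
```

The third coordinate is the height above the plane. It comes out to exactly one-half, the equator, for the numbers a distance 1 from the origin, less than that for numbers closer in, and more for numbers farther out.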

All the numbers that are less than 1 from the origin, in fact, will have matches somewhere in the southern hemisphere. If you don’t see why that is, draw some sketches and think about it. You’ll convince yourself. If you write down what convinced you and sprinkle the word “continuity” in here and there, you’ll convince a mathematician. (WARNING! Don’t actually try getting through your Intro to Complex Analysis class doing this. But this is what you’ll be doing.)

What about the numbers more than 1 from the origin? … Well, they all match to points on the northern hemisphere. And tell me that doesn’t stagger you. It’s one thing to match the southern hemisphere to all the points in a circle of radius 1 away from the origin. But we can match everything outside that little circle to the northern hemisphere. And it all fits in!
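If you'd rather check the matching with arithmetic than with sketches, here is a minimal Python sketch. It assumes coordinates the essay doesn't spell out: the plane is the set of points at height 0, the sphere has its center at height 1/2, and the north pole sits at (0, 0, 1). The function names `plane_to_sphere` and `sphere_to_plane` are made up for this illustration.

```python
def plane_to_sphere(z):
    """Match a complex number to a point (x, y, height) on the sphere.

    Assumed setup: sphere of radius 1/2 resting on the origin, with the
    laser beam running from the north pole (0, 0, 1) to the point z.
    """
    z = complex(z)
    r_squared = z.real ** 2 + z.imag ** 2
    # Fraction of the way from the north pole to z where the beam
    # leaves the sphere.
    t = 1.0 / (1.0 + r_squared)
    return (t * z.real, t * z.imag, r_squared * t)


def sphere_to_plane(point):
    """Go the other way: from a point on the sphere back to the plane."""
    x, y, height = point
    if height >= 1.0:
        raise ValueError("the north pole has no matching complex number")
    scale = 1.0 / (1.0 - height)
    return complex(x * scale, y * scale)


if __name__ == "__main__":
    for z in (0, 1, -1, 1j, -1j, 0.5, 3 + 4j, 1000):
        print(z, "->", plane_to_sphere(z))
```

The third coordinate is the height above the plane. Heights below 1/2 are the southern hemisphere, a height of exactly 1/2 is the equator, and the height creeps toward 1 as the number gets farther from the origin.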

Not amazed enough? How about this: draw a circle on the plane. Then look at the points on the Riemann sphere that match it. That set of points? It’s also a circle. A line on the plane? That’s a circle on the sphere too, one that passes through the north pole. (If the line runs through the origin, the circle is a geodesic. That’s the thing that looks like a line, on spheres.)

How about this? Take a pair of intersecting lines or circles in the plane. Look at what they map to. That mapping, squashed as it might be to the northern hemisphere of the sphere? The projection of the lines or circles will intersect at the same angles as the original. As much as space gets stretched out (near the south pole) or squashed down (near the north pole), angles stay intact.

OK, but besides being stunning, what good is all this?

Well, one is that it’s a good thing to learn on. Geometry gets interested in things that look, at least in places, like planes, but aren’t necessarily. These spheres are, and the way a sphere matches a plane is obvious. We can learn the tools for geometry on the Möbius strip or the Klein bottle or other exotic creations by the tools we prove out on this.

And then physics comes in, being all weird. Much of quantum mechanics makes sense if you imagine it as things on the sphere. (I admit I don’t know exactly how. I went to grad school in mathematics, not in physics, and I didn’t get to the physics side of mathematics much at that time.) The strange ways distance can get mushed up or stretched out have echoes in relativity. They’ll continue having these echoes in other efforts to explain physics as geometry, the way that string theory will.

Also important is that the sphere has a top, the north pole. That point matches … well, what? It’s got to be something infinitely far away from the origin. And this makes sense. We can use this projection to make a logically coherent, sensible description of things “approaching infinity”, the way we want to when we first learn about infinitely big things. Wrapping all the complex-valued numbers to this ball makes the vast manageable.

It’s also good numerical practice. Computer simulations have problems with infinitely large things, for the obvious reason. We have a couple of tools to handle this. One is to model a really big but not infinitely large space and hope we aren’t breaking anything. One is to create a “tiling”, making the space we are able to simulate repeat itself in a perfect grid forever and ever. But recasting the problem from the infinitely large plane onto the sphere can also work. This requires some ingenuity, to be sure we do the recasting correctly, but that’s all right. If we need to run a simulation over all of space, we can often get away with doing a simulation on a sphere. And isn’t that also grand?

The Riemann named here is Bernhard Riemann, yet another of those absurdly prolific 19th century mathematicians, especially considering how young he was when he died. His name is all over the fundamentals of analysis and geometry. When you take Introduction to Calculus you get introduced pretty quickly to the Riemann Sum, which is how we first learn how to calculate integrals. It’s that guy. General relativity, and much of modern physics, is based on advanced geometries that again fall back on principles Riemann noticed or set out or described so well that we still think of them as his discoveries.

## A Leap Day 2016 Mathematics A To Z: Quaternion

I’ve got another request from Gaurish today. And it’s a word I had been thinking to do anyway. When one looks for mathematical terms starting with ‘q’ this is one that stands out. I’m a little surprised I didn’t do it for last summer’s A To Z. But here it is at last:

## Quaternion.

I remember the seizing of my imagination the summer I learned imaginary numbers. If we could define a number i, so that i-squared equalled negative 1, and work out arithmetic which made sense out of that, why not do it again? Complex-valued numbers are great. Why not something more? Maybe we could also have some other non-real number. I reached deep into my imagination and picked j as its name. It could be something else. Maybe the logarithm of -1. Maybe the square root of i. Maybe something else. And maybe we could build arithmetic with a whole second other non-real number.

My hopes of this brilliant idea petered out over the summer. It’s easy to imagine a super-complex number, something that’s “1 + 2i + 3j”. And it’s easy to work out adding two super-complex numbers like this together. But multiplying them together? What should i times j be? I couldn’t solve the problem. Also I learned that we didn’t need another number to be the logarithm of -1. It would be π times i. (Or some other numbers. There’s some surprising stuff in logarithms of negative or of complex-valued numbers.) We also don’t need something special to be the square root of i, either. $\frac{1}{2}\sqrt{2} + \frac{1}{2}\sqrt{2}\imath$ will do. (So will another number.) So I shelved the project.
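A quick numerical check of those two claims, using Python's standard `cmath` module. This is only a sanity check in floating point, so expect answers that are very nearly, rather than exactly, right. The variable name `root` is mine.

```python
import cmath
import math

# The candidate square root of i given above.
root = complex(math.sqrt(2) / 2, math.sqrt(2) / 2)

print(root * root)        # very nearly i
print((-root) * (-root))  # the "other number": the negative works too

# The principal logarithm of -1 ...
print(cmath.log(-1))
# ... and one of the "other numbers" whose exponential is also -1.
print(cmath.exp(3j * math.pi))
```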

Even if I hadn’t given up, I wouldn’t have invented something. Not along those lines. Finer minds had done the same work and had found a way to do it. The most famous of these is the quaternions. It has a famous discovery. Sir William Rowan Hamilton — the namesake of “Hamiltonian mechanics”, so you already know what a fantastic mind he was — had a flash of insight that’s come down in the folklore and romance of mathematical history. He had the idea on the 16th of October, 1843, while walking with his wife along the Royal Canal, in Dublin, Ireland. While walking across the bridge he saw what was missing. It seems he lacked pencil and paper. He carved it into the bridge:

$i^2 = j^2 = k^2 = ijk = -1$

The bridge now has a plaque commemorating the moment. You can’t make a sensible system with two non-real numbers. But three? Three works.

And they are a mysterious three! i, j, and k are somehow not the same number. But each of them, multiplied by themselves, gives us -1. And the product of the three is -1. They are even more mysterious. To work sensibly, i times j can’t be the same thing as j times i. Instead, i times j equals minus j times i. And j times k equals minus k times j. And k times i equals minus i times k. We must give up commutativity, the idea that the order in which we multiply things doesn’t matter.

But if we’re willing to accept that the order matters, then quaternions are well-behaved things. We can add and subtract them just as we would think to do if we didn’t know they were strange constructs. If we keep the funny rules about the products of i and j and k straight, then we can multiply them as easily as we multiply polynomials together. We can even divide them. We can do all the things we do with real numbers, only with these odd sets of four real numbers.
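Here is a small Python sketch of that arithmetic. It assumes we store the quaternion a + bi + cj + dk as the tuple (a, b, c, d). The names `qmul` and `qinv` are just labels for this illustration, and the multiplication rule is what you get by expanding the product and applying Hamilton's relations.

```python
def qmul(p, q):
    """Multiply two quaternions stored as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qinv(q):
    """Reciprocal of a nonzero quaternion: its conjugate over its squared size."""
    a, b, c, d = q
    size_squared = a * a + b * b + c * c + d * d
    return (a / size_squared, -b / size_squared,
            -c / size_squared, -d / size_squared)


ONE = (1, 0, 0, 0)
MINUS_ONE = (-1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

if __name__ == "__main__":
    print(qmul(I, I), qmul(J, J), qmul(K, K))  # each should be minus one
    print(qmul(qmul(I, J), K))                 # ijk
    print(qmul(I, J), qmul(J, I))              # the order matters
    q = (1, 2, 3, 4)
    print(qmul(q, qinv(q)))                    # roughly (1, 0, 0, 0)
```

Division here means multiplying by the reciprocal. Since order matters, multiplying by the reciprocal on the left and on the right can give different answers for two different quaternions.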

The way they look, that pattern of 1 + 2i + 3j + 4k, makes them look a lot like vectors. And we can use them like vectors pointing to stuff in three-dimensional space. It’s not quite a comfortable fit, though. That plain old real number at the start of things seems like it ought to signify something, but it doesn’t. In practice, it doesn’t give us anything that regular old vectors don’t. And vectors allow us to ponder not just three- or maybe four-dimensional spaces, but as many as we need. You might wonder why we need more than four dimensions, even allowing for time. It’s because if we want to track a lot of interacting things, it’s surprisingly useful to put them all into one big vector in a very high-dimension space. It’s hard to draw, but the mathematics is nice. Hamiltonian mechanics, particularly, almost beg for it.

That’s not to call them useless, or even a niche interest. They do some things fantastically well. One of them is rotations. We can represent rotating a point around an arbitrary axis by an arbitrary angle as the multiplication of quaternions. There are many ways to calculate rotations. But if we need to do three-dimensional rotations this is a great one because it’s easy to understand and easier to program. And as you’d imagine, being able to calculate what rotations do is useful in all sorts of applications.
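A minimal sketch of the rotation trick, in Python. It uses the standard recipe: build a quaternion q from the axis and half the angle, pad the point out to a quaternion with a zero real part, and compute q times the point times the conjugate of q. The function names are hypothetical, and I'm assuming the usual right-handed convention for which way a positive angle turns.

```python
import math


def qmul(p, q):
    """Multiply two quaternions stored as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def rotate(point, axis, angle):
    """Rotate a 3-D point about an axis through the origin, by angle radians."""
    ax, ay, az = axis
    axis_length = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / axis_length
    q = (math.cos(angle / 2.0), ax * s, ay * s, az * s)
    q_conjugate = (q[0], -q[1], -q[2], -q[3])
    as_quaternion = (0.0, point[0], point[1], point[2])
    _, x, y, z = qmul(qmul(q, as_quaternion), q_conjugate)
    return (x, y, z)


if __name__ == "__main__":
    # A quarter turn about the vertical axis.
    print(rotate((1, 0, 0), (0, 0, 1), math.pi / 2))
    # A third of a turn about the diagonal (1, 1, 1).
    print(rotate((1, 0, 0), (1, 1, 1), 2 * math.pi / 3))
```

The results come out in floating point, so a coordinate that ought to be 0 may print as something tiny instead.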

They’ve got good uses in number theory too, as they correspond well to the different ways to solve problems, often polynomials. They’re also popular in group theory. They might be the simplest rings that work like arithmetic but that don’t commute. So they can serve as ways to learn properties of more exotic ring structures.

Knowing of these marvelous exotic creatures of the deep mathematics your imagination might be fired. Can we do this again? Can we make something with, say, four unreal numbers? No, no we can’t. Four won’t work. Nor will five. If we keep going, though, we do hit upon success with seven unreal numbers.

This is a set called the octonions. Hamilton had barely worked out the scheme for quaternions when John T Graves, a friend of his at least up through the 16th of December, 1843, wrote of this new scheme. (Graves didn’t publish before Arthur Cayley did. Cayley’s one of those unspeakably prolific 19th century mathematicians. He has at least 967 papers to his credit. And he was a lawyer doing mathematics on the side for about 250 of those papers. This depresses every mathematician who ponders it these days.)

But where quaternions are peculiar, octonions are really peculiar. Let me call a couple quaternions p, q, and r. p times q might not be the same thing as q times p. But p times the product of q and r will be the same thing as the product of p and q itself times r. This we call associativity. Octonions don’t have that. Let me call a couple octonions s, t, and u. s times the product of t times u may be either positive or negative the product of s and t times u. (It depends.)

Octonions have some neat mathematical properties. But I don’t know of any general uses for them that are as catchy as understanding rotations. Not rotations in the three-dimensional world, anyway.

Yes, yes, we can go farther still. There’s a construct called “sedenions”, which have fifteen non-real numbers on them. That’s 16 terms in each number. Where octonions are peculiar, sedenions are really peculiar. They work even less like regular old numbers than octonions do. With octonions, at least, when you multiply s by the product of s and t, you get the same number as you would multiplying s by s and then multiplying that by t. Sedenions don’t even offer that shred of normality. Besides being a way to learn about abstract algebra structures I don’t know what they’re used for.

I also don’t know of further exotic terms along this line. It would seem to fit a pattern if there’s some 32-term construct that we can define something like multiplication for. But it would presumably be even less like regular multiplication than sedenion multiplication is. If you want to fiddle about with that please do enjoy yourself. I’d be interested to hear if you turn up anything, but I don’t expect it’ll revolutionize the way I look at numbers. Sorry. But the discovery might be the fun part anyway.

## A Leap Day 2016 Mathematics A To Z: Polynomials

I have another request for today’s Leap Day Mathematics A To Z term. Gaurish asked for something exciting. This should be less challenging than Dedekind Domains. I hope.

## Polynomials.

Polynomials are everything. Everything in mathematics, anyway. If humans study it, it’s a polynomial. If we know anything about a mathematical construct, it’s because we ran across it while trying to understand polynomials.

I exaggerate. A tiny bit. Maybe by three percent. But polynomials are big.

They’re easy to recognize. We can get them in pre-algebra. We make them out of a set of numbers called coefficients and one or more variables. The coefficients are usually either real numbers or complex-valued numbers. The variables we usually allow to be either real or complex-valued numbers. We take each coefficient and multiply it by some power of each variable. And we add all that up. So, polynomials are things that look like these things:

$x^2 - 2x + 1$
$12 x^4 + 2\pi x^2 y^3 - 4x^3 y - \sqrt{6}$
$\ln(2) + \frac{1}{2}\left(x - 2\right) - \frac{1}{2 \cdot 2^2}\left(x - 2\right)^2 + \frac{1}{3 \cdot 2^3}\left(x - 2\right)^3 - \frac{1}{4 \cdot 2^4}\left(x - 2\right)^4 + \cdots$
$a_n x^n + a_{n - 1}x^{n - 1} + a_{n - 2}x^{n - 2} + \cdots + a_2 x^2 + a_1 x^1 + a_0$

The first polynomial maybe looks nice and comfortable. The second may look a little threatening, what with it having two variables and a square root in it, but it’s not too weird. The third is an infinitely long polynomial; you’re supposed to keep going on in that pattern, adding even more terms. The last is a generic representation of a polynomial. Each number $a_0$, $a_1$, $a_2$, et cetera is some coefficient that we in principle know. It’s a good way of representing a polynomial when we want to work with it but don’t want to tie ourselves down to a particular example. The highest power we raise a variable to we call the degree of the polynomial. A second-degree polynomial, for example, has an $x^2$ in it, but not an $x^3$ or $x^4$ or $x^{18}$ or anything like that. A third-degree polynomial has an $x^3$, but not x to any higher powers. Degree is a useful way of saying roughly how long a polynomial is, so it appears all over discussions of polynomials.

But why do we like polynomials? Why like them so much that MathWorld lists 1,163 pages that mention polynomials?

It’s because they’re great. They do everything we’d ever want to do and they’re great at it. We can add them together as easily as we add regular old numbers. We can subtract them as well. We can multiply and divide them. There’s even prime polynomials, just like there are prime numbers. They take longer to work out, but they’re not harder.

And they do great stuff in advanced mathematics too. In calculus we want to take derivatives of functions. Polynomials, we always can. We get another polynomial out of that. So we can keep taking derivatives, as many as we need. (We might need a lot of them.) We can integrate too. The integration produces another polynomial. So we can keep doing that as long as we need too. (We need to do this a lot, too.) This lets us solve so many problems in calculus, which is about how functions work. It also lets us solve so many problems in differential equations, which is about systems whose change depends on the current state of things.
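To show how mechanical this is, here is a Python sketch. It assumes we store a one-variable polynomial as a list of coefficients, with `coeffs[k]` the coefficient on x to the k-th power. That storage choice and the function names are mine, not anything standard. I use `Fraction` so that integrating whole-number coefficients stays exact.

```python
from fractions import Fraction


def derivative(coeffs):
    """Differentiate a polynomial given as [a_0, a_1, a_2, ...]."""
    # The term a_k x^k becomes k a_k x^(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]


def antiderivative(coeffs):
    """Integrate term by term, taking the constant of integration to be 0."""
    # The term a_k x^k becomes a_k / (k + 1) times x^(k+1).
    return [Fraction(0)] + [Fraction(c, k + 1) for k, c in enumerate(coeffs)]


if __name__ == "__main__":
    p = [1, -2, 1]  # x^2 - 2x + 1, constant term first
    print(derivative(p))
    print(derivative(derivative(p)))
    print(antiderivative(derivative(p)))
```

Both functions hand back another coefficient list, which is the whole point: the answer is a polynomial again, so we can feed it right back in.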

That’s great for analyzing polynomials, but what about things that aren’t polynomials?

Well, if a function is continuous, then it might as well be a polynomial. To be a little more exact, we can set a margin of error. And we can always find polynomials that are less than that margin of error away from the original function. The original function might be annoying to deal with. The polynomial that’s as close to it as we want, though, isn’t.

Not every function is continuous. Most of them aren’t. But most of the functions we want to do work with are, or at least are continuous in stretches. Polynomials let us understand the functions that describe most real stuff.

Nice for mathematicians, all right, but how about for real uses? How about for calculations?

Oh, polynomials are just magnificent. You know why? Because you can evaluate any polynomial as soon as you can add and multiply. (Also subtract, but we think of that as addition.) Remember, $x^4$ just means “x times x times x times x”, four of those x’s in the product. All these polynomials are easy to evaluate.

Even better, we don’t have to evaluate them. We can automate away the evaluation. It’s easy to set a calculator doing this work, and it will do it without complaint and with few unforeseeable mistakes.
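Here is one way to automate it, sketched in Python. The essay doesn't name a method; this uses Horner's method, a common choice that needs exactly one multiplication and one addition per coefficient. The list layout, with `coeffs[k]` as the coefficient on x to the k-th power, is my assumption.

```python
def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + a_2 x^2 + ... using only adds and multiplies."""
    total = 0
    # Work from the highest power down: ((a_n x + a_(n-1)) x + ...) x + a_0.
    for coefficient in reversed(coeffs):
        total = total * x + coefficient
    return total


if __name__ == "__main__":
    p = [1, -2, 1]  # x^2 - 2x + 1, constant term first
    for x in (0, 1, 2, 3):
        print(x, evaluate(p, x))
```

Notice there are no powers computed anywhere. The repeated multiply-then-add builds them up along the way.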

Now remember that thing where we can make a polynomial close enough to any continuous function? And we can always set a calculator to evaluate a polynomial? Guess what this means about continuous functions. We have a tool that lets us calculate stuff we would want to know. Things like arccosines and logarithms and Bessel functions and all that. And we get nice easy to understand numbers out of them. For example, that third polynomial I gave you above? That’s not just infinitely long. It’s also a polynomial that approximates the natural logarithm. Pick a positive number x that’s between 0 and 4 and put it in that polynomial. Calculate terms and add them up. You’ll get closer and closer to the natural logarithm of that number. You’ll get there faster if you pick a number near 2, but you’ll eventually get there for whatever number you pick. (Calculus will tell us why x has to be between 0 and 4. Don’t worry about it for now.)
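You can watch this happen. The sketch below adds up the first several terms of the logarithm series centered on 2, whose n-th term is $(-1)^{n+1}\left(x - 2\right)^n / \left(n \cdot 2^n\right)$, and compares against the library's logarithm. The function name `log_near_two` is made up, and I cheat slightly by asking the library for the constant term, the natural logarithm of 2.

```python
import math


def log_near_two(x, terms):
    """Add up ln(2) plus the first `terms` terms of the series about 2."""
    total = math.log(2.0)  # the constant coefficient of the polynomial
    power = 1.0
    for n in range(1, terms + 1):
        power *= (x - 2.0) / 2.0  # now equals ((x - 2) / 2) ** n
        sign = 1.0 if n % 2 == 1 else -1.0
        total += sign * power / n
    return total


if __name__ == "__main__":
    for x in (2.1, 3.0, 3.9):
        for terms in (2, 5, 20, 80):
            error = abs(log_near_two(x, terms) - math.log(x))
            print(x, terms, error)
```

The errors shrink much faster for 2.1 than for 3.9, which is the "faster if you pick a number near 2" part in action.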

So through polynomials we can understand functions, analytically and numerically.

And they keep revealing things to us. We discovered complex-valued numbers because we wanted to find roots, values of x that make a polynomial of x equal to zero. Some formulas worked well for third- and fourth-degree polynomials. (They look like the quadratic formula, which solves second-degree polynomials. The big difference is nobody remembers what they are without looking them up.) But the formulas sometimes called for things that looked like square roots of negative numbers. Absurd! But if you carried on as if these square roots of negative numbers meant something, you got meaningful answers. And correct answers.

We wanted formulas to solve fifth- and higher-degree polynomials exactly. We can do this with second and third and fourth-degree polynomials, after all. It turns out we can’t. Oh, we can solve some of them exactly. The attempt to understand why, though, helped us create and shape group theory, the study of things that look like but aren’t numbers.

Polynomials go on, sneaking into everything. We can look at a square matrix and discover its characteristic polynomial. This allows us to find beautifully-named things like eigenvalues and eigenvectors. These reveal secrets of the matrix’s structure. We can find polynomials in the formulas that describe how many ways to split up a group of things into a smaller number of sets. We can find polynomials that describe how networks of things are connected. We can find polynomials that describe how a knot is tied. We can even find polynomials that distinguish between a knot and the knot’s reflection in the mirror.

Polynomials are everything.

## A Leap Day 2016 Mathematics A To Z: Orthonormal

Jacob Kanev had requested “orthogonal” for this glossary. I’d be happy to oblige. But I used the word in last summer’s Mathematics A To Z. And I admit I’m tempted to just reprint that essay, since it would save some needed time. But I can do something more.

## Orthonormal.

“Orthogonal” is another word for “perpendicular”. Mathematicians use it for reasons I’m not precisely sure of. My belief is that it’s because “perpendicular” sounds like we’re talking about directions. And we want to extend the idea to things that aren’t necessarily directions. As majors, mathematicians learn orthogonality for vectors, things pointing in different directions. Then we extend it to other ideas. To functions, particularly, but we can also define it for spaces and for other stuff.

I was vague, last summer, about how we do that. We do it by creating a function called the “inner product”. That takes in two of whatever things we’re measuring and gives us a real number. If the inner product of two things is zero, then the two things are orthogonal.

The first example mathematics majors learn of this, before they even hear the words “inner product”, is the dot product. It’s for vectors, ordered sets of numbers. The dot product we find by matching up numbers in the corresponding slots for the two vectors, multiplying them together, and then adding up the products. For example. Give me the vector with values (1, 2, 3), and the other vector with values (-6, 5, -4). The inner product will be 1 times -6 (which is -6) plus 2 times 5 (which is 10) plus 3 times -4 (which is -12). So that’s -6 + 10 - 12, or -8.

So those vectors aren’t orthogonal. But how about the vectors (1, -1, 0) and (0, 0, 1)? Their dot product is 1 times 0 (which is 0) plus -1 times 0 (which is 0) plus 0 times 1 (which is 0). The vectors are perpendicular. And if you tried drawing this you’d see, yeah, they are. The first vector we’d draw as being inside a flat plane, and the second vector as pointing up, through that plane, like a thumbtack.
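The recipe is short enough to write out. A minimal Python sketch, with the function name `dot` my own choice, run on the two pairs of vectors above:

```python
def dot(u, v):
    """Multiply matching slots of two vectors and add up the products."""
    if len(u) != len(v):
        raise ValueError("vectors need the same number of slots")
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    print(dot((1, 2, 3), (-6, 5, -4)))   # not zero, so not orthogonal
    print(dot((1, -1, 0), (0, 0, 1)))    # zero, so orthogonal
```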

Well … the inner product can tell us something besides orthogonality. What happens if we take the inner product of a vector with itself? Say, (1, 2, 3) with itself? That’s going to be 1 times 1 (which is 1) plus 2 times 2 (4, according to rumor) plus 3 times 3 (which is 9). That’s 14, a tidy sum, although, so what?

The inner product of (-6, 5, -4) with itself? Oh, that’s some ugly numbers. Let’s skip it. How about the inner product of (1, -1, 0) with itself? That’ll be 1 times 1 (which is 1) plus -1 times -1 (which is positive 1) plus 0 times 0 (which is 0). That adds up to 2. And now, wait a minute. This might be something.

Start from somewhere. Move 1 unit to the east. (Don’t care what the unit is. Inches, kilometers, astronomical units, anything.) Then move -1 units to the north, or like normal people would say, 1 unit to the south. How far are you from the starting point? … Well, you’re the square root of 2 units away.

Now imagine starting from somewhere and moving 1 unit east, and then 2 units north, and then 3 units straight up, because you found a convenient elevator. How far are you from the starting point? This may take a moment of fiddling around with the Pythagorean theorem. But you’re the square root of 14 units away.

And what the heck, (0, 0, 1). The inner product of that with itself is 0 times 0 (which is zero) plus 0 times 0 (still zero) plus 1 times 1 (which is 1). That adds up to 1. And, yeah, if we go one unit straight up, we’re one unit away from where we started.

The inner product of a vector with itself gives us the square of the vector’s length. At least if we aren’t using some freak definition of inner products and lengths and vectors. And this is great! It means we can talk about the length — maybe better to say the size — of things that maybe don’t have obvious sizes.

Some stuff will have convenient sizes. For example, they’ll have size 1. The vector (0, 0, 1) was one such. So is (1, 0, 0). And you can think of another example easily. Yes, it’s $\left(\frac{1}{\sqrt{2}}, -\frac{1}{2}, \frac{1}{2}\right)$. (Go ahead, check!)

So by “orthonormal” we mean a collection of things that are orthogonal to each other, and that themselves are all of size 1. It’s a description of both what things are by themselves and how they relate to one another. A thing can’t be orthonormal by itself, for the same reason a line can’t be perpendicular to nothing in particular. But a pair of things might be orthogonal, and they might be the right length to be orthonormal too.
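That definition turns into a test pretty directly. Here's a Python sketch, assuming the ordinary dot product as the inner product. The names `length` and `is_orthonormal` and the tolerance are my own choices; the tolerance is there because square roots rarely come out exact in floating point.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def length(v):
    """Size of a vector: the square root of its inner product with itself."""
    return math.sqrt(dot(v, v))


def is_orthonormal(vectors, tolerance=1e-12):
    """Check that each vector has size 1 and each distinct pair is orthogonal."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(dot(u, v) - expected) > tolerance:
                return False
    return True


if __name__ == "__main__":
    flat = (1, -1, 0)
    up = (0, 0, 1)
    print(is_orthonormal([flat, up]))  # orthogonal, but flat is too long

    # Shrink the first vector to size 1 by dividing by its length.
    flat_unit = tuple(component / length(flat) for component in flat)
    print(is_orthonormal([flat_unit, up]))
```

Dividing a vector by its own length, as in the last step, is the usual way to turn an orthogonal collection into an orthonormal one.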

Why do this? Well, the same reasons we always do this. We can impose something like direction onto a problem. We might be able to break up a problem into simpler problems, one in each direction. We might at least be able to simplify the ways different directions are entangled. We might be able to write a problem’s solution as the sum of solutions to a standard set of representative simple problems. This one turns up all the time. And an orthogonal set of something is often a really good choice of a standard set of representative problems.

This sort of thing turns up a lot when solving differential equations. And those often turn up when we want to describe things that happen in the real world. So a good number of mathematicians develop a habit of looking for orthonormal sets.

## A Leap Day 2016 Mathematics A To Z: Normal Subgroup

The Leap Day Mathematics A to Z term today is another abstract algebra term. This one again comes from Gaurish, chief author of the Gaurish4Math blog. Part of it is going to be easy. Part of it is going to need a running start.

## Normal Subgroup.

The “subgroup” part of this is easy. Remember that a “group” means a collection of things and some operation that lets us combine them. We usually call that either addition or multiplication. We usually write it out like it’s multiplication. If a and b are things from the collection, we write “ab” to mean adding or multiplying them together. (If we had a ring, we’d have something like addition and something like multiplication, and we’d be able to do “a + b” or “ab” as needed.)

So with that in mind, the first thing you’d imagine a subgroup to be? That’s what it is. It’s a collection of things, all of which are in the original group, and that uses the same operation as the original group. For example, if the original group has a set that’s the whole numbers and the operation of addition, a subgroup would be the even numbers and the same old addition.

Now things will get much clearer if I have names. Let me use G to mean some group. This is a common generic name for a group. Let me use H as the name for a subgroup of G. This is a common generic name for a subgroup of G. You see how deeply we reach to find names for things. And we’ll still want names for elements inside groups. Those are almost always lowercase letters: a and b, for example. If we want to make clear it’s something from G’s set, we might use g. If we want to make clear it’s something from H’s set, we might use h.

I need to tax your imagination again. Suppose “g” is some element in G’s set. What would you imagine the symbol “gH” means? No, imagine something simpler.

Mathematicians call this “left-multiplying H by g”. What we mean is, take every single element h that’s in the set H, and find out what gh is. Then take all these products together. That’s the set “gH”. This might be a subgroup. It might not. No telling. Not without knowing what G is, what H is, what g is, and what the operation is. And we call it left-multiplying even if the operation is called addition or something else. It’s just easier to have a standard name even if the name doesn’t make perfect sense.

That we named something left-multiplying probably inspires a question. Is there right-multiplying? Yes, there is. We’d write that as “Hg”. And that means take every single element h that’s in the set H, and find out what hg is. Then take all these products together.

You see the subtle difference between left-multiplying and right-multiplying. In the one, you multiply everything in H on the left. In the other, you multiply everything in H on the right.

So. Take anything in G. Let me call that g. If it’s always, necessarily, true that the left-product, gH, is the same set as the right-product, Hg, then H is a normal subgroup of G.

The mistake mathematics majors make in doing this: we need the set gH to be the same as the set Hg. That is, the whole collection of products has to be the same for left-multiplying as right-multiplying. Nobody cares whether, for any particular thing h inside H, gh is the same as hg. It doesn’t matter. It’s whether the whole collection of things is the same that counts. I assume every mathematics major makes this mistake. I did, anyway.
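A small group makes the definition, and the mistake, concrete. The Python sketch below uses an example of my own choosing, not one from this essay: G is all six ways to rearrange three things, with "do one rearrangement, then the other" as the operation. One subgroup, the cyclic shifts, is normal. Another, a single swap plus the identity, is not. All the names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """The rearrangement 'do q first, then p', with each stored as a tuple."""
    return tuple(p[q[i]] for i in range(len(q)))


def left_coset(g, H):
    """The set gH: left-multiply everything in H by g."""
    return {compose(g, h) for h in H}


def right_coset(H, g):
    """The set Hg: right-multiply everything in H by g."""
    return {compose(h, g) for h in H}


def is_normal(G, H):
    """True when gH and Hg are the same set for every g in G."""
    return all(left_coset(g, H) == right_coset(H, g) for g in G)


G = set(permutations(range(3)))
SHIFTS = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}  # identity and the two cyclic shifts
SWAP = {(0, 1, 2), (1, 0, 2)}               # identity and one swap

if __name__ == "__main__":
    print(is_normal(G, SHIFTS))
    print(is_normal(G, SWAP))

    # The mistake: individual products needn't agree even when the sets do.
    g, h = (1, 0, 2), (1, 2, 0)
    print(compose(g, h), compose(h, g))
    print(left_coset(g, SHIFTS) == right_coset(SHIFTS, g))
```

The last two lines are the point of the warning above. For that g and h, gh and hg come out different, and yet gH and Hg are the same collection.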

The natural thing to wonder here: how can the set gH ever not be the same as Hg? For that matter, how can a single product gh ever not be the same as hg? Do mathematicians just forget how multiplication works?

Technically speaking no, we don’t. We just want to be able to talk about operations where maybe the order does too matter. With ordinary regular-old-number addition and multiplication the order doesn’t matter. gh always equals hg. We say this “commutes”. And if the operation for a group commutes, then every subgroup is a normal subgroup.

But sometimes we’re interested in things that don’t commute. Or that we can’t assume commute. The example every algebra book uses for this is three-dimensional rotations. Set your algebra book down on a table. If you don’t have an algebra book you may use another one instead. I recommend Christopher Miller’s American Cornball: A Laffopedic Guide To The Formerly Funny. It’s a fine guide to all sorts of jokes that used to amuse and what was supposed to be amusing about them. If you don’t have a table then I don’t know what to suggest.

Spin the book clockwise on the table and then stand it up on the edge nearer you. Then try again. Put the book back where it started. Stand it up on the edge nearer you and then spin it clockwise on the table. The book faces a different way this time around. (If it doesn’t, you spun too much. Try again until you get the answer I said.)

Three-dimensional rotations like this form a group. The different ways you can turn something are the elements of its set. The operation between two rotations is just to do one and then the other, in order. But they don’t commute, not most of the time. So they can have a subgroup that isn’t normal.
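If you have neither book nor table, a few lines of Python will stand in. This sketch represents two quarter turns as 3-by-3 matrices and multiplies them in both orders. The axis labels and the direction of each turn are my assumptions, not a careful model of the book, and the names `SPIN` and `TIP` are made up.

```python
def matmul(a, b):
    """Product of two 3-by-3 matrices; as a rotation it means 'b first, then a'."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Quarter turn about the vertical axis: spinning the book flat on the table.
SPIN = [[0, -1, 0],
        [1, 0, 0],
        [0, 0, 1]]

# Quarter turn about the left-to-right axis: standing the book up on an edge.
TIP = [[1, 0, 0],
       [0, 0, -1],
       [0, 1, 0]]

if __name__ == "__main__":
    spin_then_tip = matmul(TIP, SPIN)
    tip_then_spin = matmul(SPIN, TIP)
    print(spin_then_tip)
    print(tip_then_spin)
    print(spin_then_tip == tip_then_spin)
```

Quarter turns keep every matrix entry a whole number, so the comparison at the end is exact rather than at the mercy of rounding.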

You may believe me now that such things exist. Now you can move on to wondering why we should care.

Let me start by saying every group has at least two normal subgroups. Whatever your group G is, there’s a subgroup that’s made up just of the identity element and the group’s operation. The identity element is the thing that acts like 1 does for multiplication. You can multiply stuff by it and you get the same thing you started. The identity and the operator make a subgroup. And you’ll convince yourself that it’s a normal subgroup as soon as you write down g1 = 1g.

(Wait, you might ask! What if multiplying on the left has a different identity than multiplying on the right does? Great question. Very good insight. You’ve got a knack for asking good questions. If we have that then we’re working with a more exotic group-like mathematical object, so don’t worry.)

So the identity, ‘1’, makes a normal subgroup. Here’s another normal subgroup. The whole of G qualifies. (It’s OK if you feel uneasy. Think it over.)

So ‘1’ is a normal subgroup of G. G is a normal subgroup of G. They’re boring answers. We know them before we even know anything about G. But they qualify.

Does this sound familiar any? We have a thing. ‘1’ and the original thing subdivide it. It might be possible to subdivide it more, but maybe not.

Is this all … factoring?

Please here pretend I make a bunch of awkward faces while trying not to say either yes or no. But if H is a normal subgroup of G, then we can write something G/H, just like we might write 4/2, and that means something.

That G/H we call a quotient group. It’s a group, sure. As to what it is … well, let me go back to examples.

Let’s say that G is the set of whole numbers and the operation of ordinary old addition. And H is the set of whole numbers that are multiples of 4, again with addition. So the things in H are 0, 4, 8, 12, and so on. Also -4, -8, -12, and so on.

Suppose we pick things in G. And we use the group operation on the set of things in H. How many different sets can we get out of it? So for example we might pick the number 1 out of G. The set 1 + H is … well, list all the things that are in H, and add 1 to them. So that’s 1 + 0, 1 + 4, 1 + 8, 1 + 12, and 1 + -4, 1 + -8, 1 + -12, and so on. All told, it’s a bunch of numbers one more than a whole multiple of 4.

Or we might pick the number 7 out of G. The set 7 + H is 7 + 0, 7 + 4, 7 + 8, 7 + 12, and so on. It’s also got 7 + -4, 7 + -8, 7 + -12, and all that. These are all the numbers that are three more than a whole multiple of 4.

We might pick the number 8 out of G. This happens to be in H, but so what? The set 8 + H is going to be 8 + 0, 8 + 4, 8 + 8 … you know, these are all going to be multiples of 4 again. So 8 + H is just H. Some of these are simple.

How about the number 3? 3 + H is 3 + 0, 3 + 4, 3 + 8, and so on. The thing is, the collection of numbers you get by 3 + H is the same as the collection of numbers you get by 7 + H. Both 3 and 7 do the same thing when we add them to H.

Fiddle around with this and you realize there are only four possible different sets you get out of this. You can get 0 + H, 1 + H, 2 + H, or 3 + H. Any other numbers in G give you a set that looks exactly like one of those. So we can speak of 0, 1, 2, and 3 as being a new group, the “quotient group” that you get by G/H. (This looks more like remainders to me, too, but that’s the terminology we have.)
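If you'd rather let a computer do the fiddling, here is a minimal sketch. It can't list all of an infinite set, so it assumes a finite window of whole numbers (from -12 to 12, my arbitrary choice) and looks at the part of each set g + H that lands inside the window. The function name `coset` is my own label, not anything from the discussion above.

```python
def coset(g, modulus=4, window=range(-12, 13)):
    """Return the part of g + H inside a finite window.

    H is the set of whole multiples of `modulus`. A number n belongs to
    g + H exactly when n - g is a multiple of `modulus`.
    """
    return frozenset(n for n in window if (n - g) % modulus == 0)

# Try many different starting numbers g and keep only the distinct sets.
distinct = {coset(g) for g in range(-20, 21)}

print(len(distinct))           # 4
print(coset(3) == coset(7))    # True: 3 and 7 do the same thing to H
print(coset(8) == coset(0))    # True: 8 + H is just H again
print(sorted(coset(1)))        # numbers one more than a multiple of 4
```

However many starting numbers you try, only four different sets ever turn up, which is the counting that the quotient group records.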

But we can do something like this with any group and any normal subgroup of that group. The normal subgroup gives us a way of picking out a representative set of the original group. That set shows off all the different ways we can manipulate the normal subgroup. It tells us things about the way the original group is put together.

Normal subgroups are not just “factors, but for groups”. They do give us a way to see groups as things built up of other groups. We can see structures in sets of things.

## A Leap Day 2016 Mathematics A To Z: Matrix

I get to start this week with another request. Today’s Leap Day Mathematics A To Z term is a famous one, and one that I remember terrifying me in the earliest days of high school. The request comes from Gaurish, chief author of the Gaurish4Math blog.

## Matrix.

Lewis Carroll didn’t like the matrix. Well, Charles Dodgson, anyway. And it isn’t that he disliked matrices particularly. He believed it was a bad use of a word. “Surely,” he wrote, “[ matrix ] means rather the mould, or form, into which algebraical quantities may be introduced, than an actual assemblage of such quantities”. He might have had etymology on his side. The word meant the place where something was developed, the source of something else. History has outvoted him, and his preferred “block”. The first mathematicians to use the word “matrix” were interested in things derived from the matrix. So for them, the matrix was the source of something else.

What we mean by a matrix is a collection of some number of rows and columns. Inside each individual row and column is some mathematical entity. We call this an element. Elements are almost always real numbers. When they’re not real numbers they’re complex-valued numbers. (I’m sure somebody, somewhere has created matrices with something else as elements. You’ll never see these freaks.)

Matrices work a lot like vectors do. We can add them together. We can multiply them by real- or complex-valued numbers, called scalars. But we can do other things with them. We can define multiplication, at least sometimes. The definition looks like a lot of work, but it represents something useful that way. And for square matrices, ones with equal numbers of rows and columns, we can find other useful stuff. We give that stuff wonderful names like traces and determinants and eigenvalues and eigenvectors and such.

One of the big uses of matrices is to represent a mapping. A matrix can describe how points in a domain map to points in a range. Properly, a matrix made up of real numbers can only describe what are called linear mappings. These are ones that turn the domain into the range by stretching or squeezing down or rotating the whole domain the same amount. A mapping might follow different rules in different regions, but that’s all right. We can write a matrix that approximates the original mapping, at least in some areas. We do this in the same way, and for pretty much the same reason, we can approximate a real and complicated curve with a bunch of straight lines. Or the way we can approximate a complicated surface with a bunch of triangular plates.

We can compound mappings. That is, we can start with a domain and a mapping, and find the image of that domain. We can then use a mapping again and find the image of the image of that domain. The matrix that describes this mapping-of-a-mapping is the one you get by multiplying the matrix of the first mapping and the matrix of the second mapping together. This is why we define matrix multiplication the odd way we do. Mappings are that useful, and matrices are that tied to them.
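Here is a small sketch of that claim, using two mappings I made up for the purpose: one that doubles the first coordinate and one that makes a quarter turn. The helper names `mat_mul` and `apply_matrix` are mine. Matrices are plain lists of rows, and I'm assuming the usual convention where a matrix acts on a column of coordinates, so the second mapping's matrix goes on the left of the product.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply_matrix(m, v):
    """Apply the mapping described by matrix m to the point v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

stretch = [[2, 0], [0, 1]]    # first mapping: double the first coordinate
rotate = [[0, -1], [1, 0]]    # second mapping: quarter turn counterclockwise

point = [3, 1]

# Do the mappings one after the other.
step_by_step = apply_matrix(rotate, apply_matrix(stretch, point))

# Or multiply the matrices first and apply the product once.
combined = mat_mul(rotate, stretch)
all_at_once = apply_matrix(combined, point)

print(step_by_step)   # [-1, 6]
print(all_at_once)    # [-1, 6]
```

Multiplying in the other order, `mat_mul(stretch, rotate)`, describes rotating first and stretching second, and it sends the same point somewhere else. The order matters.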

I wrote about some of the uses of matrices in a Set Tour essay. That was based on a use of matrices in physics. We can describe the changing of a physical system with a mapping. And we can understand equilibriums, states where a system doesn’t change, by looking at the matrix that approximates what the mapping does near but not exactly on the equilibrium.

But there are other uses of matrices. Many of them have nothing to do with mappings or physical systems or anything. For example, we have graph theory. A graph, here, means a bunch of points, “vertices”, connected by curves, “edges”. Many interesting properties of graphs depend on how many other vertices each vertex is connected to. And this is well-represented by a matrix. Index your vertices. Then create a matrix. If vertex number 1 connects to vertex number 2, put a ‘1’ in the first row, second column. If vertex number 1 connects to vertex number 3, put a ‘1’ in the first row, third column. If vertex number 2 isn’t connected to vertex number 3, put a ‘0’ in the second row, third column. And so on.

We don’t have to use ones and zeroes. A “network” is a kind of graph where there’s some cost associated with each edge. We can put that cost, that number, into the matrix. Studying the matrix of a graph or network can tell us things that aren’t obvious from looking at the drawing.
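The indexing recipe is quick to sketch in code. The little four-vertex graph here is hypothetical, something I invented to have an example. I'm also assuming the edges have no direction, so a connection from vertex 1 to vertex 2 gets recorded in both places.

```python
# A made-up graph: vertices numbered 1 to 4, with these edges between them.
edges = [(1, 2), (1, 3), (3, 4)]
vertex_count = 4

# Start with all zeroes, then mark each connection with a 1.
adjacency = [[0] * vertex_count for _ in range(vertex_count)]
for a, b in edges:
    adjacency[a - 1][b - 1] = 1   # row a, column b (lists count from 0)
    adjacency[b - 1][a - 1] = 1   # and the reverse, since edges go both ways

for row in adjacency:
    print(row)

# Adding up a row counts how many other vertices that vertex connects to.
degrees = [sum(row) for row in adjacency]
print(degrees)   # [2, 1, 2, 1]
```

For a network you would store the cost of each edge in place of the 1. Row sums then mean something different, so that variation is left out of this sketch.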

## A Leap Day 2016 Mathematics A To Z: Lagrangian

It’s another of my handful of free choice days today. I’ll step outside the abstract algebra focus I’ve somehow gotten lately to look instead at mechanics.

## Lagrangian.

So, you likely know Newton’s Laws of Motion. At least you know of them. We build physics out of them. So a lot of applied mathematics relies on them. There’s a law about bodies at rest staying at rest. There’s one about bodies in motion continuing in a straight line. There’s one about the force on a body changing its momentum. Something about F equalling m a. There’s something about equal and opposite forces. That’s all good enough, and that’s all correct. We don’t use them anyway.

I’m overstating for the sake of a good hook. They’re all correct. And if the problem’s simple enough there’s not much reason to go past this F and m a stuff. It’s just that once you start looking at complicated problems this gets to be an awkward tool. Sometimes a system is just hard to describe using forces and accelerations. Sometimes it’s impossible to say even where to start.

For example, imagine you have one of those pricey showpiece globes. The kind that’s a big ball that spins on an axis, and whose axis is on a ring that can tip forward or back. And it’s an expensive showpiece globe. That axis is itself in another ring that rotates clockwise and counterclockwise. Give the globe a good solid spin so it won’t slow down anytime soon. Then nudge the frame, so both the horizontal ring and the ring the axis is on wobble some. The whole shape is going to wobble and move in some way. We ought to be able to model that. How? Force and mass and acceleration barely seem to even exist.

The Lagrangian we get from Joseph-Louis Lagrange, who in the 18th century saw a brilliant new way to understand physics. It doesn’t describe how things move in response to forces, at least not directly. It describes how things move using energy. In particular, it uses potential energy and kinetic energy.

This is brilliant on many counts. The biggest is in switching from forces to energy. Forces are vectors; they carry information about their size and their direction. Energy is a scalar; it’s just a number. A number is almost always easier to work with than a number alongside a direction.

The second big brilliance is that the Lagrangian gives us freedom in choosing coordinate systems. We have to know where things are and how they’re changing. The first obvious guess for how to describe things is their position in space. And that works fine until we look at stuff such as this spinning, wobbling globe. That never quite moves, although the spinning and the wobbling is some kind of motion. The problem begs us to think of the globe’s rotation around three different axes. Newton doesn’t help us with that. The Lagrangian, though —

The Lagrangian lets us describe physics using “generalized coordinates”. By this we mean coordinates that make sense for the problem even if they don’t directly relate to where something or other is in space. Any pick of coordinates is good, as long as we can describe the potential energy and the kinetic energy of the system using them.
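A small worked illustration may help here. It's the standard textbook pendulum, not the wobbling globe, and the symbols are my own labels: a bob of mass $m$ on a rod of length $\ell$, gravity $g$, kinetic energy $T$, potential energy $V$, and the Lagrangian $L$ taken as their difference. The one generalized coordinate is the angle $\theta$ the rod makes with straight down. It says nothing directly about where the bob is in space, but both energies are easy to write with it:

$$T = \tfrac12 m \ell^2 \dot\theta^2, \qquad V = -m g \ell \cos\theta, \qquad L = T - V = \tfrac12 m \ell^2 \dot\theta^2 + m g \ell \cos\theta$$

The equation of motion then comes from the Euler-Lagrange equation for that coordinate:

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot\theta}\right) - \frac{\partial L}{\partial \theta} = 0 \quad\Longrightarrow\quad m \ell^2 \ddot\theta + m g \ell \sin\theta = 0 \quad\Longrightarrow\quad \ddot\theta = -\frac{g}{\ell} \sin\theta$$

No force vectors appear anywhere, and the tension in the rod never has to be worked out.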

I’ve been writing about this as if the Lagrangian were the cure for all hard work ever. It’s not, alas. For example, we often want to study big bunches of particles that all attract (or repel) each other. That attraction (or repulsion) we represent as potential energy. This is easier to deal with than forces, granted. But that’s easier, which is not the same as easy.

Still, the Lagrangian is great. We can do all the physics we used to. And we have a new freedom to set up problems in convenient ways. And the perspective of looking at energy instead of forces gives us a fruitful view on physics problems.

## A Leap Day 2016 Mathematics A To Z: Kullback-Leibler Divergence

Today’s mathematics glossary term is another one requested by Jacob Kanev. Kanev, I learned last time, has got a blog, “Some Unconsidered Trifles”, for those interested in having more things to read. Kanev’s request this time was a term new to me. But learning things I didn’t expect to consider is part of the fun of this dance.

## Kullback-Leibler Divergence.

The Kullback-Leibler Divergence comes to us from information theory. It’s also known as “information divergence” or “relative entropy”. Entropy is by now a familiar friend. We got to know it through, among other things, the “How interesting is a basketball tournament?” question. In this context, entropy is a measure of how surprising it would be to know which of several possible outcomes happens. A sure thing has an entropy of zero; there’s no potential surprise in it. If there are two equally likely outcomes, then the entropy is 1. If there are four equally likely outcomes, then the entropy is 2. If there are four possible outcomes, but one is very likely and the other three mediocre, the entropy might be low, say, 0.5 or so. It’s mostly but not perfectly predictable.
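Those entropy figures can be checked with a few lines of code. This is a minimal sketch that assumes the usual base-two (Shannon) definition, which is the one that gives 1 for two equally likely outcomes. The function name and the lopsided probabilities are my own choices for illustration.

```python
from math import log2

def entropy(probs):
    """Entropy in bits: the sum of p * log2(1 / p) over the possible outcomes.

    Outcomes with probability 0 are skipped; they contribute nothing.
    """
    return sum(p * log2(1 / p) for p in probs if p > 0)

sure_thing = entropy([1.0])                     # no surprise at all
coin_flip = entropy([0.5, 0.5])                 # two equally likely outcomes
four_way = entropy([0.25, 0.25, 0.25, 0.25])    # four equally likely outcomes
lopsided = entropy([0.94, 0.02, 0.02, 0.02])    # one favorite, three long shots

print(sure_thing, coin_flip, four_way, lopsided)
```

The first three come out to 0, 1, and 2. The lopsided case comes out to roughly 0.42, in the neighborhood of the “0.5 or so” above.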

Suppose we have a set of possible outcomes for something. (Pick anything you like. It could be the outcomes of a basketball tournament. It could be how much a favored stock rises or falls over the day. It could be how long your ride into work takes. As long as there are different possible outcomes, we have something workable.) If we have a probability, a measure of how likely each of the different outcomes is, then we have a probability distribution. More likely things have probabilities closer to 1. Less likely things have probabilities closer to 0. No probability is less than zero or more than 1. All the probabilities added together sum up to 1. (These are the rules which make something a probability distribution, not just a bunch of numbers we had in the junk drawer.)

The Kullback-Leibler Divergence describes how similar two probability distributions are to one another. Let me call one of these probability distributions p. I’ll call the other one q. We have some number of possible outcomes, and we’ll use k as an index for them. $p_k$ is how likely, in distribution p, that outcome number k is. $q_k$ is how likely, in distribution q, that outcome number k is.

To calculate this divergence, we work out, for each k, the number $p_k$ times the logarithm of the ratio $p_k / q_k$. Here the logarithm is base two. Calculate all this for every one of the possible outcomes, and add it together. This will be some number that’s at least zero, but it might be larger.

The closer that distribution p and distribution q are to each other, the smaller this number is. If they’re exactly the same, this number will be zero. The less that distribution p and distribution q are like each other, the bigger this number is.
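In symbols the recipe is $\sum_k p_k \log_2 \frac{p_k}{q_k}$, and it is short enough to sketch in code. The function name and the three example distributions are mine, made up for illustration. I'm assuming that $q_k$ is never zero where $p_k$ is positive, and I'm treating any outcome with $p_k$ equal to zero as contributing nothing, which is the usual convention.

```python
from math import log2

def kl_divergence(p, q):
    """Sum over k of p_k * log2(p_k / q_k).

    Assumes q_k > 0 wherever p_k > 0. Outcomes with p_k == 0 are skipped.
    """
    return sum(pk * log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p = [0.5, 0.25, 0.25]
similar = [0.4, 0.3, 0.3]      # not far from p
different = [0.1, 0.1, 0.8]    # quite unlike p

same = kl_divergence(p, p)
near = kl_divergence(p, similar)
far = kl_divergence(p, different)

print(same, near, far)
```

The three numbers come out in the order promised: zero for identical distributions, a small number for the similar pair, a much bigger one for the dissimilar pair. One thing to watch for is that swapping the roles of p and q generally gives a different number.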

And that’s all good fun, but, why bother with it? And at least one answer I can give is that it lets us measure how good a model of something is.

Suppose we think we have an explanation for how something varies. We can say how likely it is we think there’ll be each of the possible different outcomes. This gives us a probability distribution which let’s call q. We can compare that to actual data. Watch whatever it is for a while, and measure how often each of the different possible outcomes actually does happen. This gives us a probability distribution which let’s call p.

If our model is a good one, then the Kullback-Leibler Divergence between p and q will be small. If our model’s a lousy one, then this divergence will be large. If we have a couple different models, we can see which ones make for smaller divergences and which ones make for larger divergences. Probably we’ll want smaller divergences.
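Here is a sketch of that comparison. Everything specific in it is hypothetical: the tallies of short, medium, and long commutes are invented, and so are the two candidate models. The divergence function is the same sum described earlier.

```python
from math import log2

def kl_divergence(p, q):
    """Sum over k of p_k * log2(p_k / q_k), skipping outcomes with p_k == 0."""
    return sum(pk * log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# Made-up data: how many commutes out of 100 were short, medium, or long.
counts = [30, 50, 20]
total = sum(counts)
observed = [c / total for c in counts]   # this plays the role of p

# Two made-up models, each playing the role of q.
models = {
    "all equally likely": [1 / 3, 1 / 3, 1 / 3],
    "mostly medium": [0.25, 0.5, 0.25],
}

scores = {name: kl_divergence(observed, q) for name, q in models.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(name, round(score, 4))
print("smallest divergence:", best)
```

With these numbers the “mostly medium” model has the smaller divergence, so by this measure it is the better description of the invented data.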

Here you might ask: why do we need a model? Isn’t the actual data the best model we might have? It’s a fair question. But no, real data is kind of lousy. It’s all messy. It’s complicated. We get extraneous little bits of nonsense clogging it up. And the next batch of results is going to be different from the old ones anyway, because real data always varies.

Furthermore, one of the purposes of a model is to be simpler than reality. A model should do away with complications so that it is easier to analyze, easier to make predictions with, and easier to teach than the reality is. But a model mustn’t be so simple that it can’t represent important aspects of the thing we want to study.

The Kullback-Leibler Divergence is a tool that we can use to quantify how much better one model or another fits our data. It also lets us quantify how much of the grit of reality we lose in our model. And this is at least some of the use of this quantity.

## A Leap Day 2016 Mathematics A To Z: Jacobian

I don’t believe I got any requests for a mathematics term starting ‘J’. I’m as surprised as you. Well, maybe less surprised. I’ve looked at the alphabetical index for Wolfram MathWorld and noticed its relative poverty for ‘J’. It’s not as bad as ‘X’ or ‘Y’, though. But it gives me room to pick a word of my own.

## Jacobian.

The Jacobian is named for Carl Gustav Jacob Jacobi, who lived in the first half of the 19th century. He’s renowned for work in mechanics, the study of mathematically modeling physics. He’s also renowned for matrices, rectangular grids of numbers which represent problems. There’s more, of course, but those are the points that bring me to the Jacobian I mean to talk about. There are other things named for Jacobi, including other things named “Jacobian”. But I mean to limit the focus to two, related, things.

I discussed mappings some while describing homomorphisms and isomorphisms. A mapping’s a relationship matching things in one set, a domain, to things in a second set, the range. The domain and the range can be anything at all. They can even be the same thing, if you like.

A very common domain is … space. Like, the thing you move around in. It’s a region full of points that are all some distance and some direction from one another. There’s almost always assumed to be multiple directions possible. We often call this “Euclidean space”. It’s the space that works like we expect for normal geometry. We might start with a two- or three-dimensional space. But it’s often convenient, especially for physics problems, to work with more dimensions. Four dimensions. Six dimensions. Incredibly huge numbers of dimensions. Honest, this often helps. It’s just harder to sketch out.

So we might for a problem need, say, 12-dimensional space. We can describe a point in that with an ordered set of twelve coordinates. Each describes how far you are from some standard reference point known as The Origin. If it doesn’t matter how many dimensions we’re working with, we call it an N-dimensional space. Or we use another letter if N is committed to something or other.

This is our stage. We are going to be interested in some N-dimensional Euclidean space. Let’s pretend N is 2; then our stage looks like the screen you’re reading now. We don’t need to pretend N is larger yet.

Our player is a mapping. It matches things in our N-dimensional space back to the same N-dimensional space. For example, maybe we have a mapping that takes the point with coordinates (3, 1) to the point (-3, -1). And it takes the point with coordinates (5.5, -2) to the point (-5.5, 2). And it takes the point with coordinates (-6, -π) to the point (6, π). You get the pattern. If we start from the point with coordinates (x, y) for some real numbers x and y, then the mapping gives us the point with coordinates (-x, -y).

One more step and then the play begins. Let’s not just think about a single point. Think about a whole region. If we look at the mapping of every point in that whole region, we get out … probably, some new region. We call this the “image” of the original region. With the mapping from the paragraph above, it’s easy to say what the image of a region is. It’ll look like the reflection in a corner mirror of the original region.

What if the mapping’s more complicated? What if we had a mapping that described how something was reflected in a cylindrical mirror? Or a mapping that describes how the points would move if they represent points of water flowing around a drain? — And that last explains why Jacobians appear in mathematical physics.

Many physics problems can be understood as describing how points that describe the system move in time. The dynamics of a system can be understood by how moving in time changes a region of starting conditions. A system might keep a region pretty much unchanged. Maybe it makes the region move, but it doesn’t change size or shape much. Or a system might change the region impressively. It might keep the area about the same, but stretch it out and fold it back, the way one might knead cookie dough.

The Jacobian, the one I’m interested in here, is a way of measuring these changes. The Jacobian matrix describes, for each point in the original domain, how a tiny change in one coordinate causes a change in the mapping’s coordinates. So if we have a mapping from an N-dimensional space to an N-dimensional space, there are going to be N times N values at work. Each one represents a different piece. How much does a tiny change in the first coordinate of the original point change the first coordinate of the mapping of the point? How much does a tiny change in the first coordinate of the original point change the second coordinate of the mapping of the point? How much does a tiny change in the first coordinate of the original point change the third coordinate of the mapping of the point? … how much does a tiny change in the second coordinate of the original point change the first coordinate of the mapping of the point? And on and on and now you know why mathematics majors are trained on Jacobians with two-by-two and three-by-three matrices. We do maybe a couple four-by-four matrices to remind us that we are born to suffer. We never actually work out bigger matrices. Life is just too short.
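A computer doesn't mind the tedium, so here is a rough sketch of those “tiny change” questions asked numerically. It nudges one coordinate at a time by a small amount `h` and sees how much each output coordinate moves. This is an approximation by finite differences, my own stand-in for the calculus, and the names `jacobian`, `flip`, and `swirl` are invented for the example. `flip` is the (x, y) to (-x, -y) mapping from earlier. `swirl` is a made-up mapping that isn't linear.

```python
def jacobian(f, point, h=1e-6):
    """Approximate the Jacobian matrix of the mapping f at a point.

    Entry [i][j] estimates how much output coordinate i changes per unit
    change in input coordinate j, using a central difference of size h.
    """
    columns = []
    for j in range(len(point)):
        up = list(point)
        down = list(point)
        up[j] += h
        down[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(f(up), f(down))])
    # Each nudge produced one column; transpose so rows match output coordinates.
    return [list(row) for row in zip(*columns)]

def flip(p):
    x, y = p
    return [-x, -y]

def swirl(p):
    x, y = p
    return [x * x - y * y, 2 * x * y]

print(jacobian(flip, [3.0, 1.0]))    # close to [[-1, 0], [0, -1]] everywhere
print(jacobian(swirl, [1.0, 2.0]))   # close to [[2, -4], [4, 2]] at this point
```

For `flip` the matrix is the same wherever you ask. For `swirl` it changes from point to point, which is why the Jacobian matrix is described point by point.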

(I’ve been talking, by the way, about the mapping of an N-dimensional space to an N-dimensional space. This is because we’re about to get to something that requires it. But we can write a matrix like this for a mapping of an N-dimensional space to an M-dimensional space, a different-sized space. It has uses. Let’s not worry about that.)

If you have a square matrix, one that has as many rows as columns, then you can calculate something named the determinant. This involves a lot of work. It takes even more work the bigger the matrix is. This is why mathematics majors learn to calculate determinants on two-by-two and three-by-three matrices. We do a couple four-by-four matrices and maybe one five-by-five to again remind us about suffering.

Anyway, by calculating the determinant of a Jacobian matrix, we get the Jacobian determinant. Finally we have something simple. The Jacobian determinant says how the area of a region changes in the mapping. Suppose the Jacobian determinant at a point is 2. Then a small region containing that point has an image with twice the original area. Suppose the Jacobian determinant is 0.8. Then a small region containing that point has an image with area 0.8 times the original area. Suppose the Jacobian determinant is -1. Then —

Well, what would you imagine?

If the Jacobian determinant is -1, then a small region around that point gets mapped to something with the same area. What changes is called the handedness. The mapping doesn’t just stretch or squash the region, but it also flips it along at least one dimension. The Jacobian determinant can tell us that.
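To see the area and handedness claims in numbers, here is a sketch using mappings simple enough that the matrix describing the mapping is also its Jacobian matrix at every point. It maps a small triangle and compares signed areas, where the sign records whether the corners run counterclockwise (positive) or clockwise (negative). The matrices and helper names are my own examples.

```python
def det2(m):
    """Determinant of a two-by-two matrix given as a list of rows."""
    (a, b), (c, d) = m
    return a * d - b * c

def image(m, pts):
    """Map each point (x, y) through the matrix m."""
    return [(m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)
            for x, y in pts]

def signed_area(pts):
    """Shoelace formula: positive if the corners run counterclockwise."""
    total = 0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2

triangle = [(0, 0), (1, 0), (0, 1)]   # area 0.5, counterclockwise

examples = {
    "double the width": [[2, 0], [0, 1]],
    "squeeze the width": [[0.8, 0], [0, 1]],
    "mirror top to bottom": [[1, 0], [0, -1]],
    "flip both coordinates": [[-1, 0], [0, -1]],
}

ratios = {}
for name, m in examples.items():
    ratios[name] = signed_area(image(m, triangle)) / signed_area(triangle)
    print(name, det2(m), ratios[name])
```

In each case the ratio of signed areas matches the determinant. The mirror gives -1: same area, opposite handedness. Flipping both coordinates at once, the (x, y) to (-x, -y) mapping from earlier, gives +1 in two dimensions, because two flips together amount to a half turn.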

So the Jacobian matrix, and the Jacobian determinant, are ways to describe how mappings change areas. Mathematicians will often call either of them just “the Jacobian”. We trust context to make clear what we mean. Either one is a way of describing how mappings change space: how they expand or contract, how they rotate, how they reflect spaces. Some fields of mathematics, including a surprising amount of the study of physics, are about studying how space changes.