## The Summer 2017 Mathematics A To Z: Jordan Canonical Form

I made a mistake! I thought we had got to the end of the block of A To Z topics suggested by Gaurish, of the For The Love Of Mathematics blog. Not so and, indeed, I wonder if it wouldn’t be a viable writing strategy around here for me to just ask Gaurish to throw out topics and I have two weeks to write about them. I don’t think there’s a single unpromising one in the set.

## Jordan Canonical Form.

Before you ask, yes, this is named for the Camille Jordan.

So this is a thing from algebra. Particularly, linear algebra. And more particularly, matrices. Matrices are so much of linear algebra that you could be forgiven thinking they’re all of linear algebra. The thing is, matrices are a really good way of describing linear transformations. That is, where you take a block of space and stretch it out, or squash it down, or rotate it, or do some combination of these things. And stretching and squashing and rotating is a lot of what you’d ever want to do. Refer to any book on how to draw animated cartoons. The only thing matrices can’t do is have their eyes bug out huge when an attractive region of space walks past.

Thing about a matrix is if you want to do something with it, you’re going to write it as a grid of numbers. It doesn’t have to be a grid of numbers. But about all the matrices anyone does anything with are grids of numbers. And that’s fine. They do an incredible lot of stuff. What’s not fine is that on looking at a huge block of numbers, the mind sees: huh. That’s a big block of numbers. Good luck finding what’s meaningful in them. To help find meaning we have a set of standard forms. We call them “canonical” or “normal” or some other approving term. They rearrange and change the terms in the matrix so that more interesting stuff is more obvious.

Now you’re justified asking: how can we rearrange and change the terms in a matrix without changing what the matrix is? We can get away with doing this because we can show some rearrangements don’t change what we’re interested in. That covers the “how dare we” part of “how”. We do it by using matrix multiplication. You might remember from high school algebra that matrix multiplication is this agonizing process of multiplying every pair of numbers that ever existed together, then adding them all up, and then maybe you multiply something by minus one because you’re thinking of determinants, and it all comes out wrong anyway and you have to do it over? Yeah. Well, matrix multiplication is defined hard because it makes stuff like this work out. So that covers the “by what technique” part of “how”. We start out with some matrix, let me imaginatively name it $A$. And then we find some transformation matrix for which, eh, let’s say $P$ is a good enough name. I’ll say why in a moment. Then we use that matrix and its multiplicative inverse $P^{-1}$. And we evaluate the product $P^{-1} A P$. This won’t just be the same old matrix we started with. Not usually. Promise. But what this will be, if we chose our matrix $P$ correctly, is some new matrix that’s easier to read.
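If it helps to see the $P^{-1} A P$ business done once, here is a minimal sketch in Python. The particular $A$ and $P$ are illustrative picks, not anything from a textbook, and the helper names `matmul` and `inverse_2x2` are made up for this example. This $P$ was chosen so the product comes out diagonal; a different invertible $P$ would give a different, probably messier, matrix.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(M):
    """Invert a 2x2 matrix with the determinant formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative matrices (assumptions for this sketch).
A = [[5, 4],
     [1, 2]]
P = [[4, 1],
     [1, -1]]

# The similarity transformation P^{-1} A P.
B = matmul(matmul(inverse_2x2(P), A), P)

for row in B:
    print([str(entry) for entry in row])
```

Exact fractions are used so that the zeroes come out as honest zeroes rather than as tiny rounding errors.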

The matrices involved here have to follow some rules. Most important, they’re all going to be square matrices. There’ll be more rules that your linear algebra textbook will tell you. Or your instructor will, after checking the textbook.

So what makes a matrix easy to read? Zeroes. Lots and lots of zeroes. When we have a standardized form of a matrix it’s nearly all zeroes. This is for a good reason: zeroes are easy to multiply stuff by. And they’re easy to add stuff to. And almost everything we do with matrices, as a calculation, is a lot of multiplication and addition of the numbers in the matrix.

What also makes a matrix easy to read? Everything important being on the diagonal. The diagonal is one of the two things you would imagine if you were told “here’s a grid of numbers, pick out the diagonal”. In particular it’s the one that goes from the upper left to the bottom right, that is, row one column one, and row two column two, and row three column three, and so on up to row 86 column 86 (or whatever). If everything is on the diagonal the matrix is incredibly easy to work with. If it can’t all be on the diagonal at least everything should be close to it. As close as possible.

In the Jordan Canonical Form not everything is on the diagonal. I mean, it can be, but you shouldn’t count on that. But everything either will be on the diagonal or else it’ll be one row up from the diagonal. That is, row one column two, row two column three, row 85 column 86. Like that. There’s a few other important pieces.

First is that the thing in the row above the diagonal will be either 1 or 0. Second is that on the diagonal you’ll have a sequence of all the same number. Like, you’ll get four instances of the number ‘2’ along this string of the diagonal. Third is that you’ll get a 1 in the row above every instance of this particular number except the first. Fourth is that you’ll get a 0 in the row above the first instance of this number.

Yeah, that’s fussy to visualize. This is one of those things easiest to show in a picture. A Jordan canonical form is a matrix that looks like this:

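$$\left[\begin{array}{cccccccccccc}
2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 3 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 4 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 4 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -2 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -2
\end{array}\right]$$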

This may have you dazzled. It dazzles mathematicians too. When we have to write a matrix that’s almost all zeroes like this we drop nearly all the zeroes. If we have to write anything we just write a really huge 0 in the upper-right and the lower-left corners.

What makes this the Jordan Canonical Form is that the matrix looks like it’s put together from what we call Jordan Blocks. Look around the diagonals. Here’s the first Jordan Block:

$$\left[\begin{array}{cccc}
2 & 1 & 0 & 0 \\
0 & 2 & 1 & 0 \\
0 & 0 & 2 & 1 \\
0 & 0 & 0 & 2
\end{array}\right]$$

Here’s the second:

$$\left[\begin{array}{cc}
3 & 1 \\
0 & 3
\end{array}\right]$$

Here’s the third:

$$\left[\begin{array}{ccc}
4 & 1 & 0 \\
0 & 4 & 1 \\
0 & 0 & 4
\end{array}\right]$$

Here’s the fourth:

$$\left[\begin{array}{c} -1 \end{array}\right]$$

And here’s the fifth:

$$\left[\begin{array}{cc}
-2 & 1 \\
0 & -2
\end{array}\right]$$

And we can represent the whole matrix as this might-as-well-be-diagonal thing:

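$$\left[\begin{array}{ccccc}
\text{First Block} & 0 & 0 & 0 & 0 \\
0 & \text{Second Block} & 0 & 0 & 0 \\
0 & 0 & \text{Third Block} & 0 & 0 \\
0 & 0 & 0 & \text{Fourth Block} & 0 \\
0 & 0 & 0 & 0 & \text{Fifth Block}
\end{array}\right]$$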

These blocks can be as small as a single number. They can be as big as however many rows and columns you like. Each individual block is some repeated number on the diagonal, and a repeated one in the row above the diagonal. You can call this the “superdiagonal”.
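The recipe for a block, and for gluing blocks along the diagonal, is short enough to sketch in Python. The function names `jordan_block` and `block_diagonal` are invented for this illustration. The eigenvalues and sizes are the ones from the twelve-by-twelve example above.

```python
def jordan_block(eigenvalue, size):
    """Return a size-by-size Jordan block as a list of rows.

    The eigenvalue runs down the diagonal and 1's run along
    the superdiagonal; everything else is 0.
    """
    return [[eigenvalue if i == j else (1 if j == i + 1 else 0)
             for j in range(size)]
            for i in range(size)]

def block_diagonal(blocks):
    """Place square blocks along the diagonal, zeroes everywhere else."""
    n = sum(len(block) for block in blocks)
    result = [[0] * n for _ in range(n)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                result[offset + i][offset + j] = value
        offset += len(block)
    return result

# The five blocks of the example: (eigenvalue, size).
blocks = [jordan_block(2, 4), jordan_block(3, 2), jordan_block(4, 3),
          jordan_block(-1, 1), jordan_block(-2, 2)]
J = block_diagonal(blocks)

for row in J:
    print(" ".join(f"{value:2d}" for value in row))
```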

(Mathworld, and Wikipedia, assert that sometimes the row below the diagonal — the “subdiagonal” — gets the 1’s instead of the superdiagonal. That’s fine if you like it that way, and it won’t change any of the real work. I have not seen these subdiagonal 1’s in the wild. But I admit I don’t do a lot of this field and maybe there’s times it’s more convenient.)

Using the Jordan Canonical Form for a matrix is a lot like putting an object in a standard reference pose for photographing. This is a good metaphor. We get a Jordan Canonical Form by matrix multiplication, which works like rotating and scaling volumes of space. You can view the Jordan Canonical Form for a matrix as how you represent the original matrix from a new viewing angle that makes it easy to recognize. And this is why $P$ is not a bad name for the matrix that does this work. We can see all this as “projecting” the matrix we started with into a new frame of reference. The new frame is maybe rotated and stretched and squashed and whatnot, compared to how we started. But it’s as valid a base. Projecting a mathematical object from one frame of reference to another usually involves calculating something that looks like $P^{-1} A P$ so, projection. That’s our name.

Mathematicians will speak of “the” Jordan Canonical Form for a matrix as if there were such a thing. I don’t mean that Jordan Canonical Forms don’t exist. They exist just as much as matrices do. It’s the “the” that misleads. You can put the Jordan Blocks in any order and have as valid, and as useful, a Jordan Canonical Form. But it’s easy to swap the orders of these blocks around — it’s another matrix multiplication, and a blessedly easy one — so it doesn’t matter which form you have. Get any one and you have them all.
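For the curious, the blessedly easy multiplication can be sketched like so. This is a small made-up example, a three-by-three matrix with two blocks. The matrix `P` here is a permutation matrix, which just lists the basis vectors in a new order, and a permutation matrix has the handy property that its inverse is its transpose.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(M):
    """Swap rows and columns."""
    return [list(column) for column in zip(*M)]

# A 2-by-2 block for the eigenvalue 2, then a 1-by-1 block for 5.
J = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 5]]

# Permutation matrix: take the third basis vector first, then the first two.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

# For a permutation matrix the inverse is the transpose.
swapped = matmul(matmul(transpose(P), J), P)

for row in swapped:
    print(row)
```

The result has the same blocks with the 5 block first, and is every bit as much a Jordan Canonical Form as the one we started from.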

I haven’t said anything about what these numbers on the diagonal are. They’re the eigenvalues of the original matrix. I hope that clears things up.

Yeah, not to anyone who didn’t know what a Jordan Canonical Form was to start with. Rather than get into calculations let me go to well-established metaphor. Take a sample of an unknown chemical and set it on fire. Put the light from this through a prism and photograph the spectrum. There will be lines, interruptions in the progress of colors. The locations of those lines and how intense they are tell you what the chemical is made of, and in what proportions. These are much like the eigenvectors and eigenvalues of a matrix. The eigenvectors tell you what the matrix is made of, and the eigenvalues how much of the matrix is those. This stuff gets you very far in proving a lot of great stuff. And part of what makes the Jordan Canonical Form great is that you get the eigenvalues right there in neat order, right where anyone can see them.
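If the metaphor isn’t enough, the defining property is easy to check by hand or by machine: an eigenvector is a vector that the matrix only stretches, and the eigenvalue is the stretch factor. A minimal sketch, with a matrix and vectors picked purely for illustration and a helper named `times` that is made up for the occasion:

```python
def times(A, v):
    """Multiply matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Illustrative matrix (an assumption for this sketch).
A = [[5, 4],
     [1, 2]]

v, eigenvalue_v = [4, 1], 6     # A stretches v to six times its size
w, eigenvalue_w = [1, -1], 1    # A leaves w exactly as it was

Av = times(A, v)
Aw = times(A, w)

print(Av, "should equal", [eigenvalue_v * x for x in v])
print(Aw, "should equal", [eigenvalue_w * x for x in w])
```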

So! All that’s left is finding the things. The best way to find the Jordan Canonical Form for a given matrix is to become an instructor for a class on linear algebra and assign it as homework. The second-best way is to give the problem to your TA, who will type it in to Mathematica and return the result. It’s too much work to do most of the time. Almost all the stuff you could learn from having the thing in the Jordan Canonical Form you work out in the process of finding the matrix $P$ that would let you calculate what the Jordan Canonical Form is. And once you had that, why go on?

Where the Jordan Canonical Form shines is in doing proofs about what matrices can do. We can always put a square matrix into a Jordan Canonical Form, at least if we allow complex numbers for the eigenvalues. So if we want to show something is true about matrices in general, we can show that it’s true for the simpler-to-work-with Jordan Canonical Form. Then show that shifting a matrix to or from the Jordan Canonical Form doesn’t change whether the thing we’re interested in is true. It exists in that strange space: it is quite useful, but never on a specific problem.

Oh, all right. Yes, it’s the same Camille Jordan of the Jordan Curve and also of the Jordan Curve Theorem. That fellow.

## The End 2016 Mathematics A To Z: Hat

I was hoping to pick a term that was a quick and easy one to dash off. I learned better.

## Hat.

This is a simple one. It’s about notation. Notation is never simple. But it’s important. Good symbols organize our thoughts. They tell us what are the common ordinary bits of our problem, and what are the unique bits we need to pay attention to here. We like them to be easy to write. Easy to type is nice, too, but in my experience mathematicians work by hand first. Typing is tidying-up, and we accept that being sluggish. Unique would be nice, so that anyone knows what kind of work we’re doing just by looking at the symbols. I don’t think anything manages that. But at least some notation has alternate uses rare enough we don’t have to worry about it.

“Hat” has two major uses I know of. And we call it “hat”, although our friends in the languages department would point out this is a caret. The little pointy corner that goes above a letter, like so: $\hat{i}$. $\hat{x}$. $\hat{e}$. It’s not something we see on its own. It’s always above some variable.

The first use of the hat like this comes up in statistics. It’s a way of marking that something is an estimate. By “estimate” here we mean what anyone might mean by “estimate”. Statistics is full of uses for this sort of thing. For example, we often want to know what the arithmetic mean of some quantity is. The average height of people. The average temperature for the 18th of November. The average weight of a loaf of bread. We have some letter that we use to mean “the value this has for any one example”. By some letter we mean ‘x’, maybe sometimes ‘y’. We can use any and maybe the problem begs for something. But it’s ‘x’, maybe sometimes ‘y’.

For the arithmetic mean of ‘x’ for the whole population we write the letter with a horizontal bar over it. (The arithmetic mean is the thing everybody in the world except mathematicians calls the average. Also, it’s what mathematicians mean when they say the average. We just get fussy because we know if we don’t say “arithmetic mean” someone will come along and point out there are other averages.) That arithmetic mean is $\bar{x}$. Maybe $\bar{y}$ if we must. Must be some number. But what is it? If we can’t measure whatever it is for every single example of our group, the whole population, then we have to make an estimate. We do that by taking a sample, ideally one that isn’t biased in some way. (This is so hard to do, or at least be sure you’ve done.) We can find the mean for this sample, though, because that’s how we picked it. The mean of this sample is probably close to the mean of the whole population. It’s an estimate. So we can write $\hat{x}$ and understand. This is not $\bar{x}$ but it does give us a good idea what $\bar{x}$ should be.
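Here is a small sketch of the difference between $\bar{x}$ and $\hat{x}$, using Python’s standard library. The “population” is invented: ten thousand loaf weights drawn from a bell curve I made up. In real life we would not get to compute `x_bar` at all, which is the whole reason for wanting `x_hat`.

```python
import random
import statistics

random.seed(2016)  # fixed seed so the sketch gives the same numbers each run

# A made-up population of loaf weights, in grams (an assumption).
population = [random.gauss(450, 12) for _ in range(10_000)]
x_bar = statistics.mean(population)   # the population mean, usually out of reach

# A simple random sample, so no loaf is favored over any other.
sample = random.sample(population, 100)
x_hat = statistics.mean(sample)       # our estimate of x_bar

print(f"x_bar = {x_bar:.2f} grams")
print(f"x_hat = {x_hat:.2f} grams")
```

The two numbers should land close together, though not exactly on each other. A bigger sample would usually bring them closer.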

(We don’t always use the caret ^ for this. Sometimes we use a tilde ~ instead. ~ has the advantage that it’s often used for “approximately equal to”. So it will carry that suggestion over to its new context.)

The other major use of the hat comes in vectors. Mathematics types do a lot of work with vectors. It turns out a lot of mathematical structures work the way that pointing and moving in directions in ordinary space do. That’s why back when I talked about what vectors were I didn’t say “they’re like arrows pointing some length in some direction”. Arrows pointing some length in some direction are vectors, yes, but there are many more things that are vectors. Thinking of moving in particular directions gives us good intuition for how to work with vectors, and for stuff that turns out to be vectors. But they’re not everything.

If we need to highlight that something is a vector we put a little arrow over its name. $\vec{x}$. $\vec{e}$. That sort of thing. (Or if we’re typing, we might put the letter in boldface: **x**. This was good back before computers let us put in mathematics without giving the typesetters hazard pay.) We don’t always do that. By the time we do a lot of stuff with vectors we don’t always need the reminder. But we will include it if we need a warning. Like if we want to have both $\vec{r}$ telling us where something is and to use a plain old $r$ to tell us how big the vector $\vec{r}$ is. That turns up a lot in physics problems.

Every vector has some length. Even vectors that don’t seem to have anything to do with distances do. We can make a perfectly good vector out of “polynomials defined for the domain of numbers between -2 and +2”. Those polynomials are vectors, and they have lengths.

There’s a special class of vectors, ones that we really like in mathematics. They’re the “unit vectors”. Those are vectors with a length of 1. And we are always glad to see them. They’re usually good choices for a basis. Basis vectors are useful things. They give us, in a way, a representative slate of cases to solve. Then we can use that representative slate to give us whatever our specific problem’s solution is. So mathematicians learn to look instinctively to them. We want basis vectors, and we really like them to have a length of 1. Even if we aren’t putting the arrow over our variables we’ll put the caret over the unit vectors.
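Any vector that isn’t the zero vector can be turned into a unit vector pointing the same way: divide it by its own length. A minimal sketch for ordinary arrows-in-space vectors, using the usual Pythagorean length. The names `length` and `unit` are made up for this example.

```python
import math

def length(v):
    """Ordinary Euclidean length of a vector given as a list of numbers."""
    return math.sqrt(sum(component * component for component in v))

def unit(v):
    """Return the unit vector pointing the same way as v."""
    size = length(v)
    if size == 0:
        raise ValueError("the zero vector has no direction to keep")
    return [component / size for component in v]

r = [3.0, 4.0]     # an illustrative vector
r_hat = unit(r)    # same direction, length 1

print("r has length", length(r))
print("r_hat =", r_hat, "with length", length(r_hat))
```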

There are some unit vectors we use all the time. One is just the directions in space. That’s $\hat{e}_1$ and $\hat{e}_2$ and for that matter $\hat{e}_3$ and I bet you have an idea what the next one in the set might be. You might be right. These are basis vectors for normal, Euclidean space, which is why they’re labelled “e”. We have as many of them as we have dimensions of space. We have as many dimensions of space as we need for whatever problem we’re working on. If we need a basis vector and aren’t sure which one, we summon one of the letters used as indices all the time. $\hat{e}_i$, say, or $\hat{e}_j$. If we have an n-dimensional space, then we have unit vectors all the way up to $\hat{e}_n$.

We also use the hat a lot if we’re writing quaternions. You remember quaternions, vaguely. They’re complex-valued numbers for people who’re bored with complex-valued numbers and want some thrills again. We build them as a quartet of numbers, each added together. Three of them are multiplied by the mysterious numbers ‘i’, ‘j’, and ‘k’. Each ‘i’, ‘j’, or ‘k’ multiplied by itself is equal to -1. But ‘i’ doesn’t equal ‘j’. Nor does ‘j’ equal ‘k’. Nor does ‘k’ equal ‘i’. And ‘i’ times ‘j’ is ‘k’, while ‘j’ times ‘i’ is minus ‘k’. That sort of thing. Easy to look up. You don’t need to know all the rules just now.

But we often end up writing a quaternion as a number like $4 + 2\hat{i} - 3\hat{j} + 1 \hat{k}$. OK, that’s just the one number. But we will write numbers like $a + b\hat{i} + c\hat{j} + d\hat{k}$. Here a, b, c, and d are all real numbers. This is kind of sloppy; the pieces of a quaternion aren’t in fact vectors added together. But it is hard not to look at a quaternion and see something pointing in some direction, like the first vectors we ever learn about. And there are some problems in pointing-in-a-direction vectors that quaternions handle so well. (Mostly how to rotate one direction around another axis.) So a bit of vector notation seeps in where it isn’t appropriate.
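The multiplication rules for ‘i’, ‘j’, and ‘k’ are fussy to remember but easy to write down once. A sketch, storing the quaternion $a + b\hat{i} + c\hat{j} + d\hat{k}$ as the plain tuple `(a, b, c, d)`. That storage choice and the name `qmul` are assumptions for this example, not any standard library’s interface.

```python
def qmul(p, q):
    """Product of two quaternions, each a tuple (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = (0, 0, 0, 1)

print("i times i:", qmul(i, i))   # minus one
print("i times j:", qmul(i, j))   # k
print("j times i:", qmul(j, i))   # minus k
```

Note that the order of multiplication matters, which is one of the thrills on offer.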

I suppose there’s some value in pointing out that the ‘i’ and ‘j’ and ‘k’ in a quaternion are fixed and set numbers. They’re unlike an ‘a’ or an ‘x’ we might see in the expression. I’m not sure anyone was thinking they were, though. Notation is a tricky thing. It’s as hard to get sensible and consistent and clear as it is to make words and grammar sensible. But the hat is a simple one. It’s good to have something like that to rely on.

## A Leap Day 2016 Mathematics A To Z: Basis

Today’s glossary term is one that turns up in many areas of mathematics. But these all share some connotations. So I mean to start with the easiest one to understand.

## Basis.

Suppose you are somewhere. Most of us are. Where is something else?

That isn’t hard to answer if conditions are right. If we’re allowed to point and the something else is in sight, we’re done. It’s when pointing and following the line of sight breaks down that we’re in trouble. We’re also in trouble if we want to say how to get from that something to yet another spot. How can we guide someone from one point to another?

We have a good answer from everyday life. We can impose some order, some direction, on space. We’re familiar with this from the cardinal directions. We say where things on the surface of the Earth are by how far they are north or south, east or west, from something else. The scheme breaks down a bit if we’re at the North or the South pole exactly, but there we can fall back on pointing.

When we start using north and south and east and west as directions we are choosing basis vectors. Vectors are directions: how far to move, and which way. Suppose we have two vectors that aren’t pointing in the same direction. Then we can describe any two-dimensional movement using them. We can say “go this far in the direction of the first vector and also that far in the direction of the second vector”. With the cardinal directions, we consider north and east, or east and south, or south and west, or west and north to be a pair of vectors going in different directions.

(North and south, in this context, are the same thing. “Go twenty paces north” says the same thing as “go negative twenty paces south”. Most mathematicians don’t pull this sort of stunt when telling you how to get somewhere unless they’re trying to be funny without succeeding.)

A basis vector is just a direction, and distance in that direction, that we’ve decided to be a reference for telling different points in space apart. A basis set, or basis, is the collection of all the basis vectors we need. What do we need? We need enough basis vectors to get to all the points in whatever space we’re working with.
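To make “go this far along the first vector and that far along the second” concrete, here is a sketch that works out how far to go along each of two basis vectors to reach a given point in the plane. It uses Cramer’s rule, which is one standard way to solve two equations in two unknowns. The function name `coordinates` and the sample numbers are made up.

```python
from fractions import Fraction

def coordinates(point, b1, b2):
    """Return (s, t) such that point = s*b1 + t*b2, by Cramer's rule.

    All inputs are pairs of integers.
    """
    det = b1[0] * b2[1] - b1[1] * b2[0]
    if det == 0:
        raise ValueError("b1 and b2 lie along one line, so they are not a basis")
    s = Fraction(point[0] * b2[1] - point[1] * b2[0], det)
    t = Fraction(b1[0] * point[1] - b1[1] * point[0], det)
    return s, t

east, north = (1, 0), (0, 1)
northeast = (1, 1)

target = (3, 5)   # three paces east and five paces north of us

print(coordinates(target, east, north))       # the usual directions
print(coordinates(target, east, northeast))   # a stranger, still valid, basis
```

Handing it north and south as the pair raises the error, which is the code’s way of saying those two are the same direction for this purpose.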

(If you are going to ask about doesn’t “east” point in different directions as we go around the surface of the Earth, you’re doing very well. Please pretend we never move so far from where we start that anyone could notice the difference. If you can’t do that, please pretend the Earth has been smooshed into a huge flat square with north at one end and we’re only just now noticing.)

We are free to choose whatever basis vectors we like. The worst that can happen if we choose a lousy basis is that we have to write out more things than we otherwise would. Our work won’t be less true, it’ll just be more tedious. But there are some properties that often make for a good basis.

One is that the basis should relate to the problem you’re doing. Suppose you were in one of mathematicians’ favorite places, midtown Manhattan. There is a compelling grid here of avenues running north-south and streets running east-west. (Broadway we ignore as an implementation error retained for reasons of backwards compatibility.) Well, we pretend they run north-south and east-west. They’re actually a good bit clockwise of north-south and east-west. They do that to better match the geography of the island. A “north” avenue runs about parallel to the way Manhattan’s long dimension runs. In the circumstance, it would be daft to describe directions by true north or true east. We would say to go so many streets “north” and so many avenues “east”.

Purely mathematical problems aren’t concerned with streets and avenues. But there will often be preferred directions. Mathematicians often look at the way a process alters shapes or redirects forces. There’ll be some directions where the alterations are biggest. There’ll be some where the alterations are smallest. Those directions are probably good choices for a basis. They stand out as important.

We also tend to like basis vectors that are a unit length. That is, their size is 1 in some convenient unit. That’s for the same reason it’s easier to say how expensive something is if it costs 45 dollars instead of nine five-dollar bills. Or if you’re told it was 180 quarter-dollars. The length of your basis vector is just a scaling factor. But the more factors you have to work with the more likely you are to misunderstand something.

And we tend to like basis vectors that are perpendicular to one another. They don’t have to be. But if they are then it’s easier to divide up our work. We can study each direction separately. Mathematicians tend to like techniques that let us divide problems up into smaller ones that we can study separately.
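Here is a sketch of why unit-length, perpendicular basis vectors divide up the work. With such a basis, how far to go along each basis vector is a single dot product; there is no system of equations to untangle. The tilt of 29 degrees and the sample point are illustrative assumptions, loosely in the spirit of a city grid turned clockwise from true north.

```python
import math

def dot(u, v):
    """Dot product of two vectors of the same size."""
    return sum(a * b for a, b in zip(u, v))

theta = math.radians(29)   # assumed tilt of the grid, for illustration only

# Grid directions, written in true (east, north) coordinates.
grid_east = (math.cos(theta), -math.sin(theta))
grid_north = (math.sin(theta), math.cos(theta))

point = (300.0, 800.0)   # made-up spot: true east and true north of us

# Each grid coordinate comes from its own dot product, studied separately.
along_east = dot(point, grid_east)
along_north = dot(point, grid_north)

# Walking those distances along the grid directions lands on the same spot.
rebuilt = tuple(along_east * e + along_north * n
                for e, n in zip(grid_east, grid_north))

print("grid coordinates:", along_east, along_north)
print("rebuilt point:", rebuilt)
```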

I’ve described basis sets using vectors. They have intuitive appeal. It’s easy to understand directions of things in space. But the idea carries across into other things. For example, we can build functions out of other functions. So we can choose a set of basis functions. We can multiply them by real numbers (scalars) and add them together. This makes whatever function we’re interested in into a kind of weighted sum of basis functions.

Why do that? Well, again, we often study processes that change shapes and directions. If we choose a basis well, though, the process changes the basis vectors in easy to describe ways. And many interesting processes let us describe the changing of an arbitrary function as the weighted sum of the changes in the basis vectors. By solving a couple of simple problems we get the ability to solve every interesting problem.

We can even define something that works like the angle between functions. And something that works a lot like perpendicularity for functions.
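One common choice, and it is only one of several, is to multiply two functions together and integrate the product over the domain. That number plays the role the dot product plays for arrows: if it comes out zero the functions count as perpendicular, and the square root of a function’s product with itself works as its length. A rough numerical sketch, where the name `inner` and the midpoint-rule integration are choices made for this example:

```python
import math

def inner(f, g, a, b, steps=10_000):
    """Approximate the integral of f(x) * g(x) over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    total = 0.0
    for n in range(steps):
        x = a + (n + 0.5) * h
        total += f(x) * g(x)
    return total * h

# Sine and cosine over one full period behave like perpendicular vectors.
sin_dot_cos = inner(math.sin, math.cos, -math.pi, math.pi)

# The "length" of sine on that domain is the square root of its product with itself.
sin_length = math.sqrt(inner(math.sin, math.sin, -math.pi, math.pi))

# A polynomial on the domain from -2 to 2 has a length too.
x_length = math.sqrt(inner(lambda x: x, lambda x: x, -2.0, 2.0))

print("sin . cos =", sin_dot_cos)
print("length of sin =", sin_length)
print("length of x on [-2, 2] =", x_length)
```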

And this carries on to other mathematical constructs. We look for ways to impose some order, some direction, on whatever structure we’re looking at. We’re often successful, and can work with unreal things using tools like those that let us find our place in a city.