My All 2020 Mathematics A to Z: Unitary Matrix


I assume that last week I disappointed Mr Wu, of the Singapore Maths Tuition blog, when I passed on a topic he suggested only to unintentionally rewrite a good enough essay. I hope to make it up this week with a piece of linear algebra.

Color cartoon illustration of a coati in a beret and neckerchief, holding up a director's megaphone and looking over the Hollywood hills. The megaphone has the symbols +, ×, ÷ (division obelus), and = on it. The Hollywood sign is, instead, the letters MATHEMATICS. In the background are spotlights, with several of them crossing so as to make the letters A and Z; one leg of the spotlights has 'TO' in it, so the art reads out, subtly, 'Mathematics A to Z'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Unitary Matrix.

A Unitary Matrix — note the article; there is not a singular the Unitary Matrix — starts with a matrix. This is an ordered collection of scalars. The scalars we call elements. I can’t think of a time I ever saw a matrix represented except as a rectangular grid of elements, or as a capital letter for the name of a matrix. Or a block inside a matrix. In principle the elements can be anything. In practice, they’re almost always either real numbers or complex numbers. To speak of Unitary Matrixes invokes complex-valued numbers. If a matrix that would be Unitary has only real-valued elements, we call that an Orthogonal Matrix. It’s not wrong to call an Orthogonal matrix “Unitary”. It’s like pointing to a known square, though, and calling it a parallelogram. Your audience will grant that’s true. But it’ll wonder what you’re getting at, unless you’re talking about a bunch of parallelograms and some of them happen to be squares.

As with polygons, though, there are many names for particular kinds of matrices. The flurry of them settles down on the Intro to Linear Algebra student and it takes three or four courses before most of them feel like familiar names. I will try to keep the flurry clear. First, we’re talking about square matrices, ones with the same number of rows as columns.

Start with any old square matrix. Give it the name U because you see where this is going. There are a couple of new matrices we can derive from it. One of them is the complex conjugate. This is the matrix you get by taking the complex conjugate of every term. So, if one element is 3 + 4\imath , in the complex conjugate, that element would be 3 - 4\imath . Reverse the plus or minus sign of the imaginary component. The shorthand for “the complex conjugate to matrix U” is U^* . Also we’ll often just say “the conjugate”, taking the “complex” part as implied.

Start back with any old square matrix, again called U. Another thing you can do with it is take the transposition. This matrix, U-transpose, you get by keeping the order of elements but changing rows and columns. That is, the elements in the first row become the elements in the first column. The elements in the second row become the elements in the second column. Third row becomes the third column, and so on. The diagonal — first row, first column; second row, second column; third row, third column; and so on — stays where it was. The shorthand for “the transposition of U” is U^T .

You can chain these together. If you start with U and take both its complex-conjugate and its transposition, you get the adjoint. We write that with a little dagger: U^{\dagger} = (U^*)^T . For a wonder, as matrices go, it doesn’t matter whether you take the transpose or the conjugate first. It’s the same U^{\dagger} = (U^T)^* . You may ask how people writing this out by hand never mistake U^T for U^{\dagger} . This is a good question and I hope to have an answer someday. (I would write it as U^{A} in my notes.)
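
If you would like to see that as a computation, here is a little sketch in Python, my own illustration, assuming numpy and using a matrix I made up for the occasion:

import numpy as np

U = np.array([[3 + 4j, 1 - 2j],
              [0 + 1j, 2 + 0j]])   # a made-up complex-valued matrix

U_conjugate = np.conj(U)    # flip the sign on every imaginary component
U_transpose = U.T           # rows become columns
U_dagger = U.conj().T       # the adjoint: conjugate and transpose together

print(np.allclose(np.conj(U).T, np.conj(U.T)))   # True: the order doesn't matter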

And the last thing you can maybe do with a square matrix is take its inverse. This is like taking the reciprocal of a number. When you multiply a matrix by its inverse, you get the Identity Matrix. Not every matrix has an inverse, though. It’s worse than real numbers, where only zero doesn’t have a reciprocal. You can have a matrix that isn’t all zeroes and that doesn’t have an inverse. This is part of why linear algebra mathematicians command the big money. But if a matrix U has an inverse, we write that inverse as U^{-1} .

The Identity Matrix is one of a family of square matrices. Every element in an identity matrix is zero, except on the diagonal. That is, the element at row one, column one, is the number 1. The element at row two, column two is also the number 1. Same with row three, column three: another one. And so on. This is the “identity” matrix because it works like the multiplicative identity. Pick any matrix you like, and multiply it by the identity matrix; you get the original matrix right back. We use the name I for an identity matrix. If we have to be clear how many rows and columns the matrix has, we write that as a subscript: I_2 or I_3 or I_N or so on.
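
Here is a quick numpy sketch of those last two ideas, again with matrices I made up; the second one is a matrix that isn't all zeroes and still has no inverse:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
A_inverse = np.linalg.inv(A)
print(np.allclose(A @ A_inverse, np.eye(2)))   # True: a matrix times its inverse is I_2

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # not all zeroes, but the second row is twice the first
# np.linalg.inv(B) raises a LinAlgError: B has no inverse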

So this, finally, lets me say what a Unitary Matrix is. It’s any square matrix U where the adjoint, U^{\dagger} is the same matrix as the inverse, U^{-1} . It’s wonderful to learn you have a Unitary Matrix. Not just because, most of the time, finding the inverse of a matrix is a long and tedious procedure. Here? You have to write the elements in a different order and change the plus-or-minus sign on the imaginary numbers. The only way it would be easier is if you had only real numbers, and didn’t have to take the conjugates.
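
To see the definition in action, here is a small check in Python on a unitary matrix I made up (numpy assumed, as before):

import numpy as np

U = np.array([[1,  1j],
              [1j, 1 ]]) / np.sqrt(2)   # a made-up two-by-two unitary matrix

U_dagger = U.conj().T
print(np.allclose(U_dagger, np.linalg.inv(U)))   # True: the adjoint is the inverse
print(np.allclose(U @ U_dagger, np.eye(2)))      # True: U times its adjoint is the identity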

That’s all a nice heap of terms. What makes any of them important, other than so Intro to Linear Algebra professors can test their students?

Well, you know mathematicians. If we like something like this, it’s usually because it holds out the prospect of turning hard problems into easier ones. So it is. Start out with any old matrix. Call it A. Then there exist some unitary matrixes, call them U and V. And their product does something wonderful: UAV is a “diagonal” matrix. A diagonal matrix has zeroes for every element except the diagonal ones. That is, row one, column one; row two, column two; row three, column three; and so on. The elements that trace a path from the upper-left to the lower-right corner of the matrix. (The diagonal from the upper-right to the lower-left we have nothing to do with.) Everything we might do with matrices is easier on a diagonal matrix. So we process our matrix A into this diagonal matrix D. Process it by whatever the heck we’re doing. If we then multiply this by the inverses of U and V? If we calculate U^{-1}DV^{-1} ? We get whatever our process would have given us had we done it to A. And, since U and V are unitary matrices, it’s easy to find these inverses. Wonderful!
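
This factoring is what the singular value decomposition hands you, and numpy will compute it for any matrix. A sketch, with an A I made up:

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0]])

W, s, Vh = np.linalg.svd(A)        # numpy's convention: A = W @ diag(s) @ Vh, with W and Vh unitary
D = W.conj().T @ A @ Vh.conj().T   # a unitary on each side of A ...
print(np.allclose(D, np.diag(s)))  # ... leaves a diagonal matrix
print(np.allclose(W @ D @ Vh, A))  # multiplying back by the inverses recovers A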

Also this sounds like I just said Unitary Matrixes are great because they solve a problem you never heard of before.

The 20th Century’s first great use for Unitary Matrixes, and I imagine the impulse for Mr Wu’s suggestion, was quantum mechanics. (A later use would be data compression.) Unitary Matrixes help us calculate how quantum systems evolve. This should be a little easier to understand if I use a simple physics problem as demonstration.

So imagine three blocks, all the same mass. They’re connected in a row, left to right. There’s two springs, one between the left and the center mass, one between the center and the right mass. The springs have the same strength. The blocks can only move left-to-right. But, within those bounds, you can do anything you like with the blocks. Move them wherever you like and let go. Let them go with a kick moving to the left or the right. The only restraint is they can’t pass through one another; you can’t slide the center block to the right of the right block.

This is not quantum mechanics, by the way. But it’s not far, either. You can turn this into a fine toy of a molecule. For now, though, think of it as a toy. What can you do with it?

A bunch of things, but there’s two really distinct ways these blocks can move. These are the ways the blocks would move if you just hit it with some energy and let the system do what felt natural. One is to have the center block stay right where it is, and the left and right blocks swinging out and in. We know they’ll swing symmetrically, the left block going as far to the left as the right block goes to the right. But all these symmetric oscillations look about the same. They’re one mode.

The other is … not quite antisymmetric. In this mode, the center block moves in one direction and the outer blocks move in the other, just enough to keep momentum conserved. Eventually the center block switches direction and swings the other way. But the outer blocks switch direction and swing the other way too. If you’re having trouble imagining this, imagine looking at it from the outer blocks’ point of view. To them, it’s just the center block wobbling back and forth. That’s the other mode.

And it turns out? It doesn’t matter how you started these blocks moving. The movement looks like a combination of the symmetric and the not-quite-antisymmetric modes. So if you know how the symmetric mode evolves, and how the not-quite-antisymmetric mode evolves? Then you know how every possible arrangement of this system evolves.
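
If you would like to see the modes drop out of a calculation, here is a Python sketch. I am assuming numpy, setting every mass and every spring constant to 1, and writing the equations of motion as m x'' = -K x for a stiffness matrix K:

import numpy as np

K = np.array([[ 1.0, -1.0,  0.0],     # stiffness matrix for three equal blocks
              [-1.0,  2.0, -1.0],     # joined by two equal springs
              [ 0.0, -1.0,  1.0]])

frequencies_squared, modes = np.linalg.eigh(K)
print(frequencies_squared.round(6))   # roughly [0, 1, 3]: drifting along together, then the two oscillation modes
print(modes.round(3))                 # columns: all blocks moving together; outer blocks opposite with the center
                                      # still; the center swinging against the outer blocks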

So here’s where we get to quantum mechanics. Suppose we know the quantum mechanics description of a system at some time. This we can do as a vector. And we know the Hamiltonian, the description of all the potential and kinetic energy, for how the system evolves. The evolution in time of our quantum mechanics description we can see as a unitary matrix multiplied by this vector.

The Hamiltonian, by itself, won’t (normally) be a Unitary Matrix. It gets the boring name H. It’ll be some complicated messy thing. But perhaps we can find a Unitary Matrix U, so that UHU^{\dagger} is a diagonal matrix. And then that’s great. The original H is hard to work with. The diagonalized version? That one we can almost always work with. And then we can go from solutions on the diagonalized version back to solutions on the original. (If the function \psi describes the evolution of UHU^{\dagger} , then U^{\dagger}\psi U describes the evolution of H .) The work that U (and U^{\dagger} ) does to H is basically what we did with that three-block, two-spring model. It’s picking out the modes, and letting us figure out their behavior. Then put that together to work out the behavior of what we’re interested in.
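
Here is what that looks like as a computation, with a small Hermitian H I made up and numpy doing the finding of U:

import numpy as np

H = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])          # a made-up Hermitian matrix

energies, V = np.linalg.eigh(H)        # V's columns are eigenvectors; V is unitary
U = V.conj().T                         # this is the U that does the work
print(np.allclose(U @ H @ U.conj().T, np.diag(energies)))   # True: diagonalized
print(np.allclose(U.conj().T @ np.diag(energies) @ U, H))   # True: and we can go back to H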

There are other uses, besides time-evolution. For instance, an important part of quantum mechanics and thermodynamics is that we can swap particles of the same type. Like, there’s no telling an electron that’s on your nose from an electron that’s in one of the reflective mirrors the Apollo astronauts left on the Moon. If they swapped positions, somehow, we wouldn’t know. It’s important for calculating things like entropy that we consider this possibility. Two particles swapping positions is a permutation. We can describe that as multiplying the vector that describes what every electron on the Earth and Moon is doing by a Unitary Matrix. Here it’s a matrix that does nothing but swap the descriptions of these two electrons. I concede this doesn’t sound thrilling. But anything that goes into calculating entropy is first-rank important.
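
The swap itself is about as simple a matrix as you can write. A sketch, with only two slots so it fits on the page:

import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])    # swaps the first and second entries of a vector

print(P @ np.array([1.0, 2.0]))                    # [2. 1.]
print(np.allclose(P.conj().T, np.linalg.inv(P)))   # True: the permutation matrix is unitary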

As with time-evolution and with permutation, though, any symmetry matches a Unitary Matrix. This includes obvious things like reflecting across a plane. But it also covers, like, being displaced a set distance. And some outright obscure symmetries too, such as the phase of the state function \Psi . I don’t have a good way to describe what this is, physically; we can’t observe it directly. This symmetry, though, manifests as the conservation of electric charge, a thing we rather like.

This, then, is the sort of problem that draws Unitary Matrixes to our attention.


Thank you for reading. This and all of my 2020 A-to-Z essays should be at this link. All the essays from every A-to-Z series should be at this link. Next week, I hope to have something to say for the letter V.

Reading the Comics, April 6, 2020: My Perennials Edition


As much as everything is still happening, and so much of it, there’s still comic strips. I’m fortunately able here to focus just on the comics that discuss some mathematical theme, so let’s get started in exploring last week’s reading. Worth deeper discussion are the comics that turn up here all the time.

Lincoln Peirce’s Big Nate for the 5th is a casual mention. Nate wants to get out of having to do his mathematics homework. This really could be any subject as long as it fit the word balloon.

John Hambrock’s The Brilliant Mind of Edison Lee for the 6th is a funny-answers-to-story-problems joke. Edison Lee’s answer disregards the actual wording of the question, which supposes the group is travelling at an average 70 miles per hour. The number of stops doesn’t matter in this case.

Mark Anderson’s Andertoons for the 6th is the Mark Anderson’s Andertoons for the week. In it Wavehead gives the “just use a calculator” answer for geometry problems.

On the blackboard: Perimeter, with a quadrilateral drawn, the sides labelled A, B, C, and D, and the formula A + B + C + D on the board. Wavehead asks the teacher, 'Or you could just walk around the edge and let your fitness tracker tell you the distance.'
Mark Anderson’s Andertoons for the 6th of April, 2020. I haven’t mentioned this strip in two days. Essays featuring Andertoons are at this link, though.

Not much to talk about there. But there is a fascinating thing about perimeters that you learn if you go far enough in Calculus. You have to get into multivariable calculus, something where you integrate a function that has at least two independent variables. When you do this, you can find the integral evaluated over a curve. If it’s a closed curve, something that loops around back to itself, then you can do something magic. Integrating the correct function on the curve around a shape will tell you the enclosed area.

And this is an example of one of the amazing things in multivariable calculus. It tells us that integrals over a boundary can tell us something about the integral within a volume, and vice-versa. It can be worth figuring out whether your integral is better solved by looking at the boundaries or at the interiors.
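
Green's theorem is the two-dimensional version of this, and for a polygon the boundary integral boils down to the "shoelace" sum over the corners. A Python sketch, with a rectangle I made up so the answer is easy to check:

def polygon_area(corners):
    # Half the integral of (x dy - y dx) around the boundary, done corner by corner.
    total = 0.0
    for i in range(len(corners)):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % len(corners)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

print(polygon_area([(0, 0), (2, 0), (2, 1), (0, 1)]))   # 2.0, the area of a 2-by-1 rectangle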

Heron’s Formula, for the area of a triangle based on the lengths of its sides, is an expression of this calculation. I don’t know of a formula exactly like that for the area of a quadrilateral, but there are similar formulas if you know the lengths of the sides and of the diagonals.

Richard Thompson’s Cul de Sac rerun for the 6th sees Petey working on his mathematics homework. As with the Big Nate strip, it could be any subject.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 5th depicts, fairly, the sorts of things that excite mathematicians. The number discussed here is about algorithmic complexity. This is the study of how long it takes to do an algorithm. How long always depends on how big a problem you are working on; to sort four items takes less time than sorting four million items. Of interest here is how much the time to do work grows with the size of whatever you’re working on.

Caption: 'Mathematicians are weird.' Mathematician: 'You know that thing that was 2.3728642?' Group of mathematicians: 'Yes?' Mathematician: 'I got it down to 2.3728639.' The mathematicians burst out into thunderous applause.
Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 5th of April, 2020. I haven’t mentioned this strip in two days. Essays featuring Saturday Morning Breakfast Cereal are at this link, though.

The mathematician’s particular example, and I thank dtpimentel in the comments for finding this, is about the Coppersmith–Winograd algorithm. This is a scheme for doing matrix multiplication, a particular kind of multiplication and addition of square grids of numbers. The squares have some number N of rows and N columns. It’s thought that there exists some way to do matrix multiplication in the order of N^2 time, that is, if it takes 10 time units to multiply matrices of three rows and three columns together, we should expect it takes 40 time units to multiply matrices of six rows and six columns together. The matrix multiplication you learn in linear algebra takes on the order of N^3 time, so, it would take like 80 time units.

We don’t know the way to do that. The Coppersmith–Winograd algorithm was thought, after Virginia Vassilevska Williams’s work in 2011, to take something like N^{2.3728642} steps. So that six-rows-six-columns multiplication would take slightly over 51.796 844 time units. In 2014, François Le Gall found it was no worse than N^{2.3728639} steps, so this would take slightly over 51.796 833 time units. The improvement doesn’t seem like much, but on tiny problems it never does. On big problems, the improvement’s worth it. And, sometimes, you make a good chunk of progress at once.
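
The order-of-N-cubed figure is easy to see in the schoolbook method: three nested loops, each running N times. A sketch in Python, just to show where the count comes from:

def schoolbook_multiply(A, B):
    # The linear-algebra-class algorithm: about N**3 multiplications and additions.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):           # each row of the product ...
        for j in range(n):       # ... paired with each column ...
            for k in range(n):   # ... needs a sum of N products
                C[i][j] += A[i][k] * B[k][j]
    return C

print(schoolbook_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19.0, 22.0], [43.0, 50.0]]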


I’ll have some more comic strips to discuss in an essay at this link, sometime later this week. Thanks for reading.

Reading the Comics, February 24, 2018: My One Boring Linear Algebra Anecdote Edition


Wait for it.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 21st mentions mathematics — geometry, primarily — as something a substitute teacher has tried teaching with the use of a cucumber and condom. These aren’t terrible examples to use to make concrete the difference between volumes and surface areas. There are limitations, though. It’s possible to construct a shape that has a finite volume but an infinitely large surface area, albeit not using cucumbers.

There’s also a mention of the spring constant, and physics. This isn’t explicitly mathematical. But the description of movement on a spring is about the first interesting differential equation of mathematical physics. The solution is that of simple harmonic motion. I don’t think anyone taking the subject for the first time would guess at the answer. But it’s easy enough to verify it’s right. And this motion — sine waves — just turns up everywhere in mathematical physics.
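
For the record, that differential equation is m \ddot{x} = -k x , with m the mass and k the spring constant. The solution is x(t) = A \cos(\omega t) + B \sin(\omega t) where \omega = \sqrt{k/m} and A and B depend on how the mass starts out. Differentiating twice and substituting back in is all the verification it takes.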

Bud Blake’s Tiger rerun for the 23rd just mentions mathematics as a topic Hugo finds challenging, and what’s challenging about it. So a personal story: when I took Intro to Linear Algebra my freshman year one day I spaced on the fact we had an exam. So, I put the textbook on the shelf under my desk, and then forgot to take it when I left. The book disappeared, of course, and the professor never heard of it being turned in to lost-and-found or anything. Fortunately the homework was handwritten questions passed out on photocopies (ask your parents), so I could still do the assignments, but for all those, you know, definitions and examples I had to rely on my own notes. I don’t know why I couldn’t ask a classmate. Shyness, probably. Came through all right, though.

Hugo: 'These math problems we got for homework are gonna be hard to do.' Tiger: 'Because you don't understand them?' Hugo: 'Because I brought home my history book by mistake.'
Bud Blake’s Tiger rerun for the 23rd of February, 2018. Whose house are they at? I mean, did Tiger bring his dog to Hugo’s, or did Hugo bring his homework to Tiger’s house? I guess either’s not that odd, especially if they just got out of school, but then Hugo’s fussing with his homework when he’s right out of school and with Tiger?

Cathy Law’s Claw for the 23rd technically qualifies as an anthropomorphic-numerals joke, in this panel about the smothering of education by the infection of guns into American culture.

Jim Meddick’s Monty for the 23rd has wealthy child Wedgwick unsatisfied with a mere ball of snow. He instead has a snow Truncated Icosahedron (the hyphens in Jarvis’s word balloon may baffle the innocent reader). This is a real shape, one that’s been known for a very long time. It’s one of the Archimedean Solids, a set of 13 solids that have convex shapes (no holes or indents or anything) and have all vertices the same, the identical number of edges coming in to each point in the same relative directions. The truncated icosahedron you maybe also know as the soccer ball shape, at least for those old-style soccer balls made of patches that were hexagons and pentagons. An actual truncated icosahedron needs twelve pentagons, so the figure drawn in the third panel isn’t quite right. At least one pentagonal face would be visible. But that’s also tricky to draw. The aerodynamics of a truncated icosahedron are surely different from those of a sphere. But in snowball-fight conditions, probably not different enough to even notice.

Mark Litzler’s Joe Vanilla for the 24th uses a blackboard full of formulas to represent an overcomplicated answer. The formulas look, offhand, like gibberish to me. But I’ll admit uncertainty since the odd capitalization of “iG(p)” at the start makes me think of some deeper group theory or knot theory symbols. And to see an “m + p” and an “m – p” makes me think of quantum mechanics of atomic orbitals. (But then an “m – p2” is weird.) So if this were anything I’d say it was some quantum chemistry formula. But my gut says if Litzler did take the blackboard symbols from anything, it was without going back to references. (Which he has no need to do, I should point out; the joke wouldn’t be any stronger — or weaker — if the blackboard meant anything.)

The Summer 2017 Mathematics A To Z: Jordan Canonical Form


I made a mistake! I thought we had got to the end of the block of A To Z topics suggested by Gaurish, of the For The Love Of Mathematics blog. Not so and, indeed, I wonder if it wouldn’t be a viable writing strategy around here for me to just ask Gaurish to throw out topics and I have two weeks to write about them. I don’t think there’s a single unpromising one in the set.

Summer 2017 Mathematics A to Z, featuring a coati (it's kind of the Latin American raccoon) looking over alphabet blocks, with a lot of equations in the background.
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work. He’s also open for commissions, starting from US$10.

Jordan Canonical Form.

Before you ask, yes, this is named for the Camille Jordan.

So this is a thing from algebra. Particularly, linear algebra. And more particularly, matrices. Matrices are so much of linear algebra that you could be forgiven thinking they’re all of linear algebra. The thing is, matrices are a really good way of describing linear transformations. That is, where you take a block of space and stretch it out, or squash it down, or rotate it, or do some combination of these things. And stretching and squashing and rotating is a lot of what you’d ever want to do. Refer to any book on how to draw animated cartoons. The only thing matrices can’t do is have their eyes bug out huge when an attractive region of space walks past.

Thing about a matrix is if you want to do something with it, you’re going to write it as a grid of numbers. It doesn’t have to be a grid of numbers. But about all the matrices anyone does anything with are grids of numbers. And that’s fine. They do an incredible lot of stuff. What’s not fine is that on looking at a huge block of numbers, the mind sees: huh. That’s a big block of numbers. Good luck finding what’s meaningful in them. To help find meaning we have a set of standard forms. We call them “canonical” or “normal” or some other approving term. They rearrange and change the terms in the matrix so that more interesting stuff is more obvious.

Now you’re justified asking: how can we rearrange and change the terms in a matrix without changing what the matrix is? We can get away with doing this because we can show some rearrangements don’t change what we’re interested in. That covers the “how dare we” part of “how”. We do it by using matrix multiplication. You might remember from high school algebra that matrix multiplication is this agonizing process of multiplying every pair of numbers that ever existed together, then adding them all up, and then maybe you multiply something by minus one because you’re thinking of determinants, and it all comes out wrong anyway and you have to do it over? Yeah. Well, matrix multiplication is defined hard because it makes stuff like this work out. So that covers the “by what technique” part of “how”. We start out with some matrix, let me imaginatively name it A . And then we find some transformation matrix for which, eh, let’s say P is a good enough name. I’ll say why in a moment. Then we use that matrix and its multiplicative inverse P^{-1} . And we evaluate the product P^{-1} A P . This won’t just be the same old matrix we started with. Not usually. Promise. But what this will be, if we chose our matrix P correctly, is some new matrix that’s easier to read.
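
Here is the tidiest case of that, where the easier-to-read matrix comes out fully diagonal. A Python sketch, assuming numpy, with an A I made up and with P built from A's eigenvectors:

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, P = np.linalg.eig(A)     # the columns of P are eigenvectors of A
easier = np.linalg.inv(P) @ A @ P     # the product P^{-1} A P
print(easier.round(10))               # diagonal, with the eigenvalues 5 and 2 on the diagonal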

The matrices involved here have to follow some rules. Most important, they’re all going to be square matrices. There’ll be more rules that your linear algebra textbook will tell you. Or your instructor will, after checking the textbook.

So what makes a matrix easy to read? Zeroes. Lots and lots of zeroes. When we have a standardized form of a matrix it’s nearly all zeroes. This is for a good reason: zeroes are easy to multiply stuff by. And they’re easy to add stuff to. And almost everything we do with matrices, as a calculation, is a lot of multiplication and addition of the numbers in the matrix.

What also makes a matrix easy to read? Everything important being on the diagonal. The diagonal is one of the two things you would imagine if you were told “here’s a grid of numbers, pick out the diagonal”. In particular it’s the one that goes from the upper left to the bottom right, that is, row one column one, and row two column two, and row three column three, and so on up to row 86 column 86 (or whatever). If everything is on the diagonal the matrix is incredibly easy to work with. If it can’t all be on the diagonal at least everything should be close to it. As close as possible.

In the Jordan Canonical Form not everything is on the diagonal. I mean, it can be, but you shouldn’t count on that. But everything either will be on the diagonal or else it’ll be one row up from the diagonal. That is, row one column two, row two column three, row 85 column 86. Like that. There’s a few other important pieces.

First is that the thing in the row above the diagonal will be either 1 or 0. Second is that on the diagonal you’ll have a sequence of all the same number. Like, you’ll get four instances of the number ‘2’ along this string of the diagonal. Third is that you’ll get a 1 in the row above all but the first instance of this particular number. Fourth is that you’ll get a 0 in the row above the first instance of this number.

Yeah, that’s fussy to visualize. This is one of those things easiest to show in a picture. A Jordan canonical form is a matrix that looks like this:

2 1 0 0 0 0 0 0 0 0 0 0
0 2 1 0 0 0 0 0 0 0 0 0
0 0 2 1 0 0 0 0 0 0 0 0
0 0 0 2 0 0 0 0 0 0 0 0
0 0 0 0 3 1 0 0 0 0 0 0
0 0 0 0 0 3 0 0 0 0 0 0
0 0 0 0 0 0 4 1 0 0 0 0
0 0 0 0 0 0 0 4 1 0 0 0
0 0 0 0 0 0 0 0 4 0 0 0
0 0 0 0 0 0 0 0 0 -1 0 0
0 0 0 0 0 0 0 0 0 0 -2 1
0 0 0 0 0 0 0 0 0 0 0 -2

This may have you dazzled. It dazzles mathematicians too. When we have to write a matrix that’s almost all zeroes like this we drop nearly all the zeroes. If we have to write anything we just write a really huge 0 in the upper-right and the lower-left corners.

What makes this the Jordan Canonical Form is that the matrix looks like it’s put together from what we call Jordan Blocks. Look around the diagonals. Here’s the first Jordan Block:

2 1 0 0
0 2 1 0
0 0 2 1
0 0 0 2

Here’s the second:

3 1
0 3

Here’s the third:

4 1 0
0 4 1
0 0 4

Here’s the fourth:

-1

And here’s the fifth:

-2 1
0 -2

And we can represent the whole matrix as this might-as-well-be-diagonal thing:

First Block 0 0 0 0
0 Second Block 0 0 0
0 0 Third Block 0 0
0 0 0 Fourth Block 0
0 0 0 0 Fifth Block

These blocks can be as small as a single number. They can be as big as however many rows and columns you like. Each individual block is some repeated number on the diagonal, and a repeated one in the row above the diagonal. You can call this the “superdiagonal”.

(Mathworld, and Wikipedia, assert that sometimes the row below the diagonal — the “subdiagonal” — gets the 1’s instead of the superdiagonal. That’s fine if you like it that way, and it won’t change any of the real work. I have not seen these subdiagonal 1’s in the wild. But I admit I don’t do a lot of this field and maybe there’s times it’s more convenient.)

Using the Jordan Canonical Form for a matrix is a lot like putting an object in a standard reference pose for photographing. This is a good metaphor. We get a Jordan Canonical Form by matrix multiplication, which works like rotating and scaling volumes of space. You can view the Jordan Canonical Form for a matrix as how you represent the original matrix from a new viewing angle that makes it easy to recognize. And this is why P is not a bad name for the matrix that does this work. We can see all this as “projecting” the matrix we started with into a new frame of reference. The new frame is maybe rotated and stretched and squashed and whatnot, compared to how we started. But it’s as valid a base. Projecting a mathematical object from one frame of reference to another usually involves calculating something that looks like P^{-1} A P so, projection. That’s our name.

Mathematicians will speak of “the” Jordan Canonical Form for a matrix as if there were such a thing. I don’t mean that Jordan Canonical Forms don’t exist. They exist just as much as matrices do. It’s the “the” that misleads. You can put the Jordan Blocks in any order and have as valid, and as useful, a Jordan Canonical Form. But it’s easy to swap the orders of these blocks around — it’s another matrix multiplication, and a blessedly easy one — so it doesn’t matter which form you have. Get any one and you have them all.

I haven’t said anything about what these numbers on the diagonal are. They’re the eigenvalues of the original matrix. I hope that clears things up.

Yeah, not to anyone who didn’t know what a Jordan Canonical Form was to start with. Rather than get into calculations let me go to well-established metaphor. Take a sample of an unknown chemical and set it on fire. Put the light from this through a prism and photograph the spectrum. There will be lines, interruptions in the progress of colors. The locations of those lines and how intense they are tell you what the chemical is made of, and in what proportions. These are much like the eigenvectors and eigenvalues of a matrix. The eigenvectors tell you what the matrix is made of, and the eigenvalues how much of the matrix is those. This stuff gets you very far in proving a lot of great stuff. And part of what makes the Jordan Canonical Form great is that you get the eigenvalues right there in neat order, right where anyone can see them.

So! All that’s left is finding the things. The best way to find the Jordan Canonical Form for a given matrix is to become an instructor for a class on linear algebra and assign it as homework. The second-best way is to give the problem to your TA, who will type it in to Mathematica and return the result. It’s too much work to do most of the time. Almost all the stuff you could learn from having the thing in the Jordan Canonical Form you work out in the process of finding the matrix P that would let you calculate what the Jordan Canonical Form is. And once you had that, why go on?
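
Or hand it to the computer yourself. Here is a sketch with sympy, using a tiny matrix I made up that can't be fully diagonalized:

from sympy import Matrix

A = Matrix([[ 3, 1],
            [-1, 1]])

P, J = A.jordan_form()        # sympy returns P and J with A equal to P * J * P**(-1)
print(J)                      # Matrix([[2, 1], [0, 2]]): one Jordan block, eigenvalue 2 on the diagonal
print(A == P * J * P.inv())   # True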

Where the Jordan Canonical Form shines is in doing proofs about what matrices can do. We can always put a square matrix into a Jordan Canonical Form. So if we want to show something is true about matrices in general, we can show that it’s true for the simpler-to-work-with Jordan Canonical Form. Then show that shifting a matrix to or from the Jordan Canonical Form doesn’t change whether the thing we’re interested in is true. It exists in that strange space: it is quite useful, but never on a specific problem.

Oh, all right. Yes, it’s the same Camille Jordan of the Jordan Curve and also of the Jordan Curve Theorem. That fellow.

Reading the Comics, April 15, 2017: Extended Week Edition


It turns out last Saturday only had the one comic strip that was even remotely on point for me. And it wasn’t very on point either, but since it’s one of the Creators.com strips I’ve got the strip to show. That’s enough for me.

Henry Scarpelli and Craig Boldman’s Archie for the 8th is just about how algebra hurts. Some days I agree.

'Ugh! Achey head! All blocked up! Throbbing! Completely stuffed!' 'Sounds like sinuses!' 'No. Too much algebra!'
Henry Scarpelli and Craig Boldman’s Archie for the 8th of April, 2017. Do you suppose Archie knew that Dilton was listening there, or was he just emoting his fatigue to himself?

Ruben Bolling’s Super-Fun-Pak Comix for the 8th is an installation of They Came From The Third Dimension. “Dimension” is one of those oft-used words that’s come loose of any technical definition. We use it in mathematics all the time, at least once we get into Introduction to Linear Algebra. That’s the course that talks about how blocks of space can be stretched and squashed and twisted into each other. You’d expect this to be a warmup act to geometry, and I guess it’s relevant. But where it really pays off is in studying differential equations and how systems of stuff changes over time. When you get introduced to dimensions in linear algebra they describe degrees of freedom, or how much information you need about a problem to pin down exactly one solution.

It does give mathematicians cause to talk about “dimensions of space”, though, and these are intuitively at least like the two- and three-dimensional spaces that, you know, stuff moves in. That there could be more dimensions of space, ordinarily inaccessible, is an old enough idea we don’t really notice it. Perhaps it’s hidden somewhere too.

Amanda El-Dweek’s Amanda the Great of the 9th started a story with the adult Becky needing to take a mathematics qualification exam. It seems to be prerequisite to enrolling in some new classes. It’s a typical set of mathematics anxiety jokes in the service of a story comic. One might tsk Becky for going through university without ever having a proper mathematics class, but then, I got through university without ever taking a philosophy class that really challenged me. Not that I didn’t take the classes seriously, but that I took stuff like Intro to Logic that I was already conversant in. We all cut corners. It’s a shame not to use chances like that, but there’s always so much to do.

Mark Anderson’s Andertoons for the 10th relieves the worry that Mark Anderson’s Andertoons might not have got in an appearance this week. It’s your common kid at the chalkboard sort of problem, this one a kid with no idea where to put the decimal. As always happens I’m sympathetic. The rules about where to move decimals in this kind of multiplication come out really weird if the last digit, or worse, digits in the product are zeroes.

Mel Henze’s Gentle Creatures is in reruns. The strip from the 10th is part of a story I’m so sure I’ve featured here before that I’m not even going to look up when it aired. But it uses your standard story problem to stand in for science-fiction gadget mathematics calculation.

Dave Blazek’s Loose Parts for the 12th is the natural extension of sleep numbers. Yes, I’m relieved to see Dave Blazek’s Loose Parts around here again too. Feels weird when it’s not.

Bill Watterson’s Calvin and Hobbes rerun for the 13th is a resisting-the-story-problem joke. But Calvin resists so very well.

John Deering’s Strange Brew for the 13th is a “math club” joke featuring horses. Oh, it’s a big silly one, but who doesn’t like those too?

Dan Thompson’s Brevity for the 14th is one of the small set of punning jokes you can make using mathematician names. Good for the wall of a mathematics teacher’s classroom.

Shaenon K Garrity and Jefferey C Wells’s Skin Horse for the 14th is set inside a virtual reality game. (This is why there’s talk about duplicating objects.) Within the game, the characters are playing that game where you start with a set number (in this case 20) of tokens and take turns removing a couple of them. The “rigged” part of it is that the house can, by perfect play, force a win every time. It’s a bit of game theory that creeps into recreational mathematics books and that I imagine is imprinted in the minds of people who grow up to design games.

The End 2016 Mathematics A To Z: Kernel


I told you that Image thing would reappear. Meanwhile I learned something about myself in writing this.

Kernel.

I want to talk about functions again. I’ve been keeping like a proper mathematician to a nice general idea of what a function is. The sort where a function’s this rule matching stuff in a set called the domain with stuff in a set called the range. And I’ve tried not to commit myself to saying anything about what that domain and range are. They could be numbers. They could be other functions. They could be the set of DVDs you own but haven’t watched in more than two years. They could be collections of socks. Haven’t said.

But we know what functions anyone cares about. They’re stuff that have domains and ranges that are numbers. Preferably real numbers. Complex-valued numbers if we must. If we look at more exotic sets they’re ones that stick close to being numbers: vectors made up of an ordered set of numbers. Matrices of numbers. Functions that are themselves about numbers. Maybe we’ll get to something exotic like a rotation, but then what is a rotation but spinning something a certain number of degrees? There are a bunch of unavoidably common domains and ranges.

Fine, then. I’ll stick to functions with ranges that look enough like regular old numbers. By “enough” I mean they have a zero. That is, something that works like zero does. You know, add it to something else and that something else isn’t changed. That’s all I need.

A natural thing to wonder about a function — hold on. “Natural” is the wrong word. Something we learn to wonder about in functions, in pre-algebra class where they’re all polynomials, is where the zeroes are. They’re generally not at zero. Why would we say “zeroes” to mean “zero”? That could let non-mathematicians think they knew what we were on about. By the “zeroes” we mean the things in the domain that get matched to the zero in the range. It might be zero; no reason it couldn’t, until we know what the function’s rule is. Just we can’t count on that.

A polynomial we know has … well, it might have zero zeroes. Might have no zeroes. It might have one, or two, or so on. If it’s an n-th degree polynomial it can have up to n zeroes. And if it’s not a polynomial? Well, then it could have any conceivable number of zeroes and nobody is going to give you a nice little formula to say where they all are. It’s not that we’re being mean. It’s just that there isn’t a nice little formula that works for all possibilities. There aren’t even nice little formulas that work for all polynomials. You have to find zeroes by thinking about the problem. Sorry.

But! Suppose you have a collection of all the zeroes for your function. That’s all the points in the domain that match with zero in the range. Then we have a new name for the thing you have. And that’s the kernel of your function. It’s the biggest subset in the domain with an image that’s just the zero in the range.

So we have a name for the zeroes that isn’t just “the zeroes”. What does this get us?

If we don’t know anything about the kind of function we have, not much. If the function belongs to some common kinds of functions, though, it tells us stuff.

For example. Suppose the function has domain and range that are vectors. And that the function is linear, which is to say, easy to deal with. Let me call the function ‘f’. And let me pick out two things in the domain. I’ll call them ‘x’ and ‘y’ because I’m writing this after Thanksgiving dinner and can’t work up a cleverer name for anything. If f is linear then f(x + y) is the same thing as f(x) + f(y). And now something magic happens. If x and y are both in the kernel, then x + y has to be in the kernel too. Think about it. Meanwhile, if x is in the kernel but y isn’t, then f(x + y) is f(y). Again think about it.

What we can see is that the domain fractures into two directions. One of them, the direction of the kernel, is invisible to the function. You can move however much you like in that direction and f can’t see it. The other direction, perpendicular (“orthogonal”, we say in the trade) to the kernel, is visible. Everything that might change changes in that direction.

This idea threads through vector spaces, and we study a lot of things that turn out to look like vector spaces. It keeps surprising us by letting us solve problems, or find the best-possible approximate solutions. This kernel gives us room to match some fiddly conditions without breaking the real solution. The size of the null space alone can tell us whether some problems are solvable, or whether they’ll have infinitely large sets of solutions.
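
Here is that splitting as a computation, assuming numpy and scipy, with a made-up linear function that squashes one direction flat:

import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 1.0, 0.0],          # a linear function from 3-space to 2-space
              [0.0, 0.0, 1.0]])

kernel = null_space(A)                  # columns span the kernel; here it's a single direction
x = kernel[:, 0]                        # something the function can't see
y = np.array([2.0, 0.0, 5.0])           # something it can

print(np.allclose(A @ x, 0))            # True: x gets matched to zero
print(np.allclose(A @ (x + y), A @ y))  # True: moving along the kernel is invisible to the function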

In this vector-space construct the kernel often takes on another name, the “null space”. This means the same thing. But it reminds us that superhero comics writers miss out on many excellent pieces of terminology by not taking advanced courses in mathematics.

Kernels also appear in group theory, whenever we get into rings. We’re always working with rings. They’re nearly as unavoidable as vector spaces.

You know how you can divide the whole numbers into odd and even? And you can do some neat tricks with that for some problems? You can do that with every ring, using the kernel as a dividing point. This gives us information about how the ring is shaped, and what other structures might look like the ring. This often lets us turn proofs that might be hard into a collection of proofs on individual cases that are, at least, doable. Tricks about odd and even numbers become, in trained hands, subtle proofs of surprising results.

We see vector spaces and rings all over the place in mathematics. Some of that’s selection bias. Vector spaces capture a lot of what’s important about geometry. Rings capture a lot of what’s important about arithmetic. We have understandings of geometry and arithmetic that transcend even our species. Raccoons understand space. Crows understand number. When we look to do mathematics we look for patterns we understand, and these are major patterns we understand. And there are kernels that matter to each of them.

Some mathematical ideas inspire metaphors to me. Kernels are one. Kernels feel to me like the process of holding a polarized lens up to a crystal. This lets one see how the crystal is put together. I realize writing this down that my metaphor is unclear: is the kernel the lens or the structure seen in the crystal? I suppose the function has to be the lens, with the kernel the crystallization planes made clear under it. It’s curious I had enjoyed this feeling about kernels and functions for so long without making it precise. Feelings about mathematical structures can be like that.

The End 2016 Mathematics A To Z: Algebra


So let me start the End 2016 Mathematics A To Z with a word everybody figures they know. As will happen, everybody’s right and everybody’s wrong about that.

Algebra.

Everybody knows what algebra is. It’s the point where suddenly mathematics involves spelling. Instead of long division we’re on a never-ending search for ‘x’. Years later we pass along gifs of either someone saying “stop asking us to find your ex” or someone who’s circled the letter ‘x’ and written “there it is”. And make jokes about how we got through life without using algebra. And we know it’s the thing mathematicians are always doing.

Mathematicians aren’t always doing that. I expect the average mathematician would say she almost never does that. That’s a bit of a fib. We have a lot of work where we do stuff that would be recognizable as high school algebra. It’s just we don’t really care about that. We’re doing that because it’s how we get the problem we are interested in done. The most recent few pieces in my “Why Stuff can Orbit” series include a bunch of high school algebra-style work. But that was just because it was the easiest way to answer some calculus-inspired questions.

Still, “algebra” is a much-used word. It comes back around the second or third year of a mathematics major’s career. It comes in two forms in undergraduate life. One form is “linear algebra”, which is a great subject. That field’s about how stuff moves. You get to imagine space as this stretchy material. You can stretch it out. You can squash it down. You can stretch it in some directions and squash it in others. You can rotate it. These are simple things to build on. You can spend a whole career building on that. It becomes practical in surprising ways. For example, it’s the field of study behind finding equations that best match some complicated, messy real data.

The second form is “abstract algebra”, which comes in about the same time. This one is alien and baffling for a long while. It doesn’t help that the books all call it Introduction to Algebra or just Algebra and all your friends think you’re slumming. The mathematics major stumbles through confusing definitions and theorems that ought to sound comforting. (“Fermat’s Little Theorem”? That’s a good thing, right?) But the confusion passes, in time. There’s a beautiful subject here, one of my favorites. I’ve talked about it a lot.

We start with something that looks like the loosest cartoon of arithmetic. We get a bunch of things we can add together, and an ‘addition’ operation. This lets us do a lot of stuff that looks like addition modulo numbers. Then we go on to stuff that looks like picking up floor tiles and rotating them. Add in something that we call ‘multiplication’ and we get rings. This is a bit more like normal arithmetic. Add in some other stuff and we get ‘fields’ and other structures. We can keep falling back on arithmetic and on rotating tiles to build our intuition about what we’re doing. This trains mathematicians to look for particular patterns in new, abstract constructs.

Linear algebra is not an abstract-algebra sort of algebra. Sorry about that.

And there’s another kind of algebra that mathematicians talk about. At least once they get into grad school they do. There’s a huge family of these kinds of algebras. The family trait for them is that they share a particular rule about how you can multiply their elements together. I won’t get into that here. There are many kinds of these algebras. One that I keep trying to study on my own and crash hard against is Lie Algebra. That’s named for the Norwegian mathematician Sophus Lie. Pronounce it “lee”, as in “leaning”. You can understand quantum mechanics much better if you’re comfortable with Lie Algebras and so now you know one of my weaknesses. Another kind is the Clifford Algebra. This lets us create something called a “hypercomplex number”. It isn’t much like a complex number. Sorry. Clifford Algebra does lend itself to a construct called spinors. These help physicists understand the behavior of bosons and fermions. Every bit of matter seems to be either a boson or a fermion. So you see why this is something people might like to understand.

Boolean Algebra is the algebra of this type that a normal person is likely to have heard of. It’s about what we can build using two values and a few operations. Those values by tradition we call True and False, or 1 and 0. The operations we call things like ‘and’ and ‘or’ and ‘not’. It doesn’t sound like much. It gives us computational logic. Isn’t that amazing stuff?
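
If you want a taste of it, here is a three-line check in Python of one of the laws that make the algebra tick, De Morgan's law, that "not (a and b)" is the same as "(not a) or (not b)":

for a in (False, True):
    for b in (False, True):
        # De Morgan's law holds for every one of the four cases
        assert (not (a and b)) == ((not a) or (not b))
print("De Morgan's law checks out")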

So if someone says “algebra” she might mean any of these. A normal person in a non-academic context probably means high school algebra. A mathematician speaking without further context probably means abstract algebra. If you hear something about “matrices” it’s more likely that she’s speaking of linear algebra. But abstract algebra can’t be ruled out yet. If you hear a word like “eigenvector” or “eigenvalue” or anything else starting “eigen” (or “characteristic”) she’s more probably speaking of linear algebra. And if there’s someone’s name before the word “algebra” then she’s probably speaking of the last of these. This is not a perfect guide. But it is the sort of context mathematicians expect other mathematicians notice.

A Leap Day 2016 Mathematics A To Z: Basis


Today’s glossary term is one that turns up in many areas of mathematics. But these all share some connotations. So I mean to start with the easiest one to understand.

Basis.

Suppose you are somewhere. Most of us are. Where is something else?

That isn’t hard to answer if conditions are right. If we’re allowed to point and the something else is in sight, we’re done. It’s when pointing and following the line of sight breaks down that we’re in trouble. We’re also in trouble if we want to say how to get from that something to yet another spot. How can we guide someone from one point to another?

We have a good answer from everyday life. We can impose some order, some direction, on space. We’re familiar with this from the cardinal directions. We say where things on the surface of the Earth are by how far they are north or south, east or west, from something else. The scheme breaks down a bit if we’re at the North or the South pole exactly, but there we can fall back on pointing.

When we start using north and south and east and west as directions we are choosing basis vectors. Vectors are directions in how far to move and in what direction. Suppose we have two vectors that aren’t pointing in the same direction. Then we can describe any two-dimensional movement using them. We can say “go this far in the direction of the first vector and also that far in the direction of the second vector”. With the cardinal directions, we consider north and east, or east and south, or south and west, or west and north to be a pair of vectors going in different directions.

(North and south, in this context, are the same thing. “Go twenty paces north” says the same thing as “go negative twenty paces south”. Most mathematicians don’t pull this sort of stunt when telling you how to get somewhere unless they’re trying to be funny without succeeding.)

A basis vector is just a direction, and distance in that direction, that we’ve decided to be a reference for telling different points in space apart. A basis set, or basis, is the collection of all the basis vectors we need. What do we need? We need enough basis vectors to get to all the points in whatever space we’re working with.

(If you are going to ask about doesn’t “east” point in different directions as we go around the surface of the Earth, you’re doing very well. Please pretend we never move so far from where we start that anyone could notice the difference. If you can’t do that, please pretend the Earth has been smooshed into a huge flat square with north at one end and we’re only just now noticing.)

We are free to choose whatever basis vectors we like. The worst that can happen if we choose a lousy basis is that we have to write out more things than we otherwise would. Our work won’t be less true, it’ll just be more tedious. But there are some properties that often make for a good basis.

One is that the basis should relate to the problem you’re doing. Suppose you were in one of mathematicians’ favorite places, midtown Manhattan. There is a compelling grid here of streets running north-south and avenues running east-west. (Broadway we ignore as an implementation error retained for reasons of backwards compatibility.) Well, we pretend they run north-south and east-west. They’re actually a good bit clockwise of north-south and east-west. They do that to better match the geography of the island. A “north” street runs about parallel to the way Manhattan’s long dimension runs. In the circumstance, it would be daft to describe directions by true north or true east. We would say to go so many streets “north” and so many avenues “east”.

Purely mathematical problems aren’t concerned with streets and avenues. But there will often be preferred directions. Mathematicians often look at the way a process alters shapes or redirects forces. There’ll be some directions where the alterations are biggest. There’ll be some where the alterations are shortest. Those directions are probably good choices for a basis. They stand out as important.

We also tend to like basis vectors that are a unit length. That is, their size is 1 in some convenient unit. That’s for the same reason it’s easier to say how expensive something is if it costs 45 dollars instead of nine five-dollar bills. Or if you’re told it was 180 quarter-dollars. The length of your basis vector is just a scaling factor. But the more factors you have to work with the more likely you are to misunderstand something.

And we tend to like basis vectors that are perpendicular to one another. They don’t have to be. But if they are then it’s easier to divide up our work. We can study each direction separately. Mathematicians tend to like techniques that let us divide problems up into smaller ones that we can study separately.
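
Here is that dividing-up in a small Python sketch, assuming numpy; the basis is a made-up pair of perpendicular unit vectors, and the coefficient in each direction is just a dot product:

import numpy as np

e1 = np.array([np.cos(0.3), np.sin(0.3)])     # a unit vector pointing a bit off "east"
e2 = np.array([-np.sin(0.3), np.cos(0.3)])    # a unit vector perpendicular to it

v = np.array([5.0, 2.0])                      # some point we want to describe

a, b = v @ e1, v @ e2                         # how far to go in each basis direction
print(np.allclose(a * e1 + b * e2, v))        # True: the two directions together rebuild the point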

I’ve described basis sets using vectors. They have intuitive appeal. It’s easy to understand directions of things in space. But the idea carries across into other things. For example, we can build functions out of other functions. So we can choose a set of basis functions. We can multiply them by real numbers (scalars) and add them together. This makes whatever function we’re interested in into a kind of weighted average of basis functions.

Why do that? Well, again, we often study processes that change shapes and directions. If we choose a basis well, though, the process changes the basis vectors in easy to describe ways. And many interesting processes let us describe the changing of an arbitrary function as the weighted sum of the changes in the basis vectors. By solving a couple of simple problems we get the ability to solve every interesting problem.

We can even define something that works like the angle between functions. And something that works a lot like perpendicularity for functions.

And this carries on to other mathematical constructs. We look for ways to impose some order, some direction, on whatever structure we’re looking at. We’re often successful, and can work with unreal things using tools like those that let us find our place in a city.

The Set Tour, Part 7: Matrices


I feel a bit odd about this week’s guest in the Set Tour. I’ve been mostly concentrating on sets that get used as the domains or ranges for functions a lot. The ones I want to talk about here don’t tend to serve the role of domain or range. But they are used a great deal in some interesting functions. So I loosen my rule about what to talk about.

R^{m x n} and C^{m x n}

R^{m x n} might explain itself by this point. If it doesn’t, then this may help: the “x” here is the multiplication symbol. “m” and “n” are positive whole numbers. They might be the same number; they might be different. So, are we done here?

Maybe not quite. I was fibbing a little when I said “x” was the multiplication symbol. R^{2 x 3} is not a longer way of saying R^6, an ordered collection of six real-valued numbers. The x does represent a kind of product, though. What we mean by R^{2 x 3} is an ordered collection, two rows by three columns, of real-valued numbers. Say the “x” here aloud as “by” and you’re pronouncing it correctly.

What we get is called a “matrix”. If we put into it only real-valued numbers, it’s a “real matrix”, or a “matrix of reals”. Sometimes mathematical terminology isn’t so hard to follow. Just as with vectors, R^n, it matters just how the numbers are organized. R^{2 x 3} means something completely different from what R^{3 x 2} means. And swapping which positions the numbers in the matrix occupy changes what matrix we have, as you might expect.

You can add together matrices, exactly as you can add together vectors. The same rules even apply. You can only add together two matrices of the same size. They have to have the same number of rows and the same number of columns. You add them by adding together the numbers in the corresponding slots. It’s exactly what you would do if you went in without preconceptions.

You can also multiply a matrix by a single number. We called this scalar multiplication back when we were working with vectors. With matrices, we call this scalar multiplication too. If it strikes you that we could see vectors as a kind of matrix, yes, we can. Sometimes that’s wise. We can see a vector as a matrix in the set R^{1 x n} or as one in the set R^{n x 1}, depending on just what we mean to do.

It’s trickier to multiply two matrices together. As with vectors, multiplying the numbers in corresponding positions together doesn’t give us anything useful. What we do instead is a time-consuming but not actually hard process. According to its rules, we can multiply something in R^{m x n} by something in R^{n x k}. “k” is another whole number. The second thing has to have exactly as many rows as the first thing has columns. What we get is a matrix in R^{m x k}.

I grant you maybe didn’t see that coming. Also a potential complication: if you can multiply something in R^{m x n} by something in R^{n x k}, can you multiply the thing in R^{n x k} by the thing in R^{m x n}? … No, not unless k and m are the same number. Even if they are, you can’t count on getting the same product. Matrices are weird things this way. They’re also gateways to weirder things. But it is a productive weirdness, and I’ll explain why in a few paragraphs.
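Here’s a quick sketch of both points, with numpy and numbers I made up for the purpose: the shapes have to line up, and even when both orders make sense they need not give the same product.

```python
# A hedged sketch of the shape rule: something in R^{m x n} times something
# in R^{n x k} gives something in R^{m x k}, and reversing the order may
# not even be defined. The particular numbers are invented for illustration.
import numpy as np

A = np.ones((4, 3))      # in R^{4 x 3}
B = np.ones((3, 2))      # in R^{3 x 2}

print((A @ B).shape)     # (4, 2): columns of A match rows of B
# B @ A would raise an error: B has 2 columns but A has 4 rows.

# Even with square matrices, the two orders need not agree:
C = np.array([[0.0, 1.0],
              [0.0, 0.0]])
D = np.array([[0.0, 0.0],
              [1.0, 0.0]])
print(C @ D)             # [[1, 0], [0, 0]]
print(D @ C)             # [[0, 0], [0, 1]]
```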

A matrix is a way of organizing terms. Those terms can be anything. Real matrices are surely the most common kind of matrix, at least in mathematical usage. Next in common use would be complex-valued matrices, much like how we get complex-valued vectors. These are written C^{m x n}. A complex-valued matrix is different from a real-valued matrix. The terms inside the matrix can be complex-valued numbers, instead of real-valued numbers. Again, sometimes, these mathematical terms aren’t so tricky.

I’ve heard occasionally of people organizing matrices of other sets. The notation is similar. If you’re building a matrix of “m” rows and “n” columns out of the things you find inside a set we’ll call H, then you write that as H^{m x n}. I’m not saying you should do this, just that if you need to, that’s how to tell people what you’re doing.

Now. We don’t really have a lot of functions that use matrices as domains, and I can think of fewer that use matrices as ranges. There are a couple of valuable ones, ones so valuable they get special names like “eigenvalue” and “eigenvector”. (Don’t worry about what those are.) They take in R^{m x n} or C^{m x n} and return a set of real- or complex-valued numbers, or real- or complex-valued vectors. Not even those, actually. Eigenvalues and eigenvectors are only meaningful if there are exactly as many rows as columns. That is, for R^{m x m} and C^{m x m}. These are known as “square” matrices, just as you might guess if you were shaken awake and ordered to say what you guessed a “square matrix” might be.
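If you’d like to see eigenvalues computed without yet worrying what they are, here’s a minimal sketch of my own; the matrix is invented for the example, and the library insists, as it should, that the matrix be square.

```python
# A minimal sketch (not from the essay): eigenvalues and eigenvectors of a
# square matrix. The 2-by-2 matrix here is made up for illustration.
import numpy as np

M = np.array([[2.0, 1.0],
              [0.0, 3.0]])            # a square, real-valued matrix

values, vectors = np.linalg.eig(M)
print(values)     # the eigenvalues, 2 and 3 (in some order)
print(vectors)    # columns are the matching eigenvectors

# np.linalg.eig(np.ones((2, 3))) raises an error: a 2-by-3 matrix isn't square.
```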

They’re important functions. There are some other important functions, with names like “rank” and “condition number” and the like. But they’re not many. I believe they’re not even thought of as functions, any more than we think of “the length of a vector” as primarily a function. They’re just properties of these matrices, that’s all.

So why are they worth knowing? Besides the joy that comes of knowing something, I mean?

Here’s one answer, and the one that I find most compelling. There is cultural bias in this: I come from an applications-heavy mathematical heritage. We like differential equations, which study how stuff changes in time and in space. It’s very easy to go from differential equations to ordered sets of equations. The first equation may describe how the position of particle 1 changes in time. It might describe how the velocity of the fluid moving past point 1 changes in time. It might describe how the temperature measured by sensor 1 changes as it moves. It doesn’t matter. We get a set of these equations together and we have a majestic set of differential equations.

Now, the dirty little secret of differential equations: we can’t solve them. Most interesting physical phenomena are nonlinear. Linear stuff is easy. Small change 1 has effect A; small change 2 has effect B. If we make small change 1 and small change 2 together, this has effect A plus B. Nonlinear stuff, though … it just doesn’t work. Small change 1 has effect A; small change 2 has effect B. Small change 1 and small change 2 together have effect … A plus B plus some weird A times B thing plus some effect C that nobody saw coming and then C does something with A and B and now maybe we’d best hide.

There are some nonlinear differential equations we can solve. Those are the result of heroic work and brilliant insights. Compared to all the things we would like to solve, there are not many of them. Methods to solve nonlinear differential equations are as precious as ways to slay krakens.

But here’s what we can do. What we usually like to know about in systems are equilibriums. Those are the conditions in which the system stops changing. Those are interesting. We can usually find those points by boring but not conceptually challenging calculations. If we can’t, we can declare that x_0 represents the equilibrium. If we still care, we leave calculating its actual values to the interested reader or hungry grad student.

But what’s really interesting is: what happens if we’re near but not exactly at the equilibrium? Sometimes, we stay near it. Think of pushing a swing. However good a push you give, it’s going to settle back to the boring old equilibrium of dangling straight down. Sometimes, we go racing away from it. Think of trying to balance a pencil on its tip; if we did this perfectly it would stay balanced. It never does. We’re never perfect, or there’s some wind or somebody walks by and the perfect balance is foiled. It falls down and doesn’t bounce back up. Sometimes, whether it stays near or goes away depends on which way it’s away from the equilibrium.

And now we finally get back to matrices. Suppose we are starting out near an equilibrium. We can, usually, approximate the differential equations that describe what will happen. The approximation may only be good if we’re just a tiny bit away from the equilibrium, but that might be all we really want to know. That approximation will be some linear differential equations. (If they’re not, then we’re just wasting our time.) And that system of linear differential equations we can describe using matrices.

If we can write what we are interested in as a set of linear differential equations, then we have won. We can use the many powerful tools of matrix arithmetic — linear algebra, specifically — to tell us everything we want to know about the system. We can say whether a small push away from the equilibrium stays small, or whether it grows, or whether it depends. We can say how fast the small push shrinks, or grows (for a while). We can say how the system will change, approximately.
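Here’s a sketch of that payoff with a small linear system I made up (it is not one from the essay): once the behavior near the equilibrium is written as x' = Ax, the eigenvalues of A settle whether a small push shrinks back or grows.

```python
# A sketch under my own assumptions (the system is invented, not the essay's):
# near an equilibrium the dynamics are approximately x' = A x, and the signs
# of the real parts of A's eigenvalues say whether a small push dies out.
import numpy as np

A = np.array([[ 0.0,  1.0],
              [-1.0, -0.5]])           # a damped-oscillator-like linear system

eigenvalues = np.linalg.eigvals(A)
print(eigenvalues)                     # a complex pair with real part -0.25
print(np.all(eigenvalues.real < 0))    # True: small pushes shrink back to equilibrium
```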

This is what I love in matrices. It’s not everything there is to them. But it’s enough to make matrices important to me.

Combining Matrices And Model Universes


I would like to resume talking about matrices and really old universes and the way nucleosynthesis in these model universes causes atoms to keep settling down to a peculiar but unchanging distribution.

I’d already described how a matrix offers a nice way to organize elements, and in ways that encode information about the context of the elements by where they’re placed. That’s useful and saves some writing, certainly, although by itself it’s not that interesting. Matrices start to get really powerful when, first, the elements being stored are things on which you can do something like arithmetic with pairs of them. Here I mostly just mean that you can add together two elements, or multiply them, and get back something meaningful.

This typically means that the matrix is made up of a grid of numbers, although that isn’t actually required, just really common if we’re trying to do mathematics.

Then you get the ability to add together and multiply together the matrices themselves, turning pairs of matrices into some new matrix, and building something that works a lot like arithmetic on these matrices.

Adding one matrix to another is done in almost the obvious way: add the element in the first row, first column of the first matrix to the element in the first row, first column of the second matrix; that’s the first row, first column of your new matrix. Then add the element in the first row, second column of the first matrix to the element in the first row, second column of the second matrix; that’s the first row, second column of the new matrix. Add the element in the second row, first column of the first matrix to the element in the second row, first column of the second matrix, and put that in the second row, first column of the new matrix. And so on.

This means you can only add together two matrices that are the same size — the same number of rows and of columns — but that doesn’t seem unreasonable.

You can also do something called scalar multiplication of a matrix, in which you multiply every element in the matrix by the same number. A scalar is just a number that isn’t part of a matrix. This multiplication is useful, not least because it lets us talk about how to subtract one matrix from another: to find the difference of the first matrix and the second, scalar-multiply the second matrix by -1, and then add the first to that product. But you can do scalar multiplication by any number, by two or minus pi or by zero if you feel like it.

I should say something about notation. When we want to write out these kinds of operations efficiently, of course, we turn to symbols to represent the matrices. We can, in principle, use any symbols, but by convention a matrix usually gets represented with a capital letter, A or B or M or P or the like. So to add matrix A to matrix B, with the result being matrix C, we can write out the equation “A + B = C”, which is about as simple as we could hope to see. Scalars are normally written in lowercase letters, often Greek letters, if we don’t know what the number is, so that the scalar multiplication of the number r and the matrix A would be the product “rA”, and we could write the difference between matrix A and matrix B as “A + (-1)B” or “A – B”.
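A short sketch of that arithmetic, with made-up numbers: matrices add slot by slot, a scalar multiplies every slot, and A - B comes out the same as A + (-1)B.

```python
# A small sketch of the arithmetic just described, with invented numbers.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[10.0, 20.0],
              [30.0, 40.0]])
r = 0.5

print(A + B)                                   # slot-by-slot sums
print(r * A)                                   # scalar multiplication: every slot times 0.5
print(np.array_equal(A - B, A + (-1.0) * B))   # True: subtraction is adding (-1) times B
```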

Matrix multiplication, now, that is done by a process that sounds like doubletalk, and it takes a while of practice to do it right. But there are good reasons for doing it that way and we’ll get to one of those reasons by the end of this essay.

To multiply matrix A and matrix B together, we do multiply various pairs of elements from both matrix A and matrix B. The surprising thing is that we also add together sets of these products, per this rule.

Take the element in the first row, first column of A, and multiply it by the element in the first row, first column of B. Add to that the product of the element in the first row, second column of A and the second row, first column of B. Add to that total the product of the element in the first row, third column of A and the third row, first column of B, and so on. When you’ve run out of columns of A and rows of B, this total is the first row, first column of the product of the matrices A and B.

Plenty of work. But we have more to do. Take the product of the element in the first row, first column of A and the element in the first row, second column of B. Add to that the product of the element in the first row, second column of A and the element in the second row, second column of B. Add to that the product of the element in the first row, third column of A and the element in the third row, second column of B. And keep adding those up until you’re out of columns of A and rows of B. This total is the first row, second column of the product of matrices A and B.

This does mean that you can multiply matrices of different sizes, provided the first one has as many columns as the second has rows. And the product may be a completely different size from the first or second matrices. It also means it might be possible to multiply matrices in one order but not the other: if matrix A has four rows and three columns, and matrix B has three rows and two columns, then you can multiply A by B, but not B by A.
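If it helps to see the bookkeeping spelled out, here is the process written as a plain triple loop, my own sketch with invented numbers rather than anything from the essay.

```python
# A sketch of the multiplication process spelled out above, written as the
# plain triple loop rather than a library's built-in routine, so the
# row-times-column bookkeeping is visible.
def matrix_multiply(A, B):
    """Multiply A (m rows, n columns) by B (n rows, k columns)."""
    m, n, k = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "columns of A must match rows of B"
    C = [[0.0] * k for _ in range(m)]
    for i in range(m):             # row of A, row of the product
        for j in range(k):         # column of B, column of the product
            for t in range(n):     # walk along A's row and down B's column
                C[i][j] += A[i][t] * B[t][j]
    return C

# A 4-by-3 matrix times a 3-by-2 matrix gives a 4-by-2 matrix.
A = [[1, 0, 2], [0, 1, 0], [3, 0, 0], [0, 0, 1]]
B = [[1, 2], [3, 4], [5, 6]]
print(matrix_multiply(A, B))   # [[11.0, 14.0], [3.0, 4.0], [3.0, 6.0], [5.0, 6.0]]
```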

My recollection on learning this process was that this was crazy, and the workload ridiculous, and I imagine people who get this in Algebra II, and don’t go on to using mathematics later on, remember the process as nothing more than an unpleasant blur of doing a lot of multiplying and addition for some reason or other.

So here is one of the reasons why we do it this way. Let me define two matrices:

A = \left(\begin{array}{ccc}  3/4 & 0 & 2/5 \\  1/4 & 3/5 & 2/5 \\  0 & 2/5 & 1/5  \end{array}\right)

B = \left(\begin{array}{c} 100 \\ 0 \\ 0 \end{array}\right)

Then matrix A times B is

AB = \left(\begin{array}{c}  3/4 \cdot 100 + 0 \cdot 0 + 2/5 \cdot 0 \\  1/4 \cdot 100 + 3/5 \cdot 0 + 2/5 \cdot 0 \\  0 \cdot 100 + 2/5 \cdot 0 + 1/5 \cdot 0  \end{array}\right) = \left(\begin{array}{c}  75 \\  25 \\  0  \end{array}\right)

You’ve seen those numbers before, of course: the matrix A contains the probabilities I put in my first model universe to describe the chances that over the course of a billion years a hydrogen atom would stay hydrogen, or become iron, or become uranium, and so on. The matrix B contains the original distribution of atoms in the toy universe, 100 percent hydrogen and nothing else. And the product of A and B was exactly the distribution after that first billion years: 75 percent hydrogen, 25 percent iron, and no uranium.

If we multiply the matrix A by that product again — well, you should expect we’re going to get the distribution of elements after two billion years, that is, 56.25 percent hydrogen, 33.75 percent iron, 10 percent uranium, but let me write it out anyway to show:

\left(\begin{array}{ccc}  3/4 & 0 & 2/5 \\  1/4 & 3/5 & 2/5 \\  0 & 2/5 & 1/5  \end{array}\right)\left(\begin{array}{c}  75 \\ 25 \\ 0  \end{array}\right) = \left(\begin{array}{c}  3/4 \cdot 75 + 0 \cdot 25 + 2/5 \cdot 0 \\  1/4 \cdot 75 + 3/5 \cdot 25 + 2/5 \cdot 0 \\  0 \cdot 75 + 2/5 \cdot 25 + 1/5 \cdot 0  \end{array}\right) = \left(\begin{array}{c}  56.25 \\  33.75 \\  10  \end{array}\right)

And if you don’t know just what would happen if we multiplied A by that product, you aren’t paying attention.

This also gives a reason why matrix multiplication is defined this way. The operation captures neatly the operation of making a new thing — in the toy universe case, hydrogen or iron or uranium — out of some combination of fractions of an old thing — again, the former distribution of hydrogen and iron and uranium.

Or here’s another reason. Since this matrix A has three rows and three columns, you can multiply it by itself and get a matrix of three rows and three columns out of it. That matrix — which we can write as A^2 — then describes how two billion years of nucleosynthesis would change the distribution of elements in the toy universe. A times A times A would give three billion years of nucleosynthesis; A^{10} ten billion years. The actual calculating of the numbers in these matrices may be tedious, but it describes a complicated operation very efficiently, which we always want to do.
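Since all the numbers here come from the toy universe above, the calculations are easy to check by machine; here’s a short numpy rendering of the same arithmetic.

```python
# The toy-universe calculation above, redone with numpy so the repeated
# multiplication is easy to check. The matrix and starting distribution are
# the ones in the essay; the code just redoes the arithmetic.
import numpy as np

A = np.array([[3/4, 0,   2/5],
              [1/4, 3/5, 2/5],
              [0,   2/5, 1/5]])     # billion-year transition probabilities
b = np.array([100.0, 0.0, 0.0])     # start: all hydrogen

print(A @ b)                        # [75.   25.    0.  ]  after one billion years
print(A @ (A @ b))                  # [56.25 33.75 10.  ]  after two billion years

# A^2 applied once gives the same two-billion-year answer,
# and A^10 jumps straight to ten billion years.
A2 = np.linalg.matrix_power(A, 2)
print(A2 @ b)                       # [56.25 33.75 10.  ]
print(np.linalg.matrix_power(A, 10) @ b)
```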

I should mention another bit of notation. We usually use capital letters to represent matrices; but, a matrix that’s just got one column is also called a vector. That’s often written with a lowercase letter, with a little arrow above the letter, as in \vec{x} , or in bold typeface, as in x. (The arrows are easier to put in writing, the bold easier when you were typing on typewriters.) But if you’re doing a lot of writing this out, and know that (say) x isn’t being used for anything but vectors, then even that arrow or boldface will be forgotten. Then we’d write the product of matrix A and vector x as just Ax.  (There are also cases where you put a little caret over the letter; that’s to denote that it’s a vector that’s one unit of length long.)

When you start writing vectors without an arrow or boldface you start to run the risk of confusing what symbols mean scalars and what ones mean vectors. That’s one of the reasons that Greek letters are popular for scalars. It’s also common to put scalars to the left and vectors to the right. So if one saw “rMx”, it would be expected that r is a scalar, M a matrix, and x a vector, and if they’re not then this should be explained in text nearby, preferably before the equations. (And of course if it’s work you’re doing, you should know going in what you mean the letters to represent.)
