## The Summer 2017 Mathematics A To Z: Jordan Canonical Form

I made a mistake! I thought we had got to the end of the block of A To Z topics suggested by Gaurish, of the For The Love Of Mathematics blog. Not so and, indeed, I wonder if it wouldn’t be a viable writing strategy around here for me to just ask Gaurish to throw out topics and I have two weeks to write about them. I don’t think there’s a single unpromising one in the set.

# Jordan Canonical Form.

Before you ask, yes, this is named for the Camille Jordan.

So this is a thing from algebra. Particularly, linear algebra. And more particularly, matrices. Matrices are so much of linear algebra that you could be forgiven thinking they’re all of linear algebra. The thing is, matrices are a really good way of describing linear transformations. That is, where you take a block of space and stretch it out, or squash it down, or rotate it, or do some combination of these things. And stretching and squashing and rotating is a lot of what you’d ever want to do. Refer to any book on how to draw animated cartoons. The only thing matrices can’t do is have their eyes bug out huge when an attractive region of space walks past.

Thing about a matrix is if you want to do something with it, you’re going to write it as a grid of numbers. It doesn’t have to be a grid of numbers. But about all the matrices anyone does anything with are grids of numbers. And that’s fine. They do an incredible lot of stuff. What’s not fine is that on looking at a huge block of numbers, the mind sees: huh. That’s a big block of numbers. Good luck finding what’s meaningful in them. To help find meaning we have a set of standard forms. We call them “canonical” or “normal” or some other approving term. They rearrange and change the terms in the matrix so that more interesting stuff is more obvious.

Now you’re justified asking: how can we rearrange and change the terms in a matrix without changing what the matrix is? We can get away with doing this because we can show some rearrangements don’t change what we’re interested in. That covers the “how dare we” part of “how”. We do it by using matrix multiplication. You might remember from high school algebra that matrix multiplication is this agonizing process of multiplying every pair of numbers that ever existed together, then adding them all up, and then maybe you multiply something by minus one because you’re thinking of determinants, and it all comes out wrong anyway and you have to do it over? Yeah. Well, matrix multiplication is defined hard because it makes stuff like this work out. So that covers the “by what technique” part of “how”. We start out with some matrix, let me imaginatively name it $A$. And then we find some transformation matrix for which, eh, let’s say $P$ is a good enough name. I’ll say why in a moment. Then we use that matrix and its multiplicative inverse $P^{-1}$. And we evaluate the product $P^{-1} A P$. This won’t just be the same old matrix we started with. Not usually. Promise. But what this will be, if we chose our matrix $P$ correctly, is some new matrix that’s easier to read.
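
If you'd like to watch this $P^{-1} A P$ business happen, here's a minimal sketch in Python. The particular matrices are made up for the example; any invertible $P$ would do the job.

```python
import numpy as np

# A minimal sketch of the change-of-coordinates product P^{-1} A P.
# The particular matrices are made up; any invertible P works.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
P = np.array([[1.0, 1.0],
              [1.0, -2.0]])

B = np.linalg.inv(P) @ A @ P   # same transformation, written in new coordinates

print(B)                       # a different-looking grid of numbers ...
print(np.linalg.eigvals(A))    # ... but the eigenvalues of A ...
print(np.linalg.eigvals(B))    # ... match the eigenvalues of B, up to rounding
```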

The matrices involved here have to follow some rules. Most important, they’re all going to be square matrices. There’ll be more rules that your linear algebra textbook will tell you. Or your instructor will, after checking the textbook.

So what makes a matrix easy to read? Zeroes. Lots and lots of zeroes. When we have a standardized form of a matrix it’s nearly all zeroes. This is for a good reason: zeroes are easy to multiply stuff by. And they’re easy to add stuff to. And almost everything we do with matrices, as a calculation, is a lot of multiplication and addition of the numbers in the matrix.

What also makes a matrix easy to read? Everything important being on the diagonal. The diagonal is one of the two things you would imagine if you were told “here’s a grid of numbers, pick out the diagonal”. In particular it’s the one that goes from the upper left to the bottom right, that is, row one column one, and row two column two, and row three column three, and so on up to row 86 column 86 (or whatever). If everything is on the diagonal the matrix is incredibly easy to work with. If it can’t all be on the diagonal at least everything should be close to it. As close as possible.

In the Jordan Canonical Form not everything is on the diagonal. I mean, it can be, but you shouldn’t count on that. But everything either will be on the diagonal or else it’ll be one row up from the diagonal. That is, row one column two, row two column three, row 85 column 86. Like that. There are a few other important pieces.

First is that the thing in the row above the diagonal will be either 1 or 0. Second is that on the diagonal you’ll have a sequence of all the same number. Like, you’ll get four instances of the number ‘2’ along one stretch of the diagonal. Third is that you’ll get a 1 above all but the first instance of this particular number. Fourth is that you’ll get a 0 in the row above the first instance of this number.

Yeah, that’s fussy to visualize. This is one of those things easiest to show in a picture. A Jordan canonical form is a matrix that looks like this:

 2  1  0  0  0  0  0  0  0  0  0  0
 0  2  1  0  0  0  0  0  0  0  0  0
 0  0  2  1  0  0  0  0  0  0  0  0
 0  0  0  2  0  0  0  0  0  0  0  0
 0  0  0  0  3  1  0  0  0  0  0  0
 0  0  0  0  0  3  0  0  0  0  0  0
 0  0  0  0  0  0  4  1  0  0  0  0
 0  0  0  0  0  0  0  4  1  0  0  0
 0  0  0  0  0  0  0  0  4  0  0  0
 0  0  0  0  0  0  0  0  0 -1  0  0
 0  0  0  0  0  0  0  0  0  0 -2  1
 0  0  0  0  0  0  0  0  0  0  0 -2

This may have you dazzled. It dazzles mathematicians too. When we have to write a matrix that’s almost all zeroes like this we drop nearly all the zeroes. If we have to write anything we just write a really huge 0 in the upper-right and the lower-left corners.

What makes this the Jordan Canonical Form is that the matrix looks like it’s put together from what we call Jordan Blocks. Look around the diagonals. Here’s the first Jordan Block:

 2 1 0 0
 0 2 1 0
 0 0 2 1
 0 0 0 2

Here’s the second:

 3 1
 0 3

Here’s the third:

 4 1 0
 0 4 1
 0 0 4

Here’s the fourth:

 -1

And here’s the fifth:

 -2  1
  0 -2

And we can represent the whole matrix as this might-as-well-be-diagonal thing:

 First Block  0             0            0             0
 0            Second Block  0            0             0
 0            0             Third Block  0             0
 0            0             0            Fourth Block  0
 0            0             0            0             Fifth Block

These blocks can be as small as a single number. They can be as big as however many rows and columns you like. Each individual block is some repeated number on the diagonal, and a repeated one in the row above the diagonal. You can call this the “superdiagonal”.

(Mathworld, and Wikipedia, assert that sometimes the row below the diagonal — the “subdiagonal” — gets the 1’s instead of the superdiagonal. That’s fine if you like it that way, and it won’t change any of the real work. I have not seen these subdiagonal 1’s in the wild. But I admit I don’t do a lot of this field and maybe there’s times it’s more convenient.)
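
If you want to poke at one of these yourself, here's a sketch of how the example matrix above could be assembled in Python. The `jordan_block` helper is just a name I made up for the occasion; `block_diag` is SciPy's function for stacking blocks along the diagonal.

```python
import numpy as np
from scipy.linalg import block_diag

def jordan_block(eigenvalue, size):
    """One Jordan block: the eigenvalue down the diagonal, 1's on the superdiagonal."""
    return eigenvalue * np.eye(size, dtype=int) + np.eye(size, k=1, dtype=int)

# The example matrix above, assembled from its five blocks.
J = block_diag(jordan_block(2, 4),
               jordan_block(3, 2),
               jordan_block(4, 3),
               jordan_block(-1, 1),
               jordan_block(-2, 2))
print(J)
```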

Using the Jordan Canonical Form for a matrix is a lot like putting an object in a standard reference pose for photographing. This is a good metaphor. We get a Jordan Canonical Form by matrix multiplication, which works like rotating and scaling volumes of space. You can view the Jordan Canonical Form for a matrix as how you represent the original matrix from a new viewing angle that makes it easy to recognize. And this is why $P$ is not a bad name for the matrix that does this work. We can see all this as “projecting” the matrix we started with into a new frame of reference. The new frame is maybe rotated and stretched and squashed and whatnot, compared to how we started. But it’s as valid a basis. Projecting a mathematical object from one frame of reference to another usually involves calculating something that looks like $P^{-1} A P$ so, projection. That’s our name.

Mathematicians will speak of “the” Jordan Canonical Form for a matrix as if there were such a thing. I don’t mean that Jordan Canonical Forms don’t exist. They exist just as much as matrices do. It’s the “the” that misleads. You can put the Jordan Blocks in any order and have as valid, and as useful, a Jordan Canonical Form. But it’s easy to swap the orders of these blocks around — it’s another matrix multiplication, and a blessedly easy one — so it doesn’t matter which form you have. Get any one and you have them all.

I haven’t said anything about what these numbers on the diagonal are. They’re the eigenvalues of the original matrix. I hope that clears things up.

Yeah, not to anyone who didn’t know what a Jordan Canonical Form was to start with. Rather than get into calculations let me go to well-established metaphor. Take a sample of an unknown chemical and set it on fire. Put the light from this through a prism and photograph the spectrum. There will be lines, interruptions in the progress of colors. The locations of those lines and how intense they are tell you what the chemical is made of, and in what proportions. These are much like the eigenvectors and eigenvalues of a matrix. The eigenvectors tell you what the matrix is made of, and the eigenvalues how much of the matrix is those. This stuff gets you very far in proving a lot of great stuff. And part of what makes the Jordan Canonical Form great is that you get the eigenvalues right there in neat order, right where anyone can see them.

So! All that’s left is finding the things. The best way to find the Jordan Canonical Form for a given matrix is to become an instructor for a class on linear algebra and assign it as homework. The second-best way is to give the problem to your TA, who will type it into Mathematica and return the result. It’s too much work to do most of the time. Almost all the stuff you could learn from having the thing in the Jordan Canonical Form you work out in the process of finding the matrix $P$ that would let you calculate what the Jordan Canonical Form is. And once you had that, why go on?
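
If you would rather let the computer be that overworked TA, SymPy will do the finding. Here's a minimal sketch, with a small made-up matrix that happens not to be diagonalizable:

```python
import sympy as sp

# A minimal sketch: asking SymPy for a Jordan Canonical Form.
# The matrix is made up; it is not diagonalizable, so a 1 shows up above the diagonal.
M = sp.Matrix([[5, 1],
               [-1, 3]])

P, J = M.jordan_form()   # M equals P * J * P**-1
sp.pprint(J)             # the eigenvalue 4 twice on the diagonal, with a 1 above it
```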

Where the Jordan Canonical Form shines is in doing proofs about what matrices can do. We can always put a square matrix into a Jordan Canonical Form. So if we want to show something is true about matrices in general, we can show that it’s true for the simpler-to-work-with Jordan Canonical Form. Then show that shifting a matrix to or from the Jordan Canonical Form doesn’t change whether the thing we’re interested in is true. It exists in that strange space: it is quite useful, but never on a specific problem.

Oh, all right. Yes, it’s the same Camille Jordan of the Jordan Curve and also of the Jordan Curve Theorem. That fellow.

## Theorem Thursday: What Is Cramer’s Rule?

KnotTheorist asked for this one during my appeal for theorems to discuss. And I’m taking an open interpretation of what a “theorem” is. I can do a rule.

# Cramer’s Rule

I first learned of Cramer’s Rule in the way I expect most people do. It was an algebra course. I mean high school algebra. By high school algebra I mean you spend roughly eight hundred years learning ways to solve for x or to plot y versus x. Then take a pause for polar coordinates and matrices. Then you go back to finding both x and y.

Cramer’s Rule came up in the context of solving simultaneous equations. You have more than one variable. So x and y. Maybe z. Maybe even a w, before whoever set up the problem gives up and renames everything $x_1$ and $x_2$ and $x_{62}$ and all that. You also have more than one equation. In fact, you have exactly as many equations as you have variables. Are there any sets of values those variables can have which make all those equations true simultaneously? Thus the imaginative name “simultaneous equations” or the search for “simultaneous solutions”.

If all the equations are linear then we can always say whether there’s simultaneous solutions. By “linear” we mean what we always mean in mathematics, which is, “something we can handle”. But more exactly it means the equations have x and y and whatever other variables only to the first power. No x-squared or square roots of y or tangents of z or anything. (The equations are also allowed to omit a variable. That is, if you have one equation with x, y, and z, and another with just x and z, and another with just y and z, that’s fine. We pretend the missing variable is there and just multiplied by zero, and proceed as before.) One way to find these solutions is with Cramer’s Rule.

Cramer’s Rule sets up some matrices based on the system of equations. If the system has two equations, it sets up three matrices. If the system has three equations, it sets up four matrices. If the system has twelve equations, it sets up thirteen matrices. You see the pattern here. And then you can take the determinant of each of these matrices. Dividing the determinant of one of these matrices by another one tells you what value of x makes all the equations true. Dividing the determinant of another matrix by the determinant of one of these matrices tells you which values of y makes all the equations true. And so on. The Rule tells you which determinants to use. It also says what it means if the determinant you want to divide by equals zero. It means there’s either no set of simultaneous solutions or there’s infinitely many solutions.
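
Here's a rough sketch of the Rule in Python, for a made-up system of three equations in three unknowns. The determinants come from `np.linalg.det`; the real content of the Rule is the column-swapping.

```python
import numpy as np

# A rough sketch of Cramer's Rule for a made-up system:
#   2x +  y -  z =  3
#    x - 2y + 4z = -2
#   3x +  y + 2z =  7
A = np.array([[2.0,  1.0, -1.0],
              [1.0, -2.0,  4.0],
              [3.0,  1.0,  2.0]])
d = np.array([3.0, -2.0, 7.0])

det_A = np.linalg.det(A)       # if this were zero, no unique solution
solution = []
for i in range(3):
    A_i = A.copy()
    A_i[:, i] = d              # replace column i with the right-hand side
    solution.append(np.linalg.det(A_i) / det_A)

print(solution)                # x, y, z by Cramer's Rule
print(np.linalg.solve(A, d))   # the same answer, found the sensible way
```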

This gets dropped on us students in the vain effort to convince us knowing how to calculate determinants is worth it. It’s not that determinants aren’t worth knowing. It’s just that they don’t seem to tell us anything we care about. Not until we get into mappings and calculus and differential equations and other mathematics-major stuff. We never see it in high school.

And the hard part of determinants is that for all the cool stuff they tell us, they take forever to calculate. The determinant for a matrix with two rows and two columns isn’t bad. Three rows and three columns is getting bad. Four rows and four columns is awful. The determinant for a matrix with five rows and five columns you only ever calculate if you’ve made your teacher extremely cross with you.

So there’s the genius and the first problem with Cramer’s Rule. It takes a lot of calculating. Make any errors along the way with the calculation and your work is wrong. And worse, it won’t be wrong in an obvious way. You can find the error only by going over every single step and hoping to catch the spot where you, somehow, got 36 times -7 minus 21 times -8 wrong.

The second problem is nobody in high school algebra mentions why systems of linear equations should be interesting to solve. Oh, maybe they’ll explain how this is the work you do to figure out where two straight lines intersect. But that just shifts the “and we care because … ?” problem back one step. Later on we might come to understand the lines represent cases where something we’re interested in is true, or where it changes from true to false.

This sort of simultaneous-solution problem turns up naturally in optimization problems. These are problems where you try to find a maximum subject to some constraints. Or find a minimum. Maximums and minimums are the same thing when you think about them long enough. If all the constraints can be satisfied at once and you get a maximum (or minimum, whatever), great! If they can’t … Well, you can study how close it’s possible to get, and what happens if you loosen one or more constraints. That’s worth knowing about.

The third problem with Cramer’s Rule is that, as a method, it kind of sucks. We can be convinced that simultaneous linear equations are worth solving, or at least that we have to solve them to get out of High School Algebra. And we have computers. They can grind away and work out thirteen determinants of twelve-row-by-twelve-column matrices. They might even get an answer back before the end of the term. (The amount of work needed for a determinant grows scary fast as the matrix gets bigger.) But all that work might be meaningless.

The trouble is that Cramer’s Rule is numerically unstable. Before I even explain what that is you already sense it’s a bad thing. Think of all the good things in your life you’ve heard described as unstable. Fair enough. But here’s what we mean by numerically unstable.

Is 1/3 equal to 0.3333333? No, and we know that. But is it close enough? Sure, most of the time. Suppose we need a third of sixty million. 0.3333333 times 60,000,000 equals 19,999,998. That’s a little off of the correct 20,000,000. But I bet you wouldn’t even notice the difference if nobody pointed it out to you. Even if you did notice it you might write off the difference. “If we must, make up the difference out of petty cash”, you might declare, as if that were quite sensible in the context.

And that’s so because this multiplication is numerically stable. Make a small error in either term and you get a proportional error in the result. A small mistake will — well, maybe it won’t stay small, necessarily. But it’ll not grow too fast too quickly.

So now you know intuitively what an unstable calculation is. This is one in which a small error doesn’t necessarily stay proportionally small. It might grow huge, arbitrarily huge, and in few calculations. So your answer might be computed just fine, but actually be meaningless.

This isn’t because of a flaw in the computer per se. That is, it’s working as designed. It’s just that we might need, effectively, infinitely many digits of precision for the result to be correct. You see where there may be problems achieving that.
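
Here's a tiny illustration of that limited precision, nothing to do with Cramer's Rule in particular. An ordinary double-precision number carries roughly sixteen decimal digits, and anything smaller than that simply gets lost.

```python
# A tiny illustration of limited precision, not of Cramer's Rule itself.
# A double-precision float carries roughly sixteen decimal digits.
big = 1.0e16
print(big + 1 - big)      # 0.0: the added 1 is too small to register
print(0.1 + 0.2 == 0.3)   # False: neither side is exactly what it looks like
```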

Cramer’s Rule isn’t guaranteed to be nonsense, and that’s a relief. But it is vulnerable to this. You can set up problems that look harmless but which the computer can’t do. And that’s surely the worst of all worlds, since we wouldn’t bother calculating them numerically if it weren’t too hard to do by hand.

(Let me direct the reader who’s unintimidated by mathematical jargon, and who likes seeing a good Wikipedia Editors quarrel, to the Cramer’s Rule Talk Page. Specifically to the section “Cramer’s Rule is useless.”)

I don’t want to get too down on Cramer’s Rule. It’s not like the numerical instability hurts every problem you might use it on. And you can, at the cost of some more work, detect whether a particular set of equations will have instabilities. That requires a lot of calculation but if we have the computer to do the work fine. Let it. And a computer can limit its numerical instabilities if it can do symbolic manipulations. That is, if it can use the idea of “one-third” rather than 0.3333333. The software package Mathematica, for example, does symbolic manipulations very well. You can shed many numerical-instability problems, although you gain the problem of paying for a copy of Mathematica.
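
You don't need Mathematica to see the idea, though. Here's a minimal sketch using Python's standard `fractions` module as a stand-in for that symbolic, keep-it-as-one-third style of arithmetic.

```python
from fractions import Fraction

# The "one-third" point: a truncated decimal versus exact rational arithmetic.
approx = 0.3333333 * 60_000_000       # very close to 19,999,998, and a little off
exact = Fraction(1, 3) * 60_000_000   # exactly 20,000,000
print(approx)
print(exact)
```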

If you just care about, or just need, one of the variables then what the heck. Cramer’s Rule lets you solve for just one or just some of the variables. That seems like a niche application to me, but it is there.

And the Rule re-emerges in pure analysis, where numerical instability doesn’t matter. When we look to differential equations, for example, we often find solutions are combinations of several independent component functions. Bases, in fact. Testing whether we have found independent bases can be done through a thing called the Wronskian. That’s a way that Cramer’s Rule appears in differential equations.
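
Here's a minimal sketch of a Wronskian check in SymPy, using sine and cosine as the candidate functions. A Wronskian that isn't identically zero tells us the functions are independent.

```python
import sympy as sp

# A minimal sketch of a Wronskian check: sine and cosine as candidate functions.
x = sp.symbols('x')
f, g = sp.sin(x), sp.cos(x)

# The Wronskian is the determinant of the functions stacked over their derivatives.
W = sp.Matrix([[f, g],
               [sp.diff(f, x), sp.diff(g, x)]]).det()

print(sp.simplify(W))   # -1, never zero, so sine and cosine are independent
```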

Wikipedia also asserts the use of Cramer’s Rule in differential geometry. I believe that’s a true statement, and that it will be reflected in many mechanics problems. In these we can use our knowledge that, say, energy and angular momentum of a system are constant values to tell us something of how positions and velocities depend on each other. But I admit I’m not well-read in differential geometry. That’s something which has indeed caused me pain in my scholarly life. I don’t know whether differential geometers thank Cramer’s Rule for this insight or whether they’re just glad to have got all that out of the way. (See the above Wikipedia Editors quarrel.)

I admit for all this talk about Cramer’s Rule I haven’t said what it is. Not in enough detail to pass your high school algebra class. That’s all right. It’s easy to find. MathWorld has the rule in pretty simple form. Mathworld does forget to define what it means by the vector d. (It’s the vector with components d1, d2, et cetera.) But that’s enough technical detail. If you need to calculate something using it, you can probably look closer at the problem and see if you can do it another way instead. Or you’re in high school algebra and just have to slog through it. It’s all right. Eventually you can put x and y aside and do geometry.

## A Leap Day 2016 Mathematics A To Z: Polynomials

I have another request for today’s Leap Day Mathematics A To Z term. Gaurish asked for something exciting. This should be less challenging than Dedekind Domains. I hope.

## Polynomials.

Polynomials are everything. Everything in mathematics, anyway. If humans study it, it’s a polynomial. If we know anything about a mathematical construct, it’s because we ran across it while trying to understand polynomials.

I exaggerate. A tiny bit. Maybe by three percent. But polynomials are big.

They’re easy to recognize. We can get them in pre-algebra. We make them out of a set of numbers called coefficients and one or more variables. The coefficients are usually either real numbers or complex-valued numbers. The variables we usually allow to be either real or complex-valued numbers. We take each coefficient and multiply it by some power of each variable. And we add all that up. So, polynomials are things that look like these things:

$x^2 - 2x + 1$
$12 x^4 + 2\pi x^2 y^3 - 4x^3 y - \sqrt{6}$
$\ln(2) + \frac{1}{2}\left(x - 2\right) - \frac{1}{2 \cdot 2^2}\left(x - 2\right)^2 + \frac{1}{3 \cdot 2^3}\left(x - 2\right)^3 - \frac{1}{4 \cdot 2^4}\left(x - 2\right)^4 + \cdots$
$a_n x^n + a_{n - 1}x^{n - 1} + a_{n - 2}x^{n - 2} + \cdots + a_2 x^2 + a_1 x^1 + a_0$

The first polynomial maybe looks nice and comfortable. The second may look a little threatening, what with it having two variables and a square root in it, but it’s not too weird. The third is an infinitely long polynomial; you’re supposed to keep going on in that pattern, adding even more terms. The last is a generic representation of a polynomial. Each number $a_0$, $a_1$, $a_2$, et cetera is some coefficient that we in principle know. It’s a good way of representing a polynomial when we want to work with it but don’t want to tie ourselves down to a particular example. The highest power we raise a variable to we call the degree of the polynomial. A second-degree polynomial, for example, has an $x^2$ in it, but not an $x^3$ or $x^4$ or $x^{18}$ or anything like that. A third-degree polynomial has an $x^3$, but not x to any higher powers. Degree is a useful way of saying roughly how long a polynomial is, so it appears all over discussions of polynomials.

But why do we like polynomials? Why like them so much that MathWorld lists 1,163 pages that mention polynomials?

It’s because they’re great. They do everything we’d ever want to do and they’re great at it. We can add them together as easily as we add regular old numbers. We can subtract them as well. We can multiply and divide them. There’s even prime polynomials, just like there are prime numbers. They take longer to work out, but they’re not harder.

And they do great stuff in advanced mathematics too. In calculus we want to take derivatives of functions. Polynomials, we always can. We get another polynomial out of that. So we can keep taking derivatives, as many as we need. (We might need a lot of them.) We can integrate too. The integration produces another polynomial. So we can keep doing that as long as we need to. (We need to do this a lot, too.) This lets us solve so many problems in calculus, which is about how functions work. It also lets us solve so many problems in differential equations, which is about systems whose change depends on the current state of things.

That’s great for analyzing polynomials, but what about things that aren’t polynomials?

Well, if a function is continuous (on a closed, bounded interval, anyway), then it might as well be a polynomial. To be a little more exact, we can set a margin of error. And we can always find polynomials that are less than that margin of error away from the original function. The original function might be annoying to deal with. The polynomial that’s as close to it as we want, though, isn’t.
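
Here's a sketch of what that looks like in practice: fit a degree-7 polynomial to the cosine on a closed interval and see how big the worst disagreement gets. The degree and the interval are arbitrary choices for the example.

```python
import numpy as np

# A sketch of "close enough to a polynomial": fit a degree-7 polynomial to cos(x)
# on [0, 2*pi] and measure the worst disagreement on that interval.
xs = np.linspace(0.0, 2.0 * np.pi, 200)
coefficients = np.polyfit(xs, np.cos(xs), deg=7)   # least-squares polynomial fit
poly = np.poly1d(coefficients)

worst_error = np.max(np.abs(poly(xs) - np.cos(xs)))
print(worst_error)   # already small; raise the degree and it shrinks further
```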

Not every function is continuous. Most of them aren’t. But most of the functions we want to do work with are, or at least are continuous in stretches. Polynomials let us understand the functions that describe most real stuff.

Nice for mathematicians, all right, but how about for real uses? How about for calculations?

Oh, polynomials are just magnificent. You know why? Because you can evaluate any polynomial as soon as you can add and multiply. (Also subtract, but we think of that as addition.) Remember, $x^4$ just means “x times x times x times x”, four of those x’s in the product. All these polynomials are easy to evaluate.

Even better, we don’t have to evaluate them. We can automate away the evaluation. It’s easy to set a calculator doing this work, and it will do it without complaint and with few unforeseeable mistakes.

Now remember that thing where we can make a polynomial close enough to any continuous function? And we can always set a calculator to evaluate a polynomial? Guess what this means about continuous functions. We have a tool that lets us calculate stuff we would want to know. Things like arccosines and logarithms and Bessel functions and all that. And we get nice easy to understand numbers out of them. For example, that third polynomial I gave you above? That’s not just infinitely long. It’s also a polynomial that approximates the natural logarithm. Pick a positive number x that’s between 0 and 4 and put it in that polynomial. Calculate terms and add them up. You’ll get closer and closer to the natural logarithm of that number. You’ll get there faster if you pick a number near 2, but you’ll eventually get there for whatever number you pick. (Calculus will tell us why x has to be between 0 and 4. Don’t worry about it for now.)
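
Here's a minimal sketch of exactly that, summing the logarithm polynomial from above term by term. The number of terms is an arbitrary cutoff for the example.

```python
import math

# A minimal sketch: evaluating the natural-logarithm polynomial from above,
# term by term, for some x between 0 and 4.
def log_poly(x, terms=30):
    total = math.log(2)
    for n in range(1, terms + 1):
        total += (-1) ** (n + 1) * (x - 2) ** n / (n * 2 ** n)
    return total

print(log_poly(1.5))   # close to the real thing ...
print(math.log(1.5))   # ... and closer still with more terms
```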

So through polynomials we can understand functions, analytically and numerically.

And they keep revealing things to us. We discovered complex-valued numbers because we wanted to find roots, values of x that make a polynomial of x equal to zero. Some formulas worked well for third- and fourth-degree polynomials. (They look like the quadratic formula, which solves second-degree polynomials. The big difference is nobody remembers what they are without looking them up.) But the formulas sometimes called for things that looked like square roots of negative numbers. Absurd! But if you carried on as if these square roots of negative numbers meant something, you got meaningful answers. And correct answers.

We wanted formulas to solve fifth- and higher-degree polynomials exactly. We can do this with second and third and fourth-degree polynomials, after all. It turns out we can’t. Oh, we can solve some of them exactly. The attempt to understand why, though, helped us create and shape group theory, the study of things that look like but aren’t numbers.

Polynomials go on, sneaking into everything. We can look at a square matrix and discover its characteristic polynomial. This allows us to find beautifully-named things like eigenvalues and eigenvectors. These reveal secrets of the matrix’s structure. We can find polynomials in the formulas that describe how many ways to split up a group of things into a smaller number of sets. We can find polynomials that describe how networks of things are connected. We can find polynomials that describe how a knot is tied. We can even find polynomials that distinguish between a knot and the knot’s reflection in the mirror.
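
Here's a small sketch of that characteristic-polynomial connection, with a made-up two-by-two matrix. The roots of the characteristic polynomial are the eigenvalues.

```python
import numpy as np

# A small sketch: the characteristic polynomial of a made-up matrix,
# whose roots are the eigenvalues.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

coefficients = np.poly(A)       # highest power first: x^2 - 4x + 3
print(coefficients)
print(np.roots(coefficients))   # 3 and 1
print(np.linalg.eigvals(A))     # the same 3 and 1
```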

Polynomials are everything.

## A Leap Day 2016 Mathematics A To Z: Matrix

I get to start this week with another request. Today’s Leap Day Mathematics A To Z term is a famous one, and one that I remember terrifying me in the earliest days of high school. The request comes from Gaurish, chief author of the Gaurish4Math blog.

## Matrix.

Lewis Carroll didn’t like the matrix. Well, Charles Dodgson, anyway. And it isn’t that he disliked matrices particularly. He believed it was a bad use of a word. “Surely,” he wrote, “[ matrix ] means rather the mould, or form, into which algebraical quantities may be introduced, than an actual assemblage of such quantities”. He might have had etymology on his side. The word meant the place where something was developed, the source of something else. History has outvoted him, and his preferred “block”. The first mathematicians to use the word “matrix” were interested in things derived from the matrix. So for them, the matrix was the source of something else.

What we mean by a matrix is a collection of some number of rows and columns. Inside each individual row and column is some mathematical entity. We call this an element. Elements are almost always real numbers. When they’re not real numbers they’re complex-valued numbers. (I’m sure somebody, somewhere has created matrices with something else as elements. You’ll never see these freaks.)

Matrices work a lot like vectors do. We can add them together. We can multiply them by real- or complex-valued numbers, called scalars. But we can do other things with them. We can define multiplication, at least sometimes. The definition looks like a lot of work, but it represents something useful that way. And for square matrices, ones with equal numbers of rows and columns, we can find other useful stuff. We give that stuff wonderful names like traces and determinants and eigenvalues and eigenvectors and such.

One of the big uses of matrices is to represent a mapping. A matrix can describe how points in a domain map to points in a range. Properly, a matrix made up of real numbers can only describe what are called linear mappings. These are ones that turn the domain into the range by stretching or squeezing down or rotating the whole domain the same amount. A mapping might follow different rules in different regions, but that’s all right. We can write a matrix that approximates the original mapping, at least in some areas. We do this in the same way, and for pretty much the same reason, we can approximate a real and complicated curve with a bunch of straight lines. Or the way we can approximate a complicated surface with a bunch of triangular plates.

We can compound mappings. That is, we can start with a domain and a mapping, and find the image of that domain. We can then use a mapping again and find the image of the image of that domain. The matrix that describes this mapping-of-a-mapping is the one you get by multiplying the matrix of the first mapping and the matrix of the second mapping together. This is why we define matrix multiplication the odd way we do. Mappings are that useful, and matrices are that tied to them.

I wrote about some of the uses of matrices in a Set Tour essay. That was based on a use of matrices in physics. We can describe the changing of a physical system with a mapping. And we can understand equilibriums, states where a system doesn’t change, by looking at the matrix that approximates what the mapping does near but not exactly on the equilibrium.

But there are other uses of matrices. Many of them have nothing to do with mappings or physical systems or anything. For example, we have graph theory. A graph, here, means a bunch of points, “vertices”, connected by curves, “edges”. Many interesting properties of graphs depend on how many other vertices each vertex is connected to. And this is well-represented by a matrix. Index your vertices. Then create a matrix. If vertex number 1 connects to vertex number 2, put a ‘1’ in the first row, second column. If vertex number 1 connects to vertex number 3, put a ‘1’ in the first row, third column. If vertex number 2 isn’t connected to vertex number 3, put a ‘0’ in the second row, third column. And so on.
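
Here's a sketch of that recipe for a little made-up graph with four vertices. The edges are chosen arbitrarily for the example.

```python
import numpy as np

# A sketch of the adjacency matrix described above, for a made-up graph of
# four vertices with edges 1-2, 1-3, 2-4, and 3-4 (and no edge 2-3).
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

adjacency = np.zeros((4, 4), dtype=int)
for i, j in edges:
    adjacency[i - 1, j - 1] = 1   # a 1 in row i, column j ...
    adjacency[j - 1, i - 1] = 1   # ... and in row j, column i, since the edge runs both ways

print(adjacency)
```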

We don’t have to use ones and zeroes. A “network” is a kind of graph where there’s some cost associated with each edge. We can put that cost, that number, into the matrix. Studying the matrix of a graph or network can tell us things that aren’t obvious from looking at the drawing.

## The Set Tour, Part 7: Matrices

I feel a bit odd about this week’s guest in the Set Tour. I’ve been mostly concentrating on sets that get used as the domains or ranges for functions a lot. The ones I want to talk about here don’t tend to serve the role of domain or range. But they are used a great deal in some interesting functions. So I loosen my rule about what to talk about.

## $R^{m \times n}$ and $C^{m \times n}$

$R^{m \times n}$ might explain itself by this point. If it doesn’t, then this may help: the “x” here is the multiplication symbol. “m” and “n” are positive whole numbers. They might be the same number; they might be different. So, are we done here?

Maybe not quite. I was fibbing a little when I said “x” was the multiplication symbol. $R^{2 \times 3}$ is not a longer way of saying $R^6$, an ordered collection of six real-valued numbers. The x does represent a kind of product, though. What we mean by $R^{2 \times 3}$ is an ordered collection, two rows by three columns, of real-valued numbers. Say the “x” here aloud as “by” and you’re pronouncing it correctly.

What we get is called a “matrix”. If we put into it only real-valued numbers, it’s a “real matrix”, or a “matrix of reals”. Sometimes mathematical terminology isn’t so hard to follow. Just as with vectors, $R^n$, it matters just how the numbers are organized. $R^{2 \times 3}$ means something completely different from what $R^{3 \times 2}$ means. And swapping which positions the numbers in the matrix occupy changes what matrix we have, as you might expect.

You can add together matrices, exactly as you can add together vectors. The same rules even apply. You can only add together two matrices of the same size. They have to have the same number of rows and the same number of columns. You add them by adding together the numbers in the corresponding slots. It’s exactly what you would do if you went in without preconceptions.

You can also multiply a matrix by a single number. We called this scalar multiplication back when we were working with vectors. With matrices, we call this scalar multiplication. If it strikes you that we could see vectors as a kind of matrix, yes, we can. Sometimes that’s wise. We can see a vector as a matrix in the set $R^{1 \times n}$ or as one in the set $R^{n \times 1}$, depending on just what we mean to do.

It’s trickier to multiply two matrices together. As with vectors multiplying the numbers in corresponding positions together doesn’t give us anything. What we do instead is a time-consuming but not actually hard process. But according to its rules, something in $R^{m \times n}$ we can multiply by something in $R^{n \times k}$. “k” is another whole number. The second thing has to have exactly as many rows as the first thing has columns. What we get is a matrix in $R^{m \times k}$.

I grant you maybe didn’t see that coming. Also a potential complication: if you can multiply something in $R^{m \times n}$ by something in $R^{n \times k}$, can you multiply the thing in $R^{n \times k}$ by the thing in $R^{m \times n}$? … No, not unless k and m are the same number. Even if they are, you can’t count on getting the same product. Matrices are weird things this way. They’re also gateways to weirder things. But it is a productive weirdness, and I’ll explain why in a few paragraphs.
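
Here's a quick sketch of that shape rule, using NumPy arrays full of ones just to show the bookkeeping.

```python
import numpy as np

# A quick sketch of the shape rule: a 4-by-3 matrix times a 3-by-2 matrix
# gives a 4-by-2 matrix; the product in the other order is not defined.
A = np.ones((4, 3))   # something in R^(4 x 3)
B = np.ones((3, 2))   # something in R^(3 x 2)

print((A @ B).shape)  # (4, 2)
# B @ A would raise an error: B has 2 columns, but A has 4 rows.
```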

A matrix is a way of organizing terms. Those terms can be anything. Real matrices are surely the most common kind of matrix, at least in mathematical usage. Next in common use would be complex-valued matrices, much like how we get complex-valued vectors. These are written $C^{m \times n}$. A complex-valued matrix is different from a real-valued matrix. The terms inside the matrix can be complex-valued numbers, instead of real-valued numbers. Again, sometimes, these mathematical terms aren’t so tricky.

I’ve heard occasionally of people organizing matrices of other sets. The notation is similar. If you’re building a matrix of “m” rows and “n” columns out of the things you find inside a set we’ll call H, then you write that as $H^{m \times n}$. I’m not saying you should do this, just that if you need to, that’s how to tell people what you’re doing.

Now. We don’t really have a lot of functions that use matrices as domains, and I can think of fewer that use matrices as ranges. There are a couple of valuable ones, ones so valuable they get special names like “eigenvalue” and “eigenvector”. (Don’t worry about what those are.) They take in $R^{m \times n}$ or $C^{m \times n}$ and return a set of real- or complex-valued numbers, or real- or complex-valued vectors. Not even those, actually. Eigenvalues and eigenvectors are only meaningful if there are exactly as many rows as columns. That is, for $R^{m \times m}$ and $C^{m \times m}$. These are known as “square” matrices, just as you might guess if you were shaken awake and ordered to say what you guessed a “square matrix” might be.

They’re important functions. There are some other important functions, with names like “rank” and “condition number” and the like. But they’re not many. I believe they’re not even thought of as functions, any more than we think of “the length of a vector” as primarily a function. They’re just properties of these matrices, that’s all.

So why are they worth knowing? Besides the joy that comes of knowing something, I mean?

Here’s one answer, and the one that I find most compelling. There is cultural bias in this: I come from an applications-heavy mathematical heritage. We like differential equations, which study how stuff changes in time and in space. It’s very easy to go from differential equations to ordered sets of equations. The first equation may describe how the position of particle 1 changes in time. It might describe how the velocity of the fluid moving past point 1 changes in time. It might describe how the temperature measured by sensor 1 changes as it moves. It doesn’t matter. We get a set of these equations together and we have a majestic set of differential equations.

Now, the dirty little secret of differential equations: we can’t solve them. Most interesting physical phenomena are nonlinear. Linear stuff is easy. Small change 1 has effect A; small change 2 has effect B. If we make small change 1 and small change 2 together, this has effect A plus B. Nonlinear stuff, though … it just doesn’t work. Small change 1 has effect A; small change 2 has effect B. Small change 1 and small change 2 together has effect … A plus B plus some weird A times B thing plus some effect C that nobody saw coming and then C does something with A and B and now maybe we’d best hide.

There are some nonlinear differential equations we can solve. Those are the result of heroic work and brilliant insights. Compared to all the things we would like to solve there’s not many of them. Methods to solve nonlinear differential equations are as precious as ways to slay krakens.

But here’s what we can do. What we usually like to know about in systems are equilibriums. Those are the conditions in which the system stops changing. Those are interesting. We can usually find those points by boring but not conceptually challenging calculations. If we can’t, we can declare $x_0$ represents the equilibrium. If we still care, we leave calculating its actual values to the interested reader or hungry grad student.

But what’s really interesting is: what happens if we’re near but not exactly at the equilibrium? Sometimes, we stay near it. Think of pushing a swing. However good a push you give, it’s going to settle back to the boring old equilibrium of dangling straight down. Sometimes, we go racing away from it. Think of trying to balance a pencil on its tip; if we did this perfectly it would stay balanced. It never does. We’re never perfect, or there’s some wind or somebody walks by and the perfect balance is foiled. It falls down and doesn’t bounce back up. Sometimes, whether it stays near or goes away depends on what way it’s away from the equilibrium.

And now we finally get back to matrices. Suppose we are starting out near an equilibrium. We can, usually, approximate the differential equations that describe what will happen. The approximation may only be good if we’re just a tiny bit away from the equilibrium, but that might be all we really want to know. That approximation will be some linear differential equations. (If they’re not, then we’re just wasting our time.) And that system of linear differential equations we can describe using matrices.

If we can write what we are interested in as a set of linear differential equations, then we have won. We can use the many powerful tools of matrix arithmetic — linear algebra, specifically — to tell us everything we want to know about the system. We can say whether a small push away from the equilibrium stays small, or whether it grows, or whether it depends. We can say how fast the small push shrinks, or grows (for a while). We can say how the system will change, approximately.
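
Here's a minimal sketch of that last question, with a made-up matrix standing in for the linearized system near an equilibrium. The real parts of the eigenvalues answer whether a small push shrinks back or grows.

```python
import numpy as np

# A minimal sketch of the "small push" question. Suppose the linearized system
# near an equilibrium is dx/dt = M x, with this made-up matrix M.
M = np.array([[-1.0,  2.0],
              [ 0.0, -3.0]])

eigenvalues = np.linalg.eigvals(M)
print(eigenvalues)
if np.all(eigenvalues.real < 0):
    print("Small pushes die away: the equilibrium is stable.")
else:
    print("Some pushes grow, or it depends on the direction of the push.")
```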

This is what I love in matrices. It’s not everything there is to them. But it’s enough to make matrices important to me.

## Combining Matrices And Model Universes

I would like to resume talking about matrices and really old universes and the way nucleosynthesis in these model universes causes atoms to keep settling down to a peculiar but unchanging distribution.

I’d already described how a matrix offers a nice way to organize elements, and in ways that encode information about the context of the elements by where they’re placed. That’s useful and saves some writing, certainly, although by itself it’s not that interesting. Matrices start to get really powerful when, first, the elements being stored are things on which you can do something like arithmetic with pairs of them. Here I mostly just mean that you can add together two elements, or multiply them, and get back something meaningful.

This typically means that the matrix is made up of a grid of numbers, although that isn’t actually required, just, really common if we’re trying to do mathematics.

Then you get the ability to add together and multiply together the matrices themselves, turning pairs of matrices into some new matrix, and building something that works a lot like arithmetic on these matrices.

Adding one matrix to another is done in almost the obvious way: add the element in the first row, first column of the first matrix to the element in the first row, first column of the second matrix; that’s the first row, first column of your new matrix. Then add the element in the first row, second column of the first matrix to the element in the first row, second column of the second matrix; that’s the first row, second column of the new matrix. Add the element in the second row, first column of the first matrix to the element in the second row, first column of the second matrix, and put that in the second row, first column of the new matrix. And so on.

This means you can only add together two matrices that are the same size — the same number of rows and of columns — but that doesn’t seem unreasonable.

You can also do something called scalar multiplication of a matrix, in which you multiply every element in the matrix by the same number. A scalar is just a number that isn’t part of a matrix. This multiplication is useful, not least because it lets us talk about how to subtract one matrix from another: to find the difference of the first matrix and the second, scalar-multiply the second matrix by -1, and then add the first to that product. But you can do scalar multiplication by any number, by two or minus pi or by zero if you feel like it.

I should say something about notation. When we want to write out these kinds of operations efficiently, of course, we turn to symbols to represent the matrices. We can, in principle, use any symbols, but by convention a matrix usually gets represented with a capital letter, A or B or M or P or the like. So to add matrix A to matrix B, with the result being matrix C, we can write out the equation “A + B = C”, which is about as simple as we could hope to see. Scalars are normally written in lowercase letters, often Greek letters, if we don’t know what the number is, so that the scalar multiplication of the number r and the matrix A would be the product “rA”, and we could write the difference between matrix A and matrix B as “A + (-1)B” or “A – B”.
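
If you'd like to see the notation earning its keep, here's a minimal sketch of those operations in Python with a couple of made-up two-by-two matrices.

```python
import numpy as np

# A minimal sketch of the operations just described, with made-up matrices.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

C = A + B            # element-by-element addition: "A + B = C"
print(C)
print(-2 * A)        # scalar multiplication, here by the scalar -2
print(A + (-1) * B)  # which is how the difference "A - B" works out
```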

Matrix multiplication, now, that is done by a process that sounds like doubletalk, and it takes a while of practice to do it right. But there are good reasons for doing it that way and we’ll get to one of those reasons by the end of this essay.

To multiply matrix A and matrix B together, we do multiply various pairs of elements from both matrix A and matrix B. The surprising thing is that we also add together sets of these products, per this rule.

Take the element in the first row, first column of A, and multiply it by the element in the first row, first column of B. Add to that the product of the element in the first row, second column of A and the second row, first column of B. Add to that total the product of the element in the first row, third column of A and the third row, first column of B, and so on. When you’ve run out of columns of A and rows of B, this total is the first row, first column of the product of the matrices A and B.

Plenty of work. But we have more to do. Take the product of the element in the first row, first column of A and the element in the first row, second column of B. Add to that the product of the element in the first row, second column of A and the element in the second row, second column of B. Add to that the product of the element in the first row, third column of A and the element in the third row, second column of B. And keep adding those up until you’re out of columns of A and rows of B. This total is the first row, second column of the product of matrices A and B.

This does mean that you can multiply matrices of different sizes, provided the first one has as many columns as the second has rows. And the product may be a completely different size from the first or second matrices. It also means it might be possible to multiply matrices in one order but not the other: if matrix A has four rows and three columns, and matrix B has three rows and two columns, then you can multiply A by B, but not B by A.
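
Here's the doubletalk written out as a from-scratch sketch in Python, with plain lists of lists and a made-up four-by-three and three-by-two pair of matrices.

```python
# A from-scratch sketch of the multiplication rule described above.
def matrix_multiply(A, B):
    rows_A, cols_A = len(A), len(A[0])
    rows_B, cols_B = len(B), len(B[0])
    if cols_A != rows_B:
        raise ValueError("A needs as many columns as B has rows")
    product = [[0] * cols_B for _ in range(rows_A)]
    for i in range(rows_A):           # row of the product
        for j in range(cols_B):       # column of the product
            for k in range(cols_A):   # run across A's row and down B's column
                product[i][j] += A[i][k] * B[k][j]
    return product

# The sizes from the text: a four-row, three-column matrix times a
# three-row, two-column matrix, with made-up entries.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
B = [[1, 0], [0, 1], [1, 1]]
print(matrix_multiply(A, B))   # a four-row, two-column matrix
```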

My recollection on learning this process was that this was crazy, and the workload ridiculous, and I imagine people who get this in Algebra II, and don’t go on to using mathematics later on, remember the process as nothing more than an unpleasant blur of doing a lot of multiplying and addition for some reason or other.

So here is one of the reasons why we do it this way. Let me define two matrices:

$A = \left(\begin{array}{c c c} 3/4 & 0 & 2/5 \\ 1/4 & 3/5 & 2/5 \\ 0 & 2/5 & 1/5 \end{array}\right)$

$B = \left(\begin{array}{c} 100 \\ 0 \\ 0 \end{array}\right)$

Then matrix A times B is

$AB = \left(\begin{array}{c} 3/4 \cdot 100 + 0 \cdot 0 + 2/5 \cdot 0 \\ 1/4 \cdot 100 + 3/5 \cdot 0 + 2/5 \cdot 0 \\ 0 \cdot 100 + 2/5 \cdot 0 + 1/5 \cdot 0 \end{array}\right) = \left(\begin{array}{c} 75 \\ 25 \\ 0 \end{array}\right)$

You’ve seen those numbers before, of course: the matrix A contains the probabilities I put in my first model universe to describe the chances that over the course of a billion years a hydrogen atom would stay hydrogen, or become iron, or become uranium, and so on. The matrix B contains the original distribution of atoms in the toy universe, 100 percent hydrogen and nothing else. And the product of A and B was exactly the distribution after that first billion years: 75 percent hydrogen, 25 percent iron, nothing uranium.

If we multiply the matrix A by that product again — well, you should expect we’re going to get the distribution of elements after two billion years, that is, 56.25 percent hydrogen, 33.75 percent iron, 10 percent uranium, but let me write it out anyway to show:

$\left(\begin{array}{c c c} 3/4 & 0 & 2/5 \\ 1/4 & 3/5 & 2/5 \\ 0 & 2/5 & 1/5 \end{array}\right)\left(\begin{array}{c} 75 \\ 25 \\ 0 \end{array}\right) = \left(\begin{array}{c} 3/4 \cdot 75 + 0 \cdot 25 + 2/5 \cdot 0 \\ 1/4 \cdot 75 + 3/5 \cdot 25 + 2/5 \cdot 0 \\ 0 \cdot 75 + 2/5 \cdot 25 + 1/5 \cdot 0 \end{array}\right) = \left(\begin{array}{c} 56.25 \\ 33.75 \\ 10 \end{array}\right)$

And if you don’t know just what would happen if we multiplied A by that product, you aren’t paying attention.

This also gives a reason why matrix multiplication is defined this way. The operation captures neatly the operation of making a new thing — in the toy universe case, hydrogen or iron or uranium — out of some combination of fractions of an old thing — again, the former distribution of hydrogen and iron and uranium.

Or here’s another reason. Since this matrix A has three rows and three columns, you can multiply it by itself and get a matrix of three rows and three columns out of it. That matrix — which we can write as $A^2$ — then describes how two billion years of nucleosynthesis would change the distribution of elements in the toy universe. A times A times A would give three billion years of nucleosynthesis; $A^{10}$ ten billion years. The actual calculating of the numbers in these matrices may be tedious, but it describes a complicated operation very efficiently, which we always want to do.
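
Here's a short sketch of that, letting NumPy raise the matrix above to various powers and apply it to the all-hydrogen starting universe.

```python
import numpy as np

# The toy-universe matrix from above. A^n applied to the starting distribution
# gives the distribution after n billion years.
A = np.array([[3/4, 0,   2/5],
              [1/4, 3/5, 2/5],
              [0,   2/5, 1/5]])
b = np.array([100.0, 0.0, 0.0])   # start: all hydrogen

for n in (1, 2, 10, 50):
    print(n, np.linalg.matrix_power(A, n) @ b)
```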

I should mention another bit of notation. We usually use capital letters to represent matrices; but, a matrix that’s just got one column is also called a vector. That’s often written with a lowercase letter, with a little arrow above the letter, as in $\vec{x}$, or in bold typeface, as in x. (The arrows are easier to put in writing, the bold easier when you were typing on typewriters.) But if you’re doing a lot of writing this out, and know that (say) x isn’t being used for anything but vectors, then even that arrow or boldface will be forgotten. Then we’d write the product of matrix A and vector x as just Ax.  (There are also cases where you put a little caret over the letter; that’s to denote that it’s a vector that’s one unit of length long.)

When you start writing vectors without an arrow or boldface you start to run the risk of confusing what symbols mean scalars and what ones mean vectors. That’s one of the reasons that Greek letters are popular for scalars. It’s also common to put scalars to the left and vectors to the right. So if one saw “rMx”, it would be expected that r is a scalar, M a matrix, and x a vector, and if they’re not then this should be explained in text nearby, preferably before the equations. (And of course if it’s work you’re doing, you should know going in what you mean the letters to represent.)

## Lewis Carroll and my Playing With Universes

I wanted to explain what’s going on that my little toy universes with three kinds of elements changing to one another keep settling down to steady and unchanging distributions of stuff. I can’t figure a way to do that other than to introduce some actual mathematics notation, and I’m aware that people often find that sort of thing off-putting, or terrifying, or at the very least unnerving.

There’s fair reason to: the entire point of notation is to write down a lot of information in a way that’s compact or easy to manipulate. Using it at all assumes that the writer, and the reader, are familiar with enough of the background that they don’t have to have it explained at each reference. To someone who isn’t familiar with the topic, then, the notation looks like symbols written down without context and without explanation. It’s much like wandering into an Internet forum where all the local acronyms are unfamiliar, the in-jokes are heavy on the ground, and for some reason nobody actually spells out Dave Barry’s name in full.

Let me start by looking at the descriptions of my toy universe: it’s made up of a certain amount of hydrogen, a certain amount of iron, and a certain amount of uranium. Since I’m not trying to describe, like, where these elements are or how they assemble into toy stars or anything like that, I can describe everything that I find interesting about this universe with three numbers. I had written those out as “40% hydrogen, 35% iron, 25% uranium”, for example, or “10% hydrogen, 60% iron, 30% uranium”, or whatever the combination happens to be. If I write the elements in the same order each time, though, I don’t really need to add “hydrogen” and “iron” and “uranium” after the numbers, and if I’m always looking at percentages I don’t even need to add the percent symbol. I can just list the numbers and let the “percent hydrogen” or “percent iron” or “percent uranium” be implicit: “40, 35, 25”, for one universe’s distribution, or “10, 60, 30” for another.

Letting the position of where a number is written carry information is a neat and easy way to save effort, and when you notice what’s happening you realize it’s done all the time: it’s how writing the date as “7/27/14” makes any sense, or how a sports scoreboard might compactly describe the course of the game:

0 1 0   1 2 0   0 0 4   8 13 1
2 0 0   4 0 0   0 0 1   7 15 0


To use the notation you need to understand how the position encodes information. “7/27/14” doesn’t make sense unless you know the first number is the month, the second the day within the month, and the third the year in the current century; and the equally strong convention that puts the day within the month first and the month second presents hazards when the information is ambiguous. Reading the box score requires knowing the top row reflects the performance of the visitor’s team, the bottom row the home team, and the first nine columns count the runs by each team in each inning, while the last three columns are the total count of runs, hits, and errors by that row’s team.

When you put together the numbers describing something into a rectangular grid, that’s termed a matrix of numbers. The box score for that imaginary baseball game is obviously one, but it’s also a matrix if I just write the numbers describing my toy universe in a row, or a column:

40
35
25


or

10
60
30


If a matrix has just the one column, it’s often called a vector. If a matrix has the same number of rows as it has columns, it’s called a square matrix. Matrices and vectors are also usually written with either straight brackets or curled parentheses around them, left and right, but that’s annoying to do in HTML so please just pretend.

The matrix as mathematicians know it today got put into a logically rigorous form around 1850 largely by the work of James Joseph Sylvester and Arthur Cayley, leading British mathematicians who also spent time teaching in the United States. Both are fascinating people, Sylvester for his love of poetry and language and for an alleged incident while briefly teaching at the University of Virginia which the MacTutor archive of mathematician biographies, citing L S Feuer, describes so: “A student who had been reading a newspaper in one of Sylvester’s lectures insulted him and Sylvester struck him with a sword stick. The student collapsed in shock and Sylvester believed (wrongly) that he had killed him. He fled to New York where one of his elder brothers was living.” MacTutor goes on to give reasons why this story may be somewhat distorted, although it does suggest one solution to the problem of students watching their phones in class.

Cayley, meanwhile, competes with Leonhard Euler for prolific range in a mathematician. MacTutor cites him as having at least nine hundred published papers, covering pretty much all of modern mathematics, including work that would underlie quantum mechanics and non-Euclidean geometry. He wrote about 250 papers in the fourteen years he was working as a lawyer, which would by itself have made him a prolific mathematician. If you need to bluff your way through a mathematical conversation, saying “Cayley” and following it with any random noun will probably allow you to pass.

MathWorld mentions, to my delight, that Lewis Carroll, in his secret guise as Charles Dodgson, came into the world of matrices in 1867 with an objection to the very word. In writing about them, Dodgson said, “I am aware that the word ‘Matrix’ is already in use to express the very meaning for which I use the word ‘Block’; but surely the former word means rather the mould, or form, into which algebraical quantities may be introduced, than an actual assemblage of such quantities”. He’s got a fair point, really, but there wasn’t much to be done in 1867 to change the word, and it’s only gotten more entrenched since then.

## What’s Going On In The Old Universe

Last time in this infinitely-old universe puzzle, we found that by making a universe of only three kinds of atoms (hydrogen, iron, and uranium) which shifted to one another with fixed chances over the course of time, we’d end up with the same distribution of atoms regardless of what the distribution of hydrogen, iron, and uranium was to start with. That seems like it might require explanation.

(For people who want to join us late without re-reading: I got to wondering what the universe might look like if it just ran on forever, stars fusing lighter elements into heavier ones, heavier elements fissioning into lighter ones. So I looked at a toy model where there were three kinds of atoms, dubbed hydrogen for the lighter elements, iron for the middle, and uranium for the heaviest, and made up some numbers saying how likely hydrogen was to be turned into heavier atoms over the course of a billion years, how likely iron was to be turned into something heavier or lighter, and how likely uranium was to be turned into lighter atoms. And sure enough, if the rates of change stay constant, then the universe goes from any initial distribution of atoms to a single, unchanging-ever-after mix in surprisingly little time, considering it’s got a literal eternity to putter around.)

The first question, it seems, is whether I happened to pick a freak set of numbers for the change of one kind of atom to another. It’d be a stroke of luck, but, these things happen. In my first model, I gave hydrogen a 25 percent chance of turning to iron, and no chance of turning to uranium, in a billion years. Let’s change that so any given hydrogen atom has a 20 percent chance of turning to iron and a 20 percent chance of turning to uranium. Similarly, instead of iron having no chance of turning to hydrogen and a 40 percent chance of turning to uranium, let’s try giving each iron atom a 25 percent chance of becoming hydrogen and a 25 percent chance of becoming uranium. Uranium, first time around, had a 40 percent chance of becoming hydrogen and a 40 percent chance of becoming iron. Let me change that to a 60 percent chance of becoming hydrogen and a 20 percent chance of becoming iron.

With these chances of changing, a universe that starts out purely hydrogen settles on being about 50 percent hydrogen, a little over 28 percent iron, and a little over 21 percent uranium in about ten billion years. If the universe starts out with equal amounts of hydrogen, iron, and uranium, however, it settles over the course of eight billion years to … 50 percent hydrogen, a little over 28 percent iron, and a little over 21 percent uranium. If it starts out with no hydrogen and the rest of matter evenly split between iron and uranium, then over the course of twelve billion years it gets to … 50 percent hydrogen, a little over 28 percent iron, and a little over 21 percent uranium.
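
If you’d like to check these numbers yourself, here’s a little Python sketch (with NumPy; my own illustration, with one row of the grid per kind of atom, encoding the chances of changing just described) that advances the universe one billion-year step at a time:

```python
import numpy as np

# Chance that each kind of atom (row) becomes each kind (column)
# over one billion years; the order is hydrogen, iron, uranium.
change = np.array([
    [0.60, 0.20, 0.20],  # hydrogen: 60% stays, 20% to iron, 20% to uranium
    [0.25, 0.50, 0.25],  # iron: 25% to hydrogen, 50% stays, 25% to uranium
    [0.60, 0.20, 0.20],  # uranium: 60% to hydrogen, 20% to iron, 20% stays
])

mix = np.array([1.0, 0.0, 0.0])  # start with a universe of pure hydrogen

for billion_years in range(1, 16):
    mix = mix @ change  # apply one billion years' worth of changes
    print(billion_years, np.round(mix, 4))
```

Change the starting `mix` to whatever you like; the printed numbers settle to about 0.5, 0.286, and 0.214 all the same.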

Perhaps the problem is that I’m picking these numbers, and I’m biased towards pretty nice ones — halves and thirds and two-fifths and the like — and maybe that’s causing this state where the universe settles down very quickly and stops changing. We should at least try ruling that out before supposing there’s necessarily something more than coincidence going on here.

So I set the random number generator to produce some element changes which can’t be susceptible to my bias for simple numbers. Give hydrogen a 44.5385 percent chance of staying hydrogen, a 10.4071 percent chance of becoming iron, and a 45.0544 percent chance of becoming uranium. Give iron a 25.2174 percent chance of becoming hydrogen, a 32.0355 percent chance of staying iron, and a 42.7471 percent chance of becoming uranium. Give uranium a 2.9792 percent chance of becoming hydrogen, a 48.9201 percent chance of becoming iron, and a 48.1007 percent chance of staying uranium. (Clearly, by the way, I’ve given up on picking numbers that might reflect some actual if simple version of nucleosynthesis and I’m just picking numbers for numbers’ sake. That’s all right; the question this essay asks is whether we’re stuck with an unchanging yet infinitely old universe.)

And the same thing happens again: after nine billion years a universe starting from pure hydrogen will be about 18.7 percent hydrogen, about 35.7 percent iron, and about 45.6 percent uranium. Starting from no hydrogen, 50 percent iron, and 50 percent uranium, we get to the same distribution in, again, about nine billion years. A universe beginning with equal amounts of hydrogen, iron, and uranium under these rules gets to the same distribution after only seven billion years.
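
The same little sketch handles the randomly generated chances; swap this grid of numbers into the loop above and the mix settles near 18.7, 35.7, and 45.6 percent from any starting point:

```python
# The randomly generated chances of changing, same order as before.
change = np.array([
    [0.445385, 0.104071, 0.450544],  # hydrogen
    [0.252174, 0.320355, 0.427471],  # iron
    [0.029792, 0.489201, 0.481007],  # uranium
])
```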

The conclusion is that this settling down doesn’t seem to be caused by picking numbers that are particularly nice-looking or obviously attractive, and the final distributions don’t seem to have an obvious link to the probabilities of changing. There seems to be something happening here, though admittedly we haven’t proven that rigorously. To spoil a successor article in this thread: there is something here, and it’s a big thing.

(Also, no, we’re not stuck with an unchanging universe, and good on you if you can see ways to keep the universe changing without, like, having the probability of one atom changing to another itself vary in time.)

## Looking to Euler

I haven’t forgotten about writing original material here — actually I’ve been trying to think of why something I’ve not thought about in a long while is true, which is embarrassing and hard to do — but in the meanwhile I’d like to remember Leonhard Euler’s 306th birthday and point to Richard Elwes’s essay here about Euler’s totient function. “Totient” is, as best I can determine, a word that exists only for this mathematical concept — it’s the count of how many numbers are relatively prime to a given number — but even if the word comes only from the mildly esoteric world of prime number studies, it’s still one of my favorite mathematical terms. It feels like a word that ought to be more successful. Someday I’ll probably get in a nasty argument with other people playing Boggle about it.
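
If the definition reads more naturally as code, here’s a tiny Python sketch (the function name is my own) that counts how many of the numbers from 1 to n are relatively prime to n:

```python
from math import gcd

def totient(n):
    """Count how many of 1..n are relatively prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

print(totient(12))  # 4: the numbers 1, 5, 7, and 11
```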

Apparently, though, Euler didn’t dub this quantity the “totient”; the word is a neologism coined by James Joseph Sylvester (1814 – 1897). That’s pretty respectable company, though: Sylvester — whose name you probably brush up against if you study mathematical matrices — is widely praised for his skill in naming things, although the only terms I know offhand that he gave us are “totient” and “discriminant”. The discriminant is that $b^2 - 4ac$ term in the quadratic formula which tells you whether a quadratic equation has two real solutions, one real solution, or two complex solutions; Sylvester supplied the name, not the concept, and he named (and extended) the similar quantity for cubic equations. I do believe there are more such Sylvester-dubbed terms; it’s just that we need a Wikipedia category to gather them together.

I’m amused to be reminded that, according to the St Andrews biographies of mathematicians, Sylvester at least once tossed off this version of the Chicken McNuggets problem, possibly after he’d worked out the general solution:

I have a large number of stamps to the value of 5d and 17d only. What is the largest denomination which I cannot make up with a combination of these two different values.
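
For two coprime denominations $a$ and $b$ the general answer is $ab - a - b$, which for 5d and 17d stamps comes to 63d. A quick brute-force check in Python (my own sketch, nothing of Sylvester’s) agrees:

```python
def representable(n, a=5, b=17):
    """Can a value of n be made up from stamps worth a and b?"""
    return any((n - b * k) % a == 0 for k in range(n // b + 1))

# The largest value that cannot be made up from 5d and 17d stamps.
print(max(n for n in range(1, 200) if not representable(n)))  # 63
```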