Combining Matrices And Model Universes
I would like to resume talking about matrices and really old universes and the way nucleosynthesis in these model universes causes atoms to keep settling down to peculiar but unchanging distribution.
I’d already described how a matrix offers a nice way to organize elements, and in ways that encode information about the context of the elements by where they’re placed. That’s useful and saves some writing, certainly, although by itself it’s not that interesting. Matrices start to get really powerful when, first, the elements being stored are things on which you can do something like arithmetic with pairs of them. Here I mostly just mean that you can add together two elements, or multiply them, and get back something meaningful.
This typically means that the matrix is made up of a grid of numbers, although that isn’t actually required, just, really common if we’re trying to do mathematics.
Then you get the ability to add together and multiply together the matrices themselves, turning pairs of matrices into some new matrix, and building something that works a lot like arithmetic on these matrices.
Adding one matrix to another is done in almost the obvious way: add the element in the first row, first column of the first matrix to the element in the first row, first column of the second matrix; that’s the first row, first column of your new matrix. Then add the element in the first row, second column of the first matrix to the element in the first row, second column of the second matrix; that’s the first row, second column of the new matrix. Add the element in the second row, first column of the first matrix to the element in the second row, first column of the second matrix, and put that in the second row, first column of the new matrix. And so on.
This means you can only add together two matrices that are the same size — the same number of rows and of columns — but that doesn’t seem unreasonable.
You can also do something called scalar multiplication of a matrix, in which you multiply every element in the matrix by the same number. A scalar is just a number that isn’t part of a matrix. This multiplication is useful, not least because it lets us talk about how to subtract one matrix from another: to find the difference of the first matrix and the second, scalar-multiply the second matrix by -1, and then add the first to that product. But you can do scalar multiplication by any number, by two or minus pi or by zero if you feel like it.
I should say something about notation. When we want to write out these kinds of operations efficiently, of course, we turn to symbols to represent the matrices. We can, in principle, use any symbols, but by convention a matrix usually gets represented with a capital letter, A or B or M or P or the like. So to add matrix A to matrix B, with the result being matrix C, we can write out the equation “A + B = C”, which is about as simple as we could hope to see. Scalars are normally written in lowercase letters, often Greek letters, if we don’t know what the number is, so that the scalar multiplication of the number r and the matrix A would be the product “rA”, and we could write the difference between matrix A and matrix B as “A + (-1)B” or “A – B”.
Matrix multiplication, now, that is done by a process that sounds like doubletalk, and it takes a while of practice to do it right. But there are good reasons for doing it that way and we’ll get to one of those reasons by the end of this essay.
To multiply matrix A and matrix B together, we do multiply various pairs of elements from both matrix A and matrix B. The surprising thing is that we also add together sets of these products, per this rule.
Take the element in the first row, first column of A, and multiply it by the element in the first row, first column of B. Add to that the product of the element in the first row, second column of A and the second row, first column of B. Add to that total the product of the element in the first row, third column of A and the third row, second column of B, and so on. When you’ve run out of columns of A and rows of B, this total is the first row, first column of the product of the matrices A and B.
Plenty of work. But we have more to do. Take the product of the element in the first row, first column of A and the element in the first row, second column of B. Add to that the product of the element in the first row, second column of A and the element in the second row, second column of B. Add to that the product of the element in the first row, third column of A and the element in the third row, second column of B. And keep adding those up until you’re out of columns of A and rows of B. This total is the first row, second column of the product of matrices A and B.
This does mean that you can multiply matrices of different sizes, provided the first one has as many columns as the second has rows. And the product may be a completely different size from the first or second matrices. It also means it might be possible to multiply matrices in one order but not the other: if matrix A has four rows and three columns, and matrix B has three rows and two columns, then you can multiply A by B, but not B by A.
My recollection on learning this process was that this was crazy, and the workload ridiculous, and I imagine people who get this in Algebra II, and don’t go on to using mathematics later on, remember the process as nothing more than an unpleasant blur of doing a lot of multiplying and addition for some reason or other.
So here is one of the reasons why we do it this way. Let me define two matrices:
Then matrix A times B is
You’ve seen those numbers before, of course: the matrix A contains the probabilities I put in my first model universe to describe the chances that over the course of a billion years a hydrogen atom would stay hydrogen, or become iron, or become uranium, and so on. The matrix B contains the original distribution of atoms in the toy universe, 100 percent hydrogen and nothing anything else. And the product of A and B was exactly the distribution after that first billion years: 75 percent hydrogen, 25 percent iron, nothing uranium.
If we multiply the matrix A by that product again — well, you should expect we’re going to get the distribution of elements after two billion years, that is, 56.25 percent hydrogen, 33.75 percent iron, 10 percent uranium, but let me write it out anyway to show:
And if you don’t know just what would happen if we multipled A by that product, you aren’t paying attention.
This also gives a reason why matrix multiplication is defined this way. The operation captures neatly the operation of making a new thing — in the toy universe case, hydrogen or iron or uranium — out of some combination of fractions of an old thing — again, the former distribution of hydrogen and iron and uranium.
Or here’s another reason. Since this matrix A has three rows and three columns, you can multiply it by itself and get a matrix of three rows and three columns out of it. That matrix — which we can write as A2 — then describes how two billion years of nucleosynthesis would change the distribution of elements in the toy universe. A times A times A would give three billion years of nucleosynthesis; A10 ten billion years. The actual calculating of the numbers in these matrices may be tedious, but it describes a complicated operation very efficiently, which we always want to do.
I should mention another bit of notation. We usually use capital letters to represent matrices; but, a matrix that’s just got one column is also called a vector. That’s often written with a lowercase letter, with a little arrow above the letter, as in , or in bold typeface, as in x. (The arrows are easier to put in writing, the bold easier when you were typing on typewriters.) But if you’re doing a lot of writing this out, and know that (say) x isn’t being used for anything but vectors, then even that arrow or boldface will be forgotten. Then we’d write the product of matrix A and vector x as just Ax. (There are also cases where you put a little caret over the letter; that’s to denote that it’s a vector that’s one unit of length long.)
When you start writing vectors without an arrow or boldface you start to run the risk of confusing what symbols mean scalars and what ones mean vectors. That’s one of the reasons that Greek letters are popular for scalars. It’s also common to put scalars to the left and vectors to the right. So if one saw “rMx”, it would be expected that r is a scalar, M a matrix, and x a vector, and if they’re not then this should be explained in text nearby, preferably before the equations. (And of course if it’s work you’re doing, you should know going in what you mean the letters to represent.)