Theorem Thursday: What Is Cramer’s Rule?

KnotTheorist asked for this one during my appeal for theorems to discuss. And I’m taking an open interpretation of what a “theorem” is. I can do a rule.

Cramer’s Rule

I first learned of Cramer’s Rule in the way I expect most people do. It was an algebra course. I mean high school algebra. By high school algebra I mean you spend roughly eight hundred years learning ways to solve for x or to plot y versus x. Then take a pause for polar coordinates and matrices. Then you go back to finding both x and y.

Cramer’s Rule came up in the context of solving simultaneous equations. You have more than one variable. So x and y. Maybe z. Maybe even a w, before whoever set up the problem gives up and renames everything x1 and x2 and x62 and all that. You also have more than one equation. In fact, you have exactly as many equations as you have variables. Are there any sets of values those variables can have which make all those equations true simultaneously? Thus the imaginative name “simultaneous equations” or the search for “simultaneous solutions”.

If all the equations are linear then we can always say whether there are simultaneous solutions. By “linear” we mean what we always mean in mathematics, which is, “something we can handle”. But more exactly it means the equations have x and y and whatever other variables only to the first power. No x-squared or square roots of y or tangents of z or anything. (The equations are also allowed to omit a variable. That is, if you have one equation with x, y, and z, and another with just x and z, and another with just y and z, that’s fine. We pretend the missing variable is there and just multiplied by zero, and proceed as before.) One way to find these solutions is with Cramer’s Rule.

Cramer’s Rule sets up some matrices based on the system of equations. If the system has two equations, it sets up three matrices. If the system has three equations, it sets up four matrices. If the system has twelve equations, it sets up thirteen matrices. You see the pattern here. And then you can take the determinant of each of these matrices. Dividing the determinant of one of these matrices by the determinant of another tells you what value of x makes all the equations true. Dividing the determinant of a different matrix by that same second determinant tells you which value of y makes all the equations true. And so on. The Rule tells you which determinants to use. It also says what it means if the determinant you want to divide by equals zero. It means there’s either no set of simultaneous solutions or there are infinitely many solutions.
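For the reader who wants to see the pattern spelled out, here is a minimal sketch in Python. The names det and cramer_solve are my own inventions for illustration, not anything standard. It assumes the coefficients sit in a square list of lists called a and the right-hand sides in a list called d, and it uses exact fractions so that no rounding sneaks in. The matrix you divide by is the plain coefficient matrix; the matrix for each variable is that same matrix with one column swapped out for d.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: drop the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def cramer_solve(a, d):
    """Solve the system with coefficient matrix a and right-hand side d.

    Returns a list of exact fractions, or None when the coefficient
    determinant is zero (no solution, or infinitely many).
    """
    a = [[Fraction(v) for v in row] for row in a]
    d = [Fraction(v) for v in d]
    base = det(a)
    if base == 0:
        return None
    solution = []
    for k in range(len(a)):
        # Replace column k of the coefficient matrix with d.
        replaced = [row[:k] + [d[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(det(replaced) / base)
    return solution


# 2x + y = 5 and x - y = 1: two equations, so three determinants.
print(cramer_solve([[2, 1], [1, -1]], [5, 1]))
# x + y = 1 and 2x + 2y = 2: the determinant to divide by is zero.
print(cramer_solve([[1, 1], [2, 2]], [1, 2]))
```

The first system comes out as x = 2 and y = 1. The second returns None, which is the Rule’s way of saying “no solutions or infinitely many” without telling you which.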

This gets dropped on us students in the vain effort to convince us knowing how to calculate determinants is worth it. It’s not that determinants aren’t worth knowing. It’s just that they don’t seem to tell us anything we care about. Not until we get into mappings and calculus and differential equations and other mathematics-major stuff. We never see it in high school.

And the hard part of determinants is that for all the cool stuff they tell us, they take forever to calculate. The determinant for a matrix with two rows and two columns isn’t bad. Three rows and three columns is getting bad. Four rows and four columns is awful. The determinant for a matrix with five rows and five columns you only ever calculate if you’ve made your teacher extremely cross with you.

So there’s the genius and the first problem with Cramer’s Rule. It takes a lot of calculating. Make any errors along the way with the calculation and your work is wrong. And worse, it won’t be wrong in an obvious way. You can find the error only by going over every single step and hoping to catch the spot where you, somehow, got 36 times -7 minus 21 times -8 wrong.

The second problem is nobody in high school algebra mentions why systems of linear equations should be interesting to solve. Oh, maybe they’ll explain how this is the work you do to figure out where two straight lines intersect. But that just shifts the “and we care because … ?” problem back one step. Later on we might come to understand the lines represent cases where something we’re interested in is true, or where it changes from true to false.

This sort of simultaneous-solution problem turns up naturally in optimization problems. These are problems where you try to find a maximum subject to some constraints. Or find a minimum. Maximums and minimums are the same thing when you think about them long enough. If all the constraints can be satisfied at once and you get a maximum (or minimum, whatever), great! If they can’t … Well, you can study how close it’s possible to get, and what happens if you loosen one or more constraints. That’s worth knowing about.

The third problem with Cramer’s Rule is that, as a method, it kind of sucks. We can be convinced that simultaneous linear equations are worth solving, or at least that we have to solve them to get out of High School Algebra. And we have computers. They can grind away and work out thirteen determinants of twelve-row-by-twelve-column matrices. They might even get an answer back before the end of the term. (The amount of work needed for a determinant grows scary fast as the matrix gets bigger.) But all that work might be meaningless.
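To put a rough number on “scary fast”, here is a small sketch that counts multiplications. It assumes the determinant is worked out the way high school teaches it, by cofactor expansion along a row; the function names are hypothetical ones of my own. Serious numerical software does not compute determinants this way. It uses elimination methods whose work grows roughly like the cube of the matrix size.

```python
def cofactor_multiplications(n):
    """Multiplications used by naive cofactor expansion of an n-by-n determinant.

    Each of the n entries in the first row gets multiplied by the
    determinant of an (n-1)-by-(n-1) minor, which has its own cost.
    """
    if n == 1:
        return 0
    return n * (cofactor_multiplications(n - 1) + 1)


def cramer_multiplications(n):
    """Cramer's Rule on n equations needs n + 1 determinants of size n."""
    return (n + 1) * cofactor_multiplications(n)


for size in (2, 3, 4, 5, 12):
    print(size, cofactor_multiplications(size), cramer_multiplications(size))
```

The count for a single determinant is always at least the factorial of the matrix size, and that is before you multiply by the thirteen determinants a twelve-equation system asks for.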

The trouble is that Cramer’s Rule is numerically unstable. Before I even explain what that is you already sense it’s a bad thing. Think of all the good things in your life you’ve heard described as unstable. Fair enough. But here’s what we mean by numerically unstable.

Is 1/3 equal to 0.3333333? No, and we know that. But is it close enough? Sure, most of the time. Suppose we need a third of sixty million. 0.3333333 times 60,000,000 equals 19,999,998. That’s a little off of the correct 20,000,000. But I bet you wouldn’t even notice the difference if nobody pointed it out to you. Even if you did notice it you might write off the difference. “If we must, make up the difference out of petty cash”, you might declare, as if that were quite sensible in the context.

And that’s so because this multiplication is numerically stable. Make a small error in either term and you get a proportional error in the result. A small mistake will — well, maybe it won’t stay small, necessarily. But it’ll not grow too fast too quickly.
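The petty-cash arithmetic can be checked with a few lines of Python. This is only a sketch. It uses the standard fractions module so that 0.3333333 is held exactly as 3333333/10000000 and not as whatever the floating-point hardware makes of it; the names third_of and relative_error are made up for the occasion.

```python
from fractions import Fraction

exact_third = Fraction(1, 3)
truncated_third = Fraction(3333333, 10_000_000)  # 0.3333333, held exactly


def third_of(amount, third):
    """Multiply an amount by whichever 'one-third' we were handed."""
    return third * amount


def relative_error(approx, exact):
    """Size of the error as a fraction of the true value."""
    return abs(approx - exact) / abs(exact)


approx = third_of(60_000_000, truncated_third)
exact = third_of(60_000_000, exact_third)

print(approx, exact)
print(relative_error(truncated_third, exact_third))  # error going in
print(relative_error(approx, exact))                 # error coming out
```

The error going in and the error coming out are the same one part in ten million. That is what the stable case looks like: the mistake is carried along, not blown up.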

So now you know intuitively what an unstable calculation is. This is one in which a small error doesn’t necessarily stay proportionally small. It might grow huge, arbitrarily huge, and in few calculations. So your answer might be computed just fine, but actually be meaningless.

This isn’t because of a flaw in the computer per se. That is, it’s working as designed. It’s just that we might need, effectively, infinitely many digits of precision for the result to be correct. You see where there may be problems achieving that.

Cramer’s Rule isn’t guaranteed to be nonsense, and that’s a relief. But it is vulnerable to this. You can set up problems that look harmless but which the computer can’t do. And that’s surely the worst of all worlds, since we wouldn’t bother calculating them numerically if it weren’t too hard to do by hand.
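Here is one harmless-looking problem of that kind, as a sketch. The system and the helper solve_2x2 are my own illustration, not taken from any textbook. The arithmetic is done in exact fractions, so the computer makes no rounding error at all; the only “error” is a nudge of 0.0001 that I put into one right-hand side by hand. In real floating-point work, rounding plays the part of that nudge.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, p, q):
    """Cramer's Rule for  a*x + b*y = p  and  c*x + d*y = q, in exact rationals."""
    a, b, c, d, p, q = (Fraction(v) for v in (a, b, c, d, p, q))
    base = a * d - b * c  # the determinant we divide by
    if base == 0:
        return None
    x = (p * d - b * q) / base
    y = (a * q - p * c) / base
    return (x, y)


# x + y = 2 and x + 1.0001 y = 2.0001
original = solve_2x2(1, 1, 1, "1.0001", 2, "2.0001")
# Same system, with the last number nudged from 2.0001 to 2.0002.
nudged = solve_2x2(1, 1, 1, "1.0001", 2, "2.0002")

print(original)
print(nudged)
```

The original system has x = 1 and y = 1. The nudged one has x = 0 and y = 2. The determinant being divided by is only 0.0001, so a change in the fourth decimal place of one number moves the answer by a whole unit.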

(Let me direct the reader who’s unintimidated by mathematical jargon, and who likes seeing a good Wikipedia Editors quarrel, to the Cramer’s Rule Talk Page. Specifically to the section “Cramer’s Rule is useless.”)

I don’t want to get too down on Cramer’s Rule. It’s not like the numerical instability hurts every problem you might use it on. And you can, at the cost of some more work, detect whether a particular set of equations will have instabilities. That requires a lot of calculation, but if we have the computer to do the work, fine. Let it. And a computer can limit its numerical instabilities if it can do symbolic manipulations. That is, if it can use the idea of “one-third” rather than 0.3333333. The software package Mathematica, for example, does symbolic manipulations very well. You can shed many numerical-instability problems, although you gain the problem of paying for a copy of Mathematica.

If you just care about, or just need, one of the variables then what the heck. Cramer’s Rule lets you solve for just one or just some of the variables. That seems like a niche application to me, but it is there.

And the Rule re-emerges in pure analysis, where numerical instability doesn’t matter. When we look to differential equations, for example, we often find solutions are combinations of several independent component functions. Bases, in fact. Testing whether we have found independent bases can be done through a thing called the Wronskian. That’s a way that Cramer’s Rule appears in differential equations.
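One place the two meet, as best I can reconstruct it, is the method of variation of parameters. The sketch below assumes a second-order equation y'' + p y' + q y = g with two known independent solutions y_1 and y_2; those symbols are my choice of notation, not anything fixed above. Looking for a solution of the form u_1 y_1 + u_2 y_2 leads to two linear equations for the derivatives of u_1 and u_2, and Cramer’s Rule solves them with the Wronskian W as the determinant you divide by.

```latex
\begin{aligned}
y_1 u_1' + y_2 u_2' &= 0, \\
y_1' u_1' + y_2' u_2' &= g,
\end{aligned}
\qquad
W = \begin{vmatrix} y_1 & y_2 \\ y_1' & y_2' \end{vmatrix} = y_1 y_2' - y_2 y_1',
\qquad
u_1' = \frac{1}{W}\begin{vmatrix} 0 & y_2 \\ g & y_2' \end{vmatrix} = -\frac{y_2\, g}{W},
\qquad
u_2' = \frac{1}{W}\begin{vmatrix} y_1 & 0 \\ y_1' & g \end{vmatrix} = \frac{y_1\, g}{W}.
```

If the Wronskian is zero the division fails, which is the same warning the Rule gives in high school algebra: the two functions weren’t independent after all.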

Wikipedia also asserts the use of Cramer’s Rule in differential geometry. I believe that’s a true statement, and that it will be reflected in many mechanics problems. In these we can use our knowledge that, say, energy and angular momentum of a system are constant values to tell us something of how positions and velocities depend on each other. But I admit I’m not well-read in differential geometry. That’s something which has indeed caused me pain in my scholarly life. I don’t know whether differential geometers thank Cramer’s Rule for this insight or whether they’re just glad to have got all that out of the way. (See the above Wikipedia Editors quarrel.)

I admit for all this talk about Cramer’s Rule I haven’t said what it is. Not in enough detail to pass your high school algebra class. That’s all right. It’s easy to find. MathWorld has the rule in pretty simple form. MathWorld does forget to define what it means by the vector d. (It’s the vector with components d1, d2, et cetera.) But that’s enough technical detail. If you need to calculate something using it, you can probably look closer at the problem and see if you can do it another way instead. Or you’re in high school algebra and just have to slog through it. It’s all right. Eventually you can put x and y aside and do geometry.