Have a special one today. I’ve been reading a compilation of Crockett Johnson’s 1940s comic Barnaby. The title character, an almost too gentle child, follows his fairy godfather Mr O’Malley into various shenanigans. Many (the best ones, I’d say) involve the magical world. The steady complication is that Mr O’Malley boasts abilities beyond his demonstrated competence. (Although most of the magic characters are shown to be not all that good at their business.) It’s a gentle strip and everything works out all right, if farcically.
This particular strip comes from a late 1948 storyline. Mr O’Malley’s gone missing, coincidentally to a fairy cop come to arrest the pixie, who is a con artist at heart. So this sees the entry of Atlas, the Mental Giant, who’s got some pleasant gimmicks. One of them is his requiring mnemonics built on mathematical formulas to work out names. And this is a charming one, with a great little puzzle: how do you get A-T-L-A-S out of the formula Atlas has remembered?
I’m sorry the solution requires a bit of abusing notation, so please forgive it. But it’s a fun puzzle, especially as the joke would not be funnier if the formula didn’t work. I’m always impressed when a comic strip goes to that extra effort.
Today’s is another topic suggested by Mr Wu, author of the Singapore Maths Tuition blog. The Wronskian is named for Józef Maria Hoëne-Wroński, a Polish mathematician, born in 1778. He served in General Tadeusz Kosciuszko’s army in the 1794 Kosciuszko Uprising. After being captured and forced to serve in the Russian army, he moved to France. He kicked around Western Europe and its mathematical and scientific circles. I’d like to say this was all creative and insightful, but, well. Wikipedia describes him trying to build a perpetual motion machine. Trying to square the circle (also impossible). Building a machine to predict the future. The St Andrews mathematical biography notes his writing a summary of “the general solution of the fifth degree [polynomial] equation”. This doesn’t exist.
Both sources, though, admit that for all that he got wrong, there were flashes of insight and brilliance in his work. The St Andrews biography particularly notes that Wronski’s tables of logarithms were well-designed. This is a hard thing to feel impressed by. But it’s hard to balance information so that it’s compact yet useful. He wrote about the Wronskian in 1812; it wouldn’t be named for him until 1882. This was 29 years after his death, but it does seem likely he’d have enjoyed having a familiar thing named for him. I suspect he wouldn’t enjoy my next paragraph, but would enjoy the fight with me about it.
The Wronskian is a thing put into Introduction to Ordinary Differential Equations courses because students must suffer in atonement for their sins. Those who fail to reform enough must go on to the Hessian, in Partial Differential Equations.
To be more precise, the Wronskian is the determinant of a matrix. The determinant you find by adding and subtracting products of the elements in a matrix together. It’s not hard, but it is tedious, and gets more tedious pretty fast as the matrix gets bigger. (In Big-O notation, it’s the order of the cube of the matrix size. This is rough, for things humans do, although not bad as algorithms go.) The matrix here is made up of a bunch of functions and their derivatives. The functions need to be ones of a single variable. The derivatives, you need first, second, third, and so on, up to one less than the number of functions you have.
If you have two functions, and , you need their first derivatives, and . If you have three functions, , , and , you need first derivatives, , , and , as well as second derivatives, , , and . If you have functions and here I’ll call them , you need derivatives, and so on through . You see right away this is a fun and exciting thing to calculate. Also why in intro to differential equations you only work this out with two or three functions. Maybe four functions if the class has been really naughty.
Go through your functions and your derivatives and make a big square matrix. And then you go through calculating the derivative. This involves a lot of multiplying strings of these derivatives together. It’s a lot of work. But at least doing all this work gets you older.
So one will ask why do all this? Why fit it into every Intro to Ordinary Differential Equations textbook and why slip it in to classes that have enough stuff going on?
One answer is that if the Wronskian is not zero for some values of the independent variable, then the functions that went into it are linearly independent. Mathematicians learn to like sets of linearly independent functions. We can treat functions like directions in space. Linear independence assures us none of these functions are redundant, pointing a way we already can describe. (Real people see nothing wrong in having north, east, and northeast as directions. But mathematicians would like as few directions in our set as possible.) The Wronskian being zero for every value of the independent variable seems like it should tell us the functions are linearly dependent. It doesn’t, not without some more constraints on the functions.
This is fine, but who cares? And, unfortunately, in Intro it’s hard to reach a strong reason to care. To this major, the emphasis on linearly independent functions felt misplaced. It’s the sort of thing we care about in linear algebra. Or some course where we talk about vector spaces. Differential equations do lead us into vector spaces. It’s hard to find a corner of analysis that doesn’t.
Every ordinary differential equation has a secret picture. This is a vector field. One axis in the field is the independent variable of the function. The other axes are the value of the function. And maybe its derivatives, depending on how many derivatives are used in the ordinary differential equation. To solve one particular differential equation is to find one path in this field. People who just use differential equations will want to find one path.
Mathematicians tend to be fine with finding one path. But they want to find what kinds of paths there can be. Are there paths which the differential equation picks out, by making paths near it stay near? Or by making paths that run away from it? And here is the value of the Wronskian. The Wronskian tells us about the divergence of this vector field. This gives us insight to how these paths behave. It’s in the same way that knowing where high- and low-pressure systems are describes how the weather will change. The Wronskian, by way of a thing called Liouville’s Theorem that I haven’t the strength to describe today, ties in to the Hamiltonian. And the Hamiltonian we see in almost every mechanics problem of note.
You can see where the mathematics PhD, or the physicist, would find this interesting. But what about the student, who would look at the symbols evoked by those paragraphs above with reasonable horror?
And here’s the second answer for what the Wronskian is good for. It helps us solve ordinary differential equations. Like, particular ones. An ordinary differential equation will (normally) have several linearly independent solutions. If you know all but one of those solutions, it’s possible to calculate the Wronskian and, from that, the last of the independent solutions. Since a big chunk of mathematics — particularly for science or engineering — is solving differential equations you see why this is something valuable. Allow that it’s tedious. Tedious work we can automate, or give to research assistant to do.
One then asks what kind of differential equation would have all-but-one answer findable, and yield that last one only by long efforts of hard work. So let me show you an example ordinary differential equation:
Here , , and are some functions that depend only on the independent variable, . Don’t know what they are; don’t care. The differential equation is a lot easier of and are constants, but we don’t insist on that.
This equation has a close cousin, and one that’s easier to solve than the original. Is cousin is called a homogeneous equation:
The left-hand-side, the parts with the function that we want to find, is the same. It’s the right-hand-side that’s different, that’s a constant zero. This is what makes the new equation homogenous. This homogenous equation is easier and we can expect to find two functions, and , that solve it. If and are constant this is even easy. Even if they’re not, if you can find one solution, the Wronskian lets you generate the second.
That’s nice for the homogenous equation. But if we care about the original, inhomogenous one? The Wronskian serves us there too. Imagine that the inhomogenous solution has any solution, which we’ll call . (The ‘p’ stands for ‘particular’, as in “the solution for this particular ”.) But also has to solve that inhomogenous differential equation. It seems startling but if you work it out, it’s so. (The key is the derivative of the sum of functions is the same as the sum of the derivative of functions.) also has to solve that inhomogenous differential equation. In fact, for any constants and , it has to be that is a solution.
I’ll skip the derivation; you have Wikipedia for that. The key is that knowing these homogenous solutions, and the Wronskian, and the original , will let you find the that you really want.
My reading is that this is more useful in proving things true about differential equations, rather than particularly solving them. It takes a lot of paper and I don’t blame anyone not wanting to do it. But it’s a wonder that it works, and so well.
Don’t make your instructor so mad you have to do the Wronskian for four functions.
I first learned of Cramer’s Rule in the way I expect most people do. It was an algebra course. I mean high school algebra. By high school algebra I mean you spend roughly eight hundred years learning ways to solve for x or to plot y versus x. Then take a pause for polar coordinates and matrices. Then you go back to finding both x and y.
Cramer’s Rule came up in the context of solving simultaneous equations. You have more than one variable. So x and y. Maybe z. Maybe even a w, before whoever set up the problem gives up and renames everything x1 and x2 and x62 and all that. You also have more than one equation. In fact, you have exactly as many equations as you have variables. Are there any sets of values those variables can have which make all those variable true simultaneously? Thus the imaginative name “simultaneous equations” or the search for “simultaneous solutions”.
If all the equations are linear then we can always say whether there’s simultaneous solutions. By “linear” we mean what we always mean in mathematics, which is, “something we can handle”. But more exactly it means the equations have x and y and whatever other variables only to the first power. No x-squared or square roots of y or tangents of z or anything. (The equations are also allowed to omit a variable. That is, if you have one equation with x, y, and z, and another with just x and z, and another with just y and z, that’s fine. We pretend the missing variable is there and just multiplied by zero, and proceed as before.) One way to find these solutions is with Cramer’s Rule.
Cramer’s Rule sets up some matrices based on the system of equations. If the system has two equations, it sets up three matrices. If the system has three equations, it sets up four matrices. If the system has twelve equations, it sets up thirteen matrices. You see the pattern here. And then you can take the determinant of each of these matrices. Dividing the determinant of one of these matrices by another one tells you what value of x makes all the equations true. Dividing the determinant of another matrix by the determinant of one of these matrices tells you which values of y makes all the equations true. And so on. The Rule tells you which determinants to use. It also says what it means if the determinant you want to divide by equals zero. It means there’s either no set of simultaneous solutions or there’s infinitely many solutions.
This gets dropped on us students in the vain effort to convince us knowing how to calculate determinants is worth it. It’s not that determinants aren’t worth knowing. It’s just that they don’t seem to tell us anything we care about. Not until we get into mappings and calculus and differential equations and other mathematics-major stuff. We never see it in high school.
And the hard part of determinants is that for all the cool stuff they tell us, they take forever to calculate. The determinant for a matrix with two rows and two columns isn’t bad. Three rows and three columns is getting bad. Four rows and four columns is awful. The determinant for a matrix with five rows and five columns you only ever calculate if you’ve made your teacher extremely cross with you.
So there’s the genius and the first problem with Cramer’s Rule. It takes a lot of calculating. Many any errors along the way with the calculation and your work is wrong. And worse, it won’t be wrong in an obvious way. You can find the error only by going over every single step and hoping to catch the spot where you, somehow, got 36 times -7 minus 21 times -8 wrong.
The second problem is nobody in high school algebra mentions why systems of linear equations should be interesting to solve. Oh, maybe they’ll explain how this is the work you do to figure out where two straight lines intersect. But that just shifts the “and we care because … ?” problem back one step. Later on we might come to understand the lines represent cases where something we’re interested in is true, or where it changes from true to false.
This sort of simultaneous-solution problem turns up naturally in optimization problems. These are problems where you try to find a maximum subject to some constraints. Or find a minimum. Maximums and minimums are the same thing when you think about them long enough. If all the constraints can be satisfied at once and you get a maximum (or minimum, whatever), great! If they can’t … Well, you can study how close it’s possible to get, and what happens if you loosen one or more constraint. That’s worth knowing about.
The third problem with Cramer’s Rule is that, as a method, it kind of sucks. We can be convinced that simultaneous linear equations are worth solving, or at least that we have to solve them to get out of High School Algebra. And we have computers. They can grind away and work out thirteen determinants of twelve-row-by-twelve-column matrices. They might even get an answer back before the end of the term. (The amount of work needed for a determinant grows scary fast as the matrix gets bigger.) But all that work might be meaningless.
The trouble is that Cramer’s Rule is numerically unstable. Before I even explain what that is you already sense it’s a bad thing. Think of all the good things in your life you’ve heard described as unstable. Fair enough. But here’s what we mean by numerically unstable.
Is 1/3 equal to 0.3333333? No, and we know that. But is it close enough? Sure, most of the time. Suppose we need a third of sixty million. 0.3333333 times 60,000,000 equals 19,999,998. That’s a little off of the correct 20,000,000. But I bet you wouldn’t even notice the difference if nobody pointed it out to you. Even if you did notice it you might write off the difference. “If we must, make up the difference out of petty cash”, you might declare, as if that were quite sensible in the context.
And that’s so because this multiplication is numerically stable. Make a small error in either term and you get a proportional error in the result. A small mistake will — well, maybe it won’t stay small, necessarily. But it’ll not grow too fast too quickly.
So now you know intuitively what an unstable calculation is. This is one in which a small error doesn’t necessarily stay proportionally small. It might grow huge, arbitrarily huge, and in few calculations. So your answer might be computed just fine, but actually be meaningless.
This isn’t because of a flaw in the computer per se. That is, it’s working as designed. It’s just that we might need, effectively, infinitely many digits of precision for the result to be correct. You see where there may be problems achieving that.
Cramer’s Rule isn’t guaranteed to be nonsense, and that’s a relief. But it is vulnerable to this. You can set up problems that look harmless but which the computer can’t do. And that’s surely the worst of all worlds, since we wouldn’t bother calculating them numerically if it weren’t too hard to do by hand.
I don’t want to get too down on Cramer’s Rule. It’s not like the numerical instability hurts every problem you might use it on. And you can, at the cost of some more work, detect whether a particular set of equations will have instabilities. That requires a lot of calculation but if we have the computer to do the work fine. Let it. And a computer can limit its numerical instabilities if it can do symbolic manipulations. That is, if it can use the idea of “one-third” rather than 0.3333333. The software package Mathematica, for example, does symbolic manipulations very well. You can shed many numerical-instability problems, although you gain the problem of paying for a copy of Mathematica.
If you just care about, or just need, one of the variables then what the heck. Cramer’s Rule lets you solve for just one or just some of the variables. That seems like a niche application to me, but it is there.
And the Rule re-emerges in pure analysis, where numerical instability doesn’t matter. When we look to differential equations, for example, we often find solutions are combinations of several independent component functions. Bases, in fact. Testing whether we have found independent bases can be done through a thing called the Wronskian. That’s a way that Cramer’s Rule appears in differential equations.
Wikipedia also asserts the use of Cramer’s Rule in differential geometry. I believe that’s a true statement, and that it will be reflected in many mechanics problems. In these we can use our knowledge that, say, energy and angular momentum of a system are constant values to tell us something of how positions and velocities depend on each other. But I admit I’m not well-read in differential geometry. That’s something which has indeed caused me pain in my scholarly life. I don’t know whether differential geometers thank Cramer’s Rule for this insight or whether they’re just glad to have got all that out of the way. (See the above Wikipedia Editors quarrel.)
I admit for all this talk about Cramer’s Rule I haven’t said what it is. Not in enough detail to pass your high school algebra class. That’s all right. It’s easy to find. MathWorld has the rule in pretty simple form. Mathworld does forget to define what it means by the vectord. (It’s the vector with components d1, d2, et cetera.) But that’s enough technical detail. If you need to calculate something using it, you can probably look closer at the problem and see if you can do it another way instead. Or you’re in high school algebra and just have to slog through it. It’s all right. Eventually you can put x and y aside and do geometry.