My next entry for this A To Z was another request, this one from Jacob Kanev, who doesn’t seem to have a WordPress or other blog. (If I’m mistaken, please, let me know.) Kanev’s given me several requests, some of them quite challenging. Some too challenging: I have to step back from describing “both context sensitive and not” kinds of grammar just now. I hope all will forgive me if I just introduce the base idea.
One of the ideals humans hold when writing a mathematical proof is to crush all humanity from the proof. It’s nothing personal. It reflects a desire to be certain we have proved things without letting any unstated assumptions or unnoticed biases interfering. The 19th century was a lousy century for mathematicians and their intuitions. Many ideas that seemed clear enough turned out to be paradoxical. It’s natural to want to not make those mistakes again. We can succeed.
We can do this by stripping out everything but the essentials. We can even do away with words. After all, if I say something is a “square”, that suggests I mean what we mean by “square” in English. Our mathematics might not have proved all the square-ness of the thing. And so we reduce the universe to symbols. Letters will do as symbols, if we want to be kind to our typesetters. We do want to be kind now that, thanks to LaTeX, we do our own typesetting.
This is called building a “formal language”. The “formal” here means “relating to the form” rather than “the way you address people when you can’t just say `heya, gang’.” A formal language has two important components. One is the symbols that can be operated on. The other is the operations you can do on the symbols.
If we’ve set it all up correctly then we get something wonderful. We have “statements”. They’re strings of the various symbols. Some of the statements are axioms; they’re assumed to be true without proof. We can turn a statement into another one by using a statement we have and one of the operations. If the operation requires, we can add in something else we already know to be true. Something we’ve already proven.
Any statement we build this way — starting from an axiom and building with the valid operations — is a new and true statement. It’s a theorem. The proof of the theorem? It’s the full sequence of symbols and operations that we’ve built. The line between advanced mathematics and magic is blurred. To give a theorem its full name is to give its proof. (And now you understand why the biographies of many of the pioneering logicians of the late 19th and early 20th centuries include a period of fascination with the Kabbalah and other forms of occult or gnostic mysticism.)
A grammar is what’s required to describe a language like this. It’s defined to be a quartet of properties. The first property is the collection of symbols that can’t be the end of a statement. These are called nonterminal symbols. The second property is the collection of symbols that can end a statement. These are called terminal symbols. (You see why we want to have those as separate lists.) The third property is the collection of rules that let you build new statements from old. The fourth property is the collection of things we take to be true to start. We only have finitely many options for each of these, at least for your typical grammar. I imagine someone has experimented with infinite grammars. But that hasn’t got to be enough of a research field people have to pay attention to them. Not yet, anyway.
Now it’s reasonable to ask if we need mathematicians at all. If building up theorems is just a matter of applying the finitely many rules of inference on finitely many collections of symbols, finitely many times over, then what about this can’t be done by computer? And done better by a computer, since a computer doesn’t need coffee, or bathroom breaks an hour later, or the hope of moving to a tenure-track position?
Well, we do need mathematicians. I don’t say that just because I hope someone will give me money in exchange for doing mathematics. It’s because setting up a computer to just grind out every possible theorem will never turn up what you want to know now. There are several reasons for this.
Here’s a way to see why. It’s drawn from Douglas Hofstadter’s Gödel, Escher, Bach, a copy of which you can find in any college dorm room or student organization office. At least you could back when I was an undergraduate. I don’t know what the kids today use.
Anyway, this scheme has three nonterminal symbols: I, M, and U. As a terminal symbol … oh, let’s just use the space at the end of a string. That way everything looks like words. We will include a couple variables, lowercase letters like x and y and z. They stand for any string of nonterminal symbols. They’re falsework. They help us get work done, but must not appear in our final result.
There’s four rules of inference. The first: if xI is valid, then so is xIM. The second: if Mx is valid, then so is Mxx. The third: if MxIIIy is valid, then so is MxUy. The fourth: if MxUUy is valid, then so is Mxy.
We have one axiom, assumed without proof to be true: MI.
So let’s putter around some. MI is true. So by the second rule, so is MII. That’s a theorem. And since MII is true, by the second rule again, so is MIIII. That’s another theorem. Since MIIII is true, by the first rule, so is MIIIIM. We’ve got another theorem already. Since MIIIIM is true, by the third rule, so is MIUM. We’ve got another theorem. For that matter, since MIIIIM is true, again by the third rule, so is MUIM. Would you like MIUMIUM? That’s waiting there to be proved too.
And that will do. First question: what does any of this even mean? Nobody cares about whether MIUMIUM is a theorem in this system. Nobody cares about figuring out whether MUIUMUIUI might be a theorem. We care about questions like “what’s the smallest odd perfect number?” or “how many equally-strong vortices can be placed in a ring without the system becoming unstable?” With everything reduced to symbol-shuffling like this we’re safe from accidentally assuming something which isn’t justified. But we’re pretty far from understanding what these theorems even mean.
In this case, these strings don’t mean anything. They’re a toy so we can get comfortable with the idea of building theorems this way. We don’t expect them to do any more work than we expect Lincoln Logs to build usable housing. But you can see how we’re starting pretty far from most interesting mathematics questions.
Still, if we started from a system that meant something, we would get there in time, right? … Surely? …
Well, maybe. The thing is, even with this I, M, U scheme and its four rules there are a lot of things to try out. From the first axiom, MI, we can produce either MII or MIM. From MII we can produce MIIM or MIIII. From MIIII we could produce MIIIIM, or MUI, or MIU, or MIIIIIIII. From each of those we can produce … quite a bit of stuff.
All of those are theorems in this scheme and that’s nice. But it’s a lot. Suppose we have set up symbols and axioms and rules that have clear interpretations that relate to something we care about. If we set the computer to produce every possible legitimate result we are going to produce an enormous number of results that we don’t care about. They’re not wrong, they’re just off-point. And there’s a lot more true things that are off-point than there are true things on-point. We need something with judgement to pick out results that have anything to do with what we want to know. And trying out combinations to see if we can produce the pattern we want is hard. Really hard.
And there’s worse. If we set up a formal language that matches real mathematics, then we need a lot of work to prove anything. Even simple statements can take forever. I seem to remember my logic professor needing 27 steps to work out the uncontroversial theorem “if x = y and y = z, then x = z”. (Granting he may have been taking the long way around for demonstration purposes.) We would have to look in theorems of unspeakably many symbols to find the good stuff.
Now it’s reasonable to ask what the point of all this is. Why create a scheme that lets us find everything that can be proved, only to have all we’re interested in buried in garbage?
There are some uses. To make us swear we’ve read Jorge Luis Borges, for one. Another is to study the theory of what we can prove. That is, what are we able to learn by logical deduction? And another is to design systems meant to let us solve particular kinds of problems. That approach makes the subject merge into computer science. Code for a computer is, in a sense, about how to change a string of data into another string of data. What are the legitimate data to start with? What are the rules by which to change the data? And these are the sorts of things grammars, and the study of grammars, are about.
8 thoughts on “A Leap Day 2016 Mathematics A To Z: Grammar”
A beautiful post, thank you; and very well explained. I remember our professor linking grammars and Turing machines with Church’s thesis, the fact that the brain is a deterministic machine, and Gödel’s theorem, to arrive at some pretty fundamental claims about perception and knowledge in general. Well, I guess every professor tries to sell their own subject as the most substantial of all. Although he was pretty successful with this one.
Btw, I do have a wordpress blog: https://jacobkanev.wordpress.com/
I’m happy to be of service and glad that you liked the essay as it turned out.
I’d agree with your professor in linking grammars to Turing machines and fundamental ideas about what knowledge we can have. Grammars are ways of describing what we can know about a system, and if we’re looking seriously into the subject that has to bring us to the decidability problems and the limits of knowledge. I’m less sure about perception, but I don’t know what case your professor made.
And I’m glad for the blog link; thank you.
Great post! I was a die-hard Gödel-Escher-Bach fan :-) That book made my decision to difficult to choose between physics, math, or computer science.