My Little 2021 Mathematics A-to-Z: Zorn’s Lemma

The joke to which I alluded last week was a quick pun. The setup is, “What is yellow and equivalent to the Axiom of Choice?” It’s the topic for this week, and the conclusion of the Little 2021 Mathematics A-to-Z. I again thank Mr Wu, of Singapore Maths Tuition, for a delightful topic.

Zorn’s Lemma

Max Zorn did not name it Zorn’s Lemma. You expected that. He thought of it just as a Maximal Principle when introducing it in a 1934 presentation and 1935 paper. The word “lemma” connotes that some theorem is a small thing. It usually means it’s used to prove some larger and more interesting theorem. Zorn’s Lemma is one of those small things. With the right background, a rigorous proof is a couple not-too-dense paragraphs. Without the right background? It’s one of those proofs you read the statement of and nod, agreeing, that sounds reasonable.

The lemma is about partially ordered sets. A set’s partially ordered if it has a relationship between pairs of items in it. You will sometimes see a partially ordered set called a “poset”, a term of mathematical art which make me smile too. If we don’t know anything about the ordering relationship we’ll use the ≤ symbol, just like this was ordinary numbers. To be partially ordered, whenever x ≤ y and y ≤ x, we know that x and y must be equal. And the converse: if x = y then x ≤ y and y ≤ x. What makes this partial is that we’re not guaranteed that every x and y relate in some way. It’s a totally ordered set if we’re guaranteed that at least one of x ≤ y and y ≤ x is always true. And then there is such a thing as a well-ordered set. This is a totally ordered set for which every subset (unless it’s empty) has a minimal element.

If we have a couple elements, each of which we can put in some order, then we can create a chain. If x ≤ y and y ≤ z, then we can write x ≤ y ≤ z and we have at least three things all relating to one another. This seems like stuff too basic to notice, if we think too literally about the relationship being “is less than or equal to”. If the relationship is, say, “divides wholly into”, then we get some interesting different chains. Like, 2 divides into 4, which divides into 8, which divides into 24. And 3 divides into 6 which divides into 24. But 2 doesn’t divide into 3, nor 3 into 2. 4 doesn’t divide into 6, nor 6 into either 8 or 4.

So what Zorn’s Lemma says is, if all the chains in a partially ordered set each have an upper bound, then, the partially ordered set has a maximal element. “Maximal element” here means an element that doesn’t have a bigger comparable element. (That is, m is maximal if there’s no other element b for which m ≤ b. It’s possible that m and b can’t be compared, though, the way 6 doesn’t divide 8 and 8 doesn’t divide 6.) This is a little different from a “maximum” . It’s possible for there to be several maximal elements. But if you parse this as “if you can always find a maximum in a string of elements, there’s some maximum element”? And remember there could be many maximums? Then you’re getting the point.

You may also ask how this could be interesting. Zorn’s Lemma is an existence proof. Most existence proofs assure us a thing we thought existed does, but don’t tell us how to find it. This is all right. We tend to rely on an existence proof when we want to talk about some mathematical item but don’t care about fussy things like what it is. It is much the way we might talk about “an odd perfect number N”. We can describe interesting things that follow from having such a number even before we know what value N has.

A classic example, the one you find in any discussion of using Zorn’s Lemma, is about the basis for a vector space. This is like deciding how to give directions to a point in space. But vector spaces include some quite abstract things. One vector space is “the set of all functions you can integrate”. Another is “matrices whose elements are all four-dimensional rotations”. There might be literally infinitely many “directions” to go. How do we know we can find a set of directions that work as well as, for guiding us around a city, the north-south-east-west compass rose does? So there’s the answer. There are other things done all the time, too. A nontrivial ring-with-identity, for example, has to have a maximal ideal. (An ideal is a subset of the ring that’s still a ring.) This is handy to know if you’re working with rings a lot.

The joke in my prologue was built on the claim Zorn’s Lemma is equivalent to the Axiom of Choice. The Axiom of Choice is a piece of set theory that surprised everyone by being independent of the Zermelo-Fraenkel axioms. The Axiom says that, if you have a collection of disjoint nonempty sets, then there must exist at least one set with exactly one element from each of those sets. That is, you can pick one thing out of each of a set of bins. It’s easy to see how this has in common with Zorn’s Lemma being too obvious to imagine proving. That’s the sort of thing that makes a good axiom. Thing about a lemma, though, is we do prove it. That’s how we know it’s a lemma. How can a lemma be equivalent to an axiom?

I’l argue by analogy. In Euclidean geometry one of the axioms is this annoying statement about on which side of a line two other lines that intersect it will meet. If you have this axiom, you can prove some nice results, like, the interior angles of a triangle add up to two right angles. If you decide you’d rather make your axiom that bit about the interior angles adding up? You can go from that to prove the thing about two lines crossing a third line.

So it is here. If you suppose the Axiom of Choice is true, you can get Zorn’s Lemma: you can pick an element in your set, find a chain for which that’s the minimum, and find your maximal element from that. If you make Zorn’s Lemma your axiom? You can use x ≤ y to mean “x is a less desirable element to pick out of this set than is y”. And then you can choose a maximal element out of your set. (It’s a bit more work than that, but it’s that kind of work.)

There’s another theorem, or principle, that’s (with reservations) equivalent to both Zorn’s Lemma and the Axiom of Choice. It’s another piece that seems so obvious it should defy proof. This is the well-ordering theorem, which says that every set can be well-ordered. That is, so that every non-empty subset has some minimum element. Finally, a mathematical excuse for why we have alphabetical order, even if there’s no clear reason that “j” should come after “i”.

(I said “with reservations” above. This is because whether these are equivalent depends on what, precisely, kind of deductive logic you’re using. If you are not using ordinary propositional logic, and are using a “second-order logic” instead, they differ.)

Ermst Zermelo introduced the Axiom of Choice to set theory so that he could prove this in a way that felt reasonable. I bet you can imagine how you’d go from “every non-empty set has a minimum element” right back to “you can always pick one element of every set”, though. And, maybe if you squint, can see how to get from “there’s always a minimum” to “there has to be a maximum”. I’m speaking casually here because proving it precisely is more work than we need to do.

I mentioned how Zorn did not name his lemma after himself. Mathematicians typically don’t name things for themselves. Nor did he even think of it as a lemma. His name seems to have adhered to the principle in the late 30s. Credit the nonexistent mathematician Bourbaki writing about “le théorème de Zorn”. By 1940 John Tukey, celebrated for the Fast Fourier Transform, wrote of “Zorn’s Lemma”. Tukey’s impression was that this is how people in Princeton spoke of it at the time. He seems to have been the first to put the words “Zorn’s Lemma” in print, though. Zorn isn’t the first to have stated this. Kazimierez Kuratowski, in 1922, described what is clearly Zorn’s Lemma in a different form. Zorn remembered being aware of Kuratowski’s publication but did not remember noticing the property. The Hausdorff Maximal Principle, of Felix Hausdorff, has much the same content. Zorn said he did not know about Hausdorff’s 1927 paper until decades later.

Zorn’s lemma, the Axiom of Choice, the well-ordering theorem, and Hausdorff’s Maximal Principle all date to the early 20th century. So do a handful of other ideas that turn out to be equivalent. This was an era when set theory saw an explosive development of new and powerful ideas. The point of describing this chain is to emphasize that great concepts often don’t have a unique presentation. Part of the development of mathematics is picking through several quite similar expressions of a concept. Which one do we enshrine as an axiom, or at least the canonical presentation of the idea?

We have to choose.

And with this I at last declare the hard work Little 2021 Mathematics A-to-Z at an end. I plan to follow up, as traditional, with a little essay about what I learned while doing this project. All of the Little 2021 Mathematics A-to-Z essays should be at this link. And then all of the A-to-Z essays from all eight projects should be at this link. Thank you so for your support in these difficult times.

What Are Equivalence Classes?

(A couple weeks ago I published a little lemma of an essay. This is the next lemma; if my prose style holds out this’ll all lead to something neat.)

If you have the idea of equivalence — that you can pick elements of a set out and say whether they share some property, and that the sharing of that property works in some of the ways that equality works — then you can create equivalence classes. This is the dividing of your original set up into smaller parts according to the rule that everything in one of those parts is equivalent to anything else in that same part.

Since I think about mathematics so much, the most familiar equivalence classes to my mind comes from the counting numbers, that familiar old group of 1, 2, 3, and so on. The equivalence relationship I’d like to use looks a little more alien; it’s “has the same remainder when divided by two as”. But every integer, divided by two, has a remainder of either zero or one, and it’s not hard to follow the divisions here: 4 divided by two has a remainder of zero; 5 divided by two has a remainder of one; 6 divided by two has a remainder of zero; 7 divided by two has a remainder of one; and so on. Using this equivalence relationship, 4 and 6 and 8 and for that matter 2 are in the same class. 3 and 5 and 7 and 9 and so on also share a class, though not the same one that 4 and 6 and 8 had. And, yeah, that’s just the even and the odd numbers, presented in a way that uses much more abstraction than you needed to learn odds and evens.

But we can do this dividing into classes for any set and for any equivalence relationship on that set: dividing a group of people up by those who have the same age; dividing clothes up by those which are the same color; dividing functions up by which ones share some interesting property; whatever you like. There aren’t necessarily just the two equivalence classes. For example, “has the same remainder when divided by four as” will split the counting numbers into four classes, and “is the same age as” will split a group of people up into from as few as one class to as many classes as there are people.

So, why split sets up into equivalence classes? Besides the giddy fun of doing it, the most useful reason I know is that it can often be easier to prove something about a whole set of things if you can break the problem up into smaller ones, proving something about a special case of things. If you need to test something that you know will be the same for all the elements in an equivalence class, you just have to pick one element from that class and test that; that can be a wonderful time-saver.

If you do pick one of the elements of your class that’s called a “class representative”, which is one of mathematics’s less exotic terms. If you’ve picked your representative — let me call it a — and want to talk about the equivalence class that contains it, then that’s normally written with braces around it: [a]. Everything in [a] is equivalent to a, by definition. For the example of odds and evens you could use 1 and 2 — the sets [1] and [2] — although that’s just because we tend to look at nice familiar small numbers when we can. We wouldn’t be doing anything wrong if we wrote the sets as [147] and [2038] instead. We’ve entered a realm without uniquely right answers, just answers that are more attractive because we think they’re easier to work with or we think they look nicer.

But when we write down [147] and [2038] we’ve exhausted the odd-and-even partitioning of the counting numbers. We’ve also written down the quotient set, the collection of all the equivalence classes. This is a set whose elements are themselves sets, which is something a little odd to encounter at first, but not anything too exotic.

Here’s a neat little equivalence relation and quotient set I’d like to toss out for folks to consider. The original set is all the real numbers — positive and negative, rational and irrational. The equivalence relation is “is a whole number different from” — so that, for example, 0, 1, 3, and 35 are all in one equivalence class; 0.5, -3.5, and 147.5 are all in another class together; π, π + 1, π – 8, and &pi + 35 are in yet another class. How many equivalence classes are there for this set and this relation, and, what might a quotient set for them look like?

What Is Equivalence?

(I’m putting this little post out because I want to do something more impressive, and I’ll need this lurking in the background. If it seems unmotivated, then, please treat it as a lemma of an essay.)

Most of us have a fairly decent idea of “equals”; at least, we’re fairly sure we know what it means to say “x is equal to y” and can draw from that other conclusions, such as, the controversial “y is equal to x”. Usually we get the idea early in learning arithmetic, and get used to it in working with numbers, and maybe stretch the idea (within the bounds of mathematics) to include things like two angles being equal or two shapes being equal.

Equivalence is a kind of generalizing of this equality idea: we’ll take a bit of what’s interesting about the idea of two things being equal, and use it in a new context. In this new context two things that might not be equal are still similar in some way that’s of interest for whatever we’re working on right now.

To write that “x is equal to y” efficiently we call on the equals sign and just put “x = y”. “x is equivalent to y” also begs for a shorthand notation, at least if you’re doing a lot of work with the idea, and the easiest way to type that is probably just to use a tilde: “x ~ y”, though I admit I prefer using a double tilde, “x ≈ y”, which isn’t too hard to do in HTML but is more work.

For two things to be equivalent you need to say equivalent with respect to what property. Sugar, sand, and salt are pretty much the same if all you’re interested in is how heaps of small-grained particles move; they’re not at all equivalent if you’re baking; and they’re only sort-of equivalent if you’re trying to melt sidewalk ice. You also need to say what set of things you’re drawing from; it’s very hard to answer whether sugar is equivalent to birds if you thought the discussion was about real numbers. Usually in practice the relationship — called the equivalence relation — carries with it an explicit statement of what the set of things is, unless it’s just blisteringly obvious from context.

To say that something is an equivalence relation means that it has to obey three rules, ones that look make it look a lot like ordinary old equality. The first is called reflexivity: any thing in the set is equivalent to itself. Any number equals itself; any article of clothing has the same color as itself; any person has the same gender as herself. Sounds like an unavoidably true property? Consider, for real numbers, the relationship “is less than”; there’s no number that is less than itself. “Is less than” can’t be an equivalence relationship.

The second is called symmetry: if one thing is equivalent to another, then, that other thing is equivalent to the first. If the number we’ve given the name “height” is equal to the number we’ve given the name “length”, so to does the number we’ve given the name “length” equal the number we’ve given the name “height”, and similarly good results can be found with shirts and people’s genders. For numbers, “Is less than” is ruled out right away; but the initially promising “Is less than or equal to”, which satisfies reflexivity, can flop on symmetry: 4 is less than or equal to 12, certainly, but not the other way around.

And the last is called transitivity: if one thing is equivalent to a second, and a second thing is equivalent to a third, then, the first thing has to be equivalent to the third. Ordinary old numbers being equal to one another are still transitive, and those shirts having the same colors work out too, and the same with people sharing a gender. Interestingly, both “Is less than” and “Is less than or equal to” are transitive, but since those fail on reflexivity or on transitivity they’re not equivalence relationships anyway.

There are a lot of equivalences out there, such as two geometric shapes being congruent, or for that matter just being similar (having the same shape but different sizes), or whole numbers having the same remainder when divided by, say, two (which is a fussy way of saying numbers are odd or are even), or two objects having the same temperature, or the like.