## From my Third A-to-Z: Zermelo-Fraenkel Axioms

The close of my End 2016 A-to-Z let me show off one of my favorite modes, that of amateur historian of mathematics who doesn’t check his primary references enough. So far as I know I don’t have any serious errors here, but then, how would I know? … But keep in mind that the full story is more complicated and more ambiguous than presented. (This is true of all histories.) That I could fit some personal history in was also a delight.

I don’t know why Thoralf Skolem’s name does not attach to the Zermelo-Fraenkel Axioms. Mathematical things are named with a shocking degree of arbitrariness. Skolem did well enough for himself.

gaurish gave me a choice for the Z-term to finish off the End 2016 A To Z. I appreciate it. I’m picking the more abstract thing because I’m not sure that I can explain zero briefly. The foundations of mathematics are a lot easier.

## Zermelo-Fraenkel Axioms

I remember the look on my father’s face when I asked if he’d tell me what he knew about sets. He misheard what I was asking about. When we had that straightened out my father admitted that he didn’t know anything particular. I thanked him and went off disappointed. In hindsight, I kind of understand why everyone treated me like that in middle school.

My father’s always quick to dismiss how much mathematics he knows, or could understand. It’s a common habit. But in this case he was probably right. I knew a bit about set theory as a kid because I came to mathematics late in the “New Math” wave. Sets were seen as fundamental to why mathematics worked without being so exotic that kids couldn’t understand them. Perhaps so; both my love and I delighted in what we got of set theory as kids. But if you grew up before that stuff was popular you probably had a vague, intuitive, and imprecise idea of what sets were. Mathematicians had only a vague, intuitive, and imprecise idea of what sets were through to the late 19th century.

And then came what mathematics majors hear of as the Crisis of Foundations. (Or a similar name, like Foundational Crisis. I suspect there are dialect differences here.) It reflected mathematics taking seriously one of its ideals: that everything in it could be deduced from clearly stated axioms and definitions using logically rigorous arguments. As often happens, taking one’s ideals seriously produces great turmoil and strife.

Before about 1900 we could get away with saying that a set was a bunch of things which all satisfied some description. That’s how I would describe it to a new acquaintance if I didn’t want to be treated like I was in middle school. The definition is fine if we don’t look at it too hard. “The set of all roots of this polynomial”. “The set of all rectangles with area 2”. “The set of all animals with four-fingered front paws”. “The set of all houses in Central New Jersey that are yellow”. That’s all fine.

And then if we try to be logically rigorous we get problems. We always did, though. They’re embodied by ancient jokes like the person from Crete who declared that all Cretans always lie; is the statement true? Or the slightly less ancient joke about the barber who shaves only the men who do not shave themselves; does he shave himself? If not jokes these should at least be puzzles faced in fairy-tale quests. Logicians dressed this up some. Bertrand Russell gave us the quite respectable “The set consisting of all sets which are not members of themselves”, and asked us to stare hard into that set. To this we have only one logical response, which is to shout, “Look at that big, distracting thing!” and run away. This satisfies the problem only for a while.

The while ended in — well, that took a while too. But between 1908 and the early 1920s Ernst Zermelo, Abraham Fraenkel, and Thoralf Skolem paused from arguing whose name would also be the best indie rock band name long enough to put set theory right. Their structure is known as Zermelo-Fraenkel Set Theory, or ZF. It gives us a reliable base for set theory that avoids any contradictions or catastrophic pitfalls. Or does so far as we have found in a century of work.

It’s built on a set of axioms, of course. Most of them are uncontroversial, things like declaring two sets are equal if they have the same elements. Declaring that the union of sets is itself a set. Obvious, sure, but it’s the obvious things that we have to make axioms. Maybe you could start an argument about whether we should just assume there exists some infinitely large set. But if we’re aware sets probably have something to teach us about numbers, and that numbers can get infinitely large, then it seems fair to suppose that there must be some infinitely large set. The axioms that aren’t simple obvious things like that are too useful to do without. They assume stuff like that no set is an element of itself. Or that every set has a “power set”, a new set comprising all the subsets of the original set. Good stuff to know.

There is one axiom that’s controversial. Not controversial the way Euclid’s Parallel Postulate was. That’s the ugly one about lines crossing another line meeting on the same side they make angles smaller than something something or other. That axiom was controversial because it read so weird, so needlessly complicated. (It isn’t; it’s exactly as complicated as it must be. Or for a more instructive view, it’s as simple as it could be and still be useful.) The controversial axiom of Zermelo-Fraenkel Set Theory is known as the Axiom of Choice. It says if we have a collection of mutually disjoint sets, each with at least one thing in them, then it’s possible to pick exactly one item from each of the sets.

It’s impossible to dispute this is what we have axioms for. It’s about something that feels like it should be obvious: we can always pick something from a set. How could this not be true?

If it is true, though, we get some unsavory conclusions. For example, it becomes possible to take a ball the size of an orange and slice it up. We slice using mathematical blades. They’re not halted by something as petty as the desire not to slice atoms down the middle. We can reassemble the pieces. Into two balls. And worse, it doesn’t require we do something like cut the orange into infinitely many pieces. We expect crazy things to happen when we let infinities get involved. No, though, we can do this cut-and-duplicate thing by cutting the orange into five pieces. When you hear that it’s hard to know whether to point to the big, distracting thing and run away. If we dump the Axiom of Choice we don’t have that problem. But can we do anything useful without the ability to make a choice like that?

And we’ve learned that we can. If we want to use the Zermelo-Fraenkel Set Theory with the Axiom of Choice we say we’re working in “ZFC”, Zermelo-Fraenkel-with-Choice. We don’t have to. If we don’t want to make any assumption about choices we say we’re working in “ZF”. Which to use depends on what one wants to do.

Either way Zermelo and Fraenkel and Skolem established set theory on the foundation we use to this day. We’re not required to use them, no; there’s a construction called von Neumann-Bernays-Gödel Set Theory that’s supposed to be more elegant. They didn’t mention it in my logic classes that I remember, though.

And still there’s important stuff we would like to know which even ZFC can’t answer. The most famous of these is the continuum hypothesis. Everyone knows — excuse me. That’s wrong. Everyone who would be reading a pop mathematics blog knows there are different-sized infinitely-large sets. And knows that the set of integers is smaller than the set of real numbers. The question is: is there a set bigger than the integers yet smaller than the real numbers? The Continuum Hypothesis says there is not.

Zermelo-Fraenkel Set Theory, even though it’s all about the properties of sets, can’t tell us if the Continuum Hypothesis is true. But that’s all right; it can’t tell us if it’s false, either. Whether the Continuum Hypothesis is true or false stands independent of the rest of the theory. We can assume whichever state is more useful for our work.

Back to the ideals of mathematics. One question that produced the Crisis of Foundations was consistency. How do we know our axioms don’t contain a contradiction? It’s hard to say. Typically a set of axioms we can prove consistent is also a set too boring to do anything useful in. Zermelo-Fraenkel Set Theory, with or without the Axiom of Choice, has a lot of interesting results. Do we know the axioms are consistent?

No, not yet. We know some of the axioms are mutually consistent, at least. And we have some results which, if true, would prove the axioms to be consistent. We don’t know if they’re true. Mathematicians are generally confident that these axioms are consistent. Mostly on the grounds that if there were a problem something would have turned up by now. It’s withstood all the obvious faults. But the universe is vaster than we imagine. We could be wrong.

It’s hard to live up to our ideals. After a generation of valiant struggling we settle into hoping we’re doing good enough. And waiting for some brilliant mind that can get us a bit closer to what we ought to be.

## From my Second A-to-Z: Z-score

When I first published this I mentioned not knowing why ‘z’ got picked as a variable name. Any letter besides ‘x’ would make sense. As happens when I toss this sort of question out, I haven’t learned anything about why ‘z’ and not, oh, ‘y’ or ‘t’ or even ‘d’. My best guess is that we don’t want to confuse references to the original data with references to the transformed. And while you can write a ‘z’ so badly it looks like an ‘x’, it’s much easier to write a ‘y’ that looks like an ‘x’. I don’t know whether the Preliminary SAT is still a thing.

And we come to the last of the Leap Day 2016 Mathematics A To Z series! Z is a richer letter than x or y, but it’s still not so rich as you might expect. This is why I’m using a term that everybody figured I’d use the last time around, when I went with z-transforms instead.

## Z-Score

You get an exam back. You get an 83. Did you do well?

Hard to say. It depends on so much. If you expected to barely pass and maybe get as high as a 70, then you’ve done well. If you took the Preliminary SAT, with a composite score that ranges from 60 to 240, an 83 is catastrophic. If the instructor gave an easy test, you maybe scored right in the middle of the pack. If the instructor sees tests as a way to weed out the undeserving, you maybe had the best score in the class. It’s impossible to say whether you did well without context.

The z-score is a way to provide that context. It draws that context by comparing a single score to all the other values. And underlying that comparison is the assumption that whatever it is we’re measuring fits a pattern. Usually it does. The pattern we suppose stuff we measure will fit is the Normal Distribution. Sometimes it’s called the Standard Distribution. Sometimes it’s called the Standard Normal Distribution, so that you know we mean business. Sometimes it’s called the Gaussian Distribution. I wouldn’t rule out someone writing the Gaussian Normal Distribution. It’s also called the bell curve distribution. As the names suggest by throwing around “normal” and “standard” so much, it shows up everywhere.

A normal distribution means that whatever it is we’re measuring follows some rules. One is that there’s a well-defined arithmetic mean of all the possible results. And that arithmetic mean is the most common value to turn up. That’s called the mode. This arithmetic mean, and mode, is also the median value. There are as many data points less than it as there are greater than it. Most of the data values are pretty close to the mean/mode/median value. There are some more as you get farther from this mean. But the number of data values far away from it is pretty tiny. You can, in principle, get a value that’s way far away from the mean, but it’s unlikely.

We call this standard because it might as well be. Measure anything that varies at all. Draw a chart with the horizontal axis all the values you could measure. The vertical axis is how many times each of those values comes up. It’ll be a standard distribution uncannily often. The standard distribution appears when the thing we measure satisfies some quite common conditions. Almost everything satisfies them, or nearly satisfies them. So we see bell curves so often when we plot how frequently data points come up. It’s easy to forget that not everything is a bell curve.

The normal distribution has a mean, and median, and mode, of 0. It’s tidy that way. And it has a standard deviation of exactly 1. The standard deviation is a way of measuring how spread out the bell curve is. About 95 percent of all observed results are less than two standard deviations away from the mean. About 99.7 percent of all observed results are less than three standard deviations away. All but about two in a billion are less than six standard deviations away. That last might sound familiar to those who’ve worked in manufacturing. At least it does once you know that the Greek letter sigma is the common shorthand for a standard deviation. “Six Sigma” is a quality-control approach. It’s meant to make sure one understands all the factors that influence a product and controls them. This is so the product falls outside the design specifications only about 0.0003 percent of the time, or 3.4 times in a million. (That famous figure builds in an assumption that the process average drifts by 1.5 standard deviations over time. So it’s really the chance of landing more than 4.5 standard deviations out on one side.)
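These percentages can be checked directly. A minimal sketch in Python, assuming only the standard library: for a standard normal variable the fraction lying within k standard deviations of the mean is erf(k/√2), which `math.erf` supplies. The 1.5-standard-deviation drift used for the Six Sigma figure is that field’s conventional assumption, not something derived here.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


def one_sided_tail(k):
    """Fraction lying more than k standard deviations above the mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))


for k in (1, 2, 3, 6):
    print(k, f"{100 * fraction_within(k):.7f}%")

# Six Sigma's "3.4 per million": six sigma minus an assumed 1.5-sigma drift, one side only.
print(f"{one_sided_tail(6 - 1.5) * 1e6:.1f} per million")
```

The printed fractions come out near 68.3, 95.4, and 99.73 percent, and then within about two billionths of 100 percent; the last line prints roughly 3.4 per million. Using `erfc` for the tail avoids subtracting two nearly equal numbers.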

This is the normal distribution. It has a standard deviation of 1 and a mean of 0, by definition. And then people using statistics go and muddle the definition. It is always so, with the stuff people actually use. Forgive them. It doesn’t really change the shape of the curve if we scale it, so that the standard deviation is, say, two, or ten, or π, or any positive number. It just changes where the tick marks are on the x-axis of our plot. And it doesn’t really change the shape of the curve if we translate it, adding (or subtracting) some number to it. That makes the mean, oh, 80. Or -15. Or eπ. Or some other number. That just changes what value we write underneath the tick marks on the plot’s x-axis. We can find a scaling and translation of the normal distribution that fits whatever data we’re observing.

When we find the z-score for a particular data point we’re undoing this translation and scaling. We figure out what number on the standard distribution maps onto the original data set’s value. In symbols, if our data has mean $\mu$ and standard deviation $\sigma$, then a value $x$ has the z-score $z = \frac{x - \mu}{\sigma}$. About two-thirds of all data points are going to have z-scores between -1 and 1. About nineteen out of twenty will have z-scores between -2 and 2. About 997 out of 1,000 will have z-scores between -3 and 3. If we don’t see this, and we have a lot of data points, then that suggests our data isn’t normally distributed.
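Here is a rough sketch of that check, in Python with only the standard library. The data are simulated, made-up exam-like scores drawn from a normal distribution with mean 74 and standard deviation 5; the seed and sample size are arbitrary choices for illustration. The code subtracts the mean, divides by the standard deviation, and counts how many z-scores land in each band.

```python
import random
import statistics


def z_scores(data):
    """Undo the translation and scaling: z = (x - mean) / standard deviation."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return [(x - mean) / sd for x in data]


def fraction_in_band(zs, k):
    """Fraction of z-scores strictly between -k and k."""
    return sum(1 for z in zs if -k < z < k) / len(zs)


random.seed(2016)
# Simulated, normally distributed data (illustrative numbers, not real exam results).
data = [random.gauss(74, 5) for _ in range(50_000)]
zs = z_scores(data)
for k in (1, 2, 3):
    print(k, round(fraction_in_band(zs, k), 3))
```

With normally distributed data the three fractions should sit close to two-thirds, nineteen-twentieths, and 997 in 1,000. Feeding in data that is badly skewed or lumpy would pull those fractions away from the normal values.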

I don’t know why the letter ‘z’ is used for this instead of, say, ‘y’ or ‘w’ or something else. ‘x’ is out, I imagine, because we use that for the original data. And ‘y’ is a natural pick for a second measured variable. ‘z’, I expect, is just far enough from ‘x’ that it isn’t needed for some more urgent duty, while being close enough to ‘x’ to suggest it’s some measured thing.

The z-score gives us a way to compare how interesting or unusual scores are. If the exam on which we got an 83 has a mean of, say, 74, and a standard deviation of 5, then we can say this 83 is a pretty solid score. If it has a mean of 78 and a standard deviation of 10, then the score is better-than-average but not exceptional. If the exam has a mean of 70 and a standard deviation of 4, then the score is fantastic. We get to meaningfully compare scores from the measurements of different things. And so it’s one of the tools with which statisticians build their work.
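The three exam comparisons work out like this, as a small Python sketch. The `z_score` helper is just the subtract-the-mean, divide-by-the-standard-deviation rule; the means and standard deviations are the hypothetical ones from the paragraph above.

```python
def z_score(score, mean, sd):
    """How many standard deviations a score sits above (+) or below (-) the mean."""
    return (score - mean) / sd


# (mean, standard deviation) for the three hypothetical exams
exams = [(74, 5), (78, 10), (70, 4)]
for mean, sd in exams:
    print(f"mean {mean}, sd {sd}: z = {z_score(83, mean, sd):.2f}")
# mean 74, sd 5: z = 1.80
# mean 78, sd 10: z = 0.50
# mean 70, sd 4: z = 3.25
```

So the same 83 is nearly two standard deviations above the first class, only half a standard deviation above the second, and more than three above the third, beyond where all but a few in a thousand scores fall.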

## From my First A-to-Z: Z-transform

Back in the day I taught in a Computational Science department, which threw me out to exciting and new-to-me subjects more than once. One quite fun semester I was learning, and teaching, signal processing. This set me up for the triumphant conclusion of my first A-to-Z.

One of the things you can see in my style is mentioning the connotations implied by whether one uses x or z as a variable. Any letter will do, for the use it’s put to. But to use the name ‘z’ suggests an openness to something that ‘x’ doesn’t.

There’s a mention here about stability in algorithms, and the note that we can process data in ways that are stable or are unstable. I don’t mention why one would want or not want stability. Wanting stability hardly seems to need explaining; isn’t that the good option? And, often, yes, we want stable systems because they correct and wipe away error. But there are reasons we might want instability, or at least less stability. Too stable a system will obscure weak trends, or the starts of trends. Your weight flutters day by day in ways that don’t mean much, which is why it’s better to consider a seven-day average. If you took instead a 700-day running average, these meaningless fluctuations would be invisible. But you also would take a year or more to notice whether you were losing or gaining weight. That’s one of the things stability costs.

## z-transform.

The z-transform comes to us from signal processing. The signal we take to be a sequence of numbers, all representing something sampled at uniformly spaced times. The temperature at noon. The power being used, second-by-second. The number of customers in the store, once a month. Anything. The sequence of numbers we take to stretch back into the infinitely great past, and to stretch forward into the infinitely distant future. If it doesn’t, then we pad the sequence with zeroes, or some other safe number that we know means “nothing”. (That’s another classic mathematician’s trick.)

It’s convenient to have a name for this sequence. “a” is a good one. The different sampled values are denoted by an index. $a_0$ represents whatever value we have at the “start” of the sample. That might represent the present. That might represent where sampling began. That might represent just some convenient reference point. It’s the equivalent of mileage marker zero; we have to have something be the start.

$a_1$, $a_2$, $a_3$, and so on are the first, second, third, and so on samples after the reference start. $a_{-1}$, $a_{-2}$, $a_{-3}$, and so on are the first, second, third, and so on samples from before the reference start. That might be the last couple of values before the present.

So for example, suppose the temperatures the last several days were 77, 81, 84, 81, 78. Then we would probably represent this as $a_{-4} = 77$, $a_{-3} = 81$, $a_{-2} = 84$, $a_{-1} = 81$, $a_0 = 78$. We’ll hope this is Fahrenheit or that we are remotely sensing a temperature.

The z-transform of a sequence of numbers is something that looks a lot like a polynomial, based on these numbers. For this five-day temperature sequence the z-transform would be the polynomial $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0$. ($z^1$ is the same as $z$. $z^0$ is the same as the number “1”. I wrote it this way to make the pattern more clear.)

I would not be surprised if you protested that this doesn’t merely look like a polynomial but actually is one. You’re right, of course, for this set, where all our samples are from negative (and zero) indices. If we had positive indices then we’d lose the right to call the transform a polynomial. Suppose we trust our weather forecaster completely, and add in $a_1 = 83$ and $a_2 = 76$. Then the z-transform for this set of data would be $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$. You’d probably agree that’s not a polynomial, although it looks a lot like one.

The use of z for these polynomials is basically arbitrary. The main reason to use z instead of x is that we can learn interesting things if we imagine letting z be a complex-valued number. And z carries connotations of “a possibly complex-valued number”, especially if it’s used in ways that suggest we aren’t looking at coordinates in space. It’s not that there’s anything in the symbol x that refuses the possibility of it being complex-valued. It’s just that z appears so often in the study of complex-valued numbers that it reminds a mathematician to think of them.

A sound question you might have is: why do this? And there’s not much advantage in going from a list of temperatures “77, 81, 84, 81, 78, 83, 76” over to a polynomial-like expression $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$.

Where this starts to get useful is when we have an infinitely long sequence of numbers to work with. Yes, it does too. It will often turn out that an interesting sequence transforms into a polynomial-like series that itself is equivalent to some easy-to-work-with function. My little temperature example there won’t do it, no. But consider the sequence that’s zero for all negative indices, and 1 for the zero index and all positive indices. This gives us the polynomial-like structure $\cdots + 0z^2 + 0z^1 + 1 + 1\left(\frac{1}{z}\right)^1 + 1\left(\frac{1}{z}\right)^2 + 1\left(\frac{1}{z}\right)^3 + 1\left(\frac{1}{z}\right)^4 + \cdots$. And that turns out to be the same as $1 \div \left(1 - \left(\frac{1}{z}\right)\right)$, whenever $|z| > 1$ so that the series converges. That’s much shorter to write down, at least.
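One way to convince yourself of that closed form is to add up a lot of terms and compare. A short Python sketch; `partial_sum` and `closed_form` are names made up for this illustration, and the test values of z are arbitrary, including one complex number.

```python
def partial_sum(z, terms):
    """Sum of (1/z)**n for n = 0 .. terms-1: the 'switch on' sequence's transform, truncated."""
    return sum((1 / z) ** n for n in range(terms))


def closed_form(z):
    """1 / (1 - 1/z); equal to the whole infinite series only when |z| > 1."""
    return 1 / (1 - 1 / z)


for z in (2, -3, 1.5 + 1.5j):
    print(z, partial_sum(z, 60), closed_form(z))

# Inside the unit circle the series runs away while the formula still returns a number.
print(partial_sum(0.5, 60), closed_form(0.5))
```

For each z with $|z| > 1$ the sixty-term sum agrees with the closed form to many decimal places. At z = 0.5 the partial sums pass a quintillion while the formula cheerfully reports -1, which is why the condition on $|z|$ matters.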

Probably you’ll grant that, but still wonder what the point of doing that is. Remember that we started by thinking of signal processing. A processed signal is a matter of transforming your initial signal. By this we mean multiplying your original signal by something, or adding something to it. For example, suppose we want a five-day running average temperature. This we can find by taking one-fifth today’s temperature, $a_0$, and adding to that one-fifth of yesterday’s temperature, $a_{-1}$, and one-fifth of the day before’s temperature $a_{-2}$, and one-fifth $a_{-3}$, and one-fifth $a_{-4}$.
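A sketch of how that running average connects to the z-transform, in Python with exact fractions. It uses the convention from the temperature example, where sample $a_n$ becomes the coefficient of $z^{-n}$, and the sequence 77, 81, 84, 81, 78 from the transform written out above. The averager’s weights are themselves a short sequence with their own z-transform; multiplying the two transforms, as Laurent polynomials, turns out to give exactly the averaged signal. The helper names are hypothetical, chosen for this illustration.

```python
from collections import defaultdict
from fractions import Fraction


def z_transform(seq):
    # Sample a_n becomes the coefficient of z**(-n): a_{-4} is the z**4 term, a_2 the z**-2 term.
    return {-n: value for n, value in seq.items()}


def inverse_z_transform(poly):
    # Reverse the bookkeeping: the coefficient of z**p is the sample at index -p.
    return {-p: coeff for p, coeff in poly.items()}


def multiply(p, q):
    # Multiply two Laurent polynomials stored as {power of z: coefficient}.
    product = defaultdict(Fraction)
    for i, a in p.items():
        for j, b in q.items():
            product[i + j] += a * b
    return dict(product)


def running_average_direct(seq, width=5):
    # y_n = (a_n + a_{n-1} + ... + a_{n-width+1}) / width, with missing samples taken as 0.
    first, last = min(seq), max(seq) + width - 1
    return {
        n: sum(Fraction(seq.get(n - k, 0)) for k in range(width)) / width
        for n in range(first, last + 1)
    }


temps = {-4: 77, -3: 81, -2: 84, -1: 81, 0: 78}
averager = {k: Fraction(1, 5) for k in range(5)}  # weights h_0 .. h_4, one-fifth each

via_transform = inverse_z_transform(multiply(z_transform(averager), z_transform(temps)))
direct = running_average_direct(temps)

print(via_transform[0])          # 401/5, the five-day average at day 0 (80.2 degrees)
print(via_transform == direct)   # True
```

The design choice worth noticing is that `multiply` never knows it is computing an average. Multiplying transforms is the same bookkeeping as the sum-of-shifted-copies the running average does, which is why studying the averager’s transform tells us about the processing.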

The effect of processing a signal is equivalent to manipulating its z-transform. By studying properties of the z-transform, such as where its values are zero or where they are imaginary or where they are undefined, we learn things about what the processing is like. We can tell whether the processing is stable — does it keep a small error in the original signal small, or does it magnify it? Does it serve to amplify parts of the signal and not others? Does it dampen unwanted parts of the signal while keeping the main intact?
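To make “stable” concrete, here is a sketch using a simple recursive smoother of my own choosing, not one from this essay: $y_n = a_n + r y_{n-1}$. Its z-transform works out to $1 \div \left(1 - \frac{r}{z}\right)$, which is undefined at $z = r$. The Python below feeds in a single unit error and watches whether it dies away or grows, next to the test of whether that undefined point lies inside the unit circle.

```python
def impulse_response(r, steps):
    """Feed a single unit error into y_n = a_n + r * y_{n-1} and record how it propagates."""
    y_prev = 0.0
    response = []
    for n in range(steps):
        a_n = 1.0 if n == 0 else 0.0   # the lone error, at n = 0
        y = a_n + r * y_prev
        response.append(y)
        y_prev = y
    return response


def is_stable(r):
    # The transform 1 / (1 - r / z) is undefined (has a pole) at z = r.
    # For this recursion, small errors stay small exactly when |r| < 1.
    return abs(r) < 1


for r in (0.5, 2.0):
    print(r, is_stable(r), impulse_response(r, 6))
```

With r = 0.5 the error halves at every step; with r = 2.0 it doubles. The transform predicts which will happen without running the recursion at all.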

We can understand how data will be changed by understanding the z-transform of the way we manipulate it. That z-transform turns a signal-processing idea into a complex-valued function. And we have a lot of tools for studying complex-valued functions. So we become able to say a lot about the processing. And that is what the z-transform gets us.

## A Moment Which Turns Out to Be Universal

I was reading a bit farther in Charles Coulson Gillispie’s Pierre-Simon Laplace, 1749 – 1827, A Life In Exact Science and reached this paragraph, too good not to share:

Wishing to study [ Méchanique céleste ] in advance, [ Jean-Baptiste ] Biot offered to read proof. When he returned the sheets, he would often ask Laplace to explain some of the many steps that had been skipped over with the famous phrase, “it is easy to see”. Sometimes, Biot said, Laplace himself would not remember how he had worked something out and would have difficulty reconstructing it.

So, it’s not just you and your instructors.

(Gillispie wrote the book along with Robert Fox and Ivor Grattan-Guinness.)

## How All Of 2021 Treated My Mathematics Blog

Oh, you know, how did 2021 treat anybody? I always do one of these surveys for the end of each month. It’s only fair to do one for the end of the year also.

2021 was my tenth full year blogging around here. I might have made more of that if the actual anniversary in late September hadn’t coincided with a lot of personal hardships. 2021 was a quiet year around these parts with only 94 things posted. That’s the fewest of any full year. (I posted only 41 things in 2011, but I only started posting at all in late September of that year.) That seems not to have done my readership any harm. There were 28,832 pages viewed in 2021, up from 24,474 in 2020 and a fair bit above the 24,662 given in my previously best-viewed year of 2019. Eleven data points (the partial year 2011, and the full years 2012 through 2021) aren’t many, so there’s no real drawing patterns here. But it does seem like I have a year of sharp increases and then a year of slight declines in page views. I suppose we’ll check in in 2023 and see if that pattern holds.

One thing not declining? The number of unique visitors. WordPress recorded 20,339 unique visitors in 2021, a comfortable bit above 2020’s 16,870 and 2019’s 16,718. So far I haven’t seen a year-over-year decline in unique visitors. That’s gratifying.

Less gratifying: the number of likes continues its decline. It hasn’t increased, around here, since 2015, when a seemingly impossible 3,273 likes were given by readers. In 2021 there were only 481 likes, the fewest since 2013. The dropping-off of likes has so resembled a Poisson distribution that I’m tempted to see whether it actually fits that.

The number of comments dropped a slight bit. There were 188 given around here in 2021, but that’s only ten fewer than were given in 2020. It’s seven more than were given in 2019, so if there’s any pattern there I don’t know it.

WordPress lists 483 posts around here as having gotten four or more page views in the year. It won’t tell me everything that got even a single view, though. I’m not willing to do the work of stitching together the monthly page view data to learn everything that was of interest however passing. I’ll settle for knowing what was most popular. And what were my most popular posts of the year mercifully ended? These posts from 2021 got more views than all the others:

There were 143 countries, or country-like entities, sending me any page views in 2021. I don’t know how that compares to earlier years. But here’s the roster of where page views came from:

United States 13,723
Philippines 3,994
India 2,507
United Kingdom 865
Australia 659
Germany 442
Brazil 347
South Africa 296
European Union 273
Sweden 230
Singapore 210
Italy 204
Austria 178
France 143
Finland 141
Malaysia 135
South Korea 135
Hong Kong SAR China 132
Ireland 131
Netherlands 117
Turkey 117
Spain 107
Pakistan 105
Thailand 102
Mexico 101
United Arab Emirates 100
Indonesia 97
Switzerland 95
Norway 87
New Zealand 86
Belgium 76
Nigeria 76
Russia 74
Japan 64
Taiwan 62
Poland 55
Greece 54
Denmark 52
Colombia 51
Israel 49
Ghana 46
Portugal 44
Czech Republic 40
Vietnam 38
Saudi Arabia 33
Argentina 30
Lebanon 30
Nepal 28
Egypt 25
Kuwait 23
Serbia 22
Chile 21
Croatia 21
Jamaica 20
Peru 20
Tanzania 20
Costa Rica 19
Romania 17
Sri Lanka 16
Ukraine 15
Hungary 13
Jordan 13
Bulgaria 12
China 12
Albania 11
Bahrain 11
Morocco 11
Estonia 10
Qatar 10
Slovakia 10
Cyprus 9
Kenya 9
Zimbabwe 9
Algeria 8
Oman 8
Belarus 7
Georgia 7
Honduras 7
Lithuania 7
Puerto Rico 7
Venezuela 7
Bosnia & Herzegovina 6
Ethiopia 6
Iraq 6
Belize 5
Bhutan 5
Moldova 5
Uruguay 5
Dominican Republic 4
Guam 4
Kazakhstan 4
Macedonia 4
Mauritius 4
Zambia 4
Åland Islands 3
Antigua & Barbuda 3
Bahamas 3
Cambodia 3
Gambia 3
Guatemala 3
Slovenia 3
Suriname 3
American Samoa 2
Azerbaijan 2
Bolivia 2
Cameroon 2
Guernsey 2
Malta 2
Papua New Guinea 2
Réunion 2
Rwanda 2
Sudan 2
Uganda 2
Afghanistan 1
Andorra 1
Armenia 1
Fiji 1
Iceland 1
Isle of Man 1
Latvia 1
Liberia 1
Liechtenstein 1
Luxembourg 1
Maldives 1
Marshall Islands 1
Mongolia 1
Myanmar (Burma) 1
Namibia 1
Palestinian Territories 1
Panama 1
Paraguay 1
Senegal 1
St. Lucia 1
Togo 1
Tunisia 1
Vatican City 1

I don’t know that I’ve gotten a reader from Vatican City before. I hope it’s not about the essay figuring what dates are most and least likely for Easter. I’d expect them to know that already.

My plan is to spend a bit more time republishing posts from old A-to-Z’s. And then I hope to finish off the Little 2021 Mathematics A-to-Z, late and battered but still carrying on. I intend to post something at least once a week after that, although I don’t have a clear idea what that will be. Perhaps I’ll finally work out the algorithm for Compute!’s New Automatic Proofreader. Perhaps I’ll fill in with A-to-Z style essays for topics I had skipped before. Or I might get back to reading the comics for their mathematics topics. I’m open to suggestions.

## Some Progress on the Infinitude of Monkeys

I have been reading Pierre-Simon Laplace, 1749 – 1827, A Life In Exact Science, by Charles Coulson Gillispie with Robert Fox and Ivor Grattan-Guinness. It’s less of a biography than I expected and more a discussion of Laplace’s considerable body of work. Part of Laplace’s work was in giving probability a logically coherent, rigorous meaning. Laplace discusses the gambler’s fallacy and the tendency to assign causes to random events. That, for example, if we came across letters from a printer’s font reading out ‘INFINITESIMAL’ we would think that deliberate. We wouldn’t think that for a string of letters in no recognized language. And that brings up this neat quote from Gillispie:

The example may in all probability be adapted from the chapter in the Port-Royal La Logique (1662) on judgement of future events, where Arnauld points out that it would be stupid to bet twenty sous against ten thousand livres that a child playing with printer’s type would arrange the letters to compose the first twenty lines of Virgil’s Aeneid.

The reference here is to a book by Antoine Arnauld and Pierre Nicole that I haven’t read or heard of before. But it makes a neat forerunner to the Infinite Monkey Theorem. That’s the study of what probability means when put to infinitely great or long processes. Émile Borel’s use of monkeys at a typewriter echoes this idea of children playing beyond their understanding. I don’t know whether Borel knew of Arnauld and Nicole’s example. But I did not want my readers to miss a neat bit of infinite-monkey trivia. Or to miss today’s Bizarro, offering yet another comic on the subject.

## From my Seventh A-to-Z: Big-O and Little-O Notation

I toss off a mention in this essay of its book publication. By the time it appeared I was thinking about whether I could assemble these A-to-Z’s, or a set of them, into a book. I haven’t had the energy to put that together but it still seems viable.

Mr Wu, author of the Singapore Maths Tuition blog, asked me to explain a technical term today. I thought that would be a fun, quick essay. I don’t learn very fast, do I?

A note on style. I make reference here to “Big-O” and “Little-O”, capitalizing and hyphenating them. This is to give them visual presence as a name. In casual discussion they’re just read, or said, as the two words or word-and-a-letter. Often the Big- or Little- gets dropped and we just talk about O. An O, without further context, in my experience means Big-O.

The part of me that wants smooth consistency in prose urges me to write “Little-o”, as the thing described is represented with a lowercase ‘o’. But Little-o sounds like a midway game or an Eyerly Aircraft Company amusement park ride. And I never achieve consistency in my prose anyway. Maybe for the book publication. Until I’m convinced another is better, though, “Little-O” it is.

## Big-O and Little-O Notation.

When I first went to college I had a campus post office box. I knew my box number. I also knew the length of the sluggish line for the combination lock code. The lock was a dial, lettered A through J. Being a young STEM-class idiot I thought, boy, would it actually be quicker to pick the lock than wait for the line? A three-letter combination, of ten options? That’s 1,000 possibilities. If I could try five a minute that’s, at worst, three hours 20 minutes. Combination might be anywhere in that set; I might get lucky. On average I could expect to spend about 100 minutes picking my lock.

I decided to wait in line instead, and good that I did. I was unaware lock settings might not be a letter, like ‘A’. It could be the midway point between adjacent letters, like ‘AB’. That meant there were eight times as many combinations as I estimated, and I could expect to spend over ten hours. Even the slow line was faster than that. It transpired that my combination had two of these midway letters.

But that’s a little demonstration of algorithmic complexity. It’s also a demonstration of cracking passwords by trial-and-error. Doubling the number of settings for each letter octuples the time it takes to break the lock. Making the combination longer would also work; each extra letter would multiply the cracking time by twenty. So you understand why your password should include “special characters” like punctuation, but most of all should be long.
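Here is a minimal Python sketch of the lock arithmetic, assuming, as above, five tries a minute and that every combination is equally likely. The function name `lock_search_times` is made up for illustration.

```python
def lock_search_times(settings_per_dial: int, dials: int = 3, tries_per_minute: float = 5.0):
    """Worst-case and expected minutes to brute-force a dial combination lock.

    Assumes every combination is equally likely and no combination is tried twice.
    """
    combinations = settings_per_dial ** dials
    worst = combinations / tries_per_minute
    # On average the right code turns up after (N + 1) / 2 tries.
    expected = (combinations + 1) / 2 / tries_per_minute
    return combinations, worst, expected


for settings in (10, 20):  # letters only, then letters plus the midway settings
    n, worst, expected = lock_search_times(settings)
    print(f"{settings} settings: {n} codes, worst {worst:.0f} min, expected {expected:.1f} min")
```

Going from 10 to 20 settings multiplies the count of codes by eight, while adding a fourth dial (`dials=4`) multiplies it by the number of settings.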

We’re often interested in how long to expect a task to take. Sometimes we’re interested in the typical time it takes. Often we’re interested in the longest it could ever take. If we have a deterministic algorithm, we can say. We can count how many steps it takes. Sometimes this is easy. If we want to add two two-digit numbers together we know: it will be, at most, three single-digit additions plus, maybe, writing down a carry. (To add 98 and 37 is adding 8 + 7 to get 15, to add 9 + 3 to get 12, and to take the carry from the 15, so, 1 + 12 to get 13, so we have 135.) We can get a good quarrel going about what “a single step” is. We can argue whether that carry into the hundreds column is really one more addition. But we can agree that there is some smallest bit of arithmetic work, and proceed from that.
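A minimal Python sketch of that schoolbook addition, counting single-digit additions the way the 98 + 37 example does: one per column, plus one more whenever a carry comes in. What counts as a step is my assumption; as noted, people can quarrel about it. The name `add_with_step_count` is hypothetical.

```python
def add_with_step_count(a: str, b: str):
    """Add two non-negative decimal numbers given as digit strings.

    Returns (sum as a string, number of single-digit additions performed).
    A carry into a column counts as one extra addition, as in 98 + 37.
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    steps = 0
    carry = 0
    digits = []
    for da, db in zip(reversed(a), reversed(b)):  # ones column first
        column = int(da) + int(db)
        steps += 1
        if carry:
            column += carry
            steps += 1
        digits.append(str(column % 10))
        carry = column // 10
    if carry:
        digits.append(str(carry))  # writing down the final carry, not an addition
    return "".join(reversed(digits)), steps


print(add_with_step_count("98", "37"))
```

For two two-digit numbers there are two columns and at most one carry into the tens, so the count never exceeds three.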

For any algorithm we have something that describes how big a thing we’re working on. It’s often ‘n’. If we need more than one variable to describe how big it is, ‘m’ gets called up next. If we’re estimating how long it takes to work on a number, ‘n’ is the number of digits in the number. If we’re thinking about a square matrix, ‘n’ is the number of rows and columns. If it’s a not-square matrix, then ‘n’ is the number of rows and ‘m’ the number of columns. Or vice-versa; it’s your matrix. If we’re looking for an item in a list, ‘n’ is the number of items in the list. If we’re looking to evaluate a polynomial, ‘n’ is the order of the polynomial.

In normal circumstances we don’t work out exactly how many steps some operation takes. It’s more useful to know that multiplying these two long numbers would take about 900 steps than that it would need only 816. And so this gives us an asymptotic estimate. We get an estimate of how much longer cracking the combination lock will take if there’s more letters to pick from. And that’s while allowing that some poor soul will get the combination A-B-C.

There are a couple ways to describe how long this will take. The more common is the Big-O. This is just the letter, like you find between N and P. Since that’s easy, many have taken to using a fancy, vaguely cursive O, one that looks like $\mathcal{O}$. I agree it looks nice. Particularly, though, we write $\mathcal{O}(f(n))$, where f is some function. In practice, we’ll see functions like $\mathcal{O}(n)$ or $\mathcal{O}(n^2 \log(n))$ or $\mathcal{O}(n^3)$. Usually something simple like that. It can be tricky. There’s a scheme for multiplying large numbers together that’s $\mathcal{O}(n \cdot 2^{\sqrt{2 \log (n)}} \cdot \log(n))$. What you will not see is something like $\mathcal{O}(\sin (n))$, or $\mathcal{O}(n^3 - n^4)$ or such. This comes to what we mean by the Big-O.

It’ll be convenient for me to have a name for the actual number of steps the algorithm takes. Let me call the function describing that g(n). Then g(n) is $\mathcal{O}(f(n))$ if there is some number C so that, once n gets big enough, g(n) is always less than C times f(n). Here C is some constant number. Could be 1. Could be 1,000,000. Could be 0.00001. Doesn’t matter; it’s some positive number.

There’s some neat tricks to play here. For example, the function ‘$n$‘ is $\mathcal{O}(n)$. It’s also $\mathcal{O}(n^2)$ and $\mathcal{O}(n^9)$ and $\mathcal{O}(e^{n})$. The function ‘$n^2$‘ is also $\mathcal{O}(n^2)$ and those later terms, but it is not $\mathcal{O}(n)$. And you can see why $\mathcal{O}(\sin(n))$ is right out.
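A finite search can’t prove a Big-O bound, which is a claim about every large n, but it can catch a proposed constant failing. This minimal Python sketch, with a hypothetical helper `first_violation`, looks for the first n at which g(n) exceeds C times f(n).

```python
def first_violation(g, f, C, n0, n_max):
    """Return the first n in [n0, n_max] with g(n) > C * f(n), or None.

    A finite search can refute a proposed witness (C, n0) for g = O(f),
    but it can never prove the bound holds for every larger n.
    """
    for n in range(n0, n_max + 1):
        if g(n) > C * f(n):
            return n
    return None


# n is O(n^2): C = 1 and n0 = 1 hold over the whole range searched.
print(first_violation(lambda n: n, lambda n: n * n, C=1, n0=1, n_max=10_000))
# n^2 is not O(n): even C = 1000 fails once n passes 1000.
print(first_violation(lambda n: n * n, lambda n: n, C=1000, n0=1, n_max=10_000))
```

Whatever C you try for $n^2$ against n, the search finds a failure just past n = C.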

There is also a Little-O notation. It, too, is an upper bound on the function. But it is a stricter bound, setting tighter restrictions on what g(n) is like. You ask how it is the stricter bound gets the minuscule letter. That is a fine question. I think it’s a quirk of history. Both symbols come to us through number theory. Big-O was developed first, published in 1894 by Paul Bachmann. Little-O was published in 1909 by Edmund Landau. Yes, the one with the short Hilbert-like list of number theory problems. In 1914 G H Hardy and John Edensor Littlewood would work on another measure and they used Ω to express it. (If you see the letter used for Big-O and Little-O as the Greek omicron, then you see why a related concept got called omega.)

What makes the Little-O measure different is its sternness. g(n) is $o(f(n))$ if, for every positive number C, whenever n is large enough g(n) is less than or equal to C times f(n). I know that sounds almost the same. Here’s why it’s not.

If g(n) is $\mathcal{O}(f(n))$, then you get to pick the C, and you can pick one so that, eventually, $g(n) \le C f(n)$. If g(n) is $o(f(n))$, then I, trying to sabotage you, can go ahead and pick a C, trying my best to spoil your bounds. But I will fail. Even if I pick, like, a C of one millionth of a billionth of a trillionth, eventually n will be big enough that $g(n) \le C f(n)$. I can’t find a C small enough that C times f(n) doesn’t eventually outgrow g(n).

This implies some odd-looking stuff. Like, that the function n is not $o(n)$. But the function n is at least $o(n^2)$, and $o(n^9)$ and those other fun variations. Being Little-O compels you to be Big-O. Big-O is not compelled to be Little-O, although it can happen.
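One informal way to see the difference is to watch the ratio g(n)/f(n). If it stays bounded, that’s consistent with Big-O; if it heads to zero, that’s the Little-O behavior. A few sample points are only a hint, never a proof. A sketch in Python, with a made-up helper `ratios`:

```python
def ratios(g, f, ns):
    """g(n) / f(n) at a few sample sizes: a hint, not a proof, of asymptotic behavior."""
    return [g(n) / f(n) for n in ns]


ns = [10, 1_000, 100_000, 10_000_000]
print(ratios(lambda n: n, lambda n: n, ns))      # stays at 1: O(n) but not o(n)
print(ratios(lambda n: n, lambda n: n * n, ns))  # shrinks toward 0: o(n^2)
```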

These definitions, for Big-O and Little-O, I’ve laid out from algorithmic complexity. It’s implicitly about functions defined on the counting numbers. But there’s no reason I have to limit the ideas to that. I could define similar ideas for a function g(x), with domain the real numbers, and come up with an idea of being on the order of f(x).

We make some adjustments to this. The important one is that, with algorithmic complexity, we assumed g(n) had to be a positive number. What would it even mean for something to take minus four steps to complete? But a regular old function might be zero or negative or change between negative and positive. So we look at the absolute value of g(x). Is there some value of C so that, when x is big enough, the absolute value of g(x) stays less than C times f(x)? If it does, then g(x) is $\mathcal{O}(f(x))$. Is it the case that for every positive number C the absolute value of g(x) is less than C times f(x), once x is big enough? Then g(x) is $o(f(x))$.

Fine, but why bother defining this?

A compelling answer is that it gives us a way to describe how different a function is from an approximation to that function. We are always looking for approximations to functions because most functions are hard. We have a small set of functions we like to work with. Polynomials are great numerically. Exponentials and trig functions are great analytically. That’s about all the functions that are easy to work with. Big-O notation particularly lets us estimate how bad an error we make using the approximation.

For example, the classic fourth-order Runge-Kutta method numerically approximates solutions to ordinary differential equations. It does this by taking the information we have about the function at some point x to approximate its value at a point x + h. ‘h’ is some number, the step size. The difference between the actual answer and the Runge-Kutta approximation, accumulated over all the steps we take, is $\mathcal{O}(h^4)$. We use this knowledge to make sure our error is tolerable. Also, we don’t usually care what the function is at x + h. It’s just what we can calculate. What we want is the function at some point a fair bit away from x, call it x + L. So we use our approximate knowledge of conditions at x + h to approximate the function at x + 2h. And use x + 2h to tell us about x + 3h, and from that x + 4h and so on, until we get to x + L. We’d like to have as few of these uninteresting intermediate points as we can, so look for as big an h as is safe.
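The most familiar Runge-Kutta variant is the classic fourth-order one. Here is a minimal Python sketch of it, tried on a test problem of my choosing, y’ = y with y(0) = 1, whose exact value at x = 1 is e. If the accumulated error is $\mathcal{O}(h^4)$, halving h should cut it by roughly a factor of 16.

```python
import math


def rk4(f, x0, y0, h, steps):
    """Classic fourth-order Runge-Kutta for y' = f(x, y), taking `steps` steps of size h."""
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


# Test problem (an assumption for illustration): y' = y, y(0) = 1, so y(1) = e.
for steps in (10, 20, 40):
    h = 1 / steps
    error = abs(rk4(lambda x, y: y, 0.0, 1.0, h, steps) - math.e)
    print(f"h = {h:.4f}: error = {error:.2e}")
```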

That context may be the more common one. We see it, particularly, in Taylor Series and other polynomial approximations. For example, the sine of a number is approximately:

$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} + \mathcal{O}(x^{11})$

This has consequences. It tells us, for example, that if x is about 0.1, this approximation is probably pretty good. So it is: the sine of 0.1 (radians) is about 0.0998334166468282 and that’s exactly what five terms here give us. But it also warns that if x is about 10, this approximation may be gibberish. And so it is: the sine of 10.0 is about -0.5440 and the polynomial is about 1448.27.
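To check those numbers, here is a short Python sketch that evaluates the five-term polynomial and compares it with the library sine. `sine_taylor_five_terms` is a made-up name.

```python
import math


def sine_taylor_five_terms(x):
    """x - x^3/3! + x^5/5! - x^7/7! + x^9/9!, the polynomial in the series above."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(5))


for x in (0.1, 10.0):
    print(x, math.sin(x), sine_taylor_five_terms(x))
```

At 0.1 the two agree to every printed digit; at 10.0 the polynomial lands near 1448.27 while the sine sits near -0.544.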

The connotation in using Big-O notation here is that we look for small h’s, or small x’s, and for the Big-O term to be a tiny number. It seems odd to use the same notation with a large independent variable and with a small one. The concept carries over, though, and helps us talk efficiently about this different problem.

## From my Sixth A-to-Z: Operator

One of the many small benefits of these essays is getting myself clearly grounded on terms that I had accepted without thinking much about. Operator, like functional (mentioned in here), is one of them. I’m sure that when these were first introduced my instructors gave them clear definitions. But when they’re first introduced it’s not clear why these are important, or that we are going to spend the rest of grad school talking about them. So this piece from 2019’s A-to-Z sequence secured my footing on a term I had a fair understanding of. You get some idea of what has to be intended from the context in which the term is used. Also from knowing how terms like this tend to be defined. But having it down to where I could certainly pass a true-false test about “is this an operator”? That was new.

Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.

# Operator.

An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same space but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Typically, everything in a particular function space is a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.

So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.

My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.

I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.

Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. And both the differential and the integral operators are linear, which lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function $af + bg$ is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of $af + bg$ is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.
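A small way to see this linearity is to store polynomials as lists of coefficients, where the derivative acts exactly. This Python sketch, with made-up helper names `derivative` and `combine`, checks that D(af + bg) equals a Df + b Dg for one choice of f, g, a and b. One example doesn’t prove linearity, of course; the proof is the sum and constant-multiple rules.

```python
def derivative(coeffs):
    """D on polynomials stored as [c0, c1, c2, ...] for c0 + c1 x + c2 x^2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def combine(a, f, b, g):
    """The coefficient list of a*f + b*g, padding the shorter list with zeros."""
    width = max(len(f), len(g))
    f = f + [0] * (width - len(f))
    g = g + [0] * (width - len(g))
    return [a * fi + b * gi for fi, gi in zip(f, g)]


f = [1, 0, 3]     # 1 + 3x^2
g = [0, 2, 0, 5]  # 2x + 5x^3
lhs = derivative(combine(2, f, -7, g))
rhs = combine(2, derivative(f), -7, derivative(g))
print(lhs == rhs)  # True: D(af + bg) = a Df + b Dg
```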

I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.

One of the conceptual leaps in high school algebra is that we start analyzing the things we do with numbers. Like, we don’t just take the number three, square it, multiply that by two and add to that the number three times four and add to that the number 1. We think about what if we take any number, call it x, and think of $2x^2 + 4x + 1$. And what if we make equations based on doing this $2x^2 + 4x + 1$; what values of x make those equations true? Or tell us something interesting?

Operators represent a similar leap. We can think of functions as things we manipulate, and think of those manipulations as a particular thing to do. For example, let me come up with a differential expression. For some function u(x) work out the value of this:

$2\frac{d^2 u(x)}{dx^2} + 4 \frac{d u(x)}{dx} + u(x)$

Let me join in the convention of using ‘D’ for the differential operator. Then we can rewrite this expression like so:

$2D^2 u + 4D u + u$

Suddenly the differential expression looks a lot like a polynomial. Of course it does. Remember that everything in mathematics is polynomials. We get new tools to solve differential equations by rewriting them as operators. That’s nice. It also scratches that itch that I think everyone in Intro to Calculus gets, of wanting to somehow see $\frac{d^2}{dx^2}$ as if it were a square of $\frac{d}{dx}$. It’s not, and $D^2$ is not the square of $D$. It’s composing $D$ with itself. But it looks close enough to squaring to feel comfortable.

Nobody needs to do $2D^2 u + 4D u + u$ except to learn some stuff about operators. But you might imagine a world where we did this process all the time. If we did, then we’d develop shorthand for it. Maybe a new operator, call it T, and define it that $T = 2D^2 + 4D + 1$. You see the grammar of treating functions as if they were real numbers becoming familiar. You maybe even noticed the ‘1’ sitting there, serving as the “identity operator”. You know how you’d write out $Tv(x) = 3$ if you needed to write it in full.
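Using the coefficient-list picture of polynomials again, T can be built by composing D with itself. This is a sketch with hypothetical names, limited to polynomial inputs, the simplest setting where D acts exactly.

```python
def derivative(coeffs):
    """D on polynomials stored as [c0, c1, c2, ...] for c0 + c1 x + c2 x^2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def T(u):
    """T = 2D^2 + 4D + 1, where D^2 means applying D twice and 1 is the identity."""
    d1 = derivative(u)
    d2 = derivative(d1)  # composing D with itself

    def pad(p):
        return p + [0] * (len(u) - len(p))

    return [2 * a + 4 * b + c for a, b, c in zip(pad(d2), pad(d1), u)]


print(T([0, 0, 1]))  # u = x^2 gives 2*2 + 4*(2x) + x^2, that is [4, 8, 1]
```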

But there are operators that we use all the time. These do get special names, and often shorthand. For example, there’s the gradient operator. This applies to any function with several independent variables. The gradient has a great physical interpretation if the variables represent coordinates of space. If they do, the gradient of a function at a point gives us a vector that describes the direction in which the function increases fastest. And the size of that gradient — a functional on this operator — describes how fast that increase is.
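A numerical sketch of the gradient, using central differences, which approximate each partial derivative rather than compute it exactly. The function `gradient` and the sample function are my own illustrations.

```python
import math


def gradient(f, point, h=1e-6):
    """Central-difference estimate of the gradient of f at `point` (a tuple of floats)."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return grad


# Illustrative function (an assumption): f(x, y) = x^2 + 3y, whose exact gradient is (2x, 3).
slope = gradient(lambda x, y: x * x + 3 * y, (1.0, 2.0))
print(slope, math.hypot(*slope))  # direction of fastest increase, and how fast
```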

The gradient itself defines more operators. These have names you get very familiar with in Vector Calculus, like divergence and curl. These have compelling physical interpretations if we think of the function we operate on as describing a moving fluid. A positive divergence at a point means fluid is welling up there, as from a source; a negative divergence, that it is draining away, as into a sink. The curl, in fluids, describes how nearby streams of fluid move at different rates.
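A two-dimensional sketch, again with central differences, applied to two made-up flows: one streaming outward from the origin and one circling around it. In two dimensions the curl is a single number, the rate of local rotation.

```python
def divergence_and_curl(F, x, y, h=1e-6):
    """Central-difference divergence and (scalar) curl of a 2-D field F(x, y) -> (P, Q).

    divergence = dP/dx + dQ/dy; curl = dQ/dx - dP/dy.
    """
    dP_dx = (F(x + h, y)[0] - F(x - h, y)[0]) / (2 * h)
    dP_dy = (F(x, y + h)[0] - F(x, y - h)[0]) / (2 * h)
    dQ_dx = (F(x + h, y)[1] - F(x - h, y)[1]) / (2 * h)
    dQ_dy = (F(x, y + h)[1] - F(x, y - h)[1]) / (2 * h)
    return dP_dx + dQ_dy, dQ_dx - dP_dy


def source(x, y):
    """Fluid streaming straight outward from the origin."""
    return (x, y)


def swirl(x, y):
    """Fluid circling counterclockwise around the origin."""
    return (-y, x)


print(divergence_and_curl(source, 0.5, 0.5))  # about (2, 0)
print(divergence_and_curl(swirl, 0.5, 0.5))   # about (0, 2)
```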

Physical interpretations are common in operators. This probably reflects how much influence physics has on mathematics and vice-versa. Anyone studying quantum mechanics gets familiar with a host of operators. These have comfortable names like “position operator” or “momentum operator” or “spin operator”. These are operators that apply to the wave function for a problem. They transform the wave function into a probability distribution. That distribution describes what positions or momentums or spins are likely, how likely they are. Or how unlikely they are.

They’re not all physical, though. Or not purely physical. Many operators are useful because they are powerful mathematical tools. There is a variation of the Fourier series called the Fourier transform. We can interpret this as an operator. Suppose the original function started out with time or space as its independent variable. This often happens. The Fourier transform operator gives us a new function, one with frequencies as independent variable. This can make the function easier to work with. The Fourier transform is an integral operator, by the way, so don’t go thinking everything is a complicated set of derivatives.
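The Fourier transform is an integral over a continuous variable; computers usually work with its discrete cousin, the discrete Fourier transform, which replaces the integral with a sum over samples. Here is a deliberately slow, plain-Python version as a sketch (real work would use a fast Fourier transform library), applied to a made-up signal: a sine wave with five cycles across 64 samples.

```python
import cmath
import math


def dft(samples):
    """Discrete Fourier transform: X_k = sum_n x_n * exp(-2*pi*i*k*n/N). O(N^2), for illustration."""
    N = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / N) for n, x in enumerate(samples))
        for k in range(N)
    ]


# One second of a 5-cycle sine wave, sampled 64 times (illustrative numbers).
N = 64
signal = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
spectrum = dft(signal)
# Only look at bins 0..N/2; for real-valued input the upper half mirrors them.
peak = max(range(N // 2 + 1), key=lambda k: abs(spectrum[k]))
print(peak)  # 5: the wave in time becomes a spike at frequency 5
```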

Another integral-based operator that’s important is the Laplace transform. This is a great operator because it turns differential equations into algebraic equations. Often, into polynomials. You saw that one coming.
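As a worked example of my own choosing, take $u'(t) + 2u(t) = 0$ with $u(0) = 3$, and write $U(s)$ for the Laplace transform of u. The standard rule $\mathcal{L}\{u'\}(s) = sU(s) - u(0)$ turns the derivative into multiplication by s, so the differential equation becomes:

$sU(s) - 3 + 2U(s) = 0$

$U(s) = \frac{3}{s + 2}$

That is plain algebra in U(s). Since the Laplace transform of $e^{-2t}$ is $\frac{1}{s + 2}$, this gives $u(t) = 3e^{-2t}$, which you can check satisfies the original equation.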

This is all a lot of good press for operators. Well, they’re powerful tools. They help us to see that we can manipulate functions in the ways that functions let us manipulate numbers. It should sound good to realize there is much new that you can do, and you already know most of what’s needed to do it.

This and all the other Fall 2019 A To Z posts should be gathered here. And once I have the time to fiddle with tags I’ll have all past A to Z essays gathered at this link.

## From my Fifth A-to-Z: Oriented Graph

My grad-student career took me into Monte Carlo methods and viscosity-free fluid flow. It’s a respectable path. But I could have ended up in graph theory; I got a couple courses in it in grad school and loved it. I just could not find a problem I could work on that was both solvable and interesting. But hints of that alternative path for me turn up now and then, such as in this piece from 2018.

I am surprised to have had no suggestions for an ‘O’ letter. I’m glad to take a free choice, certainly. It let me get at one of those fields I didn’t specialize in, but could easily have. And let me mention that while I’m still taking suggestions for the letters P through T, each other letter has gotten at least one nomination. I can be swayed by a neat term, though, so if you’ve thought of something hard to resist, try me. And later this month I’ll open up the letters U through Z. Might want to start thinking right away about what X, Y, and Z could be.

# Oriented Graph.

This is another term from graph theory, one of the great mathematical subjects for doodlers. A graph, here, is made of two sets of things. One is a bunch of fixed points, called ‘vertices’. The other is a bunch of curves, called ‘edges’. Every edge starts at one vertex and ends at one vertex. We don’t require that every vertex have an edge grow from it.

Already you can see why this is a fun subject. It models some stuff really well. Like, anything where you have a bunch of sources of stuff, that come together and spread out again? Chances are there’s a graph that describes this. There’s a compelling all-purpose interpretation. Have vertices represent the spots where something accumulates, or rests, or changes, or whatever. Have edges represent the paths along which something can move. This covers so much.

The next step is a “directed graph”. This comes from making the edges different. If we don’t say otherwise we suppose that stuff can move along an edge in either direction. But suppose otherwise. Suppose there are some edges that can be used in only one direction. This makes a “directed edge”. It’s easy to see networks of stuff like city streets as graphs. Once you ponder that, one-way streets follow close behind. If every edge in a graph is directed, then you have a directed graph. Moving from a regular old undirected graph to a directed graph changes everything you’d learned about graph theory. Mostly it makes things harder. But you get some good things in trade. We become able to model sources, for example. This is where whatever might move comes from. Also sinks, which is where whatever might move disappears from our consideration.

You might fear that by switching to a directed graph there’s no way to have a two-way connection between a pair of vertices. Or that if there is you have to go through some third vertex. I understand your fear, and wish to reassure you. We can get a two-way connection even in a directed graph: just have the same two vertices be connected by two edges. One goes one way, one goes the other. I hope you feel some comfort.

What if we don’t have that, though? What if the directed graph doesn’t have any pair of vertices joined by a pair of opposite-directed edges? And that, then, is an oriented graph. We get the orientation from looking at pairs of vertices. Each pair either has no edge connecting them, or has a single directed edge between them.
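A sketch of that test in Python, treating a directed graph as a list of (tail, head) pairs. The function name `is_oriented` is mine, and I treat a repeated edge as disqualifying too, since each pair gets at most a single directed edge.

```python
def is_oriented(edges):
    """True if a directed graph, given as (tail, head) pairs, is an oriented graph.

    Each pair of vertices may have no edge or one directed edge, never two.
    """
    seen = set()
    for tail, head in edges:
        if (tail, head) in seen or (head, tail) in seen:
            return False
        seen.add((tail, head))
    return True


print(is_oriented([("A", "B"), ("B", "C"), ("C", "A")]))  # True
print(is_oriented([("A", "B"), ("B", "A")]))              # False: a two-way connection
```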

There are a lot of potential oriented graphs. If you have three vertices, for example, there are seven oriented graphs to make of that. You’re allowed to have a vertex not connected to any others. You’re also allowed to have the vertices grouped into a couple of subsets, and connect only to other vertices in their own subset. This is part of why four vertices can give you 42 different oriented graphs. Five vertices can give you 582 different oriented graphs. You can insist on a connected oriented graph.

A connected graph is what you guess. It’s a graph where there’s no vertices off on their own, unconnected to anything. There’s no subsets of vertices connected only to each other. This doesn’t mean you can always get from any one vertex to any other vertex. The directions might not allow you to do that. But if you’re willing to break the laws, and ignore the directions of these edges, you could then get from any vertex to any other vertex. Limiting yourself to connected graphs reduces the number of oriented graphs you can get. But not by as much as you might guess, at least not to start. There’s only one connected oriented graph for two vertices, instead of two. Three vertices have five connected oriented graphs, rather than seven. Four vertices have 34, rather than 42. Five vertices, 535 rather than 582. The total number of lost graphs grows, of course. The percentage of lost graphs dwindles, though.
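These counts can be checked by brute force for small numbers of vertices. The sketch below, with made-up helper names, gives each pair of vertices three choices (no edge, an edge one way, or an edge the other way), then groups graphs that differ only by relabeling the vertices. “Connected” means connected once we ignore the arrows, as in the paragraph above. It slows down quickly: five vertices means about seven million relabelings, so I stop at four.

```python
from itertools import combinations, permutations, product


def canonical(edges, n):
    """Smallest relabeled edge list; isomorphic graphs share it."""
    return min(
        tuple(sorted((p[a], p[b]) for a, b in edges))
        for p in permutations(range(n))
    )


def weakly_connected(edges, n):
    """True if, ignoring arrow directions, every vertex can be reached from vertex 0."""
    neighbors = {v: set() for v in range(n)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in neighbors[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def count_oriented_graphs(n):
    """Return (all, connected) counts of oriented graphs on n unlabeled vertices."""
    pairs = list(combinations(range(n), 2))
    classes = {}
    # Each pair (a, b) is: unconnected (0), a -> b (1), or b -> a (2).
    for states in product(range(3), repeat=len(pairs)):
        edges = [(a, b) if s == 1 else (b, a) for (a, b), s in zip(pairs, states) if s]
        classes.setdefault(canonical(edges, n), edges)
    connected = sum(weakly_connected(e, n) for e in classes.values())
    return len(classes), connected


for n in range(1, 5):
    print(n, count_oriented_graphs(n))
```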

There’s something more. What if there are no unconnected vertices? That is, every pair of vertices has an edge? If every pair of vertices in a graph has a direct connection we call that a “complete” graph. This is true whether the graph is directed or not. If you do have a complete oriented graph — every pair of vertices has a direct connection, and only the one direction — then that’s a “tournament”. If that seems like a whimsical name, consider one interpretation of it. Imagine a sports tournament in which every team played every other team once. And that there’s no ties. Each vertex represents one team. Each edge is the match played by the two teams. The direction is, let’s say, from the losing team to the winning team. (It’s as good if the direction is from the winning team to the losing team.) Then you have a complete, oriented, directed graph. And it represents your tournament.
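Here is a sketch of that sports interpretation in Python. The teams and results are made up; each edge points from the losing team to the winning team. `is_tournament` is a hypothetical helper that checks every pair of teams met exactly once.

```python
from itertools import combinations


def is_tournament(teams, edges):
    """True if every pair of distinct teams is joined by exactly one directed edge."""
    pairs = [frozenset(edge) for edge in edges]
    wanted = {frozenset(p) for p in combinations(teams, 2)}
    # Same number of edges as pairs, and every pair covered, means no repeats or gaps.
    return len(pairs) == len(wanted) and set(pairs) == wanted


# Made-up round-robin results, each edge pointing from the loser to the winner.
teams = ["Ants", "Bees", "Crows", "Dingos"]
results = [
    ("Bees", "Ants"), ("Crows", "Ants"), ("Ants", "Dingos"),
    ("Crows", "Bees"), ("Bees", "Dingos"), ("Dingos", "Crows"),
]
wins = {team: sum(winner == team for _, winner in results) for team in teams}
print(is_tournament(teams, results))  # True
print(wins)
```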

And that delights me. A mathematician like me might talk a good game about building models. How one can represent things with mathematical constructs. Here, it’s done. You can make little dots, for vertices, and curved lines with arrows, for edges. And draw a picture that shows how a round-robin tournament works. It can be that direct.

## From my Fourth A-to-Z: Open Set

It’s quite funny to notice the first paragraph’s shame at missing my self-imposed schedule. I still have not found confirmation of my hunch that “open” and “closed”, as set properties, were named independently. I haven’t found evidence I’m wrong, though, either.

Today’s glossary entry is another request from Elke Stangl, author of the Elkemental Force blog. I’m hoping this also turns out to be a well-received entry. Half of that is up to you, the kind reader. At least I hope you’re a reader. It’s already gone wrong, as it was supposed to be Friday’s entry. I discovered I hadn’t actually scheduled it while I was too far from my laptop to do anything about that mistake. This spoils the nice Monday-Wednesday-Friday routine of these glossary entries that dates back to the first one I ever posted and just means I have to quit forever and not show my face ever again. Sorry, Ulam Spiral. Someone else will have to think of you.

# Open Set.

Mathematics likes to present itself as being universal truths. And it is. At least if we allow that the rules of logic by which mathematics works are universal. Suppose them to be true and the rest follows. But we start out with intuition, with things we observe in the real world. We’re happy when we can remove the stuff that’s clearly based on idiosyncratic experience. We find something that’s got to be universal.

Sets are pretty abstract things, as mathematicians use the term. They get to be hard to talk about; we run out of simpler words that we can use. A set is … a bunch of things. The things are … stuff that could be in a set, or else that we’d rule out of a set. We can end up better understanding things by drawing a picture. We draw the universe, which is a rectangular block, sometimes with dashed lines as the edges. The set is some blotch drawn on the inside of it. Some shade it in to emphasize which stuff we want in the set. If we need to pick out a couple things in the universe we drop in dots or numerals. If we’re rigorous about the drawing we could create a Venn Diagram.

When we do this, we’re giving up on the pure mathematical abstraction of the set. We’re replacing it with a territory on a map. Several territories, if we have several sets. The territories can overlap or be completely separate. We’re subtly letting our sense of geography, our sense of the spaces in which we move, infiltrate our understanding of sets. That’s all right. It can give us useful ideas. Later on, we’ll try to separate out the ideas that are too bound to geography.

A set is open if whenever you’re in it, you can’t be on its boundary. We never quite have this in the real world, with territories. The border between, say, New Jersey and New York becomes this infinitesimally slender thing, as wide in space as midnight is in time. But we can, with some effort, imagine the state. Imagine being as tiny in every direction as the border between two states. Then we can imagine the difference between being on the border and being away from it.

And not being on the border matters. If we are not on the border we can imagine the problem of getting to the border. Pick any direction; we can move some distance while staying inside the set. It might be a lot of distance, it might be a tiny bit. But we stay inside however we might move. If we are on the border, then there’s some direction in which any movement, however small, drops us out of the set. That’s a difference in kind between a set that’s open and a set that isn’t.
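For a concrete case, take the open unit disk: the points whose distance from the origin is less than 1. This Python sketch (my example, not from the text) computes how far you can move from a point without leaving the disk, then tries random directions to check it.

```python
import math
import random


def room_to_move(point):
    """For the open unit disk x^2 + y^2 < 1: how far we can move from `point` in any direction.

    Returns 1 - |point|, positive for every point inside the disk
    and zero for a point on the boundary circle.
    """
    return 1 - math.hypot(*point)


def inside(point):
    """True if `point` lies in the open unit disk."""
    return math.hypot(*point) < 1


p = (0.6, 0.7)
r = room_to_move(p)
random.seed(1)
for _ in range(1000):
    angle = random.uniform(0, 2 * math.pi)
    step = 0.99 * r  # any distance strictly less than r stays inside
    q = (p[0] + step * math.cos(angle), p[1] + step * math.sin(angle))
    assert inside(q)
print(r, room_to_move((1.0, 0.0)))  # some room inside; none on the boundary
```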

I say “a set that’s open and a set that isn’t”. There are such things as closed sets. A set doesn’t have to be either open or closed. It can be neither, a set that includes some of its borders but not other parts of it. It can even be both open and closed simultaneously. The whole universe, for example, is both an open and a closed set. The empty set, with nothing in it, is both open and closed. (This looks like a semantic trick. OK, if you’re in the empty set you’re not on its boundary. But you can’t be in the empty set. So what’s going on? … The usual. It makes other work easier if we call the empty set ‘open’. And the extra work we’d have to do to rule out the empty set doesn’t seem to get us anything interesting. So we accept what might be a trick.) The definitions of ‘open’ and ‘closed’ don’t exclude one another.

I’m not sure how this confusing state of affairs developed. My hunch is that the words ‘open’ and ‘closed’ evolved independently of each other. Why do I think this? An open set has its openness from, well, not containing its boundaries; from the inside there’s always a little more to it. A closed set has its closedness from sequences. That is, you can consider a string of points inside a set. Are these points leading somewhere? Is that point inside your set? If every string of points in the set that leads somewhere leads to somewhere inside the set, then you have closure. You have a closed set. I’m not sure that the terms were derived with that much thought. But it does explain, at least in terms a mathematician might respect, why a set that isn’t open isn’t necessarily closed.

Back to open sets. What does it mean to not be on the boundary of the set? How do we know if we’re on it? We can define sets by all sorts of complicated rules: complex-valued numbers of size less than five, say. Rational numbers whose denominator (in lowest form) is no more than ten. Points in space from which a satellite dropped would crash into the moon rather than into the Earth or Sun. If we have an idea of distance we could measure how far it is from a point to the nearest part of the boundary. Do we need distance, though?

No, it turns out. We can get the idea of open sets without using distance. Introduce a neighborhood of a point. A neighborhood of a point is an open set that contains that point. It doesn’t have to be small, but that’s the connotation. And we get to thinking of little N-balls, circle or sphere-like constructs centered on the target point. It doesn’t have to be N-balls. But we think of them so much that we might as well say it’s necessary. If every point in a set has a neighborhood around it that’s also inside the set, then the set’s open.

You’re going to accuse me of begging the question. Fair enough. I was using open sets to define open sets. This use is all right for an intuitive idea of what makes a set open, but it’s not rigorous. We can give in and say we have to have distance. Then we have N-balls and we can build open sets out of balls that don’t contain the edges. Or we can try to drive distance out of our idea of open sets.

We can do it this way. Start off by saying the whole universe is an open set, and so is the empty set. Also that the union of any number of open sets is also an open set. And that the intersection of any finite number of open sets is also an open set. Does this sound weak? So it sounds weak. It’s enough. We get the open sets we were thinking of all along from this.
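On a finite universe these rules can be checked mechanically. In the sketch below (the helper name `is_topology` is made up), the empty set counts as open, being the union of no open sets at all, and checking pairs of sets is enough because every union or intersection is then a finite one.

```python
from itertools import combinations


def is_topology(universe, opens):
    """Check the open-set rules for a collection of subsets of a finite universe.

    On a finite universe, closure under pairwise unions and intersections
    is enough, since any union or intersection is then a finite one.
    """
    universe = frozenset(universe)
    opens = {frozenset(s) for s in opens}
    if not all(s <= universe for s in opens):
        return False
    if frozenset() not in opens or universe not in opens:
        return False
    return all((a | b) in opens and (a & b) in opens for a, b in combinations(opens, 2))


X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))  # True
print(is_topology(X, [set(), {1}, {2}, X]))     # False: {1} | {2} = {1, 2} is missing
```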

This works for the sets that look like territories on a map. It also works for sets for which we have some idea of distance, however strange it is to our everyday distances. It even works if we don’t have any idea of distance. This lets us talk about topological spaces, and study what geometry looks like if we can’t tell how far apart two points are. We can, for example, at least tell that two points are different. Can we find a neighborhood of one that doesn’t contain the other? Then we know they’re some distance apart, even without knowing what distance is.

That we reached so abstract an idea of what an open set is without losing the idea’s usefulness suggests we’re doing well. So we are. It also shows why Nicolas Bourbaki, the famous nonexistent mathematician, thought set theory and its related ideas were the core of mathematics. Today category theory is a more popular candidate for the core of mathematics. But set theory is still close to the core, and much of analysis is about what we can know from the fact of sets being open. Open sets let us explain a lot.

## How December 2021, The Month I Crashed, Treated My Mathematics Blog

On my humor blog I joked I was holding off on my monthly statistics recaps waiting for December 2021 to get better. What held me back here is more attention- and energy-draining nonsense going on last week. It’s passed without lasting harm, that I know about, though. So I can get back to looking at how things looked here in December.

December was, technically, my most prolific month in the sorry year of 2021. I had twelve articles posted, in a year that mostly saw around five to seven posts a month. But more than half of them were repeats, copying the text of old A-to-Z’s, with a small introduction added. I’ve observed how much my readership seems to depend on the number of posts made, more than anything else. How did this sudden surge affect my statistics? … Here’s how.

This was another declining month, with the fewest number of page views — 1,946 — and unique visitors — 1,351 — since July 2021. As you’d expect, this was also below the twelve-month running means, of 2,437.7 views from 1,727.8 unique visitors. It’s also below the twelve-month running medians, of 2,436.5 views from 1,742 unique visitors.

I notice, looking at the years going back to 2018, that I’ve seen a readership drop in December each of the last several years. In 2019 my December readership was barely three-fifths the November readership, for example. In 2018 and 2020 readership fell by one-tenth to one-fifth. But those are also years where my A-to-Z was going regularly, and filling whole weeks with publication, in November, with only a few pieces in December. Having December be busier than November is novel.

So I’m curious whether other blogs see a similar November-to-December dropoff. I’m also curious if they have a publishing schedule that makes it easier to find actual patterns through the chaos.

There were 46 things liked in December, which is above the running mean of 40.5 and median of 38.5. There were nine comments given, below that mean of 15.3 and median of 11.5. On the other hand, how much was there to say? (And I appreciate each comment, particularly those of moral support.)

The per-posting numbers, of views and visitors and such, collapsed. I had expected that, since the laconic publishing schedule I settled on drove the per-posting averages way up. The twelve-month running mean of views per posting was 323.4, and median 307.4, for example. December saw 162.2 views per posting. There was a running mean of 228.4 visitors per posting, and median of 219.2 per posting, for the twelve months ending with November 2021. December 2021 saw 112.6 visitors per posting. So those numbers are way down. But they aren’t far off the figures I had in, say, the end of 2020, when I was doing 18 or 19 posts per month.

Might as well list all twelve posts of December, in their descending order of popularity. I’m not surprised the original A-to-Z stuff was most popular. Besides being least familiar, it also came first in the month, so had time to attract page views. Here’s the roster of how the month’s postings ranked.

WordPress credits me with publishing 16,789 words in December, an average of 1,399.1 words per post. That’s not only my most talkative month for 2021; that’s two of my most talkative months. There’s a whole third of the year I didn’t publish that much. This is all inflated by my reposting old articles in their entirety, of course. In past years I would include a pointer to an old A-to-Z essay, but not the whole thing.
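The per-posting figures in this recap are simple division by December's twelve posts. A quick Python check, using only the December totals quoted in this post (the helper name is mine):

```python
# December 2021 totals as quoted in this recap.
POSTS = 12
VIEWS = 1946
VISITORS = 1351
WORDS = 16789


def per_post(total, posts=POSTS):
    """Average per post, rounded to one decimal place as in the recap."""
    return round(total / posts, 1)


print(per_post(VIEWS))     # 162.2
print(per_post(VISITORS))  # 112.6
print(per_post(WORDS))     # 1399.1
```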

This all brings my blog to a total 67,218 words posted for the year. It’s not the second-least-talkative year after all, although I’ll keep its comparisons to other years for a separate post.

At the closing of the year, WordPress figures I’ve posted 1,675 things here. They drew a total 150,883 page views from 90,187 visitors. This isn’t much compared to the first-tier pop-mathematics blogs. But it’s still more people than I could expect to meet in my life. So that’s nice to know about.

And now let’s look ahead to what 2022 is going to bring on all of this. I still intend to finish the Little 2021 Mathematics A-to-Z. Those essays should be at this link when I post them. I may get back to my Reading the Comics posts, as well. We’ll see.

## From my Third A-to-Z: Osculating Circle

With the third A-to-Z choice for the letter O, I finally set ortho-ness down. I had thought the letter might become a reference for everything described as ortho-. It has to be acknowledged, though, that two or three examples get you the general idea of what’s meant when something is named ortho-.

I must admit I don’t remember ever solving a differential equation using osculating circles instead of, you know, polynomials or sine functions (Fourier series). But references I trust say that would be a way to go.

I’m happy to say it’s another request today. This one’s from HowardAt58, author of the Saving School Math blog. He’s given me some great inspiration in the past.

## Osculating Circle.

It’s right there in the name. Osculating. You know what that is from that one Daffy Duck cartoon where he cries out “Greetings, Gate, let’s osculate” while wearing a moustache. Daffy’s imitating somebody there, but goodness knows who. Someday the mystery drives the young you to a dictionary web site. Osculate means kiss. This doesn’t seem to explain the scene. Daffy was imitating Jerry Colonna. That meant something in 1943. You can find him on old-time radio recordings. I think he’s funny, in that 40s style.

Make the substitution. A kissing circle. Suppose it’s not some playground antic one level up from the Kissing Bandit that plagues recess yet one or two levels down from what we imagine we’d do in high school. It suggests a circle that comes really close to something, that touches it a moment, and then goes off its own way.

But then touching. We know another word for that. It’s the root behind “tangent”. Tangent is a trigonometry term. But it appears in calculus too. The tangent line is a line that touches a curve at one specific point and is going in the same direction as the original curve is at that point. We like this because … well, we do. The tangent line is a good approximation of the original curve, at least at the tangent point and for some region local to that. The tangent touches the original curve, and maybe it does something else later on. What could kissing be?

The osculating circle is about approximating an interesting thing with a well-behaved thing. So are similar things with names like “osculating curve” or “osculating sphere”. We need that a lot. Interesting things are complicated. Well-behaved things are understood. We move from what we understand to what we would like to know, often, by an approximation. This is why we have tangent lines. This is why we build polynomials that approximate an interesting function. They share the original function’s value, and its derivative’s value. A polynomial approximation can share many derivatives. If the function is nice enough, and the polynomial big enough, it can be impossible to tell the difference between the polynomial and the original function.
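A quick numerical illustration of the polynomial idea, in Python. I've picked the sine function as the "interesting" function only because its derivatives at 0 are easy to write down; the helper name is mine. Higher-degree polynomials match more derivatives at 0, and the error at a nearby point shrinks.

```python
import math


def taylor_sin(x, degree):
    """Taylor polynomial of sin(x) about 0, keeping terms up to x**degree."""
    total = 0.0
    for k in range(1, degree + 1, 2):     # only odd powers appear
        sign = -1 if (k // 2) % 2 else 1  # +x, -x^3/3!, +x^5/5!, ...
        total += sign * x**k / math.factorial(k)
    return total


x = 0.5
for degree in (1, 3, 5, 9):
    error = abs(taylor_sin(x, degree) - math.sin(x))
    print(degree, error)
```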

The osculating circle, or sphere, isn’t so concerned with matching derivatives. I know, I’m as shocked as you are. Well, it matches the first and the second derivatives of the original curve. Anything past that, though, it matches only by luck. The osculating circle is instead about matching the curvature of the original curve. The curvature is what you think it would be: it’s how much a function curves. If you imagine looking closely at the original curve and an osculating circle they appear to be two arcs that come together. They must touch at one point. They might touch at others, but that’s incidental.
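For a curve given as the graph of $y = f(x)$, the standard formulas are: curvature $\kappa = |f''| / (1 + f'^2)^{3/2}$, radius $1/\kappa$, and a center found by stepping that radius along the normal toward the inside of the bend. Here is a Python sketch, assuming you already know $f'$ and $f''$ at the point (the function name is mine). It needs $f'' \neq 0$; where the curve doesn't bend, the "circle" degenerates into the tangent line.

```python
import math


def osculating_circle(x0, y0, dy, d2y):
    """Center and radius of the osculating circle of y = f(x) at (x0, y0).

    dy and d2y are f'(x0) and f''(x0). Raises ValueError where f'' is zero,
    since the curve has no bend there and the "circle" degenerates to a line.
    """
    if d2y == 0:
        raise ValueError("zero curvature: the osculating circle is a line")
    slope_term = 1 + dy * dy
    curvature = abs(d2y) / slope_term**1.5
    # Center = point + (1 + f'^2) / f'' * (-f', 1), the inward normal step.
    center_x = x0 - dy * slope_term / d2y
    center_y = y0 + slope_term / d2y
    return (center_x, center_y), 1 / curvature


# The parabola y = x**2 at the origin: f' = 0, f'' = 2.
print(osculating_circle(0.0, 0.0, 0.0, 2.0))  # ((0.0, 0.5), 0.5)
```

The circle touches the curve at $(x_0, y_0)$ and, by construction, bends exactly as much as the curve does there.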

Osculating circles, and osculating spheres, sneak out of mathematics and into practical work. This is because we often want to work with things that are almost circles. The surface of the Earth, for example, is not a sphere. But it’s only a tiny bit off. It’s off in ways that you only notice if you are doing high-precision mapping. Or taking close measurements of things in the sky. Sometimes we do this. So we map the Earth locally as if it were a perfect sphere, with curvature exactly what its curvature is at our observation post.

Or we might be observing something moving in orbit. If the universe had only two things in it, and they were the correct two things, all orbits would be simple: they would be ellipses. They would have to be “point masses”, things that have mass without any volume. They never are. They’re always shapes. Spheres would be fine, but they’re never perfect spheres even. The slight difference between a perfect sphere and whatever the things really are affects the orbit. Or the other things in the universe tug on the orbiting things. Or the thing orbiting makes a course correction. All these things make little changes in the orbiting thing’s orbit. The actual orbit of the thing is a complicated curve. The orbit we could calculate is an osculating — well, an osculating ellipse, rather than an osculating circle. Similar idea, though. Call it an osculating orbit if you’d rather.
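To make the osculating orbit concrete: take the thing's position and velocity at one instant and ask what ellipse it would follow if, from then on, only a single point-mass central body pulled on it. A minimal Python sketch in two dimensions, assuming a gravitational parameter `mu` (the central body's mass times the gravitational constant) in whatever consistent units you like; the function name is mine. It uses the vis-viva relation for the semi-major axis and the eccentricity vector for the shape.

```python
import math


def osculating_ellipse(r, v, mu):
    """Semi-major axis and eccentricity of the two-body orbit through (r, v).

    r and v are 2D position and velocity tuples relative to the central
    body; mu is its gravitational parameter. For a bound orbit the
    semi-major axis comes out positive.
    """
    rx, ry = r
    vx, vy = v
    dist = math.hypot(rx, ry)
    speed_sq = vx * vx + vy * vy
    # Vis-viva: v^2 = mu * (2/r - 1/a), solved for a.
    a = 1 / (2 / dist - speed_sq / mu)
    # Eccentricity vector: ((v^2 - mu/r) r - (r . v) v) / mu.
    r_dot_v = rx * vx + ry * vy
    ex = ((speed_sq - mu / dist) * rx - r_dot_v * vx) / mu
    ey = ((speed_sq - mu / dist) * ry - r_dot_v * vy) / mu
    return a, math.hypot(ex, ey)


print(osculating_ellipse((1.0, 0.0), (0.0, 1.0), 1.0))  # (1.0, 0.0): a circle
```

Recomputing this as the real, perturbed orbit goes along gives a slowly changing sequence of ellipses, each one kissing the true path at that moment.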

That osculating circles have practical uses doesn’t mean they aren’t respectable mathematics. I’ll concede they’re not used as much as polynomials or sine curves are. I suppose that’s because polynomials and sine curves have nicer derivatives than circles do. But osculating circles do turn up as ways to try solving nonlinear differential equations. We need the help. Linear differential equations anyone can solve. Nonlinear differential equations are pretty much impossible. They also turn up in signal processing, as ways to find the frequencies of a signal from a sampling of data. This, too, we would like to know.

We get the name “osculating circle” from Gottfried Wilhelm Leibniz. This might not surprise. Finding easy-to-understand shapes that approximate interesting shapes is why we have calculus. Isaac Newton described a way of making them in the Principia Mathematica. This also might not surprise. Of course they would on this subject come so close together without kissing.

## From my Second A-to-Z: Orthonormal

For early 2016 — dubbed “Leap Day 2016” as that’s when it started — I got a request to explain orthogonal. I went in a different direction, although not completely different. This essay does get a bit more into specifics of how mathematicians use the idea, like, showing some calculations and such. I put in a casual description of vectors here. For book publication I’d want to rewrite that to be clearer that, like, ordered sets of numbers are just one (very common) way to represent vectors.

Jacob Kanev had requested “orthogonal” for this glossary. I’d be happy to oblige. But I used the word in last summer’s Mathematics A To Z. And I admit I’m tempted to just reprint that essay, since it would save some needed time. But I can do something more.

## Orthonormal.

“Orthogonal” is another word for “perpendicular”. Mathematicians use it for reasons I’m not precisely sure of. My belief is that it’s because “perpendicular” sounds like we’re talking about directions. And we want to extend the idea to things that aren’t necessarily directions. As majors, mathematicians learn orthogonality for vectors, things pointing in different directions. Then we extend it to other ideas. To functions, particularly, but we can also define it for spaces and for other stuff.

I was vague, last summer, about how we do that. We do it by creating a function called the “inner product”. That takes in two of whatever things we’re measuring and gives us a real number. If the inner product of two things is zero, then the two things are orthogonal.

The first example mathematics majors learn of this, before they even hear the words “inner product”, is the dot product. It’s for vectors, ordered sets of numbers. We find the dot product by matching up numbers in the corresponding slots for the two vectors, multiplying them together, and then adding up the products. For example. Give me the vector with values (1, 2, 3), and the other vector with values (-6, 5, -4). The inner product will be 1 times -6 (which is -6) plus 2 times 5 (which is 10) plus 3 times -4 (which is -12). So that’s -6 + 10 - 12, or -8.

So those vectors aren’t orthogonal. But how about the vectors (1, -1, 0) and (0, 0, 1)? Their dot product is 1 times 0 (which is 0) plus -1 times 0 (which is 0) plus 0 times 1 (which is 0). That adds up to 0, so the vectors are perpendicular. And if you tried drawing this you’d see, yeah, they are. The first vector we’d draw as being inside a flat plane, and the second vector as pointing up, through that plane, like a thumbtack.
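The slot-by-slot recipe is short enough to write out. A minimal Python sketch (the function name `dot` is my own), checked against both pairs of vectors from the last two paragraphs:

```python
def dot(u, v):
    """Dot product: multiply matching slots and add up the products."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of slots")
    return sum(a * b for a, b in zip(u, v))


print(dot((1, 2, 3), (-6, 5, -4)))  # -8, so not orthogonal
print(dot((1, -1, 0), (0, 0, 1)))   # 0, so orthogonal
```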

Well … the inner product can tell us something besides orthogonality. What happens if we take the inner product of a vector with itself? Say, (1, 2, 3) with itself? That’s going to be 1 times 1 (which is 1) plus 2 times 2 (4, according to rumor) plus 3 times 3 (which is 9). That’s 14, a tidy sum, although, so what?

The inner product of (-6, 5, -4) with itself? Oh, that’s some ugly numbers. Let’s skip it. How about the inner product of (1, -1, 0) with itself? That’ll be 1 times 1 (which is 1) plus -1 times -1 (which is positive 1) plus 0 times 0 (which is 0). That adds up to 2. And now, wait a minute. This might be something.

Start from somewhere. Move 1 unit to the east. (Don’t care what the unit is. Inches, kilometers, astronomical units, anything.) Then move -1 units to the north, or like normal people would say, 1 unit to the south. How far are you from the starting point? … Well, you’re the square root of 2 units away.

Now imagine starting from somewhere and moving 1 unit east, and then 2 units north, and then 3 units straight up, because you found a convenient elevator. How far are you from the starting point? This may take a moment of fiddling around with the Pythagorean theorem. But you’re the square root of 14 units away.

And what the heck, (0, 0, 1). The inner product of that with itself is 0 times 0 (which is zero) plus 0 times 0 (still zero) plus 1 times 1 (which is 1). That adds up to 1. And, yeah, if we go one unit straight up, we’re one unit away from where we started.

The inner product of a vector with itself gives us the square of the vector’s length. At least if we aren’t using some freak definition of inner products and lengths and vectors. And this is great! It means we can talk about the length — maybe better to say the size — of things that maybe don’t have obvious sizes.

Some stuff will have convenient sizes. For example, they’ll have size 1. The vector (0, 0, 1) was one such. So is (1, 0, 0). And you can think of another example easily. Yes, it’s $\left(\frac{1}{\sqrt{2}}, -\frac{1}{2}, \frac{1}{2}\right)$. (Go ahead, check!)
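Putting the last few paragraphs together: a vector's length is the square root of its inner product with itself. A Python sketch (helper names mine) that redoes the arithmetic above, including the check on $\left(\frac{1}{\sqrt{2}}, -\frac{1}{2}, \frac{1}{2}\right)$:

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def length(u):
    """Length of a vector: the square root of its inner product with itself."""
    return math.sqrt(dot(u, u))


print(dot((1, 2, 3), (1, 2, 3)))                  # 14, so the length is sqrt(14)
print(length((1, -1, 0)))                         # about 1.414, the square root of 2
print(length((0, 0, 1)))                          # 1.0
print(length((1 / math.sqrt(2), -1 / 2, 1 / 2)))  # 1, up to rounding
```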

So by “orthonormal” we mean a collection of things that are orthogonal to each other, and that themselves are all of size 1. It’s a description of both what things are by themselves and how they relate to one another. A thing can’t be orthonormal by itself, for the same reason a line can’t be perpendicular to nothing in particular. But a pair of things might be orthogonal, and they might be the right length to be orthonormal too.
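Checking a collection for orthonormality takes two kinds of inner product: each vector with itself, and each pair with each other. A minimal Python sketch, assuming vectors are tuples of numbers and allowing a small tolerance for floating-point rounding (names are my own):

```python
import math
from itertools import combinations


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_orthonormal(vectors, tol=1e-9):
    """True if every vector has length 1 and every pair has dot product 0.

    tol allows for floating-point rounding in entries like 1/sqrt(2).
    """
    if any(abs(dot(u, u) - 1) > tol for u in vectors):
        return False
    return all(abs(dot(u, v)) <= tol for u, v in combinations(vectors, 2))


s = 1 / math.sqrt(2)
print(is_orthonormal([(1, 0, 0), (0, 0, 1)]))    # True
print(is_orthonormal([(1, -1, 0), (0, 0, 1)]))   # False: orthogonal, but too long
print(is_orthonormal([(s, -s, 0), (0, 0, 1)]))   # True once rescaled
```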

Why do this? Well, the same reasons we always do this. We can impose something like direction onto a problem. We might be able to break up a problem into simpler problems, one in each direction. We might at least be able to simplify the ways different directions are entangled. We might be able to write a problem’s solution as the sum of solutions to a standard set of representative simple problems. This one turns up all the time. And an orthogonal set of something is often a really good choice of a standard set of representative problems.
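One reason orthonormal sets make such a good standard set: how much of each standard piece you need is just one inner product, with no system of equations to solve. A Python sketch, using an orthonormal set I picked for the example (any orthonormal basis would do):

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


s = 1 / math.sqrt(2)
# An orthonormal basis for three-dimensional space, chosen for illustration.
basis = [(s, s, 0.0), (s, -s, 0.0), (0.0, 0.0, 1.0)]

target = (1.0, 2.0, 3.0)
# With an orthonormal basis, each coefficient is a single inner product.
coefficients = [dot(target, e) for e in basis]

# Add the scaled basis vectors back up to recover the target.
rebuilt = [sum(c * e[i] for c, e in zip(coefficients, basis)) for i in range(3)]
print(coefficients)  # about [2.121, -0.707, 3.0]
print(rebuilt)       # about [1.0, 2.0, 3.0]
```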

This sort of thing turns up a lot when solving differential equations. And those often turn up when we want to describe things that happen in the real world. So a good number of mathematicians develop a habit of looking for orthonormal sets.

## From my First A-to-Z: Orthogonal

I haven’t had the space yet to finish my Little 2021 A-to-Z, so let me resume playing the hits of past ones. For my first, Summer 2015, one, I picked all the topics myself. This one, Orthogonal, I remember as one of the challenging ones. The challenge was the question put in the first paragraph: why do we have this term, which is so nearly a synonym for “perpendicular”? I didn’t find an answer, then, or since. But I was able to think about how we use “orthogonal” and what it might do that “perpendicular” doesn’t.

## Orthogonal.

Orthogonal is another word for perpendicular. So why do we need another word for that?

It helps to think about why “perpendicular” is a useful way to organize things. For example, we can describe the directions to a place in terms of how far it is north-south and how far it is east-west, and talk about how fast it’s travelling in terms of its speed heading north or south and its speed heading east or west. We can separate the north-south motion from the east-west motion. If we’re lucky these motions separate entirely, and we turn a complicated two- or three-dimensional problem into two or three simpler problems. If they can’t be fully separated, they can often be largely separated. We turn a complicated problem into a set of simpler problems with a nice and easy part plus an annoying yet small hard part.

And this is why we like perpendicular directions. We can often turn a problem into several simpler ones describing each direction separately, or nearly so.

And now the amazing thing. We can separate these motions because the north-south and the east-west directions are at right angles to one another. But we can describe something that works like an angle between things that aren’t necessarily directions. For example, we can describe an angle between things like functions that have the same domain. And once we can describe the angle between two functions, we can describe functions that make right angles between each other.

This means we can describe functions as being perpendicular to one another. An example. On the domain of real numbers from -1 to 1, the function $f(x) = x$ is perpendicular to the function $g(x) = x^2$. That’s because, with the usual inner product for functions (the integral of their product over the domain), we get $\int_{-1}^{1} x \cdot x^2 \, dx = \int_{-1}^{1} x^3 \, dx = 0$. And when we want to study a more complicated function we can separate the part that’s in the “direction” of f(x) from the part that’s in the “direction” of g(x). We can treat functions, even functions we don’t know, as if they were locations in space. And we can study and even solve for the different parts of the function as if we were pinning down the north-south and the east-west movements of a thing.
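To check the $f(x) = x$, $g(x) = x^2$ example numerically, here is a Python sketch using the usual inner product for real functions on an interval, the integral of their product, approximated with the midpoint rule. The helper name and the number of slices are my own choices.

```python
def inner_product(f, g, a=-1.0, b=1.0, slices=10_000):
    """Approximate the integral of f(x) * g(x) from a to b by the midpoint rule."""
    h = (b - a) / slices
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h)
                   for i in range(slices))


def f(x):
    return x


def g(x):
    return x * x


print(inner_product(f, g))  # essentially 0: orthogonal on [-1, 1]
print(inner_product(f, f))  # about 2/3, so f is not orthogonal to itself
```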

So if we want to study, say, how heat flows through a body, we can work out a series of “directions” for functions, and work out the flow in each of those “directions”. These don’t have anything to do with left-right or up-down directions, but the concepts and the convenience are similar.
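Here is a small sketch of that idea, under assumptions I'm choosing for the example: a rod stretched from 0 to π, its ends held at temperature 0, and units picked so the heat equation reads $u_t = u_{xx}$. The "directions" are the functions $\sin(nx)$, which are orthogonal on that interval. Each one's share of the starting temperature comes from an inner product, and each share then decays on its own, by the factor $e^{-n^2 t}$. The function names and the starting profile are hypothetical.

```python
import math


def sine_coefficients(initial, n_terms, slices=2000):
    """Share of each direction sin(n x), n = 1..n_terms, in the starting profile.

    b_n = (2 / pi) * integral from 0 to pi of initial(x) * sin(n x) dx,
    approximated with the midpoint rule.
    """
    h = math.pi / slices
    xs = [(i + 0.5) * h for i in range(slices)]
    return [2 / math.pi * h * sum(initial(x) * math.sin(n * x) for x in xs)
            for n in range(1, n_terms + 1)]


def temperature(coefficients, x, t):
    """Sum the directions back up, each decayed by exp(-n^2 t)."""
    return sum(b * math.exp(-n * n * t) * math.sin(n * x)
               for n, b in enumerate(coefficients, start=1))


def start(x):
    return math.sin(x) + 0.5 * math.sin(3 * x)


coeffs = sine_coefficients(start, 5)
print(coeffs)                                 # about [1, 0, 0.5, 0, 0]
print(temperature(coeffs, math.pi / 2, 0.0))  # about 0.5
print(temperature(coeffs, math.pi / 2, 1.0))  # about exp(-1) - 0.5 * exp(-9)
```

Each direction is handled separately, which is the convenience the paragraph describes: the sharper wiggles (larger n) fade much faster than the broad ones.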

We can define the “angle” between things for many kinds of mathematical structures, not just functions. Once we can do that, we can have “perpendicular” pairs of things. I’ve spoken only about functions, but that’s because functions are more familiar than many of the mathematical structures that have orthogonality.

Ah, but why call it “orthogonal” rather than “perpendicular”? And I don’t know. The best I can work out is that it feels weird to speak of, say, the cosine function being “perpendicular” to the sine function when you can’t really say either is in any particular direction. “Orthogonal” seems to appeal less directly to physical intuition while still meaning something. But that’s my guess, rather than the verdict of a skilled etymologist.

## 78 Pages and More of Arithmetic Trivia About 2022

2022 is a new and, we all hope, less brutal year. It is also a number, though, an integer. And every integer has some interesting things about it. Iva Sallay, of the Find The Factors recreational mathematics blog, assembled an awesome list of trivia about the number. This includes a bunch of tweets about the number’s interesting mathematical properties. At least some of them are sure to surprise you.

If that is not enough, then, please consider something which Christian Lawson-Perfect noted on Mathstodon. It is 78 pages titled Mathematical Beauty of 2022, by Dr Inder J Taneja. If the name sounds faintly familiar it might be that I’ve mentioned Taneja’s work before, in recreational arithmetic projects.

None of this trivia may matter. But there is some value in finding cute and silly things. Verifying, or discovering, cute trivia about a number helps you learn how to spot patterns and learn to look for new ones. And it’s good to play some.

## From my Seventh A-to-Z: Tiling (the accidental remake)

For the 2020 A-to-Z I took the suggestion to write about tiling. It’s a fun field with many interesting wrinkles. And I realized after publishing that I had already written about Tiling, just two years before. There was no scrambling together a replacement essay, so I had to let it stand as is.

The accidental remake allows for some interesting studies, though. The two essays have very similar structures, which probably reflects that I came to both essays with similar rough ideas what to write, and went to similar sources to fill in details. The second essay turned out longer. Also, I think, better. I did a bit more tracking down specifics, such as trying to find Hao Wang’s paper and see just what it says. And rewriting is often key to good writing. This offers lessons in preparing these essays for book publication.

Mr Wu, author of the Singapore Maths Tuition blog, had an interesting suggestion for the letter T: Talent. As in mathematical talent. It’s a fine topic but, in the end, too far beyond my skills. I could share some of the legends about mathematical talent I’ve received. But what that says about the culture of mathematicians is a deeper and more important question.

So I picked my own topic for the week. I do have topics for next week — U — and the week after — V — chosen. But the letters W and X? I’m still open to suggestions. I’m open to creative or wild-card interpretations of the letters. Especially for X and (soon) Z. Thanks for sharing any thoughts you care to.

## Tiling.

Think of a floor. Imagine you are bored. What do you notice?

What I hope you notice is that it is covered. Perhaps by carpet, or concrete, or something homogeneous like that. Let’s ignore that. My floor is covered in small pieces, repeated. My dining room floor is slats of wood, about three and a half feet long and two inches wide. The slats are offset from the neighbors so there’s a pleasant strong line in one direction and stippled lines in the other. The kitchen is squares, one foot on each side. This is a grid we could plot high school algebra functions on. The bathroom is more elaborate. It has white rectangles about two inches long, tan rectangles about two inches long, and black squares. Each rectangle is perpendicular to ones of the other color, and arranged to bisect those. The black squares fill the gaps where no rectangle would fit.

Move from my house to pure mathematics. It’s easy to turn the floor of a room into abstract mathematics. We start with something to tile. Usually this is the infinite, two-dimensional plane. The thing you get if you have a house and forget the walls. Sometimes we look to tile the hyperbolic plane, a different geometry that we of course represent with a finite circle. (Setting particular rules about how to measure distance makes this equivalent to a funny-shaped plane.) Or the surface of a sphere, or of a torus, or something like that. But if we don’t say otherwise, it’s the plane.

What to cover it with? … Smaller shapes. We have a mathematical tiling if we have a collection of non-overlapping open sets. And if those open sets, plus their boundaries, cover the whole plane. “Cover” here means what “cover” means in English, only using more technical words. These sets, these tiles, can be any shape. We can have as many or as few of them as we like. We can even add markings to the tiles, give them colors or patterns or such, to add variety to the puzzles.

(And if we want, we can do this in other dimensions. There are good “tiling” questions to ask about how to fill a three-dimensional space, or a four-dimensional one, or more.)

Having an unlimited collection of tiles is nice. But mathematicians learn to look for how little we need to do something. Here, we look for the smallest number of distinct shapes. As with tiling an actual floor, we can get all the tiles we need. We can rotate them, too, to any angle. We can flip them over and put the “top” side “down”, something kitchen tiles won’t let us do. Can we reflect them? Use the shape we’d get looking at the mirror image of one? That’s up to whoever’s writing this paper.

What shapes will work? Well, squares, for one. We can prove that by looking at a sheet of graph paper. Rectangles would work too. We can see that by drawing boxes around the squares on our graph paper. Two-by-one blocks, three-by-two blocks, 40-by-1 blocks, these all still cover the paper and we can imagine covering the plane. If we like, we can draw two-by-two squares. Squares made up of smaller squares. Or repeat this: draw two-by-one rectangles, and then group two of these rectangles together to make two-by-two squares.

We can take it on faith that, oh, rectangles π long by e wide would cover the plane too. These can all line up in rows and columns, the way our squares would. Or we can stagger them, like bricks or my dining room’s wood slats are.

How about parallelograms? Those, it turns out, tile exactly as well as rectangles or squares do. Grids or staggered, too. Ah, but how about trapezoids? Surely they won’t tile anything. Not generally, anyway. The slanted sides will, most of the time, only fit in weird winding circle-like paths.

Unless … take two of these trapezoid tiles. We’ll set them down so the parallel sides run horizontally in front of you. Rotate one of them, though, 180 degrees. And try setting them, let’s say, so the longer slanted sides of the two trapezoids meet, edge to edge. These two trapezoids come together. They make a parallelogram, although one with a slash through it. And we can tile parallelograms, whether or not they have a slash.

OK, but if you draw some weird quadrilateral shape, and it’s not anything that has a more specific name than “quadrilateral”? That won’t tile the plane, will it?

It will! In one of those turns that surprises and impresses me every time I run across it again, any quadrilateral can tile the plane. It opens up so many home decorating options, if you get in good with a tile maker.
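There's a standard construction behind this, and it can be spot-checked numerically. Turn a copy of the quadrilateral ABCD half a turn about the midpoint of side AB; the original and the copy together make a six-sided shape. Then slide that pair around by whole-number combinations of the diagonals AC and BD. The Python sketch below (function names and example shapes are mine) builds those pieces and counts, for random sample points, how many pieces contain each one. A count of exactly 1 everywhere is what a tiling should give. It's a sanity check on sample points, not a proof; one of the two example quadrilaterals is deliberately non-convex.

```python
import random


def contains(poly, p):
    """Ray-casting point-in-polygon test; handles non-convex simple polygons."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def quadrilateral_tiles(quad, reach):
    """Copies of quad and its half-turned twin, shifted by m*AC + n*BD."""
    a, b, c, d = quad
    sx, sy = a[0] + b[0], a[1] + b[1]           # twice the midpoint of AB
    twin = [(sx - x, sy - y) for x, y in quad]  # half turn about that midpoint
    ac = (c[0] - a[0], c[1] - a[1])
    bd = (d[0] - b[0], d[1] - b[1])
    tiles = []
    for m in range(-reach, reach + 1):
        for n in range(-reach, reach + 1):
            tx = m * ac[0] + n * bd[0]
            ty = m * ac[1] + n * bd[1]
            for shape in (quad, twin):
                tiles.append([(x + tx, y + ty) for x, y in shape])
    return tiles


def coverage_counts(quad, samples=200, box=3.0, reach=8, seed=1):
    """How many pieces contain each of some random points in a square box."""
    rng = random.Random(seed)
    tiles = quadrilateral_tiles(quad, reach)
    counts = []
    for _ in range(samples):
        p = (rng.uniform(-box, box), rng.uniform(-box, box))
        counts.append(sum(contains(t, p) for t in tiles))
    return counts


convex = [(0, 0), (2, 0), (3, 2), (0, 1)]
arrowhead = [(0, 0), (4, 0), (1, 1), (0, 4)]  # non-convex: dented in at (1, 1)
print(set(coverage_counts(convex)))     # {1}
print(set(coverage_counts(arrowhead)))  # {1}
```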

That’s some good news for quadrilateral tiles. How about other shapes? Triangles, for example? Well, that’s good news too. Take two copies of any triangle you like. Turn one of them around and match sides of the same length. The two triangles, bundled together like that, are a quadrilateral. And we can use any quadrilateral to tile the plane, so we’re done.

How about pentagons? … With pentagons, the easy times stop. It turns out not every pentagon will tile the plane. The pentagon has to be of the right kind to make it fit. If the pentagon is in one of these kinds, it can tile the plane. If not, not. Fifteen families of convex pentagons that tile the plane are known. The most recent family was discovered in 2015. It’s thought that there are no other convex pentagon tilings. I don’t know whether the proof of that is generally accepted in tiling circles. And we can do more tilings if the pentagon doesn’t need to be convex. For example, we can cut any parallelogram into two identical pentagons. So we can make as many pentagon tilings as we want. But we can’t assume any pentagon we like will do it.

Hexagons look promising. First, a regular hexagon tiles the plane, as strategy games know. There are also at least three families of irregular hexagons that we know can tile the plane.

And there the good times end. There are no convex heptagons, or octagons, or convex polygons with any more sides, that tile the plane.

Not by themselves, anyway. If we have more than one tile shape we can start doing fine things again. Octagons assisted by squares, for example, will tile the plane. I’ve lived places with that tiling. Or something that looks like it. It’s easier to install if you have square tiles with an octagon pattern making up the center, and triangle corners a different color. These squares come together to look like octagons and squares.

And this leads to a fun avenue of tiling. Hao Wang, in the early 60s, proposed a sort of domino-like tiling. You may have seen these in mathematics puzzles, or in toys. Each of these Wang Tiles, or Wang Dominoes, is a square. But the square is cut along the diagonals, into four quadrants. Each quadrant is a right triangle. Each quadrant, each triangle, is one of a finite set of colors. Adjacent triangles can have the same color. You can place down tiles, subject only to the rule that the tile edge has to have the same color on both sides. So a tile with a blue right-quadrant has to have on its right a tile with a blue left-quadrant. A tile with a white upper-quadrant on its top has, above it, a tile with a white lower-quadrant.
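The matching rule is easy to put into code. Here is a minimal Python sketch, with each tile written as its four edge colors (north, east, south, west), tiles used without rotating them (the usual convention for Wang tiles), and a plain backtracking search that tries to fill a finite rectangle. Succeeding on a rectangle doesn't prove the whole plane can be tiled, though failing does show it can't. The names and example tile sets are mine.

```python
def fits(grid, row, col, tile):
    """Check a tile against its already-placed neighbors above and to the left."""
    north, east, south, west = tile
    if row > 0 and grid[row - 1][col][2] != north:  # upper tile's south edge
        return False
    if col > 0 and grid[row][col - 1][1] != west:   # left tile's east edge
        return False
    return True


def fill_rectangle(tiles, rows, cols):
    """Backtracking search for a rows-by-cols patch obeying the Wang rule.

    Returns the grid of tiles, or None if no arrangement exists. Tiles are
    placed left to right, top to bottom, so only the neighbors above and to
    the left need checking at each step.
    """
    grid = [[None] * cols for _ in range(rows)]

    def place(k):
        if k == rows * cols:
            return True
        row, col = divmod(k, cols)
        for tile in tiles:
            if fits(grid, row, col, tile):
                grid[row][col] = tile
                if place(k + 1):
                    return True
                grid[row][col] = None
        return False

    return grid if place(0) else None


# Tiles as (north, east, south, west) colors.
plain = [("blue", "white", "blue", "white")]
stubborn = [("blue", "white", "red", "white")]  # its south never meets a north

print(fill_rectangle(plain, 3, 3) is not None)     # True
print(fill_rectangle(stubborn, 1, 4) is not None)  # True: one row is fine
print(fill_rectangle(stubborn, 2, 2) is not None)  # False: rows can't stack
```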

In 1961 Wang conjectured that if a finite set of these tiles will tile the plane, then there must be a periodic tiling. That is, if you picked up the plane and slid it a set horizontal and vertical distance, it would all look the same again. This sort of translation is common. All my floors do that. If we ignore things like the bounds of their rooms, or the flaws in their manufacture or installation or where a tile broke in some mishap.

This is not to say you couldn’t arrange them aperiodically. You don’t even need Wang Tiles for that. Get two colors of square tile, a white and a black, and lay them down based on whether the next decimal digit of π is odd or even. No; Wang’s conjecture was that if you had tiles that you could lay down aperiodically, then you could also arrange them to set down periodically. With the black and white squares, lay down alternate colors. That’s easy.

In 1964, Robert Berger proved Wang’s conjecture was false. He found a collection of Wang Tiles that could only tile the plane aperiodically. In 1966 he published this in the Memoirs of the American Mathematical Society. The 1964 proof was for his thesis. 1966 was its general publication. I mention this because while doing research I got irritated at how different sources dated this to 1964, 1966, or sometimes 1961. I want to have this straightened out. It appears Berger had the proof in 1964 and the publication in 1966.

I would like to share details of Berger’s proof, but haven’t got access to the paper. What fascinates me about this is that Berger’s proof used a set of 20,426 different tiles. I assume he did not work this all out with shards of construction paper, but then, how to get 20,426 of anything? With computer time as expensive as it was in 1964? The mystery of how he got all these tiles is worth an essay of its own, and I regret I can’t write it.

Berger conjectured that a smaller set might do. Quite so. He himself reduced the set to 104 tiles. Donald Knuth in 1968 modified the set down to 92 tiles. In 2015 Emmanuel Jeandel and Michael Rao published a set of 11 tiles, using four colors. They also showed by computer search that no smaller set of tiles, and no set using fewer colors, can tile the plane only aperiodically. I do not know whether there might be other sets of 11, four-colored, tiles that work. You can see the set at the top of Wikipedia’s page on Wang Tiles.

These Wang Tiles, all squares, inspired variant questions. Could there be other shapes that only aperiodically tile the plane? What if they don’t have to be squares? Raphael Robinson, in 1971, came up with a tiling using six shapes. The shapes have patterns on them too, usually represented as colored lines. Tiles can be put down only in ways that fit and that make the lines match up.

Among my readers are people who have been waiting, for 1800 words now, for Roger Penrose. It’s now that time. In 1974 Penrose published an aperiodic tiling, one based on pentagons and using a set of six tiles. You’ve never heard of that either, because soon after he found a different set, based on a quadrilateral cut into two shapes. The shapes, as with Wang Tiles or Robinson’s tiling, have rules about what edges may be put against each other. Penrose, and independently Robert Ammann, also developed another set, this based on a pair of rhombuses. These have rules about what edges may touch one another, and have patterns on them which must line up.

The Penrose tiling became, and stayed, famous. (Ammann, an amateur, never had much to do with the mathematics community. He died in 1994.) Martin Gardner publicized it, and it leapt out of mathematicians’ hands into the popular culture. At least a bit. That it could give you nice-looking floors must have helped.

To show that the rhombus-based Penrose tiling is aperiodic takes some arguing. But it uses tools already used in this essay. Remember drawing rectangles around several squares? And then drawing squares around several of these rectangles? We can do that with these Penrose-Ammann rhombuses. From the rhombus tiling we can draw bigger rhombuses. Ones which, it turns out, follow the same edge rules that the originals do. So that we can go again, grouping these bigger rhombuses into even-bigger rhombuses. And into even-even-bigger rhombuses. And so on.

What this gets us is this: suppose the rhombus tiling is periodic. Then there’s some finite-distance horizontal-and-vertical move that leaves the pattern unchanged. So, the same finite-distance move has to leave the bigger-rhombus pattern unchanged. And this same finite-distance move has to leave the even-bigger-rhombus pattern unchanged. Also the even-even-bigger pattern unchanged.

Keep bundling rhombuses together. You get eventually-big-enough-rhombuses. Now, think of how far you have to move the tiles to get a repeat pattern. Especially, think how many eventually-big-enough-rhombuses it is. This distance, the move you have to make, is less than one eventually-big-enough rhombus. (If it’s not you aren’t eventually-big-enough yet. Bundle them together again.) And that doesn’t work. Moving one tile over without changing the pattern makes sense. Moving one-half a tile over? That doesn’t. So the eventually-big-enough pattern can’t be periodic, and so, the original pattern can’t be either. This is explained in graphic detail in a nice Powerpoint slide set from Professor Alexander F Ritter, A Tour Of Tilings In Thirty Minutes.

It’s possible to do better. In 2010 Joshua E S Socolar and Joan M Taylor published a single tile that can force an aperiodic tiling. As with the Wang Tiles, and Robinson shapes, and the Penrose-Ammann rhombuses, markings are part of it. They have to line up so that the markings — in two colors, in the renditions I’ve seen — make sense. With the Penrose tilings, you can get away from the pattern rules for the edges by replacing them with little notches. The Socolar-Taylor shape can make a similar trade. Here the rules are complex enough that it would need to be a three-dimensional shape, one that looks like the dilithium housing of the warp core. You can see the tile — in colored, marked form, and also in three-dimensional tile shape — at the PDF here. It’s likely not coming to the flooring store soon.

It’s all wonderful, but is it useful? I could go on a few hundred words about, particularly, crystals and quasicrystals. These are important for materials science. Especially these days as we have harnessed slightly-imperfect crystals to be our computers. I don’t care. These are lovely to look at. If you see nothing appealing in a great heap of colors and polygons spread over the floor there are things we cannot communicate about. Tiling is a delight; what more do you need?

Thanks for your attention. This and all of my 2020 A-to-Z essays should be at this link. All the essays from every A-to-Z series should be at this link. See you next week, I hope.

## From my Sixth A-to-Z: Taylor Series

By the time of 2019 and my sixth A-to-Z series, I had some standard narrative tricks I could deploy. My insistence that everything is polynomials, for example. Anecdotes from my slight academic career. A prose style that emphasizes what we do with the idea of something rather than instructions. That last comes from the idea that if you wanted to know how to compute a Taylor series you’d just look it up on Mathworld or Wikipedia or whatnot. The thing a pop mathematics blog can do is give some reason that you’d want to know how to compute a Taylor series. I regret talking about functions that break Taylor series, though. I have to treat these essays as introducing the idea of a Taylor series to someone who doesn’t know anything about them. And it’s bad form to teach how stuff doesn’t work too close to teaching how it does work. Readers tend to blur what works and what doesn’t together. Still, $f(x) = \exp(-\frac{1}{x^2})$ is a really neat weird function and it’d be a shame to let it go completely unmentioned.

Today’s A To Z term was nominated by APMA, author of the Everybody Makes DATA blog. It was a topic that delighted me to realize I could explain. Then it started to torment me as I realized there is a lot to explain here, and I had to pick something. So here’s where things ended up.

# Taylor Series.

In the mid-2000s I was teaching at a department being closed down. In its last semester I had to teach Computational Quantum Mechanics. The person who’d normally taught it had transferred to another department. But a few last majors wanted the old department’s version of the course, and this pressed me into the role. Teaching a course you don’t really know is a rush. It’s a semester of learning, and trying to think deeply enough that you can convey something to students. This while all the regular demands of the semester eat your time and working energy. And this in the leap of faith that the syllabus you made up, before you truly knew the subject, will be nearly enough right. And that you have not committed to teaching something you do not understand.

So around mid-course I realized I needed to explain finding the wave function for a hydrogen atom with two electrons. The wave function is this probability distribution. You use it to find things like the probability a particle is in a certain area, or has a certain momentum. Things like that. A proton with one electron is as much as I’d ever done, as a physics major. We treat the proton as the center of the universe, immobile, and the electron hovers around that somewhere. Two electrons, though? A thing repelling your electron, and repelled by your electron, and neither of those having fixed positions? What the mathematics of that must look like terrified me. When I couldn’t procrastinate it farther I accepted my doom and read exactly what it was I should do.

It turned out I had known what I needed for nearly twenty years already. Got it in high school.

Of course I’m discussing Taylor Series. The equations were loaded down with symbols, yes. But at its core, the important stuff, was this old and trusted friend.

The premise behind a Taylor Series is even older than that. It’s universal. If you want to do something complicated, try doing the simplest thing that looks at all like it. And then make that a little bit more like you want. And then a bit more. Keep making these little improvements until you’ve got it as right as you truly need. Put that vaguely, the idea describes Taylor series just as well as it describes making a video game or painting a state portrait. We can make it more specific, though.

A series, in this context, means the sum of a sequence of things. This can be finitely many things. It can be infinitely many things. If the sum makes sense, we say the series converges. If the sum doesn’t, we say the series diverges. When we first learn about series, the sequences are all numbers. $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$, for example, which diverges. (It adds to a number bigger than any finite number.) Or $1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots$, which converges. (It adds to $\frac{1}{6}\pi^2$.)
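Those two example series are easy to poke at numerically. This is a small sketch with function names of my own choosing; it only adds up finitely many terms, so it can hint at convergence or divergence but never prove either. The first sum creeps upward forever, roughly like the natural logarithm of the number of terms. The second settles toward $\frac{1}{6}\pi^2$.

```python
import math

def partial_sum(term, n_terms):
    """term(1) + term(2) + ... + term(n_terms)."""
    return sum(term(k) for k in range(1, n_terms + 1))

def harmonic(k):
    return 1 / k        # 1 + 1/2 + 1/3 + ...: diverges, but slowly

def basel(k):
    return 1 / k**2     # 1 + 1/2^2 + 1/3^2 + ...: converges to pi^2/6

for n in (10, 1_000, 100_000):
    print(n, partial_sum(harmonic, n), partial_sum(basel, n))
print("pi^2/6 =", math.pi**2 / 6)
```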

In a Taylor Series, the terms are all polynomials. They’re simple polynomials. Let me call the independent variable ‘x’. Sometimes it’s ‘z’, for the reasons you would expect. (‘x’ usually implies we’re looking at real-valued functions. ‘z’ usually implies we’re looking at complex-valued functions. ‘t’ implies it’s a real-valued function with an independent variable that represents time.) Each of these terms is simple. Each term is the distance between x and a reference point, raised to a whole power, and multiplied by some coefficient. The reference point is the same for every term. What makes this potent is that we use, potentially, many terms. Infinitely many terms, if need be.

Call the reference point ‘a’. Or if you prefer, $x_0$. $z_0$ if you want to work with z’s. You see the pattern. This ‘a’ is the “point of expansion”. The coefficients of each term depend on the original function at the point of expansion. The coefficient for the term that has $(x - a)$ is the first derivative of f, evaluated at a. The coefficient for the term that has $(x - a)^2$ is the second derivative of f, evaluated at a (times a number that’s the same for the squared-term for every Taylor Series). The coefficient for the term that has $(x - a)^3$ is the third derivative of f, evaluated at a (times a different number that’s the same for the cubed-term for every Taylor Series).
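For anyone who wants those “numbers that are the same for every Taylor Series” spelled out, here is the standard formula. The constant term is just the function’s value at the point of expansion, and the extra numbers are $\frac{1}{2!}$, $\frac{1}{3!}$, and in general $\frac{1}{n!}$. The equality holds wherever the series actually converges to f, which is not guaranteed.

$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x - a)^n$$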

You’ll never guess what the coefficient for the term with $(x - a)^{122{,}743}$ is. Nor will you ever care. The only reason you would wish to is to answer an exam question. The instructor will, in that case, have a function that’s either the sine or the cosine of x. The point of expansion will be 0, $\frac{\pi}{2}$, $\pi$, or $\frac{3\pi}{2}$.

Otherwise you will trust that this is one of the terms of $(x - a)^n$, ‘n’ representing some counting number too great to be interesting. All the interesting work will be done with the Taylor series either truncated to a couple terms, or continued on to infinitely many.

What a Taylor series offers is the chance to approximate a function we’re genuinely interested in with a polynomial. This is worth doing, usually, because polynomials are easier to work with. They have nice analytic properties. We can automate taking their derivatives and integrals. We can set a computer to calculate their value at some point, if we need that. We might have no idea how to start calculating the logarithm of 1.3. We certainly have an idea how to start calculating $0.3 - \frac{1}{2}(0.3^2) + \frac{1}{3}(0.3^3)$. (Where does the 0.3 come from? It’s 1.3 minus 1, because I’m using a Taylor series with a = 1 as the point of expansion. Those three terms add up to 0.264, already close to the true value of about 0.2624.)
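Here is that calculation as a short sketch. The function name `log_taylor` is a hypothetical one of my own; it uses the familiar Taylor series of the logarithm about a = 1, which alternates in sign and divides the k-th power by k. With three terms it gives the 0.264 above, and with more terms it closes in on the library’s value.

```python
import math

def log_taylor(x, n_terms):
    """Taylor polynomial of ln(x) about a = 1, using n_terms terms.

    ln(x) = (x-1) - (x-1)^2/2 + (x-1)^3/3 - ...
    """
    h = x - 1  # distance from the point of expansion a = 1
    return sum((-1) ** (k + 1) * h**k / k for k in range(1, n_terms + 1))

print(log_taylor(1.3, 3))    # about 0.264
print(log_taylor(1.3, 20))   # much closer to the true value
print(math.log(1.3))         # about 0.26236
```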

The first couple terms tell us interesting things. Especially if we’re looking at a function that represents something physical. The first two terms tell us where an equilibrium might be. The next term tells us whether an equilibrium is stable or not. If it is stable, it tells us how perturbations, points near the equilibrium, behave.

The first couple terms will describe a line, or a quadratic, or a cubic, some simple function like that. Usually adding more terms will make this Taylor series approximation a better fit to the original. There might be a larger region where the polynomial and the original function are close enough. Or the polynomial and the original function will be closer together on the same old region.

We would really like that region to eventually grow to the whole domain of the original function. We can’t count on that, though. Roughly, the interval of convergence will stretch from ‘a’ to wherever the first weird thing happens. Weird things are, like, discontinuities. Vertical asymptotes. Anything you don’t like dealing with in the original function, the Taylor series will refuse to deal with. Outside that interval, the Taylor series diverges and we just can’t use it for anything meaningful. Which is almost supernaturally weird of them. The Taylor series uses information about the original function, but it’s all derivatives at a single point. Somehow the derivatives of, say, the logarithm of x around x = 1 give a hint that the logarithm of 0 is undefinable. And so they won’t help us calculate the logarithm of 3.
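A quick numerical look at that, using the same logarithm series about a = 1 (an illustration, not a proof). At x = 1.5, inside the interval of convergence, the partial sums settle down to the logarithm. At x = 3, outside it, they swing back and forth and grow without bound.

```python
import math

def log_taylor(x, n_terms):
    """Taylor polynomial of ln(x) about a = 1, using n_terms terms."""
    h = x - 1
    return sum((-1) ** (k + 1) * h**k / k for k in range(1, n_terms + 1))

for n in (5, 10, 20, 40):
    print(n, log_taylor(1.5, n), log_taylor(3.0, n))
print("ln(1.5) =", math.log(1.5), " ln(3) =", math.log(3.0))
```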

Things can be weirder. There are functions that just break Taylor series altogether. Some are obvious. A function needs lots of derivatives at a point to have a good Taylor series approximation. So, many fractal curves won’t have a Taylor series approximation. These curves are all corners, points where they aren’t continuous or where derivatives don’t exist. Some are obviously designed to break Taylor series approximations. We can make a function that follows different rules if x is rational than if x is irrational. There’s no approximating that, and you’d blame the person who made such a function, not the Taylor series. It can be subtle. The function defined by the rule $f(x) = \exp(-\frac{1}{x^2})$, with the note that if x is zero then f(x) is 0, seems to satisfy everything we’d look for. It’s a function that’s mostly near 1, that drops down to being near zero around where x = 0. But its Taylor series expansion around a = 0 is a horizontal line always at 0. That series converges everywhere; it just matches the original function only at x = 0. And some other functions have Taylor series that are worse still: the interval of convergence can be a single point, challenging our idea of what an interval is.
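One way to feel why every derivative of that function at 0 comes out zero: near 0 it shrinks faster than any power of x, so dividing by $x^n$ cannot rescue it. This sketch just checks that numerically at one small x; it’s a hint, not a proof.

```python
import math

def f(x):
    """exp(-1/x^2), with f(0) defined to be 0."""
    return 0.0 if x == 0 else math.exp(-1.0 / x**2)

x = 0.05
for n in (1, 5, 10, 20):
    # Even divided by x^n, f(x) is still astronomically small near 0.
    print(n, f(x) / x**n)
print(f(2.0))   # about 0.78: away from 0 the function is nowhere near the zero line
```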

That’s all right. If we can trust that we’re avoiding weird parts, Taylor series give us an outstanding new tool. Grant that the Taylor series describes a function with the same rule as our original function. The Taylor series is often easier to work with, especially if we’re working on differential equations. We can automate, or at least find formulas for, taking the derivative of a polynomial. Or adding together derivatives of polynomials. Often we can attack a differential equation too hard to solve otherwise by supposing the answer is a polynomial. This is essentially what that quantum mechanics problem used, and why the tool was so familiar when I was in a strange land.

Roughly. What I was actually doing was treating the function I wanted as a power series. This is, like the Taylor series, the sum of a sequence of terms, all of which are $(x - a)^n$ times some coefficient. What makes it not a Taylor series is that the coefficients weren’t the derivatives of any function I knew to start. But the experience of Taylor series trained me to look at functions as things which could be approximated by polynomials.
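Here is a toy version of that trick, far simpler than the quantum mechanics problem. Suppose the solution of $y' = y$ with $y(0) = 1$ is a power series $\sum c_n x^n$. Substituting and matching powers of x gives the rule $(n + 1)c_{n+1} = c_n$. The sketch below, with function names of my own choosing, builds the coefficients from that rule alone and evaluates the series with Horner’s rule.

```python
import math

def power_series_coeffs(n_terms):
    """Coefficients c_0, c_1, ... for y' = y with y(0) = 1.

    Substituting y = sum c_n x^n into y' = y and matching powers of x
    gives (n + 1) c_{n+1} = c_n.
    """
    coeffs = [1.0]  # c_0 = y(0)
    for n in range(n_terms - 1):
        coeffs.append(coeffs[n] / (n + 1))
    return coeffs

def evaluate(coeffs, x):
    """Horner's rule: c_0 + x (c_1 + x (c_2 + ...))."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total

c = power_series_coeffs(20)
print(evaluate(c, 1.0), math.e)
```

The coefficients come out as $\frac{1}{n!}$, which is the Taylor series of $e^x$, though the recurrence never needed to know that.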

This gives us the hint to look at other series that approximate interesting functions. We get a host of these, with names like Laurent series and Fourier series and Chebyshev series and such. Laurent series look like Taylor series but we allow powers to be negative integers as well as positive ones. Fourier series do away with polynomials. They instead use trigonometric functions, sines and cosines. Chebyshev series build on polynomials, but not on pure powers. They’ll use orthogonal polynomials. These behave like perpendicular directions do. That orthogonality makes many numerical techniques behave better.
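To make “behave like perpendicular directions” concrete, here is a small sketch with Chebyshev polynomials. It builds them from the standard recurrence $T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)$ and measures an “inner product”, the integral of $T_m T_n / \sqrt{1 - x^2}$ over $[-1, 1]$, using Gauss-Chebyshev quadrature. The quadrature choice and the function names are mine. Different polynomials give zero, like perpendicular arrows; a polynomial against itself does not.

```python
import math

def chebyshev_T(n, x):
    """T_n(x) via T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

def inner_product(m, n, nodes=32):
    """Integral of T_m T_n / sqrt(1 - x^2) on [-1, 1] by Gauss-Chebyshev quadrature.

    Exact up to rounding whenever m + n < 2 * nodes.
    """
    total = 0.0
    for k in range(1, nodes + 1):
        x = math.cos((2 * k - 1) * math.pi / (2 * nodes))
        total += chebyshev_T(m, x) * chebyshev_T(n, x)
    return math.pi / nodes * total

print(inner_product(2, 3))  # close to 0: "perpendicular"
print(inner_product(3, 3))  # close to pi/2
```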

The Taylor series is a great introduction to these tools. Its first several terms have good physical interpretations. Its calculation requires tools we learn early on in calculus. The habits of thought it teaches guide us even in unfamiliar territory.

And I feel very relieved to be done with this. I often have a few false starts to an essay, but those are mostly before I commit words to text editor. This one had about four branches that now sit in my scrap file. I’m glad to have a deadline forcing me to just publish already.

Thank you, though. This and the essays for the Fall 2019 A to Z should be at this link. Next week: the letters U and V. And all past A to Z essays ought to be at this link.

## From my Fifth A-to-Z: Tiling (the first time)

I keep saying in picking A-to-Z topics that just because I don’t take a suggestion now doesn’t mean I won’t in the future. 2018’s A-to-Z, I notice, includes Mr Wu’s suggestion of “torus”. I didn’t take it then, but did get to it in this year’s little project. I’m glad to have the proof my word is good. I have sometimes thought I might fill a gap in my inspiration by taking topics I hadn’t used in A-to-Z’s (I’ve kept lists) and doing them. I’d just need a catchy name for the set of essays.

For today’s A to Z topic I again picked one nominated by aajohannas. This after I realized I was falling into a never-ending research spiral on the “torus” suggested by Mr Wu, of Mathtuition. I do have an older essay describing the torus, as a set. But that does leave out a lot of why a torus is interesting. Well, we’ll carry on.

# Tiling.

Here is a surprising thought for the next time you consider remodeling the kitchen. It’s common to tile the floor. Perhaps some of the walls behind the counter. What patterns could you use? And there are infinitely many possibilities. You might leap ahead of me and say, yes, but they’re all boring. A tile that’s eight inches square is different from one that’s twelve inches square and different from one that’s 12.01 inches square. Fine. Let’s allow that all square tiles are “really” the same pattern. The only difference between a square two feet on a side and a square half an inch on a side is how much grout you have to deal with. There are still infinitely many possibilities.

You might still suspect me of being boring. Sure, there’s a rectangular tile that’s, say, six inches by eight inches. And one that’s six inches by nine inches. Six inches by ten inches. Six inches by one millimeter. Yes, I’m technically right. But I’m not interested in that. Let’s allow that all rectangular tiles are “really” the same pattern. So we have “squares” and “rectangles”. There are still infinitely many tile possibilities.

Let me shorten the discussion here. Draw a quadrilateral. One that doesn’t intersect itself. That is, there’s four corners, four lines, and there’s no X crossings. If you have that, then you have a tiling. Get enough of these tiles and arrange them correctly and you can cover the plane. Or the kitchen floor, if you have a level floor. It might not be obvious how to do it. You might have to rotate alternating tiles, or set them in what seem like weird offsets. But you can do it. You’ll need someone to make the tiles for you, if you pick some weird pattern. I hope I live long enough to see it become part of the dubious kitchen package on junk home-renovation shows.
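One standard way to see that any such quadrilateral works: spin a copy 180 degrees about the midpoint of an edge, and keep doing that at every edge. Two half-turns in a row amount to a slide, and the slides turn out to be the quadrilateral’s two diagonals, from A to C and from B to D. The sketch below, with hypothetical helper names, builds a patch of such a tiling as vertex lists. It checks one piece of bookkeeping: the parallelogram spanned by the two diagonals holds exactly two tiles’ worth of area. That is a consistency check, not a proof that nothing overlaps.

```python
def half_turn(points, center):
    """Rotate (x, y) points by 180 degrees about center."""
    cx, cy = center
    return [(2 * cx - x, 2 * cy - y) for x, y in points]

def shoelace_area(points):
    """Area of a simple polygon whose vertices are listed in order."""
    n = len(points)
    twice = sum(points[i][0] * points[(i + 1) % n][1]
                - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(twice) / 2

def quad_tiling_patch(quad, size):
    """Vertex lists for a (2*size+1)^2 block of cells, two tiles per cell."""
    a, b, c, d = quad
    u = (c[0] - a[0], c[1] - a[1])     # diagonal A -> C: one translation of the tiling
    v = (d[0] - b[0], d[1] - b[1])     # diagonal B -> D: the other translation
    mid_ab = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    partner = half_turn(quad, mid_ab)  # copy spun around the midpoint of edge AB
    tiles = []
    for i in range(-size, size + 1):
        for j in range(-size, size + 1):
            sx, sy = i * u[0] + j * v[0], i * u[1] + j * v[1]
            for tile in (quad, partner):
                tiles.append([(x + sx, y + sy) for x, y in tile])
    return tiles, u, v

# A concave quadrilateral: its reflex corner is at (1, 1).
quad = [(0, 0), (4, 0), (1, 1), (0, 3)]
tiles, u, v = quad_tiling_patch(quad, 2)
cell_area = abs(u[0] * v[1] - u[1] * v[0])
print(len(tiles), cell_area, 2 * shoelace_area(quad))   # 50 7 7.0
```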

Let me broaden the discussion here. What do I mean by a tiling if I’m allowing any four-sided figure to be a tile? We start with a surface. Usually the plane, a flat surface stretching out infinitely far in two dimensions. The kitchen floor, or any other mere mortal surface, approximates this. But the floor stops at some point. That’s all right. The ideas we develop for the plane work all right for the kitchen. There are some weird effects for the tiles that get too near the edges of the room. We don’t need to worry about them here. The tiles are some collection of open sets. No two tiles overlap. The tiles, plus their boundaries, cover the whole plane. That is, every point on the plane is either inside exactly one of the open sets, or it’s on the boundary of two (or more) of them.

There isn’t a requirement that all these sets have the same shape. We usually do keep them the same, though, limiting our tiles to one or two shapes endlessly repeated. It seems to appeal to our aesthetics and our installation budget. Using a single pattern allows us to cover the plane with triangles. Any triangle will do. Similarly any quadrilateral will do. With convex pentagonal tiles, things get weird. There are fourteen known families of pentagons that tile the plane. Each member of the family looks about the same, but there’s some room for variation in the sides. Plus there’s one more special case that can tile the plane, but only that one shape, with no variation allowed. Until 2015 we didn’t know there was a 15th, and that was the first pattern found in thirty years. In 2017 Michael Rao announced a computer-assisted proof that there is no sixteenth, so anyone with a good eye for doodling will need to find another opening.

There’s also fun to be had with convex hexagons. Anyone who plays strategy games knows a regular hexagon will tile the plane. (Regular hexagonal tilings fit a certain kind of strategy game well. Particularly they imply an equal distance between the centers of any adjacent tiles. Square and triangular tiles don’t guarantee that. This can imply better balance for territory-based games.) Irregular hexagons will, too. There are three families of convex hexagons that tile the plane, and Karl Reinhardt showed in 1918 that there is no fourth. You can treat the regular hexagon as a special case of any of these three families.

There aren’t tilings for identical convex heptagons, figures with seven sides. Nor eight, nor nine, nor any higher figure. You can cover the plane with identical shapes of more sides if they’re non-convex. See any Tetris game where you keep getting the ‘s’ or ‘t’ shapes. And you can cover it if you use several shapes.

There’s some guidance if you want to create your own periodic tilings. I see it called the Conway Criterion. I don’t know the field well enough to say whether that is a common term. It could be something one mathematics popularizer thought of and that other popularizers imitated. (I don’t find “Conway Criterion” on the Mathworld glossary, but that isn’t definitive.) Suppose your polygon satisfies a couple of rules about the shapes of the edges. The rules are given in that link earlier this paragraph. If your shape does, then it’ll be able to tile the plane. If it doesn’t satisfy the rules, don’t despair! It might yet. The Conway Criterion tells you when some shape will tile the plane. It won’t tell you that something won’t.

(The name “Conway” may nag at you as familiar from somewhere. This criterion is named for John H Conway, who’s famous for a bunch of work in knot theory, group theory, and coding theory. And in popular mathematics for the “Game of Life”. This is a set of rules on a grid of cells, each one either alive or dead. The rules say how to calculate a new grid, based on this first one. Iterating them, creating grid after grid, can make patterns that seem far too complicated to be implicit in the simple rules. Conway also developed an algorithm to calculate the day of the week, in the Gregorian calendar. It is difficult to explain to the non-calendar fan how great this sort of thing is.)
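For the curious, one generation of the Game of Life fits in a few lines. This sketch assumes the standard rules: a live cell with two or three live neighbours survives, a dead cell with exactly three live neighbours comes alive, and every other cell is dead next turn. It stores only the live cells, as a set of (row, column) pairs; the function name is my own.

```python
from collections import Counter

def life_step(live):
    """One generation of Conway's Game of Life on a set of live (row, col) cells."""
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Born with exactly 3 neighbours; survive with 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(0, -1), (0, 0), (0, 1)}     # three in a row
print(sorted(life_step(blinker)))       # [(-1, 0), (0, 0), (1, 0)]
```

The “blinker” flips between a horizontal and a vertical bar forever, one of the simplest of those patterns that the rules don’t obviously promise.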

This has all been about periodic tilings. That is, these patterns might be complicated. But if need be, we could get them printed on a nice square tile and cover the floor with that. Almost as beautiful and much easier to install. Are there tilings that aren’t periodic? Aperiodic tilings?

Well, sure. Easily. Take a bunch of tiles with a right angle, and two 45-degree angles. Put any two together and you have a square. So you’re “really” tiling squares that happen to be made up of a pair of triangles. Each pair, toss a coin to decide whether you put the diagonal as a forward or backward slash. Done. That’s not a periodic tiling. Not unless you had a weird run of luck on your coin tosses.
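The coin-toss tiling is easy to simulate. This sketch (the names and patch sizes are arbitrary choices of mine) records each square’s diagonal as ‘/’ or ‘\’, then searches a finite patch for any shift that leaves it unchanged. A finite patch can only give evidence, not proof, about the infinite floor. Still, for a random 40-by-40 patch, the chance that any small shift works is astronomically small, while a deliberately alternating patch repeats right away.

```python
import random

def random_diagonals(rows, cols, seed=None):
    """Each square's diagonal is '/' or '\\', chosen by a fair coin toss."""
    rng = random.Random(seed)
    return [[rng.choice("/\\") for _ in range(cols)] for _ in range(rows)]

def has_period(grid, dx, dy):
    """True if shifting by dx columns and dy rows matches the patch wherever both overlap."""
    rows, cols = len(grid), len(grid[0])
    for r in range(max(0, -dy), rows - max(0, dy)):
        for c in range(max(0, -dx), cols - max(0, dx)):
            if grid[r][c] != grid[r + dy][c + dx]:
                return False
    return True

def find_period(grid, max_shift):
    """First nonzero shift (up to max_shift each way) the patch repeats under, or None."""
    for dy in range(0, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if dy == 0 and dx <= 0:
                continue        # skip the zero shift and mirror-image duplicates
            if has_period(grid, dx, dy):
                return dx, dy
    return None

alternating = [["/" if (r + c) % 2 == 0 else "\\" for c in range(40)] for r in range(40)]
print(find_period(alternating, 10))                        # (2, 0)
print(find_period(random_diagonals(40, 40, seed=1), 10))   # almost surely None
```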

All right, but is that just a technicality? We could have easily installed this periodically and we just added some chaos to make it “not work”. Can we use a finite number of different kinds of tiles, and have it be aperiodic however much we try to make it periodic? And through about 1966 mathematicians would have mostly guessed that no, you couldn’t. If you had a set of tiles that would cover the plane aperiodically, there was also some way to do it periodically.

And then in 1966 came a surprising result. No, not Penrose tiles. I know you want me there. I’ll get there. Not there yet though. In 1966 Robert Berger (who also attended Rensselaer Polytechnic Institute, thank you) discovered such a tiling. It’s aperiodic, and it can’t be made periodic. Why do we know Penrose Tiles rather than Berger Tiles? Couple reasons, including that Berger had to use 20,426 distinct tiles. In 1971 Raphael M Robinson simplified matters a bit and got that down to six shapes. Roger Penrose in 1974 squeezed the set down to two, although by adding some rules about what edges may and may not touch one another. (You can turn this into a pure edges thing by putting notches into the shapes.) That really caught the public imagination. It’s got simplicity and accessibility to combine with beauty. Aperiodic tiles seem to relate to “quasicrystals”, which are what the name suggests and do happen in some materials. Aperiodic tiling embraces our need to have not too much order in our order.

I’ve discussed, in all this, tiling the plane. It’s an easy surface to think about and a popular one. But we can form tiling questions about other shapes. Cylinders, spheres, and toruses seem like they should have good tiling questions available. And we can imagine “tiling” stuff in more dimensions too. If we can fill a volume with cubes, or rectangular boxes, it’s natural to wonder what other shapes we can fill it with. My impression is that fewer definite answers are known about the tiling of three- and four- and higher-dimensional space. Possibly because it’s harder to sketch out ideas and test them. Possibly because the spaces are that much stranger. I would be glad to hear more.

I’m hoping now to have a nice relaxing weekend. I won’t. I need to think of what to say for the letter ‘U’. On Tuesday I hope that it will join the rest of my A to Z essays at this link.

## From my Fourth A-to-Z: Topology

In 2017 I reverted to just one A-to-Z per year. And I got banner art for the first time. It’s a small bit of polish that raised my apparent professionalism a whole order of magnitude. And for the letter T, I did something no pop mathematics blog had ever done before. I wrote about topology without starting from stretchy rubber doughnuts and coffee cups. Let me prove that to you now.

Today’s glossary entry comes from Elke Stangl, author of the Elkemental Force blog. I’ll do my best, although it would have made my essay a bit easier if I’d had the chance to do another topic first. We’ll get there.

# Topology.

Start with a universe. Nice thing to have around. Call it ‘M’. I’ll get to why that name.

I’ve talked a fair bit about weird mathematical objects that need some bundle of traits to be interesting. So this will change the pace some. Here, I request only that the universe have a concept of “sets”. OK, that carries a little baggage along with it. We have to have intersections and unions. Those come about from having pairs of sets. The intersection of two sets is all the things that are in both sets simultaneously. The union of two sets is all the things that are in one set, or the other, or both simultaneously. But it’s hard to think of something that could have sets that couldn’t have intersections and unions.

So from your universe ‘M’ create a new collection of things. Call it ‘T’. I’ll get to why that name. But if you’ve formed a guess about why, then you know. So I suppose I don’t need to say why, now. ‘T’ is a collection of subsets of ‘M’. Now let’s suppose these four things are true.

First. ‘M’ is one of the sets in ‘T’.

Second. The empty set ∅ (which has nothing at all in it) is one of the sets in ‘T’.

Third. Whenever two sets are in ‘T’, their intersection is also in ‘T’.

Fourth. Whenever any collection of sets is in ‘T’ (two of them, or more, even infinitely many), their union is also in ‘T’.

Got all that? I imagine a lot of shrugging and head-nodding out there. So let’s take that. Your universe ‘M’ and your collection of sets ‘T’ are a topology. And that’s that.

Yeah, that’s never that. Let me put in some more text. Suppose we have a universe that consists of two symbols, say, ‘a’ and ‘b’. There’s four distinct topologies you can make of that. Take the universe plus the collection of sets ∅, {a}, {b}, and {a, b}. That’s a topology. Try it out. That’s the first collection you would probably think of.

Here’s another collection. Take this two-thing universe and the collection of sets ∅, {a}, and {a, b}. That’s another topology and you might want to double-check that. Or there’s this one: the universe and the collection of sets ∅, {b}, and {a, b}. Last one: the universe and the collection of sets ∅ and {a, b} and nothing else. That one barely looks legitimate, but it is. Not a topology: the universe and the collection of sets ∅, {a}, and {b}.

The number of topologies grows surprisingly with the number of things in the universe. Like, if we had three symbols, ‘a’, ‘b’, and ‘c’, there would be 29 possible topologies. The universe of the three symbols and the collection of sets ∅, {a}, {b, c}, and {a, b, c}, for example, would be a topology. But the universe and the collection of sets ∅, {a}, {b}, {c}, and {a, b, c} would not: it’s missing {a, b}, the union of {a} and {b}. It’s a good thing to ponder if you need something to occupy your mind while awake in bed.

With four symbols, there’s 355 possibilities. Good luck working those all out before you fall asleep. Five symbols have 6,942 possibilities. You realize this doesn’t look like any expected sequence. After ‘4’ the count of topologies isn’t anything obvious like “two to the number of symbols” or “the number of symbols factorial” or something.
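Those counts can be checked by brute force for small universes. The sketch below encodes each subset as a bitmask (bit i set means the i-th symbol is in the subset), tries every collection that contains ∅ and the whole universe, and keeps the ones closed under pairwise intersection and union. For a finite universe, pairs are enough: any union of many sets can be built up two at a time. The function names are my own. This gets out of hand fast; five symbols already means about a billion candidate collections.

```python
def is_topology(opens, full):
    """opens: a set of bitmask subsets of a finite universe whose bitmask is full."""
    if 0 not in opens or full not in opens:   # need the empty set and the universe
        return False
    for s in opens:
        for t in opens:
            if (s & t) not in opens or (s | t) not in opens:
                return False                   # not closed under intersection/union
    return True

def count_topologies(n):
    """Count the topologies on a universe of n symbols by brute force."""
    full = (1 << n) - 1
    middle = list(range(1, full))   # every subset except the empty set and the universe
    count = 0
    for choice in range(1 << len(middle)):
        opens = {0, full} | {s for i, s in enumerate(middle) if (choice >> i) & 1}
        if is_topology(opens, full):
            count += 1
    return count

# Expected counts for 1, 2, 3, 4 symbols: 1, 4, 29, 355.
for n in range(1, 5):
    print(n, count_topologies(n))
```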

Are you getting ready to call me on being inconsistent? In the past I’ve talked about topology as studying what we can know about geometry without involving the idea of distance. How’s that got anything to do with this fiddling about with sets and intersections and stuff?

So now we come to that name ‘M’, and what it’s finally mnemonic for. I have to touch on something Elke Stangl hoped I’d write about, but a letter someone else bid on first. That would be a manifold. I come from an applied-mathematics background so I’m not sure I ever got a proper introduction to manifolds. They appeared one day in the background of some talk about physics problems. I think they were introduced as “it’s a space that works like normal space”, and that was it. We were supposed to pretend we had always known about them. (I’m translating. What we were actually told would be that it “works like $\mathbb{R}^3$”. That’s how mathematicians say “like normal space”.) That was all we needed.

Properly, a manifold is … eh. It’s something that works kind of like normal space. That is, it’s a set, something that can be a universe. And it has to be something we can define “open sets” on. The open sets for the manifold follow the rules I gave for a topology above. You can make a collection of these open sets. And the empty set has to be in that collection. So does the whole universe. The intersection of two open sets in that collection is itself in that collection. The union of open sets in that collection is in that collection. If all that’s true, then we have a topological space. To be a manifold it also has to look like normal space up close: around every point there’s an open set whose parts can be matched up with the parts of an open region of ordinary space. (There are a couple more technical conditions that I’ll skip.)

And now the piece that makes every pop mathematics article about topology talk about doughnuts and coffee cups. It’s possible that two topologies might be homeomorphic to each other. “Homeomorphic” is a term of art. But you understand it if you remember that “morph” means shape, and suspect that “homeo” is probably close to “homogeneous”. Two things being homeomorphic means you can match their parts up. In the matching there’s nothing left over in the first thing or the second. And the relations between the parts of the first thing are the same as the relations between the parts of the second thing.

So. Imagine the snippet of the number line for the numbers larger than -π and smaller than π. Think of all the open sets you can use to cover that. It will have a set like “the numbers bigger than 0 and less than 1”. A set like “the numbers bigger than -π and smaller than 2.1”. A set like “the numbers bigger than 0.01 and smaller than 0.011”. And so on.

Now imagine the points that exist on a circle, if you’ve omitted one point. Let’s say it’s the unit circle, centered on the origin, and that what we’re leaving out is the point that’s exactly to the left of the origin. The open sets for this are the arcs that cover some part of this punctured circle. There’s the arc that corresponds to the angles from 0 to 1 radian measure. There’s the arc that corresponds to the angles from -π to 2.1 radians. There’s the arc that corresponds to the angles from 0.01 to 0.011 radians. You see where this is going. You see why I say we can match those sets on the number line to the arcs of this punctured circle. There are some details to fill in here. But you probably believe me that this could be done if I had to.
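You can write the matching between the snippet of number line and the punctured circle down directly. What follows is a minimal Python sketch. It assumes the unit circle with the point (-1, 0) left out. It sends a number to the point at that angle, and it sends a point back to its angle with `math.atan2`. The function names are hypothetical.

```python
import math

def to_circle(t):
    """Send a number t with -pi < t < pi to the point at angle t on the unit circle."""
    return (math.cos(t), math.sin(t))

def to_line(point):
    """Send a point of the punctured unit circle back to its angle.

    math.atan2 returns angles in [-pi, pi]; it reaches +pi or -pi only at the
    point (-1, 0), which is the one we left out.
    """
    x, y = point
    return math.atan2(y, x)

for t in [-3.0, 0.0, 1.0, 2.1]:
    print(t, to_circle(t), to_line(to_circle(t)))
```

Every allowed point comes back inside the open interval. Nearby numbers go to nearby points, and nearby points come back to nearby numbers. That two-way closeness is what the matching needs.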

There are two (or three) great branches of topology. One is called “algebraic topology”. It’s the one that makes for fun pop mathematics articles about imaginary rubber sheets. It’s called “algebraic” because this field makes it natural to study the holes in a sheet. And those holes tend to form groups and rings, basic pieces of Not That Algebra. The field (I’m told) can be interpreted as looking at functors from topological spaces to groups and rings. This makes for some neat tying-together of subjects this A To Z round.

The other branch is called “differential topology”, which is a great field to study because it sounds like what Mister Spock is thinking about. It inspires awestruck looks where saying you study, like, Bayesian probability gets blank stares. Differential topology is about differentiable functions on manifolds. This gets deep into mathematical physics.

As you study mathematical physics, you stop worrying about ever solving specific physics problems. Specific problems are petty stuff. What you like is solving whole classes of problems. A steady trick for this is to try to find some properties that are true about the problem regardless of what exactly it’s doing at the time. This amounts to finding a manifold that relates to the problem. Consider a central-force problem, for example, with planets orbiting a sun. A planet can’t move just anywhere. It can only be in places and moving in directions that give the system the same total energy that it had to start. And the same linear momentum. And the same angular momentum. We can match these constraints to manifolds. Whatever the planet does, it does it without ever leaving these manifolds. To know the shapes of these manifolds — how they are connected — and what kinds of functions are defined on them tells us something of how the planets move.
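Here is a numerical sketch of that idea. It runs under some loud assumptions of my own: one planet, a sun pinned at the origin, motion in a plane, and units where the sun’s gravitational parameter GM is 1. It steps the orbit forward with velocity Verlet, which is my choice of integrator and nothing from the essay. It then checks that energy and angular momentum stay put. So the simulated planet never wanders off the surfaces those two quantities define.

```python
import math

def acceleration(x, y):
    """Inverse-square pull toward a sun fixed at the origin, in units where GM = 1."""
    r_cubed = (x * x + y * y) ** 1.5
    return -x / r_cubed, -y / r_cubed

def energy(x, y, vx, vy):
    """Kinetic plus potential energy per unit mass."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)

def angular_momentum(x, y, vx, vy):
    """Angular momentum per unit mass (the z-component of r x v)."""
    return x * vy - y * vx

def orbit(x, y, vx, vy, dt=0.001, steps=20000):
    """Step the planet forward with velocity Verlet; yield each new state."""
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        x += dt * vx                 # drift
        y += dt * vy
        ax, ay = acceleration(x, y)
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        yield x, y, vx, vy

start = (1.0, 0.0, 0.0, 1.2)         # a bound, fairly eccentric orbit
e0, l0 = energy(*start), angular_momentum(*start)
states = list(orbit(*start))
print(max(abs(energy(*s) - e0) for s in states))            # small
print(max(abs(angular_momentum(*s) - l0) for s in states))  # round-off only
```

Angular momentum holds to round-off here because every kick points straight at the sun. Energy wobbles very slightly, since the integrator only approximates the true path.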

The maybe-third branch is “low-dimensional topology”. This is what differential topology is for two- or three- or four-dimensional spaces. You know, shapes we can imagine with ease in the real world. Maybe imagine with some effort, for four dimensions. This kind of branches out of differential topology because having so few dimensions to work in makes a lot of problems harder. We need specialized theoretical tools that only work for these cases. Is that enough to count as a separate branch? It depends what topologists you want to pick a fight with. (I don’t want a fight with any of them. I’m over here in numerical mathematics when I’m not merely blogging. I’m happy to provide space for anyone wishing to defend her branch of topology.)

But each grows out of this quite general, quite abstract idea, also known as “point-set topology”, that’s all about sets and collections of sets. There is much that we can learn from thinking about how to collect the things that are possible.

## From my Third A-to-Z: Tree

It’s difficult to remember, but there was a time I not only posted three A-to-Z essays a week, but did two such sequences in a year. It’s hard to imagine having that much energy now. The End 2016 A-to-Z got that name, rather than “End Of 2016”, because (hard as this may be to believe now) 2016 seemed like a particularly brutal year that we could not wait to finish. Unfortunately it turned out to be one of those years that will get pop-histories with subtitles like “Twelve Months That Changed The World” or “The Crisis Of Our Times”. Still, this piece shows off some of what I think is characteristic of my writing: an interest in the legends that accrue around mathematical fields, and my reasons to be skeptical of the legends.

Graph theory begins with a beautiful legend. I have no reason to suppose it’s false, except my natural suspicion of beautiful legends as origin stories. Its organization as a field is traced to 18th century Königsberg, where seven bridges connected the two banks of the Pregel River and two islands in it. Whether it was possible to cross each bridge exactly once and get back where one started was, they say, a pleasant idle thought to ponder and path to try walking. Then Leonhard Euler solved the problem. It’s impossible.

## Tree.

Graph theory arises whenever we have a bunch of things that can be connected. We call the things “vertices”, because that’s a good corner-type word. The connections we call “edges”, because that’s a good connection-type word. It’s easy to create graphs that look like the edges of a crystal, especially if you draw the edges as straight as possible. You don’t have to. You can draw them curved. Then they look like the scary tangles of wire around your wireless router complex.

Graph theory really got organized in the 19th century, and went crazy in the 20th. It turns out there’s lots of things that connect to other things. Networks, whether computers or social or thematically linked concepts. Anything that has to be delivered from one place to another. All the interesting chemicals. Anything that could be put in a pipe or taken on a road has some graph theory thing applicable to it.

A lot of graph theory ponders loops. The original problem was about how to use every bridge, every edge, exactly one time. Look at a tangled mass of a graph and it’s hard not to start looking for loops. They’re often interesting. It’s not easy to tell if there’s a loop that lets you get to every vertex exactly once.
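Euler’s answer comes down to counting. A closed walk that uses every edge once has to leave each vertex as often as it arrives. So every vertex needs an even number of edges. If the walk doesn’t need to return home, at most two vertices may be odd. The Python sketch below follows the usual account of the city as four land masses (two banks, two islands) joined by seven bridges. The land-mass labels are made up. It assumes the graph is connected, as Königsberg’s was.

```python
from collections import Counter

# Four land masses and seven bridges, as usually described for Königsberg.
# The short labels are made up for this sketch.
BRIDGES = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"), ("north", "east"), ("south", "east"),
]

def degrees(edges):
    """Count how many bridge-ends touch each land mass."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count

def has_closed_walk_using_every_edge(edges):
    """Euler's criterion for a connected graph: every vertex has even degree."""
    return all(d % 2 == 0 for d in degrees(edges).values())

def has_walk_using_every_edge(edges):
    """If the walk may end somewhere else, at most two vertices can have odd degree."""
    return sum(d % 2 for d in degrees(edges).values()) in (0, 2)

print(degrees(BRIDGES))                           # every land mass has an odd count
print(has_closed_walk_using_every_edge(BRIDGES))  # False
print(has_walk_using_every_edge(BRIDGES))         # False
```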

What if there aren’t loops? What if there aren’t any vertices you can step away from and get back to by another route? Well, then you have a tree.

A tree’s a graph where all the vertices are connected so that there aren’t any closed loops. We normally draw them with straight lines, the better to look like actual trees. We then stop trying to make them look like actual trees by doing stuff like drawing them as a long horizontal spine with a couple branches sticking off above and below, or as * type stars, or H shapes. They still correspond to real-world things. If you’re not sure how consider the layout of one of those long, single-corridor hallways as in a hotel or dormitory. The rooms connect to one another as a tree once again, as long as no room opens to anything but its own closet or bathroom or the central hallway.
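For a finite graph, “connected with no closed loops” is the same as “connected, with one fewer edge than vertices”. Here is a minimal Python sketch of a tree check that uses that fact. The name `is_tree` and the hotel-hallway labels are hypothetical.

```python
def is_tree(vertices, edges):
    """Connected, and exactly one fewer edge than vertices: for finite graphs, a tree."""
    vertices = list(vertices)
    if len(edges) != len(vertices) - 1:
        return False
    neighbours = {v: [] for v in vertices}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    # Walk outward from one vertex and see whether we reach them all.
    start = vertices[0]
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

rooms = ["hall", "room1", "room2", "room3", "closet"]
doors = [("hall", "room1"), ("hall", "room2"), ("hall", "room3"), ("room1", "closet")]
print(is_tree(rooms, doors))                           # True
print(is_tree(rooms, doors + [("room1", "room2")]))    # False: a connecting door makes a loop
```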

We can talk about the radius of a graph. For a tree, that’s the greatest number of edges separating the center of the tree from any other vertex. And every tree has a center. Or two centers. If it has two centers they share an edge between the two. And that’s one of the quietly amazing things about trees to me. However complicated and messy the tree might be, we can find its center. How many things allow us that?
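One way to find the center is to pluck off every leaf at once. Then pluck the leaves of what’s left, and repeat until one vertex or two adjacent vertices remain. Below is a Python sketch of that peeling. The name `tree_center` is my own, and the sketch assumes the input really is a tree.

```python
def tree_center(vertices, edges):
    """Peel leaves off a tree, layer by layer; return (center vertices, radius)."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    remaining = set(neighbours)
    radius = 0
    while len(remaining) > 2:
        leaves = [v for v in remaining if len(neighbours[v]) <= 1]
        for leaf in leaves:
            for w in neighbours[leaf]:
                neighbours[w].discard(leaf)
            remaining.discard(leaf)
        radius += 1                  # each layer peeled is one step out from the center
    if len(remaining) == 2:
        radius += 1                  # two centers: the partner is one more edge away
    return remaining, radius

path = ["a", "b", "c", "d"]
print(tree_center(path, [("a", "b"), ("b", "c"), ("c", "d")]))   # two centers, b and c
```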

A tree might have some special vertex. That’s called the ‘root’. It’s what the vertices and the connections represent that makes a root; it’s not something inherent in the way trees look. We pick one for some special reason and then we highlight it. Maybe put it at the bottom of the drawing, making ‘root’ for once a sensible name for a mathematics thing. Often we put it at the top of the drawing, because I guess we’re just being difficult. Well, we do that because we were modelling stuff where a thing’s properties depend on what it comes from. And that puts us into thoughts of inheritance and of family trees. And weird as it is to put the root of a tree at the top, it’s also weird to put the eldest ancestors at the bottom of a family tree. People do it, but mostly in those illuminated drawings that make a literal tree out of things. You don’t see it in family trees used for actual work, like filling up a couple pages at the start of a king or a queen’s biography.

Trees give us neat new questions to ponder, like, how many are there? I mean, if you have a certain number of vertices then how many ways are there to arrange them? One or two or three vertices all have just the one way to arrange them. Four vertices can be hooked up a whole two ways. Five vertices offer a whole three different ways to connect them. Six vertices offer six ways to connect and now we’re finally getting something interesting. There’s eleven ways to connect seven vertices, and 23 ways to connect eight vertices. The number keeps on rising, but it doesn’t follow the obvious patterns for growth of this sort of thing.
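You can check those counts by brute force. Here is a Python sketch, under my own choice of method. It grows every tree by hanging a new leaf on each vertex of every smaller tree. Then it throws out repeats by describing each tree with a string built outward from its center, an AHU-style encoding. The function names are hypothetical.

```python
def centers(adj):
    """Center vertex or vertices of a tree given as {vertex: set of neighbours}."""
    nbrs = {v: set(ns) for v, ns in adj.items()}
    remaining = set(nbrs)
    while len(remaining) > 2:
        leaves = [v for v in remaining if len(nbrs[v]) <= 1]
        for leaf in leaves:
            for w in nbrs[leaf]:
                nbrs[w].discard(leaf)
            remaining.discard(leaf)
    return remaining

def encode(adj, v, parent=None):
    """A string describing the shape of the tree hanging from v (AHU-style)."""
    children = sorted(encode(adj, w, v) for w in adj[v] if w != parent)
    return "(" + "".join(children) + ")"

def canonical(adj):
    """Same string for any two trees with the same shape, however they are labelled."""
    return min(encode(adj, c) for c in centers(adj))

def count_trees(max_n):
    """How many differently shaped trees there are with 1, 2, ..., max_n vertices."""
    layer = {"()": {0: set()}}          # the lone one-vertex tree
    counts = [1]
    for n in range(1, max_n):
        grown_layer = {}
        for adj in layer.values():
            for v in adj:               # hang a new leaf, named n, off vertex v
                grown = {u: set(ns) for u, ns in adj.items()}
                grown[n] = {v}
                grown[v].add(n)
                grown_layer.setdefault(canonical(grown), grown)
        layer = grown_layer
        counts.append(len(layer))
    return counts

print(count_trees(8))   # [1, 1, 1, 2, 3, 6, 11, 23]
```

Every tree with one more vertex comes from removing any of its leaves, so growing leaf by leaf misses nothing. Rooting the description at the center is what makes two differently drawn copies of one tree produce the same string.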

And if that’s not enough to idly ponder then think of destroying trees. Draw a tree, any shape you like. Pick one of the vertices. Imagine you obliterate that. How many separate pieces has the tree been broken into? It might be as few as one, if you picked a vertex at the tip of a branch. It might be as many as the number of remaining vertices. It’s always exactly the number of edges that met at the vertex you destroyed. If graph theory took away the pastime of wandering around Königsberg’s bridges, it has given us this pastime we can create anytime we have pen, paper, and a long meeting.
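Here is a short Python sketch of the pastime, with a made-up five-vertex tree. It deletes a vertex, then counts the connected pieces by walking outward from each vertex it hasn’t reached yet. In a tree the count matches the number of edges that met at the deleted vertex, since each of those edges led off into its own branch.

```python
def pieces_after_removing(vertices, edges, doomed):
    """Delete one vertex and its edges, then count the connected pieces left."""
    keep = [v for v in vertices if v != doomed]
    neighbours = {v: [] for v in keep}
    for a, b in edges:
        if doomed not in (a, b):
            neighbours[a].append(b)
            neighbours[b].append(a)
    seen, pieces = set(), 0
    for start in keep:
        if start in seen:
            continue
        pieces += 1                      # a vertex we haven't reached starts a new piece
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return pieces

# A made-up tree: b has three edges, d has two, the rest are leaves.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]
for v in vertices:
    print(v, pieces_after_removing(vertices, edges, v))
```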

## From my Second A-To-Z: Transcendental Number

The second time I did one of these A-to-Z’s, I hit on the idea of asking people for suggestions. It was a good move as it opened up subjects I had not come close to considering. I didn’t think to include the instructions for making your own transcendental number, though. You never get craft projects in mathematics, not after you get past the stage of making construction-paper rhombuses or something. I am glad to see my schtick of including a warning about using this stuff at your thesis defense was established by then.

I’m down to the last seven letters in the Leap Day 2016 A To Z. It’s also the next-to-the-last of Gaurish’s requests. This was a fun one.

## Transcendental Number.

Take a huge bag and stuff all the real numbers into it. Give the bag a good solid shaking. Stir up all the numbers until they’re thoroughly mixed. Reach in and grab just the one. There you go: you’ve got a transcendental number. Enjoy!

OK, I detect some grumbling out there. The first is that you tried doing this in your head because you somehow don’t have a bag large enough to hold all the real numbers. And you imagined pulling out some number like “2” or “37” or maybe “one-half”. And you may not be exactly sure what a transcendental number is. But you’re confident the strangest number you extracted, “minus 8”, isn’t it. And you’re right. None of those are transcendental numbers.

I regret saying this, but that’s your own fault. You’re lousy at picking random numbers from your head. So am I. We all are. Don’t believe me? Think of a positive whole number. I predict you probably picked something between 1 and 10. Almost surely something between 1 and 100. Surely something less than 10,000. You didn’t even consider picking something between 10,012,002,214,473,325,937,775 and 10,012,002,214,473,325,937,785. Challenged to pick a number, people will select nice and familiar ones. The nice familiar numbers happen not to be transcendental.

I detect some secondary grumbling there. Somebody picked π. And someone else picked e. Very good. Those are transcendental numbers. They’re also nice familiar numbers, at least to people who like mathematics a lot. So they attract attention.

Still haven’t said what they are. What they are traces back, of course, to polynomials. Take a polynomial that’s got one variable, which we call ‘x’ because we don’t want to be difficult. Suppose that all the coefficients of the polynomial, the constant numbers we presumably know or could find out, are integers. What are the roots of the polynomial? That is, for what values of x is the polynomial a complicated way of writing ‘zero’?

For example, try the polynomial $x^2 - 6x + 5$. If x = 1, then that polynomial is equal to zero. If x = 5, the polynomial’s equal to zero. Or how about the polynomial $x^2 + 4x + 4$? That’s equal to zero if x is equal to -2. So a polynomial with integer coefficients can certainly have positive and negative integers as roots.

How about the polynomial $2x - 3$? Yes, that is so a polynomial. This is almost easy. That’s equal to zero if x = 3/2. How about the polynomial $(2x - 3)(4x + 5)(6x - 7)$? It’s my polynomial and I want to write it so it’s easy to find the roots. That polynomial will be zero if x = 3/2, or if x = -5/4, or if x = 7/6. So a polynomial with integer coefficients can have positive and negative rational numbers as roots.
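You can check these examples with exact arithmetic. Below is a Python sketch using the standard `fractions` module. The helper names `evaluate` and `multiply` are my own. Coefficients are listed from the highest power down.

```python
from fractions import Fraction

def evaluate(coefficients, x):
    """Horner's rule; coefficients run from the highest power down to the constant."""
    total = 0
    for c in coefficients:
        total = total * x + c
    return total

def multiply(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

cubic = multiply(multiply([2, -3], [4, 5]), [6, -7])
print(cubic)   # [48, -68, -76, 105]
for root in (Fraction(3, 2), Fraction(-5, 4), Fraction(7, 6)):
    print(root, evaluate(cubic, root))   # each value is exactly 0
```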

How about the polynomial $x^2 - 2$? That’s equal to zero if x is the square root of 2, about 1.414. It’s also equal to zero if x is minus the square root of 2, about -1.414. And the square root of 2 is irrational. So we can certainly have irrational numbers as roots.

So if we can have whole numbers, and rational numbers, and irrational numbers as roots, how can there be anything else? Yes, complex numbers, I see you raising your hand there. We’re not talking about complex numbers just now. Only real numbers.

It isn’t hard to work out why we can get any whole number, positive or negative, from a polynomial with integer coefficients. Or why we can get any rational number. The irrationals, though … it turns out we can only get some of them this way. We can get square roots and cube roots and fourth roots and all that. We can get combinations of those. But we can’t get everything. There are irrational numbers that are there but that even polynomials can’t reach.
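Here are two illustrations, as a Python sketch with my own helper names. Any rational p/q is a root of qx - p, which has integer coefficients. And a combination like the square root of 2 plus the square root of 3 is a root of $x^4 - 10x^2 + 1$. To see why, write $x = \sqrt{2} + \sqrt{3}$ and square to get $x^2 = 5 + 2\sqrt{6}$. Then $(x^2 - 5)^2 = 24$ expands to that quartic.

```python
import math
from fractions import Fraction

def evaluate(coefficients, x):
    """Horner's rule; coefficients run from the highest power down to the constant."""
    total = 0
    for c in coefficients:
        total = total * x + c
    return total

def polynomial_for_rational(r):
    """Integer coefficients [q, -p] for q*x - p, whose root is the rational p/q."""
    r = Fraction(r)
    return [r.denominator, -r.numerator]

# sqrt(2) + sqrt(3) is a root of x**4 - 10*x**2 + 1.
SQRT2_PLUS_SQRT3 = [1, 0, -10, 0, 1]

print(evaluate(polynomial_for_rational(Fraction(-5, 4)), Fraction(-5, 4)))  # 0
print(evaluate(SQRT2_PLUS_SQRT3, math.sqrt(2) + math.sqrt(3)))  # zero, up to rounding
```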

It’s all right to be surprised. It’s a surprising result. Maybe even unsettling. Transcendental numbers have something peculiar about them. The 19th Century French mathematician Joseph Liouville first proved the things must exist, in 1844. (He used continued fractions to show there must be such things.) It would be seven years later that he gave an example of one in nice, easy-to-understand decimals. This is the number 0.110 001 000 000 000 000 000 001 000 000 (et cetera). Its digits are zero almost everywhere. But there’s a 1 in the n-th digit past the decimal exactly when n is the factorial of some number. That is, 1! is 1, so the 1st digit past the decimal is a 1. 2! is 2, so the 2nd digit past the decimal is a 1. 3! is 6, so the 6th digit past the decimal is a 1. 4! is 24, so the 24th digit past the decimal is a 1. The next 1 will appear in spot number 5!, which is 120. After that, 6! is 720 so we wait for the 720th digit to be 1 again.
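Liouville’s decimal is easy to generate. Below is a Python sketch, and the function names are mine. It builds the first digits as a string. It also gives exact partial sums of $\sum_k 10^{-k!}$ using Python’s `fractions` module.

```python
from fractions import Fraction
from math import factorial

def liouville_digits(count):
    """First `count` decimal digits of Liouville's number.

    Digit n (counting from 1 after the decimal point) is 1 exactly when n is
    a factorial: 1, 2, 6, 24, 120, 720, ...
    """
    factorials = set()
    k = 1
    while factorial(k) <= count:
        factorials.add(factorial(k))
        k += 1
    return "".join("1" if n in factorials else "0" for n in range(1, count + 1))

def liouville_partial_sum(terms):
    """Exact value of the sum of 10**-(k!) for k = 1 .. terms."""
    return sum(Fraction(1, 10 ** factorial(k)) for k in range(1, terms + 1))

print(liouville_digits(30))   # 110001000000000000000001000000
```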

And what is this Liouville number 0.110 001 000 000 000 000 000 001 000 000 (et cetera) used for, besides showing that a transcendental number exists? Not a thing. It’s of no other interest. And this plagued the transcendental numbers until 1873. The only examples anyone had of transcendental numbers were ones built to show that they existed. In 1873 Charles Hermite showed finally that e, the base of the natural logarithm, was transcendental. e is a much more interesting number; we have reasons to care about it. Every exponential growth or decay or oscillating process has e lurking in it somewhere. In 1882 Ferdinand von Lindemann showed that π was transcendental, and that’s an even more interesting number.

That bit about π has interesting implications. One goes back to the ancient Greeks. Is it possible, using straightedge and compass, to create a square that’s exactly the same size as a given circle? This is equivalent to saying, if I give you a line segment, can you create another line segment that’s exactly the square root of π times as long? This geometric problem is equivalent to an algebraic one. That problem: can you create a polynomial, with integer coefficients, that has the square root of π as a root? (WARNING: I’m skipping some important points for the sake of clarity. DO NOT attempt to use this to pass your thesis defense without putting those points back in.) We want the square root of π because … well, what’s the area of a square whose sides are the square root of π long? That’s right. So we start with a line segment that’s equal to the radius of the circle and we can do that, surely. Once we have the radius, can’t we make a line that’s the square root of π times the radius, and from that make a square with area exactly π times the radius squared? If the square root of π were a root of such a polynomial, then π, its square, would be the root of one too. Since π is transcendental, then, no. We can’t. Sorry. One of the great problems of ancient mathematics, and one that still has the power to attract the casual mathematician, got its final answer in 1882.

Georg Cantor is a name even non-mathematicians might recognize. He showed there have to be some infinite sets bigger than others, and that there must be more real numbers than there are rational numbers. Four years after showing that, he proved there are as many transcendental numbers as there are real numbers.

They’re everywhere. They permeate the real numbers so much that we can understand the real numbers as the transcendental numbers plus some dust. They’re almost the dark matter of mathematics. We don’t actually know all that many of them. Wolfram MathWorld has a table listing numbers proven to be transcendental, and the fact we can list that on a single web page is remarkable. Some of them are large sets of numbers, yes, like $e^{\pi \sqrt{d}}$ for every positive whole number d. And we can infer many more from them; if π is transcendental then so is 2π, and so is 5π, and so is -20.38π, and so on. But the table of numbers proven to be transcendental is still just 25 rows long.

There are even mysteries about obvious numbers. π is transcendental. So is e. We know that at least one of π times e and π plus e is transcendental. Perhaps both are. We don’t know which one is, or if both are. We don’t know whether $\pi^{\pi}$ is transcendental. We don’t know whether $e^{e}$ is, either. Don’t even ask if $\pi^{e}$ is.

How, by the way, does this fit with my claim that everything in mathematics is polynomials? — Well, we found these numbers in the first place by looking at polynomials. The set is defined, even to this day, by how a particular kind of polynomial can’t reach them. Thinking about a particular kind of polynomial makes visible this interesting set.

## From my First A-to-Z: Tensor

Of course I can’t just take a break for the sake of having a break. I feel like I have to do something of interest. So why not make better use of my several thousand past entries and repost one? I’d just reblog it except WordPress’s system for that is kind of rubbish. So here’s what I wrote, when I was first doing A-to-Z’s, back in summer of 2015. Somehow I was able to post three of these a week. I don’t know how.

I had remembered this essay as mostly describing the boring part of tensors, that we usually represent them as grids of numbers and then symbols with subscripts and superscripts. I’m glad to rediscover that I got at why we do such things to numbers and subscripts and superscripts.

## Tensor.

The true but unenlightening answer first: a tensor is a regular, rectangular grid of numbers. The most common kind is a two-dimensional grid, so that it looks like a matrix, or like the times tables. It might be square, with as many rows as columns, or it might be rectangular.

It can also be one-dimensional, looking like a row or a column of numbers. Or it could be three-dimensional, rows and columns and whole levels of numbers. We don’t try to visualize that. It can be what we call zero-dimensional, in which case it just looks like a solitary number. It might be four- or more-dimensional, although I confess I’ve never heard of anyone who actually writes out such a thing. It’s just so hard to visualize.

You can add and subtract tensors if they’re of compatible sizes. You can also do something like multiplication. And this does mean that, for example, the square grids of any one size form a ring. Of course, that doesn’t say why they’re interesting.

Tensors are useful because they can describe spatial relationships efficiently. The word comes from the same Latin root as “tension”, a hint about how we can imagine it. A common use of tensors is in describing the stress in an object. Applying stress in different directions to an object often produces different effects. The classic example there is a newspaper. Rip it in one direction and you get a smooth, clean tear. Rip it perpendicularly and you get a raggedy mess. The stress tensor represents this: it gives some idea of how a force put on the paper will create a tear.
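Cauchy’s relation makes the “direction in, force out” idea concrete. The stress tensor, written as a 3-by-3 grid, turns the unit normal of a cut through a material into the force per unit area across that cut. Below is a Python sketch with made-up numbers, not measurements of any newspaper. Whether paper actually tears also depends on its strength in each direction, and this sketch leaves that out.

```python
import math

def traction(stress, normal):
    """Force per unit area across a cut whose unit normal is `normal`.

    Cauchy's relation: t_i = sum over j of stress[i][j] * normal[j].
    """
    return [sum(stress[i][j] * normal[j] for j in range(3)) for i in range(3)]

# Made-up numbers: a sheet pulled twice as hard along x as along y.
stress = [[2.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0]]

print(traction(stress, [1.0, 0.0, 0.0]))   # [2.0, 0.0, 0.0]
print(traction(stress, [0.0, 1.0, 0.0]))   # [0.0, 1.0, 0.0]
diagonal = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
print(traction(stress, diagonal))          # not parallel to the diagonal normal
```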

Tensors show up a lot in physics, and so in mathematical physics. Technically they show up everywhere, since vectors and even plain old numbers (scalars, in the lingo) are kinds of tensors, but that’s not what I mean. Tensors can describe efficiently things whose magnitude and direction change based on where something is and where it’s looking. So they are a great tool to use if one wants to represent stress, or how well magnetic fields pass through objects, or how electrical fields are distorted by the objects they move in. And they describe space, as well: general relativity is built on tensors. The mathematics of a tensor allows one to describe how space is shaped, based on how to measure the distance between two points in space.
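Here is a small example of measuring distance with a tensor. It uses the flat plane described in polar coordinates (r, θ), where the metric is $g = \operatorname{diag}(1, r^2)$. A small step has length $\sqrt{g_{ij}\,dx^i\,dx^j}$. Below is a Python sketch, with my own function names, that adds up such steps along a path.

```python
import math

def polar_metric(r, theta):
    """Metric tensor of the flat plane in polar coordinates (r, theta).

    g = [[1, 0], [0, r**2]]; it happens not to depend on theta.
    """
    return [[1.0, 0.0], [0.0, r * r]]

def step_length(g, dx):
    """Length of a small coordinate step dx: sqrt(g_ij dx^i dx^j)."""
    return math.sqrt(sum(g[i][j] * dx[i] * dx[j] for i in range(2) for j in range(2)))

def path_length(points):
    """Add up the step lengths along a list of (r, theta) points."""
    total = 0.0
    for (r0, t0), (r1, t1) in zip(points, points[1:]):
        g = polar_metric(0.5 * (r0 + r1), 0.5 * (t0 + t1))   # metric at the midpoint
        total += step_length(g, (r1 - r0, t1 - t0))
    return total

circle = [(2.0, 2 * math.pi * k / 1000) for k in range(1001)]
print(path_length(circle), 4 * math.pi)   # the two agree closely
```

The coordinates only march θ from 0 to 2π. The metric is what knows that, at radius 2, that trip covers a distance of 4π.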

My own mathematical education happened to be pretty tensor-light. I never happened to have courses that forced me to get good with them, and I confess to feeling intimidated when a mathematical argument gets deep into tensor mathematics. Joseph C Kolecki, with NASA’s Glenn (Lewis) Research Center, published in 2002 a nice little booklet “An Introduction to Tensors for Students of Physics and Engineering”. This I think nicely bridges some of the gap between mathematical structures like vectors and matrices, that mathematics and physics majors know well, and the kinds of tensors that get called tensors and that can be intimidating.

## My Little 2021 Mathematics A-to-Z is taking a short break

I regret coming to this point. I’d started my Little 2021 Mathematics A-to-Z with more lead time than usual, in the hopes that I’d have a less stressful time for the whole project. And then all that lead time slipped away. And there’s an extra bit of awkwardness, caused by my once-a-week schedule and the date I happened to start publishing this year’s project. I couldn’t finish it before 2021 ended, not unless I published two things in a week. And I don’t have the energy or time to do that.

The point of a schedule is to help make it easier to accomplish things you value. If it can’t help that, then the schedule has to go. I’ve given people this advice, and now, I’ll take it. I mean still to get to the letters T, O, and Z, to finish off the sequence. But that’ll run in 2022, I hope early in the year.

## My Little 2021 Mathematics A-to-Z: Atlas

I owe Elkement thanks again for a topic. They’re author of the Theory and Practice of Trying to Combine Just Anything blog. And the subject lets me circle back around topology.

# Atlas.

Mathematics is like every field in having jargon. Some jargon is unique to the field; there is no lay meaning of a “homeomorphism”. Some jargon is words plucked from the common language, such as “smooth”. The common meaning may guide you to what mathematicians want in it. A smooth function has a graph with no gaps, no discontinuities, no sharp corners; you can see smoothness in it. Sometimes the common meaning is an ambiguous help. A “series” is the sum of a sequence of numbers, that is, it is one number. Mathematicians study the series, but by looking at properties of the sequence.

So what sort of jargon is “atlas”? In common English, an atlas is a book of maps. Each map represents something different. Perhaps a different region of space. Perhaps a different scale, or a different projection altogether. The maps may show different features, or show them at different times. The maps must be about the same sort of thing. No slipping a map of Narnia in with the map of an amusement park, unless you warn of that in the title. The maps must not contradict one another. (So far as human-made things can be consistent, anyway.) And that’s the important stuff.

Atlas is the first kind of common-word jargon. Mathematicians use it to mean a collection of things. Those collected things aren’t mathematical maps. “Map” is the second type of jargon. The collected things are coordinate charts. “Coordinate chart” is a pairing of words not likely to appear in common English. But if you did encounter them? The meaning you might guess from their common use is not far off their mathematical use.

A coordinate chart is a matching of the points in an open set to normal coordinates. Euclidean coordinates, to be precise. But, you know, latitude and longitude, if it’s two dimensional. Add in the altitude if it’s three dimensions. Your x-y-z coordinates. It still counts if this is one dimension, or four dimensions, or sixteen dimensions. You’re less likely to draw a sketch of those. (In practice, you draw a sketch of a three-dimensional blob, and put N = 16 off in the corner, maybe in a box.)

These coordinate charts are on a manifold. That’s the second type of common-language jargon. Manifold, to pick the least bad of its manifold common definitions, is a “complicated object or subject”. The mathematical manifold is a surface. The things on that surface are connected by relationships that could be complicated. But the shape can be as simple as a plane or a sphere or a torus.

Every point on a coordinate chart needs some unique set of coordinates. And if a point appears on two coordinate charts, they have to be consistent. Consistent here means the matching between charts is a homeomorphism. A homeomorphism is a map, in the jargon sense. So it’s a function matching open sets on one chart to open sets in the other chart. There’s more to it (there always is). But the important thing is that, away from the edges of the chart, we don’t create any new gaps or punctures or missing sections.
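The smallest interesting atlas is two charts on a circle. Each is an angle chart that misses one point, like the punctured circle earlier. Below is a Python sketch with hypothetical names. Chart A leaves out (-1, 0) and uses angles in (-π, π). Chart B leaves out (1, 0) and uses angles in (0, 2π). Where they overlap, the transition from one chart’s angle to the other’s is continuous and has a continuous inverse. That is the consistency the charts need.

```python
import math

# Chart A covers the circle minus (-1, 0); its coordinate is an angle in (-pi, pi).
# Chart B covers the circle minus (1, 0); its coordinate is an angle in (0, 2*pi).

def chart_a(x, y):
    if x < 0 and y == 0:
        return None                     # the one point chart A misses
    return math.atan2(y, x)

def chart_b(x, y):
    if x > 0 and y == 0:
        return None                     # the one point chart B misses
    angle = math.atan2(y, x)
    return angle if angle > 0 else angle + 2 * math.pi

def a_to_b(t):
    """Transition map on the overlap: chart A's angle to chart B's angle."""
    return t if t > 0 else t + 2 * math.pi

for t in [-2.0, 0.5, 3.0]:
    x, y = math.cos(t), math.sin(t)
    print(t, chart_a(x, y), chart_b(x, y), a_to_b(chart_a(x, y)))
```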

Some manifolds are easy to spot. The surface of the Earth, for example. Many are easy to come up with charts for. Think of any map of the Earth. Each point on the surface of the Earth matches some point on the sheet of paper. The coordinate chart is … let’s say how far your point is from the upper left corner of the page. (Pretend that you can measure those points precisely enough to match them to, like, the town you’re in.) Could be how far you are from the center, or the lower right corner, or whatever. These are all as good, and even count as other coordinate charts.

It’s easy to imagine that as latitude and longitude. We see maps of the world arranged by latitude and longitude so often. And that’s fine; latitude and longitude makes a good chart. But we have a problem in giving coordinates to the north and south pole. The latitude is easy but the longitude? So we have two points that can’t be covered on the map. We can save our atlas by having a couple charts. For the Earth this can be a map of most of the world arranged by latitude and longitude, and then two insets showing a disc around the north and the south poles. Thus we have an atlas of three charts.
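The trouble at the poles shows up as soon as you turn latitude and longitude into a point. Below is a Python sketch, assuming a unit sphere and angles in degrees. The function names are hypothetical. At latitude 90 every longitude lands on the same point, so that point gets no unique coordinates.

```python
import math

def lat_lon_to_point(lat_deg, lon_deg):
    """Point on a unit sphere for a latitude and longitude given in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def gap(p, q):
    """Largest difference between matching coordinates of two points."""
    return max(abs(a - b) for a, b in zip(p, q))

# Every longitude at latitude 90 names the same point, the North Pole.
print(gap(lat_lon_to_point(90, 0), lat_lon_to_point(90, 123)))   # essentially zero
# Away from the poles, changing longitude really moves you.
print(gap(lat_lon_to_point(45, 0), lat_lon_to_point(45, 123)))
```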

We can make this a little tighter, reducing this to two charts. Have one that’s your normal sort of wall map, centered on the equator. Have the other be a transverse Mercator map. Make its center the great circle going through the prime meridian and the 180-degree antimeridian. Then every point on the planet, including the poles, has a neat unambiguous coordinate in at least one chart. A good chunk of the world will be on both charts. We can throw in more charts if we like, but two is enough.

The requirements to be an atlas aren’t hard to meet. So a lot of geometric structures end up being atlases. Theodore Frankel’s wonderful The Geometry of Physics introduces them on page 15. But that’s also the last appearance of “atlas”, at least in the index. The idea gets upstaged. The manifolds that the atlas charts end up being more interesting. Many problems about things in motion are easy to describe as paths traced out on manifolds. A large chunk of mathematical physics is then looking at this problem and figuring out what the space of possible behaviors looks like. What its topology is.

In a sense, the mathematical physicist might survey a problem, like a scout exploring new territory, more than solve it. This exploration brings us to directional derivatives. To tangent bundles. To other terms, jargon only partially informed by the common meanings.

And we draw to the final weeks of 2021, and of the Little 2021 Mathematics A-to-Z. All this year’s essays should be at this link. And all my glossary essays from every year should be at this link. Thank you for reading!

## My Little 2021 Mathematics A-to-Z: Subtraction

Iva Sallay was once again a kind friend to my writing efforts here. Sallay, who runs the Find the Factors recreational mathematics puzzle site, saw a topic that gives a compelling theme to this year’s A-to-Z.

# Subtraction.

Subtraction is the inverse of addition.

So thanks for reading along as the Little 2021 Mathematics A-to-Z enters its final stage. Next week I hope to be back with something for my third letter ‘A’ of the sequence.

All right, I can be a little more clear. By the inverse I mean subtraction is the name we give to adding the additive inverse of something. It’s the piece that lets the numbers, under addition, form a group. That is, we write $a - b$ to mean we find whatever number, added to b, gives us 0. Then we add that to a. We do this pretty often, so it’s convenient to have a name for it. The word “subtraction” appears in English from about 1400. It grew from the Latin for “taking away”. By about 1425 the word had its mathematical meaning. I imagine this wasn’t too radical a linguistic evolution.

All right, so some other thoughts. What’s so interesting about subtraction that it’s worth a name? We don’t have a particular word for reversing, say, a permutation. But we don’t go very far in school without thinking about inverting an addition. It must come down to subtraction’s practical use in finding differences between things. Often in figuring out change. Debts at least. Nobody needs the inverse of a permutation unless they’re putting a deck of cards back in order.

Subtraction has other roles, though. Not so much in mathematics, but in teaching us how to learn about mathematics. For example, subtraction gives us a good reason to notice zero. Zero, the additive identity, is implicit to addition. But if you’re learning addition, and you think of it as “put these two piles of things together into one larger pile”? What good does an empty pile do you there? It’s easy to not notice there’s a concept there. But subtraction, taking stuff away from a pile? You can imagine taking everything away, and wanting a word for that. This isn’t the only way to notice zero is worth some attention. It’s a good way, though.

There’s more, though. Learning subtraction teaches us limits of what we can do, mathematically. We can add 3 to 7 or, if it’s more convenient, 7 to 3. But we learn from the start that while we can subtract 3 from 7, there’s no subtracting 7 from 3. This is true when we’re learning arithmetic and numbers are all positive. Some time later we ask, what happens if we go ahead and do this anyway? And figure out a number that makes sense as the answer to “what do you get subtracting 7 from 3”? This introduces us to the negative numbers. It’s a richer idea of what it is to have numbers. We can start to see addition and subtraction as expressions of the same operation.

But we also notice they’re not quite the same. As mentioned, addition can be done in any order. If I need to do 7 + 4 + 3 + 6 I can decide I’d rather do 4 + 6 + 7 + 3 and make that 10 + 10 before getting to 20. This all simplifies my calculating. If I need to do 7 – 4 – 3 – 6 I get into a lot of trouble if I simplify my work by writing 4 – 6 – 7 – 3 instead. Even if I decide I’d rather take the 3 – 6 and turn that into a negative 3 first, I’ve made a mess of things.

The first property this teaches us to notice we call “commutativity”. Most mathematical operations don’t have that. But a lot of the ones we find useful do. The second property this points out is “associativity”, which more of the operations we find useful have. It’s not essential that someone learning how to calculate know this is a way to categorize mathematics operations. (I’ve read that before the New Math educational reforms of the 1960s, American elementary school mathematics textbooks never mentioned commutativity or associativity.) But I suspect it is essential that someone learning mathematics learn the things you can do come in families.

So let me mention division, the inverse of multiplication. (It’s a topic my chosen theme won’t let me reach in sequence.) Like subtraction, division refuses to be commutative or associative. Subtraction prompts us to treat the negative numbers as something useful. In parallel, division prompts us to accept fractions as numbers. (We accepted fractions as numbers long before we accepted negative numbers, mind. Anyone with a pie and three friends has an interest in “one-quarter” that they may not have with “negative four”.) When we start learning about numbers raised to powers, or exponentials, we have questions ready to ask. How do the operations behave? Do they encourage us to find other kinds of number?

And we also think of how to patch up subtraction’s problems. If we want subtraction to be a kind of addition, we have to get precise about what that little subtraction sign means. What we’ve settled on is that $a - b$ is shorthand for $a + (-b)$, where $-b$ is the additive inverse of $b$.

Once we do that all subtraction’s problems with commutativity and associativity go away. 7 – 4 – 3 – 6 becomes 7 + (-4) + (-3) + (-6), and that we can shuffle around however is convenient. Say, to 7 + (-3) + (-4) + (-6), then to 7 + (-3) + (-10), then to 4 + (-10), and so -6. Thus do we domesticate a useful, wild operation like subtraction.
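Here is a quick Python check of the arithmetic above, a sketch using only built-in integers (the helper name `subtract_left_to_right` is made up for illustration). Subtracting left to right depends on both the order and the grouping, while the same problem rewritten as a sum of additive inverses gives one answer however we shuffle it.

```python
from itertools import permutations


def subtract_left_to_right(values):
    """Compute values[0] - values[1] - values[2] - ..., grouping from the left."""
    result = values[0]
    for v in values[1:]:
        result -= v
    return result


original = [7, 4, 3, 6]
print(subtract_left_to_right(original))      # 7 - 4 - 3 - 6 gives -6
print(subtract_left_to_right([4, 6, 7, 3]))  # the "simplified" reordering gives -12
print((7 - 4) - 3, 7 - (4 - 3))              # grouping matters too: 0 6

# Rewrite as 7 + (-4) + (-3) + (-6): every ordering of these terms agrees.
terms = [original[0]] + [-v for v in original[1:]]
sums = {sum(p) for p in permutations(terms)}
print(sums)                                  # {-6}
```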

Any individual subtraction has one right answer. There are many ways to get there, though. I had learned, for example, to do a problem such as 738 minus 451 by subtracting one column of numbers at a time. Right to left, so: 8 minus 1, then 3 minus 5, and, after the borrowing, 6 minus 4. I remember several elementary school textbooks explaining borrowing as unwrapping rolls of dimes. It was a model well-suited to me.
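The right-to-left column method can be sketched in Python like this, assuming nonnegative whole numbers with the larger one on top (the function name is hypothetical). Each borrow plays the part of unwrapping a roll of ten from the next column over.

```python
def column_subtract(minuend, subtrahend):
    """Subtract one column at a time, right to left, borrowing as needed.

    A sketch for whole numbers with 0 <= subtrahend <= minuend.
    """
    if not 0 <= subtrahend <= minuend:
        raise ValueError("expects 0 <= subtrahend <= minuend")
    top = [int(d) for d in str(minuend)][::-1]      # ones digit first
    bottom = [int(d) for d in str(subtrahend)][::-1]
    bottom += [0] * (len(top) - len(bottom))        # pad with leading zeros
    digits = []
    borrow = 0
    for t, b in zip(top, bottom):
        t -= borrow
        if t < b:
            t += 10        # unwrap a roll of ten from the next column
            borrow = 1
        else:
            borrow = 0
        digits.append(t - b)
    return int("".join(str(d) for d in reversed(digits)))


print(column_subtract(738, 451))  # 287
```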

We don’t need to, though. We can go from the left to the right, doing 7 minus 4 first and 8 minus 1 last. We can go through and figure out all the possible carries before doing any work. There’s a slick method called partial differences which skips all the carrying. But it demands writing out several more intermediate terms. This uses more paper, but if there isn’t a paper shortage, so what?
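Partial differences can be sketched the same way, assuming nonnegative whole numbers (the function name is hypothetical). Each place gets its own difference, which may come out negative, and the intermediate terms are added at the end instead of borrowing. For 738 minus 451 that is 300 + (-20) + 7.

```python
def partial_differences(minuend, subtrahend):
    """Return the place-by-place differences, largest place first, with no borrowing.

    Each entry is (digit of minuend - digit of subtrahend) * place value,
    so some entries may be negative. Sketch for nonnegative whole numbers.
    """
    if minuend < 0 or subtrahend < 0:
        raise ValueError("expects nonnegative whole numbers")
    parts = []
    place = 1
    while minuend or subtrahend:
        a_digit, minuend = minuend % 10, minuend // 10
        b_digit, subtrahend = subtrahend % 10, subtrahend // 10
        parts.append((a_digit - b_digit) * place)
        place *= 10
    return parts[::-1]


parts = partial_differences(738, 451)
print(parts)       # [300, -20, 7]
print(sum(parts))  # 287
```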

There are more ways to calculate. If we turn things over to a computer, we’re likely to do subtraction using a complements technique. When I say computer you likely think electronic computer, or did right up to the adjective there. But mechanical computers were a thing too. Blaise Pascal’s computing device of the 1650s used nines’ complements to subtract on the gears that did addition. Explaining the trick would take me farther afield than I want to go now. But, you know how, like, 6 plus 3 is 9? So you can turn a subtraction of 6 into an addition of 3. Or a subtraction of 3 into an addition of 6. Plus some bookkeeping.
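Here is a minimal sketch of the nines’ complement trick, assuming fixed-width decimal numbers and a result that isn’t negative. It shows the general technique only; it is not a model of how Pascal’s gears did their bookkeeping, and the function names are made up. Each digit d of the number being subtracted becomes 9 - d, we add, and then the carry that spills off the top gets added back at the ones place.

```python
def nines_complement(n, width):
    """Replace each of the `width` decimal digits d of n with 9 - d."""
    return (10**width - 1) - n


def subtract_by_nines_complement(a, b, width):
    """Compute a - b for 0 <= b <= a < 10**width using only addition.

    Adding the nines' complement of b to a overshoots by 10**width - 1.
    Dropping the leading 1 and adding it back at the ones place (the
    "end-around carry") corrects that.
    """
    if not 0 <= b <= a < 10**width:
        raise ValueError("expects 0 <= b <= a < 10**width")
    total = a + nines_complement(b, width)
    if total >= 10**width:
        return total - 10**width + 1
    # No overflow happens only when a == b; all nines is the complement form of zero.
    return 0


print(nines_complement(451, 3))                   # 548
print(subtract_by_nines_complement(738, 451, 3))  # 738 + 548 = 1286, so 286 + 1 = 287
```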

A digital computer is likely to use a binary version of the trick, representing every number as a string of 0’s and 1’s. The ones’ complement of a number swaps every 0 for a 1 and vice-versa; most hardware today uses two’s complement, which is the ones’ complement plus one. This has great speed advantages, since it’s very quick for a computer to swap between 0 and 1. Subtraction by complements is different from the way we learn it on paper and, to my eye, takes more steps. But they might be steps a machine does better.
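Most present-day hardware uses a close relative of ones’ complement called two’s complement: flip every bit, then add one. Here is a Python sketch assuming a fixed word size of `bits` bits; the sizes below are arbitrary, and the function name is hypothetical. Python’s own integers never overflow, so the mask imitates a fixed-width register, including its wrap-around.

```python
def twos_complement_subtract(a, b, bits=8):
    """Compute a - b in `bits`-bit two's-complement arithmetic.

    Uses only a bit flip, additions, and masking, then reads the resulting
    bit pattern back as a signed integer. Results outside the signed range
    wrap around, as they would in a real fixed-width register.
    """
    mask = (1 << bits) - 1
    flipped = ~b & mask             # ones' complement: swap every 0 and 1
    negated = (flipped + 1) & mask  # two's complement of b, the pattern for -b
    raw = (a + negated) & mask      # add, discarding any carry out of the top bit
    # Treat the top bit as the sign bit.
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


print(twos_complement_subtract(3, 7))          # -4
print(twos_complement_subtract(738, 451, 16))  # 287
```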

One more thought subtraction gives us, though. In a previous paragraph I wrote out 7 – 4, and also wrote 7 + (-4). We use the symbol – for two things. Do those two uses of – mean the same thing? You may think I’m being fussy here. After all, the value of -4 is the same as the value of 0 – 4. And even a fussy mathematician says whichever of “minus four” and “negative four” better fits the meter of the sentence. But our friends in the philosophy department would agree this is a fair question. Are we collapsing two related ideas together by using the same symbol for them?

My inclination is to say that the – of -4 is different from the – in 0 – 4, though. The – in -4 is a unary operation: it means “give me the inverse of the number on the right”. The – in 0 – 4 is a binary operation: it means “subtract the number on the right from the number on the left”. So I would say these are different things sharing a symbol. Unfortunately our friends in the philosophy department can’t answer the question for us. The university laid them off four years ago, part of society’s realignment away from questions like “how can we recognize when a thing is true?” and towards “how can we teach proto-laborers to use Excel macros?”. We have to use subtraction to expand our thinking on our own.
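Programming languages often keep the two minus signs apart, too. As one illustration (a fact about Python’s parser, not a ruling on the mathematics), the standard `ast` module reads `-4` as a unary operation applied to 4, and `0 - 4` as a binary operation with two operands. The `operator` module likewise has separate one-argument and two-argument functions.

```python
import ast
import operator

unary = ast.parse("-4", mode="eval").body
binary = ast.parse("0 - 4", mode="eval").body

print(type(unary).__name__, type(unary.op).__name__)    # UnaryOp USub
print(type(binary).__name__, type(binary.op).__name__)  # BinOp Sub

# Same value, different operations: negation takes one argument, subtraction two.
print(operator.neg(4), operator.sub(0, 4))              # -4 -4
```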

## How November 2021 Treated My Mathematics Blog

As I come near the end of the Little 2021 Mathematics A-to-Z, I also come to the start of December. So that’s a good time to look at the past month and see how readers responded to my work. Over November I published seven pieces, and here’s how they sorted out, most popular to the least, as WordPress counts their page views:

There’s an obvious advantage stuff published earlier in the month has. Still, this is usually around the time in an A-to-Z sequence where I get hit by a content aggregator and one post gets 25,000 views in a three-hour period and then falls back to normal. Would be a mood lift.

After a suspiciously average October, I saw another underperforming November. I mean underperforming compared to the twelve-month running average leading up to November. The mean monthly page view count, leading up to November, was 2,501.8, and the median was 2,527. In actual November, I got 2,103 page views. The mean number of unique visitors was 1,775.7, and the running median 1,752. In fact, there were 1,493 unique visitors.

Rated per posting, though, it doesn’t look so bad. There were on average 300.4 page views for each of the seven postings this past month. The twelve-month running mean was 314.3 views per posting, and the median 307.4. There were 213.3 unique visitors per posting in November. This is insignificantly below the running mean 222.1 unique visitors per posting, and running median of 217.2 visitors per posting. (And, again, this is views to anything at all on my blog, per new posting. Sometime, I’ll have to dare a month with no posts to learn how much my back catalogue gets on its own weight.)

I am at least growing less likable, confirming a fear. There were 25 likes given in November, the second month in a row it’s been less than one like a day. The mean was 43.4 likes per month, and the median 42. It doesn’t even look good rated per posting: this came out to 3.6 likes per posting, compared to a running mean of 5.3 and running median of 5.6. Comments offer a little hope, at least, with 13 comments given over the course of November. The mean was 15.1 and median 10.1. Per posting, this lands right on average: November averaged 1.9 comments per posting, and the twelve-month running mean was 1.9. The twelve-month running median was 1.4 comments per posting, so I finally found a figure where I beat an average.

WordPress figures I published 6,106 words this past month. It’s my second-most loquacious month this year, with an average 872.3 words per November posting. It brings my total for the year to 50,429 words, averaging 623 words per posting. Unless December makes some big changes this is going to be my second-least-talkative year of the blog.

As of the start of December I’ve had 1,663 postings here. They’ve drawn a total 148,937 views, from 88,561 unique visitors.

If you’d like to follow this blog regularly, I’d be glad if you did. You can use the “Follow Nebusresearch” button at the upper right corner of this page. Or you can get essays by e-mail as soon as they’re published, using the box just below that button. I don’t use the e-mail for anything but sending these essays. I don’t know how WordPress Master Command uses them.

While my Twitter account has gone feral I am on Mathstodon, the mathematics-themed instance of the Mastodon network. So you can catch me as @nebusj@mathstodon.xyz there. Thank you as ever for reading and for, I hope, the successful conclusion of this year’s little A-to-Z.

## My Little 2021 Mathematics A-to-Z: Convex

Jacob Siehler, a friend from Mathstodon, and Assistant Professor at Gustavus Adolphus College, offered several good topics for the letter ‘C’. I picked the one that seemed to connect to the greatest number of other topics I’ve covered recently.

# Convex

It’s easy to say what convex is, if we’re talking about shapes in ordinary space. A convex shape is one where the line connecting any two points inside the shape always stays inside the shape. Circles are convex. Triangles and rectangles too. Star shapes are not. Is a torus? That depends. If it’s a doughnut shape sitting in some bigger space, then it’s not convex. If the doughnut shape is all the space there is to consider, then it is. There’s a parallel here to prime numbers. Whether 5 is a prime depends on whether you think 5 is an integer, a real number, or a complex number.
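For polygons in the plane there’s a quick mechanical check. Here is a Python sketch assuming a simple polygon (one whose edges don’t cross each other) with its corners listed in order; the function names are made up. Such a polygon is convex exactly when it never turns left at one corner and right at another, and the sign of a cross product tells us which way each corner turns.

```python
import math


def cross(o, a, b):
    """z-component of (a - o) x (b - o): positive for a left turn, negative for right."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_convex_polygon(vertices):
    """True if a simple polygon, corners listed in order, is convex.

    Assumes the polygon does not cross itself. Collinear corners are ignored.
    """
    n = len(vertices)
    signs = set()
    for i in range(n):
        turn = cross(vertices[i], vertices[(i + 1) % n], vertices[(i + 2) % n])
        if turn != 0:
            signs.add(turn > 0)
    return len(signs) <= 1


def star(points=5, outer=1.0, inner=0.4):
    """A star-shaped polygon whose corners alternate between two radii."""
    verts = []
    for k in range(2 * points):
        r = outer if k % 2 == 0 else inner
        angle = math.pi / 2 + k * math.pi / points
        verts.append((r * math.cos(angle), r * math.sin(angle)))
    return verts


print(is_convex_polygon([(0, 0), (2, 0), (2, 1), (0, 1)]))  # rectangle: True
print(is_convex_polygon([(0, 0), (1, 0), (0, 1)]))          # triangle: True
print(is_convex_polygon(star()))                            # star: False
```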

Still, this seems easy to the point of boring. So how does Wolfram Mathworld match 337 items for ‘convex’? For a sense of scale, it has only 112 matches for ‘quadrilateral’. ‘Convex’ is a word used almost as much as ‘triangle’, with 370 items. Why?

Why? Because it’s one of those terms that sneaks in everywhere. Some of it is obvious. There’s a concept called “star-convex”, where the shape only needs one point from which a straight line reaches every other point without leaving the shape. That’s a familiar mathematical trick, coming up with a less-demanding version of a property. There’s the “convex hull”, which is the smallest convex set that contains a given set of points. We even come up with “convex functions”, functions of real numbers. A function is convex if the space above its graph is convex. This seems like stretching the idea of convexity rather a bit.
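Finding a convex hull is a standard exercise. Here is a sketch of one well-known way, Andrew’s monotone chain algorithm, for points in the plane; the function names are made up. It sorts the points, then builds the lower and upper boundaries, throwing out any point that would make the boundary turn the wrong way.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o): positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Corners of the convex hull, counterclockwise, via Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last point while it would make a right turn (or go straight).
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5), (0.5, 1.5)]
print(convex_hull(pts))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```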

Still, we wouldn’t coin such a term if we couldn’t use it. Well, if someone couldn’t use it. The saving thing here is the idea of “space”. We get our idea of what space is from looking around rooms and walking around hills and stuff. But what makes something a space? What’s essential? What we need are traits like: there are things. We can measure how far apart things are. We have some idea of paths between things. That’s not asking a lot.

So many things become spaces. And so convexity sneaks in everywhere. A convex function has nice properties if you’re looking for minimums. Or maximums; that’s as easy to do. And we look for minimums a lot. A large, practical set of mathematics is the search for optimum values, the set of values that maximize, or minimize, something. You may protest that not everything we’re interested in is a convex function. This is true. But a lot of what we are interested in is, or is approximately.
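One of those nice properties: for a convex function on an interval, there are no false valleys to get stuck in, so a very simple search homes in on the minimum. Here is a sketch of ternary search, assuming the function is convex on the interval given; the example function and tolerance are arbitrary choices. Comparing the function at two interior points tells us which outer third cannot hold the minimum.

```python
def ternary_search_min(f, lo, hi, tol=1e-9):
    """Approximate the minimizer of a function f that is convex on [lo, hi].

    If f(m1) < f(m2), convexity rules out a minimum to the right of m2;
    otherwise it rules out a minimum to the left of m1.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


x = ternary_search_min(lambda t: (t - 1.5) ** 2 + 3, -10, 10)
print(round(x, 6))  # 1.5
```

Without convexity the same search can settle happily into a local dip that isn’t the lowest point.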

This gets into some surprising corners. Economics, for example. The mathematics of economics is often interested in how much of a thing you can make. But you have to put things in to make it. You expect, at least once the system is set up, that if you halve the components you put in you get half the thing out. Or double the components in and get double the thing out. But you can run out of the components. Or related stuff, like, floor space to store partly-complete product. Or transport available to send this stuff to the customer. Or time to get things finished. For our needs these are all “things you can run out of”.

And so we have a problem of linear programming. We have something or other we want to optimize. Call it $y$. It depends on a whole range of variables, which we describe as a vector $\vec{x}$. And we have constraints. Each of these is an inequality; we can represent that as demanding some linear functions of these variables be at most some numbers. We can bundle those functions together as a matrix called $A$. We can bundle those maximum numbers together as a vector called $\vec{b}$. So the problem is finding the $\vec{x}$ that gives the best value of $y$ while satisfying $A\vec{x} \le \vec{b}$. Also, we demand that none of these values be smaller than some minimum we might as well call 0. The range of all the possible values of these variables is a space. These constraints chop up that space, into a shape. Into a convex shape, of course, or this paragraph wouldn’t belong in this essay. If you need to be convinced of this, imagine taking a wedge of cheese and hacking away slices all the way through it. How do you cut a cave or a tunnel in it?

So take this convex shape, called a polytope. That’s what we call a polygon or polyhedron if we don’t want to commit to any particular number of dimensions of space. (If we’re being careful. My suspicion is ‘polyhedron’ is more often said.) Some point in that shape has the best possible value of $y$. (Also the worst, if that’s your thing.) Where is it? There is an answer, and it gives a pretext to share a fun story. The answer is that it’s on the outside, at one of the corners of the polytope. And you can find it following along the edges of those polytopes. This we know as the simplex method, or Dantzig’s Simplex Method if we must be more particular, for George Dantzig. Its success relies on looking at convex functions in convex spaces and how much this simplifies finding things.
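Here is a tiny two-variable example of the claim that the best point sits at a corner. The numbers are made up for illustration: maximize $y = 3x_1 + 5x_2$ subject to $x_1 + x_2 \le 4$, $x_1 + 3x_2 \le 6$, and both variables at least 0. This is not the simplex method, which walks cleverly from corner to corner; this sketch just checks every corner, which is only practical for very small problems.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example: maximize y = 3*x1 + 5*x2 subject to A x <= b.
# The last two rows encode x1 >= 0 and x2 >= 0 as -x1 <= 0 and -x2 <= 0.
A = [(1, 1), (1, 3), (-1, 0), (0, -1)]
b = [4, 6, 0, 0]
c = (3, 5)


def corners(A, b):
    """Every point where two constraint lines cross and all constraints hold."""
    found = set()
    for i, j in combinations(range(len(A)), 2):
        (a11, a12), (a21, a22) = A[i], A[j]
        det = a11 * a22 - a12 * a21
        if det == 0:
            continue  # parallel lines never meet
        # Cramer's rule, with exact fractions to avoid rounding.
        x1 = Fraction(b[i] * a22 - a12 * b[j], det)
        x2 = Fraction(a11 * b[j] - b[i] * a21, det)
        if all(r1 * x1 + r2 * x2 <= bound for (r1, r2), bound in zip(A, b)):
            found.add((x1, x2))
    return found


def objective(point):
    return c[0] * point[0] + c[1] * point[1]


best = max(corners(A, b), key=objective)
print(f"best corner x1={best[0]}, x2={best[1]}, y={objective(best)}")
# best corner x1=3, x2=1, y=14
```

The other corners, (0, 0), (4, 0), and (0, 2), give 0, 12, and 10.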

Usually. The simplex method is one of polynomial-order complexity for normal, typical problems. That’s a measure of how much longer it takes to find an answer as you get more variables, more constraints, more work. Polynomial is okay, growing about the way it takes longer to multiply when you have more digits in the numbers. But there’s a worst case, in which the complexity grows exponentially. We shy away from exponential-complexity because … you know, exponentials grow fast, given a chance. What saves us is that that’s a worst case, not a typical case. The convexity lets us set up our problem and, rather often, solve it well enough.

Now the story, a mutation of which you’ve likely encountered. George Dantzig, as a student in Jerzy Neyman’s statistics class, arrived late one day to find a couple of problems on the board. He took these to be homework, and struggled with the harder-than-usual set. But he turned them in, apologizing for them being late. Neyman accepted the work, and eventually got around to looking at it. This wasn’t the homework. These were unsolved problems in statistics. Six weeks later Neyman had prepared them for publication. A year later, Neyman explained to Dantzig that all he needed to earn his PhD was to put these two papers together in a nice binder.

This cute story somehow escaped into the wild. It became an inspirational tale for more than mathematics grad students. That part’s easy to see; it has most everything inspiration needs. It mutated further, into the movie Good Will Hunting. I do not know whether the unsolved problems, work done in the late 1930s, relate to Dantzig’s simplex method, developed after World War II. It may be that they are simply connected in their originator. But perhaps it is more than I realize now.

I hope to finish off the word ‘Mathematics’ with the letter S next week. This week’s essay, and all the essays for the Little Mathematics A-to-Z, should be at this link. And all of this year’s essays, and all the A-to-Z essays from past years, should be at this link. Thank you for reading.

## I’m looking for the last topics for the Little 2021 Mathematics A-to-Z

I’m approaching the end of this year’s little Mathematics A-to-Z. The project’s been smaller, as I’d hoped, although I’m not sure I managed to make it any less hard on myself. Still, I’m glad to be doing it and glad to have the suggestions of you kind readers for topics. This quartet should wrap up the year, and the project.

So please let me know of any topics you’d like to see me try taking on. The topic should be anything mathematics-related, although I tend to take a broad view of mathematics-related. (I’m also open to biographical sketches.) To suggest something, please, say so in a comment. If you do, please also let me know about any projects you have — blogs, YouTube channels, real-world projects — that I should mention at the top of that essay.

I am happy to revisit a subject I think I have more to write about, so don’t be shy about suggesting those. Past essays for these letters include:

## Z.

And, as ever, all my A-to-Z essays should be at this link. Thanks for reading and thanks for sharing your thoughts.

## My Little 2021 Mathematics A-to-Z: Inverse

I owe Iva Sallay thanks for the suggestion of today’s topic. Sallay is a longtime friend of my blog here. And runs the Find the Factors recreational mathematics puzzle site. If you haven’t been following, or haven’t visited before, this is a fun week to step in again. The puzzles this week include (American) Thanksgiving-themed pictures.

# Inverse.

When we visit the museum made of a visual artist’s studio we often admire the tools. The surviving pencils and crayons, pens, brushes and such. We don’t often notice the eraser, the correction tape, the unused white-out, or the pages cut into scraps to cover up errors. To do something is to want to undo it. This is as true for the mathematics of a circle as it is for the drawing of one.

If not to undo something, we do often want to know where something comes from. A classic paper asks: can one hear the shape of a drum? You hear a sound. Can you say what made that sound? Fine, dismiss the drum shape as idle curiosity. The same question applies to any sensory data. If our hand feels cooler here, where is the insulation of the building damaged? If we have this electrocardiogram reading, what can we say about the action of the heart producing that? If we see the banks of a river, what can we know about how the river floods?

And this is the point, and purpose, of inverses. We can understand them as finding the causes of what we observe.

The first inverse we meet is usually the inverse function. It’s introduced as a way to undo what a function does. That’s an odd introduction, if you’re comfortable with what a function is. A function is a mathematical construct. It’s two sets (a domain and a range) and a rule that links elements in the domain to the range. To “undo” a function is like “undoing” a rectangle. But a function has a compelling “physical” interpretation. It’s routine to introduce functions as machines that take some numbers in and give numbers out. We think of them as ways to transform the domain into the range. In functional analysis we get to thinking of domains as the most perfect putty. We expect functions to stretch and rotate and compress and slide along as though they were drawing a Betty Boop cartoon.

So we’re trained to speak of a function as a verb, acting on pieces of the domain. An element or point, or a region, or the whole domain. We think the function “maps”, or “takes”, or “transforms” this into its image in the range. And if we can turn one thing into another, surely we can turn it back.

Some things it’s obvious we can turn back. Suppose our function adds 2 to whatever we give it. We can get the original back by subtracting 2. If the function subtracts 32 and divides by 1.8, we can reverse it by multiplying by 1.8 and adding 32. If the function takes the reciprocal, we can take the reciprocal again. We have a bit of a problem if we started out taking the reciprocal of 0, but who would want to do such a thing anyway? If the function squares a number, we can undo that by taking the square root. Unless we started from a negative number. Then we have trouble.
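Those examples are easy to check by machine. Here is a Python sketch (the function names are mine; the subtract-32-and-divide-by-1.8 function happens to be the Fahrenheit-to-Celsius conversion). Running a value through a function and then its inverse returns it, up to floating-point rounding. The square root, though, only undoes squaring for nonnegative starting values.

```python
import math


def fahrenheit_to_celsius(f):
    """The function from the text: subtract 32, then divide by 1.8."""
    return (f - 32) / 1.8


def celsius_to_fahrenheit(c):
    """Its inverse: undo the steps in reverse order, multiply by 1.8, then add 32."""
    return c * 1.8 + 32


def reciprocal(x):
    """Its own inverse, for any x except 0."""
    return 1 / x


for x in [-40.0, 0.0, 37.5, 212.0]:
    # There and back again recovers the input, up to rounding.
    assert math.isclose(celsius_to_fahrenheit(fahrenheit_to_celsius(x)), x, abs_tol=1e-9)
    if x != 0:
        assert math.isclose(reciprocal(reciprocal(x)), x, abs_tol=1e-9)

# The square root recovers 3 from 9, but it cannot recover -3.
print(math.sqrt(3 ** 2), math.sqrt((-3) ** 2))  # 3.0 3.0
```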

The trouble is not every function has an inverse. Which we could have realized by thinking how to undo “multiply by zero”. To be a well-defined function, the rule part has to match each element in the domain to exactly one element in the range. To have an inverse, the function needs the reverse as well: each element of the range it reaches must come from exactly one element in the domain. This makes the function, in the impenetrable jargon of the mathematician, a “one-to-one function”. Or, if it also reaches every element in the range, you can describe it with the more intuitive label of “bijective”.

But there’s no reason more than one thing in the domain can’t match to the same thing in the range. If I know the cosine of my angle is $\frac{\sqrt{3}}{2}$, my angle might be 30 degrees. Or -30 degrees. Or 390 degrees. Or 330 degrees. You may protest there’s no difference between a 30 degree and a 390 degree angle. I agree those angles point in the same direction. But a gear rotated 390 degrees has done something that a gear rotated 30 degrees hasn’t. If all I know is where the dot I’ve put on the gear is, how can I know how much it’s rotated?

So what we do is shift from the actual cosine into one branch of the cosine. By restricting the domain we can create a function that has the same rule as the one we want, but that’s also one-to-one and so has an inverse. What restriction to use? That depends on what you want. But mathematicians have some that come up so often they might as well be defaults. So the square root is the inverse of the square of nonnegative numbers. The inverse Cosine is the inverse of the cosine of angles from 0 to 180 degrees. The inverse Sine is the inverse of the sine of angles from -90 to 90 degrees. The capital letters are convention to say we’re doing this. If we want a different range, we write out that we’re looking for an inverse cosine from -180 to 0 degrees or whatever. (Yes, the mathematician will default to using radians, rather than degrees, for angles. That’s a different essay.) It’s an imperfect solution, but it often works well enough.
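Python’s `math.acos` follows the convention described above: it returns the principal value, an angle from 0 to $\pi$ radians, which is 0 to 180 degrees. Here is a sketch of recovering the other angles with the same cosine inside a window of our choosing, assuming angles in degrees (the helper name is made up). Every such angle is plus or minus the principal value, plus some whole number of turns.

```python
import math


def angles_with_cosine(value, low, high):
    """Every angle, in degrees, in [low, high] whose cosine is `value` (-1 <= value <= 1)."""
    base = math.degrees(math.acos(value))  # principal value, between 0 and 180
    found = set()
    first = math.floor((low - 180) / 360)
    last = math.ceil((high + 180) / 360)
    for k in range(first, last + 1):
        for angle in (base + 360 * k, -base + 360 * k):
            if low <= angle <= high:
                found.add(round(angle, 9))  # round away floating-point fuzz
    return sorted(found)


print(math.degrees(math.acos(math.sqrt(3) / 2)))    # the principal value, about 30
print(angles_with_cosine(math.sqrt(3) / 2, -360, 720))
```

The second line should list -330, -30, 30, 330, 390, and 690 degrees, give or take rounding in the last printed digit.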

The trouble we had with cosines, and functions, continues through all inverses. There are almost always alternate causes. Many shapes of drums sound alike. Take two metal bars. Heat both with a blowtorch, one on the end and one in the center. Not to the point of melting, only to the point of being too hot to touch. Let them cool in insulated boxes for a couple of weeks. There’ll be no measurement you can do on the remaining heat that tells you which one was heated on the end and which the center. That’s not because your thermometers are no good or the flow of heat is not deterministic or anything. It’s that both starting cases settle to the same end. So here there is no usable inverse.
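Here is a rough numerical sketch of the two bars, with made-up sizes and units. Each bar is a row of cells that pass heat to their neighbors in proportion to the temperature difference (an explicit finite-difference scheme), and no heat escapes the ends. One bar starts with its heat at an end, the other with the same amount of heat in the middle. After enough steps the two are indistinguishable.

```python
def cool_insulated_bar(temps, steps, alpha=0.25):
    """Explicit finite-difference heat flow along a bar with insulated ends.

    Each step moves heat between neighboring cells in proportion to their
    temperature difference. Nothing flows out of the ends, so the total heat
    stays the same. Keeping alpha <= 0.5 keeps this scheme stable.
    """
    t = list(temps)
    n = len(t)
    for _ in range(steps):
        new = t[:]
        for i in range(n):
            left = t[i - 1] if i > 0 else t[i]  # insulated end: no flow across it
            right = t[i + 1] if i < n - 1 else t[i]
            new[i] = t[i] + alpha * (left - 2 * t[i] + right)
        t = new
    return t


cells = 21
heated_end = [0.0] * cells
heated_end[0] = 100.0
heated_center = [0.0] * cells
heated_center[cells // 2] = 100.0

a = cool_insulated_bar(heated_end, 5000)
b = cool_insulated_bar(heated_center, 5000)
print(max(abs(x - y) for x, y in zip(a, b)))  # a number very close to 0
print(sum(a), sum(b))                         # both still hold about 100 units of heat
```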

This is not to call inverses futile. We can look for what we expect to find useful. We are inclined to find inverses of the cosine between 0 and 180 degrees, even though 4140 through 4320 degrees is as legitimate. We may not know what is wrong with a heart, but have some idea what a heart could do and still beat. And there’s a famous example in 19th-century astronomy. After the discovery of Uranus came the discovery it did not move right. For a while it moved across the sky too fast for its distance from the sun. Then it started moving too slow. The obvious supposition was that there was another, not-yet-seen, planet, affecting its orbit.

The trouble is finding it. Calculating the orbit from what data they had required solving equations with 13 unknown quantities. John Couch Adams and Urbain Le Verrier attempted this anyway, making suppositions about what they could not measure. They made great suppositions. Le Verrier made the better calculations, and persuaded an astronomer (Johann Gottfried Galle, assisted by Heinrich Louis d’Arrest) to go look. Took about an hour of looking. They also made lucky suppositions. Both, for example, supposed the trans-Uranian planet would obey “Bode’s Law”, a seeming pattern in the sizes of planetary orbits. The actual Neptune does not. It was near enough in the sky to where the calculated planet would be, though. The world is vaster than our imaginations.

That there are many ways to draw Betty Boop does not mean there’s nothing to learn about how this drawing was done. And so we keep having inverses as a vibrant field of mathematics.

Next week I hope to cover the letter ‘C’ and don’t think I’m not worried about what that ‘C’ will be. This week’s essay, and all the essays for the Little Mathematics A-to-Z, should be at this link. And all of this year’s essays, and all the A-to-Z essays from past years, should be at this link. Thank you for reading.

## My Little 2021 Mathematics A-to-Z: Triangle

And I have another topic suggested by John Golden, author of Math Hombre. It’s one of the basic bits of mathematics, and so is hard to think about.

# Triangle.

Edward Brisse assembled a list of 2,001 things to call a “center” of a triangle. I’d have run out around three. We don’t need most of them. I mention them because the list speaks of how interesting we find triangles. Nobody’s got two thousand thoughts about enneadecagons (19-sided figures).

As always with mathematics it’s hard to say whether triangles are all that interesting or whether we humans are obsessed. They’ve got great publicity. The Pythagorean Theorem may be the only bit of interesting mathematics an average person can be assumed to recognize. The kinds of triangles — acute, obtuse, right, equilateral, isosceles, scalene — are fit questions for trivia games. An ordinary mathematics education can end in trigonometry. This ends up being about circles, but we learn it through triangles. The art and science of determining where a thing is we call “triangulation”.

But triangles do seem to stand out. They’re the simplest polygon, only three vertices and three edges. So we can slice any other polygon into triangles. Any triangle can tile the plane. Even quadrilaterals may need reflections of themselves. One of the first geometry facts we learn is the interior angles of a triangle add up to two right angles. And one of the first geometry facts we learn, discovering there are non-Euclidean geometries, is that they don’t have to.

Triangles have to be convex, that is, they don’t have any divots. This property sounds boring. But it’s a good boring; it makes other work easier. It tells us that the lengths of any two sides of a triangle add up to something longer than the third side. And that’s a powerful idea.

There are many ways to define “distance”. Mathematicians have tried to find the most abstract version of the concept. This inequality, the triangle inequality, is one of the few pieces that every definition of “distance” must respect. This idea of distance leaps out of shapes drawn on paper. Last week I mentioned a triangle inequality, in discussing functions $f$ and $g$. We can define operators that describe a distance between functions. And the distances between trios of functions behave like the distances between the corners of a triangle. Thus does geometry sneak into abstract concepts like “piecewise continuous functions”.
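One common distance between functions is the largest gap between their graphs. Here is a sketch that approximates it by checking evenly spaced points on $[0, 1]$; the sampling is a simplifying assumption, since the true distance looks at every point. It checks that three functions obey the triangle inequality, the same way three corners of a triangle do.

```python
import math


def sup_distance(f, g, samples=1001):
    """Approximate the largest gap |f(x) - g(x)| for x in [0, 1] using sample points."""
    xs = [k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - g(x)) for x in xs)


def identity(x):
    return x


def cubic(x):
    return x - x**3 / 6


d_fg = sup_distance(math.sin, identity)
d_gh = sup_distance(identity, cubic)
d_fh = sup_distance(math.sin, cubic)
print(d_fh <= d_fg + d_gh)  # True
```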

And they serve in curious blends of the abstract and the concrete. For example, numerical solutions to partial differential equations. A partial differential equation is one where we want to know a function of two or more variables, and only have information about how the function changes as those variables change. These turn up all the time in any study of things in bulk. Heat flowing through space. Waves passing through fluids. Fluids running through channels. So any classical physics problem that isn’t, like, balls bouncing against each other or planets orbiting stars. We can solve these if they’re linear. Linear here is a term of art meaning “easy”. I kid; “linear” means more like “manageable”. All the good problems are nonlinear and we can exactly solve about two of them.

So, numerical solutions. We make approximations by putting down a mesh on the differential equation’s domain. And then, using several graduate-level courses’ worth of tricks, approximating the equation we want with one that we can solve here. That mesh, though? … It can be many things. One powerful technique is “finite elements”. An element is a small piece of space. Guess what the default shape for these elements is. There are times, and reasons, to use other shapes as elements. You learn those once you have the hang of triangles. (Dividing the space of your variables up into elements lets you look for an approximate solution using tools easier to manage than you’d have without. This is a bit like looking for one’s keys over where the light is better. But we can find something that’s as close as we need to our keys.)
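Here is a sketch of only the first step, laying a triangle mesh over a rectangular domain. Real finite-element codes use dedicated mesh generators and then do the hard part of solving on the mesh; this shows none of that, and the function names are made up. It cuts the rectangle into a grid of small rectangles and splits each along a diagonal.

```python
def triangulate_rectangle(width, height, nx, ny):
    """Split a width-by-height rectangle into 2 * nx * ny triangles.

    Returns (vertices, triangles): vertices are (x, y) points, and each
    triangle is a triple of indices into the vertex list.
    """
    vertices = [(width * i / nx, height * j / ny)
                for j in range(ny + 1) for i in range(nx + 1)]

    def index(i, j):
        return j * (nx + 1) + i

    triangles = []
    for j in range(ny):
        for i in range(nx):
            a, b = index(i, j), index(i + 1, j)
            c, d = index(i, j + 1), index(i + 1, j + 1)
            triangles.append((a, b, d))  # cut each small rectangle
            triangles.append((a, d, c))  # along its diagonal
    return vertices, triangles


def triangle_area(p, q, r):
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) / 2


verts, tris = triangulate_rectangle(2.0, 1.0, 4, 3)
print(len(tris))  # 24
print(sum(triangle_area(*(verts[k] for k in t)) for t in tris))  # total area, about 2.0
```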

If we need finite elements for, oh, three dimensions of space, or four, then triangles fail us. We can’t fill a volume with two-dimensional shapes like triangles. But the triangle has its analog. The tetrahedron, in some sense four triangles joined together, has all the virtues of the triangle for three dimensions. We can look for a similar shape in four and five and more dimensions. If we’re looking for the thing most like an equilateral triangle, we’re looking for a “simplex”.
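A simplex has a tidy volume formula in any number of dimensions: put the edge vectors from one corner to the others into a matrix, take the absolute value of its determinant, and divide by $n!$ for $n$ dimensions. Here is a sketch using exact fractions and a small hand-written determinant (the function names are made up).

```python
from fractions import Fraction
from math import factorial


def determinant(rows):
    """Determinant by Gaussian elimination, exact over fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # swapping rows flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return det


def simplex_volume(vertices):
    """Volume of the n-dimensional simplex with the given n + 1 corners."""
    base = vertices[0]
    edges = [[x - x0 for x, x0 in zip(v, base)] for v in vertices[1:]]
    return abs(determinant(edges)) / factorial(len(edges))


print(simplex_volume([(0, 0), (1, 0), (0, 1)]))                       # triangle: 1/2
print(simplex_volume([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # tetrahedron: 1/6
```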

These simplexes, or these elements, sprawl out across the domain we want to solve problems for. They look uncannily like the triangles surveyors draw across the chart of a territory, as they show us where things are.

Next week I hope to cover the letter ‘I’ as I near the end of ‘Mathematics’ and consider what to do about ‘A To Z’. This week’s essay, and all the essays for the Little Mathematics A-to-Z, should be at this link. And all of this year’s essays, and all the A-to-Z essays from past years, should be at this link. Thank you once more for reading.

## My Little 2021 Mathematics A-to-Z: Analysis

I’m fortunate this week to have another topic suggested again by Mr Wu, blogger and Singaporean mathematics tutor. It’s a big field, so forgive me not explaining the entire subject.

# Analysis.

Analysis is about proving why the rest of mathematics works. It’s a hard field. My experience, a typical one, included crashing against real analysis as an undergraduate and again as a graduate student. It turns out mathematics works by throwing a lot of $\epsilon$ symbols around.

Let me give an example. If you read pop mathematics blogs you know about the number represented by $0.999999\cdots$. You’ve seen proofs, some of them even convincing, that this number equals 1. Not a tiny bit less than 1, but exactly 1. Here’s a real-analysis treatment. And — I may regret this — I recommend you don’t read it. Not closely, at least. Instead, look at its shape. Look at the words and symbols as graphic design elements, and trust that what I say is not nonsense. Resume reading after the horizontal rule.

It’s convenient to have a name for the number $0.999999\cdots$. I’ll call that $r$, for “repeating”. 1 we’ll call 1. I think you’ll grant that whatever r is, it can’t be more than 1. I hope you’ll accept that if the difference between 1 and r is zero, then r equals 1. So what is the difference between 1 and r?

Give me some number $\epsilon$. It has to be a positive number. The implication in the letter $\epsilon$ is that it’s a small number. This isn’t actually required in general. We expect it. We feel surprise and offense if it’s ever not the case.

I can show that the difference between 1 and r is less than $\epsilon$. I know there is some smallest counting number N so that $\epsilon > \frac{1}{10^{N}}$. For example, say $\epsilon$ is 0.125. Then we can let N = 1, and $0.125 > \frac{1}{10^{1}}$. Or suppose $\epsilon$ is 0.00625. Then we can let N = 3, and $0.00625 > \frac{1}{10^{3}}$. (If $\epsilon$ is bigger than 1, let N = 1.) Now we have to ask why I want this N.
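Finding that N is mechanical enough to hand to a computer. Here is a sketch using exact fractions; passing $\epsilon$ as a string like "0.00625" is a choice made here to avoid binary floating-point rounding, and the function name is made up. It starts at N = 1 and counts up until $\frac{1}{10^{N}}$ drops below $\epsilon$, which must happen for any positive $\epsilon$.

```python
from fractions import Fraction


def smallest_n(epsilon):
    """Smallest counting number N with epsilon > 1/10**N, for positive epsilon."""
    epsilon = Fraction(epsilon)
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    n = 1
    while not epsilon > Fraction(1, 10**n):
        n += 1
    return n


for eps in ["0.125", "0.00625", "5"]:
    print(eps, smallest_n(eps))  # 0.125 1, then 0.00625 3, then 5 1
```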

Whatever the value of r is, I know that it is more than 0.9. And that it is more than 0.99. And that it is more than 0.999. In fact, it’s more than the number you get by truncating r after any whole number N of digits. Let me call $r_N$ the number you get by truncating r after N digits. So, $r_1 = 0.9$ and $r_2 = 0.99$ and $r_5 = 0.99999$ and so on.
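For anyone who wants to poke at the two $\epsilon$ values above, here is a small Python sketch. It uses exact fractions from the standard library so no rounding muddies the comparisons. The names `smallest_n` and `truncated_r` are my own, made up for illustration. It finds the smallest N with $\epsilon > \frac{1}{10^{N}}$ and checks that $1 - r_N$ lands below $\epsilon$.

```python
from fractions import Fraction

def smallest_n(eps):
    """Smallest counting number N with eps > 1/10**N. eps must be positive."""
    if eps <= 0:
        raise ValueError("epsilon must be positive")
    n = 1
    while not eps > Fraction(1, 10**n):
        n += 1
    return n

def truncated_r(n):
    """r_N: the number 0.99...9 with N nines, as an exact fraction."""
    return Fraction(10**n - 1, 10**n)

for eps in (Fraction(125, 1000), Fraction(625, 100000)):
    n = smallest_n(eps)
    gap = 1 - truncated_r(n)   # exactly 1/10**N
    print(f"eps = {float(eps)}: N = {n}, 1 - r_N = {gap}, below eps: {gap < eps}")
```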

Since $r > r_N$, it has to be true that $1 - r < 1 - r_N$. And since we know what $r_N$ is, we can say exactly what $1 - r_N$ is. It's $\frac{1}{10^{N}}$. And we picked N so that $\frac{1}{10^{N}} < \epsilon$. So $1 - r < 1 - r_N = \frac{1}{10^{N}} < \epsilon$. But all we know of $\epsilon$ is that it's a positive number. It can be any positive number. So $1 - r$ has to be smaller than each and every positive number. And $1 - r$ can’t be negative, since r is not more than 1. The only number that’s not negative, yet smaller than every positive number, is zero. So the difference between 1 and r must be zero and so they must be equal.

---

That is a compelling argument. Granted, it compels much the way your older brother kneeling on your chest and pressing your head into the ground compels. But this argument gives the flavor of what much of analysis is like.

For one, it is fussy, leaning to technical. You see why the subject has the reputation of driving off all but the most intent mathematics majors. Once you get comfortable with this sort of argument, though, the fussiness gets hard to notice.

For another, the argument shows that the difference between two things is less than every positive number. Therefore the difference is zero and so the things are equal. This is one of mathematics’ most important tricks. For a third point, there’s a lot of talk about $\epsilon$. And about finding differences that are, it usually turns out, smaller than some $\epsilon$. (As an undergraduate I found something wasteful in how the differences were so often so much less than $\epsilon$. We can’t exhaust the small numbers, though. It still feels uneconomic.)

Something this misses is another trick, though. That’s adding zero. I couldn’t think of a good way to use that here. What we often get is the need to show that, say, function $f$ and function $g$ are equal. That is, that they are less than $\epsilon$ apart, for any positive $\epsilon$ you name. What we can often do is show that $f$ is close to some related function, which let me call $f_n$.

I know what you’re suspecting: $f_n$ must be a polynomial. Good thought! Although in my experience, it’s actually more likely to be a piecewise constant function. That is, it’s some number, say, “2”, for part of the domain, and then “2.5” in some other region, with no transition between them. Some other values, even values not starting with “2”, in other parts of the domain. Usually this is easier to prove stuff about than even polynomials are.
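As a sketch of what a piecewise constant approximation looks like, here is some Python. The helper `step_approximation` is a hypothetical name of my own; it chops an interval into n equal pieces and gives $f_n$ the value of $f$ at each piece’s left end. The example function, $\sin x$ on $[0, \pi]$, is just one I picked. Measuring the biggest gap on a fine sample only estimates the true largest gap, so treat the printed numbers as illustration, not proof.

```python
import math

def step_approximation(f, a, b, n):
    """Hypothetical helper: a piecewise-constant f_n on [a, b] with n equal pieces.
    On each piece, f_n takes the value of f at the piece's left endpoint."""
    width = (b - a) / n
    values = [f(a + k * width) for k in range(n)]

    def f_n(x):
        k = min(int((x - a) / width), n - 1)  # which piece x falls in
        return values[k]

    return f_n

def max_gap(f, g, a, b, samples=10_000):
    """Largest |f(x) - g(x)| seen on an evenly spaced sample of [a, b]."""
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - g(x)) for x in xs)

for n in (4, 16, 64, 256):
    f_n = step_approximation(math.sin, 0.0, math.pi, n)
    print(n, round(max_gap(math.sin, f_n, 0.0, math.pi), 4))
```

More pieces give a smaller gap; for this function the gap can’t exceed the width of one piece, $\pi / n$.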

Now get back to $g$. It gets the same deal as $f$: some approximation $g_n$ that’s easier to prove stuff about. Then we want to show that $g$ is close to $g_n$. And then show that $f_n$ is close to $g_n$. So watch this trick. Or, again, watch the shape of this trick. Resume reading after the horizontal rule.

The difference $| f - g |$ is equal to $| f - f_n + f_n - g |$ since adding zero, that is, adding the number $( -f_n + f_n )$, can’t change a quantity. And $| f - f_n + f_n - g |$ is equal to $| f - f_n + f_n -g_n + g_n - g |$. Same reason: $( -g_n + g_n )$ is zero. So:

$| f - g | = |f - f_n + f_n -g_n + g_n - g |$

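Now we use the “triangle inequality”. If a, b, and c are the lengths of a triangle’s sides, the sum of any two of those numbers is larger than the third. For numbers, the same idea reads $|x + y| \le |x| + |y|$: the size of a sum is never more than the sum of the sizes. Applying that twice, to the three pieces above, tells us: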

$|f - f_n + f_n -g_n + g_n - g | \le |f - f_n| + |f_n - g_n| + | g_n - g |$
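Here is a numerical sketch of the whole strategy, under assumptions of my own choosing: $f(x) = \sin^2 x$ and $g(x) = 1 - \cos^2 x$ on $[0, 1]$, two formulas that ought to agree, each approximated by a step function with 1,000 pieces. It measures the three pieces on the right-hand side, compares each with $\frac{1}{3}\epsilon$ for $\epsilon = 0.01$, and checks the bound. The helper names are hypothetical, and sampling finitely many points is a spot check, not a proof.

```python
import math

def step_approximation(f, a, b, n):
    """Hypothetical helper: n equal pieces of [a, b], each taking the value
    of f at the piece's left endpoint."""
    width = (b - a) / n
    values = [f(a + k * width) for k in range(n)]
    return lambda x: values[min(int((x - a) / width), n - 1)]

def sup_gap(p, q, a, b, samples=5_000):
    """Largest |p(x) - q(x)| over an evenly spaced sample of [a, b]."""
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(p(x) - q(x)) for x in xs)

def f(x):
    return math.sin(x) ** 2

def g(x):
    return 1 - math.cos(x) ** 2   # suspected to equal f

a, b, n, eps = 0.0, 1.0, 1_000, 0.01
f_n = step_approximation(f, a, b, n)
g_n = step_approximation(g, a, b, n)

# The three pieces: |f - f_n|, |f_n - g_n|, |g_n - g|.
pieces = [sup_gap(f, f_n, a, b), sup_gap(f_n, g_n, a, b), sup_gap(g_n, g, a, b)]
print("each piece below eps/3:", all(piece < eps / 3 for piece in pieces))
print("triangle-inequality bound holds:", sup_gap(f, g, a, b) <= sum(pieces))
```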

And then if you can show that $| f - f_n |$ is less than $\frac{1}{3}\epsilon$? And that $| f_n - g_n |$ is also less than $\frac{1}{3}\epsilon$? And you see where this is going for $| g_n - g |$? Then you’ve shown that $| f - g | < \epsilon$. With luck, each of these little pieces is something you can prove.

---

Don’t worry about what all this means. It’s meant to give a flavor of what you do in an analysis course. It looks hard, but most of that is because it’s a different sort of work than you’ve done before. If you hadn’t seen the adding-zero and triangle-inequality tricks? I don’t know how long you’d need to imagine them.

There are other tricks too. An old reliable one is showing that one thing is bounded by the other. That is, that $f \le g$. You use this trick all the time because if you can also show that $g \le f$, then those two have to be equal.

The good thing, and there is good, is that once you get the hang of these tricks analysis starts to come together. It even gets easier. The first analysis course you take as a mathematics major is real analysis, all about functions of real numbers. The next course in this track is complex analysis, about functions of complex numbers. And it is easy. Compared to what comes before, yes. But also on its own. Every theorem in complex analysis is named after Augustin-Louis Cauchy. They all show that the integral of your function, calculated along a closed loop, is zero. I exaggerate by $\epsilon$.
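A rough numerical check of that Cauchy claim, as a sketch: approximate the integral around the unit circle with the trapezoid rule in the angle, a choice I made because it is very accurate for smooth periodic integrands. `loop_integral` is my own hypothetical name. For functions with no trouble spots inside the loop, the result comes out essentially zero. The function $1/z$ blows up at $z = 0$, inside the loop, so the theorem doesn’t apply; its integral comes out near $2\pi i$, which is one of the places I’m exaggerating.

```python
import cmath
import math

def loop_integral(f, n=2_000):
    """Approximate the integral of f once around the unit circle, counterclockwise.

    Parametrize z = e^{it} for t in [0, 2*pi), so dz = i e^{it} dt, and sum
    with the trapezoid rule on n equally spaced angles."""
    dt = 2 * math.pi / n
    total = 0j
    for k in range(n):
        z = cmath.exp(1j * k * dt)
        total += f(z) * 1j * z * dt
    return total

print(abs(loop_integral(lambda z: z**2 + 3*z + 1)))  # essentially zero
print(abs(loop_integral(cmath.exp)))                  # essentially zero
print(loop_integral(lambda z: 1 / z))                 # about 2*pi*i: a pole sits inside the loop
```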

In grad school, if you make it, you get to functional analysis, which examines functions on functions and other abstractions like that. This, too, is easy, possibly because you’ve seen all the basic approaches several courses over. Or it feels easy after all that mucking around with the real numbers.

This is not the entirety of explaining how mathematics works. Since all these proofs depend on how numbers work, we need to show how numbers work. How logic works. But those are subjects we can leave for grad school, for someone who’s survived this gauntlet.

I hope to return in a week with a fresh A-to-Z essay. This week’s essay, and all the essays for the Little Mathematics A-to-Z, should be at this link. And all this year’s essays, and all A-to-Z essays from past years, should be at this link. Thank you once more for reading.

## How October 2021 Treated My Mathematics Blog

I’m aware this is a fair bit into November. But it’s the first publication slot I’ve had free. At least since I want Wednesdays to take the Little 2021 A-to-Z essays, and Mondays the other thing I publish. If that, since October ended up another month when I barely managed one essay a week. Let me jump right to that, in fact. The five essays published here in October ranked like this, in popularity, and it’s not just order of publication:

I don’t know what made “Embedding” so popular. I’d suspect I may have hit a much-searched-for keyword except it doesn’t seem to be popular so far in November.

So I got 2,547 page views around here in October. This is up from the last couple months. It’s quite average for the twelve months from October 2020 through September 2021, though. The twelve-month running mean was 2,543.2 page views per month, and the running median was 2,569 views per month. I told you it was average.

There were 1,733 unique visitors, as WordPress makes it out. That’s close to average, if a bit below. The running mean was 1,811.3 visitors per month for the twelve months leading up to October. The running median was 1,801 unique visitors. I can make this into something good; it implies people who visited read more stuff. A mere 30 likes were given in October, below the running mean of 47.5 and median of 45. And there were only five comments, below the mean of 16.2 and median of 12.

Given that I’m barely posting anymore, though, the numbers look all right. This was 509.4 views per posting, which creams the running mean of 286.0 and running median of 295.9 views per posting. There were 346.8 unique visitors per posting, even more above the running mean of 203.2 and running median of 205.6 unique visitors per posting. Rating things per posting even makes the number of likes look good: 6.0 per posting, above the mean of 5.2 and median of 4.9. Can’t help with comments, though. Those hang out at a still-anemic 1.0 comments per posting, below the running mean of 1.9 and median of 1.4.

WordPress figures that I published 5,335 words in October, an average of 1,067.0 words per posting. That is my second-chattiest month all year, and my longest words-per-posting for the year. I don’t know where all those words came from. So far for all of 2021 I’ve published 44,323 words, averaging 599 words per essay.

As of the start of November I’ve published 1,656 essays here. They’ve drawn a total of 146,834 views from 87,340 logged unique visitors. And drawn 3,285 comments altogether, so far.

If you’d like to follow this blog regularly, please do. You can use the “Follow Nebusresearch” button at the upper right corner of this page. Or you can get essays by e-mail as soon as they’re published, using the box just below that button. I never use the e-mail for anything but sending these essays. I can’t say what WordPress does with them, though.

While my Twitter account is unattended — all it does is post announcements of essays; I don’t see anything from it — I am on Mathstodon, the mathematics-themed instance of the Mastodon network. So you can catch me as @nebusj@mathstodon.xyz there, and I’m not sure anyone has yet. Still, thank you for reading, and here’s hoping for a good November.

## My Little 2021 Mathematics A-to-Z: Monte Carlo

This week’s topic is one of several suggested again by Mr Wu, blogger and Singaporean mathematics tutor. He’d suggested several topics, overlapping in their subject matter, and I was challenged to pick one.

# Monte Carlo.

The reputation of mathematics has two aspects: difficulty and truth. Put “difficulty” to the side. “Truth” seems inarguable. We expect mathematics to produce sound, deductive arguments for everything. And that is an ideal. But we often want to know things we can’t calculate, or can’t calculate exactly. We can often handle that. If we can show that a number we want must be within some error range of a number we can calculate, we have a “numerical solution”. If we can show that a number we want must be within every error range of a number we can calculate, we have an “analytic solution”.

There are many things we’d like to calculate and can’t exactly. Many of them are integrals, which seem like they should be easy. We can represent any integral as finding the area, or volume, of a shape. The catch is that there are only a few shapes with volumes we can find exact formulas for. You may remember the area of a triangle or a parallelogram. You have no idea what the area of a regular nonagon is. The trick we rely on is to approximate the shape we want with shapes we know formulas for. This usually gives us a numerical solution.

If you’re any bit devious you’ve had the impulse to think of a shape that can’t be broken up like that. There are such things, and a good swath of mathematics in the late 19th and early 20th centuries was arguments about how to handle them. I don’t mean to discuss them here. I’m more interested in the practical problems of breaking complicated shapes up into simpler ones and adding them all together.

One catch, an obvious one, is that if the shape is complicated you need a lot of simpler shapes added together to get a decent approximation. Less obvious is that you need way more shapes to do a three-dimensional volume well than you need for a two-dimensional area. That’s important because you need even way-er more to do a four-dimensional hypervolume. And more and more and more for a five-dimensional hypervolume. And so on.

That matters because many of the integrals we’d like to work out represent things like the energy of a large number of gas particles. Each of those particles carries six dimensions with it. Three dimensions describe its position and three dimensions describe its momentum. Worse, each particle has its own set of six dimensions. The position of particle 1 tells you nothing about the position of particle 2. So you end up needing ridiculously, impossibly many shapes to get even a rough approximation.
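To put numbers to that, here’s a quick Python count. Assume, purely for illustration, a very coarse grid with 10 cells along each axis. A grid in d dimensions then needs $10^d$ cells, and each particle adds six dimensions. The constant and function names are my own.

```python
DIMENSIONS_PER_PARTICLE = 6  # three for position, three for momentum
CELLS_PER_AXIS = 10          # an assumed, very coarse grid

def grid_cells(particles):
    """Cells in a uniform grid over the position-and-momentum space of all particles."""
    return CELLS_PER_AXIS ** (DIMENSIONS_PER_PARTICLE * particles)

for particles in (1, 2, 10, 100):
    cells = grid_cells(particles)
    # Python integers don't overflow, so report the digit count instead of the number.
    print(f"{particles} particle(s): a {len(str(cells))}-digit number of cells")
```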

With no alternative, then, we try wisdom instead. We train ourselves to think of deductive reasoning as the only path to certainty. By the rules of deductive logic it is. But there are other unshakeable truths. One of them is randomness.

We can show — by deductive logic, so we trust the conclusion — that the purely random is predictable. Not in the way that lets us say how a ball will bounce off the floor. In the way that we can describe the shape of a great number of grains of sand dropped slowly on the floor.

The trick is one we might get if we were bad at darts. If we toss darts at a dartboard, badly, some will land on the board and some on the wall behind. How many hit the dartboard, compared to the total number we throw? If we’re as likely to hit every spot of the wall, then the fraction that hit the dartboard, times the area of the wall, should be about the area of the dartboard.
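The bad-darts idea fits in a few lines of Python. This is a sketch under assumptions of my own: the wall is the square from -1 to 1 in each direction, area 4, every spot on it equally likely, and the dartboard is the circle of radius 1 centered on it, so its true area is $\pi$. The function name `dart_estimate` is hypothetical.

```python
import random

def dart_estimate(throws, seed=0):
    """Estimate a round dartboard's area by throwing darts uniformly at a square wall.

    Assumed setup: wall = [-1, 1] x [-1, 1] (area 4); board = unit circle (area pi)."""
    rng = random.Random(seed)  # seeded so reruns throw the same darts
    hits = 0
    for _ in range(throws):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:  # the dart landed on the board
            hits += 1
    wall_area = 4.0
    return hits / throws * wall_area  # fraction that hit, times the wall's area

for throws in (100, 10_000, 1_000_000):
    print(f"{throws:>9} darts: area about {dart_estimate(throws):.4f}")
```

More darts give a better estimate, though slowly: the typical error shrinks like one over the square root of the number of throws.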

So we can do something equivalent to this dart-throwing to find the volumes of these complicated, hyper-dimensional shapes. It’s a kind of numerical integration. It isn’t particularly sensitive to how complicated the shape is, though. It takes more work to find the volume of a shape with more dimensions, yes. But it takes less more-work than the breaking-up-into-known-shapes method does. There are wide swaths of mathematics and mathematical physics where this is the best way to calculate the integral.
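Here is a sketch of the same dart-throwing in more dimensions: estimating the volume of a ball of radius 1 by throwing points into the cube around it. The code is the same for every dimension; only the number of coordinates per point changes. I compare against the known formula $\frac{\pi^{d/2}}{\Gamma(d/2 + 1)}$. One caution: as the dimension grows the ball fills a vanishing share of the cube, so this naive sampler gets few hits, which is one reason practical Monte Carlo methods sample more cleverly. The function names are my own.

```python
import math
import random

def ball_volume_mc(dimensions, samples, seed=1):
    """Monte Carlo estimate of the volume of the radius-1 ball in `dimensions` dimensions.

    Throw points uniformly into the cube [-1, 1]^d, whose volume is 2^d, and
    multiply that volume by the fraction of points landing inside the ball."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        squared_length = sum(rng.uniform(-1, 1) ** 2 for _ in range(dimensions))
        if squared_length <= 1:
            inside += 1
    return inside / samples * 2 ** dimensions

def ball_volume_exact(dimensions):
    """Known closed form for comparison: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (dimensions / 2) / math.gamma(dimensions / 2 + 1)

for d in (2, 3, 6):
    estimate = ball_volume_mc(d, 100_000)
    print(f"d = {d}: Monte Carlo {estimate:.3f}, exact {ball_volume_exact(d):.3f}")
```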

This bit that I’ve described is called “Monte Carlo integration”. The “integration” part of the name is there because that’s what we started out doing. To call it “Monte Carlo” implies either that the method was first developed there or that the person naming it was thinking of the famous casinos. It’s the latter. Monte Carlo methods as we know them come from Stanislaw Ulam, a mathematical physicist working on atomic weapon design. While ill, he got to playing the game of Canfield solitaire, about which I know nothing except that Stanislaw Ulam was playing it in 1946 while ill. He wondered what the chance was that a given game was winnable. The most practical approach was sampling: set a computer to play a great many games and see what fraction of them were won. (The method comes from Ulam and John von Neumann. The name itself comes from their colleague Nicholas Metropolis.)

There are many Monte Carlo methods, with integration being only one very useful one. They hold in common that they’re built on randomness. We try calculations, often simple ones, many times over with many different possible values. And the regularity, the predictability, of randomness serves us. The results come together to an average that is close to the thing we do want to know.

I hope to return in a week with a fresh A-to-Z essay. This week’s essay, and all the essays for the Little Mathematics A-to-Z, should be at this link. And all of this year’s essays, and all A-to-Z essays from past years, should be at this link. And if you’d like to shape the next several essays, please let me know of some topics worth writing about! Thank you for reading.

## I’m looking for some more topics for the Little 2021 Mathematics A-to-Z

I am happy to be near the midpoint of my Little 2021 Mathematics A-to-Z. It feels like forever since I planned to start this, but it has been a long and a hard year. I am in need of topics for the third quarter of letters, the end of the word ‘Mathematics’, and so I appeal to my kind readers for help.

What are mathematical topics which start with the letters I, C, or S, that you’d like to see me try explaining? Leave a comment, and let me know. I’ll pick the one I think I can be most interesting about. As you nominate things, please also include a mention of your own blog or YouTube channel or book. Whatever other projects you do that people might enjoy. The projects don’t need to be mathematical. The topics don’t need to be either, although I like being able to see mathematics from them.

Here are the topics I’ve covered in past years. I’m willing to consider redoing one of these, if I can find a fresh approach. So don’t be afraid to ask if you think I might do a better job about, oh, cohomology or something.


(Please note: there’s nothing I can do with cohomology. I did my best and that’s how it came out.)

All the Little 2021 A-to-Z essays should be at this link. And if you like, all of my A-to-Z essays, for every year, should be at this link. Thanks for reading, and thanks for suggesting things.

## My Little 2021 Mathematics A-to-Z: Embedding

Elkement, who’s one of my longest-standing blog-friends here, put forth this suggestion for an ‘E’ topic. It’s a good one. They’re the author of the Theory and Practice of Trying to Combine Just Anything blog. Their blog has recently been exploring complex-valued numbers and how to represent rotations.

# Embedding.

Consider a book. It’s a collection. It’s easy to see the ordered setting of words, maybe pictures, possibly numbers or even equations. The important thing is the ideas those all represent.

Set the book in a library. How can this change the book?

Perhaps the comparison to other books shows us something the original book neglected. Perhaps something in the original book we now realize was a brilliantly-presented insight. The way we appreciate the book may change.

What can’t change is the content of the original book. The words stay the same, in the same order. If it’s a physical book, the number of pages stays the same, as does the size of the page. The ideas expressed remain the same.

So now you understand embedding. It’s a broad concept, something that can have meaning for any mathematical structure. A structure here is a bunch of items and some things you can do with them. A group is a good structure to use with this sort of thing. Take, for example, the integers and regular addition. This original structure’s embedded in another when everything in the original structure is in the new, and everything you can do with the original structure you can do in the new and get the same results. So, for example, the group you get by taking the integers and regular addition? That’s embedded in the group you get by taking the rational numbers and regular addition. 4 + 8 is 12 whether or not you consider 6.5 a topic fit for discussion. It’s an embedding that expands the set of elements, and that modifies the things you can do to match.
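A small Python spot check of that integers-into-rationals embedding, using the standard library’s `fractions.Fraction` for the rational numbers. The function `embed` is my own illustrative name. It checks, over a finite sample only, so this is a sanity check and not a proof, that adding then embedding matches embedding then adding, and that no two integers land on the same rational.

```python
import itertools
from fractions import Fraction

def embed(n: int) -> Fraction:
    """Send the integer n to the rational number n/1."""
    return Fraction(n)

sample = range(-20, 21)
# Add in the integers and then embed, or embed each and then add in the rationals:
# an embedding has to give the same answer either way.
preserves_addition = all(embed(a + b) == embed(a) + embed(b)
                         for a, b in itertools.product(sample, repeat=2))
# Different integers must land on different rationals.
one_to_one = len({embed(n) for n in sample}) == len(sample)
print(preserves_addition, one_to_one)
print(embed(4) + embed(8), Fraction(13, 2))  # prints: 12 13/2
```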

The group you get from the integers and addition is embedded in other things. For example, it’s embedded in the ring you get from the integers and regular addition and regular multiplication. 4 + 8 remains 12 whether or not you can multiply 4 by 8. This embedding doesn’t add any new elements, just new things you can do with them.

Once you have the name, you see embedding everywhere. When we first learn arithmetic we — I, anyway — learn it as adding whole numbers together. Then we embed that into whole numbers with addition and multiplication. And then the (nonnegative) rational numbers with addition and multiplication. At some point (I forget when) the negative numbers came in. So did the whole set of real numbers. Eventually the real numbers got embedded into the complex numbers. And the complex numbers got embedded into the quaternions, although we found real and complex numbers enough for most of our work. I imagine something similar goes on these days.

There’s never only one embedding possible. Consider, for example, two-dimensional geometry, the shapes of figures on a sheet of paper. It’s easy to put that in three dimensions, by setting the paper on the floor, and expand it by drawing in chalk on the wall. Or you can set the paper on the wall, and extend its figures by drawing in chalk on the floor. Or set the paper at an angle to the floor. What you use depends on what’s most convenient. And that can be driven by laziness. It’s easy to match, say, the point in two dimensions at coordinates (3, 4) with the point in three dimensions at coordinates (3, 4, 0), even though (0, 3, 4) or (4, 0, 3) are as valid.
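Here’s a Python sketch of three of those choices, using `math.dist` (Python 3.8 or later) to measure distances. The function names are hypothetical. Each embedding sends the point (3, 4) somewhere different, yet each keeps every distance between the sample points exactly as it was on the paper, which is the sense in which the figures keep their shape.

```python
import math

def embed_floor(p):
    """Lay the paper on the floor: (x, y) -> (x, y, 0)."""
    x, y = p
    return (x, y, 0.0)

def embed_wall(p):
    """Stand the paper against a wall: (x, y) -> (0, x, y)."""
    x, y = p
    return (0.0, x, y)

def embed_side_wall(p):
    """Stand it against another wall, turned: (x, y) -> (y, 0, x)."""
    x, y = p
    return (y, 0.0, x)

points = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.5)]
for embed in (embed_floor, embed_wall, embed_side_wall):
    keeps_distances = all(math.isclose(math.dist(p, q), math.dist(embed(p), embed(q)))
                          for p in points for q in points)
    print(embed.__name__, embed((3.0, 4.0)), keeps_distances)
```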

Why embed something in another thing? For the same reasons we do any transformation in mathematics. One is that we figure to embed the thing we’re working on into something easier to deal with. A famous example of this is the Nash embedding theorem. It says that any Riemannian manifold can be embedded, with all its lengths kept the same, into ordinary Euclidean space of high enough dimension. And that’s useful because it can turn nonlinear partial differential equations, the most insufferable equations, into something solvable.

Another good reason, though, is the one implicit in that early arithmetic education. We started with whole-numbers-with-addition. And then we added the new operation of multiplication. And then new elements, like fractions and negative numbers. If we follow this trail we get to some abstract, tricky structures like octonions. But we get there by small steps, with our experience guiding us into new territory.

I hope to return in a week with a fresh A-to-Z essay. This week’s essay, and all the essays for the Little Mathematics A-to-Z, should be at this link. And all of this year’s essays, and all A-to-Z essays from past years, should be at this link. Thank you once more for reading.