This function — I guess it’s the “popcorn” function — is a challenge to our ideas about what a “continuous” function is. I’ve mentioned “continuous” functions before and said something like they’re functions you could draw without lifting your pen from the paper. That’s the colloquial, and the intuitive, idea of what they mean. And that’s all right for ordinary uses.

But the best definition mathematicians have thought of for a “continuous function” has some quirks. And here’s one of them. Define a function named ‘f’. Its domain is the real numbers. Its range is the real numbers. And the rule matching things in the domain to things in the range is this:

If ‘x’ is zero, then f(x) = 1.

If ‘x’ is an irrational number, then f(x) = 0.

If ‘x’ is a rational number, then it’s equal in lowest terms to the whole number ‘p’ divided by the positive whole number ‘q’. And for this ‘x’, then f(x) = 1/q.

And as the tweet from Fermat’s Library says, this is a function that’s continuous on all the irrational numbers. It’s not continuous on any rational numbers. This seems like a prank. But it’s a common approach to finding intuition-testing ideas about continuity. Setting different rules for rational and irrational numbers works well for making these strange functions. And thinking of rational numbers as their representation in lowest terms is also common. (Writing it as ‘p divided by q’ suggests that ‘p’ and ‘q’ are going to be prime, but, no! They only need to have no factors in common. Think of 4/9, or of 8/15.) If you stare at the plot you can maybe convince yourself that “continuous on the irrational numbers” makes sense here. That heavy line of dots at the bottom looks like it’s approaching a continuous blur, at least.
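If you’d like to poke at the popcorn function yourself, here’s a little Python sketch of the rule. It’s my own toy version, not anything standard, and it only handles rational inputs, since a float can’t hold an irrational number anyway:

```python
from fractions import Fraction

# A sketch of the "popcorn" (Thomae) function, for rational inputs only.
# Fraction reduces p/q to lowest terms for us, with q positive.
# (Irrational x would map to 0, but we can't feed one in here.)
def thomae(x):
    x = Fraction(x)                    # reduce to lowest terms
    if x == 0:
        return Fraction(1)             # f(0) = 1
    return Fraction(1, x.denominator)  # f(p/q) = 1/q

print(thomae(Fraction(6, 8)))   # 6/8 is 3/4 in lowest terms, so 1/4
```

Notice that 6/8 and 3/4 give the same value, which is why the lowest-terms part of the rule matters.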

It can get weirder. It’s possible to create a function that’s continuous at only a single point of all the real numbers. This is why Real Analysis is such a good subject to crash hard against. But we accept weird conclusions like this because the alternative is to give up calling “continuous” functions that we just know have to be continuous. Mathematical definitions are things we make for our use.

Gaurish gives me another topic for today. I’m now no longer sure whether Gaurish hopes me to become a topology blogger or a category theory blogger. I have the last laugh, though. I’ve wanted to get better-versed in both fields and there’s nothing like explaining something to learn about it.

Functor.

So, category theory. It’s a foundational field. It talks about stuff that’s terribly abstract. This means it’s powerful, but it can be hard to think of interesting examples. I’ll try, though.

It starts with categories. These have three parts. The first part is a set of things. (There always is.) The second part is a collection of matches between pairs of things in the set. They’re called morphisms. The third part is a rule that lets us combine two morphisms into a new, third one. That is: suppose ‘a’, ‘b’, and ‘c’ are things in the set. Then there’s a morphism that matches ‘a’ to ‘b’, and a morphism that matches ‘b’ to ‘c’. And we can combine them into another morphism that matches ‘a’ to ‘c’. So we have a set of things, and a set of things we can do with those things. And the set of things we can do has a structure of its own, thanks to that combining rule.

This describes a lot of stuff. Group theory fits seamlessly into this description. Most of what we do with numbers is a kind of group theory. Vector spaces do too. Most of what we do with analysis has vector spaces underneath it. Topology does too. Most of what we do with geometry is an expression of topology. So you see why category theory is so foundational.

Functors enter our picture when we have two categories. Or more. They’re about the ways we can match up categories. But let’s start with two categories. One of them I’ll name ‘C’, and the other, ‘D’. A functor has to match everything that’s in the set of ‘C’ to something that’s in the set of ‘D’.

And it does more. It has to match every morphism between things in ‘C’ to some other morphism, between corresponding things in ‘D’. It’s got to do it in a way that satisfies that combining, too. That is, suppose that ‘f’ and ‘g’ are morphisms for ‘C’. And that ‘f’ and ‘g’ combine to make ‘h’. Then, the functor has to match ‘f’ and ‘g’ and ‘h’ to some morphisms for ‘D’. The combination of whatever ‘f’ matches to and whatever ‘g’ matches to has to be whatever ‘h’ matches to.
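Here’s a Python sketch of one functor you’ve probably already used without the name: the “list” functor, which matches every function to its apply-to-each-element version. The names are my own invention, but the combining rule it checks is the real requirement:

```python
# The "list" functor F: it matches a function f to a new function that
# applies f to every element of a list. It must respect combining:
# F(g combined with f) has to equal F(g) combined with F(f).

def F(f):
    # the functor's action on a morphism (here, a plain function)
    return lambda xs: [f(x) for x in xs]

def compose(g, f):
    # the combining rule: do f first, then g
    return lambda x: g(f(x))

f = lambda x: x + 1
g = lambda x: x * 2
h = compose(g, f)   # 'f' and 'g' combine to make 'h'

print(F(h)([1, 2, 3]))                  # [4, 6, 8]
print(compose(F(g), F(f))([1, 2, 3]))   # [4, 6, 8] -- the same, as required
```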

This might sound to you like a homomorphism. If it does, I admire your memory or mathematical prowess. Functors are about matching one thing to another in a way that preserves structure. Structure is the way that sets of things can interact. We naturally look for stuff made up of different things that have the same structure. Yes, functors are themselves a category. That is, you can make a brand-new category whose set of things are the functors between two other categories. This is a good spot to pause while the dizziness passes.

There are two kingdoms of functor. You tell them apart by what they do with the morphisms. Here again I’m going to need my categories ‘C’ and ‘D’. I need a morphism for ‘C’. I’ll call that ‘f’. ‘f’ has to match something in the set of ‘C’ to something in the set of ‘C’. Let me call the first something ‘a’, and the second something ‘b’. That’s all right so far? Thank you.

Let me call my functor ‘F’. ‘F’ matches all the elements in ‘C’ to elements in ‘D’. And it matches all the morphisms on the elements in ‘C’ to morphisms on the elements in ‘D’. So if I write ‘F(a)’, what I mean is look at the element ‘a’ in the set for ‘C’. Then look at what element in the set for ‘D’ the functor matches with ‘a’. If I write ‘F(b)’, what I mean is look at the element ‘b’ in the set for ‘C’. Then pick out whatever element in the set for ‘D’ gets matched to ‘b’. If I write ‘F(f)’, what I mean is to look at the morphism ‘f’ between elements in ‘C’. Then pick out whatever morphism between elements in ‘D’ it gets matched with.

Here’s where I’m going with this. Suppose my morphism ‘f’ matches ‘a’ to ‘b’. Does the functor of that morphism, ‘F(f)’, match ‘F(a)’ to ‘F(b)’? Of course, you say, what else could it do? And the answer is: why couldn’t it match ‘F(b)’ to ‘F(a)’?

No, it doesn’t break everything. Not if you’re consistent about swapping the order of the matchings. The normal everyday order, the one you’d thought couldn’t have an alternative, is a “covariant functor”. The crosswise order, this second thought, is a “contravariant functor”. Covariant and contravariant are distinctions that weave through much of mathematics. They particularly appear through tensors and the geometry they imply. In that introduction they tend to be difficult, even mean, creations, since in regular old Euclidean space they don’t mean anything different. They’re different for non-Euclidean spaces, and that’s important and valuable. The covariant versus contravariant difference is easier to grasp here.

Functors work their way into computer science. The avenue here is in functional programming. That’s a method of programming in which instead of the normal long list of commands, you write a single line of code that holds like fourteen “->” symbols that makes the computer stop and catch fire when it encounters a bug. The advantage is that when you have the code debugged it’s quite speedy and memory-efficient. The disadvantage is if you have to alter the function later, it’s easiest to throw everything out and start from scratch, beginning from vacuum-tube-based computing machines. But it works well while it does. You just have to get the hang of it.

So you know how the Earth is a sphere, but from our normal vantage point right up close to its surface it looks flat? That happens with functions too. Here I mean the normal kinds of functions we deal with, ones with domains that are the real numbers or a Euclidean space. And ranges that are real numbers. The functions you can draw on a sheet of paper with some wiggly bits. Let the function wiggle as much as you want. Pick a part of it and zoom in close. That zoomed-in part will look straight. If it doesn’t look straight, zoom in closer.

We rely on this. Functions that are straight, or at least straight enough, are easy to work with. We can do calculus on them. We can do analysis on them. Functions with plots that look like straight lines are easy to work with. Often the best approach to working with the function you’re interested in is to approximate it with an easy-to-work-with function. I bet it’ll be a polynomial. That serves us well. Polynomials are these continuous functions. They’re differentiable. They’re smooth.
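As a quick Python illustration of that approximating, here’s the start of the Taylor polynomial for the sine function. This is just a sketch of the general idea, not any particular application:

```python
import math

# Approximating sin(x) near zero with a cubic polynomial: the first two
# terms of its Taylor series. Close to zero, the polynomial and the
# function are hard to tell apart.
def sin_approx(x):
    return x - x**3 / 6

x = 0.3
print(abs(math.sin(x) - sin_approx(x)))   # a tiny error, around 2e-5
```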

That thing about the Earth looking flat, though? That’s a lie. I’ve never been to any of the really great cuts in the Earth’s surface, but I have been to some decent gorges. I went to grad school in the Hudson River Valley. I’ve driven I-80 over Pennsylvania’s scariest bridges. There’s points where the surface of the Earth just drops a great distance between your one footstep and your last.

Functions do that too. We can have points where a function isn’t differentiable, where it’s impossible to define the direction it’s headed. We can have points where a function isn’t continuous, where it jumps from one region of values to another region. Everyone knows this. We can’t dismiss those as aberrations not worthy of the name “function”; too many of them are too useful. Typically we handle this by admitting there’s points that aren’t continuous and we chop the function up. We make it into a couple of functions, each stretching from discontinuity to discontinuity. Between them we have a continuous region and we can go about our business as before.

Then came the 19th century when things got crazy. This particular craziness we credit to Karl Weierstrass. Weierstrass’s name is all over 19th century analysis. He had that talent for probing the limits of our intuition about basic mathematical ideas. We have a calculus that is logically rigorous because he found great counterexamples to what we had assumed without proving.

The Weierstrass function challenges this idea that any function is going to eventually level out. Or that we can even smooth a function out into basically straight, predictable chunks in-between sudden changes of direction. The function is continuous everywhere; you can draw it perfectly without lifting your pen from paper. But it always looks like a zig-zag pattern, jumping around like it was always randomly deciding whether to go up or down next. Zoom in on any patch and it still jumps around, zig-zagging up and down. There’s never an interval where it’s always moving up, or always moving down, or even just staying constant.
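You can get a feel for it with a truncated version of the sum. This little Python sketch uses one common choice of parameters; any finite cut-off is actually smooth, so it only suggests the full function’s jitteriness:

```python
import math

# A truncated Weierstrass-style sum. The real function is the infinite
# series; a = 0.5 and b = 13 are one common parameter choice. Plot this
# and it zig-zags at every scale you can resolve.
def weier(x, terms=20, a=0.5, b=13):
    return sum(a**n * math.cos(b**n * math.pi * x) for n in range(terms))

print(weier(0.0))   # the peak at zero: very nearly 2
```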

Despite being continuous it’s not differentiable. I’ve described that casually as it being impossible to predict where the function is going. That’s an abuse of words, yes. The function is defined. Its value at a point isn’t any more random than the value of “x^{2}” is for any particular x. The unpredictability I’m talking about here is a side effect of ignorance. Imagine I showed you a plot of “x^{2}” with a part of it concealed and asked you to fill in the gap. You’d probably do pretty well estimating it. The Weierstrass function, though? No; your guess would be lousy. My guess would be lousy too.

That’s a weird thing to have happen. A century and a half later it’s still weird. It gets weirder. The Weierstrass function isn’t differentiable generally. But there are exceptions. There are little dots of differentiability, where the rate at which the function changes is known. Not intervals, though. Single points. This is crazy. Derivatives are about how a function changes. We work out what they should even mean by thinking of a function’s value on strips of the domain. Those strips are small, but they’re still, you know, strips. But on almost all of that strip the derivative isn’t defined. It’s only at isolated points, a set with measure zero, that this derivative even exists. It evokes the medieval Mysteries, of how we are supposed to try, even though we know we shall fail, to understand how God can have contradictory properties.

It’s not quite that Mysterious here. Properties like this challenge our intuition, if we’ve gotten any. Once we’ve laid out good definitions for ideas like “derivative” and “continuous” and “limit” and “function” we can work out whether results like this make sense. And they — well, they follow. We can avoid weird conclusions like this, but at the cost of messing up our definitions for what a “function” and other things are. Making those useless. For the mathematical world to make sense, we have to change our idea of what quite makes sense.

That’s all right. When we look close we realize the Earth around us is never flat. Even reasonably flat areas have slight rises and falls. The ends of properties are marked with curbs or ditches, and bordered by streets that rise to a center. Look closely even at the dirt and we notice that as level as it gets there are still rocks and scratches in the ground, clumps of dirt an infinitesimal bit higher here and lower there. The flatness of the Earth around us is a useful tool, but we miss a lot by pretending it’s everything. The Weierstrass function is one of the ways a student mathematician learns that while smooth, predictable functions are essential, there is much more out there.

Mathematicians affect a pose of objectivity. We justify this by working on things whose truth we can know, and which must be true whenever we accept certain rules of deduction and certain definitions and axioms. This seems fair. But we choose to pay attention to things that interest us for particular reasons. We study things we like. My A To Z glossary term for today is about one of those things we like.

Smooth.

Functions. Not everything mathematicians do is functions. But functions turn up a lot. We need to set some rules. “A function” is so generic a thing we can’t handle it much. Narrow it down. Pick functions with domains that are numbers. Range too. By numbers I mean real numbers, maybe complex numbers. That gives us something.

There’s functions that are hard to work with. This is almost all of them, so we don’t touch them unless we absolutely must. The ones I mean here are functions that aren’t continuous. That means what you imagine. The value of the function at some point is wholly unrelated to its value at some nearby point. It’s hard to work with anything that’s unpredictable like that. Functions as well as people.

We like functions that are continuous. They’re predictable. We can make approximations. We can estimate the function’s value at some point using its value at some more convenient point. It’s easy to see why that’s useful for numerical mathematics, for calculations to approximate stuff. The dazzling thing is it’s useful analytically. We step into the Platonic-ideal world of pure mathematics. We have tools that let us work as if we had infinitely many digits of precision, for infinitely many numbers at once. And yet we use estimates and approximations and errors. We use them in ways to give us perfect knowledge; we get there by estimates.

Continuous functions are nice. Well, they’re nicer to us than functions that aren’t continuous. But there are even nicer functions. Functions nicer to us. A continuous function, for example, can have corners; it can change direction suddenly and without warning. A differentiable function is more predictable. It can’t have corners like that. Knowing the function well at one point gives us more information about what it’s like nearby.
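Here’s a tiny Python sketch of what a corner does to the derivative, using the absolute-value function’s corner at zero:

```python
# The corner in abs(x) at zero: the slope measured from the right and the
# slope measured from the left disagree, so there's no one derivative
# there, even though the function is perfectly continuous.
h = 1e-6
slope_right = (abs(0 + h) - abs(0)) / h   # about +1
slope_left  = (abs(0) - abs(0 - h)) / h   # about -1
print(slope_right, slope_left)
```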

The derivative of a function doesn’t have to be continuous. Grumble. It’s nice when it is, though. It makes the function easier to work with. It’s really nice for us when the derivative itself has a derivative. Nothing guarantees that the derivative of a derivative is continuous. But maybe it is. Maybe the derivative of the derivative has a derivative. That’s a function we can do a lot with.

A function is “smooth” if it has as many derivatives as we need for whatever it is we’re doing. And if those derivatives are continuous. If this seems loose that’s because it is. A proof for whatever we’re interested in might need only the original function and its first derivative. It might need the original function and its first, second, third, and fourth derivatives. It might need hundreds of derivatives. If we look through the details of the proof we might find exactly how many derivatives we need and how many of them need to be continuous. But that’s tedious. We save ourselves considerable time by saying the function is “smooth”, as in, “smooth enough for what we need”.

If we do want to specify how many continuous derivatives a function has we call it a “C^{k} function”. The C here means continuous. The ‘k’ means there are the number ‘k’ continuous derivatives of it. This is completely different from a “C^{k} function” written with a boldface C, which would be one whose value is a k-dimensional vector. Whether the “C” is boldface or not is important. A function might have infinitely many continuous derivatives. That we call a “C^{∞} function”. That’s got wonderful properties, especially if the domain and range are complex-valued numbers. We couldn’t do Complex Analysis without it. Complex Analysis is the course students take after wondering how they’ll ever survive Real Analysis. It’s much easier than Real Analysis. Mathematics can be strange.

Functions. They’re at the center of so much mathematics. They have three pieces: a domain, a range, and a rule. The one thing functions absolutely must do is match stuff in the domain to one and only one thing in the range. So this is where it gets tricky.

Principal.

Thing with this one-and-only-one thing in the range is it’s not always practical. Sometimes it only makes sense to allow for something in the domain to match several things in the range. For example, suppose we have the domain of positive numbers. And we want a function that gives us the numbers which, squared, equal whatever the original number was. For any positive real number there’s two numbers that do that. 4 should match to both +2 and -2.

You might ask why I want a function that tells me the numbers which, squared, equal something. I ask back, what business is that of yours? I want a function that does this and shouldn’t that be enough? We’re getting off to a bad start here. I’m sorry; I’ve been running ragged the last few days. I blame the flat tire on my car.

Anyway. I’d want something like that function because I’m looking for what state of things makes some other thing true. This turns up often in “inverse problems”, problems in which we know what some measurement is and want to know what caused the measurement. We do that sort of problem all the time.

We can handle these multi-valued functions. Of course we can. Mathematicians are as good at loopholes as anyone else is. Formally we declare that the range isn’t the real numbers but rather sets of real numbers. My what-number-squared function then matches ‘4’ in the domain to the set of numbers ‘+2 and -2’. The set has several things in it, but there’s just the one set. Clever, huh?
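In Python the loophole looks about like this. The function name is my own:

```python
import math

# The "what, squared, gives x" function, with a set for its value.
# Several numbers qualify, but there's just the one set returned.
def sqrt_all(x):
    if x == 0:
        return {0.0}
    r = math.sqrt(x)
    return {r, -r}

print(sqrt_all(4))   # a single set containing 2.0 and -2.0
```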

This sort of thing turns up a lot. There’s two numbers that, squared, give us any real number (except zero). There’s three numbers that, cubed, give us any real number (again except zero). Polynomials might have a whole bunch of numbers that make some equation true. Trig functions are worse. The tangent of 45 degrees equals 1. So does the tangent of 225 degrees. Also 405 degrees. Also -45 degrees. Also -585 degrees. OK, a mathematician would use radians instead of degrees, but that just changes what the numbers are. Not that there’s infinitely many of them.

It’s nice to have options. We don’t always want options. Sometimes we just want one blasted simple answer to things. It’s coded into the language. We say “the square root of four”. We speak of “the arctangent of 1”, which is to say, “the angle with tangent of 1”. We only say “all square roots of four” if we’re making a point about overlooking options.

If we’ve got a set of things, then we can pick out one of them. This is obvious, which means it is so very hard to prove. We just have to assume we can. Go ahead; assume we can. Our pick of the one thing out of this set is the “principal”. It’s not any more inherently right than the other possibilities. It’s just the one we choose to grab first.

So. The principal square root of four is positive two. The principal arctangent of 1 is 45 degrees, or in the dialect of mathematicians π divided by four. We pick these values over other possibilities because they’re nice. What makes them nice? Well, they’re nice. Um. Most of their numbers aren’t that big. They use positive numbers if we have a choice in the matter. Deep down we still suspect negative numbers of being up to something.

If nobody says otherwise then the principal square root is the positive one, or the one with a positive number in front of the imaginary part. If nobody says otherwise the principal arcsine is between -90 and +90 degrees (-π/2 and π/2). The principal arccosine is between 0 and 180 degrees (0 and π), unless someone says otherwise. The principal arctangent is … between -90 and 90 degrees, unless it’s between 0 and 180 degrees. You can count on the 0 to 90 part. Use your best judgement and roll with whatever develops for the other half of the range there. There’s not one answer that’s right for every possible case. The point of a principal value is to pick out one answer that’s usually a good starting point.
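Programming languages have quietly made these choices for you. A little Python demonstration:

```python
import math, cmath

# The standard library returns principal values without being asked.
print(math.sqrt(4))                 # 2.0: the positive square root
print(math.degrees(math.atan(1)))   # 45 degrees, which is pi/4
print(cmath.sqrt(-4))               # 2j: positive imaginary part in front
```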

When you stare at what it means to be a function you realize that there’s a difference between the original function and the one that returns the principal value. The original function has a range that’s “sets of values”. The principal-value version has a range that’s just one value. If you’re being kind to your audience you make some note of that. Usually we note this by capitalizing the start of the function: “arcsin z” gives way to “Arcsin z”. “Log z” would be the principal-value version of “log z”. When you start pondering logarithms for negative numbers or for complex-valued numbers you get multiple values, much the way the arcsine function does.

And it’s good to warn your audience which principal value you mean, especially for the arc-trigonometric-functions or logarithms. (I’ve never seen someone break the square root convention.) The principal value is about picking the most obvious and easy-to-work-with value out of a set of them. It’s just impossible to get everyone to agree on what the obvious is.

Today’s is another of those words that means nearly what you would guess. There are still seven letters left, by the way, which haven’t had any requested terms. If you’d like something described please try asking.

Local.

Stops at every station, rather than just the main ones.

OK, I’ll take it seriously.

So a couple years ago I visited Niagara Falls, and I stepped into the river, just above the really big drop.

I didn’t have any plans to go over the falls, and didn’t, but I liked the thrill of claiming I had. I’m not crazy, though; I picked a spot I knew was safe to step in. It’s only in the retelling I went into the Niagara River just above the falls.

Because yes, there is surely danger in certain spots of the Niagara River. But there are also spots that are perfectly safe. And not isolated spots either. I wouldn’t have been less safe if I’d stepped into the river a few feet closer to the edge. Nor if I’d stepped in a few feet farther away. Where I stepped in was locally safe.

Over in mathematics we do a lot of work on stuff that’s true or false depending on what some parameters are. We can look at bunches of those parameters, and they often look something like normal everyday space. There’s some values that are close to what we started from. There’s others that are far from that.

So, a “neighborhood” of some point is that point and some set of points containing it. It needs to be an “open” set, which means it doesn’t contain its boundary. So, like, everything less than one minute’s walk away, but not the stuff that’s precisely one minute’s walk away. (If we include boundaries we break stuff that we don’t want broken is why.) And certainly not the stuff more than one minute’s walk away. A neighborhood could have any shape. It’s easy to think of it as a little disc around the point you want. That’s usually the easiest to describe in a proof, because it’s “everything a distance less than (something) away”. (That “something” is either ‘δ’ or ‘ε’. Both Greek letters are called in to mean “a tiny distance”. They have different connotations about what the tiny distance is in.) It’s easiest to draw as little amoeba-like blob around a point, and contained inside a bigger amoeba-like blob.
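A one-line Python sketch of the open-disc kind of neighborhood, with the strict less-than doing the “don’t include the boundary” work:

```python
# Membership in an open-ball neighborhood on the real line:
# "everything a distance less than eps away". The strict '<' is what
# keeps the boundary out, which keeps the set open.
def in_neighborhood(x, center, eps):
    return abs(x - center) < eps

print(in_neighborhood(0.1, 0.0, 0.2))   # True: strictly inside
print(in_neighborhood(0.2, 0.0, 0.2))   # False: exactly on the boundary
```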

Anyway, something is true “locally” to a point if it’s true in that neighborhood. That means true for everything in that neighborhood. Which is what you’d expect. “Local” means just that. It’s the stuff that’s close to where we started out.

Often we would like to know something “globally”, which means … er … everywhere. Universally so. But it’s usually easier to prove a thing locally. I suppose having a point where we know something is so makes it easier to prove things about what’s nearby. Distant stuff, who knows?

“Local” serves as an adjective for many things. We think of a “local maximum”, for example, or “local minimum”. This is where whatever we’re studying has a value bigger (or smaller) than anywhere else nearby has. Or we speak of a function being “locally continuous”, meaning that we know it’s continuous near this point and we make no promises away from it. It might be “locally differentiable”, meaning we can take derivatives of it close to some interesting point. We say nothing about what happens far from it.
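Here’s a rough Python sketch of spotting local maxima in a sampled function: points bigger than their immediate neighbors, a crude stand-in for “bigger than anywhere else nearby”:

```python
# Indices of local maxima in a list of sampled values -- spots where the
# value beats both immediate neighbors. Endpoints are skipped, since they
# only have one neighbor to compare against.
def local_maxima(ys):
    return [i for i in range(1, len(ys) - 1)
            if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]]

print(local_maxima([0, 2, 1, 3, 0]))   # [1, 3]: two little peaks
```

Note neither peak needs to be the global maximum; being biggest nearby is the whole idea.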

Unless we do. We can talk about something being “local to infinity”. Your first reaction to that should probably be to slap the table and declare that’s it, we’re done. But we can make it sensible, at least to other mathematicians. We do it by starting with a neighborhood that contains the origin, zero, that point in the middle of everything. So, what’s the complement of that? It’s everything that’s far enough away from the origin. (Don’t include the boundary, we don’t need those headaches.) So why not call that the “neighborhood of infinity”? Other than that it’s a weird set of words to put together? And if something is true in that “neighborhood of infinity”, what is that thing other than true “local to infinity”?

It’s another free-choice entry. I’ve got something that I can use to make my Friday easier.

Image.

So remember a while back I talked about what functions are? I described them the way modern mathematicians like. A function’s got three components to it. One is a set of things called the domain. Another is a set of things called the range. And there’s some rule linking things in the domain to things in the range. In shorthand we’ll write something like “f(x) = y”, where we know that x is in the domain and y is in the range. In a slightly more advanced mathematics class we’ll write “f: x ↦ y”. That maybe looks a little more computer-y. But I bet you can read that already: “f matches x to y”. Or maybe “f maps x to y”.

We have a couple ways to think about what ‘y’ is here. One is to say that ‘y’ is the image of ‘x’, under ‘f’. The language evokes camera trickery, or at least the way a trick lens might make us see something different. Pretend that the domain is something you could gaze at. If the domain is, say, some part of the real line, or a two-dimensional plane, or the like that’s not too hard to do. Then we can think of the rule part of ‘f’ as some distorting filter. When we look to where ‘x’ would be, we see the thing in the range we know as ‘y’.

At this point you probably imagine this is a pointless word to have. And that it’s backed up by a useless analogy. So it is. As far as I’ve gone this addresses a problem we don’t need to solve. If we want “the thing f matches x to” we can just say “f(x)”. Well, we write “f(x)”. We say “f of x”. Maybe “f at x”, or “f evaluated at x” if we want to emphasize ‘f’ more than ‘x’ or ‘f(x)’.

Where it gets useful is that we start looking at subsets. Bunches of points, not just one. Call ‘D’ some interesting-looking subset of the domain. What would it mean if we wrote the expression ‘f(D)’? Could we make that meaningful?

We do mean something by it. We mean what you might imagine by it. If you haven’t thought about what ‘f(D)’ might mean, take a moment — a short moment — and guess what it might. Don’t overthink it and you’ll have it right. I’ll put the answer just after this little bit so you can ponder.

So. ‘f(D)’ is a set. We make that set by taking, in turn, every single thing that’s in ‘D’. And find everything in the range that’s matched by ‘f’ to those things in ‘D’. Collect them all together. This set, ‘f(D)’, is “the image of D under f”.
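That recipe is short enough to write out. A Python sketch, with names of my own choosing:

```python
# The image f(D): go through everything in D, apply f, collect the results
# into one set.
def image(f, D):
    return {f(x) for x in D}

square = lambda x: x * x
print(image(square, {-2, -1, 0, 1, 2}))   # the set {0, 1, 4}
```

Notice the image has fewer things than D did, since -2 and 2 both match to 4. Nothing in the definition forbids that.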

We use images a lot when we’re studying how functions work. A function that maps a simple lump into a simple lump of about the same size is one thing. A function that maps a simple lump into a cloud of disparate particles is a very different thing. A function that describes how physical systems evolve will preserve the volume and some other properties of these lumps of space. But it might stretch out and twist around that space, which is how we discovered chaos.

Properly speaking, the range of a function ‘f’ is just the image of the whole domain under that ‘f’. But we’re not usually that careful about defining ranges. We’ll say something like ‘the domain and range are the sets of real numbers’ even though we only need the positive real numbers in the range. Well, it’s not like we’re paying for unnecessary range. Let me call the whole domain ‘X’, because I went and used ‘D’ earlier. Then the range, let me call that ‘Y’, would be ‘Y = f(X)’.

Images will turn up again. They’re a handy way to let us get at some useful ideas.

Some things are created with magnificent names. My essay today is about one of them. It’s one of my favorite terms and I get a strange little delight whenever it needs to be mentioned in a proof. It’s also the title I shall use for my 1970s Paranoid-Conspiracy Thriller.

The Fredholm Alternative.

So the Fredholm Alternative is about whether this supercomputer with the ability to monitor every commercial transaction in the country falls into the hands of the Parallax Corporation or whether — ahm. Sorry. Wrong one. OK.

The Fredholm Alternative comes from the world of functional analysis. In functional analysis we study sets of functions with tools from elsewhere in mathematics. Some of them you’d be surprised weren’t already in there. There’s adding functions together, multiplying them, the stuff of arithmetic. Some might be a bit surprising, like the stuff we draw from linear algebra. That’s ideas like functions having length, or being at angles to each other. Or that length and those angles changing when we take a function of those functions. This may sound baffling. But a mathematics student who’s got into functional analysis usually has a happy surprise waiting. She discovers the subject is easy. At least, it relies on a lot of stuff she’s learned already, applied to stuff that’s less difficult to work with than, like, numbers.

(This may be a personal bias. I found functional analysis a thoroughgoing delight, even though I didn’t specialize in it. But I got the impression from other grad students that functional analysis was well-liked. Maybe we just got the right instructor for it.)

I’ve mentioned in passing “operators”. These are functions that have a domain that’s a set of functions and a range that’s another set of functions. Suppose you come up to me with some function, let’s say “x^{2}”. I give you back some other function — say, “2x”. Then I’m acting as an operator.

Why should I do such a thing? Many operators correspond to doing interesting stuff. Taking derivatives of functions, for example. Or undoing the work of taking a derivative. Describing how changing a condition changes what sorts of outcomes a process has. We do a lot of stuff with these. Trust me.
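Here’s a small Python sketch of one such operator, differentiation, acting on polynomials stored as lists of coefficients. The representation is my own convenience, nothing standard:

```python
# An operator T: it takes in a function and gives back a function.
# Here T is differentiation, on polynomials written as coefficient lists
# [c0, c1, c2, ...] meaning c0 + c1*x + c2*x^2 + ...
def T(coeffs):
    # power-rule: the coefficient of x^n becomes n*c on x^(n-1)
    return [n * c for n, c in enumerate(coeffs)][1:]

print(T([1, 3, 2]))   # d/dx of 1 + 3x + 2x^2 is 3 + 4x, i.e. [3, 4]
```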

Let me use the name ‘T’ for some operator. I’m not going to say anything about what it does. The letter’s arbitrary. We like to use capital letters for operators because it makes the operators look extra important. And we don’t want to use ‘O’ because that just looks like zero and we don’t need that confusion.

Anyway. We need two functions. One of them will be called ‘f’ because we always call functions ‘f’. The other we’ll call ‘v’. In setting up the Fredholm Alternative we have this important thing: we know what ‘f’ is. We don’t know what ‘v’ is. We’re finding out something about what ‘v’ might be. The operator doing whatever it does to a function we write down as if it were multiplication, that is, like ‘Tv’. We get this notation from linear algebra. There we multiply matrices by vectors. Matrix-times-vector multiplication works like operator-on-a-function stuff. So much so that if we didn’t use the same notation young mathematics grad students would rise in rebellion. “This is absurd,” they would say, in unison. “The connotations of these processes are too alike not to use the same notation!” And the department chair would admit they have a point. So we write ‘Tv’.

If you skipped out on mathematics after high school you might guess we’d write ‘T(v)’ and that would make sense too. And, actually, we do sometimes. But by the time we’re doing a lot of functional analysis we don’t need the parentheses so much. They don’t clarify anything we’re confused about, and they require all the work of parenthesis-making. But I do see it sometimes, mostly in older books. This makes me think mathematicians started out with ‘T(v)’ and then wrote less as people got used to what they were doing.

I admit we might not literally know what ‘f’ is. I mean we know what ‘f’ is in the same way that, for a quadratic equation, “ax^{2} + bx + c = 0”, we “know” what ‘a’, ‘b’, and ‘c’ are. Similarly we don’t know what ‘v’ is in the same way we don’t know what ‘x’ there is. The question is the equation ‘Tv = f’. For operators that meet some requirements I don’t feel like getting into, the Fredholm Alternative tells us exactly one of these two things has to be true:

There’s one and only one ‘v’ which makes the equation ‘Tv = f’ true.

Or else ‘Tv = 0’ is true for some ‘v’ that isn’t just zero everywhere.

That is, either there’s exactly one solution, or else there’s no solving this particular equation. We can rule out there being two solutions (the way quadratic equations often have), or ten solutions (the way some annoying problems will), or infinitely many solutions (oh, it happens).
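The finite-dimensional version of this may be familiar from linear algebra: for a square matrix A, either Ax = b has exactly one solution for every b, or Ax = 0 has solutions besides zero. Here’s a small Python sketch of the 2-by-2 case, a toy solver of my own rather than any library routine:

```python
def solve_2x2(A, b):
    """Try to solve A x = b for a 2x2 matrix A (Cramer's rule).
    Returns the unique solution if det(A) != 0, else None."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        return None  # the other branch of the alternative: A x = 0 has nonzero solutions
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - b[0] * a21) / det
    return (x1, x2)

# Invertible case: exactly one solution.
print(solve_2x2([[2, 1], [1, 3]], [3, 4]))   # (1.0, 1.0)
# Singular case: no unique solution; instead A x = 0 is solved by, e.g., (1, -1).
print(solve_2x2([[1, 1], [2, 2]], [1, 0]))   # None
```

Fredholm’s setting is infinite-dimensional, but the shape of the alternative is the same.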

It turns up often in boundary value problems. Often before we try solving one we spend some time working out whether there is a solution. You can imagine why it’s worth spending a little time working that out before committing to a big equation-solving project. But it comes up elsewhere. Very often we have problems that, at their core, are “does this operator match anything at all in the domain to a particular function in the range?” When we try to answer we stumble across Fredholm’s Alternative over and over.

Fredholm here was Ivar Fredholm, a Swedish mathematician of the late 19th and early 20th centuries. He worked for Uppsala University, and for the Swedish Social Insurance Agency, and as an actuary for the Skandia insurance company. Wikipedia tells me that his mathematical work was used to calculate buyback prices. I have no idea how.

For this week I have something I want to follow up on. We’ll see if I make it that far.

The Mean Value Theorem.

My subject line disagrees with the header just above here. I want to talk about the Mean Value Theorem. It’s one of those things that turns up in freshman calculus and then again in Analysis. It’s introduced as “the” Mean Value Theorem. But like many things in calculus it comes in several forms. So I figure to talk about one of them here, and another form in a while, when I’ve had time to make up drawings.

Calculus can split effortlessly into two kinds of things. One is differential calculus. This is the study of continuity and smoothness. It studies how a quantity changes if something affecting it changes. It tells us how to optimize things. It tells us how to approximate complicated functions with simpler ones. Usually polynomials. It leads us to differential equations, problems in which the rate at which something changes depends on what value the thing has.

The other kind is integral calculus. This is the study of shapes and areas. It studies how infinitely many things, all infinitely small, add together. It tells us what the net change in things are. It tells us how to go from information about every point in a volume to information about the whole volume.

They aren’t really separate. Each kind informs the other, and gives us tools to use in studying the other. And they are almost mirrors of one another. Differentials and integrals are not quite inverses, but they come quite close. And as a result most of the important stuff you learn in differential calculus has an echo in integral calculus. The Mean Value Theorem is among them.

The Mean Value Theorem is a rule about functions. In this case it’s functions with a domain that’s an interval of the real numbers. I’ll use ‘a’ as the name for the smallest number in the domain and ‘b’ as the largest number. People talking about the Mean Value Theorem often do. The range is also the real numbers, although it doesn’t matter which ones.

I’ll call the function ‘f’ in accord with a long-running tradition of not working too hard to name functions. What does matter is that ‘f’ is continuous on the interval [a, b]. I’ve described what ‘continuous’ means before. It means that here too.

And we need one more thing. The function f has to be differentiable on the interval (a, b). You maybe noticed that before I wrote [a, b], and here I just wrote (a, b). There’s a difference here. We need the function to be continuous on the “closed” interval [a, b]. That is, it’s got to be continuous for ‘a’, for ‘b’, and for every point in-between.

But we only need the function to be differentiable on the “open” interval (a, b). That is, it’s got to be differentiable for all the points in-between ‘a’ and ‘b’. If it happens to be differentiable for ‘a’, or for ‘b’, or for both, that’s great. But we won’t turn away a function f for not being differentiable at those points. Only the interior. That sort of distinction between stuff true on the interior and stuff true on the boundaries is common. This is why mathematicians have words for “including the boundaries” (“closed”) and “never minding the boundaries” (“open”).

As to what “differentiable” is … A function is differentiable at a point if you can take its derivative at that point. I’m sure that clears everything up. There are many ways to describe what differentiability is. One that’s not too bad is to imagine zooming way in on the curve representing a function. If you start with a big old wobbly function it waves all around. But pick a point. Zoom in on that. Does the function stay all wobbly, or does it get more steady, more straight? Keep zooming in. Does it get even straighter still? If you zoomed in over and over again on the curve at some point, would it look almost exactly like a straight line?

If it does, then the function is differentiable at that point. It has a derivative there. The derivative’s value is whatever the slope of that line is. The slope is that thing you remember from taking Boring Algebra in high school. That rise-over-run thing. But this derivative is a great thing to know. You could approximate the original function with a straight line, with slope equal to that derivative. Close to that point, you’ll make a small enough error nobody has to worry about it.
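If you like, you can watch the zooming-in happen numerically: at a point where a function is differentiable, the slopes of secant lines over shrinking spans settle toward one number, the derivative. A Python sketch, where sine and the point x = 1 are just my example:

```python
import math

def secant_slope(f, x, h):
    """Slope of the line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

# Zoom in on sin at x = 1: the slopes settle toward cos(1), about 0.5403.
for h in (0.1, 0.01, 0.001, 0.0001):
    print(secant_slope(math.sin, 1.0, h))
```

At a corner the slopes from the two sides would settle to different numbers, which is the numeric face of non-differentiability.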

That there will be this straight line approximation isn’t true for every function. Here’s an example. Picture a line that goes up and then takes a 90-degree turn to go back down again. Look at the corner. However close you zoom in on the corner, there’s going to be a corner. It’s never going to look like a straight line; there’s a 90-degree angle there. It can be a smaller angle if you like, but any sort of corner breaks this differentiability. This is a point where the function isn’t differentiable.

There are functions that are nothing but corners. They can be differentiable nowhere, or only at a tiny set of points that can be ignored. (A set of measure zero, as the dialect would put it.) Mathematicians discovered this over the course of the 19th century. They got into some good arguments about how that can even make sense. It can get worse. Also found in the 19th century were functions that are continuous only at a single point. This smashes just about everyone’s intuition. But we can’t find a definition of continuity that’s as useful as the one we use now and avoids that problem. So we accept that it implies some pathological conclusions and carry on as best we can.

Now I get to the Mean Value Theorem in its differential calculus pelage. It starts with the endpoints, ‘a’ and ‘b’, and the values of the function at those points, ‘f(a)’ and ‘f(b)’. And from here it’s easiest to figure what’s going on if you imagine the plot of a generic function f. I recommend drawing one. Just make sure you draw it without lifting the pen from paper, and without including any corners anywhere. Something wiggly.

Draw the line that connects the ends of the wiggly graph. Formally, we’re adding the line segment that connects the points with coordinates (a, f(a)) and (b, f(b)). That’s coordinate pairs, not intervals. That’s clear in the minds of the mathematicians who don’t see why not to use parentheses over and over like this. (We are short on good grouping symbols like parentheses and brackets and braces.)

Per the Mean Value Theorem, there is at least one point whose derivative is the same as the slope of that line segment. If you were to slide the line up or down, without changing its orientation, you’d find something wonderful. Most of the time this line intersects the curve, crossing from above to below or vice-versa. But there’ll be at least one point where the shifted line is “tangent”, where it just touches the original curve. Close to that touching point, the “tangent point”, the shifted line and the curve blend together and can’t be easily told apart. As long as the function is differentiable on the open interval (a, b), and continuous on the closed interval [a, b], this will be true. You might convince yourself of it by drawing a couple of curves and taking a straightedge to the results.
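You can check the theorem’s promise numerically. Here’s a Python sketch for f(x) = x³ on [0, 1]: the secant slope is 1, and the theorem’s point works out to c = 1/√3. The bisection helper is my own, and it cheats a little by being handed the exact derivative:

```python
import math

def mvt_point(f, fprime, a, b, tol=1e-12):
    """Find a point c in (a, b) where f'(c) equals the secant slope
    (f(b) - f(a)) / (b - a), by bisection. Assumes f'(x) minus that
    slope changes sign on (a, b), as it does in this example."""
    slope = (f(b) - f(a)) / (b - a)
    g = lambda x: fprime(x) - slope
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# f(x) = x^3 on [0, 1]: secant slope is 1, and 3c^2 = 1 at c = 1/sqrt(3).
c = mvt_point(lambda x: x**3, lambda x: 3 * x**2, 0.0, 1.0)
print(c, 1 / math.sqrt(3))  # both about 0.5774
```

The theorem only promises such a c exists; finding it, as here, takes extra work and extra knowledge.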

This is an existence theorem. Like the Intermediate Value Theorem, it doesn’t tell us which point, or points, make the thing we’re interested in true. It just promises us that there is some point that does it. So it gets used in other proofs. It lets us mix information about intervals and information about points.

It’s tempting to try using it numerically. It looks as if it justifies a common differential-calculus trick. Suppose we want to know the value of the derivative at a point. We could pick a little interval around that point and find the endpoints. And then find the slope of the line segment connecting the endpoints. And won’t that be close enough to the derivative at the point we care about?

Well. Um. No, we really can’t be sure about that. We don’t have any idea what interval might make the derivative of the point we care about equal to this line-segment slope. The Mean Value Theorem won’t tell us. It won’t even tell us if there exists an interval that would let that trick work. We can’t invoke the Mean Value Theorem to let us get away with that.

Often, though, we can get away with it. Differentiable functions do have to follow some rules. Among them is that if you do pick a small enough interval then approximations that look like this will work all right. If the function flutters around a lot, we need a smaller interval. But a lot of the functions we’re interested in don’t flutter around that much. So we can get away with it. And there’s some grounds to trust in getting away with it. The Mean Value Theorem isn’t any part of the grounds. It just looks so much like it ought to be.
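Here is what that trick looks like in practice, in Python. The fluttery function is my own example of where a too-wide interval goes wrong:

```python
import math

def central_difference(f, x, h):
    """Approximate f'(x) by the slope of the segment over [x - h, x + h]."""
    return (f(x + h) - f(x - h)) / (2 * h)

# For a gently varying function the trick works well even with a modest h...
err_gentle = abs(central_difference(math.sin, 1.0, 0.01) - math.cos(1.0))
# ...but a fluttering (high-frequency) function needs a much smaller interval:
# h = 0.01 spans a big chunk of one wiggle of sin(100x).
flutter = lambda x: math.sin(100 * x)
err_flutter = abs(central_difference(flutter, 1.0, 0.01) - 100 * math.cos(100.0))
print(err_gentle)   # tiny
print(err_flutter)  # large
```

Shrinking h rescues the fluttering case, which is the “small enough interval” caveat in numbers.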

I hope on a later Thursday to look at an integral-calculus form of the Mean Value Theorem.

Oh, x- and y-, why are you so poor in mathematics terms? I brave my way.

X-Intercept.

I did not get much out of my eighth-grade, pre-algebra, class. I didn’t connect with the teacher at all. There were a few little bits to get through my disinterest. One came in graphing. Not graph theory, of course, but the graphing we do in middle school and high school. That’s where we find points on the plane with coordinates that make some expression true. Two major terms kept coming up in drawing graphs of lines. They’re the x-intercept and the y-intercept. They had this lovely, faintly technical, faintly science-y sound. I think the teacher emphasized a few times they were “intercepts”, not “intersects”. But it’s hard to explain to an eighth-grader why this is an important difference to make. I’m not sure I could explain it to myself.

An x-intercept is a point where the plot of a curve and the x-axis meet. So we’re assuming this is a Cartesian coordinate system, the kind marked off with a pair of lines meeting at right angles. It’s usually two-dimensional, sometimes three-dimensional. I don’t know anyone who’s worried about the x-intercept for a four-dimensional space. Even higher dimensions are right out. The thing that confused me the most, when learning this, is a small one. The x-axis is points that have a y-coordinate of zero. Not an x-coordinate of zero. So in a two-dimensional space it makes sense to describe the x-intercept as a single value. That’ll be the x-coordinate, and the point with the x-coordinate of that and the y-coordinate of zero is the intercept.

If you have an expression and you want to find an x-intercept, you need to find values of x which make the expression equal to zero. We get the idea from studying lines. There are a couple of typical representations of lines. They almost always use x for the horizontal coordinate, and y for the vertical coordinate. The names are only different if the author is making a point about the arbitrariness of variable names. Sigh at such an author and move on. An x-intercept has a y-coordinate of zero, so, set any appearance of ‘y’ in the expression equal to zero and find out what value or values of x make this true. If the expression is an equation for a line there’ll be just the one point, unless the line is horizontal. (If the line is horizontal, then either every point on the x-axis is an intercept, or else none of them are. The line is either “y equals zero”, or it is “y equals something other than zero”. )

There’s also a y-intercept. It is exactly what you’d imagine once you know that. It’s usually easier to find what the y-intercept is. The equation describing a curve is typically written in the form “y = f(x)”. That is, y is by itself on one side, and some complicated expression involving x’s is on the other. Working out what y is for a given x is straightforward. Working out what x is for a given y is … not hard, for a line. For more complicated shapes it can be difficult. There might not be a unique answer. That’s all right. There may be several x-intercepts.

There are a couple names for the x-intercepts. The one that turns up most often away from the pre-algebra and high school algebra study of lines is a “zero”. It’s one of those bits in which mathematicians seem to be trying to make it hard for students. A “zero” of the function f(x) is generally not what you get when you evaluate it for x equalling zero. Sorry about that. It’s the values of x for which f(x) equals zero. We also call them “roots”.
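In Python terms, with some throwaway helpers of my own, finding x-intercepts is exactly this solve-for-zero business:

```python
def line_x_intercept(m, b):
    """x-intercept of y = m*x + b: set y to zero and solve (m must not be 0)."""
    return -b / m

def quadratic_zeros(a, b, c):
    """Real zeros of f(x) = a*x^2 + b*x + c, via the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                      # no x-intercepts at all
    root = disc ** 0.5
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})

print(line_x_intercept(2, -6))         # y = 2x - 6 crosses the x-axis at x = 3
print(quadratic_zeros(1, 0, -4))       # x^2 - 4 has zeros at -2 and 2
print(quadratic_zeros(1, 0, 1))        # x^2 + 1 never touches the x-axis
```

The y-intercept needs no helper at all: just evaluate the expression at x = 0.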

OK, but who cares?

Well, if you want to understand the shape of a curve, the way a function looks, it helps to plot it. Today, yeah, pull up Mathematica or Matlab or Octave or some other program and you get your plot. Fair enough. If you don’t have a computer that can plot like that, the way I did in middle school, you have to do it by hand. And then the intercepts are clues to how to sketch the function. They are, relatively, easy points which you can find, and which you know must be on the curve. We may form a very rough sketch of the curve. But that rough picture may be better than having nothing.

And we can learn about the behavior of functions even without plotting, or sketching a plot. Intercepts of expressions, or of parts of expressions, are points where the value might change from positive to negative. If the denominator of a part of the expression has an x-intercept, this could be a point where the function’s value is undefined. It may be a discontinuity in the function. The function’s values might jump wildly between one side and another. These are often the important things about understanding functions. Where are they positive? Where are they negative? Where are they continuous? Where are they not?

These are things we often want to know about functions. And we learn many of them by looking for the intercepts, x- and y-.

This request echoes one of the first terms from my Summer 2015 Mathematics A To Z. Then I’d spent some time on a bijection, or a bijective map. A surjective map is a less complicated concept. But if you understood bijective maps, you picked up surjective maps along the way.

By “map”, in this context, mathematicians don’t mean those diagrams that tell you where things are and how you might get there. Of course we don’t. By a “map” we mean that we have some rule that matches things in one set to things in another. If this sounds to you like what I’ve claimed a function is then you have a good ear. A mapping and a function are pretty much different names for one another. If there’s a difference in connotation I suppose it’s that a “mapping” makes a weaker suggestion that we’re necessarily talking about numbers.

(In some areas of mathematics, a mapping means a function with some extra properties, often some kind of continuity. Don’t worry about that. Someone will tell you when you’re doing mathematics deep enough to need this care. Mind, that person will tell you by way of a snarky follow-up comment picking on some minor point. It’s nothing personal. They just want you to appreciate that they’re very smart.)

So a function, or a mapping, has three parts. One is a set called the domain. One is a set called the range. And then there’s a rule matching things in the domain to things in the range. With functions we’re so used to the domain and range being the real numbers that we often forget to mention those parts. We go on thinking “the function” is just “the rule”. But the function is all three of these pieces.

A function has to match everything in the domain to something in the range. That’s by definition. There are no unused scraps in the domain. If it looks like there are, that’s because we’re being sloppy in defining the domain. Or let’s be charitable. We assumed the reader understands the domain is only the set of things that make sense. And things make sense by being matched to something in the range.

Ah, but now, the range. The range could have unused bits in it. There’s nothing that inherently limits the range to “things matched by the rule to some thing in the domain”.

By now, then, you’ve probably spotted there have to be two kinds of functions. There’s one kind in which the whole range is used, and there are ones in which it’s not. Good eye. This is exactly so.

If a function only uses part of the range, if it leaves out anything, even if it’s just a single value out of infinitely many, then the function is called an “into” mapping. If you like, it takes the domain and stuffs it into the range without filling the range.

Ah, but if a function uses every scrap of the range, with nothing left out, then we have an “onto” mapping. The whole of the domain gets sent onto the whole of the range. And this is also known as a “surjective” mapping. We get the term “surjective” from Nicolas Bourbaki. Bourbaki is/was the renowned 20th century mathematics art-collective group which did so much to place rigor and intuition-free bases into mathematics.

The term pairs up with the “injective” mapping. In this, each element in the range matches up with at most one thing in the domain. So if you know the function’s rule, then if you know a thing in the range that gets matched at all, you also know the one and only thing in the domain matched to it. If you don’t feel very French, you might call this sort of function one-to-one. That might be a better name for saying why this kind of function is interesting.

Not every function is injective. But then not every function is surjective either. But if a function is both injective and surjective — if it’s both one-to-one and onto — then we have a bijection. It’s a mapping that can represent the way a system changes and that we know how to undo. That’s pretty comforting stuff.
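For finite sets you can check these properties by brute force. A Python sketch, with the squaring example being my own:

```python
def is_surjective(rule, domain, codomain):
    """Onto: every element of the codomain is hit by something in the domain."""
    return {rule(x) for x in domain} == set(codomain)

def is_injective(rule, domain):
    """One-to-one: no two domain elements map to the same thing."""
    images = [rule(x) for x in domain]
    return len(images) == len(set(images))

domain = {-2, -1, 0, 1, 2}
square = lambda x: x * x
print(is_surjective(square, domain, {0, 1, 4}))  # True: onto the range {0, 1, 4}
print(is_injective(square, domain))              # False: -2 and 2 both go to 4
```

A map passing both checks would be a bijection on those sets.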

If we use a mapping to describe how a process changes a system, then knowing whether it’s a surjective map tells us something about the process. If the map isn’t surjective, if it’s an into map, then the process makes the system settle into a proper subset of all the possible states. That doesn’t mean the thing is stable — that little jolts get worn down. And it doesn’t mean that the thing is settling to a fixed state. But it is a piece of information suggesting that’s possible. This may not seem like a strong conclusion. But considering how little we know about the function it’s impressive to be able to say that much.

I hope we’re all comfortable with the idea of looking at sets of functions. If not we can maybe get comfortable soon. What’s important about functions is that we can add them together, and we can multiply them by real numbers. They work in important ways like regular old numbers would. They also work the way vectors do. So all we have to do is be comfortable with vectors. Then we have the background to talk about functions this way. And so, my first example of an oft-used set of functions:

C[a, b]

People like continuity. It’s comfortable. It’s reassuring, even. Most situations, most days, most things are pretty much like they were before, and that’s how we want it. Oh, we cast some hosannas towards the people who disrupt the steady progression of stuff. But we’re lying. Think of the worst days of your life. They were the ones that were very much not like the day before. If the day is discontinuous enough, then afterwards, people ask one another what they were doing when the discontinuous thing happened.

(OK, there are some good days which are very much not like the day before. But imagine someone who seems informed assures you that tomorrow will completely change your world. Do you feel anticipation or dread?)

Mathematical continuity isn’t so fraught with social implications. What we mean by a continuous function is — well, skip the precise definition. Calculus I students see it, stare at it, and run away. It comes back to the mathematics majors in Intro to Real Analysis. Then it comes back again in Real Analysis. Mathematics majors get to accepting it sometime around Real Analysis II, because the alternative is Functional Analysis. The definition’s in truth not so bad. But it’s fussy and if you get any parts wrong silly consequences follow.

If you’re not a mathematics major, or if you’re a mathematics major not taking a test in Real Analysis, you can get away with this. We’re talking here, and we’re going to keep talking, about functions with real numbers as the domain and real numbers as the range. Later, we can go to complex numbers, or even vectors of numbers. The arguments get a bit longer but don’t change much, so if you learn this you’ve got most of the way to learning everything.

A continuous function is one whose graph you can draw without having to lift your pen. We like continuous functions, mathematically, because they are so much easier to work with. Why are they easy? Well, because if you know the value of your function at one point, you know approximately what it is at nearby points. There’s predictability to the function’s values. You can see why this would make it easier to do calculations. But it makes analysis easy too. We want to do a lot of proofs which involve arithmetic with the values functions have. It gets so much easier that we can say the function’s actual value is something like the value it has at some point we happen to know.

So if we want to work with functions, we usually want to work with continuous functions. They behave more predictably, and more like we hope they will.

The set C[a, b] is the set of all continuous real-valued functions whose domain is the set of real numbers from a to b. For example, pick a function that’s in C[-1, 1]. Let me call it f. Then f is a real-valued function. And its domain is the real numbers from -1 to 1. In the absence of other information about what its range is, we assume it to be the real numbers R. We can have any real numbers as the boundaries; C[-1000, π] is legitimate if eccentric.

There are some domains that are particularly popular. All the real numbers is one. That might get written C(R) for shorthand. C[0, 1], the interval from 0 to 1, is popular and easy to work with. C[-1, 1] is almost as good and has the advantage of giving us negative numbers. C[-π, π] is also liked because it meshes well with the trigonometric functions. You remember those: sines and cosines and tangent functions, plus some unpopular ones we try to not talk about. We don’t often talk about other intervals. We can change, say, C[0, 1] into C[0, 10] exactly the way you’d imagine. Re-scaling numbers, and even shifting them up or down some, requires so little work we don’t bother doing it.
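That re-scaling really is little work. Here’s a Python sketch of moving a function from one interval to another; the helper and its name are my own:

```python
def rescale_domain(f, a, b, c, d):
    """Turn a function on [a, b] into one on [c, d], by mapping each x in
    [c, d] back to the matching point of [a, b]:
    g(x) = f(a + (b - a) * (x - c) / (d - c))."""
    def g(x):
        return f(a + (b - a) * (x - c) / (d - c))
    return g

# A function in C[0, 1] ...
f = lambda t: t * (1 - t)
# ... becomes a function in C[0, 10], the same shape stretched out.
g = rescale_domain(f, 0, 1, 0, 10)
print(g(5))   # midpoint of [0, 10] matches f at the midpoint of [0, 1]: 0.25
```

Since nothing about continuity is disturbed by this stretching, results proved for C[0, 1] transfer painlessly.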

C[-1, 1] is a different set of functions from, say, C[0, 1]. There are many functions in one set that have the same rule as a function in another set. But the functions in C[-1, 1] have a different domain from the functions in C[0, 1]. So they can’t be the same functions. The rule might be meaningful outside the domain. If the rule is “f:x -> 3*x”, well, that makes sense whatever x should be. But a function is the rule, the domain, and the range together. If any of the parts changes, we have a different function.

The way I’ve written the symbols, with straight brackets [a, b], means that both the numbers a and b are in the domain of these functions. If I want to omit the boundaries — have every number greater than a but not a itself, and have every number less than b but not b itself — then we change to parentheses. That would be C(-1, 1). If I want to include one boundary but not the other, use a straight bracket for the boundary to include, and a parenthesis for the boundary to omit. C[-1, 1) says functions in that set have a domain that includes -1 but does not include 1. It also drives my text editor crazy having unmatched parentheses and brackets like that. We must suffer for our mathematical arts.

I want to resume my tour of sets that turn up a lot as domains and ranges. But I need to spend some time explaining stuff before the next bunch. I want to talk about things that aren’t so familiar as “numbers” or “shapes”. We get into more abstract things.

We have to start out with functions. Functions are built of three parts: a set that’s the domain, a set that’s the range, and a rule that matches things in the domain to things in the range. But what’s a set? Sets are bunches of things. (If we want to avoid logical chaos we have to be more exact. But we’re not going near the zones of logical chaos. So we’re all right going with “sets are bunches of things”. WARNING: do not try to pass this off at your thesis defense.)

So if a function is a thing, can’t we have a set that’s made up of functions? Sure, why not? We can get a set by describing the collection of things we want in it. At least if we aren’t doing anything weird. (See above warning.)

Let’s pick out a set of functions. Put together a group of functions that all have the same set as their domain, and that have compatible sets as their range. The real numbers are a good pick for a domain. They’re also good for a range.

Is this an interesting set? Generally, a set is boring unless we can do something with the stuff in it. That something is, almost always, taking a pair of the things in the set and relating it to something new. Whole numbers, for example, would be trivia if we weren’t able to add them together. Real numbers would be a complicated pile of digits if we couldn’t multiply them together. Having things is nice. Doing stuff with things is all that’s meaningful.

So what can we do with a couple of functions, if they have the same domains and ranges? Let’s pick one out. Give it the name ‘f’. That’s a common name for functions. It was given to us by Leonhard Euler, who was brilliant in every field of mathematics, including in creating notation. Now let’s pick out a function again. Give this new one the name ‘g’. That’s a common name for functions, given to us by every mathematician who needed something besides ‘f’. (There are alternatives. One is to start using subscripts, like f_{1} and f_{2}. That’s too hard for me to type. Another is to use different typefaces. Again, too hard for me. Another is to use lower- and upper-case letters, ‘f’ and ‘F’. Using alternate-case forms usually connotes that these two functions are related in some way. I don’t want to suggest that they are related here. So, ‘g’ it is.)

We can do some obvious things. We can add them together. We can create a new function, imaginatively named ‘f + g’. It’ll have the same domain and the same range as f and g did. What rule defines how it matches things in the domain to things in the range?

Mathematicians throw the term “obvious” around a lot. Also “intuitive”. What they mean is “what makes sense to me but I don’t want to write it down”. Saying that is fine if your mathematician friend knows roughly what you’d think makes sense. It can be catastrophic if she’s much smarter than you, or thinks in weird ways, and is always surprised other people don’t think like her. It’s hard to better describe it than “obvious”, though. Well, here goes.

Let me pick something that’s in the domain of both f and g. I’m going to call that x, which mathematicians have been doing ever since René Descartes gave us the idea. So “f(x)” is something in the range of f, and “g(x)” is something in the range of g. I said, way up earlier, that both of these ranges are the same set and suggested the real numbers there. That is, f(x) is some real number and I don’t care which just now. g(x) is also some real number and again I don’t care right now just which.

The function we call “f + g” matches the thing x, in the domain, to something in the range. What thing? The number f(x) + g(x). I told you, I can’t see any fair way to describe that besides being “obvious” and “intuitive”.

Another thing we’ll want to do is multiply a function by a real number. Suppose we have a function f, just like above. Give me a real number. We’ll call that real number ‘a’ because I don’t remember if you can do the alpha symbol easily on web pages. Anyway, we can define a function, ‘af’, the multiplication of the real number a by the function f. It has the same domain as f, and the same range as f. What’s its rule?

Let me say x is something in the domain of f. So f(x) is some real number. Then the new function ‘af’ matches the x in the domain with a real number. That number is what you get by multiplying ‘a’ by whatever ‘f(x)’ is. So there are major parts of your mathematician friend from college’s classes that you could have followed without trouble.

(Her class would have covered many more things, mind you, and covered these more cryptically.)
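Those “obvious” rules take about four lines of Python, if you’ll let lambdas stand in for functions; the names here are my own:

```python
def add_functions(f, g):
    """The function 'f + g': matches x to f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale_function(a, f):
    """The function 'af': matches x to a * f(x)."""
    return lambda x: a * f(x)

f = lambda x: x ** 2
g = lambda x: 3 * x

h = add_functions(f, g)       # h(x) = x^2 + 3x
k = scale_function(2, f)      # k(x) = 2x^2

print(h(2))   # 4 + 6 = 10
print(k(2))   # 2 * 4 = 8
```

Those two operations, addition and scalar multiplication, are exactly what the vector-space structure mentioned below needs.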

There’s more stuff we would like to do with functions. But for now, this is enough. This lets us turn a set of functions into a “vector space”. Vector spaces are kinds of things that work, at least a bit, like arithmetic. And mathematicians have studied these kinds of things. We have a lot of potent tools that work on vector spaces. So mathematicians develop a habit of finding vector spaces in what they study.

And I’m subject to that too. This is why I’ve spent such time talking about what we can do with functions rather than naming particular sets. I’ll pick up from that.

I’ve been slow getting back to my tour of commonly-used domains for several reasons. It’s been a busy season. It’s so much easier to plan out writing something than it is to write something. The usual. But one of my excuses this time is that I’m not sure the set I want to talk about is that common. But I like it, and I imagine a lot of people will like it. So that’s enough.

T and T^{n}

T stands for the torus. Or the toroid, if you prefer. It’s a fun name. You know the shape. It’s a doughnut. Take a cylindrical tube and curl it around back on itself. Don’t rip it or fold it. That’s hard to do with paper or a sheet of clay or other real-world stuff. But we can imagine it easily enough. I suppose we can make a computer animation of it, if by ‘we’ we mean ‘you’.

We don’t use the whole doughnut shape for T. And no, we don’t use the hole either. What we use is the surface of the doughnut, the part that could get glazed. We ignore the inside, just the same way we had S represent the surface of a sphere (or the edge of a circle, or the boundary of a hypersphere). If there is a common symbol for the torus including the interior I don’t know it. I’d be glad to hear it if someone does.

What good is the surface of a torus, though? Well, it’s a neat shape. Slice it in one direction, the way you’d cut a bagel in half, and at the slice you get the shape of a washer, the kind you fit around a nut and bolt. (An annulus, to use the trade term.) Slice it perpendicular to that, the way you’d cut it if you’re one of those people who eats half doughnuts to the amazement of the rest of us, and at the slice you get two detached circles. If you start from any point on the torus shape you can go in one direction and make a circle that loops around the doughnut’s central hole. You can go the perpendicular direction and make a circle that brushes up against but doesn’t go around the central hole. There’s some neat topology in it.

There’s also video games in it. The topology of this is just like old-fashioned video games where if you go off the edge of the screen to the right you come back around on the left, and if you go off the top you come back from the bottom. (And if you go off to the left you come back around the right, and off the bottom you come back to the top.) To go from the flat screen to the surface of a doughnut requires imagining some stretching and scrunching up of the surface, but that’s all right. (OK, in an old video game it was a kind-of flat screen.) We can imagine a nice flexible screen that just behaves.

This is a common trick to deal with boundaries. (I first wrote “to avoid having to deal with boundaries”. But this is dealing with them, by a method that often makes sense.) You just make each boundary match up with a logical other boundary. It’s not just useful in video games. Often we’ll want to study some phenomenon where the current state of things depends on the immediate neighborhood, but it’s hard to say what a logical boundary ought to be. This particularly comes up if we want to model an infinitely large surface without dealing with infinitely large things. The trick will turn up a lot in numerical simulations for that reason. (In that case, we’re in truth working with a numerical approximation of T, but that’ll be close enough.)
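The video-game version of this boundary trick amounts to one line of modular arithmetic. A minimal sketch, with a made-up screen size:

```python
# The wrap-around screen: positions on a W-by-H grid live on a torus
# because each edge matches up with the opposite edge.
# The screen size here is a hypothetical, chosen for illustration.

W, H = 320, 200

def wrap(x, y):
    """Map any coordinates back onto the screen, torus-style."""
    return x % W, y % H

print(wrap(325, -7))   # off the right edge and off the top -> (5, 193)
print(wrap(-1, -1))    # off the left and bottom -> (319, 199)
```

Python’s `%` operator always returns a result with the sign of the divisor, which is exactly what the wrap-around needs; in a language like C you would have to adjust negative values yourself.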

T^{n}, meanwhile, is a vector of things, each of which is a point on a torus. It’s akin to R^{n} or S^{2 x n}. They’re ordered sets of things that are themselves things. There can be as many as you like. n, here, is whatever positive whole number you need.

You might wonder how big the doughnut is. When we talked about the surface of the sphere, S^{2}, or the surface and interior, B^{3}, we figured on a sphere with radius of 1 unless we heard otherwise. Toruses would seem to have two parameters. There’s how big the outer diameter is and how big the inner diameter is. Which do we pick?

We don’t actually care. It’s much the way we can talk about a point on the surface of a planet by the latitude and longitude of the point, and never care about how big the planet is. We can describe a point on the surface of the torus without needing to refer to how big the whole shape is or how big the hole in the middle is. A popular scheme to describe points is one that looks a lot like latitude and longitude.

Imagine the torus sitting as flat as it gets on the table. Pick a point that you find interesting.

We use some reference point that’s as good as an equator and a prime meridian. One coordinate is the angle you make going horizontally, possibly around the hole in the middle, from the reference point to the point we’re interested in. The other coordinate is the angle you make vertically, going in a loop that doesn’t go around the hole in the middle, from the reference point to the point we’re interested in. The reference point has coordinates 0, 0, as it must. If this sounds confusing it’s because I’m not using a picture. I thought making some pictures would be too much work. I’m a fool. But if you think of real torus-shaped objects it’ll come to you.

In this scheme the coordinates are both angles. Normal people would measure that in degrees, from 0 to 360, or maybe from -180 to 180. Mathematicians would measure in radians, from 0 to 2π, or from -π to +π. Whatever it is, it’s the same as the coordinates of a point on the edge of the circle, what we called S^{1} a few essays back. So it’s fair to say you can think of T as S^{1} x S^{1}, an ordered set of points on circles.
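If you do want to draw the torus, the two angles can be turned into a point in ordinary three-dimensional space. The radii R and r below are assumptions for the picture’s sake only; as the essay says, the coordinates themselves never need them:

```python
import math

# Two angles identify a point on the torus, like latitude and longitude
# on a planet. Only when drawing do we need sizes; R (center of the hole
# to the center of the tube) and r (the tube) are hypothetical values.

R, r = 2.0, 0.5

def torus_point(theta, phi):
    """Map angle coordinates (theta going around the hole, phi going
    around the tube) to a point in three-dimensional space."""
    x = (R + r * math.cos(phi)) * math.cos(theta)
    y = (R + r * math.cos(phi)) * math.sin(theta)
    z = r * math.sin(phi)
    return x, y, z

# The reference point, coordinates (0, 0), sits on the outer equator:
print(torus_point(0.0, 0.0))   # (2.5, 0.0, 0.0)
```

Because both coordinates are angles, adding 2π to either brings you back to (very nearly, in floating point) the same point, which is the S^{1} x S^{1} structure showing itself.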

I’ve written of these toruses as three-dimensional things. Well, two-dimensional surfaces wrapped up to suggest three-dimensional objects. You don’t have to stick with these dimensions if you don’t want or if your problem needs something else. You can make a torus that’s a three-dimensional shape in four dimensions. For me that’s easiest to imagine as a cube where the left edge and the right edge loop back and meet up, the lower and the upper edges meet up, and the front and the back edges meet up. This works well to model an infinitely large space with a nice and small block.

I like to think I can imagine a four-dimensional doughnut where every cross-section is a sphere. I may be kidding myself. There could also be a five-dimensional torus and you’re on your own working that out, or working out what to do with it.

I’m not sure there is a common standard notation for that, though. Probably the mathematician wanting to make clear she’s working with a torus in four dimensions just says so in text, and trusts that the context of her mathematics makes it clear this is no ordinary torus.

I’ve also written of these toruses as circular, as rounded shapes. That’s the most familiar torus. It’s a doughnut shape, or an O-ring shape, or an inner tube’s shape. It’s the shape you produce by taking a circle and looping it around an axis not on the ring. That’s common and that’s usually all we need.

But if you need some other torus, produced by rotating some other shape around an axis not inside it, go ahead. You’ll need to make clear what that original shape, the generator, is. You’ve seen examples of this in, for example, the washers that fit around nuts and bolts. They’re typically rectangles in cross-section. Or you might have seen that image of someone who fit together a couple dozen iMac boxes to make a giant wheel. I don’t know why you would need this, but it’s your problem, not mine. If these shapes are useful for your work, by all means, use them.

I’m not sure there is a standard notation for that sort of shape. My hunch is to say you’d define your generating shape and give it a name such as A or D. Then name the torus based on that as T(A) or T(D). But I would recommend spelling it out in text before you start using symbols like this.

Last week in the tour of often-used domains I talked about S^{n}, the surfaces of spheres. These correspond naturally to stuff like the surfaces of planets, or the edges of surfaces. They are also natural fits if you have a quantity that’s made up of a couple of components, and some total amount of the quantity is fixed. More physical systems do that than you might have guessed.

But this is all the surfaces. The great interior of a planet is by definition left out of S^{n}. This gives away the heart of what this week’s entry in the set tour is.

B^{n}

B^{n} is the domain that’s the interior of a sphere. That is, B^{3} would be all the points in a three-dimensional space that are less than a particular radius from the origin, from the center of space. If we don’t say what the particular radius is, then we mean “1”. That’s just as with the S^{n} we meant the radius to be “1” unless someone specifically says otherwise. In practice, I don’t remember anyone ever saying otherwise when I was in grad school. I suppose they might if we were doing a numerical simulation of something like the interior of a planet. You know, something where it could make a difference what the radius is.

It may have struck you that B^{3} is just the points that are inside S^{2}. Alternatively, it might have struck you that S^{2} is the points that are on the edge of B^{3}. Either way is right. B^{n} and S^{n-1}, for any positive whole number n, are tied together, one the edge and the other the interior.

B^{n} we tend to call the “ball” or the “n-ball”. Probably we hope that suggests bouncing balls and baseballs and other objects that are solid throughout. S^{n} we tend to call the “sphere” or the “n-sphere”, though I admit that doesn’t make a strong case for ruling out the inside of the sphere. Maybe we should think of it as the surface. We don’t even have to change the letter representing it.

As the “n” suggests, there are balls for as many dimensions of space as you like. B^{2} is a circle, filled in. B^{1} is just a line segment, stretching out from -1 to 1. B^{3} is what’s inside a planet or an orange or an amusement park’s glass light fixture. B^{4} is more work than I want to do today.

So here’s a natural question: does B^{n} include S^{n-1}? That is, when we talk about a ball in three dimensions, do we mean the surface and everything inside it? Or do we just mean the interior, stopping ever so short of the surface? This is a division very much like splitting the real numbers into negative and positive; which side, if either, gets zero?

Typically, I think, mathematicians don’t. If a mathematician speaks of B^{3} without saying otherwise, she probably means the interior of a three-dimensional ball. She’s not saying anything one way or the other about the surface. This we name the “open ball”, and if she wants to avoid any ambiguity she will say “the open ball B^{n}”.

“Open” here means the same thing it does when speaking of an “open set”. That may not communicate well to people who don’t remember their set theory. It means that the edges aren’t included. (Warning! Not actual set theory! Do not attempt to use that at your thesis defense. That description was only a reference to what’s important about this property in this particular context.)

If a mathematician wants to talk about the ball and the surface, she might say “the closed ball B^{n}”. This means to take the surface and the interior together. “Closed”, again, here means what it does in set theory. It pretty much means “include the edges”. (Warning! See above warning.)
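The open-versus-closed distinction is just the difference between a strict and a non-strict inequality. A sketch, with function names of my own choosing:

```python
import math

# The open ball omits its surface; the closed ball includes it.
# The default radius of 1 matches the unit-ball convention above.

def in_open_ball(point, radius=1.0):
    """True when the point is strictly inside the ball B^n."""
    return math.dist(point, [0.0] * len(point)) < radius

def in_closed_ball(point, radius=1.0):
    """True when the point is inside the ball or on its surface."""
    return math.dist(point, [0.0] * len(point)) <= radius

surface_point = (1.0, 0.0, 0.0)   # a point of S^2, the sphere's surface
print(in_open_ball(surface_point))    # False: open balls omit the edge
print(in_closed_ball(surface_point))  # True: closed balls include it
```

The same two functions serve for any dimension n, since the length of the tuple you pass in carries that information.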

Balls work well as domains for functions that have to describe the interiors of things. They also work if we want to talk about a constraint that’s made up of a couple of components, and that can be up to some size but not larger. For example, suppose you may put up to a certain budget cap into (say) six different projects, but you aren’t required to use the entire budget. We could model your budgeting as finding the point in B^{6} that gets the best result. How you measure the best is a problem for your operations research people. All I’m telling you is how we might represent the study of the thing you’re doing.

I haven’t forgotten or given up on the Set Tour, don’t worry or celebrate. I just expected there to be more mathematically-themed comic strips the last couple days. Really, three days in a row without anything at ComicsKingdom or GoComics to talk about? That’s unsettling stuff. Ah well.

S^{n}

We are also starting to get into often-used domains that are a bit stranger. We are going to start seeing domains that strain the imagination more. But this isn’t strange quite yet. We’re looking at the surface of a sphere.

The surface of a sphere we call S^{2}. The “S” suggests a sphere. The “2” means that we have a two-dimensional surface, which matches what we see with the surface of the Earth, or a beach ball, or a soap bubble. All these are sphere enough for our needs. If we want to say where we are on the surface of the Earth, it’s most convenient to do this with two numbers. These are a latitude and a longitude. The latitude is the angle made between the point we’re interested in and the equator. The longitude is the angle made between the point we’re interested in and a reference prime longitude.

There are some variations. We can replace the latitude, for example, with the colatitude. That’s the angle between our point and the north pole. Or we might replace the latitude with the cosine of the colatitude. That has some nice analytic properties that you have to be well into grad school to care about. It doesn’t matter. The details may vary but it’s all the same. We put in a number for the east-west distance and another for the north-south distance.
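The variations really are interchangeable bookkeeping, as a small sketch shows. Angles here are in degrees, as normal people would have them; the function names are mine:

```python
import math

# Latitude, colatitude, and the cosine of the colatitude all carry the
# same north-south information, just packaged differently.

def colatitude(latitude):
    """Angle measured from the north pole instead of from the equator."""
    return 90.0 - latitude

def cos_colatitude(latitude):
    """The analytically convenient version: cosine of the colatitude."""
    return math.cos(math.radians(colatitude(latitude)))

print(colatitude(90.0))   # the north pole: colatitude 0
print(colatitude(0.0))    # the equator: colatitude 90
print(cos_colatitude(90.0))  # 1.0 at the pole
```

Going from any one of these to another is a reversible calculation, which is why the choice is a matter of convenience rather than substance.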

It may seem pompous to use the same system to say where a point is on the surface of a beach ball. But can you think of a better one? Pointing to the ball and saying “there”, I suppose. But that requires we go around with the beach ball pointing out spots. Giving two numbers saves us having to go around pointing.

(Some weenie may wish to point out that if we were clever we could describe a point exactly using only a single number. This is true. Nobody does that unless they’re weenies trying to make a point. This essay is long enough without describing what mathematicians really mean by “dimension”. “How many numbers normal people use to identify a point in it” is good enough.)

S^{2} is a common domain. If we talk about something that varies with your position on the surface of the earth, we’re probably using S^{2} as the domain. If we talk about the temperature as it varies with position, or the height above sea level, or the population density, we have functions with a domain of S^{2} and a range in R. If we talk about the wind speed and direction we have a function with domain of S^{2} and a range in R^{3}, because the wind might be moving in any direction.

Of course, I wrote down S^{n} rather than just S^{2}. As with R^{n} and with R^{m x n}, there is really a family of similar domains. They are common enough to share a basic symbol, and the superscript is enough to differentiate them.

What we mean by S^{n} is “the collection of points in R^{n+1} that are all the same distance from the origin”. Let me unpack that a little. The “origin” is some point in space that we pick to measure stuff from. On the number line we just call that “zero”. On your normal two-dimensional plot that’s where the x- and y-axes intersect. On your normal three-dimensional plot that’s where the x- and y- and z-axes intersect.

And by “the same distance” we mean some set, fixed distance. Usually we call that the radius. If we don’t specify some distance then we mean “1”. In fact, this is so regularly the radius I’m not sure how we would specify a different one. Maybe we would write S^{n}_{r} for a radius of “r”. Anyway, S^{n}, the surface of the sphere with radius 1, is commonly called the “unit sphere”. “Unit” gets used a fair bit for shapes. You’ll see references to a “unit cube” or “unit disc” or so on. A unit cube has sides length 1. A unit disc has radius 1. If you see “unit” in a mathematical setting it usually means “this thing measures out at 1”. (The other thing it may mean is “a unit of measure, but we’re not saying which one”. For example, “a unit of distance” doesn’t commit us to saying whether the distance is one inch, one meter, one million light-years, or one angstrom. We use that when we don’t care how big the unit is, and only wonder how many of them we have.)
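The definition “all the points the same distance from the origin” translates directly into a membership test. A sketch; the default radius of 1 matches the unit-sphere convention, and passing another value stands in for the S^{n}_{r} idea:

```python
import math

# S^n as the points in R^{n+1} at a fixed distance from the origin.
# Floating-point distances rarely land exactly on the radius, so the
# test allows a small tolerance.

def on_sphere(point, radius=1.0, tolerance=1e-9):
    """True when the point lies on the sphere of the given radius."""
    return abs(math.dist(point, [0.0] * len(point)) - radius) < tolerance

print(on_sphere((0.6, 0.8)))        # True: a point of S^1, the circle
print(on_sphere((1.0, 1.0, 1.0)))   # False: distance sqrt(3), not 1
```

Note that the tuple has n+1 entries for a point of S^{n}; the circle S^{1} lives in two-dimensional space, just as the text says.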

S^{1} is an exotic name for a familiar thing. It’s all the points in two-dimensional space that are a distance 1 from the origin. Real people call this a “circle”. So do mathematicians unless they’re comparing it to other spheres or hyperspheres.

This is a one-dimensional figure. We can identify a single point on it easily with just one number, the angle made with respect to some reference direction. The reference direction is almost always that of the positive x-axis. That’s the line that starts at the center of the circle and points off to the right.

S^{3} is the first hypersphere we encounter. It’s a surface that’s three-dimensional, and it takes a four-dimensional space to see it. You might be able to picture this in your head. When I try I imagine something that looks like the regular old surface of the sphere, only it has fancier shading and maybe some extra lines to suggest depth. That’s all right. We can describe the thing even if we can’t imagine it perfectly. S^{4}, well, that’s something taking five dimensions of space to fit in. I don’t blame you if you don’t bother trying to imagine what that looks like exactly.

The need for S^{4} itself tends to be rare. If we want to prove something about a function on a hypersphere we usually make do with S^{n}. This doesn’t tell us how many dimensions we’re working with. But we can imagine that as a regular old sphere only with a most fancy job of drawing lines on it.

If we want to talk about S^{n} aloud, or if we just want some variation in our prose, we might call it an n-sphere instead. So the 2-sphere is the surface of the regular old sphere that’s good enough for everybody but mathematicians. The 1-sphere is the circle. The 3-sphere and so on are harder to imagine. Wikipedia asserts that 3-spheres and higher-dimension hyperspheres are sometimes called “glomes”. I have not heard this word before, and I would expect it to start a fight if I tried to play it in Scrabble. However, I do not do mathematics that often requires discussion of hyperspheres. I leave this space open to people who do and who can say whether “glome” is a thing.

Something that all these S^{n} sets have in common are that they are the surfaces of spheres. They are just the boundary, and omit the interior. If we want a function that’s defined on the interior of the Earth we need to find a different domain.

Brian Fies’s Mom’s Cancer is a heartbreaking story. It’s compelling reading, but people who are emotionally raw from lost loved ones, or who know they’re particularly sensitive to such stories, should consider before reading that the comic is about exactly what the title says.

But it belongs here because the October 29th and the November 2nd installments are about a curiosity of area, and volume, and hypervolume, and more. That is that our perception of how big a thing is tends to be governed by one dimension, the length or the diameter of the thing. But its area is the square of that, its volume the cube of that, its hypervolume some higher power yet of that. So very slight changes in the diameter produce great changes in the volume. Conversely, though, great changes in volume will look like only slight changes. This can hurt.

Tom Toles’s Randolph Itch, 2 am from the 29th of October is a Roman numerals joke. I include it as comic relief. The clock face in the strip does depict 4 as IV. That’s eccentric but not unknown for clock faces; IIII seems to be more common. There’s not a clear reason why this should be. The explanation I find most nearly convincing is an aesthetic one. Roman numerals are flexible things, and can be arranged for artistic virtue in ways that Arabic numerals make impossible.

The aesthetic argument is that the four-character symbol IIII takes up nearly as much horizontal space as the VIII opposite it. The two-character IV would look distractingly skinny. Now, none of the symbols takes up exactly the same space as their counterpart. X is shorter than II, VII longer than V. But IV-versus-VIII does seem like the biggest discrepancy. Still, Toles’s art shows it wouldn’t look all that weird. And he had to conserve line strokes, so that the clock would read cleanly in newsprint. I imagine he also wanted to avoid using different representations of “4” so close together.

Jon Rosenberg’s Scenes From A Multiverse for the 29th of October is a riff on both quantum mechanics — Schrödinger’s Cat in a box — and the uncertainty principle. The uncertainty principle can be expressed as a fascinating mathematical construct. It starts with Ψ, a probability function that has spacetime as its domain, and the complex-valued numbers as its range. By applying a function to this function we can derive yet another function. This function-of-a-function we call an operator, because we’re saying “function” so much it’s starting to sound funny. But this new function, the one we get by applying an operator to Ψ, tells us the probability that the thing described is in this place versus that place. Or that it has this speed rather than that speed. Or this angular momentum — the tendency to keep spinning — versus that angular momentum. And so on.

If we apply an operator — let me call it A — to the function Ψ, we get a new function. What happens if we apply another operator — let me call it B — to this new function? Well, we get a second new function. It’s much the way if we take a number, and multiply it by another number, and then multiply it again by yet another number. Of course we get a new number out of it. What would you expect? This operators-on-functions things looks and acts in many ways like multiplication. We even use symbols that look like multiplication: AΨ is operator A applied to function Ψ, and BAΨ is operator B applied to the function AΨ.

Now here is the thing we don’t expect. What if we applied operator B to Ψ first, and then operator A to the product? That is, what if we worked out ABΨ? If this was ordinary multiplication, then, nothing all that interesting. Changing the order of the real numbers we multiply together doesn’t change what the product is.

Operators are stranger creatures than real numbers are. It can be that BAΨ is not the same function as ABΨ. We say this means the operators A and B do not commute. But it can be that BAΨ is exactly the same function as ABΨ. When this happens we say that A and B do commute.
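You can see non-commuting operators without any quantum mechanics, since matrices are operators on small vector spaces. A sketch; these particular matrices are my choice for illustration, not the actual position and momentum operators:

```python
# Two 2-by-2 matrices that do not commute. Multiplying matrices is
# composing the operators, so AB and BA being different matrices means
# the order of application matters.

def matmul(A, B):
    """Multiply two 2-by-2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0, 1],
     [0, 0]]
B = [[0, 0],
     [1, 0]]

print(matmul(A, B))  # [[1, 0], [0, 0]]
print(matmul(B, A))  # [[0, 0], [0, 1]] -- a different operator entirely
```

Apply these to any function-stand-in (here, a two-component vector) and the two orders generally give different results, which is all “not commuting” means.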

Whether they do or they don’t commute depends on the operators. When we know what the operators are we can say whether they commute. We don’t have to try them out on some functions and see what happens, although that sometimes is the easiest way to double-check your work. And here is where we get the uncertainty principle from.

The operator that lets us learn the probability of particles’ positions does not commute with the operator that lets us learn the probability of particles’ momentums. We get different answers if we measure a particle’s position and then its velocity than we do if we measure its velocity and then its position. (Velocity is not the same thing as momentum. But they are related. There’s nothing you can say about momentum in this context that you can’t say about velocity.)

The uncertainty principle is a great source for humor, and for science fiction. It seems to allow for all kinds of magic. Its reality is no less amazing, though. For example, it implies that it is impossible for an electron to spiral down into the nucleus of an atom, collapsing atoms the way satellites eventually fall to Earth. Matter can exist, in ways that let us have solid objects and chemistry and biology. This is at least as good as a cat being perhaps boxed.

Jan Eliot’s Stone Soup Classics for the 29th of October is a rerun from 1995. (The strip itself has gone to Sunday-only publication.) It’s a joke about how arithmetic is easy when you have the proper motivation. In 1995 that would include catching TV shows at a particular time. You see, in 1995 it was possible to record and watch TV shows when you wanted, but it required coordinating multiple pieces of electronics. It would often be easier to just watch when the show actually aired. Today we have it much better. You can watch anything you want anytime you want, using any piece of consumer electronics you have within reach, including several current models of microwave ovens and programmable thermostats. This does, sadly, remove one motivation for doing arithmetic. Also, I’m not certain the kids’ TV schedule is actually consistent with what was on TV in 1995.

Oh, heck, why not. Obviously we’re 14 minutes before the hour. Let me move onto the hour for convenience. It’s 744 minutes to the morning cartoons; that’s 12.4 hours. Taking the morning cartoons to start at 8 am, that means it’s currently 14 minutes before 24 minutes before 8 pm. I suspect a rounding error. Let me say they’re coming up on 8 pm. 194 minutes to Jeopardy implies the game show is on at 11 pm. 254 minutes to The Simpsons puts that on at midnight, which is probably true today, though I don’t think it was so in 1995 just yet. 284 minutes to Grace puts that on at 12:30 am.

I suspect that Eliot wanted it to be 978 minutes to the morning cartoons, which would bump Oprah to 4:00, Jeopardy to 7:00, Simpsons and Grace to 8:00 and 8:30, and still let the cartoons begin at 8 am. Or perhaps the kids aren’t that great at arithmetic yet.
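The strip’s arithmetic is easy to check by machine. A sketch, taking the current time to be 7:36 pm, which is what 744 minutes before 8 am works out to; the show times are the ones discussed above:

```python
# Count forward some number of minutes from a starting clock time and
# report the resulting time on a 24-hour clock.

def minutes_later(start_hour, start_minute, minutes):
    """Clock time (hour, minute) after adding the given minutes."""
    total = (start_hour * 60 + start_minute + minutes) % (24 * 60)
    return total // 60, total % 60

# From 7:36 pm, i.e. 744 minutes short of 8 am:
print(minutes_later(19, 36, 744))  # (8, 0): the morning cartoons
print(minutes_later(19, 36, 194))  # (22, 50): Jeopardy, close to 11 pm
print(minutes_later(19, 36, 254))  # (23, 50): The Simpsons, near midnight
```

The `% (24 * 60)` handles counts that roll past midnight, the same wrap-around trick as a clock face.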

Stephen Beals’s Adult Children for the 30th of October tries to build a “math error” out of repeated use of the phrase “I couldn’t care less”. The argument is that the thing one cares least about is unique. But why can’t there be two equally least-cared-about things?

We can consider caring about things as an optimization problem. Optimization problems are about finding the most of something given some constraints. If you want the least of something, multiply the thing you have by minus one and look for the most of that. You may giggle at this. But it’s the sensible thing to do. And many things can be equally high, or low. Take a bundt cake pan, and drizzle a little water in it. The water separates into many small, elliptic puddles. If the cake pan were perfectly formed, and set on a perfectly level counter, then the bottom of each puddle would be at the same minimum height. I grant a real cake pan is not perfect; neither is any counter. But you can imagine such.
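The negate-and-maximize trick, and the possibility of ties, both fit in a few lines. A sketch with made-up numbers; the names are mine:

```python
# To find the least of something, multiply by minus one and find the most.

def argmax(values):
    """Index of the largest value (the first one, if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])

care = [5.0, 1.0, 3.0, 1.0]   # hypothetical "how much I care" scores

least_cared = argmax([-c for c in care])
print(least_cared)   # 1: a least-cared-about thing

# But note the tie: care[1] == care[3]. The minimum *value* is unique,
# yet two different things achieve it -- the strip's "math error".
```

This is also why optimization software tends to only offer a maximizer or only a minimizer: the other comes free.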

Just because you can imagine it, though, must it exist? Think of the “smallest positive number”. The idea is simple. Positive numbers are a set of numbers. Surely there’s some smallest number. Yet there isn’t; name any positive number and we can name a smaller number. Divide it by two, for example. Zero is smaller than any positive number, but it’s not itself a positive number. A minimum might not exist, at least not within the confines where we are to look. It could be that there is nothing one could care least about.

So a minimum might or might not exist, and it might or might not be unique. This is why optimization problems are exciting, challenging things.

Niklas Eriksson’s Carpe Diem for the 1st of November is about understanding the universe by way of observation and calculation. We do rely on mathematics to tell us things about the universe. Immanuel Kant has a bit of reputation in mathematical physics circles for this observation. (I admit I’ve never seen the original text where Kant observed this, so I may be passing on an urban legend. My love has several thousands of pages of Kant’s writing, but I do not know if any of them touch on natural philosophy.) If all we knew about space was that gravitation falls off as the square of the distance between two things, though, we could infer that space must have three dimensions. Otherwise that relationship would not make geometric sense.

Jeff Harris’s kids-information feature Shortcuts for the 1st of November was about the Harvard Computers. By this we mean the people who did the hard work of numerical computation, back in the days before this could be done by electrical and then electronic computer. Mathematicians relied on people who could do arithmetic in those days. There is the folkloric belief that mathematicians are inherently terrible at arithmetic. (I suspect the truth is people assume mathematicians must be better at arithmetic than they really are.) But here, there’s the mathematics of thinking what needs to be calculated, and there’s the mathematics of doing the calculations.

Their existence tends to be mentioned as a rare bit of human interest in numerical mathematics books, usually in the preface in which the author speaks with amazement of how people who did computing were once called computers. I wonder if books about font and graphic design mention how people who typed used to be called typewriters in their prefaces.

After talking about the real numbers last time, I had two obvious sets to use as follow up. Of course I’d overthink the choice of which to make my next common domain-and-range set.

R^{n}

R^{n} is pronounced “are enn”, just as you might do if you didn’t know enough mathematics to think the superscript meant something important. It does mean something important; it’s just that there’s not a graceful way to say what offhand. This is the set of n-tuples of real numbers. That is, anything you pick out of R^{n} is an ordered set of things all of which are themselves real numbers. The “n” here is the name for some whole number whose value isn’t going to change during the length of this problem.

So when we speak of R^{n} we are really speaking of a family of sets, all of them similar in some important ways. The things in R^{2} look like pairs of real numbers: (3, 4), or (4π, -2e), or (2038, 0.010010001), pairs like that. The things in R^{3} are triplets of real numbers: (3, 4, 5), or (4π, -2e, 1 + 1/π). The things in R^{4} are quartets of real numbers: (3, 4, 5, 12) or (4π, -2e, 1 + 1/π, -6) or so. The things in R^{10} are probably clear enough to not need listing.

It’s possible to add together two things in R^{n}. At least if they come from the same R^{n}; you can’t add a pair of numbers to a quartet of numbers, not if you’re being honest. The addition rule is just what you’d come up with if you didn’t know enough mathematics to be devious, though: add the first number of the first thing to the first number of the second thing, and that’s the first number of the sum. Add the second number of the first thing to the second number of the second thing, and that’s the second number of the sum. Add the third number of the first thing to the third number of the second thing, and that’s the third number of the sum. Keep on like this until you run out of numbers in each thing. It’s possible you already have.

You can’t multiply together two things in R^{n}, though, unless your n is 1. (There may be some conceptual difference between R^{1} and plain old R. But I don’t recall seeing a mathematician being interested in the difference except when she’s studying the philosophy of mathematics.) The obvious multiplication scheme — multiply matching numbers, like you do with addition — produces something that doesn’t work enough like multiplication to be interesting. It’s possible for some n’s to work out schemes that act like multiplication enough to be interesting, but for the most part we don’t need them.

What we will do, though, is multiply something in R^{n} by a single real number. That real number is called a “scalar”. You do the multiplication, again, like you’d do if you were too new to mathematics to be clever. Multiply the first number in your thing by the scalar, and that’s the first number in your product. Multiply the second number in your thing by the scalar, and that’s the second number in your product. Multiply the third number in your thing by the scalar, and that’s the third number in your product. Carry on like this until you run out of numbers, and then stop. Usually good advice.
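Both R^{n} operations are short enough to write out whole. A sketch; the names `vec_add` and `vec_scale` are my own:

```python
# Componentwise addition and scalar multiplication for n-tuples of reals.

def vec_add(u, v):
    """Add matching components; u and v must come from the same R^n."""
    if len(u) != len(v):
        raise ValueError("can't add a pair to a quartet, not honestly")
    return tuple(a + b for a, b in zip(u, v))

def vec_scale(scalar, u):
    """Multiply each component by the scalar."""
    return tuple(scalar * a for a in u)

print(vec_add((3, 4), (4, -2)))   # (7, 2)
print(vec_scale(2, (3, 4, 5)))    # (6, 8, 10)
```

The length check is the code version of the honesty clause above: a thing from R^{2} and a thing from R^{3} just don’t add.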

That you can add together two things from R^{n}, and you can multiply anything in R^{n} by a scalar, makes this a “vector space”. (There are some more requirements, but they amount to addition and multiplication working like you’d expect.) The term means about what you think; a “space” is a … well … something that acts mathematically like ordinary everyday space does. A “vector space” is a space where the things inside it are vectors. Vectors are a combination of a direction and a distance in that direction. They’re very well-represented as n-tuples. They get represented as n-tuples so often it’s easy to forget that’s just a convenient way to write them down.

This vector space property of R^{n} makes it a really useful set. R^{2} corresponds naturally to “the points on a flat surface”. R^{3} corresponds naturally to an idea of “all the points in normal everyday space where something could be”. Or, if you like, it can represent “the speed and direction something is travelling in”. Or the direction and amount of its acceleration, for that matter.

Because of these properties mathematicians will often call R^{n} the “n-dimensional Euclidean space”. The n is about how many components there are in an element of the set. The “space” tells us it’s a space. “Euclidean” tells us that it looks and works like, well, Euclidean geometry. We can talk about the distance between points and use the ideas we had from plane or solid geometry. We can talk about angles and areas and volumes similarly. We can do this so much we might say “n-dimensional space” as if there weren’t anything but Euclidean spaces out there.

And this is useful for more than describing where something happens to be. A great number of physics problems find it convenient to study the position and the velocity of a number of particles which interact. If we have N particles, then, and we’re in a three-dimensional space, and we’re keeping track of positions and velocities for each of them, then we can describe where everything is and how everything is moving as one element in the space R^{6N}. We can describe movement in time as a function that has a domain of R^{6N} and a range of R^{6N}, and see the progression of time as tracing out a path in that space.
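To make the bookkeeping concrete, here's one way you might pack the positions and velocities of N particles into a single element of R^{6N}. The packing order and the function name `pack_state` are choices I'm making for illustration; nothing standard about them.

```python
def pack_state(positions, velocities):
    """Pack N particles' 3D positions and velocities into one element of R^{6N}."""
    state = []
    for p, v in zip(positions, velocities):
        state.extend(p)  # three position components
        state.extend(v)  # three velocity components
    return tuple(state)

positions = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]    # N = 2 particles
velocities = [(0.1, 0.0, 0.0), (0.0, -0.2, 0.0)]
state = pack_state(positions, velocities)
print(len(state))  # 6N = 12
```

Movement in time then becomes a rule taking one such 12-component state to the next.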

We can’t draw that, obviously, and I’d look skeptically at people who say they can visualize it. What we usually draw is a little enclosed space that’s either a rectangle or a blob, and draw out lines — “trajectories” — inside that. The different spots along the trajectory correspond to all the positions and velocities of all the particles in the system at different times.

Though that’s a fantastic use, it’s not the only one. It’s not required, for example, that a function have the same R^{n} as both domain and range. It can have different sets. If we want to be clear that the domain and range can be of different sizes, it’s common to call one R^{n} and the other R^{m} if we aren’t interested in pinning down just which spaces they are.

But, for example, a perfectly legitimate function would have a domain of R^{3} and a range of R^{1}, the reals. There’s even an obvious, common one: return the size, the magnitude, of whatever the vector in the domain is. Or we might take as domain R^{4}, and the range R^{2}, following the rule “match an element in the domain to an element in the range that has the same first and third components”. That kind of function is called a “projection”, as it gives what might look like the shadow of the original thing in a smaller space.
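That projection rule is simple enough to write out in a couple of lines of Python. The function name is mine; tuples again stand in for the elements of R^{4} and R^{2}:

```python
def project_first_and_third(u):
    """Match an element of R^4 to the element of R^2 that has the
    same first and third components."""
    return (u[0], u[2])

print(project_first_and_third((3, 4, 5, 12)))  # (3, 5)
```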

If we wanted to go the other way, from R^{2} to R^{4} as an example, we could. Here set the rule “match an element in the domain to an element in the range which has the same first and second components, and has ‘3’ and ‘4’ as the third and fourth components”. That’s an “embedding”, giving us the idea that we can put a Euclidean space with fewer dimensions into a space with more. The idea comes naturally to anyone who’s seen a cartoon where a character leaps off the screen and interacts with the real world.
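And the matching sketch for that embedding, with a made-up function name:

```python
def embed(u):
    """Match an element of R^2 to the element of R^4 with the same first and
    second components, and with 3 and 4 as the third and fourth components."""
    return (u[0], u[1], 3, 4)

print(embed((7, 8)))  # (7, 8, 3, 4)
```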

This is the real numbers. In text that’s written with a bold R. Written by hand, and often in text, that’s written with a capital R that has a double stroke for the main vertical line. That’s an easy-to-write way to distinguish it from a plain old civilian R. The double-vertical-stroke convention is used for many of the most common sets of numbers. It will get used for letters like I and J (the integers), or N (the counting numbers). A vertical stroke will even get added to symbols that technically don’t have any vertical strokes, like Q (the rational numbers). There it’s just put inside the loop, on the left side, far enough from the edge that the reader can notice the vertical stroke is there.

R is a big one. It’s not just a big set. It’s also a popular one. It may as well be the default domain and range. If someone fails to tell you what either set is, you can suppose she meant R and be only rarely wrong. The real numbers are familiar and popular and it feels like we know what they are. It’s a bit tricky to define them exactly, though, and you’ll notice that I’m not doing that. You know what I mean, though. It’s whole numbers, and rational numbers, and irrational numbers like the square root of pi, and for that matter pi, and a whole bunch of other boring numbers nobody looks at. Let’s leave it at that.

All the intervals I talked about last time are subsets of R. If we really wanted to, we could turn a function with domain an interval like [0, 1] into a function with a domain of R. That’s a kind of “embedding”. Let me call the function with domain [0, 1] by the name “f”. I’ll then define g, on the domain R, by the rule “whatever f(x) is, if x is from 0 to 1; and some other, harmless value, if x isn’t”. Probably the harmless value is zero. Sometimes we need to change the domain a function’s defined on, and this is a way to do it.
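As a sketch of that extension trick: here f is some function defined on [0, 1] (the particular rule below is just a stand-in I picked so the code runs), and g extends it to all of R with zero as the harmless value.

```python
import math

def f(x):
    """Some function with domain [0, 1]; this particular rule is only an example."""
    return math.sqrt(x * (1 - x))

def g(x):
    """Extend f to the domain R: whatever f(x) is, if x is from 0 to 1,
    and the harmless value zero if x isn't."""
    return f(x) if 0 <= x <= 1 else 0.0

print(g(0.5))  # 0.5
print(g(3.7))  # 0.0
```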

If we only want to talk about the positive real numbers we can denote that by putting a plus sign in superscript: R^{+}. If we only want the negative numbers we put in a minus sign: R^{–}. Does either of these include zero? My heart tells me neither should, but I wouldn’t be surprised if in practice either did, because zero is often useful to have around. To be careful we might explicitly include zero, using the notations of set theory. Then we might write R^{+} ∪ {0}.

Sometimes the rule for a function doesn’t make sense for some values. For example, if a function has the rule f(x) = 1/√(1 – x) then you can’t work out a value for f(1). That would require dividing by zero and we dare not do that. A careful mathematician would say the domain of that function f is all the real numbers R except for the number 1. This exclusion gets written as “R \ {1}”. The backslash means “except the numbers in the following set”. It might be a single number, such as in this example. It might be a lot of numbers. In fact the function is meaningless for any x that’s equal to or greater than 1, since past 1 we’d be taking the square root of a negative number. We could write its domain then as “R \ { x: x ≥ 1 }”.
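If you like seeing the domain restriction enforced explicitly, here's a sketch using a rule like 1/√(1 – x), which fails at x = 1 (division by zero) and for any larger x (square root of a negative number):

```python
import math

def f(x):
    """A rule that only makes sense on the domain R \\ {x: x >= 1}."""
    if x >= 1:
        raise ValueError("x is outside the domain R \\ {x: x >= 1}")
    return 1 / math.sqrt(1 - x)

print(f(0))     # 1.0
print(f(0.75))  # 2.0
```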

That’s if we’re being careful. If we get a little careless, or if we’re writing casually, or if the set of non-permitted points is complicated, we might omit that. Mathematical writing includes an assumption of good faith. The author is supposed to be trying to say something interesting and true. The reader is expected to be skeptical but not quarrelsome. Spotting a flaw in the argument because the domain doesn’t explicitly rule out some points it should have is tedious. Finding that the interesting thing only holds true for values that are implicitly outside the domain is serious.

The set of real numbers is a group; it has an operation that works like addition. We call it addition. For that matter, it’s a ring. It has an operation that works like multiplication. We call it multiplication. And it’s even more than a ring. Everything in R except for the additive identity — 0, the number you can add to anything without changing what the thing is — has a multiplicative inverse. That is, any number except zero has some number you can multiply it by to get 1. This property makes it a “field”, to people who study (abstract) algebra. This “field” hasn’t got anything to do with gravitational or electrical or baseball or magnetic fields. But the overlap in names does sometimes confuse people.

But having this multiplicative inverse means that we can do something that operates like division. Divide one thing by a second by taking the first thing and multiplying it by the second thing’s multiplicative inverse. We call this division-like operation “division”.
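As a small sketch of that division-like operation (the function name `divide` is mine, and of course Python already has a `/` that does this for you):

```python
def divide(a, b):
    """Divide a by b the field-theoretic way: multiply a by b's multiplicative inverse.
    Zero, the additive identity, has no inverse, so it's excluded."""
    if b == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    inverse = 1 / b       # the number you multiply b by to get 1
    return a * inverse

print(divide(3, 4))  # 0.75
```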

It’s not coincidence that the algebraic “addition” and “multiplication” and “division” operations are the ones we call addition and multiplication and division. What makes abstract algebra abstract is that it’s the study of things that work kind of like the real numbers do. The operations we can do on the real numbers inspire us to look for other sets that can let us do similar things.

We can describe each of these sets in words, and often will when speaking or when describing a line of argument. But when we want to work, we start using shorthand names, often single letters. For the sets of the domain and range these are usually capital letters. I haven’t noticed much of a preference for which letters to use. D for domain and R for range have a hard-to-resist logic if we don’t really care what the sets are.

There are some sets that are used as domains or ranges a lot, and those have common shorthands. The set of real numbers is often written as R — bold, in print, or written with a double vertical stroke on the R if you’re doing this by hand. The set of whole numbers, integers, gets written as I (for integer) or J (again for integer; the letters I and J weren’t perceived as truly different things until recently) or Z (for Zahlen, German for “numbers”). There are a lot of others and don’t worry about them.

The rules for a function are generally described by a lowercase letter. It’s most commonly f, with g and h pressed into service if f won’t do. Subscripts are common also: f_{1}, f_{2}, f_{j}, f_{n}, and so on. Again, any name is allowed, as long as you’re consistent about it. But f and g and h are used as “names of functions” so often that it’s what the reader will expect they mean even without being told.

One common shorthand for saying that a function named “f” has the domain “D” and the range “R” is to use an arrow. Write out “f: D → R”. The function name comes first, before the colon; then the domain, and an arrow, and the range. There are other notations but this is the one I see most often. This is often read aloud as “f maps D into R”. The activity of the verb “map” — well, it’s kind of action-y — suggests motion to my mind, and the arrow suggests flow. This seems mnemonic to me, since functions are commonly used to describe how a system changes over time. We often use the language of flowing things even for problems that don’t have anything to do with moving objects or any sense of time.

There’s another part of function-defining that has to be done, though. Most often we’re interested in domains and ranges that are both numbers, or at least collections of numbers. And we want to describe matching something in the domain with something in the range based on a formula. If “x” is a number in the domain then, say, “x^{2} – 4x + 4” is the corresponding number in the range.

One way to write down this rule is the way we get in introductory algebra class, and to write something like “f(x) = x^{2} – 4x + 4”. The “x” is, here, a dummy variable. We will never care about pinning it down to any particular number. If we write “f(3)” we mean to evaluate whatever’s on the right hand of the equals sign, using 3, the thing in parentheses, wherever “x” appears in the rule definition. In this case that would be the number 3^{2} – 4*3 + 4, which happens to be 1. If we write “f(1 – t)” we would evaluate “(1 – t)^{2} – 4(1 – t) + 4”, which we might want to leave as is, or might want to simplify somehow. It depends what we’re up to.
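The dummy-variable idea maps neatly onto a function definition in code: the parameter x gets replaced by whatever you call the function with, throughout the rule.

```python
def f(x):
    """The rule f(x) = x^2 - 4x + 4; x is a dummy variable, replaced
    by whatever gets passed in."""
    return x**2 - 4*x + 4

print(f(3))  # 3^2 - 4*3 + 4, which happens to be 1
print(f(2))  # (2 - 2)^2, so 0
```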

But we can also use an arrow notation, and write the same rule as “f: x → x^{2} – 4x + 4”. My feeling is this notation makes it clearer that the definition isn’t itself something to solve, and that the definition doesn’t care what value x is. It should suggest how we can substitute anything for x and should do so throughout the expression to the right of the arrow.

Wikipedia asserts that when writing the rule this way there should be a vertical stroke on the left side of the arrow (that is, “f: x ↦ x^{2} – 4x + 4”). This is probably a good rule, since “f: D → R” and “f: x ↦ x^{2} – 4x + 4” are talking about different things. I’m not sure the rule is consistently followed, though. I suspect that in most contexts it’s clear what is meant.