## My All 2020 Mathematics A to Z: Jacobi Polynomials

Mr Wu, author of the Singapore Maths Tuition blog, gave me a good nomination for this week’s topic: the j-function of number theory. Unfortunately I concluded I didn’t understand the function well enough to write about it. So I went to a topic of my own choosing instead.

The Jacobi Polynomials discussed here are named for Carl Gustav Jacob Jacobi. Jacobi lived in Prussia in the first half of the 19th century. Though his career was short, it was influential. I’ve already discussed the Jacobian, which describes how changes of variables change volume. He has a host of other things named for him, most of them in matrices or mathematical physics. He was also a pioneer in those elliptic curves you hear so much about these days.

# Jacobi Polynomials.

Jacobi Polynomials are a family of functions. Polynomials, it happens; this is a happy case where the name makes sense. “Family” is the name mathematicians give to a bunch of functions that have some similarity. This often means there’s a parameter, and each possible value of the parameter describes a different function in the family. For example, we talk about the family of sine functions, $S_n(z)$. For every integer n we have the function $S_n(z) = \sin(n z)$ where z is a real number between -π and π.

We like a family because every function in it gives us some nice property. Often, the functions play nice together, too. This is often something like mutual orthogonality. This means two different representatives of the family are orthogonal to one another. “Orthogonal” means “perpendicular”. We can talk about functions being perpendicular to one another through a neat mechanism. It comes from vectors. It’s easy to use vectors to represent how to get from one point in space to another. From vectors we define a dot product, a way of multiplying them together. A dot product has to meet a couple rules that are pretty easy to do. And if you don’t do anything weird? Then the dot product between two vectors is the cosine of the angle made by the end of the first vector, the origin, and the end of the second vector.

Functions, it turns out, meet all the rules for a vector space. (There are not many rules to make a vector space.) And we can define something that works like a dot product for two functions. Take the integral, over the whole domain, of the first function times the second. This meets all the rules for a dot product. (There are not many rules to make a dot product.) Did you notice me palm that card? When I did not say “the dot product is take the integral …”? That card will come back. That’s for later. For now: we have a vector space, we have a dot product, we can take arc-cosines, so why not define the angle between functions?

Mostly we don’t because we don’t care. Where we do care? We do like functions that are at right angles to one another. As with most things mathematicians do, it’s because it makes life easier. We’ll often want to describe properties of a function we don’t yet know. We can describe the function we don’t yet know as the sum of coefficients — some fixed real number — times basis functions that we do know. And then our problem of finding the function changes to one of finding the coefficients. If we picked a set of basis functions that are all orthogonal to one another, the finding of these coefficients gets easier. Analytically and numerically: we can often turn each coefficient into its own separate problem. Let a different computer, or at least computer process, work on each coefficient and get the full answer much faster.

The Jacobi Polynomials have three coefficients. I see them most often labelled α, β, and n. Likely you imagine this means it’s a huge family. It is huger than that. A zoologist would call this a superfamily, at least. Probably an order, possibly a class.

It turns out different relationships of these coefficients give you families of functions. Many of these families are noteworthy enough to have their own names. For example, if α and β are both zero, then the Jacobi functions are a family also known as the Legendre Polynomials. This is a great set of orthogonal polynomials. And the roots of the Legendre Polynomials give you information needed for Gaussian quadrature. Gaussian quadrature is a neat trick for numerically integrating a function. Take a weighted sum of the function you’re integrating evaluated at a set of points. This can get a very good — maybe even perfect — numerical estimate of the integral. The points to use, and the weights to use, come from a Legendre polynomial.

If α and β are both $-\frac{1}{2}$ then the Jacobi Polynomials are the Chebyshev Polynomials of the first kind. (There’s also a second kind.) These are handy in approximation theory, describing ways to better interpolate a polynomial from a set of data. They also have a neat, peculiar relationship to the multiple-cosine formulas. Like, $\cos(2\theta) = 2\cos^2(\theta) - 1$. And the second Chebyshev polynomial is $T_2(x) = 2x^2 - 1$. Imagine sliding between x and $cos(\theta)$ and you see the relationship. $cos(3\theta) = 4 \cos^3(\theta) - 3\cos(\theta)$ and $T_3(x) = 4x^3 - 3x$. And so on.

Chebyshev Polynomials have some superpowers. One that’s most amazing is accelerating convergence. Often a numerical process, such as finding the solution of an equation, is an iterative process. You can’t find the answer all at once. You instead find an approximation and do something that improves it. Each time you do the process, you get a little closer to the true answer. This can be fine. But, if the problem you’re working on allows it, you can use the first couple iterations of the solution to figure out where this is going. The result is that you can get very good answers using the same amount of computer time you needed to just get decent answers. The trade, of course, is that you need to understand Chebyshev Polynomials and accelerated convergence. We always have to make trades like that.

Back to the Jacobi Polynomials family. If α and β are the same number, then the Jacobi functions are a family called the Gegenbauer Polynomials. These are great in mathematical physics, in potential theory. You can turn the gravitational or electrical potential function — that one-over-the-distance-squared force — into a sum of better-behaved functions. And they also describe zonal spherical harmonics. These let you represent functions on the surface of a sphere as the sum of coefficients times basis functions. They work in much the way the terms of a Fourier series do.

If β is zero and there’s a particular relationship between α and n that I don’t want to get into? The Jacobi Polynomials become the Zernike Polynomials, which I never heard of before this paragraph either. I read they are the tools you need to understand optics, and particularly how lenses will alter the light passing through.

Since the Jacobi Polynomials have a greater variety of form than even poison ivy has, you’ll forgive me not trying to list them. Or even listing a representative sample. You might also ask how they’re related at all.

Well, they all solve the same differential equation, for one. Not literally a single differential equation. A family of differential equations, where α and β and n turn up in the coefficients. The formula using these coefficients is the same in all these differential equations. That’s a good reason to see a relationship. Or we can write the Jacobi Polynomials as a series, a function made up of the sum of terms. The coefficients for each of the terms depends on α and β and n, always in the same way. I’ll give you that formula. You won’t like it and won’t ever use it. The Jacobi Polynomial for a particular α, β, and n is the polynomial

$P_n^{(\alpha, \beta)}(z) = (n+\alpha)!(n + \beta)!\sum_{s=0}^n \frac{1}{s!(n + \alpha - s)!(\beta + s)!(n - s)!}\left(\frac{z-1}{2}\right)^{n-s}\left(\frac{z + 1}{2}\right)^s$

Its domain, by the way, is the real numbers from -1 to 1. We need something for the domain. It turns out there’s nothing you can do on the real numbers that you can’t fit into the domain from -1 to 1 anyway. (If you have to do something on, say, the interval from 10 to 54? Do a change of variable, scaling things down and moving them, and use -1 to 1. Then undo that change when you’re done.) The range is the real numbers, as you’d expect.

(You maybe noticed I used ‘z’ for the independent variable there, rather than ‘x’. Usually using ‘z’ means we expect this to be a complex number. But ‘z’ here is definitely a real number. This is because we can also get to the Jacobi Polynomials through the hypergeometric series, a function I don’t want to get into. But for the hypergeometric series we are open to the variable being a complex number. So many references carry that ‘z’ back into Jacobi Polynomials.)

Another thing which links these many functions is recurrence. If you know the Jacobi Polynomial for one set of parameters — and you do; $P_0^{(\alpha, \beta)}(z) = 1$ — you can find others. You do this in a way rather like how you find new terms in the Fibonacci series by adding together terms you already know. These formulas can be long. Still, if you know $P_{n-1}^{(\alpha, \beta)}$ and $P_{n-2}^{(\alpha, \beta)}$ for the same α and β? Then you can calculate $P_n^{(\alpha, \beta)}$ with nothing more than pen, paper, and determination. If it helps,

$P_1^{(\alpha, \beta)}(z) = (\alpha + 1) + (\alpha + \beta + 2)\frac{z - 1}{2}$

and this is true for any α and β. You’ll never do anything with that. This is fine.

There is another way that all these many polynomials are related. It goes back to their being orthogonal. We measured orthogonality by a dot product. Back when I palmed that card I told you was the integral of the two functions multiplied together. This is indeed a dot product. We can define others. We make those others by taking a weighted integral of the product of these two functions. That is, integrate the two functions times a third, a weight function. Of course there’s reasons to do this; they amount to deciding that some parts of the domain are more important than others. The weight function can be anything that meets a few rules. If you want to get the Jacobi Polynomials out of them, you start with the function $P_0^{(\alpha, \beta)}(z) = 1$ and the weight function

$w_n(z) = (1 - z)^{\alpha} (1 + z)^{\beta}$

As I say, though, you’ll never use that. If you’re eager and ready to leap into this work you can use this to build a couple Legendre Polynomials. Or Chebyshev Polynomials. For the full Jacobi Polynomials, though? Use, like, the command JacobiP[n, a, b, z] in Mathematica, or jacobiP(n, a, b, z) in Matlab. Other people have programmed this for you. Enjoy their labor.

In my work I have not used the full set of Jacobi Polynomials much. There’s more of them than I need. I do rely on the Legendre Polynomials, and the Chebyshev Polynomials. Other mathematicians use other slices regularly. It is stunning to sometimes look and realize that these many functions, different as they look, are reflections of one another, though. Mathematicians like to generalize, and find one case that covers as many things as possible. It’s rare that we are this successful.

I thank you for reading this. All of this year’s A-to-Z essays should be available at this link. The essays from every A-to-Z sequence going back to 2015 should be at this link. And I’m already looking ahead to the M, N, and O essays that I’ll be writing the day before publication instead of the week before like I want! I appreciate any nominations you have, even ones I can’t cover fairly.

## My 2019 Mathematics A To Z: Operator

Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.

# Operator.

An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same sapce but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Everything in a particular function space is a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.

So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.

My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.

I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.

Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. But both the differential and the integral operators are linear, which lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function $af + bg$ is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of $af + bg$ is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.

I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.

One of the conceptual leaps in high school algebra is that we start analyzing the things we do with numbers. Like, we don’t just take the number three, square it, multiply that by two and add to that the number three times four and add to that the number 1. We think about what if we take any number, call it x, and think of $2x^2 + 4x + 1$. And what if we make equations based on doing this $2x^2 + 4x + 1$; what values of x make those equations true? Or tell us something interesting?

Operators represent a similar leap. We can think of functions as things we manipulate, and think of those manipulations as a particular thing to do. For example, let me come up with a differential expression. For some function u(x) work out the value of this:

$2\frac{d^2 u(x)}{dx^2} + 4 \frac{d u(x)}{dx} + u(x)$

Let me join in the convention of using ‘D’ for the differential operator. Then we can rewrite this expression like so:

$2D^2 u + 4D u + u$

Suddenly the differential equation looks a lot like a polynomial. Of course it does. Remember that everything in mathematics is polynomials. We get new tools to solve differential equations by rewriting them as operators. That’s nice. It also scratches that itch that I think everyone in Intro to Calculus gets, of wanting to somehow see $\frac{d^2}{dx^2}$ as if it were a square of $\frac{d}{dx}$. It’s not, and $D^2$ is not the square of $D$. It’s composing $D$ with itself. But it looks close enough to squaring to feel comfortable.

Nobody needs to do $2D^2 u + 4D u + u$ except to learn some stuff about operators. But you might imagine a world where we did this process all the time. If we did, then we’d develop shorthand for it. Maybe a new operator, call it T, and define it that $T = 2D^2 + 4D + 1$. You see the grammar of treating functions as if they were real numbers becoming familiar. You maybe even noticed the ‘1’ sitting there, serving as the “identity operator”. You know how you’d write out $Tv(x) = 3$ if you needed to write it in full.

But there are operators that we use all the time. These do get special names, and often shorthand. For example, there’s the gradient operator. This applies to any function with several independent variables. The gradient has a great physical interpretation if the variables represent coordinates of space. If they do, the gradient of a function at a point gives us a vector that describes the direction in which the function increases fastest. And the size of that gradient — a functional on this operator — describes how fast that increase is.

The gradient itself defines more operators. These have names you get very familiar with in Vector Calculus, with names like divergence and curl. These have compelling physical interpretations if we think of the function we operate on as describing a moving fluid. A positive divergence means fluid is coming into the system; a negative divergence, that it is leaving. The curl, in fluids, describe how nearby streams of fluid move at different rate.

Physical interpretations are common in operators. This probably reflects how much influence physics has on mathematics and vice-versa. Anyone studying quantum mechanics gets familiar with a host of operators. These have comfortable names like “position operator” or “momentum operator” or “spin operator”. These are operators that apply to the wave function for a problem. They transform the wave function into a probability distribution. That distribution describes what positions or momentums or spins are likely, how likely they are. Or how unlikely they are.

They’re not all physical, though. Or not purely physical. Many operators are useful because they are powerful mathematical tools. There is a variation of the Fourier series called the Fourier transform. We can interpret this as an operator. Suppose the original function started out with time or space as its independent variable. This often happens. The Fourier transform operator gives us a new function, one with frequencies as independent variable. This can make the function easier to work with. The Fourier transform is an integral operator, by the way, so don’t go thinking everything is a complicated set of derivatives.

Another integral-based operator that’s important is the Laplace transform. This is a great operator because it turns differential equations into algebraic equations. Often, into polynomials. You saw that one coming.

This is all a lot of good press for operators. Well, they’re powerful tools. They help us to see that we can manipulate functions in the ways that functions let us manipulate numbers. It should sound good to realize there is much new that you can do, and you already know most of what’s needed to do it.

This and all the other Fall 2019 A To Z posts should be gathered here. And once I have the time to fiddle with tags I’ll have all past A to Z essays gathered at this link. Thank you for reading. I should be back on Thursday with the letter P.

## My 2019 Mathematics A To Z: Differential Equations

The thing most important to know about differential equations is that for short, we call it “diff eq”. This is pronounced “diffy q”. It’s a fun name. People who aren’t taking mathematics smile when they hear someone has to get to “diffy q”.

Sometimes we need to be more exact. Then the less exciting names “ODE” and “PDE” get used. The meaning of the “DE” part is an easy guess. The meaning of “O” or “P” will be clear by the time this essay’s finished. We can find approximate answers to differential equations by computer. This is known generally as “numerical solutions”. So you will encounter talk about, say, “NSPDE”. There’s an implied “of” between the S and the P there. I don’t often see “NSODE”. For some reason, probably a quite arbitrary historical choice, this is just called “numerical integration” instead.

To write about “differential equations” was suggested by aajohannas, who is on Twitter as @aajohannas.

# Differential Equations.

One of algebra’s unsettling things is the idea that we can work with numbers without knowing their values. We can give them names, like ‘x’ or ‘a’ or ‘t’. We can know things about them. Often it’s equations telling us these things. We can make collections of numbers based on them all sharing some property. Often these things are solutions to equations. We can even describe changing those collections according to some rule, even before we know whether any of the numbers is 2. Often these things are functions, here matching one set of numbers to another.

One of analysis’s unsettling things is the idea that most things we can do with numbers we can also do with functions. We can give them names, like ‘f’ and ‘g’ and … ‘F’. That’s easy enough. We can add and subtract them. Multiply and divide. This is unsurprising. We can measure their sizes. This is odd but, all right. We can know things about functions even without knowing exactly what they are. We can group together collections of functions based on some properties they share. This is getting wild. We can even describe changing these collections according to some rule. This change is itself a function, but it is usually called an “operator”, saving us some confusion.

So we can describe a function in an equation. We may not know what f is, but suppose we know $\sqrt{f(x) - 2} = x$ is true. We can suppose that if we cared we could find what function, or functions, f made that equation true. There is shorthand here. A function has a domain, a range, and a rule. The equation part helps us find the rule. The domain and range we get from the problem. Or we take the implicit rule that both are the biggest sets of real-valued numbers for which the rule parses. Sometimes biggest sets of complex-valued numbers. We get so used to saying “the function” to mean “the rule for the function” that we’ll forget to say that’s what we’re doing.

There are things we can do with functions that we can’t do with numbers. Or at least that are too boring to do with numbers. The most important here is taking derivatives. The derivative of a function is another function. One good way to think of a derivative is that it describes how a function changes when its variables change. (The derivative of a number is zero, which is boring except when it’s also useful.) Derivatives are great. You learn them in Intro Calculus, and there are a bunch of rules to follow. But follow them and you can pretty much take the derivative of any function even if it’s complicated. Yes, you might have to look up what the derivative of the arc-hyperbolic-secant is. Nobody has ever used the arc-hyperbolic-secant, except to tease a student.

And the derivative of a function is itself a function. So you can take a derivative again. Mathematicians call this the “second derivative”, because we didn’t expect someone would ask what to call it and we had to say something. We can take the derivative of the second derivative. This is the “third derivative” because by then changing the scheme would be awkward. If you need to talk about taking the derivative some large but unspecified number of times, this is the n-th derivative. Or m-th, if you’ve already used ‘n’ to mean something else.

And now we get to differential equations. These are equations in which we describe a function using at least one of its derivatives. The original function, that is, f, usually appears in the equation. It doesn’t have to, though.

We divide the earth naturally (we think) into two pairs of hemispheres, northern and southern, eastern and western. We divide differential equations naturally (we think) into two pairs of two kinds of differential equations.

The first division is into linear and nonlinear equations. I’ll describe the two kinds of problem loosely. Linear equations are the kind you don’t need a mathematician to solve. If the equation has solutions, we can write out procedures that find them, like, all the time. A well-programmed computer can solve them exactly. Nonlinear equations, meanwhile, are the kind no mathematician can solve. They’re just too hard. There’s no processes that are sure to find an answer.

You may ask. We don’t need mathematicians to solve linear equations. Mathematicians can’t solve nonlinear ones. So what do we need mathematicians for? The answer is that I exaggerate. Linear equations aren’t quite that simple. Nonlinear equations aren’t quite that hopeless. There are nonlinear equations we can solve exactly, for example. This usually involves some ingenious transformation. We find a linear equation whose solution guides us to the function we do want.

And that is what mathematicians do in such a field. A nonlinear differential equation may, generally, be hopeless. But we can often find a linear differential equation which gives us insight to what we want. Finding that equation, and showing that its answers are relevant, is the work.

The other hemispheres we call ordinary differential equations and partial differential equations. In form, the difference between them is the kind of derivative that’s taken. If the function’s domain is more than one dimension, then there are different kinds of derivative. Or as normal people put it, if the function has more than one independent variable, then there are different kinds of derivatives. These are partial derivatives and ordinary (or “full”) derivatives. Partial derivatives give us partial differential equations. Ordinary derivatives give us ordinary differential equations. I think it’s easier to understand a partial derivative.

Suppose a function depends on three variables, imaginatively named x, y, and z. There are three partial first derivatives. One describes how the function changes if we pretend y and z are constants, but let x change. This is the “partial derivative with respect to x”. Another describes how the function changes if we pretend x and z are constants, but let y change. This is the “partial derivative with respect to y”. The third describes how the function changes if we pretend x and y are constants, but let z change. You can guess what we call this.

In an ordinary differential equation we would still like to know how the function changes when x changes. But we have to admit that a change in x might cause a change in y and z. So we have to account for that. If you don’t see how such a thing is possible don’t worry. The differential equations textbook has an example in which you wish to measure something on the surface of a hill. Temperature, usually. Maybe rainfall or wind speed. To move from one spot to another a bit east of it is also to move up or down. The change in (let’s say) x, how far east you are, demands a change in z, how far above sea level you are.

That’s structure, though. What’s more interesting is the meaning. What kinds of problems do ordinary and partial differential equations usually represent? Partial differential equations are great for describing surfaces and flows and great bulk masses of things. If you see an equation about how heat transmits through a room? That’s a partial differential equation. About how sound passes through a forest? Partial differential equation. About the climate? Partial differential equations again.

Ordinary differential equations are great for describing a ball rolling on a lumpy hill. It’s given an initial push. There are some directions (downhill) that it’s easier to roll in. There’s some directions (uphill) that it’s harder to roll in, but it can roll if the push was hard enough. There’s maybe friction that makes it roll to a stop.

Put that way it’s clear all the interesting stuff is partial differential equations. Balls on lumpy hills are nice but who cares? Miniature golf course designers and that’s all. This is because I’ve presented it to look silly. I’ve got you thinking of a “ball” and a “hill” as if I meant balls and hills. Nah. It’s usually possible to bundle a lot of information about a physical problem into something that looks like a ball. And then we can bundle the ways things interact into something that looks like a hill.

Like, suppose we have two blocks on a shared track, like in a high school physics class. We can describe their positions as one point in a two-dimensional space. One axis is where on the track the first block is, and the other axis is where on the track the second block is. Physics problems like this also usually depend on momentum. We can toss these in too, an axis that describes the momentum of the first block, and another axis that describes the momentum of the second block.

We’re already up to four dimensions, and we only have two things, both confined to one track. That’s all right. We don’t have to draw it. If we do, we draw something that looks like a two- or three-dimensional sketch, maybe with a note that says “D = 4” to remind us. There’s some point in this four-dimensional space that describes these blocks on the track. That’s the “ball” for this differential equation.

The things that the blocks can do? Like, they can collide? They maybe have rubber tips so they bounce off each other? Maybe someone’s put magnets on them so they’ll draw together or repel? Maybe there’s a spring connecting them? These possible interactions are the shape of the hills that the ball representing the system “rolls” over. An impenetrable barrier, like, two things colliding, is a vertical wall. Two things being attracted is a little divot. Two things being repulsed is a little hill. Things like that.

Now you see why an ordinary differential equation might be interesting. It can capture what happens when many separate things interact.

I write this as though ordinary and partial differential equations are different continents of thought. They’re not. When you model something you make choices and they can guide you to ordinary or to partial differential equations. My own research work, for example, was on planetary atmospheres. Atmospheres are fluids. Representing how fluids move usually calls for partial differential equations. But my own interest was in vortices, swirls like hurricanes or Jupiter’s Great Red Spot. Since I was acting as if the atmosphere was a bunch of storms pushing each other around, this implied ordinary differential equations.

There are more hemispheres of differential equations. They have names like homogenous and non-homogenous. Coupled and decoupled. Separable and nonseparable. Exact and non-exact. Elliptic, parabolic, and hyperbolic partial differential equations. Don’t worry about those labels. They relate to how difficult the equations are to solve. What ways they’re difficult. In what ways they break computers trying to approximate their solutions.

What’s interesting about these, besides that they represent many physical problems, is that they capture the idea of feedback. Of control. If a system’s current state affects how it’s going to change, then it probably has a differential equation describing it. Many systems change based on their current state. So differential equations have long been near the center of professional mathematics. They offer great and exciting pure questions while still staying urgent and relevant to real-world problems. They’re great things.

Thanks again for reading. All Fall 2019 A To Z posts should be at this link. I should get to the letter E for Tuesday. All of the A To Z essays should be at this link. If you have thoughts about other topics I might cover, please offer suggestions for the letters G and H.

## Why I’ll Say 1/x Is A Continuous Function And Why I’ll Say It Isn’t

So let me finally follow up last month’s question. That was whether the function “$\frac{1}{x}$” is continuous. My earlier post lays out what a mathematician means by a “continuous function”. The short version is, we have a good definition for a function being continuous at a point in the domain. If it’s continuous at every point in the domain, it’s a continuous function.

The definition of continuous-at-a-point has some technical stuff that I’m going to skip this essay. The important part is that the stuff ordinary people would call “continuous” mathematicians agree with. Like, if you draw a curve representing the function without having to lift your pen off the paper? That function’s continuous. At least the stretch you drew was.

So is the function “$\frac{1}{x}$” continuous? What if I said absolutely it is, because ‘x’ is a number that happens to be … oh, let’s say it’s 3. And $\frac{1}{3}$ is a constant function; of course that’s continuous. Your sensible response is to ask if I want a punch in the nose. No, I do not.

One of the great breakthroughs of algebra was that we could use letters to represent any number we want, whether or not we know what number it is. So why can’t I get away with this? And the answer is that we live in a society, please. There are rules. At least, there’s conventions. They’re good things. They save us time setting up problems. They help us see things the current problem has with other problems. They help us communicate to people who haven’t been with us through all our past work. As always, these rules are made for our convenience, and we can waive them for good reason. But then you have to say what those reasons are.

What someone expects, if you write ‘x’ without explanation it’s a variable and usually an independent one. Its value might be any of a set of things, and often, we don’t explicitly know what it is. Letters at the start of the alphabet usually stand for coefficients, some fixed number with a value we don’t want to bother specifying. In making this division — ‘a’, ‘b’, ‘c’ for coefficients, ‘x’, ‘y’, ‘z’ for variables — we are following Réné Descartes, who explained his choice of convention quite well. And there are other letters with connotations. We tend to use ‘t’ as a variable if it seems like we’re looking at something which depends on time. If something seems to depend on a radius, ‘r’ goes into service. We use letters like ‘f’ and ‘g’ and ‘h’ for functions. For indexes, ‘i’ and ‘j’ and ‘k’ get called up. For total counts of things, or for powers, ‘n’ and ‘m’, often capitalized, appear. The result is that any mathematician, looking at the expression

$\sum_{j = i}^{n} a_i f(x_j)$

would have a fair idea what kinds of things she was looking at.

So when someone writes “the function $\frac{1}{x}$” they mean “the function which matches ‘x’, in the domain, with $\frac{1}{x}$, in the range”. We write this as “$f(x) = \frac{1}{x}$”. Or, if we become mathematics majors, and we’re in the right courses, we write “$f:x \rightarrow \frac{1}{x}$”. It’s a format that seems like it’s overcomplicating things. But it’s good at emphasizing the idea that a function can be a map, matching a set in the domain to a set in the range.

This is a tiny point. Why discuss it at any length?

It’s because the question “is $\frac{1}{x}$ a continuous function” isn’t well-formed. There’s important parts not specified. We can make it well-formed by specifying these parts. This is adding assumptions about what we mean. What assumptions we make affect what the answer is.

A function needs three components. One component is a set that’s the domain. One component is a set that’s the range. And one component is a rule that pairs up things in the domain with things in the range. But there are some domains and some ranges that we use all the time. We use them so often we end up not mentioning them. We have a common shorthand for functions which is to just list the rule.

So what are the domain and range?

Barring special circumstances, we usually take the domain that offers the most charitable reading of the rule. What’s the biggest set on which the rule makes sense? The domain is that. The range we find once we have the domain and rule. It’s the set that the rule maps the domain onto.

So, for example, if we have the function “f(x) = x2”? That makes sense if ‘x’ is any real number. if there’s no reason to think otherwise, we suppose the domain is the set of all real numbers. We’d write that as the set R. Whatever ‘x’ is, though, ‘x2‘ is either zero or a positive number. So the range is the real numbers greater than or equal to zero. Or the nonnegative real numbers, if you prefer.

And even that reasonably clear guideline hides conventions. Like, who says this should be the real numbers? Can’t you take the square of a complex-valued number? And yes, you absolutely can. Some people even encourage it. So why not use the set C instead?

Convention, again. If we don’t expect to need complex-valued numbers, we don’t tend to use them. I suspect it’s a desire not to invite trouble. The use of ‘x’ as the independent variable is another bit of convention. An ‘x’ can be anything, yes. But if it’s a number, it’s more likely a real-valued number. Same with ‘y’. If we want a complex-valued independent variable we usually label that ‘z’. If we need a second, ‘w’ comes in. Writing “x2” alone suggests real-valued numbers.

And this might head off another question. How do we know that ‘x’ is the only variable? How do we know we don’t need an ordered pair, ‘(x, y)’? This would be from the set called R2, pairs of real-valued numbers. It uses only the first coordinate of the pair, but that’s allowed. How do we know that’s not going on? And we don’t know that from the “x2” part. The “f(x) = ” part gives us that hint. If we thought the problem needed two independent variables, it would usually list them somewhere. Writing “f(x, y) = x2” begs for the domain R2, even if we don’t know what good the ‘y’ does yet. In mapping notation, if we wrote “$f:(x, y) \rightarrow x^2$” we’d be calling for R2. If ‘x’ and ‘z’ both appear, that’s usually a hint that the problem needs coordinates ‘x’, ‘y’, and ‘z’, so that we’d want R3 at least.

So that’s the maybe frustrating heuristic here. The inferred domain is the smallest biggest set that the rule makes sense on. The real numbers, but not ordered pairs of real numbers, and not complex-valued numbers. Something like that.

What does this mean for the function “$f(x) = \frac{1}{x}$”? Well, the variable is ‘x’, so we should think real numbers rather than complex-valued ones. There no ‘y’ or ‘z’ or anything, so we don’t need ordered sets. The domain is something in the real numbers, then. And the formula “$\frac{1}{x}$” means something for any real number ‘x’ … well, with the one exception. We try not to divide by zero. It raises questions we’d rather not have brought up.

So from this we infer a domain of “all the real numbers except 0”. And this in turn implies a range of “all the real numbers except 0”.

Is “$f(x) = \frac{1}{x}$” continuous on every point in the domain? That is, whenever ‘x’ is any real number besides zero? And, well, it is. A proper proof would be even more heaps of paragraphs, so I’ll skip it. Informally, you know if you drew a curve representing this function there’s only one point where you would ever lift your pen. And that point is 0 … which is not in this domain. So the function is continuous at every point in the domain. So the function’s continuous. Done.

And, I admit, not quite comfortably done. I feel like there’s some slight-of-hand anyway. You draw “$\frac{1}{x}$” and you absolutely do lift your pen, after all.

So, I fibbed a little above. When I said the range was “the set that the rule maps the domain onto”. I mean, that’s what it properly is. But finding that is often too much work. You have to find where the function would be its smallest, which is often hard, or at least tedious. You have to find where it’s largest, which is just as tedious. You have to find if there’s anything between the smallest and largest values that it skips. You have to find all these gaps. That’s boring. And what’s the harm done if we declare the range is bigger than that set? If, for example, we say the range of’ x2‘ is all the real numbers, even though we know it’s really only the non-negative numbers?

None at all. Not unless we’re taking an exam about finding the smallest range that lets a function make sense. So in practice we’ll throw in all the negative numbers into that range, even if nothing matches them. I admit this makes me feel wasteful, but that’s my weird issue. It’s not like we use the numbers up. We’ll just overshoot on the range and that’s fine.

You see the trap this has set up. If it doesn’t cost us anything to throw in unneeded stuff in the range, and it makes the problem easier to write about, can we do that with the domain?

Well. Uhm. No. Not if we’re doing this right. The range can have unneeded stuff in it. The domain can’t. It seems unfair, but if we don’t set hold to that rule, we make trouble for ourselves. By ourselves I mean mathematicians who study the theory of functions. That’s kind of like ourselves, right? So there’s no declaring that “$\frac{1}{x}$” is a function on “all” the real numbers and trusting nobody to ask what happens when ‘x’ is zero.

But we don’t need for a function’s rule to a be a single thing. Or a simple thing. It can have different rules for different parts of the domain. It’s fine to declare, for example, that f(x) is equal to “$\frac{1}{x}$” for every real number where that makes sense, and that it’s equal to 0 everywhere else. Or that it’s 1 everywhere else. That it’s negative a billion and a third everywhere else. Whatever number you like. As long as it’s something in the range.

So I’ll declare that my idea of this function is an ‘f(x)’ that’s equal to “$\frac{1}{x}$” if ‘x’ is not zero, and that’s equal to 2 if ‘x’ is zero. I admit if I weren’t writing for an audience I’d make ‘f(x)’ equal to 0 there. That feels nicely symmetric. But everybody picks 0 when they’re filling in this function. I didn’t get where I am by making the same choices as everybody else, I tell myself, while being far less successful than everybody else.

And now my ‘f(x)’ is definitely not continuous. The domain’s all the real numbers, yes. But at the point where ‘x’ is 0? There’s no drawing that without raising your pen from the paper. I trust you’re convinced. Your analysis professor will claim she’s not convinced, if you write that on your exam. But if you and she were just talking about functions, she’d agree. Since there’s one point in the domain where the function’s not continuous, the function is not continuous.

So there we have it. “$\frac{1}{x}$”, taken in one reasonable way, is a continuous function. “$\frac{1}{x}$”, taken in another reasonable way, is not a continuous function. What you think reasonable is what sets your answer.

## Is 1/x a Continuous Function?

So this is a question I got by way of a friend. It’s got me thinking because there is an obviously right answer. And there’s an answer that you get to if you think about it longer. And then longer still and realize there are several answers you could give. So I wanted to put it out to my audience. Figuring out your answer and why you stand on that is the interesting bit.

The question is as asked in the subject line: is $\frac{1}{x}$ a continuous function?

Mathematics majors, or related people like physics majors, already understand the question. Other people will want to know what the question means. This includes people who took a class calculus class, who remember three awful weeks where they had to write ε and δ a lot. The era passed, even if they did not. And people who never took a mathematics class, but like their odds at solving a reasoning problem, can get up to speed on this fast.

The colloquial idea of a “continuous function” is, well. Imagine drawing a curve that represents the function. Can you draw the whole thing without lifting your pencil off the page? That is, no gaps, no jumps? Then it’s continuous. That’s roughly the idea we want to capture by talking about a “continuous function”. It needs some logical rigor to pass as mathematics, though. So here we go.

A function is continuous if, and only if, it’s continuous at every point in the function’s domain. That I start out with that may inspire a particular feeling. That feeling is, “our Game Master grinned ear-to-ear and took out four more dice and a booklet when we said we were sure”.

But our best definition of continuity builds on functions at particular points. Which is fair. We can imagine a function that’s continuous in some places but that’s not continuous somewhere else. The ground can be very level and smooth right up to the cliff. And we have a nice, easy enough, idea of what it is to be continuous at a point.

I’ll get there in a moment. My life will be much easier if I can give you some more vocabulary. They’re all roughly what you might imagine the words meant if I didn’t tell you they were mathematics words.

The first is ‘map’. A function ‘maps’ something in its domain to something in its range. Like if ‘a’ is a point in the domain, ‘f’ maps that point to ‘f(a)’, in its range. Like, if your function is ‘f(x) = x2‘, then f maps 2 to 4. It maps 3 to 9. It maps -2 to 4 again, and that’s all right. There’s no reason you can’t map several things to one thing.

The next is ‘image’. Take something in the domain. It might be a single point. It might be a couple of points. It might be an interval. It might be several intervals. It’s a set, as big or as empty as you like. The image’ of that set is all the points in the range that any point in the original set gets mapped to. So, again play with f(x) = x2. The image of the interval from 0 to 2 is the interval from 0 to 4. The image of the interval from 3 to 4 is the interval from 9 to 16. The image of the interval from -3 to 1 is the interval from 0 to 9.

That’s as much vocabulary as I need. Thank you for putting up with that. Now I can say what it means to be continuous at a point.

Is a function continuous at a point? Let me call that point ‘a’? It is continuous at ‘a’ we can do this. Take absolutely any open set in the range that contains ‘f(a)’. I’m going to call that open set ‘R’. Is there an open set, that I’ll call ‘D’, inside the domain, that contains ‘a’, and with an image that’s inside ‘R’? ‘D’ doesn’t have to be big. It can be ridiculously tiny; it just has to be an open set. If there always is a D like this, no matter how big or how small ‘R’ is, then ‘f’ is continuous at ‘a’. If there is not — if there’s even just the one exception — then ‘f’ is not continuous at ‘a’.

I realize that’s going back and forth a lot. It’s as good as we can hope for, though. It does really well at capturing things that seem like they should be continuous. And it never rules as not-continuous something that people agree should be continuous. It does label “continuous” some things that seem like they shouldn’t be. We accept this because not labelling continuous stuff as non-continuous is worse.

And all this talk about open sets and images gets a bit abstract. It’s written to cover all kinds of functions on all kinds of things. It’s hard to master, but, if you get it, you’ve got a lot of things. It works for functions on all kinds of domains and ranges. And it doesn’t need very much. You need to have an idea of what an ‘open set’ is, on the domain and range, and that’s all. This is what gives it universality.

But it does mean there’s the challenge figuring out how to start doing anything. If we promise that we’re talking about a function with domain and range of real numbers we can simplify things. This is where that ε and δ talk comes from. But here’s how we can define “continuous at a point” for a function in the special case that its domain and range are both real numbers.

Take any positive ε. Is there is some positive δ, so that, whenever ‘x’ is a number less than δ away from ‘a’, we know that f(x) is less than ε away from f(a)? If there always is, no matter how large or small ε is, then f is continuous at a. If there ever is not, even for a single exceptional ε, then f is not continuous at a.

That definition is tailored for real-valued functions. But that’s enough if you want to answer the original question. Which, you might remember, is, “is 1/x a continuous function”?

That I ask the question, for a function simple and familiar enough a lot of people don’t even need to draw it, may give away what I think the answer is. But what’s interesting is, of course, why the answer. So I’ll leave that for an essay next week.

## My 2018 Mathematics A To Z: Commutative

Today’s A to Z term comes from Reynardo, @Reynardo_red on Twitter, and is a challenge. And the other A To Z posts for this year should be at this link.

# Commutative.

Some terms are hard to discuss. This is among them. Mathematicians find commutative things early on. Addition of whole numbers. Addition of real numbers. Multiplication of whole numbers. Multiplication of real numbers. Multiplication of complex-valued numbers. It’s easy to think of this commuting as just having liberty to swap the order of things. And it’s easy to think of commuting as “two things you can do in either order”. It inspires physical examples like rotating a dial, clockwise or counterclockwise, however much you like. Or outside the things that seem obviously mathematical. Add milk and then cereal to the bowl, or cereal and then milk. As long as you don’t overfill the bowl, there’s not an important different. Per Wikipedia, if you’re putting one sock on each foot, it doesn’t matter which foot gets a sock first.

When something is this accessible, and this universal, it gets hard to talk about. It threatens to be invisible. It was hard to say much interesting about the still air in a closed room, at least before there was a chemistry that could tell it wasn’t a homogenous invisible something, and before there was a statistical mechanics that it was doing something even when it was doing nothing.

But commutativity is different. It’s easy to think of mathematics that doesn’t commute. Subtraction doesn’t, for all that it’s as familiar as addition. And despite that we try, in high school algebra, to fuse it into addition. Division doesn’t either, for all that we try to think of it as multiplication. Rotating things in three dimensions doesn’t commute. Nor does multiplying quaternions, which are a kind of number still. (I’m double-dipping here. You can use quaternions to represent three-dimensional rotations, and vice-versa. So they aren’t quite different examples, even though you can use quaternions to do things unrelated to rotations.) Clothing is a mass of things that can and can’t be put on first.

We talk about commuting as if it’s something in (or not in) the operations we do. Adding. Rotating. Walking in some direction. But it’s not entirely in that. Consider walking directions. From an intersection in the city, walk north to the first intersection you encounter. And walk east to the first intersection you encounter. Does it matter whether you walk north first and then east, or east first and then north? In some cases, no; famously, in Midtown Manhattan there’s no difference. At least if we pretend Broadway doesn’t exist.

Also if we don’t start from near the edge of the island, or near Central Park. An operation, even something familiar like addition, is a function. Its domain is an ordered pair. Each thing in the pair is from the set of whatever might be added together. (Or multiplied, or whatever the name of the operation is.) The operation commutes if the order of the pair doesn’t matter. It’s easy to find sets and operations that won’t commute. I suppose it’s for the same reason it’s easier to find rectangular rather than square things. We’re so used to working with operations like multiplication that we forget that multiplication needs things to multiply.

Whether a thing commutes turns up often in group theory. This shouldn’t surprise. Group theory studies how arithmetic works. A “group”, which is a set of things with an operation like multiplication on it, might or might not commute. A “ring”, which has a set of things and two operations, has some commutativity built into it. One ring operation is something like addition. That commutes, or else you don’t have a ring. The other operation is something like multiplication. That might or might not commute. It depends what you need for your problem. A ring with commuting multiplication, plus some other stuff, can reach the heights of being a “field”. Fields are neat. They look a lot like the real numbers, but they can be all weird, too.

But even in a group, that doesn’t have to have a commuting multiplication, we can tease out commutativity. There is a thing named the “commutator”, which is this particular way of multiplying elements together. You can use it to split the original group in the way that odds and evens split the whole numbers. That splitting is based on the same multiplication as the original group. But its domain is now classes based on elements of the original group. What’s created, the “commutator subgroup”, is commutative. We can find a thing, based on what we are interested in, which offers commutativity right nearby.

It reaches further. In analysis, it can be useful to think of functions as “mappings”. We describe this as though a function took a domain and transformed it into a range. We can compose these functions together: take the range from one function and use it as the domain for another. Sometimes these chains of functions will commute. We can get from the original set to the final set by several paths. This can produce fascinating and beautiful proofs that look as if you just drew a lattice-work. The MathWorld page on “Commutative Diagram” has some examples of this, and I recommend just looking at the pictures. Appreciate their aesthetic, particularly the ones immediately after the sentence about “Commutative diagrams are usually composed by commutative triangles and commutative squares”.

Whether these mappings commute can have meaning. This takes us, maybe inevitably, to quantum mechanics. Mathematically, this represents systems as either a wave function or a matrix, whichever is more convenient. We can use this to find the distribution of positions or momentums or energies or anything else we would like to know. Distributions are as much as we can hope for from quantum mechanics. We can say what (eg) the position of something is most likely to be but not what it is. That’s all right.

The mathematics of finding these distributions is just applying an operator, taking a mapping, on this wave function or this matrix. Some pairs of these operators commute, like the ones that let us find momentum and find kinetic energy. Some do not, like those to find position and angular momentum.

We can describe how much two operators do or don’t commute. This is through a thing called the “commutator”. Its form looks almost playfully simple. Call the operators ‘f’ and ‘g’. And that by ‘fg’ we mean, “do g, then do f”. (This seems awkward. But if you think of ‘fg’ as ‘f(g(x))’, where ‘x’ is just something in the domain of g, then this seems less awkward.) The commutator of ‘f’ and ‘g’ is then whatever ‘fg – gf’ is. If it’s always zero, then ‘f’ and ‘g’ commute. If it’s ever not zero, then they don’t.

This is easy to understand physically. Imagine starting from a point on the surface of the earth. Travel south one mile and then west one mile. You are at a different spot than you would be, had you instead travelled west one mile and then south one mile. How different? That’s the commutator. It’s obviously zero, for just multiplying some regular old numbers together. It’s sometimes zero, for these paths on the Earth’s surface. It’s never zero, for finding-the-position and finding-the-angular-momentum. The amount by which that’s never zero we can see as the famous Uncertainty Principle, the limits of what kinds of information we can know about the world.

Still, it is a hard subject to describe. Things which commute are so familiar that it takes work to imagine them not commuting. (How could three times four equal anything but four times three?) Things which do not commute either obviously shouldn’t (add hot water to the instant oatmeal, and eat it), or are unfamiliar enough people need to stop and think about them. (Rotating something in one direction and then another, in three dimensions, generally doesn’t commute. But I wouldn’t fault you for testing this out with a couple objects on hand before being sure about it.) But it can be noticed, once you know to explore.

## Someone Else’s Homework: Was It Hard? An Umbrella Search

I wanted to follow up, at last, on this homework problem a friend had.

The question: suppose you have a function f. Its domain is the integers Z. Its rage range is also the integers Z. You know two things about the function. First, for any two integers ‘a’ and ‘b’, you know that f(a + b) equals f(a) + f(b). Second, you know there is some odd number ‘c’ for which f(c) is even. The challenge: prove that f is even for all the integers.

My friend asked, as we were working out the question, “Is this hard?” And I wasn’t sure what to say. I didn’t think it was hard, but I understand why someone would. If you’re used to mathematics problems like showing that all the roots of this polynomial are positive, then this stuff about f being even is weird. It’s a different way of thinking about problems. I’ve got experience in that thinking that my friend hasn’t.

All right, but then, what thinking? What did I see that my friend didn’t? And I’m not sure I can answer that perfectly. Part of gaining mastery of a subject is pattern recognition. Spotting how some things fit a form, while other stuff doesn’t, and some other bits yet are irrelevant. But also part of gaining that mastery is that it becomes hard to notice that’s what you’re doing.

But I can try to look with fresh eyes. There is a custom in writing this sort of problem, and that drove much of my thinking. The custom is that a mathematics problem, at this level, works by the rules of a Minute Mystery Puzzle. You are given in the setup everything that you need to solve the problem, yes. But you’re also not given stuff that you don’t need. If the detective mentions to the butler how dreary the rain is on arriving, you’re getting the tip to suspect the houseguest whose umbrella is unaccounted for.

(This format is almost unavoidable for teaching mathematics. At least it seems unavoidable given the number of problems that don’t avoid it. This can be treacherous. One of the hardest parts in stepping out to research on one’s own is that there’s nobody to tell you what the essential pieces are. Telling apart the necessary, the convenient, and the irrelevant requires expertise and I’m not sure that I know how to teach it.)

The first unaccounted-for umbrella in this problem is the function’s domain and range. They’re integers. Why wouldn’t the range, particularly, be all the real numbers? What things are true about the integers that aren’t true about the real numbers? There’s a bunch of things. The highest-level things are rooted in topology. There’s gaps between one integer and its nearest neighbor. Oh, and an integer has a nearest neighbor. A real number doesn’t. That matters for approximations and for sequences and series. Not likely to matter here. Look to more basic, obvious stuff: there’s even and odd numbers. And the problem talks about knowing something for an odd number in the domain. This is a signal to look at odds and evens for the answer.

The second unaccounted-for umbrella is the most specific thing we learn about the function. There is some odd number ‘c’, and the function matches that integer ‘c’ in the domain to some even number f(c) in the range. This makes me think: what do I know about ‘c’? Most basic thing about any odd number is it’s some even number plus one. And that made me think: can I conclude anything about f(1)? Can I conclude anything about f at the sum of two numbers?

Third unaccounted-for umbrella. The less-specific thing we learn about the function. That is that for any integers ‘a’ and ‘b’, f(a + b) is f(a) + f(b). So see how this interacts with the second umbrella. f(c) is f(some-even-number) + f(1). Do I know anything about f(some-even-number)?

Sure. If I know anything about even numbers, it’s that any even number equals two times some integer. Let me call that some-integer ‘k’. Since some-even-number equals 2*k, then, f(some-even-number) is f(2*k), which is f(k + k). And by the third umbrella, that’s f(k) + f(k). By the first umbrella, f(k) has to be an integer. So f(k) + f(k) has to be even.

So, f(c) is an even number. And it has to equal f(2*k) + f(1). f(2*k) is even; so, f(1) has to be even. These are the things that leapt out to me about the problem. This is why the problem looked, to me, easy.

Because I knew that f(1) was even, I knew that f(1 + 1), or f(2), was even. And so would be f(2 + 1), that is, f(3). And so on, for at least all the positive integers.

Now, after that, in my first version of this proof, I got hung up on what seems like a very fussy technical point. And that was, what about f(0)? What about the negative integers? f(0) is easy enough to show. It follows from one of those tricks mathematics majors are told about early. Somewhere in grad school they start to believe it. And that is: adding zero doesn’t change a number’s value, but it can give you a more useful way to express that number. Here’s how adding zero helps: we know c = c + 0. So f(c) = f(c) + f(0) and whether f(c) is even or odd, f(0) has to be even. Evens and odds don’t work any other way.

After that my proof got hung up on what may seem like a pretty fussy technical point. That amounted to whether f(-1) was even or odd. I discussed this with a couple people who could not see what my issue with this was. I admit I wasn’t sure myself. I think I’ve narrowed it down to this: my questioning whether it’s known that the number “negative one” is the same thing as what we get from the operation “zero minus one”. I mean, in general, this isn’t much questioned. Not for the last couple centuries.

You might be having trouble even figuring out why I might worry there could be a difference. In “0 – 1” the – sign there is a binary operation, meaning, “subtract the number on the right from the number on the left”. In “-1” the – sign there is a unary operation, meaning, “take the additive inverse of the number on the right”. These are two different – signs that look alike. One of them interacts with two numbers. One of them interacts with a single number. How can they mean the same thing?

With some ordinary assumptions about what we mean by “addition” and “subtraction” and “equals” and “zero” and “numbers” and stuff, the difference doesn’t matter much. We can swap between “-1” and “0 – 1” effortlessly. If we couldn’t, we probably wouldn’t use the same symbol for the two ideas. But in the context of this particular question, could we count on that?

My friend wasn’t confident in understanding what the heck I was getting on about. Fair enough. But some part of me felt like that needed to be shown. If it hadn’t been recently shown, or used, in class, then it had to go into this proof. And that’s why I went, in the first essay, into the bit about additive inverses.

This was me over-thinking the problem. I got to looking at umbrellas that likely were accounted for.

My second proof, the one thought up in the shower, uses the same unaccounted-for umbrellas. In the first proof, the second unaccounted-for umbrella seemed particularly important. Knowing that f(c) was odd, what else could I learn? In the second proof, it’s the third unaccounted-for umbrella that seemed key. Knowing that f(a + b) is f(a) + f(b), what could I learn? That right away tells me that for any even number ‘d’, f(d) must be even.

Call this the fourth unaccounted-for umbrella. Every integer is either even or odd. So right away I could prove this for what I really want to say is half of the integers. Don’t call it that. There’s not a coherent way to say the even integers are any fraction of all the integers. There’s exactly as many even integers as there are integers. But you know what I mean. (What I mean is, in any finite interval of consecutive integers, half are going to be even. Well, there’ll be at most two more odd integers than there are even integers. That’ll be close enough to half if the interval is long enough. And if we pretend we can make bigger and bigger intervals until all the integers are covered … yeah. Don’t poke at that and do not use it at your thesis defense because it doesn’t work. That’s what it feels like ought to work.)

But that I could cover the even integers in the domain with one quick sentence was a hint. The hint was, look for some thing similar that would cover the odd integers in the domain. And hey, that second unaccounted-for umbrella said something about one odd integer in the domain. Add to that one of those boring little things that a mathematician knows about odd numbers: the difference between any two odd numbers is an even number. ‘c’ is an odd number. So any odd number in the domain, let’s call it ‘d’, is equal to ‘c’ plus some even number. And f(some-even-number) has to be even and there we go.

So all this is what I see when I look at the question. And why I see those things, and why I say this is not a hard problem. It’s all in spotting these umbrellas.

## Someone Else’s Homework: Some More Thoughts

I wanted to get back to my friend’s homework problem. And a question my friend had about the question. It’s a question I figure is good for another essay.

But I also had second thoughts about the answer I gave. Not that it’s wrong, but that it could be better. Also that I’m not doing as well in spelling “range” as I had always assumed I would. This is what happens when I don’t run an essay through Hemmingway App to check whether my sentences are too convoluted. I also catch smaller word glitches.

Let me re-state the problem: Suppose you have a function f, with domain of the integers Z and rage of the integers Z. And also you know that f has the property that for any two integers ‘a’ and ‘b’, f(a + b) equals f(a) + f(b). And finally, suppose that for some odd number ‘c’, you know that f(c) is even. The challenge: prove that f is even for all the integers.

Like I say, the answer I gave on Tuesday is right. That’s fine. I just thought of a better answer. This often happens. There are very few interesting mathematical truths that only have a single proof. The ones that have only a single proof are on the cutting edge, new mathematics in a context we don’t understand well enough yet. (Yes, I am overlooking the obvious exception of ______ .) But a question so well-chewed-over that it’s fit for undergraduate homework? There’s probably dozens of ways to attack that problem.

And yes, you might only see one proof of something. Sometimes there’s an approach that works so well it’s silly to consider alternatives. Or the problem isn’t big enough to need several different proofs. There’s something to regret in that. Re-thinking an argument can make it better. As instructors we might recommend rewriting an assignment before turning it in. But I’m not sure that encourages re-thinking the assignment. It’s too easy to just copy-edit and catch obvious mistakes. Which is valuable, yes. But it’s good for communication, not for the mathematics itself.

So here’s my revised argument. It’s much cleaner, as I realized it while showering Wednesday morning.

Give me an integer. Let’s call it m. Well, m has to be either an even or an odd number. I’m supposing nothing about whether it’s positive or negative, by the way. This means what I show will work whether m is greater than, less than, or equal to zero.

Suppose that m is an even number. Then m has to equal 2*k for some integer k. (And yeah, k might be positive, might be negative, might be zero. Don’t know. Don’t care.) That is, m has to equal k + k. So f(m) = f(k) + f(k). That’s one of the two things we know about the function f. And f(k) + f(k) is is 2 * f(k). And f(k) is an integer: the integers are the function’s rage range). So 2 * f(k) is an even integer. So if m is an even number then f(m) has to be even.

All right. Suppose that m isn’t an even integer. Then it’s got to be an odd integer. So this means m has to be equal to c plus some even number, which I’m going ahead and calling 2*k. Remember c? We were given information about f for that element c in the domain. And again, k might be positive. Might be negative. Might be zero. Don’t know, and don’t need to know. So since m = c + 2*k, we know that f(m) = f(c) + f(2*k). And the other thing we know about f is that f(c) is even. f(2*k) is also even. f(c), which is even, plus f(2*k), which is even, has to be even. So if m is an odd number, then f(m) has to be even.

And so, as long as m is an integer, f(m) is even.

You see why I like that argument better. It’s shorter. It breaks things up into fewer cases. None of those cases have to worry about whether m is positive or negative or zero. Each of the cases is short, and moves straight to its goal. This is the proof I’d be happy submitting. Today, anyway. No telling what tomorrow will make me think.

## Someone Else’s Homework: A Solution

I have a friend who’s been taking mathematical logic. While talking over the past week’s work they mentioned a problem that had stumped them. But they’d figured it out — at least the critical part — about a half-hour after turning it in. And I had fun going over it. Since the assignment’s already turned in and I don’t even know which class it was, I’d like to share it with you.

So here’s the problem. Suppose you have a function f, with domain of the integers Z and rage of the integers Z. And also you know that f has the property that for any two integers ‘a’ and ‘b’, f(a + b) equals f(a) + f(b). And finally, suppose that for some odd number ‘c’, you know that f(c) is even. The challenge: prove that f is even for all the integers.

If you want to take a moment to think about that, please do.

First thing I want to do is show that f(1) is an even number. How? Well, if ‘c’ is an odd number, then ‘c’ has to equal ‘2*k + 1’ for some integer ‘k’. So f(c) = f(2*k + 1). And therefore f(c) = f(2*k) + f(1). And, since 2*k is equal to k + k, then f(2*k) has to equal f(k) + f(k). Therefore f(c) = 2*f(k) + f(1). Whatever f(k) is, 2*f(k) has to be an even number. And we’re given f(c) is even. Therefore f(1) has to be even.

Now I can prove that if ‘k’ is any positive integer, then f(k) has to be even. Why? Because ‘k’ is equal to 1 + 1 + 1 + … + 1. And so f(k) has to equal f(1) + f(1) + f(1) + … + f(1). That is, it’s k * f(1). And if f(1) is even then so is k * f(1). So that covers the positive integers.

How about zero? Can I show that f(0) is even? Oh, sure, easy. Start with ‘c’. ‘c’ equals ‘c + 0’. So f(c) = f(c) + f(0). The only way that’s going to be true is if f(0) is equal to zero, which is an even number.

By the way, here’s an alternate way of arguing this: 0 = 0 + 0. So f(0) = f(0) + f(0). And therefore f(0) = 2 * f(0) and that’s an even number. Incidentally also zero. Submit the proof you like.

What’s not covered yet? Negative integers. It’s hard not to figure, well, we know f(1) is even, we know f(a + b) if f(a) + f(b). Shouldn’t, like, f(-2) just be -2 * f(1)? Oh, it so should. I don’t feel like we have that already proven, though. So let me nail that down. I’m going to use what we know about f(k) for positive ‘k’, and the fact that f(0) is 0.

So give me any negative integer; I’m going call it ‘-k’. Its additive inverse is ‘k’, which is a positive number. -k + k = 0. And so f(-k + k) = f(-k) + f(k) = f(0). So, f(-k) + f(k) = 0, and f(-k) = -f(k). If f(k) is even — and it is — then f(-k) is also even.

So there we go: whether ‘k’ is a positive, zero, or negative integer, f(k) is even. All the integers are either positive, zero, or negative. So f is even for any integer.

This attractive little tweet came across my feed yesterday:

This function — I guess it’s the “popcorn” function — is a challenge to our ideas about what a “continuous” function is. I’ve mentioned “continuous” functions before and said something like they’re functions you could draw without lifting your pen from the paper. That’s the colloquial, and the intuitive, idea of what they mean. And that’s all right for ordinary uses.

But the best definition mathematicians have thought of for a “continuous function” has some quirks. And here’s one of them. Define a function named ‘f’. Its domain is the real numbers. Its range is the real numbers. And the rule matching things in the domain to things in the range is, as pictured:

• If ‘x’ is zero then $f(x) = 1$
• If ‘x’ is an irrational number then $f(x) = 0$
• If ‘x’ is a rational number, then it’s equal in lowest terms to the whole number ‘p’ divided by the positive whole number ‘q’. And for this ‘x’, then $f(x) = \frac{1}{q}$

And as the tweet from Fermat’s Library says, this is a function that’s continuous on all the irrational numbers. It’s not continuous on any rational numbers. This seems like a prank. But it’s a common approach to finding intuition-testing ideas about continuity. Setting different rules for rational and irrational numbers works well for making these strange functions. And thinking of rational numbers as their representation in lowest terms is also common. (Writing it as ‘p divided by q’ suggests that ‘p’ and ‘q’ are going to be prime, but, no! Think of $\frac{3}{8}$ or of $\frac{4}{9}$.) If you stare at the plot you can maybe convince yourself that “continuous on the irrational numbers” makes sense here. That heavy line of dots at the bottom looks like it’s approaching a continuous blur, at least.

It can get weirder. It’s possible to create a function that’s continuous at only a single point of all the real numbers. This is why Real Analysis is such a good subject to crash hard against. But we accept weird conclusions like this because the alternative is to give up as “continuous” functions that we just know have to be continuous. Mathematical definitions are things we make for our use.

## The Summer 2017 Mathematics A To Z: Functor

Gaurish gives me another topic for today. I’m now no longer sure whether Gaurish hopes me to become a topology blogger or a category theory blogger. I have the last laugh, though. I’ve wanted to get better-versed in both fields and there’s nothing like explaining something to learn about it.

# Functor.

So, category theory. It’s a foundational field. It talks about stuff that’s terribly abstract. This means it’s powerful, but it can be hard to think of interesting examples. I’ll try, though.

It starts with categories. These have three parts. The first part is a set of things. (There always is.) The second part is a collection of matches between pairs of things in the set. They’re called morphisms. The third part is a rule that lets us combine two morphisms into a new, third one. That is. Suppose ‘a’, ‘b’, and ‘c’ are things in the set. Then there’s a morphism that matches $a \rightarrow b$, and a morphism that matches $b \rightarrow c$. And we can combine them into another morphism that matches $a \rightarrow c$. So we have a set of things, and a set of things we can do with those things. And the set of things we can do is itself a group.

This describes a lot of stuff. Group theory fits seamlessly into this description. Most of what we do with numbers is a kind of group theory. Vector spaces do too. Most of what we do with analysis has vector spaces underneath it. Topology does too. Most of what we do with geometry is an expression of topology. So you see why category theory is so foundational.

Functors enter our picture when we have two categories. Or more. They’re about the ways we can match up categories. But let’s start with two categories. One of them I’ll name ‘C’, and the other, ‘D’. A functor has to match everything that’s in the set of ‘C’ to something that’s in the set of ‘D’.

And it does more. It has to match every morphism between things in ‘C’ to some other morphism, between corresponding things in ‘D’. It’s got to do it in a way that satisfies that combining, too. That is, suppose that ‘f’ and ‘g’ are morphisms for ‘C’. And that ‘f’ and ‘g’ combine to make ‘h’. Then, the functor has to match ‘f’ and ‘g’ and ‘h’ to some morphisms for ‘D’. The combination of whatever ‘f’ matches to and whatever ‘g’ matches to has to be whatever ‘h’ matches to.

This might sound to you like a homomorphism. If it does, I admire your memory or mathematical prowess. Functors are about matching one thing to another in a way that preserves structure. Structure is the way that sets of things can interact. We naturally look for stuff made up of different things that have the same structure. Yes, functors are themselves a category. That is, you can make a brand-new category whose set of things are the functors between two other categories. This is a good spot to pause while the dizziness passes.

There are two kingdoms of functor. You tell them apart by what they do with the morphisms. Here again I’m going to need my categories ‘C’ and ‘D’. I need a morphism for ‘C’. I’ll call that ‘f’. ‘f’ has to match something in the set of ‘C’ to something in the set of ‘C’. Let me call the first something ‘a’, and the second something ‘b’. That’s all right so far? Thank you.

Let me call my functor ‘F’. ‘F’ matches all the elements in ‘C’ to elements in ‘D’. And it matches all the morphisms on the elements in ‘C’ to morphisms on the elmenets in ‘D’. So if I write ‘F(a)’, what I mean is look at the element ‘a’ in the set for ‘C’. Then look at what element in the set for ‘D’ the functor matches with ‘a’. If I write ‘F(b)’, what I mean is look at the element ‘b’ in the set for ‘C’. Then pick out whatever element in the set for ‘D’ gets matched to ‘b’. If I write ‘F(f)’, what I mean is to look at the morphism ‘f’ between elements in ‘C’. Then pick out whatever morphism between elements in ‘D’ that that gets matched with.

Here’s where I’m going with this. Suppose my morphism ‘f’ matches ‘a’ to ‘b’. Does the functor of that morphism, ‘F(f)’, match ‘F(a)’ to ‘F(b)’? Of course, you say, what else could it do? And the answer is: why couldn’t it match ‘F(b)’ to ‘F(a)’?

No, it doesn’t break everything. Not if you’re consistent about swapping the order of the matchings. The normal everyday order, the one you’d thought couldn’t have an alternative, is a “covariant functor”. The crosswise order, this second thought, is a “contravariant functor”. Covariant and contravariant are distinctions that weave through much of mathematics. They particularly appear through tensors and the geometry they imply. In that introduction they tend to be difficult, even mean, creations, since in regular old Euclidean space they don’t mean anything different. They’re different for non-Euclidean spaces, and that’s important and valuable. The covariant versus contravariant difference is easier to grasp here.

Functors work their way into computer science. The avenue here is in functional programming. That’s a method of programming in which instead of the normal long list of commands, you write a single line of code that holds like fourteen “->” symbols that makes the computer stop and catch fire when it encounters a bug. The advantage is that when you have the code debugged it’s quite speedy and memory-efficient. The disadvantage is if you have to alter the function later, it’s easiest to throw everything out and start from scratch, beginning from vacuum-tube-based computing machines. But it works well while it does. You just have to get the hang of it.

## The End 2016 Mathematics A To Z: Weierstrass Function

I’ve teased this one before.

## Weierstrass Function.

So you know how the Earth is a sphere, but from our normal vantage point right up close to its surface it looks flat? That happens with functions too. Here I mean the normal kinds of functions we deal with, ones with domains that are the real numbers or a Euclidean space. And ranges that are real numbers. The functions you can draw on a sheet of paper with some wiggly bits. Let the function wiggle as much as you want. Pick a part of it and zoom in close. That zoomed-in part will look straight. If it doesn’t look straight, zoom in closer.

We rely on this. Functions that are straight, or at least straight enough, are easy to work with. We can do calculus on them. We can do analysis on them. Functions with plots that look like straight lines are easy to work with. Often the best approach to working with the function you’re interested in is to approximate it with an easy-to-work-with function. I bet it’ll be a polynomial. That serves us well. Polynomials are these continuous functions. They’re differentiable. They’re smooth.

That thing about the Earth looking flat, though? That’s a lie. I’ve never been to any of the really great cuts in the Earth’s surface, but I have been to some decent gorges. I went to grad school in the Hudson River Valley. I’ve driven I-80 over Pennsylvania’s scariest bridges. There’s points where the surface of the Earth just drops a great distance between your one footstep and your last.

Functions do that too. We can have points where a function isn’t differentiable, where it’s impossible to define the direction it’s headed. We can have points where a function isn’t continuous, where it jumps from one region of values to another region. Everyone knows this. We can’t dismiss those as abberations not worthy of the name “function”; too many of them are too useful. Typically we handle this by admitting there’s points that aren’t continuous and we chop the function up. We make it into a couple of functions, each stretching from discontinuity to discontinuity. Between them we have continuous region and we can go about our business as before.

Then came the 19th century when things got crazy. This particular craziness we credit to Karl Weierstrass. Weierstrass’s name is all over 19th century analysis. He had that talent for probing the limits of our intuition about basic mathematical ideas. We have a calculus that is logically rigorous because he found great counterexamples to what we had assumed without proving.

The Weierstrass function challenges this idea that any function is going to eventually level out. Or that we can even smooth a function out into basically straight, predictable chunks in-between sudden changes of direction. The function is continuous everywhere; you can draw it perfectly without lifting your pen from paper. But it always looks like a zig-zag pattern, jumping around like it was always randomly deciding whether to go up or down next. Zoom in on any patch and it still jumps around, zig-zagging up and down. There’s never an interval where it’s always moving up, or always moving down, or even just staying constant.

Despite being continuous it’s not differentiable. I’ve described that casually as it being impossible to predict where the function is going. That’s an abuse of words, yes. The function is defined. Its value at a point isn’t any more random than the value of “x2” is for any particular x. The unpredictability I’m talking about here is a side effect of ignorance. Imagine I showed you a plot of “x2” with a part of it concealed and asked you to fill in the gap. You’d probably do pretty well estimating it. The Weierstrass function, though? No; your guess would be lousy. My guess would be lousy too.

That’s a weird thing to have happen. A century and a half later it’s still weird. It gets weirder. The Weierstrass function isn’t differentiable generally. But there are exceptions. There are little dots of differentiability, where the rate at which the function changes is known. Not intervals, though. Single points. This is crazy. Derivatives are about how a function changes. We work out what they should even mean by thinking of a function’s value on strips of the domain. Those strips are small, but they’re still, you know, strips. But on almost all of that strip the derivative isn’t defined. It’s only at isolated points, a set with measure zero, that this derivative even exists. It evokes the medieval Mysteries, of how we are supposed to try, even though we know we shall fail, to understand how God can have contradictory properties.

It’s not quite that Mysterious here. Properties like this challenge our intuition, if we’ve gotten any. Once we’ve laid out good definitions for ideas like “derivative” and “continuous” and “limit” and “function” we can work out whether results like this make sense. And they — well, they follow. We can avoid weird conclusions like this, but at the cost of messing up our definitions for what a “function” and other things are. Making those useless. For the mathematical world to make sense, we have to change our idea of what quite makes sense.

That’s all right. When we look close we realize the Earth around us is never flat. Even reasonably flat areas have slight rises and falls. The ends of properties are marked with curbs or ditches, and bordered by streets that rise to a center. Look closely even at the dirt and we notice that as level as it gets there are still rocks and scratches in the ground, clumps of dirt an infinitesimal bit higher here and lower there. The flatness of the Earth around us is a useful tool, but we miss a lot by pretending it’s everything. The Weierstrass function is one of the ways a student mathematician learns that while smooth, predictable functions are essential, there is much more out there.

## The End 2016 Mathematics A To Z: Smooth

Mathematicians affect a pose of objectivity. We justify this by working on things whose truth we can know, and which must be true whenever we accept certain rules of deduction and certain definitions and axioms. This seems fair. But we choose to pay attention to things that interest us for particular reasons. We study things we like. My A To Z glossary term for today is about one of those things we like.

## Smooth.

Functions. Not everything mathematicians do is functions. But functions turn up a lot. We need to set some rules. “A function” is so generic a thing we can’t handle it much. Narrow it down. Pick functions with domains that are numbers. Range too. By numbers I mean real numbers, maybe complex numbers. That gives us something.

There’s functions that are hard to work with. This is almost all of them, so we don’t touch them unless we absolutely must. But they’re functions that aren’t continuous. That means what you imagine. The value of the function at some point is wholly unrelated to its value at some nearby point. It’s hard to work with anything that’s unpredictable like that. Functions as well as people.

We like functions that are continuous. They’re predictable. We can make approximations. We can estimate the function’s value at some point using its value at some more convenient point. It’s easy to see why that’s useful for numerical mathematics, for calculations to approximate stuff. The dazzling thing is it’s useful analytically. We step into the Platonic-ideal world of pure mathematics. We have tools that let us work as if we had infinitely many digits of precision, for infinitely many numbers at once. And yet we use estimates and approximations and errors. We use them in ways to give us perfect knowledge; we get there by estimates.

Continuous functions are nice. Well, they’re nicer to us than functions that aren’t continuous. But there are even nicer functions. Functions nicer to us. A continuous function, for example, can have corners; it can change direction suddenly and without warning. A differentiable function is more predictable. It can’t have corners like that. Knowing the function well at one point gives us more information about what it’s like nearby.

The derivative of a function doesn’t have to be continuous. Grumble. It’s nice when it is, though. It makes the function easier to work with. It’s really nice for us when the derivative itself has a derivative. Nothing guarantees that the derivative of a derivative is continuous. But maybe it is. Maybe the derivative of the derivative has a derivative. That’s a function we can do a lot with.

A function is “smooth” if it has as many derivatives as we need for whatever it is we’re doing. And if those derivatives are continuous. If this seems loose that’s because it is. A proof for whatever we’re interested in might need only the original function and its first derivative. It might need the original function and its first, second, third, and fourth derivatives. It might need hundreds of derivatives. If we look through the details of the proof we might find exactly how many derivatives we need and how many of them need to be continuous. But that’s tedious. We save ourselves considerable time by saying the function is “smooth”, as in, “smooth enough for what we need”.

If we do want to specify how many continuous derivatives a function has we call it a “Ck function”. The C here means continuous. The ‘k’ means there are the number ‘k’ continuous derivatives of it. This is completely different from a “Ck function”, which would be one that’s a k-dimensional vector. Whether the “C” is boldface or not is important. A function might have infinitely many continuous derivatives. That we call a “C function”. That’s got wonderful properties, especially if the domain and range are complex-valued numbers. We couldn’t do Complex Analysis without it. Complex Analysis is the course students take after wondering how they’ll ever survive Real Analysis. It’s much easier than Real Analysis. Mathematics can be strange.

## The End 2016 Mathematics A To Z: Principal

Functions. They’re at the center of so much mathematics. They have three pieces: a domain, a range, and a rule. The one thing functions absolutely must do is match stuff in the domain to one and only one thing in the range. So this is where it gets tricky.

## Principal.

Thing with this one-and-only-one thing in the range is it’s not always practical. Sometimes it only makes sense to allow for something in the domain to match several things in the range. For example, suppose we have the domain of positive numbers. And we want a function that gives us the numbers which, squared, are whatever the original function was. For any positive real number there’s two numbers that do that. 4 should match to both +2 and -2.

You might ask why I want a function that tells me the numbers which, squared, equal something. I ask back, what business is that of yours? I want a function that does this and shouldn’t that be enough? We’re getting off to a bad start here. I’m sorry; I’ve been running ragged the last few days. I blame the flat tire on my car.

Anyway. I’d want something like that function because I’m looking for what state of things makes some other thing true. This turns up often in “inverse problems”, problems in which we know what some measurement is and want to know what caused the measurement. We do that sort of problem all the time.

We can handle these multi-valued functions. Of course we can. Mathematicians are as good at loopholes as anyone else is. Formally we declare that the range isn’t the real numbers but rather sets of real numbers. My what-number-squared function then matches ‘4’ in the domain to the set of numbers ‘+2 and -2’. The set has several things in it, but there’s just the one set. Clever, huh?

This sort of thing turns up a lot. There’s two numbers that, squared, give us any real number (except zero). There’s three numbers that, squared, give us any real number (again except zero). Polynomials might have a whole bunch of numbers that make some equation true. Trig functions are worse. The tangent of 45 degrees equals 1. So is the tangent of 225 degrees. Also 405 degrees. Also -45 degrees. Also -585 degrees. OK, a mathematician would use radians instead of degrees, but that just changes what the numbers are. Not that there’s infinitely many of them.

It’s nice to have options. We don’t always want options. Sometimes we just want one blasted simple answer to things. It’s coded into the language. We say “the square root of four”. We speak of “the arctangent of 1”, which is to say, “the angle with tangent of 1”. We only say “all square roots of four” if we’re making a point about overlooking options.

If we’ve got a set of things, then we can pick out one of them. This is obvious, which means it is so very hard to prove. We just have to assume we can. Go ahead; assume we can. Our pick of the one thing out of this set is the “principal”. It’s not any more inherently right than the other possibilities. It’s just the one we choose to grab first.

So. The principal square root of four is positive two. The principal arctangent of 1 is 45 degrees, or in the dialect of mathematicians π divided by four. We pick these values over other possibilities because they’re nice. What makes them nice? Well, they’re nice. Um. Most of their numbers aren’t that big. They use positive numbers if we have a choice in the matter. Deep down we still suspect negative numbers of being up to something.

If nobody says otherwise then the principal square root is the positive one, or the one with a positive number in front of the imaginary part. If nobody says otherwise the principal arcsine is between -90 and +90 degrees (-π/2 and π/2). The principal arccosine is between 0 and 180 degrees (0 and π), unless someone says otherwise. The principal arctangent is … between -90 and 90 degrees, unless it’s between 0 and 180 degrees. You can count on the 0 to 90 part. Use your best judgement and roll with whatever develops for the other half of the range there. There’s not one answer that’s right for every possible case. The point of a principal value is to pick out one answer that’s usually a good starting point.

When you stare at what it means to be a function you realize that there’s a difference between the original function and the one that returns the principal value. The original function has a range that’s “sets of values”. The principal-value version has a range that’s just one value. If you’re being kind to your audience you make some note of that. Usually we note this by capitalizing the start of the function: “arcsin z” gives way to “Arcsin z”. “Log z” would be the principal-value version of “log z”. When you start pondering logarithms for negative numbers or for complex-valued numbers you get multiple values. It’s the same way that the arcsine function does.

And it’s good to warn your audience which principal value you mean, especially for the arc-trigonometric-functions or logarithms. (I’ve never seen someone break the square root convention.) The principal value is about picking the most obvious and easy-to-work-with value out of a set of them. It’s just impossible to get everyone to agree on what the obvious is.

## The End 2016 Mathematics A To Z: Local

Today’s is another of those words that means nearly what you would guess. There are still seven letters left, by the way, which haven’t had any requested terms. If you’d like something described please try asking.

## Local.

Stops at every station, rather than just the main ones.

OK, I’ll take it seriously.

So a couple years ago I visited Niagara Falls, and I stepped into the river, just above the really big drop.

I didn’t have any plans to go over the falls, and didn’t, but I liked the thrill of claiming I had. I’m not crazy, though; I picked a spot I knew was safe to step in. It’s only in the retelling I went into the Niagara River just above the falls.

Because yes, there is surely danger in certain spots of the Niagara River. But there are also spots that are perfectly safe. And not isolated spots either. I wouldn’t have been less safe if I’d stepped into the river a few feet closer to the edge. Nor if I’d stepped in a few feet farther away. Where I stepped in was locally safe.

Over in mathematics we do a lot of work on stuff that’s true or false depending on what some parameters are. We can look at bunches of those parameters, and they often look something like normal everyday space. There’s some values that are close to what we started from. There’s others that are far from that.

So, a “neighborhood” of some point is that point and some set of points containing it. It needs to be an “open” set, which means it doesn’t contain its boundary. So, like, everything less than one minute’s walk away, but not the stuff that’s precisely one minute’s walk away. (If we include boundaries we break stuff that we don’t want broken is why.) And certainly not the stuff more than one minute’s walk away. A neighborhood could have any shape. It’s easy to think of it as a little disc around the point you want. That’s usually the easiest to describe in a proof, because it’s “everything a distance less than (something) away”. (That “something” is either ‘δ’ or ‘ε’. Both Greek letters are called in to mean “a tiny distance”. They have different connotations about what the tiny distance is in.) It’s easiest to draw as little amoeba-like blob around a point, and contained inside a bigger amoeba-like blob.

Anyway, something is true “locally” to a point if it’s true in that neighborhood. That means true for everything in that neighborhood. Which is what you’d expect. “Local” means just that. It’s the stuff that’s close to where we started out.

Often we would like to know something “globally”, which means … er … everywhere. Universally so. But it’s usually easier to prove a thing locally. I suppose having a point where we know something is so makes it easier to prove things about what’s nearby. Distant stuff, who knows?

“Local” serves as an adjective for many things. We think of a “local maximum”, for example, or “local minimum”. This is where whatever we’re studying has a value bigger (or smaller) than anywhere else nearby has. Or we speak of a function being “locally continuous”, meaning that we know it’s continuous near this point and we make no promises away from it. It might be “locally differentiable”, meaning we can take derivatives of it close to some interesting point. We say nothing about what happens far from it.

Unless we do. We can talk about something being “local to infinity”. Your first reaction to that should probably be to slap the table and declare that’s it, we’re done. But we can make it sensible, at least to other mathematicians. We do it by starting with a neighborhood that contains the origin, zero, that point in the middle of everything. So, what’s the inverse of that? It’s everything that’s far enough away from the origin. (Don’t include the boundary, we don’t need those headaches.) So why not call that the “neighborhood of infinity”? Other than that it’s a weird set of words to put together? And if something is true in that “neighborhood of infinity”, what is that thing other than true “local to infinity”?

I don’t blame you for being skeptical.

## The End 2016 Mathematics A To Z: Image

It’s another free-choice entry. I’ve got something that I can use to make my Friday easier.

## Image.

So remember a while back I talked about what functions are? I described them the way modern mathematicians like. A function’s got three components to it. One is a set of things called the domain. Another is a set of things called the range. And there’s some rule linking things in the domain to things in the range. In shorthand we’ll write something like “f(x) = y”, where we know that x is in the domain and y is in the range. In a slightly more advanced mathematics class we’ll write $f: x \rightarrow y$. That maybe looks a little more computer-y. But I bet you can read that already: “f matches x to y”. Or maybe “f maps x to y”.

We have a couple ways to think about what ‘y’ is here. One is to say that ‘y’ is the image of ‘x’, under ‘f’. The language evokes camera trickery, or at least the way a trick lens might make us see something different. Pretend that the domain is something you could gaze at. If the domain is, say, some part of the real line, or a two-dimensional plane, or the like that’s not too hard to do. Then we can think of the rule part of ‘f’ as some distorting filter. When we look to where ‘x’ would be, we see the thing in the range we know as ‘y’.

At this point you probably imagine this is a pointless word to have. And that it’s backed up by a useless analogy. So it is. As far as I’ve gone this addresses a problem we don’t need to solve. If we want “the thing f matches x to” we can just say “f(x)”. Well, we write “f(x)”. We say “f of x”. Maybe “f at x”, or “f evaluated at x” if we want to emphasize ‘f’ more than ‘x’ or ‘f(x)’.

Where it gets useful is that we start looking at subsets. Bunches of points, not just one. Call ‘D’ some interesting-looking subset of the domain. What would it mean if we wrote the expression ‘f(D)’? Could we make that meaningful?

We do mean something by it. We mean what you might imagine by it. If you haven’t thought about what ‘f(D)’ might mean, take a moment — a short moment — and guess what it might. Don’t overthink it and you’ll have it right. I’ll put the answer just after this little bit so you can ponder.

So. ‘f(D)’ is a set. We make that set by taking, in turn, every single thing that’s in ‘D’. And find everything in the range that’s matched by ‘f’ to those things in ‘D’. Collect them all together. This set, ‘f(D)’, is “the image of D under f”.

We use images a lot when we’re studying how functions work. A function that maps a simple lump into a simple lump of about the same size is one thing. A function that maps a simple lump into a cloud of disparate particles is a very different thing. A function that describes how physical systems evolve will preserve the volume and some other properties of these lumps of space. But it might stretch out and twist around that space, which is how we discovered chaos.

Properly speaking, the range of a function ‘f’ is just the image of the whole domain under that ‘f’. But we’re not usually that careful about defining ranges. We’ll say something like ‘the domain and range are the sets of real numbers’ even though we only need the positive real numbers in the range. Well, it’s not like we’re paying for unnecessary range. Let me call the whole domain ‘X’, because I went and used ‘D’ earlier. Then the range, let me call that ‘Y’, would be ‘Y = f(X)’.

Images will turn up again. They’re a handy way to let us get at some useful ideas.

## The End 2016 Mathematics A To Z: The Fredholm Alternative

Some things are created with magnificent names. My essay today is about one of them. It’s one of my favorite terms and I get a strange little delight whenever it needs to be mentioned in a proof. It’s also the title I shall use for my 1970s Paranoid-Conspiracy Thriller.

## The Fredholm Alternative.

So the Fredholm Alternative is about whether this supercomputer with the ability to monitor every commercial transaction in the country falls into the hands of the Parallax Corporation or whether — ahm. Sorry. Wrong one. OK.

The Fredholm Alternative comes from the world of functional analysis. In functional analysis we study sets of functions with tools from elsewhere in mathematics. Some you’d be surprised aren’t already in there. There’s adding functions together, multiplying them, the stuff of arithmetic. Some might be a bit surprising, like the stuff we draw from linear algebra. That’s ideas like functions having length, or being at angles to each other. Or that length and those angles changing when we take a function of those functions. This may sound baffling. But a mathematics student who’s got into functional analysis usually has a happy surprise waiting. She discovers the subject is easy. At least, it relies on a lot of stuff she’s learned already, applied to stuff that’s less difficult to work with than, like, numbers.

(This may be a personal bias. I found functional analysis a thoroughgoing delight, even though I didn’t specialize in it. But I got the impression from other grad students that functional analysis was well-liked. Maybe we just got the right instructor for it.)

I’ve mentioned in passing “operators”. These are functions that have a domain that’s a set of functions and a range that’s another set of functions. Suppose you come up to me with some function, let’s say $f(x) = x^2$. I give you back some other function — say, $F(x) = \frac{1}{3}x^3 - 4$. Then I’m acting as an operator.

Why should I do such a thing? Many operators correspond to doing interesting stuff. Taking derivatives of functions, for example. Or undoing the work of taking a derivative. Describing how changing a condition changes what sorts of outcomes a process has. We do a lot of stuff with these. Trust me.

Let me use the name T’ for some operator. I’m not going to say anything about what it does. The letter’s arbitrary. We like to use capital letters for operators because it makes the operators look extra important. And we don’t want to use `O’ because that just looks like zero and we don’t need that confusion.

Anyway. We need two functions. One of them will be called ‘f’ because we always call functions ‘f’. The other we’ll call ‘v’. In setting up the Fredholm Alternative we have this important thing: we know what ‘f’ is. We don’t know what ‘v’ is. We’re finding out something about what ‘v’ might be. The operator doing whatever it does to a function we write down as if it were multiplication, that is, like ‘Tv’. We get this notation from linear algebra. There we multiple matrices by vectors. Matrix-times-vector multiplication works like operator-on-a-function stuff. So much so that if we didn’t use the same notation young mathematics grad students would rise in rebellion. “This is absurd,” they would say, in unison. “The connotations of these processes are too alike not to use the same notation!” And the department chair would admit they have a point. So we write ‘Tv’.

If you skipped out on mathematics after high school you might guess we’d write ‘T(v)’ and that would make sense too. And, actually, we do sometimes. But by the time we’re doing a lot of functional analysis we don’t need the parentheses so much. They don’t clarify anything we’re confused about, and they require all the work of parenthesis-making. But I do see it sometimes, mostly in older books. This makes me think mathematicians started out with ‘T(v)’ and then wrote less as people got used to what they were doing.

I admit we might not literally know what ‘f’ is. I mean we know what ‘f’ is in the same way that, for a quadratic equation, “ax2 + bx + c = 0”, we “know” what ‘a’, ‘b’, and ‘c’ are. Similarly we don’t know what ‘v’ is in the same way we don’t know what ‘x’ there is. The Fredholm Alternative tells us exactly one of these two things has to be true:

For operators that meet some requirements I don’t feel like getting into, either:

1. There’s one and only one ‘v’ which makes the equation $Tv = f$ true.
2. Or else $Tv = 0$ for some ‘v’ that isn’t just zero everywhere.

That is, either there’s exactly one solution, or else there’s no solving this particular equation. We can rule out there being two solutions (the way quadratic equations often have), or ten solutions (the way some annoying problems will), or infinitely many solutions (oh, it happens).

It turns up often in boundary value problems. Often before we try solving one we spend some time working out whether there is a solution. You can imagine why it’s worth spending a little time working that out before committing to a big equation-solving project. But it comes up elsewhere. Very often we have problems that, at their core, are “does this operator match anything at all in the domain to a particular function in the range?” When we try to answer we stumble across Fredholm’s Alternative over and over.

Fredholm here was Ivar Fredholm, a Swedish mathematician of the late 19th and early 20th centuries. He worked for Uppsala University, and for the Swedish Social Insurance Agency, and as an actuary for the Skandia insurance company. Wikipedia tells me that his mathematical work was used to calculate buyback prices. I have no idea how.

## Theorem Thursday: One Mean Value Theorem Of Many

For this week I have something I want to follow up on. We’ll see if I make it that far.

# The Mean Value Theorem.

My subject line disagrees with the header just above here. I want to talk about the Mean Value Theorem. It’s one of those things that turns up in freshman calculus and then again in Analysis. It’s introduced as “the” Mean Value Theorem. But like many things in calculus it comes in several forms. So I figure to talk about one of them here, and another form in a while, when I’ve had time to make up drawings.

Calculus can split effortlessly into two kinds of things. One is differential calculus. This is the study of continuity and smoothness. It studies how a quantity changes if someting affecting it changes. It tells us how to optimize things. It tells us how to approximate complicated functions with simpler ones. Usually polynomials. It leads us to differential equations, problems in which the rate at which something changes depends on what value the thing has.

The other kind is integral calculus. This is the study of shapes and areas. It studies how infinitely many things, all infinitely small, add together. It tells us what the net change in things are. It tells us how to go from information about every point in a volume to information about the whole volume.

They aren’t really separate. Each kind informs the other, and gives us tools to use in studying the other. And they are almost mirrors of one another. Differentials and integrals are not quite inverses, but they come quite close. And as a result most of the important stuff you learn in differential calculus has an echo in integral calculus. The Mean Value Theorem is among them.

The Mean Value Theorem is a rule about functions. In this case it’s functions with a domain that’s an interval of the real numbers. I’ll use ‘a’ as the name for the smallest number in the domain and ‘b’ as the largest number. People talking about the Mean Value Theorem often do. The range is also the real numbers, although it doesn’t matter which ones.

I’ll call the function ‘f’ in accord with a longrunning tradition of not working too hard to name functions. What does matter is that ‘f’ is continuous on the interval [a, b]. I’ve described what ‘continuous’ means before. It means that here too.

And we need one more thing. The function f has to be differentiable on the interval (a, b). You maybe noticed that before I wrote [a, b], and here I just wrote (a, b). There’s a difference here. We need the function to be continuous on the “closed” interval [a, b]. That is, it’s got to be continuous for ‘a’, for ‘b’, and for every point in-between.

But we only need the function to be differentiable on the “open” interval (a, b). That is, it’s got to be continuous for all the points in-between ‘a’ and ‘b’. If it happens to be differentiable for ‘a’, or for ‘b’, or for both, that’s great. But we won’t turn away a function f for not being differentiable at those points. Only the interior. That sort of distinction between stuff true on the interior and stuff true on the boundaries is common. This is why mathematicians have words for “including the boundaries” (“closed”) and “never minding the boundaries” (“open”).

As to what “differentiable” is … A function is differentiable at a point if you can take its derivative at that point. I’m sure that clears everything up. There are many ways to describe what differentiability is. One that’s not too bad is to imagine zooming way in on the curve representing a function. If you start with a big old wobbly function it waves all around. But pick a point. Zoom in on that. Does the function stay all wobbly, or does it get more steady, more straight? Keep zooming in. Does it get even straighter still? If you zoomed in over and over again on the curve at some point, would it look almost exactly like a straight line?

If it does, then the function is differentiable at that point. It has a derivative there. The derivative’s value is whatever the slope of that line is. The slope is that thing you remember from taking Boring Algebra in high school. That rise-over-run thing. But this derivative is a great thing to know. You could approximate the original function with a straight line, with slope equal to that derivative. Close to that point, you’ll make a small enough error nobody has to worry about it.

That there will be this straight line approximation isn’t true for every function. Here’s an example. Picture a line that goes up and then takes a 90-degree turn to go back down again. Look at the corner. However close you zoom in on the corner, there’s going to be a corner. It’s never going to look like a straight line; there’s a 90-degree angle there. It can be a smaller angle if you like, but any sort of corner breaks this differentiability. This is a point where the function isn’t differentiable.

There are functions that are nothing but corners. They can be differentiable nowhere, or only at a tiny set of points that can be ignored. (A set of measure zero, as the dialect would put it.) Mathematicians discovered this over the course of the 19th century. They got into some good arguments about how that can even make sense. It can get worse. Also found in the 19th century were functions that are continuous only at a single point. This smashes just about everyone’s intuition. But we can’t find a definition of continuity that’s as useful as the one we use now and avoids that problem. So we accept that it implies some pathological conclusions and carry on as best we can.

Now I get to the Mean Value Theorem in its differential calculus pelage. It starts with the endpoints, ‘a’ and ‘b’, and the values of the function at those points, ‘f(a)’ and ‘f(b)’. And from here it’s easiest to figure what’s going on if you imagine the plot of a generic function f. I recommend drawing one. Just make sure you draw it without lifting the pen from paper, and without including any corners anywhere. Something wiggly.

Draw the line that connects the ends of the wiggly graph. Formally, we’re adding the line segment that connects the points with coordinates (a, f(a)) and (b, f(b)). That’s coordinate pairs, not intervals. That’s clear in the minds of the mathematicians who don’t see why not to use parentheses over and over like this. (We are short on good grouping symbols like parentheses and brackets and braces.)

Per the Mean Value Theorem, there is at least one point whose derivative is the same as the slope of that line segment. If you were to slide the line up or down, without changing its orientation, you’d find something wonderful. Most of the time this line intersects the curve, crossing from above to below or vice-versa. But there’ll be at least one point where the shifted line is “tangent”, where it just touches the original curve. Close to that touching point, the “tangent point”, the shifted line and the curve blend together and can’t be easily told apart. As long as the function is differentiable on the open interval (a, b), and continuous on the closed interval [a, b], this will be true. You might convince yourself of it by drawing a couple of curves and taking a straightedge to the results.

This is an existence theorem. Like the Intermediate Value Theorem, it doesn’t tell us which point, or points, make the thing we’re interested in true. It just promises us that there is some point that does it. So it gets used in other proofs. It lets us mix information about intervals and information about points.

It’s tempting to try using it numerically. It looks as if it justifies a common differential-calculus trick. Suppose we want to know the value of the derivative at a point. We could pick a little interval around that point and find the endpoints. And then find the slope of the line segment connecting the endpoints. And won’t that be close enough to the derivative at the point we care about?

Well. Um. No, we really can’t be sure about that. We don’t have any idea what interval might make the derivative of the point we care about equal to this line-segment slope. The Mean Value Theorem won’t tell us. It won’t even tell us if there exists an interval that would let that trick work. We can’t invoke the Mean Value Theorem to let us get away with that.

Often, though, we can get away with it. Differentiable functions do have to follow some rules. Among them is that if you do pick a small enough interval then approximations that look like this will work all right. If the function flutters around a lot, we need a smaller interval. But a lot of the functions we’re interested in don’t flutter around that much. So we can get away with it. And there’s some grounds to trust in getting away with it. The Mean Value Theorem isn’t any part of the grounds. It just looks so much like it ought to be.

I hope on a later Thursday to look at an integral-calculus form of the Mean Value Theorem.

## A Leap Day 2016 Mathematics A To Z: X-Intercept

Oh, x- and y-, why are you so poor in mathematics terms? I brave my way.

## X-Intercept.

I did not get much out of my eighth-grade, pre-algebra, class. I didn’t connect with the teacher at all. There were a few little bits to get through my disinterest. One came in graphing. Not graph theory, of course, but the graphing we do in middle school and high school. That’s where we find points on the plane with coordinates that make some expression true. Two major terms kept coming up in drawing curves of lines. They’re the x-intercept and the y-intercept. They had this lovely, faintly technical, faintly science-y sound. I think the teacher emphasized a few times they were “intercepts”, not “intersects”. But it’s hard to explain to an eighth-grader why this is an important difference to make. I’m not sure I could explain it to myself.

An x-intercept is a point where the plot of a curve and the x-axis meet. So we’re assuming this is a Cartesian coordinate system, the kind marked off with a pair of lines meeting at right angles. It’s usually two-dimensional, sometimes three-dimensional. I don’t know anyone who’s worried about the x-intercept for a four-dimensional space. Even higher dimensions are right out. The thing that confused me the most, when learning this, is a small one. The x-axis is points that have a y-coordinate of zero. Not an x-coordinate of zero. So in a two-dimensional space it makes sense to describe the x-intercept as a single value. That’ll be the x-coordinate, and the point with the x-coordinate of that and the y-coordinate of zero is the intercept.

If you have an expression and you want to find an x-intercept, you need to find values of x which make the expression equal to zero. We get the idea from studying lines. There are a couple of typical representations of lines. They almost always use x for the horizontal coordinate, and y for the vertical coordinate. The names are only different if the author is making a point about the arbitrariness of variable names. Sigh at such an author and move on. An x-intercept has a y-coordinate of zero, so, set any appearance of ‘y’ in the expression equal to zero and find out what value or values of x make this true. If the expression is an equation for a line there’ll be just the one point, unless the line is horizontal. (If the line is horizontal, then either every point on the x-axis is an intercept, or else none of them are. The line is either “y equals zero”, or it is “y equals something other than zero”. )

There’s also a y-intercept. It is exactly what you’d imagine once you know that. It’s usually easier to find what the y-intercept is. The equation describing a curve is typically written in the form “y = f(x)”. That is, y is by itself on one side, and some complicated expression involving x’s is on the other. Working out what y is for a given x is straightforward. Working out what x is for a given y is … not hard, for a line. For more complicated shapes it can be difficult. There might not be a unique answer. That’s all right. There may be several x-intercepts.

There are a couple names for the x-intercepts. The one that turns up most often away from the pre-algebra and high school algebra study of lines is a “zero”. It’s one of those bits in which mathematicians seem to be trying to make it hard for students. A “zero” of the function f(x) is generally not what you get when you evaluate it for x equalling zero. Sorry about that. It’s the values of x for which f(x) equals zero. We also call them “roots”.

OK, but who cares?

Well, if you want to understand the shape of a curve, the way a function looks, it helps to plot it. Today, yeah, pull up Mathematica or Matlab or Octave or some other program and you get your plot. Fair enough. If you don’t have a computer that can plot like that, the way I did in middle school, you have to do it by hand. And then the intercepts are clues to how to sketch the function. They are, relatively, easy points which you can find, and which you know must be on the curve. We may form a very rough sketch of the curve. But that rough picture may be better than having nothing.

And we can learn about the behavior of functions even without plotting, or sketching a plot. Intercepts of expressions, or of parts of expressions, are points where the value might change from positive to negative. If the denominator of a part of the expression has an x-intercept, this could be a point where the function’s value is undefined. It may be a discontinuity in the function. The function’s values might jump wildly between one side and another. These are often the important things about understanding functions. Where are they positive? Where are they negative? Where are they continuous? Where are they not?

These are things we often want to know about functions. And we learn many of them by looking for the intercepts, x- and y-.

## A Leap Day 2016 Mathematics A To Z: Surjective Map

Gaurish today gives me one more request for the Leap Day Mathematics A To Z. And it lets me step away from abstract algebra again, into the world of analysis and what makes functions work. It also hovers around some of my past talk about functions.

## Surjective Map.

This request echoes one of the first terms from my Summer 2015 Mathematics A To Z. Then I’d spent some time on a bijection, or a bijective map. A surjective map is a less complicated concept. But if you understood bijective maps, you picked up surjective maps along the way.

By “map”, in this context, mathematicians don’t mean those diagrams that tell you where things are and how you might get there. Of course we don’t. By a “map” we mean that we have some rule that matches things in one set to things in another. If this sounds to you like what I’ve claimed a function is then you have a good ear. A mapping and a function are pretty much different names for one another. If there’s a difference in connotation I suppose it’s that a “mapping” makes a weaker suggestion that we’re necessarily talking about numbers.

(In some areas of mathematics, a mapping means a function with some extra properties, often some kind of continuity. Don’t worry about that. Someone will tell you when you’re doing mathematics deep enough to need this care. Mind, that person will tell you by way of a snarky follow-up comment picking on some minor point. It’s nothing personal. They just want you to appreciate that they’re very smart.)

So a function, or a mapping, has three parts. One is a set called the domain. One is a set called the range. And then there’s a rule matching things in the domain to things in the range. With functions we’re so used to the domain and range being the real numbers that we often forget to mention those parts. We go on thinking “the function” is just “the rule”. But the function is all three of these pieces.

A function has to match everything in the domain to something in the range. That’s by definition. There’s no unused scraps in the domain. If it looks like there is, that’s because were being sloppy in defining the domain. Or let’s be charitable. We assumed the reader understands the domain is only the set of things that make sense. And things make sense by being matched to something in the range.

Ah, but now, the range. The range could have unused bits in it. There’s nothing that inherently limits the range to “things matched by the rule to some thing in the domain”.

By now, then, you’ve probably spotted there have to be two kinds of functions. There’s one in which the whole range is used, and there’s ones in which it’s not. Good eye. This is exactly so.

If a function only uses part of the range, if it leaves out anything, even if it’s just a single value out of infinitely many, then the function is called an “into” mapping. If you like, it takes the domain and stuffs it into the range without filling the range.

Ah, but if a function uses every scrap of the range, with nothing left out, then we have an “onto” mapping. The whole of the domain gets sent onto the whole of the range. And this is also known as a “surjective” mapping. We get the term “surjective” from Nicolas Bourbaki. Bourbaki is/was the renowned 20th century mathematics art-collective group which did so much to place rigor and intuition-free bases into mathematics.

The term pairs up with the “injective” mapping. In this, the elements in the range match up with one and only one thing in the domain. So if you know the function’s rule, then if you know a thing in the range, you also know the one and only thing in the domain matched to that. If you don’t feel very French, you might call this sort of function one-to-one. That might be a better name for saying why this kind of function is interesting.

Not every function is injective. But then not every function is surjective either. But if a function is both injective and surjective — if it’s both one-to-one and onto — then we have a bijection. It’s a mapping that can represent the way a system changes and that we know how to undo. That’s pretty comforting stuff.

If we use a mapping to describe how a process changes a system, then knowing it’s a surjective map tells us something about the process. It tells us the process makes the system settle into a subset of all the possible states. That doesn’t mean the thing is stable — that little jolts get worn down. And it doesn’t mean that the thing is settling to a fixed state. But it is a piece of information suggesting that’s possible. This may not seem like a strong conclusion. But considering how little we know about the function it’s impressive to be able to say that much.