## My 2019 Mathematics A To Z: Operator

Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.

Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

## Operator.

An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same space but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Everything in a particular function space is, say, a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.

So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.

My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.

I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.

Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. But both the differential and the integral operators are linear, which lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function $af + bg$ is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of $af + bg$ is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.
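If you would like to watch that linearity happen, here is a minimal sketch. It assumes we stand in for the differential operator with a central-difference approximation; the names `derivative` and `combine`, and the particular choices of ‘f’, ‘g’, ‘a’, and ‘b’, are mine and purely illustrative.

```python
import math

def derivative(f, h=1e-5):
    """Return a new function that approximates f' by a central difference."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def combine(a, f, b, g):
    """Build the function a*f + b*g out of two functions and two scalars."""
    return lambda x: a * f(x) + b * g(x)

f = math.sin
g = lambda x: x ** 3
a, b = 2.0, -3.0

x = 0.7
lhs = derivative(combine(a, f, b, g))(x)            # derivative of (a f + b g)
rhs = a * derivative(f)(x) + b * derivative(g)(x)   # a (derivative of f) + b (derivative of g)
print(lhs, rhs)  # the two agree up to floating-point rounding
```

Notice that `derivative` takes a function and hands back a function. That is the operator idea in miniature, even though this stand-in is only an approximation of the real differential operator.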

I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.

## Functor.

So, category theory. It’s a foundational field. It talks about stuff that’s terribly abstract. This means it’s powerful, but it can be hard to think of interesting examples. I’ll try, though.

It starts with categories. These have three parts. The first part is a set of things. (There always is.) The second part is a collection of matches between pairs of things in the set. They’re called morphisms. The third part is a rule that lets us combine two morphisms into a new, third one. That is, suppose ‘a’, ‘b’, and ‘c’ are things in the set. Then if there’s a morphism that matches $a \rightarrow b$, and a morphism that matches $b \rightarrow c$, we can combine them into another morphism that matches $a \rightarrow c$. So we have a set of things, and a set of things we can do with those things. And the set of things we can do has structure of its own, reminiscent of a group: combining morphisms is associative, and every thing has an identity morphism that leaves it alone.

This describes a lot of stuff. Group theory fits seamlessly into this description. Most of what we do with numbers is a kind of group theory. Vector spaces do too. Most of what we do with analysis has vector spaces underneath it. Topology does too. Most of what we do with geometry is an expression of topology. So you see why category theory is so foundational.

Functors enter our picture when we have two categories. Or more. They’re about the ways we can match up categories. But let’s start with two categories. One of them I’ll name ‘C’, and the other, ‘D’. A functor has to match everything that’s in the set of ‘C’ to something that’s in the set of ‘D’.

And it does more. It has to match every morphism between things in ‘C’ to some other morphism, between corresponding things in ‘D’. It’s got to do it in a way that satisfies that combining, too. That is, suppose that ‘f’ and ‘g’ are morphisms for ‘C’. And that ‘f’ and ‘g’ combine to make ‘h’. Then, the functor has to match ‘f’ and ‘g’ and ‘h’ to some morphisms for ‘D’. The combination of whatever ‘f’ matches to and whatever ‘g’ matches to has to be whatever ‘h’ matches to.
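That combining rule can be sketched in code. This is only an illustration under my own assumptions: the “things” are Python types, the morphisms are ordinary functions between them, and the functor is the familiar “apply this to every item of a list” construction. The names `compose` and `fmap` are hypothetical helpers, not anything from a library.

```python
def compose(g, f):
    """Combine two morphisms: do f first, then g."""
    return lambda x: g(f(x))

def fmap(f):
    """Match a function on items to a function on lists of items."""
    return lambda xs: [f(x) for x in xs]

f = len                  # a morphism from str to int
g = lambda n: n * n      # a morphism from int to int
h = compose(g, f)        # f and g combine to make h

words = ["set", "morphism", "functor"]
via_h = fmap(h)(words)                          # whatever h matches to
via_parts = compose(fmap(g), fmap(f))(words)    # combine what f and g match to
print(via_h, via_parts)  # [9, 64, 49] both times
```

Both routes give the same list, which is the functor respecting the way morphisms combine.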

This might sound to you like a homomorphism. If it does, I admire your memory or mathematical prowess. Functors are about matching one thing to another in a way that preserves structure. Structure is the way that sets of things can interact. We naturally look for stuff made up of different things that have the same structure. Yes, functors are themselves a category. That is, you can make a brand-new category whose set of things are the functors between two other categories. This is a good spot to pause while the dizziness passes.

There are two kingdoms of functor. You tell them apart by what they do with the morphisms. Here again I’m going to need my categories ‘C’ and ‘D’. I need a morphism for ‘C’. I’ll call that ‘f’. ‘f’ has to match something in the set of ‘C’ to something in the set of ‘C’. Let me call the first something ‘a’, and the second something ‘b’. That’s all right so far? Thank you.

Let me call my functor ‘F’. ‘F’ matches all the elements in ‘C’ to elements in ‘D’. And it matches all the morphisms on the elements in ‘C’ to morphisms on the elements in ‘D’. So if I write ‘F(a)’, what I mean is look at the element ‘a’ in the set for ‘C’. Then look at what element in the set for ‘D’ the functor matches with ‘a’. If I write ‘F(b)’, what I mean is look at the element ‘b’ in the set for ‘C’. Then pick out whatever element in the set for ‘D’ gets matched to ‘b’. If I write ‘F(f)’, what I mean is to look at the morphism ‘f’ between elements in ‘C’. Then pick out whatever morphism between elements in ‘D’ that that gets matched with.

Here’s where I’m going with this. Suppose my morphism ‘f’ matches ‘a’ to ‘b’. Does the functor of that morphism, ‘F(f)’, match ‘F(a)’ to ‘F(b)’? Of course, you say, what else could it do? And the answer is: why couldn’t it match ‘F(b)’ to ‘F(a)’?

No, it doesn’t break everything. Not if you’re consistent about swapping the order of the matchings. The normal everyday order, the one you’d thought couldn’t have an alternative, is a “covariant functor”. The crosswise order, this second thought, is a “contravariant functor”. Covariant and contravariant are distinctions that weave through much of mathematics. They particularly appear through tensors and the geometry they imply. In that introduction they tend to be difficult, even mean, creations, since in regular old Euclidean space they don’t mean anything different. They’re different for non-Euclidean spaces, and that’s important and valuable. The covariant versus contravariant difference is easier to grasp here.
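Here is a small sketch of the crosswise order, again under assumptions of my own. Take ‘F(a)’ to be the yes-or-no tests you can run on things of type ‘a’. A function from ‘a’ to ‘b’ then turns a test on ‘b’ into a test on ‘a’, which runs the opposite way. The helper names `compose` and `contramap` are hypothetical.

```python
def compose(g, f):
    """Combine two morphisms: do f first, then g."""
    return lambda x: g(f(x))

def contramap(f):
    """Match f: a -> b to a map from tests on b to tests on a."""
    return lambda predicate: (lambda x: predicate(f(x)))

f = len                      # a morphism from str to int
g = lambda n: n % 2          # a morphism from int to int
is_zero = lambda n: n == 0   # a test on int

# Combining is respected, but with the order of the pieces swapped:
lhs = contramap(compose(g, f))(is_zero)
rhs = compose(contramap(f), contramap(g))(is_zero)
print(lhs("abcd"), rhs("abcd"))  # True True
print(lhs("abc"), rhs("abc"))    # False False
```

The thing to look at is the last `compose`: where a covariant functor would combine the image of ‘g’ after the image of ‘f’, this one combines the image of ‘f’ after the image of ‘g’. Consistent swapping, and nothing breaks.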

Functors work their way into computer science. The avenue here is in functional programming. That’s a method of programming in which instead of the normal long list of commands, you write a single line of code that holds like fourteen “->” symbols that makes the computer stop and catch fire when it encounters a bug. The advantage is that when you have the code debugged it’s quite speedy and memory-efficient. The disadvantage is if you have to alter the function later, it’s easiest to throw everything out and start from scratch, beginning from vacuum-tube-based computing machines. But it works well while it does. You just have to get the hang of it.

## The End 2016 Mathematics A To Z: Weierstrass Function

I’ve teased this one before.

## Weierstrass Function.

So you know how the Earth is a sphere, but from our normal vantage point right up close to its surface it looks flat? That happens with functions too. Here I mean the normal kinds of functions we deal with, ones with domains that are the real numbers or a Euclidean space. And ranges that are real numbers. The functions you can draw on a sheet of paper with some wiggly bits. Let the function wiggle as much as you want. Pick a part of it and zoom in close. That zoomed-in part will look straight. If it doesn’t look straight, zoom in closer.

We rely on this. Functions that are straight, or at least straight enough, are easy to work with. We can do calculus on them. We can do analysis on them. Functions with plots that look like straight lines are easy to work with. Often the best approach to working with the function you’re interested in is to approximate it with an easy-to-work-with function. I bet it’ll be a polynomial. That serves us well. Polynomials are these continuous functions. They’re differentiable. They’re smooth.

That thing about the Earth looking flat, though? That’s a lie. I’ve never been to any of the really great cuts in the Earth’s surface, but I have been to some decent gorges. I went to grad school in the Hudson River Valley. I’ve driven I-80 over Pennsylvania’s scariest bridges. There’s points where the surface of the Earth just drops a great distance between one footstep and your next.

Functions do that too. We can have points where a function isn’t differentiable, where it’s impossible to define the direction it’s headed. We can have points where a function isn’t continuous, where it jumps from one region of values to another region. Everyone knows this. We can’t dismiss those as aberrations not worthy of the name “function”; too many of them are too useful. Typically we handle this by admitting there’s points that aren’t continuous and we chop the function up. We make it into a couple of functions, each stretching from discontinuity to discontinuity. Between them we have a continuous region and we can go about our business as before.

Then came the 19th century when things got crazy. This particular craziness we credit to Karl Weierstrass. Weierstrass’s name is all over 19th century analysis. He had that talent for probing the limits of our intuition about basic mathematical ideas. We have a calculus that is logically rigorous because he found great counterexamples to what we had assumed without proving.

The Weierstrass function challenges this idea that any function is going to eventually level out. Or that we can even smooth a function out into basically straight, predictable chunks in-between sudden changes of direction. The function is continuous everywhere; you can draw it perfectly without lifting your pen from paper. But it always looks like a zig-zag pattern, jumping around like it was always randomly deciding whether to go up or down next. Zoom in on any patch and it still jumps around, zig-zagging up and down. There’s never an interval where it’s always moving up, or always moving down, or even just staying constant.

Despite being continuous it’s not differentiable. I’ve described that casually as it being impossible to predict where the function is going. That’s an abuse of words, yes. The function is defined. Its value at a point isn’t any more random than the value of $x^2$ is for any particular x. The unpredictability I’m talking about here is a side effect of ignorance. Imagine I showed you a plot of $x^2$ with a part of it concealed and asked you to fill in the gap. You’d probably do pretty well estimating it. The Weierstrass function, though? No; your guess would be lousy. My guess would be lousy too.
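For anyone who wants to poke at it, here is a rough sketch. The essay doesn’t write the function down, so I am assuming the usual textbook form, the sum of $a^n \cos(b^n \pi x)$ over n, with a = 1/2 and b = 13 (an odd integer with $ab > 1 + \frac{3}{2}\pi$). I also cut the sum off after twenty terms, so this is a partial sum and not the function itself. The names `weierstrass` and `difference_quotient` are mine.

```python
import math

def weierstrass(x, a=0.5, b=13, terms=20):
    """Partial sum of a**n * cos(b**n * pi * x) for n = 0 .. terms - 1."""
    return sum(a ** n * math.cos(b ** n * math.pi * x) for n in range(terms))

def difference_quotient(x, h):
    """Slope of the chord from x to x + h: what a derivative is the limit of."""
    return (weierstrass(x + h) - weierstrass(x)) / h

# Zoom in on x = 0 with smaller and smaller steps h = 13**-m.
quotients = [difference_quotient(0.0, 13.0 ** -m) for m in range(1, 5)]
for m, q in enumerate(quotients, start=1):
    print(m, q)
```

For something like $x^2$ these chord slopes would settle toward a single number as the step shrinks. Here they keep getting larger in size instead, which is a numerical hint, not a proof, of what failing to be differentiable looks like.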

That’s a weird thing to have happen. A century and a half later it’s still weird. It gets weirder. The classic Weierstrass function isn’t differentiable anywhere. But some of its close relatives, such as a similar series that Riemann proposed, have exceptions. There are little dots of differentiability, where the rate at which the function changes is known. Not intervals, though. Single points. This is crazy. Derivatives are about how a function changes. We work out what they should even mean by thinking of a function’s value on strips of the domain. Those strips are small, but they’re still, you know, strips. But on almost all of that strip the derivative isn’t defined. It’s only at isolated points, a set with measure zero, that this derivative even exists. It evokes the medieval Mysteries, of how we are supposed to try, even though we know we shall fail, to understand how God can have contradictory properties.

It’s not quite that Mysterious here. Properties like this challenge our intuition, if we’ve gotten any. Once we’ve laid out good definitions for ideas like “derivative” and “continuous” and “limit” and “function” we can work out whether results like this make sense. And they — well, they follow. We can avoid weird conclusions like this, but at the cost of messing up our definitions for what a “function” and other things are. Making those useless. For the mathematical world to make sense, we have to change our idea of what quite makes sense.

That’s all right. When we look close we realize the Earth around us is never flat. Even reasonably flat areas have slight rises and falls. The ends of properties are marked with curbs or ditches, and bordered by streets that rise to a center. Look closely even at the dirt and we notice that as level as it gets there are still rocks and scratches in the ground, clumps of dirt an infinitesimal bit higher here and lower there. The flatness of the Earth around us is a useful tool, but we miss a lot by pretending it’s everything. The Weierstrass function is one of the ways a student mathematician learns that while smooth, predictable functions are essential, there is much more out there.

## The End 2016 Mathematics A To Z: Smooth

Mathematicians affect a pose of objectivity. We justify this by working on things whose truth we can know, and which must be true whenever we accept certain rules of deduction and certain definitions and axioms. This seems fair. But we choose to pay attention to things that interest us for particular reasons. We study things we like. My A To Z glossary term for today is about one of those things we like.

## Smooth.

Functions. Not everything mathematicians do is functions. But functions turn up a lot. We need to set some rules. “A function” is so generic a thing we can’t handle it much. Narrow it down. Pick functions with domains that are numbers. Range too. By numbers I mean real numbers, maybe complex numbers. That gives us something.

There’s functions that are hard to work with. This is almost all of them, so we don’t touch them unless we absolutely must. But they’re functions that aren’t continuous. That means what you imagine. The value of the function at some point is wholly unrelated to its value at some nearby point. It’s hard to work with anything that’s unpredictable like that. Functions as well as people.

We like functions that are continuous. They’re predictable. We can make approximations. We can estimate the function’s value at some point using its value at some more convenient point. It’s easy to see why that’s useful for numerical mathematics, for calculations to approximate stuff. The dazzling thing is it’s useful analytically. We step into the Platonic-ideal world of pure mathematics. We have tools that let us work as if we had infinitely many digits of precision, for infinitely many numbers at once. And yet we use estimates and approximations and errors. We use them in ways to give us perfect knowledge; we get there by estimates.

Continuous functions are nice. Well, they’re nicer to us than functions that aren’t continuous. But there are even nicer functions. Functions nicer to us. A continuous function, for example, can have corners; it can change direction suddenly and without warning. A differentiable function is more predictable. It can’t have corners like that. Knowing the function well at one point gives us more information about what it’s like nearby.

The derivative of a function doesn’t have to be continuous. Grumble. It’s nice when it is, though. It makes the function easier to work with. It’s really nice for us when the derivative itself has a derivative. Nothing guarantees that the derivative of a derivative is continuous. But maybe it is. Maybe the derivative of the derivative has a derivative. That’s a function we can do a lot with.

A function is “smooth” if it has as many derivatives as we need for whatever it is we’re doing. And if those derivatives are continuous. If this seems loose that’s because it is. A proof for whatever we’re interested in might need only the original function and its first derivative. It might need the original function and its first, second, third, and fourth derivatives. It might need hundreds of derivatives. If we look through the details of the proof we might find exactly how many derivatives we need and how many of them need to be continuous. But that’s tedious. We save ourselves considerable time by saying the function is “smooth”, as in, “smooth enough for what we need”.

If we do want to specify how many continuous derivatives a function has we call it a “$C^k$ function”. The C here means continuous. The ‘k’ means there are the number ‘k’ continuous derivatives of it. This is completely different from a “$\mathbf{C}^k$ function”, which would be one that’s a k-dimensional vector. Whether the “C” is boldface or not is important. A function might have infinitely many continuous derivatives. That we call a “$C^\infty$ function”. That’s got wonderful properties, especially if the domain and range are complex-valued numbers. We couldn’t do Complex Analysis without it. Complex Analysis is the course students take after wondering how they’ll ever survive Real Analysis. It’s much easier than Real Analysis. Mathematics can be strange.
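A worked example might make the counting concrete. The function here is my own choice, not one from the discussion above: take $f(x) = x|x|$ on the real numbers.

```latex
\begin{align*}
f(x)   &= x\,|x| \\
f'(x)  &= 2\,|x| \quad \text{for every real } x
          \text{, since } \lim_{h \to 0} \frac{h\,|h| - 0}{h} = 0 \text{ at } x = 0 \\
f''(x) &= \begin{cases} 2 & x > 0 \\ -2 & x < 0 \end{cases}
          \quad \text{with no value at } x = 0
\end{align*}
```

The first derivative exists everywhere and is continuous, so this ‘f’ has one continuous derivative. The second derivative fails at zero, where $2|x|$ has a corner. So ‘f’ has exactly one continuous derivative and no more: smooth enough for a proof that needs one derivative, not smooth enough for one that needs two.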

## The End 2016 Mathematics A To Z: Principal

Functions. They’re at the center of so much mathematics. They have three pieces: a domain, a range, and a rule. The one thing functions absolutely must do is match stuff in the domain to one and only one thing in the range. So this is where it gets tricky.

## Principal.

The trouble with this one-and-only-one thing in the range is it’s not always practical. Sometimes it only makes sense to allow for something in the domain to match several things in the range. For example, suppose we have the domain of positive numbers. And we want a function that gives us the numbers which, squared, are whatever the original number was. For any positive real number there’s two numbers that do that. 4 should match to both +2 and -2.

You might ask why I want a function that tells me the numbers which, squared, equal something. I ask back, what business is that of yours? I want a function that does this and shouldn’t that be enough? We’re getting off to a bad start here. I’m sorry; I’ve been running ragged the last few days. I blame the flat tire on my car.

Anyway. I’d want something like that function because I’m looking for what state of things makes some other thing true. This turns up often in “inverse problems”, problems in which we know what some measurement is and want to know what caused the measurement. We do that sort of problem all the time.

We can handle these multi-valued functions. Of course we can. Mathematicians are as good at loopholes as anyone else is. Formally we declare that the range isn’t the real numbers but rather sets of real numbers. My what-number-squared function then matches ‘4’ in the domain to the set of numbers ‘+2 and -2’. The set has several things in it, but there’s just the one set. Clever, huh?
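The loophole is easy to sketch. This assumes we only care about real square roots of positive numbers; the function name `square_roots` is made up for the occasion.

```python
import math

def square_roots(y):
    """Match a positive real y to the one set of reals whose square is y."""
    if y <= 0:
        raise ValueError("this sketch only covers positive numbers")
    r = math.sqrt(y)
    return frozenset({r, -r})

roots_of_four = square_roots(4)
print(roots_of_four)  # one set, holding both 2.0 and -2.0
```

Each input gets exactly one output, so this is a perfectly respectable function. The output just happens to be a set with two things in it.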

This sort of thing turns up a lot. There’s two numbers that, squared, give us any real number (except zero). There’s three numbers that, cubed, give us any real number (again except zero). Polynomials might have a whole bunch of numbers that make some equation true. Trig functions are worse. The tangent of 45 degrees equals 1. So does the tangent of 225 degrees. Also 405 degrees. Also -135 degrees. Also -495 degrees. OK, a mathematician would use radians instead of degrees, but that just changes what the numbers are. Not that there’s infinitely many of them.

It’s nice to have options. We don’t always want options. Sometimes we just want one blasted simple answer to things. It’s coded into the language. We say “the square root of four”. We speak of “the arctangent of 1”, which is to say, “the angle with tangent of 1”. We only say “all square roots of four” if we’re making a point about overlooking options.

If we’ve got a set of things, then we can pick out one of them. This is obvious, which means it is so very hard to prove. We just have to assume we can. Go ahead; assume we can. Our pick of the one thing out of this set is the “principal”. It’s not any more inherently right than the other possibilities. It’s just the one we choose to grab first.

So. The principal square root of four is positive two. The principal arctangent of 1 is 45 degrees, or in the dialect of mathematicians π divided by four. We pick these values over other possibilities because they’re nice. What makes them nice? Well, they’re nice. Um. Most of their numbers aren’t that big. They use positive numbers if we have a choice in the matter. Deep down we still suspect negative numbers of being up to something.

If nobody says otherwise then the principal square root is the positive one, or the one with a positive number in front of the imaginary part. If nobody says otherwise the principal arcsine is between -90 and +90 degrees (-π/2 and π/2). The principal arccosine is between 0 and 180 degrees (0 and π), unless someone says otherwise. The principal arctangent is … between -90 and 90 degrees, unless it’s between 0 and 180 degrees. You can count on the 0 to 90 part. Use your best judgement and roll with whatever develops for the other half of the range there. There’s not one answer that’s right for every possible case. The point of a principal value is to pick out one answer that’s usually a good starting point.

When you stare at what it means to be a function you realize that there’s a difference between the original function and the one that returns the principal value. The original function has a range that’s “sets of values”. The principal-value version has a range that’s just one value. If you’re being kind to your audience you make some note of that. Usually we note this by capitalizing the start of the function: “arcsin z” gives way to “Arcsin z”. “Log z” would be the principal-value version of “log z”. When you start pondering logarithms for negative numbers or for complex-valued numbers you get multiple values, the same way you do with the arcsine function.
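Programming languages have to make these choices too. As a sketch, here is what Python’s standard `math` and `cmath` modules pick, next to a hypothetical helper of mine, `other_logs`, that lists a few of the values the principal logarithm passed over.

```python
import cmath
import math

principal_sqrt = cmath.sqrt(-4)   # the square root with positive imaginary part
principal_atan = math.atan(1.0)   # an angle between -pi/2 and pi/2, in radians
principal_log = cmath.log(-1)     # the logarithm with imaginary part pi

def other_logs(z, ks=range(-2, 3)):
    """A few of the many logarithms of z: Log z + 2*pi*i*k for whole numbers k."""
    return [cmath.log(z) + 2j * math.pi * k for k in ks]

print(principal_sqrt, principal_atan, principal_log)
for w in other_logs(-1):
    print(w, cmath.exp(w))  # every one of these exponentiates back to about -1
```

Python settles the arctangent question in favor of the -90 to 90 degree range. That is one library’s convention, which is rather the point: someone had to pick.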

And it’s good to warn your audience which principal value you mean, especially for the arc-trigonometric-functions or logarithms. (I’ve never seen someone break the square root convention.) The principal value is about picking the most obvious and easy-to-work-with value out of a set of them. It’s just impossible to get everyone to agree on what the obvious is.

## The End 2016 Mathematics A To Z: Local

Today’s is another of those words that means nearly what you would guess. There are still seven letters left, by the way, which haven’t had any requested terms. If you’d like something described please try asking.

## Local.

Stops at every station, rather than just the main ones.

OK, I’ll take it seriously.

So a couple years ago I visited Niagara Falls, and I stepped into the river, just above the really big drop.

[Photo caption: Niagara Falls, demonstrating some locally unsafe waters to be in. Background: Canada (left), United States (right).]

I didn’t have any plans to go over the falls, and didn’t, but I liked the thrill of claiming I had. I’m not crazy, though; I picked a spot I knew was safe to step in. It’s only in the retelling I went into the Niagara River just above the falls.

Because yes, there is surely danger in certain spots of the Niagara River. But there are also spots that are perfectly safe. And not isolated spots either. I wouldn’t have been less safe if I’d stepped into the river a few feet closer to the edge. Nor if I’d stepped in a few feet farther away. Where I stepped in was locally safe.

[Photo caption: The Niagara River, and some locally safe enough waters to be in. That’s not me in the picture; if you do know who it is, I have no way of challenging you. But it’s the area I stepped into and felt this lovely illicit thrill doing so.]

Over in mathematics we do a lot of work on stuff that’s true or false depending on what some parameters are. We can look at bunches of those parameters, and they often look something like normal everyday space. There’s some values that are close to what we started from. There’s others that are far from that.

So, a “neighborhood” of some point is some set of points containing that point. It needs to be an “open” set, which means it doesn’t contain its boundary. So, like, everything less than one minute’s walk away, but not the stuff that’s precisely one minute’s walk away. (If we include boundaries we break stuff that we don’t want broken is why.) And certainly not the stuff more than one minute’s walk away. A neighborhood could have any shape. It’s easy to think of it as a little disc around the point you want. That’s usually the easiest to describe in a proof, because it’s “everything a distance less than (something) away”. (That “something” is either ‘δ’ or ‘ε’. Both Greek letters are called in to mean “a tiny distance”. They have different connotations about what the tiny distance is in.) It’s easiest to draw as a little amoeba-like blob around a point, and contained inside a bigger amoeba-like blob.

Anyway, something is true “locally” to a point if it’s true in that neighborhood. That means true for everything in that neighborhood. Which is what you’d expect. “Local” means just that. It’s the stuff that’s close to where we started out.

Often we would like to know something “globally”, which means … er … everywhere. Universally so. But it’s usually easier to prove a thing locally. I suppose having a point where we know something is so makes it easier to prove things about what’s nearby. Distant stuff, who knows?

“Local” serves as an adjective for many things. We think of a “local maximum”, for example, or “local minimum”. This is where whatever we’re studying has a value bigger (or smaller) than anywhere else nearby has. Or we speak of a function being “locally continuous”, meaning that we know it’s continuous near this point and we make no promises away from it. It might be “locally differentiable”, meaning we can take derivatives of it close to some interesting point. We say nothing about what happens far from it.
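A local maximum is easy to sketch. The function $x^3 - 3x$ is my own pick, and the check below only samples points in the neighborhood, so it is a sanity check under that assumption and not a proof. The name `is_local_max` is hypothetical.

```python
def f(x):
    return x ** 3 - 3 * x

def is_local_max(func, point, delta, samples=1000):
    """Compare func(point) against sample points of the open interval
    (point - delta, point + delta); the endpoints are deliberately left out."""
    for i in range(1, samples):
        x = point - delta + 2 * delta * i / samples
        if func(x) > func(point):
            return False
    return True

print(is_local_max(f, -1.0, 0.5))   # True: nothing within 0.5 of -1 beats f(-1) = 2
print(f(3.0) > f(-1.0))             # True: far away, the function gets much bigger
print(is_local_max(f, -1.0, 5.0))   # False: widen the neighborhood and the claim fails
```

So the point at -1 is a maximum locally, for a small enough neighborhood, and we make no promises away from it.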

Unless we do. We can talk about something being “local to infinity”. Your first reaction to that should probably be to slap the table and declare that’s it, we’re done. But we can make it sensible, at least to other mathematicians. We do it by starting with a neighborhood that contains the origin, zero, that point in the middle of everything. So, what’s the inverse of that? It’s everything that’s far enough away from the origin. (Don’t include the boundary, we don’t need those headaches.) So why not call that the “neighborhood of infinity”? Other than that it’s a weird set of words to put together? And if something is true in that “neighborhood of infinity”, what is that thing other than true “local to infinity”?

I don’t blame you for being skeptical.

## The End 2016 Mathematics A To Z: Image

It’s another free-choice entry. I’ve got something that I can use to make my Friday easier.

## Image.

So remember a while back I talked about what functions are? I described them the way modern mathematicians like. A function’s got three components to it. One is a set of things called the domain. Another is a set of things called the range. And there’s some rule linking things in the domain to things in the range. In shorthand we’ll write something like “f(x) = y”, where we know that x is in the domain and y is in the range. In a slightly more advanced mathematics class we’ll write $f: x \rightarrow y$. That maybe looks a little more computer-y. But I bet you can read that already: “f matches x to y”. Or maybe “f maps x to y”.

We have a couple ways to think about what ‘y’ is here. One is to say that ‘y’ is the image of ‘x’, under ‘f’. The language evokes camera trickery, or at least the way a trick lens might make us see something different. Pretend that the domain is something you could gaze at. If the domain is, say, some part of the real line, or a two-dimensional plane, or the like that’s not too hard to do. Then we can think of the rule part of ‘f’ as some distorting filter. When we look to where ‘x’ would be, we see the thing in the range we know as ‘y’.

At this point you probably imagine this is a pointless word to have. And that it’s backed up by a useless analogy. So it is. As far as I’ve gone this addresses a problem we don’t need to solve. If we want “the thing f matches x to” we can just say “f(x)”. Well, we write “f(x)”. We say “f of x”. Maybe “f at x”, or “f evaluated at x” if we want to emphasize ‘f’ more than ‘x’ or ‘f(x)’.

Where it gets useful is that we start looking at subsets. Bunches of points, not just one. Call ‘D’ some interesting-looking subset of the domain. What would it mean if we wrote the expression ‘f(D)’? Could we make that meaningful?

We do mean something by it. We mean what you might imagine by it. If you haven’t thought about what ‘f(D)’ might mean, take a moment, a short moment, and guess what it might. Don’t overthink it and you’ll have it right. I’ll put the answer just after this little bit so you can ponder.

[Photo caption: Our pet rabbit on the beach in Omena, Michigan back in July this year. Which is a small town on the Traverse Bay, which is just off Lake Michigan where … oh, you have Google Maps, you don’t need me. Anyway we wondered what he would make of vast expanses of water, considering he doesn’t like water what with being a rabbit and all that. And he watched it for a while and then shuffled his way in to where the waves come up and could wash over his front legs, making us wonder what kind of crazy rabbit he is, exactly.]

So. ‘f(D)’ is a set. We make that set by taking, in turn, every single thing that’s in ‘D’. And find everything in the range that’s matched by ‘f’ to those things in ‘D’. Collect them all together. This set, ‘f(D)’, is “the image of D under f”.
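For a finite set the recipe is short enough to write out. This is a sketch with made-up names: `image` is my helper, and the squaring function and the set ‘D’ are just convenient examples.

```python
def image(f, subset):
    """Collect everything that f matches the members of subset to."""
    return {f(x) for x in subset}

D = {-2, -1, 0, 1, 2}
square = lambda x: x * x

image_of_D = image(square, D)
print(image_of_D)  # {0, 1, 4}
```

Five things went in and three came out, since -2 and 2 land on the same spot, as do -1 and 1. The image is a set, so duplicates collapse.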

We use images a lot when we’re studying how functions work. A function that maps a simple lump into a simple lump of about the same size is one thing. A function that maps a simple lump into a cloud of disparate particles is a very different thing. A function that describes how physical systems evolve will preserve the volume and some other properties of these lumps of space. But it might stretch out and twist around that space, which is how we discovered chaos.

Properly speaking, the range of a function ‘f’ is just the image of the whole domain under that ‘f’. But we’re not usually that careful about defining ranges. We’ll say something like ‘the domain and range are the sets of real numbers’ even though we only need the positive real numbers in the range. Well, it’s not like we’re paying for unnecessary range. Let me call the whole domain ‘X’, because I went and used ‘D’ earlier. Then the range, let me call that ‘Y’, would be ‘Y = f(X)’.

Images will turn up again. They’re a handy way to let us get at some useful ideas.

## The End 2016 Mathematics A To Z: The Fredholm Alternative

Some things are created with magnificent names. My essay today is about one of them. It’s one of my favorite terms and I get a strange little delight whenever it needs to be mentioned in a proof. It’s also the title I shall use for my 1970s Paranoid-Conspiracy Thriller.

## The Fredholm Alternative.

So the Fredholm Alternative is about whether this supercomputer with the ability to monitor every commercial transaction in the country falls into the hands of the Parallax Corporation or whether — ahm. Sorry. Wrong one. OK.

The Fredholm Alternative comes from the world of functional analysis. In functional analysis we study sets of functions with tools from elsewhere in mathematics. Some you’d be surprised aren’t already in there. There’s adding functions together, multiplying them, the stuff of arithmetic. Some might be a bit surprising, like the stuff we draw from linear algebra. That’s ideas like functions having length, or being at angles to each other. Or that length and those angles changing when we take a function of those functions. This may sound baffling. But a mathematics student who’s got into functional analysis usually has a happy surprise waiting. She discovers the subject is easy. At least, it relies on a lot of stuff she’s learned already, applied to stuff that’s less difficult to work with than, like, numbers.

(This may be a personal bias. I found functional analysis a thoroughgoing delight, even though I didn’t specialize in it. But I got the impression from other grad students that functional analysis was well-liked. Maybe we just got the right instructor for it.)

I’ve mentioned in passing “operators”. These are functions that have a domain that’s a set of functions and a range that’s another set of functions. Suppose you come up to me with some function, let’s say $f(x) = x^2$. I give you back some other function — say, $F(x) = \frac{1}{3}x^3 - 4$. Then I’m acting as an operator.

Why should I do such a thing? Many operators correspond to doing interesting stuff. Taking derivatives of functions, for example. Or undoing the work of taking a derivative. Describing how changing a condition changes what sorts of outcomes a process has. We do a lot of stuff with these. Trust me.
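Here is one way that exchange might look in code. It is a sketch only, assuming we limit ourselves to polynomials and store each one as a list of coefficients; `antiderivative_with_constant` and `evaluate` are hypothetical names made up for this example.

```python
def antiderivative_with_constant(coeffs, constant):
    """An operator on polynomials.

    Takes the coefficients [c0, c1, c2, ...] of c0 + c1*x + c2*x^2 + ...
    and returns the coefficients of an antiderivative whose constant
    term is `constant`.
    """
    return [constant] + [c / (k + 1) for k, c in enumerate(coeffs)]


def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


f = [0, 0, 1]                            # f(x) = x^2
F = antiderivative_with_constant(f, -4)  # F(x) = (1/3) x^3 - 4

print(F)
print(evaluate(F, 3))
```

What goes in is a whole function and what comes out is another whole function. That is the thing that makes this an operator and not just a rule about numbers.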

Let me use the name ‘T’ for some operator. I’m not going to say anything about what it does. The letter’s arbitrary. We like to use capital letters for operators because it makes the operators look extra important. And we don’t want to use ‘O’ because that just looks like zero and we don’t need that confusion.

Anyway. We need two functions. One of them will be called ‘f’ because we always call functions ‘f’. The other we’ll call ‘v’. In setting up the Fredholm Alternative we have this important thing: we know what ‘f’ is. We don’t know what ‘v’ is. We’re finding out something about what ‘v’ might be. The operator doing whatever it does to a function we write down as if it were multiplication, that is, like ‘Tv’. We get this notation from linear algebra. There we multiply matrices by vectors. Matrix-times-vector multiplication works like operator-on-a-function stuff. So much so that if we didn’t use the same notation young mathematics grad students would rise in rebellion. “This is absurd,” they would say, in unison. “The connotations of these processes are too alike not to use the same notation!” And the department chair would admit they have a point. So we write ‘Tv’.

If you skipped out on mathematics after high school you might guess we’d write ‘T(v)’ and that would make sense too. And, actually, we do sometimes. But by the time we’re doing a lot of functional analysis we don’t need the parentheses so much. They don’t clarify anything we’re confused about, and they require all the work of parenthesis-making. But I do see it sometimes, mostly in older books. This makes me think mathematicians started out with ‘T(v)’ and then wrote less as people got used to what they were doing.

I admit we might not literally know what ‘f’ is. I mean we know what ‘f’ is in the same way that, for a quadratic equation, $ax^2 + bx + c = 0$, we “know” what ‘a’, ‘b’, and ‘c’ are. Similarly we don’t know what ‘v’ is in the same way we don’t know what ‘x’ there is. The Fredholm Alternative tells us exactly one of these two things has to be true:

For operators that meet some requirements I don’t feel like getting into, either:

1. There’s one and only one ‘v’ which makes the equation $Tv = f$ true.
2. Or else $Tv = 0$ for some ‘v’ that isn’t just zero everywhere.

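That is, either there’s exactly one solution, or else this particular equation has no unique solution. In that second case the equation has no solution at all or, if it has one, it has infinitely many: the operators the theorem covers are linear, so adding any multiple of that nonzero ‘v’ to one solution gives another. What we can rule out is there being exactly two solutions (the way quadratic equations often have), or ten solutions (the way some annoying problems will).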
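Since the ‘Tv’ notation comes from matrices, a two-by-two matrix makes a handy toy stand-in for the operator. This is only the finite-dimensional cousin of the real theorem, and `classify` is a name invented for this sketch. It assumes integer entries, so that testing the determinant against zero is exact.

```python
def classify(matrix, f):
    """Report which branch of the alternative holds for matrix * v = f.

    `matrix` is [[a, b], [c, d]] and `f` is a pair (f1, f2).
    """
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det != 0:
        # Branch 1: one and only one v solves matrix * v = f (Cramer's rule).
        v = ((f[0] * d - b * f[1]) / det, (a * f[1] - c * f[0]) / det)
        return "unique solution", v
    # Branch 2: some v that is not all zeros has matrix * v = 0.
    if (a, b) != (0, 0):
        null_vector = (-b, a)
    elif (c, d) != (0, 0):
        null_vector = (-d, c)
    else:
        null_vector = (1, 0)  # the zero matrix sends everything to zero
    return "nonzero v with Tv = 0", null_vector


print(classify([[2, 1], [1, 3]], (3, 5)))
print(classify([[1, 2], [2, 4]], (1, 0)))  # ('nonzero v with Tv = 0', (-2, 1))
```

The first matrix lands in the first branch whatever ‘f’ is. The second lands in the second branch whatever ‘f’ is. The branch belongs to the operator, not to the right-hand side.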

It turns up often in boundary value problems. Often before we try solving one we spend some time working out whether there is a solution. You can imagine why it’s worth spending a little time working that out before committing to a big equation-solving project. But it comes up elsewhere. Very often we have problems that, at their core, are “does this operator match anything at all in the domain to a particular function in the range?” When we try to answer we stumble across Fredholm’s Alternative over and over.

Fredholm here was Ivar Fredholm, a Swedish mathematician of the late 19th and early 20th centuries. He worked for Uppsala University, and for the Swedish Social Insurance Agency, and as an actuary for the Skandia insurance company. Wikipedia tells me that his mathematical work was used to calculate buyback prices. I have no idea how.

## Theorem Thursday: One Mean Value Theorem Of Many

For this week I have something I want to follow up on. We’ll see if I make it that far.

# The Mean Value Theorem.

My subject line disagrees with the header just above here. I want to talk about the Mean Value Theorem. It’s one of those things that turns up in freshman calculus and then again in Analysis. It’s introduced as “the” Mean Value Theorem. But like many things in calculus it comes in several forms. So I figure to talk about one of them here, and another form in a while, when I’ve had time to make up drawings.

Calculus can split effortlessly into two kinds of things. One is differential calculus. This is the study of continuity and smoothness. It studies how a quantity changes if something affecting it changes. It tells us how to optimize things. It tells us how to approximate complicated functions with simpler ones. Usually polynomials. It leads us to differential equations, problems in which the rate at which something changes depends on what value the thing has.

The other kind is integral calculus. This is the study of shapes and areas. It studies how infinitely many things, all infinitely small, add together. It tells us what the net change in things is. It tells us how to go from information about every point in a volume to information about the whole volume.

They aren’t really separate. Each kind informs the other, and gives us tools to use in studying the other. And they are almost mirrors of one another. Differentials and integrals are not quite inverses, but they come quite close. And as a result most of the important stuff you learn in differential calculus has an echo in integral calculus. The Mean Value Theorem is among them.

The Mean Value Theorem is a rule about functions. In this case it’s functions with a domain that’s an interval of the real numbers. I’ll use ‘a’ as the name for the smallest number in the domain and ‘b’ as the largest number. People talking about the Mean Value Theorem often do. The range is also the real numbers, although it doesn’t matter which ones.

I’ll call the function ‘f’ in accord with a longrunning tradition of not working too hard to name functions. What does matter is that ‘f’ is continuous on the interval [a, b]. I’ve described what ‘continuous’ means before. It means that here too.

And we need one more thing. The function f has to be differentiable on the interval (a, b). You maybe noticed that before I wrote [a, b], and here I just wrote (a, b). There’s a difference here. We need the function to be continuous on the “closed” interval [a, b]. That is, it’s got to be continuous for ‘a’, for ‘b’, and for every point in-between.

But we only need the function to be differentiable on the “open” interval (a, b). That is, it’s got to be differentiable for all the points in-between ‘a’ and ‘b’. If it happens to be differentiable for ‘a’, or for ‘b’, or for both, that’s great. But we won’t turn away a function f for not being differentiable at those points. Only the interior. That sort of distinction between stuff true on the interior and stuff true on the boundaries is common. This is why mathematicians have words for “including the boundaries” (“closed”) and “never minding the boundaries” (“open”).

As to what “differentiable” is … A function is differentiable at a point if you can take its derivative at that point. I’m sure that clears everything up. There are many ways to describe what differentiability is. One that’s not too bad is to imagine zooming way in on the curve representing a function. If you start with a big old wobbly function it waves all around. But pick a point. Zoom in on that. Does the function stay all wobbly, or does it get more steady, more straight? Keep zooming in. Does it get even straighter still? If you zoomed in over and over again on the curve at some point, would it look almost exactly like a straight line?

If it does, then the function is differentiable at that point. It has a derivative there. The derivative’s value is whatever the slope of that line is. The slope is that thing you remember from taking Boring Algebra in high school. That rise-over-run thing. But this derivative is a great thing to know. You could approximate the original function with a straight line, with slope equal to that derivative. Close to that point, you’ll make a small enough error nobody has to worry about it.

That there will be this straight line approximation isn’t true for every function. Here’s an example. Picture a line that goes up and then takes a 90-degree turn to go back down again. Look at the corner. However close you zoom in on the corner, there’s going to be a corner. It’s never going to look like a straight line; there’s a 90-degree angle there. It can be a smaller angle if you like, but any sort of corner breaks this differentiability. This is a point where the function isn’t differentiable.
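The zooming-in can be imitated numerically. Here is a rough sketch, assuming we compare the slope of the curve just to the left of a point with the slope just to the right. The names `one_sided_slopes`, `smooth`, and `corner` are made up for the example, and the absolute value function stands in for the line with a corner.

```python
def one_sided_slopes(f, x, h):
    """Slopes of the chords from x - h to x and from x to x + h."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


def smooth(x):
    return x * x      # no corners anywhere


def corner(x):
    return abs(x)     # a corner at x = 0


# Shrinking h plays the part of zooming in.
for h in (0.1, 0.001, 0.00001):
    print(h, one_sided_slopes(smooth, 1.0, h), one_sided_slopes(corner, 0.0, h))
```

For the smooth curve the two slopes crowd together near 2 as `h` shrinks. For the corner they stay at -1 and 1 however small `h` gets, so no single straight line ever fits.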

There are functions that are nothing but corners. They can be differentiable nowhere, or only at a tiny set of points that can be ignored. (A set of measure zero, as the dialect would put it.) Mathematicians discovered this over the course of the 19th century. They got into some good arguments about how that can even make sense. It can get worse. Also found in the 19th century were functions that are continuous only at a single point. This smashes just about everyone’s intuition. But we can’t find a definition of continuity that’s as useful as the one we use now and avoids that problem. So we accept that it implies some pathological conclusions and carry on as best we can.

Now I get to the Mean Value Theorem in its differential calculus pelage. It starts with the endpoints, ‘a’ and ‘b’, and the values of the function at those points, ‘f(a)’ and ‘f(b)’. And from here it’s easiest to figure what’s going on if you imagine the plot of a generic function f. I recommend drawing one. Just make sure you draw it without lifting the pen from paper, and without including any corners anywhere. Something wiggly.

Draw the line that connects the ends of the wiggly graph. Formally, we’re adding the line segment that connects the points with coordinates (a, f(a)) and (b, f(b)). That’s coordinate pairs, not intervals. That’s clear in the minds of the mathematicians who don’t see why not to use parentheses over and over like this. (We are short on good grouping symbols like parentheses and brackets and braces.)

Per the Mean Value Theorem, there is at least one point whose derivative is the same as the slope of that line segment. If you were to slide the line up or down, without changing its orientation, you’d find something wonderful. Most of the time this line intersects the curve, crossing from above to below or vice-versa. But there’ll be at least one point where the shifted line is “tangent”, where it just touches the original curve. Close to that touching point, the “tangent point”, the shifted line and the curve blend together and can’t be easily told apart. As long as the function is differentiable on the open interval (a, b), and continuous on the closed interval [a, b], this will be true. You might convince yourself of it by drawing a couple of curves and taking a straightedge to the results.
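For one concrete curve we can hunt the promised point down. This is a sketch under assumptions of my own: the function is $f(x) = x^3$ on [0, 2], and the search is plain bisection, which works here only because this derivative starts below the chord’s slope and ends above it. The theorem itself supplies no such method.

```python
def f(x):
    return x**3


def f_prime(x):
    return 3 * x**2


a, b = 0.0, 2.0
# Slope of the line segment joining (a, f(a)) and (b, f(b)).
chord_slope = (f(b) - f(a)) / (b - a)


def find_mean_value_point(lo, hi, tolerance=1e-12):
    """Bisect on f_prime(c) - chord_slope, negative at lo and positive at hi."""
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if f_prime(mid) < chord_slope:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = find_mean_value_point(a, b)
print(chord_slope, c, f_prime(c))
```

The chord has slope 4, and the point found is near 1.1547, which is 2 divided by the square root of 3. That sits strictly between ‘a’ and ‘b’, as promised.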

This is an existence theorem. Like the Intermediate Value Theorem, it doesn’t tell us which point, or points, make the thing we’re interested in true. It just promises us that there is some point that does it. So it gets used in other proofs. It lets us mix information about intervals and information about points.

It’s tempting to try using it numerically. It looks as if it justifies a common differential-calculus trick. Suppose we want to know the value of the derivative at a point. We could pick a little interval around that point and find the endpoints. And then find the slope of the line segment connecting the endpoints. And won’t that be close enough to the derivative at the point we care about?

Well. Um. No, we really can’t be sure about that. We don’t have any idea what interval might make the derivative of the point we care about equal to this line-segment slope. The Mean Value Theorem won’t tell us. It won’t even tell us if there exists an interval that would let that trick work. We can’t invoke the Mean Value Theorem to let us get away with that.

Often, though, we can get away with it. Differentiable functions do have to follow some rules. Among them is that if you do pick a small enough interval then approximations that look like this will work all right. If the function flutters around a lot, we need a smaller interval. But a lot of the functions we’re interested in don’t flutter around that much. So we can get away with it. And there’s some grounds to trust in getting away with it. The Mean Value Theorem isn’t any part of the grounds. It just looks so much like it ought to be.
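To watch the getting-away-with-it happen, here is a small experiment. It is a sketch assuming the sine function and the point x = 1, both picked arbitrarily; `chord_slope` is a hypothetical helper name.

```python
import math


def chord_slope(f, x, half_width):
    """Slope of the segment joining the ends of [x - half_width, x + half_width]."""
    return (f(x + half_width) - f(x - half_width)) / (2 * half_width)


x = 1.0
exact = math.cos(x)  # the true derivative of sine at x

errors = []
for half_width in (0.5, 0.1, 0.01, 0.001):
    estimate = chord_slope(math.sin, x, half_width)
    errors.append(abs(estimate - exact))
    print(half_width, estimate, errors[-1])
```

The error falls as the interval narrows. That is a fact about how tame sine is, though, and not something the Mean Value Theorem handed us.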

I hope on a later Thursday to look at an integral-calculus form of the Mean Value Theorem.

## A Leap Day 2016 Mathematics A To Z: X-Intercept

Oh, x- and y-, why are you so poor in mathematics terms? I brave my way.

## X-Intercept.

I did not get much out of my eighth-grade, pre-algebra, class. I didn’t connect with the teacher at all. There were a few little bits to get through my disinterest. One came in graphing. Not graph theory, of course, but the graphing we do in middle school and high school. That’s where we find points on the plane with coordinates that make some expression true. Two major terms kept coming up in drawing curves of lines. They’re the x-intercept and the y-intercept. They had this lovely, faintly technical, faintly science-y sound. I think the teacher emphasized a few times they were “intercepts”, not “intersects”. But it’s hard to explain to an eighth-grader why this is an important difference to make. I’m not sure I could explain it to myself.

An x-intercept is a point where the plot of a curve and the x-axis meet. So we’re assuming this is a Cartesian coordinate system, the kind marked off with a pair of lines meeting at right angles. It’s usually two-dimensional, sometimes three-dimensional. I don’t know anyone who’s worried about the x-intercept for a four-dimensional space. Even higher dimensions are right out. The thing that confused me the most, when learning this, is a small one. The x-axis is points that have a y-coordinate of zero. Not an x-coordinate of zero. So in a two-dimensional space it makes sense to describe the x-intercept as a single value. That’ll be the x-coordinate, and the point with the x-coordinate of that and the y-coordinate of zero is the intercept.

If you have an expression and you want to find an x-intercept, you need to find values of x which make the expression equal to zero. We get the idea from studying lines. There are a couple of typical representations of lines. They almost always use x for the horizontal coordinate, and y for the vertical coordinate. The names are only different if the author is making a point about the arbitrariness of variable names. Sigh at such an author and move on. An x-intercept has a y-coordinate of zero, so, set any appearance of ‘y’ in the expression equal to zero and find out what value or values of x make this true. If the expression is an equation for a line there’ll be just the one point, unless the line is horizontal. (If the line is horizontal, then either every point on the x-axis is an intercept, or else none of them are. The line is either “y equals zero”, or it is “y equals something other than zero”.)
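For a line the whole procedure fits in a few lines of code. This sketch assumes the line is written as y = slope * x + constant, which is only one of those typical representations. The function name and its way of reporting the horizontal cases are my own inventions.

```python
def x_intercepts_of_line(slope, constant):
    """Describe the x-intercepts of the line y = slope * x + constant."""
    if slope != 0:
        # Setting y = 0 gives 0 = slope * x + constant, so x = -constant / slope.
        return [-constant / slope]
    if constant == 0:
        return "every x"  # the line is the x-axis itself
    return []             # a horizontal line that never meets the x-axis


print(x_intercepts_of_line(2, -6))  # [3.0]
print(x_intercepts_of_line(0, 0))   # every x
print(x_intercepts_of_line(0, 5))   # []
```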

There’s also a y-intercept. It is exactly what you’d imagine once you know that. It’s usually easier to find what the y-intercept is. The equation describing a curve is typically written in the form “y = f(x)”. That is, y is by itself on one side, and some complicated expression involving x’s is on the other. Working out what y is for a given x is straightforward. Working out what x is for a given y is … not hard, for a line. For more complicated shapes it can be difficult. There might not be a unique answer. That’s all right. There may be several x-intercepts.

There are a couple names for the x-intercepts. The one that turns up most often away from the pre-algebra and high school algebra study of lines is a “zero”. It’s one of those bits in which mathematicians seem to be trying to make it hard for students. A “zero” of the function f(x) is generally not what you get when you evaluate it for x equalling zero. Sorry about that. It’s the values of x for which f(x) equals zero. We also call them “roots”.
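A quick check of that distinction, as a sketch. The function $f(x) = x^2 - 4$ is my own example, and `quadratic_zeros` is a hypothetical helper that assumes its leading coefficient isn’t zero.

```python
import math


def f(x):
    return x * x - 4


def quadratic_zeros(a, b, c):
    """Real zeros of a*x^2 + b*x + c, assuming a is not zero."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []
    root = math.sqrt(discriminant)
    # A set merges the two answers when the discriminant is zero.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


print(f(0))                       # -4, which is not a zero of f
print(quadratic_zeros(1, 0, -4))  # [-2.0, 2.0], the zeros of f
```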

OK, but who cares?

Well, if you want to understand the shape of a curve, the way a function looks, it helps to plot it. Today, yeah, pull up Mathematica or Matlab or Octave or some other program and you get your plot. Fair enough. If you don’t have a computer that can plot like that, the way I did in middle school, you have to do it by hand. And then the intercepts are clues to how to sketch the function. They are, relatively, easy points which you can find, and which you know must be on the curve. We may form a very rough sketch of the curve. But that rough picture may be better than having nothing.

And we can learn about the behavior of functions even without plotting, or sketching a plot. Intercepts of expressions, or of parts of expressions, are points where the value might change from positive to negative. If the denominator of a part of the expression has an x-intercept, this could be a point where the function’s value is undefined. It may be a discontinuity in the function. The function’s values might jump wildly between one side and another. These are often the important things about understanding functions. Where are they positive? Where are they negative? Where are they continuous? Where are they not?
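Here is a small illustration, assuming the made-up function $f(x) = (x - 1)/(x + 2)$. Its numerator has an x-intercept at 1 and its denominator has one at -2. The helper `sign_at` is a name invented for this sketch.

```python
def f(x):
    return (x - 1) / (x + 2)


def sign_at(x):
    """Say whether f is positive, negative, zero, or undefined at x."""
    try:
        value = f(x)
    except ZeroDivisionError:
        return "undefined"
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "zero"


# One sample point on each side of the two intercepts, plus the intercepts.
for x in (-3, -2, 0, 1, 2):
    print(x, sign_at(x))
```

The sign flips at both intercepts. At the numerator’s intercept the function passes calmly through zero, while at the denominator’s it isn’t defined at all.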

These are things we often want to know about functions. And we learn many of them by looking for the intercepts, x- and y-.

## A Leap Day 2016 Mathematics A To Z: Surjective Map

Gaurish today gives me one more request for the Leap Day Mathematics A To Z. And it lets me step away from abstract algebra again, into the world of analysis and what makes functions work. It also hovers around some of my past talk about functions.

## Surjective Map.

This request echoes one of the first terms from my Summer 2015 Mathematics A To Z. Then I’d spent some time on a bijection, or a bijective map. A surjective map is a less complicated concept. But if you understood bijective maps, you picked up surjective maps along the way.

By “map”, in this context, mathematicians don’t mean those diagrams that tell you where things are and how you might get there. Of course we don’t. By a “map” we mean that we have some rule that matches things in one set to things in another. If this sounds to you like what I’ve claimed a function is then you have a good ear. A mapping and a function are pretty much different names for one another. If there’s a difference in connotation I suppose it’s that a “mapping” makes a weaker suggestion that we’re necessarily talking about numbers.

(In some areas of mathematics, a mapping means a function with some extra properties, often some kind of continuity. Don’t worry about that. Someone will tell you when you’re doing mathematics deep enough to need this care. Mind, that person will tell you by way of a snarky follow-up comment picking on some minor point. It’s nothing personal. They just want you to appreciate that they’re very smart.)

So a function, or a mapping, has three parts. One is a set called the domain. One is a set called the range. And then there’s a rule matching things in the domain to things in the range. With functions we’re so used to the domain and range being the real numbers that we often forget to mention those parts. We go on thinking “the function” is just “the rule”. But the function is all three of these pieces.

A function has to match everything in the domain to something in the range. That’s by definition. There are no unused scraps in the domain. If it looks like there are, that’s because we’re being sloppy in defining the domain. Or let’s be charitable. We assumed the reader understands the domain is only the set of things that make sense. And things make sense by being matched to something in the range.

Ah, but now, the range. The range could have unused bits in it. There’s nothing that inherently limits the range to “things matched by the rule to some thing in the domain”.

By now, then, you’ve probably spotted there have to be two kinds of functions. There’s the kind in which the whole range is used, and there’s the kind in which it’s not. Good eye. This is exactly so.

If a function only uses part of the range, if it leaves out anything, even if it’s just a single value out of infinitely many, then the function is called an “into” mapping. If you like, it takes the domain and stuffs it into the range without filling the range.

Ah, but if a function uses every scrap of the range, with nothing left out, then we have an “onto” mapping. The whole of the domain gets sent onto the whole of the range. And this is also known as a “surjective” mapping. We get the term “surjective” from Nicolas Bourbaki. Bourbaki is/was the renowned 20th century mathematics art-collective group which did so much to place rigor and intuition-free bases into mathematics.

The term pairs up with the “injective” mapping. In this, every element of the range that gets used at all matches up with one and only one thing in the domain. So if you know the function’s rule, then if you know a thing in the range that the function reaches, you also know the one and only thing in the domain matched to that. If you don’t feel very French, you might call this sort of function one-to-one. That might be a better name for saying why this kind of function is interesting.

Not every function is injective. But then not every function is surjective either. But if a function is both injective and surjective — if it’s both one-to-one and onto — then we have a bijection. It’s a mapping that can represent the way a system changes and that we know how to undo. That’s pretty comforting stuff.
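With small finite sets all three properties can be checked by brute force. This is a sketch; the function names are my own, and it assumes the rule really does send everything in the domain to something in the stated range.

```python
def is_surjective(rule, domain, range_):
    """Onto: every scrap of the range is matched by something in the domain."""
    return {rule(x) for x in domain} == set(range_)


def is_injective(rule, domain):
    """One-to-one: no two things in the domain are matched to the same thing."""
    images = [rule(x) for x in domain]
    return len(images) == len(set(images))


def is_bijective(rule, domain, range_):
    return is_surjective(rule, domain, range_) and is_injective(rule, domain)


def square(x):
    return x * x


print(is_surjective(square, {-2, -1, 0, 1, 2}, {0, 1, 4}))        # True
print(is_surjective(square, {-2, -1, 0, 1, 2}, {0, 1, 2, 3, 4}))  # False
print(is_injective(square, {-2, -1, 0, 1, 2}))                    # False
print(is_bijective(square, {0, 1, 2}, {0, 1, 4}))                 # True
```

Notice the rule never changed. Whether the squaring function is onto, one-to-one, both, or neither depends on the domain and range we hand it.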

If we use a mapping to describe how a process changes a system, then knowing it’s a surjective map tells us something about the process. It tells us the process makes the system settle into a subset of all the possible states. That doesn’t mean the thing is stable — that little jolts get worn down. And it doesn’t mean that the thing is settling to a fixed state. But it is a piece of information suggesting that’s possible. This may not seem like a strong conclusion. But considering how little we know about the function it’s impressive to be able to say that much.

## The Set Tour, Part 13: Continuity

I hope we’re all comfortable with the idea of looking at sets of functions. If not we can maybe get comfortable soon. What’s important about functions is that we can add them together, and we can multiply them by real numbers. They work in important ways like regular old numbers would. They also work the way vectors do. So all we have to do is be comfortable with vectors. Then we have the background to talk about functions this way. And so, my first example of an oft-used set of functions:

## C[a, b]

People like continuity. It’s comfortable. It’s reassuring, even. Most situations, most days, most things are pretty much like they were before, and that’s how we want it. Oh, we cast some hosannas towards the people who disrupt the steady progression of stuff. But we’re lying. Think of the worst days of your life. They were the ones that were very much not like the day before. If the day is discontinuous enough, then afterwards, people ask one another what they were doing when the discontinuous thing happened.

(OK, there are some good days which are very much not like the day before. But imagine someone who seems informed assures you that tomorrow will completely change your world. Do you feel anticipation or dread?)

Mathematical continuity isn’t so fraught with social implications. What we mean by a continuous function is — well, skip the precise definition. Calculus I students see it, stare at it, and run away. It comes back to the mathematics majors in Intro to Real Analysis. Then it comes back again in Real Analysis. Mathematics majors get to accepting it sometime around Real Analysis II, because the alternative is Functional Analysis. The definition’s in truth not so bad. But it’s fussy and if you get any parts wrong silly consequences follow.

If you’re not a mathematics major, or if you’re a mathematics major not taking a test in Real Analysis, you can get away with this. We’re talking here, and we’re going to keep talking, about functions with real numbers as the domain and real numbers as the range. Later, we can go to complex-valued numbers, or even vectors of numbers. The arguments get a bit longer but don’t change much, so if you learn this you’ve got most of the way to learning everything.

A continuous function is one whose graph you can draw without having to lift your pen. We like continuous functions, mathematically, because they are so much easier to work with. Why are they easy? Well, because if you know the value of your function at one point, you know approximately what it is at nearby points. There’s predictability to the function’s values. You can see why this would make it easier to do calculations. But it makes analysis easy too. We want to do a lot of proofs which involve arithmetic with the values functions have. It gets so much easier that we can say the function’s actual value is something like the value it has at some point we happen to know.

So if we want to work with functions, we usually want to work with continuous functions. They behave more predictably, and more like we hope they will.

The set C[a, b] is the set of all continuous real-valued functions whose domain is the set of real numbers from a to b. For example, pick a function that’s in C[-1, 1]. Let me call it f. Then f is a real-valued function. And its domain is the real numbers from -1 to 1. In the absence of other information about what its range is, we assume it to be the real numbers R. We can have any real numbers as the boundaries; C[-1000, π] is legitimate if eccentric.

There are some domains that are particularly popular. All the real numbers is one. That might get written C(R) for shorthand. C[0, 1], with the domain running from 0 to 1, is popular and easy to work with. C[-1, 1] is almost as good and has the advantage of giving us negative numbers. C[-π, π] is also liked because it meshes well with the trigonometric functions. You remember those: sines and cosines and tangent functions, plus some unpopular ones we try to not talk about. We don’t often talk about other domains. We can change, say, C[0, 1] into C[0, 10] exactly the way you’d imagine. Re-scaling numbers, and even shifting them up or down some, requires so little work we don’t bother doing it.

C[-1, 1] is a different set of functions from, say, C[0, 1]. There are many functions in one set that have the same rule as a function in another set. But the functions in C[-1, 1] have a different domain from the functions in C[0, 1]. So they can’t be the same functions. The rule might be meaningful outside the domain. If the rule is “f:x -> 3*x”, well, that makes sense whatever x should be. But a function is the rule, the domain, and the range together. If any of the parts changes, we have a different function.
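One way to make “rule, domain, and range together” concrete is to bundle the three into a single object. This is a sketch with invented names (`Function`, `triple`), and it assumes the domain is a closed interval stored as a pair of endpoints.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Function:
    rule: Callable[[float], float]
    domain: Tuple[float, float]  # the closed interval [a, b]
    range_: str = "R"            # assumed to be all the real numbers

    def __call__(self, x):
        a, b = self.domain
        if not a <= x <= b:
            raise ValueError(f"{x} is outside the domain [{a}, {b}]")
        return self.rule(x)


def triple(x):
    return 3 * x


f = Function(triple, (-1.0, 1.0))  # lives in C[-1, 1]
g = Function(triple, (0.0, 1.0))   # lives in C[0, 1]

print(f == g)   # False: same rule, different domain
print(f(-0.5))  # -1.5
```

Asking for `g(-0.5)` raises an error, even though the rule would happily triple that number. The rule makes sense there; the function does not reach there.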

The way I’ve written the symbols, with straight brackets [a, b], means that both the numbers a and b are in the domain of these functions. If I want to omit the boundaries, keeping every number greater than a but not a itself and every number less than b but not b itself, then we change to parentheses. That would be C(-1, 1). If I want to include one boundary but not the other, use a straight bracket for the boundary to include, and a parenthesis for the boundary to omit. C[-1, 1) says functions in that set have a domain that includes -1 but does not include 1. It also drives my text editor crazy having unmatched parentheses and brackets like that. We must suffer for our mathematical arts.
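The bracket conventions turn into comparisons easily enough. Here is a sketch; `in_interval` and its two flags are hypothetical names, with a straight bracket meaning the flag is True and a parenthesis meaning it is False.

```python
def in_interval(x, a, b, include_a=True, include_b=True):
    """Is x in the interval from a to b, with each boundary included or omitted?"""
    above = x >= a if include_a else x > a
    below = x <= b if include_b else x < b
    return above and below


# The half-open interval [-1, 1): straight bracket at -1, parenthesis at 1.
print(in_interval(-1, -1, 1, include_a=True, include_b=False))  # True
print(in_interval(1, -1, 1, include_a=True, include_b=False))   # False
print(in_interval(0, -1, 1, include_a=False, include_b=False))  # True
```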