My Little 2021 Mathematics A-to-Z: Ordinary Differential Equations


Mr Wu, my Singapore Maths Tuition friend, has offered many fine ideas for A-to-Z topics. This week’s is another of them, and I’m grateful for it.

Ordinary Differential Equations

As a rule, if you can do something with a number, you can do the same thing with a function. Not always, of course, but the exceptions are fewer than you might imagine. I’ll start with one of those things you can do to both.

A powerful thing we learn in (high school) algebra is that we can use a number without knowing what it is. We give it a name like ‘x’ or ‘y’ and describe what we find interesting about it. If we want to know what it is, we (usually) find some equation or set of equations and find what value of x could make that true. If we study enough (college) mathematics we learn its equivalent in functions. We give something a name like f or g or Ψ and describe what we know about it. And then try to find functions which make that true.

There are a couple common types of equation for these not-yet-known functions. The kind you expect to learn as a mathematics major involves differential equations. These are ones where your equation (or equations) involve derivatives of the not-yet-known f. A derivative describes the rate at which something changes. If we imagine the original f is a position, the derivative is velocity. Derivatives can have derivatives also; this second derivative would be the acceleration. And then second derivatives can have derivatives also, and so on, into infinity. When an equation involves a function and its derivatives we have a differential equation.

(The second common type is the integral equation, using a function and its integrals. And a third involves both derivatives and integrals. That’s known as an integro-differential equation, and isn’t life complicated enough? )

Differential equations themselves naturally divide into two kinds, ordinary and partial. They serve different roles. Usually, with an ordinary differential equation, we can describe the change from knowing only the current situation. (This may include velocities and accelerations and stuff. We could ask what the velocity at an instant means. But never mind that here.) Usually a partial differential equation bases the change where you are on the neighborhood of your location. If you see holes you can pick in that, you’re right. The precise difference is about the independent variables. If the function f has more than one independent variable, it’s possible to take a partial derivative. This describes how f changes if one variable changes while the others stay fixed. If the function f has only the one independent variable, you can only take ordinary derivatives. So you get an ordinary differential equation.

But let’s speak casually here. If what you’re studying can be fully represented with a dashboard readout? Like, an ordered list of positions and velocities and stuff? You probably have an ordinary differential equation. If you need a picture with a three-dimensional surface or a color map to understand it? You probably have a partial differential equation.

One more metaphor. If you can imagine the thing you’re modeling as a marble rolling around on a hilly table? Odds are that’s an ordinary differential equation. And that representation covers a lot of interesting problems. Marbles on hills, obviously. But also rigid pendulums: we can treat the angle a pendulum makes and the rate at which that angle changes as dimensions of space. The pendulum’s swinging then matches exactly a marble rolling around the right hilly table. Planets in space, too. We need more dimensions — three space dimensions and three velocity dimensions — for each planet. So, like, the Sun-Earth-and-Moon would be rolling around a hilly table with 18 dimensions. That’s all right. We don’t have to draw it. The mathematics works about the same. Just longer.

[ To be precise we need three momentum dimensions for each orbiting body. If they’re not changing mass appreciably, and not moving too near the speed of light, velocity is just momentum times a constant number, so we can use whichever is easier to visualize. ]

We mostly work with ordinary differential equations of either the first or the second order. First order means we have first derivatives in the equation, but never have to deal with more than the original function and its first derivative. Second order means we have second derivatives in the equation, but never have to deal with more than the original function or its first or second derivatives. You’ll never guess what a “third order” differential equation is unless you have experience in reading words. There are some reasons we stick to these low orders like first and second, though. One is that we know of good techniques for solving most first- and second-order ordinary differential equations. For higher-order differential equations we often use techniques that find a related normal old polynomial. Its solution helps with the thing we want. Or we break a high-order differential equation into a set of low-order ones. So yes, again, we search for answers where the light is good. But the good light covers many things we like to look at.
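To make that polynomial trick a little concrete, here is a minimal sketch in Python. The equation y'' + 3y' + 2y = 0 is my own made-up example, not one from above; the related polynomial is r^2 + 3r + 2, and its roots tell us which exponentials solve the equation.

```python
import numpy as np

# For the (made-up) linear equation y'' + 3y' + 2y = 0, the related
# "normal old polynomial" is r^2 + 3r + 2.  Its roots give exponential solutions.
roots = np.roots([1, 3, 2])
print(roots)                        # [-2. -1.], so y(t) = a*exp(-2t) + b*exp(-t)

# Quick numerical check that exp(-2t) really does solve the equation:
t = 0.7
y, yp, ypp = np.exp(-2*t), -2*np.exp(-2*t), 4*np.exp(-2*t)
print(ypp + 3*yp + 2*y)             # 0.0, up to rounding
```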

There’s simple harmonic motion, for example. It covers pendulums and springs and perturbations around stable equilibriums and all. This turns out to cover so many problems that, as a physics major, you get a little sick of simple harmonic motion. There’s the Airy function, which started out to describe the rainbow. It turns out to describe particles trapped in a triangular quantum well. The van der Pol equation, about systems where a small oscillation gets energy fed into it while a large oscillation gets energy drained. All kinds of exponential growth and decay problems. Very many functions where pairs of particles interact.

This doesn’t cover everything we would like to do. That’s all right. Ordinary differential equations lend themselves to numerical solutions. It requires considerable study and thought to do these numerical solutions well. But this doesn’t make the subject unapproachable. Few of us could animate the “Pink Elephants on Parade” scene from Dumbo. But could you draw a flip book of two stick figures tossing a ball back and forth? If you’ve had a good rest, a hearty breakfast, and have not listened to the news yet today, so you’re in a good mood?

The flip book ball is a decent example here, too. The animation will look good if the ball moves about the “right” amount between pages. A little faster when it’s first thrown, a bit slower as it reaches the top of its arc, a little faster as it falls back to the catcher. The ordinary differential equation tells us how fast our marble is rolling on this hilly table, and in what direction. So we can calculate how far the marble needs to move, and in what direction, to make the next page in the flip book.
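Here is a minimal sketch of that flip-book bookkeeping in Python, for a tossed ball rather than a marble. The starting height, the throw speed, and the time between pages are numbers I made up; the stepping rule — move by the current velocity, then update the velocity — is the simplest one there is, usually called Euler’s method.

```python
# A toy flip book: the state is (height, upward velocity), and the ordinary
# differential equation says  dh/dt = v,  dv/dt = -g.
g = 9.8           # gravity, metres per second squared
dt = 0.1          # time between flip-book pages, in seconds
h, v = 1.5, 6.0   # made-up starting height and throw speed

for page in range(12):
    print(f"page {page:2d}: height {h:5.2f} m")
    h = h + v * dt        # move the ball by its current velocity ...
    v = v - g * dt        # ... and let gravity slow the rise for the next page
```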

Almost. The rate at which the marble should move will change, in the interval between one flip-book page and the next. The difference, the error, may not be much. But there is a difference between the exact and the numerical solution. Well, there is a difference between a circle and a regular polygon. We have many ways of minimizing and estimating and controlling the error. Doing that is what makes numerical mathematics the high-paid professional industry it is. Our game of catch we can verify by flipping through the book. The motion of four dozen planets and moons attracting one another is harder to be sure we’ve calculated right.

I said at the top that most anything one can do with numbers one can do with functions also. I would like to close the essay with some great parallel. Like, the way that trying to solve cubic equations made people realize complex numbers were good things to have. I don’t have a good example like that for ordinary differential equations, where the study expanded our ideas of what functions could be. Part of that is that complex numbers are more accessible than the stranger functions. Part of that is that complex numbers have a story behind them. The story features titanic figures like Gerolamo Cardano, Niccolò Tartaglia and Ludovico Ferrari. We see some awesome and weird personalities in 19th century mathematics. But their fights are generally harder to watch from the sidelines and cheer on. And part is that it’s easier to find pop historical treatments of the kinds of numbers. The historiography of what a “function” is is a specialist occupation.

But I can think of a possible case. A tool that’s sometimes used in solving ordinary differential equations is the “Dirac delta function”. Yes, that Paul Dirac. It’s a weird function, written as \delta(x) . It’s equal to zero everywhere, except where x is zero. When x is zero? It’s … we don’t talk about what it is. Instead we talk about what it can do. The integral of that Dirac delta function times some other function picks out the value of that other function at a single point. It strains credibility to call this a function the way we speak of, like, sin(x) or \sqrt{x^2 + 4} being functions. Many will classify it as a distribution instead. But it is so useful, for a particular kind of problem, that it’s impossible to throw away.

So perhaps the parallels between numbers and functions extend that far. Ordinary differential equations can make us notice kinds of functions we would not have seen otherwise.


And with this — I can see the much-postponed end of the Little 2021 Mathematics A-to-Z! You can read all my entries for 2021 at this link, and if you’d like you can find all my A-to-Z essays here. How will I finish off the shortest yet most challenging sequence I’ve done? Will it be yellow and equivalent to the Axiom of Choice? Answers should come, in a week, if all starts going well.

From my Sixth A-to-Z: Operator


One of the many small benefits of these essays is getting myself clearly grounded on terms that I had accepted without thinking much about. Operator, like functional (mentioned in here), is one of them. I’m sure that when these were first introduced my instructors gave them clear definitions. But when they’re first introduced it’s not clear why these are important, or that we are going to spend the rest of grad school talking about them. So this piece from 2019’s A-to-Z sequence secured my footing on a term I had a fair understanding of. You get some idea of what has to be intended from the context in which the term is used. Also from knowing how terms like this tend to be defined. But having it down to where I could certainly pass a true-false test about “is this an operator”? That was new.


Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.

Banner: a cartoon coati flying a kite, with skywriting and the kite together spelling out ‘MATHEMATICS A TO Z’. Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Operator.

An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same space but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Typically everything in a particular function space is a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.

So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.

My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.

I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.

Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. Both the differential and the integral operators are linear, and that lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function af + bg is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of af + bg is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.

I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.

One of the conceptual leaps in high school algebra is that we start analyzing the things we do with numbers. Like, we don’t just take the number three, square it, multiply that by two and add to that the number three times four and add to that the number 1. We think about what if we take any number, call it x, and think of 2x^2 + 4x + 1 . And what if we make equations based on doing this 2x^2 + 4x + 1 ; what values of x make those equations true? Or tell us something interesting?

Operators represent a similar leap. We can think of functions as things we manipulate, and think of those manipulations as a particular thing to do. For example, let me come up with a differential expression. For some function u(x) work out the value of this:

2\frac{d^2 u(x)}{dx^2} + 4 \frac{d u(x)}{dx} + u(x)

Let me join in the convention of using ‘D’ for the differential operator. Then we can rewrite this expression like so:

2D^2 u + 4D u + u

Suddenly the differential equation looks a lot like a polynomial. Of course it does. Remember that everything in mathematics is polynomials. We get new tools to solve differential equations by rewriting them as operators. That’s nice. It also scratches that itch that I think everyone in Intro to Calculus gets, of wanting to somehow see \frac{d^2}{dx^2} as if it were a square of \frac{d}{dx} . It’s not, and D^2 is not the square of D . It’s composing D with itself. But it looks close enough to squaring to feel comfortable.

Nobody needs to do 2D^2 u + 4D u + u except to learn some stuff about operators. But you might imagine a world where we did this process all the time. If we did, then we’d develop shorthand for it. Maybe a new operator, call it T, and define it so that T = 2D^2 + 4D + 1 . You see the grammar of treating functions as if they were real numbers becoming familiar. You maybe even noticed the ‘1’ sitting there, serving as the “identity operator”. You know how you’d write out Tv(x) = 3 if you needed to write it in full.
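If you’d like to see that grammar in action, here is a small sketch using the sympy library. The test function sin(x) is an arbitrary choice of mine; the point is only that T is something we apply to whole functions.

```python
import sympy as sp

x = sp.symbols('x')
D = lambda f: sp.diff(f, x)               # the differential operator D
T = lambda f: 2*D(D(f)) + 4*D(f) + f      # the made-up operator T = 2D^2 + 4D + 1

u = sp.sin(x)                             # an arbitrary test function
print(sp.simplify(T(u)))                  # 4*cos(x) - sin(x)
```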

But there are operators that we use all the time. These do get special names, and often shorthand. For example, there’s the gradient operator. This applies to any function with several independent variables. The gradient has a great physical interpretation if the variables represent coordinates of space. If they do, the gradient of a function at a point gives us a vector that describes the direction in which the function increases fastest. And the size of that gradient — a functional on this operator — describes how fast that increase is.

The gradient itself defines more operators. These have names you get very familiar with in Vector Calculus, with names like divergence and curl. These have compelling physical interpretations if we think of the function we operate on as describing a moving fluid. A positive divergence means fluid is flowing out of a region, as if from a source; a negative divergence, that it is flowing in. The curl, in fluids, describes how nearby streams of fluid move at different rates, swirling around one another.
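A small sketch, with sympy doing the derivatives, may help fix the pictures. The particular function and the toy velocity field here are my own choices, not anything standard.

```python
import sympy as sp

x, y = sp.symbols('x y')

f = x**2 + y**2
grad_f = [sp.diff(f, x), sp.diff(f, y)]     # [2*x, 2*y]: points straight away from the
print(grad_f)                               # origin, the direction of fastest increase

u, v = x, y                                 # a toy fluid spreading out from the origin
div = sp.diff(u, x) + sp.diff(v, y)
print(div)                                  # 2, positive everywhere: a source, fluid flowing out
```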

Physical interpretations are common in operators. This probably reflects how much influence physics has on mathematics and vice-versa. Anyone studying quantum mechanics gets familiar with a host of operators. These have comfortable names like “position operator” or “momentum operator” or “spin operator”. These are operators that apply to the wave function for a problem. They transform the wave function into a probability distribution. That distribution describes what positions or momentums or spins are likely, how likely they are. Or how unlikely they are.

They’re not all physical, though. Or not purely physical. Many operators are useful because they are powerful mathematical tools. There is a variation of the Fourier series called the Fourier transform. We can interpret this as an operator. Suppose the original function started out with time or space as its independent variable. This often happens. The Fourier transform operator gives us a new function, one with frequencies as independent variable. This can make the function easier to work with. The Fourier transform is an integral operator, by the way, so don’t go thinking everything is a complicated set of derivatives.
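Here is a rough sketch of that change of viewpoint, using numpy’s discrete Fourier transform. The signal, a mix of a 3-cycle-per-second and a 7-cycle-per-second wobble, is invented for the example.

```python
import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)               # one second of samples
signal = np.sin(2*np.pi*3*t) + 0.5*np.sin(2*np.pi*7*t)    # made-up time-domain function

spectrum = np.fft.rfft(signal)                   # the Fourier-transform operator, discretized
freqs = np.fft.rfftfreq(len(t), d=t[1] - t[0])
print(freqs[np.abs(spectrum) > 100])             # [3. 7.]: the frequencies hiding in the signal
```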

Another integral-based operator that’s important is the Laplace transform. This is a great operator because it turns differential equations into algebraic equations. Often, into polynomials. You saw that one coming.

This is all a lot of good press for operators. Well, they’re powerful tools. They help us to see that we can manipulate functions in the ways that functions let us manipulate numbers. It should sound good to realize there is much new that you can do, and you already know most of what’s needed to do it.


This and all the other Fall 2019 A To Z posts should be gathered here. And once I have the time to fiddle with tags I’ll have all past A to Z essays gathered at this link.

From my First A-to-Z: Orthogonal


I haven’t had the space yet to finish my Little 2021 A-to-Z, so let me resume playing the hits of past ones. For my first, the Summer 2015 one, I picked all the topics myself. This one, Orthogonal, I remember as one of the challenging ones. The challenge was the question put in the first paragraph: why do we have this term, which is so nearly a synonym for “perpendicular”? I didn’t find an answer, then, or since. But I was able to think about how we use “orthogonal” and what it might do that “perpendicular” doesn’t.


Orthogonal.

Orthogonal is another word for perpendicular. So why do we need another word for that?

It helps to think about why “perpendicular” is a useful way to organize things. For example, we can describe the directions to a place in terms of how far it is north-south and how far it is east-west, and talk about how fast it’s travelling in terms of its speed heading north or south and its speed heading east or west. We can separate the north-south motion from the east-west motion. If we’re lucky these motions separate entirely, and we turn a complicated two- or three-dimensional problem into two or three simpler problems. If they can’t be fully separated, they can often be largely separated. We turn a complicated problem into a set of simpler problems with a nice and easy part plus an annoying yet small hard part.

And this is why we like perpendicular directions. We can often turn a problem into several simpler ones describing each direction separately, or nearly so.

And now the amazing thing. We can separate these motions because the north-south and the east-west directions are at right angles to one another. But we can describe something that works like an angle between things that aren’t necessarily directions. For example, we can describe an angle between things like functions that have the same domain. And once we can describe the angle between two functions, we can describe functions that make right angles between each other.

This means we can describe functions as being perpendicular to one another. An example. On the domain of real numbers from -1 to 1, the function f(x) = x is perpendicular to the function g(x) = x^2 . And when we want to study a more complicated function we can separate the part that’s in the “direction” of f(x) from the part that’s in the “direction” of g(x). We can treat functions, even functions we don’t know, as if they were locations in space. And we can study and even solve for the different parts of the function as if we were pinning down the north-south and the east-west movements of a thing.
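That claim about f(x) = x and g(x) = x^2 is easy to check, if you’ll allow the “dot product” of two functions to be the integral of their product over the domain. A quick sympy sketch:

```python
import sympy as sp

x = sp.symbols('x')
# "Dot product" of two functions on [-1, 1]: integrate their product.
print(sp.integrate(x * x**2, (x, -1, 1)))   # 0: f(x) = x and g(x) = x^2 are perpendicular
print(sp.integrate(x * x,    (x, -1, 1)))   # 2/3: f is, reassuringly, not perpendicular to itself
```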

So if we want to study, say, how heat flows through a body, we can work out a series of “directions” for functions, and work out the flow in each of those “directions”. These don’t have anything to do with left-right or up-down directions, but the concepts and the convenience are similar.

I’ve spoken about this in terms of functions. But we can define the “angle” between things for many kinds of mathematical structures. Once we can do that, we can have “perpendicular” pairs of things. I’ve spoken only about functions, but that’s because functions are more familiar than many of the mathematical structures that have orthogonality.

Ah, but why call it “orthogonal” rather than “perpendicular”? And I don’t know. The best I can work out is that it feels weird to speak of, say, the cosine function being “perpendicular” to the sine function when you can’t really say either is in any particular direction. “Orthogonal” seems to appeal less directly to physical intuition while still meaning something. But that’s my guess, rather than the verdict of a skilled etymologist.

My Little 2021 Mathematics A-to-Z: Inverse


I owe Iva Sallay thanks for the suggestion of today’s topic. Sallay is a longtime friend of my blog here. And runs the Find the Factors recreational mathematics puzzle site. If you haven’t been following, or haven’t visited before, this is a fun week to step in again. The puzzles this week include (American) Thanksgiving-themed pictures.

Inverse.

When we visit the museum made of a visual artist’s studio we often admire the tools. The surviving pencils and crayons, pens, brushes and such. We don’t often notice the eraser, the correction tape, the unused white-out, or the pages cut into scraps to cover up errors. To do something is to want to undo it. This is as true for the mathematics of a circle as it is for the drawing of one.

If not to undo something, we do often want to know where something comes from. A classic paper asks can one hear the shape of a drum? You hear a sound. Can you say what made that sound? Fine, dismiss the drum shape as idle curiosity. The same question applies to any sensory data. If our hand feels cooler here, where is the insulation of the building damaged? If we have this electrocardiogram reading, what can we say about the action of the heart producing that? If we see the banks of a river, what can we know about how the river floods?

And this is the point, and purpose, of inverses. We can understand them as finding the causes of what we observe.

The first inverse we meet is usually the inverse function. It’s introduced as a way to undo what a function does. That’s an odd introduction, if you’re comfortable with what a function is. A function is a mathematical construct. It’s two sets — a domain and a range — and a rule that links elements in the domain to the range. To “undo” a function is like “undoing” a rectangle. But a function has a compelling “physical” interpretation. It’s routine to introduce functions as machines that take some numbers in and give numbers out. We think of them as ways to transform the domain into the range. In functional analysis we get to thinking of domains as the most perfect putty. We expect functions to stretch and rotate and compress and slide along as though they were drawing a Betty Boop cartoon.

So we’re trained to speak of a function as a verb, acting on pieces of the domain. An element or point, or a region, or the whole domain. We think the function “maps”, or “takes”, or “transforms” this into its image in the range. And if we can turn one thing into another, surely we can turn it back.

Some things it’s obvious we can turn back. Suppose our function adds 2 to whatever we give it. We can get the original back by subtracting 2. If the function subtracts 32 and divides by 1.8, we can reverse it by multiplying by 1.8 and adding 32. If the function takes the reciprocal, we can take the reciprocal again. We have a bit of a problem if we started out taking the reciprocal of 0, but who would want to do such a thing anyway? If the function squares a number, we can undo that by taking the square root. Unless we started from a negative number. Then we have trouble.

The trouble is not every function has an inverse. Which we could have realized by thinking how to undo “multiply by zero”. For an inverse to exist, the rule part has to match each element in the range to exactly one element in the domain. This makes the function, in the impenetrable jargon of the mathematician, a “one-to-one function”. Or you can describe it with the more intuitive label of “bijective”.

But there’s no reason more than one thing in the domain can’t match to the same thing in the range. If I know the cosine of my angle is \frac{\sqrt{3}}{2}, my angle might be 30 degrees. Or -30 degrees. Or 390 degrees. Or 330 degrees. You may protest there’s no difference between a 30 degree and a 390 degree angle. I agree those angles point in the same direction. But a gear rotated 390 degrees has done something that a gear rotated 30 degrees hasn’t. If all I know is where the dot I’ve put on the gear is, how can I know how much it’s rotated?

So what we do is shift from the actual cosine into one branch of the cosine. By restricting the domain we can create a function that has the same rule as the one we want, but that’s also one-to-one and so has an inverse. What restriction to use? That depends on what you want. But mathematicians have some that come up so often they might as well be defaults. So the square root is the inverse of the square of nonnegative numbers. The inverse Cosine is the inverse of the cosine of angles from 0 to 180 degrees. The inverse Sine is the inverse of the sine of angles from -90 to 90 degrees. The capital letters are convention to say we’re doing this. If we want a different range, we write out that we’re looking for an inverse cosine from -180 to 0 degrees or whatever. (Yes, the mathematician will default to using radians, rather than degrees, for angles. That’s a different essay.) It’s an imperfect solution, but it often works well enough.
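Programming languages bake this convention in. Here is a tiny sketch with Python’s math module, whose acos always answers from the branch between 0 and 180 degrees:

```python
import math

angle = math.radians(390)            # a gear turned 390 degrees
c = math.cos(angle)                  # about 0.866, the same cosine a 30 degree angle has
print(math.degrees(math.acos(c)))    # 30.0 (roughly): the inverse Cosine picks the branch
                                     # from 0 to 180 degrees, and cannot answer 390
```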

The trouble we had with cosines, and functions, continues through all inverses. There are almost always alternate causes. Many shapes of drums sound alike. Take two metal bars. Heat both with a blowtorch, one on the end and one in the center. Not to the point of melting, only to the point of being too hot to touch. Let them cool in insulated boxes for a couple weeks. There’ll be no measurement you can do on the remaining heat that tells you which one was heated on the end and which the center. That’s not because your thermometers are no good or the flow of heat is not deterministic or anything. It’s that both starting cases settle to the same end. So here there is no usable inverse.

This is not to call inverses futile. We can look for what we expect to find useful. We are inclined to find inverses of the cosine between 0 and 180 degrees, even though 4140 through 4320 degrees is as legitimate. We may not know what is wrong with a heart, but have some idea what a heart could do and still beat. And there’s a famous example in 19th-century astronomy. After the discovery of Uranus came the discovery it did not move right. For a while it moved across the sky too fast for its distance from the sun. Then it started moving too slow. The obvious supposition was that there was another, not-yet-seen, planet, affecting its orbit.

The trouble is finding it. Calculating the orbit from what data they had required solving equations with 13 unknown quantities. John Couch Adams and Urbain Le Verrier attempted this anyway, making suppositions about what they could not measure. They made great suppositions. Le Verrier made the better calculations, and persuaded an astronomer (Johann Gottfried Galle, assisted by Heinrich Louis d’Arrest) to go look. Took about an hour of looking. They also made lucky suppositions. Both, for example, supposed the trans-Uranian planet would obey “Bode’s Law”, a seeming pattern in the sizes of the planets’ orbits. The actual Neptune does not. It was near enough in the sky to where the calculated planet would be, though. The world is vaster than our imaginations.

That there are many ways to draw Betty Boop does not mean there’s nothing to learn about how this drawing was done. And so we keep having inverses as a vibrant field of mathematics.


Next week I hope to cover the letter ‘C’ and don’t think I’m not worried about what that ‘C’ will be. This week’s essay, and all the essays for the Little Mathematics A-to-Z, should be at this link. And all of this year’s essays, and all the A-to-Z essays from past years, should be at this link. Thank you for reading.

My Little 2021 Mathematics A-to-Z: Hyperbola


John Golden, author of the Math Hombre blog, had several great ideas for the letter H in this little A-to-Z for the year. Here’s one of them.

Hyperbola.

The hyperbola is where advanced mathematics begins. It’s a family of shapes, some of the pieces you get by slicing a cone. You can make an approximate one shining a flashlight on a wall. Other conic sections are familiar, everyday things, though. Circles we see everywhere. Ellipses we see everywhere we look at a circle in perspective. Parabolas we learn, in approximation, watching something tossed, or squirting water into the air. The hyperbola should be as accessible. Hold your flashlight parallel to the wall and look at the outline of light it casts. But the difference between this and a parabola isn’t obvious. And it’s harder to see hyperbolas in nature. It’s the path a space probe swinging past a planet makes? Great guide for all of us who’ve launched space probes past Jupiter.

When we learn of hyperbolas, somewhere in high school algebra or in precalculus, they seem designed to break the rules we had inferred. We’ve learned functions like lines and quadratics (parabolas) and cubics. They’re nice, simple, connected shapes. The hyperbola comes in two pieces. We’ve learned that the graph of a function crosses any given vertical line at most once. Now, we can expect to see it twice. We learn to sketch functions by finding a few interesting points — roots, y-intercepts, things like that. Hyperbolas, we’re taught to draw by sketching a little central box and then two asymptotes. And the asymptote is new to us too: a simpler curve that the actual curve gets ever closer to without quite equalling.

We’re trained to see functions having the couple odd points where they’re not defined. Nobody expects y = 1 \div x to mean anything when x is zero. But we learn these as weird, isolated points. Now there’s this interval of x-values that don’t fit anything on the graph. Half the time, anyway, because we see two classes of hyperbolas. There’s ones that open like cups, pointing up and down. Those have definitions for every value of x. There’s ones that open like ears, pointing left and right. Those have a box in the center where no y satisfies the x’s. They seem like they’re taught just to be mean.

They’re not, of course. The only mathematical thing we teach just to be mean is integration by trigonometric substitution. The things which seem weird or new in hyperbolas are, largely, things we didn’t notice before. A vertical line put across a circle or ellipse crosses the curve twice, at most points. There are two huge intervals, to the left and to the right of the circle, where no value of y makes the equation true. Circles are familiar, though. Ellipses don’t seem intimidating. We know we can’t turn x^2 + y^2 = 4 (a typical circle) into a function without some work. We have to write either f(x) = \sqrt{4 - x^2} or f(x) = -\sqrt{4 - x^2} , breaking the circle into two halves. The same happens for hyperbolas, though, with x^2 - y^2 = 4 (a typical hyperbola) turning into f(x) = \sqrt{x^2 - 4} or f(x) = -\sqrt{x^2 - 4} .

Even the definitions seem weird. The ellipse we can draw by taking a set distance and two focus points. If the distance from the first focus to a point plus the distance from the point to the second focus is that set distance, the point’s on the ellipse. We can use two thumbtacks and a piece of string to draw the ellipse. The hyperbola has a similar rule, but weirder. You have your two focus points, yes. And a set distance. But the locus of points of the hyperbola is everything where the distance from the point to one focus minus the distance from the point to the other focus is that set distance. Good luck doing that with thumbtacks and string.

Yet hyperbolas are ready for us. Consider playing with a decent calculator, hitting the reciprocal button for different numbers. 1 turns to 1, yes. 2 turns into 0.5. -0.125 turns into -8. It’s the simplest iterative game to do on the calculator. If you sketch this, though, all the points (x, y) where one coordinate is the reciprocal of the other? It’s two curves. They approach without ever touching the x- and y-axes. Get far enough from the origin and there’s no telling this curve from the axes. It’s a hyperbola, one that obeys that vertical-line rule again. It has only the one value of x that can’t be allowed. We write it as y = \frac{1}{x} or even xy = 1 . But it’s the shape we see when we draw x^2 - y^2 = 2 , rotated. Or a rotation of one we see when we draw y^2 - x^2 = 2 . The equations of rotated shapes are annoying. We do enough of them for ellipses and parabolas and hyperbolas to meet the course requirement. But they point out how the hyperbola is a more normal construct than we fear.
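That rotation claim can be checked with numbers. Here is a little sketch: take points on xy = 1, turn them 45 degrees about the origin, and see that they satisfy x^2 - y^2 = 2. The sample points are arbitrary choices of mine.

```python
import math

for x in (0.5, 1.0, 2.0, 4.0):            # arbitrary points on the curve xy = 1
    y = 1 / x
    u = (x + y) / math.sqrt(2)            # coordinates after rotating 45 degrees
    v = (y - x) / math.sqrt(2)
    print(round(u*u - v*v, 10))           # 2.0 every time: the rotated point sits on x^2 - y^2 = 2
```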

And let me look at that construct again. An equation describing a hyperbola that opens horizontally or vertically looks like ax^2 - by^2 = c for some constant numbers a, b, and c. (If a, b, and c are all positive, this is a hyperbola opening horizontally. If a and b are positive and c negative, this is a hyperbola opening vertically.) An equation describing an ellipse, similarly with its axes horizontal or vertical looks like ax^2 + by^2 = c . (These are shapes centered on the origin. They can have other centers, which make the equations harder but not more enlightening.) The equations have very similar shapes. Mathematics trains us to suspect things with similar shapes have similar properties. That change from a plus to a minus seems too important to ignore, and yet …

I bet you assumed x and y are real numbers. This is convention, the safe bet. If someone wants complex-valued numbers they usually say so. If they don’t want to be explicit, they use z and w as variables instead of x and y. But what if y is an imaginary number? Suppose y = \imath t , for some real number t, where \imath^2 = -1 . You haven’t missed a step; I’m summoning this from nowhere. (Let’s not think about how to draw a point with an imaginary coordinate.) Then ax^2 - by^2 = c is ax^2 - b(\imath t)^2 = c which is ax^2 + bt^2 = c . And despite the weird letters, that’s a circle. By the same supposition we could go from ax^2 + by^2 = c , which we’d taken to be a circle, and get ax^2 - bt^2 = c , a hyperbola.

Fine stuff inspiring the question “so?” I made up a case and showed how that made two dissimilar things look alike. All right. But consider trigonometry, built on the cosine and sine functions. One good way to see the cosine and sine of an angle is as the x- and y-coordinates of a point on the unit circle, where x^2 + y^2 = 1 . (The angle \theta is the one from the point (\cos(\theta), \sin(\theta)) to the origin to the point (1, 0).)

There exists, in parallel to the familiar trig functions, the “hyperbolic trigonometric functions”. These have imaginative names like the hyperbolic sine and hyperbolic cosine. (And onward. We can speak of the “inverse hyperbolic cosecant”, if we wish no one to speak to us again.) Usually these get introduced in calculus, to give the instructor a tiny break. Their derivatives, and integrals, look much like those of the normal trigonometric functions, but aren’t the exact same problems over and over. And these functions, too, have a compelling meaning. The hyperbolic cosine of an angle and hyperbolic sine of an angle have something to do with points on a unit hyperbola, x^2 - y^2 = 1 .
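The “something to do with” works just like the circle case. A point (cosh t, sinh t) always lands on the unit hyperbola, which a couple of lines of Python will confirm for whatever values of t you like:

```python
import math

for t in (0.0, 0.5, 1.3, 2.0):                # arbitrary parameter values
    x, y = math.cosh(t), math.sinh(t)
    print(x*x - y*y)                          # 1.0 (up to rounding): the point is on x^2 - y^2 = 1
```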

Thinking back on the flashlight. We get a circle by holding the light perpendicular to the wall. We get a hyperbola holding the light parallel. We get a circle by drawing x^2 + y^2 = 1 with x and y real numbers. We get a hyperbola by (somehow) drawing x^2 + y^2 = 1 with x real and y imaginary. We remember something about representing complex-valued numbers with a real axis and an orthogonal imaginary axis.

One almost feels the connection. I can’t promise that pondering this will make hyperbolas be as familiar as circles or at least ellipses. But often a problem that brings us to hyperbolas has an alternate phrasing that’s ellipses, and vice-versa. And the common traits of these conic slices can guide you into a new understanding of mathematics.


Thank you for reading. I hope to have another piece next week at this time. This and all of this year’s Little Mathematics A to Z essays should be at this link. And the A-to-Z essays for every year should be at this link.

My All 2020 Mathematics A to Z: Extraneous Solutions


Iva Sallay, the kind author of the Find the Factors recreational mathematics puzzle, suggested this topic for the letter X. It’s a fun chance to look at some of the basics of (high school) algebra again.

Banner: a cartoon coati with a director’s megaphone overlooking Hollywood-style hills, where the sign and the spotlights spell out ‘MATHEMATICS A TO Z’. Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Extraneous Solutions.

When developing general relativity, Albert Einstein created a convention. He’s not unique in that. All mathematicians create conventions. They use shorthand for an idea that’s complicated or common. What is rarer is that other people adopted his convention, because it expressed an idea compactly. This was in working with tensors, which look somewhat like matrixes and have a lot of indexes. In the equations of general relativity you need to take sums over many combinations of values of these indexes. What indexes there are are the same in most every problem. The possible values of the indexes are constant, problem to problem, too.

So Einstein saved himself writing, and his publishers from typesetting, a lot of redundant symbols. He did this by stating once the conditions under which “take the sums over these indexes on this range” is understood without being written out. This is good for people doing general relativity, and certain kinds of geometry. It’s a problem only when an expression escapes its context. When it’s shown to a student or someone who doesn’t know this is a differential-geometry problem. Then the problem becomes confusing, and they can’t work on it.

This is not to fault the Einstein Summation Convention. It puts common necessary scaffolding out of the way and highlights the interesting unique parts of a problem. Most conventions aim for that. We have the hazard, though, that we may not notice something breaking the convention.

And this is how we create extraneous solutions. And, as a bonus, how we end up with missing solutions. We encounter them with the start of (high school) algebra, when we get used to manipulating equations. When we solve an equation what we always want is something clear, like

x = 2

But it never starts that way. It always starts with something like

x^3 - 8x^2 + 24x - 32 + 22\frac{1}{x} = \frac{6}{x}

or worse. We learn how to handle this. We know that we can do six things that do not alter the truth of an equation. We can regroup terms in the equation. We can add the same number to both sides of the equation. We can multiply both sides of the equation by some number besides zero. We can add zero to one side of the equation. We can multiply one side of the equation by 1. We can replace one quantity with another that has the same value. That doesn’t sound like a lot. It covers more than it seems. Multiplying by 1, for example, is the same as multiplying by \frac{x}{x} . If x isn’t zero, then we can multiply both sides of the equation by that x. And x can’t be zero, or else \frac{x}{x} would not be 1.

So with my example there, start off by multiplying the right side by 1, in the guise \frac{x}{x} . Then multiply both sides by that same non-zero x. At this point the right-hand side simplifies to being 6. Add a -6 to both sides. And then with a lot of shuffling around you work out that the equation is the same as

(x - 2)^4 = 0

And that can only be true when x equals 2.
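If you’d rather not do the shuffling by hand, a couple of lines of sympy confirm the bookkeeping. Clearing the denominator is only legitimate because we’ve already established x isn’t zero.

```python
import sympy as sp

x = sp.symbols('x')
lhs = x**3 - 8*x**2 + 24*x - 32 + 22/x
rhs = 6 / x

cleared = sp.expand((lhs - rhs) * x)        # multiply through by the non-zero x
print(sp.factor(cleared))                   # (x - 2)**4
print(sp.solve(sp.Eq(lhs, rhs), x))         # [2]
```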

It should be easy to catch spurious solutions creeping in. They must result from breaking a rule. The obvious problem is multiplying — or dividing — by zero. We expect those to be trouble. Wikipedia has a fine example:

\frac{1}{x - 2} = \frac{3}{x + 2} - \frac{6x}{(x - 2)(x + 2)}

The obvious step is to multiply this whole mess by (x - 2)(x + 2) , which turns our work into a linear equation. Very soon we find the solution must be x = -2 . Which would make at least two of the denominators in the original equation zero. We know not to want that.

The problems can be subtler, though. Consider:

x - 12 = \sqrt{x}

That’s not hard to solve. Multiply both sides by x - 12 . Although, before working out \sqrt{x}\cdot(x - 12) substitute that x - 12 with something equal to it. We know one thing is equal to it, \sqrt{x} . Then we have

(x - 12)^2 = x

It’s a quadratic equation. A little bit of work shows the roots are 9 and 16. One of those answers is correct and the other spurious. At no point did we divide anything, by zero or anything else.
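A few lines of Python make the spuriousness plain; just plug both roots back into the original equation:

```python
import math

# Roots of (x - 12)**2 = x, which expands to x**2 - 25*x + 144 = 0.
for x in (9.0, 16.0):
    print(x, x - 12, math.sqrt(x), x - 12 == math.sqrt(x))
# 9.0  -3.0  3.0  False   <- extraneous: -3 is not the square root of 9
# 16.0  4.0  4.0  True    <- the genuine solution
```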

So what is happening and what is the necessary rhetorical link to the Einstein Summation Convention?

There are many ways to look at equations. One that’s common is to look at them as functions. This is so common that we’ll slide between an equation and its function representation without comment. This confuses the prealgebra student who wants to know why sometimes we look at

x^2 - 25x + 144 = 0

and sometimes we look at

f(x) = x^2 - 25x + 144

and sometimes at

f(x) = x^2 - 25x + 144 = 0

The advantage of looking at the function which shadows any equation is we have different tools for studying functions. Sometimes that makes solving the equation easier. In this form, we’re looking for what in the domain matches with something particular in the range.

And now we’ve reached the convention. When we write down something like x^2 - 25x + 144 we’re implicitly defining a function. A function has three pieces. It has a set called the domain, from which we draw the independent variable. It has a set called the range. It has a rule matching elements in the domain to an element in the range. We’ve only given the rule. What’s the domain and what’s the range for f(x) = x^2 - 25x + 144 ?

And here are the conventions. If we haven’t said otherwise, the domain and range are usually either the real numbers or the complex numbers. If we used x or y or t as the independent variable, we mean the real numbers. If we used z as the independent variable, and haven’t already put x and y in, we mean the complex numbers. Sometimes we call in s or w or another letter; never mind that. The range can be the whole set of real or complex numbers. It does us no harm to have too large a range.

The domain, though. We do insist that everything in the domain match to something in the range. And, like, \frac{1}{x - 2} ? That can’t mean anything if x equals 2.

So we take an implicit definition of the domain: it’s all the real numbers for which the function’s rule is meaningful. So, \frac{1}{x - 2} would have a domain “real numbers other than 2”. \frac{6x}{(x - 2)(x + 2)} would have a domain “real numbers other than 2 and -2”.

We create extraneous solutions — or we lose some — when our convention changes the domain. An extraneous solution is one that existed outside the original problem’s domain. A missing solution is one that existed in an excised part of the domain. To go from x^2 = 4x to x = 4 by dividing out x is to cut x = 0 out of the space of possible solutions.

A complaint you might raise. What is the domain for x - 12 = \sqrt{x} ? Rewrite that as a function. f(x) = x - 12 - \sqrt{x} would seem to have a domain “x greater than or equal to 0”. The extraneous solution is x = 9 , a number which rumor has it is greater than or equal to 0. What happened?

We have to take that equation-handling more slowly. We had started out with

x - 12 = \sqrt{x}

The domain has to be “x is greater than or equal to 0” here. All right. The next step was multiplying both sides by the same quantity, x - 12 . So:

(x - 12)(x - 12) = \sqrt{x}(x - 12)

The domain is still “x is greater than or equal to 0”. The next step, though, was a substitution. I wanted to replace the (x - 12) on the right with \sqrt{x} . We know, from the original equation, that those are equal. At least, they’re equal wherever the original equation x - 12 = \sqrt{x} is true. What happens when x = 9 , though?

9 - 12 = \sqrt{9}

We start to see the catch. 9 – 12 is -3. And while it’s true that -3 squared will be 9, it’s false that -3 is the square root of 9. The equation x - 12 = \sqrt{x} can only be true, for real numbers, if \sqrt{x} is nonnegative. We can make this rigorous with two supplementary functions. Let me call g(x) = x - 12 and h(x) = \sqrt{x} .

h(x) has an implicit domain of “x greater than or equal to 0”. What’s the domain of g(x) ? If g(x) = h(x) , like we said it does, then they have to agree for every x in either’s domain. So g(x) can’t have in its domain any x for which h(x) isn’t defined. So the domain of g(x) has to be “x for which x – 12 is greater than or equal to 0”. And that’s “x greater than or equal to 12”.

So the domain for the original equation is “x greater than or equal to 12”. When we keep that domain in mind, the extraneous nature of x = 9 is clear, and we avoid trouble.

Not all extraneous solutions come from algebraic manipulations. Sometimes there are constraints on the problem, rather than the numbers, that make a solution absurd. There is a betting strategy called the martingale. This amounts to doubling the bet every time one loses. This makes the first win balance out all the losses leading to it. This solution fails because the player has a finite wallet, and after a few losses any player hasn’t got the money to continue.

Or consider a case that may be legend. It concerns the Apollo Guidance Computer. It was designed to take the Lunar Module to a spot at zero altitude above the moon’s surface, with zero velocity. The story is that in early test runs, the computer would not avoid trajectories that dropped to a negative altitude along the way to the surface. One imagines the scene after the first Apollo subway trip. (I have not found a date when such a test run was done, or corrections to the code ordered. If someone knows, I’d appreciate learning specifics.)

The convention, that we trust the domain is “everything which makes sense”, is not to blame here. It’s normally a good convention. Explicitly noting the domain at every step is tedious and, most of the time, unenlightening. It belongs in the background. We also must check our possible solutions, and that they represent things that make sense. We can try to concentrate our thinking on the obvious interesting parts, but must spend some time on the rest also.


I am surprised to be so near the end of the 2020 A-to-Z, and to 2020, I hope. This and all the other glossary essays for the year should be at this link. All the essays from every A-to-Z series should be at this link. Thank you for reading.

My All 2020 Mathematics A to Z: Jacobi Polynomials


Mr Wu, author of the Singapore Maths Tuition blog, gave me a good nomination for this week’s topic: the j-function of number theory. Unfortunately I concluded I didn’t understand the function well enough to write about it. So I went to a topic of my own choosing instead.

The Jacobi Polynomials discussed here are named for Carl Gustav Jacob Jacobi. Jacobi lived in Prussia in the first half of the 19th century. Though his career was short, it was influential. I’ve already discussed the Jacobian, which describes how changes of variables change volume. He has a host of other things named for him, most of them in matrices or mathematical physics. He was also a pioneer in the elliptic functions that gave us those elliptic curves you hear so much about these days.

Banner: a cartoon coati with a director’s megaphone overlooking Hollywood-style hills, where the sign and the spotlights spell out ‘MATHEMATICS A TO Z’. Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Jacobi Polynomials.

Jacobi Polynomials are a family of functions. Polynomials, it happens; this is a happy case where the name makes sense. “Family” is the name mathematicians give to a bunch of functions that have some similarity. This often means there’s a parameter, and each possible value of the parameter describes a different function in the family. For example, we talk about the family of sine functions, S_n(z) . For every integer n we have the function S_n(z) = \sin(n z) where z is a real number between -π and π.

We like a family because every function in it gives us some nice property. Often, the functions play nice together, too. This is often something like mutual orthogonality. This means two different representatives of the family are orthogonal to one another. “Orthogonal” means “perpendicular”. We can talk about functions being perpendicular to one another through a neat mechanism. It comes from vectors. It’s easy to use vectors to represent how to get from one point in space to another. From vectors we define a dot product, a way of multiplying them together. A dot product has to meet a couple rules that are pretty easy to do. And if you don’t do anything weird? Then the dot product between two vectors is the product of their lengths times the cosine of the angle made by the end of the first vector, the origin, and the end of the second vector.

Functions, it turns out, meet all the rules for a vector space. (There are not many rules to make a vector space.) And we can define something that works like a dot product for two functions. Take the integral, over the whole domain, of the first function times the second. This meets all the rules for a dot product. (There are not many rules to make a dot product.) Did you notice me palm that card? When I did not say “the dot product is take the integral …”? That card will come back. That’s for later. For now: we have a vector space, we have a dot product, we can take arc-cosines, so why not define the angle between functions?

Mostly we don’t because we don’t care. Where we do care? We do like functions that are at right angles to one another. As with most things mathematicians do, it’s because it makes life easier. We’ll often want to describe properties of a function we don’t yet know. We can describe the function we don’t yet know as the sum of coefficients — some fixed real number — times basis functions that we do know. And then our problem of finding the function changes to one of finding the coefficients. If we picked a set of basis functions that are all orthogonal to one another, the finding of these coefficients gets easier. Analytically and numerically: we can often turn each coefficient into its own separate problem. Let a different computer, or at least computer process, work on each coefficient and get the full answer much faster.

The Jacobi Polynomials have three parameters. I see them most often labelled α, β, and n. Likely you imagine this means it’s a huge family. It is huger than that. A zoologist would call this a superfamily, at least. Probably an order, possibly a class.

It turns out different relationships of these parameters give you families of functions. Many of these families are noteworthy enough to have their own names. For example, if α and β are both zero, then the Jacobi functions are a family also known as the Legendre Polynomials. This is a great set of orthogonal polynomials. And the roots of the Legendre Polynomials give you information needed for Gaussian quadrature. Gaussian quadrature is a neat trick for numerically integrating a function. Take a weighted sum of the function you’re integrating evaluated at a set of points. This can get a very good — maybe even perfect — numerical estimate of the integral. The points to use, and the weights to use, come from a Legendre polynomial.
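Here is a minimal sketch of Gaussian quadrature in action, using numpy’s leggauss to fetch the Legendre nodes and weights. The integrand x^4 is my own arbitrary choice; three points are exact for polynomials up to degree five.

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(3)   # roots of the degree-3 Legendre polynomial
estimate = np.sum(weights * nodes**4)                  # weighted sum approximating the integral
print(estimate, 2/5)                                   # 0.4 and 0.4: exact for x**4 on [-1, 1]
```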

If α and β are both -\frac{1}{2} then the Jacobi Polynomials are the Chebyshev Polynomials of the first kind. (There’s also a second kind.) These are handy in approximation theory, describing ways to better interpolate a polynomial from a set of data. They also have a neat, peculiar relationship to the multiple-cosine formulas. Like, \cos(2\theta) = 2\cos^2(\theta) - 1 . And the second Chebyshev polynomial is T_2(x) = 2x^2 - 1 . Imagine sliding between x and cos(\theta) and you see the relationship. cos(3\theta) = 4 \cos^3(\theta) - 3\cos(\theta) and T_3(x) = 4x^3 - 3x . And so on.
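That sliding between x and cos(θ) is worth seeing with numbers. Here is a tiny check of T_3(cos θ) = cos(3θ), for a few angles I picked arbitrarily:

```python
import math

for theta in (0.2, 1.0, 2.5):                  # arbitrary angles, in radians
    x = math.cos(theta)
    print(4*x**3 - 3*x, math.cos(3*theta))     # T_3(cos(theta)) and cos(3*theta) agree
```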

Chebyshev Polynomials have some superpowers. One that’s most amazing is accelerating convergence. Often a numerical process, such as finding the solution of an equation, is an iterative process. You can’t find the answer all at once. You instead find an approximation and do something that improves it. Each time you do the process, you get a little closer to the true answer. This can be fine. But, if the problem you’re working on allows it, you can use the first couple iterations of the solution to figure out where this is going. The result is that you can get very good answers using the same amount of computer time you needed to just get decent answers. The trade, of course, is that you need to understand Chebyshev Polynomials and accelerated convergence. We always have to make trades like that.

Back to the Jacobi Polynomials family. If α and β are the same number, then the Jacobi functions are a family called the Gegenbauer Polynomials. These are great in mathematical physics, in potential theory. You can turn the gravitational or electrical potential function — that one-over-the-distance-squared force — into a sum of better-behaved functions. And they also describe zonal spherical harmonics. These let you represent functions on the surface of a sphere as the sum of coefficients times basis functions. They work in much the way the terms of a Fourier series do.

If β is zero and there’s a particular relationship between α and n that I don’t want to get into? The Jacobi Polynomials become the Zernike Polynomials, which I never heard of before this paragraph either. I read they are the tools you need to understand optics, and particularly how lenses will alter the light passing through.

Since the Jacobi Polynomials have a greater variety of form than even poison ivy has, you’ll forgive me not trying to list them. Or even listing a representative sample. You might also ask how they’re related at all.

Well, they all solve the same differential equation, for one. Not literally a single differential equation. A family of differential equations, where α and β and n turn up in the coefficients. The formula using these coefficients is the same in all these differential equations. That’s a good reason to see a relationship. Or we can write the Jacobi Polynomials as a series, a function made up of the sum of terms. The coefficients for each of the terms depend on α and β and n, always in the same way. I’ll give you that formula. You won’t like it and won’t ever use it. The Jacobi Polynomial for a particular α, β, and n is the polynomial

P_n^{(\alpha, \beta)}(z) = (n+\alpha)!(n + \beta)!\sum_{s=0}^n \frac{1}{s!(n + \alpha - s)!(\beta + s)!(n - s)!}\left(\frac{z-1}{2}\right)^{n-s}\left(\frac{z + 1}{2}\right)^s

Its domain, by the way, is the real numbers from -1 to 1. We need something for the domain. It turns out there’s nothing you can do on the real numbers that you can’t fit into the domain from -1 to 1 anyway. (If you have to do something on, say, the interval from 10 to 54? Do a change of variable, scaling things down and moving them, and use -1 to 1. Then undo that change when you’re done.) The range is the real numbers, as you’d expect.
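
If you doubt a formula that ugly, you can at least make the computer check it. Here is a sketch that evaluates the sum above for whole-number α and β (plain factorials only make sense there) and compares it against SciPy's own Jacobi polynomial routine. The particular n, α, β, and z are arbitrary.

from math import factorial
from scipy.special import eval_jacobi

def jacobi_sum(n, a, b, z):
    # The sum-of-terms formula from above, for whole-number a and b.
    total = 0.0
    for s in range(n + 1):
        total += ((z - 1)/2)**(n - s) * ((z + 1)/2)**s / (
            factorial(s) * factorial(n + a - s) * factorial(b + s) * factorial(n - s))
    return factorial(n + a) * factorial(n + b) * total

print(jacobi_sum(4, 1, 2, 0.3))
print(eval_jacobi(4, 1, 2, 0.3))    # SciPy's answer, for comparison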

(You maybe noticed I used ‘z’ for the independent variable there, rather than ‘x’. Usually using ‘z’ means we expect this to be a complex number. But ‘z’ here is definitely a real number. This is because we can also get to the Jacobi Polynomials through the hypergeometric series, a function I don’t want to get into. But for the hypergeometric series we are open to the variable being a complex number. So many references carry that ‘z’ back into Jacobi Polynomials.)

Another thing which links these many functions is recurrence. If you know the Jacobi Polynomial for one set of parameters — and you do; P_0^{(\alpha, \beta)}(z) = 1 — you can find others. You do this in a way rather like how you find new terms in the Fibonacci series by adding together terms you already know. These formulas can be long. Still, if you know P_{n-1}^{(\alpha, \beta)} and P_{n-2}^{(\alpha, \beta)} for the same α and β? Then you can calculate P_n^{(\alpha, \beta)} with nothing more than pen, paper, and determination. If it helps,

P_1^{(\alpha, \beta)}(z) = (\alpha + 1) + (\alpha + \beta + 2)\frac{z - 1}{2}

and this is true for any α and β. You’ll never do anything with that. This is fine.
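
You could, though, watch the recurrence do its work. Here is a sketch that starts from the P_0 and P_1 above and climbs to P_5 with the standard three-term recurrence (quoted from memory, so treat the coefficients as something to check against a reference), then compares the result with SciPy. The parameters and the point are arbitrary choices.

from scipy.special import eval_jacobi

a, b, z = 1.0, 2.0, 0.3

p_older = 1.0                                   # P_0
p_old = (a + 1) + (a + b + 2) * (z - 1) / 2     # P_1, the formula just above

for n in range(2, 6):
    c = 2*n + a + b
    # Each new polynomial needs only the two before it.
    p_new = ((c - 1) * (c * (c - 2) * z + a*a - b*b) * p_old
             - 2 * (n + a - 1) * (n + b - 1) * c * p_older) / (2 * n * (n + a + b) * (c - 2))
    p_older, p_old = p_old, p_new

print(p_old)                    # P_5 by recurrence
print(eval_jacobi(5, a, b, z))  # P_5 from SciPy: the same number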

There is another way that all these many polynomials are related. It goes back to their being orthogonal. We measured orthogonality by a dot product. Back when I palmed that card, I told you the dot product was the integral of the two functions multiplied together. This is indeed a dot product. We can define others. We make those others by taking a weighted integral of the product of these two functions. That is, integrate the two functions times a third, a weight function. Of course there are reasons to do this; they amount to deciding that some parts of the domain are more important than others. The weight function can be anything that meets a few rules. If you want to get the Jacobi Polynomials out of this, you start with the function P_0^{(\alpha, \beta)}(z) = 1 and the weight function

w(z) = (1 - z)^{\alpha} (1 + z)^{\beta}

As I say, though, you’ll never use that. If you’re eager and ready to leap into this work you can use this to build a couple of Legendre Polynomials. Or Chebyshev Polynomials. For the full Jacobi Polynomials, though? Use, like, the command JacobiP[n, a, b, z] in Mathematica, or jacobiP(n, a, b, z) in MATLAB. Other people have programmed this for you. Enjoy their labor.
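
If you do want to poke at the orthogonality claim numerically, here is a sketch using that weight function. The parameters are arbitrary nonnegative numbers I picked so the weight stays tame at the endpoints.

from scipy.integrate import quad
from scipy.special import eval_jacobi

a, b = 1.0, 2.0

def weighted_product(m, n):
    # The dot product with the weight (1 - z)^a (1 + z)^b built in.
    integrand = lambda z: ((1 - z)**a * (1 + z)**b
                           * eval_jacobi(m, a, b, z) * eval_jacobi(n, a, b, z))
    return quad(integrand, -1, 1)[0]

print(weighted_product(2, 3))   # essentially zero: different polynomials
print(weighted_product(3, 3))   # not zero: a polynomial against itself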

In my work I have not used the full set of Jacobi Polynomials much. There’s more of them than I need. I do rely on the Legendre Polynomials, and the Chebyshev Polynomials. Other mathematicians use other slices regularly. It is stunning to sometimes look and realize that these many functions, different as they look, are reflections of one another, though. Mathematicians like to generalize, and find one case that covers as many things as possible. It’s rare that we are this successful.


I thank you for reading this. All of this year’s A-to-Z essays should be available at this link. The essays from every A-to-Z sequence going back to 2015 should be at this link. And I’m already looking ahead to the M, N, and O essays that I’ll be writing the day before publication instead of the week before like I want! I appreciate any nominations you have, even ones I can’t cover fairly.

My 2019 Mathematics A To Z: Operator


Today’s A To Z term is one I’ve mentioned previously, including in this A to Z sequence. But it was specifically nominated by Goldenoj, whom I know I follow on Twitter. I’m sorry not to be able to give you an account; I haven’t been able to use my @nebusj account for several months now. Well, if I do get a Twitter, Mathstodon, or blog account I’ll refer you there.

Cartoony banner illustration of a coati, a raccoon-like animal, flying a kite in the clear autumn sky. A skywriting plane has written 'MATHEMATIC A TO Z'; the kite, with the letter 'S' on it to make the word 'MATHEMATICS'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Operator.

An operator is a function. An operator has a domain that’s a space. Its range is also a space. It can be the same space but doesn’t have to be. It is very common for these spaces to be “function spaces”. So common that if you want to talk about an operator that isn’t dealing with function spaces it’s good form to warn your audience. Typically everything in a particular function space is a real-valued and continuous function. Also everything shares the same domain as everything else in that particular function space.

So here’s what I first wonder: why call this an operator instead of a function? I have hypotheses and an unwillingness to read the literature. One is that maybe mathematicians started saying “operator” a long time ago. Taking the derivative, for example, is an operator. So is taking an indefinite integral. Mathematicians have been doing those for a very long time. Longer than we’ve had the modern idea of a function, which is this rule connecting a domain and a range. So the term might be a fossil.

My other hypothesis is the one I’d bet on, though. This hypothesis is that there is a limit to how many different things we can call “the function” in one sentence before the reader rebels. I felt bad enough with that first paragraph. Imagine parsing something like “the function which the Laplacian function took the function to”. We are less likely to make dumb mistakes if we have different names for things which serve different roles. This is probably why there is another word for a function with domain of a function space and range of real or complex-valued numbers. That is a “functional”. It covers things like the norm for measuring a function’s size. It also covers things like finding the total energy in a physics problem.

I’ve mentioned two operators that anyone who’d read a pop mathematics blog has heard of, the differential and the integral. There are more. There are so many more.

Many of them we can build from the differential and the integral. Many operators that we care to deal with are linear, which is how mathematicians say “good”. Both the differential and the integral operators are linear, which lurks behind many of our favorite rules. Like, allow me to call from the vasty deep functions ‘f’ and ‘g’, and scalars ‘a’ and ‘b’. You know how the derivative of the function af + bg is a times the derivative of f plus b times the derivative of g? That’s the differential operator being all linear on us. Similarly, how the integral of af + bg is a times the integral of f plus b times the integral of g? Something mathematical with the adjective “linear” is giving us at least some solid footing.
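
Here is that linearity stated as a check a computer algebra system will happily do. The particular f and g are stand-ins I picked; nothing special about them.

import sympy as sp

x, a, b = sp.symbols('x a b')
f = sp.sin(x)                   # two functions called from the vasty deep
g = sp.exp(x)

left = sp.diff(a*f + b*g, x)
right = a*sp.diff(f, x) + b*sp.diff(g, x)
print(sp.simplify(left - right))    # 0: the differential operator is linear

left = sp.integrate(a*f + b*g, x)
right = a*sp.integrate(f, x) + b*sp.integrate(g, x)
print(sp.simplify(left - right))    # 0 again, for the integral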

I’ve mentioned before that a wonder of functions is that most things you can do with numbers, you can also do with functions. One of those things is the premise that if numbers can be the domain and range of functions, then functions can be the domain and range of functions. We can do more, though.

One of the conceptual leaps in high school algebra is that we start analyzing the things we do with numbers. Like, we don’t just take the number three, square it, multiply that by two and add to that the number three times four and add to that the number 1. We think about what if we take any number, call it x, and think of 2x^2 + 4x + 1 . And what if we make equations based on doing this 2x^2 + 4x + 1 ; what values of x make those equations true? Or tell us something interesting?

Operators represent a similar leap. We can think of functions as things we manipulate, and think of those manipulations as a particular thing to do. For example, let me come up with a differential expression. For some function u(x) work out the value of this:

2\frac{d^2 u(x)}{dx^2} + 4 \frac{d u(x)}{dx} + u(x)

Let me join in the convention of using ‘D’ for the differential operator. Then we can rewrite this expression like so:

2D^2 u + 4D u + u

Suddenly the differential equation looks a lot like a polynomial. Of course it does. Remember that everything in mathematics is polynomials. We get new tools to solve differential equations by rewriting them as operators. That’s nice. It also scratches that itch that I think everyone in Intro to Calculus gets, of wanting to somehow see \frac{d^2}{dx^2} as if it were a square of \frac{d}{dx} . It’s not, and D^2 is not the square of D . It’s composing D with itself. But it looks close enough to squaring to feel comfortable.

Nobody needs to do 2D^2 u + 4D u + u except to learn some stuff about operators. But you might imagine a world where we did this process all the time. If we did, then we’d develop shorthand for it. Maybe a new operator, call it T, and define it so that T = 2D^2 + 4D + 1 . You see the grammar of treating functions as if they were real numbers becoming familiar. You maybe even noticed the ‘1’ sitting there, serving as the “identity operator”. You know how you’d write out Tu(x) = 3 if you needed to write it in full.
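
If you want to play with that made-up T, here is a sketch in SymPy. Nothing about it is standard; it is just the operator from the paragraph above, applied to a couple of functions I chose at random.

import sympy as sp

x = sp.symbols('x')

def T(u):
    # The operator T = 2D^2 + 4D + 1 from above.
    return 2*sp.diff(u, x, 2) + 4*sp.diff(u, x) + u

print(T(sp.exp(-x)))            # -exp(-x): T scales this particular function by -1
print(T(x**2))                  # x**2 + 8*x + 4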

But there are operators that we use all the time. These do get special names, and often shorthand. For example, there’s the gradient operator. This applies to any function with several independent variables. The gradient has a great physical interpretation if the variables represent coordinates of space. If they do, the gradient of a function at a point gives us a vector that describes the direction in which the function increases fastest. And the size of that gradient vector describes how fast that increase is.

The gradient itself defines more operators. These have names you get very familiar with in Vector Calculus, with names like divergence and curl. These have compelling physical interpretations if we think of the function we operate on as describing a moving fluid. A positive divergence means fluid is coming into the system; a negative divergence, that it is leaving. The curl, in fluids, describes how nearby streams of fluid move at different rates.
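
For the curious, here is what those operators look like in SymPy's vector module; the scalar function is an arbitrary choice of mine.

from sympy.vector import CoordSys3D, gradient, divergence, curl

N = CoordSys3D('N')             # ordinary x-y-z coordinates
f = N.x**2 * N.y + N.z          # some scalar function of position

grad_f = gradient(f)
print(grad_f)                   # 2xy in the x direction, x^2 in the y, 1 in the z
print(divergence(grad_f))       # 2y, which is the Laplacian of f
print(curl(grad_f))             # the zero vector: gradients have no curl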

Physical interpretations are common in operators. This probably reflects how much influence physics has on mathematics and vice-versa. Anyone studying quantum mechanics gets familiar with a host of operators. These have comfortable names like “position operator” or “momentum operator” or “spin operator”. These are operators that apply to the wave function for a problem. They transform the wave function into a probability distribution. That distribution describes what positions or momentums or spins are likely, how likely they are. Or how unlikely they are.

They’re not all physical, though. Or not purely physical. Many operators are useful because they are powerful mathematical tools. There is a variation of the Fourier series called the Fourier transform. We can interpret this as an operator. Suppose the original function started out with time or space as its independent variable. This often happens. The Fourier transform operator gives us a new function, one with frequencies as independent variable. This can make the function easier to work with. The Fourier transform is an integral operator, by the way, so don’t go thinking everything is a complicated set of derivatives.
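
The essay doesn't need this, but here is a concrete way to see the Fourier transform earning its keep: in frequency space, taking a derivative becomes multiplying by i times the wavenumber. The grid and the sample function are arbitrary choices.

import numpy as np

n = 256
x = np.linspace(0, 2*np.pi, n, endpoint=False)
u = np.sin(3*x)

k = 2*np.pi * np.fft.fftfreq(n, d=x[1] - x[0])     # wavenumbers
du = np.fft.ifft(1j * k * np.fft.fft(u)).real      # the derivative, done in frequency space

print(np.max(np.abs(du - 3*np.cos(3*x))))          # around 1e-12: it recovered 3 cos(3x)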

Another integral-based operator that’s important is the Laplace transform. This is a great operator because it turns differential equations into algebraic equations. Often, into polynomials. You saw that one coming.

This is all a lot of good press for operators. Well, they’re powerful tools. They help us to see that we can manipulate functions in the ways that functions let us manipulate numbers. It should sound good to realize there is much new that you can do, and you already know most of what’s needed to do it.


This and all the other Fall 2019 A To Z posts should be gathered here. And once I have the time to fiddle with tags I’ll have all past A to Z essays gathered at this link. Thank you for reading. I should be back on Thursday with the letter P.

My 2019 Mathematics A To Z: Differential Equations


The thing most important to know about differential equations is that for short, we call it “diff eq”. This is pronounced “diffy q”. It’s a fun name. People who aren’t taking mathematics smile when they hear someone has to get to “diffy q”.

Sometimes we need to be more exact. Then the less exciting names “ODE” and “PDE” get used. The meaning of the “DE” part is an easy guess. The meaning of “O” or “P” will be clear by the time this essay’s finished. We can find approximate answers to differential equations by computer. This is known generally as “numerical solutions”. So you will encounter talk about, say, “NSPDE”. There’s an implied “of” between the S and the P there. I don’t often see “NSODE”. For some reason, probably a quite arbitrary historical choice, this is just called “numerical integration” instead.

To write about “differential equations” was suggested by aajohannas, who is on Twitter as @aajohannas.

Cartoony banner illustration of a coati, a raccoon-like animal, flying a kite in the clear autumn sky. A skywriting plane has written 'MATHEMATIC A TO Z'; the kite, with the letter 'S' on it to make the word 'MATHEMATICS'.
Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

Differential Equations.

One of algebra’s unsettling things is the idea that we can work with numbers without knowing their values. We can give them names, like ‘x’ or ‘a’ or ‘t’. We can know things about them. Often it’s equations telling us these things. We can make collections of numbers based on them all sharing some property. Often these things are solutions to equations. We can even describe changing those collections according to some rule, even before we know whether any of the numbers is 2. Often these things are functions, here matching one set of numbers to another.

One of analysis’s unsettling things is the idea that most things we can do with numbers we can also do with functions. We can give them names, like ‘f’ and ‘g’ and … ‘F’. That’s easy enough. We can add and subtract them. Multiply and divide. This is unsurprising. We can measure their sizes. This is odd but, all right. We can know things about functions even without knowing exactly what they are. We can group together collections of functions based on some properties they share. This is getting wild. We can even describe changing these collections according to some rule. This change is itself a function, but it is usually called an “operator”, saving us some confusion.

So we can describe a function in an equation. We may not know what f is, but suppose we know \sqrt{f(x) - 2} = x is true. We can suppose that if we cared we could find what function, or functions, f made that equation true. There is shorthand here. A function has a domain, a range, and a rule. The equation part helps us find the rule. The domain and range we get from the problem. Or we take the implicit rule that both are the biggest sets of real-valued numbers for which the rule parses. Sometimes biggest sets of complex-valued numbers. We get so used to saying “the function” to mean “the rule for the function” that we’ll forget to say that’s what we’re doing.

There are things we can do with functions that we can’t do with numbers. Or at least that are too boring to do with numbers. The most important here is taking derivatives. The derivative of a function is another function. One good way to think of a derivative is that it describes how a function changes when its variables change. (The derivative of a number is zero, which is boring except when it’s also useful.) Derivatives are great. You learn them in Intro Calculus, and there are a bunch of rules to follow. But follow them and you can pretty much take the derivative of any function even if it’s complicated. Yes, you might have to look up what the derivative of the arc-hyperbolic-secant is. Nobody has ever used the arc-hyperbolic-secant, except to tease a student.

And the derivative of a function is itself a function. So you can take a derivative again. Mathematicians call this the “second derivative”, because we didn’t expect someone would ask what to call it and we had to say something. We can take the derivative of the second derivative. This is the “third derivative” because by then changing the scheme would be awkward. If you need to talk about taking the derivative some large but unspecified number of times, this is the n-th derivative. Or m-th, if you’ve already used ‘n’ to mean something else.

And now we get to differential equations. These are equations in which we describe a function using at least one of its derivatives. The original function, that is, f, usually appears in the equation. It doesn’t have to, though.

We divide the earth naturally (we think) into two pairs of hemispheres, northern and southern, eastern and western. We divide differential equations naturally (we think) into two pairs of two kinds of differential equations.

The first division is into linear and nonlinear equations. I’ll describe the two kinds of problem loosely. Linear equations are the kind you don’t need a mathematician to solve. If the equation has solutions, we can write out procedures that find them, like, all the time. A well-programmed computer can solve them exactly. Nonlinear equations, meanwhile, are the kind no mathematician can solve. They’re just too hard. There’s no processes that are sure to find an answer.

You may ask. We don’t need mathematicians to solve linear equations. Mathematicians can’t solve nonlinear ones. So what do we need mathematicians for? The answer is that I exaggerate. Linear equations aren’t quite that simple. Nonlinear equations aren’t quite that hopeless. There are nonlinear equations we can solve exactly, for example. This usually involves some ingenious transformation. We find a linear equation whose solution guides us to the function we do want.

And that is what mathematicians do in such a field. A nonlinear differential equation may, generally, be hopeless. But we can often find a linear differential equation which gives us insight to what we want. Finding that equation, and showing that its answers are relevant, is the work.

The other hemispheres we call ordinary differential equations and partial differential equations. In form, the difference between them is the kind of derivative that’s taken. If the function’s domain is more than one dimension, then there are different kinds of derivative. Or as normal people put it, if the function has more than one independent variable, then there are different kinds of derivatives. These are partial derivatives and ordinary (or “full”) derivatives. Partial derivatives give us partial differential equations. Ordinary derivatives give us ordinary differential equations. I think it’s easier to understand a partial derivative.

Suppose a function depends on three variables, imaginatively named x, y, and z. There are three partial first derivatives. One describes how the function changes if we pretend y and z are constants, but let x change. This is the “partial derivative with respect to x”. Another describes how the function changes if we pretend x and z are constants, but let y change. This is the “partial derivative with respect to y”. The third describes how the function changes if we pretend x and y are constants, but let z change. You can guess what we call this.
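
Computer algebra systems take partial derivatives exactly this way, holding the other variables fixed. A quick sketch, with a function I made up:

import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y + sp.sin(z)

print(sp.diff(f, x))            # 2*x*y: y and z treated as constants
print(sp.diff(f, y))            # x**2
print(sp.diff(f, z))            # cos(z)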

In an ordinary differential equation we would still like to know how the function changes when x changes. But we have to admit that a change in x might cause a change in y and z. So we have to account for that. If you don’t see how such a thing is possible don’t worry. The differential equations textbook has an example in which you wish to measure something on the surface of a hill. Temperature, usually. Maybe rainfall or wind speed. To move from one spot to another a bit east of it is also to move up or down. The change in (let’s say) x, how far east you are, demands a change in z, how far above sea level you are.

That’s structure, though. What’s more interesting is the meaning. What kinds of problems do ordinary and partial differential equations usually represent? Partial differential equations are great for describing surfaces and flows and great bulk masses of things. If you see an equation about how heat transmits through a room? That’s a partial differential equation. About how sound passes through a forest? Partial differential equation. About the climate? Partial differential equations again.
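
To make that concrete, here is about the smallest possible numerical sketch of heat spreading: the one-dimensional heat equation, u_t = α u_xx, stepped forward by the simplest explicit finite-difference scheme. Every number in it (grid size, time step, the initial hot spot) is an arbitrary choice for illustration.

import numpy as np

alpha = 1.0
nx = 50
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2 / alpha            # small enough to keep this scheme stable
x = np.linspace(0, 1, nx)
u = np.exp(-100 * (x - 0.5)**2)     # a hot spot in the middle of a cold bar

for _ in range(200):
    # Each interior point relaxes toward the average of its neighbors.
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2*u[1:-1] + u[:-2])

print(u.max())                      # the hot spot has cooled and spread out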

Ordinary differential equations are great for describing a ball rolling on a lumpy hill. It’s given an initial push. There are some directions (downhill) that it’s easier to roll in. There’s some directions (uphill) that it’s harder to roll in, but it can roll if the push was hard enough. There’s maybe friction that makes it roll to a stop.

Put that way it’s clear all the interesting stuff is partial differential equations. Balls on lumpy hills are nice but who cares? Miniature golf course designers and that’s all. This is because I’ve presented it to look silly. I’ve got you thinking of a “ball” and a “hill” as if I meant balls and hills. Nah. It’s usually possible to bundle a lot of information about a physical problem into something that looks like a ball. And then we can bundle the ways things interact into something that looks like a hill.

Like, suppose we have two blocks on a shared track, like in a high school physics class. We can describe their positions as one point in a two-dimensional space. One axis is where on the track the first block is, and the other axis is where on the track the second block is. Physics problems like this also usually depend on momentum. We can toss these in too, an axis that describes the momentum of the first block, and another axis that describes the momentum of the second block.

We’re already up to four dimensions, and we only have two things, both confined to one track. That’s all right. We don’t have to draw it. If we do, we draw something that looks like a two- or three-dimensional sketch, maybe with a note that says “D = 4” to remind us. There’s some point in this four-dimensional space that describes these blocks on the track. That’s the “ball” for this differential equation.

The things that the blocks can do? Like, they can collide? They maybe have rubber tips so they bounce off each other? Maybe someone’s put magnets on them so they’ll draw together or repel? Maybe there’s a spring connecting them? These possible interactions are the shape of the hills that the ball representing the system “rolls” over. An impenetrable barrier, like, two things colliding, is a vertical wall. Two things being attracted is a little divot. Two things being repulsed is a little hill. Things like that.
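
Here is a sketch of that two-blocks-and-a-spring setup as an ordinary differential equation, handed to SciPy's general-purpose solver. The masses, the spring stiffness, and the starting positions are all arbitrary choices; the interesting part is that the whole system is one point, positions and momenta together, moving through a four-dimensional space.

import numpy as np
from scipy.integrate import solve_ivp

k = 2.0                             # spring stiffness

def blocks(t, state):
    # state is the "ball": both positions and both momenta.
    x1, x2, p1, p2 = state
    force = -k * (x1 - x2)          # the spring connecting the blocks
    return [p1, p2, force, -force]  # unit masses, so momentum equals velocity

solution = solve_ivp(blocks, (0, 10), [1.0, -1.0, 0.0, 0.0])
print(solution.y[:, -1])            # where the "ball" ends up at time 10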

Now you see why an ordinary differential equation might be interesting. It can capture what happens when many separate things interact.

I write this as though ordinary and partial differential equations are different continents of thought. They’re not. When you model something you make choices and they can guide you to ordinary or to partial differential equations. My own research work, for example, was on planetary atmospheres. Atmospheres are fluids. Representing how fluids move usually calls for partial differential equations. But my own interest was in vortices, swirls like hurricanes or Jupiter’s Great Red Spot. Since I was acting as if the atmosphere was a bunch of storms pushing each other around, this implied ordinary differential equations.

There are more hemispheres of differential equations. They have names like homogeneous and non-homogeneous. Coupled and decoupled. Separable and nonseparable. Exact and non-exact. Elliptic, parabolic, and hyperbolic partial differential equations. Don’t worry about those labels. They relate to how difficult the equations are to solve. What ways they’re difficult. In what ways they break computers trying to approximate their solutions.

What’s interesting about these, besides that they represent many physical problems, is that they capture the idea of feedback. Of control. If a system’s current state affects how it’s going to change, then it probably has a differential equation describing it. Many systems change based on their current state. So differential equations have long been near the center of professional mathematics. They offer great and exciting pure questions while still staying urgent and relevant to real-world problems. They’re great things.


Thanks again for reading. All Fall 2019 A To Z posts should be at this link. I should get to the letter E for Tuesday. All of the A To Z essays should be at this link. If you have thoughts about other topics I might cover, please offer suggestions for the letters G and H.

Why I’ll Say 1/x Is A Continuous Function And Why I’ll Say It Isn’t


So let me finally follow up last month’s question. That was whether the function “\frac{1}{x} ” is continuous. My earlier post lays out what a mathematician means by a “continuous function”. The short version is, we have a good definition for a function being continuous at a point in the domain. If it’s continuous at every point in the domain, it’s a continuous function.

The definition of continuous-at-a-point has some technical stuff that I’m going to skip in this essay. The important part is that mathematicians agree with what ordinary people would call “continuous”. Like, if you draw a curve representing the function without having to lift your pen off the paper? That function’s continuous. At least the stretch you drew was.

So is the function “\frac{1}{x} ” continuous? What if I said absolutely it is, because ‘x’ is a number that happens to be … oh, let’s say it’s 3. And \frac{1}{3} is a constant function; of course that’s continuous. Your sensible response is to ask if I want a punch in the nose. No, I do not.

One of the great breakthroughs of algebra was that we could use letters to represent any number we want, whether or not we know what number it is. So why can’t I get away with this? And the answer is that we live in a society, please. There are rules. At least, there’s conventions. They’re good things. They save us time setting up problems. They help us see what the current problem has in common with other problems. They help us communicate to people who haven’t been with us through all our past work. As always, these rules are made for our convenience, and we can waive them for good reason. But then you have to say what those reasons are.

What someone expects is that if you write ‘x’ without explanation, it’s a variable, and usually an independent one. Its value might be any of a set of things, and often, we don’t explicitly know what it is. Letters at the start of the alphabet usually stand for coefficients, some fixed number with a value we don’t want to bother specifying. In making this division — ‘a’, ‘b’, ‘c’ for coefficients, ‘x’, ‘y’, ‘z’ for variables — we are following René Descartes, who explained his choice of convention quite well. And there are other letters with connotations. We tend to use ‘t’ as a variable if it seems like we’re looking at something which depends on time. If something seems to depend on a radius, ‘r’ goes into service. We use letters like ‘f’ and ‘g’ and ‘h’ for functions. For indexes, ‘i’ and ‘j’ and ‘k’ get called up. For total counts of things, or for powers, ‘n’ and ‘m’, often capitalized, appear. The result is that any mathematician, looking at the expression

\sum_{j = 1}^{n} a_j f(x_j)

would have a fair idea what kinds of things she was looking at.

So when someone writes “the function \frac{1}{x} ” they mean “the function which matches ‘x’, in the domain, with \frac{1}{x} , in the range”. We write this as “f(x) = \frac{1}{x} ”. Or, if we become mathematics majors, and we’re in the right courses, we write “f:x \rightarrow \frac{1}{x} ”. It’s a format that seems like it’s overcomplicating things. But it’s good at emphasizing the idea that a function can be a map, matching a set in the domain to a set in the range.

This is a tiny point. Why discuss it at any length?

It’s because the question “is \frac{1}{x} a continuous function” isn’t well-formed. There’s important parts not specified. We can make it well-formed by specifying these parts. This is adding assumptions about what we mean. What assumptions we make affect what the answer is.

A function needs three components. One component is a set that’s the domain. One component is a set that’s the range. And one component is a rule that pairs up things in the domain with things in the range. But there are some domains and some ranges that we use all the time. We use them so often we end up not mentioning them. We have a common shorthand for functions which is to just list the rule.

So what are the domain and range?

Barring special circumstances, we usually take the domain that offers the most charitable reading of the rule. What’s the biggest set on which the rule makes sense? The domain is that. The range we find once we have the domain and rule. It’s the set that the rule maps the domain onto.

So, for example, if we have the function “f(x) = x^2”? That makes sense if ‘x’ is any real number. If there’s no reason to think otherwise, we suppose the domain is the set of all real numbers. We’d write that as the set R. Whatever ‘x’ is, though, ‘x^2’ is either zero or a positive number. So the range is the real numbers greater than or equal to zero. Or the nonnegative real numbers, if you prefer.

And even that reasonably clear guideline hides conventions. Like, who says this should be the real numbers? Can’t you take the square of a complex-valued number? And yes, you absolutely can. Some people even encourage it. So why not use the set C instead?

Convention, again. If we don’t expect to need complex-valued numbers, we don’t tend to use them. I suspect it’s a desire not to invite trouble. The use of ‘x’ as the independent variable is another bit of convention. An ‘x’ can be anything, yes. But if it’s a number, it’s more likely a real-valued number. Same with ‘y’. If we want a complex-valued independent variable we usually label that ‘z’. If we need a second, ‘w’ comes in. Writing “x^2” alone suggests real-valued numbers.

And this might head off another question. How do we know that ‘x’ is the only variable? How do we know we don’t need an ordered pair, ‘(x, y)’? This would be from the set called R^2, pairs of real-valued numbers. It uses only the first coordinate of the pair, but that’s allowed. How do we know that’s not going on? And we don’t know that from the “x^2” part. The “f(x) = ” part gives us that hint. If we thought the problem needed two independent variables, it would usually list them somewhere. Writing “f(x, y) = x^2” begs for the domain R^2, even if we don’t know what good the ‘y’ does yet. In mapping notation, if we wrote “f:(x, y) \rightarrow x^2 ” we’d be calling for R^2. If ‘x’ and ‘z’ both appear, that’s usually a hint that the problem needs coordinates ‘x’, ‘y’, and ‘z’, so that we’d want R^3 at least.

So that’s the maybe frustrating heuristic here. The inferred domain is the smallest biggest set that the rule makes sense on. The real numbers, but not ordered pairs of real numbers, and not complex-valued numbers. Something like that.

What does this mean for the function “f(x) = \frac{1}{x} ”? Well, the variable is ‘x’, so we should think real numbers rather than complex-valued ones. There’s no ‘y’ or ‘z’ or anything, so we don’t need ordered pairs. The domain is something in the real numbers, then. And the formula “\frac{1}{x} ” means something for any real number ‘x’ … well, with the one exception. We try not to divide by zero. It raises questions we’d rather not have brought up.

So from this we infer a domain of “all the real numbers except 0”. And this in turn implies a range of “all the real numbers except 0”.

Is “f(x) = \frac{1}{x} ” continuous on every point in the domain? That is, whenever ‘x’ is any real number besides zero? And, well, it is. A proper proof would be even more heaps of paragraphs, so I’ll skip it. Informally, you know if you drew a curve representing this function there’s only one point where you would ever lift your pen. And that point is 0 … which is not in this domain. So the function is continuous at every point in the domain. So the function’s continuous. Done.

And, I admit, not quite comfortably done. I feel like there’s some sleight-of-hand anyway. You draw “\frac{1}{x} ” and you absolutely do lift your pen, after all.

So, I fibbed a little above. When I said the range was “the set that the rule maps the domain onto”. I mean, that’s what it properly is. But finding that is often too much work. You have to find where the function would be its smallest, which is often hard, or at least tedious. You have to find where it’s largest, which is just as tedious. You have to find if there’s anything between the smallest and largest values that it skips. You have to find all these gaps. That’s boring. And what’s the harm done if we declare the range is bigger than that set? If, for example, we say the range of ‘x^2’ is all the real numbers, even though we know it’s really only the non-negative numbers?

None at all. Not unless we’re taking an exam about finding the smallest range that lets a function make sense. So in practice we’ll throw in all the negative numbers into that range, even if nothing matches them. I admit this makes me feel wasteful, but that’s my weird issue. It’s not like we use the numbers up. We’ll just overshoot on the range and that’s fine.

You see the trap this has set up. If it doesn’t cost us anything to throw in unneeded stuff in the range, and it makes the problem easier to write about, can we do that with the domain?

Well. Uhm. No. Not if we’re doing this right. The range can have unneeded stuff in it. The domain can’t. It seems unfair, but if we don’t hold to that rule, we make trouble for ourselves. By ourselves I mean mathematicians who study the theory of functions. That’s kind of like ourselves, right? So there’s no declaring that “\frac{1}{x} ” is a function on “all” the real numbers and trusting nobody to ask what happens when ‘x’ is zero.

But we don’t need a function’s rule to be a single thing. Or a simple thing. It can have different rules for different parts of the domain. It’s fine to declare, for example, that f(x) is equal to “\frac{1}{x} ” for every real number where that makes sense, and that it’s equal to 0 everywhere else. Or that it’s 1 everywhere else. That it’s negative a billion and a third everywhere else. Whatever number you like. As long as it’s something in the range.

So I’ll declare that my idea of this function is an ‘f(x)’ that’s equal to “\frac{1}{x} ” if ‘x’ is not zero, and that’s equal to 2 if ‘x’ is zero. I admit if I weren’t writing for an audience I’d make ‘f(x)’ equal to 0 there. That feels nicely symmetric. But everybody picks 0 when they’re filling in this function. I didn’t get where I am by making the same choices as everybody else, I tell myself, while being far less successful than everybody else.
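
In code this is a perfectly ordinary thing to write, which maybe makes the point that a rule can be stitched together from pieces. The choice of 2 at zero is, as above, mine.

def f(x):
    # 1/x wherever that makes sense, and the deliberately unpopular 2 at zero.
    return 1/x if x != 0 else 2.0

print(f(0.001), f(0), f(-0.001))    # 1000.0, 2.0, -1000.0: quite a jump at zero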

And now my ‘f(x)’ is definitely not continuous. The domain’s all the real numbers, yes. But at the point where ‘x’ is 0? There’s no drawing that without raising your pen from the paper. I trust you’re convinced. Your analysis professor will claim she’s not convinced, if you write that on your exam. But if you and she were just talking about functions, she’d agree. Since there’s one point in the domain where the function’s not continuous, the function is not continuous.

So there we have it. “\frac{1}{x} ”, taken in one reasonable way, is a continuous function. “\frac{1}{x} ”, taken in another reasonable way, is not a continuous function. What you think reasonable is what sets your answer.

Is 1/x a Continuous Function?


So this is a question I got by way of a friend. It’s got me thinking because there is an obviously right answer. And there’s an answer that you get to if you think about it longer. And then longer still and realize there are several answers you could give. So I wanted to put it out to my audience. Figuring out your answer and why you stand on that is the interesting bit.

The question is as asked in the subject line: is \frac{1}{x} a continuous function?

Mathematics majors, or related people like physics majors, already understand the question. Other people will want to know what the question means. This includes people who took a calculus class, who remember three awful weeks where they had to write ε and δ a lot. The era passed, even if they did not. And people who never took a mathematics class, but like their odds at solving a reasoning problem, can get up to speed on this fast.

The colloquial idea of a “continuous function” is, well. Imagine drawing a curve that represents the function. Can you draw the whole thing without lifting your pencil off the page? That is, no gaps, no jumps? Then it’s continuous. That’s roughly the idea we want to capture by talking about a “continuous function”. It needs some logical rigor to pass as mathematics, though. So here we go.

A function is continuous if, and only if, it’s continuous at every point in the function’s domain. That I start out with that may inspire a particular feeling. That feeling is, “our Game Master grinned ear-to-ear and took out four more dice and a booklet when we said we were sure”.

The red-and-brown ground of a rocky outcropping far above the lush green tree-covered hills below.
A discontinuous ground level. I totally took a weeklong vacation to the Keweenaw Peninsula of upper Michigan in order to get this picture just for my readers. Fun fact: there was also a ham radio event happening on the mountain.

But our best definition of continuity builds on functions at particular points. Which is fair. We can imagine a function that’s continuous in some places but that’s not continuous somewhere else. The ground can be very level and smooth right up to the cliff. And we have a nice, easy enough, idea of what it is to be continuous at a point.

I’ll get there in a moment. My life will be much easier if I can give you some more vocabulary. They’re all roughly what you might imagine the words meant if I didn’t tell you they were mathematics words.

The first is ‘map’. A function ‘maps’ something in its domain to something in its range. Like if ‘a’ is a point in the domain, ‘f’ maps that point to ‘f(a)’, in its range. Like, if your function is ‘f(x) = x^2’, then f maps 2 to 4. It maps 3 to 9. It maps -2 to 4 again, and that’s all right. There’s no reason you can’t map several things to one thing.

The next is ‘image’. Take something in the domain. It might be a single point. It might be a couple of points. It might be an interval. It might be several intervals. It’s a set, as big or as empty as you like. The ‘image’ of that set is all the points in the range that any point in the original set gets mapped to. So, again play with f(x) = x^2. The image of the interval from 0 to 2 is the interval from 0 to 4. The image of the interval from 3 to 4 is the interval from 9 to 16. The image of the interval from -3 to 1 is the interval from 0 to 9.

That’s as much vocabulary as I need. Thank you for putting up with that. Now I can say what it means to be continuous at a point.

Is a function continuous at a point? Let me call that point ‘a’? It is continuous at ‘a’ if we can do this. Take absolutely any open set in the range that contains ‘f(a)’. I’m going to call that open set ‘R’. Is there an open set, that I’ll call ‘D’, inside the domain, that contains ‘a’, and with an image that’s inside ‘R’? ‘D’ doesn’t have to be big. It can be ridiculously tiny; it just has to be an open set. If there always is a D like this, no matter how big or how small ‘R’ is, then ‘f’ is continuous at ‘a’. If there is not — if there’s even just the one exception — then ‘f’ is not continuous at ‘a’.

I realize that’s going back and forth a lot. It’s as good as we can hope for, though. It does really well at capturing things that seem like they should be continuous. And it never rules as not-continuous something that people agree should be continuous. It does label “continuous” some things that seem like they shouldn’t be. We accept this because not labelling continuous stuff as non-continuous is worse.

And all this talk about open sets and images gets a bit abstract. It’s written to cover all kinds of functions on all kinds of things. It’s hard to master, but, if you get it, you’ve got a lot of things. It works for functions on all kinds of domains and ranges. And it doesn’t need very much. You need to have an idea of what an ‘open set’ is, on the domain and range, and that’s all. This is what gives it universality.

But it does mean there’s the challenge figuring out how to start doing anything. If we promise that we’re talking about a function with domain and range of real numbers we can simplify things. This is where that ε and δ talk comes from. But here’s how we can define “continuous at a point” for a function in the special case that its domain and range are both real numbers.

Take any positive ε. Is there some positive δ, so that, whenever ‘x’ is a number less than δ away from ‘a’, we know that f(x) is less than ε away from f(a)? If there always is, no matter how large or small ε is, then f is continuous at a. If there ever is not, even for a single exceptional ε, then f is not continuous at a.
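
If you like the compressed symbolic version, that same definition reads (with ‘x’ ranging over the domain):

\forall \epsilon > 0 \; \exists \delta > 0 : |x - a| < \delta \implies |f(x) - f(a)| < \epsilon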

That definition is tailored for real-valued functions. But that’s enough if you want to answer the original question. Which, you might remember, is, “is 1/x a continuous function”?

That I ask the question, for a function simple and familiar enough a lot of people don’t even need to draw it, may give away what I think the answer is. But what’s interesting is, of course, why the answer. So I’ll leave that for an essay next week.

My 2018 Mathematics A To Z: Commutative


Today’s A to Z term comes from Reynardo, @Reynardo_red on Twitter, and is a challenge. And the other A To Z posts for this year should be at this link.

Cartoon of a thinking coati (it's a raccoon-like animal from Latin America); beside him are spelled out on Scrabble titles, 'MATHEMATICS A TO Z', on a starry background. Various arithmetic symbols are constellations in the background.
Art by Thomas K Dye, creator of the web comics Newshounds, Something Happens, and Infinity Refugees. His current project is Projection Edge. And you can get Projection Edge six months ahead of public publication by subscribing to his Patreon. And he’s on Twitter as @Newshoundscomic.

Commutative.

Some terms are hard to discuss. This is among them. Mathematicians find commutative things early on. Addition of whole numbers. Addition of real numbers. Multiplication of whole numbers. Multiplication of real numbers. Multiplication of complex-valued numbers. It’s easy to think of this commuting as just having liberty to swap the order of things. And it’s easy to think of commuting as “two things you can do in either order”. It inspires physical examples like rotating a dial, clockwise or counterclockwise, however much you like. Or outside the things that seem obviously mathematical. Add milk and then cereal to the bowl, or cereal and then milk. As long as you don’t overfill the bowl, there’s not an important difference. Per Wikipedia, if you’re putting one sock on each foot, it doesn’t matter which foot gets a sock first.

When something is this accessible, and this universal, it gets hard to talk about. It threatens to be invisible. It was hard to say much interesting about the still air in a closed room, at least before there was a chemistry that could tell it wasn’t a homogeneous invisible something, and before there was a statistical mechanics that could tell it was doing something even when it was doing nothing.

But commutativity is different. It’s easy to think of mathematics that doesn’t commute. Subtraction doesn’t, for all that it’s as familiar as addition. And despite that we try, in high school algebra, to fuse it into addition. Division doesn’t either, for all that we try to think of it as multiplication. Rotating things in three dimensions doesn’t commute. Nor does multiplying quaternions, which are a kind of number still. (I’m double-dipping here. You can use quaternions to represent three-dimensional rotations, and vice-versa. So they aren’t quite different examples, even though you can use quaternions to do things unrelated to rotations.) Clothing is a mass of things that can and can’t be put on first.

We talk about commuting as if it’s something in (or not in) the operations we do. Adding. Rotating. Walking in some direction. But it’s not entirely in that. Consider walking directions. From an intersection in the city, walk north to the first intersection you encounter. And walk east to the first intersection you encounter. Does it matter whether you walk north first and then east, or east first and then north? In some cases, no; famously, in Midtown Manhattan there’s no difference. At least if we pretend Broadway doesn’t exist.

Also if we don’t start from near the edge of the island, or near Central Park. An operation, even something familiar like addition, is a function. Its domain is an ordered pair. Each thing in the pair is from the set of whatever might be added together. (Or multiplied, or whatever the name of the operation is.) The operation commutes if the order of the pair doesn’t matter. It’s easy to find sets and operations that won’t commute. I suppose it’s for the same reason it’s easier to find rectangular rather than square things. We’re so used to working with operations like multiplication that we forget that multiplication needs things to multiply.

Whether a thing commutes turns up often in group theory. This shouldn’t surprise. Group theory studies how arithmetic works. A “group”, which is a set of things with an operation like multiplication on it, might or might not commute. A “ring”, which has a set of things and two operations, has some commutativity built into it. One ring operation is something like addition. That commutes, or else you don’t have a ring. The other operation is something like multiplication. That might or might not commute. It depends what you need for your problem. A ring with commuting multiplication, plus some other stuff, can reach the heights of being a “field”. Fields are neat. They look a lot like the real numbers, but they can be all weird, too.

But even in a group, that doesn’t have to have a commuting multiplication, we can tease out commutativity. There is a thing named the “commutator”, which is this particular way of multiplying elements together. You can use it to split the original group in the way that odds and evens split the whole numbers. That splitting is based on the same multiplication as the original group. But its domain is now classes based on elements of the original group. What’s created, the “commutator subgroup”, is commutative. We can find a thing, based on what we are interested in, which offers commutativity right nearby.

It reaches further. In analysis, it can be useful to think of functions as “mappings”. We describe this as though a function took a domain and transformed it into a range. We can compose these functions together: take the range from one function and use it as the domain for another. Sometimes these chains of functions will commute. We can get from the original set to the final set by several paths. This can produce fascinating and beautiful proofs that look as if you just drew a lattice-work. The MathWorld page on “Commutative Diagram” has some examples of this, and I recommend just looking at the pictures. Appreciate their aesthetic, particularly the ones immediately after the sentence about “Commutative diagrams are usually composed by commutative triangles and commutative squares”.

Whether these mappings commute can have meaning. This takes us, maybe inevitably, to quantum mechanics. Mathematically, this represents systems as either a wave function or a matrix, whichever is more convenient. We can use this to find the distribution of positions or momentums or energies or anything else we would like to know. Distributions are as much as we can hope for from quantum mechanics. We can say what (eg) the position of something is most likely to be but not what it is. That’s all right.

The mathematics of finding these distributions is just applying an operator, taking a mapping, on this wave function or this matrix. Some pairs of these operators commute, like the ones that let us find momentum and find kinetic energy. Some do not, like those to find position and angular momentum.

We can describe how much two operators do or don’t commute. This is through a thing called the “commutator”. Its form looks almost playfully simple. Call the operators ‘f’ and ‘g’. And that by ‘fg’ we mean, “do g, then do f”. (This seems awkward. But if you think of ‘fg’ as ‘f(g(x))’, where ‘x’ is just something in the domain of g, then this seems less awkward.) The commutator of ‘f’ and ‘g’ is then whatever ‘fg – gf’ is. If it’s always zero, then ‘f’ and ‘g’ commute. If it’s ever not zero, then they don’t.

This is easy to understand physically. Imagine starting from a point on the surface of the earth. Travel south one mile and then west one mile. You are at a different spot than you would be, had you instead travelled west one mile and then south one mile. How different? That’s the commutator. It’s obviously zero, for just multiplying some regular old numbers together. It’s sometimes zero, for these paths on the Earth’s surface. It’s never zero, for finding-the-position and finding-the-angular-momentum. The amount by which that’s never zero we can see as the famous Uncertainty Principle, the limits of what kinds of information we can know about the world.
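
If you want to see a commutator that isn't zero without doing any quantum mechanics, rotation matrices will oblige. Here are ninety-degree rotations about the x- and y-axes; the choice of angles and axes is arbitrary.

import numpy as np

Rx = np.array([[1, 0, 0],
               [0, 0, -1],
               [0, 1, 0]])         # ninety degrees about the x-axis
Ry = np.array([[0, 0, 1],
               [0, 1, 0],
               [-1, 0, 0]])        # ninety degrees about the y-axis

print(Rx @ Ry - Ry @ Rx)           # not the zero matrix: these rotations don't commute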

Still, it is a hard subject to describe. Things which commute are so familiar that it takes work to imagine them not commuting. (How could three times four equal anything but four times three?) Things which do not commute either obviously shouldn’t (add hot water to the instant oatmeal, and eat it), or are unfamiliar enough people need to stop and think about them. (Rotating something in one direction and then another, in three dimensions, generally doesn’t commute. But I wouldn’t fault you for testing this out with a couple objects on hand before being sure about it.) But it can be noticed, once you know to explore.

Someone Else’s Homework: Was It Hard? An Umbrella Search


I wanted to follow up, at last, on this homework problem a friend had.

The question: suppose you have a function f. Its domain is the integers Z. Its range is also the integers Z. You know two things about the function. First, for any two integers ‘a’ and ‘b’, you know that f(a + b) equals f(a) + f(b). Second, you know there is some odd number ‘c’ for which f(c) is even. The challenge: prove that f(n) is even for every integer n.

My friend asked, as we were working out the question, “Is this hard?” And I wasn’t sure what to say. I didn’t think it was hard, but I understand why someone would. If you’re used to mathematics problems like showing that all the roots of this polynomial are positive, then this stuff about f being even is weird. It’s a different way of thinking about problems. I’ve got experience in that thinking that my friend hasn’t.

All right, but then, what thinking? What did I see that my friend didn’t? And I’m not sure I can answer that perfectly. Part of gaining mastery of a subject is pattern recognition. Spotting how some things fit a form, while other stuff doesn’t, and some other bits yet are irrelevant. But also part of gaining that mastery is that it becomes hard to notice that’s what you’re doing.

But I can try to look with fresh eyes. There is a custom in writing this sort of problem, and that drove much of my thinking. The custom is that a mathematics problem, at this level, works by the rules of a Minute Mystery Puzzle. You are given in the setup everything that you need to solve the problem, yes. But you’re also not given stuff that you don’t need. If the detective mentions to the butler how dreary the rain is on arriving, you’re getting the tip to suspect the houseguest whose umbrella is unaccounted for.

(This format is almost unavoidable for teaching mathematics. At least it seems unavoidable given the number of problems that don’t avoid it. This can be treacherous. One of the hardest parts in stepping out to research on one’s own is that there’s nobody to tell you what the essential pieces are. Telling apart the necessary, the convenient, and the irrelevant requires expertise and I’m not sure that I know how to teach it.)

The first unaccounted-for umbrella in this problem is the function’s domain and range. They’re integers. Why wouldn’t the range, particularly, be all the real numbers? What things are true about the integers that aren’t true about the real numbers? There’s a bunch of things. The highest-level things are rooted in topology. There’s gaps between one integer and its nearest neighbor. Oh, and an integer has a nearest neighbor. A real number doesn’t. That matters for approximations and for sequences and series. Not likely to matter here. Look to more basic, obvious stuff: there’s even and odd numbers. And the problem talks about knowing something for an odd number in the domain. This is a signal to look at odds and evens for the answer.

The second unaccounted-for umbrella is the most specific thing we learn about the function. There is some odd number ‘c’, and the function matches that integer ‘c’ in the domain to some even number f(c) in the range. This makes me think: what do I know about ‘c’? Most basic thing about any odd number is it’s some even number plus one. And that made me think: can I conclude anything about f(1)? Can I conclude anything about f at the sum of two numbers?

Third unaccounted-for umbrella. The less-specific thing we learn about the function. That is that for any integers ‘a’ and ‘b’, f(a + b) is f(a) + f(b). So see how this interacts with the second umbrella. f(c) is f(some-even-number) + f(1). Do I know anything about f(some-even-number)?

Sure. If I know anything about even numbers, it’s that any even number equals two times some integer. Let me call that some-integer ‘k’. Since some-even-number equals 2*k, then, f(some-even-number) is f(2*k), which is f(k + k). And by the third umbrella, that’s f(k) + f(k). By the first umbrella, f(k) has to be an integer. So f(k) + f(k) has to be even.

So, f(c) is an even number. And it has to equal f(2*k) + f(1). f(2*k) is even; so, f(1) has to be even. These are the things that leapt out to me about the problem. This is why the problem looked, to me, easy.

Because I knew that f(1) was even, I knew that f(1 + 1), or f(2), was even. And so would be f(2 + 1), that is, f(3). And so on, for at least all the positive integers.

Now, after that, in my first version of this proof, I got hung up on what seems like a very fussy technical point. And that was, what about f(0)? What about the negative integers? f(0) is easy enough to show. It follows from one of those tricks mathematics majors are told about early. Somewhere in grad school they start to believe it. And that is: adding zero doesn’t change a number’s value, but it can give you a more useful way to express that number. Here’s how adding zero helps: we know c = c + 0. So f(c) = f(c) + f(0) and whether f(c) is even or odd, f(0) has to be even. Evens and odds don’t work any other way.

After that my proof got hung up on what may seem like a pretty fussy technical point. That amounted to whether f(-1) was even or odd. I discussed this with a couple people who could not see what my issue with this was. I admit I wasn’t sure myself. I think I’ve narrowed it down to this: I was questioning whether it’s known that the number “negative one” is the same thing as what we get from the operation “zero minus one”. I mean, in general, this isn’t much questioned. Not for the last couple centuries.

You might be having trouble even figuring out why I might worry there could be a difference. In “0 – 1” the – sign there is a binary operation, meaning, “subtract the number on the right from the number on the left”. In “-1” the – sign there is a unary operation, meaning, “take the additive inverse of the number on the right”. These are two different – signs that look alike. One of them interacts with two numbers. One of them interacts with a single number. How can they mean the same thing?

With some ordinary assumptions about what we mean by “addition” and “subtraction” and “equals” and “zero” and “numbers” and stuff, the difference doesn’t matter much. We can swap between “-1” and “0 – 1” effortlessly. If we couldn’t, we probably wouldn’t use the same symbol for the two ideas. But in the context of this particular question, could we count on that?

My friend wasn’t confident in understanding what the heck I was going on about. Fair enough. But some part of me felt like that needed to be shown. If it hadn’t been recently shown, or used, in class, then it had to go into this proof. And that’s why I went, in the first essay, into the bit about additive inverses.

This was me over-thinking the problem. I got to looking at umbrellas that likely were accounted for.

My second proof, the one thought up in the shower, uses the same unaccounted-for umbrellas. In the first proof, the second unaccounted-for umbrella seemed particularly important. Knowing that f(c), for that odd number ‘c’, was even, what else could I learn? In the second proof, it’s the third unaccounted-for umbrella that seemed key. Knowing that f(a + b) is f(a) + f(b), what could I learn? That right away tells me that for any even number ‘d’, f(d) must be even.

Call this the fourth unaccounted-for umbrella. Every integer is either even or odd. So right away I could prove this for what I really want to say is half of the integers. Don’t call it that. There’s not a coherent way to say the even integers are any fraction of all the integers. There’s exactly as many even integers as there are integers. But you know what I mean. (What I mean is, in any finite interval of consecutive integers, half are going to be even. Well, there’ll be at most two more odd integers than there are even integers. That’ll be close enough to half if the interval is long enough. And if we pretend we can make bigger and bigger intervals until all the integers are covered … yeah. Don’t poke at that and do not use it at your thesis defense because it doesn’t work. That’s what it feels like ought to work.)

But that I could cover the even integers in the domain with one quick sentence was a hint. The hint was, look for some thing similar that would cover the odd integers in the domain. And hey, that second unaccounted-for umbrella said something about one odd integer in the domain. Add to that one of those boring little things that a mathematician knows about odd numbers: the difference between any two odd numbers is an even number. ‘c’ is an odd number. So any odd number in the domain, let’s call it ‘d’, is equal to ‘c’ plus some even number. And f(some-even-number) has to be even and there we go.

So all this is what I see when I look at the question. And why I see those things, and why I say this is not a hard problem. It’s all in spotting these umbrellas.

Someone Else’s Homework: Some More Thoughts


I wanted to get back to my friend’s homework problem. And a question my friend had about the question. It’s a question I figure is good for another essay.

But I also had second thoughts about the answer I gave. Not that it’s wrong, but that it could be better. Also that I’m not doing as well in spelling “range” as I had always assumed I would. This is what happens when I don’t run an essay through the Hemingway App to check whether my sentences are too convoluted. I also catch smaller word glitches.

Let me re-state the problem: Suppose you have a function f, with domain of the integers Z and range of the integers Z. And also you know that f has the property that for any two integers ‘a’ and ‘b’, f(a + b) equals f(a) + f(b). And finally, suppose that for some odd number ‘c’, you know that f(c) is even. The challenge: prove that f is even for all the integers.

Like I say, the answer I gave on Tuesday is right. That’s fine. I just thought of a better answer. This often happens. There are very few interesting mathematical truths that only have a single proof. The ones that have only a single proof are on the cutting edge, new mathematics in a context we don’t understand well enough yet. (Yes, I am overlooking the obvious exception of ______ .) But a question so well-chewed-over that it’s fit for undergraduate homework? There’s probably dozens of ways to attack that problem.

And yes, you might only see one proof of something. Sometimes there’s an approach that works so well it’s silly to consider alternatives. Or the problem isn’t big enough to need several different proofs. There’s something to regret in that. Re-thinking an argument can make it better. As instructors we might recommend rewriting an assignment before turning it in. But I’m not sure that encourages re-thinking the assignment. It’s too easy to just copy-edit and catch obvious mistakes. Which is valuable, yes. But it’s good for communication, not for the mathematics itself.

So here’s my revised argument. It’s much cleaner, as I realized it while showering Wednesday morning.

Give me an integer. Let’s call it m. Well, m has to be either an even or an odd number. I’m supposing nothing about whether it’s positive or negative, by the way. This means what I show will work whether m is greater than, less than, or equal to zero.

Suppose that m is an even number. Then m has to equal 2*k for some integer k. (And yeah, k might be positive, might be negative, might be zero. Don’t know. Don’t care.) That is, m has to equal k + k. So f(m) = f(k) + f(k). That’s one of the two things we know about the function f. And f(k) + f(k) is 2 * f(k). And f(k) is an integer: the integers are the function’s range. So 2 * f(k) is an even integer. So if m is an even number then f(m) has to be even.

All right. Suppose that m isn’t an even integer. Then it’s got to be an odd integer. So this means m has to be equal to c plus some even number, which I’m going ahead and calling 2*k. Remember c? We were given information about f for that element c in the domain. And again, k might be positive. Might be negative. Might be zero. Don’t know, and don’t need to know. So since m = c + 2*k, we know that f(m) = f(c) + f(2*k). And the other thing we know about f is that f(c) is even. f(2*k) is also even. f(c), which is even, plus f(2*k), which is even, has to be even. So if m is an odd number, then f(m) has to be even.

And so, as long as m is an integer, f(m) is even.

You see why I like that argument better. It’s shorter. It breaks things up into fewer cases. None of those cases have to worry about whether m is positive or negative or zero. Each of the cases is short, and moves straight to its goal. This is the proof I’d be happy submitting. Today, anyway. No telling what tomorrow will make me think.
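
If you like poking at arguments like this with a computer, here is a minimal Python sketch that checks the claim against a family of examples rather than proving it. It leans on a fact the proof itself never needs: an additive function on the integers always works out to f(n) = n * f(1), so choosing f(1) generates examples. The helper names (make_additive, hypothesis_holds, conclusion_holds) and the test values are my own, just for illustration.

    # An illustrative check, not a proof: additive functions on the integers
    # all look like f(n) = n * m, so we can generate examples by picking m = f(1).

    def make_additive(m):
        """Return the additive function f(n) = n * m."""
        return lambda n: n * m

    def hypothesis_holds(f, c):
        """Is f(c) even for the odd number c?"""
        return c % 2 == 1 and f(c) % 2 == 0

    def conclusion_holds(f, test_range=range(-50, 51)):
        """Is f(n) even for every tested integer n?"""
        return all(f(n) % 2 == 0 for n in test_range)

    f = make_additive(4)      # f(1) = 4, so f(3) = 12 is even: the hypothesis is met
    print(hypothesis_holds(f, 3), conclusion_holds(f))   # True True

    g = make_additive(3)      # g(1) = 3, so g(3) = 9 is odd: the hypothesis fails
    print(hypothesis_holds(g, 3), conclusion_holds(g))   # False False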

Someone Else’s Homework: A Solution


I have a friend who’s been taking mathematical logic. While talking over the past week’s work they mentioned a problem that had stumped them. But they’d figured it out — at least the critical part — about a half-hour after turning it in. And I had fun going over it. Since the assignment’s already turned in and I don’t even know which class it was, I’d like to share it with you.

So here’s the problem. Suppose you have a function f, with domain of the integers Z and range of the integers Z. And also you know that f has the property that for any two integers ‘a’ and ‘b’, f(a + b) equals f(a) + f(b). And finally, suppose that for some odd number ‘c’, you know that f(c) is even. The challenge: prove that f is even for all the integers.

If you want to take a moment to think about that, please do.

[Photo: a Californian rabbit (white body, grey ears and nose and paws) eating a pile of vegetables. In the background is the sunlit window, with a small rabbit statue silhouetted behind the rabbit’s back.]
So you can ponder without spoilers here’s a picture of the rabbit we’re fostering for the month, who’s having lunch. The silhouette behind her back is of a little statue decoration and not some outsider trying to lure our foster rabbit to freedom outside, so far as we know. (Don’t set domesticated rabbits outside. It won’t go well for them. And domesticated rabbits aren’t native to North America, I mention for the majority of my readers who are.)

So here’s my thinking about this.

First thing I want to do is show that f(1) is an even number. How? Well, if ‘c’ is an odd number, then ‘c’ has to equal ‘2*k + 1’ for some integer ‘k’. So f(c) = f(2*k + 1). And therefore f(c) = f(2*k) + f(1). And, since 2*k is equal to k + k, then f(2*k) has to equal f(k) + f(k). Therefore f(c) = 2*f(k) + f(1). Whatever f(k) is, 2*f(k) has to be an even number. And we’re given f(c) is even. Therefore f(1) has to be even.

Now I can prove that if ‘k’ is any positive integer, then f(k) has to be even. Why? Because ‘k’ is equal to 1 + 1 + 1 + … + 1. And so f(k) has to equal f(1) + f(1) + f(1) + … + f(1). That is, it’s k * f(1). And if f(1) is even then so is k * f(1). So that covers the positive integers.

How about zero? Can I show that f(0) is even? Oh, sure, easy. Start with ‘c’. ‘c’ equals ‘c + 0’. So f(c) = f(c) + f(0). The only way that’s going to be true is if f(0) is equal to zero, which is an even number.

By the way, here’s an alternate way of arguing this: 0 = 0 + 0. So f(0) = f(0) + f(0). And therefore f(0) = 2 * f(0) and that’s an even number. Incidentally also zero. Submit the proof you like.

What’s not covered yet? Negative integers. It’s hard not to figure, well, we know f(1) is even, we know f(a + b) is f(a) + f(b). Shouldn’t, like, f(-2) just be -2 * f(1)? Oh, it so should. I don’t feel like we have that already proven, though. So let me nail that down. I’m going to use what we know about f(k) for positive ‘k’, and the fact that f(0) is 0.

So give me any negative integer; I’m going to call it ‘-k’. Its additive inverse is ‘k’, which is a positive number. -k + k = 0. And so f(-k + k) = f(-k) + f(k) = f(0). So, f(-k) + f(k) = 0, and f(-k) = -f(k). If f(k) is even — and it is — then f(-k) is also even.

So there we go: whether ‘k’ is a positive, zero, or negative integer, f(k) is even. All the integers are either positive, zero, or negative. So f is even for any integer.
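
For anyone who wants to see the positive / zero / negative cases march past in code, here’s a minimal Python sketch that builds f(n) using nothing but the additivity rule and a chosen even value for f(1). The helper name and the choice f(1) = 6 are my own illustration, not anything from the assignment.

    def f_from_f1(n, f1):
        """Build f(n) for an additive f on the integers, given f(1), by repeated addition."""
        if n == 0:
            return 0                            # f(0) = 0, since f(0) = f(0) + f(0)
        if n > 0:
            return sum(f1 for _ in range(n))    # f(n) = f(1) + f(1) + ... + f(1)
        return -f_from_f1(-n, f1)               # f(-n) = -f(n), since f(-n) + f(n) = f(0)

    f1 = 6    # an even choice for f(1), which is what the argument above requires
    print(all(f_from_f1(n, f1) % 2 == 0 for n in range(-20, 21)))    # True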

I’ve got some more thoughts about this problem.

How To Wreck Your Idea About What ‘Continuous’ Means


This attractive little tweet came across my feed yesterday:

[Tweet from Fermat’s Library, showing a plot of the function defined below.]

This function — I guess it’s the “popcorn” function — is a challenge to our ideas about what a “continuous” function is. I’ve mentioned “continuous” functions before and said something like they’re functions you could draw without lifting your pen from the paper. That’s the colloquial, and the intuitive, idea of what they mean. And that’s all right for ordinary uses.

But the best definition mathematicians have thought of for a “continuous function” has some quirks. And here’s one of them. Define a function named ‘f’. Its domain is the real numbers. Its range is the real numbers. And the rule matching things in the domain to things in the range is, as pictured:

  • If ‘x’ is zero then f(x) = 1
  • If ‘x’ is an irrational number then f(x) = 0
  • If ‘x’ is any other rational number, then it’s equal in lowest terms to the whole number ‘p’ divided by the positive whole number ‘q’. And for this ‘x’, f(x) = \frac{1}{q}

And as the tweet from Fermat’s Library says, this is a function that’s continuous on all the irrational numbers. It’s not continuous on any rational numbers. This seems like a prank. But it’s a common approach to finding intuition-testing ideas about continuity. Setting different rules for rational and irrational numbers works well for making these strange functions. And thinking of rational numbers as their representation in lowest terms is also common. (Writing it as ‘p divided by q’ suggests that ‘p’ and ‘q’ are going to be prime, but, no! They only need to share no common factors. Think of \frac{3}{8} or of \frac{4}{9} .) If you stare at the plot you can maybe convince yourself that “continuous on the irrational numbers” makes sense here. That heavy line of dots at the bottom looks like it’s approaching a continuous blur, at least.
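
If you want to generate that plot of dots yourself, here’s a rough Python sketch. A computer only ever sees rational inputs anyway, so the sketch takes exact fractions and leaves the irrational case out of scope; it’s a way to reproduce the picture, not a faithful copy of the function on all the reals. The function name is mine.

    from fractions import Fraction

    def popcorn(x):
        """f(0) = 1; for a rational x = p/q in lowest terms with q > 0, f(x) = 1/q."""
        if x == 0:
            return Fraction(1)
        # Fraction keeps itself in lowest terms, with a positive denominator.
        return Fraction(1, Fraction(x).denominator)

    # The rationals in [0, 1] with denominator up to 7: the dots you see in the plot.
    samples = sorted({Fraction(p, q) for q in range(1, 8) for p in range(0, q + 1)})
    for x in samples:
        print(x, "->", popcorn(x))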

It can get weirder. It’s possible to create a function that’s continuous at only a single point of all the real numbers. This is why Real Analysis is such a good subject to crash hard against. But we accept weird conclusions like this because the alternative is to give up, as “continuous”, functions that we just know have to be continuous. Mathematical definitions are things we make for our use.

The Summer 2017 Mathematics A To Z: Functor


Gaurish gives me another topic for today. I’m now no longer sure whether Gaurish hopes I’ll become a topology blogger or a category theory blogger. I have the last laugh, though. I’ve wanted to get better-versed in both fields and there’s nothing like explaining something to learn about it.

[Banner art for the Summer 2017 Mathematics A to Z, featuring a coati (it’s kind of the Latin American raccoon) looking over alphabet blocks, with a lot of equations in the background.]
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work. He’s also open for commissions, starting from US$10.

Functor.

So, category theory. It’s a foundational field. It talks about stuff that’s terribly abstract. This means it’s powerful, but it can be hard to think of interesting examples. I’ll try, though.

It starts with categories. These have three parts. The first part is a set of things. (There always is.) The second part is a collection of matches between pairs of things in the set. They’re called morphisms. The third part is a rule that lets us combine two morphisms into a new, third one. That is. Suppose ‘a’, ‘b’, and ‘c’ are things in the set. Then there’s a morphism that matches a \rightarrow b , and a morphism that matches b \rightarrow c . And we can combine them into another morphism that matches a \rightarrow c . So we have a set of things, and a set of things we can do with those things. And the set of things we can do has a tidy structure of its own. (It’s tempting to call it a group, but it isn’t quite: we can only chain together morphisms whose ends match up, and there’s no promise we can undo one.)

This describes a lot of stuff. Group theory fits seamlessly into this description. Most of what we do with numbers is a kind of group theory. Vector spaces do too. Most of what we do with analysis has vector spaces underneath it. Topology does too. Most of what we do with geometry is an expression of topology. So you see why category theory is so foundational.

Functors enter our picture when we have two categories. Or more. They’re about the ways we can match up categories. But let’s start with two categories. One of them I’ll name ‘C’, and the other, ‘D’. A functor has to match everything that’s in the set of ‘C’ to something that’s in the set of ‘D’.

And it does more. It has to match every morphism between things in ‘C’ to some other morphism, between corresponding things in ‘D’. It’s got to do it in a way that satisfies that combining, too. That is, suppose that ‘f’ and ‘g’ are morphisms for ‘C’. And that ‘f’ and ‘g’ combine to make ‘h’. Then, the functor has to match ‘f’ and ‘g’ and ‘h’ to some morphisms for ‘D’. The combination of whatever ‘f’ matches to and whatever ‘g’ matches to has to be whatever ‘h’ matches to.

This might sound to you like a homomorphism. If it does, I admire your memory or mathematical prowess. Functors are about matching one thing to another in a way that preserves structure. Structure is the way that sets of things can interact. We naturally look for stuff made up of different things that have the same structure. Yes, functors are themselves a category. That is, you can make a brand-new category whose set of things are the functors between two other categories. This is a good spot to pause while the dizziness passes.
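
Here’s a tiny concrete example you can poke at in Python, with the usual caveat that it’s an illustration rather than the category theorist’s full story. Lists act like a functor: each set of things goes to the lists of those things, and each function f goes to “apply f to every entry”. The check below is the combining rule from a few paragraphs up: lifting a composite morphism matches composing the lifted ones. The names lift and compose are my own labels.

    # A sketch of the list functor: things go to lists of things,
    # and a morphism f goes to "apply f to every entry of the list".

    def lift(f):
        """What the functor does to a morphism f."""
        return lambda xs: [f(x) for x in xs]

    def compose(g, f):
        """Ordinary composition: first f, then g."""
        return lambda x: g(f(x))

    f = lambda a: a + 1        # a morphism
    g = lambda b: 2 * b        # another morphism, composable with f
    xs = [1, 2, 3]

    print(lift(compose(g, f))(xs))          # [4, 6, 8]
    print(compose(lift(g), lift(f))(xs))    # [4, 6, 8] -- the same, as a functor requires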

There are two kingdoms of functor. You tell them apart by what they do with the morphisms. Here again I’m going to need my categories ‘C’ and ‘D’. I need a morphism for ‘C’. I’ll call that ‘f’. ‘f’ has to match something in the set of ‘C’ to something in the set of ‘C’. Let me call the first something ‘a’, and the second something ‘b’. That’s all right so far? Thank you.

Let me call my functor ‘F’. ‘F’ matches all the elements in ‘C’ to elements in ‘D’. And it matches all the morphisms on the elements in ‘C’ to morphisms on the elements in ‘D’. So if I write ‘F(a)’, what I mean is look at the element ‘a’ in the set for ‘C’. Then look at what element in the set for ‘D’ the functor matches with ‘a’. If I write ‘F(b)’, what I mean is look at the element ‘b’ in the set for ‘C’. Then pick out whatever element in the set for ‘D’ gets matched to ‘b’. If I write ‘F(f)’, what I mean is to look at the morphism ‘f’ between elements in ‘C’. Then pick out whatever morphism between elements in ‘D’ it gets matched with.

Here’s where I’m going with this. Suppose my morphism ‘f’ matches ‘a’ to ‘b’. Does the functor of that morphism, ‘F(f)’, match ‘F(a)’ to ‘F(b)’? Of course, you say, what else could it do? And the answer is: why couldn’t it match ‘F(b)’ to ‘F(a)’?

No, it doesn’t break everything. Not if you’re consistent about swapping the order of the matchings. The normal everyday order, the one you’d thought couldn’t have an alternative, is a “covariant functor”. The crosswise order, this second thought, is a “contravariant functor”. Covariant and contravariant are distinctions that weave through much of mathematics. They particularly appear through tensors and the geometry they imply. In that introduction they tend to be difficult, even mean, creations, since in regular old Euclidean space they don’t mean anything different. They’re different for non-Euclidean spaces, and that’s important and valuable. The covariant versus contravariant difference is easier to grasp here.

Functors work their way into computer science. The avenue here is in functional programming. That’s a method of programming in which instead of the normal long list of commands, you write a single line of code that holds like fourteen “->” symbols that makes the computer stop and catch fire when it encounters a bug. The advantage is that when you have the code debugged it’s quite speedy and memory-efficient. The disadvantage is if you have to alter the function later, it’s easiest to throw everything out and start from scratch, beginning from vacuum-tube-based computing machines. But it works well while it does. You just have to get the hang of it.

The End 2016 Mathematics A To Z: Weierstrass Function


I’ve teased this one before.

Weierstrass Function.

So you know how the Earth is a sphere, but from our normal vantage point right up close to its surface it looks flat? That happens with functions too. Here I mean the normal kinds of functions we deal with, ones with domains that are the real numbers or a Euclidean space. And ranges that are real numbers. The functions you can draw on a sheet of paper with some wiggly bits. Let the function wiggle as much as you want. Pick a part of it and zoom in close. That zoomed-in part will look straight. If it doesn’t look straight, zoom in closer.

We rely on this. Functions that are straight, or at least straight enough, are easy to work with. We can do calculus on them. We can do analysis on them. Functions with plots that look like straight lines are easy to work with. Often the best approach to working with the function you’re interested in is to approximate it with an easy-to-work-with function. I bet it’ll be a polynomial. That serves us well. Polynomials are these continuous functions. They’re differentiable. They’re smooth.

That thing about the Earth looking flat, though? That’s a lie. I’ve never been to any of the really great cuts in the Earth’s surface, but I have been to some decent gorges. I went to grad school in the Hudson River Valley. I’ve driven I-80 over Pennsylvania’s scariest bridges. There’s points where the surface of the Earth just drops a great distance between your one footstep and your last.

Functions do that too. We can have points where a function isn’t differentiable, where it’s impossible to define the direction it’s headed. We can have points where a function isn’t continuous, where it jumps from one region of values to another region. Everyone knows this. We can’t dismiss those as aberrations not worthy of the name “function”; too many of them are too useful. Typically we handle this by admitting there’s points that aren’t continuous and we chop the function up. We make it into a couple of functions, each stretching from discontinuity to discontinuity. Between them we have continuous regions and we can go about our business as before.

Then came the 19th century when things got crazy. This particular craziness we credit to Karl Weierstrass. Weierstrass’s name is all over 19th century analysis. He had that talent for probing the limits of our intuition about basic mathematical ideas. We have a calculus that is logically rigorous because he found great counterexamples to what we had assumed without proving.

The Weierstrass function challenges this idea that any function is going to eventually level out. Or that we can even smooth a function out into basically straight, predictable chunks in-between sudden changes of direction. The function is continuous everywhere; you can draw it perfectly without lifting your pen from paper. But it always looks like a zig-zag pattern, jumping around like it was always randomly deciding whether to go up or down next. Zoom in on any patch and it still jumps around, zig-zagging up and down. There’s never an interval where it’s always moving up, or always moving down, or even just staying constant.
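
The essay doesn’t write the formula out, but the classic construction is a sum of ever-faster cosine wiggles, W(x) = \sum_n a^n \cos(b^n \pi x) , with 0 < a < 1 and b an odd whole number large enough that ab > 1 + 3π/2. Here’s a small Python sketch that evaluates a partial sum of that series so you can plot it and watch the zig-zag refuse to settle down; the choices a = 0.5 and b = 13 are just common illustrative ones, and with finitely many terms this is only an approximation of the real thing.

    import math

    def weierstrass(x, a=0.5, b=13, terms=30):
        """Partial sum of the classic Weierstrass series: sum of a**n * cos(b**n * pi * x)."""
        # Later terms shrink like a**n, so a few dozen terms give a plottable approximation.
        return sum(a ** n * math.cos(b ** n * math.pi * x) for n in range(terms))

    # Sample a coarse grid; plot these (or a finer grid) and zoom in on any patch --
    # it never starts to look like a straight line.
    for i in range(11):
        x = i / 10
        print(f"W({x:.1f}) is about {weierstrass(x):+.4f}")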

Despite being continuous it’s not differentiable. I’ve described that casually as it being impossible to predict where the function is going. That’s an abuse of words, yes. The function is defined. Its value at a point isn’t any more random than the value of “x^2” is for any particular x. The unpredictability I’m talking about here is a side effect of ignorance. Imagine I showed you a plot of “x^2” with a part of it concealed and asked you to fill in the gap. You’d probably do pretty well estimating it. The Weierstrass function, though? No; your guess would be lousy. My guess would be lousy too.

That’s a weird thing to have happen. A century and a half later it’s still weird. And looking closer doesn’t soften it: the Weierstrass function isn’t differentiable anywhere at all. Not on any interval, not even at isolated points. There is no point where the rate at which the function changes is defined. Derivatives are about how a function changes. We work out what they should even mean by thinking of a function’s value on strips of the domain. Those strips are small, but they’re still, you know, strips. And no strip, however small, is calm enough for this function to have a derivative anywhere inside it. It evokes the medieval Mysteries, of how we are supposed to try, even though we know we shall fail, to understand how God can have contradictory properties.

It’s not quite that Mysterious here. Properties like this challenge our intuition, if we’ve gotten any. Once we’ve laid out good definitions for ideas like “derivative” and “continuous” and “limit” and “function” we can work out whether results like this make sense. And they — well, they follow. We can avoid weird conclusions like this, but at the cost of messing up our definitions for what a “function” and other things are. Making those useless. For the mathematical world to make sense, we have to change our idea of what quite makes sense.

That’s all right. When we look close we realize the Earth around us is never flat. Even reasonably flat areas have slight rises and falls. The ends of properties are marked with curbs or ditches, and bordered by streets that rise to a center. Look closely even at the dirt and we notice that as level as it gets there are still rocks and scratches in the ground, clumps of dirt an infinitesimal bit higher here and lower there. The flatness of the Earth around us is a useful tool, but we miss a lot by pretending it’s everything. The Weierstrass function is one of the ways a student mathematician learns that while smooth, predictable functions are essential, there is much more out there.

The End 2016 Mathematics A To Z: Smooth


Mathematicians affect a pose of objectivity. We justify this by working on things whose truth we can know, and which must be true whenever we accept certain rules of deduction and certain definitions and axioms. This seems fair. But we choose to pay attention to things that interest us for particular reasons. We study things we like. My A To Z glossary term for today is about one of those things we like.

Smooth.

Functions. Not everything mathematicians do is functions. But functions turn up a lot. We need to set some rules. “A function” is so generic a thing we can’t handle it much. Narrow it down. Pick functions with domains that are numbers. Range too. By numbers I mean real numbers, maybe complex numbers. That gives us something.

There’s functions that are hard to work with. This is almost all of them, so we don’t touch them unless we absolutely must. The hard ones I mean here are the functions that aren’t continuous. That means what you imagine. The value of the function at some point is wholly unrelated to its value at some nearby point. It’s hard to work with anything that’s unpredictable like that. Functions as well as people.

We like functions that are continuous. They’re predictable. We can make approximations. We can estimate the function’s value at some point using its value at some more convenient point. It’s easy to see why that’s useful for numerical mathematics, for calculations to approximate stuff. The dazzling thing is it’s useful analytically. We step into the Platonic-ideal world of pure mathematics. We have tools that let us work as if we had infinitely many digits of precision, for infinitely many numbers at once. And yet we use estimates and approximations and errors. We use them in ways to give us perfect knowledge; we get there by estimates.

Continuous functions are nice. Well, they’re nicer to us than functions that aren’t continuous. But there are even nicer functions. Functions nicer to us. A continuous function, for example, can have corners; it can change direction suddenly and without warning. A differentiable function is more predictable. It can’t have corners like that. Knowing the function well at one point gives us more information about what it’s like nearby.

The derivative of a function doesn’t have to be continuous. Grumble. It’s nice when it is, though. It makes the function easier to work with. It’s really nice for us when the derivative itself has a derivative. Nothing guarantees that the derivative of a derivative is continuous. But maybe it is. Maybe the derivative of the derivative has a derivative. That’s a function we can do a lot with.
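
A concrete instance, if you’d like one: the textbook example is f(x) = x^2 \sin(1/x) , patched to be 0 at x = 0. It has a derivative at every point, zero included, but that derivative swings around wildly near zero and isn’t continuous there. Here’s a small Python sketch of it; the names are mine, and it illustrates the example rather than proving anything.

    import math

    def f(x):
        """x^2 * sin(1/x), with f(0) = 0: differentiable at every point."""
        return 0.0 if x == 0 else x * x * math.sin(1 / x)

    def f_prime(x):
        """Its derivative, worked out by hand: 2x sin(1/x) - cos(1/x) away from 0, and 0 at 0."""
        return 0.0 if x == 0 else 2 * x * math.sin(1 / x) - math.cos(1 / x)

    # Near zero the derivative keeps swinging between roughly -1 and +1,
    # so it cannot settle down to f'(0) = 0 and is not continuous there.
    for x in [0.1, 0.01, 0.001, 0.0001]:
        print(f"f'({x}) = {f_prime(x):+.4f}")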

A function is “smooth” if it has as many derivatives as we need for whatever it is we’re doing. And if those derivatives are continuous. If this seems loose that’s because it is. A proof for whatever we’re interested in might need only the original function and its first derivative. It might need the original function and its first, second, third, and fourth derivatives. It might need hundreds of derivatives. If we look through the details of the proof we might find exactly how many derivatives we need and how many of them need to be continuous. But that’s tedious. We save ourselves considerable time by saying the function is “smooth”, as in, “smooth enough for what we need”.

If we do want to specify how many continuous derivatives a function has we call it a “C^k function”. The C here means continuous. The ‘k’ means there are ‘k’ continuous derivatives of it. This is completely different from a “C^k” written with a boldface C, which would be a k-dimensional vector of complex numbers. Whether the “C” is boldface or not is important. A function might have infinitely many continuous derivatives. That we call a “C^∞ function”. That’s got wonderful properties, especially if the domain and range are complex-valued numbers. We couldn’t do Complex Analysis without it. Complex Analysis is the course students take after wondering how they’ll ever survive Real Analysis. It’s much easier than Real Analysis. Mathematics can be strange.

The End 2016 Mathematics A To Z: Principal


Functions. They’re at the center of so much mathematics. They have three pieces: a domain, a range, and a rule. The one thing functions absolutely must do is match stuff in the domain to one and only one thing in the range. So this is where it gets tricky.

Principal.

Thing with this one-and-only-one thing in the range is it’s not always practical. Sometimes it only makes sense to allow for something in the domain to match several things in the range. For example, suppose we have the domain of positive numbers. And we want a function that gives us the numbers which, squared, equal whatever the original number was. For any positive real number there’s two numbers that do that. 4 should match to both +2 and -2.

You might ask why I want a function that tells me the numbers which, squared, equal something. I ask back, what business is that of yours? I want a function that does this and shouldn’t that be enough? We’re getting off to a bad start here. I’m sorry; I’ve been running ragged the last few days. I blame the flat tire on my car.

Anyway. I’d want something like that function because I’m looking for what state of things makes some other thing true. This turns up often in “inverse problems”, problems in which we know what some measurement is and want to know what caused the measurement. We do that sort of problem all the time.

We can handle these multi-valued functions. Of course we can. Mathematicians are as good at loopholes as anyone else is. Formally we declare that the range isn’t the real numbers but rather sets of real numbers. My what-number-squared function then matches ‘4’ in the domain to the set of numbers ‘+2 and -2’. The set has several things in it, but there’s just the one set. Clever, huh?

This sort of thing turns up a lot. There’s two numbers that, squared, give us any real number (except zero). There’s three numbers that, cubed, give us any real number (again except zero). Polynomials might have a whole bunch of numbers that make some equation true. Trig functions are worse. The tangent of 45 degrees equals 1. So does the tangent of 225 degrees. Also 405 degrees. Also -45 degrees. Also -585 degrees. OK, a mathematician would use radians instead of degrees, but that just changes what the numbers are. Not that there’s infinitely many of them.

It’s nice to have options. We don’t always want options. Sometimes we just want one blasted simple answer to things. It’s coded into the language. We say “the square root of four”. We speak of “the arctangent of 1”, which is to say, “the angle with tangent of 1”. We only say “all square roots of four” if we’re making a point about overlooking options.

If we’ve got a set of things, then we can pick out one of them. This is obvious, which means it is so very hard to prove. We just have to assume we can. Go ahead; assume we can. Our pick of the one thing out of this set is the “principal”. It’s not any more inherently right than the other possibilities. It’s just the one we choose to grab first.

So. The principal square root of four is positive two. The principal arctangent of 1 is 45 degrees, or in the dialect of mathematicians π divided by four. We pick these values over other possibilities because they’re nice. What makes them nice? Well, they’re nice. Um. Most of their numbers aren’t that big. They use positive numbers if we have a choice in the matter. Deep down we still suspect negative numbers of being up to something.

If nobody says otherwise then the principal square root is the positive one, or the one with a positive number in front of the imaginary part. If nobody says otherwise the principal arcsine is between -90 and +90 degrees (-π/2 and π/2). The principal arccosine is between 0 and 180 degrees (0 and π), unless someone says otherwise. The principal arctangent is … between -90 and 90 degrees, unless it’s between 0 and 180 degrees. You can count on the 0 to 90 part. Use your best judgement and roll with whatever develops for the other half of the range there. There’s not one answer that’s right for every possible case. The point of a principal value is to pick out one answer that’s usually a good starting point.

When you stare at what it means to be a function you realize that there’s a difference between the original function and the one that returns the principal value. The original function has a range that’s “sets of values”. The principal-value version has a range that’s just one value. If you’re being kind to your audience you make some note of that. Usually we note this by capitalizing the start of the function: “arcsin z” gives way to “Arcsin z”. “Log z” would be the principal-value version of “log z”. When you start pondering logarithms for negative numbers or for complex-valued numbers you get multiple values. It’s the same way that the arcsine function does.
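
Programming languages bake these conventions in, which makes for a quick demonstration. Python’s math and cmath modules return principal values for these multi-valued operations, so the little sketch below is really just showing off the convention; any surprise in the complex answers is the principal branch at work.

    import math
    import cmath

    # Principal square root: the non-negative one, or the one with positive imaginary part.
    print(math.sqrt(4))        # 2.0, not -2.0
    print(cmath.sqrt(-4))      # 2j, not -2j

    # Principal arctangent: the angle between -pi/2 and pi/2 with the right tangent.
    print(math.degrees(math.atan(1)))    # 45 degrees, not 225 or -585

    # Principal logarithm: the branch whose imaginary part lies in (-pi, pi].
    print(cmath.log(-1))       # roughly 3.14159j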

And it’s good to warn your audience which principal value you mean, especially for the arc-trigonometric-functions or logarithms. (I’ve never seen someone break the square root convention.) The principal value is about picking the most obvious and easy-to-work-with value out of a set of them. It’s just impossible to get everyone to agree on what the obvious is.

The End 2016 Mathematics A To Z: Local


Today’s is another of those words that means nearly what you would guess. There are still seven letters left, by the way, which haven’t had any requested terms. If you’d like something described please try asking.

Local.

Stops at every station, rather than just the main ones.

OK, I’ll take it seriously.

So a couple years ago I visited Niagara Falls, and I stepped into the river, just above the really big drop.

[Photo: a view, from the United States side, of the Niagara Falls, with a lot of falling water and somehow even more mist.]
Niagara Falls, demonstrating some locally unsafe waters to be in. Background: Canada (left), United States (right).

I didn’t have any plans to go over the falls, and didn’t, but I liked the thrill of claiming I had. I’m not crazy, though; I picked a spot I knew was safe to step in. It’s only in the retelling I went into the Niagara River just above the falls.

Because yes, there is surely danger in certain spots of the Niagara River. But there are also spots that are perfectly safe. And not isolated spots either. I wouldn’t have been less safe if I’d stepped into the river a few feet closer to the edge. Nor if I’d stepped in a few feet farther away. Where I stepped in was locally safe.

[Photo: speedy but not actually turbulent waters on the Niagara River, above the falls.]
The Niagara River, and some locally safe enough waters to be in. That’s not me in the picture; if you do know who it is, I have no way of challenging you. But it’s the area I stepped into and felt this lovely illicit thrill doing so.

Over in mathematics we do a lot of work on stuff that’s true or false depending on what some parameters are. We can look at bunches of those parameters, and they often look something like normal everyday space. There’s some values that are close to what we started from. There’s others that are far from that.

So, a “neighborhood” of some point is that point and some set of points containing it. It needs to be an “open” set, which means it doesn’t contain its boundary. So, like, everything less than one minute’s walk away, but not the stuff that’s precisely one minute’s walk away. (If we include boundaries we break stuff that we don’t want broken is why.) And certainly not the stuff more than one minute’s walk away. A neighborhood could have any shape. It’s easy to think of it as a little disc around the point you want. That’s usually the easiest to describe in a proof, because it’s “everything a distance less than (something) away”. (That “something” is either ‘δ’ or ‘ε’. Both Greek letters are called in to mean “a tiny distance”. They have different connotations about what the tiny distance is in.) It’s easiest to draw as a little amoeba-like blob around a point, and contained inside a bigger amoeba-like blob.

Anyway, something is true “locally” to a point if it’s true in that neighborhood. That means true for everything in that neighborhood. Which is what you’d expect. “Local” means just that. It’s the stuff that’s close to where we started out.

Often we would like to know something “globally”, which means … er … everywhere. Universally so. But it’s usually easier to prove a thing locally. I suppose having a point where we know something is so makes it easier to prove things about what’s nearby. Distant stuff, who knows?

“Local” serves as an adjective for many things. We think of a “local maximum”, for example, or “local minimum”. This is where whatever we’re studying has a value bigger (or smaller) than anywhere else nearby has. Or we speak of a function being “locally continuous”, meaning that we know it’s continuous near this point and we make no promises away from it. It might be “locally differentiable”, meaning we can take derivatives of it close to some interesting point. We say nothing about what happens far from it.

Unless we do. We can talk about something being “local to infinity”. Your first reaction to that should probably be to slap the table and declare that’s it, we’re done. But we can make it sensible, at least to other mathematicians. We do it by starting with a neighborhood that contains the origin, zero, that point in the middle of everything. So, what’s the complement of that? It’s everything that’s far enough away from the origin. (Don’t include the boundary, we don’t need those headaches.) So why not call that the “neighborhood of infinity”? Other than that it’s a weird set of words to put together? And if something is true in that “neighborhood of infinity”, what is that thing other than true “local to infinity”?

I don’t blame you for being skeptical.
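
Since “local maximum” is the version of local-ness most people meet first, here’s a tiny Python sketch of its discrete form: scan a list of sampled values and flag any point whose value beats both of its neighbors. The function name and the sample data are my own, and “nearby” here just means the immediately adjacent samples.

    def local_maxima(values):
        """Indices whose value is strictly bigger than both neighboring values."""
        return [i for i in range(1, len(values) - 1)
                if values[i] > values[i - 1] and values[i] > values[i + 1]]

    samples = [0, 2, 5, 3, 4, 7, 6, 1]
    print(local_maxima(samples))    # [2, 5]: the 5 and the 7 are local maxima; only the 7 is global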

The End 2016 Mathematics A To Z: Image


It’s another free-choice entry. I’ve got something that I can use to make my Friday easier.

Image.

So remember a while back I talked about what functions are? I described them the way modern mathematicians like. A function’s got three components to it. One is a set of things called the domain. Another is a set of things called the range. And there’s some rule linking things in the domain to things in the range. In shorthand we’ll write something like “f(x) = y”, where we know that x is in the domain and y is in the range. In a slightly more advanced mathematics class we’ll write f: x \rightarrow y . That maybe looks a little more computer-y. But I bet you can read that already: “f matches x to y”. Or maybe “f maps x to y”.

We have a couple ways to think about what ‘y’ is here. One is to say that ‘y’ is the image of ‘x’, under ‘f’. The language evokes camera trickery, or at least the way a trick lens might make us see something different. Pretend that the domain is something you could gaze at. If the domain is, say, some part of the real line, or a two-dimensional plane, or the like that’s not too hard to do. Then we can think of the rule part of ‘f’ as some distorting filter. When we look to where ‘x’ would be, we see the thing in the range we know as ‘y’.

At this point you probably imagine this is a pointless word to have. And that it’s backed up by a useless analogy. So it is. As far as I’ve gone this addresses a problem we don’t need to solve. If we want “the thing f matches x to” we can just say “f(x)”. Well, we write “f(x)”. We say “f of x”. Maybe “f at x”, or “f evaluated at x” if we want to emphasize ‘f’ more than ‘x’ or ‘f(x)’.

Where it gets useful is that we start looking at subsets. Bunches of points, not just one. Call ‘D’ some interesting-looking subset of the domain. What would it mean if we wrote the expression ‘f(D)’? Could we make that meaningful?

We do mean something by it. We mean what you might imagine by it. If you haven’t thought about what ‘f(D)’ might mean, take a moment — a short moment — and guess what it might. Don’t overthink it and you’ll have it right. I’ll put the answer just after this little bit so you can ponder.

[Photo: close-up view of a Flemish Giant rabbit looking at you from the corner of his eye.]
Our pet rabbit on the beach in Omena, Michigan back in July this year. Which is a small town on the Traverse Bay, which is just off Lake Michigan where … oh, you have Google Maps, you don’t need me. Anyway we wondered what he would make of vast expanses of water, considering he doesn’t like water what with being a rabbit and all that. And he watched it for a while and then shuffled his way in to where the waves come up and could wash over his front legs, making us wonder what kind of crazy rabbit he is, exactly.

So. ‘f(D)’ is a set. We make that set by taking, in turn, every single thing that’s in ‘D’. And find everything in the range that’s matched by ‘f’ to those things in ‘D’. Collect them all together. This set, ‘f(D)’, is “the image of D under f”.
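
If it helps to see it in code, the image of a set under a function is nearly a one-liner in Python; a set comprehension says “collect f(x) for every x in D” almost word for word. The particular f and D below are my own toy choices.

    def image(f, D):
        """The image of the set D under the function f."""
        return {f(x) for x in D}

    f = lambda x: x * x
    D = {-2, -1, 0, 1, 2}
    print(image(f, D))    # {0, 1, 4}: different inputs may land on the same output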

We use images a lot when we’re studying how functions work. A function that maps a simple lump into a simple lump of about the same size is one thing. A function that maps a simple lump into a cloud of disparate particles is a very different thing. A function that describes how physical systems evolve will preserve the volume and some other properties of these lumps of space. But it might stretch out and twist around that space, which is how we discovered chaos.

Properly speaking, the range of a function ‘f’ is just the image of the whole domain under that ‘f’. But we’re not usually that careful about defining ranges. We’ll say something like ‘the domain and range are the sets of real numbers’ even though we only need the positive real numbers in the range. Well, it’s not like we’re paying for unnecessary range. Let me call the whole domain ‘X’, because I went and used ‘D’ earlier. Then the range, let me call that ‘Y’, would be ‘Y = f(X)’.

Images will turn up again. They’re a handy way to let us get at some useful ideas.

The End 2016 Mathematics A To Z: The Fredholm Alternative


Some things are created with magnificent names. My essay today is about one of them. It’s one of my favorite terms and I get a strange little delight whenever it needs to be mentioned in a proof. It’s also the title I shall use for my 1970s Paranoid-Conspiracy Thriller.

The Fredholm Alternative.

So the Fredholm Alternative is about whether this supercomputer with the ability to monitor every commercial transaction in the country falls into the hands of the Parallax Corporation or whether — ahm. Sorry. Wrong one. OK.

The Fredholm Alternative comes from the world of functional analysis. In functional analysis we study sets of functions with tools from elsewhere in mathematics. Some you’d be surprised aren’t already in there. There’s adding functions together, multiplying them, the stuff of arithmetic. Some might be a bit surprising, like the stuff we draw from linear algebra. That’s ideas like functions having length, or being at angles to each other. Or that length and those angles changing when we take a function of those functions. This may sound baffling. But a mathematics student who’s got into functional analysis usually has a happy surprise waiting. She discovers the subject is easy. At least, it relies on a lot of stuff she’s learned already, applied to stuff that’s less difficult to work with than, like, numbers.

(This may be a personal bias. I found functional analysis a thoroughgoing delight, even though I didn’t specialize in it. But I got the impression from other grad students that functional analysis was well-liked. Maybe we just got the right instructor for it.)

I’ve mentioned in passing “operators”. These are functions that have a domain that’s a set of functions and a range that’s another set of functions. Suppose you come up to me with some function, let’s say f(x) = x^2 . I give you back some other function — say, F(x) = \frac{1}{3}x^3 - 4 . Then I’m acting as an operator.

Why should I do such a thing? Many operators correspond to doing interesting stuff. Taking derivatives of functions, for example. Or undoing the work of taking a derivative. Describing how changing a condition changes what sorts of outcomes a process has. We do a lot of stuff with these. Trust me.
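
In code terms an operator is just a function that takes a function and hands back another function, which Python can express directly. Here’s a small sketch of one, a numerical take-the-derivative operator; the step size and the names are mine, and it only illustrates the idea of an operator, not anything particular to Fredholm’s theory.

    def derivative_operator(f, h=1e-6):
        """An operator: take the function f, return (approximately) its derivative."""
        return lambda x: (f(x + h) - f(x - h)) / (2 * h)

    f = lambda x: x ** 2
    Tf = derivative_operator(f)    # a brand-new function, roughly x -> 2x
    print(Tf(3.0))                 # close to 6.0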

Let me use the name ‘T’ for some operator. I’m not going to say anything about what it does. The letter’s arbitrary. We like to use capital letters for operators because it makes the operators look extra important. And we don’t want to use ‘O’ because that just looks like zero and we don’t need that confusion.

Anyway. We need two functions. One of them will be called ‘f’ because we always call functions ‘f’. The other we’ll call ‘v’. In setting up the Fredholm Alternative we have this important thing: we know what ‘f’ is. We don’t know what ‘v’ is. We’re finding out something about what ‘v’ might be. The operator doing whatever it does to a function we write down as if it were multiplication, that is, like ‘Tv’. We get this notation from linear algebra. There we multiply matrices by vectors. Matrix-times-vector multiplication works like operator-on-a-function stuff. So much so that if we didn’t use the same notation young mathematics grad students would rise in rebellion. “This is absurd,” they would say, in unison. “The connotations of these processes are too alike not to use the same notation!” And the department chair would admit they have a point. So we write ‘Tv’.

If you skipped out on mathematics after high school you might guess we’d write ‘T(v)’ and that would make sense too. And, actually, we do sometimes. But by the time we’re doing a lot of functional analysis we don’t need the parentheses so much. They don’t clarify anything we’re confused about, and they require all the work of parenthesis-making. But I do see it sometimes, mostly in older books. This makes me think mathematicians started out with ‘T(v)’ and then wrote less as people got used to what they were doing.

I admit we might not literally know what ‘f’ is. I mean we know what ‘f’ is in the same way that, for a quadratic equation, “ax^2 + bx + c = 0”, we “know” what ‘a’, ‘b’, and ‘c’ are. Similarly we don’t know what ‘v’ is in the same way we don’t know what ‘x’ there is. The Fredholm Alternative tells us exactly one of these two things has to be true:

For operators that meet some requirements I don’t feel like getting into, either:

  1. There’s one and only one ‘v’ which makes the equation Tv  = f true.
  2. Or else Tv = 0 for some ‘v’ that isn’t just zero everywhere.

That is, either there’s exactly one solution, or else there’s no solving this particular equation. We can rule out there being two solutions (the way quadratic equations often have), or ten solutions (the way some annoying problems will), or infinitely many solutions (oh, it happens).
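
The finite-dimensional shadow of this is ordinary linear algebra, which makes for a quick numerical sketch. For a square matrix T, either Tv = f has exactly one solution for every f, or Tv = 0 has a solution that isn’t all zeroes (and then Tv = f may have no solution at all). This is only an analogue of the operator statement, with numpy standing in for the functional analysis, and the matrices are made-up examples.

    import numpy as np

    def which_branch(T):
        """Report which branch of the (finite-dimensional) alternative a square matrix falls into."""
        if np.linalg.matrix_rank(T) == T.shape[0]:
            return "Tv = f has one and only one solution, whatever f is"
        return "Tv = 0 has a nonzero solution, and Tv = f can fail to have any"

    invertible = np.array([[2.0, 1.0],
                           [1.0, 3.0]])
    singular   = np.array([[1.0, 2.0],
                           [2.0, 4.0]])    # the second row is twice the first

    print(which_branch(invertible))
    print(which_branch(singular))
    print(singular @ np.array([2.0, -1.0]))    # [0. 0.]: a nonzero v with Tv = 0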

It turns up often in boundary value problems. Often before we try solving one we spend some time working out whether there is a solution. You can imagine why it’s worth spending a little time working that out before committing to a big equation-solving project. But it comes up elsewhere. Very often we have problems that, at their core, are “does this operator match anything at all in the domain to a particular function in the range?” When we try to answer we stumble across Fredholm’s Alternative over and over.

Fredholm here was Ivar Fredholm, a Swedish mathematician of the late 19th and early 20th centuries. He worked for Uppsala University, and for the Swedish Social Insurance Agency, and as an actuary for the Skandia insurance company. Wikipedia tells me that his mathematical work was used to calculate buyback prices. I have no idea how.

Theorem Thursday: One Mean Value Theorem Of Many


For this week I have something I want to follow up on. We’ll see if I make it that far.

The Mean Value Theorem.

My subject line disagrees with the header just above here. I want to talk about the Mean Value Theorem. It’s one of those things that turns up in freshman calculus and then again in Analysis. It’s introduced as “the” Mean Value Theorem. But like many things in calculus it comes in several forms. So I figure to talk about one of them here, and another form in a while, when I’ve had time to make up drawings.

Calculus can split effortlessly into two kinds of things. One is differential calculus. This is the study of continuity and smoothness. It studies how a quantity changes if something affecting it changes. It tells us how to optimize things. It tells us how to approximate complicated functions with simpler ones. Usually polynomials. It leads us to differential equations, problems in which the rate at which something changes depends on what value the thing has.

The other kind is integral calculus. This is the study of shapes and areas. It studies how infinitely many things, all infinitely small, add together. It tells us what the net change in things are. It tells us how to go from information about every point in a volume to information about the whole volume.

They aren’t really separate. Each kind informs the other, and gives us tools to use in studying the other. And they are almost mirrors of one another. Differentials and integrals are not quite inverses, but they come quite close. And as a result most of the important stuff you learn in differential calculus has an echo in integral calculus. The Mean Value Theorem is among them.

The Mean Value Theorem is a rule about functions. In this case it’s functions with a domain that’s an interval of the real numbers. I’ll use ‘a’ as the name for the smallest number in the domain and ‘b’ as the largest number. People talking about the Mean Value Theorem often do. The range is also the real numbers, although it doesn’t matter which ones.

I’ll call the function ‘f’ in accord with a longrunning tradition of not working too hard to name functions. What does matter is that ‘f’ is continuous on the interval [a, b]. I’ve described what ‘continuous’ means before. It means that here too.

And we need one more thing. The function f has to be differentiable on the interval (a, b). You maybe noticed that before I wrote [a, b], and here I just wrote (a, b). There’s a difference here. We need the function to be continuous on the “closed” interval [a, b]. That is, it’s got to be continuous for ‘a’, for ‘b’, and for every point in-between.

But we only need the function to be differentiable on the “open” interval (a, b). That is, it’s got to be differentiable for all the points in-between ‘a’ and ‘b’. If it happens to be differentiable for ‘a’, or for ‘b’, or for both, that’s great. But we won’t turn away a function f for not being differentiable at those points. Only the interior. That sort of distinction between stuff true on the interior and stuff true on the boundaries is common. This is why mathematicians have words for “including the boundaries” (“closed”) and “never minding the boundaries” (“open”).

As to what “differentiable” is … A function is differentiable at a point if you can take its derivative at that point. I’m sure that clears everything up. There are many ways to describe what differentiability is. One that’s not too bad is to imagine zooming way in on the curve representing a function. If you start with a big old wobbly function it waves all around. But pick a point. Zoom in on that. Does the function stay all wobbly, or does it get more steady, more straight? Keep zooming in. Does it get even straighter still? If you zoomed in over and over again on the curve at some point, would it look almost exactly like a straight line?

If it does, then the function is differentiable at that point. It has a derivative there. The derivative’s value is whatever the slope of that line is. The slope is that thing you remember from taking Boring Algebra in high school. That rise-over-run thing. But this derivative is a great thing to know. You could approximate the original function with a straight line, with slope equal to that derivative. Close to that point, you’ll make a small enough error nobody has to worry about it.

That there will be this straight line approximation isn’t true for every function. Here’s an example. Picture a line that goes up and then takes a 90-degree turn to go back down again. Look at the corner. However close you zoom in on the corner, there’s going to be a corner. It’s never going to look like a straight line; there’s a 90-degree angle there. It can be a smaller angle if you like, but any sort of corner breaks this differentiability. This is a point where the function isn’t differentiable.

There are functions that are nothing but corners. They can be differentiable nowhere, or only at a tiny set of points that can be ignored. (A set of measure zero, as the dialect would put it.) Mathematicians discovered this over the course of the 19th century. They got into some good arguments about how that can even make sense. It can get worse. Also found in the 19th century were functions that are continuous only at a single point. This smashes just about everyone’s intuition. But we can’t find a definition of continuity that’s as useful as the one we use now and avoids that problem. So we accept that it implies some pathological conclusions and carry on as best we can.

Now I get to the Mean Value Theorem in its differential calculus pelage. It starts with the endpoints, ‘a’ and ‘b’, and the values of the function at those points, ‘f(a)’ and ‘f(b)’. And from here it’s easiest to figure what’s going on if you imagine the plot of a generic function f. I recommend drawing one. Just make sure you draw it without lifting the pen from paper, and without including any corners anywhere. Something wiggly.

Draw the line that connects the ends of the wiggly graph. Formally, we’re adding the line segment that connects the points with coordinates (a, f(a)) and (b, f(b)). That’s coordinate pairs, not intervals. That’s clear in the minds of the mathematicians who don’t see why not to use parentheses over and over like this. (We are short on good grouping symbols like parentheses and brackets and braces.)

Per the Mean Value Theorem, there is at least one point where the derivative of the function is the same as the slope of that line segment. If you were to slide the line up or down, without changing its orientation, you’d find something wonderful. Most of the time this line intersects the curve, crossing from above to below or vice-versa. But there’ll be at least one point where the shifted line is “tangent”, where it just touches the original curve. Close to that touching point, the “tangent point”, the shifted line and the curve blend together and can’t be easily told apart. As long as the function is differentiable on the open interval (a, b), and continuous on the closed interval [a, b], this will be true. You might convince yourself of it by drawing a couple of curves and taking a straightedge to the results.
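
Here’s a small Python sketch that hunts for such a point for one particular function. Take f(x) = x^3 on the interval [0, 2]: the connecting segment has slope (8 - 0)/2 = 4, and the theorem promises some c in (0, 2) where f'(c) = 4. Bisection on f'(c) - 4 finds it. The setup is my own example; the theorem only promises such a c exists, it doesn’t say how to find one.

    def f(x):
        return x ** 3

    def f_prime(x):
        return 3 * x ** 2

    a, b = 0.0, 2.0
    secant_slope = (f(b) - f(a)) / (b - a)    # 4.0 for this example

    def g(c):
        """Negative or positive depending on whether f'(c) is below or above the secant slope."""
        return f_prime(c) - secant_slope

    lo, hi = a, b                 # g(lo) < 0 < g(hi), so a sign change sits in between
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid

    c = (lo + hi) / 2
    print(c, f_prime(c))          # about 1.1547, with f'(c) essentially 4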

This is an existence theorem. Like the Intermediate Value Theorem, it doesn’t tell us which point, or points, make the thing we’re interested in true. It just promises us that there is some point that does it. So it gets used in other proofs. It lets us mix information about intervals and information about points.
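
If you'd like to see the theorem in action rather than take a straightedge's word for it, a little computation helps. Here's a minimal Python sketch, with a smooth function and an interval I've made up for the purpose; it scans a fine grid for the point whose estimated derivative comes closest to the secant slope.

```python
import math

def f(x):
    return math.sin(x) + 0.3 * x**2   # an arbitrary smooth function, picked for the example

a, b = 0.0, 2.0
secant_slope = (f(b) - f(a)) / (b - a)

def derivative(x, h=1e-6):
    # central-difference estimate of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# Scan a fine grid of interior points for the one whose derivative
# comes closest to the secant slope.
grid = [a + (b - a) * k / 10000 for k in range(1, 10000)]
c = min(grid, key=lambda x: abs(derivative(x) - secant_slope))
print(c, derivative(c), secant_slope)
```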

It’s tempting to try using it numerically. It looks as if it justifies a common differential-calculus trick. Suppose we want to know the value of the derivative at a point. We could pick a little interval around that point and find the endpoints. And then find the slope of the line segment connecting the endpoints. And won’t that be close enough to the derivative at the point we care about?

Well. Um. No, we really can’t be sure about that. We don’t have any idea what interval might make the derivative of the point we care about equal to this line-segment slope. The Mean Value Theorem won’t tell us. It won’t even tell us if there exists an interval that would let that trick work. We can’t invoke the Mean Value Theorem to let us get away with that.

Often, though, we can get away with it. Differentiable functions do have to follow some rules. Among them is that if you do pick a small enough interval then approximations that look like this will work all right. If the function flutters around a lot, we need a smaller interval. But a lot of the functions we’re interested in don’t flutter around that much. So we can get away with it. And there’s some grounds to trust in getting away with it. The Mean Value Theorem isn’t any part of the grounds. It just looks so much like it ought to be.
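
To make plain what trick I mean, here's a small Python sketch of it. The function and the point are arbitrary choices for illustration; the interesting part is watching the secant slope settle down as the interval shrinks.

```python
import math

def f(x):
    return math.exp(-x) * math.cos(x)   # a stand-in function; any smooth one will do

def secant_slope(f, x0, h):
    # slope of the line segment over the little interval [x0 - h, x0 + h]
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

x0 = 1.0
for h in (0.1, 0.01, 0.001):
    print(h, secant_slope(f, x0, h))
# The slopes settle down as h shrinks, for a tame function like this one,
# even though the Mean Value Theorem itself never promised they would.
```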

I hope on a later Thursday to look at an integral-calculus form of the Mean Value Theorem.

A Leap Day 2016 Mathematics A To Z: X-Intercept


Oh, x- and y-, why are you so poor in mathematics terms? I brave my way.

X-Intercept.

I did not get much out of my eighth-grade, pre-algebra, class. I didn’t connect with the teacher at all. There were a few little bits to get through my disinterest. One came in graphing. Not graph theory, of course, but the graphing we do in middle school and high school. That’s where we find points on the plane with coordinates that make some expression true. Two major terms kept coming up in drawing curves of lines. They’re the x-intercept and the y-intercept. They had this lovely, faintly technical, faintly science-y sound. I think the teacher emphasized a few times they were “intercepts”, not “intersects”. But it’s hard to explain to an eighth-grader why this is an important difference to make. I’m not sure I could explain it to myself.

An x-intercept is a point where the plot of a curve and the x-axis meet. So we’re assuming this is a Cartesian coordinate system, the kind marked off with a pair of lines meeting at right angles. It’s usually two-dimensional, sometimes three-dimensional. I don’t know anyone who’s worried about the x-intercept for a four-dimensional space. Even higher dimensions are right out. The thing that confused me the most, when learning this, is a small one. The x-axis is points that have a y-coordinate of zero. Not an x-coordinate of zero. So in a two-dimensional space it makes sense to describe the x-intercept as a single value. That’ll be the x-coordinate, and the point with the x-coordinate of that and the y-coordinate of zero is the intercept.

If you have an expression and you want to find an x-intercept, you need to find values of x which make the expression equal to zero. We get the idea from studying lines. There are a couple of typical representations of lines. They almost always use x for the horizontal coordinate, and y for the vertical coordinate. The names are only different if the author is making a point about the arbitrariness of variable names. Sigh at such an author and move on. An x-intercept has a y-coordinate of zero, so, set any appearance of ‘y’ in the expression equal to zero and find out what value or values of x make this true. If the expression is an equation for a line there’ll be just the one point, unless the line is horizontal. (If the line is horizontal, then either every point on the x-axis is an intercept, or else none of them are. The line is either “y equals zero”, or it is “y equals something other than zero”. )
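
If you wanted a computer to do this, it amounts to handing the expression to a root-finder. Here's a minimal Python sketch using bisection; the curve is a made-up example, and the method needs an interval on which the expression changes sign.

```python
def f(x):
    # a made-up example curve, y = x^3 - 2x - 5
    return x**3 - 2 * x - 5

def bisect(f, lo, hi, tol=1e-10):
    # hunt for a zero of f between lo and hi; needs a sign change to start
    assert f(lo) * f(hi) < 0, "need a bracket where f changes sign"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

print(bisect(f, 2, 3))   # about 2.0946, the lone real x-intercept of this curve
```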

There’s also a y-intercept. It is exactly what you’d imagine once you know that. It’s usually easier to find what the y-intercept is. The equation describing a curve is typically written in the form “y = f(x)”. That is, y is by itself on one side, and some complicated expression involving x’s is on the other. Working out what y is for a given x is straightforward. Working out what x is for a given y is … not hard, for a line. For more complicated shapes it can be difficult. There might not be a unique answer. That’s all right. There may be several x-intercepts.

There are a couple names for the x-intercepts. The one that turns up most often away from the pre-algebra and high school algebra study of lines is a “zero”. It’s one of those bits in which mathematicians seem to be trying to make it hard for students. A “zero” of the function f(x) is generally not what you get when you evaluate it for x equalling zero. Sorry about that. It’s the values of x for which f(x) equals zero. We also call them “roots”.

OK, but who cares?

Well, if you want to understand the shape of a curve, the way a function looks, it helps to plot it. Today, yeah, pull up Mathematica or Matlab or Octave or some other program and you get your plot. Fair enough. If you don’t have a computer that can plot like that, the way I did in middle school, you have to do it by hand. And then the intercepts are clues to how to sketch the function. They are, relatively, easy points which you can find, and which you know must be on the curve. We may form a very rough sketch of the curve. But that rough picture may be better than having nothing.

And we can learn about the behavior of functions even without plotting, or sketching a plot. Intercepts of expressions, or of parts of expressions, are points where the value might change from positive to negative. If the denominator of a part of the expression has an x-intercept, this could be a point where the function’s value is undefined. It may be a discontinuity in the function. The function’s values might jump wildly between one side and another. These are often the important things about understanding functions. Where are they positive? Where are they negative? Where are they continuous? Where are they not?

These are things we often want to know about functions. And we learn many of them by looking for the intercepts, x- and y-.

A Leap Day 2016 Mathematics A To Z: Surjective Map


Gaurish today gives me one more request for the Leap Day Mathematics A To Z. And it lets me step away from abstract algebra again, into the world of analysis and what makes functions work. It also hovers around some of my past talk about functions.

Surjective Map.

This request echoes one of the first terms from my Summer 2015 Mathematics A To Z. Then I’d spent some time on a bijection, or a bijective map. A surjective map is a less complicated concept. But if you understood bijective maps, you picked up surjective maps along the way.

By “map”, in this context, mathematicians don’t mean those diagrams that tell you where things are and how you might get there. Of course we don’t. By a “map” we mean that we have some rule that matches things in one set to things in another. If this sounds to you like what I’ve claimed a function is then you have a good ear. A mapping and a function are pretty much different names for one another. If there’s a difference in connotation I suppose it’s that a “mapping” makes a weaker suggestion that we’re necessarily talking about numbers.

(In some areas of mathematics, a mapping means a function with some extra properties, often some kind of continuity. Don’t worry about that. Someone will tell you when you’re doing mathematics deep enough to need this care. Mind, that person will tell you by way of a snarky follow-up comment picking on some minor point. It’s nothing personal. They just want you to appreciate that they’re very smart.)

So a function, or a mapping, has three parts. One is a set called the domain. One is a set called the range. And then there’s a rule matching things in the domain to things in the range. With functions we’re so used to the domain and range being the real numbers that we often forget to mention those parts. We go on thinking “the function” is just “the rule”. But the function is all three of these pieces.

A function has to match everything in the domain to something in the range. That’s by definition. There are no unused scraps in the domain. If it looks like there are, that’s because we’re being sloppy in defining the domain. Or let’s be charitable. We assumed the reader understands the domain is only the set of things that make sense. And things make sense by being matched to something in the range.

Ah, but now, the range. The range could have unused bits in it. There’s nothing that inherently limits the range to “things matched by the rule to some thing in the domain”.

By now, then, you’ve probably spotted there have to be two kinds of functions. There’s the kind in which the whole range is used, and the kind in which it’s not. Good eye. This is exactly so.

If a function only uses part of the range, if it leaves out anything, even if it’s just a single value out of infinitely many, then the function is called an “into” mapping. If you like, it takes the domain and stuffs it into the range without filling the range.

Ah, but if a function uses every scrap of the range, with nothing left out, then we have an “onto” mapping. The whole of the domain gets sent onto the whole of the range. And this is also known as a “surjective” mapping. We get the term “surjective” from Nicolas Bourbaki. Bourbaki is/was the renowned 20th century mathematics art-collective group which did so much to place rigor and intuition-free bases into mathematics.

The term pairs up with the “injective” mapping. In this, each element in the range matches up with at most one thing in the domain. So if you know the function’s rule, then if you know a thing in the range that gets matched at all, you also know the one and only thing in the domain matched to it. If you don’t feel very French, you might call this sort of function one-to-one. That might be a better name for saying why this kind of function is interesting.

Not every function is injective. But then not every function is surjective either. But if a function is both injective and surjective — if it’s both one-to-one and onto — then we have a bijection. It’s a mapping that can represent the way a system changes and that we know how to undo. That’s pretty comforting stuff.
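
For small finite sets you can check all of these properties by brute force, which may make the definitions feel less abstract. A sketch in Python, with a made-up domain, range, and rule:

```python
def is_surjective(rule, domain, rng):
    # onto: every element of the range gets matched by something in the domain
    return {rule(x) for x in domain} == set(rng)

def is_injective(rule, domain):
    # one-to-one: nothing in the range gets matched by two different domain elements
    images = [rule(x) for x in domain]
    return len(images) == len(set(images))

domain = {0, 1, 2, 3}
rng = {0, 1}
print(is_surjective(lambda x: x % 2, domain, rng))   # True: both 0 and 1 get hit
print(is_injective(lambda x: x % 2, domain))         # False: 0 and 2 both go to 0
```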

If we use a mapping to describe how a process changes a system, then knowing whether it’s a surjective map tells us something about the process. If it isn’t surjective, if it’s only an into map, then the process makes the system settle into a proper subset of all the possible states. That doesn’t mean the thing is stable — that little jolts get worn down. And it doesn’t mean that the thing is settling to a fixed state. But it is a piece of information suggesting that’s possible. This may not seem like a strong conclusion. But considering how little we know about the function it’s impressive to be able to say that much.

The Set Tour, Part 13: Continuity


I hope we’re all comfortable with the idea of looking at sets of functions. If not we can maybe get comfortable soon. What’s important about functions is that we can add them together, and we can multiply them by real numbers. They work in important ways like regular old numbers would. They also work the way vectors do. So all we have to do is be comfortable with vectors. Then we have the background to talk about functions this way. And so, my first example of an oft-used set of functions:

C[a, b]

People like continuity. It’s comfortable. It’s reassuring, even. Most situations, most days, most things are pretty much like they were before, and that’s how we want it. Oh, we cast some hosannas towards the people who disrupt the steady progression of stuff. But we’re lying. Think of the worst days of your life. They were the ones that were very much not like the day before. If the day is discontinuous enough, then afterwards, people ask one another what they were doing when the discontinuous thing happened.

(OK, there are some good days which are very much not like the day before. But imagine someone who seems informed assures you that tomorrow will completely change your world. Do you feel anticipation or dread?)

Mathematical continuity isn’t so fraught with social implications. What we mean by a continuous function is — well, skip the precise definition. Calculus I students see it, stare at it, and run away. It comes back to the mathematics majors in Intro to Real Analysis. Then it comes back again in Real Analysis. Mathematics majors get to accepting it sometime around Real Analysis II, because the alternative is Functional Analysis. The definition’s in truth not so bad. But it’s fussy and if you get any parts wrong silly consequences follow.

If you’re not a mathematics major, or if you’re a mathematics major not taking a test in Real Analysis, you can get away with this. We’re talking here, and we’re going to keep talking, about functions with real numbers as the domain and real numbers as the range. Later, we can go to complex-valued numbers, or even vectors of numbers. The arguments get a bit longer but don’t change much, so if you learn this you’ve got most of the way to learning everything.

A continuous function is one whose graph you can draw without having to lift your pen. We like continuous functions, mathematically, because they are so much easier to work with. Why are they easy? Well, because if you know the value of your function at one point, you know approximately what it is at nearby points. There’s predictability to the function’s values. You can see why this would make it easier to do calculations. But it makes analysis easy too. We want to do a lot of proofs which involve arithmetic with the values functions have. It gets so much easier that we can say the function’s actual value is something like the value it has at some point we happen to know.

So if we want to work with functions, we usually want to work with continuous functions. They behave more predictably, and more like we hope they will.

The set C[a, b] is the set of all continuous real-valued functions whose domain is the set of real numbers from a to b. For example, pick a function that’s in C[-1, 1]. Let me call it f. Then f is a real-valued function. And its domain is the real numbers from -1 to 1. In the absence of other information about what its range is, we assume it to be the real numbers R. We can have any real numbers as the boundaries; C[-1000, π] is legitimate if eccentric.

There are some domains that are particularly popular. All the real numbers is one. That might get written C(R) for shorthand. C[0, 1], the interval from 0 to 1, is popular and easy to work with. C[-1, 1] is almost as good and has the advantage of giving us negative numbers. C[-π, π] is also liked because it meshes well with the trigonometric functions. You remember those: sines and cosines and tangent functions, plus some unpopular ones we try to not talk about. We don’t often talk about other intervals. We can change, say, C[0, 1] into C[0, 10] exactly the way you’d imagine. Re-scaling numbers, and even shifting them up or down some, requires so little work we don’t bother doing it.
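
Since the text is so breezy about that re-scaling, here's what it amounts to, as a small Python sketch with a made-up example function. The trick is to squeeze the new interval back onto [0, 1] before applying the old rule.

```python
def rescale(f, a, b):
    # turn a function with domain [0, 1] into the matching one with domain [a, b]
    return lambda x: f((x - a) / (b - a))

f = lambda t: t * (1 - t)   # a made-up function in C[0, 1]
g = rescale(f, 0, 10)       # its counterpart in C[0, 10]
print(f(0.5), g(5.0))       # the same value at corresponding points: 0.25 and 0.25
```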

C[-1, 1] is a different set of functions from, say, C[0, 1]. There are many functions in one set that have the same rule as a function in another set. But the functions in C[-1, 1] have a different domain from the functions in C[0, 1]. So they can’t be the same functions. The rule might be meaningful outside the domain. If the rule is “f:x -> 3*x”, well, that makes sense whatever x should be. But a function is the rule, the domain, and the range together. If any of the parts changes, we have a different function.

The way I’ve written the symbols, with straight brackets [a, b], means that both the numbers a and b are in the domain of these functions. If I want to omit the boundaries — have every number greater than a but not a itself, and have every number less than b but not b itself — then we change to parentheses. That would be C(-1, 1). If I want to include one boundary but not the other, use a straight bracket for the boundary to include, and a parenthesis for the boundary to omit. C[-1, 1) says functions in that set have a domain that includes -1 but does not include 1. It also drives my text editor crazy having unmatched parentheses and brackets like that. We must suffer for our mathematical arts.

The Set Tour, Part 12: What Can You Do With Functions?


I want to resume my tour of sets that turn up a lot as domains and ranges. But I need to spend some time explaining stuff before the next bunch. I want to talk about things that aren’t so familiar as “numbers” or “shapes”. We get into more abstract things.

We have to start out with functions. Functions are built of three parts, a set that’s the domain, a set that’s the range, and a rule that matches things in the domain to things in the range. But what’s a set? Sets are bunches of things. (If we want to avoid logical chaos we have to be more exact. But we’re not going near the zones of logical chaos. So we’re all right going with “sets are bunches of things”. WARNING: do not try to pass this off at your thesis defense.)

So if a function is a thing, can’t we have a set that’s made up of functions? Sure, why not? We can get a set by describing the collection of things we want in it. At least if we aren’t doing anything weird. (See above warning.)

Let’s pick out a set of functions. Put together a group of functions that all have the same set as their domain, and that have compatible sets as their range. The real numbers are a good pick for a domain. They’re also good for a range.

Is this an interesting set? Generally, a set is boring unless we can do something with the stuff in it. That something is, almost always, taking a pair of the things in the set and relating it to something new. Whole numbers, for example, would be trivia if we weren’t able to add them together. Real numbers would be a complicated pile of digits if we couldn’t multiply them together. Having things is nice. Doing stuff with things is all that’s meaningful.

So what can we do with a couple of functions, if they have the same domains and ranges? Let’s pick one out. Give it the name ‘f’. That’s a common name for functions. It was given to us by Leonhard Euler, who was brilliant in every field of mathematics, including in creating notation. Now let’s pick out a function again. Give this new one the name ‘g’. That’s a common name for functions, given to us by every mathematician who needed something besides ‘f’. (There are alternatives. One is to start using subscripts, like f1 and f2. That’s too hard for me to type. Another is to use different typefaces. Again, too hard for me. Another is to use lower- and upper-case letters, ‘f’ and ‘F’. Using alternate-case forms usually connotes that these two functions are related in some way. I don’t want to suggest that they are related here. So, ‘g’ it is.)

We can do some obvious things. We can add them together. We can create a new function, imaginatively named `f + g’. It’ll have the same domain and the same range as f and g did. What rule defines how it matches things in the domain to things in the range?

Mathematicians throw the term “obvious” around a lot. Also “intuitive”. What they mean is “what makes sense to me but I don’t want to write it down”. Saying that is fine if your mathematician friend knows roughly what you’d think makes sense. It can be catastrophic if she’s much smarter than you, or thinks in weird ways, and is always surprised other people don’t think like her. It’s hard to better describe it than “obvious”, though. Well, here goes.

Let me pick something that’s in the domain of both f and g. I’m going to call that x, which mathematicians have been doing ever since René Descartes gave us the idea. So “f(x)” is something in the range of f, and “g(x)” is something in the range of g. I said, way up earlier, that both of these ranges are the same set and suggested the real numbers there. That is, f(x) is some real number and I don’t care which just now. g(x) is also some real number and again I don’t care right now just which.

The function we call “f + g” matches the thing x, in the domain, to something in the range. What thing? The number f(x) + g(x). I told you, I can’t see any fair way to describe that besides being “obvious” and “intuitive”.

Another thing we’ll want to do is multiply a function by a real number. Suppose we have a function f, just like above. Give me a real number. We’ll call that real number ‘a’ because I don’t remember if you can do the alpha symbol easily on web pages. Anyway, we can define a function, `af’, the multiplication of the real number a by the function f. It has the same domain as f, and the same range as f. What’s its rule?

Let me say x is something in the domain of f. So f(x) is some real number. Then the new function `af’ matches the x in the domain with a real number. That number is what you get by multiplying `a’ by whatever `f(x)’ is. So there are major parts of your mathematician friend from college’s classes that you could have followed without trouble.

(Her class would have covered many more things, mind you, and covered these more cryptically.)
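
Since the rules really are that simple, they fit in a few lines of code. Here's a Python sketch, with made-up functions standing in for f and g:

```python
def add(f, g):
    # the rule for f + g: match x to f(x) + g(x)
    return lambda x: f(x) + g(x)

def scale(a, f):
    # the rule for af: match x to a times f(x)
    return lambda x: a * f(x)

f = lambda x: x**2       # a made-up example function
g = lambda x: 3 * x + 1  # another one
h = add(f, g)            # the function f + g
k = scale(2.5, f)        # the function 2.5 f

print(h(2))   # 4 + 7 = 11
print(k(2))   # 2.5 * 4 = 10.0
```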

There’s more stuff we would like to do with functions. But for now, this is enough. This lets us turn a set of functions into a “vector space”. Vector spaces are kinds of things that work, at least a bit, like arithmetic. And mathematicians have studied these kinds of things. We have a lot of potent tools that work on vector spaces. So mathematicians develop a habit of finding vector spaces in what they study.

And I’m subject to that too. This is why I’ve spent such time talking about what we can do with functions rather than naming particular sets. I’ll pick up from that.

The Set Tour, Part 11: Doughnuts And Lots Of Them


I’ve been slow getting back to my tour of commonly-used domains for several reasons. It’s been a busy season. It’s so much easier to plan out writing something than it is to write something. The usual. But one of my excuses this time is that I’m not sure the set I want to talk about is that common. But I like it, and I imagine a lot of people will like it. So that’s enough.

T and Tn

T stands for the torus. Or the toroid, if you prefer. It’s a fun name. You know the shape. It’s a doughnut. Take a cylindrical tube and curl it around back on itself. Don’t rip it or fold it. That’s hard to do with paper or a sheet of clay or other real-world stuff. But we can imagine it easily enough. I suppose we can make a computer animation of it, if by ‘we’ we mean ‘you’.

We don’t use the whole doughnut shape for T. And no, we don’t use the hole either. What we use is the surface of the doughnut, the part that could get glazed. We ignore the inside, just the same way we had S represent the surface of a sphere (or the edge of a circle, or the boundary of a hypersphere). If there is a common symbol for the torus including the interior I don’t know it. I’d be glad to hear if someone had.

What good is the surface of a torus, though? Well, it’s a neat shape. Slice it in one direction, the way you’d cut a bagel in half, and at the slice you get the shape of a washer, the kind you fit around a nut and bolt. (An annulus, to use the trade term.) Slice it perpendicular to that, the way you’d cut it if you’re one of those people who eats half doughnuts to the amazement of the rest of us, and at the slice you get two detached circles. If you start from any point on the torus shape you can go in one direction and make a circle that loops around the doughnut’s central hole. You can go the perpendicular direction and make a circle that brushes up against but doesn’t go around the central hole. There’s some neat topology in it.

There’s also video games in it. The topology of this is just like old-fashioned video games where if you go off the edge of the screen to the right you come back around on the left, and if you go off the top you come back from the bottom. (And if you go off to the left you come back around the right, and off the bottom you come back to the top.) To go from the flat screen to the surface of a doughnut requires imagining some stretching and scrunching up of the surface, but that’s all right. (OK, in an old video game it was a kind-of flat screen.) We can imagine a nice flexible screen that just behaves.

This is a common trick to deal with boundaries. (I first wrote “to avoid having to deal with boundaries”. But this is dealing with them, by a method that often makes sense.) You just make each boundary match up with a logical other boundary. It’s not just useful in video games. Often we’ll want to study some phenomenon where the current state of things depends on the immediate neighborhood, but it’s hard to say what a logical boundary ought to be. This particularly comes up if we want to model an infinitely large surface without dealing with infinitely large things. The trick will turn up a lot in numerical simulations for that reason. (In that case, we’re in truth working with a numerical approximation of T, but that’ll be close enough.)
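
The video-game wraparound itself is one line of arithmetic: take each coordinate modulo the screen size. A minimal Python sketch, with screen dimensions I've made up:

```python
# made-up screen dimensions
WIDTH, HEIGHT = 320, 200

def step(x, y, dx, dy):
    # move by (dx, dy); going off one edge re-enters from the opposite edge
    return (x + dx) % WIDTH, (y + dy) % HEIGHT

print(step(315, 5, 10, -10))   # (5, 195): off the right and the top, back on the left and the bottom
```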

Tn, meanwhile, is a vector of things, each of which is a point on a torus. It’s akin to Rn or S2 x n. They’re ordered sets of things that are themselves things. There can be as many as you like. n, here, is whatever positive whole number you need.

You might wonder how big the doughnut is. When we talked about the surface of the sphere, S2, or the surface and interior, B3, we figured on a sphere with radius of 1 unless we heard otherwise. Toruses would seem to have two parameters. There’s how big the outer diameter is and how big the inner diameter is. Which do we pick?

We don’t actually care. It’s much the way we can talk about a point on the surface of a planet by the latitude and longitude of the point, and never care about how big the planet is. We can describe a point on the surface of the torus without needing to refer to how big the whole shape is or how big the hole in the middle is. A popular scheme to describe points is one that looks a lot like latitude and longitude.

Imagine the torus sitting as flat as it gets on the table. Pick a point that you find interesting.

We use some reference point that’s as good as an equator and a prime meridian. One coordinate is the angle you make going horizontally, possibly around the hole in the middle, from the reference point to the point we’re interested in. The other coordinate is the angle you make vertically, going in a loop that doesn’t go around the hole in the middle, from the reference point to the point we’re interested in. The reference point has coordinates 0, 0, as it must. If this sounds confusing it’s because I’m not using a picture. I thought making some pictures would be too much work. I’m a fool. But if you think of real torus-shaped objects it’ll come to you.

In this scheme the coordinates are both angles. Normal people would measure that in degrees, from 0 to 360, or maybe from -180 to 180. Mathematicians would measure as radians, from 0 to 2π, or from -π to +π. Whatever it is, it’s the same as the coordinates of a point on the edge of the circle, what we called S1 a few essays back. So it’s fair to say you can think of T as S1 x S1, an ordered set of points on circles.
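
If you'd like the two-angle scheme written out, here's a Python sketch. It maps a pair of angles to a point on a torus sitting in ordinary three-dimensional space, which is handy if you want to draw the thing. The big and small radii are values I've picked arbitrarily, since the coordinates themselves never cared how big the doughnut was.

```python
import math

def torus_point(theta, phi, R=2.0, r=1.0):
    # map a pair of angles, each in [0, 2*pi), to a point (x, y, z) on a torus;
    # R is the distance from the center of the hole to the center of the tube,
    # r is the radius of the tube; both are assumptions picked for drawing
    x = (R + r * math.cos(phi)) * math.cos(theta)
    y = (R + r * math.cos(phi)) * math.sin(theta)
    z = r * math.sin(phi)
    return x, y, z

print(torus_point(0.0, 0.0))          # the reference point, (3.0, 0.0, 0.0)
print(torus_point(math.pi, math.pi))  # the far side, at the inner edge
```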

I’ve written of these toruses as three-dimensional things. Well, two-dimensional surfaces wrapped up to suggest three-dimensional objects. You don’t have to stick with these dimensions if you don’t want or if your problem needs something else. You can make a torus that’s a three-dimensional shape in four dimensions. For me that’s easiest to imagine as a cube where the left edge and the right edge loop back and meet up, the lower and the upper edges meet up, and the front and the back edges meet up. This works well to model an infinitely large space with a nice and small block.

I like to think I can imagine a four-dimensional doughnut where every cross-section is a sphere. I may be kidding myself. There could also be a five-dimensional torus and you’re on your own working that out, or working out what to do with it.

I’m not sure there is a common standard notation for that, though. Probably the mathematician wanting to make clear she’s working with a torus in four dimensions just says so in text, and trusts that the context of her mathematics makes it clear this is no ordinary torus.

I’ve also written of these toruses as circular, as rounded shapes. That’s the most familiar torus. It’s a doughnut shape, or an O-ring shape, or an inner tube’s shape. It’s the shape you produce by taking a circle and looping it around an axis not on the ring. That’s common and that’s usually all we need.

But if you need some other torus, produced by rotating some other shape around an axis not inside it, go ahead. You’ll need to make clear what that original shape, the generator, is. You’ve seen examples of this in, for example, the washers that fit around nuts and bolts. They’re typically rectangles in cross-section. Or you might have seen that image of someone who fit together a couple dozen iMac boxes to make a giant wheel. I don’t know why you would need this, but it’s your problem, not mine. If these shapes are useful for your work, by all means, use them.

I’m not sure there is a standard notation for that sort of shape. My hunch is to say you’d define your generating shape and give it a name such as A or D. Then name the torus based on that as T(A) or T(D). But I would recommend spelling it out in text before you start using symbols like this.

The Set Tour, Part 9: Balls, Only The Insides


Last week in the tour of often-used domains I talked about Sn, the surfaces of spheres. These correspond naturally to stuff like the surfaces of planets, or the edges of surfaces. They are also natural fits if you have a quantity that’s made up of a couple of components, and some total amount of the quantity is fixed. More physical systems do that than you might have guessed.

But this is all the surfaces. The great interior of a planet is by definition left out of Sn. This gives away the heart of what this week’s entry in the set tour is.

Bn

Bn is the domain that’s the interior of a sphere. That is, B3 would be all the points in a three-dimensional space that are less than a particular radius from the origin, from the center of space. If we don’t say what the particular radius is, then we mean “1”. That’s just as with Sn, where we mean the radius to be “1” unless someone specifically says otherwise. In practice, I don’t remember anyone ever saying otherwise when I was in grad school. I suppose they might if we were doing a numerical simulation of something like the interior of a planet. You know, something where it could make a difference what the radius is.

It may have struck you that B3 is just the points that are inside S2. Alternatively, it might have struck you that S2 is the points that are on the edge of B3. Either way is right. Bn and Sn-1, for any positive whole number n, are tied together, one the edge and the other the interior.

Bn we tend to call the “ball” or the “n-ball”. Probably we hope that suggests bouncing balls and baseballs and other objects that are solid throughout. Sn we tend to call the “sphere” or the “n-sphere”, though I admit that doesn’t make a strong case for ruling out the inside of the sphere. Maybe we should think of it as the surface. We don’t even have to change the letter representing it.

As the “n” suggests, there are balls for as many dimensions of space as you like. B2 is a circle, filled in. B1 is just a line segment, stretching out from -1 to 1. B3 is what’s inside a planet or an orange or an amusement park’s glass light fixture. B4 is more work than I want to do today.

So here’s a natural question: does Bn include Sn-1? That is, when we talk about a ball in three dimensions, do we mean the surface and everything inside it? Or do we just mean the interior, stopping ever so short of the surface? This is a division very much like dividing the real numbers into negative and positive; do you include zero in one set, the other, or neither?

Typically, I think, mathematicians don’t. If a mathematician speaks of B3 without saying otherwise, she probably means the interior of a three-dimensional ball. She’s not saying anything one way or the other about the surface. This we name the “open ball”, and if she wants to avoid any ambiguity she will say “the open ball Bn”.

“Open” here means the same thing it does when speaking of an “open set”. That may not communicate well to people who don’t remember their set theory. It means that the edges aren’t included. (Warning! Not actual set theory! Do not attempt to use that at your thesis defense. That description was only a reference to what’s important about this property in this particular context.)

If a mathematician wants to talk about the ball and the surface, she might say “the closed ball Bn”. This means to take the surface and the interior together. “Closed”, again, here means what it does in set theory. It pretty much means “include the edges”. (Warning! See above warning.)
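
The open-versus-closed difference is easy to see in a little code. Here's a Python sketch testing whether a point belongs to the open or to the closed unit ball; the test point is one that sits exactly on the surface, so the two answers disagree.

```python
import math

def in_open_ball(point, radius=1.0):
    # strictly inside: the edge is left out
    return math.sqrt(sum(x * x for x in point)) < radius

def in_closed_ball(point, radius=1.0):
    # inside or on the edge
    return math.sqrt(sum(x * x for x in point)) <= radius

p = (1.0, 0.0, 0.0)        # a point sitting exactly on the surface S^2
print(in_open_ball(p))     # False: the open ball B^3 leaves the surface out
print(in_closed_ball(p))   # True: the closed ball includes it
```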

Balls work well as domains for functions that have to describe the interiors of things. They also work if we want to talk about a constraint that’s made up of a couple of components, and that can be up to some size but not larger. For example, suppose you may put up to a certain budget cap into (say) six different projects, but you aren’t required to use the entire budget. We could model your budgeting as finding the point in B6 that gets the best result. How you measure the best is a problem for your operations research people. All I’m telling you is how we might represent the study of the thing you’re doing.

The Set Tour, Part 8: Balls, Only Made Harder


I haven’t forgotten or given up on the Set Tour, don’t worry or celebrate. I just expected there to be more mathematically-themed comic strips the last couple days. Really, three days in a row without anything at ComicsKingdom or GoComics to talk about? That’s unsettling stuff. Ah well.

Sn

We are also starting to get into often-used domains that are a bit stranger. We are going to start seeing domains that strain the imagination more. But this isn’t strange quite yet. We’re looking at the surface of a sphere.

The surface of a sphere we call S2. The “S” suggests a sphere. The “2” means that we have a two-dimensional surface, which matches what we see with the surface of the Earth, or a beach ball, or a soap bubble. All these are sphere enough for our needs. If we want to say where we are on the surface of the Earth, it’s most convenient to do this with two numbers. These are a latitude and a longitude. The latitude is the angle made between the point we’re interested in and the equator. The longitude is the angle made between the point we’re interested in and a reference prime longitude.

There are some variations. We can replace the latitude, for example, with the colatitude. That’s the angle between our point and the north pole. Or we might replace the latitude with the cosine of the colatitude. That has some nice analytic properties that you have to be well into grad school to care about. It doesn’t matter. The details may vary but it’s all the same. We put in a number for the east-west distance and another for the north-south distance.

It may seem pompous to use the same system to say where a point is on the surface of a beach ball. But can you think of a better one? Pointing to the ball and saying “there”, I suppose. But that requires we go around with the beach ball pointing out spots. Giving two numbers saves us having to go around pointing.

(Some weenie may wish to point out that if we were clever we could describe a point exactly using only a single number. This is true. Nobody does that unless they’re weenies trying to make a point. This essay is long enough without describing what mathematicians really mean by “dimension”. “How many numbers normal people use to identify a point in it” is good enough.)

S2 is a common domain. If we talk about something that varies with your position on the surface of the earth, we’re probably using S2 as the domain. If we talk about the temperature as it varies with position, or the height above sea level, or the population density, we have functions with a domain of S2 and a range in R. If we talk about the wind speed and direction we have a function with domain of S2 and a range in R3, because the wind might be moving in any direction.

Of course, I wrote down Sn rather than just S2. As with Rn and with Rm x n, there is really a family of similar domains. They are common enough to share a basic symbol, and the superscript is enough to differentiate them.

What we mean by Sn is “the collection of points in Rn+1 that are all the same distance from the origin”. Let me unpack that a little. The “origin” is some point in space that we pick to measure stuff from. On the number line we just call that “zero”. On your normal two-dimensional plot that’s where the x- and y-axes intersect. On your normal three-dimensional plot that’s where the x- and y- and z-axes intersect.

And by “the same distance” we mean some set, fixed distance. Usually we call that the radius. If we don’t specify some distance then we mean “1”. In fact, this is so regularly the radius I’m not sure how we would specify a different one. Maybe we would write Snr for a radius of “r”. Anyway, Sn, the surface of the sphere with radius 1, is commonly called the “unit sphere”. “Unit” gets used a fair bit for shapes. You’ll see references to a “unit cube” or “unit disc” or so on. A unit cube has sides length 1. A unit disc has radius 1. If you see “unit” in a mathematical setting it usually means “this thing measures out at 1”. (The other thing it may mean is “a unit of measure, but we’re not saying which one”. For example, “a unit of distance” doesn’t commit us to saying whether the distance is one inch, one meter, one million light-years, or one angstrom. We use that when we don’t care how big the unit is, and only wonder how many of them we have.)
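
Here's the latitude-and-longitude idea written out as a Python sketch. The angles are ones I've made up; the check at the end confirms the point lands at distance 1 from the origin, which is what makes it a point of the unit sphere.

```python
import math

def sphere_point(latitude, longitude):
    # two angles, in radians, pick out a point of S^2 sitting in R^3
    x = math.cos(latitude) * math.cos(longitude)
    y = math.cos(latitude) * math.sin(longitude)
    z = math.sin(latitude)
    return x, y, z

p = sphere_point(math.radians(42.0), math.radians(-83.0))   # made-up angles
print(p)
print(math.sqrt(sum(c * c for c in p)))   # essentially 1: it is on the unit sphere
```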

S1 is an exotic name for a familiar thing. It’s all the points in two-dimensional space that are a distance 1 from the origin. Real people call this a “circle”. So do mathematicians unless they’re comparing it to other spheres or hyperspheres.

This is a one-dimensional figure. We can identify a single point on it easily with just one number, the angle made with respect to some reference direction. The reference direction is almost always that of the positive x-axis. That’s the line that starts at the center of the circle and points off to the right.

S3 is the first hypersphere we encounter. It’s a surface that’s three-dimensional, and it takes a four-dimensional space to see it. You might be able to picture this in your head. When I try I imagine something that looks like the regular old surface of the sphere, only it has fancier shading and maybe some extra lines to suggest depth. That’s all right. We can describe the thing even if we can’t imagine it perfectly. S4, well, that’s something taking five dimensions of space to fit in. I don’t blame you if you don’t bother trying to imagine what that looks like exactly.

The need for S4 itself tends to be rare. If we want to prove something about a function on a hypersphere we usually make do with Sn. This doesn’t tell us how many dimensions we’re working with. But we can imagine that as a regular old sphere only with a most fancy job of drawing lines on it.

If we want to talk about Sn aloud, or if we just want some variation in our prose, we might call it an n-sphere instead. So the 2-sphere is the surface of the regular old sphere that’s good enough for everybody but mathematicians. The 1-sphere is the circle. The 3-sphere and so on are harder to imagine. Wikipedia asserts that 3-spheres and higher-dimension hyperspheres are sometimes called “glomes”. I have not heard this word before, and I would expect it to start a fight if I tried to play it in Scrabble. However, I do not do mathematics that often requires discussion of hyperspheres. I leave this space open to people who do and who can say whether “glome” is a thing.

Something that all these Sn sets have in common is that they are the surfaces of spheres. They are just the boundary, and omit the interior. If we want a function that’s defined on the interior of the Earth we need to find a different domain.

Reading the Comics, November 1, 2015: Uncertainty and TV Schedules Edition


Brian Fies’s Mom’s Cancer is a heartbreaking story. It’s compelling reading, but people who are emotionally raw from lost loved ones, or who know they’re particularly sensitive to such stories, should consider before reading that the comic is about exactly what the title says.

But it belongs here because the October 29th and the November 2nd installments are about a curiosity of area, and volume, and hypervolume, and more. That is that our perception of how big a thing is tends to be governed by one dimension, the length or the diameter of the thing. But its area is the square of that, its volume the cube of that, its hypervolume some higher power yet of that. So very slight changes in the diameter produce great changes in the volume. Conversely, though, great changes in volume will look like only slight changes in the diameter. This can hurt.

Tom Toles’s Randolph Itch, 2 am from the 29th of October is a Roman numerals joke. I include it as comic relief. The clock face in the strip does depict 4 as IV. That’s eccentric but not unknown for clock faces; IIII seems to be more common. There’s not a clear reason why this should be. The explanation I find most nearly convincing is an aesthetic one. Roman numerals are flexible things, and can be arranged for artistic virtue in ways that Arabic numerals make impossible.

The aesthetic argument is that the four-character symbol IIII takes up nearly as much horizontal space as the VIII opposite it. The two-character IV would look distractingly skinny. Now, none of the symbols takes up exactly the same space as their counterpart. X is shorter than II, VII longer than V. But IV-versus-VIII does seem like the biggest discrepancy. Still, Toles’s art shows it wouldn’t look all that weird. And he had to conserve line strokes, so that the clock would read cleanly in newsprint. I imagine he also wanted to avoid using different representations of “4” so close together.

Jon Rosenberg’s Scenes From A Multiverse for the 29th of October is a riff on both quantum mechanics — Schrödinger’s Cat in a box — and the uncertainty principle. The uncertainty principle can be expressed as a fascinating mathematical construct. It starts with Ψ, the wave function, a probability amplitude that has spacetime as its domain, and the complex-valued numbers as its range. By applying a function to this function we can derive yet another function. This function-of-a-function we call an operator, because we’re saying “function” so much it’s starting to sound funny. But this new function, the one we get by applying an operator to Ψ, tells us the probability that the thing described is in this place versus that place. Or that it has this speed rather than that speed. Or this angular momentum — the tendency to keep spinning — versus that angular momentum. And so on.

If we apply an operator — let me call it A — to the function Ψ, we get a new function. What happens if we apply another operator — let me call it B — to this new function? Well, we get a second new function. It’s much the way if we take a number, and multiply it by another number, and then multiply it again by yet another number. Of course we get a new number out of it. What would you expect? This operators-on-functions things looks and acts in many ways like multiplication. We even use symbols that look like multiplication: AΨ is operator A applied to function Ψ, and BAΨ is operator B applied to the function AΨ.

Now here is the thing we don’t expect. What if we applied operator B to Ψ first, and then operator A to the product? That is, what if we worked out ABΨ? If this was ordinary multiplication, then, nothing all that interesting. Changing the order of the real numbers we multiply together doesn’t change what the product is.

Operators are stranger creatures than real numbers are. It can be that BAΨ is not the same function as ABΨ. We say this means the operators A and B do not commute. But it can be that BAΨ is exactly the same function as ABΨ. When this happens we say that A and B do commute.

Whether they do or they don’t commute depends on the operators. When we know what the operators are we can say whether they commute. We don’t have to try them out on some functions and see what happens, although that sometimes is the easiest way to double-check your work. And here is where we get the uncertainty principle from.
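
You can watch operators fail to commute using small matrices as stand-ins. This is only a finite-dimensional toy, not the actual position and momentum operators, but it shows the phenomenon: applying A then B is not the same as applying B then A.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0]])
psi = np.array([1.0, 2.0])   # a stand-in for the function Psi

print(A @ (B @ psi))   # [1. 0.]
print(B @ (A @ psi))   # [0. 2.]  -- different, so these A and B do not commute
```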

The operator that lets us learn the probability of particles’ positions does not commute with the operator that lets us learn the probability of particles’ momentums. We get different answers if we measure a particle’s position and then its velocity than we do if we measure its velocity and then its position. (Velocity is not the same thing as momentum. But they are related. There’s nothing you can say about momentum in this context that you can’t say about velocity.)

The uncertainty principle is a great source for humor, and for science fiction. It seems to allow for all kinds of magic. Its reality is no less amazing, though. For example, it implies that it is impossible for an electron to spiral down into the nucleus of an atom, collapsing atoms the way satellites eventually fall to Earth. Matter can exist, in ways that let us have solid objects and chemistry and biology. This is at least as good as a cat being perhaps boxed.

Jan Eliot’s Stone Soup Classics for the 29th of October is a rerun from 1995. (The strip itself has gone to Sunday-only publication.) It’s a joke about how arithmetic is easy when you have the proper motivation. In 1995 that would include catching TV shows at a particular time. You see, in 1995 it was possible to record and watch TV shows when you wanted, but it required coordinating multiple pieces of electronics. It would often be easier to just watch when the show actually aired. Today we have it much better. You can watch anything you want anytime you want, using any piece of consumer electronics you have within reach, including several current models of microwave ovens and programmable thermostats. This does, sadly, remove one motivation for doing arithmetic. Also, I’m not certain the kids’ TV schedule is actually consistent with what was on TV in 1995.

Oh, heck, why not. Obviously we’re 14 minutes before the hour. Let me move onto the hour for convenience. It’s 744 minutes to the morning cartoons; that’s 12.4 hours. Taking the morning cartoons to start at 8 am, that means it’s currently 14 minutes before 24 minutes before 8 pm. I suspect a rounding error. Let me say they’re coming up on 8 pm. 194 minutes to Jeopardy implies the game show is on at 11 pm. 254 minutes to The Simpsons puts that on at midnight, which is probably true today, though I don’t think it was so in 1995 just yet. 284 minutes to Grace puts that on at 12:30 am.
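
If you'd like to check my arithmetic, here's a quick Python sketch. It assumes the current time is 7:46 pm, the "coming up on 8 pm" guess above.

```python
from datetime import datetime, timedelta

# assume the current time is 7:46 pm; the date is arbitrary, only the time matters
now = datetime(1995, 1, 1, 19, 46)
shows = {"morning cartoons": 744, "Jeopardy": 194, "The Simpsons": 254, "Grace": 284}
for show, minutes in shows.items():
    print(show, (now + timedelta(minutes=minutes)).strftime("%I:%M %p"))
# the cartoons come out at 08:10 AM rather than 8:00, the discrepancy I read as
# a rounding error; the others land at 11:00 PM, 12:00 AM, and 12:30 AM
```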

I suspect that Eliot wanted it to be 978 minutes to the morning cartoons, which would bump Oprah to 4:00, Jeopardy to 7:00, Simpsons and Grace to 8:00 and 8:30, and still let the cartoons begin at 8 am. Or perhaps the kids aren’t that great at arithmetic yet.

Stephen Beals’s Adult Children for the 30th of October tries to build a “math error” out of repeated use of the phrase “I couldn’t care less”. The argument is that the thing one cares least about is unique. But why can’t there be two equally least-cared-about things?

We can consider caring about things as an optimization problem. Optimization problems are about finding the most of something given some constraints. If you want the least of something, multiply the thing you have by minus one and look for the most of that. You may giggle at this. But it’s the sensible thing to do. And many things can be equally high, or low. Take a bundt cake pan, and drizzle a little water in it. The water separates into many small, elliptic puddles. If the cake pan were perfectly formed, and set on a perfectly level counter, then the bottom of each puddle would be at the same minimum height. I grant a real cake pan is not perfect; neither is any counter. But you can imagine such.

Just because you can imagine it, though, must it exist? Think of the “smallest positive number”. The idea is simple. Positive numbers are a set of numbers. Surely there’s some smallest number. Yet there isn’t; name any positive number and we can name a smaller number. Divide it by two, for example. Zero is smaller than any positive number, but it’s not itself a positive number. A minimum might not exist, at least not within the confines where we are to look. It could be that there is nothing one cares about least of all.

So a minimum might or might not exist, and it might or might not be unique. This is why optimization problems are exciting, challenging things.

A bedbug declares that 'according to our quantum mechanical computations, our entire observable universe is almost certainly Fred Wardle's bed.'
Niklas Eriksson’s Carpe Diem for the 1st of November, 2015. I’m not sure how accurately the art depicts bedbugs, although I’m also not sure how accurate Eriksson needs to be.

Niklas Eriksson’s Carpe Diem for the 1st of November is about understanding the universe by way of observation and calculation. We do rely on mathematics to tell us things about the universe. Immanuel Kant has a bit of reputation in mathematical physics circles for this observation. (I admit I’ve never seen the original text where Kant observed this, so I may be passing on an urban legend. My love has several thousands of pages of Kant’s writing, but I do not know if any of them touch on natural philosophy.) If all we knew about space was that gravitation falls off as the square of the distance between two things, though, we could infer that space must have three dimensions. Otherwise that relationship would not make geometric sense.

Jeff Harris’s kids-information feature Shortcuts for the 1st of November was about the Harvard Computers. By this we mean the people who did the hard work of numerical computation, back in the days before this could be done by electrical and then electronic computer. Mathematicians relied on people who could do arithmetic in those days. There is the folkloric belief that mathematicians are inherently terrible at arithmetic. (I suspect the truth is people assume mathematicians must be better at arithmetic than they really are.) But here, there’s the mathematics of thinking what needs to be calculated, and there’s the mathematics of doing the calculations.

Their existence tends to be mentioned as a rare bit of human interest in numerical mathematics books, usually in the preface in which the author speaks with amazement of how people who did computing were once called computers. I wonder if books about font and graphic design mention how people who typed used to be called typewriters in their prefaces.

The Set Tour, Part 3: R^n


After talking about the real numbers last time, I had two obvious sets to use as follow up. Of course I’d overthink the choice of which to make my next common domain-and-range set.

Rn

Rn is pronounced “are enn”, just as you might do if you didn’t know enough mathematics to think the superscript meant something important. It does mean something important; it’s just that there’s not a graceful way to say what offhand. This is the set of n-tuples of real numbers. That is, anything you pick out of Rn is an ordered set of things all of which are themselves real numbers. The “n” here is the name for some whole number whose value isn’t going to change during the length of this problem.

So when we speak of Rn we are really speaking of a family of sets, all of them similar in some important ways. The things in R2 look like pairs of real numbers: (3, 4), or (4π, -2e), or (2038, 0.010010001), pairs like that. The things in R3 are triplets of real numbers: (3, 4, 5), or (4π, -2e, 1 + 1/π). The things in R4 are quartets of real numbers: (3, 4, 5, 12) or (4π, -2e, 1 + 1/π, -6) or so. The things in R10 are probably clear enough to not need listing.

It’s possible to add together two things in Rn. At least if they come from the same Rn; you can’t add a pair of numbers to a quartet of numbers, not if you’re being honest. The addition rule is just what you’d come up with if you didn’t know enough mathematics to be devious, though: add the first number of the first thing to the first number of the second thing, and that’s the first number of the sum. Add the second number of the first thing to the second number of the second thing, and that’s the second number of the sum. Add the third number of the first thing to the third number of the second thing, and that’s the third number of the sum. Keep on like this until you run out of numbers in each thing. It’s possible you already have.

You can’t multiply together two things in Rn, though, unless your n is 1. (There may be some conceptual difference between R1 and plain old R. But I don’t recall seeing a mathematician being interested in the difference except when she’s studying the philosophy of mathematics.) The obvious multiplication scheme — multiply matching numbers, like you do with addition — produces something that doesn’t work enough like multiplication to be interesting. It’s possible for some n’s to work out schemes that act like multiplication enough to be interesting, but for the most part we don’t need them.

What we will do, though, is multiply something in Rn by a single real number. That real number is called a “scalar”. You do the multiplication, again, like you’d do if you were too new to mathematics to be clever. Multiply the first number in your thing by the scalar, and that’s the first number in your product. Multiply the second number in your thing by the scalar, and that’s the second number in your product. Multiply the third number in your thing by the scalar, and that’s the third number in your product. Carry on like this until you run out of numbers, and then stop. Usually good advice.
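
Written as code, the two operations are as plain as the description makes them sound. A Python sketch, with tuples standing in for n-tuples and example vectors I've made up:

```python
def vector_add(u, v):
    # add matching components; only makes sense within the same R^n
    assert len(u) == len(v), "can only add things from the same R^n"
    return tuple(a + b for a, b in zip(u, v))

def scalar_multiply(c, u):
    # multiply every component by the scalar c
    return tuple(c * a for a in u)

u = (3.0, 4.0, 5.0)    # something in R^3
v = (1.0, -2.0, 0.5)   # another something in R^3
print(vector_add(u, v))         # (4.0, 2.0, 5.5)
print(scalar_multiply(2.0, u))  # (6.0, 8.0, 10.0)
```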

That you can add together two things from Rn, and you can multiply anything in Rn by a scalar, makes this a “vector space”. (There are some more requirements, but they amount to addition and multiplication working like you’d expect.) The term means about what you think; a “space” is a … well … something that acts mathematically like ordinary everyday space works. A “vector space” is a space where the things inside it are vectors. Vectors are a combination of a direction and a distance in that direction. They’re very well-represented as n-tuples. They get represented as n-tuples so often it’s easy to forget that’s just a convenient way to write them down.

This vector space property of Rn makes it a really useful set. R2 corresponds naturally to “the points on a flat surface”. R3 corresponds naturally to an idea of “all the points in normal everyday space where something could be”. Or, if you like, it can represent “the speed and direction something is travelling in”. Or the direction and amount of its acceleration, for that matter.

Because of these properties, mathematicians will often call Rn the “n-dimensional Euclidean space”. The n is about how many components there are in an element of the set. The “space” tells us it’s a space. “Euclidean” tells us that it looks and works like, well, Euclidean geometry. We can talk about the distance between points and use the ideas we had from plane or solid geometry. We can talk about angles and areas and volumes similarly. We can do this so much we might say “n-dimensional space” as if there weren’t anything but Euclidean spaces out there.

And this is useful for more than describing where something happens to be. A great number of physics problems find it convenient to study the position and the velocity of a number of particles which interact. If we have N particles, then, and we’re in a three-dimensional space, and we’re keeping track of positions and velocities for each of them, then we can describe where everything is and how everything is moving as one element in the space R6N. We can describe movement in time as a function that has a domain of R6N and a range of R6N, and see the progression of time as tracing out a path in that space.

We can’t draw that, obviously, and I’d look skeptically at people who say they can visualize it. What we usually draw is a little enclosed space that’s either a rectangle or a blob, and draw out lines — “trajectories” — inside that. The different spots along the trajectory correspond to all the positions and velocities of all the particles in the system at different times.

Though that’s a fantastic use, it’s not the only one. It’s not required, for example, that a function have the same R^n as both domain and range. It can have different sets. If we want to be clear that the domain and range can be of different sizes, it’s common to call one R^n and the other R^m if we aren’t interested in pinning down just which spaces they are.

But, for example, a perfectly legitimate function would have a domain of R^3 and a range of R^1, the reals. There’s even an obvious, common one: return the size, the magnitude, of whatever the vector in the domain is. Or we might take as domain R^4, and the range R^2, following the rule “match an element in the domain to an element in the range that has the same first and third components”. That kind of function is called a “projection”, as it gives what might look like the shadow of the original thing in a smaller space.
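
Both of those are short enough to write out as a sketch; the names magnitude and project are my own labels, not standard library functions.

    import math

    # R^3 -> R^1: the size, or magnitude, of the vector in the domain.
    def magnitude(v):
        return math.sqrt(sum(a ** 2 for a in v))

    # R^4 -> R^2: keep the first and third components, as in the rule above.
    def project(v):
        return (v[0], v[2])

    magnitude((3.0, 4.0, 0.0))       # 5.0
    project((1.0, 2.0, 3.0, 4.0))    # (1.0, 3.0)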

If we wanted to go the other way, from R^2 to R^4 as an example, we could. Here we set the rule “match an element in the domain to an element in the range which has the same first and second components, and has ‘3’ and ‘4’ as the third and fourth components”. That’s an “embedding”, giving us the idea that we can put a Euclidean space with fewer dimensions into a space with more. The idea comes naturally to anyone who’s seen a cartoon where a character leaps off the screen and interacts with the real world.
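
And the embedding runs the same sort of way, only in reverse; embed is again my own name for it.

    # R^2 -> R^4: keep the first and second components, then tack on 3 and 4
    # as the third and fourth components, following the rule above.
    def embed(v):
        return (v[0], v[1], 3, 4)

    embed((7.0, 8.0))   # (7.0, 8.0, 3, 4)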

The Set Tour, Stage 2: The Real Star


For the second of my little tour of sets that get commonly used as domains and ranges I want to name the most common of them all.

R

This is the real numbers. In text that’s written with a bold R. Written by hand, and often in text, that’s written with a capital R that has a double stroke for the main vertical line. That’s an easy-to-write way to distinguish it from a plain old civilian R. The double-vertical-stroke convention is used for many of the most common sets of numbers. It will get used for letters like I and J (the integers), or N (the counting numbers). A vertical stroke will even get added to symbols that technically don’t have any vertical strokes, like Q (the rational numbers). There it’s just put inside the loop, on the left side, far enough from the edge that the reader can notice the vertical stroke is there.

R is a big one. It’s not just a big set. It’s also a popular one. It may as well be the default domain and range. If someone fails to tell you what either set is, you can suppose she meant R and be only rarely wrong. The real numbers are familiar and popular and it feels like we know what they are. It’s a bit tricky to define them exactly, though, and you’ll notice that I’m not doing that. You know what I mean, though. It’s whole numbers, and rational numbers, and irrational numbers like the square root of pi, and for that matter pi, and a whole bunch of other boring numbers nobody looks at. Let’s leave it at that.

All the intervals I talked about last time are subsets of R. If we really wanted to, we could turn a function with domain an interval like [0, 1] into a function with a domain of R. That’s a kind of “embedding”. Let me call the function with domain [0, 1] by the name “f”. I’ll then define g, on the domain R, by the rule “whatever f(x) is, if x is from 0 to 1; and some other, harmless value, if x isn’t”. Probably the harmless value is zero. Sometimes we need to change the domain a function’s defined on, and this is a way to do it.
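
A sketch of that move, with zero as the harmless value and with x ** 2 standing in for whatever rule f actually has:

    # f is only defined on the interval [0, 1].
    def f(x):
        return x ** 2          # whatever the rule happens to be on [0, 1]

    # g extends f to all of R by returning a harmless 0 everywhere else.
    def g(x):
        return f(x) if 0 <= x <= 1 else 0

    g(0.5)   # 0.25, the same as f(0.5)
    g(7.0)   # 0, the harmless value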

If we only want to talk about the positive real numbers we can denote that by putting a plus sign in superscript: R^+. If we only want the negative numbers we put in a minus sign instead: R^-. Does either of these include zero? My heart tells me neither should, but I wouldn’t be surprised if in practice either did, because zero is often useful to have around. To be careful we might explicitly include zero, using the notations of set theory. Then we might write \textbf{R}^+ \cup \left\{0\right\}.

Sometimes the rule for a function doesn’t make sense for some values. For example, if a function has the rule f: x \mapsto 1 / (x - 1) then you can’t work out a value for f(1). That would require dividing by zero and we dare not do that. A careful mathematician would say the domain of that function f is all the real numbers R except for the number 1. This exclusion gets written as “R \ {1}”. The backslash means “except the numbers in the following set”. It might be a single number, such as in this example. It might be a lot of numbers. The function g: x \mapsto \log\left(1 - x\right) is meaningless for any x that’s equal to or greater than 1. We could write its domain then as “R \ { x: x ≥ 1 }”.
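
In a programming language the same restrictions show up as inputs the rule simply can’t evaluate. This sketch uses Python’s math module and will raise errors at exactly the excluded points.

    import math

    # f: x maps to 1 / (x - 1); the number 1 is not in the domain.
    def f(x):
        return 1 / (x - 1)      # f(1) raises ZeroDivisionError

    # g: x maps to log(1 - x); anything with x >= 1 is not in the domain.
    def g(x):
        return math.log(1 - x)  # g(1), or anything larger, raises ValueError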

That’s if we’re being careful. If we get a little careless, or if we’re writing casually, or if the set of non-permitted points is complicated, we might omit that. Mathematical writing includes an assumption of good faith. The author is supposed to be trying to say something interesting and true. The reader is expected to be skeptical but not quarrelsome. Spotting a flaw in the argument only because the domain doesn’t explicitly rule out a few points it ought to exclude is tedious quibbling. Finding that the interesting thing only holds true for values that are implicitly outside the domain is a serious objection.

The set of real numbers is a group; it has an operation that works like addition. We call it addition. For that matter, it’s a ring. It has an operation that works like multiplication. We call it multiplication. And it’s even more than a ring. Everything in R except for the additive identity — 0, the number you can add to anything without changing what the thing is — has a multiplicative inverse. That is, any number except zero has some number you can multiply it by to get 1. This property makes it a “field”, to people who study (abstract) algebra. This “field” hasn’t got anything to do with gravitational or electrical or baseball or magnetic fields. But the overlap in names does serve to sometimes confuse people.

But having this multiplicative inverse means that we can do something that operates like division. Divide one thing by a second by taking the first thing and multiplying it by the second thing’s multiplicative inverse. We call this division-like operation “division”.
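
Spelled out with a couple of ordinary numbers, the two operations come to the same thing:

    # Dividing by b is the same as multiplying by b's multiplicative inverse.
    a, b = 6.0, 4.0
    a / b            # 1.5
    a * (1.0 / b)    # 1.5, the same answer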

It’s not coincidence that the algebraic “addition” and “multiplication” and “division” operations are the ones we call addition and multiplication and division. What makes abstract algebra abstract is that it’s the study of things that work kind of like the real numbers do. The operations we can do on the real numbers inspire us to look for other sets that can let us do similar things.

One Way We Write Functions


During the Summer A To Z I talked a bit about functions. Mathematically we see these as a collection of three things: a set of things which we call the domain, a set of things which we call the range, and a rule that matches things in the domain to something in the range. The domain and the range can be the same set, or they can be different ones. The definition is quite flexible. What I want to talk about here is how to write them down.

We can describe each of these sets in words, and often will when speaking or when describing a line of argument. But when we want to work, we start using shorthand names, often single letters. For the sets of the domain and range these are usually capital letters. I haven’t noticed much of a preference for which letters to use. D for domain and R for range have a hard-to-resist logic if we don’t really care what the sets are.

There are some sets that are used as domains or ranges a lot, and those have common shorthands. The set of real numbers is often written as R — bold, in print, or written with a double vertical stroke on the R if you’re doing this by hand. The set of whole numbers, integers, gets written as I (for integer) or J (again for integer; the letters I and J weren’t perceived as truly different things until recently) or Z (for Zahlen, German for “numbers”). There are a lot of others and don’t worry about them.

The rules for a function are generally described by a lowercase letter. It’s most commonly f, with g and h pressed into service if f won’t do. Subscripts are common also: f_1, f_2, f_j, f_n, and so on. Again, any name is allowed, as long as you’re consistent about it. But f and g and h are used as “names of functions” so often that it’s what the reader will expect they mean even without being told.

One common shorthand for saying that a function named “f” has the domain “D” and the range “R” is to use an arrow. Write out “f: D → R”. The function name comes first, before the colon; then the domain, and an arrow, and the range. There are other notations but this is the one I see most often. This is often read aloud as “f maps D into R”. The activity of the verb “map” — well, it’s kind of action-y — suggests motion to my mind. Functions are commonly used to describe how a system changes over time. This seems mnemonic to me, as arrows suggest flow and motion. We often use the language of flowing things even for problems that don’t have anything to do with moving objects or any sense of time.

There’s another part of function-defining that has to be done, though. Most often we’re interested in domains and ranges that are both numbers, or at least collections of numbers. And we want to describe matching something in the domain with something in the range based on a formula. If “x” is a number in the domain then, say, “x^2 – 4x + 4” is the corresponding number in the range.

One way to write down this rule is the way we get in introductory algebra class, and to write something like “f(x) = x^2 – 4x + 4”. The “x” is, here, a dummy variable. We will never care about pinning it down to any particular number. If we write “f(3)” we mean to evaluate whatever’s on the right hand of the equals sign, using 3, the thing in parentheses, wherever “x” appears in the rule definition. In this case that would be the number 3^2 – 4*3 + 4, which, as it happens, is 1. If we write “f(1 – t)” we would evaluate “(1 – t)^2 – 4(1 – t) + 4”, which we might want to leave as is, or might want to simplify somehow. It depends what we’re up to.
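
The same substitution game can be played in Python. The numeric case needs nothing special; for the symbolic f(1 – t) the sympy library is one way to do it, if it happens to be available.

    def f(x):
        return x ** 2 - 4 * x + 4

    f(3)   # 1, matching the arithmetic above

    # For f(1 - t) we need a symbol rather than a number; sympy is one option.
    import sympy
    t = sympy.symbols('t')
    sympy.expand(f(1 - t))   # t**2 + 2*t + 1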

But we can also use an arrow notation, and write the same rule as “f: x → x^2 – 4x + 4”. My feeling is this notation makes it clearer that the definition isn’t itself something to solve, and that the definition doesn’t care what value x is. It should suggest how we can substitute anything for x and should do so throughout the expression to the right of the arrow.
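
Programming languages have a close cousin of this notation in anonymous functions. In Python the same rule reads as a lambda, which likewise never pins x down to any particular number; the comparison is just an analogy, not anything the mathematical notation requires.

    # "x maps to x^2 - 4x + 4", written without ever naming a particular x.
    f = lambda x: x ** 2 - 4 * x + 4
    f(3)   # 1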

Wikipedia asserts that when writing the rule this way there should be a vertical stroke on the left side of the arrow (the “↦” that LaTeX’s \mapsto produces). This is probably a good rule, since “f: D → R” and “f: x → x^2 – 4x + 4” are talking about different things. I’m not sure the rule is consistently followed, though. I suspect that in most contexts it’s clear what is meant.
