## My 2019 Mathematics A To Z: Fourier series

Today’s A To Z term came to me from two nominators. One was @aajohannas, again offering a great topic. Another was Mr Wu, author of the Singapore Maths Tuition blog. I hope neither’s disappointed here.

Fourier series are named for Jean-Baptiste Joseph Fourier, and are maybe the greatest example of the theory that’s brilliantly wrong. Anyone can be wrong about something. There’s genius in being wrong in a way that gives us good new insights into things. Fourier series were developed to understand how the fluid we call “heat” flows through and between objects. Heat is not a fluid. So what? Pretending it’s a fluid gives us good, accurate results. More, you don’t need to use Fourier series to work with a fluid. Or a thing you’re pretending is a fluid. It works for lots of stuff. The Fourier series method challenged assumptions mathematicians had made about how functions worked, how continuity worked, how differential equations worked. These problems could be sorted out. It took a lot of work. It challenged and expanded our ideas of functions.

Fourier also managed to hold political offices in France during the Revolution, the Consulate, the Empire, the Bourbon Restoration, the Hundred Days, and the Second Bourbon Restoration without getting killed for his efforts. If nothing else this shows the depth of his talents. Art by Thomas K Dye, creator of the web comics Projection Edge, Newshounds, Infinity Refugees, and Something Happens. He’s on Twitter as @projectionedge. You can get to read Projection Edge six months early by subscribing to his Patreon.

# Fourier series.

So, how do you solve differential equations? As long as they’re linear? There’s usually something we can do. This is one approach. It works well. It has a bit of a weird setup.

The weirdness of the setup: you want to think of functions as points in space. The allegory is rather close. Think of the common association between a point in space and the coordinates that describe that point. Pretend those are the same thing. Then you can do stuff like add points together. That is, take the coordinates of both points. Add the corresponding coordinates together. Match that sum-of-coordinates to a point. This gives us the “sum” of two points. You can subtract points from one another, again by going through their coordinates. Multiply a point by a constant and get a new point. Find the angle between two points. (This is the angle formed by the line segments connecting the origin and both points.)

Functions can work like this. You can add functions together and get a new function. Subtract one function from another. Multiply a function by a constant. It’s even possible to describe an “angle” between two functions. Mathematicians usually call that the dot product or the inner product. But we will sometimes call two functions “orthogonal”. That means the ordinary everyday meaning of “orthogonal”, if anyone said “orthogonal” in ordinary everyday life.
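Here is a minimal sketch of that idea, assuming we take the inner product of two functions to be the integral of their product over an interval. The names `inner_product` and `angle_between`, the interval from 0 to 2π, and the midpoint-rule integration are all my choices for illustration, not anything the essay fixes.

```python
import math

def inner_product(f, g, a=0.0, b=2 * math.pi, n=10_000):
    """Approximate the integral of f(x) * g(x) over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n))

def angle_between(f, g):
    """Angle between two functions, by analogy with the angle between two vectors."""
    cos_theta = inner_product(f, g) / math.sqrt(
        inner_product(f, f) * inner_product(g, g)
    )
    # Clamp against rounding error before taking the arccosine.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Sine and cosine over one full period come out at a right angle.
print(angle_between(math.sin, math.cos))
```

An inner product of zero is what “orthogonal” means here, the same as it does for two perpendicular arrows.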

We can take equations of a bunch of variables and solve them. Call the values of that solution the coordinates of a point. Then we talk about finding the point where something interesting happens. Or the points where something interesting happens. We can do the same with differential equations. This is finding a point in the space of functions that makes the equation true. Maybe a set of points. So we can find a function or a family of functions solving the differential equation.

You have reasons for skepticism, even if you’ll grant me treating functions as being like points in space. You might remember solving systems of equations. You need as many equations as there are dimensions of space; a two-dimensional space needs two equations. A three-dimensional space needs three equations. You might have worked four equations in four variables. You were threatened with five equations in five variables if you didn’t all settle down. You’re not sure how many dimensions of space “all the possible functions” are. It’s got to be more than the one differential equation we started with.

This is fair. The approach I’m talking about uses the original differential equation, yes. But it breaks it up into a bunch of linear equations. Enough linear equations to match the space of functions. We turn a differential equation into a set of linear equations, a matrix problem, like we know how to solve. So that settles that.

So suppose $f(x)$ solves the differential equation. Here I’m going to pretend that the function has one independent variable. Many functions have more than this. Doesn’t matter. Everything I say here extends into two or three or more independent variables. It takes longer and uses more symbols and we don’t need that. The thing about $f(x)$ is that we don’t know what it is, but would quite like to.

What we’re going to do is choose a reference set of functions that we do know. Let me call them $g_0(x), g_1(x), g_2(x), g_3(x), \cdots$ going on to however many we need. It can be infinitely many. It certainly is at least up to some $g_N(x)$ for some big enough whole number N. These are a set of “basis functions”. For any function we want to represent we can find a bunch of constants, called coefficients. Let me use $a_0, a_1, a_2, a_3, \cdots$ to represent them. Any function we want is the sum of the coefficient times the matching basis function. That is, there’s some coefficients so that $f(x) = a_0\cdot g_0(x) + a_1\cdot g_1(x) + a_2\cdot g_2(x) + a_3\cdot g_3(x) + \cdots$

is true. That summation goes on until we run out of basis functions. Or it runs on forever. This is a great way to solve linear differential equations. This is because we know the basis functions. We know everything we care to know about them. We know their derivatives. We know everything on the right-hand side except the coefficients. The coefficients matching any particular function are constants. So the derivatives of $f(x)$, written as the sum of coefficients times basis functions, are easy to work with. If we need second or third or more derivatives? That’s no harder to work with.
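Written out, and assuming the series is one we may differentiate term by term (always true for a finite sum, and true for well-behaved infinite ones), the derivatives look like this:

$$
\begin{aligned}
f(x) &= a_0 g_0(x) + a_1 g_1(x) + a_2 g_2(x) + \cdots \\
f'(x) &= a_0 g_0'(x) + a_1 g_1'(x) + a_2 g_2'(x) + \cdots \\
f''(x) &= a_0 g_0''(x) + a_1 g_1''(x) + a_2 g_2''(x) + \cdots
\end{aligned}
$$

The coefficients ride along untouched. All the differentiating happens to functions we already know.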

You may know something about matrix equations. That is that solving them takes freaking forever. The bigger the equation, the more forever. If you have to solve eight equations in eight unknowns? If you start now, you might finish in your lifetime. For this function space? We need dozens, hundreds, maybe thousands of equations and as many unknowns. Maybe infinitely many. So we seem to have a solution that’s great apart from how we can’t use it.

Except. What if the equations we have to solve are all easy? If we have to solve a bunch that looks like, oh, $2a_0 = 4$ and $3a_1 = -9$ and $2a_2 = 10$ … well, that’ll take some time, yes. But not forever. Great idea. Is there any way to guarantee that?

It’s in the basis functions. If we pick functions that are orthogonal, or are almost orthogonal, to each other? Then we can turn the differential equation into an easy matrix problem. Not as easy as in the last paragraph. But still, not hard.
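To see why the easiest case is so cheap, here is a sketch of it. When every equation involves only one unknown, the matrix is diagonal, and “solving” is one division per equation. The function name `solve_diagonal` is mine, and the numbers are the toy equations from a couple of paragraphs back.

```python
def solve_diagonal(diagonal, rhs):
    """Solve d_k * a_k = r_k for each k; no equation involves any other unknown."""
    return [r / d for d, r in zip(diagonal, rhs)]

# 2*a_0 = 4, 3*a_1 = -9, 2*a_2 = 10
print(solve_diagonal([2, 3, 2], [4, -9, 10]))  # [2.0, -3.0, 5.0]
```

A thousand equations like this take a thousand divisions, where a general system of that size takes vastly more work.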

So what’s a good set of basis functions?

And here, about 800 words later than everyone was expecting, let me introduce the sine and cosine functions. Sines and cosines make great basis functions. They don’t grow without bounds. They don’t dwindle to nothing. They’re easy to differentiate. They’re easy to integrate, which is really special. Most functions are hard to integrate. We even know what they look like. They’re waves. Some have long wavelengths, some short wavelengths. But waves. And … well, it’s easy to make sets of them orthogonal.

We have to set some rules. The first is that each of these sine and cosine basis functions has a period. That is, after some time (or distance), they repeat. They might repeat before that. Most of them do, in fact. But we’re guaranteed a repeat after no longer than some period. Call that period ‘L’.

Each of these sine and cosine basis functions has to have a whole number of complete oscillations within the period L. So we can say something about the sine and cosine functions. They have to look like these: $s_j(x) = \sin\left(\frac{2\pi j}{L} x\right)$ $c_k(x) = \cos\left(\frac{2\pi k}{L} x\right)$

Here ‘j’ and ‘k’ are some whole numbers. I have two sets of basis functions at work here. Don’t let that throw you. We could have labelled them all as $g_k(x)$, with some clever scheme that told us for a given k whether it represents a sine or a cosine. It’s less hard work if we have s’s and c’s. And if we have coefficients of both a’s and b’s. That is, we suppose the function $f(x)$ is: $f(x) = \frac{1}{2}a_0 + b_1 s_1(x) + a_1 c_1(x) + b_2 s_2(x) + a_2 c_2(x) + b_3 s_3(x) + a_3 c_3(x) + \cdots$

This, at last, is the Fourier series. Each function has its own series. A “series” is a summation. It can be of finitely many terms. It can be of infinitely many. Often infinitely many terms give more interesting stuff. Like this, for example. Oh, and there’s a bare $\frac{1}{2}a_0$ there, not multiplied by anything more complicated. It makes life easier. It lets us see that the Fourier series for, like, 3 + f(x) is the same as the Fourier series for f(x), except for the leading term. The ½ before that makes easier some work that’s outside the scope of this essay. Accept it as one of the merry, wondrous appearances of ‘2’ in mathematics expressions.
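The essay doesn’t say how to find the coefficients, so take this as a sketch under an assumption: the standard formulas $a_k = \frac{2}{L}\int_0^L f(x) \cos\left(\frac{2\pi k}{L}x\right) dx$ and $b_k = \frac{2}{L}\int_0^L f(x) \sin\left(\frac{2\pi k}{L}x\right) dx$, which are what orthogonality hands us. The function names and the midpoint-rule integration are my own choices.

```python
import math

def fourier_coefficients(f, period, n_terms, samples=4096):
    """Return lists (a, b) holding a_k and b_k for k = 0 .. n_terms."""
    h = period / samples
    xs = [(i + 0.5) * h for i in range(samples)]  # midpoints of each slice
    a, b = [], []
    for k in range(n_terms + 1):
        w = 2 * math.pi * k / period
        a.append(2 / period * h * sum(f(x) * math.cos(w * x) for x in xs))
        b.append(2 / period * h * sum(f(x) * math.sin(w * x) for x in xs))
    return a, b

def fourier_partial_sum(a, b, period, x):
    """Evaluate a_0 / 2 plus the sine and cosine terms at the point x."""
    total = a[0] / 2
    for k in range(1, len(a)):
        w = 2 * math.pi * k / period
        total += a[k] * math.cos(w * x) + b[k] * math.sin(w * x)
    return total

# A test function built from known pieces, with period L = 4.
L = 4.0
def example(x):
    return 3 + 2 * math.sin(2 * math.pi * x / L) - 5 * math.cos(2 * math.pi * 2 * x / L)

a, b = fourier_coefficients(example, L, n_terms=3)
print([round(v, 6) for v in a], [round(v, 6) for v in b])
```

The constant 3 shows up only in the leading term, as $a_0 = 6$ so that $\frac{1}{2}a_0 = 3$. The other coefficients are the ones we built the function from.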

It’s great for solving differential equations. It’s also great for compression. The sines and the cosines are standard functions, after all. We can send all the information we need to reconstruct a function by sending the coefficients for it. This can also help us pick out signal from noise. Noise has a Fourier series that looks a particular way. If you take the coefficients for a noisy signal and remove that? You can get a good approximation of the original, noiseless, signal.

This all seems great. That’s a good time to feel skeptical. First, like, not everything we want to work with looks like waves. Suppose we need a function that looks like a parabola. It’s silly to think we can add a bunch of sines and cosines and get a parabola. Like, a parabola isn’t periodic, to start with.

So it’s not. To use Fourier series methods on something that’s not periodic, we use a clever technique: we tell a fib. We declare that the period is something bigger than we care about. Say the period is, oh, ten million years long. A hundred light-years wide. Whatever. We trust that the difference between the function we do want, and the function that we calculate, will be small. We trust that if someone ten million years from now and a hundred light-years away wishes to complain about our work, we will be out of the office that day. Letting the period L be big enough is a good reliable tool.

The other thing? Can we approximate any function as a Fourier series? Like, at least chunks of parabolas? Polynomials? Chunks of exponential growths or decays? What about sawtooth functions, that rise and fall? What about step functions, that are constant for a while and then jump up or down?

The answer to all these questions is “yes,” although drawing out the word and raising a finger to say there are some issues we have to deal with. One issue is that most of the time, we need an infinitely long series to represent a function perfectly. This is fine if we’re trying to prove things about functions in general rather than solve some specific problem. It’s no harder to write the sum of infinitely many terms than the sum of finitely many terms. You write an ∞ symbol instead of an N in some important places. But if we want to solve specific problems? We probably want to deal with finitely many terms. (I hedge that statement on purpose. Sometimes it turns out we can find a formula for all the infinitely many coefficients.) This will usually give us an approximation of the $f(x)$ we want. The approximation can be as good as we want, but to get a better approximation we need more terms. Fair enough. This kind of tradeoff doesn’t seem too weird.
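A chunk of parabola makes a handy test, since it’s one of the cases where a formula for every coefficient is known. This sketch assumes $f(x) = x^2$ on the interval from $-\pi$ to $\pi$ (so the period L is $2\pi$), and uses the standard result $x^2 = \frac{\pi^2}{3} + 4\sum_{n \ge 1} \frac{(-1)^n}{n^2}\cos(nx)$. The function names are mine.

```python
import math

def parabola_partial_sum(x, n_terms):
    """Partial Fourier sum for x**2 on [-pi, pi], using the closed-form coefficients."""
    total = math.pi ** 2 / 3
    for n in range(1, n_terms + 1):
        total += 4 * (-1) ** n * math.cos(n * x) / n ** 2
    return total

def worst_error(n_terms, grid=400):
    """Largest gap between the partial sum and x**2 on an even grid over [-pi, pi]."""
    xs = [-math.pi + 2 * math.pi * i / grid for i in range(grid + 1)]
    return max(abs(parabola_partial_sum(x, n_terms) - x * x) for x in xs)

for n in (5, 50, 500):
    print(n, worst_error(n))
```

More terms, smaller worst-case error. The worst spot is at the ends of the interval, where the periodic copies of the parabola meet in a corner.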

Another issue is in discontinuities. If $f(x)$ jumps around? If it has some point where it’s undefined? If it has corners? Then the Fourier series has problems. Summing up sines and cosines can’t give us a sudden jump or a gap or anything. Near a discontinuity, the Fourier series will get this high-frequency wobble. A bigger jump, a bigger wobble. You may not blame the series for not representing a discontinuity. But it does mean that what is, otherwise, a pretty good match for the $f(x)$ you want gets this region where it stops being so good a match.
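That wobble has a name, the Gibbs phenomenon, and you can watch it. This sketch assumes the square wave that is $-1$ on $(-\pi, 0)$ and $+1$ on $(0, \pi)$, whose standard series is $\frac{4}{\pi}\sum_{k \ge 1} \frac{\sin((2k-1)x)}{2k-1}$. The function names and the grid are my choices.

```python
import math

def square_wave_partial_sum(x, n_terms):
    """Partial Fourier sum of the -1 / +1 square wave, using n_terms sine terms."""
    return 4 / math.pi * sum(
        math.sin((2 * k - 1) * x) / (2 * k - 1) for k in range(1, n_terms + 1)
    )

def peak_value(n_terms, grid=2000):
    """Largest value of the partial sum on a grid over (0, pi/2]."""
    xs = [i * (math.pi / 2) / grid for i in range(1, grid + 1)]
    return max(square_wave_partial_sum(x, n_terms) for x in xs)

for n in (10, 100):
    print(n, round(peak_value(n), 3))
```

The true function never exceeds 1, but the peak sits near 1.18 whether we use ten terms or a hundred. Adding terms squeezes the wobble closer to the jump without shrinking it. The overshoot is about nine percent of the size of the jump, which is why a bigger jump means a bigger wobble.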

That’s all right. These issues aren’t bad enough, or unpredictable enough, to keep Fourier series from being powerful tools. Even when we find problems for which sines and cosines are poor fits, we use this same approach. Describe a function we would like to know as the sums of functions we choose to work with. Fourier series are one of those ideas that helps us solve problems, and guides us to new ways to solve problems.

This is my last big essay for the week. All of the Fall 2019 A To Z posts should be at this link. The letter G should get its chance on Tuesday and H next Thursday. I intend to have all the A To Z essays available at this link. If you’d like to nominate topics for essays, I’m asking for the letters I through N at this link. Thank you.

## Reading the Comics, July 8, 2016: Filling Out The Week Edition

When I split last week’s mathematically-themed comics I had just supposed there’d be some more on Friday to note. Live and learn, huh? Well, let me close out last week with a not-too-long essay. Better a couple of these than a few Reading the Comics posts long enough to break your foot on.

Adrian Raeside’s The Other Coast for the 6th uses mathematics as a way to judge the fit and the unfit. (And Daryl isn’t even far wrong.) It’s understandable and the sort of thing people figure should flatter mathematicians. But it also plays on 19th-century social-Darwinist/eugenicist ideas which try binding together mental acuity and evolutionary “superiority”. It’s a cute joke but there is a nasty undercurrent.

Wayno’s Waynovision for the 6th is this essay’s pie chart. Good to have.

Hilary Price’s Rhymes With Orange for the 7th of July, 2016. I don’t know how valid it is; I don’t use the bar stools, myself.

Hilary Price’s Rhymes With Orange for the 7th is this essay’s Venn Diagram joke. Good to have.

Rich Powell’s Wide Open for the 7th shows a Western-style “Convolution Kid”. It’s shown here as just shouting numbers in-between a count so as to mess things up. That matches the ordinary definition and I’m amused with it as-is. Convolution is a good mathematical function, though one I don’t remember encountering until a couple years into my undergraduate career. It’s a binary operation, one that takes two functions and combines them into a new function. It turns out to be a natural way to understand signal processing. The original signal is one function. The way a processor changes a signal is another function. The convolution of the two is what actually comes out of the processing. Dividing this lets us study the behaviors of the processor separate from a particular problem.
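Here is a sketch of the discrete version of the idea, for lists of samples rather than functions of a continuous variable. That swap is my simplification. The function name `convolve` and the sample numbers are made up for illustration.

```python
def convolve(signal, kernel):
    """Discrete convolution: each output is a kernel-weighted sum of nearby inputs."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

signal = [1, 2, 3]        # the original signal
processor = [0.5, 0.5]    # a processor that averages each sample with the one before
print(convolve(signal, processor))  # [0.5, 1.5, 2.5, 1.5]
```

The `processor` list knows nothing about the signal. That’s the separation that lets us study one without the other.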

And it turns up in other contexts. We can use convolution to solve differential equations, which turn up everywhere. We need to solve the differential equation for a special particular boundary condition, one called the Dirac delta function. That’s a really weird one. You have no idea. And it can require incredible ingenuity to find a solution. But once you have, you can find solutions for every boundary condition. You convolute the solution for the special case and the boundary condition you’re interested in, and there you go. The work may be particularly hard for this one case, but it is only the one case.

Daniel Beyer’s Long Story Short for the 9th of July, 2016. The link’s probably good for a month or so. If you’re in the far future don’t worry about telling me how the link turned out, though. It’s not that important that I know.

Daniel Beyer’s Long Story Short for the 9th is this essay’s mathematical symbols joke. Good to have.

## Some More Stuff To Read

I’ve actually got enough comics for yet another Reading The Comics post. But rather than overload my Recent Posts display with those I’ll share some pointers to other stuff I think worth looking at.

So remember how the other day I said polynomials were everything? And I tried to give some examples of things you might not expect had polynomials tied to them? Here’s one I forgot. Howard Phillips, of the HowardAt58 blog, wrote recently about discrete signal processing, the struggle to separate real patterns from random noise. It’s a hard problem. If you do very little filtering, then meaningless flutterings can look like growing trends. If you do a lot of filtering, then you miss rare yet significant events and you take a long time to detect changes. Either can be mistakes. The study of a filter’s characteristics … well, you’ll see polynomials. A lot.

For something else to read, and one that doesn’t get into polynomials, here’s a post from Stephen Cavadino of the CavMaths blog, about the areas of lunes. Lunes are … well, they’re kind of moon-shaped figures. Cavadino particularly writes about the Theorem of Not That Hippocrates. Start with a half circle. Draw a symmetric right triangle inside the circle. Draw half-circles off the two equal legs of that right triangle. The area between the original half-circle and the newly-drawn half circles is … how much? The answer may surprise you.

Cavadino doesn’t get into this, but: it’s possible to make a square that has the same area as these strange crescent shapes using only straightedge and compass. Not That Hippocrates knew this. It’s impossible to make a square with the exact same area as a circle using only straightedge and compass. But these figures, with edges that are defined by circles of just the right relative shapes, they’re fine. Isn’t that wondrous?

And this isn’t mathematics but what the heck. Have you been worried about the Chandler Wobble? Apparently there’s been a bit of a breakthrough in understanding it. Turns out water melting can change the Earth’s rotation enough to be noticed. And to have been noticed since the 1890s.

## Reading the Comics, February 2, 2016: Pre-Lottery Edition

So a couple weeks ago one of the multi-state lotteries in the United States reached a staggering jackpot of one and a half billion dollars. And it turns out that “a couple weeks” is about the lead time most syndicated comic strip artists maintain. So there’s a rash of lottery-themed comic strips. There’s enough of them that I’m going to push those off to the next Reading the Comics installment. I’ll make do here with what Comic Strip Master Command sent us before thoughts of the lottery infiltrated folks’ heads.

Bud Blake’s Tiger for the 28th of January, 2016. I do like Punkinhead’s look of dismay in the second panel that Tiger has failed him.

Bud Blake’s Tiger for the 28th of January (a rerun; Blake’s been dead a long while) is a cute one about kids not understanding numbers. And about expectations of those who know more than you, I suppose. I’d say this is my favorite of this essay’s strips. Part of that is that it reminds me of a bit in one of the lesser Wizard of Oz books. In it the characters have to count by twos to seventeen to make a successful wish. That’s the sort of problem you expect in fairy lands and quick gags.

Mort Walker’s Beetle Bailey (Vintage) from the 7th of July, 1959 (reprinted the 28th of January) also tickles me. It uses the understanding of mathematics as stand-in for the understanding of science. I imagine it’s also meant to stand in for intelligence. It’s also a good riff on the Sisyphean nature of teaching. The equations on the board at the end almost look meaningful. At least, I can see some resemblance between them and the equations describing orbital mechanics. Camp Swampy hasn’t got any obvious purpose or role today. But the vintage strips reveal it had some role in orbital rocket launches. This was in the late 50s, before orbital rockets worked.

Mort Walker’s Beetle Bailey (Vintage) for the 7th of July, 1959. This is possibly the brightest I’ve ever seen Beetle, and he doesn’t know what he’s looking at.

Matt Lubchansky’s Please Listen To Me for the 28th of January is a riff on creationist “teach the controversy” nonsense. So we get some nonsense about a theological theory of numbers. Historically, especially in the western tradition, much great mathematics was done by theologians. Lazy histories of science make out religion as the relentless antagonist to scientific knowledge. It’s not so.

The equation from the last panel, $F(s) = \mathcal{L}\left\{f(t)\right\} = \int_0^{\infty} e^{-st} f(t) dt$, is a legitimate one. It describes the Laplace Transform of the function f(t). It’s named for Pierre-Simon Laplace. That name might be familiar from mathematical physics, astronomy, the “nebular” hypothesis of planet formation, probability, and so on. Laplace transforms have many uses. One is in solving differential equations. They can change a differential equation, hard to solve, to a polynomial, easy to solve. Then by inverting the Laplace transform you can solve the original, hard, differential equation.
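A small worked case, of my own choosing, shows the trade. Write $Y(s)$ for the Laplace transform of $y(t)$, and use the standard rule that the transform of $y'(t)$ is $sY(s) - y(0)$:

$$
\begin{aligned}
y'(t) + 2y(t) &= 0, \qquad y(0) = 3 \\
sY(s) - 3 + 2Y(s) &= 0 \\
Y(s) &= \frac{3}{s + 2} \\
y(t) &= 3e^{-2t}
\end{aligned}
$$

The middle two lines are plain algebra. The last step uses the fact that the transform of $e^{-2t}$ is $\frac{1}{s + 2}$, read backwards.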

Another major use that I’m familiar with is signal processing. Often we will have some data, a signal, that changes in time or in space. The Laplace transform lets us look at the frequency distribution. That is, what regularly rising and falling patterns go into making up the signal (or could)? If you’ve taken a bit of differential equations this might sound like it’s just Fourier series. It’s related. (If you don’t know what a Fourier series might be, don’t worry. I bet we’ll come around to discussing it someday.) It might also remind readers here of the z-transform and yes, there’s a relationship.

The transform also shows itself in probability. We’re often interested in the probability distribution of a quantity. That’s what the possible values it might have are, and how likely each of those values is. The Laplace transform lets us switch between the probability distribution and a thing called the moment-generating function. I’m not sure of an efficient way of describing what good that is. If you do, please, leave a comment. But it lets you switch from one description of a thing to another. And your problem might be easier in the other description.

John McPherson’s Close To Home for the 30th of January uses mathematics as the sort of thing that can have an answer just, well, you see it. I suppose only geography would lend itself to a joke like this (“What state is Des Moines in?”)

Scott Adams’s Dilbert for the 31st of January mentions Zeno’s Paradox, roughly two and a half thousand years old and still going strong. I haven’t heard the paradox used as an excuse to put off doing work. It does remind me of the old saw that half your time is spent on the first 90 percent of the project, and half your time on the remaining 10 percent. It’s absurd but truthful, as so many things are.

Samson’s Dark Side Of The Horse for the 2nd of February (I’m skipping some lottery strips to get here) plays on the merger of the ideas of “turn my life completely around” and “turn around 360 degrees”. A perfect 360 degree rotation would be an “identity transformation”, leaving the thing it’s done to unchanged. But I understand why the terms merged. As with many English words or terms, “all the way around” can mean opposite things.

But anyone playing pinball or taking time-lapse photographs or just listening to Heraclitus can tell you. Turning all the way around does not leave you quite what you were before. People aren’t perfect at rotations, and even if they were, the act of breaking focus and coming back to it changes what one’s doing.

## z-transform.

The z-transform comes to us from signal processing. The signal we take to be a sequence of numbers, all representing something sampled at uniformly spaced times. The temperature at noon. The power being used, second-by-second. The number of customers in the store, once a month. Anything. The sequence of numbers we take to stretch back into the infinitely great past, and to stretch forward into the infinitely distant future. If it doesn’t, then we pad the sequence with zeroes, or some other safe number that we know means “nothing”. (That’s another classic mathematician’s trick.)

It’s convenient to have a name for this sequence. “a” is a good one. The different sampled values are denoted by an index. $a_0$ represents whatever value we have at the “start” of the sample. That might represent the present. That might represent where sampling began. That might represent just some convenient reference point. It’s the equivalent of mileage marker zero; we have to have something be the start.

$a_1$, $a_2$, $a_3$, and so on are the first, second, third, and so on samples after the reference start. $a_{-1}$, $a_{-2}$, $a_{-3}$, and so on are the first, second, third, and so on samples from before the reference start. That might be the last couple of values before the present.

So for example, suppose the temperatures the last several days were 77, 81, 84, 81, 78. Then we would probably represent this as $a_{-4} = 77$, $a_{-3} = 81$, $a_{-2} = 84$, $a_{-1} = 81$, $a_0 = 78$. We’ll hope this is Fahrenheit or that we are remotely sensing a temperature.

The z-transform of a sequence of numbers is something that looks a lot like a polynomial, based on these numbers. For this five-day temperature sequence the z-transform would be the polynomial $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0$. ($z^1$ is the same as z. $z^0$ is the same as the number “1”. I wrote it this way to make the pattern more clear.)

I would not be surprised if you protested that this doesn’t merely look like a polynomial but actually is one. You’re right, of course, for this set, where all our samples are from negative (and zero) indices. If we had positive indices then we’d lose the right to call the transform a polynomial. Suppose we trust our weather forecaster completely, and add in $a_1 = 83$ and $a_2 = 76$. Then the z-transform for this set of data would be $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$. You’d probably agree that’s not a polynomial, although it looks a lot like one.
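The pattern is that the sample with index n gets multiplied by $z^{-n}$. Here is a sketch of that for a finite list of samples, using the coefficients from the expression above. The function name `z_transform` and the choice of a dictionary from index to value are mine.

```python
def z_transform(samples, z):
    """Evaluate the sum of a_n * z**(-n) for a finite sequence given as {index: value}."""
    return sum(value * z ** (-index) for index, value in samples.items())

temperatures = {-4: 77, -3: 81, -2: 84, -1: 81, 0: 78, 1: 83, 2: 76}

print(z_transform(temperatures, 1))   # at z = 1 this is just the sum of the samples
print(z_transform(temperatures, 2))
print(z_transform(temperatures, 1j))  # complex values of z work the same way
```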

The use of z for these polynomials is basically arbitrary. The main reason to use z instead of x is that we can learn interesting things if we imagine letting z be a complex-valued number. And z carries connotations of “a possibly complex-valued number”, especially if it’s used in ways that suggest we aren’t looking at coordinates in space. It’s not that there’s anything in the symbol x that refuses the possibility of it being complex-valued. It’s just that z appears so often in the study of complex-valued numbers that it reminds a mathematician to think of them.

A sound question you might have is: why do this? And there’s not much advantage in going from a list of temperatures “77, 81, 84, 81, 78, 83, 76” over to a polynomial-like expression $77 z^4 + 81 z^3 + 84 z^2 + 81 z^1 + 78 z^0 + 83 \left(\frac{1}{z}\right)^1 + 76 \left(\frac{1}{z}\right)^2$.

Where this starts to get useful is when we have an infinitely long sequence of numbers to work with. Yes, it does too. It will often turn out that an interesting sequence transforms into a polynomial that itself is equivalent to some easy-to-work-with function. My little temperature example there won’t do it, no. But consider the sequence that’s zero for all negative indices, and 1 for the zero index and all positive indices. This gives us the polynomial-like structure $\cdots + 0z^2 + 0z^1 + 1 + 1\left(\frac{1}{z}\right)^1 + 1\left(\frac{1}{z}\right)^2 + 1\left(\frac{1}{z}\right)^3 + 1\left(\frac{1}{z}\right)^4 + \cdots$. And that turns out to be the same as $1 \div \left(1 - \left(\frac{1}{z}\right)\right)$. That’s much shorter to write down, at least.
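You can check that claim numerically. It’s a geometric series, so this sketch assumes z is bigger than 1 in size; otherwise the sum doesn’t settle down. The function names are mine.

```python
def step_partial_sum(z, n_terms):
    """Sum of (1/z)**n for n = 0 .. n_terms - 1: the nonzero part of the series."""
    return sum((1 / z) ** n for n in range(n_terms))

def closed_form(z):
    """The short expression the infinite series is equivalent to."""
    return 1 / (1 - 1 / z)

for z in (2, 1.5, 1 + 1j):
    print(z, step_partial_sum(z, 200), closed_form(z))
```

Two hundred terms and the short formula agree to about as many digits as the computer keeps.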

Probably you’ll grant that, but still wonder what the point of doing that is. Remember that we started by thinking of signal processing. A processed signal is a matter of transforming your initial signal. By this we mean multiplying your original signal by something, or adding something to it. For example, suppose we want a five-day running average temperature. This we can find by taking one-fifth today’s temperature, $a_0$, and adding to that one-fifth of yesterday’s temperature, $a_{-1}$, and one-fifth of the day before’s temperature $a_{-2}$, and one-fifth $a_{-3}$, and one-fifth $a_{-4}$.
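As a sketch, with a function name of my own and the seven temperatures from the z-transform expression earlier (forecast included):

```python
def running_average(samples, window=5):
    """Average each sample with the window - 1 samples before it."""
    return [
        sum(samples[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(samples))
    ]

temperatures = [77, 81, 84, 81, 78, 83, 76]
print(running_average(temperatures))
```

Seven samples give three averages, since the first four days don’t have enough history behind them.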

The effect of processing a signal is equivalent to manipulating its z-transform. By studying properties of the z-transform, such as where its values are zero or where they are imaginary or where they are undefined, we learn things about what the processing is like. We can tell whether the processing is stable — does it keep a small error in the original signal small, or does it magnify it? Does it serve to amplify parts of the signal and not others? Does it dampen unwanted parts of the signal while keeping the main intact?
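The five-day running average makes a small test case. Its own z-transform, under the standard convention that a weight applied k samples back contributes $z^{-k}$, is $\frac{1}{5}\left(1 + z^{-1} + z^{-2} + z^{-3} + z^{-4}\right)$. The sketch below, with a function name I made up, looks at where that is zero.

```python
import cmath

def moving_average_response(z, window=5):
    """z-transform of the filter weighting the current sample and the window - 1 before it equally."""
    return sum(z ** (-k) for k in range(window)) / window

steady = 1                                        # a signal that never changes
five_day_cycle = cmath.exp(2j * cmath.pi / 5)     # a signal that repeats every 5 samples

print(abs(moving_average_response(steady)))          # close to 1: passed through untouched
print(abs(moving_average_response(five_day_cycle)))  # close to 0: wiped out entirely
```

A steady signal passes through unchanged, and anything repeating exactly every five days vanishes. That’s what you’d hope a five-day average does, and the zeroes of the z-transform say so without our running a single day of data through it.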

We can understand how data will be changed by understanding the z-transform of the way we manipulate it. That z-transform turns a signal-processing idea into a complex-valued function. And we have a lot of tools for studying complex-valued functions. So we become able to say a lot about the processing. And that is what the z-transform gets us.