## My All 2020 Mathematics A to Z: Jacobi Polynomials

Mr Wu, author of the Singapore Maths Tuition blog, gave me a good nomination for this week’s topic: the j-function of number theory. Unfortunately I concluded I didn’t understand the function well enough to write about it. So I went to a topic of my own choosing instead.

The Jacobi Polynomials discussed here are named for Carl Gustav Jacob Jacobi. Jacobi lived in Prussia in the first half of the 19th century. Though his career was short, it was influential. I’ve already discussed the Jacobian, which describes how changes of variables change volume. He has a host of other things named for him, most of them in matrices or mathematical physics. He was also a pioneer in those elliptic curves you hear so much about these days.

# Jacobi Polynomials.

Jacobi Polynomials are a family of functions. Polynomials, it happens; this is a happy case where the name makes sense. “Family” is the name mathematicians give to a bunch of functions that have some similarity. This often means there’s a parameter, and each possible value of the parameter describes a different function in the family. For example, we talk about the family of sine functions, $S_n(z)$. For every integer n we have the function $S_n(z) = \sin(n z)$ where z is a real number between -π and π.

We like a family because every function in it gives us some nice property. Often, the functions play nice together, too. This is often something like mutual orthogonality. This means two different representatives of the family are orthogonal to one another. “Orthogonal” means “perpendicular”. We can talk about functions being perpendicular to one another through a neat mechanism. It comes from vectors. It’s easy to use vectors to represent how to get from one point in space to another. From vectors we define a dot product, a way of multiplying them together. A dot product has to meet a couple rules that are pretty easy to do. And if you don’t do anything weird? Then the dot product between two vectors is the product of their lengths times the cosine of the angle made by the end of the first vector, the origin, and the end of the second vector.

Functions, it turns out, meet all the rules for a vector space. (There are not many rules to make a vector space.) And we can define something that works like a dot product for two functions. Take the integral, over the whole domain, of the first function times the second. This meets all the rules for a dot product. (There are not many rules to make a dot product.) Did you notice me palm that card? When I did not say “the dot product is take the integral …”? That card will come back. That’s for later. For now: we have a vector space, we have a dot product, we can take arc-cosines, so why not define the angle between functions?
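If you’d like to watch that happen, here is a minimal sketch using the sine family from earlier. It assumes a plain midpoint-rule integral is good enough, and the names `inner_product`, `S`, and `angle_between` are mine, made up for the illustration.

```python
import math

def inner_product(f, g, a=-math.pi, b=math.pi, steps=20000):
    """Approximate the integral of f(z)*g(z) over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(steps))

def S(n):
    """The sine family from earlier: S_n(z) = sin(n z)."""
    return lambda z: math.sin(n * z)

def angle_between(f, g):
    """Angle (radians) between two functions, by analogy with vectors."""
    cos_angle = inner_product(f, g) / math.sqrt(inner_product(f, f) * inner_product(g, g))
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp against rounding

print(inner_product(S(1), S(2)))   # close to 0: orthogonal
print(inner_product(S(2), S(2)))   # close to pi
print(angle_between(S(1), S(2)))   # close to pi/2, a right angle
```

The arc-cosine step divides by the two “lengths” first, the same way you would for ordinary vectors.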

Mostly we don’t because we don’t care. Where we do care? We do like functions that are at right angles to one another. As with most things mathematicians do, it’s because it makes life easier. We’ll often want to describe properties of a function we don’t yet know. We can describe the function we don’t yet know as the sum of coefficients — some fixed real number — times basis functions that we do know. And then our problem of finding the function changes to one of finding the coefficients. If we picked a set of basis functions that are all orthogonal to one another, the finding of these coefficients gets easier. Analytically and numerically: we can often turn each coefficient into its own separate problem. Let a different computer, or at least computer process, work on each coefficient and get the full answer much faster.
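Here is roughly why orthogonality splits the work up, as a sketch. I’m assuming the unknown function is $f = a_0 g_0 + a_1 g_1 + a_2 g_2 + \cdots$ and writing $\langle f, g \rangle$ for the dot product of two functions; that bracket notation is my choice for this illustration. Take the dot product of both sides with one basis function $g_j$. Every term with a different basis function vanishes, leaving

$\langle f, g_j \rangle = \sum_k a_k \langle g_k, g_j \rangle = a_j \langle g_j, g_j \rangle$

and so

$a_j = \frac{\langle f, g_j \rangle}{\langle g_j, g_j \rangle}$

Each coefficient comes from its own calculation, with no other coefficient in sight.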

The Jacobi Polynomials have three coefficients. I see them most often labelled α, β, and n. Likely you imagine this means it’s a huge family. It is huger than that. A zoologist would call this a superfamily, at least. Probably an order, possibly a class.

It turns out different relationships of these coefficients give you families of functions. Many of these families are noteworthy enough to have their own names. For example, if α and β are both zero, then the Jacobi functions are a family also known as the Legendre Polynomials. This is a great set of orthogonal polynomials. And the roots of the Legendre Polynomials give you information needed for Gaussian quadrature. Gaussian quadrature is a neat trick for numerically integrating a function. Take a weighted sum of the function you’re integrating evaluated at a set of points. This can get a very good — maybe even perfect — numerical estimate of the integral. The points to use, and the weights to use, come from a Legendre polynomial.
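To make the weighted sum concrete, here is a small sketch of the three-point rule. The points are the roots of the Legendre polynomial $P_3(z) = \frac{1}{2}(5z^3 - 3z)$ and the weights are the standard ones that go with them; I’ve typed both in by hand instead of computing them, and the name `gauss3` is made up for the example.

```python
import math

# Nodes are the roots of the Legendre polynomial P_3(z) = (5z^3 - 3z)/2;
# the weights are the standard ones that go with those three nodes.
NODES = (-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5))
WEIGHTS = (5 / 9, 8 / 9, 5 / 9)

def gauss3(f):
    """Three-point Gauss-Legendre estimate of the integral of f over [-1, 1]."""
    return sum(w * f(z) for w, z in zip(WEIGHTS, NODES))

print(gauss3(lambda z: z**4))   # exact answer is 2/5
print(gauss3(math.exp))         # exact answer is e - 1/e
```

Three evaluations are enough to integrate any polynomial up to degree five perfectly, which is the “maybe even perfect” case. For the exponential it’s an estimate, though a good one.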

If α and β are both $-\frac{1}{2}$ then the Jacobi Polynomials are the Chebyshev Polynomials of the first kind. (There’s also a second kind.) These are handy in approximation theory, describing ways to better interpolate a polynomial from a set of data. They also have a neat, peculiar relationship to the multiple-cosine formulas. Like, $\cos(2\theta) = 2\cos^2(\theta) - 1$. And the second Chebyshev polynomial is $T_2(x) = 2x^2 - 1$. Imagine sliding between x and $\cos(\theta)$ and you see the relationship. $\cos(3\theta) = 4 \cos^3(\theta) - 3\cos(\theta)$ and $T_3(x) = 4x^3 - 3x$. And so on.
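The “and so on” can be checked by machine. This sketch builds the polynomials with the usual recurrence $T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)$, which I’m bringing in from outside this essay, and compares them with cosines of multiple angles. The function name `chebyshev_T` and the test angle are arbitrary choices.

```python
import math

def chebyshev_T(n, x):
    """Evaluate T_n(x) with the recurrence T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x        # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

theta = 0.7
for n in range(6):
    # The two printed columns should agree: T_n(cos(theta)) and cos(n * theta).
    print(n, chebyshev_T(n, math.cos(theta)), math.cos(n * theta))
```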

Chebyshev Polynomials have some superpowers. One that’s most amazing is accelerating convergence. Often a numerical process, such as finding the solution of an equation, is an iterative process. You can’t find the answer all at once. You instead find an approximation and do something that improves it. Each time you do the process, you get a little closer to the true answer. This can be fine. But, if the problem you’re working on allows it, you can use the first couple iterations of the solution to figure out where this is going. The result is that you can get very good answers using the same amount of computer time you needed to just get decent answers. The trade, of course, is that you need to understand Chebyshev Polynomials and accelerated convergence. We always have to make trades like that.

Back to the Jacobi Polynomials family. If α and β are the same number, then the Jacobi functions are a family called the Gegenbauer Polynomials. These are great in mathematical physics, in potential theory. You can turn the gravitational or electrical potential function — that one-over-the-distance-squared force — into a sum of better-behaved functions. And they also describe zonal spherical harmonics. These let you represent functions on the surface of a sphere as the sum of coefficients times basis functions. They work in much the way the terms of a Fourier series do.

If β is zero and there’s a particular relationship between α and n that I don’t want to get into? The Jacobi Polynomials become the Zernike Polynomials, which I never heard of before this paragraph either. I read they are the tools you need to understand optics, and particularly how lenses will alter the light passing through.

Since the Jacobi Polynomials have a greater variety of form than even poison ivy has, you’ll forgive me not trying to list them. Or even listing a representative sample. You might also ask how they’re related at all.

Well, they all solve the same differential equation, for one. Not literally a single differential equation. A family of differential equations, where α and β and n turn up in the coefficients. The formula using these coefficients is the same in all these differential equations. That’s a good reason to see a relationship. Or we can write the Jacobi Polynomials as a series, a function made up of the sum of terms. The coefficients for each of the terms depends on α and β and n, always in the same way. I’ll give you that formula. You won’t like it and won’t ever use it. The Jacobi Polynomial for a particular α, β, and n is the polynomial

$P_n^{(\alpha, \beta)}(z) = (n+\alpha)!(n + \beta)!\sum_{s=0}^n \frac{1}{s!(n + \alpha - s)!(\beta + s)!(n - s)!}\left(\frac{z-1}{2}\right)^{n-s}\left(\frac{z + 1}{2}\right)^s$
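If you do want to poke at it, here is a sketch that sums the formula exactly as written. It assumes α and β are both greater than -1, and it reads the factorial of a non-integer as the gamma function, $x! = \Gamma(x + 1)$. The names `fact` and `jacobi_series` are mine.

```python
import math

def fact(x):
    """Factorial extended to non-integers through the gamma function."""
    return math.gamma(x + 1)

def jacobi_series(n, alpha, beta, z):
    """Sum the series above directly; assumes alpha, beta > -1 and -1 <= z <= 1."""
    total = 0.0
    for s in range(n + 1):
        coeff = 1.0 / (fact(s) * fact(n + alpha - s) * fact(beta + s) * fact(n - s))
        total += coeff * ((z - 1) / 2) ** (n - s) * ((z + 1) / 2) ** s
    return fact(n + alpha) * fact(n + beta) * total

print(jacobi_series(2, 0, 0, 0.5))        # Legendre P_2(0.5) = (3 * 0.25 - 1) / 2
print(jacobi_series(2, -0.5, -0.5, 0.5))  # proportional to Chebyshev T_2(0.5)
```

The Chebyshev case comes out as a constant multiple of $T_2$, not $T_2$ itself; the two families are normalized differently.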

Its domain, by the way, is the real numbers from -1 to 1. We need something for the domain. It turns out there’s nothing you can do on the real numbers that you can’t fit into the domain from -1 to 1 anyway. (If you have to do something on, say, the interval from 10 to 54? Do a change of variable, scaling things down and moving them, and use -1 to 1. Then undo that change when you’re done.) The range is the real numbers, as you’d expect.
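For that 10-to-54 example, the change of variable works out like this. The interval has midpoint 32 and half-width 22, so

$z = \frac{x - 32}{22}, \qquad x = 32 + 22z$

sends x = 10 to z = -1 and x = 54 to z = 1. If you’re integrating, the rescaling leaves a constant factor behind:

$\int_{10}^{54} f(x)\, dx = 22 \int_{-1}^{1} f(32 + 22z)\, dz$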

(You maybe noticed I used ‘z’ for the independent variable there, rather than ‘x’. Usually using ‘z’ means we expect this to be a complex number. But ‘z’ here is definitely a real number. This is because we can also get to the Jacobi Polynomials through the hypergeometric series, a function I don’t want to get into. But for the hypergeometric series we are open to the variable being a complex number. So many references carry that ‘z’ back into Jacobi Polynomials.)

Another thing which links these many functions is recurrence. If you know the Jacobi Polynomial for one set of parameters — and you do; $P_0^{(\alpha, \beta)}(z) = 1$ — you can find others. You do this in a way rather like how you find new terms in the Fibonacci series by adding together terms you already know. These formulas can be long. Still, if you know $P_{n-1}^{(\alpha, \beta)}$ and $P_{n-2}^{(\alpha, \beta)}$ for the same α and β? Then you can calculate $P_n^{(\alpha, \beta)}$ with nothing more than pen, paper, and determination. If it helps,

$P_1^{(\alpha, \beta)}(z) = (\alpha + 1) + (\alpha + \beta + 2)\frac{z - 1}{2}$

and this is true for any α and β. You’ll never do anything with that. This is fine.
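For the determined, here is a sketch of that pen-and-paper process handed to a computer. The three-term recurrence in the loop is the standard one from reference tables, not something given in this essay, and I’m assuming α and β are both greater than -1 so that nothing divides by zero. The name `jacobi_recurrence` is made up.

```python
def jacobi_recurrence(n, alpha, beta, z):
    """Evaluate P_n^(alpha, beta)(z) by stepping up from P_0 and P_1."""
    p_prev = 1.0                                              # P_0
    if n == 0:
        return p_prev
    p_curr = (alpha + 1) + (alpha + beta + 2) * (z - 1) / 2   # P_1
    for k in range(2, n + 1):
        c = 2 * k + alpha + beta
        # Standard recurrence: lhs * P_k = a * P_{k-1} - b * P_{k-2}
        lhs = 2 * k * (k + alpha + beta) * (c - 2)
        a = (c - 1) * (c * (c - 2) * z + alpha**2 - beta**2)
        b = 2 * (k + alpha - 1) * (k + beta - 1) * c
        p_prev, p_curr = p_curr, (a * p_curr - b * p_prev) / lhs
    return p_curr

print(jacobi_recurrence(2, 0, 0, 0.5))   # Legendre P_2(0.5) = (3 * 0.25 - 1) / 2
print(jacobi_recurrence(3, 0, 0, 0.5))   # Legendre P_3(0.5) = (5 * 0.125 - 1.5) / 2
```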

There is another way that all these many polynomials are related. It goes back to their being orthogonal. We measured orthogonality by a dot product. Back when I palmed that card, the dot product I gave you was the integral of the two functions multiplied together. This is indeed a dot product. We can define others. We make those others by taking a weighted integral of the product of these two functions. That is, integrate the two functions times a third, a weight function. Of course there’s reasons to do this; they amount to deciding that some parts of the domain are more important than others. The weight function can be anything that meets a few rules. If you want to get the Jacobi Polynomials out of them, you start with the function $P_0^{(\alpha, \beta)}(z) = 1$ and the weight function

$w_n(z) = (1 - z)^{\alpha} (1 + z)^{\beta}$
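A quick numerical sanity check, as a sketch: with this weight, $P_0$ and $P_1$ should have a dot product of zero. I’ve picked α = 1 and β = 2 arbitrarily and used a plain midpoint-rule integral; `weighted_inner`, `P0`, and `P1` are names made up for the illustration.

```python
def weighted_inner(f, g, alpha, beta, steps=20000):
    """Midpoint-rule estimate of the integral of f*g*(1-z)^alpha*(1+z)^beta on [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        z = -1.0 + (i + 0.5) * h
        total += f(z) * g(z) * (1 - z) ** alpha * (1 + z) ** beta
    return h * total

alpha, beta = 1, 2
P0 = lambda z: 1.0
P1 = lambda z: (alpha + 1) + (alpha + beta + 2) * (z - 1) / 2

print(weighted_inner(P0, P1, alpha, beta))   # close to 0
print(weighted_inner(P0, P1, 0, 0))          # not 0: wrong weight for these parameters
```

The second line is the point of the weight function. The same two polynomials stop being orthogonal once you measure them with a dot product that belongs to a different α and β.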

As I say, though, you’ll never use that. If you’re eager and ready to leap into this work you can use this to build a couple Legendre Polynomials. Or Chebyshev Polynomials. For the full Jacobi Polynomials, though? Use, like, the command JacobiP[n, a, b, z] in Mathematica, or jacobiP(n, a, b, z) in Matlab. Other people have programmed this for you. Enjoy their labor.

In my work I have not used the full set of Jacobi Polynomials much. There’s more of them than I need. I do rely on the Legendre Polynomials, and the Chebyshev Polynomials. Other mathematicians use other slices regularly. It is stunning to sometimes look and realize that these many functions, different as they look, are reflections of one another, though. Mathematicians like to generalize, and find one case that covers as many things as possible. It’s rare that we are this successful.

I thank you for reading this. All of this year’s A-to-Z essays should be available at this link. The essays from every A-to-Z sequence going back to 2015 should be at this link. And I’m already looking ahead to the M, N, and O essays that I’ll be writing the day before publication instead of the week before like I want! I appreciate any nominations you have, even ones I can’t cover fairly.

## My 2019 Mathematics A To Z: Fourier series

Today’s A To Z term came to me from two nominators. One was @aajohannas, again offering a great topic. Another was Mr Wu, author of the Singapore Maths Tuition blog. I hope neither’s disappointed here.

Fourier series are named for Jean-Baptiste Joseph Fourier, and are maybe the greatest example of the theory that’s brilliantly wrong. Anyone can be wrong about something. There’s genius in being wrong in a way that gives us good new insights into things. Fourier series were developed to understand how the fluid we call “heat” flows through and between objects. Heat is not a fluid. So what? Pretending it’s a fluid gives us good, accurate results. More, you don’t need to use Fourier series to work with a fluid. Or a thing you’re pretending is a fluid. It works for lots of stuff. The Fourier series method challenged assumptions mathematicians had made about how functions worked, how continuity worked, how differential equations worked. These problems could be sorted out. It took a lot of work. It challenged and expanded our ideas of functions.

Fourier also managed to hold political offices in France during the Revolution, the Consulate, the Empire, the Bourbon Restoration, the Hundred Days, and the Second Bourbon Restoration without getting killed for his efforts. If nothing else this shows the depth of his talents.

# Fourier series.

So, how do you solve differential equations? As long as they’re linear? There’s usually something we can do. This is one approach. It works well. It has a bit of a weird setup.

The weirdness of the setup: you want to think of functions as points in space. The allegory is rather close. Think of the common association between a point in space and the coordinates that describe that point. Pretend those are the same thing. Then you can do stuff like add points together. That is, take the coordinates of both points. Add the corresponding coordinates together. Match that sum-of-coordinates to a point. This gives us the “sum” of two points. You can subtract points from one another, again by going through their coordinates. Multiply a point by a constant and get a new point. Find the angle between two points. (This is the angle formed by the line segments connecting the origin and both points.)

Functions can work like this. You can add functions together and get a new function. Subtract one function from another. Multiply a function by a constant. It’s even possible to describe an “angle” between two functions. Mathematicians usually call that the dot product or the inner product. But we will sometimes call two functions “orthogonal”. That means the ordinary everyday meaning of “orthogonal”, if anyone said “orthogonal” in ordinary everyday life.

We can take equations of a bunch of variables and solve them. Call the values of that solution the coordinates of a point. Then we talk about finding the point where something interesting happens. Or the points where something interesting happens. We can do the same with differential equations. This is finding a point in the space of functions that makes the equation true. Maybe a set of points. So we can find a function or a family of functions solving the differential equation.

You have reasons for skepticism, even if you’ll grant me treating functions as being like points in space. You might remember solving systems of equations. You need as many equations as there are dimensions of space; a two-dimensional space needs two equations. A three-dimensional space needs three equations. You might have worked four equations in four variables. You were threatened with five equations in five variables if you didn’t all settle down. You’re not sure how many dimensions of space “all the possible functions” are. It’s got to be more than the one differential equation we started with.

This is fair. The approach I’m talking about uses the original differential equation, yes. But it breaks it up into a bunch of linear equations. Enough linear equations to match the space of functions. We turn a differential equation into a set of linear equations, a matrix problem, like we know how to solve. So that settles that.

So suppose $f(x)$ solves the differential equation. Here I’m going to pretend that the function has one independent variable. Many functions have more than this. Doesn’t matter. Everything I say here extends into two or three or more independent variables. It takes longer and uses more symbols and we don’t need that. The thing about $f(x)$ is that we don’t know what it is, but would quite like to.

What we’re going to do is choose a reference set of functions that we do know. Let me call them $g_0(x), g_1(x), g_2(x), g_3(x), \cdots$ going on to however many we need. It can be infinitely many. It certainly is at least up to some $g_N(x)$ for some big enough whole number N. These are a set of “basis functions”. For any function we want to represent we can find a bunch of constants, called coefficients. Let me use $a_0, a_1, a_2, a_3, \cdots$ to represent them. Any function we want is the sum of the coefficient times the matching basis function. That is, there’s some coefficients so that

$f(x) = a_0\cdot g_0(x) + a_1\cdot g_1(x) + a_2\cdot g_2(x) + a_3\cdot g_3(x) + \cdots$

is true. That summation goes on until we run out of basis functions. Or it runs on forever. This is a great way to solve linear differential equations. This is because we know the basis functions. We know everything we care to know about them. We know their derivatives. We know everything on the right-hand side except the coefficients. The coefficients matching any particular function are constants. So the derivatives of $f(x)$, written as the sum of coefficients times basis functions, are easy to work with. If we need second or third or more derivatives? That’s no harder to work with.

You may know something about matrix equations. That is that solving them takes freaking forever. The bigger the equation, the more forever. If you have to solve eight equations in eight unknowns? If you start now, you might finish in your lifetime. For this function space? We need dozens, hundreds, maybe thousands of equations and as many unknowns. Maybe infinitely many. So we seem to have a solution that’s great apart from how we can’t use it.

Except. What if the equations we have to solve are all easy? If we have to solve a bunch that looks like, oh, $2a_0 = 4$ and $3a_1 = -9$ and $2a_2 = 10$ … well, that’ll take some time, yes. But not forever. Great idea. Is there any way to guarantee that?

It’s in the basis functions. If we pick functions that are orthogonal, or are almost orthogonal, to each other? Then we can turn the differential equation into an easy matrix problem. Not as easy as in the last paragraph. But still, not hard.

So what’s a good set of basis functions?

And here, about 800 words later than everyone was expecting, let me introduce the sine and cosine functions. Sines and cosines make great basis functions. They don’t grow without bounds. They don’t dwindle to nothing. They’re easy to differentiate. They’re easy to integrate, which is really special. Most functions are hard to integrate. We even know what they look like. They’re waves. Some have long wavelengths, some short wavelengths. But waves. And … well, it’s easy to make sets of them orthogonal.

We have to set some rules. The first is that each of these sine and cosine basis functions have a period. That is, after some time (or distance), they repeat. They might repeat before that. Most of them do, in fact. But we’re guaranteed a repeat after no longer than some period. Call that period ‘L’.

Each of these sine and cosine basis functions has to have a whole number of complete oscillations within the period L. So we can say something about the sine and cosine functions. They have to look like these:

$s_j(x) = \sin\left(\frac{2\pi j}{L} x\right)$

$c_k(x) = \cos\left(\frac{2\pi k}{L} x\right)$

Here ‘j’ and ‘k’ are some whole numbers. I have two sets of basis functions at work here. Don’t let that throw you. We could have labelled them all as $g_k(x)$, with some clever scheme that told us for a given k whether it represents a sine or a cosine. It’s less hard work if we have s’s and c’s. And if we have coefficients of both a’s and b’s. That is, we suppose the function $f(x)$ is:

$f(x) = \frac{1}{2}a_0 + b_1 s_1(x) + a_1 c_1(x) + b_2 s_2(x) + a_2 c_2(x) + b_3 s_3(x) + a_3 c_3(x) + \cdots$

This, at last, is the Fourier series. Each function has its own series. A “series” is a summation. It can be of finitely many terms. It can be of infinitely many. Often infinitely many terms give more interesting stuff. Like this, for example. Oh, and there’s a bare $\frac{1}{2}a_0$ there, not multiplied by anything more complicated. It makes life easier. It lets us see that the Fourier series for, like, 3 + f(x) is the same as the Fourier series for f(x), except for the leading term. The ½ before that makes easier some work that’s outside the scope of this essay. Accept it as one of the merry, wondrous appearances of ‘2’ in mathematics expressions.
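Here is a sketch of how you might find those a’s and b’s for a function you do know. The essay doesn’t give the coefficient formulas, so I’m bringing in the standard ones: $a_k$ is $\frac{2}{L}$ times the integral of $f(x) c_k(x)$ over one period, and $b_k$ the same with $s_k(x)$. The integrals are done with a plain midpoint rule, and the function names and the test function are made up.

```python
import math

def fourier_coefficients(f, L, n_terms, steps=4000):
    """Estimate a_0..a_n and b_1..b_n for f over one period [0, L] (midpoint rule)."""
    h = L / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    a = [2 / L * h * sum(f(x) * math.cos(2 * math.pi * k * x / L) for x in xs)
         for k in range(n_terms + 1)]
    b = [0.0] + [2 / L * h * sum(f(x) * math.sin(2 * math.pi * k * x / L) for x in xs)
                 for k in range(1, n_terms + 1)]
    return a, b

def partial_sum(a, b, L, x):
    """Evaluate a_0/2 + sum of b_k s_k(x) + a_k c_k(x) for the coefficients given."""
    total = a[0] / 2
    for k in range(1, len(a)):
        total += b[k] * math.sin(2 * math.pi * k * x / L)
        total += a[k] * math.cos(2 * math.pi * k * x / L)
    return total

L = 2.0
f = lambda x: 3 + 2 * math.sin(2 * math.pi * x / L) - math.cos(4 * math.pi * x / L)
a, b = fourier_coefficients(f, L, 3)
print(a[0] / 2, b[1], a[2])   # recovers the 3, the 2, and the -1 built into f
```

The test function is already a short Fourier series, so the code should hand back the numbers that went into it. That ½ in front of $a_0$ is what lets the same formula serve for every $a_k$, the zeroth included.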

It’s great for solving differential equations. It’s also great for compression. The sines and the cosines are standard functions, after all. We can send all the information we need to reconstruct a function by sending the coefficients for it. This can also help us pick out signal from noise. Noise has a Fourier series that looks a particular way. If you take the coefficients for a noisy signal and remove that? You can get a good approximation of the original, noiseless, signal.

This all seems great. That’s a good time to feel skeptical. First, like, not everything we want to work with looks like waves. Suppose we need a function that looks like a parabola. It’s silly to think we can add a bunch of sines and cosines and get a parabola. Like, a parabola isn’t periodic, to start with.

So it’s not. To use Fourier series methods on something that’s not periodic, we use a clever technique: we tell a fib. We declare that the period is something bigger than we care about. Say the period is, oh, ten million years long. A hundred light-years wide. Whatever. We trust that the difference between the function we do want, and the function that we calculate, will be small. We trust that if someone ten million years from now and a hundred light-years away wishes to complain about our work, we will be out of the office that day. Letting the period L be big enough is a good reliable tool.

The other thing? Can we approximate any function as a Fourier series? Like, at least chunks of parabolas? Polynomials? Chunks of exponential growths or decays? What about sawtooth functions, that rise and fall? What about step functions, that are constant for a while and then jump up or down?

The answer to all these questions is “yes,” although drawing out the word and raising a finger to say there are some issues we have to deal with. One issue is that most of the time, we need an infinitely long series to represent a function perfectly. This is fine if we’re trying to prove things about functions in general rather than solve some specific problem. It’s no harder to write the sum of infinitely many terms than the sum of finitely many terms. You write an ∞ symbol instead of an N in some important places. But if we want to solve specific problems? We probably want to deal with finitely many terms. (I hedge that statement on purpose. Sometimes it turns out we can find a formula for all the infinitely many coefficients.) This will usually give us an approximation of the $f(x)$ we want. The approximation can be as good as we want, but to get a better approximation we need more terms. Fair enough. This kind of tradeoff doesn’t seem too weird.
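The chunk of parabola makes a nice test case, since it’s one of those where a formula for every coefficient is known. On the interval from -π to π (so the period L is 2π), the standard result, which I’m quoting from outside this essay, is $x^2 = \frac{\pi^2}{3} + 4\sum_{k=1}^{\infty} \frac{(-1)^k}{k^2}\cos(kx)$. This sketch cuts the sum off after finitely many terms and measures how far off it is; the function names are made up.

```python
import math

def parabola_partial_sum(x, n_terms):
    """First n_terms cosine terms of the Fourier series of x^2 on [-pi, pi]."""
    total = math.pi ** 2 / 3
    for k in range(1, n_terms + 1):
        total += 4 * (-1) ** k * math.cos(k * x) / k ** 2
    return total

def worst_error(n_terms, samples=401):
    """Largest gap between x^2 and the partial sum on a grid over [-pi, pi]."""
    xs = [-math.pi + 2 * math.pi * i / (samples - 1) for i in range(samples)]
    return max(abs(x * x - parabola_partial_sum(x, n_terms)) for x in xs)

for n in (5, 50, 500):
    print(n, worst_error(n))   # the worst error shrinks as terms are added
```

More terms, better match, as promised. The improvement is slow, though: roughly ten times the terms for ten times the accuracy.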

Another issue is in discontinuities. If $f(x)$ jumps around? If it has some point where it’s undefined? If it has corners? Then the Fourier series has problems. Summing up sines and cosines can’t give us a sudden jump or a gap or anything. Near a discontinuity, the Fourier series will get this high-frequency wobble. A bigger jump, a bigger wobble. You may not blame the series for not representing a discontinuity. But it does mean that what is, otherwise, a pretty good match for the $f(x)$ you want gets this region where it stops being so good a match.
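You can see the wobble with a step function that is -1 on one half of the period and +1 on the other. A sketch, assuming the standard series for that square wave, $\frac{4}{\pi}\left(\sin(x) + \frac{1}{3}\sin(3x) + \frac{1}{5}\sin(5x) + \cdots\right)$, which comes from outside this essay; the function names are mine.

```python
import math

def square_wave_partial_sum(x, n_terms):
    """Partial Fourier sum for a wave that is -1 on (-pi, 0) and +1 on (0, pi)."""
    return 4 / math.pi * sum(math.sin((2 * j - 1) * x) / (2 * j - 1)
                             for j in range(1, n_terms + 1))

def peak(n_terms, samples=2000):
    """Largest value of the partial sum on a grid over (0, pi)."""
    return max(square_wave_partial_sum(math.pi * i / samples, n_terms)
               for i in range(1, samples))

for n in (10, 50):
    print(n, peak(n))   # both overshoot the true value of 1
```

The function never gets above 1, but the partial sums do, by about nine percent of the size of the jump. Adding terms squeezes the wobble closer to the jump without making the overshoot shrink.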

That’s all right. These issues aren’t bad enough, or unpredictable enough, to keep Fourier series from being powerful tools. Even when we find problems for which sines and cosines are poor fits, we use this same approach. Describe a function we would like to know as the sums of functions we choose to work with. Fourier series are one of those ideas that helps us solve problems, and guides us to new ways to solve problems.

This is my last big essay for the week. All of the Fall 2019 A To Z posts should be at this link. The letter G should get its chance on Tuesday and H next Thursday. All of the A To Z essays should be available at this link. If you’d like to nominate topics for essays, I’m asking for the letters I through N at this link. Thank you.

## My Math Blog Statistics, November 2014

October 2014 was my fourth-best month in the mathematics blog here, if by “best” we mean “has a number of page views”, and second-best if by “best” we mean “has a number of unique visitors”. And now November 2014 has taken October’s place on both counts, by having bigger numbers for both page views and visitors, as WordPress reveals such things to me. Don’t tell October; that’d just hurt its feelings. Plus, I got to the 19,000th page view, and as of right now I’m sitting at 19,181; it’s conceivable I might reach my 20,000th viewer this month, though that would be a slight stretch.

But the total number of page views grew from 625 up to 674, and the total number of visitors from 323 to 366. The number of page views is the highest since May 2014 (751), although this is the greatest number of visitors since January 2014 (473), the second month when WordPress started revealing those numbers to us mere bloggers. I like the trends, though; since June the number of visitors has been growing at a pretty steady rate, although steadily enough I can’t say whether it’s an arithmetic or geometric progression. (In an arithmetic progression, the difference between two successive numbers is about constant, for example: 10, 15, 20, 25, 30, 35, 40. In a geometric progression, the ratio between two successive numbers is about constant, for example: 10, 15, 23, 35, 53, 80, 120.) Views per visitor dropped from 1.93 to 1.84, although I’m not sure even that is a really significant difference.
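If you want to tell the two kinds of progression apart by machine, a sketch: look at the successive differences and the successive ratios, and see which stays put. The two lists are the example sequences above, and the last line redoes the views-per-visitor arithmetic.

```python
arithmetic = [10, 15, 20, 25, 30, 35, 40]
geometric = [10, 15, 23, 35, 53, 80, 120]

def differences(seq):
    """Gaps between successive entries; constant for an arithmetic progression."""
    return [b - a for a, b in zip(seq, seq[1:])]

def ratios(seq):
    """Ratios of successive entries; about constant for a geometric progression."""
    return [round(b / a, 2) for a, b in zip(seq, seq[1:])]

print(differences(arithmetic), ratios(arithmetic))
print(differences(geometric), ratios(geometric))
print(round(625 / 323, 2), round(674 / 366, 2))   # views per visitor, October and November
```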

The countries sending me the most readers were just about the same set as last month: the United States at 458; Canada recovering from a weak October with 27 viewers; Argentina at 20; Austria and the United Kingdom tied at 19; Australia at 17; Germany at 16 and Puerto Rico at 14.

Sending only one reader this month were: Belgium, Bermuda, Croatia, Estonia, Guatemala, Hong Kong, Italy, Lebanon, Malaysia, the Netherlands, Norway, Oman, the Philippines, Romania, Singapore, South Korea, and Sweden. (Really, Singapore? I’m a little hurt. I used to live there.) The countries repeating that from October were Estonia, the Netherlands, Norway, and Sweden; Sweden’s going on three months with just a single reader each. I don’t know what’s got me only slightly read in Scandinavia and the Balkans.

My most-read articles for November were pretty heavily biased towards the comics, with a side interest in that Pythagorean triangle problem with an inscribed circle. Elke Stangl had wondered about the longevity of my most popular posts, and I was curious too, so I’m including in brackets a note about the number of days between the first and the last view which WordPress has on record. This isn’t a perfect measure of longevity, especially for the most recent posts, but it’s a start.

As ever there’s no good search term poetry, but among the things that brought people here were:

• trapezoid
• how many grooves are on one side of an lp record?
• origin is the gateway to your entire gaming universe.
• cauchy funny things done
• trapezoid funny
• yet another day with no plans to use algebra

Won’t lie; that last one feels a little personal. But the “origin is the gateway” thing keeps turning up and I don’t know why. I’d try to search for it but that’d just bring me back here, leaving me no more knowledgeable, wouldn’t it?

## Echoing “Fourier Echoes Euler”

The above tweet is from the Analysis Fact of The Day feed, which for the 5th had a neat little bit taken from Joseph Fourier’s The Analytic Theory Of Heat, published 1822. Fourier was trying to at least describe the way heat moves through objects, and along the way he developed a thing called Fourier series and a field called Fourier Analysis. In this we treat functions, even ones we don’t yet know, as sinusoidal waves, overlapping and interfering with and reinforcing one another.

If we have infinitely many of these waves we can approximate … well, not every function, but surprisingly close to all the functions that might represent real-world affairs, and surprisingly near all the functions we’re interested in anyway. The advantage of representing functions as sums of sinusoidal waves is that sinusoidal waves are very easy to differentiate and integrate, and to add together those differentials and integrals, and that means we can turn problems that are extremely hard into problems that may be longer, but are made up of much easier parts. Since usually it’s better to do something that’s got many easy steps than it is to do something with a few hard ones, Fourier series and Fourier analysis are some of the things you get to know well as you become a mathematician.

The “Fourier Echoes Euler” page linked here shows simply one nice, sweet result that Fourier proved in that major work. It demonstrates what you get if, for absolutely any real number x, you add together $\cos\left(x\right) - \frac12 \cos\left(2x\right) + \frac13 \cos\left(3x\right) - \frac14 \cos\left(4x\right) + \frac15 \cos\left(5x\right) - \cdots$ et cetera. There’s one step in it — “integration by parts” — that you’ll have to remember from freshman calculus, or maybe I’ll get around to explaining that someday, but I would expect most folks reading this far could follow this neat result.
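If you’d rather let a computer do the adding, here is a sketch. The closed form it compares against, $\ln\left(2\cos\left(\frac{x}{2}\right)\right)$ for x strictly between -π and π, is the standard value of this sum; I’m supplying it myself and assuming it’s where the linked page ends up. The function name and the choice x = 1 are arbitrary.

```python
import math

def alternating_cosine_sum(x, n_terms=200000):
    """Partial sum of cos(x) - cos(2x)/2 + cos(3x)/3 - ... with n_terms terms."""
    return sum((-1) ** (n + 1) * math.cos(n * x) / n for n in range(1, n_terms + 1))

x = 1.0
# The two printed numbers should nearly agree.
print(alternating_cosine_sum(x), math.log(2 * math.cos(x / 2)))
```

The series converges slowly, which is why the sketch uses so many terms to get a few digits of agreement.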