## From my Sixth A-to-Z: Taylor Series

By the time of 2019 and my sixth A-to-Z series, I had some standard narrative tricks I could deploy. My insistence that everything is polynomials, for example. Anecdotes from my slight academic career. A prose style that emphasizes what we do with the idea of something rather than instructions. That last comes from the idea that if you wanted to know how to compute a Taylor series you’d just look it up on Mathworld or Wikipedia or whatnot. The thing a pop mathematics blog can do is give some reason that you’d want to know how to compute a Taylor series. I regret talking about functions that break Taylor series, though. I have to treat these essays as introducing the idea of a Taylor series to someone who doesn’t know anything about them. And it’s bad form to teach how stuff doesn’t work too close to teaching how it does work. Readers tend to blur what works and what doesn’t together. Still, $f(x) = \exp(-\frac{1}{x^2})$ is a really neat weird function and it’d be a shame to let it go completely unmentioned.

Today’s A To Z term was nominated by APMA, author of the Everybody Makes DATA blog. It was a topic that delighted me to realize I could explain. Then it started to torment me as I realized there is a lot to explain here, and I had to pick something. So here’s where things ended up.

# Taylor Series.

In the mid-2000s I was teaching at a department being closed down. In its last semester I had to teach Computational Quantum Mechanics. The person who’d normally taught it had transferred to another department. But a few last majors wanted the old department’s version of the course, and this pressed me into the role. Teaching a course you don’t really know is a rush. It’s a semester of learning, and trying to think deeply enough that you can convey something to students. This while all the regular demands of the semester eat your time and working energy. And this in the leap of faith that the syllabus you made up, before you truly knew the subject, will be nearly enough right. And that you have not committed to teaching something you do not understand.

So around mid-course I realized I needed to explain finding the wave function for a hydrogen atom with two electrons. The wave function is this probability distribution. You use it to find things like the probability a particle is in a certain area, or has a certain momentum. Things like that. A proton with one electron is as much as I’d ever done, as a physics major. We treat the proton as the center of the universe, immobile, and the electron hovers around that somewhere. Two electrons, though? A thing repelling your electron, and repelled by your electron, and neither of those having fixed positions? What the mathematics of that must look like terrified me. When I couldn’t procrastinate it farther I accepted my doom and read exactly what it was I should do.

It turned out I had known what I needed for nearly twenty years already. Got it in high school.

Of course I’m discussing Taylor Series. The equations were loaded down with symbols, yes. But at its core, the important stuff, was this old and trusted friend.

The premise behind a Taylor Series is even older than that. It’s universal. If you want to do something complicated, try doing the simplest thing that looks at all like it. And then make that a little bit more like you want. And then a bit more. Keep making these little improvements until you’ve got it as right as you truly need. Put that vaguely, the idea describes Taylor series just as well as it describes making a video game or painting a state portrait. We can make it more specific, though.

A series, in this context, means the sum of a sequence of things. This can be finitely many things. It can be infinitely many things. If the sum makes sense, we say the series converges. If the sum doesn’t, we say the series diverges. When we first learn about series, the sequences are all numbers. $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$, for example, which diverges. (It adds to a number bigger than any finite number.) Or $1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots$, which converges. (It adds to $\frac{1}{6}\pi^2$.)
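To make those two examples concrete, here’s a quick Python check of their partial sums (the variable names are mine, just for illustration):

```python
import math

# Partial sums of the two example series.
# The harmonic series 1 + 1/2 + 1/3 + ... diverges: its partial sums
# keep growing, if slowly. The series 1 + 1/2^2 + 1/3^2 + ... converges
# to pi^2 / 6.
harmonic = sum(1 / n for n in range(1, 100_001))
basel = sum(1 / n ** 2 for n in range(1, 100_001))

print(harmonic)                  # still climbing; roughly ln(100000) + 0.577
print(basel, math.pi ** 2 / 6)   # nearly equal
```

Push the upper limit higher and the first sum keeps creeping upward while the second barely budges.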

In a Taylor Series, the terms are all polynomials. They’re simple polynomials. Let me call the independent variable ‘x’. Sometimes it’s ‘z’, for the reasons you would expect. (‘x’ usually implies we’re looking at real-valued functions. ‘z’ usually implies we’re looking at complex-valued functions. ‘t’ implies it’s a real-valued function with an independent variable that represents time.) Each of these terms is simple. Each term is the distance between x and a reference point, raised to a whole power, and multiplied by some coefficient. The reference point is the same for every term. What makes this potent is that we use, potentially, many terms. Infinitely many terms, if need be.

Call the reference point ‘a’. Or if you prefer, x0. z0 if you want to work with z’s. You see the pattern. This ‘a’ is the “point of expansion”. The coefficients of each term depend on the original function at the point of expansion. The coefficient for the term that has $(x - a)$ is the first derivative of f, evaluated at a. The coefficient for the term that has $(x - a)^2$ is the second derivative of f, evaluated at a (times a number that’s the same for the squared-term for every Taylor Series). The coefficient for the term that has $(x - a)^3$ is the third derivative of f, evaluated at a (times a different number that’s the same for the cubed-term for every Taylor Series).
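The “number that’s the same for every Taylor Series” is a factorial: the coefficient on $(x - a)^n$ is the n-th derivative of f at a, divided by n!. Here’s a little Python sketch of that recipe for the sine function (the helper name `sin_taylor` is mine, and it leans on the familiar cycle of sine’s derivatives):

```python
import math

def sin_taylor(x, a=0.0, terms=12):
    """Taylor polynomial of sine about the point of expansion a.

    The coefficient on (x - a)**n is the n-th derivative of sine at a,
    divided by n factorial. Sine's derivatives cycle:
    sin, cos, -sin, -cos, and back to sin.
    """
    cycle = [math.sin, math.cos,
             lambda t: -math.sin(t), lambda t: -math.cos(t)]
    return sum(cycle[n % 4](a) / math.factorial(n) * (x - a) ** n
               for n in range(terms))

# Expanding around a = 0 or around a = pi/2 both reproduce sin(1).
print(sin_taylor(1.0), sin_taylor(1.0, a=math.pi / 2), math.sin(1.0))
```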

You’ll never guess what the coefficient for the term with $(x - a)^{122,743}$ is. Nor will you ever care. The only reason you would wish to is to answer an exam question. The instructor will, in that case, have a function that’s either the sine or the cosine of x. The point of expansion will be 0, $\frac{\pi}{2}$, $\pi$, or $\frac{3\pi}{2}$.

Otherwise you will trust that this is one of the terms of $(x - a)^n$, ‘n’ representing some counting number too great to be interesting. All the interesting work will be done with the Taylor series either truncated to a couple terms, or continued on to infinitely many.

What a Taylor series offers is the chance to approximate a function we’re genuinely interested in with a polynomial. This is worth doing, usually, because polynomials are easier to work with. They have nice analytic properties. We can automate taking their derivatives and integrals. We can set a computer to calculate their value at some point, if we need that. We might have no idea how to start calculating the logarithm of 1.3. We certainly have an idea how to start calculating $0.3 - \frac{1}{2}(0.3^2) + \frac{1}{3}(0.3^3)$. (Yes, it’s 0.3. I’m using a Taylor series with a = 1 as the point of expansion.)
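That logarithm calculation, sketched in Python (the three-term sum is the one just above; a few more terms sharpen it):

```python
import math

# ln(x) expanded about a = 1: the coefficients give the series
# (x-1) - (x-1)^2/2 + (x-1)^3/3 - ...
x = 1.3
three_terms = sum((-1) ** (n + 1) * (x - 1) ** n / n for n in range(1, 4))
ten_terms = sum((-1) ** (n + 1) * (x - 1) ** n / n for n in range(1, 11))

print(three_terms)             # about 0.264
print(ten_terms, math.log(1.3))
```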

The first couple terms tell us interesting things. Especially if we’re looking at a function that represents something physical. The first two terms tell us where an equilibrium might be. The next term tells us whether an equilibrium is stable or not. If it is stable, it tells us how perturbations, points near the equilibrium, behave.

The first couple terms will describe a line, or a quadratic, or a cubic, some simple function like that. Usually adding more terms will make this Taylor series approximation a better fit to the original. There might be a larger region where the polynomial and the original function are close enough. Or the difference between the polynomial and the original function will shrink on the same old region.

We would really like that region to eventually grow to the whole domain of the original function. We can’t count on that, though. Roughly, the interval of convergence will stretch from ‘a’ to wherever the first weird thing happens. Weird things are, like, discontinuities. Vertical asymptotes. Anything you don’t like dealing with in the original function, the Taylor series will refuse to deal with. Outside that interval, the Taylor series diverges and we just can’t use it for anything meaningful. Which is almost supernaturally weird of them. The Taylor series uses information about the original function, but it’s all derivatives at a single point. Somehow the derivatives of, say, the logarithm of x around x = 1 give a hint that the logarithm of 0 is undefinable. And so they won’t help us calculate the logarithm of 3.
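You can watch this happen numerically. Here’s a sketch of the size of the terms in the logarithm’s series about a = 1, inside and outside its interval of convergence (which runs from 0 to 2):

```python
# Terms of the series for ln(x) about a = 1 are (x-1)^n / n, alternating
# in sign. Inside the interval of convergence they shrink; outside, they
# grow without bound, so the partial sums are useless there.
def term_size(x, n):
    return abs((x - 1) ** n / n)

inside = [term_size(1.5, n) for n in (1, 10, 30)]   # x = 1.5 is inside
outside = [term_size(3.0, n) for n in (1, 10, 30)]  # x = 3 is outside

print(inside)   # shrinking toward zero
print(outside)  # 2, then about 102, then tens of millions
```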

Things can be weirder. There are functions that just break Taylor series altogether. Some are obvious. A function needs lots of derivatives at a point to have a good Taylor series approximation. So, many fractal curves won’t have a Taylor series approximation. These curves are all corners, points where they aren’t continuous or where derivatives don’t exist. Some are obviously designed to break Taylor series approximations. We can make a function that follows different rules if x is rational than if x is irrational. There’s no approximating that, and you’d blame the person who made such a function, not the Taylor series. It can be subtle. The function defined by the rule $f(x) = \exp\left(-\frac{1}{x^2}\right)$, with the note that if x is zero then f(x) is 0, seems to satisfy everything we’d look for. It’s a function that’s mostly near 1, that drops down to being near zero around where x = 0. But its Taylor series expansion around a = 0 is a horizontal line always at 0. The interval of convergence can be a single point, challenging our idea of what an interval is.
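Here’s a small numerical look at that flat function. Nothing below proves the Taylor series is zero, but it shows why it could be:

```python
import math

def f(x):
    """exp(-1/x^2), patched to equal 0 at x = 0."""
    return 0.0 if x == 0 else math.exp(-1.0 / x ** 2)

print(f(1.0))   # about 0.368 -- the function is certainly not zero
print(f(0.1))   # about 3.7e-44 -- but near the origin it collapses
# It collapses faster than any power of x: even x^10 dwarfs it here,
# which is the numerical shadow of every Taylor coefficient at 0 vanishing.
print(f(0.1) / 0.1 ** 10)   # still absurdly small
```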

That’s all right. If we can trust that we’re avoiding weird parts, Taylor series give us an outstanding new tool. Grant that the Taylor series describes a function with the same rule as our original function. The Taylor series is often easier to work with, especially if we’re working on differential equations. We can automate, or at least find formulas for, taking the derivative of a polynomial. Or adding together derivatives of polynomials. Often we can attack a differential equation too hard to solve otherwise by supposing the answer is a polynomial. This is essentially what that quantum mechanics problem used, and why the tool was so familiar when I was in a strange land.

Roughly. What I was actually doing was treating the function I wanted as a power series. This is, like the Taylor series, the sum of a sequence of terms, all of which are $(x - a)^n$ times some coefficient. What makes it not a Taylor series is that the coefficients weren’t the derivatives of any function I knew to start. But the experience of Taylor series trained me to look at functions as things which could be approximated by polynomials.

This gives us the hint to look at other series that approximate interesting functions. We get a host of these, with names like Laurent series and Fourier series and Chebyshev series and such. Laurent series look like Taylor series but we allow powers to be negative integers as well as positive ones. Fourier series do away with polynomials. They instead use trigonometric functions, sines and cosines. Chebyshev series build on polynomials, but not on pure powers. They’ll use orthogonal polynomials. These behave like perpendicular directions do. That orthogonality makes many numerical techniques behave better.
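As a taste of the Fourier flavor, here’s a sketch that sums sines to approximate a square wave (the function name is mine):

```python
import math

# Partial Fourier sum for a square wave that is +1 on (0, pi) and -1 on
# (-pi, 0): the series is (4/pi) times the sum of sin((2k+1)x) / (2k+1).
# No powers of x anywhere -- the building blocks are sines.
def square_wave_partial(x, terms=60):
    return (4 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms)
    )

print(square_wave_partial(1.0))    # close to +1
print(square_wave_partial(-1.0))   # close to -1
```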

The Taylor series is a great introduction to these tools. Its first several terms have good physical interpretations. Its calculation requires tools we learn early on in calculus. The habits of thought it teaches guide us even in unfamiliar territory.

And I feel very relieved to be done with this. I often have a few false starts to an essay, but those are mostly before I commit words to text editor. This one had about four branches that now sit in my scrap file. I’m glad to have a deadline forcing me to just publish already.

Thank you, though. This and the essays for the Fall 2019 A to Z should be at this link. Next week: the letters U and V. And all past A to Z essays ought to be at this link.

## My 2019 Mathematics A To Z: Wallis Products

Today’s A To Z term was suggested by Dina Yagodich, whose YouTube channel features many topics, including calculus and differential equations, statistics, discrete math, and Matlab. Matlab is especially valuable to know as a good quick calculation can answer many questions.

# Wallis Products.

The Wallis named here is John Wallis, an English clergyman and mathematician and cryptographer. His most tweetable work is how we follow his lead in using the symbol ∞ to represent infinity. But he did much in calculus. And it’s a piece of that which brings us to today. He particularly noticed this:

$\frac{1}{2}\pi = \frac{2}{1}\cdot \frac{2}{3}\cdot \frac{4}{3}\cdot \frac{4}{5}\cdot \frac{6}{5}\cdot \frac{6}{7}\cdot \frac{8}{7}\cdot \frac{8}{9}\cdot \frac{10}{9}\cdot \frac{10}{11}\cdots$

This is an infinite product. It’s multiplication’s answer to the infinite series. It always amazes me when an infinite product works. There are dangers when you do anything with an infinite number of terms. Even the basics of arithmetic, like that you can change the order in which you calculate but still get the same result, break down. Series, in which you add together infinitely many things, are risky, but I’m comfortable with the rules to know when the sum can be trusted. Infinite products seem more mysterious. Then you learn an infinite product converges if and only if the series made from the logarithms of the terms in it also converges. Then infinite products seem less exciting.

There are many infinite products that give us π. Some work quite efficiently, giving us lots of digits for a few terms’ work. Wallis’s formula does not. We need about a thousand terms for it to get us a π of about 3.141. This is a bit much to calculate even today. In 1656, when he published it in Arithmetica Infinitorum, a book I have never read? Wallis was able to do mental arithmetic well. His biography at St Andrews says once when having trouble sleeping he calculated the square root of a 53-digit number in his head, and in the morning, remembered it, and was right. Still, this would be a lot of work. How could Wallis possibly do it? And what work could possibly convince anyone else that he was right?
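A quick Python check of just how slow the convergence is (the helper name is mine):

```python
import math

# Partial Wallis product: 2/1 * 2/3 * 4/3 * 4/5 * 6/5 * 6/7 * ...
# Doubling the partial product should approach pi, at a crawl.
def wallis_pi(factors):
    product = 1.0
    for i in range(1, factors + 1):
        numerator = 2 * ((i + 1) // 2)     # 2, 2, 4, 4, 6, 6, ...
        denominator = 2 * (i // 2) + 1     # 1, 3, 3, 5, 5, 7, ...
        product *= numerator / denominator
    return 2 * product

print(wallis_pi(10), wallis_pi(1000), math.pi)
```

A thousand factors in, the estimate still differs from π in the third decimal place.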

As is common to striking discoveries, it was a mixture of insight and luck and persistence and pattern recognition. He seems to have started with pondering the value of

$\int_0^1 \left(1 - x^2\right)^{\frac{1}{2}} dx$

Happily, he knew exactly what this was: $\frac{1}{4}\pi$. He knew this because of a bit of insight. We can interpret the integral here as asking for the area that’s enclosed, on a Cartesian coordinate system, by the positive x-axis, the positive y-axis, and the set of points which makes true the equation $y = \left(1 - x^2\right)^\frac{1}{2}$. This curve is the upper half of a circle with radius 1 and centered on the origin. The area enclosed by all this is one-fourth the area of a circle of radius 1. So that’s how he could know the value of the integral, without doing any symbol manipulation.
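That geometric insight is easy to check numerically, with a midpoint-rule estimate of the area:

```python
import math

# Midpoint-rule estimate of the integral of (1 - x^2)^(1/2) from 0 to 1:
# the area of a quarter of the unit disc, which should be pi/4.
n = 200_000
width = 1.0 / n
area = sum(math.sqrt(1.0 - ((i + 0.5) * width) ** 2) for i in range(n)) * width

print(area, math.pi / 4)   # agree to several decimal places
```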

The question, in modern notation, would be whether he could do that integral. And, for this? He couldn’t. But, unable to do the problem he wanted, he tried doing the most similar problem he could, to see what that proved. $\left(1 - x^2\right)^{\frac{1}{2}}$ was beyond his power to integrate; but what if he swapped those exponents? Worked on $\left(1 - x^{\frac{1}{2}}\right)^2$ instead? This would not — could not — give him what he was interested in. But it would give him something he could calculate. So can we:

$\int_0^1 \left(1 - x^{\frac{1}{2}}\right)^2 dx = \int_0^1 1 - 2x^{\frac{1}{2}} + x dx = 1 - 2\cdot\frac{2}{3} + \frac{1}{2} = \frac{1}{6}$

And now here comes persistence. What if it’s not $x^{\frac{1}{2}}$ inside the parentheses there? If it’s x raised to some other unit fraction instead? What if the parentheses aren’t raised to the second power, but to some other whole number? Might that reveal something useful? Each of these integrals is calculable, and he calculated them. He worked out a table for many values of

$\int_0^1 \left(1 - x^{\frac{1}{p}}\right)^q dx$

for different sets of whole numbers p and q. He trusted that if he kept this up, he’d find some interesting pattern. And he did. The integral, for example, always turns out to be a unit fraction. And there’s a deeper pattern. Let me share results for different values of p and q; the integral is the reciprocal of the number inside the table. The topmost row is values of q; the leftmost column is values of p.

| p \ q | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 2 | 1 | 3 | 6 | 10 | 15 | 21 | 28 | 36 |
| 3 | 1 | 4 | 10 | 20 | 35 | 56 | 84 | 120 |
| 4 | 1 | 5 | 15 | 35 | 70 | 126 | 210 | 330 |
| 5 | 1 | 6 | 21 | 56 | 126 | 252 | 462 | 792 |
| 6 | 1 | 7 | 28 | 84 | 210 | 462 | 924 | 1716 |
| 7 | 1 | 8 | 36 | 120 | 330 | 792 | 1716 | 3432 |

There is a deep pattern here, although I’m not sure Wallis noticed that one. Look along the diagonals, running from lower-left to upper-right. These are the coefficients of the binomial expansion. Yang Hui’s triangle, if you prefer. Pascal’s triangle, if you prefer that. Let me call the term in row p, column q of this table $a_{p, q}$. Then

$a_{p, q} = \frac{(p + q)!}{p! q!}$
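The whole table can be checked exactly, since each integrand expands by the binomial theorem into pure powers of x. A sketch with exact rational arithmetic (the function name is mine):

```python
from fractions import Fraction
from math import comb

def wallis_integral(p, q):
    """Exact integral of (1 - x**(1/p))**q from 0 to 1, for whole p, q >= 1.

    Expand by the binomial theorem into a sum of C(q,k) * (-1)**k * x**(k/p)
    terms, then integrate each power term to p / (k + p).
    """
    return sum(Fraction(comb(q, k) * (-1) ** k * p, k + p)
               for k in range(q + 1))

# Each integral's reciprocal is the binomial coefficient (p+q)!/(p! q!).
for p in range(1, 8):
    for q in range(1, 8):
        assert 1 / wallis_integral(p, q) == comb(p + q, q)

print("every checked entry matches the binomial coefficient")
```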

Great material, anyway. The trouble is that it doesn’t help Wallis with the original problem, which — in this notation — would have $p = \frac12$ and $q = \frac12$. What he really wanted was the Binomial Theorem, but western mathematicians didn’t know it yet. Here a bit of luck comes in. He had noticed there’s a relationship between terms in one column and terms in another, particularly, that

$a_{p, q} = \frac{p + q}{q} a_{p, q - 1}$

So why shouldn’t that hold if p and q aren’t whole numbers? … We would today say why should they hold? But Wallis was working with a different idea of mathematical rigor. He made assumptions that it turned out in this case were correct. Of course, had he been wrong, we wouldn’t have heard of any of this and I would have an essay on some other topic.

With luck in Wallis’s favor we can go back to making a table. What would the row for $p = \frac12$ look like? We’ll need both whole and half-integers. $p = \frac12, q = 0$ is easy; its reciprocal is 1. $p = \frac12, q = \frac12$ is also easy; that’s the insight Wallis had to start with. Its reciprocal is $\frac{4}{\pi}$. What about the rest? Use the equation just up above, relating $a_{p, q}$ to $a_{p, q - 1}$; then we can start to fill in:

| q | 0 | 1/2 | 1 | 3/2 | 2 | 5/2 | 3 | 7/2 |
|---|---|---|---|---|---|---|---|---|
| p = 1/2 | 1 | $\frac{4}{\pi}$ | $\frac{3}{2}$ | $\frac{4}{3}\frac{4}{\pi}$ | $\frac{3\cdot 5}{2\cdot 4}$ | $\frac{2\cdot 4}{5}\frac{4}{\pi}$ | $\frac{3\cdot 5\cdot 7}{2\cdot 4\cdot 6}$ | $\frac{2\cdot 2\cdot 4\cdot 4}{5\cdot 7}\frac{4}{\pi}$ |
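We can replay the construction of that row in Python, seeding the two chains with 1 and $\frac{4}{\pi}$ and stepping with the recurrence (approximate floating point, since π is involved):

```python
import math

# Rebuild Wallis's p = 1/2 row. The recurrence a(q) = ((p + q)/q) * a(q-1)
# steps q by a whole 1, so the whole-number entries grow from the seed
# a(0) = 1 and the half-integer entries from the seed a(1/2) = 4/pi.
p = 0.5
row = {0.0: 1.0, 0.5: 4.0 / math.pi}
q = 1.0
while q <= 8.0:
    row[q] = (p + q) / q * row[q - 1.0]
    q += 0.5

entries = [row[k / 2] for k in range(17)]
# Reading left to right the entries increase, which is the squeeze's fuel.
assert all(a < b for a, b in zip(entries, entries[1:]))
print(entries[:5])   # 1, 4/pi, 3/2, (4/3)(4/pi), (3*5)/(2*4)
```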

Anything we can learn from this? … Well, sure. For one, as we go left to right, all these entries are increasing. So, like, the second column is less than the third which is less than the fourth. Here’s a triple inequality for you:

$\frac{4}{\pi} < \frac{3}{2} < \frac{4}{3}\frac{4}{\pi}$

Multiply all that through by, oh, $\frac{\pi}{2}$. And then divide it all through by $\frac{3}{2}$. What have we got?

$\frac{2\cdot 2}{3} < \frac{\pi}{2} < \frac{2\cdot 2}{3}\cdot \frac{2\cdot 2}{3}$

I did some rearranging of terms, but, that’s the pattern. One-half π has to be between $\frac{2\cdot 2}{3}$ and four-thirds that.

Move over a little. Start from the column where $q = \frac32$. This starts us out with

$\frac{4}{3}\frac{4}{\pi} < \frac{3\cdot 5}{2\cdot 4} < \frac{2\cdot 4}{5}\frac{4}{\pi}$

Multiply everything by $\frac{\pi}{4}$, and divide everything by $\frac{15}{16}$, and follow with some symbol manipulation. And here’s a tip which would have saved me some frustration working out my notes: $\frac{\pi}{4} = \frac{\pi}{2}\cdot\frac{3}{6}$. Also, 6 equals 2 times 3. Later on, you may want to remember that 8 equals 2 times 4. All this gets us eventually to

$\frac{2\cdot 2\cdot 4\cdot 4}{3\cdot 3\cdot 5} < \frac{\pi}{2} < \frac{2\cdot 2\cdot 4\cdot 4}{3\cdot 3\cdot 5}\cdot \frac{6}{5}$

Move over to the next terms, starting from $q = \frac52$. This will get us eventually to

$\frac{2\cdot 2\cdot 4\cdot 4 \cdot 6 \cdot 6}{3\cdot 3\cdot 5\cdot 5\cdot 7} < \frac{\pi}{2} < \frac{2\cdot 2\cdot 4\cdot 4 \cdot 6 \cdot 6}{3\cdot 3\cdot 5\cdot 5\cdot 7}\cdot \frac{8}{7}$

You see the pattern here. Whatever the value of $\frac{\pi}{2}$, it’s squeezed between some number, on the left side of this triple inequality, and that same number times … uh … something like $\frac{10}{9}$ or $\frac{12}{11}$ or $\frac{14}{13}$ or $\frac{1,000,000,000,002}{1,000,000,000,001}$. That last one is a number very close to 1. So the conclusion is that $\frac{\pi}{2}$ has to equal whatever that pattern is making for the number on the left there.

We can make this more rigorous. Like, we don’t have to just talk about squeezing the number we want between two nearly-equal values. We can rely on the use of the … Squeeze Theorem … to prove this is okay. And there’s much we have to straighten out. Particularly, we really don’t want to write out expressions like

$\frac{2\cdot 2 \cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8 \cdot 10\cdot 10 \cdots}{3\cdot 3\cdot 5\cdot 5 \cdot 7\cdot 7 \cdot 9\cdot 9 \cdot 11\cdot 11 \cdots}$

Put that way, it looks like, well, we can divide each 3 in the denominator into a 6 in the numerator to get a 2, each 5 in the denominator to a 10 in the numerator to get a 2, and so on. We get a product that’s infinitely large, instead of anything to do with π. This is that problem where arithmetic on infinitely long strings of things becomes dangerous. To be rigorous, we need to write this product as the limit of a sequence, with finite numerator and denominator, and be careful about how to compose the numerators and denominators.

But this is all right. Wallis found a lovely result and in a way that’s common to much work in mathematics. It used a combination of insight and persistence, with pattern recognition and luck making a great difference. Often when we first find something the proof of it is rough, and we need considerable work to make it rigorous. The path that got Wallis to these products is one we still walk.

There’s just three more essays to go this year! I hope to have the letter X published here, Thursday. All the other A-to-Z essays for this year are also at that link. And past A-to-Z essays are at this link. Thanks for reading.

## My 2019 Mathematics A To Z: Taylor Series

Today’s A To Z term was nominated by APMA, author of the Everybody Makes DATA blog. It was a topic that delighted me to realize I could explain. Then it started to torment me as I realized there is a lot to explain here, and I had to pick something. So here’s where things ended up.

# Taylor Series.

In the mid-2000s I was teaching at a department being closed down. In its last semester I had to teach Computational Quantum Mechanics. The person who’d normally taught it had transferred to another department. But a few last majors wanted the old department’s version of the course, and this pressed me into the role. Teaching a course you don’t really know is a rush. It’s a semester of learning, and trying to think deeply enough that you can convey something to students. This while all the regular demands of the semester eat your time and working energy. And this in the leap of faith that the syllabus you made up, before you truly knew the subject, will be nearly enough right. And that you have not committed to teaching something you do not understand.

So around mid-course I realized I needed to explain finding the wave function for a hydrogen atom with two electrons. The wave function is this probability distribution. You use it to find things like the probability a particle is in a certain area, or has a certain momentum. Things like that. A proton with one electron is as much as I’d ever done, as a physics major. We treat the proton as the center of the universe, immobile, and the electron hovers around that somewhere. Two electrons, though? A thing repelling your electron, and repelled by your electron, and neither of those having fixed positions? What the mathematics of that must look like terrified me. When I couldn’t procrastinate it farther I accepted my doom and read exactly what it was I should do.

It turned out I had known what I needed for nearly twenty years already. Got it in high school.

Of course I’m discussing Taylor Series. The equations were loaded down with symbols, yes. But at its core, the important stuff, was this old and trusted friend.

The premise behind a Taylor Series is even older than that. It’s universal. If you want to do something complicated, try doing the simplest thing that looks at all like it. And then make that a little bit more like you want. And then a bit more. Keep making these little improvements until you’ve got it as right as you truly need. Put that vaguely, the idea describes Taylor series just as well as it describes making a video game or painting a state portrait. We can make it more specific, though.

A series, in this context, means the sum of a sequence of things. This can be finitely many things. It can be infinitely many things. If the sum makes sense, we say the series converges. If the sum doesn’t, we say the series diverges. When we first learn about series, the sequences are all numbers. $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$, for example, which diverges. (It adds to a number bigger than any finite number.) Or $1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots$, which converges. (It adds to $\frac{1}{6}\pi^2$.)

In a Taylor Series, the terms are all polynomials. They’re simple polynomials. Let me call the independent variable ‘x’. Sometimes it’s ‘z’, for the reasons you would expect. (‘x’ usually implies we’re looking at real-valued functions. ‘z’ usually implies we’re looking at complex-valued functions. ‘t’ implies it’s a real-valued function with an independent variable that represents time.) Each of these terms is simple. Each term is the distance between x and a reference point, raised to a whole power, and multiplied by some coefficient. The reference point is the same for every term. What makes this potent is that we use, potentially, many terms. Infinitely many terms, if need be.

Call the reference point ‘a’. Or if you prefer, x0. z0 if you want to work with z’s. You see the pattern. This ‘a’ is the “point of expansion”. The coefficients of each term depend on the original function at the point of expansion. The coefficient for the term that has $(x - a)$ is the first derivative of f, evaluated at a. The coefficient for the term that has $(x - a)^2$ is the second derivative of f, evaluated at a (times a number that’s the same for the squared-term for every Taylor Series). The coefficient for the term that has $(x - a)^3$ is the third derivative of f, evaluated at a (times a different number that’s the same for the cubed-term for every Taylor Series).

You’ll never guess what the coefficient for the term with $(x - a)^{122,743}$ is. Nor will you ever care. The only reason you would wish to is to answer an exam question. The instructor will, in that case, have a function that’s either the sine or the cosine of x. The point of expansion will be 0, $\frac{\pi}{2}$, $\pi$, or $\frac{3\pi}{2}$.

Otherwise you will trust that this is one of the terms of $(x - a)^n$, ‘n’ representing some counting number too great to be interesting. All the interesting work will be done with the Taylor series either truncated to a couple terms, or continued on to infinitely many.

What a Taylor series offers is the chance to approximate a function we’re genuinely interested in with a polynomial. This is worth doing, usually, because polynomials are easier to work with. They have nice analytic properties. We can automate taking their derivatives and integrals. We can set a computer to calculate their value at some point, if we need that. We might have no idea how to start calculating the logarithm of 1.3. We certainly have an idea how to start calculating $0.3 - \frac{1}{2}(0.3^2) + \frac{1}{3}(0.3^3)$. (Yes, it’s 0.3. I’m using a Taylor series with a = 1 as the point of expansion.)

The first couple terms tell us interesting things. Especially if we’re looking at a function that represents something physical. The first two terms tell us where an equilibrium might be. The next term tells us whether an equilibrium is stable or not. If it is stable, it tells us how perturbations, points near the equilibrium, behave.

The first couple terms will describe a line, or a quadratic, or a cubic, some simple function like that. Usually adding more terms will make this Taylor series approximation a better fit to the original. There might be a larger region where the polynomial and the original function are close enough. Or the difference between the polynomial and the original function will be closer together on the same old region.

We would really like that region to eventually grow to the whole domain of the original function. We can’t count on that, though. Roughly, the interval of convergence will stretch from ‘a’ to wherever the first weird thing happens. Weird things are, like, discontinuities. Vertical asymptotes. Anything you don’t like dealing with in the original function, the Taylor series will refuse to deal with. Outside that interval, the Taylor series diverges and we just can’t use it for anything meaningful. Which is almost supernaturally weird of them. The Taylor series uses information about the original function, but it’s all derivatives at a single point. Somehow the derivatives of, say, the logarithm of x around x = 1 give a hint that the logarithm of 0 is undefinable. And so they won’t help us calculate the logarithm of 3.

Things can be weirder. There are functions that just break Taylor series altogether. Some are obvious. A function needs lots of derivatives at a point to have a good Taylor series approximation. So, many fractal curves won’t have a Taylor series approximation. These curves are all corners, points where they aren’t continuous or where derivatives don’t exist. Some are obviously designed to break Taylor series approximations. We can make a function that follows different rules if x is rational than if x is irrational. There’s no approximating that, and you’d blame the person who made such a function, not the Taylor series. It can be subtle. The function defined by the rule $f(x) = \exp{-\frac{1}{x^2}}$, with the note that if x is zero then f(x) is 0, seems to satisfy everything we’d look for. It’s a function that’s mostly near 1, that drops down to being near zero around where x = 0. But its Taylor series expansion around a = 0 is a horizontal line always at 0. The interval of convergence can be a single point, challenging our idea of what an interval is.

That’s all right. If we can trust that we’re avoiding weird parts, Taylor series give us an outstanding new tool. Grant that the Taylor series describes a function with the same rule as our original function. The Taylor series is often easier to work with, especially if we’re working on differential equations. We can automate, or at least find formulas for, taking the derivative of a polynomial. Or adding together derivatives of polynomials. Often we can attack a differential equation too hard to solve otherwise by supposing the answer is a polynomial. This is essentially what that quantum mechanics problem used, and why the tool was so familiar when I was in a strange land.

Roughly. What I was actually doing was treating the function I wanted as a power series. This is, like the Taylor series, the sum of a sequence of terms, all of which are $(x - a)^n$ times some coefficient. What makes it not a Taylor series is that the coefficients weren’t the derivatives of any function I knew to start. But the experience of Taylor series trained me to look at functions as things which could be approximated by polynomials.

This gives us the hint to look at other series that approximate interesting functions. We get a host of these, with names like Laurent series and Fourier series and Chebyshev series and such. Laurent series look like Taylor series but we allow powers to be negative integers as well as positive ones. Fourier series do away with polynomials. They instead use trigonometric functions, sines and cosines. Chebyshev series build on polynomials, but not on pure powers. They’ll use orthogonal polynomials. These behave like perpendicular directions do. That orthogonality makes many numerical techniques behave better.

The Taylor series is a great introduction to these tools. Its first several terms have good physical interpretations. Its calculation requires tools we learn early on in calculus. The habits of thought it teaches guide us even in unfamiliar territory.

And I feel very relieved to be done with this. I often have a few false starts to an essay, but those are mostly before I commit words to text editor. This one had about four branches that now sit in my scrap file. I’m glad to have a deadline forcing me to just publish already.

Thank you, though. This and the essays for the Fall 2019 A to Z should be at this link. Next week: the letters U and V. And all past A to Z essays ought to be at this link.

## Reading the Comics, May 23, 2018: Nice Warm Gymnasium Edition

I haven’t got any good ideas for the title for this collection of mathematically-themed comic strips. But I was reading the Complete Peanuts for 1999-2000 and just ran across one where Rerun talked about consoling his basketball by bringing it to a nice warm gymnasium somewhere. So that’s where that pile of words came from.

Mark Anderson’s Andertoons for the 21st is the Mark Anderson’s Andertoons for this installment. It has Wavehead suggest a name for the subtraction of fractions. It’s not by itself an absurd idea. Many mathematical operations get specialized names, even though we see them as specific cases of some more general operation. This may reflect the accidents of history. We have different names for addition and subtraction, though we eventually come to see them as the same operation.

In calculus we get introduced to Maclaurin Series. These are polynomials that approximate more complicated functions. They’re the best possible approximations for a region around 0 in the domain. They’re special cases of the Taylor Series. Those are polynomials that approximate more complicated functions, but you get to pick where in the domain they should be the best approximation. Maclaurin series are nothing but Taylor series centered on 0; we keep the names separate anyway, for reasons, and slightly baffling ones. James Gregory and Brook Taylor studied Taylor series before Colin Maclaurin did Maclaurin series. But at least Taylor worked on Taylor series, and Maclaurin on Maclaurin series. So for a wonder mathematicians named these things for appropriate people. (Ignoring that Indian mathematicians were poking around this territory centuries before the Europeans were. I don’t know whether English mathematicians of the 18th century could be expected to know of Indian work in the field, in fairness.)

In numerical calculus, we have a scheme for approximating integrals known as the trapezoid rule. It approximates the area under a curve by treating the curve as a series of trapezoids. (Any questions?) But this is one of the Runge-Kutta methods. Nobody calls it that except to show they know neat stuff about Runge-Kutta methods. The special names serve to pick out particularly interesting or useful cases of a more generally used thing. Wavehead’s coinage probably won’t go anywhere, but it doesn’t hurt to ask.
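A minimal sketch of the trapezoid rule as described, assuming we just sum up trapezoid areas of equal width; the names here are my own:

```python
def trapezoid(f, a, b, n=1000):
    # Approximate the integral of f from a to b by n trapezoids.
    # Each trapezoid's area is (average of its two endpoint heights) * width.
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

# Example: the integral of x^2 from 0 to 1 is exactly 1/3.
approx = trapezoid(lambda x: x * x, 0.0, 1.0)
```

The rule is exact for straight lines, which is why treating a curve as a chain of short straight segments works as well as it does.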

Percy Crosby’s Skippy for the 22nd I admit I don’t quite understand. It mentions arithmetic anyway. I think it’s a joke about a textbook like this being good only if it’s got the questions and the answers. But it’s the rare Skippy that’s as baffling to me as most circa-1930 humor comics are.

Ham’s Life on Earth for the 23rd presents the blackboard full of symbols as an attempt to prove something challenging. In this case, to say something about the existence of God. It’s tempting to suppose that we could say something about the existence or nonexistence of God using nothing but logic. And there are mathematics fields that are very close to pure logic. But our scary friends in the philosophy department have been working on the ontological argument for a long while. They’ve found a lot of arguments that seem good, and that fall short for reasons that seem good. I’ll defer to their experience, and suppose that any mathematics-based proof would have the same problems.

Bill Amend’s FoxTrot Classics for the 23rd deploys a Maclaurin series. If you want to calculate the cosine of an angle, and you know the angle in radians, you can find the value by adding up the terms in an infinitely long series. So if θ is the angle, measured in radians, then its cosine will be:

$\cos\left(\theta\right) = \sum_{k = 0}^{\infty} \left(-1\right)^k \frac{\theta^{2k}}{\left(2k\right)!}$

60 degrees is $\frac{\pi}{3}$ in radians and you see from the comic how to turn this series into a thing to calculate. The series does, yes, go on forever. But since the terms alternate in sign, positive then negative then positive then negative, you get a break. Suppose all you want is the answer to within an error margin. Then you can stop adding up terms once you’ve gotten to a term that’s smaller than your error margin. So if you want the answer to within, say, 0.001, you can stop as soon as you find a term with absolute value less than 0.001.
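The stopping rule can be sketched in code. This is my illustration, not Amend’s: sum the alternating series until a term drops below the error margin, at which point, for an alternating series with shrinking terms, the error is no larger than that first omitted term.

```python
import math

def cos_series(theta, tolerance=0.001):
    # Sum (-1)^k * theta^(2k) / (2k)! until a term shrinks below the
    # tolerance. For an alternating series whose terms shrink toward
    # zero, the error is bounded by the first term left out.
    total, k = 0.0, 0
    while True:
        term = (-1) ** k * theta ** (2 * k) / math.factorial(2 * k)
        if abs(term) < tolerance:
            break
        total += term
        k += 1
    return total

# 60 degrees is pi/3 radians; the cosine of 60 degrees is 0.5.
approx = cos_series(math.pi / 3)
```

For the angle in the comic, five terms already land within the 0.001 margin.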

For high school trig, though, this is all overkill. There are five really interesting angles you’d be expected to know anything about. They’re 0, 30, 45, 60, and 90 degrees. And you need to know about reflections of those across the horizontal and vertical axes. Those give you, like, -30 degrees or 135 degrees. Those reflections don’t change the magnitude of the cosines or sines. They might change the plus-or-minus sign is all. And there’s only three pairs of numbers that turn up for these five interesting angles. There’s 0 and 1. There’s $\frac{1}{2}$ and $\frac{\sqrt{3}}{2}$. There’s $\frac{1}{\sqrt{2}}$ and $\frac{1}{\sqrt{2}}$. Three things to memorize, plus a bit of orienteering, to know whether the cosine or the sine should be the larger in size and whether each should be positive or negative. And then you’ve got them all.

You might get asked for, like, the sine of 15 degrees. But that’s someone testing whether you know the angle-addition or angle-subtraction formulas. Or the half-angle and double-angle formulas. Nobody would expect you to know the cosine of 15 degrees. The cosine of 30 degrees, though? Sure. It’s $\frac{\sqrt{3}}{2}$.

Mike Thompson’s Grand Avenue for the 23rd is your basic confused-student joke. People often have trouble going from percentages to decimals to fractions and back again. Me, I have trouble in going from percentage chances to odds, as in, “two to one odds” or something like that. (Well, “one to one odds” I feel confident in, and “two to one” also. But, say, “seven to five odds” I can’t feel sure I understand, other than that the second choice is perceived to be a bit more likely than the first.)
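For what it’s worth, here is a sketch of the usual betting convention, assuming “a to b odds” is read as odds against the event, that is, a ways to fail for every b ways to succeed; the function name is mine:

```python
def odds_against_to_probability(a, b):
    # "a to b odds against" an event: a ways for it to fail for
    # every b ways for it to happen, so the probability is b / (a + b).
    return b / (a + b)

# "Two to one odds" against: probability 1/3.
# "Seven to five odds" against: probability 5/12, a bit under half.
p = odds_against_to_probability(7, 5)
```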

… You know, this would have parsed as the Maclaurin Series Edition, wouldn’t it? Well, if only I were able to throw away words I’ve already written and replace them with better words before publishing, huh?

I was all set to say how complaining about GoComics.com’s pages not loading had gotten them fixed. But they only worked for Monday alone; today they’re broken again. Right. I haven’t tried sending an error report again; we’ll see if that works. Meanwhile, I’m still not through last week’s comic strips and I had just enough for one day to nearly enough justify an installment for the one day. Should finish off the rest of the week next essay, probably in time for next week.

Mark Leiknes’s Cow and Boy rerun for the 23rd circles around some of Zeno’s Paradoxes. At the heart of some of them is the question of whether a thing can be divided infinitely many times, or whether there must be some smallest amount of a thing. Zeno wonders about space and time, but you can do as well with substance, with matter. Mathematics majors like to say the problem is easy; Zeno just didn’t realize that a sum of infinitely many things could be a finite and nonzero number. This misses the good question of how the sum of infinitely many things, none of which are zero, can be anything but infinitely large. Or, put another way, what’s different in adding $\frac11 + \frac12 + \frac13 + \frac14 + \cdots$ and adding $\frac11 + \frac14 + \frac19 + \frac{1}{16} + \cdots$ that the one is infinitely large and the other not?
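A quick numerical illustration of the contrast, my sketch rather than the comic’s: partial sums of the first series creep upward without bound, while partial sums of the second settle toward $\frac{\pi^2}{6} \approx 1.6449$.

```python
def partial_sums(terms, n):
    # Add up terms(k) for k = 1 .. n.
    return sum(terms(k) for k in range(1, n + 1))

# The harmonic series grows like log(n): it passes any bound eventually.
harmonic = partial_sums(lambda k: 1 / k, 100000)
# The sum of 1/k^2 settles down, toward pi^2 / 6 = 1.6449...
squares = partial_sums(lambda k: 1 / k ** 2, 100000)
```

After a hundred thousand terms the first sum is only a bit past 12, but it never stops growing; the second has already locked in four decimal places of its limit.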

Or how about this. Pick your favorite string of digits. 23. 314. 271828. Whatever. Add together the series $\frac11 + \frac12 + \frac13 + \frac14 + \cdots$, except that you omit any terms that have your favorite string there. So, if you picked 23, don’t add $\frac{1}{23}$, or $\frac{1}{123}$, or $\frac{1}{802301}$ or such. That depleted series does converge. The heck is happening there? (Here’s why it’s true for a single digit being thrown out. Showing it’s true for longer strings of digits takes more work but not really different work.)
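Here is a small sketch of building such a depleted partial sum, with “23” as the favorite string. The function name is my own, and a finite partial sum can only hint at the convergence:

```python
def depleted_partial_sum(banned="23", n=100000):
    # Sum 1/k for k = 1 .. n, skipping any k whose decimal digits
    # contain the banned string. The full, infinite version of this
    # series converges, unlike the harmonic series it comes from.
    return sum(1 / k for k in range(1, n + 1) if banned not in str(k))

s = depleted_partial_sum()
```

What rescues convergence is that, as numbers get longer, almost all of them contain your string somewhere, so almost all terms get thrown away.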

J C Duffy’s Lug Nuts for the 23rd is, I think, the first time I have to give a content warning for one of these. It’s a porn-movie advertisement spoof. But it mentions Einstein and Pi and has the tagline “she didn’t go for eggheads … until he showed her a new equation!”. So, you know, it’s using mathematics skill as a signifier of intelligence and riffing on the idea that nerds like sex too.

John Graziano’s Ripley’s Believe It or Not for the 23rd has a bit of trivia that made me initially think “not”. It notes Vince Parker, Senior and Junior, of Alabama were both born on Leap Day, the 29th of February. I’ll accept this without further proof because of the very slight harm that would befall me were I to accept this wrongly. But it also asserted this was a 1-in-2.1-million chance. That sounded wrong. Whether it is depends on what you think the chance is of.

Because what’s the remarkable thing here? That a father and son have the same birthday? Surely the chance of that is 1 in 365. The father could be born any day of the year; the son, also any day. Trusting there’s no influence of the father’s birthday on the son’s, then, 1 in 365 it is. Or, well, 1 in about 365.25, since there are leap days. There’s approximately one leap day every four years, so, surely that, right?

And not quite. In four years there’ll be 1,461 days. Four of them will be the 29th of January and four the 29th of September and four the 29th of August and so on. So if the father was born any day but leap day (a “non-bissextile day”, if you want to use a word that starts a good fight in a Scrabble match), the chance the son’s birth is the same is 4 chances in 1,461. 1 in 365.25. If the father was born on Leap Day, then the chance the son was born the same day is only 1 chance in 1,461. Still way short of 1-in-2.1-million. So, Graziano’s Ripley’s is wrong if that’s the chance we’re looking at.

Ah, but what if we’re looking at a different chance? What if we’re looking for the chance that the father is born the 29th of February and the son is also born the 29th of February? There’s a 1-in-1,461 chance the father’s born on Leap Day. And a 1-in-1,461 chance the son’s born on Leap Day. And if those events are independent, the father’s birth date not influencing the son’s, then the chance of both those together is indeed 1 in 2,134,521. So Graziano’s Ripley’s is right if that’s the chance we’re looking at.
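The arithmetic in these two paragraphs can be checked directly. A sketch, assuming the essay’s 1,461-day cycle and the independence of the two birthdays:

```python
# One four-year cycle: 4 * 365 ordinary days plus one Leap Day.
DAYS = 4 * 365 + 1  # 1461

# Father born on an ordinary date: that date occurs 4 times per cycle,
# so the chance the son matches it is 4 in 1461, i.e. 1 in 365.25.
p_same_ordinary = 4 / DAYS

# Chance of any one birth landing on Leap Day: 1 in 1461.
p_leap = 1 / DAYS

# Both father and son on Leap Day, treated as independent events:
# 1 in 1461 * 1461 = 1 in 2,134,521, the "1 in 2.1 million" figure.
p_both_leap = p_leap * p_leap
```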

Which is a good reminder: if you want to work out the probability of some event, work out precisely what the event is. Ordinary language is ambiguous. This is usually a good thing. But it’s fatal to discussing probability questions sensibly.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 23rd presents his mathematician discovering a new set of numbers. This will happen. Mathematics has had great success, historically, finding new sets of things that look only a bit like numbers as they were previously understood. And showing that if they follow rules that are, as much as possible, like those of the old numbers, we get useful stuff out of them. The mathematician claims to be a formalist, in the punch line. This is a philosophy that considers mathematical results to be the things you get by starting with some symbols and some rules for manipulating them. What this stuff means, and whether it reflects anything of interest in the real world, isn’t of interest. We can know the results are good because they follow the rules.

This sort of approach can be fruitful. It can force you to accept results that are true but intuition-defying. And it can give results impressive confidence. You can even, at least in principle, automate the creating and the checking of logical proofs. The disadvantages are that it takes forever to get anything done. And it’s hard to shake the idea that we ought to have some idea what any of this stuff means.

## Something Cute I Never Noticed Before About Infinite Sums

This is a trifle, for which I apologize. I’ve been sick. But I ran across this while reading Carl B Boyer’s The History of the Calculus and its Conceptual Development. This is from the chapter “A Century Of Anticipation”, developments leading up to Newton and Leibniz and The Calculus As We Know It. In particular, while working out the indefinite integrals for simple powers — x raised to a whole number — John Wallis, whom you’ll remember from such things as the first use of the ∞ symbol and beating up Thomas Hobbes for his lunch money, noted this:

$\frac{0 + 1}{1 + 1} = \frac{1}{2}$

Which is fine enough. But then Wallis also noted that

$\frac{0 + 1 + 2}{2 + 2 + 2} = \frac{1}{2}$

And furthermore that

$\frac{0 + 1 + 2 + 3}{3 + 3 + 3 + 3} = \frac{1}{2}$

$\frac{0 + 1 + 2 + 3 + 4}{4 + 4 + 4 + 4 + 4} = \frac{1}{2}$

$\frac{0 + 1 + 2 + 3 + 4 + 5}{5 + 5 + 5 + 5 + 5 + 5} = \frac{1}{2}$

And isn’t that neat? Wallis goes on to conclude that this is true not just for finitely many terms in the numerator and denominator, but also if you carry on infinitely far. This seems like a dangerous leap to make, but they treated infinities and infinitesimals dangerously in those days.

What makes this work is — well, it’s just true; explaining how that can be is kind of like explaining how it is circles have a center point. All right. But we can prove that this has to be true at least for finite terms. A sum like 0 + 1 + 2 + 3 is an arithmetic progression. It’s the sum of a finite number of terms, each of them an equal difference from the one before or the one after (or both).

Its sum will be equal to the number of terms times the arithmetic mean of the first and last. That is, it’ll be the number of terms times the sum of the first and the last terms, divided by two. So if we have the sum 0 + 1 + 2 + 3 + up to whatever number you like, which we’ll call ‘N’, we know its value has to be (N + 1) times N divided by 2. That takes care of the numerator.

The denominator, well, that’s (N + 1) cases of the number N being added together. Its value has to be (N + 1) times N. So the fraction is (N + 1) times N divided by 2, itself divided by (N + 1) times N. That’s got to be one-half except when N is zero. And if N were zero, well, that fraction would be 0 over 0 and we know what kind of trouble that is.
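That ratio can be checked mechanically for finitely many terms. A small sketch, with a function name of my own choosing:

```python
def wallis_ratio(n):
    # (0 + 1 + ... + n) divided by (n + n + ... + n),
    # with n + 1 copies of n in the denominator.
    numerator = sum(range(n + 1))
    denominator = n * (n + 1)
    return numerator / denominator

# The ratio is exactly one-half for every positive n.
ratios = [wallis_ratio(n) for n in range(1, 10)]
```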

It’s a tiny result, although you can use it to make an argument about what to expect from $\int x^n \, dx$, as Wallis did. And it delighted me to see it and to understand why it should be so.

## Calculating Pi Less Terribly

Back on “Pi Day” I shared a terrible way of calculating the digits of π. It’s neat in principle, yes. Drop a needle randomly on a uniformly lined surface. Keep track of how often the needle crosses over a line. From this you can work out the numerical value of π. But it’s a terrible method. To be sure that π is about 3.14, rather than 3.12 or 3.38, you can expect to need to do over three and a third million needle-drops. So I described this as a terrible way to calculate π.

A friend on Twitter asked if it was worse than adding up 4 * (1 – 1/3 + 1/5 – 1/7 + … ). It’s a good question. The answer is yes, it’s far worse than that. But I want to talk about working π out that way.
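For the curious, here’s a sketch of that sum, usually called the Leibniz series. It does converge to π, though painfully slowly; the function name is mine:

```python
def leibniz_pi(terms):
    # 4 * (1 - 1/3 + 1/5 - 1/7 + ...), summed for the given
    # number of terms.
    total = 0.0
    for k in range(terms):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total

# Even a thousand terms only pins down two decimal places or so.
approx = leibniz_pi(1000)
```

The slowness is the point of the comparison: each extra correct digit costs roughly ten times as many terms. Still, it beats millions of needle-drops.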

Continue reading “Calculating Pi Less Terribly”

## But How Interesting Is A Basketball Score?

When I worked out how interesting, in an information-theory sense, a basketball game — and from that, a tournament — might be, I supposed there was only one thing that might be interesting about the game: who won? Or to be exact, “did (this team) win”? But that isn’t everything we might want to know about a game. For example, we might want to know what a team scored. People often do. So how to measure this?

The answer was given, in embryo, in my first piece about how interesting a game might be. If you can list all the possible outcomes of something that has multiple outcomes, and how probable each of those outcomes is, then you can describe how much information there is in knowing the result. It’s the sum, for all of the possible results, of the quantity negative one times the probability of the result times the logarithm-base-two of the probability of the result. When we were interested in only whether a team won or lost there were just the two outcomes possible, which made for some fairly simple calculations, and indicates that the information content of a game can be as high as 1 — if the team is equally likely to win or to lose — or as low as 0 — if the team is sure to win, or sure to lose. And the units of this measure are bits, the same kind of thing we use to measure (in groups of bits called bytes) how big a computer file is.
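That sum can be written out in a few lines. A sketch of the information measure just described, with the example probabilities being my own picks for illustration:

```python
import math

def information(probabilities):
    # Information content in bits: the sum, over all outcomes, of
    # -1 * p * log2(p). Outcomes with probability zero contribute nothing.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

coin_flip = information([0.5, 0.5])   # evenly matched game: 1 bit
sure_thing = information([1.0])       # foregone conclusion: 0 bits
lopsided = information([0.9, 0.1])    # heavy favorite: about 0.47 bits
```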

## It Would Have Been One More Ride Because

I apologize for being slow writing the conclusion of the explanation for why my Dearly Beloved and I would expect one more ride following our plan to keep re-riding Disaster Transport as long as a fairly flipped coin came up tails. It’s been a busy week, and actually, I’d got stuck trying to think of a way to explain the sum I needed to take using only formulas that a normal person might find, or believe. I think I have it.

## Proving A Number Is Not 1

I want to do some more tricky examples of using this ε idea, where I show two numbers have to be the same because the difference between them is smaller than every positive number. Before I do, I want to put out a problem where we can show two numbers are not the same, since I think that makes it easier to see why the proof works where it does. It’s easy to get hypnotized by the form of an argument, and to not notice that the result doesn’t actually hold, particularly if all you ever see are proofs where things work out and never see cases where the proof is invalid.

## What Numbers Equal Zero?

I want to give some examples of showing numbers are equal by showing the difference between them is less than any ε. It’s a fairly abstruse idea, but when it works, amazing things become possible.

The easy example, although one that produces strong resistance, is showing that the number 1 is equal to the number 0.9999…. But here I have to say what I mean by that second number. It’s obvious to me that I mean a number formed by putting a decimal point up, and then filling in a ‘9’ to every digit past the decimal, repeating forever and ever without end. That’s a description so easy to grasp it looks obvious. I can give a more precise, less intuitively obvious, description, though, which makes it easier to prove what I’m going to be claiming.
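As a quick numerical illustration, mine rather than the promised careful argument: the gap between 1 and the truncation 0.99…9 with n nines is $10^{-n}$, which falls below any positive number you care to pick.

```python
def nines(n):
    # The number 0.99...9 with n nines after the decimal point,
    # built as 9/10 + 9/100 + ... + 9/10^n.
    return sum(9 * 10 ** -(k + 1) for k in range(n))

# Gaps between 1 and successive truncations: 0.1, 0.01, 0.001, ...
gaps = [1 - nines(n) for n in range(1, 8)]
```

Each extra nine shrinks the gap tenfold, which is the engine behind the ε argument to come.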