I am, believe it or not, working ahead of deadline on the Little Mathematics A-to-Z for this year. I feel so happy about that. But that’s eating up time to write fresh stuff here. So please let me share some older material, this from my prolific year 2016.
However, it’s not hard to create a number that you know is transcendental. Here’s how to do it, with an easy step-by-step guide. If you want to create this and declare it’s named after you, enjoy! Nobody but you will ever care about this number, I’m afraid. Its only interesting traits will be that it’s transcendental and that you crafted it. Still, isn’t that nice anyway? I think it’s nice anyway.
I’m happy to have a subject from Elke Stangl, author of elkemental Force. That’s a fun and wide-ranging blog which, among other things, just published a poem about proofs. You might enjoy.
One delight, and sometimes deadline frustration, of these essays is discovering things I had not thought about. Researching quadratic forms invited the obvious question of what is a form? And that goes undefined on, for example, Mathworld. Also in the textbooks I’ve kept. Even ones you’d think would mention, like R W R Darling’s Differential Forms and Connections, or Frigyes Riesz and Béla Sz-Nagy’s Functional Analysis. Reluctantly I started thinking about what we talk about when discussing forms.
Quadratic forms offer some hints. These take a vector in some n-dimensional space, and return a scalar. Linear forms, and cubic forms, do the same. The pattern suggests a form is a mapping from a space like to or maybe to . That looks good, but then we have to ask: isn’t that just an operator? Also: then what about differential forms? Or volume forms? These are about how to fill space. There’s nothing scalar in that. But maybe these are both called forms because they fill similar roles. They might have as little to do with one another as red pandas and giant pandas do.
Enlightenment comes after much consideration or happening on Wikipedia’s page about homogenous polynomials. That offers “an algebraic form, or simply form, is a function defined by a homogeneous polynomial”. That satisfies. First, because it gets us back to polynomials. Second, because all the forms I could think of do have rules based in homogeneous polynomials. They might be peculiar polynomials. Volume forms, for example, have a polynomial in wedge products of differentials. But it counts.
A function’s homogenous if it scales a particular way. Evaluate it at some set of coordinates x, y, z, (more variables if you need). That’s some number (let’s say). Take all those coordinates and multiply them by the same constant; let me call that α. Evaluate the function at α x, α y α z, (α times more variables if you need). Then that value is αk times the original value of f. k is some constant. It depends on the function, but not on what x, y, z, (more) are.
For a quadratic form, this constant k equals 4. This is because in the quadratic form, all the terms in the polynomial are of the second degree. So, for example, is a quadratic form. So is ; the x times the y brings this to a second degree. Also a quadratic form is . So is .
This can have many variables. If we have a lot, we have a couple choices. One is to start using subscripts, and to write the form something like:
This is respectable enough. People who do a lot of differential geometry get used to a shortcut, the Einstein Summation Convention. In that, we take as implicit the summation instructions. So they’d write the more compact . Those of us who don’t do a lot of differential geometry think that looks funny. And we have more familiar ways to write things down. Like, we can put the collection of variables into an ordered n-tuple. Call it the vector . If we then think to put the numbers into a square matrix we have a great way of writing things. We have to manipulate the a little to make the matrix, but it’s nothing complicated. Once that’s done we can write the quadratic form as:
This uses matrix multiplication. The vector we assume is a column vector, a bunch of rows one column across. Then we have to take its transposition, one row a bunch of columns across, to make the matrix multiplication work out. If we don’t like that notation with its annoying superscripts? We can declare the bare ‘x’ to mean the vector, and use inner products:
This is easier to type at least. But what does it get us?
Looking at some quadratic forms may give us an idea. practically begs to be matched to an , and the name “the equation of a circle”. is less familiar, but to the crowd reading this, not much less familiar. Fill that out to and we have a hyperbola. If we have and let that then we have an ellipse, something a bit wider than it is tall. Similarly is a hyperbola still, just anamorphic.
If we expand into three variables we start to see spheres: just begs to equal . Or ellipsoids: , set equal to some (positive) , is something we might get from rolling out clay. Or hyperboloids: or , set equal to , give us nice shapes. (We can also get cylinders: equalling some positive number describes a tube.)
How about ? This also wants to be an ellipse. , to pick an easy number, is a rotated ellipse. The long axis is along the line described by . The short axis is along the line described by . How about — let me make this easy. ? The equation describes a hyperbola, but a rotated one, with the x- and y-axes as its asymptotes.
Do you want to take any guesses about three-dimensional shapes? Like, what might represent? If you’re thinking “ellipsoid, only it’s at an angle” you’re doing well. It runs really long in one direction, along the plane described by . It runs medium-size along the plane described by . It runs pretty short along the z-axis. We could run some more complicated shapes. Ellipses pointing in weird directions. Hyperboloids of different shapes. They’ll have things in common.
One is that they have obviously important axes. Axes of symmetry, particularly. There’ll be one for each dimension of space. An ellipse has a long axis and a short axis. An ellipsoid has a long, a middle, and a short. (It might be that two of these have the same length. If all three have the same length, you have a sphere, my friend.) A hyperbola, similarly, has two axes of symmetry. One of them is the midpoint between the two branches of the hyperbola. One of them slices through the two branches, through the points where the two legs come closest together. Hyperboloids, in three dimensions, have three axes of symmetry. One of them connects the points where the two branches of hyperboloid come closest together. The other two run perpendicular to that.
We can go on imagining more dimensions of space. We don’t need them. The important things are already there. There are, for these shapes, some preferred directions. The ones around which these quadratic-form shapes have symmetries. These directions are perpendicular to each other. These preferred directions are important. We call them “eigenvectors”, a partly-German name.
Eigenvectors are great for a bunch of purposes. One is that if the matrix A represents a problem you’re interested in? The eigenvectors are probably a great basis to solve problems in it. This is a change of basis vectors, which is the same work as doing a rotation. And it’s happy to report this change of coordinates doesn’t mess up the problem any. We can rewrite the problem to be easier.
And, roughly, any time we look at reflections in a Euclidean space, there’s a quadratic form lurking around. This leads us into interesting places. Looking at reflections encourages us to see abstract algebra, to see groups. That space can be rotated in infinitesimally small pieces gets us a kind of group named a Lie (pronounced ‘lee’) Algebra. Quadratic forms give us a way of classifying those.
Quadratic forms work in number theory also. There’s a neat theorem, the 15 Theorem. If a quadratic form, with integer coefficients, can produce all the integers from 1 through 15, then it can produce all positive numbers. For example, can, for sets of integers x, y, z, and w, add up to any positive number you like. (It’s not guaranteed this will happen. can’t produce 15.) We know of at least 54 combinations which generate all the positive integers, like and and such.
There’s more, of course. There always is. I spent time skimming Quadratic Forms and their Applications, Proceedings of the Conference on Quadratic Forms and their Applications. It was held at University College Dublin in July of 1999. It’s some impressive work. I can think of very little that I can describe. Even Winfried Scharlau’s On the History of the Algebraic Theory of Quadratic Forms, from page 229, is tough going. Ina Kersten’s Biography of Ernst Witt, one of the major influences on quadratic forms, is accessible. I’m not sure how much of the particular work communicates.
It’s easy at least to know what things this field is about, though. The things that we calculate. That they connect to novel and abstract places shows how close together arithmetic and dynamical systems and topology and group theory and number theory are, despite appearances.
I have another topic today suggested by Beth, of the I Didn’t Have My Glasses On …. inspiration blog. It overlaps a bit with other essays I’ve posted this A-to-Z sequence, but that’s all right. We get a better understanding of things by considering them from several perspectives. This one will be a bit more historical.
Pop science writer Isaac Asimov told a story he was proud of about his undergraduate days. A friend’s philosophy professor held court after class. One day he declared mathematicians were mystics, believing in things they even admit are “imaginary numbers”. Young Asimov, taking offense, offered to prove the reality of the square root of minus one, if the professor gave him one-half pieces of chalk. The professor snapped a piece of chalk in half and gave one piece to him. Asimov said this is one piece of chalk. The professor answered it was half the length of a piece of chalk and Asimov said that’s not what he asked for. Even if we accept “half the length” is okay, how do we know this isn’t 48 percent the length of a standard piece of chalk? If the professor was that bad on “one-half” how could he have opinions on “imaginary numbers”?
This story is another “STEM undergraduates outwitting the philosophy expert” legend. (Even if it did happen. What we know is the story Asimov spun it into, in which a plucky young science fiction fan out-argued someone whose job is forming arguments.) Richard Feynman tells a similar story, befuddling a philosophy class with the question of how we can prove a brick has a interior. It helps young mathematicians and science majors feel better about their knowledge. But Asimov’s story does get at a couple points. First, that “imaginary” is a terrible name for a class of numbers. The square root of minus one is as “real” as one-half is. Second, we’ve decided that one-half is “real” in some way. What the philosophy professor would have baffled Asimov to explain is: in what way is one-half real? Or minus one?
We’re introduced to imaginary numbers through polynomials. I mean in education. It’s usually right after getting into quadratics, looking for solutions to equations like . That quadratic has two solutions, but it’s possible to have a quadratic with only one, such as . Or to have a quadratic with no solutions, such as, iconically, . We might underscore that by plotting the curve whose x- and y-coordinates makes true the equation . There’s no point on the curve with a y-coordinate of zero, so, there we go.
Having established that has no solutions, the course then asks “what if we go ahead and say there was one”? Two solutions, in fact, and . This is all right for introducing the idea that mathematics is a tool. If it doesn’t do something we need, we can alter it.
But I see trouble in teaching someone how you can’t take square roots of negative numbers and then teaching them how to take square roots of negative numbers. It’s confusing at least. It needs some explanation about what changed. We might do better introducing them in a more historical method.
Historically, imaginary numbers (in the West) come from polynomials, yes. Different polynomials. Cubics, and quartics. Mathematicians still liked finding roots of them. Mathematicians would challenge one another to solve sets of polynomials. This seems hard to believe, but many sources agree on this. I hope we’re not all copying Eric Temple Bell here. (Bell’s Men of Mathematics is an inspiring collection of biographical sketches. But it’s not careful differentiating legends from documented facts.) And there are enough nerd challenges today that I can accept people daring one another to find solutions of .
Quadratics, equations we can write as for some real numbers a, b, and c, we’ve known about forever. Euclid solved these kinds of equations using geometric reasoning. Chinese mathematicians 2200 years ago described rules for how to find roots. The Indian mathematician Brahmagupta, by the early 7th century, described the quadratic formula to find at least one root. Both possible roots were known to Indian mathematicians a thousand years ago. We’ve reduced the formula today to
With that filtering into Western Europe, the search was on for similar formulas for other polynomials. This turns into several interesting threads. One is a tale of intrigue and treachery involving Gerolamo Cardano, Niccolò Tartaglia, and Ludovico Ferrari. I’ll save that for another essay because I have to cut something out, so of course I skip the dramatic thing. Another thread is the search for quadratic-like formulas for other polynomials. They exist for third-power and fourth-power polynomials. Not (generally) for the fifth- or higher-powers. That is, there are individual polynomials you can solve by formulas, like, . But stare at it and you can see where that’s “really” a quadratic pretending to be sixth-power. Finding there was no formula to find, though, lead people to develop group theory. And group theory underlies much of mathematics and modern physics.
The first great breakthrough solving the general cubic, , came near the end of the 14th century in some manuscripts out of Florence. It’s built on a transformation. Transformations are key to mathematics. The point of a transformation is to turn a problem you don’t know how to do into one you do. As I write this, MathWorld lists 543 pages as matching “transformation”. That’s about half what “polynomial” matches (1,199) and about three times “trigonometric” (184). So that can help you judge importance.
Here, the transformation to make is to write a related polynomial in terms of a new variable. You can call that new variable x’ if you like, or z. I’ll use z so as to not have too many superscript marks flying around. This will be a “depressed polynomial”. “Depressed” here means that at least one of the coefficients in the new polynomial is zero. (Here, for this problem, it means we won’t have a squared term in the new polynomial.) I suspect the term is old-fashioned.
Let z be the new variable, related to x by the equation . And then figure out what and are. Using all that, and the knowledge that , and a lot of arithmetic, you get to one of these three equations:
where p and q are some new coefficients. They’re positive numbers, or possibly zeros. They’re both derived from a, b, c, and d. And so in the 15th Century the search was on to solve one or more of these equations.
From our perspective in the 21st century, our first question is: what three equations? How are these not all the same equation? And today, yes, we would write this as one depressed equation, most likely . We would allow that p or q or both might be negative numbers.
And there is part of the great mysterious historical development. These days we generally learn about negative numbers. Once we are comfortable, our teachers hope, with those we get imaginary numbers. But in the Western tradition mathematicians noticed both, and approached both, at roughly the same time. With roughly similar doubts, too. It’s easy to point to three apples; who can point to “minus three” apples? We can arrange nine apples into a neat square. How big a square can we set “minus nine” apples in?
Hesitation and uncertainty about negative numbers would continue quite a long while. At least among Western mathematicians. Indian mathematicians seem to have been more comfortable with them sooner. And merchants, who could model a negative number as a debt, seem to have gotten the idea better.
But even seemingly simple questions could be challenging. John Wallis, in the 17th century, postulated that negative numbers were larger than infinity. Leonhard Euler seems to have agreed. (The notion may seem odd. It has echoes today, though. Computers store numbers as bit patterns. The normal scheme represents negative numbers by making the first bit in a pattern 1. These bit patterns make the negative numbers look bigger than the biggest positive numbers. And thermodynamics gives us a temperature defined by the relationship of energy to entropy. That definition implies there can be negative temperatures. Those are “hotter” — higher-energy, at least — than infinitely-high positive temperatures.) In the 18th century we see temperature scales designed so that the weather won’t give negative numbers too often. Augustus De Morgan wrote in 1831 that a negative number “occurring as the solution of a problem indicates some inconsistency or absurdity”. De Morgan was not an amateur. He coded the rules for deductive logic so well we still call them De Morgan’s laws. He put induction on a logical footing. And he found negative numbers (and imaginary numbers) a sign of defective work. In 1831. 1831!
But back to cubic equations. Allow that we’ve gotten comfortable enough with negative numbers we only want to solve the one depressed equation of . How to do it? … Another transformation, then. There are a couple you can do. Modern mathematicians would likely define a new variable w, set so that . This turns the depressed equation into
And this, believe it or not, is a disguised quadratic. Multiply everything in it by and move things around a little. You get
From there, quadratic formula to solve for . Then from that, take cube roots and you get three values of z. From that, you get your three values of x.
You see why nobody has taught this in high school algebra since 1959. Also why I am not touching the quartic formula, the equivalent of this for polynomials of degree four.
There are other approaches. And they can work out easier for particular problems. Take, for example, which I introduced in the first act. It’s past the time we set it off.
Rafael Bombelli, in the 1570s, pondered this particular equation. Notice it’s already depressed. A formula developed by Cardano addressed this, in the form . Notice that’s the second of the three sorts of depressed polynomial. Cardano’s formula says that one of the roots will be at
Put to this problem, we get something that looks like a compelling reason to stop:
Bombelli did not stop with that, though. He carried on as though these expressions of the square root of -121 made sense. And, if he did that he found these terms added up. You get an x of 4.
Which is true. It’s easy to check that it’s right. And here is the great surprising thing. Start from the respectable enough equation. It has nothing suspicious in it, not even negative numbers. Follow it through and you need to use negative numbers. Worse, you need to use the square roots of negative numbers. But keep going, as though you were confident in this, and you get a correct answer. And a real number.
We can get the other roots. Divide out of . What’s left is . You can use the quadratic formula for this. The other two roots are , about -0.268, and , about -3.732.
So here we have good reasons to work with negative numbers, and with imaginary numbers. We may not trust them. But they get us to correct answers. And this brings up another little secret of mathematics. If all you care about is an answer, then it’s all right to use a dubious method to get an answer.
There is a logical rigor missing in “we got away with it, I guess”. The name “imaginary numbers” tells of the disapproval of its users. We get the name from René Descartes, who was more generally discussing complex numbers. He wrote something like “in many cases no quantity exists which corresponds to what one imagines”.
John Wallis, taking a break from negative numbers and his other projects and quarrels, thought of how to represent imaginary numbers as branches off a number line. It’s a good scheme that nobody noticed at the time. Leonhard Euler envisioned matching complex numbers with points on the plane, but didn’t work out a logical basis for this. In 1797 Caspar Wessel presented a paper that described using vectors to represent complex numbers. It’s a good approach. Unfortunately that paper too sank without a trace, undiscovered for a century.
In 1806 Jean-Robert Argand wrote an “Essay on the Geometrical Interpretation of Imaginary Quantities”. Jacques Français got a copy, and published a paper describing the basics of complex numbers. He credited the essay, but noted that there was no author on the title page and asked the author to identify himself. Argand did. We started to get some good rigor behind the concept.
In 1831 William Rowan Hamilton, of Hamiltonian fame, described complex numbers using ordered pairs. Once we can define their arithmetic using the arithmetic of real numbers we have a second solid basis. More reason to trust them. Augustin-Louis Cauchy, who proved about four billion theorems of complex analysis, published a new construction of them. This used a group theory approach, a polynomial ring we denote as . I don’t have the strength to explain all that today. Matrices give us another approach. This matches complex numbers with particular two-row, two-column matrices. This turns the addition and multiplication of numbers into what Hamilton described.
And here we have some idea why mathematicians use negative numbers, and trust imaginary numbers. We are pushed toward them by convenience. Negative numbers let us work with one equation, , rather than three. (Or more than three equations, if we have to work with an x we know to be negative.) Imaginary numbers we can start with, and find answers we know to be true. And this encourages us to find reasons to trust the results. Having one line of reasoning is good. Having several lines — Argand’s geometric, Hamilton’s coordinates, Cauchy’s rings — is reassuring. We may not be able to point to an imaginary number of anything. But if we can trust our arithmetic on real numbers we can trust our arithmetic on imaginary numbers.
As I mentioned Descartes gave the name “imaginary number” to all of what we would now call “complex numbers”. Gauss published a geometric interpretation of complex numbers in 1831. And gave us the term “complex number”. Along the way he complained about the terminology, though. He noted “had +1, -1, and , instead of being called positive, negative, and imaginary (or worse still, impossible) unity, been given the names say, of direct, inverse, and lateral unity, there would hardly have been any scope for such obscurity”. I’ve never heard them term “impossible numbers”, except as an adjective.
The name of a thing doesn’t affect what it is. It can affect how we think about it, though. We can ask whether Asimov’s professor would dismiss “lateral numbers” as mysticism. Or at least as more mystical than “three” is. We can, in context, understand why Descartes thought of these as “imaginary numbers”. He saw them as something to use for the length of a calculation, and that would disappear once its use was done. We still have such concepts, things like “dummy variables” in a calculus problem. We can’t think of a use for dummy variables except to let a calculation proceed. But perhaps we’ll see things differently in four hundred years. Shall have to come back and check.
Today’s A To Z term was nominated by APMA, author of the Everybody Makes DATA blog. It was a topic that delighted me to realize I could explain. Then it started to torment me as I realized there is a lot to explain here, and I had to pick something. So here’s where things ended up.
In the mid-2000s I was teaching at a department being closed down. In its last semester I had to teach Computational Quantum Mechanics. The person who’d normally taught it had transferred to another department. But a few last majors wanted the old department’s version of the course, and this pressed me into the role. Teaching a course you don’t really know is a rush. It’s a semester of learning, and trying to think deeply enough that you can convey something to students. This while all the regular demands of the semester eat your time and working energy. And this in the leap of faith that the syllabus you made up, before you truly knew the subject, will be nearly enough right. And that you have not committed to teaching something you do not understand.
So around mid-course I realized I needed to explain finding the wave function for a hydrogen atom with two electrons. The wave function is this probability distribution. You use it to find things like the probability a particle is in a certain area, or has a certain momentum. Things like that. A proton with one electron is as much as I’d ever done, as a physics major. We treat the proton as the center of the universe, immobile, and the electron hovers around that somewhere. Two electrons, though? A thing repelling your electron, and repelled by your electron, and neither of those having fixed positions? What the mathematics of that must look like terrified me. When I couldn’t procrastinate it farther I accepted my doom and read exactly what it was I should do.
It turned out I had known what I needed for nearly twenty years already. Got it in high school.
Of course I’m discussing Taylor Series. The equations were loaded down with symbols, yes. But at its core, the important stuff, was this old and trusted friend.
The premise behind a Taylor Series is even older than that. It’s universal. If you want to do something complicated, try doing the simplest thing that looks at all like it. And then make that a little bit more like you want. And then a bit more. Keep making these little improvements until you’ve got it as right as you truly need. Put that vaguely, the idea describes Taylor series just as well as it describes making a video game or painting a state portrait. We can make it more specific, though.
A series, in this context, means the sum of a sequence of things. This can be finitely many things. It can be infinitely many things. If the sum makes sense, we say the series converges. If the sum doesn’t, we say the series diverges. When we first learn about series, the sequences are all numbers. , for example, which diverges. (It adds to a number bigger than any finite number.) Or , which converges. (It adds to .)
In a Taylor Series, the terms are all polynomials. They’re simple polynomials. Let me call the independent variable ‘x’. Sometimes it’s ‘z’, for the reasons you would expect. (‘x’ usually implies we’re looking at real-valued functions. ‘z’ usually implies we’re looking at complex-valued functions. ‘t’ implies it’s a real-valued function with an independent variable that represents time.) Each of these terms is simple. Each term is the distance between x and a reference point, raised to a whole power, and multiplied by some coefficient. The reference point is the same for every term. What makes this potent is that we use, potentially, many terms. Infinitely many terms, if need be.
Call the reference point ‘a’. Or if you prefer, x0. z0 if you want to work with z’s. You see the pattern. This ‘a’ is the “point of expansion”. The coefficients of each term depend on the original function at the point of expansion. The coefficient for the term that has is the first derivative of f, evaluated at a. The coefficient for the term that has is the second derivative of f, evaluated at a (times a number that’s the same for the squared-term for every Taylor Series). The coefficient for the term that has is the third derivative of f, evaluated at a (times a different number that’s the same for the cubed-term for every Taylor Series).
You’ll never guess what the coefficient for the term with is. Nor will you ever care. The only reason you would wish to is to answer an exam question. The instructor will, in that case, have a function that’s either the sine or the cosine of x. The point of expansion will be 0, , , or .
Otherwise you will trust that this is one of the terms of , ‘n’ representing some counting number too great to be interesting. All the interesting work will be done with the Taylor series either truncated to a couple terms, or continued on to infinitely many.
What a Taylor series offers is the chance to approximate a function we’re genuinely interested in with a polynomial. This is worth doing, usually, because polynomials are easier to work with. They have nice analytic properties. We can automate taking their derivatives and integrals. We can set a computer to calculate their value at some point, if we need that. We might have no idea how to start calculating the logarithm of 1.3. We certainly have an idea how to start calculating . (Yes, it’s 0.3. I’m using a Taylor series with a = 1 as the point of expansion.)
The first couple terms tell us interesting things. Especially if we’re looking at a function that represents something physical. The first two terms tell us where an equilibrium might be. The next term tells us whether an equilibrium is stable or not. If it is stable, it tells us how perturbations, points near the equilibrium, behave.
The first couple terms will describe a line, or a quadratic, or a cubic, some simple function like that. Usually adding more terms will make this Taylor series approximation a better fit to the original. There might be a larger region where the polynomial and the original function are close enough. Or the difference between the polynomial and the original function will be closer together on the same old region.
We would really like that region to eventually grow to the whole domain of the original function. We can’t count on that, though. Roughly, the interval of convergence will stretch from ‘a’ to wherever the first weird thing happens. Weird things are, like, discontinuities. Vertical asymptotes. Anything you don’t like dealing with in the original function, the Taylor series will refuse to deal with. Outside that interval, the Taylor series diverges and we just can’t use it for anything meaningful. Which is almost supernaturally weird of them. The Taylor series uses information about the original function, but it’s all derivatives at a single point. Somehow the derivatives of, say, the logarithm of x around x = 1 give a hint that the logarithm of 0 is undefinable. And so they won’t help us calculate the logarithm of 3.
Things can be weirder. There are functions that just break Taylor series altogether. Some are obvious. A function needs lots of derivatives at a point to have a good Taylor series approximation. So, many fractal curves won’t have a Taylor series approximation. These curves are all corners, points where they aren’t continuous or where derivatives don’t exist. Some are obviously designed to break Taylor series approximations. We can make a function that follows different rules if x is rational than if x is irrational. There’s no approximating that, and you’d blame the person who made such a function, not the Taylor series. It can be subtle. The function defined by the rule , with the note that if x is zero then f(x) is 0, seems to satisfy everything we’d look for. It’s a function that’s mostly near 1, that drops down to being near zero around where x = 0. But its Taylor series expansion around a = 0 is a horizontal line always at 0. The interval of convergence can be a single point, challenging our idea of what an interval is.
That’s all right. If we can trust that we’re avoiding weird parts, Taylor series give us an outstanding new tool. Grant that the Taylor series describes a function with the same rule as our original function. The Taylor series is often easier to work with, especially if we’re working on differential equations. We can automate, or at least find formulas for, taking the derivative of a polynomial. Or adding together derivatives of polynomials. Often we can attack a differential equation too hard to solve otherwise by supposing the answer is a polynomial. This is essentially what that quantum mechanics problem used, and why the tool was so familiar when I was in a strange land.
Roughly. What I was actually doing was treating the function I wanted as a power series. This is, like the Taylor series, the sum of a sequence of terms, all of which are times some coefficient. What makes it not a Taylor series is that the coefficients weren’t the derivatives of any function I knew to start. But the experience of Taylor series trained me to look at functions as things which could be approximated by polynomials.
This gives us the hint to look at other series that approximate interesting functions. We get a host of these, with names like Laurent series and Fourier series and Chebyshev series and such. Laurent series look like Taylor series but we allow powers to be negative integers as well as positive ones. Fourier series do away with polynomials. They instead use trigonometric functions, sines and cosines. Chebyshev series build on polynomials, but not on pure powers. They’ll use orthogonal polynomials. These behave like perpendicular directions do. That orthogonality makes many numerical techniques behave better.
The Taylor series is a great introduction to these tools. Its first several terms have good physical interpretations. Its calculation requires tools we learn early on in calculus. The habits of thought it teaches guides us even in unfamiliar territory.
And I feel very relieved to be done with this. I often have a few false starts to an essay, but those are mostly before I commit words to text editor. This one had about four branches that now sit in my scrap file. I’m glad to have a deadline forcing me to just publish already.
Today’s A To Z term is my pick again. So I choose the Julia Set. This is named for Gaston Julia, one of the pioneers in chaos theory and fractals. He was born earlier than you imagine. No, earlier than that: he was born in 1893.
The early 20th century saw amazing work done. We think of chaos theory and fractals as modern things, things that require vast computing power to understand. The computers help, yes. But the foundational work was done more than a century ago. Some of these pioneering mathematicians may have been able to get some numerical computing done. But many did not. They would have to do the hard work of thinking about things which they could not visualize. Things which surely did not look like they imagined.
We think of things as moving. Even static things we consider as implying movement. Else we’d think it odd to ask, “Where does that road go?” This carries over to abstract things, like mathematical functions. A function is a domain, a range, and a rule matching things in the domain to things in the range. It “moves” things as much as a dictionary moves words.
Yet we still think of a function as expressing motion. A common way for mathematicians to write functions uses little arrows, and describes what’s done as “mapping”. We might write . This is a general idea. We’re expressing that it maps things in the set D to things in the set R. We can use the notation to write something more specific. If ‘z’ is in the set D, we might write . This describes the rule that matches things in the domain to things in the range. represents the evaluation of this rule at a specific point, the one where the independent variable has the value ‘2’. represents the evaluation of this rule at a specific point without committing to what that point is. represents a collection of points. It’s the set you get by evaluating the rule at every point in D.
And it’s not bad to think of motion. Many functions are models of things that move. Particles in space. Fluids in a room. Populations changing in time. Signal strengths varying with a sensor’s position. Often we’ll calculate the development of something iteratively, too. If the domain and the range of a function are the same set? There’s no reason that we can’t take our z, evaluate f(z), and then take whatever that thing is and evaluate f(f(z)). And again. And again.
My age cohort, at least, learned to do this almost instinctively when we discovered you could take the result on a calculator and hit a function again. Calculate something and keep hitting square root; you get a string of numbers that eventually settle on 1. Or you started at zero. Calculate something and keep hitting square; you settle at either 0, 1, or grow to infinity. Hitting sine over and over … well, that was interesting since you might settle on 0 or some other, weird number. Same with tangent. Cosine you wouldn’t settle down to zero.
Serious mathematicians look at this stuff too, though. Take any set ‘D’, and find what its image is, f(D). Then iterate this, figuring out what f(f(D)) is. Then f(f(f(D))). f(f(f(f(D)))). And so on. What happens if you keep doing this? Like, forever?
We can say some things, at least. Even without knowing what f is. There could be a part of D that all these many iterations of f will send out to infinity. There could be a part of D that all these many iterations will send to some fixed point. And there could be a part of D that just keeps getting shuffled around without ever finishing.
Some of these might not exist. Like, doesn’t have any fixed points or shuffled-around points. It sends everything off to infinity. has only a fixed point; nothing from it goes off to infinity and nothing’s shuffled back and forth. has a fixed point and a lot of points that shuffle back and forth.
Thinking about these fixed points and these shuffling points gets us Julia Sets. These sets are the fixed points and shuffling-around points for certain kinds of functions. These functions are ones that have domain and range of the complex-valued numbers. Complex-valued numbers are the sum of a real number plus an imaginary number. A real number is just what it says on the tin. An imaginary number is a real number multiplied by . What is ? It’s the imaginary unit. It has the neat property that . That’s all we need to know about it.
Oh, also, zero times is zero again. So if you really want, you can say all real numbers are complex numbers; they’re just themselves plus . Complex-valued functions are worth a lot of study in their own right. Better, they’re easier to study (at the introductory level) than real-valued functions are. This is such a relief to the mathematics major.
And now let me explain some little nagging weird thing. I’ve been using ‘z’ to represent the independent variable here. You know, using it as if it were ‘x’. This is a convention mathematicians use, when working with complex-valued numbers. An arbitrary complex-valued number tends to be called ‘z’. We haven’t forgotten x, though. We just in this context use ‘x’ to mean “the real part of z”. We also use “y” to carry information about the imaginary part of z. When we write ‘z’ we hold in trust an ‘x’ and ‘y’ for which . This all comes in handy.
But we still don’t have Julia Sets for every complex-valued function. We need it to be a rational function. The name evokes rational numbers, but that doesn’t seem like much guidance. is a rational function. It seems too boring to be worth studying, though, and it is. A “rational function” is a function that’s one polynomial divided by another polynomial. This whether they’re real-valued or complex-valued polynomials.
So. Start with an ‘f’ that’s one complex-valued polynomial divided by another complex-valued polynomial. Start with the domain D, all of the complex-valued numbers. Find f(D). And f(f(D)). And f(f(f(D))). And so on. If you iterated this ‘f’ without limit, what’s the set of points that never go off to infinity? That’s the Julia Set for that function ‘f’.
There are some famous Julia sets, though. There are the Julia sets that we heard about during the great fractal boom of the 1980s. This was when computers got cheap enough, and their graphic abilities good enough, to automate the calculation of points in these sets. At least to approximate the points in these sets. And these are based on some nice, easy-to-understand functions. First, you have to pick a constant C. This C is drawn from the complex-valued numbers. But that can still be, like, ½, if that’s what interests you. For whatever your C is? Define this function:
And that’s it. Yes, this is a rational function. The numerator function is . The denominator function is .
This produces many different patterns. If you picked C = 0, you get a circle. Good on you for starting out with something you could double-check. If you picked C = -2? You get a long skinny line, again, easy enough to check. If you picked C = -1? Well, now you have a nice interesting weird shape, several bulging ovals with peninsulas of other bulging ovals all over. Pick other numbers. Pick numbers with interesting imaginary components. You get pinwheels. You get jagged streaks of lightning. You can even get separate islands, whole clouds of disjoint threatening-looking blobs.
There is some guessing what you’ll get. If you work out a Julia Set for a particular C, you’ll see a similar-looking Julia Set for a different C that’s very close to it. This is a comfort.
You can create a Julia Set for any rational function. I’ve only ever seen anyone actually do it for functions that look like what we already had. . Sometimes . I suppose once, in high school, I might have tried but I don’t remember what it looked like. If someone’s done, say, please write in and let me know what it looks like.
The Julia Set has a famous partner. Maybe the most famous fractal of them all, the Mandelbrot Set. That’s the strange blobby sea surrounded by lightning bolts that you see on the cover of every pop mathematics book from the 80s and 90s. If a C gives us a Julia Set that’s one single, contiguous patch? Then that C is in the Mandelbrot Set. Also vice-versa.
The ideas behind these sets are old. Julia’s paper about the iterations of rational functions first appeared in 1918. Julia died in 1978, the same year that the first computer rendering of the Mandelbrot set was done. I haven’t been able to find whether that rendering existed before his death. Nor have I decided which I would think the better sequence.
I had another free choice. I thought I’d go back to one of the topics I knew and loved in grad school even though I didn’t have the time to properly study it then. It turned out I had forgotten some important points and spent a night crash-relearning knot theory. This isn’t a bad thing necessarily.
This is a thing which comes from graphs. Not the graphs you ever drew in algebra class. Graphs as in graph theory. These figures made of spots called vertices. Pairs of vertices are connected by edges. There’s many interesting things to study about these.
One path to take in understanding graphs is polynomials. Of course I would bring things back to polynomials. But there’s good reasons. These reasons come to graph theory by way of knot theory. That’s an interesting development since we usually learn graph theory before knot theory. But knot theory has the idea of representing these complicated shapes as polynomials.
There are a bunch of different polynomials for any given graph. The oldest kind, the Alexander Polynomial, J W Alexander developed in the 1920s. And that was about it until the 1980s when suddenly everybody was coming up with good new polynomials. The definitions are different. They give polynomials that look different. Some are able to distinguish between a knot and the knot that’s its reflection across a mirror. Some, like the Alexander aren’t. But they’re common in some important ways. One is that they might not actually be, you know, polynomials. I mean, they’ll be the sum of numbers — whole numbers, even — times a variable raised to a power. The variable might be t, might be x. Might be something else, but it doesn’t matter. It’s a pure dummy variable. But the variable might be raised to a negative power, which isn’t really a polynomial. It might even be raised to, oh, one-half or three-halves, or minus nine-halves, or something like that. We can try saying this is “a polynomial in t-to-the-halves”. Mostly it’s because we don’t have a better name for it.
And going from a particular knot to a polynomial follows a pretty common procedure. At least it can, when you’re learning knot theory and feel a bit overwhelmed trying to prove stuff about “knot invariants” and “homologies” and all. Having a specific example can be such a comfort. You can work this out by an iterative process. Take a specific drawing of your knot. There’s places where the strands of the knot cross over one another. For each of those crossings you ponder some alternate cases where the strands cross over in a different way. And then you add together some coefficient times the polynomial of this new, different knot. The coefficient you get by the rules of whatever polynomial you’re making. The new, different knots are, usually, no more complicated than what you started with. They’re often simpler knots. This is what saves you from an eternity of work. You’re breaking the knot down into more but simpler knots. Just the fact of doing that can be satisfying enough. Eventually you get to something really simple, like a circle, and declare that’s some basic polynomial. Then there’s a lot bit of adding up coefficients and powers and all that. Tedious but not hard.
Knots are made from a continuous loop of … we’ll just call it thread. It can fold over itself many times. It has to, really, or it hasn’t got a chance of being more interesting than a circle. A graph is different. That there are vertices seems to change things. Less than you’d think, though. The thread of a knot can cross over and under itself. Edges of a graph can cross over and under other edges. This isn’t too different. We can also imagine replacing a spot where two edges cross over and under the other with an intersection and new vertex.
So we get to the Yamada polynomial by treating a graph an awful lot like we might treat a knot. Take the graph and split it up at each overlap. At each overlap we have something that looks, at least locally, kind of like an X. An upper left, upper right, lower left, and lower right intersection. The lower left connects to the upper right, and the upper left connects to the lower right. But these two edges don’t actually touch; one passes over the other. (By convention, the lower left going to the upper right is on top.)
There’s three alternate graphs. One has the upper left connected to the lower left, and the upper right connected to the lower right. This looks like replacing the X with a )( loop. The second alternate has the upper left connected to the upper right, and the lower left connected to the lower right. This looks like … well, that )( but rotated ninety degrees. I can’t do that without actually including a picture. The third alternate puts a vertex in the X. So now the upper left, upper right, lower left, and lower right all connect to the new vertex in the center.
Probably you’d agree that replacing the original X with a )( pattern, or its rotation, probably doesn’t make the graph any more complicated. And it might make the graph simpler. But adding that new vertex looks like trouble. It looks like it’s getting more complicated. We might get stuck in an infinite regression of more-complicated polynomials.
What saves us is the coefficient we’re multiplying the polynomials for these new graphs by. It’s called the “chromatic coefficient” and it reflects how many different colors you need to color in this graph. An edge needs to connect two different colors. And — what happens if an edge connects a vertex to itself? That is, the edge loops around back to where it started? That’s got a chromatic number of zero and the moment we get a single one of these loops anywhere in our graph we can stop calculating. We’re done with that branch of the calculations. This is what saves us.
There’s a catch. It’s a catch that knot polynomials have, too. This scheme writes a polynomial not just for a particular graph but a particular way of rendering this graph. There’s always other ways to draw it. If nothing else you can always twirl a edge over itself, into a loop like you get when Christmas tree lights start tangling themselves up. But you can move the vertices to different places. You can have an edge go outside the rest of the figure instead of inside, that sort of thing. Starting from a different rendition of the shape gets you to a different polynomial.
Superficially different, anyway. What you get from two different renditions of the same graph are polynomials different by your dummy variable raised to a whole number. Also maybe a plus-or-minus sign. You can see a difference between, say, (to make up an example) and . But you can see that second polynomial is just . It’s some confounding factor times something that is distinctive to the graph.
And that distinctive part, the thing that doesn’t change if you draw the graph differently? That’s the Yamada polynomial, at last. It’s a way to represent this collection of vertices and edges using only coefficients and exponents.
I would like to give an impressive roster of uses for these polynomials here. I’m afraid I have to let you down. There is the obvious use: if you suspect two graphs are really the same, despite how different they look, here’s a test. Calculate their Yamada polynomials and if they’re different, you know the graphs were different. It can be hard to tell. Get anything with more than, say, eight vertices and 24 edges in it and you’re not going to figure that out by sight.
I encountered the Yamada polynomial specifically as part of a textbook chapter about chemistry. It’s easy to imagine there should be great links between knots and graphs and the way that atoms bundle together into molecules. The shape of their structures describes what they will do. But I am not enough of a chemist to say how this description helps chemists understand molecules. It’s possible that it doesn’t: Yamada’s paper introducing the polynomial was published in 1989. My knot theory textbook might have brought it up because it looked exciting. There are trends and fashions in mathematical thought too. I don’t know what several more decades of work have done to the polynomial’s reputation. I’m glad to hear from people who know better.
I’m planning this week to open up the end of the alphabet — and the year — to topic suggestions. So there’s no need to panic about that.
The Quadratic Equation is the tool humanity used to discover mathematics. Yes, I exaggerate a bit. But it touches a stunning array of important things. It is most noteworthy because of the time I impressed by several-levels-removed boss at the summer job I had while an undergraduate. He had been stumped by a data-optimization problem for weeks. I noticed it was just a quadratic equation, that’s easy to solve. He was, must be said, overly impressed. I would go on to grad school where I was once stymied for a week because I couldn’t find the derivative of correctly. It is, correctly, . So I have sympathy for my remote supervisor.
We normally write the Quadratic Equation in one of two forms:
The first form is great when you are first learning about polynomials, and parabolas. And you’re content to something raised to the second power. The second form is great when you are learning advanced stuff about polynomials. Then you start wanting to know things true about polynomials that go up to arbitrarily high powers. And we always want to know about polynomials. The subscripts under mean we can’t run out of letters to be coefficients. Setting the subscripts and powers to keep increasing lets us write this out neatly.
We don’t have to use x. We never do. But we mostly use x. Maybe t, if we’re writing an equation that describes something changing with time. Maybe z, if we want to emphasize how complex-valued numbers might enter into things. The name of the independent variable doesn’t matter. But stick to the obvious choices. If you’re going to make the variable ‘f’ you better have a good reason.
The equation is very old. We have ancient Babylonian clay tablets which describe it. Well, not the quadratic equation as we write it. The oldest problems put it as finding numbers that simultaneously solve two equations, one of them a sum and one of them a product. Changing one equation into two is a venerable mathematical process. It often makes problems simpler. We do this all the time in Ordinary Differential Equations. I doubt there is a direct connection between Ordinary Differential Equations and this alternate form of the Quadratic Equation. But it is a reminder that the ways we express mathematical problems are our conventions. We can rewrite problems to make our lives easier, to make answers clearer. We should look for chances to do that.
It weaves into everything. Some things seem obvious. Suppose the coefficients — a, b, and c; or if you’d rather — are all real-valued numbers. Then the quadratic equation has to hav two solutions. There can be two real-valued solutions. There can be one real-valued solution, counted twice for reasons that make sense but are too much a digression for me to justify here. There can be two complex-valued solutions. We can infer the usefulness of imaginary and complex-valued numbers by finding solutions to the quadratic equation.
(The quadratic equation is a great introduction complex-valued numbers. It’s not how mathematicians came to them. Complex-valued numbers looked like obvious nonsense. They corresponded to there being no real-valued answers. A formula that gives obvious nonsense when there’s no answer is great. It’s formulas that give subtle nonsense when there’s no answer that are dangerous. But similar-in-design formulas for cubic and quartic polynomials could use complex-valued numbers in intermediate steps. Plunging ahead as though these complex-valued numbers were proper would get to the real-valued answers. This made the argument that complex-valued numbers should be taken seriously.)
We learn useful things right away from trying to solve it. We teach students to “complete the square” as a first approach to solving it. Completing the square is not that useful by itself: a few pages later in the textbook we get to the quadratic formula and that has every quadratic equation solved. Just plug numbers into the formula. But completing the square teaches something more useful than just how to solve an equation. It’s a method in which we solve a problem by saying, you know, this would be easy to solve if only it were different. And then thinking how to change it into a different-looking problem with the same solutions. This is brilliant work. A mathematician is imagined to have all sorts of brilliant ideas on how to solve problems. Closer to to the truth is that she’s learned all sorts of brilliant ways to make a problem more like one she already knows how to solve. (This is the nugget of truth which makes one genre of mathematical jokes. These jokes have the punch line, “the mathematician declares, `this is a problem already solved’ and goes back to sleep.”)
Stare at the solutions of the quadratic equation. You will find patterns. Suppose the coefficients are all real numbers. Then there are some numbers that can be solutions: 0, 1, square root of 15, -3.5, these can all turn up. There are some numbers that can’t be. π. e. The tangent of 2. It’s not just a division between rational and irrational numbers. There are different kinds of irrational numbers. This — alongside looking at other polynomials — leads us to transcendental numbers.
Keep staring at the two solutions of the quadratic equation. You’ll notice the sum of the solutions is . You’ll notice the product of the two solutions is . You’ll glance back at those ancient Babylonian tablets. This seems interesting, but little more than that. It’s a lead, though. Similar formulas exist for the sum of the solutions for a cubic, for a quartic, for other polynomials. Also for the sum of products of pairs of these solutions. Or the sum of products of triplets of these solutions. Or the product of all these solutions. These are known as Vieta’s Formulas, after the 16th-century mathematician François Viète. (This by way of his Latinized, academic’sona, name, Franciscus Vieta.) This gives us a way to rewrite the original polynomial as a set of polynomials in several variables. What’s interesting is the set of polynomials have symmetries. They all look like, oh, “xy + yz + zx”. No one variable gets used in a way distinguishable from the others.
This leads us to group theory. The coefficients start out in a ring. The quotients from these Vieta’s Formulas give us an “extension” of the ring. An extension is roughly what the common use of the word suggests. It takes the ring and builds from it a bigger thing that satisfies some nice interesting rules. And it leads us to surprises. The ancient Greeks had several challenges to be done with only straightedge and compass. One was to make a cube double the volume of a given cube. It’s impossible to do, with these tools. (Even ignoring the question of what we would draw on.) Another was to trisect any arbitrary angle; it turns out, there are angles it’s just impossible. The group theory derived, in part, from this tells us why. One more impossibility: drawing a square that has exactly the same area as a given circle.
But there are possible things still. Step back from the quadratic equation, that bit. Make a function, instead, something that matches numbers (real, complex, what have you) to numbers (the same). Its rule: any x in the domain matches to the number in the range. We can make a picture that represents this. Set Cartesian coordinates — the x and y coordinates that people think of as the default — on a surface. Then highlight all the points with coordinates (x, y) which make true the equation . This traces out a particular shape, the parabola.
Draw a line that crosses this parabola twice. There’s now one fully-enclosed piece of the surface. How much area is enclosed there? It’s possible to find a triangle with area three-quarters that of the enclosed part. It’s easy to use straightedge and compass to draw a square the same area as a given triangle. Showing the enclosed area is four-thirds the triangle’s area? That can … kind of … be done by straightedge and compass. It takes infinitely many steps to do this. But if you’re willing to allow a process to go on forever? And you show that the process would reach some fixed, knowable answer? This could be done by the ancient Greeks; indeed, it was. Aristotle used this as an example of the method of exhaustion. It’s one of the ideas that reaches toward integral calculus.
This has been a lot of exact, “analytic” results. There are neat numerical results too. Vieta’s formulas, for example, give us good ways to find approximate solutions of the quadratic equation. They work well if one solution is much bigger than the other. Numerical methods for finding solutions tend to work better if you can start from a decent estimate of the answer. And you can learn of numerical stability, and the need for it, studying these.
Numerical calculations have a problem. We have a set number of decimal places with which to work. What happens if we need a calculation that takes more decimal places than we’re given to do perfectly? Here’s a toy version: two-thirds is the number 0.6666. Or 0.6667. Already we’re in trouble. What is three times two-thirds? We’re going to get either 1.9998 or 2.0001 and either way something’s wrong. The wrongness looks small. But any formula you want to use has some numbers that will turn these small errors into big ones. So numerical stability is, in fairness, not something unique to the quadratic equation. It is something you learn if you study the numerics of the equation deeply enough.
I’m also delighted to learn, through Wikipedia, that there’s a prosthaphaeretic method for solving the quadratic equation. Prosthaphaeretic methods use trigonometric functions and identities to rewrite problems. You might call it madness to rely on arctangents and half-angle formulas and such instead of, oh, doing a division or taking a square root. This is because you have calculators. But if you don’t? If you have to do all that work by hand? That’s terrible. But if someone has already prepared a table listing the sines and cosines and tangents of a great variety of angles? They did a great many calculations already. You just need to pick out the one that tells you what you hope to know. I’ll spare you the steps of solving the quadratic equation using trig tables. Wikipedia describes it fine enough.
So you see how much mathematics this connects to. It’s a bit of question-begging to call it that important. As I said, we’ve known the quadratic equation for a long time. We’ve thought about it for a long while. It would be surprising if we didn’t find many and deep links to other things. Even if it didn’t have links, we would try to understand new mathematical tools in terms of how they affect familiar old problems like this. But these are some of the things which we’ve found, and which run through much of what we understand mathematics to be.
I never heard of today’s entry topic three months ago. Indeed, three weeks ago I was still making guesses about just what Gaurish, author of For the love of Mathematics,, was asking about. It turns out to be maybe the grand union of everything that’s ever been in one of my A To Z sequences. I overstate, but barely.
The specific thing that a Young Tableau is is beautiful in its simplicity. It could almost be a recreational mathematics puzzle, except that it isn’t challenging enough.
Start with a couple of boxes laid in a row. As many or as few as you like.
Now set another row of boxes. You can have as many as the first row did, or fewer. You just can’t have more. Set the second row of boxes — well, your choice. Either below the first row, or else above. I’m going to assume you’re going below the first row, and will write my directions accordingly. If you do things the other way you’re following a common enough convention. I’m leaving it on you to figure out what the directions should be, though.
Now add in a third row of boxes, if you like. Again, as many or as few boxes as you like. There can’t be more than there are in the second row. Set it below the second row.
And a fourth row, if you want four rows. Again, no more boxes in it than the third row had. Keep this up until you’ve got tired of adding rows of boxes.
How many boxes do you have? I don’t know. But take the numbers 1, 2, 3, 4, 5, and so on, up to whatever the count of your boxes is. Can you fill in one number for each box? So that the numbers are always increasing as you go left to right in a single row? And as you go top to bottom in a single column? Yes, of course. Go in order: ‘1’ for the first box you laid down, then ‘2’, then ‘3’, and so on, increasing up to the last box in the last row.
Can you do it in another way? Any other order?
Except for the simplest of arrangements, like a single row of four boxes or three rows of one box atop another, the answer is yes. There can be many of them, turns out. Seven boxes, arranged three in the first row, two in the second, one in the third, and one in the fourth, have 35 possible arrangements. It doesn’t take a very big diagram to get an enormous number of possibilities. Could be fun drawing an arbitrary stack of boxes and working out how many arrangements there are, if you have some time in a dull meeting to pass.
Let me step away from filling boxes. In one of its later, disappointing, seasons Futurama finally did a body-swap episode. The gimmick: two bodies could only swap the brains within them one time. So would it be possible to put Bender’s brain back in his original body, if he and Amy (or whoever) had already swapped once? The episode drew minor amusement in mathematics circles, and a lot of amazement in pop-culture circles. The writer, a mathematics major, found a proof that showed it was indeed always possible, even after many pairs of people had swapped bodies. The idea that a theorem was created for a TV show impressed many people who think theorems are rarer and harder to create than they necessarily are.
It was a legitimate theorem, and in a well-developed field of mathematics. It’s about permutation groups. These are the study of the ways you can swap pairs of things. I grant this doesn’t sound like much of a field. There is a surprising lot of interesting things to learn just from studying how stuff can be swapped, though. It’s even of real-world relevance. Most subatomic particles of a kind — electrons, top quarks, gluons, whatever — are identical to every other particle of the same kind. Physics wouldn’t work if they weren’t. What would happen if we swap the electron on the left for the electron on the right, and vice-versa? How would that change our physics?
A chunk of quantum mechanics studies what kinds of swaps of particles would produce an observable change, and what kind of swaps wouldn’t. When the swap doesn’t make a change we can describe this as a symmetric operation. When the swap does make a change, that’s an antisymmetric operation. And — the Young Tableau that’s a single row of two boxes? That matches up well with this symmetric operation. The Young Tableau that’s two rows of a single box each? That matches up with the antisymmetric operation.
How many ways could you set up three boxes, according to the rules of the game? A single row of three boxes, sure. One row of two boxes and a row of one box. Three rows of one box each. How many ways are there to assign the numbers 1, 2, and 3 to those boxes, and satisfy the rules? One way to do the single row of three boxes. Also one way to do the three rows of a single box. There’s two ways to do the one-row-of-two-boxes, one-row-of-one-box case.
What if we have three particles? How could they interact? Well, all three could be symmetric with each other. This matches the first case, the single row of three boxes. All three could be antisymmetric with each other. This matches the three rows of one box. Or you could have two particles that are symmetric with each other and antisymmetric with the third particle. Or two particles that are antisymmetric with each other but symmetric with the third particle. Two ways to do that. Two ways to fill in the one-row-of-two-boxes, one-row-of-one-box case.
This isn’t merely a neat, aesthetically interesting coincidence. I wouldn’t spend so much time on it if it were. There’s a matching here that’s built on something meaningful. The different ways to arrange numbers in a set of boxes like this pair up with a select, interesting set of matrices whose elements are complex-valued numbers. You might wonder who introduced complex-valued numbers, let alone matrices of them, into evidence. Well, who cares? We’ve got them. They do a lot of work for us. So much work they have a common name, the “symmetric group over the complex numbers”. As my leading example suggests, they’re all over the place in quantum mechanics. They’re good to have around in regular physics too, at least in the right neighborhoods.
Porter’s chapter also lets me tie this back to tensors. Tensors have varied ranks, the number of different indicies you can have on the things. What happens when you swap pairs of indices in a tensor? How many ways can you swap them, and what does that do to what the tensor describes? Please tell me you already suspect this is going to match something in Young Tableaus. They do this by way of the symmetries and permutations mentioned above. But they are there.
As I say, three months ago I had no idea these things existed. If I ever ran across them it was from seeing the name at MathWorld’s list of terms that start with ‘Y’. The article shows some nice examples (with each rows a atop the previous one) but doesn’t make clear how much stuff this subject runs through. I can’t fit everything in to a reasonable essay. (For example: the number of ways to arrange, say, 20 boxes into rows meeting these rules is itself a partition problem. Partition problems are probability and statistical mechanics. Statistical mechanics is the flow of heat, and the movement of the stars in a galaxy, and the chemistry of life.) I am delighted by what does fit.
I have two pieces to assemble for this. One is in factors. We can take any counting number, a positive whole number, and write it as the product of prime numbers. 2038 is equal to the prime 2 times the prime 1019. 4312 is equal to 2 raised to the third power times 7 raised to the second times 11. 1040 is 2 to the fourth power times 5 times 13. 455 is 5 times 7 times 13.
There are many ways to divide up numbers like this. Here’s one. Is there a square number among its factors? 2038 and 455 don’t have any. They’re each a product of prime numbers that are never repeated. 1040 has a square among its factors. 2 times 2 divides into 1040. 4312, similarly, has a square: we can write it as 2 squared times 2 times 7 squared times 11. So that is my first piece. We can divide counting numbers into squarefree and not-squarefree.
The other piece is in binomial coefficients. These are numbers, often quite big numbers, that get dumped on the high school algebra student as she tries to work with some expression like . They’re also dumped on the poor student in calculus, as something about Newton’s binomial coefficient theorem. Which we hear is something really important. In my experience it wasn’t explained why this should rank up there with, like, the differential calculus. (Spoiler: it’s because of polynomials.) But it’s got some great stuff to it.
Binomial coefficients are among those utility players in mathematics. They turn up in weird places. In dealing with polynomials, of course. They also turn up in combinatorics, and through that, probability. If you run, for example, 10 experiments each of which could succeed or fail, the chance you’ll get exactly five successes is going to be proportional to one of these binomial coefficients. That they touch on polynomials and probability is a sign we’re looking at a thing woven into the whole universe of mathematics. We saw them some in talking, last A-To-Z around, about Yang Hui’s Triangle. That’s also known as Pascal’s Triangle. It has more names too, since it’s been found many times over.
The theorem under discussion is about central binomial coefficients. These are one specific coefficient in a row. The ones that appear, in the triangle, along the line of symmetry. They’re easy to describe in formulas. for a whole number ‘n’ that’s greater than or equal to zero, evaluate what we call 2n choose n:
If ‘n’ is zero, this number is or 1. If ‘n’ is 1, this number is or 2. If ‘n’ is 2, this number is 6. If ‘n’ is 3, this number is (sparing the formula) 20. The numbers keep growing. 70, 252, 924, 3432, 12870, and so on.
So. 1 and 2 and 6 are squarefree numbers. Not much arguing that. But 20? That’s 2 squared times 5. 70? 2 times 5 times 7. 252? 2 squared times 3 squared times 7. 924? That’s 2 squared times 3 times 7 times 11. 3432? 2 cubed times 3 times 11 times 13; there’s a 2 squared in there. 12870? 2 times 3 squared times it doesn’t matter anymore. It’s not a squarefree number.
There’s a bunch of not-squarefree numbers in there. The question: do we ever stop seeing squarefree numbers here?
So here’s Sárközy’s Theorem. It says that this central binomial coefficient is never squarefree as long as ‘n’ is big enough. András Sárközy showed in 1985 that this was true. How big is big enough? … We have a bound, at least, for this theorem. If ‘n’ is larger than the number then the corresponding coefficient can’t be squarefree. It might not surprise you that the formulas involved here feature the Riemann Zeta function. That always seems to turn up for questions about large prime numbers.
That’s a common state of affairs for number theory problems. Very often we can show that something is true for big enough numbers. I’m not sure there’s a clear reason why. When numbers get large enough it can be more convenient to deal with their logarithms, I suppose. And those look more like the real numbers than the integers. And real numbers are typically easier to prove stuff about. Maybe that’s it. This is vague, yes. But to ask ‘why’ some things are easy and some are hard to prove is a hard question. What is a satisfying ’cause’ here?
It’s tempting to say that since we know this is true for all ‘n’ above a bound, we’re done. We can just test all the numbers below that bound, and the rest is done. You can do a satisfying proof this way: show that eventually the statement is true, and show all the special little cases before it is. This particular result is kind of useless, though. is a number that’s something like 241 digits long. For comparison, the total number of things in the universe is something like a number about 80 digits long. Certainly not more than 90. It’d take too long to test all those cases.
That’s all right. Since Sárközy’s proof in 1985 there’ve been other breakthroughs. In 1988 P Goetgheluck proved it was true for a big range of numbers: every ‘n’ that’s larger than 4 and less than . That’s a number something more than 12 million digits long. In 1991 I Vardi proved we had no squarefree central binomial coefficients for ‘n’ greater than 4 and less than , which is a number about 233 million digits long. And then in 1996 Andrew Granville and Olivier Ramare showed directly that this was so for all ‘n’ larger than 4.
So that 70 that turned up just a few lines in is the last squarefree one of these coefficients.
Is this surprising? Maybe, maybe not. I’ll bet most of you didn’t have an opinion on this topic twenty minutes ago. Let me share something that did surprise me, and continues to surprise me. In 1974 David Singmaster proved that any integer divides almost all the binomial coefficients out there. “Almost all” is here a term of art, but it means just about what you’d expect. Imagine the giant list of all the numbers that can be binomial coefficients. Then pick any positive integer you like. The number you picked will divide into so many of the giant list that the exceptions won’t be noticeable. So that square numbers like 4 and 9 and 16 and 25 should divide into most binomial coefficients? … That’s to be expected, suddenly. Into the central binomial coefficients? That’s not so obvious to me. But then so much of number theory is strange and surprising and not so obvious.
Today’s A To Z entry is a change of pace. It dives deeper into analysis than this round has been. The term comes from Mr Wu, of the Singapore Maths Tuition blog, whom I thank for the request.
An old joke, as most of my academia-related ones are. The young scholar says to his teacher how amazing it was in the old days, when people were foolish, and thought the Sun and the Stars moved around the Earth. How fortunate we are to know better. The elder says, ah yes, but what would it look like if it were the other way around?
There are many things to ponder packed into that joke. For one, the elder scholar’s awareness that our ancestors were no less smart or perceptive or clever than we are. For another, the awareness that there is a problem. We want to know about the universe. But we can only know what we perceive now, where we are at this moment. Even a note we’ve written in the past, or a message from a trusted friend, we can’t take uncritically. What we know is that we perceive this information in this way, now. When we pay attention to our friends in the philosophy department we learn that knowledge is even harder than we imagine. But I’ll stop there. The problem is hard enough already.
We can put it in a mathematical form, one that seems immune to many of the worst problems of knowledge. In this form it looks something like this: if what can we know about the universe, if all we really know is what things in that universe are doing near us? The things that we look at are functions. The universe we’re hoping to understand is the domain of the functions. One filter we use to see the universe is Morse Theory.
We don’t look at every possible function. Functions are too varied and weird for that. We look at functions whose range is the real numbers. And they must be smooth. This is a term of art. It means the function has derivatives. It has to be continuous. It can’t have sharp corners. And it has to have lots of derivatives. The first derivative of a smooth function has to also be continuous, and has to also lack corners. And the derivative of that first derivative has to be continuous, and to lack corners. And the derivative of that derivative has to be the same. A smooth function can can differentiate over and over again, infinitely many times. None of those derivatives can have corners or jumps or missing patches or anything. This is what makes it smooth.
Most functions are not smooth, in much the same way most shapes are not circles. That’s all right. There are many smooth functions anyway, and they describe things we find interesting. Or we think they’re interesting, anyway. Smooth functions are easy for us to work with, and to know things about. There’s plenty of smooth functions. If you’re interested in something else there’s probably a smooth function that’s close enough for practical use.
Morse Theory builds on the “critical points” of these smooth functions. A critical point, in this context, is one where the derivative is zero. Derivatives being zero usually signal something interesting going on. Often they show where the function changes behavior. In freshman calculus they signal where a function changes from increasing to decreasing, so the critical point is a maximum. In physics they show where a moving body no longer has an acceleration, so the critical point is an equilibrium. Or where a system changes from one kind of behavior to another. And here — well, many things can happen.
So take a smooth function. And take a critical point that it’s got. (And, erg. Technical point. The derivative of your smooth function, at that critical point, shouldn’t be having its own critical point going on at the same spot. That makes stuff more complicated.) It’s possible to approximate your smooth function near that critical point with, of course, a polynomial. It’s always polynomials. The shape of these polynomials gives you an index for these points. And that can tell you something about the shape of the domain you’re on.
At least, it tells you something about what the shape is where you are. The universal model for this — based on skimming texts and papers and popularizations of this — is of a torus standing vertically. Like a doughnut that hasn’t tipped over, or like a tire on a car that’s working as normal. I suspect this is the best shape to use for teaching, as anyone can understand it while it still shows the different behaviors. I won’t resist.
Imagine slicing this tire horizontally. Slice it close to the bottom, below the central hole, and the part that drops down is a disc. At least, it could be flattened out tolerably well to a disc.
Slice it somewhere that intersects the hole, though, and you have a different shape. You can’t squash that down to a disc. You have a noodle shape. A cylinder at least. That’s different from what you got the first slice.
Slice the tire somewhere higher. Somewhere above the central hole, and you have … well, it’s still a tire. It’s got a hole in it, but you could imagine patching it and driving on. There’s another different shape that we’ve gotten from this.
Imagine we were confined to the surface of the tire, but did not know what surface it was. That we start at the lowest point on the tire and ascend it. From the way the smooth functions around us change we can tell how the surface we’re on has changed. We can see its change from “basically a disc” to “basically a noodle” to “basically a doughnut”. We could work out what the surface we’re on has to be, thanks to how these smooth functions around us change behavior.
Occasionally we mathematical-physics types want to act as though we’re not afraid of our friends in the philosophy department. So we deploy the second thing we know about Immanuel Kant. He observed that knowing the force of gravity falls off as the square of the distance between two things implies that the things should exist in a three-dimensional space. (Source: I dunno, I never read his paper or book or whatever and dunno I ever heard anyone say they did.) It’s a good observation. Geometry tells us what physics can happen, but what physics does happen tells us what geometry they happen in. And it tells the philosophy department that we’ve heard of Immanuel Kant. This impresses them greatly, we tell ourselves.
Morse Theory is a manifestation of how observable physics teaches us the geometry they happen on. And in an urgent way, too. Some of Edward Witten’s pioneering work in superstring theory was in bringing Morse Theory to quantum field theory. He showed a set of problems called the Morse Inequalities gave us insight into supersymmetric quantum mechanics. The link between physics and doughnut-shapes may seem vague. This is because you’re not remembering that mathematical physics sees “stuff happening” as curves drawn on shapes which represent the kind of problem you’re interested in. Learning what the shapes representing the problem look like is solving the problem.
If you’re interested in the substance of this, the universally-agreed reference is J Milnor’s 1963 text Morse Theory. I confess it’s hard going to read, because it’s a symbols-heavy textbook written before the existence of LaTeX. Each page reminds one why typesetters used to get hazard pay, and not enough of it.
So stop me if you’ve heard this one before. We’re going to make something interesting. You bring to it a complex-valued number. Anything you like. Let me call it ‘s’ for the sake of convenience. I know, it’s weird not to call it ‘z’, but that’s how this field of mathematics developed. I’m going to make a series built on this. A series is the sum of all the terms in a sequence. I know, it seems weird for a ‘series’ to be a single number, but that’s how that field of mathematics developed. The underlying sequence? I’ll make it in three steps. First, I start with all the counting numbers: 1, 2, 3, 4, 5, and so on. Second, I take each one of those terms and raise them to the power of your ‘s’. Third, I take the reciprocal of each of them. That’s the sequence. And when we add —
Yes, that’s right, it’s the Riemann-Zeta Function. The one behind the Riemann Hypothesis. That’s the mathematical conjecture that everybody loves to cite as the biggest unsolved problem in mathematics now that we know someone did something about Fermat’s Last Theorem. The conjecture is about what the zeroes of this function are. What values of ‘s’ make this sum equal to zero? Some boring ones. Zero, negative two, negative four, negative six, and so on. It has a lot of non-boring zeroes. All the ones we know of have an ‘s’ with a real part of ½. So far we know of at least 36 billion values of ‘s’ that make this add up to zero. They’re all ½ plus some imaginary number. We conjecture that this isn’t coincidence and all the non-boring zeroes are like that. We might be wrong. But it’s the way I would bet.
Anyone who’d be reading this far into a pop mathematics blog knows something of why the Riemann Hypothesis is interesting. It carries implications about prime numbers. It tells us things about a host of other theorems that are nice to have. Also they know it’s hard to prove. Really, really hard.
Ancient mathematical lore tells us there are a couple ways to solve a really, really hard problem. One is to narrow its focus. Try to find as simple a case of it as you can solve. Maybe a second simple case you can solve. Maybe a third. This could show you how, roughly, to solve the general problem. Not always. Individual cases of Fermat’s Last Theorem are easy enough to solve. You can show that doesn’t have any non-boring answers where a, b, and c are all positive whole numbers. Same with , though it takes longer. That doesn’t help you with the general .
There’s another approach. It sounds like the sort of crazy thing Captain Kirk would get away with. It’s to generalize, to make a bigger, even more abstract problem. Sometimes that makes it easier.
For the Riemann-Zeta Function there’s one compelling generalization. It fits into that sequence I described making. After taking the reciprocals of integers-raised-to-the-s-power, multiply each by some number. Which number? Well, that depends on what you like. It could be the same number every time, if you like. That’s boring, though. That’s just the Riemann-Zeta Function times your number. It’s more interesting if what number you multiply by depends on which integer you started with. (Do not let it depend on ‘s’; that’s more complicated than you want.) When you do that? Then you’ve created an L-Function.
Specifically, you’ve created a Dirichlet L-Function. Dirichlet here is Peter Gustav Lejeune Dirichlet, a 19th century German mathematician who got his name on like everything. He did major work on partial differential equations, on Fourier series, on topology, in algebra, and on number theory, which is what we’d call these L-functions. There are other L-Functions, with identifying names such as Artin and Hecke and Euler, which get more directly into group theory. They look much like the Dirichlet L-Function. In building the sequence I described in the top paragraph, they do something else for the second step.
The L-Function is going to look like this:
The sigma there means to evaluate the thing that comes after it for each value of ‘n’ starting at 1 and increasing, by 1, up to … well, something infinitely large. The are the numbers you’ve picked. They’re some value that depend on the index ‘n’, but don’t depend on the power ‘s’. This may look funny but it’s a standard way of writing the terms in a sequence.
An L-Function has to meet some particular criteria that I’m not going to worry about here. Look them up before you get too far into your research. These criteria give us ways to classify different L-Functions, though. We can describe them by degree, much as we describe polynomials. We can describe them by signature, part of those criteria I’m not getting into. We can describe them by properties of the extra numbers, the ones in that fourth step that you multiply the reciprocals by. And so on. LMFDB, an encyclopedia of L-Functions, lists eight or nine properties usable for a taxonomy of these things. (The ambiguity is in what things you consider to depend on what other things.)
What makes this interesting? For one, everything that makes the Riemann Hypothesis interesting. The Riemann-Zeta Function is a slice of the L-Functions. But there’s more. They merge into elliptic curves. Every elliptic curve corresponds to some L-Function. We can use the elliptic curve or the L-Function to prove what we wish to show. Elliptic curves are subject to group theory; so, we can bring group theory into these series.
And then it gets deeper. It always does. Go back to that formula for the L-Function like I put in mathematical symbols. I’m going to define a new function. It’s going to look a lot like a polynomial. Well, that L(s) already looked a lot like a polynomial, but this is going to look even more like one.
Pick a number τ. It’s complex-valued. Any number. All that I care is that its imaginary part be positive. In the trade we say that’s “in the upper half-plane”, because we often draw complex-valued numbers as points on a plane. The real part serves as the horizontal and the imaginary part serves as the vertical axis.
Now go back to your L-Function. Remember those numbers you picked? Good. I’m going to define a new function based on them. It looks like this:
You see what I mean about looking like a polynomial? If τ is a complex-valued number, then is just another complex-valued number. If we gave that a new name like ‘z’, this function would look like the sum of constants times z raised to positive powers. We’d never know it was any kind of weird polynomial.
Anyway. This new function ‘f(τ)’ has some properties. It might be something called a weight-2 Hecke eigenform, a thing I am not going to explain without charging someone by the hour. But see the logic here: every elliptic curve matches with some kind of L-Function. Each L-Function matches with some ‘f(τ)’ kind of function. Those functions might or might not be these weight-2 Hecke eigenforms.
So here’s the thing. There was a big hypothesis formed in the 1950s that every rational elliptic curve matches to one of these ‘f(τ)’ functions that’s one of these eigenforms. It’s true. It took decades to prove. You may have heard of it, as the Taniyama-Shimura Conjecture. In the 1990s Wiles and Taylor proved this was true for a lot of elliptic curves, which is what proved Fermat’s Last Theorem after all that time. The rest of it was proved around 2000.
As I said, sometimes you have to make your problem bigger and harder to get something interesting out of it.
I am one letter closer to the end of Gaurish’s main block of requests. They’re all good ones, mind you. This gets me back into elliptic curves and Diophantine equations. I might be writing about the wrong thing.
My love’s father has a habit of asking us to rate our hobbies. This turned into a new running joke over a family vacation this summer. It’s a simple joke: I shuffled the comparables. “Which is better, Bon Jovi or a roller coaster?” It’s still a good question.
But as genial yet nasty as the spoof is, my love’s father asks natural questions. We always want to compare things. When we form a mathematical construct we look for ways to measure it. There’s typically something. We’ll put one together. We call this a height function.
We start with an elliptic curve. The coordinates of the points on this curve satisfy some equation. Well, there are many equations they satisfy. We pick one representation for convenience. The convenient thing is to have an easy-to-calculate height. We’ll write the equation for the curve as
Here both ‘A’ and ‘B’ are some integers. This form might be unique, depending on whether a slightly fussy condition on prime numbers hold. (Specifically, if ‘p’ is a prime number and ‘p4‘ divides into ‘A’, then ‘p6‘ must not divide into ‘B’. Yes, I know you realized that right away. But I write to a general audience, some of whom are learning how to see these things.) Then the height of this curve is whichever is the larger number, four times the cube of the absolute value of ‘A’, or 27 times the square of ‘B’. I ask you to just run with it. I don’t know the implications of the height function well enough to say why, oh, 25 times the square of ‘B’ wouldn’t do as well. The usual reason for something like that is that some obvious manipulation makes the 27 appear right away, or disappear right away.
This idea of height feeds in to a measure called rank. “Rank” is a term the young mathematician encounters first while learning matrices. It’s the number of rows in a matrix that aren’t equal to some sum or multiple of other rows. That is, it’s how many different things there are among a set. You can see why we might find that interesting. So many topics have something called “rank” and it measures how many different things there are in a set of things. In elliptic curves, the rank is a measure of how complicated the curve is. We can imagine the rational points on the elliptic curve as things generated by some small set of starter points. The starter points have to be of infinite order. Starter points that don’t, don’t count for the rank. Please don’t worry about what “infinite order” means here. I only mention this infinite-order business because if I don’t then something I have to say about two paragraphs from here will sound daft. So, the rank is how many of these starter points you need to generate the elliptic curve. (WARNING: Call them “generating points” or “generators” during your thesis defense.)
There’s no known way of guessing what the rank is if you just know ‘A’ and ‘B’. There are algorithms that can calculate the rank given a particular ‘A’ and ‘B’. But it’s not something like the quadratic formula where you can just do a quick calculation and know what you’re looking for. We don’t even know if the algorithms we have will work for every elliptic curve.
We think that there’s no limit to the height of elliptic curves. We don’t know this. We know there exist curves with ranks as high as 28. They seem to be rare [*]. I don’t know if that’s proven. But we do know there are elliptic curves with rank zero. A lot of them, in fact. (See what I meant two paragraphs back?) These are the elliptic curves that have only finitely many rational points on them.
And there’s a lot of those. There’s a well-respected that the average rank, of all the elliptic curves there are, is ½. It might be. What we have been able to prove is that the average rank is less than or equal to 1.17. Also that it should be larger than zero. So we’re maybe closing in on the ½ conjecture? At least we know something. I admit this essay I’ve started wondering what we do know of elliptic curves.
What do the height, and through it the rank, get us? I worry I’m repeating myself. By themselves they give us families of elliptic curves. Shapes that are similar in a particular and not-always-obvious way. And they feed into the Birch and Swinnerton-Dyer conjecture, which is the hipster’s Riemann Hypothesis. That is, it’s this big, unanswered, important problem that would, if answered, tell us things about a lot of questions that I’m not sure can be concisely explained. At least not why they’re interesting. We know some special cases, at least. Wikipedia tells me nothing’s proved for curves with rank greater than 1. Humanity’s ignorance on this point makes me feel slightly better pondering what I don’t know about elliptic curves.
(There are some other things within the field of elliptic curves called height functions. There’s particularly a height of individual points. I was unsure which height Gaurish found interesting so chose one. The other starts by measuring something different; it views, for example, as having a lower height than does , even though the numbers are quite close in value. It develops along similar lines, trying to find classes of curves with similar behavior. And it gets into different unsolved conjectures. We have our ideas about how to think of fields.).
[*] Wikipedia seems to suggest we only know of one, provided by Professor Noam Elkies in 2006, and let me quote it in full. I apologize that it isn’t in the format I suggested at top was standard. Elkies way outranks me academically so we have to do things his way:
I can’t figure how to get WordPress to present that larger. I sympathize. I’m tired just looking at an equation like that. This page lists records of known elliptic curve ranks. I don’t know if the lack of any records more recent than 2006 reflects the page not having been updated or nobody having found a rank-29 curve. I fully accept the field might be more difficult than even doing maintenance on a web page’s content is.
A Diophantine equation is a polynomial. Well, of course it is. It’s an equation, or a set of equations, setting one polynomial equal to another. Possibly equal to a constant. What makes this different from “any old equation” is the coefficients. These are the constant numbers that you multiply the variables, your x and y and x2 and z8 and so on, by. To make a Diophantine equation all these coefficients have to be integers. You know one well, because it’s that thing that Fermat’s Last Theorem is all about. And you’ve probably seen . It turns up a lot because that’s a line, and we do a lot of stuff with lines.
Diophantine equations are interesting. There are a couple of cases that are easy to solve. I mean, at least that we can find solutions for. , for example, that’s easy to solve. it turns out we can’t solve. Well, we can if n is equal to 1 or 2. Or if x or y or z are zero. These are obvious, that is, they’re quite boring. That one took about four hundred years to solve, and the solution was “there aren’t any solutions”. This may convince you of how interesting these problems are. What, from looking at it, tells you that is simple while is (most of the time) impossible?
I don’t know. Nobody really does. There are many kinds of Diophantine equation, all different-looking polynomials. Some of them are special one-off cases, like . For example, there’s for some integers x, y, z, and w. Leonhard Euler conjectured this equation had only boring solutions. You’ll remember Euler. He wrote the foundational work for every field of mathematics. It turns out he was wrong. It has infinitely many interesting solutions. But the smallest one is and that one took a computer search to find. We can forgive Euler not noticing it.
Some are groups of equations that have similar shapes. There’s the Fermat’s Last Theorem formula, for example, which is a different equation for every different integer n. Then there’s what we call Pell’s Equation. This one is (or equals -1), for some counting number D. It’s named for the English mathematician John Pell, who did not discover the equation (even in the Western European tradition; Indian mathematicians were familiar with it for a millennium), did not solve the equation, and did not do anything particularly noteworthy in advancing human understanding of the solution. Pell owes his fame in this regard to Leonhard Euler, who misunderstood Pell’s revising a translation of a book discussing a solution for Pell’s authoring a solution. I confess Euler isn’t looking very good on Diophantine equations.
But nobody looks very good on Diophantine equations. Make up a Diophantine equation of your own. Use whatever whole numbers, positive or negative, that you like for your equation. Use whatever powers of however many variables you like for your equation. So you get something that looks maybe like this:
Does it have any solutions? I don’t know. Nobody does. There isn’t a general all-around solution. You know how with a quadratic equation we have this formula where you recite some incantation about “b squared minus four a c” and get any roots that exist? Nothing like that exists for Diophantine equations in general. Specific ones, yes. But they’re all specialties, crafted to fit the equation that has just that shape.
So for each equation we have to ask: is there a solution? Is there any solution that isn’t obvious? Are there finitely many solutions? Are there infinitely many? Either way, can we find all the solutions? And we have to answer them anew. What answers these have? Whether answers are known to exist? Whether answers can exist? We have to discover anew for each kind of equation. Knowing answers for one kind doesn’t help us for any others, except as inspiration. If some trick worked before, maybe it will work this time.
There are a couple usually reliable tricks. Can the equation be rewritten in some way that it becomes the equation for a line? If it can we probably have a good handle on any solutions. Can we apply modulo arithmetic to the equation? If it is, we might be able to reduce the number of possible solutions that the equation has. In particular we might be able to reduce the number of possible solutions until we can just check every case. Can we use induction? That is, can we show there’s some parameter for the equations, and that knowing the solutions for one value of that parameter implies knowing solutions for larger values? And then find some small enough value we can test it out by hand? Or can we show that if there is a solution, then there must be a smaller solution, and smaller yet, until we can either find an answer or show there aren’t any? Sometimes. Not always. The field blends seamlessly into number theory. And number theory is all sorts of problems easy to pose and hard or impossible to solve.
We name these equation after Diophantus of Alexandria, a 3rd century Greek mathematician. His writings, what we have of them, discuss how to solve equations. Not general solutions, the way we might want to solve , but specific ones, like . His books are among those whose rediscovery shaped the rebirth of mathematics. Pierre de Fermat’s scribbled his famous note in the too-small margins of Diophantus’s Arithmetica. (Well, a popular translation.)
But the field predates Diophantus, at least if we look at specific problems. Of course it does. In mathematics, as in life, any search for a source ends in a vast, marshy ambiguity. The field stays vital. If we loosen ourselves to looking at inequalities — , let's say — then we start seeing optimization problems. What values of x and y will make this equation most nearly true? What values will come closest to satisfying this bunch of equations? The questions are about how to find the best possible fit to whatever our complicated sets of needs are. We can't always answer. We keep searching.
This is another supplemental piece because it’s too much to include in the next bit of Why Stuff Can Orbit. I need some more stuff about how a mathematical physicist would look at something.
This is also a story about approximations. A lot of mathematics is really about approximations. I don’t mean numerical computing. We all know that when we compute we’re making approximations. We use 0.333333 instead of one-third and we use 3.141592 instead of π. But a lot of precise mathematics, what we call analysis, is also about approximations. We do this by a logical structure that works something like this: take something we want to prove. Now for every positive number ε we can find something — a point, a function, a curve — that’s no more than ε away from the thing we’re really interested in, and which is easier to work with. Then we prove whatever we want to with the easier-to-work-with thing. And since ε can be as tiny a positive number as we want, we can suppose ε is a tinier difference than we can hope to measure. And so the difference between the thing we’re interested in and the thing we’ve proved something interesting about is zero. (This is the part that feels like we’re pulling a scam. We’re not, but this is where it’s worth stopping and thinking about what we mean by “a difference between two things”. When you feel confident this isn’t a scam, continue.) So we proved whatever we proved about the thing we’re interested in. Take an analysis course and you will see this all the time.
When we get into mathematical physics we do a lot of approximating functions with polynomials. Why polynomials? Yes, because everything is polynomials. But also because polynomials make so much mathematical physics easy. Polynomials are easy to calculate, if you need numbers. Polynomials are easy to integrate and differentiate, if you need analysis. Here that’s the calculus that tells you about patterns of behavior. If you want to approximate a continuous function you can always do it with a polynomial. The polynomial might have to be infinitely long to approximate the entire function. That’s all right. You can chop it off after finitely many terms. This finite polynomial is still a good approximation. It’s just good for a smaller region than the infinitely long polynomial would have been.
Necessary qualifiers: pages 65 through 82 of any book on real analysis.
So. Let me get to functions. I’m going to use a function named ‘f’ because I’m not wasting my energy coming up with good names. (When we get back to the main Why Stuff Can Orbit sequence this is going to be ‘U’ for potential energy or ‘E’ for energy.) It’s got a domain that’s the real numbers, and a range that’s the real numbers. To express this in symbols I can write . If I have some number called ‘x’ that’s in the domain then I can tell you what number in the domain is matched by the function ‘f’ to ‘x’: it’s the number ‘f(x)’. You were expecting maybe 3.5? I don’t know that about ‘f’, not yet anyway. The one thing I do know about ‘f’, because I insist on it as a condition for appearing, is that it’s continuous. It hasn’t got any jumps, any gaps, any regions where it’s not defined. You could draw a curve representing it with a single, if wriggly, stroke of the pen.
I mean to build an approximation to the function ‘f’. It’s going to be a polynomial expansion, a set of things to multiply and add together that’s easy to find. To make this polynomial expansion this I need to choose some point to build the approximation around. Mathematicians call this the “point of expansion” because we froze up in panic when someone asked what we were going to name it, okay? But how are we going to make an approximation to a function if we don’t have some particular point we’re approximating around?
(One answer we find in grad school when we pick up some stuff from linear algebra we hadn’t been thinking about. We’ll skip it for now.)
I need a name for the point of expansion. I’ll use ‘a’. Many mathematicians do. Another popular name for it is ‘x0‘. Or if you’re using some other variable name for stuff in the domain then whatever that variable is with subscript zero.
So my first approximation to the original function ‘f’ is … oh, shoot, I should have some new name for this. All right. I’m going to use ‘F0‘ as the name. This is because it’s one of a set of approximations, each of them a little better than the old. ‘F1‘ will be better than ‘F0‘, but ‘F2‘ will be even better, and ‘F2038‘ will be way better yet. I’ll also say something about what I mean by “better”, although you’ve got some sense of that already.
I start off by calling the first approximation ‘F0‘ by the way because you’re going to think it’s too stupid to dignify with a number as big as ‘1’. Well, I have other reasons, but they’ll be easier to see in a bit. ‘F0‘, like all its sibling ‘Fn‘ functions, has a domain of the real numbers and a range of the real numbers. The rule defining how to go from a number ‘x’ in the domain to some real number in the range?
That is, this first approximation is simply whatever the original function’s value is at the point of expansion. Notice that’s an ‘x’ on the left side of the equals sign and an ‘a’ on the right. This seems to challenge the idea of what an “approximation” even is. But it’s legit. Supposing something to be constant is often a decent working assumption. If you failed to check what the weather for today will be like, supposing that it’ll be about like yesterday will usually serve you well enough. If you aren’t sure where your pet is, you look first wherever you last saw the animal. (Or, yes, where your pet most loves to be. A particular spot, though.)
We can make this rigorous. A mathematician thinks this is rigorous: you pick any margin of error you like. Then I can find a region near enough to the point of expansion. The value for ‘f’ for every point inside that region is ‘f(a)’ plus or minus your margin of error. It might be a small region, yes. Doesn’t matter. It exists, no matter how tiny your margin of error was.
But yeah, that expansion still seems too cheap to work. My next approximation, ‘F1‘, will be a little better. I mean that we can expect it will be closer than ‘F0‘ was to the original ‘f’. Or it’ll be as close for a bigger region around the point of expansion ‘a’. What it’ll represent is a line. Yeah, ‘F0‘ was a line too. But ‘F0‘ is a horizontal line. ‘F1‘ might be a line at some completely other angle. If that works better. The second approximation will look like this:
Here ‘m’ serves its traditional yet poorly-explained role as the slope of a line. What the slope of that line should be we learn from the derivative of the original ‘f’. The derivative of a function is itself a new function, with the same domain and the same range. There’s a couple ways to denote this. Each way has its strengths and weaknesses about clarifying what we’re doing versus how much we’re writing down. And trying to write down almost anything can inspire confusion in analysis later on. There’s a part of analysis when you have to shift from thinking of particular problems to how problems work then.
So I will define a new function, spoken of as f-prime, this way:
If you look closely you realize there’s two different meanings of ‘x’ here. One is the ‘x’ that appears in parentheses. It’s the value in the domain of f and of f’ where we want to evaluate the function. The other ‘x’ is the one in the lower side of the derivative, in that . That’s my sloppiness, but it’s not uniquely mine. Mathematicians keep this straight by using the symbols so much they don’t even see the ‘x’ down there anymore so have no idea there’s anything to find confusing. Students keep this straight by guessing helplessly about what their instructors want and clinging to anything that doesn’t get marked down. Sorry. But what this means is to “take the derivative of the function ‘f’ with respect to its variable, and then, evaluate what that expression is for the value of ‘x’ that’s in parentheses on the left-hand side”. We can do some things that avoid the confusion in symbols there. They all require adding some more variables and some more notation in, and it looks like overkill for a measly definition like this.
Anyway. We really just want the deriviate evaluated at one point, the point of expansion. That is:
which by the way avoids that overloaded meaning of ‘x’ there. Put this together and we have what we call the tangent line approximation to the original ‘f’ at the point of expansion:
This is also called the tangent line, because it’s a line that’s tangent to the original function. A plot of ‘F1‘ and the original function ‘f’ are guaranteed to touch one another only at the point of expansion. They might happen to touch again, but that’s luck. The tangent line will be close to the original function near the point of expansion. It might happen to be close again later on, but that’s luck, not design. Most stuff you might want to do with the original function you can do with the tangent line, but the tangent line will be easier to work with. It exactly matches the original function at the point of expansion, and its first derivative exactly matches the original function’s first derivative at the point of expansion.
We can do better. We can find a parabola, a second-order polynomial that approximates the original function. This will be a function ‘F2(x)’ that looks something like:
What we’re doing is adding a parabola to the approximation. This is that curve that looks kind of like a loosely-drawn U. The ‘m2‘ there measures how spread out the U is. It’s not quite the slope, but it’s kind of like that, which is why I’m using the letter ‘m’ for it. Its value we get from the second derivative of the original ‘f’:
We find the second derivative of a function ‘f’ by evaluating the first derivative, and then, taking the derivative of that. We can denote it with two ‘ marks after the ‘f’ as long as we aren’t stuck wrapping the function name in ‘ marks to set it out. And so we can describe the function this way:
This will be a better approximation to the original function near the point of expansion. Or it’ll make larger the region where the approximation is good.
If the first derivative of a function at a point is zero that means the tangent line is horizontal. In physics stuff this is an equilibrium. The second derivative can tell us whether the equilibrium is stable or not. If the second derivative at the equilibrium is positive it’s a stable equilibrium. The function looks like a bowl open at the top. If the second derivative at the equilibrium is negative then it’s an unstable equilibrium.
We can make better approximations yet, by using even more derivatives of the original function ‘f’ at the point of expansion:
There’s better approximations yet. You can probably guess what the next, fourth-degree, polynomial would be. Or you can after I tell you the fraction in front of the new term will be . The only big difference is that after about the third derivative we give up on adding ‘ marks after the function name ‘f’. It’s just too many little dots. We start writing, like, ‘f(iv)‘ instead. Or if the Roman numerals are too much then ‘f(2038)‘ instead. Or if we don’t want to pin things down to a specific value ‘f(j)‘ with the understanding that ‘j’ is some whole number.
We don’t need all of them. In physics problems we get equilibriums from the first derivative. We get stability from the second derivative. And we get springs in the second derivative too. And that’s what I hope to pick up on in the next installment of the main series.
Today’s is another request from gaurish and another I’m glad to have as it let me learn things too. That’s a particularly fun kind of essay to have here.
Yang Hui’s Triangle.
It’s a triangle. Not because we’re interested in triangles, but because it’s a particularly good way to organize what we’re doing and show why we do that. We’re making an arrangement of numbers. First we need cells to put the numbers in.
Start with a single cell in what’ll be the top middle of the triangle. It spreads out in rows beneath that. The rows are staggered. The second row has two cells, each one-half width to the side of the starting one. The third row has three cells, each one-half width to the sides of the row above, so that its center cell is directly under the original one. The fourth row has four cells, two of which are exactly underneath the cells of the second row. The fifth row has five cells, three of them directly underneath the third row’s cells. And so on. You know the pattern. It’s the one that pins in a plinko board take. Just trimmed down to a triangle. Make as many rows as you find interesting. You can always add more later.
In the top cell goes the number ‘1’. There’s also a ‘1’ in the leftmost cell of each row, and a ‘1’ in the rightmost cell of each row.
What of interior cells? The number for those we work out by looking to the row above. Take the cells to the immediate left and right of it. Add the values of those together. So for example the center cell in the third row will be ‘1’ plus ‘1’, commonly regarded as ‘2’. In the third row the leftmost cell is ‘1’; it always is. The next cell over will be ‘1’ plus ‘2’, from the row above. That’s ‘3’. The cell next to that will be ‘2’ plus ‘1’, a subtly different ‘3’. And the last cell in the row is ‘1’ because it always is. In the fourth row we get, starting from the left, ‘1’, ‘4’, ‘6’, ‘4’, and ‘1’. And so on.
It’s a neat little arithmetic project. It has useful application beyond the joy of making something neat. Many neat little arithmetic projects don’t have that. But the numbers in each row give us binomial coefficients, which we often want to know. That is, if we wanted to work out (a + b) to, say, the third power, we would know what it looks like from looking at the fourth row of Yanghui’s Triangle. It will be . This turns up in polynomials all the time.
Look at diagonals. By diagonal here I mean a line parallel to the line of ‘1’s. Left side or right side; it doesn’t matter. Yang Hui’s triangle is bilaterally symmetric around its center. The first diagonal under the edges is a bit boring but familiar enough: 1-2-3-4-5-6-7-et cetera. The second diagonal is more curious: 1-3-6-10-15-21-28 and so on. You’ve seen those numbers before. They’re called the triangular numbers. They’re the number of dots you need to make a uniformly spaced, staggered-row triangle. Doodle a bit and you’ll see. Or play with coins or pool balls.
The third diagonal looks more arbitrary yet: 1-4-10-20-35-56-84 and on. But these are something too. They’re the tetrahedronal numbers. They’re the number of things you need to make a tetrahedron. Try it out with a couple of balls. Oranges if you’re bored at the grocer’s. Four, ten, twenty, these make a nice stack. The fourth diagonal is a bunch of numbers I never paid attention to before. 1-5-15-35-70-126-210 and so on. This is — well. We just did tetrahedrons, the triangular arrangement of three-dimensional balls. Before that we did triangles, the triangular arrangement of two-dimensional discs. Do you want to put in a guess what these “pentatope numbers” are about? Sure, but you hardly need to. If we’ve got a bunch of four-dimensional hyperspheres and want to stack them in a neat triangular pile we need one, or five, or fifteen, or so on to make the pile come out neat. You can guess what might be in the fifth diagonal. I don’t want to think too hard about making triangular heaps of five-dimensional hyperspheres.
There’s more stuff lurking in here, waiting to be decoded. Add the numbers of, say, row four up and you get two raised to the third power. Add the numbers of row ten up and you get two raised to the ninth power. You see the pattern. Add everything in, say, the top five rows together and you get the fifth Mersenne number, two raised to the fifth power (32) minus one (31, when we’re done). Add everything in the top ten rows together and you get the tenth Mersenne number, two raised to the tenth power (1024) minus one (1023).
Or add together things on “shallow diagonals”. Start from a ‘1’ on the outer edge. I’m going to suppose you started on the left edge, but remember symmetry; it’ll be fine if you go from the right instead. Add to that ‘1’ the number you get by moving one cell to the right and going up-and-right. And then again, go one cell to the right and then one cell up-and-right. And again and again, until you run out of cells. You get the Fibonacci sequence, 1-1-2-3-5-8-13-21-and so on.
We can even make an astounding picture from this. Take the cells of Yang Hui’s triangle. Color them in. One shade if the cell has an odd number, another if the cell has an even number. It will create a pattern we know as the Sierpiński Triangle. (Wacław Sierpiński is proving to be the surprise special guest star in many of this A To Z sequence’s essays.) That’s the fractal of a triangle subdivided into four triangles with the center one knocked out, and the remaining triangles them subdivided into four triangles with the center knocked out, and on and on.
By now I imagine even my most skeptical readers agree this is an interesting, useful mathematical construct. Also that they’re wondering why I haven’t said the name “Blaise Pascal”. The Western mathematical tradition knows of this from Pascal’s work, particularly his 1653 Traité du triangle arithmétique. But mathematicians like to say their work is universal, and independent of the mere human beings who find it. Constructions like this triangle give support to this. Yang lived in China, in the 12th century. I imagine it possible Pascal had hard of his work or been influenced by it, by some chain, but I know of no evidence that he did.
And even if he had, there are other apparently independent inventions. The Avanti Indian astronomer-mathematician-astrologer Varāhamihira described the addition rule which makes the triangle work in commentaries written around the year 500. Omar Khayyám, who keeps appearing in the history of science and mathematics, wrote about the triangle in his 1070 Treatise on Demonstration of Problems of Algebra. Again so far as I am aware there’s not a direct link between any of these discoveries. They are things different people in different traditions found because the tools — arithmetic and aesthetically-pleasing orders of things — were ready for them.
Yang Hui wrote about his triangle in the 1261 book Xiangjie Jiuzhang Suanfa. In it he credits the use of the triangle (for finding roots) as invented around 1100 by mathematician Jia Xian. This reminds us that it is not merely mathematical discoveries that are found by many peoples at many times and places. So is Boyer’s Law, discovered by Hubert Kennedy.
I’m happy to say it’s another request today. This one’s from HowardAt58, author of the Saving School Math blog. He’s given me some great inspiration in the past.
It’s right there in the name. Osculating. You know what that is from that one Daffy Duck cartoon where he cries out “Greetings, Gate, let’s osculate” while wearing a moustache. Daffy’s imitating somebody there, but goodness knows who. Someday the mystery drives the young you to a dictionary web site. Osculate means kiss. This doesn’t seem to explain the scene. Daffy was imitating Jerry Colonna. That meant something in 1943. You can find him on old-time radio recordings. I think he’s funny, in that 40s style.
Make the substitution. A kissing circle. Suppose it’s not some playground antic one level up from the Kissing Bandit that plagues recess yet one or two levels down what we imagine we’d do in high school. It suggests a circle that comes really close to something, that touches it a moment, and then goes off its own way.
But then touching. We know another word for that. It’s the root behind “tangent”. Tangent is a trigonometry term. But it appears in calculus too. The tangent line is a line that touches a curve at one specific point and is going in the same direction as the original curve is at that point. We like this because … well, we do. The tangent line is a good approximation of the original curve, at least at the tangent point and for some region local to that. The tangent touches the original curve, and maybe it does something else later on. What could kissing be?
The osculating circle is about approximating an interesting thing with a well-behaved thing. So are similar things with names like “osculating curve” or “osculating sphere”. We need that a lot. Interesting things are complicated. Well-behaved things are understood. We move from what we understand to what we would like to know, often, by an approximation. This is why we have tangent lines. This is why we build polynomials that approximate an interesting function. They share the original function’s value, and its derivative’s value. A polynomial approximation can share many derivatives. If the function is nice enough, and the polynomial big enough, it can be impossible to tell the difference between the polynomial and the original function.
The osculating circle, or sphere, isn’t so concerned with matching derivatives. I know, I’m as shocked as you are. Well, it matches the first and the second derivatives of the original curve. Anything past that, though, it matches only by luck. The osculating circle is instead about matching the curvature of the original curve. The curvature is what you think it would be: it’s how much a function curves. If you imagine looking closely at the original curve and an osculating circle they appear to be two arcs that come together. They must touch at one point. They might touch at others, but that’s incidental.
Osculating circles, and osculating spheres, sneak out of mathematics and into practical work. This is because we often want to work with things that are almost circles. The surface of the Earth, for example, is not a sphere. But it’s only a tiny bit off. It’s off in ways that you only notice if you are doing high-precision mapping. Or taking close measurements of things in the sky. Sometimes we do this. So we map the Earth locally as if it were a perfect sphere, with curvature exactly what its curvature is at our observation post.
Or we might be observing something moving in orbit. If the universe had only two things in it, and they were the correct two things, all orbits would be simple: they would be ellipses. They would have to be “point masses”, things that have mass without any volume. They never are. They’re always shapes. Spheres would be fine, but they’re never perfect spheres even. The slight difference between a perfect sphere and whatever the things really are affects the orbit. Or the other things in the universe tug on the orbiting things. Or the thing orbiting makes a course correction. All these things make little changes in the orbiting thing’s orbit. The actual orbit of the thing is a complicated curve. The orbit we could calculate is an osculating — well, an osculating ellipse, rather than an osculating circle. Similar idea, though. Call it an osculating orbit if you’d rather.
That osculating circles have practical uses doesn’t mean they aren’t respectable mathematics. I’ll concede they’re not used as much as polynomials or sine curves are. I suppose that’s because polynomials and sine curves have nicer derivatives than circles do. But osculating circles do turn up as ways to try solving nonlinear differential equations. We need the help. Linear differential equations anyone can solve. Nonlinear differential equations are pretty much impossible. They also turn up in signal processing, as ways to find the frequencies of a signal from a sampling of data. This, too, we would like to know.
We get the name “osculating circle” from Gottfried Wilhelm Leibniz. This might not surprise. Finding easy-to-understand shapes that approximate interesting shapes is why we have calculus. Isaac Newton described a way of making them in the Principia Mathematica. This also might not surprise. Of course they would on this subject come so close together without kissing.
As I get into the second month of Theorem Thursdays I have, I think, the whole roster of weeks sketched out. Today, I want to dive into some real analysis, and the study of numbers. It’s the sort of thing you normally get only if you’re willing to be a mathematics major. I’ll try to be readable by people who aren’t. If you carry through to the end and follow directions you’ll have your very own mathematical construct, too, so enjoy.
Liouville’s Approximation Theorem
It all comes back to polynomials. Of course it does. Polynomials aren’t literally everything in mathematics. They just come close. Among the things we can do with polynomials is divide up the real numbers into different sets. The tool we use is polynomials with integer coefficients. Integers are the positive and the negative whole numbers, stuff like ‘4’ and ‘5’ and ‘-12’ and ‘0’.
A polynomial is the sum of a bunch of products of coefficients multiplied by a variable raised to a power. We can use anything for the variable’s name. So we use ‘x’. Sometimes ‘t’. If we want complex-valued polynomials we use ‘z’. Some people trying to make a point will use ‘y’ or ‘s’ but they’re just showing off. Coefficients are just numbers. If we know the numbers, great. If we don’t know the numbers, or we want to write something that doesn’t commit us to any particular numbers, we use letters from the start of the alphabet. So we use ‘a’, maybe ‘b’ if we must. If we need a lot of numbers, we use subscripts: a0, a1, a2, and so on, up to some an for some big whole number n. To talk about one of these without committing ourselves to a specific example we use a subscript of i or j or k: aj, ak. It’s possible that aj and ak equal each other, but they don’t have to, unless j and k are the same whole number. They might also be zero, but they don’t have to be. They can be any numbers. Or, for this essay, they can be any integers. So we’d write a generic polynomial f(x) as:
(Some people put the coefficients in the other order, that is, and so on. That’s not wrong. The name we give a number doesn’t matter. But it makes it harder to remember what coefficient matches up with, say, x14.)
A zero, or root, is a value for the variable (‘x’, or ‘t’, or what have you) which makes the polynomial equal to zero. It’s possible that ‘0’ is a zero, but don’t count on it. A polynomial of degree n — meaning the highest power to which x is raised is n — can have up to n different real-valued roots. All we’re going to care about is one.
Rational numbers are what we get by dividing one whole number by another. They’re numbers like 1/2 and 5/3 and 6. They’re numbers like -2.5 and 1.0625 and negative a billion. Almost none of the real numbers are rational numbers; they’re exceptional freaks. But they are all the numbers we actually compute with, once we start working out digits. Thus we remember that to live is to live paradoxically.
And every rational number is a root of a first-degree polynomial. That is, there’s some polynomial f(x) = a_0 + a_1 x that’s made zero for your polynomial. It’s easy to tell you what it is, too. Pick your rational number. You can write that as the integer p divided by the integer q. Now look at the polynomial f(x) = p – q x. Astounded yet?
That trick will work for any rational number. It won’t work for any irrational number. There’s no first-degree polynomial with integer coefficients that has the square root of two as a root. There are polynomials that do, though. There’s f(x) = 2 – x2. You can find the square root of two as the zero of a second-degree polynomial. You can’t find it as the zero of any lower-degree polynomials. So we say that this is an algebraic number of the second degree.
This goes on higher. Look at the cube root of 2. That’s another irrational number, so no first-degree polynomials have it as a root. And there’s no second-degree polynomials that have it as a root, not if we stick to integer coefficients. Ah, but f(x) = 2 – x3? That’s got it. So the cube root of two is an algebraic number of degree three.
We can go on like this, although I admit examples for higher-order algebraic numbers start getting hard to justify. Most of the numbers people have heard of are either rational or are order-two algebraic numbers. I can tell you truly that the eighth root of two is an eighth-degree algebraic number. But I bet you don’t feel enlightened. At best you feel like I’m setting up for something. The number r(5), the smallest radius a disc can have so that five of them will completely cover a disc of radius 1, is eighth-degree and that’s interesting. But you never imagined the number before and don’t have any idea how big that is, other than “I guess that has to be smaller than 1”. (It’s just a touch less than 0.61.) I sound like I’m wasting your time, although you might start doing little puzzles trying to make smaller coins cover larger ones. Do have fun.
Liouville’s Approximation Theorem is about approximating algebraic numbers with rational ones. Almost everything we ever do is with rational numbers. That’s all right because we can make the difference between the number we want, even if it’s r(5), and the numbers we can compute with, rational numbers, as tiny as we need. We trust that the errors we make from this approximation will stay small. And then we discover chaos science. Nothing is perfect.
For example, suppose we need to estimate π. Everyone knows we can approximate this with the rational number 22/7. That’s about 3.142857, which is all right but nothing great. Some people know we can approximate it as 333/106. (I didn’t until I started writing this paragraph and did some research.) That’s about 3.141509, which is better. Then there’s 355/113, which is not as famous as 22/7 but is a celebrity compared to 333/106. That’s about 3.141529. Then we get into some numbers only mathematics hipsters know: 103993/33102 and 104348/33215 and so on. Fine.
The Liouville Approximation Theorem is about sequences that converge on an irrational number. So we have our first approximation x1, that’s the integer p1 divided by the integer q1. So, 22 and 7. Then there’s the next approximation x2, that’s the integer p2 divided by the integer q2. So, 333 and 106. Then there’s the next approximation yet, x3, that’s the integer p3 divided by the integer q3. As we look at more and more approximations, xj‘s, we get closer and closer to the actual irrational number we want, in this case π. Also, the denominators, the qj‘s, keep getting bigger.
The theorem speaks of having an algebraic number, call it x, of some degree n greater than 1. Then we have this limit on how good an approximation can be. The difference between the number x that we want, and our best approximation p / q, has to be larger than the number (1/q)n + 1. The approximation might be higher than x. It might be lower than x. But it will be off by at least the n-plus-first power of 1/q.
Polynomials let us separate the real numbers into infinitely many tiers of numbers. They also let us say how well the most accessible tier of numbers, rational numbers, can approximate these more exotic things.
One of the things we learn by looking at numbers through this polynomial screen is that there are transcendental numbers. These are numbers that can’t be the root of any polynomial with integer coefficients. π is one of them. e is another. Nearly all numbers are transcendental. But the proof that any particular number is one is hard. Joseph Liouville showed that transcendental numbers must exist by using continued fractions. But this approximation theorem tells us how to make our own transcendental numbers. This won’t be any number you or anyone else has ever heard of, unless you pick a special case. But it will be yours.
You will need:
a1, an integer from 1 to 9, such as ‘1’, ‘9’, or ‘5’.
a2, another integer from 1 to 9. It may be the same as a1 if you like, but it doesn’t have to be.
a3, yet another integer from 1 to 9. It may be the same as a1 or a2 or, if it so happens, both.
a4, one more integer from 1 to 9 and you know what? Let’s summarize things a bit.
A whopping great big gob of integers aj, every one of them from 1 to 9, for every possible integer ‘j’ so technically this is infinitely many of them.
Comfort with the notation n!, which is the factorial of n. For whole numbers that’s the product of every whole number from 1 to n, so, 2! is 1 times 2, or 2. 3! is 1 times 2 times 3, or 6. 4! is 1 times 2 times 3 times 4, or 24. And so on.
Not to be thrown by me writing -n!. By that I mean work out n! and then multiply that by -1. So -2! is -2. -3! is -6. -4! is -24. And so on.
Now, assemble them into your very own transcendental number z, by this formula:
If you’ve done it right, this will look something like:
Ah, but, how do you know this is transcendental? We can prove it is. The proof is by contradiction, which is how a lot of great proofs are done. We show nonsense follows if the thing isn’t true, so the thing must be true. (There are mathematicians that don’t care for proof-by-contradiction. They insist on proof by charging straight ahead and showing a thing is true directly. That’s a matter of taste. I think every mathematician feels that way sometimes, to some extent or on some issues. The proof-by-contradiction is easier, at least in this case.)
Suppose that your z here is not transcendental. Then it’s got to be an algebraic number of degree n, for some finite number n. That’s what it means not to be transcendental. I don’t know what n is; I don’t care. There is some n and that’s enough.
Now, let’s let zm be a rational number approximating z. We find this approximation by taking the first m! digits after the decimal point. So, z1 would be just the number 0.a1. z2 is the number 0.a1a2. z3 is the number 0.a1a2000a3. I don’t know what m you like, but that’s all right. We’ll pick a nice big m.
So what’s the difference between z and zm? Well, it can’t be larger than 10 times 10-(m + 1)!. This is for the same reason that π minus 3.14 can’t be any bigger than 0.01.
Now suppose we have the best possible rational approximation, p/q, of your number z. Its first m! digits are going to be p / 10m!. This will be zm And by the Liouville Approximation Theorem, then, the difference between z and zm has to be at least as big as (1/10m!)(n + 1).
So we know the difference between z and zm has to be larger than one number. And it has to be smaller than another. Let me write those out.
We don’t need the z – zm anymore. That thing on the rightmost side we can write what I’ll swear is a little easier to use. What we have left is:
And this will be true whenever the number m! (n + 1) is greater than (m + 1)! – 1 for big enough numbers m.
But there’s the thing. This isn’t true whenever m is greater than n. So the difference between your alleged transcendental number and its best-possible rational approximation has to be simultaneously bigger than a number and smaller than that same number without being equal to it. Supposing your number is anything but transcendental produces nonsense. Therefore, congratulations! You have a transcendental number.
If you chose all 1’s for your aj‘s, then you have what is sometimes called the Liouville Constant. If you didn’t, you may have a transcendental number nobody’s ever noticed before. You can name it after someone if you like. That’s as meaningful as naming a star for someone and cheaper. But you can style it as weaving someone’s name into the universal truth of mathematics. Enjoy!
I’m glad to finally give you a mathematics essay that lets you make something you can keep.
I am still taking requests for this Theorem Thursdays sequence. I intend to post each Thursday in June and July an essay talking about some theorem and what it means and why it’s important. I have gotten a couple of requests in, but I’m happy to take more; please just give me a little lead time. But I want to start with one that delights me.
The Intermediate Value Theorem
I own a Scion tC. It’s a pleasant car, about 2400 percent more sporty than I am in real life. I got it because it met my most important criteria: it wasn’t expensive and it had a sun roof. That it looks stylish is an unsought bonus.
But being a car, and a black one at that, it has a common problem. Leave it parked a while, then get inside. In the winter, it gets so cold that snow can fall inside it. In the summer, it gets so hot that the interior, never mind the passengers, risks melting. While pondering this slight inconvenience I wondered, isn’t there any outside temperature that leaves my car comfortable?
Of course there is. We know this before thinking about it. The sun heats the car, yes. When the outside temperature is low enough, there’s enough heat flowing out that the car gets cold. When the outside temperature’s high enough, not enough heat flows out. The car stays warm. There must be some middle temperature where just enough heat flows out that the interior doesn’t get particularly warm or cold. Not just one middle temperature, come to that. There is a range of temperatures that are comfortable to sit in. But that just means there’s a range of outside temperatures for which the car’s interior stays comfortable. We know this range as late April, early May, here. Most years, anyway.
The reasoning that lets us know there is a comfort-producing outside temperature we can see as a use of the Intermediate Value Theorem. It addresses a function f with domain [a, b], and range of the real numbers. The domain is closed; that is, the numbers we call ‘a’ and ‘b’ are both in the set. And f has to be a continuous function. If you want to draw it, you can do so without having to lift pen from paper. (WARNING: Do not attempt to pass your Real Analysis course with that definition. But that’s what the proper definition means.)
So look at the numbers f(a) and f(b). Pick some number between them, and I’ll call that number ‘g’. There must be at least one number ‘c’, that’s between ‘a’ and ‘b’, and for which f(c) equals g.
Bernard Bolzano, an early-19th century mathematician/logician/theologist/priest, gets the credit for first proving this theorem. Bolzano’s version was a little different. It supposes that f(a) and f(b) are of opposite sign. That is, f(a) is a positive and f(b) a negative number. Or f(a) is negative and f(b) is positive. And Bolzano’s theorem says there must be some number ‘c’ for which f(c) is zero.
You can prove this by drawing any wiggly curve at all and then a horizontal line in the middle of it. Well, that doesn’t prove it to mathematician’s satisfaction. But it will prove the matter in the sense that you’ll be convinced. It’ll also convince anyone you try explaining this to.
You might wonder why anyone needed this proved at all. It’s a bit like proving that as you pour water into the sink there’ll come a time the last dish gets covered with water. So it is. The need for a proof came about from the ongoing attempt to make mathematics rigorous. We have an intuitive idea of what it means for functions to be continuous; see my above comment about lifting pens from paper. Can that be put in terms that don’t depend on physical intuition? … Yes, it can. And we can divorce the Intermediate Value Theorem from our physical intuitions. We can know something that’s true even if we never see a car or a sink.
This theorem might leave you feeling a little hollow inside. Proving that there is some ‘c’ for which f(c) equals g, or even equals zero, doesn’t seem to tell us much about how to find it. It doesn’t even tell us that there’s only one ‘c’, rather than two or three or a hundred million candidates that meet our criteria. Fair enough. The Intermediate Value Theorem is more about proving the existence of solutions, rather than how to find them.
But knowing there is a solution can help us find them. The Intermediate Value Theorem as we know it grew out of finding roots for polynomials. One numerical method, easy to set up for any problem, is the bisection method. If you know that somewhere between ‘a’ and ‘b’ the function goes from positive to negative, then find the midpoint, ‘c’. The function is equal to zero either between ‘a’ and ‘c’, or between ‘c’ and ‘b’. Pick the side that it’s on, and bisect that. Pick the half of that which the zero must be in. Bisect that half. And repeat until you get close enough to the answer for your needs. (The same reasoning applies to a lot of problems in which you divide the search range in two each time until the answer appears.)
We can get some pretty heady results from the Intermediate Value Theorem, too, even if we don’t know where any of them are. An example you’ll see everywhere is that there must be spots on the opposite sides of the globe with the exact same temperature. Or humidity, or daily rainfall, or any other quantity like that. I had thought everyone was ripping that example off from Richard Courant and Herbert Robbins’s masterpiece What Is Mathematics?. But I can’t find this particular example in there. I wonder what we are all ripping it off from.
So here’s a neat example that is ripped off from them. Draw two blobs on the plane. Is there a straight line that bisects both of them at once? Bisecting here means there’s exactly as much of one blob on one side of the line as on the other. There certainly is. The trick is there are any number of lines that will bisect one blob, and then look at what that does to the other.
A similar ripped-off result you can do with a single blob of any shape you like. Draw any line that bisects it. There are a lot of candidates. Can you draw a line perpendicular to that so that the blob gets quartered, divided into four spots of equal area? Yes. Try it.
But surely the best use of the Intermediate Value Theorem is in the problem of wobbly tables. If the table has four legs, all the same length, and the problem is the floor isn’t level it’s all right. There is some way to adjust the table so it won’t wobble. (Well, the ground can’t be angled more than a bit over 35 degrees, but that’s all right. If the ground has a 35 degree angle you aren’t setting a table on it. You’re rolling down it.) Finally a mathematical proof can save us from despair!
Except that the proof doesn’t work if the table legs are uneven which, alas, they often are. But we can’t get everything.
Courant and Robbins put forth one more example that’s fantastic, although it doesn’t quite work. But it’s a train problem unlike those you’ve seen before. Let me give it to you as they set it out:
Suppose a train travels from station A to station B along a straight section of track. The journey need not be of uniform speed or acceleration. The train may act in any manner, speeding up, slowing down, coming to a halt, or even backing up for a while, before reaching B. But the exact motion of the train is supposed to be known in advance; that is, the function s = f(t) is given, where s is the distance of the train from station A, and t is the time, measured from the instant of departure.
On the floor of one of the cars a rod is pivoted so that it may move without friction either forward or backward until it touches the floor. If it does touch the floor, we assume that it remains on the floor henceforth; this wil be the case if the rod does not bounce.
Is it possible to place the rod in such a position that, if it is released at the instant when the train starts and allowed to move solely under the influence of gravity and the motion of the train, it will not fall to the floor during the entire journey from A to B?
They argue it is possible, and use the Intermediate Value Theorem to show it. They admit the range of angles it’s safe to start the rod from may be too small to be useful.
But they’re not quite right. Ian Stewart, in the revision of What Is Mathematics?, includes an appendix about this. Stewart credits Tim Poston with pointing out, in 1976, the flaw. It’s possible to imagine a path which causes the rod, from one angle, to just graze tipping over, let’s say forward, and then get yanked back and fall over flat backwards. This would leave no room for any starting angles that avoid falling over entirely.
It’s a subtle flaw. You might expect so. Nobody mentioned it between the book’s original publication in 1941, after which everyone liking mathematics read it, and 1976. And it is one that touches on the complications of spaces. This little Intermediate Value Theorem problem draws us close to chaos theory. It’s one of those ideas that weaves through all mathematics.
Take a huge bag and stuff all the real numbers into it. Give the bag a good solid shaking. Stir up all the numbers until they’re thoroughly mixed. Reach in and grab just the one. There you go: you’ve got a transcendental number. Enjoy!
OK, I detect some grumbling out there. The first is that you tried doing this in your head because you somehow don’t have a bag large enough to hold all the real numbers. And you imagined pulling out some number like “2” or “37” or maybe “one-half”. And you may not be exactly sure what a transcendental number is. But you’re confident the strangest number you extracted, “minus 8”, isn’t it. And you’re right. None of those are transcendental numbers.
I regret saying this, but that’s your own fault. You’re lousy at picking random numbers from your head. So am I. We all are. Don’t believe me? Think of a positive whole number. I predict you probably picked something between 1 and 10. Almost surely something between 1 and 100. Surely something less than 10,000. You didn’t even consider picking something between 10,012,002,214,473,325,937,775 and 10,012,002,214,473,325,937,785. Challenged to pick a number, people will select nice and familiar ones. The nice familiar numbers happen not to be transcendental.
I detect some secondary grumbling there. Somebody picked π. And someone else picked e. Very good. Those are transcendental numbers. They’re also nice familiar numbers, at least to people who like mathematics a lot. So they attract attention.
Still haven’t said what they are. What they are traces back, of course, to polynomials. Take a polynomial that’s got one variable, which we call ‘x’ because we don’t want to be difficult. Suppose that all the coefficients of the polynomial, the constant numbers we presumably know or could find out, are integers. What are the roots of the polynomial? That is, for what values of x is the polynomial a complicated way of writing ‘zero’?
For example, try the polynomial x2 – 6x + 5. If x = 1, then that polynomial is equal to zero. If x = 5, the polynomial’s equal to zero. Or how about the polynomial x2 + 4x + 4? That’s equal to zero if x is equal to -2. So a polynomial with integer coefficients can certainly have positive and negative integers as roots.
How about the polynomial 2x – 3? Yes, that is so a polynomial. This is almost easy. That’s equal to zero if x = 3/2. How about the polynomial (2x – 3)(4x + 5)(6x – 7)? It’s my polynomial and I want to write it so it’s easy to find the roots. That polynomial will be zero if x = 3/2, or if x = -5/4, or if x = 7/6. So a polynomial with integer coefficients can have positive and negative rational numbers as roots.
How about the polynomial x2 – 2? That’s equal to zero if x is the square root of 2, about 1.414. It’s also equal to zero if x is minus the square root of 2, about -1.414. And the square root of 2 is irrational. So we can certainly have irrational numbers as roots.
So if we can have whole numbers, and rational numbers, and irrational numbers as roots, how can there be anything else? Yes, complex numbers, I see you raising your hand there. We’re not talking about complex numbers just now. Only real numbers.
It isn’t hard to work out why we can get any whole number, positive or negative, from a polynomial with integer coefficients. Or why we can get any rational number. The irrationals, though … it turns out we can only get some of them this way. We can get square roots and cube roots and fourth roots and all that. We can get combinations of those. But we can’t get everything. There are irrational numbers that are there but that even polynomials can’t reach.
It’s all right to be surprised. It’s a surprising result. Maybe even unsettling. Transcendental numbers have something peculiar about them. The 19th Century French mathematician Joseph Liouville first proved the things must exist, in 1844. (He used continued fractions to show there must be such things.) It would be seven years later that he gave an example of one in nice, easy-to-understand decimals. This is the number 0.110 001 000 000 000 000 000 001 000 000 (et cetera). This number is zero almost everywhere. But there’s a 1 in the n-th digit past the decimal if n is the factorial of some number. That is, 1! is 1, so the 1st digit past the decimal is a 1. 2! is 2, so the 2nd digit past the decimal is a 1. 3! is 6, so the 6th digit past the decimal is a 1. 4! is 24, so the 24th digit past the decimal is a 1. The next 1 will appear in spot number 5!, which is 120. After that, 6! is 720 so we wait for the 720th digit to be 1 again.
And what is this Liouville number 0.110 001 000 000 000 000 000 001 000 000 (et cetera) used for, besides showing that a transcendental number exists? Not a thing. It’s of no other interest. And this plagued the transcendental numbers until 1873. The only examples anyone had of transcendental numbers were ones built to show that they existed. In 1873 Charles Hermite showed finally that e, the base of the natural logarithm, was transcendental. e is a much more interesting number; we have reasons to care about it. Every exponential growth or decay or oscillating process has e lurking in it somewhere. In 1882 Ferdinand von Lindemann showed that π was transcendental, and that’s an even more interesting number.
That bit about π has interesting implications. One goes back to the ancient Greeks. Is it possible, using straightedge and compass, to create a square that’s exactly the same size as a given circle? This is equivalent to saying, if I give you a line segment, can you create another line segment that’s exactly the square root of π times as long? This geometric problem is equivalent to an algebraic one. That problem: can you create a polynomial, with integer coefficients, that has the square root of π as a root? (WARNING: I’m skipping some important points for the sake of clarity. DO NOT attempt to use this to pass your thesis defense without putting those points back in.) We want the square root of π because … well, what’s the area of a square whose sides are the square root of π long? That’s right. So we start with a line segment that’s equal to the radius of the circle and we can do that, surely. Once we have the radius, can’t we make a line that’s the square root of π times the radius, and from that make a square with area exactly π times the radius squared? Since π is transcendental, then, no. We can’t. Sorry. One of the great problems of ancient mathematics, and one that still has the power to attract the casual mathematician, got its final answer in 1882.
Georg Cantor is a name even non-mathematicians might recognize. He showed there have to be some infinite sets bigger than others, and that there must be more real numbers than there are rational numbers. Four years after showing that, he proved there are as many transcendental numbers as there are real numbers.
They’re everywhere. They permeate the real numbers so much that we can understand the real numbers as the transcendental numbers plus some dust. They’re almost the dark matter of mathematics. We don’t actually know all that many of them. Wolfram MathWorld has a table listing numbers proven to be transcendental, and the fact we can list that on a single web page is remarkable. Some of them are large sets of numbers, yes, like for every positive whole number d. And we can infer many more from them; if π is transcendental then so is 2π, and so is 5π, and so is -20.38π, and so on. But the table of numbers proven to be irrational is still just 25 rows long.
There are even mysteries about obvious numbers. π is transcendental. So is e. We know that at least one of π times e and π plus e is transcendental. Perhaps both are. We don’t know which one is, or if both are. We don’t know whether ππ is transcendental. We don’t know whether ee is, either. Don’t even ask if πe is.
How, by the way, does this fit with my claim that everything in mathematics is polynomials? — Well, we found these numbers in the first place by looking at polynomials. The set is defined, even to this day, by how a particular kind of polynomial can’t reach them. Thinking about a particular kind of polynomial makes visible this interesting set.
For something else to read, and one that doesn’t get into polynomials, here’s a post from Stephen Cavadino of the CavMaths blog, abut the areas of lunes. Lunes are … well, they’re kind of moon-shaped figures. Cavadino particularly writes about the Theorem of Not That Hippocrates. Start with a half circle. Draw a symmetric right triangle inside the circle. Draw half-circles off the two equal legs of that right triangle. The area between the original half-circle and the newly-drawn half circles is … how much? The answer may surprise you.
Cavadino doesn’t get into this, but: it’s possible to make a square that has the same area as these strange crescent shapes using only straightedge and compass. Not That Hippocrates knew this. It’s impossible to make a square with the exact same area as a circle using only straightedge and compass. But these figures, with edges that are defined by circles of just the right relative shapes, they’re fine. Isn’t that wondrous?
Polynomials are everything. Everything in mathematics, anyway. If humans study it, it’s a polynomial. If we know anything about a mathematical construct, it’s because we ran across it while trying to understand polynomials.
I exaggerate. A tiny bit. Maybe by three percent. But polynomials are big.
They’re easy to recognize. We can get them in pre-algebra. We make them out of a set of numbers called coefficients and one or more variables. The coefficients are usually either real numbers or complex-valued numbers. The variables we usually allow to be either real or complex-valued numbers. We take each coefficient and multiply it by some power of each variable. And we add all that up. So, polynomials are things that look like these things:
The first polynomial maybe looks nice and comfortable. The second may look a little threatening, what with it having two variables and a square root in it, but it’s not too weird. The third is an infinitely long polynomial; you’re supposed to keep going on in that pattern, adding even more terms. The last is a generic representation of a polynomial. Each number a0, a1, a2, et cetera is some coefficient that we in principle know. It’s a good way of representing a polynomial when we want to work with it but don’t want to tie ourselves down to a particular example. The highest power we raise a variable to we call the degree of the polynomial. A second-degree polynomial, for example, has an x2 in it, but not an x3 or x4 or x18 or anything like that. A third-degree polynomial has an x3, but not x to any higher powers. Degree is a useful way of saying roughly how long a polynomial is, so it appears all over discussions of polynomials.
It’s because they’re great. They do everything we’d ever want to do and they’re great at it. We can add them together as easily as we add regular old numbers. We can subtract them as well. We can multiply and divide them. There’s even prime polynomials, just like there are prime numbers. They take longer to work out, but they’re not harder.
And they do great stuff in advanced mathematics too. In calculus we want to take derivatives of functions. Polynomials, we always can. We get another polynomial out of that. So we can keep taking derivatives, as many as we need. (We might need a lot of them.) We can integrate too. The integration produces another polynomial. So we can keep doing that as long as we need too. (We need to do this a lot, too.) This lets us solve so many problems in calculus, which is about how functions work. It also lets us solve so many problems in differential equations, which is about systems whose change depends on the current state of things.
That’s great for analyzing polynomials, but what about things that aren’t polynomials?
Well, if a function is continuous, then it might as well be a polynomial. To be a little more exact, we can set a margin of error. And we can always find polynomials that are less than that margin of error away from the original function. The original function might be annoying to deal with. The polynomial that’s as close to it as we want, though, isn’t.
Not every function is continuous. Most of them aren’t. But most of the functions we want to do work with are, or at least are continuous in stretches. Polynomials let us understand the functions that describe most real stuff.
Nice for mathematicians, all right, but how about for real uses? How about for calculations?
Oh, polynomials are just magnificent. You know why? Because you can evaluate any polynomial as soon as you can add and multiply. (Also subtract, but we think of that as addition.) Remember, x4 just means “x times x times x times x”, four of those x’s in the product. All these polynomials are easy to evaluate.
Even better, we don’t have to evaluate them. We can automate away the evaluation. It’s easy to set a calculator doing this work, and it will do it without complaint and with few unforeseeable mistakes.
Now remember that thing where we can make a polynomial close enough to any continuous function? And we can always set a calculator to evaluate a polynomial? Guess that this means about continuous functions. We have a tool that lets us calculate stuff we would want to know. Things like arccosines and logarithms and Bessel functions and all that. And we get nice easy to understand numbers out of them. For example, that third polynomial I gave you above? That’s not just infinitely long. It’s also a polynomial that approximates the natural logarithm. Pick a positive number x that’s between 0 and 4 and put it in that polynomial. Calculate terms and add them up. You’ll get closer and closer to the natural logarithm of that number. You’ll get there faster if you pick a number near 2, but you’ll eventually get there for whatever number you pick. (Calculus will tell us why x has to be between 0 and 4. Don’t worry about it for now.)
So through polynomials we can understand functions, analytically and numerically.
And they keep revealing things to us. We discovered complex-valued numbers because we wanted to find roots, values of x that make a polynomial of x equal to zero. Some formulas worked well for third- and fourth-degree polynomials. (They look like the quadratic formula, which solves second-degree polynomials. The big difference is nobody remembers what they are without looking them up.) But the formulas sometimes called for things that looked like square roots of negative numbers. Absurd! But if you carried on as if these square roots of negative numbers meant something, you got meaningful answers. And correct answers.
We wanted formulas to solve fifth- and higher-degree polynomials exactly. We can do this with second and third and fourth-degree polynomials, after all. It turns out we can’t. Oh, we can solve some of them exactly. The attempt to understand why, though, helped us create and shape group theory, the study of things that look like but aren’t numbers.
Polynomials go on, sneaking into everything. We can look at a square matrix and discover its characteristic polynomial. This allows us to find beautifully-named things like eigenvalues and eigenvectors. These reveal secrets of the matrix’s structure. We can find polynomials in the formulas that describe how many ways to split up a group of things into a smaller number of sets. We can find polynomials that describe how networks of things are connected. We can find polynomials that describe how a knot is tied. We can even find polynomials that distinguish between a knot and the knot’s reflection in the mirror.
A major field of mathematics is Algebra. By this mathematicians don’t mean algebra. They mean studying collections of things on which you can do stuff that looks like arithmetic. There’s good reasons why this field has that confusing name. Nobody knows what they are.
We’ve seen before the creation of things that look a bit like arithmetic. Rings are a collection of things for which we can do something that works like addition and something that works like multiplication. There are a lot of different kinds of rings. When a mathematics popularizer tries to talk about rings, she’ll talk a lot about the whole numbers. We can usually count on the audience to know what they are. If that won’t do for the particular topic, she’ll try the whole numbers modulo something. If she needs another example then she talks about the ways you can rotate or reflect a triangle, or square, or hexagon and get the original shape back. Maybe she calls on the sets of polynomials you can describe. Then she has to give up on words and make do with pictures of beautifully complicated things. And after that she has to give up because the structures get too abstract to describe without losing the audience.
Dedekind Domains are a kind of ring that meets a bunch of extra criteria. There’s no point my listing them all. It would take several hundred words and you would lose motivation to continue before I was done. If you need them anyway Eric W Weisstein’s MathWorld dictionary gives the exact criteria. It also has explanations for all the words in those criteria.
Dedekind Domains, also called Dedekind Rings, are aptly named for Richard Dedekind. He was a 19th century mathematician, the last doctoral student of Gauss, and one of the people who defined what we think of as algebra. He also gave us a rigorous foundation for what irrational numbers are.
Among the problems that fascinated Dedekind was Fermat’s Last Theorem. This can’t surprise you. Every person who would be a mathematician is fascinated by it. We take our innings fiddling with cases and ways to show an + bn can’t equal cn for interesting whole numbers a, b, c, and n. We usually go about this by saying, “Suppose we have the smallest a, b, and c for which this is true and for which n is bigger than 2”. Then we do a lot of scribbling that shows this implies something contradictory, like an even number equals an odd, or that there’s some set of smaller numbers making this true. This proves the original supposition was false. Mathematicians first learn that trick as a way to show the square root of two can’t be a rational number. We stick with it because it’s nice and familiar and looks relevant. Most of us get maybe as far as proving there aren’t any solutions for n = 3 or maybe n = 4 and go on to other work. Dedekind didn’t prove the theorem. But he did find new ways to look at numbers.
One problem with proving Fermat’s Last Theorem is that it’s all about integers. Integers are hard to prove things about. Real numbers are easier. Complex-valued numbers are easier still. This is weird but it’s so. So we have this promising approach: if we could prove something like Fermat’s Last Theorem for complex-valued numbers, we’d get it up for integers. Or at least we’d be a lot of the way there. The one flaw is that Fermat’s Last Theorem isn’t true for complex-valued numbers. It would be ridiculous if it were true.
But we can patch things up. We can construct something called Gaussian Integers. These are complex-valued numbers which we can match up to integers in a compelling way. We could use the tools that work on complex-valued numbers to squeeze out a result about integers.
You know that this didn’t work. If it had, we wouldn’t have had to wait for the 1990s for the proof of Fermat’s Last Theorem. And that proof would have anything to do with this stuff. It hasn’t. One of the problems keeping this kind of proof from working is factoring. Whole numbers are either prime numbers or the product of prime numbers. Or they’re 1, ruled out of the universe of prime numbers for reasons I get to after the next paragraph. Prime numbers are those like 2, 5, 13, 37 and many others. They haven’t got any factors besides themselves and 1. The other whole numbers are the products of prime numbers. 12 is equal to 2 times 2 times 3. 35 is equal to 5 times 7. 165 is equal to 3 times 5 times 11.
If we stick to whole numbers, then, these all have unique prime factorizations. 24 is equal to 2 times 2 times 2 times 3. And there are no other combinations of prime numbers that multiply together to give us 24. We could rearrange the numbers — 2 times 3 times 2 times 2 works. But it will always be a combination of three 2’s and a single 3 that we multiply together to get 24.
(This is a reason we don’t consider 1 a prime number. If we did consider a prime number, then “three 2’s and a single 3” would be a prime factorization of 24, but so would “three 2’s, a single 3, and two 1’s”. Also “three 2’s, a single 3, and fifteen 1’s”. Also “three 2’s, a single 3, and one 1”. We have a lot of theorems that depend on whole numbers having a unique prime factorization. We could add the phrase “except for the count of 1’s in the factorization” to every occurrence of the phrase “prime factorization”. Or we could say that 1 isn’t a prime number. It’s a lot less work to say 1 isn’t a prime number.)
The trouble is that if we work with Gaussian integers we don’t have that unique prime factorization anymore. There are still prime numbers. But it’s possible to get some numbers as a product of different sets of prime numbers. And this point breaks a lot of otherwise promising attempts to prove Fermat’s Last Theorem. And there’s no getting around that, not for Fermat’s Last Theorem.
Dedekind saw a good concept lurking under this, though. The concept is called an ideal. It’s a subset of a ring that itself satisfies the rules for being a ring. And if you take something from the original ring and multiply it by something in the ideal, you get something that’s still in the ideal. You might already have one in mind. Start with the ring of integers. The even numbers are an ideal of that. Add any two even numbers together and you get an even number. Multiply any two even numbers together and you get an even number. Take any integer, even or not, and multiply it by an even number. You get an even number.
It’s not just even numbers that do this. The multiples of 3 make an ideal in the integers too. Add two multiples of 3 together and you get a multiple of 3. Multiply two multiples of 3 together and you get another multiple of 3. Multiply any integer by a multiple of 3 and you get a multiple of 3.
The multiples of 4 also make an ideal, as do the multiples of 5, or the multiples of 82, or of any whole number you like.
Odd numbers don’t make an ideal, though. Add two odd numbers together and you don’t get an odd number. Multiply an integer by an odd number and you might get an odd number, you might not.
And not every ring has an ideal lurking within it. For example, take the integers modulo 3. In this case there are only three numbers: 0, 1, and 2. 1 + 1 is 2, uncontroversially. But 1 + 2 is 0. 2 + 2 is 1. 2 times 1 is 2, but 2 times 2 is 1 again. This is self-consistent. But it hasn’t got an ideal within it. There isn’t a smaller set that has addition work.
The multiples of 4 make an interesting ideal in the integers. They’re not just an ideal of the integers. They’re also an ideal of the even numbers. Well, the even numbers make a ring. They couldn’t be an ideal of the integers if they couldn’t be a ring in their own right. And the multiples of 4 — well, multiply any even number by a multiple of 4. You get a multiple of 4 again. This keeps on going. The multiples of 8 are an ideal for the multiples of 4, the multiples of 2, and the integers. Multiples of 16 and 32 make for even deeper nestings of ideals.
The multiples of 6, now … that’s an ideal of the integers, for all the reasons the multiples of 2 and 3 and 4 were. But it’s also an ideal of the multiples of 2. And of the multiples of 3. We can see the collection of “things that are multiples of 6” as a product of “things that are multiples of 2” and “things that are multiples of 3”. Dedekind saw this before us.
You might want to pause a moment while considering the idea of multiplying whole sets of numbers together. It’s a heady concept. Trying to do proofs with the concept feels at first like being tasked with alphabetizing a cloud. But we’re not planning to prove anything so you can move on if you like with an unalphabetized cloud.
A Dedekind Domain is a ring that has ideals like this. And the ideals come in two categories. Some are “prime ideals”, which act like prime numbers do. The non-prime ideals are the products of prime ideals. And while we might not have unique prime factorizations of numbers, we do have unique prime factorizations of ideals. That is, if an ideal is a product of some set of prime ideals, then it can’t also be the product of some other set of prime ideals. We get back something like unique factors.
This may sound abstract. But you know a Dedekind Domain. The integers are one. That wasn’t a given. Yes, we start algebra by looking for things that work like regular arithmetic do. But that doesn’t promise that regular old numbers will still satisfy us. We can, for instance, study things where the order matters in multiplication. Then multiplying one thing by a second gives us a different answer to multiplying the second thing by the first. Still, regular old integers are Dedekind domains and it’s hard to think of being more familiar than that.
Another example is the set of polynomials. You might want to pause for a moment here. Mathematics majors need a pause to start thinking of polynomials as being something kind of like regular old numbers. But you can certainly add one polynomial to another, and you get a polynomial out of it. You can multiply one polynomial by another, and you get a polynomial out of that. Try it. After that the only surprise would be that there are prime polynomials. But if you try to think of two polynomials that multiply together to give you “x + 1” you realize there have to be.
Other examples start getting more exotic. They’re things like the Gaussian integers I mentioned before. Gaussian integers are themselves an example of a structure called algebraic integers. Algebraic integers are — well, think of all the polynomials you can out of integer coefficients, and with a leading coefficient of 1. So, polynomials that look like “x3 – 4 x2 + 15 x + 6” or the like. All of the roots of those, the values of x which make that expression equal to zero, are algebraic integers. Yes, almost none of them are integers. We know. But the algebraic integers are also a Dedekind Domain.
I’d like to describe some more Dedekind Domains. I am foiled. I can find some more, but explaining them outside the dialect of mathematics is hard. It would take me more words than I am confident readers will give me.
I hope you are satisfied to know a bit of what a Dedekind Domain is. It is a kind of thing which works much like integers do. But a Dedekind Domain can be just different enough that we can’t count on factoring working like we are used to. We don’t lose factoring altogether, though. We are able to keep an attenuated version. It does take quite a few words to explain exactly how to set this up, however.
John D Cook’s Algebra Fact of the Day points to a pair of algorithms about making change. Specifically these are about how many ways there are to provide a certain amount of change using United States coins. By that he, and the algorithms, mean 1, 5, 10, 25, and 50 cent pieces. I’m not sure if 50 cent coins really count, since they don’t circulate any more than dollar coins do. Anyway, if you want to include or rule out particular coins it’s clear enough how to adapt things.
What surprised me was a simple algorithm, taken from Ronald L Graham, Donald E Knuth, and Oren Patashnik’s Concrete Mathematics: A Foundation For Computer Science to count the number of ways to make a certain amount of change. You start with the power series that’s equivalent to this fraction:
A power series is a polynomial. The power series for , for example, is and carries on forever like that. But if you choose a number between minus one and positive one, and put that in for z in either or in that series you’ll get the same number. (If z is not between minus one and positive one, it doesn’t. Don’t worry about it. For what we’re doing we will never need any z.)
The power series for that big fraction with all the kinds of change in it is more tedious to work out. You’d need the power series for and and and so on, and to multiply all those series together. And yes, that’s multiplying infinitely long polynomials together, which you might reasonably expect will take some time.
You don’t need to, though. All you really want is a single term in this series. To tell how many ways there are to make n cents of change, look at the coefficient, the number, in front of the zn term. That’s the number of ways. So while this may be a lot of work, it’s not going to be hard work, and it’s going to be finite. You only have to work out the products that give you a zn power. That will take planning and preparation to do correctly, but that’s all.
The z-transform comes to us from signal processing. The signal we take to be a sequence of numbers, all representing something sampled at uniformly spaced times. The temperature at noon. The power being used, second-by-second. The number of customers in the store, once a month. Anything. The sequence of numbers we take to stretch back into the infinitely great past, and to stretch forward into the infinitely distant future. If it doesn’t, then we pad the sequence with zeroes, or some other safe number that we know means “nothing”. (That’s another classic mathematician’s trick.)
It’s convenient to have a name for this sequence. “a” is a good one. The different sampled values are denoted by an index. a0 represents whatever value we have at the “start” of the sample. That might represent the present. That might represent where sampling began. That might represent just some convenient reference point. It’s the equivalent of mileage maker zero; we have to have something be the start.
a1, a2, a3, and so on are the first, second, third, and so on samples after the reference start. a-1, a-2, a-3, and so on are the first, second, third, and so on samples from before the reference start. That might be the last couple of values before the present.
So for example, suppose the temperatures the last several days were 77, 81, 84, 82, 78. Then we would probably represent this as a-4 = 77, a-3 = 81, a-2 = 84, a-1 = 82, a0 = 78. We’ll hope this is Fahrenheit or that we are remotely sensing a temperature.
The z-transform of a sequence of numbers is something that looks a lot like a polynomial, based on these numbers. For this five-day temperature sequence the z-transform would be the polynomial . (z1 is the same as z. z0 is the same as the number “1”. I wrote it this way to make the pattern more clear.)
I would not be surprised if you protested that this doesn’t merely look like a polynomial but actually is one. You’re right, of course, for this set, where all our samples are from negative (and zero) indices. If we had positive indices then we’d lose the right to call the transform a polynomial. Suppose we trust our weather forecaster completely, and add in a1 = 83 and a2 = 76. Then the z-transform for this set of data would be . You’d probably agree that’s not a polynomial, although it looks a lot like one.
The use of z for these polynomials is basically arbitrary. The main reason to use z instead of x is that we can learn interesting things if we imagine letting z be a complex-valued number. And z carries connotations of “a possibly complex-valued number”, especially if it’s used in ways that suggest we aren’t looking at coordinates in space. It’s not that there’s anything in the symbol x that refuses the possibility of it being complex-valued. It’s just that z appears so often in the study of complex-valued numbers that it reminds a mathematician to think of them.
A sound question you might have is: why do this? And there’s not much advantage in going from a list of temperatures “77, 81, 84, 81, 78, 83, 76” over to a polynomial-like expression .
Where this starts to get useful is when we have an infinitely long sequence of numbers to work with. Yes, it does too. It will often turn out that an interesting sequence transforms into a polynomial that itself is equivalent to some easy-to-work-with function. My little temperature example there won’t do it, no. But consider the sequence that’s zero for all negative indices, and 1 for the zero index and all positive indices. This gives us the polynomial-like structure . And that turns out to be the same as . That’s much shorter to write down, at least.
Probably you’ll grant that, but still wonder what the point of doing that is. Remember that we started by thinking of signal processing. A processed signal is a matter of transforming your initial signal. By this we mean multiplying your original signal by something, or adding something to it. For example, suppose we want a five-day running average temperature. This we can find by taking one-fifth today’s temperature, a0, and adding to that one-fifth of yesterday’s temperature, a-1, and one-fifth of the day before’s temperature a-2, and one-fifth a-3, and one-fifth a-4.
The effect of processing a signal is equivalent to manipulating its z-transform. By studying properties of the z-transform, such as where its values are zero or where they are imaginary or where they are undefined, we learn things about what the processing is like. We can tell whether the processing is stable — does it keep a small error in the original signal small, or does it magnify it? Does it serve to amplify parts of the signal and not others? Does it dampen unwanted parts of the signal while keeping the main intact?
We can understand how data will be changed by understanding the z-transform of the way we manipulate it. That z-transform turns a signal-processing idea into a complex-valued function. And we have a lot of tools for studying complex-valued functions. So we become able to say a lot about the processing. And that is what the z-transform gets us.
I don’t want folks thinking I’m claiming a monopoly on the mathematics-glossary front. HowardAt58 is happy to explain words too. Here he talks about one of the definitions of “vertex”, in this case the one that relates to parabolas and other polynomial curves. As a bonus, there’s osculating circles.
Here are some definitions of the vertex of a parabola.
One is complete garbage, one is correct though put rather chattily.
The rest are not definitions, though very popular (this is just a selection). But they are true statements
Mathwarehouse: The vertex of a parabola is the highest or lowest point, also known as the maximum or minimum of a
Mathopenref: A parabola is the shape defined by a quadratic equation. The vertex is the peak in the curve as shown on
the right. The peak will be pointing either downwards or upwards depending on the sign of the x2 term.
Virtualnerd: Each quadratic equation has either a maximum or minimum, but did you that this point has a special name?
In a quadratic equation, this point is called the vertex!
Mathwords: Vertex of a Parabola: The point at which a parabola makes its sharpest turn. Purplemath: The…
It’s a common joke that mathematicians shun things that have anything to do with the real world. You can see where the impression comes from, though. Even common mathematical constructs, such as “functions”, are otherworldly abstractions once a mathematician is done defining them precisely. It can look like mathematicians find real stuff to be too dull to study.
Knot theory goes against the stereotype. A mathematician’s knot is just about what you would imagine: threads of something that get folded and twisted back around themselves. Every now and then a knot theorist will get a bit of human-interest news going for the department by announcing a new way to tie a tie, or to tie a shoelace, or maybe something about why the Christmas tree lights get so tangled up. These are really parts of the field, and applications that almost leap off the page as one studies. It’s a bit silly, admittedly. The only way anybody needs to tie a tie is go see my father and have him do it for you, and then just loosen and tighten the knot for the two or three times you’ll need it. And there’s at most two ways of tying a shoelace anybody needs. Christmas tree lights are a bigger problem but nobody can really help with getting them untangled. But studying the field encourages a lot of sketches of knots, and they almost cry out to be done out of some real material.
One amazing thing about knots is that they can be described as mathematical expressions. There are multiple ways to encode a description for how a knot looks as a polynomial. An expression like contains enough information to draw one knot as opposed to all the others that might exist. (In this case it’s a very simple knot, one known as the right-hand trefoil knot. A trefoil knot is a knot with a trefoil-like pattern.) Indeed, it’s possible to describe knots with polynomials that let you distinguish between a knot and its mirror-image reflection.
Biology, life, is knots. The DNA molecules that carry and transmit genes tangle up on themselves, creating knots. The molecules that DNA encodes, proteins and enzymes and all the other basic tools of cells, can be represented as knots. Since at this level the field is about how molecules interact you probably would expect that much of chemistry can be seen as the ways knots interact. Statistical mechanics, the study of unspeakably large number of particles, do as well. A field you can be introduced to by studying your sneaker runs through the most useful arteries of science.
That said, mathematicians do make their knots of unreal stuff. The mathematical knot is, normally, a one-dimensional thread rather than a cylinder of stuff like a string or rope or shoelace. No matter; just imagine you’ve got a very thin string. And we assume that it’s frictionless; the knot doesn’t get stuck on itself. As a result a mathematician just learning knot theory would snootily point out that however tightly wound up your extension cord is, it’s not actually knotted. You could in principle push one of the ends of the cord all the way through the knot and so loosen it into an untangled string, if you could push the cord from one end and if the cord didn’t get stuck on itself. So, yes, real-world knots are mathematically not knots. After all, something that just falls apart with a little push hardly seems worth the name “knot”.
My point is that mathematically a knot has to be a closed loop. And it’s got to wrap around itself in some sufficiently complicated way. A simple circle of string is not a knot. If “not a knot” sounds a bit childish you might use instead the Lewis Carrollian term “unknot”.
We can fix that, though, using a surprisingly common mathematical trick. Take the shoelace or rope or extension cord you want to study. And extend it: draw lines from either end of the cord out to the edge of your paper. (This is a great field for doodlers.) And then pretend that the lines go out and loop around, touching each other somewhere off the sheet of paper, as simply as possible. What had been an unknot is now not an unknot. Study wisely.
My title is an exaggeration. In eighth grade Prealgebra I learned many things, but I confess that I didn’t learn well from that particular teacher that particular year. What I most clearly remember learning I picked up from a substitute who filled in a few weeks. It’s a method for factoring quadratic expressions into binomial expressions, and I must admit, it’s not very good. It’s cumbersome and totally useless once one knows the quadratic equation. But it’s fun to do, and I liked it a lot, and I’ve never seen it described as a way to factor quadratic expressions. So let me put it on the web and do what I can to preserve its legacy, and get hundreds of people telling me what it actually is and how everybody but the people I know went through a phase of using it.
It’s a method which looks at first like it’s going to be a magic square, but it’s not, and I’m at a loss what to call it. I don’t remember the substitute teacher’s name, so I can’t use that. I do remember the regular teacher’s name, but it wasn’t, as far as I know, part of his lesson plan, and it’d not be fair to him to let his legacy be defined by one student who just didn’t get him.
Since I suspect that the comics roundup posts are the most popular ones I post, I’m very glad to see there was a bumper crop of strips among the ones I read regularly (from King Features Syndicate and from gocomics.com) this past week. Some of those were from cancelled strips in perpetual reruns, but that’s fine, I think: there aren’t any particular limits on how big an electronic comics page one can have, after all, and while it’s possible to read a short-lived strip long enough that you see all its entries, it takes a couple go-rounds to actually have them all memorized.
[ Oh, wow. Yesterday’s entry had way fewer hits than average. I also put an equation out right up front where everyone could see it. I wonder if this might be a test of Stephen Hawking’s dictum about equations and sales. Or maybe I was just boring yesterday. I’d ask, but apparently, nobody found me interesting enough yesterday to know for comparison. ]
Polynomials turn up all over the place. There are multiple good reasons for this. For one, suppose we have any continuous function that we want to study. (“Continuous” has a technical definition, although if you imagine what we might mean by that in ordinary English — that we could draw it without having to lift pen from paper — you’ve got it, apart from freak cases designed to confuse students taking real analysis by making continuous functions that don’t look anything like something you could ever draw, which is jolly good fun until the grades are returned.) If we’re willing to accept a certain margin of error around that function, though, we can always find a polynomial that’s within that margin of error of the function we really want to study. I have read, albeit in secondary sources, that for a while in the 18th century it was thought that a mathematician could just as well define a function as “something that a polynomial can approximate”.
I do sometimes read online forums of educators, particularly math educators, since it’s fun to have somewhere to talk shop, and the topics of conversation are constant enough you don’t have to spend much time getting the flavor of a particular group before participating. If you suppose the students are lazy, the administrators meddling, the community unsupportive, and the public irrationally terrified of mathematics you’ve covered most forum threads. I had no luck holding forth my view on one particular topic, though, so I’ll try fighting again here where I can easily squelch the opposition.
The argument, a subset of students-are-lazy (as they don’t wish to understand mathematics), was about a mnemonic technique called FOIL. It’s a tool to help people multiply binomials. Binomials are the sum (or difference) of two quantities, for example, (a + 2) or (b + 5). Here a and b are numbers whose value I don’t care about; I don’t care about the 2 or 5 either, but by picking specific values I avoid having too much abstraction in my paragraph. The product of (a + 2) with (b + 5) is the sum of all the pairs made by multiplying one term in the first binomial by one term in the second. There are four such pairs: a times b, and a times 5, and 2 times b, and 2 times 5. And therefore the product (a + 2) * (b + 5) will be a*b + a*5 + 2*b + 2*5. That would usually be cleaned up by writing 5*a instead of a*5, and by writing 10 instead of 2*5, so the sum would become a*b + 5*a + 2*b + 10.
FOIL is a way of making sure one has covered all the pairs. The letters stand for First, Outer, Inner, Last, and they mean: take the product of the First terms in each binomial, a and b; and those of the Outer terms, a and 5; and those of the Inner terms, 2 and b; and those of the Last terms, 2 and 5.
Here is my distinguished colleague’s objection to FOIL: Nobody needs it. This is true.