Do you need to know the formula that tells you the sum of the first N counting numbers, each raised to a power? No, you do not. Not really. It can save a bit of time to know the sum of the numbers raised to the first power. Most mathematicians would know it, or be able to recreate it fast enough: $1 + 2 + 3 + \cdots + N = \frac{1}{2}N(N+1)$.
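If you don't trust that formula, it's quick to check by brute force. This snippet is my own illustration, not anything from Mariani's paper:

```python
# Check the closed form for 1 + 2 + ... + N against a direct sum.
def sum_first(N):
    return N * (N + 1) // 2

for N in range(1, 101):
    assert sum_first(N) == sum(range(1, N + 1))

print(sum_first(100))  # 5050, the famous schoolroom example
```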
It’s a neat one. Mariani describes a way to use knowledge of the sum of numbers to the first power to generate a formula for the sum of squares. And then to use the sum of squares formula to generate the sum of cubes. The sum of cubes then lets you get the sum of fourth powers. And so on. This takes a while to do if you’re interested in the sum of twentieth powers. But do you know how many times you’ll ever need to generate that formula? Anyway, as Mariani notes, this sort of thing is useful if you find yourself at a mathematics competition. Or some other event where you can’t just have the computer calculate this stuff.
Mariani’s process is a great one. Like many mnemonics it doesn’t make literal sense. It expects one to integrate and differentiate polynomials. Anyone likely to be interested in a formula for the sums of twelfth powers knows how to do those in their sleep. But they’re integrating and differentiating polynomials for which, in context, the integrals and derivatives don’t exist. Or at least don’t mean anything. That’s all right. If all you want is the right answer, it’s okay to get there by a wrong method. At least if you verify the answer is right, which the last section of Mariani’s paper does. So, give it a read if you’d like to see a neat mathematical trick to a maybe useful result.
Analysis is about proving why the rest of mathematics works. It’s a hard field. My experience, a typical one, included crashing against real analysis as an undergraduate and again as a graduate student. It turns out mathematics works by throwing a lot of symbols around.
Let me give an example. If you read pop mathematics blogs you know about the number represented by $0.999\ldots$. You’ve seen proofs, some of them even convincing, that this number equals 1. Not a tiny bit less than 1, but exactly 1. Here’s a real-analysis treatment. And — I may regret this — I recommend you don’t read it. Not closely, at least. Instead, look at its shape. Look at the words and symbols as graphic design elements, and trust that what I say is not nonsense. Resume reading after the horizontal rule.
It’s convenient to have a name for the number $0.999\ldots$. I’ll call that $r$, for “repeating”. 1 we’ll call 1. I think you’ll grant that whatever $r$ is, it can’t be more than 1. I hope you’ll accept that if the difference between 1 and $r$ is zero, then $r$ equals 1. So what is the difference between 1 and $r$?
Give me some number $\epsilon$. It has to be a positive number. The implication in the letter is that it’s a small number. This isn’t actually required in general. We expect it. We feel surprise and offense if it’s ever not the case.
I can show that the difference between 1 and $r$ is less than $\epsilon$. I know there is some smallest counting number N so that $\frac{1}{10^N} < \epsilon$. For example, say $\epsilon$ is 0.125. Then we can let N = 1, and $\frac{1}{10^1} = 0.1 < 0.125$. Or suppose $\epsilon$ is 0.00625. Then if N = 3, $\frac{1}{10^3} = 0.001 < 0.00625$. (If $\epsilon$ is bigger than 1, let N = 1.) Now we have to ask why I want this N.
Whatever the value of $r$ is, I know that it is more than 0.9. And that it is more than 0.99. And that it is more than 0.999. In fact, it’s more than the number you get by truncating $r$ after any whole number N of digits. Let me call $r_N$ the number you get by truncating $r$ after N digits. So $r_1 = 0.9$ and $r_2 = 0.99$ and $r_3 = 0.999$ and so on.
Since $r > r_N$, it has to be true that $1 - r < 1 - r_N$. And since we know what $r_N$ is, we can say exactly what $1 - r_N$ is. It's $\frac{1}{10^N}$. And we picked N so that $\frac{1}{10^N} < \epsilon$. So $1 - r < \epsilon$. But all we know of $\epsilon$ is that it's a positive number. It can be any positive number. So $1 - r$ has to be smaller than each and every positive number. The biggest number that’s smaller than every positive number is zero. So the difference between 1 and $r$ must be zero and so they must be equal.
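If the argument is easier to watch run as a computation, here is a little sketch of it, my own illustration using the example epsilons from above:

```python
# For a positive epsilon, find the smallest counting number N with
# 1/10**N < epsilon, then check that 1 minus the N-digit truncation
# of 0.999... is already below epsilon.
def smallest_N(epsilon):
    if epsilon > 1:
        return 1
    N = 1
    while 10.0 ** -N >= epsilon:
        N += 1
    return N

for epsilon in (0.125, 0.00625, 0.5):
    N = smallest_N(epsilon)
    r_N = 1 - 10.0 ** -N       # the truncation: 0.9, 0.99, 0.999, ...
    assert 1 - r_N < epsilon   # so 1 - r, smaller still, beats epsilon too
```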
That is a compelling argument. Granted, it compels much the way your older brother kneeling on your chest and pressing your head into the ground compels. But this argument gives the flavor of what much of analysis is like.
For one, it is fussy, leaning toward the technical. You see why the subject has the reputation of driving off all but the most intent mathematics majors. Once you get comfortable with this sort of argument, though, it’s hard to even notice the fussiness anymore.
For another, the argument shows that the difference between two things is less than every positive number. Therefore the difference is zero and so the things are equal. This is one of mathematics’ most important tricks. And another point, there’s a lot of talk about $\epsilon$. And about finding differences that are, it usually turns out, smaller than some $\frac{\epsilon}{3}$. (As an undergraduate I found something wasteful in how the differences were so often so much less than $\epsilon$. We can’t exhaust the small numbers, though. It still feels uneconomic.)
Something this misses is another trick, though. That’s adding zero. I couldn’t think of a good way to use that here. What we often get is the need to show that, say, function $f$ and function $g$ are equal. That is, that they are less than $\epsilon$ apart. What we can often do is show that $f$ is close to some related function, which let me call $\phi$.
I know what you’re suspecting: $\phi$ must be a polynomial. Good thought! Although in my experience, it’s actually more likely to be a piecewise constant function. That is, it’s some number, e.g., “2”, for part of the domain, and then “2.5” in some other region, with no transition between them. It takes some other values, even values not starting with “2”, in other parts of the domain. Usually this is easier to prove stuff about than even polynomials are.
But get back to $g$. It’s got the same deal as $f$: some approximation that’s easier to prove stuff about. Then we want to show that $g$ is close to some $\psi$. And then show that $\phi$ is close to $\psi$. So — watch this trick. Or, again, watch the shape of this trick. Read again after the horizontal rule.
The difference $f - g$ is equal to $f - \phi + \phi - g$ since adding zero, that is, adding the number $-\phi + \phi$, can’t change a quantity. And $f - \phi + \phi - g$ is equal to $f - \phi + \phi - \psi + \psi - g$. Same reason: $-\psi + \psi$ is zero. So:

$$ f - g = (f - \phi) + (\phi - \psi) + (\psi - g) $$
Now we use the “triangle inequality”. If a, b, and c are the lengths of a triangle’s sides, the sum of any two of those numbers is larger than the third. And that tells us:

$$ |f - g| \le |f - \phi| + |\phi - \psi| + |\psi - g| $$
And then if you can show that $|f - \phi|$ is less than $\frac{1}{3}\epsilon$? And that $|\phi - \psi|$ is also less than $\frac{1}{3}\epsilon$? And you see where this is going for $|\psi - g|$? Then you’ve shown that $|f - g| < \epsilon$. With luck, each of these little pieces is something you can prove.
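The shape of the trick survives even with the functions swapped out for plain numbers. Here's a toy illustration of my own, with made-up values standing in for $f$, $\phi$, $\psi$, and $g$ at some point:

```python
# Toy run of the epsilon-thirds trick. The numbers are invented for
# illustration; they stand in for values of f, phi, psi, and g.
epsilon = 0.3
f, phi, psi, g = 1.00, 1.05, 1.10, 1.15

# Each pairwise gap is under epsilon/3 ...
assert abs(f - phi) < epsilon / 3
assert abs(phi - psi) < epsilon / 3
assert abs(psi - g) < epsilon / 3

# ... so the triangle inequality forces f and g within epsilon of each other.
assert abs(f - g) <= abs(f - phi) + abs(phi - psi) + abs(psi - g) < epsilon
```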
Don’t worry about what all this means. It’s meant to give a flavor of what you do in an analysis course. It looks hard, but most of that is because it’s a different sort of work than you’d done before. If you hadn’t seen the adding-zero and triangle-inequality tricks? I don’t know how long you’d need to imagine them.
There are other tricks too. An old reliable one is showing that one thing is bounded by the other. That is, that $f \le g$. You use this trick all the time because if you can also show that $g \le f$, then those two have to be equal.
The good thing — and there is good — is that once you get the hang of these tricks analysis starts to come together. And even get easier. The first course you take as a mathematics major is real analysis, all about functions of real numbers. The next course in this track is complex analysis, about functions of complex-valued numbers. And it is easy. Compared to what comes before, yes. But also on its own. Every theorem in complex analysis is named after Augustin-Louis Cauchy. They all show that the integral of your function, calculated along a closed loop, is zero. I exaggerate by $\epsilon$.
In grad school, if you make it, you get to functional analysis, which examines functions on functions and other abstractions like that. This, too, is easy, possibly because all the basic approaches you’ve seen several courses over. Or it feels easy after all that mucking around with the real numbers.
This is not the entirety of explaining how mathematics works. Since all these proofs depend on how numbers work, we need to show how numbers work. How logic works. But those are subjects we can leave for grad school, for someone who’s survived this gauntlet.
What we mean by that is the area between some left boundary, $x = a$, and some right boundary, $x = b$, that’s above the x-axis, and below that curve, $e^{-x^2}$. And there’s just no finding a, you know, answer. Something that looks like (to make up an answer) $a^2 - b^2$ or something normal like that. The one interesting exception is that you can find the area if the left bound is $-\infty$ and the right bound $+\infty$: it comes to $\sqrt{\pi}$. That’s done by some clever reasoning and changes of variables which is why we see that and only that in freshman calculus. (Oh, and as a side effect we can get the integral between 0 and infinity, because that has to be half of that.)
Anyway, Quintanilla includes a nice bit along the way, that I don’t remember from my freshman calculus, pointing out why we can’t come up with a nice simple formula like that. It’s a loose argument: suppose there is a way to integrate this using normal functions, and show that we get a contradiction. A proper proof is much harder and fussier, but this is likely enough to convince someone who understands a bit of calculus and a bit of Taylor series.
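Taking the curve to be $e^{-x^2}$, as I read it, the one findable area — the integral over the whole real line, which famously comes to $\sqrt{\pi}$ — is easy to check numerically even though no elementary antiderivative exists. A crude sketch of my own:

```python
import math

# Midpoint Riemann sum for the integral of exp(-x*x) over [-8, 8];
# the tails beyond that range are negligibly small.
n, lo, hi = 100_000, -8.0, 8.0
h = (hi - lo) / n
total = sum(math.exp(-(lo + (i + 0.5) * h) ** 2) for i in range(n)) * h

print(total, math.sqrt(math.pi))   # both about 1.7724538509
assert abs(total - math.sqrt(math.pi)) < 1e-6
```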
The talk of the catenary and the brachistochrone give away that this is a calculus paper. The catenary and the brachistochrone are some of the oldest problems in calculus as we know it. The catenary is the problem of what shape a weighted chain takes under gravity. The brachistochrone is the problem of what path carries a sliding bead between two points in the least time. It can be solved as the path a beam of light traces out moving through regions with different indexes of refraction. (As in, through films of glass or water or such.) Straight lines and circles we’ve heard of from other places.
The paper relies on calculus so if you’re not comfortable with that, well, skim over the lines with symbols. Rojas discusses the ways that we can treat all these different shapes as solutions of related, very similar problems. And there’s some talk about calculating approximate solutions. There is special delight in this as these are problems that can be done by an analog computer. You can build a tool to do some of these calculations. And I do mean “you”; the approach is to build a box, like, the sort of thing you can do by cutting up plastic sheets and gluing them together and setting toothpicks or wires on them. Then dip the model into a soap solution. Lift it out slowly and take a good picture of the soapy surface.
This is not as quick, or as precise, as fiddling with a Matlab or Octave or Mathematica simulation. But it can be much more fun.
We have goldfish, normally kept in an outdoor pond. It’s not a deep enough pond that it would be safe to leave them out for a very harsh winter. So we keep as many as we can catch in a couple 150-gallon tanks in the basement.
Recently, and irritatingly close to when we’d set them outside, the nitrate level in the tanks grew too high. Fish excrete ammonia. Microorganisms then turn the ammonia into nitrites and then nitrates. In the wild, the nitrates then get used by … I dunno, plants? Which don’t thrive enough in our basement to clean them out. To get the nitrate out of the water all there is to do is replace the water.
We have six buckets, each holding five gallons, of water that we can use for replacement. So there’s up to 30 gallons of water that we could change out in a day. Can’t change more because tap water contains chloramines, which kill bacteria (good news for humans) but hurt fish (bad news for goldfish). We can treat the tap water to neutralize the chloramines, but want to give that time to finish. I have never found a good reference for how long this takes. I’ve adopted “about a day” because we don’t have a water tap in the basement and I don’t want to haul more than 30 gallons of water downstairs any given day.
So I got thinking, what’s the fastest way to get the nitrate level down for both tanks? Change 15 gallons in each of them once a day, or change 30 gallons in one tank one day and the other tank the next?
And, happy to say, I realized this was the tea-making problem I’d done a couple months ago. The tea-making problem had a different goal, that of keeping as much milk in the tea as possible. But the thing being studied was how partial replacements of a solution with one component affects the amount of the other component. The major difference is that the fish produce (ultimately) more nitrates in time. There’s no tea that spontaneously produces milk. But if nitrate-generation is low enough, the same conclusions follow. So, a couple days of 30-gallon changes, in alternating tanks, and we had the nitrates back to a decent level.
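The comparison is quick to run in numbers. Ignoring new nitrate production, as the post supposes we can when generation is low, here's my own sketch for a single 150-gallon tank over two days:

```python
# Compare two-day schedules for one 150-gallon tank, ignoring any new
# nitrate the fish produce: (a) 15 gallons changed each day, versus
# (b) a single 30-gallon change.
tank = 150.0

def after_change(concentration, gallons_changed):
    # Replacing water removes that fraction of the dissolved nitrate.
    return concentration * (1 - gallons_changed / tank)

c0 = 1.0   # starting nitrate level, arbitrary units

daily_15 = after_change(after_change(c0, 15), 15)   # 0.9 * 0.9 = 0.81
one_30   = after_change(c0, 30)                     # 0.8

print(daily_15, one_30)
assert one_30 < daily_15   # the single bigger change removes more, per tank
```

So alternating full 30-gallon changes between the tanks edges out splitting the water evenly, at least per tank per change.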
We’d have put the fish outside this past week if I hadn’t broken, again, the tool used for cleaning the outside pond.
The problem I’d set out last week: I have a teapot good for about three cups of tea. I want to put milk in just once, before the first cup. How much should I drink before topping up the cup, to have the most milk at the end?
I have expectations. Some of this I know from experience, doing other problems where things get replaced at random. Here, tea or milk particles get swallowed at random, and replaced with tea particles. Yes, ‘particle’ is a strange word to apply to “a small bit of tea”. But it’s not like I can call them tea molecules. “Particle” will do and stop seeming weird someday.
Random replacement problems tend to be exponential decays. That I know from experience doing problems like this. So if I get an answer that doesn’t look like an exponential decay I’ll doubt it. I might be right, but I’ll need more convincing.
I also get some insight from extreme cases. We can call them reductios. Here “reductio” as in the word we usually follow with “ad absurdum”. Make the case ridiculous and see if that offers insight. The first reductio is to suppose I drink the entire first cup down to the last particle, then pour new tea in. By the second cup, there’s no milk left. The second reductio is to suppose I drink not a bit of the first cup of milk-with-tea. Then I have the most milk preserved. It’s not a satisfying break. But it leads me to suppose the most milk makes it through to the end if I have a lot of small sips and replacements of tea. And to look skeptically if my work suggests otherwise.
So that’s what I expect. What actually happens? Here, I do a bit of reasoning. Suppose that I have a mug. It can hold up to 1 unit of tea-and-milk. And the teapot, which holds up to 2 more units of tea-and-milk. What units? For the mathematics, I don’t care.
I’m going to suppose that I start with some amount — call it $a$ — of milk. $a$ is some number between 0 and 1. I fill the cup up to full, that is, 1 unit of tea-and-milk. And I drink some amount of the mixture. Call the amount I drink $x$. It, too, is between 0 and 1. After this, I refill the mug up to full, so, putting in $x$ units of tea. And I repeat this until I empty the teapot. So I can do this $\frac{2}{x}$ times.
I know you noticed that I’m short on tea here. The teapot should hold 3 units of tea. I’m only pouring out $3 - a$ units. I could be more precise by refilling the mug $\frac{2 + a}{x}$ times. I’m also going to suppose that I refill the mug with amount $x$ of tea a whole number of times. This sounds necessarily true. But consider: what if I drank and re-filled three-quarters of a cup of tea each time? How much tea is poured that third time?
I make these simplifications for good reasons. They reduce the complexity of the calculations I do without, I trust, making the result misleading. I can justify it too. I don’t drink tea from a graduated cylinder. It’s a false precision to pretend I do. I drink (say) about half my cup and refill it. How much tea I get in the teapot is variable too. Also, I don’t want to do that much work for this problem.
In fact, I’m going to do most of the work of this problem with a single drawing of a square. Here it is.
So! I start out with $a$ units of milk in the mixture. After drinking $x$ units of milk-and-tea, what’s left is $a(1 - x)$ units of milk in the mixture.
How about the second refill? The process is the same as the first refill. But where, before, there had been $a$ units of milk in the tea, now there are only $a(1 - x)$ units in. So that horizontal strip is a little narrower is all. The same reasoning applies and so, after the second refill, there’s $a(1 - x)^2$ milk in the mixture.
If you nodded to that, you’d agree that after the third refill there’s $a(1 - x)^3$. And are pretty sure what happens at the fourth and fifth and so on. If you didn’t nod to that, it’s all right. If you’re willing to take me on faith we can continue. If you’re not, that’s good too. Try doing a couple drawings yourself and you may convince yourself. If not, I don’t know. Maybe try, like, getting six white and 24 brown beads, stir them up, take out four at random. Replace all four with brown beads and count, and do that several times over. If you’re short on beads, cut up some paper into squares and write ‘B’ and ‘W’ on each square.
But anyone comfortable with algebra can see how to reduce this. The amount of milk remaining after j refills is going to be

$$ a\left(1 - x\right)^j $$
How many refills does it take to run out of tea? That we knew from above: it’s $\frac{2}{x}$ refills. So my last full mug of tea will have left in it

$$ a\left(1 - x\right)^{\frac{2}{x}} $$
units of milk.
Anyone who does differential equations recognizes this. It’s the discrete approximation of the exponential decay curve. Discrete, here, because we take out some finite but nonzero amount of milk-and-tea, $x$, and replace it with the same amount of pure tea.
Now, again, I’ve seen this before so I know its conclusions. The most milk will make it to the end if $x$ is as small as possible. The best possible case would be if I drink and replace an infinitesimal bit of milk-and-tea each time. Then the last mug would end with $a \cdot e^{-2}$ of milk. That’s $e$ as in the base of the natural logarithm. Every mathematics problem has an $e$ somewhere in it and I’m not exaggerating much. All told this would be about 13 and a half percent of the original milk.
Drinking more realistic amounts, like, half the mug before refilling, makes the milk situation more dire. Replacing half the mug at a time means the last full mug has only one-sixteenth what I started with. Drinking a quarter of the mug and replacing it lets about one-tenth the original milk survive.
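Those figures are easy to reproduce from the formula above. A quick sketch of my own:

```python
import math

# Fraction of the original milk left in the last full mug, if I drink
# and replace x units of mixture at a time, 2/x times in all.
def milk_left(x):
    return (1 - x) ** (2 / x)

print(milk_left(0.5))    # half the mug at a time: 1/16 = 0.0625
print(milk_left(0.25))   # a quarter at a time: about 0.100
print(milk_left(0.001))  # tiny sips: creeping up on e**-2
print(math.exp(-2))      # the limit, about 0.1353
```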
But all told the lesson is clear. If I want milk in the last mug, I should put some in each refill. Putting all the milk in at the start and letting it dissolve doesn’t work.
I’ve been taking milk in my tea lately. I have a teapot good for about three cups of tea. So that’s got me thinking about how to keep the most milk in the last of my tea. You may ask why I don’t just get some more milk when I refill the cup. I answer that if I were willing to work that hard I wouldn’t be a mathematician.
It’s easy to spot the lowest amount of milk I could have. If I drank the whole of the first cup, there’d be only whatever milk was stuck by surface tension to the cup for the second. And so even less than that for the third. But if I drank half a cup, poured more tea in, drank half again, poured more in … without doing the calculation, that’s surely more milk for the last full cup.
So what’s the strategy for the most milk I could get in the final cup? And how much is in there?
The exact suggestion I got for L was “Leibniz, the inventor of Calculus”. I can’t in good conscience offer that. This isn’t to deny Leibniz’s critical role in calculus. We rely on many of the ideas he’d had for it. We especially use his notation. But there are few great big ideas that can be truly credited to an inventor, or even a team of inventors. Put aside the sorry and embarrassing priority dispute with Isaac Newton. Many mathematicians in the 16th and 17th centuries were working on how to improve the Archimedean “method of exhaustion”. This finds the areas inside select curves; it is the seed of integral calculus. Johannes Kepler worked out the areas of ellipse slices, albeit with considerable luck. Gilles Roberval tried working out the area inside a curve as the area of infinitely many narrow rectangular strips. We still learn integration from this. Pierre de Fermat recognized how tangents to a curve could find maximums and minimums of functions. This is a critical piece of differential calculus. Isaac Barrow, Evangelista Torricelli (of barometer fame), Pietro Mengoli, and Stephano Angeli all pushed mathematics towards calculus. James Gregory proved, in geometric form, the relationship between differentiation and integration. That relationship is the Fundamental Theorem of Calculus.
This is not to denigrate Leibniz. We don’t dismiss the Wright Brothers though we know that without them, Alberto Santos-Dumont or Glenn Curtiss or Samuel Langley would have built a workable airplane anyway. We have Leibniz’s note, dated the 29th of October, 1675 (says Florian Cajori), writing out $\int l$ to mean the sum of all l’s. By mid-November he was integrating functions, and writing out his work as things like $\int x \, dx = \frac{x^2}{2}$. Any mathematics or physics or chemistry or engineering major today would recognize that. A year later he was writing things like $d\,x^e = e x^{e-1} \, dx$, which we’d also understand if not quite care to put that way.
Though we use his notation and his basic tools we don’t exactly use Leibniz’s particular ideas of what calculus means. It’s been over three centuries since he published. It would be remarkable if he had gotten the concepts exactly and in the best of all possible forms. Much of Leibniz’s calculus builds on the idea of a differential. This is a quantity that’s smaller than any positive number but also larger than zero. How does that make sense? George Berkeley argued it made not a lick of sense. Mathematicians frowned, but conceded Berkeley was right. By the mid-19th century they had a rationale for differentials that avoided this weird sort of number.
It’s hard to avoid the differential’s lure. The intuitive appeal of “imagine moving this thing a tiny bit” is always there. In science or engineering applications it’s almost mandatory. Few things we encounter in the real world have the kinds of discontinuity that create logic problems for differentials. Even in pure mathematics, we will look at a differential equation like $\frac{dy}{dx} = f(x)$ and rewrite it as $dy = f(x) \, dx$. Leibniz’s notation gives us the idea that taking derivatives is some kind of fraction. It isn’t, but in many problems we act as though it were. It works out often enough we forget that it might not.
Better, though. From the 1960s Abraham Robinson and others worked out a different idea of what real numbers are. In that, differentials have a rigorous logical definition. We call the mathematics which uses this “non-standard analysis”. The name tells something of its use. This is not to call it wrong. It’s merely not what we learn first, or necessarily at all. And it is Leibniz’s differentials. 304 years after his death there is still a lot of mathematics he could plausibly recognize.
There is still a lot of still-vital mathematics that he touched directly. Leibniz appears to be the first person to use the term “function”, for example, to describe that thing we’re plotting with a curve. He worked on systems of linear equations, and methods to find solutions if they exist. This technique is now called Gaussian elimination. We see the bundling of the equations’ coefficients he did as building a matrix and finding its determinant. We know that technique, today, as Cramer’s Rule, after Gabriel Cramer. The Japanese mathematician Seki Takakazu had discovered determinants before Leibniz, though.
Leibniz tried to study a thing he called “analysis situs”, which two centuries on would be a name for topology. My reading tells me you can get a good fight going among mathematics historians by asking whether he was a pioneer in topology. So I’ll decline to take a side in that.
In the 1680s he tried to create an algebra of thought, to turn reasoning into something like arithmetic. His goal was good: we see these ideas today as Boolean algebra, and concepts like conjunction and disjunction and negation and the empty set. Anyone studying logic knows these today. He’d also worked in something we can see as symbolic logic. Unfortunately for his reputation, the papers he wrote about that went unpublished until late in the 19th century. By then other mathematicians, like Gottlob Frege and Charles Sanders Peirce, had independently published the same ideas.
We give Leibniz’ name to a particular series that tells us the value of π:

$$ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots $$
(The Indian mathematician Madhava of Sangamagrama knew the formula this comes from by the 14th century. I don’t know whether Western Europe had gotten the news by the 17th century. I suspect it hadn’t.)
The drawback to using this to figure out digits of π is that it takes forever to use. Taking ten decimal digits of π demands evaluating about five billion terms. That’s not hyperbole; the series converges so slowly that it really does take just about forever to get its work done.
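You can watch the slowness yourself. Here's a quick sketch of my own of the partial sums:

```python
import math

# Partial sums of the Leibniz series, times 4: pi = 4(1 - 1/3 + 1/5 - ...).
def leibniz_pi(terms):
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

# A thousand terms gets you an error of roughly one part in a thousand:
print(leibniz_pi(1000), math.pi)   # already off in the third decimal place
```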
Which is something of a theme in Leibniz’s biography. He had a great many projects. Some of them even reached a conclusion. Many did not, and instead sprawled out with great ambition and sometimes insight before getting lost. Consider a practical one: he believed that the use of wind-driven propellers and water pumps could drain flooded mines. (Mines are always flooding.) In principle, he was right. But they all failed. Leibniz blamed deliberate obstruction by administrators and technicians. He even blamed workers afraid that new technologies would replace their jobs. Yet even in this failure he observed and had bracing new thoughts. The geology he learned in the mines project made him hypothesize that the Earth had been molten. I do not know the history of geology well enough to say whether this was significant to that field. It may have been another frustrating moment of insight (lucky or otherwise) ahead of its time but not connected to the mainstream of thought.
Another project, tantalizing yet incomplete: the “stepped reckoner”, a mechanical arithmetic machine. The design was to do addition and subtraction, multiplication and division. It’s a breathtaking idea. It earned him election into the (British) Royal Society in 1673. But it never was quite complete, never getting carries to work fully automatically. He never did finish it, and lost standing with the Royal Society when he moved on to other projects. He had a note describing a machine that could do some algebraic operations. In the 1690s he had some designs for a machine that might, in theory, integrate differential equations. It’s a fantastic idea. At some point he also devised a cipher machine. I do not know if this is one that was ever used in its time.
His greatest and longest-lasting unfinished project was for his employer, the House of Brunswick. Three successive Brunswick rulers were content to let Leibniz work on his many side projects. The one that Ernest Augustus wanted was a history of the Guelf family, in the House of Brunswick. One that went back to the time of Charlemagne or earlier if possible. The goal was to burnish the reputation of the house, which had just become a hereditary Elector of the Holy Roman Empire. (That is, they had just gotten to a new level of fun political intriguing. But they were at the bottom of that level.) Starting from 1687 Leibniz did good diligent work. He travelled throughout central Europe to find archival materials. He studied their context and meaning and relevance. He organized it. What he did not do, by his death in 1716, was write the thing.
It is always difficult to understand another person. Moreso someone you know only through biography. And especially someone who lived in very different times. But I do see a particular and very modern personality type here. We all know someone who will work so very hard getting prepared to do a project Right that it never gets done. You might be reading the words of one right now.
Leibniz was a compulsive Society-organizer. He promoted ones in Brandenburg and Berlin and Dresden and Vienna and Saint Petersburg. None succeeded. It’s not obvious why. Leibniz was well-connected enough; he’s known to have over six hundred correspondents. Even for a time of great letter-writing, that’s a lot.
But it does seem like something about him offended others. Failing to complete big projects, like the stepped reckoner or the History of the Guelf family, seems like some of that. Anyone who knows of calculus knows of the Newton-versus-Leibniz priority dispute. Grant that Leibniz seems not to have much fueled the quarrel. (And that modern historians agree Leibniz did not steal calculus from Newton.) Just being at the center of Drama causes people to rate you poorly.
It seems like there’s more, though. He was liked, for example, by the Electress Sophia of Hanover and her daughter Sophia Charlotte. These were the mother and the sister of Britain’s King George I. When George I ascended to the British throne he forbade Leibniz coming to London until at least one volume of the history was written. (The restriction seems fair, considering Leibniz was 27 years into the project by then.)
There are pieces in his biography that suggest a person a bit too clever for his own good. His first salaried position, for example, was as secretary to a Nuremberg alchemical society. He did not know alchemy. He passed himself off as deeply learned, though. I don’t blame him. Nobody would ever pass a job interview if they didn’t pretend to have expertise. Here it seems to have worked.
But consider, for example, his peace mission to Paris. Leibniz was born in the last years of the Thirty Years War. In that, the Great Powers of Europe battled each other in the German states. They destroyed Germany with a thoroughness not matched until World War II. Leibniz reasonably feared France’s King Louis XIV had designs on what was left of Germany. So his idea was to sell the French government on a plan of attacking Egypt and, from there, the Dutch East Indies. This falls short of an early-Enlightenment idea of rational world peace and a congress of nations. But anyone who plays grand strategy games recognizes the “let’s you and him fight” scheming. (The plan became irrelevant when France went to war with the Netherlands. The war did rope Brandenburg-Prussia, Cologne, Münster, and the Holy Roman Empire into the mess.)
And I have not discussed Leibniz’s work in philosophy, outside his logic. He’s respected for the theory of monads, part of the long history of trying to explain how things can have qualities. Like many he tried to find a deductive-logic argument about whether God must exist. And he proposed the notion that the world that exists is the most nearly perfect that can possibly be. Everyone has been dragging him for that ever since he said it, and they don’t look ready to stop. It’s an unfair rap, even if it makes for funny spoofs of his writing.
The optimal world may need to be badly defective in some ways. And this recognition inspires a question in me. Obviously Leibniz could come to this realization from thinking carefully about the world. But anyone working on optimization problems knows the more constraints you must satisfy, the less optimal your best-fit can be. Some things you might like may end up being lousy, because the overall maximum is more important. I have not seen anything to suggest Leibniz studied the mathematics of optimization theory. Is it possible he was working in things we now recognize as such, though? That he has notes in the things we would call Lagrange multipliers or such? I don’t know, and would like to know if anyone does.
Leibniz’s funeral was unattended by any dignitary or courtier besides his personal secretary. The Royal Academy and the Berlin Academy of Sciences did not honor their member’s death. His grave was unmarked for a half-century. And yet historians of mathematics, philosophy, physics, engineering, psychology, social science, philology, and more keep finding his work, and finding it more advanced than one would expect. Leibniz’s legacy seems to be one always rising and emerging from shade, but never being quite where it should.
GoldenOj suggested the exponential as a topic. It seemed like a good important topic, but one that was already well-explored by other people. Then I realized I could spend time thinking about something which had bothered me.
In here I write about “the” exponential, which is a bit like writing about “the” multiplication. We can talk about $2^x$ and $10^x$ and many other such exponential functions. One secret of algebra, not appreciated until calculus (or later), is that all these different functions are a single family. Understanding one exponential function lets you understand them all. Mathematicians pick one, the exponential with base e, because we find that convenient. e itself isn’t a convenient number — it’s a bit over 2.718 — but it has some wonderful properties. When I write “the exponential” here, I mean the function $f(x) = e^x$.
This piece will have a bit more mathematics, as in equations, than usual. If you like me writing about mathematics more than reading equations, you’re hardly alone. I recommend letting your eyes drop to the next sentence, or at least the next sentence that makes sense. You should be fine.
My professor for real analysis, in grad school, gave us one of those brilliant projects. Starting from the definition of the logarithm, as an integral, prove at least thirty things. They could be as trivial as “the log of 1 is 0”. They could be as subtle as how to calculate the log of one number in a different base. It was a great project for testing what we knew about why calculus works.
And it gives me the structure to write about the exponential function. Anyone reading a pop-mathematics blog about exponentials knows them. They’re these functions that, as the independent variable grows, grow ever-faster. Or that decay asymptotically to zero. Some readers know that, if the independent variable is an imaginary number, the exponential is a complex number too. As the independent variable grows, becoming a bigger imaginary number, the exponential doesn’t grow. It oscillates, a sine wave.
That’s weird. I’d like to see why that makes sense.
To say “why” this makes sense is doomed. It’s like explaining “why” 36 is divisible by three and six and nine but not eight. It follows from what the words we have mean. The “why” I’ll offer is reasons why this strange behavior is plausible. It’ll be a mix of deductive reasoning and heuristics. This is a common blend when trying to understand why a result happens, or why we should accept it.
I’ll start with the definition of the logarithm, as used in real analysis. The natural logarithm, if you’re curious. It has a lot of nice properties. You can use this to prove over thirty things. Here it is:

$\log(x) \equiv \int_1^x \frac{1}{s}\,ds$
The “s” is a dummy variable. You’ll never see it in actual use.
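If you’d like to see that definition do something, here’s a little Python sketch (mine, and a crude one): approximate the integral numerically and check it against the library logarithm, including the log-of-a-product property.

```python
import math

def log_by_integral(x, n=100000):
    """Approximate log(x) as the integral from 1 to x of 1/s ds (midpoint rule)."""
    h = (x - 1.0) / n
    return sum(1.0 / (1.0 + (i + 0.5) * h) for i in range(n)) * h

print(log_by_integral(1.0))                 # the log of 1 is 0
print(log_by_integral(2.0), math.log(2.0))  # agrees with the library function
# The log of a product is the sum of the logs:
print(log_by_integral(6.0), log_by_integral(2.0) + log_by_integral(3.0))
```

That last line is two of the “thirty things” at once, checked to six decimal places without any symbol manipulation.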
So now let me summon into existence a new function. I want to call it g. This is because I’ve worked this out before and I want to label something else as f. There is something coming ahead that’s a bit of a syntactic mess. This is the best way around it that I can find.

$g(x) \equiv \frac{1}{c}\int_1^x \frac{1}{s}\,ds = \frac{1}{c}\log(x)$
Here, ‘c’ is a constant. It might be real. It might be imaginary. It might be complex. I’m using ‘c’ rather than ‘a’ or ‘b’ so that I can later on play with possibilities.
So the alert reader noticed that g(x) here means “take the logarithm of x, and divide it by a constant”. So it does. I’ll need two things built off of g(x), though. The first is its derivative. That’s taken with respect to x, the only variable. Finding the derivative of an integral sounds intimidating but, happy to say, we have a theorem to make this easy. It’s the Fundamental Theorem of Calculus, and it tells us:

$g'(x) = \frac{1}{c}\cdot\frac{1}{x}$
We can use the ‘ to denote “first derivative” if a function has only one variable. Saves time to write and is easier to type.
The other thing that I need, and the thing I really want, is the inverse of g. I’m going to call this function f(t). A more common notation would be to write $g^{-1}(t)$, but we already have $g'$ in the works here. There is a limit to how many little one-stroke superscripts we need above g. This is the tradeoff to using ‘ for first derivatives. But here’s the important thing:

$g(f(t)) = t$
Here, we have some extratextual information. We know the inverse of a logarithm is an exponential. We even have a standard notation for that. We’d write

$f(t) = e^{ct}$

in any context besides this essay as I’ve set it up.
What I would like to know next is: what is the derivative of f(t)? This sounds impossible to know, if we’re thinking of “the inverse of this integration”. It’s not. We have the Inverse Function Theorem to come to our aid. We encounter the Inverse Function Theorem briefly, in freshman calculus. There we use it to do as many as two problems and then hide away forever from the Inverse Function Theorem. (This is why it’s not mentioned in my quick little guide to how to take derivatives.) It reappears in real analysis for this sort of contingency. The Inverse Function Theorem tells us, if f is the inverse of g, that:

$f'(t) = \frac{1}{g'(f(t))}$
That g'(f(t)) means, use the rule for g'(x), with f(t) substituted in place of ‘x’. And now we see something magic:

$f'(t) = \frac{1}{\frac{1}{c}\cdot\frac{1}{f(t)}} = c\cdot f(t)$
And that is the wonderful thing about the exponential. Its derivative is a constant times its original value. That alone would make the exponential one of mathematics’ favorite functions. It allows us, for example, to transform differential equations into polynomials. (If you want everlasting fame, albeit among mathematicians, invent a new way to turn differential equations into polynomials.) Because we could turn, say,

$f''(t) + 3 f'(t) + 2 f(t) = 0$

into

$c^2 + 3 c + 2 = 0$

by supposing that f(t) has to be $e^{ct}$ for the correct value of c. Then all you need do is find a value of ‘c’ that makes that last equation true.
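Here’s that trick as a Python sketch, with a made-up example equation (any linear equation with constant coefficients would do): supposing $f(t) = e^{ct}$ turns $f''(t) + 3f'(t) + 2f(t) = 0$ into $c^2 + 3c + 2 = 0$, and each root of that polynomial really does solve the differential equation.

```python
import math

# Made-up example: f'' + 3 f' + 2 f = 0. Supposing f(t) = e^(ct) turns this
# into the polynomial c^2 + 3c + 2 = 0, which factors as (c + 1)(c + 2) = 0.
roots = [-1.0, -2.0]

def residual(c, t=0.7, h=1e-5):
    """Check f(t) = e^(ct) against the differential equation, by finite differences."""
    f = lambda s: math.exp(c * s)
    d1 = (f(t + h) - f(t - h)) / (2.0 * h)             # ~ f'(t)
    d2 = (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)  # ~ f''(t)
    return d2 + 3.0 * d1 + 2.0 * f(t)                  # should be ~ 0

for c in roots:
    print(c, residual(c))
```

The residuals come out at finite-difference noise level, which is the numerical way of saying each $e^{ct}$ solves the equation.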
Supposing that the answer has this convenient form may remind you of searching for the lost keys over here where the light is better. But we find so many keys in this good light. If you carry on in mathematics you will never stop seeing this trick, although it may be disguised.
In part because it’s so easy to work with. In part because exponentials like this cover so much of what we might like to do. Let’s go back to looking at the derivative of the exponential function.
There are many ways to understand what a derivative is. One compelling way is to think of it as the rate of change. If you make a tiny change in t, how big is the change in f(t)? So what is the rate of change here?
We can pose this as a pretend-physics problem. This lets us use our physical intuition to understand things. This also is the transition between careful reasoning and ad-hoc arguments. Imagine a particle that, at time ‘t’, is at the position $f(t) = e^{ct}$. What is its velocity? That’s the first derivative of its position, so, $f'(t) = c\,e^{ct}$.
If we are using our physics intuition to understand this it helps to go all the way. Where is the particle? Can we plot that? … Sure. We’re used to matching real numbers with points on a number line. Go ahead and do that. Not to give away spoilers, but we will want to think about complex numbers too. Mathematicians are used to matching complex numbers with points on the Cartesian plane, though. The real part of the complex number matches the horizontal coordinate. The imaginary part matches the vertical coordinate.
So how is this particle moving?
To say for sure we need some value of t. All right. Pick your favorite number. That’s our t. f(t) follows from whatever your t was. What’s interesting is that the change also depends on c. There’s a couple possibilities. Let me go through them.
First, what if c is zero? Well, then the definition of g(t) was gibberish and we can’t have that. All right.
What if c is a positive real number? Well, then, f'(t) is some positive multiple of whatever f(t) was. The change is “away from zero”. The particle will push away from the origin. As t increases, f(t) increases, so it pushes away faster and faster. This is exponential growth.
What if c is a negative real number? Well, then, f'(t) is some negative multiple of whatever f(t) was. The change is “towards zero”. The particle pulls toward the origin. But the closer it gets the more slowly it approaches. If t is large enough, f(t) will be so tiny that $f'(t)$ is too small to notice. The motion declines into imperceptibility.
What if c is an imaginary number, though?
So let’s suppose that c is equal to some real number b times $\imath$, where $\imath = \sqrt{-1}$.
I need some way to describe what value f(t) has, for whatever your pick of t was. Let me say it’s equal to $\alpha + \beta\imath$, where $\alpha$ and $\beta$ are some real numbers whose values I don’t care about. What’s important here is that $f(t) = \alpha + \beta\imath$.
And, then, what’s the first derivative? The magnitude and direction of motion? That’s easy to calculate; it’ll be $f'(t) = \imath b\, f(t) = -b\beta + b\alpha\imath$. This is an interesting complex number. Do you see what’s interesting about it? I’ll get there next paragraph.
So f(t) matches some point on the Cartesian plane. And f'(t), the direction our particle moves with a small change in t, is another point: plot whatever complex number f'(t) is as another point on the plane. The line segment connecting the origin to f(t) is perpendicular to the one connecting the origin to f'(t). The ‘motion’ of this particle is perpendicular to its position. And it always is. There are several ways to show this. An easy one is to just pick some values for $\alpha$ and $\beta$ and b and try it out. This proof is not rigorous, but it is quick and convincing.
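Here’s that pick-some-values experiment as a few lines of Python (my sketch; any b and any t will do):

```python
import cmath

b = 2.5  # some real number; change it freely
dots = []
for t in (0.0, 0.3, 1.0, 4.0):
    f = cmath.exp(1j * b * t)  # the position, as the complex number alpha + beta i
    fp = 1j * b * f            # the velocity: the derivative is (i b) times the position
    # Dot product of the position vector (alpha, beta) with the velocity vector:
    dots.append(f.real * fp.real + f.imag * fp.imag)

print(dots)  # every entry is (numerically) zero: motion perpendicular to position
```

The dot products all vanish to floating-point precision, whatever b and t you pick, which is the quick-and-convincing version of the perpendicularity claim.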
If your direction of motion is always perpendicular to your position, then what you’re doing is moving in a circle around the origin. This we pick up in physics, but it applies to the pretend-particle moving here. The exponentials of $\imath b t$, for all the possible values of t, will all be points on a locus that’s a circle centered on the origin. The values will look like the cosine of an angle plus $\imath$ times the sine of an angle.
And there, I think, we finally get some justification for the exponential of an imaginary number being a complex number. And for why exponentials might have anything to do with cosines and sines.
You might ask what if c is a complex number, if it’s equal to $a + b\imath$ for some real numbers a and b. In this case, you get spirals as t changes. If a is positive, you get points spiralling outward as t increases. If a is negative, you get points spiralling inward toward zero as t increases. If b is positive the spirals go counterclockwise. If b is negative the spirals go clockwise. $e^{(a + b\imath)t}$ is the same as $e^{at}\cdot e^{b\imath t}$.
This does depend on knowing the exponential of a sum of terms, such as of $(a + b\imath)t$, is equal to the product of the exponentials of those terms. This is a good thing to have in your portfolio. If I remember right, it comes in around the 25th thing. It’s an easy result to have if you already showed something about the logarithms of products.
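Both the spiral behavior and the sum-to-product rule are quick to check numerically. A Python sketch, with arbitrary made-up values of a, b, and t:

```python
import cmath
import math

a, b, t = -0.5, 3.0, 1.7  # arbitrary picks

z = cmath.exp((a + b * 1j) * t)

# Sum-to-product: exp((a + bi) t) equals exp(at) times exp(i b t).
w = math.exp(a * t) * cmath.exp(1j * b * t)
print(abs(z - w))  # as close to zero as floating point allows

# The distance from the origin is exp(at): an inward spiral, since a < 0 here.
print(abs(z), math.exp(a * t))
```

The real part of c controls only the distance from the origin, the imaginary part only the angle, which is the whole spiral story in two lines.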
This was a week of few mathematically-themed comic strips. I don’t mind. If there was a recurring motif, it was about parents not doing mathematics well, or maybe at all. That’s not a very deep observation, though. Let’s look at what is here.
Liniers’s Macanudo for the 18th puts forth 2020 as “the year most kids realized their parents can’t do math”. Which may be so; if you haven’t had cause to do (say) long division in a while then remembering just how to do it is a chore. This trouble is not unique to mathematics, though. Several decades out of regular practice they likely also have trouble remembering what the 11th Amendment to the US Constitution is for, or what the rule is about using “lie” versus “lay”. Some regular practice would correct that, though. In most cases anyway; my experience suggests I cannot possibly learn the rule about “lie” versus “lay”. I’m also shaky on “set” as a verb.
Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 18th shows a mathematician talking, in the jargon of first and second derivatives, to support the claim there’ll never be a mathematician president. Yes, Weinersmith is aware that James Garfield, 20th President of the United States, is famous in trivia circles for having an original proof of the Pythagorean theorem. It would be a stretch to declare Garfield a mathematician, though, except in the way that anyone capable of reason can be a mathematician. Raymond Poincaré, President of France for most of the 1910s and prime minister before and after that, was not a mathematician. He was cousin to Henri Poincaré, who founded so much of our understanding of dynamical systems and of modern geometry. I do not offhand know what presidents (or prime ministers) of other countries have been like.
Weinersmith’s mathematician uses the jargon of the profession. Specifically that of calculus. It’s unlikely to communicate well with the population. The message is an ordinary one, though. The first derivative of something with respect to time means the rate at which things are changing. If the first derivative of a thing with respect to time is positive, the quantity of the thing is growing. So, that first half means “things are getting more bad”.
The second derivative of a thing with respect to time, though … this is interesting. The second derivative is the same thing as the first derivative with respect to time of “the first derivative with respect to time”. It’s what the change is in the rate-of-change. If that second derivative is negative, then the first derivative will, in time, change from being positive to being negative. So the rate of increase of the original thing will, in time, go from a positive to a negative number. And so the quantity will eventually decline.
So the mathematician is making a this-is-the-end-of-the-beginning speech. The point at which the second derivative of a quantity changes sign is known as the “inflection point”. Reaching that is often seen as the first important step in, for example, disease epidemics. It is usually the first good news, the promise that there will be a limit to the badness. It’s also sometimes mentioned in economic crises or sometimes demographic trends. “Inflection point” is likely as technical a term as one can expect the general public to tolerate, though. Even that may be pushing things.
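To make that concrete, here’s a small Python sketch (mine, not Weinersmith’s) using the logistic curve, a standard epidemic-ish shape: the quantity keeps growing the whole time, but past the inflection point the second derivative goes negative and the growth slows.

```python
import math

def logistic(t):
    """A standard S-shaped growth curve; its inflection point is at t = 0."""
    return 1.0 / (1.0 + math.exp(-t))

h = 1e-4
def d1(t):  # first derivative, by finite differences
    return (logistic(t + h) - logistic(t - h)) / (2.0 * h)
def d2(t):  # second derivative, by finite differences
    return (logistic(t + h) - 2.0 * logistic(t) + logistic(t - h)) / (h * h)

print(d1(-2.0), d2(-2.0))  # before the inflection: growing, and growing faster
print(d1(2.0), d2(2.0))    # after it: still growing, but more slowly all the time
```

Both first derivatives are positive, so the thing is always getting bigger; only the sign of the second derivative changes, which is exactly the end-of-the-beginning speech.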
Julie Larson’s The Dinette Set rerun for the 21st fusses around words. Along the way Burl mentions his having learned that two negatives can make a positive, in mathematics. Here it’s (most likely) the way that multiplying or dividing two negative numbers will produce a positive number.
I’m again falling behind the comic strips; I haven’t had the writing time I’d like, and that review of last month’s readership has to go somewhere. So let me try to dig my way back to current. The happy news is I get to do one of those single-day Reading the Comics posts, nearly.
Harley Schwadron’s 9 to 5 for the 7th strongly implies that the kid wearing a lemon juicer for his hat has nearly flunked arithmetic. At the least it’s mathematics symbols used to establish this is a school.
Jef Mallett’s Frazz for the 7th has kids thinking about numbers whose (English) names rhyme. And that there are surprisingly few of them, considering that at least the smaller whole numbers are some of the most commonly used words in the language. It would be interesting if there’s some deeper reason that they don’t happen to rhyme, but I would expect that it’s just, well, why should the names of 6 and 8 (say) have anything to do with each other?
There are, arguably, gaps in Evan and Kevyn’s reasoning, and on the 8th one of the other kids brings them up. Basically, is there any reason to say that thirteen and nineteen don’t rhyme? Or that twenty-one and forty-one don’t? Evan writes this off as pedantry. But I, admittedly inclined to be a pedant, think there’s a fair question here. How many numbers do we have names for? Is there something different between the name we have for 11 and the name we have for 1100? Or 2011?
There isn’t an objectively right or wrong answer; at most there are answers that are more or less logically consistent, or that are more or less convenient. Finding what those differences are can be interesting, and I think it bad faith to shut down the argument as “pedantry”.
Dave Whamond’s Reality Check for the 7th claims “birds aren’t partial to fractions” and shows a bird working out, partially with diagrams, the saying about birds in the hand and what they’re worth in the bush.
The narration box, phrasing the bird as not being “partial to fractions”, intrigues me. I don’t know if the choice is coincidental on Whamond’s part. But there is something called “partial fractions” that you get to learn painfully well in Calculus II. It’s used in integrating functions. It turns out that you often can turn a “rational function”, one whose rule is one polynomial divided by another, into the sum of simpler fractions. The point of that is making the fractions into things easier to integrate. The technique is clever, but it’s hard to learn. And, I must admit, I’m not sure I’ve ever used it to solve a problem of interest to me. But it’s very testable stuff.
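For a taste of what partial fractions actually do, here’s a standard textbook case (not one from the strip), checked numerically in Python:

```python
# Partial fractions: 1/(x^2 - 1) = (1/2)/(x - 1) - (1/2)/(x + 1).
# Each piece on the right is easy to integrate; the left side is not.
def original(x):
    return 1.0 / (x * x - 1.0)

def decomposed(x):
    return 0.5 / (x - 1.0) - 0.5 / (x + 1.0)

# They agree anywhere both are defined:
for x in (2.0, 3.5, -4.0, 0.25):
    print(x, original(x), decomposed(x))
```

Integrating the decomposed form is just two logarithms, which is the whole point of the technique.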
Today’s A To Z term was suggested by Dina Yagodich, whose YouTube channel features many topics, including calculus and differential equations, statistics, discrete math, and Matlab. Matlab is especially valuable to know as a good quick calculation can answer many questions.
The Wallis named here is John Wallis, an English clergyman and mathematician and cryptographer. His most tweetable work is how we follow his lead in using the symbol ∞ to represent infinity. But he did much in calculus. And it’s a piece of that which brings us to today. He particularly noticed this:

$\frac{\pi}{2} = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdot\frac{6}{7}\cdot\frac{8}{7}\cdot\frac{8}{9}\cdots$
This is an infinite product. It’s multiplication’s answer to the infinite series. It always amazes me when an infinite product works. There are dangers when you do anything with an infinite number of terms. Even the basics of arithmetic, like that you can change the order in which you calculate but still get the same result, break down. Series, in which you add together infinitely many things, are risky, but I’m comfortable with the rules to know when the sum can be trusted. Infinite products seem more mysterious. Then you learn an infinite product converges if and only if the series made from the logarithms of the terms in it also converges. Then infinite products seem less exciting.
There are many infinite products that give us π. Some work quite efficiently, giving us lots of digits for a few terms’ work. Wallis’s formula does not. We need about a thousand terms for it to get us a π of about 3.141. This is a bit much to calculate even today. In 1656, when he published it in Arithmetica Infinitorum, a book I have never read? Wallis was able to do mental arithmetic well. His biography at St Andrews says once when having trouble sleeping he calculated the square root of a 53-digit number in his head, and in the morning, remembered it, and was right. Still, this would be a lot of work. How could Wallis possibly do it? And what work could possibly convince anyone else that he was right?
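You can watch the product crawl toward π in a few lines of Python; this sketch bears out the thousand-terms figure:

```python
import math

def wallis(n_pairs):
    """Partial Wallis product: (2/1)(2/3) (4/3)(4/5) (6/5)(6/7) ..., doubled."""
    product = 1.0
    for k in range(1, n_pairs + 1):
        product *= (2.0 * k) / (2.0 * k - 1.0)
        product *= (2.0 * k) / (2.0 * k + 1.0)
    return 2.0 * product

print(wallis(10))    # still far off
print(wallis(1000))  # about 3.1408: three decimals, a thousand terms in
print(math.pi)
```

Ten terms gets barely one correct digit; a thousand gets about three. Efficient it is not.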
As is common with striking discoveries it was a mixture of insight and luck and persistence and pattern recognition. He seems to have started with pondering the value of

$\int_0^1 \left(1 - x^2\right)^{\frac{1}{2}}\,dx$
Happily, he knew exactly what this was: $\frac{\pi}{4}$. He knew this because of a bit of insight. We can interpret the integral here as asking for the area that’s enclosed, on a Cartesian coordinate system, by the positive x-axis, the positive y-axis, and the set of points which makes true the equation $y = \left(1 - x^2\right)^{\frac{1}{2}}$. This curve is the upper half of a circle with radius 1 and centered on the origin. The area enclosed by all this is one-fourth the area of a circle of radius 1. So that’s how he could know the value of the integral, without doing any symbol manipulation.
The question, in modern notation, would be whether he could do that integral. And, for this? He couldn’t. But, unable to do the problem he wanted, he tried doing the most similar problem he could and see what that proved. $\int_0^1 \left(1 - x^2\right)^{\frac{1}{2}}\,dx$ was beyond his power to integrate; but what if he swapped those exponents? Worked on $\int_0^1 \left(1 - x^{\frac{1}{2}}\right)^2\,dx$ instead? This would not — could not — give him what he was interested in. But it would give him something he could calculate. So can we:

$\int_0^1 \left(1 - x^{\frac{1}{2}}\right)^2\,dx = \int_0^1 1 - 2x^{\frac{1}{2}} + x\,dx = 1 - \frac{4}{3} + \frac{1}{2} = \frac{1}{6}$
And now here comes persistence. What if it’s not $x^{\frac{1}{2}}$ inside the parentheses there? If it’s x raised to some other unit fraction instead? What if the parentheses aren’t raised to the second power, but to some other whole number? Might that reveal something useful? Each of these integrals is calculable, and he calculated them. He worked out a table for many values of

$\int_0^1 \left(1 - x^{\frac{1}{p}}\right)^q\,dx$
for different sets of whole numbers p and q. He trusted that if he kept this up, he’d find some interesting pattern. And he does. The integral, for example, always turns out to be a unit fraction. And there’s a deeper pattern. Let me share results for different values of p and q; the integral is the reciprocal of the number inside the table. The topmost row is values of q; the leftmost column is values of p.

    p \ q:   0    1    2    3    4
    p = 0:   1    1    1    1    1
    p = 1:   1    2    3    4    5
    p = 2:   1    3    6   10   15
    p = 3:   1    4   10   20   35
    p = 4:   1    5   15   35   70
There is a deep pattern here, although I’m not sure Wallis noticed that one. Look along the diagonals, running from lower-left to upper-right. These are the coefficients of the binomial expansion. Yang Hui’s triangle, if you prefer. Pascal’s triangle, if you prefer that. Let me call the term in row p, column q of this table $w(p, q)$. Then

$w(p, q) = \frac{(p + q)!}{p!\,q!}$
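The table is easy to rebuild numerically with tools Wallis didn’t have. A Python sketch: integrate $\left(1 - x^{1/p}\right)^q$ crudely and compare the reciprocal against the binomial coefficient.

```python
import math

def wallis_integral(p, q, n=50000):
    """Midpoint-rule estimate of the integral from 0 to 1 of (1 - x^(1/p))^q."""
    h = 1.0 / n
    return sum((1.0 - ((i + 0.5) * h) ** (1.0 / p)) ** q for i in range(n)) * h

# The reciprocal of each integral should be the binomial coefficient (p + q choose q):
for p in range(1, 5):
    for q in range(1, 5):
        print(p, q, 1.0 / wallis_integral(p, q), math.comb(p + q, q))
```

Every reciprocal rounds to the matching entry of the table, which is the pattern Wallis spotted by hand.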
Great material, anyway. The trouble is that it doesn’t help Wallis with the original problem, which — in this notation — would have $p = \frac{1}{2}$ and $q = \frac{1}{2}$. What he really wanted was the Binomial Theorem, but western mathematicians didn’t know it yet. Here a bit of luck comes in. He had noticed there’s a relationship between terms in one column and terms in another, particularly, that

$w(p, q) = \frac{p + q}{q}\cdot w(p, q - 1)$
So why shouldn’t that hold if p and q aren’t whole numbers? … We would today ask why it should hold. But Wallis was working with a different idea of mathematical rigor. He made assumptions that, it turned out, were in this case correct. Of course, had he been wrong, we wouldn’t have heard of any of this and I would have an essay on some other topic.
With luck in Wallis’s favor we can go back to making a table. What would the row for $p = \frac{1}{2}$ look like? We’ll need both whole and half-integer values of q. $w\left(\frac{1}{2}, 0\right)$ is easy; its reciprocal is 1. $w\left(\frac{1}{2}, \frac{1}{2}\right)$ is also easy; that’s the insight Wallis had to start with. Its reciprocal is $\frac{\pi}{4}$. What about the rest? Use the equation just up above, relating $w(p, q)$ to $w(p, q - 1)$; then we can start to fill in:

    q:          0    1/2     1      3/2      2      5/2      3
    w(1/2, q):  1    4/π    3/2   16/(3π)  15/8   32/(5π)  35/16
Anything we can learn from this? … Well, sure. For one, as we go left to right, all these entries are increasing. So, like, the second column is less than the third which is less than the fourth. Here’s a triple inequality for you:

$\frac{4}{\pi} < \frac{3}{2} < \frac{16}{3\pi}$
Multiply all that through by, oh, $\pi$. And then divide it all through by 3. What have we got?

$\frac{4}{3} < \frac{\pi}{2} < \frac{4}{3}\cdot\frac{4}{3}$
I did some rearranging of terms, but, that’s the pattern. One-half π has to be between $\frac{4}{3}$ and four-thirds that.
Move over a little. Start from the entries where $q = 1$. This starts us out with

$\frac{3}{2} < \frac{16}{3\pi} < \frac{15}{8}$
Multiply everything by $3\pi$, and divide everything by 16, and follow with some symbol manipulation. And here’s a tip which would have saved me some frustration working out my notes: 4 equals 2 times 2. Also, 6 equals 2 times 3. Later on, you may want to remember that 8 equals 2 times 4. All this gets us eventually to

$\frac{2\cdot 2\cdot 4\cdot 4}{1\cdot 3\cdot 3\cdot 5} < \frac{\pi}{2} < \frac{2\cdot 2\cdot 4\cdot 4}{1\cdot 3\cdot 3\cdot 5}\cdot\frac{5}{4}$
Move over to the next terms, starting from $q = 2$. This will get us eventually to

$\frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6}{1\cdot 3\cdot 3\cdot 5\cdot 5\cdot 7} < \frac{\pi}{2} < \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6}{1\cdot 3\cdot 3\cdot 5\cdot 5\cdot 7}\cdot\frac{7}{6}$
You see the pattern here. Whatever the value of $\frac{\pi}{2}$, it’s squeezed between some number, on the left side of this triple inequality, and that same number times … uh … something like $\frac{4}{3}$ or $\frac{5}{4}$ or $\frac{6}{5}$ or, eventually, $\frac{101}{100}$. That last one is a number very close to 1. So the conclusion is that $\frac{\pi}{2}$ has to equal whatever that pattern is making for the number on the left there.
We can make this more rigorous. Like, we don’t have to just talk about squeezing the number we want between two nearly-equal values. We can rely on the use of the … Squeeze Theorem … to prove this is okay. And there’s much we have to straighten out. Particularly, we really don’t want to write out expressions like

$\frac{\pi}{2} = \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\cdot 8\cdot 8\cdots}{1\cdot 3\cdot 3\cdot 5\cdot 5\cdot 7\cdot 7\cdot 9\cdots}$
Put that way, it looks like, well, we can divide each 3 in the denominator into a 6 in the numerator to get a 2, each 5 in the denominator to a 10 in the numerator to get a 2, and so on. We get a product that’s infinitely large, instead of anything to do with π. This is that problem where arithmetic on infinitely long strings of things becomes dangerous. To be rigorous, we need to write this product as the limit of a sequence, with finite numerator and denominator, and be careful about how to compose the numerators and denominators.
But this is all right. Wallis found a lovely result and in a way that’s common to much work in mathematics. It used a combination of insight and persistence, with pattern recognition and luck making a great difference. Often when we first find something the proof of it is rough, and we need considerable work to make it rigorous. The path that got Wallis to these products is one we still walk.
These are named for George Green, an English mathematician of the early 19th century. He’s one of those people who gave us our idea of mathematical physics. He’s credited with coining the term “potential”, as in potential energy, and in making people realize how studying this simplified problems. Mostly problems in electricity and magnetism, which were so very interesting back then. On the side also came work in multivariable calculus. His work most famous to mathematics and physics majors connects integrals over the surface of a shape with (different) integrals over the entire interior volume. In more specific problems, he did work on the behavior of water in canals.
There’s a patch of (high school) algebra where you solve systems of equations in a couple variables. Like, you have to do one system where you’re solving, say,

$x + 2y = 3$
$3x - y = 4$

And then maybe later on you get a different problem, one that looks like:

$x + 2y = 7$
$3x - y = -10$
If you solve both of them you notice you’re doing a lot of the same work. All the same hard work. It’s only the parts on the right-hand side of the equals signs that are different. Even then, the series of steps you follow on the right-hand side are the same. They have different numbers is all. What makes the problem distinct is the stuff on the left-hand side. It’s the set of what coefficients times what variables add together. If you get enough about matrices and vectors you get in the habit of writing this set of equations as one matrix equation, as

$A\vec{x} = \vec{b}$
Here $\vec{x}$ holds all the unknown variables, your x and y and z and anything else that turns up. Your $\vec{b}$ holds the right-hand side. Do enough of these problems and you notice something. You can describe how to find the solution for these equations before you even know what the right-hand side is. You can do all the hard work of solving this set of equations for a generic set of right-hand-side constants. Fill them in when you need a particular answer.
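In code, that do-the-hard-work-once idea looks like factoring the problem before seeing any right-hand side. A Python sketch with a made-up 2×2 system (real work would use a proper linear-algebra library):

```python
def factor_2x2(a, b, c, d):
    """Do the right-hand-side-independent work for the system
       a x + b y = r1
       c x + d y = r2
    once, and hand back a solver to reuse on any right-hand side."""
    det = a * d - b * c
    assert det != 0.0, "singular system"
    def solver(r1, r2):
        x = (r1 * d - b * r2) / det
        y = (a * r2 - r1 * c) / det
        return x, y
    return solver

solver = factor_2x2(1.0, 2.0, 3.0, -1.0)  # x + 2y = ..., 3x - y = ...
print(solver(3.0, 4.0))    # one right-hand side...
print(solver(7.0, -10.0))  # ...and another, with no new hard work
```

The determinant, the expensive b-independent part here, is computed once; each new right-hand side is then just a couple of multiplications.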
I mentioned, while writing about Fourier series, how it turns out most of what you do to numbers you can also do to functions. This really proves itself in differential equations. Both partial and ordinary differential equations. A differential equation works with some not-yet-known function u(x). For what I’m discussing here it doesn’t matter whether ‘x’ is a single variable or a whole set of independent variables, like, x and y and z. I’ll use ‘x’ as shorthand for all that. The differential equation takes u(x) and maybe multiplies it by something, and adds to that some derivatives of u(x) multiplied by something. Those somethings can be constants. They can be other, known, functions with independent variable x. They can be functions that depend on u(x) also. But if they are, then this is a nonlinear differential equation and there’s no solving that.
So suppose we have a linear differential equation. Partial or ordinary, whatever you like. There’s terms that have u(x) or its derivatives in them. Move them all to the left-hand-side. Move everything else to the right-hand-side. This right-hand-side might be constant. It might depend on x. Doesn’t matter. This right-hand-side is some function which I’ll call f(x). This f(x) might be constant; that’s fine. That’s still a legitimate function.
Put this way, every differential equation looks like:

(the stuff with u(x) and its derivatives) = f(x)
That stuff with u(x) and its derivatives we can call an operator. An operator’s a function which has a domain of functions and a range of functions. So we can give that a name. ‘L’ is a good name here, because if it’s not the operator for a linear differential equation — a linear operator — then we’re done anyway. So whatever our differential equation was, we can write it:

$Lu = f$
Writing it makes it look like we’re multiplying L by u(x). We’re not. We’re really not. This is more like if ‘L’ is the predicate of a sentence and ‘u(x)’ is the object. Read it like, to make up an example, ‘L’ means ‘three times the second derivative plus two x times’ and ‘u(x)’ as ‘u(x)’.
Still, looking at $Lu = f$ and then back up at $A\vec{x} = \vec{b}$ tells you what I’m thinking. We can find some set of instructions to, for any $\vec{b}$, find the $\vec{x}$ that makes $A\vec{x} = \vec{b}$ true. So why can’t we find some set of instructions to, for any $f$, find the $u$ that makes $Lu = f$ true?
This is where a Green’s function comes in. Or, like everybody says, “the” Green’s function. “The” here we use like we might talk about “the” roots of a polynomial. Every polynomial has different roots. So, too, does every differential equation have a different Green’s function. What the Green’s function is depends on the equation. It can also depend on what domain the differential equation applies to. It can also depend on some extra information called initial values or boundary values.
The Green’s function for a differential equation has twice as many independent variables as the differential equation has. This seems like we’re making a mess of things. It’s all right. These new variables are the falsework, the scaffolding. Once they’ve helped us get our work done they disappear. This kind of thing we call a “dummy variable”. If x is the actual independent variable, then pick something else — s is a good choice — for the dummy variable. It’s from the same domain as the original x, though. So the Green’s function is some $G(x, s)$. All right, but how do you find it?
To get this, you have to solve a particular special case of the differential equation. You have to solve:

$L G(x, s) = \delta(x - s)$
This may look like we’re not getting anywhere. It may even look like we’re getting in more trouble. What is this $\delta(x - s)$, for example? Well, this is a particular and famous thing called the Dirac delta function. It’s called a function as a courtesy to our mathematical physics friends, who don’t care about whether it truly is a function. Dirac is Paul Dirac, from over in physics. The one whose biography is called The Strangest Man. His delta function is a strange function. Let me say that its independent variable is t. Then $\delta(t)$ is zero, unless t is itself zero. If t is zero then $\delta(t)$ is … something. What is that something? … Oh … something big. It’s … just … don’t look directly at it. What’s important is the integral of this function:

$\int \delta(t)\,dt = 1$
I write it this way because there’s delta functions for two-dimensional spaces, three-dimensional spaces, everything. If you integrate over a region that includes the origin, the integral of the delta function is 1. If you integrate over a region that doesn’t, the integral of the delta function is 0.
The delta function has a neat property sometimes called filtering. This is what happens if you integrate some function times the Dirac delta function. Then …

$\int f(t)\,\delta(t)\,dt = f(0)$
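The filtering property is fun to watch emerge. A Python sketch, standing a tall thin box in for the delta function (the true delta isn’t any of these spikes, but they approach it):

```python
import math

def spike(t, eps):
    """A box of width eps and height 1/eps: area 1, a stand-in for the delta."""
    return 1.0 / eps if abs(t) < eps / 2.0 else 0.0

def f(t):
    return math.cos(t) + t ** 3  # any reasonably smooth function

def filtered(eps, n=200000):
    """Integrate f(t) * spike(t, eps) over [-1, 1] by the midpoint rule."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        t = -1.0 + (i + 0.5) * h
        total += f(t) * spike(t, eps)
    return total * h

results = [filtered(eps) for eps in (0.5, 0.1, 0.001)]
print(results)  # marching toward f(0) = 1 as the spike narrows
```

As the box narrows the integral homes in on f(0), which is the filtering property doing its thing.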
This may look dumb. That’s fine. This scheme is so good at getting rid of integrals where you don’t want them. Or at getting integrals in where it’d be convenient to have.
So, I have a mental model of what the Dirac delta function does. It might help you. Think of beating a drum. It can sound like many different things. It depends on how hard you hit it, how fast you hit it, what kind of stick you use, where exactly you hit it. I think of each differential equation as a different drumhead. The Green’s function is then the sound of a specific, uniform, reference hit at a reference position. This produces a sound. I can use that sound to extrapolate how every different sort of drumming would sound on this particular drumhead.
So solving this one differential equation, to find the Green’s function for a particular case, may be hard. Maybe not. Often it’s easier than some particular f(x) because the Dirac delta function is so weird that it becomes kinda easy-ish. But you do have to find one solution to this differential equation, somehow.
Once you do, though? Once you have this $G(x, s)$? That is glorious. Because then, whatever your f is? The solution to $Lu(x) = f(x)$ is:

$u(x) = \int G(x, s) \, f(s) \,\mathrm{d}s$
Here the integral is over whatever the domain of the differential equation is, and whatever the domain of f is. This last integral is where the dummy variable finally evaporates. All that remains is x, as we want.
A little bit of … arithmetic isn’t the right word. But symbol manipulation will convince you this is right, if you need convincing. (The trick is remembering that ‘x’ and ‘s’ are different variables. When you differentiate with respect to ‘x’, ‘s’ acts like a constant. When you integrate with respect to ‘s’, ‘x’ acts like a constant.)
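Here’s a Python sketch of that final integral at work. I’m using the simplest textbook case, my choice rather than anything from above: the operator u″(x) with boundary conditions u(0) = u(1) = 0, whose Green’s function is known in closed form:

```python
def green(x, s):
    """Green's function for u''(x) = delta(x - s), with u(0) = u(1) = 0."""
    return x * (s - 1.0) if x <= s else s * (x - 1.0)

def solve(f, x, n=2000):
    """u(x) as the integral over s of G(x, s) f(s), by the trapezoid rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n + 1):
        s = i * h
        w = h if 0 < i < n else h / 2  # trapezoid endpoint weights
        total += w * green(x, s) * f(s)
    return total

# For f(s) = 1 the exact solution of u'' = 1, u(0) = u(1) = 0 is x(x - 1)/2,
# so u(1/2) should come out to -1/8.
u_half = solve(lambda s: 1.0, 0.5)
```

The same `green` then solves u″ = f for any other f; only the integral changes. That is the promise the section above makes.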
What can make a Green’s function worth finding is that we do a lot of the same kinds of differential equations. We do a lot of diffusion problems. A lot of wave transmission problems. A lot of wave-transmission-with-losses problems. So there are many problems that can all be solved with the same tools.
Consider remote detection problems. This can include things like finding things underground. It also includes, like, medical sensors. We would like to know “what kind of thing produces a signal like this?” We can detect the signal easily enough. We can model how whatever it is between the thing and our sensors changes what we could detect. (This kind of thing we call an “inverse problem”, finding the thing that could produce what we know.) Green’s functions are one of the ways we can get at the source of what we can see.
Now, Green’s functions are a powerful and useful idea. They sprawl over a lot of mathematical applications. As they do, they pick up regional dialects. Things like deciding that $LG(x, s) = -\delta(x - s)$, with a minus sign, for example. None of these are significant differences. But before you go poking into someone else’s field and solving their problems, take a moment. Double-check that their symbols do mean precisely what you think they mean. It’ll save you some petty quarrels.
Also, I really don’t like how those systems of equations turned out up at the top of this essay. But I couldn’t work out how to do arrays of equations all lined up along the equals sign, or other mildly advanced LaTeX stuff like doing a function-definition-by-cases. If someone knows of the Real Official Proper List of what you can and can’t do with the LaTeX that comes from a standard free WordPress.com blog I’d appreciate a heads-up. Thank you.
The Extreme Value Theorem, which I chose to write about, is a fundamental bit of analysis. There is also a similarly-named but completely unrelated Extreme Value Theory. This exists in the world of statistics. That’s about outliers, and about how likely it is you’ll find an even more extreme outlier if you continue sampling. This is valuable in risk assessment: put another way, it’s the question of what neighborhoods you expect to flood based on how the river’s overflowed the last hundred years. Or be in a wildfire, or be hit by a major earthquake, or whatever. The more I think about it the more I realize that’s worth discussing too. Maybe in the new year, if I decide to do some A To Z extras.
And then there are theorems that seem the opposite. Ones that seem so obvious, and so obviously true, that they hardly seem like mathematics. If they’re not axioms, they might as well be. The extreme value theorem is one of these.
It’s a theorem about functions. Here, functions that have a domain and a range that are both real numbers. Even more specifically, about continuous functions. “Continuous” is a tricky idea to make precise, but we don’t have to do it. A century of mathematicians worked out meanings that correspond pretty well to what you’d imagine it should mean. It means you can draw a graph representing the function without lifting the pen. (Do not attempt to use this definition at your thesis defense. I’m skipping a century’s worth of hard thinking about the subject.)
And it’s a theorem about “extreme” values. “Extreme” is a convenient word. It means “maximum or minimum”. We’re often interested in the greatest or least value of a function. Having a scheme to find the maximum is as good as having one to find a minimum. So there’s little point talking about them as separate things. But that forces us to use a bunch of syllables. Or to adopt a convention that “by maximum we always mean maximum or minimum”. We could say we mean that, but I’ll bet a good number of mathematicians, and 95% of mathematics students, would forget the “or minimum” within ten minutes. “Extreme”, then. It’s short and punchy and doesn’t commit us to a maximum or a minimum. It’s simply the most outstanding value we can find.
The Extreme Value Theorem doesn’t help us find them. It only proves to us there is an extreme to find. Particularly, it says that if a continuous function has a domain that’s a closed interval, then it has to have a maximum and a minimum. And it has to attain the maximum and the minimum at least once each. That is, something in the domain matches to the maximum. And something in the domain matches to the minimum. Could be multiple times, yes.
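The theorem guarantees the extremes exist; a computer can then hunt for them by brute force. A quick Python sketch, with an example function of my own choosing:

```python
def extremes(f, a, b, n=100001):
    """Sample a continuous f on the closed interval [a, b] and report the
    largest and smallest values found."""
    h = (b - a) / (n - 1)
    values = [f(a + i * h) for i in range(n)]
    return max(values), min(values)

# f(x) = x^3 - x on [-2, 2]: the endpoints win, with maximum 6 and minimum -6.
hi, lo = extremes(lambda x: x**3 - x, -2.0, 2.0)
```

Note the closed interval matters: on the open interval (-2, 2) the function never actually attains a largest value, and the theorem makes no promise.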
This might not seem like much of a theorem. Existence proofs rarely do. It’s a bias, I suppose. We like to think we’re out looking for solutions. So we suppose there’s a solution to find. Checking that there is an answer before we start looking? That seems excessive. Before heading to the airport we might check the flight wasn’t delayed. But we almost never check that there is still a Newark to fly to. I’m not sure, in working out problems, that we check it explicitly. We decide early on that we’re working with continuous functions and so we can try out the usual approaches. That we use the theorem becomes invisible.
And that’s sort of the history of this theorem. The Extreme Value Theorem, for example, is part of how we now prove Rolle’s Theorem. Rolle’s theorem is about functions continuous and differentiable on the interval from a to b. And functions that have the same value for a and for b. The conclusion is the function has got a local maximum or minimum in-between these. It’s the theorem depicted in that xkcd comic you maybe didn’t check out a few paragraphs ago. Rolle’s Theorem is named for Michel Rolle, who proved the theorem (for polynomials) in 1691. The Indian mathematician Bhaskara II, in the 12th century, stated the theorem too. (I’m so ignorant of the Indian mathematical tradition that I don’t know whether Bhaskara II stated it for polynomials, or for functions in general, or how it was proved.)
The Extreme Value Theorem was proven around 1860. (There was an earlier proof, by Bernard Bolzano, whose name you’ll find all over talk about limits and functions and continuity and all. But that was unpublished until 1930. The proofs known at the time were done by Karl Weierstrass. His is the other name you’ll find all over talk about limits and functions and continuity and all. Go on, now, guess who proved the Extreme Value Theorem. And guess what theorem, bearing the names of two important 19th-century mathematicians, is at the core of proving that. You need at most two chances!) That is, mathematicians were comfortable using the theorem before it had a clear identity.
Once you know that it’s there, though, the Extreme Value Theorem’s a great one. It’s useful. Rolle’s Theorem I just went through. There’s also the quite similar Mean Value Theorem. This one is about functions continuous and differentiable on an interval. It tells us there’s at least one point where the derivative is equal to the mean slope of the function on that interval. This is another theorem that’s a quick proof once you have the Extreme Value Theorem. Or we can get more esoteric. There’s a technique known as Lagrange Multipliers. It’s a way to find where on a constrained surface a function is at its maximum or minimum. It’s a clever technique, one that I needed time to accept as a thing that could possibly work. And why should it work? Go ahead, guess what the centerpiece of at least one method of proving it is.
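The Mean Value Theorem’s claim is easy to check numerically. This Python sketch bisects for the promised point, assuming (as holds in my example) that the derivative-minus-slope changes sign on the interval:

```python
def mvt_point(f, a, b, tol=1e-10):
    """Find a c in (a, b) where f'(c) equals the mean slope of f on [a, b].
    The derivative is estimated by a central difference."""
    slope = (f(b) - f(a)) / (b - a)
    def g(x, h=1e-6):
        return (f(x + h) - f(x - h)) / (2 * h) - slope
    lo, hi = a + 1e-9, b - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:  # keep the half-interval with the sign change
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# f(x) = x^2 on [0, 2]: the mean slope is 2, and f'(c) = 2c = 2 exactly at c = 1.
c = mvt_point(lambda x: x * x, 0.0, 2.0)
```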
Step back from calculus and into real analysis. That’s the study of why calculus works, and how real numbers work. The Extreme Value Theorem turns up again and again. Like, one technique for defining the integral itself is to approximate a function with a “stepwise” function. This is one that looks like a pixellated, rectangular approximation of the function. The definition depends on having a stepwise rectangular approximation that’s as close as you can get to a function while always staying less than it. And another stepwise rectangular approximation that’s as close as you can get while always staying greater than it.
And then other results. Often in real analysis we want to know about whether sets are closed and bounded. The Extreme Value Theorem has a neat corollary. Start with a continuous function with domain that’s a closed and bounded interval. Then, this theorem demonstrates, the range is also a closed and bounded interval. I know this sounds like a technical point. But it is the sort of technical point that makes life easier.
The Extreme Value Theorem even takes on meaning when we don’t look at real numbers. We can rewrite it in topological spaces. These are sets of points for which we have an idea of a “neighborhood” of points. We don’t demand that we know what distance is exactly, though. What had been a closed and bounded interval becomes a mathematical construct called a “compact set”. The idea of a continuous function changes into one about the pre-image of an open set being another open set. And there is still something recognizably the Extreme Value Theorem. It tells us about things called the supremum and infimum, which are slightly different from the maximum and minimum. Just enough to confuse the student taking real analysis the first time through.
Topological spaces are an abstracted concept. Real numbers are topological spaces, yes. But many other things also are. Neighborhoods and compact sets and open sets are also abstracted concepts. And so this theorem has its same quiet utility in these many spaces. It’s just there quietly supporting more challenging work.
I liked that episode. I’ve got happy memories of the time when I first saw it. I thought the sketch in which Crow T Robot got so volume-obsessed was goofy and dumb in the fun-nerd way.
I accept Mr Kassinger’s challenge only I’m going to take it seriously.
How big is a thing?
There is a legend about Thomas Edison. He was unimpressed with a new hire. So he hazed the college-trained engineer who deeply knew calculus. He demanded the engineer tell him the volume within a light bulb. The engineer went to work, making measurements of the shape of the bulb’s outside. And then started the calculations. This involves a calculus technique called “volumes of rotation”. This can tell the volume within a rotationally symmetric shape. It’s tedious, especially if the outer edge isn’t some special nice shape. Edison, fed up, took the bulb, filled it with water, poured that out into a graduated cylinder and said that was the answer.
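The technique the legend’s engineer reached for fits in a few lines of Python. Given the profile radius r(x) of a rotationally symmetric shape, the disc method says the volume is π times the integral of r(x)²:

```python
import math

def volume_of_rotation(radius, a, b, n=100000):
    """V = pi * integral of radius(x)^2 dx, by the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += radius(x) ** 2 * h
    return math.pi * total

# Sanity check on a shape we know: rotating y = sqrt(1 - x^2) about the
# x-axis sweeps out the unit sphere, volume 4*pi/3, about 4.18879.
v = volume_of_rotation(lambda x: math.sqrt(max(0.0, 1.0 - x * x)), -1.0, 1.0)
```

The tedium Edison mocked lives in measuring r(x) for a real bulb, not in the integral itself.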
I’m skeptical of legends. I’m skeptical of stories about the foolish intellectual upstaged by the practical man-of-action. And I’m skeptical of Edison because, jeez, I’ve read biographies of the man. Even the fawning ones make him out to be yeesh.
But the legend’s Edison had a point. If the volume of a shape is not how much stuff fits inside the shape, what is it? And maybe some object has too complicated a shape to find its volume. Can we think of a way to produce something with the same volume, but that is easier? Sometimes we can. When we do this with straightedge and compass, the way the Ancient Greeks found so classy, we call this “quadrature”. It’s called quadrature from its application in two dimensions. It finds, for a shape, a square with the same area. For a three-dimensional object, we find a cube with the same volume. Cubes are easy to understand.
Straightedge and compass can’t do everything. Indeed, there’s so much they can’t do. Some of it is stuff you’d think it should be able to, like, find a cube with the same volume as a sphere. Integration gives us a mathematical tool for describing how much stuff is inside a shape. It’s even got a beautiful shorthand expression. Suppose that D is the shape. Then its volume V is:

$V = \int_D \mathrm{d}V$
Here “dV” is the “volume form”, a description of how the coordinates we describe a space in relate to the volume. The $\int$ is jargon, meaning, “integrate over the whole volume”. The subscript “D” modifies that phrase by adding “of D” to it. Writing “D” is shorthand for “these are all the points inside this shape, in whatever coordinate system you use”. If we didn’t do that we’d have to say, at each integral sign, what points are inside the shape, coordinate by coordinate. At this level the equation doesn’t offer much help. It says the volume is the sum of infinitely many, infinitely tiny pieces of volume. True, but that doesn’t give much guidance about whether it’s more or less than two cups of water. We need more specific formulas, usually. We need to pick coordinates, for example, and say what coordinates are inside the shape. A lot of the resulting formulas can’t be integrated exactly. Like, an ellipsoid? Maybe you can integrate that. Don’t try without getting hazard pay.
We can approximate this integral. Pick a tiny shape whose volume is easy to know. Fill your shape with duplicates of it. Count the duplicates. Multiply that count by the volume of this tiny shape. Done. This is numerical integration, sometimes called “numerical quadrature”. If we’re being generous, we can say the legendary Edison did this, using water molecules as the tiny shape. And working so that he didn’t need to know the exact count or the volume of individual molecules. Good computational technique.
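That counting scheme is a few lines of Python. Here the tiny shape is a cube, and a cube counts if its center lands inside the target shape; the names and the unit-ball example are mine:

```python
def count_volume(inside, lo, hi, n=100):
    """Fill the bounding box [lo, hi]^3 with n^3 tiny cubes and add up the
    volumes of the ones whose centers land inside the shape."""
    h = (hi - lo) / n
    count = 0
    for i in range(n):
        x = lo + (i + 0.5) * h
        for j in range(n):
            y = lo + (j + 0.5) * h
            for k in range(n):
                z = lo + (k + 0.5) * h
                if inside(x, y, z):
                    count += 1
    return count * h ** 3

# The unit ball x^2 + y^2 + z^2 <= 1 has volume 4*pi/3, about 4.18879.
v = count_volume(lambda x, y, z: x * x + y * y + z * z <= 1.0, -1.0, 1.0)
```

Shrink the cubes (raise n) and the count times the cube volume settles toward the true volume.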
It’s hard not to feel we’re begging the question, though. We want the volume of something. So we need the volume of something else. Where does that volume come from?
Well, where does an inch come from? Or a centimeter? Whatever unit you use? You pick something to use as reference. Any old thing will do. Which is why you get fascinating stories about choosing what to use. And bitter arguments about which of several alternatives to use. And we express the length of something as some multiple of this reference length.
Volume works the same way. Pick a reference volume, something that can be one unit-of-volume. Other volumes are some multiple of that unit-of-volume. Possibly a fraction of that unit-of-volume.
Usually we use a reference volume that’s based on the reference length. Typically, we imagine a cube that’s one unit of length on each side. The volume of this cube with sides of length 1 unit-of-length is then 1 unit-of-volume. This seems all nice and orderly and it’s surely not because mathematicians have been paid off by six-sided-dice manufacturers.
Does it have to be?
That we need some reference volume seems inevitable. We can’t very well say the volume of something is ten times nothing-in-particular. Does that reference volume have to be a cube? Or even a rectangle or something else? It seems obvious that we need some reference shape that tiles, that can fill up space by itself … right?
What if we don’t?
I’m going to drop out of three dimensions a moment. Not because it changes the fundamentals, but because it makes something easier. Specifically, it makes it easier if you decide you want to get some construction paper, cut out shapes, and try this on your own. What this will tell us about area is just as true for volume. Area, for a two-dimensional space, and volume, for a three-dimensional one, describe the same thing. If you’ll let me continue, then, I will.
So draw a figure on a clean sheet of paper. What’s its area? Now imagine you have a whole bunch of shapes with reference areas. A bunch that have an area of 1. That’s by definition. That’s our reference area. A bunch of smaller shapes with an area of one-half. By definition, too. A bunch of smaller shapes still with an area of one-third. Or one-fourth. Whatever. Shapes with areas you know because they’re marked on them.
Here’s one way to find the area. Drop your reference shapes, the ones with area 1, on your figure. How many do you need to completely cover the figure? It’s all right to cover more than the figure. It’s all right to have some of the reference shapes overlap. All you need is to cover the figure completely. … Well, you know how many pieces you needed for that. You can count them up. You can add up the areas of all these pieces needed to cover the figure. So the figure’s area can’t be any bigger than that sum.
Can’t be exact, though, right? Because you might get a different number if you covered the figure differently. If you used smaller pieces. If you arranged them better. This is true. But imagine all the possible reference shapes you had, and all the possible ways to arrange them. There’s some smallest area of those reference shapes that would cover your figure. Is there a more sensible idea for what the area of this figure would be?
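Here’s the covering idea in Python, with squares as the reference shapes. I count every grid square that touches a unit disk at all, so the count times the square’s area is always an upper bound on the disk’s area, and the bound tightens as the squares shrink:

```python
def covering_area(n):
    """Cover the unit disk with little squares of side 2/n from a grid over
    [-1, 1] x [-1, 1], counting every square that touches the disk at all.
    The cover's total area is an upper bound on the disk's area, pi."""
    h = 2.0 / n
    count = 0
    for i in range(n):
        for j in range(n):
            x0 = -1.0 + i * h
            y0 = -1.0 + j * h
            # nearest point of the square [x0, x0+h] x [y0, y0+h] to the origin
            nx = min(max(x0, 0.0), x0 + h)
            ny = min(max(y0, 0.0), y0 + h)
            if nx * nx + ny * ny <= 1.0:  # the square touches the disk
                count += 1
    return count * h * h

coarse = covering_area(20)   # a rough cover, well above pi
fine = covering_area(200)    # a tighter cover, still above pi
```

The smallest area any such cover can have, over all choices of shapes and arrangements, is the area of the disk: π.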
And put this into three dimensions. If we start from some reference shapes of volume 1 and maybe 1/2 and 1/3 and whatever other useful fractions there are? Doesn’t this covering make sense as a way to describe the volume? Cubes or rectangles are easy to imagine. Tetrahedrons too. But why not any old thing? Why not, as the Mystery Science Theater 3000 episode had it, turkeys?
This is a nice, flexible, convenient way to define area. So now let’s see where it goes all bizarre. We know this thanks to Giuseppe Peano. He’s among the late-19th/early-20th century mathematicians who shaped modern mathematics. They did this by showing how much of our mathematics broke intuition. Peano was (here) exploring what we now call fractals. And noted a family of shapes that curl back on themselves, over and over. They’re beautiful.
And they fill area. Fill volume, if done in three dimensions. It seems impossible. If we use this covering scheme, and try to find the volume of a straight line, we get zero. Well, we find that any positive number is too big, and from that conclude that it has to be zero. Since a straight line has length, but not volume, this seems fine. But a Peano curve won’t go along with this. A Peano curve winds back on itself so much that there is some minimum volume to cover it.
This unsettles. But this idea of volume (or area) by covering works so well. To throw it away seems to hobble us. So it seems worth the trade. We allow ourselves to imagine a line so long and so curled up that it has a volume. Amazing.
And now I get to relax and unwind and enjoy a long weekend before coming to the letter ‘W’. That’ll be about some topic I figure I can whip out a nice tight 500 words about, and instead, produce some 1541-word monstrosity while I wonder why I’ve had no free time at all since August. Tuesday, give or take, it’ll be available at this link, as are the rest of these glossary posts. Thanks for reading.
While putting together the last comics from a week ago I realized there was a repeat among them. And a pretty recent repeat too. I’m supposing this is a one-off, but who can be sure? We’ll get there. I figure to cover last week’s mathematically-themed comics in posts on Wednesday and Thursday, subject to circumstances.
As fits the joke, the bit of calculus in this textbook paragraph is wrong. does not equal . This is even ignoring that we should expect, with an indefinite integral like this, a constant of integration. An indefinite integral like this is equal to a family of related functions, but it’s common shorthand to write out one representative function. Still, the indefinite integral of is not . You can confirm that by differentiating . The result is nothing like . Differentiating an indefinite integral should get the original function back. Here are the rules you need to do that for yourself.
As I make it out, a correct indefinite integral would be:
Plus that “constant of integration” the value of which we can’t tell just from the function we want to indefinitely-integrate. I admit I haven’t double-checked that I’m right in my work here. I trust someone will tell me if I’m not. I’m going to feel proud enough if I can get the LaTeX there to display.
Stephen Beals’s Adult Children for the 27th has run already. It turned up in late March of this year. Michael Spivak’s Calculus is a good choice for representative textbook. Calculus holds its terrors, too. Even someone who’s gotten through trigonometry can find the subject full of weird, apparently arbitrary rules. And formulas like those in the above paragraph.
Rob Harrell’s Big Top for the 27th is a strip about the difficulties of splitting a restaurant bill. And they’ve not even got to calculating the tip. (Maybe it’s just a strip about trying to push the group to splitting the bill a way that lets you off cheap. I haven’t had to face a group bill like this in several years. My skills with it are rusty.)
I got an irresistible topic for today’s essay. It’s courtesy Peter Mander, author of Carnot Cycle, “the classical blog about thermodynamics”. It’s bimonthly and it’s one worth waiting for. Some of the essays are historical; some are statistical-mechanics; many are mixtures of them. You could make a fair argument that thermodynamics is the most important field of physics. It’s certainly one that hasn’t gotten the popularization treatment it deserves, for its importance. Mander is doing something to correct that.
It is hard to think of limits without thinking of motion. The language even professional mathematicians use suggests it. We speak of the limit of a function “as x goes to a”, or “as x goes to infinity”. Maybe “as x goes to zero”. But a function is a fixed thing, a relationship between stuff in a domain and stuff in a range. It can’t change any more than January, AD 1988 can change. And ‘x’ here is a dummy variable, part of the scaffolding to let us find what we want to know. I suppose ‘x’ can change, but if we ever see it, something’s gone very wrong. But we want to use it to learn something about a function for a point like ‘a’ or ‘infinity’ or ‘zero’.
The language of motion helps us learn, to a point. We can do little experiments: if $f(x) = \frac{\sin(x)}{x}$, then, what should we expect it to be for x near zero? It’s irresistible to try out the calculator. Let x be 0.1. 0.01. 0.001. 0.0001. The numbers say this f(x) gets closer and closer to 1. That’s good, right? We know we can’t just put in an x of zero, because that makes some trouble. But we can imagine creeping up on the zero we really wanted. We might spot some obvious prospects for mischief: what if x is negative? We should try -0.1, -0.01, -0.001 and so on. And maybe we won’t get exactly the right answer. But if all we care about is the first (say) three digits and we try out a bunch of x’s and the corresponding f(x)’s agree to those three digits, that’s good enough, right?
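This little experiment is easy to run in Python, using sin(x)/x as the stand-in example (my choice of function, but it behaves exactly as described: fine everywhere except at zero itself):

```python
import math

def f(x):
    return math.sin(x) / x

# Creep up on zero from the right and from the left and watch the values settle.
from_right = [f(10.0 ** -k) for k in range(1, 6)]    # x = 0.1, 0.01, ...
from_left = [f(-(10.0 ** -k)) for k in range(1, 6)]  # x = -0.1, -0.01, ...
```

Both lists crowd toward 1, which is the limit, even though f(0) itself is undefined.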
This is good for giving an idea of what to expect a limit to look like. It should be, well, what it really really really looks like a function should be. It takes some thinking to see where it might go wrong. It might go to different numbers based on which side you approach from. But that seems like something you can rationalize. Indeed, we do; we can speak of functions having different limits based on what direction you approach from. Sometimes that’s the best one can say about them.
But it can get worse. It’s possible to make functions that do crazy weird things. Some of these look like you’re just trying to be difficult. Like, set f(x) equal to 1 if x is rational and 0 if x is irrational. If you don’t expect that to be weird you’re not paying attention. Can’t blame someone for deciding that falls outside the realm of stuff you should be able to find limits for. And who would make, say, an f(x) that was 1 if x was 0.1 raised to some power, but 2 if x was 0.2 raised to some power, and 3 otherwise? Besides someone trying to prove a point?
Fine. But you can make a function that looks innocent and yet acts weird if the domain is two-dimensional. Or more. It makes sense to say that the functions I wrote in the above paragraph should be ruled out of consideration. But the limit of $\frac{x y}{x^2 + y^2}$ at the origin? You get different results approaching in different directions. And the function doesn’t give obvious signs of imminent danger here.
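Here’s that two-dimensional trouble in Python, using the classic troublemaker xy/(x² + y²), my choice of example:

```python
def g(x, y):
    """g(x, y) = x*y / (x^2 + y^2), undefined at the origin."""
    return x * y / (x * x + y * y)

# Approach the origin along the x-axis: the values are identically 0.
along_axis = [g(t, 0.0) for t in (0.1, 0.01, 0.001)]
# Approach along the diagonal y = x: the values are identically 1/2.
along_diagonal = [g(t, t) for t in (0.1, 0.01, 0.001)]
```

Two different answers from two different directions, so no single limit exists at the origin, and nothing in the formula warned us.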
We need a better idea. And we even have one. This took centuries of mathematical wrangling and arguments about what should and shouldn’t be allowed. This should inspire sympathy with Intro Calc students who don’t understand all this by the end of week three. But here’s what we have.
I need a supplementary idea first. That is the neighborhood. A point has a neighborhood if there’s some open set that contains it. We represent this by drawing a little blob around the point we care about. If we’re looking at the neighborhood of a real number, then this is a little interval, that’s all. When we actually get around to calculating, we make these neighborhoods little circles. Maybe balls. But when we’re doing proofs about how limits work, or how we use them to prove things, we make blobs. This “neighborhood” idea looks simple, but we need it, so here we go.
So start with a function, named ‘f’. It has a domain, which I’ll call ‘D’. And a range, which I want to call ‘R’, but I don’t think I need the shorthand. Now pick some point ‘a’. This is the point at which we want to evaluate the limit. This seems like it ought to be called the “limit point” and it’s not. I’m sorry. Mathematicians use “limit point” to talk about something else. And, unfortunately, it makes so much sense in that context that we aren’t going to change away from that.
‘a’ might be in the domain ‘D’. It might not. It might be on the border of ‘D’. All that’s important is that there be a neighborhood inside ‘D’ that contains ‘a’.
I don’t know what f(a) is. There might not even be an f(a), if a is on the boundary of the domain ‘D’. But I do know that everything inside the neighborhood of ‘a’, apart from ‘a’, is in the domain. So we can look at the values of f(x) for all the x’s in this neighborhood. This will create a set, in the range, that’s known as the image of the neighborhood. It might be a continuous chunk in the range. It might be a couple of chunks. It might be a single point. It might be some crazy-quilt set. Depends on ‘f’. And the neighborhood. No matter.
Now I need you to imagine the reverse. Pick a point in the range. And then draw a neighborhood around it. Then pick out what we call the pre-image of it. That’s all the points in the domain that get matched to values inside that neighborhood. Don’t worry about trying to do it; that’s for the homework practice. Would you agree with me that you can imagine it?
I hope so because I’m about to describe the part where Intro Calc students think hard about whether they need this class after all.
All right. Then I want something in the range. I’m going to call it ‘L’. And it’s special. It’s the limit of ‘f’ at ‘a’ if this following bit is true:
Think of every neighborhood you could pick of ‘L’. Can be big, can be small. Just has to be a neighborhood of ‘L’. Now think of the pre-image of that neighborhood. Is there always a neighborhood of ‘a’ inside that pre-image? It’s okay if it’s a tiny neighborhood. Just has to be an open neighborhood. It doesn’t have to contain ‘a’. You can allow a pinpoint hole there.
If you can always do this, however tiny the neighborhood of ‘L’ is, then the limit of ‘f’ at ‘a’ is ‘L’. If you can’t always do this — if there’s even a single exception — then there is no limit of ‘f’ at ‘a’.
I know. I felt like that the first couple times through the subject too. The definition feels backward. Worse, it feels like it begs the question. We suppose there’s an ‘L’ and then test these properties about it and then if it works we say we’re done? I know. It’s a pain when you start calculating this with specific formulas and all that, too. But supposing there is an answer and then learning properties about it, including whether it can exist? That’s a slick trick. We can use it.
Thing is, the pain is worth it. We can calculate with it and not have to out-think tricky functions. It works for domains with as many dimensions as you need. It works for limits that aren’t inside the domain. It works with domains and ranges that aren’t real numbers. It works for functions with weird and complicated domains. We can adapt it if we want to consider limits that are constrained in some way. It won’t be fooled by tricks like I put up above, the f(x) with different rules for the rational and irrational numbers.
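In the real-number setting this neighborhood test turns into the familiar epsilon-delta game, and a computer can at least play a finite version of it. A sketch, with sin(x)/x and proposed limit L = 1 as my example; sampling can’t prove anything, but it shows the shape of the test:

```python
import math

def f(x):
    return math.sin(x) / x

def delta_for(eps, candidates=(1.0, 0.1, 0.01, 0.001), samples=1000):
    """Search a short list of neighborhood radii for a delta such that every
    sampled x with 0 < |x| <= delta keeps f(x) within eps of the proposed
    limit L = 1. A numerical sketch of the neighborhood test, not a proof."""
    for delta in candidates:
        ok = True
        for i in range(1, samples + 1):
            x = delta * i / samples  # never sample x = 0 itself
            if abs(f(x) - 1.0) >= eps or abs(f(-x) - 1.0) >= eps:
                ok = False
                break
        if ok:
            return delta
    return None

# Smaller neighborhoods of L should still admit some neighborhood of a = 0.
deltas = [delta_for(eps) for eps in (0.1, 0.01, 0.0001)]
```

For every eps tried, some delta turns up; that “for every neighborhood of L, a neighborhood of a” pattern is the definition above in miniature.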
So mathematicians shrug, and do enough problems that they get the hang of it, and use this definition. It’s worth it, once you get there.
I’m back to requests! Today’s comes from commenter Dina Yagodich. I don’t know whether Yagodich has a web site, YouTube channel, or other mathematics-discussion site, but am happy to pass along word if I hear of one.
Let me start by explaining integral calculus in two paragraphs. One of the things done in it is finding a “definite integral”. This is itself a function. The definite integral has as its domain the combination of a function, plus some boundaries, and its range is numbers. Real numbers, if nobody tells you otherwise. Complex-valued numbers, if someone says it’s complex-valued numbers. Yes, it could have some other range. But if someone wants you to do that they’re obliged to set warning flares around the problem and precede and follow it with flag-bearers. And you get at least double pay for the hazardous work. The function that gets definite-integrated has its own domain and range. The boundaries of the definite integral have to be within the domain of the integrated function.
For real-valued functions this definite integral has a great physical interpretation. A real-valued function means the domain and range are both real numbers. You see a lot of these. Call the function ‘f’, please. Call its independent variable ‘x’ and its dependent variable ‘y’. Using Euclidean coordinates, or as normal people call it “graph paper”, draw the points that make true the equation “y = f(x)”. Then draw in the x-axis, that is, the points where “y = 0”. The boundaries of the definite integral are going to be two values of ‘x’, a lower and an upper bound. Call that lower bound ‘a’ and the upper bound ‘b’. And heck, call that a “left boundary” and a “right boundary”, because … I mean, look at them. Draw the vertical line at “x = a” and the vertical line at “x = b”. If ‘f(x)’ is always a positive number, then there’s a shape bounded below by “y = 0”, on the left by “x = a”, on the right by “x = b”, and above by “y = f(x)”. And the definite integral is the area of that enclosed space. If ‘f(x)’ is sometimes zero, then there’s several segments, but their combined area is the definite integral. If ‘f(x)’ is sometimes below zero, then there’s several segments. The definite integral is the sum of the areas of parts above “y = 0” minus the area of the parts below “y = 0”.
(Why say “left boundary” instead of “lower boundary”? Taste, pretty much. But I look at the words “lower boundary” and think about the lower edge, that is, the line where “y = 0” here. And “upper boundary” makes sense as a way to describe the curve where “y = f(x)” as well as “x = b”. I’m confusing enough without making the simple stuff ambiguous.)
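That description translates directly into a Riemann-sum sketch in Python. Pieces above the axis count positive, pieces below count negative:

```python
def definite_integral(f, a, b, n=100000):
    """Midpoint-rule estimate of the definite integral of f from a to b."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# f(x) = x from a = -1 to b = 2: the triangle below the axis contributes
# -1/2, the triangle above contributes +2, for a definite integral of 3/2.
signed_area = definite_integral(lambda x: x, -1.0, 2.0)
```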
Don’t try to pass your thesis defense on this alone. But it’s what you need to understand ‘e’. Start out with the function ‘f’, which has domain of the positive real numbers and range of the positive real numbers. For every ‘x’ in the domain, ‘f(x)’ is the reciprocal, one divided by x. This is a shape you probably know well. It’s a hyperbola. Its asymptotes are the x-axis and the y-axis. It’s a nice gentle curve. Its plot passes through such famous points as (1, 1), (2, 1/2), (1/3, 3), and pairs like that. (10, 1/10) and (1/100, 100) too. ‘f(x)’ is always positive on this domain. Use as left boundary the line “x = 1”. And then — let’s think about different right boundaries.
If the right boundary is close to the left boundary, then this area is tiny. If it’s at, like, “x = 1.1” then the area can’t be more than 0.1. (It’s less than that. If you don’t see why that’s so, fit a rectangle of height 1 and width 0.1 around this curve and these boundaries. See?) But if the right boundary is farther out, this area is more. It’s getting bigger if the right boundary is “x = 2” or “x = 3”. It can get bigger yet. Give me any positive number you like. I can find a right boundary so the area inside this is bigger than your number.
Is there a right boundary where the area is exactly 1? … Well, it’s hard to see how there couldn’t be. If a quantity (“area between x = 1 and x = b”) changes from less than one to greater than one, it’s got to pass through 1, right? … Yes, it does, provided some technical points are true, and in this case they are. So that’s nice.
And there is. It’s a number (settle down, I see you quivering with excitement back there, waiting for me to unveil this) a slight bit more than 2.718. It’s a neat number. Carry it out a couple more digits and it turns out to be 2.718281828. So it looks like a great candidate to memorize. It’s not. It’s an irrational number. The digits go off without repeating or falling into obvious patterns after that. It’s a transcendental number, which has to do with polynomials. Nobody knows whether it’s a normal number, because remember, a normal number is just any real number that you never heard of. To be a normal number, every finite string of digits has to appear in the decimal expansion, just as often as every other string of digits of the same length. We can show by clever counting arguments that almost every number is normal. Trick is it’s hard to show that any particular number is.
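That existence argument is also a recipe: squeeze the right boundary between a value where the area is too small and a value where it’s too big. Here’s a little Python sketch of that, with a crude trapezoid sum standing in for the integral. The function names are my own invention:

```python
import math

def area_under_reciprocal(a, b, steps=20_000):
    """Trapezoid-sum approximation of the area under f(x) = 1/x from a to b."""
    h = (b - a) / steps
    total = 0.5 * (1 / a + 1 / b)
    for i in range(1, steps):
        total += 1 / (a + i * h)
    return total * h

def right_boundary_for_area_one():
    """Bisect for the right boundary b where the area from x = 1 equals 1."""
    lo, hi = 2.0, 3.0   # the area is under 1 at b = 2 and over 1 at b = 3
    for _ in range(50):
        mid = (lo + hi) / 2
        if area_under_reciprocal(1.0, mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(right_boundary_for_area_one())  # ≈ 2.71828...
```

The bisection is the intermediate value theorem acted out: keep the boundary trapped between a too-small answer and a too-big one and the trap shrinks onto the number we’re after.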
So let me do another definite integral. Set the left boundary to this “x = 2.718281828(etc)”. Set the right boundary a little more than that. The enclosed area is less than 1. Set the right boundary way off to the right. The enclosed area is more than 1. What right boundary makes the enclosed area ‘1’ again? … Well, that will be at about “x = 7.389”. That is, at the square of 2.718281828(etc).
Repeat this. Set the left boundary at “x = (2.718281828etc)^2”. Where does the right boundary have to be so the enclosed area is 1? … Did you guess “x = (2.718281828etc)^3”? Yeah, of course. You know my rhetorical tricks. What do you want to guess the area is between, oh, “x = (2.718281828etc)^3” and “x = (2.718281828etc)^5”? (Notice I put a ‘5’ in the superscript there.)
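You can check those guesses numerically, too. A quick sketch, again with a plain trapezoid sum and names of my own choosing:

```python
import math

def area(a, b, steps=50_000):
    """Trapezoid-sum approximation of the area under 1/x from a to b."""
    h = (b - a) / steps
    total = 0.5 * (1 / a + 1 / b)
    for i in range(1, steps):
        total += 1 / (a + i * h)
    return total * h

e = math.e
print(area(1, e))            # ≈ 1
print(area(e, e ** 2))       # ≈ 1
print(area(e ** 3, e ** 5))  # ≈ 2, answering the guess about the exponents
```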
Now, relationships like this will happen with other functions, and with other left- and right-boundaries. But if you want it to work with a function whose rule is as simple as “f(x) = 1 / x”, and areas of 1, then you’re going to end up noticing this 2.718281828(etc). It stands out. It’s worthy of a name.
Which is why this 2.718281828(etc) is a number you’ve heard of. It’s named ‘e’. Leonhard Euler, whom you will remember as having written or proved the fundamental theorem for every area of mathematics ever, gave it that name. He used it first when writing for his own work. Then (in November 1731) in a letter to Christian Goldbach. Finally (in 1736) in his textbook Mechanica. Everyone went along with him because Euler knew how to write about stuff, and how to pick symbols that worked for stuff.
Once you know ‘e’ is there, you start to see it everywhere. In Western mathematics it seems to have been first noticed by Jacob (I) Bernoulli, who noticed it in toy compound interest problems. (Given this, I’d imagine it has to have been noticed by the people who did finance. But I am ignorant of the history of financial calculations. Writers of the kind of pop-mathematics history I read don’t notice them either.) Bernoulli and Pierre Raymond de Montmort noticed the reciprocal of ‘e’ turning up in what we’ve come to call the ‘hat check problem’. A large number of guests all check one hat each. The person checking hats has no idea who anybody is. What is the chance that nobody gets their correct hat back? … That chance is the reciprocal of ‘e’. The number’s about 0.368. In a connected but not identical problem, suppose something has one chance in some number ‘N’ of happening each attempt. And it’s given ‘N’ attempts for it to happen. What’s the chance that it doesn’t happen? The bigger ‘N’ gets, the closer the chance it doesn’t happen gets to the reciprocal of ‘e’.
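Both of those chances can be computed exactly, no simulation needed. A sketch in Python, with function names that are my own:

```python
import math

def no_correct_hat_probability(n):
    """Chance that none of n hat-checked guests gets their own hat back.
    The derangement count divided by n! equals the partial sum of (-1)**k / k!."""
    return sum((-1) ** k / math.factorial(k) for k in range(n + 1))

def chance_of_never_happening(n):
    """Chance that a 1-in-n event misses on every one of its n attempts."""
    return (1 - 1 / n) ** n

print(no_correct_hat_probability(10))   # ≈ 0.367879, nearly 1/e already
print(chance_of_never_happening(1000))  # ≈ 0.3677, creeping toward 1/e
```

Ten guests already gets you 1/e to about seven decimal places; the hat-check answer settles in astonishingly fast.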
It comes up in peculiar ways. In high school or freshman calculus you see it defined as what you get if you take \( \left(1 + \frac{1}{x}\right)^x \) for ever-larger real numbers ‘x’. (This is the toy-compound-interest problem Bernoulli found.) But you can find the number other ways. You can calculate it — if you have the stamina — by working out the value of

\[ 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120} + \frac{1}{720} + \cdots \]
There’s a simpler way to write that. There always is. Take all the nonnegative whole numbers — 0, 1, 2, 3, 4, and so on. Take their factorials. That’s 1, 1, 2, 6, 24, and so on. Take the reciprocals of all those. That’s … 1, 1, one-half, one-sixth, one-twenty-fourth, and so on. Add them all together. That’s ‘e’.
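Both descriptions are easy to try out, using nothing past the standard library:

```python
import math

# The limit definition: (1 + 1/x)**x for ever-larger x. Converges slowly.
limit_estimate = (1 + 1 / 1_000_000) ** 1_000_000

# The series: reciprocals of factorials, all added up. Converges very fast.
series_estimate = sum(1 / math.factorial(n) for n in range(20))

print(limit_estimate)   # ≈ 2.71828, still off in the sixth decimal place
print(series_estimate)  # matches math.e to double precision
print(math.e)
```

A million steps of the limit definition still trails the true value; twenty terms of the factorial series is already as good as the computer can represent.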
This ‘e’ turns up all the time. Any system whose rate of growth depends on its current value has an ‘e’ lurking in its description. That’s true if it declines, too, as long as the decline depends on its current value. It gets stranger. Cross ‘e’ with complex-valued numbers and you get, not just growth or decay, but oscillations. And many problems that are hard to solve to start with become doable, even simple, if you rewrite them as growths and decays and oscillations. Through ‘e’ problems too hard to do become problems of polynomials, or even simpler things.
Simple problems become that too. That property about the area underneath “f(x) = 1/x” between “x = 1” and “x = b” makes ‘e’ such a natural base for logarithms that we call it the base for natural logarithms. Logarithms let us replace multiplication with addition, and division with subtraction, easier work. They change exponentiation problems to multiplication, again easier. It’s a strange touch, a wondrous one.
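Both of those claims demonstrate nicely in a few lines. The complex-number business is Euler’s formula, \( e^{\imath \theta} = \cos\theta + \imath \sin\theta \), and Python’s standard cmath module will act it out:

```python
import cmath
import math

# Logarithms trade multiplication for addition.
a, b = 12.0, 34.0
print(math.log(a * b), math.log(a) + math.log(b))  # the same number, twice

# Crossing e with complex numbers trades growth for oscillation:
# e**(i*theta) works out to cos(theta) + i*sin(theta).
theta = 1.25
z = cmath.exp(1j * theta)
print(z.real, math.cos(theta))  # the same number, twice
print(z.imag, math.sin(theta))  # likewise
```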
There are some numbers interesting enough to attract books about them. π, obviously. 0. The base of imaginary numbers, ‘i’, has a couple. I only know one pop-mathematics treatment of ‘e’, Eli Maor’s e: The Story Of A Number. I believe there’s room for more.
You know, the way anyone’s calculator will let you raise 2 to the 85th power. And then raise 3 to whatever number that is. Anyway. The digits of this will agree with the digits of ‘e’ for the first 18,457,734,525,360,901,453,873,570 decimal digits. One Richard Sabey found that, by what means I do not know, in 2004. The page linked there includes a bunch of other, no less amazing, approximations to numbers like ‘e’ and π and the Euler-Mascheroni Constant.
I haven’t got any good ideas for the title for this collection of mathematically-themed comic strips. But I was reading the Complete Peanuts for 1999-2000 and just ran across one where Rerun talked about consoling his basketball by bringing it to a nice warm gymnasium somewhere. So that’s where that pile of words came from.
Mark Anderson’s Andertoons for the 21st is the Mark Anderson’s Andertoons for this installment. It has Wavehead suggest a name for the subtraction of fractions. It’s not by itself an absurd idea. Many mathematical operations get specialized names, even though we see them as specific cases of some more general operation. This may reflect the accidents of history. We have different names for addition and subtraction, though we eventually come to see them as the same operation.
In calculus we get introduced to Maclaurin Series. These are polynomials that approximate more complicated functions. They’re the best possible approximations for a region around 0 in the domain. They’re special cases of the Taylor Series. Those are polynomials that approximate more complicated functions. But you get to pick where in the domain they should be the best approximation. Maclaurin series are nothing but a Taylor series; we keep the names separate anyway, for historical reasons. And slightly baffling ones; James Gregory and Brook Taylor studied Taylor series before Colin Maclaurin did Maclaurin series. But at least Taylor worked on Taylor series, and Maclaurin on Maclaurin series. So for a wonder mathematicians named these things for appropriate people. (Ignoring that Indian mathematicians were poking around this territory centuries before the Europeans were. I don’t know whether English mathematicians of the 18th century could be expected to know of Indian work in the field, in fairness.)
In numerical calculus, we have a scheme for approximating integrals known as the trapezoid rule. It approximates the areas under curves by approximating a curve as a trapezoid. (Any questions?) But this is one of the Runge-Kutta methods. Nobody calls it that except to show they know neat stuff about Runge-Kutta methods. The special names serve to pick out particularly interesting or useful cases of a more generally used thing. Wavehead’s coinage probably won’t go anywhere, but it doesn’t hurt to ask.
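For the curious, the trapezoid rule is short enough to write out whole. A generic sketch in Python, names mine:

```python
def trapezoid_rule(f, a, b, n):
    """Approximate the integral of f from a to b using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# The area under y = x**2 from 0 to 1 is exactly 1/3.
print(trapezoid_rule(lambda x: x ** 2, 0.0, 1.0, 1000))  # ≈ 1/3
```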
Percy Crosby’s Skippy for the 22nd I admit I don’t quite understand. It mentions arithmetic anyway. I think it’s a joke about a textbook like this being good only if it’s got the questions and the answers. But it’s the rare Skippy that’s as baffling to me as most circa-1930 humor comics are.
Ham’s Life on Earth for the 23rd presents the blackboard full of symbols as an attempt to prove something challenging. In this case, to say something about the existence of God. It’s tempting to suppose that we could say something about the existence or nonexistence of God using nothing but logic. And there are mathematics fields that are very close to pure logic. But our scary friends in the philosophy department have been working on the ontological argument for a long while. They’ve found a lot of arguments that seem good, and that fall short for reasons that seem good. I’ll defer to their experience, and suppose that any mathematics-based proof would have the same problems.
Bill Amend’s FoxTrot Classics for the 23rd deploys a Maclaurin series. If you want to calculate the cosine of an angle, and you know the angle in radians, you can find the value by adding up the terms in an infinitely long series. So if θ is the angle, measured in radians, then its cosine will be:

\[ \cos\theta = 1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \frac{\theta^8}{8!} - \cdots \]
60 degrees is \( \frac{\pi}{3} \) in radians and you see from the comic how to turn this series into a thing to calculate. The series does, yes, go on forever. But since the terms alternate in sign — positive then negative then positive then negative — you get a break. Suppose all you want is the answer to within an error margin. Then you can stop adding up terms once you’ve gotten to a term that’s smaller than your error margin. So if you want the answer to within, say, 0.001, you can stop as soon as you find a term with absolute value less than 0.001.
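That stopping rule turns directly into code. A sketch; the function name and the term-to-term recurrence are my own arrangement, not the comic’s:

```python
import math

def cos_series(theta, tol=1e-3):
    """Sum the cosine Maclaurin series until a term drops below tol.
    The series alternates in sign, so the error is then smaller than tol."""
    total, term, n = 0.0, 1.0, 0
    while abs(term) >= tol:
        total += term
        n += 1
        # each term is the previous one times -theta**2 / ((2n - 1)(2n))
        term *= -theta * theta / ((2 * n - 1) * (2 * n))
    return total

theta = math.pi / 3       # 60 degrees, in radians
print(cos_series(theta))  # ≈ 0.49997; the exact cosine of 60 degrees is 1/2
```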
For high school trig, though, this is all overkill. There’s five really interesting angles you’d be expected to know anything about. They’re 0, 30, 45, 60, and 90 degrees. And you need to know about reflections of those across the horizontal and vertical axes. Those give you, like, -30 degrees or 135 degrees. Those reflections don’t change the magnitude of the cosines or sines. They might change the plus-or-minus sign is all. And there’s only three pairs of numbers that turn up for these five interesting angles. There’s 0 and 1. There’s \( \frac{1}{2} \) and \( \frac{\sqrt{3}}{2} \). There’s \( \frac{\sqrt{2}}{2} \) and \( \frac{\sqrt{2}}{2} \). Three things to memorize, plus a bit of orienteering, to know whether the cosine or the sine should be the larger and whether they should be positive or negative. And then you’ve got them all.
You might get asked for, like, the sine of 15 degrees. But that’s someone testing whether you know the angle-addition or angle-subtraction formulas. Or the half-angle and double-angle formulas. Nobody would expect you to know the cosine of 15 degrees. The cosine of 30 degrees, though? Sure. It’s \( \frac{\sqrt{3}}{2} \).
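And the angle-subtraction formula really does carry you from the memorized values to the cosine of 15 degrees. A quick check:

```python
import math

# cos(45 - 30) = cos 45 * cos 30 + sin 45 * sin 30, all in degrees
cos15 = (math.sqrt(2) / 2) * (math.sqrt(3) / 2) + (math.sqrt(2) / 2) * (1 / 2)

print(cos15)                       # ≈ 0.96593
print(math.cos(math.radians(15)))  # the same thing, by the library's route
```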
Mike Thompson’s Grand Avenue for the 23rd is your basic confused-student joke. People often have trouble going from percentages to decimals to fractions and back again. Me, I have trouble in going from percentage chances to odds, as in, “two to one odds” or something like that. (Well, “one to one odds” I feel confident in, and “two to one” also. But, say, “seven to five odds” I can’t feel sure I understand, other than that the second choice is perceived to be a bit more likely than the first.)
… You know, this would have parsed as the Maclaurin Series Edition, wouldn’t it? Well, if only I were able to throw away words I’ve already written and replace them with better words before publishing, huh?
I hate to disillusion anyone but I lack hard rules about what qualifies as a mathematically-themed comic strip. During a slow week, more marginal stuff makes it. This past week was going slow enough that I tagged Wednesday’s Quincy rerun, from March of 1979 for possible inclusion. And all it does is mention that Quincy’s got a mathematics test due. Fortunately for me the week picked up a little. It cheats me of an excuse to point out Ted Shearer’s art style to people, but that’s not really my blog’s business.
Also it may not surprise you but since I’ve decided I need to include GoComics images I’ve gotten more restrictive. Somehow the bit of work it takes to think of a caption and to describe the text and images of a comic strip feels like that much extra work.
Roy Schneider’s The Humble Stumble for the 13th of May is a logic/geometry puzzle. Is it relevant enough for here? Well, I spent some time working it out. And some time wondering about implicit instructions. Like, if the challenge is to have exactly four equally-sized boxes after two toothpicks are moved, can we have extra stuff? Can we put a toothpick where it’s just a stray edge, part of no particular shape? I can’t speak to how long you stay interested in this sort of puzzle. But you can have some good fun rules-lawyering it.
Jeff Harris’s Shortcuts for the 13th is a children’s informational feature about Aristotle. Aristotle is renowned for his mathematical accomplishments by many people who’ve got him mixed up with Archimedes. Aristotle it’s harder to say much about. He did write great texts that pop-science writers credit as giving us the great ideas about nature and physics and chemistry that the Enlightenment was able to correct in only about 175 years of trying. His mathematics is harder to summarize though. We can say certainly that he knew some mathematics. And that he encouraged thinking of subjects as built on logical deductions from axioms and definitions. So there is that influence.
Dan Thompson’s Brevity for the 15th is a pun, built on the bell curve. This is also known as the Gaussian distribution or the normal distribution. It turns up everywhere. If you plot how likely a particular value is to turn up, you get a shape that looks like a slightly melted bell. In principle the bell curve stretches out infinitely far. In practice, the curve turns into a horizontal line so close to zero you can’t see the difference once you’re not-too-far away from the peak.
Jason Chatfield’s Ginger Meggs for the 16th I assume takes place in a mathematics class. I’m assuming the question is adding together four two-digit numbers. But “what are 26, 24, 33, and 32” seems like it should be open to other interpretations. Perhaps Mr Canehard was asking for some class of numbers those all fit into. Integers, obviously. Counting numbers. Composite numbers rather than primes. I keep wanting to say there’s something deeper, like they’re all multiples of three (or something) but they aren’t. They haven’t got any factors other than 1 in common. I mention this because I’d love to figure out what interesting commonality those numbers have that I’m overlooking.
Ed Stein’s Freshly Squeezed for the 17th is a story problem strip. Bit of a passive-aggressive one, in-universe. But I understand why it would be formed like that. The problem’s incomplete, as stated. There could be some fun in figuring out what extra bits of information one would need to give an answer. This is another new-tagged comic.
Henry Scarpelli and Craig Boldman’s Archie for the 19th name-drops calculus, credibly, as something high schoolers would be amazed to see one of their own do in their heads. There’s not anything on the blackboard that’s iconically calculus, as it happens. Dilton’s writing out a polynomial, more or less, and that’s a fit subject for high school calculus. They’re good examples on which to learn differentiation and integration. They’re a little more complicated than straight lines, but not too weird or abstract. And they follow nice, easy-to-summarize rules. But they turn up in high school algebra too, and can fit into geometry easily. Or any subject, really, as remember, everything is polynomials.
Mark Anderson’s Andertoons for the 19th is Mark Anderson’s Andertoons for the week. Glad that it’s there. Let me explain why it is proper construction of a joke that a Fibonacci Division might be represented with a spiral. Fibonacci’s the name we give to Leonardo of Pisa, who lived in the first half of the 13th century. He’s most important for explaining to the western world why these Hindu-Arabic numerals were worth learning. But his pop-cultural presence owes to the Fibonacci Sequence, the sequence of numbers 1, 1, 2, 3, 5, 8, and so on. Each number’s the sum of the two before it. And this connects to the Golden Ratio, one of pop mathematics’ most popular humbugs. As the terms get bigger and bigger, the ratio between a term and the one before it gets really close to the Golden Ratio, a bit over 1.618.
So. Draw a quarter-circle that connects the opposite corners of a 1×1 square. Connect that to a quarter-circle that connects opposite corners of a 2×2 square. Connect that to a quarter-circle connecting opposite corners of a 3×3 square. And a 5×5 square, and an 8×8 square, and a 13×13 square, and a 21×21 square, and so on. Yes, there are ambiguities in the way I’ve described this. I’ve tried explaining how to do things just right. It makes a heap of boring words and I’m trying to reduce how many of those I write. But if you do it the way I want, guess what shape you have?
And that is why this is a correctly-formed joke about the Fibonacci Division.
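The ratio claim from two paragraphs back is easy to watch happen. A Python sketch, with a function name of my own:

```python
def fibonacci(n):
    """First n Fibonacci numbers, starting 1, 1."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq

golden_ratio = (1 + 5 ** 0.5) / 2
seq = fibonacci(20)
print(seq[:8])                          # [1, 1, 2, 3, 5, 8, 13, 21]
print(seq[-1] / seq[-2], golden_ratio)  # both ≈ 1.61803
```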
This one I saw through John Allen Paulos’s twitter feed. He points out that it’s like the Collatz conjecture but is, in fact, proven. If you try this yourself don’t make the mistake of giving up too soon. You might figure, like, start with 12. Sum the squares of its digits and you get 5, which is neither 1 nor anything in that 4-16-37-58-89-145-42-20 cycle. Not so! Square 5 and you get 25. Square those digits and add them and you get 29. Square those digits and add them and you get 85. And what comes next?
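Here’s the iteration spelled out, so you can watch 12 fall into that cycle. The names are my own:

```python
def digit_square_sum(n):
    """Sum of the squares of the digits of n."""
    return sum(int(d) ** 2 for d in str(n))

def trajectory(start, steps):
    """The first several values you get by iterating digit_square_sum."""
    path = [start]
    for _ in range(steps):
        path.append(digit_square_sum(path[-1]))
    return path

print(trajectory(12, 12))
# [12, 5, 25, 29, 85, 89, 145, 42, 20, 4, 16, 37, 58] — in the cycle after all
```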
This is about a proof of Fermat’s Theorem of Sums of Two Squares. According to it, a prime number — let’s reach deep into the alphabet and call it p — can be written as the sum of two squares if and only if p is one more than a whole multiple of four. It’s a proof by using fixed point methods. This is a fun kind of proof, at least to my sense of fun. It’s an approach that’s got a clear physical interpretation. Imagine picking up a (thin) patch of bread dough, stretching it out some and maybe rotating it, and then dropping it back on the board. There’s at least one bit of dough that’s landed in the same spot it was before. Once you see this you will never be able to just roll out dough the same way. So here the proof involves setting up an operation on integers which has a fixed point, and that the fixed point makes the property true.
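Whatever you make of fixed-point proofs, the theorem itself is easy to poke at by brute force. A sketch; the search is naive and the function name is my invention:

```python
def two_square_decomposition(p):
    """Return (a, b) with a*a + b*b == p, or None if no such pair exists."""
    a = 0
    while a * a <= p:
        rest = p - a * a
        b = int(rest ** 0.5)
        if b * b == rest:
            return (a, b)
        a += 1
    return None

print(two_square_decomposition(13))  # (2, 3), and 13 is 4 * 3 + 1
print(two_square_decomposition(29))  # (2, 5), and 29 is 4 * 7 + 1
print(two_square_decomposition(23))  # None, and 23 is 4 * 5 + 3
```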
John D Cook, who runs a half-dozen or so mathematics-fact-of-the-day Twitter feeds, looks into calculating the volume of an egg. It involves calculus, as finding the volume of many interesting shapes does. I am surprised to learn the volume can be written out as a formula that depends on the shape of the egg. I would have bet that it couldn’t be expressed in “closed form”. This is a slightly flexible term. It’s meant to mean the thing can be written using only normal, familiar functions. However, we pretend that the inverse hyperbolic tangent is a “normal, familiar” function.
For example, there’s the surface area of an egg. This can be worked out too, again using calculus. It can’t be written even with the inverse hyperbolic cotangent, so good luck. You have to get into numerical integration if you want an answer humans can understand.
Comic Strip Master Command spent most of February making sure I could barely keep up. It didn’t slow down the final week of the month either. Some of the comics were those that I know are in eternal reruns. I don’t think I’m repeating things I’ve already discussed here, but it is so hard to be sure.
Bill Amend’s FoxTrot for the 24th of February has a mathematics problem with a joke answer. The approach to finding the area’s exactly right. It’s easy to find areas of simple shapes like rectangles and triangles and circles and half-circles. Cutting a complicated shape into known shapes, finding those areas, and adding them together works quite well, most of the time. And that’s intuitive enough. There are other approaches. If you can describe the outline of a shape well, you can use an integral along that outline to get the enclosed area. And that amazes me even now. One of the wonders of calculus is that you can swap information about a boundary for information about the interior, and vice-versa. It’s a bit much for even Jason Fox, though.
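That boundary-for-interior trade has a discrete cousin worth seeing: the “shoelace” formula, which gets a polygon’s area from nothing but a signed sum taken around its outline. A sketch in Python, names mine:

```python
def shoelace_area(vertices):
    """Area of a polygon from a signed sum taken around its boundary."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# An L-shape you could also get by gluing a 2x1 rectangle to a 1x1 square.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(shoelace_area(l_shape))  # 3.0, with no cutting into rectangles needed
```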
Jef Mallett’s Frazz for the 25th is a dispute between Mrs Olsen and Caulfield about whether it’s possible to give more than 100 percent. I come down, now as always, on the side that argues it depends what you figure 100 percent is of. If you mean “100% of the effort it’s humanly possible to expend” then yes, there’s no making more than 100% of an effort. But there is an amount of effort reasonable to expect for, say, an in-class quiz. It’s far below the effort one could possibly humanly give. And one could certainly give 105% of that effort, if desired. This happens in the real world, of course. Famously, in the right circles, the Space Shuttle Main Engines normally reached 104% of full throttle during liftoff. That’s because the original specifications for what full throttle would be turned out to be lower than was ultimately needed. And it was easier to plan around running the engines at greater-than-100%-throttle than it was to change all the earlier design documents.
Matt Janz’s Out of the Gene Pool rerun for the 25th tosses off a mention of “New Math”. It’s referenced as a subject that’s both very powerful but also impossible for Pop, as an adult, to understand. It’s an interesting denotation. Usually “New Math”, if it’s mentioned at all, is held up as a pointlessly complicated way of doing simple problems. This is, yes, the niche that “Common Core” has taken. But Janz’s strip might be old enough to predate people blaming everything on Common Core. And it might be character, that the father is old enough to have heard of New Math but not anything in the nearly half-century since. It’s an unusual mention in that “New” Math is credited as being good for things. (I’m aware this strip’s a rerun. I had thought I’d mentioned it in an earlier Reading the Comics post, but can’t find it. I am surprised.)
So, I must confess failure. Not about deciphering Józef Maria Hoëne-Wronski’s attempted definition of π. He’d tried this crazy method throwing a lot of infinities and roots of infinities and imaginary numbers together. I believe I translated it into the language of modern mathematics fairly. And my failure is not that I found the formula actually described the number -½π.
Oh, I had an error in there, yes. And I’d found where it was. It was all the way back in the essay which first converted Wronski’s formula into something respectable. It was a small error, first appearing in the last formula of that essay and never corrected from there. This reinforces my suspicion that when normal people see formulas they mostly look at them to confirm there is a formula there. With luck they carry on and read the sentences around them.
My failure is I wanted to write a bit about boring mistakes. The kinds which you make all the time while doing mathematics work, but which you don’t worry about. Dropped signs. Constants which aren’t divided out, or which get multiplied in incorrectly. Stuff like this which you only detect because you know, deep down, that you should have gotten to an attractive simple formula and you haven’t. Mistakes which are tiresome to make, but never make you wonder if you’re in the wrong job.
The trouble is I can’t think of how to make an essay of that. We don’t tend to rate little mistakes like the wrong sign or the wrong multiple or a boring unnecessary added constant as important. This is because they’re not. The interesting stuff in a mathematical formula is usually the stuff representing variations. Change is interesting. The direction of the change? Eh, nice to know. A swapped plus or minus sign alters your understanding of the direction of the change, but that’s all. Multiplying or dividing by a constant wrongly changes your understanding of the size of the change. But that doesn’t alter what the change looks like. Just the scale of the change. Adding or subtracting the wrong constant alters what you think the change is varying from, but not what the shape of the change is. Once more, not a big deal.
But you also know that instinctively, or at least you get it from seeing how it’s worth one or two points on an exam to write -sin where you mean +sin. Or how if you ask the instructor in class about that 2 where a ½ should be, she’ll say, “Oh, yeah, you’re right” and do a hurried bit of erasing before going on.
Thus my failure: I don’t know what to say about boring mistakes that has any insight.
For the record here’s where I got things wrong. I was creating a function, named ‘f’ and using as a variable ‘x’, to represent Wronski’s formula. I’d gotten to this point:

\[ f(x) = -4 \imath x \cdot 2^{\frac{1}{2x}} \left\{ e^{\imath \frac{\pi}{4x}} - e^{-\imath \frac{\pi}{4x}} \right\} \]
And then I observed how the stuff in curly braces there is “one of those magic tricks that mathematicians know because they see it all the time”. And I wanted to call in this formula, correctly:

\[ \sin\left(y\right) = \frac{e^{\imath y} - e^{-\imath y}}{2 \imath} \]
So here’s where I went wrong. I took the \( -4 \imath x \) way off in the front of that first formula and combined it with the stuff in braces to make 2 times a sine of some stuff. I apologize for this. I must have been writing stuff out faster than I was thinking about it. If I had thought, I would have gone through this intermediate step:

\[ f(x) = -4 \imath x \cdot 2^{\frac{1}{2x}} \cdot 2 \imath \, \sin\left(\frac{\pi}{4x}\right) \]
Because with that form in mind, it’s easy to take the stuff in curled braces and the \( 2 \imath \) in the denominator. From that we get, correctly, \( 2 \imath \, \sin\left(\frac{\pi}{4x}\right) \). And then the \( -4 \imath \) on the far left of that expression and the \( 2 \imath \) on the right multiply together to produce the number 8.
So the function ought to have been, all along:

\[ f(x) = 8 \cdot 2^{\frac{1}{2x}} \cdot x \sin\left(\frac{\pi}{4x}\right) \]
Not very different, is it? Ah, but it makes a huge difference. Carry through with all the L’Hôpital’s Rule stuff described in previous essays. All the complicated formula work is the same. There’s a different number hanging off the front, waiting to multiply in. That’s all. And what you find, redoing all the work but using this corrected function, is that Wronski’s original mess works out to \( 2\pi \).
Possibly the book I drew this from misquoted Wronski. It’s at least as good to have a formula for 2π as it is to have one for π. Or Wronski had a mistake in his original formula, and had a constant multiplied out front which he didn’t want. It happens to us all.
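If you read the corrected function the way I do, as \( 8 \cdot 2^{\frac{1}{2x}} \cdot x \sin\left(\frac{\pi}{4x}\right) \) (my reconstruction; it may not match the original working exactly), you can watch it settle onto 2π numerically:

```python
import math

def corrected_wronski(x):
    # my reading of the corrected function; treat the exact form as an assumption
    return 8 * 2 ** (1 / (2 * x)) * x * math.sin(math.pi / (4 * x))

for x in (10, 1_000, 100_000):
    print(corrected_wronski(x))  # creeps down toward 6.2831853...
print(2 * math.pi)
```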
Józef Maria Hoëne-Wronski had an idea for a new, universal, culturally-independent definition of π. It was this formula that nobody went along with because they had looked at it:

\[ \pi = \frac{4 \infty}{\sqrt{-1}} \left\{ \left(1 + \sqrt{-1}\right)^{\frac{1}{\infty}} - \left(1 - \sqrt{-1}\right)^{\frac{1}{\infty}} \right\} \]
I made some guesses about what he would want this to mean. And how we might put that in terms of modern, conventional mathematics. I describe those in the above links. In terms of limits of functions, I got this:

\[ \pi = \lim_{x \to \infty} f(x) = \lim_{x \to \infty} \, -2 \cdot 2^{\frac{1}{2x}} \cdot x \sin\left(\frac{\pi}{4x}\right) \]
The trouble is that limit took more work than I wanted to do to evaluate. If you try evaluating that ‘f(x)’ at ∞, you get an expression that looks like zero times ∞. This begs for the use of L’Hôpital’s Rule, which tells you how to find the limit for something that looks like zero divided by zero, or like ∞ divided by ∞. Do a little rewriting — replacing that first ‘x’ with ‘\( \frac{1}{1 / x} \)’ — and this ‘f(x)’ behaves like L’Hôpital’s Rule needs.
The trouble is, that’s a pain to evaluate. L’Hôpital’s Rule works on functions that look like one function divided by another function. It does this by calculating the derivative of the numerator function divided by the derivative of the denominator function. And I decided that was more work than I wanted to do.
Where trouble comes up is all those parts where \( \frac{1}{x} \) turns up. The derivatives of functions with a lot of terms in them get more complicated than the original functions were. Is there a way to get rid of some or all of those?
And there is. Do a change of variables. Let me summon the variable ‘y’, whose value is exactly \( \frac{1}{x} \). And then I’ll define a new function, ‘g(y)’, whose value is whatever ‘f’ would be at \( \frac{1}{y} \). That is, and this is just a little bit of algebra:

\[ g(y) = -2 \cdot 2^{\frac{y}{2}} \cdot \frac{\sin\left(\frac{\pi y}{4}\right)}{y} \]
The limit of ‘f(x)’ for ‘x’ at ∞ should be the same number as the limit of ‘g(y)’ for ‘y’ at … you’d really like it to be zero. If ‘x’ is incredibly huge, then \( \frac{1}{x} \) has to be incredibly small. But we can’t just swap the limit of ‘x’ at ∞ for the limit of ‘y’ at 0. The limit of a function at a point reflects the value of the function at a neighborhood around that point. If the point’s 0, this includes positive and negative numbers. But looking for the limit at ∞ gets at only positive numbers. You see the difference?
… For this particular problem it doesn’t matter. But it might. Mathematicians handle this by taking a “one-sided limit”, or a “directional limit”. The normal limit at 0 of ‘g(y)’ is based on what ‘g(y)’ looks like in a neighborhood of 0, positive and negative numbers. In the one-sided limit, we just look at a neighborhood of 0 that’s all values greater than 0, or less than 0. In this case, I want the neighborhood that’s all values greater than 0. And we write that by adding a little + in superscript to the limit. For the other side, the neighborhood less than 0, we add a little – in superscript. So I want to evaluate:

\[ \lim_{y \to 0^+} g(y) = \lim_{y \to 0^+} \, -2 \cdot 2^{\frac{y}{2}} \cdot \frac{\sin\left(\frac{\pi y}{4}\right)}{y} \]
Limits and L’Hôpital’s Rule and stuff work for one-sided limits the way they do for regular limits. So there’s that mercy. The first attempt at this limit, seeing what ‘g(y)’ is if ‘y’ happens to be 0, gives \( \frac{0}{0} \). A zero divided by a zero is promising. That’s not defined, no, but it’s exactly the format that L’Hôpital’s Rule likes. The numerator is:

\[ -2 \cdot 2^{\frac{y}{2}} \sin\left(\frac{\pi y}{4}\right) \]
And the denominator is:

\[ y \]
The first derivative of the denominator is blessedly easy: the derivative of y, with respect to y, is 1. The derivative of the numerator is a little harder. It demands the use of the Product Rule and the Chain Rule, just as last time. But these chains are easier.
The first derivative of the numerator is going to be:

\[ -2 \left( \frac{\ln(2)}{2} \cdot 2^{\frac{y}{2}} \sin\left(\frac{\pi y}{4}\right) + \frac{\pi}{4} \cdot 2^{\frac{y}{2}} \cos\left(\frac{\pi y}{4}\right) \right) \]
Yeah, this is the simpler version of the thing I was trying to figure out last time. Because this is what’s left if I write the derivative of the numerator over the derivative of the denominator:

\[ \frac{-2 \left( \frac{\ln(2)}{2} \cdot 2^{\frac{y}{2}} \sin\left(\frac{\pi y}{4}\right) + \frac{\pi}{4} \cdot 2^{\frac{y}{2}} \cos\left(\frac{\pi y}{4}\right) \right)}{1} \]
And now this is easy. Promise. There’s no expressions of ‘y’ divided by other expressions of ‘y’ or anything else tricky like that. There’s just a bunch of ordinary functions, all of them defined for when ‘y’ is zero. If this limit exists, it’s got to be equal to:

\[ -2 \left( \frac{\ln(2)}{2} \cdot 2^{0} \sin\left(\frac{\pi \cdot 0}{4}\right) + \frac{\pi}{4} \cdot 2^{0} \cos\left(\frac{\pi \cdot 0}{4}\right) \right) \]
\( \frac{\pi \cdot 0}{4} \) is 0. And the sine of 0 is 0. The cosine of 0 is 1. So all this gets to be a lot simpler, really fast.
And \( 2^0 \) is equal to 1. So the part to the left of the + sign there is all zero. What remains is:

\[ -2 \cdot \frac{\pi}{4} = -\frac{1}{2}\pi \]
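You can also dodge the calculus and just watch the one-sided limit happen numerically. Taking the function to be \( g(y) = -2 \cdot 2^{\frac{y}{2}} \cdot \frac{\sin\left(\frac{\pi y}{4}\right)}{y} \) (my reconstruction of it, anyway) and marching ‘y’ down toward zero from the positive side:

```python
import math

def g(y):
    # my reading of the function; treat the exact form as an assumption
    return -2 * 2 ** (y / 2) * math.sin(math.pi * y / 4) / y

for y in (0.1, 0.001, 0.000_01):
    print(g(y))          # heads toward -1.5707963..., which is -pi/2
print(-math.pi / 2)
```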
And so, finally, we have it. Wronski’s formula, as best I make it out, is a function whose value is …

\[ -\frac{1}{2}\pi \]
… So, what Wronski had been looking for, originally, was π. This is … oh, so very close to right. I mean, there’s π right there, it’s just multiplied by an unwanted \( -\frac{1}{2} \). The question is, where’s the mistake? Was Wronski wrong to start with? Did I parse him wrongly? Is it possible that the book I copied Wronski’s formula from made a mistake?
Could be any of them. I’d particularly suspect I parsed him wrongly. I returned the library book I had got the original claim from, and I can’t find it again before this is set to publish. But I should check whether Wronski was thinking to find π, the ratio of the circumference to the diameter of a circle. Or might he have looked to find the ratio of the circumference to the radius of a circle? Either is an interesting number worth finding. We’ve settled on the circumference-over-diameter as valuable, likely for practical reasons. It’s much easier to measure the diameter than the radius of a thing. (Yes, I have read the Tau Manifesto. No, I am not impressed by it.) But if you know 2π, then you know π, or vice-versa.
The next question: yeah, but I turned up -½π. What am I talking about 2π for? And the answer there is, I’m not the first person to try working out Wronski’s stuff. You can try putting the expression, as best you parse it, into a tool like Mathematica and see what makes sense. Or you can read, for example, Quora commenters giving answers with way less exposition than I do. And I’m convinced: somewhere along the line I messed up. Not in an important way, but, essentially, doing something equivalent to dividing by -2 when I should have multiplied by it.
I’ve spotted my mistake. I figure to come back around to explaining where it is and how I made it.
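For anyone who wants to check my arithmetic by machine, a little Python does it. Fair warning: the ‘g(y)’ here is my reading of the reduced formula — the -2 out front, 2 raised to the y/2 power, the sine of πy/4, all over y — an assumption on my part, not a quote of Wronski.

```python
import math

def g(y):
    # My reading of the reduced form of the formula, after substituting
    # y = 1/x: -2 times 2^(y/2) times sin(pi*y/4), divided by y.
    # This form is assumed, not quoted from Wronski.
    return -2 * 2**(y / 2) * math.sin(math.pi * y / 4) / y

# Approach 0 from the positive side and watch the values settle.
values = [g(10.0**-k) for k in range(3, 8)]
target = -math.pi / 2  # the -(1/2)*pi the essay arrives at
```

If the reading is right, the sampled values crowd in on -½π as y shrinks toward 0 from above.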
So now a bit more on Józef Maria Hoëne-Wronski’s attempted definition of π. I had got it rewritten to this form:
And I’d tried the first thing mathematicians do when trying to evaluate the limit of a function at a point. That is, take the value of that point and put it in whatever the formula is. If that formula evaluates to something meaningful, then that value is the limit. That attempt gave this:
Because the limit of ‘x’, for ‘x’ at ∞, is infinitely large. The limit of ‘2^(1/(2x))’ for ‘x’ at ∞ is 1. The limit of ‘sin(π/(4x))’ for ‘x’ at ∞ is 0. We can take limits that are 0, or limits that are some finite number, or limits that are infinitely large. But multiplying a zero times an infinity is dangerous. Could be anything.
Mathematicians have a tool. We know it as L’Hôpital’s Rule. It’s named for the French mathematician Guillaume de l’Hôpital, who discovered it in the works of his tutor, Johann Bernoulli. (They had a contract giving l’Hôpital publication rights. If Wikipedia’s right, the preface of the book credited Bernoulli, although apparently not for this result specifically. The full story is more complicated and ambiguous. The previous sentence may be said about most things.)
So here’s the first trick. Suppose you’re finding the limit of something that you can write as the quotient of one function divided by another. So, something that looks like this:
(Normally, this gets presented as ‘f(x)’ divided by ‘g(x)’. But I’m already using ‘f(x)’ for another function and I don’t want to muddle what that means.)
Suppose it turns out that at ‘a’, both ‘h(x)’ and ‘g(x)’ are zero, or both ‘h(x)’ and ‘g(x)’ are ∞. Zero divided by zero, or ∞ divided by ∞, looks like danger. It’s not necessarily so, though. If this limit exists, then we can find it by taking the first derivatives of ‘h’ and ‘g’, and evaluating:
That ‘ mark is a common shorthand for “the first derivative of this function, with respect to the only variable we have around here”.
This doesn’t look like it should help matters. Often it does, though. There’s an excellent chance that either ‘h'(x)’ or ‘g'(x)’ — or both — aren’t simultaneously zero, or ∞, at ‘a’. And once that’s so, we’ve got a meaningful limit. This doesn’t always work. Sometimes we have to use this l’Hôpital’s Rule trick a second time, or a third or so on. But it works so very often for the kinds of problems we like to do. Reaches the point that if it doesn’t work, we have to suspect we’re calculating the wrong thing.
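If you want to see the rule in action on something small, here’s a sketch. It uses the classic classroom case, the sine of x divided by x at 0, not Wronski’s problem; both pieces are 0 there, so the rule says to look at the derivatives instead.

```python
import math

def h(x):
    return math.sin(x)   # numerator: h(0) = 0

def g(x):
    return x             # denominator: g(0) = 0

def h_prime(x):
    return math.cos(x)   # first derivative of the numerator

def g_prime(x):
    return 1.0           # first derivative of the denominator

# L'Hopital's Rule: the limit of h/g at 0 should equal h'(0)/g'(0).
lhopital_value = h_prime(0.0) / g_prime(0.0)

# Compare with the quotient itself as x creeps toward 0.
near = math.sin(1e-6) / 1e-6
```

The derivative quotient gives 1 outright, and the raw quotient sampled near 0 agrees.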
But wait, you protest, reasonably. This is fine for problems where the limit looks like 0 divided by 0, or ∞ divided by ∞. What Wronski’s formula got me was 0 times 1 times ∞. And I won’t lie: I’m a little unsettled by having that 1 there. I feel like multiplying by 1 shouldn’t be a problem, but I have doubts.
That zero times ∞ thing, though? That’s easy. Here’s the second trick. Let me put it this way: isn’t ‘x’ really the same thing as 1/(1/x)?
I expect your answer is to slam your hand down on the table and glare at my writing with contempt. So be it. I told you it was a trick.
And it’s a perfectly good one. And it’s perfectly legitimate, too. 1/x is a meaningful number if ‘x’ is any finite number other than zero. So is 1/(1/x). Mathematicians accept a definition of limit that doesn’t really depend on the value of your expression at a point. So that 1/(1/x) wouldn’t be meaningful for ‘x’ at zero doesn’t mean we can’t evaluate its limit for ‘x’ at zero. And just because we might not be sure what 1/(1/x) would mean for infinitely large ‘x’ doesn’t mean we can’t evaluate its limit for ‘x’ at ∞.
I see you, person who figures you’ve caught me. The first thing I tried was putting ∞ in for ‘x’, all ready to declare that this was the limit of ‘f(x)’. I know my caveats, though. Plugging the value you want the limit at into the function whose limit you’re evaluating is a shortcut. If you get something meaningful, then that’s the same answer you would get finding the limit properly. Which is done by looking at the neighborhood around but not at that point. So that’s why this reciprocal-of-the-reciprocal trick works.
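Here’s the trick on a toy example — not Wronski’s function, just x times the sine of 1/x, which has the same infinity-times-zero shape at ∞. Rewriting the x as a reciprocal-of-a-reciprocal turns it into a 0/0 quotient without changing any of its values:

```python
import math

def product_form(x):
    # An infinity-times-zero shape: x grows without bound, sin(1/x) goes to 0.
    return x * math.sin(1 / x)

def quotient_form(x):
    # The same expression with x rewritten as 1/(1/x). Now it reads as
    # sin(1/x) divided by 1/x: a 0-over-0 shape at infinity, the format
    # L'Hopital's Rule likes.
    return math.sin(1 / x) / (1 / x)

x = 1e8
gap = abs(product_form(x) - quotient_form(x))  # the two forms agree
```

Both forms give the same numbers, and both head toward 1 as x grows.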
So back to my function, which looks like this:
Do I want to replace ‘x’ with 1/(1/x), or do I want to replace ‘sin(π/(4x))’ with 1/(1/sin(π/(4x)))? I was going to say something about how many times in my life I’ve been glad to take the reciprocal of the sine of an expression of x. But just writing the symbols out like that makes the case better than being witty would.
So here is a new, L’Hôpital’s Rule-friendly, version of my version of Wronski’s formula:
I put that -2 out in front because it’s not really important. The limit of a constant number times some function is the same as that constant number times the limit of that function. We can put that off to the side, work on other stuff, and hope that we remember to bring it back in later. I manage to remember it about four-fifths of the time.
So these are the numerator and denominator functions I was calling ‘h(x)’ and ‘g(x)’ before:
The limit of both of these at ∞ is 0, just as we might hope. So we take the first derivatives. That for ‘g(x)’ is easy. Anyone who’s reached week three in Intro Calculus can do it. This may only be because she’s gotten bored and leafed through the formulas on the inside front cover of the textbook. But she can do it. It’s the derivative of 1/x, which is -1/x².
When I last looked at Józef Maria Hoëne-Wronski’s attempted definition of π I had gotten it to this. Take the function:
And find its limit when ‘x’ is ∞. Formally, you want to do this by proving there’s some number, let’s say ‘L’. And ‘L’ has the property that you can pick any margin-of-error number ε that’s bigger than zero. And whatever that ε is, there’s some number ‘N’ so that whenever ‘x’ is bigger than ‘N’, ‘f(x)’ is larger than ‘L – ε’ and also smaller than ‘L + ε’. This can be a lot of mucking about with expressions to prove.
Fortunately we have shortcuts. There’s work we can do that gets us ‘L’, and we can rely on other proofs that show that this must be the limit of ‘f(x)’ at some value ‘a’. I use ‘a’ because that doesn’t commit me to talking about ∞ or any other particular value. The first approach is to just evaluate ‘f(a)’. If you get something meaningful, great! We’re done. That’s the limit of ‘f(x)’ at ‘a’. This approach is called “substitution” — you’re substituting ‘a’ for ‘x’ in the expression of ‘f(x)’ — and it’s great. Except that if your problem’s interesting then substitution won’t work. Still, maybe Wronski’s formula turns out to be lucky. Fit in ∞ where ‘x’ appears and we get:
So … all right. Not quite there yet. But we can get there. For example, the limit of 1/x at ∞ has to be — well. It’s what you would expect if you were a kid and not worried about rigor: 0. We can make it rigorous if you like. (It goes like this: Pick any ε larger than 0. Then whenever ‘x’ is larger than 1/ε then 1/x is less than ε. So the limit of 1/x at ∞ has to be 0.) So let’s run with this: replace all those 1/x expressions with 0. Then we’ve got:
The sine of 0 is 0. 2^0 is 1. So substitution tells us the limit is -2 times ∞ times 1 times 0. That there’s an ∞ in there isn’t a problem. A limit can be infinitely large. Think of the limit of ‘x^2’ at ∞. An infinitely large thing times an infinitely large thing is fine. The limit of ‘x e^x’ at ∞ is infinitely large. A zero times a zero is fine; that’s zero again. But having an ∞ times a 0? That’s trouble. ∞ times something should be huge; anything times zero should be 0; which term wins?
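That ε argument for the reciprocal can be played out like a game, by machine. This sketch uses f(x) = 1/x and the claimed limit 0: you pick an error margin ε, the response is N = 1/ε, and every x past N keeps f(x) inside your margin.

```python
def f(x):
    return 1 / x

def n_for(epsilon):
    # The response in the challenge game: N = 1/epsilon. Whenever x > N,
    # f(x) = 1/x sits between 0 - epsilon and 0 + epsilon.
    return 1 / epsilon

L = 0  # the claimed limit of 1/x at infinity
checks = []
for epsilon in (0.1, 0.001, 1e-6):
    N = n_for(epsilon)
    for x in (N * 1.01, N * 10, N * 1e6):  # any x past N will do
        checks.append(abs(f(x) - L) < epsilon)
```

Every challenge passes, which is the shape of the rigorous claim, if not a proof of it.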
So we have to fall back on alternate plans. Fortunately there’s a tool we have for limits when we’d otherwise have to face an infinitely large thing times a zero.
I hope to write about this next time. I apologize for not getting through it today but time wouldn’t let me.
I remain fascinated with Józef Maria Hoëne-Wronski’s attempted definition of π. It had started out like this:
And I’d translated that into something that modern mathematicians would accept without flinching. That is to evaluate the limit of a function that looks like this:
So. I don’t want to deal with that f(x) as it’s written. I can make it better. One thing that bothers me is seeing the complex number 1 + i raised to a power. I’d like to work with something simpler than that. And I can’t see that number without also noticing that I’m subtracting from it 1 – i raised to the same power. 1 + i and 1 – i are a “conjugate pair”. It’s usually nice to see those. It often hints at ways to make your expression simpler. That’s one of those patterns you pick up from doing a lot of problems as a mathematics major, and that then look like magic to the lay audience.
Here’s the first way I figure to make my life simpler. It’s in rewriting that 1 + i and 1 – i stuff so it’s simpler. It’ll be simpler by using exponentials. Shut up, it will too. I get there through Gauss, Descartes, and Euler.
At least I think it was Gauss who pointed out how you can match complex-valued numbers with points on the two-dimensional plane. On a sheet of graph paper, if you like. The number 1 + i matches to the point with x-coordinate 1, y-coordinate 1. The number 1 – i matches to the point with x-coordinate 1, y-coordinate -1. Yes, yes, this doesn’t sound like much of an insight Gauss had, but his work goes on. I’m leaving it off here because that’s all that I need for right now.
So these two numbers that offended me I can think of as points. They have Cartesian coordinates (1, 1) and (1, -1). But there’s never only one coordinate system for something. There may be only one that’s good for the problem you’re doing. I mean that makes the problem easier to study. But there are always infinitely many choices. For points on a flat surface like a piece of paper, and where the points don’t represent any particular physics problem, there’s two good choices. One is the Cartesian coordinates. In it you refer to points by an origin, an x-axis, and a y-axis. How far is the point from the origin in a direction parallel to the x-axis? (And in which direction? This gives us a positive or a negative number) How far is the point from the origin in a direction parallel to the y-axis? (And in which direction? Same positive or negative thing.)
The other good choice is polar coordinates. For that we need an origin and a positive x-axis. We refer to points by how far they are from the origin, heedless of direction. And then to get direction, what angle the line segment connecting the point with the origin makes with the positive x-axis. The first of these numbers, the distance, we normally label ‘r’ unless there’s compelling reason otherwise. The other we label ‘θ’. ‘r’ is always going to be a positive number or, possibly, zero. ‘θ’ might be any number, positive or negative. By convention, we measure angles so that positive numbers are counterclockwise from the x-axis. I don’t know why. I guess it seemed less weird for, say, the point with Cartesian coordinates (0, 1) to have a positive angle rather than a negative angle. That angle would be ½π, because mathematicians like radians more than degrees. They make other work easier.
So. The point 1 + i corresponds to the polar coordinates r = √2 and θ = ¼π. The point 1 – i corresponds to the polar coordinates r = √2 and θ = -¼π. Yes, the θ coordinates being negative one times each other is common in conjugate pairs. Also, if you have doubts about my use of the word “the” before “polar coordinates”, well-spotted. If you’re not sure about that thing where ‘r’ is not negative, again, well-spotted. I intend to come back to that.
With the polar coordinates ‘r’ and ‘θ’ to describe a point I can go back to complex numbers. I can match the point to the complex number with the value given by r e^(iθ), where ‘e’ is that old 2.71828something number. Superficially, this looks like a big dumb waste of time. I had some problem with imaginary numbers raised to powers, so now, I’m rewriting things with a number raised to imaginary powers. Here’s why it isn’t dumb.
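The r-and-θ matching is something the computer will do for you, if you’d rather not trust my geometry. A sketch with Python’s standard `cmath` module:

```python
import cmath
import math

# Convert the conjugate pair to polar coordinates (r, theta).
r1, theta1 = cmath.polar(1 + 1j)   # the point (1, 1)
r2, theta2 = cmath.polar(1 - 1j)   # the point (1, -1)

# Both points sit sqrt(2) from the origin, at angles pi/4 and -pi/4:
# negatives of each other, as conjugate pairs go.

# And r e^(i theta) hands the original number back.
reconstructed = r1 * cmath.exp(1j * theta1)
```

`cmath.polar` hands back the principal angle, the one in the strip from -π to π.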
It’s easy to raise a number written like this to a power. r e^(iθ) raised to the n-th power is going to be equal to r^n e^(iθn). (Because (a^b)^c equals a^(bc), and we’re going to go ahead and assume this stays true if ‘b’ is a complex-valued number. It does, but you’re right to ask how we know that.) And this turns into raising a real-valued number to a power, which we know how to do. And it involves dividing a number by that power, which is also easy.
And we can get back to something that looks like a + bi too. That is, something that’s a real number plus i times some real number. This is through one of the many Euler’s Formulas. The one that’s relevant here is that e^(iφ) equals cos(φ) + i sin(φ), for any real number ‘φ’. So, that’s true also for ‘θ’ times ‘n’. Or, looking to where everybody knows we’re going, also true for ‘θ’ divided by ‘x’.
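Both halves of that — raising to a power by way of polar coordinates, and coming back to real-plus-i-times-real by way of Euler’s Formula — can be sanity-checked numerically. A sketch, using 1 + i raised to the 1/x power with x = 5 as a stand-in:

```python
import cmath
import math

r, theta = cmath.polar(1 + 1j)   # sqrt(2) and pi/4

x = 5.0
# Raising to the 1/x power the easy way: raise r to the 1/x power,
# divide theta by x.
via_polar = r**(1 / x) * cmath.exp(1j * theta / x)

# The direct computation, for comparison. Python's complex power uses the
# same principal branch as the polar bookkeeping above.
direct = (1 + 1j)**(1 / x)

# Euler's Formula turns the exponential back into a + bi form.
via_euler = r**(1 / x) * complex(math.cos(theta / x), math.sin(theta / x))
```

All three routes land on the same complex number, up to floating-point dust.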
OK, on to the people so anxious about all this. I talked about the angle made between the line segment that connects a point and the origin and the positive x-axis. “The” angle. “The”. If that wasn’t enough explanation of the problem, mention how your thinking’s done a 360 degree turn and you see it differently now. In an empty room, if you happen to be in one. Your pedantic know-it-all friend is explaining it now. There’s an infinite number of angles that correspond to any given direction. They’re all separated by 360 degrees or, to a mathematician, 2π.
And more. What’s the difference between going out five units of distance in the direction of angle 0 and going out minus-five units of distance in the direction of angle -π? That is, between walking forward five paces while facing east and walking backward five paces while facing west? Yeah. So if we let ‘r’ be negative we’ve got twice as many infinitely many sets of coordinates for each point.
This complicates raising numbers to powers. θ times n might match with some point that’s very different from θ-plus-2-π times n. There might be a whole ring of powers. This seems … hard to work with, at least. But it’s, at heart, the same problem you get thinking about the square root of 4 and concluding it’s both plus 2 and minus 2. If you want “the” square root, you’d like it to be a single number. At least if you want to calculate anything from it. You have to pick out a preferred θ from the family of possible candidates.
For me, that’s whatever set of coordinates has ‘r’ that’s positive (or zero), and that has ‘θ’ between -π and π. Or between 0 and 2π. It could be any strip of numbers that’s 2π wide. Pick what makes sense for the problem you’re doing. It’s going to be the strip from -π to π. Perhaps the strip from 0 to 2π.
What this all amounts to is that I can turn the function from before into its polar-exponential form without changing its meaning any. Raising a number to the one-over-x power looks different from raising it to the n power. But the work isn’t different. The function I wrote out up there is the same as this function:
I can’t look at that number, √2 raised to the 1/x power, sitting there, multiplied by two things added together, and leave that. (OK, subtracted, but same thing.) I want to something something distributive law something and that gets us here:
Also, yeah, that square root of two raised to a power looks weird. I can turn that square root of two into “two to the one-half power”. That gets to this rewrite:
And then. Those parentheses. e raised to an imaginary number minus e raised to minus-one-times that same imaginary number. This is another one of those magic tricks that mathematicians know because they see it all the time. Part of what we know from Euler’s Formula, the one I waved at back when I was talking about coordinates, is this: e^(iφ) – e^(-iφ) is the same as 2i sin(φ).
That’s good for any real-valued φ. For example, it’s good for the number π divided by 4x. And that means we can rewrite that function into something that, finally, actually looks a little bit simpler. It looks like this:
And that’s the function whose limit I want to take at ∞. No, really.
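Before taking that limit on paper, you can peek at it numerically. One caveat: the f(x) below is my guess at the simplified form — a -2 out front, x, 2^(1/(2x)), and sin(π/(4x)) — an assumption, not a quote. The sampling also checks the 2i sin(φ) identity along the way.

```python
import cmath
import math

# The Euler's-Formula identity the rewriting leaned on:
# e^(i phi) - e^(-i phi) equals 2 i sin(phi).
phi = math.pi / 4
identity_gap = abs(
    (cmath.exp(1j * phi) - cmath.exp(-1j * phi)) - 2j * math.sin(phi)
)

def f(x):
    # My guess at the simplified function: -2 times x times 2^(1/(2x))
    # times sin(pi/(4x)). Assumed form, not quoted from the essay.
    return -2 * x * 2**(1 / (2 * x)) * math.sin(math.pi / (4 * x))

# Sample at ever-larger x and watch where the values settle.
samples = [f(10.0**k) for k in range(2, 7)]
```

If my guess at the function is right, the samples crowd in on -½π; what to make of that number is the rest of this series.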
I ran out of time to do my next bit on Wronski’s attempted definition of π. Next week, all goes well. But I have something to share anyway. The author of the Boxing Pythagoras blog was intrigued by the starting point. And as a fan of studying how people understand infinity and infinitesimals (and how they don’t), this two-century-old example of mixing the numerous and the tiny set his course.
For example, can we speak of a number that’s larger than zero, but smaller than the reciprocal of any positive integer? It’s hard to imagine such a thing. But what if we can show that if we suppose such a number exists, then we can do this logically sound work with it? If you want to say that isn’t enough to show a number exists, then I have to ask how you know imaginary numbers or negative numbers exist.
Standard analysis, you probably guessed, doesn’t do that. It developed over the 19th century when the logical problems of these kinds of numbers seemed unsolvable. Mostly that’s done by limits, showing that a thing must be true whenever some quantity is small enough, or large enough. It seems safe to trust that the infinitesimally small is small enough, and the infinitely large is large enough. And it’s not like mathematicians back then were bad at their job. Mathematicians learned a lot of things about how infinitesimals and infinities work over the late 19th and early 20th century. It makes modern work possible.
Anyway, Boxing Pythagoras goes over what a non-standard analysis treatment of the formula suggests. I think it’s accessible even if you haven’t had much non-standard analysis in your background. At least it worked for me and I haven’t had much of the stuff. I think it’s also accessible if you’re good at following logical argument and won’t be thrown by Greek letters as variables. Most of the hard work is really arithmetic with funny letters. I recommend going and seeing if he did get to π.
A couple weeks ago I shared a fascinating formula for π. I got it from Carl B Boyer’s The History of Calculus and its Conceptual Development. He got it from Józef Maria Hoëne-Wronski, early 19th-century Polish mathematician. His idea was that an absolute, culturally-independent definition of π would come not from thinking about circles and diameters but rather this formula:
Now, this formula is beautiful, at least to my eyes. It’s also gibberish. At least it’s ungrammatical. Mathematicians don’t like to write stuff like “four times infinity”, at least not as more than a rough draft on the way to a real thought. What does it mean to multiply four by infinity? Is arithmetic even a thing that can be done on infinitely large quantities? Among Wronski’s problems is that they didn’t have a clear answer to this. We’re a little more advanced in our mathematics now. We’ve had a century and a half of rather sound treatment of infinitely large and infinitely small things. Can we save Wronski’s work?
Start with the easiest thing. I’m offended by those √(-1) bits. Well, no, I’m more unsettled by them. I would rather have ‘i’ in there. The difference? … More taste than anything sound. I prefer, if I can get away with it, using the square root symbol to mean the positive square root of the thing inside. There is no positive square root of -1, so, pfaugh, away with it. Mere style? All right, well, how do you know whether those √(-1) terms are meant to be i or its additive inverse, -i? How do you know they’re all meant to be the same one? See? … As with all style preferences, it’s impossible to be perfectly consistent. I’m sure there are times I accept a big square root symbol over a negative or a complex-valued quantity. But I’m not forced to have it here so I’d rather not. First step:
Also dividing by i is the same as multiplying by -i, so the second easy step gives me:
Now the hard part. All those infinities. I don’t like multiplying by infinity. I don’t like dividing by infinity. I really, really don’t like raising a quantity to the one-over-infinity power. Most mathematicians don’t. We have a tool for dealing with this sort of thing. It’s called a “limit”.
Mathematicians developed the idea of limits over … well, since they started doing mathematics. In the 19th century limits got sound enough that we still trust the idea. Here’s the rough way it works. Suppose we have a function which I’m going to name ‘f’ because I have better things to do than give functions good names. Its domain is the real numbers. Its range is the real numbers. (We can define functions for other domains and ranges, too. Those definitions look like what they do here.)
I’m going to use ‘x’ for the independent variable. It’s any number in the domain. I’m going to use ‘a’ for some point. We want to know the limit of the function “at a”. ‘a’ might be in the domain. But — and this is genius — it doesn’t have to be. We can talk sensibly about the limit of a function at some point where the function doesn’t exist. We can say “the limit of f at a is the number L”. I hadn’t introduced ‘L’ into evidence before, but … it’s a number. It has some specific set value. Can’t say which one without knowing what ‘f’ is and what its domain is and what ‘a’ is. But I know this about it.
Pick any error margin that you like. Call it ε because mathematicians do. However small this (positive) number is, there’s at least one neighborhood in the domain of ‘f’ that surrounds ‘a’. Check every point in that neighborhood other than ‘a’. The value of ‘f’ at all those points in that neighborhood other than ‘a’ will be larger than L – ε and smaller than L + ε.
Yeah, pause a bit there. It’s a tricky definition. It’s a nice common place to crash hard in freshman calculus. Also again in Intro to Real Analysis. It’s not just you. Perhaps it’ll help to think of it as a kind of mutual challenge game. Try this.
You draw whatever error bar, as big or as little as you like, around ‘L’.
But I always respond by drawing some strip around ‘a’.
You then pick absolutely any ‘x’ inside my strip, other than ‘a’.
Is f(x) always within the error bar you drew?
Suppose f(x) is. Suppose that you can pick any error bar however tiny, and I can answer with a strip however tiny, and every single ‘x’ inside my strip has an f(x) within your error bar … then, L is the limit of f at a.
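The game can be played out concretely. A sketch — my example, not anything from a textbook problem set — using f(x) = x², the point a = 2, and the claimed limit L = 4. My response to your error bar ε is the strip of width min(1, ε/5) around 2:

```python
def f(x):
    return x * x

a, L = 2.0, 4.0

def strip_for(epsilon):
    # My response in the game. Inside |x - 2| < min(1, epsilon/5) we have
    # |x^2 - 4| = |x - 2| * |x + 2| < (epsilon/5) * 5 = epsilon.
    return min(1.0, epsilon / 5.0)

results = []
for epsilon in (1.0, 0.1, 1e-4):        # your error bars, big and small
    delta = strip_for(epsilon)          # my strip in response
    for t in (-0.999, -0.5, 0.25, 0.999):
        x = a + t * delta               # any x inside the strip, not a itself
        results.append(abs(f(x) - L) < epsilon)
```

Every sampled x lands inside the error bar, which is what it looks like when L really is the limit.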
Again, yes, tricky. But mathematicians haven’t found a better definition that doesn’t break something mathematicians need.
To write “the limit of f at a is L” we use the notation:
The ‘lim’ part probably makes perfect sense. And you can see where ‘f’ and ‘a’ have to enter into it. ‘x’ here is a “dummy variable”. It’s the falsework of the mathematical expression. We need some name for the independent variable. It’s clumsy to do without. But it doesn’t matter what the name is. It’ll never appear in the answer. If it does then the work went wrong somewhere.
What I want to do, then, is turn all those appearances of ‘∞’ in Wronski’s expression into limits of something at infinity. And having just said what a limit is I have to do a patch job. In that talk about the limit at ‘a’ I talked about a neighborhood containing ‘a’. What’s it mean to have a neighborhood “containing ∞”?
The answer is exactly what you’d think if you got this question and were eight years old. The “neighborhood of infinity” is “all the big enough numbers”. To make it rigorous, it’s “all the numbers bigger than some finite number that let’s just call N”. So you give me an error bar around ‘L’. I’ll give you back some number ‘N’. Every ‘x’ that’s bigger than ‘N’ has f(x) inside your error bars. And note that I don’t have to say what ‘f(∞)’ is or even commit to the idea that such a thing can be meaningful. I only ever have to think directly about values of ‘f(x)’ where ‘x’ is some real number.
So! First, let me rewrite Wronski’s formula as a function, defined on the real numbers. Then I can replace each ∞ with the limit of something at infinity and … oh, wait a minute. There’s three ∞ symbols there. Do I need three limits?
Ugh. Yeah. Probably. This can be all right. We can do multiple limits. This can be well-defined. It can also be a right pain. The challenge-and-response game needs a little modifying to work. You still draw error bars. But I have to draw multiple strips. One for each of the variables. And every combination of values inside all those strips has to give an ‘f’ that’s inside your error bars. There’s room for great mischief. You can arrange combinations of variables that look likely to force ‘f’ outside the error bars.
So. Three independent variables, all taking a limit at ∞? That’s not guaranteed to be trouble, but I’d expect trouble. At least I’d expect something to keep the limit from existing. That is, we could find there’s no number ‘L’ so that this drawing-neighborhoods thing works for all three variables at once.
Let’s try. One of the ∞ will be a limit of a variable named ‘x’. One of them a variable named ‘y’. One of them a variable named ‘z’. Then:
Without doing the work, my hunch is: this is utter madness. I expect it’s probably possible to make this function take on many wildly different values by the judicious choice of ‘x’, ‘y’, and ‘z’. Particularly ‘y’ and ‘z’. You maybe see it already. If you don’t, you maybe see it now that I’ve said you maybe see it. If you don’t, I’ll get there, but not in this essay. But let’s suppose that it’s possible to make f(x, y, z) take on wildly different values like I’m getting at. This implies that there’s not any limit ‘L’, and therefore Wronski’s work is just wrong.
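Here’s the flavor of the mischief, on a toy function rather than Wronski’s. Take f(x, y) = x/y and send both variables to ∞: the value you approach depends entirely on how the two grow together, so no single ‘L’ can win the challenge game.

```python
# Not Wronski's function, just a small f(x, y) showing how a joint limit
# at infinity can fail to exist: the value depends on the path taken.
def f(x, y):
    return x / y

ts = (1e3, 1e6, 1e9)
along_equal  = [f(t, t)     for t in ts]  # x and y grow together: heads to 1
along_square = [f(t, t * t) for t in ts]  # y outruns x: heads to 0
along_double = [f(2 * t, t) for t in ts]  # x outruns y: heads to 2
```

Three paths out to infinity, three different answers; that is exactly the drawing-neighborhoods thing failing.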
Thing is, Wronski wouldn’t have thought that. Deep down, I am certain, he thought the three appearances of ∞ were the same “value”. And that to translate him fairly we’d use the same name for all three appearances. So I am going to do that. I shall use ‘x’ as my variable name, and replace all three appearances of ∞ with the same variable and a common limit. So this gives me the single function:
And then I need to take the limit of this at ∞. If Wronski is right, and if I’ve translated him fairly, it’s going to be π. Or something easy to get π from.
I’ve been reading Carl B Boyer’s The History of Calculus and its Conceptual Development. It’s been slow going, because reading about how calculus’s ideas developed is hard. The ideas underlying it are subtle to start with. And the ideas have to be discussed using vague, unclear definitions. That’s not because dumb people were making arguments. It’s because these were smart people studying ideas at the limits of what we understood. When we got clear definitions we had the fundamentals of calculus understood. (By our modern standards. The future will likely see us as accepting strange ambiguities.) And I still think Boyer whiffs the discussion of Zeno’s Paradoxes in a way that mathematics and science-types usually do. (The trouble isn’t imagining that infinite series can converge. The trouble is that things are either infinitely divisible or they’re not. Either way implies things that seem false.)
Anyway. Boyer got to a part about the early 19th century. This was when mathematicians were discovering infinities and infinitesimals are amazing tools. Also that mathematicians should maybe learn whether they follow any rules. Because you can just plug symbols into formulas, grind out what it looks like they might mean, and get answers. Sometimes this works great. Grind through the formulas for solving cubic polynomials as though square roots of negative numbers make sense. You get good results. Later, we worked out a coherent scheme of “complex-valued numbers” that justified it all. We can get lucky with infinities and infinitesimals, sometimes.
And this brought Boyer to an argument made by Józef Maria Hoëne-Wronski. He was a Polish mathematician whose fantastic ambition in … everything … didn’t turn out many useful results. Algebra, the Longitude Problem, building a rival to the railroad, even the Kosciuszko Uprising, none quite panned out. (And that’s not quite his name. The ‘n’ in ‘Wronski’ should have an acute mark over it. But WordPress’s HTML engine doesn’t want to imagine such a thing exists. Nor do many typesetters writing calculus or differential equations books, Boyer’s included.)
But anyone who studies differential equations knows his name, for a concept called the Wronskian. It’s a matrix determinant that anyone who studies differential equations hopes they won’t ever have to do after learning it. And, says Boyer, Wronski had this notion for an “absolute meaning of the number π”. (By “absolute” Wronski means one not drawn from cultural factors like the weird human interest in circle perimeters and diameters. Compare it to the way we speak of “absolute temperature”, where the zero means something not particular to western European weather.)
I will admit I’m not fond of “real” alternate definitions of π. They seem to me mostly to signal how clever the definition-originator is. The only one I like at all defines π as the smallest positive root of the simple-harmonic-motion differential equation. (With the right starting conditions and all that.) And I’m not sure that isn’t “circumference over diameter” in a hidden form.
And yes, that definition is a mess of early-19th-century wild, untamed casualness in the use of symbols. But I admire the crazypants beauty of it. If I ever get a couple free hours I should rework it into something grammatical. And then see if, turned into something tolerable, Wronski’s idea is something even true.
Boyer allows that “perhaps” because of the strange notation and “bizarre use of the symbol ∞” Wronski didn’t make much headway on this point. I can’t fault people for looking at that and refusing to go further. But isn’t it enchanting as it is?
The rest of last week had more mathematically-themed comic strips than Sunday alone did. As sometimes happens, I noticed an objectively unimportant detail in one of the comics and got to thinking about it. Whether I could solve the equation as posted, or whether at least part of it made sense as a mathematics problem. Well, you’ll see.
Patrick McDonnell’s Mutts for the 25th of September I include because it’s cute and I like when I can feature some comic in these roundups. Maybe there’s some discussion that could be had about what “equals” means in ordinary English versus what it means in mathematics. But I admit that’s a stretch.
Olivia Walch’s Imogen Quest for the 25th uses, and describes, the mathematics of a famous probability problem. This is the surprising result of how few people you need to have a 50 percent chance that some pair of people have a birthday in common. It then goes over to some other probability problems. The examples are silly. But the reasoning is sound. And the approach is useful. To find the chance something happens it’s often easiest to work out the chance it doesn’t. Which is as good as knowing the chance it does, since a thing can either happen or not happen. At least in probability problems, which define “thing” and “happen” so there’s not ambiguity about whether it happened or not.
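The birthday problem is a tidy example of that chance-it-doesn’t approach, and small enough to compute outright. A sketch (assuming 365 equally likely birthdays, as the classic puzzle does):

```python
def chance_all_distinct(n, days=365):
    # The chance that NO pair among n people shares a birthday; the chance
    # some pair does share one is 1 minus this.
    p = 1.0
    for i in range(n):
        p *= (days - i) / days
    return p

# Find the smallest group where a shared birthday is more likely than not.
n = 1
while 1 - chance_all_distinct(n) < 0.5:
    n += 1
```

The loop stops at the famous answer: 23 people already give better-than-even odds of a shared birthday.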
Piers Baker’s Ollie and Quentin rerun for the 26th I’m pretty sure I’ve written about before, although back before I included pictures of the Comics Kingdom strips. (The strip moved from Comics Kingdom over to GoComics, which I haven’t caught removing old comics from their pages.) Anyway, it plays on a core piece of probability. It sets out the world as things, “events”, that can have one of multiple outcomes, and which must have one of those outcomes. Coin tossing is taken to mean, by default, an event that has exactly two possible outcomes, each equally likely. And that is near enough true for real-world coin tossing. But there is a little gap between “near enough” and “true”.
Rick Stromoski’s Soup To Nutz for the 27th is your standard sort of Dumb Royboy joke, in this case about him not knowing what percentages are. You could do the same joke about fractions, including with the same breakdown of what part of the mathematics geek population ruins it for the remainder.
Nate Fakes’s Break of Day for the 28th is not quite the anthropomorphic-numerals joke for the week. Anthropomorphic mathematics problems, anyway. The intriguing thing is that the difficult calculus problem looks almost legitimate to me. On the right-hand side of the first two lines, for example, the calculation goes from
This is a little sloppy. The first line ought to end in a ‘dt’, and the second ought to have a constant of integration. If you don’t know what these calculus things are let me explain: they’re calculus things. You need to include them to express the work correctly. But if you’re just doing a quick check of something, the mathematical equivalent of a very rough preliminary sketch, it’s common enough to leave that out.
It doesn’t quite parse or mean anything precisely as it is. But it looks like the sort of thing that some context would make meaningful. That there’s repeated appearances of , or , particularly makes me wonder if Fakes used a problem he (or a friend) was doing for some reason.
I’ve been reading Elke Stangl’s Elkemental Force blog for years now. Sometimes I even feel social-media-caught-up enough to comment, or at least to like posts. This is relevant today as I discuss one of Stangl’s suggestions for my letter-V topic.
So sometime in pre-algebra, or early in (high school) algebra, you start drawing equations. It’s a simple trick. Lay down a coordinate system, some set of axes for ‘x’ and ‘y’ and maybe ‘z’ or whatever letters are important. Look to the equation, made up of x’s and y’s and maybe z’s and so on. Highlight all the points with coordinates whose values make the equation true. This is the logical basis for saying (e.g.) that the straight line “is” .
A short while later, you learn about polar coordinates. Instead of using ‘x’ and ‘y’, you have ‘r’ and ‘θ’. ‘r’ is the distance from the center of the universe. ‘θ’ is the angle made with respect to some reference axis. It’s as legitimate a way of describing points in space. Some classrooms even have a part of the blackboard (whiteboard, whatever) with a polar-coordinates “grid” on it. This looks like the lines of a dartboard. And you learn that some shapes are easy to describe in polar coordinates. A circle, centered on the origin, is ‘r = 2’ or something like that. A line through the origin is ‘θ = 1’ or whatever. The line that we’d called before? … That’s … some mess. And now … that’s not even a line. That’s some kind of spiral. Two spirals, really. Kind of wild.
And something to bother you a while. is an equation that looks the same as . You’ve changed the names of the variables, but not how they relate to each other. But one is a straight line and the other a spiral thing. How can that be?
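I can’t reproduce the post’s equations here, so take a hypothetical stand-in: the Cartesian line y = 2x + 1 and its renamed polar twin θ = 2r + 1, identical except for the letters. A few lines of Python show the renamed version doesn’t trace a line at all:

```python
import math

# Hypothetical stand-in equations: y = 2x + 1 (Cartesian) and
# theta = 2r + 1 (polar), identical except for the variable names.
cartesian = [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0)]

# Convert the polar points back to Cartesian to see where they land.
polar = [(r * math.cos(2 * r + 1), r * math.sin(2 * r + 1))
         for r in (0.0, 1.0, 2.0)]

# The Cartesian points all share the slope 2 -- a straight line.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(cartesian, cartesian[1:])]
print(slopes)   # [2.0, 2.0]
print(polar)    # scattered along a spiral, nothing like a line
```

The Cartesian points march along one straight line; the polar points, read back into ordinary coordinates, curl away from any line you could draw through them.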
The answer, ultimately, is that the letters in the equations aren’t these content-neutral labels. They carry meaning. ‘x’ and ‘y’ imply looking at space a particular way. ‘r’ and ‘θ’ imply looking at space a different way. A shape has different representations in different coordinate systems. Fair enough. That seems to settle the question.
But if you get to calculus the question comes back. You can integrate over a region of space that’s defined by Cartesian coordinates, x’s and y’s. Or you can integrate over a region that’s defined by polar coordinates, r’s and θ’s. The first time you try this, you find … well, that any region easy to describe in Cartesian coordinates is painful in polar coordinates. And vice-versa. Way too hard. But if you struggle through all that symbol manipulation, you get … different answers. Eventually the calculus teacher has mercy and explains. If you’re integrating in Cartesian coordinates you need to use “dx dy”. If you’re integrating in polar coordinates you need to use “r dr dθ”. If you’ve never taken calculus, never mind what this means. What is important is that “r dr dθ” looks like three things multiplied together, while “dx dy” is two.
We get this explained as a “change of variables”. If we want to go from one set of coordinates to a different one, we have to do something fiddly. The extra ‘r’ in “r dr dθ” is what we get going from Cartesian to polar coordinates. And we get formulas to describe what we should do if we need other kinds of coordinates. It’s some work that introduces us to the Jacobian, which looks like the most tedious possible calculation ever at that time. (In Intro to Differential Equations we learn we were wrong, and the Wronskian is the most tedious possible calculation ever. This is also wrong, but it might as well be true.) We typically move on after this and count ourselves lucky it got no worse than that.
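That extra ‘r’ can be checked numerically. Here’s a rough midpoint-rule sketch, assuming a unit disk: integrating 1 over the disk in Cartesian coordinates, and then in polar coordinates with the ‘r’ included, should both land near π:

```python
import math

# Midpoint-rule check on a unit disk: the area should come out near pi
# in both coordinate systems, provided polar gets its extra 'r'.
N = 400

# Cartesian: add up dx dy over grid cells whose centers land inside the disk.
h = 2.0 / N
cart = sum(h * h
           for i in range(N) for j in range(N)
           if (-1 + (i + 0.5) * h) ** 2 + (-1 + (j + 0.5) * h) ** 2 <= 1)

# Polar: add up r dr dtheta. Forget the 'r' and you'd get 2*pi instead.
dr, dth = 1.0 / N, 2 * math.pi / N
polar = sum(((i + 0.5) * dr) * dr * dth for i in range(N) for j in range(N))

print(cart, polar, math.pi)  # both near 3.14159...
```

Dropping the ‘r’ from the polar sum is exactly the mistake the calculus teacher warns about, and it gives 2π, not π.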
None of this is wrong, even from the perspective of more advanced mathematics. It’s not even misleading, which is a refreshing change. But we can look a little deeper, and get something good from doing so.
The deeper perspective looks at “differential forms”. These are about how to encode information about how your coordinate system represents space. They’re tensors. I don’t blame you for wondering if they would be. A differential form uses interactions between some of the directions in a space. A volume form is a differential form that uses all the directions in a space. And satisfies some other rules too. I’m skipping those because some of the symbols involved I don’t even know how to look up, much less make WordPress present.
What’s important is the volume form carries information compactly. As symbols it tells us that this represents a chunk of space that’s constant no matter what the coordinates look like. This makes it possible to do analysis on how functions work. It also tells us what we would need to do to calculate specific kinds of problems. This makes it possible to describe, for example, how something moving in space would change.
The volume form, and the tools to do anything useful with it, demand a lot of supporting work. You can dodge having to explicitly work with tensors. But you’ll need a lot of tensor-related materials, like wedge products and exterior derivatives and stuff like that. If you’ve never taken freshman calculus don’t worry: the people who have taken freshman calculus never heard of those things either. So what makes this worthwhile?
Yes, person who called out “polynomials”. Good instinct. Polynomials are usually a reason for any mathematics thing. This is one of maybe four exceptions. I have to appeal to my other standard answer: “group theory”. These volume forms match up naturally with groups. There’s not only information about how coordinates describe a space to consider. There’s ways to set up coordinates that tell us things.
That isn’t all. These volume forms can give us new invariants. Invariants are what mathematicians say instead of “conservation laws”. They’re properties whose value for a given problem is constant. This can make it easier to work out how one variable depends on another, or to work out specific values of variables.
For example, classical physics problems like how a bunch of planets orbit a sun often have a “symplectic manifold” that matches the problem. This is a description of how the positions and momentums of all the things in the problem relate. The symplectic manifold has a volume form. That volume is going to be constant as time progresses. That is, there’s this way of representing the positions and speeds of all the planets that does not change, no matter what. It’s much like the conservation of energy or the conservation of angular momentum. And this has practical value. It’s the subject that brought my and Elke Stangl’s blogs into contact, years ago. It also has broader applicability.
There’s no way to provide an exact answer for the movement of, like, the sun and nine-ish planets and a couple major moons and all that. So there’s no known way to answer the question of whether the Earth’s orbit is stable. All the planets are always tugging one another, changing their orbits a little. Could this converge in a weird way suddenly, on geologic timescales? Might the planet go flying off out of the solar system? It doesn’t seem like the solar system could be all that unstable, or it would have come apart already. But we can’t rule out that some freaky alignment of Jupiter, Saturn, and Halley’s Comet might tweak the Earth’s orbit just far enough for catastrophe to unfold. Granted there’s nothing we could do about the Earth flying out of the solar system, but it would be nice to know whether we face it, we tell ourselves.
But we can answer this numerically. We can set a computer to simulate the movement of the solar system. But there will always be numerical errors. For example, we can’t use the exact value of π in a numerical computation. 3.141592 (and more digits) might be good enough for projecting stuff out a day, a week, a thousand years. But if we’re looking at millions of years? The difference can add up. We can imagine compensating for not having the value of π exactly right. But what about compensating for something we don’t know precisely, like, where Jupiter will be in 16 million years and two months?
Symplectic forms can help us. The volume form represented by this space has to be conserved. So we can rewrite our simulation so that these forms are conserved, by design. This does not mean we avoid making errors. But it means we avoid making certain kinds of errors. We’re more likely to make what we call “phase” errors. We predict Jupiter’s location in 16 million years and two months. Our simulation puts it thirty degrees farther in its circular orbit than it actually would be. This is a less serious mistake to make than putting Jupiter, say, eight-tenths as far from the Sun as it would really be.
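To make that concrete, here’s a toy sketch of my own, assuming a unit-frequency harmonic oscillator rather than a solar system. Ordinary Euler integration lets the energy grow without bound; the symplectic variant keeps it bounded, pushing its errors into the phase instead:

```python
# Toy comparison on a unit-frequency harmonic oscillator (q'' = -q).
# Ordinary Euler pumps energy in every step; symplectic Euler does not.

def euler(q, p, dt, steps):
    for _ in range(steps):
        q, p = q + dt * p, p - dt * q
    return q, p

def symplectic_euler(q, p, dt, steps):
    for _ in range(steps):
        p = p - dt * q   # kick with the old position...
        q = q + dt * p   # ...then drift with the new momentum
    return q, p

def energy(q, p):
    return 0.5 * (q * q + p * p)

dt, steps = 0.01, 100_000  # a thousand time units
print(energy(*euler(1.0, 0.0, dt, steps)))             # grows enormously
print(energy(*symplectic_euler(1.0, 0.0, dt, steps)))  # stays near 0.5
```

The symplectic run still gets the oscillator’s position wrong over time, but as a phase error, like Jupiter being thirty degrees off in its orbit, rather than an amplitude error, like Jupiter drifting to the wrong distance from the Sun.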
Volume forms seem, at first, a lot of mechanism for a small problem. And, unfortunately for students, they are. They’re more trouble than they’re worth for changing Cartesian to polar coordinates, or similar problems. You know, ones that the student already has some feel for. They pay off on more abstract problems. Tracking the movement of a dozen interacting things, say, or describing a space that’s very strangely shaped. Those make the effort to learn about forms worthwhile.
It was again a week just busy enough that I’m comfortable splitting the Reading the Comics thread into two pieces. It’s also a week that made me think about cake. So, I’m happy with the way last week shaped up, as far as comic strips go. Other stuff could have used a lot of work. Let’s read.
Stephen Bentley’s Herb and Jamaal rerun for the 13th depicts “teaching the kids math” by having them divide up a cake fairly. I accept this as a viable way to make kids interested in the problem. Cake-slicing problems are a corner of game theory as it addresses questions we always find interesting. How can a resource be fairly divided? How can it be divided if there is not a trusted authority? How can it be divided if the parties do not trust one another? Why do we not have more cake? The kids seem to be trying to divide the cake by volume, which could be fair. If the cake slice is a small enough wedge they can likely get near enough a perfect split by ordinary measures. If it’s a bigger wedge they’d need calculus to get the answer perfect. It’ll be well-approximated by solids of revolution. But they likely don’t need perfection.
This is assuming the value of the icing side is not held in greater esteem than the bare-cake sides. This is not how I would value the parts of the cake. They’ll need to work something out about that, too.
Mac King and Bill King’s Magic in a Minute for the 13th features a bit of numerical wizardry. That the dates in a three-by-three block in a calendar will add up to nine times the centered date. Why this works is good for a bit of practice in simplifying algebraic expressions. The stunt will be more impressive if you can multiply by nine in your head. I’d do that by taking ten times the given date and then subtracting the original date. I won’t say I’m fond of the idea of subtracting 23 from 230, or 17 from 170. But a skilled performer could do something interesting while trying to do this subtraction. (And if you practice the trick you can get the hang of the … fifteen? … different possible answers.)
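If you’d rather check the algebra by brute force, a few lines of Python confirm the stunt. Rows of a calendar are seven days apart and columns one day apart, so the nine offsets cancel and the block collapses to nine times the center:

```python
# In a 3-by-3 calendar block the dates are center + 7*week + day for
# week, day in {-1, 0, 1}. The offsets sum to zero, leaving 9 * center.
def block_sum(center):
    return sum(center + week * 7 + day
               for week in (-1, 0, 1) for day in (-1, 0, 1))

for center in range(9, 24):  # centers that fit inside a typical month
    assert block_sum(center) == 9 * center
print(block_sum(17))  # 153, which is 9 * 17
```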
Bill Amend’s FoxTrot rerun for the 14th mentions mathematics. Young nerd Jason’s trying to get back into hand-raising form. Arithmetic has considerable advantages as a thing to practice answering teachers. The questions have clear, definitely right answers, that can be worked out or memorized ahead of time, and can be asked in under half a panel’s word balloon space. I deduce the strip first ran the 21st of August, 2006, although that image seems to be broken.
Ed Allison’s Unstrange Phenomena for the 14th suggests changes in the definition of the mile and the gallon to effortlessly improve the fuel economy of cars. As befits Allison’s Dadaist inclinations the numbers don’t work out. As it is, if you defined a New Mile of 7,290 feet (and didn’t change what a foot was) and a New Gallon of 192 fluid ounces (and didn’t change what an old fluid ounce was) then a 20 old-miles-per-old-gallon car would come out to about 21.7 new-miles-per-new-gallon. Commenter Del_Grande points out that if the New Mile were 3,960 feet then the calculation would work out. This inspires in me curiosity. Did Allison figure out the numbers that would work and then make a mistake in the final art? Or did he pick funny-looking numbers and not worry about whether they made sense? No way to tell from here, I suppose. (Allison doesn’t mention ways to get in touch on the comic’s About page and I’ve only got the weakest links into the professional cartoon community.)
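The arithmetic is quick to check. This sketch (the function is my own) converts a 20 old-miles-per-old-gallon car into new units, for both the strip’s New Mile and Del_Grande’s corrected one:

```python
# An old mile is 5,280 feet; an old gallon is 128 fluid ounces.
OLD_MILE_FT, OLD_GALLON_OZ = 5280, 128

def new_mpg(old_mpg, new_mile_ft, new_gallon_oz):
    # Feet traveled on one new gallon, then re-expressed in new miles.
    feet_per_new_gallon = old_mpg * (new_gallon_oz / OLD_GALLON_OZ) * OLD_MILE_FT
    return feet_per_new_gallon / new_mile_ft

print(new_mpg(20, 7290, 192))  # about 21.7 -- barely an improvement
print(new_mpg(20, 3960, 192))  # 40.0 -- Del_Grande's number doubles it
```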
Patrick Roberts’s Todd the Dinosaur for the 15th mentions long division as the stuff of nightmares. So it is. I guess MathWorld and Wikipedia endorse calling 128 divided by 4 long division, although I’m not sure I’m comfortable with that. This may be idiosyncratic; I’d thought of long division as where the divisor is two or more digits. A three-digit number divided by a one-digit one doesn’t seem long to me. I’d just think that was division. I’m curious what readers’ experiences have been.
Stand on the edge of a plot of land. Walk along its boundary. As you walk the edge pay attention. Note how far you walk before changing direction, even in the slightest. When you return to where you started consult your notes. Contained within them is the area you circumnavigated.
If that doesn’t startle you perhaps you haven’t thought about how odd that is. You don’t ever touch the interior of the region. You never do anything like see how many standard-size tiles would fit inside. You walk a path that is as close to one-dimensional as your feet allow. And encoded in there somewhere is an area. Stare at that incongruity and you realize why integrals baffle the student so. They have a deep strangeness embedded in them.
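The walk-the-boundary trick has a classical form, the surveyor’s (or “shoelace”) formula. Here’s a minimal sketch: record the corners you turn at, and the formula hands back the enclosed area without ever touching the interior:

```python
# Shoelace formula: the area of a simple polygon from its corner
# coordinates alone, taken in order around the boundary.
def shoelace_area(corners):
    area = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]  # wrap back to the start
        area += x1 * y2 - x2 * y1
    return abs(area) / 2

# A 3-by-4 rectangle, walked counterclockwise:
print(shoelace_area([(0, 0), (3, 0), (3, 4), (0, 4)]))  # 12.0
```

Every term uses only boundary points, which is the incongruity above in miniature: the interior never appears, yet its area falls out.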
We who do mathematics have always liked integrals. They grow, in the western tradition, out of geometry. Given a shape, what is a square that has the same area? There are shapes it’s easy to find the area for, given only straightedge and compass: a rectangle? Easy. A triangle? Just as straightforward. A polygon? If you know triangles then you know polygons. A lune, the crescent-moon shape formed by taking a circular cut out of a circle? We can do that. (If the cut is the right size.) A circle? … All right, we can’t do that, but we spent two thousand years trying before we found that out for sure. And we can do some excellent approximations.
That bit of finding-a-square-with-the-same-area was called “quadrature”. The name survives, mostly in the phrase “numerical quadrature”. We use that to mean that we computed an integral’s approximate value, instead of finding a formula that would get it exactly. The otherwise obvious choice of “numerical integration” we use already. It describes computing the solution of a differential equation. We’re not trying to be difficult about this. Solving a differential equation is a kind of integration, and we need to do that a lot. We could recast a solving-a-differential-equation problem as a find-the-area problem, and vice-versa. But that’s bother, if we don’t need to, and so we talk about numerical quadrature and numerical integration.
Integrals are built on two infinities. This is part of why it took so long to work out their logic. One is the infinity of number; we find an integral’s value, in principle, by adding together infinitely many things. The other is an infinity of smallness. The things we add together are infinitesimally small. That we need to take things, each smaller than any number yet somehow not zero, and in such quantity that they add up to something, seems paradoxical. Their geometric origins had to be merged into those of arithmetic and of algebra, and it is not easy. Bishop George Berkeley made a steady name for himself in calculus textbooks by pointing this out. We have worked out several logically consistent schemes for evaluating integrals. They work, mostly, by showing that we can make the error caused by approximating the integral smaller than any margin we like. This is a standard trick, or at least it is, now that we know it.
That “in principle” above is important. We don’t actually work out an integral by finding the sum of infinitely many, infinitely tiny, things. It’s too hard. I remember in grad school the analysis professor working out by the proper definitions the integral of 1. This is as easy an integral as you can do without just integrating zero. He escaped with his life, but it was a close scrape. He offered the integral of x as a way to test our endurance, without actually doing it. I’ve never made it through that.
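The “in principle” definition is easy to watch converge on a computer, though. A left-endpoint Riemann sum for the integral of x from 0 to 1 creeps up on the exact value of 1/2 as the rectangles shrink:

```python
# Left-endpoint Riemann sum: chop [a, b] into n strips and add up the
# rectangle areas. For f(x) = x on [0, 1] the exact answer is 1/2.
def riemann_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

for n in (10, 100, 1000):
    print(n, riemann_sum(lambda x: x, 0, 1, n))
# 10   -> 0.45
# 100  -> 0.495
# 1000 -> 0.4995
```

The error shrinks like 1/(2n), which is the “smaller than any margin we like” promise made concrete.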
The greatest tool we have on our side is the Fundamental Theorem of Calculus. Even the name promises it’s the greatest tool we might have. This rule tells us how to connect integrating a function to differentiating another function. If we can find a function whose derivative is the thing we want to integrate, then we have a formula for the integral. It’s that function we found. What a fantastic result.
The trouble is it’s so hard to find functions whose derivatives are the thing we wanted to integrate. There are a lot of functions we can find, mind you. If we want to integrate a polynomial it’s easy. Sine and cosine and even tangent? Yeah. Logarithms? A little tedious but all right. A constant number raised to the power x? Also tedious but doable. A constant number raised to the power x²? Hold on there, that’s madness. No, we can’t do that.
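“Can’t do that” means we can’t write the antiderivative with elementary functions. Numerical quadrature doesn’t care. A sketch with Simpson’s rule, taking e to the power x² over the interval from 0 to 1:

```python
import math

# Simpson's rule: sample points weighted 1, 4, 2, 4, ..., 2, 4, 1.
def simpson(f, a, b, n):
    # n must be even
    h = (b - a) / n
    total = f(a) + f(b)
    total += sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3

# No elementary antiderivative exists, but the number comes out fine.
print(simpson(lambda x: math.exp(x * x), 0, 1, 100))  # about 1.4627
```

Which is the point of “numerical quadrature” above: we get the value without ever finding a formula.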
There is a weird grab-bag of functions we can find these integrals for. They’re mostly ones we can find some integration trick for. An integration trick is some way to turn the integral we’re interested in into a couple of integrals we can do and then mix back together. A lot of a Freshman Calculus course is a heap of tricks we’ve learned. They have names like “u-substitution” and “integration by parts” and “trigonometric substitution”. Some of them are really exotic, such as turning a single integral into a double integral because that leads us to something we can do. And there’s something called “differentiation under the integral sign” that I don’t know of anyone actually using. People know of it because Richard Feynman, in his fun memoir What Do You Care What Other People Think: 250 Pages Of How Awesome I Was In Every Situation Ever, mentions how awesome it made him in so many situations. Mathematics, physics, and engineering nerds are required to read this at an impressionable age, so we fall in love with a technique no textbook ever mentions. Sorry.
I’ve written about all this as if we were interested just in areas. We’re not. We like calculating lengths and volumes and, if we dare venture into more dimensions, hypervolumes and the like. That’s all right. If we understand how to calculate areas, we have the tools we need. We can adapt them to as many or as few dimensions as we need. By weighting integrals we can do calculations that tell us about centers of mass and moments of inertia, about the most and least probable values of something, about all quantum mechanics.
As often happens, this powerful tool starts with something anyone might ponder: what size square has the same area as this other shape? It grows from thinking seriously about that question.