## From my Seventh A-to-Z: Big-O and Little-O Notation

I toss off a mention in this essay, about its book publication. By the time it appeared I was thinking whether I could assemble these A-to-Z’s, or a set of them, into a book. I haven’t had the energy to put that together but it still seems viable.

Mr Wu, author of the Singapore Maths Tuition blog, asked me to explain a technical term today. I thought that would be a fun, quick essay. I don’t learn very fast, do I?

A note on style. I make reference here to “Big-O” and “Little-O”, capitalizing and hyphenating them. This is to give them visual presence as a name. In casual discussion they’re just read, or said, as the two words or word-and-a-letter. Often the Big- or Little- gets dropped and we just talk about O. An O, without further context, in my experience means Big-O.

The part of me that wants smooth consistency in prose urges me to write “Little-o”, as the thing described is represented with a lowercase ‘o’. But Little-o sounds like a midway game or an Eyerly Aircraft Company amusement park ride. And I never achieve consistency in my prose anyway. Maybe for the book publication. Until I’m convinced another is better, though, “Little-O” it is.

# Big-O and Little-O Notation.

When I first went to college I had a campus post office box. I knew my box number. I also knew the length of the sluggish line for the combination lock code. The lock was a dial, lettered A through J. Being a young STEM-class idiot I thought, boy, would it actually be quicker to pick the lock than wait for the line? A three-letter combination, of ten options? That’s 1,000 possibilities. If I could try five a minute that’s, at worst, three hours 20 minutes. Combination might be anywhere in that set; I might get lucky. I could expect to spend 80 minutes picking my lock.

I decided to wait in line instead, and good that I did. I was unaware lock settings might not be a letter, like ‘A’. It could be the midway point between adjacent letters, like ‘AB’. That meant there were eight times as many combinations as I estimated, and I could expect to spend over ten hours. Even the slow line was faster than that. It transpired that my combination had two of these midway letters.

But that’s a little demonstration of algorithmic complexity. Also in cracking passwords by trial-and-error. Doubling the set of possible combination codes octuples the time it takes to break into the set. Making the combination longer would also work; each extra letter would multiply the cracking time by twenty. So you understand why your password should include “special characters” like punctuation, but most of all should be long.

We’re often interested in how long to expect a task to take. Sometimes we’re interested in the typical time it takes. Often we’re interested in the longest it could ever take. If we have a deterministic algorithm, we can say. We can count how many steps it takes. Sometimes this is easy. If we want to add two two-digit numbers together we know: it will be, at most, three single-digit additions plus, maybe, writing down a carry. (To add 98 and 37 is adding 8 + 7 to get 15, to add 9 + 3 to get 12, and to take the carry from the 15, so, 1 + 12 to get 13, so we have 135.) We can get a good quarrel going about what “a single step” is. We can argue whether that carry into the hundreds column is really one more addition. But we can agree that there is some smallest bit of arithmetic work, and proceed from that.

For any algorithm we have something that describes how big a thing we’re working on. It’s often ‘n’. If we need more than one variable to describe how big it is, ‘m’ gets called up next. If we’re estimating how long it takes to work on a number, ‘n’ is the number of digits in the number. If we’re thinking about a square matrix, ‘n’ is the number of rows and columns. If it’s a not-square matrix, then ‘n’ is the number of rows and ‘m’ the number of columns. Or vice-versa; it’s your matrix. If we’re looking for an item in a list, ‘n’ is the number of items in the list. If we’re looking to evaluate a polynomial, ‘n’ is the order of the polynomial.

In normal circumstances we don’t work out how many steps some operation does take. It’s more useful to know that multiplying these two long numbers would take about 900 steps than that it would need only 816. And so this gives us an asymptotic estimate. We get an estimate of how much longer cracking the combination lock will take if there’s more letters to pick from. This allowing that some poor soul will get the combination A-B-C.

There are a couple ways to describe how long this will take. The more common is the Big-O. This is just the letter, like you find between N and P. Since that’s easy, many have taken to using a fancy, vaguely cursive O, one that looks like $\mathcal{O}$. I agree it looks nice. Particularly, though, we write $\mathcal{O}(f(n))$, where f is some function. In practice, we’ll see functions like $\mathcal{O}(n)$ or $\mathcal{O}(n^2 \log(n))$ or $\mathcal{O}(n^3)$. Usually something simple like that. It can be tricky. There’s a scheme for multiplying large numbers together that’s $\mathcal{O}(n \cdot 2^{\sqrt{2 log (n)}} \cdot log(n))$. What you will not see is something like $\mathcal{O}(\sin (n))$, or $\mathcal{O}(n^3 - n^4)$ or such. This comes to what we mean by the Big-O.

It’ll be convenient for me to have a name for the actual number of steps the algorithm takes. Let me call the function describing that g(n). Then g(n) is $\mathcal{O}(f(n))$ if once n gets big enough, g(n) is always less than C times f(n). Here c is some constant number. Could be 1. Could be 1,000,000. Could be 0.00001. Doesn’t matter; it’s some positive number.

There’s some neat tricks to play here. For example, the function ‘$n$‘ is $\mathcal{O}(n)$. It’s also $\mathcal{O}(n^2)$ and $\mathcal{O}(n^9)$ and $\mathcal{O}(e^{n})$. The function ‘$n^2$‘ is also $\mathcal{O}(n^2)$ and those later terms, but it is not $\mathcal{O}(n)$. And you can see why $\mathcal{O}(\sin(n))$ is right out.

There is also a Little-O notation. It, too, is an upper bound on the function. But it is a stricter bound, setting tighter restrictions on what g(n) is like. You ask how it is the stricter bound gets the minuscule letter. That is a fine question. I think it’s a quirk of history. Both symbols come to us through number theory. Big-O was developed first, published in 1894 by Paul Bachmann. Little-O was published in 1909 by Edmund Landau. Yes, the one with the short Hilbert-like list of number theory problems. In 1914 G H Hardy and John Edensor Littlewood would work on another measure and they used Ω to express it. (If you see the letter used for Big-O and Little-O as the Greek omicron, then you see why a related concept got called omega.)

What makes the Little-O measure different is its sternness. g(n) is $o(f(n))$ if, for every positive number C, whenever n is large enough g(n) is less than or equal to C times f(n). I know that sounds almost the same. Here’s why it’s not.

If g(n) is $\mathcal{O}(f(n))$, then you can go ahead and pick a C and find that, eventually, $g(n) \le C f(n)$. If g(n) is $o(f(n))$, then I, trying to sabotage you, can go ahead and pick a C, trying my best to spoil your bounds. But I will fail. Even if I pick, like a C of one millionth of a billionth of a trillionth, eventually f(n) will be so big that $g(n) \le C f(n)$. I can’t find a C small enough that f(n) doesn’t eventually outgrow it, and outgrow g(n).

This implies some odd-looking stuff. Like, that the function n is not $o(n)$. But the function n is at least $o(n^2)$, and $o(n^9)$ and those other fun variations. Being Little-O compels you to be Big-O. Big-O is not compelled to be Little-O, although it can happen.

These definitions, for Big-O and Little-O, I’ve laid out from algorithmic complexity. It’s implicitly about functions defined on the counting numbers. But there’s no reason I have to limit the ideas to that. I could define similar ideas for a function g(x), with domain the real numbers, and come up with an idea of being on the order of f(x).

We make some adjustments to this. The important one is that, with algorithmic complexity, we assumed g(n) had to be a positive number. What would it even mean for something to take minus four steps to complete? But a regular old function might be zero or negative or change between negative and positive. So we look at the absolute value of g(x). Is there some value of C so that, when x is big enough, the absolute value of g(x) stays less than C times f(x)? If it does, then g(x) is $\mathcal{O}(f(x))$. Is it the case that for every positive number C it’s true that g(x) is less than C times f(x), once x is big enough? Then g(x) is $o(f(x))$.

Fine, but why bother defining this?

A compelling answer is that it gives us a way to describe how different a function is from an approximation to that function. We are always looking for approximations to functions because most functions are hard. We have a small set of functions we like to work with. Polynomials are great numerically. Exponentials and trig functions are great analytically. That’s about all the functions that are easy to work with. Big-O notation particularly lets us estimate how bad an error we make using the approximation.

For example, the Runge-Kutta method numerically approximates solutions to ordinary differential equations. It does this by taking the information we have about the function at some point x to approximate its value at a point x + h. ‘h’ is some number. The difference between the actual answer and the Runge-Kutta approximation is $\mathcal{O}(h^4)$. We use this knowledge to make sure our error is tolerable. Also, we don’t usually care what the function is at x + h. It’s just what we can calculate. What we want is the function at some point a fair bit away from x, call it x + L. So we use our approximate knowledge of conditions at x + h to approximate the function at x + 2h. And use x + 2h to tell us about x + 3h, and from that x + 4h and so on, until we get to x + L. We’d like to have as few of these uninteresting intermediate points as we can, so look for as big an h as is safe.

That context may be the more common one. We see it, particularly, in Taylor Series and other polynomial approximations. For example, the sine of a number is approximately:

$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} + \mathcal{O}(x^{11})$

This has consequences. It tells us, for example, that if x is about 0.1, this approximation is probably pretty good. So it is: the sine of 0.1 (radians) is about 0.0998334166468282 and that’s exactly what five terms here gives us. But it also warns that if x is about 10, this approximation may be gibberish. And so it is: the sine of 10.0 is about -0.5440 and the polynomial is about 1448.27.

The connotation in using Big-O notation here is that we look for small h’s, and for $\mathcal{O}(x)$ to be a tiny number. It seems odd to use the same notation with a large independent variable and with a small one. The concept carries over, though, and helps us talk efficiently about this different problem.

## My All 2020 Mathematics A to Z: Big-O and Little-O Notation

Mr Wu, author of the Singapore Maths Tuition blog, asked me to explain a technical term today. I thought that would be a fun, quick essay. I don’t learn very fast, do I?

A note on style. I make reference here to “Big-O” and “Little-O”, capitalizing and hyphenating them. This is to give them visual presence as a name. In casual discussion they’re just read, or said, as the two words or word-and-a-letter. Often the Big- or Little- gets dropped and we just talk about O. An O, without further context, in my experience means Big-O.

The part of me that wants smooth consistency in prose urges me to write “Little-o”, as the thing described is represented with a lowercase ‘o’. But Little-o sounds like a midway game or an Eyerly Aircraft Company amusement park ride. And I never achieve consistency in my prose anyway. Maybe for the book publication. Until I’m convinced another is better, though, “Little-O” it is.

# Big-O and Little-O Notation.

When I first went to college I had a campus post office box. I knew my box number. I also knew the length of the sluggish line for the combination lock code. The lock was a dial, lettered A through J. Being a young STEM-class idiot I thought, boy, would it actually be quicker to pick the lock than wait for the line? A three-letter combination, of ten options? That’s 1,000 possibilities. If I could try five a minute that’s, at worst, three hours 20 minutes. Combination might be anywhere in that set; I might get lucky. I could expect to spend 80 minutes picking my lock.

I decided to wait in line instead, and good that I did. I was unaware lock settings might not be a letter, like ‘A’. It could be the midway point between adjacent letters, like ‘AB’. That meant there were eight times as many combinations as I estimated, and I could expect to spend over ten hours. Even the slow line was faster than that. It transpired that my combination had two of these midway letters.

But that’s a little demonstration of algorithmic complexity. Also in cracking passwords by trial-and-error. Doubling the set of possible combination codes octuples the time it takes to break into the set. Making the combination longer would also work; each extra letter would multiply the cracking time by twenty. So you understand why your password should include “special characters” like punctuation, but most of all should be long.

We’re often interested in how long to expect a task to take. Sometimes we’re interested in the typical time it takes. Often we’re interested in the longest it could ever take. If we have a deterministic algorithm, we can say. We can count how many steps it takes. Sometimes this is easy. If we want to add two two-digit numbers together we know: it will be, at most, three single-digit additions plus, maybe, writing down a carry. (To add 98 and 37 is adding 8 + 7 to get 15, to add 9 + 3 to get 12, and to take the carry from the 15, so, 1 + 12 to get 13, so we have 135.) We can get a good quarrel going about what “a single step” is. We can argue whether that carry into the hundreds column is really one more addition. But we can agree that there is some smallest bit of arithmetic work, and proceed from that.

For any algorithm we have something that describes how big a thing we’re working on. It’s often ‘n’. If we need more than one variable to describe how big it is, ‘m’ gets called up next. If we’re estimating how long it takes to work on a number, ‘n’ is the number of digits in the number. If we’re thinking about a square matrix, ‘n’ is the number of rows and columns. If it’s a not-square matrix, then ‘n’ is the number of rows and ‘m’ the number of columns. Or vice-versa; it’s your matrix. If we’re looking for an item in a list, ‘n’ is the number of items in the list. If we’re looking to evaluate a polynomial, ‘n’ is the order of the polynomial.

In normal circumstances we don’t work out how many steps some operation does take. It’s more useful to know that multiplying these two long numbers would take about 900 steps than that it would need only 816. And so this gives us an asymptotic estimate. We get an estimate of how much longer cracking the combination lock will take if there’s more letters to pick from. This allowing that some poor soul will get the combination A-B-C.

There are a couple ways to describe how long this will take. The more common is the Big-O. This is just the letter, like you find between N and P. Since that’s easy, many have taken to using a fancy, vaguely cursive O, one that looks like $\mathcal{O}$. I agree it looks nice. Particularly, though, we write $\mathcal{O}(f(n))$, where f is some function. In practice, we’ll see functions like $\mathcal{O}(n)$ or $\mathcal{O}(n^2 \log(n))$ or $\mathcal{O}(n^3)$. Usually something simple like that. It can be tricky. There’s a scheme for multiplying large numbers together that’s $\mathcal{O}(n \cdot 2^{\sqrt{2 log (n)}} \cdot log(n))$. What you will not see is something like $\mathcal{O}(\sin (n))$, or $\mathcal{O}(n^3 - n^4)$ or such. This comes to what we mean by the Big-O.

It’ll be convenient for me to have a name for the actual number of steps the algorithm takes. Let me call the function describing that g(n). Then g(n) is $\mathcal{O}(f(n))$ if once n gets big enough, g(n) is always less than C times f(n). Here c is some constant number. Could be 1. Could be 1,000,000. Could be 0.00001. Doesn’t matter; it’s some positive number.

There’s some neat tricks to play here. For example, the function ‘$n$‘ is $\mathcal{O}(n)$. It’s also $\mathcal{O}(n^2)$ and $\mathcal{O}(n^9)$ and $\mathcal{O}(e^{n})$. The function ‘$n^2$‘ is also $\mathcal{O}(n^2)$ and those later terms, but it is not $\mathcal{O}(n)$. And you can see why $\mathcal{O}(\sin(n))$ is right out.

There is also a Little-O notation. It, too, is an upper bound on the function. But it is a stricter bound, setting tighter restrictions on what g(n) is like. You ask how it is the stricter bound gets the minuscule letter. That is a fine question. I think it’s a quirk of history. Both symbols come to us through number theory. Big-O was developed first, published in 1894 by Paul Bachmann. Little-O was published in 1909 by Edmund Landau. Yes, the one with the short Hilbert-like list of number theory problems. In 1914 G H Hardy and John Edensor Littlewood would work on another measure and they used Ω to express it. (If you see the letter used for Big-O and Little-O as the Greek omicron, then you see why a related concept got called omega.)

What makes the Little-O measure different is its sternness. g(n) is $o(f(n))$ if, for every positive number C, whenever n is large enough g(n) is less than or equal to C times f(n). I know that sounds almost the same. Here’s why it’s not.

If g(n) is $\mathcal{O}(f(n))$, then you can go ahead and pick a C and find that, eventually, $g(n) \le C f(n)$. If g(n) is $o(f(n))$, then I, trying to sabotage you, can go ahead and pick a C, trying my best to spoil your bounds. But I will fail. Even if I pick, like a C of one millionth of a billionth of a trillionth, eventually f(n) will be so big that $g(n) \le C f(n)$. I can’t find a C small enough that f(n) doesn’t eventually outgrow it, and outgrow g(n).

This implies some odd-looking stuff. Like, that the function n is not $o(n)$. But the function n is at least $o(n^2)$, and $o(n^9)$ and those other fun variations. Being Little-O compels you to be Big-O. Big-O is not compelled to be Little-O, although it can happen.

These definitions, for Big-O and Little-O, I’ve laid out from algorithmic complexity. It’s implicitly about functions defined on the counting numbers. But there’s no reason I have to limit the ideas to that. I could define similar ideas for a function g(x), with domain the real numbers, and come up with an idea of being on the order of f(x).

We make some adjustments to this. The important one is that, with algorithmic complexity, we assumed g(n) had to be a positive number. What would it even mean for something to take minus four steps to complete? But a regular old function might be zero or negative or change between negative and positive. So we look at the absolute value of g(x). Is there some value of C so that, when x is big enough, the absolute value of g(x) stays less than C times f(x)? If it does, then g(x) is $\mathcal{O}(f(x))$. Is it the case that for every positive number C it’s true that g(x) is less than C times f(x), once x is big enough? Then g(x) is $o(f(x))$.

Fine, but why bother defining this?

A compelling answer is that it gives us a way to describe how different a function is from an approximation to that function. We are always looking for approximations to functions because most functions are hard. We have a small set of functions we like to work with. Polynomials are great numerically. Exponentials and trig functions are great analytically. That’s about all the functions that are easy to work with. Big-O notation particularly lets us estimate how bad an error we make using the approximation.

For example, the Runge-Kutta method numerically approximates solutions to ordinary differential equations. It does this by taking the information we have about the function at some point x to approximate its value at a point x + h. ‘h’ is some number. The difference between the actual answer and the Runge-Kutta approximation is $\mathcal{O}(h^4)$. We use this knowledge to make sure our error is tolerable. Also, we don’t usually care what the function is at x + h. It’s just what we can calculate. What we want is the function at some point a fair bit away from x, call it x + L. So we use our approximate knowledge of conditions at x + h to approximate the function at x + 2h. And use x + 2h to tell us about x + 3h, and from that x + 4h and so on, until we get to x + L. We’d like to have as few of these uninteresting intermediate points as we can, so look for as big an h as is safe.

That context may be the more common one. We see it, particularly, in Taylor Series and other polynomial approximations. For example, the sine of a number is approximately:

$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} + \mathcal{O}(x^{11})$

This has consequences. It tells us, for example, that if x is about 0.1, this approximation is probably pretty good. So it is: the sine of 0.1 (radians) is about 0.0998334166468282 and that’s exactly what five terms here gives us. But it also warns that if x is about 10, this approximation may be gibberish. And so it is: the sine of 10.0 is about -0.5440 and the polynomial is about 1448.27.

The connotation in using Big-O notation here is that we look for small h’s, and for $\mathcal{O}(x)$ to be a tiny number. It seems odd to use the same notation with a large independent variable and with a small one. The concept carries over, though, and helps us talk efficiently about this different problem.

I hope this week to post the Playful Math Education Blog Carnival for September. Any educational or recreational or fun mathematics sites you know about would be greatly helpful to me and them. Thanks for your help.

Lastly, I am open for mathematics topics starting with P, Q, and R to write about next month. I’ve basically chosen my ‘P’ subject, though I’d be happy to hear alternatives for ‘Q’ and ‘R’ yet.

## In Our Time podcast repeats episode on P versus NP

The BBC’s general-discussion podcast In Our Time repeated another mathematics-themed session this week. The topic is P versus NP, a matter of computational complexity. P and NP here are shorthands to describe the amount of work needed to follow a procedure. And, particularly, how the amount of work changes as the size of the problem being worked on changes. We know there are problems of a complexity type P, and problems of a complexity type NP. What’s not known is whether those are, actually, the same, whether there’s a clever way of describing an NP problem so we can solve it with a P approach.

I do not remember whether I heard this program when it originally broadcast in 2015. And I haven’t had time to listen to this one yet. But these discussions are usually prett solid, and will get to discussing the ambiguities and limitations and qualifications of the field. So I feel comfortable recommending it even without a recent listen, which I will likely get to sometime during this week’s walks.

## Reading the Comics, April 6, 2020: My Perennials Edition

As much as everything is still happening, and so much, there’s still comic strips. I’m fortunately able here to focus just on the comics that discuss some mathematical theme, so let’s get started in exploring last week’s reading. Worth deeper discussion are the comics that turn up here all the time.

Lincoln Peirce’s Big Nate for the 5th is a casual mention. Nate wants to get out of having to do his mathematics homework. This really could be any subject as long as it fit the word balloon.

John Hambrock’s The Brilliant Mind of Edison Lee for the 6th is a funny-answers-to-story-problems joke. Edison Lee’s answer disregards the actual wording of the question, which supposes the group is travelling at an average 70 miles per hour. The number of stops doesn’t matter in this case.

Mark Anderson’s Andertoons for the 6th is the Mark Anderson’s Andertoons for the week. In it Wavehead gives the “just use a calculator” answer for geometry problems.

Not much to talk about there. But there is a fascinating thing about perimeters that you learn if you go far enough in Calculus. You have to get into multivariable calculus, something where you integrate a function that has at least two independent variables. When you do this, you can find the integral evaluated over a curve. If it’s a closed curve, something that loops around back to itself, then you can do something magic. Integrating the correct function on the curve around a shape will tell you the enclosed area.

And this is an example of one of the amazing things in multivariable calculus. It tells us that integrals over a boundary can tell us something about the integral within a volume, and vice-versa. It can be worth figuring out whether your integral is better solved by looking at the boundaries or at the interiors.

Heron’s Formula, for the area of a triangle based on the lengths of its sides, is an expression of this calculation. I don’t know of a formula exactly like that for the perimeter of a quadrilateral, but there are similar formulas if you know the lengths of the sides and of the diagonals.

Richard Thompson’s Cul de Sac rerun for the 6th sees Petey working on his mathematics homework. As with the Big Nate strip, it could be any subject.

Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 5th depicts, fairly, the sorts of things that excite mathematicians. The number discussed here is about algorithmic complexity. This is the study of how long it takes to do an algorithm. How long always depends on how big a problem you are working on; to sort four items takes less time than sorting four million items. Of interest here is how much the time to do work grows with the size of whatever you’re working on.

The mathematician’s particular example, and I thank dtpimentel in the comments for finding this, is about the Coppersmith–Winograd algorithm. This is a scheme for doing matrix multiplication, a particular kind of multiplication and addition of squares of numbers. The squares have some number N rows and N columns. It’s thought that there exists some way to do matrix multiplication in the order of N2 time, that is, if it takes 10 time units to multiply matrices of three rows and three columns together, we should expect it takes 40 time units to multiply matrices of six rows and six columns together. The matrix multiplication you learn in linear algebra takes on the order of N3 time, so, it would take like 80 time units.

We don’t know the way to do that. The Coppersmith–Winograd algorithm was thought, after Virginia Vassilevska Williams’s work in 2011, to take something like N2.3728642 steps. So that six-rows-six-columns multiplication would take slightly over 51.796 844 time units. In 2014, François le Gall found it was no worse than N2.3728639 steps, so this would take slightly over 51.796 833 time units. The improvement doesn’t seem like much, but on tiny problems it never does. On big problems, the improvement’s worth it. And, sometimes, you make a good chunk of progress at once.

I’ll have some more comic strips to discuss in an essay at this link, sometime later this week. Thanks for reading.

## Reading the Comics, November 21, 2019: Computational Science Edition

There were just a handful of comic strips that mentioned mathematical topics I found substantial. Of those that did, computational science came up a couple times. So that’s how we got to here.

Rick Detorie’s One Big Happy for the 17th has Joe writing an essay on the history of computing. It’s basically right, too, within the confines of space and understandable mistakes like replacing Pennsylvania with an easier-to-spell state. And within the confines of simplification for the sake of getting the idea across briefly. Most notable is Joe explaining ENIAC as “the first electronic digital computer”. Anyone calling anything “the first” of an invention is simplifying history, possibly to the point of misleading. But we must simplify any history to have it be understandable. ENIAC is among the first computers that anyone today would agree is of a kind with the laptop I use. And it’s certainly the one that, among its contemporaries, most captured the public imagination.

Incidentally, Heman Hollerith was born on Leap Day, 1860; this coming year will in that sense see only his 39th birthday.

Ryan North’s Dinosaur Comics for the 18th is based on the question of whether P equals NP. This is, as T-Rex says, the greatest unsolved problem in computer science. These are what appear to be two different kinds of problems. Some of them we can solve in “polynomial time”, with the number of steps to find a solution growing as some polynomial function of the size of the problem. Others seem to be “non-polynomial”, meaning the number of steps to find a solution grows as … something not a polynomial.

You see one problem. Not knowing a way to solve a problem in polynomial time does not necessarily mean there isn’t a solution. It may mean we just haven’t thought of one. If there is a way we haven’t thought of, then we would say P equals NP. And many people assume that very exciting things would then follow. Part of this is because computational complexity researchers know that many NP problems are isomorphic to one another. That is, we can describe any of these problems as a translation of another of these problems. This is the other part which makes this joke: the declaration that ‘whether God likes poutine’ is isomorphic to the question ‘does P equal NP’.

We tend to assume, also, that if P does equal NP then NP problems, such as breaking public-key cryptography, are all suddenly easy. This isn’t necessarily guaranteed. When we describe something as polynomial or non-polynomial time we’re talking about the pattern by which the number of steps needed to find the solution grows. In that case, then, an algorithm that takes one million steps plus one billion times the size-of-the-problem to the one trillionth power is polynomial time. An algorithm that takes two raised to the size-of-the-problem divided by one quintillion (rounded up to the next whole number) is non-polynomial. But for most any problem you’d care to do, this non-polynomial algorithm will be done sooner. If it turns out P does equal NP, we still don’t necessarily know that NP problems are practical to solve.

Bil Keane and Jeff Keane’s The Family Circus for the 20th has Dolly explaining to Jeff about the finiteness of the alphabet and infinity of numbers. I remember in my childhood coming to understand this and feeling something unjust in the difference between the kinds of symbols. That we can represent any of those whole numbers with just ten symbols (thirteen, if we include commas, decimals, and a multiplication symbol for the sake of using scientific notation) is an astounding feat of symbolic economy.

Zach Weinersmth’s Saturday Morning Breakfast cereal for the 21st builds on the statistics of genetics. In studying the correlations between one thing and another we look at something which varies, usually as the result of many factors, including some plain randomness. If there is a correlation between one variable and another we usually can describe how much of the change in one quantity depends on the other. This is what the scientist means on saying the presence of this one gene accounts for 0.1% of the variance in eeeeevil. The way this is presented, the activity of one gene is responsible for about one-thousandth of the level of eeeeevil in the person.

As the father observes, this doesn’t seem like much. This is because there are a lot of genes describing most traits. And that before we consider epigenetics, the factors besides what is in DNA that affect how an organism develops. I am, unfortunately, too ignorant of the language of genetics to be able to say what a typical variation for a single gene would be, and thus to check whether Weinersmith has the scale of numbers right.

This finishes the mathematically-themed comic strips from this past week. If all goes to my plan, Tuesday and Thursday will find the last of this year’s A-to-Z postings for this year. And Wednesday? I’ll try to think of something for Wednesday. It’d be a shame to just leave it hanging loose like it might.

## Reading the Comics, November 12, 2016: Frazz and Monkeys Edition

Two things made repeat appearances in the mathematically-themed comics this week. They’re the comic strip Frazz and the idea of having infinitely many monkeys typing. Well, silly answers to word problems also turned up, but that’s hard to say many different things about. Here’s what I make the week in comics out to be.

Sandra Bell-Lundy’s Between Friends for the 6th introduces the infinite monkeys problem. I wonder sometimes why the monkeys-on-typewriters thing has so caught the public imagination. And then I remember it encourages us to stare directly into infinity and its intuition-destroying nature from the comfortable furniture of the mundane — typewriters, or keyboards, for goodness’ sake — with that childish comic dose of monkeys. Given that it’s a wonder we ever talk about anything else, really.

Monkeys writing Shakespeare has for over a century stood as a marker for what’s possible but incredibly improbable. I haven’t seen it compared to finding a four-digit PIN. It has got me wondering about the chance that four randomly picked letters will be a legitimate English word. I’m sure the chance is more than the one-in-a-thousand chance someone would guess a randomly drawn PIN correctly on one try. More than one in a hundred? I’m less sure. The easy-to-imagine thing to do is set a computer to try out all 456,976 possible sets of four letters and check them against a dictionary. The number of hits divided by the number of possibilities would be the chance of drawing a legitimate word. If I had a less capable computer, or were checking even longer words, I might instead draw some set number of words, never minding that I didn’t get every possibility. The fraction of successful words in my sample would be something close to the chance of drawing any legitimate word.

If I thought a little deeper about the problem, though, I’d just count how many four-letter words are already in my dictionary and divide that into 456,976. It’s always a mistake to start programming before you’ve thought the problem out. The trouble is not being able to tell when that thinking-out is done.

Richard Thompson’s Poor Richard’s Almanac for the 7th is the other comic strip to mention infinite monkeys. Well, chimpanzees in this case. But for the mathematical problem they’re not different. I’ve featured this particular strip before. But I’m a Thompson fan. And goodness but look at the face on the T S Eliot fan in the lower left corner there.

Jeff Mallet’s Frazz for the 6th gives Caulfield one of those flashes of insight that seems like it should be something but doesn’t mean much. He’s had several of these lately, as mentioned here last week. As before this is a fun discovery about Roman Numerals, but it doesn’t seem like it leads to much. Perhaps a discussion of how the subtractive principle — that you can write “four” as “IV” instead of “IIII” — evolved over time. But then there isn’t much point to learning Roman Numerals at all. It’s got some value in showing how much mathematics depends on culture. Not just that stuff can be expressed in different ways, but that those different expressions make different things easier or harder to do. But I suspect that isn’t the objective of lessons about Roman Numerals.

Frazz got my attention again the 12th. This time it just uses arithmetic, and a real bear of an arithmetic problem, as signifier for “a big pile of hard work”. This particular problem would be — well, I have to call it tedious, rather than hard. doing it is just a long string of adding together two numbers. But to do that over and over, by my count, at least 47 times for this one problem? Hardly any point to doing that much for one result.

Patrick Roberts’s Todd the Dinosaur for the 7th calls out fractions, and arithmetic generally, as the stuff that ruins a child’s dreams. (Well, a dinosaur child’s dreams.) Still, it’s nice to see someone reminding mathematicians that a lot of their field is mostly used by accountants. Actuaries we know about; mathematics departments like to point out that majors can get jobs as actuaries. I don’t know of anyone I went to school with who chose to become one or expressed a desire to be an actuary. But I admit not asking either.

Mike Thompson’s Grand Avenue started off a week of students-resisting-the-test-question jokes on the 7th. Most of them are hoary old word problem jokes. But, hey, I signed up to talk about it when a comic strip touches a mathematics topic and word problems do count.

Zach Weinersmith’s Saturday Morning Breakfast Cereal reprinted the 7th is a higher level of mathematical joke. It’s from the genre of nonsense calculation. This one starts off with what’s almost a cliche, at least for mathematics and physics majors. The equation it starts with, $e^{i Pi} = -1$, is true. And famous. It should be. It links exponentiation, imaginary numbers, π, and negative numbers. Nobody would have seen it coming. And from there is the sort of typical gibberish reasoning, like writing “Pi” instead of π so that it can be thought of as “P times i”, to draw to the silly conclusion that P = 0. That much work is legitimate.

From there it sidelines into “P = NP”, which is another equation famous to mathematicians and computer scientists. It’s a shorthand expression of a problem about how long it takes to find solutions. That is, how many steps it takes. How much time it would take a computer to solve a problem. You can see why it’s important to have some study of how long it takes to do a problem. It would be poor form to tie up your computer on a problem that won’t be finished before the computer dies of old age. Or just take too long to be practical.

Most problems have some sense of size. You can look for a solution in a small problem or in a big one. You expect searching for the solution in a big problem to take longer. The question is how much longer? Some methods of solving problems take a length of time that grows only slowly as the size of the problem grows. Some take a length of time that grows crazy fast as the size of the problem grows. And there are different kinds of time growth. One kind is called Polynomial, because everything is polynomials. But there’s a polynomial in the problem’s size that describes how long it takes to solve. We call this kind of problem P. Another is called Non-Deterministic Polynomial, for problems that … can’t. We assume. We don’t know. But we know some problems that look like they should be NP (“NP Complete”, to be exact).

It’s an open question whether P and NP are the same thing. It’s possible that everything we think might be NP actually can be solved by a P-class algorithm we just haven’t thought of yet. It would be a revolution in our understanding of how to find solutions if it were. Most people who study algorithms think P is not NP. But that’s mostly (as I understand it) because it seems like if P were NP then we’d have some leads on proving that by now. You see how this falls short of being rigorous. But it is part of expertise to get a feel for what seems to make sense in light of everything else we know. We may be surprised. But it would be inhuman not to have any expectations of a problem like this.

Mark Anderson’s Andertoons for the 8th gives us the Andertoons content for the week. It’s a fair question why a right triangle might have three sides, three angles, three vertices, and just the one hypotenuse. The word’s origin, from Greek, meaning “stretching under” or “stretching between”. It’s unobjectionable that we might say this is the stretch from one leg of the right triangle to another. But that leaves unanswered why there’s just the one hypothenuse, since the other two legs also stretch from the end of one leg to another. Dr Sarah on The Math Forum suggests we need to think of circles. Draw a circle and a diameter line on it. Now pick any point on the circle other than where the diameter cuts it. Draw a line from one end of the diameter to your point. And from your point to the other end of the diameter. You have a right triangle! And the hypothenuse is the leg stretching under the other two. Yes, I’m assuming you picked a point above the diameter. You did, though, didn’t you? Humans do that sort of thing.

I don’t know if Dr Sarah’s explanation is right. It sounds plausible and sensible. But those are weak pins to hang an etymology on. But I have no reason to think she’s mistaken. And the explanation might help people accept there is the one hypothenuse and there’s something interesting about it.

The first (and as I write this only) commenter, Kristiaan, has a good if cheap joke there.