## On The Goldfish Situation

If you’ve been following me on Twitter you’ve seen reports of the Great Migration. This is the pompous name I give to the process of bringing the goldfish who were in tanks in the basement for the winter back outside again. This to let them enjoy the benefits of the summer, like, not having me poking around testing their water every day. (We had a winter with a lot of water quality problems. I’m probably over-testing.)

My reports about moving them back — by setting a net in that could trap some fish and moving them out — included reports of how many remained in each tank. And many people told me how such updates as “Twelve goldfish are in the left tank, three in the right, and fifteen have been brought outside” sound like the start of a story problem. Maybe it does. I don’t have a particular story problem built on this. I’m happy to take nominations for such.

But I did have some mathematics essays based on the problem of moving goldfish to the pond outdoors and to the warm water tank indoors:

• How To Count Fish, about how one could estimate a population by sampling it twice.
• How To Re-Count Fish, about one of the practical problems in using this to count as few goldfish as we have at our household.
• How Not To Count Fish, about how this population estimate wouldn’t work because of the peculiarities of goldfish psychology. Honest.

That I spend one essay describing how to do a thing, and then two more essays describing why it won’t work, may seem characteristically me. Well, yeah. Mathematics is a great tool. To use a tool safely requires understanding its powers and its limitations. I like thinking about what mathematics can and can’t do.

## My 2018 Mathematics A To Z: Unit Fractions

My subject for today is another from Iva Sallay, longtime friend of the blog and creator of the Find the Factors recreational mathematics game. I think you’ll likely find something enjoyable at her site, whether it’s the puzzle or the neat bits of trivia as she works through all the counting numbers.

# Unit Fractions.

We don’t notice how unit fractions are around us. Likely there’s some in your pocket. Or there have been recently. Think of what you do when paying for a thing, when it’s not a whole number of dollars. (Pounds, euros, whatever the unit of currency is.) Suppose you have exact change. What do you give for the 38 cents?

Likely it’s something like a 25-cent piece and a 10-cent piece and three one-cent pieces. This is an American and Canadian solution. I know that 20-cent pieces are more common than 25-cent ones worldwide. It doesn’t make much difference; if you want it to be three 10-cent, one five-cent, and three one-cent pieces that’s as good. And granted, outside the United States it’s growing common to drop pennies altogether and round prices off to a five- or ten-cent value. Again, it doesn’t make much difference.

But look at the coins. The 25 cent piece is one-quarter of a dollar. It’s even called that, and stamped that on one side. I sometimes hear a dime called “a tenth of a dollar”, although mostly by carnival barkers in one-reel cartoons of the 1930s. A nickel is one-twentieth of a dollar. A penny is one-hundredth. A 20-cent piece is one-fifth of a dollar. And there are half-dollars out there, although not in the United States, not really anymore.

(Pre-decimalized currencies offered even more unit fractions. Using old British coins, for familiarity-to-me and great names, there were farthings, 1/960th of a pound; halfpennies, 1/480th; pennies, 1/240th; threepence, 1/80th of a pound; groats, 1/60th; sixpence, 1/40th; florins, 1/10th; half-crowns, 1/8th; crowns, 1/4th. And what seem to the modern wallet like impossibly tiny fractions like the half-, third-, and quarter-farthings used where 1/3840th of a pound might be a needed sum of money.)

Unit fractions get named and defined somewhere in elementary school arithmetic. They go on, becoming forgotten sometime after that. They might make a brief reappearance in calculus. There are some rational functions that get easier to integrate if you think of them as the sums of fractions, with constant numerators and polynomial denominators. These aren’t unit fractions. A unit fraction has a 1, the unit, in the numerator. But we see units along the way to integrating $\frac{1}{x^2 - x}$ as an example. And see it in the promise that there are still more amazing integrals to learn how to do.

They get more attention if you take a history of computation class. Or read the subject on your own. Unit fractions stand out in history. We learn the Ancient Egyptians worked with fractions as sums of unit fractions. That is, had they dollars, they would not look at the $\frac{38}{100}$ we do. They would look at $\frac{1}{4}$ plus $\frac{1}{10}$ plus $\frac{1}{100}$ plus $\frac{1}{100}$ plus $\frac{1}{100}$. When we count change we are using, without noticing it, a very old computing scheme.

This isn’t quite true. The Ancient Egyptians seemed to shun repeating a unit like that. To use $\frac{1}{100}$ once is fine; three times is suspicious. They would prefer something like $\frac{1}{3}$ plus $\frac{1}{24}$ plus $\frac{1}{200}$. Or maybe some other combination. I just wrote out the first one I found.

But there are many ways we can make 38 cents using ordinary coins of the realm. There are infinitely many ways to make up any fraction using unit fractions. There’s surely a most “efficient”. Most efficient might be the one which uses the fewest number of terms. Most efficient might be the one that uses the smallest denominators. Choose what you like; no one knows a scheme that always turns up the most efficient, either way. We can always find some representation, though. It may not be “good”, but it will exist, which may be good enough. Leonardo of Pisa, or as he got named in the 19th century, Fibonacci, proved that was true.

We may ask why the Egyptians used unit fractions. They seem inefficient compared to the way we work with fractions. Or, better, decimals. I’m not sure the question can have a coherent answer. Why do we have a fashion for converting fractions to a “proper” form? Why do we use the number of decimal points we do for a given calculation? Sometimes a particular mode of expression is the fashion. It comes to seem natural because everyone uses it. We do it too.

And there is practicality to them. Even efficiency. If you need π, for example, you can write it as 3 plus $\frac{1}{8}$ plus $\frac{1}{61}$ and your answer is off by under one part in a thousand. Combine this with the Egyptian method of multiplication, where you would think of (say) “11 times π” as “1 times π plus 2 times π plus 8 times π”. And with tables they had worked up which tell you what $\frac{2}{8}$ and $\frac{2}{61}$ would be in a normal representation. You can get rather good calculations without having to do more than addition and looking up doublings. Represent π as 3 plus $\frac{1}{8}$ plus $\frac{1}{61}$ plus $\frac{1}{5020}$ and you’re correct to within one part in 130 million. That isn’t bad for having to remember four whole numbers.

(The Ancient Egyptians, like many of us, were not absolutely consistent in only using unit fractions. They had symbols to represent $\frac{2}{3}$ and $\frac{3}{4}$, probably due to these numbers coming up all the time. Human systems vary to make the commonest stuff we do easier.)

Enough practicality or efficiency, if this is that. Is there beauty? Is there wonder? Certainly. Much of it is in number theory. Number theory splits between astounding results and results that would be astounding if we had any idea how to prove them. Many of the astounding results are about unit fractions. Take, for example, the harmonic series $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \cdots$. Truncate that series whenever you decide you’ve had enough. Different numbers of terms in this series will add up to different numbers. Eventually, infinitely many numbers. The numbers will grow ever-higher. There’s no number so big that it won’t, eventually, be surpassed by some long-enough truncated harmonic series. And yet, past the number 1, it’ll never touch a whole number again. Infinitely many partial sums. Partial sums differing from one another by one-googol-plex and smaller. And yet of the infinitely many whole numbers this series manages to miss them all, after its starting point. Worse, any set of consecutive terms, not even starting from 1, will never hit a whole number. I can understand a person who thinks mathematics is boring, but how can anyone not find it astonishing?

There are more strange, beautiful things. Consider heptagonal numbers, which Iva Sallay knows well. These are numbers like 1 and 7 and 18 and 34 and 55 and 1288. Take a heptagonal number of, oh, beads or dots or whatever, and you can lay them out to form a regular seven-sided figure. Add together the reciprocals of the heptagonal numbers. What do you get? It’s a weird number. It’s irrational, which you maybe would have guessed as more likely than not. But it’s also transcendental. Most real numbers are transcendental. But it’s often hard to prove any specific number is.

Unit fractions creep back into actual use. For example, in modular arithmetic, they offer a way to turn division back into multiplication. Division, in modular arithmetic, tends to be hard. Indeed, if you need an algorithm to make random-enough numbers, you often will do something with division in modular arithmetic. Suppose you want to divide by a number x, modulo y, and x and y are relatively prime, though. Then unit fractions tell us how to turn this into finding a greatest common denominator problem.

They teach us about our computers, too. Much of serious numerical mathematics involves matrix multiplication. Matrices are, for this purpose, tables of numbers. The Hilbert Matrix has elements that are entirely unit fractions. The Hilbert Matrix is really a family of square matrices. Pick any of the family you like. It can have two rows and two columns, or three rows and three columns, or ten rows and ten columns, or a million rows and a million columns. Your choice. The first row is made of the numbers $1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4},$ and so on. The second row is made of the numbers $\frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5},$ and so on. The third row is made of the numbers $\frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \frac{1}{6},$ and so on. You see how this is going.

Matrices can have inverses. It’s not guaranteed; matrices are like that. But the Hilbert Matrix does. It’s another matrix, of the same size. All the terms in it are integers. Multiply the Hilbert Matrix by its inverse and you get the Identity Matrix. This is a matrix, the same number of rows and columns as you started with. But nearly every element in the identity matrix is zero. The only exceptions are on the diagonal — first row, first column; second row, second column; third row, third column; and so on. There, the identity matrix has a 1. The identity matrix works, for matrix multiplication, much like the real number 1 works for normal multiplication.

Matrix multiplication is tedious. It’s not hard, but it involves a lot of multiplying and adding and it just takes forever. So set a computer to do this! And you get … uh ..

For a small Hilbert Matrix and its inverse, you get an identity matrix. That’s good. For a large Hilbert Matrix and its inverse? You get garbage. Large isn’t maybe very large. A 12 by 12 matrix gives you trouble. A 14 by 14 matrix gives you a mess. Well, on my computer it does. Cute little laptop I got when my former computer suddenly died. On a better computer? One designed for computation? … You could do a little better. Less good than you might imagine.

The trouble is that computers don’t really do mathematics. They do an approximation of it, numerical computing. Most use a scheme called floating point arithmetic. It mostly works well. There’s a bit of error in every calculation. For most calculations, though, the error stays small. At least relatively small. The Hilbert Matrix, built of unit fractions, doesn’t respect this. It and its inverse have a “numerical instability”. Some kinds of calculations make errors explode. They’ll overwhelm the meaningful calculation. It’s a bit of a mess.

Numerical instability is something anyone doing mathematics on the computer must learn. Must grow comfortable with. Must understand. The matrix multiplications, and inverses, that the Hilbert Matrix involves highlights those. A great and urgent example of a subtle danger of computerized mathematics waits for us in these unit fractions. And we’ve known and felt comfortable with them for thousands of years.

There’ll be some mathematical term with a name starting ‘V’ that, barring surprises, should be posted Friday. What’ll it be? I have an idea at least. It’ll be available at this link, as are the rest of these glossary posts.

## How To Re-Count Fish

Last week I chatted a bit with a probabilistic, sampling-based method to estimate the population of fish in our backyard pond. The method estimates the population $N$ of a thing, in this case the fish, by capturing a sample of size $M$ and dividing that $M$ by the probability of catching one of the things in your sampling. Since we might know know the chance of catching the thing beforehand, we estimate it: catch some number $n$ of the fish or whatever, then put them back, and then re-catch as many. Some number $m$ of those will be re-caught, so we can estimate the chance of catching one fish as $\frac{m}{n}$. So the original population will be somewhere about $N = M \div \frac{m}{n} = M \cdot \frac{n}{m}$.

I want to talk a little bit about why that won’t work.

There is of course the obvious reason to think this will go wrong; it amounts to exactly the same reason why a baseball player with a .250 batting average — meaning the player can expect to get a hit in one out of every four at-bats — might go an entire game without getting on base, or might get on base three times in four at-bats. If something has $N$ chances to happen, and it has a probability $p$ of happening at every chance, it’s most likely that it will happen $N \cdot p$ times, but it can happen more or fewer times than that. Indeed, we’d get a little suspicious if it happened exactly $N \cdot p$ times. If we flipped a fair coin twenty times, it’s most likely to come up tails ten times, but there’s nothing odd about it coming up tails only eight or as many as fourteen times, and it’d stand out if it always came up tails exactly ten times.

To apply this to the fish problem: suppose that there are $N = 50$ fish in the pond; that 50 is the number we want to get. And suppose we know for a fact that every fish has a 12.5 percent chance — $p = 0.125$ — of being caught in our trap. Ignore for right now how we know that probability; just pretend we can count on that being exactly true. The expectation value, the most probable number of fish to catch in any attempt, is $N \cdot p = 50 \cdot 0.125 = 6.25$ fish, which presents our first obvious problem. Well, maybe a fish might be wriggling around the edge of the net and fall out as we pull the trap out. (This actually happened as I was pulling some of the baby fish in for the winter.)

With these numbers it’s most probable to catch six fish, slightly less probable to catch seven fish, less probable yet to catch five, then eight and so on. But these are all tolerably plausible numbers. I used a mathematics package (Octave, an open-source clone of Matlab) to run ten simulated catches, from fifty fish each with a probability of .125 of being caught, and came out with these sizes $M$ for the fish harvests:

 M = 4 6 3 6 7 7 5 7 8 9

Since we know, by some method, that the chance $p$ of catching any one fish is exactly 0.125, this implies fish populations $N = M \div p$ of:

 M = N = 4 6 3 6 7 7 5 7 8 9 32 48 24 48 56 56 40 56 64 72

Now, none of these is the right number, although 48 is respectably close and 56 isn’t too bad. But the range is hilarious: there might be as few as 24 or as many as 72 fish, based on just this evidence. That might as well be guessing.

This is essentially a matter of error analysis. Any one attempt at catching fish may be faulty, because the fish are too shy of the trap, or too eager to leap into it, or are just being difficult for some reason. But we can correct for the flaws of one attempt at fish-counting by repeating the experiment. We can’t always be unlucky in the same ways.

This is conceptually easy, and extremely easy to do on the computer; it’s a little harder in real life but certainly within the bounds of our research budget, since I just have to go out back and put the trap out. And redoing the experiment even pays off, too: average those population samples from the ten simulated runs there and we get a mean estimated fish population of 49.6, which is basically dead on.

(That was lucky, I must admit. Ten attempts isn’t really enough to make the variation comfortably small. Another run with ten simulated catchings produced a mean estimate population of 56; the next one … well, 49.6 again, but the one after that gave me 64. It isn’t until we get into a couple dozen attempts that the mean population estimate gets reliably close to fifty. Still, the work is essentially the same as the problem of “I flipped a fair coin some number of times; it came up tails ten times. How many times did I flip it?” It might have been any number ten or above, but I most probably flipped it about twenty times, and twenty would be your best guess absent more information.)

The same problem affects working out what the probability of catching a fish is, since we do that by catching some small number $n$ of fish and then seeing how many some smaller number $m$ of them we re-catch later on. Suppose the probability of catching a fish really is $p = 0.125$, but we’re only trying to catch $n = 6$ fish. Here’s a couple rounds of ten simulated catchings of six fish, and how many of those were re-caught:

 2 0 1 0 1 0 1 0 0 1 2 0 1 1 0 3 0 0 1 1 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 2 1

Obviously any one of those indicates a probability ranging from 0 to 0.5 of re-catching a fish. Technically, yes, 0.125 is a number between 0 and 0.5, but it hasn’t really shown itself. But if we average out all these probabilities … well, those forty attempts give us a mean estimated probability of 0.092. This isn’t excellent but at least it’s in range. If we keep doing the experiment we’d get do better; one simulated batch of a hundred experiments turned up a mean estimated probability of 0.12833. (And there’s variations, of course; another batch of 100 attempts estimated the probability at 0.13333, and then the next at 0.10667, though if you use all three hundred of these that gets to an average of 0.12278, which isn’t too bad.)

This inconvenience amounts to a problem of working with small numbers in the original fish population, in the number of fish sampled in any one catching, and in the number of catches done to estimate their population. Small numbers tend to be problems for probability and statistics; the tools grow much more powerful and much more precise when they can work with enormously large collections of things. If the backyard pond held infinitely many fish we could have a much better idea of how many fish were in it.