If you’ve been following me on Twitter you’ve seen reports of the Great Migration. This is the pompous name I give to the process of bringing the goldfish who were in tanks in the basement for the winter back outside again. This is to let them enjoy the benefits of the summer, like not having me poking around testing their water every day. (We had a winter with a lot of water quality problems. I’m probably over-testing.)

The Great Migration finally: four goldfish brought outside today. 12 remain in the left tank, 14 in the right, I think.

My reports about moving them back (by setting a net in the tank that could trap some fish, and then moving those fish out) included counts of how many remained in each tank. And many people told me how such updates as “Twelve goldfish are in the left tank, three in the right, and fifteen have been brought outside” sound like the start of a story problem. Maybe they do. I don’t have a particular story problem built on this. I’m happy to take nominations for such.

But I did have some mathematics essays based on the problem of moving goldfish to the pond outdoors and to the warm water tank indoors:

How To Count Fish, about how one could estimate a population by sampling it twice.

How To Re-Count Fish, about one of the practical problems in using this to count as few goldfish as we have at our household.

How Not To Count Fish, about how this population estimate wouldn’t work because of the peculiarities of goldfish psychology. Honest.

That I spend one essay describing how to do a thing, and then two more essays describing why it won’t work, may seem characteristically me. Well, yeah. Mathematics is a great tool. To use a tool safely requires understanding its powers and its limitations. I like thinking about what mathematics can and can’t do.

A couple of weeks after that — on Thanksgiving, it happens — we caught one more fish. This brought the total to 54. And I either failed to make note of it or I can’t find the note I made of it. Such happens.

In getting the pond ready for the spring, and the return of our goldfish to the outdoors, we found another one! It was just this orange thing dug into the muck of the pool, and we thought initially it was something that had fallen in and gotten lost. A heron scarer, was my love’s first guess. The pond thermometer that sank without trace some years back was mine. I used the grabber to poke at it and woke up a pretty sulky goldfish. It went over to some algae where we couldn’t so easily bother it.

So that brings our fish count to 55, for those keeping track. Fortunately, it was a very gentle winter in our parts. We’re hoping to bring the goldfish back out to the pond in the next week or two. Our best estimate for the carrying capacity of the pond is 65 to 130 goldfish, so, we will see whether the goldfish do anything about this slight underpopulation.

Folks who’ve been around a while may remember the matter of our fish. I’d spent some time in the spring describing ways to estimate a population using techniques other than just counting everybody. And then revealed that the population of goldfish in our pond was something like 53, based on counting the fifty which we’d had wintering over in our basement and the three we counted in the pond despite the winter ice. This is known as determining the population “by inspection”.

I’m disappointed to say that, as best we can work out, they didn’t get around to producing any new goldfish this year. We didn’t see any evidence of babies, and haven’t seen any noticeably small ones swimming around. It’s possible we set them out too late in the spring. It’s possible too that the summer was never quite warm enough for them to feel like it was fish-production time.

This does mean that we have a reasonably firm upper limit on the number of fish we need to take in. 53 appears to be it. The winter’s been settling in, though, and we’ve started taking them in. This past day we took in twelve. That’s not bad for the first harvest and if we’re lucky we should have the pond emptied in a week or so. I’ll let folks know if there turns out to be a surprise in goldfish cardinality.

A couple months ago I wrote about the problem of counting the number of goldfish in the backyard pond. For those who’d missed it:

How To Count Fish, which presented a way to estimate a population by simply doing two samplings of the population.

How To Re-Count Fish, which described some of the numerical problems in estimation-based population samples.

How Not To Count Fish, which threatened to collapse the entire project under fiddly practical problems.

Spring finally arrived, and about a month ago we finally stopped having nights that touched freezing. So we moved the goldfish which had been wintering over in the basement out to the backyard. This also let us count just how many goldfish we’d caught, and I thought folks might like to know what the population did look like.

The counting didn’t require probabilistic methods this time. Instead we took the fish from the traps and set up a correspondence between them and an ordered subset of positive whole numbers. This is the way you describe “just counting” so that it sounds either ferociously difficult or like a game. Whether it’s difficult or a game depends on whether you were a parent or a student back when the New Math was a thing. My love and I were students.

Altogether then there were fifty goldfish that had wintered over in the stock tank in the basement: eight adults and 42 baby fish. (Possibly nine and 41; one of the darker goldfish is small for an adult, but large for a baby.) Over the spring I identified at least three baby fish that had wintered over outdoors successfully. It was a less harsh winter than the one before. So there are now at least 53 goldfish in the pond. There are surely more on the way, but we haven’t seen any new babies yet.

Also this spring we finally actually measured the pond. We’d previously estimated it to be about ten feet in diameter and two feet deep, implying a carrying capacity of about 60 goldfish if some other assumptions are made. Now we’ve learned it’s nearer twelve feet in diameter and twenty inches deep. Call that two meters radius and half a meter height. That’s a volume of about 6.3 cubic meters, or 6300 liters, or enough volume of water for about 80 goldfish. We’ll see what next fall brings.
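The arithmetic there is easy to check. Here's a minimal sketch in Python, treating the pond as a perfect cylinder (which it surely isn't). The liters-per-fish allowance is not a figure from any pond-keeping guide; it's a hypothetical value back-solved from the "6300 liters for about 80 goldfish" above.

```python
import math

def cylinder_volume_liters(radius_m: float, depth_m: float) -> float:
    """Volume of a cylindrical pond in liters (1 cubic meter = 1000 liters)."""
    return math.pi * radius_m ** 2 * depth_m * 1000.0

# Rounded measurements from the text: two meters radius, half a meter deep.
volume = cylinder_volume_liters(radius_m=2.0, depth_m=0.5)
print(round(volume))  # 6283 liters, or about 6.3 cubic meters

# Hypothetical allowance per fish, back-solved from the text's own figures.
LITERS_PER_FISH = 6300 / 80
print(round(volume / LITERS_PER_FISH))  # 80
```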

I’d discussed a probability/sampling-based method to estimate the number of fish that might be in our pond out back, and then some of the errors that have to be handled if you want to have a reliable result. Now, I want to get into why the method doesn’t work, at least not without much greater insight into goldfish behavior than simply catching a couple and releasing them will do.

Catching a sample, re-releasing it, and counting how many of that sample we re-catch later on is a logically valid method, provided the assumptions the method requires are close, or at least close enough, to the way the actual thing works. Here are some of the ways goldfish fall short of the ideal.

First faulty assumption: Goldfish are perfectly identical. In this goldfish-trapping we make the assumption that there is some fixed, constant probability of a goldfish being caught in the net. We have to assume that this is the same number for every goldfish, and that it doesn’t change as goldfish go through the experience of getting caught and then released. But goldfish have personality, as you learn if you have a bunch in a nice setting and do things like try feeding them koi treats or introduce something new like a wire-mesh trap to their environment. Some are adventurous and will explore the unfamiliar thing; some are shy and will let everyone else go first and then maybe not bother going at all. I empathize with both positions.

If there are enough goldfish, the variation between personalities is probably not going to matter much. There’ll be some that are easy to catch, and they’ll probably be roughly as common as the ones who can’t be coaxed into the trap at all. It won’t be exactly balanced unless we’re very lucky, but this would probably only throw off our calculations a little bit.

Whether the goldfish learn, and become more or less likely to be trapped over time, is a harder question. Goldfish do learn, certainly, although it’s not obvious to me that the trapping and releasing experience would be one they draw much of a lesson from. It’s only a little inconvenience, really, and not at all harmful; what should they learn? Other than that there’s maybe an easy bit of food to be had here so why not go in? So this might change their behavior and it’s hard to predict how.

(I note that animal capture studies get quite frustrated when the animals start working out how to game the folks studying them. Bil Gilbert’s early-70s study of coatis — Latin American raccoons, written up in the lovely popularization Chulo: A Year Among The Coatimundis — was plagued by some coatis who figured out going into the trap was an easy, safe meal they’d be released from without harm, and wouldn’t go back about their business and leave room for other specimens.)

Second faulty assumption: Goldfish are not perfectly identical. This is the biggest challenge to counting goldfish population by re-catching a sample of them. How do you know if you caught a goldfish before? When they grow to adulthood, it’s not so bad, since they grow fairly distinctive patterns of orange and white and black and such, and they’ll usually settle into different sizes. (That said, we do have two adult fish who were very distinct when we first got them, but who’ve grown into near-twins.)

But baby goldfish? They’re basically all tiny black things, meant to hide in the mud at the bottom of ponds and rivers (their preferred habitat) and pretty near indistinguishable. As they get larger they get distinguishable, a bit, and start to grow patterns, but for the vast number of baby fish there’s just no telling one from another.

When we were trying to work out whether some mice we found in the house were ones we had previously caught and put out in the garage, we were able to mark them by squirting some food dye at their heads as they were released. The mice would rub the food dye from their heads onto their whole bodies and it would take a while before the dye would completely fade out. (We didn’t re-catch any mice, although it’s hard to dye a wild mouse efficiently because they will take off like bullets. Also one time when we thought we’d captured one there were actually three in the humane trap and you try squirting the food dye bottle at two more mice than you thought were there, fleeing.) But you can see how the food dye wouldn’t work here. Animal researchers with a budget might go on to attach collars or somehow otherwise mark animals, but if there’s a way to mark and track goldfish with ordinary household items I can’t think of it.

(No, we will not be taking the bits of americium in our smoke detectors out and injecting them into trapped goldfish; among the objections, I don’t have a radioactivity detector.)

Third faulty assumption: Goldfish are independent entities. The first two faulty assumptions are ones that could be kind of worked around. If there are enough goldfish then the distribution of how likely any one is to get caught will probably be near enough normal that we can pretend there’s an identical chance of catching each, and if we really thought about it we could probably find some way of marking goldfish to tell if we re-caught any. Independence, though; this is the point on which so many probability-based schemes fall.

Independence, in the language of probability, is the principle that one thing’s happening does not affect the likelihood of another thing happening. For our problem, it’s the assumption that one goldfish being caught does not make it any more or less likely that another goldfish will be caught. We like independence, in studying probability. It makes so many problems easier to study, or even possible to study, and it often seems like a reasonable supposition.

A good number of interesting scientific discoveries amount to finding evidence that two things are not actually independent, and that one thing happening makes it more (or less) likely the other will. Sometimes these turn out to be vapor — there was a 19th-century notion suggesting a link between sunspot activity and economic depressions (because sunspots correlate to solar activity, which could affect agriculture, and up to 1893 the economy and agriculture were pretty much the same thing) — but when there is a link the results can be profound, as see the smoking-and-cancer link, or for something promising but still (to my understanding) under debate, the link between leaded gasoline and crime rates.

How this applies to the goldfish population problem, though, is that goldfish are social creatures. They school, loosely, forming and re-forming groups, and would much rather be around another goldfish than not. Even as babies they form these adorable tiny little schools; that may be in the hopes that someone else will get eaten by a bigger fish, but they keep hanging around other fish their own size through their whole lives. If there’s a goldfish inside the trap, it is hard to believe that other goldfish are not going to follow it just to be with the company.

Indeed, the first day we set out the trap for the winter, we pulled in all but one of the adult fish, all of whom apparently followed the others into the enclosure. I’m sorry I couldn’t photograph that because it was both adorable and funny to see so many fish just station-keeping beside one another — they were even all looking in the same direction — and waiting for whatever might happen next. Throughout the months we were able to spend bringing in fish, the best bait we could find was to have one fish already in the trap, and a couple days we did leave one fish in a few more hours or another night so that it would be joined by several companions the next time we checked.

So that’s something which foils the catch and re-catch scheme: goldfish are not independent entities. They’re happy to follow one another into the trap. I would think the catch and re-catch scheme should be salvageable, if it were adapted to the way goldfish actually behave. But that requires a mathematician admitting that he can’t just blunder into a field with an obvious, simple scheme to solve a problem, and instead requires the specialized knowledge and experience of people who are experts in the field, and that of course can’t be done. (For example, I don’t actually know that goldfish behavior is sufficiently non-independent as to make an important difference in a population estimate of this kind. But someone who knew goldfish or carp well could tell me, or tell me how to find out.)

For those curious how the goldfish worked out, though, we were able to spend about two and a half months catching fish before the pond froze over for the winter, though the number we caught each week dropped off as the temperature dropped. We have them floating about in a stock tank in the basement, waiting for the coming of spring and the time the pond will be warm enough for them to re-occupy it. We also know that at least some of the goldfish we didn’t catch made it to, well, about a month ago. I’d seen one of the five orange baby fish who refused to go into the trap through a hole in the ice then. It was holding close to the bottom but seemed to be in good shape.

This coming year should be an exciting one for our fish population.

Last week I chatted a bit about a probabilistic, sampling-based method to estimate the population of fish in our backyard pond. The method estimates the population of a thing, in this case the fish, by capturing a sample of size $M$ and dividing that $M$ by the probability of catching one of the things in your sampling. Since we might not know the chance of catching the thing beforehand, we estimate it: catch some number $n$ of the fish or whatever, then put them back, and then re-catch as many. Some number $m$ of those will be re-caught, so we can estimate the chance of catching one fish as $p = m / n$. So the original population will be somewhere about $N = M / p = M \cdot n / m$.
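As a small illustration, the estimate can be written as a few lines of Python. This is only a sketch of the method as described: the function and argument names are mine, and the numbers in the example are made up rather than taken from any actual catch.

```python
def estimate_population(sample_size: int, marked: int, recaught: int) -> float:
    """Estimate a population as sample_size / p, with p estimated as recaught / marked.

    sample_size: how many fish one setting of the trap brought in
    marked:      how many fish were caught and put back to estimate p
    recaught:    how many of those marked fish turned up again
    """
    if recaught == 0:
        # With no re-catches the estimated probability is zero and the
        # population estimate is undefined (or, if you like, infinite).
        raise ValueError("no marked fish were re-caught; cannot estimate")
    p_hat = recaught / marked
    return sample_size / p_hat

# Made-up numbers: a catch of 6, after 1 of 8 marked fish was re-caught.
print(estimate_population(sample_size=6, marked=8, recaught=1))  # 48.0
```

The zero re-catch case matters more than it looks: with small samples it can happen quite often.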

I want to talk a little bit about why that won’t work.

There is of course the obvious reason to think this will go wrong; it amounts to exactly the same reason why a baseball player with a .250 batting average (meaning the player can expect to get a hit in one out of every four at-bats) might go an entire game without getting on base, or might get on base three times in four at-bats. If something has $N$ chances to happen, and it has a probability $p$ of happening at every chance, it’s most likely that it will happen $N \cdot p$ times, but it can happen more or fewer times than that. Indeed, we’d get a little suspicious if it happened exactly $N \cdot p$ times. If we flipped a fair coin twenty times, it’s most likely to come up tails ten times, but there’s nothing odd about it coming up tails only eight or as many as fourteen times, and it’d stand out if it always came up tails exactly ten times.

To apply this to the fish problem: suppose that there are $N = 50$ fish in the pond; that 50 is the number we want to get. And suppose we know for a fact that every fish has a 12.5 percent chance, $p = 0.125$, of being caught in our trap. Ignore for right now how we know that probability; just pretend we can count on that being exactly true. The expectation value, the most probable number of fish to catch in any attempt, is $N \cdot p = 6.25$ fish, which presents our first obvious problem. Well, maybe a fish might be wriggling around the edge of the net and fall out as we pull the trap out. (This actually happened as I was pulling some of the baby fish in for the winter.)
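For anyone who'd like to see how the chances spread out around that 6.25, here's a minimal sketch that tabulates the binomial distribution for fifty fish, each with a 0.125 chance of being caught. It assumes every fish is caught independently of the others, which is exactly the idealization this scheme leans on; the variable names are mine.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent tries, each with chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

N_FISH = 50
P_CATCH = 0.125

pmf = [binomial_pmf(k, N_FISH, P_CATCH) for k in range(N_FISH + 1)]

# The expectation value is the probability-weighted average catch size.
expected = sum(k * prob for k, prob in enumerate(pmf))
# The most probable catch has to be a whole number of fish.
most_likely = max(range(N_FISH + 1), key=lambda k: pmf[k])

print(f"expected catch: {expected:.2f}")
print(f"most likely catch: {most_likely}")
for k in range(3, 10):
    print(k, round(pmf[k], 3))
```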

With these numbers it’s most probable to catch six fish, slightly less probable to catch five fish, less probable yet to catch seven, then four, then eight, and so on. But these are all tolerably plausible numbers. I used a mathematics package (Octave, an open-source clone of Matlab) to run ten simulated catches, from fifty fish each with a probability of 0.125 of being caught, and came out with these sizes for the fish harvests:

M = 4, 6, 3, 6, 7, 7, 5, 7, 8, 9

Since we know, by some method, that the chance of catching any one fish is exactly 0.125, this implies fish populations of:

M = 4, 6, 3, 6, 7, 7, 5, 7, 8, 9

N = 32, 48, 24, 48, 56, 56, 40, 56, 64, 72

Now, none of these is the right number, although 48 is respectably close and 56 isn’t too bad. But the range is hilarious: there might be as few as 24 or as many as 72 fish, based on just this evidence. That might as well be guessing.
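The experiment is easy to repeat. Below is a sketch in Python rather than Octave, so it's a stand-in for what I ran and not the original script; the seed is arbitrary and the function name is mine. It simulates ten settings of the trap and turns each catch into a population estimate by dividing by the known 0.125.

```python
import random

def simulate_catches(n_fish: int, p_catch: float, n_runs: int, rng: random.Random) -> list[int]:
    """Return how many fish were caught in each of n_runs independent trap settings."""
    return [
        sum(rng.random() < p_catch for _ in range(n_fish))
        for _ in range(n_runs)
    ]

P_CATCH = 0.125
rng = random.Random(2014)  # arbitrary seed, only so the run is repeatable

catches = simulate_catches(n_fish=50, p_catch=P_CATCH, n_runs=10, rng=rng)
# Each catch implies a population of (catch size) / (chance of catching a fish).
estimates = [caught / P_CATCH for caught in catches]

print("caught:   ", catches)
print("estimates:", estimates)
print("range of estimates:", min(estimates), "to", max(estimates))
```

Different seeds give different harvests, which is the whole trouble.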

This is essentially a matter of error analysis. Any one attempt at catching fish may be faulty, because the fish are too shy of the trap, or too eager to leap into it, or are just being difficult for some reason. But we can correct for the flaws of one attempt at fish-counting by repeating the experiment. We can’t always be unlucky in the same ways.

This is conceptually easy, and extremely easy to do on the computer; it’s a little harder in real life but certainly within the bounds of our research budget, since I just have to go out back and put the trap out. And redoing the experiment even pays off, too: average those population samples from the ten simulated runs there and we get a mean estimated fish population of 49.6, which is basically dead on.

(That was lucky, I must admit. Ten attempts isn’t really enough to make the variation comfortably small. Another run with ten simulated catchings produced a mean estimated population of 56; the next one … well, 49.6 again, but the one after that gave me 64. It isn’t until we get into a couple dozen attempts that the mean population estimate gets reliably close to fifty. Still, the work is essentially the same as the problem of “I flipped a fair coin some number of times; it came up tails ten times. How many times did I flip it?” It might have been any number ten or above, but I most probably flipped it about twenty times, and twenty would be your best guess absent more information.)
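Here's a sketch of that averaging, again in Python as a stand-in for the Octave I actually used, with an arbitrary seed. It first checks the 49.6 from the ten harvests listed above, then shows how the mean estimate behaves as the number of simulated attempts grows.

```python
import random
from statistics import mean

def mean_population_estimate(n_fish: int, p_catch: float, n_runs: int, rng: random.Random) -> float:
    """Average the estimate (catch / p_catch) over n_runs simulated trap settings."""
    catches = [
        sum(rng.random() < p_catch for _ in range(n_fish))
        for _ in range(n_runs)
    ]
    return mean(caught / p_catch for caught in catches)

# The ten harvests from the text average out to the 49.6 quoted there.
text_harvests = [4, 6, 3, 6, 7, 7, 5, 7, 8, 9]
text_mean = mean(caught / 0.125 for caught in text_harvests)
print(text_mean)  # 49.6

rng = random.Random(0)  # arbitrary seed, only so the run is repeatable
for n_runs in (10, 30, 100, 1000):
    print(n_runs, round(mean_population_estimate(50, 0.125, n_runs, rng), 1))
```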

The same problem affects working out what the probability of catching a fish is, since we do that by catching some small number of fish and then seeing how many of them, some smaller number, we re-catch later on. Suppose the probability of catching a fish really is $p = 0.125$, but we’re only trying to catch $n = 6$ fish. Here’s a couple rounds of ten simulated catchings of six fish, and how many of those were re-caught:

2, 0, 1, 0, 1, 0, 1, 0, 0, 1,
2, 0, 1, 1, 0, 3, 0, 0, 1, 1,
0, 1, 0, 1, 0, 0, 1, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 2, 1

Obviously any one of those indicates a probability ranging from 0 to 0.5 of re-catching a fish. Technically, yes, 0.125 is a number between 0 and 0.5, but it hasn’t really shown itself. But if we average out all these probabilities … well, those forty attempts give us a mean estimated probability of 0.092. This isn’t excellent but at least it’s in range. If we keep doing the experiment we’d do better; one simulated batch of a hundred experiments turned up a mean estimated probability of 0.12833. (And there’s variations, of course; another batch of 100 attempts estimated the probability at 0.13333, and then the next at 0.10667, though if you use all three hundred of these that gets to an average of 0.12278, which isn’t too bad.)
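A sketch of this second experiment, in Python as a stand-in for my Octave session and with an arbitrary seed: release six marked fish, count how many come back when each has a 0.125 chance, and average the resulting fractions over a batch of experiments. The last lines just check the three-batch average quoted above.

```python
import random
from statistics import mean

def estimate_catch_probability(n_marked: int, p_true: float, n_experiments: int, rng: random.Random) -> float:
    """Mean fraction of n_marked fish re-caught, averaged over n_experiments."""
    recaught = [
        sum(rng.random() < p_true for _ in range(n_marked))
        for _ in range(n_experiments)
    ]
    return mean(count / n_marked for count in recaught)

rng = random.Random(6)  # arbitrary seed, only so the run is repeatable
for n_experiments in (10, 40, 100, 300):
    print(n_experiments, round(estimate_catch_probability(6, 0.125, n_experiments, rng), 5))

# Averaging the three batches of a hundred reported in the text:
three_batches = round(mean([0.12833, 0.13333, 0.10667]), 5)
print(three_batches)  # 0.12278
```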

This inconvenience amounts to a problem of working with small numbers in the original fish population, in the number of fish sampled in any one catching, and in the number of catches done to estimate their population. Small numbers tend to be problems for probability and statistics; the tools grow much more powerful and much more precise when they can work with enormously large collections of things. If the backyard pond held infinitely many fish we could have a much better idea of how many fish were in it.

[ According to the WordPress statistics, trapezoids are just the hook bringing people into here. I didn’t realize there was such a big community of people who need trapezoid information. If I did I’d have played up my search engine terms more. ]

If anyone had doubts about using polynomials as a generally good thing I hope either the doubts or the doubters are quieted now. My next couple goals are simple ones: I want to set up polynomials to interpolate what the population of Charlotte, North Carolina, was around 1975. That is, I’ll be creating at least one equation of the form $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots$ where somehow the right choices of numbers for $a_0$, $a_1$, $a_2$, et cetera will mean if I put the right number in for x I’ll get out of it an estimate of the population. I’ve got symbols. I need to figure what I want them to mean.

[ Curious: one of the search engine terms which brought people here yesterday was “inner obnoxious”. I can think of when I’d used the words together, eg, in a phrase like “your inner obnoxious twelve-year-old”, the person who makes any kind of attempt at instruction difficult. But who’s searching for that? I find also that “the gil blog by norm feuti” and “heavenly nostrils” brought me visitors so, good for everyone, I think. ]

So polynomials have a number of really nice properties. They’re easy to work with, which is a big one. We might work with difficult mathematical objects, but, rather as with people, we’ll only work with the difficult if they offer something worthwhile in trade, such as solving problems we otherwise can’t hope to tackle. Polynomials are nice and friendly, uncomplaining, and as mathematical objects go, quite un-difficult. Polynomials can be used to approximate any function, which is another big one, as long as we don’t take that “any function” too literally. We still have to think about it some. But here’s an advantage so big it’s almost invisible: to evaluate a polynomial we take some number x and raise it to a variety of powers, which we get by multiplying x by itself over and over again. We take each of those powers and multiply them by a corresponding number, a coefficient. We then add up the products of those coefficients with those powers of x. In all that time we’ve done something great.
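That recipe translates almost word for word into code. Here's a minimal sketch; the function name and the convention that the k-th entry of the list is the coefficient of x to the k-th power are my own choices for the illustration. Notice it never needs anything but multiplication and addition.

```python
def evaluate_polynomial(coefficients: list[float], x: float) -> float:
    """Evaluate sum of coefficients[k] * x**k using only multiplying and adding."""
    total = 0.0
    power = 1.0  # x to the zeroth power
    for coefficient in coefficients:
        total += coefficient * power  # add this coefficient times its power of x
        power *= x                    # next power: multiply by x once more
    return total

# 1 + 2x + 3x^2 at x = 2 is 1 + 4 + 12.
print(evaluate_polynomial([1, 2, 3], 2))  # 17.0
```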

Polynomials turn up all over the place. There are multiple good reasons for this. For one, suppose we have any continuous function that we want to study. (“Continuous” has a technical definition, although if you imagine what we might mean by that in ordinary English — that we could draw it without having to lift pen from paper — you’ve got it, apart from freak cases designed to confuse students taking real analysis by making continuous functions that don’t look anything like something you could ever draw, which is jolly good fun until the grades are returned.) If we’re willing to accept a certain margin of error around that function, though, we can always find a polynomial that’s within that margin of error of the function we really want to study. I have read, albeit in secondary sources, that for a while in the 18th century it was thought that a mathematician could just as well define a function as “something that a polynomial can approximate”.

I have a couple of other thoughts about these piecewise constant functions which I’ve been using to make interpolations. The basic idea is simple enough; we pretend the population of Charlotte was a constant number, the 840,347 it happened to be on the 1970 Census Day, and then leapt upwards at some point to the 971,391 it would have on the 1980 Census Day. Maybe it leapt up immediately after the 1970 Census; maybe immediately before the 1980; maybe at the exact middle moment between the two; maybe some other day. Are those all the options we have?
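To make the choices concrete, here's a sketch of a piecewise constant interpolation in which the day of the jump is a parameter. The census figures and the April 1 Census Days are the ones used in these posts; the function name and the July 1, 1975, query date are hypothetical picks for the example.

```python
from datetime import date

POP_1970, POP_1980 = 840_347, 971_391
CENSUS_1970, CENSUS_1980 = date(1970, 4, 1), date(1980, 4, 1)

def piecewise_constant(day: date, jump_day: date) -> int:
    """Hold the 1970 value until jump_day, then the 1980 value from jump_day on."""
    if not CENSUS_1970 <= day <= CENSUS_1980:
        raise ValueError("this interpolation only covers 1970 through 1980")
    return POP_1980 if day >= jump_day else POP_1970

query = date(1975, 7, 1)  # hypothetical "sometime in 1975"

# Halfway between the two Census Days (any leftover half-day is dropped).
midpoint = CENSUS_1970 + (CENSUS_1980 - CENSUS_1970) / 2

print(piecewise_constant(query, jump_day=date(1970, 4, 2)))  # jump right after 1970: 971391
print(piecewise_constant(query, jump_day=CENSUS_1980))       # jump at the 1980 census: 840347
print(midpoint, piecewise_constant(query, jump_day=midpoint))
```

The estimate for any given day is one census value or the other; all the choice of jump day does is decide which.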

[ We didn’t break 3,100 yet, and too bad that. But over the day I did get my first readers from Turkey and the second from the United Arab Emirates that I’ve noticed. Also while my many posts about trapezoids are drawing search engine results, “frazz sequins” comes up a lot. ]

I think I’ve managed, more or less, acceptance that a piecewise constant interpolation makes the simplest way to estimate the population of Charlotte, North Carolina, when all I had to work with was the population data from the 1970 and the 1980 censuses. In 1970 the city had 840,347 people; in 1980 it had 971,391, and therefore the easiest guess to the population in 1975 would be the 1970 value, of 840,347. We suppose that on the 1st of April, 1970 — that Census Day — the population was the lower value, and then sometime before the 1st of April, 1980, it leapt up at once by the 131,044-person difference. Only … how do I know the population jumped up sometime after 1975?

[ I’d like to thank all who’ve read me or passed on links to me for getting my total hit count above 3,000. In fact, as I write this, the total seems to be 3,033, which is a pleasantly 3-ish number. I suppose that it’s ungrateful to look for 4,000 right away, but after all, I do hope to be interesting or useful, and both of those seem to correlate pretty strongly with being read. In any case, I’ll see how long it takes to reach 3,100, and be silent about that if it’s a number of days too embarrassing to mention. ]

The task I’ve set myself is finding an approximation to the population of Charlotte, North Carolina, for the year 1975. The tools I have on hand are the data that I’m fairly sure I believe for Charlotte’s population in 1970 and in 1980. I have to accept one thing or I’ll be hopelessly disappointed ever after: I’m not going to get the right answer. I’m not going to do my job badly, at least not on purpose; it’s just that — barring a remarkable stroke of luck — I won’t get Charlotte’s actual 1975 population. That’s the nature of interpolations (and extrapolations). But there are degrees of wrongness. Guessing that Charlotte had no people in it in 1975, or twenty millions of people, would be obviously ridiculously wrong. Guessing that it had somewhere between 840,347 (its 1970 Census population) and 971,391 (its 1980 Census population) seems much more plausible. So let me make my first interpolation to Charlotte’s 1975 population.

[ I’m grateful to all for the help in reading my pages here. I’ve not quite reached 3,000 hits, but it’s within sight. If you do know of people who might be interested in either what I’m doing now — and it should be clearer after today’s post — or articles I’ve written in the past, please let them know, or let me know if I could be doing better at reaching interested audiences. ]

I left off the list of places I’d lived the city of Charlotte, North Carolina. There’s justice in my doing so. We lived there only for a couple years, when I was extremely young. I have only a few memories of the place, most of them based on the popcorn machine they had in my preschool program. I don’t know what else I got out of that, but I certainly appreciated seeing popcorn pop. Also I had two brothers born then. But, mostly, I can’t say that Charlotte made much of an impression on me. I couldn’t identify any major features of it from memory, and challenged to point to it on a map I might point at Delaware instead, or wander off to find a soda. Plus, I last lived there somewhere around 1975. I can accept that the population of South Amboy, New Jersey, may not have changed very much since the mid-1970s, but not that Charlotte’s hasn’t.

[ I don’t wish to be too shameless here, but I’m closing in on 3,000 visitors to my little blog here. Can we get there? Kindly pass on a reference to people you think might be interested; if I matched my most-popular-ever day I’d reach 3,000 tonight easily. ]

I’ve lived almost my entire life in New Jersey, which has its effects on my world view; for example, it produces an extreme defensiveness about the state — really, has there been a fresh Jersey Joke since Benjamin Franklin’s quip about it being “a barrel tapped at both ends”, and they’re not even sure it wasn’t James Madison who said that instead, if anyone ever did? — and a feeling that one should refer to Bruce Springsteen as “Bruce”, as if we’d ever knowingly been in the same zip code simultaneously. Add to that not understanding what is wrong with other states that you’re forced to pump your own gas, and not being able to get a cackling laughter and a voice-over announcer wailing “Rrrrrrrrraceway Park!” out of the head, and you’ve got a first sketch of my personality. (I seem to have missed going to Action Park. My father insists he took me there; I grant he may have taken my siblings, but I don’t remember ever getting there, and the fact I have all my limbs suggests I never did go there.) But there are some other impressions that one gets from growing up in New Jersey.