Tagged: estimation

  • Joseph Nebus 6:00 pm on Thursday, 16 February, 2017
    Tags: Crock, estimation, Mr Lowe

    Reading the Comics, February 11, 2017: Trivia Edition 

    And now to wrap up last week’s mathematically-themed comic strips. It’s not a set that let me get into any really deep topics however hard I tried overthinking it. Maybe something will turn up for Sunday.

    Mason Mastroianni, Mick Mastroianni, and Perri Hart’s B.C. for the 7th tries setting arithmetic versus celebrity trivia. It’s for the old joke about what everyone should know versus what everyone does know. One might question whether Kardashian pet eating habits are actually things everyone knows. But the joke needs some hyperbole in it to have any vitality and that’s the only available spot for it. It’s easy also to rate stuff like arithmetic as trivia since, you know, calculators. But it is worth knowing that seven squared is pretty close to 50. It comes up when you do a lot of estimates of calculations in your head. The square root of 10 is pretty near 3. The square root of 50 is near 7. The cube root of 10 is a little more than 2. The cube root of 50 is a little more than three and a half. The cube root of 100 is a little more than four and a half. When you see ways to rewrite a calculation in estimates like this, suddenly, a lot of amazing tricks become possible.
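
    To see how rough these shortcuts are, here is a small Python sketch that compares each one with the exact value. The list `shortcuts` and the helper `relative_error` are names made up for this illustration, and 3.5 and 4.5 stand in for "three and a half" and "four and a half".

```python
# Compare the mental-arithmetic shortcuts with the exact values.
# Each entry is (description, exact value, rough estimate).
shortcuts = [
    ("7 squared", 7 ** 2, 50),
    ("square root of 10", 10 ** 0.5, 3),
    ("square root of 50", 50 ** 0.5, 7),
    ("cube root of 10", 10 ** (1 / 3), 2),
    ("cube root of 50", 50 ** (1 / 3), 3.5),
    ("cube root of 100", 100 ** (1 / 3), 4.5),
]


def relative_error(exact, estimate):
    """Fraction by which the estimate misses the exact value."""
    return abs(estimate - exact) / exact


for name, exact, estimate in shortcuts:
    print(f"{name}: exact {exact:.3f}, estimate {estimate}, "
          f"off by {relative_error(exact, estimate):.1%}")
```

    Every one of these lands within about eight percent of the true value, which is usually plenty for a calculation done in your head.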

    Leigh Rubin’s Rubes for the 7th is a “mathematics in the real world” joke. It could be done with any mythological animals, although I suppose unicorns have the advantage of being relatively easy to draw recognizably. Mermaids would do well too. Dragons would also read well, but they’re more complicated to draw.

    Mark Pett’s Mr Lowe rerun for the 8th has the kid resisting the mathematics book. Quentin’s grounds: how can he know a dated book is still relevant? There’s truth to Quentin’s excuse. A mathematical truth may be universal. Whether we find it interesting is a matter of culture and even fashion. There are many ways to present any fact, and the question of why we want to know this fact has as many potential answers as it has people pondering the question.

    Zach Weinersmith’s Saturday Morning Breakfast Cereal for the 8th is a paean to one of the joys of numbers. There is something wonderful in counting, in measuring, in tracking. I suspect it’s nearly universal. We see it reflected in people passing around, say, the number of rivets used in the Chrysler Building or how long a person’s nervous system would reach if stretched out into a line or ever-more-fanciful measures of stuff. Is it properly mathematics? It’s delightful, isn’t that enough?

    Scott Hilburn’s The Argyle Sweater for the 10th is a Fibonacci Sequence joke. That’s a good one for taping to the walls of a mathematics teacher’s office.

    'Did you ever take a date to a drive-in movie in high school?' 'Once, but she went to the concession stand and never came back.' 'Did you wonder why?' 'Yeah, but I kept on doing my math homework.'

    Bill Rechin’s Crock rerun for the 11th of February, 2017. They actually opened a brand-new drive-in theater something like forty minutes away from us a couple years back. We haven’t had the chance to get there. But we did get to one a fair bit farther away where yes, we saw Turbo, that movie about the snail that races in the Indianapolis 500. The movie was everything we hoped for and it’s just a shame Roger Ebert died too young to review it for us.

    Bill Rechin’s Crock rerun for the 11th is a name-drop of mathematics. Really anybody’s homework would be sufficiently boring for the joke. But I suppose mathematics adds the connotation that whatever you’re working on hasn’t got a human story behind it, the way English or History might, and that it hasn’t got the potential to eat, explode, or knock a steel ball into you the way Biology, Chemistry, or Physics have. Fair enough.

  • Joseph Nebus 3:00 pm on Sunday, 27 March, 2016
    Tags: estimation

    Fish, Re-Counted 

    We had a bit of a surprise with our goldfish. Longtime readers might remember my string of essays describing how one might count fish by something other than the process of actually counting them all. In preparing our pond, which isn’t deep enough to be safe against a harsh winter, last October we set out a trap and caught, we believed, all of them. There turned out to be 53 of them when I posted an update.

    Several dozen goldfish, most of them babies, within a 150-gallon rubber stock tank, their wintering home.

    Stock photograph of our goldfish in a stock tank for the winter. Previous winter.

    A couple of weeks after that — on Thanksgiving, it happens — we caught one more fish. This brought the total to 54. And I either failed to make note of it or I can’t find the note I made of it. Such happens.

    In getting the pond ready for the spring, and the return of our goldfish to the outdoors, we found another one! It was just this orange thing dug into the muck of the pool, and we thought initially it was something that had fallen in and gotten lost. A heron scarer, was my love’s first guess. The pond thermometer that sank without trace some years back was mine. I used the grabber to poke at it and woke up a pretty sulky goldfish. It went over to some algae where we couldn’t so easily bother it.

    So that brings our fish count to 55, for those keeping track. Fortunately, it was a very gentle winter in our parts. We’re hoping to bring the goldfish back out to the pond in the next week or two. Our best estimate for the carrying capacity of the pond is 65 to 130 goldfish, so, we will see whether the goldfish do anything about this slight underpopulation.

  • Joseph Nebus 12:00 pm on Tuesday, 27 October, 2015
    Tags: estimation

    Fish Autumn Update 

    Folks who’ve been around a while may remember the matter of our fish. I’d spent some time in the spring describing ways to estimate a population using techniques other than just counting everybody. And then revealed that the population of goldfish in our pond was something like 53, based on counting the fifty which we’d had wintering over in our basement and the three we counted in the pond despite the winter ice. This is known as determining the population “by inspection”.

    It's a pond about ten feet across and maybe two feet deep, with (at the time of the photograph) at most eleven fish in it.

    This is the backyard pond; pictured are several fish, though not all of them.

    I’m disappointed to say that, as best we can work out, they didn’t get around to producing any new goldfish this year. We didn’t see any evidence of babies, and haven’t seen any noticeably small ones swimming around. It’s possible we set them out too late in the spring. It’s possible too that the summer was never quite warm enough for them to feel like it was fish-production time.

    This does mean that we have a reasonably firm upper limit on the number of fish we need to take in. 53 appears to be it. The winter’s been settling in, though, and we’ve started taking them in. This past day we took in twelve. That’s not bad for the first harvest and if we’re lucky we should have the pond emptied in a week or so. I’ll let folks know if there turns out to be a surprise in goldfish cardinality.

  • Joseph Nebus 2:34 pm on Wednesday, 3 June, 2015
    Tags: estimation

    A Summer 2015 Mathematics A To Z: error 


    This is one of my A to Z words that everyone knows. An error is some mistake, evidence of our human failings, to be minimized at all costs. That’s … well, it’s an attitude that doesn’t let you use error as a tool.

    An error is the difference between what we would like to know and what we do know. Usually, what we would like to know is something hard to work out. Sometimes it requires complicated work. Sometimes it requires an infinite amount of work to get exactly right. Who has the time for that?

    This is how we use errors. We look for methods that approximate the thing we want, and that estimate how much of an error that method makes. Usually, the method involves doing some basic step some large number of times. And usually, if we did the step more times, the estimate of the error we make will be smaller. My essay “Calculating Pi Less Terribly” shows an example of this. If we add together more terms from that Leibniz formula we get a running total that’s closer to the actual value of π.
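
    As a rough illustration (not the code behind the essay), here is a Python sketch of that idea for the Leibniz formula. The function names are invented here. The error estimate is the standard alternating-series bound: the running total is off by no more than the size of the first term left out.

```python
import math


def leibniz_pi(n_terms):
    """Add up the first n_terms of 4 * (1 - 1/3 + 1/5 - 1/7 + ...)."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total


def error_bound(n_terms):
    """Alternating-series bound: the size of the first omitted term."""
    return 4 / (2 * n_terms + 1)


for n in (10, 100, 1000, 10000):
    estimate = leibniz_pi(n)
    actual_error = abs(estimate - math.pi)
    print(f"{n:6d} terms: {estimate:.6f}  "
          f"actual error {actual_error:.6f}  bound {error_bound(n):.6f}")
```

    The point of the bound is that it needs no knowledge of π at all. Doing the basic step more times shrinks the bound, and the actual error stays underneath it.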

    • baffledbaboon 3:13 pm on Wednesday, 3 June, 2015

      Whenever I make an error – my partner likes to tell me that I “broke math”. This all stemming from the one time I was given all the steps to the problem and still got an answer that wasn’t even close.


      • Joseph Nebus 10:41 pm on Friday, 5 June, 2015

        Aw, that sort of thing happens to everybody. Mathematicians especially. There’s a bit of folklore that says to never give an arithmetic problem to a mathematician because even if you ever do get an answer from it, it won’t be anything near right.


  • Joseph Nebus 7:06 pm on Sunday, 17 May, 2015
    Tags: alternating series, estimation, Funky Winkerbean, Leibniz

    Calculating Pi Less Terribly 

    Back on “Pi Day” I shared a terrible way of calculating the digits of π. It’s neat in principle, yes. Drop a needle randomly on a uniformly lined surface. Keep track of how often the needle crosses over a line. From this you can work out the numerical value of π. But it’s a terrible method. To be sure that π is about 3.14, rather than 3.12 or 3.38, you can expect to need to do over three and a third million needle-drops. So I described this as a terrible way to calculate π.
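
    For anyone who would rather not drop real needles, here is a minimal simulation sketch in Python. It assumes the needle is exactly as long as the gap between lines (the paragraph above doesn’t say), in which case the chance of a crossing is 2/π. The function name `buffon_pi` and the fixed seed are choices made for this illustration.

```python
import random


def buffon_pi(drops, seed=0):
    """Estimate pi by simulating needle drops on lines spaced 1 apart.

    Assumes the needle has length 1, the same as the line spacing.
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(drops):
        # Distance from the needle's center to the nearest line.
        center = rng.uniform(0.0, 0.5)
        # Pick a random direction without using pi: take a uniform
        # point inside the unit disc and use the direction it points in.
        while True:
            x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r_squared = x * x + y * y
            if 0.0 < r_squared <= 1.0:
                break
        sin_theta = abs(y) / r_squared ** 0.5
        # Half the needle reaches 0.5 * sin(theta) toward the nearest line.
        if center <= 0.5 * sin_theta:
            crossings += 1
    # P(crossing) = 2 / pi, so pi is about 2 * drops / crossings.
    return 2 * drops / crossings


print(buffon_pi(100_000))
```

    Even with a hundred thousand simulated drops the estimate can still wobble in the second decimal place, which gives some feel for why the method is so terrible.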

    A friend on Twitter asked if it was worse than adding up 4 * (1 – 1/3 + 1/5 – 1/7 + … ). It’s a good question. The answer is yes, it’s far worse than that. But I want to talk about working π out that way.

    Science teacher Mark Twain says if you divide the diameter of the moon into its circumference you get pi in the sky, and then laughs so hard at his own joke it causes chest pains.

    Tom Batiuk’s Funky Winkerbean for the 17th of May, 2015. The worst part of this strip is Science Teacher Mark Twain will go back to the teachers’ lounge and complain that none of his students got it.

    This isn’t part of the main post. But the comic strip happened to mention π on a day when I’m talking about π so who am I to resist coincidence?

    • Matthew Wright 9:30 pm on Sunday, 17 May, 2015

      I tried memorising pi once, but for some reason I couldn’t finish. It wasn’t very rational of me. I sort of had to say that. (Actually, I probably didn’t…)


      • Joseph Nebus 5:27 pm on Wednesday, 20 May, 2015

        Aw, not to fear. I don’t think worse of you for saying it. It is the kind of joke people have to say, after all.


    • abyssbrain 3:40 am on Monday, 18 May, 2015

      It’s really difficult to manually calculate pi using a series. William Shanks claimed to have calculated pi manually up to more than 700 digits using Machin’s formula,

      \frac{\pi}{4}=4\arctan \frac{1}{5}-\arctan \frac{1}{239}

      but he erred on the 528th digit, I think. It was a very amazing achievement nonetheless.


      • Joseph Nebus 5:31 pm on Wednesday, 20 May, 2015

        Shanks’s case is interesting, not just because of his great work and tragic error. There is also that museum rotunda that tries to honor him by displaying the digits of pi; it was built before his error was found.

        So the question is: keep the digits he calculated which are wrong, or replace them with the digits he would have calculated had he done the work right? Bearing in mind the purpose is to honor Shanks’s work, and nobody is going to get the digits of pi from reading what is essentially a piece of memorial art.

    • Chow Kim Wan 1:47 am on Wednesday, 3 June, 2015

      From what I know, the Gregory-Leibniz series, while theoretically correct, converges very slowly to the desired value. I tried it once, up to around eight hundred terms. It was a nightmare trying to get the figure to converge to a reasonably good number of decimal places. Some other formulas are more useful for this purpose. This series remains one of theoretical interest and mathematical beauty.


      • Joseph Nebus 10:40 pm on Friday, 5 June, 2015

        Oh, there’s no need to disparage the series as ‘theoretically’ correct; it’s right, no question about that. It’s just a matter of how much work is required to get what you want out of it. As series approximations for pi go, it’s not very efficient. It takes a lot of work to get a few meager decimal places right. But at least it’s very easy to understand.

        If you were stranded on a desert island and needed to calculate the digits of pi for some reason, you could remember this formula well enough and work out its terms well enough. Other formulas would get you more decimal places with fewer terms being calculated, but you have to remember and apply the formulas, and that’s a pain.

        Interestingly, it’s possible to calculate an arbitrary binary digit of pi without working out all the binary digits that come before it. There’s no way to do that for the decimal digits of pi; I forget whether there’s merely no known way to do that, or if it’s known to be impossible to do that. But the result is if you wanted to know just the (say) 2,038 trillionth binary digit of pi, you could work that out without knowing anything about the 2,037,999,999,999,999 digits that came before it.

  • Joseph Nebus 5:44 pm on Saturday, 4 April, 2015
    Tags: estimation

    But How Interesting Is A Basketball Score? 

    When I worked out how interesting, in an information-theory sense, a basketball game — and from that, a tournament — might be, I supposed there was only one thing that might be interesting about the game: who won? Or to be exact, “did (this team) win”? But that isn’t everything we might want to know about a game. For example, we might want to know what a team scored. People often do. So how to measure this?

    The answer was given, in embryo, in my first piece about how interesting a game might be. If you can list all the possible outcomes of something that has multiple outcomes, and how probable each of those outcomes is, then you can describe how much information there is in knowing the result. It’s the sum, for all of the possible results, of the quantity negative one times the probability of the result times the logarithm-base-two of the probability of the result. When we were interested in only whether a team won or lost there were just the two outcomes possible, which made for some fairly simple calculations, and indicates that the information content of a game can be as high as 1 — if the team is equally likely to win or to lose — or as low as 0 — if the team is sure to win, or sure to lose. And the units of this measure are bits, the same kind of thing we use to measure (in groups of bits called bytes) how big a computer file is.
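
    That recipe is short enough to write down directly. Here is a minimal Python sketch; the function name `information_bits` is made up for this illustration, and outcomes with probability zero are skipped since they contribute nothing to the sum.

```python
from math import log2


def information_bits(probabilities):
    """Sum of -p * log2(p) over every possible outcome.

    Outcomes with probability 0 are skipped; they add nothing.
    """
    return sum(-p * log2(p) for p in probabilities if p > 0)


# Win/lose for an evenly matched game, then for a sure thing.
print(information_bits([0.5, 0.5]))
print(information_bits([1.0, 0.0]))
```

    The same function works for a list of final scores: pass in the probability of each possible score rather than just the two probabilities of winning and losing.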

  • Joseph Nebus 6:46 pm on Monday, 30 March, 2015
    Tags: estimation

    But How Interesting Is A Real Basketball Tournament? 

    When I wrote about how interesting the results of a basketball tournament were, and came to the conclusion that it was 63 (and filled in that I meant 63 bits of information), I was careful to say that the outcome of a basketball game between two evenly-matched opponents has an information content of 1 bit. If the game is a foregone conclusion, then the game hasn’t got so much information about it. If the game really is foregone, the information content is 0 bits; you already know what the result will be. If the game is an almost sure thing, there’s very little information to be had from actually seeing the game. An upset might be thrilling to watch, but you would hardly count on that, if you’re being rational. But most games aren’t sure things; we might expect the higher-seed to win, but it’s plausible they don’t. How does that affect how much information there is in the results of a tournament?

    Last year, the NCAA College Men’s Basketball tournament inspired me to look up what the outcomes of various types of matches were, and which teams were more likely to win than others. If some person who wrote something for statistics.about.com is correct, based on 27 years of March Madness outcomes, the play between a number one and a number 16 seed is a foregone conclusion — the number one seed always wins — while number two versus number 15 is nearly sure. So while the first round of play will involve 32 games — four regions, each region having eight games — there’ll be something less than 32 bits of information in all these games, since many of them are so predictable.

    If we take the results from that statistics.about.com page as accurate and reliable as a way of predicting the outcomes of various-seeded teams, then we can estimate the information content of the first round of play at least.

    Here’s how I work it out, anyway:

    Contest               Probability the Higher Seed Wins   Information Content of this Outcome
    #1 seed vs #16 seed   100%                               0 bits
    #2 seed vs #15 seed   96%                                0.2423 bits
    #3 seed vs #14 seed   85%                                0.6098 bits
    #4 seed vs #13 seed   79%                                0.7415 bits
    #5 seed vs #12 seed   67%                                0.9149 bits
    #6 seed vs #11 seed   67%                                0.9149 bits
    #7 seed vs #10 seed   60%                                0.9710 bits
    #8 seed vs #9 seed    47%                                0.9974 bits

    So if the eight contests in a single region were all evenly matched, the information content of that region would be 8 bits. But there’s one sure and one nearly-sure game in there, and there’s only a couple games where the two teams are close to evenly matched. As a result, I make out the information content of a single region to be about 5.392 bits of information. Since there’s four regions, that means the first round of play (the first 32 games) has altogether about 21.567 bits of information.
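
    The table and the totals can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the helper `game_bits` and the list name are invented here, and the win rates are the ones in the table.

```python
from math import log2


def game_bits(p_higher_seed_wins):
    """Information content, in bits, of a two-outcome game."""
    p = p_higher_seed_wins
    return sum(-q * log2(q) for q in (p, 1 - p) if q > 0)


# Chance the higher seed wins: #1 vs #16, #2 vs #15, ..., #8 vs #9.
higher_seed_win_rates = [1.00, 0.96, 0.85, 0.79, 0.67, 0.67, 0.60, 0.47]

for rate in higher_seed_win_rates:
    print(f"{rate:.0%}: {game_bits(rate):.4f} bits")

region_bits = sum(game_bits(rate) for rate in higher_seed_win_rates)
first_round_bits = 4 * region_bits  # four regions
print(f"one region: {region_bits:.3f} bits")
print(f"first round: {first_round_bits:.3f} bits")
```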

    Warning: I used three digits past the decimal point just because three is a nice comfortable number. Do not be hypnotized into thinking this is a more precise measure than it really is. I don’t know what the precise chance of, say, a number three seed beating a number fourteen seed is; all I know is that in a 27-year sample, it happened the higher-seed won 85 percent of the time, so the chance of the higher-seed winning is probably close to 85 percent. And I only know that if whoever it was wrote this article actually gathered and processed and reported the information correctly. I would not be at all surprised if the first round turned out to have only 21.565 bits of information, or as many as 21.568.

    A statistical analysis of the tournaments which I dug up last year indicated that in the last three rounds — the Elite Eight, Final Four, and championship game — the higher- and lower-seeded teams are equally likely to win, and therefore those games have an information content of 1 bit per game. The last three rounds therefore have 7 bits of information total.

    Unfortunately, experimental data seems to fall short for the second round — 16 games, where the 32 winners in the first round play, producing the Sweet Sixteen teams — and the third round — 8 games, producing the Elite Eight. If someone’s done a study of how often the higher-seeded team wins I haven’t run across it.

    There are six of these games in each of the four regions, for 24 games total. Presumably the higher-seeded is more likely than the lower-seeded to win, but I don’t know how much more probable it is the higher-seed will win. I can come up with some bounds: the 24 games total in the second and third rounds can’t have an information content less than 0 bits, since they’re not all foregone conclusions. The higher-ranked seed won’t win all the time. And they can’t have an information content of more than 24 bits, since that’s how much there would be if the games were perfectly even matches.

    So, then: the first round carries about 21.567 bits of information. The second and third rounds carry between 0 and 24 bits. The fourth through sixth rounds (the sixth round is the championship game) carry seven bits. Overall, the 63 games of the tournament carry between 28.567 and 52.567 bits of information. I would expect that many of the second-round and most of the third-round games are pretty close to even matches, so I would expect the higher end of that range to be closer to the true information content.

    Let me make the assumption that in this second and third round the higher-seed has roughly a chance of 75 percent of beating the lower seed. That’s a number taken pretty arbitrarily as one that sounds like a plausible but not excessive advantage the higher-seeded teams might have. (It happens it’s close to the average you get of the higher-seed beating the lower-seed in the first round of play, something that I took as confirming my intuition about a plausible advantage the higher seed has.) If, in the second and third rounds, the higher-seed wins 75 percent of the time and the lower-seed 25 percent, then the outcome of each game is about 0.8113 bits of information. Since there are 24 games total in the second and third rounds, that suggests the second and third rounds carry about 19.471 bits of information.

    Warning: Again, I went to three digits past the decimal just because three digits looks nice. Given that I do not actually know the chance a higher-seed beats a lower-seed in these rounds, and that I just made up a number that seems plausible, you should not be surprised if the actual information content turns out to be 19.468 or even 19.472 bits of information.

    Taking all these numbers, though — the first round with its something like 21.567 bits of information; the second and third rounds with something like 19.471 bits; the fourth through sixth rounds with 7 bits — the conclusion is that the win/loss results of the entire 63-game tournament are about 48 bits of information. It’s a bit higher the more unpredictable the games involving the final 32 and the Sweet 16 are; it’s a bit lower the more foregone those conclusions are. But 48 bits sounds like a plausible enough answer to me.
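
    Here is the whole tally as a Python sketch, for anyone who wants to try a different guess than 75 percent. The names are made up for this illustration, the first-round figure is the 21.567 bits quoted above, and the 75 percent is the same arbitrary assumption made in the text.

```python
from math import log2


def game_bits(p_higher_seed_wins):
    """Information content, in bits, of a two-outcome game."""
    p = p_higher_seed_wins
    return sum(-q * log2(q) for q in (p, 1 - p) if q > 0)


first_round_bits = 21.567          # from the first-round table
middle_games = 24                  # second and third rounds
late_games = 7                     # fourth through sixth rounds

assumed_middle_rate = 0.75         # a guess, not a measured value
middle_bits = middle_games * game_bits(assumed_middle_rate)
late_bits = late_games * game_bits(0.5)

# Bounds: middle rounds all foregone (0 bits) or all even (1 bit each).
lowest_total = first_round_bits + 0 + late_bits
highest_total = first_round_bits + middle_games + late_bits
total_bits = first_round_bits + middle_bits + late_bits

print(f"middle rounds: {middle_bits:.3f} bits")
print(f"bounds: {lowest_total:.3f} to {highest_total:.3f} bits")
print(f"estimate: {total_bits:.3f} bits")
```

    Changing `assumed_middle_rate` to 0.6 or 0.9 shows how much the final answer leans on that one guess.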

  • Joseph Nebus 6:39 pm on Thursday, 26 February, 2015
    Tags: animal research, estimation, mice

    How Not To Count Fish 

    I’d discussed a probability/sampling-based method to estimate the number of fish that might be in our pond out back, and then some of the errors that have to be handled if you want to have a reliable result. Now, I want to get into why the method doesn’t work, at least not without much greater insight into goldfish behavior than simply catching a couple and releasing them will do.

    Catching a sample, re-releasing it, and counting how many of that sample we re-catch later on is a logically valid method, provided certain assumptions the method requires are accurately — or at least accurately enough — close to the way the actual thing works. Here are some of the ways goldfish fall short of the ideal.
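
    For reference, the arithmetic of the idealized method is tiny. This essay doesn’t restate the formula, so the sketch below assumes the usual textbook version (commonly called the Lincoln-Petersen estimator): if the marked fish turn up in the second catch in the same proportion as they make up the whole pond, the population is about the first catch times the second catch, divided by the number of re-caught fish. The function name and the example counts are hypothetical, not measurements from our pond.

```python
def estimate_population(marked_first, caught_second, recaptured):
    """Capture-recapture estimate under the idealized assumptions.

    marked_first:  fish caught, marked, and released the first time
    caught_second: fish caught the second time
    recaptured:    fish in the second catch that carry a mark
    """
    if recaptured == 0:
        raise ValueError("no marked fish re-caught; the estimate is undefined")
    return marked_first * caught_second / recaptured


# Hypothetical counts: mark 10 fish, later catch 12, find 2 marked.
print(estimate_population(10, 12, 2))
```

    Every faulty assumption below is a reason the proportion in the second catch might not match the proportion in the pond.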

    First faulty assumption: Goldfish are perfectly identical. In this goldfish-trapping we make the assumption that there is some fixed, constant probability of a goldfish being caught in the net. We have to assume that this is the same number for every goldfish, and that it doesn’t change as goldfish go through the experience of getting caught and then released. But goldfish have personality, as you learn if you have a bunch in a nice setting and do things like try feeding them koi treats or introduce something new like a wire-mesh trap to their environment. Some are adventurous and will explore the unfamiliar thing; some are shy and will let everyone else go first and then maybe not bother going at all. I empathize with both positions.

    If there are enough goldfish, the variation between personalities is probably not going to matter much. There’ll be some that are easy to catch, and they’ll probably be roughly as common as the ones who can’t be coaxed into the trap at all. It won’t be exactly balanced unless we’re very lucky, but this would probably only throw off our calculations a little bit.

    Whether the goldfish learn, and become more, or less, likely to be trapped in time is harder. Goldfish do learn, certainly, although it’s not obvious to me that the trapping and releasing experience would be one they draw much of a lesson from. It’s only a little inconvenience, really, and not at all harmful; what should they learn? Other than that there’s maybe an easy bit of food to be had here so why not go in? So this might change their behavior and it’s hard to predict how.

    (I note that animal capture studies get quite frustrated when the animals start working out how to game the folks studying them. Bil Gilbert’s early-70s study of coatis — Latin American raccoons, written up in the lovely popularization Chulo: A Year Among The Coatimundis — was plagued by some coatis who figured out going into the trap was an easy, safe meal they’d be released from without harm, and wouldn’t go back about their business and leave room for other specimens.)

    Second faulty assumption: Goldfish are not perfectly identical. This is the biggest challenge to counting goldfish population by re-catching a sample of them. How do you know if you caught a goldfish before? When they grow to adulthood, it’s not so bad, since they grow fairly distinctive patterns of orange and white and black and such, and they’ll usually settle into different sizes. (That said, we do have two adult fish who were very distinct when we first got them, but who’ve grown into near-twins.)

    But baby goldfish? They’re basically all tiny black things, meant to hide in the mud at the bottom of ponds and rivers (their preferred habitat) and pretty near indistinguishable. As they get larger they get distinguishable, a bit, and start to grow patterns, but for the vast number of baby fish there’s just no telling one from another.

    When we were trying to work out whether some mice we found in the house were ones we had previously caught and put out in the garage, we were able to mark them by squirting some food dye at their heads as they were released. The mice would rub the food dye from their heads onto their whole bodies and it would take a while before the dye would completely fade out. (We didn’t re-catch any mice, although it’s hard to dye a wild mouse efficiently because they will take off like bullets. Also one time when we thought we’d captured one there were actually three in the humane trap and you try squirting the food dye bottle at two more mice than you thought were there, fleeing.) But you can see how the food dye wouldn’t work here. Animal researchers with a budget might go on to attach collars or somehow otherwise mark animals, but if there’s a way to mark and track goldfish with ordinary household items I can’t think of it.

    (No, we will not be taking the bits of americium in our smoke detectors out and injecting them into trapped goldfish; among the objections, I don’t have a radioactivity detector.)

    Third faulty assumption: Goldfish are independent entities. The first two faulty assumptions are ones that could be kind of worked around. If there’s enough goldfish then the distribution of how likely any one is to get caught will probably be near enough normal that we can pretend there’s an identical chance of catching each, and if we really thought about it we could probably find some way of marking goldfish to tell if we re-caught any. Independence, though; this is the point on which so many probability-based schemes fall.

    Independence, in the language of probability, is the principle that one thing’s happening does not affect the likelihood of another thing happening. For our problem, it’s the assumption that one goldfish being caught does not make it any more or less likely that another goldfish will be caught. We like independence, in studying probability. It makes so many problems easier to study, or even possible to study, and it often seems like a reasonable supposition.
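
    In symbols, and with labels that are mine rather than anything standard to goldfish: let A be the event that one particular goldfish ends up in the trap and B the event that a second particular goldfish does. Independence is the statement that

```latex
P(A \cap B) = P(A)\,P(B)
\qquad\text{equivalently}\qquad
P(B \mid A) = P(B) \quad \text{whenever } P(A) > 0 .
```

    The second form is the one that matters for trapping: knowing the first fish is already in the trap should tell you nothing about whether the second one is.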

    A good number of interesting scientific discoveries amount to finding evidence that two things are not actually independent, and that one thing happening makes it more (or less) likely the other will. Sometimes these turn out to be vapor — there was a 19th-century notion suggesting a link between sunspot activity and economic depressions (because sunspots correlate to solar activity, which could affect agriculture, and up to 1893 the economy and agriculture were pretty much the same thing) — but when there is a link the results can be profound, as see the smoking-and-cancer link, or for something promising but still (to my understanding) under debate, the link between leaded gasoline and crime rates.

    How this applies to the goldfish population problem, though, is that goldfish are social creatures. They school, loosely, forming and re-forming groups, and would much rather be around another goldfish than not. Even as babies they form these adorable tiny little schools; that may be in the hopes that someone else will get eaten by a bigger fish, but they keep hanging around other fish their own size through their whole lives. If there’s a goldfish inside the trap, it is hard to believe that other goldfish are not going to follow it just to be with the company.

    Indeed, the first day we set out the trap for the winter, we pulled in all but one of the adult fish, all of whom apparently followed the others into the enclosure. I’m sorry I couldn’t photograph that because it was both adorable and funny to see so many fish just station-keeping beside one another — they were even all looking in the same direction — and waiting for whatever might happen next. Throughout the months we were able to spend bringing in fish, the best bait we could find was to have one fish already in the trap, and a couple days we did leave one fish in a few more hours or another night so that it would be joined by several companions the next time we checked.

    So that’s something which foils the catch and re-catch scheme: goldfish are not independent entities. They’re happy to follow one another into the trap. I would think the catch and re-catch scheme should be salvageable, if it were adapted to the way goldfish actually behave. But that requires a mathematician admitting that he can’t just blunder into a field with an obvious, simple scheme to solve a problem, and instead requires the specialized knowledge and experience of people who are experts in the field, and that of course can’t be done. (For example, I don’t actually know that goldfish behavior is sufficiently non-independent as to make an important difference in a population estimate of this kind. But someone who knew goldfish or carp well could tell me, or tell me how to find out.)

    Several dozen goldfish, most of them babies, within a 150-gallon rubber stock tank, their wintering home.

    Goldfish brought indoors, to a stock tank, for the winter.

    For those curious how the goldfish worked out, though, we were able to spend about two and a half months catching fish before the pond froze over for the winter, though the number we caught each week dropped off as the temperature dropped. We have them floating about in a stock tank in the basement, waiting for the coming of spring and the time the pond will be warm enough for them to re-occupy it. We also know that at least some of the goldfish we didn’t catch made it to, well, about a month ago. Through a hole in the ice then, I’d seen one of the five orange baby fish who refused to go into the trap. It was holding close to the bottom but seemed to be in good shape.

    This coming year should be an exciting one for our fish population.

    • AR 3:06 am on Friday, 27 February, 2015 Permalink | Reply

      I’m interested in this idea of statistical independence. By this measure, people are not very independent either!


      • Joseph Nebus 3:26 am on Friday, 27 February, 2015 Permalink | Reply

        Well, it depends on the frame, really. In some ways people can be treated as statistically independent entities: all the people in a mall food court, for example, could be modeled as equally likely as every other person to go to the McDonald’s, the sandwich shop, the Sbarro’s, or the mediterranean food grill, at least for the purpose of figuring out pedestrian traffic flows, or how tables would be best arranged, or matters like that.


        • AR 11:11 am on Friday, 27 February, 2015 Permalink | Reply

          Is that because there are so many people going through a mall, making group decision-making and following statistically unimportant?

          I think I see dependence sometimes on facebook or Amazon reviews… whoever makes the first comment on a post or leaves the first review on a movie usually sets the tone for the majority of the other reviews and comments. It seems like social media has revealed people’s overall lack of individuality and their tendency to follow. (Blogging is, of course, the shining exception – the online bastion of originality.)

          So for instance, I wonder what the outcome would be if one of those restaurants got models to walk in and out of their doors all day, and sit at outside tables laughing and eating and raving about the food.


          • Joseph Nebus 8:10 pm on Saturday, 28 February, 2015 Permalink | Reply

            Well, independence in this context just means that the probability that (say) this party is going to that food stall doesn’t depend on what any other parties are doing. That is, it doesn’t matter if the last four parties entering all went to the hot dog place; that doesn’t make the next party to enter more — or less — likely to go to the pizza place.

            (There are limits on this, of course. If there’s a line of 200 people at the hot dog place and nobody anywhere else, I would skip my hot dog plans. Or suppose that the hot dogs have got to be too good to pass up, I guess.)

            The sort of dependence you’re describing seems to be more of an anchoring effect, one of the strange and creepy aspects of human decision-making. If asked to give their opinion on something, people will tend to give an answer that’s close to whatever the last thing they heard was, even if it had nothing to do with whatever they’re supposed to decide. The tone-setting effect of the first comments is probably a reflection of that effect in less-quantifiable matters.


    • Aquileana 5:25 am on Friday, 27 February, 2015 Permalink | Reply

      This is so interesting Joseph… I agree with the commenter above as I found that the excerpt called faulty assumption: Goldfish are independent entities, were outstanding.
      Thanks for sharing this information and problems on statistics. Best wishes :star: Aquileana :D


    • Angie Mc 5:56 pm on Friday, 6 March, 2015 Permalink | Reply

      I’m going to read together with my 9 year old son who loves fish and real life math. Thanks, Joseph :D


      • Joseph Nebus 1:05 am on Sunday, 8 March, 2015 Permalink | Reply

        Aw, good. Hope you enjoy.

        We’re hoping the weather will be warm enough in the next month that we can transfer the fish back into the pond. I can think of a mathematical problem that results from this, but it’s a less obviously appealing one than simply “estimate the pond’s fish population”.


        • Angie Mc 4:16 am on Monday, 9 March, 2015 Permalink | Reply

          Have I mentioned my son-in-law who is a grad student in physics? Once he gets through prelims this spring (he’s hyper studying), I will send him your blog. I think he’ll really like it.


          • Joseph Nebus 12:01 am on Tuesday, 10 March, 2015 Permalink | Reply

            I don’t remember that you had. That’s neat and I do hope he enjoys, although I also remember keenly how much work went into prelims. Analysis particularly was a hard one for me to get through.


            • Angie Mc 1:13 am on Tuesday, 10 March, 2015 Permalink | Reply

              I love my son-in-law as a son. I met him when he was 19 and contemplating going to college for math. He’s first generation college and I couldn’t be more proud of him. We are close and I’m feeling his prelim pain!


              • Joseph Nebus 7:54 pm on Thursday, 12 March, 2015 Permalink | Reply

                Oh, that’s great for him. I hope the prelims go well. Grad school was wonderful at least for me, and my love, although it is a lot of very particular challenges, many of them more of endurance and tolerance for the grad school lifestyle than anything else. I was able to come out the far end successfully, but it was a close-run thing at points, with prelims one of those points.


  • Joseph Nebus 11:13 pm on Wednesday, 18 February, 2015 Permalink | Reply
    Tags: error analysis, estimation

    How To Re-Count Fish 

    Last week I chatted a bit about a probabilistic, sampling-based method to estimate the population of fish in our backyard pond. The method estimates the population N of a thing, in this case the fish, by capturing a sample of size M and dividing that M by the probability of catching one of the things in your sampling. Since we might not know the chance of catching the thing beforehand, we estimate it: catch some number n of the fish or whatever, then put them back, and then re-catch as many. Some number m of those will be re-caught, so we can estimate the chance of catching one fish as \frac{m}{n} . So the original population will be somewhere about N = M \div \frac{m}{n} = M \cdot \frac{n}{m} .

    I want to talk a little bit about why that won’t work.

    There is of course the obvious reason to think this will go wrong; it amounts to exactly the same reason why a baseball player with a .250 batting average — meaning the player can expect to get a hit in one out of every four at-bats — might go an entire game without getting on base, or might get on base three times in four at-bats. If something has N chances to happen, and it has a probability p of happening at every chance, it’s most likely that it will happen N \cdot p times, but it can happen more or fewer times than that. Indeed, we’d get a little suspicious if it happened exactly N \cdot p times. If we flipped a fair coin twenty times, it’s most likely to come up tails ten times, but there’s nothing odd about it coming up tails only eight or as many as fourteen times, and it’d stand out if it always came up tails exactly ten times.
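
    As a rough illustration of that spread, here is a minimal Python sketch of the binomial probabilities behind the coin and batting examples. The function name `binomial_pmf` is made up for this sketch, and it assumes every flip or at-bat is independent with the same probability.

```python
from math import comb

def binomial_pmf(k, trials, p):
    """Chance of exactly k successes in `trials` independent tries, each with probability p."""
    return comb(trials, k) * p**k * (1 - p)**(trials - k)

# Twenty flips of a fair coin: chance of exactly 8, 10, or 14 tails.
coin = {tails: binomial_pmf(tails, 20, 0.5) for tails in (8, 10, 14)}
for tails, prob in coin.items():
    print(f"{tails} tails: {prob:.3f}")

# A .250 hitter with four at-bats: chance of no hits at all.
no_hits = binomial_pmf(0, 4, 0.25)
print(f"hitless game: {no_hits:.3f}")
```

    Ten tails comes out as the single likeliest count, yet it turns up less than one time in five, and the hitless game happens a bit less than a third of the time.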

    To apply this to the fish problem: suppose that there are N = 50 fish in the pond; that 50 is the number we want to get. And suppose we know for a fact that every fish has a 12.5 percent chance — p = 0.125 — of being caught in our trap. Ignore for right now how we know that probability; just pretend we can count on that being exactly true. The expectation value, the most probable number of fish to catch in any attempt, is N \cdot p = 50 \cdot 0.125 = 6.25 fish, which presents our first obvious problem. Well, maybe a fish might be wriggling around the edge of the net and fall out as we pull the trap out. (This actually happened as I was pulling some of the baby fish in for the winter.)
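
    The same arithmetic for the pond can be sketched in Python. This is only an illustration under the stated assumptions (fifty fish, each independently caught with probability 0.125); the name `catch_probability` is hypothetical.

```python
from math import comb

def catch_probability(k, fish=50, p=0.125):
    """Chance that exactly k of `fish` fish are caught, each independently with chance p."""
    return comb(fish, k) * p**k * (1 - p)**(fish - k)

expected = 50 * 0.125  # 6.25 fish, the long-run average catch

# Probabilities of the whole-number catches near that average.
table = {k: catch_probability(k) for k in range(3, 10)}
for k, prob in table.items():
    print(f"{k} fish: {prob:.3f}")

most_likely = max(table, key=table.get)
print("likeliest whole-number catch:", most_likely)
```

    The average of 6.25 can never be an actual catch; the likeliest whole-number catch under these assumptions is six.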

    It's a pond about ten feet across and maybe two feet deep, with (at the time of the photograph) at most eleven fish in it.

    This is the backyard pond; pictured are several fish, though not all of them.

    With these numbers it’s most probable to catch six fish, slightly less probable to catch five fish, less probable yet to catch seven, then four, then eight, and so on. But these are all tolerably plausible numbers. I used a mathematics package (Octave, an open-source clone of Matlab) to run ten simulated catches, from fifty fish each with a probability of .125 of being caught, and came out with these sizes M for the fish harvests:

    M = 4 6 3 6 7 7 5 7 8 9

    Since we know, by some method, that the chance p of catching any one fish is exactly 0.125, this implies fish populations N = M \div p of:

    M = 4 6 3 6 7 7 5 7 8 9
    N = 32 48 24 48 56 56 40 56 64 72

    Now, none of these is the right number, although 48 is respectably close and 56 isn’t too bad. But the range is hilarious: there might be as few as 24 or as many as 72 fish, based on just this evidence. That might as well be guessing.
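
    For anyone wanting to try this at home, a simulated catch takes only a few lines. What follows is a Python sketch, not the Octave session that produced the numbers above; the function name and the seed are arbitrary choices, and a different seed will give different catches.

```python
import random

def simulate_catch(population, p, rng):
    """Count how many of `population` fish land in the trap, each with chance p."""
    return sum(1 for _ in range(population) if rng.random() < p)

rng = random.Random(2015)  # arbitrary seed, only so the run repeats
catches = [simulate_catch(50, 0.125, rng) for _ in range(10)]
estimates = [m / 0.125 for m in catches]  # N = M / p for each catch

print("M =", catches)
print("N =", estimates)
print("range of estimates:", min(estimates), "to", max(estimates))
```

    Because each estimate is just the catch multiplied by eight, a difference of one fish in the trap moves the population estimate by eight fish.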

    This is essentially a matter of error analysis. Any one attempt at catching fish may be faulty, because the fish are too shy of the trap, or too eager to leap into it, or are just being difficult for some reason. But we can correct for the flaws of one attempt at fish-counting by repeating the experiment. We can’t always be unlucky in the same ways.

    This is conceptually easy, and extremely easy to do on the computer; it’s a little harder in real life but certainly within the bounds of our research budget, since I just have to go out back and put the trap out. And redoing the experiment even pays off, too: average those population samples from the ten simulated runs there and we get a mean estimated fish population of 49.6, which is basically dead on.

    (That was lucky, I must admit. Ten attempts isn’t really enough to make the variation comfortably small. Another run with ten simulated catchings produced a mean estimated population of 56; the next one … well, 49.6 again, but the one after that gave me 64. It isn’t until we get into a couple dozen attempts that the mean population estimate gets reliably close to fifty. Still, the work is essentially the same as the problem of “I flipped a fair coin some number of times; it came up tails ten times. How many times did I flip it?” It might have been any number ten or above, but I most probably flipped it about twenty times, and twenty would be your best guess absent more information.)
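
    How much the averaged estimate wanders depends on how many catches go into it. Here is a hedged Python sketch of that, again assuming fifty independent fish with p = 0.125; `mean_estimate` is a made-up helper and the seed is arbitrary.

```python
import random
from statistics import mean

def mean_estimate(attempts, rng, fish=50, p=0.125):
    """Average the population estimates M / p over several simulated catches."""
    catches = [sum(rng.random() < p for _ in range(fish)) for _ in range(attempts)]
    return mean(m / p for m in catches)

# The ten catches listed earlier in the post average out to 49.6.
listed = [4, 6, 3, 6, 7, 7, 5, 7, 8, 9]
listed_mean = mean(m / 0.125 for m in listed)
print("listed runs:", listed_mean)

rng = random.Random(50)  # arbitrary seed, only so the run repeats
for attempts in (10, 30, 300):
    batch = [mean_estimate(attempts, rng) for _ in range(5)]
    print(attempts, "attempts:", [round(x, 1) for x in batch])
```

    Each printed row holds five separate averages, so the scatter within a row gives a feel for how far any one average of that size might land from fifty.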

    The same problem affects working out what the probability of catching a fish is, since we do that by catching some small number n of fish and then seeing how many of them, some smaller number m, we re-catch later on. Suppose the probability of catching a fish really is p = 0.125 , but we’re only trying to catch n = 6 fish. Here’s a couple rounds of ten simulated catchings of six fish, and how many of those were re-caught:

    2 0 1 0 1 0 1 0 0 1
    2 0 1 1 0 3 0 0 1 1
    0 1 0 1 0 0 1 0 0 0
    1 0 0 0 0 0 0 0 2 1

    Obviously any one of those indicates a probability ranging from 0 to 0.5 of re-catching a fish. Technically, yes, 0.125 is a number between 0 and 0.5, but it hasn’t really shown itself. But if we average out all these probabilities … well, those forty attempts give us a mean estimated probability of 0.092. This isn’t excellent but at least it’s in range. If we keep doing the experiment we’d do better; one simulated batch of a hundred experiments turned up a mean estimated probability of 0.12833. (And there’s variations, of course; another batch of 100 attempts estimated the probability at 0.13333, and then the next at 0.10667, though if you use all three hundred of these that gets to an average of 0.12278, which isn’t too bad.)
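
    The averages quoted here can be checked directly from the forty listed re-catch counts. This is a small Python sketch of that bookkeeping; the variable names are invented for the illustration.

```python
from statistics import mean

N_FIRST_CATCH = 6  # fish caught and released in each trial

recaught = [
    2, 0, 1, 0, 1, 0, 1, 0, 0, 1,
    2, 0, 1, 1, 0, 3, 0, 0, 1, 1,
    0, 1, 0, 1, 0, 0, 1, 0, 0, 0,
    1, 0, 0, 0, 0, 0, 0, 0, 2, 1,
]

# Each trial gives its own estimate m / n; average all forty of them.
p_hat = mean(m / N_FIRST_CATCH for m in recaught)
print(round(p_hat, 3))

# Largest single-trial estimate: three re-caught out of six.
print(max(recaught) / N_FIRST_CATCH)

# Pooling the three batches of one hundred quoted in the text.
pooled = mean([0.12833, 0.13333, 0.10667])
print(round(pooled, 5))
```

    Twenty-two fish re-caught over forty trials of six is 22/240, which is where the 0.092 comes from.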

    This inconvenience amounts to a problem of working with small numbers in the original fish population, in the number of fish sampled in any one catching, and in the number of catches done to estimate their population. Small numbers tend to be problems for probability and statistics; the tools grow much more powerful and much more precise when they can work with enormously large collections of things. If the backyard pond held infinitely many fish we could have a much better idea of how many fish were in it.

  • Joseph Nebus 5:33 pm on Friday, 13 February, 2015 Permalink | Reply
    Tags: estimation, population estimates

    How To Count Fish 

    We have a pond out back, and in 2013, added some goldfish to it. The goldfish, finding themselves in a comfortable spot with clean water, went about the business of making more goldfish. They didn’t have much time to do that before winter of 2013, but they had a very good summer in 2014, producing so many baby goldfish that we got a bit tired of discovering new babies. The pond isn’t quite deep enough that we could be sure it was safe for them to winter over, so we had to work out moving them to a tub indoors. This required, among other things, having an idea how many goldfish there were. The question then was: how many goldfish were in the pond?

    It's a pond about ten feet across and maybe two feet deep, with (at the time of the photograph) at most eleven fish in it.

    This is the backyard pond; pictured are several fish, though not all of them.

    It’s not hard to come up with a maximum estimate: a goldfish needs some amount of water to be healthy. Wikipedia seems to suggest a single fish needs about twenty gallons — call it 80 liters — and I’ll accept that since it sounds plausible enough and it doesn’t change the logic of the maximum estimate if the number is actually something different. The pond’s about ten feet across, and roughly circular, and not quite two feet deep. Call that a circular cylinder, with a diameter of three meters, and a depth of two-thirds of a meter, and that implies a volume of about pi times (3/2) squared times (2/3) cubic meters. That’s about 4.7 cubic meters, or 4700 liters. So there probably would be at most 60 goldfish in the pond. Could the goldfish have reached the pond’s maximum carrying capacity that quickly? Easily; you would not believe how fast goldfish will make more goldfish given fresh water and a little warm weather.
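
    The maximum estimate is short enough to write out as a calculation. This Python sketch uses the rounded figures from the paragraph above (a three-meter cylinder two-thirds of a meter deep, 80 liters per fish); the constant and function names are made up here.

```python
from math import pi

LITERS_PER_FISH = 80      # roughly twenty gallons, the figure quoted above
DIAMETER_M = 3.0          # about ten feet
DEPTH_M = 2.0 / 3.0       # not quite two feet

def cylinder_volume_liters(diameter_m, depth_m):
    """Volume of a circular cylinder, converted from cubic meters to liters."""
    radius = diameter_m / 2
    return pi * radius**2 * depth_m * 1000

volume = cylinder_volume_liters(DIAMETER_M, DEPTH_M)
max_fish = volume / LITERS_PER_FISH
print(round(volume), "liters, room for about", round(max_fish), "fish")
```

    Changing `LITERS_PER_FISH` rescales the answer but leaves the logic of the estimate alone.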

    It can be a little harder to quite believe in the maximum estimate. For one, smaller fish don’t need as much water as bigger ones do and the baby fish are, after all, small. Or, since we don’t really know how deep the pond is — it’s not a very regular bottom, and it’s covered with water — might there be even more water and thus capacity for even more fish? That might sound ridiculous but consider: an error of two inches in my estimate of the pond’s depth amounts to a difference of 350 liters or room for four or five fish.
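
    That sensitivity to the depth is easy to check. A quick Python sketch, assuming the same three-meter cylinder and 80 liters per fish as in the maximum estimate:

```python
from math import pi

INCH_M = 0.0254           # meters per inch
RADIUS_M = 1.5            # half of the three-meter diameter
LITERS_PER_FISH = 80

# Extra water if the pond is really two inches deeper than guessed.
extra_depth_m = 2 * INCH_M
extra_liters = pi * RADIUS_M**2 * extra_depth_m * 1000
extra_fish = extra_liters / LITERS_PER_FISH
print(round(extra_liters), "liters, or about", round(extra_fish, 1), "more fish")
```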

    We can turn to probability, though. If we have some way of catching fish — and we have; we’ve got a wire trap and a mesh trap, which we’d use for bringing in fish — we could set them out and see how many fish we can catch. If we suppose there’s a certain probability p of catching any one fish, and if there are N fish in the pond any of which might be caught, then we could expect that some number M =  N \cdot p fish are going to be caught. So if, say, we have a one-in-three chance of catching a fish, and after trying we’ve got some number M fish — let’s say there were 8 caught, so we have some specific number to play with — we could conclude that there must have been about M \div p = 8 \div \frac{1}{3} or 24 fish in the population to catch.

    This does bring up the problem of how to guess what the probability of catching any one fish is. But if we make some reasonable-sounding assumptions we can get an estimate of that: set out the traps and catch some number, call it n , of fish. Then set them back and after they’ve had time to recover from the experience, put the traps out again to catch n fish again. We can expect that of that bunch there will be some number, call it m , of the fish we’d previously caught. The ratio of the fish we catch twice to the number of fish we caught in the first place should be close to the chance of catching any one fish.

    So let’s lay all this out. If there are some unknown number N fish in the pond, and there is a chance of \frac{m}{n} of any one fish being caught, and we’ve caught in seriously trying M fish, then: M = N \cdot \frac{m}{n} and therefore N = M \cdot \frac{n}{m} .

    For example, suppose in practice we caught ten fish, and were able to re-catch four of them. Then in trying seriously we caught twelve fish. From this we’d conclude that n = 10, m = 4, M = 12 and therefore there are about N = M \cdot \frac{n}{m} = 12 \cdot \frac{10}{4} = 30 fish in the pond.

    Or if in practice we’d caught twelve fish, five of them a second time, and then in trying seriously we caught eleven fish. Then since n = 12, m = 5, M = 11 we get an estimate of N = M \cdot \frac{n}{m} = 11 \cdot \frac{12}{5} = 26.4 or call it 26 fish in the pond.

    Or for another variation: suppose the first time out we caught nine fish, and the second time around, catching another nine, we re-caught three of them. If we’re feeling a little lazy we can skip going around and catching fish again, and just use the figures that n = 9, m = 3, M = 9 and from that conclude there are about N = 9 \cdot \frac{9}{3} = 27 fish in the pond.

    So, in principle, if we’ve made assumptions about the fish population that are right, or at least close enough to right, we can estimate what the fish population is without having to go to the work of catching every single one of them.

    Since this is a generally useful scheme for estimating a population let me lay it out in an easy-to-follow formula.

    To estimate the size of a population of N things, assuming that they are all equally likely to be detected by some system (being caught in a trap, being photographed by someone at a spot, anything), try this:

    1. Catch some particular number n of the things. Then let them go back about their business.
    2. Catch another n of them. Count the number m of them that you caught before.
    3. The chance of catching one is therefore about p = m \div n .
    4. Catch some number M of the things.
    5. Since — we assume — every one of the N things had the same chance p of being caught, and since we caught M of them, then we estimate there to be N = M \div p of the things to catch.

    Warning! There is a world of trouble hidden in that “we assume” on the last step there. Do not use this for professional wildlife-population-estimation until you have fully understood those two words.
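
    The five steps above can be sketched as a single function. This is a minimal Python illustration, not a tool for real wildlife work: `estimate_population` is a made-up name, and it leans entirely on the equal-chance assumption the warning is about.

```python
def estimate_population(n, m, M):
    """Catch-and-re-catch estimate: N = M / p, with p estimated as m / n.

    n: things caught (and released) the first time
    m: how many of those turned up in the second catch of n
    M: size of the catch used for the estimate
    """
    if m == 0:
        raise ValueError("nothing was re-caught, so p cannot be estimated")
    # M / (m / n), rearranged to avoid an intermediate rounding step.
    return M * n / m

print(estimate_population(10, 4, 12))   # first worked example: 30 fish
print(estimate_population(12, 5, 11))   # second worked example: 26.4 fish
print(estimate_population(9, 3, 9))     # the lazy variation: 27 fish
```

    The guard against m = 0 is a design choice for the sketch: if nothing is re-caught the formula has no answer, and an error seems more honest than an infinite fish population.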

    • mathtuition88 9:22 am on Monday, 16 February, 2015 Permalink | Reply

      Nice pond! Hope you catch all the fish


      • Joseph Nebus 10:37 pm on Monday, 16 February, 2015 Permalink | Reply

        The pond’s gotten iced over, so we’ve caught all the fish we’ll be getting for this winter. We know there are some left in yet, but I did spot at least one of them alive and apparently well recently. And it hasn’t been as hard a winter as last year’s was, so, we’re hopeful. In another month we could reasonably expect the weather to spend most of its time above freezing and then we can get to the problem of moving the fish back out.


    • elkement 1:34 pm on Monday, 16 February, 2015 Permalink | Reply

      I wonder if you could measure a characteristic amount of something that fish extract from the water or they produce? Such as: Oxygen content, or food not yet eaten after X hours, or their sh*t?
      Or fish need to mutate so that they will emit light at a specific wavelength and intensity. In any case, you could measure the number of fish by measuring the grand total of some stuff.


      • Joseph Nebus 10:43 pm on Monday, 16 February, 2015 Permalink | Reply

        It’s very tempting to think of counting the fish by measuring biological processes in the pond. But the pond and the population are probably at just the wrong scale for that to be useful: there’s fish that are barely two finger-widths long, while some of the adults are maybe one and a half times my hand’s greatest extent. With that kind of distribution of body size, anything that tries measuring, like, oxygen use or food consumption is just going to be swamped by the error margins.

        If the pond were smaller, so none of the fish ever got that huge (goldfish size is limited by water quality, which depends on, well, fish size and number, among other factors) then we could work out an average fish metabolism and believe in the numbers. If the pond were larger, so that the fish population would be in the tens of thousands, we could use an average fish metabolism and trust in the Law of Large Numbers that the really huge and really small fish are going to balance one another out.

        As it is, I don’t believe the pond or the population is big enough that we can rely on averages like that, and it’s not small enough to make fish-tracking easy. It’s very nice to look at on a warm day, though.


  • Joseph Nebus 3:17 am on Wednesday, 5 June, 2013 Permalink | Reply
    Tags: computer, estimation

    How Big Is This Number? Answered 

    My little question about just how big a number 3^{3^{15}} was got answered just exactly right by John Friedrich, so if you wondered about how I could say a number took about seven million digits just to write out, there’s your answer. Friedrich gives it as a number with 6,846,169 digits, and I agree. Better, the calculator I found which was able to handle this (MatCalcLite, a free calculator app I have on my iPad) agrees too: it claims that 3^{3^{15}} is about 3.25 \times 10^{6 846 168} which has that magic 6,846,169 digits.

    Friedrich uses logarithms to work it out, and this is one of the things logarithms are good for in these days when you don’t generally need them to do multiplications and divisions. You can look at logarithms as letting you evaluate the lengths of numbers, how many digits they need to write out, rather than the numbers themselves, and this brings within reach numbers that would otherwise be too big to work with, even on the calculator. (Another thing logarithms are good for is that they’re quite nice to work with if you have to do calculus, so once you’re comfortable with them, you start looking for chances to slip them into analysis.)
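
    The digit count by logarithms fits in a few lines. This Python sketch is one way to do it and is not taken from Friedrich’s comment; `digit_count` is an invented name, and the floating-point logarithm could in principle misjudge a number sitting right next to an exact power of ten, which this one does not.

```python
from math import floor, log10

def digit_count(base, exponent):
    """Number of decimal digits in base**exponent, found without building the number."""
    return floor(exponent * log10(base)) + 1

exponent = 3**15                      # 14,348,907
digits = digit_count(3, exponent)

# The fractional part of the logarithm gives the leading digits.
mantissa = 10 ** ((exponent * log10(3)) % 1)

print(digits)                # number of digits in 3^(3^15)
print(round(mantissa, 2))    # leading digits, as in "about 3.25 times a power of ten"
```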

    One nagging little point about Friedrich’s work, though, is that you need to know the logarithm of 3 to work it out. (Also you need the logarithm of 10, or you could try using the common logarithm — the logarithm base ten — of 3 instead.) For finding the actual number that’s fine; trying to get this answer with any precision without looking up the logarithm of 3 is quirky if not crazy.

    But what if you want to do this purely by the joys of mental arithmetic? Could you work out 3^{3^{15}} without finding a table of logarithms? Obviously you can’t if you want a really precise answer, and here 3.25 \times 10^{6 846 168} counts as precise, but could you at least get a good idea of how big a number it is?

    • fluffy 7:09 pm on Wednesday, 5 June, 2013 Permalink | Reply

      The UNIX ‘bc’ tool can actually calculate it directly. I don’t think your comment form can accept a 7MB post, however.


    • John Friedrich 12:39 am on Thursday, 6 June, 2013 Permalink | Reply

      3^21 = 10460353203, which you can round off to 10,000,000,000 or 10^10. So 3^(21n) is roughly equal to 10^(10n).
      21n = 3^15 gives n = 3^15 / 21 = 3^14 / 7 = 683281.285714, 10n = 6832812.85714, and the approximate number of digits is 6,832,813, which is off by about 0.2%.

      If you just want an order of magnitude and don’t want to use a calculator, you can approximate 3^2 as roughly equal to 10^1, 3^(2n) is roughly equal to 10^n, n = 3^15 / 2 = 243 ^ 3 / 2 = 14,348,907 / 2 = 7174453.5, which is off by about 5%.


      • John Friedrich 12:40 am on Thursday, 6 June, 2013 Permalink | Reply

        Oops, forgot to add 1 to n at the end there to make it the correct number of digits, hardly affects the outcome though.

