## Ensembled

A couple weeks back voting in the Democratic party’s Iowa caucus had several districts tied between Clinton and Sanders supporters. The ties were broken by coin tosses. That fact produced a bunch of jokes at Iowa’s expense. I can’t join in this joking. If the votes don’t support one candidate over another, but someone must win, what’s left but an impartial tie-breaking scheme?

After Clinton won six of the coin tosses, people joked about the “impartial” idea breaking down. Well, we around here know that there are no unfair coins. And while it’s possible to have an unfair coin toss, I’m not aware of any reason to think any of the tosses were. It’s lucky to win six coin tosses. If the tosses are fair, the chance of getting any one right is one-half. Suppose the tosses are “independent”. That is, the outcome of one doesn’t change the chances of any other. Then the chance of getting six right in a row is the chance of getting one right, times itself, six times over. That is, the chance is one-half raised to the sixth power. That’s a small number, about 1.6 percent. But it’s not so riotously small as to deserve rioting.
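As a quick check on that arithmetic, here is a minimal Python sketch. It assumes fair, independent tosses, exactly as the paragraph does; the variable names are just for illustration.

```python
from fractions import Fraction

# Chance of winning a single fair toss.
p_one = Fraction(1, 2)

# Independence lets us multiply the six individual chances together.
p_six = p_one ** 6

print(p_six)                # 1/64
print(float(p_six) * 100)   # 1.5625 (percent)
```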

That raises a question: is six heads in a row exactly as likely as any other outcome? Yes and no. It depends on what you mean by “any other outcome”. Grant that heads and tails are equally likely to come up. Grant also that coin tosses are independent. Then six heads, H H H H H H, are just as likely to come up as six tails, T T T T T T. I don’t think anyone will argue with me that far.

But are both of these exactly as likely as the first toss coming up heads and all the others tails? As likely as H T T T T T? Yes, I would say they are. But I understand if you feel skeptical, and if you want convincing. The chance of getting heads once in a fair coin toss is one-half. We started with that. What’s the chance of getting five tails in a row? That must be one-half raised to the fifth power. The first coin toss and the last five don’t depend on one another. This means the chance of that first heads followed by those five tails is one-half times one-half to the fifth power. And that’s one-half to the sixth power.

What about the first two tosses coming up heads and the next four tails? H H T T T T? We can run through the argument again. The chance of two coin tosses coming up heads would be one-half to the second power. The chance of four coin tosses coming up tails would be one-half to the fourth power. The chance of the first streak being followed by the second is the product of the two chances. One-half to the second power times one-half to the fourth power is one-half to the sixth power.

We could go on like this and try out all the possible outcomes. There’s only 64 of them. That’s going to be boring. We could prove any particular string of outcomes is just as likely as any other. We need to make an argument that’s a little more clever, but also a little more abstract.

Don’t think just now of a particular sequence of coin toss outcomes. Consider this instead: what is the chance you will call a coin toss right? You might call heads, you might call tails. The coin might come up heads, the coin might come up tails. The chance you call it right, though — well, won’t that be one-half? Stay at this point until you’re sure it is.

So write out a sequence of possible outcomes. Don’t tell me what it is. It can be any set of H and T, as you like, as long as it’s six outcomes long.

What is the chance you wrote down six correct tosses in a row? That’ll be the chance of calling one outcome right, one-half, times itself six times over. One-half to the sixth power. So I know the probability that your prediction was correct. Which of the 64 possible outcomes did you write down? I don’t know. I suspect you didn’t even write one down. I would’ve just pretended I had one in mind until the essay required me to do something too. But the exact same argument applies no matter which sequence you pretended to write down. (Look at it. I didn’t use any information about what sequence you would have picked. So how could the sequence affect the outcome?) Therefore each of the 64 possible outcomes has the same chance of coming up.

So in this context, yes, six heads in a row is exactly as likely as any other sequence of six coin tosses.
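For anyone who would rather do the boring thing after all, the 64 cases are few enough to let a computer grind through. The sketch below assumes a fair coin and independent tosses; `sequence_probability` is a made-up helper name, not anything standard.

```python
from fractions import Fraction
from itertools import product

# Every possible sequence of six tosses, e.g. ('H', 'T', 'T', 'T', 'T', 'T').
sequences = list(product("HT", repeat=6))

def sequence_probability(seq, p_heads=Fraction(1, 2)):
    """Multiply the per-toss chances, which independence allows."""
    prob = Fraction(1)
    for toss in seq:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob

probabilities = {seq: sequence_probability(seq) for seq in sequences}

print(len(sequences))                # 64
print(set(probabilities.values()))   # {Fraction(1, 64)}
```

Every sequence lands on the same value, which is all the abstract argument claimed.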

I will guess that you aren’t perfectly happy with this argument. It probably feels like something is unaccounted-for. What’s unaccounted-for is that nobody cares about the difference between the sequence H H T H H H and the sequence H H H T H H. Would you even notice the difference if I hadn’t framed the paragraph to make the difference stand out? In either case, the sequence is “one tail, five heads”. What’s the chance of getting “one tail, five heads”?

Well, the chance of getting one of several mutually exclusive outcomes is the sum of the chance of each individual outcome. And these are mutually exclusive outcomes: you can’t get both H H T H H H and H H H T H H as the result of the same set of coin tosses.

(There can be not-mutually-exclusive outcomes. Consider, for example, the chance of getting “at least three tails” and the chance of the third coin toss being heads. Calculating the chance of either of those outcomes happening demands more thinking. But we don’t have to deal with that here, so we won’t.)

There are six distinct ways to get one tails and five heads. The tails can be the first toss’s result. Or the tails can be the second toss’s result. Or the tails can be the third toss’s result. And so on. Each of these possible outcomes has the same probability, one-half to the sixth power. So the chance of getting “one tails, five heads” is one-half to the sixth power, added to itself, six times over. That is, it’s six times one-half to the sixth power. That will come up about one time in eleven that you do a sequence of six coin tosses.

There are fifteen ways to get two tails and four heads. So the chance of the outcome being “two tails, four heads” is fifteen times one-half to the sixth power. That will come up a bit less than one in four times.

There are twenty, count ’em, ways to get three tails and three heads. So the chance of that is twenty times one-half to the sixth power. That’s a little more than three times in ten. There are fifteen ways to get four tails and two heads, so the chance of that drops again. There’s six ways to get five tails and one heads. And there’s just one way to get six tails and no heads on six coin tosses.

So if you think of the outcome as “this many tails and that many heads”, then, no, not all outcomes are equally likely. “Three tails and three heads” is a lot more likely than “no tails and six heads”. “Two tails and four heads” is more likely than “one tails and five heads”.
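The counts of ways come from the binomial coefficients, and they can be tabulated in a few lines. This is only a sketch under the same fair-and-independent assumption; `ways_by_tails` and `chance_by_tails` are names picked for the example.

```python
from fractions import Fraction
from math import comb

TOSSES = 6

# Number of distinct sequences giving 0, 1, ..., 6 tails.
ways_by_tails = [comb(TOSSES, tails) for tails in range(TOSSES + 1)]

# Each sequence has chance 1/64, so multiply the count by that.
chance_by_tails = [Fraction(ways, 2 ** TOSSES) for ways in ways_by_tails]

for tails, (ways, chance) in enumerate(zip(ways_by_tails, chance_by_tails)):
    print(f"{tails} tails, {TOSSES - tails} heads: {ways:2d} ways, chance {float(chance):.4f}")
```

The counts run 1, 6, 15, 20, 15, 6, 1, and they add up to the 64 sequences we started with.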

Whether it’s right to say “every outcome is just as likely” depends on what you think “an outcome” is. If it’s a particular sequence of heads and tails, then yes, it is. If it’s the aggregate statistic of how many heads and tails, then no, it’s not.

We see this kind of distinction all over the place. Every hand of cards, for example, might be as likely to turn up as every other hand of cards. But consider five-card poker hands. There are very few hands that have the interesting pattern of being a straight flush, five sequential cards of the same suit. There are more hands that have the interesting pattern of four-of-a-kind. There are a lot of hands that have the mildly interesting pattern of two-of-a-kind and nothing else going on. There’s a huge mass of hands that don’t have any pattern we’ve seen fit to notice. So a straight flush is regarded as a very unlikely hand to have, and four-of-a-kind more likely but still rare. Two-of-a-kind is none too rare. Nothing at all is most likely, at least in a five-card hand. (When you get seven cards, a hand with nothing at all becomes less likely. You have so many chances that you just have to hit something.)
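To put rough numbers on the five-card case, here is a sketch using the standard counting arguments for a 52-card deck. It assumes royal flushes are counted among the straight flushes and that an ace can sit at either end of a straight; the variable names are illustrative.

```python
from math import comb

total_hands = comb(52, 5)

# The lowest card of a straight can be ace through ten: ten choices, four suits.
straight_flush = 10 * 4

# Pick the rank of the four matching cards, then any one of the other 48 cards.
four_of_a_kind = 13 * 48

# Pick the paired rank and its two suits, then three other distinct ranks,
# each in any of four suits.
one_pair = 13 * comb(4, 2) * comb(12, 3) * 4 ** 3

# Five distinct ranks that do not form a straight, in suits that are not all alike.
nothing = (comb(13, 5) - 10) * (4 ** 5 - 4)

for name, count in [("straight flush", straight_flush),
                    ("four of a kind", four_of_a_kind),
                    ("one pair", one_pair),
                    ("nothing", nothing)]:
    print(f"{name:>15}: {count:>9,} hands, chance {count / total_hands:.6f}")
```

Each individual hand is one of 2,598,960 equally likely deals; the categories differ only in how many of those deals they sweep up.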

The distinction carries over into statistical mechanics. The field studies the state of things. Is a mass of material solid or liquid or gas? Is a solid magnetized or not, or is it trying to be? Are molecules in a high- or a low-energy state?

Mathematicians use the name “ensemble” to describe a state of whatever it is we’re studying. But we have the same problem of saying what kind of description we mean. Suppose we are studying the magnetism of a solid object. We do this by imagining the object as a bunch of smaller regions, each with a tiny bit of magnetism. That bit might have the north pole pointing up, or the south pole pointing up. We might say the ensemble is that there are ten percent more north-pole-up regions than there are south-pole-up regions.

But by that, do we mean we’re interested in “ten percent more north-pole-up than south-pole-up regions”? Or do we mean “these particular regions are north-pole-up, and these are south-pole-up”? We distinguish this by putting in some new words.

The “canonical ensemble” is, generally, the kind of aggregate-statistical-average description of things. So, “ten percent more north-pole-up than south-pole-up regions” would be such a canonical ensemble. Or “one tails, five heads” would be a canonical ensemble. If we want to look at the fine details we speak of the “microcanonical ensemble”. That would be “these particular regions are north-pole-up, and these are south-pole-up”. Or that would be “the coin tosses came up H H H T H H”.

Just what is a canonical and what is a microcanonical ensemble depends on context. Of course it would. Consider the standpoint of the city manager, hoping to estimate the power and water needs of neighborhoods and bringing the language of statistical mechanics to the city-planning world. There, it is enough detail to know how many houses on a particular street are occupied and how many residents there are. She could fairly consider that a microcanonical ensemble. From the standpoint of the letter carriers for the post office, though, that would be a canonical ensemble. It would give an idea how much time would be needed to deliver on that street. But it would fall just short of being useful in getting letters to recipients. The letter carrier would want to know which people are in which house before rating that a microcanonical ensemble.

Much of statistical mechanics is studying ensembles, and which ensembles are more or less likely than others. And how that likelihood changes as conditions change.

So let me answer the original question. In this coin-toss problem, yes, every microcanonical ensemble is just as likely as every other microcanonical ensemble. The sequence ‘H H H H H H’ is just as likely as ‘H T H H H T’ or ‘T T H T H H’ are. But not every canonical ensemble is as likely as every other one. Six heads in six tosses are less likely than two heads and four tails, or three heads and three tails, are. The answer depends on what you mean by the question.

## Doesn’t The Other Team Count? How Much?

I’d worked out an estimate of how much information content there is in a basketball score, by which I was careful to say the score that one team manages in a game. I wasn’t able to find out what the actual distribution of real-world scores was like, unfortunately, so I made up a plausible-sounding guess: that college basketball scores would be distributed among the imaginable numbers (whole numbers from zero through … well, infinitely large numbers, though in practice probably not more than 150) according to a very common distribution called the “Gaussian” or “normal” distribution, that the arithmetic mean score would be about 65, and that the standard deviation, a measure of how spread out the distribution of scores is, would be about 10.

If those assumptions are true, or are at least close enough to true, then there are something like 5.4 bits of information in a single team’s score. Put another way, if you were trying to divine the score by asking someone who knew it a series of carefully-chosen questions, like, “is the score less than 65?” or “is the score more than 39?”, with at each stage each question equally likely to be answered yes or no, you could expect to hit the exact score with usually five, sometimes six, such questions.
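The figure can be reproduced with a short sketch. It assumes, as the text does, a bell curve with mean 65 and standard deviation 10, restricted to whole-number scores from 0 through 150; the cutoff and the variable names are choices made for this example, not measured facts about real scores.

```python
import math

MEAN = 65.0       # assumed mean score
STD_DEV = 10.0    # assumed standard deviation
MAX_SCORE = 150   # assumed practical upper limit on a score

# Weight each whole-number score by the Gaussian bell curve, then normalize
# so the weights add up to one and can be treated as probabilities.
weights = [math.exp(-((s - MEAN) ** 2) / (2 * STD_DEV ** 2))
           for s in range(MAX_SCORE + 1)]
total = sum(weights)
probs = [w / total for w in weights]

# Shannon entropy in bits: the average number of ideal yes/no questions needed.
bits = -sum(p * math.log2(p) for p in probs if p > 0)
print(round(bits, 1))   # 5.4
```

A wider spread of scores would raise the number and a narrower one would lower it, since a tighter distribution leaves less to find out.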

## How Not To Count Fish

I’d discussed a probability/sampling-based method to estimate the number of fish that might be in our pond out back, and then some of the errors that have to be handled if you want to have a reliable result. Now, I want to get into why the method doesn’t work, at least not without much greater insight into goldfish behavior than simply catching a couple and releasing them will do.

Catching a sample, re-releasing it, and counting how many of that sample we re-catch later on is a logically valid method, provided the assumptions the method requires are close enough to the way the actual thing works. Here are some of the ways goldfish fall short of the ideal.
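For reference, the idealized calculation can be sketched as below. This assumes the method in question is the textbook Lincoln-Petersen estimate, in which the marked fraction of the second catch is taken to match the marked fraction of the whole pond. The function name and the numbers are hypothetical, not data from our pond.

```python
def estimate_population(marked_first, caught_second, recaught):
    """Lincoln-Petersen estimate of the total population.

    Assumes marked_first / total is about recaught / caught_second,
    which needs identical, independent, recognizable fish.
    """
    if recaught <= 0:
        raise ValueError("need at least one re-caught fish to form an estimate")
    return marked_first * caught_second / recaught

# Hypothetical numbers: mark 20 fish, later catch 30,
# and find that 6 of those 30 carry a mark.
print(estimate_population(20, 30, 6))   # 100.0
```

Every input to that one-line formula leans on an assumption about how the fish behave.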

First faulty assumption: Goldfish are perfectly identical. In this goldfish-trapping scheme we make the assumption that there is some fixed, constant probability of a goldfish being caught in the net. We have to assume that this is the same number for every goldfish, and that it doesn’t change as goldfish go through the experience of getting caught and then released. But goldfish have personality, as you learn if you have a bunch in a nice setting and do things like try feeding them koi treats or introduce something new like a wire-mesh trap to their environment. Some are adventurous and will explore the unfamiliar thing; some are shy and will let everyone else go first and then maybe not bother going at all. I empathize with both positions.

If there are enough goldfish, the variation between personalities is probably not going to matter much. There’ll be some that are easy to catch, and they’ll probably be roughly as common as the ones who can’t be coaxed into the trap at all. It won’t be exactly balanced unless we’re very lucky, but this would probably only throw off our calculations a little bit.

Whether the goldfish learn, and become more, or less, likely to be trapped in time is harder. Goldfish do learn, certainly, although it’s not obvious to me that the trapping and releasing experience would be one they draw much of a lesson from. It’s only a little inconvenience, really, and not at all harmful; what should they learn? Other than that there’s maybe an easy bit of food to be had here so why not go in? So this might change their behavior and it’s hard to predict how.

(I note that animal capture studies get quite frustrated when the animals start working out how to game the folks studying them. Bil Gilbert’s early-70s study of coatis — Latin American raccoons, written up in the lovely popularization Chulo: A Year Among The Coatimundis — was plagued by some coatis who figured out going into the trap was an easy, safe meal they’d be released from without harm, and wouldn’t go back about their business and leave room for other specimens.)

Second faulty assumption: Goldfish are not perfectly identical. This is the biggest challenge to counting goldfish population by re-catching a sample of them. How do you know if you caught a goldfish before? When they grow to adulthood, it’s not so bad, since they grow fairly distinctive patterns of orange and white and black and such, and they’ll usually settle into different sizes. (That said, we do have two adult fish who were very distinct when we first got them, but who’ve grown into near-twins.)

But baby goldfish? They’re basically all tiny black things, meant to hide in the mud at the bottom of ponds and rivers, their preferred habitat, and pretty near indistinguishable. As they get larger they get distinguishable, a bit, and start to grow patterns, but for the vast number of baby fish there’s just no telling one from another.

When we were trying to work out whether some mice we found in the house were ones we had previously caught and put out in the garage, we were able to mark them by squirting some food dye at their heads as they were released. The mice would rub the food dye from their heads onto their whole bodies and it would take a while before the dye would completely fade out. (We didn’t re-catch any mice, although it’s hard to dye a wild mouse efficiently because they will take off like bullets. Also one time when we thought we’d captured one there were actually three in the humane trap and you try squirting the food dye bottle at two more mice than you thought were there, fleeing.) But you can see how the food dye wouldn’t work here. Animal researchers with a budget might go on to attach collars or somehow otherwise mark animals, but if there’s a way to mark and track goldfish with ordinary household items I can’t think of it.

(No, we will not be taking the bits of americium in our smoke detectors out and injecting them into trapped goldfish; among the objections, I don’t have a radioactivity detector.)

Third faulty assumption: Goldfish are independent entities. The first two faulty assumptions are ones that could be kind of worked around. If there’s enough goldfish then the distribution of how likely any one is to get caught will probably be near enough normal that we can pretend there’s an identical chance of catching each, and if we really thought about it we could probably find some way of marking goldfish to tell if we re-caught any. Independence, though; this is the point on which so many probability-based schemes fall.

Independence, in the language of probability, is the principle that one thing’s happening does not affect the likelihood of another thing happening. For our problem, it’s the assumption that one goldfish being caught does not make it any more or less likely that another goldfish will be caught. We like independence, in studying probability. It makes so many problems easier to study, or even possible to study, and it often seems like a reasonable supposition.

A good number of interesting scientific discoveries amount to finding evidence that two things are not actually independent, and that one thing happening makes it more (or less) likely the other will. Sometimes these turn out to be vapor — there was a 19th-century notion suggesting a link between sunspot activity and economic depressions (because sunspots correlate to solar activity, which could affect agriculture, and up to 1893 the economy and agriculture were pretty much the same thing) — but when there is a link the results can be profound, as see the smoking-and-cancer link, or for something promising but still (to my understanding) under debate, the link between leaded gasoline and crime rates.

How this applies to the goldfish population problem, though, is that goldfish are social creatures. They school, loosely, forming and re-forming groups, and would much rather be around another goldfish than not. Even as babies they form these adorable tiny little schools; that may be in the hopes that someone else will get eaten by a bigger fish, but they keep hanging around other fish their own size through their whole lives. If there’s a goldfish inside the trap, it is hard to believe that other goldfish are not going to follow it just to be with the company.

Indeed, the first day we set out the trap for the winter, we pulled in all but one of the adult fish, all of whom apparently followed the others into the enclosure. I’m sorry I couldn’t photograph that because it was both adorable and funny to see so many fish just station-keeping beside one another — they were even all looking in the same direction — and waiting for whatever might happen next. Throughout the months we were able to spend bringing in fish, the best bait we could find was to have one fish already in the trap, and a couple days we did leave one fish in a few more hours or another night so that it would be joined by several companions the next time we checked.

So that’s something which foils the catch and re-catch scheme: goldfish are not independent entities. They’re happy to follow one another into a trap. I would think the catch and re-catch scheme should be salvageable, if it were adapted to the way goldfish actually behave. But that requires a mathematician admitting that he can’t just blunder into a field with an obvious, simple scheme to solve a problem, and instead requires the specialized knowledge and experience of people who are experts in the field, and that of course can’t be done. (For example, I don’t actually know that goldfish behavior is sufficiently non-independent as to make an important difference in a population estimate of this kind. But someone who knew goldfish or carp well could tell me, or tell me how to find out.)

For those curious how the goldfish worked out, though, we were able to spend about two and a half months catching fish before the pond froze over for the winter, though the number we caught each week dropped off as the temperature dropped. We have them floating about in a stock tank in the basement, waiting for the coming of spring and the time the pond will be warm enough for them to re-occupy it. We also know that at least some of the goldfish we didn’t catch made it to, well, about a month ago. Through a hole in the ice, I’d seen one of the five orange baby fish who refused to go into the trap. It was holding close to the bottom but seemed to be in good shape.

This coming year should be an exciting one for our fish population.