People reading my Reading the Comics post Sunday maybe noticed something. I mean besides my correct, reasonable complaining about the Comics Kingdom redesign. That is that all the comics were from before the 30th of March. That is, none were from the week before the 7th of April. The last full week of March had a lot of comic strips. The first week of April didn’t. So things got bumped a little. Here are the results. It wasn’t a busy week, not when I filter out the strips that don’t offer much to write about. So now I’m stuck for what to post Thursday.
The strip explains things well enough. The Library holds every book that will ever be written. In the original story there are some constraints. Particularly, all the books are 410 pages. If you wanted, say, a 600-page book, though, you could find one book with the first 410 pages and another book with the remaining 190 pages and then some filler. The catch, as explained in the story and in the comic strip, is finding them. And there is the problem of finding a ‘correct’ text. Every possible text of the correct length should be in there. So every possible book that might be titled Mark Twain vs Frankenstein, including ones that include neither Mark Twain nor Frankenstein, is there. Which is the one you want to read?
Henry Scarpelli and Craig Boldman’s Archie for the 4th features an equal-divisions problem. In principle, it’s easy to divide a pizza (or anything else) equally; that’s what we have fractions for. Making them practical is a bit harder. I do like Jughead’s quick work, though. It’s got the sleight-of-hand you expect from stage magic.
Scott Hilburn’s The Argyle Sweater for the 4th takes place in an algebra class. I’m not sure what algebraic principle it demonstrates, but it probably came from somewhere. It’s 4,829,210. The exponentials on the blackboard do cue the reader to the real joke, of the sign reading “kick¹⁰ me”. I question whether this is really an exponential kicking situation. It seems more like a simple multiplication to me. But it would be harder to make that joke read clearly.
Tony Cochran’s Agnes for the 5th is part of a sequence investigating how magnets work. Agnes and Trout find just … magnet parts inside. This is fair. It’s even mathematics.
Thermodynamics classes teach one of the great mathematical physics models. This is about what makes magnets. Magnets are made of … smaller magnets. This seems like question-begging. Ultimately you get down to individual molecules, each of which is very slightly magnetic. When small magnets are lined up in the right way, they can become a strong magnet. When they’re lined up in another way, they can be a weak magnet. Or no magnet at all.
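The lining-up idea is easy to see in numbers. Here is a toy illustration of my own (the spin count and the seed are arbitrary picks, not any standard physics code): treat each molecule as a little magnet pointing up (+1) or down (−1), and call the big magnet’s strength the average of all the little ones.

```python
import random

random.seed(1)  # fixed seed so the example is reproducible
n = 10_000

# Each little molecular magnet points "up" (+1) or "down" (-1).
aligned = [1] * n                                      # all lined up the same way
jumbled = [random.choice([-1, 1]) for _ in range(n)]   # pointing every which way

# The big magnet's strength is the (absolute) average of the little ones.
print(abs(sum(aligned)) / n)   # 1.0: as strong a magnet as the material allows
print(abs(sum(jumbled)) / n)   # close to 0: hardly a magnet at all
```

The jumbled average isn’t exactly zero, just small, which is why a poorly-aligned iron rod is a weak magnet rather than no magnet at all.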
How do they line up? It depends on things, including how the big magnet is made, and how it’s treated. A bit of energy can free molecules to line up, making a stronger magnet out of a weak one. Or it can break up the alignments, turning a strong magnet into a weak one. I’ve had physics instructors explain that you could, in principle, take an iron rod and magnetize it just by hitting it hard enough on the desk. And then demagnetize it by hitting it again. I have never seen one do this, though.
This is more than just a physics model. The mathematics of it is … well, it can be easy enough. A one-dimensional, nearest-neighbor model lets us describe how materials might turn into magnets or break apart, depending on their temperature. Two- or three-dimensional models, or models that have each small magnet affected by distant neighbors, are harder.
Today’s topic is the lone (so far) request by bunnydoe, so I’m under pressure to make it decent. If she or anyone else would like to nominate subjects for the letters U through Z, please drop me a note at this post. I keep fooling myself into thinking I’ll get one done in under 1200 words.
This is a story which makes a capitalist look kind of good. I make no vouches for its truth, or even, at this remove, where I got it. The story as I heard it was about Ray Kroc, who made McDonald’s into a thing people of every land can complain about. The story has him demonstrate skepticism about the use of business consultants. A consultant might find, for example, that each sesame-seed hamburger bun has (say) 43 seeds. And that if they just cut it down to 41 seeds then each franchise would save (say) $50,000 annually. And no customer would notice the difference. Fine; trim the seeds a little. The next round of consultants would point out that cutting from 41 seeds to 38 would save a further $65,000 per store per year. And again no customer would notice the difference. Cut to 36 seeds? No customer would notice. This process would end when each bun had three sesame seeds, and the customers notice.
I mention this not for my love of sesame-seed buns. It’s just a less-common version of the Sorites Paradox. It’s a very old logical problem. We draw it, and its name, from the Ancient Greek philosophers. In the oldest form, it’s about a heap of sand, and which grain of sand’s removal destroys the heap. This form we attribute to Eubulides of Miletus. Eubulides is credited with a fair number of logical paradoxes. One of them we all know, the Liar Paradox, “What I am saying now is a lie”. Another, the Horns Paradox, I hadn’t encountered before researching this essay. But it bids fair to bring me some delight every day of the rest of my life. “What you have not lost, you have. But you have not lost horns. Therefore you have horns.” Eubulides has a bunch of other paradoxes. Some read, to my uninformed eye, like restatements of other paradoxes. Some look ready to be recast as arguments about Lois Lane’s relationship with Superman. Miletus we know because for a good stretch there every interesting philosopher was hanging around Miletus.
Part of the paradox’s intractability must be that it’s so nearly induction. Induction is a fantastic tool for mathematical problems. We couldn’t do without it. But consider the argument. If a bun is unsatisfying, one more seed won’t make it satisfying. A bun with one seed is unsatisfying. Therefore all buns have an unsatisfying number of sesame seeds on them. It suggests there must be some point at which “adding one more seed won’t help” stops being true. Fine; where is that point, and why isn’t it one fewer or one more seed?
A certain kind of nerd has a snappy answer for the Sorites Paradox. Test a broad population on a variety of sesame-seed buns. There’ll be some so sparse that nearly everyone will say they’re unsatisfying. There’ll be some so abundant most everyone agrees they’re great. So there’s the buns most everyone says are fine. There’s the buns most everyone says are not. The dividing line is at any point between the sparsest that satisfy most people and the most abundant that don’t. The nerds then declare the problem solved and go off. Let them go. We were lucky to get as much of their time as we did. They’re quite busy solving what “really” happened for Rashomon. The approach of “set a line somewhere” is fine if all we want is guidance on where to draw a line. It doesn’t help us say why we can anoint some border over any other. At least when we use a river as border between states we can agree going into the water disrupts what we were doing with the land. And even then we have to ask what happens during droughts and floods, and if the river is an estuary, how tides affect matters.
We might see an answer by thinking more seriously about these sesame-seed buns. We force a problem by declaring that every bun is either satisfying or it is not. We can imagine buns with enough seeds that we don’t feel cheated by them, but that we also don’t feel satisfied by. This reflects one of the common assumptions of logic. Mathematicians know it as the Law of the Excluded Middle. A thing is true or it is not true. There is no middle case. This is fine for logic. But for everyday words?
It doesn’t work when considering sesame-seed buns. I can imagine a bun that is not satisfying, but also is not unsatisfying. Surely we can make some logical provision for the concept of “meh”. Now we need not draw some arbitrary line between “satisfying” and “unsatisfying”. We must draw two lines, one of them between “unsatisfying” and “meh”. There is a potential here for regression. Also for the thought of a bun that’s “satisfying-meh-satisfying by unsatisfying”. I shall step away from this concept.
But there are more subtle ways to not exclude the middle. For example, we might decide a statement’s truth exists on a spectrum. We can match how true a statement is to a number. Suppose an obvious falsehood is zero; an unimpeachable truth is one, and normal mortal statements somewhere in the middle. “This bun with a single sesame seed is satisfying” might have a truth of 0.01. This perhaps reflects the tastes of people who say they want sesame seeds but don’t actually care. “This bun with fifteen sesame seeds is satisfying” might have a truth of 0.25, say. “This bun with forty sesame seeds is satisfying” might have a truth of 0.97. (It’s true for everyone except those who remember the flush times of the 43-seed bun.) This seems to capture the idea that nothing is always wholly anything. But we can still step into absurdity. Suppose “this bun with 23 sesame seeds is satisfying” has a truth of 0.50. Then “this bun with 23 sesame seeds is not satisfying” should also have a truth of 0.50. What do we make of the statement “this bun with 23 sesame seeds is simultaneously satisfying and not satisfying”? Do we make something different to “this bun with 23 sesame seeds is simultaneously satisfying and satisfying”?
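The usual way to compute with these graded truths is the classic fuzzy-logic connectives (Zadeh’s): “not” flips the value, “and” takes the minimum, “or” the maximum. A minimal sketch, using the 0.50 bun from above, showing exactly the absurdity the paragraph points at:

```python
# Classic fuzzy-logic connectives: truth values live in [0, 1].
def f_not(a):    return 1.0 - a
def f_and(a, b): return min(a, b)
def f_or(a, b):  return max(a, b)

satisfying = 0.50  # truth of "this bun with 23 sesame seeds is satisfying"

# "satisfying and not satisfying" comes out with the same truth value as
# "satisfying and satisfying" -- 0.5 either way.
print(f_and(satisfying, f_not(satisfying)))  # 0.5
print(f_and(satisfying, satisfying))         # 0.5
```

Under these rules a statement and its negation can both be half-true together, which is either a feature or a scandal depending on your tastes.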
I see you getting tired in the back there. This may seem like word games. And we all know that human words are imprecise concepts. What has this to do with logic, or mathematics, or anything but the philosophy of language? And the first answer is that we understand logic and mathematics through language. When learning mathematics we get presented with definitions that seem absolute and indisputable. We start to see the human influence in mathematics when we ask why 1 is not a prime number. Later we see things like arguments about whether a ring has a multiplicative identity. And then there are more esoteric debates about the bounds of mathematical concepts.
Perhaps we can think of a concept we can’t describe in words. If we don’t express it to other people, the concept dies with us. We need words. No, putting it in symbols does not help. Mathematical symbols may look like slightly alien scrawl. But they are shorthand for words, and can be read as sentences, and there is this fuzziness in all of them.
And we find mathematical properties that share this problem. Consider: what is the color of the chemical element flerovium? Before you say I just made that up, flerovium was first synthesized in 1998, and officially named in 2012. We’d guess that it’s a silvery-white or maybe grey metallic thing. Humanity has only ever observed about ninety atoms of the stuff. It’s, for atoms this big, amazingly stable. We know an isotope of it that has a half-life of two and a half seconds. But it’s hard to believe we’ll ever have enough of the stuff to look at it and say what color it is.
That’s … all right, though? Maybe? Because we know the quantum mechanics that seem to describe how atoms form. And how they should pack together. And how light should be absorbed, and how light should be emitted, and how light should be scattered by it. At least in principle. The exact answers might be beyond us. But we can imagine having a solution, at least in principle. We can imagine the computer that after great diligent work gives us a picture of what a ten-ton lump of flerovium would look like.
So where does its color come from? Or any of the other properties that these atoms have as a group? No one atom has a color. No one atom has a density, either, or a viscosity. No one atom has a temperature, or a surface tension, or a boiling point. In combination, though, they do.
These are known to statistical mechanics, and through that thermodynamics, as intensive properties. If we have a partition function, which describes all the ways a system can be organized, we can extract information about these properties. They turn up as derivatives with respect to the right parameters of the system.
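Two of the standard derivative identities look like this, in the usual statistical-mechanics notation with $Z$ the partition function and $\beta = 1/kT$ (this is the textbook form; nothing here is specific to flerovium):

```latex
\langle E \rangle = -\frac{\partial \ln Z}{\partial \beta},
\qquad
p = \frac{1}{\beta}\,\frac{\partial \ln Z}{\partial V}
```

The average energy and the pressure of the whole system fall out of one function, even though no single atom has either.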
But the same problem exists. Take a homogeneous gas. It has some temperature. Divide it into two equal portions. Both sides have the same temperature. Divide each half into two equal portions again. All four pieces have the same temperature. Divide again, and again, and a few more times. You eventually get containers with so little gas in them they don’t have a temperature. Where did it go? When did it disappear?
The counterpart to an intensive property is an extensive one. This is stuff like the mass or the volume or the energy of a thing. Cut the gas’s container in two, and each has half the volume. Cut it in half again, and each of the four containers has one-quarter the volume. Keep this up and you stay in uncontroversial territory, because I am not discussing Zeno’s Paradoxes here.
And like Zeno’s Paradoxes, the Sorites Paradox can seem at first trivial. We can distinguish a heap from a non-heap; who cares where the dividing line is? Or whether the division is a gradual change? It seems easy. To show why it is easy is hard. Each potential answer is interesting, and plausible, and when you think hard enough of it, not quite satisfying. Good material to think about.
I had a free choice of topics for today! Nobody had a suggestion for the letter ‘N’, so, I’ll take one of my own. If you did put in a suggestion, I apologize; I somehow missed the comment in which you did. I’ll try to do better in future.
Nearest Neighbor Model.
Why are restaurants noisy?
It’s one of those things I wondered while at a noisy restaurant. I have heard it is because restaurateurs believe patrons buy more, and more expensive stuff, in a noisy place. I don’t know that I have heard this correctly, nor that what I heard was correct. I’ll leave it to people who work that end of restaurants to say. But I wondered idly whether mathematics could answer why.
It’s easy to form a rough model. Suppose I want my brilliant words to be heard by the delightful people at my table. Then I have to be louder, to them, than the background noise is. Fine. I don’t like talking loudly. My normal voice is soft enough even I have a hard time making it out. And I’ll drop the ends of sentences when I feel like I’ve said all the interesting parts of them. But I can overcome my instinct if I must.
The trouble comes from other people thinking of themselves the way I think of myself. They want to be heard over how loud I have been. And there’s no convincing them they’re wrong. If there’s bunches of tables near one another, we’re going to have trouble. We’ll each be talking loud enough to drown one another out, until the whole place is a racket. If we’re close enough together, that is. If the tables around mine are empty, chances are my normal voice is enough for the cause. If they’re not, we might have trouble.
So this inspires a model. The restaurant is a space. The tables are set positions, points inside it. Each table is making some volume of noise. Each table is trying to be louder than the background noise. At least until the people at the table reach the limits of their screaming. Or decide they can’t talk, they’ll just eat and go somewhere pleasant.
Making calculations on this demands some more work. Some is obvious: how do you represent “quiet” and “loud”? Some is harder: how far do voices carry? Grant that a loud table is still loud if you’re near it. How far away before it doesn’t sound loud? How far away before you can’t hear it anyway? Imagine a dining room that’s 100 miles long. There’s no possible party at one end that could ever be heard at the other. Never mind that a 100-mile-long restaurant would be absurd. It shows that the limits of people’s voices are a thing we have to consider.
There are many ways to model this distance effect. A realistic one would fall off with distance, sure. But it would also allow for echoes and absorption by the walls, and by other patrons, and maybe by restaurant decor. This would take forever to get answers from, but if done right it would get very good answers. A simpler model would give answers less fitted to your actual restaurant. But the answers may be close enough, and let you understand the system. And may be simple enough that you can get answers quickly. Maybe even by hand.
And so I come to the “nearest neighbor model”. The common English meaning of the words suggests what it’s about. We get it from models, like my restaurant noise problem. It’s made of a bunch of points that have some value. For my problem, tables and their noise level. And that value affects stuff in some region around these points.
In the “nearest neighbor model”, each point directly affects only its nearest neighbors. Saying which is the nearest neighbor is easy if the points are arranged in some regular grid. If they’re evenly spaced points on a line, say. Or a square grid. Or a triangular grid. If the points are in some other pattern, you need to think about what the nearest neighbors are. This is why people working in neighbor-nearness problems get paid the big money.
Suppose I use a nearest neighbor model for my restaurant problem. In this, I pretend the only background noise at my table is that of the people the next table over, in each direction. Two tables over? Nope. I don’t hear them at my table. I do get an indirect effect. Two tables over affects the table that’s between mine and theirs. But vice-versa, too. The table that’s 100 miles away can’t affect me directly, but it can affect a table in-between it and me. And that in-between table can affect the next one closer to me, and so on. The effect is attenuated, yes. Shouldn’t it be, if we’re looking at something farther away?
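That chain of influence can be sketched in a few lines. This is a toy model I am inventing here for illustration; the volume units, the 0.5 coupling, and the update rule are all hypothetical choices of mine, not a standard acoustics model. Put eleven tables in a row, set one boisterous party at the end, and let every other table raise its voice over the average of its nearest neighbors only:

```python
n = 11           # tables in a row
base = 1.0       # everyone's comfortable speaking volume (arbitrary units)
coupling = 0.5   # fraction of the neighbors' excess noise you raise your voice by

loud = [base] * n
loud[0] = 10.0   # one boisterous party at the end of the row

# Relax to a steady state: each table talks over its nearest neighbors only.
for _ in range(200):
    new = loud[:]
    for i in range(1, n):
        neighbors = [loud[i - 1]] + ([loud[i + 1]] if i + 1 < n else [])
        avg = sum(neighbors) / len(neighbors)
        new[i] = base + coupling * max(0.0, avg - base)
    new[0] = 10.0  # the party keeps partying
    loud = new

# The party's influence reaches every table, but attenuates with distance.
print([round(v, 2) for v in loud])
```

The party’s excess noise shrinks by a roughly constant factor per table, so the far end of the row sits at essentially the comfortable volume. That’s the attenuated indirect effect the paragraph describes.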
This sort of model is easy to work with numerically. I’m inclined toward problems that work numerically. Analytically … well, it can be easy. It can be hard. There’s a one-dimensional version of this problem, a bunch of evenly-spaced sites on an infinitely long line. If each site is limited to one of exactly two values, the problem becomes easy enough that freshman physics majors can solve it exactly. They don’t, not the first time out. This is because it requires recognizing a trigonometry trick that they don’t realize would be relevant. But once they know the trick, they agree it’s easy, when they go back two years later and look at it again. It just takes familiarity.
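That one-dimensional, two-value setup is the Ising model, and the trick the majors learn gives a closed form in hyperbolic functions. A sketch, checking the closed-form answer against brute-force enumeration for a short ring of sites (the coupling, temperature, and chain length here are arbitrary picks of mine, and I take zero external field):

```python
import math
from itertools import product

J, beta, N = 1.0, 0.7, 10   # coupling, inverse temperature, sites on a ring

# Brute force: sum exp(beta*J*sum of neighbor products) over all 2^N configurations.
Z_brute = sum(
    math.exp(beta * J * sum(s[i] * s[(i + 1) % N] for i in range(N)))
    for s in product([-1, 1], repeat=N)
)

# Closed form from the transfer-matrix trick: Z = lam_plus**N + lam_minus**N,
# with eigenvalues 2*cosh(beta*J) and 2*sinh(beta*J) (zero external field).
Z_exact = (2 * math.cosh(beta * J)) ** N + (2 * math.sinh(beta * J)) ** N

print(Z_brute, Z_exact)  # the two agree
```

The brute-force sum grows as 2 to the number of sites, so it dies quickly; the closed form is instant at any length, which is the whole point of the trick.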
This comes up in thermodynamics, because it makes a nice model for how ferromagnetism can work. More realistic problems, like, two-dimensional grids? … That’s harder to solve exactly. Can be done, though not by undergraduates. Three-dimensional can’t, last time I looked. Weirdly, four-dimensional can. You expect problems to only get harder with more dimensions of space, and then you get a surprise like that.
The nearest-neighbor-model is a first choice. It’s hardly the only one. If I told you there were a next-nearest-neighbor model, what would you suppose it was? Yeah, you’d be right. As long as you supposed it was “things are affected by the nearest and the next-nearest neighbors”. Mathematicians have heard of loopholes too, you know.
As for my restaurant model? … I never actually modelled it. I did think about the model. I concluded my model wasn’t different enough from ferromagnetism models to need me to study it more. I might be mistaken. There may be interesting weird effects caused by the facts of restaurants. That restaurants are pretty small things. That they can have echo-y walls and ceilings. That they can have sound-absorbing things like partial walls or plants. Perhaps I gave up too easily when I thought I knew the answer. Some of my idle thoughts end up too idle.
Some mathematics escapes mathematicians and joins culture. This is one such. The monkeys are part of why. They’re funny and intelligent and sad and stupid and deft and clumsy, and they can sit at a keyboard and almost look in place. They’re so like humans, except that we empathize with them. To imagine lots of monkeys, and putting them to some silly task, is compelling.
The metaphor traces back to a 1913 article by the mathematical physicist Émile Borel which I have not read. Searching the web I find much more comment about it than I find links to a translation of the text. And only one copy of the original, in French. And that page wants €10 for it. So I can tell you what everybody says was in Borel’s original text, but can’t verify it. The paper’s title is “Statistical Mechanics and Irreversibility”. From this I surmise that Borel discussed one of the great paradoxes of statistical mechanics. If we open a bottle of one gas in an airtight room, it disperses through the room. Why doesn’t every molecule of gas just happen, by chance, to end up back where it started? It does seem that if we waited long enough, it should. It’s unlikely it would happen on any one day, but give it enough days …
But let me turn to many web sites that are surely not all copying Wikipedia on this. Borel asked us to imagine a million monkeys typing ten hours a day. He posited it was possible but extremely unlikely that they would exactly replicate all the books of the richest libraries of the world. But that would be more likely than the atmosphere in a room un-mixing like that. Fair enough, but we’re not listening anymore. We’re thinking of monkeys. Borel’s is a fantastic image. It would see some adaptation over the years. Physicist Arthur Eddington, in 1928, made it an army of monkeys, their goal being to write all the books in the British Museum. By 1960 Bob Newhart had an infinite number of monkeys and typewriters, and a goal of all the great books. Stating the premise gets a laugh, one I doubt the setup alone would get today. I’m curious whether Newhart brought the idea to the mass audience. (Google NGrams for “monkeys at typewriters” suggest that phrase was unwritten, in books, before about 1965.) We may owe Bob Newhart thanks for a lot of monkeys-at-typewriters jokes.
Newhart has a monkey hit on a line from Hamlet. I don’t know if it was Newhart that set the monkeys after Shakespeare particularly, rather than some other great work of writing. Shakespeare does seem to be the most common goal now. Sometimes the number of monkeys diminishes, to a thousand or even to one. Some people move the monkeys off of typewriters and onto computers. Some take the cowardly measure of putting the monkeys at “keyboards”. The word is ambiguous enough to allow for typewriters, computers, and maybe a Mergenthaler Linotype. The monkeys now work 24 hours a day. This will be a comment someday about how bad we allowed pre-revolutionary capitalism to get.
The cultural legacy of monkeys-at-keyboards might well itself be infinite. It turns up in comic strips every few weeks at least. Television shows, usually writing for a comic beat, mention it. Computer nerds doing humor can’t resist the idea. Here’s a video of a 1979 Apple ][ program titled THE INFINITE NO. OF MONKEYS, which used this idea to show programming tricks. And it’s a great philosophical test case. If a random process puts together a play we find interesting, has it created art? No deliberate process creates a sunset, but we can find in it beauty and meaning. Why not words? There’s likely a book to write about the infinite monkeys in pop culture. Though the quotations of original materials would start to blend together.
But the big question. Have the monkeys got a chance? In a break from every probability question ever, the answer is: it depends on what the question precisely is. Occasional real-world experiments-cum-art-projects suggest that actual monkeys are worse typists than you’d think. They do more of bashing the keys with a stone before urinating on it, a reminder of how slight is the difference between humans and our fellow primates. So we turn to abstract monkeys who behave more predictably, and run experiments that need no ethical oversight.
So we must think what we mean by Shakespeare’s Plays. Arguably the play is a specific performance of actors in a set venue doing things. This is a bit much to expect of even a skilled abstract monkey. So let us switch to the book of a play. This has a more clear representation. It’s a string of characters. Mostly letters, some punctuation. Good chance there’s numerals in there. It’s probably a lot of characters. So the text to match is some specific, long string of characters in a particular order.
And what do we mean by a monkey at the keyboard? Well, we mean some process that picks characters randomly from the allowed set. When I see something is picked “randomly” I want to know what the distribution rule is. Like, are Q’s exactly as probable as E’s? As &’s? As %’s? How likely it is a particular string will get typed is easiest to answer if we suppose a “uniform” distribution. This means that every character is equally likely. We can quibble about capital and lowercase letters. My sense is most people frame the problem supposing case-insensitivity. That the monkey is doing fine to type “whaT beArD weRe i BEsT tO pLAy It iN?”. Or we could set the monkey at an old typesetter’s station, with separate keys for capital and lowercase letters. Some will even forgive the monkeys punctuating terribly. Make your choices. It affects the numbers, but not the point.
I’ll suppose there are 91 characters to pick from, as a Linotype keyboard had. So the monkey has capitals and lowercase and common punctuation to get right. Let your monkey pick one character. What is the chance it hit the first character of one of Shakespeare’s plays? Well, the chance is 1 in 91 that you’ve hit the first character of one specific play. There’s several dozen plays your monkey might be typing, though. I bet some of them even start with the same character, so giving an exact answer is tedious. If all we want is monkey-typed Shakespeare plays, we’re being fussy if we want The Tempest typed up first and Cymbeline last. If we want a more tractable problem, it’s easier to insist on a set order.
So suppose we do have a set order. Then there’s a one-in-91 chance the first character matches the first character of the desired text. A one-in-91 chance the second character typed matches the second character of the desired text. A one-in-91 chance the third character typed matches the third character of the desired text. And so on, for the whole length of the play’s text. Getting one character right doesn’t make it more or less likely the next one is right. So the chance of getting a whole play correct is 1/91 raised to the power of however many characters are in the script. Call it 800,000 for argument’s sake. More characters, if you put two spaces between sentences. The prospects of getting this all correct are … dismal.
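Just how dismal can be put in numbers. The probability itself is far too small for a floating-point number to hold, but its logarithm is easy (the 800,000-character count is the rough figure above, the 91 characters the Linotype keyboard):

```python
import math

alphabet = 91        # characters on the hypothetical Linotype keyboard
length = 800_000     # rough character count for one play's script

# Probability of typing the whole script correctly is (1/91)**800_000.
# That underflows any float, so work with the base-10 logarithm instead.
log10_prob = -length * math.log10(alphabet)
print(log10_prob)    # about -1.57 million
```

So the chance is roughly 1 in 10 to the 1.57 millionth power: a denominator with about 1.57 million digits.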
I mean, there’s some cause for hope. Spelling was much less fixed in Shakespeare’s time. There are acceptable variations for many of his words. It’d be silly to rule out a possible script that (say) wrote “look’d” or “look’t”, rather than “looked”. Still, that’s a slender thread.
But there is more reason to hope. Chances are the first monkey will botch the first character. But what if they get the first character of the text right on the second character struck? Or on the third character struck? It’s all right if there’s some garbage before the text comes up. Many writers have trouble starting and build from a first paragraph meant to be thrown away. After every wrong letter is a new chance to type the perfect thing, reassurance for us all.
Since the monkey does type, hypothetically, forever … well, each character struck has a probability of only (1/91)^800,000 (or whatever the exact count works out to) of starting the lucky sequence. But the monkey will have infinitely many chances to start.
And we don’t have only one monkey. We have a thousand monkeys. At least. A million monkeys. Maybe infinitely many monkeys. Each one, we trust, is working independently, owing to the monkeys’ strong sense of academic integrity. However many monkeys are working on the project, each one takes their chance.
There are dizzying possibilities here. There’s the chance some monkey will get it all exactly right first time out. More. Think of a row of monkeys. What’s the chance the first thing the first monkey in the row types is the first character of the play? What’s the chance the first thing the second monkey in the row types is the second character of the play? The chance the first thing the third monkey in the row types is the third character in the play? What’s the chance a long enough row of monkeys happen to hit the right buttons so the whole play appears in one massive simultaneous stroke of the keys? Not any worse than the chance your one monkey will type this all out. Monkeys at keyboards are ergodic. It’s as good to have a few monkeys working a long while as to have many monkeys working a short while. The Mythical Man-Month is, for this project, mistaken.
That solves it then, doesn’t it? A monkey, or a team of monkeys, has a nonzero probability of typing out all Shakespeare’s plays. Or the works of Dickens. Or of Jorge Luis Borges. Whatever you like. Given infinitely many chances at it, they will, someday, succeed.
What is the chance that the monkeys screw up? They get the works of Shakespeare just right, but for a flaw. The monkeys’ Midsummer Night’s Dream insists on having the fearsome lion played by “Smaug the joiner” instead. This would send the play-within-the-play in novel directions. The result, though interesting, would not be Shakespeare. There’s a nonzero chance they’ll write the play that way. And so, given infinitely many chances, they will.
What’s the chance that they always will? That they just miss every single chance to write “Snug”. It comes out “Smaug” every time?
We can say. Call the probability that they make this Snug-to-Smaug typo any given time p. That’s a number from 0 to 1. 0 corresponds to never making this mistake; 1 to certainly making it. The chance they get it right is 1 − p. The chance they make this mistake twice is p², smaller than p. The chance that they get it right at least once in two tries is 1 − p², closer to 1 than 1 − p is. The chance that, given three tries, they make the mistake every time is p³, even smaller still. The chance that they get it right at least once is 1 − p³, even closer to 1.
You see where this is going. Every extra try makes the chance they got it wrong every time smaller. Every extra try makes the chance they get it right at least once bigger. And now we can let some analysis come into play.
So give me a positive number. I don’t know your number, so I’ll call it ε. It’s how unlikely you want something to be before you say it won’t happen. Whatever your ε was, I can give you a number N. If the monkeys have taken more than N tries, the chance they get it wrong every single time is smaller than your ε. The chance they get it right at least once is bigger than 1 − ε. Let the monkeys have infinitely many tries. The chance the monkey gets it wrong every single time is smaller than any positive number. So the chance the monkey gets it wrong every single time is zero. It … can’t happen, right? The chance they get it right at least once is closer to 1 than to any other number. So it must be 1. So it must be certain. Right?
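The bookkeeping behind that ε-and-N promise is short enough to write out. A sketch, with the per-try failure probability p and your tolerance ε as inputs (the 0.5 and one-in-a-million below are hypothetical numbers for demonstration):

```python
import math

def tries_needed(p, eps):
    """Smallest number of tries N after which the chance of failing every
    single time, p**N, drops below eps. Assumes 0 < p < 1 and 0 < eps < 1."""
    return math.floor(math.log(eps) / math.log(p)) + 1

p, eps = 0.5, 1e-6
N = tries_needed(p, eps)
print(N, p ** N)   # 20 tries; 0.5**20 is just under one in a million
```

Whatever ε you name, the function hands back an N, and past N tries the all-failures probability stays below your ε forever after. That is the whole analytic argument in miniature.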
But let me give you this. Detach a monkey from typewriter duty. This one has a coin to toss. It tosses fairly, with the coin having a 50% chance of coming up tails and 50% chance of coming up heads each time. The monkey tosses the coin infinitely many times. What is the chance the coin comes up tails every single one of these infinitely many times? The chance is zero, obviously. At least you can show the chance is smaller than any positive number. So, zero.
Yet … what power enforces that? What forces the monkey to eventually have a coin come up heads? It’s … nothing. Each toss is a fair toss. Each toss is independent of its predecessors. But there is no force that causes the monkey, after a hundred million billion trillion tosses of “tails”, to then toss “heads”. It’s the gambler’s fallacy to think there is one. The hundred million billion trillionth-plus-one toss is as likely to come up tails as the first toss is. It’s impossible that the monkey should toss tails infinitely many times. But there’s no reason it can’t happen. It’s also impossible that the monkeys still on the typewriters should get Shakespeare wrong every single time. But there’s no reason that can’t happen.
It’s unsettling. Well, probability is unsettling. If you don’t find it disturbing you haven’t thought long enough about it. Infinities are unsettling too.
Formally, mathematicians interpret this — if not explain it — by saying the set of things that can happen is a “probability space”. The likelihood of something happening is what fraction of the probability space matches something happening. (I’m skipping a lot of background to say something that simple. Do not use this at your thesis defense without that background.) This sort of “impossible” event has “measure zero”. So its probability of happening is zero. Measure turns up in analysis, in understanding how calculus works. It complicates a bunch of otherwise-obvious ideas about continuity and stuff. It turns out to apply to probability questions too. Imagine the space of all the things that could possibly happen as being the real number line. Pick one number from that number line. What is the chance you have picked exactly the number -24.11390550338228506633488? I’ll go ahead and say you didn’t. It’s not that you couldn’t. It’s not impossible. It’s just that the chance that this happened, out of the infinity of possible outcomes, is zero.
The infinite monkeys give us this strange set of affairs. Some things have a probability of zero of happening, which does not rule out that they can. Some things have a probability of one of happening, which does not mean they must. I do not know what conclusion Borel ultimately drew about the reversibility problem. I expect his opinion to be that we have a clear answer, and unsettlingly great room for that answer to be incomplete.
I have to specify. There’s a bunch of mathematics concepts called `distribution’. Some of them are linked. Some of them are just called that because we don’t have a better word. Like, what else would you call multiplying something through a sum? I want to describe a distribution that comes to us in probability and in statistics. From there it runs through modern physics, as well as truly difficult sciences like sociology and economics.
We get to distributions through random variables. These are variables that might be any one of multiple possible values. There might be as few as two options. There might be a finite number of possibilities. There might be infinitely many. They might be numbers. At the risk of sounding unimaginative, they often are. We’re always interested in measuring things. And we’re used to measuring them in numbers.
What makes random variables hard to deal with is that, if we’re playing by the rules, we never know what it is. Once we get through (high school) algebra we’re comfortable working with an ‘x’ whose value we don’t know. But that’s because we trust that, if we really cared, we would find out what it is. Or we would know that it’s a ‘dummy variable’, whose value is unimportant but gets us to something that is. A random variable is different. Its value matters, but we can’t know what it is.
Instead we get a distribution. This is a function which gives us information about what the outcomes are, and how likely they are. There are different ways to organize this data. If whoever’s talking about it doesn’t say just what they’re doing, bet on it being a “probability distribution function”. This follows slightly different rules based on whether the range of values is discrete or continuous, but the idea is roughly the same. Every possible outcome has a probability at least zero but not more than one. The total probability over every possible outcome is exactly one. There’s rules about the probability of two distinct outcomes happening. Stuff like that.
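For a discrete case those rules are concrete enough to write down. A minimal sketch in Python, using a fair die as the example:

```python
from fractions import Fraction

# A discrete probability distribution for a fair six-sided die,
# written as a dictionary from outcome to probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The defining rules: every probability is at least zero and not more
# than one, and the total over every possible outcome is exactly one.
assert all(0 <= prob <= 1 for prob in die.values())
assert sum(die.values()) == 1
```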
Distributions are interesting enough when they’re about fixed things. In learning probability this is stuff like hands of cards or totals of die rolls or numbers of snowstorms in the season. Fun enough. These get to be more personal when we take a census, or otherwise sample things that people do. There’s something wondrous in knowing that while, say, you might not know how long a commute your neighbor has, you know there’s an 80 percent chance it’s between 15 and 25 minutes (or whatever). It’s also good for urban planners to know.
It gets exciting when we look at how distributions can change. It’s hard not to think of that as “changing over time”. (You could make a fair argument that “change” is “time”.) But it doesn’t have to. We can take a function with a domain that contains all the possible values in the distribution, and a range that’s something else. The image of the distribution is some new distribution. (Trusting that the function doesn’t do something naughty.) These functions — these mappings — might reflect nothing more than relabelling, going from (say) a distribution of “false and true” values to one of “-5 and 5” values instead. They might reflect regathering data; say, going from the distribution of a die’s outcomes of “1, 2, 3, 4, 5, or 6” to something simpler, like, “less than two, exactly two, or more than two”. Or they might reflect how something does change in time. They’re all mappings; they’re all ways to change what a distribution represents.
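The die-regathering example can be sketched directly. This is my own illustration, not any particular textbook’s notation:

```python
from fractions import Fraction

# Start with a fair die's distribution ...
die = {face: Fraction(1, 6) for face in range(1, 7)}

# ... and a mapping that regathers the outcomes into coarser labels.
def regather(face):
    if face < 2:
        return "less than two"
    if face == 2:
        return "exactly two"
    return "more than two"

# The image distribution: add up the probability of every outcome
# that the mapping sends to the same new label.
image = {}
for face, prob in die.items():
    label = regather(face)
    image[label] = image.get(label, Fraction(0)) + prob
```

The new distribution still obeys all the rules (everything between zero and one, total exactly one); the mapping only changed what the distribution represents.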
These mappings turn up in statistical mechanics. Processes will change the distribution of positions and momentums and electric charges and whatever else the things moving around do. It’s hard to learn. At least my first instinct was to try to warm up to it by doing a couple test cases. Pick specific values for the random variables and see how they change. This can help build confidence that one’s calculating correctly. Maybe give some idea of what sorts of behaviors to expect.
But it’s calculating the wrong thing. You need to look at the distribution as a specific thing, and how that changes. It’s a change of view. It’s like the change in view from thinking of a position as an x- and y- and maybe z-coordinate to thinking of position as a vector. (Which, I realize now, gave me slightly similar difficulties in thinking of what to do for any particular calculation.)
Distributions can change in time, just the way that — in simpler physics — positions might change. Distributions might stabilize, forming an equilibrium. This can mean that everything’s found a place to stop and rest. That will never happen for any interesting problem. What you might get is an equilibrium like the rings of Saturn. Everything’s moving, everything’s changing, but the overall shape stays the same. (Roughly.)
There are many specifically named distributions. They represent patterns that turn up all the time. The binomial distribution, for example, which represents what to expect if you have a lot of examples of something that can be one of two values each. The Poisson distribution, for representing how likely something that could happen any time (or any place) will happen in a particular span of time (or space). The normal distribution, also called the Gaussian distribution, which describes everything that isn’t trying to be difficult. There are like 400 billion dozen more named ones, each really good at describing particular kinds of problems. But they’re all distributions.
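The first two of those named distributions have formulas compact enough to sketch, and the normal curve’s density is nearly as short. These are the standard formulas, written with nothing but Python’s math module:

```python
import math

def binomial_pmf(k, n, p):
    """Chance of k successes in n tries, each with success chance p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Chance of k events in a span where lam of them happen on average."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def normal_pdf(x, mu, sigma):
    """Density of the normal (Gaussian) distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
```

Each of the first two sums to one over all possible outcomes, just as a probability distribution must.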
So I’m going to have a third Reading the Comics essay for last week’s strips. This happens sometimes. Two of the four strips for this essay mention percentages. But one of the others is so important to me that it gets naming rights for the essay. You’ll understand when I’m done. I hope.
Angie Bailey’s Texts From Mittens for the 2nd talks about percentages. That’s a corner of arithmetic that many people find frightening and unwelcoming. I’m tickled that Mittens doesn’t understand how easy it is to work out a percentage of 100. It’s a good, reasonable bit of characterization for a cat.
John Graziano’s Ripley’s Believe It Or Not for the 2nd is about a subject close to my heart. At least a third of it is. The mention of negative Kelvin temperatures set off a … heated … debate on the comments thread at GoComics.com. Quite a few people remember learning in school that the Kelvin temperature scale starts with the coldest possible temperature, which is zero. And that’s that. They have taken this to denounce Graziano as writing obvious nonsense. Well.
Something you should know about anything you learned in school: the reality is more complicated than that. This is true for thermodynamics. This is true for mathematics. This is true for anything interesting enough for humans to study. This also applies to stuff you learned as an undergraduate. Also to grad school.
So what are negative temperatures? At least on an absolute temperature scale, where the answer isn’t an obvious and boring “cold”? One clue is in the word “absolute” there. It means a way of measuring temperature that’s in some way independent of how we do the measurement. In ordinary life we measure temperatures with physical phenomena. Fluids that expand or contract as their temperature changes. Metals that expand or contract as their temperatures change. For special cases like blast furnaces, sample slugs of clays that harden or don’t at temperature. Observing the radiation of light off a thing. And these are all fine, useful in their domains. They’re also bound in particular physical experiments, though. Is there a definition of temperature that … you know … we can do mathematically?
Of course, or I wouldn’t be writing this. There are two mathematical-physics components to give us temperature. One is the internal energy of your system. This is the energy of whatever your thing is, less the gravitational or potential energy that reflects where it happens to be sitting. Also minus the kinetic energy that comes of the whole system moving in whatever way you like. That is, the energy you’d see if that thing were in an otherwise empty universe. The second part is — OK, this will confuse people. It’s the entropy. Which is not a word for “stuff gets broken”. Not in this context. The entropy of a system describes how many distinct ways there are for a system to arrange its energy. Low-entropy systems have only a few ways to put things. High-entropy systems have a lot of ways to put things. This does harmonize with the pop-culture idea of entropy. There are many ways for a room to be messy. There are few ways for it to be clean. And it’s so easy to make a room messier and hard to make it tidier. We say entropy tends to increase.
So. A mathematical physicist bases “temperature” on the internal energy and the entropy. Imagine giving a system a tiny bit more energy. How many more ways would the system be able to arrange itself with that extra energy? That gives us the temperature. (To be precise, it gives us the reciprocal of the temperature. We could set this up as how a small change in entropy affects the internal energy, and get temperature right away. But I have an easier time thinking of going from change-in-energy to change-in-entropy than the other way around. And this is my blog so I get to choose how I set things up.)
This definition sounds bizarre. But it works brilliantly. It’s all nice clean mathematics. It matches perfectly nice easy-to-work-out cases, too. Like, you may kind of remember from high school physics how the temperature of a gas is something something average kinetic energy something. Work out the entropy and the internal energy of an ideal gas. Guess what this change-in-entropy/change-in-internal-energy thing gives you? Exactly something something average kinetic energy something. It’s brilliant.
In ordinary stuff, adding a little more internal energy to a system opens up new ways to arrange that energy. It always increases the entropy. So the absolute temperature, from this definition, is always positive. Good stuff. Matches our intuition well.
So in 1956 Dr Norman Ramsey and Dr Martin Klein published some interesting papers in the Physical Review. (Here’s a link to Ramsey’s paper and here’s Klein’s, if you can get someone else to pay for your access.) Their insightful question: what happens if a physical system has a maximum internal energy? If there’s some way of arranging the things in your system so that no more energy can come in? What if you’re close to but not at that maximum?
It depends on details, yes. But consider this setup: there’s one, or only a handful, of ways to arrange the maximum possible internal energy. There’s some more ways to arrange nearly-the-maximum-possible internal energy. There’s even more ways to arrange not-quite-nearly-the-maximum-possible internal energy.
Look at what that implies, though. If you’re near the maximum-possible internal energy, then adding a tiny bit of energy reduces the entropy. There’s fewer ways to arrange that greater bit of energy. Greater internal energy, reduced entropy. This implies the temperature is negative.
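A toy version of the setup makes the sign flip visible. Take N two-level atoms, each excited atom holding one unit of energy; the entropy is the log of the number of ways to choose which atoms are excited, and 1/T is the change in entropy per unit of added energy. (This is a standard illustration of the idea, not anything from the Ramsey or Klein papers specifically.)

```python
import math

N = 100  # two-level atoms; each excited atom holds one unit of energy

def entropy(n):
    """Entropy (Boltzmann's constant set to 1): the log of the number
    of ways to pick which n of the N atoms are excited."""
    return math.log(math.comb(N, n))

def inverse_temperature(n):
    """Finite-difference version of 1/T = (change in S)/(change in E)."""
    return entropy(n + 1) - entropy(n)

# Below half-filling, adding energy opens up more arrangements: T > 0.
# Near the maximum energy, adding energy closes arrangements off: T < 0.
low = inverse_temperature(10)   # ordinary, positive temperature
high = inverse_temperature(90)  # the negative-temperature regime
```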
So we have to allow the idea of negative temperatures. Or we have to throw out this statistical-mechanics-based definition of temperature. And the definition works so well otherwise. Nobody’s got an idea nearly as good for it. So mathematical physicists shrugged, and noted this as a possibility, but mostly ignored it for decades. If it got mentioned, it was because the instructor was showing off a neat weird thing. This is how I encountered it, as a young physics major full of confidence and not at all good on wedge products. But it was sitting right there, in my textbook, Kittel and Kroemer’s Thermal Physics. Appendix E, four brisk pages before the index. Still, it was an enchanting piece.
And a useful one, possibly the most useful four-page aside I encountered as an undergraduate. My thesis research simulated a fluid-equilibrium problem run at different temperatures. There was a natural way that this fluid would have a maximum possible internal energy. So, a good part — the most fascinating part — of my research was in the world of negative temperatures. It’s a strange one, one where entropy seems to work in reverse. Things build, spontaneously. More heat, more energy, makes them build faster. In simulation, a shell of viscosity-free gas turned into what looked for all the world like a solid shell.
All right, but you can simulate anything on a computer, or in equations, as I did. Would this ever happen in reality? … And yes, in some ways. Internal energy and entropy are ideas that have natural, irresistible fits in information theory. This is the study of … information. I mean, how you send a signal and how you receive a signal. It turns out a lot of laser physics has, in information theory terms, behavior that’s negative-temperature. And, all right, but that’s not what anybody thinks of as temperature.
Well, these ideas happen still. They usually need some kind of special constraint on the things. Atoms held in a magnetic field so that their motions are constrained. Vortices locked into place on a two-dimensional surface (a prerequisite to my little fluids problems). Atoms bound into a lattice that keeps them from being able to fly free. All weird stuff, yes. But all exactly as the statistical-mechanics temperature idea calls on.
And notice. These negative temperatures happen only when the energy is extremely high. This is the grounds for saying that they’re hotter than positive temperatures. And good reason, too. Getting into what heat is, as opposed to temperature, is an even longer discussion. But it seems fair to say something with a huge internal energy has more heat than something with slight internal energy. So Graziano’s Ripley’s claim is right.
(GoComics.com commenters, struggling valiantly, have tried to talk about quantum mechanics stuff and made a hash of it. As a general rule, skip any pop-physics explanation of something being quantum mechanics.)
If you’re interested in more about this, I recommend Stephen J Blundell and Katherine M Blundell’s Concepts in Thermal Physics. Even if you’re not comfortable enough in calculus to follow the derivations, the textbook prose is insightful.
John Hambrock’s The Brilliant Mind of Edison Lee for the 3rd is a probability joke. And it’s built on how impossible putting together a particular huge complicated structure can be. I admit I’m not sure how I’d go about calculating the chance of a heap of Legos producing a giraffe shape. Imagine working out the number of ways Legos might fall together. Imagine working out how many of those could be called giraffe shapes. It seems too great a workload. And figuring it by experiment, shuffling Legos until a giraffe pops out, doesn’t seem much better.
This approaches an argument sometimes raised about the origins of life. Grant there’s no chance that a pile of Legos could be dropped together to make a giraffe shape. How can the much bigger pile of chemical elements have been stirred together to make an actual giraffe? Or, the same problem in another guise. If a monkey could go at a typewriter forever without typing any of Shakespeare’s plays, how did a chain of monkeys get to writing all of them?
And there’s a couple of explanations. At least partial explanations. There is much we don’t understand about the origins of life. But one is that the universe is huge. There’s lots of stars. It looks like most stars have planets. There’s lots of chances for chemicals to mix together and form a biochemistry. Even an impossibly unlikely thing will happen, given enough chances.
And another part is selection. A pile of Legos thrown into a pile can do pretty much anything. Any piece will fit into any other piece in a variety of ways. A pile of chemicals are more constrained in what they can do. Hydrogen, oxygen, and a bit of activation energy can make hydrogen-plus-hydroxide ions, water, or hydrogen peroxide, and that’s it. There can be a lot of ways to arrange things. Proteins are chains of amino acids. These chains can be about as long as you like. (It seems.) (I suppose there must be some limit.) And they curl over and fold up in some of the most complicated mathematical problems anyone can even imagine doing. How hard is it to find a set of chemicals that are a biochemistry? … That’s hard to say. There are about twenty amino acids used for proteins in our life. It seems like there could be a plausible life with eighteen amino acids, or 24, including a couple we don’t use here. It seems plausible, though, that my father could have had two brothers growing up; if there were, would I exist?
Jason Chatfield’s Ginger Meggs for the 3rd is a story-problem joke. It’s a familiar old form. The question seems to be a bit mangled in the asking, though. Thirty percent of Jonson’s twelve apples is a nasty fractional number of apples. Surely the question should have given Jonson ten and Fitzclown twelve apples. Then thirty percent of Jonson’s apples would be a nice whole number.
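The arithmetic complaint checks out, which exact fractions make plain:

```python
from fractions import Fraction

thirty_percent = Fraction(30, 100)

twelve_apples = thirty_percent * 12  # 18/5, a nasty 3.6 apples
ten_apples = thirty_percent * 10     # a nice whole 3 apples
```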
The Theorem of the Day is just what the name offers. They’re fit onto single slides, so there’s not much text to read. I’ll grant some of them might be hard reading at once, though, if you’re not familiar with the lingo. Anyway, this particular theorem, the Lindemann-Weierstrass Theorem, is one of the famous ones. Also one of the best-named ones. Karl Weierstrass is one of those names you find all over analysis. Over the latter half of the 19th century he attacked the logical problems that had bugged calculus for the previous three centuries and beat them all. I’m lying, but not by much. Ferdinand von Lindemann’s name turns up less often, but he’s known in mathematics circles for proving that π is transcendental (and so, ultimately, that the circle can’t be squared by compass and straightedge). And he was David Hilbert’s thesis advisor.
The Lindemann-Weierstrass Theorem is one of those little utility theorems that’s neat on its own, yes, but is good for proving other stuff. This theorem says that if a given number is algebraic (ask about that some A To Z series) and not zero, then e raised to that number has to be transcendental. (The exception for zero: e raised to 0 is equal to 1, which is algebraic.) Flipped around, if e raised to a number gives you something algebraic, that number must have been zero or transcendental. The page also mentions one of those fun things you run across when you have a scientific calculator and can repeat an operation on whatever the result of the last operation was.
And last, Katherine Bourzac writing for Nature.com reports the creation of a two-dimensional magnet. This delights me since one of the classic problems in statistical mechanics is a thing called the Ising model. It’s a basic model for the mathematics of how magnets would work. The one-dimensional version is simple enough that you can give it to undergrads and have them work through the whole problem. The two-dimensional version is a lot harder to solve and I’m not sure I ever saw it laid out even in grad school. (Mind, I went to grad school for mathematics, not physics, and the subject is a lot more physics.) The four- and higher-dimensional model can be solved by a clever approach called mean field theory. The three-dimensional model … I don’t think has any exact solution, which seems odd given how that’s the version you’d think was most useful.
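For a sense of how approachable the one-dimensional model is: here’s a sketch comparing a brute-force sum over every spin configuration to the standard transfer-matrix formula, for a small ring of spins. (The parameter values are illustrative.)

```python
import math
from itertools import product

# One-dimensional Ising model on a ring of N spins, coupling J, no field.
# Energy of a configuration: -J times the sum of s_i * s_{i+1} around the ring.
def partition_brute(N, J, beta):
    Z = 0.0
    for spins in product((-1, +1), repeat=N):
        energy = -J * sum(spins[i] * spins[(i + 1) % N] for i in range(N))
        Z += math.exp(-beta * energy)
    return Z

# The transfer-matrix answer: Z = lam_plus**N + lam_minus**N, where the
# eigenvalues are 2*cosh(beta*J) and 2*sinh(beta*J).
def partition_transfer(N, J, beta):
    return (2 * math.cosh(beta * J)) ** N + (2 * math.sinh(beta * J)) ** N
```

The brute-force sum has 2^N terms, so it only works for toy sizes; the transfer-matrix result is what makes the one-dimensional problem an undergraduate exercise.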
That there’s a real two-dimensional magnet (well, a one-molecule-thick magnet) doesn’t really affect the model of two-dimensional magnets. The model is interesting enough for its mathematics, which teaches us about all kinds of phase transitions. And it’s close enough to the way certain aspects of real-world magnets behave to enlighten our understanding. The topic couldn’t avoid drawing my eye, is all.
And now I can wrap up last week’s delivery from Comic Strip Master Command. It’s only five strips. One certainly stars an accountant. One stars a kid that I believe is being coded to read as an accountant. The rest, I don’t know. I pick Edition titles for flimsy reasons anyway. This’ll do.
Ryan North’s Dinosaur Comics for the 6th is about things that could go wrong. And every molecule of air zipping away from you at once is something which might possibly happen but which is indeed astronomically unlikely. This has been the stuff of nightmares since the late 19th century made probability an important part of physics. The chance all the air near you would zip away at once is impossibly unlikely. But such unlikely events challenge our intuitions about probability. An event that has zero chance of happening might still happen, given enough time and enough opportunities. But we’re not using our time well to worry about that. If nothing else, even if all the air around you did rush away at once, it would almost certainly rush back right away.
Steve Kelley and Jeff Parker’s Dustin for the 7th of March talks about the SATs and the chance of picking right answers on a multiple-choice test. I haven’t heard about changes to the SAT but I’ll accept what the comic strip says about them for the purpose of discussion here. At least back when I took it the SAT awarded one point to the raw score for a correct answer, and subtracted one-quarter point for a wrong answer. (The raw scores were then converted into a 200-to-800 range.) I liked this. If you had no idea and guessed on answers you should expect to get one in five right and four in five wrong. On average then you would expect no net change to your raw score. If one or two wrong answers can be definitely ruled out then guessing from the remainder brings you a net positive. I suppose the change, if it is being done, is meant to ensure that only right answers are rewarded. I’m not sure this is right; it seems to me there’s value in being able to identify certainly wrong answers even if the right one isn’t obvious. But it’s not my test and I don’t expect to need to take it again either. I can express opinions without penalty.
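The break-even arithmetic for the old scoring is quick to verify. A sketch, with the +1 and −1/4 point values described above:

```python
from fractions import Fraction

# Old SAT scoring: +1 for a right answer, -1/4 for a wrong one,
# with five choices per question.
def expected_guess_score(choices_left):
    """Expected raw-score change from guessing among the remaining choices."""
    p_right = Fraction(1, choices_left)
    return p_right * 1 + (1 - p_right) * Fraction(-1, 4)

blind = expected_guess_score(5)     # pure guessing: breaks even
informed = expected_guess_score(4)  # one choice ruled out: net positive
```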
Mark Anderson’s Andertoons for the 7th is the Mark Anderson’s Andertoons for last week. It’s another kid-at-the-chalkboard panel. What gets me is that if the kid did keep one for himself then shouldn’t he have written 38?
Brian Basset’s Red and Rover for the 8th mentions fractions. It’s just there as the sort of thing a kid doesn’t find all that naturally compelling. That’s all right. I like the bug-eyed squirrel in the first panel.
Bill Holbrook’s On The Fastrack for the 9th concludes the wedding of accountant Fi. It uses the square root symbol so as to make the cake topper clearly mathematical as opposed to just an age.
Most of the comics I review here are printed on GoComics.com. Well, most of the comics I read online are from there. But even so I think they have more comic strips that mention mathematical themes. Anyway, they’re unleashing a complete web site redesign on Monday. I don’t know just what the final version will look like. I know that the beta versions included the incredibly useful, that is to say dumb, feature where if a particular comic you do read doesn’t have an update for the day — and many of them don’t, as they’re weekly or three-times-a-week or so — then it’ll show some other comic in its place. I mean, the idea of encouraging people to find new comics is a good one. To some extent that’s what I do here. But the beta made no distinction between “comic you don’t read because you never heard of Microcosm” and “comic you don’t read because glancing at it makes your eyes bleed”. And on an idiosyncratic note, I read a lot of comics. I don’t need to see Dude and Dude reruns in fourteen spots on my daily comics page, even if I didn’t mind it to start.
Anyway. I am hoping, desperately hoping, that with the new site all my old links to comics are going to keep working. If they don’t then I suppose I’m just ruined. We’ll see. My suggestion is if you’re at all curious about the comics you read them today (Sunday) just to be safe.
Ashleigh Brilliant’s Pot-Shots is a curious little strip I never knew of until GoComics picked it up a few years ago. Its format is compellingly simple: a little illustration alongside a wry, often despairing, caption. I love it, but I also understand why it was the subject of endless queries to the Detroit Free Press (Or Whatever) about why this thing was taking up newspaper space. The strip rerun for the 31st of December is a typical example of the strip and amuses me at least. And it uses arithmetic as the way to communicate reasoning, both good and bad. Brilliant’s joke does address something that logicians have to face, too. Whether an argument is logically valid depends entirely on its structure. If the form is correct the reasoning may be excellent. But to be sound an argument has to be valid and must also have its assumptions be true. We can separate whether an argument is right from whether it could ever possibly be right. If you don’t see the value in that, you have never participated in an online debate about where James T Kirk was born and whether Spock was the first Vulcan in Star Fleet.
Thom Bluemel’s Birdbrains for the 2nd of January, 2017, is a loaded-dice joke. Is this truly mathematics? Statistics, at least? Close enough for the start of the year, I suppose. Working out whether a die is loaded is one of the things any gambler would like to know, and that mathematicians might be called upon to identify or exploit. (I had a grandmother unshakably convinced that I would have some natural ability to beat the Atlantic City casinos if she could only sneak the underaged me in. I doubt I could do anything of value there besides see the stage magic show.)
Jack Pullan’s Boomerangs rerun for the 2nd is built on the one bit of statistical mechanics that everybody knows, that something or other about entropy always increasing. It’s not a quantum mechanics rule, but it’s a natural confusion. Quantum mechanics has the reputation as the source of all the most solid, irrefutable laws of the universe’s working. Statistical mechanics and thermodynamics have this musty odor of 19th-century steam engines, no matter how much there is to learn from there. Anyway, the collapse of systems into disorder is not an irrevocable thing. It takes only energy or luck to overcome disorderliness. And in many cases we can substitute time for luck.
Scott Hilburn’s The Argyle Sweater for the 3rd is the anthropomorphic-geometry-figure joke that I’ve been waiting for. I had thought Hilburn did this all the time, although a quick review of Reading the Comics posts suggests he’s been more about anthropomorphic numerals the past year. This is why I log even the boring strips: you never know when I’ll need to check the last time Scott Hilburn used “acute” to mean “cute” in reference to triangles.
Mike Thompson’s Grand Avenue uses some arithmetic as the visual cue for “any old kind of schoolwork, really”. Steve Breen’s name seems to have gone entirely from the comic strip. On Usenet group rec.arts.comics.strips Brian Henke found that Breen’s name hasn’t actually been on the comic strip since May, and D D Degg found a July 2014 interview indicating Thompson had mostly taken the strip over from originator Breen.
Mark Anderson’s Andertoons for the 5th is another name-drop that doesn’t have any real mathematics content. But come on, we’re talking Andertoons here. If I skipped it the world might end or something untoward like that.
Ted Shearer’s Quincy for the 14th of November, 1977, doesn’t have any mathematical content really. Just a mention. But I need some kind of visual appeal for this essay and Shearer is usually good for that.
There comes a time a physics major, or a mathematics major paying attention to one of the field’s best non-finance customers, first works on a statistical mechanics problem. Instead of keeping track of the positions and momentums of one or two or four particles she’s given the task of tracking millions of particles. It’s listed as a distribution of all the possible values they can have. But she still knows what it really is. And she looks at how to describe the way this distribution changes in time. If she’s the slightest bit like me, or anyone I knew, she freezes up at this. Calculate the development of millions of particles? Impossible! She tries working out what happens to just one, instead, and hopes that gives some useful results.
And then it does.
It’s a bit much to call this luck. But it is because the student starts off with some simple problems. Particles of gas in a strong box, typically. They don’t interact chemically. Maybe they bounce off each other, but she’s never asked about that. She’s asked about how they bounce off the walls. She can find the relationship between the volume of the box and the internal gas pressure on the interior and the temperature of the gas. And it comes out right.
She goes on to some other problems and it suddenly fails. Eventually she re-reads the descriptions of how to do this sort of problem. And she does them again and again and it doesn’t feel useful. With luck there’s a moment, possibly while showering, when the universe suddenly changes. And the next time the problem works out. She’s working on distributions instead of toy little single-particle problems.
But the problem remains: why did it ever work, even for that toy little problem?
It’s because some systems of things are ergodic. It’s a property that some physics (or mathematics) problems have. Not all. It’s a bit hard to describe clearly. Part of what motivated me to take this topic is that I want to see if I can explain it clearly.
Every part of some system has a set of possible values it might have. A particle of gas can be in any spot inside the box holding it. A person could be in any of the buildings of her city. A pool ball could be travelling in any direction on the pool table. Sometimes that will change. Gas particles move. People go to the store. Pool balls bounce off the edges of the table.
These values will have some kind of distribution. Look at where the gas particle is now. And a second from now. And a second after that. And so on, to the limits of human knowledge. Or to when the box breaks open. Maybe the particle will be more often in some areas than in others. Maybe it won’t. Doesn’t matter. It has some distribution. Over time we can say how often we expect to find the gas particle in each of its possible places.
The same with whatever our system is. People in buildings. Balls on pool tables. Whatever.
Now instead of looking at one particle (person, ball, whatever) we have a lot of them. Millions of particles in the box. Tens of thousands of people in the city. A pool table that somehow supports ten thousand balls. Imagine they’re all settled to wherever they happen to be.
So where are they? The gas particle one is easy to imagine. At least for a mathematics major. If you’re stuck on it I’m sorry. I didn’t know. I’ve thought about boxes full of gas particles for decades now and it’s hard to remember that isn’t normal. Let me know if you’re stuck, and where you are. I’d like to know where the conceptual traps are.
But back to the gas particles in a box. Some fraction of them are in each possible place in the box. There’s a distribution here of how likely you are to find a particle in each spot.
How does that distribution, the one you get from lots of particles at once, compare to the first, the one you got from one particle given plenty of time? If they agree the system is ergodic. And that’s why my hypothetical physics major got the right answers from the wrong work. (If you are about to write me to complain I’m leaving out important qualifiers let me say I know. Please pretend those qualifiers are in place. If you don’t see what someone might complain about thank you, but it wouldn’t hurt to think of something I might be leaving out here. Try taking a shower.)
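To make the comparison concrete, here’s a quick simulation. It’s my own toy model, nothing so grand as real statistical mechanics: a single particle hopping along a row of five sites, bouncing off the walls, versus a crowd of such particles looked at all at once. The function names are mine. For this little system the two distributions do agree:

```python
import random

def step(pos, n_sites):
    """One move of a particle: hop left or right, bouncing off the walls."""
    pos += random.choice((-1, 1))
    return max(0, min(n_sites - 1, pos))

def time_distribution(n_sites, n_steps, seed=1):
    """Follow ONE particle for a long time; tally where it has been."""
    random.seed(seed)
    counts = [0] * n_sites
    pos = 0
    for _ in range(n_steps):
        pos = step(pos, n_sites)
        counts[pos] += 1
    return [c / n_steps for c in counts]

def ensemble_distribution(n_sites, n_particles, n_steps, seed=2):
    """Let MANY particles wander a while; tally where they all are now."""
    random.seed(seed)
    counts = [0] * n_sites
    for _ in range(n_particles):
        pos = random.randrange(n_sites)
        for _ in range(n_steps):
            pos = step(pos, n_sites)
        counts[pos] += 1
    return [c / n_particles for c in counts]

one = time_distribution(5, 200_000)
many = ensemble_distribution(5, 20_000, 100)
# Both hover near the uniform value 0.20 for each of the five sites.
```

Both lists come out close to one-fifth for every site, which is what the ergodic property promises for this simple system.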
The person in a building is almost certainly not an ergodic system. There’s buildings any one person will never ever go into, however possible it might be. But nearly all buildings have some people who will go into them. The one-person-with-time distribution won’t be the same as the many-people-at-once distribution. Maybe there’s a way to qualify things so that it becomes ergodic. I doubt it.
The pool table, now, that’s trickier to say. For a real pool table no, of course not. An actual ball on an actual table rolls to a stop pretty soon, either from the table felt’s friction or because it drops into a pocket. Tens of thousands of balls would form an immobile heap on the table that would be pretty funny to see, now that I think of it. Well, maybe those are the same. But they’re a pretty boring same.
Anyway when we talk about “pool tables” in this context we don’t mean anything so sordid as something a person could play pool on. We mean something where the table surface hasn’t any friction. That makes the physics easier to model. It also makes the game unplayable, which leaves the mathematical physicist strangely unmoved. In this context anyway. We also mean a pool table that hasn’t got any pockets. This makes the game even more unplayable, but the physics even easier. (It makes it, really, like a gas particle in a box. Only without that difficult third dimension to deal with.)
And that makes it clear. The one ball on a frictionless, pocketless table bouncing around forever maybe we can imagine. A huge number of balls on that frictionless, pocketless table? Possibly trouble. As long as we’re doing imaginary impossible unplayable pool we could pretend the balls don’t collide with each other. Then the one-ball-over-time distribution and the many-balls-at-once distribution of what ways the balls are moving could agree. If they do bounce off each other, or if they get so numerous they can’t squeeze past one another, well, that’s different.
An ergodic system lets you do this neat, useful trick. You can look at a single example for a long time. Or you can look at a lot of examples at one time. And they’ll agree in their typical behavior. If one is easier to study than the other, good! Use the one that you can work with. Mathematicians like to do this sort of swapping between equivalent problems a lot.
The problem is it’s hard to find ergodic systems. We may have a lot of things that look ergodic, that feel like they should be ergodic. But proved ergodic, with a logic that we can’t shake? That’s harder to do. Often in practice we will include a note up top that we are assuming the system to be ergodic. With that “ergodic hypothesis” in mind we carry on with our work. It gives us a handle on a lot of problems that otherwise would be beyond us.
As I’ve done before I’m using one of my essays to set up for another essay. It makes a later essay easier. What I want to talk about is worth some paragraphs on its own.
The 19th Century saw the discovery of some unsettling truths about … well, everything, really. If there is an intellectual theme of the 19th Century it’s that everything has an unsettling side. In the 20th Century craziness broke loose. The 19th Century, though, saw great reasons to doubt that we knew what we knew.
But one of the unsettling truths grew out of mathematical physics. We start out studying physics the way Galileo or Newton might have, with falling balls. Ones that don’t suffer from air resistance. Then we move up to more complicated problems, like balls on a spring. Or two balls bouncing off each other. Maybe one ball, called a “planet”, orbiting another, called a “sun”. Maybe a ball on a lever swinging back and forth. We try a couple simple problems with three balls and find out that’s just too hard. We have to track so much information about the balls, about their positions and momentums, that we can’t solve any problems anymore. Oh, we can do the simplest ones, but we’re helpless against the interesting ones.
And then we discovered something. By “we” I mean people like James Clerk Maxwell and Josiah Willard Gibbs. And that is that we can know important stuff about how millions and billions and even vaster numbers of things move around. Maxwell could work out how the enormously many chunks of rock and ice that make up Saturn’s rings move. Gibbs could work out how the trillions of trillions of trillions of trillions of particles of gas in a room move. We can’t work out how four particles move. How is it we can work out how a godzillion particles move?
We do it by letting go. We stop looking for that precision and exactitude and knowledge down to infinitely many decimal points. Even though we think that’s what mathematicians and physicists should have. What we do instead is consider the things we would like to know. Where something is. What its momentum is. What side of a coin is showing after a toss. What card was taken off the top of the deck. What tile was drawn out of the Scrabble bag.
There are possible results for each of these things we would like to know. Perhaps some of them are quite likely. Perhaps some of them are unlikely. We track how likely each of these outcomes is. This is called the distribution of the values. This can be simple. The distribution for a fairly tossed coin is “heads, 1/2; tails, 1/2”. The distribution for a fairly tossed six-sided die is “1/6 chance of 1; 1/6 chance of 2; 1/6 chance of 3” and so on. It can be more complicated. The distribution for a fairly tossed pair of six-sided dice starts out “1/36 chance of 2; 2/36 chance of 3; 3/36 chance of 4” and so on. If we’re measuring something that doesn’t come in nice discrete chunks we have to talk about ranges: the chance that a 30-year-old male weighs between 180 and 185 pounds, or between 185 and 190 pounds. The chance that a particle in the rings of Saturn is moving between 20 and 21 kilometers per second, or between 21 and 22 kilometers per second, and so on.
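The two-dice distribution is small enough to compute exactly, by enumerating all 36 rolls. A short sketch (the function name is mine):

```python
from fractions import Fraction
from itertools import product

def sum_distribution(n_dice, sides=6):
    """Exact distribution of the sum of fair dice, by enumerating rolls."""
    counts = {}
    for rolls in product(range(1, sides + 1), repeat=n_dice):
        total = sum(rolls)
        counts[total] = counts.get(total, 0) + 1
    n_outcomes = sides ** n_dice
    return {t: Fraction(c, n_outcomes) for t, c in counts.items()}

two_dice = sum_distribution(2)
print(two_dice[2], two_dice[3], two_dice[4])   # 1/36 1/18 1/12
```

The fractions add up to exactly 1, as a distribution’s probabilities must.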
We may be unable to describe how a system evolves exactly. But often we’re able to describe how the distribution of its possible values evolves. And the laws by which probability work conspire to work for us here. We can get quite precise predictions for how a whole bunch of things behave even without ever knowing what any thing is doing.
That’s unsettling to start with. It’s made worse by one of the 19th Century’s late discoveries, that of chaos. That a system can be perfectly deterministic. That you might know what every part of it is doing as precisely as you care to measure. And you’re still unable to predict its long-term behavior. That’s unshakeable too, although statistical techniques will give you an idea of how likely different behaviors are. You can learn the distribution of what is likely, what is unlikely, and how often the outright impossible will happen.
Distributions follow rules. Of course they do. They’re basically the rules you’d imagine from looking at and thinking about something with a range of values. Something like a chart of how many students got what grades in a class, or how tall the people in a group are, or so on. Each possible outcome turns up some fraction of the time. That fraction’s never less than zero nor greater than 1. Add up all the fractions representing all the times every possible outcome happens and the sum is exactly 1. Something happens, even if we never know just what. But we know how often each outcome will.
There is something amazing to consider here. We can know and track everything there is to know about a physical problem. But we will be unable to do anything with it, except for the most basic and simple problems. We can choose to relax, to accept that the world is unknown and unknowable in detail. And this makes imaginable all sorts of problems that should be beyond our power. Once we’ve given up on this precision we get precise, exact information about what could happen. We can choose to see it as a moral about the benefits and costs and risks of how tightly we control a situation. It’s a surprising lesson to learn from one’s training in mathematics.
Do you ever think about why stuff dissolves? Like, why a spoon of sugar in a glass of water should seem to disappear instead of turning into a slight change in the water’s clarity? Well, sure, in those moods when you look at the world as a child does, not accepting that life is just like that and instead can imagine it being otherwise. Take that sort of question and put it to adult inquiry and you get great science.
Peter Mander of the Carnot Cycle blog this month writes a tale about Jacobus Henricus van ‘t Hoff, the first winner of a Nobel Prize for Chemistry. In 1883, on hearing of an interesting experiment with semipermeable membranes, van ‘t Hoff had a brilliant insight about why things go into solution, and how. The insight had only one little problem. It makes for fine reading about the history of chemistry and of its mathematical study.
In other, television-related news, the United States edition of The Price Is Right included a mention of “square root day” yesterday, 4/4/16. It was in the game “Cover-Up”, in which the contestant tries making successively better guesses at the price of a car. This they do by covering up wrong digits with new guesses. For the start of the game, before the contestant’s made any guesses, they need something irrelevant to the game to be on the board. So, they put up mock calendar pages for 1/1/2001, 2/2/2004, 3/3/2009, 4/4/2016, and finally a card reading . The game show also had a round devoted to Pi Day a few weeks back. So I suppose they’re trying to reach out to people into pop mathematics. It’s cute.
A couple weeks back, voting in the Democratic party’s Iowa caucus left several districts tied between Clinton and Sanders supporters. The ties were broken by coin tosses. That fact produced a bunch of jokes at Iowa’s expense. I can’t join in this joking. If the votes don’t support one candidate over another, but someone must win, what’s left but an impartial tie-breaking scheme?
After Clinton won six of the coin tosses people joked about the “impartial” idea breaking down. Well, we around here know that there are no unfair coins. And while it’s possible to have an unfair coin toss, I’m not aware of any reason to think any of the tosses were. It’s lucky to win six coin tosses. If the tosses are fair, the chance of getting any one right is one-half. Suppose the tosses are “independent”. That is, the outcome of one doesn’t change the chances of any other. Then the chance of getting six right in a row is the chance of getting one right, times itself, six times over. That is, the chance is one-half raised to the sixth power. That’s a small number, about 1.5 percent. But it’s not so riotously small as to deserve rioting.
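The arithmetic is short enough to check directly:

```python
from fractions import Fraction

p_one = Fraction(1, 2)       # chance of winning one fair toss
p_six = p_one ** 6           # six independent tosses, multiplied together
print(p_six, float(p_six))   # 1/64 0.015625
```

One chance in 64, about 1.5 percent: unlikely, but nothing to riot over.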
My love asked me about a claim about this made on a Facebook discussion. The writer asserted that six heads was exactly as likely as any other outcome of six coin tosses. My love wondered: is that true?
Yes and no. It depends on what you mean by “any other outcome”. Grant that heads and tails are equally likely to come up. Grant also that coin tosses are independent. Then six heads, H H H H H H, are just as likely to come up as six tails, T T T T T T. I don’t think anyone will argue with me that far.
But are both of these exactly as likely as the first toss coming up heads and all the others tails? As likely as H T T T T T? Yes, I would say they are. But I understand if you feel skeptical, and if you want convincing. The chance of getting heads once in a fair coin toss is one-half. We started with that. What’s the chance of getting five tails in a row? That must be one-half raised to the fifth power. The first coin toss and the last five don’t depend on one another. This means the chance of that first heads followed by those five tails is one-half times one-half to the fifth power. And that’s one-half to the sixth power.
What about the first two tosses coming up heads and the next four tails? H H T T T T? We can run through the argument again. The chance of two coin tosses coming up heads would be one-half to the second power. The chance of four coin tosses coming up tails would be one-half to the fourth power. The chance of the first streak being followed by the second is the product of the two chances. One-half to the second power times one-half to the fourth power is one-half to the sixth power.
We could go on like this and try out all the possible outcomes. There’s only 64 of them. That’s going to be boring. We could prove any particular string of outcomes is just as likely as any other. We need to make an argument that’s a little more clever, but also a little more abstract.
Don’t think just now of a particular sequence of coin toss outcomes. Consider this instead: what is the chance you will call a coin toss right? You might call heads, you might call tails. The coin might come up heads, the coin might come up tails. The chance you call it right, though — well, won’t that be one-half? Stay at this point until you’re sure it is.
So write out a sequence of possible outcomes. Don’t tell me what it is. It can be any set of H and T, as you like, as long as it’s six outcomes long.
What is the chance you wrote down six correct tosses in a row? That’ll be the chance of calling one outcome right, one-half, times itself six times over. One-half to the sixth power. So I know the probability that your prediction was correct. Which of the 64 possible outcomes did you write down? I don’t know. I suspect you didn’t even write one down. I would’ve just pretended I had one in mind until the essay required me to do something too. But the exact same argument applies no matter which sequence you pretended to write down. (Look at it. I didn’t use any information about what sequence you would have picked. So how could the sequence affect the outcome?) Therefore each of the 64 possible outcomes has the same chance of coming up.
So in this context, yes, six heads in a row is exactly as likely as any other sequence of six coin tosses.
I will guess that you aren’t perfectly happy with this argument. It probably feels like something is unaccounted-for. What’s unaccounted-for is that nobody cares about the difference between the sequence H H T H H H and the sequence H H H T H H. Would you even notice the difference if I hadn’t framed the paragraph to make the difference stand out? In either case, the sequence is “one tail, five heads”. What’s the chance of getting “one tail, five heads”?
Well, the chance of getting one of several mutually exclusive outcomes is the sum of the chance of each individual outcome. And these are mutually exclusive outcomes: you can’t get both H H T H H H and H H H T H H as the result of the same set of coin tosses.
(There can be not-mutually-exclusive outcomes. Consider, for example, the chance of getting “at least three tails” and the chance of the third coin toss being heads. Calculating the chance of either of those outcomes happening demands more thinking. But we don’t have to deal with that here, so we won’t.)
There are six distinct ways to get one tails and five heads. The tails can be the first toss’s result. Or the tails can be the second toss’s result. Or the tails can be the third toss’s result. And so on. Each of these possible outcomes has the same probability, one-half to the sixth power. So the chance of getting “one tails, five heads” is one-half to the sixth power, added to itself, six times over. That is, it’s six times one-half to the sixth power. That will come up about one time in eleven that you do a sequence of six coin tosses.
There are fifteen ways to get two tails and four heads. So the chance of the outcome being “two tails, four heads” is fifteen times one-half to the sixth power. That will come up a bit less than one in four times.
There are twenty, count ’em, ways to get three tails and three heads. So the chance of that is twenty times one-half to the sixth power. That’s a little more than three times in ten. There are fifteen ways to get four tails and two heads, so the chance of that drops again. There’s six ways to get five tails and one heads. And there’s just one way to get six tails and no heads on six coin tosses.
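All of those counts, the 1, 6, 15, 20, 15, 6, and 1, fall out of a brute-force enumeration of the 64 sequences:

```python
from itertools import product

# All 64 sequences of six tosses.  Each individual sequence has the
# same probability, 1/64; the counts of tails do not.
ways = {}
for seq in product('HT', repeat=6):
    tails = seq.count('T')
    ways[tails] = ways.get(tails, 0) + 1

print(ways)   # {0: 1, 1: 6, 2: 15, 3: 20, 4: 15, 5: 6, 6: 1}
```

The counts total 64, and the “three tails, three heads” bucket is the biggest, just as the essay says.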
So if you think of the outcome as “this many tails and that many heads”, then, no, not all outcomes are equally likely. “Three tails and three heads” is a lot more likely than “no tails and six heads”. “Two tails and four heads” is more likely than “one tails and five heads”.
Whether it’s right to say “every outcome is just as likely” depends on what you think “an outcome” is. If it’s a particular sequence of heads and tails, then yes, it is. If it’s the aggregate statistic of how many heads and tails, then no, it’s not.
We see this kind of distinction all over the place. Every hand of cards, for example, might be as likely to turn up as every other hand of cards. But consider five-card poker hands. There are very few hands that have the interesting pattern of being a straight flush, five sequential cards of the same suit. There are more hands that have the interesting pattern of four-of-a-kind. There are a lot of hands that have the mildly interesting pattern of two-of-a-kind and nothing else going on. There’s a huge mass of hands that don’t have any pattern we’ve seen fit to notice. So a straight flush is regarded as a very unlikely hand to have, and four-of-a-kind more likely but still rare. Two-of-a-kind is none too rare. Nothing at all is most likely, at least in a five-card hand. (When you get seven cards, a hand with nothing at all becomes less likely. You have so many chances that you just have to hit something.)
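The poker counts come out of the same kind of counting as the coin tosses, just with bigger numbers. A sketch (the one-pair formula insists the three leftover ranks be distinct, which is what keeps the hand from being anything better):

```python
from math import comb

total = comb(52, 5)     # 2,598,960 possible five-card hands

straight_flush = 10 * 4                 # ten rank runs times four suits
four_kind = 13 * 48                     # pick the rank, then the fifth card
one_pair = 13 * comb(4, 2) * comb(12, 3) * 4 ** 3
# pair rank, its two suits, three distinct leftover ranks, their suits

print(straight_flush, four_kind, one_pair)   # 40 624 1098240
```

So a straight flush is one canonical ensemble with only 40 microcanonical ensembles in it, while a mere pair has over a million.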
The distinction carries over into statistical mechanics. The field studies the state of things. Is a mass of material solid or liquid or gas? Is a solid magnetized or not, or is it trying to be? Are molecules in a high- or a low-energy state?
Mathematicians use the name “ensemble” to describe a state of whatever it is we’re studying. But we have the same problem of saying what kind of description we mean. Suppose we are studying the magnetism of a solid object. We do this by imagining the object as a bunch of smaller regions, each with a tiny bit of magnetism. That bit might have the north pole pointing up, or the south pole pointing up. We might say the ensemble is that there are ten percent more north-pole-up regions than there are south-pole-up regions.
But by that, do we mean we’re interested in “ten percent more north-pole-up than south-pole-up regions”? Or do we mean “these particular regions are north-pole-up, and these are south-pole-up”? We distinguish this by putting in some new words.
The “canonical ensemble” is, generally, the kind of aggregate-statistical-average description of things. So, “ten percent more north-pole-up than south-pole-up regions” would be such a canonical ensemble. Or “one tails, five heads” would be a canonical ensemble. If we want to look at the fine details we speak of the “microcanonical ensemble”. That would be “these particular regions are north-pole-up, and these are south-pole-up”. Or that would be “the coin tosses came up H H H T H H”.
Just what is a canonical and what is a microcanonical ensemble depends on context. Of course it would. Consider the standpoint of the city manager, hoping to estimate the power and water needs of neighborhoods and bringing the language of statistical mechanics to the city-planning world. There, it is enough detail to know how many houses on a particular street are occupied and how many residents there are. She could fairly consider that a microcanonical ensemble. From the standpoint of the letter carriers for the post office, though, that would be a canonical ensemble. It would give an idea how much time would be needed to deliver on that street. But it would be just short of useful in getting letters to recipients. The letter carrier would want to know which people are in which house before rating that a microcanonical ensemble.
Much of statistical mechanics is studying ensembles, and which ensembles are more or less likely than others. And how that likelihood changes as conditions change.
So let me answer the original question. In this coin-toss problem, yes, every microcanonical ensemble is just as likely as every other microcanonical ensemble. The sequence ‘H H H H H H’ is just as likely as ‘H T H H H T’ or ‘T T H T H H’ are. But not every canonical ensemble is as likely as every other one. Six heads in six tosses are less likely than two heads and four tails, or three heads and three tails, are. The answer depends on what you mean by the question.
I couldn’t go on calling this Back To School Editions. A couple of the comic strips the past week have given me reason to mention people famous in mathematics or physics circles, and one who’s even famous in the real world too. That’ll do for a title.
Jeff Corriveau’s Deflocked for the 15th of September tells what I want to call an old joke about geese formations. The thing is that I’m not sure it is an old joke. At least I can’t think of it being done much. It seems like it should have been.
The formations that geese, or other birds, fly in have been a neat corner of mathematics. The question they inspire is “how do birds know what to do?” How can they form complicated groupings and, more, change their flight patterns at a moment’s notice? (Geese flying in V shapes don’t need to do that, but other flocking birds will.) One surprising answer is that if each bird is just trying to follow a couple of simple rules, then if you have enough birds, the group will do amazingly complex things. This is good for people who want to say how complex things come about. It suggests you don’t need very much to have robust and flexible systems. It’s also bad for people who want to say how complex things come about. It suggests that many things that would be interesting can’t be studied in simpler models. Use a smaller number of birds or fewer rules or such and the interesting behavior doesn’t appear.
Scott Adams’s Dilbert Classics from the 15th and 16th of September (originally run the 22nd and 23rd of July, 1992) are about mathematical forecasts of the future. This is a hard field. It’s one people have been dreaming of doing for a long while. J Willard Gibbs, the renowned 19th century physicist who put the mathematics of thermodynamics in essentially its modern form, pondered whether a thermodynamics of history could be made. But attempts at making such predictions top out at demographic or rough economic forecasts, and for obvious reason.
The next day Dilbert’s garbageman, the smartest person in the world, asserts the problem is chaos theory, that “any complex iterative model is no better than a wild guess”. I wouldn’t put it that way, although I’m not sure what would convey the idea within the space available. One problem with predicting complicated systems, even if they are deterministic, is that there is a difference between what we can measure a system to be and what the system actually is. And for some systems that slight error will be magnified quickly to the point that a prediction based on our measurement is useless. (Fortunately this seems to affect only interesting systems, so we can still do things like study physics in high school usefully.)
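The garbageman’s point can be demonstrated with the logistic map, a standard chaotic toy model (my example, not the strip’s). Two starting values that agree to ten decimal places part company within a few dozen iterations:

```python
def logistic(x, steps):
    """Iterate the chaotic map x -> 4x(1 - x) the given number of times."""
    for _ in range(steps):
        x = 4 * x * (1 - x)
    return x

a, b = 0.2, 0.2 + 1e-10   # a "measurement error" of one part in ten billion
for steps in (10, 30, 50):
    print(steps, abs(logistic(a, steps) - logistic(b, steps)))
# The tiny error grows until the two trajectories disagree completely,
# even though every step is perfectly deterministic.
```

By fifty steps the two trajectories are, for prediction purposes, unrelated.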
Maria Scrivan’s Half Full for the 16th of September makes the Common Core joke. A generation ago this was a New Math joke. It’s got me curious about the history of attempts to reform mathematics teaching, and how poorly they get received. Surely someone’s written a popular or at least semipopular book about the process? I need some friends in the anthropology or sociology departments to tell me, I suppose.
In Mark Tatulli’s Heart of the City for the 16th of September, Heart is already feeling lost in mathematics. She’s in enough trouble she doesn’t recognize mathematics terms. That is an old joke, too, although I think the best version of it was done in a Bloom County with no mathematical content. (Milo Bloom met his idol Betty Crocker and learned that she was a marketing icon who knew nothing of cooking. She didn’t even recognize “shish kebob” as a cooking term.)
Mell Lazarus’s Momma for the 16th of September sneers at the idea of predicting where specks of dust will land. But the motion of dust particles is interesting. What can be said about the way dust moves when the dust is being battered by air molecules that are moving as good as randomly? This becomes a problem in statistical mechanics, and one that depends on many things, including just how fast air particles move and how big molecules are. Now for the celebrity part of this story.
Albert Einstein published four papers in his “Annus mirabilis” year of 1905. One of them was the Special Theory of Relativity, and another the mass-energy equivalence. Those, and the General Theory of Relativity, are surely why he became and still is a familiar name to people. One of his others was on the photoelectric effect. It’s a cornerstone of quantum mechanics. If Einstein had done nothing in relativity he’d still be renowned among physicists for that. The last paper, though, that was on Brownian motion, the movement of particles buffeted by random forces like this. And if he’d done nothing in relativity or quantum mechanics, he’d still probably be known in statistical mechanics circles for this work. Among other things this work gave the first good estimates for the size of atoms and molecules, and gave easily observable, macroscopic-scale evidence that molecules must exist. That took some work, though.
My love and I play in several pinball leagues. I need to explain something of how they work.
Most of them organize league nights by making groups of three or four players and having them play five games each on a variety of pinball tables. The groupings are made by order. The 1st through 4th highest-ranked players who’re present are the first group, the 5th through 8th the second group, the 9th through 12th the third group, and so on. For each table the player with the highest score gets some number of league points. The second-highest score earns a lesser number of league points, third-highest gets fewer points yet, and the lowest score earns the player comments about how the table was not being fair. The total number of points goes into the player’s season score, which gives her ranking.
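The grouping rule is mechanical enough to sketch in code. A caution: exactly which groups shrink to three players when the headcount demands it is my own guess; the leagues I describe only promise groups of three or four.

```python
def make_groups(players):
    """Split players, already sorted best-ranked first, into groups of
    four, shifting to groups of three when the headcount demands it.
    (Which groups shrink to three is my guess at the convention.)
    Assumes at least three players are present."""
    n = len(players)
    n_threes = (-n) % 4               # groups of three needed, 0 to 3
    n_fours = (n - 3 * n_threes) // 4
    groups, i = [], 0
    for size in [4] * n_fours + [3] * n_threes:
        groups.append(players[i:i + size])
        i += size
    return groups

print(make_groups(list(range(1, 11))))
# [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
```

Ten players present gives one group of four and two of three; eight gives two tidy groups of four.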
You might see the bootstrapping problem here. Where do the rankings come from? And what happens if someone joins the league mid-season? What if someone misses a competition day? (Some leagues give a fraction of points based on the player’s season average. Other leagues award no points.) How does a player get correctly ranked?
Niklas Eriksson’s Carpe Diem (June 20) is captioned “Life at the Quantum Level”. And it’s built on the idea that quantum particles could be in multiple places at once. Whether something can be in two places at once depends on coming up with a clear idea about what you mean by “thing” and “places” and for that matter “at once”; when you try to pin the ideas down they prove to be slippery. But the mathematics of quantum mechanics is fascinating. It cries out for treating things we would like to know about, such as positions and momentums and energies of particles, as distributions instead of fixed values. That is, we know how likely it is a particle is in some region of space compared to how likely it is somewhere else. In statistical mechanics we resort to this because we want to study so many particles, or so many interactions, that it’s impractical to keep track of them all. In quantum mechanics we need to resort to this because it appears this is just how the world works.
Brian and Ron Boychuk’s Chuckle Brothers (June 20) name-drops algebra as the kind of mathematics kids still living with their parents have trouble with. That’s probably required by the desire to make a joking definition of “aftermath”, so that some specific subject has to be named. And it needs parents to still be watching closely over their kids, something that doesn’t quite fit for college-level classes like Intro to Differential Equations. So algebra, geometry, or trigonometry it must be. I am curious whether algebra reads as the funniest of that set of words, or if it just fits better in the space available. ‘Geometry’ is as long a word as ‘algebra’, but it may not have the same connotation of being an impossibly hard class.
This month Peter Mander’s CarnotCycle blog talks about the interesting world of statistical equilibriums. And particularly it talks about stable equilibriums. A system’s in equilibrium if it isn’t going to change over time. It’s in a stable equilibrium if being pushed a little bit out of equilibrium isn’t going to make the system unpredictable.
For simple physical problems these are easy to understand. For example, a marble resting at the bottom of a spherical bowl is in a stable equilibrium. At the exact bottom of the bowl, the marble won’t roll away. If you give the marble a little nudge, it’ll roll around, but it’ll stay near where it started. A marble sitting on the top of a sphere is in an equilibrium — if it’s perfectly balanced it’ll stay where it is — but it’s not a stable one. Give the marble a nudge and it’ll roll away, never to come back.
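The marble test is really a statement about the curvature of the potential energy. A small numerical sketch, with quadratic bowl and dome potentials as my stand-ins:

```python
def curvature(V, x, h=1e-4):
    """Approximate the second derivative V''(x) by central differences."""
    return (V(x - h) - 2 * V(x) + V(x + h)) / h ** 2

bowl = lambda x: x ** 2      # stable: a nudge meets a restoring force
dome = lambda x: -x ** 2     # unstable: a nudge gets amplified

print(curvature(bowl, 0.0) > 0)   # True
print(curvature(dome, 0.0) < 0)   # True
```

Energy curving upward at the equilibrium means a nudge costs energy and gets undone; curving downward means a nudge releases energy and grows.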
In statistical mechanics we look at complicated physical systems, ones with thousands or millions or even really huge numbers of particles interacting. But there are still equilibriums, some stable, some not. In these, stuff will still happen, but the kind of behavior doesn’t change. Think of a steadily-flowing river: none of the water is staying still, or close to it, but the river isn’t changing.
CarnotCycle describes how to tell, from properties like temperature and pressure and entropy, when systems are in a stable equilibrium. These are properties that don’t tell us a lot about what any particular particle is doing, but they can describe the whole system well. The essay is higher-level than usual for my blog. But if you’re taking a statistical mechanics or thermodynamics course this is just the sort of essay you’ll find useful.
In terms of simplicity, purely mechanical systems have an advantage over thermodynamic systems in that stability and instability can be defined solely in terms of potential energy. For example the center of mass of the tower at Pisa, in its present state, must be higher than in some infinitely near positions, so we can conclude that the structure is not in stable equilibrium. This will only be the case if the tower attains the condition of metastability by returning to a vertical position or absolute stability by exceeding the tipping point and falling over.
Thermodynamic systems lack this simplicity, but in common with purely mechanical systems, thermodynamic equilibria are always metastable or stable, and never unstable. This is equivalent to saying that every spontaneous (observable) process proceeds towards an equilibrium state, never away from it.
If we restrict our attention to a thermodynamic system of unchanging composition and apply…
In this information-theory context, an experiment is just anything that could have different outcomes. A team can win or can lose or can tie in a game; that makes the game an experiment. The outcomes are the team wins, or loses, or ties. A team can get a particular score in the game; that makes that game a different experiment. The possible outcomes are the team scores zero points, or one point, or two points, or so on up to whatever the greatest possible score is.
If you know the probability p of each of the different outcomes, and since this is a mathematics thing we suppose that you do, then we have what I was calling the information content of the outcome of the experiment. That’s a number, measured in bits, and given by the formula

$ -\sum_j p_j \log\left(p_j\right) $
The sigma summation symbol means to evaluate the expression to the right of it for every value of some index j. The pj means the probability of outcome number j. And the logarithm may be that of any base, although if we use base two then we have an information content measured in bits. Those are the same bits as are in the bytes that make up the megabytes and gigabytes in your computer. You can see this number as an estimate of how many well-chosen yes-or-no questions you’d have to ask to pick the actual result out of all the possible ones.
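As a quick sketch — my own illustration, not anything from the original essay — the formula the paragraph above describes takes only a few lines of Python:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: -sum of p * log2(p) over the outcomes.
    Zero-probability outcomes contribute nothing, so we skip them."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin flip: one well-chosen yes-or-no question settles it.
print(shannon_entropy([0.5, 0.5]))  # → 1.0

# A fair six-sided die: about 2.585 questions, that is, log2(6) bits.
print(shannon_entropy([1/6] * 6))   # ≈ 2.585
```

The fair coin comes out to exactly one bit, matching the idea that one yes-or-no question picks the result out of the possibilities.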
I’d called this the information content of the experiment’s outcome. That’s an idiosyncratic term, chosen because I wanted to hide what it’s normally called. The normal name for this is the “entropy”.
To be more precise, it’s known as the “Shannon entropy”, after Claude Shannon, pioneer of the modern theory of information. However, the equation defining it looks the same as one that defines the entropy of statistical mechanics, that thing everyone knows is always increasing and somehow connected with stuff breaking down. Well, almost the same. The statistical mechanics one multiplies the sum by a constant number called the Boltzmann constant, after Ludwig Boltzmann, who did so much to put statistical mechanics in its present and very useful form. We aren’t thrown by that. The statistical mechanics entropy describes energy that is in a system but that can’t be used. It’s almost background noise, present but nothing of interest.
Is this Shannon entropy the same entropy as in statistical mechanics? This gets into some abstract grounds. If two things are described by the same formula, are they the same kind of thing? Maybe they are, although it’s hard to see what kind of thing might be shared by “how interesting the score of a basketball game is” and “how much unavailable energy there is in an engine”.
The legend has it that when Shannon was working out his information theory he needed a name for this quantity. John von Neumann, the mathematician and pioneer of computer science, suggested, “You should call it entropy. In the first place, a mathematical development very much like yours already exists in Boltzmann’s statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage.” There are variations of the quote, but they have the same structure and punch line. The anecdote appears to trace back to an April 1961 seminar at MIT given by one Myron Tribus, who claimed to have heard the story from Shannon. I am not sure whether it is literally true, but it does express a feeling about how people understand entropy that is true.
Well, these entropies have the same form. And they’re given the same name, give or take a modifier of “Shannon” or “statistical” or some other qualifier. They’re even often given the same symbol; normally a capital S, or maybe an H, denotes the quantity of entropy. (H tends to be more common for the Shannon entropy, but your equation would be understood either way.)
I’m not comfortable saying they’re the same thing, though. After all, we use the same formula to calculate a batting average and to work out the average time of a commute. But we don’t think those are the same thing, at least not more generally than “they’re both averages”. These entropies measure different kinds of things. They have different units that just can’t be sensibly converted from one to another. And the statistical mechanics entropy has many definitions that not just don’t have parallels for information, but wouldn’t even make sense for information. I would call these entropies siblings, with strikingly similar profiles, but not more than that.
But let me point out something about the Shannon entropy. It is low when an outcome is predictable. If the outcome is unpredictable, presumably knowing it will be interesting, because there is no guessing what it might be. That is where the entropy is highest: an absolutely random outcome, one where every possibility is equally likely, maximizes the entropy. And that’s boring. There’s no reason for the outcome to be one option instead of another. Somehow, as looked at by the measure of entropy, the most interesting of outcomes and the most meaningless of outcomes blur together. There is something wondrous and strange in that.
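That trade-off is easy to see numerically. Here’s a small sketch of my own, comparing three made-up distributions over three outcomes:

```python
import math

def shannon_entropy(probs):
    # Shannon entropy in bits; zero-probability outcomes contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

certain  = [1.0, 0.0, 0.0]    # fully predictable: zero entropy
lopsided = [0.98, 0.01, 0.01] # nearly predictable: low entropy
uniform  = [1/3, 1/3, 1/3]    # absolutely random: maximum entropy, log2(3)

for dist in (certain, lopsided, uniform):
    print(round(shannon_entropy(dist), 3))
```

The certain outcome scores zero bits, the lopsided one a little more, and the uniform one the most of all, about 1.585 bits.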
Peter Mander of the Carnot Cycle blog, which is primarily about thermodynamics, has a neat bit about constructing a mathematical model for how the body works. This model doesn’t look anything like a real body, as it’s concerned with basically the flow of heat, and how respiration fires the work our bodies need to do to live. Modeling at this sort of detail brings to mind an old joke told of mathematicians — that, challenged to design a maximally efficient dairy farm, the mathematician begins with “assume a spherical cow” — but great insights can come from models that look too simple to work.
It also, sad to say, includes a bit of Bright Young Science-Minded Lad (in this case, the author’s partner of the time) reasoning his way through what traumatized people might think, in a way that’s surely well-intended but also has to be described as “surely well-intended”, so, know that the tags up top of the article aren’t misleading.
I should mention — I should have mentioned earlier, but it has been a busy week — that CarnotCycle has published the second part of “The Geometry of Thermodynamics”. This is a bit of a tougher read than the first part, admittedly, but it’s still worth reading. The essay reviews how James Clerk Maxwell — yes, that Maxwell — developed the thermodynamic relationships that would have made him famous in physics if it weren’t for his work in electromagnetism that ultimately overthrew the Newtonian paradigm of space and time.
The ingenious thing is that the best part of this work is done on geometric grounds, on thinking of the spatial relationships between quantities that describe how a system moves heat around. “Spatial” may seem a strange word to describe this since we’re talking about things that don’t have any direct physical presence, like “temperature” and “entropy”. But if you draw pictures of how these quantities relate to one another, you have curves and parallelograms and figures that follow the same rules of how things fit together that you’re used to from ordinary everyday objects.
A wonderful side point is a touch of human fallibility from a great mind: in working out his relations, Maxwell misunderstood just what was meant by “entropy”, and needed correction by the at-least-as-great Josiah Willard Gibbs. Many people don’t quite know what to make of entropy even today, and Maxwell was working when the word was barely a generation away from being coined, so it’s quite reasonable he might not understand a term that was relatively new and still getting its precise definition. It’s surprising nevertheless to see.
James Clerk Maxwell and the geometrical figure with which he proved his famous thermodynamic relations
Every student of thermodynamics sooner or later encounters the Maxwell relations – an extremely useful set of statements of equality among partial derivatives, principally involving the state variables P, V, T and S. They are general thermodynamic relations valid for all systems.
The four relations originally stated by Maxwell are easily derived from the (exact) differential relations of the thermodynamic potentials:
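For the reader who wants them at hand, those four relations — standard results in any thermodynamics text, not quoted from the excerpt — read, in the usual partial-derivative notation with the held-constant variable as a subscript:

```latex
\left(\frac{\partial T}{\partial V}\right)_S = -\left(\frac{\partial P}{\partial S}\right)_V, \qquad
\left(\frac{\partial T}{\partial P}\right)_S = \left(\frac{\partial V}{\partial S}\right)_P, \qquad
\left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial P}{\partial T}\right)_V, \qquad
\left(\frac{\partial S}{\partial P}\right)_T = -\left(\frac{\partial V}{\partial T}\right)_P
```

Each follows from the exactness of the differential of one thermodynamic potential: the internal energy, the enthalpy, the Helmholtz free energy, and the Gibbs free energy, respectively.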