My 2019 Mathematics A To Z: Sample Space

Today’s A To Z term is another from goldenoj. It’s one important to probability, and it’s one at the center of the field.

Sample Space.

The sample space is a tool for probability questions. We need such tools. Humans are bad at probability questions. Thinking of sample spaces helps us. It’s a way to recast probability questions so that our intuitions about space, which are pretty good, will guide us to probabilities.

A sample space collects the possible results of some experiment. “Experiment” means what mathematicians intend by it, so, not something with test tubes and colorful liquids that might blow up. Instead it’s things like tossing coins and dice and pulling cards out of reduced decks. At least while we’re learning. In real mathematical work this turns into more varied stuff. Fluid flows or magnetic field strengths or economic forecasts. The experiment is the doing of something which gives us information. This information is the result of flipping this coin or drawing this card or measuring this wind speed. Once we know the information, that’s the outcome.

So each possible outcome we represent as a point in the sample space. Describing it as a “space” might cause trouble. “Space” carries connotations of something three-dimensional and continuous and contiguous. This isn’t necessarily so. We can be interested in discrete outcomes. A coin’s toss has two possible outcomes. Three, if we count losing the coin. The day of the week on which someone’s birthday falls has seven possible outcomes. We can also be interested in continuous outcomes. The amount of rain over the day is some nonnegative real number. The amount of time spent waiting at this traffic light is some nonnegative real number. We’re often interested in discrete representations of something continuous. We did not have $\frac{1}{2}\sqrt{2}$ inches of rain overnight, even if we did. We recorded 0.71 inches after the storm.

We don’t demand every point in the sample space to be equally probable. There seems to be a circularity to requiring that. What we do demand is that the collection of events we ask about, built from the sample space, be a “sigma algebra”, or σ-algebra to write it briefly. I don’t know how σ came to be the shorthand for this kind of algebra. Here “algebra” means a thing with a bunch of rules. These rules are about what you’d guess if you read pop mathematics blogs and had to bluff your way through a conversation of rules about sets. Call the sample space X. The algebra’s a collection of sets made up of the elements of X. The whole of X has to be in this collection. The complements of sets in the collection are also sets in the collection. The unions of countably many sets in the collection have to be in the collection.

So the sample space is a set. All the possible outcomes of the experiment we’re thinking about are its elements. Every experiment must have some outcome that’s inside the sample space. And any two different outcomes have to be mutually exclusive. That is, if outcome A has happened, then outcome B has not happened. And vice-versa; I’m not so fond of A that I would refuse B.

I see your protest. You’ve worked through probability homework problems where you’re asked the chance a card drawn from this deck is either a face card or a diamond. The jack of diamonds is both. This is true; but it’s not what we’re looking at. The outcome of this experiment is the card that’s drawn, which might be any of 52 options.

If you like treating it that way. You might build the sample space differently, like saying that it’s an ordered pair. One part of the pair is the suit of the card. The other part is the value. This might be better for the problem you’re doing. This is part of why the probability department commands such high wages. There are many sample spaces that can describe the problem you’re interested in. This does include one where one event is “draw a card that’s a face card or diamond” and the other is “draw one that isn’t”. (These events don’t have an equal probability.) The work is finding a sample space that clarifies your problem.
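Here is a small sketch of that ordered-pair sample space, counting the face-card-or-diamond event. It assumes a standard 52-card deck with every card equally likely to be drawn, and that aces do not count as face cards. The names `SUITS`, `VALUES`, and `is_face_or_diamond` are mine, made up for the illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space as ordered pairs (suit, value): 4 suits x 13 values = 52 outcomes.
SUITS = ["clubs", "diamonds", "hearts", "spades"]
VALUES = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
FACE_VALUES = {"J", "Q", "K"}  # assumption: aces are not face cards

sample_space = list(product(SUITS, VALUES))

def is_face_or_diamond(card):
    suit, value = card
    return suit == "diamonds" or value in FACE_VALUES

# The event is a subset of the sample space. The jack of diamonds is one
# outcome, so it is counted once even though it fits both descriptions.
event = [card for card in sample_space if is_face_or_diamond(card)]

# Equally likely outcomes: probability = outcomes in the event / all outcomes.
p_event = Fraction(len(event), len(sample_space))
p_complement = 1 - p_event

print(len(sample_space), len(event), p_event, p_complement)
```

There are 22 cards in the event, so the two-event version of the space has probabilities 11/26 and 15/26, which are indeed not equal.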

Working out the sample space that clarifies the problem is the hard part, usually. Not being rigorous about the space gives us many probability paradoxes. You know, like the puzzle where you’re told someone’s two children are either boys or girls. One walks in and it’s a girl. You’re told the probability the other is a boy is two-thirds. And you get mad. Or the Monty Hall Paradox, where you’re asked to pick which of three doors has the grand prize behind it. You’re shown one that you didn’t pick which hasn’t. You’re given the chance to switch to the remaining door. You’re told the probability that the grand prize is behind that other door is two-thirds, and you get mad. There are probability paradoxes that don’t involve a chance of two-thirds. Having a clear idea of the sample space avoids getting the answers wrong, at least. There’s not much to do about not getting mad.

Like I said, we don’t insist that every point in the sample space have an equal probability of being the outcome. Or, if it’s a continuous space, that every region of the same area has the same probability. It is certainly easier if it does. Then finding the probability of some result becomes easy. You count the number of outcomes that satisfy that result, and divide by the total number of outcomes. You see this in problems about throwing two dice and asking the chance the total is seven, or five, or twelve.

For a continuous sample space, you’d find the area of all the results that satisfy the result. Divide that by the area of the sample space and there’s the probability of that result. (It’s possible for a result to have an area of zero, which implies that the thing cannot happen. This presents a paradox. A thing is in the sample space because it is a possible outcome. What these measure-zero results are, typically, is something like every one of infinitely many tossed coins coming up tails. That can’t happen, but it’s not like there’s any reason it can’t.)
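To make the area-dividing idea concrete, here is a sketch with an example of my own choosing, not one from the essay. Pick a point uniformly in the unit square and ask the chance it lands within distance 1 of the corner at the origin. The area recipe says that is the quarter-disc area, π/4, divided by the square's area, 1. The code compares that against a plain random-sampling estimate. The function name and the fixed seed are assumptions for the illustration.

```python
import math
import random

def estimate_area_probability(n_points, seed=0):
    """Estimate P(x^2 + y^2 <= 1) for a point drawn uniformly from the unit square."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits / n_points

# Area of the quarter disc divided by the area of the square (which is 1).
exact = math.pi / 4
estimate = estimate_area_probability(200_000)

print(exact, estimate)
```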

If every outcome isn’t equally likely, though? Sometimes we can redesign the sample space to something that is. The result of rolling two dice is a familiar example. The chance of the dice totalling 2 is different from the chance of them totalling 4. So a sample space that’s just the sums, the numbers 2 through 12, is annoying to deal with. But rewrite the space as the ordered pairs, the result of die one and die two? Then we have something nice. The chance of die one being 1 and die two being 1 is the same as the chance of die one being 2 and die two being 2. There happen to be other die combinations that add up to 4 is all.
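The ordered-pair rewrite is easy to sketch in code, assuming two fair six-sided dice. Every pair is one equally likely point, and a total is just the set of pairs that add up to it. The helper name `chance_of_total` is made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Ordered pairs (die one, die two): 36 equally likely outcomes for fair dice.
pairs = list(product(range(1, 7), repeat=2))

# Count how many ordered pairs give each total.
ways = Counter(a + b for a, b in pairs)

def chance_of_total(total):
    """Count the pairs with this total and divide by the number of pairs."""
    return Fraction(ways[total], len(pairs))

for total in (2, 4, 7, 12):
    print(total, ways[total], chance_of_total(total))
```

Only one pair totals 2, while three pairs total 4, so the chances are 1/36 and 3/36. Each individual pair still has the same 1/36.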

Sometimes there’s no finding a sample space which describes what you’re interested in and that makes every point equally probable. Or nearly enough. The world is vast and complicated. That’s all right. We can have a function that describes, for each point in the sample space, the probability of its turning up. Really we had that already, for equally-probable outcomes. It’s just that was all the same number. But this function is called the probability measure. If we combine together a sample space, and a collection of all the events we’re interested in, and a probability measure for all these events, then this triad is a probability space.
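For a finite sample space this triad is small enough to write down. The sketch below uses a loaded die whose weights I invented purely for illustration. The sample space is the six faces, the events are sets of faces, and the measure gives each point its probability. For continuous spaces the measure has to be assigned to events rather than single points, which this sketch does not attempt.

```python
from fractions import Fraction

# Hypothetical loaded die: these weights are invented for illustration only.
sample_space = [1, 2, 3, 4, 5, 6]
measure = {
    1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
    4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2),
}

def probability(event):
    """Probability of an event (a set of outcomes): add up its points' measures."""
    return sum((measure[outcome] for outcome in event), Fraction(0))

# The whole sample space must have probability 1.
total = probability(sample_space)

evens = {2, 4, 6}
odds = set(sample_space) - evens

print(total, probability(evens), probability(odds))
```

Counting outcomes would call “even” a one-half chance. The measure says 7/10, because the six carries half the weight by itself.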

And probability spaces give us all sorts of great possibilities. Dearest to my own work are Monte Carlo methods, in which we look for particular points inside the sample space. We do this by starting out anywhere, picking a point at random. And then we try moving to a different point, picking the “direction” of the change at random. We decide whether that move succeeds by a rule that depends in part on the probability measure, and in part on how well whatever we’re looking for holds true. This is a scheme that demands a lot of calculation. You won’t be surprised that it only became a serious tool once computing power was abundant.
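One common scheme of this start-anywhere, propose-a-move, accept-or-reject kind is the Metropolis rule. I am naming it as an assumption, since the description above doesn't say which rule it has in mind. The sketch covers only the part that depends on the probability measure, and leaves out any “how well does this hold” scoring. The weights are a made-up unnormalized measure on nine states arranged in a ring, and `metropolis` is a name I chose.

```python
import random

def metropolis(weights, n_steps, seed=0):
    """Random walk over states 0..len(weights)-1 whose long-run visit
    frequencies are proportional to the weights (Metropolis acceptance rule)."""
    rng = random.Random(seed)
    n = len(weights)
    state = rng.randrange(n)              # start anywhere
    visits = [0] * n
    for _ in range(n_steps):
        step = rng.choice((-1, 1))        # pick the "direction" at random
        proposal = (state + step) % n     # states sit on a ring, so proposals are symmetric
        # Accept with probability min(1, weight(proposal) / weight(state)).
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        visits[state] += 1                # a rejected move counts the current state again
    return [v / n_steps for v in visits]

weights = [1, 2, 3, 4, 5, 4, 3, 2, 1]     # hypothetical unnormalized measure
target = [w / sum(weights) for w in weights]
freqs = metropolis(weights, 400_000)

for t, f in zip(target, freqs):
    print(round(t, 3), round(f, 3))
```

The walk never needs the normalizing total, only ratios of weights. That is much of why this kind of method stays usable when the sample space is too big to list.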

So for many problems there is no actually listing all the sample space. A real problem might include, say, the up-or-down orientation of millions of magnets. This is a sample space of unspeakable vastness. But thinking out this space, and what it must look like, helps these probability questions become ones that our intuitions help us with instead. If you do not know what to do with a probability question, think to the sample spaces.

This and other essays for the Fall 2019 A to Z should be at this link. Later this week I hope to publish the letter T. And all of the A to Z essays ought to be at this link. Thanks for reading.

Reading the Comics, November 25, 2017: Shapes and Probability Edition

This week was another average-grade week of mathematically-themed comic strips. I wonder if I should track them and see what spurious correlations between events and strips turn up. That seems like too much work and there’s better things I could do with my time, so it’s probably just a few weeks before I start doing that.

Ruben Bolling’s Super-Fun-Pax Comics for the 19th is an installment of A Voice From Another Dimension. It’s in that long line of mathematics jokes that are riffs on Flatland, and how we might try to imagine spaces other than ours. They’re taxing things. We can understand some of the rules of them perfectly well. Does that mean we can visualize them? Understand them? I’m not sure, and I don’t know a way to prove whether someone does or does not. This wasn’t one of the strips I was thinking of when I tossed “shapes” into the edition title, but you know what? It’s close enough to matching.

Olivia Walch’s Imogen Quest for the 20th (and I haven’t looked, but it feels to me like I’m always featuring Imogen Quest lately) riffs on the Monty Hall Problem. The problem is based on a game never actually played on Monty Hall’s Let’s Make A Deal, but very like ones they do. There’s many kinds of games there, but most of them amount to the contestant making a choice, and then being asked to second-guess the choice. In this case, pick a door and then second-guess whether to switch to another door. The Monty Hall Problem is a great one for Internet commenters to argue about while the rest of us do something productive. The trouble (well, one trouble) is that whether switching improves your chance to win the car depends on the rules of the game. It’s not stated, for example, whether the host must open a door showing a goat behind it. It’s not stated that the host certainly knows which doors have goats and so chooses one of those. It’s not certain the contestant even wants a car when, hey, goats. What assumptions you make about these issues affect the outcome.

If you take the assumptions that I would, given the problem — the host knows which door the car’s behind, and always offers the choice to switch, and the contestant would rather have a car, and such — then Walch’s analysis is spot on.
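Under those assumptions the sample space is small enough to enumerate exactly. This sketch adds two assumptions of my own: the car and the contestant's first pick are each uniform over the three doors, and when the host has two goat doors to choose from he picks between them evenly. Change any of these and the numbers can change, which is the whole argument.

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def monty_hall_probabilities():
    """Enumerate (car, pick, opened) outcomes and total the winning probabilities."""
    win_if_stay = Fraction(0)
    win_if_switch = Fraction(0)
    for car in DOORS:
        for pick in DOORS:
            # The host knows where the car is: he opens a door that is
            # neither the contestant's pick nor the car.
            options = [d for d in DOORS if d != pick and d != car]
            for opened in options:
                # Car and pick are uniform (1/3 each); host is uniform over his options.
                p = Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(options))
                switched_to = next(d for d in DOORS if d != pick and d != opened)
                if pick == car:
                    win_if_stay += p
                if switched_to == car:
                    win_if_switch += p
    return win_if_stay, win_if_switch

stay, switch = monty_hall_probabilities()
print(stay, switch)
```

Staying wins one-third of the time and switching wins two-thirds, which is the standard answer for this version of the rules.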

Jonathan Mahood’s Bleeker: The Rechargeable Dog for the 20th features a pretend virtual reality arithmetic game. The strip is of incredibly low mathematical value, but it’s one of those comics I like that I never hear anyone talking about, so, here.

Richard Thompson’s Cul de Sac rerun for the 20th talks about shapes. And the names for shapes. It does seem like mathematicians have a lot of names for slightly different quadrilaterals. In our defense, if you’re talking about these a lot, it helps to have more specific names than just “quadrilateral”. Rhombuses are those parallelograms which have all four sides the same length. A parallelogram has to have two pairs of equal-sized legs, but the two pairs’ sizes can be different. Not so a rhombus. Mathworld says a rhombus with a narrow angle that’s 45 degrees is sometimes called a lozenge, but I say they’re fibbing. They make even more preposterous claims on the “lozenge” page.

Todd Clark’s Lola for the 20th does the old “when do I need to know algebra” question and I admit getting grumpy like this when people ask. Do French teachers have to put up with this stuff?

Brian Fies’s Mom’s Cancer rerun for the 23rd is from one of the delicate moments in her story. Fies’s mother just learned the average survival rate for her cancer treatment is about five percent and, after months of things getting haltingly better, is shaken. But as with most real-world probability questions context matters. The five-percent chance is, as described, the chance someone who’d just been diagnosed in the state she’d been diagnosed in would survive. The information that she’s already survived months of radiation and chemical treatment and physical therapy means they’re now looking at a different question. What is the chance she will survive, given that she has survived this far with this care?
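One way to write down that change of question, as a sketch: let $S$ be the event of surviving the cancer and $M$ the event of surviving these first months of treatment. Both symbols are my own labels, not anything from the strip. Assuming that surviving at all requires surviving those months, so that $S \cap M = S$, and that $P(M) > 0$, the conditional probability is

$$P(S \mid M) = \frac{P(S \cap M)}{P(M)} = \frac{P(S)}{P(M)} \ge P(S).$$

The five percent is $P(S)$. The question now is $P(S \mid M)$, which can only be as large or larger. How much larger depends on $P(M)$, which the strip doesn’t give.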

Mark Anderson’s Andertoons for the 24th is the Mark Anderson’s Andertoons for the week. It’s a protesting-student kind of joke. For the student’s question, I’m not sure how many sides a polygon has before we can stop memorizing them. I’d say probably eight. Maybe ten. Of the shapes whose names people actually care about, mm. Circle, triangle, a bunch of quadrilaterals, pentagons, hexagons, octagons, maybe decagon and dodecagon. No, I’ve never met anyone who cared about nonagons. I think we could drop heptagons without anyone noticing either. Among quadrilaterals, ugh, let’s see. Square, rectangle, rhombus, parallelogram, trapezoid (or trapezium), and I guess diamond although I’m not sure what that gets you that rhombus doesn’t already. Toss in circles, ellipses, and ovals, and I think that’s all the shapes whose names you use.

Stephan Pastis’s Pearls Before Swine for the 25th does the rounding-up joke that’s been going around this year. It’s got a new context, though.

Reading the Comics, August 29, 2015: Unthemed Edition

I can’t think of any particular thematic link through the past week’s mathematical comic strips. This happens sometimes. I’ll make do. They’re all Gocomics.com strips this time around, too, so I haven’t included the strips. The URLs ought to be reasonably stable.

J C Duffy’s Lug Nuts (August 23) is a cute illustration of the first, second, third, and fourth dimensions. The wall-of-text might be a bit off-putting, especially the last panel. It’s worth the reading. Indeed, you almost don’t need the cartoon if you read the text.

Tom Toles’s Randolph Itch, 2 am (August 24) is an explanation of pie charts. This might be the best silly joke of the week. I may just be an easy touch for a pie-in-the-face.

Charlie Podrebarac’s Cow Town (August 26) is about the first day of mathematics camp. It’s also every graduate student’s thesis defense anxiety dream. The zero with a slash through it popping out of Jim Smith’s mouth is known as the null sign. That comes to us from set theory, where it describes “a set that has no elements”. Null sets have many interesting properties considering they haven’t got any things. And that’s important for set theory. The symbol was introduced to mathematics in 1939 by Nicolas Bourbaki, the renowned mathematician who never existed. He was important to the course of 20th century mathematics.

Eric the Circle (August 26), this one by ‘Arys’, is a Venn diagram joke. It makes me realize the Eric the Circle project does less with Venn diagrams than I expected.

John Graziano’s Ripley’s Believe It Or Not (August 26) talks of an Akira Haraguchi. If we believe this, then, in 2006 he recited 111,700 digits of pi from memory. It’s an impressive stunt and one that makes me wonder who did the checking that he got them all right. The fact-checkers never get their names in Graziano’s Ripley’s.

Mark Parisi’s Off The Mark (August 27, rerun from 1987) mentions Monty Hall. This is worth mentioning in these parts mostly as a matter of courtesy. The Monty Hall Problem is a fine and imagination-catching probability question. It represents a scenario that never happened on the game show Let’s Make A Deal, though.

Jeff Stahler’s Moderately Confused (August 28) is a word problem joke. I do wonder if the presence of battery percentage indicators on electronic devices has helped people get a better feeling for percentages. I suppose only vaguely. The devices can be too strangely nonlinear to relate percentages of charge to anything like device lifespan. I’m thinking here of my cell phone, which will sit in my messenger bag for three weeks dropping slowly from 100% to 50%, and then die for want of electrons after thirty minutes of talking with my father. I imagine you have similar experiences, not necessarily with my father.

Thom Bluemel’s Birdbrains (August 29) is a caveman-mathematics joke. This one’s based on calendars, which have always been mathematical puzzles.

At Least One Daughter Exists

In the class I’m teaching we’ve entered probability. This is a fun subject. It’s one of the bits of mathematics which people encounter most often, about as much as the elements of geometry enter ordinary life. It seems like everyone has some instinctive understanding of probability, at least given how people will hear a probability puzzle and give a solution with confidence. You don’t get that with pure algebra problems. Ask someone “the neighbor’s two children were born three years apart and twice the sum of their ages is 42; how old are they?” and you get an assurance of how mathematics was always their weakest subject and they never could do it. Ask someone “one of the neighbor’s children just walked in, and was a girl; what is the probability the other child is also a girl?” and you’ll get an answer.

But it’s getting a correct answer that is really interesting, and unfortunately, while everyone has some instinctive understanding and will give an answer as above, there’s little guarantee it’ll be the right one. Sometimes, and I say this looking over the exam papers, it seems our instinctive understanding of probability is designed to be the wrong one. I’m happy that people aren’t afraid of doing probability questions, not the way they are afraid of algebra or geometry or calculus or the more exotic realms, though, and feel like it’s my role to find the most straightforward ways to understanding which start from that willingness to try.

Some of the rotten track record people have in probability puzzles probably derives from how so many probability puzzles start as recreational puzzles, that is, things which are meant to look easy and turn out to be subtly complicated. I suspect the daughters-question comes from recreational puzzles, since there’s the follow-up question that “the elder child enters, and is a girl; what is the probability the younger is a girl?” There’s some soundness in presenting the two as a learning path, since they present what looks like the same question twice, and get different answers, and learning why there are different answers teaches something about how to do probability questions. But it still feels to me like the goal is that pleasant confusion a trick offers.
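The different answers fall out of counting a four-point sample space. This sketch assumes each child is equally likely to be a boy or a girl, independently of the other. It also assumes the first question is read as “at least one of the two is a girl”. That reading is an assumption, and other readings of “one walked in” can give other answers. The helper `conditional` is a name invented for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (elder, younger), assumed equally likely.
sample_space = list(product("BG", repeat=2))

def conditional(event, given):
    """P(event | given), by counting equally likely outcomes."""
    kept = [o for o in sample_space if given(o)]
    return Fraction(sum(1 for o in kept if event(o)), len(kept))

def both_girls(outcome):
    return outcome == ("G", "G")

# "At least one child is a girl" keeps BG, GB, GG.
p_given_at_least_one = conditional(both_girls, lambda o: "G" in o)

# "The elder child is a girl" keeps GB, GG.
p_given_elder = conditional(both_girls, lambda o: o[0] == "G")

print(p_given_at_least_one, p_given_elder)
```

Knowing only that at least one is a girl leaves three outcomes, so the other child is a girl with probability 1/3 and a boy with 2/3. Knowing that the elder is a girl leaves two outcomes, so the answer is 1/2.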

One Goat Short

I like game shows. Liking game shows is not one of the more respectable hobbies, compared to, say, Crimean War pedantry, or laughing at goats. Game shows have a long history of being sneered at by people who can’t be bothered to learn enough about game shows to sneer at them for correct reasons. Lost somewhere within my archives is even an anthology of science fiction short stories about game shows, which if you take out the punch lines of “and the loser DIES!” or “and the host [ typically Chuck Woolery ] is SATAN!”, would leave nearly nothing, and considering that science fiction as a genre has spent most of its existence feeling picked-on as the “smelly, unemployed cousin of the entertainment industry” (Mike Nelson’s Movie Megacheese) that’s quite some sneering. Sneering at game shows even earned an episode of The Mary Tyler Moore Show which managed to be not just bad but offensively illogical.

Nevertheless, I like them, and was a child at a great age for game shows on broadcast television: the late 1970s and early 1980s had an apparently endless menu of programs, testing people’s abilities to think of words, to spell words, to price household goods, and guess how other people answered surveys. We haven’t anything like that anymore; on network TV about the only game shows that survive are Jeopardy! (which nearly alone of the genre gets any respect), Wheel of Fortune, The Price Is Right, and, returned after decades away, Let’s Make A Deal. (I don’t regard reality shows as game shows, despite a common programming heritage. I can’t say what it is precisely other than location and sometimes scale that, say, Survivor or The Amazing Race do that Beat The Clock or Truth Or Consequences do not, but there’s something.) Now and then something new flutters into being, but it vanishes without leaving much of a trace, besides retreading jokes about the people who’d watch it.

All that is longwinded preliminary to one of those things that amuses mostly me. On the Thursday (27 October) episode of Let’s Make A Deal, they briefly looked like they might be playing The Monty Hall Problem.