I’ve had a little more time attempting to teach probability to my students and realized I had been overlooking something obvious in the communication of ideas such as the probability of events or the expectation value of a random variable. Students have a much easier time getting the abstract idea if the examples used for it are already ones they find interesting, and if the examples can avoid confusing interpretations. This is probably about 3,500 years behind the curve in educational discoveries, but at least I got there eventually.

A “random variable”, here, sounds a bit scary, but shouldn’t. It means that the variable, for which *x* is a popular name, is some quantity which might be any of a collection of possible values. We don’t know for any particular experiment what value it has, at least before the experiment is done, but we know how likely each of those values is. For example, the number of bathrooms in a house is going to be one of 1, 1.5, 2, 2.5, 3, 3.5, up to the limits of tolerance of the zoning committee.

The expectation value of a random variable is kind of the average value of that variable. You find it by multiplying each of the possible values of the random variable by the probability of the random variable having that value, and adding up all those products. This is at least for a discrete random variable, where the imaginable values are, er, discrete: there are no continuous ranges of possible values. Number of bathrooms is clearly discrete; the number of seconds one spends in the bathroom is, at least in principle, continuous. For a continuous random variable you don’t take the sum, but instead take an integral, which is just a sum that handles the idea of infinitely many possible values quite well.
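Written out as formulas, that recipe looks something like the following. The symbols are my own choices for illustration: *x_i* for the possible values, *P(X = x_i)* for their probabilities, and *f(x)* for the probability density in the continuous case.

```latex
% Discrete case: sum of (value times probability of that value)
E[X] = \sum_{i} x_i \, P(X = x_i)

% Continuous case: the sum becomes an integral against a density f
E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx
```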

This bathroom example I drew from the textbook, which used (I assume imaginary) counts of how many houses in a township had 1, or 1.5, or 2, or so on bathrooms to estimate the probability of each of those bathroom counts. And all went pretty well for this example. If there were, say, 400 houses with 1 bathroom out of the 3200 sampled, this says the probability of a 1-bathroom house was 400/3200; if there were 640 houses with 1.5 bathrooms, that makes the probability of a 1.5-bathroom house 640/3200, and so on. So I’d have to multiply 1 by 400/3200, and 1.5 by 640/3200, and so on, which I find to be one of those little win-win situations in class. The multiplications I can do in my head make me look more like I can do math in my head; the ones I have to hand off to anyone in the class who has a calculator get those students to say something out loud.
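As a rough sketch of that arithmetic, here are just the first two terms of the sum in Python. Only the two counts quoted above are used; the rest of the textbook's table isn't reproduced here, so this gives a partial sum and not the full expectation value. The names `counts` and `partial_expectation` are made up for the illustration.

```python
from fractions import Fraction

TOTAL_HOUSES = 3200

# Only the two counts quoted in the text; the remaining rows of the
# textbook's table are not given, so they are deliberately left out.
counts = {Fraction(1): 400, Fraction(3, 2): 640}


def partial_expectation(counts, total):
    """Sum value * (count / total) over whichever counts are supplied."""
    return sum(value * Fraction(n, total) for value, n in counts.items())


# Contribution of each bathroom count to the expectation value.
terms = {value: value * Fraction(n, TOTAL_HOUSES) for value, n in counts.items()}
partial = partial_expectation(counts, TOTAL_HOUSES)

for value, term in terms.items():
    print(f"{float(value)} bathrooms contributes {term}")
print(f"partial sum so far: {partial}")
```

The 1-bathroom houses contribute 1/8 and the 1.5-bathroom houses contribute 3/10, so these two terms alone come to 17/40 of a bathroom.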

Adding up the products of each bathroom count and its probability turned up an expectation value of about 2.19 bathrooms per house, and I could feel the easy comfort with which students had accepted the numbers I was scribbling on the whiteboard vanish. Half-baths are familiar enough constructs, but who’s ever listed a place with 0.19 of a bathroom? I suppose it’d have to be a slightly undersized sink. But it strikes me that if the problem were about the expected number of children in a family, the idea of 2.19 children is familiar enough from decades of census data as to not be confusing, even if children do come in whole-number units.

There’s a happy ending, though; I thought of another example that avoided trying to visualize something under a fifth of a bathroom. That went back to standardized tests, specifically the SAT. As I remembered it, and nobody chose to correct my memories, the raw scores for the multiple-choice segments are computed by awarding a student 1 point for a correct answer, and deducting a quarter-point for an incorrect answer. (Nothing is awarded for an answer left blank.) So, if a student guesses wildly among the five possible answers, what is the expected benefit, or loss, to the raw score?

If all five answers are equally likely, then the probability of getting 1 point, of getting the question right, is 1/5. The probability of losing 1/4 point, of getting the answer wrong, is 4/5. And 1 times (1/5) plus (-1/4) times (4/5) is a neatly designed zero. Everybody understood SAT raw scores, and the cleverness in the point scheme in making pure guessing irrelevant, at least on average, to the raw score.
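The same calculation can be sketched in a few lines of Python. This assumes the scoring as I remembered it (1 point for a right answer, a quarter-point off for a wrong one, five equally likely choices); the function name `guess_expectation` is just something invented for the example.

```python
from fractions import Fraction


def guess_expectation(choices, reward=Fraction(1), penalty=Fraction(1, 4)):
    """Expected change in raw score from guessing uniformly among `choices` answers.

    Assumes exactly one answer is right and every answer is equally likely.
    """
    p_right = Fraction(1, choices)
    p_wrong = 1 - p_right
    return reward * p_right - penalty * p_wrong


blind_guess = guess_expectation(5)
print(blind_guess)  # 0
```

Using `Fraction` keeps the arithmetic exact, so the result is a true zero and not a floating-point near-miss.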

It got better still when I showed the value of the SAT strategy tip “try to rule out one or two answers as impossible”. If one wrong answer is ruled out and the guessing is done from the rest, the probability of gaining 1 point is 1/4, and the probability of losing 1/4 point is 3/4, for a slight but distinctly positive expectation value of 1/16 of a point. The expectation value is even more encouraging, 1/6 of a point, for ruling out two answers as impossible. I’m certain that at least more of them understood what numbers to write down, and why, than did for the bathroom count.
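Here is a small sketch of how the expected gain grows as wrong answers get ruled out, under the same remembered scoring rule (1 point for right, a quarter-point off for wrong) and assuming the guess is uniform among whatever answers remain. The names `expected_gain` and `gains` are hypothetical.

```python
from fractions import Fraction

PENALTY = Fraction(1, 4)  # quarter-point deduction for a wrong answer
TOTAL_CHOICES = 5


def expected_gain(remaining):
    """Expected raw-score change when guessing among `remaining` answers."""
    p_right = Fraction(1, remaining)
    return p_right - PENALTY * (1 - p_right)


# Map "number of wrong answers ruled out" to the expected gain from guessing.
gains = {k: expected_gain(TOTAL_CHOICES - k) for k in range(4)}

for ruled_out, gain in gains.items():
    print(f"ruled out {ruled_out}: expected gain {gain}")
```

Ruling out nothing gives 0, one answer gives 1/16, two give 1/6, and three give 3/8 of a point per guess.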

This does suggest I need to find some continuous random variables that are of obvious interest to students and don’t have intuitive pitfalls for the next week of classes, though.