Happy New Year!
I want to bring a pretty remarkable project to people’s attention. Dan Slimmon has taken the Jeopardy! responses (you know, the answers, only the ones given in the form of a question) collected in the Jeopardy! fan archive, http://www.j-archive.com, and analyzed them. He was interested not just in the most common response, which turns out to be “What is Australia?”, but in the expectation value of the responses.
I’ve talked about expectation value before, and for that matter, so has everyone who mentions probability or statistics. Slimmon works out approximately what the expectation value would be for each response. That is, imagine this: if you ignored the answer on the board entirely and, for every answer, either responded absolutely nothing or else responded “What is Australia?”, some of the time you’d be right, and you’d get whatever that clue was worth. How much would you expect to get if you just guessed that answer? Responses that turn up often, such as “Australia”, or that turn up more often in higher-value squares, are worth more. Responses that turn up rarely, or only in low-value squares, have a lower expectation value.
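Here is a minimal sketch of that idea in Python, as I read the description above; it is not Slimmon's actual code. The clue list is made-up toy data, and the function name `expectation_values` is my own. I'm assuming a clue pays its dollar value when the response matches and that staying silent is worth nothing, so a response's expectation value is the total value of the clues it answers, divided by the number of clues.

```python
from collections import defaultdict

# Hypothetical toy data, not from the J! Archive:
# each entry is (correct response, dollar value of the clue).
clues = [
    ("Australia", 400),
    ("Australia", 1000),
    ("Chicago", 200),
    ("Napoleon", 800),
    ("Chicago", 600),
]


def expectation_values(clues):
    """Expected winnings per clue for each response.

    Assumption: you give the response only when it is correct and
    otherwise say nothing, and saying nothing is worth zero.
    """
    if not clues:
        return {}
    totals = defaultdict(int)
    for response, value in clues:
        totals[response] += value  # total money this response would win
    n = len(clues)
    return {response: total / n for response, total in totals.items()}


ev = expectation_values(clues)

# Responses ordered from highest to lowest expectation value.
ranked = sorted(ev, key=ev.get, reverse=True)
print(ranked[0], ev[ranked[0]])
```

In this toy data “Australia” comes out on top both because it appears twice and because one of its appearances is in a high-value square, which is the effect described above.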
Slimmon goes on to list, based on his data, what the 1000 most frequent Jeopardy! responses are, and what the 1000 responses with the highest expectation value are. I’m so delighted to discover this work that I want to bring folks’ attention to it. (I do have a reservation about his calculations, but I need some time to convince myself that I understand both his calculation and my reservation exactly before I bother anyone with it.)
The comments on his page include a discussion of a technical point about the expectation value calculation. It raises an interesting question about the approximations that are often useful, or inevitable, in this kind of work, but explaining it properly will take a separate essay that I haven’t the time for today.
[ Edit: I initially misunderstood Slimmon’s method and have amended the article to reflect the calculation’s details. Specifically, I misunderstood him at first to have calculated the expectation value of giving a particular response and having it be either right or wrong. Slimmon assumed that one would either give the response or not respond at all; getting the answer wrong costs the contestant money and so has a negative value, while not answering has no value. ]