Autocorrected Monkeys and Pulled Tea


The Twop Twips account on Twitter (I’m not sure how to characterize what it is exactly, but friends retweet it often enough) had advice about the infinite monkeys problem, and what seems to me correct advice: turning on autocorrect will get the monkeys to write the works of Shakespeare more quickly. And then John Kovaleski’s monkey-featuring comic strip Bo Nanas featured the infinite monkey problem today, so obviously I have to spend more time thinking about it.

It seems fair that a monkey with autocorrect will be more likely to hit a word than a monkey without. Let’s try something simpler than Shakespeare and just consider the chance of typing the word “the”, and to keep the numbers friendly let’s imagine that the keyboard has just the letters and a space bar. We’ll not care about punctuation or numbers; that’s what copy editors would be for, if anyone had been employed as a copy editor since 1996, when someone in the budgeting office discovered there was autocorrect.

Anyway, there are 27 characters on this truncated keyboard, and if the monkeys were equally likely to hit any one of them, then there’d be 27 times 27 times 27 (that is, 19,683) different three-character strings they might hit. Exactly one of them is the desired word “the”. So, roughly, we would expect the monkey to get the word right one time in each 19,683 attempts at a three-character string. (We wouldn’t have to wait quite so long if we accept the monkey as writing continuously and pluck out three characters in a row wherever they appear, but that’s more work than I feel like doing, and I doubt it would significantly change the qualitative result: how much faster it’d be if autocorrect were on.)

But how many tries would be needed to hit a word that gets autocorrected to “the”? And here we get into the mysteries of the English language. I’d be surprised by a spell checker that couldn’t figure out “teh” probably means “the”. Similarly “hte” should get back to “the”. So let’s suppose all five other permutations of the letters in “the” will be autocorrected. That makes six strings out of the 19,683 possibilities that end up as “the”. The monkey has one chance in 3280.5 of typing one of them, so on average the monkey can be expected to be right about once in every 3281 attempts.
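For anyone who would rather count than trust the arithmetic, here is a minimal sketch that lists every three-character string on the 27-key keyboard and picks out the rearrangements of “the”. The names `ALPHABET` and `count_permutations_of` are made up for this illustration.

```python
from itertools import product

# The truncated keyboard: 26 lowercase letters plus a space bar.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def count_permutations_of(target):
    """Count strings of len(target) keystrokes that use exactly the target's letters."""
    wanted = sorted(target)
    total = 0
    hits = 0
    for keys in product(ALPHABET, repeat=len(target)):
        total += 1
        if sorted(keys) == wanted:
            hits += 1
    return hits, total


hits, total = count_permutations_of("the")
print(total)         # every three-character string
print(hits)          # "the" plus its five rearrangements
print(total / hits)  # attempts per success, on average
```

Brute force is wasteful here, since 3 × 2 × 1 gives the six orderings directly, but it makes no assumptions about the counting.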

But there are other typos possible: “thw” is probably just my finger slipping, and “ghe” isn’t too implausible either. At least my spell checker recognizes both as most likely meant to be “the”. Let’s suppose that a spell checker can get to the right word if any one letter is mistaken. Each of the three positions could hold any of the 26 wrong characters, so there are 78 other three-character strings that would get fixed to “the”, for a total of 84 possible three-character strings which are either “the” or would get autocorrected to “the”. With that many, there’s one chance in a touch more than 234 that a three-character string will get corrected to “the”, and we have to wait, considering, not very long at all.
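The 78 and the 84 can be checked the same brute-force way. This sketch assumes, as the paragraph does, that a string is fixable if it is a rearrangement of the target or differs from it in exactly one position, with the space bar counting as a wrong character. The helper names are hypothetical.

```python
from itertools import product

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def fixable_strings(target):
    """Strings that rearrange target's letters or differ from it in at most one position."""
    found = set()
    for keys in product(ALPHABET, repeat=len(target)):
        s = "".join(keys)
        if sorted(s) == sorted(target) or mismatches(s, target) <= 1:
            found.add(s)
    return found


fixable = fixable_strings("the")
one_off = {s for s in fixable if mismatches(s, "the") == 1}
print(len(one_off))                       # single-wrong-character strings
print(len(fixable))                       # those plus "the" and its rearrangements
print(len(ALPHABET) ** 3 / len(fixable))  # attempts per success, on average
```

No rearrangement of “the” is also a one-off string, because swapping distinct letters always changes at least two positions, so the two groups simply add.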

It gets better if two-character errors are allowed, but I can’t make myself believe that the spell check will turn “yje” into “the”, and that’s something which might be typed if you just had the right hand on the wrong keys. My checker hasn’t got any idea what “yje” is supposed to be anyway, so, one wrong letter is probably the limit.

Except. “tie” is one character wrong for “the” and no spell checker will protest “tie”. Similarly “she” and “thy” and a couple of other words. And it’d be a bit much to expect “t e” or “ he” to be turned back into “the” even though both are just the one keystroke off. And a spell checker would probably suppose that “tht” is a typo for “that”. It’s hard to guess how many of the one-character-off strings will not actually be caught. Let’s say that maybe half the one-character-off strings will be corrected to “the”; that’s still a pretty good 39 one-character misspellings, plus five permutations, plus the correct spelling, or 45 candidate three-character strings for autocorrect to get. So our monkey has something like one chance in 450 of getting “the” in banging on the keyboard three times.
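To see what is being thrown away, here is a rough sketch that builds the 78 one-off strings directly and flags the ones this paragraph worries about. `REAL_WORDS` is a hypothetical, hand-picked list for illustration only; an actual spell checker’s dictionary would be larger and would behave differently.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

# Hypothetical, hand-picked real words one character away from "the".
# A real spell checker's dictionary would contain more of these.
REAL_WORDS = {"tie", "toe", "tee", "she", "thy"}


def one_off_neighbors(target):
    """All strings that differ from target in exactly one position."""
    out = set()
    for i, original in enumerate(target):
        for ch in ALPHABET:
            if ch != original:
                out.add(target[:i] + ch + target[i + 1:])
    return out


neighbors = one_off_neighbors("the")
with_space = {s for s in neighbors if " " in s}  # e.g. "t e" and " he"
real = neighbors & REAL_WORDS                    # accepted as they stand
print(len(neighbors), len(with_space), len(real))

# The rougher guess from the text: half the one-off strings get corrected.
candidates = len(neighbors) // 2 + 5 + 1
print(candidates, len(ALPHABET) ** 3 / candidates)
```

With only these few exclusions most of the 78 would survive, so halving them is a deliberately pessimistic guess. The exact ratio for 45 candidates comes out a little under the round figure quoted above.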

For four-letter words there are many more combinations (531,441, if we just list the strings of our 27 allowed characters) but then there are more strings which would get autocorrected. Let’s say we want the string “thus”; there are 23 ways to arrange those letters in addition to the correct one. And there are 104 one-character-off strings, 26 wrong characters in each of four positions; supposing that half of them will get us to “thus”, there are 76 strings that get one to the desired “thus”. That’s a pretty dismal one chance in about 7,000 of typing one of them, unfortunately. Things get a little better if we suppose that some two-character errors are going to be corrected (although I can’t find one which my spell checker will accept right now), or that a single error combined with a transposition is viable.
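The four-letter numbers can be built up directly rather than by scanning all 531,441 strings. This is a sketch under the same assumption as the text, that only half the one-off strings get fixed; the variable names are illustrative.

```python
from itertools import permutations

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
TARGET = "thus"

# Every ordering of the four letters, the correct one included.
rearranged = {"".join(p) for p in permutations(TARGET)}

# Every string with exactly one position replaced by a different key.
one_off = {
    TARGET[:i] + ch + TARGET[i + 1:]
    for i in range(len(TARGET))
    for ch in ALPHABET
    if ch != TARGET[i]
}

total = len(ALPHABET) ** len(TARGET)
# Assumption carried over from the text: only half the one-off strings get fixed.
fixable = len(rearranged) + len(one_off) // 2
print(total, len(rearranged) - 1, len(one_off), fixable)
print(total / fixable)  # attempts per success, on average
```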

With longer words yet there’s more chances for spell checker forgiveness: you can get pretty far off “accommodate” or “aneurysm” and still be saved by the spell checker, which is good for me as I last spelled “accommodate” correctly sometime in 1992, and I thought it looked wrong then.

So the conclusion has to be: you’ll get a bit of an improvement in speed by turning on autocorrect, for the obvious reason that you’re more likely to get one right out of 450 than you are to get one right out of 19,000. But it’s not going to help you very much; the number of ways to spell things so completely wrong that not even spell check can find you just grows far too rapidly to be helped. If I get a little bored I might work out the chance of getting a permutation-or-one-off for strings of different lengths.
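As a first stab at that permutation-or-one-off calculation, here is a minimal sketch for different string lengths. It assumes a word with no repeated letters, that every rearrangement gets fixed (generous for long words), and that half the one-off strings do. The function name `attempts_per_hit` is made up for this example.

```python
from math import factorial

KEYS = 27  # 26 letters plus the space bar


def attempts_per_hit(length):
    """Average attempts per success for a word of this length with no repeated letters.

    Assumes all rearrangements and half of the one-off strings are corrected.
    """
    fixable = factorial(length) + (length * (KEYS - 1)) // 2
    return KEYS ** length / fixable


table = {n: attempts_per_hit(n) for n in range(3, 9)}
for n, attempts in table.items():
    # length, attempts needed without autocorrect, attempts needed with it
    print(n, KEYS ** n, round(attempts))
```

Even with these forgiving assumptions the expected wait keeps climbing with every added letter, which is the point of the paragraph above.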

And your monkey might be ill-served by autocorrect anyway. When I lived in Singapore I’d occasionally have teh tarik (“pulled tea”), black tea with sugar and milk tossed back and forth until it’s nice and frothy. It’s a fine drink but hard to write back home about because even if you get past the spell checker, the reader assumes the “teh” is a typo and mentally corrects for it. When this came up I’d include a ritual emphasis that I actually meant what I wrote, but you see the problem. Fortunately Shakespeare wrote relatively little about southeast Asian teas, but if you wanted to expand the infinite monkey problem to the problem of guiding tourists through Singapore, you’d have to turn the autocorrect off to have any hope of success.

2 thoughts on “Autocorrected Monkeys and Pulled Tea”

  1. I didn’t know so many people thought about monkeys, typewriters, and Shakespeare until I started reading your blog. I always enjoy the comics you share on this topic, and I really liked your probability explanation.

    • I didn’t suspect how common the monkeys-at-typewriters image was, or how popular it was with cartoonists, until I started keeping track of them for the blog here. I guess I understand why — it’s an easy thing to imagine and, hey, monkeys are usually funny — but it’s surprising how common it is, considering that it’s about a pretty abstract point of probability.
