The Summer 2017 Mathematics A To Z: Benford's Law


Today’s entry in the Summer 2017 Mathematics A To Z is one for myself. I couldn’t post this any later.

Summer 2017 Mathematics A to Z, featuring a coati (it's kind of the Latin American raccoon) looking over alphabet blocks, with a lot of equations in the background.
Art courtesy of Thomas K Dye, creator of the web comic Newshounds. He has a Patreon for those able to support his work. He’s also open for commissions, starting from US$10.

Benford’s Law.

My car’s odometer first read 9 on my final test drive before buying it, in June of 2009. It flipped over to 10 barely a minute after that, somewhere near Jersey Freeze ice cream parlor at what used to be the Freehold Traffic Circle. Ask a Central New Jersey person of sufficient vintage about that place. Its odometer read 90 miles sometime that weekend, I think while I was driving to The Book Garden on Route 537. Ask a Central New Jersey person of sufficient reading habits about that place. It’s still there. It flipped over to 100 sometime when I was driving back later that day.

The odometer read 900 about two months after that, probably while I was driving to work, as I had a longer commute in those days. It flipped over to 1000 a couple days after that. The odometer first read 9,000 miles sometime in spring of 2010 and I don’t remember what I was driving to for that. It flipped over from 9,999 to 10,000 miles several weeks later, as I pulled into the car dealership for its scheduled servicing. Yes, this kind of impressed the dealer that I got there exactly on the round number.

The odometer first read 90,000 in late August of last year, as I was driving to some competitive pinball event in western Michigan. It’s scheduled to flip over to 100,000 miles sometime this week as I get to the dealer for its scheduled maintenance. While cars have gotten to be much more reliable and durable than they used to be, the odometer will never flip over to 900,000 miles. At least I can’t imagine owning it long enough, at my rate of driving the past eight years, that this would ever happen. It’s hard to imagine living long enough for the car to reach 900,000 miles. Thursday or Friday it should flip over to 100,000 miles. The leading digit on the odometer will be 1 or, possibly, 2 for the rest of my association with it.

The point of this little autobiography is this observation. Imagine all the days that I have owned this car, from sometime in June 2009 to whatever day I sell, lose, or replace it. Pick one. What is the leading digit of my odometer on that day? It could be anything from 1 to 9. But it’s more likely to be 1 than it is 9. Right now it’s about as likely to be any one digit as another. But after this week the chance of ‘1’ being the leading digit will rise, and become considerably greater than that of ‘9’. And it’ll never lose that edge.

This is a reflection of Benford’s Law. It is named, as most mathematical things are, imperfectly. The law-namer was Frank Benford, a physicist, who in 1938 published a paper, “The Law of Anomalous Numbers”. It confirmed the observation of Simon Newcomb. Newcomb was a 19th century astronomer and mathematician of an exhausting number of observations and developments. Newcomb noticed something about the logarithm tables that anyone who needed to compute referred to often. The earlier pages were more worn-out and dirty and damaged than the later pages. People worked with numbers that start with ‘1’ more than they did numbers starting with ‘2’. And more with those that start with ‘2’ than with ‘3’. More that start with ‘3’ than start with ‘4’. And on. Benford showed this was not some fluke of calculations. It turned up in bizarre collections of data. The surface areas of rivers. The populations of thousands of United States municipalities. Molecular weights. The digits that turned up in an issue of Reader’s Digest. There is a bias in the world toward numbers that start with ‘1’.

And this is, prima facie, crazy. How can the surface areas of rivers somehow prefer to be, say, 100-199 hectares instead of 500-599 hectares? A hundred is a human construct. (Indeed, it’s many human constructs.) That we think ten is an interesting number is an artefact of our society. To think that 100 is a nice round number and that, say, 81 or 144 are not is a cultural choice. Grant that the digits of street addresses of people listed in American Men of Science — one of Benford’s data sources — have some cultural bias. How can another of his sources, molecular weights, possibly?

The bias sneaks in subtly. Don’t they all? It lurks at the edge of the table of data. The table header, perhaps, where it says “River Name” and “Surface Area (sq km)”. Or at the bottom where it says “Length (miles)”. Or it’s never explicit, because I take for granted people know my car’s mileage is measured in miles.

What would be different in my introduction if my car were Canadian, and the odometer measured kilometers instead? … Well, I’d not have driven the 9th kilometer; someone else doing a test-drive would have. The 90th through 99th kilometers would have come a little earlier that first weekend. The 900th through 999th kilometers too. I would have passed the 99,999th kilometer years ago. In kilometers my car has been in the 100,000s for something like four years now. It’s less absurd that it could reach the 900,000th kilometer in my lifetime, but that still won’t happen.

What would be different is the precise dates when my car reached its milestones, and the number of days it spent in the 1’s and the 2’s and the 3’s and so on. But the proportions? What fraction of its days it spends with a 1 as the leading digit versus a 2 or a 5? … Well, that’s changed a little bit. There is some final mile, or kilometer, my car will ever register and it makes a little difference whether that’s 239,000 or 385,000. But it’s only a little difference. It’s the difference in how many times a tossed coin comes up heads on the first 1,000 flips versus the second 1,000 flips. They’ll be different numbers, but not that different.

What’s the difference between a mile and a kilometer? A mile is longer than a kilometer, but that’s it. They measure the same kinds of things. You can convert a measurement in miles to one in kilometers by multiplying by a constant. We could as well measure my car’s odometer in meters, or inches, or parsecs, or lengths of football fields. The difference is what number we multiply the original measurement by. We call this “scaling”.

Whatever we measure, in whatever unit we measure, has to have a leading digit of something. So it’s got to have some chance of starting out with a ‘1’, some chance of starting out with a ‘2’, some chance of starting out with a ‘3’, and so on. But that chance can’t depend on the scale. Measuring something in smaller or larger units doesn’t change the proportion of how often each leading digit is there.
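A quick way to see what scale-independence looks like is to take numbers that sprawl across many orders of magnitude and change their units. The sketch below is only an illustration: the powers of two are a stand-in data set, not anything from Benford's sources, and `leading_digit_shares` is a made-up helper name. It treats each power of two as a distance in miles and converts it to millimeters, using the exact conversion of 1,609,344 millimeters to the mile.

```python
from collections import Counter

# Exact conversion factor: 1 mile = 1,609.344 metres = 1,609,344 millimetres.
MM_PER_MILE = 1_609_344


def leading_digit_shares(values):
    """Return the fraction of positive integers in `values` led by each digit 1-9."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Stand-in data (an assumption for illustration): the first 1,000 powers of two,
# read as distances in miles. They span about 300 orders of magnitude.
distances_in_miles = [2 ** n for n in range(1000)]
distances_in_mm = [miles * MM_PER_MILE for miles in distances_in_miles]

miles_shares = leading_digit_shares(distances_in_miles)
mm_shares = leading_digit_shares(distances_in_mm)

# Print the share of each leading digit in both units, side by side.
for d in range(1, 10):
    print(d, f"{miles_shares[d]:.3f}", f"{mm_shares[d]:.3f}")
```

The two columns should come out close to each other for every digit, even though almost every individual number changes its leading digit in the conversion.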

These facts combine to imply that leading digits follow a logarithmic-scale law. The leading digit should be a ‘1’ something like 30 percent of the time. And a ‘2’ about 18 percent of the time. A ‘3’ about one-eighth of the time. And it decreases from there. ‘9’ gets to take the lead a meager 4.6 percent of the time.
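The usual statement of the law is that a leading digit d turns up with probability log10(1 + 1/d). A minimal Python sketch of that formula, with a function name chosen just for this illustration, reproduces the percentages quoted above:

```python
import math


def benford_probability(digit):
    """Probability that the leading digit is `digit`, using log10(1 + 1/d)."""
    if digit not in range(1, 10):
        raise ValueError("leading digit must be 1 through 9")
    return math.log10(1 + 1 / digit)


# Print the predicted share for each possible leading digit.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

The nine probabilities add up to exactly 1, since the logarithms telescope to log10(10).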

Roughly. It’s not going to be so all the time. Measure the heights of humans in meters and there’ll be far more leading digits of ‘1’ than we should expect, as most people are between 1 and 2 meters tall. Measure them in feet and ‘5’ and ‘6’ take a great lead. The law works best when data can sprawl over many orders of magnitude. If we lived in a world where people could as easily be two inches as two hundred feet tall, Benford’s Law would make more accurate predictions about their heights. That something is a mathematical truth does not mean it’s independent of all reason.

For example, the reader thinking back some may be wondering: granted that atomic weights and river areas and populations carry units with them that create this distribution. How do street addresses, one of Benford’s observed sources, carry any unit? Well, street addresses are, at least by United States custom, a loose measure of distance. The 100 block (for example) of a street is within one … block … from whatever the more important street or river crossing that street is. The 900 block is farther away.

This extends further. Block numbers are proxies for distance from the major cross feature. House numbers on the block are proxies for distance from the start of the block. We have a better chance to see street number 419 than 1419, to see 419 than 489, or to see 419 than to see 1489. We can look at Benford’s Law in the second and third and other minor digits of numbers. But we have to be more cautious. There is more room for variation and quirk events. A block-filling building in the downtown area can take whatever street number the owners think most auspicious. Smaller samples of anything are less predictable.

Nevertheless, Benford’s Law has become famous to forensic accountants over the past several decades, if we allow the use of the word “famous” in this context. Its fame is thanks to the economists Hal Varian and Mark Nigrini. They observed that real-world financial data should be expected to follow this same distribution. If the data don’t, then there might be something suspicious going on. This is not an ironclad rule. There might be good reasons for the discrepancy. If your work trips are always to the same location, and always for one week, and there’s one hotel it makes sense to stay at, and you always learn you’ll need to make the trips about one month ahead of time, of course the hotel bill will be roughly the same. Benford’s Law is a simple, rough tool, a way to decide what data to scrutinize for mischief. With this in mind I trust none of my readers will make the obvious leading-digit mistake when padding their expense accounts anymore.

Since I’ve done you that favor, anyone out there think they can pick me up at the dealer’s Thursday, maybe Friday? Thanks in advance.

Reading the Comics, August 10, 2015: How People Think Edition


Today’s installment of Reading the Comics has a bunch of strips that seem to touch on human psychology. That properly could always be said; what we know of mathematics is what humans have thought about. But sometimes the link between a mathematical topic and human psychology is more obvious.

Wes Molebash’s Molebashed (August 5) is a reminder that one can find interesting mental arithmetic problems anywhere. This does not mean they’re always welcome. But they can still be fun to do. For example while walking through a parking lot I noticed another state’s license plate and wondered how many six-letter combinations you could get. Well, that’s 26^6, obviously, but how big a number is that? Working out that sort of thing is why people have to repeat what they’re saying to me.

Mark Pett’s Mr Lowe (August 6, rerun from sometime in 2000) has a student complaining the mathematics books are two years old. The complaint is absurd but also kind of sensible. Mathematical truths are immortal, or at least they are once they’re proven. Whether something is proven is, to an extent, a cultural construct: it takes an incredible load of work to actually prove something rigorously with every step in place. We usually are content if we show enough reasoning to be confident that every step could be filled in if need be. More a matter of taste, though, is whether these truths are interesting. As an example, I mentioned just a few posts ago the versine function. There are computations which, if you’re doing them by hand, are best done with the versine function or a table of values of the versine function. But we don’t need to do that sort of work anymore, and the versine function has plunged into obscurity. Nothing that we knew about versines has stopped being true. But we’d be eccentric, at least, to make it a part of a trigonometry course in the way someone 150 years ago might have. Mathematics is not culturally neutral. Few interesting things are.

Kieran Meehan’s Pros and Cons for the 7th of August, 2015.

Kieran Meehan’s Pros and Cons (August 7) is a probability joke. As often happens, the probability joke is built on the gambler’s fallacy. The fallacy in this case is the supposition that if one hasn’t had an accident in an unusually long while, then one must be due. Properly, though, we should ask whether accidents are independent events. If they are independent — if the chance of having an accident does not change based on whether you had an accident yesterday, or in the past week, or in the past year, or so on — then it’s silly to say you’re “due” for one. If your rate of accidents is lower than expected, you’re just having a lucky streak is all. However, I can imagine the chance of having an accident not being independent. I can imagine going a long time without accidents making someone careless about normal risks, or inexperienced in judging new ones, and that might make an accident more likely than one expects. It’s difficult to answer a probability question without understanding human psychology.

John Graziano’s Ripley’s Believe It or Not (August 7) claims there are over 26,000 possible outcomes of tic-tac-toe. I think the claim is poorly worded, though. If by an “outcome” of a tic-tac-toe game we mean the arrangement of X and O marks then there are at most 19,683 outcomes — each of the nine cells contains an X, an O, or is left blank. That’s an overestimate, though. A grid of nine X’s can’t be a legal outcome of a game, after all; nor can one that has two X’s, one O, and six blank spaces. There have to be at least three X’s and at least two O’s, and at most four blank spaces. The number of X’s can be equal to or one greater than the number of O’s. This removes a lot of possibilities.

I think what Graziano’s Ripley’s wants to claim is there are over 26,000 different tic-tac-toe games. This I can more readily believe. There are 9 possible spaces the first player can take on the first turn; there are 8 choices for the second player on the first turn. There are 7 choices for the first player on the second turn; there are 6 choices for the second player on the second turn. And so on. So there are at most 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 possible ways to play out the game; that’s a total of 362,880 possibilities. But not all those possibilities are needed. If a game’s won after two and a half turns, it stops, and a lot of possible continuations are voided. I don’t have a good estimate of how many those are. And we might choose to rule out symmetries. The game in which X fills out the top row while O tries the center isn’t really different to the game in which X fills out the bottom row while O takes the center. For that matter, it’s not different to the one where X fills in the right column while O fills in the center column. If you don’t count symmetries like this as different games, then we have fewer games altogether. So if that is what Graziano means, then 26,000 may be a fair estimate of tic-tac-toe games.
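The number of games that stop at the first win can be found by brute force. Here is a rough sketch, under two assumptions: a game ends the moment someone has three in a row or the grid fills up, and games that are mirror images or rotations of each other still count as different. The names `WIN_LINES`, `has_won`, and `count_games` are made up for this example.

```python
# The eight ways to get three in a row, with cells numbered 0-8 row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def has_won(board, player):
    """True if `player` occupies every cell of some winning line."""
    return any(all(board[i] == player for i in line) for line in WIN_LINES)


def count_games(board=None, player="X"):
    """Count distinct move sequences that end at a win or a full grid."""
    if board is None:
        board = [None] * 9
    total = 0
    for cell in range(9):
        if board[cell] is not None:
            continue
        board[cell] = player  # try this move
        if has_won(board, player) or all(c is not None for c in board):
            total += 1  # the game stops here
        else:
            total += count_games(board, "O" if player == "X" else "X")
        board[cell] = None  # undo the move before trying the next one
    return total


print(count_games())
```

Run as written this reports 255,168 games, which is under the 362,880 ceiling. It makes no attempt to fold symmetric games together, so it does not settle what the 26,000 figure is counting.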

That, by the way, is the strip that gave me the most to think about of this set.

Hammy is tricked into summer math again. (He's asked how many cookies he and three friends would have in total, if they each got two cookies. He gets the answer correct.)
Rick Kirkman and Jerry Scott’s Baby Blues for the 8th of August, 2015.

Rick Kirkman and Jerry Scott’s Baby Blues (August 8) is another installment of Kids Doing Mathematics During Summer Vacation. This is almost the theme of the summer in mathematics comics. Possibly it’s the theme of every summer.

Bill Amend’s FoxTrot (August 9) is one of those odd jokes that also is a pretty good business opportunity. Jason Fox proposes some of the many shapes that could, in principle, hold ice cream. I believe hemispheres at least are available, actually, at least to restaurants. But some of these shapes, such as pyramids or dodecahedrons or such, seem like they could be made and just happen not to have been. (Well, half-dodecahedrons, anyway.) That probably reflects that a cone or similarly narrow-based shape forces more of a given amount of ice cream to overflow the top of the cone, suggesting abundance. Geometric possibilities must give way to making the product look bigger.