## The Summer 2017 Mathematics A To Z: Benford's Law

Today’s entry in the Summer 2017 Mathematics A To Z is one for myself. I couldn’t post this any later.

# Benford’s Law.

My car’s odometer first read 9 on my final test drive before buying it, in June of 2009. It flipped over to 10 barely a minute after that, somewhere near Jersey Freeze ice cream parlor at what used to be the Freehold Traffic Circle. Ask a Central New Jersey person of sufficient vintage about that place. Its odometer read 90 miles sometime that weekend, I think while I was driving to The Book Garden on Route 537. Ask a Central New Jersey person of sufficient reading habits about that place. It’s still there. It flipped over to 100 sometime when I was driving back later that day.

The odometer read 900 about two months after that, probably while I was driving to work, as I had a longer commute in those days. It flipped over to 1000 a couple days after that. The odometer first read 9,000 miles sometime in spring of 2010 and I don’t remember what I was driving to for that. It flipped over from 9,999 to 10,000 miles several weeks later, as I pulled into the car dealership for its scheduled servicing. Yes, this kind of impressed the dealer that I got there exactly on the round number.

The odometer first read 90,000 in late August of last year, as I was driving to some competitive pinball event in western Michigan. It’s scheduled to flip over to 100,000 miles sometime this week as I get to the dealer for its scheduled maintenance. While cars have gotten to be much more reliable and durable than they used to be, the odometer will never flip over to 900,000 miles. At least I can’t imagine owning it long enough, at my rate of driving the past eight years, that this would ever happen. It’s hard to imagine living long enough for the car to reach 900,000 miles. Thursday or Friday it should flip over to 100,000 miles. The leading digit on the odometer will be 1 or, possibly, 2 for the rest of my association with it.

The point of this little autobiography is this observation. Imagine all the days that I have owned this car, from sometime in June 2009 to whatever day I sell, lose, or replace it. Pick one. What is the leading digit of my odometer on that day? It could be anything from 1 to 9. But it’s more likely to be 1 than it is 9. Right now it’s about equally likely to be any of the digits. But after this week the chance of ‘1’ being the leading digit will rise, and become quite a bit more likely than that of ‘9’. And it’ll never lose that edge.

This is a reflection of Benford’s Law. It is named, as most mathematical things are, imperfectly. The law-namer was Frank Benford, a physicist, who in 1938 published a paper The Law Of Anomalous Numbers. It confirmed the observation of Simon Newcomb. Newcomb was a 19th century astronomer and mathematician of an exhausting number of observations and developments. Newcomb observed the logarithm tables that anyone who needed to compute referred to often. The earlier pages were more worn-out and dirty and damaged than the later pages. People worked with numbers that start with ‘1’ more than they did numbers starting with ‘2’. And more those that start ‘2’ than start ‘3’. More that start with ‘3’ than start with ‘4’. And on. Benford showed this was not some fluke of calculations. It turned up in bizarre collections of data. The surface areas of rivers. The populations of thousands of United States municipalities. Molecular weights. The digits that turned up in an issue of Reader’s Digest. There is a bias in the world toward numbers that start with ‘1’.

And this is, prima facie, crazy. How can the surface areas of rivers somehow prefer to be, say, 100-199 hectares instead of 500-599 hectares? A hundred is a human construct. (Indeed, it’s many human constructs.) That we think ten is an interesting number is an artefact of our society. To think that 100 is a nice round number and that, say, 81 or 144 are not is a cultural choice. Grant that the digits of street addresses of people listed in American Men of Science — one of Benford’s data sources — have some cultural bias. How can another of his sources, molecular weights, possibly?

The bias sneaks in subtly. Don’t they all? It lurks at the edge of the table of data. The table header, perhaps, where it says “River Name” and “Surface Area (sq km)”. Or at the bottom where it says “Length (miles)”. Or it’s never explicit, because I take for granted people know my car’s mileage is measured in miles.

What would be different in my introduction if my car were Canadian, and the odometer measured kilometers instead? … Well, I’d not have driven the 9th kilometer; someone else doing a test-drive would have. The 90th through 99th kilometers would have come a little earlier that first weekend. The 900th through 999th kilometers too. I would have passed the 99,999th kilometer years ago. In kilometers my car has been in the 100,000s for something like four years now. It’s less absurd that it could reach the 900,000th kilometer in my lifetime, but that still won’t happen.

What would be different is the precise dates when my car reached its milestones, and the number of days it spent in the 1’s and the 2’s and the 3’s and so on. But the proportions? What fraction of its days it spends with a 1 as the leading digit versus a 2 or a 5? … Well, that’s changed a little bit. There is some final mile, or kilometer, my car will ever register and it makes a little difference whether that’s 239,000 or 385,000. But it’s only a little difference. It’s the difference in how many times a tossed coin comes up heads on the first 1,000 flips versus the second 1,000 flips. They’ll be different numbers, but not that different.

What’s the difference between a mile and a kilometer? A mile is longer than a kilometer, but that’s it. They measure the same kinds of things. You can convert a measurement in miles to one in kilometers by multiplying by a constant. We could as well measure my car’s odometer in meters, or inches, or parsecs, or lengths of football fields. The difference is what number we multiply the original measurement by. We call this “scaling”.

Whatever we measure, in whatever unit we measure, has to have a leading digit of something. So it’s got to have some chance of starting out with a ‘1’, some chance of starting out with a ‘2’, some chance of starting out with a ‘3’, and so on. But that chance can’t depend on the scale. Measuring something in smaller or larger units doesn’t change the proportion of how often each leading digit is there.
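Here is a minimal sketch of that claim. The data set is made up: 60,000 hypothetical measurements spread evenly, on a logarithmic scale, across six orders of magnitude. The names `leading_digit` and `digit_shares` are my own inventions for the illustration. It tallies the leading digits of the measurements in miles, converts everything to kilometers, and tallies again.

```python
from collections import Counter

MILES_TO_KM = 1.609344  # length of the international mile in kilometers


def leading_digit(x):
    """First significant digit of a positive number, read off its scientific notation."""
    return int(f"{x:.15e}"[0])


def digit_shares(values):
    """Fraction of the values that start with each digit 1 through 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = len(values)
    return {d: counts[d] / total for d in range(1, 10)}


# Hypothetical data: evenly spaced in log scale from 1 up to (not including) 1,000,000.
n = 60_000
measurements_miles = [10 ** (6 * k / n) for k in range(n)]
measurements_km = [m * MILES_TO_KM for m in measurements_miles]

shares_miles = digit_shares(measurements_miles)
shares_km = digit_shares(measurements_km)

for d in range(1, 10):
    print(d, round(shares_miles[d], 4), round(shares_km[d], 4))
```

The two columns should agree to about three decimal places. Data that doesn't sprawl over several orders of magnitude would not behave so nicely under a change of units.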

These facts combine to imply that leading digits follow a logarithmic-scale law. The leading digit should be a ‘1’ something like 30 percent of the time. And a ‘2’ about 18 percent of the time. A ‘3’ about one-eighth of the time. And it decreases from there. ‘9’ gets to take the lead a meager 4.6 percent of the time.
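The usual statement of the law is that a leading digit d turns up with probability log10(1 + 1/d). A short sketch to print the whole table; the function name `benford_probability` is just my label for it.

```python
import math


def benford_probability(d):
    """Benford's Law probability that the leading digit is d, for d from 1 to 9."""
    return math.log10(1 + 1 / d)


for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The nine probabilities add up to 1, as they must, since the sum telescopes to log10(10).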

Roughly. It’s not going to be so all the time. Measure the heights of humans in meters and there’ll be far more leading digits of ‘1’ than we should expect, as most people are between 1 and 2 meters tall. Measure them in feet and ‘5’ and ‘6’ take a great lead. The law works best when data can sprawl over many orders of magnitude. If we lived in a world where people could as easily be two inches as two hundred feet tall, Benford’s Law would make more accurate predictions about their heights. That something is a mathematical truth does not mean it’s independent of all reason.

For example, the reader thinking back some may be wondering: granted that atomic weights and river areas and populations carry units with them that create this distribution. How do street addresses, one of Benford’s observed sources, carry any unit? Well, street addresses are, at least in the United States custom, a loose measure of distance. The 100 block (for example) of a street is within one … block … from whatever the more important street or river crossing that street is. The 900 block is farther away.

This extends further. Block numbers are proxies for distance from the major cross feature. House numbers on the block are proxies for distance from the start of the block. We have a better chance to see street number 419 than 1419, to see 419 than 489, or to see 419 than to see 1489. We can look at Benford’s Law in the second and third and other minor digits of numbers. But we have to be more cautious. There is more room for variation and quirk events. A block-filling building in the downtown area can take whatever street number the owners think most auspicious. Smaller samples of anything are less predictable.

Nevertheless, Benford’s Law has become famous to forensic accountants the past several decades, if we allow the use of the word “famous” in this context. But its fame is thanks to the economists Hal Varian and Mark Nigrini. They observed that real-world financial data should be expected to follow this same distribution. If they don’t, then there might be something suspicious going on. This is not an ironclad rule. There might be good reasons for the discrepancy. If your work trips are always to the same location, and always for one week, and there’s one hotel it makes sense to stay at, and you always learn you’ll need to make the trips about one month ahead of time, of course the hotel bill will be roughly the same. Benford’s Law is a simple, rough tool, a way to decide what data to scrutinize for mischief. With this in mind I trust none of my readers will make the obvious leading-digit mistake when padding their expense accounts anymore.
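A rough sketch of what such a first-pass screen might look like. This is my own toy version, not any accountant's actual procedure. The expense figures are invented, and `benford_report` is a hypothetical helper. It only lines up the observed share of each leading digit against the share Benford's Law would suggest.

```python
import math
from collections import Counter


def leading_digit(x):
    """First significant digit of a nonzero amount."""
    return int(f"{abs(x):.15e}"[0])


def benford_report(amounts):
    """Map each digit 1-9 to (observed share, Benford's expected share)."""
    amounts = [a for a in amounts if a != 0]
    counts = Counter(leading_digit(a) for a in amounts)
    total = len(amounts)
    return {
        d: (counts[d] / total, math.log10(1 + 1 / d))
        for d in range(1, 10)
    }


# Invented expense claims, suspiciously fond of landing just under 100.
hypothetical_expenses = [94.50, 97.00, 99.99, 12.40, 95.25,
                         98.10, 31.00, 96.75, 99.00, 18.60]

report = benford_report(hypothetical_expenses)
for d, (observed, expected) in report.items():
    print(f"{d}: observed {observed:.0%}, expected {expected:.0%}")
```

Ten numbers is far too small a sample to conclude anything. A pile of leading 9s like this one only says where to look first.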

Since I’ve done you that favor, anyone out there think they can pick me up at the dealer’s Thursday, maybe Friday? Thanks in advance.

## 48 Altered States

I saw this intriguing map produced by Brian Brettschneider.

He made it on and for Twitter, as best I can determine. I found it from a stray post in Usenet newsgroup soc.history.what-if, dedicated to ways history could have gone otherwise. It also covers ways that it could not possibly have gone otherwise but would be interesting to see happen. Very different United States state boundaries are part of the latter set of things.

The location of these boundaries is described in English and so comes out a little confusing. It’s hard to make concise. Every point in, say, this alternate Missouri is closer to Missouri’s capital of … uhm … Missouri City than it is to any other state’s capital. And the same for all the other states. All you kind readers who made it through my recent A To Z know a technical term for this. This is a Voronoi Diagram. It uses as its basis points the capitals of the (contiguous) United States.
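The rule is easy to sketch in code: to find which alternate state a point belongs to, measure its distance to every capital and take the closest. The version below is only an illustration. It uses five capitals rather than all 48, the coordinates are rounded, and I'm assuming great-circle distance, which may not be what the map's author used.

```python
import math

# Approximate (latitude, longitude) in degrees; a small subset for illustration.
CAPITALS = {
    "Jefferson City, MO": (38.58, -92.17),
    "Springfield, IL": (39.78, -89.65),
    "Topeka, KS": (39.05, -95.68),
    "Des Moines, IA": (41.59, -93.62),
    "Little Rock, AR": (34.75, -92.29),
}

EARTH_RADIUS_KM = 6371.0


def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_capital(point):
    """Name of the capital closest to the point: its Voronoi cell."""
    return min(CAPITALS, key=lambda name: great_circle_km(point, CAPITALS[name]))


print(nearest_capital((38.63, -90.20)))  # St. Louis -> Springfield, IL
print(nearest_capital((39.10, -94.58)))  # Kansas City, MO -> Topeka, KS
```

With these five capitals, both of Missouri's big cities land in somebody else's cell.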

It’s an amusing map. I mean amusing to people who can attach concepts like amusement to maps. It’d probably be a good one to use if someone needed to make a Risk-style grand strategy game map and didn’t want to be too beholden to the actual map.

No state comes out unchanged, although a few don’t come out too bad. Maine is nearly unchanged. Michigan isn’t changed beyond recognition. Florida gets a little weirder but if you showed someone this alternate shape they’d recognize the original. No such luck with alternate Tennessee or alternate Wyoming.

The connectivity between states changes a little. California and Arizona lose their border. Washington and Montana gain one; similarly, Vermont and Maine suddenly become neighbors. The “Four Corners” spot where Utah, Colorado, New Mexico, and Arizona converge is gone. Two new ones look like they appear, between New Hampshire, Massachusetts, Rhode Island, and Connecticut; and between Pennsylvania, Maryland, Virginia, and West Virginia. I would be stunned if that weren’t just because we can’t zoom far enough in on the map to see they’re actually a pair of nearby three-way junctions.

I’m impressed by the number of borders that are nearly intact, like those of Missouri or Washington. After all, many actual state boundaries are geographic features like rivers that a Voronoi Diagram doesn’t notice. How could Ohio come out looking anything like Ohio?

The reason comes down to historical subtleties. At least once you get past the original 13 states, basically the east coast of the United States. The boundaries of those states were set by colonial charters, with boundaries set based on little or ambiguous information about what the local terrain was actually like, and drawn to reward or punish court factions and favorites. Never mind the original thirteen (plus Maine and Vermont, which we might as well consider part of the original thirteen).

After that, though, the United States started drawing state boundaries and had some method to it all. Generally a chunk of territory would be split into territories and later states that would be roughly rectangular, so far as practical, and roughly similar in size to the other states carved of the same area. So for example Mississippi and Alabama are roughly similar to Georgia in size and even shape. Louisiana, Arkansas, and Missouri are about equal in north-south span and loosely similar east-to-west. Kansas, Nebraska, South Dakota, and North Dakota aren’t too different in their north-to-south or east-to-west spans.

There’s exceptions, for reasons tied to the complexities of history. California and Texas get peculiar shapes because they could. Michigan has an upper peninsula for quirky reasons that some friend of mine on Twitter discovers every three weeks or so. But the rough guide is that states look a lot more similar to one another than you’d think from a quick look. Mark Stein’s How The States Got Their Shapes is an endlessly fascinating text explaining this all.

If there is a loose logic to state boundaries, though, what about state capitals? Those are more quirky. One starts to see the patterns when considering questions like “why put California’s capital in Sacramento instead of, like, San Francisco?” or “Why Jefferson City instead of Saint Louis or Kansas City?” There is no universal guide, but there are some trends. Generally states end up putting their capitals in a city that’s relatively central, at least to the major population centers around the time of statehood. And, generally, not in one of the state’s big commercial or industrial centers. The desire to be geographically central is easy to understand. No fair making citizens trudge that far if they have business in the capital. Avoiding the (pardon) first tier of cities has subtler politics to it; it’s an attempt to get the government somewhere at least a little inconvenient to the money powers.

There’s exceptions, of course. Boston is the obviously important city in Massachusetts, Salt Lake City the place of interest for Utah, Denver the equivalent for Colorado. Capitals relocated; Atlanta is Georgia’s eighth(?) I think since statehood. Sometimes they were weirder. Until 1854 Rhode Island rotated between five cities, to the surprise of people trying to name a third city in Rhode Island. New Jersey settled on Trenton as a compromise between the East and West Jersey capitals of Perth Amboy and Burlington. But if you look for a city that’s fairly central but not the biggest in the state you get to the capital pretty often.

So these are historical and cultural factors which combine to make a Voronoi Diagram map of the United States strange, but not impossibly strange, compared to what has really happened. Things are rarely so arbitrary as they seem at first.

## Vector.

A vector’s a thing you can multiply by a number and then add to another vector.

Oh, I know what you’re thinking. Wasn’t a vector one of those things that points somewhere? A direction and a length in that direction? (Maybe dressed up in more formal language. I’m glad to see that apparently New Jersey Tech’s student newspaper is still The Vector and still uses the motto “With Magnitude And Direction”.) Yeah, that’s how we’re always introduced to it. Pointing to stuff is a good introduction to vectors. Nearly everyone finds their way around places. And it’s a good learning model, to learn how to multiply vectors by numbers and to add vectors together.

But thinking too much about directions, either in real-world three-dimensional space, or in the two-dimensional space of the thing we’re writing notes on, can be limiting. We can get too hung up on a particular representation of a vector. Usually that’s an ordered set of numbers. That’s all right as far as it goes, but why limit ourselves? A particular representation can be easy to understand, but as the scary people in the philosophy department have been pointing out for 26 centuries now, a particular example of a thing and the thing are not identical.

And if we look at vectors as “things we can multiply by a number, then add another vector to”, then we see something grand. We see a commonality in many different kinds of things. We can do this multiply-and-add with those things that point somewhere. Call those coordinates. But we can also do this with matrices, grids of numbers or other stuff it’s convenient to have. We can also do this with ordinary old numbers. (Think about it.) We can do this with polynomials. We can do this with sets of linear equations. We can do this with functions, as long as they’re defined for compatible domains. We can even do this with differential equations. We can see a unity in things that seem, at first, to have nothing to do with one another.

We call these collections of things “vector spaces”. It’s a space much like the space you happen to exist in is. Adding two things in the space together is much like moving from one place to another, then moving again. You can’t get out of the space. Multiplying a thing in the space by a real number is like going in one direction a short or a long or whatever great distance you want. Again you can’t get out of the space. This is called “being closed”.
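To make the multiply-and-add business concrete, here’s a small sketch using polynomials. I’m representing each polynomial as a list of its coefficients, lowest power first; that choice, and the helper names `scale` and `add`, are mine. Scaling a polynomial, or adding two of them, always gives back another polynomial, which is the closure in question.

```python
def scale(c, p):
    """Multiply every coefficient of polynomial p by the number c."""
    return [c * a for a in p]


def add(p, q):
    """Add two polynomials, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


p = [1, 0, 2]   # 1 + 2x^2
q = [0, 3]      # 3x

r = add(scale(5, p), q)
print(r)        # [5, 3, 10], that is, 5 + 3x + 10x^2
```

Nothing here points anywhere, and yet it’s the same two operations we use on arrows.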

(I know, you may be wondering if it isn’t question-begging to say a vector is a thing in a vector space, which is made up of vectors. It isn’t. We define a vector space as a set of things that satisfy a certain group of rules. The things in that set are the vectors.)

Vector spaces are nice things. They work much like ordinary space does. We can bring many of the ideas we know from spatial awareness to vector spaces. For example, we can usually define a “length” of things. And something that works like the “angle” between things. We can define bases, breaking down a particular element into a combination of standard reference elements. This helps us solve problems, by finding ways they’re shadows of things we already know how to solve. And it doesn’t take much to satisfy the rules of being a vector space. I think mathematicians studying new groups of objects look instinctively for how we might organize them into a vector space.

We can organize them further. A vector space that satisfies some rules about sequences of terms, and that has a “norm” which is pretty much a size, becomes a Banach space. It works a little more like ordinary three-dimensional space. A Banach space that has a norm defined by a certain common method is a Hilbert space. These work even more like ordinary space, but they don’t need anything in common with it. For example, the functions that describe quantum mechanics are in a Hilbert space. There’s a thing called a Sobolev Space, a kind of vector space that also meets criteria I forget, but the name has stuck with me for decades because it is so wonderfully assonant.

I mentioned how vectors are stuff you can multiply by numbers, and add to other vectors. That’s true, but it’s a little limiting. The thing we multiply a vector by is called a scalar. And the scalar is a number — real or complex-valued — so often it’s easy to think that’s the default. But it doesn’t have to be. The scalar just has to be an element of some field. A ‘field’ is a ring that you can do addition, multiplication, and division (by anything but zero) on. So numbers are the obvious choice. They’re not the only ones, though. The scalar has to be able to multiply with the vector, since otherwise the entire concept collapses into gibberish. But we wouldn’t go looking among the gibberish except to be funny anyway.
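One standard example of a field that isn’t the real or complex numbers is the integers modulo 2, where the only scalars are 0 and 1 and where 1 + 1 = 0. A sketch of vectors over that field, as lists of bits; the function names are hypothetical, made up for this illustration.

```python
def add_mod2(u, v):
    """Add two bit-vectors component by component, modulo 2."""
    return [(a + b) % 2 for a, b in zip(u, v)]


def scale_mod2(c, u):
    """Multiply a bit-vector by a scalar from the field {0, 1}."""
    return [(c * a) % 2 for a in u]


u = [1, 0, 1, 1]
v = [0, 1, 1, 0]

print(add_mod2(u, v))      # [1, 1, 0, 1]
print(add_mod2(u, u))      # [0, 0, 0, 0]: every vector is its own negative
print(scale_mod2(1, u))    # [1, 0, 1, 1]
```

The results never leave the set of four-bit lists, so the closure rules hold just as they do for arrows in space.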

The idea of the ‘vector’ is straightforward and powerful. So we see it all over a wide swath of mathematics. It’s one of the things that shapes how we expect mathematics to look.

## The Most Unlikely NHL Playoff Upsets of the Last Five Years

Nick Emptage, writing for puckprediction.com, has the sort of post which I can’t resist: it’s built on the application of statistics to sports. In this case it’s National Hockey League playoffs, and itself builds on an earlier post about the conditional probabilities of the home-team-advantaged winning a best-of-seven series, to look at the most unlikely playoff wins of the last several years. Since I’m from New Jersey I feel a little irrational pride at the New Jersey Devils being two of the most improbable winners, not least because I remember the Devils in the 1980s when they could lose as many as 200 games per eighty-game season, so seeing them in the playoffs at all is a wondrous thing.

## Reading the Comics, April 5, 2013

Before getting to the next round of comic strips that mention mathematics stuff, I’d like to do a bit of self-promotion. Freshly published is the book Oh, Sandy: An Anthology Of Humor For A Serious Purpose, edited by Lynn Beighley, Peter Barlow, Andrea Donio, and A J Fader. This is a collection of humorous bits, written out of a sense of needing to do something useful after the Superstorm. I have an essay in there, based on the strange feelings I had of being remote (and quite safe) while seeing my home state — and particularly the piers at Seaside Heights, New Jersey — being battered by a storm. The book is available also through CreateSpace.

Jenny Campbell’s Flo and Friends (March 23) mentions π, and what’s really a fairly indistinct question for a tutor to ask the student. “Explain pi” is more open-ended than I think could be useful to answer: you could write books trying to describe what it’s used for, never mind the history of studying it. After all, it’s the only transcendental number with enough pop cultural cachet to appear routinely in newspaper comic strips; what constitutes an explanation of it? Alas, the strip just goes for the easiest pi pun to be made.

Scott Hilburn’s The Argyle Sweater (March 25) returns to the gimmick of anthropomorphized numerals. It’s a cute enough joke; it’s also apparently a different pair of 1 and 2 from earlier in the month. I do wonder what, in this panel’s continuity, subtraction might mean. Still, Hilburn is obviously never far from thinking of anthropomorphized numbers, as he came back to the setting on April 3, with another 2 putting in an appearance.

## How Big Was West Jersey?

A book I’d read about the history of New Jersey mentioned something usable for a real-world-based problem in fraction manipulation, for a class which was trying to get students back up to speed on arithmetic on their way into algebra. It required some setup to be usable, though. The point is a property sale from the 17th century, from George Hutcheson to Anthony Woodhouse, transferring “1/32 of 3/90 of 90/100 shares” of land in the province of West Jersey. There were a hundred shares in the province, so, the natural question to build is: how much land was transferred?
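The arithmetic itself can be sketched with exact fractions. I’m assuming here that each “of” means multiplication and that the result counts shares out of the province’s hundred. That reading is mine, and a 17th-century conveyancer might have meant something subtler.

```python
from fractions import Fraction

TOTAL_SHARES = 100  # shares in the province, per the setup above

# "1/32 of 3/90 of 90/100 shares", reading each "of" as a product.
shares_sold = Fraction(1, 32) * Fraction(3, 90) * Fraction(90, 100)

# The same sale as a fraction of the whole province.
fraction_of_province = shares_sold / TOTAL_SHARES

print(shares_sold)            # 3/3200
print(fraction_of_province)   # 3/320000
```

The 90s cancel, which is the sort of thing that makes this a nice exercise in fraction manipulation.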

The obvious question, to people who failed to pay attention to John T Cunningham’s This Is New Jersey in fourth grade, or who spent fourth grade not in New Jersey, or who didn’t encounter that one Isaac Asimov puzzle mystery (I won’t say which lest it spoil you), is: what’s West Jersey? That takes some historical context.

## How Big Charlotte Was In 1975

[ I cannot and do not try to explain it, but yesterday was a busier-than-average day around these parts, with a surprising number of references coming from an Entertainment Weekly article about the House series finale for some reason. In this context a “surprising” number is “any number other than zero” since I don’t know why anyone would go from there to here. I watched House, sometimes, sure, and liked it, but kind of drifted away when there was other stuff to do, you know? ]

That’s enough time spent establishing the heck out of the idea of a polynomial. Let’s actually put one in place. My goal back when was estimating what the population of Charlotte, North Carolina, was around 1975. I had some old Census data from 1970 and 1980 giving its population on the first of April, 1970, as 840,347; and on the first of April, 1980, as 971,391.
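The simplest polynomial through two data points is a straight line, so as a first sketch (assuming a first-degree polynomial is good enough, which is a modeling choice and not something the Census promises) the estimate for 1975 works out like this. The function name `linear_estimate` is made up for the example.

```python
def linear_estimate(year, year0=1970, pop0=840_347, year1=1980, pop1=971_391):
    """Straight-line interpolation between the two Census figures."""
    return pop0 + (pop1 - pop0) * (year - year0) / (year1 - year0)


print(linear_estimate(1975))  # 905869.0
```

Since 1975 sits halfway between the two Census dates, the line just returns the average of the two populations.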

## Life In North Carolina

[ I’m grateful to all for the help in reading my pages here. I’ve not quite reached 3,000 hits, but it’s within sight. If you do know of people who might be interested in either what I’m doing now — and it should be clearer after today’s post — or articles I’ve written in the past, please let them know, or let me know if I could be doing better at reaching interested audiences. ]

I left off the list of places I’d lived the city of Charlotte, North Carolina. There’s justice in my doing so. We lived there only for a couple years, when I was extremely young. I have only a few memories of the place, most of them based on the popcorn machine they had in my preschool program. I don’t know what else I got out of that, but I certainly appreciated seeing popcorn pop. Also I had two brothers born then. But, mostly, I can’t say that Charlotte made much of an impression on me. I couldn’t identify any major features of it from memory, and challenged to point to it on a map I might point at Delaware instead, or wander off to find a soda. Plus, I last lived there somewhere around 1975. I can accept that the population of South Amboy, New Jersey, may not have changed very much since the mid-1970s, but not that Charlotte’s hasn’t.

## Dense Places I Have Lived

[ I don’t wish to be too shameless here, but I’m closing in on 3,000 visitors to my little blog here. Can we get there? Kindly pass on a reference to people you think might be interested; if I matched my most-popular-ever day I’d reach 3,000 tonight easily. ]

I’ve lived almost my entire life in New Jersey, which has its effects on my world view; for example, it produces an extreme defensiveness about the state — really, has there been a fresh Jersey Joke since Benjamin Franklin’s quip about it being “a barrel tapped at both ends”, and they’re not even sure it wasn’t James Madison who said that instead, if anyone ever did? — and a feeling that one should refer to Bruce Springsteen as “Bruce”, as if we’d ever knowingly been in the same zip code simultaneously. Add to that not understanding what is wrong with other states that you’re forced to pump your own gas, and not being able to get a cackling laughter and a voice-over announcer wailing “Rrrrrrrrraceway Park!” out of the head, and you’ve got a first sketch of my personality. (I seem to have missed going to Action Park. My father insists he took me there; I grant he may have taken my siblings, but I don’t remember ever getting there, and the fact I have all my limbs suggests I never did go there.) But there are some other impressions that one gets from growing up in New Jersey.