## My 2018 Mathematics A To Z: Nearest Neighbor Model

I had a free choice of topics for today! Nobody had a suggestion for the letter ‘N’, so, I’ll take one of my own. If you did put in a suggestion, I apologize; I somehow missed the comment in which you did. I’ll try to do better in future.

## Nearest Neighbor Model.

Why are restaurants noisy?

It’s one of those things I wondered while at a noisy restaurant. I have heard it is because restaurateurs believe patrons buy more, and more expensive stuff, in a noisy place. I don’t know that I have heard this correctly, nor that what I heard was correct. I’ll leave it to people who work that end of restaurants to say. But I wondered idly whether mathematics could answer why.

It’s easy to form a rough model. Suppose I want my brilliant words to be heard by the delightful people at my table. Then I have to be louder, to them, than the background noise is. Fine. I don’t like talking loudly. My normal voice is soft enough even I have a hard time making it out. And I’ll drop the ends of sentences when I feel like I’ve said all the interesting parts of them. But I can overcome my instinct if I must.

The trouble comes from other people thinking of themselves the way I think of myself. They want to be heard over how loud I have been. And there’s no convincing them they’re wrong. If there are bunches of tables near one another, we’re going to have trouble. We’ll each be talking loud enough to drown one another out, until the whole place is a racket. If we’re close enough together, that is. If the tables around mine are empty, chances are my normal voice is enough for the cause. If they’re not, we might have trouble.

So this inspires a model. The restaurant is a space. The tables are set positions, points inside it. Each table is making some volume of noise. Each table is trying to be louder than the background noise. At least until the people at the table reach the limits of their screaming. Or decide they can’t talk, they’ll just eat and go somewhere pleasant.

Making calculations on this demands some more work. Some is obvious: how do you represent “quiet” and “loud”? Some is harder: how far do voices carry? Grant that a loud table is still loud if you’re near it. How far away before it doesn’t sound loud? How far away before you can’t hear it anyway? Imagine a dining room that’s 100 miles long. There’s no possible party at one end that could ever be heard at the other. Never mind that a 100-mile-long restaurant would be absurd. It shows that the limits of people’s voices are a thing we have to consider.

There are many ways to model this distance effect. A realistic one would fall off with distance, sure. But it would also allow for echoes and absorption by the walls, and by other patrons, and maybe by restaurant decor. This would take forever to get answers from, but if done right it would get very good answers. A simpler model would give answers less fitted to your actual restaurant. But the answers may be close enough, and let you understand the system. And may be simple enough that you can get answers quickly. Maybe even by hand.

And so I come to the “nearest neighbor model”. The common English meaning of the words suggests what it’s about. We get it from models, like my restaurant noise problem. It’s made of a bunch of points that have some value. For my problem, tables and their noise level. And that value affects stuff in some region around these points.

In the “nearest neighbor model”, each point directly affects only its nearest neighbors. Saying which is the nearest neighbor is easy if the points are arranged in some regular grid. If they’re evenly spaced points on a line, say. Or a square grid. Or a triangular grid. If the points are in some other pattern, you need to think about what the nearest neighbors are. This is why people working in neighbor-nearness problems get paid the big money.
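To make the regular grids concrete, here is a small Python sketch listing the nearest neighbors of a site on a line, a square grid, and a triangular grid. The function names are my own, and the triangular grid uses so-called axial coordinates, which is one common bookkeeping choice among several.

```python
def line_neighbors(i):
    """Nearest neighbors of site i among evenly spaced points on a line."""
    return [i - 1, i + 1]


def square_neighbors(x, y):
    """Nearest neighbors on a square grid: one step up, down, left, or right."""
    return [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]


def triangular_neighbors(q, r):
    """Nearest neighbors on a triangular grid in axial coordinates (q, r).

    With this labeling every site has six neighbors, all the same distance away.
    """
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    return [(q + dq, r + dr) for dq, dr in offsets]
```

Note that "nearest neighbor" is symmetric here: if B is one of A's neighbors, A is one of B's. For irregular arrangements of points, that symmetry is one of the things you have to decide about.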

Suppose I use a nearest neighbor model for my restaurant problem. In this, I pretend the only background noise at my table is that of the people the next table over, in each direction. Two tables over? Nope. I don’t hear them at my table. I do get an indirect effect. Two tables over affects the table that’s between mine and theirs. But vice-versa, too. The table that’s 100 miles away can’t affect me directly, but it can affect a table in-between it and me. And that in-between table can affect the next one closer to me, and so on. The effect is attenuated, yes. Shouldn’t it be, if we’re looking at something farther away?
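Since I never actually built the restaurant model, what follows is only a hypothetical sketch of what a nearest-neighbor version could look like. The rule is an assumption of mine, not something worked out above: tables sit in a row; each table hears only its adjacent tables, taking the loudest one reduced by an `attenuation` factor; and it talks `margin` louder than that, never below its own `base` level and never above a `cap` (the limit of screaming). All the parameter names and values are illustrative.

```python
def noise_step(volumes, base, attenuation, margin, cap):
    """One round of a hypothetical nearest-neighbor restaurant rule."""
    n = len(volumes)
    new_volumes = []
    for i in range(n):
        # Only the tables immediately to either side are heard at all.
        neighbors = [volumes[j] for j in (i - 1, i + 1) if 0 <= j < n]
        heard = attenuation * max(neighbors) if neighbors else 0.0
        new_volumes.append(min(cap, max(base[i], heard + margin)))
    return new_volumes


def settle(base, attenuation=0.5, margin=1.0, cap=10.0, rounds=200):
    """Repeat the rule until the volumes stop changing (in practice, for many rounds)."""
    volumes = list(base)
    for _ in range(rounds):
        volumes = noise_step(volumes, base, attenuation, margin, cap)
    return volumes


base = [1.0] * 21
base[10] = 5.0  # one boisterous table in the middle of the row
volumes = settle(base)
```

With these made-up numbers the loud table's neighbors settle at 3.5, the tables two over at 2.75, and the effect fades toward 2 with distance. No table hears the loud one directly beyond its neighbors, yet its influence reaches the far end, attenuated. Setting `attenuation=1.0`, so voices don't fade at all across a table gap, drives every table up to `cap`: the racket.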

This sort of model is easy to work with numerically. I’m inclined toward problems that work numerically. Analytically … well, it can be easy. It can be hard. There’s a one-dimensional version of this problem, a bunch of evenly-spaced sites on an infinitely long line. If each site is limited to one of exactly two values, the problem becomes easy enough that freshman physics majors can solve it exactly. They don’t, not the first time out. This is because it requires recognizing a trigonometry trick that they don’t realize would be relevant. But once they know the trick, they agree it’s easy, when they go back two years later and look at it again. It just takes familiarity.
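The trick isn't named above. One standard route, sketched here in Python as my own illustration, is the transfer matrix. For a ring of n two-valued sites (+1 or −1) with nearest-neighbor coupling K (the interaction strength divided by temperature, in suitable units) and no outside field, the partition function works out to (2 cosh K)^n + (2 sinh K)^n. So the trigonometry turns out to be hyperbolic. The code checks that formula against brute-force summation over all 2^n arrangements.

```python
import itertools
import math


def brute_force_Z(n, K):
    """Sum exp(K * sum of s_i * s_{i+1}) over every arrangement of n sites on a ring."""
    total = 0.0
    for spins in itertools.product((1, -1), repeat=n):
        # Each site interacts only with its nearest neighbor; the last wraps to the first.
        bonds = sum(spins[i] * spins[(i + 1) % n] for i in range(n))
        total += math.exp(K * bonds)
    return total


def transfer_matrix_Z(n, K):
    """Closed form: the 2x2 transfer matrix has eigenvalues 2 cosh K and 2 sinh K."""
    return (2 * math.cosh(K)) ** n + (2 * math.sinh(K)) ** n
```

Brute force doubles in cost with every site added, while the closed form costs the same for a million sites as for ten. That is the payoff of solving it exactly.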

This comes up in thermodynamics, because it makes a nice model for how ferromagnetism can work. More realistic problems, like, two-dimensional grids? … That’s harder to solve exactly. It can be done, though not by undergraduates. Three-dimensional can’t, last time I looked. Weirdly, four dimensions and up get easier again: there, at least near the critical temperature, a simple “mean-field” approximation turns out to describe the behavior correctly. You expect problems to only get harder with more dimensions of space, and then you get a surprise like that.
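For the two-dimensional grid, the usual numerical workaround is Monte Carlo simulation. Below is a minimal sketch of the Metropolis method on a small square grid with wraparound edges and coupling J = 1. Here `beta` is the inverse temperature. The grid size, sweep count, and seed are arbitrary choices for illustration.

```python
import math
import random


def metropolis_2d(size, beta, sweeps, seed=0):
    """Simulate a size-by-size nearest-neighbor Ising grid; return the final magnetization per site."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]  # start with every site aligned
    for _ in range(sweeps):
        for _ in range(size * size):
            i, j = rng.randrange(size), rng.randrange(size)
            # Sum of the four nearest neighbors, wrapping around the edges.
            neighbors = (spins[(i - 1) % size][j] + spins[(i + 1) % size][j]
                         + spins[i][(j - 1) % size] + spins[i][(j + 1) % size])
            delta_e = 2 * spins[i][j] * neighbors  # energy change if this site flips
            # Accept flips that lower the energy; accept others with probability exp(-beta * delta_e).
            if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                spins[i][j] = -spins[i][j]
    return sum(map(sum, spins)) / (size * size)
```

At very low temperature (large `beta`) the aligned grid stays magnetized; at `beta=0` the sites scramble and the magnetization hovers near zero. Onsager's exact solution puts the changeover at beta = ln(1 + √2)/2, about 0.4407, which is where small simulations like this one get slow and noisy.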

The nearest-neighbor-model is a first choice. It’s hardly the only one. If I told you there were a next-nearest-neighbor model, what would you suppose it was? Yeah, you’d be right. As long as you supposed it was “things are affected by the nearest and the next-nearest neighbors”. Mathematicians have heard of loopholes too, you know.
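On a square grid, the difference between the two models is just which sites count. A small Python sketch, with a function name of my own, in which the next-nearest neighbors are the four diagonal sites, √2 grid spacings away rather than 1:

```python
def square_grid_neighbors(x, y, next_nearest=False):
    """Neighbors of (x, y) on a square grid with unit spacing.

    Nearest neighbors are one step along an axis (distance 1).
    Next-nearest neighbors are the diagonal sites (distance sqrt(2)).
    """
    nearest = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    if not next_nearest:
        return nearest
    diagonal = [(x - 1, y - 1), (x - 1, y + 1), (x + 1, y - 1), (x + 1, y + 1)]
    return nearest + diagonal
```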

As for my restaurant model? … I never actually modelled it. I did think about the model. I concluded my model wasn’t different enough from ferromagnetism models to need me to study it more. I might be mistaken. There may be interesting weird effects caused by the facts of restaurants. That restaurants are pretty small things. That they can have echo-y walls and ceilings. That they can have sound-absorbing things like partial walls or plants. Perhaps I gave up too easily when I thought I knew the answer. Some of my idle thoughts end up too idle.

I should have my next Fall 2018 Mathematics A-To-Z post on Tuesday. It’ll be available at this link, as are the rest of these glossary posts.

## A Leap Day 2016 Mathematics A To Z: Kullback-Leibler Divergence

Today’s mathematics glossary term is another one requested by Jacob Kanev. Kanev, I learned last time, has got a blog, “Some Unconsidered Trifles”, for those interested in having more things to read. Kanev’s request this time was a term new to me. But learning things I didn’t expect to consider is part of the fun of this dance.

## Kullback-Leibler Divergence.

The Kullback-Leibler Divergence comes to us from information theory. It’s also known as “information divergence” or “relative entropy”. Entropy is by now a familiar friend. We got to know it through, among other things, the “How interesting is a basketball tournament?” question. In this context, entropy is a measure of how surprising it would be to know which of several possible outcomes happens. A sure thing has an entropy of zero; there’s no potential surprise in it. If there are two equally likely outcomes, then the entropy is 1. If there are four equally likely outcomes, then the entropy is 2. If there are four possible outcomes, but one is very likely and the other three mediocre, the entropy might be low, say, 0.5 or so. It’s mostly but not perfectly predictable.
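The entropies quoted above follow from the standard Shannon formula: add up, over all outcomes, the probability p times log2(1/p). A minimal Python sketch:

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; outcomes with probability zero contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


sure_thing = entropy_bits([1.0])
coin_flip = entropy_bits([0.5, 0.5])
four_way = entropy_bits([0.25, 0.25, 0.25, 0.25])
lopsided = entropy_bits([0.91, 0.03, 0.03, 0.03])
```

The lopsided four-outcome case, with probabilities 0.91, 0.03, 0.03, and 0.03 (my own pick of numbers), comes out a bit under 0.6, in line with the rough figure above.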

Suppose we have a set of possible outcomes for something. (Pick anything you like. It could be the outcomes of a basketball tournament. It could be how much a favored stock rises or falls over the day. It could be how long your ride into work takes. As long as there are different possible outcomes, we have something workable.) If we have a probability, a measure of how likely each of the different outcomes is, then we have a probability distribution. More likely things have probabilities closer to 1. Less likely things have probabilities closer to 0. No probability is less than zero or more than 1. All the probabilities added together sum up to 1. (These are the rules which make something a probability distribution, not just a bunch of numbers we had in the junk drawer.)
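The rules in that parenthetical translate directly into a check. A minimal sketch, allowing a small tolerance on the sum because floating-point numbers rarely add to exactly 1:

```python
import math


def is_probability_distribution(probs, tol=1e-9):
    """True if every value is between 0 and 1 and the values sum to 1 (within tol)."""
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    return in_range and math.isclose(sum(probs), 1.0, abs_tol=tol)
```

A list like [-0.1, 1.1] sums to 1 but still fails, since no probability may be negative or above 1.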

The Kullback-Leibler Divergence describes how similar two probability distributions are to one another. Let me call one of these probability distributions p. I’ll call the other one q. We have some number of possible outcomes, and we’ll use k as an index for them. $p_k$ is how likely outcome number k is in distribution p. $q_k$ is how likely outcome number k is in distribution q.

To calculate this divergence, we work out, for each k, the number $p_k$ times the logarithm of $p_k$ divided by $q_k$. Here the logarithm is base two. Calculate all this for every one of the possible outcomes, and add it together:

$$D_{KL}(p \| q) = \sum_k p_k \log_2 \frac{p_k}{q_k}$$

This will be some number that’s at least zero, but it might be larger.
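Here is a minimal Python transcription of that recipe. The handling of zeros is the usual convention, though the post doesn't get into it: a term where $p_k$ is zero contributes nothing, and if $p_k$ is positive where $q_k$ is zero, the divergence is infinite.

```python
import math


def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits: the sum of p_k * log2(p_k / q_k)."""
    if len(p) != len(q):
        raise ValueError("p and q must list the same outcomes")
    total = 0.0
    for pk, qk in zip(p, q):
        if pk == 0:
            continue  # an outcome p never produces adds nothing
        if qk == 0:
            return math.inf  # p allows an outcome that q calls impossible
        total += pk * math.log2(pk / qk)
    return total


same = kl_divergence_bits([0.5, 0.5], [0.5, 0.5])
forward = kl_divergence_bits([0.5, 0.5], [0.25, 0.75])
backward = kl_divergence_bits([0.25, 0.75], [0.5, 0.5])
```

Swapping the two distributions generally changes the answer (here from about 0.21 to about 0.19). This asymmetry is one reason it's called a divergence rather than a distance.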

The closer that distribution p and distribution q are to each other, the smaller this number is. If they’re exactly the same, this number will be zero. The less that distribution p and distribution q are like each other, the bigger this number is.

And that’s all good fun, but, why bother with it? And at least one answer I can give is that it lets us measure how good a model of something is.

Suppose we think we have an explanation for how something varies. We can say how likely we think each of the possible different outcomes is. This gives us a probability distribution, which we’ll call q. We can compare that to actual data. Watch whatever it is for a while, and measure how often each of the different possible outcomes actually does happen. This gives us a probability distribution, which we’ll call p.

If our model is a good one, then the Kullback-Leibler Divergence between p and q will be small. If our model’s a lousy one, then this divergence will be large. If we have a couple different models, we can see which ones make for smaller divergences and which ones make for larger divergences. Probably we’ll want smaller divergences.
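A sketch of that comparison in Python. The "data" here is simulated, not measured: 10,000 rolls of a die that I've made up to be secretly loaded toward 1. That die stands in for watching whatever it is for a while. The two candidate models, a fair die and a loaded one, are likewise illustrative.

```python
import math
import random
from collections import Counter


def kl_divergence_bits(p, q):
    """D(p || q) in bits, skipping outcomes that p never produces."""
    total = 0.0
    for pk, qk in zip(p, q):
        if pk == 0:
            continue
        if qk == 0:
            return math.inf
        total += pk * math.log2(pk / qk)
    return total


FACES = [1, 2, 3, 4, 5, 6]


def observed_distribution(samples):
    """Turn observed outcomes into frequencies: the distribution p."""
    counts = Counter(samples)
    return [counts[face] / len(samples) for face in FACES]


# Simulated stand-in for real observations (hypothetical loaded die).
rng = random.Random(2016)
data = rng.choices(FACES, weights=[0.3, 0.14, 0.14, 0.14, 0.14, 0.14], k=10_000)
p = observed_distribution(data)

models = {
    "fair die": [1 / 6] * 6,
    "loaded die": [0.3, 0.14, 0.14, 0.14, 0.14, 0.14],
}
divergences = {name: kl_divergence_bits(p, q) for name, q in models.items()}
best = min(divergences, key=divergences.get)
```

The loaded-die model wins with a divergence near zero. It isn't exactly zero, because 10,000 rolls still wobble around the true probabilities.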

Here you might ask: why do we need a model? Isn’t the actual data the best model we might have? It’s a fair question. But no, real data is kind of lousy. It’s all messy. It’s complicated. We get extraneous little bits of nonsense clogging it up. And the next batch of results is going to be different from the old ones anyway, because real data always varies.

Furthermore, one of the purposes of a model is to be simpler than reality. A model should do away with complications so that it is easier to analyze, easier to make predictions with, and easier to teach than the reality is. But a model mustn’t be so simple that it can’t represent important aspects of the thing we want to study.

The Kullback-Leibler Divergence is a tool that we can use to quantify how much better one model or another fits our data. It also lets us quantify how much of the grit of reality we lose in our model. And this is at least some of the use of this quantity.

## A Summer 2015 Mathematics A To Z: ansatz

Sue Archer at the Doorway Between Worlds blog recently completed an A to Z challenge. I decided to follow her model and challenge and intend to do a little tour of some mathematical terms through the alphabet. My intent is to focus on some that are interesting terms of art that I feel non-mathematicians never hear. Or that they never hear clearly. Indeed, my first example is one I’m not sure I ever heard clearly described.

## Ansatz.

I first encountered this term in grad school. I can’t tell you when. I just realized that every couple sessions in differential equations the professor mentioned the ansatz for this problem. By then it felt too late to ask what it was I’d missed. In hindsight I’m not sure the professor ever made it clear. My research suggests the word is still a dialect rather than part of the universal language of mathematicians, and that it isn’t quite precisely defined.

What a mathematician means by the “ansatz” is the collection of ideas that go into solving a problem. This may be an assumption of what the solution should look like. This might be the assumptions of physical or mathematical properties a solution has to have. This might be a listing of properties that a valid solution would have to have. It could be the set of things you judge should be included, or ignored, in constructing a mathematical model of something. In short the ansatz is the set of possibly ad hoc assumptions you have to bring to a topic to make it something answerable. It’s different from the axioms of the field or the postulates for a problem. An axiom or postulate is assumed to be true by definition. The ansatz is a bunch of ideas we suppose are true because they seem likely to bring us to a solution.
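A textbook instance, my own choice of example rather than one from those lectures: for the differential equation below, the ansatz is to guess that the solution is an exponential. The guess turns a calculus problem into a quadratic equation.

$$
\begin{aligned}
y'' - 3y' + 2y &= 0 && \text{(the problem)} \\
y &= e^{rt} && \text{(the ansatz: guess an exponential)} \\
r^2 e^{rt} - 3r e^{rt} + 2e^{rt} &= 0 && \text{(substitute the guess)} \\
r^2 - 3r + 2 = (r - 1)(r - 2) &= 0 && \text{(divide by } e^{rt} \neq 0\text{)} \\
y &= c_1 e^{t} + c_2 e^{2t} && \text{(combine the two solutions)}
\end{aligned}
$$

Nothing in the problem promised an exponential would work; the guess is justified only because it pays off.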

An ansatz is good for getting an answer. It doesn’t do anything to verify that the answer means anything, though. The ansatz contains assumptions you the mathematician brought to the problem. You need to argue that the assumptions are reasonable, and reflect the actual problem you’re studying. You also should prove that the answer ultimately derived matches the actual behavior of whatever you were studying. Validating a solution can be the hardest part of mathematics, other than all the other parts of mathematics.
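One modest form of that checking is to plug the answer back in. A sketch in Python, assuming the example equation y'' − 3y' + 2y = 0 and an exponential guess that produced c₁eᵗ + c₂e²ᵗ: estimate the derivatives numerically with centered differences and confirm that the equation's left side comes out near zero. The constants `c1` and `c2` are arbitrary. A wrong guess is included for contrast.

```python
import math


def finite_difference_residual(f, t, h=1e-4):
    """Estimate y'' - 3y' + 2y at time t using centered differences."""
    d1 = (f(t + h) - f(t - h)) / (2 * h)
    d2 = (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)
    return d2 - 3 * d1 + 2 * f(t)


def from_ansatz(t, c1=1.0, c2=-0.5):
    """The solution the exponential guess produced: c1*e^t + c2*e^(2t)."""
    return c1 * math.exp(t) + c2 * math.exp(2 * t)


def wrong_guess(t):
    """An exponential with the wrong rate, which should fail the check."""
    return math.exp(3 * t)


checks = [finite_difference_residual(from_ansatz, k / 10) for k in range(11)]
```

This only shows the answer solves the equation we wrote down. Whether that equation describes the thing we were studying is the harder question, and no amount of substitution settles it.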

## Reading the Comics, March 15, 2015: Pi Day Edition

I had kind of expected the 14th of March — the Pi Day Of The Century — would produce a flurry of mathematics-themed comics. There were some, although they were fewer and less creatively diverse than I had expected. Anyway, between that, and the regular pace of comics, there’s plenty for me to write about. Recently featured, mostly on Gocomics.com, a little bit on Creators.com, have been:

Brian Anderson’s Dog Eat Doug (March 11) features a cat who claims to be “pondering several quantum equations” to prove something about a parallel universe. It’s an interesting thing to claim because, really, how can the results of an equation prove something about reality? We’re extremely used to the idea that equations can model reality, and that the results of equations predict real things, to the point that it’s easy to forget that there is a difference. A model’s predictions still need some kind of validation, some reason to think that these predictions are meaningful and correct when done correctly, and it’s quite hard to think of a meaningful way to validate a prediction about “another” universe.

## Reading the Comics, August 16, 2014: Saturday Morning Breakfast Cereal Edition

Zach Weinersmith’s Saturday Morning Breakfast Cereal is a long-running and well-regarded web comic that I haven’t paid much attention to because I don’t read many web comics. XKCD, Newshounds, and a couple others are about it. I’m not opposed to web comics, mind you, I just don’t get around to following them typically. But Saturday Morning Breakfast Cereal started running on Gocomics.com recently, and Gocomics makes it easy to start adding comics, and I did, and that’s served me well for the mathematical comics collections since it’s been a pretty dry spell. I bet it’s the summer vacation.

Saturday Morning Breakfast Cereal (July 30) seems like a reach for inclusion in mathematical comics since its caption is “Physicists make lousy firemen” and it talks about the action of a fire — and of the “living things” caught in the fire — as processes producing wobbling and increases in disorder. That’s an effort at describing a couple of ideas, the first that the temperature of a thing is connected to the speed at which the molecules making it up are moving, and the second that the famous entropy is a never-decreasing quantity. We get these notions from thermodynamics and particularly the attempt to understand physically important quantities like heat and temperature in terms of particles — which have mass and position and momentum — and their interactions. You could write an entire blog about entropy and probably someone does.

Randy Glasbergen’s Glasbergen Cartoons (August 2) uses the word-problem setup for a strip of “Dog Math” and tries to remind everyone teaching undergraduates the quotient rule that it really could be worse, considering.

Nate Fakes’s Break of Day (August 4) takes us into an anthropomorphized world that isn’t numerals for a change, to play on the idea that skill in arithmetic is evidence of particular intelligence.

George McManus’s Bringing Up Father (August 11, rerun from April 12, 1949) goes to the old motif of using money to explain addition problems. It’s not a bad strategy, of course: in a way, arithmetic is one of the first abstractions one does, in going from the idea that a hundred of something added to a hundred fifty of something will yield two hundred fifty of that thing, and it doesn’t matter what that something is: you’ve abstracted out the ideas of “a hundred plus a hundred fifty”. In algebra we start to think about whether we can add together numbers without knowing what one or both of the numbers are — “x plus y” — and later still we look at adding together things that aren’t necessarily numbers.

And back to Saturday Morning Breakfast Cereal (August 13), which has a physicist type building a model of his “lack of dates” based on random walks and, his colleague objects, “only works if we assume you’re an ideal gas molecule”. But models are often built on assumptions that might, taken literally, be nonsensical, like imagining the universe to have exactly three elements in it, supposing that people never act against their maximal long-term economic gain, or — to summon a traditional mathematics/physics joke — assuming a spherical cow. The point of a model is to capture some interesting behavior, and avoid the complicating factors that can’t be dealt with precisely or which don’t relate to the behavior being studied. Choosing how to simplify is the skill and art that earns mathematicians the big money.

And then for August 16, Saturday Morning Breakfast Cereal does a binary numbers joke. I confess my skepticism that there are any good alternate-base-number jokes, but you might like them.

## Stable Marriages and Designing Markets

A few days ago Jeremy Kun with the Math ∩ Programming blog wrote about the problem of stable marriages, by which here is meant pairing off people so that everyone is happy with their pairing. Put like that it almost sounds like the sort of thing people used to complain about in letters to Ann Landers about mathematicians doing foolish things — don’t mathematicians know that feelings matter in this, and, how does this help them teach kids to do arithmetic.

But the problem is just put that way because it’s one convenient representation of a difficult problem. Given a number of agents that can be paired up, and some way of measuring the collection of pairings, how can you select the best pairing? And what do you mean by best? Do you mean the one that maximizes whatever it is you’re measuring? The one that minimizes it (if you’re measuring, say, unhappiness, or cost, or something else you’d want as little of)? Jeremy Kun describes the search for a pairing that’s stable, which requires, in part, coming up with a definition of just what “stable” means.

The work can be put to describe any two-party interaction, which can be marriages, or can be the choice of people where to work and employers who to hire, or can be people deciding what to buy or where to live, all sorts of things where people have preferences and good fits. Once the model’s developed it has more applications than what it was originally meant for, which is part of what makes this a good question. Kun also writes a bit about how to expand the problem so as to handle some more complicated cases, and shows how the problem can be put onto a computer.
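Kun's post shows its own way of putting this on a computer. What follows is an independent Python sketch of the classic Gale-Shapley "deferred acceptance" procedure. This procedure always finds a stable pairing when the two sides are the same size and everyone ranks everyone. "Stable" here is the standard definition: no man and woman would both rather be with each other than with their assigned partners. `random_preferences` is a hypothetical helper for generating puzzles like the 10-men, 10-women one in Kun's post.

```python
import random


def stable_matching(men_prefs, women_prefs):
    """Men-proposing Gale-Shapley. Preference lists run from most to least preferred."""
    # rank[w][m] is man m's position in woman w's list; lower is better.
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_proposal = {m: 0 for m in men_prefs}
    partner_of_woman = {}
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]  # his best choice not yet asked
        next_proposal[m] += 1
        current = partner_of_woman.get(w)
        if current is None:
            partner_of_woman[w] = m
        elif rank[w][m] < rank[w][current]:
            partner_of_woman[w] = m  # she trades up; her old partner is free again
            free_men.append(current)
        else:
            free_men.append(m)  # rejected; he'll try his next choice
    return {m: w for w, m in partner_of_woman.items()}


def is_stable(matching, men_prefs, women_prefs):
    """True if no man and woman both prefer each other to their assigned partners."""
    husband = {w: m for m, w in matching.items()}
    for m, prefs in men_prefs.items():
        for w in prefs[: prefs.index(matching[m])]:  # women he likes better than his partner
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False
    return True


def random_preferences(n, seed=0):
    """Hypothetical random preference lists for n men and n women."""
    rng = random.Random(seed)
    men = [f"man{i}" for i in range(n)]
    women = [f"woman{i}" for i in range(n)]
    return ({m: rng.sample(women, n) for m in men},
            {w: rng.sample(men, n) for w in women})
```

Because the men do the proposing, this version tends to favor the men. Letting the women propose gives a stable pairing tilted the other way, which is one sign that "best" needs defining.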

Here is a fun puzzle. Suppose we have a group of 10 men and 10 women, and each of the men has sorted the women in order of their preference for marriage (that is, a man prefers to marry a woman earlier in his list over a woman later in the list). Likewise, each of the women has sorted the men in order of marriageability. We might ask if there is any way that we, the omniscient cupids of love, can decide who should marry to make everyone happy.

Of course, the word happy is entirely imprecise. The mathematician balks at the prospect of leaving such terms undefined! In this case, it’s quite obvious that not everyone will get their first pick. Indeed, if even two women prefer the same man someone will have to settle for less than their top choice. So if we define happiness in this naive way…


## Realistic Modeling

“Economic Realism (Wonkish)”, a blog entry by Paul Krugman in The New York Times, discusses a paper, “Chameleons: The Misuse Of Mathematical Models In Finance And Economics”, by Paul Pfleiderer of Stanford University. The paper surprises me by including a color picture of a chameleon right there on the front page; I didn’t know you could have color pictures included just for their visual appeal in academic papers these days. Anyway, Pfleiderer discusses the difficulty of what he terms filtering: making sure that the assumptions one makes to build a model, which are simplifications and abstractions of the real-world thing in which you’re interested, aren’t too far out of line with the way the real thing behaves.

This challenge, which I think of as verification or validation, is important also when you deal with pure mathematical or physical models. Some of that will be at the theoretical stage: is it realistic to model a fluid as if it had no viscosity? Unless you’re dealing with superfluid helium or something exotic like that, no, but you can do very good work that isn’t too far off. Or there’s a classic model of the way magnetism forms, known as the Ising model, which in a very special case (a one-dimensional line) is simple enough that a high school student could solve it. (Well, a very smart high school student, one who’s run across an exotic function called the hyperbolic cosine, could do it.) But that model is so simple that it can’t model the phase change: if you warm a magnet up past a critical temperature, it stops being magnetic. Is the model no good? If you aren’t interested in the phase change, it might be good enough.
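For the one-dimensional Ising line, you can see the missing phase change in a formula. The standard transfer-matrix result for the magnetization per site of an infinitely long chain is m = sinh(βh) / √(sinh²(βh) + e^(−4βJ)). Here J is the nearest-neighbor coupling, h the outside magnetic field, and β the inverse temperature. A direct Python transcription:

```python
import math


def magnetization_1d(beta, J, h):
    """Magnetization per site of the infinite one-dimensional Ising chain.

    beta: inverse temperature; J: nearest-neighbor coupling; h: outside field.
    """
    s = math.sinh(beta * h)
    return s / math.sqrt(s * s + math.exp(-4 * beta * J))
```

At any finite temperature, switching the field off sends the magnetization to zero, so the chain never stays magnetized on its own. Only in the limit of zero temperature (β growing without bound) does that change. A real magnet's transition at a positive critical temperature has no counterpart here.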

And then there is the numerical stage: if you’ve set up a computer program that is supposed to represent fluid flow, does it correctly find solutions? I’ve heard it claimed that the majority of time spent on a numerical project is spent in validating the results, and that isn’t even simply in finding and fixing bugs in the code. Even once the code is doing perfectly what we mean it to do, it must be checked that what we mean it to do is relevant to what we want to know.

Pfleiderer’s is an interesting paper and I think worth the read; despite its financial mathematics focus (and a brief chat about quantum mechanics) it doesn’t require any particularly specialized training. There are some discussions of particular financial models, but what’s important are the assumptions being made behind those models, and those are intelligible without prior training in the field.

## Monopoly Chances

While the whole world has surely heard about it before, I just today ran across a web page purporting to give the probabilities and expected incomes for the various squares on a Monopoly board. There are many similar versions of this table around (the Monopoly app for iPad even offers the probability that your opponents will land on any given square in the next turn, which is superlatively useful if you want to micromanage your building), and I wouldn’t be surprised if there are little variations and differences between tables.

What’s interesting to me is that the author, Truman Collins, works out the answers by two different models, and considers the results to probably be fairly close to correct because the different models of the game agree fairly well. There are some important programming differences between Collins’s two models (both of which are shown, in code written in C, so it won’t compile on your system without a lot of irritating extra work), but the one that’s most obvious is that in one model the effect of being tossed into jail after rolling three doubles in a row is modelled, while in the other it’s ignored.

Does this matter? Well, it matters a bit, since one is closer to the true game than the other, but at the cost of making a more complicated simulation, which is the normal sort of trade-off someone building a model has to make. Any simulation simplifies the thing being modelled, and a rule like the jail-on-three-doubles might be too much bother for the improvement in accuracy it offers.
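To see the trade-off in miniature, here is a Python sketch that is emphatically not Collins's code. It's a stripped-down board with no Chance or Community Chest cards, where a jailed player simply leaves on the next roll, as if the fine were always paid. It computes the long-run chance that a roll ends on each square by repeatedly pushing probability through the dice, once ignoring the three-doubles rule and once including it. Including the rule means remembering how many doubles have come up in a row.

```python
from collections import Counter
from itertools import product

SQUARES = 40
JAIL = 10
GO_TO_JAIL = 30

# Every roll of two dice, grouped by (total, whether it was doubles), with counts out of 36.
ROLLS = Counter((a + b, a == b) for a, b in product(range(1, 7), repeat=2))


def landing_probabilities(three_doubles_rule, iterations=500):
    """Long-run chance that a roll ends on each square of a simplified board.

    A state is (square, doubles rolled in a row so far).
    """
    dist = {(0, 0): 1.0}
    for _ in range(iterations):
        new_dist = {}
        for (square, streak), p in dist.items():
            for (total, doubles), count in ROLLS.items():
                if three_doubles_rule and doubles and streak == 2:
                    state = (JAIL, 0)  # third doubles in a row: straight to jail
                else:
                    dest = (square + total) % SQUARES
                    if dest == GO_TO_JAIL:
                        state = (JAIL, 0)
                    elif three_doubles_rule and doubles:
                        state = (dest, streak + 1)
                    else:
                        state = (dest, 0)
                new_dist[state] = new_dist.get(state, 0.0) + p * count / 36
        dist = new_dist
    by_square = [0.0] * SQUARES
    for (square, _), p in dist.items():
        by_square[square] += p
    return by_square


simple = landing_probabilities(three_doubles_rule=False)
fuller = landing_probabilities(three_doubles_rule=True)
```

In this toy version the rule makes Jail somewhat more likely and shifts every other square only slightly. So the two models mostly agree, and the extra state needed to track doubles buys a small correction.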

Here’s another thing to decide in building the model: when you land in jail, you can either pay a \$50 fine and get out immediately, or can try to roll doubles. If there are a lot of properties bought by your opponents, sitting in jail (as the rolling-doubles method implies) can be better, as it reduces the chance you have to pay rent to someone else. That’s likely the state in the later part of the game. If there are a lot of unclaimed properties, you want to get out and buy stuff. Collins simulates this by supposing that in the early game one buys one’s way out, and in the late game one rolls for doubles. But even that’s a simplification: suppose you owned much of the sides of the board after jail. (You’re likely crushing me, in that case.) Why not get out and get closer to Go sooner, as long as it’s not likely to cost you?

That Collins tries different models and gets similar results suggests that these estimates are tolerably close to right, and often, that’s the best one can really know about how well a model of a complicated thing represents the reality.