Where Interpolations Go Wrong

I built up a linear interpolation for Charlotte’s population between 1970 and 1980. In principle, I could extend this beyond those years, and project what the population was before Census Day 1970 or after the census of 1980. The values the interpolating polynomial takes on outside the range of data we started with are generally called an extrapolation rather than an interpolation, but the two are pretty closely tied together. If we understand one we’re doing pretty well understanding the other.

The population of Charlotte, North Carolina, as a linear interpolation/extrapolation based on the 1970 and 1980 census data.  The census data from 1960, 1990, and 2000 are included for comparison.

In the enclosed drawing I’ve plotted in red the line which represents the interpolation of Charlotte’s population. The X marks are the actual census data for 1960, 1970, 1980, 1990, and 2000, according to CensusScope.com. For convenience here’s their record of the populations:

Year Population
1960 702,383
1970 840,347
1980 971,391
1990 1,162,093
2000 1,499,293

When we compare this linear fit to the 1960 population it looks pretty good. The line through the 1970 and 1980 counts projects that on the 1st of April, 1960, Charlotte should have had 709,303 people living in it. The Census recorded 702,383 people; that’s not quite dead on, since the projection runs about one percent high, but it’s startlingly good. The interpolation projects 840,347 people for 1970, and 971,391 for 1980, but it gets no credit for making a good projection there. If it didn’t match our original data exactly we’d have come up with a different interpolation; matching the original data was an assumption built into the interpolation we made.
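For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the straight line through the 1970 and 1980 counts, evaluated at every census year in the table. The names CENSUS, linear_through, and estimate are made up for this illustration, and the sketch assumes the censuses fall exactly ten years apart, each on the 1st of April.

```python
# Census counts from the table above, one per Census Day (1st of April).
CENSUS = {
    1960: 702_383,
    1970: 840_347,
    1980: 971_391,
    1990: 1_162_093,
    2000: 1_499_293,
}


def linear_through(x0, y0, x1, y1):
    """Return a function giving the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)  # people per year

    def line(x):
        return y0 + slope * (x - x0)

    return line


# The fit uses only the 1970 and 1980 data, as in the text.
estimate = linear_through(1970, CENSUS[1970], 1980, CENSUS[1980])

for year in sorted(CENSUS):
    # Inside 1970-1980 this is interpolation; outside it is extrapolation.
    print(year, f"{round(estimate(year)):,}")
```

Calling estimate on a year between 1970 and 1980, such as estimate(1975), gives an interpolated value; any year outside that range gives an extrapolation from the same line.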

For 1990 the projection starts to be bad. The linear extrapolation projects that by 1990 there should have been 1,102,435 people in Charlotte, which is 59,658 people low. As a fraction of the true population this isn’t bad (just over five percent in error), but misplacing sixty thousand people looks bad, particularly if you know any of them. It could be worse, though; the linear extrapolation projects Charlotte as having a population of only 1,233,479 people on Census Day, 2000, which misses by 265,814 people. Missing more than one person in six is pretty bad; that might be only two-fifths the 2000 population of Alaska, but still, that’s two-fifths the 2000 population of Alaska.
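The size of each miss can be tallied the same way. This is only a sketch: extrapolate and shortfall are hypothetical helper names, and the line is again the one through the 1970 and 1980 counts, with census years assumed to be exactly ten years apart.

```python
# Census counts from the table above, one per Census Day (1st of April).
CENSUS = {
    1960: 702_383,
    1970: 840_347,
    1980: 971_391,
    1990: 1_162_093,
    2000: 1_499_293,
}


def extrapolate(year):
    """Value of the line through the 1970 and 1980 counts at the given year."""
    slope = (CENSUS[1980] - CENSUS[1970]) / (1980 - 1970)
    return CENSUS[1970] + slope * (year - 1970)


def shortfall(year):
    """Return (people missed, fraction of the true count) for a census year.

    Positive values mean the line came in below the census count.
    """
    missed = CENSUS[year] - round(extrapolate(year))
    return missed, missed / CENSUS[year]


for year in (1960, 1990, 2000):
    missed, fraction = shortfall(year)
    print(f"{year}: off by {missed:+,} people ({fraction:+.1%} of the census count)")
```

A negative result, as for 1960, means the line overshot the census count instead of falling short of it.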

In part, this is built into the system of interpolations and extrapolations. We started out with a slender two pieces of data, population in 1970 and 1980. Interpolations and extrapolations, generally, are going to be at their best near the original set of data. The farther away from the real data you go, the worse the extrapolation is, and for pretty much the obvious reason. There’s just so much time for things to get weird.

We can compensate, though. We don’t have to use a linear interpolation, for example. We can use other functions, functions that curve. Looking at the original data we see the rate of increase, decade-to-decade, changes; can we find functions which match that change? Sure, and we’ll do that next.