The Jagged Kind Of Flat


I have a couple of other thoughts about these piecewise constant functions which I’ve been using to make interpolations. The basic idea is simple enough; we pretend the population of Charlotte was a constant number, the 840,347 it happened to be on the 1970 Census Day, and then leapt upwards at some point to the 971,391 it would have on the 1980 Census Day. Maybe it leapt up immediately after the 1970 Census; maybe immediately before the 1980; maybe at the exact middle moment between the two; maybe some other day. Are those all the options we have?

That I’m writing any further gives away my answer. Do we really have to take the entire 131,044-person population leap in one step? Why not split it in two, and estimate the population to be 840,347 from April 1970 through to March 1975, and then leap up half this gap to 905,869 from April 1975 through to March 1980, and to 971,391 after that?
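Here is a minimal Python sketch of that two-step estimate. The function name and the choice of April 1 as the exact day of each jump are assumptions made for illustration; the census figures are the ones quoted above.

```python
from datetime import date

# Census counts quoted in the text.
POP_1970 = 840_347
POP_1980 = 971_391

def two_step_estimate(day: date) -> int:
    """Piecewise-constant estimate with one intermediate step.

    Assumption: each jump happens on April 1 of its year.
    """
    halfway = (POP_1970 + POP_1980) // 2  # 905,869; the sum is even, so this is exact
    if day < date(1975, 4, 1):
        return POP_1970
    if day < date(1980, 4, 1):
        return halfway
    return POP_1980

print(two_step_estimate(date(1972, 7, 4)))  # 840347
print(two_step_estimate(date(1975, 7, 4)))  # 905869
print(two_step_estimate(date(1980, 4, 1)))  # 971391
```

The function is hardly longer than the one-leap version would be, which is the point: one extra step costs one extra comparison.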

If we’re supposing the piecewise-constant interpolation is desirable, presumably because it’s easy to work with, then this shouldn’t be all that much harder than the original where the big leap was in one piece. If we expect the population is rising more or less steadily the whole period, too, this approximation might be better than the ones we had last time, since the error, the difference between our interpolation and the actual population, is probably smaller, or at least smaller for much of the time. Maybe it’s even smaller for the time — 1975 — that we’re really interested in getting right.

We can vary this too; we could use 840,347 as the interpolated population from sometime before 1970 through 1973, then jump up to 905,869 for 1973 through 1978, and then up to 971,391 for 1978 through 1980 and beyond. It might be a more accurate approximation overall.
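Moving the jumps around amounts to changing a short list of breakpoints. A sketch, assuming we measure time in decimal years (so 1975.25 stands in for roughly April 1, 1975) and that a jump takes effect at its breakpoint; the helper name step_function is hypothetical:

```python
from bisect import bisect_right

def step_function(breakpoints, levels):
    """Build f(t) that returns levels[i] on the i-th interval cut out by breakpoints.

    breakpoints must be sorted; there must be one more level than breakpoints.
    """
    if len(levels) != len(breakpoints) + 1:
        raise ValueError("need exactly one more level than breakpoints")

    def f(t):
        # bisect_right counts the breakpoints at or before t,
        # so the function jumps exactly at each breakpoint.
        return levels[bisect_right(breakpoints, t)]

    return f

levels = [840_347, 905_869, 971_391]

# Jumps in 1973 and 1978, as in the variation described above.
early_jumps = step_function([1973.0, 1978.0], levels)
# Jumps in April 1975 and April 1980, as in the first split.
mid_jumps = step_function([1975.25, 1980.25], levels)

print(early_jumps(1974.0))  # 905869
print(mid_jumps(1974.0))    # 840347
```

The two schemes disagree about 1974 by the full 65,522-person half-step, which shows how much the choice of jump dates matters even when the levels are the same.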

Disadvantages? Well, we do have to work with a more complicated function, so it’s more annoying to work with by hand. If we let a computer do the work, we’ll have to enter and check a more elaborate approximation. So there are costs.

If breaking this into two pieces worked so well, why not three pieces? (Did it work so well?) Why not split the population increase into three equal pieces and let them fill up equal amounts of the decade?

Well, for one, a third of the difference in populations comes to 43,681 and one-third. It looks a bit funny to say that in 1974 we estimate Charlotte’s population to be 884,028 and one-third persons. But that does only look funny; after all, we knew going in that the interpolation could only by a wild stroke of luck be exactly right. All we really want is to not be too far off the correct number. If we don’t mind being, say, 15,000 people away from the right headcount, we shouldn’t mind being 15,000 and one-third people away.
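The thirds can be checked with exact fractions. This is only a sketch of the arithmetic; the function name equal_step_levels is made up for illustration, and it computes the levels without saying anything about when each jump happens.

```python
from fractions import Fraction

# Census counts quoted in the text.
POP_1970 = 840_347
POP_1980 = 971_391

def equal_step_levels(start, end, pieces):
    """Return the levels start, start + gap/pieces, ..., end as exact fractions."""
    step = Fraction(end - start, pieces)
    return [start + k * step for k in range(pieces + 1)]

levels = equal_step_levels(POP_1970, POP_1980, 3)

step = levels[1] - levels[0]
print(step)                   # 131044/3, that is, 43,681 and one-third
print(levels[1])              # 2652085/3, that is, 884,028 and one-third
print(levels[3] == POP_1980)  # True: three steps land exactly on the 1980 count
```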

I admit I feel uncomfortable making this split, though. We at least started out with each constant piece containing a data point, and now we’ve got flat pieces that are just projections of where data points might be. We might be justified in doing that, but my suspicion is it’d tend to make us think there’s more data there than we actually have. Consider an extreme case: imagine splitting up the segment of 3,653 days between April 1, 1970 and April 1, 1980 into 131,044 separate constant pieces, each one greater than the one before. We’d be splitting this decade into stretches of as little as 40 minutes between population increases. However good the 1970 and 1980 censuses were, do you believe we can actually tell the difference of Charlotte’s population over a 40-minute stretch based on these two data points?
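The day count and the 40-minute figure are easy to verify. This sketch assumes the 131,044 one-person steps are spread evenly across the decade:

```python
from datetime import date

# Days between the two Census Days; the span includes the leap days of 1972, 1976, and 1980.
days = (date(1980, 4, 1) - date(1970, 4, 1)).days

# One step per person of population increase.
steps = 971_391 - 840_347

# Assumption: the steps are evenly spaced in time.
minutes_per_step = days * 24 * 60 / steps

print(days)                        # 3653
print(steps)                       # 131044
print(round(minutes_per_step, 1))  # 40.1
```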

Maybe. We’ll get soon enough into interpolations that are continuous functions, where — in principle — we estimate different populations for any two different moments of time, and we know enough not to take that literally. We are always at risk of being hypnotized by decimal places and their illusion of precision. If we’re wise enough to avoid that, though, why not use this ever-so-greater precision?

Here I’d put up a practical objection. The whole point of a piecewise constant approximation is that it’s simple. It’s easy to look at, easy to work with, and doesn’t require any great effort. Breaking a stretch of time down into incredibly many little stretches spoils that simplicity. If we want a fine-grained approximation, why not use some non-constant function? That’ll probably be easier for us to work with. Interpolation is a tool; if it’s too cumbersome to use, it’s not useful.

I’d also point out that cutting things into too many subdivisions makes extra assumptions about what the thing we’re measuring is doing between data points. For example, if we divided the interval into ten stretches, one year each, we’d probably suppose the population to grow more or less steadily every single year. Perhaps that’s true. Perhaps not; maybe Charlotte grew explosively to 1973, shrank a little to 1977, and then grew swiftly again. Every interpolation scheme is subject to this sort of error; the real world is more complicated than our models of it.

But by committing to paper or Matlab my assumption that the population is increasing in these discrete chunks over and over, I’m committing myself to thinking that’s how the population must be behaving. It’s usually desirable practice to keep the assumptions we make about something as few and as general as we can; so, I’d like to be careful about breaking the 1970-to-1980 stretch into smaller groups if I don’t have reasons to support doing so.

I think that fairly closes out the piecewise-constant sort of interpolation, at least for the moment.
