What We Mean By x

[ Oh, wow. Yesterday’s entry had way fewer hits than average. I also put an equation out right up front where everyone could see it. I wonder if this might be a test of Stephen Hawking’s dictum about equations and sales. Or maybe I was just boring yesterday. I’d ask, but apparently, nobody found me interesting enough yesterday to know for comparison. ]

It shouldn’t be too hard to translate the idea “I want to know the population of Charlotte at some particular time” into a polynomial. The polynomial ought to look something like y equals some pile of numbers times x’s raised to powers, and x somehow has to do with the particular time, and y has something to do with the population. And it’s not hard to do that translating, but I want to talk about some deeper issues. It’s probably better explaining them on the simple problem, where we know what we want things to mean, than it would be explaining them for a complicated problem.

The mental model of how functions work, as I first learned it, and as still seems reasonably popular, is as some kind of machine. You throw in some raw ingredients, stuff you have on hand, and the function does something or other, and you get out of it the thing you want. The polynomial as put up there is such an example: we start out with some specific year, for which we want to know Charlotte’s population, as the raw ingredients. Then something (that whole raising-to-powers and multiplying and adding up thing) gets done to those raw ingredients. When that’s done, we have the output, in this case, the population for that year.
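As a minimal sketch of that machine picture, here is a polynomial written as a Python function. The name `evaluate_polynomial` and the coefficients are made up for illustration; they are not fitted to any Charlotte data.

```python
def evaluate_polynomial(coefficients, x):
    """Return a_0 + a_1*x + a_2*x**2 + ... for coefficients [a_0, a_1, a_2, ...]."""
    total = 0
    # Horner's rule: start from the highest power and work down,
    # so we never have to raise x to a power explicitly.
    for a in reversed(coefficients):
        total = total * x + a
    return total


# Hypothetical coefficients, chosen only to show the mechanics: 2 + 3x + x^2.
example_coefficients = [2, 3, 1]

# Raw ingredient in (x = 5), finished product out.
result = evaluate_polynomial(example_coefficients, 5)
print(result)  # 2 + 3*5 + 5*5 = 42
```

The function neither knows nor cares what x stands for. Deciding that is our job, which is what the rest of this entry is about.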

These translations aren’t uniquely correct, though: we could work out other translating schemes, and as long as we do our work correctly, the results will be right. We enter the slightly challenging realms of what setups are convenient and useful and keep us from making careless mistakes. Computers will do arithmetic for us, and the smarter computers will even do a good swath of algebraic manipulations. None of this is going to help if we aren’t clear about what we want the computer to do for us, though. This is why in setting up a problem it’s usually very good practice to write out just what quantities one wants to know something about, and what symbols one is using to represent those quantities. It helps make clear what one thinks one is doing, and how to translate what one did back into what one wants.

In the polynomial there I wanted the input to in some way correspond to the year. I had data for the population of Charlotte in 1970 and 1980, and I particularly wanted to know what it was in 1975. The variable x contains this input and … say, why should it be the letter x? We always use x, which is itself not a bad reason to use it, actually. We can tell what role it serves without having to think about it. René Descartes started us on the practice of using “x” for a number whose value we didn’t know or didn’t want to specify, and that’s proven to be among his ten most durable ideas.

Still, we are looking for the population in different years; year starts with y; why not have ‘y’ be the letter we use to represent the year? Well, because we’re using y on the other side of the equation, but we could change that letter too. On the left-hand side of the equation y meant the population; why couldn’t we use ‘p’ for that? That might be worth doing whatever we use for the year.

On the other hand, the year is also a measure of time, and time starts with ‘t’, so why not use ‘t’ as the variable? Unless you just don’t like ‘t’, I don’t see any reason not to. x or y or t, any of these are decent choices for which variable to use as the input idea: we want the estimate of Charlotte’s population for what specific point in time?

Let’s say we take t as the variable, just to give x a bit of a rest. Well, then, we obviously know the population at time t = 1970 and at time t = 1980, and want it at t = 1975. Only, why do we know that, exactly? That is, why should the input be the calendar year?

If we’re letting the computer do all the calculations, there’s no reason not to. If we were working this out by hand, though, we might look in horror at the idea of working out something which looks like

p = a_0 + a_1 \cdot 1975 + a_2 \cdot 1975^2 + a_3 \cdot 1975^3 + \cdots + a_n \cdot 1975^n

simply because we are not so crazy as to multiply 1975 by itself even once, let alone several times over.

What if, instead of t being the number of years since the start of the Christian Era, we have t instead be the number of years since the census of 1970? Then, for the interpolation of populations between census 1970 and census 1980, we’d have t stick to the range from 0 to 10. We may not want to find what happens if we multiply 5 by itself many times over, but we’d much rather do that than multiply 1975 by itself over and over.
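To see how much the choice matters for hand arithmetic, here is a small sketch that lists the first few powers under each convention. The helper name `powers_of` is hypothetical, and stopping at the third power is an arbitrary choice for the example.

```python
def powers_of(base, highest_power):
    """Return [base**1, base**2, ..., base**highest_power]."""
    return [base ** k for k in range(1, highest_power + 1)]


calendar_year = 1975
years_since_1970 = calendar_year - 1970  # the same moment, measured from the 1970 census

print(powers_of(calendar_year, 3))     # [1975, 3900625, 7703734375]
print(powers_of(years_since_1970, 3))  # [5, 25, 125]
```

Both lists describe the same instant in time; only the second is something a person would cheerfully work out on paper.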

Does it have to be years? There are 120 months from census 1970 to census 1980; why not let t be the number of months since the 1970 census? A range in t of 0 to 120 … probably is fairly convenient. People who’ve taken exams are pretty comfortable with number ranges from 0 to 100, and going from 0 to 120 isn’t very different. The number range is comfortable, even if working out 60 raised to the third power might not be.

For that matter, why not have it be complete days from the census of 1970? Well, that seems a little silly since then t would range from 0 up to 3653, and no sane person wants to take the middle of that … 1826.5 … to any powers. On the other hand, if I were interested in projecting the population for every day between those two dates, maybe matching every projected population to a particular and distinct day would be convenient.
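The month and day counts can be checked with a few lines of Python. This sketch assumes both censuses are dated April 1, the usual United States census reference day; the variable names are mine.

```python
from datetime import date

# Assumption: both censuses are dated April 1 of their year.
census_1970 = date(1970, 4, 1)
census_1980 = date(1980, 4, 1)

# Whole months between the two dates.
months_between = (
    (census_1980.year - census_1970.year) * 12
    + (census_1980.month - census_1970.month)
)

# Whole days between the two dates; this picks up the leap days
# of 1972, 1976, and 1980 along the way.
days_between = (census_1980 - census_1970).days

print(months_between)    # 120
print(days_between)      # 3653
print(days_between / 2)  # 1826.5
```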

Note that I say convenient, not right. They’re all right, and if you use them you’ll get the right answers as long as you do all the work consistently. We’re just laying down the rules of what’s convenient for us. We can come up with more alternatives: for example, let t be the number of years since the census of 1980. Then the census of 1970 would have a t of -10, and 1975 would have a t of -5. We maybe wouldn’t like negative numbers as much as we do positive ones (even the names make negative numbers sound like we disapprove of them), but we can work with them just fine. For that matter we could have t represent the number of months since April 1975, and put the census of 1970 at t = -60 and the census of 1980 at t = +60, which has a charming symmetry to it if nothing else.
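Here is a rough sketch of that claim, using a straight line between the two census points as the simplest possible stand-in for the polynomial. The populations are placeholder numbers, not real census figures, and the function name `interpolate` is my own.

```python
def interpolate(t, t0, p0, t1, p1):
    """Straight-line interpolation between the points (t0, p0) and (t1, p1)."""
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


# Placeholder populations, NOT real census figures for Charlotte.
P_1970 = 100.0
P_1980 = 140.0

# The same question (what about 1975?) asked under four conventions for t.
estimates = {
    "calendar year": interpolate(1975, 1970, P_1970, 1980, P_1980),
    "years since 1970": interpolate(5, 0, P_1970, 10, P_1980),
    "years since 1980": interpolate(-5, -10, P_1970, 0, P_1980),
    "months since the 1970 census": interpolate(60, 0, P_1970, 120, P_1980),
}

for convention, estimate in estimates.items():
    print(convention, estimate)  # every line reports 120.0
```

The only requirement is consistency: the known points and the point we ask about have to be measured from the same origin, in the same units.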

We wouldn’t have to start from 1970 either. We might start from 1950, on the theory that I won’t find any census data from earlier than that, so we can have the advantage of a range of t values which isn’t too big and also lets us make projections to earlier years without having to deal with negative numbers. We might start from 1768, when the city was incorporated, or 1755, when the earliest (European) settlement began.

I am going to make my t represent the number of years since 1950, since I already know the census data for Charlotte for 1960 and imagine I could get 1950 with a little effort, and I’ll want that later on. I’m picking years because this way I can project all the way out to the present day without exceeding 62, which seems like a tolerably modest range that doesn’t dip into negative numbers. You could set up your own interpolation differently. You’ll just want to set things up in ways that make it convenient for you to put in what you do know and get back out what you would like to know.