I wanted to point folks over to a blog post by Rick Wicklin on the SAS web site. SAS is the company that makes, well, SAS, software designed for data management and analysis.
The post grew out of an accident: his daughter spilled a bottle of black nail polish, and it splattered on the wall in an interesting spiral. Dr Wicklin wondered if it might be a logarithmic spiral, and gathered data to work out whether it plausibly could be. There’s a nice description of how to go from the messiness of a real-world splatter to a clearly defined mathematical problem, and how to try fitting a curve to the messy reality of data.
Curve-fitting real-world data is a challenging field. Curves are always members of families, groups of curves that look similar and differ only in a few parameters. For example, circles may have any point as their center and have any radius. Lines may pass through any point you like and be as horizontal or vertical or diagonal as you like. (Yes, a straight line isn’t much of a curve, but it’s too wordy to talk of “line or curve fitting” if you don’t have to. In this context, a line is a kind of curve in the same way a square is a kind of parallelogram.) There are many, many more kinds of curves: parabolas and hyperbolas and cubics and quartics and trigonometric functions. And, oh yes, we can add them together, or multiply them, or even compose them (anyone up for the sine of a logarithm?).
So you start with the kind of curve you think your data really is, and try to find the set of parameters that makes the curve and the data look like they’re representations of the same thing. The drawing of your curve and the drawing of your data points will never exactly overlap, though. Your data, coming from the real world, will be messy. Some of the nail polish spots will be in the ‘wrong’ place, or it’ll be ambiguous what the ‘real’ location of a point should be. (After all, what is the real location of a spot? Its center? How do you know where the exact center is? What if the spot is a smeared raindrop-shape rather than a circle?)
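To make "find the set of parameters" a bit more concrete, here is a minimal sketch in Python. It is not necessarily the method used in the SAS post. It assumes the family is the logarithmic spiral r = a·exp(b·θ), takes logarithms so that log r becomes a straight line in θ, and fits that line by ordinary least squares. The data is synthetic, made up for this illustration with known parameters and a little random noise; the function name `fit_log_spiral` is likewise just an illustrative choice.

```python
import math
import random


def fit_log_spiral(thetas, radii):
    """Estimate (a, b) in r = a * exp(b * theta).

    Taking logs gives log r = log a + b * theta, a straight line in theta,
    so a simple least-squares line fit recovers the parameters.
    """
    n = len(thetas)
    logs = [math.log(r) for r in radii]
    mean_t = sum(thetas) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in thetas)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(thetas, logs))
    b = sxy / sxx                       # slope of the fitted line
    a = math.exp(mean_y - b * mean_t)   # intercept, mapped back from log space
    return a, b


# Synthetic "splatter" (an assumption for illustration, not measured data):
# points on a known spiral, nudged by multiplicative noise.
rng = random.Random(0)
true_a, true_b = 1.5, 0.2
thetas = [0.25 * k for k in range(40)]
radii = [true_a * math.exp(true_b * t) * math.exp(rng.gauss(0, 0.05))
         for t in thetas]

a_hat, b_hat = fit_log_spiral(thetas, radii)
print(f"estimated a = {a_hat:.3f}, b = {b_hat:.3f}")
```

The estimates land near, but not exactly on, the values used to generate the data, which is the "never exactly overlap" point in miniature.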
It’s not just an artistic eye that judges whether the parameters you’ve picked are a good fit. We can quantify how “good” a fit the curve is to the data, and we can find the parameters that make the best possible, or the best findable, fit. But there is still an artistic eye involved: there are infinitely many imaginable curves. If you start from the wrong kind of curve, you might get a tolerable fit. But it won’t give insight into the reasons the data looks like this, or what it might look like as more data comes in. Happily, computers make it easy to try out many different kinds of curves, but having a sense of what curves are plausible makes for better work.
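As a rough illustration of quantifying a fit and of trying more than one family, the sketch below scores two candidate spirals against the same points using the sum of squared differences between predicted and observed radius. That is one common measure, chosen here as an assumption; it is not the only one. The data is synthetic, generated from a logarithmic spiral plus noise, and the two families compared (logarithmic, r = a·exp(b·θ), and Archimedean, r = a + b·θ) are picked just for the example.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares line y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return mean_y - c1 * mean_x, c1


def sse(predicted, observed):
    """Sum of squared errors: smaller means the curve hugs the data more closely."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Synthetic data (an assumption for illustration): a logarithmic spiral plus noise.
rng = random.Random(1)
thetas = [0.25 * k for k in range(40)]
radii = [1.5 * math.exp(0.2 * t) + rng.gauss(0, 0.1) for t in thetas]

# Family 1: logarithmic spiral, fitted as a line through (theta, log r).
# Note this minimizes error in log r, so it is only the "best findable" fit in r.
c0, c1 = fit_line(thetas, [math.log(r) for r in radii])
log_pred = [math.exp(c0 + c1 * t) for t in thetas]

# Family 2: Archimedean spiral, fitted as a line through (theta, r).
a, b = fit_line(thetas, radii)
arch_pred = [a + b * t for t in thetas]

sse_log = sse(log_pred, radii)
sse_arch = sse(arch_pred, radii)
print(f"logarithmic spiral SSE: {sse_log:.3f}")
print(f"Archimedean spiral SSE: {sse_arch:.3f}")
```

Both families produce a curve that passes through the cloud of points, but the score makes the difference visible: the family that matches how the data was generated leaves much smaller errors. With real data you don't know the generating family, which is where the sense of what is plausible comes in.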