## Can We Tell Whether A Pinball Player Is Improving?

The question posed for the pinball league was: can we say which of the players most improved over the season? I had data. I had the rankings of each of the players over the course of eight league nights. I had tools. I’ve taken statistics classes.

Could I say what a “most improved” pinball player looks like? Well, I can give a rough idea. A player’s improving if their rankings increase over the the season. The most-improved person would show the biggest improvement. This definition might go awry; maybe there’s some important factor I overlooked. But it was a place to start looking.

So here’s the first problem. It’s the plot of my own data, my league scores over the season. Yes, league night 2 is dismal. I’d had to miss the night and so got the lowest score possible.

Is this getting better? Or worse? The obvious thing to do is to look for a curve that goes through these points. Then look at what that curve is doing. The thing is, it’s always possible to draw a curve through a bunch of data points. As long as there’s not something crazy like there’s four data points for the same league night. As long as there’s one data point for each measurement you can always connect those points to some curve. Worse, you can always fit more than one curve through those points. We need to think harder.

Here’s the thing about pinball league night results. Or any other data that comes from the real world. It’s got noise in it. There’s some amount of it that’s just random. We don’t need to look for a curve that matches every data point. Or any data point particularly. What if the actual data is “some easy-to-understand curve, plus some random noise”?

It’s a good thought. It’s a dangerous thought. You need to have an idea of what the “real” curve should be. There’s infinitely many possibilities. You can bias your answer by choosing what curve you think the data ought to represent. Or by not thinking before you make a choice. As ever, the hard part is not in doing a calculation. It’s choosing what calculation to do.

That said there’s a couple safe bets. One of them is straight lines. Why? … Well, they’re easy to work with. But we have deeper reasons. Lots of stuff, when it changes, looks like it’s changing in a straight line. Take any curve that hasn’t got a corner or a jump or a break in it. There’s a straight line that looks close enough to it. Maybe not for long, but at least for some stretch. In the absence of a better idea of what ought to be right, a line is at least a starting point. You might learn something even if a line doesn’t fit well, and get ideas for why to look at particular other shapes.

So there’s good, steady mathematics business to be found in doing “linear regression”. That is, find the line that best fits a set of data points. What do we mean by “best fits”?

The mathematical community has an answer. I agree with it, surely to the comfort of the mathematical community. Here’s the premise. You have a bunch of data points, with a dependent variable ‘x’ and an independent variable ‘y’. So the data points are a bunch of points, $\left(x_j, y_j\right)$ for a couple values of j. You want the line that “best” matches that. Fine. In my pinball league case here, j is the whole numbers from 1 to 8. $x_j$ is … just j again. All right, as happens, this is more mechanism than we need for this problem. But there’s problems where it would be useful anyway. And for $y_j$, well, here:

j yj
1 467
2 420
3 472
4 473
5 472
6 455
7 479
8 462

For the linear regression, propose a line described by the equation $y = m\cdot x + b$. No idea what ‘m’ and ‘b’ are just yet. But. Calculate for each of the $x_j$ values what the projection would be, that is, what $m\cdot x_j + b$. How far are those from the actual $y_j$ data?

Are there choices for ‘m’ and ‘b’ that make the difference smaller? It’s easy to convince yourself there are. Suppose we started out with ‘m’ equal to 0 and ‘b’ equal to 472. That’s an okay fit. Suppose we started out with ‘m’ equal to 100,000,000 and ‘b’ equal to -2,038. That’s a crazy bad fit. So there must be some ‘m’ and ‘b’ that make for better fits.

Is there a best fit? If you don’t think much about mathematics the answer is obvious: of course there’s a best fit. If there’s some poor, some decent, some good fits there must be a best. If you’re a bit better-learned and have thought more about mathematics you might grow suspicious. That term ‘best’ is dangerous. Maybe there’s several fits that are all different but equally good. Maybe there’s an endless series of ever-better fits but no one best. (If you’re not clear how this could work, ponder: what’s the largest negative real number?)

Good suspicions. If you learn a bit more mathematics you learn the calculus of variations. This is the study of how small changes in one quantity change something that depends on it; and it’s all about finding the maxima or minima of stuff. And that tells us that there is, indeed, a best choice for ‘m’ and ‘b’.

(Here I’m going to hedge. I’ve learned a bit more mathematics than that. I don’t think there’s some freaky set of data that will turn up multiple best-fit curves. But my gut won’t let me just declare that. There’s all kinds of crazy, intuition-busting stuff out there. But if there exists some data set that breaks linear regression you aren’t going to run into it by accident.)

So. How to find the best ‘m’ and ‘b’ for this? You’ve got choices. You can open up DuckDuckGo and search for ‘matlab linear regression’ and follow the instructions. Or ‘excel linear regression’, if you have an easier time entering data into spreadsheets. If you’re on the Mac, maybe ‘apple numbers linear regression’. Follow the directions on the second or third link returned. Oh, you can do the calculation yourself. It’s not hard. It’s just tedious. It’s a lot of multiplication and addition and you know what? We’ve already built tools that know how to do this. Use them. Not if your homework assignment is to do this by hand, but, for stuff you care about yes. (In Octave, an open-source clone of Matlab, you can do it by an admirably slick formula that might even be memorizable.)

If you suspect that some shape other than a line is best, okay. Then you’ll want to look up and understand the formulas for these linear regression coefficients. That’ll guide you to finding a best-fit for these other shapes. Or you can do a quick, dirty hack. Like, if you think it should be an exponential curve, then try fitting a line to x and the logarithm of y. And then don’t listen to those doubts about whether this would be the best-fit exponential curve. It’s a calculation, it’s done, isn’t that enough?

Back to lines, back to my data. I’ll spare you the calculations and show you the results.

Done. For me, this season, I ended up with a slope ‘m’ of about 2.48 and a ‘b’ of about 451.3. That is, the slightly diagonal black line here. The red circles are what my scores would have been if my performance exactly matched the line.

That seems like a claim that I’m improving over the season. Maybe not a compelling case. That missed night certainly dragged me down. But everybody had some outlier bad night, surely. Why not find the line that best fits everyone’s season, and declare the most-improved person to be the one with the largest positive slope?

## Who’s The Most Improved Pinball Player?

My love just completed a season as head of a competitive pinball league. People find this an enchanting fact. People find competitive pinball at all enchanting. Many didn’t know pinball was still around, much less big enough to have regular competitions.

Pinball’s in great shape compared to, say, the early 2000s. There’s one major manufacturer. There’s a couple of small manufacturers who are well-organized enough to make a string of games without (yet) collapsing from not knowing how to finance game-building. Many games go right to private collections. But the “barcade” model of a hipster bar with a bunch of pinball machines and, often, video games is working quite well right now. We’re fortunate to live in Michigan. All the major cities in the lower part of the state have pretty good venues and leagues in or near them. We’re especially fortunate to live in Lansing, so that most of these spots are within an hour’s drive, and all of them are within two hours’ drive.

Ah, but how do they work? Many ways, but there are a couple of popular ones. My love’s league uses a scheme that surely has a name. In this scheme everybody plays their own turn on a set of games. Then they get ranked for each game. So the person who puts up the highest score on the game Junkyard earns 100 league points. The person who puts up the second-highest score on Junkyard earns 99 league points. The person with the third-highest score on Junkyard earns 98 league points. And so on, like this. If 20 people showed up for the day, then the poor person who bottoms out earns a mere 81 league points for the game.

This is a relative ranking, yes. I don’t know any competitive-pinball scheme that uses more than one game that doesn’t rank players relative to each other. I’m not sure how an alternative could work. Different games have different scoring schemes. Some games try to dazzle with blazingly high numbers. Some hoard their points as if giving them away cost them anything. A score of 50 million points? If you had that on Attack From Mars you would earn sympathetic hugs and the promise that life will not always be like that. (I’m not sure it’s possible to get a score that low without tilting your game away.) 50 million points on Lord of the Rings would earn a bunch of nods that yeah, that’s doing respectably, but there’s other people yet to play. 50 million points on Scared Stiff would earn applause for the best game anyone had seen all year. 50 million points on The Wizard of Oz would get you named the Lord Mayor of Pinball, your every whim to be rapidly done.

And each individual manifestation of a table is different. It’s part of the fun of pinball. Each game is a real, physical thing, with its own idiosyncrasies. The flippers are a little different in strength. The rubber bands that guard most things are a little harder or softer. The table is a little more or less worn. The sensors are a little more or less sensitive. The tilt detector a little more forgiving, or a little more brutal. Really the least unfair way to rate play is comparing people to each other on a particular table played at approximately the same time.

It’s not perfectly fair. How could any real thing be? It’s maddening to put up the best game of your life on some table, and come in the middle of the pack because everybody else was having great games too. It’s some compensation that there’ll be times you have a mediocre game but everybody else has a lousy one so you’re third-place for the night.

Back to league. Players earn these points for every game played. So whoever has the highest score of all on, say, Attack From Mars gets 100 league points for that regardless of whatever they did on Junkyard. Whoever has the best score on Iron Maiden (a game so new we haven’t actually played it during league yet, and that somehow hasn’t got an entry on the Internet Pinball Database; give it time) gets their 100 points. And so on. A player’s standings for the night are based on all the league points earned on all the tables played. For us that’s usually five games. Five or six games seems about standard; that’s enough time playing and hanging out to feel worthwhile without seeming too long.

So each league night all the players earn between (about) 420 and 500 points. We have eight league nights. Add the scores up over those league nights and there we go. (Well, we drop the lowest nightly total for each player. This lets them miss a night for some responsibility, like work or travel or recovering from sickness or something, without penalizing them.)

As we got to the end of the season my love asked: is it possible to figure out which player showed the best improvement over time?

Well. I had everybody’s scores from every night played. And I’ve taken multiple classes in statistics. Why would I not be able to?

## How Pinball Leagues and Chemistry Work: The Mathematics

My love and I play in several pinball leagues. I need to explain something of how they work.

Most of them organize league nights by making groups of three or four players and having them play five games each on a variety of pinball tables. The groupings are made by order. The 1st through 4th highest-ranked players who’re present are the first group, the 5th through 8th the second group, the 9th through 12th the third group, and so on. For each table the player with the highest score gets some number of league points. The second-highest score earns a lesser number of league points, third-highest gets fewer points yet, and the lowest score earns the player comments about how the table was not being fair. The total number of points goes into the player’s season score, which gives her ranking.

You might see the bootstrapping problem here. Where do the rankings come from? And what happens if someone joins the league mid-season? What if someone misses a competition day? (Some leagues give a fraction of points based on the player’s season average. Other leagues award no points.) How does a player get correctly ranked?