One of the persistent issues in advanced hockey analytics is the treatment of data that’s only available in small sample sizes. It’s pretty common for us to disregard one, two, even 30 games of data from the beginning of a season as ambiguous, even misleading. That’s always struck me as not good enough. There are only so many games in a season, and waiting until it’s over doesn’t help a team evaluate its players. This post proposes a new methodology to help with that task.
Let’s start with the major problem of small sample sizes: namely, that the samples are small. When you’re tracking one player, one game at a time, without any context, you can’t really learn much. How do we add the context? By adding sample size. Thousands of games of sample size.
Imagine a true 50% even-strength Corsi player. Imagine that we know, as a given, that he’s a 50% player. Now imagine this player playing a single game. What happens? We know intuitively that he probably won’t have a game of exactly 50%; it’ll be some percentage between 0% and 100%, tending around 50%. To define the shape of that probability curve, we need to know the variance in a typical NHL player’s single-game Corsi%. We can estimate this from real NHL data by looking at players’ in-season, game-by-game Corsi percentages to get a sense of the typical standard deviation over a season’s worth of games. It turns out this standard deviation is around 0.12. Since roughly 95% of normally distributed data falls within 2 standard deviations of the mean, a player whose mean Corsi% is 50% will post a game Corsi% between 26% and 74% about 95% of the time.
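This single-game model is easy to sketch in code. A minimal version, assuming a normal distribution with mean 50% and standard deviation 0.12, clamped to the possible 0%–100% range (the function names here are mine, not from the original analysis):

```python
import random

# Assumed model: a true-talent 50% Corsi player's single-game Corsi%
# is drawn from a normal distribution with mean 0.50 and SD 0.12,
# clamped to the possible 0%-100% range.
MEAN, SD = 0.50, 0.12

def simulate_game(rng):
    """One simulated game Corsi% for our true 50% player."""
    return min(1.0, max(0.0, rng.gauss(MEAN, SD)))

rng = random.Random(42)
games = [simulate_game(rng) for _ in range(10_000)]

# About 95% of games should land within 2 SD of the mean, i.e. 26%-74%.
inside = sum(0.26 <= g <= 0.74 for g in games) / len(games)
print(f"share of games between 26% and 74%: {inside:.1%}")
```

The clamp barely matters in practice (values beyond 0% or 100% are four-plus standard deviations out), but it keeps every simulated game a legal Corsi%.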
Here’s a histogram that shows a 50% Corsi player’s distribution of possible game Corsi%’s over a simulation of 10,000 games:
So about 60% of the time, this player will have a game Corsi% between 40% and 60%. 15% of the time he’ll have a Corsi% between 30% and 40%. 4% of the time he’ll have a Corsi% between 20% and 30%. And so on. The important thing to take away here is that he’s likely to have games around 50%, but his possible game Corsis run the entire probability gamut.
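Those bucket percentages can be reproduced with the same assumed normal(50%, 12%) game model; this is a sketch of the idea, not the original code behind the histogram:

```python
import random

# Same assumed single-game model: normal(0.50, 0.12), clamped to [0, 1].
rng = random.Random(1)
games = [min(1.0, max(0.0, rng.gauss(0.50, 0.12))) for _ in range(10_000)]

def share(lo, hi):
    """Fraction of simulated games with Corsi% in [lo, hi)."""
    return sum(lo <= g < hi for g in games) / len(games)

print(f"40%-60%: {share(0.40, 0.60):.1%}")  # roughly 60%
print(f"30%-40%: {share(0.30, 0.40):.1%}")  # roughly 15%
print(f"20%-30%: {share(0.20, 0.30):.1%}")  # roughly 4%
```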
I suppose it’s interesting to know what’s possible over one single game, but how does this tell us anything about a season’s progression? Imagine our 50% Corsi player playing one game, then another, and then another. Each passing game is played according to the probability curve above, and each accumulates in the player’s season-to-date Corsi%. After 82 games, this player’s season Corsi% will necessarily be close to 50%; too many games have passed for it not to be close to his true talent level. But how does that distribution of probabilities evolve over a season? To answer this, we’ll ask our 50% Corsi player to play many, many seasons. 10,000 seasons, in fact. I’ll use a technique called Monte Carlo simulation, which draws random numbers from a given probability model over many iterations to map out the model’s behaviour. Here are the results of 10,000 simulated seasons of a true 50% Corsi player’s expected game-by-game results:
On the far left of our x-axis is the distribution of possible outcomes of 10,000 iterations of Game 1 of a season. As each game passes, the Corsi% of each of our 10,000 player-seasons accumulates, and the more games that are played, the closer each player-season drives towards 50%, the true talent level. The straight middle blue line is the mean (average) accumulated Corsi% of our 10,000 player-seasons as each game passes. Predictably, this is 50%, with only minute deviations. The very top blue and very bottom purple lines are the extremes seen in the data: the maximum and minimum points observed across 10,000 simulated seasons of a 50% Corsi player.
The really interesting lines here are the green and red lines, marked “Upper 95%” and “Lower 95%”. These encompass our 95% confidence intervals. Now, a 95% confidence interval means that 95% of observations fall within it. That means that 2.5% of observations will be above it, and 2.5% of observations will be below it.
Another point of interest is the rapidly changing slope of the 95% confidence lines. They start wide in early games, but quickly dive towards 50%. If you’re above 60% or below 40% Corsi by game 5, we already know a lot about you. Likewise, if you’re above 55% or below 45% Corsi by game 19, we know a lot about you. By the end of an 82-game season, a true 50% Corsi player has a 95% chance to be between 47% and 53% Corsi. Anything beyond those levels, and, yes, we know a lot about you.
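A minimal sketch of the Monte Carlo simulation behind these confidence lines, assuming the normal(50%, 12%) game model and treating the season-to-date Corsi% as the running average of game Corsi%s:

```python
import random

# Simulate 10,000 seasons of 82 games for a true 50% Corsi player,
# recording the season-to-date Corsi% after each game.
N_SEASONS, N_GAMES = 10_000, 82
rng = random.Random(7)

seasons = []
for _ in range(N_SEASONS):
    total, to_date = 0.0, []
    for g in range(1, N_GAMES + 1):
        total += min(1.0, max(0.0, rng.gauss(0.50, 0.12)))
        to_date.append(total / g)
    seasons.append(to_date)

def band(game):
    """Mean and 95% band (2.5th to 97.5th percentile) after `game` games."""
    vals = sorted(s[game - 1] for s in seasons)
    lo, hi = vals[int(0.025 * N_SEASONS)], vals[int(0.975 * N_SEASONS)]
    return lo, sum(vals) / N_SEASONS, hi

for g in (5, 19, 82):
    lo, mean, hi = band(g)
    print(f"game {g:2d}: {lo:.1%} to {hi:.1%} (mean {mean:.1%})")
```

Running this reproduces the narrowing described above: a wide band at game 5, roughly 45%–55% by game 19, and roughly 47%–53% by game 82.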
In this analysis, I’ll use this 95% confidence level (a 5% significance level) to test players against our true 50% Corsi player. Our null hypothesis is that a player’s true mean Corsi% is 50%. To reject this hypothesis, an observed data point will need to fall above our upper 95% confidence bound (concluding that he is better than a 50% Corsi player) or below our lower bound (concluding that he is worse).
Let’s look at some practical examples of this. Here’s the above probability distribution overlaid with the 2012-13 by-game season-to-date Corsi% for Oilers centres:
First, imagine this graph with just the players’ Corsi% lines. You know that, yes, these Corsi lines suck horribly, but you don’t have the context to know how badly they suck. Now flip in the 50% mean and 95% confidence interval lines, and we have some context to compare the players’ lines against.
The best Corsi here is Ryan Nugent-Hopkins’ — his light blue line tends towards our 50% mean line, and tracks within our 95% interval. The latter fact lets us say that we cannot reject the null hypothesis that RNH was a 50% talent Corsi player in 2012-13. His teammates, however, are a different story. At some point during this season, they all fall below the Lower 95% line. That is to say that in our 10,000 season simulation, they fell below a point that only 2.5% of observations were beneath, meaning that we can reject the null hypothesis, and say that they are below 50% Corsi players. Horcoff and Belanger start flirting with the Lower 95% line around Game 8, and fall under it for good around Game 16. If you were tracking this during the year, you’d know by Game 8 that something was wrong.
But most interesting to me is Sam Gagner’s line. By Game 2, his orange line has dived below the 95% confidence line. This was after only two games of results, games of 41% and 25% Corsi% for him. That doesn’t sound like much in our traditional view of the game, but the probability that a true 50% Corsi player would open with two consecutive games that low is less than 2.5%. With only two data points, we know that something is rotten in Denmark, and can conclude that his true mean Corsi% is below 50%. He tracks the lower 95% line for another seven games, until he falls clearly under it around Game 9.
Let’s look at the Oilers’ Defence:
Here we see in full gory detail what a bad NHL defence corps looks like. All five defencemen shown here start within our 95% interval at the beginning of the season, but one by one they all eventually fall below it. The earliest is Ryan Whitney, of whom we can say only 9 games into the season that he is not a 50% Corsi player. The next to fall are Jeff Petry and Ladi Smid, who drop below our Lower 95% line in Games 18 and 19, respectively. Justin Schultz falls below in Game 23, while Nick Schultz is the final hold-out, not falling below until Game 24, the exact halfway point of the lockout-shortened 48-game season. At varying points, we can say that all of these players are below-50% Corsi players.
Let’s concentrate on one player to see how he flirts with the Lower 95% line: Ryan Whitney.
The blue line shows the percentile rank of Ryan Whitney’s accumulated Corsi% within our true 50% player’s 10,000 simulated seasons. By Game 4 we can see that only 10% of observations had Corsi percentages lower than his. By Game 7, he’s flirting with our 2.5% Lower 95% confidence interval line. By Game 9, he’s below it, never to return. By Game 19, his Corsi% had sunk to a point where not one of the 10,000 simulated seasons of a true 50% player was below him. To put that into context, there have only been about 20,000-25,000 player-seasons in NHL history. (As an aside, I kept this as a two-tailed test of whether a player is a 50% player or not. I could have made it a one-tailed test of whether he is explicitly a below-50% player. Had I done so, the test line would sit at 5% instead of 2.5%, and Whitney would have failed it by Game 7 instead of Game 9.)
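The percentile-rank line can be sketched the same way: count what fraction of simulated true-50% seasons sit below an observed season-to-date Corsi% after a given number of games. The 41%-through-9-games figure below is a hypothetical example, not Whitney’s actual number:

```python
import random

rng = random.Random(3)
N_SIMS = 10_000

def percentile_rank(observed, n_games):
    """Fraction of simulated true-50% seasons below `observed` after n games."""
    below = 0
    for _ in range(N_SIMS):
        total = sum(min(1.0, max(0.0, rng.gauss(0.50, 0.12)))
                    for _ in range(n_games))
        if total / n_games < observed:
            below += 1
    return below / N_SIMS

# Hypothetical: a defenceman sitting at 41% Corsi% through 9 games
# lands well under the 2.5% line.
rank = percentile_rank(0.41, 9)
print(f"{rank:.1%} of simulated true-50% seasons were lower")
```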
Possible enhancements to this model would take into account the expected performance of players in various roles (you might test a first-line player against a 54% Corsi baseline and a fourth-liner against 48%, instead of 50% for everyone), the difficulty of the opponent in any given game, home/road advantage, schedule rest, and so on. As it stands, the model assumes a player faces the same probability curve of possible Corsi outcomes in every game.
And so what applicability do we have for the Oilers so far after one game? Well, their top player had a Corsi% of 62%, while their bottom player was at 41% — these are both inside the 1-game 95% confidence interval of between 26% and 74%, so we cannot reject the null hypothesis that each player is a 50% Corsi player, yet. After two games, the critical levels are between 34% and 66%. After 3 games, it’s between 36.5% and 63.5%. We’ll keep an eye on this one.
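These critical levels can be sanity-checked with the same simulation; small differences from the quoted figures are just simulation noise:

```python
import random

rng = random.Random(11)

def interval(n_games, n_sims=10_000):
    """2.5th and 97.5th percentiles of accumulated Corsi% after n games."""
    means = sorted(
        sum(min(1.0, max(0.0, rng.gauss(0.50, 0.12))) for _ in range(n_games))
        / n_games
        for _ in range(n_sims)
    )
    return means[int(0.025 * n_sims)], means[int(0.975 * n_sims)]

for n in (1, 2, 3):
    lo, hi = interval(n)
    print(f"after {n} game(s): {lo:.1%} to {hi:.1%}")
```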