One of the persistent issues in advanced hockey analytics concerns the treatment of data that is only available in small sample sizes. It’s pretty common for us to disregard one, two, even 30 games of data from the beginning of a season as ambiguous, even misleading. This has struck me as not good enough. There are only so many games in a season, and waiting until it’s over doesn’t really help a team evaluate its players. This post proposes a new methodology to help with this task.

Let’s start with the major problem of small sample size — namely, small sample size. When you’re tracking one player, one game at a time, without any *context*, you can’t really learn much. How do we add the context? By adding sample size. Thousands of games of sample size.

Imagine a true 50% even strength Corsi player. Imagine that we know he’s a 50% player, as a given. Now imagine this player playing a single game. What happens? We know intuitively that he probably won’t have a 50% game exactly; it’ll be some percentage between 0% and 100%, but tending towards 50%. To define the shape of that probability curve, we need to know the variance in Corsi percentage of a typical game for an NHL player. We can estimate this from real NHL data, looking at players’ in-season game-by-game Corsi percentages to get a sense of a typical player’s standard deviation over a season’s worth of games. It turns out this standard deviation is around 0.12, or 12 percentage points. For a normal distribution, about 95% of data falls within 2 standard deviations of the mean. If a player’s mean Corsi% is 50%, that means about 95% of his games will have a Corsi% between 26% and 74%.

Here’s a histogram that shows a 50% Corsi player’s distribution of possible game Corsi%’s over a simulation of 10,000 games:

So about 60% of the time, this player will have a game Corsi% of between 40% and 60%. 15% of the time he’ll have a Corsi% between 30% and 40%. 4% of the time he’ll have a Corsi% of between 20% and 30%. And so on. The important thing to take away here is that he’s most likely to have games around 50%, but his possible game Corsi percentages run the entire gamut.
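A minimal sketch of how a histogram like this could be generated, assuming game results are normally distributed around 50% with the 0.12 standard deviation quoted above and clamped to the 0% to 100% range. The function name and seed are illustrative; this is not necessarily how the original chart was produced.

```python
import random

def simulate_game_shares(n_games=10_000, mean=0.50, sd=0.12, seed=1):
    """Draw single-game Corsi% values and return the share in each 10-point bin."""
    rng = random.Random(seed)
    bins = [0] * 10
    for _ in range(n_games):
        # Assumption: normal game-to-game variation, clamped to a valid percentage.
        value = min(max(rng.gauss(mean, sd), 0.0), 1.0)
        index = min(int(value * 10), 9)  # a value of exactly 1.0 goes in the top bin
        bins[index] += 1
    return [count / n_games for count in bins]

shares = simulate_game_shares()
for i, share in enumerate(shares):
    print(f"{i * 10:3d}% to {(i + 1) * 10:3d}%: {share:.1%}")
```

The two middle bins together should come out near the 60% figure above, with the shares falling away quickly on either side.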

I suppose it’s interesting to know what’s possible over one single game, but how does this tell us anything about a season’s progression? Imagine our 50% Corsi player playing one game, and then another, and then another. Each passing game is played according to the probability curve above. And each passing game accumulates in the player’s season-to-date Corsi%. After 82 games, this 50% Corsi player’s season Corsi will almost certainly be close to 50%. Too many games have passed for it not to be close to his true talent level. But how does that distribution of probabilities evolve over a season? To answer this, we’ll ask our 50% Corsi player to play many, many seasons. 10,000 seasons, in fact. I’ll use a technique called Monte Carlo simulation, which feeds random numbers through a given probabilistic model over many iterations to show how the model behaves. Here are the results of 10,000 simulated seasons for a true 50% Corsi player’s expected game-by-game results:

On the far left of our x-axis is the distribution of possible outcomes of 10,000 iterations of Game 1 of a season. As each game passes, the Corsi% of each of the 10,000 player-seasons accumulates. The more games that are played, the closer each of our 10,000 player-seasons drives towards 50%, the true talent level. The straight middle blue line is the mean (average) accumulated Corsi% of our 10,000 player-seasons as each game passes. Predictably, this is 50%, with only minute deviations. The very top blue and very bottom purple lines are the extremes seen in the data: the maximum and minimum points seen in 10,000 simulated seasons of a 50% Corsi player.
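The simulation behind a chart like this can be sketched roughly as follows. Two simplifying assumptions are baked in and labelled in the code: each game is an independent normal draw with standard deviation 0.12, and season-to-date Corsi% is the simple average of the game values (a real Corsi% pools shot attempts, so busier games would weigh more). The function name and tuple layout are illustrative.

```python
import random

def simulate_bands(n_seasons=10_000, n_games=82, mean=0.50, sd=0.12, seed=1):
    """Return per-game (lower 2.5%, mean, upper 97.5%, min, max) of season-to-date Corsi%."""
    rng = random.Random(seed)
    running = [0.0] * n_seasons  # cumulative sum of game values for each simulated season
    bands = []
    for game in range(1, n_games + 1):
        for s in range(n_seasons):
            # Assumption: independent, normally distributed game results.
            running[s] += rng.gauss(mean, sd)
        # Assumption: season-to-date Corsi% is the plain average of games so far.
        to_date = sorted(total / game for total in running)
        lower = to_date[int(0.025 * n_seasons)]
        upper = to_date[int(0.975 * n_seasons) - 1]
        average = sum(to_date) / n_seasons
        bands.append((lower, average, upper, to_date[0], to_date[-1]))
    return bands

bands = simulate_bands()
for game in (1, 5, 19, 82):
    lower, average, upper, low, high = bands[game - 1]
    print(f"Game {game:2d}: 95% band {lower:.1%} to {upper:.1%}, mean {average:.1%}")
```

Sorting the season-to-date values at every game keeps the percentile lookup simple; with 10,000 seasons the cost is small.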

The really interesting lines here are the green and red lines, marked “Upper 95%” and “Lower 95%”. These mark the bounds of our 95% confidence interval at each game. Here, a 95% confidence interval means that 95% of simulated observations fall within it. That means that 2.5% of observations will be above it, and 2.5% of observations will be below it.

Another point of interest is the rapidly changing slope of the 95% confidence lines. They start wide in early games, but quickly dive towards 50%. If you’re above 60% or below 40% Corsi by game 5, we already know a lot about you. Likewise, if you’re above 55% or below 45% Corsi by game 19, we know a lot about you. By the end of an 82-game season, a true 50% Corsi player has a 95% chance to be between 47% and 53% Corsi. Anything beyond those levels, and, yes, we know a lot about you.
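As a rough cross-check on those thresholds, here is the arithmetic under two simplifying assumptions: each game is an independent draw with standard deviation $\sigma = 0.12$, and season-to-date Corsi% is the simple average of the game values. Under those assumptions the standard error of the average after $n$ games shrinks with the square root of $n$:

$$
\mathrm{SE}(n) = \frac{\sigma}{\sqrt{n}}, \qquad \text{95\% band} \approx 0.50 \pm 1.96\,\frac{0.12}{\sqrt{n}}
$$

$$
n = 5:\ 0.50 \pm 0.105, \qquad n = 19:\ 0.50 \pm 0.054, \qquad n = 82:\ 0.50 \pm 0.026
$$

That works out to roughly 40% to 60%, 45% to 55%, and 47% to 53%, in line with the levels read off the simulated lines.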

In this analysis, I’ll use this 95% “level of significance” to test against our 50% Corsi player. Our null hypothesis is that a player’s true mean Corsi is 50%. To reject this hypothesis, an observed data point will need to fall above our 95% confidence interval (to accept that he is higher than a 50% Corsi player), or below our 95% confidence interval (to accept that he is lower than a 50% Corsi player).

Let’s look at some practical examples of this. Here’s the above probability distribution overlaid with the 2012-13 by-game season-to-date Corsi% for Oilers centres:

First, imagine this graph with just the players’ Corsi% lines. You know that, yes, these Corsi lines suck horribly, but you don’t have the context to know how badly they suck. Now, add in the 50% mean and 95% confidence interval lines, and we have some context to compare the players’ lines with.

The best Corsi here is Ryan Nugent-Hopkins’ — his light blue line tends towards our 50% mean line, and tracks within our 95% interval. The latter fact lets us say that we cannot reject the null hypothesis that RNH was a 50% talent Corsi player in 2012-13. His teammates, however, are a different story. At some point during this season, they all fall below the Lower 95% line. That is to say that in our 10,000 season simulation, they fell below a point that only 2.5% of observations were beneath, meaning that we can reject the null hypothesis, and say that they are below 50% Corsi players. Horcoff and Belanger start flirting with the Lower 95% line around Game 8, and fall under it for good around Game 16. If you were tracking this during the year, you’d know by Game 8 that something was wrong.

But most interesting to me is Sam Gagner’s line. By Game 2, his orange line has dived below the Lower 95% line. This was after only two games of results, in which he posted Corsi percentages of 41% and 25%. Now, this doesn’t sound like much in our traditional view of the game. But the probability that a true 50% Corsi player would post two consecutive games this low is less than 2.5%. With only two data points, we know that something is rotten in Denmark, and can accept that he is a lower than 50% mean Corsi% player. He tracks the Lower 95% line for another 7 games, until he clearly falls under it around Game 9.
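Gagner’s two games make a handy worked example of the test. The sketch below uses a normal approximation rather than the 10,000-season simulation, and assumes season-to-date Corsi% is the simple average of the game values; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

GAME_SD = 0.12    # per-game standard deviation quoted in the post
NULL_MEAN = 0.50  # null hypothesis: a true 50% Corsi player

def lower_tail_probability(game_values):
    """Chance that a true 50% player averages this low or lower over these games."""
    n = len(game_values)
    to_date = sum(game_values) / n  # assumption: simple average of game Corsi%
    standard_error = GAME_SD / sqrt(n)
    return NormalDist(NULL_MEAN, standard_error).cdf(to_date)

def first_rejection(game_values, tail=0.025):
    """First game at which the season-to-date value leaves the 95% band, else None."""
    for n in range(1, len(game_values) + 1):
        p = lower_tail_probability(game_values[:n])
        if p < tail or p > 1 - tail:
            return n
    return None

gagner = [0.41, 0.25]  # the two game values quoted above
p = lower_tail_probability(gagner)
print(f"Lower-tail probability after two games: {p:.2%}")
print(f"First game outside the 95% band: {first_rejection(gagner)}")
```

The first game alone (41%) is well inside the band; it takes the second game to pull the two-game average just below the lower line.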

Let’s look at the Oilers’ Defence:

Here we see in full gory detail what a bad NHL defence corps looks like. All five defencemen shown here start within our 95% interval at the beginning of the season, but one by one they all eventually fall below it. The earliest is Ryan Whitney, of whom we can say only 9 games into the season that he is not a 50% Corsi player. The next to fall are Jeff Petry and Ladi Smid, who fall below our Lower 95% line in games 18 and 19, respectively. Justin Schultz falls below in game 23, while Nick Schultz is the final hold-out, not falling below until game 24, the exact halfway point of the season. At varying points, we can say that all of these players are below 50% Corsi players.

Let’s concentrate on one player to see how he flirts with the Lower 95% line: Ryan Whitney

The blue line shows Ryan Whitney’s accumulated Corsi% percentile rank in our true 50% player’s 10,000 simulated seasons. By Game 4 we can see that only 10% of observations had Corsi percentages lower than his. By Game 7, he’s flirting with the 2.5% line that marks the bottom of our 95% confidence interval. By Game 9, he’s below it, never to return. By Game 19, his Corsi% had sunk to a point where not one of the 10,000 simulations of a true 50% player’s season was below him. To put that into context, there have only been about 20,000-25,000 player-seasons in NHL history. (As an aside, I kept this as a two-tailed test to see whether a player was a 50% player or not. I could have made it a one-tailed test, to test whether he was explicitly a below 50% player. If this had been done, the test line would be at 5% here instead of 2.5%, and Whitney would have failed it by Game 7 instead of Game 9.)
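To see why the one-tailed version trips earlier, here is a small sketch comparing where the lower test line sits under each choice. It uses a normal approximation with the 0.12 per-game standard deviation and a simple average of games, both assumptions; `lower_line` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def lower_line(game, tail, mean=0.50, sd=0.12):
    """Season-to-date Corsi% below which only `tail` of true-50% seasons would fall."""
    z = NormalDist().inv_cdf(tail)  # negative for tails below 50%
    return mean + z * sd / sqrt(game)

for game in (7, 9):
    two_tailed = lower_line(game, 0.025)  # 2.5% in each tail
    one_tailed = lower_line(game, 0.05)   # all 5% in the lower tail
    print(f"Game {game}: two-tailed line {two_tailed:.1%}, one-tailed line {one_tailed:.1%}")
```

The one-tailed line always sits a little higher than the two-tailed one, so a player who is sinking crosses it sooner.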

Possible enhancements to this model would take into account the expected performance of players in various roles (you may want to test a first line player against a 54% Corsi level and a 4th liner against a 48% level instead of 50%), the difficulty of the opponent in any given game, home/road advantages, schedule rest, etc. As it stands, the model assumes that a player has the same probability curve of possible Corsi outcomes in every game.

So how does this apply to the Oilers so far, after one game? Well, their top player had a Corsi% of 62%, while their bottom player was at 41%. Both are inside the 1-game 95% confidence interval of 26% to 74%, so we cannot yet reject the null hypothesis that each player is a 50% Corsi player. After two games, the critical levels are 34% and 66%. After 3 games, they’re 36.5% and 63.5%. We’ll keep an eye on this one.

## 4 Comments

I’m with you through the first half, but in the second half I think you have it wrong.

The probability that Ryan Whitney’s Corsi would be achieved by a 50% Corsi player is not the same as the probability that Ryan Whitney is a 50% Corsi player. Your last two charts imply that it is.

You’ve basically reconstructed Thinker A’s erroneous analysis from Vic Ferrari’s intro to Bayesian thinking, I’m afraid.

In this analysis, I’m considering each season as a discrete series of events. This may make more sense in reality for some teams than others. For the Oilers, with two new head coaches in two consecutive years, they’ve got new systems and deployments in addition to new players to play with.

This is treating each season for each player as a fresh sheet. Each new game is a fresh new data point, without priors from a previous season to draw from. As we’ve talked about, this aspect doesn’t take into account a Bayesian method, and, yes, it is what it is.

For each game that passes, you can construct a statistical test:

Null Hypothesis is that a player’s mean Corsi up to that point is 50%.

Alternate Hypothesis is that a player’s mean Corsi up to that point is not 50%.

Imagine a bell curve that changes every game, with mean 50% and sigma that changes as more data is collected. We can plunk down our required test statistic at a different point at each game, and test against that.

I do understand your concerns about not taking into account Bayesian thinking, and I wasn’t claiming to have done that here. But IMO there’s a legitimate test that can be constructed out of this line of thinking. Finding out if a player’s Corsi% mean up to any point in a season is likely not 50% is useful.

As I mentioned, this is only a first cut, and I’ll see if there are ways to make it more instructive. In fact, I’d love it if you’d like to chip in, have a look at the data, and see where and how the model can be improved. Seriously. I’m not a natural Bayesian thinker.

The problem is here:

Imagine you’re a fan of the National Coin-Flipping League. One of the coins on your team comes up heads nine times in the first ten games. You run this test and reject the null hypothesis that the coin is a 50% coin. But given what we know about coins, you’re probably wrong.
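To put a number on the coin example, here is a short sketch of the exact binomial calculation, assuming independent flips of a fair coin. The function name is illustrative.

```python
from math import comb

def upper_tail(heads, flips, p=0.5):
    """Exact probability of at least `heads` heads in `flips` flips."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )

one_sided = upper_tail(9, 10)
two_sided = 2 * one_sided  # the distribution is symmetric when p = 0.5
print(f"9 or more heads in 10: {one_sided:.2%}")
print(f"Two-sided p-value: {two_sided:.2%}")
```

Both figures fall under the usual 5% cut-off, so the test rejects a fair coin, even though everything we know about coins says this one is almost certainly fair.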

Of course, we don’t think all hockey players are identical 50% players, like coins are. But we need to know how different they are for this analysis to work.

Imagine a league where the standard deviation on Corsi Skill is 1%. In the short run, there’s variance, but if we’re looking at the actual ability of any given player, there’s a 95% chance they’re between 48% and 52%, and a 99.7% chance they’re between 47% and 53%.

One of the players on your team runs at 38% over 100 shots (by the way, you should be using shots, not games, to assess sample size). So let’s use your test…

If they’re a 53% player, the probability of going 38/100 or worse is 0.2%.

If they’re a 52% player, the probability of going 38/100 or worse is 0.3%.

If they’re a 51% player, the probability of going 38/100 or worse is 0.6%.

If they’re a 50% player, the probability of going 38/100 or worse is 1.0%.

If they’re a 49% player, the probability of going 38/100 or worse is 1.7%.

If they’re a 48% player, the probability of going 38/100 or worse is 2.8%.

If they’re a 47% player, the probability of going 38/100 or worse is 4.4%.
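The figures in this list can be reproduced, to rounding, with an exact binomial tail, assuming each of the 100 shots is an independent trial at the player’s true rate. The function name is illustrative.

```python
from math import comb

def prob_at_most(successes, attempts, p):
    """Exact binomial probability of `successes` or fewer in `attempts` trials."""
    return sum(
        comb(attempts, k) * p**k * (1 - p) ** (attempts - k)
        for k in range(successes + 1)
    )

# Probability of going 38/100 or worse for each talent level from 47% to 53%.
tails = {talent: prob_at_most(38, 100, talent / 100) for talent in range(47, 54)}
for talent in sorted(tails, reverse=True):
    print(f"{talent}% player: {tails[talent]:.1%}")
```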

So now what? Because you’re examining this player in isolation, without considering the underlying distribution of talent in the league, you reject all of these hypotheses and conclude he’s a 46% player or worse.

But we know that in that particular league, a 46% player is four standard deviations below the mean, a one-in-30,000 rarity. You’ve rejected hypotheses that were one-in-100 in favor of one that’s one-in-30,000, because you’re examining the player in a vacuum. You just can’t do this kind of analysis correctly without some starting information about how talent is distributed.

For each hypothesis, you need to multiply the likelihood of finding a player with that particular talent by the likelihood of a player with that talent having this particular result. Then you can see which hypothesis is really most likely — and how likely the other hypotheses are, how confident you can be in your projection.
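That multiplication can be sketched on a grid of talent levels. The example below uses the hypothetical league from above (talent normally distributed around 50% with a 1% standard deviation) and the 38/100 result; the grid range, step size and function name are assumptions made for illustration.

```python
from math import comb
from statistics import NormalDist

def posterior_over_talent(successes=38, attempts=100, prior_mean=0.50, prior_sd=0.01):
    """Posterior weight of each talent level: prior density times binomial likelihood."""
    prior = NormalDist(prior_mean, prior_sd)
    grid = [i / 1000 for i in range(400, 601)]  # assumed grid: 40.0% to 60.0% in 0.1% steps
    weights = []
    for p in grid:
        likelihood = comb(attempts, successes) * p**successes * (1 - p) ** (attempts - successes)
        weights.append(prior.pdf(p) * likelihood)
    total = sum(weights)
    return {p: w / total for p, w in zip(grid, weights)}

posterior = posterior_over_talent()
posterior_mean = sum(p * w for p, w in posterior.items())
prob_46_or_worse = sum(w for p, w in posterior.items() if p <= 0.46)
most_likely = max(posterior, key=posterior.get)
print(f"Most likely talent: {most_likely:.1%}, posterior mean: {posterior_mean:.1%}")
print(f"Probability of 46% or worse: {prob_46_or_worse:.3%}")
```

With talent this tightly bunched, 100 shots barely move the estimate: the most likely talent stays just under 50%, and a true 46% player remains very unlikely.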

That’s how Bayesian analysis works, and it’s the right way to answer questions like this.

Excellent article.

Is there any chance to do the same analysis on a team level?

It would be very interesting to find out what a few ‘bad’ Corsi games on a team level tell us.

Thanks
