Hockey really is a chaotic thing. When you watch a game, things often happen too fast to see what is really going on. The reason I’ve been researching and writing on the topics I’ve chosen this summer has been to further explore these atomic nuclei: to help unwrap how the game gets put together according to its purest tenets.

To me, this has largely been an exercise in testing the randomness of various hockey metrics. If something is found to be random, it follows that the player the metric is attached to does not have control over it. If a player or team does not control something, we cannot rationally use it to value their performance.

Percentages are an area ripe for further research. One construction, PDO, combines on-ice save percentage and on-ice shooting percentage to estimate how much luck a player (or a team, for that matter) is experiencing. Anything above 1000, and they are lucky; anything below 1000, and they are unlucky. Of course, this formulation rests on the premise that both on-ice shooting and save percentage are completely random events.
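As a minimal sketch of how PDO is put together (the two percentages summed and scaled to that 1000 baseline), with illustrative numbers of my own choosing:

```python
# Sketch of the PDO construction described above. The input numbers are
# illustrative, not taken from any real team or player.

def pdo(on_ice_shooting_pct: float, on_ice_save_pct: float) -> float:
    """Combine on-ice shooting and save percentage into a PDO score.

    The two percentages sum to ~1.0 on average, so scaling by 1000
    gives the familiar baseline where 1000 means "neutral luck".
    """
    return (on_ice_shooting_pct + on_ice_save_pct) * 1000

# A team shooting 8.5% with a .920 save percentage behind it:
print(round(pdo(0.085, 0.920)))  # 1005 -> slightly "lucky" by this logic
```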

As we saw at the team level, shooting percentage is random, but save percentage is not. This should make sense — all teams shoot against many different goaltenders in the league over a season, but your own team tends to employ one to three goaltenders over a season, who all have established levels of ability. I tried to fashion a new method for improving the randomness of PDO, but I can understand some people’s misgivings about the marginal benefit of the increased computational complexity. To me, it’s a question of principle that I will never feel quite comfortable with, but in any case, team-level PDO provides but a fraction of the value inherent in the player-level evaluations a team needs to make in the course of filling its roster.

I’ve recently explored shooting percentage at the individual level, and found that while it seems to be random for defencemen, it is not random for forwards. Forwards consistently post similar levels of on-ice shooting percentage (OISP) from one year to the next — in fact, you can use one year’s level to significantly predict the next year’s.

Recently I’ve tried to delve further into on-ice save percentage (OISVP) to see if there is similar low-hanging fruit to pick.

At the centre of an analysis of OISVP is a core truth — certain players tend to play with certain goalies, otherwise known as teammates. This tendency to have teammates means any cursory test for randomness in a player’s OISVP will be biased immediately — of course certain players will exhibit traits of maintaining high or low OISVP, as they may play with good or bad goaltenders. So what if Mason Raymond has shown an ability to maintain a high OISVP over the last 6 seasons — he’s been playing in front of Roberto Luongo and Cory Schneider. You can test raw OISVPs for randomness, but you will find non-randomness for both defenders and forwards, just like you will at the team level. This isn’t all that interesting to me.

To get at whether something truly fundamental is happening here, you need to tease out a bit more. First, I’ll try to scythe off the biases introduced by goaltending quality.

To start with, I stopped thinking about how good a player’s OISVP was, and started thinking about how good it was relative to his own team’s even strength save percentage. This requires taking a simple difference between the two:

OISVP Differential = Player’s OISVP – Team’s Even Strength Save Percentage

Here’s the formula applied to Alex Ovechkin’s 2007-2008 season:

OISVP Differential = 0.905 – 0.910 = -0.005

Therefore, when Ovechkin was on the ice, the Capitals allowed 9.5% of shots on their net to become goals, while the team as a whole allowed only 9% of shots to become goals.
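The calculation above can be sketched directly from the formula, using the same 2007-08 Ovechkin numbers as the worked example:

```python
# Sketch of the OISVP Differential formula from the article, applied to the
# 2007-08 Ovechkin example (player OISVP 0.905, team ES SV% 0.910).

def oisvp_differential(player_oisvp: float, team_es_sv_pct: float) -> float:
    """Player's on-ice save percentage relative to his team's even-strength SV%."""
    return player_oisvp - team_es_sv_pct

diff = oisvp_differential(0.905, 0.910)
print(round(diff, 3))  # -0.005
```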

I recreated this process for all players in my sample who’d played at least 30 games in each of the last 6 seasons (plus Crosby) to find their overall score.

The first of my findings was that for defencemen, this step of “unbiasing” OISVP was enough to establish randomness:

Here are the top and bottom defencemen by 6-year average OISVP. If you’re thinking that this data is already looking like a delicious bell curve, you’d be right. Not a whole lot seems to be separating the top players from the bottom players. The 1% that separates Greg Zanon from Filip Kuba is worth about 5 goals over the course of a normal season, or less than 1 win (according to the rule of thumb of 6 goals for every win).

We also see that defencemen’s distribution around the expected long term normal cumulative average of 0.5 is fairly tightly-banded. There aren’t a lot of outliers lingering past 0.7 or below 0.3. Are these outliers occurring at a rate higher or lower than expected?

This tactic treats each season as a coin flip — think of being above the average OISVP in a season as heads, and below the average OISVP as tails. Over 6 seasons, we’d expect the distribution of players in our sample to follow that seen in the orange bars. In reality, we see what actually happened in the blue bars. Doing a quick chi-squared test confirms that the distributions are plausibly similar.
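The coin-flip framing above can be sketched as follows. With 6 seasons and a fair coin, the number of "above team average" seasons per player follows a Binomial(6, 0.5) distribution, which gives the expected bucket counts; a Pearson chi-squared statistic then compares observed against expected. The observed counts below are hypothetical placeholders, not the article's actual data:

```python
import math

# Sketch of the coin-flip test: the expected share of players landing in each
# bucket (0 to 6 "heads") comes from Binomial(6, 0.5). The observed counts
# here are hypothetical, invented only to show the mechanics.

def binomial_expected(n_players: int, n_seasons: int = 6, p: float = 0.5):
    """Expected player counts in each bucket of 0..n_seasons above-average seasons."""
    return [n_players * math.comb(n_seasons, k) * p**k * (1 - p)**(n_seasons - k)
            for k in range(n_seasons + 1)]

observed = [4, 19, 45, 66, 48, 17, 5]   # hypothetical counts for 204 players
expected = binomial_expected(sum(observed))

# Pearson chi-squared statistic; with 6 degrees of freedom, values below the
# 12.59 critical value (at p = 0.05) are consistent with pure coin-flipping.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))
```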

In addition, regression testing shows that a player’s OISVP Differential in one season cannot be used to predict the next season’s: a P-value of 0.186 is well above the 0.05 threshold we would need to conclude that the two variables (one season predicting the other) are related.
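The shape of that season-over-season test looks something like the sketch below. The paired differentials here are made up, and deliberately constructed so the correlation is exactly zero, purely to illustrate what "no predictive relationship" means; the article's real finding rests on its reported p-value of 0.186, not on these numbers:

```python
# Sketch of the season-to-season test: correlate year t's OISVP Differential
# with year t+1's. The data below is hypothetical and constructed to have
# zero correlation, mirroring the "one season doesn't predict the next" result.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

year_t  = [0.006, 0.006, -0.006, -0.006]   # hypothetical differentials, year t
year_t1 = [0.003, -0.003, 0.003, -0.003]   # hypothetical differentials, year t+1

r = pearson_r(year_t, year_t1)
print(round(r, 3))  # 0.0 -> year t tells you nothing about year t+1
```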

At this point, no more “unbiasing” needs to be done, as we’ve already established randomness. Any further effort would simply make it “more random” in my eyes. Defencemen have no control over their OISVP Differential. One year they may post an OISVP that is higher than their team average; the next, it may be lower. In the long run, it is impossible to sustain it at abnormally high or low levels.

For forwards, the conversation is a bit more complicated.

If we were to leave forwards at the same level, we’d have no choice but to say that OISVP is not random. Have a look at these charts:

Compared with the defencemen, the extremes are already larger. Oddly, crappy players seem to be really good at this, and good players are pretty bad at it.

The spread between players’ 6-year average normal cumulative scores is much more erratic around the 0.50 long-run expected average.

This chart shows a distinct pattern of taking sizable chunks away from the random middle and giving them to the non-random extremes. A chi-squared test shows just a 0.3% chance that the actual data seen for forwards comes from the same distribution as the expected distribution. This is not random.

But something bugs me about this. Why do good players tend to sustain low OISVP Differential scores, while bad players sustain high ones? A reasonable explanation might be that “bad” players tend to play on the 3rd and 4th lines, and were hired as defensive specialists, and succeed in that vocation. They compensate for their lack of skill by cheating for defence, getting back early, checking the other team tighter, etc.

Or, a more accurate explanation would be that 3rd and 4th liners tend to play against poorer opposition. We’ve already shown that good players tend to sustain higher on-ice shooting percentages (OISP) — I consider this ability to be a reflection of their skill level.

We also know that good players tend to play against good players, and 4th liners tend to play against 4th liners. This is pure game theory played out on a nightly basis pitting one coach against another. The risk of having your 4th line out against the other team’s top line is too great not to match away from.

This being established, I need some way to control for this quality factor so we can judge players against each other. At this moment, I don’t have a statistic for quality of competition based on average OISP. As a proxy, I will use a player’s average even strength time on ice per game (TOI) to gauge what kind of players he is expected to play against. If a player gets 16 minutes of ice time a night, odds are the coach considers him top line material, and plays him against the other team’s best. If you’re getting 8 minutes, you’re probably only playing 3rd or 4th line competition as the coach finds ways to use you without getting hurt for it.

It turns out that a player’s TOI is an incredibly powerful predictor of his OISVP Differential. The more a player plays on average, the worse his OISVP Differential will be. For each of the 6 years in my sample, TOI was a statistically significant predictor of OISVP Differential, with P-values ranging from 0.002 (this past season) down to 3.6E-09 the season before. In the regression equations, a good rule of thumb is that you lose about 0.0025 for every additional minute you play. A player who plays 16 minutes a night as opposed to 6 will have an OISVP Differential 0.025 lower. If your team’s average save percentage is 0.920, I’d expect the 4th liner to have an OISVP of 0.935 or higher, and the 1st liner to have an OISVP of 0.910 or lower.
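The rule of thumb works out as a simple line. The -0.0025-per-minute slope comes from the article; the intercept below is my own hypothetical assumption, chosen only so that a roughly average-TOI player sits near a zero differential:

```python
# Sketch of the TOI rule of thumb: about -0.0025 of OISVP Differential per
# extra even-strength minute per game. SLOPE is from the article's rule of
# thumb; INTERCEPT is a hypothetical value centring the line near average TOI.

SLOPE = -0.0025
INTERCEPT = 0.025   # assumption: a ~10-minute player expects a ~0 differential

def expected_differential(toi_minutes: float) -> float:
    """Expected OISVP Differential for a given even-strength TOI per game."""
    return INTERCEPT + SLOPE * toi_minutes

# The 16-minute player vs the 6-minute player: a 0.025 gap, as in the text.
gap = expected_differential(6) - expected_differential(16)
print(round(gap, 3))  # 0.025
```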

The next steps are pretty basic. We figure out the regression equations tying TOI to OISVP Differential for all 6 years, use those equations to compute each player’s expected OISVP Differential in each year, and then find the residual between what the player actually did and what we expected. A player with a high positive residual has outplayed his lineup expectations; a player with a negative residual has underperformed them.
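Those steps can be sketched for a single season as below. The (TOI, differential) pairs are hypothetical, invented only to show the fit-then-residual mechanics:

```python
# Sketch of the residual step: fit a least-squares line of OISVP Differential
# on TOI for one season, then score each player by actual minus expected.
# The data points below are hypothetical.

def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# (TOI per game, OISVP Differential) for a hypothetical season:
toi  = [6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
diff = [0.012, 0.006, 0.002, -0.001, -0.008, -0.013]

slope, intercept = fit_line(toi, diff)
residuals = [d - (intercept + slope * t) for t, d in zip(toi, diff)]

# Positive residual -> outplayed lineup expectations; negative -> underperformed.
for t, r in zip(toi, residuals):
    print(f"{t:>4} min: residual {r:+.4f}")
```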

The following table shows the highest and lowest residual scores for forwards:

Already these are looking a lot more like the extremes seen in the defencemen’s table above. Instead of being all crappy players, the leaders include some very good ones, such as Marian Hossa, Mikko Koivu, Pavel Datsyuk, and Shawn Horcoff (he’s a good player, now shut up!). At least it’s starting to approximate something I can believe.

The distribution seen in this chart is MUCH tighter than the one seen earlier for the forwards: it is once again very rare to see players sustain above 0.7 or below 0.3 in their 6-year average normal cumulative scores.

And for the finale, this chart shows the actual observed buckets of players adhering much more closely to the proportions we expected in each one. A chi-squared test shows a 40% probability that the actual data follows the expected distribution, well above the 5% threshold at which we would have concluded this is not random.

**Conclusions**

- On-Ice Save Percentage (OISVP) data is highly biased based on team save percentage.
- Removing this team bias renders defencemen’s OISVP random, while forwards still show distinct evidence of non-randomness.
- A player’s time on ice (TOI) is a significant (negative) predictor of his OISVP Differential.
- Once corrected for TOI, OISVP in forwards is successfully randomized. This means that once you correct for team and lineup biases, forwards do show random qualities in their OISVP.
- Because such extreme measures need to be taken to remove the biases, I would be very apprehensive of taking a forward’s PDO at face value.
- A defenceman’s PDO should still be corrected for team SV% factors, but is otherwise judged to be random.
- A forward’s PDO consists of elements that are proven to be non-random (on ice shooting percentage) and elements that need to be extremely modified to show random qualities (on ice save percentage).
