Before the season starts, I wanted to refresh the Adjusted Corsi statistic I developed last season, using the full 2012-13 season of data and introducing a new twist of logic.
The gist of why this is important is quite simple: some people assail Corsi as a statistic because it’s too dependent on outside factors and hard to turn into an individual statistic. I think it’s possible to admit that this is somewhat true while still finding ways to distill Corsi into a normalized format. To do that, I’ll use regression to control for the factors Corsi depends on most strongly, and get at the shiny residue: the portion of Corsi that is unexplained by the circumstances the player found themselves in.
In my previous piece, I controlled for Quality Factors (using a variable I called QualDiff, QoT minus QoC) and Zone Starts, finding that they were very significant explanatory variables for how a player’s raw Corsi per 60 minutes would turn out. To augment these, I’ve added a third variable: 5×5 time on ice per game. My logic is uncomplicated: if your coach is playing you 15 even strength minutes each game, he considers you a first line player, and first line players should be able to outshoot average opposition more than someone who plays only 10 minutes a night. If we control for your time on ice, we can get at the residual Corsi you produced beyond what your coach’s expectations would have been.
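The post doesn’t show how the regression itself was run. As a minimal sketch, assuming ordinary least squares with an intercept (the model the formula further down implies), here’s one way to fit Corsi/60 against the three controls and pull out each player’s residual, which is the Adjusted Corsi. It sticks to plain Python so it runs anywhere; `fit_adjusted_corsi` and `solve_linear_system` are hypothetical helper names, and zone starts are assumed to be in percentage points.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_adjusted_corsi(qual_diff, oz_start_pct, toi_per_game, corsi_per_60):
    """OLS fit of Corsi/60 ~ QualDiff + OZ Start % + 5x5 TOI per game.

    Assumption: plain OLS with an intercept; inputs are parallel lists,
    one entry per player. Returns (coefficients, r_squared, residuals),
    with coefficients ordered [intercept, QualDiff, OZ Start %, TOI].
    The residuals are each player's Adjusted Corsi.
    """
    rows = [[1.0, q, oz, t] for q, oz, t in zip(qual_diff, oz_start_pct, toi_per_game)]
    y = [float(v) for v in corsi_per_60]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    expected = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    residuals = [yi - ei for yi, ei in zip(y, expected)]
    # R^2: share of the variance in Corsi/60 the three controls explain.
    mean_y = sum(y) / len(y)
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot, residuals
```

Because the model has an intercept, the residuals always sum to zero, so a positive Adjusted Corsi reads as “better than expected” and a negative one as “worse than expected.”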
What I’m trying to do is get every forward in the NHL on even footing, so we can judge each player’s shot differential against what a rational observer would have expected given the factors outside of the player’s direct control.
So, do these three independent variables hold statistically significant explanatory power? Yes. Shitloads.
Here’s my summary output. My sample includes the 336 forwards who played 30 or more games in 2012-13. Taken together, the three variables explain 75.1% of the variance in Corsi (an R² of 0.751), leaving 24.9% of magic sauce to show up in the residuals. The p-values of all three of my variables are stupendously low. That “2.41E-69” p-value for QualDiff means that if QualDiff’s true coefficient were zero (i.e., QualDiff had no explanatory power), the chance of seeing a relationship at least this strong in the data by luck alone would be 2.41E-69, which is 2.41E-67 percent. Let’s put that in perspective: if someone offered you a chance of winning a Snickers bar in the next second, with odds of 1 in “all of the seconds of time since the Big Bang,” you’d have an astonishingly better chance of winning that Snickers than of a QualDiff effect this strong showing up by pure chance.
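The Snickers comparison is easy to sanity-check. This sketch assumes the universe is about 13.8 billion years old and uses a 365.25-day year; the variable names are illustrative.

```python
# Assumptions: universe age of ~13.8 billion years, 365.25-day (Julian) year.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # 31,557,600 seconds
UNIVERSE_AGE_YEARS = 13.8e9

seconds_since_big_bang = UNIVERSE_AGE_YEARS * SECONDS_PER_YEAR
snickers_odds = 1 / seconds_since_big_bang  # roughly 2.3e-18
qualdiff_p_value = 2.41e-69                 # from the regression summary

# How many times likelier the Snickers win is than a QualDiff fluke.
ratio = snickers_odds / qualdiff_p_value

print(f"Seconds since the Big Bang: {seconds_since_big_bang:.3g}")
print(f"Odds of the Snickers win:   {snickers_odds:.3g}")
print(f"Snickers is likelier by a factor of {ratio:.3g}")
```

The Snickers odds come out around 10⁻¹⁸, still more than 50 orders of magnitude better than the QualDiff p-value.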
The formula is: Expected Corsi = -25.4 + QualDiff * 0.95 + OZ Start % * 0.22 + TOI/60 * 1.18, with OZ Start % in percentage points and TOI/60 standing for 5×5 minutes per game.
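Turning the formula into code makes the “residual” idea concrete: Adjusted Corsi is a player’s raw Corsi/60 minus what the formula expects. This is a sketch, assuming OZ Start % is entered in percentage points (50 for 50%) and TOI as 5×5 minutes per game; the function names and the sample player below are hypothetical.

```python
# Coefficients from the regression above (2012-13 forwards, 30+ GP).
INTERCEPT = -25.4
B_QUALDIFF = 0.95
B_OZ_START = 0.22   # per percentage point of offensive zone starts (assumed units)
B_TOI = 1.18        # per 5x5 minute per game (assumed units)


def expected_corsi(qual_diff, oz_start_pct, toi_per_game):
    """Expected 5x5 Corsi/60 given a player's circumstances."""
    return (INTERCEPT
            + B_QUALDIFF * qual_diff
            + B_OZ_START * oz_start_pct
            + B_TOI * toi_per_game)


def adjusted_corsi(raw_corsi_per_60, qual_diff, oz_start_pct, toi_per_game):
    """Residual: how far raw Corsi/60 beat (positive) or missed (negative) expectations."""
    return raw_corsi_per_60 - expected_corsi(qual_diff, oz_start_pct, toi_per_game)


# Hypothetical forward: QualDiff 1.0, 50% OZ starts, 14 min/game, raw Corsi 5.0/60.
print(round(expected_corsi(1.0, 50.0, 14.0), 2))
print(round(adjusted_corsi(5.0, 1.0, 50.0, 14.0), 2))
```

For that hypothetical forward the model expects about 3.07 Corsi/60, so his Adjusted Corsi is about +1.93.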
Interpreting the coefficients is fun:
- For QualDiff, the model suggests that for every unit increase in your Corsi Quality of Teammates or every unit decrease in your Corsi Quality of Competition, your raw Corsi should go up about 0.95 per 60 minutes of 5×5 time. So if one player has a QoT of 0 and a QoC of 0, for a QualDiff of 0, and another has a QoT of 5 and a QoC of 0, for a QualDiff of 5, the second player’s raw Corsi should be ~4.75 Corsi/60 mins higher than the first player’s.
- For each percentage-point increase in your offensive zone start percentage, your Corsi is expected to go up ~0.22/60 mins.
- For each additional minute per game you play at evens, the model expects your Corsi to go up 1.18/60 mins, meaning a player who plays 15 minutes a night will have an expected Corsi ~5.9/60 mins higher than a player who plays only 10.
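Because the model is linear, the arithmetic in each bullet can be checked by moving one input and holding the others fixed; the gap is just coefficient × change. The baseline player here is hypothetical.

```python
def expected_corsi(qual_diff, oz_start_pct, toi_per_game):
    # The post's formula; OZ Start % in percentage points, TOI in 5x5 minutes per game.
    return -25.4 + 0.95 * qual_diff + 0.22 * oz_start_pct + 1.18 * toi_per_game


# Hypothetical baseline player.
baseline = dict(qual_diff=0.0, oz_start_pct=50.0, toi_per_game=10.0)


def bump(**changes):
    """Change in expected Corsi/60 when some inputs move off the baseline."""
    return expected_corsi(**{**baseline, **changes}) - expected_corsi(**baseline)


print(round(bump(qual_diff=5.0), 2))       # QualDiff 0 -> 5: 4.75
print(round(bump(oz_start_pct=51.0), 2))   # one more point of OZ starts: 0.22
print(round(bump(toi_per_game=15.0), 2))   # 10 -> 15 minutes at evens: 5.9
```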
Let’s have a look at the lists of players by Adjusted Corsi. For this I’ve separated the 336 players into quartiles to loosely represent 1st liners, 2nd liners, 3rd liners, and 4th liners. Note that 336 players across 30 teams is about 11.2 per team, so this isn’t too far off a complement of 12 forwards per team. I’ve divided these up so you can compare players amongst their peers. Sidney Crosby is the highest-ranked first line player, but the fact that Marcus Foligno’s Adjusted Corsi is higher does not mean Foligno is a better player than Crosby. All it means is that Foligno outperformed his expectations as a third line player by a bit more than Crosby did as a first line player.
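The post doesn’t say which measure was used to cut the 336 forwards into quartiles. Assuming it was 5×5 TOI per game (a guess, since ice time is what separates line roles elsewhere in the piece), a minimal sketch of the split and the within-tier ranking could look like this; `Forward` and `split_into_lines` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Forward:
    name: str
    toi_per_game: float      # 5x5 minutes per game
    adjusted_corsi: float    # raw Corsi/60 minus expected Corsi/60


def split_into_lines(players, key=lambda p: p.toi_per_game, groups=4):
    """Sort by `key` (highest first) and cut into `groups` near-equal tiers.

    Assumption: tiers are cut by 5x5 TOI per game; the post doesn't say
    which measure it used. Tier 0 loosely stands in for first liners,
    tier 3 for fourth liners. Each tier is ranked by Adjusted Corsi, best first.
    """
    ordered = sorted(players, key=key, reverse=True)
    n = len(ordered)
    tiers = []
    for g in range(groups):
        start, stop = g * n // groups, (g + 1) * n // groups
        tier = sorted(ordered[start:stop], key=lambda p: p.adjusted_corsi, reverse=True)
        tiers.append(tier)
    return tiers
```

With 336 forwards each tier holds exactly 84 players, and ranking happens only within a tier, which is why a third liner like Foligno can post a higher number than Crosby without being compared to him directly.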
The Oilers’ top-line players all show well here, and that squares with my interpretation of last year: many nights it was like two different teams were on the ice, the first line and all the other crap. The fact that Hall/RNH/Eberle put up positive Corsi numbers while playing alongside such crappy teammates (other than each other) and fighting through whatever weird system the Oilers were playing last year is damn impressive.
Leafs fans may not want to peek at who was rated the worst third liner by a country mile. Oilers newcomer Boyd Gordon shows very well in the 3rd line mix, ranked 15th.
To me, this list is not only interesting, it should be diagnostic. If you’ve got a player who seems to be knocking the cover off the ball in a 4th line role, perhaps you could try him further up the lineup, where the harder competition and higher expectations that come with more ice time should present more of a challenge. There also seem to be players at the bottom of each list who probably should find an easier roster spot, or retire altogether.