Before the season starts, I wanted to refresh the Adjusted Corsi statistic I’d developed last season using the full season of 2012-2013 data and introducing a new twist of logic.

The gist of why this is important is quite simple — some people assail Corsi as a statistic because it’s too dependent on other factors and is hard to make into an individual statistic. I think it’s possible to simultaneously admit that this is somewhat true while finding ways to distill Corsi into a normalized format. To accomplish this, I’ll use regression to control for strong dependent relationships, and get at the shiny residue, or the portion of Corsi that is unexplained by the circumstances the player found themselves in.

In my previous piece, I controlled for Quality Factors (using a variable I called QualDiff, the difference between QoT and QoC) and Zone Starts, finding that they were very significant explanatory variables for how a player’s raw Corsi rate / 60 mins would turn out. To augment these, I’ve added a third variable: 5×5 time on ice per game. My logic is uncomplicated — if your coach is playing you for 15 even strength minutes each game, he considers you a first line player, and first line players *should* have an ability to outshoot average opposition more than someone who plays only 10 minutes a night. If we control for your time on ice, we can get at the residual Corsi that you produced *beyond* what your coach’s expectations could have been.

What I’m trying to do is get every forward in the NHL on even footing, to properly judge how they did in terms of shot differential against what a rational observer would have expected given the factors outside of the player’s direct control.

So, do these three independent variables hold statistically significant explanatory power? Yes. Shitloads.

Here’s my summary output. My sample includes the 336 forwards who played 30 games or more in 2012-13. Taken together, the three variables explain 75.1% of the variance in Corsi, leaving 24.9% of magic sauce in the residuals. The P-values of all three of my variables are stupendously low. That “2.41E-69” P-value for QualDiff means that if QualDiff’s true coefficient were zero (ie QualDiff had no explanatory power), the probability of seeing a relationship at least this strong by chance alone would be 2.41 × 10^-67%, a decimal with 66 zeros before the first non-zero digit. Let’s put that in perspective: if someone offered you a chance of winning a Snickers bar in the next second, with odds of 1 in “all of the seconds of time since the Big Bang”, you’d have an astonishingly better chance of winning that Snickers than of seeing a result like this from a QualDiff with no explanatory power in determining Corsi.
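A fit like this can be reproduced with ordinary least squares. The sketch below is only an illustration: the data are synthetic stand-ins generated around the coefficients quoted in this post (they are not the real 2012-13 player numbers), and the names `fit_ols`, `qual_diff`, `oz_start_pct` and `toi_per_game` are made up for the example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on X with an intercept column added.

    Returns (coefficients, r_squared, residuals); coefficients[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(np.sum(residuals ** 2))       # variance left unexplained
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total variance in y
    return coef, 1.0 - ss_res / ss_tot, residuals


# Synthetic stand-in data, NOT real NHL numbers (assumed distributions).
rng = np.random.default_rng(0)
n = 336
qual_diff = rng.normal(0.0, 3.0, n)
oz_start_pct = rng.normal(50.0, 5.0, n)
toi_per_game = rng.normal(12.0, 2.0, n)
noise = rng.normal(0.0, 3.0, n)
corsi = -25.4 + 0.95 * qual_diff + 0.22 * oz_start_pct + 1.18 * toi_per_game + noise

predictors = np.column_stack([qual_diff, oz_start_pct, toi_per_game])
coef, r_squared, residuals = fit_ols(predictors, corsi)
print("intercept and slopes:", coef)
print("share of variance explained:", r_squared)
```

The `residuals` array is the part the regression cannot explain, which is what the Adjusted Corsi idea is built on.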

The formula is: Expected Corsi = -25.4 + QualDiff * 0.95 + OZ Start % * 0.22 + 5×5 TOI per game * 1.18
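As a rough sketch, the formula can be turned into a couple of small Python functions. The function and argument names are hypothetical, the time-on-ice input is assumed to be 5×5 minutes per game (the variable introduced earlier), and the example player is invented purely to show the arithmetic.

```python
# Coefficients as quoted in the formula above.
INTERCEPT = -25.4
COEF_QUALDIFF = 0.95
COEF_OZ_START_PCT = 0.22
COEF_TOI = 1.18  # assumed to apply to 5x5 minutes per game


def expected_corsi(qual_diff, oz_start_pct, toi_per_game):
    """Corsi / 60 the model expects given a player's circumstances."""
    return (INTERCEPT
            + COEF_QUALDIFF * qual_diff
            + COEF_OZ_START_PCT * oz_start_pct
            + COEF_TOI * toi_per_game)


def adjusted_corsi(raw_corsi_per_60, qual_diff, oz_start_pct, toi_per_game):
    """Residual: raw Corsi / 60 minus what the circumstances predict."""
    return raw_corsi_per_60 - expected_corsi(qual_diff, oz_start_pct, toi_per_game)


# Hypothetical player: QualDiff of 2, 50% offensive zone starts,
# 14 minutes of 5x5 ice time per game, raw Corsi of +5.0 / 60.
example_expected = expected_corsi(2.0, 50.0, 14.0)
example_adjusted = adjusted_corsi(5.0, 2.0, 50.0, 14.0)
print(round(example_expected, 2), round(example_adjusted, 2))
```

A positive adjusted value means the player out-shot what his circumstances predicted; a negative one means he fell short.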

Interpreting the coefficients is fun:

- For QualDiff, the model suggests that for every unit increase in your Corsi Quality of Team or for every unit decrease in your Corsi Quality of Competition, your raw Corsi score should go up about 0.95 / 60 mins of 5×5 time. So, if one player has a QoT of 0 and a QoC of 0, for a QualDiff of 0, and another has QoT of 5 and QoC of 0, for a QualDiff of 5, the second player’s raw Corsi should be ~4.75 Corsi/60 mins higher than the first player’s.
- For each percentage point increase in your offensive zone starts, your Corsi is expected to go up ~0.22/60 mins.
- For each additional minute you play at evens per game, the model expects your Corsi to go up 1.18, meaning a player who plays 15 minutes will have a Corsi that is ~5.9 Corsi / 60 mins higher than a player who only plays 10 minutes.

Let’s have a look at the lists of players by Adjusted Corsi. For this I’ve separated the 336 players into quartiles to loosely represent 1st liners, 2nd liners, 3rd liners, and 4th line players. Note 336 into 30 teams is about 11.2 players per team, so this isn’t too far off a roster complement of 12 forwards per team. I’ve divided these up so you can compare players amongst their peers. Sidney Crosby is the highest-ranked first line player, but the fact that Marcus Foligno’s Adjusted Corsi is higher does not mean Foligno is a better player than Crosby. All it means is that Foligno outperformed his expectations given his circumstances as a third line player a bit more than Crosby did as a first line player.
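One way to sketch the quartile split in Python is below. It assumes the groups are formed by sorting on 5×5 ice time per game (the post does not spell out the sorting key), and the players, field names and `split_into_lines` helper are all hypothetical.

```python
def split_into_lines(players, groups=4):
    """Sort by ice time, cut into equal groups, rank each by Adjusted Corsi.

    players: list of dicts with 'name', 'toi_per_game', 'adjusted_corsi'.
    Returns a list of `groups` lists, most ice time first.
    """
    by_toi = sorted(players, key=lambda p: p["toi_per_game"], reverse=True)
    size, extra = divmod(len(by_toi), groups)
    lines, start = [], 0
    for g in range(groups):
        # Spread any remainder over the first few groups.
        end = start + size + (1 if g < extra else 0)
        ranked = sorted(by_toi[start:end],
                        key=lambda p: p["adjusted_corsi"], reverse=True)
        lines.append(ranked)
        start = end
    return lines


# Hypothetical players, invented for illustration only.
players = [
    {"name": "A", "toi_per_game": 15.2, "adjusted_corsi": 3.1},
    {"name": "B", "toi_per_game": 14.8, "adjusted_corsi": 6.0},
    {"name": "C", "toi_per_game": 13.5, "adjusted_corsi": -1.0},
    {"name": "D", "toi_per_game": 13.1, "adjusted_corsi": 2.2},
    {"name": "E", "toi_per_game": 11.9, "adjusted_corsi": 8.4},
    {"name": "F", "toi_per_game": 11.2, "adjusted_corsi": 0.5},
    {"name": "G", "toi_per_game": 9.0, "adjusted_corsi": -4.0},
    {"name": "H", "toi_per_game": 8.4, "adjusted_corsi": 1.7},
]
lines = split_into_lines(players)
for number, line in enumerate(lines, start=1):
    print(f"Line {number}:", [p["name"] for p in line])
```

With 336 players this gives four groups of 84, and each player is only ever ranked against others with a similar workload.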

The Oilers top line players all show well here, and that squares with my interpretation of last year: many nights it was like two different teams were on the ice, the first line and all the other crap. The fact that Hall/RNH/Eberle put up positive Corsi numbers while playing with such crappy players (other than themselves) and fighting through whatever weird system the Oilers were playing with last year is damn impressive.

Leafs fans may not want to peek at who was rated the worst third liner by a country mile. Oilers newcomer Boyd Gordon shows very well in the 3rd line mix, ranked 15th.

To me, this list is not only interesting, it should be diagnostic. If you’ve got a player who seems to be knocking the cover off the ball in his 4th line role, perhaps you could try him further up the lineup, where harder competition and the higher expectations that come with more ice time should present more of a challenge. There also seem to be players at the end of each list who probably should find an easier roster spot, or retire altogether.

**First Liners**

**Second Liners**

**Third Liners**

**Fourth Liners**

## 6 Comments

I think this is all fantastic and interesting work, but I think players on great teams are punished a little too much; ie Marchand, Hossa, Carter, the SJ bunch. A closer look at WOWYs doesn’t support some of it.

Out of curiosity, is there any reason why you used QualDiff rather than QoC and QoT in the regression? Would be interesting to see whether each component has an equal effect on expected Corsi.

Interesting work, but this seems very similar to what Steve and I did with regards to deltaCorsi (dCorsi):

http://www.pensionplanpuppets.com/2013/7/30/4573254/expansion-of-steves-sdi-study

http://www.pensionplanpuppets.com/2013/8/8/4582260/making-use-of-new-ideas-dcorsi

The only difference is that we included 5v5 TOI/GP later on (Steve did a while back, but hasn’t published his results yet) and that we used Hockey Analysis’ QOT/QOC stats, which exclude the player of interest’s effects on teammates’/opposition’s Corsi.

Hey Michael,

I’ve been working on something very very similar all summer long. Wrote about it a few times. I split up Corsi For and Corsi Against in an effort to get at player ability in more detail at either end of the ice, and found that the correlation between the two was negligible.

Either way – you’re basically coming at it from the same angle I have been – nice to see the work align.

Here’s a link to some of what I’ve worked on:

dCorsi comparisons

There are other links embedded in the article to expand on where it started when I looked at just “SDI” – or trying to focus in on Shut down D men.

Oh and the other thing I’m curious about is – are you using QoC and QoT Corsi? or +/-? If you’re using Corsi, part of your regression’s high correlation is due to the fact that the player’s Corsi is included in their QoT and QoC measurements.

That’s why I switched to using stats.hockeyanalysis.com’s measurement TMCF and TMCA – which strips out the player you’re looking at in particular. The resulting correlations aren’t as high but the data is better.

Hi Steve,

I’d seen you reference dCorsi but hadn’t actually read the methodology. At least the fact that we’ve approached this problem similarly validates the technique! I’d written a bit about this technique last season, and there were some very insightful comments at that time, available at: http://www.boysonthebus.com/2013/03/14/adjusting-corsi-for-zone-starts-and-quality-factors/. Thanks for pointing out where to find the neutralized Quality measures, as I’d often wanted to have something similar to use that was unbiased from the players’ own effects!