One of the major flaws identified with Corsi as a metric is that it is very prone to influence from factors outside of the player’s control: where he starts on the ice, who he plays with, and who he plays against are decisions made by his coach and his general manager. Of course James Neal looks terrific in Corsi; he plays the whole bloody game with either Sidney Crosby or Evgeni Malkin. Of course the Sedins have good Corsis; they start in the offensive end more often than the opposition’s netminder does. I’ve been trying to think of a way to adjust Corsi to take those factors out: to separate the portion of a player’s Corsi that is explained simply by the circumstances he finds himself in from the portion that is residual talent/results.
To review, Corsi is simply a shot-attempt metric: how many shot attempts your team takes at the opponent’s net while the player is on the ice, minus how many shot attempts the opposition takes at yours, expressed per 60 minutes of ice time.
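As a minimal sketch of that definition, Corsi per 60 is the on-ice attempt differential divided by minutes played, scaled to 60 minutes. The function name and the counts in the example are hypothetical, and the sketch assumes the counts have already been restricted to whatever game state the data source uses.

```python
def corsi_per60(shot_attempts_for: int, shot_attempts_against: int, toi_minutes: float) -> float:
    """On-ice shot-attempt differential, scaled to 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("toi_minutes must be positive")
    return (shot_attempts_for - shot_attempts_against) * 60.0 / toi_minutes


# Hypothetical example: 55 attempts for, 45 against, over 75 minutes of ice time.
print(corsi_per60(55, 45, 75.0))  # 8.0
```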
I compiled 5 seasons of data using behindthenet.ca, only looking at forwards who played at least 30 games in a season. My original design used three metrics to try to explain a player’s raw Corsi:
- Corsi Quality of Competition – the average raw Corsi of the opponents a player faces
- Corsi Quality of Team – the average raw Corsi of the teammates a player plays with
- Offensive Zone Start % – how often his coach starts him in the offensive zone, as a percentage of his zone starts
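The sample restriction is simple to express in code. The record type below is a hypothetical layout (the field names and the "F"/"D" position coding are my assumptions, not behindthenet.ca’s actual columns); the filter keeps forwards with at least 30 games in a season, matching the cut described above.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    # Hypothetical record layout; a real export will use its own column names.
    name: str
    season: str
    position: str       # "F" for forwards, "D" for defencemen (assumed coding)
    games_played: int
    raw_corsi: float    # on-ice Corsi per 60
    corsi_qoc: float    # Corsi Quality of Competition
    corsi_qot: float    # Corsi Quality of Team
    off_zs_pct: float   # Offensive Zone Start %, on a 0-100 scale


def sample_forwards(rows: list[PlayerSeason], min_gp: int = 30) -> list[PlayerSeason]:
    """Keep forwards who played at least `min_gp` games in that season."""
    return [r for r in rows if r.position == "F" and r.games_played >= min_gp]
```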
It was obvious from the outset that these variables have a massive amount of explanatory power with respect to a player’s raw Corsi. The only hitch comes when you realize that Corsi QoC correlates POSITIVELY with a player’s Corsi, i.e. the more difficult a player’s competition, the better he seems to do. This is counter-intuitive, and is likely due to recursive effects of in-game strategy: coaches tend to play their best players against each other. To get around this while still incorporating the insights of Corsi QoC, I decided to take a simple differential: Quality of Team minus Quality of Competition. And why not? If the average Corsi of a player’s linemates is 2, for instance, and the opposition he plays against also averages a Corsi of 2, shouldn’t that logically mean he’s been placed in a neutral situation, i.e. a difficulty factor of 0? If a player’s linemates have an average Corsi of 20, and he plays against opposition with an average Corsi of 0, doesn’t that mean he’s been placed in an incredibly fortunate set of circumstances (a difficulty factor of +20)? In my analysis, I refer to this new variable as QualDiff.
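The differential can be written as a one-line function (the name `qual_diff` is hypothetical); the two calls reproduce the neutral and the fortunate examples from the paragraph above.

```python
def qual_diff(corsi_qot: float, corsi_qoc: float) -> float:
    """QualDiff: linemates' average Corsi minus opponents' average Corsi.

    Positive values mean a favourable deployment; negative values mean the
    player is outgunned by the opposition he faces.
    """
    return corsi_qot - corsi_qoc


print(qual_diff(2.0, 2.0))   # 0.0  -> neutral situation
print(qual_diff(20.0, 0.0))  # 20.0 -> very fortunate situation
```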
I then regressed raw Corsi on QualDiff and Offensive Zone Start % for each of the last five seasons individually, and for all five seasons combined. The results:
Let’s concentrate on the Combined equation, which is greyed above. The formula would then be:
Expected raw Corsi = -11.91 + QualDiff * 1.00 + Off ZS% * 0.24
So, a player with QualDiff of 0, and a zone start rate of 50% would have an expected Raw Corsi of:
Expected raw Corsi = -11.91 + 0 * 1.00 + 50 * 0.24 = 0.09 (using the rounded coefficients shown)
This is so close to zero that no player with more than 10 GP this year has a raw Corsi closer to zero. The R-squared of the combined regression is 0.61, meaning 61% of the variation in raw Corsi can be explained by variation in these two variables. The p-values of both coefficients are effectively zero.
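For readers who want to reproduce this kind of fit, here is a minimal ordinary-least-squares sketch using only the standard library: it solves the normal equations for an intercept plus two predictors and reports R-squared (it does not compute p-values). The data are synthetic, generated from the combined coefficients plus random noise, so this only demonstrates that the fitting procedure recovers known coefficients; it is not the behindthenet.ca dataset, and the function name is hypothetical.

```python
import random


def ols_two_predictors(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by OLS; return (b0, b1, b2, r_squared)."""
    n = len(y)
    # Normal equations (X^T X) b = X^T y, with design matrix columns [1, x1, x2].
    cols = [[1.0] * n, list(x1), list(x2)]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    # Solve the 3x3 system by Gauss-Jordan elimination with partial pivoting.
    m = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    b0, b1, b2 = (m[i][3] / m[i][i] for i in range(3))
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    y_mean = sum(y) / n
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return b0, b1, b2, 1.0 - ss_res / ss_tot


# Synthetic data only: random deployments, Corsi generated from the combined
# coefficients plus noise, then refit to check the coefficients come back.
rng = random.Random(42)
qd = [rng.gauss(0.0, 5.0) for _ in range(2000)]
ozs = [rng.gauss(50.0, 8.0) for _ in range(2000)]
corsi = [-11.91 + 1.00 * a + 0.24 * b + rng.gauss(0.0, 4.0) for a, b in zip(qd, ozs)]
b0, b1, b2, r2 = ols_two_predictors(qd, ozs, corsi)
print(f"intercept={b0:.2f} QualDiff={b1:.2f} OffZS%={b2:.2f} R^2={r2:.2f}")
```

In practice a statistics package would also give standard errors and p-values; the hand-rolled solver is only here to keep the sketch dependency-free.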
The formula suggests that a player’s Corsi increases by 1 for every one-unit increase in his Quality of Team, and by 1 for every one-unit decrease in his Quality of Competition (both through QualDiff). It increases by roughly 1 for every 4 additional percentage points of offensive zone starts (1 / 0.24 ≈ 4.2). A player with a QualDiff of zero whose zone starts all come in the defensive end is expected to have a Corsi of -11.91, which is the intercept above.
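The combined formula and the marginal effects described above can be checked directly. This sketch hard-codes the rounded combined coefficients (with these rounded values the neutral 0 QualDiff / 50% case comes out at 0.09); the function and constant names are hypothetical.

```python
# Combined five-season coefficients, as rounded in the formula above.
INTERCEPT = -11.91
COEF_QUALDIFF = 1.00
COEF_OFF_ZS = 0.24


def expected_corsi(qual_diff: float, off_zs_pct: float) -> float:
    """Expected raw Corsi from QualDiff and Offensive Zone Start % (0-100 scale)."""
    return INTERCEPT + COEF_QUALDIFF * qual_diff + COEF_OFF_ZS * off_zs_pct


print(round(expected_corsi(0.0, 0.0), 2))   # -11.91: every zone start in the defensive end
print(round(expected_corsi(0.0, 50.0), 2))  # 0.09: neutral deployment
# Marginal effects: +1 QualDiff adds 1.00; +4 points of offensive zone starts adds 0.96.
print(round(expected_corsi(1.0, 50.0) - expected_corsi(0.0, 50.0), 2))  # 1.0
print(round(expected_corsi(0.0, 54.0) - expected_corsi(0.0, 50.0), 2))  # 0.96
```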
Let’s walk through how we can now apply this formula to adjust raw Corsi. First, work out each player’s Expected Corsi using the formula above and his two variables. Then simply take the difference between his Actual Corsi and his Expected Corsi, i.e. how much better or worse did this player do than expected? I’m calling this “Adjusted Corsi”. All we’re doing is removing the portion of a player’s Corsi that can be explained by his zone starts and quality factors and seeing what’s left over.
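Those steps can be sketched end to end: compute Expected Corsi, subtract it from raw Corsi, keep players with more than 10 GP, and sort. The player records and their field names ("gp", "raw", "qual_diff", "off_zs") are hypothetical, and the coefficients are the rounded combined ones.

```python
def expected_corsi(qual_diff, off_zs_pct):
    # Combined formula with the rounded coefficients.
    return -11.91 + 1.00 * qual_diff + 0.24 * off_zs_pct


def adjusted_corsi(raw_corsi, qual_diff, off_zs_pct):
    """Actual raw Corsi minus the Corsi expected from deployment alone."""
    return raw_corsi - expected_corsi(qual_diff, off_zs_pct)


def rank_adjusted(players, min_gp=10, n=10):
    """Return (top n, bottom n) as (Adjusted Corsi, name) pairs for players with more than min_gp games."""
    scored = [
        (adjusted_corsi(p["raw"], p["qual_diff"], p["off_zs"]), p["name"])
        for p in players
        if p["gp"] > min_gp
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best Adjusted Corsi first
    return scored[:n], scored[::-1][:n]  # bottom list is ordered worst first


# Hypothetical players; Player C is excluded for having too few games.
players = [
    {"name": "Player A", "gp": 40, "raw": 5.0, "qual_diff": -4.0, "off_zs": 45.0},
    {"name": "Player B", "gp": 38, "raw": 12.0, "qual_diff": 10.0, "off_zs": 60.0},
    {"name": "Player C", "gp": 8, "raw": 25.0, "qual_diff": 0.0, "off_zs": 50.0},
]
top, bottom = rank_adjusted(players, n=2)
print([(round(score, 2), name) for score, name in top])
```

Player A’s raw Corsi is lower than Player B’s, but his tougher deployment means he finishes well ahead once deployment is stripped out.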
If we apply the formula to this year’s data among players with more than 10 GP, here are the top and bottom 10 players, along with any qualifying Oilers:
Using this table, you can follow how each Adjusted Corsi is derived from the formulas above. We can see that Jordan Eberle actually has the 9th-best Adjusted Corsi in the NHL right now. This is because he plays with crappy teammates (i.e. the Oilers), faces relatively difficult competition, gets middling zone starts, and still has a raw Corsi of +3.3. Since his Expected Corsi is -9.87, his Adjusted Corsi is 3.3 - (-9.87) = 13.17. This puts into context how strong the Oilers’ kid line is versus the rest of the league, considering the talent they have been surrounded with.
Many of the Oilers do quite well here, while others do terribly. Adjusting for zone starts and quality factors actually moves Lennart Petrell from last place in the league by raw Corsi to 13th-last: he is expected to have a Corsi of -23.57 but actually has a Corsi of -38.8, for an Adjusted Corsi of -15.23.
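As a quick arithmetic check on the two worked examples, the raw and expected values quoted in the text reproduce the stated Adjusted Corsis. The dictionary name is hypothetical; the numbers come straight from the paragraphs above.

```python
# (raw Corsi, Expected Corsi) as quoted in the text.
quoted = {
    "Jordan Eberle": (3.3, -9.87),
    "Lennart Petrell": (-38.8, -23.57),
}

# Adjusted Corsi = raw Corsi - Expected Corsi.
adjusted_by_player = {name: raw - expected for name, (raw, expected) in quoted.items()}
for name, value in adjusted_by_player.items():
    raw, expected = quoted[name]
    print(f"{name}: {raw} - ({expected}) = {value:.2f}")
```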
I’d expect the formula for defencemen to be different; I will perform that analysis next.