Over the last few weeks I’ve been experimenting with new formulas to determine how shot distance, shot type, and shot quantity can all be intertwined into a new shot-based statistic. Basically, I’m coming up with a way to weight any given shot taken on the ice based on its objective “danger” of going in. To do this, I’ve mined 5 years of data between 2007 and 2012. I’ve taken to calling this concept “Expected Goals”, as giving shot attempts a probability of going in also handily aggregates into how many goals you’d expect a team to create or allow given league-average goaltending.
In my last post, the observed data resulted in the following lines:
You can see that different shot types have different likelihoods of going in depending on distance. Not pictured above is the separate category of “rebounds”, which have a much higher chance of becoming goals than any non-rebound shot.
A quick aside on rebounds — I wanted to see whether rebounds within 3 seconds of saves, missed shots, and blocked shots really do exhibit similar shooting percentages from various distances. Perhaps it’s unwise to lump rebounds after all 3 of those events together, and since rebounds are such key scoring chances, it’s worth figuring out.
Here’s a chart that compares shooting percentages after all 3 types of events:
You can see that in the high-frequency rebound range, between about 7 and 15 feet, shot attempts within 3 seconds of saves, misses, and blocks all exhibit similar shooting rates, with a slight advantage for shots off saves (which makes sense, because the goalie is more likely to be down and immobilized). But the differences here are not enough to demand separate functions for these events, especially considering how relatively rare rebounds off misses and blocks are. This article will assume “rebounds” to include any shot attempt within 3 seconds of any of these previous thwarted shot attempts. Further, rebounds are considered discrete from other shot types: if I consider a shot a rebound, it is a rebound, and not a “wrist shot”, etc.
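To make the rebound rule concrete, here is a minimal Python sketch of how shot attempts could be labelled. The event fields (`time`, `team`, `result`, `shot_type`) are hypothetical and not the actual NHL.com schema. I'm also assuming the list holds only shot attempts from one game, sorted by time, and that a rebound has to come from the same team as the thwarted attempt.

```python
REBOUND_WINDOW_SECONDS = 3
THWARTED = {"save", "miss", "block"}  # outcomes that can produce a rebound


def label_shot_types(events):
    """Return one label per shot attempt: 'rebound' or the recorded shot type.

    events: list of dicts sorted by time, with hypothetical keys
    'time' (seconds), 'team', 'result' and 'shot_type'.
    """
    labels = []
    previous = None
    for event in events:
        is_rebound = (
            previous is not None
            and previous["team"] == event["team"]  # assumption: same attacking team
            and previous["result"] in THWARTED
            and event["time"] - previous["time"] <= REBOUND_WINDOW_SECONDS
        )
        # A rebound replaces the recorded shot type entirely.
        labels.append("rebound" if is_rebound else event["shot_type"])
        previous = event
    return labels


events = [
    {"time": 100, "team": "EDM", "result": "save", "shot_type": "wrist"},
    {"time": 102, "team": "EDM", "result": "goal", "shot_type": "backhand"},
    {"time": 110, "team": "EDM", "result": "miss", "shot_type": "slap"},
    {"time": 111, "team": "CGY", "result": "save", "shot_type": "snap"},
]
labels = label_shot_types(events)
```

In this made-up sequence only the second attempt counts as a rebound: it comes 2 seconds after a save by the same team, so its "backhand" label is discarded.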
Continuing on. Now that I’ve found the observed shooting percentages, I need mathematical functions to model them. It was a pretty routine affair: fitting different kinds of curves and settling on the ones that fit the observed data best. Most were second-order polynomials, two were logarithmic, one was a third-order polynomial, and another was just plain old linear. In all cases I required 400 shots at a given distance (in feet) before considering the sample large enough to model. For distances shorter than the nearest point that met that threshold, I took the total goals scored from those distances, divided by the unblocked shot attempts from the same distances, and applied that single rate as a flat line. Ditto for distances beyond the far threshold, except that where this pooled unblocked shooting percentage (USH%) came out higher than the last modeled point, I simply extended the last modeled point onward as a flat line.
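For anyone who wants to see the tail handling spelled out, here is a rough Python sketch of my reading of it. The names `build_shot_model`, `counts`, and `fitted_curve` are made up for illustration, and the fitted curve is passed in as a plain function rather than being fitted here. I'm assuming the modeled range runs from the nearest to the farthest distance with at least 400 shots, and that the pooled tails cover distances strictly outside that range.

```python
def pooled_rate(counts, distances):
    """Total goals divided by total unblocked attempts over the given distances."""
    goals = sum(counts[d][0] for d in distances)
    shots = sum(counts[d][1] for d in distances)
    return goals / shots if shots else 0.0


def build_shot_model(counts, fitted_curve, min_shots=400):
    """Return a function mapping distance (feet) to a goal probability.

    counts: {distance_ft: (goals, unblocked_attempts)} for one shot type.
    fitted_curve: hypothetical callable for the fitted polynomial or log curve.
    """
    modeled = sorted(d for d, (_, shots) in counts.items() if shots >= min_shots)
    near, far = modeled[0], modeled[-1]

    # Flat pooled rate for the thin samples in close.
    near_rate = pooled_rate(counts, [d for d in counts if d < near])
    # Same idea far out, but never above the last modeled point.
    far_rate = min(pooled_rate(counts, [d for d in counts if d > far]),
                   fitted_curve(far))

    def probability(distance):
        if distance < near:
            return near_rate
        if distance > far:
            return far_rate
        return fitted_curve(distance)

    return probability


# Made-up counts and a made-up linear curve, purely to exercise the logic.
counts = {5: (10, 100), 6: (20, 200), 10: (50, 500), 11: (40, 500),
          12: (30, 500), 40: (30, 100), 41: (20, 100)}
model = build_shot_model(counts, lambda d: 0.2 - 0.01 * d)
```

With these invented numbers the far tail's pooled rate (50 goals on 200 attempts) is higher than the curve's value at 12 feet, so the model holds the 12-foot value flat from there on.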
That’s a lot of words to prep a picture. Here it is:
Each point on each line in this graph represents the modeled probability that a shot taken from that distance goes in. In all cases, the modeled lines fit the observed data with R-squared values above 0.97.
With this graph in mind, you can now watch a hockey game the way Cypher reads the green code of The Matrix. Oh hey, a 7 foot wrap-around, that’s got about a 4% chance of going in. A rebound from 7 feet? Now that’s got a 30% chance of going in. Add up all the probabilities of the shot attempts your team takes, divide by the combined probabilities of the shots taken by both teams, and that’s your team’s Expected Goal %.
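As a quick illustration of that last step, here is a small Python sketch. The function name `expected_goal_share` is my own, and the opponent's shot probabilities are invented; only the 4% wrap-around and 30% rebound figures come from the graph.

```python
def expected_goal_share(team_shot_probs, opponent_shot_probs):
    """Team's share of the combined expected goals, as a fraction from 0 to 1."""
    team_xg = sum(team_shot_probs)
    opponent_xg = sum(opponent_shot_probs)
    total = team_xg + opponent_xg
    if total == 0:
        return None  # no shot attempts by either team
    return team_xg / total


# Our team: a 7 foot wrap-around (about 4%) and a 7 foot rebound (about 30%).
team = [0.04, 0.30]
# Opponent: three hypothetical low-danger attempts, invented for the example.
opponent = [0.04, 0.04, 0.04]

share = expected_goal_share(team, opponent)
print(f"Expected Goal %: {100 * share:.1f}")
```

Each shot's probability is also its contribution to expected goals, so summing them gives the goals you'd expect against league-average goaltending.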
Now why even do this? Some rationale:
- It helps to even out the randomness of goaltending seen in goal scoring ratios. In baseball there’s an entire category of statistics for pitching that are independent of defence: they take out the things a pitcher has little influence over and concentrate on the things he can control. This expected goal measure takes out something a player has little control over (his goaltender) and concentrates on the things he can control, namely attacking and defending.
- It also takes out the randomness of who’s doing the shooting. We all know that 50 foot shots taken by Stamkos and Ryan Smyth are not created equal. But if I’m a defender, and I’ve managed to keep either of those players to the periphery enough to force a 45 foot shot, I’ve successfully defended my territory, regardless of who’s taking the shot. This formula wouldn’t care if that defender only faced players with the shooting talent of Crosby, whereas his traditional on-ice save% would plummet and his plus-minus would look like crap.
- Picture a team attacking and a team defending. What is the defending team’s goal on the play? To deny scoring chances and get the puck back. How does it deny scoring chances? Usually this involves keeping the attacking team to the outside and not giving them chances to get in good position to score. Small things also come into play: a defender suddenly closing on an attacker with the puck in a scoring area can force him into a less than optimal shot selection, such as hounding him enough that he switches to his backhand instead of getting a clean snap shot off, thereby lessening the probability that his shot will go in. I picture this metric measuring ground gained, like the front lines and trenches of World War I. Each attacking team wants to get as close to the enemy as possible to increase their odds of getting a kill shot. They also want to use the best weapon possible when they get there; a bazooka is worth more than a BB gun. The defending team wants to keep the enemy as far away as possible, and also wants to influence what weaponry can be used against it. This metric measures how good each team is at creating and denying danger.
- It has the added benefit of expressing things in a unit everyone can understand: goals. It’s intuitive to communicate and makes analysis fairly simple. It’s not expressed in widgets / 60 minutes or some other “scary” or proprietary concept. Further, compare a team’s actual goals to its expected goals and you start to get a sense of what influence non-average goaltending has had on a team’s fortunes. This is especially true of this year’s Oilers, as we will discover later. This works at both the team and player level, just like how traditional shot metrics like Corsi or Fenwick can be applied. If we think a player should have been on for 10 goals for and 15 goals against, but he’s actually been on for 10 goals for and 5 goals against, we know he’s been incredibly lucky to get great goaltending behind him.
I’ve taken all of the formulas and applied them to the 19 Oilers games that have been played to date. At the team level, we know the following to be true at even strength:
- The Oilers have created 740 total shot attempts and given up 876, for a Corsi% of 45.8%.
- They’ve created 538 unblocked shot attempts, and given up 626, for a Fenwick% of 46.2%.
- They’ve actually scored 34 goals, and given up 52, for a goals percentage of 39.5%.
- Applying the new expected goals formula, I’d have expected them to score 31.7 goals and give up 34.7. This is an expected goals percentage of 47.7%.
This means that the Oilers have scored about 2 more goals at even strength than I’d expect, and have given up about 17 more than I’d expect. They have enough shooting talent for me to believe them scoring a couple more than you’d expect, but giving up just shy of 50% more goals than I’d expect is a massive discrepancy. But it kind of does feel like the Oilers goalies have been giving up about 1 freebie a game, even when they play well, and these numbers give some credence to that notion.
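If you want to check the arithmetic, all four percentages use the same "for divided by for plus against" calculation. Here is a short Python sketch using the team totals quoted above; the helper name `share` is just mine.

```python
def share(created, allowed):
    """Percentage of the combined total that the team itself created."""
    return 100 * created / (created + allowed)


corsi_pct = share(740, 876)             # all shot attempts
fenwick_pct = share(538, 626)           # unblocked shot attempts
goals_pct = share(34, 52)               # actual goals
expected_goals_pct = share(31.7, 34.7)  # expected goals

# Actual minus expected, for and against.
extra_goals_for = 34 - 31.7
extra_goals_against = 52 - 34.7
# Goals against above expectation, as a percentage of expected goals against.
excess_against_pct = 100 * extra_goals_against / 34.7

print(f"Corsi% {corsi_pct:.1f}, Fenwick% {fenwick_pct:.1f}, "
      f"Goals% {goals_pct:.1f}, Expected goals% {expected_goals_pct:.1f}")
print(f"Goals for above expected: {extra_goals_for:.1f}")
print(f"Goals against above expected: {extra_goals_against:.1f} "
      f"({excess_against_pct:.1f}% more than expected)")
```

The last figure is where "just shy of 50% more" comes from: roughly 17.3 extra goals against on an expectation of 34.7.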
But what about players? Here’s how we’d have expected them to do:
This shows the total expected goals for and against each player 19 games into the season, and sorts by expected goals for percentage (EGF%). Some thoughts:
- Generally this demerits players that you’d intuitively think were playing poor defence. Gagner, Larsen, and Fedun all have EGF% that are below their normal Fenwick%, suggesting they’ve been giving up better chances than their shot differentials would suggest.
- Ryan Smyth scored a lot higher in the old version of this formula that didn’t discern by shot type. It turns out he was firing a lot of low-percentage wrap-arounds at the net instead of getting high quality rebounds, etc., so his EGF% is now much more moderate.
- Players like Arcobello, Joensuu (small sample), Jones, Belov, Petry, and Perron have higher EGF% than Fenwick%, suggesting they’ve been on the ice for higher quality chances for or lower quality chances against than shot differentials would account for.
- Only a few players have been able to sustain higher than average rates of EGF and lower than average rates of EGA — Arcobello, RNH, Petry.
- In terms of rates of expected goals for, the most dangerous players have been Eberle, Arcobello, and Perron at forward, along with Justin Schultz and Belov on defence.
- The worst defenders have been Ryan Smyth, Yakupov, Justin Schultz, Ference, and Perron.
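For the curious, here is a rough Python sketch of how per-player totals like these could be tallied. I'm assuming the on-ice convention used for Corsi and Fenwick: every skater on the ice for the shooting team is credited with the shot's probability as expected goals for (EGF), and every skater on the defending team gets it as expected goals against (EGA). The function name, the tuple layout, and the players "A", "B", and "X" are all made up.

```python
from collections import defaultdict


def on_ice_expected_goals(shots):
    """Tally EGF, EGA and EGF% per player.

    shots: iterable of (probability, attackers_on_ice, defenders_on_ice),
    a hypothetical layout chosen for this example.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # player -> [EGF, EGA]
    for probability, attackers, defenders in shots:
        for player in attackers:
            totals[player][0] += probability
        for player in defenders:
            totals[player][1] += probability

    table = {}
    for player, (egf, ega) in totals.items():
        combined = egf + ega
        table[player] = {
            "EGF": egf,
            "EGA": ega,
            "EGF%": 100 * egf / combined if combined else None,
        }
    return table


# Two invented shot attempts with placeholder players.
shots = [
    (0.30, ["A", "B"], ["X"]),
    (0.10, ["X"], ["A"]),
]
table = on_ice_expected_goals(shots)
# Sort by EGF%, best first, like the table in this post.
ranked = sorted(table, key=lambda p: table[p]["EGF%"], reverse=True)
```

Dividing EGF and EGA by ice time would give the rates mentioned in the bullets, but that needs time-on-ice data this sketch doesn't model.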
It’s important to keep some things in mind. This doesn’t account for any contextual factors such as quality of team, quality of competition, or zone starts. It’s just like raw shot differentials in that regard. Ference may look like he’s getting buried compared to, say, Gazdic, but Gazdic has been facing much easier competition than Ference. This is also hampered by what’s counted on an NHL.com score sheet. I’d love to pull out things like breakaways, 2-on-1s, one-timers, etc., but you just can’t at this point in time. Even if they were to start counting that kind of thing tomorrow, it’d take a year or two or three to get enough data to even figure out what effect those shot types may have. These formulas assume a wrist shot taken at 12 feet during a zone pressure is the same as one taken on a breakaway. It’s the best we can do at this point.
Now that I’ve formulated this the way I want, I can start proper testing for sustainability and predictive power. It may turn out to be a complete waste of time, but I feel like I’ve learned some things along the way.