## Piecing the Shot Puzzle Together

Over the last few weeks I’ve been experimenting with new formulas to determine how shot distance, shot type, and shot quantity can all be intertwined into a new shot-based statistic.  Basically, I’m coming up with a way to weight any given shot taken on the ice based on its objective “danger” of going in.  To do this, I’ve mined 5 years of data between 2007 and 2012.  I’ve taken to calling this concept “Expected Goals”, as giving shot attempts a probability of going in also handily aggregates into how many goals you’d expect a team to create or allow given league-average goaltending.

In my last post, the observed data resulted in the following lines:

You can see that different shot types have different likelihoods of going in depending on distance.  Not pictured above is the separate concept of “rebounds”, which have a much higher chance of becoming goals than non-rebounds.

A quick aside on rebounds — I wanted to see whether rebounds within 3 seconds of saves, missed shots, and blocked shots really do exhibit similar shooting percentages from various distances.  Perhaps it’s unwise to lump rebounds after all 3 of those events together, and since rebounds are such key scoring chances, it’s worth figuring out.

Here’s a chart that compares shooting percentages after all 3 types of events:

You can see that in the high-frequency rebound range, between about 7 and 15 feet, shot attempts within 3 seconds of saves, misses, and blocks all exhibit similar shooting rates, with a slight advantage for shots off saves (which makes sense, because the goalie is more likely to be down and immobilized).  But the differences are not large enough to demand separate functions for these events, especially considering how relatively rare rebounds off misses and blocks are.  This article will take “rebounds” to include any shot attempt within 3 seconds of any of these previous thwarted shot attempts.  Further, rebounds are treated as distinct from other shot types: if I consider a shot a rebound, it is a rebound, and not a “wrist shot”, etc.
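To make that rebound rule concrete, here is a minimal Python sketch. The `Event` record, its field names, and the event labels are made up for illustration and are not the format of any real play-by-play feed. Treating "within 3 seconds" as inclusive and clearing the window after a goal are also assumptions.

```python
from dataclasses import dataclass

REBOUND_WINDOW_SECONDS = 3            # window used in the article
THWARTED = {"SAVE", "MISS", "BLOCK"}  # hypothetical labels for thwarted attempts
ATTEMPTS = THWARTED | {"GOAL"}


@dataclass
class Event:
    seconds: int    # elapsed game time in seconds (hypothetical field)
    kind: str       # "SAVE", "MISS", "BLOCK", "GOAL", or a non-shot event
    shot_type: str  # e.g. "wrist", "slap", "backhand"


def label_shot_types(events):
    """Return one label per shot attempt, using "rebound" in place of the
    recorded shot type when the attempt follows a thwarted attempt closely."""
    labels = []
    last_thwarted = None  # time of the most recent save, miss, or block
    for ev in sorted(events, key=lambda e: e.seconds):
        if ev.kind not in ATTEMPTS:
            continue
        is_rebound = (
            last_thwarted is not None
            and ev.seconds - last_thwarted <= REBOUND_WINDOW_SECONDS
        )
        # A rebound is only ever a rebound, never also a "wrist shot".
        labels.append("rebound" if is_rebound else ev.shot_type)
        # A goal stops play, so it clears the window (assumption).
        last_thwarted = ev.seconds if ev.kind in THWARTED else None
    return labels
```

Each label can then be used to pick which shot-type curve a given attempt is scored against.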

Continuing on.  Now that I’ve found the observed shooting percentages, I’ll need to create mathematical functions to model them.  It was a pretty routine affair, fitting different kinds of lines and settling on the ones that fit the observed data best.  Most were second order polynomials, two were logarithmic, one was a third order polynomial, and another was just plain old linear.  In all cases I required at least 400 shots at a given distance (in feet) before considering the sample large enough to model.  For distances shorter than that threshold, I took the total goals scored from those distances, divided by the unblocked shot attempts from the same distances, and applied that rate as a uniform curve.  Ditto for distances beyond the far sample-size threshold, except that in cases where the unblocked shooting percentage (USH%) was higher than the last modeled point, I just extended the last modeled point onwards uniformly.
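Here is a rough sketch of that fitting procedure for one shot type, assuming NumPy and a second order polynomial. The function names and the one-row-per-foot input layout are illustrative assumptions, not the original workbook. Pooling everything strictly inside the near threshold is one reading of the rule described above.

```python
import numpy as np

MIN_SHOTS = 400  # sample-size threshold quoted in the article


def pooled_rate(goals, shots, fallback):
    """Goals divided by shots over a pooled range, or `fallback` if it is empty."""
    total = shots.sum()
    return goals.sum() / total if total > 0 else fallback


def fit_shot_curve(distance, shots, goals, degree=2):
    """Return a function mapping distance (feet) to a modeled goal probability.

    distance, shots, goals: equal-length sequences, one entry per foot.
    """
    distance = np.asarray(distance, dtype=float)
    shots = np.asarray(shots, dtype=float)
    goals = np.asarray(goals, dtype=float)

    # Only distances with enough shots feed the polynomial fit.
    enough = shots >= MIN_SHOTS
    near, far = distance[enough].min(), distance[enough].max()
    poly = np.poly1d(
        np.polyfit(distance[enough], goals[enough] / shots[enough], degree)
    )

    # Closer than the near threshold: one pooled rate, applied uniformly.
    short = distance < near
    near_rate = pooled_rate(goals[short], shots[short], poly(near))

    # Beyond the far threshold: pooled rate, but never higher than the
    # last modeled point, which is carried forward instead.
    long_range = distance > far
    far_rate = min(
        pooled_rate(goals[long_range], shots[long_range], poly(far)), poly(far)
    )

    def predict(d):
        d = np.asarray(d, dtype=float)
        return np.where(d < near, near_rate,
                        np.where(d > far, far_rate, poly(d)))

    return predict
```

Swapping in a logarithmic, cubic, or linear form for a particular shot type would only change the line that builds `poly`.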

That’s a lot of words to prep a picture.  Here it is:

Each point on each line in this graph represents a modeled probability of a shot taken from that distance going in.  In all cases, the modeled lines adhered to the observed data with R-squared values of over 0.97.
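For anyone wanting to check a fit the same way, here is a small sketch of the usual R-squared (coefficient of determination) calculation. It assumes the figure quoted above compares observed and modeled shooting percentages distance by distance, which the post does not spell out.

```python
def r_squared(observed, modeled):
    """Coefficient of determination between observed and modeled values."""
    mean_obs = sum(observed) / len(observed)
    # Variation the model fails to explain.
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    # Total variation in the observed data.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A value near 1 means the modeled line tracks the observed percentages closely. A model that only ever predicted the overall average would score 0.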

With this graph in mind, you can now watch a hockey game the way Cypher reads the green code of The Matrix.  Oh hey, a 7 foot wrap-around, that’s got about a 4% chance of going in.  A rebound from 7 feet?  Now that’s got a 30% chance of going in.  Add up all the probabilities of the shot attempts your team takes, divide by the combined probabilities of the shots taken by both teams, and that’s your team’s Expected Goal %.
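In code, that aggregation is only a couple of sums. This is a minimal sketch: the function names are mine, and the opponent's shot probabilities in the example are invented purely for illustration (the 30% and 4% figures are the ones quoted above).

```python
def expected_goals(shot_probabilities):
    """Expected goals: the sum of each attempt's probability of going in."""
    return sum(shot_probabilities)


def expected_goal_pct(team_probs, opponent_probs):
    """A team's share of the combined expected goals of both teams."""
    xgf = expected_goals(team_probs)
    xga = expected_goals(opponent_probs)
    return xgf / (xgf + xga)


# Our team: a 7 foot rebound and a 7 foot wrap-around.
ours = [0.30, 0.04]
# Opponent: two low-percentage attempts (made-up values).
theirs = [0.04, 0.04]

print(f"Expected Goal %: {expected_goal_pct(ours, theirs):.1%}")
```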

Now why even do this?  Some rationale:

• It helps to even out the randomness of goaltending seen in goal scoring ratios.  In baseball there’s an entire category of pitching statistics that are independent of defence: these take out the things that a pitcher has little influence over, and concentrate on the things he can control.  This expected goal measure takes out something a player has little control over (his goaltender) and concentrates on the things he can control, namely attacking and defending.
• It also takes out the randomness of who’s doing the shooting.  We all know that 50 foot shots taken by Stamkos and Ryan Smyth are not created equal.  But if I’m a defender, and I’ve managed to keep either of those players to the periphery enough to force a 45 foot shot, I’ve successfully defended my territory, regardless of who’s taking the shot.  This formula wouldn’t care if that defender only faced players with the shooting talent of Crosby, whereas his traditional on-ice save% would plummet and his plus-minus would look like crap.
• Picture a team attacking and a team defending.  What is the defending team’s goal on the play?  To deny scoring chances and get the puck back.  How does it deny scoring chances?  Usually this involves keeping the attacking team to the outside and not giving them chances to get in good position to score.  Small things also come into play, such as a defender suddenly needing to cover an attacker with the puck in a scoring area and hounding him enough that he switches to his backhand instead of getting a clean snap shot off, thereby lessening the probability that his shot will go in.  I picture this metric measuring ground gained, like the front lines and trenches of World War I.  Each attacking team wants to get as close to the enemy as possible to increase its odds of getting a kill shot.  It also wants to use the best weapon possible when it gets there: a bazooka is worth more than a BB gun.  The defending team wants to keep the enemy as far away as possible, and also wants to influence what weaponry can be used against it.  This metric measures how good each team is at creating and denying danger.
• It has the added benefit of expressing things in a unit everyone can understand: goals.  It’s intuitive to communicate and makes analysis fairly simple.  It’s not expressed in widgets / 60 minutes or some other “scary” or proprietary concept.  Further, compare a team’s actual goals to its expected goals and you start to get a sense of what influence non-average goaltending has had on a team’s fortunes.  This is especially true of this year’s Oilers, as we will discover later.  This works at both the team and player level, just as traditional shot metrics like Corsi or Fenwick can be applied at either level.  If we think a player should have been on for 10 goals for and 15 goals against, but he’s actually been on for 10 goals for and 5 goals against, we know he’s been incredibly lucky to get great goaltending behind him.

I’ve taken all of the formulas and applied them to the 19 Oilers games that have been played to date.  At the team level, we know the following to be true at even strength:

• The Oilers have created 740 total shot attempts and given up 876, for a Corsi% of 45.8%.
• They’ve created 538 unblocked shot attempts, and given up 626, for a Fenwick% of 46.2%.
• They’ve actually scored 34 goals, and given up 52, for a goals percentage of 39.5%.
• Applying the new expected goals formula, I’d have expected them to score 31.7 goals and give up 34.7.  This is an expected goals percentage of 47.7%.

This means that the Oilers have scored about 2 more goals at even strength than I’d expect, and have given up about 17 more than I’d expect.  They have enough shooting talent for me to believe them scoring a couple more than you’d expect, but giving up just shy of 50% more goals than I’d expect is a massive discrepancy.  But it kind of does feel like the Oilers goalies have been giving up about 1 freebie a game, even when they play well, and these numbers give some credence to that notion.
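The arithmetic behind those percentages and gaps can be reproduced in a few lines of Python. The `share` helper is just a name chosen for this sketch, and every input is a figure quoted above.

```python
def share(events_for, events_against):
    """Fraction of the combined total that belongs to the team."""
    return events_for / (events_for + events_against)


corsi_pct = round(100 * share(740, 876), 1)      # 45.8
fenwick_pct = round(100 * share(538, 626), 1)    # 46.2
goals_pct = round(100 * share(34, 52), 1)        # 39.5
xg_pct = round(100 * share(31.7, 34.7), 1)       # 47.7

# Actual minus expected goals, for and against.
extra_scored = 34 - 31.7
extra_allowed = 52 - 34.7

# Goals allowed above expectation, relative to expectation.
excess_allowed_ratio = extra_allowed / 34.7

print(corsi_pct, fenwick_pct, goals_pct, xg_pct)
print(round(extra_scored, 1), round(extra_allowed, 1),
      f"{excess_allowed_ratio:.1%}")
```

The last ratio is where the "just shy of 50% more" figure comes from.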

But what about players?  Here’s how we’d have expected them to do:

This shows the total expected goals for and against each player 19 games into the season, and sorts by expected goals for percentage (EGF%). Some thoughts:

• Generally this demerits players that you’d intuitively think were playing poor defence. Gagner, Larsen, and Fedun all have EGF% that are below their normal Fenwick%, suggesting they’ve been giving up better chances than their shot differentials would suggest.
• Ryan Smyth scored a lot higher in the old version of this formula, which didn’t distinguish by shot type. It turns out he was firing a lot of low-percentage wrap-arounds at the net instead of getting high quality rebounds, etc., so his EGF% is now much more moderate.
• Players like Arcobello, Joensuu (small sample), Jones, Belov, Petry, and Perron have higher EGF% than Fenwick%, suggesting they’ve been on the ice for higher quality chances for or lower quality chances against than shot differentials would account for.
• Only a few players have been able to sustain higher than average rates of EGF and lower than average rates of EGA — Arcobello, RNH, Petry.
• The most dangerous players have been Eberle, Arcobello, Perron at forward along with Justin Schultz and Belov on defence in terms of rates of expected goals for.
• The worst defenders have been Ryan Smyth, Yakupov, Justin Schultz, Ference, and Perron.

It’s important to keep some things in mind. This doesn’t account for any contextual factors such as quality of team, quality of competition, or zone starts. It’s just like raw shot differentials in that regard. Ference may look like he’s getting buried compared to, say, Gazdic, but Gazdic has been facing much easier competition than Ference. It is also hampered by what’s counted on an NHL.com score sheet. I’d love to pull out things like breakaways, 2 on 1’s, one-timers, etc., but you just can’t at this point in time. Even if they were to start counting that kind of thing tomorrow, it’d take a year or two or three to get enough data to even figure out what effect those shot types may have. These formulas assume a wrist shot taken at 12 feet during zone pressure is the same as one taken on a breakaway. It’s the best we can do at this point.

Now that I’ve formulated this the way I want, I can start proper testing for sustainability and predictive power. It may turn out to be a complete waste of time, but I feel like I’ve learned some things along the way.

1. Woodguy
Posted November 12, 2013 at 10:20 pm

I think you are really onto something here Mike.

It’s stuff like this that will help to bridge the gap for the less statistically inclined.

I think many of the people who have embraced the shot attempts stats intuitively factor a lot of this stuff in and it becomes second hand in the conversation, but to the person who is new to this stuff it helps bridge the “But not all shots are created equal!!!” thoughts and will bring them easier into understanding shot metrics.

Also,

Ference may look like he’s getting buried compared to, say, Gazdic, but Gazdic has been facing much easier competition than Gazdic

When a Gazdic faces easier competition than a Gazdic is anyone awake to see it?

Also,

I’m going to take a while to embrace this as it shows Jones a complete player.

Noooooooo!!!!!!!!!!!!!!!

• Michael Parkatti
Posted November 13, 2013 at 10:36 am

Yeah, it’s really just a weighted shot metric, and you’re right in that a lot of people kind of weight them in their head already.

2. D green
Posted November 13, 2013 at 1:30 am

All excellent – one thing though – not sure that 3 seconds isn’t a little long – wouldn’t most goalies/defenders have recovered positioning etc. within 2?

3. mac sapp
Posted November 13, 2013 at 9:42 am

Well Mike, you’ve reinforced my view as a defenceman that if I’m going 1-1 with a guy and he decides to rip a shot from just inside the blue I’ve done my job. I always thought, ‘fuck it, he can have that shot every time.’

Course beer league goalies are not NHL goalies, but as a strategy it was the best I could do!!

4. Mitch
Posted November 13, 2013 at 9:48 am

Great article. Honestly your method here seems like it simply MUST have some truth to it.

5. Frag
Posted November 13, 2013 at 10:08 am

If you have enough data, are you planning on adjusting for factors like QOT, QOC, zone starts, etc.? Something similar in technique to dCorsi?

• Michael Parkatti
Posted November 13, 2013 at 10:33 am

The thought has crossed my mind, but that’ll likely be much further down the road! My top of mind interest is with the team level stuff, and proving that stuff out. Then I’ll be testing sustainability at the player level…

6. not norm
Posted November 27, 2013 at 8:15 am

Great stuff, Michael. Awesome.

7. Yan Grey
Posted January 10, 2014 at 4:37 pm

Another problem with the three-second rule: you’re assuming the timing of the event is accurately recorded by the shot scorer. Consider how it works: the scorer sees the event, looks down at the computer, presses the shot key, places the shot on the screen and then enters the type of shot. That takes three seconds itself. Now say that rebound happens. A spotter probably has to call it out because the scorer is still looking down at the screen. Once that data’s entered, another three seconds have passed, so the event could read six seconds after the initial event. Sometimes the scorer backs it up, sometimes he’s too busy entering the NEXT event because they happen FAST. My point is, so much faith is being placed on the accuracy of data which is demonstrably inaccurate, certainly in terms of distances, times and sometimes even whether an attempt is a shot or not. I appreciate the work being done here, but it’s like making a beautiful dinner with rotten ingredients.

8. Alan Ryder
Posted June 27, 2014 at 7:10 am

There is a whole bunch of thinking on this on my web site, starting with the first work ever on Shot Quality in 2004, including the development of Expected Goals. See http://hockeyanalytics.com/shot-quality/ and especially http://hockeyanalytics.com/Research_files/Shot_Quality.pdf.

When I went to work on this I was really only interested in a refined view of defense and goaltending. While you can apply this thinking with some success at the team level, especially for defensive shot quality, it is not very predictive at the individual level, especially on offense (where the ‘quality’ of the shooter is a dominant and missing variable).