Firstly, I apologize for the long lay-off on the blog — after I won the Hackathon I was caught up in the whirlwind that followed, which mostly involved getting to work with the team on their Entry Draft preparations. I’m not exactly sure how much I’ll be able to post here in the coming months, but whenever I’m working on something general/non-team specific, I’ll post it here.
Much earlier in the summer I had been working on a series of posts investigating team percentages: basically, can a team sustain a high or low shooting or save percentage? The short answer is that, yes, a team can sustain a high or low save percentage but cannot sustain a high or low shooting percentage. Shooting percentage at the team level is essentially random noise, and the evidence for that position is pretty compelling. Save percentage, on the other hand, is highly dependent on one position, goaltending, which tends to be somewhat stable over time for teams.
This team-level analysis had obvious conclusions — chief among them being that team save percentages are relatively predictable over time, whereas team shooting percentages are not. You cannot read much of anything into a high or low team shooting percentage, other than the fact that it’s unsustainable. Over the long term, you would expect team shooting percentages to converge to the league average for all teams.
Of course, this got me thinking about how this phenomenon manifests itself at the player level. Can players manage to sustain a high or low on-ice shooting percentage? Now, this is a different concept than simple shooting percentage, which tracks the proportion of your own personal shots that go in the net. On-ice shooting percentage measures the proportion of shots taken by you or any of your teammates that go into the other team’s net while you are on the ice.
One of the longstanding open questions in hockey statistics is whether or not shot quality exists. A rational thinking fan would say "of course it exists": different players have different skill levels, so a higher proportion of player Goodly Wonderstud's on-ice shots will go in compared to Jumbly Fumblepot's. A lot of smart people have taken a run at proving it, but IMO it still remains unproven. This analysis is not an attempt to prove or disprove it. I'll simply say that I believe one way shot quality can be observed is to test whether certain players can sustain high or low on-ice shooting percentages. If they can, then I'll know that a shot taken while Sidney Crosby is on the ice is more likely to go in than one taken while Travis Moen is on the ice; in other words, shot quality is higher when Sidney Crosby is on the ice. If that is a trick that can be sustained, then I think we will have something of value to place alongside Corsi (the volume of shot attempts taken while you're on the ice). Alex Tanguay has a reputation as a quality shooter who doesn't shoot very much. His Corsi stats may not be all that impressive, but if we can show that he sustains a high on-ice shooting percentage to compensate, what's the difference (really)?
I compiled a list of all players who'd played at least 30 games in each of the six seasons between 2007-08 and 2012-13 (I used >=20 games in the lockout-shortened 2012-13 season). This left me with a sizeable sample of 229 players. Just to be up front, I left Sidney Crosby in the sample even though he had one season where he did not meet the 30-game requirement (2011-12), because I wanted to (and this is my damn analysis!!!). I then found the average on-ice 5-on-5 shooting percentage and standard deviation for each of the six seasons, using only the players in my sample. From these, I computed a normal cumulative distribution score between 0 and 1 for each player for each season: if you were right at the sample average, you scored 0.50; if you were better than average, you scored above 0.50 but below 1; and so on.
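To make the score calculation concrete, here's a stdlib-only Python sketch. The shooting percentages below are made-up numbers, not the actual 229-player sample; the point is just the mechanics of turning a season's on-ice SH% into a 0-to-1 cumulative score.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Score between 0 and 1: P(a normal draw with this mean/sd is <= x)."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Hypothetical on-ice 5-on-5 SH% values for a handful of sampled players
season = [8.9, 7.1, 10.2, 6.5, 8.0, 9.4, 7.8]

mean = sum(season) / len(season)
sd = (sum((x - mean) ** 2 for x in season) / (len(season) - 1)) ** 0.5

# A player right at the sample mean scores 0.50; above-average players
# score above 0.50, below-average players below it.
scores = [normal_cdf(x, mean, sd) for x in season]
```

The same calculation would be repeated separately for each of the six seasons, using that season's own mean and standard deviation.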
First, let’s see a list of players with the 25 highest 6-season average on-ice shooting percentages, and the 25 lowest:
Well, this is already saying something, isn’t it? That list of the top 25 on-ice shooting percentages includes probably 20 players that you’d put down on your “from-the-gut” list of the top 25 players from this time period, while the bottom 25 players includes many guys that you’d be surprised actually played 30 games for 6 straight seasons.
The implications of this list are pretty mind-blowing if you think about it: shots taken when Sidney Crosby is on the ice are more than twice as likely to go in as shots taken when George Parros is on the ice. Remember, this has nothing to do with shot quantity; just a normal, mundane, everyday Sidney Crosby on-ice shot is that much more likely to go in. This is shot quality that Crosby has proven he can sustain year after year after year, either from his own stick, or through his efforts to set up his teammates.
If a player's on-ice shooting percentage were random, you'd expect it to tend towards the 0.50 cumulative score mark that I mentioned earlier; it should be VERY hard to sustain a score near the edges of 0 and 1. But this is just not the case at all:
Just look at how many 6-season average cumulative scores are fairly far away from the 0.50 expected score. The data may still tend towards 0.50, but it seems like there’s a lot of action going on outside of my arbitrarily chosen band of +/- 10% of the expected equilibrium point. If this is random, how can so many guys seem to sustain very high and very low scores?
One of my favourite tests for randomness in this context is akin to observing coin flips. A coin flip is a 50/50 random event. Think of scoring above or below 0.50 (the sample average) in any one year as flipping a coin: below 0.50 is one side of the coin, above 0.50 is the other. Now imagine flipping that coin 6 times (for our six seasons). If this were random, you'd expect the outcome to be 3 heads and 3 tails a decent proportion of the time, and so on. I used a probability tree in an earlier post to come up with my expected percentage of outcomes for 6 "flips" of our coin, i.e., how often I'd expect 1 head / 5 tails, 2 heads / 4 tails, etc. Then I looked at the data to see the number of times each combination of "heads" and "tails" was actually observed:
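That probability tree boils down to the binomial distribution. A short Python sketch of the expected share of each outcome over 6 fair flips:

```python
from math import comb

# Probability of exactly k "above-average" seasons out of 6,
# if each season were an independent 50/50 coin flip
expected_share = {k: comb(6, k) * 0.5 ** 6 for k in range(7)}

# The 3-and-3 outcome is the single most likely: 20/64, or about 31%.
# Scaled to a 229-player sample, that is roughly 72 players.
```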
It seems like there is a divergence between what we expected to see (on the right), and what we observed in the data. This is easier to see in a graph:
The trend is pretty clear: there are many more players able to sustain a relatively large number of high or low on-ice shooting percentage seasons than we expected. If the data were random, you'd expect a central tendency towards 3 above-average and 3 below-average seasons. There is such a tendency, but it's much more moderate than we'd expect: only 57 of the 229 players in our sample landed in the 3 above / 3 below category, whereas we would have expected ~72 of them to.
We can test the goodness-of-fit using a Pearson chi-squared test. Basically, this compares our observed number of players in each category to our expected number of players in each category, and tells us the probability of observing a divergence at least this large if the data really did follow the expected distribution (this is a p-value). I come up with a p-value of 5.47396E-09, meaning that if on-ice shooting percentage were random, the chance of seeing counts like ours would be about 0.0000005%. We can safely reject that hypothesis, and say that our observed distribution of data is different from our expected distribution. I.e., a player's on-ice shooting percentage is not random.
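For anyone who wants to replicate the test, here's a stdlib-only Python sketch. The observed counts below are illustrative placeholders (only the 57 in the 3-and-3 bucket comes from the post, and the rest simply sum to 229), so the resulting p-value won't match the one above; with the real counts it should.

```python
from math import comb, exp, factorial

# Illustrative observed counts of players by number of above-average
# seasons (0 through 6). Placeholders, except the 57 in the 3-and-3 bucket.
observed = [6, 28, 52, 57, 48, 30, 8]
n = sum(observed)

# Expected counts under the "every season is a coin flip" hypothesis
expected = [comb(6, k) * 0.5 ** 6 * n for k in range(7)]

# Pearson chi-squared statistic
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """Chi-squared survival function (p-value); exact closed form for even df."""
    return exp(-x / 2) * sum((x / 2) ** k / factorial(k) for k in range(df // 2))

p_value = chi2_sf(chi2, df=6)  # 7 categories -> 6 degrees of freedom
```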
Here's another way to show this: let's run a simple regression, where I use a player's on-ice shooting percentage in one season to predict his on-ice shooting percentage in the following season. If it's a statistically significant predictor (i.e., if one season's on-ice shooting percentage can be used to predict the next season's), then we know that it's not random. I had 1,145 such observations in my 6-year dataset (229 players times 5 year-over-year comparisons), so we have degrees of freedom coming out of our ears. The t-stat for the coefficient on the predicting year was 6.696, with a p-value of 3.35728E-11, well below the 0.05 threshold, so we can reject the hypothesis that one year's on-ice SH% tells us nothing about the next year's.
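The regression itself is just ordinary least squares with one predictor. Here's a self-contained sketch on simulated data (a persistent "talent" level plus season-to-season noise, with made-up parameters), since I can't reproduce the real 1,145-pair dataset here; the slope beta plays the role of the predicting-year coefficient above.

```python
import random

random.seed(1)

# Simulated players: a persistent on-ice SH% "talent" plus per-season noise.
# The 8.0% mean and the noise levels are assumptions for illustration only.
talent = [random.gauss(8.0, 1.0) for _ in range(229)]
x = [t + random.gauss(0, 0.8) for t in talent]  # season t on-ice SH%
y = [t + random.gauss(0, 0.8) for t in talent]  # season t+1 on-ice SH%

# Ordinary least squares fit of y = alpha + beta * x
n = len(x)
mx, my = sum(x) / n, sum(y) / n
beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
alpha = my - beta * mx

# A beta clearly above zero means one season carries real information
# about the next; pure randomness would drive beta towards zero.
```

If you knock the talent spread down to zero in the simulation, making every season pure noise, beta collapses towards zero, which is exactly the random-walk behaviour the real data rejects.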
So what does this all mean? Well, it's become pretty handy to target players who have a high Corsi, or shot attempt generation, as over the long term players who generate more shots will have more chances to score, will score more goals, etc. But this analysis suggests that sustaining a high on-ice shooting percentage is not simply luck. Take Alex Semin, for example. He had a very high on-ice shooting percentage with the Capitals, then went to an entirely new team this year (the Canes) and proceeded to post the best on-ice shooting percentage of his career. This analysis has also shown that Alex Tanguay does sustain a high on-ice shooting percentage, which helps to compensate for his lack of shot production. In the end, the puck goes into the net; he's just more efficient on fewer shots. The superstars are the guys like Crosby, who are able to sustain high rates of shot production along with high shooting percentages, resulting in sweet, sweet gravy.
I think we can start to use concepts like this to develop metrics that can value a player’s displayed on-ice shooting percentage to augment stats like Corsi. If Sam Gagner has shown he can sustain a pretty high on-ice shooting percentage, that must help inform us about what kind of value the player has. The more data we have about a player, the more confident we can become that he is a “high” or “low” on-ice shooting percentage guy, and not just experiencing luck — much like how we treat a goalie’s save percentage.