With the recent acquisition of Mike Brown, the Oilers seem to be falling prey to the mantra that you need to outhit your opponents in order to succeed in the NHL. Many, many media members have asserted and reasserted this concept that, in short, hits are a proxy for toughness, and toughness is a proxy for male essence, and male essence leads to wins.
But what of it? Is there any relationship between hits and shots for, or hits and shots against, or hits and standings points?
To answer this question, I gathered a data set that included hits, total shots for, total shots against, and standings points for every team in each of the last 5 complete seasons. Then I ran a series of simple one-variable regressions to test whether total hits had any effect on shots or points. Regression is simply a tool that tells us how much of the variance in one variable can be explained by the variance in another, and whether this relationship is ‘statistically significant’. By convention, statisticians require 95% confidence in a relationship before saying that it does, in fact, exist. Note that this doesn’t mean 95% of the variance is explained by the variable; it’s just a reflection of confidence.
We use a statistic called a ‘P-value’ to decide what the level of confidence is. If a P-value is lower than 0.05, we can say we are more than 95% confident of statistical significance. If it’s above 0.05, we can’t say with confidence that any relationship exists. Here are the P-values of total hits against the three variables (shots for, shots against, and standings points), both for the last 5 years and for last year alone:
We can see that none of these are below 0.05, so we cannot conclude that total hits has a statistically significant effect on any of these variables. The only one that’s even remotely close is the effect of hits on shots for, which came within 10% of the confidence level required to accept it. It suggested that every 13.5 more hits you make in a season results in one extra shot on net that season. Either way, it still falls short of the threshold. All of the other P-values are so high that there is no evidence of any relationship at all.
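For anyone who wants to see the mechanics, here is a minimal sketch of the kind of one-variable regression described above: it fits a line, reports the R-squared, and computes the P-value for the slope. It is not the exact tool I used; the function names are my own, the numbers at the bottom are invented toy data rather than real NHL figures, and in practice a library routine such as SciPy’s `scipy.stats.linregress` would do the same job.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value for a t statistic, using Simpson's rule.

    Fine for illustration with moderate t statistics; it cannot
    resolve extremely small p-values (roughly below 1e-10).
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def simple_regression(x, y):
    """Least-squares fit of y on a single explanatory variable x.

    Returns (intercept, slope, r_squared, p_value), where p_value tests
    the null hypothesis that the true slope is zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)

    df = n - 2                          # degrees of freedom for the slope test
    sse = max(syy - slope * sxy, 0.0)   # residual sum of squares
    if sse == 0.0:
        return intercept, slope, r_squared, 0.0  # perfect fit
    se_slope = math.sqrt(sse / df / sxx)
    p_value = two_sided_p_value(slope / se_slope, df)
    return intercept, slope, r_squared, p_value


# Toy numbers, invented purely to exercise the code (not real NHL data).
toy_hits = [1, 2, 3, 4, 5]
toy_shots = [2, 4, 5, 4, 5]
intercept, slope, r_squared, p_value = simple_regression(toy_hits, toy_shots)
print(f"slope={slope:.2f}  r^2={r_squared:.2f}  p={p_value:.3f}")
```

The thing to watch is the last value: if it comes out above 0.05, the slope can’t be distinguished from zero at the usual threshold, no matter how big the slope itself looks.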
However, Tyler Dellow of mc79hockey has made great points on Twitter recently about how using total hits is flawed, because certain arenas overcount hits, making the counts inconsistent from one arena to the next. The only way to control for this is to strip out all home hits and count only road hits, so that each team’s sample is spread across a varied set of arenas rather than dominated by its own building. So I re-ran my regressions using only road hits. Here are the results:
Again, not one of the P-values is below 0.05. In fact, none of them are even close. There is no evidence that the total number of road hits in a season has any effect on shots or standings points.
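The road-hits adjustment is just a filtering step before the regression. As a rough sketch, assuming you had game-by-game hit counts (the `GameHits` record, its field names, the team labels, and every number below are hypothetical, made up for illustration), you would total each team’s hits only in games where it was the visitor:

```python
from collections import defaultdict
from typing import NamedTuple


class GameHits(NamedTuple):
    """One game's hit counts (hypothetical record layout)."""
    season: str
    home_team: str
    road_team: str
    home_hits: int
    road_hits: int


def road_hits_by_team_season(games):
    """Sum each team's hits over games it played on the road only."""
    totals = defaultdict(int)
    for game in games:
        # Home hits are deliberately ignored: they are all recorded by the
        # same arena scorer, which is the source of the bias.
        totals[(game.road_team, game.season)] += game.road_hits
    return dict(totals)


# Invented example games, not real results.
games = [
    GameHits("season-1", "TEAM_A", "TEAM_B", 30, 22),
    GameHits("season-1", "TEAM_B", "TEAM_A", 25, 18),
    GameHits("season-1", "TEAM_C", "TEAM_A", 40, 20),
]
print(road_hits_by_team_season(games))
```

Each (team, season) total from this step then takes the place of total hits as the explanatory variable.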
Also, not one R-squared value was above 0.04, and most were below 0.01, meaning that less than 1% of the variability in shots or points could be explained by the variability in hits. The relationship is essentially random: some high-hit teams are good, some are bad, and vice versa. There must be something else that explains why good teams are good.
Handily enough, I had the variable right in front of me in this dataset. I took shots for and subtracted shots against, coming up with the overall shot differential for each team in each season for the last 5 years. I then used that as the explanatory variable in a regression against standings points. What were the results?
In short, the results are unequivocal: there is an incredibly strong relationship between shot differential and standings points. I’ve outlined the P-value of this regression in red. What does that even mean? Well, you remember that you need a P-value lower than 0.05 to assert statistical significance, right? The lower the number, the more confident you are in a statistical relationship. The P-value in red is 0.000000000000015. Or, I can say with 99.9999999999985% confidence that there is a statistically significant positive relationship between shot differential and standings points. Compare this P-value to the ones in the two tables above, and you decide which one is more important to winning.
The regression equation is:
Standings Points = 91.68 + Shot Differential * 0.0275
This implies that a team with an even shot differential will finish an 82-game season with 91.68 points. Every 36.4 shots added to your differential over a season is worth one more standings point. Conversely, every 36.4 shots your differential drops by in a season will cost you one point. At the Oilers’ current shot differential of roughly -5.5 per game, you’d expect a differential of about -454 over an 82-game season, so this equation predicts that they would finish an 82-game season with:
Standings Points = 91.68 + (-454 * 0.0275) = 79.2 standings points.
Right now, they’re on a 78.3-point pace if this were an 82-game season… so I’d say this regression equation holds pretty damned well.
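If you want to check the arithmetic or plug in another team’s differential, the regression equation above fits in a few lines. This is only a convenience sketch: the intercept, slope, and the -454 figure come from the post, while the function and constant names are my own.

```python
INTERCEPT = 91.68  # predicted points for a team with an even shot differential
SLOPE = 0.0275     # standings points per shot of season-long differential


def predicted_points(shot_differential):
    """Predicted 82-game standings points from the regression equation."""
    return INTERCEPT + SLOPE * shot_differential


def shots_per_point():
    """How many shots of differential are worth one standings point."""
    return 1 / SLOPE


oilers_differential = -454  # projected 82-game differential quoted above
print(round(predicted_points(oilers_differential), 1))  # 79.2
print(round(shots_per_point(), 1))                       # 36.4
```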