This post continues a series of examinations into the concept of team ‘luck’. I initially wrote a three-part series on team PDO, testing both it and its constituent parts to find out if they were truly random over time. I found that while team shooting percentage does seem to be random over time, team save percentage shows distinct evidence of not being random. The influence from SV% was so strong that it made PDO statistically predictable over time, which is kind of a bad thing for a stat that’s traditionally used to reflect luck.
The first post in this current series tried to envisage a new way to calculate PDO, taking into account the expected performance of goaltenders entering a season and weighting it based on how many starts that goaltender received in the current season. I then applied the methodology to the 2013 season and promised to analyze it with the same vigour I’d used in my initial 3-part series on PDO. And here we are.
Firstly, I’d like to report that my initial methodology for a modified PDO statistic (MPDO) did *not* pass my statistical tests — it definitely showed more randomness than PDO, but any given MPDO could still be used as a statistically significant predictor of a team’s subsequent season. It really made me think more about what this entire process is trying to accomplish. If you recall, my initial methodology for coming up with a goalie’s predicted SV% for any upcoming season was to simply take his save percentage over the last 5 seasons of data (as long as he had > 500 shots against). My rationale was that we were trying to get an idea of what a goalie’s “true” talent level is using a large data sample, so that any deviation from this monolithic number could be seen as manifested luck. However, when implementing this methodology in practice, I realized that the 5-year average was simply too long — there were many instances of years long past polluting my dataset, in that they were no longer useful predictors of expected future performance. A quick example would be including some of Martin Brodeur’s monster post-2005 lockout years to help predict how he would do in some of the last few years. His performance over this time period was so different, with high SV% years falling into low ones, that the 5-year average essentially evened out to league average, leaving his “standard” the same as it would be in traditional PDO.
I then realized that this is essentially a forecasting exercise. We’re trying to come up with a goaltender’s expected performance in this season, not his long-term true talent level. Those are related, but slightly different concepts. Knowing Martin Brodeur’s long-run SV% is interesting, but it’s certainly not going to be very useful in trying to predict what his SV% “should” be when he’s 40. I put the word “should” in quotation marks to emphasize this point: we need to find an expected save percentage where it is EQUALLY LIKELY for his actual SV% to be above or below that number. Read that sentence again; it’s important. In other words, we really need to give it the old college try and predict (or forecast) what a goaltender’s performance will be this season.
Now, I’ve worked with SV% numbers and forecasting a lot, so I had some ideas of how this could be accomplished. I needed something both simple to understand and decently accurate. After about 10 rounds of trial and error, I came up with a new methodology that satisfied these requirements. Instead of using a 5-year average SV%, I only used the last 3 years of data. This concentrates the expected performance on a goalie’s most recent history, but includes enough data to be considered a large sample size (in my eyes anyways). I then weighted the 3 years in an exponential fashion, with the most recent year having the most weight and the year 3 seasons back having the least (I used the arbitrary weights 2, 4, and 8 for the save percentages recorded three, two, and one year ago, respectively). A goalie still needed to have seen 500 shots within the last three years to qualify, and any non-qualifying or new goalie was given the same “average new goalie” SV% I used in the first part of this series (which is 4-5 points below league average SV%).
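The weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input layout, the placeholder value for the “average new goalie” SV%, and the choice to re-normalize the weights when a goalie is missing a season are all hypothetical and not taken from the original spreadsheet.

```python
# Weights for the save percentages recorded three, two, and one year ago (from the post).
WEIGHTS = (2, 4, 8)
MIN_SHOTS = 500          # qualifying threshold over the three-year window (from the post)
NEW_GOALIE_SV = 0.908    # HYPOTHETICAL placeholder for the "average new goalie" SV%


def expected_sv(seasons):
    """Forecast a goalie's SV% from his last three seasons.

    seasons: [(sv_pct, shots_against) or None] ordered three, two, one year ago.
    None marks a season the goalie did not play (an assumption of this sketch).
    """
    played = [(w, s) for w, s in zip(WEIGHTS, seasons) if s is not None]
    total_shots = sum(shots for _, (_, shots) in played)
    if total_shots < MIN_SHOTS:
        # Non-qualifying or new goalie: fall back to the generic value.
        return NEW_GOALIE_SV
    # Assumption: weights are re-normalized over the seasons actually played.
    weight_sum = sum(w for w, _ in played)
    return sum(w * sv for w, (sv, _) in played) / weight_sum


# Made-up goalie: 0.910, 0.915, 0.920 over the last three seasons.
forecast = expected_sv([(0.910, 1500), (0.915, 1600), (0.920, 1700)])
print(round(forecast, 4))
```

With weights of 2, 4, and 8, the most recent season carries 8/14 of the total weight, so the forecast sits much closer to last year’s number than a flat average would.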
Once I had my expected SV% for each goalie, I then weighted them by the starts that each team’s goalies saw to come up with an expected team SV% for that team for that season. If the actual SV% was above this, it was considered lucky, and if it was below this it was considered unlucky. For instance, the Oilers this year had an expected team SV% of 0.921, but actually had a team SV% of 0.924, meaning they did a bit better than expected, or were a bit “lucky”. I took the difference between the actual and expected team SV%, added it to the difference between the team’s actual and expected shooting percentage, and came up with a modified PDO (MPDO) for each team for each of the last 6 seasons.
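Here is a rough sketch of that calculation. The Oilers’ 0.921 expected and 0.924 actual team SV% come from the paragraph above; the two-goalie split, the shooting percentages, and the function names are made up for illustration. The sketch also assumes each difference is taken as actual minus expected, so that positive numbers mean “lucky”.

```python
def expected_team_sv(goalies):
    """Start-weighted average of each goalie's expected SV%.

    goalies: list of (expected_sv, starts) for one team in one season.
    """
    total_starts = sum(starts for _, starts in goalies)
    return sum(sv * starts for sv, starts in goalies) / total_starts


def mpdo(actual_sv, exp_sv, actual_sh, exp_sh):
    """Save-percentage surprise plus shooting-percentage surprise."""
    return (actual_sv - exp_sv) + (actual_sh - exp_sh)


# HYPOTHETICAL split that reproduces an expected team SV% of 0.921:
# a starter forecast at 0.923 with 32 starts, a backup at 0.917 with 16.
team_exp_sv = expected_team_sv([(0.923, 32), (0.917, 16)])

# HYPOTHETICAL shooting numbers: actual and expected both 9.0%.
team_mpdo = mpdo(actual_sv=0.924, exp_sv=team_exp_sv, actual_sh=0.090, exp_sh=0.090)
print(round(team_exp_sv, 3), round(team_mpdo, 3))
```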
To normalize the MPDO results, I transformed all the numbers for each season into normal cumulative distribution scores between 0 and 1, a technique I used extensively in the initial 3-part series on PDO. A score of 0.50 is league average; anything above that is above league average, and anything below is below league average. It’s just a technique to compare apples to apples over time. If MPDO is random, I would expect the long-term average scores to tend towards 0.50, or league average. Here are the 6 years of data:
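For anyone who wants to reproduce the normalization step, a minimal sketch follows. It assumes the scores come from a normal distribution fitted to that season’s league-wide mean and standard deviation; whether the original used the population or sample standard deviation isn’t stated, so the population version here is a guess, and the team values are made up.

```python
from statistics import NormalDist, mean, pstdev


def normdist_scores(mpdo_by_team):
    """Map one season's raw MPDO values to normal CDF scores between 0 and 1."""
    values = list(mpdo_by_team.values())
    # Assumption: population standard deviation across the league that season.
    league = NormalDist(mean(values), pstdev(values))
    return {team: league.cdf(value) for team, value in mpdo_by_team.items()}


# HYPOTHETICAL three-team season, symmetric around zero.
scores = normdist_scores({"A": -0.010, "B": 0.000, "C": 0.010})
print({team: round(score, 2) for team, score in scores.items()})
```

A team sitting exactly at the league mean scores 0.50, and teams equally far above and below the mean get scores that add up to 1.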
The teams in this table are ranked in descending order from the most “lucky” over the last 6 seasons to the least. Already we see a promising band of scores around 0.50. Let’s compare the top and bottom 5 teams to their normal PDO counterparts:
The top team in traditional PDO was Vancouver by a landslide, with an average 6-year score of 0.87. However, in MPDO Vancouver slides down to the 4th luckiest — why is this? Well, Vancouver has employed consistently good goaltenders over this time-frame, meaning that one would “expect” them to do better over time than the league average. In other words, Vancouver would consistently have high PDO, and people would construe this as being “lucky”, when in fact it was only because they had good goaltenders. Their MPDO in 2013 was a touch below league average because their expected SV% was 0.931, while their actual SV% was 0.928.
And just eyeballing this table should tell you something about what’s going on here: instead of the top and bottom PDO teams being the ones who’ve had consistently good or bad goaltending, it seems to be more of an unexpected mix of teams. “Hey, Dallas is the third luckiest team over the last 6 years!” instead of “Hey, Boston’s had two Vezina winners and the best young goalie in the game, neat!”.
Compare this graph of the 6-year average scores using MPDO:
To this original one using PDO:
You’ll immediately notice more teams in the arbitrary yellow band +/- 10% from the expected long-run tendency towards 0.50. You’ll also notice fewer and less severe outliers. The new MPDO does seem to be increasing the gravitational pull towards 0.50 (league average).
You may also recall a technique I used in my PDO series where I found the expected probability of any team having, say, 6 above average and 0 below average seasons out of 6 seasons, etc. The expected probabilities were calculated using a probability tree that assumed the chance of being above or below 0.50 should be 50% if the measure truly is random (the measure in this case being MPDO). The following table shows the actual numbers of teams that had the specified number of above/below average seasons and compares it to the expected numbers.
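The probability tree boils down to a binomial distribution with a 50% chance of an above-average season. A small sketch of the expected counts is below; the 30-team league size is my assumption (that was the NHL’s size over these seasons), and the function name is hypothetical.

```python
from math import comb


def expected_team_counts(n_seasons=6, n_teams=30, p_above=0.5):
    """Expected number of teams with k above-average seasons, for k = 0..n_seasons."""
    return [
        n_teams * comb(n_seasons, k) * p_above**k * (1 - p_above) ** (n_seasons - k)
        for k in range(n_seasons + 1)
    ]


for k, count in enumerate(expected_team_counts()):
    print(f"{k} Above/{6 - k} Below: {count:.2f} teams expected")
```

Under pure chance, fewer than half a team (30/64, or about 0.47) would be expected to post 6 above-average seasons out of 6, while about 9.4 teams would be expected to split 3 and 3.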
The effect is better seen graphically. Compare this graph of how MPDO compares to the expected probabilities:
To this graph using the original PDO:
What we’re looking for is evidence of a central tendency: if the measures are random, we want to see them bunch up in the middle where teams have more equal numbers of good and bad seasons instead of a sustained number of good or bad seasons. Now, perhaps the two graphs don’t seem all that different, but check out how the 6 Above/0 Below bar in the PDO graph is completely absent in the MPDO graph, with its weight being added to the more central 4 Above/2 Below category. It’s a small change, but does show a more central tendency. How can we test this?
I chose to use a Chi-Squared test to make my point here. What Chi-Squared answers is, basically: “does a set of actual numbers match a set of expected numbers?” It’s often used in situations like this, where the expected numbers are what random chance would produce, and the observed actual numbers are tested to see if they deviate significantly from the expected ones. If so, whatever you’re analyzing can be found not congruent with simple random chance. This test uses P-values: any P-value below 0.05 suggests whatever you’re analyzing is not congruent with random chance, while a P-value above 0.05 means the data are consistent with random chance (we can’t rule it out). I proceeded with chi-squared tests not only for MPDO but also PDO and the original parts of PDO (team save and shooting percentages) to illustrate the point.
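To make the test concrete, here is a standard-library sketch of a chi-squared goodness-of-fit calculation. The observed counts are made up, the expected counts assume a 30-team league and a 50/50 chance each season, and the function names are hypothetical. The P-value uses the closed-form upper tail of the chi-squared distribution, which only holds for an even number of degrees of freedom (7 categories give 6).

```python
from math import exp, factorial


def chi2_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))


# Expected teams with 0..6 above-average seasons (30 teams, 50/50 each season).
expected = [0.46875, 2.8125, 7.03125, 9.375, 7.03125, 2.8125, 0.46875]
# HYPOTHETICAL observed counts, not the numbers from the table in this post.
observed = [0, 3, 8, 9, 7, 3, 0]

stat = chi2_statistic(observed, expected)
p_value = chi2_sf_even_df(stat, df=len(observed) - 1)
print(f"chi-squared = {stat:.2f}, P-value = {p_value:.2f}")
```

The closer the observed counts sit to the expected ones, the smaller the statistic and the higher the P-value.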
Here again we see the evidence that SH% is truly random, very much above the 0.05 cutoff. SV% is found to not be congruent with random chance, well below 0.05 at 0.0004. What’s interesting is that PDO passes this test as consistent with random chance at a P-value of 0.18, but that number is clearly being dragged down by SV%. By contrast, the new MPDO statistic has a much higher P-value of 0.59. The difference in the P-values suggests MPDO displays much more randomized characteristics than PDO.
To finish off, I performed my favourite test of randomness: I ran a regression using one season’s MPDO normal cumulative distribution score as the independent variable to predict the next season’s score as the dependent variable. The rationale here is that if I can use one year’s MPDO to predict the next year’s with statistical significance, the measure cannot be random. If you recall, I performed this test with PDO and came up with a P-value of 0.0014, well below the 0.05 threshold needed for me to accept the hypothesis that they do have a statistical relationship. What I’m hoping for with this test with MPDO is a high P-value, showing a very weak statistical relationship from one year to the next.
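A sketch of this year-over-year test using only the standard library is below. Two caveats: the function names and the sample scores are hypothetical, and the P-value uses a normal approximation to the t-distribution rather than the exact calculation a spreadsheet or stats package would do. With roughly 150 season pairs (30 teams, 5 pairs each, by my count) the two should be very close.

```python
from math import sqrt
from statistics import NormalDist


def season_pairs(scores_by_team):
    """Pair every season's score with the same team's score the next season."""
    xs, ys = [], []
    for scores in scores_by_team.values():
        xs.extend(scores[:-1])
        ys.extend(scores[1:])
    return xs, ys


def slope_p_value(x, y):
    """OLS slope of y on x and a two-sided P-value for 'slope = 0'.

    Uses a normal approximation to the t-distribution (an assumption of
    this sketch), which is reasonable only for fairly large samples.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    std_err = sqrt(sse / (n - 2) / sxx)
    t_stat = slope / std_err
    return slope, 2 * (1 - NormalDist().cdf(abs(t_stat)))


# HYPOTHETICAL normdist scores for two teams over six seasons.
x, y = season_pairs({
    "A": [0.62, 0.41, 0.55, 0.48, 0.70, 0.35],
    "B": [0.30, 0.58, 0.44, 0.66, 0.39, 0.52],
})
slope, p = slope_p_value(x, y)
print(f"slope = {slope:.2f}, P-value = {p:.2f}")
```

For an exact P-value, a library routine such as `scipy.stats.linregress` does the same job with the proper t-distribution.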
I was not disappointed. Using the MPDO normdist scores, I calculated a P-value of 0.32, meaning that I cannot accept a hypothesis that one year’s MPDO has any bearing on the next year’s. As far as this test can tell, the relationship is random.
To conclude, I do not want to suggest this new MPDO stat is supreme over all potential others; the point is to prove that finding a measure to properly reflect luck in hockey is possible. Showing which teams are riding luck and which ones are due to break out has been one of the most useful developments of the advanced stats community. It’s my belief that refining our methods to more accurately depict this concept is important, and will provide a great many insights for years to come. This new MPDO statistic is proven to be randomly influenced by a force that we can approximate as luck.
For those interested, here’s a table that compares MPDO to PDO for each team for the last 6 years. If you’d like this in Excel format, just email me and I’d be happy to provide a copy of my work.