by PuckStopsHere on 11/17/11 at 06:58 PM ET
I thought it would be a good idea to write an article in an accessible (as non-mathematical as possible) style to discuss Corsi analysis and some of the issues raised about it on the internet.
Buffalo Sabres assistant coach Jim Corsi wanted to better understand how busy his goalies had been, so instead of counting only shots against, he began counting blocked and missed shots as well. After a while it was realized that this could be a basis for individual player assessment. It was found that the differential of Corsi events (the difference in the number of shots attempted for and against) correlates extremely strongly with which team possesses the puck and also with which zone the puck is in. A team with a high Corsi possesses the puck for the majority of the game and keeps the puck in its opponent’s zone. These are good things that are valuable to playing winning hockey, and this is the best (known?) statistical measure available from the information the NHL routinely publishes online.
We measure the differential of Corsi events when an individual player is on the ice (his Corsi rating) to try to gauge which players are driving puck possession and puck position during games. The problem with looking at things on an individual level is that Corsi is inherently a team gauge and must be put into context to get individual ratings. A player on a good team will be more likely to have a good Corsi than a player on a bad team. A player who plays in offensive situations or against weak opposition will be more likely to have a good Corsi than one who doesn’t (and one can imagine several other reasons a player might have a good Corsi related to the context in which he plays).
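To make the definition concrete, here is a minimal Python sketch of an on-ice Corsi count. The ShotAttempt record, its field names and the sample events are all made up for illustration; they are not the format used by the NHL or by any particular stats site.

```python
from dataclasses import dataclass

@dataclass
class ShotAttempt:
    team: str          # team that attempted the shot (hypothetical labels)
    kind: str          # "goal", "save", "miss" or "block": all count for Corsi
    on_ice: frozenset  # names of the skaters on the ice at the time

def corsi(events, player, player_team):
    """Shot attempts for minus shot attempts against while `player` is on the ice."""
    attempts_for = sum(
        1 for e in events if player in e.on_ice and e.team == player_team
    )
    attempts_against = sum(
        1 for e in events if player in e.on_ice and e.team != player_team
    )
    return attempts_for - attempts_against

# Invented 5 on 5 events; players "A" and "B" both play for HOME.
events = [
    ShotAttempt("HOME", "save", frozenset({"A", "B"})),
    ShotAttempt("HOME", "block", frozenset({"A"})),
    ShotAttempt("AWAY", "miss", frozenset({"A", "B"})),
    ShotAttempt("HOME", "goal", frozenset({"B"})),
    ShotAttempt("AWAY", "save", frozenset({"B"})),
]

print(corsi(events, "A", "HOME"))  # 2 attempts for, 1 against
print(corsi(events, "B", "HOME"))  # 2 attempts for, 2 against
```

Note that this is the raw number only. None of the context discussed here (team strength, opposition, situation) is accounted for in the sketch.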
In order to remove the bias of the situation in which a player plays, Corsi is usually recorded only in 5 on 5 situations. Thus power play and penalty kill situations are removed. In some cases data is also recorded in these situations, but it is usually kept separately. On the power play any player would be expected to have a good Corsi, and on the penalty kill it is expected to be poor. Nevertheless, it is possible to apply Corsi analysis to special teams situations. Since far more time is played at even strength, the sample size on special teams is smaller, so there is more random error in those numbers.
With an even strength Corsi, I find the most useful corrections (in that they are the biggest) are for the team the player plays on and the situation in which the player plays (as measured by zone starts: does he start more shifts in the offensive or defensive zone?). These corrections make Corsi a more useful number for comparing players because they better include the context in which each player plays.
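As a small illustration of the zone start idea, the sketch below computes the share of a player's shifts that begin in the offensive zone. The function name is hypothetical, and leaving neutral zone starts out of the denominator is an assumption made for the example; it does not show how any particular site turns this number into a Corsi correction.

```python
def offensive_zone_start_share(offensive_starts, defensive_starts):
    """Fraction of non-neutral zone starts that begin in the offensive zone.

    Assumption for this sketch: neutral zone starts are ignored.
    Returns None when the player has no offensive or defensive zone starts.
    """
    total = offensive_starts + defensive_starts
    if total == 0:
        return None
    return offensive_starts / total

# Invented numbers: 60 offensive zone starts and 40 defensive zone starts.
share = offensive_zone_start_share(60, 40)
print(share)  # above 0.5 means more offensive than defensive zone starts
```

A player with a share well above one half is being given easier minutes, so a good raw Corsi from him is less impressive than the same Corsi from a player who starts mostly in his own end.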
The complaint is often that Corsi (even with various adjustments) is not a “be all and end all” statistic to rank all players. That is unfair because no existing number in hockey is held to that standard. Goals, points, +/- ratings etc. are not “be all and end all” statistics to rank all players either, but like Corsi they give us useful information about how players are playing.
The internet father of Corsi analysis is Gabe Desjardins, who runs the fabulous Behind the Net website, where he publishes the numbers needed to do Corsi analysis. The numbers go back to the 2007/08 season, so we cannot do Corsi analysis for most of the NHL’s history; only the recent years are available. Sufficient data was not recorded throughout most of the NHL’s history to allow this.
Desjardins estimates that about 40% of the game is captured by Corsi analysis. It measures the shots attempted at even strength. This is a strong measure of puck possession and puck position on the ice, but there is a lot it does not measure. It does not measure special teams play. It does not measure the ability to score goals; it only measures the ability to generate shots. Not all shots are equal. Some shots are better than others and have a better chance of becoming goals. Shot quality is not measured in Corsi analysis. Neither is a particular player’s ability to score (his finishing ability) in a given situation. It also does not include a team’s ability to prevent quality shots or to make saves (generally this is goaltending). In order to turn a Corsi rating into a more useful number, these factors must be taken into account where possible. Corrections can be made for some of these effects. Others are not very dependent upon the individual player involved: the save percentage while a player is on the ice depends strongly on goaltending and not on the player himself.
In hockey scoring goals is more important than generating shots. Some think that a better analysis can be done using goals instead of shots. This is the position of David Johnson, who runs the website Hockey Analysis. He is often seen as a “heretic” in the sabermetrics community because, although that idea sounds simple enough, it doesn’t hold up to scrutiny.
In order to go from shots to goals you merely multiply by the shooting percentage of the player involved. On the team level, as we are talking about all shots taken when a player is on the ice, shooting percentage is not an individual number; it is the number for all of his team’s players on the ice at the same time. How much control does a player have over the shooting percentage of another player? If the player is a good set-up man who can get his teammates into good scoring opportunities, he has some control, but it is generally very limited. On the flip side, the save percentage of a given team is not well controlled by any player on the ice other than the goalie.
Even on an individual level, shooting percentage for a player is not a very repeatable number. There are often wide differences in shooting percentage for an individual player from year to year. These shooting percentage differences are often enough to explain unusually high or low scoring seasons. In fact Corsi is found to be a better, more sustainable number from year to year than points or shooting percentage. It is more consistent from year to year and hence a better measure of the individual contribution of a player. Corsi is an underlying number that does a good job of showing how well a player is playing and it is a persistent measure of his talent.
Johnson’s goals analysis is equivalent to replacing Corsi with +/- ratings. The problem with +/- ratings is that they are strongly dependent on the shooting percentage and save percentage of a team while a player is on the ice. These numbers are largely random and not under a player’s control. They can be more repeatable if a player consistently plays with the same goalie or linemates, but they are largely a random factor. This idea is quantified as PDO, which is the sum of the shooting percentage and save percentage when a given player is on the ice. The leaguewide average is 1 by definition. Players who have a PDO well above 1 have a combination of save percentage and shooting percentage well above average. Usually this means they have had good luck and will not be able to sustain their numbers over the long term. Conversely, if a player has a PDO well below 1, it usually means he has had bad luck and things will likely get better soon. The biggest caveat is that a player on a team with goaltending that is far above or below average is not as likely to regress to 1 over time, because his on-ice save percentage will not settle at a league average value.
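The PDO definition is simple enough to write down directly. Here is a minimal Python sketch with a hypothetical function name and invented on-ice totals. Note that the shots here are shots on goal (the denominator of shooting and save percentage), not all Corsi attempts.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """On-ice shooting percentage plus on-ice save percentage.

    All four inputs are totals recorded while the player is on the ice.
    Shots are shots on goal, not all shot attempts.
    """
    if shots_for == 0 or shots_against == 0:
        raise ValueError("need at least one shot for and one shot against")
    shooting_pct = goals_for / shots_for
    save_pct = 1 - goals_against / shots_against
    return shooting_pct + save_pct

# Invented totals: the same 8% of shots go in at both ends of the ice.
print(pdo(8, 100, 8, 100))   # 0.08 + 0.92, so about 1

# Invented totals: hot shooting and hot goaltending at the same time.
print(pdo(12, 100, 6, 100))  # 0.12 + 0.94, so about 1.06
```

The leaguewide average comes out at 1 because every goal for one team is a goal against another, so across the whole league the shooting percentage and the save percentage are computed from the same goals and shots and must add up to 1.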
This is not to say that there is nothing to be learned from looking at shooting percentage and save percentage while a player is on the ice. There are small corrections that can be made. Some players have extremely good or bad finishing ability, for example. To fully understand their contributions it is necessary to take it into account. The problem with rejecting Corsi analysis in favour of goals based analysis is that you lose information on the individual level because of the largely random PDO effects.
Johnson argues that he can identify some players who consistently have high or low PDO values. Marian Gaborik is the best example of a high PDO player and Travis Moen is the best example of a low PDO player. Something can be learned by understanding why these players have consistently extreme PDOs. Gaborik has spent his career with good goaltending. In New York he plays with Henrik Lundqvist, who is one of the top goalies in hockey. Previously in Minnesota he played in a top defensive system under Jacques Lemaire where goalies had high save percentages. Gaborik is a very good finisher and has been the top offensive forward on his line throughout his career, so teams want to have him taking shots. He helps to increase his team’s shooting percentage and has had the luck to play with a good save percentage behind him. Travis Moen is a defensive forward who does not play in offensive situations often. He plays against top opposition, and his line usually does not press for high quality shots; it accepts low quality shots because it is more concerned with defence. Both the strong opposition and his line’s lack of offensive play tend to reduce his on-ice shooting percentage.
Despite the existence of a few players who have extreme PDOs that persist from year to year, for almost every player PDO will regress to 1 over time. It is a far better model to try to ignore the usually random effects of team shooting and save percentages on most players than to start with numbers that include these largely random effects.
Johnson argues that you can look at the situation as an apparently linear equation:
goals scored = shots taken * shooting percentage
He then treats them all as independent variables when they are not. If I want to raise my shooting percentage, I can choose not to shoot except in situations where I have a very good chance of scoring. I will reduce my goals scored and shots taken but increase my shooting percentage significantly. Similarly, if I shoot whenever I can, even when I am 200 feet from the goal, on any angle, with any number of players in the way (somebody at the games screams “shoot” whenever this situation presents itself), I will have a high number of shots taken and a very low shooting percentage, and I may even have reduced my goals scored since I am no longer patient enough to wait for high percentage shots. These variables are clearly dependent upon one another and cannot be viewed as independent linear variables. This is not a linear equation. The simplest way to show this point is that any player with zero goals scored will necessarily have a zero shooting percentage. It is impossible for that not to be the case.
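A toy Python sketch can show the first half of this argument: changing which shots a player takes moves shots taken, goals scored and shooting percentage all at once. The function name, the list of chances and their scoring probabilities are invented for the example, and the model is deliberately crude (it does not capture the goals lost by being too impatient to wait for a better chance).

```python
def shooting_summary(chances, min_probability):
    """Shots taken, expected goals and shooting percentage for a shooter
    who only shoots when the chance of scoring is at least `min_probability`.

    `chances` is a list of scoring probabilities, one per opportunity.
    """
    taken = [p for p in chances if p >= min_probability]
    shots = len(taken)
    expected_goals = sum(taken)
    pct = expected_goals / shots if shots else 0.0
    return shots, expected_goals, pct

# Invented opportunities: 50 poor chances, 30 decent ones, 20 very good ones.
chances = [0.02] * 50 + [0.10] * 30 + [0.25] * 20

shoot_everything = shooting_summary(chances, 0.0)
selective = shooting_summary(chances, 0.20)

print(shoot_everything)  # 100 shots, about 9 expected goals, about 9%
print(selective)         # 20 shots, 5 expected goals, 25%
```

The selective shooter has the much higher shooting percentage but fewer goals, which is why shots taken and shooting percentage cannot be adjusted independently of one another.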
The further problem is that we are not discussing the shooting percentage of individual players. We are discussing the shooting percentage of all players on the ice when a player is on the ice. A player has little effect on the shooting percentage of others.
In trying to find a quantifiable example of a player who looks good in a goals based analysis but not in a Corsi based analysis, Johnson picks out Brendan Morrison of the Calgary Flames. Johnson calls him a good signing despite the fact that Corsi disagrees. Morrison has failed so far this season: he has no points in eight games. He has had injury problems and played very few games, but he is a perfect example of the difference between these two schemes. Morrison has consistently had a high PDO in the past. That is unlikely to last, and it hasn’t so far this season. As a result his scoring numbers have declined. It is unlikely that Morrison will continue to have no points when he finally comes back from injury, but it is likely he will underproduce relative to his previous numbers as his PDO drops from its past levels. He is predicted far better by Corsi analysis than by a goals based analysis because of the randomness in the goals analysis captured by his PDO.
Corsi analysis is more powerful than a goals based analysis because it is more repeatable. Shooting percentages can have a measurable effect, and including it gives a better understanding of a player’s value than a Corsi based number alone, but it is usually a small correction. It is far better to start from Corsi and apply that correction than to neglect Corsi based analysis in favour of a goals based analysis.
Corsi analysis is a strong measure of puck possession and puck position. With some context based adjustments it is possible to find individual player values. This number is useful for rating players, just as goals or points are. As with goals or points, nobody in their right mind would claim this number is a “be all and end all” statistic to rank players. Much of the resistance to Corsi analysis comes from the fact that people do not understand it. They often expect it to be a “be all and end all” statistic and criticize it when it fails to be one, as it is bound to. It gives useful information about how players are playing. With this information we can assess players better than without it.