
Advanced Statistics: The Corsi Number


Wraith


Sabermetrics is very popular in baseball these days. It has helped fans and professionals alike analyze what is actually happening and move away from traditional statistics that were far too dependent on luck/random variation. Statistical analysis of hockey has not progressed to nearly the same extent, but the field is growing. One statistic often cited is the Corsi number. I've seen it referenced occasionally around here, so at least a few of you are familiar with it. I've been following the Sabres' numbers all season and decided now was a good time to analyze them. What I found was really interesting.

 

Background:

 

1) The development of the Corsi number is credited to our very own Jim Corsi, although it's probably been used in various forms for a long time. Darcy Regier is well known to be a proponent of the statistic, so even if you disagree with its findings, you should at least walk away with a better understanding of why the Sabres value their players the way they do.

 

2) It's calculated on a player-by-player basis. It's usually defined as all team shots on offense (including missed shots and blocked shots) minus all shots/missed shots/blocked shots given up on defense while a player is on the ice. I encourage you to look it up if you want more details. It's an indication of offensive zone pressure created versus defensive zone pressure received while a player is on the ice. It's normalized by ice time. It's used almost exclusively with 5 on 5 play.

 

2) It's predicated on the idea that all players and teams basically have the same shooting percentages and therefore the only way to score more goals is to shoot more. On a team level, this is absolutely true. On a player level, this is still very true. Year over year, shooting percentages for an individual player vary tremendously and do not correlate well. There are very few players who can sustain an elevated shooting percentage year after year. Good shooting percentages are mostly luck.

 

3) It solves the +/- problem by dramatically increasing sample size. In theory, there is nothing wrong with the +/- stat. The object is to score more than your opponents when you're on the ice, so +/- should make sense. However, as we all know, +/- is heavily influenced by luck/random variation because goals are scored so infrequently. The sample size is just way too small. The Corsi number increases the sample size by about a factor of ten, which means that random variation should have a much less significant effect, as the good and bad luck cancel each other out and actual differences appear.

 

4) The Corsi number needs context, like all good statistics. The Corsi number can be heavily influenced by the quality of the competition for each player. I've added the average Corsi number of the opponents for each Sabre to show who they've been up against. More importantly, the Corsi number can be heavily influenced by where on the ice the coach uses a player. I've added the offensive zone start % to give this context as well.

 

5) Attempts have been made to gauge "shot quality" to try to explain differences in shooting percentages amongst individual players. The only factor that has been found to significantly influence shooting percentage is shot distance (from the goalie). I've added that stat as well.

 

6) I filtered out Sabres with fewer than 10 games played because sample size is everything for these statistics.
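
For anyone who wants the definition in point 2 spelled out, here is a minimal Python sketch of the calculation. The function name `corsi_per_60`, its parameters, and the sample numbers are made up for illustration and do not come from the attached tables. Scaling to 60 minutes is only one assumed way to "normalize by ice time"; the post does not say which normalization the source data uses.

```python
def corsi_per_60(shots_for, missed_for, blocked_for,
                 shots_against, missed_against, blocked_against,
                 toi_minutes):
    """On-ice shot-attempt differential per 60 minutes of 5-on-5 ice time.

    All counts are team totals recorded while the player is on the ice.
    The per-60 scaling is an assumption made for this sketch.
    """
    if toi_minutes <= 0:
        raise ValueError("toi_minutes must be positive")
    # Shot attempts include shots on goal, missed shots, and blocked shots.
    attempts_for = shots_for + missed_for + blocked_for
    attempts_against = shots_against + missed_against + blocked_against
    return (attempts_for - attempts_against) * 60.0 / toi_minutes


# Hypothetical player: 300 attempts for and 260 against in 400 minutes.
example = corsi_per_60(180, 70, 50, 160, 60, 40, 400)
print(round(example, 2))  # 6.0
```

A positive result means the ice was tilted toward the opposing net while the player was out there; a negative result means the opposite.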

 

Offensive Findings:

 

1) The common perception of Drew Stafford on this board does not match the findings of these statistics. His Corsi number (CORSI REL in the table) is excellent and leads the Sabres forwards by a wide margin. At even strength, the ice is tilted towards the opposition net the most while he is on the ice and it's not even close. A more traditional statistic that backs this up is his shots/game statistic. At 2.0 shots/game (5 on 5) he's averaging 23% more shots than the second best forward (Nathan Gerbe). Impressively, he's doing this while getting the fewest offensive zone starts of any skilled forward (48% versus 58%/59% for Vanek/Pominville). He's also doing this against very good competition (CORSI REL QoC). His goal total is being hurt by a painfully low shooting percentage, but his average shot distance of 30 feet is totally normal and consistent with last year. This may mean he's due for some puck luck in the second half of the year.

 

2) Zack Kassian has also been excellent. He is second on the team in Corsi number despite starting in the offensive zone infrequently. However, his shooting percentage is inflated and probably unsustainable.

 

3) Thomas Vanek and Jason Pominville are actually in the negative. That means the Sabres are receiving more pressure on defense than they're generating on offense while these two are on the ice. They're shooting (and scoring) a lot but also giving up a lot of shots and goals on defense. This isn't all that surprising when you look at their +/- numbers. Pominville's shooting percentage is very average, which is good news as it's probably sustainable over the course of the season. Vanek's is exceptional and probably not sustainable, though, despite his skill. His shooting percentage last year, for example, was around 8%, which is very close to average. Their negative Corsi numbers are surprising given their offensive zone start percentage (most among forwards) and their reputations as good defensive players.

 

4) Leino and Boyes are also generating a lot of offensive zone pressure while being the victims of bad luck in terms of shooting percentages. However, Leino needs to get his shot total up.

 

5) The Sabres as a team need to generate many more shots. Last year the Sabres had 7 players who averaged 1.7 shots/game and played at least 10 games. This year they have one (Stafford).

 

Defensive Findings:

 

1) Mr. Gragnani is still a beast. His Corsi number is the best of any Sabre by a huge margin. However, he is being coddled by Ruff. He faces the second weakest competition of the seven defensemen who qualified and gets by far the most offensive zone starts. He also receives the least ice time. Still, the difference between him and the rest of the defensemen is so huge that some of it has to be his own doing. Remember, this is all at even strength.

 

2) Robyn Regehr is doing all the heavy lifting. He's facing by far the highest quality competition and his Corsi number is suffering as a result.

 

3) Andrej Sekera has been excellent. Christian Ehrhoff, too.

 

4) Tyler Myers was terrible before the injury. The difference between this year and last year is tremendous if you look it up, and we need him to return to form.

 

5) McNabb is a mixed bag. He's faced the lowest quality competition but his Corsi number is still poor. But Ruff isn't afraid to use him in the defensive zone which is a plus.

 

Take a look at the attached statistics and draw your own conclusions.

post-1253-0-69838200-1325035223_thumb.jpg

post-1253-0-51695100-1325035231_thumb.jpg


Pretty cool stuff. I wouldn't say that getting more offensive zone starts is coddling, though. Isn't it just common sense to put your defensive defensemen on the ice in your own zone, and your offensive defensemen on the ice in their zone?


Pretty cool stuff. I wouldn't say that getting more offensive zone starts is coddling, though. Isn't it just common sense to put your defensive defensemen on the ice in your own zone, and your offensive defensemen on the ice in their zone?

 

While for the most part I am a proponent of statistical analysis and moving away from the established, anecdotal way players were evaluated in the past, it does bear noting that while Billy Beane kept the A's competitive, they never won anything and so far Darcy is keeping pace with his model.


While for the most part I am a proponent of statistical analysis and moving away from the established, anecdotal way players were evaluated in the past, it does bear noting that while Billy Beane kept the A's competitive, they never won anything and so far Darcy is keeping pace with his model.

 

In all fairness to Beane's method, the wealthier teams quickly adopted it too and applied more money to pull the players that his analysis highlighted.

 

Is it just me or am I the only one disappointed to see Stafford evaluates well using statistical methods?


Wraith, that's one of the more interesting posts I've read in a long time. There is the question of whether a player who has an excellent Corsi number, or, more importantly, a team with a number of such players, excels in the league.

 

Billy Beane did what he did in baseball, and won some division titles (as did the Sabres, and not all that long ago--what, is it eighteen months or less?), usually lost in the first playoff round, but while under financial constraints. The Sabres no longer are under such constraints, although there still remains a salary cap not present in baseball.

 

Weave is right on; the wealthy teams (and for once, we're rooting for one!) can adopt this, but is there any reason why the Sabres should? Especially in a cap league, which the AL is not? What are the Corsi numbers for Boston/Vancouver last year? Chicago/Philly the year before? How about Buffalo (Pres. Trophy)/Detroit in 2007? In other words, is this just another neat summation--accurate or not--of a player's worth, or is it useful to determine the makeup of a team?


 


 

 

Honest question: if the Corsi number was developed in-house and has been followed by Darcy for a relatively long time, then why isn't the team better than it is? In other words, does having a lot of players with nice Corsi numbers correlate with a high win total? Not necessarily. The Corsi number tracks puck control for the most part, and will reward throwing low quality shots at the net offensively while punishing teams/players for packing it in defensively if they jump out to an early lead.

 

To address Santa's question above: the Bruins were only 14th in team Corsi rating last year... Vancouver was 6th... Anaheim was dead last, but was the 5 seed in the west.


I think the Sabres should pay more attention to Newton Numbers. You know...little things like F=ma

 

If people use these numbers, more power to them. Does it account for undersized performers who consistently play the perimeter and may feather a greater number of attempted shots towards an in-control, bulkier defense?

 

Does it account for an undersized defense that retreats at its own blueline and may only give up one great scoring opportunity on one shot that is stopped, versus the same perimeter player who may aim 3 shots towards the goal area after dangling? Does it also account for the quality of shot that may be given up by a pinching defense that leaves chances to be outnumbered going the other way?

 

Just what we need around here....more spreadsheets. (No offense carp)

 

Honest question: if the Corsi number was developed in-house and has been followed by Darcy for a relatively long time, then why isn't the team better than it is? In other words, does having a lot of players with nice Corsi numbers correlate with a high win total? Not necessarily. The Corsi number tracks puck control for the most part, and will reward throwing low quality shots at the net offensively while punishing teams/players for packing it in defensively if they jump out to an early lead.

 

To address Santa's question above: the Bruins were only 14th in team Corsi rating last year... Vancouver was 6th... Anaheim was dead last, but was the 5 seed in the west.

 

Jesus....Regehr and McNabb horrible numbers.

 

I think I found the key....the bigger your nuts are, the more they drag your Corsi number down.

 

Now I know why Corsi sits next to Regier every game. I used to think it was because he opened Darcy's pistachios. They are just working on new numbers to woo Pegula with.


OK....this idea really has me steaming.....thank you Wraith for bringing it up.

 

When I have enough free time, I am going to develop my Regier/Ruff Return on Investment Number. Here is what it will be. Average cash outlay on players on the ice for team, divided by points in the standings, divided by 1.0 if missing playoffs, 1.07 for making playoffs, 1.17 making the second round, 1.30 making the conference finals and 1.45 making the Stanley Cup. The multiplier comes from expected additional revenue in hosting home games in those rounds. The average cash outlay negates injuries in the starting lineup.
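
Taken literally, the formula described above could be sketched like this in Python. The function name `roi_number`, the dictionary keys, and the sample payroll and point totals are hypothetical. Reading the result as "cash per standings point, discounted for playoff progress, lower is better" is an assumption, since the post does not say how the number should be interpreted.

```python
# Playoff multipliers as listed in the post. The key names are labels
# invented for this sketch.
PLAYOFF_MULTIPLIER = {
    "missed": 1.0,
    "made_playoffs": 1.07,
    "second_round": 1.17,
    "conference_finals": 1.30,
    "stanley_cup": 1.45,
}


def roi_number(avg_cash_outlay, standings_points, playoff_result):
    """Average cash outlay divided by points, then by the playoff multiplier."""
    if standings_points <= 0:
        raise ValueError("standings_points must be positive")
    multiplier = PLAYOFF_MULTIPLIER[playoff_result]
    return avg_cash_outlay / standings_points / multiplier


# Hypothetical team: $60M average outlay, 96 points, missed the playoffs.
print(roi_number(60_000_000, 96, "missed"))  # 625000.0
```

With these assumptions, the same payroll and point total produce a smaller number the deeper the team goes in the playoffs.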

 

I figure this is a nice fair way to evaluate the bang for the buck an owner gets out of his GM/coach combo.

 

If Carp wants to take on this task, he can reap the rewards as GM's and coaches around the league drop like flies. I have an idea where the first opening would be this year.


While for the most part I am a proponent of statistical analysis and moving away from the established, anecdotal way players were evaluated in the past, it does bear noting that while Billy Beane kept the A's competitive, they never won anything and so far Darcy is keeping pace with his model.

Wasn't there another formula or website that rated players like this? If I recall correctly, that site, as the Corsi system does, failed the Stafford Test. Any formula that ends with a result of Stafford being a good hockey player is flawed and needs to "go back to formula" as they said in the first Spider-Man.


Nice work.

 

Gad, I vaguely remember that, (i.e. too much time on my hands before marriage, 4 children and a dog).

 

 

And yes, I am one of the ones disappointed that, statistically speaking, Stafford looks like a beast. However, actually watching him play and seeing these stats kind of proves that they do not tell the whole story.


Is it just me or am I the only one disappointed to see Stafford evaluates well using statistical methods?
And yes, I am one of the ones disappointed that, statistically speaking, Stafford looks like a beast. However, actually watching him play and seeing these stats kind of proves that they do not tell the whole story.
Stats can be misleading. See Gragnani, Marc-Andre.

 

If the statistics do not favor your argument or opinion, simply ignore them! That's standard operating procedure, is it not?


If the statistics do not favor your argument or opinion, simply ignore them! That's standard operating procedure, is it not?

 

I'm not suggesting that at all. When my eyes and gut tell me one thing and the stats say otherwise, I certainly want an explanation, though.


If the statistics do not favor your argument or opinion, simply ignore them! That's standard operating procedure, is it not?

 

Stats would seem to imply that I've got lying eyes! So my choice is the stats or my lying eyes? I'm pretty sure about Stafford and Grags, so I am inclined to believe my lying eyes and I guess you're right, stats be damned. I say ignore them with these two guys.


I'm not suggesting that at all. When my eyes and gut tell me one thing and the stats say otherwise, I certainly want an explanation, though.

 

Your explanation is that the stat was invented by someone on the current staff who needs to justify the performance of their signings.

 

When you watch a boxing match, and through 6 rounds, fighter A lands 97/138 jabs and 25/65 power punches.................and fighter B lands 33/67 jabs and 30/80 power punches........and fighter A is bleeding and swollen and has hit the mat 2 times.........what do you trust? Statistics or results?

 

That is what Darcy does all day the past 5 years. He makes up spreadsheets. I kid you not. I've heard enough stuff out of his mouth alone that makes me want to put a gun to my temple. I can only imagine what wooful numbers are being presented behind the scenes.


Your explanation is that the stat was invented by someone on the current staff who needs to justify the performance of their signings.

 

When you watch a boxing match, and through 6 rounds, fighter A lands 97/138 jabs and 25/65 power punches.................and fighter B lands 33/67 jabs and 30/80 power punches........and fighter A is bleeding and swollen and has hit the mat 2 times.........what do you trust? Statistics or results?

 

That is what Darcy does all day the past 5 years. He makes up spreadsheets. I kid you not. I've heard enough stuff out of his mouth alone that makes me want to put a gun to my temple. I can only imagine what wooful numbers are being presented behind the scenes.

 

I'm a data-oriented guy. It comes with my profession. I trust data more than I trust what I perceive. I prefer to know that the GM of my hockey team is using objective measurements to make decisions. But when a stat seems to run very, very counter to a subjective observation, questions about which one is valid and why need to be asked. I guess what I am saying is, I find nothing wrong with spreadsheet-driven decisions (I make them all the time). What I am wondering about is the explanation for why this particular stat appears to fly in the face of perception, at least in the case of Stafford. Is it because I am not trained in what to look for and am making a mistake? Is it because of a flaw inherent to this data? Is it because it isn't being properly applied? etc, etc...


I'm a data-oriented guy. It comes with my profession. I trust data more than I trust what I perceive. I prefer to know that the GM of my hockey team is using objective measurements to make decisions. But when a stat seems to run very, very counter to a subjective observation, questions about which one is valid and why need to be asked. I guess what I am saying is, I find nothing wrong with spreadsheet-driven decisions (I make them all the time). What I am wondering about is the explanation for why this particular stat appears to fly in the face of perception, at least in the case of Stafford. Is it because I am not trained in what to look for and am making a mistake? Is it because of a flaw inherent to this data? Is it because it isn't being properly applied? etc, etc...

 

It is fine to use data to make decisions. I do so as well. If you have a valid set of criteria forming the numbers....great.

 

From what I understand, this number does not consider the things I listed upthread. If you have a sturdy defense that keeps smaller perimeter players outside taking poor shots towards the net, it is going to reward the perimeter guy and punish sound, in-control defensive play. If you play a retreating style of defense like the Sabres seem to do on opposition chances, they may only get one really nice shot off. Then you have what guys do without the puck. Vanek can stand there for 40 seconds and not get moved as the rest of the guys dipsy doodle with the puck, not getting it through.

 

My beef with Darcy has always been he doesn't have the ability to quantify subjective material. Weave, I think you are great at doing so. Everyone laughs at LGR for saying Roy is as good as Getzlaf, but if you are going to accept statistics sight-unseen.......that is the same thing.

 

Paul Hamilton quoted Darcy as saying blocked shots is the most overrated stat out there. Given the makeup of this Corsi number where softies reign supreme, I can see now why he said that.


Here is another example....if I am reading the Corsi idea correctly.

 

Drew Stafford loafs at his own blueline on a 4-4 in OT and an Ottawa defender goes in uncontested to put in a game ending goal.

 

Robyn Regehr maintains position on a 2 on 2 and blocks a harmless shot 25 feet out then drives his man into the boards sending him sprawling to the ice.

 

The result is the same.

 

-1 Corsi for both.


 

My beef with Darcy has always been he doesn't have the ability to quantify subjective material. Weave, I think you are great at doing so. Everyone laughs at LGR for saying Roy is as good as Getzlaf, but if you are going to accept statistics sight-unseen.......that is the same thing.

 

I get what you are saying but you've made a poor choice of wording. By definition you cannot quantify subjective material.


I get what you are saying but you've made a poor choice of wording. By definition you cannot quantify subjective material.

 

Incorrect.

 

That is what I do. That is how trillions are made or lost. Sure, there are analysts who plug numbers into a financial model and make a decision based on their projections. The makeup of their model in itself may be flawed, or many models are just cut and paste and you can make a living being a good steward...say insurance, index funds, T-bills, etc.

 

When you deal with something like a hockey player, you are projecting. You are taking things such as past performance into account, and you have hard data such as age, height, weight. But there is a level of intangibles and chemistry involved that go into success. A good GM and Coach need to be able to make decisions based on the "unknown number", and quantify it. Whether it be giving a certain salary, or choosing linemates, or the system you adopt to get the most out of your assets in a synergistic way.

 

You need to be a statistician, an economist, a psychologist, a physiologist, and maybe even a chaplain if you really want to build a cohesive hockey team. Not everybody has that skill set. Some guys can get by, but some guys let their weakness show to a point that it becomes obvious they have a greater hurdle to jump if they are ever to make it to the top. I don't think Regier is a stupid person. I think he doesn't have the skillset to build a synergistic, cohesive team with staying power given the parameters of the NHL. He would make a great accountant.

 

Markets are made because of subjective quantification. If everything was so obvious, the world would be a constant stalemate.


From what I understand, this number does not consider the things I listed upthread. If you have a sturdy defense that keeps smaller perimeter players outside taking poor shots towards the net, it is going to reward the perimeter guy and punish sound, in-control defensive play. If you play a retreating style of defense like the Sabres seem to do on opposition chances, they may only get one really nice shot off. Then you have what guys do without the puck. Vanek can stand there for 40 seconds and not get moved as the rest of the guys dipsy doodle with the puck, not getting it through.

 

I don't think you understand. The entire point of the statistic is to attempt to quantify puck possession. As a set of five skaters, you do not achieve good Corsi numbers by getting one weak shot through and then retreating on defense. You earn a good Corsi number by getting the puck efficiently out of your own end, into your opponent's zone, and keeping it there as long as possible. That sounds like a great recipe for success to me.

 

The Sabres as a team have very poor puck possession, Stafford and Gragnani are just the best of the lot. The numbers I quoted in the initial post were Corsi numbers relative to the rest of the team. It's the best way to evaluate players compared to their teammates. It calculates a Corsi number for each player while they're on the ice, a Corsi number for the rest of the team when said player is off the ice, adjusts both for ice time, and reports the difference. If you want to look at players across teams, the Corsi number for each player while they're on the ice is the best metric. Compare the Sabres and the Red Wings:

 

Detroit Red Wings

Buffalo Sabres

Boston Bruins

 

The Red Wings have 19 players with 10 or more games played, of whom 1 has a negative Corsi number while on the ice. The Sabres have 22 players with 10 or more games played, of whom 14 have a negative Corsi number. Gragnani would be 9th on the Red Wings and Stafford would be 14th. The Red Wings are the kings of puck possession and routinely have great Corsi numbers. Check out Boston's numbers. They're also much better than Buffalo's. It's not as if the Sabres are being made to look good by this statistic.
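
The relative Corsi calculation described above (on-ice number minus the team's number with the player off the ice, both adjusted for ice time) can be sketched roughly as follows. The function name `corsi_rel`, the per-60-minute scaling, and the sample totals are assumptions for illustration only. They are not the exact method or data behind the linked tables.

```python
def corsi_rel(on_for, on_against, on_minutes,
              off_for, off_against, off_minutes):
    """On-ice Corsi rate minus the team's off-ice Corsi rate, per 60 minutes.

    "on_*" are team shot attempts and minutes with the player on the ice;
    "off_*" are the same totals with the player on the bench.
    """
    if on_minutes <= 0 or off_minutes <= 0:
        raise ValueError("minutes must be positive")
    on_rate = (on_for - on_against) * 60.0 / on_minutes
    off_rate = (off_for - off_against) * 60.0 / off_minutes
    return on_rate - off_rate


# Hypothetical player: +6.0 per 60 on the ice, team at -5.0 per 60 without him.
print(corsi_rel(300, 260, 400, 900, 1000, 1200))  # 11.0
```

This shows why a player on a weak possession team can post a strong relative number while his raw on-ice number would look ordinary on a team like Detroit.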

 

Here is another example....if I am reading the Corsi idea correctly.

 

Drew Stafford loafs at his own blueline on a 4-4 in OT and an Ottawa defender goes in uncontested to put in a game ending goal.

 

Robyn Regehr maintains position on a 2 on 2 and blocks a harmless shot 25 feet out then drives his man into the boards sending him sprawling to the ice.

 

The result is the same.

 

-1 Corsi for both.

 

You've obviously come into this topic with a bias against it so I don't see much point in arguing with you. The Corsi number was created to address the sample size problem with traditional statistics. It's meant to look at over 100 shot attempts per game. You've cherry picked two. Congratulations.
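
As a rough illustration of the sample size point, here is a small Python sketch. It assumes event counts behave like Poisson counts, so the relative noise in a count of n events is about 1/sqrt(n). That model is an assumption added here, not something stated in the thread, and the counts are hypothetical.

```python
import math


def relative_noise(expected_events):
    """Relative standard deviation of a Poisson count: sqrt(n) / n = 1 / sqrt(n)."""
    if expected_events <= 0:
        raise ValueError("expected_events must be positive")
    return 1.0 / math.sqrt(expected_events)


# Hypothetical on-ice totals: 50 goals versus ten times as many shot attempts.
goals = 50
attempts = goals * 10
ratio = relative_noise(goals) / relative_noise(attempts)
print(round(ratio, 2))  # 3.16
```

Under that assumption, ten times as many events cuts the relative noise by a factor of about sqrt(10), a bit more than three.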


Thanks Wraith.

 

I just get the willies when I see stuff like this. I remember Darcy saying we just need to get rid of 10% of the hits to get rid of 90% of head injuries, as he constantly lobbies to remove physicality.

 

If people find validity in his formula and want to use it, great. We all have different ideas on what makes things go round. I'm happy to see that it isn't just being used to validate Sabres. You also have to look at the "fat people have a greater chance of heart attack" studies. Is the number relevant, or do good teams just spit out numbers that would be expected?

 

Either way, I am learning something...thanks.

