
Advanced Stats Explained


LGR4GM


1 minute ago, Randall Flagg said:

Also note that all stats are weaker when analyzing specific players than when analyzing whole teams. That's why the example I highlighted in the first post is so telling (6 Cup winners in 6 years were in the top 4 in Corsi), whereas there are some damn good players with damn bad Corsi (and plus-minus, and everything but production stats). That's why when I talk about individuals I try to incorporate as much context as humanly possible, because you can arrive at some dreadful conclusions if you don't take lots of things into account. Bad (or good) teams can do wild things to an individual player, which is why Adam Larsson gets traded for Taylor Hall, I guess.

Right. Good or great players on terrible teams can look statistically similar to average players on great teams if you take the numbers in a vacuum, without context. Hockey is a team sport, so advanced stats in the NHL are heavily influenced by who you are playing with and against, as opposed to baseball, where stats are far more isolated to your own talent.

Taylor Hall looked way better on a good New Jersey team than he did on a crappy Edmonton team, but his raw talent level didn't really change. It will be interesting to see how much better ROR's metrics look on a good Blues team vs. our tire-fire team.


1 hour ago, Randall Flagg said:

That's what I tried to point out: if you freeze time at any given moment, goal differential will closely align with the standings, generally more so than Corsi up to that point. But the tests showing that advanced stats improve on regular ones take each of those numbers right now and use them to predict the standings 40 games later, or at any point in the future. That is where advanced metrics do better, even when the current standings look the same either way. Namely, if you simply use today's standings as your prediction for 40 games from now, statistically significant analysis shows you will make a worse prediction than someone using Corsi from right now (and that person will do worse than someone using an optimized combination of individual stats).

So no, that isn't true. Here's an example of a test done with the stats:

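[Chart (player.png): predictive power of GF%, CF% and expected goals, player level]

[Chart (team.png): predictive power of GF%, CF% and expected goals, team level]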

GF% is plus-minus converted into a percentage: the share of all goals (for and against) that your side scored. It is the goal-differential stat, and it shows consistently worse predictive power at both the team and the individual-player level, which means it's simply less useful. Note that the claim isn't that shot-based metrics are the key to the universe, either. Those R² values are low. They aren't so low as to be useless, and we should note that if they approached 1, you'd never have to watch a hockey game again to know exactly what will happen and in what order; we know sports aren't like that and shouldn't be like that.
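For anyone who wants to compute these themselves: GF% and CF% are both just the "for" share of events, goals in one case and all shot attempts in the other. A minimal Python sketch, assuming you already have raw for/against counts; `share_pct` is a hypothetical helper name and the counts are made up for illustration:

```python
def share_pct(events_for: int, events_against: int) -> float:
    """Return the 'for' share of events as a percentage (e.g. GF% or CF%)."""
    total = events_for + events_against
    if total == 0:
        raise ValueError("no events recorded; share is undefined")
    return 100.0 * events_for / total


# GF%: goals for vs. goals against (plus-minus expressed as a share)
print(share_pct(60, 40))    # 60.0
# CF%: all shot attempts (Corsi) for vs. against
print(share_pct(550, 450))  # 55.0
```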

So, in general, call your team's goal differential X. Right now, it is a good indicator of where you are in the standings. But if you use X now to predict X later (and X later will be a good descriptor of where you are in the standings later), you will generally do worse than if you use shot-based metrics to predict X later. That has been shown to hold consistently by rigorous analysis designed to measure exactly this predictive ability for any stat on the planet.

Looking at those graphs, it appears those 3 metrics converge at the 70-game mark? Meaning, if I understand correctly, that all 3 metrics are equivalent predictors at +70 games out? If that's true, shouldn't the standings (after 80 games) reflect the same, i.e., GF% and CF% should be relatively equal as far as the standings go?

You've mentioned that a negative about GF% is that goals are much rarer events than shots. Would you agree that throughout a season a team's CF% does not fluctuate as much (on a per-game basis) as their GF%? If that's true, and CF% is a better predictor of future success, shouldn't all the best CF% teams percolate up to the top of the standings by the end of the season?


2 hours ago, Randall Flagg said:

It's also unfortunate that using obnoxious stats jargon makes you sound fervently religious about it by default. It's hard to talk about the stupid things without coming off as elitist and eye-rollingly self-righteous.

Eye-rollingly. Going to show that one to my significant other, who writes for a living.

Sucks, eh? I get the same crap when I demonstrate that I can talk in depth about beer. People assume I am a beer snob as opposed to just educated.


18 minutes ago, pi2000 said:

Looking at those graphs, it appears those 3 metrics converge at the 70-game mark? Meaning, if I understand correctly, that all 3 metrics are equivalent predictors at +70 games out? If that's true, shouldn't the standings (after 80 games) reflect the same, i.e., GF% and CF% should be relatively equal as far as the standings go?

You've mentioned that a negative about GF% is that goals are much rarer events than shots. Would you agree that throughout a season a team's CF% does not fluctuate as much (on a per-game basis) as their GF%? If that's true, and CF% is a better predictor of future success, shouldn't all the best CF% teams percolate up to the top of the standings by the end of the season?

I'm not too sure of the details of this data set, because both stats are prone to tremendous fluctuations 5 games in, and I don't think they've simply taken wacky early-season stats and projected them to game 82. I believe the setup somehow starts from a large initial sample. But yes, in general, to maximize your predictive power you'd want to make your prediction for 40 games out, then check again in 40 games, and do it again. But nobody really looks at stuff like this. The only takeaway I have is that, among the 10 stats I try to list, I'll often include the ones that do a better job than others, but usually both, just for completeness.

I don't have the raw numbers, but I wouldn't say much on a per-game basis beyond this: a 5-game hot shooting stretch is far more likely to radically alter the standings in a way that won't sustain itself than some bad team suddenly badly outshooting a good team 5 straight games. Even goals have a decent sample size after a full season, but it's still far smaller than the sample of shot attempts, and goals are known to be noisier, even though I don't have numbers for it offhand. Of course, shot attempts are a huge umbrella, which is why things like expected goals are tinkered with.
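A toy simulation of why goals are noisier than shot attempts from game to game. Everything here is an assumption chosen to be roughly NHL-like, not real data: 110 total attempts per game, an evenly matched team, and a 5% chance that any attempt becomes a goal. The team's true talent is identical in every game, so all of the spread is luck:

```python
import random
from statistics import pstdev


def simulate_game(rng: random.Random, attempts: int = 110,
                  p_for: float = 0.5, p_goal: float = 0.05):
    """Simulate one game as a sequence of shot attempts.

    Each attempt is ours with probability p_for and becomes a goal with
    probability p_goal. Returns (CF, CA, GF, GA).
    """
    cf = ca = gf = ga = 0
    for _ in range(attempts):
        is_goal = rng.random() < p_goal
        if rng.random() < p_for:
            cf += 1
            gf += int(is_goal)
        else:
            ca += 1
            ga += int(is_goal)
    return cf, ca, gf, ga


def per_game_spread(games: int = 82, seed: int = 1):
    """Standard deviation of single-game CF% and GF% for one simulated team."""
    rng = random.Random(seed)
    cf_pcts, gf_pcts = [], []
    for _ in range(games):
        cf, ca, gf, ga = simulate_game(rng)
        cf_pcts.append(cf / (cf + ca))
        if gf + ga:  # GF% is undefined for a 0-0 game, so skip it
            gf_pcts.append(gf / (gf + ga))
    return pstdev(cf_pcts), pstdev(gf_pcts)


sd_cf, sd_gf = per_game_spread()
print(f"per-game SD of CF%: {sd_cf:.3f}  of GF%: {sd_gf:.3f}")
```

Under these assumptions single-game GF% swings several times more than CF% does, simply because a game has dozens of attempts but only a handful of goals.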

And no, because the correlations are still pretty weak, like I mentioned in the first post. If that happened, I wouldn't even bother watching hockey. The stats aren't about claiming to know everything; they're about using a better one when it's available. I'm always going to want to talk numbers, so I may as well use ones with better predictive power, and use them in a way that maximizes the information we get from them. But like I mentioned, you can get some generally accurate broad swipes with them, like a 6-year stretch where every Cup winner was among the top 4 Corsi teams, while in that same stretch the top 10 goal-differential teams accounted for only 4 of them.

Luckily there's so much more that factors into hockey than one single stat. Carolina has amazing possession numbers, and when you watch them they're highly competent and entertaining with the puck. I watched them take something like 60 shots on goal in a game this year, and it's because they had the puck the whole time. But their finishing power is terrible outside of Skinner, their depth is weak, and their three-year goaltending stretch is probably the worst in the league by a mile, so they lose games.

Edited by Randall Flagg

7 hours ago, SDS said:

People just don't understand statistics and randomness in general. A 60% chance of an event occurring does not mean it happens 100% of the time just because it is more likely. It should be obvious, but it isn't, because most people are bad at math. Really bad.

It's just like all those people I've come across who were "shocked" when the Sabres weren't winning draft lotteries, because their 20% chance to win was the highest of any single team in the league. They see that every other team's number was below 20, therefore expected a win. 


2 hours ago, Thorny said:

It's just like all those people I've come across who were "shocked" when the Sabres weren't winning draft lotteries, because their 20% chance to win was the highest of any single team in the league. They see that every other team's number was below 20, therefore expected a win. 

The fact they won once out of 3 times they were dead last actually puts them on the lucky end.
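A quick check of the "lucky end" claim, treating the three last-place finishes as independent draws at the 20% odds quoted above (actual odds may have differed from year to year; that part is an assumption):

```python
from math import comb


def prob_k_wins(p: float, tries: int, k: int) -> float:
    """Binomial probability of exactly k wins in `tries` independent draws."""
    return comb(tries, k) * p**k * (1 - p) ** (tries - k)


p, tries = 0.20, 3  # 20% odds as quoted above, three last-place finishes
print(f"expected wins: {p * tries:.2f}")                 # 0.60
print(f"P(exactly 1):  {prob_k_wins(p, tries, 1):.3f}")  # 0.384
print(f"P(no wins):    {prob_k_wins(p, tries, 0):.3f}")  # 0.512
```

So one win in three tries beats the expected 0.6, and about half the time a team in that spot comes away with nothing.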


On 7/6/2018 at 2:17 PM, pi2000 said:

Looking at those graphs, it appears those 3 metrics converge at the 70-game mark? Meaning, if I understand correctly, that all 3 metrics are equivalent predictors at +70 games out? If that's true, shouldn't the standings (after 80 games) reflect the same, i.e., GF% and CF% should be relatively equal as far as the standings go?

Without knowing exactly what future time frame was being predicted...I had a different interpretation of the charts. I had only two takeaways:

1) Expected Goals works best as a predictor when using the past 30 games or so (the highest point on its R-squared curve), CF% works best with the past 20 games, and GF% works best with about 40 games of data. In other words, using fewer games than that for each variable doesn't let us use its full predictive power. This intuitively makes sense, as we can't expect a 5- or even 10-game run to necessarily represent what will happen some number of games in the future; there is too much luck over a short time frame. Likewise, as games beyond the optimal number are added to the data set used to make predictions, the predictive power of each variable gets worse. Intuitively this makes sense to me, because at some point the data being used occurred long enough ago that conditions have changed enough (think injuries, trades, luck, a new coach's system kicking in, or players losing interest, etc.) that including it in the model hurts its predictive power.

2) At all points from 10 games through 70 games, both CF% and Exp. Goals are better predictors of future performance than GF%.  
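One way to act on takeaway 1) is to compute each metric over a trailing window instead of season to date. A minimal sketch; `trailing_share` is a hypothetical helper, the per-game counts are made up, and the window sizes in the closing comment are only the rough optimums read off the charts in this post, not established constants:

```python
def trailing_share(events_for: list[int], events_against: list[int],
                   window: int) -> float:
    """'For' share (%) over only the most recent `window` games.

    Games older than the window are dropped entirely, one simple way to
    keep stale data (pre-trade, pre-injury, old system) out of the estimate.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent_for = sum(events_for[-window:])
    recent_against = sum(events_against[-window:])
    total = recent_for + recent_against
    if total == 0:
        raise ValueError("no events in the window")
    return 100.0 * recent_for / total


# Hypothetical per-game shot-attempt counts, oldest game first.
cf_by_game = [48, 55, 61, 52, 58]
ca_by_game = [57, 50, 44, 49, 46]
print(trailing_share(cf_by_game, ca_by_game, window=3))
# Per the reading of the charts above: roughly window=20 for CF%,
# ~30 for expected goals, ~40 for GF%, given a full season of data.
```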

Why are Corsi and Expected Goals better predictors than GF% (i.e., +/-)? To some extent, who knows; they just are. But I think it makes sense, at least to me.

Goals for and against are, of course, what teams focus on to win games, and they can directly explain a W-L record, with some set of probabilistic outcomes associated with every team's +/-. In other words, a team that scores 200 goals and gives up 200 goals over the course of a season is most likely to have a .500 record (I'm ignoring the impact of the loser point), but the chance of landing on exactly .500 is still much less than 50%. They can easily have win percentages of .490 or .510, etc. It's a bell curve of outcomes centered around .500. If they happened to score all 200 goals in one game and were shut out in every other game, they would have a win percentage of 0.012 (1 win in 82 games). This is way out on the extreme tail of outcomes, but it's not theoretically impossible.
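The bell curve described here is easy to simulate. The assumptions are mine, not the poster's: goals for and against in each game are independent Poisson draws averaging 200/82, and tied games go to a coin flip, which ignores the loser point as above. Because goals vary, each simulated season totals about 200 goals each way rather than exactly 200:

```python
import math
import random
from statistics import mean


def poisson(rng: random.Random, lam: float) -> int:
    """Draw from a Poisson distribution (Knuth's method; fine for small means)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def season_win_pct(rng: random.Random, games: int = 82,
                   gf_per_game: float = 200 / 82,
                   ga_per_game: float = 200 / 82) -> float:
    """Win percentage for one simulated season; ties decided by a coin flip."""
    wins = 0
    for _ in range(games):
        gf = poisson(rng, gf_per_game)
        ga = poisson(rng, ga_per_game)
        if gf > ga or (gf == ga and rng.random() < 0.5):
            wins += 1
    return wins / games


rng = random.Random(2018)
seasons = [season_win_pct(rng) for _ in range(2000)]
exactly_500 = sum(s == 0.5 for s in seasons) / len(seasons)
print(f"average win%: {mean(seasons):.3f}")
print(f"share of seasons at exactly .500: {exactly_500:.1%}")
print(f"1 win in 82 games: {1 / 82:.3f}")  # 0.012
```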

Anyway, goals are scored or allowed due to myriad factors: number of shot attempts, quality of shots, skill of the shooter, skill of the goaltender, luck, etc. In other words, goals for and against are dependent variables, not independent variables. Goals do not stand alone as an explanation of events. Pragmatically speaking, every sports statistic we talk about could be described as a variable that depends on something else (compared to, say, temperature at a given pressure, which is entirely descriptive in and of itself). Since we can't consider every variable in existence, any predictive stat will inherently have an R-squared well short of 1.0, because all the variables we are not explicitly considering have some impact that creates noise in our predictions. But we can get more granular than goals by looking at the major drivers of how goals are scored (or allowed) and separating those factors out. Now we have searched one level below goals and found some variables that involve less luck and have some fundamental consistency over time. We still have a bucket containing the rest of the variables ("everything else"), which lowers our predictive power, but at least now we can discern trends in key variables that have some predictive power. These key variables are a stronger signal of what will happen next, whereas looking at goals alone commingles the signal (the key variables) with the noise (everything else) and hence has a lower R-squared.
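The "one level below goals" idea can be written as arithmetic: goals are shot attempts times the rate at which attempts become goals. A minimal sketch with hypothetical names and made-up numbers; in the post's framing, the attempt counts are the steadier signal, while the conversion rates (finishing and goaltending) carry much of the noise:

```python
def goal_share_from_drivers(cf: float, ca: float,
                            sh_for: float, sh_against: float) -> float:
    """Rebuild GF% one level down: attempts times conversion rates.

    cf, ca      shot attempts for and against (per game or per season)
    sh_for      fraction of our attempts that become goals (finishing)
    sh_against  fraction of opponents' attempts that beat our goalie
    """
    gf = cf * sh_for
    ga = ca * sh_against
    return 100.0 * gf / (gf + ga)


# Hypothetical team that dominates possession but converts poorly
# and gets poor goaltending.
print(f"{goal_share_from_drivers(55, 45, 0.045, 0.060):.1f}")  # 47.8
```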

This may make no sense to anyone but me, but it helped to try to write it out.

That was great, thanks.   

You'd think there would be a function that combines all these individual metrics into a single index, e.g., a power index of some sort, that would have superior predictive power when analyzing matchups.
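One simple way to build such an index is to put each metric on a common scale (z-scores across the league) and take a weighted sum. This is a sketch, not an established power index: the metric names, team values, and weights below are all hypothetical, and in practice the weights would be fit against future results rather than picked by hand:

```python
from statistics import mean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """Standardize one metric across teams: (value - mean) / SD."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("metric has no spread across teams")
    return [(v - mu) / sd for v in values]


def composite_index(metrics: dict[str, list[float]],
                    weights: dict[str, float]) -> list[float]:
    """Weighted sum of standardized metrics, one score per team."""
    standardized = {name: z_scores(vals) for name, vals in metrics.items()}
    n_teams = len(next(iter(metrics.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in metrics)
        for i in range(n_teams)
    ]


# Hypothetical three-team league and hand-picked weights.
metrics = {
    "cf_pct": [52.0, 50.0, 48.0],
    "xg_pct": [53.0, 50.0, 47.0],
    "gf_pct": [49.0, 50.0, 51.0],
}
weights = {"cf_pct": 0.4, "xg_pct": 0.4, "gf_pct": 0.2}
print(composite_index(metrics, weights))
```

A team that is league-average on every metric scores 0; positive scores are above average under whatever weights you choose.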


13 hours ago, pi2000 said:

That was great, thanks.   

You'd think there would be a function that combines all these individual metrics into a single index, e.g., a power index of some sort, that would have superior predictive power when analyzing matchups.

Maybe it's like reconciling quantum theory with relativity: once we figure it out, we will be masters of the universe.


1 minute ago, Randall Flagg said:

I don't know how many times I've almost gotten my credit card out to pay for the full subscription to that site. It's just so fun to look through.

Didn't know you had to pay for it, but I'm seriously considering it now. Feel like a kid in a candy shop right now.


10 minutes ago, WildCard said:

Didn't know you had to pay for it, but I'm seriously considering it now. Feel like a kid in a candy shop right now.

You'll just get info from past years and stuff, I think, with the subscription. You can use what you can see there, but there's more on the site than you can see.


It's fun to do silly little comparisons like so: 

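[Charts (eicheja96): Jack Eichel's teammates' shot shares with and without him, one chart per season for his first three seasons]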

These are graphs for Jack's first 3 seasons. Take them at face value, but they show the shot shares of his teammates in situations with him (their number in black) and without him (their number in red). Upper-right is where you want to be; lower-left is bad. So you want black up and to the right of red. In his rookie year, you can see that Jack was largely pulling his teammates toward "bad." Sophomore year was a mixed bag that largely averaged out to not making players better or worse, with a noted increase in the number of events both good and bad (being dragged parallel to the good/bad axis, toward "fun"). Then in his third year, with arguably his worst consistent set of linemates, he's doing a much better job of pulling guys in the right direction. Sam was the only player to get worse with him by this metric, and even then they were bang on the line. There's very real progression in Eich's game, both on the surface watching him and underneath. I can't wait to see him play with some real hockey players someday (...in Boston eight years from now? Let's go, Jason, get some done!)
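Roughly how a with-or-without-you (WOWY) comparison like these can be computed. This assumes the charts plot shot-attempt rates for and against per 60 minutes, which the post doesn't state outright; `wowy_shift`, its argument names, and the sample numbers are hypothetical:

```python
def per60(events: int, toi_minutes: float) -> float:
    """Convert a raw count to a rate per 60 minutes of ice time."""
    return 60.0 * events / toi_minutes


def wowy_shift(with_cf: int, with_ca: int, with_toi: float,
               without_cf: int, without_ca: int, without_toi: float):
    """How a teammate's shot-attempt rates move when playing with the player.

    Returns (good, fun):
      good > 0  -> more attempts for and/or fewer against per 60
      fun  > 0  -> more total events per 60, for and against combined
    """
    d_for = per60(with_cf, with_toi) - per60(without_cf, without_toi)
    d_against = per60(with_ca, with_toi) - per60(without_ca, without_toi)
    return d_for - d_against, d_for + d_against


# Hypothetical teammate: 60 minutes with the player, 60 minutes without.
print(wowy_shift(60, 50, 60.0, 50, 50, 60.0))  # (10.0, 10.0)
```

Splitting the change into "good" and "fun" components mirrors the post's description of a teammate being dragged toward "fun" without moving much along the good/bad direction.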

Edited by Randall Flagg

1 hour ago, LGR4GM said:

I think Reinhart should be kept off Jack's line and I think that Connor Sheary and a healthy Okposo will do wonders for Jack's line. Also Dahlin feeding him pucks. 

Okposo will continue to be one of the worst options for Jack's line, both statistically and in terms of what Jack needs and what Kyle needs versus what they give each other. 

And he was healthy all season last season.

Link to comment
Share on other sites

6 minutes ago, Randall Flagg said:

And he was healthy all season last season.

Healthy? Maybe. He missed six games.

But he certainly wasn’t well. Whether he was hampered by an inability to train over the summer or by a general anxiety over playing hockey, his stint in the ICU clearly affected him.

I’ll use +/- despite its limitations because it illustrates my point: he was a cumulative -22 the five seasons prior to last, from his best of -2 to a worst of -9. Then he puts up a -34?

It’s entirely possible that Okposo peaked during his last three years on the Island, when he was at a 70-point pace, and is halfway through a steady decline that will have him out of the league in two years. He just turned 30.

But it’s not a guarantee he will be Matt Moulson. Iginla had three down years in a row in his late 20s before having his best 3-year run of his career in his early 30s. Shane Doan did something similar, as did Jagr.

This year should show whether his decline (and he was worse than his numbers) was a result of declining skill, or mostly about being physically and mentally ready to play.


2 minutes ago, TrueBlueGED said:

"A healthy Okposo" is the new "a healthy Ennis." It's a desperate straw to be grasped by fans hoping against all odds that a player hasn't reached a cliff. 

Generally agree, but I wouldn't have termed KO healthy. Able to play, yes, but his game seemed off. Whether it's because of post-injury issues or he's falling off the cliff, we will know by November at the latest.


Just now, dudacek said:

Healthy? Maybe. He missed six games.

But he certainly wasn’t well. Whether he was hampered by an inability to train over the summer or by a general anxiety over playing hockey, his stint in the ICU clearly affected him.

I’ll use +/- despite its limitations because it illustrates my point: he was a cumulative -22 the five seasons prior to last, from his best of -2 to a worst of -9. Then he puts up a -34?

It’s entirely possible that Okposo peaked during his last three years on the Island, when he was at a 70-point pace, and is halfway through a steady decline that will have him out of the league in two years. He just turned 30.

But it’s not a guarantee he will be Matt Moulson. Iginla had three down years in a row in his late 20s before having his best 3-year run of his career in his early 30s. Shane Doan did something similar, as did Jagr.

This year should show whether his decline (and he was worse than his numbers) was a result of declining skill, or mostly about being physically and mentally ready to play.

Yes, he did put up a -34. He was a much worse player, with eroded skills, and he was being used as if that hadn't happened.

If he wasn't physically or mentally "ready to play" then he shouldn't have been anywhere near NHL ice. Especially the "physically" part. If he wasn't 100% in that regard, that could have had life-altering consequences. So I'm not convinced he'll be more "ready to play" in any sense than he was last year. 

I could see him having better jump in the first couple months of the season, I really can. But for anything more than that, the only thing that'll convince me is watching it happen. Just like I said with Ennis 2 years ago.


You’ve never had a bad stretch at work or school because you were consumed by something in your personal life?

Or not adequately prepared for a project because of life circumstances? Suffered through a physical ailment yet continued to go to work because you could and were needed?

According to Okposo, it was a fact that he did not adequately train last year due to his recovery; that’s the physical.

Also, he was in the ICU for an extended period of time for something traumatic; that’s the mental.

Recovery takes time and sometimes it has to happen on the job. Pride is a factor. Not understanding how to deal with what you are dealing with on a personal level is another. Your teammates and coaches not understanding it either is another.

Some things aren’t as simple as the numbers.

Ennis couldn’t overcome his issues and it is entirely possible Okposo may not either.

Edited by dudacek

16 minutes ago, dudacek said:

You’ve never had a bad stretch at work or school because you were consumed by something in your personal life?

Or not adequately prepared for a project because of life circumstances? Suffered through a physical ailment yet continued to go to work because you could and were needed?

According to Okposo, it was a fact that he did not adequately train last year due to his recovery; that’s the physical.

Also, he was in the ICU for an extended period of time for something traumatic; that’s the mental.

Recovery takes time and sometimes it has to happen on the job. Pride is a factor. Not understanding how to deal with what you are dealing with on a personal level is another. Your teammates and coaches not understanding it either is another.

Some things aren’t as simple as the numbers.

Ennis couldn’t overcome his issues and it is entirely possible Okposo may not either.

All fair.

I'll believe it when I see it, and until then, I will not be convinced, for the reasons I've outlined enough to saturate and tire everyone.

I continue to ponder something you posted a while ago about a take that was indicative of a wider perspective of life, and I think that plays into every debate that happens here, this one no exception.

