Marvin Posted July 7, 2019 Report Posted July 7, 2019 (edited)
Pending Admin approval, I created a new club so that we can have a repository for statistics references and discussion of the methodology, quality, etc. When I add a stat to the discussions, I will always give the simplest way to arrive at a variant of the stat, some more complex adjustments, etc. I will also try to add evaluation to the discussions.
I have access to the very earliest of hockey data analysis, even pre-dating my own from 1992. I have the original computations used in The Hockey News for the columns, "For Argument's Sake." These are very primitive and date back to when a $2500 Atari 400 was close to top-of-the-line. For example:
Goaltender Perseverance ratings: (Save pct * 6 + average shots against / game) / 0.6
Created: 1981
Inspiration: Avoid using GAA for comparing goaltenders because good goaltenders on bad teams look worse than bad goaltenders on good teams.
Logic: Save Percentage is generally a more predictable long-term, team-independent statistic than GAA. Add in the shots against per game to measure workload; thus the same save percentage for a goaltender on a weaker team that surrenders more shots will show a higher perseverance rating and therefore better performance.
Advantages: First goaltender stat to try to rate goaltenders by combining personal performance and workload; found goaltenders who were over-rated by GAA who were terrible but played on very defencive teams. (Prototype: Pete Peeters later in his career)
Disadvantages: Rated all shots equally; proportions were derived to rescale goaltenders to the THN staff's perceptions and evaluations. (Prototype: Tom Barrasso early in his career)
Common Adjustments: Varying the dependence on the shot rates; incorporating shot difficulty; incorporating situational issues, such as a two-man advantage.
I did a pile of stuff when I got access to our old Sun machine at the MSU Math Department (the Solaris beta OS).
I did a lot of work on what we call analytics back on my old Amiga in the 1990s. Most of this stuff has been largely superseded by modern analytics, but these stats are still pretty accurate, simple enough to compute, and easy enough to understand that I like to use them for basic analysis just to get a rough idea whenever I run into a claim that looks either counter-intuitive or completely out of whack. Edited July 7, 2019 by E4 ... Ke2 5 2 Quote
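For anyone who wants to play with the Perseverance formula from the post above, here is a minimal sketch. The post does not say which scale the save percentage is on, so the 0-100 scale used here is an assumption, and the function name and the two example goaltenders are made up for illustration.

```python
def perseverance(save_pct: float, shots_against_per_game: float) -> float:
    """Goaltender Perseverance rating as quoted in the post.

    Assumption: save_pct is on a 0-100 scale (e.g. 90.5); the post does
    not state the scale, so treat the absolute values as illustrative.
    """
    return (save_pct * 6 + shots_against_per_game) / 0.6


# Two hypothetical goaltenders with the same save percentage but
# different workloads (shots faced per game).
sheltered = perseverance(90.0, 25.0)
busy = perseverance(90.0, 35.0)
print(f"sheltered: {sheltered:.1f}  busy: {busy:.1f}")
```

With these weights, the goaltender facing more shots always rates higher at an equal save percentage, which is the workload effect described under "Logic".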
Marvin Posted July 8, 2019 Author Report Posted July 8, 2019 I put 4 stats in there. I avoided stats where I know we have far greater expertise on this board, such as TRpm and RApm. I will let the experts put them in. I added some evaluations of those statistics, because they are very old, and therefore have a long track record which we can evaluate. I would like other people to chip in and let me know what they think and what adjustments to make. ASIDE: One assertion that gets a lot of support is @pi2000 claiming that because you can't rehearse set plays, only statistics that use concrete measures, such as TRpm, are worthwhile. I agree in principle that hockey is controlled chaos. So is the stock market. That does not make it impossible to derive figures based on seemingly ephemeral situations. It just means that you have to take them in context and with a bit of a jaded eye. For instance, I am a huge believer in quality of competition (Q of C) for tactical evaluation; however, as a long-term statistic, it is not worth nearly as much because almost all changes on the fly tend to flatten out the Q of C. If we just ignored these numbers in other walks of life, virtually all applied mathematics that involves statistics and operations research would vanish. (You ever tried quantifying and modelling luck and trust?) Because of this uncertainty (kind of like a Hockey Heisenberg Uncertainty Principle), I don't use any single number to evaluate a player. For instance, I start with Adjusted Plus-Minus, then incorporate Defencive Zone Starts, situational adjustments, and such to get as complete and nuanced an evaluation as I can of a player who does not score a lot. (Or, for that matter, to determine if a big scorer should be moved because he scores a disproportionate number of trivial goals in low-leverage situations and is completely defencively inept.) 2 Quote
... Posted July 8, 2019 Report Posted July 8, 2019 (edited) Good start. It's likely to be a slow go in here until the season picks up and we really start to reference the fancy stats, so, don't get discouraged! At some point, I may contribute on RAPM. For now, Evolving Hockey has a nice reference section for those who need summer reading material: https://evolving-hockey.com/ Select the "More" tab for the references. Edited July 8, 2019 by ... Schlemiel and Schlimazel 1 Quote
stinky finger Posted July 8, 2019 Report Posted July 8, 2019 I remember when sports were fun and simple. 2 Quote
North Buffalo Posted July 9, 2019 Report Posted July 9, 2019 I'll tell my brother, I have boy muw waaa haa! Quote
North Buffalo Posted July 9, 2019 Report Posted July 9, 2019 1 hour ago, ... said: Good start. It's likely to be a slow go in here until the season picks up and we really start to reference the fancy stats, so, don't get discouraged! At some point, I may contribute on RAPM. For now, Evolving Hockey has a nice reference section for those who need summer reading material: https://evolving-hockey.com/ Select the "More" tab for the references. PLEASE Add topic in club and maybe cut and paste reference definitions page. 1 Quote
Marvin Posted July 9, 2019 Author Report Posted July 9, 2019 1 hour ago, Leaf Blower said: I remember when sports were fun and simple. Actually, statistical analysis and such is part of why I find sports fun. And the analysis that goes with it made playing it and talking about it much simpler for me. So, believe it or not, this made it more fun and simpler for me. 1 hour ago, ... said: Good start. It's likely to be a slow go in here until the season picks up and we really start to reference the fancy stats, so, don't get discouraged! At some point, I may contribute on RAPM. For now, Evolving Hockey has a nice reference section for those who need summer reading material: https://evolving-hockey.com/ Select the "More" tab for the references. I will get to that when I finish working. I have been doing this as something to do on a break. Quote
... Posted July 9, 2019 Report Posted July 9, 2019 (edited) 1 hour ago, North Buffalo said: PLEASE Add topic in club and maybe cut and paste reference definitions page. On the bold - what? On the non-bold: I could cut-n-paste the page contents, but that wouldn't be cool. Additionally, you cannot copy the direct URL to the reference page the way that site is set up. Edited July 9, 2019 by ... I need another pot of coffee, excuse me. Quote
North Buffalo Posted July 9, 2019 Report Posted July 9, 2019 So do a reference page of definitions of terms and abbreviations. There was a pretty decent one to start in that link. Quote
... Posted July 10, 2019 Report Posted July 10, 2019 Pulled from another thread, background info on the RAPM charts. This describes the RAPM charts in detail: https://hockey-graphs.com/2019/01/14/reviving-regularized-adjusted-plus-minus-for-hockey/ One concern, and argument against the RAPM charts, is the effect of line-mates. This effect is built into the equations. Quote Additionally, we will use a technique called “regularization” in the linear regression (this is where “Regularized” in “Regularized Adjusted Plus-Minus” comes from). Regularization in a linear regression comes in two main forms – ridge regularization (also known as Tikhonov regularization or L2 regularization) and LASSO regularization (“Least Absolute Shrinkage and Selection Operator”, also known as L1 regularization). The main purpose is to address multicollinearity that is present in the data. Why do we care about multicollinearity? Well, when a pair of players play together for a significant amount of time (the classic example is Henrik and Daniel Sedin who spent over 90% of their career time on ice together) the coefficient estimates in a traditional OLS regression will be extremely unstable (and therefore unreliable). Regularization combats this by adding some amount of “bias” into the model (Gaussian “white-noise”) to decrease the variance in the coefficient estimates [more info here]. What this means, essentially, is unstable coefficient estimates are “penalized” (or “shrunk”) based around a Gaussian distribution where 0 is the mean. Ridge regularization will pull coefficients towards 0 (but never exactly 0). LASSO regularization will pull coefficients toward 0 and also “zero” some coefficients. A summary of what we're NOT looking at, in general, with these charts: Quote So let’s summarize what the RAPM coefficients are. 
They are offensive and defensive ratings for each player that are isolated from the other skaters they played with, the other skaters they played against, the score state, the effects of playing at home or on the road, the effects of playing in back-to-back games, and the effects of being on the ice for a shift that had a faceoff in the offensive or defensive zone. And, addressing another concern, TOI: Quote Above, we can see that players who play less than ~100-150 EV minutes are being “regressed” to the mean (0 for each regression) – this is the regularization pulling these players towards the mean. Not only does this help to deal with multicollinearity (to an extent), it also adds a “quasi-Bayesian” aspect to the player ratings. In other words, if a player has very few shifts in the data, they are brought closer to the league mean using a Gaussian “prior” distribution. In an OLS regression, these players would have wildly inflated per 60 ratings. Quote
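To make the shrinkage idea in the quoted passage concrete, here is a minimal sketch of ridge regression on a toy two-player problem. Everything in it is an assumption for illustration: the function name, the ten made-up shifts, and the penalty value of 5 are not from the article, and a real RAPM model has far more columns (teammates, opponents, score state, and so on).

```python
def ridge_two_players(rows, y, lam):
    """Closed-form ridge fit for exactly two coefficients.

    Solves (X^T X + lam * I) beta = X^T y, where each row is
    (player_a_on_ice, player_b_on_ice) and y is the shift outcome.
    lam = 0 reduces to ordinary least squares (OLS).
    """
    a = sum(r[0] * r[0] for r in rows) + lam
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows) + lam
    p = sum(r[0] * t for r, t in zip(rows, y))
    q = sum(r[1] * t for r, t in zip(rows, y))
    det = a * d - b * b
    # Apply the inverse of the symmetric 2x2 matrix [[a, b], [b, d]].
    return ((d * p - b * q) / det, (a * q - b * p) / det)


# Made-up data: two players share the ice for 8 of 10 shifts, so their
# columns are almost identical (the multicollinearity in the quote).
rows = [(1, 1)] * 8 + [(1, 0), (0, 1)]
y = [1, 0, 1, 0, 1, 1, 0, 0, 2, -1]

ols = ridge_two_players(rows, y, 0.0)    # no penalty
ridge = ridge_two_players(rows, y, 5.0)  # arbitrary penalty strength
print("OLS:  ", ols)
print("ridge:", ridge)
```

In this toy case the unpenalized fit splits the pair into one large positive and one large negative rating based on a single shift apart, while the penalized fit pulls both toward zero without making either exactly zero.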
SwampD Posted July 10, 2019 Report Posted July 10, 2019 I asked this question in the club, but guess I could put it here. Quote
... Posted July 10, 2019 Report Posted July 10, 2019 You're in luck. I have obtained video of the people who collect stats doing the work... 2 Quote
triumph_communes Posted July 10, 2019 Report Posted July 10, 2019 4 hours ago, ... said: Pulled from another thread, background info on the RAPM charts. This describes the RAPM charts in detail: https://hockey-graphs.com/2019/01/14/reviving-regularized-adjusted-plus-minus-for-hockey/ One concern, and argument against the RAPM charts, is the effect of line-mates. This effect is built into the equations. A summary of what we're NOT looking at, in general, with these charts: And, addressing another concern, TOI: Need to be clear: The correction factor ***attempts*** to account for teammate effects, but by no means can it completely do so. It also has no ability to distinguish chemistry with one player vs. no chemistry with another. When players have very few linemates, the model will fail them to some degree. Quote
Randall Flagg Posted July 10, 2019 Report Posted July 10, 2019 (edited) So Sean Tierney (a creator of many charts that get posted here and everywhere) has this pretty neat lineup creator tool that uses a projected 82-game WAR (data and WAR information is from EvolvingWild https://evolving-hockey.com/ (check the references section for weeks' worth of reading on these models) with WAR-specific stuff here: https://hockey-graphs.com/2019/01/17/wins-above-replacement-the-process-part-2/). Since WAR is ultimately an attempt to combine everything a player does into a single number (with obvious constraints on the number of variables and their inherent uncertainties and yaddah yaddah) you can theoretically have a somewhat decent guess at how a team would fare, and it makes the lineup creator tool fun. Here's the tool: https://docs.google.com/spreadsheets/d/1hkm-5QqNEQKULy4Bp8VTQHJoQlcehSgx1lZKMxQ4Uog/edit#gid=276138252 I decided to see how well the lineup creator replicates last season's standings. I used dailyfaceoff to get the most common lines/pairings used by teams, and hockey reference to get time-on-ice information, and Tierney's time on ice adjustment was sort of averaged from the combinations of players made from this information. Obviously, this is incredibly patched together, because it's just 12 forward spots and 6 D spots and so injuries and thus injury replacements are neglected - it's a tool to get total WAR numbers for a full season of just these 18 skaters and 2 goalies, after all. But still, I was curious to see how it did. Hopefully using raw time on ice was okay since I think the individual WAR number for each player takes into account their power play or PK time. Anyway, after doing all of this with each team, these are the results: The difference between the model and reality increases as you move left to right. I would say that this did a bit better than I expected - for 2/3 of the NHL, it was within plus or minus 3 wins.
The worst performance was six wins off. Another interesting trend - it tends to undervalue teams: only seven of the 31 teams were calculated to finish higher than they actually did, 23 teams did better than the model thought, and one was bang on. There are probably infinite confounding factors for this phenomenon, maybe including the fact that sometimes injury replacements can be better than guys pushed out of the bottom of the lineup? I tended to stick players in with the most games played, and always players that started the year with the team, so it misses the fact that teams often bolster their lineups at the trade deadline. This is why it over-predicts Ottawa - both Stone and Duchene were present in their lineup, which matches the fact that they closed the year with just seven wins in their last 24 games, after 22 in their first 58 (a drop of roughly 9 percentage points in win rate, from about 38% to about 29%). Interesting is that, of the large gaps to the right, the model is always conservative - it does its over-predicting when it does a better job of predicting, not when it's way off. I don't see a trend in the teams there either - Washington, Islanders, Oilers, Flames, Ducks, Nashville, Winnipeg, Vegas. Some good, some bad. None of the elite teams. FWIW, all playoff teams in that group lost in the first round except the Islanders, who were swept in round 2. Perhaps the model saw them for what they were. Or perhaps it's completely random! Either way, if you come across or want to use the model, presuming that you'll use it to see what the Sabres or another team might look like for next year, this gives you some idea of its performance. It would never claim to be elite at this, because of aforementioned drawbacks. I'm not sure I could have done better writing out standings before the season started. In fact, I might try that this year - make my own detailed standings predictions, write them down, and then do this with the WAR for that season afterwards to see what does a better job.
Because this was fun! Here are the NHL Standings based on the WAR lineup creator:
1.) TBL (-)
2.) BOS (-)
3.) TOR (+5)
4.) PIT (+5)
5.) SJS (+2)
6.) DAL (+10)
7.) CGY (-3)
8.) CBJ (+6)
9.) STL (+2)
10.) WSH (-5)
11.) NYI (-5)
12.) MTL (+3)
13.) CAR (-1)
14.) WPG (-1)
15.) NSH (-5)
16.) COL (+2)
17.) MIN (+5)
18.) VEG (-1)
19.) PHI (+4)
20.) FLA (-1)
21.) CHI (-)
22.) ARZ (-2)
23.) VAN (+1)
24.) NJD (+6)
25.) DET (+4)
26.) BUF (+2)
27.) NYR (-)
28.) OTT (+4)
29.) ANA (-4)
30.) EDM (-4)
31.) LAK (-1)
It really only over-predicted Dallas, and it liked New Jersey better than they were by a lot (which is weird, because I didn't include Hall in their lineup...) This would have given us a first round of:
Boston vs Toronto
Pittsburgh vs NYI
Washington vs CBJ
Tampa vs Carolina
St. Louis vs Winnipeg
Calgary vs Vegas
Dallas vs Nashville
San Jose vs Colorado
So in each conference in WAR world, we would have had 50% of the same playoff series, and then the other two teams would merely have swapped opponents in each case. Pretty spooky. I CERTAINLY wouldn't have been this successful predicting before the season, even if I knew everything about each player individually that I do now while being ignorant of any game or standings results. Now, there are a few differences in division winners by which these series play out (Pittsburgh was the division winner in WAR world, so NYI were a WC team). And Vegas actually finished worse than Minnesota here but got in because of the playoff format. Still, fun! Edited July 10, 2019 by Randall Flagg Quote
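For a quick numerical summary of the list in the post above, here is a small sketch that tallies the bracketed rank shifts. The numbers are copied from the post with "(-)" read as 0; what the sign means is not defined there, so the sketch only uses absolute values, and the variable names are made up for illustration.

```python
# Rank shifts copied from the post's list, in model order; "(-)" is read as 0.
shifts = {
    "TBL": 0, "BOS": 0, "TOR": 5, "PIT": 5, "SJS": 2, "DAL": 10, "CGY": -3,
    "CBJ": 6, "STL": 2, "WSH": -5, "NYI": -5, "MTL": 3, "CAR": -1, "WPG": -1,
    "NSH": -5, "COL": 2, "MIN": 5, "VEG": -1, "PHI": 4, "FLA": -1, "CHI": 0,
    "ARZ": -2, "VAN": 1, "NJD": 6, "DET": 4, "BUF": 2, "NYR": 0, "OTT": 4,
    "ANA": -4, "EDM": -4, "LAK": -1,
}

exact = sum(1 for s in shifts.values() if s == 0)
within_three = sum(1 for s in shifts.values() if abs(s) <= 3)
mean_abs = sum(abs(s) for s in shifts.values()) / len(shifts)
biggest = max(shifts.items(), key=lambda kv: abs(kv[1]))

print(f"teams: {len(shifts)}, exact: {exact}, within 3 places: {within_three}")
print(f"mean absolute shift: {mean_abs:.2f}, biggest: {biggest}")
```

One caveat: the listed shifts do not sum to zero, as a strict re-ranking of the same 31 teams would, so they are probably best read as approximate.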
... Posted July 10, 2019 Report Posted July 10, 2019 34 minutes ago, triumph_communes said: It also has no ability to distinguish chemistry with one player vs. no chemistry with another. That's true for some of our GMs and past coaches, too. Quote
Randall Flagg Posted July 11, 2019 Report Posted July 11, 2019 (edited) 11 hours ago, SwampD said: I asked this question in the club, but guess I could put it here. I've spent a lot of time looking into this. Here's what I've found, in real time, as I've found it: It depends what kind of data we're talking about. When regression is performed for a model (like in RAPM charts or WAR stuff) I've seen the NHL's official html reports like this get scraped:http://www.nhl.com/scores/htmlreports/20172018/PL020672.HTM Because for those purposes, the main things you need to know are who was on the ice, what the score was, and when shifts start/end etc, and when events happen. The scraping code is available, you just have to dig into the references of a given model (they aren't shy about sharing what they do, it takes me hours to read through (without understanding a lot of) methods, conclusions etc). There are lots of big-data ways to combine the scraped data with other observations, whose natures I'm still looking into, like what will be discussed in the paragraph below. To give a generic answer to your first question - there are "RTSS employees" whose job it is to sit at each game and produce this stuff. We'll get into these guys with more detail later. The NHL also sources on-ice coordinates for shot events, which are the other main thing you're probably thinking of. From what I gather, the NHL isn't the only entity doing this, but when other people do analyses, they don't perpetually record every NHL game year after year, they eventually stop and write a paper with their results. For the guys you usually see here (McCurdy, Tierney, EvolvingWild etc) who create massive series of published papers on all this stuff, it appears that they generally use the NHL's data. One thing I'll say about this exploration I'm undergoing in real time - big data really does have its hands on everything, and I'm surprised at how deep and intricate this stuff goes. 
There are a lot of people way smarter than me who put stunning amounts of work into this stuff. I'm sifting through academic articles from 2007 arguing about the impact of shot quality (implying they were using it to generate models then). Apparently a data scientist named Ken Krzywicki was integral to the shot quality data generation in 2007? I could be picking up the context incorrectly though. I'm just kinda dumping more info here as I come across it. Apparently that Ken guy was frustrated at shot distance data being consistently under or over reported at certain venues back in 2009, and created a unique model for each arena that took this into account and allowed statistically meaningful comparisons based on the way the employees that year consistently reported. These employees are the "RTSS staff" whose job it is to do what you're basically asking about. He was successful, as far as statistical models go, at smoothing out these differences - before the correction, you could generate a model based on the RTSS staff's work, but its predictions didn't match the scoring results seen. A typical linear regression to isolate rink impacts on shot data was performed, and did its job, providing corrective factors for these tendencies. The resulting predictions matched the scoring results much better, since they could control for the "observer bias" of whoever was doing the work those years. Here's the paper: http://hockeyanalytics.com/Research_files/SQ-DistAdj-RS0809-Krzywicki.pdf So, to the question of "well what if some people count stats differently than others" there are statistical methods that can get around observer bias and apparently have been in use at least since the Sabres last were winning playoff series. Here's a paper from a bunch of stats nerds that does zone entries - they wrote this for an analytics conference.
http://www.hockeyanalytics.com/Research_files/Using Zone Entry Data To Separate Offensive, Neutral, And Defensive Zone Performance.pdf This would fall under the "individual project" category I mentioned above. Here's how they acquired data:
"2 Data Collection and Assessment
Each time a team advanced the puck into the offensive zone, the observers recorded a few key parameters:
- The time on the clock
- The player who sent the puck into the zone
- The method of entry (e.g. carrying the puck in with possession, dumping it into the zone and trying to recover it, or miscellaneous other entries such as shots on goal from the neutral zone)
This data was then merged with the official play-by-play, breaking the game into a series of segments from one zone entry or offensive zone faceoff to the next. The number of shots (including those that miss the net) and goals produced in each offensive zone possession were extracted from the play-by-play. This permitted assessments of each player’s contributions with the puck; to additionally identify defensive and off-puck offensive contributions, the list of players on the ice at the time of each zone entry was obtained from the official shift charts.
In this manner, 330 games were tracked, covering a full season for the Flyers and Wild, a half-season for the Capitals and Sabres, and approximately 7-10 games for most other teams.
For any manually-tracked data, it is important to assess the potential impact of scorer variability. Subjective assessments such as scoring chance counts can show major differences across scorers.[4] Since the puck crossing the blue line is a discrete, objective event, zone entry counts might be expected to be less problematic, but the scorers do still have a few decisions to make. The difference between carrying the puck in and dumping it in is usually clear, but the line between a pass with possession and a dump-in is occasionally tricky, as are some miscellaneous entries (e.g.
when a player carries the puck back into his own zone and then turns it over). Additionally, since the goal is to assess offensive and defensive performance, plays where the offense dumps the puck in and goes for a line change without making any attempt to recover the puck were excluded, which introduces a bit more subjectivity.
Several games were tracked by multiple observers. Comparing zone entry data from those games permits assessment of the integrity of the data and the viability of comparisons across data sets. Correlation matrices are given in Figure 1, indicating how often observers agreed on a given entry (more than 85% of the time) and what the most common discrepancies were (nearly two-thirds were when one observer omitted an entry that another recorded). The only significant scorer bias appears to be in the number of entries omitted; the distribution of entry types was consistent across observers and there was no apparent tendency for an observer to record his favorite team differently from what a fan of the opponent would record.
Dump-and-change plays were explicitly tracked for Capitals games and were typically accompanied by having four offensive players leave the ice within five seconds. Therefore, subjectivity around omissions could be removed by recording every dump-in and algorithmically removing the dump-and-change plays from the NHL shift chart."
That's how these particular guys for this particular paper tried to account for their own bias in their data.
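The last step in that quote (record every dump-in, then remove dump-and-change plays using the shift chart) could look something like the sketch below. This is not the paper's code: the function names, the tuple layout, and the choice to count shift ends within five seconds on either side of the dump-in are all assumptions, and the times are made-up elapsed seconds.

```python
def is_dump_and_change(dump_time, shift_end_times, window=5.0, min_players=4):
    """Flag a dump-in as a dump-and-change.

    Assumption: a play counts when at least `min_players` skaters on the
    dumping team end a shift within `window` seconds of the dump-in
    (before or after). The paper only says "within five seconds".
    """
    leaving = sum(1 for t in shift_end_times if abs(t - dump_time) <= window)
    return leaving >= min_players


def drop_dump_and_change(dump_ins, shift_ends_by_team):
    """Keep only dump-ins that are not followed by a mass line change."""
    return [
        (team, t) for team, t in dump_ins
        if not is_dump_and_change(t, shift_ends_by_team.get(team, []))
    ]


# Hypothetical inputs: (team, time) dump-ins and per-team shift end times.
dump_ins = [("PHI", 312.0), ("PHI", 655.0)]
shift_ends_by_team = {"PHI": [310.0, 313.0, 314.5, 316.0, 400.0, 657.0]}

kept = drop_dump_and_change(dump_ins, shift_ends_by_team)
print(kept)
```

Here the first dump-in has four shift ends close to it and is dropped, while the second has only one and is kept as a genuine entry attempt.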
Here is a scraper that you can use on the RTSS reports if you wanted to: https://pythonhosted.org/nhlscrapi/
Here's some more work on adjusting for unreliability in RTSS event recording, from 2012-2013: http://statsportsconsulting.com/main/wp-content/uploads/Schuckers_Macdonald_RinkEffects_Final.pdf
Here's how one man makes heat maps from zone charts he makes from NHL data: https://blog.icydata.hockey/2018/07/08/create-heatmaps-in-php-and-other-languages/
I just found an NHL JSON for a typical game. http://statsapi.web.nhl.com/api/v1/game/2015030411/feed/live It has the event location data!!! finally! I spent like two hours trying to find an example of how I could get a shot location from NHL data. For example, the 11th event recorded in this Sharks/Penguins game took place at coordinate (-69.0 (nice), 22) (I don't know the details of their mesh coordinates offhand, center ice is probably (0,0) with the rink going left-right). It was a wrist shot on Martin Jones by Matt Cullen, saved. THIS is the information, recorded by the RTSS guys whose job it is to do this, that gets turned into most of the charts we see. I dunno if there are other sources that track these things - other people will clearly do their own tracking for smaller projects like the ones posted above, but I wouldn't be surprised if most of the big ones we see use this stuff. MoneyPuck's about section gives options to download data going back to 2008-09 if you wanted to make this stuff yourself from scratch. There are MASSIVE amounts of data here.
Here's a random guy who did a bunch of work so that you can generate a shot chart of any game going back almost a decade, from these game JSON scripts or whatever they're called. https://public.tableau.com/profile/icydata#!/vizhome/ShotChart_2/ShotChart
So, now back to the question of who these people are. It's almost impossible to find details.
First, people still regularly have problems with the job they do - reports of sketchy data appear common, which goes to the heart of your question. "castles made on sand" was a phrase used back in 2009, and I'd be lying if I said it isn't still a concern now. It's possible @TrueBlueGED (who I think has been to some of these conferences) can provide details on how big the problem is, and what the community wants to do, or how it feels, about it. But yeah, there's no information on how many people generate this data, whether they cross-check each other, etc. The number of errors found in event logs, and the existence of observer bias in the first place, appear to indicate that the NHL has a lot of room for improvement in this stuff. Now that I think about it, this is probably the major driving force for getting puck and player tracking chips developed - at the very least, coordinate information will become impeccable, and outright event classification much easier. I wish I had more to offer on this end, since it's basically what your question was. AHA, here's an article with some good info. https://www.nhl.com/news/off-ice-officials-are-a-fourth-team-at-every-game/c-38840 HITS is apparently what RTSS was. This is an old article, I'm sure some things have changed. I don't think shot location data was available when this article was written; that's clearly been incorporated somehow. Ultimately, it appears that caution should be applied with any stat or chart you see, because enough problems have been raised with this data that you can't assume it's all good. At the same time, I'm not sure I see reason to believe entire charts with thousands of minutes of sample size on them are useless or would be inverted with "more correct" data. These are people whose job it is to do this, after all, they aren't monkeys at typewriters or random number generators. I don't have a firm handle on what mistakes get made and how often they're made.
Perhaps it's less important to note that Risto has exactly 2.3 zone exit passes per blah blah blah, and more important to look for general trends in lots of metrics, and absorb as much information (both numbers and on film) as you can to make judgments (which is a personal commandment of mine - I still cringe that y'all assume I'm just a stat head - I don't think I've even posted a RAPM chart outside of the post in which I explain how they're made! Any chart, stat, or single video clip is pretty useless on its own in hockey analysis, the best you can do is combine together as much information as time allows). Stat collecting appears to be about as messy and human as you'd expect it to be. Certainly not useless like one extreme would claim, and certainly not gospel like the other would. We have the info we have, it's not perfect, but it's better than nothing, and we should use it responsibly. I trust the analysis done with the data more than I trust the data itself - these guys do a lot of work, and will tell you in mind-numbing detail what they did every step of the way and why! Edited July 11, 2019 by Randall Flagg 1 Quote
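For anyone wanting to pull shot locations out of a game feed like the one linked in the post above, here is a rough sketch. It parses a small hand-written sample instead of calling the URL, since that endpoint may no longer respond. The field names (liveData, plays, allPlays, result, eventTypeId, secondaryType, coordinates, players, playerType) are assumptions about the feed's layout, not something confirmed in the post, and the sample only reuses the one shot described there.

```python
import json

# Hand-written sample in the ASSUMED shape of the feed; not downloaded data.
SAMPLE = """
{"liveData": {"plays": {"allPlays": [
  {"result": {"eventTypeId": "FACEOFF"}, "coordinates": {"x": 0.0, "y": 0.0}},
  {"result": {"eventTypeId": "SHOT", "secondaryType": "Wrist Shot"},
   "coordinates": {"x": -69.0, "y": 22.0},
   "players": [
     {"player": {"fullName": "Matt Cullen"}, "playerType": "Shooter"},
     {"player": {"fullName": "Martin Jones"}, "playerType": "Goalie"}]}
]}}}
"""


def shot_locations(feed):
    """Collect saved-shot events that carry x/y coordinates."""
    shots = []
    for play in feed["liveData"]["plays"]["allPlays"]:
        if play["result"].get("eventTypeId") != "SHOT":
            continue
        coords = play.get("coordinates") or {}
        if "x" not in coords or "y" not in coords:
            continue  # some events may have no location recorded
        who = {p["playerType"]: p["player"]["fullName"]
               for p in play.get("players", [])}
        shots.append({
            "x": coords["x"],
            "y": coords["y"],
            "type": play["result"].get("secondaryType"),
            "shooter": who.get("Shooter"),
            "goalie": who.get("Goalie"),
        })
    return shots


shots = shot_locations(json.loads(SAMPLE))
print(shots)
```

Feeding a list like this into any plotting tool is roughly how a shot chart gets built, with the caveat from the post that the coordinates are only as good as the people recording them.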
Randall Flagg Posted July 11, 2019 Report Posted July 11, 2019 I am dreadfully sorry for how long this is, I didn't mean it to get that way Quote
SwampD Posted July 11, 2019 Report Posted July 11, 2019 15 minutes ago, Randall Flagg said: I am dreadfully sorry for how long this is, I didn't mean it to get that way Very cool. Thanks for the legwork. My only reaction is, “All this analysis is done from data observed in real-time?!?!”,... really?! And the solution is coming up with ways to “correct” for incorrect data? How about going back and re-analyzing games accurately. I mean, it only costs $139 a season to watch every game as many times as you want. While I still believe stats are useful, this information strengthens my eye test/analytics weighting, and it’s going to take a lot for me to change it. Quote
LGR4GM Posted July 11, 2019 Report Posted July 11, 2019 7 hours ago, SwampD said: Very cool. Thanks for the legwork. My only reaction is, “All this analysis is done from data observed in real-time?!?!”,... really?! And the solution is coming up for ways to “correct” for incorrect data? How about going back and re-analyzing games accurately. I mean, it only costs $139 dollars a season to watch every game as many times as you want. While I still believe stats are useful, this information strengthens my eye test/analytics weighting, and It’s going to take a lot for me to change it. And how do you weight it? Quote
... Posted July 11, 2019 Report Posted July 11, 2019 (edited) 7 hours ago, SwampD said: Very cool. Thanks for the legwork. My only reaction is, “All this analysis is done from data observed in real-time?!?!”,... really?! And the solution is coming up for ways to “correct” for incorrect data? How about going back and re-analyzing games accurately. I mean, it only costs $139 dollars a season to watch every game as many times as you want. While I still believe stats are useful, this information strengthens my eye test/analytics weighting, and It’s going to take a lot for me to change it. Your thinking is all wrong, sorry, IMHO. Just because they talk about flawed or missing data, it doesn't mean the entire data set is inaccurate or wholly flawed, or that the results are more novel than useful. These are people who labor over minutiae talking about the activity they obsess over. What would be perfectly acceptable results outside of this group are garbage on the inside. @Randall Flagg isn't doing you a favour by providing the narrative above, unfortunately, especially if this is your response. Naïveté about the subject will prevent you from appreciating just how detailed and useful this data is. What's more, this type of data is collected everywhere now against many, many things (no doubt you've heard of "Big Data" and "The Internet of Things"). This type of data/processing is the source of machine learning and artificial intelligence. It doesn't assimilate you into the Borg, but it does allow for smarter decisions made quicker. It also opens the door for a discussion of the merits of fancy stats here in this thread. We've hashed out fancy stats on Sabrespace countless fricken times. If anyone is going to criticize fancy stats because the observations were made by people in real time, well, I have news for that person/anyone: all of our science started that way and most of it continues to be done that way.
The flaws inherent in hockey fancy stats are no different than the flaws in science, and in life, for that matter. To poo-poo them because of the flaws is like a child crying over having to grow up and be an adult. IMHO. ? ? ?️♀️ ? Edited July 11, 2019 by ... clarification. Quote
SwampD Posted July 11, 2019

13 minutes ago, ... said: Your thinking is all wrong, sorry. Just because they talk about flawed or missing data, it doesn't mean the entire data set is inaccurate or wholly flawed, or that the results are more novel than useful. These are people who labor over minutiae talking about the activity they obsess over. What would be perfectly acceptable results outside of this group are garbage on the inside. @Randall Flagg isn't doing you a favour by providing the narrative above, unfortunately, especially if this is your response. Naïveté about the subject will prevent you from appreciating just how detailed and useful this data is. What's more, this type of data is collected everywhere now on many, many things (no doubt you've heard of "Big Data" and "The Internet of Things"). This type of data/processing is the source of machine learning and artificial intelligence. It doesn't assimilate you into the Borg, but it does allow for smarter decisions made quicker. It also opens the door for a discussion of the merits of fancy stats here in this thread. We've hashed out fancy stats on Sabrespace countless fricken times. If you're going to criticize fancy stats because the observations were made by people in real time, well, I have news for you: all of your science started that way and most of it continues to be done that way. The flaws inherent in hockey fancy stats are no different than the flaws in science, and in life, for that matter. To poo-poo them because of the flaws is like a child crying over having to grow up and be an adult. IMHO.

I am not naive on this topic at all, and I do have the same problem with “Big Data” and “The Internet of Things.” In most cases I think it’s a scourge on our society.

Don't you find it ironic that, for being so meticulous and “laboring over the minutiae,” as you have pointed out, they would not be detailed enough to care about how the data is actually collected?

Once something is established, it is very hard to change, and if you are asking the wrong questions and just keep collecting data over and over, what are you accomplishing? You just keep getting more and more data telling you you are wrong.
... Posted July 11, 2019 (edited)

8 minutes ago, SwampD said: In most cases I think it’s a scourge on our society. Don't you find it ironic that, for being so meticulous and “laboring over the minutiae,” as you have pointed out, they would not be detailed enough to care about how the data is actually collected? Once something is established, it is very hard to change, and if you are asking the wrong questions and just keep collecting data over and over, what are you accomplishing? You just keep getting more and more data telling you you are wrong.

A scourge on society? You could have stopped there, having said all you need to have said.

There is no irony. I just explained why, to the layman, let alone someone who thinks data collection is a "scourge on society", a peek into the inner world can lead to erroneous conclusions. If you were to poke around the inner world of surgeons, for example, you might find fault with robot-assisted surgeries or non-invasive techniques. And then you would prefer that, for heart bypasses, the patient be sliced open from foot to sternum.

On the third point, without collecting the first set of data and analyzing it, they have no idea what additional data they need, or what changes to the collection process they need to make. That's how it works. It's science.

Edited July 11, 2019 by ... my binary was missing two bytes. whatever
North Buffalo Posted July 11, 2019

6 minutes ago, SwampD said: I am not naive on this topic at all, and I do have the same problem with “Big Data” and “The Internet of Things.” In most cases I think it’s a scourge on our society. Don't you find it ironic that, for being so meticulous and “laboring over the minutiae,” as you have pointed out, they would not be detailed enough to care about how the data is actually collected? Once something is established, it is very hard to change, and if you are asking the wrong questions and just keep collecting data over and over, what are you accomplishing? You just keep getting more and more data telling you you are wrong.

Data collection, how and by whom, i.e. the collectors' inherent bias, is always an underrated problem. It is tough to weed out and in many respects intractable, because that is the only way to collect the data, and flaws are tough to detect even when standards are applied. That is part of the reason political polling is so inaccurate.