NHL slammed for ‘catastrophically’ bad advanced stats, listens to critics

Puck Daddy

When it came to the National Hockey League’s new advanced stats site, a few things did not compute for Travis Yost. 

OK, more than a few things.

“There is no reason to go to NHL.com for anything related to hockey statistics. Their numbers are inaccurate. If not inaccurate, they are misleading. If not inaccurate or misleading, they aren’t capturing what they believe they are capturing,” wrote Yost, an analytics writer for TSN and co-host of the PDOcast, an analytics heavy podcast.

Yost’s screed against the NHL.com advanced stats site went viral, echoing and summarizing the concerns from many hockey fans about the League’s approach to analytics and its partnership with SAP, which was announced with much fanfare in February.

That's when the League added an “enhanced stats” section to NHL.com that featured metrics on puck possession and zone starts. But the partnership was widely criticized by the advanced stats community, from the League’s decision to do away with traditional stat names like “Corsi” to some specious boasts from SAP about its number crunching. As Yost notes, there was a “comical” claim that “they had built a predictive model that could accurately select 85% of post-season winners.”

Quietly, references to that model have been scuttled.

“The NHL should terminate their partnership and find someone who actually cares about the work they are doing. And the SAP should find more big business to sink their teeth into before the well dries up. Monorail sales have never been better,” Yost wrote.

Ouch.

In an interview with Puck Daddy on Monday, Yost didn’t mince words on NHL.com’s advanced stats site.

“As for right now, no, they are not a good source. In fact they are the worst hockey stats source I can ever recall, and it's not particularly close. There are data integrity issues everywhere. They don't seem to know what they are scraping. They don't seem to know what is and isn't relevant. The visualization is atrocious. They add no value at present time. And sadly they are getting precisely zero help from their business partner in SAP in all of this, who should know better,” he said.

Chris Foster felt the burn of Yost’s critique all the way up in the NHL’s front office.

“The first couple of points, those were great. Fantastic feedback. But the rest of the criticisms … I felt he was trying to pile on and paint the entire site as problematic, when he found a couple of issues that were corrected quickly,” said Foster, the NHL’s director of digital media.

“I would call that a very large generalization not based on truth. It’s not an accurate assessment of the site.”

Between Yost’s screed and Foster’s defense we find some common ground: There were some basic issues with the NHL’s fancy new stats site, and it was that criticism that prompted their correction.

***

Problem No. 1: Goalies On The Kill

Yost noted that, according to NHL.com’s stats, 17 goalies had not given up a goal on the penalty kill. Which is frankly impossible, given the stats for the League’s power play efficiency.

This was a valid concern and an easy fix. “We basically had the drop down menu backwards,” said John Dellapina of the NHL. “We had the menu labeled as ‘shorthanded’ when facing shorthanded shots.”

The NHL now properly labels the saves a goalie makes while his team is shorthanded.

Problem No. 2: NHL’s stats database currently features completely inaccurate/randomly generated numbers for team-level shot statistics.

Yost ran a chart that showed the wild disparity between the NHL’s numbers and those of independent stats sites for Corsi-For (shots on goal, missed, or blocked) per 60 minutes of even strength time:

[Chart: Yost]

Yikes, right?

Turns out the disparity is a difference in philosophy.

According to the NHL, some stats sites use “the average number of even strength minutes in a game” while the NHL numbers are based on 60 minutes of even strength hockey. The NHL argues that while its numbers weren’t in sync with those of the stats sites, they all led to the same conclusions about teams.

“It’s like Celsius and Fahrenheit,” said Dellapina. “If you look at the chart, you see that worst teams are still the worst teams and the best teams are still the best teams.”

That said, the NHL is making sure its methodology is in sync with other stats sites after Yost’s call out. “All of the data is correct, but we just used different standards for time on ice. It was good feedback, so we’re making that adjustment,” said Foster.
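To see why Dellapina reaches for the Celsius and Fahrenheit comparison, here is a minimal sketch of the two time standards described above. The team names, shot attempt totals, ice time figures and the 48-minute average are all made up for illustration, and the function names are hypothetical, not anything used by the NHL or SAP.

```python
def rate_per_minutes(events, toi_minutes, per_minutes):
    """Scale an event count to a rate per `per_minutes` of ice time."""
    if toi_minutes <= 0:
        raise ValueError("toi_minutes must be positive")
    return events / toi_minutes * per_minutes


# Hypothetical teams: (shot attempts for, even strength minutes played).
teams = {
    "A": (3000, 3200.0),
    "B": (2700, 3150.0),
    "C": (3300, 3250.0),
}

# Assumed value for "the average number of even strength minutes in a game".
AVG_ES_MINUTES_PER_GAME = 48.0


def rank(per_minutes):
    """Return team names ordered from highest rate to lowest."""
    rates = {
        name: rate_per_minutes(attempts, toi, per_minutes)
        for name, (attempts, toi) in teams.items()
    }
    return sorted(rates, key=rates.get, reverse=True)


per_60_order = rank(60.0)
per_game_order = rank(AVG_ES_MINUTES_PER_GAME)
```

Because the two standards differ only by a constant multiplier, the order of the teams comes out the same either way. The raw numbers do not match, though, which is why a chart that mixes the two standards looks so alarming.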

Problem No. 3: Zone Starts

The NHL’s numbers on where plays begin or end were wildly inaccurate when compared to advanced stats sites. In one example, the Carolina Hurricanes started 39 percent of their plays in the attacking zone according to independent stats sites, but the NHL had them at just over 30 percent.

“NHL’s stats database has either inverted the faceoff count or is calculating zone starts through, again, randomly generated numbers,” wrote Yost.

Turns out it’s the former, and it’s human error.

“In two or three arenas, we put the officials who record this stuff on the opposite side of the ice. It’s an X and Y coordinates input, not offensive or defensive. For all games in those arenas, it was flipped. Derek Stepan was supposed to have 15 offensive zone starts, and instead he had 15 defensive zone starts,” said Dellapina.

So the NHL has rectified this as well. But the question remains: How can mistakes like this happen on the League’s official site when so many other stats sites, scraping the same data, are accurate?

“I'm guessing the more likely answer is they didn't spend five minutes of their time to realize that the sheets list zones relative to the home team. Other independent databases haven't had a single issue calculating zone starts, and they are scraping from the same exact resource. Again, you can call it what you want: laziness, ignorance, an accident. These things shouldn't happen to a billion dollar enterprise if hobbyists can get it right,” said Yost in an interview on Monday.
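The coordinate mix-up Dellapina describes can be sketched in a few lines. This is an illustration under stated assumptions, not the League’s actual pipeline: the x coordinate is taken to be feet from center ice, the blue lines are assumed to sit 25 feet out, the team is assumed to attack the same end all game, and zone start percentage is computed as offensive starts divided by offensive plus defensive starts, which is one common definition. All names and faceoff locations are hypothetical.

```python
BLUE_LINE_X = 25.0  # assumed: feet from center ice to each blue line


def zone_for(x, attacks_positive_x):
    """Label a faceoff location relative to the team attacking one end."""
    if abs(x) <= BLUE_LINE_X:
        return "neutral"
    return "offensive" if (x > 0) == attacks_positive_x else "defensive"


def offensive_zone_start_pct(faceoff_xs, attacks_positive_x):
    """Offensive starts as a share of offensive plus defensive starts."""
    zones = [zone_for(x, attacks_positive_x) for x in faceoff_xs]
    offensive = zones.count("offensive")
    defensive = zones.count("defensive")
    if offensive + defensive == 0:
        raise ValueError("no offensive or defensive zone faceoffs")
    return 100.0 * offensive / (offensive + defensive)


# Made-up faceoff locations for one team in one game.
recorded = [69.0, 69.0, 69.0, -69.0, 0.0]

correct = offensive_zone_start_pct(recorded, attacks_positive_x=True)

# Seating the scorer on the opposite side mirrors every x coordinate.
mirrored = offensive_zone_start_pct([-x for x in recorded], attacks_positive_x=True)
```

With the coordinates mirrored, every offensive zone start is counted as a defensive one and vice versa, so a team that should show 75 percent shows 25 percent instead.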

***

One of the primary attacks on the NHL’s “enhanced stats” site was that it was a poor attempt to replicate what other stats sites had already perfected. In some cases, the metrics the NHL chose have already been tossed aside by cutting-edge analysts and criticized as being behind the curve.

Take “close” stats, for example, defined by War On Ice as “situations when the game is within 1 goal (1st and 2nd periods) or tied (3rd period or overtime).” It’s a stat cited by many when writing about puck possession, but it’s been vetted and diminished by many in the analytics community.
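Read literally, the War On Ice definition quoted above amounts to a simple filter on game state. A rough sketch follows, assuming periods are numbered 1 through 3 with anything higher counted as overtime; the function name and the sample events are hypothetical.

```python
def is_close(period, score_diff):
    """True when the game state counts as "close" under the quoted definition.

    period: 1, 2 or 3, with 4 and up treated as overtime (an assumption).
    score_diff: one team's goals minus the other's at that moment.
    """
    if period < 1:
        raise ValueError("period must be 1 or greater")
    if period <= 2:
        return abs(score_diff) <= 1  # within 1 goal in the 1st and 2nd periods
    return score_diff == 0  # tied in the 3rd period or overtime


# Made-up shot attempt events as (period, score_diff) pairs.
events = [(1, 0), (1, 2), (2, -1), (3, 1), (3, 0), (4, 0)]
close_events = [event for event in events if is_close(*event)]
```

Everything that fails the filter is simply discarded, which is the practice Yost objects to when he talks about throwing out data to correct for score effects.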

“Years ago, smart people recognized that simply throwing out data for the sake of correcting for score effects was inefficient. We started using score adjusted stats at the team-level as far back as 2012. It was reaffirmed as a superior approach in terms of repeatability and predictability in 2014. Anyone who has spent 10 minutes on the internet looking up hockey stats is by and large familiar with this work. I don't know anyone who has cited FenwickClose% or CorsiClose% in years for these very reasons,” said Yost.

Foster argues the jury is still out on “close” stats, which is why the NHL uses them. “It’s one variation. It’s one context. You can use it or you can choose to ignore it. It’s fair to say that some sites are phasing it out, but we’re putting it out there. I think it’s up for debate. I don’t think there’s been anything that definitive,” he said.

Yost sees the “close” debate as part of a larger problem with the NHL and SAP project.

“It's not just about 'Close' stats. It's about paying attention and knowing what's already been done. They were years behind the curve and once someone finally realized that they needed to catch up (and fast), they rushed the entire thing without thinking or vetting anything that lived on the web site,” he said.

“The whole thing has been a catastrophe.”

***

From the start of the project, there’s been an adversarial relationship between the NHL and the advanced stats community. It started with a slight change in language within the NHL.com terms of service that seemed to target the way advanced stats sites gather their data. It continued when the NHL, for whatever reason, didn’t involve the established sites and the smartest analytics minds in helping to craft their “enhanced stats” site or rolling it out. 

And then there was the obvious fact that, in essence, hockey fandom’s garage band had started playing stadiums.

“I get the initial antagonism. Once the league starts doing it, it’s not as exclusive. It becomes a little more mainstream,” said Foster.

“We don’t feel that we’re in competition with any other sites. The hockey analytics sites are the trailblazers. They’ve been doing it for years before us. We don’t want to take anything away from them. We just have in some cases a larger reach, and just want to make these stats as accessible as possible.”

Both Yost and Foster are hopeful that this project can eventually get to where it was promised to go, bringing advanced stats to the mainstream and increasing the quality of the data.

“I cannot emphasize this enough: the NHL needs someone to vet everything that SAP's doing. It's really that simple,” said Yost. “When you don't have someone who is familiar with the work that's out there, you get things like 'Big Data Can Predict 85% of All Games' produced on league web sites. Get that person in place, get player tracking data going in a year or so, and I become genuinely hopeful that they can become an excellent source for hockey stats -- not dissimilar to what has occurred with the NBA and NBA.com. There are smart people who work (or worked) on this project. They just didn't listen to them.”

Foster says that everything SAP does is signed off on by the NHL.

Although it stings, Foster accepts the criticisms from Yost and others in the analytics community. Because as much as these oversights and wrong turns have earned NHL.com its scorn, he hopes they eventually make the SAP-driven advanced stats pages better.

“The fan feedback has been vital to making improvements to the site. Our goal is a spirit of collaboration,” he said.

“We’re not in competition. We’re not trying to take traffic away from other sites or shut down other sites. We want to be part of the conversation as well. And we have a big voice.”

____
Greg Wyshynski is a writer for Yahoo Sports. Contact him at puckdaddyblog@yahoo.com or find him on Twitter. His book, TAKE YOUR EYE OFF THE PUCK, is available on Amazon and wherever books are sold.
