
Computer model has Calgary and Hamilton winning divisions: professor discusses how he built it

A computer model has Zach Collaros and the Tiger-Cats favoured to finish first in the East. (Darren Calabrese/The Canadian Press.)

Over the last couple of weeks, the CFL has posted a computer model's predictions for who will finish first in each division at the end of the regular season, but until now, those predictions have been a bit of a black box. We can see the results (Calgary and Hamilton are favoured as of Monday's update, with 94.8 and 59.0 per cent chances respectively of finishing at the top of their divisions), but while the page mentions what criteria the model considers and a bit of its method (it runs 10,000 simulations of the remaining schedule, then assigns probabilities based on how many times in 10,000 each team comes out on top of the division), it doesn't even name the model's creator, much less the relative or absolute weights it assigns to each category. Fortunately, Global Saskatoon's Lisa Dutton has solved at least part of that issue, sitting down with the model's creator, University of Saskatchewan associate professor Keith Willoughby. Here's her interview with Willoughby about the model:

There are some lines in there that are a bit grating from a statistics nerd's point of view, particularly the comparison to Moneyball that Willoughby brings up and Dutton's response, "You're the CFL's Brad Pitt!" It's unfortunate that a book about exploiting inefficient markets has become a catch-all reference for any sort of analytics, and that the reference is to Hollywood star Pitt rather than the man he played, Oakland Athletics general manager Billy Beane, who had a much larger role in analytics' rise. (Then again, Beane received a heck of a lot of misplaced credit/blame relative to Moneyball author Michael Lewis, so I guess turnabout is fair play.)

The predictive model here actually has nothing at all to do with Moneyball beyond just involving numbers, as it's not trying to develop productive in-game strategies or find undervalued players. However, the future ideas Willoughby discusses later of evaluating individual players and optimal replacements do have more of a connection there, and he only brings Moneyball up after discussing those, so he can't be blamed too much. Similarly, Dutton can't really be criticized too much for the Pitt mention; this is a general-audience newscast, and that's probably a useful point of reference for many viewers. Overall, it's nice to see CFL analytics get some exposure.

How good are these predictions? Willoughby doesn't really go into much detail on air about how the model works, but the CFL.ca page does lay out its components:

The model considers the following:
• Each team’s current win-loss record
• Opponents already played (including whether it was a home or away game)
• Margin of victory (or loss) in games previously played in the season
• Remaining opponents to be played (including whether those games are home or away)

The model calculates each team’s probability of victory in each remaining game.  It then simulates 10,000 replications of the remaining regular season schedule.  The first place team in each division is the one with the most regular season wins.  For each replication, the model keeps track of which team finished first.  

For instance, if Winnipeg finished first in the Western Division in 1,990 of the 10,000 replications, then its first place probability is 1,990 / 10,000 = 19.90%.

The model is updated weekly based on the results of games played that week.
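The replication procedure quoted above is a standard Monte Carlo simulation, and a minimal Python sketch shows how little machinery it needs. This is not Willoughby's code: CFL.ca doesn't say how he turns record, margin and home/road into a per-game probability, so `make_win_prob` (a logistic curve on a rating gap plus a home-field edge, with made-up `home_edge` and `scale` values) is a hypothetical stand-in, the team labels and numbers are invented, and ties for most wins are broken at random as a simplification. The counting step does follow the page's description: a team's first-place probability is the number of replications it finishes first divided by the number of replications, as in the Winnipeg example of 1,990 / 10,000.

```python
import math
import random
from collections import Counter


def make_win_prob(ratings, home_edge=3.0, scale=10.0):
    """Hypothetical game model: logistic curve on rating gap plus a home edge.

    ratings: team -> points-per-game strength (e.g. an SRS-style number that
    already folds in margin of victory). home_edge and scale are made-up
    illustration values, not parameters from Willoughby's model.
    """
    def win_prob(home, away):
        gap = ratings[home] - ratings[away] + home_edge
        return 1.0 / (1.0 + math.exp(-gap / scale))
    return win_prob


def first_place_odds(current_wins, remaining_games, win_prob, n_reps=10_000, seed=None):
    """Share of simulated seasons in which each division team has the most wins.

    current_wins: division team -> wins so far.
    remaining_games: list of (home, away) pairs still to be played.
    Ties for most wins are split at random (a simplification; real CFL
    tiebreakers are more involved).
    """
    rng = random.Random(seed)
    firsts = Counter()
    for _ in range(n_reps):
        wins = dict(current_wins)  # fresh copy of the standings for this replication
        for home, away in remaining_games:
            winner = home if rng.random() < win_prob(home, away) else away
            if winner in wins:  # only division teams are tracked
                wins[winner] += 1
        best = max(wins.values())
        leaders = [team for team, w in wins.items() if w == best]
        firsts[rng.choice(leaders)] += 1
    # Probability = replications finished first / total replications
    return {team: firsts[team] / n_reps for team in current_wins}


if __name__ == "__main__":
    # Made-up four-team division, purely for illustration.
    ratings = {"A": 4.0, "B": -1.0, "C": -3.0, "D": 0.0}
    wins = {"A": 6, "B": 5, "C": 4, "D": 4}
    schedule = [("B", "A"), ("C", "D"), ("A", "C"), ("D", "B"), ("C", "A"), ("B", "D")]
    odds = first_place_odds(wins, schedule, make_win_prob(ratings), seed=1)
    for team, p in sorted(odds.items(), key=lambda kv: -kv[1]):
        print(f"{team}: {p:.1%}")
```

Passing `win_prob` in as a function keeps the simulation loop separate from the game model, so a different way of weighing record, margin and home field could be swapped in without touching the counting logic.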

That seems like a pretty reasonable way to do this, even if we don't know exactly how each piece is weighted. Those elements (record, margin of victory, home/road and schedule) are all important, and they're considered in other systems like RPI and SRS, which we've discussed for CFL predictive purposes before. (Rob Pettapiece, who did those calculations, now works in the Toronto Maple Leafs' analytics department.)

On the whole, Willoughby's system might be stronger than either raw RPI or SRS, as it combines home/road (which RPI considers and SRS doesn't) and margin of victory (which SRS considers and RPI doesn't). However, he did tell Dutton that he doesn't yet include a recency factor (which RPI does) and is hoping to put that in next year. Recency factors are debatable; yes, they may reflect a team's current form (and current health) more accurately than season-long results (for example, as he notes, the Roughriders aren't the same team since Darian Durant was hurt), but they also put much more weight on individual games, and that's not always desirable.
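For readers who haven't seen SRS (Simple Rating System) before, the core idea fits in a few lines: a team's rating is its average margin plus the average rating of the opponents it has faced, updated repeatedly until the numbers settle. The sketch below is a basic version under stated assumptions: it ignores home/road (the gap noted above), the `srs` function name, the `(team, opponent, margin)` game format and the choice to centre ratings on zero are illustration choices rather than Pettapiece's exact implementation, and it assumes every team is linked to the others through games played.

```python
from collections import defaultdict


def srs(games, iterations=1000, tol=1e-9):
    """Basic Simple Rating System: rating = average margin + average opponent rating.

    games: list of (team, opponent, margin), margin = team's points minus opponent's.
    Home/road is ignored, as in basic SRS.
    """
    margins = defaultdict(list)
    opponents = defaultdict(list)
    for team, opp, margin in games:
        margins[team].append(margin)
        margins[opp].append(-margin)
        opponents[team].append(opp)
        opponents[opp].append(team)
    teams = sorted(margins)
    mov = {t: sum(margins[t]) / len(margins[t]) for t in teams}  # average margin
    ratings = {t: 0.0 for t in teams}
    for _ in range(iterations):
        old = dict(ratings)
        for t in teams:  # Gauss-Seidel style: reuse ratings already updated this pass
            sos = sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            ratings[t] = mov[t] + sos
        mean = sum(ratings.values()) / len(teams)
        for t in teams:
            ratings[t] -= mean  # pin the league average at zero
        if max(abs(ratings[t] - old[t]) for t in teams) < tol:
            break
    return ratings
```

For example, `srs([("A", "B", 10), ("B", "C", 4), ("A", "C", 14)])` rates A at about +8, B at about -2 and C at about -6: A's raw average margin of +12 is marked down because it came against weaker opponents.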

In general, this looks like a relatively solid way to predict division winners, and the 10,000-simulation approach in particular is a good way to do this; it's a large enough sample to get a good idea of which scenarios are most likely. The conclusions of Willoughby's model also have some notable and logical elements that you wouldn't get from the standings alone. For one, it gives Toronto a higher chance (27.1 per cent) of winning the East than Montreal (13.9 per cent) despite the Alouettes having more wins (5 to 4). That's because the Argonauts have a better point differential (-17 versus -68), more games left (six versus five) and more home games left (five versus two). This speaks to the advantages of models like this.
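How large is "large enough"? Treating each replication as an independent trial, the simulation noise on a probability p estimated from n replications has standard error sqrt(p(1 - p) / n). Using Hamilton's 59.0 per cent as the example: with the 10,000 replications the CFL.ca page describes, that works out to about 0.5 percentage points (roughly plus or minus 1 point at 95 per cent confidence); with 1,000 replications it would be about 1.6 points (roughly plus or minus 3). This only measures simulation noise, not whether the underlying game model is right. The helper name below is just for illustration.

```python
import math


def mc_standard_error(p, n):
    """Standard error of a probability estimated as count / n from n independent replications."""
    return math.sqrt(p * (1.0 - p) / n)


for n in (1_000, 10_000):
    se = mc_standard_error(0.59, n)
    # 1.96 standard errors gives an approximate 95 per cent interval
    print(f"n={n:>6}: SE = {se * 100:.2f} points, 95% margin = +/-{1.96 * se * 100:.2f} points")
```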

Willoughby's stated future goal of evaluating individual players and replacements may be more difficult. Some good work has been done on that front in the NFL from the likes of Football Outsiders and Pro Football Focus, but those metrics are a long way from perfect, especially as evaluations of individual players, and adapting them to the CFL game could prove very tough. FO's algorithmic approach might work with some substantial tweaking, but PFF's grading of individual players from tape would be almost impossible in a league that doesn't usually make All-22 (or in the CFL's case, All-24) film available to the media. The idea of evaluating potential replacements might be especially difficult, as American players often take a while to adapt to the Canadian game (but that time isn't always consistent from player to player), and some guys with next-to-no U.S. pedigree become CFL stars, while others who dazzled south of the border fizzle north of it.

Still, the idea of more quantitative analysis of Canadian football is a great thing from this corner. Moreover, the model Willoughby has come up with so far appears to be a solid step in that direction, and while his future plans may seem difficult to achieve, his success with this predictive model may lead to other successes down the road. Credit to him for developing this, to the CFL for publishing and promoting it, and to Dutton and Global Saskatoon for being willing to talk about Canadian football analytics.