to err is human

by Roderick

Photo by Flickr user chasingfun/Mark Trammell

Every fall, the 120 teams in the NCAA Football Bowl Subdivision (FBS) play 12 or so weeks of college football. At the end of this regular season, the Bowl Championship Series (BCS) releases its final rankings; the teams ranked 1 and 2 are awarded the privilege of competing for the BCS National Championship.

And that’s it.¹

The other bowl games select their participants in rather arbitrary fashion, whether by historical conference affiliations (most famously the venerable Rose Bowl Game, which historically pits a team from the West/PCC/AAWU/Pac-8/10/12 against one from the East/Big Nine/Ten), by selecting the best teams available (the bowls have an arcane but ostensibly logical selection hierarchy), or simply by ignoring all traditional rankings and picking the most financially lucrative matchup for the bowl game itself.

The nature of the championship (a single game between teams ranked 1 and 2 by the BCS) is rather frustrating because almost every other form of competition determines its champion by an elimination tournament. The college football model seems not only arbitrary, but unjustifiably so; often more than two teams (maybe many more) can make a reasonable case for being in the championship game. Consequently, the BCS receives considerable and (in my opinion) completely deserved criticism.

What baffles me the most, however, is the disdain for the BCS’s use of computer models. If anything, the models are (or ought to be)² the best part of the entire college football circus.

In brief, the BCS gives equal weight to the Harris Interactive Poll (a media poll), the USA Today Coaches Poll, and the average of the middle four of six computer models in determining the BCS rankings. The computer models thus account for one third (1/3) of the result.
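The arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not the official BCS procedure: the function names are my own, the input numbers are made up, and I am assuming every poll and computer score has already been put on a common 0-to-1 scale.

```python
def computer_component(scores):
    """Average the middle four of six computer scores.

    The highest and lowest scores are dropped, as described in the text.
    Assumes every score is already on a common 0-to-1 scale.
    """
    if len(scores) != 6:
        raise ValueError("expected exactly six computer scores")
    middle_four = sorted(scores)[1:-1]  # drop the lowest and the highest
    return sum(middle_four) / len(middle_four)


def bcs_score(harris, coaches, computer_scores):
    """Equal-weight average of the two human polls and the computer component."""
    return (harris + coaches + computer_component(computer_scores)) / 3


# Hypothetical inputs for a single team (not real poll data).
example = bcs_score(0.95, 0.93, [0.88, 0.92, 0.96, 1.00, 0.96, 0.80])
print(round(example, 4))
```

Note that the two human polls together control two of the three terms in the final average, no matter what the six models say.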

It is extremely difficult for humans to make dispassionate analyses. We struggle to identify the sources of our own biases, we subconsciously process information selectively, and we make mistakes. Computers do none of these things. They perform no more or less than the tasks with which they are entrusted, barring technical errors (which are exceedingly uncommon). Moreover, the decisive element of the “computer rankings” of the BCS is not the computers themselves (modern computers being more or less fungible); it is the mathematical formulae by which the rankings are computed. The entire endeavor can only be criticized on the basis of the soundness of said formulae.

And therein lies my primary objection to the way the BCS implements computer rankings, an objection that can hardly be expressed more eloquently or scathingly than Bill James already did in an article in 2009. What the BCS has right now is not a good representation of what mathematical and statistical modeling has to offer for college football, so to criticize it on the basis of its performance is akin to criticizing automobile safety on the basis of a 2007 Brilliance BS6 crash test. The computer models are hampered neither by any flaw inherent to the concept of computer rankings, nor by a lack of football knowledge on the part of their creators. Their shortcomings are symptomatic of an institutional sluggishness on the part of college football, wherein age-old truisms supersede contradictory evidence.

That most of the six computer models employed by the BCS are run by individuals who like the current system is not insignificant. Some of the justifications for the considerable role of human polls in the BCS ranking are downright silly. This gem appeared in a Daily Fix (a Wall Street Journal sports blog) post about the BCS computer models:

[Jeff Anderson, co-creator of the Anderson & Hester computer ranking] argues that human voters are better equipped to judge scores, and distinguish between a 24-14 game where the losing team scores two touchdowns in garbage time and a 24-14 team where the losing team trailed by three late but threw an interception returned for a touchdown while attempting to mount a game-winning drive. “If margin of victory is going to be included in any part of the rankings, it should be included only in the subjective part,” Anderson says. Others point out that in many other sports, playoff seedings are determined solely by won-loss record, and the computer rankings account for the unique nature of college football by accounting for strength of schedule.

“It’s a matter of sportsmanship,” [Bill Hancock, executive director of the BCS] says. “You don’t want a team to run up the score on their opponents, merely so they can move up in the computer rankings.” [1]

So instead of giving the computer models the freedom to employ the soundest methods, the BCS bars them from considering the margin of victory, ostensibly to encourage sportsmanship. Yet it gives two thirds of the vote to humans, who will vote not only on the basis of margin of victory, but really on the basis of whatever the hell they feel like. How is that any more fair? And Jeff Anderson, are you sure computers can’t tell the difference between garbage time and a late win?
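To make that last question concrete, here is a rough sketch of one way a program could tell Anderson’s two 24-14 games apart, given a scoring timeline. The metric (the winner’s time-weighted average lead) is my own illustrative choice, not something any BCS model is known to use, and both timelines are invented.

```python
def average_lead(events, game_minutes=60.0):
    """Time-weighted average lead of the eventual winner.

    events: scoring plays as (minute, winner_points, loser_points),
    sorted by minute. The metric is a hypothetical illustration.
    """
    lead = 0
    last_minute = 0.0
    area = 0.0
    for minute, winner_pts, loser_pts in events:
        area += lead * (minute - last_minute)  # lead held since the last score
        lead += winner_pts - loser_pts
        last_minute = minute
    area += lead * (game_minutes - last_minute)  # lead held until the final whistle
    return area / game_minutes


# Invented game A: the winner leads 24-0, then the loser scores twice in garbage time.
garbage_time = [(10, 7, 0), (25, 7, 0), (35, 3, 0), (45, 7, 0), (56, 0, 7), (59, 0, 7)]

# Invented game B: 17-14 late, then an interception return makes it 24-14.
late_pick_six = [(10, 7, 0), (20, 0, 7), (30, 7, 0), (40, 0, 7), (50, 3, 0), (59, 7, 0)]

for name, game in [("garbage time", garbage_time), ("late pick-six", late_pick_six)]:
    print(name, round(average_lead(game), 2))
```

Both games end 24-14, yet the first was a comfortable win for most of the afternoon and the second was close until the final minute, and even this crude measure separates them. A serious model would do better, but only if it were allowed to look at the score at all.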

I would argue that most people vastly overestimate the value of human polls and desperately underestimate the extent of human biases, particularly their own. If you perceive a computational model to be biased, I can assure you it is not (unless it’s Richard Billingsley’s, but that’s for another time). You are biased.

From 2001 to 2004, the BCS gradually eliminated the use of margin of victory in its computer models. It also doubled the weight of human polls (from 1/3 to 2/3) in 2004, largely in response to the controversy of a split championship between the BCS and the AP poll. The message sent by the BCS (and much of the media, and pretty much everyone else who supported the change) was that the computer models exist only to corroborate and legitimize the human polls. When the computer models diverge meaningfully from human polls or the hopelessly vague and utterly uninformative “eyeball test,” they are made the scapegoat and forced to fall in line.

“Throughout this process, we’ve met the most resistance from the computer people,” [Grant Teaff, executive director of the American Football Coaches Association] said. “But that’s their deal. They talk about numbers and figures, and we talk about our responsibility to the game and responsibility to coaches and players emotionally. And besides, the polls that are done by the coaches and the writers will probably still make margin of victory a factor still anyhow.” [2]

Responsibility to the game and coaches and players emotionally? What does that even mean? This quotation says everything you need to know about the BCS. Yes, the polls will indeed probably still make margin of victory a factor, along with the relative strength of the conferences in 1997, and in which time zone the games were played, and how the outcome will impact the coach’s own national championship game, and whether the team’s conference is spelled SEC, and on which team a writer’s son is a third-string kicker. And they will do it arbitrarily, without telling you. And if the computers don’t match the completely transparent and fair gold standard set by the polls, it’s because they were programmed by some scrawny, glasses-wearing, pocket-protecting brainiac at MIT who doesn’t know anything about what it’s like to coach or play football. Right?

References
[1] Bialik, Carl. “College Football’s Top Six Computers.” Wall Street Journal Blogs, December 8, 2011. Accessed December 8, 2011. http://blogs.wsj.com/dailyfix/2011/12/08/college-footballs-top-six-computers/.
[2] Drehs, Wayne. “BCS figures new formula makes for a better title game.” ESPN.com, July 12, 2001. Accessed December 8, 2011. http://static.espn.go.com/ncf/s/2001/0712/1225482.html.


¹ Okay, well, other polls (notably the Associated Press, a fascinating tale in its own right) rank teams outside of the BCS, and it is possible for the final AP champion to differ from the BCS champion, but the latter arguably carries more weight de facto.
² If all of the computer models employed were methodologically sound, I would not qualify this statement; sadly this is not currently the case, for all the reasons outlined above.