prolost

vivere est cogitare

Category: ἐπιστήμη

to err is human

Photo by Flickr user chasingfun/Mark Trammell

Every fall, the 120 teams in the NCAA Football Bowl Subdivision (FBS) play 12 or so weeks of college football. At the end of this regular season, the Bowl Championship Series (BCS) releases its final rankings; the teams ranked 1 and 2 are awarded the privilege of competing for the BCS National Championship.

And that’s it.¹

The other bowl games select their participants in rather arbitrary fashion, whether by historical conference affiliations (most famously the venerable Rose Bowl Game, which historically pits a team from the West/PCC/AAWU/Pac-8/10/12 against one from the East/Big Nine/Ten), by selecting the best teams available (the bowls have an arcane but ostensibly logical selection hierarchy), or simply by ignoring all traditional rankings and picking the most financially lucrative matchup for the bowl game itself.

The nature of the championship (a single game between teams ranked 1 and 2 by the BCS) is rather frustrating because in almost all forms of competition the custom is to determine the champion by an elimination tournament. The college football model seems not only arbitrary, but unjustifiably so; often more than two teams (maybe many more) can make a reasonable case for being in the championship game. Consequently, the BCS receives considerable and (in my opinion) completely deserved criticism.

What baffles me the most, however, is the disdain for the use of computer models by the BCS. If anything, they are (or ought to be)² the best part of the entire college football circus.

In brief, the BCS gives equal weight to the Harris Interactive Poll (a media poll), the USA Today Coaches Poll, and the average of the middle four of six computer models in determining the BCS rankings. The computer models thus account for one third (1/3) of the result.
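To make the weighting concrete, here is a minimal sketch of the combination described above. It assumes every input has already been normalized to a 0-to-1 score, which is my simplification; the function names and the sample numbers are hypothetical, and this is not the official BCS computation.

```python
def computer_component(scores):
    """Average the middle four of six computer scores.

    Assumes each score is already normalized to the range 0..1
    (an assumption of this sketch, not the official BCS procedure).
    """
    if len(scores) != 6:
        raise ValueError("expected exactly six computer scores")
    middle_four = sorted(scores)[1:-1]  # drop the lowest and the highest
    return sum(middle_four) / len(middle_four)


def bcs_score(harris, coaches, computer_scores):
    """Give equal (one third) weight to each of the three components."""
    return (harris + coaches + computer_component(computer_scores)) / 3


if __name__ == "__main__":
    # Made-up numbers for a hypothetical team, for illustration only.
    computers = [1.00, 0.92, 0.96, 0.88, 0.96, 0.80]
    print(round(bcs_score(0.90, 0.96, computers), 4))
```

Note that dropping the highest and lowest computer scores means a single outlier model cannot move the computer component on its own.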

It is extremely difficult for humans to make dispassionate analyses. We struggle to identify the sources of our own biases, we subconsciously process information selectively, and we make mistakes. Computers do none of these things. They perform no more or less than the tasks with which they are entrusted, barring technical errors (which are exceedingly uncommon). Moreover, the decisive element of the “computer rankings” of the BCS is not the computers themselves (modern computers being more or less fungible) but the mathematical formulae by which the rankings are computed. The entire endeavor can only be criticized on the basis of the soundness of said formulae.

And therein lies my primary objection to the way the BCS implements computer rankings, an objection that can hardly be expressed more eloquently or scathingly than Bill James already did in an article in 2009. What the BCS has right now is not a good representation of what mathematical and statistical modeling has to offer for college football, so to criticize it on the basis of its performance is akin to criticizing automobile safety on the basis of a 2007 Brilliance BS6 crash test. The computer models are hampered neither by any flaw inherent to the concept of computer rankings, nor by a lack of football knowledge on the part of their creators. Their shortcomings are symptomatic of an institutional sluggishness on the part of college football, wherein age-old truisms supersede contradictory evidence.

That most of the six computer models employed by the BCS are run by individuals who like the current system is not insignificant. Some of the justifications for the considerable role of human polls in the BCS ranking are downright silly. This gem appeared in a Daily Fix (a Wall Street Journal sports blog) post about the BCS computer models:

[Jeff Anderson, co-creator of the Anderson & Hester computer ranking] argues that human voters are better equipped to judge scores, and distinguish between a 24-14 game where the losing team scores two touchdowns in garbage time and a 24-14 team where the losing team trailed by three late but threw an interception returned for a touchdown while attempting to mount a game-winning drive. “If margin of victory is going to be included in any part of the rankings, it should be included only in the subjective part,” Anderson says. Others point out that in many other sports, playoff seedings are determined solely by won-loss record, and the computer rankings account for the unique nature of college football by accounting for strength of schedule.

“It’s a matter of sportsmanship,” [Bill Hancock, executive director of the BCS] says. “You don’t want a team to run up the score on their opponents, merely so they can move up in the computer rankings.” [1]

So instead of giving the computer models the freedom to employ the soundest methods, the BCS bars them from considering the margin of victory, ostensibly to encourage sportsmanship. Yet it gives two thirds of the vote to humans, who will vote not only on the basis of margin of victory, but really on the basis of whatever the hell they feel like. How is that any more fair? And Jeff Anderson, are you sure computers can’t tell the difference between garbage time and a late win?

I would argue that most people vastly overestimate the value of human polls and desperately underestimate the extent of human biases, particularly their own. If you perceive a computational model to be biased, I can assure you it is not (unless it’s Richard Billingsley’s, but that’s for another time). You are biased.

From 2001 to 2004, the BCS gradually eliminated the use of margin of victory in its computer models. It also doubled the weight of human polls (from 1/3 to 2/3) in 2004, largely in response to the controversy of a split championship between the BCS and the AP poll. The message sent by the BCS (and much of the media, and pretty much everyone else who supported the change) was that the computer models exist only to corroborate and legitimize the human polls. When the computer models diverge meaningfully from human polls or the hopelessly vague and utterly uninformative “eyeball test,” they are made the scapegoat and forced to fall in line.

“Throughout this process, we’ve met the most resistance from the computer people,” [Grant Teaff, executive director of American Football Coaches Association] said. “But that’s their deal. They talk about numbers and figures, and we talk about our responsibility to the game and responsibility to coaches and players emotionally. And besides, the polls that are done by the coaches and the writers will probably still make margin of victory a factor still anyhow.” [2]

Responsibility to the game and coaches and players emotionally? What does that even mean? This quotation says everything you need to know about the BCS. Yes, the polls will indeed probably still make margin of victory, and the relative strength of the conferences in 1997, and in which time zone the games were played, and how the outcome will impact the coach’s own national championship game, and whether the team’s conference is spelled SEC, and on which team a writer’s son is a third-string kicker, a factor. And they will do it arbitrarily, without telling you. And if the computers don’t match the completely transparent and fair gold standard set by the polls, it’s because they were programmed by some scrawny, glasses-wearing, pocket-protecting brainiac at MIT who doesn’t know anything about what it’s like to coach or play football. Right?

References
[1] Bialik, Carl. “College Football’s Top Six Computers.” Wall Street Journal Blogs, December 8, 2011. Accessed December 8, 2011. http://blogs.wsj.com/dailyfix/2011/12/08/college-footballs-top-six-computers/.
[2] Drehs, Wayne. “BCS figures new formula makes for a better title game.” ESPN.com, July 12, 2001. Accessed December 8, 2011. http://static.espn.go.com/ncf/s/2001/0712/1225482.html.


¹ Okay, well, other polls (notably the Associated Press, a fascinating tale in its own right) rank teams outside of the BCS, and it is possible for the final AP champion to differ from the BCS champion, but the latter arguably carries more weight de facto.
² If all of the computer models employed were methodologically sound, I would not qualify this statement; sadly this is not currently the case, for all the reasons outlined above.

consequence

I heard a mention of an interesting story this weekend on NPR’s fantastic Wait Wait… Don’t Tell Me! about an Irish man who saved Adolf Hitler’s life. It’s one of the best good news…bad news stories I’ve come across.

It also highlights one of the biggest reasons why I think a robustly consequentialist moral framework is not workable. Perfectly successful consequentialism requires exact knowledge of the outcome of every action ad infinitum. A probabilistic model (i.e. taking the actions most likely to lead to positive consequences) fares little better: the less we know about what an action will eventually cause, the lower the likelihood of attaining the desired outcome. Thus, in the absence of perfect knowledge, consequentialism actually fails to achieve its intended outcome (to wit, the maximization of some desirable quantity). By its own metric, consequentialism is at best very flawed.

Whereas consequentialism almost necessarily fails to satisfy its own criterion, well-conceived deontological approaches ought not to present such a paradox. The “duty” of the deontologist is simply to adhere to deontology.

This is not to say that deontology is better than consequentialism. We all care about the consequences of our actions to (at least) some degree. But robust, exclusive consequentialism just makes no sense in a world where it is impossible to know perfectly what will happen as a result of our actions.

july 5

God speed the year of jubilee
The wide world o’er!
When from their galling chains set free,
Th’ oppress’d shall vilely bend the knee,
And wear the yoke of tyranny
Like brutes no more.
That year will come, and freedom’s reign,
To man his plundered rights again
Restore.

God speed the day when human blood
Shall cease to flow!
In every clime be understood,
The claims of human brotherhood,
And each return for evil, good,
Not blow for blow;
That day will come all feuds to end,
And change into a faithful friend
Each foe.

God speed the hour, the glorious hour,
When none on earth
Shall exercise a lordly power,
Nor in a tyrant’s presence cower;
But to all manhood’s stature tower,
By equal birth!
That hour will come, to each, to all,
And from his Prison-house, to thrall
Go forth.

Until that year, day, hour, arrive,
With head, and heart, and hand I’ll strive,
To break the rod, and rend the gyve,
The spoiler of his prey deprive –
So witness Heaven!
And never from my chosen post,
Whate’er the peril or the cost,
Be driven.

watchmaker

This photo needs to be viewed in really large format to be fully appreciated. The sheer size and beauty of the universe are simply staggering. Reality trumps fiction any day (sorry BSG, you’re still pretty cool).

The Watchmaker analogy is totally broken, but it’s easy to see why such a sentiment is appealing; our world is amazing.

MKAILVVLLY

Those of you who do not live directly beneath a rock may have heard about this whole “swine flu” thing. Unfortunately, there is a considerable amount of misinformation and confusion in the public consciousness, and the media at large seems not to be helping much in the panic-mitigation department.

So before you start building your vault, a few points to keep in mind:

1. First of all, calm down.

2. There is still no compelling reason to believe that this strain, influenza A(H1N1)¹, is significantly more virulent than a typical seasonal influenza.

Your run-of-the-mill flu season has a case-fatality ratio (CFR) of very roughly 0.1%, with deaths equal to about 32% of hospitalizations [1]. Let’s narrow that to the 19-to-64 demographic, which could be most susceptible to the current outbreak (an unusual pattern seen in pandemic flus and likely caused by an overly robust immune response in healthy adults [2]) and is least susceptible to the seasonal flu. Within that population, the CFR is about 0.03%, with deaths equal to about 7% of hospitalizations [1]. Past influenza pandemics have had CFRs of anywhere from 0.1% in the 1957 and 1968 outbreaks to 2.5%² in the 1918 “Spanish flu” [3].

In contrast, the CFR of influenza A(H1N1) could be anywhere from 0.0016% to 3.1%. The upper bound is based on a maximum of 8 laboratory-confirmed influenza A(H1N1) deaths out of a minimum of 257 laboratory-confirmed cases worldwide (WHO figures available at time of writing). The very conservative lower bound extrapolates the total number of cases from 2000 estimated hospitalizations in Mexico, using the approximate hospitalization rate of 0.4% of all cases in the 19-64 demographic in a typical flu season [1].

Using figures that are quite popular in the press gives a CFR of about 7.5% in Mexico (some 150 deaths in 2000 hospitalizations, the latter very dubiously assumed to be equal to the number of cases). Because of the unreliability of the “suspected” case count in Mexico, I am not convinced that this particular CFR estimate is useful at all, even as an upper bound. It’s far more likely that the actual CFR falls somewhere between 0.0016% and 3.1%.
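For anyone who wants to check the arithmetic, here is a rough sketch of the three estimates above. The input figures are the ones quoted in the text. The one assumption I am adding is that the lower bound reuses the same 8 confirmed deaths as the upper bound; that choice reproduces the 0.0016% figure, but treat it as my reading rather than a documented method. The function name is purely illustrative.

```python
def cfr_percent(deaths, cases):
    """Case-fatality ratio as a percentage: deaths per 100 cases."""
    return 100.0 * deaths / cases


# Upper bound: confirmed deaths over confirmed cases (WHO figures in the text).
upper_bound = cfr_percent(8, 257)

# Lower bound: extrapolate total cases from hospitalizations, using the
# seasonal-flu hospitalization rate of 0.4% for the 19-64 demographic.
hospitalizations_mexico = 2000
hospitalization_rate = 0.004
estimated_cases = hospitalizations_mexico / hospitalization_rate
# ASSUMPTION: the same 8 confirmed deaths are used in the numerator.
lower_bound = cfr_percent(8, estimated_cases)

# Press figure: about 150 deaths, with hospitalizations treated as all cases.
press_figure = cfr_percent(150, 2000)

print(f"upper bound:  {upper_bound:.1f}%")
print(f"lower bound:  {lower_bound:.4f}%")
print(f"press figure: {press_figure:.1f}%")
```

The spread between the bounds comes almost entirely from the denominator, which is exactly why the "suspected" case count matters so much.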

These numbers don’t tell us very much (except that it is highly unlikely that this is some epic killer virus), but that’s exactly the point. That this (potential) pandemic has been spotted (thanks in large part to the surveillance infrastructure put into place in the wake of the “avian flu” panic) does not mean we have any solid evidence that the virulence of this pathogen is particularly high. However, this may very well change as time goes on and the situation becomes clearer, and it certainly does not mean that the virus is not dangerous.

3. Virulence is not the same as pathogenicity. Perhaps more precisely, the concepts are not the same, though the terms may often become scrambled in the fray. The salient point is that while influenza A(H1N1) has proven highly pathogenic (i.e. it is highly infectious and spreads rapidly), there is not much evidence to suggest that it is especially virulent (i.e. it has not been associated with unusually high mortality or morbidity). So while governments everywhere are preparing for the possibility of a pandemic, the severity of the disease (to wit, the “causing serious illness” criterion from the linked WHO document) is far from clear at this point. And hopefully I was able to convince you in Point 2 that there is as yet no reason to suspect any greater virulence from this strain than a typical seasonal flu strain.

4. Influenza A(H1N1) differs in a few key ways from Severe Acute Respiratory Syndrome (SARS) and influenza A(H5N1), or “avian flu”. For one, both SARS and avian flu were much deadlier; the SARS outbreak in Hong Kong had a CFR of about 14-17% [4], while the avian flu has a CFR of something like 14-33% [3]. However, avian flu never demonstrated efficient human-to-human transmission, which made it a very deadly disease that was unlikely to spread quickly. Likewise, SARS has never been observed to be contagious before the onset of symptoms, which significantly increases the likelihood that a person at risk of transmitting SARS can be identified by basic surveillance. Influenza A(H1N1), while appearing (for now) to be far less virulent than either of these two recent serious respiratory disease outbreaks, is also considerably more likely to spread rapidly and become pandemic.

5. There is a lot of talk in the news about “suspected” and “probable” cases of influenza A(H1N1). When these words are used by a media outlet, then frankly all bets are off. On the other hand, if a news report quotes a health official referring to a case as “probable” or “suspected,” that official is (hopefully) adhering to the CDC’s Case Definitions for Infection with Swine-origin Influenza A (H1N1) Virus (S-OIV):

A confirmed case of S-OIV infection is defined as a person with an acute febrile respiratory illness with laboratory confirmed S-OIV infection at CDC by one or more of the following tests:

  1. real-time RT-PCR
  2. viral culture

A probable case of S-OIV infection is defined as a person with an acute febrile respiratory illness who is positive for influenza A, but negative for H1 and H3 by influenza RT-PCR

A suspected case of S-OIV infection is defined as a person with acute febrile respiratory illness with onset

  • within 7 days of close contact with a person who is a confirmed case of S-OIV infection, or
  • within 7 days of travel to community either within the United States or internationally where there are one or more confirmed cases of S-OIV infection, or
  • resides in a community where there are one or more confirmed cases of S-OIV infection.

You can make of that what you will. It seems to me that there is probably no logistical barrier preventing health care entities other than the CDC from confirming the influenza A(H1N1) subtype, except that, for one reason or another, it doesn’t count as “confirmed” unless the CDC does it.
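The CDC definitions quoted above can be read as a small decision procedure. Here is a rough sketch of that reading. The field names are hypothetical, and the precedence (confirmed, then probable, then suspected) is my own assumption, since the quoted text does not state an order. It is an illustration of the logic, not a clinical tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # All field names are hypothetical, chosen to mirror the quoted wording.
    acute_febrile_respiratory_illness: bool
    cdc_lab_confirmed: bool = False  # real-time RT-PCR or viral culture at CDC
    influenza_a_positive: bool = False
    h1_h3_negative: bool = False  # negative for H1 and H3 by influenza RT-PCR
    days_since_contact_with_confirmed_case: Optional[int] = None
    days_since_travel_to_affected_community: Optional[int] = None
    resides_in_affected_community: bool = False


def _within_7_days(days: Optional[int]) -> bool:
    """True if an exposure is recorded and happened at most 7 days before onset."""
    return days is not None and 0 <= days <= 7


def classify(p: Patient) -> str:
    """Return 'confirmed', 'probable', 'suspected', or 'not a case'."""
    if not p.acute_febrile_respiratory_illness:
        return "not a case"  # every definition requires the illness itself
    # ASSUMPTION: stronger evidence takes precedence over weaker evidence.
    if p.cdc_lab_confirmed:
        return "confirmed"
    if p.influenza_a_positive and p.h1_h3_negative:
        return "probable"
    if (
        _within_7_days(p.days_since_contact_with_confirmed_case)
        or _within_7_days(p.days_since_travel_to_affected_community)
        or p.resides_in_affected_community
    ):
        return "suspected"
    return "not a case"


if __name__ == "__main__":
    traveler = Patient(True, days_since_travel_to_affected_community=3)
    print(classify(traveler))
```

Written out this way, it is easy to see how broad the "suspected" category is: a fever, a cough, and the right zip code are enough.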

6. When I first began considering and looking into the actual severity of the whole “swine flu” panic, I thought exactly the same thing that Obama said earlier this week: this flu outbreak (and likely pandemic) is, based on the information we currently have, a cause for concern but not alarm.

If there is one good thing that has come out of what is arguably a gross overreaction by the American media, it is a heightened awareness of the importance of public health and good hygiene. So remember kids, listen to the President and wash your hands.

References

[1] Weycker, D. et al. Population-wide benefits of routine vaccination of children against influenza. Vaccine 23, 1284-1293 (2005).

[2] Kobasa, D. et al. Enhanced virulence of influenza A viruses with the haemagglutinin of the 1918 pandemic virus. Nature 431, 703-707 (2004).

[3] Li, F. C. K. et al. Finding the real case-fatality rate of H5N1 avian influenza. J Epidemiol and Community Health 62, 555-559 (2008).

[4] Jewell, N. P. et al. Non-parametric estimation of the case fatality ratio with competing risks data: an application to Severe Acute Respiratory Syndrome (SARS). Statist Med 26, 1982-1998 (2006).


¹ I have used the nomenclature preferred by the World Health Organization as of 30 April 2009.

² The 2.5% CFR figure for the 1918 pandemic, though almost canonical, seems highly questionable given the estimates of 20-100 million deaths at a time when the world had a population under 2 billion. In any case, data from that pandemic are likely iffy at best.