Saturday, September 6, 2008

Our Vocabulary Word of the Day is a Doozie

As one might expect, the topic of choice among political bloggers and poll junkies is the question of whether or not McCain's muddled, emotionally disorganized, and storm-shortened convention will get him a bounce that cancels Obama's--or indeed whether he will get any bounce at all. Because of the three-day tracking system employed by most national polls, the latter question can be answered before the former, and it seems safe now to conclude that the answer is affirmative: there is definitely some movement in the polling data in favor of Mr. McCain, even before that data is able to fully reflect people's feelings about the entirety of the event--particularly Mr. McCain's acceptance speech, which won't be fully reflected in daily tracking poll figures until at least Monday.

Could this be a sign that McCain's scrappy, free-wheeling (un-themed?) party for himself on the upper Mississippi has tied up the race? Is this the moment Obama supporters have been dreading, when the inevitable last at-bat by the Republicans produces an upwelling of support that sets both the tone and the agenda for the entire fall sprint to the election? (Lest we forget, the Swift-Boat ads may have tarnished the 2004 Democratic nominee, but Mr. Kerry was still leading on the opening night of the Republican convention.) Is McCain about to square this thing up yet again, leading to still another cliffhanger election between equally matched and equally polarized bases?

Here the answer is decidedly less encouraging for Mr. McCain, and it involves a little-known and difficult-to-say concept from the world of statistical analysis: heteroscedasticity. It's a slightly involved tale, and as second-cup stories go it's not likely to distinguish itself for its enthralling sense of drama, but it's an extremely important concept in data analysis and it has unusual relevance to the business of taking the temperature of the 2008 Presidential election campaign--especially if you're an Obama supporter. So do bear with me for a moment or two, please.

Most of us understand, without really thinking about it too technically, that the point of a series of numbers with dates attached to them is to connect the dots and look for a line through the swarm of numbers that will summarize how things are going. In common parlance this is called a trend-line, but its slightly more official handle is a "best-fit line," since no line, no matter how technically complex its methodology, can pass through points from the same day that contradict each other. Figuring out where to put the best-fit line is a plodding math process that ultimately picks, out of all possible lines, the one with the smallest cumulative total of "mistakes" (in the standard method, the squared vertical distances between each point and the line) arising from leaving some of the points to the high side and others to the low.
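The plodding math steps are short enough to show. Below is a minimal Python sketch of ordinary least squares, which is the standard recipe behind most best-fit lines. Each "mistake" is a point's vertical distance from the line, squared, and the winning line is the one with the smallest total. The function names and the six data points (two contradictory readings on each of three days) are made up for illustration. They are not polling data.

```python
def best_fit_line(days, leads):
    """Ordinary least squares: return (slope, intercept) of the line that
    minimizes the sum of squared vertical distances to the points."""
    n = len(days)
    if n != len(leads) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(days) / n
    mean_y = sum(leads) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all points fall on one day; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, leads))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_squared_mistakes(days, leads, slope, intercept):
    """Sum of squared vertical misses between the points and a given line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(days, leads))


# Hypothetical readings: two polls per day that disagree with each other.
days = [1, 1, 2, 2, 3, 3]
leads = [1, 3, 2, 4, 3, 5]

slope, intercept = best_fit_line(days, leads)
print(slope, intercept)                                       # 1.0 1.0
print(total_squared_mistakes(days, leads, slope, intercept))  # 6.0
# Any other line does worse, e.g. a slightly steeper one:
print(total_squared_mistakes(days, leads, 1.1, intercept) > 6.0)  # True
```

The total can never reach zero here. No straight line passes through both the 1 and the 3 recorded on day one, which is why the result is a best fit rather than a perfect one.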

Still, a person could be forgiven for presuming at this point that such a characterless set of math steps could be relegated to a PC, which would then come back in about six seconds (in that droll, casually confident tone of voice that PCs always save for tasks that should've been easy enough for you too) with a best-fit line drawn through the data as if by magic. Having automated the whole process, we'd never have to think about it again. Right?

Actually, no.

It happens that there are several problems with these best-fit lines, some of them inherent and others owing to lapses in the vigilance of the practitioner employing them. For example, data that actually follows a crescent-shaped track across our graph paper will still yield a perfectly valid answer for a straight best-fit line, but such an outcome will require the practitioner to stare back at his droll and casually confident PC and realize (by eyeball if not by any more elaborate means) that the straight line is not the best explanation of what has happened.
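The "stare back at the PC" step can be made less of an eyeball exercise by looking at the leftover mistakes (the residuals) instead of the line. The sketch below assumes a made-up crescent-shaped series, and the helper names are hypothetical. A straight line forced through a crescent leaves residuals that sit below the line at both ends and above it in the middle. The Durbin-Watson statistic is a standard diagnostic that the post itself does not use. It falls well below the value of about 2 expected when neighboring residuals are unrelated.

```python
def best_fit_line(xs, ys):
    """Ordinary least squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """The 'mistakes' left over after fitting: actual minus fitted."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def count_sign_runs(values):
    """Number of stretches of consecutive residuals sharing one sign."""
    signs = [v > 0 for v in values]
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def durbin_watson(res):
    """Durbin-Watson statistic: about 2 when neighboring residuals are
    unrelated, falling toward 0 when they move together."""
    num = sum((b - a) ** 2 for a, b in zip(res, res[1:]))
    return num / sum(e ** 2 for e in res)


# Hypothetical crescent-shaped series: rises, peaks at x = 4, falls back.
xs = list(range(9))
ys = [16 - (x - 4) ** 2 for x in xs]   # 0, 7, 12, 15, 16, 15, 12, 7, 0

slope, intercept = best_fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
print(count_sign_runs(res))            # 3: negative, positive, negative
print(round(durbin_watson(res), 2))    # 0.55, far below 2
```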

Likewise, data that has a huge break in it--if, for example, Mr. Kerry had been running three points up for every day before the Swift-Boat ad, and then ran three points down every single day afterward--could also generate a single, straight, best-fit line, and that line would be equally inadequate as a description of what had actually happened. (And in case you're not already so bored as to be pondering how it would feel to shove roofing nails into your eye sockets, these two problems are called autocorrelation and structural change, respectively.)
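For the break-in-the-series case, one standard check is the Chow test. You fit one line to everything, then fit separate lines before and after a suspected break, and ask whether splitting shrinks the total squared mistakes by more than chance would explain. This is a sketch under several assumptions. The series is an invented Kerry-style example, and the break date is assumed known in advance, which the Chow test requires. Turning the statistic into a p-value would need the F distribution with k and n - 2k degrees of freedom, and that step is left out here.

```python
def best_fit_line(xs, ys):
    """Ordinary least squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rss(xs, ys):
    """Residual sum of squares after fitting one straight line."""
    slope, intercept = best_fit_line(xs, ys)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def chow_statistic(xs, ys, break_index, k=2):
    """Chow test F statistic for a break just before xs[break_index].

    k is the number of parameters in each fitted line (slope + intercept).
    """
    pooled = rss(xs, ys)
    split = (rss(xs[:break_index], ys[:break_index])
             + rss(xs[break_index:], ys[break_index:]))
    n = len(xs)
    return ((pooled - split) / k) / (split / (n - 2 * k))


# Hypothetical Kerry-style series: about +3 for five days, about -3 after.
days = list(range(1, 11))
leads = [3.2, 2.8, 3.1, 2.9, 3.0, -3.1, -2.9, -3.0, -2.8, -3.2]

slope, _ = best_fit_line(days, leads)
print(round(slope, 2))                        # -0.91
print(chow_statistic(days, leads, 5) > 100)   # True
```

The single line reports a gentle slide of under one point a day, which describes a race that never happened in this made-up series. The split fit leaves almost no error at all, so the statistic comes out enormous.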

A third problem with drawing best-fit lines, and the one most relevant to using one to analyze the present state of the 2008 election, is heteroscedasticity. This occurs when the scatter of the points around the line is wider (or narrower) at the beginning of the series than at the end, producing a sort of "megaphone" shape on the graph paper when the data is plotted. When this happens, the line isn't so much a poor fit as it is a poor representation of the most important story the data is trying to tell. Either the relationship between the points is getting noisier and noisier with outside influences, or it is dialing down to a less and less volatile pattern over time, indicating a decreasing likelihood of big changes as the pattern continues into the future.
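There are formal tests for the megaphone shape. The sketch below follows the idea of the Goldfeld-Quandt test, which is one standard choice. You order the data by time, set aside the middle stretch, fit a separate line to the early and late pieces, and compare how much scatter each leaves behind. The series is hypothetical, built so its swings shrink steadily around a lead of roughly 3. Twelve points is far too few for a serious test. In practice the ratio would be compared with an F distribution with (m - k, m - k) degrees of freedom.

```python
def line_rss(xs, ys):
    """Residual sum of squares left by an ordinary least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def goldfeld_quandt_ratio(xs, ys, drop=None, k=2):
    """Goldfeld-Quandt style variance ratio for a time-ordered series.

    Omits `drop` middle observations (default: a third of the series),
    fits a separate line to the early and late pieces, and returns the
    early residual variance divided by the late one. Well above 1 means
    the scatter narrows over time; well below 1 means it widens.
    """
    n = len(xs)
    if drop is None:
        drop = n // 3
    m = (n - drop) // 2          # observations in each end piece
    if m <= k:
        raise ValueError("each end piece needs more points than parameters")
    early_var = line_rss(xs[:m], ys[:m]) / (m - k)
    late_var = line_rss(xs[n - m:], ys[n - m:]) / (m - k)
    return early_var / late_var


# Hypothetical daily leads: swings shrink steadily around a lead near 3.
days = list(range(1, 13))
leads = [0.0, 5.75, 0.5, 5.25, 1.0, 4.75, 1.5, 4.25, 2.0, 3.75, 2.5, 3.25]
print(round(goldfeld_quandt_ratio(days, leads), 1))   # 14.9
```

A ratio near 1 means the scatter is about the same at both ends. Here the early piece leaves nearly fifteen times the residual variance of the late one, so the megaphone opens toward the start of the series.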

When viewing the tracking poll data for a Presidential campaign, heteroscedasticity implies either more and more of the electorate up for grabs as election day draws near, or else less and less. If the megaphone produced by graphing the data is "shouting" to our left (if the data is drawing down to a narrower and narrower band), then it suggests that more people are making up their minds about which candidate they intend to support, and that it will be harder and harder for whoever is trailing at that point in the series to overcome the deficit and win.
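A rougher, more visual version of the same idea is to chop the series into consecutive blocks and track each block's center and width. This is descriptive rather than a formal test. The block size of four, the helper name, and the series itself are arbitrary choices for illustration. In the hypothetical series below the center never moves, but the band narrows block by block. That means fewer and fewer readings straying far from the same lead.

```python
from statistics import mean, stdev


def band_by_block(leads, block_size):
    """Split a time-ordered series into consecutive blocks and report each
    block's center (mean) and width (sample standard deviation)."""
    blocks = [leads[i:i + block_size] for i in range(0, len(leads), block_size)]
    return [(mean(b), stdev(b)) for b in blocks if len(b) >= 2]


leads = [0.0, 5.75, 0.5, 5.25, 1.0, 4.75, 1.5, 4.25, 2.0, 3.75, 2.5, 3.25]
for center, width in band_by_block(leads, 4):
    # The center stays at 2.875 in every block while the width shrinks.
    print(f"center {center:.3f}, width {width:.2f}")
```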

Unfortunately, tests for this sort of thing are difficult to explain in print like this, and even harder to carry out. But now that we know what to look for, the tracking data for this election (courtesy once again of my friends at certainly does appear to be far more Obama-friendly than the best-fit line that has already been drawn through it:

Granted, ignoring the red and yellow lines through the swarm isn't easy, but if we squint past their psychological influence, the overall character of the swarm certainly seems (to this eye, at least) to bear the classic megaphone shape of an electorate slowly and methodically coming to a degree of comfort with Senator Obama and his supposed "other-ness," and this comfort seems now to favor him by a roughly 3-point margin--about the same as Bush's 2004 ultimate margin over Senator Kerry.

As The Key Grip has been saying on an almost daily basis, any number of unnerving and dramatic things could still happen: our armed forces could suddenly apprehend Osama Bin Laden, the economy could double-dip into a far deeper recession, some small band of crazies could attack the country or one of its candidates. Indeed, this proclivity for established electoral dynamics to be blown apart in their eleventh hour is so well-known and so universally feared as to have its own semi-technical jargon attached to it: the dreaded "October Surprise," as coined by none other than the current President's father in the waning days of the 1980 campaign.

But as Senator Kerry could surely explain to Senator McCain (were they still on speaking terms, that is), gambling on an October Surprise is also a fool's errand: most elections are what they are by shortly after the second convention. It is rare that even a stunning out-performance by one candidate over the other in the three Presidential debates proves enough to significantly move the electorate, perhaps because most people's preconceived notions of the two participants are generally reinforced by their performances regardless.

At all events, as the number of undecideds in this election gets smaller, as the comfort level we all have with Senator Obama as a man who can pass the "living room test" only gets larger, as the non-electoral news continues methodically to break Senator Obama's way, the likely bounce from the Republican convention becomes less and less the story of the data, and more and more the footnote. If Senator McCain plans to be our next President, he will have to shake this pattern up in a far more dramatic and far more lasting way than he has managed to do thus far.

And he has fewer than sixty days to do it.

Dave O'Gorman
("The Key Grip")
Gainesville, Florida


A. Gordon said...

Great posting. While I have an MS in applied mathematics and was never good at stats, even I understood. Nice work.

Dave O'Gorman said...

Thanks! For a second there I was afraid you were going to point out that I was spectacularly wrong, somehow.