"Whenever I go into a restaurant, I order both a chicken and an egg to see which comes first"

Saturday, November 3, 2012

The Genius of Nate Silver–Data Trumps People

Nate Silver is everywhere these days, primarily because he correctly predicted the results of the 2008 presidential election in 49 out of 50 states. He has predicted that Obama has a close to 80 percent chance of winning the 2012 election. He makes his predictions through number-crunching: an advanced statistical analysis that combines state polling data, averages the results, subjects them to a variety of filters, and then finally produces a forecast. Put simply, Silver:

  1. Takes all the polls available;
  2. Adds to his model other factors that have been shown to affect election outcomes in the past;
  3. Runs lots and lots and lots of simulated elections; and
  4. Looks at the probability distribution of the results.
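The four steps above can be sketched as a small Monte Carlo simulation. This is a minimal illustration under loudly stated assumptions, not Silver's actual model: the states, polling margins, electoral votes, and error sizes (`national_sd`, `state_sd`) are all made up, and the function name `simulate_elections` is hypothetical. Each simulated election draws one shared national polling error, so states tend to move together, plus an independent error for each state. It then counts electoral votes and tallies how often candidate A reaches a majority.

```python
import random
from collections import Counter

# Hypothetical toy inputs: (name, polling margin in points for candidate A, electoral votes).
# These numbers are illustrative only; they are not real 2012 polling data.
STATES = [
    ("State 1", 3.0, 20),
    ("State 2", -1.0, 15),
    ("State 3", 0.5, 10),
    ("State 4", 6.0, 5),
]


def simulate_elections(states, n_sims=10_000, national_sd=2.0, state_sd=3.0, seed=0):
    """Run many simulated elections; return (candidate A win probability, EV distribution)."""
    rng = random.Random(seed)
    total_ev = sum(ev for _, _, ev in states)
    needed = total_ev // 2 + 1  # simple majority of this toy electoral college
    outcomes = Counter()
    wins = 0
    for _ in range(n_sims):
        # One shared national polling error per simulated election, so states move together.
        national_error = rng.gauss(0.0, national_sd)
        ev_for_a = 0
        for _, margin, ev in states:
            # Each state also gets its own independent error on top of the national one.
            simulated_margin = margin + national_error + rng.gauss(0.0, state_sd)
            if simulated_margin > 0:
                ev_for_a += ev
        outcomes[ev_for_a] += 1
        if ev_for_a >= needed:
            wins += 1
    return wins / n_sims, outcomes


if __name__ == "__main__":
    prob, dist = simulate_elections(STATES)
    print(f"Candidate A wins in {prob:.1%} of simulations")
    for ev in sorted(dist):  # the probability distribution of electoral-vote totals
        print(ev, dist[ev])
```

The "80 percent chance" kind of figure is simply the share of simulated elections a candidate wins; the shared national error is what keeps a sketch like this from being overconfident when every state's polls are wrong in the same direction.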

On a New York Times web page (http://fivethirtyeight.blogs.nytimes.com/methodology/) that provides detailed methodological notes, Silver's approach becomes clearer. Silver starts by weighting polls according to three principles: Recency (more recent polls are given greater weight), Sample Size, and Pollster Rating. Then he filters the data with the following adjustments:

The trendline adjustment. An estimate of the overall momentum in the national political environment is determined based on a detailed evaluation of trends within generic congressional ballot polling.

The house effects adjustment. Sometimes, polls from a particular polling firm tend consistently to be more favorable toward one or the other political party.

The likely voter adjustment. Throughout the course of an election year, polls may be conducted among a variety of population samples. Some survey all American adults, some survey only registered voters, and others are based on responses from respondents deemed to be “likely voters,” as determined based on past voting behavior or present voting intentions. Sometimes, there are predictable differences between likely voter and registered voter polls.
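Here is a minimal sketch of how the three weighting principles and the three adjustments described above might fit together. Everything specific in it is an assumption for illustration: the `Poll` fields, the made-up likely-voter shifts, the two-week half-life for recency, the square-root rule for sample size, the default pollster rating of 0.5, and the crude house-effect estimate are hypothetical choices, not the published FiveThirtyEight formulas.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str
    margin: float         # candidate A minus candidate B, in percentage points
    sample_size: int
    days_old: int
    population: str       # "adults", "rv" (registered voters) or "lv" (likely voters)
    trend_at_poll: float  # hypothetical national trend estimate on the day the poll was taken


# Hypothetical shifts converting adult / registered-voter results to a likely-voter basis.
# A real model estimates such differences from data; these values are made up.
LIKELY_VOTER_SHIFT = {"adults": -1.5, "rv": -1.0, "lv": 0.0}


def house_effects(polls):
    """Crude house effect: each firm's average margin minus the average of all polls."""
    overall = sum(p.margin for p in polls) / len(polls)
    by_firm = {}
    for p in polls:
        by_firm.setdefault(p.pollster, []).append(p.margin)
    return {firm: sum(ms) / len(ms) - overall for firm, ms in by_firm.items()}


def adjusted_margin(poll, trend_now, house):
    m = poll.margin
    m += trend_now - poll.trend_at_poll      # trendline adjustment: move old polls with the trend
    m -= house.get(poll.pollster, 0.0)        # house effects adjustment: remove the firm's lean
    m += LIKELY_VOTER_SHIFT[poll.population]  # likely voter adjustment
    return m


def poll_weight(poll, ratings, half_life_days=14.0):
    recency = 0.5 ** (poll.days_old / half_life_days)  # weight halves every half_life_days
    size = math.sqrt(poll.sample_size)                # sampling error shrinks like 1/sqrt(n)
    rating = ratings.get(poll.pollster, 0.5)          # hypothetical 0-1 pollster rating
    return recency * size * rating


def polling_average(polls, trend_now, ratings):
    """Weighted average of adjusted poll margins."""
    house = house_effects(polls)
    pairs = [(poll_weight(p, ratings), adjusted_margin(p, trend_now, house)) for p in polls]
    return sum(w * m for w, m in pairs) / sum(w for w, _ in pairs)
```

Multiplying the three weights together is only one design choice. This sketch also estimates house effects from raw margins; a fuller version would estimate them after the trendline and likely voter adjustments have been applied.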

The model then performs a regression analysis to incorporate non-poll factors such as 'Partisan Index' rating, incumbency status, political contributions received, and stature (highest previous elected office).
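To show what a regression on non-poll factors means in practice, here is a minimal ordinary-least-squares sketch in plain Python. It is not the Times model. The feature encoding (partisan index, incumbent yes/no, contributions in millions, held statewide office yes/no) and the past-race data are hypothetical. The toy outcomes are generated from a known formula (`TRUE_COEFS`) so that the fit can be checked. The helpers `fit_ols` and `predict` are illustrative names.

```python
def fit_ols(rows, outcomes):
    """Ordinary least squares with an intercept, solved through the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, outcomes))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting reduces A to diagonal form.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, k + 1):
                    A[r][c] -= factor * A[col][c]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Intercept plus the weighted sum of the non-poll factors."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))


# Hypothetical past races: (partisan index, incumbent 1/0, contributions in $M, held statewide office 1/0).
PAST_RACES = [
    (5, 1, 2.0, 1),
    (-3, 0, 1.0, 0),
    (10, 1, 5.0, 0),
    (0, 0, 3.0, 1),
    (-8, 1, 0.5, 1),
    (2, 0, 4.0, 0),
]
# Toy outcomes generated from a known formula so the regression can be checked; not real results.
TRUE_COEFS = [1.0, 0.8, 4.0, 0.5, 2.0]
MARGINS = [predict(TRUE_COEFS, race) for race in PAST_RACES]

if __name__ == "__main__":
    coefs = fit_ols(PAST_RACES, MARGINS)
    print("fitted coefficients:", [round(c, 3) for c in coefs])
    print("forecast for a new race:", round(predict(coefs, (3, 1, 2.5, 0)), 2))
```

In a real forecast, a regression-based estimate like this would be blended with the polling average rather than used alone.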

The most important element in all this is the refusal to let any individual poll predict the outcome, and moreover the decision to discount entirely what political pundits have to say. Individual polls carry too much bias and error to be accurate on their own; pundits offer too much ignorance or political posturing.

This matters because it is part of a trend occurring in every discipline. In a recent blog post I described 'sequencing', a technique that relies on modern computing power to observe phenomena and draw conclusions, predictions, and innovations from them, rather than focusing on the nature or origins of those phenomena (http://www.uncleguidosfacts.com/2012/11/the-new-data-revolutionchomsky-against.html).

A popular example of this idea is the movie Moneyball, in which the General Manager of the Oakland A's uses sabermetrics to select baseball players, jettisoning so-called talent scouts in favor of facts and figures. On-base percentage was what won ballgames, decided Billy Beane, and he had his data-whiz associate crunch the numbers to find out who got on base the most and cost the least. His scouts were predictably outraged because Beane was 'throwing away' decades of finely honed skill in detecting talent and predicting Major League performance. He didn't listen to the naysayers, and although the A's lost in the playoffs the year the system was introduced, the Red Sox adopted it and went on to win the World Series.
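The number-crunching Beane's associate did can be illustrated with the standard on-base percentage formula, OBP = (H + BB + HBP) / (AB + BB + HBP + SF). In this minimal sketch the players, their statistics, and their salaries are hypothetical, as is the simple rule of filtering by a salary ceiling and then ranking by OBP. It is not the A's actual method.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    hits: int
    walks: int
    hit_by_pitch: int
    at_bats: int
    sacrifice_flies: int
    salary: float  # hypothetical salary in millions of dollars


def on_base_percentage(p):
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = p.hits + p.walks + p.hit_by_pitch
    opportunities = p.at_bats + p.walks + p.hit_by_pitch + p.sacrifice_flies
    return times_on_base / opportunities if opportunities else 0.0


def bargains(players, max_salary):
    """Players at or under max_salary, best OBP first (the cheaper player wins ties)."""
    affordable = [p for p in players if p.salary <= max_salary]
    return sorted(affordable, key=lambda p: (-on_base_percentage(p), p.salary))


if __name__ == "__main__":
    roster = [
        Player("A", 150, 80, 5, 500, 5, 1.0),
        Player("B", 180, 30, 2, 550, 8, 10.0),
        Player("C", 140, 90, 3, 480, 4, 0.5),
    ]
    for p in bargains(roster, max_salary=5.0):
        print(p.name, round(on_base_percentage(p), 3), p.salary)
```

Player B has the most hits but is excluded by salary. Among the affordable players, the one who walks most often comes out on top, which is exactly the kind of undervalued skill the story turns on.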

I have also written about crowdsourcing, which relies on volumes of data, often from randomly selected sources, to make predictions about elections, the number of gumballs in a glass canister, or the likelihood of Israel bombing Iran (http://www.uncleguidosfacts.com/2012/11/crowdsourcing-and-predictive-markets.html). Google recently crowdsourced its search for a better search algorithm, offering a significant reward to anyone who came up with a solution. Google didn't care whether the researcher was a PhD or a geek-in-a-garage as long as they came up with the answer. They decided not to restrict their search even to the boy-geniuses in their California labs. Why should they? They knew, perhaps better than any other corporation, how many ideas were being generated throughout the cyber-world.
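The gumball example relies on a simple aggregation step: collect many independent guesses and take a central value. This minimal sketch simulates that step. The true count, the number of guessers, and the noise model are assumptions. Each hypothetical guesser is off by a random multiplicative factor whose median is 1, which is generated with a lognormal distribution. The function name `crowd_estimate` is illustrative. The median is used because guesses of this kind are skewed by a few wild overestimates.

```python
import random
import statistics


def crowd_estimate(true_count, n_guessers, spread=0.5, seed=0):
    """Simulate independent, noisy guesses and return (median guess, all guesses)."""
    rng = random.Random(seed)
    # Each hypothetical guesser is off by a lognormal factor with median 1 (an assumption).
    guesses = [true_count * rng.lognormvariate(0.0, spread) for _ in range(n_guessers)]
    return statistics.median(guesses), guesses


if __name__ == "__main__":
    estimate, guesses = crowd_estimate(true_count=1000, n_guessers=1001)
    typical_error = statistics.median(abs(g - 1000) for g in guesses)
    print(f"crowd estimate: {estimate:.0f}")
    print(f"typical individual error: {typical_error:.0f}")
```

The crowd's estimate beats the typical individual only because the simulated errors are independent and not systematically biased. When everyone shares the same bias, averaging does not remove it.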

Both of these research techniques eliminate the need for polls, pollsters, pundits, and focus groups and simply rely on large amounts of data from super-large numbers of people to generate useful information.

The negative reaction to Nate Silver has been strong and predictable, mainly from the pundits and pollsters who make their living on predictions; but the train has already left the station, and the importance of individuals or individual ‘things’ (genes, polls, opinions) is diminishing as the power to aggregate and analyze them grows.

This data-mining, sequencing, crowdsourcing trend is revolutionary, for it is changing the way we view the world. Op-Ed columnists, bloggers, pundits, individual scientists, economists, and political scientists will soon become things of the past. We will no longer tune in to CNN or MSNBC for opinions, but will go to a vast new array of websites that do what Nate Silver has done: amass huge amounts of data on a particular subject, report findings, and make predictions.

Noam Scheiber, in a review of Silver's new book The Signal and the Noise, takes a different tack and suggests that his statistical methods can insert a note of much-needed objectivity into journalism:

Journalism is in a strange place these days. Cable and the Internet crippled the old media establishment; political polarization dealt it a death blow. In the meantime, no new establishment has risen up to take its place. What we have is a growing sense of intellectual nihilism. The right-wing media speak only to true believers. Liberal journalists are often more fact-conscious but equally partisan, while mainstream outlets have a rapidly dwindling audience. Few media institutions command widespread credibility. I think Silver, or at least Silver-ism, has the potential to fill the void.

It is doubtful that the millions of Americans who define themselves by their politics, and who willingly give up objectivity for confirmation of their beliefs, will abandon Fox News or MSNBC for purely objective, data-based assessments of reality. It is likely that most of us will continue to think, act, and predict based on feel. But the major decisions taken in modern life, whether in science, foreign policy, or economics, will increasingly be based on data, statistical analysis, and probability.

For many, leaving aside their own convictions and confidence in their own abilities represents a quantum leap. For some it is a denigration of human ability, a neutralization of the very human insights, revelations, and innovations that have characterized us. In the end, however, it is merely another, perhaps better, way of collecting the information on which to be creative or successful.
