How to Interpret Polls and Analyze Election Forecasts

In 2007, an anonymous baseball analyst writing under the pen name “Poblano” began blogging about the 2008 presidential election. Frustrated by the lack of detail-oriented approaches to poll analysis, he wrote about more rigorous ways to handle public opinion data. To prove he could do better than existing models, he collected hundreds of polls from across the internet and built his own presidential forecast. On Nov. 4, 2008, that forecast correctly predicted the result in 49 out of 50 states.

You may now know “Poblano” as Nate Silver, editor-in-chief of fivethirtyeight.com. Silver has risen to the top of statistics and political punditry circles for his quantitative yet lay-friendly approach to politics.

Although Silver’s public image has allowed him to educate the public on data analysis, many Americans still do not know how to glean information from national or state polls. Part of the problem is the sheer volume of polls; part is the differences between them.

So, instead of looking at how humans are supposed to interpret polls, let us first look at how models interpret polls. After all, the best models out-predict the best humans.

The natural first question is: how many polls are there? FOX and CNN often make it seem like there are only a few polls in the presidential cycle, because these networks analyze each one individually, without reference to the others. In fact, at this point in the election, over a dozen national polls are taken every week. In key battleground states like Pennsylvania and Michigan, polls are taken every day.

The reason for this high poll volume is that each poll has a margin of error, typically around 3% when 1,000 voters are surveyed. If a poll shows Biden up by 4 points with a margin of error of 3%, it means that pollsters are 95% confident that Biden is up by 1 to 7 points. In other words, if pollsters constructed many such intervals, about 95% of them would contain the true percentage of Biden supporters.
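Where does that 3% figure come from? As a rough sketch, it falls out of the standard formula for the margin of error of a sample proportion. Here is a minimal Python calculation, assuming a simple random sample (which real polls only approximate):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a sample proportion.

    n: number of respondents
    p: assumed proportion (0.5 is the worst case)
    z: z-score for the confidence level (1.96 for 95%)
    """
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(1000):.1%}")  # 3.1% -- the familiar ~3% figure
```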

And yet, sampling error isn’t the only source of variation. Over the course of the election cycle, opinions change, and some groups become more comfortable sharing their voting intentions with pollsters. That is why models like Silver’s combine polls together, weighting the more reliable ones more heavily.

Forecasters look at many poll attributes, including the time before the election, the sample size and the grade of the poll (an A-F label that FiveThirtyEight assigns to each pollster’s methodology). A function can take these factors into account and spit out a “weight” for the poll. For example, if there are 30 days before the election, 200 people are surveyed and the polling agency has a D grade, the poll might have a weight of 1. If there are 5 days before the election, 1,200 people are polled and the agency has an A- grade, the poll might have a weight of 6. The 6-weight poll will have a much bigger impact on the final prediction than the 1-weight poll, much like how a 6-point assignment matters more than a 1-point assignment in your math class.
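FiveThirtyEight’s full weighting formula is not public, so the sketch below is invented for illustration: the grade scores, the recency decay and the scaling constant are all made up. Only the structure, recency times sample size times grade, reflects the factors described above:

```python
# Toy poll-weighting function; every constant here is hypothetical.
GRADE_SCORES = {"A": 1.0, "A-": 0.9, "B": 0.7, "C": 0.6, "D": 0.5}

def poll_weight(days_until_election, sample_size, grade):
    recency = 1 / (1 + days_until_election / 30)  # fresher polls count more
    size = (sample_size / 1000) ** 0.5            # diminishing returns on n
    return 9 * recency * size * GRADE_SCORES[grade]

print(poll_weight(30, 200, "D"))   # ~1.0
print(poll_weight(5, 1200, "A-"))  # ~7.6

# Combining polls: a weighted average of each poll's margin, in points
polls = [(4, poll_weight(30, 200, "D")), (2, poll_weight(5, 1200, "A-"))]
average = sum(m * w for m, w in polls) / sum(w for _, w in polls)
print(round(average, 1))  # ~2.2 -- the heavier poll dominates
```

The weighted average at the end is the key move: each poll pulls the estimate toward its own number, in proportion to its weight.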

But polls cannot tell the whole story. In this rapidly changing political climate, it is becoming increasingly important that forecasters take other information into account. Factors like voter demographics, COVID-19 severity and barriers to voter turnout can be combined into a regression model that predicts the two-party vote share. This is intuitive for most people: a model should know that a 90% white state with minimal COVID-19 deaths and high barriers to immigrant voter turnout would vote differently than a 60% Black state with a high COVID-19 rate and high turnout (both states are completely fictional).
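For the curious, such a regression can be sketched in a few lines of Python. The states, features and vote shares below are entirely fabricated; the point is only that demographic and situational variables can feed a model that outputs a two-party vote share:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Fabricated state-level data. Each row of X is one state:
# [share white, COVID-19 deaths per 100k, turnout-barrier index 0-1].
# y is the Democratic share of the two-party vote.
X = np.array([
    [0.90, 10, 0.8],
    [0.55, 90, 0.2],
    [0.75, 50, 0.5],
    [0.60, 70, 0.3],
])
y = np.array([0.38, 0.62, 0.49, 0.57])

model = LinearRegression().fit(X, y)
print(model.predict(np.array([[0.70, 60, 0.4]])))  # predicted vote share
```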

In 2016, this combination of polling and demographics did not predict Trump’s victory. Many polls were given strong weight despite underrepresenting key groups in the electorate. Namely, non-college-educated white people came out in droves for Trump after having refused polling surveys at higher-than-expected rates (what statisticians call non-response bias). Nobody realized how much this demographic would sway the election until months after it had.

What will separate good forecasts from bad ones in this election is not just the nuances of poll analysis, but a fundamental understanding of the limitations of polls. Sampling error, opinion change and non-response bias are present in every poll.

FiveThirtyEight is by no means the only forecast to look at. The Economist offers a model that strives to account for different response rates across party lines. Meanwhile, 90 students from Montgomery Blair High School are hosting their own forecast at polistat.mbhs.edu.

There is no shortage of forecasters and no shortage of methods, but there will be only one winning candidate in this election. Polls give us clues about voter sentiment, but they do not tell the whole story by any means. Campaigners have learned this, statisticians have learned this and, hopefully, the American people are learning it as well.

Article by Shariar Vaez-Ghaemi of Montgomery Blair High School

Graphic by Angelina Guhl of Richard Montgomery High School
