Abstract
In averaging forecasts within and across four component methods (i.e. polls, prediction markets, expert judgment and quantitative models), the combined PollyVote provided highly accurate predictions for the US presidential elections from 1992 to 2012. This research note shows that the PollyVote would have also outperformed vote expectation surveys, which prior research identified as the most accurate individual forecasting method during that time period. Adding vote expectations to the PollyVote would have further increased the accuracy of the combined forecast. Across the last 90 days prior to the six elections, a five-component PollyVote (i.e. including vote expectations) would have yielded a mean absolute error of 1.08 percentage points, which is 7% lower than the corresponding error of the original four-component PollyVote. This study thus provides empirical evidence in support of two major findings from forecasting research. First, combining forecasts provides highly accurate predictions, which are difficult to beat for even the most accurate individual forecasting method available. Second, the accuracy of a combined forecast can be improved by adding component forecasts that rely on different data and different methods than the forecasts already included in the combination.
Introduction
Combining forecasts is a well-established and powerful method to increase forecast accuracy (Armstrong, 2001; Clemen, 1989). The reason is that a combined forecast includes more information than forecasts from any single component method. In addition, the systematic and random errors associated with individual component forecasts are likely to cancel out in the combined forecast.
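The cancelling of errors can be made precise with the triangle inequality. As a sketch, with notation introduced here for illustration only: let $f_1, \dots, f_n$ be the component forecasts and $y$ the actual outcome. The error of the simple average is then never larger than the average error of the components.

```latex
\[
\left| \frac{1}{n} \sum_{i=1}^{n} f_i - y \right|
= \left| \frac{1}{n} \sum_{i=1}^{n} (f_i - y) \right|
\le \frac{1}{n} \sum_{i=1}^{n} \left| f_i - y \right|
\]
```

The inequality is strict whenever at least one component overestimates and at least one underestimates the outcome, which is exactly the situation in which individual errors offset each other.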
As has been demonstrated with the PollyVote for predicting US presidential elections, combining forecasts is particularly beneficial if one can draw on component forecasts that use different methods and data. The PollyVote averages forecasts within and across four different component methods: polls, prediction markets, quantitative models, and expert judgment. Across the six elections from 1992 to 2012, the resulting combined forecast reduced the error of a typical poll, model, and expert judgment by more than half. Compared with prediction markets, the most accurate component method, error was reduced by 16% (Graefe et al., 2014b). Forecasts made on Election Eve prior to the three elections from 2004 to 2012 missed the final vote share on average by 0.6 percentage points. To put this in perspective, the average error of the final Gallup poll was more than three times higher (Graefe et al., 2014a).
This performance was achieved even though the PollyVote did
Vote expectation surveys have been around at least as long as scientific polling (Hayes, 1936), but have long been overlooked as a method for forecasting election outcomes. Although early work pointed to the accuracy of vote expectations, these studies focused on identifying factors that explain
In a previous study, I compared the accuracy of vote expectations to forecasts from polls, prediction markets, quantitative models, and expert judgment (Graefe, 2014). Across the last 100 days prior to the seven US presidential elections from 1988 to 2012, vote expectations provided more accurate forecasts of election winners and vote shares than each of the four established methods. Compared with polls, vote expectations reduced the error of vote-share predictions by 51%. Compared with prediction markets, error was reduced by 6%. In other words, vote expectation surveys appear to be the most accurate individual method for forecasting US presidential elections available to date.
The present research note builds on this work and contributes to knowledge on combining forecasts by analysing (1) the relative accuracy of vote expectations and the PollyVote and (2) the accuracy gains from adding vote expectations to the PollyVote.
Method and data
Accuracy is analysed for forecasts of the national two-party popular vote in the six US presidential elections from 1992 to 2012, the time period for which forecast data on both the PollyVote and vote expectation surveys are available. The absolute error, calculated as the absolute difference between the predicted and the actual national two-party popular vote share of the incumbent party’s candidate, was used as the measure of accuracy.
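In symbols, this accuracy measure can be written as follows. The notation is an assumption made here for illustration: $\hat{v}_{e,d}$ denotes the forecast of the incumbent party's two-party vote share made $d$ days before election $e$, $v_e$ denotes the actual share, and the mean absolute error for a given day is assumed to be the average across the six elections.

```latex
\[
\mathrm{AE}_{e,d} = \left| \hat{v}_{e,d} - v_e \right|,
\qquad
\mathrm{MAE}_{d} = \frac{1}{6} \sum_{e=1}^{6} \mathrm{AE}_{e,d}
\]
```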
Forecasts from the original four-component PollyVote and vote expectations were obtained from publicly available datasets at the
Results
Figure 1 shows the mean absolute errors (MAE) of forecasts from vote expectations, the original PollyVote (without vote expectations), and the new PollyVote (including vote expectations) across the six elections from 1992 to 2012. Vote expectations were less accurate than the original PollyVote for both long-term (90–70 days prior to Election Day) and short-term forecasts (from 20 days prior to Election Day). For medium-term forecasts (60–20 days), however, vote expectations performed similarly to, and sometimes better than, the original PollyVote.

Figure 1. Mean absolute errors of forecasts from vote expectations, the original PollyVote (without vote expectations) and the new PollyVote (with vote expectations), 1992–2012.
Figure 1 further shows that adding vote expectations to the original PollyVote increases accuracy. Except for long-term forecasts, the forecasts of the new five-component PollyVote are at least as accurate as, and usually more accurate than, those of the original four-component PollyVote.
Figure 2 presents the same data in a different way by showing the MAE of vote expectations and both PollyVote versions

Figure 2. Mean absolute error of forecasts from vote expectations, the original PollyVote (without vote expectations) and the new PollyVote (including vote expectations), calculated across the remaining days to the election, 1992–2012.
Discussion
This research note provides empirical evidence in support of two major findings from the forecasting literature. First, combining forecasts from different methods that use different data provides highly accurate forecasts, which are difficult to beat for even the most accurate individual method available. Across the last 90 days prior to each of the six elections from 1992 to 2012, the original PollyVote (which averages forecasts within and across polls, prediction markets, quantitative models, and expert judgment, but does not include vote expectations) missed the incumbent party’s final vote share on average by 1.16 percentage points. This error is 12% lower than the corresponding error of vote expectation surveys, which prior research found to be the most accurate method for the examined time period (Graefe, 2014).
Second, and more importantly, the accuracy of a combined forecast can be further improved by adding component forecasts that rely on a different method and different data than the forecasts already included in the combination. After adding vote expectations as a fifth component method, the new PollyVote reduced the error of the original four-component version by 7%, a substantial improvement given the already very low forecast error. On average across the 90 days prior to the six elections, the new five-component PollyVote missed the final election result by little more than one (i.e. 1.08) percentage point.
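The 7% figure follows directly from the two mean absolute errors reported in this note (1.16 and 1.08 percentage points). A minimal sketch of the arithmetic, in which the function name `relative_error_reduction` is a hypothetical helper introduced for illustration:

```python
def relative_error_reduction(baseline_mae, new_mae):
    """Return the share of the baseline error that the new forecast removes."""
    if baseline_mae <= 0:
        raise ValueError("baseline_mae must be positive")
    return (baseline_mae - new_mae) / baseline_mae


# Mean absolute errors in percentage points, as reported in the text.
original_mae = 1.16  # original four-component PollyVote
new_mae = 1.08       # new five-component PollyVote

reduction = relative_error_reduction(original_mae, new_mae)
print(f"{reduction:.1%}")  # prints 6.9%, i.e. about 7%
```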
This performance was achieved by calculating simple unweighted averages within and across forecasts of five component methods. Calculating unweighted averages may appear to be a naïve approach to combining forecasts, as it does not account for the component methods’ relative accuracy. However, an early review of more than 200 papers showed that the simple average provides a good starting point for combining forecasts, and is difficult for more complex approaches to beat (Clemen, 1989). These results still hold today, despite many efforts in search of sophisticated methods for combining. The problem with complex statistical procedures that aim to estimate component weights from historical data is that they tend to perform poorly in situations with limited and messy data, which are common in the social sciences. A recent example is Ensemble Bayesian Model Averaging (EBMA), a method that has been shown to perform well for combining forecasts in the data-heavy domain of weather forecasting. However, when applied to problems with scarce and noisy data, such as in economic and election forecasting, EBMA provided less accurate forecasts than the simple equal-weights average (Graefe et al., 2015).
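The two-step averaging can be sketched in a few lines. This is an illustration under stated assumptions rather than the PollyVote's actual code: the function name `combine`, the dictionary layout, and all forecast values are hypothetical.

```python
from statistics import mean


def combine(forecasts_by_component):
    """Average within each component, then across components with equal weights."""
    component_means = {
        name: mean(values)
        for name, values in forecasts_by_component.items()
        if values  # skip components without any forecast
    }
    if not component_means:
        raise ValueError("no forecasts to combine")
    return mean(list(component_means.values())), component_means


# Hypothetical forecasts of the incumbent party's two-party vote share (percent).
example = {
    "polls": [51.0, 52.0, 53.0],
    "prediction_markets": [52.5],
    "expert_judgment": [51.5, 52.5],
    "quantitative_models": [50.0, 51.0, 52.0, 53.0],
    "vote_expectations": [53.0],
}

combined, by_component = combine(example)
print(combined)  # prints 52.2
```

Averaging within components first means that a component with many individual forecasts (here, four hypothetical models) carries no more weight in the final combination than a component with a single forecast.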
When pre-specifying equal weights to component forecasts, analysts ignore the components’ relative accuracy. Instead, they deliberately introduce a bias that reduces variance and thereby limits a model’s ability to
When forecasting presidential elections, for example, uncertainty occurs due to ambiguity about the component methods’ relative accuracy, external shocks (e.g. campaign events), or the existence of noisy data. Calculating unweighted averages across forecasts is a simple way to account for such uncertainties or, in other words, to incorporate prior knowledge that prediction in the situation at hand is difficult.
Differential weights can be useful if there is strong prior knowledge about the methods’ relative accuracy. For example, polls are well known to have little predictive value until shortly before the election (Erikson and Wlezien, 2012). Thus, it might be useful to assign lower weights to polls early in the campaign and then gradually increase their weight as the election comes closer, an approach that is becoming standard practice in models that combine structural (fundamental) data and updated polls over time. For a review of existing models see Lewis-Beck and Dassonneville (2015), who also incorporate this prior knowledge about the relative accuracy of polls over time to develop forecasting models for French, German, and UK elections.
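One way to express such time-varying weights is sketched below. The linear ramp, the 90-day horizon, the minimum weight of 0.1, and both function names are assumptions chosen for illustration. They are not taken from the cited models, which estimate the weighting in their own ways.

```python
def poll_weight(days_to_election, horizon=90, min_weight=0.1, max_weight=1.0):
    """Hypothetical linear ramp: min_weight at `horizon` days out, max_weight on Election Day."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    days = min(max(days_to_election, 0), horizon)  # clamp to [0, horizon]
    return max_weight - (max_weight - min_weight) * days / horizon


def weighted_combination(poll_forecast, other_forecasts, days_to_election):
    """Weighted average in which only the poll component's weight varies over time."""
    weights = [poll_weight(days_to_election)] + [1.0] * len(other_forecasts)
    values = [poll_forecast] + list(other_forecasts)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical vote-share forecasts (percent): polls at 50, two other components at 52 and 54.
early = weighted_combination(50.0, [52.0, 54.0], days_to_election=90)
late = weighted_combination(50.0, [52.0, 54.0], days_to_election=0)
```

With these hypothetical inputs, the early combination stays close to the two other components, while the combination on Election Day equals the simple unweighted average of all three.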
While combining is most powerful when aggregating many forecasts that use a different method and different data, the approach can also be used if fewer methods are available. In a recent study, combining forecasts from three methods (a quantitative model, prediction markets, and polls) yielded accurate predictions of the 2012 US Electoral College and senatorial elections, a situation in which data are scarce (Rothschild, 2014).
Finally, the benefits of combining are of course not limited to US elections. In a recent validation test, the PollyVote was used to predict vote shares of seven parties in the 2013 German federal election by averaging forecasts within and across the four component methods that were used in the original US PollyVote. Across the 58 days for which forecasts from all four components were available, the combined PollyVote forecast was more accurate than each component’s typical forecast. Error reductions ranged from 5%, compared with a typical poll, to 41%, compared with a typical prediction market (Graefe, 2015).
Conclusion
Since 2004, the PollyVote has demonstrated the benefits of combining for forecasting the national vote in US presidential elections by averaging forecasts from four component methods. Combining is a simple and powerful strategy to generate accurate forecasts. Combining forecasts from different component methods typically yields more accurate predictions than the average (i.e. randomly selected) component, and often outperforms even the best component. Adding forecasts that use a different method and different data to the combination can be expected to further improve accuracy. Given the results of the present research note, the PollyVote will add vote expectations as a fifth component for forecasting the 2016 US presidential election.
