Abstract
We use random forests, a machine-learning technique, to formally examine the link between real gasoline prices and presidential approval ratings in the United States (US). Random forests make it possible to study this link in a completely data-driven way, such that nonlinearities in the data can easily be detected and a large number of control variables, in line with the extant literature, can be considered. Our empirical findings show that the link between real gasoline prices and presidential approval ratings is indeed nonlinear, and that the former even have predictive value for the latter in an out-of-sample exercise. We argue that our findings are in line with the so-called pocketbook mechanism, which stipulates that presidential approval ratings depend on gasoline prices because the latter have a sizable impact on the personal economic situations of voters.
Introduction
There is considerable discussion in the popular media about the possible negative influence of gasoline prices on presidential approval ratings in the United States (US). Although various government policies (like import tariffs, infrastructure investment, renewable energy-related decisions, and environmental standards) can move gasoline prices in one direction or another, US presidents actually have only limited control over energy prices, which are determined by global supply and demand. Hence, the frequent politicization of gasoline prices might seem puzzling. However, as outlined in Kim and Yang (2022), two mechanisms can account for the electoral effects of gasoline prices. First, voters' response to changes in gasoline prices may reflect “pocketbook” considerations, i.e., changes in gasoline prices can result in sizable gains or losses for individual voters, and hence, may affect their personal economic situations. Second, voters can be motivated by a “sociotropic” reason, that is, voters may consider movements in gasoline prices as an informational source about the general health of the economy.
In this research, our aim is to test the relationship between US presidential approval ratings and gasoline prices by accounting for nonlinearity, as confirmed by the statistical tests we conduct, and by controlling for a large number of macroeconomic and financial factors and their corresponding uncertainties, as well as geopolitical risks and oil prices. The importance of these variables has been highlighted in many studies (see, for instance, Burden and Mughan (2003); Halcoussis et al. (2009); Chong et al. (2011); Fauvelle-Aymar and Stegmaier (2013); Berlemann and Enkelmann (2014); Berlemann et al. (2015); Choi et al. (2016); Dickerson (2016); Adrangi and Macri (2019); Gupta et al. (2021); Bouri et al. (2024), and references cited therein). Specifically, several researchers have emphasized that these variables often act as nonlinear drivers of US presidential approval ratings in the sense that their impact becomes strong and/or statistically significant once they increase beyond a certain threshold. Choi et al. (2016), for example, explicitly formulate and test the hypothesis that the effects of economic conditions on presidential approval ratings take on a nonlinear form, and they motivate this hypothesis by pointing out that threshold effects may be a characteristic feature of media coverage of the economy in general and of specific economic variables in particular. Hence, in the case of our research, it is plausible to stipulate that the impact of real gasoline prices on US presidential approval ratings is negative but small, due to moderate media coverage, when gasoline prices start increasing from a low level.
When real gasoline prices increase further, however, things are likely to change as media coverage most likely increases sharply, and a higher share of perhaps formerly inattentive voters realizes that the purchasing power of money is decreasing, thereby lowering the presidential approval ratings and giving rise to the kind of threshold effects documented in the empirical literature involving the macroeconomic predictors. At the same time, when real gasoline prices increase even further, with the presidential approval ratings having already dropped to a lower level and media coverage continuing, dissatisfied voters most likely get used, to some extent, to higher real gasoline prices. This implies that, once the threshold level of real gasoline prices has been crossed, the presidential approval ratings stay at a comparatively low level or decrease further only moderately. In terms of empirical research, Berlemann and Enkelmann (2014) argue that nonlinear links between US presidential approval ratings and economic indicators may account for the inconclusive and sometimes conflicting results documented in earlier studies, and Berlemann and Enkelmann (2014) present evidence of such nonlinearities.
Given the potential importance of nonlinear links between economic indicators and US presidential approval ratings, it is important to account for such nonlinearities in the empirical model being used without restricting a priori the shape of a potential nonlinearity, that is, it is desirable to capture any nonlinear structure in the data in a fully data-driven way. Econometrically speaking, we address the question at hand in a robust manner by means of a machine-learning approach known as random forests (Breiman, 2001), over the monthly period of 1973:10 to 2023:12. Random forests can accurately trace out the link between presidential approval ratings and a large number of their drivers, 19 in our case (including the lagged presidential approval ratings), in a fully data-driven manner. Being a nonparametric approach, random forests automatically capture potential nonlinear links between US presidential approval ratings and gasoline prices, as well as the various other control variables.
To the best of our knowledge, we are the first to analyze whether gasoline prices play a prominent role in driving US presidential approval ratings using a machine-learning approach. More importantly, we not only consider this question from an in-sample view, but also conduct a one-step-ahead out-of-sample forecasting exercise, with the latter well-established as a relatively stronger test of predictability than an in-sample test (Campbell, 2008). In the process, our research can be considered to be an extension of the somewhat related studies of Harbridge et al. (2016) and Coggin (2024). Harbridge et al. (2016) assessed the strength of the pocketbook and the sociotropic mechanisms by examining the interaction effects between gasoline prices and media coverage volume on presidential approval ratings, the idea being that, if the sociotropic mechanism prevails, the impact of gasoline prices should be more significant when voters are exposed to more regular news coverage about gasoline prices. Relying on data from multiple surveys, Harbridge et al. (2016) reported an insignificant effect from the interaction term along with independent effects from gasoline prices, thus suggesting a strong pocketbook but weak sociotropic mechanism. As far as the recent work of Coggin (2024) is concerned, this paper highlighted the importance of gasoline prices in producing a reduction in US presidential approval ratings in a linear set-up with a handful (eight) of other predictors as control variables, and, in the process, emphasized the pocketbook channel. In this regard, our paper can be considered as an improvement on the work of Coggin (2024), especially methodologically, as we address the issue of potential misspecification of the linear model, given the evidence of a nonlinear association between the variables of interest in our nonparametric set-up.
Moreover, the number of variables considered by Coggin (2024) was limited compared to ours, and, hence, we reduce the risk of the so-called “omitted-variables bias”. Finally, the study by Coggin (2024) was restricted to in-sample inference and provided no evidence on the possible forecasting power of gasoline prices for US presidential approval ratings, which we examine here and which, in turn, is a well-known relatively stronger statistical test of predictability.
Given that the US elections just concluded with a new president-elect at the end of 2024, and that fluctuations in gasoline (and oil) prices are constantly in the news, especially in the wake of a series of recent geopolitical events associated with major energy-producing economies, this is indeed a pertinent question to ask. Answering this question is not only of paramount importance from the perspective of global investors operating in financial markets (on political cycles and stock-market returns, see Pástor and Veronesi, 2020), but also from the point of view of world politics, due to the worldwide influence of (divergent) policy stances undertaken by Democratic and Republican presidents. At the same time, while our objective is not necessarily to explicitly identify the channels through which gasoline prices drive US presidential approval ratings, if we end up finding that the former are indeed relatively more important (as is possible in our machine-learning set-up) than the macroeconomic factors, as well as oil prices (historically known to be closely associated with the US macroeconomy and financial markets, Gupta & Wohar, 2017), in impacting the latter, such a finding can be interpreted as a further piece of evidence in line with the pocketbook mechanism. This is because, if voters use real gasoline prices as carrying information about the state of the economy, then the link between real gasoline prices and the presidential approval ratings should weaken, or even no longer exist in the extreme scenario, once we include macroeconomic and financial predictors in our model that tend to capture the uncertainty and expectations of voters about future macroeconomic developments, in light of the close relationship between gasoline prices and macroeconomic outcomes (see Baumeister et al. (2017) for a detailed discussion).
In light of this, the empirical results of our paper could also be relevant academically to the theoretical literature on electoral politics, which aims to identify the underlying reasons (i.e., pocketbook or sociotropic) behind economic voting (Fiorina, 1981; Gomez & Wilson, 2001; Kinder & Kiewiet, 1981; Kramer, 1971).
Previewing our results, we report that the link between real gasoline prices and US presidential approval ratings is nonlinear in nature, with the former being a relatively more important predictor than the wide array of other macroeconomic and financial controls, to the extent that it also carries out-of-sample forecasting value for the approval ratings. At this juncture, it must be emphasized that our findings are likely to carry limited policy implications, beyond reducing taxes and import tariffs, given that US presidents cannot necessarily control gasoline prices, which, in turn, are governed by worldwide supply and demand. We organize the remainder of this research as follows: In Section 2, we provide a description of the data that we use in our study, while we outline in Section 3 our econometric model. In Section 4, we present our empirical results. In Section 5, we conclude.
Data
The data on US presidential approval ratings (PAR) are based on surveys conducted by Gallup, as part of the American Presidency Project.
A rating (expressed in percentage terms) informs about the proportion of respondents to an opinion poll who approve of the US president in office at the time when the poll was conducted. An important advantage of the Gallup poll, which differentiates it from other national polls informing about public approval of the president, is that the Gallup poll has been based over the years (since July 1941) on the same unchanged approval question: “Do you approve or disapprove of the way [enter president name] is handling his job as president?”. The upper panel of Figure 1 plots the ups and downs of the presidential approval ratings during the sample period that we study in this research.
Figure 1. Presidential approval ratings and real gasoline prices.
As far as nominal gasoline prices are concerned, we utilize the US city average of all grades of gasoline retail price (in dollars per gallon including taxes). The data are obtained from the Monthly Energy Review of the Energy Information Administration (EIA) of the US. We obtain real gasoline prices (RGP) by deflating the nominal prices with the Consumer Price Index (CPI), which captures the average price of all items for all urban consumers, obtained from the FRED database maintained by the Federal Reserve Bank of St. Louis. The lower panel of Figure 1 plots real gasoline prices.
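The deflation step can be illustrated with a minimal sketch. The function name `deflate`, the scaling constant `base`, and the numbers below are hypothetical and are not taken from the EIA or FRED data; in particular, the scaling of the CPI index used in the paper is an assumption here.

```python
def deflate(nominal_prices, cpi, base=100.0):
    """Convert nominal prices to real prices by dividing by a price index.

    `base` rescales the result to the index's reference level
    (an assumption; the paper does not state the scaling it uses).
    """
    if len(nominal_prices) != len(cpi):
        raise ValueError("price and CPI series must have the same length")
    return [base * price / index for price, index in zip(nominal_prices, cpi)]


# Hypothetical monthly observations, for illustration only.
nominal = [1.20, 2.50, 3.60]   # dollars per gallon
cpi = [100.0, 200.0, 300.0]    # price index level
rgp = deflate(nominal, cpi)    # real gasoline prices
```

In this toy series the nominal price triples while the real price is unchanged between the first and last observation, which is exactly the distinction the deflation is meant to capture.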
Browsing through Figure 1, there does seem to be a negative association between PAR and RGP, as is confirmed by a negative full-sample correlation coefficient of −0.46, with a p-value of 0.00. The empirical fact that this negative association is nonlinear, however, is indicated when we estimate a quantile-on-quantile regression model, in line with Sim and Zhou (2015). We find, as reported in Figure A1 at the end of the paper (Appendix), that the effect varies in magnitude across the conditional quantiles of PAR for different-sized values (quantiles) of RGP, with relatively stronger effects at lower quantiles of the latter and upper quantiles of the former. Using the wavelet localized multiple correlation (WLMC) approach of Fernández-Macho (2018), the nonlinear negative relationship of RGP with PAR is, in general, confirmed not only based on the varying strength of the correlation over time, but also across frequency bands, particularly over the medium to long run since the mid-1980s, as shown in Figure A2, which we also place at the end of the paper.
We also control for crude oil prices by utilizing the real values of the Cushing, Oklahoma West Texas Intermediate (WTI) spot oil price (RWTI), with the nominal price also derived from the EIA, and the CPI deflator from FRED. As has been emphasized by Kilian (2010), while crude oil is the main input in the production of motor gasoline, the retail prices of the latter will in addition reflect shocks to the demand from the US for gasoline as well as shocks to the ability of US-based refiners to process crude oil. In other words, changes in the retail price of gasoline are likely to be driven not exclusively by events in the global crude oil market. It is, thus, important to look at both gasoline and oil separately when considering the role of energy prices in impacting the presidential approval ratings.
We now turn our attention to a detailed discussion of our other predictors, beyond the energy prices. In order to capture a broad base of macroeconomic and financial variables, in line with the literature on presidential approval ratings mentioned above, we use eight factors (F1, F2, …, F8) derived from the 134 macroeconomic variables of Ludvigson and Ng (2009, 2011). Including these factors gives us the advantage of capturing a wide array of aggregate and regional time-series. The factors contain information on real output and income, employment and hours, real retail, manufacturing and sales data, international trade, consumer spending, housing starts, housing building permits, inventories and inventory-sales ratios, orders and unfilled orders, compensation and labor costs, capacity utilization measures, price indexes, interest rates and interest rate spreads, stock market indicators, and foreign exchange measures. As pointed out by Ludvigson and Ng (2009, 2011), the factors can be distinctly identified with F1 being a real activity factor, F2 capturing interest rate spreads, F3 and F4 capturing comovements of prices, F5 being an interest rates factor, F6 and F8 capturing the situation in the housing and the stock markets, and, finally, F7 summarizing the alternative measures of the money supply.
In addition, in line with earlier studies dealing with what determines US presidential approval ratings, we use the macroeconomic uncertainty (MU) and financial uncertainty (FU) measures developed by Jurado et al. (2015) and Ludvigson et al. (2021), which, in turn, measure the average time-varying variance in the unpredictable component of 134 macroeconomic and 148 financial time-series, respectively. In other words, the MU and FU variables are designed so as to capture the average volatility in the shocks to the factors that summarize real and financial conditions. The metrics that we use are the broadest measures of macroeconomic and financial uncertainties currently available for the US. The uncertainty indexes cover three forecasting horizons of 1-, 3-, and 12-month-ahead, and are denoted by MU1, MU3, MU12, FU1, FU3, and FU12.
Again, in line with earlier research on the topic of presidential approval ratings, as far as geopolitical risks are concerned, we consider two indexes related to threats and attacks. The two indexes are based on the work by Caldara and Iacoviello (2022), who compute the indexes by counting the number of articles related to adverse geopolitical events using automated text search of the electronic archives of three newspapers (namely, the Chicago Tribune, the New York Times, and the Washington Post) for each month (as a share of the total number of news articles). The search spans eight categories (war threats, peace threats, military buildups, nuclear threats, terror threats, beginning of war, escalation of war, terror acts), with the geopolitical threats (GPRT) index covering categories 1 to 5, and the geopolitical acts (GPRA) index comprising categories 6 to 8.
Together with a lag of the presidential approval ratings, included to capture their persistence, we end up with 19 predictors of the presidential approval ratings for the current period, covering the monthly sample period ranging from 1973:10 to 2023:12, based on data availability at the time of writing this paper, with the start date corresponding to the RGP series, and the end date being in line with the eight factors and the six uncertainty measures.
Random Forests
In order to detect the nature of the link between the presidential approval ratings, PAR, and real gasoline prices, RGP, we use models of the following format:
By applying the search-and-split algorithm in a top-down way, that is, by solving this optimization problem recursively, we can grow a complex regression tree that consists of many nodes and branches. The predicted value of the presidential approval ratings can then be computed from such a regression tree as follows:
A complex regression tree should inform a researcher in much detail about the link between the presidential approval ratings, the real gasoline price, and the vector of control variables. At the same time, its complicated hierarchical structure makes a complex regression tree rather sensitive to the specific idiosyncratic features of the sample of data under study. A random forest addresses this overfitting problem by growing not only one but many regression trees. Such an ensemble of regression trees is grown by (i) computing a large number of bootstrap samples by resampling from the data, (ii) growing a random regression tree for every single bootstrap sample, and (iii) predicting the presidential approval ratings as the average prediction obtained from the ensemble of random regression trees. A random regression tree uses for the search-and-split algorithm only a random subset of the predictors and, thereby, mitigates the effect of influential predictors on tree building. Averaging across random regression trees, in turn, stabilizes the resulting predictions.
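Steps (i) to (iii) can be sketched in a few lines of plain Python. This is a didactic sketch and not the randomForestSRC implementation: the squared-error splitting criterion, the minimum node size, all function names, and the simulated data (a step-shaped response plus an irrelevant second predictor) are assumptions made for illustration.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (the assumed splitting loss)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def grow_tree(X, y, n_try, rng, min_size=5):
    """Top-down search-and-split using a random subset of n_try predictors per node."""
    if len(y) < 2 * min_size or len(set(y)) == 1:
        return sum(y) / len(y)                      # terminal node: mean response
    best = None
    for j in rng.sample(range(len(X[0])), n_try):   # random subset of predictors
        for s in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= s]
            right = [i for i, row in enumerate(X) if row[j] > s]
            if len(left) < min_size or len(right) < min_size:
                continue
            loss = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or loss < best[0]:
                best = (loss, j, s, left, right)
    if best is None:
        return sum(y) / len(y)
    _, j, s, left, right = best
    return (j, s,
            grow_tree([X[i] for i in left], [y[i] for i in left], n_try, rng, min_size),
            grow_tree([X[i] for i in right], [y[i] for i in right], n_try, rng, min_size))


def predict_tree(tree, x):
    """Walk down the tree until a terminal node (a float) is reached."""
    while isinstance(tree, tuple):
        j, s, left, right = tree
        tree = left if x[j] <= s else right
    return tree


def grow_forest(X, y, n_trees=50, n_try=1, seed=0):
    """(i) bootstrap with replacement, (ii) grow one random tree per bootstrap sample."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_try, rng))
    return forest


def predict_forest(forest, x):
    """(iii) average the predictions of the individual trees."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


# Simulated data: the response drops from 60 to 40 once the first predictor exceeds 0.5.
data_rng = random.Random(1)
X = [[i / 40, data_rng.random()] for i in range(40)]
y = [60.0 if row[0] < 0.5 else 40.0 for row in X]
forest = grow_forest(X, y)
low = predict_forest(forest, [0.1, 0.5])
high = predict_forest(forest, [0.9, 0.5])
```

Because each node only considers a random subset of the predictors, some trees split on the irrelevant predictor first; averaging across the ensemble nevertheless recovers a higher predicted value for `low` than for `high`.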
We use the R language and environment for statistical computing (R Core Team, 2023) and the R add-on package “randomForestSRC” (Ishwaran & Kogalur, 2023) to estimate random forests. We use 500 individual regression trees to grow a random forest, and bootstrapping is done with replacement.
Empirical Results
Table 1. OLS Results.
Note. a denotes significance at the 1% level. t-values are based on robust standard errors.
The OLS model imposes a linear structure on the data. The results that we summarize in Figure 2 indicate that such a linear structure may be too restrictive.
Figure 2 shows a scatterplot of the presidential approval ratings as a function of real gasoline prices along with a superimposed local Gaussian polynomial regression and its ± two standard error band. In line with the OLS results, the polynomial regression function has a negative slope as well. The local slope of the polynomial regression function, however, in addition reveals that, at comparatively low values of real gasoline prices, the correlation is stronger (in absolute terms) than at relatively high values of the real gasoline price. Hence, the estimated polynomial regression function indicates that the contemporaneous link between the presidential approval ratings and real gasoline prices is nonlinear.
Figure 2. Local Gaussian polynomial regression. Dashed gray lines denote the boundaries of a ± two standard error band.
Like the OLS model that we consider in Table 1, the polynomial regression does not control for the impact of the predictors other than real gasoline prices. While it would be possible to include several other predictors in the OLS model in addition to real gasoline prices so as to trace out the incremental impact of real gasoline prices on the presidential approval ratings, such a research strategy runs into several difficulties. First, such an empirical modeling strategy would yield a high-dimensional but highly unreliable model in case nonlinearities are in fact present in the data. Second, such an empirical modeling strategy would result in an OLS model that, while already being complex, would still be incomplete because it would only be natural to account for a nonlinearity not only with regard to real gasoline prices, but with regard to the other predictors as well. Third, it would be important to account for potential interaction effects between the predictors because, for example, the impact of real gasoline prices on the presidential approval ratings may depend in a potentially complex way on the level of macroeconomic uncertainty. Such interaction effects would further inflate the number of predictors included in an OLS model. Finally, even after adding a full set of nonlinear and interaction terms to an OLS model, a researcher would still have to decide on the specific form of the nonlinearities and interaction effects, further proliferating the space of models from which a specific model for estimation and forecasting would have to be chosen arbitrarily. Given these difficulties, random forests are better suited than an OLS model to inspecting how the presidential approval ratings are linked to real gasoline prices.
In order to shed light on the link between the presidential approval ratings and real gasoline prices after controlling for the predictive value of the other predictors, we plot in Figure 3 the partial dependence function we obtain from estimating a random forest. The partial dependence function informs about the value of the presidential approval ratings that the estimated random forest predicts for alternative realizations of real gasoline prices, holding the other predictors constant. The estimated partial dependence function resembles the estimated polynomial regression function. The partial dependence function has a strongly negative slope for low values of real gasoline prices and then more or less flattens out when real gasoline prices increase beyond their mean (1.08)/median (1.03).
Figure 3. Partial dependence plot. Dashed red lines denote the smoothed boundaries of a ± two standard error band.
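The logic of a partial dependence function can be sketched as follows. The sketch uses the standard definition, in which the predictor of interest is fixed at a grid value for every observation and the model predictions are averaged over the observed values of the other predictors; we assume, without confirmation from the text, that this matches the computation behind Figure 3. The function `toy_model` is a hypothetical stand-in for a fitted random forest, with a threshold at 1.0 chosen only for illustration.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when `feature` is set to each grid value for all rows."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)          # copy so the data are not altered
            modified[feature] = value     # fix the predictor of interest
            total += predict(modified)
        curve.append(total / len(X))
    return curve


def toy_model(x):
    """Hypothetical fitted model: approval falls with the price up to 1.0, then is flat."""
    return 60.0 - 20.0 * min(x[0], 1.0) + 2.0 * x[1]


# Column 0: real gasoline price, column 1: one control variable (made-up values).
X = [[0.5, 0.0], [1.0, 1.0], [1.5, 2.0]]
grid = [0.5, 1.0, 1.5]
pd_curve = partial_dependence(toy_model, X, feature=0, grid=grid)
```

In this toy case the curve falls between the first two grid points and is flat afterwards, which mirrors the qualitative shape described above.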
Table 2. Variable Importance.
Note. VIMP is standardized by dividing by the variance of PAR and then multiplied by 100. CI = confidence region (parametric jackknife, in %). For definitions of the predictors, see Section 2.
Table 3. Forecasting Results.
Note. Initial training period: 10 years. Benchmark model: $PAR_{t+1} = f(PAR_t, \ldots)$. Rival model: $PAR_{t+1} = f(PAR_t, RGP_t, \ldots)$. A RMSFE (MAFE) ratio
We also report results for a modified RMSFE (MAFE) ratio, where we discount “old” forecast errors using the formula $FE_s \times \gamma^{T-s}$, for s = T, T − 1, T − 2, …, where FE denotes the forecast error and T denotes the last observation of the sequence of out-of-sample forecasts. We consider two cases: γ = 0.99 and γ = 0.9. In both cases, more recent forecast errors receive a larger weight as compared to more distant forecast errors. We observe that discounting increases the ratios. This observation suggests that the impact of real gasoline prices on forecast accuracy has tended to increase in the more recent past.
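The discounting scheme can be illustrated with a short sketch. Two points are assumptions rather than statements from the text: the ratio is defined here as benchmark over rival (so that values above one favor the rival model), and the discount factor is applied to each forecast error before squaring or taking absolute values. The forecast errors are made up.

```python
import math


def discounted_errors(errors, gamma):
    """Scale the error at position s (1-based) by gamma ** (T - s)."""
    T = len(errors)
    return [fe * gamma ** (T - s) for s, fe in enumerate(errors, start=1)]


def rmsfe(errors, gamma=1.0):
    """Root-mean-squared forecast error of the discounted errors."""
    scaled = discounted_errors(errors, gamma)
    return math.sqrt(sum(e * e for e in scaled) / len(scaled))


def mafe(errors, gamma=1.0):
    """Mean-absolute forecast error of the discounted errors."""
    scaled = discounted_errors(errors, gamma)
    return sum(abs(e) for e in scaled) / len(scaled)


def accuracy_ratio(benchmark_errors, rival_errors, gamma=1.0, loss=rmsfe):
    """Assumed convention: benchmark loss divided by rival loss."""
    return loss(benchmark_errors, gamma) / loss(rival_errors, gamma)


# Hypothetical errors: the rival model only improves in the two most recent periods.
benchmark = [2.0, 2.0, 2.0, 2.0]
rival = [2.0, 2.0, 1.0, 1.0]
plain_ratio = accuracy_ratio(benchmark, rival)                 # gamma = 1, no discounting
discounted_ratio = accuracy_ratio(benchmark, rival, gamma=0.9)
```

Because the rival model's gains are concentrated in the most recent periods, `discounted_ratio` exceeds `plain_ratio`, which is the pattern the discounting exercise is designed to detect.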
In order to inspect this observation from a different angle, we plot in Figure 4 how the rank of real gasoline prices among the predictors changes when we move the end of the recursive-estimation window forward in time. We plot the rank of real gasoline prices in terms of how often this predictor is used for splitting when growing a random forest (upper panel) and in terms of VIMP (lower panel). A lower rank means that real gasoline prices are more important. The evolution of both metrics shows that the importance of real gasoline prices has increased over time.
Figure 4. Importance of real gasoline prices over time. The rank of RGP in terms of how often this predictor is used for splitting when growing a random forest (upper panel) and in terms of VIMP (lower panel). Window = Index of recursive-estimation window.
To assess the statistical significance of the impact of real gasoline prices on forecast accuracy, we report, also in Table 3, the results of the Clark and West (2007) and the Diebold and Mariano (1995) tests, where we report results for absolute and squared forecast errors in the case of the latter. We report results for both tests because a comparison of forecasting models in terms of statistical tests is complicated by the nonlinear and complex structure of random forests. The nonlinear and complex structure of random forests implies that the models are not simple nested versions of each other. In any case, both tests yield statistically significant results and, thus, point in the same direction, namely, that real gasoline prices help to improve the accuracy of one-month-ahead forecasts of the presidential approval ratings.
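For readers unfamiliar with the Clark and West (2007) test, the following sketch computes the adjusted-loss-differential statistic in its textbook form: the squared error of the benchmark minus the squared error of the rival, corrected by the squared difference between the two forecasts, and then a t-statistic on the mean of that series. The use of a plain (non-robust) standard error and the made-up numbers are assumptions for illustration; the sketch is not meant to reproduce the entries of Table 3.

```python
import math


def clark_west_stat(actual, benchmark_fc, rival_fc):
    """t-statistic on the mean of the Clark-West adjusted loss differential."""
    f = [
        (y - b) ** 2 - ((y - r) ** 2 - (b - r) ** 2)
        for y, b, r in zip(actual, benchmark_fc, rival_fc)
    ]
    n = len(f)
    mean = sum(f) / n
    variance = sum((v - mean) ** 2 for v in f) / (n - 1)   # sample variance
    return mean / math.sqrt(variance / n)


# Hypothetical approval ratings and one-step-ahead forecasts.
actual = [50.0, 48.0, 47.0, 45.0, 46.0, 44.0]
benchmark_fc = [51.0, 50.0, 48.0, 47.0, 45.0, 46.0]
rival_fc = [50.5, 48.5, 47.5, 45.5, 45.5, 44.5]
cw = clark_west_stat(actual, benchmark_fc, rival_fc)
```

The statistic is compared with one-sided standard normal critical values (for example, 1.645 at the 5% level); a large positive value indicates that the rival model improves on the benchmark.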
Besides the expectations of consumers (which have recently been closely associated with the PAR of the US by Gordon (2024)) and businesses captured by the survey data, which form part of the large database from which the eight factors are extracted, the macroeconomic and financial uncertainties that we include in our array of predictors can be interpreted as forward-looking variables as they capture the non-predictable component of these variables due to various types of shocks (Ludvigson et al., 2021), including those emanating from the energy sector (Sheng et al., 2020), and, as such, account for the sociotropic argument that movements in gasoline prices have a substantial impact on voters' expectations of the overall macroeconomy. The incremental impact of real gasoline prices on the presidential approval ratings, therefore, can be interpreted as direct evidence of the pocketbook mechanism.
In order to strengthen this interpretation further, we let subsequent macroeconomic conditions, $M_{t+1}$, be linked to voters’ expectations of the overall macroeconomy,
Importance of real gasoline prices over time in an extended model. The rank of RGP in terms of how often this predictor is used for splitting when growing a random forest (upper panel) and in terms of VIMP (lower panel). Window = Index of recursive-estimation window. The extended model features the leads of the macroeconomic factors as additional predictors.
Concluding Remarks
We have used random forests to study the link between US presidential approval ratings and real gasoline prices, while controlling for a large number of variables that have been studied in the earlier literature. Our empirical results have shown that the link between the presidential approval ratings and real gasoline prices is negative and nonlinear. We have found that, putting the lagged presidential approval ratings aside, real gasoline prices are clearly as important as, or even more important than, other conventional predictors of the presidential approval ratings. We have also observed that real gasoline prices even have predictive value for the subsequent presidential approval ratings in an out-of-sample forecasting experiment. Given the importance of the issue, the predictive value of real gasoline prices for the subsequent presidential approval ratings should be investigated in future research in a more systematic way by considering longer forecast horizons and alternative forecasting models.
Random forests have the advantage that they render it possible to consider a large number of predictors of the presidential approval ratings, the link of which to the predictors is then traced out in a flexible and completely data-driven way. If voters use real gasoline prices as a source of information about the health of the macroeconomy, then the link between real gasoline prices and the presidential approval ratings should disappear, or at least weaken, once we control for predictors that capture the uncertainty and expectations of the voters about subsequent macroeconomic developments. To this end, we have included in our model various macroeconomic uncertainties and (lead) macroeconomic factors. In spite of the fact that random forests can use these predictors, we have found a direct effect of real gasoline prices on the presidential approval ratings. Our empirical findings, thereby, support the pocketbook mechanism, which stipulates that the link between the presidential approval ratings and real gasoline prices reflects a sizable effect of the latter on the personal economic situations of voters.
At this stage, it is important to point out that our finding that real gasoline prices carry predictive information for US presidential approval ratings might carry only limited policy implications, given that US presidents cannot necessarily control energy prices, as they are determined by worldwide supply and demand conditions. In other words, although the chances of being re-elected are likely to be influenced, alongside other macroeconomic factors, by real gasoline prices, the president might not be able to do much beyond possibly reducing taxes to compensate for the increase in real gasoline prices at the domestic level, and/or reducing import tariffs from a global (import) perspective. Having said this, perhaps our results make a case for more emphasis on the transition to renewable energy to reduce dependence on gasoline (fossil fuel) and, hence, enhance the probability of retaining the presidential office, besides meeting the environmental standards required for achieving the long-term goal of a more “green economy”.
Footnotes
Author’s Note
We would like to thank an anonymous referee for many helpful comments. The usual disclaimer applies.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Appendix
Figure A1. Results for a quantile-on-quantile regression.
Figure A2. Results for wavelet localized multiple correlation.
