Abstract
In a recent Research and Politics article, we showed that for many types of time series data, concerns about spurious relationships can be overcome by following standard procedures associated with cointegration tests and the general error correction model (GECM). Matthew Lebo and Patrick Kraft (LK) incorrectly argue that our recommended approach will lead researchers to identify false (i.e., spurious) relationships. In this article, we show how LK’s response is incorrect or misleading in multiple ways. Most importantly, when we correct their simulations, their results reinforce our previous findings, highlighting the utility of the GECM when estimated and interpreted correctly.
Introduction
We are grateful for the opportunity to continue the dialogue about appropriate applications of the general error correction model (GECM) with Matthew Lebo and his coauthors. Although this discussion has been underway for several years now, our first article on the topic followed a Political Analysis time series symposium, where Grant and Lebo (2016) (GL) and Lebo and Grant (2016) (LG) argued that the GECM is rarely (if ever) appropriate with political data. As with many time series researchers, much of their concern stemmed from the potential for estimating spurious relationships. Our article – Enns et al. (2016) (EKMW) – showed that GL were far too skeptical of the ongoing utility of the GECM. When Y contains a unit root, when Y is bounded and contains a unit root, when Y is stationary, or when Y is near-integrated (i.e., ρ ≥ 0.90), LG’s concerns about spurious relationships are easily overcome by following standard procedures associated with cointegration tests and the GECM. Specifically, to conclude that cointegration exists with a GECM, researchers should: (1) conduct statistical tests to confirm that Y and X contain unit roots (our simulations used augmented Dickey–Fuller (ADF) tests); (2) confirm that the error correction model (ECM) parameter (associated with Yt–1) is statistically significant using appropriate MacKinnon critical values; and (3) confirm that the coefficient for the lag of X is statistically significant. We showed that if all three of these conditions are met, the Type-I error rate for the estimated relationship between X and Y falls at, or below, the standard 5% threshold.
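For readers who want the three conditions tied to specific parameters, the bivariate GECM can be written in a common form. The symbols below (α0, α1, β0, β1, ε) are our notational assumption for this sketch and are not taken from the exchange itself.

```latex
\Delta Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \beta_0 \Delta X_t + \beta_1 X_{t-1} + \varepsilon_t
```

In this notation, condition (2) is a test of α1 (the ECM parameter on Yt–1) against MacKinnon critical values, and condition (3) is a test of β1 (the coefficient on Xt–1) against traditional critical values. Condition (1) is checked on Y and X separately, before the GECM is estimated.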
Much of our evidence relied on GL’s own simulation results. Lebo and Kraft (2017) (LK) now conduct new simulations in an effort to show that our approach will routinely lead researchers to identify false (i.e., spurious) relationships. They also conduct simulations which suggest that the negative bias on the error correction parameter is much more severe than we report. A careful look at LK’s response, however, shows that it does not undermine our conclusions and that it is easy to reconcile the seemingly disparate recommendations. In fact, had LK followed exactly our recommended procedure, their simulation results would have looked extremely different and would, in fact, support our conclusions.
Our advice does not over-produce false-positives
Based on 60,000 simulations, LK conclude that the ADF test is “drastically underpowered” to reject the null hypothesis of a unit root (Lebo and Kraft, 2017: 4). This, of course, is a well-known finding (e.g., Blough, 1992; Cochrane, 1991), and LK actually quote us making the exact same point in our article (Lebo and Kraft, 2017: 3). It is important to remember why we chose to use an underpowered test in our simulations. Just three lines below the sentence LK quoted, we explain: “this means we are biasing our simulations against support for the GECM since we are more likely to incorrectly conclude the series contains a unit root and thus inappropriately utilize the GECM as a test of cointegration (thereby inflating the rate of Type-I errors with those cointegration tests).” For the four types of data that we analyzed, the Type-I error rate using 0.05 p-values approximated the expected 5%. Using stronger unit-root tests would only reduce the rate of spurious findings. Thus, what LK present as a bug was actually a feature of our analysis.
What, then, are we to make of LK’s simulation results in their Figure 1(c), which claim to follow the “exact procedures” we advocate and report false-positive rates greater than 5% in 38 out of 60 sets of simulations when Y is stationary or fractionally integrated? A review of LK’s approach reveals that their spurious relationships emerge because they did not follow our “exact procedures.”

Figure 1. The proportion of false-positives when the augmented Dickey–Fuller test includes the inappropriate vs. appropriate number of lags of ΔY.
Lebo and Kraft’s first oversight results because they incorrectly used the
Researchers should also be aware, however, that LK skipped two other steps that are necessary to conclude that cointegration is present. First, both X and Y should be tested for a unit root before utilizing the ECM parameter as a test of cointegration. Second, even if both series showed evidence of a unit root and the ECM parameter was significant with the MacKinnon critical values, the estimated coefficient on Xt–1 should also be significant (using traditional critical values) before concluding that there is a long-term relationship between X and Y. Consistent with Enns et al. (2016), when we follow all of the necessary steps, the false-positive rate is at or below 0.05 in every set of simulations reported above except for two (where the false-positive rate is 0.06 and 0.08) (see Appendix, Figure A1).
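The full sequence of checks can be sketched as a simple decision rule. This is a minimal illustration, not the code used in our simulations: the class name, field names, and the numbers in the usage example are hypothetical, and the MacKinnon critical value must be looked up by the researcher for the relevant sample size and number of regressors.

```python
from dataclasses import dataclass


@dataclass
class GecmResults:
    """Test results a researcher has already computed (all names hypothetical)."""
    adf_p_y: float       # ADF p-value for Y (null hypothesis: unit root)
    adf_p_x: float       # ADF p-value for X (null hypothesis: unit root)
    ecm_t: float         # t-statistic on Y_{t-1} in the GECM
    ecm_critical: float  # MacKinnon critical value (negative), supplied by the user
    x_lag_p: float       # conventional p-value on X_{t-1} in the GECM


def supports_cointegration(r: GecmResults, alpha: float = 0.05) -> bool:
    """Return True only if all three conditions hold."""
    # Step 1: the unit-root null is NOT rejected for either series.
    both_unit_roots = r.adf_p_y > alpha and r.adf_p_x > alpha
    # Step 2: the ECM t-statistic is more negative than the MacKinnon value.
    ecm_significant = r.ecm_t < r.ecm_critical
    # Step 3: the coefficient on lagged X is significant by the usual standard.
    x_lag_significant = r.x_lag_p < alpha
    return both_unit_roots and ecm_significant and x_lag_significant


# Illustrative numbers only; -3.5 is a placeholder, not a tabulated value.
example = GecmResults(adf_p_y=0.40, adf_p_x=0.55, ecm_t=-4.1,
                      ecm_critical=-3.5, x_lag_p=0.01)
print(supports_cointegration(example))
```

Skipping any one of the three checks changes the rule being evaluated, which is why simulations that omit a step are not a test of the procedure as a whole.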
Instead of concerns about potential spurious relationships, LK’s Figures 1(a) and 1(b) focus on the point estimate of the error correction parameter. On one hand, we want to be careful not to place too much emphasis on this point estimate. Researchers are typically most interested in whether a relationship exists between X and Y. Although the rate of error correction can be informative, this is generally not the quantity of primary interest upon which tests of substantive theories critically depend. However, even if not of primary interest, we feel that researchers should be made aware that LK’s results again reflect a fundamental error and are thus misleading.
Recall that LK estimated a bivariate GECM with two unrelated series with varying data-generating processes. LK’s Figures 1(a) and 1(b) plot the mean value of the ECM parameter (
Some further points of clarification
We were surprised by the statement, “Enns et al. provide no justification for expanding when these [critical] values should be used — to NI [near integrated] data, FI [fractionally integrated] data, or any other type. Yes, Enns et al.’s advice prevents some spurious findings but that does not mean they are the correct critical values” (Lebo and Kraft, 2017: 6). This statement is misleading for three reasons. First, LK suggest we had no justification for the critical values we used, but as we explained in our original article, the simulation results we presented for near integrated data come directly from GL’s tables G.1–G.5. In other words, we relied on results that they reported based on MacKinnon critical values. Second, we did offer a theoretical justification for using these critical values (see, especially, Enns et al., 2016: 6–7, 9, and note 21). Third, we did not simply show that using these critical values “prevents some spurious findings.” We showed that the false-positive rate was approximately 5 percent or less with these values.
We also disagree with LK’s suggestion that truly cointegrated series will mimic Stock and Watson’s (2011) textbook example of cointegration (reported in LK’s Figure 2). To highlight the importance of the Stock and Watson example, LK reference Lebo’s previous work, stating: “Error correction between variables is a very close relationship that should be obvious in a simple glance at the data” (Lebo and Grant, 2016: 22). We are strong proponents of plotting time series. However, identifying a single textbook example of cointegration and using it as the benchmark for future analyses is overly simplistic. LK could just as easily have pointed to Enders’s (2014) textbook example of three cointegrated series (shown in Figure 2), which appears more similar to the Kelly and Enns data shown in LK’s Figure 4. But relying on Enders’s figure would be equally problematic. The problem, of course, is that the choice of figure – as well as subjective assessments comparing applied data to the chosen figure – involves substantial researcher discretion. To avoid this subjectivity, we conducted simulations which show that we would expect to falsely reject the true null hypothesis only about 5% of the time if researchers use the procedure we highlight. Researchers should definitely plot their data, but they should also use systematic statistical tests to evaluate whether cointegration exists.

Figure 2. Enders’s (2014) Figure 6.2 from Applied Economic Time Series (reprinted with permission from John Wiley & Sons).
Advancing methodological debates
We have always been eager to advance our methodological understanding, even when it requires us to reconsider our previous work (e.g., Enns et al., 2014, 2016). However, our experience with this exchange suggests some general insights about how to engage usefully and constructively within a methodological debate.
First, even when the primary debate is a methodological one, existing substantive theory and research should be engaged and treated seriously. For example, not only has a sizeable literature explored – and found – a relationship between public opinion and Supreme Court decisions (e.g., Enns and Wohlfarth, 2013; Epstein and Martin, 2011; Flemming and Wood, 1997; Link, 1995; McGuire and Stimson, 2004; Mishler and Sheehan, 1993, 1996), GL found such a relationship using our data and their preferred fractional integration (FI) methods (Grant and Lebo, 2016: 23). These results should be acknowledged when critiquing literature on this topic. Similarly, any critique of research on the relationship between inequality and support for redistribution should acknowledge formal (Shayo, 2009), experimental (Trump, forthcoming), cross-national (Cavaillé and Trump, 2015), and other time series (Luttig, 2013) analyses that are consistent with the argument being critiqued. Methodological discussions and conclusions can be improved by paying attention to existing substantive literature, theoretical arguments, and related analyses.
Second, to advance the methods literature, it is most helpful to build a positive case for a new method, or a broader application of an existing one. In this instance, we think it would be extremely beneficial to make a positive case for the FI techniques advocated by GL and LG. We would be very interested in further incorporating FI techniques into our research, but as we pointed out in our previous article, we believe three aspects of FI still need to be tackled. First, concerns with estimating the FI parameter, d, with short time series must be addressed. Second, LG’s “practical guide” to estimating d ignores the many choices involved and the fact that estimates can be highly sensitive to these choices. Finally, our past work has shown that there is reason to question whether the three-step fractional error correction model (FECM) approach that GL recommend can reliably detect true relationships in the data (see also Enns and Wlezien, 2017). Validating FI methods in a variety of contexts and offering a realistic guide for implementation would provide an important service to the discipline.
The heart of time series methodology involves balancing the many tradeoffs inherent in applied modeling to minimize errors and avoid incorrect inferences when testing substantive theory. We share Lebo and his coauthors’ concern that top political science journals continue to publish research that uses the GECM incorrectly by ignoring MacKinnon critical values. Nevertheless, we have shown that after correcting the errors in Lebo and Kraft (2017), their simulations reaffirm the conclusions of Enns et al. (2016). While care must certainly be taken, a fairly straightforward procedure can protect applied time-series researchers against false-positives when attempting to estimate relationships with many types of data that are common in social science research.
Appendix. Rate of false positives when the general error correction model (GECM) is estimated correctly
The following figure shows that when the appropriate lag lengths are used in the augmented Dickey–Fuller (ADF) test and the GECM is implemented correctly, the rate of false-positives (i.e., the rate of finding a statistically significant effect of Xt–1 after testing for both integration and cointegration) in the data Lebo and Kraft (LK) analyze is at or below 0.05 in every case except for two (where the false-positive rate is 0.06 and 0.08).
The results reported in Figures 1(b) and A1 selected the number of lags to include in the ADF test based on the portmanteau (Q) test for white noise residuals. As Figure A2 shows, if we selected the number of lags based on a Breusch–Godfrey test for serial correlation, virtually the same results emerge. Both tests indicate that LK’s decision to rely on a default of 3 lags for the ADF test was not appropriate for these simulations.
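To make the lag-selection logic concrete, the sketch below computes a portmanteau Q statistic in its Ljung–Box form and picks the smallest ADF lag length whose residuals look like white noise. It is an illustration under stated assumptions, not the code behind our figures: the function names are hypothetical, the residuals from each candidate ADF regression are assumed to have been computed already, the "smallest adequate lag" rule is our simplification, and the chi-square comparison ignores any degrees-of-freedom adjustment for estimated parameters.

```python
from typing import Dict, Optional, Sequence

# Upper 5% critical values of the chi-square distribution, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def autocorrelation(resid: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of the residuals at a positive lag."""
    n = len(resid)
    mean = sum(resid) / n
    denom = sum((r - mean) ** 2 for r in resid)
    num = sum((resid[t] - mean) * (resid[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(resid: Sequence[float], h: int) -> float:
    """Ljung-Box statistic: Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n - k)."""
    n = len(resid)
    return n * (n + 2) * sum(
        autocorrelation(resid, k) ** 2 / (n - k) for k in range(1, h + 1)
    )


def smallest_adequate_lag(
    residuals_by_lag: Dict[int, Sequence[float]], h: int
) -> Optional[int]:
    """Smallest ADF lag length whose residuals pass the Q test at the 5% level.

    `residuals_by_lag` maps a candidate number of lags of delta-Y to the
    residuals of the ADF regression estimated with that many lags.
    Returns None if no candidate passes.
    """
    crit = CHI2_CRIT_05[h]
    for p in sorted(residuals_by_lag):
        if ljung_box_q(residuals_by_lag[p], h) < crit:
            return p
    return None


# Toy residuals: strongly alternating (serially correlated) vs. a milder pattern.
toy = {0: [1.0, -1.0] * 10, 1: [1.0, 1.0, -1.0, -1.0] * 2}
print(smallest_adequate_lag(toy, h=1))
```

A fixed default lag length skips this check entirely, so the ADF regression may retain serially correlated residuals in some data-generating processes and include unnecessary lags in others.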
Declaration of Conflicting Interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Carnegie Corporation of New York Grant
This publication was made possible (in part) by a grant from Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.
