Abstract
In a recent article (Enns et al., 2016), we showed that, with appropriate care, the general error correction model (GECM) can be applied to many types of time series data common in social science research. Lebo and Kraft (2017) respond with new simulations that, they argue, show our recommendations routinely produce spurious findings and understate the bias on the error correction parameter. We demonstrate that their simulations depart from our recommended procedure in important ways, most notably by using an incorrect lag length in the augmented Dickey–Fuller test and by skipping steps that are necessary to conclude that cointegration is present. When these errors are corrected, their simulations reaffirm our original conclusions: a straightforward procedure protects applied researchers against false positives.
Introduction
We are grateful for the opportunity to continue the dialogue about appropriate applications of the general error correction model (GECM) with Matthew Lebo and his coauthors. Although this discussion has been underway for several years now, our first article on the topic followed Grant and Lebo's (2016) (GL) recommendation that researchers jettison the GECM in favor of fractional integration (FI) methods; we argued that, used correctly, the GECM remains appropriate for many types of data.
Much of our evidence relied on GL’s own simulation results. Lebo and Kraft (2017) (LK) now conduct new simulations in an effort to show that our approach will routinely lead researchers to identify false (i.e., spurious) relationships. They also conduct simulations that suggest the negative bias on the error correction parameter is much more severe than we report. A careful look at LK’s response, however, shows that it does not support these conclusions. As we detail below, once the errors in LK’s simulations are corrected, their results reaffirm the conclusions of Enns et al. (2016).
Our advice does not over-produce false-positives
Based on 60,000 simulations, LK conclude that the ADF test is “drastically underpowered” to reject the null hypothesis of a unit root (Lebo and Kraft, 2017: 4). This, of course, is a well-known finding (e.g., Blough, 1992; Cochrane, 1991), and LK actually quote us making the exact same point in our article (Lebo and Kraft, 2017: 3). It is important to remember, however, that the low power of the ADF test is a separate question from whether our recommended procedure produces false positives.
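The power problem LK describe is easy to illustrate. The following is a minimal sketch of our own construction (not LK's simulation code): it applies statsmodels' ADF test to stationary but highly persistent AR(1) series and counts how often the unit-root null is rejected at the 5% level.

```python
# A minimal sketch of the well-known low power of the ADF test:
# stationary but persistent AR(1) data are rarely distinguished
# from a unit root in short samples.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
T, reps, rho = 75, 1000, 0.95  # short, persistent series
rejections = 0
for _ in range(reps):
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + rng.normal()
    if adfuller(y, autolag="AIC")[1] < 0.05:  # element [1] is the p-value
        rejections += 1
print(f"rejection rate: {rejections / reps:.2%}")  # typically well below 50%
```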
What, then, are we to make of LK’s simulation results in their Figure 1(c), which claim to follow the “exact procedures” we advocate and report false-positive rates greater than 5% in 38 out of 60 sets of simulations when the series are, by construction, unrelated? The answer is that LK’s simulations depart from our recommended procedure in several important ways.

Figure 1. The proportion of false positives when the augmented Dickey–Fuller test includes the appropriate number of lags.
Lebo and Kraft’s first oversight arises because they incorrectly used the ADF test’s default of 3 lags in every simulation, regardless of the properties of the simulated series. As the Appendix shows, this default was not appropriate for these data; when the lag length is instead chosen so that the test’s residuals are white noise, the false-positive rate returns to approximately 5%.
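To make this concrete, here is a minimal sketch, in Python with statsmodels, of the kind of lag-selection rule we describe (our own illustrative construction, not replication code): the ADF lag length is increased until a portmanteau (Ljung–Box) test no longer rejects white-noise residuals.

```python
# Choose the ADF lag length by testing the ADF regression residuals
# for white noise, rather than defaulting to a fixed number of lags.
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(size=100))  # a pure random-walk series

def adf_with_white_noise_lags(series, max_lag=8, alpha=0.05):
    """Increase the ADF lag length until a Ljung-Box (Q) test no
    longer rejects white-noise residuals."""
    for lag in range(max_lag + 1):
        # with store=True/regresults=True, adfuller returns
        # (statistic, p-value, critical values, results store)
        stat, pvalue, _, resstore = adfuller(
            series, maxlag=lag, autolag=None, store=True, regresults=True
        )
        lb = acorr_ljungbox(resstore.resols.resid, lags=[10])
        if lb["lb_pvalue"].iloc[0] > alpha:  # residuals look like white noise
            return lag, stat, pvalue
    return max_lag, stat, pvalue

lag, stat, pvalue = adf_with_white_noise_lags(y)
print(f"chosen lags={lag}, ADF statistic={stat:.2f}, p-value={pvalue:.2f}")
```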
Researchers should also be aware, however, that LK skipped two other steps that are necessary to conclude that cointegration is present. First, the statistical significance of the error correction parameter must be evaluated with the appropriate MacKinnon critical values, not conventional t distribution critical values. Second, a significant error correction parameter is not by itself sufficient; researchers must also find evidence of a long-run relationship between the series before concluding that they are cointegrated.
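The sketch below shows what the first of these steps looks like in practice for a bivariate GECM estimated by OLS. It is our illustrative construction, and the −3.2 critical value is approximate: the exact Ericsson–MacKinnon value depends on the sample size and the number of regressors.

```python
# Estimate a bivariate GECM by OLS and compare the t-statistic on the
# error correction parameter with an (approximate) Ericsson-MacKinnon
# critical value rather than the standard t value.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 100
x = np.cumsum(rng.normal(size=T))  # two unrelated random walks
y = np.cumsum(rng.normal(size=T))

dy = np.diff(y)  # dependent variable: change in Y
X = sm.add_constant(np.column_stack([
    y[:-1],       # lagged level of Y: the error correction term
    np.diff(x),   # change in X
    x[:-1],       # lagged level of X
]))
fit = sm.OLS(dy, X).fit()
t_ecm = fit.tvalues[1]  # t-statistic on the lagged level of Y

# Illustrative 5% critical value for a bivariate GECM; the exact
# Ericsson-MacKinnon value depends on T and the number of regressors.
EM_CRIT = -3.2
verdict = "supported" if t_ecm < EM_CRIT else "not supported"
print(f"t = {t_ecm:.2f}; cointegration {verdict}")
```

With unrelated series like these, the t-statistic will frequently look “significant” against the conventional −1.96 threshold but rarely clears the more demanding MacKinnon value, which is precisely why the choice of critical values matters.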
Instead of concerns about potential spurious relationships, LK’s Figures 1(a) and (b) focus on the point estimate of the error correction parameter. On one hand, we want to be careful not to place too much emphasis on this point estimate. Researchers are typically most interested in whether a relationship exists between the variables under study, not in the precise value of the error correction parameter.
Recall that LK estimated a bivariate GECM with two unrelated series with varying data-generating processes. LK’s Figures 1(a) and 1(b) plot the mean value of the ECM parameter (α₁, the coefficient on the lagged dependent variable) across these simulations, which they present as evidence of severe negative bias.
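A compact sketch of this kind of Monte Carlo exercise follows (our construction; LK vary the data-generating processes far more widely). It regresses the change in Y on lagged Y, the change in X, and lagged X for many pairs of unrelated random walks and averages the estimated error correction parameter.

```python
# Average the estimated error correction parameter across many
# bivariate GECMs fit to pairs of unrelated random walks.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T, reps = 100, 1000
alpha_hats = []
for _ in range(reps):
    x = np.cumsum(rng.normal(size=T))
    y = np.cumsum(rng.normal(size=T))
    X = sm.add_constant(np.column_stack([y[:-1], np.diff(x), x[:-1]]))
    alpha_hats.append(sm.OLS(np.diff(y), X).fit().params[1])

# The mean sits below zero even though no relationship exists,
# illustrating the (known) downward bias of this point estimate.
print(f"mean error correction estimate: {np.mean(alpha_hats):.3f}")
```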
Some further points of clarification
We were surprised by the statement, “Enns et al. provide no justification for expanding when these [critical] values should be used — to NI [near integrated] data, FI [fractionally integrated] data, or any other type. Yes, Enns et al.’s advice prevents some spurious findings but that does not mean they are the correct critical values” (Lebo and Kraft, 2017: 6). This statement is misleading for three reasons. First, LK suggest we had no justification for the critical values we used, but as we explained in our original article, the simulation results we presented for near integrated data come directly from GL’s tables G.1–G.5. In other words, we relied on results that they reported based on MacKinnon critical values. Second, we did offer a theoretical justification for using these critical values (see, especially, Enns et al., 2016: 6–7, 9, and note 21). Third, we did not simply show that using these critical values “prevents some spurious findings.” We showed that the false-positive rate was approximately 5 percent or less with these values.
We also disagree with LK’s suggestion that truly cointegrated series will mimic Stock and Watson’s (2011) textbook example of cointegration (reported in LK’s Figure 2). To highlight the importance of the Stock and Watson example, LK reference Lebo’s previous work, stating: “Error correction between variables is a very close relationship that should be obvious in a simple glance at the data” (Lebo and Grant, 2016: 22). We are strong proponents of plotting time series. However, identifying a single textbook example of cointegration and using it as the benchmark for future analyses is overly simplistic. LK could have just as easily pointed to Enders’s (2014) textbook example of three cointegrated series (shown in Figure 2), which appears more similar to the Kelly and Enns data shown in LK’s Figure 4. But relying on Enders’s figure would be equally problematic. The problem, of course, is that the choice of figure – as well as subjective assessments comparing applied data to the chosen figure – involves substantial researcher discretion. To avoid this subjectivity, we conducted simulations that show we would expect to falsely reject the true null hypothesis only about 5% of the time if researchers use the procedure we highlight. Researchers should certainly plot their data, but they should also use systematic statistical tests to evaluate whether cointegration exists.

Figure 2. Enders’s (2014) Figure 6.2, showing his textbook example of three cointegrated series.
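On the testing point: formal cointegration tests are easy to run alongside a plot. The sketch below is our illustration, not the authors' prescribed procedure; it uses the Engle–Granger test available in statsmodels rather than the GECM-based test discussed above.

```python
# Pair a plot of two series with a systematic cointegration test,
# so the conclusion does not rest on how the plotted lines "look".
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=200))
y = 0.8 * x + rng.normal(size=200)  # y shares x's stochastic trend

plt.plot(x, label="x")
plt.plot(y, label="y")
plt.legend()
plt.show()

t_stat, pvalue, crit = coint(y, x)  # Engle-Granger two-step test
print(f"Engle-Granger t = {t_stat:.2f}, p = {pvalue:.3f}")
```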
Advancing methodological debates
We have always been eager to advance our methodological understanding, even when it requires us to reconsider our previous work (e.g., Enns et al., 2014, 2016). However, our experience with this exchange suggests some general insights about how to engage usefully and constructively in a methodological debate.
First, even when the primary debate is a methodological one, existing substantive theory and research should be engaged and treated seriously. For example, not only has a sizeable literature explored – and found – a relationship between public opinion and Supreme Court decisions (e.g., Enns and Wohlfarth, 2013; Epstein and Martin, 2011; Flemming and Wood, 1997; Link, 1995; McGuire and Stimson, 2004; Mishler and Sheehan, 1993, 1996), but GL themselves found such a relationship using their own preferred methods.
Second, to advance the methods literature, it is most helpful to build a positive case for a new method, or a broader application of an existing one. In this instance, we think it would be extremely beneficial to make a positive case for the FI techniques advocated by GL and LG. We would be very interested in further incorporating FI techniques into our research, but as we pointed out in our previous article, we believe three aspects of FI still need to be tackled. First among these are concerns with estimating the FI parameter, d, in the relatively short time series common in political science research.
The heart of time series methodology involves balancing the many tradeoffs inherent in applied modeling to minimize errors and avoid incorrect inferences when testing substantive theory. Although we share Lebo and his coauthors’ concern that research continues to be published in top political science journals that uses the GECM incorrectly because MacKinnon critical values are ignored, we have shown that after correcting the errors in Lebo and Kraft (2017), their simulations reaffirm the conclusions of Enns et al. (2016). While care must certainly be taken, a fairly straightforward procedure can protect applied time-series researchers against false positives when attempting to estimate relationships with many types of data that are common in social science research.
Appendix. Rate of false positives when the general error correction model (GECM) is estimated correctly
The following figure shows that when the appropriate lag lengths are used in the augmented Dickey–Fuller (ADF) test and the GECM is implemented correctly, the rate of false positives (i.e., the rate of finding a statistically significant relationship between two unrelated series) is approximately 5% or less.
For the results reported in Figures 1(b) and A1, we selected the number of lags to include in the ADF test based on the portmanteau (Q) test for white-noise residuals. As Figure A2 shows, virtually the same results emerge if we instead select the number of lags with a Breusch–Godfrey test for serial correlation. Both tests indicate that LK’s decision to rely on a default of 3 lags for the ADF test was not appropriate for these simulations.
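For readers who wish to replicate the flavor of this robustness check, the sketch below (our own construction, not the simulation code behind Figures A1 and A2) selects the ADF lag length twice: once with a Ljung–Box (Q) test and once with a Breusch–Godfrey test.

```python
# Select the ADF lag length with two different residual diagnostics
# and compare the chosen lags.
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox, acorr_breusch_godfrey

def white_noise_lag(series, pval_of, max_lag=8, alpha=0.05):
    """Increase the ADF lag length until the supplied diagnostic no
    longer rejects white-noise residuals (p-value above alpha)."""
    for lag in range(max_lag + 1):
        *_, resstore = adfuller(series, maxlag=lag, autolag=None,
                                store=True, regresults=True)
        if pval_of(resstore.resols) > alpha:
            return lag
    return max_lag

# Diagnostic 1: portmanteau (Ljung-Box) Q test on the residuals.
q_test = lambda res: acorr_ljungbox(res.resid, lags=[10])["lb_pvalue"].iloc[0]
# Diagnostic 2: Breusch-Godfrey LM test on the ADF regression results.
bg_test = lambda res: acorr_breusch_godfrey(res, nlags=4)[1]

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=100))
print("Q-test lag:", white_noise_lag(y, q_test),
      "| Breusch-Godfrey lag:", white_noise_lag(y, bg_test))
```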
Declaration of Conflicting Interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
Carnegie Corporation of New York Grant
This publication was made possible (in part) by a grant from Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.
