1 Introduction
Since Sims (1980) popularized it, the vector autoregression (VAR) model has proven to be one of the most successful models for analyzing economic (and noneconomic) time-series data, offering descriptive analysis of the dynamics, forecasting, structural inference, and policy analysis. However, because a many-variables, higher-order VAR tends to be overparameterized—yielding weak inference—a general-to-specific (GETS) approach has been suggested to improve the inference after VAR (Campos, Ericsson, and Hendry 2005). Moreover, Asali, Abu-Qarn, and Beenstock (2017) provided additional estimators of long-run (LR) and steady-state effects that can be calculated after the VAR or the GETS VAR. Asali and Gurashvili (Forthcoming) use this framework to study the relationship between discrimination in the labor market and the macroeconomy.
To facilitate this type of analysis, from estimating the original balanced VAR to its parsimonious GETS version, and inference after each model, I developed the command vgets, which standardizes the steps in this type of analysis and makes the process easier and less prone to calculation mistakes. The vgets command estimates the full-specification VAR, arrives at its best-reduced version (GETS), tests for Granger causality of each variable in the system, and provides estimates of the LR and the cumulative impulse–response (CIR) effects of each variable for both specifications (the full and the GETS). It also provides additional diagnostics and tests that can facilitate genuine-causality interpretation of the Granger causality tests. The time saved by running these analyses with vgets, rather than carrying them out manually, is substantial.
Jaeger and Paserman (2008) (henceforth, JP) used the VAR framework to analyze the cycle of violence in the Israeli–Palestinian conflict. Asali, Abu-Qarn, and Beenstock (2017) (henceforth, AAB), using the same JP data but adopting GETS specifications and proposing new steady-state estimators (such as the LR kill ratio and the CIR), overturned the previous findings. In this article, I use the JP and AAB studies as examples of applying the vgets command.
2 Statistical background: VAR, GETS VAR, Granger causality, and LR effects
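To illustrate the method and test statistics, we use the simplest two-variable VAR system

$$
\begin{aligned}
x_t &= c_1 + \sum_{j=1}^{L_{x1}} \alpha_{1j}\, x_{t-j} + \sum_{j=1}^{L_{y1}} \beta_{1j}\, y_{t-j} + z_t'\gamma_1 + u_{1t} \\
y_t &= c_2 + \sum_{j=1}^{L_{x2}} \alpha_{2j}\, x_{t-j} + \sum_{j=1}^{L_{y2}} \beta_{2j}\, y_{t-j} + z_t'\gamma_2 + u_{2t}
\end{aligned}
$$

where z is a vector of exogenous control variables. The full specification refers to the balanced VAR, with the same lag length for all the main variables. The GETS VAR is the system we get from eliminating some of the insignificant lagged variables in each of the system equations; this is the so-called near-VAR system. In this case, the GETS VAR does not have to be balanced, and the lag length with respect to each variable (that is, $L_{x1}$, $L_{y1}$, $L_{x2}$, $L_{y2}$) can be different. We then define the short-run (SR) effect of y on x as the sum of the coefficients of all the lagged y variables in the x equation; namely,

$$
\beta_1 = \sum_{j=1}^{L_{y1}} \beta_{1j}
$$

Likewise, we can define $\alpha_1 = \sum_{j} \alpha_{1j}$, $\alpha_2 = \sum_{j} \alpha_{2j}$, and $\beta_2 = \sum_{j} \beta_{2j}$. These effects are grouped in the matrix A:

$$
A = \begin{pmatrix} \alpha_1 & \beta_1 \\ \alpha_2 & \beta_2 \end{pmatrix}
$$

The LR effect of one variable on the other is calculated by solving the dependent variable's equation for the LR ratio between the variables, with the SR coefficients summed over all the lags. The LR effect of y on x from the first equation (which is the LR x/y ratio) is given by

$$
LR_{xy} = \frac{\beta_1}{1 - \alpha_1}
$$

Likewise, the LR y/x ratio from the second equation is given by

$$
LR_{yx} = \frac{\alpha_2}{1 - \beta_2}
$$

Finally, the CIR is defined as the relevant element in the matrix:

$$
(I - A)^{-1}
$$

For the 2 × 2 case above, for example, the CIR of y in the x equation is given by

$$
CIR_{xy} = \frac{\beta_1}{(1 - \alpha_1)(1 - \beta_2) - \alpha_2 \beta_1}
$$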
Statistical inference is then carried out for the effects in question (SR, LR, and CIR): the standard errors of these estimates are calculated using the delta method.
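To make the delta-method step concrete, the following is a sketch for the LR effect of y on x in the two-variable case. Here $\alpha_1$ and $\beta_1$ denote the sums of the lagged-x and lagged-y coefficients in the x equation, and $\hat\Sigma$ (a symbol introduced here for illustration) is the estimated 2 × 2 covariance matrix of $(\hat\alpha_1, \hat\beta_1)$, itself obtained by summing the relevant blocks of the coefficient covariance matrix. This is the textbook first-order approximation; the exact computation inside vgets may be organized differently.

```latex
% Delta-method sketch for the LR effect of y on x (two-variable case)
\[
\widehat{LR}_{xy} = g(\hat\alpha_1, \hat\beta_1) = \frac{\hat\beta_1}{1 - \hat\alpha_1},
\qquad
\nabla g =
\begin{pmatrix}
\partial g / \partial \alpha_1 \\[2pt]
\partial g / \partial \beta_1
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\beta_1}{(1 - \alpha_1)^2} \\[8pt]
\dfrac{1}{1 - \alpha_1}
\end{pmatrix}
\]
\[
\widehat{\mathrm{Var}}\bigl(\widehat{LR}_{xy}\bigr)
\approx \nabla g' \, \hat\Sigma \, \nabla g ,
\qquad
\mathrm{se}\bigl(\widehat{LR}_{xy}\bigr)
= \sqrt{\nabla g' \, \hat\Sigma \, \nabla g}
\]
```

The same logic applies to each CIR element, except that the gradient is taken with respect to every entry of A, which is why the calculation grows quickly with the number of variables.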
The extension to higher-order VARs is straightforward. While estimates of the LR and CIR effects are always provided for any n-variable VAR, statistical inference is limited to up-to-four-variables VAR systems because calculating the standard errors for the different CIR elements in many-variables (≥ 5) VARs is computationally prohibitive.
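As a numerical illustration of the definitions in this section, the sketch below computes the LR effects and the CIR matrix from a matrix of SR effects using only built-in Stata matrix commands. The values in A are hypothetical and chosen for easy arithmetic; they are not estimates from any dataset, and the scalar and matrix names are illustrative.

```stata
* Hypothetical SR effects: row = equation (x, y), column = variable (x, y)
* alpha1 = 0.2, beta1 = 0.4, alpha2 = 0.1, beta2 = 0.3 (illustrative only)
matrix A = (0.2, 0.4 \ 0.1, 0.3)

* LR effect of y on x from the x equation: beta1/(1 - alpha1)
scalar LRxy = A[1,2]/(1 - A[1,1])

* LR effect of x on y from the y equation: alpha2/(1 - beta2)
scalar LRyx = A[2,1]/(1 - A[2,2])

* CIR matrix: inverse of (I - A); element [1,2] is the CIR of y in the x equation
matrix CIR = inv(I(2) - A)

display "LR effect of y on x: " %6.3f LRxy
display "LR effect of x on y: " %6.3f LRyx
matrix list CIR, format(%6.3f)
```

With these values, the LR effect of y on x is 0.4/0.8 = 0.5, and the CIR of y in the x equation is 0.4/0.52, or about 0.769, which exceeds the LR effect because it accounts for the feedback through the y equation. For an n-variable system, the same calculation applies with I(n) in place of I(2).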
3 Automatic inference after VAR: vgets
The above steps can, in principle, be applied manually, but that is extremely time consuming and tedious, and the process is prone to many subtle errors of execution that are difficult to discover or fix retroactively. The command vgets is aimed at addressing this need. In addition, it offers diagnostic checks that are essential for the interpretation of results and the validity of statistical inference after the VAR estimation. The results of these checks, for example, the absence of serial correlation in the system errors, can give the Granger causality results a genuine causality interpretation (for example, refer to the AAB article).
3.1 Syntax
vgets varlist [if] [, maxlag(#) t(#) exog(varlist) diagnostics quick nois format(%fmt)]
3.2 Description
vgets estimates a balanced VAR of the listed variables, with lags of each variable from 1 to maxlag(). It also estimates the GETS VAR system of the given variables, using a stepwise elimination procedure where the omitted lagged variables are those whose coefficients have t statistics that are, in absolute value, below the level specified in the t() option. The command then reports, for both the full specification and the GETS specification, Granger causality tests of the different variables, the LR effects (or ratios) of the different variables, and the CIRs of the different variables. The exogenous variables listed in the exog() option are not subject to elimination in the GETS procedure.
While the point estimates of the effects are reported for any set of main variables, statistical inference (that is, robust standard errors and p-values) is calculated and reported only for systems of two, three, or four variables.
3.3 Options
maxlag(#) specifies the maximum lag length of the main variables in the balanced VAR. The default is maxlag(1).
t(#) specifies the maximum level of the t statistic at which lagged variables will be dropped in the GETS procedure. Guided by the Haitovsky (1969) rule, the default is t(1): lagged variables with absolute t statistic below 1 will be dropped iteratively to reach the system GETS final specification.
exog(varlist) specifies the list of exogenous or control variables that are not subject to elimination in the GETS procedure (that is, maintained regardless of their t statistic). These variables are included in only their current (time t) values and are not subject to any lag structure.
diagnostics reports diagnostic measures of the residuals in each equation. In particular, it reports robust Breusch–Godfrey tests of serial correlation (up to the sixth order), Jarque–Bera asymptotic tests for normality of the residuals in each equation of the system, skewness and kurtosis of the respective residuals set, and Akaike information criterion and Bayesian information criterion for model selection.
quick shows only the effects (SR, LR, CIR) matrices, without statistical inference (standard errors, p-values, tests for Granger causality). This is the default and only option if the main set of variables includes five or more variables.
nois shows all the underlying regressions, tests, and intermediate steps.
format(%fmt) specifies the display format for coefficients, effects, standard errors, and p-values.
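The options can be combined as in the following sketch. It assumes that vgets is installed; the variables x, y, and z1 are simulated here purely for illustration, and declaring the data with tsset is a precaution rather than a documented requirement of the command.

```stata
* Simulated two-variable system with one exogenous control (illustration only)
clear
set seed 2020
set obs 300
generate t = _n
tsset t
generate z1 = rnormal()
generate x = rnormal()
generate y = rnormal()
forvalues i = 2/300 {
    quietly replace x = 0.3*x[`i'-1] + 0.4*y[`i'-1] + 0.5*z1 + rnormal() in `i'
    quietly replace y = 0.2*x[`i'-1] + 0.1*y[`i'-1] + rnormal() in `i'
}

* Balanced VAR with 4 lags; GETS drops lags with |t| < 1 (the default threshold)
vgets x y, maxlag(4) exog(z1)

* Stricter elimination threshold, residual diagnostics, and a display format
vgets x y, maxlag(4) t(2) exog(z1) diagnostics format(%9.3f)

* Effects matrices only (SR, LR, CIR), without statistical inference
vgets x y, maxlag(4) exog(z1) quick
```

Raising t() makes the GETS step more aggressive, so comparing the default t(1) run with a stricter one is a simple way to see how sensitive the reduced specification is to the elimination threshold.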
3.4 Stored results
The command stores the results in r(). The SR effects are stored in the r(SR) matrix for the full specification and in the r(getsSR) matrix for the GETS specification. The LR effects (ratios) are stored in the r(LR) matrix for the full specification and in the r(getsLR) matrix for the GETS specification. Likewise, the CIR matrix is stored under the name r(CIR) for the full specification and r(getsCIR) for the GETS specification. If Granger causality tests are carried out, then the respective test statistics and p-values will be stored under the scalars r(chiXY) and r(pchiXY), respectively; XY refers to the order of the variables in the list varlist, that is, the effect of variable number Y on variable number X. So, if we type vgets var1 var2 var3, then r(chi32) will represent the test statistic of Granger causality of var2 on var3. For the GETS specification, these results are stored under the names r(GchiXY) and r(GpchiXY).
Similarly, the effects, their standard errors, and their p-values for the LR effects and the CIRs will be stored under the scalars r(LRXY), r(LRseXY), r(LRpXY), r(CIRXY), r(CIRseXY), and r(CIRpXY) for the full specification. For the GETS specification, the respective scalars are r(GLRXY), r(GLRseXY), r(GLRpXY), r(GCIRXY), r(GCIRseXY), and r(GCIRpXY). The matrix r(TABLE) summarizes the results for all equations in the full specification: Granger causality, LR effect, and CIR of each variable for each equation. The GETS specification results are stored in the matrix r(getsTABLE).
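A minimal sketch of retrieving these results follows. It assumes that vgets is installed; the three series are simulated for illustration only, and the names LRfull, CIRgets, gc32, gc32p, glr32, and glr32se are arbitrary. The r() names themselves follow the naming scheme described above.

```stata
* Simulated three-variable data (illustration only, not from the article)
clear
set seed 101
set obs 400
generate t = _n
tsset t
generate var1 = rnormal()
replace var1 = 0.5*var1[_n-1] + rnormal() in 2/l
generate var2 = cond(_n == 1, rnormal(), 0.4*var1[_n-1] + rnormal())
generate var3 = cond(_n == 1, rnormal(), 0.3*var2[_n-1] + rnormal())

vgets var1 var2 var3, maxlag(2)

* Copy r() results right away; later r-class commands overwrite them
matrix LRfull  = r(LR)
matrix CIRgets = r(getsCIR)
* Granger causality of var2 on var3 (X = 3, Y = 2), full specification
scalar gc32  = r(chi32)
scalar gc32p = r(pchi32)
* GETS LR effect of var2 on var3 and its standard error
scalar glr32   = r(GLR32)
scalar glr32se = r(GLRse32)

matrix list LRfull
matrix list CIRgets
display "GC of var2 on var3: chi2 = " gc32 ", p-value = " gc32p
display "GETS LR effect of var2 on var3: " glr32 " (se " glr32se ")"
```

Typing return list immediately after vgets shows everything the command left behind, which is a quick way to confirm the exact names before using them in later calculations.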
3.5 Example
JP provided an interesting use of balanced VARs to study the cycle of violence in the Middle East. Their conclusion, regarding the absence of a cycle, however, has been overturned in AAB, who suggested the use of GETS VAR because of the overparameterization in the full, balanced specification.
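In the Israeli–Palestinian conflict, the numbers of Palestinian and Israeli fatalities on day t are, respectively, $pal_t$ and $isr_t$. The full specification of the system, with common lag length L, is given by

$$
\begin{aligned}
pal_t &= c_1 + \sum_{j=1}^{L} \alpha_{1j}\, pal_{t-j} + \sum_{j=1}^{L} \beta_{1j}\, isr_{t-j} + z_t'\gamma_1 + u_{1t} \\
isr_t &= c_2 + \sum_{j=1}^{L} \alpha_{2j}\, pal_{t-j} + \sum_{j=1}^{L} \beta_{2j}\, isr_{t-j} + z_t'\gamma_2 + u_{2t}
\end{aligned}
$$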
where z is a vector of exogenous control variables that include dummy variables indicating the weekday, dummy variables indicating the period in the Israeli–Palestinian context (for example, Oslo accords, Barak government, Sharon government, pre- and post-September 11), and the length of the barrier built between Israel and the West Bank.
The first equation of the VAR is called the “Israeli reaction function”, and the second equation is called the “Palestinian reaction function”. Israel “reacts to violence” if the lags of Israeli fatalities in the Israeli reaction function Granger-cause Palestinian fatalities. Likewise, the Palestinian reaction is defined from the second equation.
The main findings of JP and AAB can be easily replicated with the vgets command. For example, reproducing columns 2 and 4 of table 1 from JP, as well as columns (1) and (3) in both panels of table 1 from AAB, is carried out as follows:
. use intifada_extended_data
. vgets pal_tot isr_tot if date>=14882 & date<=16451, maxlag(14) t(1)
> exog(Period2 Period3 Period4 Period5 Period6 Period7 completed sunday monday
> tuesday wednesday thursday friday) format(%9.3f)
To see the underlying regressions and statistical tests, add the option nois. While JP refer only to the full specification, AAB refer also to the GETS specification. The vgets command reports both:
GC refers to Granger causality, so in the full specification, own (Israeli) fatalities Granger-cause Palestinian fatalities; that is, Israel reacts to violence, significant at the 5% level (pv = 0.035), while Palestinians do not react to violence (pv = 0.208). However, once the GETS (AAB) specification is considered, we see that both sides react, with respective p-values of 0.011 and 0.064. The slight difference between the reported p-values and those appearing in the original studies arises because the vgets command uses the more efficient system-estimation approach (seemingly unrelated estimation) rather than equation-by-equation ordinary least squares.
LR refers to the LR effect. In the current context, the LR effect from the Israeli reaction function is the number of Palestinian fatalities inflicted by Israel per Israeli fatality. In the full specification, this is 1.32, and in the GETS specification, this is 1.23; both are highly statistically significant.
Finally, CIR refers to the cumulative impulse–response, which is the response of one side to the violence of the other, accounting for the other’s response. These are also statistically significant and valued at 1.615 and 1.523 for the Israeli reaction function in the full and GETS specifications, respectively. (For the Palestinian reaction functions, the respective estimates are 0.179 and 0.158.)
Adding the option diagnostics at the end of the command reports all the diagnostic tests that AAB suggested for testing the performance and interpretation of the VAR in these contexts, like serial correlation tests of the system residuals, normality tests of the system residuals, and information criteria for each model. This pertains to the upper panel of table 1 from AAB. For example, for the full specification of the Israeli reaction function, the diagnostics output will be
The absence of serial correlation in the residuals of the first equation, the long specification of the Israeli reaction function, renders the lagged variables weakly exogenous and therefore suggests that the Granger causality (from own Israeli fatalities to Palestinian fatalities) that we found earlier ($\chi^2$ = 24.95, p-value = 0.035) can be interpreted as genuine causality. The same can be said about the Palestinian reaction function.