Abstract
In this article, we introduce a new community-contributed command called xtbunitroot.
1 Introduction
A fundamental property of time-series data is stationarity. If a time series is stationary, then all the standard tools of statistical analysis can be applied; the series can be used in regression models, its mean and variance estimated, and its future forecasted. Frequently, though, time series are nonstationary, or unit-root processes as they are called, and in this case their statistical analysis is different. If nonstationarity is not accounted for, many problems arise for inference and prediction, the most notable of which is that of spurious regression; see, for example, Granger and Newbold (1974). Spurious regression leads to a high R² and statistically significant coefficients when in fact there might be no relationship between the series of interest. Therefore, determining whether a series is stationary should be one of the first steps in any time-series or panel-data analysis (panel data consist of multiple time series). Unit-root tests are statistical hypothesis tests used to infer whether a series is a unit-root process or a stationary process; see, for example, Dickey and Fuller (1979).
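The spurious-regression problem can be illustrated with a small simulation. The sketch below is not part of the original analysis; the function names, sample size, and number of replications are illustrative assumptions. It regresses one random walk on another, independent random walk and compares the average R² with the one obtained after first-differencing both series.

```python
import random

def random_walk(n, rng):
    """Cumulative sum of i.i.d. N(0, 1) shocks, that is, a unit-root process."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def r_squared(x, y):
    """R-squared from an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def average_r2(levels, reps=200, n=200, seed=12345):
    """Average R-squared over `reps` pairs of independent random walks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x, y = random_walk(n, rng), random_walk(n, rng)
        if not levels:
            # First differences of a random walk are stationary white noise.
            x = [b - a for a, b in zip(x, x[1:])]
            y = [b - a for a, b in zip(y, y[1:])]
        total += r_squared(x, y)
    return total / reps

r2_levels = average_r2(levels=True)   # unrelated unit-root series in levels
r2_diffs = average_r2(levels=False)   # the same series after differencing
print(round(r2_levels, 3), round(r2_diffs, 3))
```

Although the two series are unrelated by construction, the regression in levels typically produces a sizeable average R², whereas the regression in first differences produces one close to zero.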
In a seminal article, Perron (1989) demonstrated that structural breaks can adversely impact the behavior of unit-root tests. Structural breaks are shocks that are exogenous to the model but have a lasting effect because they change the model parameters. Structural breaks occur when a system is hit by large-scale phenomena; in economics, for example, such events include wars, policy changes, a pandemic like COVID-19, or a financial crisis. These shocks can make stationary series look as if they are nonstationary and therefore lead unit-root tests to fail to reject the null hypothesis of nonstationarity when it is false. To deal with this problem, Perron (1989) proposed new unit-root tests that allow for a structural break in the constant and trend of the series. Perron’s approach, however, assumed that the date of the structural break is known to the researcher. Zivot and Andrews (1992) and Banerjee, Lumsdaine, and Stock (1992) extended Perron (1989) by allowing the date of the break to be endogenously determined by the data, and Lumsdaine and Papell (1997) further extended Zivot and Andrews (1992) to the case of two structural breaks at unknown dates.
Structural breaks can affect panel (longitudinal) data unit-root tests in the same way. In response, Karavias and Tzavalis (2014) proposed panel-data unit-root tests that allow for structural breaks in the intercepts of the series or in both the intercepts and linear trends. The break dates are assumed to be common for all series, but the magnitude of the break can differ across series. The null hypothesis is the same as in Zivot and Andrews (1992) and Lumsdaine and Papell (1997); under the null, the panel series are assumed to constitute unit-root processes without breaks, while under the alternative, they are stationary around breaking means or breaking means and trends.
The Karavias and Tzavalis (2014) tests are widely applicable and possess some unique optimality properties, as has been shown in Karavias and Tzavalis (2017, 2019). In terms of applicability, they can be used in both small- and large-T settings, where T is the number of time-series observations. They allow for multiple common breaks, and the dates of the breaks can be known or unknown. In the latter case, they can be endogenously determined from the data. The errors can be nonnormal and have cross-sectional heteroskedasticity and dependence. Finally, the autoregressive coefficients under the alternative can be homogeneous or heterogeneous, which means that either they can be the same for all units or they can differ between units. In terms of their optimality properties, the tests are invariant under the null to the initial condition, which means that no assumptions on the first observations are necessary, as is the case in other fixed-T tests. Furthermore, the tests are invariant to the coefficients of the deterministic components, and are powerful in the presence of linear trends.
This article introduces
The breaks can affect the intercepts only or the intercepts and the linear trends of the series. If the dates of the breaks are unknown, then a bootstrap procedure described in Karavias and Tzavalis (2019) is used to derive the critical value and the p-value of the test. Other options allow for cross-section heteroskedasticity, cross-section dependence as in O’Connell (1998), and normal errors. Unbalanced panels are also supported.
The
The remainder of the article is organized as follows. In section 2, we present the panel unit-root tests as developed by Karavias and Tzavalis (2014) and their extension to two breaks along the lines of Karavias and Tzavalis (2019). This section also provides some Monte Carlo simulations on the behavior of the tests with two breaks. The simulations have been done using the
2 Panel unit-root tests with structural breaks
2.1 One break
For panels with N cross-section units, T time-series observations, and one common break, Karavias and Tzavalis (2014) give two models. The first model can be used to test the null hypothesis of a random walk against the alternative hypothesis of a stationary series with a break in the intercepts (means) of the series,
where i = 1,…, N and t = 1,…, T. In the above model, φ is the autoregressive parameter, and a1,i and a2,i are the fixed effects before and after the break, which happens on date b. The notation I(·) denotes the indicator function.
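For reference, a specification consistent with the definitions above can be written as follows. This is a reconstruction under stated assumptions rather than a quotation: the (1 − φ) scaling of the intercepts and the convention that date b belongs to the first regime are assumptions, chosen so that the model reduces to a pure random walk without breaks when φ = 1.

```latex
y_{i,t} = \varphi\, y_{i,t-1}
        + (1-\varphi)\bigl\{ a_{1,i}\, I(t \le b) + a_{2,i}\, I(t > b) \bigr\}
        + u_{i,t},
\qquad i = 1,\dots,N, \quad t = 1,\dots,T
```

Setting φ = 1 removes the intercept terms and leaves y_{i,t} = y_{i,t−1} + u_{i,t}, which matches the null hypothesis of a random walk described in the text.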
The second model tests the null hypothesis of a random walk with drift against the alternative of a trend-stationary panel process with a break in the intercepts and linear trends at time b:
and
In the above formulation, βi is the drift under the null hypothesis, while β1,i and β2,i are the trend coefficients under the alternative hypothesis.
We will henceforth denote by M1 the model with intercepts (1) and by M2 the model with both intercepts and trends (2). For M1, the break is allowed to be in I1 = {1, 2,…, T − 1}, and for M2 the break is allowed to be in I2 = {2,…, T − 2}.
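The admissible break dates for the two models can be enumerated directly. The helper below is a hypothetical illustration (the name `admissible_breaks` is not part of any package); it only encodes the sets I1 and I2 stated above.

```python
def admissible_breaks(T, model):
    """Return the admissible single-break dates for model 'M1' or 'M2'."""
    if model == "M1":
        # I1 = {1, 2, ..., T - 1}
        return list(range(1, T))
    if model == "M2":
        # I2 = {2, ..., T - 2}
        return list(range(2, T - 1))
    raise ValueError("model must be 'M1' or 'M2'")

breaks_m1 = admissible_breaks(10, "M1")
breaks_m2 = admissible_breaks(10, "M2")
print(breaks_m1)
print(breaks_m2)
```

With T = 10, M1 admits nine candidate dates and M2 admits seven, because the trend model excludes the first date and the last two dates of the sample.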
The alternative hypothesis is homogeneous across different individuals, but Karavias and Tzavalis (2016) have shown that the test has power against heterogeneous alternatives as well, when φi ≠ φj and φi, φj < 1 for i, j = 1,…, N and i ≠ j. Furthermore, Juodis, Karavias, and Sarafidis (2021) argue that pooled estimators can lead to power gains as opposed to mean-group-type estimators like those of Im, Pesaran, and Shin (2003).
Karavias and Tzavalis (2014) propose estimating the autoregressive parameter φ with the following pooled least-squares estimator,
where $y_i = (y_{i,1}, \ldots, y_{i,T})'$ and $y_{i,-1} = (y_{i,0}, \ldots, y_{i,T-1})'$ are $T \times 1$ vectors. The orthogonal projection matrix
And
The superscripts b in
The estimator
where Bb is the bias correction and Cb is the variance of the bias-corrected (modified) estimator. The parameters ku and
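To deal with cross-sectional dependence in the error terms, one popular approach is to demean the series across i by subtracting the cross-sectional average at each t before conducting the test (see, for example, O’Connell [1998]); that is, the test is applied to $y_{i,t} - \bar{y}_t$,
where $\bar{y}_t = N^{-1} \sum_{i=1}^{N} y_{i,t}$ is the cross-sectional average at time t.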
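The cross-sectional demeaning step can be sketched as follows. This is a minimal illustration that assumes a balanced panel stored as a list of unit-level series; the function name `demean_cross_section` is hypothetical and not part of any package.

```python
def demean_cross_section(panel):
    """Subtract the cross-sectional average at each date from every unit.

    `panel[i][t]` holds the observation for unit i at date t; all units are
    assumed to have the same number of observations (balanced panel).
    """
    n_units = len(panel)
    n_periods = len(panel[0])
    # Average across units, separately for each date t.
    means = [sum(panel[i][t] for i in range(n_units)) / n_units
             for t in range(n_periods)]
    return [[panel[i][t] - means[t] for t in range(n_periods)]
            for i in range(n_units)]

example = [[1.0, 2.0, 3.0],
           [3.0, 4.0, 7.0]]
demeaned = demean_cross_section(example)
print(demeaned)
```

After the transformation, the demeaned series sum to zero across units at every date, which removes any component that is common to all units at that date.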
If the date of the break is unknown, Karavias and Tzavalis (2014) follow Zivot and Andrews (1992) and propose the following statistic to test for unit roots:
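A statistic of the Zivot and Andrews (1992) type takes the most negative value of the known-break statistic over all admissible break dates. Under that assumption, and writing Z(b) for the statistic computed with a break at date b, it can be sketched as below; the symbol z and the index j for the model are notational assumptions, not quotations.

```latex
z = \min_{b \in I_j} Z(b), \qquad j = 1, 2
```

Here I_1 and I_2 are the sets of admissible break dates for M1 and M2, respectively.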
The limiting distribution of this statistic is shown to depend on the time dimension T. Following Karavias and Tzavalis (2019), the
The
2.2 Extension to two-break case
The alternative hypotheses in M1 and M2 can be extended to allow for two breaks. In this case, (1) and (2) become
and
The pooled least-squares estimator is based on the
The vector τt is defined as
The distributions of the test statistics are as before; Z(b1, b2) is standard normal when b1 and b2 are known, and
can be used when the dates of the breaks are unknown. Its limiting distribution can be calculated using the bootstrap. Notice that b1 and b2 cannot be in two consecutive dates for M2, which is the same situation as in a single time series; see, for example, Lumsdaine and Papell (1997). The
To provide some evidence on the behavior of the test statistic with two breaks, we conduct a small Monte Carlo experiment. Consider the data-generation process in (1) and (2) under the null and (3) and (4) under the alternative. We assume that the error terms ui,t are independent and identically distributed N(0, 1). The panel-data series yi,t is generated with autoregressive coefficient φ = 1 to investigate the size at the 5% significance level and φ ∈ {0.90, 0.95} to investigate the power of the test. Because these values of φ are very close to 1, it will be hard to reject the null under H1: φ < 1. The initial values yi,0 are generated as independent and identically distributed N(0, 1) draws. For the M1 and M2 models, we generate the intercept and trend coefficients as follows: a1,i ∼ U(−0.5, 0), a2,i ∼ U(0, 0.5), and a3,i ∼ U(0.5, 1). The slope coefficients for the linear trends are generated as β1,i ∼ U(0, 0.025), β2,i ∼ U(0.025, 0.05), and β3,i ∼ U(0.05, 0.075). These parameter choices follow Karavias and Tzavalis (2014) and are made to make it harder to reject the null H0: φ = 1. We set the break dates to the following fractions of the sample: λ1 = 0.25 and λ2 = 0.75. The number of bootstrap replications is set to 100, which is the default in xtbunitroot.
The results of the exercise appear in table 1. We can see that for both known and unknown breaks, the size of the tests is always very close to its nominal 5% level, and the power is satisfactory and increasing with both N and T.
Table 1. Size and power of the tests for two known and unknown breaks
Notes: The reported values are rejection probabilities. For φ = 1, the reported rejection rates give the size of the test, while for φ < 1, they give the power of the test. The M1 model includes intercepts, while the M2 model includes both intercepts and linear trends. The dates of the breaks are b1 = ⌊0.25T⌋, b2 = ⌊0.75T⌋.
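The data-generating process of the Monte Carlo experiment for the intercepts-only model can be sketched as follows. This is an illustration under stated assumptions, not the code used to produce table 1: the intercepts are assumed to enter scaled by (1 − φ), date b1 is assumed to belong to the first regime and date b2 to the second, and the function name `simulate_m1_two_breaks` is hypothetical. The parameter distributions and break fractions are those given in the text.

```python
import math
import random

def simulate_m1_two_breaks(N, T, phi, seed=0):
    """Simulate an N x (T + 1) panel from a two-break intercepts-only model.

    Assumed form: y[t] = phi * y[t-1] + (1 - phi) * a[regime(t)] + u[t],
    with breaks at b1 = floor(0.25 T) and b2 = floor(0.75 T).
    """
    rng = random.Random(seed)
    b1, b2 = math.floor(0.25 * T), math.floor(0.75 * T)
    panel = []
    for _ in range(N):
        # Regime-specific intercepts: a1 ~ U(-0.5, 0), a2 ~ U(0, 0.5), a3 ~ U(0.5, 1).
        a = (rng.uniform(-0.5, 0.0), rng.uniform(0.0, 0.5), rng.uniform(0.5, 1.0))
        y = [rng.gauss(0.0, 1.0)]  # initial value y_{i,0} ~ N(0, 1)
        for t in range(1, T + 1):
            regime = 0 if t <= b1 else (1 if t <= b2 else 2)
            u = rng.gauss(0.0, 1.0)  # i.i.d. N(0, 1) error
            y.append(phi * y[-1] + (1.0 - phi) * a[regime] + u)
        panel.append(y)
    return panel

null_panel = simulate_m1_two_breaks(N=50, T=10, phi=1.0, seed=1)   # size
alt_panel = simulate_m1_two_breaks(N=50, T=10, phi=0.90, seed=1)   # power
print(len(null_panel), len(null_panel[0]))
```

Under φ = 1 the intercept terms drop out and each series is a pure random walk, so the breaks only matter under the alternative. Repeating the simulation many times and recording how often a test rejects would give rejection rates of the kind reported in table 1.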
3 The xtbunitroot command
3.1 Syntax
where varname is the variable to be tested for nonstationarity. You must
3.2 Options
3.3 Stored results
4 Example
4.1 Unit-root tests for bank balance sheet variables
Unit-root processes were originally important for macroeconomic variables. However, as Holtz-Eakin, Newey, and Rosen (1988) argue, such dynamic relationships can appear in other economic variables for which long time series may not be available. In this case, the panel dimension of the data can be used for inference.
In this section, we focus on bank balance sheet variables taken from the call reports of the Federal Deposit Insurance Corporation. This is a very popular dataset; see, for example, Kripfganz and Sarafidis (2021) and Juodis, Karavias, and Sarafidis (2021) for some recent applications. However, while stationarity of the bank balance sheet variables is frequently assumed, it is almost never tested. The contribution of this section is to examine the stationarity of four variables of interest, namely, returns on assets, returns on equity, total assets, and noninterest income. The returns on assets (
We collect data for a random sample of 500 banks from the third quarter of 2018 to the fourth quarter of 2020. This period includes the COVID-19 pandemic, which may have caused breaks in the intercepts and trends of the series. The short dimension of the data is chosen so that our sample of banks does not suffer from survivorship bias. The data are publicly available, and they have been downloaded from the Federal Deposit Insurance Corporation website.
In the following, we perform panel unit-root tests allowing for structural breaks in the aforementioned variables. We assume that our sample is large enough so that the idiosyncratic errors ui,t are normally distributed. The errors can be cross-sectionally heteroskedastic, but under the normality assumption, the tests are invariant to heteroskedasticity. Finally, we assume error cross-section dependence, caused possibly by the monetary policy rate.
We start the analysis with the
The output above indicates that the Z(b) statistic is equal to −15.521, which is far less than the critical value of −1.645; therefore, we can reject the null hypothesis of nonstationarity. The break under the alternative takes place in observation 7, which corresponds to the first quarter of 2020. The output also reports the result “the null is rejected”, which is the outcome at the 5% significance level. The following output presents the results for
We can see from the above output that the
For the remaining three variables, we will implement the tests being agnostic about the date of the break. For
As we can see from the above outputs,
5 Concluding remarks
This article introduced a new community-contributed command, xtbunitroot.
6 Programs and supplemental materials
Supplemental Material, sj-zip-1-stj-10.1177_1536867X221124541 for Panel unit-root tests with structural breaks by Pengyu Chen, Yiannis Karavias and Elias Tzavalis in The Stata Journal
To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type
