
Abstract

We propose a simple smoothing method for the spatiotemporal disaggregation of economic time series. Contrary to the existing methods, our approach does not require exogenous regressors and can therefore be used for countries lacking long and reliable series of regional economic indicators. The proposed method can also be applied sequentially, implying that one needs to revise only the estimates for the last low-frequency period when new data are disaggregated. This is a convenient feature when historical estimates are of interest. We apply this method to disaggregate annual real GDP data for Polish regions into quarterly series and compare the results with the series obtained from a multivariate linear regression-based procedure, considering the differences between the estimates and the consequences for regional recession dating. We also examine the nowcasting performance of the smoothing algorithm and find that it is superior to the regression-based alternative for most of the studied sample lengths and horizons up to a year.

Keywords

GDP disaggregation regions business cycle nowcasting

1. Introduction

Conceptualizing efficient economic policies and conducting robust research requires reliable and up-to-date data. For economic activity, such data are available mainly at the national level. Flash estimates of quarterly gross domestic product (GDP) are usually released after several weeks, whereas various forecasts and nowcasts are published much earlier. The situation is much worse at the regional level. Typically, only annual GDP series are available, and publication delays can last one year or even longer. The USA and the UK are notable exceptions, as their statistical offices publish quarterly regional GDP series with delays of three and six months, respectively.

Several spatiotemporal disaggregation methods have been developed to address the missing regional data problems. All of these methods are statistical approaches that estimate the missing high-frequency series using a set of auxiliary regional indicators and standard temporal and spatial contemporaneous constraints. For example, Mazzi and Proietti (2017, 234) distinguish among three classes. First, observation-driven models extend the seminal regression-based model developed by Chow and Lin (1971; Cuevas et al. 2015; Di Fonzo 1990; Pipień and Roszkowska 2015; Proietti 2011; Rossi 1982). Second, parameter-driven methods aim to build a joint structural model for the estimated series and auxiliary indicators, usually using dynamic factor models (Moauro and Savio 2005; Proietti and Moauro 2006) or mixed-frequency Bayesian vector autoregressive models (Koop, McIntyre, and Mitchell 2020; Koop, McIntyre, Mitchell, and Poon 2020; Lehmann and Wikman 2022). Third, semiparametric approaches encompass Denton’s multivariate benchmarking method (Di Fonzo and Marini 2011) and the polynomial method proposed by Hedhili Zaier and Trabelsi (2007).

While accounting for additional information usually improves estimates, it also leads to some difficulties. First, it requires reliable indicators. For many countries, particularly the post-communist countries of Central and Eastern Europe, the high-frequency series related to economic activity at the regional level are usually short. Additionally, their quality is relatively low because of idiosyncratic shocks or breaks. For example, Statistics Poland—the national statistical agency—is one of the world leaders in data coverage and website availability according to the ODIN ranking (Open Data Watch 2021). However, at the NUTS-2 regional level in Poland, there are only five series related to cyclical economic activity measured at the quarterly frequency available for the 2006 to 2021 period. Of these, one is the unemployment rate, which, because of strong employment protection in Poland, lags the business cycle considerably (see Grabek and Kłos 2013, for the evidence for Poland and Di Iorio and Triacca 2022, for some other European countries) and therefore cannot be a reliable regressor for quarterly GDP estimation. The industrial production and investment series are also plagued with outliers, where the annual change exceeds 40% and even reaches more than 100%. And these numbers do not refer to the coronavirus disease (COVID-19) period.

The second problem is practical. As data for a new period are released, disaggregating them using statistical techniques usually requires revising all the previously disaggregated series. For nowcasting or forecasting purposes, this is just a minor side effect; however, when the historical series are of interest, the drawback becomes severe, even if it can be administered through the official data revision policy. Although the estimates from short time series are likely to be more sensitive to new observations, longer series are not immune to this problem either.

For example, the cyclical estimates of the quarterly gross value added (GVA) for Great Britain’s regions, which are based on a half-century-long sample (Koop, McIntyre, Mitchell, and Poon 2020; the disaggregated series are available at https://www.escoe.ac.uk/regionalnowcasting/; accessed 8.09.2022), also vary considerably between editions. The mean absolute difference between the estimates of the real quarter-on-quarter GVA growth obtained in the last quarter of 2018 and the last available results from the second quarter of 2021 ranges between 0.3 and 0.7 p.p., representing 9.6% to 28.6% of the unconditional standard deviation of the series. Even the differences between consecutive estimates are non-negligible: for the first two quarters of 2021, they equal 0.1 to 0.4 p.p., that is, 3.3% to 17.7% of the unconditional standard deviation. To some extent, these differences may result from input data revisions. However, they do not occur only at the end of the sample, where the input series are revised; their sizes are approximately the same at the beginning of the investigation period.

Finally, as noted by Mazzi and Proietti (2017, 233), official statistical agencies still prefer univariate methods because they are simpler and less computationally intensive. The multivariate approaches can only be applied by agencies that employ highly trained staff with experience in disaggregation problems, such as in Italy or Spain.

Herein, we propose an alternative spatiotemporal decomposition method that uses a smoothing criterion. We postulate that the annual growth rates of the disaggregated series should be as smooth as possible. This idea goes back to the first algorithms for univariate disaggregation proposed by Boot et al. (1967), which can be viewed as a special application of the adjustment problem considered by Denton (1971) and Cholette (1984). To the best of our knowledge, it has never been utilized for multivariate cases.

Unlike other methods, ours does not require auxiliary regressors and can be applied sequentially. This means that previous estimates need not be revised entirely when new data are disaggregated; only the last part needs to be updated. Additionally, our method is computationally simple because it requires solving only a set of simple constrained optimization problems, as shown in the next section. These advantages make the smoothing procedure a reasonable choice for countries that lack long and reliable series of regional statistics and when historical estimates are of particular interest.

Focusing on growth rates makes our procedure closely related to the growth-preserving benchmarking and reconciliation methods (Bozik and Otto 1988; Causey and Trager 1981; Dagum and Cholette 2006; Di Fonzo and Marini 2012, 2015; Titova et al. 2010; Trager 1982), particularly when looking at the optimization problems. However, contrary to these studies, we use annual growth rates instead of period-by-period ones.

We apply our method to disaggregate the annual real GDP series for 16 NUTS-2 regions of Poland. The annual data on regional GDP in Poland are published with substantial delays. Nominal values are released after one year, whereas the real series are published two years after the end of the year of interest. Like other post-communist countries of Central and Eastern Europe, Poland underwent a complete transformation of its socioeconomic life, including the administrative division of the country and the functioning of public statistics. Therefore, as discussed earlier, it also lacks long and reliable series of regional business cycle benchmarks.

We consider two versions of the smoothing algorithm. The baseline sequential approach divides the disaggregation problem into a set of annual sub-problems. In contrast, the one-step variant solves the problem in one step for the entire study period. The results show that the differences between the two procedures are minimal. We also compare the disaggregated series with the results of a multivariate linear regression-based procedure. We consider the differences between the estimated growth rates and the consequences for regional recession dating.

Finally, we also examine the nowcasting abilities of the smoothing approach by running a pseudo-real-time nowcasting experiment. By nowcasting, we mean estimating the quarterly regional values when the annual regional totals are unavailable. Theoretically, because the smoothing approach relies solely on the persistence of the disaggregated series and the current quarterly country-level data, it is expected to be inferior to the regression-based methods guided by auxiliary indicators. However, it is unclear to what extent the aforementioned problems of short samples and poor data quality affect the benchmarking procedures and decrease their nowcasting abilities.

The remainder of the article is organized as follows. Section 2 presents the procedure. Section 3 is devoted to the disaggregation results, whereas Section 4 presents the nowcasting experiment. Section 5 concludes the study.

2. The Procedure

2.1. The Preliminaries

This study uses the following notation: capital letters represent annual variables, whereas small letters denote quarterly data. The procedure takes two series as inputs: the annual data on regional GDP levels $Y_{τ, i}$ , where $τ = 1, 2, \dots, T$ is the time index for years and $i = 1, 2, \dots, M$ is the index for regions, and the quarterly data on country GDP level $y_{t}^{c}$ , where $t = 1, 2, \dots, 4 T$ represents the time index for quarters. We aim to determine the quarterly regional GDP levels $y_{t, i}$ , $t = 1, 2, \dots, 4 T$ , $i = 1, 2, \dots, M$ . Evidently, there is a natural correspondence between quarterly and annual data: every four consecutive quarterly values represent four quarters of the appropriate year from the annual series.

The unknown quarterly regional data should satisfy the set of temporal and contemporaneous spatial constraints:

\sum_{j = 1}^{4} y_{4 (τ - 1) + j, i} = Y_{τ, i}, τ = 1, 2, \dots, T, i = 1, 2, \dots, M

(1)

\sum_{i = 1}^{M} y_{t, i} = y_{t}^{c}, t = 1, 2, \dots, 4 T

(2)

The first condition represents the aggregation of quarterly data to the observed annual series, and the second constraint implies that the regional data sum up to the observed country series. The constraints in Equations (1) and (2) imply that the input series should satisfy the following set of conditions:

\sum_{i = 1}^{M} Y_{τ, i} = \sum_{j = 1}^{4} y_{4 (τ - 1) + j}^{c}, τ = 1, 2, \dots, T,

(3)

so that the sum of the annual regional values is equal to the sum of the quarterly country-level series for a given year. In practice, the condition in Equation (3) might not be satisfied because of rounding errors, mainly when the series are created from volume indices. In this case, an initial rebalancing procedure should be applied to establish the series’ consistency. The simplest one is to adjust the regional values proportionally to match the sum of the quarterly country-level values:

{\hat{Y}}_{τ, i} = \frac{\sum_{j = 1}^{4} y_{4 (τ - 1) + j}^{c}}{\sum_{i = 1}^{M} Y_{τ, i}} \cdot Y_{τ, i},

(4)

where ${\hat{Y}}_{τ, i}$ stands for the adjusted value.
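The consistency condition in Equation (3) and the proportional adjustment in Equation (4) can be illustrated with a minimal Python sketch. The function name `rebalance`, the nested-list data layout, and the toy numbers are assumptions made for illustration only; they are not taken from the authors' Julia code.

```python
def rebalance(annual_regional, quarterly_country):
    """Proportionally rescale annual regional values as in Equation (4).

    annual_regional[tau][i] holds Y_{tau,i} (year tau, region i, 0-based).
    quarterly_country[t] holds y^c_t (quarter t, 0-based, length 4*T).
    """
    adjusted = []
    for tau, row in enumerate(annual_regional):
        # Right-hand side of Equation (3): country total for year tau.
        country_total = sum(quarterly_country[4 * tau: 4 * tau + 4])
        # Left-hand side of Equation (3): sum of the regional values.
        regional_total = sum(row)
        factor = country_total / regional_total
        adjusted.append([factor * value for value in row])
    return adjusted


# Toy data (hypothetical): two years, three regions.
Y = [[50.0, 30.0, 21.0], [52.0, 31.0, 20.0]]
yc = [24.0, 25.0, 25.0, 26.0, 25.0, 26.0, 26.0, 27.0]
Y_hat = rebalance(Y, yc)
```

In the toy data, the regional values sum to 101 and 103, whereas the country totals are 100 and 104, so each year is rescaled by a slightly different factor and the regional shares within a year are left unchanged.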

2.2. The Optimization Problem

In this study, we disaggregate the two input series so that the resulting year-on-year (y-o-y) growth rates of quarterly regional GDP are as smooth as possible and satisfy the constraints in Equation (1) and (2). More formally, let $Δ y_{t, i}$ denote the annual growth rates of the quarterly regional GDP data:

Δ y_{t, i} = \frac{y_{t, i}}{y_{t - 4, i}} - 1

(5)

The set of quarterly regional GDP series ${y_{t, i}}$ solves the following optimization problem:

\min_{\{y_{t, i}\}} [\sum_{i = 1}^{M} w_{i} (\sum_{t = 6}^{4 T} {(Δ y_{t, i} - Δ y_{t - 1, i})}^{2})] \quad \text{s.t. } (1)–(2),

(6)

where $w_{i}$ denote region-specific weights. We use the average regional share in the country’s GDP as the weights:

w_{i} = \frac{1}{T} \sum_{τ = 1}^{T} \frac{Y_{τ, i}}{\sum_{i = 1}^{M} Y_{τ, i}}

(7)

The weighting scheme plays a vital role in the optimization problem because the contributions of different regions to GDP and, in turn, to the contemporaneous constraints in Equation (2) differ considerably. Without the weights, the algorithm would prefer smoothing the growth rates in small regions over the most important ones.
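To make the objective in Equation (6) concrete, the following Python sketch evaluates the weights from Equation (7), the y-o-y growth rates from Equation (5), and the weighted sum of squared changes in growth rates for a candidate set of quarterly regional series. It only evaluates the criterion and does not solve the constrained problem; the function names, the data layout, and the toy numbers are illustrative assumptions.

```python
def regional_weights(annual_regional):
    """Average regional share in country GDP, Equation (7).

    annual_regional[tau][i] holds Y_{tau,i}.
    """
    T = len(annual_regional)
    weights = [0.0] * len(annual_regional[0])
    for row in annual_regional:
        total = sum(row)
        for i, value in enumerate(row):
            weights[i] += value / total / T
    return weights


def yoy_growth(series):
    """Year-on-year growth rates, Equation (5), from the fifth quarter on."""
    return [series[t] / series[t - 4] - 1.0 for t in range(4, len(series))]


def smoothing_objective(quarterly_regional, weights):
    """Weighted sum of squared changes in y-o-y growth, Equation (6).

    quarterly_regional[i][t] holds y_{t,i} for region i and quarter t.
    """
    total = 0.0
    for w, series in zip(weights, quarterly_regional):
        g = yoy_growth(series)
        # g[k] - g[k - 1] for k >= 1 corresponds to t = 6, ..., 4T.
        total += w * sum((g[k] - g[k - 1]) ** 2 for k in range(1, len(g)))
    return total


# Toy data (hypothetical): two regions, two years.
y = [[10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0],
     [5.0, 5.0, 5.0, 5.0, 5.0, 6.0, 5.0, 5.0]]
Y = [[40.0, 20.0], [44.0, 21.0]]
w = regional_weights(Y)
objective = smoothing_objective(y, w)
```

In this toy case, the first region grows at a constant 10% and contributes nothing to the objective, whereas the single spike in the second region is penalized in proportion to that region's weight.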

Following Di Fonzo and Marini (2012, 2015; see also Brown 2010; Causey and Trager 1981), we use the interior point Newton-type algorithm to solve the smoothing problems. In the case of growth rate smoothing problems, it also delivers fast and accurate solutions.

2.3. The Sequential Solution to the Optimization Problem

Generally, the optimization problem in Equation (6) should be solved in one step using a numerical solver. Such an approach, however, requires revising all the disaggregated series when data for a new period are added, leading to the frequent data revision problem discussed in the Introduction. Fortunately, the revisions are tiny, and an alternative, sequential approach to optimization can be applied. It fixes most of the already disaggregated series and adjusts only the most recent estimates to disaggregate the new data. Therefore, this version is no longer subject to substantial revisions of the earlier estimates. Notably, the solution obtained with the sequential disaggregation differs from that obtained with the one-step procedure. However, as documented in the next section, the differences are minimal.

The sequential procedure works as follows: in the first step, the problem in Equation (6) is solved for some initial $T_{0}$ years:

\min_{\{y_{t, i}\}_{t = 1}^{4 T_{0}}} [\sum_{i = 1}^{M} w_{i} (\sum_{t = 6}^{4 T_{0}} {(Δ y_{t, i} - Δ y_{t - 1, i})}^{2})] \quad \text{s.t. } (1)–(2).

(6′)

In the second step, we add the data for the next year $T_{0} + 1$ , take the growth rates of the disaggregated regional GDP series from the last quarter of year $T_{0} - 1$ as given, and solve Problem (6) for the data covering years $T_{0}$ and $T_{0} + 1$ :

\min_{\{y_{t, i}\}_{t = 4 (T_{0} - 1) + 1}^{4 (T_{0} + 1)}} [\sum_{i = 1}^{M} w_{i} (\sum_{t = 4 (T_{0} - 1) + 1}^{4 (T_{0} + 1)} {(Δ y_{t, i} - Δ y_{t - 1, i})}^{2})] \quad \text{s.t. } (1)–(2),

(6″)

where $Δ y_{4 (T_{0} - 1), i}, \dots, Δ y_{4 T_{0}, i}$ are given by the disaggregated values ${\hat{y}}_{4 (T_{0} - 2), i}, \dots, {\hat{y}}_{4 (T_{0} - 1), i}$ obtained in the previous step. In other words, in the second step, we disaggregate the data for year $T_{0} + 1$ and revise the disaggregated series for the previous year. In the subsequent steps, we repeat this step for the next periods until the last year $T$ .

In this study, we treat the sequential procedure as the baseline algorithm.
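The bookkeeping behind the sequential procedure can be sketched in a few lines of Python. The hypothetical helper `sequential_windows` lists, for each step, the range of quarters (1-based, inclusive) whose values are treated as unknowns, following the index ranges in Problems (6′) and (6″); quarters before each window are kept fixed. It does not solve the optimization problems themselves.

```python
def sequential_windows(T0, T):
    """Return (first_quarter, last_quarter) of the free quarters per step.

    Step 1 covers the initial T0 years. Each later step adds one year and
    re-estimates that year together with the preceding one.
    """
    windows = [(1, 4 * T0)]
    for new_year in range(T0 + 1, T + 1):
        first_free = 4 * (new_year - 2) + 1  # first quarter of the previous year
        last_free = 4 * new_year             # last quarter of the new year
        windows.append((first_free, last_free))
    return windows


# Example: three initial years, extended sequentially to five years.
steps = sequential_windows(3, 5)
```

For `T0 = 3` and `T = 5`, the windows are quarters 1 to 12, then 9 to 16, then 13 to 20, so each quarter is revised at most once after its first estimate.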

2.4. Nowcasting

By nowcasting, we mean estimating the missing quarterly regional GDP series for periods when no annual regional values are available, but the quarterly country-level data are known. In this case, we solve the smoothing problem in Equation (6) without the temporal constraints in Equation (1), keeping only the contemporaneous constraints in Equation (2). Formally, let $H$ denote the nowcast horizon in quarters. The nowcast series solve the following optimization problem:

\min_{\{y_{t, i}\}_{t = 4 T + 1}^{4 T + H}} [\sum_{i = 1}^{M} w_{i} (\sum_{t = 4 T + 1}^{4 T + H} {(Δ y_{t, i} - Δ y_{t - 1, i})}^{2})] \quad \text{s.t. } (2),

(8)

where $Δ y_{4 T, i}, \dots, Δ y_{4 (T + 1), i}$ are calculated using the disaggregated values ${\hat{y}}_{4 (T - 1), i}$ , …, ${\hat{y}}_{4 T, i}$ .
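For the one-quarter horizon ($H = 1$), the objective in Equation (8) is quadratic in the unknown levels and the constraint in Equation (2) is linear, so the solution can be written in closed form using a Lagrange multiplier. The Python sketch below covers this special case only and is not the authors' implementation, which relies on a numerical solver; the function and argument names and the toy numbers are assumptions.

```python
def nowcast_one_quarter(base_levels, last_growth, weights, country_total):
    """Closed-form solution of Problem (8) for H = 1.

    base_levels[i]: disaggregated level four quarters before the nowcast quarter.
    last_growth[i]: last available y-o-y growth rate of region i.
    weights[i]:     region-specific weight w_i.
    country_total:  observed country-level value for the nowcast quarter.
    """
    # Levels that would keep each regional growth rate unchanged.
    unchanged = [b * (1.0 + g) for b, g in zip(base_levels, last_growth)]
    # Part of the country-level value not explained by unchanged growth.
    gap = country_total - sum(unchanged)
    # First-order conditions imply the gap is split in proportion to b_i^2 / w_i.
    scale = [b * b / w for b, w in zip(base_levels, weights)]
    total_scale = sum(scale)
    return [u + s * gap / total_scale for u, s in zip(unchanged, scale)]


# Toy data (hypothetical): two regions, weights proportional to size.
nowcast = nowcast_one_quarter([100.0, 50.0], [0.04, 0.02], [2 / 3, 1 / 3], 156.0)
```

When the weights are roughly proportional to regional size, as in this toy case, the term $b_i^2 / w_i$ is roughly proportional to the regional level, so the country-level surprise is spread across regions approximately in proportion to their GDP.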

2.5. The Multivariate Regression-Based Method

In the study, we compare the performance of the smoothing algorithm with a multivariate regression-based method. This approach uses a set of auxiliary high-frequency, region-level regressors that serve as benchmarks for disaggregated series. We consider the following optimization problem:

\min_{\{y_{t, i}\}, \{β_{j, i}\}} \{\sum_{i = 1}^{M} w_{i} [\sum_{t = 5}^{4 T} {(Δ y_{t, i} - (β_{0, i} + \sum_{j = 1}^{J} β_{j, i} x_{j, t, i}))}^{2}]\} \quad \text{s.t. } (1)–(2),

(9)

where $β_{j, i}$ denote region-specific regression coefficients and $x_{j, t, i}$ represent values of the auxiliary regressors. The algorithm is similar to the well-known Chow and Lin (1971) procedure with two main differences: it uses growth rates in the objective function and does not account for the serial correlation of regression errors.

To nowcast with the regression-based method, we use the estimated regression coefficients ${\hat{β}}_{i}$ from the disaggregation step. Assuming that the values of the auxiliary regressors in the nowcast horizon are known, the nowcast series solve the following problem:

\min_{\{y_{t, i}\}_{t = 4 T + 1}^{4 T + H}} \{\sum_{i = 1}^{M} w_{i} [\sum_{t = 4 T + 1}^{4 T + H} {(Δ y_{t, i} - ({\hat{β}}_{0, i} + \sum_{j = 1}^{J} {\hat{β}}_{j, i} x_{j, t, i}))}^{2}]\} \quad \text{s.t. } (2),

(10)

where $Δ y_{4 T + 1, i}, \dots, Δ y_{4 (T + 1), i}$ are calculated using the disaggregated values ${\hat{y}}_{4 (T - 1) + 1, i}$ , …, ${\hat{y}}_{4 T, i}$ .

We apply an iterative algorithm to solve the regression problem in Equation (9). First, given the starting values of $y_{t, i}$ , we estimate the regression coefficients $β_{j, i}$ using OLS. Subsequently, given the regression coefficients, we update $y_{t, i}$ by solving the problem in Equation (9) for fixed $β_{j, i}$ . We iterate over these two steps until convergence. As with the smoothing problems, we apply the interior point algorithm in the $y_{t, i}$ updating step.

3. The Disaggregation Results

In this section, we use the method described above to disaggregate the real GDP data for Poland. We use the results to analyze business cycles at the regional level and compare the results with the multivariate benchmarking/regression method.

3.1. The Data

Statistics Poland publishes the regions’ annual growth rates of chain-linked GDP volumes. The data are available for the 2004 to 2019 period. For 2020, only nominal GDP data are published. Therefore, we approximate the regional growth rates of GDP volumes for 2020 by deflating the nominal growth rates for that year by the regional price indices for the previous year (the price indices for 2020 are also unknown). Subsequently, we create the chain-linked volume GDP series using the growth rates described above and the GDP series at the current prices for 2003 published by Statistics Poland. We also use the official country-level quarterly data on chain-linked GDP volume growth rates for the period 1Q2004 to 4Q2020. Finally, because the series do not satisfy the condition in Equation (3), the rebalancing procedure is applied where the regional values are adjusted proportionally, as described in the previous section. The nominal GDP in the studied regions is illustrated in Figure 1.

Figure 1.

Nominal GDP for 2019 in Polish NUTS-2 regions (in billion EUR).

The calculations are conducted in Julia using the Ipopt solver. The code and data are available in the GitHub repository at https://github.com/JanAcedanski/spatio-temporal-disaggregation-with-smoothing.

3.2. The Disaggregated Series for Polish Regions

The results of the sequential smoothing disaggregation procedure are presented in Figure 2. It shows the y-o-y growth rates of the disaggregated series and the corresponding growth rates of the country-level and annual regional-level input data. The outcome series are characterized by substantial smoothness, although some local spikes governed by the national data are visible. The most obvious is the COVID-19 crash in Q2 2020. Other notable jumps occur at the turn of 2009 and 2010 and in Q4 2015.

Figure 2.

Disaggregation of real GDP series for Poland (y-o-y growth rates).

Table 1 presents the smoothness statistics for the disaggregated series. We use a simple version of the smoothness measure proposed by Froeb and Koyak (1994), defined as the ratio of the long-run standard deviation to the short-run one. The index is calculated as follows:

Table 1.

Smoothness Indices for the Disaggregated Series.

Period	Regions (median)	Regions (min)	Regions (max)	Poland
Full	1.42	1.09	1.86	1.39
excl. COVID	1.82	1.59	2.33	1.89

SI_{i} = \frac{s (Δ y_{t, i})}{s (d Δ y_{t, i})},

(11)

where $s$ represents standard deviations and $d Δ y_{t, i} = Δ y_{t, i} - Δ y_{t - 1, i}$ . Higher values of the index indicate higher levels of smoothness or, in other words, lower levels of volatility. Overall, the disaggregated series are characterized by a similar level of volatility as the country-level data, regardless of whether we consider the entire period or exclude the last three quarters of the COVID-19 crisis. The median values of the smoothness indices for the regions almost coincide with the indices for the country-level GDP dynamics. However, the smoothness of the regional series varies considerably. The indices range from 1.09 to 1.86 for the entire sample and from 1.59 to 2.33 when excluding the COVID-19 crisis period.
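The smoothness index in Equation (11) is straightforward to compute. The Python sketch below uses the sample standard deviation from the standard library; whether the sample or population version is used in the study is not stated, so this choice, along with the function name and the toy series, is an assumption.

```python
from statistics import stdev


def smoothness_index(growth_rates):
    """Ratio of the std. dev. of growth rates to that of their first differences."""
    changes = [growth_rates[k] - growth_rates[k - 1]
               for k in range(1, len(growth_rates))]
    return stdev(growth_rates) / stdev(changes)


# Toy series (hypothetical growth rates in p.p.).
smooth_series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0]
jagged_series = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 5.0]
si_smooth = smoothness_index(smooth_series)
si_jagged = smoothness_index(jagged_series)
```

The slowly trending series yields an index above one, whereas the series that alternates between two values yields an index well below one, in line with the interpretation that higher values indicate smoother series.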

3.3. Comparison with the Alternative Approaches

In this subsection, we compare our baseline sequential smoothing disaggregation procedure with two alternatives: the one-step version of the smoothing algorithm and the regression-based benchmarking method. The former approach relies on the same data as our baseline method. For the benchmarking method, we use four quarterly regional indicators as regressors. These are annual growth rates of sold industrial production at constant prices, investment outlays, gross salary, and the consumer price index. The two middle series are deflated using the regional CPI indexes. Besides the unemployment rate, which lags the business cycle considerably, these are the only Polish regional series available at the quarterly frequency for a longer period, starting from 2006. Because the industrial production series contain many outliers, we replace the annual growth rates that exceed 40% or are lower than −40% with the means of the neighboring values. The distributions of the auxiliary regressors are shown in Figure 3.

Figure 3.

Distribution of the annual growth rates of the regressors used in the Chow-Lin method.

We use several distance measures between our baseline sequential smoothing procedure and the two alternatives. These are the mean difference (MD), mean absolute difference (MAD), and root mean square difference (RMSD), defined as:

MD_{i} = \frac{1}{4 (T - 1)} \sum_{t = 5}^{4 T} d_{t, i}

(12)

MAD_{i} = \frac{1}{4 (T - 1)} \sum_{t = 5}^{4 T} | d_{t, i} |

(13)

RMSD_{i} = \sqrt{\frac{1}{4 (T - 1)} \sum_{t = 5}^{4 T} d_{t, i}^{2}}

(14)

where $d_{t, i} = y_{t, i} - y_{t, i}^{seq}$ . We also calculate the Pearson correlation coefficients between the baseline and alternative results.
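The three distance measures in Equations (12)–(14) can be computed for a single region as in the following Python sketch. The function name and the toy growth rates are illustrative assumptions; the inputs are the growth rates from an alternative method and from the baseline sequential procedure over the same quarters.

```python
from math import sqrt


def distance_measures(alternative, baseline):
    """Return (MD, MAD, RMSD) between two equally long series."""
    d = [a - b for a, b in zip(alternative, baseline)]
    n = len(d)
    md = sum(d) / n
    mad = sum(abs(x) for x in d) / n
    rmsd = sqrt(sum(x * x for x in d) / n)
    return md, mad, rmsd


# Toy growth rates in p.p. (hypothetical).
baseline = [3.0, 3.2, 3.1, 2.9]
alternative = [3.5, 2.7, 3.6, 2.4]
md, mad, rmsd = distance_measures(alternative, baseline)
```

The toy example shows why all three measures are reported: positive and negative differences cancel in the MD, which is zero here, while the MAD and RMSD both equal 0.5 p.p.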

In addition to the annual growth rates (y-o-y) reported above, we calculate the quarter-over-quarter (q-o-q) changes using the seasonally adjusted disaggregated series. For seasonal adjustments, we employ the default X-13ARIMA-SEATS procedure (Sax and Eddelbuettel 2018).

All the distance measures are calculated region-wise. In Table 2, we report the mean and extreme values of the measures across the regions.

Table 2.

The Differences Between the Sequential Smoothing Procedure and the Alternative Methods.

Method	MD [p.p.] mean/min/max	MAD [p.p.] mean/min/max	RMSD [p.p.] mean/min/max	Pearson mean/min/max
y-o-y growth rates
One-step	0.00/0.00/0.00	0.08/0.08/0.08	0.12/0.11/0.13	.999/.997/1.000
Regression	0.00/0.00/0.00	0.64/0.53/0.76	0.83/0.71/0.95	.930/.871/.968
q-o-q growth rates
One-step	0.00/0.00/0.01	0.06/0.03/0.12	0.11/0.04/0.20	.998/.994/1.000
Regression	0.01/−0.11/0.03	0.55/0.31/1.10	0.79/0.45/1.66	.909/.751/.978

The differences between the sequential and one-step versions of the smoothing procedure are minimal. On average, the disaggregated series obtained from both methods coincide. For the annual growth rates, the mean absolute difference stays below 0.1 p.p., and the root mean square difference slightly exceeds 0.1 p.p. on average but never reaches 0.2 p.p. The two series are also almost perfectly correlated, as the correlation coefficients reported in Table 2 never drop below .994. These results clearly show that the more practical sequential procedure can replace the one-step smoothing approach with minimal risk of result distortion.

Significantly higher differences are observed in the case of the regression-based procedure. Although the mean difference is also 0, the absolute and squared differences reach 0.6 p.p. and 0.8 p.p., on average, and 0.8 p.p. and 1.0 p.p., at most, respectively, for y-o-y growth series. In addition, although high, the correlation is far from perfect. The differences are quite similar for q-o-q growth rates.

The differences between the sequential and regression-based procedures become less important when comparing the descriptive statistics of the two series, as shown in Table 3. For each region, we calculate means, standard deviations, and autocorrelations. The table reports the cross-sectional means, extremes, and standard deviations. Our baseline approach is characterized by marginally higher means (3.33 and 3.24 p.p. for y-o-y growth rates) for the disaggregated series and considerably lower volatility for some regions (see Figure 4). This is documented by the difference in serial standard deviations for the regions with the highest fluctuations of the disaggregated series (3.8 p.p. for sequential smoothing and 4.6 p.p. for the regression-based method) as well as the higher autocorrelations (.74 and .64, respectively).

Table 3.

The Comparison of the Statistics for the Alternative Decompositions for the Regions.

Method	Mean [p.p.] mean/min/max/std	St. dev. [p.p.] mean/min/max/std	Autocorr. mean/min/max/std
y-o-y growth rates
Sequential	3.33/2.35/4.64/0.61	2.71/1.83/3.84/0.51	.74/.58/.85/.07
One-step	3.38/2.34/4.64/0.61	2.73/1.83/3.84/0.50	.75/.61/.85/.06
Regression	3.24/2.35/4.56/0.55	2.83/2.13/4.61/0.65	.64/.32/.77/.11
q-o-q growth rates
Sequential	0.85/0.62/1.15/0.15	1.62/1.46/1.85/0.09	−.30/−.41/−.16/.08
One-step	0.85/0.61/1.15/0.15	1.58/1.35/1.82/0.13	−.29/−.39/−.10/.09
Regression	0.81/0.55/1.12/0.14	1.76/1.15/3.26/0.51	−.31/−.49/−.14/.10

Figure 4.

Comparison of the disaggregated series obtained from the smoothing and the regression-based procedures.

The ragged parts of the series disaggregated using the benchmarking procedure are shown in Figure 4. For example, they are visible for the 2010 to 2014 period in the case of some smaller regions, such as Kujawsko-Pomorskie, Lubuskie, Podlaskie, and, particularly, Świętokrzyskie. These jumps are likely to result from the poor quality of the regressors for these regions.

3.4. Differences in Business Cycle Dating

GDP series are frequently used for business cycle dating. While precise dating procedures go well beyond the GDP series, one of the most popular rules for identifying a recession involves determining a period of two consecutive quarters of decline in real GDP. We use this rule to compare recession dating based on q-o-q data from our sequential procedure and the regression-based alternative.
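The two-consecutive-quarters rule can be expressed compactly in code. The Python sketch below flags every quarter that belongs to a run of at least two negative q-o-q growth rates; the function name and the toy series are assumptions made for illustration.

```python
def recession_quarters(qoq_growth):
    """Flag quarters that lie in a run of two or more negative growth rates."""
    flags = [False] * len(qoq_growth)
    run_start = None
    # A non-negative sentinel at the end closes a run that reaches the last quarter.
    for t, g in enumerate(list(qoq_growth) + [0.0]):
        if g < 0:
            if run_start is None:
                run_start = t
        else:
            if run_start is not None and t - run_start >= 2:
                for k in range(run_start, t):
                    flags[k] = True
            run_start = None
    return flags


# Toy q-o-q growth rates in p.p. (hypothetical).
growth = [0.5, -0.1, 0.3, -0.2, -0.4, 0.1, -0.3]
flags = recession_quarters(growth)
```

In the toy series, the isolated declines in the second and last quarters are not classified as recessions, whereas the two consecutive declines in the fourth and fifth quarters are.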

The results are shown in Table 4. It should be noted that Poland did not experience a recession at the country level in the investigated period. However, at the regional level, mild recessions were observed in 2008 to 2009, 2012 to 2013, and 2020 in some regions. The comparison presents mixed results. The highest consistency is observed for the COVID-19 crisis period, where both procedures give the same results (whether or not there was a recession) for thirteen out of sixteen regions, including the Śląskie region, where the recession had already started in Q4 2019. In addition, for the Global Financial Crisis period, the two procedures deliver the same outcome for thirteen regions, but the lengths and dating of the recession periods differ somewhat.

Table 4.

Recession Dating Based on the Smoothing and the Chow-Lin Procedures.

Note. The “minus” symbol indicates a quarter’s negative q-o-q growth rate. The shaded area represents a recession period defined as at least two consecutive negative growth rates.

However, a high level of disagreement is observed for the slowdown in 2012 to 2013. Sequential smoothing identifies recessions in thirteen regions with a median length of four quarters. The quarterly growth rates under the benchmarking procedure are much more scattered. As a result, recessions are identified in only nine regions.

4. Nowcasting Performance

In this section, we examine the nowcasting abilities of the smoothing algorithm and compare them with the performance of the regression-based method. For this purpose, we perform a pseudo-real-time nowcasting experiment on the same Polish data as in the previous section. As a by-product of the experiment, we can also examine the size of the revisions resulting from the disaggregation of newly published annual regional data.

4.1. The Pseudo-Real-Time Nowcasting Experiment

We divide the whole dataset into eight expanding samples. The first covers the period 2005 to 2011 and the last covers 2005 to 2018. For each sample, we first disaggregate the data by solving the standard disaggregation problems in Equations (6′)–(6″) and Equation (9). Given these disaggregated series, we calculate nowcasts for $H = 1, 2, \dots, 8$ quarters by solving the nowcasting problems in Equation (8) and Equation (10).

More precisely, we solve eight different nowcasting problems and have eight sets of nowcasts obtained from the smoothing approach. This is because nowcasts for succeeding quarters are dependent, which means that adding a contemporaneous constraint for a new quarter and calculating the nowcasts for this period changes the previous periods’ nowcasts. The situation is different for the regression-based methods for which nowcasts are based on fixed regression coefficients and do not affect the previous estimates. As a result, to calculate the nowcasts for the next eight quarters, it suffices to solve the problem Equation (10) just once, taking $H = 8$ .

Notably, our forecasting experiment does not account for the official data revisions released in the studied period. We work on the latest available data published by Statistics Poland.

4.2. Measuring Nowcasting Performance

It is impossible to assess the accuracy of the obtained nowcasts objectively. Obviously, this is because the official quarterly, region-level GDP data are unavailable in Poland. To create the reference series for the accuracy assessment, we use the disaggregated data from the previous section obtained from the one-step smoothing procedure and the regression-based method calculated over the whole dataset. Then, we take the mean values of the two disaggregated series. The general conclusions remain valid even if we take only one of the disaggregated series, either smoothed or benchmarked, as the reference. The results for these cases are available upon request.

Given the reference series $Δ y_{t, i}^{*}$ , we use the standard measures of forecast accuracy for a given horizon $h$ :

ME_{h} = \frac{1}{16 \cdot 8} \sum_{i = 1}^{16} \sum_{s = 1}^{8} e_{s, i}^{h},

(15)

MAE_{h} = \frac{1}{16 \cdot 8} \sum_{i = 1}^{16} \sum_{s = 1}^{8} | e_{s, i}^{h} |,

(16)

RMSE_{h} = \sqrt{\frac{1}{16 \cdot 8} \sum_{i = 1}^{16} \sum_{s = 1}^{8} {(e_{s, i}^{h})}^{2}},

(17)

where $e_{s, i}^{h} = Δ y_{4 (3 + s) + h, i}^{nc} - Δ y_{4 (3 + s) + h, i}^{*}$ is the nowcast error, $Δ y_{t, i}^{nc}$ denotes the GDP growth rates based on the nowcast values, and $s = 1, \dots, 8$ is the sample indicator. Because the accuracy of the nowcasts is sensitive to the sample length, we also analyze the accuracy for subsets of four consecutive samples.
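The three accuracy measures can be computed directly from arrays of nowcast and reference growth rates. The sketch below is a minimal illustration, assuming the growth rates are stored in arrays of shape (samples, regions, horizons); the function name `accuracy_measures` and the synthetic data are assumptions made for the example and are not taken from the study.

```python
import numpy as np


def accuracy_measures(nowcast, reference):
    """Return ME, MAE and RMSE per horizon.

    Both inputs have shape (n_samples, n_regions, n_horizons) and hold
    GDP growth rates; the error is nowcast minus reference.
    """
    errors = np.asarray(nowcast, dtype=float) - np.asarray(reference, dtype=float)
    # Average over samples (axis 0) and regions (axis 1), keep horizons.
    me = errors.mean(axis=(0, 1))
    mae = np.abs(errors).mean(axis=(0, 1))
    rmse = np.sqrt((errors ** 2).mean(axis=(0, 1)))
    return me, mae, rmse


# Synthetic illustration only: 8 samples, 16 regions, 8 horizons.
rng = np.random.default_rng(0)
reference = rng.normal(3.0, 1.0, size=(8, 16, 8))
nowcast = reference + rng.normal(0.0, 1.0, size=(8, 16, 8))
me, mae, rmse = accuracy_measures(nowcast, reference)
print(np.round(rmse, 2))
```

With 8 samples and 16 regions, the averages over the first two axes reproduce the factor $1 / (16 \cdot 8)$ in the formulas above.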

4.3. Results

The values of the accuracy measures based on all eight samples are reported in Table 5. For shorter horizons, the smoothing algorithm delivers virtually unbiased nowcasts; for longer horizons, the nowcasts are slightly negatively biased, with the mean error reaching −0.2 p.p. for $h = 8$. The regression-based method exhibits a considerably larger, also mostly negative, bias. For most of the horizons, it ranges from −0.15 p.p. to −0.72 p.p.

Table 5.

Nowcasting Accuracy—All Samples.

Horizon (quarters)	ME		MAE		RMSE
	Smooth	Regress	Smooth	Regress	Smooth	Regress
1	−0.01	0.26	0.92	2.62	1.09	4.32
2	0.05	−0.19	1.06	2.16	1.27	3.30
3	0.00	−0.16	1.14	2.01	1.37	3.10
4	−0.03	−0.19	1.21	2.55	1.43	4.05
5	0.05	−0.72	1.39	2.75	1.70	4.40
6	−0.11	−0.38	1.54	2.25	1.95	3.55
7	−0.17	−0.29	1.58	2.32	1.97	3.37
8	−0.20	−0.41	1.59	2.67	1.94	4.10

The MAEs and RMSEs also document the superiority of the smoothing method over the alternative approach at every horizon. The errors for the former are usually 30% to 75% smaller than for the latter.
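The size of the error reduction can be checked against the figures in Table 5. The short sketch below copies the MAE and RMSE columns from the table and computes, for each horizon, by how much the smoothing errors fall below the regression-based ones; the variable names are chosen for this illustration only.

```python
import numpy as np

# MAE and RMSE columns copied from Table 5 (horizons 1 to 8).
mae_smooth = np.array([0.92, 1.06, 1.14, 1.21, 1.39, 1.54, 1.58, 1.59])
mae_regress = np.array([2.62, 2.16, 2.01, 2.55, 2.75, 2.25, 2.32, 2.67])
rmse_smooth = np.array([1.09, 1.27, 1.37, 1.43, 1.70, 1.95, 1.97, 1.94])
rmse_regress = np.array([4.32, 3.30, 3.10, 4.05, 4.40, 3.55, 3.37, 4.10])

# Relative reduction: share by which the smoothing error is smaller.
mae_reduction = 1.0 - mae_smooth / mae_regress
rmse_reduction = 1.0 - rmse_smooth / rmse_regress

for h, (a, r) in enumerate(zip(mae_reduction, rmse_reduction), start=1):
    print(f"h={h}: MAE {a:.0%} smaller, RMSE {r:.0%} smaller")
```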

The poor performance of the regression-based approach results from including extremely short samples in the analysis. Notably, the first one covers just six years of growth rates, which means a single degree of freedom in the regression. Therefore, in Table 6, we study the nowcasting abilities for different sample lengths.

Table 6.

Nowcast Accuracy for Different Sample Lengths.

Mean sample length (q/y)	$h = 1$	$h = 2$	$h = 3$	$h = 4$	$h = 5$	$h = 6$	$h = 7$	$h = 8$
RMSE for smoothing
30/7.5	1.14	1.31	1.41	1.50	1.54	1.36	1.48	1.48
34/8.5	1.11	1.27	1.39	1.46	1.64	1.68	1.73	1.67
38/9.5	1.09	1.25	1.34	1.46	1.76	1.81	1.81	1.73
42/10.5	1.03	1.18	1.34	1.45	1.86	1.91	1.91	1.80
46/11.5	0.90	1.11	1.17	1.24	1.73	1.83	1.86	1.72
50/12.5	0.94	1.06	1.20	1.22	1.55	2.11	2.12	2.15
RMSE for smoothing relative to RMSE for regression
30/7.5	0.19	0.28	0.33	0.26	0.25	0.28	0.31	0.26
34/8.5	0.30	0.32	0.53	0.36	0.38	0.54	0.55	0.46
38/9.5	0.55	0.49	0.67	0.69	0.83	0.94	0.78	0.68
42/10.5	0.74	0.87	0.92	0.96	1.13	1.19	1.18	1.19
46/11.5	0.68	0.83	0.85	0.91	1.45	1.55	1.52	1.35
50/12.5	0.99	1.04	1.12	1.01	1.29	1.55	1.70	1.41

Note. All the values are based on four consecutive samples of different lengths. The relative values of RMSE for smoothing lower than one indicate that smoothing nowcasts were more accurate than nowcasts generated by the regression-based method. Bolded are the relative errors that are lower than 1.

The results show that very short samples also decrease the accuracy of the smoothing method, but this effect is small and limited to horizons up to one year. For longer horizons, the forecasts are more accurate for the short samples, which is likely the result of sample composition.

The relative errors for smoothing reported in the bottom panel of Table 6 rise with the sample length, confirming the increasing relative accuracy of the regression-based method. However, up to a horizon of one year, the smoothing method outperforms the alternative for all but the longest samples. For longer horizons, the regression-based forecasts are more accurate even for shorter samples consisting of ten to eleven annual observations.

4.4. Quantification of the Revisions of the Disaggregated Series

Consecutive disaggregation of the extending samples in the nowcasting experiment allows for quantifying the revision size. Given eight samples, where the first one covers the period 2005 to 2011, we have eight series of growth rates of disaggregated series in the years 2006 to 2011 per method. The number of revisions decreases with time and in 2017, there are just two disaggregated series. To quantify the revision size, we consider two measures calculated annually: the mean standard deviation $\overline{SD}_{T}$ and the maximum revision $\mathit{MaxRev}_{T}$ defined as follows:

$$\overline{SD}_{T} = \frac{1}{4 \cdot 16} \sum_{i = 1}^{16} \sum_{t = 1}^{4} \sqrt{\frac{1}{s_{\max}} \sum_{s = 1}^{s_{\max}} \left( \hat{y}_{4 T + t, i}^{s} - \bar{\hat{y}}_{4 T + t, i} \right)^{2}}, \tag{18}$$

$$\mathit{MaxRev}_{T} = \max_{i, t} \left\{ \max_{s} \hat{y}_{4 T + t, i}^{s} - \min_{s} \hat{y}_{4 T + t, i}^{s} \right\}, \tag{19}$$

where ${\hat{y}}_{4 T + t, i}^{s}$ denotes the disaggregated value for sample $s$ and ${\bar{\hat{y}}}_{4 T + t, i} = \frac{1}{s_{\max}} \sum_{s = 1}^{s_{\max}} {\hat{y}}_{4 T + t, i}^{s}$ stands for its mean value over $s$ . The measures are calculated for all the samples and for the last five longest samples.
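Both revision measures reduce to simple array operations once the estimates for a given year are stacked by sample. The following sketch assumes that the $s_{\max}$ available estimates for year $T$ are stored in an array of shape ($s_{\max}$, 4 quarters, 16 regions); the function name `revision_measures` and the toy input are hypothetical.

```python
import numpy as np


def revision_measures(vintages):
    """Mean standard deviation and maximum revision for one year T.

    vintages has shape (s_max, 4, n_regions): the estimate for each
    quarter and region obtained from each of the s_max samples.
    """
    v = np.asarray(vintages, dtype=float)
    # ddof=0 divides by s_max, matching the 1 / s_max factor in Equation (18).
    sd = v.std(axis=0, ddof=0)
    mean_sd = sd.mean()  # average over quarters and regions
    # Range across samples, then the largest range over quarters and regions.
    max_rev = (v.max(axis=0) - v.min(axis=0)).max()
    return mean_sd, max_rev


# Toy input: two estimates that differ by one p.p. everywhere.
toy = np.stack([np.zeros((4, 16)), np.ones((4, 16))])
mean_sd, max_rev = revision_measures(toy)
print(mean_sd, max_rev)
```

For the toy input, the two estimates differ by one p.p. in every quarter and region, so the mean standard deviation equals 0.5 p.p. and the maximum revision equals one p.p.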

The results are shown in Table 7. The revisions resulting from the regression-based method are substantial and particularly large before 2012. In these years, the mean standard deviations are close to or above one p.p., whereas the maximum revision considerably surpasses 30 p.p. in most cases. If we consider only the last five samples, the mean standard deviation for the Chow-Lin method drops considerably and never exceeds 0.15 p.p. However, the maximum revisions can still be higher than two p.p. In contrast, even when all the samples are considered, the revisions for the sequential smoothing algorithm are small: the mean standard deviation never exceeds 0.08 p.p., and the maximum revision reaches 0.66 p.p. at most.

Table 7.

Size of the Revisions of the Disaggregated Series.

Year	Mean standard deviation [p.p.]				Maximum revision [p.p.]
	All samples		Last five samples		All samples		Last five samples
	Smooth	Regress	Smooth	Regress	Smooth	Regress	Smooth	Regress
2006	0	0.95	0	0.13	0	18.1	0	2.00
2007	0	1.12	0	0.15	0	37.1	0	2.09
2008	0	1.18	0	0.14	0	34.4	0	1.99
2009	0	1.87	0	0.15	0	84.5	0	2.71
2010	0.05	1.31	0	0.11	0.64	50.8	0	1.29
2011	0.04	1.07	0	0.13	0.59	66.6	0	1.68
2012	0.05	0.34	0	0.12	0.56	6.80	0	2.08
2013	0.07	0.13	0	0.11	0.58	1.64	0	1.43
2014	0.08	0.12	0.08	0.12	0.62	2.20	0.62	2.20
2015	0.06	0.08	0.06	0.08	0.66	0.89	0.66	0.89
2016	0.06	0.06	0.06	0.06	0.38	0.44	0.38	0.44
2017	0.07	0.03	0.07	0.03	0.30	0.17	0.30	0.17

Note. Smooth represents the sequential smoothing algorithm; Regress is the regression-based one.

5. Discussion and Conclusion

This article proposes a novel spatiotemporal disaggregation method for economic time series. Our approach relies on the smoothing criterion and postulates that the annual growth rates of the disaggregated series should be as smooth as possible. Contrary to the existing methods, it is relatively simple and does not require auxiliary regressors. It is therefore well suited for statistical offices that lack long and reliable series of regional economic indicators or established disaggregation practices. We also show that the algorithm can be applied sequentially, which implies that one merely needs to revise the estimates for the last low-frequency period when new data are disaggregated. This is a useful feature when historical estimates are of interest, presenting another advantage over benchmarking algorithms.

Univariate temporal smoothing disaggregation methods are known to be subject to at least three serious weaknesses: the outcome series are overly smooth, the methods do not utilize available information, and they cannot be used for nowcasting and forecasting. As a result, this approach is commonly employed as a last resort (Eurostat 2013, 123).

These problems are considerably less severe in the multivariate version of the algorithm, as shown by the results presented in this study. In general, this is because of the country-level contemporaneous constraints that, to some extent, play the same role as the auxiliary regressors by guiding the nowcast values when the temporal regional constraints are unavailable.

As far as the excessive smoothness is concerned, the spatial contemporaneous constraints also restrict the space for smoothing, as shown by the short-run volatility visible in the disaggregated series. The disaggregated series are about as smooth as the country-level data. Nonetheless, this is still a sign of over-smoothness because the country-level aggregate, being a linear combination of the regional series, should, on average, be less volatile than its components.

Our study clearly shows that it is possible to nowcast with the multivariate smoothing method. The contemporaneous country-level constraints suffice to generate reasonable nowcasts. Moreover, in the case of short time series and poor quality of the regional business cycle benchmarks, accounting for the additional data does not necessarily lead to more accurate predictions.

We believe that one can merge smoothing with multivariate benchmarking, for example, by extending the linear regression objective function to include a smoothness-related term. Such an extension is likely to improve the nowcasting abilities of the regression-based methods in the case of the problematic data discussed in this study. However, we leave this problem for future research.

Footnotes

Acknowledgements

I am very grateful to anonymous reviewers and an associate editor for their detailed and insightful comments and suggestions, especially regarding the use of the interior point method. Their input has significantly enhanced the quality of the study. I also thank Marek A. Dąbrowski, Grzegorz Kończak, Jacek Osiewalski, Józef Pociecha, Andrzej Torój, and Aleksander Welfe for constructive comments on the earlier version of the study. All errors are entirely my own.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Jan Acedański

Received: October 2022

Accepted: August 2024

References

Boot, J. C. G.; Feibes; Lisman, J. H. C. 1967. “Further Methods of Derivation of Quarterly Figures from Annual Data.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 16 (1): 65–75. DOI: https://doi.org/10.2307/2985238.

Bozik, J. E.; Otto, M. C. 1988. “Benchmarking: Evaluating Methods That Preserve Month-to-Month Changes.” Technical Report RR-88/07, U.S. Bureau of the Census, Statistical Research Division.

Brown. 2010. “An empirical comparison of constrained optimization methods for benchmarking economic time series.” In JSM 2009 Proceedings, Business and Economic Statistics Section, 2131–2143. American Statistical Association.

Causey; Trager, M. L. 1981. “Derivation of Solution to the Benchmarking Problem: Trend Revision.” Technical Report, US Census Bureau. Available as Appendix in Bozik and Otto (1988).

Cholette. 1984. “Adjusting Sub-Annual Series to Yearly Benchmarks.” Survey Methodology 10 (1): 35–49.

Chow, G. C.; Lin. 1971. “Best Linear Unbiased Interpolation, Distribution, and Extrapolation of Time Series by Related Series.” The Review of Economics and Statistics 53 (4): 372–5. DOI: https://doi.org/10.2307/1928739.

Cuevas, Á.; Quilis, E. M.; Espasa. 2015. “Quarterly Regional GDP Flash Estimates by Means of Benchmarking and Chain Linking.” Journal of Official Statistics 31 (4): 627–47. DOI: https://doi.org/10.1515/jos-2015-0038.

Dagum, E. B.; Cholette, P. A. 2006. Benchmarking, Temporal Distribution, and Reconciliation Methods for Time Series. New York, NY: Springer. DOI: https://doi.org/10.1007/0-387-35439-5.

Denton, F. T. 1971. “Adjustment of Monthly or Quarterly Series to Annual Totals: An Approach Based on Quadratic Minimization.” Journal of the American Statistical Association 66 (333): 99–102. DOI: https://doi.org/10.2307/2284856.

Di Fonzo. 1990. “The Estimation of M Disaggregate Time Series When Contemporaneous and Temporal Aggregates Are Known.” The Review of Economics and Statistics 72 (1): 178–82. DOI: https://doi.org/10.2307/2109758.

Di Fonzo; Marini. 2011. “Simultaneous and Two-Step Reconciliation of Systems of Time Series: Methodological and Practical Issues.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 60 (2): 143–64. DOI: https://doi.org/10.1111/j.1467-9876.2010.00733.x.

Di Fonzo; Marini. 2012. “Benchmarking Time Series According to a Growth Rates Preservation Principle.” Journal of Economic and Social Measurement 37 (3): 225–52. DOI: https://doi.org/10.3233/JEM-2012-0358.

Di Fonzo; Marini. 2015. “Reconciliation of Systems of Time Series According to a Growth Rates Preservation Principle.” Statistical Methods and Applications 24 (4): 651–69. DOI: https://doi.org/10.1007/s10260-015-0322-y.

Di Iorio; Triacca. 2022. “A Comparison Between VAR Processes Jointly Modeling GDP and Unemployment Rate in France and Germany.” Statistical Methods & Applications 31 (3): 617–35. DOI: https://doi.org/10.1007/s10260-021-00594-2.

Eurostat. 2013. Handbook on Quarterly National Accounts. 2013 Edition. Eurostat Manuals and Guidelines. Luxembourg: Publications Office of the European Union. DOI: https://doi.org/10.2785/46080.

Froeb; Koyak. 1994. “Measuring and Comparing Smoothness in Time Series the Production Smoothing Hypothesis.” Journal of Econometrics 64 (1): 97–122. DOI: https://doi.org/10.1016/0304-4076(94)90059-0.

Grabek; Kłos. 2013. “Unemployment in the Estimated New Keynesian SoePL-2012 DSGE Model.” https://ideas.repec.org/p/nbp/nbpmis/144.html.

Hedhili Zaier; Trabelsi. 2007. “A Polynomial Method for Temporal Disaggregation of Multivariate Time Series.” Communications in Statistics – Simulation and Computation 36 (3): 741–59. DOI: https://doi.org/10.1080/03610910601096296.

Koop; McIntyre; Mitchell. 2020. “UK Regional Nowcasting Using a Mixed Frequency Vector Auto-Regressive Model with Entropic Tilting.” Journal of the Royal Statistical Society. Series A (Statistics in Society) 183 (1): 91–119. DOI: https://doi.org/10.1111/rssa.12491.

Koop; McIntyre; Mitchell; Poon. 2020. “Regional Output Growth in the United Kingdom: More Timely and Higher Frequency Estimates, 1970-2017.” Journal of Applied Econometrics 35 (2): 176–97. DOI: https://doi.org/10.1002/jae.2748.

Lehmann; Wikman. 2022. “Quarterly GDP Estimates for the German States.” IFO Working Paper Series 370, IFO Institute – Leibniz Institute for Economic Research at the University of Munich.

Mazzi, G. L.; Proietti. 2017. “Multivariate Temporal Disaggregation.” In Handbook on Rapid Estimates. 2017 Edition, edited by Mazzi, G. L.; Ladiray; Rieser, D. A., 231–85. Luxembourg: Publications Office of the European Union. DOI: https://doi.org/10.2785/4887400.

Moauro; Savio. 2005. “Temporal Disaggregation Using Multivariate Structural Time Series Models.” The Econometrics Journal 8 (2): 214–34. DOI: https://doi.org/10.1111/j.1368-423X.2005.00161.x.

Open Data Watch. 2021. “Open Data Inventory 2020/21 Annual Report.” https://odin.opendatawatch.com/Report/annualReport2020

Pipień; Roszkowska. 2015. “Quarterly Estimates of Regional GDP in Poland – Application of Statistical Inference of Functions of Parameters.” NBP Working Papers 219, National Bank of Poland.

Proietti. 2011. “Multivariate Temporal Disaggregation with Cross-Sectional Constraints.” Journal of Applied Statistics 38 (7): 1455–66. DOI: https://doi.org/10.1080/02664763.2010.505952.

Proietti; Moauro. 2006. “Dynamic Factor Analysis with Non-Linear Temporal Aggregation Constraints.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 55 (2): 281–300. DOI: https://doi.org/10.1111/j.1467-9876.2006.00536.x.

Rossi. 1982. “A Note on the Estimation of Disaggregate Time Series When the Aggregate Is Known.” The Review of Economics and Statistics 64 (4): 695–6. DOI: https://doi.org/10.2307/1923955.

Sax; Eddelbuettel. 2018. “Seasonal Adjustment by X-13ARIMA-SEATS in R.” Journal of Statistical Software 87 (11): 1–17. DOI: https://doi.org/10.18637/jss.v087.i11.

Titova; Findley; Monsell, B. C. 2010. “Comparing the Causey-Trager Method to the Multiplicative Cholette-Dagum Regression-Based Method of Benchmarking Sub-Annual Data to Annual Benchmarks.” JSM 2010 Proceedings, Business and Economic Statistics Section, 3007–3021. American Statistical Association.

Trager, M. L. 1982. “Derivation of Solution to the Benchmarking Problem: Relative Revision.” Technical Report, US Census Bureau. Available as Appendix in Bozik and Otto (1988).

Disaggregation and Nowcasting of Regional GDP Series with a Simple Smoothing Algorithm
