Sage Journals: Discover world-class research

Abstract

In recent years, researchers and statisticians have increasingly used econometric techniques to generate timely, high-quality, and detailed official statistics. This study presents a modified form of regression-based temporal disaggregation model to compile the index of production in construction for Türkiye, employing a state-space modeling approach. The model incorporates the twelve-month moving sum of deflated turnover as an observed variable and the number of employees as an exogenous variable. Finally, four alternative models—three assuming constant labor productivity and one assuming time-varying labor productivity—are compared.

Keywords

construction output statistics, time-varying parameter, labor productivity, temporal disaggregation

1. Introduction

The general economy and construction industry maintain a close relation, as evidenced by the strong interconnections between the construction industry and other sectors. Spillover effects from developments in the construction industry stimulate various sectors within the economy. These developments generate significant demand for intermediate input from other sectors, such as agriculture, manufacturing, transportation, mining, and services. Notably, developments in other sectors that need capital facilities to produce goods and services also affect the construction industry (K’Akumu 2009). Given its predominant domesticity and labor-intensiveness, the construction industry serves as a significant source of employment opportunities in the economy. Therefore, the quality of construction statistics holds strategic importance for effective economic planning and decision-making. When National Statistical Institutes (NSIs) ensure the provision of more reliable, accurate, and timely statistics, decision-makers can respond appropriately and implement timely strategic policies to address economic fluctuations (Fan et al. 2011).

The index of production in construction (IPC) is a theoretical measure that aims to show changes in the volume of value-added data. However, as monthly value-added data are typically unavailable, NSIs use some proxy variables such as output quantities, building permits and starts, hours worked, employment, material input, wages and salaries, and turnover to estimate the IPC (Eurostat 2011). Until the end of 2016, a productivity-corrected hours worked method had been used to compile the IPC in Türkiye, primarily relying on the hours worked survey as the data source. However, in 2016, Turkstat decided to increase the weight of administrative data sources in the production of official statistics to reduce data compilation costs while alleviating the respondent burden. Consequently, Turkstat discontinued certain surveys, including the hours worked survey, and shifted to predominantly utilizing value-added tax (VAT) registers to generate short-term production statistics.

Continuing the IPC with deflated turnover is one of the methods suggested by Eurostat (2011). This method relies on the assumption that monthly deflated turnover approximates monthly gross production. However, just like other administrative data sources, tax records are not designed specifically for statistical purposes. A significant seasonal bias arises in the turnover variable obtained from the monthly VAT registers in Türkiye, as the usual practice for enterprises and corporations is to report to the government for taxation and other accounting purposes at the end of the calendar year. Owing to inconsistencies between the timing of work done and the timing of recording, the seasonal pattern of the deflated turnover does not reflect the actual seasonal pattern of production in construction, indicating that deflated turnover may not be an appropriate proxy variable for value-added data in Türkiye.

Temporal disaggregation (TD) techniques, which have a variety of applications in the production of short-term official statistics, could be a solution to the aforementioned problem. Temporal disaggregation can be defined as the process of deriving a high-frequency time series from a low-frequency series. Temporal disaggregation methods are broadly classified into two main categories based on the availability of high-frequency information regarding the target variable. The first group includes methods that do not use an indicator (Durbin and Quenneville 1997; Hillmer and Trabelsi 1987; Lisman and Sandee 1964; Stram and Wei 1986; Wei and Stram 1990). The second category comprises methods employing high-frequency information from related indicators, such as the Denton procedure (Denton 1971), regression-based methods (Chow and Lin 1971; Dagum and Cholette 2006; Fernández 1981; Litterman 1983), and dynamic regression models (Di Fonzo 2003; Proietti 2006; Silva and Cardoso 2001).

The issue of seasonal bias can be addressed through temporal disaggregation by employing an appropriate high-frequency indicator that captures the expected seasonal variations in construction production. However, standard temporal disaggregation methods can lead to excessively large revisions, especially in annual growth rates, as they require extrapolation for months in which low-frequency data are not yet available. Therefore, we propose a modified form of regression-based temporal disaggregation technique (MTD) to overcome both the seasonal bias problem and the revision issue. The MTD differs from standard temporal disaggregation mainly in that we derive the monthly IPC from the monthly rolling annualized totals of the turnover series instead of the standard annual totals corresponding to each calendar year. The most important contribution of MTD is that it eliminates the need for extrapolation, as the most up-to-date information on annual turnover is used each month, which reduces revisions significantly compared to standard temporal disaggregation. However, as the model parameters are re-estimated every month, revisions still occur, although they are quite small.

Because labor input is used for updating IPC, the regression coefficient can be considered as labor productivity. A crucial underlying assumption in standard regression-based temporal disaggregation models is the constancy of the relation between variables (low-frequency and high-frequency related data). However, this assumption may not work well when the relation varies with time or when structural breaks are present within the data. Empirical evidence indicates that dynamics between macroeconomic variables may change in the long run (Stock and Watson 1996). Keeping this economic reality in mind, MTD is extended to explicitly account for changes in labor productivity over time. We compare four alternative models, with three assuming constant labor productivity and one assuming time-varying labor productivity.

We use a state-space model (SSM) for estimation as it enables the estimation of unobserved latent variables, such as labor productivity, and allows the calculation of time-varying parameters, offering a more flexible and dynamic modeling approach. The SSM approach to temporal disaggregation was originally introduced by Harvey and Pierse (1984) and further developed by Durbin and Quenneville (1997) and Harvey and Koopman (1997). More recently, Moauro and Savio (2005) used SUTSE models for temporal disaggregation, and Proietti (2006) contributed to the field of regression-based temporal disaggregation within the SSM framework. Labonne and Weale (2020) employed the SSM framework to derive monthly business sector output from overlapping noisy quarterly UK VAT data, which has similarities to our problem. This study considers the changing nature of relations between variables in a temporal disaggregation application. To our knowledge, this represents the first use of time-varying coefficients of explanatory variables in this context.

The remainder of this paper is organized as follows. Section 2 discusses the main data sources and variables for IPC compilation; Section 3 provides an overview of the general SSM and describes the proposed models for compiling Turkish IPC; Section 4 discusses the estimation results; Section 5 provides a general discussion of MTD; and Section 6 provides concluding remarks.

2. Data Sources and Variables for IPC Compilation

Despite the strategic significance of the construction sector, the compilation of production statistics for this industry poses considerable challenges worldwide due to its inherent characteristics (Ruddock 2002). In general, construction activities are mainly project-based and are characterized by short-term temporary partnerships between different companies (Eurostat 2011). Although the duration of a particular project is limited, it generally lasts longer than the accounting period (United Nations 1997). A significant portion of enterprises in the construction sector are small firms. However, even in large firms, much of the work is often subcontracted (Meikle and Grilli 1999). The widespread practice of subcontracting in the formal and informal sectors introduces the probability of double counting or omissions. An important share of construction activity may be unrecorded or under-recorded. First, a considerable amount of construction work is done by establishments whose main activities are not classified under the construction sector (United Nations 1997). Second, construction work is generally geographically dispersed, even for the same enterprise. Notably, maintaining a comprehensive register or database for collecting data from the construction sector is challenging due to the temporary nature of construction sites, which are often dispersed across numerous locations (Windapo and Qongqo 2011). In addition, numerous small-scale firms enter and exit the industry each year in response to changing economic conditions (Briscoe 2006). Owing to the footloose nature of the industry, the operations of these firms may not be captured in construction statistics (Briscoe 2006; K’Akumu 2007).

Two national account concepts for defining construction activity include gross output and gross value-added. While the former is usually measured, the latter must always be estimated (Meikle and Grilli 1999). Gross output is the turnover of the construction sector or the amount paid for construction work by its final customers. Gross value-added is the value-added by the construction industry itself; hence, it should exclude inputs from other parts of the economy to avoid double counting (Meikle and Gruneberg 2015). In other words, gross output is a more general concept than gross value-added, as it includes the total value of all inputs in construction work in addition to gross value-added.

The IPC aims to measure the changes in volume of value-added data at close and regular intervals, normally monthly. As monthly value-added data are generally unavailable, NSIs employ some variables or combinations of variables to estimate the IPC to substitute value-added data. These variables are output quantities, building permits and starts, hours worked, material input, wages and salaries, turnover, and subcontracting (Eurostat 2011).

Best and Meikle (2015) emphasized that poor data will yield poor results, regardless of the employed method. Due to the idiosyncratic features of the construction industry, collecting reliable and timely construction data is always troublesome. The two main data sources of construction output statistics are administrative sources and surveys. Administrative sources are very useful for producing timely, detailed, and comprehensive statistics. Although administrative register-based data compilation offers advantages such as cost-effectiveness and reduced burden on respondents, data quality may be adversely affected by classification and coverage issues, outliers, data entry errors, and reporting delays of construction enterprises (Eurostat 2011). Surveys are subject to sampling, coverage, and distribution problems, but more importantly, they can be disadvantageous in terms of meeting the timeliness criteria of good quality statistics. Often, surveys are used where suitable administrative data are unavailable or insufficient to produce IPC.

Where monetary data (turnover, value of output, wages and salaries, or value of purchased materials) are collected, deflation is necessary, as the IPC is a volume measure. The selection of representative price indices for generating construction output at constant prices presents certain challenges. Output prices or purchaser prices are considered ideal deflators for methods based on turnover or output value. Since an output price index shows the development of final prices paid by the client to the contractor for completed construction work, it reflects both changes in productivity and profitability (Best and Meikle 2015). Similarly, with methods based on input values (wages and salaries, value of input material), input prices paid for labor, purchased material, and equipment by contractors should be used. If indices for input costs are used instead of output price indices to obtain IPC by deflating turnover or output value, the deflators will systematically overstate price changes and underestimate growth in output and productivity (Valence 1996). The reason is that marginal cost shocks do not fully pass through to final product prices at the firm level, so final output prices do not completely reflect changes in costs (Klenow and Willis 2016).

Where quantity data (such as square meters of built area or cubic meters of volume) are collected, deflation is unnecessary. However, owing to the heterogeneity of construction output and quality differences even for buildings of the same size and type, quantity-based measurements would be misleading (Eurostat 2011).

When labor input (hours worked and number of persons employed) is used to update the IPC, labor productivity changes must be considered, suggesting that a productivity factor must be estimated. Common practice is to use the productivity trend observed in the past for the current reference period and to benchmark monthly IPC to quarterly series of construction value-added data. However, this may lead to large revisions in published monthly IPC if the relation between the quarterly series and hours worked is too noisy (Eurostat 2011). Because a single variable or data source is insufficient for compiling construction output statistics alone, some countries use combinations of variables or data sources to overcome the aforementioned problems.

2.1. Data Sources and Variables in Türkiye

The main data used in this study are the turnover and number of paid employees in construction industry obtained from Turkstat. Turnover statistics are based on VAT registers obtained from the Revenue Administration and number of paid employee statistics are based on registers procured from the Social Security Institution (SSI). Furthermore, the Hedonic New House Price Index from the Central Bank of Türkiye and Construction Cost Index (CCI) from Turkstat were utilized to deflate the construction output at nominal prices to real prices. All data are monthly and cover the period from January 2015 to December 2022.

Turnover data are available at the two-digit NACE Rev2 (Statistical Classification of Economic Activities in the European Community) level. Division F-41 (construction of buildings) is deflated using the Hedonic New House Price Index, which is an output price index. Because appropriate output price indices are not available for division F-42 (civil engineering) and division F-43 (specialized construction activities), the civil engineering subcomponent of the CCI was used for division F-42 and aggregate CCI was employed for division F-43, making up a relatively small share of total construction activity. All deflated subcomponents were then aggregated to obtain the total deflated construction turnover. All data used in the models were indexed to 2015 = 100.
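The deflation and indexing steps described above can be sketched as follows. This is only an illustrative sketch: the function name, the toy numbers, the aggregation by simple summation, and the choice of the base-year monthly average as the reference level are assumptions made here, not details confirmed by the text.

```python
import numpy as np

def deflate_and_index(nominal, price_index, base_slice):
    """Deflate nominal turnover by division, aggregate, and index to base = 100.

    nominal, price_index: arrays of shape (n_months, n_divisions); price indices
    are expressed with 100 as the reference level (assumption).
    base_slice: slice selecting the months of the base year.
    """
    nominal = np.asarray(nominal, dtype=float)
    price_index = np.asarray(price_index, dtype=float)
    real = nominal / (price_index / 100.0)   # turnover at constant prices
    total = real.sum(axis=1)                 # assumed aggregation: simple sum over divisions
    return 100.0 * total / total[base_slice].mean()

# Hypothetical data: three divisions (F-41, F-42, F-43), two years.
# Nominal turnover doubles in year 2, but so do prices, so real output is flat.
nominal = np.vstack([np.tile([50.0, 30.0, 20.0], (12, 1)),
                     np.tile([100.0, 60.0, 40.0], (12, 1))])
prices = np.vstack([np.full((12, 3), 100.0), np.full((12, 3), 200.0)])
index = deflate_and_index(nominal, prices, slice(0, 12))
```

In this toy case the deflated index stays at 100 in both years, which is the point of deflation: a purely nominal increase does not show up in a volume measure.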

Figure 1 shows the monthly index of deflated turnover. The chart exhibits pronounced seasonality, characterized by recurring troughs typically occurring in January and peaks observed in December. In fact, production in the construction sector should normally be reduced during the winter months or rainy seasons and should be higher during the summer months. As cold climate increases the cost of outdoor activities such as new housing and highway construction, builders should incorporate climate when planning construction projects to minimize the adverse impact of weather conditions on the cost of work (Tschetter and Lukasiewicz 1983). Hence, weather conditions and the timing of projects lead to pronounced seasonality in the construction industry. However, the seasonal pattern of the monthly index of deflated turnover shown in Figure 1 deviates from the expected seasonal pattern of the IPC. Because the usual practice for enterprises and corporations is to report to the government for taxation and other accounting purposes at the end of the calendar year, the timing of the construction work done deviates from the timing of official reporting. Because a time delay exists within the year between the actual production activity and registration, a significant seasonal bias arises in the variable turnover obtained from monthly VAT registers in Türkiye.

Figure 1.

Monthly index of deflated turnover.

We use the number of employees as an exogenous variable in the model because labor is a major input to the construction industry. Figure 2 shows the monthly construction employment index. Seasonality affects the demand for labor in the construction industry. Because the labor force is overused during high seasons of construction and underused in low seasons (Dagum and Cholette 2006), construction employment data show similar seasonal variations to production in the construction industry, unlike turnover. One can argue that the variable of hours worked provides a better estimate of productivity, as it also reflects calendar effect variations. However, the hours worked data in Türkiye are published on a quarterly basis. Employment data from SSI also has its own limitations. The major drawback of administrative data is the exclusion of informal employment. The labor force survey comprises both registered and unregistered employment, but detailed results by economic activity are only available on a quarterly basis as the sample size is insufficient to provide independent monthly estimates. Because our analysis of quarterly LFS data revealed an almost stable relation between informal and total employment, we assumed that this relation is also valid for monthly employment series obtained from SSI data.

Figure 2.

Monthly index of employment.

3. State-Space Modeling Approach

In this section, we present a new method for compiling monthly IPC using the SSM. The SSM has received increased attention in the production of official statistics in recent years, and studies on this subject have proliferated: Pfeffermann et al. (1998) for Australian LFS; Silva and Smith (2001) for Brazil LFS; Elliott and Zong (2019) for UK unemployment statistics; Van den Brakel and Krieg (2009, 2015) for Dutch unemployment statistics; Tiller (1992) and Pfeffermann and Tiller (2006) for US labor statistics; Bisio and Moauro (2018) for Italian quarterly national accounts; Labonne and Weale (2020) for UK business sector output; and Balabay et al. (2016) for the Netherlands Road Transport Survey. This study pioneers the application of the SSM to estimating production in the construction industry.

We apply modified forms of the Chow–Lin (CL) model and its extensions, the Fernández and Litterman models, which are widely used by NSIs. We compare four alternative models—three assuming constant labor productivity and one assuming time-varying labor productivity.

The general form of a linear Gaussian SSM is specified by two equations: the measurement equation (also known as the signal or observation equation) and the state (or transition) equation. The measurement equation provides a link between the (p × 1) vector of observable variables $Y_{t}$ at time t and (m × 1) vector of unobserved state variables $α_{t}$ , which determine the underlying dynamics of the system.

\begin{aligned} Y_{t} &= Z_{t} α_{t} + ϵ_{t}, & ϵ_{t} &\sim N(0, H_{t}), \\ α_{t + 1} &= T_{t} α_{t} + R_{t} v_{t}, & v_{t} &\sim N(0, Q_{t}), \end{aligned}

(1)

where $Z_{t}$ is a (p × m) matrix, $ϵ_{t}$ is a (p × 1) vector of the serially uncorrelated disturbances with mean zero and covariance matrix $H_{t}$ , $T_{t}$ is an (m × m) matrix, $R_{t}$ is an (m × g) matrix, and $v_{t}$ is a (g × 1) vector of serially uncorrelated disturbances with mean zero and covariance matrix $Q_{t}$ . The initial state $α_{1}$ is a random vector such that its mean and covariance are $a_{1}$ and $P_{1}$ . The system matrices $Z_{t}$ , $H_{t}$ , $T_{t}$ , $R_{t}$ , and $Q_{t}$ are initially assumed as known. SSM can be estimated using the Kalman filter. The Kalman filter is a recursive estimation algorithm based on the minimization of the variance of the estimation error of the state vector. If Gaussian errors are assumed, the model parameters can be easily estimated by maximum-likelihood methods. For detailed information on SSM, we refer readers to Harvey (1990), Durbin and Koopman (2001), and Commandeur and Koopman (2007).
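As a rough illustration of the recursion, the following sketch implements a Kalman filter for a time-invariant special case of model (1), using the same timing convention ($α_{t+1} = T α_{t} + R v_{t}$). The function name, the handling of missing rows, and the toy local-level example are assumptions for illustration; the paper's own estimates were produced in EViews, not with this code.

```python
import numpy as np

def kalman_filter(y, Z, H, T, R, Q, a1, P1):
    """Kalman filter for a time-invariant version of model (1).

    y has shape (n, p); rows containing NaN are treated as missing.
    Returns the predicted states a_1, ..., a_{n+1} and the Gaussian log-likelihood.
    """
    y = np.asarray(y, dtype=float)
    n, p = y.shape
    a = np.asarray(a1, dtype=float).copy()
    P = np.asarray(P1, dtype=float).copy()
    RQR = R @ Q @ R.T
    states = [a.copy()]
    loglik = 0.0
    for t in range(n):
        if np.isnan(y[t]).any():
            # No observation: propagate the state without an update step
            a = T @ a
            P = T @ P @ T.T + RQR
        else:
            v = y[t] - Z @ a                     # one-step-ahead prediction error
            F = Z @ P @ Z.T + H                  # prediction error covariance
            K = T @ P @ Z.T @ np.linalg.inv(F)   # Kalman gain
            loglik += -0.5 * (p * np.log(2 * np.pi)
                              + np.log(np.linalg.det(F))
                              + v @ np.linalg.solve(F, v))
            a = T @ a + K @ v
            P = T @ P @ (T - K @ Z).T + RQR
        states.append(a.copy())
    return np.array(states), loglik

# Toy example: a constant level of 5 observed with noise variance 1,
# a vague (but nondiffuse) prior, and one missing observation.
one = np.array([[1.0]])
y = np.full((50, 1), 5.0)
y[3, 0] = np.nan
states, loglik = kalman_filter(y, Z=one, H=one, T=one, R=one,
                               Q=np.array([[0.0]]),
                               a1=np.array([0.0]), P1=np.array([[1e6]]))
```

Treating missing observations by skipping the update step is also what allows the first eleven values of a rolling annual sum to be left unobserved.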

The main variables used in the models are $f_{t}$ and $x_{t}$ , which denote deflated turnover and the number of employees, respectively. Although using deflated turnover to update monthly IPC is recommended by Eurostat, as pointed out before, it gives misleading signals about the seasonal pattern of construction activity. To overcome this problem, we construct a signal (observable) variable $F_{t}$ equal to a twelve-month moving sum of the deflated turnover, $f_{t}$ :

F_{t} = \sum_{i = 0}^{11} f_{t - i}, t = 12, \dots, n

(2)

Notably, $F_{t}$ is a monthly frequency series, but each of its values is a rolling annual sum. It is updated each month when a new value of $f_{t}$ is added to the sequence. Only the observations belonging to the first eleven months will be treated as missing values in the model. Because the seasonal component is specified to cancel out over twelve consecutive months under the assumption of slowly evolving seasonality, the yearly series cannot contain seasonality (Dagum and Cholette 2006). Therefore, using a twelve-month moving sum of $f_{t}$ amounts to an implicit seasonal adjustment of the series.
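The construction of the signal variable in Equation (2) can be sketched in a few lines. The function name and the toy series (a flat level with a December spike and an offsetting January dip, loosely mimicking the reporting pattern described for VAT turnover) are hypothetical and chosen only to show that a stable seasonal pattern cancels in the rolling sum.

```python
import numpy as np

def rolling_annual_sum(f, window=12):
    """Return F_t = f_t + f_{t-1} + ... + f_{t-window+1}.

    The first window-1 entries are NaN, i.e. treated as missing values.
    """
    f = np.asarray(f, dtype=float)
    F = np.full(f.shape, np.nan)
    if f.size >= window:
        csum = np.concatenate(([0.0], np.cumsum(f)))
        F[window - 1:] = csum[window:] - csum[:-window]
    return F

# Hypothetical monthly series over three years: level 100,
# +60 in each December and -60 in each January.
months = np.arange(36)
f = (100.0
     + np.where(months % 12 == 11, 60.0, 0.0)
     - np.where(months % 12 == 0, 60.0, 0.0))
F = rolling_annual_sum(f)
```

Every window of twelve consecutive months contains exactly one December and one January, so the rolling sum is constant even though the monthly series is strongly seasonal.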

3.1. SSMs Based on Constant Labor Productivity Assumption

In this section, we set up three SSMs, which are modified versions of the regression-based CL, Fernández, and Litterman-type temporal disaggregation models.

3.1.1. Chow–Lin-Type Model

Chow and Lin (1971) proposed a linear regression model for temporal disaggregation.

y_{t} = x_{t} β_{t} + u_{t},

(3)

where $y_{t}$ denotes monthly IPC, $x_{t}$ is the number of employees, and $β_{t}$ is the labor productivity assumed to be invariable over time.

β_{t + 1} = β_{t} = β,

(4)

The disturbance series $u_{t}$ in Equation (3) is assumed to follow a first-order autoregressive, AR(1), process:

\begin{aligned} u_{t + 1} &= ϕ u_{t} + ζ_{t}, \quad ζ_{t} \sim N(0, σ_{ζ}^{2}), \\ | ϕ | &< 1 \quad \text{and} \quad u_{t} \sim N(0, σ_{ζ}^{2} / (1 - ϕ^{2})) \end{aligned}

(5)

Although the annual summation of the series implies a simple form of seasonal adjustment, the yearly series may still exhibit some calendar effects, including trading days and moving holidays that do not cancel within the year. In particular, the Ramadan and Sacrifice holidays, which are moving religious holidays in Türkiye, might substantially impact the number of working days in a month and the effect varies from year to year. Therefore, we have included the composite calendar regressor used by Turkstat in the seasonal adjustment process in the model given in Equation (3). This regressor is based on the number of effective working days in a month; therefore, it does not include the weekend or fixed national and moving holidays.

y_{t} = x_{t} β_{t} + ω_{t} λ_{t} + u_{t},

(6)

where $ω_{t}$ is the composite calendar regressor and $λ_{t}$ is its time-invariant coefficient.

λ_{t + 1} = λ_{t} = λ

(7)

As previously stated, due to its seasonal bias, the deflated monthly turnover variable cannot be directly utilized as a proxy for the IPC. To overcome this issue, signal variable $F_{t}$ is constructed in such a way that it would be equal to a twelve-month moving sum of the deflated turnover. Since this transformation removes the seasonality from $F_{t}$ , it can also be assumed to be equivalent to a twelve-month moving sum of the monthly IPC:

F_{t} = y_{t} + y_{t - 1} + y_{t - 2} + \dots + y_{t - 11}, \quad t = 12, \dots, n,

(8)

Substituting $y_{t}$ in Equation (6) into Equation (8) and rearranging yields:

F_{t} = (x_{t} β_{t} + ω_{t} λ_{t} + u_{t}) + \dots + (x_{t - 11} β_{t - 11} + ω_{t - 11} λ_{t - 11} + u_{t - 11})

(9)

We can transform the observation Equation (9) into an equivalent minimal representation by defining the state variables for all lagged terms of the unobserved IPC as follows:

F_{t} = (x_{t} β_{t} + ω_{t} λ_{t} + u_{t}) + δ_{t} + δ_{t - 1} + \dots + δ_{t - 10}

(10)

where

δ_{t + 1} = x_{t} β_{t} + ω_{t} λ_{t} + u_{t}

(11)

Hence, the aggregation constraints are satisfied by the observation equation. Once the model is estimated, the monthly IPC can be obtained indirectly from the number of employees, the smoothed estimates of labor productivity, and the disturbance series (see the term in parentheses in Equation (10)); alternatively, the lagged values of the IPC can be obtained directly from the smoothed estimates of the state variables $δ_{t}$ .

The state-space representation of the CL-type model is as follows:

F_{t} = {z^{'}}_{t} α_{t}, \quad α_{t + 1} = T_{t} α_{t} + R_{t} v_{t}, \quad v_{t} \sim N(0, Q_{t}),

(12)

where ${z^{'}}_{t} = [x_{t}, ω_{t}, 1, 1, \dots, 1]$ and the state vector is $α_{t} = {[β_{t}, λ_{t}, u_{t}, δ_{t}, δ_{t - 1}, \dots, δ_{t - 10}]}^{'}$ .

The state transition equation is as follows:

\begin{bmatrix} β_{t + 1} \\ λ_{t + 1} \\ u_{t + 1} \\ δ_{t + 1} \\ δ_{t} \\ \vdots \\ δ_{t - 9} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & ϕ & 0 & 0 & \dots & 0 \\ x_{t} & ω_{t} & 1 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} β_{t} \\ λ_{t} \\ u_{t} \\ δ_{t} \\ δ_{t - 1} \\ \vdots \\ δ_{t - 10} \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} ζ_{t}, \quad Q_{t} = σ_{ζ}^{2}

(13)
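To make the structure of Equations (12) and (13) concrete, the sketch below builds the observation vector and transition matrix for one time point and checks that the observation equation reproduces the twelve-month sum. The function name `cl_system` and all numerical values are hypothetical; this is an illustration of the matrix layout, not the authors' EViews specification.

```python
import numpy as np

def cl_system(x_t, w_t, phi, n_lags=11):
    """Build z_t, T_t and R for the CL-type model of Equations (12)-(13).

    State ordering: [beta, lambda, u, delta_t, delta_{t-1}, ..., delta_{t-10}].
    """
    m = 3 + n_lags
    z = np.concatenate(([x_t, w_t, 1.0], np.ones(n_lags)))
    T = np.zeros((m, m))
    T[0, 0] = 1.0                 # beta_{t+1} = beta_t
    T[1, 1] = 1.0                 # lambda_{t+1} = lambda_t
    T[2, 2] = phi                 # u_{t+1} = phi * u_t + zeta_t
    T[3, :3] = [x_t, w_t, 1.0]    # delta_{t+1} = x_t*beta_t + w_t*lambda_t + u_t (= y_t)
    for i in range(4, m):
        T[i, i - 1] = 1.0         # shift each stored lag down by one position
    R = np.zeros((m, 1))
    R[2, 0] = 1.0                 # zeta_t enters only the AR(1) disturbance
    return z, T, R

# Hypothetical values: beta = 2, lambda = 0.5, u_t = 0.3, and the eleven
# previous monthly values y_{t-1}, ..., y_{t-11} stored in the delta states.
lags = np.arange(1.0, 12.0)
alpha = np.concatenate(([2.0, 0.5, 0.3], lags))
z, T, R = cl_system(x_t=10.0, w_t=4.0, phi=0.8)

y_t = 10.0 * 2.0 + 4.0 * 0.5 + 0.3   # current (unobserved) monthly value
F_t = z @ alpha                      # observation equation: twelve-month sum
alpha_next = T @ alpha               # transition with zeta_t = 0
```

Under the same layout, setting `phi=1.0` gives the transition matrix of the Fernández-type model.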

3.1.2. The Fernández-Type Model

Fernández (1981) proposed extending the original AR(1) approach of CL to an integrated model, in which the error term follows a random walk process:

u_{t + 1} = u_{t} + ζ_{t}, \quad ζ_{t} \sim N(0, σ_{ζ}^{2}),

(14)

The state-space representation of the Fernández-type model is as follows:

{z^{'}}_{t} = [x_{t}, ω_{t}, 1, 1, \dots, 1], α_{t} = {[β_{t}, λ_{t}, u_{t}, δ_{t}, δ_{t - 1}, \dots δ_{t - 10}]}^{'}

(15)

The system matrices $T_{t}$ , $R_{t}$ , and $Q_{t}$ are as follows:

T_{t} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 1 & 0 & 0 & \dots & 0 \\ x_{t} & ω_{t} & 1 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 0 & 1 & 0 \end{bmatrix}, \quad R_{t} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad Q_{t} = σ_{ζ}^{2},

(16)

The system matrices are the same as the CL-type model except for $T_{t}$ . Note that the autoregressive parameter $ϕ$ in (13) is replaced by 1 in (16).

3.1.3. The Litterman-Type Model

Litterman (1983) proposed a model where the disturbance term follows the ARIMA(1,1,0) process on the grounds that Fernández’s random walk model may not remove all the serial correlation.

\begin{aligned} u_{t + 1} &= u_{t} + ζ_{t}, \quad ζ_{t + 1} = ϕ ζ_{t} + ψ_{t}, \\ | ϕ | &< 1, \quad ψ_{t} \sim N(0, σ_{ψ}^{2}) \quad \text{and} \quad ζ_{t} \sim N(0, σ_{ψ}^{2} / (1 - ϕ^{2})) \end{aligned}

(17)

The state-space representation of the Litterman-type model is as follows:

{z^{'}}_{t} = [x_{t}, ω_{t}, 1, 0, 1, \dots, 1], α_{t} = {[β_{t}, λ_{t}, u_{t}, ζ_{t}, δ_{t}, δ_{t - 1}, \dots δ_{t - 10}]}^{'}

(18)

The system matrices $T_{t}$ , $R_{t}$ , and $Q_{t}$ are as follows:

T_{t} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & ϕ & 0 & 0 & \dots & 0 \\ x_{t} & ω_{t} & 1 & 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \dots & 0 & 1 & 0 \end{bmatrix}, \quad R_{t} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad Q_{t} = σ_{ψ}^{2},

(19)
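The extra state $ζ_{t}$ changes the layout of the system matrices relative to the CL-type model. The sketch below builds them for one time point and checks the two disturbance recursions in Equation (17). The function name `litterman_system` and the numerical values are hypothetical, chosen only to illustrate Equations (18) and (19).

```python
import numpy as np

def litterman_system(x_t, w_t, phi, n_lags=11):
    """Build z_t, T_t and R for the Litterman-type model of Equations (18)-(19).

    State ordering: [beta, lambda, u, zeta, delta_t, ..., delta_{t-10}].
    """
    m = 4 + n_lags
    # zeta_t has a zero loading: it affects F_t only through u in later periods
    z = np.concatenate(([x_t, w_t, 1.0, 0.0], np.ones(n_lags)))
    T = np.zeros((m, m))
    T[0, 0] = 1.0                 # beta_{t+1} = beta_t
    T[1, 1] = 1.0                 # lambda_{t+1} = lambda_t
    T[2, 2] = 1.0                 # u_{t+1} = u_t + zeta_t
    T[2, 3] = 1.0
    T[3, 3] = phi                 # zeta_{t+1} = phi * zeta_t + psi_t
    T[4, :3] = [x_t, w_t, 1.0]    # delta_{t+1} = x_t*beta_t + w_t*lambda_t + u_t
    for i in range(5, m):
        T[i, i - 1] = 1.0         # shift each stored lag down by one position
    R = np.zeros((m, 1))
    R[3, 0] = 1.0                 # psi_t enters only the zeta equation
    return z, T, R

# Hypothetical state: beta = 2, lambda = 0.5, u_t = 0.3, zeta_t = 0.1, lags all zero
alpha = np.concatenate(([2.0, 0.5, 0.3, 0.1], np.zeros(11)))
z, T, R = litterman_system(x_t=10.0, w_t=4.0, phi=0.6)
alpha_next = T @ alpha            # transition with psi_t = 0
```

With $ψ_{t} = 0$, the next state has $u_{t+1} = 0.3 + 0.1$ and $ζ_{t+1} = 0.6 \times 0.1$, which matches the ARIMA(1,1,0) structure of the disturbance.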

3.2. SSM Based on Time-Varying Labor Productivity Assumption

If the relationship between two economic variables is time-varying, employing a model with a constant parameter would constitute a specification error. In this case, the estimator will be inefficient, and the estimate of error variance usually is downward-biased (Rosenberg 1973b, 399). Given that the labor–output relation is affected by changes in capital intensity and total factor productivity, assuming that productivity is constant over time seems unreasonable. Therefore, in this section, we propose a model in which the labor productivity coefficient is allowed to vary stochastically over time.

Time-varying coefficient models were introduced into the literature in the 1970s by Swamy (1971), Cooper (1973), Rosenberg (1973a, 1973b), and Cooley and Prescott (1976), among many others. In the existing literature, stochastically varying regression parameters are generally assumed to follow one of three processes: a random process, a random walk process, or a first-order Markov process (or, more generally, an ARMA process); for a survey of these models, refer to Rosenberg (1973a).

Rosenberg (1973b) proposed a model in which the time-varying coefficients follow a first-order autoregressive process, as follows:

β_{t + 1} = (1 - ρ) \bar{β} + ρ β_{t} + ξ_{t}

(20)

While the term $(1 - ρ)$ ensures convergence toward a normal value $\bar{β}$ , the serially independent disturbances $ξ_{t}$ impose stochastic variation over time (Rosenberg 1973b).

If $ρ = 0$ , the Rosenberg model turns into Swamy’s (1971) random coefficient model. In this model, the time-varying coefficient varies around a constant mean $μ$ , which corresponds to $\bar{β}$ in Equation (20).

β_{t + 1} = μ + ξ_{t}

(21)

If $ρ = 1$ in the Rosenberg model, it reduces to the random walk model. General textbooks on time series analysis by state-space methods commonly assume that time-varying coefficients of explanatory variables follow a random walk process (Harvey 1990, sec. 7.7, 408; Durbin and Koopman 2001, sec. 3.2, 50; Hamilton 1994, sec. 13.8, 400).

β_{t + 1} = β_{t} + ξ_{t}

(22)

Perhaps the most widely used specification for a time-varying coefficient in empirical studies is the random walk model, as it is easy to implement, requires minimal information, and performs well for many datasets. According to Engle and Watson (1987), for most economic variables, the process should have a unit root and evolve slowly. Long-run productivity is more likely to evolve in a smooth and persistent manner over time. Therefore, we assume that the model representing the dynamics of productivity is the random walk model.
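The three coefficient processes in Equations (20) to (22) differ only in the value of $ρ$, which a short simulation makes visible. The function name, parameter values, and seed below are hypothetical; the sketch illustrates the recursions themselves and is not part of the estimation procedure.

```python
import numpy as np

def simulate_beta(n, rho, beta_bar, sigma_xi, beta0, seed=0):
    """Simulate beta_{t+1} = (1 - rho) * beta_bar + rho * beta_t + xi_t (Equation 20).

    rho = 0 gives the random coefficient model (Equation 21);
    rho = 1 gives the random walk model (Equation 22).
    Returns the simulated path and the disturbances xi_t that generated it.
    """
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, sigma_xi, size=n - 1)
    beta = np.empty(n)
    beta[0] = beta0
    for t in range(n - 1):
        beta[t + 1] = (1.0 - rho) * beta_bar + rho * beta[t] + xi[t]
    return beta, xi

# Same disturbances (same seed), three values of rho
rw, xi_rw = simulate_beta(96, rho=1.0, beta_bar=1.0, sigma_xi=0.02, beta0=1.0)
rc, xi_rc = simulate_beta(96, rho=0.0, beta_bar=1.0, sigma_xi=0.02, beta0=1.0)
ar, xi_ar = simulate_beta(96, rho=0.9, beta_bar=1.0, sigma_xi=0.02, beta0=1.0)
```

With $ρ = 1$ the path accumulates all past disturbances, so it evolves slowly and persistently, which is the behavior the random walk assumption is meant to capture for long-run productivity.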

The unobserved monthly IPC with time-varying productivity is defined as follows:

\begin{aligned} y_{t} &= x_{t} β_{t} + ω_{t} λ_{t} + u_{t}, \\ u_{t + 1} &= ϕ u_{t} + ζ_{t}, \quad ζ_{t} \sim N(0, σ_{ζ}^{2}), \\ | ϕ | &< 1 \quad \text{and} \quad u_{t} \sim N(0, σ_{ζ}^{2} / (1 - ϕ^{2})) \end{aligned}

(23)

where $β_{t}$ represents the time-varying coefficient, $ω_{t}$ is the calendar variable, and $λ_{t}$ is the time-invariant coefficient of the calendar variable. To remove potential serial correlation in the disturbance series $u_{t}$ , we assume that it is generated by a simple first-order autoregressive model, as in the CL-type model. We do not apply the Fernández and Litterman solutions, as incorporating time-varying productivity into the model allows the nonstationarities in the data to be addressed. The CL-type model postulates a stationary AR(1) residual process. However, if the related series ( $x_{t}$ ) and the unobserved disaggregated time series ( $y_{t}$ ) are nonstationary, this postulation is admissible only if a linear co-integration relation exists between these series. Meanwhile, the Fernández- and Litterman-type models assume that the disturbance follows an integrated process, implying that no linear long-run equilibrium relation exists between $y_{t}$ and $x_{t}$ . If nonstationarities result from a nonlinear co-integration relationship or structural changes between the variables, time-varying coefficients can capture these dynamics, as they allow the regression model to adapt to changes between the variables over time.

Consequently, in the SSM model based on time-varying productivity (hereinafter TV-CL), all equations are the same as the CL-type model in the previous section, except that the disturbance term $ξ_{t}$ in the state equation (22) represents productivity, thereby yielding the following state-space representation of the model:

{z^{'}}_{t} = [x_{t}, ω_{t}, 1, 1, \dots, 1], α_{t} = {[β_{t}, λ_{t}, u_{t}, δ_{t}, δ_{t - 1}, \dots, δ_{t - 10}]}^{'}

(24)

The state transition equation is as follows:

[\begin{matrix} β_{t + 1} \\ λ_{t + 1} \\ u_{t + 1} \\ δ_{t + 1} \\ δ_{t} \\ ⋮ \\ δ_{t - 9} \end{matrix}] = [\begin{matrix} 1 & 0 & 0 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & ϕ & 0 & 0 & \dots & 0 \\ x_{t} & ω_{t} & 1 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & 1 & 0 & \dots & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋱ & ⋱ & ⋮ \\ 0 & 0 & 0 & \dots & 0 & 1 & 0 \end{matrix}] [\begin{matrix} β_{t} \\ λ_{t} \\ u_{t} \\ δ_{t} \\ δ_{t - 1} \\ ⋮ \\ δ_{t - 10} \end{matrix}] + [\begin{matrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \\ ⋮ & ⋮ \\ 0 & 0 \end{matrix}] [\begin{matrix} ξ_{t} \\ ζ_{t} \end{matrix}], Q_{t} = [\begin{matrix} σ_{ξ}^{2} & 0 \\ 0 & σ_{ζ}^{2} \end{matrix}]

(25)
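The structure of Equations (24) and (25) can be checked numerically. The sketch below builds the observation vector and the transition matrix as plain Python lists and propagates the state without disturbances. Under this reading, $δ_{t + 1}$ stores $y_{t}$, so the signal $z'_{t} α_{t}$ equals the twelve-month moving sum of the unobserved monthly series. The function names and test values are hypothetical.

```python
def transition_matrix(x_t, omega_t, phi, n_lags=11):
    """Transition matrix for the state [beta, lam, u, delta_t, ..., delta_{t-10}]."""
    m = 3 + n_lags
    T = [[0.0] * m for _ in range(m)]
    T[0][0] = 1.0            # beta_{t+1} = beta_t (+ xi_t)
    T[1][1] = 1.0            # lam_{t+1} = lam_t
    T[2][2] = phi            # u_{t+1} = phi * u_t (+ zeta_t)
    T[3][0], T[3][1], T[3][2] = x_t, omega_t, 1.0  # delta_{t+1} = y_t
    for i in range(4, m):    # shift the stored lags down by one position
        T[i][i - 1] = 1.0
    return T


def observation_vector(x_t, omega_t, n_lags=11):
    """Vector z_t that picks out y_t plus the eleven stored lags."""
    return [x_t, omega_t, 1.0] + [1.0] * n_lags


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def matvec(T, a):
    return [dot(row, a) for row in T]


if __name__ == "__main__":
    alpha = [0.9, 2.0, 1.0] + [0.0] * 11   # hypothetical initial state
    ys, signals = [], []
    for t in range(15):
        x_t, w_t = 100.0 + t, float(t % 2)
        ys.append(x_t * alpha[0] + w_t * alpha[1] + alpha[2])
        signals.append(dot(observation_vector(x_t, w_t), alpha))
        alpha = matvec(transition_matrix(x_t, w_t, 0.5), alpha)
    # From the twelfth period onward the signal is the sum of the last twelve y values.
    print(abs(signals[14] - sum(ys[3:15])) < 1e-9)
```

Because the regressors enter the transition matrix, the system matrices are time-varying, which is why the matrix is rebuilt at every step in this sketch.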

The models presented in this paper were estimated using the SSpace object of the EViews software package. The BFGS (Broyden–Fletcher–Goldfarb–Shanno) quasi-Newton algorithm was used as the optimization method. Smoothed estimates were obtained from a fixed-interval smoothing algorithm. State variables were initialized with nondiffuse initialization. The initial values of the disturbances were set to zero, and the initial value of productivity in the TV-CL model was set to 1. The initial conditions of the covariance matrices were applied based on the specifications outlined in the aforementioned models.

4. Empirical Results

4.1. IPC Estimates

Figure 3 presents a comparative display of the IPC estimates obtained from the SSMs. Notably, the wider confidence intervals for the CL- and Fernández-type models indicate a higher degree of uncertainty than for the TV-CL and Litterman-type models. All IPC estimates are very close to each other and follow a seasonal pattern similar to that of the employment series used as a related variable in the models. They also reflect nonseasonal movements embedded in the deflated turnover data. This is because the information at the sub-annual (monthly) frequency is obtained from the number of employees and the moving sum of deflated turnover data.

Figure 3.

IPC estimates with confidence intervals. Solid lines show IPC estimates and dashed lines show corresponding ±2 standard error bands.

4.2. Time-Varying Labor Productivity Estimate

As mentioned before, the model representing the productivity dynamics is assumed to be the random walk model. Figure 4A presents the smoothed estimate of labor productivity obtained from this model. To evaluate how well the local-level model fits the real situation, annual productivity coefficients were calculated from the original annual data by dividing the original annual turnover index by the original annual employment index. The annual productivity coefficients from the SSM are then compared with those from the original data. Figure 4B illustrates that the annual productivity coefficients derived from the state-space model (SSM) are remarkably close to those calculated directly from the original data, highlighting the effectiveness of the random walk model in capturing the labor productivity dynamics.

Figure 4.

Comparison of labor productivity estimates: (A) Monthly labor productivity estimates from SSM and (B) Comparison of annual labor productivity estimates.

4.3. Model Selection

Table 1 provides maximum likelihood estimates of hyperparameters and model selection criteria for alternative model specifications. While the constant productivity value is around 1 for the CL-, Fernández-, and Litterman-type models, the value reported for the TV-CL model, 0.71, is the last estimate of $β_{t}$; over the sample, it varies between 0.71 and 1.36 (see Figure 4A). The coefficients of the composite calendar regressor $λ_{t}$ are all statistically significant and have a positive sign, aligning with our a priori expectations. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) values indicate that the TV-CL model is the most suitable of the four models. This suggests that the incorporation of time-varying productivity enhances the model’s explanatory power and its ability to represent the underlying dynamics in the data.

Table 1.

Estimation Results of Various Models.

	$σ_{ζ}^{2}$	$σ_{ψ}^{2}$	$σ_{ξ}^{2}$	$ϕ$	$β_{t}$	$λ_{t}$	LL	AIC	BIC
CL	156.94 (0.00)			0.73 (0.00)	1.03 (0.00)	1.88 (0.00)	−349.06	8.31	8.42
Fernandez	172.68 (0.00)				0.89 (0.00)	1.97 (0.00)	−354.46	8.41	8.50
Litterman		128.66 (0.00)		−0.52 (0.00)	0.90 (0.00)	2.00 (0.01)	−341.15	8.12	8.24
TV–CL	73.61 (0.00)		0.002 (0.00)	−0.11 (0.46)	0.71 (0.00)	1.97 (0.01)	−336.60	8.01	8.13

Note. The values given in parentheses are p values. LL = Log-likelihood value; AIC = Akaike information criterion; BIC = Bayesian information criterion.
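The information criteria in Table 1 can be related to the log-likelihood values with a short calculation. The sketch below assumes criteria normalized by the number of observations, AIC = (−2LL + 2k)/T and BIC = (−2LL + k ln T)/T. The effective sample size T = 85 (96 months less the 11 lost to the twelve-month moving sum) and the parameter counts k are assumptions inferred here, not values stated in the paper; under these assumptions the formulas agree with the reported figures to two decimals.

```python
import math

N_OBS = 85  # assumed effective sample size: 96 months minus 11 for the moving sum

# (log-likelihood from Table 1, assumed number of estimated hyperparameters)
MODELS = {
    "CL": (-349.06, 4),
    "Fernandez": (-354.46, 3),
    "Litterman": (-341.15, 4),
    "TV-CL": (-336.60, 4),
}


def info_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC), both normalized by the number of observations."""
    aic = (-2.0 * loglik + 2.0 * n_params) / n_obs
    bic = (-2.0 * loglik + n_params * math.log(n_obs)) / n_obs
    return aic, bic


if __name__ == "__main__":
    for name, (loglik, k) in MODELS.items():
        aic, bic = info_criteria(loglik, k, N_OBS)
        print(f"{name:10s} AIC={aic:.2f} BIC={bic:.2f}")
```

Since three of the four models carry the same assumed number of parameters, their ranking on both criteria follows directly from the ranking of the log-likelihood values.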

Table 2 provides standard residual diagnostics for standardized smoothed disturbances to check the validity of the assumption that these disturbances are normally distributed without serial correlation. Augmented Dickey–Fuller (ADF) test statistics are used to identify whether disturbances exhibit stationarity. The Jarque–Bera test results show that the normality assumption is satisfied for the models based on constant labor productivity at the 5% significance level, while for the TV-CL model, normality is not satisfied. While Q statistics and autocorrelation values at lag 1 for the Litterman-type model are satisfactory, the CL- and Fernández-type models show the presence of first-order autocorrelation in their residuals. Q(12) statistics for the models based on constant labor productivity indicate a violation of the most important assumption of independence. On the other hand, the null hypothesis of independence cannot be rejected at the 5% significance level for the TV-CL model at lags 1 and 12. The ADF test results show that disturbances for the CL-type model have a unit root, whereas the TV-CL and integrated models satisfy the stationarity assumption. In conclusion, the diagnostic tests summarized in Table 2 reveal that the Litterman-type and TV-CL models outperform the CL- and Fernández-type models.

Table 2.

Diagnostics Tests for the Standardized Smoothed Disturbances of Various Models.

	JB	Q(1)	Q(12)	AC(1)	ADF
CL	0.28 (0.87)	8.52 (0.00)	29.01 (0.00)	−0.30	0.00 (0.96)
Fernandez	0.30 (0.86)	25.53 (0.00)	34.96 (0.00)	−0.51	−6.66 (0.00)
Litterman	0.26 (0.88)	1.78 (0.18)	22.76 (0.03)	−0.14	−7.12 (0.00)
TV–CL	8.16 (0.02)	3.63 (0.06)	17.93 (0.12)	−0.19	−7.34 (0.00)

Note. The values given in parentheses are p values. JB = Jarque–Bera test statistics; Q(k) = Ljung–Box Q(k) test statistics at lag k; AC(1) = Autocorrelation values at lag 1, 95% confidence limits for AC(1) = ±0.21.
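For readers who wish to reproduce diagnostics of this kind on their own residual series, the following is a minimal sketch of the Jarque–Bera statistic, the Ljung–Box Q(k) statistic, and the approximate 95% band for a sample autocorrelation. It uses the textbook formulas and only the Python standard library; p values are omitted, and the function names are hypothetical. With an assumed sample of 85 observations, the band 1.96/√85 rounds to the ±0.21 quoted in the table note.

```python
import math


def autocorr(e, k):
    """Sample autocorrelation of the series e at lag k."""
    n = len(e)
    mean = sum(e) / n
    d = [v - mean for v in e]
    return sum(d[t] * d[t - k] for t in range(k, n)) / sum(v * v for v in d)


def ljung_box(e, h):
    """Ljung-Box statistic Q(h) = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(e)
    return n * (n + 2) * sum(autocorr(e, k) ** 2 / (n - k) for k in range(1, h + 1))


def jarque_bera(e):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(e)
    mean = sum(e) / n
    m2 = sum((v - mean) ** 2 for v in e) / n
    m3 = sum((v - mean) ** 3 for v in e) / n
    m4 = sum((v - mean) ** 4 for v in e) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def ac_band(n):
    """Approximate 95% confidence limit for a sample autocorrelation."""
    return 1.96 / math.sqrt(n)


if __name__ == "__main__":
    residuals = [1.0, -1.0] * 5  # toy series with strong negative autocorrelation
    print(autocorr(residuals, 1), ljung_box(residuals, 1), jarque_bera(residuals))
    print(round(ac_band(85), 2))
```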

4.4. Revision Analysis

Model-based estimates of official statistics are always subject to certain revisions, irrespective of the revisions to the original input data in the model. In the production process, the monthly published IPC figures are obtained from a real-time analysis based on the most recent data. When a new observation is added to the series, the previously published figures are revised, because the model parameters are re-estimated each month using the series observed up to that period. The more frequent and larger the revisions, the more they undermine the reliability of the statistical data from the user’s perspective. Therefore, comparing the revision performances of the models serves as a valuable guide for selecting the most suitable model.

Figure 5 illustrates the concurrent estimates of the models for the last thirty-six months. These IPC estimates were obtained indirectly using the employment index and smoothed state estimates, as shown in Equation (10). In 2020, when the COVID-19 pandemic hit the construction industry, the revisions for all models were clearly larger than in the later period. The erratic changes observed in both the employment and turnover series during this period caused the model parameters estimated for consecutive months to change drastically, which resulted in larger updates to the initially published figures. While the TV-CL model performed slightly better than the others in the period 2020 to 2021, it performed similarly to the Litterman- and Fernández-type models in 2022. Meanwhile, the CL-type model underperformed the other models during the entire period.

Figure 5.

Concurrent estimates of the models for the period 2020:01 to 2022:12. Gray lines represent concurrent estimates spanning from January 2020 to November 2022. The dashed line indicates the IPC estimate for December 2022.

In order to evaluate the performance of the models, we consider three basic descriptive revision measures: relative mean absolute revision (RMAR), mean absolute revision (MAR), and root mean square revision (RMSR). If $y_{t, t}$ is the value of the IPC at time t on the first date of release t, then the value of the IPC at time t published k periods later is $y_{t, t + k}$ .

We use RMAR to measure the revision size for k steps, in percentages, in the levels of the IPC estimates. The revisions are calculated from the sixty-first month (January 2020) to the ninety-sixth month (December 2022) as follows:

{RMAR}_{k} = \frac{100}{35 - k + 1} \sum_{t = 61}^{96 - k} | \frac{y_{t, t + k} - y_{t, t}}{y_{t, t}} |

(26)

We used MAR and RMSR to find the revision size in the monthly (month-on-month) and annual (year-on-year) growth rates.

{MAR}_{k} = \frac{1}{35 - k + 1} \sum_{t = 61}^{96 - k} | Δ y_{t, t + k} - Δ y_{t, t} |

(27)

{RMSR}_{k} = \sqrt{\frac{1}{35 - k + 1} \sum_{t = 61}^{96 - k} {(Δ y_{t, t + k} - Δ y_{t, t})}^{2}}

(28)

where $Δ y_{t}$ represents the growth rate. The monthly growth rate was calculated as $100 (y_{t} - y_{t - 1}) / y_{t - 1}$ , and the annual growth rate as $100 (y_{t} - y_{t - 12}) / y_{t - 12}$ .
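The three measures in Equations (26) to (28) can be sketched as follows. The layout of the data is an assumption made for illustration: `vintages[v][t]` holds $y_{t, v}$, the estimate for month t as published in release v, and each growth rate is computed within a single vintage. The function names and the toy figures are hypothetical.

```python
import math


def growth(vintage, t, lag):
    """Growth rate in percent at month t within one vintage (lag=1 MoM, lag=12 YoY)."""
    return 100.0 * (vintage[t] - vintage[t - lag]) / vintage[t - lag]


def revision_measures(vintages, k, t_first, t_last, lag=1):
    """Return (RMAR_k, MAR_k, RMSR_k) for first releases t_first..t_last - k."""
    rel, dif = [], []
    for t in range(t_first, t_last - k + 1):
        first, later = vintages[t], vintages[t + k]
        rel.append(abs((later[t] - first[t]) / first[t]))
        dif.append(growth(later, t, lag) - growth(first, t, lag))
    n = len(rel)  # equals 35 - k + 1 when t_first = 61 and t_last = 96
    rmar = 100.0 * sum(rel) / n
    mar = sum(abs(d) for d in dif) / n
    rmsr = math.sqrt(sum(d * d for d in dif) / n)
    return rmar, mar, rmsr


if __name__ == "__main__":
    # Toy vintages: release v contains estimates for months 1..v.
    vintages = {
        2: {1: 100.0, 2: 110.0},
        3: {1: 100.0, 2: 121.0, 3: 120.0},
        4: {1: 100.0, 2: 121.0, 3: 132.0, 4: 130.0},
    }
    print(revision_measures(vintages, k=1, t_first=2, t_last=4))
```

Because RMSR squares each revision before averaging, a single large revision moves it more than it moves MAR, which is the sense in which RMSR is more sensitive to outliers.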

The results for each step from one month to four months and for the twelve-month step, where we expect the largest revisions, are presented in Table 3. The last column of Table 3 reports a summary revision measure of the overall performance of the models, obtained by averaging the revisions calculated for each step $k$ (from 1 to 12). As expected, the revision for $k = 12$ is higher than for the other steps for all revision measures. Another notable outcome in the table is that the average revision size in the annual growth rates is quite small for all models. While the CL-type model exhibits the worst performance for revisions in levels according to RMAR, the Litterman-type model shows the worst performance for revisions in monthly growth rates according to MAR and RMSR. The results indicate that the TV-CL model outperforms the three models based on constant labor productivity for four of the five revision measurement criteria. The Fernández-type model performs slightly better than the TV-CL model in monthly growth rates for RMSR, which is more sensitive to sample outliers.

Table 3.

Revision Table for the Period 2020:01 to 2022:12.

		k = 1	k = 2	k = 3	k = 4	k = 12	k = 1–12
RMAR	CL	0.40	0.60	0.74	0.92	1.15	1.01
	Fernandez	0.77	0.78	0.81	0.78	1.16	0.91
	Litterman	0.42	0.65	0.71	0.72	1.28	0.81
	TV-CL	0.27	0.45	0.56	0.65	1.28	0.77
	Standard TD	1.31	2.21	3.16	4.17	12.8	7.00
MAR (MoM)	CL	0.04	0.08	0.09	0.12	1.59	0.46
	Fernandez	0.16	0.19	0.22	0.22	1.57	0.44
	Litterman	0.24	0.39	0.41	0.41	1.72	0.71
	TV-CL	0.05	0.09	0.11	0.13	1.58	0.42
	Standard TD	0.03	0.04	0.06	0.08	1.20	0.45
RMSR (MoM)	CL	0.07	0.13	0.13	0.18	2.52	1.11
	Fernandez	0.34	0.38	0.37	0.38	2.57	1.07
	Litterman	0.75	0.84	0.87	0.89	2.78	1.49
	TV-CL	0.07	0.15	0.16	0.18	2.52	1.14
	Standard TD	0.11	0.13	0.16	0.19	1.55	0.87
MAR (YoY)	CL	0.04	0.06	0.06	0.08	0.18	0.10
	Fernandez	0.09	0.09	0.09	0.09	0.17	0.11
	Litterman	0.04	0.07	0.08	0.08	0.18	0.10
	TV-CL	0.02	0.04	0.04	0.05	0.17	0.08
	Standard TD	1.12	1.99	2.97	4.07	14.2	7.42
RMSR (YoY)	CL	0.07	0.08	0.09	0.11	0.41	0.20
	Fernandez	0.21	0.22	0.21	0.22	0.42	0.28
	Litterman	0.06	0.14	0.14	0.15	0.40	0.21
	TV-CL	0.04	0.05	0.06	0.07	0.35	0.17
	Standard TD	4.49	5.79	6.88	7.88	13.14	9.97

Furthermore, we applied the standard temporal disaggregation method to determine whether the MTD approach can reduce the total revision sizes. In this case, we used the standard CL method, which is widely used by NSIs, to derive the monthly IPC using the annual-frequency turnover series rather than moving sums. We found that revision levels were indeed quite high, especially in terms of the annual growth rates and the level of the series for the $k = 12$ step.

If 2020 is excluded from the analysis, revision sizes are generally smaller than those in the previous table (Table 4). Only the CL-type model performs slightly worse than before in terms of RMAR. The TV-CL model outperforms the others for all revision measurement criteria except RMSR (MoM), where the Fernández-type model performs better. The Litterman-type model underperforms the others in terms of revisions to monthly growth rates, as before. Therefore, the relative performances of the models are quite similar to our previous findings.

Table 4.

Revision Table for the Period 2021:01 to 2022:12.

		k = 1	k = 2	k = 3	k = 4	k = 12	k = 1–12
RMAR	CL	0.44	0.67	0.87	1.12	0.93	1.10
	Fernandez	0.72	0.75	0.78	0.78	1.13	0.81
	Litterman	0.40	0.64	0.67	0.71	1.14	0.69
	TV-CL	0.26	0.41	0.48	0.54	0.92	0.48
MAR (MoM)	CL	0.05	0.09	0.09	0.14	1.55	0.40
	Fernandez	0.15	0.18	0.20	0.23	1.51	0.32
	Litterman	0.32	0.45	0.48	0.50	1.61	0.68
	TV-CL	0.04	0.09	0.11	0.13	1.46	0.30
RMSR (MoM)	CL	0.08	0.13	0.13	0.19	2.58	0.89
	Fernandez	0.35	0.38	0.38	0.42	2.68	0.78
	Litterman	0.92	0.99	1.03	1.08	2.90	1.47
	TV-CL	0.06	0.16	0.17	0.19	2.51	0.90
MAR (YoY)	CL	0.04	0.06	0.07	0.09	0.14	0.09
	Fernandez	0.08	0.09	0.09	0.09	0.17	0.10
	Litterman	0.03	0.07	0.07	0.07	0.16	0.07
	TV-CL	0.02	0.03	0.04	0.04	0.13	0.04
RMSR (YoY)	CL	0.07	0.08	0.10	0.12	0.37	0.15
	Fernandez	0.22	0.23	0.23	0.25	0.45	0.27
	Litterman	0.06	0.14	0.14	0.16	0.37	0.17
	TV-CL	0.03	0.04	0.05	0.06	0.29	0.09

5. General Discussion of Modified Temporal Disaggregation

The primary motivation behind proposing the MTD is to deal with the reported lag-induced seasonal bias in the deflated turnover data. Traditional temporal disaggregation methods can serve the same purpose, but they produce large revisions, as demonstrated using the standard CL method. These methods are principally used to estimate high-frequency values from low-frequency data. However, the utilization of low-frequency annual data may result in a loss of valuable information regarding the time series characteristics of the economic data. Rossana and Seater (1995) empirically examined the effects of temporal aggregation and concluded that “Monthly and quarterly data are governed by complex time series processes with much low-frequency cyclical variation, while annual data are governed by extremely simple processes with virtually no cyclical variation” (Rossana and Seater 1995, 441). In our case, we already have a monthly turnover series but cannot use it in its current form, as it suffers from the data quality issues mentioned before. In the MTD model, the complete loss of monthly information is prevented by using annualized monthly data instead of low-frequency annual data. Therefore, the superiority of MTD over standard temporal disaggregation is not limited to producing lower revisions; it is also reflected in the nonseasonal movements embedded in the deflated turnover data, as information at the subannual (monthly) frequency is obtained from both employment and deflated turnover data.
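The difference between the two signal variables can be illustrated with a short sketch. Assuming a monthly series that starts in January, the rolling twelve-month sum yields one observation per month from the twelfth month onward, and its December values coincide with the calendar-year totals used in standard temporal disaggregation. The function name and the toy series are hypothetical.

```python
def rolling_sum(values, window=12):
    """Rolling sum over the last `window` observations, one value per month."""
    return [sum(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


if __name__ == "__main__":
    monthly = [float(v) for v in range(1, 25)]   # two hypothetical years of data
    annualized = rolling_sum(monthly)            # 13 monthly signal values
    annual_totals = [sum(monthly[0:12]), sum(monthly[12:24])]
    # Every twelfth rolling sum (the December values) equals a calendar-year total.
    print(annualized[0::12] == annual_totals)
    print(len(annualized), len(annual_totals))
```

In this toy case the rolling signal provides thirteen observations where the annual totals provide two, which is the additional monthly information the MTD retains.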

This study considered the employment variable deterministically. Certainly, alternative models treating employment stochastically might be used to address the seasonal bias problem. For example, a bivariate model can be established in which turnover and employment follow separate basic structural time series models. Furthermore, these models can be designed in relation to each other within an SSM framework. Trend, seasonality, and irregular components of employment and deflated turnover could then be estimated, and the IPC could be derived by borrowing seasonality from employment. Thus, the new IPC might have the trend and irregular components of turnover and the seasonal pattern of employment. However, such a modeling approach does not guarantee consistency between the annual totals of the IPC and deflated turnover. Over time, the levels of these two series may diverge. To circumvent this problem, published monthly IPC figures can be benchmarked to yearly sums of deflated turnover at each year-end, which is another source of revision. Alternatively, an SSM could be constructed such that monthly IPC estimates satisfy temporal constraints automatically in a single step. These alternatives have some disadvantages, such as computational complexity and the need to estimate a large number of parameters from a relatively short series. Therefore, in routine statistical production, it is generally preferable to adopt a modeling approach that does not require serious expertise and is easily implemented.

As Lucas (1976) argued with his famous critique, the parameters of traditional macro-econometric models are unlikely to remain constant in a changing economic environment. Therefore, this study also addresses the dynamic properties of productivity by incorporating a time-varying coefficient within the MTD model. Standard temporal disaggregation methods consider the relation between the related series $(x_{t})$ and unobserved disaggregated time series ( $y_{t})$ as time-invariant whether co-integrated or not. If the relation between variables exhibits a changing nature, ignoring this fact will lead to model specification errors. According to the ADF test results, the null hypothesis corresponding to the presence of a unit root for the CL-type model disturbances is not rejected, while disturbances in the Fernández- and Litterman-type models satisfy the stationarity assumption. In other words, no linear co-integration relation exists between variables. Meanwhile, disturbances in the TV-CL model, which we assume to follow an AR(1) process, are stationary, indicating a time-varying co-integration relation between output and employment. The TV-CL model, explicitly accounting for the time-changing nature of productivity, demonstrates better performance in terms of revisions compared to other models. This may stem from the model's ability to adapt to the evolving productivity dynamics over time.

6. Conclusion

In recent years, NSIs have increasingly adopted administrative data sources to generate official statistics that are cost-effective, reduce the burden on respondents, and provide timely and detailed information. However, since these sources are not designed specifically for statistical purposes, data quality may be negatively affected by timing differences, classification, and coverage issues. In Türkiye, the main data that forms the basis for estimating output in the construction industry is the monthly turnover series obtained from VAT registers. Like other administrative data sources, tax registers have some limitations. The main weakness of this data is that the seasonal pattern does not reflect the actual seasonal pattern of production in the construction industry, as companies operating in the industry usually declare most of their taxes to the government at the end of the calendar year. In this study, we proposed four alternative SSMs to address the seasonal bias problem that arises in monthly administrative data. We developed modified forms of the CL-, Fernández-, and Litterman-type models by assuming constant labor productivity and the CL-type model by assuming time-varying labor productivity.

The monthly IPC figures were obtained by the temporal disaggregation of the rolling annualized totals of the deflated turnover using the number of employees as a related indicator. The results indicated that the MTD-generated new IPC estimates were superior in terms of the representation capability of real economic activities in the construction industry compared to an indicator directly based on the turnover series. To evaluate the quality of models, we considered the standard model selection criteria and residual-based diagnostics. The results revealed that the Litterman-type and TV-CL models outperformed the CL- and Fernández-type models. Although standard model comparison techniques provide some insight into selecting the best model, we need further analysis in order to see how well the models perform in real-time production. To this end, we compared the revision performances of the models using the concurrent estimates of the models for the last thirty-six months. Accordingly, the overall revision performance of the models was satisfactory, especially since the average revision size in annual growth rates was quite small for all models. The results indicated that the TV-CL model outperformed the other three models based on constant labor productivity on four out of the five revision measurement criteria. The Fernández-type model performed slightly better than the TV-CL model for monthly growth rates with respect to RMSR, which is more sensitive to outliers in the sample. Additionally, we applied the standard CL method to determine whether the MTD approach was effective in reducing the total revision sizes. The standard temporal disaggregation model yielded much larger revisions compared to the MTD, especially in terms of the annual growth rates.

Considering the statistical quality and revision performance of the proposed models, we found that the TV-CL model outperformed the other three models. While the SSM framework permits time-varying coefficients for explanatory variables, the existing empirical literature on regression-based temporal disaggregation often assumes these coefficients to be constant. If the relation between a low-frequency series and a high-frequency series is not stable over time, modeling this relation rather than the residuals (or in addition to the residuals) may yield more satisfactory results. The proposed SSM with a time-varying coefficient can also be applied to any standard temporal disaggregation problem. This can be readily achieved by constructing the signal variable from the standard annual totals instead of the rolling annual totals.

Footnotes

Author’s Note

The views and opinions expressed in this paper are those of the author and do not necessarily represent the official views of the Turkish Statistical Institute.

Funding

The author(s) declared that they received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Özlem Yiğit

Received: June 2023

Accepted: March 2024

References

Balabay, Van den Brakel J. A., Palm F. C. 2016. “Multivariate State Space Approach to Variance Reduction in Series with Level and Variance Breaks Due to Survey Redesigns.” Journal of the Royal Statistical Society Series A: Statistics in Society 179: 377–402. DOI: https://doi.org/10.1111/rssa.12117.

Best, Meikle. 2015. “International Construction Cost Comparisons.” In Measuring Construction Prices, Output and Productivity, edited by Best and Meikle, 42–60. London: Routledge.

Bisio, Moauro. 2018. “Temporal Disaggregation by Dynamic Regressions: Recent Developments in Italian Quarterly National Accounts.” Statistica Neerlandica 72 (4): 471–94. DOI: https://doi.org/10.1111/stan.12156.

Briscoe. 2006. “How Useful and Reliable Are Construction Statistics?” Building Research and Information 34 (3): 220–9. DOI: https://doi.org/10.1080/09613210600589878.

Chow, Lin. 1971. “Best Linear Unbiased Interpolation, Distribution, and Extrapolation of Time Series by Related Series.” The Review of Economics and Statistics 53 (4): 372–5. DOI: https://doi.org/10.2307/1928739.

Commandeur, Koopman. 2007. An Introduction to State Space Time Series Analysis. New York, NY: Oxford University Press.

Cooley T. F., Prescott E. C. 1976. “Estimation in the Presence of Stochastic Parameter Variation.” Econometrica 44 (1): 167–84. DOI: https://doi.org/10.2307/1911389.

Cooper J. P. 1973. “Time-Varying Regression Coefficients: A Mixed Estimation Approach and Operational Limitations of the General Markov Structure.” Annals of Economic and Social Measurement 2 (4): 525–30.

Dagum E. B., Cholette P. A. 2006. Benchmarking, Temporal Distribution, and Reconciliation Methods for Time Series. Lecture Notes in Statistics. New York, NY: Springer-Verlag.

Denton. 1971. “Adjustment of Monthly or Quarterly Series to Annual Totals: An Approach Based on Quadratic Minimization.” Journal of the American Statistical Association 66: 99–102.

Di Fonzo. 2003. “Temporal Disaggregation of Economic Time Series: Towards a Dynamic Extension.” European Commission (Eurostat) Working Papers and Studies, Theme, 1, 41.

Durbin, Koopman. 2001. Time Series Analysis by State Space Methods. Oxford: Clarendon Press.

Durbin, Quenneville. 1997. “Benchmarking by State Space Models.” International Statistical Review 65: 23–48. DOI: https://doi.org/10.1111/j.1751-5823.1997.tb00366.x.

Elliott, Zong. 2019. “Improving Timeliness and Accuracy of Estimates from the UK Labour Force Survey.” Statistical Theory and Related Fields 3 (2): 186–98. DOI: https://doi.org/10.1080/24754269.2019.1676034.

Engle, Watson. 1987. “The Kalman Filter: Applications to Forecasting and Rational-Expectations Models.” In Advances in Econometrics: Fifth World Congress (Econometric Society Monographs), edited by Bewley, 245–84. Cambridge: Cambridge University Press. DOI: https://doi.org/10.1017/CCOL0521344301.007.

Eurostat. 2011. Guidelines for Compiling the Monthly Index of Production in Construction. Eurostat, European Commission.

Fan R. Y., S. T., Wong J. M. 2011. “Predicting Construction Market Growth for Urban Metropolis: An Econometric Analysis.” Habitat International 35 (2): 167–74. DOI: https://doi.org/10.1016/j.habitatint.2010.08.002.

Fernández. 1981. “A Methodological Note on the Estimation of Time Series.” The Review of Economics and Statistics 63 (3): 471–6. DOI: https://doi.org/10.2307/1924371.

Hamilton J. D. 1994. Time Series Analysis. Princeton, NJ: Princeton University Press.

Harvey A. C. 1990. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.

Harvey A. C., Koopman S. J. 1997. “Multivariate Structural Time Series Models.” In System Dynamics in Economic and Financial Models, edited by Heij, Shumacher J. M., Hanzon, and Praagman, 269–98. New York, NY: John Wiley and Sons.

Harvey A. C., Pierse R. G. 1984. “Estimating Missing Observations in Economic Time Series.” Journal of the American Statistical Association 79 (385): 125–31. DOI: https://doi.org/10.1080/01621459.1984.10477074.

Hillmer S. C., Trabelsi. 1987. “Benchmarking of Economic Time Series.” Journal of the American Statistical Association 82 (400): 1064–71. DOI: https://doi.org/10.1080/01621459.1987.10478541.

K’Akumu O. A. 2007. “Construction Statistics Review for Kenya.” Construction Management and Economics 25 (3): 315–26. DOI: https://doi.org/10.1080/01446190601139883.

K’Akumu O. A. 2009. “Reforming the National Statistical System of Kenya: Policy Implications for the Development of Building Construction Statistics.” Habitat International 33 (1): 120–4. DOI: https://doi.org/10.1016/j.habitatint.2008.05.008.

Klenow P. J., Willis J. L. 2016. “Real Rigidities and Nominal Price Changes.” Economica 83 (331): 443–72. DOI: https://doi.org/10.1111/ecca.12191.

Labonne, Weale. 2020. “Temporal Disaggregation of Overlapping Noisy Quarterly Data: Estimation of Monthly Output from UK Value-Added Tax Data.” Journal of the Royal Statistical Society Series A: Statistics in Society 183 (3): 1211–30. DOI: https://doi.org/10.1111/rssa.12568.

Lisman, Sandee. 1964. “Derivation of Quarterly Figures from Annual Data.” Journal of the Royal Statistical Society Series C: Applied Statistics 13 (2): 87–90. DOI: https://doi.org/10.2307/2985700.

Litterman. 1983. “A Random Walk, Markov Model for the Distribution of Time Series.” Journal of Business and Economic Statistics 1 (2): 169–73. DOI: https://doi.org/10.1080/07350015.1983.10509336.

Lucas R. E. 1976. “Econometric Policy Evaluation: A Critique.” Carnegie-Rochester Conference Series on Public Policy 1: 19–46.

Meikle, Gruneberg. 2015. “Measuring and Comparing Construction Activity Internationally.” In Measuring Construction Prices, Output and Productivity, edited by Best and Meikle, 113–30. London: Routledge.

Meikle J. L., Grilli M. T. 1999. “Measuring European Construction Output: Problems and Possible Solutions.” CIB TG29. Rotterdam.

Moauro, Savio. 2005. “Temporal Disaggregation Using Multivariate Structural Time Series Models.” The Econometrics Journal 8 (2): 214–34. DOI: https://doi.org/10.1111/j.1368-423X.2005.00161.x.

Pfeffermann, Feder, Signorelli. 1998. “Estimation of Autocorrelations of Survey Errors with Application to Trend Estimation in Small Areas.” Journal of Business and Economic Statistics 16: 339–48. DOI: https://doi.org/10.1080/07350015.1998.10524773.

Pfeffermann, Tiller. 2006. “Small Area Estimation with State-Space Models Subject to Benchmark Constraints.” Journal of the American Statistical Association 101: 1387–97. DOI: https://doi.org/10.1198/016214506000000591.

Proietti. 2006. “Temporal Disaggregation by State Space Methods: Dynamic Regression Methods Revisited.” The Econometrics Journal 9 (3): 357–72. DOI: https://doi.org/10.1111/j.1368-423X.2006.00189.x.

Rosenberg. 1973a. “A Survey of Stochastic Parameter Regression.” Annals of Economic and Social Measurement 2: 381–97.

Rosenberg. 1973b. “Random Coefficients Models: The Analysis of a Cross Section of Time Series by Stochastically Convergent Parameter Regression.” Annals of Economic and Social Measurement 2: 399–428.

Rossana R. J., Seater J. J. 1995. “Temporal Aggregation and Economic Time Series.” Journal of Business & Economic Statistics 13 (4): 441–55.

Ruddock. 2002. “Measuring the Global Construction Industry: Improving the Quality of Data.” Construction Management and Economics 20 (7): 553–6. DOI: https://doi.org/10.1080/01446190210159908.

Silva, Smith. 2001. “Modelling Compositional Time Series from Repeated Surveys.” Survey Methodology 27 (2): 205–15.

Silva J. S., Cardoso F. N. 2001. “The Chow–Lin Method Using Dynamic Models.” Economic Modelling 18 (2): 269–80. DOI: https://doi.org/10.1016/S0264-9993(00)00039-0.

Stock, Watson. 1996. “Evidence on Structural Instability in Macroeconomic Time Series Relations.” Journal of Business & Economic Statistics 14 (1): 11–30. DOI: https://doi.org/10.1080/07350015.1996.10524626.

Stram, Wei. 1986. “A Methodological Note on the Disaggregation of Time Series Totals.” Journal of Time Series Analysis 7: 293–302. DOI: https://doi.org/10.1111/j.1467-9892.1986.tb00496.x.

Swamy P. A. V. B. 1971. Statistical Inference in Random Coefficient Regression Models. New York, NY: Springer-Verlag.

Tiller. 1992. “Time Series Modeling of Sample Survey Data from the US Current Population Survey.” Journal of Official Statistics 8 (2): 149–66.

Tschetter, Lukasiewicz. 1983. “Employment Changes in Construction: Secular, Cyclical, and Seasonal.” Monthly Labor Review 106 (3): 11–7.

United Nations. 1997. “International Recommendations for Construction Statistics.” Statistical Papers. Series M, Studies in Methods.

Valence. 1996. “The Productivity of the Construction Industry: Measurement of Inputs and Outcomes in a Dynamic System.” CIB REPORT.

Van den Brakel J. A., Krieg. 2009. “Estimation of the Monthly Unemployment Rate Through Structural Time Series Modelling in a Rotating Panel Design.” Survey Methodology 35: 177–90.

Van den Brakel J. A., Krieg. 2015. “Dealing with Small Sample Sizes, Rotation Group Bias and Discontinuities in a Rotating Panel Design.” Survey Methodology 41 (2): 267–96.

Wei W. W., Stram D. O. 1990. “Disaggregation of Time Series Models.” Journal of the Royal Statistical Society: Series B (Methodological) 52 (3): 453–67. DOI: https://doi.org/10.1111/j.2517-6161.1990.tb01799.x.

Windapo, Qongqo. 2011. “A Comprehensive Study of South African Construction Data Sources.” International Conference on Management and Innovation for a Sustainable Built Environment, Amsterdam, The Netherlands, June 20–23.

State-Space Modeling Approach to Exploring the Index of Production in Construction for Türkiye
