Abstract

Composite estimation in repeated surveys with rotating panels refers to methods of estimation that exploit correlations in the data in the sample overlap between survey times to improve the precision of estimates. In this article a novel approach to composite estimation is proposed, in which composite regression estimators of current totals for a number of key variables are generated from a simultaneous calibration of the sampling weights of the overlapping samples of the current and previous survey time. In this procedure, in addition to the usual calibration to known population totals, differences of estimates for the key variables based on the full sample and the common sample from the two consecutive times are calibrated to each other. The resulting multivariate composite regression estimator, which is constructed as an approximate best linear unbiased estimator, incorporates effectively information from the samples of both survey times for enhanced estimation efficiency. Unlike other composite regression estimators, the proposed estimator does not require micro-matching of data in the overlap sample, and, therefore, is free of potential issues associated with it. It is also considerably more practical than existing composite regression estimators and the traditional AK-composite estimator.

Keywords

composite calibration, composite regression estimator, AK-composite estimation, MR-composite estimation, sample overlap

1. Introduction

Some repeated surveys, typically Labor Force Surveys, use a sampling design with rotating panels for operational and statistical efficiency. In such a design, a large overlap in the samples between successive survey times allows the improvement of estimates of population parameters, especially for those variables for which there is a strong correlation between the values reported by the same units in successive times. Composite estimation refers to estimation methods that use information from previous times to improve the precision of both the point-in-time (“level”) estimates and estimates of change between consecutive times, by exploiting correlations in the data of the overlap sample. This improvement in precision can in turn reduce the volatility in the time series of estimates, especially of estimates with high sampling variability associated with subpopulations of interest.

The earliest composite estimation method, known as “K-composite estimation,” was introduced for the US Current Population Survey (CPS) by Hansen et al. (1955), and extended later to the “AK-composite estimation” by Gurney and Daly (1965), and to the “AK-composite weighting”; see Fuller (1990), Cantwell and Ernst (1992), and Lent et al. (1994, 1999).

Later, a type of regression method of composite estimation introduced for the Canadian Labor Force Survey, called modified regression (MR) composite estimation, was developed to overcome certain shortcomings of the AK method; see Singh and Merkouris (1995), Singh et al. (1997, 2001), Gambino et al. (2001), Fuller and Rao (2001), Bell (2001), Beaumont and Bocci (2005). A recent evaluation study by Bonnéry et al. (2020), devised to assess the design-based properties of different composite estimators using CPS data and CPS sample design, found that the currently used Fuller-Rao version of the MR composite estimator performed the best. A more recent regression estimator has been proposed by Konrad and Berger (2023).

In a time series approach, other authors (Bell and Hillmer 1990; Binder and Dick 1989; Jones 1980; Pfeffermann 1991; Scott et al. 1977; Tiller 1989) developed estimators for repeated surveys allowing for stochastic variation in the parameters being estimated.

In this article, a novel approach to composite estimation is proposed, in which composite regression estimators of current totals for a number of key variables are generated from a simultaneous calibration of the sampling weights of the overlapping samples of the current and previous survey time. In this procedure, in addition to the usual calibration to known population totals of auxiliary variables, differences of estimates for the key variables based on the full sample and the common sample of the two consecutive surveys are calibrated to each other.

The proposed approach is motivated by considerations of most effective use of data collected in successive survey times. Earlier work that adopted this approach (e.g., Gurney and Daly 1965; Jones 1980) centered on the formulation of best linear unbiased estimators (BLUE) of a population parameter, involving linear combinations of estimators from a number of survey times. These estimators are impractical when the series of available survey data is long, and they depend on the intractable covariance structure of the combined estimators. In contrast, the proposed composite regression estimator of a vector of population totals is constructed as a practical approximation of a BLUE that involves a particular set of estimators from the latest two survey times.

The proposed composite regression estimators of current totals and changes can be particularly efficient because the regression coefficients incorporate information from the samples of both the current and previous survey times, as do the coefficients of the BLUE. Furthermore, the simultaneous calibration of the samples of consecutive survey times greatly facilitates variance estimation by resampling methods. Unlike the MR-composite method, the proposed method of composite regression estimation does not require micro-matching of data in the common sample, and therefore is free of potential quality issues associated with it. It is also considerably more practical than the MR-composite estimation and the traditional AK-composite estimation. Section 2 provides the notation and estimation preliminaries. Section 3 describes the construction of the composite regression estimators of level and change for any survey variable, as approximate BLUEs, through a suitable calibration procedure. The comparative merits of the proposed estimator are discussed in detail in Section 4. A concluding discussion is provided in Section 5.

2. Notation and Preliminaries

The sample of a repeated survey with a rotating panel design is typically made up of a number (say $r$ ) of subsamples (“panels,” or “rotation groups”) of approximately equal size, each one staying in the sample for $k \leq r$ consecutive survey times and then rotated out of the survey and replaced by a newly selected panel. The panel sizes may vary in time in accordance with necessary adjustments in the survey design. Each panel is designed to be a sample of units (e.g., dwellings) representative of the survey population, and so can provide estimates of population parameters by a proper scaling-up of its sampling weights. For any two consecutive survey times there is a partial panel overlap of $100 (r - r / k) / r$ %, defining the “matched sample.” The samples at current time $t$ and previous time $t - 1$ are denoted by $s_{t}$ and $s_{t - 1}$ , respectively, and the vector of sampling weights in $s_{t}$ is denoted by $w_{t}$ . Adjustment of these weights for nonresponse is necessary in practice but not considered in the theory expounded in this article.

Let $y$ be a vector of $q$ key variables to be used in composite estimation, with vector of current-time totals $τ_{y, t}$ , and let $x$ be a vector of $p$ auxiliary variables used in calibration, with vector of current-time known totals $τ_{x, t}$ . Denote then the sample matrix of $y$ , of dimension $n_{t} \times q$ , where $n_{t}$ is the sample size at time $t$ , by $Y_{t}$ , partitioned by the unmatched $(u)$ and matched $(m)$ part of the sample into $Y_{u, t}$ and $Y_{m, t}$ , respectively. Similar is the notation for the previous survey time $t - 1$ , but $Y_{u, t - 1}$ and $Y_{m, t - 1}$ refer to unmatched and matched samples with respect to time $t$ . The sample matrix of $x$ , of dimension $n_{t} \times p$ , is denoted by $X_{t}$ .

The current-time Horvitz-Thompson (HT) estimators of the totals $τ_{y, t}$ and $τ_{x, t}$ , based on the full sample, are the weighted sums ${\hat{τ}}_{y, t} = Y'_{t} w_{t}$ , and ${\hat{τ}}_{x, t} = X'_{t} w_{t}$ , respectively. The estimator of $τ_{y, t}$ based on the matched sample is ${\hat{τ}}_{y, m, t} = R Y_{m, t}' w_{m, t}$ , where $w_{m, t}$ is the subvector of sampling weights of units in the matched sample, and $R = 1_{s_{t}}' w_{t} / 1_{s_{m, t}}' w_{m, t}$ , where $1$ denotes the unit vector and $s_{t}$ and $s_{m, t}$ are the full and matched samples. $R$ is the ratio (weighted version of $r / (r - r / k)$ ) that adjusts for the fact that $r - r / k$ of the $r$ panels between times $t - 1$ and $t$ are common. When the panel sizes are approximately equal and procedures to balance the weights by panel are used (e.g., nonresponse adjustment is done separately by panel), then $R$ is approximately equal to $r / (r - r / k)$ .
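As a small numerical illustration of the overlap percentage and the ratio $R$, the following sketch assumes a hypothetical design with $r = 6$ equal-sized panels, each staying in the sample for $k = 6$ survey times, and equal sampling weights. The function names, panel sizes, weights, and simulated values of a single variable are assumptions made only for this example.

```python
import numpy as np

def overlap_percent(r, k):
    """Percentage panel overlap between consecutive survey times."""
    return 100 * (r - r / k) / r

def matched_ratio(w, matched):
    """R = (sum of all sampling weights) / (sum of matched-sample weights)."""
    return w.sum() / w[matched].sum()

r, k = 6, 6                                  # hypothetical rotation design
panel = np.repeat(np.arange(r), 100)         # 100 units in each of the r panels
w = np.full(panel.size, 25.0)                # equal sampling weights (assumed)
matched = panel != 0                         # panel 0 is the newly rotated-in panel

R = matched_ratio(w, matched)                # weighted version of r / (r - r/k)

# Simulated values of one key variable, for illustration only
y = np.random.default_rng(0).normal(10.0, 2.0, panel.size)
tau_full = y @ w                             # full-sample HT estimate
tau_matched = R * (y[matched] @ w[matched])  # matched-sample estimate, scaled by R
```

With equal panel sizes and equal weights, `R` coincides with $r / (r - r / k)$; with unequal weights it would only approximate that value.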

The standard regression estimator ${\hat{τ}}_{y, t}^{R}$ of $τ_{y, t}$ involving the vector $x$ of auxiliary variables, and $X_{t}$ as the associated regression matrix, is given by

{\hat{τ}}_{y, t}^{R} = {\hat{τ}}_{y, t} + \hat{B} (τ_{x, t} - {\hat{τ}}_{x, t}),

(1)

where $\hat{B} = Y_{t}' W_{t} X_{t} {(X_{t}' W_{t} X_{t})}^{- 1}$ is the $q \times p$ matrix of regression coefficients, with $W_{t}$ being the diagonal “weighting” matrix with diagonal elements the elements of the vector $w_{t}$ . The estimator ${\hat{τ}}_{y, t}^{R}$ can be also written in the form of a calibration estimator, that is, as the weighted sum $Y_{t}' c_{x, t}$ , where $c_{x, t}$ is the vector of the calibrated sampling weights

c_{x, t} = w_{t} + W_{t} X_{t} {(X_{t}' W_{t} X_{t})}^{- 1} (τ_{x, t} - X_{t}' w_{t}),

(2)

derived by minimizing the generalized least-squares distance $(c_{x, t} - w_{t})' W_{t}^{- 1} (c_{x, t} - w_{t})$ subject to the constraints $X_{t}' c_{x, t} = τ_{x, t}$ . Of course, ${\hat{τ}}_{x, t}^{R} = τ_{x, t}$ , and for any other single variable $z$ with sample vector $Z_{t}$ at time $t$ , the regression estimator ${\hat{τ}}_{z, t}^{R}$ of its total $τ_{z, t}$ is of the form of Equation (1) and is obtained as $Z_{t}' c_{x, t}$ .
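The equivalence between the regression form in Equation (1) and the calibration form based on Equation (2) can be checked numerically. The sketch below uses simulated data; the helper name `calibrate`, the sample size, the weights, and the "known" totals are all hypothetical choices for illustration.

```python
import numpy as np

def calibrate(w, X, tau_x):
    """Calibrated weights of Equation (2): w + W X (X'WX)^{-1} (tau_x - X'w)."""
    WX = w[:, None] * X                       # W X with W = diag(w)
    return w + WX @ np.linalg.solve(X.T @ WX, tau_x - X.T @ w)

rng = np.random.default_rng(1)
n = 200
w = rng.uniform(20, 60, n)                                # sampling weights (simulated)
X = np.column_stack([np.ones(n), rng.normal(40, 10, n)])  # p = 2 auxiliary variables
Y = np.column_stack([2 + 0.1 * X[:, 1] + rng.normal(0, 1, n),
                     rng.normal(5, 1, n)])                # q = 2 key variables
tau_x = 1.03 * (X.T @ w)                                  # hypothetical known totals

c = calibrate(w, X, tau_x)

# Regression form, Equation (1)
WX = w[:, None] * X
B_hat = Y.T @ WX @ np.linalg.inv(X.T @ WX)                # q x p coefficient matrix
tau_reg = Y.T @ w + B_hat @ (tau_x - X.T @ w)

# Calibration form: weighted sum with the calibrated weights
tau_cal = Y.T @ c
```

Here `tau_reg` and `tau_cal` agree up to rounding, and `X.T @ c` reproduces `tau_x`, which is the calibration constraint.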

The regression estimator of $τ_{y, t}$ based on the matched sample, to be used in composite estimation, is

{\hat{τ}}_{y, m, t}^{R} = {\hat{τ}}_{y, m, t} + {\hat{B}}_{m} (τ_{x, t} - {\hat{τ}}_{x, t}),

where ${\hat{B}}_{m} = R Y_{m, t}' W_{m, t} X_{m, t} {(X_{t}' W_{t} X_{t})}^{- 1}$ , with the obvious notation for $W_{m, t}$ and $X_{m, t}$ .

3. A New Method of Composite Estimation

3.1. Constructing a Composite Regression Estimator

The proposed method of composite estimation arises from the search for a most effective use of information on the vector $y$ from successive survey times, in addition to the use of the current-time data on the auxiliary vector $x$ that gives the regression estimator in Equation (1). Thus, in addition to the estimates ${\hat{τ}}_{y, t}$ and ${\hat{τ}}_{x, t}$ used in Equation (1) for current survey time $t$ , we consider the matched-sample estimate ${\hat{τ}}_{y, m, t}$ defined above, plus the full-sample and matched-sample estimates for time $t - 1$ . In the time series of composite estimates, these two estimates for time $t - 1$ are given by ${\hat{τ}}_{y, t - 1}^{c} = Y_{t - 1}' c_{t - 1}$ and ${\hat{τ}}_{y, m, t - 1}^{c} = R Y_{m, t - 1}' c_{m, t - 1}$ , where $c_{t - 1}$ is the vector of composite weights for time $t - 1$ , $c_{m, t - 1}$ is its subvector for the matched sample, and the factor $R$ scales the matched-sample sum as in ${\hat{τ}}_{y, m, t}$ . It is important to note that in this time series the vector $c_{t - 1}$ is constructed to incorporate information on $y$ from time $t - 1$ and past times. The first time of employing composite estimation, the estimates ${\hat{τ}}_{y, t - 1}^{c}$ and ${\hat{τ}}_{y, m, t - 1}^{c}$ are just the regression estimates ${\hat{τ}}_{y, t - 1}^{R}$ and ${\hat{τ}}_{y, m, t - 1}^{R}$ for previous time $t - 1$ , and $c_{t - 1}$ is the vector of calibrated weights, of the form of Equation (2), that generates these estimates. We thus have a second estimate of $τ_{y, t}$ formed as ${\hat{τ}}_{y, t - 1}^{c} + ({\hat{τ}}_{y, m, t} - {\hat{τ}}_{y, m, t - 1}^{c})$ , that is, the previous-time composite estimate updated with the change estimate (estimate of $τ_{y, t} - τ_{y, t - 1}$ ) based on the matched sample. We then consider the best linear unbiased estimator (BLUE) ${\hat{τ}}_{y, t}^{B}$ of $τ_{y, t}$ , which is the minimum-variance linear unbiased combination of the three estimates $τ_{x, t} - {\hat{τ}}_{x, t}$ , ${\hat{τ}}_{y, t}$ , and ${\hat{τ}}_{y, t - 1}^{c} + ({\hat{τ}}_{y, m, t} - {\hat{τ}}_{y, m, t - 1}^{c})$ , that is,

{\hat{τ}}_{y, t}^{B} = B_{1} (τ_{x, t} - {\hat{τ}}_{x, t}) + B_{2} {\hat{τ}}_{y, t} + B_{3} ({\hat{τ}}_{y, t - 1}^{c} + ({\hat{τ}}_{y, m, t} - {\hat{τ}}_{y, m, t - 1}^{c})) .

Using the condition of unbiasedness $E ({\hat{τ}}_{y, t}^{B}) = τ_{y, t}$ (satisfied approximately for the terms ${\hat{τ}}_{y, t - 1}^{c}$ and ${\hat{τ}}_{y, m, t - 1}^{c}$ ), and the fact that $E ({\hat{τ}}_{x, t}) = τ_{x, t}$ , it follows immediately that $B_{2} + B_{3} = 1$ , and thus ${\hat{τ}}_{y, t}^{B}$ takes the extended regression form

{\hat{τ}}_{y, t}^{B} = {\hat{τ}}_{y, t} + B_{1} (τ_{x, t} - {\hat{τ}}_{x, t}) + B_{3} [{\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c} - ({\hat{τ}}_{y, t} - {\hat{τ}}_{y, m, t})],

which can be written as

{\hat{τ}}_{y, t}^{B} = {\hat{τ}}_{y, t} + B (\begin{matrix} τ_{x, t} - {\hat{τ}}_{x, t} \\ {\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c} - ({\hat{τ}}_{y, t} - {\hat{τ}}_{y, m, t}) \end{matrix}),

(3)

where $B = (B_{1}, B_{3})$ has the easily derived variance-minimizing value

B = - Cov [{\hat{τ}}_{y, t}, (\begin{matrix} τ_{x, t} - {\hat{τ}}_{x, t} \\ {\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c} - ({\hat{τ}}_{y, t} - {\hat{τ}}_{y, m, t}) \end{matrix})] {[Var (\begin{matrix} τ_{x, t} - {\hat{τ}}_{x, t} \\ {\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c} - ({\hat{τ}}_{y, t} - {\hat{τ}}_{y, m, t}) \end{matrix})]}^{- 1} .

(4)

It is pointed out here that a more typical formulation of the BLUE ${\hat{τ}}_{y, t}^{B}$ is given by

{\hat{τ}}_{y, t}^{B} = {(U' K^{- 1} U)}^{- 1} U' K^{- 1} \hat{Θ},

where $\hat{Θ} = (τ_{x, t} - {\hat{τ}}_{x, t}, {\hat{τ}}_{y, t}, {\hat{τ}}_{y, t - 1}^{c} + ({\hat{τ}}_{y, m, t} - {\hat{τ}}_{y, m, t - 1}^{c}))'$ , $K$ is the covariance matrix of $\hat{Θ}$ , and $U$ is a design matrix of 0’s and 1’s that satisfies $E (\hat{Θ}) = U τ_{y, t}$ . The variance of ${\hat{τ}}_{y, t}^{B}$ is $Var ({\hat{τ}}_{y, t}^{B}) = {(U' K^{- 1} U)}^{- 1}$ . Such formulation of a best linear unbiased estimator, for a scalar parameter, was used by Gurney and Daly (1965) and Jones (1980) in the setting of a repeated survey, with $\hat{Θ}$ being a vector of unbiased estimators of the parameter for a number of survey times.
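The generalized least-squares form of the BLUE and the regression form with the variance-minimizing coefficient of Equation (4) describe the same estimator, which can be verified numerically in the scalar case $p = q = 1$. In the sketch below the covariance matrix `K` and the values in `theta` are arbitrary hypothetical numbers, not estimates from any survey.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
K = A @ A.T + 3 * np.eye(3)             # hypothetical positive-definite covariance of Theta_hat

# Theta_hat = (tau_x - tau_x_hat, tau_y_hat, updated previous-time estimate); made-up values
theta = np.array([0.4, 101.0, 98.5])
U = np.array([[0.0], [1.0], [1.0]])     # E(Theta_hat) = U * tau_y

# Generalized least-squares form of the BLUE
Kinv = np.linalg.inv(K)
blue_gls = np.linalg.solve(U.T @ Kinv @ U, U.T @ Kinv @ theta)[0]
var_gls = np.linalg.inv(U.T @ Kinv @ U)[0, 0]

# Regression form: tau_y_hat + B D, where D = T Theta_hat stacks the x-difference
# and (second estimate - first estimate) of tau_y
T = np.array([[1.0, 0.0, 0.0],
              [0.0, -1.0, 1.0]])
e = np.array([0.0, 1.0, 0.0])           # picks tau_y_hat out of Theta_hat
cov_yD = e @ K @ T.T                    # Cov(tau_y_hat, D)
var_D = T @ K @ T.T                     # Var(D)
B = -cov_yD @ np.linalg.inv(var_D)      # variance-minimizing coefficient
blue_reg = theta[1] + B @ (T @ theta)
var_reg = K[1, 1] - cov_yD @ np.linalg.inv(var_D) @ cov_yD
```

Both the point values (`blue_gls`, `blue_reg`) and the variances (`var_gls`, `var_reg`) coincide, since the two forms solve the same constrained minimization.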

The more transparent formulation of the BLUE of $τ_{y, t}$ in Equation (3), with the specified $\hat{Θ} = (τ_{x, t} - {\hat{τ}}_{x, t}, {\hat{τ}}_{y, t}, {\hat{τ}}_{y, t - 1}^{c} + ({\hat{τ}}_{y, m, t} - {\hat{τ}}_{y, m, t - 1}^{c}))'$ , involving two estimates of $τ_{y, t}$ and the HT estimate of $τ_{x, t}$ , is used here to motivate a more practicable estimation procedure for the current-time total of any survey variable. Thus we set

w = (\begin{matrix} w_{t} \\ c_{t - 1} \end{matrix}), \quad Y_{(t)} = (\begin{matrix} Y_{t} \\ 0 \end{matrix}), \quad X = (\begin{matrix} X_{t} & Ψ_{t} \\ 0 & - Ψ_{t - 1} \end{matrix}),

(5)

where $Ψ_{t} = (Y_{u, t}', (1 - R) Y_{m, t}')'$ and $Ψ_{t - 1} = (Y_{u, t - 1}', (1 - R) Y_{m, t - 1}')'$ . Then $Y_{(t)}' w = {\hat{τ}}_{y, t}$ and

X' w = (\begin{matrix} X_{t}' w_{t} \\ Ψ_{t}' w_{t} - Ψ_{t - 1}' c_{t - 1} \end{matrix}) = (\begin{matrix} {\hat{τ}}_{x, t} \\ {\hat{τ}}_{y, t} - {\hat{τ}}_{y, m, t} - ({\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c}) \end{matrix}),

and thus the matrix coefficient $B$ in Equation (4) can be expressed as

B = Cov (Y_{(t)}' w, X' w) {[Var (X' w)]}^{- 1} = Cov ({\hat{τ}}_{y, t}, {\hat{τ}}_{X}) {[Var ({\hat{τ}}_{X})]}^{- 1},

(6)

where ${\hat{τ}}_{X} = X' w$ . An estimate of $B$ through the estimated variance $\hat{Var} (w)$ , having the form $\hat{B} = Y_{(t)}' \hat{Var} (w) X {[X' \hat{Var} (w) X]}^{- 1}$ , would give the “optimal” estimator of $τ_{y, t}$ , of minimum asymptotic variance, in analogy to the single-sample optimal estimator (see, e.g., Rao 1994). However, computation of the matrix $\hat{Var} (w)$ is intractable in the considered sampling setting. Besides, such an estimate $\hat{B}$ might be quite suboptimal.

A practical substitute can be obtained by replacing $\hat{Var} (w)$ in $\hat{B}$ with the weighting matrix $W = diag (W_{t}, C_{t - 1})$ associated with $w$ . This defines the $q \times (p + q)$ matrix of (generalized) regression coefficients $\hat{B} = Y_{(t)}' W X {(X' W X)}^{- 1}$ , and the composite regression estimator, as an approximation of the BLUE in Equation (3),

{\hat{τ}}_{y, t}^{CR} = {\hat{τ}}_{y, t} + \hat{B} (τ_{X} - {\hat{τ}}_{X}),

(7)

where $τ_{X} = (τ_{x, t}', 0')'$ , and $0$ denotes a $q$ -dimensional vector of zeros.

The composite regression estimator in Equation (7) is generated by an extended calibration procedure which involves both samples $s_{t}$ and $s_{t - 1}$ . This procedure is specified by the augmented regression matrix $X$ defined in Equation (5), the associated vector of calibration totals $τ_{X} = (τ_{x, t}', 0')'$ and the weight vector $w = (w_{t}', c_{t - 1}')'$ . The vector of calibrated weights for the combined sample $s_{t} \cup s_{t - 1}$ is given then by

c = w + W X {(X' W X)}^{- 1} (τ_{X} - X' w) .

(8)

The vector $c = (c_{t}', {c_{t - 1}^{*}}')'$ , where $c_{t - 1}^{*}$ is the vector $c_{t - 1}$ recalibrated at time $t$ , satisfies the calibration constraints $X' c = τ_{X}$ , that is, $X_{t}' c_{t} = τ_{x, t}$ and $Ψ'_{t} c_{t} = Ψ'_{t - 1} c_{t - 1}^{*}$ . The second constraint means that the differences in full-sample and matched-sample estimates from previous and current time are equated. Clearly then, with $Y_{(t)}$ defined in Equation (5), the composite regression estimator in Equation (7) can be obtained as a composite calibration estimator, that is, ${\hat{τ}}_{y, t}^{CR} = Y_{(t)}' c = Y_{t}' c_{t}$ . Evidently, it suffices to use only the current-time component $c_{t}$ , which incorporates micro-level information on $y$ from the previous time, together with the current-time data matrix $Y_{t}$ to obtain the composite regression estimator of $τ_{y, t}$ .
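The composite calibration of Equations (5) to (8) can be sketched with simulated data as follows. This is only an illustration under stated assumptions: the two samples have the same size, rows are ordered with unmatched units first and matched units last, the previous-time composite weights `c_prev` are simply simulated, and the "known" totals `tau_x` are made up. Note that the rows of the two matched blocks need not be aligned unit by unit, since only weighted sums enter the calculation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_m, p, q = 40, 60, 2, 2
n = n_u + n_m                                   # rows ordered (unmatched, matched)

w_t = rng.uniform(50, 150, n)                   # current-time sampling weights
c_prev = rng.uniform(50, 150, n)                # previous-time composite weights (simulated)
X_t = np.column_stack([np.ones(n), rng.normal(5, 1, n)])
Y_t = rng.normal(10, 2, (n, q))
Y_prev = np.vstack([rng.normal(10, 2, (n_u, q)),
                    Y_t[n_u:] + rng.normal(0, 0.5, (n_m, q))])
tau_x = 1.02 * (X_t.T @ w_t)                    # hypothetical known totals

# Ratio R and the matrices Psi_t, Psi_{t-1}
R = w_t.sum() / w_t[n_u:].sum()
scale = np.concatenate([np.ones(n_u), np.full(n_m, 1 - R)])[:, None]
Psi_t, Psi_prev = scale * Y_t, scale * Y_prev

# Augmented quantities of Equation (5)
X = np.block([[X_t, Psi_t],
              [np.zeros((n, p)), -Psi_prev]])
w = np.concatenate([w_t, c_prev])
tau_X = np.concatenate([tau_x, np.zeros(q)])

# Calibrated weights of Equation (8), with W = diag(w)
WX = w[:, None] * X
c = w + WX @ np.linalg.solve(X.T @ WX, tau_X - X.T @ w)
c_t, c_prev_star = c[:n], c[n:]

# Composite regression estimator, Equation (7), versus its calibration form
Y_aug = np.vstack([Y_t, np.zeros((n, q))])              # Y_(t)
B_hat = np.linalg.solve(X.T @ WX, WX.T @ Y_aug).T       # q x (p + q)
tau_cr_regression = Y_t.T @ w_t + B_hat @ (tau_X - X.T @ w)
tau_cr_calibration = Y_t.T @ c_t
```

The two calibration constraints can be inspected through `X_t.T @ c_t` (equal to `tau_x`) and through `Psi_t.T @ c_t` and `Psi_prev.T @ c_prev_star` (equal to each other).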

3.2. Analytical Expressions of CR Estimates of Levels and Changes

3.2.1. Estimates of Levels

The composite regression estimator of $τ_{y, t}$ given by Equation (7) can be decomposed into the standard and additional (composite) regression terms as

{\hat{τ}}_{y, t}^{CR} = {\hat{τ}}_{y, t} + {\hat{B}}_{x} (τ_{x, t} - {\hat{τ}}_{x, t}) + {\hat{B}}_{y}^{c} [{\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c} - ({\hat{τ}}_{y, t} - {\hat{τ}}_{y, m, t})],

(9)

where ${\hat{B}}_{x}$ and ${\hat{B}}_{y}^{c}$ are the partial regression coefficients, components of $\hat{B}$ . Clearly, the estimator ${\hat{τ}}_{y, t}^{CR}$ is recursive, through ${\hat{τ}}_{y, t - 1}^{c}$ and ${\hat{τ}}_{y, m, t - 1}^{c}$ , which are composite regression estimators carrying information from previous surveys to the current survey—the notation CR is reserved for estimators produced by the current-time $t$ composite calibration.

Partitioning the matrix $X$ by the column submatrices in Equation (5) as $X = (X_{(t)}, Ψ)$ , where $X_{(t)} = (X_{t}', 0')'$ and $Ψ = (Ψ_{t}', - Ψ_{t - 1}')'$ , the vector $c$ in Equation (8) can be decomposed (Merkouris 2004, 1132) as

c = c_{x} + L_{x} Ψ {(Ψ' L_{x} Ψ)}^{- 1} (0 - Ψ' c_{x}),

(10)

where $c_{x} = w + W X_{(t)} {(X_{(t)}' W X_{(t)})}^{- 1} (τ_{x, t} - {\hat{τ}}_{x, t})$ is the vector of calibrated weights based on the regression matrix $X_{(t)}$ , so that $X_{(t)}' c_{x} = τ_{x, t}$ , and $L_{x} = W (I - P_{x})$ , with $P_{x} = X_{(t)} {(X_{(t)}' W X_{(t)})}^{- 1} X_{(t)}' W$ . Note that $Ψ' c = 0$ , this being the partial calibration constraint associated with the differences in the last term in Equation (9).

The vector $c_{x}$ can be written analytically as

c_{x} = (\begin{matrix} c_{x, t} \\ c_{x, t - 1} \end{matrix}) = (\begin{matrix} w_{t} + W_{t} X_{t} {(X'_{t} W_{t} X_{t})}^{- 1} (τ_{x, t} - {\hat{τ}}_{x, t}) \\ c_{t - 1} \end{matrix}),

(11)

with $c_{x, t}$ as in Equation (2). Then, straightforward calculations give

Y_{(t)}' c_{x} = {\hat{τ}}_{y, t}^{R}

and

Ψ' c_{x} = {\hat{τ}}_{y, t}^{R} - {\hat{τ}}_{y, m, t}^{R} - ({\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c}),

where ${\hat{τ}}_{y, t}^{R}$ and ${\hat{τ}}_{y, m, t}^{R}$ are respectively the full-sample and matched-sample regression estimators defined in Section 2.

Now, using Equation (10) and Equation (11), the composite calibration estimator $Y_{(t)}' c$ can be expressed in the (alternative to Equation (7)) composite regression form

{\hat{τ}}_{y, t}^{CR} = {\hat{τ}}_{y, t}^{R} + {\hat{B}}_{y}^{c} ({\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c} - ({\hat{τ}}_{y, t}^{R} - {\hat{τ}}_{y, m, t}^{R})),

(12)

where ${\hat{B}}_{y}^{c} = Y_{(t)}' L_{x} Ψ {(Ψ' L_{x} Ψ)}^{- 1}$ is the partial regression coefficient in Equation (9). A more explicit expression of ${\hat{B}}_{y}^{c}$ is derived upon noting that

L_{x} = (\begin{matrix} L_{x, t} & 0 \\ 0 & C_{t - 1} \end{matrix}),

where $L_{x, t} = W_{t} (I - P_{x, t})$ , and $P_{x, t} = X_{t} {(X_{t}' W_{t} X_{t})}^{- 1} X_{t}' W_{t}$ . It follows then that ${\hat{B}}_{y}^{c} = Y_{t}' L_{x, t} Ψ_{t} {(Ψ_{t}' L_{x, t} Ψ_{t} + Ψ_{t - 1}' C_{t - 1} Ψ_{t - 1})}^{- 1}$ , which shows how previous-time information on $y$ is incorporated in ${\hat{B}}_{y}^{c}$ .

Expression (12) of the composite regression estimator ${\hat{τ}}_{y, t}^{CR}$ allows a direct comparison with the current-time regression estimator ${\hat{τ}}_{y, t}^{R}$ , separating the effect of incorporating previous-time information. We can write Equation (12) alternatively in the more interpretative form

{\hat{τ}}_{y, t}^{CR} = (I - {\hat{B}}_{y}^{c}) {\hat{τ}}_{y, t}^{R} + {\hat{B}}_{y}^{c} ({\hat{τ}}_{y, t - 1}^{c} + {\hat{τ}}_{y, m, t}^{R} - {\hat{τ}}_{y, m, t - 1}^{c}) .

(13)

Expression (13) shows that the composite regression estimator ${\hat{τ}}_{y, t}^{CR}$ is a weighted average of the current-time regression estimator and the previous-time composite regression estimator updated with the change estimator based on the matched sample.
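The weighted-average form in Equation (13) can be compared numerically with the estimate produced directly by composite calibration. The sketch below repeats a simulated setup (equal sample sizes at both times, unmatched rows first, simulated previous-time weights, made-up totals; all of these are assumptions) and takes the matched-sample estimates as $R$ times the weighted matched-sample sums.

```python
import numpy as np

rng = np.random.default_rng(3)
n_u, n_m, p, q = 30, 70, 2, 2
n = n_u + n_m
m = slice(n_u, n)                                # rows of the matched sample

w_t = rng.uniform(50, 150, n)
c_prev = rng.uniform(50, 150, n)                 # previous-time composite weights (simulated)
X_t = np.column_stack([np.ones(n), rng.normal(5, 1, n)])
Y_t = rng.normal(10, 2, (n, q))
Y_prev = np.vstack([rng.normal(10, 2, (n_u, q)),
                    Y_t[m] + rng.normal(0, 0.5, (n_m, q))])
tau_x = 1.02 * (X_t.T @ w_t)                     # hypothetical known totals

R = w_t.sum() / w_t[m].sum()
scale = np.concatenate([np.ones(n_u), np.full(n_m, 1 - R)])[:, None]
Psi_t, Psi_prev = scale * Y_t, scale * Y_prev

# Direct composite calibration, Equation (8)
X = np.block([[X_t, Psi_t], [np.zeros((n, p)), -Psi_prev]])
w = np.concatenate([w_t, c_prev])
tau_X = np.concatenate([tau_x, np.zeros(q)])
WX = w[:, None] * X
c = w + WX @ np.linalg.solve(X.T @ WX, tau_X - X.T @ w)
tau_cr = Y_t.T @ c[:n]

# Ingredients of Equation (13)
WXt = w_t[:, None] * X_t
c_x = w_t + WXt @ np.linalg.solve(X_t.T @ WXt, tau_x - X_t.T @ w_t)  # Equation (2)
tau_R = Y_t.T @ c_x                              # full-sample regression estimate
tau_R_m = R * (Y_t[m].T @ c_x[m])                # matched-sample regression estimate
tau_c_prev = Y_prev.T @ c_prev                   # previous-time composite estimate
tau_c_prev_m = R * (Y_prev[m].T @ c_prev[m])     # its matched-sample counterpart

L_xt = np.diag(w_t) - WXt @ np.linalg.solve(X_t.T @ WXt, WXt.T)      # W_t (I - P_{x,t})
M = Psi_t.T @ L_xt @ Psi_t + Psi_prev.T @ (c_prev[:, None] * Psi_prev)
B_c = Y_t.T @ L_xt @ Psi_t @ np.linalg.inv(M)    # partial regression coefficient

tau_cr_13 = (np.eye(q) - B_c) @ tau_R + B_c @ (tau_c_prev + tau_R_m - tau_c_prev_m)
```

In this simulation `tau_cr` and `tau_cr_13` agree up to rounding, as the decomposition in Equation (10) implies.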

Expression (13) gives the composite regression estimator ${\hat{τ}}_{y, t}^{CR}$ in multivariate form for the vector $y$ . It follows from Equation (13) that for any of the $q$ components of $y$ , say $y_{g}$ , the composite regression estimator of its total is (using for simplicity only the subscript $g$ )

\begin{matrix} {\hat{τ}}_{g, t}^{CR} = (1 - {\hat{β}}_{g}^{c}) {\hat{τ}}_{g, t}^{R} + {\hat{β}}_{g}^{c} ({\hat{τ}}_{g, t - 1}^{c} + {\hat{τ}}_{g, m, t}^{R} - {\hat{τ}}_{g, m, t - 1}^{c}) \\ + {\hat{β}}_{\bar{g}}^{c} ({\hat{τ}}_{\bar{g}, t - 1}^{c} - {\hat{τ}}_{\bar{g}, m, t - 1}^{c} - ({\hat{τ}}_{\bar{g}, t}^{R} - {\hat{τ}}_{\bar{g}, m, t}^{R})) \end{matrix}

(14)

where ${\hat{β}}_{g}^{c}$ is the $g$ -th diagonal element of ${\hat{B}}_{y}^{c}$ , ${\hat{β}}_{\bar{g}}^{c}$ is the $g$ -th row vector of ${\hat{B}}_{y}^{c}$ without the $g$ -th element, and the quantities in the last bracket of Equation (14) are the indicated vector estimators for the other $q - 1$ components of $y$ . Thus, although the composite estimator ${\hat{τ}}_{g, t}^{CR}$ incorporates all information on $y_{g}$ available in the two overlapping samples, in the manner of the weighted average in the first two terms of Equation (14), the additional third term suggests that ${\hat{τ}}_{g, t}^{CR}$ may realize additional efficiency due to correlation of $y_{g}$ with the rest of the components of $y$ . Of course, ${\hat{τ}}_{g, t}^{CR}$ can be conveniently obtained as calibration estimator $Y_{g, t}' c_{t}$ , where $Y_{g, t}$ is the $g$ -th column of $Y_{t}$ . Generally, for any linear combination $λ' y$ of the components of $y$ , where $λ$ is a $q \times 1$ vector of constants, the composite regression estimator of $λ' τ_{y, t}$ is given by $λ' {\hat{τ}}_{y, t}^{CR} = λ' Y_{t}' c_{t}$ .

Let now $z$ be any other single variable, with current-time total $τ_{z, t}$ and current-sample matrix $Z_{t}$ of dimension $n_{t} \times 1$ . Setting $Z_{(t)} = (Z_{t}', 0')'$ and using Equation (10) we obtain the composite calibration estimator $Z_{t}' c_{t}$ of $τ_{z, t}$ in composite regression form, analogous to Equation (12), as

{\hat{τ}}_{z, t}^{CR} = {\hat{τ}}_{z, t}^{R} + {\hat{B}}_{z}^{c} ({\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c} - ({\hat{τ}}_{y, t}^{R} - {\hat{τ}}_{y, m, t}^{R})),

(15)

where ${\hat{B}}_{z}^{c} = Z_{(t)}' L_{x} Ψ {(Ψ' L_{x} Ψ)}^{- 1} = Z'_{t} L_{x, t} Ψ_{t} {(Ψ_{t}' L_{x, t} Ψ_{t} + Ψ_{t - 1}' C_{t - 1} Ψ_{t - 1})}^{- 1}$ , and ${\hat{τ}}_{z, t}^{R}$ is the regression estimator of the form (1). It is seen from Equation (15) that the efficiency of the composite regression estimator ${\hat{τ}}_{z, t}^{CR}$ relative to the standard regression estimator ${\hat{τ}}_{z, t}^{R}$ depends on the strength of correlation of $z$ with $y$ , in the presence of the auxiliary variable $x$ .

It is noteworthy that the simultaneous calibration of the two samples results also in an updated estimator for the previous time, incorporating information from current time. Setting $Y_{(t - 1)} = (0', Y_{t - 1}')'$ , we obtain the updated calibration estimator $Y_{(t - 1)}' c = Y_{t - 1}' c_{t - 1}^{*}$ (in place of $Y_{t - 1}' c_{t - 1} = {\hat{τ}}_{y, t - 1}^{c}$ ) in the composite regression form, similar to Equation (13),

{\hat{τ}}_{y, t - 1}^{CR} = (I - {\hat{B}}_{y}^{c^{*}}) {\hat{τ}}_{y, t - 1}^{c} + {\hat{B}}_{y}^{c^{*}} ({\hat{τ}}_{y, t}^{R} - ({\hat{τ}}_{y, m, t}^{R} - {\hat{τ}}_{y, m, t - 1}^{c})),

where ${\hat{B}}_{y}^{c^{*}} = Y_{t - 1}' C_{t - 1} Ψ_{t - 1} {(Ψ_{t}' L_{x, t} Ψ_{t} + Ψ_{t - 1}' C_{t - 1} Ψ_{t - 1})}^{- 1}$ . This shows that the updated composite regression estimator ${\hat{τ}}_{y, t - 1}^{CR}$ is a weighted average of the initial previous-time composite regression estimator and the current-time regression estimator reduced by the change estimator based on the matched sample. The possible use of ${\hat{τ}}_{y, t - 1}^{CR}$ will be considered in the following section.

3.2.2. Estimates of Change

In the simultaneous calibration of the previous-time and current-time samples that generates the composite estimator ${\hat{τ}}_{y, t}^{CR}$ , the differences in estimates based on full and matched samples from previous and current time are calibrated to each other. This is due to the partial calibration constraint $Ψ' c = 0$ noted above. Writing $Ψ' c = Ψ'_{t} c_{t} - Ψ'_{t - 1} c_{t - 1}^{*}$ , we easily verify that

Ψ'_{t} c_{t} = Ψ'_{t - 1} c_{t - 1}^{*} = (I - {\hat{B}}_{d}^{c}) ({\hat{τ}}_{y, t}^{R} - {\hat{τ}}_{y, m, t}^{R}) + {\hat{B}}_{d}^{c} ({\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c}),

(16)

where ${\hat{B}}_{d}^{c} = Ψ'_{t} L_{x, t} Ψ_{t} {(Ψ_{t}' L_{x, t} Ψ_{t} + Ψ_{t - 1}' C_{t - 1} Ψ_{t - 1})}^{- 1}$ . This shows that the calibration equates both differences ${\hat{τ}}_{y, t}^{R} - {\hat{τ}}_{y, m, t}^{R}$ and ${\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c}$ to their combination in Equation (16). Since $Ψ'_{t} c_{t} = {\hat{τ}}_{y, t}^{CR} - {\hat{τ}}_{y, m, t}^{CR}$ and $Ψ'_{t - 1} c_{t - 1}^{*} = {\hat{τ}}_{y, t - 1}^{CR} - {\hat{τ}}_{y, m, t - 1}^{CR}$ , the composite calibration results in the equality

{\hat{τ}}_{y, t}^{CR} - {\hat{τ}}_{y, m, t}^{CR} = {\hat{τ}}_{y, t - 1}^{CR} - {\hat{τ}}_{y, m, t - 1}^{CR} .

(17)

Interestingly,

{\hat{τ}}_{y, t}^{CR} = {\hat{τ}}_{y, t - 1}^{CR} + {\hat{τ}}_{y, m, t}^{CR} - {\hat{τ}}_{y, m, t - 1}^{CR},

which shows that the composite regression estimate at time $t$ is simply the updated composite regression estimate at time $t - 1$ plus the change estimate based on the matched sample at times $t - 1$ and $t$ .

It follows from Equation (17) that

{\hat{τ}}_{y, t}^{CR} - {\hat{τ}}_{y, t - 1}^{CR} = {\hat{τ}}_{y, m, t}^{CR} - {\hat{τ}}_{y, m, t - 1}^{CR},

(18)

which means that the estimate of the change $τ_{y, t} - τ_{y, t - 1}$ based on the full samples $s_{t - 1}$ , $s_{t}$ is equal to the estimate of change based on the matched sample. The estimate ${\hat{τ}}_{y, t}^{CR} - {\hat{τ}}_{y, t - 1}^{CR}$ in Equation (18) involves the updated previous-time estimates ${\hat{τ}}_{y, t - 1}^{CR}$ and ${\hat{τ}}_{y, m, t - 1}^{CR}$ , and is obtained through composite calibration as $Y_{t}' c_{t} - Y_{t - 1}' c_{t - 1}^{*}$ . On the other hand, if the initial previous-time estimates ${\hat{τ}}_{y, t - 1}^{c}$ and ${\hat{τ}}_{y, m, t - 1}^{c}$ are used ( ${\hat{τ}}_{y, t - 1}^{c}$ being the already published estimate), then it follows easily from Equation (13) that the estimate of change ${\hat{τ}}_{y, t}^{CR} - {\hat{τ}}_{y, t - 1}^{c}$ can be expressed as the weighted average of full-sample and matched-sample estimates of change

{\hat{τ}}_{y, t}^{CR} - {\hat{τ}}_{y, t - 1}^{c} = (I - {\hat{B}}_{y}^{c}) ({\hat{τ}}_{y, t}^{R} - {\hat{τ}}_{y, t - 1}^{c}) + {\hat{B}}_{y}^{c} ({\hat{τ}}_{y, m, t}^{R} - {\hat{τ}}_{y, m, t - 1}^{c}),

and conveniently obtained as $Y_{t}' c_{t} - Y_{t - 1}' c_{t - 1}$ . Like the level estimate ${\hat{τ}}_{y, t}^{CR}$ , this estimate of change arises as an approximate BLUE of change.

Using notation defined in Section 3.2.1, the estimate of change for the component $y_{g}$ of $y$ is clearly obtained as $Y_{g, t}' c_{t} - Y_{g, t - 1}' c_{t - 1}$ , and for any other single variable $z$ the estimate of change is obtained as $Z_{t}' c_{t} - Z_{t - 1}' c_{t - 1}$ .
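The two versions of the change estimate discussed in this section can be computed from the composite calibrated weights as sketched below. The data are simulated and the setup (equal sample sizes, unmatched rows first, simulated previous-time weights `c_prev`, made-up totals) is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
n_u, n_m, p, q = 40, 60, 2, 2
n = n_u + n_m
m = slice(n_u, n)                                # rows of the matched sample

w_t = rng.uniform(50, 150, n)
c_prev = rng.uniform(50, 150, n)                 # published previous-time composite weights
X_t = np.column_stack([np.ones(n), rng.normal(5, 1, n)])
Y_t = rng.normal(10, 2, (n, q))
Y_prev = np.vstack([rng.normal(10, 2, (n_u, q)),
                    Y_t[m] + rng.normal(0, 0.5, (n_m, q))])
tau_x = 1.02 * (X_t.T @ w_t)                     # hypothetical known totals

R = w_t.sum() / w_t[m].sum()
scale = np.concatenate([np.ones(n_u), np.full(n_m, 1 - R)])[:, None]
Psi_t, Psi_prev = scale * Y_t, scale * Y_prev

# Composite calibration of the combined sample, Equation (8)
X = np.block([[X_t, Psi_t], [np.zeros((n, p)), -Psi_prev]])
w = np.concatenate([w_t, c_prev])
tau_X = np.concatenate([tau_x, np.zeros(q)])
WX = w[:, None] * X
c = w + WX @ np.linalg.solve(X.T @ WX, tau_X - X.T @ w)
c_t, c_prev_star = c[:n], c[n:]

# Level estimates: full and matched sample, current and updated previous time
level_t = Y_t.T @ c_t
level_prev_updated = Y_prev.T @ c_prev_star
matched_t = R * (Y_t[m].T @ c_t[m])
matched_prev_updated = R * (Y_prev[m].T @ c_prev_star[m])

# Equation (18): full-sample change equals matched-sample change
change_updated = level_t - level_prev_updated
change_matched = matched_t - matched_prev_updated

# Change relative to the already published previous-time estimate
change_published = level_t - Y_prev.T @ c_prev
```

Here `change_updated` and `change_matched` coincide, while `change_published` generally differs from them because it uses the weights `c_prev` rather than the recalibrated `c_prev_star`.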

4. Comparisons with Other Methods

To facilitate the comparison of the proposed composite regression estimator ${\hat{τ}}_{y, t}^{CR}$ of $τ_{y, t}$ with the current Fuller and Rao (2001) version of the MR composite estimator, a formulation of the MR estimator is presented here using present notation. This version is an extension of the regression estimator in Equation (1), of the form

{\hat{τ}}_{y, t}^{MR} = {\hat{τ}}_{y, t} + {\hat{B}}_{x} (τ_{x, t} - {\hat{τ}}_{x, t}) + {\hat{B}}^{MR} ({\hat{τ}}_{y, t - 1}^{MR} - {\hat{τ}}_{ψ, t}^{MR}),

(19)

where ${\hat{τ}}_{y, t - 1}^{MR}$ is the MR composite estimate of $τ_{y, t - 1}$ , and ${\hat{τ}}_{ψ, t}^{MR} = (1 - α) {\hat{τ}}_{ψ, t}^{MR 1} + α {\hat{τ}}_{ψ, t}^{MR 2}$ , with $α \in [0, 1]$ , is a weighted average of two HT estimates that use data from both times $t - 1$ and $t$ but current-time $t$ sampling weights. The values $α = 0$ and $α = 1$ give the composite estimators MR1 (Singh and Merkouris 1995) and MR2 (Singh et al. 1997, 2001), respectively. Specifically, ${\hat{τ}}_{ψ, t}^{MR 1} = (Ψ_{t}^{MR 1})' w_{t}$ , where the $n_{t} \times q$ sample matrix $Ψ_{t}^{MR 1}$ is defined as $Ψ_{t}^{MR 1} = (({\hat{τ}}_{y, t - 1}^{MR})' 1_{u}', Y_{m, t - 1}')'$ , where $1_{u}$ is a column of ones for the units in the unmatched sample, and ${\hat{τ}}_{ψ, t}^{MR 2} = (Ψ_{t}^{MR 2})' w_{t}$ , where $Ψ_{t}^{MR 2} = {(Y_{u, t}', Y_{m, t}' + R (Y_{m, t - 1}' - Y_{m, t}'))}'$ .

Writing the extended design matrix for the MR estimator in Equation (19) as $X_{t}^{MR} = (X_{t}, Ψ_{t}^{MR})$ , where $Ψ_{t}^{MR} = (1 - α) Ψ_{t}^{MR 1} + α Ψ_{t}^{MR 2}$ is the sample matrix of the composite auxiliary variables, and the associated vector of totals as $τ_{X}^{MR} = (τ_{x, t}', ({\hat{τ}}_{y, t - 1}^{MR})')'$ , this estimator can be written as

{\hat{τ}}_{y, t}^{MR} = {\hat{τ}}_{y, t} + {\hat{B}}^{MR} (τ_{X}^{MR} - {\hat{τ}}_{X}^{MR}),

where ${\hat{τ}}_{X}^{MR} = (X_{t}^{MR})' w_{t}$ and ${\hat{B}}^{MR} = Y_{t}' W_{t} X_{t}^{MR} {[(X_{t}^{MR})' W_{t} X_{t}^{MR}]}^{- 1}$ . Note that the coefficients ${\hat{B}}_{x}$ and ${\hat{B}}^{MR}$ in Equation (19) are the two components of ${\hat{B}}^{MR}$ . It is clear then that ${\hat{τ}}_{y, t}^{MR}$ can be expressed as a calibration estimator, that is, ${\hat{τ}}_{y, t}^{MR} = Y'_{t} c_{t}^{MR}$ , where the vector of calibrated weights

c_{t}^{MR} = w_{t} + W_{t} X_{t}^{MR} {[(X_{t}^{MR})' W_{t} X_{t}^{MR}]}^{- 1} (τ_{X}^{MR} - {\hat{τ}}_{X}^{MR})

satisfies the initial constraints $X_{t}' c_{t}^{MR} = τ_{x, t}$ and the additional constraints $(Ψ_{t}^{MR})' c_{t}^{MR} = {\hat{τ}}_{y, t - 1}^{MR}$ .
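The two sets of constraints can be checked numerically. The following is a minimal, dependency-free Python sketch of the linear calibration formula for $c_{t}^{MR}$, not the production implementation of any agency. All numbers are hypothetical toy values, the columns `psi1` and `psi2` merely stand in for $Ψ_{t}^{MR 1}$ and $Ψ_{t}^{MR 2}$ (they are not built from real matched data), and the helper names `solve` and `calibrate` are introduced here for illustration only.

```python
def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def calibrate(w, x_rows, totals):
    """Return c = w + W X (X'WX)^{-1} (totals - X'w), so that X'c = totals."""
    p = len(totals)
    xwx = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, x_rows))
            for k in range(p)] for j in range(p)]
    xw = [sum(wi * xi[j] for wi, xi in zip(w, x_rows)) for j in range(p)]
    lam = solve(xwx, [t - e for t, e in zip(totals, xw)])
    # Unit-level form of the matrix formula: c_i = w_i * (1 + x_i' lam)
    return [wi * (1.0 + sum(xj * lj for xj, lj in zip(xi, lam)))
            for wi, xi in zip(w, x_rows)]


# Hypothetical current-time design weights and study variable y_t.
w = [10.0, 12.0, 8.0, 15.0, 9.0, 11.0]
y = [2.0, 1.0, 5.0, 2.0, 4.0, 3.0]

# Hypothetical stand-ins for the MR1 and MR2 composite auxiliary columns.
psi1 = [2.7, 2.7, 4.0, 2.0, 5.0, 2.5]
psi2 = [2.0, 1.0, 4.4, 1.6, 5.2, 2.8]
alpha = 0.7  # hypothetical tuning constant
psi = [(1 - alpha) * p1 + alpha * p2 for p1, p2 in zip(psi1, psi2)]

# Extended design matrix (X_t, Psi_t): an intercept column plus the composite column.
x_mr = [[1.0, p] for p in psi]
# Calibration totals: a known population count and the previous-time composite estimate.
totals = [70.0, 190.0]

c = calibrate(w, x_mr, totals)
print("X'c   =", sum(c))                                 # matches totals[0]
print("Psi'c =", sum(ci * p for ci, p in zip(c, psi)))   # matches totals[1]
print("calibrated estimate of the y total:", sum(ci * yi for ci, yi in zip(c, y)))
```

In practice a linear algebra library would replace the hand-written solver; the point of the sketch is only that a single weight vector satisfies both the demographic constraint and the composite constraint.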

The composite auxiliary variables used in the MR composite estimation, with the previous-time MR estimator used as corresponding calibration total, involve data from the current and previous survey time, and sample matching between the two times is done at the individual record level. In this matching procedure, missing values are imputed using mean imputation and carry-backward imputation for the MR1 and MR2 components, respectively. The choice of the value of the tuning constant $α$ depends on the relative importance of the level and change estimates, and leads also to a compromise between the two imputation methods. Satisfying the two sets of calibration constraints requires that the previous-time survey data be calibrated to the current-time demographic totals. Detailed discussion of implementation issues related to the choice of the tuning constant and the imputation required in specifying the composite auxiliary variables is found in Gambino et al. (2001) and Statistics Canada (2017).

The proposed composite regression estimators for levels and changes derive their efficiency from the fact that they are approximate BLUEs, with the partial regression coefficient ${\hat{B}}_{y}^{c}$ incorporating information on $y$ from both previous-time and current-time samples. Note that the estimate $\hat{B} = Y_{(t)}' \hat{Var} (w) X {[X' \hat{Var} (w) X]}^{- 1}$ of the BLUE coefficient in Equation (6) is a function of data from both survey times. The regression coefficient $\hat{B} = Y_{(t)}' W X {(X' W X)}^{- 1}$ in Equation (7), a practicable substitute for the estimated BLUE coefficient, and hence the partial regression coefficient ${\hat{B}}_{y}^{c}$ in Equation (12), are also functions of weighted data from both survey times. It should also be noted that the submatrix $C_{t - 1}$ of the weighting matrix $W$ incorporates past information through the composite calibrated weights $c_{t - 1}$ of time $t - 1$.

In contrast, the MR-composite estimator is generated by a calibration of current-time weights, whereby current-time estimates are calibrated to previous-time estimates, the latter being treated as constants in calibration, and thus the regression coefficient incorporates weighted data from current time only. This becomes evident in the MR2 composite estimator (the Fuller-Rao version with $α = 1$ ), which has the form in Equation (19) with the second regression term written explicitly as ${\hat{τ}}_{y, t - 1}^{MR 2} - {\hat{τ}}_{y, m, t - 1}^{*} - ({\hat{τ}}_{y, t} - {\hat{τ}}_{y, m, t})$ . In this expression, ${\hat{τ}}_{y, m, t - 1}^{*}$ estimates current population totals for the previous-time variables $y$ , and the previous-time MR2 estimator ${\hat{τ}}_{y, t - 1}^{MR 2}$ is the calibration total, also transformed to reflect possible changes in demographic totals from $t - 1$ to $t$ . On the other hand, in the comparable form in Equation (9) of the proposed composite estimator ${\hat{τ}}_{y, t}^{CR}$ (which is generated by the simultaneous calibration of $s_{t - 1}$ and $s_{t}$ ) the calibration total associated with the second regression term ${\hat{τ}}_{y, t - 1}^{c} - {\hat{τ}}_{y, m, t - 1}^{c} - ({\hat{τ}}_{y, t} - {\hat{τ}}_{y, m, t})$ is zero.

In view of the construction of the proposed composite estimators of level and change, there is no need to seek a compromise between estimation of level and estimation of change through a tuning constant, as with the Fuller-Rao estimator. A similar comment is made in Bonnéry et al. (2020) regarding a BLUE of level and a BLUE of change, in a comparison with the Fuller-Rao estimator.

The proposed composite estimation is free of the problems of sample matching between two consecutive times at the individual record level, as required in MR-composite estimation. These problems arise when, for a given matched sample, data are available for only one survey time. This may occur because of nonresponse in either survey time, or when a move or a change in scope has taken place between the two consecutive survey times; see Gambino et al. (2001). Micro-matching involves imputation, that is, artificial creation of data, which may not reflect the actual correlation of data collected from the same units in successive survey times. This may create a false effect on the efficiency of the MR estimator, and may also introduce bias. Such bias, which may accumulate over time because of the recursive nature of the composite estimator, is avoided in the proposed estimation procedure. The proposed method is also free of the operational complexities of MR-composite estimation, which include the extra calibration of past-month data to the current-month population totals, and the cumbersome variance estimation by resampling methods; see Statistics Canada (2017). In current MR methodology, bootstrap replicates of the composite calibration totals of the previous month are computed, adjusted to current-month population totals, and used in current-month calculations using bootstrap sample coordination between survey times. In the proposed method, estimation and variance estimation can be done conveniently in one step, with replication of the composite calibration of the combined sample to generate replicate estimates for the variance calculations.

The form of the composite regression estimator in Equation (12) is similar to that of the K-composite estimator, with the regression coefficient ${\hat{B}}_{y}^{c}$ in place of the coefficient K. In K-composite and AK-composite estimation, values of A and K that are optimal over time, in the sense of minimum variance of the estimator, are chosen empirically for each variable of interest, which may cause inconsistencies among estimates. In contrast, the proposed composite estimator, with the time-dependent matrix coefficient ${\hat{B}}_{y}^{c}$, is multivariate, and thus the efficiency of estimation for each of the components of $y$ (or a linear combination of any subset of them) may be enhanced by the correlation with other components, as indicated by Equation (14). Also unlike AK-composite estimation, in which only estimates for selected key variables are true composite estimates, the proposed composite calibration generates composite regression estimates for any variable, as shown in Equation (15). Operationally, unlike AK-composite estimation, where calibration to satisfy known population totals and composite estimation are separate steps, calibration weighting in the proposed composite regression estimation is done in one step, that is, simultaneously with weighting to satisfy the standard calibration constraints.
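For readers who want to see the recursion behind this comparison, here is a small Python sketch of the AK-composite estimator in its commonly stated generic form, together with the algebraically equivalent "regression" form whose second term has the structure $\text{previous composite} - \text{matched}_{t-1} - (\text{full}_{t} - \text{matched}_{t})$. The notation and scaling conventions are assumptions and may differ from those used earlier in the article; the function names and all input numbers are hypothetical.

```python
def ak_composite(direct, matched_prev, matched_curr, unmatched_curr, k, a, start):
    """Run a generic AK-composite recursion over successive survey times.

    direct[t]          full-sample estimate of the time-t total
    matched_prev[t]    overlap-sample estimate of the time t-1 total
    matched_curr[t]    overlap-sample estimate of the time-t total
    unmatched_curr[t]  estimate of the time-t total from the unmatched (birth) panel
    start              composite estimate for the time preceding the first entry
    """
    out = []
    prev = start
    for d, mp, mc, u in zip(direct, matched_prev, matched_curr, unmatched_curr):
        # Weighted average of the direct estimate and the previous composite
        # estimate updated by the overlap-sample change, plus the A-term.
        comp = (1 - k) * d + k * (prev + mc - mp) + a * (u - mc)
        out.append(comp)
        prev = comp
    return out


def regression_form(d, prev, mp, mc, u, k, a):
    """One step of the same estimator, written as direct estimate plus regression terms."""
    return d + k * (prev - mp - (d - mc)) + a * (u - mc)


# Hypothetical estimates for three consecutive survey times.
direct = [100.0, 104.0, 103.0]
matched_prev = [98.0, 101.0, 105.0]
matched_curr = [101.0, 105.0, 102.0]
unmatched_curr = [99.0, 102.0, 106.0]
start, k, a = 97.0, 0.4, 0.2  # hypothetical starting value and coefficients

series = ak_composite(direct, matched_prev, matched_curr, unmatched_curr, k, a, start)
print(series)
```

Setting `a = 0` gives the K-composite estimator, and setting both `k = 0` and `a = 0` returns the direct estimates unchanged. The fixed scalars `k` and `a` play the role that the data-driven coefficient ${\hat{B}}_{y}^{c}$ plays in the proposed estimator.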

Like the proposed multivariate composite regression estimator, the regression estimator of Konrad and Berger (2023) also involves the combined sample $s_{t - 1} \cup s_{t}$ and does not require imputation, though it has a different structure. It is multivariate with respect to the totals of a single variable $y$ in two consecutive survey times. It involves the vector $x$ in both $s_{t - 1}$ and $s_{t}$, and its vector of additional auxiliary variables represents sampling design information in terms of stratification and sample rotation, instead of information on the key variables of interest from both survey times (as in Equation (9)). Furthermore, this estimator is not recursive and is not formulated as a calibration estimator.

5. Discussion

We have developed a new method of composite regression estimation, based on the principle of best linear unbiased estimation and with a transparent structure of the multivariate composite estimators of levels and changes for any survey variable. These estimators are generated by a suitable simultaneous calibration of the weights of the combined sample of current and previous survey times. This theoretically well-founded calibration procedure allows a most effective incorporation of past information in the current time weights, and incorporation of current information in the previous time weights, resulting in composite estimators whose efficiency is expected to compare favorably with that of existing estimators.

The proposed composite estimation method is considerably more practical than the existing composite estimation methods. This is a significant advantage of the new method, considering the operational complexity of a composite estimation process.

The proposed method can be extended to rotation schemes that are more general than the typical scheme outlined in Section 2.

An important issue with repeated surveys with rotating panels is the possible rotation bias due to differential nonresponse and measurement error across the panels. It is known that the birth panel usually differs most from the others, so that the matched and unmatched samples differ. This is an intrinsic problem of repeated surveys with such a design, and it affects not only composite estimation but also the basic Horvitz-Thompson estimation and the standard regression estimation. The reduction of the effect of rotation group bias on composite estimation by addressing the cause of this bias (e.g., differences in nonresponse rates and in the mode of data collection) has been discussed in the literature; see Gambino et al. (2001). In this connection, an extension of the composite regression estimator ${\hat{τ}}_{y, t}^{CR}$, analogous to the extension of the K-composite to the AK-composite estimator, could involve adding the regression term ${\hat{τ}}_{y, u, t} - {\hat{τ}}_{y, m, t}$ in Equation (9), where ${\hat{τ}}_{y, u, t} = r Y_{u, t}' w_{u, t}$ is the estimate of $τ_{y, t}$ based on the unmatched (“birth”) panel at time $t$. This is done by augmenting the matrix $X$ in Equation (5) by the column $({\bar{Ψ}}_{t}', 0')'$, where ${\bar{Ψ}}_{t} = - r Ψ_{t}$, and using the vector of calibration totals $(τ_{x, t}', 0', 0')'$. The extended calibration corresponding to this extended regression estimation will result in ${\hat{τ}}_{y, u, t}^{CR} = {\hat{τ}}_{y, m, t}^{CR}$, which may help to reduce the birth rotation bias due to the usual difference of the birth panel from the other panels.

The performance of the proposed composite regression estimators for levels and changes needs to be assessed through an extensive empirical study using actual data from a repeated survey with rotating panels (e.g., data from a Labour Force Survey). These estimators should be evaluated for multiple survey characteristics using data over a sufficiently long period of time, so as to capture the effect of the recursive incorporation of past information, and their advantages should be judged not only on their statistical efficiency but also on their impact on various time series with respect to stability and seasonal adjustment. A study of this kind for the MR estimator is reported in Gambino et al. (2001). Considering that the Bonnéry et al. (2020) evaluation study declared the current MR method preferable to the AK method, the comparison could be limited to the proposed method, the MR method, and the method of Konrad and Berger (2023). Such an empirical study is beyond the scope of the present article.

Footnotes

Acknowledgements

The author is grateful to the Editor-in-Chief, the Associate Editor and the referees for their constructive comments and suggestions, which have helped to improve the article substantially.

Funding

The author declared that they received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Takis Merkouris

Received: March 2023

Accepted: April 2024

References

Beaumont J.-F., Bocci. 2005. “A Refinement of the Regression Composite Estimator in the Labour Force Survey for Change Estimates.” Proceedings of the Survey Methods Section: Statistical Society of Canada SSC, Annual Meeting, June. Available at: https://ssc.ca/sites/default/files/survey/documents/SSC2005_C_Bocci.pdf

Bell. 2001. “Comparison of Alternative Labour Force Survey Estimators.” Survey Methodology 27: 53–63. Available at: https://www150.statcan.gc.ca/n1/pub/12-001-x/2001001/article/5854-eng.pdf

Bell W. R., Hillmer S. C. 1990. “The Time Series Approach to Estimation for Periodic Surveys.” Survey Methodology 16: 195–215. Available at: https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1990002/article/14535-eng.pdf

Binder D. A., Dick J. P. 1989. “Modeling and Estimation for Repeated Surveys.” Survey Methodology 15: 29–45. Available at: https://www150.statcan.gc.ca/n1/pub/12-001-x/1989001/article/14579-eng.pdf

Bonnéry, Cheng, Lahiri. 2020. “An Evaluation of Design-Based Properties of Different Composite Estimators.” Statistics in Transition New Series 21: 166–90. DOI: https://doi.org/10.21307/stattrans-2020-037

Cantwell P. J., Ernst L. R. 1992. “New Developments in Composite Estimation for the Current Population Survey.” Proceedings of Statistics Canada Symposium 92: Design and Analysis of Longitudinal Surveys, 121–30. Available at: https://publications.gc.ca/collections/collection_2017/statcan/CS11-522-1992-eng.pdf

Fuller W. A. 1990. “Analysis of Repeated Surveys.” Survey Methodology 16: 167–80. Available at: https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X199000214537

Fuller W. A., Rao J. N. K. 2001. “A Regression Composite Estimator with Application to the Canadian Labour Force Survey.” Survey Methodology 27: 45–51. Available at: https://www150.statcan.gc.ca/n1/pub/12-001-x/2001001/article/5853-eng.pdf

Gambino, Kennedy, Singh M. P. 2001. “Regression Composite Estimation for the Canadian Labour Force Survey: Evaluation and Implementation.” Survey Methodology 27: 65–74. Available at: https://www150.statcan.gc.ca/n1/pub/12-001-x/2001001/article/5855-eng.pdf

Gurney, Daly J. F. 1965. “A Multivariate Approach to Estimation in Periodic Sample Surveys.” Proceedings of the Social Statistics Section: American Statistical Association, 242–57. Available at: http://www.asasrms.org/Proceedings/y1965/A%20Multivariate%20Approach%20To%20Estimation%20In%20Periodic%20Sample%20Surveys.pdf

Hansen M. H., Hurwitz W. N., Nisselson, Steinberg. 1955. “The Redesign of the Census Current Population Survey.” Journal of the American Statistical Association 50: 701–19.

Jones R. G. 1980. “Best Linear Unbiased Estimators for Repeated Surveys.” Journal of the Royal Statistical Society: Series B (Methodological) 42: 221–6. DOI: 10.1111/J.2517-6161.1980.TB01123.X

Konrad, Berger. 2023. “A Multivariate Regression Estimator of Levels and Changes for Surveys Over Time.” Journal of Official Statistics 39: 27–44. DOI: https://doi.org/10.2478/jos-2023-0002

Lent, Miller, Cantwell. 1994. “Composite Weights for the Current Population Surveys.” Proceedings of the Survey Research Methods Section: American Statistical Association, 867–72. Available at: https://www.bls.gov/osmr/research-papers/1994/pdf/cp940060.pdf

Lent, Miller S. M., Cantwell P. J., Duff. 1999. “Effect of Composite Weights on Some Estimates from the Current Population Survey.” Journal of Official Statistics 14: 431–48. Available at: https://www.scb.se/contentassets/ff271eeeca694f47ae99b942de61df83/effects-of-composite-weights-on-some-estimates-from-the-current-population-survey.pdf

Merkouris. 2004. “Combining Independent Regression Estimators from Multiple Surveys.” Journal of the American Statistical Association 99: 1131–9. Available at: https://www.jstor.org/stable/27590491

Pfeffermann. 1991. “Estimation and Seasonal Adjustment of Population Means Using Data from Repeated Surveys.” Journal of Business and Economic Statistics 9: 163–75. DOI: https://doi.org/10.2307/1391783

Rao J. N. K. 1994. “Estimating Totals and Distribution Functions Using Auxiliary Information at the Estimation Stage.” Journal of Official Statistics 10: 153–65. Available at: https://www.scb.se/contentassets/f6bcee6f397c4fd68db6452fc9643e68/estimating-totals-and-distribution-functions-using-auxiliary-information-at-the-estimation-stage.pdf

Scott A. J., Smith T. M. F., Jones R. G. 1977. “The Application of Time Series Methods to the Analysis of Repeated Surveys.” International Statistical Review 45: 13–28. Available at: https://www.jstor.org/stable/1403000

Singh A. C., Kennedy. 2001. “Regression Composite Estimation for the Canadian Labour Force Survey with a Rotating Panel Design.” Survey Methodology 27: 33–44. Available at: https://www150.statcan.gc.ca/n1/pub/12-001-x/2001001/article/5852-eng.pdf

Singh A. C., Kennedy, Brisebois. 1997. “Composite Estimation for the Canadian Labour Force Survey.” Proceedings of the Survey Research Methods Section: American Statistical Association, 300–5. Available at: http://www.asasrms.org/Proceedings/papers/1997_050.pdf

Singh A. C., Merkouris. 1995. “Composite Estimation by Modified Regression for Repeated Surveys.” Proceedings of the Survey Research Methods Section: American Statistical Association, 420–5. Available at: http://www.asasrms.org/Proceedings/papers/1995_071.pdf

Statistics Canada. 2017. “Methodology of the Canadian Labour Force Survey.” Statistics Canada, Catalogue no. 71-526-X. Available at: https://www150.statcan.gc.ca/n1/en/pub/71-526-x/71-526-x2017001-eng.pdf

Tiller. 1989. “A Kalman Filter Approach to Labor Force Estimation Using Survey Data.” Proceedings of the Section on Survey Research Methods: American Statistical Association, 16–25. Available at: http://www.asasrms.org/Proceedings/papers/1989_003.pdf

A New Approach to Composite Estimation for Repeated Surveys with Rotating Panels
