Abstract
Composite estimation in repeated surveys with rotating panels refers to methods of estimation that exploit correlations in the data in the sample overlap between survey times to improve the precision of estimates. In this article a novel approach to composite estimation is proposed, in which composite regression estimators of current totals for a number of key variables are generated from a simultaneous calibration of the sampling weights of the overlapping samples of the current and previous survey times. In this procedure, in addition to the usual calibration to known population totals, differences of estimates for the key variables based on the full sample and the common sample from the two consecutive times are calibrated to each other. The resulting multivariate composite regression estimator, which is constructed as an approximate best linear unbiased estimator, effectively incorporates information from the samples of both survey times for enhanced estimation efficiency. Unlike other composite regression estimators, the proposed estimator does not require micro-matching of data in the overlap sample and is therefore free of the potential issues associated with such matching. It is also considerably more practical than existing composite regression estimators and the traditional AK-composite estimator.
1. Introduction
Some repeated surveys, typically Labor Force Surveys, use a sampling design with rotating panels for operational and statistical efficiency. In such a design, a large overlap in the samples between successive survey times allows the improvement of estimates of population parameters, especially for those variables for which there is a strong correlation between the values reported by the same units in successive times. Composite estimation refers to estimation methods that use information from previous times to improve the precision of both the point-in-time (“level”) estimates and estimates of change between consecutive times, by exploiting correlations in the data of the overlap sample. This improvement in precision can in turn reduce the volatility in the time series of estimates, especially of estimates with high sampling variability associated with subpopulations of interest.
The earliest composite estimation method, known as “K-composite estimation,” was introduced for the US Current Population Survey (CPS) by Hansen et al. (1955), and extended later to the “AK-composite estimation” by Gurney and Daly (1965), and to the “AK-composite weighting”; see Fuller (1990), Cantwell and Ernst (1992), and Lent et al. (1994, 1999).
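For orientation, the recursion usually given for the AK-composite estimator can be sketched in a few lines of Python. This is a minimal illustration of the textbook form, not code from any survey production system: the function name, argument names, and the choice to start the recursion at the first direct estimate are assumptions made here. Setting `A = 0` gives the K-composite form.

```python
from typing import List, Sequence


def ak_composite(
    direct: Sequence[float],
    matched_change: Sequence[float],
    unmatched_minus_matched: Sequence[float],
    K: float,
    A: float,
) -> List[float]:
    """Recursive AK-composite series (illustrative sketch).

    direct[t]                  : direct estimate of the level at time t
    matched_change[t]          : change from t-1 to t estimated from the
                                 overlap sample (entry 0 is unused)
    unmatched_minus_matched[t] : estimate from the incoming panels minus the
                                 estimate from the continuing panels at time t
    """
    composite = [direct[0]]  # assumption: recursion starts at the direct estimate
    for t in range(1, len(direct)):
        value = (
            (1 - K) * direct[t]
            + K * (composite[t - 1] + matched_change[t])
            + A * unmatched_minus_matched[t]
        )
        composite.append(value)
    return composite
```

With `K = 0` and `A = 0` the series reduces to the direct estimates; larger `K` places more weight on the previous composite estimate updated by the overlap-sample change.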
Later, a type of regression method of composite estimation introduced for the Canadian Labor Force Survey, called modified regression (MR) composite estimation, was developed to overcome certain shortcomings of the AK method; see Singh and Merkouris (1995), Singh et al. (1997, 2001), Gambino et al. (2001), Fuller and Rao (2001), Bell (2001), Beaumont and Bocci (2005). A recent evaluation study by Bonnéry et al. (2020), devised to assess the design-based properties of different composite estimators using CPS data and CPS sample design, found that the currently used Fuller-Rao version of the MR composite estimator performed the best. A more recent regression estimator has been proposed by Konrad and Berger (2023).
In a time series approach, other authors (Bell and Hillmer 1990; Binder and Dick 1989; Jones 1980; Pfeffermann 1991; Scott et al. 1977; Tiller 1989) developed estimators for repeated surveys allowing for stochastic variation in the parameters being estimated.
In this article, a novel approach to composite estimation is proposed, in which composite regression estimators of current totals for a number of key variables are generated from a simultaneous calibration of the sampling weights of the overlapping samples of the current and previous survey time. In this procedure, in addition to the usual calibration to known population totals of auxiliary variables, differences of estimates for the key variables based on the full sample and the common sample of the two consecutive surveys are calibrated to each other.
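The idea of a simultaneous calibration can be illustrated with a small NumPy sketch. It is one possible reading of the verbal description above, under stated assumptions, and not the article's exact design matrix: a single key variable, linear (chi-square distance) calibration, a known overlap fraction, and panel-level overlap indicators are all assumed, and every name (`linear_calibration`, `composite_design`, `overlap_fraction`) is hypothetical. The extra column `z` is built so that calibrating its weighted total to zero forces the full-sample estimate of change to equal the overlap-sample estimate of change.

```python
import numpy as np


def linear_calibration(X, d, totals):
    """Linear calibration: adjust design weights d so that X.T @ w == totals."""
    M = X.T @ (d[:, None] * X)                  # sum_i d_i x_i x_i'
    lam = np.linalg.solve(M, totals - X.T @ d)  # Lagrange multipliers
    return d * (1.0 + X @ lam)


def composite_design(x_prev, y_prev, in_prev, x_cur, y_cur, in_cur, overlap_fraction):
    """Stack both samples; the last column contrasts full and overlap samples."""
    n0, n1, p = len(y_prev), len(y_cur), x_prev.shape[1]
    z_prev = y_prev * (1.0 - in_prev / overlap_fraction)
    z_cur = y_cur * (1.0 - in_cur / overlap_fraction)
    top = np.hstack([x_prev, np.zeros((n0, p)), z_prev[:, None]])
    bottom = np.hstack([np.zeros((n1, p)), x_cur, -z_cur[:, None]])
    return np.vstack([top, bottom])


# Hypothetical toy data: 60 units per survey time, half of them in the overlap.
rng = np.random.default_rng(0)
n = 60
overlap_fraction = 0.5
x_prev = np.column_stack([np.ones(n), rng.uniform(1, 3, n)])
x_cur = np.column_stack([np.ones(n), rng.uniform(1, 3, n)])
y_prev = rng.binomial(1, 0.3, n).astype(float)
y_cur = rng.binomial(1, 0.3, n).astype(float)
in_prev = (np.arange(n) >= n // 2).astype(float)  # continuing panels, previous time
in_cur = (np.arange(n) < n // 2).astype(float)    # continuing panels, current time

d = np.full(2 * n, 10.0)                          # design weights of the stacked sample
X = composite_design(x_prev, y_prev, in_prev, x_cur, y_cur, in_cur, overlap_fraction)
# Known totals of the auxiliaries at each time, and zero for the contrast column.
totals = np.array([600.0, 1200.0, 600.0, 1200.0, 0.0])
w = linear_calibration(X, d, totals)

composite_level = w[n:] @ y_cur                   # composite estimate of the current total
```

Only the panel membership of each record enters the contrast column, so no record-level linkage between the two survey times is needed in this sketch.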
The proposed approach is motivated by the aim of making the most effective use of data collected in successive survey times. Earlier work that adopted this approach (e.g., Gurney and Daly 1965; Jones 1980) centered on the formulation of best linear unbiased estimators (BLUE) of a population parameter, involving linear combinations of estimators from a number of survey times. These estimators are impractical when the series of available survey data is long, and they depend on the intractable covariance structure of the combined estimators. In contrast, the proposed composite regression estimator of a vector of population totals is constructed as a practical approximation of a BLUE that involves a particular set of estimators from the latest two survey times.
The proposed composite regression estimators of current totals and changes can be particularly efficient because the regression coefficients incorporate information from the samples of both the current and previous survey times, as do the coefficients of the BLUE. Furthermore, the simultaneous calibration of the samples of consecutive survey times greatly facilitates variance estimation by resampling methods. Unlike the MR-composite method, the proposed method of composite regression estimation does not require micro-matching of data in the common sample, and is therefore free of the potential quality issues associated with such matching. It is also considerably more practical than the MR-composite estimation and the traditional AK-composite estimation. The comparative merits of the proposed estimator are discussed in detail in Section 4. Section 2 provides the notation and estimation preliminaries. Section 3 describes the construction of the composite regression estimators of level and change for any survey variable, as approximate BLUEs, through a suitable calibration procedure. A concluding discussion is provided in Section 5.
2. Notation and Preliminaries
The sample of a repeated survey with a rotating panel design is typically made up of a number (say
Let
The current-time Horvitz-Thompson (HT) estimators of the totals
The standard regression estimator
where
derived by minimizing the generalized least-squares distance
The regression estimator of
where
3. A New Method of Composite Estimation
3.1. Constructing a Composite Regression Estimator
The proposed method of composite estimation arises from the search for a most effective use of information on the vector
Using the condition of unbiasedness
which can be written as
where
It is pointed out here that a more typical formulation of the BLUE
where
The more transparent formulation of the BLUE of
where
and thus the matrix coefficient
where
A practical substitute can be obtained by replacing
where
The composite regression estimator in Equation (7) is generated by an extended calibration procedure which involves both samples
The vector
3.2. Analytical Expressions of CR Estimates of Levels and Changes
3.2.1. Estimates of Levels
The composite regression estimator of
where
Partitioning the matrix
where
The vector
with
and
where
Now, using Equation (10) and Equation (11), the composite calibration estimator
where
where
Expression (12) of the composite regression estimator
Expression (13) shows that the composite regression estimator
Expression (13) gives the composite regression estimator
where
Let now
where
It is noteworthy that the simultaneous calibration of the two samples results also in an updated estimator for the previous time, incorporating information from current time. Setting
where
3.2.2. Estimates of Change
In the simultaneous calibration of the previous-time and current-time samples that generates the composite estimator
where
Interestingly,
which shows that the composite regression estimate at time
It follows from Equation (17) that
which means that the estimate of the change
and conveniently obtained as
Using notation defined in Section 3.2.1, the estimate of change for the component
4. Comparisons with Other Methods
To facilitate the comparison of the proposed composite regression estimator
where
Writing the extended design matrix for the MR estimator in Equation (19) as
where
satisfies the initial constraints
The composite auxiliary variables used in the MR composite estimation, with the previous-time MR estimator used as corresponding calibration total, involve data from the current and previous survey time, and sample matching between the two times is done at the individual record level. In this matching procedure, missing values are imputed using mean imputation and carry-backward imputation for the MR1 and MR2 components, respectively. The choice of the value of the tuning constant
The proposed composite regression estimators for levels and changes derive their efficiency from the fact that they are approximate BLUEs, with the partial regression coefficient
In contrast, the MR-composite estimator is generated by a calibration of current-time weights, whereby current-time estimates are calibrated to previous-time estimates, the latter being treated as constants in calibration, and thus the regression coefficient incorporates weighted data from current time only. This becomes evident in the MR2 composite estimator (the Fuller-Rao version with
In view of the construction of the proposed composite estimators of level and change, there is no need to search for a compromise between estimation of level and estimation of change by means of a tuning constant, as with the Fuller-Rao estimator. A similar comment is made in Bonnéry et al. (2020) regarding a BLUE of level and a BLUE of change, in a comparison with the Fuller-Rao estimator.
The proposed composite estimation is free of the problems with sample matching between two consecutive times at the individual record level, as required in the MR-composite estimation. These problems arise when, for a given matched sample, data are available for only one survey time. This may occur due to nonresponse in either survey time or when a move or change in scope has taken place between the two consecutive survey times; see Gambino et al. (2001). Micro-matching involves imputation, that is, artificial creation of data, which may not reflect the actual correlation of data collected from the same units in successive survey times. This may create a spurious effect on the efficiency of the MR estimator, and may also introduce bias. Such bias, which may accumulate over time due to the recursive nature of the composite estimator, is avoided in the proposed estimation procedure. The proposed method is also free of the operational complexities of the MR-composite estimation, which include the extra calibration of past-month data to the current-month population totals, and the cumbersome variance estimation by resampling methods; see Statistics Canada (2017). In current MR methodology, bootstrap replicates of the composite calibration totals of the previous month are computed, adjusted to current-month population totals, and used in current-month calculations using bootstrap sample coordination between survey times. In the proposed method, estimation and variance estimation can be done conveniently in one step, with replication of the composite calibration of the combined sample to generate replicate estimates for the variance calculations.
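The one-step replication described above can be sketched as follows. This is a deliberately simplified illustration, not the variance estimator used in any production survey: it assumes a single stratum, a Rao-Wu type rescaled bootstrap that draws one fewer primary sampling unit (PSU) than the number sampled, linear calibration, and a toy stacked design with one intercept per survey time plus one contrast column. All names are hypothetical. Records of the same PSU at the two survey times share a `psu` identifier, so each replicate gives them the same multiplicity.

```python
import numpy as np


def linear_calibration(X, d, totals):
    """Linear calibration: adjust weights d so that X.T @ w == totals."""
    M = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(M, totals - X.T @ d)
    return d * (1.0 + X @ lam)


def replicate_variance(X, d, totals, target, psu, n_reps=200, seed=0):
    """Bootstrap variance of the calibrated estimate w @ target.

    Each replicate resamples PSUs, rescales the design weights, and repeats
    the calibration of the whole stacked (two-time) sample in one step.
    """
    rng = np.random.default_rng(seed)
    ids, idx = np.unique(psu, return_inverse=True)
    n_psu = len(ids)
    full = linear_calibration(X, d, totals) @ target
    reps = np.empty(n_reps)
    for b in range(n_reps):
        draws = rng.integers(0, n_psu, size=n_psu - 1)  # n_psu - 1 draws with replacement
        mult = np.bincount(draws, minlength=n_psu)      # times each PSU was drawn
        d_b = d * mult[idx] * n_psu / (n_psu - 1)       # rescaled replicate weights
        reps[b] = linear_calibration(X, d_b, totals) @ target
    return full, np.mean((reps - full) ** 2)


# Hypothetical toy data: 40 records per time, PSUs 20..39 present at both times.
rng = np.random.default_rng(1)
n, overlap_fraction = 40, 0.5
psu = np.concatenate([np.arange(0, 40), np.arange(20, 60)])
y_prev, y_cur = rng.normal(50, 10, n), rng.normal(52, 10, n)
in_prev = (psu[:n] >= 20).astype(float)
in_cur = (psu[n:] < 40).astype(float)
z = np.concatenate([y_prev * (1 - in_prev / overlap_fraction),
                    -y_cur * (1 - in_cur / overlap_fraction)])
ones_prev = np.concatenate([np.ones(n), np.zeros(n)])
ones_cur = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([ones_prev, ones_cur, z])
d = np.full(2 * n, 10.0)
totals = np.array([400.0, 400.0, 0.0])
level_target = np.concatenate([np.zeros(n), y_cur])    # current-time total of y

estimate, variance = replicate_variance(X, d, totals, level_target, psu)
```

A real application would resample within strata and respect the clustering of the design; the point of the sketch is only that estimation and replication pass through the same single calibration of the combined sample.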
The form of the composite regression estimator in Equation (12) is similar to that of the K-composite estimator, with the regression coefficient
Like the proposed multivariate composite regression estimator, the regression estimator of Konrad and Berger (2023) also involves the combined sample
5. Discussion
We have developed a new method of composite regression estimation, based on the principle of best linear unbiased estimation and with a transparent structure of the multivariate composite estimators of levels and changes for any survey variable. These estimators are generated by a suitable simultaneous calibration of the weights of the combined sample of current and previous survey times. This theoretically well-founded calibration procedure allows a most effective incorporation of past information in the current time weights, and incorporation of current information in the previous time weights, resulting in composite estimators whose efficiency is expected to compare favorably with that of existing estimators.
The proposed composite estimation method is considerably more practical than the existing composite estimation methods. This is a significant advantage of the new method, considering the operational complexity of a composite estimation process.
The proposed method can be extended to rotation schemes that are more general than the typical scheme outlined in Section 2.
An important issue with repeated surveys with rotating panels is the possible rotation bias due to differential nonresponse and measurement error across the panels. It is known that the birth panel usually differs most from the others, so that the matched and unmatched samples differ. This is an intrinsic problem with repeated surveys with such a design, and affects not only composite estimation but also the basic Horvitz-Thompson estimation and the standard regression estimation. The reduction of the effect of rotation group bias on composite estimation by addressing the cause of this bias (e.g., differences in nonresponse rates and in the mode of data collection) has been discussed in the literature; see Gambino et al. (2001). In this connection, an extension of the composite regression estimator
The performance of the proposed composite regression estimators for levels and changes needs to be assessed through an extensive empirical study using actual data from a repeated survey with rotating panels (e.g., data from a Labor Force Survey). These estimators should be evaluated for multiple survey characteristics using data over a sufficiently long period of time, to capture the effect of the recursive incorporation of past information, and their advantages should be judged not only on their statistical efficiency but also on their impact on various time series with respect to stability and seasonal adjustment. Such a study was carried out for the MR estimator in Gambino et al. (2001). Considering that the Bonnéry et al. (2020) evaluation study found the current MR method preferable to the AK method, the comparison could be limited to the proposed method, the MR method, and the method of Konrad and Berger (2023). Such an empirical study is beyond the scope of the present article.
Acknowledgements
The author is grateful to the Editor-in-Chief, the Associate Editor, and the referees for their constructive comments and suggestions, which have helped to improve the article substantially.
Funding
The author declared that they received no financial support for the research, authorship, and/or publication of this article.
Received: March 2023
Accepted: April 2024
