Abstract
In clinical and public health studies, some variables relevant to the analysis are often too difficult or costly to measure for all individuals in the population of interest. Instead, a subsample of these individuals must be identified for additional data collection. A sampling scheme that incorporates readily available information for the entire target population at the design stage can increase the statistical efficiency of the intended analysis. While there is no universally optimal sampling design, under certain principles and restrictions, a well-designed and efficient sampling strategy can be implemented. In two-phase designs, efficiency can be gained by stratifying on the outcome and/or auxiliary information that is known at phase I. Additional gains in efficiency can be obtained by determining the optimal allocation of the sample sizes across the strata, which depends on the quantity that is being estimated. In this paper, inference concerns one or more regression parameters where the study units are naturally clustered and, thus, exhibit correlation in outcomes. We propose several allocation strategies within the framework of two-phase designs for the estimation of the regression parameter(s) obtained from weighted generalized estimating equations. The proposed methods extend existing theory to address the objective of estimating regression parameters in cluster-correlated data settings by minimizing the asymptotic variance of the estimator subject to a fixed sample size. Through a comprehensive simulation study, we show that the proposed allocation schemes have the potential to yield substantial efficiency gains over alternative strategies.
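To make the idea of allocating a fixed phase II sample size across strata concrete, the sketch below implements classical Neyman allocation, which minimizes the variance of a stratified estimator of a population mean. It is offered only as a familiar point of reference and is not the allocation proposed in this paper, which targets regression parameters from weighted generalized estimating equations. The function name, argument names, and example numbers are all hypothetical.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Classical Neyman allocation: n_h proportional to N_h * S_h.

    Illustrative assumption: the target is a population mean, not the
    regression parameters studied in the paper. Allocations are not
    capped at the stratum sizes N_h in this simplified sketch.
    """
    if len(stratum_sizes) != len(stratum_sds):
        raise ValueError("stratum_sizes and stratum_sds must have equal length")

    # Weight each stratum by its size times its within-stratum std. deviation.
    weights = [N * s for N, s in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one stratum must have positive N_h * S_h")

    # Real-valued optimal allocation under a fixed total sample size.
    raw = [n_total * w / total_weight for w in weights]

    # Round down, then hand out the remaining units by largest remainder
    # so that the integer allocation still sums to n_total.
    alloc = [int(r) for r in raw]
    leftover = n_total - sum(alloc)
    order = sorted(range(len(raw)), key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical example: three phase I strata, 250 units sampled at phase II.
allocation = neyman_allocation([1000, 500, 100], [1.0, 2.0, 5.0], 250)
print(allocation)
```

Strata that are larger or more variable receive more of the sample; an allocation tailored to a regression parameter would replace the within-stratum standard deviation with a quantity specific to that estimator.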
