Abstract
Estimating the precision of a single proportion via a 100(1−α)% confidence interval in the presence of clustered data is an important statistical problem. It is necessary to account for possible over-dispersion, for instance, in animal-based teratology studies with within-litter correlation, epidemiological studies that involve clustered sampling, and clinical trial designs with multiple measurements per subject. Several asymptotic confidence interval methods have been developed, which have been found to have inadequate coverage of the true proportion for small-to-moderate sample sizes. In addition, many of the best-performing of these intervals have not been directly compared with regard to the operational characteristics of coverage probability and empirical length. This study uses Monte Carlo simulations to calculate coverage probabilities and empirical lengths of five existing confidence intervals for clustered data across various true correlations, true probabilities of interest, and sample sizes. In addition, we introduce a new score-based confidence interval method, which we find to have better coverage than existing intervals for small sample sizes under a wide range of scenarios.
Get full access to this article
View all access options for this article.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
