Abstract
Unequal probability sampling designs are often used when unit size is predictive of unit outcome. By assigning inclusion probabilities proportional to unit size, such designs increase the precision of estimated totals relative to other fixed-size unbiased sampling strategies. Consequently, the largest units on the sampling frame have inclusion probabilities approaching one, while the smallest have inclusion probabilities approaching zero, so sampling weights vary widely across units. At the design stage, it may be reasonable to assume that the smallest units will have small values for outcome variables. However, any violation of this assumption can be detrimental to the precision of the sample-based estimates. In practice, the influence of such units is often reduced via weight trimming (setting a threshold for the maximum sampling weight after sample selection). With multipurpose surveys, it is difficult to determine optimal thresholds appropriate for all variables. This paper introduces the Exchangeable Unit Inclusion Probability Average algorithm, a data-dependent procedure that uses clustering to determine stratum-specific thresholds for minimum inclusion probabilities, assigned before sample selection. This approach yields unbiased samples and can reduce sampling variance. We present the empirical application of this procedure to the U.S. Census Bureau’s Annual Integrated Economic Survey.
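To make the link between unit size, inclusion probability, and sampling weight concrete, the following is a minimal sketch of standard probability-proportional-to-size inclusion probabilities for a fixed sample size. It is not the algorithm introduced in this paper. The function name, the frame of six unit sizes, and the sample size of three are all hypothetical and chosen only for illustration.

```python
def pps_inclusion_probabilities(sizes, n):
    """Inclusion probabilities proportional to size for a fixed sample size n.

    Units whose proportional share would reach or exceed one are selected
    with certainty, and the remaining sample is reallocated among the
    other units. Illustrative only; names and data are assumptions.
    """
    if not 0 < n <= len(sizes):
        raise ValueError("n must be between 1 and the number of units")
    if any(s <= 0 for s in sizes):
        raise ValueError("unit sizes must be positive")

    probs = [0.0] * len(sizes)
    remaining = set(range(len(sizes)))
    take = n  # sample slots still to be allocated
    while True:
        total = sum(sizes[i] for i in remaining)
        # Units whose proportional probability would be at least one.
        certain = {i for i in remaining if take * sizes[i] / total >= 1.0}
        if not certain:
            break
        for i in certain:
            probs[i] = 1.0
        remaining -= certain
        take -= len(certain)
    for i in remaining:
        probs[i] = take * sizes[i] / total
    return probs


# Hypothetical frame: two large units and four progressively smaller ones.
sizes = [500, 300, 100, 50, 30, 20]
probs = pps_inclusion_probabilities(sizes, n=3)

# Design weight of a unit is the inverse of its inclusion probability.
weights = [1.0 / p for p in probs]

for size, p, w in zip(sizes, probs, weights):
    print(f"size={size:4d}  inclusion_prob={p:.3f}  weight={w:.2f}")
```

In this hypothetical frame the two largest units are selected with certainty and carry a weight of one, while the smallest unit has an inclusion probability of 0.1 and a weight of ten. If that small unit reports an unexpectedly large outcome, its contribution to the estimated total is multiplied tenfold, which is the situation that weight trimming or a minimum inclusion probability is meant to guard against.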
