Abstract
When inferring population characteristics from a nonprobability sample, it is crucial to correct for possible selection bias, for example by pseudo-weighting. Many correction methods focus on estimating the population mean of a target variable. However, quantities for subpopulations (domains) are often of interest as well. It is unclear whether pseudo-weights are suitable for domain estimation, since the weights unavoidably introduce additional variability, and possibly even bias, into the downstream estimates.
One way to address this issue is to model at the domain level. We evaluate two promising domain estimation methods on weighted nonprobability samples. The first is iterative proportional fitting (IPF), which takes the margins into account in the domain estimation, so that the marginal values can be held fixed while the domain estimates are improved. The second is a hierarchical Bayesian model, in which the pseudo-weights are included in the domain modeling process. This approach offers modeling flexibility when different types of information are available. We evaluate a range of modeling options for the two methods and compare them in a simulation study. We also evaluate the methods on resampled real data sets to mimic the scenario in which the relations among the variables and the inclusion mechanism of the nonprobability sample are unknown to the researchers.
We found that both applying IPF to the unweighted table and using the hierarchical Bayesian model improve domain estimation in most cases. If both marginal and domain estimates are of interest, the estimated overall population total or mean should be taken into account in the domain modeling process.
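As an illustration of the IPF step described above, the following is a minimal sketch of the generic two-way algorithm, not the authors' implementation. It assumes a seed table of unweighted sample counts (rows are domains, columns are categories of the target variable), known population domain sizes as row targets, and estimated overall category totals as column targets. The function name `ipf` and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Two-way iterative proportional fitting (generic sketch).

    Assumes every row and column of `seed` has a positive sum and that
    the row targets and column targets add up to the same grand total.
    """
    fitted = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)

    for _ in range(max_iter):
        # Scale each row so that the row sums match the row targets.
        fitted *= (row_targets / fitted.sum(axis=1))[:, None]
        # Scale each column so that the column sums match the column targets.
        fitted *= (col_targets / fitted.sum(axis=0))[None, :]
        # Columns match exactly after the last step; stop once rows do too.
        if np.allclose(fitted.sum(axis=1), row_targets, rtol=0.0, atol=tol):
            break
    return fitted


# Hypothetical unweighted sample counts: 2 domains x 2 outcome categories.
seed = np.array([[30.0, 10.0],
                 [20.0, 40.0]])
row_targets = np.array([500.0, 500.0])  # assumed known domain sizes
col_targets = np.array([400.0, 600.0])  # assumed estimated overall totals

fitted = ipf(seed, row_targets, col_targets)
domain_proportions = fitted / fitted.sum(axis=1, keepdims=True)
print(fitted)
print(domain_proportions)
```

The fitted table keeps the association structure (odds ratios) of the seed table while reproducing both sets of margins, which is why the marginal values remain fixed while the domain cells are adjusted.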
