Sage Journals: Discover world-class research

Abstract

Keywords

voluntary survey; projection estimator; generalized entropy calibration; weight trimming

1. Introduction

Nonprobability samples (NPS) have become increasingly popular in recent years because of their convenience and low cost (Baker et al. 2013). Unlike probability samples, NPS are subject to selection bias, which is difficult to control (Bethlehem 2016; Meng 2018). Because of this difficulty, Rohr et al. (2024) argued that NPS should not be used for general-purpose inference. Nonetheless, NPS are widely used in many social surveys despite these potential drawbacks. Wu (2022) provides a comprehensive overview of statistical methods for analyzing NPS data.

Mitigating selection bias in NPS remains a critical challenge in survey research. This paper explores calibration weighting as a statistical method for addressing this bias. Weighting is one of the central estimation steps in survey sampling, and Haziza and Beaumont (2017) provide a comprehensive review of weighting methods in survey practice. While many weighting methods for NPS rely on propensity score (PS) models (Chen et al. 2020; Elliott and Valliant 2017; Valliant and Dever 2011), these models often face significant limitations. Accurately modeling the complex factors that influence survey participation is inherently difficult: participation involves multiple selection processes, including internet access, willingness to participate, and nonresponse, which makes it hard to capture all relevant factors in a single model (Herzing et al. 2022). Moreover, even with a correctly specified PS model, the resulting estimator can be highly variable if the propensity score is only weakly related, or unrelated, to the study variables of interest (Park et al. 2019).

In contrast, the calibration weights proposed here leverage an outcome regression (OR) model. This allows subject-matter knowledge from other surveys to be incorporated when building the OR model and, in turn, the calibration estimator. The validity of the resulting estimator depends on the plausibility of the OR model assumption and on the additional assumption that the sampling mechanism is ignorable (Sugden and Smith 1984). Furthermore, model diagnostic tools can be used to validate the specified OR model, so in practice one can have more confidence in the OR model than in the PS model. Notably, this prediction-model-based approach to calibration weighting was considered by Kott and Liao (2012) in the context of handling unit nonresponse.

This paper presents the core concepts of calibration weighting that incorporates an outcome regression (OR) model. We then introduce the generalized entropy calibration framework (Kwon et al. 2025) as a unified approach to calibration weighting for NPS data analysis. We conclude by briefly discussing some implications of the algebraic properties of the proposed calibration method and some directions for future research.

2. Basic Setup

Consider a finite population with index set $U = \{1, \dots, N\}$. We use $x_{i} \in \mathbb{R}^{p}$ and $y_{i}$ to denote, respectively, the vector of auxiliary variables and the study variable associated with unit $i$. We assume that a non-probability sample $S \subset U$ is selected from the finite population and that $y_{i}$ is observed for $i \in S$. We further assume that the auxiliary variables are observed throughout the finite population.

We are interested in estimating the population total $Y = \sum_{i = 1}^{N} y_{i}$ from the sample. To incorporate the auxiliary variables $x_{i}$ observed throughout the finite population, we consider the linear regression model

$$ y_{i} = x_{i}^{\top} \beta + e_{i} \tag{1} $$

for some $\beta$, where $e_{i} \sim (0, \sigma^{2})$ and $e_{i}$ is independent of $x_{i}$. Here, an intercept term is included in $x_{i}$. Let $\delta_{i} = 1$ if $i \in S$ and $\delta_{i} = 0$ otherwise. We assume that the sampling mechanism is ignorable in the sense that $\delta_{i} \perp y_{i} \mid x_{i}$ under the model in Equation (1). The ignorability assumption means that the regression model in Equation (1) also holds in the sample.
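The role of the ignorability assumption can be illustrated with a small simulation. The sketch below is not part of the original analysis: the population size, the coefficient values, and the logistic participation model are hypothetical choices, and Python with NumPy is assumed. Because participation depends on $x_i$ only, the naive sample mean is biased, while ordinary least squares on the sample still recovers $\beta$.

```python
import numpy as np

rng = np.random.default_rng(2025)

# Hypothetical finite population from model (1): y_i = x_i' beta + e_i.
N = 100_000
beta_true = np.array([1.0, 2.0])             # (intercept, slope), illustrative values
z = rng.normal(size=N)
X = np.column_stack([np.ones(N), z])         # x_i includes an intercept term
y = X @ beta_true + rng.normal(scale=1.0, size=N)

# Ignorable selection: participation depends on x_i only, not on y_i given x_i.
p = 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * z)))  # hypothetical participation model
in_S = rng.uniform(size=N) < p
Xs, ys = X[in_S], y[in_S]

# The naive sample mean is biased because units with large z participate more often...
naive_bias = ys.mean() - y.mean()
# ...but OLS on the sample still recovers beta, since model (1) also holds in S.
beta_hat, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(f"naive bias: {naive_bias:.3f}, beta_hat: {np.round(beta_hat, 3)}")
```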

We consider a class of linear estimators defined as

$$ \hat{Y}_{\omega} = \sum_{i \in S} \omega_{i} y_{i} $$

for some weights $\omega_{i} > 0$ that do not depend on the $y$-values. Linear estimators are attractive because they produce internally consistent estimates (Fuller 2009, page 99). We require the final weights to satisfy the calibration constraint:

$$ \sum_{i \in S} \omega_{i} x_{i} = \sum_{i = 1}^{N} x_{i} . \tag{2} $$

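Now, using the calibration weights $\{\hat{\omega}_{i} : i \in S\}$, the weighted regression coefficient is computed as $\hat{\beta} = (\sum_{i \in S} \hat{\omega}_{i} x_{i} x_{i}^{\top})^{-1} \sum_{i \in S} \hat{\omega}_{i} x_{i} y_{i}$. Because $x_{i}$ contains an intercept term, the first component of the normal equations $\sum_{i \in S} \hat{\omega}_{i} x_{i} (y_{i} - x_{i}^{\top} \hat{\beta}) = 0$ implies that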

$$ \sum_{i \in S} \hat{\omega}_{i} (y_{i} - x_{i}^{\top} \hat{\beta}) = 0 . \tag{3} $$

Therefore, we obtain

$$ \sum_{i \in S} \hat{\omega}_{i} y_{i} = \sum_{i \in S} \hat{\omega}_{i} x_{i}^{\top} \hat{\beta} = \sum_{i = 1}^{N} x_{i}^{\top} \hat{\beta}, \tag{4} $$

where the first equality follows from Equation (3) and the second from Equation (2). Equation (4) shows that the calibration estimator is indeed a projection estimator (Kim and Rao 2011) based on the prediction model in Equation (1). Because the weights $\hat{\omega}_{i}$ do not depend on the $y$-values, the calibration constraint in Equation (2) guarantees that the resulting estimator is model-unbiased under Equation (1) and the ignorability assumption. The algebraic equivalence in Equation (4) indicates that the calibration estimator is justified under the outcome regression model in Equation (1): the outcome regression model is used implicitly in calibration weighting and explicitly in projection estimation.
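The identity in Equation (4) can be checked numerically. The following sketch assumes simulated data and, purely for illustration, uses linear calibration weights $\omega_i = 1 + x_i^\top \lambda$ (the quadratic choice $G(\omega) = (\omega - 1)^2/2$). These weights are not guaranteed to be positive, but the derivation of Equation (4) uses only Equation (2) and the intercept in $x_i$, so any calibrated weights give the same identity.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical finite population with an intercept and two covariates.
N = 20_000
Z = rng.normal(size=(N, 2))
X = np.column_stack([np.ones(N), Z])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=N)

# Nonprobability sample whose selection depends on x only (ignorable).
p = 1.0 / (1.0 + np.exp(-(-1.5 + Z[:, 0])))
S = rng.uniform(size=N) < p
Xs, ys = X[S], y[S]
T_x = X.sum(axis=0)                      # known population totals of x

# Linear calibration weights w_i = 1 + x_i' lam, chosen so that Equation (2) holds.
lam = np.linalg.solve(Xs.T @ Xs, T_x - Xs.sum(axis=0))
w = 1.0 + Xs @ lam
assert np.allclose(w @ Xs, T_x)         # calibration constraint (2)

# Weighted regression coefficient computed with the calibration weights.
XtW = Xs.T * w                           # column i of Xs' scaled by w_i
beta_hat = np.linalg.solve(XtW @ Xs, XtW @ ys)

calibration_est = w @ ys                 # sum over S of w_i y_i
projection_est = T_x @ beta_hat          # sum over U of x_i' beta_hat
print(calibration_est, projection_est)
```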

Because of the algebraic equivalence in Equation (4), statistical tools for regression analysis can be used when constructing calibration weights. For instance, the sieve estimation method can be used to construct a nonparametric regression projection estimator, as in Chan et al. (2015), with the polynomial order determined by minimizing the mean squared prediction error in a validation sample.
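A minimal sketch of the sieve idea follows, under several assumptions that are not in the text: a scalar covariate, a polynomial basis, a single random training/validation split of the sample, and linear calibration weights on the selected basis. The data-generating values are hypothetical, and this is not the specific procedure of Chan et al. (2015).

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical population with a quadratic mean function in a scalar x.
N = 20_000
x = rng.uniform(0.0, 1.0, size=N)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.3, size=N)

# Ignorable selection depending on x only.
S = rng.uniform(size=N) < 0.05 + 0.25 * x
xs, ys = x[S], y[S]

def basis(v, degree):
    """Polynomial sieve basis [1, v, ..., v^degree]."""
    return np.vander(v, degree + 1, increasing=True)

# Choose the polynomial order by mean squared prediction error on a validation split.
train = rng.uniform(size=xs.size) < 0.7
mspe = {}
for k in range(1, 6):
    coef, *_ = np.linalg.lstsq(basis(xs[train], k), ys[train], rcond=None)
    resid = ys[~train] - basis(xs[~train], k) @ coef
    mspe[k] = np.mean(resid**2)
k_best = min(mspe, key=mspe.get)

# Calibrate on the population totals of the selected basis (linear weights, illustrative).
Bs, T_b = basis(xs, k_best), basis(x, k_best).sum(axis=0)
lam = np.linalg.solve(Bs.T @ Bs, T_b - Bs.sum(axis=0))
w = 1.0 + Bs @ lam
print(k_best, w @ ys, y.sum())
```

Calibrating on the sieve basis makes the weighted total equal the projection estimator built from the same polynomial regression, by the argument behind Equation (4).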

3. Generalized Entropy Calibration

If the dimension of the covariate vector is smaller than the sample size, there may be infinitely many solutions to the calibration equation in Equation (2). To determine $\omega_{i}$ uniquely, Kwon et al. (2025) proposed minimizing the generalized entropy function $\sum_{i \in S} G(\omega_{i})$ subject to the calibration constraint in Equation (2), where $G(\omega)$ is a strictly convex and differentiable function. For example, the choice of

$$ G(\omega) = \omega \{\log(\omega) - 1\} $$

leads to the maximum entropy calibration of Hainmueller (2012).

Using the Lagrange multiplier method, the final weight can be expressed as

$$ \hat{\omega}_{i}(\hat{\lambda}) = g^{-1}(x_{i}^{\top} \hat{\lambda}), $$

where $g(\omega) = dG(\omega)/d\omega$ and $\hat{\lambda}$ is obtained by solving the calibration equation in Equation (2). For the choice $G(\omega) = \omega\{\log(\omega) - 1\}$, we have $g(\omega) = \log(\omega)$ and hence $\hat{\omega}_{i}(\lambda) = \exp(x_{i}^{\top} \lambda)$. Thus, with a suitable choice of $G(\cdot)$, negative weights can be avoided.
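For the maximum entropy choice, $\hat{\lambda}$ can be computed by Newton's method applied to the calibration equation $\sum_{i \in S} \exp(x_i^\top \lambda) x_i = \sum_{i=1}^N x_i$. The sketch below is one possible implementation under stated assumptions: the first column of the design matrix is the intercept, step halving on the norm of the calibration residual is added for stability, and the function name `entropy_calibration` and the simulated data are hypothetical.

```python
import numpy as np

def entropy_calibration(Xs, T_x, tol=1e-9, max_iter=100):
    """Maximum entropy calibration, G(w) = w{log(w) - 1}, so w_i = exp(x_i' lam).

    Solves sum_S exp(x_i' lam) x_i = T_x by Newton's method with step halving
    on the residual norm. Assumes column 0 of Xs is the intercept (T_x[0] = N).
    """
    n, p = Xs.shape
    lam = np.zeros(p)
    lam[0] = np.log(T_x[0] / n)                  # start from equal weights N/n
    resid = lambda l: Xs.T @ np.exp(Xs @ l) - T_x
    r = resid(lam)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(T_x):
            break
        w = np.exp(Xs @ lam)
        jac = (Xs.T * w) @ Xs                    # sum_S w_i x_i x_i'
        step = np.linalg.solve(jac, r)
        t = 1.0
        # Halve the step until the residual norm decreases (NaN or inf counts as failure).
        while (not (np.linalg.norm(resid(lam - t * step))
                    <= (1 - 0.25 * t) * np.linalg.norm(r)) and t > 1e-8):
            t *= 0.5
        lam = lam - t * step
        r = resid(lam)
    return np.exp(Xs @ lam), lam

# Usage on hypothetical simulated data.
rng = np.random.default_rng(3)
N = 10_000
Z = rng.normal(size=(N, 2))
X = np.column_stack([np.ones(N), Z])
p_sel = 1.0 / (1.0 + np.exp(-(-1.5 + 0.8 * Z[:, 0] - 0.5 * Z[:, 1])))
S = rng.uniform(size=N) < p_sel
Xs, T_x = X[S], X.sum(axis=0)
w, lam = entropy_calibration(Xs, T_x)
print(w.min(), np.abs(w @ Xs - T_x).max())
```

Positivity needs no extra constraint here: every weight is an exponential, which is the point of choosing $G(\omega) = \omega\{\log(\omega) - 1\}$.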

In practice, we often want the final weights to be bounded (Huang and Fuller 1978; Rao and Singh 2009); that is, we wish to achieve $\hat{\omega}_{i}(\lambda) \leq M$, where $M$ is an upper bound on the final weights. One way to achieve this goal is to use

$$ G^{*}(\omega) = \begin{cases} G(\omega), & |\omega| \leq M, \\ \infty, & |\omega| > M \end{cases} \tag{5} $$

as the objective function for calibration.

To choose $M$ so as to minimize the prediction error, we can treat $M$ as a tuning parameter in the weighted regression and select an optimal $M$ in a data-driven way. For simplicity, suppose that we use $G(\omega) = \omega\{\log(\omega) - 1\}$. In this case, the final weights based on Equation (5) can be expressed as

$$ \hat{\omega}_{i}^{*}(\lambda, M) = \min\{\exp(x_{i}^{\top} \lambda), M\}, \tag{6} $$

which is essentially weight trimming. Now, similarly to Equation (4), we can express the final calibration estimator as a projection estimator

$$ \sum_{i \in S} \hat{\omega}_{i}^{*}(\hat{\lambda}, M) y_{i} = \sum_{i = 1}^{N} x_{i}^{\top} \hat{\beta}_{M}, $$

where

$$ \hat{\beta}_{M} = \Big\{\sum_{i \in S} \hat{\omega}_{i}^{*}(\hat{\lambda}, M) x_{i} x_{i}^{\top}\Big\}^{-1} \sum_{i \in S} \hat{\omega}_{i}^{*}(\hat{\lambda}, M) x_{i} y_{i} \tag{7} $$

and $\hat{\lambda}$ is determined from the calibration constraint in Equation (2). Thus, the regression coefficient in Equation (7) is computed using the trimmed weights in Equation (6), which depend on $M$. A standard cross-validation method can then be used to choose the $M$ that minimizes the mean squared prediction error of $\hat{y}_{i} = x_{i}^{\top} \hat{\beta}_{M}$.
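The trimmed weights in Equation (6) and the data-driven choice of $M$ can be sketched as follows. Several details are assumptions not stated in the text: $\hat{\lambda}$ is found by Newton steps that use only the untrimmed units (the derivative of $\min\{\exp(u), M\}$ is zero above the cap); the candidate grid for $M$ consists of multiples of $N/|S|$ (a cap below $N/|S|$ cannot satisfy the intercept constraint); and the cross-validation criterion is the unweighted squared prediction error on held-out sample units. The function names and simulated data are hypothetical.

```python
import numpy as np

def trimmed_entropy_weights(Xs, T_x, M, tol=1e-9, max_iter=200):
    """Weights w_i = min{exp(x_i' lam), M} as in Equation (6), with lam chosen
    so that constraint (2) holds. Assumes column 0 of Xs is the intercept and M > N/n."""
    n, p = Xs.shape
    lam = np.zeros(p)
    lam[0] = np.log(T_x[0] / n)                      # equal weights N/n to start
    weights = lambda l: np.minimum(np.exp(Xs @ l), M)
    resid = lambda l: Xs.T @ weights(l) - T_x
    r = resid(lam)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(T_x):
            break
        e = np.exp(Xs @ lam)
        d = np.where(e < M, e, 0.0)                  # derivative of min{exp(u), M}
        step = np.linalg.solve((Xs.T * d) @ Xs, r)
        t = 1.0
        while (not (np.linalg.norm(resid(lam - t * step))
                    <= (1 - 0.25 * t) * np.linalg.norm(r)) and t > 1e-8):
            t *= 0.5
        lam = lam - t * step
        r = resid(lam)
    return weights(lam)

def weighted_ls(X, y, w):
    """Weighted least squares coefficient, as in Equation (7)."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

def cv_choose_M(Xs, ys, T_x, M_grid, n_folds=5, seed=0):
    """K-fold CV over M using the (unweighted) squared error of y_hat = x' beta_M."""
    folds = np.random.default_rng(seed).integers(n_folds, size=len(ys))
    scores = {}
    for M in M_grid:
        w = trimmed_entropy_weights(Xs, T_x, M)      # calibrated on the full sample
        sse = 0.0
        for k in range(n_folds):
            train, test = folds != k, folds == k
            b = weighted_ls(Xs[train], ys[train], w[train])
            sse += np.sum((ys[test] - Xs[test] @ b) ** 2)
        scores[M] = sse / len(ys)
    return min(scores, key=scores.get), scores

# Usage on hypothetical data where the linear working model is misspecified.
rng = np.random.default_rng(5)
N = 20_000
z = rng.normal(size=N)
X = np.column_stack([np.ones(N), z])
y = 1.0 + z + 0.5 * z**2 + rng.normal(size=N)
S = rng.uniform(size=N) < 1.0 / (1.0 + np.exp(-(-1.5 + 0.5 * z)))
Xs, ys, T_x = X[S], y[S], X.sum(axis=0)
base = N / S.sum()
M_grid = [2 * base, 3 * base, 5 * base, np.inf]
M_best, scores = cv_choose_M(Xs, ys, T_x, M_grid)
w = trimmed_entropy_weights(Xs, T_x, M_best)
beta_M = weighted_ls(Xs, ys, w)
# The trimmed weights still satisfy (2), so the calibration and projection forms coincide.
print(M_best, w @ ys, T_x @ beta_M)
```

Including `np.inf` in the grid lets the untrimmed maximum entropy weights compete with the trimmed ones on the same criterion.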

4. Discussion

Calibration weighting is a promising approach to enhancing the utility of NPS data. Instead of relying on potentially unreliable propensity score models, we advocate using an outcome regression model to construct calibration weights. Once the study variable of interest is specified, the outcome regression model can be informed by subject-matter knowledge or previous surveys and further validated using model diagnostic tools. The implied regression model is valuable for developing model selection procedures and for improving the statistical efficiency of calibration weighting. For instance, the weighted regression approach in Section 3 shows how regression techniques can be used to balance the bias-variance trade-off in weight trimming. Moreover, the dual relationship between calibration and regression allows model selection techniques from regression analysis to be adapted to guide the selection of variables for the calibration constraints. When there are many auxiliary variables, exact calibration on all of them can lead to highly variable calibration weights and inflate the variance of the final estimator. In such cases, some of the calibration constraints can be relaxed to achieve “soft calibration” (Chambers 1996; Guggemos and Tillé 2010). The soft calibration estimator can be interpreted as a projection estimator using ridge regression. Thus, the unified approach presented in this paper offers a valuable tool for analyzing NPS data under the assumption of an ignorable sampling mechanism. Further extensions and applications of this framework will be explored in future work.

Calibration weighting for NPS data, while promising, has several critical limitations. First, the assumption of an ignorable sampling mechanism, a cornerstone of many weighting methods, is inherently untestable: for a given NPS dataset, it is impossible to ascertain definitively whether it holds. Consequently, calibration weighting, even when implemented effectively, can only mitigate selection bias rather than eliminate it completely; its success hinges on the ability of the calibration procedure to reduce bias relative to unweighted estimates. Second, calibration weighting may not effectively address bias stemming from undercoverage in the NPS. If a specific demographic group is entirely absent from the sample, whether by chance or because of systematic undercoverage, the calibration equations may have no solution when the auxiliary variables for calibration include indicators for that group. This scenario illustrates how calibration weights can provide insight into the data quality of the NPS. Third, the curse of dimensionality poses a significant challenge in high-dimensional covariate spaces, where the estimation error of the calibration weights can increase and diminish the benefits of the approach. Model selection techniques within the regression framework can be used to reduce the number of covariates in the OR model and improve stability, but this introduces the risk of omitted variable bias, which in turn can weaken the validity of the ignorability assumption. Therefore, selecting the optimal set of calibration variables involves a delicate balance between bias and variance, akin to the trade-offs encountered in traditional regression modeling.
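A simple diagnostic for the undercoverage problem described above is to check, before calibrating, whether any calibration variable is zero for every sampled unit while its population total is nonzero. The sketch below covers only this all-zero case (the function name and the toy data are hypothetical); a calibration system can also be infeasible in less obvious ways that this check would not detect.

```python
import numpy as np

def undercovered_constraints(Xs, T_x, names):
    """Flag calibration variables that are zero for every sampled unit while the
    population total is nonzero: no weights can satisfy those constraints."""
    zero_in_sample = np.all(Xs == 0, axis=0)
    return [nm for nm, z, t in zip(names, zero_in_sample, T_x) if z and t != 0]

# Hypothetical example: the indicator for group "C" never appears in the sample.
names = ["intercept", "group_B", "group_C"]
X_pop = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0], [1, 1, 0]])
Xs = X_pop[[0, 1, 3]]                      # units 0, 1 and 3 volunteered
flagged = undercovered_constraints(Xs, X_pop.sum(axis=0), names)
print(flagged)                             # → ['group_C']
```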

Addressing selection bias due to a potentially nonignorable sampling mechanism calls for robust strategies. One approach is to use nonresponse instrumental variables (IVs) in the calibration process, as studied by Lesage et al. (2019). Such IVs can help mitigate bias when the ignorability assumption is violated. However, a crucial limitation is that the validity of the nonresponse IV assumption is difficult to verify. To enhance robustness, researchers can consider multiple nonresponse models and implement a multiply robust estimation procedure, as outlined by Cho et al. (2025). While this approach offers improved protection against model misspecification, it is computationally demanding.

In conclusion, while nonprobability samples (NPS) offer cost-effectiveness in data collection, they can significantly increase the complexity and cost associated with data analysis and dissemination. Applying naive analytical methods to NPS data without careful consideration of the inherent biases can jeopardize public trust in both the survey industry and statistical science. It is incumbent upon statisticians to develop and implement robust analytical strategies that maximize the utility of NPS data, even when they fall short of the ideal sampling framework. We believe that the calibration weighting approach presented in this paper constitutes a valuable step toward achieving this goal.

Footnotes

Acknowledgements

The author thanks the co-editor-in-chief, Dr. Li-Chun Zhang, for the invitation and constructive comments.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research of the author was done during his visit to Seoul National University, which was supported by the Brain Pool program funded by the Ministry of Science and ICT through the National Research Foundation of Korea (RS-2023-00218474). His research was also partially supported by a grant from the U.S. National Science Foundation (2242820) and a grant from the U.S. Department of Agriculture’s National Resources Inventory, Cooperative Agreement NR203A750023C006, Great Rivers CESU 68-3A75-18-504.

ORCID iD

Jae Kwang Kim

Received: January 6, 2025

Accepted: January 16, 2025

References

Baker; Brick, J. M.; Bates, N. A.; Battaglia; Couper, M. P.; Dever, J. A.; Gile, K. J.; et al. 2013. “Summary Report of the AAPOR Task Force on Non-Probability Sampling.” Journal of Survey Statistics and Methodology 1: 90–143. DOI: https://doi.org/10.1093/jssam/smt008.

Bethlehem. 2016. “Solving the Nonresponse Problem with Sample Matching?” Social Science Computer Review 34 (1): 59–77. DOI: https://doi.org/10.1177/0894439315573926.

Chambers. 1996. “Robust Case-Weighting for Multipurpose Establishment Surveys.” Journal of Official Statistics 12: 3–32.

Chan, K. C. G.; Yam, S. C. P.; Zhang. 2015. “Globally Efficient Non-Parametric Inference of Average Treatment Effects by Empirical Balancing Calibration Weighting.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78 (3): 673–700. DOI: https://doi.org/10.1111/rssb.12129.

Chen. 2020. “Doubly Robust Inference with Nonprobability Survey Samples.” Journal of the American Statistical Association 115 (532): 2011–21. DOI: https://doi.org/10.1080/01621459.2019.1677241.

Cho; Qiu; Kim, J. K. 2025. “Multiple Bias Calibration for Valid Statistical Inference Under Nonignorable Nonresponse.” Biometrics. Accepted for Publication.

Elliott, M. R.; Valliant. 2017. “Inference for Nonprobability Samples.” Statistical Science 32 (2): 249–64. DOI: https://doi.org/10.1214/16-STS598.

Fuller, W. A. 2009. Sampling Statistics. Hoboken, NJ: Wiley.

Guggemos; Tillé. 2010. “Penalized Calibration in Survey Sampling: Design-Based Estimation Assisted by Mixed Models.” Journal of Statistical Planning and Inference 140 (11): 3199–212. DOI: https://doi.org/10.1016/j.jspi.2010.04.010.

Hainmueller. 2012. “Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies.” Political Analysis 20 (1): 25–46. DOI: https://doi.org/10.1093/pan/mpr025.

Haziza; Beaumont, J.-F. 2017. “Construction of Weights in Surveys: A Review.” Statistical Science 32 (2): 206–26. DOI: https://doi.org/10.1214/16-STS608.

Herzing, J. M. E.; Blom, A. G.; Meuleman. 2022. “Modeling Group-Specific Interviewer Effects on Survey Participation Using Separate Coding for Random Slopes in Multilevel Models.” Journal of Survey Statistics and Methodology 12 (1): 249–73. DOI: https://doi.org/10.1093/jssam/smac025.

Huang, E. T.; Fuller, W. A. 1978. “Nonnegative Regression Estimation for Sample Survey Data.” Proceedings of the Social Statistics Section. American Statistical Association.

Kim, J. K.; Rao, J. N. K. 2011. “Combining Data from Two Independent Surveys: A Model-Assisted Approach.” Biometrika 99 (1): 85–100. DOI: https://doi.org/10.1093/biomet/asr063.

Kott, P. S.; Liao. 2012. “Providing Double Protection for Unit Nonresponse with a Nonlinear Calibration-Weighting Routine.” Survey Research Methods 6 (2): 105–11. DOI: https://doi.org/10.18148/srm/2012.v6i2.5076.

Kwon; Kim, J. K.; Qiu. 2025. “Generalized Entropy Calibration for Analyzing Voluntary Survey Data.” arXiv preprint arXiv:2412.12405. DOI: https://doi.org/10.48550/arXiv.2412.12405.

Lesage, É.; Haziza; D’Haultfœuille. 2019. “A Cautionary Tale on Instrumental Calibration for the Treatment of Nonignorable Unit Nonresponse in Surveys.” Journal of the American Statistical Association 114 (526): 906–15. DOI: https://doi.org/10.1080/01621459.2018.1458619.

Meng, X.-L. 2018. “Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election.” The Annals of Applied Statistics 12 (2): 685–726. DOI: https://doi.org/10.1214/18-AOAS1161SF.

Park; Kim, J. K.; Kim. 2019. “A Note on Propensity Score Weighting Method Using Paradata in Survey Sampling.” Survey Methodology 45: 451–63.

Rao, J. N. K.; Singh, A. C. 2009. “Range Restricted Weight Calibration for Survey Data Using Ridge Regression.” Pakistan Journal of Statistics 25: 371–84.

Rohr; Felderer; Silber; Daikeler; Roßmann; Schröder. 2024. “When Are Non-Probability Surveys Fit for My Purpose?” Unpublished Manuscript.

Sugden, R. A.; Smith, T. M. F. 1984. “Ignorable and Informative Designs in Survey Sampling Inference.” Biometrika 71 (3): 495–506. DOI: https://doi.org/10.1093/biomet/71.3.495.

Valliant; Dever, J. A. 2011. “Estimating Propensity Adjustments for Volunteer Web Surveys.” Sociological Methods & Research 40 (1): 105–37. DOI: https://doi.org/10.1177/0049124110392533.

Wu. 2022. “Statistical Inference with Non-Probability Survey Samples.” Survey Methodology 48 (2): 283–311.

Calibration Weighting for Analyzing Non-Probability Samples
