Bayes-based Non-Bayesian Inference on Finite Populations from Non-representative Samples: A Unified Approach*

* Based on S. N. Roy Memorial Lecture in the symposium.

Abstract

Classical inference on finite populations is based on probability samples drawn from the target population with predefined selection probabilities. The target population parameters are either descriptive statistics, such as totals or proportions, or parameters of statistical models assumed to hold for the population values. Familiar examples of model estimation include the estimation of income elasticities from household surveys, comparisons of pupils’ achievements from educational surveys, and the study of causal relationships between risk factors and disease prevalence from health surveys. Models are also routinely used to account for measurement errors and for small area estimation when the samples are small in at least some of the areas.

In practice, the samples selected are often not representative of the finite populations from which they are taken. This happens when the sample selection probabilities are correlated with the model target values, known as informative sampling, or when observations are missing because of not missing at random (NMAR) nonresponse. Sometimes the samples are subject to mode effects, which result from the use of different answering methods for different sample units. In more extreme cases, the samples are drawn from sub-populations, as in web-based surveys or observational studies.
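As a rough illustration of informative sampling, the following sketch simulates a hypothetical population and a Poisson sampling design whose inclusion probabilities increase with the target values. The population model, the form of the inclusion probabilities, and the function name `simulate` are assumptions made for this example only and are not taken from the article. The sketch compares the unweighted sample mean with the probability-weighted (Hájek) mean.

```python
import math
import random


def simulate(n_pop=50000, seed=0):
    """Compare naive and probability-weighted means under an informative design."""
    rng = random.Random(seed)

    # Hypothetical population values: y ~ N(0, 1).
    y = [rng.gauss(0.0, 1.0) for _ in range(n_pop)]

    # Assumed informative design: the inclusion probability rises with y,
    # ranging between 0.05 and 0.15.
    pi = [0.05 + 0.10 / (1.0 + math.exp(-1.5 * v)) for v in y]

    # Poisson sampling: each unit is selected independently with probability pi.
    sample = [(v, p) for v, p in zip(y, pi) if rng.random() < p]

    pop_mean = sum(y) / n_pop

    # Unweighted sample mean: ignores the selection process.
    naive = sum(v for v, _ in sample) / len(sample)

    # Hajek estimator: weights each observation by 1 / pi.
    weight_sum = sum(1.0 / p for _, p in sample)
    weighted = sum(v / p for v, p in sample) / weight_sum

    return pop_mean, naive, weighted


if __name__ == "__main__":
    pop_mean, naive, weighted = simulate()
    print(f"population mean: {pop_mean:.3f}")
    print(f"unweighted mean: {naive:.3f}")
    print(f"weighted mean:   {weighted:.3f}")
```

Under these assumptions the unweighted mean overstates the population mean, because units with large values are over-represented in the sample, while the weighted mean stays close to it. Probability weighting of this kind requires known selection probabilities, which are not available under NMAR nonresponse or for web panels.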

This article discusses and illustrates how all these diverse scenarios can be handled in a unified manner by use of Bayes theorem. Bayes theorem relates the model holding for the observed data to the model holding for the missing data and to the model operating in the target population. I discuss different estimation procedures and review articles that illustrate their performance.
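A minimal sketch of these relationships, in notation assumed here rather than taken from the abstract: let $f_p(y_i \mid x_i)$ denote the population model for outcome $y_i$ given covariates $x_i$, and let $I_i$ be the indicator that unit $i$ is observed. Bayes theorem then gives the model for the observed data ($f_s$) and the model for the unobserved data ($f_c$) as

```latex
\begin{align}
f_s(y_i \mid x_i) &= f_p(y_i \mid x_i, I_i = 1)
  = \frac{\Pr(I_i = 1 \mid y_i, x_i)\, f_p(y_i \mid x_i)}{\Pr(I_i = 1 \mid x_i)}, \\
f_c(y_i \mid x_i) &= f_p(y_i \mid x_i, I_i = 0)
  = \frac{\Pr(I_i = 0 \mid y_i, x_i)\, f_p(y_i \mid x_i)}{\Pr(I_i = 0 \mid x_i)}.
\end{align}
```

When $\Pr(I_i = 1 \mid y_i, x_i) = \Pr(I_i = 1 \mid x_i)$ for all $y_i$, both models coincide with the population model and the selection process can be ignored. Otherwise the three models differ, and inference has to account for the selection probabilities.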

Keywords

Bayes theorem, empirical likelihood, informative sampling, mode effects, NMAR nonresponse, parametric likelihood, probability weighting, propensity scores, sample model, web panels

