Abstract

The book Sampling Theory and Practice, published by Springer in 2020, is written by two recognized experts in the theory of survey statistics: Changbao Wu and Mary E. Thompson, both professors at the University of Waterloo, Canada. The book is concise and clear, reminiscent of Mary E. Thompson’s previous wonderful book (Thompson 1997). It is organized into three parts: Basic Concepts and Methods in Survey Sampling, Advanced Topics on Analysis of Probability Survey Samples, and Practical Issues and Special Topics in Survey Sampling. In the following, we focus on the last two parts and highlight topics of interest for official statistics that are new or treated differently than in other books in the field.
The Advanced Topics on Analysis of Probability Survey Samples part begins with an important chapter (Chapter 6) on calibration estimators. Besides the conventional calibration estimators introduced by Deville and Särndal (1992), which are routinely used by most statistical agencies in official statistics, Chapter 6 describes two interesting topics:
(1) the generalized pseudo-empirical likelihood estimator (Tan and Wu 2015), which is shown to be related to the conventional calibration estimator through the Kullback-Leibler distance. Its construction is similar to that of a calibration estimator in that new weights are calculated from the initial design weights. Its variance can be estimated by bootstrapping. This estimator could be an alternative to conventional calibration estimators.
(2) the model-calibration estimators, linked with the classical GREG estimator, a particular type of calibration estimator. The model-calibration estimators assume the existence of a superpopulation model relating the variable of interest to auxiliary variables. Weights are obtained as for conventional calibration, but the fitted values of the variable of interest obtained from the model are used instead of the auxiliary variables in the calibration equations. As indicated in the book, model-calibration estimators have several attractive features, such as being asymptotically design-unbiased and also approximately model-unbiased under the assumed model.
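To make the idea of calibration concrete, the following is a minimal sketch of conventional calibration under the chi-square distance, the case that leads to the GREG estimator. It is not taken from the book: the function name `linear_calibration_weights`, the toy data, and the assumed population totals are all hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_calibration_weights(d, X, totals):
    """Chi-square-distance (linear) calibration weights.

    d      : (n,) initial design weights
    X      : (n, p) auxiliary variables for the sampled units
    totals : (p,) known population totals of the auxiliaries
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # Gap between the known totals and their design-weighted estimates.
    gap = totals - X.T @ d
    # Solve (sum_i d_i x_i x_i^T) lam = gap for the Lagrange multiplier.
    M = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(M, gap)
    # Calibrated weights: w_i = d_i * (1 + x_i^T lam).
    return d * (1.0 + X @ lam)

# Toy sample (made-up numbers, for illustration only).
d = np.array([10.0, 10.0, 20.0, 20.0, 40.0])
X = np.column_stack([np.ones(5), [1.0, 3.0, 2.0, 5.0, 4.0]])
totals = np.array([105.0, 350.0])  # assumed population size and total of x
y = np.array([2.0, 5.0, 3.0, 9.0, 7.0])

w = linear_calibration_weights(d, X, totals)
y_cal = float(w @ y)  # calibration estimate of the population total of y
```

By construction, the calibrated weights reproduce the assumed totals exactly. For model calibration, one would instead calibrate on a column of ones and the fitted values from a working model, with the population size and the population total of the fitted values as benchmarks; this reading follows the description above and is a sketch rather than the book's algorithm.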
Chapter 7 is devoted to regression analysis and introduces, among other topics, generalized estimating equations (GEE) for longitudinal surveys. In these surveys, the main challenge is dealing with correlated multivariate responses measured over time on the same unit. The book describes the so-called pseudo-GEE method for analyzing longitudinal survey data (Carrillo et al. 2010; Rao 1998) and shows how to make inference based on this method.
Chapter 8 deals with empirical likelihood methods (Owen 2001), which provide estimators of the population mean. The empirical likelihood method is nonparametric and allows the construction of point estimators as well as confidence intervals. However, the method is not directly applicable to complex survey data because the resulting estimators are generally not consistent. As an alternative, two methods are described: the pseudo-empirical likelihood approach (Chen and Sitter 1999) and the sample empirical likelihood. Both methods have an attractive feature: they avoid the use of second-order inclusion probabilities, which are required for design-based variance estimation. The generalized pseudo-empirical likelihood method, introduced in Chapter 6 as an alternative to classical calibration, is reconsidered in Chapter 8 as a generalization of the pseudo-empirical likelihood approach.
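As an illustration of the pseudo-empirical likelihood idea, the sketch below computes weights p_i that maximize the sum of normalized design weights times log p_i, subject to the p_i summing to one and reproducing a known population mean of a single auxiliary variable. It is a simplified, assumption-laden version: the function name `pseudo_el_weights`, the scalar auxiliary, the bisection solver, and the toy numbers are all hypothetical and not taken from the book.

```python
import numpy as np

def pseudo_el_weights(d, x, x_bar_pop, tol=1e-12, max_iter=200):
    """Pseudo-EL weights with one benchmark constraint (scalar x).

    Solution form: p_i = d_tilde_i / (1 + lam * u_i), with u_i = x_i - x_bar_pop
    and lam chosen so that sum_i p_i * u_i = 0.
    """
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    d_tilde = d / d.sum()          # normalized design weights
    u = x - x_bar_pop
    if not (u.min() < 0.0 < u.max()):
        raise ValueError("x_bar_pop must lie strictly inside the sample range of x")

    def g(lam):
        return np.sum(d_tilde * u / (1.0 + lam * u))

    # lam must keep 1 + lam * u_i > 0 for every i; g is strictly
    # decreasing on that open interval, so bisection finds the root.
    lo, hi = -1.0 / u.max(), -1.0 / u.min()
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return d_tilde / (1.0 + lam * u)

# Toy sample (made-up numbers, for illustration only).
d = np.array([10.0, 10.0, 20.0, 20.0, 40.0])
x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y = np.array([2.0, 5.0, 3.0, 9.0, 7.0])

p = pseudo_el_weights(d, x, x_bar_pop=3.3)  # 3.3 is an assumed known mean
y_bar_pel = float(p @ y)  # pseudo-EL estimate of the population mean of y
```

Unlike linear calibration weights, the weights obtained this way are positive by construction. Confidence intervals based on the pseudo-empirical likelihood ratio require further adjustments that this sketch does not attempt.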
Chapter 10 covers resampling and replication methods for variance estimation. Various bootstrap methods for survey data are described: methods for single-stage sampling, stratified sampling, multistage cluster sampling, as well as the pseudopopulation bootstrap and the multiplier bootstrap. Variance estimation in the presence of nuisance functionals (such as the finite population distribution function for the Gini coefficient and the finite population quantile function for the Lorenz curve) is also discussed, as well as two methods for computing replication weights for public use survey data.
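For readers unfamiliar with replication weights, the following sketch shows one widely used scheme, a Rao-Wu type rescaling bootstrap for a single stratum. Whether it matches the book's presentation in detail is not confirmed here. It assumes units are drawn with replacement (or with a negligible sampling fraction); the function names, the number of replicates, and the toy data are hypothetical.

```python
import numpy as np

def rao_wu_replicate_weights(w, n_rep, rng):
    """Replicate weights from a rescaling bootstrap in one stratum.

    Each replicate draws n - 1 units with replacement from the n sampled
    units and rescales the weights by n / (n - 1) times the draw counts.
    """
    w = np.asarray(w, dtype=float)
    n = w.size
    # m[b, i] = number of times unit i is drawn in replicate b.
    m = rng.multinomial(n - 1, np.full(n, 1.0 / n), size=n_rep)
    return w * (n / (n - 1)) * m

def bootstrap_variance(w, y, n_rep=1000, seed=0):
    """Bootstrap variance estimate for the weighted total of y."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    w_rep = rao_wu_replicate_weights(w, n_rep, rng)
    theta_hat = float(w @ y)   # full-sample estimate
    theta_rep = w_rep @ y      # one estimate per replicate
    return float(np.mean((theta_rep - theta_hat) ** 2))

# Toy sample (made-up numbers, for illustration only).
w = np.full(5, 20.0)
y = np.array([2.0, 5.0, 3.0, 9.0, 7.0])
v_boot = bootstrap_variance(w, y, n_rep=500)
```

A set of replicate weights of this kind can be released with a public-use file so that analysts can estimate variances without access to the design information.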
Chapter 11 deals with Bayesian inference for survey data. The authors emphasize: “While the use of Bayesian methods for survey data analysis, especially for official statistics, has long faced serious obstacles, there have been interesting discussions in recent years to revitalize the approach.” They describe various methods for Bayesian inference for survey data and recommend pursuing nonparametric Bayesian inference with good design-based frequentist properties for large samples. Additionally, hierarchical Bayes methods based on parametric models can be employed for small area estimation. For parameters defined through non-differentiable estimating functions, such as population quantiles or parameters of quantile regression models (which are known to be challenging), design-based estimation can be carried out using MCMC procedures.
The Practical Issues and Special Topics in Survey Sampling part covers several interesting topics related to surveys that have already been conducted worldwide. We welcome the presence of Chapter 14 on “Natural Resource Inventory Surveys,” a topic that is rarely included in books dedicated to survey sampling, even though there is growing interest in official statistics in estimates derived from spatial sampling. The chapter is mainly concerned with the estimation of fish abundance indices, but it also provides useful tools for conducting surveys in similar circumstances.
Chapter 16 deals with dual and multiple frame surveys, a topic of interest in official statistics. The book provides a good introduction to this topic, offering the classical view of estimation, but also the one based on pseudo-empirical likelihood. The authors argue that confidence intervals based on pseudo-empirical likelihood have an advantage over the usual normal approximation-based intervals for population proportions, where dual frame or multiple frame surveys are often used.
Chapter 17 is devoted to non-probability survey samples, an issue currently being debated in research for official statistics. It seems that there is still a lot of conceptual confusion in this area, and we welcome this chapter, which describes a general inferential framework for the analysis of non-probability survey samples in a setting where there also exists a probability survey sample with auxiliary information about the target population. The chapter discusses sample matching and mass imputation, estimation of propensity scores for non-probability samples, followed by mean and variance estimation. This chapter is therefore a good introduction to non-probability samples.
In the Preface, the authors state that the book provides material for a course on survey sampling and further reading on a number of specialized topics. In addition to students, this book is a valuable source of information for both researchers and practitioners. We thank the authors for writing such a concise and clear book on specialized topics and recommend it to anyone interested in the subjects covered.
