
Abstract

Usually, Statistical Institutes and Research Centers present the results of surveys in a “standardized” way, that is, the estimator of the parameter of interest and the associated measure of accuracy. This implies the calculation of the variance of the estimator, which is typically done in two possible ways: analytically or with replication methods. The analytic way requires the availability of the information of auxiliary variables used in the methodological plan of the survey, both at the sample level for the design variables, and at the respondent level for the variables involved in the treatment of non-response. Unfortunately, these variables are often not available in the final survey dataset, and the variance might therefore be miscalculated. Replication methods, which fully integrate the methodological plan applied into the survey, may be considered as good alternatives to this approach. This paper uses real data from the survey on “Racism and Ethno-racial Discrimination” carried out in Luxembourg in 2021, to compute and compare analytic variance estimators and bootstrap variance estimators. For this survey, both the linearization and the rescaled bootstrap lead to similar results, but the ultimate cluster variance estimator can be substantially biased. This suggests that the rescaled bootstrap may be a relevant approach.

Keywords

analytic variance; calibration; non-response; rescaled bootstrap; stratified sampling

1. Introduction

The analysis of complex survey data usually requires computing confidence intervals, and therefore the variance of the estimator, so that the survey error can be quantified and the results interpreted accurately. An error in their calculation can lead to a misinterpretation of the results (J. N. K. Rao 2016). In practice, however, these calculations are tedious, especially from a user's perspective. The set of weights needed for their computation is generally obtained from a combination of different statistical procedures, which the user rarely has access to. In the simplest cases, these statistical procedures include sampling, correction for unit non-response, and the final calibration step. Each of these procedures (or steps) has its own random component, which should be taken into account when calculating the confidence interval of each estimator. The literature distinguishes between two approaches to compute confidence intervals and/or variance estimators: (1) the analytical approach, which is typically associated with linearization techniques for non-linear estimators, and (2) resampling methods as an alternative.

The analytical approach requires a perfect knowledge of the survey process (sampling and estimation) to compute the variance. If we consider the process described above (sampling, correction for unit non-response, calibration), we first compute the sampling variance, which depends on the sampling plan. Secondly, we account for the variance due to unit non-response, usually by treating unit non-response as an additional sampling phase; see Särndal and Swensson (1987), and Caron (1998) for formulas that are easier to implement. The so-called “reverse approach” (e.g., Beaumont and Haziza 2016) enables a simplification of the formulas from Särndal and Swensson (1987). Thirdly, we account for the calibration of weights. Finally, the linearization approach (J.-C. Deville 1999) can be used to perform variance estimation for non-linear parameters.

Resampling methods such as Jackknife (Quenouille 1956; Tukey 1958), balanced repeated replication (McCarthy 1969), and bootstrap (Efron 1979) are seen as good substitutes for the analytical methods. In resampling methods, “artificial” samples are drawn from the initial sample. Each of these samples is used to create a replicated version of the original estimator, after correction for unit non-response and calibration. The sequence of replicated weights is used to obtain a simulation-based variance estimator, and then to obtain confidence intervals. We focus here on bootstrap methods, which were initially proposed for independent and identically distributed (i.i.d.) data (Efron 1979). The adaptation to survey data on finite populations has been the topic of a substantial literature, see for example Chauvet (2007), Mashreghi et al. (2016), and the references therein. Among the bootstrap methods proposed for survey sampling, the most used is the rescaled bootstrap of Rao and Wu (1988). It consists of producing bootstrap resamples by means of a basic bootstrap strategy, and then rescaling the resampled values to match an unbiased variance estimator. A modification is proposed in Rao et al. (1992), where the rescaling is applied to the sampling weights. A particular choice of the resampling size leads to the usual with-replacement bootstrap as a particular case. The rescaled bootstrap is generally accurate for sampling designs with a small first-stage sampling fraction, and is conservative otherwise. Beaumont and Émond (2022) recently proposed a unified bootstrap method which may be applied either to multistage sampling designs or to two-phase sampling designs with Poisson sampling at the second phase, without the assumption of a small sampling fraction at the first stage. In the rest of the article, we use the rescaled bootstrap algorithm in Rao et al. (1992). The correction for unit non-response and calibration are integrated in a similar way as in Bessonneau et al. (2021), to obtain a set of bootstrap weights at the level of the survey units.

The aim of this article is to study and compare the variance estimators and confidence intervals obtained under both the linearization approach and the bootstrap approach, in an application with real data from the survey on “Racism and Ethno-racial Discrimination” carried out in Luxembourg in 2021. This survey has two particularities. Firstly, the sampling fraction varies between 1.6% and 10.3% inside strata. It is therefore not negligible in some strata, in which the variance could be overestimated by the bootstrap. Secondly, the response rate is quite low, below 20%, which means that the variance due to unit non-response may be a large part of the overall variance. Our empirical results enable us to confirm on real survey data that both the linearization approach and the rescaled bootstrap approach lead to very similar variance estimates.

The rest of the article is organized as follows. We describe in Section 2 how to estimate the variance of an expansion estimator. We first consider in Subsection 2.1 an analytical approach, and the way in which the different sampling and estimation steps are dealt with in variance estimation. In Subsection 2.2, we explain how the rescaled bootstrap is applied in our case, and compare the bootstrap variance estimators with analytic variance estimators. In Section 3, the sampling design of the survey “Racism and Ethno-racial Discrimination” is described. The results obtained when comparing both approaches are presented and discussed in Section 4. The sample from this survey is used to build a pseudo-population, on which we perform a simulation study. The results are presented in Subsection 4.1. Finally, Section 5 summarizes our main conclusions.

2. Methodology

We consider a classical sampling design for a survey of individuals, in a population for which a sampling frame is available. Apart from the definition and the building of the sampling frame, three main steps need to be accounted for in estimation, namely (1) sample selection, (2) correction of unit non-response, and (3) calibration of the survey weights.

2.1. Analytical Variance

2.1.1. Sampling Design

Let $U$ be a finite population of size $N$, with individuals indexed by $i = 1, \dots, N$. We partition the population into $H$ strata $U_{h}$, $h = 1, \dots, H$, of sizes $N_{h}$. In each stratum $h$, a simple random sample $s_{h}$ of size $n_{h}$ is selected without replacement, independently across strata. The global sample is $s = \cup_{h = 1}^{H} s_{h}$, of size $n_{s} = \sum_{h = 1}^{H} n_{h}$, with first order inclusion probabilities $π_{i} > 0, i \in U$ and second order inclusion probabilities $π_{ij} > 0, i, j \in U, i \neq j$. The variance-covariance matrix of the sample membership indicators is $Δ_{s} = {(Δ_{s, ij})}_{i, j \in U}$, where

Δ_{s, ij} = \begin{cases} π_{ij} - π_{i} π_{j}, & \text{if } i \neq j, \\ π_{i} (1 - π_{i}), & \text{if } i = j . \end{cases}

Let $y_{i}$ be a study variable for $i \in U$ . We define the total of the variable of interest $y_{i}$ as

Y = \sum_{i \in U} y_{i} .

(1)

The general formula for the Horvitz-Thompson estimator (Horvitz and Thompson 1952) is

\begin{matrix} {\hat{Y}}^{HT} = \sum_{i \in s} d_{i} y_{i} = \sum_{h = 1}^{H} \sum_{i \in s_{h}} \frac{y_{i}}{π_{i}}, \end{matrix}

(2)

where $d_{i} = 1 / π_{i}$ is the sampling weight. It is unbiased for $Y$ , provided that all $π_{i}$ ’s are positive. The variance of this estimator is given by

V ({\hat{Y}}^{HT}) = \sum_{i \in U} \sum_{j \in U} \frac{y_{i} y_{j}}{π_{i} π_{j}} Δ_{s, ij} = \sum_{h = 1}^{H} (\sum_{i \in U_{h}} \sum_{j \in U_{h}} \frac{y_{i} y_{j}}{π_{i} π_{j}} Δ_{s, ij}) .

Under stratified simple random sampling without replacement, the Horvitz-Thompson estimator simplifies as

{\hat{Y}}^{HT} = \sum_{h = 1}^{H} N_{h} {\hat{\bar{Y}}}_{h} \quad \text{with} \quad {\hat{\bar{Y}}}_{h} = \frac{1}{n_{h}} \sum_{i \in s_{h}} y_{i} .

(3)

An unbiased estimator of variance is given by

\begin{matrix} v_{HT} ({\hat{Y}}^{HT}) = \sum_{i \in s} \sum_{j \in s} \frac{y_{i} y_{j}}{π_{i} π_{j}} \frac{Δ_{s, ij}}{π_{ij}} \\ = \sum_{h = 1}^{H} N_{h}^{2} (1 - f_{h}) \frac{s_{h, y}^{2}}{n_{h}}, \end{matrix}

(4)

where $f_{h} = \frac{n_{h}}{N_{h}}$ is the sampling rate and $s_{h, y}^{2} = \frac{1}{n_{h} - 1} \sum_{i \in s_{h}} {(y_{i} - {\hat{\bar{Y}}}_{h})}^{2}$ is the sample variance inside the stratum $h$. This formula is straightforward to program, and most statistical software packages implement it.
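As an illustration of Equations (3) and (4), the following minimal sketch computes the estimated total and its estimated variance under stratified simple random sampling without replacement. The function name, the argument layout and the toy data are assumptions made for this example only; they are not taken from the survey.

```python
import numpy as np

def stratified_total_and_variance(y, stratum, N_h):
    """Equation (3) total and Equation (4) variance estimate (stratified SRSWOR).

    y       : sample values, one per sampled unit
    stratum : stratum label of each sampled unit
    N_h     : dict mapping stratum label -> population size N_h (hypothetical layout)
    """
    y = np.asarray(y, dtype=float)
    stratum = np.asarray(stratum)
    total, variance = 0.0, 0.0
    for h, N in N_h.items():
        y_h = y[stratum == h]
        n = y_h.size
        f = n / N                    # sampling rate f_h
        s2 = y_h.var(ddof=1)         # sample variance s^2_{h,y}
        total += N * y_h.mean()      # N_h times the stratum sample mean
        variance += N**2 * (1.0 - f) * s2 / n
    return total, variance

# Toy data: two strata with 3 and 2 sampled units.
y = [1.0, 2.0, 3.0, 10.0, 14.0]
stratum = ["a", "a", "a", "b", "b"]
N_h = {"a": 30, "b": 10}
total_hat, var_hat = stratified_total_and_variance(y, stratum, N_h)
```

Each stratum needs at least two sampled units for the sample variance to be defined.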

2.1.2. Non-Response Correction

In survey sampling, unit non-response is usually treated as a supplementary sampling phase (see Särndal and Swensson 1987). This approach makes it simple to account for the non-response correction both in the calculation of an estimator and in the estimation of its variance. In this section, we explain how to obtain the variance estimator for the total of a variable of interest after the correction for unit non-response.

Under unit non-response, we suppose that an additional phase is added in the sampling process, which may be described as follows. From the realization of the sample $s$ (which may be seen as the population, from the perspective of unit non-response), we suppose that a sub-sample $r$ of respondents of random size $n_{r}$ is drawn, with response probabilities

p_{i} \equiv \Pr (i \in r \mid s) \quad \text{for } i \in s .

(5)

We also use the notation

p_{ij} \equiv \Pr (i, j \in r \mid s) \quad \text{for } i \neq j \in s .

(6)

We suppose that the response probabilities $p_{i}$ are positive (no hardcore non-response) and that the second-order probabilities $p_{ij}$ are also positive. Note that in this second sampling phase, the inclusion probabilities are defined conditionally on the realization of the sample $s$. This means that these probabilities would change if the sample $s$ were different. The variance-covariance matrix of response indicators, computed conditionally on the realized sample $s$, is defined by $Δ_{r} = {(Δ_{r, ij})}_{i, j \in s}$, where

Δ_{r, ij} = \begin{cases} p_{ij} - p_{i} p_{j}, & \text{if } i \neq j, \\ p_{i} (1 - p_{i}), & \text{if } i = j . \end{cases}

We first suppose that the response probabilities are known. In such case, an expansion estimator is

{\hat{Y}}^{nr} = \sum_{i \in r} \frac{y_{i}}{π_{i} p_{i}} .

(7)

Following Särndal and Swensson (1987), the associated variance of this estimator is given by

V ({\hat{Y}}^{nr}) = \sum_{i \in U} \sum_{j \in U} \frac{y_{i} y_{j}}{π_{i} π_{j}} Δ_{s, ij} + E (\sum_{i \in s} \sum_{j \in s} \frac{y_{i} y_{j}}{π_{i} p_{i} π_{j} p_{j}} Δ_{r, ij}) .

(8)

The interpretation of the formula is simple: the variance of an estimator under two-phase sampling is given by the sum of the variance of the first sampling phase (first term in the right-hand side of Equation (8)) and the variance of the second sampling phase (second term in the right-hand side of Equation (8)). An unbiased estimator of the variance proposed also by Särndal and Swensson (1987) is given by

v ({\hat{Y}}^{nr}) = \sum_{i \in r} \sum_{j \in r} \frac{y_{i} y_{j}}{π_{i} π_{j}} \frac{Δ_{s, ij}}{π_{ij} p_{ij}} + \sum_{i \in r} \sum_{j \in r} \frac{y_{i} y_{j}}{π_{i} p_{i} π_{j} p_{j}} \frac{Δ_{r, ij}}{p_{ij}} .

(9)

In practice, even with a survey procedure as simple as the one presented in this paper, Equation (9) is not straightforward to program. It involves several multiple sums, at the level of the first phase, of the second phase, and also between phases. Caron (1998) proposed variance estimation formulas for the specific cases where Poisson sampling or stratified simple random sampling without replacement is used at the second phase. In this work, we focus on Poisson sampling at the second phase, since it is the one used to model unit non-response in our survey “Racism and Ethno-racial Discrimination” in Luxembourg (see Section 3).

Under Poisson sampling, the individuals in the sample $s$ respond independently of one another. This implies that $p_{ij} = p_{i} p_{j}$ for $i \neq j \in s$ , and

Δ_{r, ij} = \begin{cases} 0, & \text{if } i \neq j \in s, \\ p_{i} (1 - p_{i}), & \text{if } i = j \in s . \end{cases}

(10)

The variance estimator in Equation (9) may then be rewritten as

v ({\hat{Y}}^{nr}) = \underbrace{\sum_{i \in r} \sum_{j \in r} \frac{A_{ij}}{p_{ij}} y_{i} y_{j}}_{1} + \underbrace{\sum_{i \in r} \frac{(1 - p_{i})}{p_{i}^{2}} {(\frac{y_{i}}{π_{i}})}^{2}}_{2},

(11)

where $A_{ij} = \frac{π_{ij} - π_{i} π_{j}}{π_{ij} π_{i} π_{j}}$ and $p_{ii} = p_{i}$. The second part of the right-hand side of Equation (11) is straightforward to implement in any programming language, since it is a simple sum. However, the first part still involves sums of cross products. Caron (1998) developed that part using the following idea. The study variable $y_{i}$ is defined at the level of the respondents (last sampling phase). However, we need to calculate the variance at the first sampling phase, so we need variables defined at the level of the sample $s$. This is possible by defining a transformed variable. Assuming that the response probabilities are known, let $z_{i}$ be a transformation of $y_{i}$, $z_{i} = T (y_{i})$ for $i \in s$, where $T (y_{i}) = (y_{i} / p_{i}) r_{i}$ and $r_{i}$ is the response indicator that takes the value $1$ if the unit $i \in s$ responds, and $0$ otherwise. Thus, the first part of the sum in Equation (11) reads as follows

\sum_{i \in r} \sum_{j \in r} \frac{A_{ij}}{p_{ij}} y_{i} y_{j} = \underbrace{\sum_{i \in s} \sum_{j \in s} A_{ij} z_{i} z_{j}}_{1a} + \underbrace{\sum_{i \in r} A_{ii} y_{i}^{2} (\frac{1}{p_{i}} - \frac{1}{p_{i}^{2}})}_{1b},

(12)

where $A_{ii} = \frac{(1 - π_{i})}{π_{i}^{2}}$. Equation (12) is now straightforward to program. The first part (part 1a) is the general estimation formula of the sampling variance for any sampling design, calculated on the transformed variable $z_{i}, i \in s$. For stratified random sampling, this formula corresponds to Equation (4) for the variable $z_{i}$. The second sum (part 1b) of Equation (12) is the estimation error on the variable $z_{i}$, which is a simple sum. The resulting variance estimator may also be obtained under the so-called reverse framework, see for example Beaumont and Haziza (2016). Substituting Equation (12) into Equation (11) and adding part 1b to part 2, which are both simple sums over the respondents, leads to a simple formula for the estimated variance in Equation (9), given by

v ({\hat{Y}}^{nr}) = \sum_{i, j \in s} A_{ij} z_{i} z_{j} + \sum_{i \in r} {(\frac{y_{i}}{π_{i}})}^{2} \frac{π_{i} (1 - p_{i})}{p_{i}^{2}}

(13)
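The equivalence between Equation (13) and the double sums of Equation (11) can be checked numerically. The sketch below assumes stratified simple random sampling without replacement at the first phase and known response probabilities; the function names and the toy data are hypothetical. The first function uses Equation (4) on $z_{i}$ for part 1a, and the second one evaluates Equation (11) by brute force.

```python
import numpy as np

def variance_eq13(y, r, p, stratum, N_h):
    """Equation (13): Equation (4) applied to z_i, plus a simple sum over respondents."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    resp = np.asarray(r) == 1
    stratum = np.asarray(stratum)
    z = np.where(resp, y / p, 0.0)            # z_i = (y_i / p_i) r_i
    pi = np.empty_like(y)
    part_1a = 0.0
    for h, N in N_h.items():
        in_h = stratum == h
        n = in_h.sum()
        pi[in_h] = n / N
        # Equation (4) on z over the whole sample s_h (zeros of non-respondents included)
        part_1a += N**2 * (1 - n / N) * z[in_h].var(ddof=1) / n
    second = np.sum((y[resp] / pi[resp]) ** 2 * pi[resp] * (1 - p[resp]) / p[resp] ** 2)
    return part_1a + second

def variance_eq11_double_sum(y, r, p, stratum, N_h):
    """Equation (11) by explicit double sums over respondents (check only)."""
    n_h = {h: sum(1 for s in stratum if s == h) for h in N_h}
    resp = [i for i, ri in enumerate(r) if ri == 1]
    total = 0.0
    for i in resp:
        N, n = N_h[stratum[i]], n_h[stratum[i]]
        pi_i = n / N
        for j in resp:
            if stratum[j] != stratum[i]:
                continue                       # independent strata: A_ij = 0
            if i == j:
                A, p_ij = (1 - pi_i) / pi_i**2, p[i]
            else:
                pi_ij = n * (n - 1) / (N * (N - 1))
                A, p_ij = (pi_ij - pi_i**2) / (pi_ij * pi_i**2), p[i] * p[j]
            total += A / p_ij * y[i] * y[j]    # part 1 of Equation (11)
        total += (1 - p[i]) / p[i] ** 2 * (y[i] / pi_i) ** 2   # part 2
    return total

# Toy data; y is set to 0.0 for non-respondents because it is never used for them.
y = [1.0, 2.0, 0.0, 4.0, 10.0, 0.0, 12.0]
r = [1, 1, 0, 1, 1, 0, 1]
p = [0.5, 0.8, 0.5, 0.4, 0.6, 0.6, 0.9]
stratum = ["a"] * 4 + ["b"] * 3
N_h = {"a": 30, "b": 10}
v13 = variance_eq13(y, r, p, stratum, N_h)
v11 = variance_eq11_double_sum(y, r, p, stratum, N_h)
```

Both functions return the same value up to floating-point error, while the first one avoids any double sum.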

In practice, the response probabilities $p_{i}$ are unknown and need to be estimated. We suppose that the standard method of Response Homogeneity Groups (RHGs) is used. Under this approach, the sample $s$ is partitioned into $G$ groups $s_{g}, g = 1, \dots, G,$ such that the response probability is approximately constant inside the groups. For all the units in $s_{g}$ , their response probability is estimated by the response rate

\begin{matrix} {\hat{p}}_{g} & = & \frac{\sum_{i \in s_{g}} r_{i}}{n_{g}}, \end{matrix}

(14)

with $n_{g}$ the size of $s_{g}$ .

The estimator of the total accounting for the estimation of the response probabilities is

\begin{matrix} {\hat{\hat{Y}}}^{n r} & = & \sum_{i \in r} \frac{y_{i}}{π_{i} {\hat{p}}_{i}} = \sum_{i \in r} d_{i}^{n r} y_{i} \end{matrix}

(15)

where $d_{i}^{nr} = 1 / (π_{i} {\hat{p}}_{i})$ . The variance estimator in Equation (13) also needs to be modified to account for the fact that the response probabilities are estimated rather than known (Juillard and Chauvet 2018; Kim and Kim 2007). This leads to

v_{lin} ({\hat{\hat{Y}}}^{nr}) = \sum_{i, j \in s} A_{ij} \frac{r_{i}}{{\hat{p}}_{i}} \frac{r_{j}}{{\hat{p}}_{j}} (y_{i} - π_{i} {\bar{y}}_{rg (i)}) (y_{j} - π_{j} {\bar{y}}_{rg (j)}) + \sum_{i \in r} {(\frac{y_{i} - π_{i} {\bar{y}}_{rg (i)}}{π_{i}})}^{2} \frac{π_{i} (1 - {\hat{p}}_{i})}{{\hat{p}}_{i}^{2}},

(16)

where $g (i)$ is the RHG to which the unit $i \in s$ belongs, and

{\bar{y}}_{rg} = \frac{\sum_{i \in s_{g}} r_{i} \frac{y_{i}}{π_{i}}}{\sum_{i \in s_{g}} r_{i}}

(17)

is the mean among respondents of the variable $\frac{y_{i}}{π_{i}}$ inside the RHG $s_{g}$ .
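The RHG correction of Equations (14) and (15) can be sketched as follows. The function name and the toy data are assumptions made for illustration, and the sketch assumes that every RHG contains at least one respondent.

```python
import numpy as np

def rhg_adjusted_weights(d, r, group):
    """Estimated response probabilities (Equation (14)) and weights d_i^{nr} = d_i / p_hat_i.

    d     : sampling weights d_i = 1 / pi_i for all units of the sample s
    r     : response indicators (1 = respondent, 0 = non-respondent)
    group : RHG label of each sampled unit
    """
    d = np.asarray(d, dtype=float)
    r = np.asarray(r, dtype=float)
    group = np.asarray(group)
    p_hat = np.empty_like(d)
    for g in np.unique(group):
        in_g = group == g
        p_hat[in_g] = r[in_g].mean()          # response rate inside the RHG
    d_nr = np.where(r == 1, d / p_hat, 0.0)   # non-respondents get weight zero
    return p_hat, d_nr

# Toy data: two RHGs; y is set to 0.0 for non-respondents (never used for them).
d = [10.0, 10.0, 10.0, 10.0, 5.0, 5.0]
r = [1, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1]
y = np.array([1.0, 0.0, 2.0, 3.0, 4.0, 0.0])
p_hat, d_nr = rhg_adjusted_weights(d, r, group)
total_nr = float(d_nr @ y)                    # Equation (15)
```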

2.1.3. Calibration

After the correction for non-response, it is common to include an additional calibration step on totals of auxiliary variables known at the population level. The aim of calibration is to reduce the variance of the estimator of the total, which happens when there is an (approximately) linear relationship between the variable of interest and the auxiliary variables.

Calibration methods require a vector of $p$ auxiliary/calibration variables $x_{i} = (x_{i 1}, \dots, x_{ip})'$ available at the level of the sample of respondents $r$ . Let $X = \sum_{i \in U} x_{i}$ be the vector of their population totals. The idea is to obtain new weights, $w_{i},$ as close as possible (in terms of a distance function) to the weights $d_{i}^{nr}$ obtained after the correction of unit non-response, while satisfying the calibration equations

\sum_{i \in r} w_{i} x_{i} = X .

(18)

The calibration estimator of the total in (1) proposed by Deville and Särndal (1992) is given by:

{\hat{Y}}^{w} = \sum_{i \in r} w_{i} y_{i} .

(19)

Deville and Särndal (1992) showed that, under some mild conditions, the calibration estimator under any distance function is asymptotically equivalent to the generalized regression (GREG) estimator. Let ${\hat{\hat{X}}}^{nr} = \sum_{i \in r} d_{i}^{nr} x_{i}$ denote the estimator corrected for unit non-response of the vector total of the calibration variables. Then the GREG estimator of the total $Y$ is given by

{\hat{Y}}^{reg} = {\hat{\hat{Y}}}^{nr} + (X - {\hat{\hat{X}}}^{nr})^{'} \hat{β},

(20)

where

\hat{β} = {(\sum_{i \in r} d_{i}^{nr} x_{i} x_{i}^{'})}^{- 1} (\sum_{i \in r} d_{i}^{nr} x_{i} y_{i}) .

The asymptotic equivalence between the calibrated estimator and the generalized regression estimator implies that they share the same asymptotic variance, see Deville and Särndal (1992) for a proof.
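For the linear distance function $F (u) = 1 + u$, the equivalence is exact rather than asymptotic, which gives a convenient numerical check. The sketch below is a minimal illustration under that assumption; the function name and the toy data are hypothetical, and other distance functions would require an iterative solver.

```python
import numpy as np

def linear_calibration_weights(d_nr, x, X):
    """Weights w_i = d_i^{nr} (1 + x_i' lam), with lam solving the calibration equations (18)."""
    d_nr = np.asarray(d_nr, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.asarray(X, dtype=float)
    T = (d_nr[:, None] * x).T @ x             # sum_i d_i^{nr} x_i x_i'
    lam = np.linalg.solve(T, X - d_nr @ x)    # Lagrange multipliers
    return d_nr * (1.0 + x @ lam)

# Toy respondent data: an intercept and one auxiliary variable.
d_nr = np.array([10.0, 10.0, 12.0, 8.0, 10.0])
x = np.array([[1, 2], [1, 3], [1, 5], [1, 7], [1, 4]], dtype=float)
X = np.array([55.0, 230.0])                   # assumed known population totals
y = np.array([3.0, 4.0, 7.0, 9.0, 5.0])

w = linear_calibration_weights(d_nr, x, X)
Y_w = float(w @ y)                            # calibration estimator, Equation (19)

# GREG estimator, Equation (20), computed from the same ingredients.
beta = np.linalg.solve((d_nr[:, None] * x).T @ x, (d_nr[:, None] * x).T @ y)
Y_reg = float(d_nr @ y + (X - d_nr @ x) @ beta)
```

Here `w @ x` reproduces `X`, and `Y_w` coincides with `Y_reg` up to floating-point error.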

For variance estimation, let

e_{i} = y_{i} - {\hat{β}}^{'} x_{i}

(21)

denote the estimated residuals in the weighted regression of the variable of interest $y_{i}$ on the calibration variables $x_{i}$ . The variance estimator for ${\hat{Y}}^{w}$ is obtained by replacing in Equation (16) the variable $y_{i}$ with $e_{i}$ . This leads to the variance estimator

v_{lin} ({\hat{Y}}^{w}) = \sum_{i, j \in s} A_{ij} \frac{r_{i}}{{\hat{p}}_{i}} \frac{r_{j}}{{\hat{p}}_{j}} (e_{i} - π_{i} {\bar{e}}_{rg (i)}) (e_{j} - π_{j} {\bar{e}}_{rg (j)}) + \sum_{i \in r} {(\frac{e_{i} - π_{i} {\bar{e}}_{rg (i)}}{π_{i}})}^{2} \frac{π_{i} (1 - {\hat{p}}_{i})}{{\hat{p}}_{i}^{2}},

(22)

where

{\bar{e}}_{rg} = \frac{\sum_{i \in s_{g}} r_{i} \frac{e_{i}}{π_{i}}}{\sum_{i \in s_{g}} r_{i}} .

(23)

2.1.4. Extension for Other Parameters

If the population size $N$ is known, the results in the previous sections can be readily extended for the estimation of the population mean $\bar{Y} = Y / N$ . In such case, we can use the estimator

{\hat{\bar{Y}}}^{w} = N^{- 1} {\hat{Y}}^{w}

(24)

and the variance estimator

v_{lin} ({\hat{\bar{Y}}}^{w}) = N^{- 2} v_{lin} ({\hat{Y}}^{w}) .

It is very common in statistical institutes to estimate other parameters, such as a ratio $R = Y / Z$. For example, if the population size $N$ is unknown, the estimator in Equation (24) cannot be used. The ratio $R$ may be estimated by

{\hat{R}}^{w} = \frac{{\hat{Y}}^{w}}{{\hat{Z}}^{w}},

(25)

using a substitution principle. Since both the numerator and denominator in (25) are random variables, the calculation of the variance requires a preliminary step called linearization (e.g., Deville 1999; Woodruff 1971). The estimated linearized variable associated with the ratio $R$ is

ℓ_{i} = \frac{1}{{\hat{Z}}^{w}} (y_{i} - {\hat{R}}^{w} z_{i}) .

(26)

The estimated residuals $e_{i}$ in Equation (21), in this case, correspond to the weighted regression of the linearized variable of interest $ℓ_{i}$ on the calibration variables.
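A minimal sketch of Equations (25) and (26) is given below. The function name and the toy data are hypothetical, and `w` stands for any set of final weights (calibrated weights in the setting of this section). A useful sanity check is that the weighted total of the linearized variable is zero by construction.

```python
import numpy as np

def ratio_and_linearized_variable(w, y, z):
    """Estimated ratio (Equation (25)) and its estimated linearized variable (Equation (26))."""
    w, y, z = (np.asarray(a, dtype=float) for a in (w, y, z))
    Y_hat = w @ y
    Z_hat = w @ z
    R_hat = Y_hat / Z_hat
    ell = (y - R_hat * z) / Z_hat
    return R_hat, ell

# Toy data: with z_i = 1 for every unit, the ratio is a weighted proportion.
w = np.array([10.0, 10.0, 20.0])
y = np.array([1.0, 0.0, 1.0])
z = np.array([1.0, 1.0, 1.0])
R_hat, ell = ratio_and_linearized_variable(w, y, z)
```

The variable `ell` then plays the role of the study variable in the variance formulas for totals.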

2.2. Bootstrap Variance

In practice, the computation of the variance estimator defined in Subsection 2.1 may not be easy. In particular, auxiliary design variables (the strata indicators) need to be available at the level of the sample $s$, and auxiliary variables used in the treatment of non-response (the RHG membership indicators) need to be available at the level of the respondent subsample $r$. These variables may not always be available to secondary users, which might lead to a miscalculation of the variance (Rao 2006). Also, for more complex parameters like quantiles, using the linearization approach is not straightforward.

Resampling methods such as Jackknife (Quenouille 1956; Tukey 1958), balanced repeated replication (McCarthy 1969), and bootstrap (Efron 1979) are good alternatives to the calculation of analytic variance estimators. In this paper, we focus on the bootstrap method proposed by Rao and Wu (1988) and its generalization proposed in Rao et al. (1992). This technique has shown a very good performance in the case of stratified multistage sampling with a negligible first-stage sampling fraction, see Beaumont and Patak (2012), Beaumont and Émond (2022), Chen et al. (2022). It is also a well-established variance estimation method in many national statistical offices. The method proposed by Rao and Wu (1988) is reviewed in Subsection 2.2.1, and an adaptation accounting for unit non-response and calibration is presented in Algorithm 3. In Subsection 2.2.2, we give the benchmark analytic variance estimators that the bootstrap aims at reproducing in the linear case, and compare them to alternatives.

2.2.1. The Rescaled Bootstrap

This bootstrap method proposed by Rao and Wu (1988) is suitable for multistage sampling designs, with a fixed-size sampling design at the first-stage, possibly stratified, which may be performed either with or without replacement. In line with Subsection 2.1, we consider the method in the particular case when the sample is selected by a one-stage sampling design, namely stratified simple random sampling without replacement. Following Rao and Wu (1988), a bootstrap variance estimator for the Horvitz-Thompson estimator may be computed as described in Algorithm 1.

Algorithm 1 Computation of the Rao and Wu (1988) bootstrap variance estimator for the Horvitz-Thompson estimator

1) For any stratum $h = 1, \dots, H$, draw a simple random sample with replacement of size $m_{h}$ from $s_{h}$. Let ${\{y_{hi}^{*}\}}_{i = 1}^{m_{h}}$ denote the values resampled in the stratum $h$, and $s^{* (b)}$ denote the union of the resamples. Then compute

{\tilde{y}}_{hi} = {\hat{\bar{Y}}}_{h} + {(m_{h})}^{1 / 2} {(n_{h} - 1)}^{- 1 / 2} (y_{hi}^{*} - {\hat{\bar{Y}}}_{h}) \quad \text{for any } i = 1, \dots, m_{h},

(27)

which is a correction by a centering and by a scaling factor of the variable of interest, where $m_{h} \geq 1$ is the bootstrap resample size inside the stratum $U_{h}$, which remains fixed in the bootstrap procedure, and where ${\hat{\bar{Y}}}_{h}$ is defined in Equation (3). In practice, the most common choice is $m_{h} = n_{h} - 1$.

2) For $b = 1, \dots, B$, with $B$ large, compute

{\hat{Y}}^{HT * (b)} = \sum_{h = 1}^{H} \frac{N_{h}}{m_{h}} \sum_{i \in s_{h}^{* (b)}} {\tilde{y}}_{hi} = \sum_{h = 1}^{H} \frac{N_{h}}{m_{h}} \sum_{i \in s_{h}} δ_{i}^{* (b)} {\tilde{y}}_{hi},

(28)

where $s_{h}^{* (b)}$ is the with-replacement sample selected in stratum $h$, and $δ_{i}^{* (b)}$ is the multiplicity of unit $i$, i.e. the number of times that unit $i$ is selected in the with-replacement sample $s^{* (b)}$.

3) The estimator of the variance is given by the Monte Carlo approximation:

{\tilde{v}}_{b} ({\hat{Y}}^{HT}) = {(B - 1)}^{- 1} \sum_{b = 1}^{B} {({\hat{Y}}^{HT * (b)} - B^{- 1} \sum_{c = 1}^{B} {\hat{Y}}^{HT * (c)})}^{2} .

(29)
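The steps of Algorithm 1 can be sketched as follows for a single study variable, with the common choice $m_{h} = n_{h} - 1$. The function name, the number of replicates and the simulated data are assumptions made for illustration. Since the rescaling in Equation (27) contains no finite population correction, the Monte Carlo variance should be close to $\sum_{h} N_{h}^{2} s_{h, y}^{2} / n_{h}$, which is used here as a rough check.

```python
import numpy as np

def rao_wu_bootstrap_variance(y, stratum, N_h, B=4000, seed=0):
    """Algorithm 1 with m_h = n_h - 1: rescaled values, replicate totals, Monte Carlo variance."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    stratum = np.asarray(stratum)
    replicates = np.zeros(B)
    for h, N in N_h.items():
        y_h = y[stratum == h]
        n = y_h.size
        m = n - 1                                        # bootstrap resample size m_h
        ybar = y_h.mean()
        draws = rng.choice(y_h, size=(B, m), replace=True)
        y_tilde = ybar + np.sqrt(m / (n - 1)) * (draws - ybar)   # Equation (27)
        replicates += N / m * y_tilde.sum(axis=1)                # Equation (28)
    return replicates.var(ddof=1)                                # Equation (29)

# Simulated sample: two strata with 20 and 15 sampled units.
data_rng = np.random.default_rng(1)
y = np.concatenate([data_rng.normal(50, 10, 20), data_rng.normal(80, 5, 15)])
stratum = np.array([0] * 20 + [1] * 15)
N_h = {0: 400, 1: 150}

v_boot = rao_wu_bootstrap_variance(y, stratum, N_h)
v_no_fpc = sum(N**2 * y[stratum == h].var(ddof=1) / (stratum == h).sum()
               for h, N in N_h.items())
```

The two values differ only by Monte Carlo error, which shrinks as `B` grows.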

Since Step 1 in Algorithm 1 is specific to the variable of interest, the procedure must be repeated separately for each study variable, which is computationally inefficient. Rao et al. (1992) propose a modification of the procedure, where the centering/scaling correction is applied to the sampling weights rather than to the variable of interest. This leads to Algorithm 2. The resampling is done at the level of the sampling unit, which provides a better estimation of the sampling variance. However, we still need to integrate the non-response correction and calibration as described in Subsection 2.1.

Algorithm 2 Computation of the Rao et al. (1992) bootstrap variance estimator for the Horvitz-Thompson estimator

1) For $b = 1, \dots, B$, with $B$ large, draw a stratified simple random sample with replacement $s^{* (b)}$ of size $m_{h}$ inside $s_{h}$.

2) For $b = 1, \dots, B$, compute

{\hat{Y}}^{HT * (b)} = \sum_{h = 1}^{H} \sum_{i \in s_{h}} d_{i}^{* (b)} y_{i},

(30)

\text{with} \quad d_{i}^{* (b)} = [1 - {(\frac{m_{h}}{n_{h} - 1})}^{1 / 2} + {(\frac{m_{h}}{n_{h} - 1})}^{1 / 2} \frac{n_{h}}{m_{h}} δ_{i}^{* (b)}] d_{i},

(31)

where $d_{i}$ is defined in Equation (2).

3) The estimator of the variance is given by the Monte Carlo approximation:

{\tilde{v}}_{b} ({\hat{Y}}^{HT}) = {(B - 1)}^{- 1} \sum_{b = 1}^{B} {({\hat{Y}}^{HT * (b)} - B^{- 1} \sum_{c = 1}^{B} {\hat{Y}}^{HT * (c)})}^{2} .

(32)
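A minimal sketch of the weight-based version is given below for the choice $m_{h} = n_{h} - 1$, in which case the first term in the bracket of Equation (31) vanishes and the bootstrap weight reduces to $\frac{n_{h}}{n_{h} - 1} δ_{i}^{* (b)} d_{i}$. The function name and the toy design are assumptions made for illustration. The same matrix of replicate weights can be reused for any study variable.

```python
import numpy as np

def rescaled_bootstrap_weights(d, stratum, B=1000, seed=0):
    """Replicate weights n_h / (n_h - 1) * delta_i * d_i, one row per bootstrap replicate."""
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    stratum = np.asarray(stratum)
    weights = np.zeros((B, d.size))
    for h in np.unique(stratum):
        idx = np.flatnonzero(stratum == h)
        n = idx.size
        # Multiplicities of the n units among n - 1 equal-probability draws with replacement
        delta = rng.multinomial(n - 1, np.full(n, 1.0 / n), size=B)
        weights[:, idx] = n / (n - 1) * delta * d[idx]
    return weights

# Toy design: N_h = 400 and 150, n_h = 20 and 15, so d_i = 20 and 10.
stratum = np.array([0] * 20 + [1] * 15)
d = np.where(stratum == 0, 400 / 20, 150 / 15)
W = rescaled_bootstrap_weights(d, stratum)

# Bootstrap variance for one study variable, Equations (30) and (32).
y = np.random.default_rng(1).normal(50, 10, stratum.size)
replicate_totals = W @ y
v_boot = replicate_totals.var(ddof=1)
```

With equal weights inside a stratum, every row of `W` sums to $N_{h}$ over that stratum, because the multiplicities always add up to $n_{h} - 1$.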

The bootstrap variance estimator given in Equation (32) accounts for the sampling error. Surveys are prone to other sources of variability, like unit non-response. Rust and Rao (1996) highlight the importance of accurately computing the survey error, including the non-response error, when estimating population parameters. They provide some practical examples of health surveys with multistage sampling designs, and design weights corrected for non-response and adjusted via post-stratification. They implemented and analyzed resampling techniques (Jackknife, balanced repeated replication, and bootstrap) in this context. Recently, Bessonneau et al. (2021) proposed two SAS macros that implement the procedure proposed by Rao et al. (1992) together with the next two steps, non-response correction and calibration. The first macro was designed to obtain the bootstrap replicate weights for single-stage sampling, while the second one was designed to obtain bootstrap replicate weights for two-stage sampling. Since our survey is single-stage, we focus on the algorithm underlying the first macro, which is described in Algorithm 3. Note that the response indicators $r_{i}$ and the vector of calibration variables $x_{i}$ are held fixed in the bootstrap process. This means in particular that the non-response and calibration models remain constant for all the replicates. Also, note that Step 3 is modified as compared to Bessonneau et al. (2021), to account for any possible distance function in the calibration step, and not only for the linear function $F (x) = 1 + x$.

Algorithm 3. Computation of the Bessonneau et al. (2021) bootstrap algorithm accounting for non-response and calibration

1) For $b = 1, \dots, B$ , with $B$ large, draw a stratified simple random sample with replacement $s^{* (b)}$ of size $m_{h} = n_{h} - 1$ inside $s_{h}$ . For any $h = 1, \dots, H$ and $i \in s_{h}$ , the bootstrap design weight (J. N. K. Rao et al. 1992) is

\begin{matrix} d_{i}^{* (b)} = G_{i} d_{i} \quad \text{with} \quad G_{i} = \frac{n_{h}}{n_{h} - 1} δ_{i}^{* (b)} . \end{matrix}

(33)

The bootstrap version of the Horvitz-Thompson estimator given in (2) is

\begin{matrix} {\hat{Y}}^{HT * (b)} = \sum_{i \in s} d_{i}^{* (b)} y_{i} . \end{matrix}

(34)

2) To account for non-response as described in Section 2.1.2, we recompute the estimated response probabilities inside the same RHGs. This leads to the estimated probabilities and the bootstrap weights corrected for non-response

\begin{matrix} {\hat{p}}_{i}^{* (b)} = \frac{\sum_{i \in s_{g}} G_{i} r_{i}}{\sum_{i \in s_{g}} G_{i}} \quad \text{and} \quad d_{i}^{nr * (b)} = \frac{d_{i}^{* (b)}}{{\hat{p}}_{i}^{* (b)}} . \end{matrix}

(35)

The bootstrap version of the estimator corrected for non-response given in (15) is

\begin{matrix} {\hat{\hat{Y}}}^{nr * (b)} = \sum_{i \in r} d_{i}^{nr * (b)} y_{i} . \end{matrix}

(36)

3) The calibration is accounted for by using the distance function $F (\cdot)$ (see J.-C. Deville et al. 1993). For any unit $i \in r$ , we have

\begin{matrix} w_{i}^{* (b)} = d_{i}^{nr * (b)} F (x_{i}^{'} λ^{* (b)}), \end{matrix}

(37)

where $λ^{* (b)}$ is a vector of Lagrange multipliers such that

\begin{matrix} \sum_{i \in r} d_{i}^{nr * (b)} F (x_{i}^{'} λ^{* (b)}) x_{i} = X . \end{matrix}

(38)

The bootstrap version of the calibrated estimator given in (19) is

\begin{matrix} {\hat{Y}}^{w * (b)} = \sum_{i \in r} w_{i}^{* (b)} y_{i} . \end{matrix}

(39)

4) The bootstrap variance estimator for ${\hat{Y}}^{HT}$ is

\begin{matrix} {\tilde{v}}_{b} ({\hat{Y}}^{HT}) = {(B - 1)}^{- 1} \sum_{b = 1}^{B} {({\hat{Y}}^{HT * (b)} - B^{- 1} \sum_{c = 1}^{B} {\hat{Y}}^{HT * (c)})}^{2}, \end{matrix}

(40)

and similarly for ${\hat{\hat{Y}}}^{nr}$ and ${\hat{Y}}^{w}$ .
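Steps 1, 2, and 4 of Algorithm 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAS macro of Bessonneau et al. (2021): $δ_{i}^{* (b)}$ is read here as the number of times unit $i$ is drawn in the resample, the calibration of Step 3 is left out, and the function names and the toy data are hypothetical.

```python
import random

def rescaled_multipliers(strata, rng):
    """Step 1: G_i = n_h / (n_h - 1) times the number of draws of unit i (assumed reading of delta)."""
    by_stratum = {}
    for i, h in enumerate(strata):
        by_stratum.setdefault(h, []).append(i)
    G = [0.0] * len(strata)
    for units in by_stratum.values():
        n_h = len(units)                          # requires n_h >= 2
        for i in rng.choices(units, k=n_h - 1):   # n_h - 1 draws with replacement
            G[i] += n_h / (n_h - 1)
    return G

def nonresponse_total(y, d, r, rhg, G):
    """Step 2: response rates re-estimated inside each RHG, Equations (35)-(36)."""
    num, den = {}, {}
    for G_i, r_i, g in zip(G, r, rhg):
        num[g] = num.get(g, 0.0) + G_i * r_i
        den[g] = den.get(g, 0.0) + G_i
    total = 0.0
    for y_i, d_i, r_i, g, G_i in zip(y, d, r, rhg, G):
        if r_i and G_i > 0:                       # then num[g] > 0 as well
            p_hat = num[g] / den[g]
            total += G_i * d_i / p_hat * y_i
    return total

def bootstrap_variance(y, d, r, rhg, strata, B=500, seed=1):
    """Step 4: variance of the B replicate totals, with divisor B - 1 as in Equation (40)."""
    rng = random.Random(seed)
    reps = [nonresponse_total(y, d, r, rhg, rescaled_multipliers(strata, rng))
            for _ in range(B)]
    mean = sum(reps) / B
    return sum((t - mean) ** 2 for t in reps) / (B - 1)

# Hypothetical toy sample: 2 strata of 30 units, 3 RHGs, 80% respondents.
n = 60
strata = [i % 2 for i in range(n)]
rhg = [i % 3 for i in range(n)]
d = [10.0 if h == 0 else 20.0 for h in strata]
r = [1 if i % 5 else 0 for i in range(n)]
y = [1 if i % 4 == 0 else 0 for i in range(n)]
v_boot = bootstrap_variance(y, d, r, rhg, strata, B=200)
```

Because the multipliers $G_{i}$ sum to $n_{h}$ inside each stratum, every replicate keeps the original stratum sample size on average, which is what the rescaling is designed to achieve.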

2.2.2. A Comparison Between Bootstrap Variance Estimators and Alternatives

In this section, we review the benchmark variance estimators for the bootstrap, that is, the analytic variance estimators that the bootstrap aims at matching for the estimation of a total (Bessonneau et al. 2021). We compare them with alternative analytic variance estimators.

For the Horvitz-Thompson estimator, the benchmark variance estimator is

\begin{matrix} v_{mult} ({\hat{Y}}^{HT}) = \sum_{h = 1}^{H} [\frac{n_{h}}{n_{h} - 1} \sum_{i \in s_{h}} {(d_{i} y_{i} - \frac{1}{n_{h}} \sum_{j \in s_{h}} d_{j} y_{j})}^{2}], \end{matrix}

(41)

see Bessonneau et al. (2021, Equation 2.4). This is the unbiased variance estimator that would be used if the samples were selected with replacement inside strata. It is often called the ultimate cluster (UC) variance estimator. A more general form is

\begin{matrix} v_{uc} ({\hat{Y}}^{HT}) = \sum_{h = 1}^{H} [(1 - α_{h}) \frac{n_{h}}{n_{h} - 1} \sum_{i \in s_{h}} {(d_{i} y_{i} - \frac{1}{n_{h}} \sum_{j \in s_{h}} d_{j} y_{j})}^{2}], \end{matrix}

(42)

where $α_{h}$ is a correction factor inside strata. With $α_{h} = 0$ for any $h = 1, \dots, H$ , we retrieve the with-replacement variance estimator in Equation (41), which is conservative for the variance of ${\hat{Y}}^{HT}$ . With $α_{h} = f_{h}$ for any $h = 1, \dots, H$ , we retrieve the unbiased variance estimator in Equation (4). When the sampling fractions inside strata are small, as in the survey on discrimination that we consider, there is little difference between the two alternatives. For simplicity, we focus on the case $α_{h} = 0$ in the rest of the paper, so that $v_{mult} ({\hat{Y}}^{HT})$ and $v_{uc} ({\hat{Y}}^{HT})$ are identical.
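As an illustration of Equation (42), the following Python sketch computes the ultimate cluster variance estimator with an optional correction factor $α_{h}$ per stratum. The function name `v_uc` and the four-unit example are hypothetical; the example only shows how the choice of $α_{h}$ scales each stratum contribution.

```python
def v_uc(y, d, strata, alpha=None):
    """Equation (42): sum over strata of (1 - alpha_h) * n_h/(n_h - 1) * sum of squared deviations."""
    by_stratum = {}
    for y_i, d_i, h in zip(y, d, strata):
        by_stratum.setdefault(h, []).append(d_i * y_i)
    v = 0.0
    for h, z in by_stratum.items():
        n_h = len(z)
        mean = sum(z) / n_h
        ss = sum((z_i - mean) ** 2 for z_i in z)
        a_h = 0.0 if alpha is None else alpha[h]   # alpha_h = 0 gives Equation (41)
        v += (1.0 - a_h) * n_h / (n_h - 1) * ss
    return v

# Hypothetical stratum: N_h = 8, n_h = 4, so d_i = 2 and f_h = 0.5.
y = [1, 0, 1, 0]
d = [2.0, 2.0, 2.0, 2.0]
strata = [0, 0, 0, 0]
v_wr = v_uc(y, d, strata)                    # with-replacement form, alpha_h = 0
v_fpc = v_uc(y, d, strata, alpha={0: 0.5})   # alpha_h = f_h
```

With a sampling fraction of one half, the two versions differ by a factor of two; with the small sampling fractions of the survey considered here, the factor $1 - f_{h}$ is close to one and the two versions nearly coincide.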

For the estimator corrected for unit non-response, the benchmark variance estimator is

\begin{matrix} v_{mult} ({\hat{\hat{Y}}}^{nr}) = \sum_{h = 1}^{H} [\frac{n_{h}}{n_{h} - 1} \sum_{i \in s_{h}} {(d_{i} u_{1 i} - \frac{1}{n_{h}} \sum_{j \in s_{h}} d_{j} u_{1 j})}^{2}], \\ \text{with} \quad u_{1 i} = \frac{r_{i} y_{i}}{{\hat{p}}_{g (i)}} - (\frac{r_{i}}{{\hat{p}}_{g (i)}} - 1) (π_{i} {\bar{y}}_{rg (i)}), \end{matrix}

(43)

and ${\bar{y}}_{rg}$ is defined in Equation (17); see Bessonneau et al. (2021, Equation 2.8). This estimator correctly accounts for the variance due to non-response, and is conservative for the sampling variance. Here again, the overestimation of the sampling variance is slight if the sampling fractions inside strata are small. In the linearized variable $u_{1 i}$ defined in Equation (43), the second term in the right-hand side accounts for the estimation of the response probabilities. Dropping this term leads to the simplified variance estimator

\begin{matrix} v_{simp} ({\hat{\hat{Y}}}^{nr}) = \sum_{h = 1}^{H} [\frac{n_{h}}{n_{h} - 1} \sum_{i \in s_{h}} {(d_{i} u_{2 i} - \frac{1}{n_{h}} \sum_{j \in s_{h}} d_{j} u_{2 j})}^{2}], \\ \text{with} \quad u_{2 i} = \frac{r_{i} y_{i}}{{\hat{p}}_{g (i)}} . \end{matrix}

This amounts to estimating the variance as if the response probabilities were known rather than estimated. This usually results in an overestimation of the variance (Kim and Kim 2007), but the overestimation is expected to be limited if the sample size is sufficiently large inside each RHG. An alternative is the ultimate cluster variance estimator, computed on the respondent sample with the weights corrected for non-response:

\begin{matrix} v_{uc} ({\hat{\hat{Y}}}^{nr}) = \sum_{h = 1}^{H} [\frac{n_{rh}}{n_{rh} - 1} \sum_{i \in r_{h}} {(d_{i}^{nr} y_{i} - \frac{1}{n_{rh}} \sum_{j \in r_{h}} d_{j}^{nr} y_{j})}^{2}], \end{matrix}

(44)

with $r_{h}$ the subset of respondents inside the sample $s_{h}$ , and $n_{rh}$ the size of $r_{h}$ . After some algebra, it can be rewritten as

\begin{matrix} v_{uc} ({\hat{\hat{Y}}}^{nr}) = \sum_{h = 1}^{H} \frac{n_{rh} (n_{h} - 1)}{n_{h} (n_{rh} - 1)} \times \frac{n_{h}}{n_{h} - 1} \sum_{i \in s_{h}} {(d_{i} u_{2 i} - \frac{1}{n_{h}} \sum_{j \in s_{h}} d_{j} u_{2 j})}^{2} \\ - \sum_{h = 1}^{H} \frac{n_{h} - n_{rh}}{n_{rh} - 1} \times \frac{1}{n_{h}} {(\sum_{j \in r_{h}} d_{j}^{nr} y_{j})}^{2} \\ \approx v_{simp} ({\hat{\hat{Y}}}^{nr}) - \sum_{h = 1}^{H} \frac{n_{rh}}{n_{h}} (1 - \frac{n_{rh}}{n_{h}}) \times n_{h} {(\frac{1}{n_{rh}} \sum_{j \in r_{h}} d_{j}^{nr} y_{j})}^{2}, \end{matrix}

(45)

where the second (approximate) equality in Equation (45) holds if the numbers of respondents $n_{rh}$ inside strata are large. As compared to $v_{simp} ({\hat{\hat{Y}}}^{nr})$ , this variance estimator is therefore negatively biased. The magnitude of the bias depends on the response rate inside strata, and on the magnitude of the variable of interest $y$ . In a nutshell, if the RHGs are large, we expect $v_{uc} ({\hat{\hat{Y}}}^{nr})$ to be negatively biased.
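The first (exact) equality in Equation (45) can be checked numerically inside a single stratum. The sketch below is a toy verification with hypothetical values: `a` holds the products $d_{i}^{nr} y_{i}$ for the respondents, and the non-respondents contribute zeros because $u_{2 i} = 0$ when $r_{i} = 0$.

```python
def uc_direct(a):
    """Equation (44) in one stratum; a[i] = d_i^nr * y_i over the respondents."""
    n_r = len(a)
    mean = sum(a) / n_r
    return n_r / (n_r - 1) * sum((x - mean) ** 2 for x in a)

def simp_term(a, n_h):
    """Stratum contribution to v_simp: same products, padded with zeros for non-respondents."""
    z = a + [0.0] * (n_h - len(a))
    mean = sum(z) / n_h
    return n_h / (n_h - 1) * sum((x - mean) ** 2 for x in z)

def uc_rewritten(a, n_h):
    """First two lines of Equation (45) in one stratum."""
    n_r, total = len(a), sum(a)
    scale = n_r * (n_h - 1) / (n_h * (n_r - 1))
    return scale * simp_term(a, n_h) - (n_h - n_r) / (n_r - 1) * total ** 2 / n_h

# Hypothetical stratum with n_h = 12 sampled units and n_rh = 5 respondents.
a = [12.0, 0.0, 30.0, 18.0, 0.0]
direct = uc_direct(a)             # 810.0 up to floating-point error
rewritten = uc_rewritten(a, 12)   # same value
simp = simp_term(a, 12)           # larger than the ultimate cluster value
```

In this toy stratum the ultimate cluster value is well below the simplified one, in line with the negative bias discussed above.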

For the calibrated estimator, the benchmark variance estimator is

\begin{matrix} v_{mult} ({\hat{Y}}^{w}) = \sum_{h = 1}^{H} [\frac{n_{h}}{n_{h} - 1} \sum_{i \in s_{h}} {(d_{i} u_{3 i} - \frac{1}{n_{h}} \sum_{j \in s_{h}} d_{j} u_{3 j})}^{2}], \\ \text{with} \quad u_{3 i} = \frac{r_{i} e_{i}}{{\hat{p}}_{g (i)}} - (\frac{r_{i}}{{\hat{p}}_{g (i)}} - 1) (π_{i} {\bar{e}}_{rg (i)}), \end{matrix}

(46)

and where $e_{i}$ and ${\bar{e}}_{rg}$ are defined in Equations (21) and (23), respectively; see Bessonneau et al. (2021, Equation 2.11). This estimator correctly accounts for the variance due to non-response and for the calibration, but overestimates the sampling variance if the sampling fractions inside strata are non-negligible. An alternative is the ultimate cluster variance estimator, using the calibrated weights instead of the weights corrected for unit non-response:

\begin{matrix} v_{uc} ({\hat{Y}}^{w}) = \sum_{h = 1}^{H} [\frac{n_{rh}}{n_{rh} - 1} \sum_{i \in r_{h}} {(w_{i} y_{i} - \frac{1}{n_{rh}} \sum_{j \in r_{h}} w_{j} y_{j})}^{2}] . \end{matrix}

(47)

This variance estimator is asymptotically equivalent to $v_{uc} ({\hat{\hat{Y}}}^{nr})$ . It suffers from two sources of bias, which act in opposite directions. On the one hand, $v_{uc} ({\hat{Y}}^{w})$ is expected to underestimate the variance due to non-response; see our discussion on $v_{uc} ({\hat{\hat{Y}}}^{nr})$ . On the other hand, it does not account for the variance reduction due to calibration, which results in an overestimation of the variance.

The ultimate cluster variance estimator is widespread among survey data users, due to its simplicity. It is included in statistical software like the procedure SURVEYMEANS of SAS®, with the possibility to perform a finite population correction at the first-stage inside strata (Valliant 2004).

3. Description of the Data

In this section, we describe an application with real data, in which we compare the results obtained with the analytic variance estimator and the bootstrap variance estimator. This application is based on the survey on “Racism and ethno-racial discrimination,” run by the Luxembourg Institute of Socio-Economic Research (LISER) in 2021. A sample of 15,000 individuals aged eighteen years and older was selected with a probabilistic sampling design to reply to an online questionnaire. The survey was conducted between March and December 2021 and the conclusions were presented in 2022. The main aim of the survey is to measure the perceptions of the residents in Luxembourg and of the minority groups, with respect to discrimination or racism in several aspects of daily life: employment, housing, health, education, relations with public administration, among others. Discrimination or racism is defined as distinctions and/or marginalization based on personal characteristics such as skin color, religion, country of birth or nationality, language accent, dressing, or cultural practices. Minority groups are defined with respect to the country of birth.

We begin by summarizing the methodological plan for the survey. It is divided into four parts: definition of the reference population or sampling frame, definition of the sampling design, choice of the (undesirable) non-response correction model, and choice of the final calibration model on margins. Further information can be found in Docquier et al. (2022).

3.1. Sampling Frame

The study population is defined as the people residing in Luxembourg in October 2020, aged eighteen years and older (age defined as at December 2020). For this survey, the study population coincides with the reference population. The sampling frame of the reference population is defined with the existing administrative registers in Luxembourg, precisely the register of the physical persons (RNPP in French) and the register of affiliated individuals in the Luxembourgish social security, held by the Inspection of social security (IGSS, in French). These two registers contain the main socio-demographic information (age, gender, nationality, …) and basic socioeconomic information such as characteristics of employment (if any) and gross salary or other social benefits (see Inspection Générale de la Sécurité Sociale 2022, for further details). After some cleaning, the size of the reference population is set at 518 104 individuals.

3.2. Sampling Design

The sample is selected under two constraints. Firstly, it should be possible to compute the variance of the considered estimators (mainly totals and means, but also ratios). Secondly, the study requires an efficient analysis of the opinions of the minority groups. It is well known that those groups are reluctant to reply. There should therefore be, at least at the sampling phase, a minimum number of individuals sampled within this category. A stratified without-replacement probabilistic sampling design is chosen. The stratification variable is a combination of groups of birth countries and, for some of those groups, the affiliation status to the social security. Precisely, birth countries are divided into thirteen groups: Luxembourg; Germany; Belgium; France; Portugal; Italy; Other European Union (EU) 14; Other EU27; Other Europe; North Africa and Muslim Asia; Sub-Saharan Africa; Other Asia (including Oceania and others); America. The birth country Luxembourg is divided into four groups, depending on the status of social security affiliation (workers of any type; unemployed and non-affiliated to the social security; retired; co-insured, that is, insured by means of another person). The sub-population of residents of Luxembourg born in Portugal has experienced a continuous growth since the implementation of guest-worker programs for manual (low-educated) workers in 1970 (Heinz et al. 2013) and then the economic crisis (Hartmann-Hirsch and Amétépé 2023). Portuguese citizens represented 16% of Luxembourg’s population in 2011, and 14.5% in 2021 (Klein and Peltier 2023). They also represent the largest foreign community among the immigrant population, with 30.8% in 2021 (Klein and Peltier 2023). For web-mode surveys, it is well known that low-educated individuals tend to have a lower response rate. Moreover, previous experience with surveys indicates that Portuguese are not receptive to the web mode either (Mathä et al. 2023). Benefiting from the large number of people born in Portugal, we can partition this stratum into three groups using the working status as a proxy of education. The birth country Portugal is thus split into three groups of social security affiliation (manual workers; other type of workers; other type of affiliation). Overall, there are $H = 18$ strata.

The sample of 15,000 individuals is allocated according to the following criteria. Firstly, 8,500 individuals are allocated proportionally across strata. Secondly, 1,500 individuals are allocated to the strata of people born in Portugal. Thirdly, 5,000 individuals are allocated to the strata of people born outside European countries. The over-sampling inside those strata compensates for an anticipated lower response rate, which has been observed in previous surveys. A minimum sample size (achieved with the over-sampling) is therefore set, to try to reach a sufficient number of respondents for efficient estimation. Table 1 displays the population and sample sizes by stratum.

Table 1.

Distribution of the Population and Sample Across Strata.

Stratum	Birth country	Social security affiliation	$N_{h}$	$n_{h}$
1	Luxembourg	Worker	112,405	1,844
2		Unemployed, voluntary assured, non-affiliated, other	38,150	626
3		Retired	43,994	722
4		Co-assured	40,160	659
5	Germany		15,372	252
6	Belgium		19,649	322
7	France		40,226	660
8	Portugal	Manual worker	29,161	1,110
9		Other worker	13,075	499
10		Other affiliation	26,951	1,026
11	Italy		17,934	294
12	Other EU14		17,591	289
13	Other EU27		19,275	316
14	Other Europe		26,149	429
15	North Africa and Muslim Asia		13,268	1,362
16	Other Africa		20,200	2,072
17	Other Asia and Oceania and Other		12,692	1,302
18	America		11,852	1,216
Total			518,104	15,000
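The allocation rule can be checked against Table 1 with a short Python sketch. It rests on an assumption that is not stated in the text: inside the Portugal strata (8 to 10) and the non-European strata (15 to 18), the extra 1,500 and 5,000 units are taken to be spread proportionally to $N_{h}$. The function name `allocate` is hypothetical.

```python
# Population sizes N_h and sample sizes n_h copied from Table 1 (strata 1 to 18).
N = [112405, 38150, 43994, 40160, 15372, 19649, 40226, 29161, 13075,
     26951, 17934, 17591, 19275, 26149, 13268, 20200, 12692, 11852]
n_table = [1844, 626, 722, 659, 252, 322, 660, 1110, 499,
           1026, 294, 289, 316, 429, 1362, 2072, 1302, 1216]
PORTUGAL = {8, 9, 10}
NON_EUROPE = {15, 16, 17, 18}

def allocate(N, base=8500, extra_pt=1500, extra_ne=5000):
    """Proportional allocation of `base`, plus extras spread proportionally inside each group (assumed)."""
    total = sum(N)
    N_pt = sum(N[h - 1] for h in PORTUGAL)
    N_ne = sum(N[h - 1] for h in NON_EUROPE)
    alloc = []
    for h, N_h in enumerate(N, start=1):
        n_h = base * N_h / total
        if h in PORTUGAL:
            n_h += extra_pt * N_h / N_pt
        if h in NON_EUROPE:
            n_h += extra_ne * N_h / N_ne
        alloc.append(n_h)
    return alloc

alloc = allocate(N)
# Largest absolute gap between the unrounded allocation and the published n_h.
max_gap = max(abs(a - t) for a, t in zip(alloc, n_table))
```

Under this assumption the unrounded allocation stays within about one unit of every published $n_{h}$, the remaining gaps being compatible with rounding.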

3.3. Non-Response Correction

After one month of fieldwork, there are 2,949 valid responses (around 20% response rate). Since we are no longer working with the full probability sample but only with a sub-sample, we might introduce non-response bias. Indeed, the behavior of the respondents may differ from the behavior of the non-respondents. To correct for the non-response, the well-known method of Response Homogeneity Groups (RHGs) is implemented (Särndal and Swensson 1987). This method consists in partitioning the sample so that the response behavior is fairly homogeneous within each group, but differs between groups. In practice, five auxiliary variables explain the non-response: nationality, income class, age class, affiliation status to the social security, and country of birth. The RHGs are defined in Table 2, where the number of respondents and the response rate are also given for each group. We wish to highlight the variability in the response rates. On the one hand, the response rate of Portuguese nationals is very low, at approximately half of the average response rate. On the other hand, the non-Portuguese with income greater than 5,600€ have the highest response rate, which is about 33%. These differences clearly indicate that a non-negligible bias could be introduced by non-response if no correction is applied.

Table 2.

Response Homogeneity Groups (RHGs): Description and Response Rate (in %).

RHG	Description	Number of respondents ( $n_{r}$ )	Response rate (in %)
RHG1	Portuguese	345	10.70
RHG2	Non Portuguese; income less than 1,800€	271	15.87
RHG3	Non Portuguese; income between 1,800 and 5,599€; less than 30 years old	344	18.23
RHG4	Non Portuguese; income between 1,800 and 5,599€; 30 years old or older; non blue collar and country of birth not in France, Belgium, Germany	977	19.84
RHG5	Non Portuguese; income between 1,800 and 5,599€; 30 years old or older; blue collar; non blue collar and country of birth: France, Belgium, Germany	451	29.19
RHG6	Non Portuguese; income of 5,600€ and more	561	32.77
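To make the size of the correction concrete, the sketch below turns the response rates of Table 2 into adjustment factors $1 / {\hat{p}}_{g}$ and recovers the implied number of sampled units per group. It assumes that the published rates are unweighted ratios of respondents to sampled units; the variable names are hypothetical.

```python
# (number of respondents, response rate in %) copied from Table 2.
rhg = {
    "RHG1": (345, 10.70), "RHG2": (271, 15.87), "RHG3": (344, 18.23),
    "RHG4": (977, 19.84), "RHG5": (451, 29.19), "RHG6": (561, 32.77),
}

# Adjustment factor applied to the design weight of each respondent: 1 / p_hat_g.
factors = {g: 100.0 / rate for g, (n_r, rate) in rhg.items()}

# Implied number of sampled units per RHG (respondents divided by the response rate).
implied_n = {g: n_r * factors[g] for g, (n_r, rate) in rhg.items()}

total_respondents = sum(n_r for n_r, rate in rhg.values())
overall_rate = 100.0 * total_respondents / 15000
```

The design weight of a respondent in RHG1 is multiplied by roughly 9.3, against roughly 3.1 in RHG6, and the implied group sizes add up to the 15,000 sampled units up to rounding of the published rates.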

3.4. Calibration on Margins

The last step is the so-called calibration on margins. The idea is to adjust the data from the survey to external population totals. Usually, this calibration is done after the non-response correction. Apart from the RHGs defined before (see Table 2), the calibration variables are: Country of birth (Luxembourg; Portugal; France, Belgium, Germany; Other Europe; North Africa and Muslim Asia; Other Africa; Asia, Oceania and Other; America); Nationality (Luxembourgish; Other European; African; Asian; American); gender and age group (women: 18–29 years old; 30–39 years old; 40–54 years old; 55 and older; men: 18–29 years old; 30–39 years old; 40–54 years old; 55 and older); combination of affiliation status and income class (manual worker and unemployed; non-manual worker with income lower than 1,800€; non-manual worker with income 1,800 to 5,599€ and self-employed; white collar with income higher than 5,600€ and civil servant; retired, invalid or pre-retired; voluntary assured and non-affiliated); co-insured. The auxiliary variables used for the calibration are given in Table 3. The calibration was performed with the raking ratio distance function (Deville and Särndal 1992). The CALMAR macro (Sautory 1993) allows easy implementation of the calibration procedure with the SAS software.
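When all calibration variables are categorical margins, raking ratio calibration is commonly computed by iterative proportional fitting: the weights are rescaled to each margin in turn until all margins are matched. The Python sketch below illustrates this idea on a hypothetical four-unit example. It is not the CALMAR macro, and the function name `rake`, the categories, and the totals are invented for illustration.

```python
def rake(d, cats, margins, max_iter=200, tol=1e-12):
    """Iterative proportional fitting of the weights d to categorical margins.

    cats[i] maps each calibration variable to the category of unit i;
    margins maps each variable to its known population totals by category.
    """
    w = list(d)
    for _ in range(max_iter):
        worst = 0.0
        for var, targets in margins.items():
            sums = dict.fromkeys(targets, 0.0)
            for w_i, c in zip(w, cats):
                sums[c[var]] += w_i
            for i, c in enumerate(cats):
                factor = targets[c[var]] / sums[c[var]]
                w[i] *= factor
                worst = max(worst, abs(factor - 1.0))
        if worst < tol:           # all margins already matched
            break
    return w

# Hypothetical respondents with weights corrected for non-response.
cats = [{"sex": "F", "age": "young"}, {"sex": "F", "age": "old"},
        {"sex": "M", "age": "young"}, {"sex": "M", "age": "old"}]
d_nr = [10.0, 20.0, 30.0, 40.0]
margins = {"sex": {"F": 60.0, "M": 40.0}, "age": {"young": 50.0, "old": 50.0}}
w = rake(d_nr, cats, margins)
```

The resulting weights are positive by construction, which is one practical reason for preferring the raking ratio to the linear distance function.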

Table 3.

Calibration Variables and Associated Totals.

Variables	Description	Population
Birth country	Luxembourg	234,709
	Portugal	69,187
	France, Belgium, Germany	75,247
	Other Europe	80,949
	North Africa and Muslim Asia	13,268
	Other Africa	20,200
	Asia and Oceania and Other	12,692
	America	11,852
Nationality	Luxembourgish	273,410
	Other European	214,572
	African	15,148
	Asian	8,732
	American	6,242
Gender age group	Women 18–29 years old	49,941
	Women 30–39 years old	49,643
	Women 40–54 years old	68,993
	Women 55 and older	89,958
	Men 18–29 years old	52,782
	Men 30–39 years old	50,616
	Men 40–54 years old	72,454
	Men 55 and older	83,717
Affiliation, Income	Manual worker, unemployed	85,932
	Non-manual worker, less than 1,800€	28,983
	Non-manual worker 1,800–5,599€, self employed	77,165
	Non-manual worker over 5,600€, civil servant	78,781
	Retired	75,044
	Voluntary insured, not affiliated	102,431
	Co-insured	69,768
MGH	MGH1	76,516
	MGH2	47,121
	MGH3	71,485
	MGH4	179,909
	MGH5	72,830
	MGH6	70,243

4. Application

In this section, we compare the results obtained with the analytic variance estimator presented in Subsection 2.1, and the bootstrap variance estimator presented in Algorithm 3, when estimating a proportion. From the data described in Section 3, we consider the following variable of the questionnaire (Docquier et al. 2022): Regarding your attitude toward racism (hierarchies between human groups related to skin color, country of origin, religion, the consonance of a surname/given name, clothing and cultural practices, etc.), which of the following statements comes closest to your view?

(1) Human races do not exist.

(2) All human races are equal.

(3) There are races superior or inferior to others.

(4) No response.

(5) I don’t know.

Our aim is to estimate the distribution of the responses to this question, which can also be calculated as the proportion of people replying yes to each statement. The modalities are mutually exclusive, and we can therefore transform the question into five dummy variables.

We define the variable of interest $y_{j \cdot}, j = 1, \dots, 5$ as

\begin{matrix} y_{ji} = \left\{ \begin{matrix} 1 & \text{if the individual } i \text{ replies yes to question } j, \\ 0 & \text{otherwise,} \end{matrix} \right. \end{matrix}

for $i \in r$ . Since the population total $N$ is known, the proportion can be computed as the population mean. Define ${\bar{Y}}_{j} = N^{- 1} \sum_{i \in U} y_{ji}$ as the true proportion of individuals replying yes to each question $j = 1, \dots, J$ . This parameter can be estimated by

{\hat{\bar{Y}}}_{j}^{w} = N^{- 1} {\hat{Y}}_{j}^{w} = N^{- 1} \sum_{i \in r} w_{i} y_{ji} .

(48)

The analytic variance estimator $v_{lin} ({\hat{\bar{Y}}}_{j}^{w})$ associated to the estimator in Equation (48) is computed as follows. For each question $j = 1, \dots, J$ , we obtain the estimated residuals ${\hat{e}}_{ji}$ from the weighted regression of $y_{ji}$ on the calibration variables $x_{i}$ listed in Table 3. Then we compute $g_{i} {\hat{e}}_{ji}$ where $g_{i} = w_{i} / d_{i}^{nr}$ is the so-called g-weight, and we replace $e_{i}$ with $g_{i} {\hat{e}}_{ji}$ in Equation (22).

The results for each question $j = 1, \dots, J$ are summarized in Table 4. We display the estimated proportion (in percentage) and the associated standard deviation. Moreover, the lower bound (LB) and the upper bound (UB) of the confidence intervals are given, as well as the coefficient of variation (in percentage). The confidence intervals are computed under the assumption of asymptotic normality, with a nominal one-tailed error rate of $2.5\%$ . We are interested in confidence intervals of level $1 - 2 α$ . With $α = 2.5\%$ , we obtain confidence intervals at 95%.

Table 4.

Estimated Proportion ${\hat{\bar{Y}}}_{j}^{w}$ , Standard Deviation (SD), Lower Bound (LB) and Upper Bound (UB) of the Confidence Interval, and Coefficient of Variation (CV), all in %, for the J = 5 questions of “Attitude towards racism”. The Analytic Variance Estimator Presented in Section 2.1.4 is Used for Variance Estimation.

Question	$j$	${\hat{\bar{Y}}}_{j}^{w}$	SD	LB	UB	CV
Human races do not exist	1	24.32	0.93	22.49	26.15	3.83
All human races are equal	2	62.72	1.07	60.63	64.82	1.70
There are races superior or inferior to others	3	4.29	0.47	3.37	5.20	10.90
No response	4	5.84	0.53	4.79	6.88	9.15
I don’t know	5	2.83	0.37	2.11	3.56	13.02
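The bounds and coefficients of variation in Table 4 follow from the estimate and its standard deviation. The Python sketch below recomputes them from the rounded values printed in the table, so small discrepancies in the last digit are expected; the variable names are hypothetical.

```python
from statistics import NormalDist

# (estimate, SD, LB, UB, CV), all in %, copied from Table 4.
table4 = {
    1: (24.32, 0.93, 22.49, 26.15, 3.83),
    2: (62.72, 1.07, 60.63, 64.82, 1.70),
    3: (4.29, 0.47, 3.37, 5.20, 10.90),
    4: (5.84, 0.53, 4.79, 6.88, 9.15),
    5: (2.83, 0.37, 2.11, 3.56, 13.02),
}

alpha = 0.025
z = NormalDist().inv_cdf(1 - alpha)    # standard normal quantile of order 1 - alpha

recomputed = {}
for j, (est, sd, lb, ub, cv) in table4.items():
    recomputed[j] = (est - z * sd,      # lower bound
                     est + z * sd,      # upper bound
                     100 * sd / est)    # coefficient of variation in %

# Largest gaps with respect to the published bounds and CVs.
max_bound_gap = max(max(abs(recomputed[j][0] - table4[j][2]),
                        abs(recomputed[j][1] - table4[j][3])) for j in table4)
max_cv_gap = max(abs(recomputed[j][2] - table4[j][4]) for j in table4)
```

The recomputed bounds agree with the published ones to within about 0.01 percentage points, and the coefficients of variation to within 0.1, which is consistent with the table having been computed from unrounded standard deviations.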

The coefficient of variation is relatively low for $j \in \{1, 2\}$ , since these modalities are chosen by a large share of the respondents (more than half of them for $j = 2$ ). However, for $j = 3$ the CV is higher than 10%. We highlight the statements $j \in \{2, 3\}$ since those are the statements with the most respondents ( $j = 2$ ) and with scarce respondents ( $j = 3$ ), respectively. We do not consider further $j \in \{4, 5\}$ .

We then perform the bootstrap procedure presented in Algorithm 3, using the raking ratio distance function in Step 3. We obtain four sets of $B = 500, 1 000, 1 500, 2 000$ bootstrap final weights respectively, to evaluate the effect of the number of bootstrap iterations on variance estimation.

We compute a normality-based confidence interval (CI) using the analytic variance estimator in Equation (22)

\begin{matrix} I C_{Ana} ({\bar{Y}}_{j}^{w}) = [{\hat{\bar{Y}}}_{j}^{w} \pm Z_{1 - α} {v_{lin} ({\hat{\bar{Y}}}_{j}^{w})}^{1 / 2}], \end{matrix}

(49)

where $Z_{1 - α}$ is the quantile of order $1 - α$ of the standard normal distribution. We compare the coverage results with those under three bootstrap based confidence intervals as in Bessonneau et al. (2021). Firstly, the bootstrap normality based confidence interval is obtained by replacing in Equation (49) the analytic variance estimator with the bootstrap variance estimator in (40), which leads to

I C_{Norm} ({\bar{Y}}_{j}^{w}) = [{\hat{\bar{Y}}}_{j}^{w} \pm Z_{1 - α} {{\tilde{v}}_{b} ({\hat{\bar{Y}}}_{j}^{w})}^{1 / 2}] .

(50)

Secondly, the percentile confidence interval relies upon the assumption that the conditional distribution of the bootstrap calibrated estimators in Equation (39) is a good approximation of the distribution of the calibrated estimator ${\hat{\bar{Y}}}_{j}^{w}$ . The bootstrap calibrated estimators are ranked in ascending order as ${\hat{\bar{Y}}}_{j}^{w * (b)}, b = 1 \dots, B$ , and the percentile confidence interval is

I C_{Perc} ({\bar{Y}}_{j}) = [{\hat{\bar{Y}}}_{j}^{w * (α B)}, {\hat{\bar{Y}}}_{j}^{w * ((1 - α) B)}] .

(51)

Thirdly, the reverse percentile confidence interval (a.k.a. basic confidence interval) uses the conditional distribution of ${\hat{\bar{Y}}}_{j}^{w *} - {\hat{\bar{Y}}}_{j}^{w}$ to approximate the distribution of ${\hat{\bar{Y}}}_{j}^{w} - {\bar{Y}}_{j}$ . The confidence interval is

I C_{PercInv} ({\bar{Y}}_{j}) = [2 {\hat{\bar{Y}}}_{j}^{w} - {\hat{\bar{Y}}}_{j}^{w * ((1 - α) B)}, 2 {\hat{\bar{Y}}}_{j}^{w} - {\hat{\bar{Y}}}_{j}^{w * (α B)}] .

(52)
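The three bootstrap confidence intervals in Equations (50) to (52) can be obtained from the vector of replicate estimates as in the Python sketch below. The handling of the ranks $α B$ and $(1 - α) B$ (rounded to the nearest integer) is one possible convention, and the function name and the artificial replicates are hypothetical.

```python
from statistics import NormalDist, stdev

def bootstrap_cis(theta_hat, reps, alpha=0.025):
    """Normality-based, percentile, and reverse percentile intervals from bootstrap replicates."""
    B = len(reps)
    z = NormalDist().inv_cdf(1 - alpha)
    sd = stdev(reps)                          # divisor B - 1, as in Equation (40)
    ordered = sorted(reps)
    k_lo = max(round(alpha * B), 1)           # rank alpha * B (assumed rounding rule)
    k_hi = min(round((1 - alpha) * B), B)     # rank (1 - alpha) * B
    lo, hi = ordered[k_lo - 1], ordered[k_hi - 1]
    return {
        "Norm": (theta_hat - z * sd, theta_hat + z * sd),      # Equation (50)
        "Perc": (lo, hi),                                      # Equation (51)
        "PercInv": (2 * theta_hat - hi, 2 * theta_hat - lo),   # Equation (52)
    }

# Artificial replicates 0.1, 0.2, ..., 100.0 around a point estimate of 50.
reps = [i / 10 for i in range(1, 1001)]
cis = bootstrap_cis(50.0, reps)
```

When the bootstrap distribution is symmetric around the point estimate, as in this artificial example, the percentile and reverse percentile intervals coincide; they differ when the distribution is skewed.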

We first study the distribution of the bootstrap calibrated estimators ${\hat{\bar{Y}}}_{j}^{w *}$ given in (39), to check whether this estimator is approximately normally distributed. Figure 1 displays the empirical and the theoretical density, the Q-Q plot, the empirical and theoretical cumulative distribution function (CDF), and the P-P plot for the bootstrap estimator ${\hat{\bar{Y}}}_{2}^{w *}$ associated to the variable $y_{2 i}$ , with $B = 500$ (left part) or $B = 2 000$ (right part) bootstrap replications. Similar results are presented in Figure 2 for the variable $y_{3 i}$ . For ${\hat{\bar{Y}}}_{2}^{w *}$ , in which the responses are more concentrated, we see from Figure 1 that with $B = 500$ replications, the adequacy with the normal distribution is good. As expected, when increasing to $B = 2 000$ bootstrap replications, the conclusion remains the same. In the case of ${\hat{\bar{Y}}}_{3}^{w *}$ where the respondents are more scarce, it is seen from the left part of Figure 2 that the adequacy with the normal distribution is slightly worse with $B = 500$ replications. In this case, a larger number of replications may be needed, and we obtain from the right part of Figure 2 that with $B = 2 000$ replications, the fit is better. We obtained no qualitative difference with $B = 1 000$ or $B = 1 500$ replications, and the results are therefore not reported.

Figure 1.

Empirical and theoretical density (top left), Q-Q plot (top right), empirical and theoretical CDF (bottom left), and P-P plot (bottom right) for the distribution of ${\hat{\bar{Y}}}_{2}^{w *}$ with $B = 500$ bootstrap replications (left panel) or $B = 2000$ bootstrap replications (right panel).

Figure 2.

Empirical and theoretical density (top left), Q-Q plot (top right), empirical and theoretical CDF (bottom left), and P-P plot (bottom right) for the distribution of ${\hat{\bar{Y}}}_{3}^{w *}$ with $B = 500$ bootstrap replications (left panel) or $B = 2000$ bootstrap replications (right panel).

Figure 3 displays the normality based confidence interval for ${\hat{\bar{Y}}}_{2}^{w}$ (left panel) and for ${\hat{\bar{Y}}}_{3}^{w}$ (right panel), obtained with the analytic variance estimator and the bootstrap variance estimator with B = 500 to B = 2,000 replications. The results obtained are very similar for both variables, irrespective of the number of bootstrap replications. However, we have to be cautious since, as seen from Table 4, the CV may be a bit large.

Figure 3.

Normality based confidence intervals (Norm) for ${\hat{\bar{Y}}}_{2}^{w}$ (left), and for ${\hat{\bar{Y}}}_{3}^{w}$ (right) obtained with the analytic variance (black line), B = 500 (brown line), 1,000 (blue line), 1,500 (green line), and 2,000 (yellow line) bootstrap replicate weights respectively (in percentage).

Figure 4 depicts normality based (Norm), percentile (Perc), and reverse (PercInv) confidence intervals for ${\hat{\bar{Y}}}_{2}^{w}$ (left panel) and ${\hat{\bar{Y}}}_{3}^{w}$ (right panel) obtained with the bootstrap replicate weights. The three confidence intervals display similar results for ${\hat{\bar{Y}}}_{2}^{w}$ and ${\hat{\bar{Y}}}_{3}^{w}$ . Tables A.1 to A.3 support this conclusion. The right part of Figure 4 shows higher instability of the three confidence intervals. This may be due to the fact that responses are scarcer for this statement. In this case, $500$ bootstrap replicates may not be enough to obtain a stable confidence interval.

Figure 4.

Normality based (Norm), percentile (Perc), and reverse (PercInv) confidence intervals for ${\hat{\bar{Y}}}_{2}^{w}$ (left), and for ${\hat{\bar{Y}}}_{3}^{w}$ (right) obtained with the B = 500 (brown line), 1,000 (blue line), 1,500 (green line), and 2,000 (yellow line) bootstrap replicate weights respectively (in percentage).

4.1. Simulation Study

We perform a simulation study to evaluate the performance of analytic and bootstrap variance estimators. We consider a realistic population, based on the survey “Racism and Ethno-racial Discrimination.” From the subset of respondents, we create a pseudo-population by duplicating each observation, using the calibrated weight rounded to the closest integer as the number of duplications. This leads to a pseudo-population of 518 094 individuals.
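The construction of the pseudo-population can be sketched as follows in Python. The function name, the records, and the weights are hypothetical, and ties at one half are rounded up here, which is an assumption since the rounding rule for ties is not specified.

```python
def pseudo_population(records, weights):
    """Duplicate each respondent record a number of times equal to its rounded calibrated weight."""
    population = []
    for record, w in zip(records, weights):
        copies = int(w + 0.5)          # round to the closest integer, halves rounded up (assumed)
        population.extend([record] * copies)
    return population

# Three hypothetical respondents with calibrated weights 2.4, 3.6, and 1.5.
records = ["a", "b", "c"]
weights = [2.4, 3.6, 1.5]
population = pseudo_population(records, weights)
```

Because each weight is rounded separately, the size of the pseudo-population need not equal the sum of the calibrated weights exactly.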

We consider two variables from the questionnaire. Firstly, the same variable as in Section 3, only for the cases $j = 2$ (“All human races are equal”) and $j = 3$ (“There are races superior or inferior to others”). Secondly, we consider the variable “social environment” measured in terms of income. The complete description and some descriptive statistics of this last variable are given in Appendix “Perception of Social Environment.” The peculiarity of this variable is that the estimated proportion for category $j = 3$ (Medium) is nearly 0.5, which corresponds to a large variability.

The sampling design and the non-response pattern are the same as in the real survey. A stratified simple random sample $s$ of size $n_{s} = 15 000$ is selected without replacement, following the same allocation criteria as described in Subsection 3.2. The sample is partitioned into six response homogeneity groups (RHGs), using the definition given in Table 2. Each unit inside an RHG responds independently of the others, with the same response probability as in the survey; see Table 2. This leads to the sub-sample of respondents $r$ . The sampling and the non-response steps are repeated $L = 2 000$ times.

For a binary variable $y$ , we are interested in the proportion $\bar{Y} = Y / N$ . We first consider the estimator of the proportion adjusted for non-response

\begin{matrix} {\hat{\hat{\bar{Y}}}}^{nr} = \frac{{\hat{\hat{Y}}}^{nr}}{N}, \end{matrix}

(53)

where ${\hat{\hat{Y}}}^{nr}$ is defined in Equation (15). We compute the linearized variance estimator $v_{lin} ({\hat{\hat{\bar{Y}}}}^{nr}) = N^{- 2} v_{lin} ({\hat{\hat{Y}}}^{nr})$ , where $v_{lin} ({\hat{\hat{Y}}}^{nr})$ is defined in Equation (16), and the ultimate cluster variance estimator $v_{uc} ({\hat{\hat{\bar{Y}}}}^{nr}) = N^{- 2} v_{uc} ({\hat{\hat{Y}}}^{nr})$ , where $v_{uc} ({\hat{\hat{Y}}}^{nr})$ is defined in Equation (44). We also consider the calibrated estimator of the proportion

{\hat{\bar{Y}}}^{w} = \frac{{\hat{Y}}^{w}}{N},

(54)

where ${\hat{Y}}^{w}$ is defined in Equation (19). We compute the linearized variance estimator $v_{lin} ({\hat{\bar{Y}}}^{w}) = N^{- 2} v_{lin} ({\hat{Y}}^{w})$ , where $v_{lin} ({\hat{Y}}^{w})$ is defined in Equation (22), and the ultimate cluster variance estimator $v_{uc} ({\hat{\bar{Y}}}^{w}) = N^{- 2} v_{uc} ({\hat{Y}}^{w})$ , where $v_{uc} ({\hat{Y}}^{w})$ is defined in Equation (47). For the second variable of interest (“There are races superior or inferior to others”), we also consider the estimation of the proportion by education level, defined according to $edu = 1, 2, 3$ (low, medium, and high, respectively); see Table B.2 for the sample sizes inside each category, and for some descriptive statistics. We are therefore interested in the estimation of a ratio $R = Y / Z$ . The estimator adjusted for non-response is

\begin{matrix} {\hat{\hat{R}}}^{nr} = \frac{{\hat{\hat{Y}}}^{nr}}{{\hat{\hat{Z}}}^{nr}} . \end{matrix}

(55)

For example, for the estimation of the proportion for people with a low education level, we have $z_{i} = 1 (edu = 1)$, with $1 (\cdot)$ the indicator function, and $y_{i} = z_{i} y_{3 i}$. The linearized variance estimator $v_{lin} ({\hat{\hat{R}}}^{nr})$ is obtained by replacing in Equation (16) the variable $y_{i}$ with $u_{i} = ({\hat{\hat{Z}}}^{nr})^{- 1} (y_{i} - {\hat{\hat{R}}}^{nr} z_{i})$. The ultimate cluster variance estimator $v_{uc} ({\hat{\hat{R}}}^{nr})$ is obtained by replacing in Equation (44) the variable $y_{i}$ with the same variable $u_{i}$. The calibrated estimator is

\begin{matrix} {\hat{R}}^{w} = \frac{{\hat{Y}}^{w}}{{\hat{Z}}^{w}} . \end{matrix}

(56)

The linearized variance estimator $v_{lin} ({\hat{R}}^{w})$ is obtained by replacing in Equation (22) the variable $y_{i}$ with the variable $ℓ_{i}$ defined in Equation (26). The ultimate cluster variance estimator $v_{uc} ({\hat{R}}^{w})$ is obtained by replacing in Equation (47) the variable $y_{i}$ with the same variable $ℓ_{i}$ .
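To make the substitution concrete, the sketch below computes a weighted ratio and the linearized variable of the form $u_{i} = \hat{Z}^{-1} (y_{i} - \hat{R} z_{i})$ used above for the estimator adjusted for non-response. It is an illustration under assumptions: the weights `w` stand for whatever estimation weights apply (here arbitrary toy values), and the function name is hypothetical. The variable $ℓ_{i}$ of Equation (26) for the calibrated estimator is not reproduced here.

```python
def ratio_linearized_variable(w, y, z):
    """Return the weighted ratio R_hat = Y_hat / Z_hat and the linearized
    variable u_i = (y_i - R_hat * z_i) / Z_hat for each unit."""
    y_hat = sum(wi * yi for wi, yi in zip(w, y))
    z_hat = sum(wi * zi for wi, zi in zip(w, z))
    r_hat = y_hat / z_hat
    u = [(yi - r_hat * zi) / z_hat for yi, zi in zip(y, z)]
    return r_hat, u

# Toy data (assumed values): z_i flags the domain (e.g., edu = 1),
# y3 is the binary variable of interest, and y_i = z_i * y3_i.
w = [10.0, 20.0, 15.0, 5.0]
z = [1, 1, 0, 1]
y3 = [1, 0, 1, 0]
y = [zi * yi for zi, yi in zip(z, y3)]

r_hat, u = ratio_linearized_variable(w, y, z)
weighted_sum_u = sum(wi * ui for wi, ui in zip(w, u))
```

By construction the weighted total of $u_{i}$ is zero, which provides a quick sanity check before plugging $u_{i}$ into a variance formula for a total.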

For all the estimators in Equations (53) to (56), we also compute a bootstrap variance estimator obtained by applying Algorithm 3 with $B = 1\,000$. For each estimator $\hat{θ}$ of a parameter $θ$, we compute the normalized root mean square error

\begin{matrix} NRMSE (\hat{θ}) = 100 \times \frac{\sqrt{MSE (\hat{θ})}}{θ}, \end{matrix}

(57)

with $MSE (\hat{θ})$ a simulation-based approximation of the mean square error of $\hat{θ}$, obtained from an independent run of $10\,000$ simulations. To measure the bias of a variance estimator $v (\hat{θ})$, we use the Monte Carlo percent relative bias

\begin{matrix} RB \{v (\hat{θ})\} = 100 \times \frac{R^{- 1} \sum_{c = 1}^{R} v_{c} ({\hat{θ}}_{c}) - MSE (\hat{θ})}{MSE (\hat{θ})}, \end{matrix}

(58)

where $v_{c} ({\hat{θ}}_{c})$ stands for the variance estimator in the $c$-th simulated sample, $c = 1, \ldots, R$. As a measure of stability of $v (\hat{θ})$, we use the relative stability

\begin{matrix} RS \{v (\hat{θ})\} = 100 \times \frac{{[R^{- 1} \sum_{c = 1}^{R} \{v_{c} ({\hat{θ}}_{c}) - MSE (\hat{θ})\}^{2}]}^{1 / 2}}{MSE (\hat{θ})} . \end{matrix}

(59)
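The three performance measures in Equations (57) to (59) translate directly into code. The sketch below is only an illustration: the function names are ours, and the input values are made-up numbers, not results from the study.

```python
import math

def nrmse(mse, theta):
    """Normalized root mean square error, in percent (Equation (57))."""
    return 100.0 * math.sqrt(mse) / theta

def relative_bias(v, mse):
    """Monte Carlo percent relative bias of a variance estimator (Equation (58)).
    v holds the variance estimates obtained in the simulated samples."""
    return 100.0 * (sum(v) / len(v) - mse) / mse

def relative_stability(v, mse):
    """Relative stability of a variance estimator, in percent (Equation (59))."""
    mean_sq_dev = sum((vc - mse) ** 2 for vc in v) / len(v)
    return 100.0 * math.sqrt(mean_sq_dev) / mse

# Hypothetical inputs: a true proportion of 0.2, a simulation-based MSE of 1e-4,
# and four variance estimates from four simulated samples.
theta, mse = 0.2, 1e-4
v = [0.9e-4, 1.1e-4, 1.0e-4, 1.2e-4]

nrmse_pct = nrmse(mse, theta)
rb_pct = relative_bias(v, mse)
rs_pct = relative_stability(v, mse)
```

Note that the RS measures dispersion around the MSE rather than around the mean of the variance estimates, so it combines the bias and the variability of the variance estimator.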

We also compute the coverage rates of three confidence intervals: the percentile bootstrap interval, the basic (reverse percentile) bootstrap interval, and the normality-based interval, with a nominal error rate of 2.5% in each tail.
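For a single sample, the three types of interval can be obtained from the bootstrap replicates of the estimator as in the sketch below. It assumes a linear-interpolation quantile rule and a standard error computed as the standard deviation of the replicates; the paper's Algorithm 3 may use different conventions, and the function names and toy replicates are hypothetical.

```python
import math
import statistics

def quantile(sorted_vals, p):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def bootstrap_intervals(theta_hat, boot, alpha=0.05):
    """Percentile, basic (reverse percentile) and normality-based intervals
    built from the bootstrap replicates `boot` of the estimator."""
    b = sorted(boot)
    q_lo, q_hi = quantile(b, alpha / 2), quantile(b, 1 - alpha / 2)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    se = statistics.stdev(boot)  # assumed bootstrap standard error
    return {
        "percentile": (q_lo, q_hi),
        "basic": (2 * theta_hat - q_hi, 2 * theta_hat - q_lo),
        "normal": (theta_hat - z * se, theta_hat + z * se),
    }

# Toy replicates, evenly spread and symmetric around the point estimate.
theta_hat = 0.20
boot = [0.18 + 0.0004 * k for k in range(101)]
ci = bootstrap_intervals(theta_hat, boot)
```

With replicates that are symmetric around the point estimate, as in this toy example, the percentile and basic intervals coincide; they differ when the bootstrap distribution is skewed.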

The simulation results for the estimation of proportions are given in Table 5 for the bootstrap, and in Table 6 for the linearization and the ultimate cluster approach. We first note that the NRMSE is smaller with calibration, as could be expected, although the gain in efficiency is moderate. The bootstrap variance estimator is almost unbiased in all cases, with an absolute RB no larger than 4%. We note that not including the finite population corrections inside strata does not affect the performance of the bootstrap variance estimator. The linearization variance estimator tends to be slightly negatively biased, but its absolute RB is no larger than 6%. In terms of RS, the linearization variance estimator is usually slightly more stable. The coverage rates are fairly well respected in all cases for the three bootstrap procedures for interval estimation, and for the normality-based confidence interval with the linearized variance estimator. As expected (see Subsection 2.2.2), the ultimate cluster variance estimator is usually negatively biased for the estimator adjusted for non-response. The bias is particularly large when the proportion is close to 0.5 (in which case the variability of the estimated proportion is larger), with an RB of about −21% for “All human races are equal” and for the “Medium” social environment. For the calibrated estimator, the ultimate cluster variance estimator is usually positively biased, because the variance reduction due to calibration is not accounted for. The RB is as high as 16% for the calibrated estimation of the proportion of people who perceive their environment as modest. As expected, the coverage rate is too small if the ultimate cluster variance estimator is negatively biased, and too large if it is positively biased.

Table 5.

Simulation Results for the Bootstrap Procedure and the Estimation of Proportions.

Question	Parameter	NRMSE (%)	Bootstrap
			RB (%)	RS (%)	Percentile			Rev. percentile			Norm. based
					L	U	L+U	L	U	L+U	L	U	L+U
	Estimator adjusted for non-response
Attitude toward racism
All human races are equal	0.63	1.96	−0.56	5.32	2.05	2.65	4.70	2.30	2.85	5.15	2.15	2.55	4.70
There are races superior/inferior to others	0.04	10.48	0.17	12.87	1.60	3.50	5.10	1.45	4.25	5.70	1.45	3.75	5.20
Perception of social environment
Precarious	0.04	10.03	−2.45	14.92	1.45	3.15	4.60	1.20	4.25	5.45	1.30	3.35	4.65
Modest	0.20	4.22	0.47	7.13	2.00	2.60	4.60	1.65	2.80	4.45	1.75	2.50	4.25
Medium	0.51	2.44	−1.14	5.32	2.55	2.05	4.60	2.45	2.10	4.55	2.45	2.10	4.55
Wealthy	0.17	4.85	1.38	7.07	2.30	2.60	4.90	2.05	2.90	4.95	2.20	2.50	4.70
Don’t know	0.08	15.30	0.42	10.06	1.95	2.85	4.80	1.70	3.30	5.00	1.70	3.05	4.75
	Calibrated estimator
Attitude toward racism
All human races are equal	0.63	1.68	−0.27	5.37	2.90	2.40	5.30	3.05	2.40	5.45	2.75	2.20	4.95
There are races superior/inferior to others	0.04	10.42	0.00	12.77	1.75	3.95	5.70	1.65	4.40	6.05	1.60	4.00	5.60
Perception of social environment
Precarious	0.04	10.00	−3.17	14.40	1.55	3.05	4.60	1.20	4.05	5.25	1.20	3.45	4.65
Modest	0.20	4.01	0.49	6.47	1.85	2.60	4.45	1.90	2.60	4.50	1.80	2.50	4.30
Medium	0.51	2.15	−2.36	5.70	2.55	2.25	4.80	2.35	2.25	4.60	2.40	2.35	4.75
Wealthy	0.17	4.59	2.35	7.10	2.30	2.40	4.70	1.90	2.70	4.60	2.15	2.50	4.65
Don’t know	0.08	14.89	0.11	9.30	1.70	3.10	4.80	1.75	3.10	4.85	1.60	2.95	4.55

Table 6.

Simulation Results with the Linearization and the Ultimate Cluster Variance Estimators for the Estimation of Proportions.

Question	Parameter	NRMSE (%)	Linearization					Ultimate cluster approach
			RB (%)	RS (%)	L	U	L+U	RB (%)	RS (%)	L	U	L+U
	Estimator adjusted for non-response
Attitude toward racism
All human races are equal	0.63	1.96	−0.87	2.73	2.20	2.45	4.65	−21.35	21.49	4.05	4.25	8.30
There are races superior/inferior to others	0.04	10.48	−0.40	12.00	1.50	3.80	5.30	−0.93	11.86	1.55	3.80	5.35
Perception of social environment
Precarious	0.04	10.03	−3.16	14.10	1.25	3.30	4.55	−2.76	14.17	1.25	3.25	4.50
Modest	0.20	4.22	−0.72	5.44	1.90	2.80	4.70	2.98	6.65	1.75	2.55	4.30
Medium	0.51	2.44	−1.37	3.06	2.55	2.05	4.60	−21.00	21.12	3.40	3.45	6.85
Wealthy	0.17	4.85	0.80	5.26	2.30	2.55	4.85	−6.56	7.95	2.70	3.15	5.85
Don’t know	0.08	15.30	−0.24	8.95	1.80	3.10	4.90	−2.52	8.93	2.05	3.25	5.30
	Calibrated estimator
Attitude toward racism
All human races are equal	0.63	1.68	−2.70	3.94	2.95	2.20	5.15	9.75	10.45	1.85	1.90	3.75
There are races superior/inferior to others	0.04	10.42	−2.48	11.58	1.65	4.20	5.85	1.11	12.42	1.55	3.75	5.30
Perception of social environment
Precarious	0.04	10.00	−5.78	13.93	1.40	3.50	4.90	−1.32	14.19	1.20	3.05	4.25
Modest	0.20	4.01	−2.06	5.25	2.15	2.80	4.95	15.59	16.91	1.00	1.90	2.90
Medium	0.51	2.15	−4.79	5.44	2.30	2.30	4.60	3.52	4.82	1.95	1.85	3.80
Wealthy	0.17	4.59	−0.05	4.74	2.15	2.75	4.90	5.48	7.56	1.80	2.30	4.10
Don’t know	0.08	14.89	−2.42	8.34	1.55	3.15	4.70	3.80	9.85	1.25	2.85	4.10

The simulation results for the estimation of ratios are given in Table 7 for the bootstrap, and in Table 8 for the linearization and the ultimate cluster approach. In terms of NRMSE, there is virtually no difference between the estimator adjusted for non-response and the calibrated estimator. All the variance estimators perform well in terms of RB, which is no larger than 4% in absolute value in all cases, and perform similarly in terms of RS. Concerning the confidence intervals, we note that in all cases the nominal error rate of 2.5% is not well respected in either tail, which is likely due to the skewness of the distribution of the estimated ratio. Overall, the percentile bootstrap performs slightly better and the reverse percentile bootstrap performs slightly worse. The three normality-based confidence intervals perform similarly.

Table 7.

Simulation Results for the Bootstrap Procedure and the Estimation of Ratios.

Question	Parameter	NRMSE (%)	Bootstrap
			RB (%)	RS (%)	Percentile			Rev. percentile			Norm. based
					L	U	L+U	L	U	L+U	L	U	L+U
		Estimator adjusted for non-response
All human races are equal (by education level)
Low	0.06	28.41	0.38	32.03	1.50	5.45	6.95	0.65	8.90	9.55	0.95	7.10	8.05
Medium	0.04	18.05	−0.95	20.44	1.45	4.25	5.70	0.80	6.15	6.95	0.95	4.75	5.70
High	0.03	15.84	2.10	20.71	1.45	3.90	5.35	1.00	5.25	6.25	1.15	4.80	5.95
		Calibrated estimator
All human races are equal (by education level)
Low	0.06	28.46	0.50	32.35	1.40	5.80	7.20	0.75	9.10	9.85	1.00	7.10	8.10
Medium	0.04	18.03	−0.79	20.55	1.45	4.20	5.65	0.75	5.80	6.55	1.00	5.10	6.10
High	0.03	15.77	2.13	20.32	1.55	4.20	5.75	1.00	5.60	6.60	1.25	4.85	6.10

Table 8.

Simulation Results with the Linearization and the Ultimate Cluster Variance Estimators for the Estimation of Ratios.

Question	Parameter	NRMSE (%)	Linearization					Ultimate cluster approach
			RB (%)	RS (%)	L	U	L+U	RB (%)	RS (%)	L	U	L+U
	Estimator adjusted for non-response
All human races are equal (by education level)
Low	0.06	28.41	−0.46	31.67	0.85	7.30	8.15	0.02	31.77	0.85	7.20	8.05
Medium	0.04	18.05	−1.39	19.82	0.95	4.90	5.85	−0.97	19.89	0.95	4.80	5.75
High	0.03	15.84	1.31	19.89	1.20	4.75	5.95	1.70	19.99	1.20	4.75	5.95
	Calibrated estimator
All human races are equal (by education level)
Low	0.06	28.46	−2.35	30.91	1.15	7.35	8.50	1.11	33.37	0.80	7.05	7.85
Medium	0.04	18.03	−3.36	19.32	1.15	5.00	6.15	0.41	21.31	0.95	4.90	5.85
High	0.03	15.77	−0.51	19.03	1.45	4.90	6.35	3.67	20.95	1.25	4.50	5.75

5. Conclusions

In this article, we have compared the rescaled bootstrap with the linearization and the ultimate cluster variance estimators. Our simulation results indicate that the bootstrap and the linearization perform similarly in terms of relative bias of the variance estimator. On the other hand, the ultimate cluster variance estimator can be strongly biased for estimating the variance of the estimator of the total. The good performance of the bootstrap was not necessarily expected, since the sampling fractions in some strata are not negligible. The bootstrap therefore seems to be a viable approach, even with moderate sampling fractions.

These results may be of interest to survey data users, who rarely have access to all the information needed to compute the analytical variance, namely the design variables and the auxiliary variables used for the definition of the response homogeneity groups or for calibration.

For non-linear parameters, the variance may be estimated by using the linearization technique, which is somewhat laborious since a specific linearized variable needs to be computed for each parameter. On the other hand, the bootstrap makes it possible to estimate the variance by simply computing the dispersion of the bootstrapped versions of the estimator. The approach is therefore straightforward for data users, once replicated weights accounting for all the estimation steps (non-response and calibration) are released with the survey dataset.
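From the data user's side, the computation described above reduces to re-evaluating the estimator with each set of replicate weights and measuring the dispersion of the results. The sketch below illustrates this for a ratio. The replicate weights are toy values, the function names are hypothetical, and centering the dispersion on the mean of the replicates with a divisor of $B - 1$ is an assumption (some implementations center on the full-sample estimate instead).

```python
def weighted_ratio(w, y, z):
    """Ratio of two weighted totals."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * zi for wi, zi in zip(w, z))

def bootstrap_variance(replicate_weights, y, z):
    """Dispersion of the estimator recomputed with each set of replicate weights."""
    reps = [weighted_ratio(wb, y, z) for wb in replicate_weights]
    mean = sum(reps) / len(reps)
    return sum((t - mean) ** 2 for t in reps) / (len(reps) - 1)

# Toy data: four respondents and B = 3 sets of replicate weights (assumed values).
y = [1, 0, 1, 0]
z = [1, 1, 1, 1]
replicate_weights = [
    [10.0, 10.0, 10.0, 10.0],
    [20.0, 0.0, 10.0, 10.0],
    [0.0, 20.0, 10.0, 10.0],
]
v_boot = bootstrap_variance(replicate_weights, y, z)
```

The same two functions apply unchanged to any other parameter once `weighted_ratio` is swapped for the corresponding estimator, which is what makes the approach convenient compared with deriving a linearized variable for each parameter.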

The sampling design used in the survey on racism is relatively simple, since a sampling frame is available in the target population. Unfortunately, this sampling design is not very common for household and social surveys, for which some form of multistage sampling is more likely. An evaluation of the rescaled bootstrap on a real-life multistage survey would be of great practical interest. In other cases, a sampling frame is not available, and the target population may only be surveyed through an intermediary population by means of indirect sampling (Lavallée 2007), using the weight share method (Deville and Lavallée 2006). Variance estimation is somewhat complex under indirect sampling, since a synthetic variable needs to be used in the variance estimator (e.g., Chauvet et al. 2023). A valid bootstrap procedure for indirect sampling would be of both theoretical and practical interest.

Like many sample surveys, the survey on racism and discrimination suffers from significant non-response. In this context, it is particularly important to model the non-response mechanism as well as possible, in order to obtain adjusted estimators that limit the bias due to non-response. Beyond its use for variance estimation, the bootstrap can also be used to compare the effectiveness of estimators adjusted for non-response, for example by testing the division of the sample into finer response homogeneity groups. This aspect would be very important for users, and we plan to investigate it.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

María Guadarrama

Received: October 2023

Accepted: December 2024

References

Beaumont, J.-F., and Émond. 2022. “A Bootstrap Variance Estimation Method for Multistage Sampling and Two-Phase Sampling When Poisson Sampling is Used at the Second Phase.” Stats 5 (2): 339–57. DOI: https://doi.org/10.3390/stats5020019.

Beaumont, J.-F., and Haziza. 2016. “A Note on the Concept of Invariance in Two-Phase Sampling Designs.” Survey Methodology 42 (2): 319–23. https://www150.statcan.gc.ca/n1/pub/12-001-x/2016002/article/14662-eng.pdf.

Beaumont, J.-F., and Patak. 2012. “On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling.” International Statistical Review 80 (1): 127–48. DOI: https://doi.org/10.1111/j.1751-5823.2011.00166.x.

Bessonneau, Brilhaut, Chauvet, and Garcia. 2021. “With-Replacement Bootstrap Variance Estimation for Household Surveys Principles, Examples and Implementation.” Survey Methodology 47 (2): 313–47. http://www.statcan.gc.ca/pub/12-001-x/2021002/article/00005-eng.htm.

Caron. 1998. “Le logiciel poulpe : aspects méthodologiques.” Proceedings of the Journées de Méthodologie Statistique 84: 173–200. https://www.bnsp.insee.fr/ark:/12148/bc6p06xvh14/f1.pdf.

Chauvet. 2007. “Méthodes de bootstrap en population finie.” PhD thesis, University of Rennes 2.

Chauvet, Bouriaud, and Brion. 2023. “An Extension of the Weight Share Method When Using a Continuous Sampling Frame.” Survey Methodology 49 (1): 139–62. http://www.statcan.gc.ca/pub/12-001-x/2023001/article/00011-eng.htm.

Chen, Haziza, and Mashreghi. 2022. “A Comparison of Existing Bootstrap Algorithms for Multi-Stage Sampling Designs.” Stats 5 (2): 521–37. DOI: https://doi.org/10.3390/stats5020031.

Deville and Lavallée. 2006. “Indirect Sampling: The Foundations of the Generalized Weight Share Method.” Survey Methodology 32 (2): 165. https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2006002/article/9551-eng.pdf.

Deville, J.-C. 1999. “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.” Survey Methodology 25 (2): 193–204. https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X19990024882.

Deville, J.-C., and Särndal, C.-E. 1992. “Calibration Estimators in Survey Sampling.” Journal of the American Statistical Association 87 (418): 376–82. DOI: https://doi.org/10.1080/01621459.1992.10475217.

Deville, J.-C., Särndal, C.-E., and Sautory. 1993. “Generalized Raking Procedures in Survey Sampling.” Journal of the American Statistical Association 88 (423): 1013–20. DOI: https://doi.org/10.1080/01621459.1993.10476369.

Docquier, Tenikue, Brosius, et al. 2022. Le racisme et les discriminations ethno-raciales au Luxembourg: rapport d’étude quantitative et qualitative. Ministère de la Famille, de l’Intégration et à la Grande Région.

Efron. 1979. “Bootstrap Methods: Another Look at the Jackknife.” Annals of Statistics 7 (1): 1–26. DOI: https://doi.org/10.1214/aos/1176344552.

Hartmann-Hirsch, and Amétépé, F. S. 2023. “Portuguese Immigration in Luxembourg: The Right to Free Movement.” In Between Europeanisation and Renationalisation of the Free Movement of Persons: A Financial Crisis-Induced Migration from Portugal to Luxembourg, edited by C. Hartmann-Hirsch and F. S. Amétépé. Springer.

Heinz, Peltier, and Thill. 2013. “Les portugais au luxembourg.” STATEC. https://statistiques.public.lu/fr/publications/series/RP-2011–-Premiers-resultats/2013/rp11-18-13.html.

Horvitz, D. G., and Thompson, D. J. 1952. “A Generalization of Sampling Without Replacement from a Finite Universe.” Journal of the American Statistical Association 47 (260): 663–85. DOI: https://doi.org/10.1080/01621459.1952.10483446.

Inspection Générale de la Sécurité Sociale. 2022. “Luxembourg Microdata Platform on Labour and Social Protection: Data Dictionary.” https://igss.gouvernement.lu/dam-assets/microdata-platform/data-dictionary.pdf.

Juillard and Chauvet. 2018. “Variance Estimation Under Monotone Non-Response for a Panel Survey.” Survey Methodology 44 (2): 1–35. https://www150.statcan.gc.ca/n1/pub/12-001-x/2018002/article/54952-eng.pdf.

Kim, J. K., and Kim, J. J. 2007. “Nonresponse Weighting Adjustment Using Estimated Response Probability.” Canadian Journal of Statistics 35 (4): 501–514. DOI: https://doi.org/10.1002/cjs.5550350403.

Klein and Peltier. 2023. “Une population de plus en plus cosmopolite – Nous comptons car vous comptez 2021 – Recensement de la population Luxembourg.” STATEC. https://statistiques.public.lu/dam-assets/recensement/publication-5/docs/rp01-05-population-nationalit-v10.pdf.

Lavallée. 2007. Indirect Sampling. New York: Springer Science & Business Media.

Mashreghi, Haziza, and Léger. 2016. “A Survey of Bootstrap Methods in Finite Population Sampling.” Statistics Surveys 10: 1–52. DOI: https://doi.org/10.1214/16-SS113.

Mathä, T. Y., Montes-Viñas, Pulina, and Ziegelmeyer. 2023. “The Luxembourg Household Finance and Consumption Survey: Results from the Fourth Wave in 2021.” Technical Report, Central Bank of Luxembourg.

McCarthy, P. J. 1969. “Pseudo-Replication: Half Samples.” Revue de l’Institut International de Statistique 37: 239–64. DOI: https://doi.org/10.2307/1402116.

Quenouille, M. H. 1956. “Notes on Bias in Estimation.” Biometrika 43 (3/4): 353–60. DOI: https://doi.org/10.1093/biomet/43.3-4.353.

Rao, J. N. K. 2006. “Bootstrap Methods for Analyzing Complex Sample Survey Data.” Proceedings of Statistics Canada International Symposium Series. Symposium.

Rao, J. N. K. 1988. “Resampling Inference with Complex Survey Data.” Journal of the American Statistical Association 83 (401): 231–41. DOI: https://doi.org/10.1080/01621459.1988.10478591.

Rao, J. N. K., Wu, C. F. J., and Yue. 1992. “Some Recent Work on Resampling Methods for Complex Surveys.” Survey Methodology 18 (2): 209–217. https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X199200214486.

Rust, K. F., and Rao, J. N. K. 1996. “Variance Estimation for Complex Surveys Using Replication Techniques.” Statistical Methods in Medical Research 5 (3): 283–310. DOI: https://doi.org/10.1177/096228029600500305.

Särndal, C.-E., and Swensson. 1987. “A General View of Estimation for Two Phases of Selection with Applications to Two-Phase Sampling and Nonresponse.” International Statistical Review/Revue Internationale de Statistique: 279–94. DOI: https://doi.org/10.2307/1403406.

Sautory. 1993. “La macro calmar.” Redressement d’un échantillon par calage sur marges, Série des documents de travail de la Direction des Statistiques Démographiques et Sociales, 55.

Tukey. 1958. “Bias and Confidence in Not Quite Large Samples.” Annals of Mathematical Statistics 29: 614. DOI: https://doi.org/10.1214/aoms/1177706647.

Valliant. 2004. “The Effect of Multiple Weighting Steps on Variance Estimation.” Journal of Official Statistics 20 (1): 1. https://www.researchgate.net/publication/255589970_The_Effect_of_Multiple_Weighting_Steps_on_Variance_Estimation.

Woodruff, R. S. 1971. “A Simple Method for Approximating the Variance of a Complicated Estimate.” Journal of the American Statistical Association 66 (334): 411–4. DOI: https://doi.org/10.1080/01621459.1971.10482279.

Comparison of Approaches for Variance Estimation with Application to a Survey on Discrimination
