Comparison of Small Area Procedures Based on Gamma Distributions with Extension to Informative Sampling

Abstract

The gamma distribution is a useful model for small area prediction of a skewed response variable. We study the use of the gamma distribution for small area prediction. We emphasize a model, called the gamma-gamma model, in which the area random effects have gamma distributions. We compare this model to a generalized linear mixed model. Each of these two models has been proposed independently in the literature, but the two models have not yet been formally compared. We evaluate the properties of two mean square error estimators for the gamma-gamma model, both of which incorporate corrections for the bias of the estimator of the leading term. Finally, we extend the gamma-gamma model to informative sampling. We conduct thorough simulation studies to assess the properties of the alternative predictors. We apply the proposed methods to data from an agricultural survey.

Keywords

empirical best prediction; bootstrap MSE estimator; agricultural survey

1. Introduction

Response variables of interest in applications are often positive and have asymmetric distributions. Examples documented in the fields of health, economics, and agriculture include the body mass index (Pfeffermann and Sverchkov 2007), income (Molina and Rao 2010), and sheet and rill erosion (Berg and Chandra 2014). These types of data are often used to gain a deeper understanding of the characteristics of sub-populations (sub-domains) defined by geographic regions or socio-demographic groups. Such subdivisions are usually granular and therefore have sample sizes that are small or even zero. This motivates the definition of a small area (domain) as any sub-population where the area-specific data are insufficient to assure direct domain estimates of acceptable precision. Estimation procedures for small domains commonly employ indirect estimators based on small area models that incorporate between-area variation and auxiliary variables (Jiang and Lahiri 2006; Morales et al. 2021; Pfeffermann 2013; Rao and Molina 2015). A fundamental small area model is the unit-level linear mixed model of Battese et al. (1988). This model assumes normal distributions and is not immediately suitable for positive, skewed data.

A common approach for skewed data is to apply the unit-level linear mixed model of Battese et al. (1988), after an appropriate transformation. In the framework of a unit-level lognormal model, Berg and Chandra (2014) develop closed-form expressions for an empirical Bayes predictor of a small area mean. Lyu et al. (2020) and Zimmermann and Münnich (2018) extend the lognormal model to zero-inflated data and informative sampling, respectively. Berg and Chandra (2014), Lyu et al. (2020), and Zimmermann and Münnich (2018) focus on prediction of means, but many small area parameters are more complex functions of the model response variable. Molina and Rao (2010) obtain a Monte Carlo (MC) approximation for the best predictor of a general small area parameter, assuming that the nested error linear regression model holds after a transformation of the response variable. Guadarrama et al. (2018) develop predictors of general small area parameters under an informative sample design. Rojas-Perilla et al. (2020) extend Molina and Rao (2010) to data-driven transformations that are more general than the log transformation. Though widely used, transformations have drawbacks. As noted in Graf et al. (2019), log transformed response variables often exhibit skewness relative to a normal distribution, and finding a suitable transformation can be difficult.

An alternative to a transformation is to model the response variable directly. The gamma distribution enables the analyst to model skewed data in the original scale, without need for a transformation. Hobza et al. (2020) compare several small area predictors developed under a generalized linear mixed model (GLMM) with a gamma distributed response variable. A challenge with the gamma GLMM is that the likelihood and empirical best (EB) predictors involve intractable integrals. Dreassi et al. (2014) use Bayesian inference procedures to construct small area estimates under the assumptions of a gamma GLMM. Berg et al. (2016) compare predictors based on lognormal and gamma distributions through simulation, and find that the predictors based on the gamma distribution seem more robust to model misspecification. Graf et al. (2019) develop empirical best small area predictors of both means and more general small area parameters under the assumptions of a generalized gamma inverse-gamma distribution. Unlike the gamma GLMM, the model of Graf et al. (2019) leads to tractable integrals and predictors with closed-form expressions. The works of Hobza et al. (2020) and Graf et al. (2019) provide the impetus for the research in this paper.

We study small area predictors based on gamma distributions. We focus on a unit-level model with a gamma response distribution and gamma distributed random effects, which we call the “gamma-gamma” model. The gamma-gamma model is a special case of the more general model of Graf et al. (2019), with a slightly different parametrization. We compare predictors based on the gamma-gamma model to the predictors based on the gamma GLMM. An important innovation of this work is that we develop predictors for the gamma-gamma model as well as estimators of their prediction errors in the context of an informative sample design. Our approach to informative sampling transfers the fundamental concepts of Pfeffermann and Sverchkov (2007) to the gamma-gamma framework.

Although the models in this paper are not new, our work makes several important contributions. First, the extension of the gamma-gamma model to informative sampling is the most substantive contribution because Graf et al. (2019) only consider noninformative designs. Second, we formally compare predictors based on the gamma-gamma model to predictors based on the gamma GLMM through simulation. Graf et al. (2019) only compare their model to a lognormal model, and Hobza et al. (2020) exclusively consider the gamma GLMM. We evaluate the robustness of the predictors by simulating data from both the gamma-gamma model and the gamma GLMM. Our third contribution is in the area of mean square error (MSE) estimation. We evaluate the properties of several MSE estimators for the gamma-gamma model through simulation. We also propose an MSE estimator that has not yet been used in combination with the gamma-gamma model. This MSE estimator differs from the estimator of Graf et al. (2019). Graf et al. (2019) use a standard parametric bootstrap MSE estimator. In contrast, we decompose the MSE as a sum of two terms and estimate the two terms in the MSE directly. Note that although Graf et al. (2019) propose an MSE estimator, they do not evaluate its properties through simulation. We also innovate on the MSE estimator of Graf et al. (2019) by applying a bootstrap bias correction to it and extending it to informative sampling. Our final two contributions are relatively minor but are valuable nonetheless. We develop predictors for the gamma-gamma model using a hierarchical formulation that is computationally easier to implement than the formulation based on marginal distributions in Graf et al. (2019). Finally, we generalize the procedures of Hobza et al. (2020) to prediction of small area parameters that are more general than the class of additive small area parameters. Hobza et al. (2020) define predictors for additive small area parameters of the form $N_{i}^{- 1} \sum_{j = 1}^{N_{i}} \tilde{h} (y_{ij})$, where $y_{i 1}, \dots, y_{i N_{i}}$ denote the variables of interest for the N_i elements of the population for area i, and $\tilde{h}$ is a user-specified function. We define predictors for more general small area parameters of the form $h (y_{i 1}, \dots, y_{i N_{i}})$. An important type of non-additive small area parameter that we consider in our study is the population quantile.

The procedures discussed in this paper are relevant to studies of sheet and rill erosion, or soil loss due to the flow of water. Small area estimates of sheet and rill erosion are valuable for assessing the efficacy of conservation programs. Sheet and rill erosion is positive, and past studies (Berg and Chandra 2014; Lyu et al. 2020) have documented the distribution of sheet and rill erosion to be skewed right. The data on sheet and rill erosion are derived from large-scale surveys that use complex designs. Therefore, the predictors that we develop for informative sampling in the context of the gamma distribution are imperative for the application to small area estimation of sheet and rill erosion.

Our study of small area prediction based on gamma distributions is organized as follows. In Section 2, we define the gamma-gamma model and the gamma GLMM. In Section 3, we propose two MSE estimators for the gamma-gamma model. In Section 4, we extend the gamma-gamma model to an informative sample design. In Section 5, we present simulation studies that (1) compare the gamma-gamma model to the gamma GLMM and (2) evaluate the alternative MSE estimators. The data analysis is presented in Section 6. We summarize the main conclusions in Section 7.

2. Small Area Estimation Based on Gamma Distributions

We establish a common notation that we will use for both the gamma-gamma model and the gamma GLMM. Let $i = 1, \dots, D$ denote the areas, and let $j = 1, \dots, N_{i}$ index the elements in the population for area i. Let y_ij denote the response variable for unit j in area i, where the support of y_ij is $(0, \infty)$. Let x_ij denote the $(p + 1)$-dimensional covariate associated with element (i, j), where x_ij includes an intercept. Assume that y_ij is observed for a sample of n_i elements in area i. Without loss of generality, let $j = 1, \dots, n_{i}$ index the sampled elements, and let $j = n_{i} + 1, \dots, N_{i}$ index the non-sampled elements. We let $y_{i} = {(y_{is}^{T}, y_{ir}^{T})}^{T}$, where $y_{is} = {(y_{i 1}, \dots, y_{i n_{i}})}^{T}$ and $y_{ir} = {(y_{i (n_{i} + 1)}, \dots, y_{i N_{i}})}^{T}$. We consider prediction of a general small area parameter defined as

\begin{matrix} θ_{i} = h (y_{i 1}, \dots, y_{i N_{i}}), \end{matrix}

(1)

where $h (\cdot)$ is a specified function.

A critical assumption of our procedure is that the covariate x_ij is known for all elements of the population. This is an assumption of many nonlinear small area models, such as Berg and Chandra (2014) and Graf et al. (2019). This condition is satisfied if the covariates are derived from a census or administrative database.

Suppose a population model is specified for y_ij, and let E_p denote expectation with respect to the population distribution. Under the model, the minimum mean square error predictor of θ_i is defined as

\begin{matrix} {\tilde{θ}}_{i}^{BP} = E_{p} (θ_{i} ∣ y_{is}), \end{matrix}

which is called the best predictor (BP) of the area parameter. In this section, we develop best predictors of θ_i under two models that assume a gamma distribution for the response variables. Section 2.1 and Section 2.2 describe the gamma-gamma model and the gamma GLMM, respectively.

2.1. Unit-Level Gamma-Gamma Small Area Model

Assume that the population is generated under a gamma-gamma small area model defined as

\begin{matrix} y_{ij} ∣ u_{i} & \overset{ind}{~} Gamma (α, \exp (x_{ij}^{T} γ) u_{i}), j = 1, \dots, N_{i}, i = 1, \dots, D, \end{matrix}

(2)

where $u_{i} \overset{iid}{~} Gamma (δ, δ)$, and $γ = {(γ_{0}, \dots, γ_{p})}^{T}$. We use the notation $Gamma (a, b)$ to represent a gamma distribution with shape parameter a and rate parameter b. The model in Equation (2) can be viewed as a special case of the model in Graf et al. (2019), with a slightly different parametrization. We use a gamma distribution for u_i, while Graf et al. (2019) use an inverse-gamma distribution for the area random effect in a transformed scale.

We first consider small area prediction of the mean defined as

\begin{matrix} {\bar{y}}_{N_{i}} = \frac{1}{N_{i}} \sum_{j = 1}^{N_{i}} y_{ij}, i = 1, \dots, D . \end{matrix}

Theorem 1 gives the best predictor of ${\bar{y}}_{N_{i}}$ under the model (2). Although Theorem 1 can be cast as a special case of results in Graf et al. (2019), we state Theorem 1 and its proof here for completeness. The formulas in Theorem 1 are slightly different than the formulas in Graf et al. (2019) because we parametrize the distribution of the random effect differently.

Theorem 1: Under the gamma-gamma model, the best predictor of the small area mean is

\begin{matrix} {\tilde{\bar{y}}}_{N_{i}}^{BP} (γ, α, δ, y_{is}) = \frac{1}{N_{i}} [\sum_{j = 1}^{n_{i}} y_{ij} + \sum_{j = n_{i} + 1}^{N_{i}} α \exp (- x_{ij}^{T} γ) \\ \times \frac{{\sum_{j = 1}^{n_{i}} y_{ij} \exp (x_{ij}^{T} γ)} + δ}{n_{i} α + δ - 1}] . \end{matrix}

(3)

A proof of Theorem 1 is given in the Supplemental Material. We use the notation ${\tilde{\bar{y}}}_{N_{i}}^{BP} (γ, α, δ, y_{is})$ to emphasize dependence of the best predictor on the unknown model parameters and the observed data.
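As a minimal sketch of how Equation (3) might be evaluated for a single area, the following Python function uses only the standard library. The function name `bp_mean` and the list-based data layout are illustrative assumptions, not part of the original development, and the formula requires $n_{i} α + δ > 1$.

```python
import math

def bp_mean(y_s, x_s, x_r, gamma, alpha, delta):
    """Best predictor of the area mean, following Equation (3).

    y_s   : responses for the n_i sampled units
    x_s   : covariate rows (including the intercept) for sampled units
    x_r   : covariate rows for the N_i - n_i non-sampled units
    gamma : regression coefficients; alpha, delta : shape parameters
    (hypothetical argument layout, chosen for illustration)
    """
    def eta(x):
        # Linear predictor x^T gamma
        return sum(xk * gk for xk, gk in zip(x, gamma))

    n_i = len(y_s)
    N_i = n_i + len(x_r)
    # Rate of the conditional distribution of u_i given the sample
    rate = sum(y * math.exp(eta(x)) for y, x in zip(y_s, x_s)) + delta
    # E(1 / u_i | y_is) = rate / (shape - 1); needs n_i * alpha + delta > 1
    inv_u = rate / (n_i * alpha + delta - 1.0)
    # Predicted values for non-sampled units: alpha * exp(-x^T gamma) * E(1/u_i | y_is)
    pred_r = sum(alpha * math.exp(-eta(x)) * inv_u for x in x_r)
    return (sum(y_s) + pred_r) / N_i
```

For an intercept-only area with `gamma = [0.0]`, `alpha = 2`, `delta = 3`, two sampled values 1 and 2, and three non-sampled units, the conditional rate is 6, the conditional shape is 7, and each non-sampled unit is predicted as 2, so the predictor equals (3 + 6) / 5 = 1.8.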

We next consider prediction of more general small area parameters of the form in Equation (1). Depending on the complexity of the real-valued function $h (\cdot)$ in Equation (1), an analytic expression of the best predictor, such as the mean predictor in Equation (3), may not exist. We adopt the approach of Molina and Rao (2010) and use a Monte Carlo approximation for the best predictor. By the proof of Theorem 1, the conditional distribution of u_i given the data is given by

\begin{matrix} u_{i} ∣ y_{is} ~ Gamma (n_{i} α + δ, \sum_{j = 1}^{n_{i}} y_{ij} \exp (x_{ij}^{T} γ) + δ) . \end{matrix}
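The full proof is in the Supplemental Material; as a brief sketch of why this conditional distribution arises, note that the gamma distribution of u_i is conjugate to the gamma density in Equation (2) viewed as a function of u_i. Dropping factors that do not depend on u_i gives:

```latex
\begin{aligned}
f(u_i \mid y_{is}) &\propto f(u_i) \prod_{j=1}^{n_i} f(y_{ij} \mid u_i) \\
&\propto u_i^{\delta - 1} e^{-\delta u_i}
   \prod_{j=1}^{n_i} u_i^{\alpha} \exp\{-u_i \, y_{ij} \exp(x_{ij}^{T}\gamma)\} \\
&= u_i^{n_i \alpha + \delta - 1}
   \exp\Big\{-u_i \Big(\sum_{j=1}^{n_i} y_{ij} \exp(x_{ij}^{T}\gamma) + \delta\Big)\Big\},
\end{aligned}
```

which is the kernel of a gamma density with shape $n_{i} α + δ$ and rate $\sum_{j = 1}^{n_{i}} y_{ij} \exp (x_{ij}^{T} γ) + δ$. Because $E (u^{- 1}) = b / (a - 1)$ for $u ~ Gamma (a, b)$ with $a > 1$, the same calculation yields the ratio that multiplies $α \exp (- x_{ij}^{T} γ)$ in Equation (3).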

This convenient form for the distribution of u_i enables us to develop a simple algorithm for approximating the best predictor of θ_i. For $ℓ = 1, \dots, L$ , repeat the following steps:

1. Generate $u_{i}^{(ℓ)} ~ Gamma (n_{i} α + δ, \sum_{j = 1}^{n_{i}} y_{ij} \exp (x_{ij}^{T} γ) + δ)$, $i = 1, \dots, D$.

2. Generate $y_{ij}^{* (ℓ)} ~ Gamma (α, \exp (x_{ij}^{T} γ) u_{i}^{(ℓ)})$, $j = n_{i} + 1, \dots, N_{i}$.

3. Define

\begin{matrix} θ_{i}^{(ℓ)} = h (y_{is}^{T}, y_{i (n_{i} + 1)}^{* (ℓ)}, \dots, y_{i N_{i}}^{* (ℓ)}) . \end{matrix}

An approximation for the best predictor of the area parameter is defined as

\begin{matrix} {\tilde{θ}}_{i}^{BP} (γ, α, δ, y_{is}) = \frac{1}{L} \sum_{ℓ = 1}^{L} θ_{i}^{(ℓ)} . \end{matrix}

(4)

The notation ${\tilde{θ}}_{i}^{BP} (γ, α, δ, y_{is})$ emphasizes dependence of the best predictor on the unknown $γ$ , α, and δ.

The algorithm above is slightly simpler than the algorithm of Graf et al. (2019). We exploit the convenient form of the conditional distribution of u_i to generate y_ij for nonsampled elements through a hierarchical process that involves first generating u_i and then generating y_ij given u_i. In contrast, Graf et al. (2019) generate y_ij from the marginal distribution of $y_{ij} ∣ y_{is}$ . Simulating from the conditional distributions, as in our algorithm, is easier than simulating from the marginal distribution, as in Graf et al. (2019).
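The hierarchical simulation can be sketched for one area in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the function name `mc_best_predictor` and its arguments are hypothetical, the model parameters are treated as given, and `random.gammavariate` takes a shape and a scale, so each rate is inverted.

```python
import math
import random

def mc_best_predictor(h, y_s, x_s, x_r, gamma, alpha, delta, L=1000, rng=None):
    """Monte Carlo approximation of the best predictor in Equation (4).

    h maps the completed population vector (a list) to the area parameter.
    Returns the approximation and the L simulated values theta_i^(l).
    """
    rng = rng or random.Random(0)

    def eta(x):
        # Linear predictor x^T gamma
        return sum(xk * gk for xk, gk in zip(x, gamma))

    shape = len(y_s) * alpha + delta
    rate = sum(y * math.exp(eta(x)) for y, x in zip(y_s, x_s)) + delta
    draws = []
    for _ in range(L):
        # Draw u_i from its conditional distribution given y_is (scale = 1 / rate)
        u = rng.gammavariate(shape, 1.0 / rate)
        # Draw the non-sampled y_ij given u_i
        y_r = [rng.gammavariate(alpha, 1.0 / (math.exp(eta(x)) * u)) for x in x_r]
        # Evaluate h on sampled values plus simulated non-sampled values
        draws.append(h(list(y_s) + y_r))
    return sum(draws) / L, draws
```

Passing `h = lambda y: sum(y) / len(y)` should reproduce the closed-form mean predictor up to Monte Carlo error, while a non-additive choice such as `h = lambda y: sorted(y)[len(y) // 2]` gives a crude approximation for an area median.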

The best predictor is a function of the unknown model parameters, denoted as $ψ = {(α, δ, γ^{T})}^{T}$ . Calculation of a predictor requires an estimator of the model parameters. We propose to use maximum likelihood estimation. Theorem 2 gives the closed-form expression for the likelihood.

Theorem 2: The likelihood for the model parameters, $ψ = {(α, δ, γ^{T})}^{T}$, under the model in Equation (2) is of the form

\begin{matrix} L (ψ; y_{s}) = Π_{i = 1}^{D} f (y_{is} ∣ ψ), \end{matrix}

where $y_{s} = {(y_{1 s}^{T}, \dots, y_{Ds}^{T})}^{T}$ and

\begin{matrix} f (y_{is} ∣ ψ) = \frac{δ^{δ}}{{Γ (α)}^{n_{i}} Γ (δ)} Π_{j = 1}^{n_{i}} y_{ij}^{α - 1} \times \exp (α {(\sum_{j = 1}^{n_{i}} x_{ij})}^{T} γ) \\ \times \frac{Γ (n_{i} α + δ)}{{(\sum_{j = 1}^{n_{i}} {y_{ij} \exp (x_{ij}^{T} γ)} + δ)}^{n_{i} α + δ}} . \end{matrix}

A proof of Theorem 2 is given in the Supplemental Material. Let $\hat{ψ} = {(\hat{α}, \hat{δ}, {\hat{γ}}^{T})}^{T}$ denote the maximum likelihood estimator defined as

\begin{matrix} \hat{ψ} = {argmax}_{ψ} L (ψ; y_{s}) . \end{matrix}
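Because the likelihood in Theorem 2 is available in closed form, its logarithm can be coded directly. The sketch below works on the log scale with `math.lgamma` for numerical stability; the names `loglik_area` and `loglik` and the data layout are assumptions made for illustration, and the numerical maximization needed to obtain $\hat{ψ}$ is not shown.

```python
import math

def loglik_area(y_s, x_s, gamma, alpha, delta):
    """Log of f(y_is | psi) from Theorem 2 for a single area."""
    n_i = len(y_s)
    etas = [sum(xk * gk for xk, gk in zip(x, gamma)) for x in x_s]
    rate = sum(y * math.exp(e) for y, e in zip(y_s, etas)) + delta
    shape = n_i * alpha + delta
    return (delta * math.log(delta)
            - n_i * math.lgamma(alpha) - math.lgamma(delta)
            + (alpha - 1.0) * sum(math.log(y) for y in y_s)
            + alpha * sum(etas)
            + math.lgamma(shape) - shape * math.log(rate))

def loglik(areas, gamma, alpha, delta):
    """Log-likelihood over areas; `areas` is a list of (y_s, x_s) pairs."""
    return sum(loglik_area(y_s, x_s, gamma, alpha, delta) for y_s, x_s in areas)
```

In practice one would pass the negative of `loglik` to a general-purpose optimizer, typically after reparametrizing α and δ on the log scale so that they remain positive; that choice is a common convention and not something specified in the text.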

Given the maximum likelihood estimator, we define an empirical best (EB) predictor by substitution of $ψ$ with $\hat{ψ}$ . The empirical best predictor of the mean is defined as

\begin{matrix} {\hat{\bar{y}}}_{N_{i}}^{EB} = {\tilde{\bar{y}}}_{N_{i}}^{BP} (\hat{γ}, \hat{α}, \hat{δ}, y_{is}) . \end{matrix}

(5)

The predictor in Equation (5) is obtained by evaluating the closed-form expression for the best predictor of the mean in Equation (3) at the maximum likelihood estimators. To define an empirical best predictor of a general small area parameter, we repeat steps 1 to 3 above with the maximum likelihood estimators in place of the true model parameters. This enables us to define the EB predictor of a general small area parameter by

\begin{matrix} {\hat{θ}}_{i}^{EB} = {\tilde{θ}}_{i}^{BP} (\hat{γ}, \hat{α}, \hat{δ}, y_{is}), \end{matrix}

(6)

where

\begin{matrix} {\tilde{θ}}_{i}^{BP} (\hat{γ}, \hat{α}, \hat{δ}, y_{is}) = L^{- 1} \sum_{ℓ = 1}^{L} {\hat{θ}}_{i}^{(ℓ)}, \end{matrix}

(7)

\begin{matrix} {\hat{θ}}_{i}^{(ℓ)} = h (y_{i 1}, \dots, y_{i n_{i}}, y_{i (n_{i} + 1)}^{* (ℓ)}, \dots, y_{i N_{i}}^{* (ℓ)}), y_{im}^{* (ℓ)} ~ Gamma (\hat{α}, \exp (x_{im}^{T} \hat{γ}) u_{i}^{* (ℓ)}), m = n_{i} + 1, \dots, N_{i}, \\ and u_{i}^{* (ℓ)} ~ Gamma (n_{i} \hat{α} + \hat{δ}, \sum_{j = 1}^{n_{i}} y_{ij} \exp (x_{ij}^{T} \hat{γ}) + \hat{δ}) . \end{matrix}

We refer to the predictor in Equation (6) as the EB predictor. When we use the EB predictor in Equation (6) to predict the mean, we obtain an MC approximation for the closed-form predictor in Equation (5).

2.2. Gamma GLMM

Define a unit-level gamma GLMM by

\begin{matrix} y_{ij} ∣ v_{i} & \overset{ind}{~} Gamma (ν, \frac{ν}{μ_{ij}}), i = 1, \dots, D, j = 1, \dots, n_{i}, \end{matrix}

(8)

where $g (μ_{ij}) = x_{ij}^{T} β + v_{i}, v_{i} \overset{iid}{~} N (0, ϕ^{2})$, and $β$ is $(p + 1)$-dimensional. We use the log link function for the mean parameter μ_ij, such that $g (μ_{ij}) = \log (μ_{ij})$. This model is comparable to the gamma-gamma model in that it has a constant shape parameter and the area random effects are modeled with only one model parameter. Let $ψ^{GLMM} = {(β^{T}, ϕ, ν)}^{T}$ denote the model parameters of the gamma GLMM. Hobza et al. (2020) study small area estimation for the gamma GLMM, focusing on a more general model with different shape parameters for the different areas. We use the simpler specification with the common shape parameter to ensure that the number of model parameters in the GLMM is the same as the number of parameters in the gamma-gamma model. With our specification, $ψ$ and $ψ^{GLMM}$ have the same dimension. Hobza et al. (2020) also use a different link function than the log link. We prefer the log link because it ensures that the predictors remain in the parameter space. As mentioned in Hobza et al. (2020), one can fit the model (8) using the R function glmer from the lme4 package. Let ${\hat{ψ}}^{GLMM} = {({\hat{β}}^{T}, \hat{ϕ}, \hat{ν})}^{T}$ denote the resulting estimates. While Hobza et al. (2020) define predictors of additive small area parameters, we define a more general procedure that is applicable to both additive and non-additive small area parameters.

Note that the best predictor of θ_i can be expressed as a ratio of two integrals:

\begin{matrix} E (θ_{i} ∣ y_{is}) = \int h (y_{is}, y_{ir}) f (y_{ir} ∣ y_{is}) d y_{ir} \\ = \frac{\int h (y_{is}, y_{ir}) \int f (y_{ir} ∣ v_{i}) f (y_{is} ∣ v_{i}) f (v_{i}) d v_{i} d y_{ir}}{\int f (y_{is} ∣ v_{i}) f (v_{i}) d v_{i}} . \end{matrix}

The integrals do not have closed form expressions. We use sampling importance resampling (Smith and Gelfand 1992) to approximate the integrals. The iterative Monte Carlo algorithm is as follows:

1. For $ℓ_{1} = 1, \dots, L_{1}$, generate $v_{i}^{(ℓ_{1})} ~ N (0, {\hat{ϕ}}^{2})$ and carry out the following two steps:

(a) For $ℓ_{2} = 1, \dots, L_{2}$, generate $y_{ir}^{(ℓ_{1}, ℓ_{2})} ~ \hat{f} (y_{ir} ∣ v_{i}^{(ℓ_{1})}; {\hat{ψ}}^{GLMM})$, where

\begin{matrix} \hat{f} (y_{ir} ∣ v_{i}^{(ℓ_{1})}; {\hat{ψ}}^{GLMM}) = Π_{j = n_{i} + 1}^{N_{i}} g_{ij} (v_{i}^{(ℓ_{1})}, \hat{β}, \hat{ν}), \end{matrix}

and $g_{ij} (v_{i}^{(ℓ_{1})}, \hat{β}, \hat{ν})$ is the density of a gamma distribution with shape parameter $\hat{ν}$ and rate parameter $\hat{ν} / \exp (x_{ij}^{T} \hat{β} + v_{i}^{(ℓ_{1})})$.

(b) Calculate

\begin{matrix} {\hat{A}}_{hi}^{(ℓ_{1})} = \hat{f} (y_{is} ∣ v_{i}^{(ℓ_{1})}; {\hat{ψ}}^{GLMM}) \times \frac{1}{L_{2}} \sum_{ℓ_{2} = 1}^{L_{2}} h (y_{is}, y_{ir}^{(ℓ_{1}, ℓ_{2})}) \end{matrix}

and ${\hat{B}}_{hi}^{(ℓ_{1})} = \hat{f} (y_{is} ∣ v_{i}^{(ℓ_{1})}; {\hat{ψ}}^{GLMM})$.

2. Define the predictor as

\begin{matrix} {\hat{θ}}_{i}^{EB_GLMM} = \frac{\sum_{ℓ_{1} = 1}^{L_{1}} {\hat{A}}_{hi}^{(ℓ_{1})}}{\sum_{ℓ_{1} = 1}^{L_{1}} {\hat{B}}_{hi}^{(ℓ_{1})}} . \end{matrix}

(9)

Note that the predictor defined in Equation (9) is an empirical best predictor under the GLMM, while the predictor defined in Equation (6) is an empirical best predictor under the gamma-gamma model. Both predictors are empirical best predictors, but they are constructed under different model assumptions.
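The ratio in Equation (9) can be sketched for one area as follows, written in Python rather than R for illustration. The function names are hypothetical, the estimates `beta`, `phi`, and `nu` are assumed to come from a separate fit (for example with glmer), and the weights are computed on the log scale and shifted by their maximum, which leaves the ratio unchanged but guards against underflow.

```python
import math
import random

def gamma_logpdf(y, shape, rate):
    """Log density of a gamma distribution in the shape/rate parametrization."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(y) - rate * y)

def eb_glmm(h, y_s, x_s, x_r, beta, phi, nu, L1=500, L2=20, rng=None):
    """Monte Carlo approximation of the GLMM predictor in Equation (9)."""
    rng = rng or random.Random(0)

    def eta(x):
        # Linear predictor x^T beta
        return sum(xk * bk for xk, bk in zip(x, beta))

    log_w, inner = [], []
    for _ in range(L1):
        v = rng.gauss(0.0, phi)  # v_i ~ N(0, phi^2); phi is a standard deviation
        # log f(y_is | v): gamma densities with shape nu and rate nu / mu_ij
        lw = sum(gamma_logpdf(y, nu, nu / math.exp(eta(x) + v))
                 for y, x in zip(y_s, x_s))
        total = 0.0
        for _ in range(L2):
            # Non-sampled units given v (gammavariate takes shape and scale)
            y_r = [rng.gammavariate(nu, math.exp(eta(x) + v) / nu) for x in x_r]
            total += h(list(y_s) + y_r)
        log_w.append(lw)
        inner.append(total / L2)
    m = max(log_w)
    w = [math.exp(l - m) for l in log_w]  # rescaled B terms
    # Sum of A terms over sum of B terms; the common factor exp(-m) cancels
    return sum(wi * hi for wi, hi in zip(w, inner)) / sum(w)
```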

Another predictor, referred to as the plug-in predictor, is defined as

\begin{matrix} {\hat{θ}}_{i}^{PI} = h (y_{is}, {\tilde{μ}}_{ir}), \end{matrix}

(10)

where ${\tilde{μ}}_{ir} = {({\tilde{μ}}_{i (n_{i} + 1)}, \dots, {\tilde{μ}}_{i N_{i}})}^{T}$ , ${\tilde{μ}}_{ij} = \exp (x_{ij}^{T} \hat{β} + {\hat{v}}_{i})$ , and we use the R function ranef to obtain the predicted area effects ${\hat{v}}_{i}$ given ${\hat{ψ}}^{GLMM}$ .

3. MSE Estimation

In this section, we define two estimators of the MSE of ${\hat{θ}}_{i}^{EB}$, where ${\hat{θ}}_{i}^{EB}$ is defined in Equation (6). The MSE estimator of Section 3.1 is an adaptation of the general procedure of Cho and Berg (2022) to the specific gamma-gamma model. In Section 3.2, we explain an existing parametric bootstrap MSE estimator in the context of the gamma-gamma model. We later compare the two MSE estimators through simulation in Section 5.2. We do not investigate MSE estimation for the gamma GLMM because we find, through the simulations of Section 5.1, that the gamma-gamma model is generally preferable to the gamma GLMM.

3.1. Proposed MSE Estimator

Suppose we use $L = \infty$ in the prediction procedure of Section 2.1 in order to ignore the variability from the MC approximation used to construct the EB predictor Equation (6). Then, note that the MSE of the predictor ${\hat{θ}}_{i}^{EB}$ can be decomposed as

\begin{matrix} MSE ({\hat{θ}}_{i}^{EB}) = M_{i 1} + M_{i 2}, \end{matrix}

(11)

where $M_{i 1} = E [V (θ_{i} ∣ y_{is}; ψ)]$ , $M_{i 2} = E [{({\hat{θ}}_{i, \infty} - {\tilde{θ}}_{i, \infty} (γ, α, δ))}^{2}]$ , and $({\hat{θ}}_{i, \infty}, {\tilde{θ}}_{i, \infty} (γ, α, δ)) = \lim_{L \to \infty} ({\hat{θ}}_{i}^{EB}, {\tilde{θ}}_{i}^{BP} (γ, α, δ, y_{is}))$ . Cho and Berg (2022) provide a more rigorous development of the decomposition Equation (11). Also, see Rao and Molina (2015) and Reluga et al. (2023) for a similar decomposition of the MSE.

The first term M_i1, called the leading term, is the MSE of the best predictor, and its unbiased estimator is $V (θ_{i} ∣ y_{is}; ψ)$ . In practice, due to the unknown model parameters $ψ$ , we use $V (θ_{i} ∣ y_{is}; \hat{ψ})$ as the leading term estimator, and approximate it as

\begin{matrix} V (θ_{i} ∣ y_{is}; \hat{ψ}) & \approx \frac{1}{L - 1} \sum_{ℓ = 1}^{L} {({\hat{θ}}_{i}^{(ℓ)} - {\hat{θ}}_{i}^{EB})}^{2} = : {\hat{M}}_{1 i}, \end{matrix}

where ${\hat{θ}}_{i}^{(ℓ)}$ is defined following Equation (7).

The extra variation induced by replacing $ψ$ with $\hat{ψ}$ in the best predictor is accounted for by the second component M_i2. The analytical form of M_i2 is difficult to obtain, so we use the parametric bootstrap to approximate it. For $b = 1, \dots, B$ , repeat the following steps:

1. Generate the bootstrap sample $y_{is}^{* (b)} = {(y_{i 1}^{* (b)}, \dots, y_{i n_{i}}^{* (b)})}^{T}$, $i = 1, \dots, D$, from model (2) as $y_{ij}^{* (b)} \overset{ind}{~} Gamma (\hat{α}, \exp (x_{ij}^{T} \hat{γ}) u_{i}^{* (b)})$, $j = 1, \dots, n_{i}$, where $u_{i}^{* (b)} \overset{iid}{~} Gamma (\hat{δ}, \hat{δ})$, $i = 1, \dots, D$.

2. Obtain the bootstrap version of the model parameter estimates, ${\hat{ψ}}^{* (b)}$, by maximizing the likelihood with the bootstrap data generated in step 1. Specifically, ${\hat{ψ}}^{* (b)} = {argmax}_{ψ} L (ψ; y_{s}^{* (b)}) = {({\hat{α}}^{* (b)}, {\hat{δ}}^{* (b)}, {\hat{γ}}^{* (b), T})}^{T}$.

3. Calculate the bootstrap predictor, ${\hat{θ}}_{i}^{EB * (b)} = {\tilde{θ}}_{i}^{BP} ({\hat{γ}}^{* (b)}, {\hat{α}}^{* (b)}, {\hat{δ}}^{* (b)}, y_{is})$. Note that the bootstrap predictor is obtained by applying the algorithm defined in Section 2.1 with the bootstrap model parameter estimator and the original data. Implementation of this algorithm results in simulated samples $θ_{i}^{(ℓ, b)}$. Calculate the bootstrap MC approximation for $V (θ_{i} ∣ y_{is}; {\hat{ψ}}^{* (b)})$, denoted ${\hat{M}}_{1 i}^{* (b)}$, as ${\hat{M}}_{1 i}^{* (b)} = {(L - 1)}^{- 1} \sum_{ℓ = 1}^{L} {(θ_{i}^{(ℓ, b)} - {\hat{θ}}_{i}^{EB * (b)})}^{2}$.

Then, define the estimator of M_i2 as:

\begin{matrix} {\hat{M}}_{2 i} = \frac{1}{B} \sum_{b = 1}^{B} {({\hat{θ}}_{i}^{EB * (b)} - {\hat{θ}}_{i}^{EB})}^{2} . \end{matrix}

A preliminary estimator of the MSE of ${\hat{θ}}_{i}^{EB}$ is defined as

\begin{matrix} {mse}_{i}^{noBC} = {\hat{M}}_{1 i} + {\hat{M}}_{2 i} . \end{matrix}

(12)

The label “noBC” is used to indicate that the MSE estimator Equation (12) does not incorporate a correction for the bias of the estimator of the leading term.
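Given the simulated values from the prediction algorithm and the bootstrap predictors, the two components of Equation (12) reduce to simple averages. The following sketch assumes those inputs have already been computed elsewhere; the function names `m1_hat`, `m2_hat`, and `mse_no_bc` are illustrative.

```python
def m1_hat(draws):
    """Leading-term estimate: sample variance of the simulated theta_i^(l)."""
    L = len(draws)
    theta_eb = sum(draws) / L  # MC approximation of the EB predictor
    return sum((d - theta_eb) ** 2 for d in draws) / (L - 1)

def m2_hat(theta_eb, theta_eb_boot):
    """Second-term estimate: mean squared deviation of bootstrap EB predictors."""
    B = len(theta_eb_boot)
    return sum((t - theta_eb) ** 2 for t in theta_eb_boot) / B

def mse_no_bc(draws, theta_eb_boot):
    """MSE estimate without bias correction, as in Equation (12)."""
    theta_eb = sum(draws) / len(draws)
    return m1_hat(draws) + m2_hat(theta_eb, theta_eb_boot)
```

For example, simulated values `[1.0, 2.0, 3.0]` give a leading-term estimate of 1, bootstrap predictors `[1.0, 3.0]` around an EB predictor of 2 give a second-term estimate of 1, and the resulting MSE estimate is 2.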

However, ${\hat{M}}_{1 i}$ is a biased estimator of the leading term M_i1 because $ψ$ is replaced with $\hat{ψ}$. To correct for this bias, we estimate it using ${\hat{M}}_{1 i}^{* (b)}$, $b = 1, \dots, B$, which are byproducts of the bootstrap procedure. We can define an additive bias correction as ${\hat{M}}_{1 i}^{Add} = {\hat{M}}_{1 i} - ({\bar{M}}_{1 i}^{* B} - {\hat{M}}_{1 i})$, or a multiplicative correction as ${\hat{M}}_{1 i}^{Mult} = {\hat{M}}_{1 i}^{2} [{\bar{M}}_{1 i}^{* B}]^{- 1}$, where ${\bar{M}}_{1 i}^{* B} = \frac{1}{B} \sum_{b = 1}^{B} {\hat{M}}_{1 i}^{* (b)}$. The additive and multiplicative bias-corrected MSE estimators are then defined as:

{mse}_{i}^{Add} = {\hat{M}}_{1 i}^{Add} + {\hat{M}}_{2 i},

(13)

{mse}_{i}^{Mult} = {\hat{M}}_{1 i}^{Mult} + {\hat{M}}_{2 i} .

(14)

These classic additive and multiplicative bias corrections are straightforward, but Hall and Maiti (2006) note issues with both. The additive correction can produce a negative leading-term estimator when ${\bar{M}}_{1 i}^{* B} > {\hat{M}}_{1 i}$, and the multiplicative correction can produce unreliable estimators. Hall and Maiti (2006) therefore suggest a different bias correction, defined as

\begin{matrix} {\hat{M}}_{1 i}^{HM} = \left\{ \begin{matrix} {\hat{M}}_{1 i}^{Add}, & \text{if } {\hat{M}}_{1 i} \geq {\bar{M}}_{1 i}^{* B}, \\ {\hat{M}}_{1 i} \exp [- ({\bar{M}}_{1 i}^{* B} - {\hat{M}}_{1 i}) / {\bar{M}}_{1 i}^{* B}], & \text{if } {\hat{M}}_{1 i} < {\bar{M}}_{1 i}^{* B} . \end{matrix} \right. \end{matrix}

As a special case of the general bias corrections given in Hall and Maiti (2006), we further define a compromise between ${\hat{M}}_{1 i}^{Add}$ and ${\hat{M}}_{1 i}^{Mult}$ as

\begin{matrix} {\hat{M}}_{1 i}^{Comp} = \left\{ \begin{matrix} {\hat{M}}_{1 i}^{Add}, & \text{if } {\hat{M}}_{1 i} \geq {\bar{M}}_{1 i}^{* B}, \\ {\hat{M}}_{1 i}^{Mult}, & \text{if } {\hat{M}}_{1 i} < {\bar{M}}_{1 i}^{* B} . \end{matrix} \right. \end{matrix}

In summary, the bias-corrected MSE estimators are constructed by

\begin{matrix} {mse}_{i}^{HM} = {\hat{M}}_{1 i}^{HM} + {\hat{M}}_{2 i}, \end{matrix}

(15)

and

\begin{matrix} {mse}_{i}^{Comp} = {\hat{M}}_{1 i}^{Comp} + {\hat{M}}_{2 i} . \end{matrix}

(16)
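The four leading-term corrections differ only in how they combine ${\hat{M}}_{1 i}$ with the bootstrap average ${\bar{M}}_{1 i}^{* B}$, so they can be computed side by side. The sketch below is illustrative; the function name and the dictionary keys are assumptions. Adding ${\hat{M}}_{2 i}$ to each returned value gives the estimators in Equations (13) to (16).

```python
import math

def leading_term_corrections(m1_hat, m1_boot):
    """Bias-corrected leading-term estimates from Section 3.1.

    m1_hat  : leading-term estimate from the original data
    m1_boot : list of bootstrap leading-term estimates, one per replicate
    """
    m1_bar = sum(m1_boot) / len(m1_boot)
    add = m1_hat - (m1_bar - m1_hat)   # additive correction
    mult = m1_hat ** 2 / m1_bar        # multiplicative correction
    if m1_hat >= m1_bar:
        hm = comp = add                # both fall back to the additive form
    else:
        hm = m1_hat * math.exp(-(m1_bar - m1_hat) / m1_bar)  # Hall and Maiti (2006)
        comp = mult                    # compromise uses the multiplicative form
    return {"Add": add, "Mult": mult, "HM": hm, "Comp": comp}
```

With `m1_hat = 1.0` and bootstrap values averaging 2.0, the additive correction gives 0, the multiplicative and compromise corrections give 0.5, and the Hall and Maiti form gives exp(-0.5), which stays strictly positive.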

3.2. Existing Parametric Bootstrap MSE Estimators

Instead of estimating M_i1 and M_i2 separately, one can use the parametric bootstrap to estimate $MSE ({\hat{θ}}_{i})$ directly. Molina et al. (2007), Graf et al. (2019), and Hobza et al. (2020) use the single-stage bootstrap method to estimate the MSE. One may implement the double-bootstrap algorithm suggested by Hall and Maiti (2006) to correct the single-stage estimator. However, the double-bootstrap is computationally expensive and may not be feasible for a large population. Thus, we consider a simpler double-bootstrap motivated by Erciulescu and Fuller (2014) and Reluga et al. (2023), where we generate only one bootstrap replicate in the second-stage bootstrap. The following algorithm describes how to obtain those estimators.

1. Obtain the maximum likelihood estimate of the model parameter, $\hat{ψ}$, based on y_is.

2. For $b_{1} = 1, \dots, B_{1}$, independently generate the bootstrap population $y_{i}^{* (b_{1})} = {(y_{i 1}^{* (b_{1})}, \dots, y_{i N_{i}}^{* (b_{1})})}^{T}$, $i = 1, \dots, D$, from model (2) as

y_{ij}^{* (b_{1})} \overset{ind}{~} Gamma (\hat{α}, \exp (x_{ij}^{T} \hat{γ}) u_{i}^{* (b_{1})}), j = 1, \dots, N_{i},

where $u_{i}^{* (b_{1})} \overset{iid}{~} Gamma (\hat{δ}, \hat{δ})$ for $i = 1, \dots, D$ .

3. Calculate the bootstrap version of the small area parameter as $θ_{i}^{* (b_{1})} = h (y_{i}^{* (b_{1})})$. Compute the bootstrap model parameter estimate ${\hat{ψ}}^{* (b_{1})} = {({\hat{α}}^{* (b_{1})}, {\hat{δ}}^{* (b_{1})}, {\hat{γ}}^{* (b_{1}) T})}^{T}$ from the bootstrap sample $y_{is}^{* (b_{1})}$. Then, use the bootstrap sample, $y_{is}^{* (b_{1})}$, and the bootstrap model parameter estimate, ${\hat{ψ}}^{* (b_{1})}$, to calculate the bootstrap version of the EB predictor, ${\hat{θ}}_{i}^{* (b_{1})}$. Finally, define $D^{* (b_{1})} = {({\hat{θ}}_{i}^{* (b_{1})} - θ_{i}^{* (b_{1})})}^{2}$.

4. (a) For $b_{2} = 1, \dots, B_{2}$ , independently generate the bootstrap population $y_{i}^{* * (b_{2})} = {(y_{i 1}^{* * (b_{2})}, \dots, y_{i N_{i}}^{* * (b_{2})})}^{T}$ , $i = 1, \dots, D$ from model (2):

\begin{matrix} y_{ij}^{* * (b_{2})} & \overset{ind}{~} Gamma ({\hat{α}}^{* (b_{1})}, \exp (x_{ij}^{T} {\hat{γ}}^{* (b_{1})}) u_{i}^{* * (b_{2})}), j = 1, \dots, N_{i}, i = 1, \dots, D \\ u_{i}^{* * (b_{2})} & \overset{iid}{~} Gamma ({\hat{δ}}^{* (b_{1})}, {\hat{δ}}^{* (b_{1})}), i = 1, \dots, D . \end{matrix}

(We set $B_{2} = 1$ to employ the simpler double-bootstrap.)

(b) Calculate the bootstrap version of the small area parameter, $θ_{i}^{* * (b_{2})}$, from the bootstrap population $y_{i}^{* * (b_{2})}$; the EB predictor, ${\hat{θ}}_{i}^{* * (b_{2})}$, from the bootstrap sample $y_{is}^{* * (b_{2})}$; and $D^{* * (b_{2})} = {({\hat{θ}}_{i}^{* * (b_{2})} - θ_{i}^{* * (b_{2})})}^{2}$.

5. Finally, define the single-stage and the double-bootstrap MSE estimators as

\begin{matrix} {mse}_{i}^{S} = \frac{1}{B_{1}} \sum_{b_{1} = 1}^{B_{1}} D^{* (b_{1})}, \end{matrix}

(17)

and

\begin{matrix} {mse}_{i}^{D} = 2 {mse}_{i}^{S} - \frac{1}{B_{1}} \sum_{b_{1} = 1}^{B_{1}} {mse}_{i}^{* * (b_{1})} . \end{matrix}

(18)

Note that the single-stage MSE estimator is comparable to Equation (12) and the (simpler) double-stage MSE estimator to the proposed bias-corrected MSE estimators.
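Once the squared errors from the two bootstrap stages are available, Equations (17) and (18) are simple averages. The sketch below assumes, since the text does not define it explicitly, that ${mse}_{i}^{* * (b_{1})}$ is the average of the second-stage values $D^{* * (b_{2})}$ generated within first-stage replicate $b_{1}$; with $B_{2} = 1$ this is just the single second-stage value. The function name and input layout are illustrative.

```python
def bootstrap_mse(d_first, d_second):
    """Single-stage and double-bootstrap MSE estimates, Equations (17)-(18).

    d_first  : list of D^{*(b1)} values, one per first-stage replicate
    d_second : list of lists; d_second[b1] holds the D^{**(b2)} values
               generated within first-stage replicate b1
    """
    B1 = len(d_first)
    mse_s = sum(d_first) / B1
    # Assumption: the second-stage MSE for replicate b1 is the mean of its D** values
    mse_second = [sum(d) / len(d) for d in d_second]
    mse_d = 2.0 * mse_s - sum(mse_second) / B1
    return mse_s, mse_d
```

Note that the double-bootstrap estimate is an additive correction of the single-stage estimate, so it can be negative in small samples; the text does not say how such cases are handled, and the sketch returns the value unchanged.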

4. Extension of SAE Gamma-Gamma Model Under Informative Sampling

We extend the gamma-gamma model to an informative sampling design. We utilize well-known relationships among the population, sample, and sample-complement distributions of y_ij established in Pfeffermann and Sverchkov (2007). We assume the same model for the first moment of the sampling weight as in Pfeffermann and Sverchkov (2007). Our approach differs from that of Berg and Eideh (2024) because we derive the exact complement distribution instead of the population distribution.

For completeness, we restate key relationships defined in Pfeffermann and Sverchkov (2007) with respect to the second-stage unit y_ij and the corresponding sampling weight w_ij. These are given by

\begin{matrix} f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 1) = \frac{λ_{ij ∣ u}}{λ_{ij ∣ u, y}} f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1) \end{matrix}

(19)

and

\begin{matrix} f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 0) = \frac{λ_{ij ∣ u, y} - 1}{λ_{ij ∣ u} - 1} f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 1), \end{matrix}

(20)

where I_i and I_ij are the sample indicators for area i and unit j within area i, respectively; $w_{ij} = 1 / P (j \in s_{i})$ is the sampling weight, where $P (j \in s_{i})$ denotes the selection probability of unit j; $s_{i} = \{j : I_{ij} = 1\}$ represents the collection of indices for the sampled units in area i; $λ_{ij ∣ u, y} = E (w_{ij} ∣ x_{ij}, u_{i}, y_{ij}, I_{i} = 1, I_{ij} = 1)$ ; and $λ_{ij ∣ u} = E (w_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 1)$ . These relationships imply that the population and sample-complement distributions can be recovered from the observed units and their weights. We consider only informative selection within the sampled areas, so that all areas are selected, that is, I_i= 1 for $i = 1, \dots, D$ .

For the complex design, we suppose the sample distribution is given by

\begin{matrix} y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 1 & \overset{ind}{~} Gamma (α_{s}, \exp (x_{ij}^{T} γ_{s}) u_{i}), j = 1, \dots, n_{i}, \end{matrix}

(21)

where $u_{i} \overset{iid}{~} Gamma (δ_{s}, δ_{s})$ , $i = 1, \dots, D$ . Here, the model parameters in Equation (21) are differentiated from those of the population distribution (2) using the subscript s. Motivated by Pfeffermann and Sverchkov (2007), we assume that the expected value of the sampling weight satisfies

\begin{matrix} E (w_{ij} ∣ x_{ij}, u_{i}, y_{ij}, I_{i} = 1, I_{ij} = 1) = E (w_{ij} ∣ x_{ij}, y_{ij}, I_{i} = 1, I_{ij} = 1) \\ = κ_{i} \exp (x_{ij}^{T} a - ζ y_{ij}), \end{matrix}

(22)

for ζ > 0, where $κ_{i} = N_{i}^{- 1} \sum_{j = 1}^{N_{i}} \exp (- x_{ij}^{T} a + ζ y_{ij})$ . Because κ_i is a population average, it is nearly a constant. As in Pfeffermann and Sverchkov (2007), we regard the unknown κ_i as fixed constants to be estimated separately from the other model parameters. Denote the collection of fixed model parameters by $ψ^{INFO} = {(α_{s}, δ_{s}, γ_{s}^{T}, a^{T}, ζ, κ^{T})}^{T}$ , where $κ = {(κ_{1}, \dots, κ_{D})}^{T}$ . Then, using Equation (20), the sample-complement distribution under the informative sampling scheme can be derived as

\begin{matrix} f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 0; ψ^{INFO}) = \frac{λ_{ij}}{λ_{ij} - 1} f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1) \\ - \frac{1}{λ_{ij} - 1} f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 1), \end{matrix}

(23)

where

\begin{matrix} λ_{ij} = E (w_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 1) \\ = κ_{i} \exp (a^{T} x_{ij}) E [\exp (- ζ y_{ij}) ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 1] \\ = κ_{i} \exp (a^{T} x_{ij}) {(1 + \frac{ζ}{η_{s, ij}})}^{- α_{s}}, \end{matrix}

$η_{s, ij} = \exp (x_{ij}^{T} γ_{s}) u_{i}$ , and the population distribution $f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1)$ is given by

\begin{matrix} f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1) \propto λ_{ij ∣ u, y} \times f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 1) \quad \text{(by Equation (19))} \\ = \frac{{(η_{s, ij} + ζ)}^{α_{s}}}{Γ (α_{s})} y_{ij}^{α_{s} - 1} \exp (- y_{ij} (η_{s, ij} + ζ)) . \end{matrix}

When the observed values are not related to the sampling probability (i.e., ζ= 0), the population and sample-complement distributions, $f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1)$ and $f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 0)$ , are the same as the sample distribution $f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 1)$ . In this case, the algorithm for the (empirical) best predictor is identical to that in Section 2.1.
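The two closed forms used above, the expectation $E [\exp (- ζ y_{ij})] = {(1 + ζ / η_{s, ij})}^{- α_{s}}$ and the gamma population density with rate $η_{s, ij} + ζ$ , can be checked numerically. The following Python sketch is only an illustration under assumed parameter values (α_s= 2.5, η_s= 1.8, ζ= 0.25); the helper names `gamma_pdf` and `expected_tilt` are hypothetical.

```python
import math

def gamma_pdf(y, shape, rate):
    """Density of Gamma(shape, rate) at y > 0."""
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1.0) * math.log(y) - rate * y)

def expected_tilt(shape, rate, zeta, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of E[exp(-zeta * y)] for y ~ Gamma(shape, rate)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        total += math.exp(-zeta * y) * gamma_pdf(y, shape, rate)
    return total * h

alpha_s, eta_s, zeta = 2.5, 1.8, 0.25        # illustrative values only
closed_form = (1.0 + zeta / eta_s) ** (-alpha_s)
numeric = expected_tilt(alpha_s, eta_s, zeta)

# Tilting the sample density by exp(-zeta * y) and renormalising
# should reproduce the Gamma(alpha_s, eta_s + zeta) density.
y0 = 0.7
tilted = math.exp(-zeta * y0) * gamma_pdf(y0, alpha_s, eta_s) / closed_form
population = gamma_pdf(y0, alpha_s, eta_s + zeta)
```

Here `numeric` should agree with `closed_form` up to quadrature error, and `tilted` should agree with `population` up to rounding.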

However, under the informative design, that is, $ζ \neq 0$ , we need to reflect the informative sampling scheme by using the sample-complement distribution in Equation (23). The procedure requires an estimator of $ψ^{INFO}$ . We define ${({\hat{α}}_{s}, {\hat{δ}}_{s}, {\hat{γ}}_{s}^{T})}^{T}$ to be the maximum likelihood estimator under the sample model. The estimator of ${(a^{T}, ζ, κ^{T})}^{T}$ is obtained by minimizing

\begin{matrix} SSE (a^{T}, ζ, κ^{T}) = \sum_{i = 1}^{D} \sum_{j = 1}^{n_{i}} {(w_{ij} - κ_{i} \exp (x_{ij}^{T} a - ζ y_{ij}))}^{2}, \end{matrix}

(24)

as in Pfeffermann and Sverchkov (2007). The procedure for the best predictor under the informative design is then implemented as follows. For $ℓ = 1, \dots, L$ , repeat the following steps:

1. Generate $u_{i}^{(ℓ)} ~ Gamma (n_{i} α_{s} + δ_{s}, \sum_{j = 1}^{n_{i}} y_{ij} \exp (x_{ij}^{T} γ_{s}) + δ_{s})$ , $i = 1, \dots, D$ .

2. Generate $y_{ij}^{* (ℓ)} ~ Gamma (α_{s}, \exp (x_{ij}^{T} γ_{s}) u_{i}^{(ℓ)} + ζ)$ , $j = n_{i} + 1, \dots, N_{i}$ .

3. Define

\begin{matrix} {\hat{θ}}_{i, INFO}^{(ℓ)} = h (y_{is}, y_{i (n_{i} + 1)}^{* (ℓ)}, \dots, y_{i N_{i}}^{* (ℓ)}) . \end{matrix}

Then, the empirical best predictor of the area parameter is defined as

\begin{matrix} {\hat{θ}}_{i}^{EB_INFO} = \frac{1}{L} \sum_{ℓ = 1}^{L} {\hat{θ}}_{i, INFO}^{(ℓ)} . \end{matrix}

(25)
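The prediction algorithm leading to Equation (25) can be sketched for a single area as follows. This is a minimal Python illustration rather than the authors' code: the function name `eb_info_predictor`, the restriction to one scalar covariate with an intercept, and the fixed seed are assumptions made for the example, and the parameter values would in practice be the estimates ${\hat{ψ}}^{INFO}$ .

```python
import math
import random
from statistics import fmean

def eb_info_predictor(y_s, x_s, x_r, alpha_s, delta_s, gamma_s, zeta, h, L=200, seed=0):
    """Monte Carlo approximation of the predictor in Equation (25) for one area.

    y_s, x_s : responses and scalar covariates of the n_i sampled units
    x_r      : scalar covariates of the N_i - n_i non-sampled units
    gamma_s  : (intercept, slope), so x_ij^T gamma_s = gamma_s[0] + gamma_s[1] * x
    h        : function mapping the list of all N_i responses to the area parameter
    """
    rng = random.Random(seed)

    def lin(x):
        return math.exp(gamma_s[0] + gamma_s[1] * x)

    post_shape = len(y_s) * alpha_s + delta_s
    post_rate = sum(y * lin(x) for y, x in zip(y_s, x_s)) + delta_s
    draws = []
    for _ in range(L):
        # Area effect from its conditional gamma distribution given the sample.
        # random.gammavariate takes (shape, scale), hence the reciprocal of the rate.
        u = rng.gammavariate(post_shape, 1.0 / post_rate)
        # Non-sampled units from Gamma(alpha_s, rate = exp(x^T gamma_s) * u + zeta).
        y_r = [rng.gammavariate(alpha_s, 1.0 / (lin(x) * u + zeta)) for x in x_r]
        # Area parameter computed from observed plus simulated values.
        draws.append(h(list(y_s) + y_r))
    return fmean(draws)
```

Passing `h=fmean` gives a predictor of the area mean; a quantile or Gini function can be supplied in the same way. Setting `zeta=0` reduces the second draw to the sample distribution.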

Remark 1. In the above procedure for obtaining the EBP under informative sampling, we use the population distribution, $f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1)$ , as an approximation for the sample-complement distribution, $f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 0)$ , assuming sufficiently large λ_ij. Simulating from the population distribution offers a computational advantage, as it follows a standard gamma distribution.

Remark 2. In the simulation study, we also use inversion sampling to generate $y_{ij}^{* (ℓ)}$ from the exact sample-complement distribution $f (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 0; {\hat{ψ}}^{INFO})$ in Step 2 of the EBP procedure. Specifically, we decompose Step 2 into two steps as follows:

Generate $q_{ij} \overset{iid}{~} U (0, 1)$ , $j = n_{i} + 1, \dots, N_{i}$ .

Set $y_{ij}^{* (ℓ)} = {\hat{F}}^{- 1} (q_{ij} ∣ x_{ij}, u_{i}^{(ℓ)}, I_{i} = 1, I_{ij} = 0; {\hat{ψ}}^{INFO})$ , $j = n_{i} + 1, \dots, N_{i}$ , where

\begin{matrix} F (y_{ij} ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 0; {\hat{ψ}}^{INFO}) = \int_{0}^{y_{ij}} f (t ∣ x_{ij}, u_{i}, I_{i} = 1, I_{ij} = 0) d t \\ = \frac{λ_{ij}}{λ_{ij} - 1} A_{ζ} - \frac{1}{λ_{ij} - 1} A_{0}, \end{matrix}

$A_{ζ} = γ (α_{s}, (η_{s, ij} + ζ) y_{ij}) / Γ (α_{s})$ , $A_{0}$ is $A_{ζ}$ evaluated at ζ= 0, and $γ (a, x) = \int_{0}^{x} t^{a - 1} \exp (- t) d t$ is the lower incomplete gamma function.
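The inversion step in Remark 2 can be illustrated with a short Python sketch. It is not the authors' implementation: the regularized incomplete gamma function is evaluated with a plain power series, the inverse is found by bisection, and the names `reg_lower_gamma`, `complement_cdf`, and `invert_cdf` are hypothetical. The sketch assumes λ_ij > 1 and relies on the CDF being increasing over the bracketed region.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function gamma(a, x) / Gamma(a), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def complement_cdf(y, alpha_s, eta_s, zeta, lam):
    """Sample-complement CDF: lam/(lam-1) * A_zeta - 1/(lam-1) * A_0."""
    a_zeta = reg_lower_gamma(alpha_s, (eta_s + zeta) * y)
    a_zero = reg_lower_gamma(alpha_s, eta_s * y)
    return lam / (lam - 1.0) * a_zeta - a_zero / (lam - 1.0)

def invert_cdf(q, alpha_s, eta_s, zeta, lam, tol=1e-10):
    """Solve F(y) = q for 0 < q < 1 by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while complement_cdf(hi, alpha_s, eta_s, zeta, lam) < q:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if complement_cdf(mid, alpha_s, eta_s, zeta, lam) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a uniform draw `q`, `invert_cdf(q, ...)` returns one simulated non-sampled value. A library routine for the regularized incomplete gamma function could replace the series where one is available.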

Based on the mean weight model in Equation (22), we extend the MSE estimators from Section 3.1 to quantify the prediction error associated with ${\hat{θ}}_{i}^{EB_INFO}$ under informative sampling. As in Section 3.1, we estimate the leading term M_1i by the sample variance of the simulated small area parameters from the prediction algorithm, given by

{\hat{M}}_{1 i}^{INFO} = \frac{1}{L - 1} \sum_{l = 1}^{L} {({\hat{θ}}_{i, INFO}^{(l)} - {\hat{θ}}_{i}^{EB_INFO})}^{2} .

However, because the weight model in Equation (22) is specified only through its first moment, a fully parametric bootstrap procedure is not directly feasible for estimating the second component M_2i (Berg and Eideh 2024). To account for the variability induced by replacing the fixed model parameters with their estimates in the predictor, we adapt the procedure proposed by Cho and Berg (2022) within our framework:

1. Compute the jackknife variance estimator for ${\hat{ψ}}^{INFO}$ based on Shao (2003):

{\hat{V}}_{Jack} = \frac{D - 1}{D} \sum_{d = 1}^{D} ({\hat{ψ}}^{(d), INFO} - {\bar{ψ}}^{INFO}) {({\hat{ψ}}^{(d), INFO} - {\bar{ψ}}^{INFO})}^{T},

where ${\hat{ψ}}^{(d), INFO}$ denotes the estimator of $ψ^{INFO}$ with the d-th area omitted, and ${\bar{ψ}}^{INFO} = D^{- 1} \sum_{d = 1}^{D} {\hat{ψ}}^{(d), INFO}$ is their average across all areas.

2. For $b = 1, \dots, B$ , implement the prediction algorithm defined in the current section with ${\hat{ψ}}^{(b)} \overset{iid}{~} N ({\hat{ψ}}^{INFO}, {\hat{V}}_{Jack})$ to obtain the bootstrap version of the EBP, denoted as ${\hat{θ}}_{i}^{EB_INFO, (b)} = L^{- 1} \sum_{l = 1}^{L} {\hat{θ}}_{i, INFO}^{(l, b)}$ , where ${\hat{θ}}_{i, INFO}^{(l, b)}$ is the l-th simulated value of the small area parameter in the b-th bootstrap iteration.

3. Calculate the estimator of the second component as:

\begin{matrix} {\hat{M}}_{2 i}^{INFO} = B^{- 1} \sum_{b = 1}^{B} {({\hat{θ}}_{i}^{EB_INFO, (b)} - {\hat{θ}}_{i}^{EB_INFO})}^{2} . \end{matrix}

Then, the MSE estimator without the bias-correction for ${\hat{M}}_{1 i}^{INFO}$ can be defined as ${\hat{mse}}_{i}^{no_BC, INFO} = {\hat{M}}_{1 i}^{INFO} + {\hat{M}}_{2 i}^{INFO}$ . For each b in Step 2, we can also obtain the bootstrap version of the leading term as ${\hat{M}}_{1 i}^{INFO, (b)} = {(L - 1)}^{- 1} \sum_{l = 1}^{L} {({\hat{θ}}_{i, INFO}^{(l, b)} - {\hat{θ}}_{i}^{EB_INFO, (b)})}^{2}$ . Analogous to the approach in Section 3.1, we define the bias-corrected MSE estimators for ${\hat{θ}}_{i}^{EB_INFO}$ by replacing ${\hat{M}}_{1 i}$ , ${\hat{M}}_{2 i}$ , and ${\bar{M}}_{1 i}^{* B}$ with their counterparts under informative sampling:

{\hat{M}}_{1 i}^{INFO}, {\hat{M}}_{2 i}^{INFO}, \text{ and } {\bar{M}}_{1 i}^{* B, INFO} = \frac{1}{B} \sum_{b = 1}^{B} {\hat{M}}_{1 i}^{INFO, (b)} .

We denote the resulting estimators by ${\hat{mse}}_{i}^{method, INFO}$ , where $method \in \{Add, Mult, HM, Comp\}$ .
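The jackknife covariance and the second MSE component described above reduce to simple sums. The Python sketch below is an illustration under assumptions: the helper names `jackknife_cov` and `second_component` are hypothetical, the leave-one-area-out estimates are taken as given, and drawing ${\hat{ψ}}^{(b)}$ from the normal distribution (which needs a matrix square root of ${\hat{V}}_{Jack}$ ) is left out.

```python
def jackknife_cov(leave_one_out):
    """Jackknife covariance matrix from D leave-one-area-out parameter estimates.

    leave_one_out[d] is the parameter vector estimated with area d removed.
    """
    D = len(leave_one_out)
    p = len(leave_one_out[0])
    center = [sum(est[k] for est in leave_one_out) / D for k in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for est in leave_one_out:
        dev = [est[k] - center[k] for k in range(p)]
        for r in range(p):
            for c in range(p):
                cov[r][c] += dev[r] * dev[c]
    scale = (D - 1) / D
    return [[scale * v for v in row] for row in cov]

def second_component(boot_predictors, predictor):
    """Mean squared deviation of the bootstrap predictors from the original predictor."""
    return sum((b - predictor) ** 2 for b in boot_predictors) / len(boot_predictors)
```

For example, `jackknife_cov([[1.0], [2.0], [3.0]])` gives a 1 × 1 matrix with entry (2/3) × 2 = 4/3.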

5. Simulation Study

5.1. Simulation Study 1: Comparison of Predictors

We evaluate the performance of the gamma-gamma predictors introduced in Sections 2 and 4 under multiple scenarios. We adapt the simulation setup of Pfeffermann and Sverchkov (2007) to our framework. For each scenario, we consider D= 60 areas, each with population size N_i= 1000, and stratify the areas into two strata, where Stratum H₁ and Stratum H₂ are composed of areas $1 \leq i \leq 30$ and $31 \leq i \leq D$ , respectively. We assign an area sample size of n_i= 25 if $i \in H_{1}$ and n_i= 50 if $i \in H_{2}$ . Samples of size n_i are selected independently across the areas using systematic sampling. For the population units, we simulate the response variables y_ij together with the values of the auxiliary variable $x_{ij} ~ U (0, 2)$ for $j = 1, \dots, N_{i}$ , where the x_ij are held constant throughout the MC simulations.

In this simulation study, we further examine the robustness of the gamma-gamma predictors against model misspecification. Each scenario combines a population model for the response variable y, whose skewness is controlled by the shape parameter α of the gamma distribution, with a model for the inclusion probability π_ij, which is either noninformative or informative ( $ζ \in \{0, 0.25\}$ ):

Population model for y, with $α \in \{2.5, 5\}$ adjusting the degree of skewness:

Gamma-Gamma model (GGM)	Gamma GLMM (GLMM)
$\begin{matrix} y_{ij} ∣ u_{i} ~ Gamma (α, \exp (1 + 0.5 x_{ij}) u_{i}) \\ u_{i} ~ Gamma (4, 4) \end{matrix}$	$\begin{matrix} y_{ij} ∣ ν_{i} & ~ Gamma (α, \frac{α}{μ_{ij}}) \\ \log (μ_{ij}) & = 0.5 + 0.05 x_{ij} + ν_{i} \\ ν_{i} & ~ N (0, {0.1}^{2}) \end{matrix}$

Inclusion probability π_ij model for systematic sampling, $ζ \in \{0, 0.25\}$ :

Exponential Form (Exp)	Linear Form (Linear)
$\begin{matrix} π_{ij} = \frac{n_{i} \exp (0.05 x_{ij} + ζ y_{ij} + τ_{ij} / 30)}{\sum_{k = 1}^{N_{i}} \exp (0.05 x_{ik} + ζ y_{ik} + τ_{ik} / 30)} \\ τ_{ij} \overset{iid}{~} Gamma (4, 4) \end{matrix}$	$\begin{matrix} π_{ij} = \frac{n_{i} (1 + x_{ij} + \exp (ζ) y_{ij} + γ_{ij} / 20)}{\sum_{k = 1}^{N_{i}} (1 + x_{ik} + \exp (ζ) y_{ik} + γ_{ik} / 20)} \\ γ_{ij} \overset{iid}{~} N (0, 1) \end{matrix}$

Note that the skewness of the distribution decreases as α increases, and we set $\exp (ζ) = 0$ in the linear inclusion probability model when ζ= 0 to implement the noninformative sampling scheme.
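The data-generating step for one area under the gamma-gamma population model and the exponential inclusion probability model can be sketched in Python as follows. This is an illustration, not the authors' simulation code: the function name `simulate_area` is hypothetical, and the sketch assumes that the systematic sample is drawn with probabilities proportional to π_ij using a single random start, with every π_ij below one.

```python
import math
import random

def simulate_area(N=1000, n=25, alpha=2.5, zeta=0.25, seed=0):
    """One area's population, exponential-form inclusion probabilities, and a systematic sample."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 2.0) for _ in range(N)]
    # u_i ~ Gamma(4, 4) in shape-rate form; random.gammavariate takes (shape, scale).
    u = rng.gammavariate(4.0, 1.0 / 4.0)
    # y_ij | u_i ~ Gamma(alpha, rate = exp(1 + 0.5 x_ij) * u_i).
    y = [rng.gammavariate(alpha, 1.0 / (math.exp(1.0 + 0.5 * xj) * u)) for xj in x]
    tau = [rng.gammavariate(4.0, 1.0 / 4.0) for _ in range(N)]
    size = [math.exp(0.05 * xj + zeta * yj + tj / 30.0) for xj, yj, tj in zip(x, y, tau)]
    total = sum(size)
    pi = [n * s / total for s in size]
    # Assumed scheme: systematic selection along the cumulated pi with one random start.
    start = rng.random()
    sample, cum, k = [], 0.0, 0
    for j, p in enumerate(pi):
        cum += p
        while k < n and start + k < cum:
            sample.append(j)
            k += 1
    return x, y, pi, sample
```

By construction the inclusion probabilities sum to n_i, and setting `zeta=0` removes the dependence of π_ij on y_ij.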

For a given scenario, in the m-th MC iteration, $m = 1, \dots, M$ with M= 10,000, we compute the considered predictors, ${\hat{θ}}_{i}^{pred, (m)}$ , $pred \in \{EB, EB_INFO, EB_GLMM, PI\}$ , where EB, $EB_INFO$ , $EB_GLMM$ , and PI are defined in Equations (6), (25), (9), and (10), respectively. In addition to the small area mean parameter $\bar{Y}$ , we consider three non-additive small area parameters. The first two are the 0.25 and 0.75 sample quantiles, denoted as Q_0.25 and Q_0.75, respectively. These are calculated through the function quantile in R with the default method. The third is the Gini coefficient (abbreviated Gini), defined as

{Gini}_{i} = \frac{\sum_{k = 1}^{N_{i}} \sum_{ℓ = 1}^{N_{i}} ∣ y_{ik} - y_{i ℓ} ∣}{2 N_{i}^{2} {\bar{Y}}_{i}},

where the value is obtained by the function gini of R package reldist (Handcock 2023).
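The displayed Gini formula can be evaluated without the double sum by sorting the values first. The Python sketch below implements only the formula as written above; it is not a reimplementation of the reldist function, and the name `gini` is used here purely for illustration.

```python
def gini(values):
    """Gini coefficient: sum_k sum_l |y_k - y_l| / (2 N^2 ybar), computed via sorting."""
    y = sorted(values)
    N = len(y)
    total = sum(y)
    if N == 0 or total <= 0:
        raise ValueError("need a non-empty list with positive total")
    # For sorted y, sum_k sum_l |y_k - y_l| = 2 * sum_k (2k - N - 1) * y_(k), k = 1..N.
    pairwise = 2.0 * sum((2 * k - N - 1) * v for k, v in enumerate(y, start=1))
    mean = total / N
    return pairwise / (2.0 * N * N * mean)
```

For instance, the values (0, 0, 0, 4) have mean 1 and pairwise absolute differences summing to 24, so the formula gives 24 / (2 × 16 × 1) = 0.75.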

We compare the predictors using the relative bias (RB) and the relative root mean squared error (RRMSE). The RB and RRMSE for the predictor “pred” are defined as

\begin{matrix} {RB}_{i} = \frac{M^{- 1} \sum_{m = 1}^{M} ({\hat{θ}}_{i}^{pred, (m)} - θ_{i}^{(m)})}{M^{- 1} \sum_{m = 1}^{M} θ_{i}^{(m)}} \text{ and } {RRMSE}_{i} = \frac{\sqrt{M^{- 1} \sum_{m = 1}^{M} {({\hat{θ}}_{i}^{pred, (m)} - θ_{i}^{(m)})}^{2}}}{M^{- 1} \sum_{m = 1}^{M} θ_{i}^{(m)}}, \end{matrix}

where $θ_{i}^{(m)} = h (y_{i 1}^{(m)}, \dots, y_{i N_{i}}^{(m)})$ denotes the true small area parameter for area i in MC simulation m and ${\hat{θ}}_{i}^{pred, (m)}$ denotes the corresponding predictor.
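As a small worked illustration of the two criteria, the following Python sketch computes RB and RRMSE in percent for one area from lists of predictions and true values across Monte Carlo runs. The function name `rb_rrmse` is hypothetical.

```python
import math

def rb_rrmse(predictions, truths):
    """Relative bias and relative RMSE (both in %) for one area over M Monte Carlo runs."""
    M = len(truths)
    mean_truth = sum(truths) / M
    errors = [p - t for p, t in zip(predictions, truths)]
    rb = (sum(errors) / M) / mean_truth
    rrmse = math.sqrt(sum(e * e for e in errors) / M) / mean_truth
    return 100.0 * rb, 100.0 * rrmse
```

With predictions (1.1, 0.9) and true values (1.0, 1.0), the errors cancel in the bias, while the RRMSE is 0.1 / 1.0, that is, 10%.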

The averages of RB and RRMSE (in %) across areas are shown in Tables 1 and 2 for α= 2.5 and 5, respectively. Under noninformative sampling (ζ= 0), EB remains competitive with the alternatives, even when the population and inclusion probability models are misspecified: its RB is close to zero in all scenarios, and its RRMSE is the smallest or close to the smallest at both α= 2.5 and 5. EB_GLMM is robust to the linear inclusion probability model but not to a population generated from the gamma-gamma model, where its RB exceeds 5% in absolute value for the mean and the quantiles when α= 5 and its RRMSE is much larger than that of EB or EB_INFO. In contrast, EB_INFO performs similarly to EB, with RB uniformly below 1% in absolute value and RRMSE lower than that of EB_GLMM when the population model is misspecified.

Under informative sampling ( $ζ \neq 0$ ), EB_INFO achieves the smallest RBs and RRMSEs in most scenarios, whereas the RBs of the predictors that ignore unequal selection probabilities increase substantially. We also computed the EBP using the exact sample-complement distribution via inversion sampling, as described in Remark 2. The corresponding values are shown in parentheses in Tables 1 and 2. The results indicate that the predictors based on the population and sample-complement distributions perform almost identically in most scenarios. PI has large biases for nonlinear small area parameters. This occurs because the PI predictor replaces each non-sampled unit with its estimated conditional mean, which compresses the predicted distribution. Therefore, the 0.25 quantile is predicted to be greater than the actual value, while the 0.75 quantile is predicted to be lower.

Table 1.

RB (%) and RRMSE (%) for EB, EB_INFO, EB_GLMM, PI of Considered Small Area Parameters by the Informative Sample Design at α= 2.5.

				RB (%)				RRMSE (%)
Parameter	ζ	wt.model	pop.model	EB	EB_INFO	EB_GLMM	PI	EB	EB_INFO	EB_GLMM	PI
$\bar{Y}$	0.00	Exp	Gam	0.02	0.01 (0.01)	−4.11	−2.31	12.84	12.84 (12.84)	31.89	14.38
			GLMM	−0.01	−0.02 (−0.02)	−0.25	−0.62	7.51	7.51 (7.51)	7.44	7.43
		Linear	Gam	0.01	0.65 (0.67)	−4.09	−2.32	12.80	12.91 (12.92)	31.67	14.31
			GLMM	0.01	0.32 (0.33)	−0.23	−0.60	7.52	7.53 (7.53)	7.45	7.44
	0.25	Exp	Gam	14.52	0.53 (−0.02)	7.46	11.49	38.68	10.90 (10.97)	25.96	28.78
			GLMM	21.38	0.76 (−0.05)	21.03	20.51	22.75	6.76 (6.67)	22.32	21.83
		Linear	Gam	16.93	0.38 (−0.28)	11.02	14.10	29.21	16.90 (17.92)	32.41	22.51
			GLMM	21.45	1.41 (0.61)	21.13	20.70	22.70	7.15 (7.01)	22.35	21.93
Q_0.25	0.00	Exp	Gam	0.02	0.01 (0.01)	−4.86	47.85	13.31	13.30 (13.30)	32.96	56.31
			GLMM	−0.00	−0.02 (−0.02)	−0.09	80.80	8.06	8.06 (8.06)	8.03	81.45
		Linear	Gam	0.03	0.56 (0.58)	−4.82	48.08	13.16	13.23 (13.23)	32.55	56.49
			GLMM	0.02	0.34 (0.35)	−0.06	80.76	8.08	8.09 (8.09)	8.05	81.43
	0.25	Exp	Gam	12.76	0.84 (0.48)	4.89	63.71	37.15	11.73 (11.63)	26.65	90.95
			GLMM	21.33	0.62 (−0.03)	21.12	117.96	22.97	7.54 (7.49)	22.69	118.82
		Linear	Gam	15.59	1.70 (1.26)	9.24	64.41	28.19	16.00 (16.37)	33.16	82.06
			GLMM	26.34	5.34 (4.67)	27.13	123.15	27.63	9.45 (9.06)	28.35	123.81
Q_0.75	0.00	Exp	Gam	0.02	0.00 (0.00)	−4.06	−8.50	13.00	13.00 (13.00)	32.28	18.50
			GLMM	−0.02	−0.03 (−0.03)	−0.29	−23.08	7.66	7.66 (7.66)	7.59	24.35
		Linear	Gam	0.01	0.64 (0.66)	−4.04	−8.35	12.97	13.07 (13.07)	31.95	18.31
			GLMM	−0.01	0.31 (0.32)	−0.27	−23.07	7.67	7.68 (7.68)	7.60	24.34
	0.25	Exp	Gam	14.45	0.51 (0.01)	7.36	5.81	38.84	11.22 (11.27)	26.75	23.46
			GLMM	21.37	0.72 (−0.06)	20.99	−6.24	22.81	6.94 (6.86)	22.36	9.36
		Linear	Gam	16.64	0.21 (−0.38)	10.74	9.43	28.98	17.22 (18.05)	33.02	18.07
			GLMM	20.62	0.70 (−0.06)	20.10	−7.97	21.97	7.19 (7.14)	21.43	10.68
Gini	0.00	Exp	Gam	−0.00	−0.01 (−0.01)	0.66	−53.45	2.39	2.39 (2.39)	2.59	53.52
			GLMM	−0.01	−0.01 (−0.01)	−0.17	−90.15	2.37	2.37 (2.37)	2.53	90.19
		Linear	Gam	−0.02	0.05 (0.05)	0.65	−53.61	2.40	2.40 (2.40)	2.58	53.69
			GLMM	−0.01	−0.01 (−0.01)	−0.18	−90.08	2.39	2.39 (2.39)	2.54	90.13
	0.25	Exp	Gam	1.42	0.14 (0.03)	2.12	−49.86	2.76	2.41 (2.44)	3.28	49.94
			GLMM	0.04	0.17 (−0.02)	−0.07	−89.18	2.35	2.35 (2.34)	2.49	89.22
		Linear	Gam	1.16	−0.66 (−0.80)	1.51	−47.59	2.65	2.58 (2.68)	2.91	47.67
			GLMM	−4.20	−4.02 (−4.21)	−5.16	−93.14	4.81	4.66 (4.82)	5.72	93.17

Table 2.

RB (%) and RRMSE (%) for EB, EB_INFO, EB_GLMM, PI of Considered Small Area Parameters by the Informative Sample Design at α= 5.

				RB (%)				RRMSE (%)
Parameter	ζ	wt.model	pop.model	EB	EB_INFO	EB_GLMM	PI	EB	EB_INFO	EB_GLMM	PI
$\bar{Y}$	0.00	Exp	Gam	0.01	0.01 (0.00)	−5.91	−1.91	9.00	9.00 (9.00)	34.41	10.73
			GLMM	0.01	0.00 (0.00)	−0.21	−0.44	6.07	6.07 (6.07)	6.11	6.08
		Linear	Gam	0.00	0.51 (0.53)	−5.90	−1.92	9.08	9.16 (9.17)	34.56	10.83
			GLMM	−0.00	0.22 (0.23)	−0.22	−0.46	6.09	6.09 (6.09)	6.12	6.09
	0.25	Exp	Gam	13.71	−1.51 (0.72)	4.78	11.24	30.15	19.44 (12.00)	25.96	22.49
			GLMM	9.64	0.34 (−0.02)	9.38	9.10	11.36	5.70 (5.68)	11.03	10.78
		Linear	Gam	11.48	0.61 (0.14)	4.01	9.27	18.79	12.69 (13.43)	33.57	13.98
			GLMM	10.74	0.90 (0.51)	10.49	10.23	12.30	5.87 (5.82)	12.01	11.77
Q_0.25	0.00	Exp	Gam	0.03	0.02 (0.02)	−7.12	20.41	9.29	9.29 (9.29)	35.75	24.27
			GLMM	0.00	−0.01 (−0.01)	−0.29	43.93	6.34	6.34 (6.34)	6.38	44.48
		Linear	Gam	−0.00	0.41 (0.42)	−7.09	20.52	9.29	9.34 (9.34)	35.73	24.36
			GLMM	−0.01	0.21 (0.22)	−0.30	43.87	6.36	6.37 (6.37)	6.41	44.43
	0.25	Exp	Gam	11.73	−1.12 (1.17)	1.50	32.67	28.20	19.95 (12.02)	27.29	46.99
			GLMM	9.61	0.30 (−0.02)	9.23	57.31	11.52	6.07 (6.06)	11.10	57.83
		Linear	Gam	10.47	1.64 (1.35)	1.76	30.36	17.98	11.71 (11.92)	34.99	37.55
			GLMM	12.22	2.21 (1.86)	12.01	61.10	13.78	6.56 (6.44)	13.53	61.58
Q_0.75	0.00	Exp	Gam	0.01	0.01 (0.01)	−5.72	−4.98	9.17	9.17 (9.17)	34.74	12.84
			GLMM	0.01	0.00 (0.00)	−0.19	−18.64	6.18	6.18 (6.18)	6.21	19.75
		Linear	Gam	0.00	0.52 (0.54)	−5.71	−4.85	9.26	9.34 (9.35)	34.81	12.85
			GLMM	−0.00	0.22 (0.23)	−0.20	−18.67	6.19	6.19 (6.19)	6.22	19.78
	0.25	Exp	Gam	14.05	−1.67 (0.70)	5.17	9.09	30.70	20.28 (12.34)	26.84	20.28
			GLMM	9.64	0.34 (−0.03)	9.41	−10.62	11.41	5.81 (5.78)	11.11	12.17
		Linear	Gam	11.63	0.44 (−0.01)	4.30	7.30	19.03	13.14 (13.78)	34.23	12.39
			GLMM	10.29	0.49 (0.11)	10.03	−10.83	11.95	5.91 (5.89)	11.66	12.42
Gini	0.00	Exp	Gam	−0.03	−0.03 (−0.03)	1.83	−41.99	2.42	2.42 (2.42)	3.16	42.07
			GLMM	0.01	0.01 (0.01)	0.14	−88.46	2.50	2.50 (2.50)	2.58	88.51
		Linear	Gam	−0.01	0.09 (0.09)	1.82	−42.07	2.44	2.45 (2.45)	3.14	42.16
			GLMM	0.01	0.01 (0.01)	0.14	−88.43	2.50	2.50 (2.50)	2.59	88.48
	0.25	Exp	Gam	2.32	0.18 (0.13)	4.24	−37.52	3.35	4.11 (3.04)	5.05	37.62
			GLMM	0.04	0.08 (−0.00)	0.25	−87.84	2.51	2.51 (2.50)	2.60	87.89
		Linear	Gam	1.11	−0.91 (−1.07)	2.88	−37.53	2.66	2.75 (2.89)	3.91	37.62
			GLMM	−2.52	−2.44 (−2.54)	−2.60	−91.15	3.54	3.48 (3.55)	3.64	91.20

Remark 3. We set L= 200 Monte Carlo replicates for calculating the predictors, based on the sensitivity analysis of the relative root mean squared error (RRMSE) of EB_INFO under the gamma-gamma population model with an exponential inclusion probability, as shown in Figure 1. While the RRMSE decreases with increasing L, the improvement becomes negligible beyond L= 200. Considering this trend alongside computational cost, we chose L= 200 for our simulation studies. For practical applications, users can perform a grid search over a range of L values and select the one that minimizes cross-validated prediction error, balancing accuracy and computational demands according to their specific dataset and resources.

Figure 1.

RRMSE (%) of EB_INFO under the gamma-gamma population model with an exponential inclusion probability, as L ranges from 25 to 1,000.

5.2. Simulation Study 2: Comparison of MSE Estimators

In Simulation 1, we observed that EB performs well even when the model is misspecified under noninformative sampling. Therefore, we focus on the performance of the MSE estimators in the scenario where the simulation data are generated from the gamma-gamma model with $α \in \{1, 2.5, 5\}$ , and the inclusion probability follows an exponential form. Due to computational intensity, we set N_i= 250 for all areas. The predictors and MSE estimators are constructed based on the assumptions of the gamma-gamma model.

We evaluate the MSE estimators described in Section 3 on the basis of two criteria. The first is the relative bias of the MSE estimator as an estimator of the unconditional MSE of the predictor. This is defined as

\begin{matrix} {RB}_{A}^{uncond} = \frac{{(MD)}^{- 1} \sum_{i = 1}^{D} \sum_{m = 1}^{M} {mse}_{i}^{(A, m)} - {MSE}^{UCond}}{{MSE}^{UCond}}, \end{matrix}

where ${mse}_{i}^{(A, m)}$ is the type A MSE estimator obtained in MC simulation m,

A \in \{noBC, Add, Mult, HM, Comp, S, D\},

and ${MSE}^{UCond} = {(MD)}^{- 1} \sum_{i = 1}^{D} \sum_{m = 1}^{M} {({\hat{θ}}_{i}^{EB, (m)} - θ_{i}^{(m)})}^{2}$ . Here, noBC, Add, Mult, HM, and Comp correspond to the MSE estimators defined in Equations (13) to (16), and S and D are defined in Equations (17) and (18), respectively. Note that the proposed estimators can also be regarded as estimators of the conditional MSE, defined as $E \{{({\hat{θ}}_{i}^{EB} - θ_{i})}^{2} ∣ y_{is}\}$ . Thus, we define the conditional RB as

\begin{matrix} {RB}_{A}^{cond} = \frac{{(MD)}^{- 1} \sum_{i = 1}^{D} \sum_{m = 1}^{M} {mse}_{i}^{(A, m)} - {MSE}^{Cond}}{{MSE}^{Cond}}, \end{matrix}

where

\begin{matrix} {MSE}^{Cond} = D^{- 1} \sum_{i = 1}^{D} ({\bar{M}}_{1 i} + {\bar{M}}_{2 i}), \end{matrix}

${\bar{M}}_{1 i} = M^{- 1} \sum_{m = 1}^{M} {({\hat{θ}}_{i}^{BP, (m)} - θ_{i}^{(m)})}^{2}$ and ${\bar{M}}_{2 i} = M^{- 1} \sum_{m = 1}^{M} {({\hat{θ}}_{i}^{EB, (m)} - {\hat{θ}}_{i}^{BP, (m)})}^{2}$ . Here, ${\hat{θ}}_{i}^{BP, (m)}$ is the best predictor, as defined in Equation (4), in the m-th MC simulation. Lohr and Rao (2009) evaluate the conditional relative bias of the MSE estimators in their simulations, and Booth and Hobert (1998) discuss the value of the conditional MSE estimators in prediction problems.

Several of the MSE estimators described in Section 3 include adjustments to correct for the bias in the estimation of the leading term. We therefore check whether there exists a bias for the leading term estimators. For this, define a test-statistic as

\begin{matrix} T^{Bias} = \frac{\bar{ω}}{s d_{ω} / \sqrt{M}}, \end{matrix}

where $ω^{(m)} = D^{- 1} \sum_{i = 1}^{D} ({\hat{M}}_{1 i}^{(m)} - {\bar{M}}_{1 i})$ , $\bar{ω} = M^{- 1} \sum_{m = 1}^{M} ω^{(m)}$ , and

{sd}_{ω} = \sqrt{{(M - 1)}^{- 1} \sum_{m = 1}^{M} {(ω^{(m)} - \bar{ω})}^{2}} .
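The test statistic is an ordinary one-sample t-ratio applied to the per-run average deviations $ω^{(m)}$ . A minimal Python sketch, with the hypothetical function name `t_bias` and the leading-term estimates supplied as nested lists, is:

```python
import math
from statistics import fmean, stdev

def t_bias(m1_hat, m1_bar):
    """t-type statistic for the bias of the leading-term estimator.

    m1_hat[m][i] : leading-term estimate for area i in Monte Carlo run m
    m1_bar[i]    : Monte Carlo value of the leading term for area i
    """
    # omega^{(m)}: average over areas of (estimate - Monte Carlo leading term).
    omega = [fmean(est - ref for est, ref in zip(run, m1_bar)) for run in m1_hat]
    # stdev uses the (M - 1) divisor, matching the definition of sd_omega.
    return fmean(omega) / (stdev(omega) / math.sqrt(len(omega)))
```

Values far from zero, for example beyond about 2 in absolute value, point to a bias in the estimator of the leading term.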

Figure 2 displays the relative biases of the alternative MSE estimators. The single-bootstrap MSE estimator (S) has a positive bias for $\bar{Y}$ , Q_0.25, and Q_0.75, while the noBC MSE estimator consistently has a relative bias close to zero. The double-bootstrap procedure (D) can over-correct this bias, producing substantial negative biases. For Gini, both S and D have negative biases in the conditional RB metric, and the effect of bias-correction is negligible for all metrics. The bias corrections Comp and HM lead to slight increases in the estimated MSE in the unconditional RB metric. The t-statistics in Table 3 provide insight into the relative biases of the noBC, Comp, and HM MSE estimators. The estimator of the leading term has no significant bias for $\bar{Y}$ , Q_0.25, or Q_0.75, so the noBC MSE estimator has RB close to zero for these parameters. For the Gini coefficient, the bias of the leading-term estimator is non-negligible: the t-statistics are negative for every α and exceed 2 in absolute value at α= 2.5 and 5. As illustrated by the conditional relative bias of the MSE estimators, the HM and Comp bias corrections effectively correct the bias of the estimator of the leading term for the Gini coefficient.

Figure 2.

RB (%) of MSE estimators.

Table 3.

$T^{Bias}$ by the Area Parameter for Each Scenario.

	α
Parameter	1	2.5	5
$\bar{Y}$	−0.48	−0.12	−0.05
Q_0.25	−0.21	0.19	−0.35
Q_0.75	−0.39	−0.44	−0.27
Gini	−1.36	−2.45	−3.64

We performed a sensitivity analysis to determine the optimal number of bootstrap replications B, focusing on the first-round replications (B₁) for methods S and D, while fixing B₂= 1. The relative root mean square error (RRMSE) was used to evaluate the variability of MSE estimators, calculated similarly to Section 5.1. For completeness, the RRMSE of an MSE estimator with respect to the true MSE is defined as

{RRMSE}_{MSE, i}^{Type} = \frac{\sqrt{M^{- 1} \sum_{m = 1}^{M} {({\hat{mse}}_{i}^{(m), est} - {MSE}_{i}^{Type})}^{2}}}{{MSE}_{i}^{Type}},

where ${\hat{mse}}_{i}^{(m), est}$ and ${MSE}_{i}^{(m), Type}$ denote the estimated and true mean squared errors, respectively, for area i in the m-th Monte Carlo iteration. The average true MSE is given by ${MSE}_{i}^{Type} = M^{- 1} \sum_{m = 1}^{M} {MSE}_{i}^{(m), Type}$ . The label $est \in \{noBC, Add, Mult, Comp, HM, S, D\}$ specifies the method used to estimate the MSE, while $Type \in \{Cond, Uncond\}$ indicates whether the true MSE is conditional or unconditional.

Although Table 4 shows that the RRMSE for methods S and D decreases slightly as B increases, the magnitude of this change is minimal in our simulation setting because the second MSE component contributes very little relative to the leading term. As a result, increasing B beyond 100 yields only negligible improvement in the overall RRMSE while substantially increasing the computational burden. For this reason, and given the dominance of the leading term in our design, we set B= 100 for the simulation study, noting that users may perform their own sensitivity analyses when applying the method in practice.

Table 4.

RRMSE for Considered MSE Estimators by the Number of B Ranging from 50 to 400 with the MC Replicates L and M Being 1,000 and 1,000, Respectively.

		Type=Ucond				Type=Cond
Parameter	Method	50	100	200	400	50	100	200	400
$\bar{Y}$	noBC	2.598	2.598	2.599	2.599	2.616	2.617	2.617	2.617
	Add	2.590	2.592	2.593	2.592	2.608	2.610	2.611	2.610
	Mult	2.596	2.598	2.599	2.598	2.614	2.616	2.617	2.616
	Comp	2.593	2.595	2.596	2.595	2.611	2.613	2.614	2.613
	HM	2.594	2.597	2.597	2.597	2.612	2.615	2.615	2.615
	S	1.001	0.904	0.911	0.744	1.003	0.905	0.915	0.747
	D	3.201	2.860	2.411	2.144	3.202	2.857	2.415	2.146
Q_0.25	noBC	2.700	2.699	2.700	2.699	2.706	2.705	2.706	2.706
	Add	2.688	2.689	2.690	2.689	2.694	2.694	2.695	2.694
	Mult	2.694	2.695	2.695	2.694	2.700	2.700	2.701	2.700
	Comp	2.691	2.692	2.693	2.691	2.697	2.697	2.698	2.697
	HM	2.693	2.693	2.694	2.693	2.699	2.699	2.699	2.698
	S	1.036	0.940	0.774	0.641	1.032	0.934	0.771	0.639
	D	3.413	2.890	2.462	1.976	3.397	2.867	2.440	1.963
Q_0.75	noBC	2.587	2.587	2.588	2.587	2.599	2.599	2.599	2.599
	Add	2.588	2.591	2.591	2.590	2.600	2.603	2.602	2.602
	Mult	2.595	2.597	2.597	2.596	2.606	2.609	2.608	2.608
	Comp	2.591	2.594	2.593	2.593	2.603	2.605	2.605	2.604
	HM	2.592	2.595	2.595	2.594	2.604	2.606	2.606	2.605
	S	1.086	0.869	1.125	0.898	1.086	0.869	1.128	0.900
	D	3.521	2.919	2.927	2.487	3.516	2.914	2.931	2.486
Gini	noBC	0.074	0.073	0.073	0.072	0.073	0.071	0.071	0.071
	Add	0.104	0.103	0.103	0.103	0.103	0.102	0.102	0.101
	Mult	0.104	0.104	0.103	0.103	0.103	0.102	0.102	0.102
	Comp	0.103	0.102	0.102	0.102	0.102	0.101	0.100	0.100
	HM	0.103	0.102	0.101	0.101	0.101	0.100	0.100	0.100
	S	0.209	0.151	0.113	0.087	0.208	0.150	0.112	0.085
	D	0.454	0.322	0.231	0.167	0.453	0.321	0.230	0.166

Remark 4. We conduct a separate simulation study to evaluate the performance of the proposed MSE estimators for ${\hat{θ}}_{i}^{EB_INFO}$ , as there are no competing MSE estimators under the gamma-gamma model with informative sampling. In this study, we set N_i= 1000 and B= 400, and focus on a single scenario where α= 2.5. RBs (%) and RRMSEs of the proposed MSE estimators are presented in Table 5. The RBs are within roughly 5% in absolute value, indicating that the proposed MSE estimators are reliable. We found that estimates of the model parameters in Equation (22) are sensitive to the choice of optimization algorithm used in the optim function of R, producing outliers for some replicates when α= 5. This suggests that careful attention is required when choosing the optimization method.

Table 5.

RB (%) and RRMSE of the Proposed MSE Estimators for ${\hat{θ}}_{i}^{EB_INFO}$ at α= 2.5 When N_i= 1000 with Bootstrap Iterations B and MC Replicates M Being 400 and 5,000, Respectively.

	RB					RRMSE
Parameter	noBC	Add	Mult	HM	Comp	noBC	Add	Mult	HM	Comp
$\bar{Y}$	−2.26	−2.48	−1.39	−1.73	−1.94	0.702	0.729	0.731	0.728	0.728
Q_0.25	2.93	2.80	3.90	3.54	3.33	0.769	0.817	0.823	0.815	0.815
Q_0.75	−2.80	−3.03	−1.95	−2.29	−2.50	0.699	0.717	0.717	0.715	0.716
Gini	−5.15	−5.17	−4.47	−4.70	−4.83	0.090	0.127	0.128	0.123	0.125

6. Application to Iowa Soil Erosion Data

We demonstrate the utility of the methods for estimating functions of sheet and rill erosion in Iowa counties. Sheet and rill erosion quantifies the soil loss due to the flow of water. Estimates of sheet and rill erosion for local areas are important for policy and planning purposes. Small area estimation for erosion is challenging because the erosion values have skewed distributions and are often available only from complex surveys.

The survey data for our study are from the National Resources Inventory (NRI). This longitudinal survey collects numerous variables related to agriculture and natural resources. We refer the reader to Nusser and Goebel (1997) and Berg and Kim (2021) for descriptions of the design of the NRI. Sheet and rill erosion is one of the primary response variables in the NRI. Estimates of sheet and rill erosion at the state level are produced as part of the standard NRI estimation program. County level estimation is generally considered a small area estimation problem in the NRI.

The auxiliary data are derived from the Cropland Data Layer (CDL). This raster layer provides a land cover classification for every cell on a grid covering the entire United States. We use the indicator that the CDL classification is corn as the covariate. This covariate is selected because crop management practices are known to affect sheet and rill erosion.

Let y_ij represent the soil erosion scaled by the sample standard deviation and x_ij denote the indicator for the cornfield at the j-th sampled unit in the i-th county in Iowa. Note that all ninety-nine counties in Iowa were sampled. Across the ninety-nine counties in Iowa, the average number of potential sampling units per county (denoted N_i) is approximately 363,778, ranging from 255,679 to 623,454. Among these, an average of 189 units per county (denoted n_i) were selected in the sample, with the number of sampled units ranging from 119 to 538. The resulting sampling proportion ( $n_{i} / N_{i}$ ) has a mean of 0.052%, with values ranging from 0.037% to 0.135% across counties. Further, we regress $\log (1 / π_{ij})$ on x_ij and y_ij with area fixed effects. The estimated coefficient for y_ij was 0.03485 with a standard error of 0.00779 (t= 4.48, p-value $= 7.66 \times 10^{- 6}$ ), indicating statistical significance. Thus, we consider the predictor under an informative sample design.

We fitted three models to the sampled data: the gamma-gamma model (Equation (2)), the generalized linear mixed model (GLMM; Equation (8)), and a linear mixed model (LMM) in which y_ij was transformed using a Box-Cox power transformation with λ = 0.17. We determined λ by a grid search under the REML framework, using the full linear mixed model as described in Rojas-Perilla et al. (2020). To evaluate predictive performance, we conducted repeated random subsampling cross-validation. Specifically, in each of M= 100 iterations, the data were randomly split into a 90% training set and a 10% test set. The models were fitted to the training data, and predictions were generated for the test set. For each model and iteration, we calculated the mean squared error (MSE) between the predicted and actual values. Averaged over the 100 iterations, the gamma-gamma models based on the procedure described in Section 4 and on the alternative procedure in Remark 2 of Section 4 achieved MSEs of 0.4370 and 0.4350, respectively. The GLMM yielded an MSE of 1.6622, and the LMM an MSE of 0.5592. The gamma-gamma model therefore provided the best predictive performance among the models considered, and we selected it as the preferred model for the sample data. The estimates of the model parameters are $(\hat{α}, \hat{δ}, {\hat{γ}}_{0}, {\hat{γ}}_{1}) = (0.6942, 11.1987, 0.1543, - 0.1844)$ . Of the two gamma-gamma procedures, we chose the main procedure described in Section 4 because it is computationally efficient and has the practical advantage of generating y values directly from a known density.
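The cross-validation scheme above can be sketched generically. In this minimal illustration, `monte_carlo_cv`, `fit_group_mean`, and `predict_group_mean` are hypothetical names introduced here, and the group-mean predictor is only a stand-in for the gamma-gamma, GLMM, and LMM fits, which are not reproduced.

```python
import random


def monte_carlo_cv(data, fit, predict, n_iter=100, test_frac=0.10, seed=0):
    """Repeated random subsampling cross-validation.

    data    : list of (x, y) pairs
    fit     : callable mapping a training list to a fitted model object
    predict : callable mapping (model, x) to a predicted y
    Returns the test-set MSE averaged over n_iter random splits.
    """
    rng = random.Random(seed)
    n = len(data)
    n_test = max(1, round(test_frac * n))
    mses = []
    for _ in range(n_iter):
        idx = list(range(n))
        rng.shuffle(idx)
        test = [data[k] for k in idx[:n_test]]
        train = [data[k] for k in idx[n_test:]]
        model = fit(train)
        sq_err = [(predict(model, x) - y) ** 2 for x, y in test]
        mses.append(sum(sq_err) / len(sq_err))
    return sum(mses) / len(mses)


def fit_group_mean(train):
    """Stand-in model: mean of y within each covariate value."""
    sums, counts = {}, {}
    for x, y in train:
        sums[x] = sums.get(x, 0.0) + y
        counts[x] = counts.get(x, 0) + 1
    overall = sum(y for _, y in train) / len(train)
    return {"means": {x: sums[x] / counts[x] for x in sums}, "overall": overall}


def predict_group_mean(model, x):
    # Fall back to the overall mean for covariate values unseen in training.
    return model["means"].get(x, model["overall"])


# Hypothetical toy data: y is constant within each covariate group.
toy = [(0, 1.0)] * 10 + [(1, 2.0)] * 10
avg_mse = monte_carlo_cv(toy, fit_group_mean, predict_group_mean)
```

Passing the same `seed` for every candidate model gives each of them identical training and test splits, so differences in average MSE reflect the models rather than the partitions.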

Figure 3 shows the predicted values of the following county-level parameters: the mean $({\bar{Y}}_{i} = N_{i}^{- 1} \sum_{j = 1}^{N_{i}} y_{ij})$ and the population proportion of soil erosion values greater than the sample median in Iowa (0.447), defined as $(P_{i} = N_{i}^{- 1} \sum_{j = 1}^{N_{i}} I (y_{ij} > 0.447)), i = 1, \dots, 99$ . The north-central region of Iowa, known as the Des Moines Lobe, experiences less soil erosion than the western and eastern parts of the state. The Des Moines Lobe, shaped by glacial activity, has been converted to farmland through drainage systems, and its fertile soil makes it highly productive for corn farming (Schilling et al. 2018; Secchi and Babcock 2007). Future research could examine the causal relationship between the cropland conservation program and soil loss in this region.

Figure 3. Prediction of the considered small area parameters with EB_INFO predictor.
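To make the two target parameters concrete, the sketch below evaluates the area mean and the exceedance proportion for a single area, under the assumption that every population value y_ij in the area is available. In practice only the sampled values are observed and the parameters are predicted, so this is purely a definition check. The function name `area_parameters` and the input values are hypothetical; the threshold 0.447 is the sample median reported in the text.

```python
def area_parameters(y_values, threshold=0.447):
    """Return (mean, proportion above threshold) for one area.

    y_values  : all population values y_ij for the area (assumed known here)
    threshold : cut-off for the indicator I(y_ij > threshold)
    """
    n = len(y_values)
    mean = sum(y_values) / n
    prop = sum(1 for v in y_values if v > threshold) / n
    return mean, prop


# Hypothetical values for one small area.
y_bar, p_exceed = area_parameters([0.1, 0.5, 0.9, 0.447])
```

The inequality is strict, so a value equal to the threshold does not count toward the proportion.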

We validate our analysis by comparing the EB_INFO predictions with the direct estimator and with the predicted values obtained under the gamma-gamma model without accounting for informative sampling (EB). (The direct estimators are computed using the survey::svydesign function in R.) As shown in Figure 4, the predicted values under informative sampling (EB_INFO) exhibit smaller absolute differences from the direct estimator than those under noninformative sampling, particularly in areas with larger sample sizes, for both small area parameters. Furthermore, as shown in Figure 5, although both the direct estimator and EB_INFO have reasonable coefficients of variation (CVs), within 20%, EB_INFO achieves smaller CVs for the nonlinear parameter (P_i) and is more stable, with fewer outliers.

Figure 4. Absolute difference from the direct estimator by sample size quartile group for informative and noninformative predictors across two small area parameters.

Figure 5. Comparison of the percent coefficient of variation (CV) between the direct estimator and the informative EB predictor for each small area parameter. We use ${\hat{mse}}_{i}^{Comp, INFO}$ when calculating the CV of EB_INFO based on B = 100 bootstrap replicates.
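As a small worked illustration of the quantity plotted in Figure 5, the sketch below assumes the usual definition of the percent CV: 100 times the square root of the estimated MSE, divided by the absolute value of the point estimate or prediction. The function name `percent_cv` and the numerical inputs are hypothetical.

```python
import math


def percent_cv(estimate, mse_hat):
    """Percent coefficient of variation: 100 * sqrt(mse_hat) / |estimate|."""
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    if mse_hat < 0:
        raise ValueError("estimated MSE must be non-negative")
    return 100.0 * math.sqrt(mse_hat) / abs(estimate)


# Hypothetical prediction 4.0 with estimated MSE 0.16.
cv = percent_cv(4.0, 0.16)
```

Under this definition, a CV below 20 means the estimated root MSE is less than one fifth of the predicted value.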

7. Conclusion

In this study, we have made several substantial contributions to the literature on small area estimation (SAE) under informative sampling designs. By performing extensive simulations, we have rigorously compared predictors under the gamma-gamma model with those derived under the gamma generalized linear mixed model (GLMM). Through sixteen distinct scenarios, we explored various levels of deviation from the underlying assumptions for the response variable y and sampling weight models, degrees of skewness in the response variable, and whether the sample design is informative. These simulations underscore the robustness of our gamma-gamma-based predictors and offer fresh insights into their relative performance across complex, realistic conditions.

Specifically, our results favor the gamma-gamma model over the gamma GLMM when dealing with skewed responses and informative designs. Additionally, we evaluated multiple MSE estimators for the gamma-gamma model, including one not previously paired with this framework. Furthermore, we present a hierarchical formulation of the gamma-gamma model that reduces the computational burden associated with the marginal distributions used in earlier work. We also generalize the procedures of Hobza et al. (2020) to predict general small area parameters, a broader class than additive small area parameters, which increases the applicability and flexibility of our approach. To ensure reproducibility, all code for the simulation study is publicly available at https://github.com/yhcho11/Gamma-Gamma-Informative/tree/main.

Despite these advancements, our work relies on the assumption that the covariates x_ij are known for all elements of the population. While this condition is satisfied if covariates are sourced from a comprehensive census or administrative database, we acknowledge that it may be restrictive in some settings.

Building on our findings that the gamma distribution provides a robust framework for constructing small area predictors under skewed response variables, we plan to extend our approach to more complex sampling frameworks, including two-stage informative designs that consider non-sampled areas. Considering incomplete auxiliary information is another potential future research direction. Such endeavors will further broaden the applicability, robustness, and practical relevance of our gamma-gamma model–based SAE methods.

Supplemental Material

Supplemental material, sj-pdf-1-jof-10.1177_0282423X261424813 for Comparison of Small Area Procedures Based on Gamma Distributions with Extension to Informative Sampling by Yanghyeon Cho and Emily Berg in Journal of Official Statistics

Footnotes

Acknowledgements

We sincerely appreciate the valuable feedback from the Associate Editor and three anonymous reviewers.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the U.S. Department of Agriculture’s National Resources Inventory, Cooperative Agreement NR203A750023C006, Great Rivers CESU 68-3A75-18-504.

ORCID iDs

Yanghyeon Cho

Emily Berg

Supplemental Material

Supplemental material for this article is available online.

Received: March 15, 2024

Accepted: January 20, 2026

References

Battese, G. E., R. M. Harter, and W. A. Fuller. 1988. “An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data.” Journal of the American Statistical Association 83 (401): 28–36. DOI: https://doi.org/10.1080/01621459.1988.10478561.

Berg and Chandra. 2014. “Small Area Prediction for a Unit-Level Lognormal Model.” Computational Statistics & Data Analysis 78: 159–75. DOI: https://doi.org/10.1016/j.csda.2014.03.007.

Berg, Chandra, and Chambers. 2016. “Small Area Estimation for Lognormal Data.” In Analysis of Poverty Data by Small Area Estimation, edited by Pratesi. John Wiley & Sons, Ltd.

Berg and Eideh. 2024. “Small Area Prediction for Exponential Dispersion Families Under Informative Sampling.” Journal of Survey Statistics and Methodology 12 (4): 1081–105. DOI: https://doi.org/10.1093/jssam/smae018.

Berg and J.-K. Kim. 2021. “An Approximate Best Prediction Approach to Small Area Estimation for Sheet and Rill Erosion Under Informative Sampling.” The Annals of Applied Statistics 15 (1): 102–25. DOI: https://doi.org/10.1214/20-AOAS1388.

Booth, J. G., and J. P. Hobert. 1998. “Standard Errors of Prediction in Generalized Linear Mixed Models.” Journal of the American Statistical Association 93 (441): 262–72. DOI: https://doi.org/10.1080/01621459.1998.10474107.

Cho and Berg. 2022. “Alternative Mean Square Error Estimators and Confidence Intervals for Prediction of Nonlinear Small Area Parameters.” https://arxiv.org/abs/2210.12221.

Dreassi, Petrucci, and Rocco. 2014. “Small Area Estimation for Semicontinuous Skewed Spatial Data: An Application to the Grape Wine Production in Tuscany.” Biometrical Journal 56 (1): 141–56. DOI: https://doi.org/10.1002/bimj.201200271.

Erciulescu, A. L., and W. A. Fuller. 2014. “Parametric Bootstrap Procedures for Small Area Prediction Variance.” Proceedings of the Survey Research Methods Section. American Statistical Association.

Graf, J. M. Marín, and Molina. 2019. “A Generalized Mixed Model for Skewed Distributions Applied to Small Area Estimation.” Test 28 (2): 565–97. DOI: https://doi.org/10.1007/s11749-018-0594-2.

Guadarrama, Molina, and Rao. 2018. “Small Area Estimation of General Parameters Under Complex Sampling Designs.” Computational Statistics & Data Analysis 121: 20–40. DOI: https://doi.org/10.1016/j.csda.2017.11.007.

Hall and Maiti. 2006. “On Parametric Bootstrap Methods for Small Area Prediction.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (2): 221–38. DOI: https://doi.org/10.1111/j.1467-9868.2006.00541.x.

Handcock, M. S. 2023. reldist: Relative Distribution Methods. R Package Version 1.7-2. https://CRAN.R-project.org/package=reldist.

Hobza, Marhuenda, and Morales. 2020. “Small Area Estimation of Additive Parameters Under Unit-Level Generalized Linear Mixed Models.” Sort 44 (1): 3–38. DOI: https://doi.org/10.2436/20.8080.02.93.

Jiang and Lahiri. 2006. “Mixed Model Prediction and Small Area Estimation.” Test 15 (1): 1–96. DOI: https://doi.org/10.1007/BF02595419.

Lohr, S. L., and Rao. 2009. “Jackknife Estimation of Mean Squared Error of Small Area Predictors in Nonlinear Mixed Models.” Biometrika 96 (2): 457–68. DOI: https://doi.org/10.1093/biomet/asp003.

Lyu, E. J. Berg, and Hofmann. 2020. “Empirical Bayes Small Area Prediction Under a Zero-Inflated Lognormal Model with Correlated Random Area Effects.” Biometrical Journal 62 (8): 1859–78. DOI: https://doi.org/10.1002/bimj.202000029.

Molina and Rao. 2010. “Small Area Estimation of Poverty Indicators.” Canadian Journal of Statistics 38 (3): 369–85. DOI: https://doi.org/10.1002/cjs.10051.

Molina, Saei, and José Lombardía. 2007. “Small Area Estimates of Labour Force Participation Under a Multinomial Logit Mixed Model.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 170 (4): 975–1000. DOI: https://doi.org/10.1111/j.1467-985X.2007.00493.x.

Morales, M. D. Esteban, Pérez, and Hobza. 2021. A Course on Small Area Estimation and Mixed Models: Methods, Theory and Applications in R. Springer.

Nusser, S. M., and J. J. Goebel. 1997. “The National Resources Inventory: A Long-Term Multi-Resource Monitoring Programme.” Environmental and Ecological Statistics 4: 181–204. DOI: https://doi.org/10.1023/A:1018574412308.

Pfeffermann. 2013. “New Important Developments in Small Area Estimation.” Statistical Science 28 (1): 40–68. DOI: https://doi.org/10.1214/12-STS395.

Pfeffermann and Sverchkov. 2007. “Small-Area Estimation Under Informative Probability Sampling of Areas and Within the Selected Areas.” Journal of the American Statistical Association 102 (480): 1427–39. DOI: https://doi.org/10.1198/016214507000001094.

Rao, J. N., and Molina. 2015. Small Area Estimation. John Wiley & Sons.

Reluga, M.-J. Lombardía, and Sperlich. 2023. “Simultaneous Inference for Empirical Best Predictors with a Poverty Study in Small Areas.” Journal of the American Statistical Association 118 (541): 583–95. DOI: https://doi.org/10.1080/01621459.2021.1942014.

Rojas-Perilla, Pannier, Schmid, and Tzavidis. 2020. “Data-Driven Transformations in Small Area Estimation.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 183 (1): 121–48. DOI: https://doi.org/10.1111/rssa.12488.

Schilling, K. E., P. J. Jacobson, M. T. Streeter, and C. S. Jones. 2018. “Groundwater Hydrology and Quality in Drained Wetlands of the Des Moines Lobe in Iowa.” Wetlands 38: 247–59. DOI: https://doi.org/10.1007/s13157-016-0825-9.

Secchi and Babcock. 2007. “Impact of High Crop Prices on Environmental Quality: A Case of Iowa and the Conservation Reserve Program.” Working Paper 07-WP 447, Center for Agricultural and Rural Development, Iowa State University.

Shao. 2003. Mathematical Statistics. Springer Science & Business Media.

Smith, A. F., and A. E. Gelfand. 1992. “Bayesian Statistics Without Tears: A Sampling–Resampling Perspective.” The American Statistician 46 (2): 84–8. DOI: https://doi.org/10.1080/00031305.1992.10475856.

Zimmermann and R. T. Münnich. 2018. “Small Area Estimation with a Lognormal Mixed Model Under Informative Sampling.” Journal of Official Statistics 34 (2): 523–42. DOI: https://doi.org/10.2478/jos-2018-0024.
