A quasi-Bayesian Approach to Small Area Estimation Using Spatial Models

Abstract

The empirical best linear unbiased prediction (EBLUP) method has been the dominant model-based approach in small area estimation. As an alternative to this frequentist method, Jiang et al.^[11] proposed the observed best prediction (OBP) method, also frequentist, in which the model parameters are estimated by minimizing an objective function implied by the total mean squared prediction error. Datta et al.^[6] recently followed a general Bayesian approach, proposed by Bissiri et al.^[2], to develop a quasi-Bayesian method by appropriately calibrating the OBP objective function for the Fay-Herriot model. In a different article, Chung and Datta^[4] demonstrated that, in the absence of covariates with good predictive power, the small area estimates from the standard Fay-Herriot model can be improved by using spatially dependent random effects. In this article, we develop a quasi-Bayesian small area estimation method using several spatial alternatives to the independent Fay-Herriot random effects model. An application to the estimation of four-person family median incomes for the U.S. states shows the usefulness of the proposed method. Limited but related simulation studies for the median income application reinforce our conclusion.

AMS Subject Classification: 62F15, 62D99

Keywords

Fay-Herriot model; General Bayesian updating; Observed best prediction; Posterior samples; Spatial random effects

1. Introduction

Shrinkage estimation of the mean vector $(θ_1, \dots, θ_m)$ of a multivariate normal distribution, introduced by Stein^[21], was a pioneering breakthrough in statistics. He showed that, for $m \geq 3$, the vector of shrinkage estimators of the individual means has lower frequentist risk under the sum of squared error loss than the uniformly minimum variance unbiased estimator vector. Shrinkage estimation makes sense when the components of the mean vector have a similar interpretation, such as the means of a characteristic for many similar subpopulations. Simultaneous estimation of subpopulation means is of fundamental importance in small area estimation. Small areas in survey sampling are subpopulations or subdomains, defined by demographic and/or geographic partitioning of a target population, that do not receive sample sizes large enough to estimate their means with adequate accuracy.

In a pioneering paper, Efron and Morris^[8] showed that Stein's estimators can be interpreted as empirical Bayes (EB) estimators of the means under an appropriate model for the mean vector. This EB representation provides a heuristic justification for the risk domination enjoyed by shrinkage estimators.

Reliable estimates of subpopulation characteristics $θ_i$ are urgently needed by many national statistical offices to help government administrators implement various social and fiscal policies. However, a traditional estimate $Y_i$ of $θ_i$, also called a direct estimate, which is based only on the sample from that subpopulation, may have an unacceptable level of estimation error if the corresponding sample size is small. To improve the low accuracy of direct estimates of small area means, researchers have recommended borrowing information from other, possibly similar, subpopulations through model-based estimation, using suitable models for the $θ_i$'s based on appropriate covariates, if available.

In a large-scale pioneering application of model-based small area estimation of average incomes for many small localities in the US, Fay and Herriot^[9] proposed a model that is now used extensively in small area estimation. Their model, known as the Fay-Herriot model in the small area estimation literature, connects the direct estimator $Y_i$ of $θ_i$ with related covariates $x_i$.

Fay and Herriot^[9] proposed a mixed effects model given by

$$Y_i = θ_i + e_i, \quad θ_i = x_i^T β + v_i, \quad i = 1, \dots, m, \tag{1.1}$$

where $Y_i$ is related to $θ_i$ via the sampling model, and $θ_i$ is related to the covariates $x_i$ via the linking model. The model assumes that the sampling error $e_i$ satisfies $E_P(e_i \mid x_i) = 0$ and $V_P(e_i \mid x_i) = D_i$, where the subscript $P$ refers to the sampling design. The sampling errors are assumed to be independent and normally distributed, and independent of the linking errors $v_i$, the area-specific random effects, which are assumed to be independent and normally distributed with zero mean and a common variance $A$. The sampling variances $D_i$, which are usually estimated, are treated as known. We denote the model parameters $(β^T, A)^T$ by $ψ$, where $β$ is a $p \times 1$ vector of unknown regression coefficients and $x_i$ is a $p \times 1$ vector of known covariates. In the Fay-Herriot model, estimation of the small area means thus becomes prediction of the mixed effects $θ_i = x_i^T β + v_i$, $i = 1, \dots, m$.

While Fay and Herriot^[9] used the EB approach to estimate $θ_i$ based on the model (1.1), Prasad and Rao^[15], Lahiri and Rao^[12], Datta and Lahiri^[5] and Datta et al.^[7] used the empirical best linear unbiased predictor (EBLUP) method, a frequentist approach. They derived second-order approximations to the mean squared error (MSE) of the EBLUPs of $θ_i$ and second-order unbiased estimators of the MSEs for various estimators of the variance parameter (see Datta and Lahiri^[5]). In contrast, Ghosh^[10] developed hierarchical Bayes (HB) estimation of $θ_i$ by augmenting the model (1.1) with an improper uniform prior for $β$ and $A$. Later, Datta et al.^[7] also considered HB prediction of the $θ_i$'s for the Fay-Herriot model.

The model (1.1) assumes independence of the random effects $v_i$ across small areas; for subsequent reference, we call it the independent Fay-Herriot (FH) model. Small area means of geographic areas often display strong spatial dependence. In the absence of effective covariates, the independent FH model for the means may then be systematically misspecified because of spatial correlation among the random effects. In such applications, suitable spatial models for the random effects are recommended. In a recent article, Chung and Datta^[4] developed comprehensive HB estimation of the $θ_i$'s under many popular spatial models based on certain noninformative priors on the model parameters. We briefly describe these spatial models in Section 3.

The EBLUP approach has played a major role in small area estimation (cf. Battese et al.^[1]; Prasad and Rao^[15]). While the independent FH model has been very popular for constructing EBLUPs of small area means, many authors over the last twenty years have used spatial random effects models instead. See, for example, Saei and Chambers^[19], Singh et al.^[20], Petrucci and Salvati^[14], Pratesi and Salvati^[17] and Pratesi et al.^[16].

The EBLUP and EB predictors of small area means are obtained in two steps, and both are derived under the sum of squared error loss function. For simplicity, we present the main ideas below for the independent FH model. First, for known model parameters $ψ$, the best predictor (BP) $\tilde{θ}_{i,B}$ of $θ_i$, given by

$$\tilde{θ}_{i,B} = Y_i - γ_i (Y_i - x_i^T β), \tag{1.2}$$

is derived, where $γ_i = D_i / (D_i + A)$. Since $ψ$ is unknown, however, the predictor $\tilde{θ}_{i,B}$ cannot be used directly. In the second step, $ψ$ is replaced by estimators of $A$ and the regression coefficient $β$, which are typically obtained by iterative computation based on the marginal distribution of the data.
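As a minimal illustration of the first step, the best predictor (1.2) can be computed directly once $ψ$ is fixed. The sketch below assumes NumPy; the function name `fh_best_predictor` and the toy inputs are hypothetical.

```python
import numpy as np

def fh_best_predictor(y, X, D, beta, A):
    """Best predictor (1.2) under the independent Fay-Herriot model.

    Assumes the model parameters beta (p-vector) and A (> 0) are known.
    """
    y = np.asarray(y, dtype=float)
    D = np.asarray(D, dtype=float)
    gamma = D / (D + A)              # shrinkage factors gamma_i = D_i / (D_i + A)
    synthetic = X @ beta             # regression estimates x_i^T beta
    return y - gamma * (y - synthetic)

# Two hypothetical areas, intercept-only model.
X = np.ones((2, 1))
theta_bp = fh_best_predictor([10.0, 20.0], X, D=[1.0, 3.0], beta=np.array([14.0]), A=1.0)
# theta_bp holds 12.0 and 15.5
```

The second area, with the larger sampling variance $D_2 = 3$, is moved three-quarters of the way toward the regression estimate, while the first is moved halfway.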

In a pioneering paper, Jiang et al.^[11] argued that the second step in developing the EBLUPs is not well suited for prediction of $θ_i$, since areas with less accurate $Y_i$'s have less influence in determining the model parameters: the maximum likelihood or weighted least squares estimator of $β$ assigns relatively lower weights to direct estimators $Y_i$ with larger sampling variances $D_i$. Since prediction of the $θ_i$'s is the main interest, they argued for a purely predictive procedure in which both the predictor of $θ$ and the estimators of the model parameters are derived by minimizing a total predictive mean squared error. We review their method in the next section. These authors also argued that their proposal, which they called observed best prediction (OBP), is less sensitive than the EBLUP to misspecification of the mean function. Since both spatial random effects models and the OBP method are effective tools for addressing misspecification of the mean function of the $θ_i$'s, in this article we develop a method that combines these two ideas. To this end, we extend a quasi-Bayesian method recently developed by Datta et al.^[6]. This method allows us to compute a meaningful measure of uncertainty for the Bayesian analogs of the OBP estimates under the spatial models. An asymptotically justified estimator of the mean squared error of the OBP predictor has been derived by Liu et al.^[13] for the independent Fay-Herriot model. This derivation is non-trivial, and no such results exist, nor can they be easily derived, for the spatial Fay-Herriot models. Moreover, such an estimator may be negative. Our proposed measure of uncertainty, the quasi-Bayesian posterior variance, is obtained computationally and is always nonnegative. We showed in Datta et al.^[6] that the quasi-Bayesian method for the independent Fay-Herriot setup is a competitive alternative to the OBP method.

2. A Brief Review of the OBP Method

We use the notation $θ = (θ_1, \dots, θ_m)^T$ for the mean vector and $Y = (Y_1, \dots, Y_m)^T$ for the vector of direct estimates. We first review the objective function central to the OBP method of Jiang et al.^[11] for the independent Fay-Herriot model. For the sum of squared error loss $L(\hat{θ}, θ) = \sum_{i=1}^m (\hat{θ}_i - θ_i)^2$ in estimating $θ$ by $\hat{θ}$, Rao and Molina ([18], p. 166) presented an alternative and transparent derivation of the objective function that Jiang et al.^[11] used to estimate the model parameters $ψ$. They computed the total sampling MSE of $\tilde{θ}_{i,B}$, $i = 1, \dots, m$, namely $\mathrm{MSE}_P(\tilde{θ}_B) = \sum_{i=1}^m E_P[(\tilde{θ}_{i,B} - θ_i)^2]$. Using equation (6.4.52) of their book, they obtained $\mathrm{MSE}_P(\tilde{θ}_B) = E_P[\sum_{i=1}^m \{γ_i^2 (Y_i - x_i^T β)^2 - 2 D_i γ_i + D_i\}]$. Dropping the last term, $\sum_{i=1}^m D_i$, which is known, and evaluating the remaining terms at the observed data gives the objective function of Jiang et al.^[11]. This function is given by

$$Q(ψ) \overset{\mathrm{def}}{=} Q(β, A, Y) = (Y - Xβ)^T Γ^2 (Y - Xβ) - 2\,\mathrm{tr}(DΓ) \overset{\mathrm{def}}{=} Q_0(β, A, Y) + Q_1(A), \tag{2.1}$$

where $Q_0(β, A, Y) = (Y - Xβ)^T Γ^2 (Y - Xβ)$ and $Q_1(A) = -2\,\mathrm{tr}(DΓ)$. In (2.1), $X = [x_1, \dots, x_m]^T$, $D = \mathrm{diag}(D_1, \dots, D_m)$, and $Γ = \mathrm{diag}(γ_1, \dots, γ_m)$ is the shrinkage matrix shrinking $Y$ toward the regression function. The resulting estimator $\tilde{ψ} = (\tilde{β}^T, \tilde{A})^T$, obtained by minimizing $Q(ψ)$, is called the best predictive estimator (BPE) of $ψ$.
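A minimal sketch of the BPE for the independent Fay-Herriot model, assuming NumPy and SciPy. For fixed $A$, $Q_0$ is a weighted least squares criterion with weights $γ_i^2$, so $β$ can be profiled out in closed form; the remaining one-dimensional search over $A$ uses SciPy's bounded scalar minimizer, which is our choice rather than a method prescribed by Jiang et al.^[11], and the upper search bound `A_max` is an ad hoc assumption. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def q_objective(beta, A, y, X, D):
    """Q(beta, A, Y) in (2.1) for the independent FH model; D holds the D_i."""
    gamma = D / (D + A)
    r = y - X @ beta
    return np.sum(gamma**2 * r**2) - 2.0 * np.sum(D * gamma)   # Q_0 + Q_1

def beta_given_A(A, y, X, D):
    """Minimizer of Q_0 over beta for fixed A: weighted LS with weights gamma_i^2."""
    w = (D / (D + A)) ** 2
    XtW = X.T * w                                  # X^T Gamma^2
    return np.linalg.solve(XtW @ X, XtW @ y)

def best_predictive_estimate(y, X, D, A_max=None):
    """Two-step BPE (beta_tilde, A_tilde): profile out beta, then search over A."""
    y = np.asarray(y, dtype=float)
    D = np.asarray(D, dtype=float)
    if A_max is None:
        A_max = 10.0 * (np.var(y) + D.max())       # ad hoc search bound (assumption)
    profile = lambda A: q_objective(beta_given_A(A, y, X, D), A, y, X, D)
    A_tilde = minimize_scalar(profile, bounds=(1e-8, A_max), method="bounded").x
    return beta_given_A(A_tilde, y, X, D), A_tilde
```

Because the profile objective in $A$ need not be unimodal, a coarse grid scan before the bounded search is a prudent safeguard in practice.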

Jiang et al.^[11] generalized the objective function in (2.1), which is valid for the independent Fay-Herriot model, to models with dependent random effects. To allow greater generality for the random effects distribution, we specify a multivariate normal pdf for the random effects through its precision matrix, so that even a singular pdf can be considered. If this matrix is positive definite (p.d.), it equals the inverse of the variance-covariance matrix of the random effects. We present the pdfs of the random effects under the spatial models considered in the next section through their precision matrices. While the precision matrices of most spatial models are p.d., this is not so for the intrinsic autoregressive (IAR) model (discussed later). If a random vector $U$ has the pdf given by

$$f_U(u) \propto \exp\left[-\tfrac{1}{2}(u - μ)^T P_U (u - μ)\right] \quad \text{for } u, μ \in \mathrm{colsp}(P_U),$$

we use the notation $U \sim N(μ, P_U)$; note that the second argument is a precision matrix rather than a covariance matrix. Here, the symmetric matrix $P_U$ is nonnegative definite (n.n.d.), and $\mathrm{colsp}(P_U)$ is its column space. If $P_U$ is singular, then the distribution of $U$ is singular. Following equation (27) of Jiang et al.^[11], we consider the following mixed effects model

$$Y = Xβ + v + e, \tag{2.2}$$

where $v$ and $e$ are independent with $v \sim N(0, P_M)$ and $e \sim N(0, P_D)$. Here, we assume that $P_D$ is known and that $P_M$ depends on a parameter vector $δ$. We also assume that $P_D$ is p.d., so that $P_D = D^{-1}$, where $D$ is the sampling variance-covariance matrix. Let the mixed effects to be predicted be denoted by $θ = Xβ + v$. If the precision matrix $P_M$ is singular, then the distribution of $θ$ is singular, and in that case we also require $\mathrm{colsp}(X) \subset \mathrm{colsp}(P_M)$. Consequently, if $\mathrm{rank}(X) = p$, then the rank of $P_M$ is at least $p$. We redefine the model parameter vector $ψ$ as $(β^T, δ^T)^T$.

For known $ψ$, the best predictor (BP) of $θ$, denoted by $\tilde{θ}_B(ψ)$, can be obtained from the distributions of $v$ and $e$. The BP is given by

$$\tilde{θ}_B(ψ) = Y - B(Y - Xβ), \tag{2.3}$$

where $B = (P_D + P_M)^{-1} P_M$. The associated measure of uncertainty, $V(θ \mid ψ, Y)$, is $(P_D + P_M)^{-1}$. Following the argument of Rao and Molina^[18] presented above, we now derive the analog of the objective function in (2.1) for the model (2.2). Noting that $\tilde{θ}_B(ψ) - θ = e - B(Y - Xβ)$, it can be shown that

$$\mathrm{MSE}_P(\tilde{θ}_B) = E_P[(Y - Xβ)^T B^T B (Y - Xβ)] - 2E_P[e^T B (Y - Xβ)] + \mathrm{tr}(D).$$

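Using $E_P[e^T B(Y - Xβ)] = \mathrm{tr}(DB)$, which holds because $Y - Xβ = v + e$ with $v$ fixed under the sampling design and $E_P(ee^T) = D$, dropping the known term $\mathrm{tr}(D)$, and removing the operator $E_P[\cdot]$ from the first term, we obtain from the previous display the following objective function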

$$Q(β, δ, Y) \equiv Q(ψ, Y) = Q_0(ψ, Y) + Q_1(δ), \tag{2.4}$$

where $Q_{0} (ψ, Y)$ and $Q_{1} (δ)$ are given by

$$Q_0(ψ, Y) = (Y - Xβ)^T B^T B (Y - Xβ), \qquad Q_1(δ) = -2\,\mathrm{tr}(BD). \tag{2.5}$$

We will simplify this general expression of the objective function for various spatial models that we introduce in the next section.
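For a general precision matrix $P_M$, the BP (2.3) and its conditional variance require only linear solves. The following NumPy sketch (hypothetical function name and toy values) also illustrates that, with $P_D = D^{-1}$ diagonal and $P_M = A^{-1} I_m$, the matrix $B$ reduces to the shrinkage matrix $Γ$ of (2.1), so that (2.3) reduces to (1.2).

```python
import numpy as np

def best_predictor_general(y, X, beta, P_D, P_M):
    """BP (2.3) and V(theta | psi, Y) for model (2.2), given precision matrices.

    P_D must be positive definite; P_M may be singular (e.g. the IAR model).
    """
    H = P_D + P_M                          # conditional precision of theta given Y
    B = np.linalg.solve(H, P_M)            # B = (P_D + P_M)^{-1} P_M
    theta_bp = y - B @ (y - X @ beta)      # BP (2.3)
    V = np.linalg.inv(H)                   # V(theta | psi, Y) = (P_D + P_M)^{-1}
    return theta_bp, V, B

# Independent FH special case: P_D = D^{-1}, P_M = I / A.
D = np.array([1.0, 3.0])
A = 1.0
theta_bp, V, B = best_predictor_general(
    np.array([10.0, 20.0]), np.ones((2, 1)), np.array([14.0]),
    np.diag(1.0 / D), np.eye(2) / A)
# B equals diag(D / (D + A)) = diag(0.5, 0.75), matching Gamma in (2.1).
```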

If $δ$ is known, the BPE of $β$ is obtained by minimizing $Q_0(β, δ, Y)$, the first term on the right-hand side of (2.4), since $Q_1$ does not depend on $β$. This yields the closed-form solution

$$\tilde{β}_O(δ) = (X^T B^T B X)^{-1} X^T B^T B Y,$$

where the rank of the matrix $B X$ is $p$ , which follows from the assumption that the rank of the matrix $X$ is $p$ and $colsp (X) \subset colsp (P_{M})$ .
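A NumPy sketch of the objective (2.4)-(2.5) and of $\tilde{β}_O(δ)$ for a fixed $δ$, that is, for a fixed $P_M$; the function names are hypothetical. Solving the normal equations with `np.linalg.solve` relies on $BX$ having full column rank $p$, as noted above.

```python
import numpy as np

def shrinkage_matrix(P_D, P_M):
    """B = (P_D + P_M)^{-1} P_M, as defined after (2.3)."""
    return np.linalg.solve(P_D + P_M, P_M)

def q_objective(beta, y, X, B, D):
    """Q(psi, Y) = Q_0 + Q_1 from (2.4)-(2.5); D is the sampling covariance P_D^{-1}."""
    r = B @ (y - X @ beta)
    return r @ r - 2.0 * np.trace(B @ D)     # (Y-Xb)'B'B(Y-Xb) - 2 tr(BD)

def beta_tilde_O(y, X, B):
    """Closed-form minimizer of Q_0 over beta: (X'B'BX)^{-1} X'B'BY."""
    BX = B @ X
    return np.linalg.solve(BX.T @ BX, BX.T @ (B @ y))
```

Because $B$ is not symmetric in general, the sketch forms $Q_0$ as the squared norm of $B(Y - Xβ)$ rather than using a quadratic form in $B$ alone.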

Next, $\tilde{δ}$, the BPE of $δ$, is obtained by minimizing $Q(\tilde{β}_O(δ), δ, Y)$ with respect to $δ$. The BPE of $β$ when $δ$ is unknown is then $\tilde{β} = \tilde{β}_O(\tilde{δ})$. We denote the BPE of $ψ$ by $\tilde{ψ} = (\tilde{β}^T, \tilde{δ}^T)^T$. Finally, following Jiang et al.^[11], the observed best predictor (OBP) of the mixed effects $θ$ is obtained by replacing $ψ$ in the BP (2.3) with its BPE $\tilde{ψ}$.
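The two-step procedure can be sketched as follows for one concrete choice of $P_M$, the LCAR precision matrix $A^{-1}Ω_5(ρ)$ introduced in Section 3, so that $δ = (A, ρ)$. This is an illustration under stated assumptions: the outer minimization over $δ$ is a plain grid search, chosen for transparency and not taken from Jiang et al.^[11], and the function names and grids are hypothetical.

```python
import numpy as np

def lcar_precision(W, rho):
    """Omega_5(rho) = rho * (L - W) + (1 - rho) * I, the LCAR matrix of Section 3."""
    R = np.diag(W.sum(axis=1)) - W
    return rho * R + (1.0 - rho) * np.eye(W.shape[0])

def obp_lcar(y, X, D, W, A_grid, rho_grid):
    """Two-step BPE of psi = (beta, A, rho) by grid search, then the OBP of theta.

    For each delta = (A, rho): P_M = Omega_5(rho) / A, beta is profiled out in
    closed form, and the profile objective Q(beta_O(delta), delta, Y) is recorded.
    """
    P_D = np.linalg.inv(D)
    best = None
    for A in A_grid:
        for rho in rho_grid:
            P_M = lcar_precision(W, rho) / A
            B = np.linalg.solve(P_D + P_M, P_M)
            BX = B @ X
            beta = np.linalg.solve(BX.T @ BX, BX.T @ (B @ y))   # beta_O(delta)
            r = B @ (y - X @ beta)
            q = r @ r - 2.0 * np.trace(B @ D)                  # Q_0 + Q_1
            if best is None or q < best[0]:
                best = (q, A, rho, beta, B)
    _, A, rho, beta, B = best
    theta_obp = y - B @ (y - X @ beta)                          # BP (2.3) at the BPE
    return theta_obp, beta, A, rho
```

With `rho_grid=[0.0]` the LCAR matrix is $I_m$, and the sketch reduces to the OBP of the independent Fay-Herriot model over the supplied grid of $A$ values.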

3. Fay-Herriot Model for Spatially Dependent Random Effects

To highlight the importance of spatial models in SAE, Chung and Datta^[4] noted that in many problems the area characteristic of interest is closely related to factors such as population size, ethnicity, age group and education level, which usually change geographically. Many economic and health characteristics display spatial patterns. When the available covariates do not fully explain such spatial association, the independence and equal variance assumptions on the random effects in model (1.1) fail, and this simple model may generate unreliable estimates. As an example, they noted that the spatial correlation, measured by Moran's $I$, of the 1989 four-person family median incomes from the 1990 Census for the forty-eight contiguous states and Washington, D.C. is 0.44, which indicates strong spatial dependence. For this example, which we consider in Section 5, Chung and Datta^[4] showed that if the independent Fay-Herriot model does not employ an effective covariate strongly correlated with the characteristic of interest (the 1989 median income), both the accuracy (in terms of bias) and the precision (in terms of variance) of the resulting estimates deteriorate. In terms of these measures, Chung and Datta^[4] showed that, in the absence of covariates with good predictive power, various spatial random effects models provide better predictions than the independent Fay-Herriot model (see the weak-covariate columns of Table 1 of this article).

Table 1.

Mean squared prediction error (MSPE) of the Bayesian and OBP predictors and average posterior standard deviation (APSD) of the Bayesian predictors, for the spatial models and the independent FH model.

| Model | Full ($x_1, x_2$): MSPE, Bayes | Full: MSPE, OBP | Full: APSD, Bayes | Strong ($x_2$): MSPE, Bayes | Strong: MSPE, OBP | Strong: APSD, Bayes | Weak ($x_1$): MSPE, Bayes | Weak: MSPE, OBP | Weak: APSD, Bayes |
|---|---|---|---|---|---|---|---|---|---|
| FH | 2.70 | 2.67 | 1.79 | 2.90 | 2.88 | 1.77 | 7.74 | 7.76 | 2.31 |
| SAR | 3.30 | 3.63 | 1.80 | 3.54 | 3.84 | 1.77 | 4.25 | 4.26 | 1.98 |
| SCAR | 3.54 | 3.98 | 1.75 | 3.73 | 4.27 | 1.73 | 5.33 | 5.10 | 2.16 |
| CAR | 2.92 | 3.41 | 1.78 | 3.27 | 3.64 | 1.76 | 4.59 | 4.37 | 1.91 |
| LCAR | 2.37 | 2.67 | 1.76 | 2.70 | 2.88 | 1.75 | 4.40 | 4.33 | 1.89 |

All the spatial models we consider here are described through the adjacency matrix of the small areas, $W = \{w_{ij}\}$, $1 \leq i, j \leq m$, which plays an important role in capturing spatial association. Here, $w_{ij} = 1$ if the $i$th and $j$th small areas are neighbors, and $w_{ij} = 0$ otherwise; also, $w_{ii} = 0$ for $i = 1, \dots, m$. Let $w_{i\cdot} = \sum_{j=1}^m w_{ij}$ be the sum of the $i$th row of $W$, and let $L = \mathrm{diag}\{w_{i\cdot}\}_{i=1}^m$. We assume that every small area has at least one neighboring area, which implies that $L$ is p.d. We define $\tilde{W} = L^{-1} W$. We now present a few key facts about the eigenvalues of $W$ and $\tilde{W}$. First, since $W$ is symmetric, all its eigenvalues are real; we denote the $i$th largest eigenvalue of $W$ by $λ_i(W)$, $i = 1, \dots, m$. Since $W$ is non-null and $0 = \sum_{i=1}^m w_{ii} = \sum_{i=1}^m λ_i(W)$, it follows that $λ_m(W) < 0 < λ_1(W)$. Next, since $\tilde{W}$ and $L^{-1/2} W L^{-1/2}$ have the same eigenvalues and the latter matrix is symmetric, the eigenvalues of $\tilde{W}$ are real. Since $\tilde{W}$ is a row stochastic matrix, all its eigenvalues lie in $[-1, 1]$, and at least one of them equals 1, that is, $λ_1(\tilde{W}) = 1$. Finally, $\mathrm{tr}(\tilde{W}) = 0$ implies that $λ_m(\tilde{W})$ must be negative.
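The eigenvalue facts above are easy to check numerically. The NumPy sketch below uses a hypothetical 2 × 3 grid of areas with rook adjacency (areas sharing an edge are neighbors); it is an illustration only, not the adjacency structure of any data set in this article.

```python
import numpy as np

# Hypothetical 2 x 3 grid of areas, numbered row-wise:  0 1 2 / 3 4 5
W = np.array([
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)

deg = W.sum(axis=1)                        # w_{i.}; every area has a neighbor
W_tilde = W / deg[:, None]                 # L^{-1} W (row stochastic)
S = W / np.sqrt(np.outer(deg, deg))        # L^{-1/2} W L^{-1/2} (symmetric)

lam_W = np.linalg.eigvalsh(W)              # real eigenvalues, ascending order
lam_Wt = np.linalg.eigvalsh(S)             # eigenvalues of W_tilde, ascending

assert np.allclose(W_tilde.sum(axis=1), 1.0)                 # row stochastic
assert np.isclose(np.trace(W), 0.0) and np.isclose(lam_W.sum(), 0.0)
assert lam_W[0] < 0 < lam_W[-1]                              # lambda_m(W) < 0 < lambda_1(W)
assert np.isclose(lam_Wt[-1], 1.0) and lam_Wt[0] < 0         # lambda_1 = 1, lambda_m < 0
assert np.all(np.abs(lam_Wt) <= 1 + 1e-12)                   # all in [-1, 1]
# W_tilde is similar to S, so the two share their eigenvalues.
assert np.allclose(np.sort(np.linalg.eigvals(W_tilde).real), lam_Wt)
```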

We will use $R$ to denote the matrix $L - W$ . Four important spatially dependent random effects models are represented by the following positive definite matrices:

$$\text{SAR:}\quad Ω_2(ρ) = (I_m - ρ\tilde{W})^T (I_m - ρ\tilde{W}), \quad ρ \in (-1, 1), \tag{3.1}$$

$$\text{SCAR:}\quad Ω_3(ρ) = I_m - ρW, \quad ρ \in \left(λ_m(W)^{-1}, λ_1(W)^{-1}\right), \tag{3.2}$$

$$\text{CAR:}\quad Ω_4(ρ) = L - ρW, \quad ρ \in \left(λ_m(\tilde{W})^{-1}, λ_1(\tilde{W})^{-1}\right), \tag{3.3}$$

$$\text{LCAR:}\quad Ω_5(ρ) = ρR + (1 - ρ) I_m, \quad ρ \in (0, 1), \tag{3.4}$$

where $ρ$ is the spatial parameter that controls the strength of spatial dependence. Even though we use the same notation $ρ$ in all four models, the meaning of this parameter changes from one model to another. For the $k$th model, $ρ$ varies between $l_k$ and $u_k$, $k = 2, 3, 4, 5$, with these bounds given in (3.1)-(3.4). Chung and Datta^[4] argued that all the precision matrices above are positive definite. Note that $R = Ω_4(1) = Ω_5(1)$, and that $A^{-1} R$ is the precision matrix of the intrinsic autoregressive (IAR) model, another popular spatial model. This matrix is singular because $R\mathbf{1} = 0$, where $\mathbf{1}$ is the $m \times 1$ vector of ones.
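The matrices (3.1)-(3.4) and their admissible ranges of $ρ$ can be built directly from $W$. The NumPy sketch below (hypothetical function name, hypothetical 2 × 3 rook-adjacency grid) checks positive definiteness at a few interior values of $ρ$ for each model, and checks that $R\mathbf{1} = 0$, which makes the IAR precision matrix singular.

```python
import numpy as np

def spatial_precisions(W):
    """Omega_k(rho) constructors and their rho-intervals (l_k, u_k) from (3.1)-(3.4)."""
    m = W.shape[0]
    I = np.eye(m)
    deg = W.sum(axis=1)
    L = np.diag(deg)
    W_tilde = W / deg[:, None]                                    # L^{-1} W
    lam_W = np.linalg.eigvalsh(W)                                 # ascending
    lam_Wt = np.linalg.eigvalsh(W / np.sqrt(np.outer(deg, deg)))  # eigenvalues of W_tilde
    R = L - W
    models = {
        "SAR":  (lambda r: (I - r * W_tilde).T @ (I - r * W_tilde), (-1.0, 1.0)),
        "SCAR": (lambda r: I - r * W, (1.0 / lam_W[0], 1.0 / lam_W[-1])),
        "CAR":  (lambda r: L - r * W, (1.0 / lam_Wt[0], 1.0 / lam_Wt[-1])),
        "LCAR": (lambda r: r * R + (1.0 - r) * I, (0.0, 1.0)),
    }
    return models, R

# Hypothetical 2 x 3 grid of areas (rook adjacency), numbered row-wise.
W = np.array([
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)

models, R = spatial_precisions(W)
for name, (omega, (lo, hi)) in models.items():
    for t in (0.05, 0.5, 0.95):                       # interior points of (l_k, u_k)
        rho = lo + t * (hi - lo)
        assert np.linalg.eigvalsh(omega(rho)).min() > 0, name    # positive definite
assert np.allclose(R @ np.ones(len(W)), 0.0)          # R 1 = 0: IAR precision is singular
```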

The model (3.2) is a simple version of the conditional autoregressive model (Rao and Molina^[18], Ch. 9.6.2) in which all diagonal entries of the precision matrix equal one; it is known as the simple conditional autoregressive (SCAR) model. The model (3.3) is the widely used conditional autoregressive (CAR) model. The diagonal entries of the precision matrix are $L_{ii}$ for the CAR model, and they lie between 1 and $L_{ii}$ for the LCAR model. The conditional autoregressive models, SCAR, CAR and LCAR, assume that, given all other small area means, $θ_i$ depends only on the means of its neighboring areas. In other words, $θ_i$ and $θ_j$, $j \neq i$, are conditionally independent given the remaining means unless areas $i$ and $j$ are neighbors. In contrast, the simultaneous autoregressive (SAR) model assumes that $θ_i$ depends simultaneously on all other $θ_j$, $j \neq i$, with stronger (weaker) correlations for neighboring (remote) areas. The independent FH model can be viewed as a special case of the SAR, SCAR or LCAR model with $ρ = 0$. By analogy with the spatial models, the precision matrix of the independent FH model is $I_m$; for notational convenience, although this matrix is free of $ρ$, we denote it by $Ω_1(ρ)$. For the models considered here, the matrices $A^{-1} Ω_k$, $k = 1, \dots, 5$, correspond to the precision matrix $P_M$ in the model (2.2).

A full Bayesian analysis for the above spatial models based on a class of improper priors for $β$ and $A$ was carried out by Chung and Datta^[4]. For the CAR model $Ω_4(ρ)$ with $ρ = 1$, the intrinsic CAR model (which we also refer to here as the IAR model), a fully Bayesian analysis based on a proper inverse gamma prior for $A$ has been considered by Vogt^[22] and Vogt et al.^[23]. Even though the IAR model is the limiting case of the CAR($ρ$) model with $ρ = 1$, Chung and Datta^[4] did not pursue the IAR model, and their proof of the propriety of the posterior for the CAR model does not automatically extend to the ICAR (or IAR) model without appropriate modification.

The EBLUP approach to small area estimation under some spatial models has been considered by several authors. Singh et al.^[20] made a pioneering effort for the SAR model. Following the approach of Datta and Lahiri^[5], Singh et al.^[20] derived a second-order approximation to the MSE of the EBLUP and a second-order unbiased estimator of the MSE. Petrucci and Salvati^[14], Pratesi and Salvati^[17] and Pratesi et al.^[16] applied the results of Singh et al.^[20] in several useful applications.

4. A Quasi-Bayesian Alternative to the OBP

For the independent Fay-Herriot model, Datta et al.^[6] developed a Bayesian alternative to the OBP method of Jiang et al.^[11]. While they referred to this method as pseudo-Bayes, the term 'pseudo' may carry a pejorative connotation; consequently, in this article we replace 'pseudo' by 'quasi'.

In a recent article, Liu et al.^[13] obtained an asymptotically justified estimator of the MSE of the OBP predictors of small area means for the independent Fay-Herriot model. However, the derivation of the estimator is challenging, and the estimator is occasionally negative. The quasi-Bayesian method of Datta et al.^[6] not only produces point estimates comparable with the OBPs; the associated posterior variances, which are naturally positive, are also comparable with the MSE estimates. Additionally, the quasi-Bayesian method provides useful credible intervals for the small area means in a straightforward manner. Even though the frequentist OBP point predictors for the spatial models can be calculated, no valid estimators of their MSEs are available yet. Any reasonable extension of the tedious calculations for the MSE estimators, presented by Liu et al.^[13] only for the independent model, would be far more challenging for the spatial models, and the resulting estimators may also be negative. Given the usefulness of both the spatial models and the OBP method, a quasi-Bayesian extension of the solution of Datta et al.^[6] for the independent Fay-Herriot model to a spatially dependent setup can provide reasonable credible intervals as well as uncertainty estimates for the estimators of small area means.

The standard Bayesian method updates a prior distribution of the parameters in a model by their likelihood. In a recent article, Bissiri et al.^[2] presented a general method to update a prior distribution for parameters to a posterior distribution when the parameters are connected to observations through a loss function rather than the traditional likelihood. Since the OBP procedure uses only a loss function and not the likelihood for the model parameters to estimate them, in our Bayesian analog to the OBP solution we use the OBP loss function and an algorithm from Bissiri et al.^[2] to create a quasi-likelihood function. Indeed, Datta et al.^[6] followed this algorithm to construct a quasi-likelihood from the OBP loss function under the independent Fay-Herriot model and a quasi-Bayesian solution from this likelihood.

4.1. Update of a Prior by a Loss Function: a Quasi-posterior

The discussion in this subsection is a modification of Section 2.1 of Datta et al.^[6]. To derive a quasi-posterior pdf for $ψ$, we adopt the framework for general Bayesian inference put forward by Bissiri et al.^[2]. According to these authors, the 'information' about the parameter $β$ in the loss function $Q_0(β, A, ρ, y)$ should be appropriately calibrated to update the prior distribution for $β$. We update the uniform prior on $β$ by creating an 'ad hoc likelihood' for $β$ from this loss function. Since the loss function depends on the scale of the data $y$, the ad hoc likelihood needs to be calibrated so that both the ad hoc likelihood and the prior pdf of $β$ become 'scale-free'.

In general, for data $d$ with density $f(d \mid ϕ)$ and prior pdf $π(ϕ)$ for the parameter $ϕ$, the usual Bayes' rule updates the prior pdf $π(\cdot)$ to the posterior pdf obtained by normalizing the kernel $f(d \mid ϕ)\,π(ϕ)$ in $ϕ$. This kernel is proportional to $\exp[-w\,s(ϕ, d) - \log(π(ϕ^*)/π(ϕ))]$, where $s(ϕ, d) = -\log f(d \mid ϕ)$, $w = 1$, and $ϕ^*$ is a suitably chosen value of $ϕ$. Bissiri et al.^[2] and Bissiri and Walker^[3] developed a general Bayesian method that generalizes the usual Bayes' rule to allow updating a prior pdf by loss information on the parameter $ϕ$. A brief review of their solution follows.

If a nonnegative function $l(ϕ, d)$ represents the loss information on $ϕ$ based on data $d$, Bissiri et al.^[2] suggest a general Bayes method that updates a prior pdf $π(ϕ)$ to the normalized version of the kernel $\exp[-w_l\, l(ϕ, d) - \log(π(ϕ^*)/π(ϕ))]$, where $w_l > 0$ is a constant that calibrates the loss information so that the components $w_l\, l(ϕ, d)$ and $-\log π(ϕ)$ are on a comparable scale. When $l(ϕ, d)$ is the self-information loss $s(ϕ, d)$, $w_l = 1$ is the natural choice for combining the data loss with the prior loss $-\log π(ϕ)$ to update the prior distribution. In our application, the loss function $Q_0(β, A, ρ, y)$ for $β$ is not a self-information loss. Bissiri et al.^[2] (cf. Section 3) argue that calibration is crucial when different types of loss functions are combined; in particular, the loss function should be commensurate with the prior distribution.
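On a one-dimensional grid, the general Bayes update is simply a normalized kernel. The NumPy/SciPy sketch below (hypothetical function name and toy values) checks that the self-information loss with $w = 1$ reproduces the conjugate normal posterior, and that the squared error loss with the weight $w_l = 1/(2σ^2)$ given later in this subsection yields the same posterior, because the two losses differ only by a constant.

```python
import numpy as np
from scipy.stats import norm

def general_bayes_grid(loss, w, log_prior, grid):
    """Normalized exp[-w * loss(phi) + log pi(phi)] on an equally spaced grid.

    The term log pi(phi*) in the kernel is a constant and cancels on normalization.
    """
    log_kernel = -w * loss(grid) + log_prior(grid)
    log_kernel -= log_kernel.max()                 # guard against overflow/underflow
    dens = np.exp(log_kernel)
    return dens / (dens.sum() * (grid[1] - grid[0]))

d, sigma, eta, tau = 1.5, 1.0, 0.0, 2.0            # toy datum and prior settings
grid = np.linspace(-10.0, 10.0, 20001)
log_prior = lambda phi: norm.logpdf(phi, loc=eta, scale=tau)

self_info = lambda phi: -norm.logpdf(d, loc=phi, scale=sigma)   # s(phi, d)
post_self = general_bayes_grid(self_info, 1.0, log_prior, grid)

sq_loss = lambda phi: (phi - d) ** 2
post_sq = general_bayes_grid(sq_loss, 1.0 / (2.0 * sigma**2), log_prior, grid)

prec = 1.0 / sigma**2 + 1.0 / tau**2                # conjugate normal posterior
exact = norm.pdf(grid, loc=(d / sigma**2 + eta / tau**2) / prec, scale=prec**-0.5)
max_err = max(np.abs(post_self - exact).max(), np.abs(post_sq - exact).max())
```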

One way to combine two losses $l (ϕ, d)$ and $- log (π (ϕ))$ is based on a prior evaluation of the expected value of $l (ϕ, d)$ to determine $w_{l}$ for the calibration (Bissiri et al.^[2], Section 3.2). To pick $w_{l}$ , solve

$$w_l\, E[l(ϕ, d)] = E_π[\log(π(\check{ϕ})/π(ϕ))], \tag{4.1}$$

where the expectation $E[\cdot]$ on the left-hand side is with respect to a joint distribution $m(d, ϕ)$ for $d$ and $ϕ$ whose marginal for $ϕ$ is $π(ϕ)$. Here, $\check{ϕ}$ is taken as the maximizer of $π(ϕ)$. Specifically, if $d$ given $ϕ$ has mean $ϕ$ and variance $σ^2$, and if $π(ϕ)$ is normal with mean $η$ and variance $τ^2$, then for the squared error loss $l(ϕ, d) = (ϕ - d)^2$ the scale $w_l$ is $1/(2σ^2)$. This calibration relates the loss $w_l\, l(ϕ, d)$ to the self-information loss for normally distributed data. The scaling constant $w_l$ is independent of both $η$ and $τ^2$. Hence, by taking a very large $τ^2$, one can approach a uniform prior for $ϕ$ and calibrate the squared error loss function relative to this approximate prior.
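For completeness, the value $w_l = 1/(2σ^2)$ follows from (4.1) in a few lines under the normal assumptions just stated, with $\check{ϕ} = η$, the mode of $π$:

$$
\begin{aligned}
E[l(ϕ, d)] &= E_π\{E[(ϕ - d)^2 \mid ϕ]\} = σ^2, \\
E_π[\log(π(\check{ϕ})/π(ϕ))] &= E_π\left[\frac{(ϕ - η)^2}{2τ^2}\right] = \frac{1}{2}, \\
w_l\, σ^2 = \frac{1}{2} \quad &\Longrightarrow \quad w_l = \frac{1}{2σ^2}.
\end{aligned}
$$

With this weight, $w_l\, l(ϕ, d) = (ϕ - d)^2/(2σ^2)$, which differs from the normal self-information loss $-\log f(d \mid ϕ)$ only by the constant $\log(\sqrt{2π}\,σ)$.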

4.2. Calibration for $β$: Quasi-posterior of $β$ Given $A, ρ$

We calibrate $Q_0(β, A, ρ, Y)$, the first component of our loss function. For given $A$ and $ρ$, following an argument similar to that in Section 2.2 of Datta et al.^[6], we find that for a multivariate normal prior for $β$ an appropriate scaling factor $w_0$ is given by

$$w_0 = \frac{m}{2}\,[\mathrm{tr}(BD)]^{-1}.$$

The matrix $B$ is defined following equation (2.3). The scalar $w_0$ is free of the parameters of the multivariate normal prior. By letting the variance of this prior become very large, we can use the traditional uniform prior for $β$. Thus, the quasi-posterior for $β$ is

$$π_{qs}(β \mid A, ρ, y) \propto \exp[-w_0\, Q_0(β, A, ρ, y)], \tag{4.2}$$

(see Bissiri et al.^[2]). Conditional on $A$ and $ρ$, this quasi-posterior distribution is multivariate normal with mean vector $\tilde{β}_O(δ) = (X^T B^T B X)^{-1} X^T B^T B Y$, where $δ = (A, ρ)^T$, and variance-covariance matrix $(2 w_0 X^T B^T B X)^{-1}$.
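A short NumPy sketch of the conditional quasi-posterior (4.2) for a given $(A, ρ)$, that is, for a given $B$; the function name and toy inputs are hypothetical. In the independent Fay-Herriot case used in the example, $B = Γ$ and $w_0 = m / (2\sum_i γ_i D_i)$.

```python
import numpy as np

def beta_quasi_posterior(y, X, B, D):
    """Mean, covariance and weight w_0 of the conditional quasi-posterior (4.2).

    B = (P_D + P_M)^{-1} P_M for the given (A, rho); D is the sampling covariance.
    """
    m = len(y)
    w0 = 0.5 * m / np.trace(B @ D)                 # w_0 = (m/2) [tr(BD)]^{-1}
    BX = B @ X
    G = BX.T @ BX                                  # X'B'BX
    mean = np.linalg.solve(G, BX.T @ (B @ y))      # beta_O(delta)
    cov = np.linalg.inv(2.0 * w0 * G)              # (2 w_0 X'B'BX)^{-1}
    return mean, (cov + cov.T) / 2.0, w0           # symmetrize against round-off

# Independent FH example: P_D = D^{-1}, P_M = I / A, so B = Gamma.
rng = np.random.default_rng(0)
D = np.diag([1.0, 2.0, 0.5, 1.5])
A = 1.0
B = np.linalg.solve(np.linalg.inv(D) + np.eye(4) / A, np.eye(4) / A)
X = np.column_stack([np.ones(4), np.arange(4.0)])
y = np.array([1.0, 2.5, 2.0, 4.0])
mean, cov, w0 = beta_quasi_posterior(y, X, B, D)
draws = rng.multivariate_normal(mean, cov, size=5000)   # beta | A, rho, y
```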

4.3. Approximate Calibration of Loss Function for $A$ and $ρ$

The two-step minimization of the loss function $Q(β, A, ρ, Y)$ to estimate the model parameters yields the profile loss function $Q_*(A, ρ) = Q(\tilde{β}(A, ρ), A, ρ, Y)$ for $A$ and $ρ$. For a prior $π(A, ρ)$ on $A$ and $ρ$, we need to calibrate $Q_*(A, ρ)$ by a suitable weight $w_*$ to create a quasi-likelihood for $A$ and $ρ$.

Let $\bar{D} = \sum_{i=1}^m D_i / m$. We use a uniform prior for $ρ$ over its allowable range $(l_k, u_k)$ and an independent prior $π_A(A) = \bar{D}(A + \bar{D})^{-2}$ for $A$. With $\check{A} = 0$, the maximizer of $π_A$, it can be shown that $\int_0^\infty \int_{l_k}^{u_k} \log[π_A(\check{A}) / π_A(A)]\, π_A(A)\, π_ρ(ρ)\, dρ\, dA = 2$. We remark that this value is specific to the prior $π_A(A)$; for another proper prior $π_A^*(A)$, the corresponding value must be recomputed, for example by the Monte Carlo method. We also need to compute $E[Q_*(A, ρ)/m]$, where the expectation is with respect to the joint distribution of $Y$ and the model parameters. This can be simplified via iterated expectation, using the first two moments of $Y$ given $β$, $A$ and $ρ$. The result depends on $A$ and $ρ$ but not on $β$, and we compute its expectation with respect to the prior for $A$ and $ρ$ by Monte Carlo, drawing 10000 values of the pair from their joint prior distribution. Using this value, we obtain the calibration coefficient $w_*$ from an appropriate modification of equation (4.1).
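The value 2 can be confirmed by the Monte Carlo approach mentioned above. The NumPy sketch below (hypothetical function name) draws $A$ from $π_A$ by inverting its CDF $F(A) = A/(A + \bar{D})$; the uniform prior on $ρ$ contributes nothing, since its log ratio is identically zero. For another proper prior, only the sampler and the log ratio would change.

```python
import numpy as np

def expected_prior_log_ratio(D_bar, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[log(pi_A(A_check) / pi_A(A))] under
    pi_A(A) = D_bar / (A + D_bar)^2, whose mode is A_check = 0.

    The uniform prior on rho contributes nothing: its log ratio is identically 0.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n_draws)
    A = D_bar * u / (1.0 - u)                      # inverse CDF of F(A) = A / (A + D_bar)
    log_ratio = 2.0 * np.log1p(A / D_bar)          # log[pi_A(0) / pi_A(A)]
    return log_ratio.mean()

est = expected_prior_log_ratio(D_bar=1.3)          # close to 2 for any D_bar > 0
```

In fact, $\log(1 + A/\bar{D})$ is exponentially distributed with mean 1 under this prior, so the exact value is 2 regardless of $\bar{D}$.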

Using the above calibration weight, for the loss function $Q_{*} (A, ρ)$ , an update of the prior $π (A, ρ)$ is given by the quasi-posterior

$$π_{qs}(A, ρ \mid y) \propto (A + \bar{D})^{-2} \exp[-w_*\, Q_*(A, ρ)]\, I(l_k < ρ < u_k). \tag{4.3}$$

To make quasi-Bayesian inference on $θ$, first generate $A$ and $ρ$ from the quasi-posterior (4.3), and then use the generated $A$ and $ρ$ to draw $β$ from its conditional quasi-posterior (4.2). Finally, use the generated values of $β$, $A$ and $ρ$ to draw $θ$ from a multivariate normal distribution with mean $\tilde{θ}_B(ψ)$ in (2.3) and variance-covariance matrix $(P_D + P_M)^{-1}$.

For a suitably large $S$ , we generate $θ^{(s)}, s = 1, \dots, S$ , following the method outlined above. Based on these generated values of $θ^{(s)}$ , we can create various summaries of the posterior distribution including the posterior mean, variance and posterior quantiles of $θ_{i}, i = 1, \dots, m$ to find their point estimates and credible intervals.
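The sampling scheme above can be sketched as follows for the LCAR model. The article does not prescribe how to sample from (4.3); this sketch assumes a discrete approximation of (4.3) on equally spaced grids of $A$ and $ρ$ (so that cell probabilities are proportional to the density), and it takes the calibration weight `w_star` for $w_*$ as a given input. All function names and grids are hypothetical.

```python
import numpy as np

def lcar_precision(W, rho):
    """Omega_5(rho) = rho * (L - W) + (1 - rho) * I from (3.4)."""
    R = np.diag(W.sum(axis=1)) - W
    return rho * R + (1.0 - rho) * np.eye(W.shape[0])

def quasi_bayes_lcar(y, X, D, W, w_star, A_grid, rho_grid, S=2000, seed=0):
    """Draw theta^(1..S) by composition: (A, rho) from a grid approximation of (4.3),
    beta from (4.2), then theta from N(BP (2.3), (P_D + P_M)^{-1})."""
    rng = np.random.default_rng(seed)
    m = len(y)
    P_D = np.linalg.inv(D)
    D_bar = np.trace(D) / m
    cells, log_w = [], []
    for A in A_grid:                               # equally spaced grids assumed
        for rho in rho_grid:                       # rho values must lie in (0, 1)
            P_M = lcar_precision(W, rho) / A
            B = np.linalg.solve(P_D + P_M, P_M)
            BX = B @ X
            beta_hat = np.linalg.solve(BX.T @ BX, BX.T @ (B @ y))
            r = B @ (y - X @ beta_hat)
            q_star = r @ r - 2.0 * np.trace(B @ D)  # Q_*(A, rho)
            log_w.append(-2.0 * np.log(A + D_bar) - w_star * q_star)
            cells.append((P_M, B, BX, beta_hat))
    log_w = np.array(log_w)
    prob = np.exp(log_w - log_w.max())
    prob /= prob.sum()
    theta = np.empty((S, m))
    for s, k in enumerate(rng.choice(len(cells), size=S, p=prob)):
        P_M, B, BX, beta_hat = cells[k]
        w0 = 0.5 * m / np.trace(B @ D)
        cov_beta = np.linalg.inv(2.0 * w0 * (BX.T @ BX))
        beta = rng.multivariate_normal(beta_hat, (cov_beta + cov_beta.T) / 2.0)
        V = np.linalg.inv(P_D + P_M)
        theta[s] = rng.multivariate_normal(y - B @ (y - X @ beta), (V + V.T) / 2.0)
    return theta

def summarize(theta, level=0.95):
    """Posterior means, standard deviations and equal-tailed credible limits."""
    a = (1.0 - level) / 2.0
    lo, hi = np.quantile(theta, [a, 1.0 - a], axis=0)
    return theta.mean(axis=0), theta.std(axis=0, ddof=1), lo, hi
```

A call such as `summarize(quasi_bayes_lcar(y, X, D, W, w_star, A_grid, rho_grid))` returns point estimates, posterior standard deviations and 95% credible limits for each $θ_i$; a finer grid trades computing time for a closer approximation to (4.3).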

Finally, for the IAR model we have no spatial correlation parameter $ρ$ and the precision matrix of $v$ is given by $A^{- 1} R$ . Using this precision matrix we can easily obtain $Q_{0} (β, A, Y)$ and $Q_{1} (A)$ , which will provide the quasi-posterior distributions of $β$ and $A$ .
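For the IAR model, a common choice of $R$ (assumed here, since $R$ is defined elsewhere in the paper) is $R = \mathrm{diag}(n_{1}, \dots, n_{m}) - W$, where $W$ is the 0/1 adjacency matrix of the areas and $n_{i}$ is the number of neighbours of area $i$. The sketch builds $R$ and the precision $A^{-1} R$. Note that $R 1_{m} = 0$, so $R$ is singular and the IAR prior on $v$ is improper.

```python
import numpy as np


def iar_structure(adjacency):
    """R = diag(number of neighbours) - W for a symmetric 0/1 adjacency matrix W."""
    w = np.asarray(adjacency, dtype=float)
    return np.diag(w.sum(axis=1)) - w


def iar_precision(adjacency, a):
    """Precision matrix A^{-1} R of the IAR random effects v."""
    return iar_structure(adjacency) / a


# Hypothetical three-area chain: area 0 - area 1 - area 2.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
r = iar_structure(chain)
```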

5. Application to the Current Population Survey Data

We apply our quasi-Bayesian method for the spatial models to estimate the four-person family median incomes for the forty-eight states of the continental US and the District of Columbia for the income year 1989. We exclude Alaska and Hawaii from this application because they are geographically isolated from the continental US. The US Department of Health and Human Services (HHS) needs accurate state-level median income estimates to implement an energy assistance program for low-income families. The yearly household income data are collected in the Annual Demographic Supplement to the monthly Current Population Survey (CPS). Unfortunately, the CPS sample estimates for most states are not adequately accurate. Using the CPS estimates as direct estimates, and relevant administrative data and past census data as covariates, the US Census Bureau developed model-based estimates of the median incomes using the regular Fay-Herriot model with independent random effects. In this application, we develop alternative small area estimates of median incomes based on the spatial models and compare their performance with that of the estimates from the traditional Fay-Herriot model. We consider the income year 1989 because more accurate estimates of the unknown 1989 incomes are available from the 1990 decennial census. Since the census values are accurate, we treat them as ‘true values’ and compare our various predictions against them.

We denote the true 1989 four-person family median income for the $i$th state by $θ_{i}$, and the direct estimate of $θ_{i}$ from the 1990 CPS by $Y_{i}$. The two available covariates for this problem are

$x_{i 1}$: 1979 four-person family median income of the $i$th state from the 1980 census,

$x_{i 2} = (P C I_{i, 1989} / P C I_{i, 1979}) \cdot x_{i 1}$: adjusted 1979 census four-person family median income of the $i$th state,

where PCI is the state-level per capita income from the U.S. Bureau of Economic Analysis. From our present and previous analyses (cf. Chung and Datta^[4] and Datta et al.^[6]) we found that the covariate $x_{2}$ is more effective in accounting for the variability of the median income across small areas. Turning to spatial dependence, the Moran’s I for the 1989 census state median incomes is 0.50, while the corresponding values for $x_{2}$ and $x_{1}$ are 0.43 and 0.22, respectively. The Moran’s I for the direct estimates $y_{i}$ from the 1989 CPS is 0.31. Although the CPS estimates follow the spatial pattern of the ‘true’ values (the 1989 census values) in terms of Moran’s I, they have high variances. Based on Moran’s I, borrowing information from $x_{2}$ may be more effective than using $x_{1}$ to track the spatial pattern of the $θ_{i}$’s. To evaluate the predictions, we use the mean squared prediction error (MSPE) from the ‘true’ $θ_{i}$’s. Since these are unknown, we use in their place the median income values obtained from the 1990 census, which we denote by $C_{i, 89}$. We also compute the average of the posterior standard deviations (APSD) to evaluate the accuracy of the predictions. We define these criteria by

\mathrm{MSPE} = \frac{1}{m_{E}} \sum_{i \in E} {({\hat{θ}}_{i} - C_{i, 89})}^{2},

(5.1)

\mathrm{APSD} = \frac{1}{m_{E}} \sum_{i \in E} \mathrm{sd} (θ_{i}),

(5.2)

where ${\hat{θ}}_{i}$ is the estimate of $θ_{i}$ from the model under consideration and $C_{i, 89}$ is the 1990 census value used in place of the true median income $θ_{i}$. Here $E$ is a suitable subset of $\{1, \dots, m\}$, consisting of the indices of either the sampled or the non-sampled small areas, and $m_{E}$ is the number of elements in $E$. Finally, $\mathrm{sd} (θ_{i})$ is the posterior standard deviation of $θ_{i}$ under the model considered.
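Criteria (5.1) and (5.2) translate directly into code. In this minimal sketch `subset` plays the role of $E$ (with zero-based indices), `census` holds the $C_{i, 89}$ and `post_sd` the posterior standard deviations; the function names and the numbers in the example are illustrative.

```python
import numpy as np


def mspe(theta_hat, census, subset):
    """(5.1): average squared error of theta_hat_i against C_{i,89} over i in E."""
    idx = np.asarray(subset)
    return float(np.mean((theta_hat[idx] - census[idx]) ** 2))


def apsd(post_sd, subset):
    """(5.2): average posterior standard deviation sd(theta_i) over i in E."""
    return float(np.mean(post_sd[np.asarray(subset)]))


# Hypothetical numbers for three areas, with E given by zero-based indices [1, 2].
theta_hat = np.array([1.0, 2.0, 3.0])
census = np.array([1.0, 1.0, 1.0])
print(mspe(theta_hat, census, [1, 2]))   # (1 + 4) / 2 = 2.5
```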

5.1. Median Incomes of 4-person Families by State

We fit the four spatial models defined by (3.1) to (3.4) and the FH model with three sets of covariates, given by the design matrices $X = [1_{m}, x_{1}, x_{2}]$, $X = [1_{m}, x_{2}]$ and $X = [1_{m}, x_{1}]$. The first setting is the most efficient and the last the least efficient. We calculate the MSPEs and APSDs over the 49 areas and summarize the results in Table 1. Results reported for Bayes are based on the quasi-Bayes predictors developed here.

Table 1 shows that for the most efficient set of covariates (full covariates) the LCAR model has about 12 per cent smaller MSPE and 2 per cent smaller APSD than the independent FH model under quasi-Bayes. Since $x_{2}$ displays a good spatial pattern, it can explain most of the variability of the $θ_{i}$’s. It is therefore not surprising that, in terms of MSPE, the independent FH model is at least as good as all the spatial models except the LCAR model. In terms of APSD, all five models are comparable. Moreover, in terms of MSPE, quasi-Bayes outperforms the OBP method for all four spatial models; for the CAR model, for example, the quasi-Bayes MSPE is about 14 per cent smaller than the OBP MSPE.

At the other extreme, when only the weaker variable $x_{1}$ is included in the regression model, it is no surprise that the MSPE and the APSD for all five models are higher than under the other two covariate combinations, for both quasi-Bayes and OBP. In this case, however, under the quasi-Bayes method the MSPE and the APSD of the SAR model are approximately 45 per cent and 14 per cent smaller, respectively, than those of the independent FH model. The LCAR is the second-best model in terms of MSPE, with an MSPE that is 43 per cent smaller, and the best in terms of APSD, with an APSD that is 18 per cent smaller than that of the independent FH model. Removing the strong covariate $x_{2}$ from the full model increases the MSPEs of the SAR and LCAR models by about 29 per cent and 86 per cent, respectively, whereas the MSPE of the independent FH model increases by roughly 187 per cent. These results show that when covariates are not effective in explaining existing spatial variation, spatial models that account for such variation can produce much more accurate predictions than the independent FH model.

Table 1 also reports MSPE values for the OBP method. In terms of MSPE, the OBP performs marginally better than the quasi-Bayes predictors only for the SCAR and CAR models, and only when the weak covariate is used in the regression function. However, no estimator of the root mean squared prediction error of the OBP is available for spatial models.

In Table 2 we provide point estimates and credible intervals of the spatial parameter for the various models. Among the three regression models considered, only for the case of $x_{1}$ do the credible intervals of the spatial parameter exclude 0, the null value, for all four models; these intervals quite strongly indicate the presence of a nonzero spatial pattern.

Table 2.

Posterior mean/mode (standard deviation) and 95% credible interval (CrI) of $ρ$. For each regression model, the first row summarizes the posterior mean, mode and standard deviation of $ρ$, in that order, and the second row gives the CrI of $ρ$.

Regression model		SAR	SCAR	CAR	LCAR
Full covariates ($x_{1}, x_{2}$)	Mean/mode (SD)	$- 0.47 / - 0.99 (0.33)$	$- 0.26 / - 0.25 (0.08)$	$- 0.46 / - 0.39 (0.59)$	$0.23 / 0.17 (0.24)$
	95% CrI	$(- 0.97, 0.28)$	$(- 0.35, - 0.03)$	$(- 1.32, 0.82)$	$(0.00, 0.87)$
Strong covariate ($x_{2}$)	Mean/mode (SD)	$- 0.50 / - 0.99 (0.31)$	$- 0.26 / - 0.25 (0.08)$	$- 0.58 / - 0.45 (0.52)$	$0.14 / 0.12 (0.16)$
	95% CrI	$(- 0.97, 0.16)$	$(- 0.35, - 0.04)$	$(- 1.34, 0.57)$	$(0.00, 0.62)$
Weak covariate ($x_{1}$)	Mean/mode (SD)	$0.66 / 0.87 (0.15)$	$0.14 / 0.15 (0.03)$	$0.90 / 1.00 (0.10)$	$0.76 / 0.99 (0.17)$
	95% CrI	$(0.37, 0.96)$	$(0.07, 0.18)$	$(0.63, 1.00)$	$(0.37, 0.99)$

5.2. Estimation of Some Non-sampled State Means

We evaluate the spatial models in terms of the accuracy of their predictions for non-sampled small areas, using both the quasi-Bayesian and the OBP approaches. Again, our two covariate settings are $X = [1, x_{1}]$ and $X = [1, x_{2}]$. To check the quality of prediction for non-sampled areas with no direct estimates, we follow an idea of Chung and Datta^[4]. Suppose there are $m_{1}$ non-sampled areas and $m_{2} = m - m_{1}$ sampled areas. We constructed 12 data sets in an arbitrary manner, eleven of which lack four $(m_{1} = 4)$ direct estimates and one of which lacks five $(m_{1} = 5)$, so that each of the 49 areas is omitted exactly once. In each data set, we do not use the CPS (direct) estimates of the omitted states, and we predict the $θ_{i}$’s of these states. Suppose $θ_{(1)}$ is the subvector of $θ$ corresponding to the non-sampled areas, and $θ_{(2)}$ is the subvector corresponding to the sampled areas. Let $Y_{(2)}$ denote the vector of direct estimates for the sampled areas. We order the elements of $θ$ so that the non-sampled areas come first and define an indicator matrix $M$ so that $θ_{(2)} = M θ$. The matrix $M$ is $m_{2} \times (m_{1} + m_{2})$; its first $m_{1}$ columns are null vectors and its last $m_{2}$ columns form an identity matrix. We then define $X_{(2)} = M X$.
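The reordering and the selection matrix $M$ can be illustrated with a small example. The helper names are hypothetical; the sketch places the non-sampled areas first, as the definition of $M$ requires, and checks that $M θ$ recovers $θ_{(2)}$ and that $M X$ keeps the rows of $X$ for the sampled areas.

```python
import numpy as np


def reorder_nonsampled_first(values, nonsampled):
    """Reorder areas (entries of a vector or rows of a matrix) so that the
    non-sampled areas come first and the sampled areas keep their order."""
    first = list(nonsampled)
    skip = set(first)
    order = first + [i for i in range(len(values)) if i not in skip]
    return np.asarray(values)[order]


def selection_matrix(m1, m2):
    """M = [0 | I_{m2}], an m2 x (m1 + m2) matrix with theta_(2) = M @ theta."""
    return np.hstack([np.zeros((m2, m1)), np.eye(m2)])


# Hypothetical example with m = 6 areas, of which areas 1 and 4 are non-sampled.
theta = np.arange(6.0)
X = np.column_stack([np.ones(6), 10.0 * theta])
M = selection_matrix(2, 4)
theta_2 = M @ reorder_nonsampled_first(theta, [1, 4])   # sampled areas 0, 2, 3, 5
X_2 = M @ reorder_nonsampled_first(X, [1, 4])           # X_(2) = M X
```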

We obtain OBP and quasi-Bayes predictions of $θ_{(1)}$ based on $Y_{(2)}$ for all five models. Using these predictions, we compute, for each data set and each non-sampled area under the FH model and the spatial models, the squared prediction error of both the quasi-Bayes and OBP point predictors. Posterior standard deviations are available only for the quasi-Bayes method. We summarize the mean squared prediction error (MSPE) over the non-sampled areas of each data set in Table 4 and the average posterior standard deviation (APSD) in Table 3. We evaluate prediction performance using the following ratios:

\mathrm{MSPE\_Ratio}_{ki} = \frac{\mathrm{mspe}_{ki}}{\mathrm{mspe}_{1i}}, \quad \mathrm{APSD\_Ratio}_{ki} = \frac{\mathrm{apsd}_{ki}}{\mathrm{apsd}_{1i}}, \quad i = 1, \dots, 12; \ k = 2, \dots, 5.

Table 3.

Ratios of averages of posterior standard deviations of spatial models to the FH model.

Data set	Unsampled states	Strong covariate: $x_{2}$				Weak covariate: $x_{1}$
		SAR/FH	SCAR/FH	CAR/FH	LCAR/FH	SAR/FH	SCAR/FH	CAR/FH	LCAR/FH
1	AZ MS OK SD	1.01	0.99	0.92	0.94	0.58	0.78	0.53	0.53
2	AR CO DE TN	1.04	1.00	0.88	0.94	0.53	0.81	0.52	0.50
3	MD MI NV WV	0.98	0.97	0.98	0.97	0.72	0.86	0.68	0.66
4	MT NC NE NY	1.03	0.96	1.49	1.15	0.73	0.82	0.94	0.90
5	DC GA ID ND	1.02	0.99	1.00	1.00	0.93	1.05	0.88	0.83
6	AL MO VT WY	1.00	1.02	0.92	0.92	0.66	0.79	0.57	0.57
7	FL LA UT WA	0.99	0.97	1.06	1.01	0.66	0.85	0.69	0.69
8	MA MN SC TX	1.05	0.98	1.09	1.00	0.78	0.88	0.77	0.75
9	KY RI VA WI	0.98	1.07	0.89	0.88	0.69	0.91	0.61	0.62
10	IL IN NH PA	0.98	0.95	0.94	0.93	0.69	0.86	0.66	0.64
11	CA ME NJ OH	1.08	1.01	0.97	0.96	0.57	0.82	0.56	0.53
12	CT IA KS NM OR	1.00	0.98	0.98	0.98	0.73	0.86	0.67	0.66
Overall		1.01	0.99	1.01	0.97	0.69	0.85	0.67	0.66

In each row, the entries are ratios of the average posterior standard deviation under the spatial model to that under the FH model, taken over the four (or five) non-sampled states of that data set. The last row (‘Overall’) gives the corresponding ratio over all 49 non-sampled states.

Here $\mathrm{apsd}_{ki}$ is the average posterior standard deviation of all omitted $θ$’s (four or five) under the $k$th model for the $i$th data set, where $k = 1$ denotes the FH model; $\mathrm{mspe}_{ki}$ is defined similarly. A value of $\mathrm{MSPE\_Ratio}_{ki}$ or $\mathrm{APSD\_Ratio}_{ki}$ less than one means that, for the $i$th data set, the $k$th (spatial) model has a smaller mean squared prediction error or average posterior standard deviation than the FH model. Ratios higher than one are highlighted in bold in Table 3 and Table 4.

Table 4.

Ratios of MSPEs of spatial models to the FH model.

Data set	Unsampled states	Method	Strong covariate: $x_{2}$				Weak covariate: $x_{1}$
			SAR/FH	SCAR/FH	CAR/FH	LCAR/FH	SAR/FH	SCAR/FH	CAR/FH	LCAR/FH
1	AZ MS OK SD	qB	2.39	3.20	1.50	0.30	2.17	1.70	1.74	1.67
		OBP	3.29	4.26	2.57	1.00	2.10	1.88	1.65	1.68
2	AR CO DE TN	qB	1.17	1.14	1.02	0.90	0.26	0.54	0.32	0.32
		OBP	1.47	1.15	1.27	1.00	0.20	0.48	0.26	0.26
3	MD MI NV WV	qB	0.93	0.50	1.05	1.05	0.59	0.76	0.39	0.37
		OBP	0.91	0.45	2.05	1.00	1.25	0.71	0.37	0.40
4	MT NC NE NY	qB	1.16	0.70	0.80	1.54	0.81	1.11	0.77	1.09
		OBP	1.03	0.56	0.96	1.00	0.79	1.26	0.99	1.31
5	DC GA ID ND	qB	1.48	1.30	1.28	0.94	0.32	0.69	0.31	0.22
		OBP	1.59	1.38	1.43	1.00	0.20	0.41	0.10	0.14
6	AL MO VT WY	qB	2.12	2.19	1.68	0.69	0.23	0.56	0.27	0.30
		OBP	2.97	2.82	2.37	1.00	0.26	0.54	0.26	0.28
7	FL LA UT WA	qB	5.46	5.49	3.18	0.48	0.47	0.50	0.47	0.44
		OBP	6.51	6.05	4.72	1.00	0.42	0.45	0.46	0.44
8	MA MN SC TX	qB	1.89	1.52	1.76	0.93	0.28	0.22	0.52	0.48
		OBP	2.08	1.52	2.02	1.00	0.23	0.15	0.82	0.68
9	KY RI VA WI	qB	0.94	0.95	1.13	1.13	0.22	0.47	0.30	0.26
		OBP	1.07	1.22	1.22	1.00	0.25	0.45	0.23	0.24
10	IL IN NH PA	qB	1.12	1.12	1.13	0.96	0.36	0.35	0.26	0.27
		OBP	1.19	1.11	1.16	1.00	0.38	0.35	0.26	0.27
11	CA ME NJ OH	qB	1.00	1.18	0.94	0.79	0.53	0.50	0.54	0.50
		OBP	1.38	1.47	0.91	1.00	0.53	0.45	0.49	0.47
12	CT IA KS NM OR	qB	0.98	0.90	1.00	1.19	0.38	0.63	0.49	0.50
		OBP	0.92	0.96	0.99	1.00	0.42	0.64	0.46	0.47
Overall		qB	1.34	1.33	1.21	0.95	0.41	0.60	0.43	0.43
		OBP	1.61	1.55	1.42	1.00	0.43	0.55	0.40	0.42

The entries are ratios of the MSPE of the spatial model to that of the FH model over the four (or five) non-sampled states of each data set; qB denotes quasi-Bayes. The bottom rows (‘Overall’) give the corresponding ratios over all 49 non-sampled states.

When the weak covariate is used, the MSPEs and APSDs of the SAR, SCAR, CAR and LCAR models for the non-sampled states are usually smaller than those of the independent FH model. Their ‘Overall’ quasi-Bayes MSPE ratios (relative to the independent FH model) are 0.41, 0.60, 0.43 and 0.43, respectively, and their ‘Overall’ APSD ratios are 0.69, 0.85, 0.67 and 0.66, respectively. The SAR and CAR models each produce an MSPE ratio greater than 1 for one data set, whereas the SCAR and LCAR models produce MSPE ratios greater than 1 for two data sets. Even after substantial exploratory analysis, we have been unable to find an explanation for this anomaly.
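The counts of data sets with an MSPE ratio above one can be read off Table 4. The short calculation below simply transcribes the quasi-Bayes, weak-covariate columns of Table 4 and lists, for each spatial model, the data sets whose ratio exceeds one.

```python
# Quasi-Bayes MSPE ratios to the FH model, weak covariate x_1 (Table 4),
# one entry per data set 1 to 12.
weak_qb = {
    "SAR":  [2.17, 0.26, 0.59, 0.81, 0.32, 0.23, 0.47, 0.28, 0.22, 0.36, 0.53, 0.38],
    "SCAR": [1.70, 0.54, 0.76, 1.11, 0.69, 0.56, 0.50, 0.22, 0.47, 0.35, 0.50, 0.63],
    "CAR":  [1.74, 0.32, 0.39, 0.77, 0.31, 0.27, 0.47, 0.52, 0.30, 0.26, 0.54, 0.49],
    "LCAR": [1.67, 0.32, 0.37, 1.09, 0.22, 0.30, 0.44, 0.48, 0.26, 0.27, 0.50, 0.50],
}


def datasets_above_one(ratios):
    """1-based labels of the data sets whose MSPE ratio exceeds one."""
    return [i + 1 for i, r in enumerate(ratios) if r > 1.0]


for model, ratios in weak_qb.items():
    print(model, datasets_above_one(ratios))
# SAR [1]
# SCAR [1, 4]
# CAR [1]
# LCAR [1, 4]
```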

For the more effective covariate setting, all spatial models except the LCAR model fare no better than the independent FH model in terms of both MSPE and APSD ratios. In this case, the LCAR model has the smallest MSPE ratio (0.95) and the smallest APSD ratio (0.97).

Table 5.

Mean squared prediction error (MSPE) and average posterior standard deviation (APSD) for the spatial models and the independent FH model in a simulation study. We present MSPE results for both quasi-Bayesian and OBP methods.

Model	Strong covariate: $x_{1}$			Weak covariate: $x_{2}$
	MSPE (quasi-Bayes)	MSPE (OBP)	APSD (quasi-Bayes)	MSPE (quasi-Bayes)	MSPE (OBP)	APSD (quasi-Bayes)
FH	3.52	3.54	1.70	6.15	6.14	2.28
SAR	3.64	3.75	1.71	5.16	5.28	2.09
SCAR	3.57	3.66	1.71	5.35	5.21	2.16
CAR	3.73	3.81	1.63	5.37	5.30	2.12
LCAR	3.59	3.57	1.67	5.27	5.31	2.10

Finally, the right half of Table 4 shows that, in the weak covariate case, the average MSPEs of the OBP predictors are lower under the spatial models than under the independent FH model. The same part of the table also shows that the improvements in these MSPE ratios are slightly larger for the OBP method than for the quasi-Bayesian method, except for the SAR model, for which the quasi-Bayesian method is at least as good as the OBP method. We reiterate that while the quasi-Bayesian method provides a measure of uncertainty for the point estimates, no such measure is available for the OBP method.

6. Simulation Studies

In this section, we present two simulation studies to examine our proposed quasi-Bayes approach to spatial models under two scenarios. To make the simulation setting realistic, we mimic the 1989 four-person family median income data described in Section 5. The first scenario evaluates the quality of prediction under two covariate settings of different informativeness, using all $m = 49$ contiguous areas (48 states and Washington, D.C.) of the U.S.; the second scenario evaluates the quality of prediction in the absence of some direct estimates. Details are given in Subsections 6.1 and 6.2, respectively.

6.1. A Simulation Study Without Non-sampled Areas

In the first simulation, we mainly evaluate the quality of prediction under two covariate settings, one with a strong and one with a weak covariate. There are no non-sampled areas. For each setting, we consider $S = 100$ replicated data sets.

Data generation: We use the $D_{i}$ values from the application as the $D_{i}$’s in the simulations. Let $\bar{D} = m^{- 1} \sum_{i = 1}^{m} D_{i}$. We set $A = \bar{D} / 2$ and $ρ = 0.85$; with this value of $ρ$, the Moran’s I values of the 100 replicated vectors of small area means range from 0.32 to 0.66 with mean 0.50, which is consistent with the Moran’s I value of the 1989 census state median incomes in Section 5. We consider two independent covariates $x_{1}$ and $x_{2}$ with SAR spatial dependence, that is, $x_{1}, x_{2} \sim N_{m} (0_{m}, {\{Ω_{2} (ρ)\}}^{- 1})$. Then, letting ${(β_{1}, β_{2})}^{⊤} = {(2, 1)}^{⊤}$ and $μ = β_{1} x_{1} + β_{2} x_{2}$, we generate small area means and direct estimates from the following independent FH model:

θ \sim N_{m} (μ, A I_{m}), Y ∣ θ \sim N_{m} (θ, D) .

The covariate $x_{1}$ ($x_{2}$) introduces a stronger (weaker) spatial pattern into the $θ_{i}$’s, and we refer to $x_{1}$ ($x_{2}$) as the strong (weak) covariate.
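The data-generating design above can be sketched in Python/NumPy, together with the Moran’s I statistic used to calibrate $ρ$. Two ingredients are assumptions rather than transcriptions: the SAR precision is taken as $Ω_{2}(ρ) = (I - ρW)^{⊤}(I - ρW)$ with a row-standardized weight matrix $W$ (the paper's own definition is given in Section 3 and not repeated here), and a hypothetical 7 × 7 grid with made-up $D_{i}$ values replaces the actual map of the 49 areas and the application's $D_{i}$’s. The resulting Moran’s I values therefore need not match the range reported above.

```python
import numpy as np


def grid_adjacency(n):
    """Binary rook adjacency for an n x n grid (hypothetical stand-in for the state map)."""
    w = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if r + 1 < n:
                w[i, i + n] = w[i + n, i] = 1.0
            if c + 1 < n:
                w[i, i + 1] = w[i + 1, i] = 1.0
    return w


def sar_draw(w, rho, rng):
    """x ~ N_m(0, Omega_2(rho)^{-1}) for the ASSUMED SAR precision
    Omega_2(rho) = (I - rho W)^T (I - rho W): solve (I - rho W) x = e, e ~ N(0, I)."""
    b = np.eye(len(w)) - rho * w
    return np.linalg.solve(b, rng.standard_normal(len(w)))


def simulate_replicate(w, d, rho=0.85, beta=(2.0, 1.0), rng=None):
    """One replicate: SAR covariates, then the independent FH model."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(d)
    x1, x2 = sar_draw(w, rho, rng), sar_draw(w, rho, rng)
    mu = beta[0] * x1 + beta[1] * x2
    a = d.mean() / 2.0                                   # A = D_bar / 2
    theta = mu + np.sqrt(a) * rng.standard_normal(m)     # theta ~ N_m(mu, A I_m)
    y = theta + np.sqrt(d) * rng.standard_normal(m)      # Y | theta ~ N_m(theta, D)
    return x1, x2, theta, y


def morans_i(x, w):
    """Moran's I = (m / S0) * z'Wz / z'z, with z = x - mean(x), S0 = sum of all weights."""
    z = x - x.mean()
    return len(x) / w.sum() * (z @ w @ z) / (z @ z)


# Hypothetical example: 49 areas on a 7 x 7 grid with row-standardized weights.
w = grid_adjacency(7)
w_rs = w / w.sum(axis=1, keepdims=True)
d = np.linspace(1.0, 8.0, 49)          # placeholder sampling variances D_i
rng = np.random.default_rng(2024)
morans = [morans_i(simulate_replicate(w_rs, d, rng=rng)[2], w_rs) for _ in range(100)]
```

Drawing $x = (I - ρW)^{-1} e$ gives the required covariance $\{(I - ρW)^{⊤}(I - ρW)\}^{-1}$ without inverting the precision matrix.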

The simulation outcomes presented in Table 5 agree with the real data analysis outcomes presented in Table 1. In the strong covariate setting, the independent model is very competitive for both the quasi-Bayes and OBP methods: the MSPE and APSD values of the LCAR and FH models are comparable, since $x_{1}$ can explain the majority of the variability of the $θ_{i}$’s. In contrast, in the weak covariate setting the LCAR model has roughly 14 per cent smaller MSPE (quasi-Bayes method) and 8 per cent lower APSD than the FH model. Moreover, when only the weaker covariate $x_{2}$ is included in the regression model, the MSPE and APSD for all five models are greater than their counterparts when only the stronger covariate $x_{1}$ is included. Moving from the strong to the weak covariate setting, the MSPEs of the SAR, SCAR, CAR and LCAR models increase by approximately 42, 50, 44 and 47 per cent, respectively, for the quasi-Bayes method, and by 41, 42, 39 and 49 per cent, respectively, for the OBP method, while the MSPE of the independent FH model increases by 75 per cent and 73 per cent for the quasi-Bayes and OBP methods, respectively. These results again confirm the usefulness of spatial models when the available covariates are not very effective in explaining existing spatial variation.
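The percentage increases quoted above follow directly from the MSPE columns of Table 5. The short calculation below reproduces them from the tabulated (rounded) values.

```python
# MSPE values from Table 5 as (strong covariate x_1, weak covariate x_2).
table5_mspe = {
    ("FH", "qB"): (3.52, 6.15), ("FH", "OBP"): (3.54, 6.14),
    ("SAR", "qB"): (3.64, 5.16), ("SAR", "OBP"): (3.75, 5.28),
    ("SCAR", "qB"): (3.57, 5.35), ("SCAR", "OBP"): (3.66, 5.21),
    ("CAR", "qB"): (3.73, 5.37), ("CAR", "OBP"): (3.81, 5.30),
    ("LCAR", "qB"): (3.59, 5.27), ("LCAR", "OBP"): (3.57, 5.31),
}


def pct_increase(strong, weak):
    """Percentage increase in MSPE from the strong to the weak covariate setting."""
    return round(100.0 * (weak / strong - 1.0))


increases = {key: pct_increase(*vals) for key, vals in table5_mspe.items()}
print(increases[("FH", "qB")], increases[("FH", "OBP")])   # 75 73
```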

6.2. A Simulation Study with Non-sampled Areas

In the second simulation study, we compare the prediction performance of the FH model and the four spatial models when several areas are non-sampled, again under covariates of differing informativeness. Following the setup of Chung and Datta^[4], we do not simulate direct estimates for the following $m_{1} = 7$ states: Delaware, Massachusetts, Michigan, Nebraska, Rhode Island, South Dakota, and Texas. For the remaining $m_{2} = 42$ areas we have direct estimates.

Data generation: The design is similar to that of the first simulation, the only difference being that we do not simulate direct estimates for the $m_{1} = 7$ states. We denote $\bar{D} = m_{2}^{- 1} \sum_{i = 1}^{m_{2}} D_{i}$ and $D_{(2)} = \mathrm{diag} {\{D_{i}\}}_{i = 1}^{m_{2}}$, where the index $i$ runs over the sampled areas. We set $ρ = 0.85$ and $A = \bar{D} / 2$ and consider two independent covariates $x_{1}$ and $x_{2}$ with SAR spatial dependence, that is, $x_{1}, x_{2} \sim N_{m} (0_{m}, {\{Ω_{2} (ρ)\}}^{- 1})$. Then, letting ${(β_{1}, β_{2})}^{⊤} = {(2, 1)}^{⊤}$ and $μ = β_{1} x_{1} + β_{2} x_{2}$, we generate small area means and direct estimates from the following independent FH model:

θ \sim N_{m} (μ, A I_{m}), Y_{(2)} ∣ θ_{(2)} \sim N_{m_{2}} (θ_{(2)}, D_{(2)}),

where the components of $Y_{(2)}$ and $θ_{(2)}$ correspond to the $m_{2}$ sampled small areas.
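Relative to Section 6.1, the only changes are that direct estimates are generated for the sampled areas alone and that $\bar{D}$ is averaged over those areas. A minimal sketch follows; the positions of the seven non-sampled states in the test are hypothetical indices, and $μ$ is taken as given (for example, built from SAR covariates as in Section 6.1).

```python
import numpy as np


def sampled_indices(m, nonsampled):
    """Indices of the m2 = m - m1 sampled areas, in their original order."""
    mask = np.ones(m, dtype=bool)
    mask[list(nonsampled)] = False
    return np.flatnonzero(mask)


def simulate_with_nonsampled(mu, d, nonsampled, rng):
    """theta for all m areas, but direct estimates Y_(2) only for the sampled areas."""
    sampled = sampled_indices(len(mu), nonsampled)
    d2 = d[sampled]
    a = d2.mean() / 2.0                                      # A = D_bar / 2, sampled areas only
    theta = mu + np.sqrt(a) * rng.standard_normal(len(mu))   # theta ~ N_m(mu, A I_m)
    y2 = theta[sampled] + np.sqrt(d2) * rng.standard_normal(len(sampled))  # Y_(2) | theta_(2)
    return sampled, theta, y2
```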

Table 6 presents the average MSPEs and the MSPE ratios of the four spatial models to the FH model when only the strong covariate $x_{1}$ or only the weak covariate $x_{2}$ is included. When only the strong covariate is used, the independent model is very competitive for both the quasi-Bayes and OBP methods, for both sampled and non-sampled states. The improvements from the spatial models are greater when only the weak covariate is used. For the quasi-Bayes method in the weak covariate setting, the LCAR and SAR models have roughly 12 per cent and 13 per cent lower MSPEs, respectively, than the FH model for sampled states. For non-sampled states, both the LCAR and SAR models have about 33 per cent lower MSPEs than the FH model. We reach a similar conclusion about the spatial models when we compare them in terms of the APSD values presented in Table 7. Overall, among the four spatial models, LCAR is the most competitive, even though the covariates were generated with SAR spatial dependence. In terms of MSPE, our quasi-Bayes predictors generally outperform the OBP predictors for all spatial models.

Table 6.

Average MSPE values for the spatial models and FH model and MSPE Ratios of different spatial models to the $FH$ model.

Model	Method	Strong covariate: $x_{1}$				Weak covariate: $x_{2}$
		Sampled states		Unsampled states		Sampled states		Unsampled states
		MSPE	Ratio	MSPE	Ratio	MSPE	Ratio	MSPE	Ratio
FH	qB	3.77	1	6.02	1	6.43	1	17.38	1
FH	OBP	3.80	1	6.02	1	6.43	1	17.38	1
SAR	qB	3.93	1.04	6.56	1.09	5.60	0.87	11.63	0.67
SAR	OBP	4.10	1.08	7.10	1.18	5.74	0.89	12.54	0.72
SCAR	qB	3.88	1.03	6.31	1.05	5.82	0.91	12.14	0.70
SCAR	OBP	4.02	1.06	6.68	1.11	5.72	0.89	12.32	0.71
CAR	qB	3.96	1.06	6.24	1.04	5.88	0.91	13.66	0.79
CAR	OBP	4.10	1.08	6.59	1.09	5.84	0.91	13.90	0.80
LCAR	qB	3.86	1.02	6.29	1.04	5.66	0.88	11.40	0.66
LCAR	OBP	3.87	1.02	6.35	1.05	5.73	0.89	12.16	0.70

The ‘Ratio’ columns are ratios of the MSPE of each model to that of the FH model under the same method. We abbreviate quasi-Bayes by qB.

Table 7.

APSD values for all models and APSD Ratios of spatial models to the FH model.

Model	Strong covariate: $x_{1}$				Weak covariate: $x_{2}$
	Sampled states		Unsampled states		Sampled states		Unsampled states
	APSD	Ratio	APSD	Ratio	APSD	Ratio	APSD	Ratio
FH	1.73	1	2.37	1	2.33	1	4.48	1
SAR	1.74	1.00	2.42	1.02	2.17	0.93	3.79	0.85
SCAR	1.73	1.00	2.36	1.00	2.24	0.96	4.04	0.90
CAR	1.68	0.97	2.40	1.01	2.19	0.94	4.12	0.92
LCAR	1.71	0.99	2.39	1.01	2.17	0.93	3.88	0.87

The ‘Ratio’ columns are ratios of APSD for spatial models to the FH model.

7. Conclusions

Jiang et al.^[11] suggested the OBP method as an alternative to the EBLUP method for predicting small area means. Although these authors proposed the OBP approach for a general mixed effects model, accurate estimation of the mean squared error of the OBP predictors of small area means is available only for the independent Fay-Herriot model (cf. Liu et al.^[13]). The derivation of the MSE estimator by Liu et al.^[13] is tedious, and the estimator is not guaranteed to be positive. No such results are available for spatial random effects models, which provide useful predictions of small area means in the absence of good covariates to explain the spatial variation of the small area means (cf. Chung and Datta^[4]). In this work, we have developed a quasi-Bayesian version of the OBP method for predicting small area means under various spatial random effects models. This work generalizes the quasi-Bayesian method of Datta et al.^[6] to spatial models. It also adapts the Bayesian spatial method of Chung and Datta^[4] to the OBP setting through a quasi-Bayesian argument. Evaluations based on available census data and realistic simulations show the usefulness of the proposed quasi-Bayesian spatial predictors of small area means. These methods are straightforward to implement and provide reliable measures of uncertainty and credible intervals for the small area means.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Gauri S. Datta

References

1. Battese, Harter, Fuller WA. An error-components model for prediction of county crop areas using survey and satellite data. J Amer Stat Assoc 1988; 83: 28–36.

2. Bissiri, Holmes, Walker SG. A general framework for updating belief distributions. J Royal Stat Soc: Series B (Stat Meth) 2016; 78: 1103–1130.

3. Bissiri, Walker SG. On general Bayesian inference using loss functions. Stat Prob Lett 2019; 152: 89–91.

4. Chung, Datta GS. Bayesian spatial models for estimating means of sampled and non-sampled small areas. Surv Meth 2022; 48.2: 463–489.

5. Datta, Lahiri. A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Stat Sinica 2000; 10: 613–627.

6. Datta, Lee, Li. Pseudo-Bayesian small area estimation. J Surv Stat Meth 2023.

7. Datta, Rao JNK, Smith DD. On measuring the variability of small area estimators under a basic area level model. Biometrika 2005; 92: 183–196.

8. Efron, Morris. Empirical Bayes on vector observations: An extension of Stein’s method. Biometrika 1972; 59: 335–347.

9. Fay, Herriot RA. Estimates of income for small places: an application of James-Stein procedures to census data. J Amer Stat Assoc 1979; 74: 269–277.

10. Ghosh. Hierarchical and empirical Bayes multivariate estimation. In: Ghosh and Pathak (eds) Current issues in statistical inference: Essays in honor of D. Basu, IMS Lecture Notes - Monograph Series, Volume 17. Beachwood, OH: Institute of Mathematical Statistics, 1992.

11. Jiang, Nguyen, Rao JS. Best predictive small area estimation. J Amer Stat Assoc 2011; 106: 732–745.

12. Lahiri, Rao. Robust estimation of mean squared error of small area estimators. J Amer Stat Assoc 1995; 90: 758–766.

13. Liu, Liu, Pan, Jiang, Xiao. An empirical comparison of various MSPE estimators and associated prediction intervals for small area means. J Stat Comput Simulat 2022; 93: 1–27.

14. Petrucci, Salvati. Small area estimation for spatial correlation in watershed erosion assessment. J Agri Bio Env Stat 2006; 11: 169–182.

15. Prasad NGN, Rao JNK. The estimation of the mean squared error of small-area estimators. J Amer Stat Assoc 1990; 85: 163–171.

16. Pratesi, Petrucci, Salvati. Spatial disaggregation and small-area estimation methods for agricultural surveys: solutions and perspectives. 2015: 1–160.

17. Pratesi, Salvati. Small area estimation: the EBLUP estimator based on spatially correlated random area effects. Stat Meth App 2008; 17: 113–141.

18. Rao JNK, Molina. Small area estimation. Hoboken, NJ: John Wiley & Sons, Inc, 2015.

19. Saei, Chambers. Out of sample estimation for small areas using area level data. 2005. Url: https://eprints.soton.ac.uk/14327/1/14327-01.pdf.

20. Singh, Shukla, Kundu. Spatio-temporal models in small area estimation. Surv Meth 2005; 31: 183.

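21. Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1. Berkeley: University of California Press, 1956: 197–206.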

22. Vogt. Bayesian spatial modeling: propriety and applications to small area estimation with focus on the German census 2011. PhD thesis, Universität Trier, Trier, 2010.

23. Vogt, Lahiri, Munnich. Spatial prediction in small area estimation. Stat Trans New Series 2023; 24.

Table 1.

Mean squared prediction error (MSPE) based on Bayesian and OBP methods and average posterior standard deviation (APSD) of spatial models over the independent FH model.
