Sage Journals: Discover world-class research

Abstract

The Director of the US Census Bureau every five years, under the Voting Rights Act (VRA) Section 203(b), determines which states and political subdivisions must provide ballot assistance in languages other than English. These rule-based determinations use small-area estimates of the proportions of voting-age citizen members of racial/ethnic Language Minority Groups (LMGs) who are limited Englishproficient (LEP) and of LEP LMG persons who are also illiterate. This large-scale Small Area Estimation (SAE) effort by the US Census Bureau is based on American Community Survey five-year data. This paper focuses on the unique attributes of this SAE problem, including the predominance in each LMG of tiny domains along with relatively few large domains that account for the bulk of LMG citizens. The data and small area models are treated separately for distinct LMGs, and the spirit of the law requires that the domain estimates should be produced in a similar way across LMGs for each type of geography (Jurisdictions, which are mostly counties, plus two types of American Indian and Alaskan Native areas). The paper describes and assesses the SAE models developed under unique constraints, including novel methodological aspects and the use of hybrid frequentist and Bayesian computational techniques; situates these modelling choices in the broader literature on SAE; and discusses the strengths and weaknesses of the resulting models.

Keywords

Small area estimation mixed-effect multinomial regression multinomial logit-normal model mean squared prediction error model diagnostics

1. Introduction

According to Section 203(b) of the U.S. Voting Rights Act (VRA) of 1965 beginning in 1975, states and political subdivisions must in certain circumstances make voting materials available in languages other than English. Section 203(b) prescribes that the Director of the US Census Bureau shall make these coverage determinations every five years, based on the most current population estimates derived from the American Community Survey (ACS) five-year data along with relevant census data. These determinations are based on estimates of voting-age citizen subgroups defined by race and language proficiency in States and local geographic areas called Jurisdictions, American Indian Areas (AIAs), and Alaskan Native Regional Corporations (ANRCs). Jurisdictions are Counties in all but eight states and Minor Civil Divisions in the other states.

The 2021 determinations (Census RVRDO)^[6] were based on the 2015-2019 five-year ACS data. These data contain at least one voting-age respondent in 7,859 Jurisdictions, 568 AIAs, and all 12 ANRCs. The racial and ethnic self-identifications of census or ACS respondents in these geographies are used to define 73 Language Minority Groups’ (LMGs). Of the LMGs eligible in 2021, 21 are Asian, 51 are American Indian or Alaska Native (AIAN) and 1 is Hispanic.

Within the LMGs, Section 203(b) categorizes voting-age persons according to responses to ACS questions regarding citizenship, limited English proficiency and illiteracy. Limited English-proficiency is defined as speaking a language other than English at home and speaking English less than very well’. Illiteracy is defined as having less than a fifth grade education. Throughout this paper, we refer to the relevant nested population subgroups by the following abbreviations:

VOT: Voting-age persons;

CIT: Voting-age citizens;

LEP: Limited English-proficient, voting-age citizens; and

ILL: Illiterate, limited English-proficient, voting-age citizens.

VRA Section 203(b) coverage determinations for states and political subdivisions depend according to specific rules on estimates and ratios of estimates of the CIT population within geographic areas and of subpopulations LEP and ILL within geographic areas intersected with LMGs (Census RVRDO)^[6]. The rules for the determinations are specified in the data-release documents and Technical Report Slud et al.^[32], Appendices A-B. The estimates underlying determinations for each LMG must depend on data for only that LMG, but the techniques for producing the estimates should be similar across LMGs.

In this Introduction, we explain the rationale and properties of model-based small area estimates of the various population counts needed for VRA determinations. To fix terminology and notations, we restrict attention only to the Jurisdiction geography-type.

1.1. VRA Section 203 Determinations as a Statistical Problem

Within each LMG (labelled g) and geographic unit (jurisdiction, labelled j), the primary ACS data consist of a sample of $n_{j g}^{(a)}$ observations from the ACS in the four outcome levels a = 0,1,2,3, respectively, corresponding to VOT, CIT, LEP, and ILL. Using the same notation for outcomes a, the statistical goal is to estimate the true population counts $N_{j g}^{(a)}$ in terms of the direct survey-weighted estimates ${\hat{N}}_{j g}^{(a)}$ for each (Geo, LMG) domain (j,g); in addition, the true CIT population $N_{j +}^{(1)}$ in jurisdiction j is estimated by the direct survey-weighted estimate ${\hat{N}}_{j +}^{(1)}$ . If the direct estimators ${\hat{N}}_{j g}^{(a)}$ were sufficiently accurate, they would be the design-based estimators of choice. However, many of them are based on very small sample sizes and yield estimates with unacceptably large standard errors (SEs).

To mitigate this small-domain imprecision, starting in 2011 (Joyce et al.)^{[14, 15]} and also in 2016 (Slud et al.)^[31] and 2021 (Slud et al.)^[32], the Census Bureau embarked on a research and development effort to produce domain population estimates and ratios supporting the VRA Section 203 determinations via model-based analyses of ACS - and, if possible, decennial census - data. This research on improving the estimates supporting the Section 203(b) determinations has been conducted within a framework of model-based Small Area Estimation (SAE). The main idea of SAE is that many small (GLMG) domains within the same LMG may behave similarly with respect to domain proportions of CIT within the VOT population, of LEP within CIT and of ILL within the LEP population. This similarity may manifest itself within each LMG in the form of shared relationships and model parameters between the outcome proportions, observable domain-specific covariates, and similarly distributed random effects distinguishing jurisdictions. In the 2016 formulation the random probabilities $N_{j g}^{(a + 1)} / N_{j g}^{(0)} \equiv ω_{j}^{(a)}$ for outcome a = 0,1,2 in (Geo, LMG) domain (j,g) were modelled using the Dirichlet distribution with regression-based parameters as in (2.4) below. In 2021, a Dirichlet multinomial model was again considered, but the production of official estimates was done using a multinomial logit-normal regression model (2.5).

Separate models were estimated for each LMG across jurisdictions, and also for each AIAN LMG across AIAs and again across ANRCs. Guided by the principle of analysing each combination of LMG and geography type separately, since 2016 each model has been fitted from data on all voting-age persons within the LMG regardless of their membership in other LMGs. The only way in which models fitted to different LMGs influence one another is that the subset of fixed-effect covariates used for an LMG in each geography type was chosen from a list of possible predictors according to a grouping of LMGs with similar numbers of (Geo, LMG) domains containing ACS samples.

As in most SAE projects, an important first goal was to document the improvement in accuracy of the resulting model-based estimates over the estimates that would have arisen through purely survey-weighted estimates (called design-based’ or direct’ estimates) that do not rely on models of relationships between outcome variables or between outcomes and auxiliary data such as administrative records. In the 2011 and 2016 cycles of the VRA project, this was done exclusively by displaying the extent to which variances (or Mean-Square Prediction Errors in the empirical Bayes models) are smaller, domain by domain, for the model-based estimates versus the direct estimates. That was neither true in all (Geo, LMG) domains, nor was it expected to be, but it was true for most domains. Similarly, analysis in the 2021 cycle Slud et al.^[32] demonstrated the same kind of systematic but not universal variance improvement for the model-based estimates over their direct survey-weighted analogues. But the 2021 analysis had the further goals of examining model adequacy and of comparing the quality of estimates across different (Bayesian versus frequentist) estimation techniques with the aid of diagnostics. In the absence of ground truth, it has been necessary to rely on diagnostics comparing aggregated estimates versus aggregated direct estimates, since the direct estimates in a high-quality, high response-rate survey like the ACS are nearly unbiased, and these estimates are therefore reliable on domains with large sample. In a limited way, model adequacy was investigated in 2016 (Ashmead and Slud)^[2] using model-based reference distributions determined by parametric bootstrap in empirical Bayes SAE.

Thus, the goals of the SAE analysis in the VRA Section 203(b) project are improved accuracy over direct estimates and careful model assessment, and the goals of this paper are to display both with emphasis on SAE methodological validity and novelty. The paper addresses the question of whether and in what way the covariates were useful across a range of LMGs of radically different sizes. Similarly, we study whether and how the models with the most detailed random effects parameterizations can be documented to outperform those with simpler, independent random effects for the three successive outcomes CIT within VOT, LEP within CIT, and ILL within LEP.

1.2. Special Features of SAE in the VRA Section 203(b) Project

The SAE problem described here has several distinctive features that partially dictate the models and methods used. There were 7,859 jurisdictions with an ACS sample in 2015-2019, further broken down by LMG, and the great majority of (Geo, LMG) domains were very small. Partly for that reason, the available predictive covariates were very weak, as summarized in Section 2.3 and discussed fully in the Technical Report Slud et al.^[32] (Section 2.3).

The number of (Geo, LMG) domains with a voting-age sample varies widely across LMGs. There are 76 LMGs, after breaking the Hispanic LMG into four (numbered 73-76) by Census Region. Most LMGs have fewer than 100 jurisdictions with a voting-age sample size >10 but the largest 10 LMGs have hundreds of jurisdictions with a sample > 100. The numbers of jurisdictions with the same minimum sample sizes of LEP persons was much smaller. The largest LMGs have hundreds of jurisdictions with LEP samples > 20, but the smallest 30 LMGs have fewer than 100 jurisdictions with LEP sample size > 3. The tremendous imbalance between the numbers of very low-sample (Geo, LMG) domains and of moderate-to-large ones plays a dominating role in the kinds of models that can be fitted and in their diagnostic model assessments. Supplement Figure S1(a)-(b) show for LMGs ordered by decreasing population the log number of jurisdictions with at least k sampled persons, respectively for voting-age and LEP persons, for several values of k.

Some of the special features that make the SAE problem in VRA Section 203(b) unique are:

(i)
Nested decreasing outcome categories CIT, LEP, ILL within VOT;
(ii)
A large number of parallel SAE analyses for domains within LMGs (73 LMGs for all jurisdictions, 51 AIAN LMGs for AIAs and for ANRCs) ranging from very data-rich to data-sparse: Analyses should be similar with respect to model type across (large groups of) LMGs, but data analyses in distinct LMGs must be separate;
(iii)
Great predominance in each LMG of geographic units with sample size < 5, among which many counts of 0 LEP and ILL outcomes are observed;
(iv)
Domain-specific covariates are extremely weak, derived from higher-level survey estimates.

Many SAE problems have feature (i), and some with feature (iv) produce estimates primarily exhibiting empirical Bayes shrinkage. Large statistical agencies often see small-domain datasets with features (ii) (iii), although those datasets seldom lead to useful small area estimates unless, as in the Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) programme, the predictive covariates are very strong (Maples)^[21].

This paper outlines models and analytical tools motivated by features (i)-(iv). It introduces novel methodological wrinkles combining frequentist and Bayesian techniques and demonstrates that the resulting SAE estimates are sensible and clearly superior to the direct survey-weighted estimates. The remainder of the paper is organized as follows. The next section describes SAE methodology leading up to the MLN models used in production, along with an SAE literature review placing those models in context and a description of predictive covariates. Section 3 contains several comparisons: (1) SEs for SAE model estimates versus directed survey-weight estimates, (2) predictions obtained by three different SAE calculations within related MLN models, (3) aggregated predictions from MLN models versus the unbiased aggregated direct survey estimates and (4) predictions and run-times for Bayesian versus frequentist estimates within the same MLN model. Then a concluding section discusses our findings, limitations of the SAE models used to produce estimates, and new directions for methodological research in this SAE problem.
2. Methodology

Section 203(b) of the VRA requires that a geographic unit (jurisdiction, AIA or ANRC) provide ballot assistance in a language other than English in part as a function of the estimated proportions of LMG LEP citizens among the citizens of the geographic unit, or of LMG ILL citizens among the LMG LEP citizens of the geographic unit. There is another way to trigger’ ballot language assistance, in jurisdictions with more than 10,000 LEP citizens, but such jurisdictions have a large enough sample that direct estimates are sufficiently accurate to estimate domain counts. The focus in VRA on proportions of LMG citizens in the successive nested categories CIT, LEP, and ILL motivates a nested, hierarchical structure for all SAE model classes considered. That is, any joint SAE model used to describe the data should incorporate an interpretable covariate-based random-effect model for the (Geo, LMG) count of CIT outcomes within the VOT (Geo, LMG) domain, and similarly should incorporate models for the LEP outcome within the CIT (Geo, LMG) domain and for the ILL outcome within the LEP (Geo, LMG) domain. These conditional models represent counts of outcome a + 1 within the (Geo, LMG) domains of outcome a, for a = 0,1,2. Neither the domain (j, g) VOT counts $N_{j g}^{(0)}$ nor the proportions $N_{j g}^{(0)} / N_{j +}^{(0)}$ were modelled as functions of j or g in the 2011 through 2021 VRA cycles: these counts $N_{j g}^{(0)}, N_{j +}^{(0)}$ have always been estimated directly by the survey-weighted ACS estimates ${\hat{N}}_{j g}^{(0)}, {\hat{N}}_{j +}^{(0)}$ . The simplest regression models for counts in domains of specified sample size are multinomial random-effect regression models, and we describe two main classes of such models that we chose between: logistic regression and beta-binomial. Considered as multi-category random-effects models, these are usually called MLN and Dirichlet multinomial models. The domain-level random effects induce domain-level dependence between characteristics of sampled individuals, and the fixed-effect specification is chosen as in ordinary (single-outcome) logistic regression.

2.1. SAE Literature Review

SAE aims to improve domain predictions relative to direct survey estimates by borrowing strength from similar domains and auxiliary data through shared model parameters (Rao and Molina)^[28]. The specific interest here is in SAE models, such as multinomial logistic models, that can handle categorical outcomes. The general reference Agresti^[1] (Section 13.2-13.4), not focused on SAE, treats logistic-normal and other mixed-effect multinomial regression models, while the paper Jiang and Lahiri^[13] (Section 4) covers multinomial regression models in SAE from the general viewpoint of mixed-effect generalized linear models. The Dirichlet multinomial model, mentioned in textbooks as generalizations of the BetaBinomial (Carlin and Louis^[5]; Agresti^[1]), has also been used in SAE for categorical responses in Census Bureau research (Slud et al.^[31]; Maples^[21]).

Other authors have used MLN models for SAE. A seminal early work Ghosh et al.^[12] treated hierarchical generalized linear mixed-effect models for SAE with unit-level data. SAE papers studying area-level MLN models include Molina et al.^[24] López-Vizcaíno et al.^[18]; López-Vizcaíno et al.^[19]; the first of these papers applied an MLN model with one shared random effect across employment categories, and the third also included the temporal structure of the random effects. The latter two papers include distinct random effects across multinomial categories, but their assumption that random effects are independent within each area would be unrealistic in many settings. Some articles (Saei and Taylor^[29]; Scealy^[30]) used MLN with dependent random effects, but their models handled only three categories, and the extension to more categories is not straightforward (Dawber et al)^[8].

The paper Parker et al.^[25] provides a different Bayesian implementation of unit-level models, using variational Bayes computations. It also incorporates pseudo-likelihood adjustments to account for possible informative sampling, which is important for unit-level models. Despite achieving computational efficiency for a unit-level model, this approach might be computationally expensive relative to area-level SAE. Unit- and area-level models each have advantages and disadvantages (Franco and Maitra)^[10].

The MLN regression parameterization used here for SAE in the VRA application (Slud et al.)^[32] is particularly suited to the nested outcomes CIT, LEP and ILL but differs from other parameterizations in the SAE literature. This is the continuation-ratio logit’ (Agresti) ^[1] (Section 8.3.6) that is most often applied to longitudinal or survival data and was introduced into mixed-effect MLNs by Coull and Agresti ^[7]. This parameterization has been used in a Dirichlet multinomial SAE model (Slud et al.)^[31], both in the usual version and in a variant with a dispersion parameter varying linearly with the square root of domain sample size. Within the general continuation-ratio MLN models, the multi-category Gaussian random effects can be dependent. A published data illustration (Coull and Agresti)^[7] performed likelihood ratio tests (LRTs) for various submodels but not for the hypothesis of independent random effects; however, our experience (Slud et al.)^[32] (Section 3.5) suggests that in SAE applications with multiple nested categories, the MLN model with independent random effects across categories may often be adequately general.

SAE literature contains a few other proposals for handling categorical outcome data. As an extension of the work (Liu et al.^[17]), the paper (Wang et al.^[35]) assumes the standard design-based estimators of proportions follow a generalized Dirichlet distribution. However, continuous distributions are not always appropriate for discrete quantities that have a large number of zeros and may cause problems when implementing MCMC. A completely different unit-level proposal was based on expectile regression (Dawber et al.^[8]: although computationally efficient when there are many categories, its estimates have not been shown to be design-consistent, and it does not account for informative sampling.

2.1.1. Computational Issues

There is a large literature on statistical computing of the Bayesian MCMC and frequentist MLE estimates discussed in this paper. On the Bayesian side, our general references are the Stan Manual and background documentation (Stan Development Team^[33]; Gelman et al.^[11]) and R interface (Stan Development Team^[34]; R Core Team^[27]). The Stan references include both a Variational Bayes (experimental) sampler that was used to initialize the computations and the standard HMC-NUTS sampler. A Stan implementation of a unit-level multinomial logistic regression similar to our area-level approach can be found in the ecology literature (Koster and McElreath)^[16]. This model is also covered in the monograph McElreath^[23]. In our Bayesian MLN implementation, the regression parameterization of the model was modified to the continuation-ratio logit’.

During much of the period during which frequentist MLN models were explored in the SAE literature, the computation of mixed-effect MLN regression models was considered difficult (Jiang and Lahiri)^[13] (Section 4.3.1). That is no longer true, due to Adaptive Gaussian Quadrature (AGQ) algorithms (Pinheiro and Bates)^[26], which are implemented on many statistical computing platforms including R. The frequentist MLE computations for this paper are fully described in Slud et al.^[32] (Section E.1) and were implemented in customized programs in R.

2.2. Model Description

The VRA problem setting is expressed in terms of a succession of proportions CIT/VOT, LEP/CIT and ILL/LEP of the populations in (Geo, LMG) domains. In the notation of the Introduction, these are proportions $N_{j g}^{(a + 1)} / N_{j g}^{(a)}$ of counts of outcome a + 1 within the (Geo, LMG) domain defined by outcome a, for a = 0,1,2, where indices 0,1,2,3 correspond to VOT, CIT, LEP and ILL. To reflect this nesting, and to avoid having to bring sample sizes $n_{j g}^{(a)}$ into the regression, the fixed-effect regression specification of all models considered is expressed through the rate of occurrence for outcome a + 1 among outcome a individuals in (Geo, LMG) domain (j,g), denoted for a = 0,1,2 by

μ_{a, j g} = exp (η_{a, j g}) / \{1 + exp (η_{a, j g})\} \equiv h (η_{a, j g}), η_{a, j g} = β_{g}^{{(a)}^{'}} {\underline{X}}_{j g}

(2.1)

where ${\underline{X}}_{j g}$ is a domain-specific vector of observed covariates. To simplify writing later, $h (x) = e^{x} / (1 + e^{x})$ denotes the usual logistic distribution function, and all LMG subscripts g are suppressed from now on, because model-based analysis is done only within separate LMGs. Coefficients $β^{(a)}$ are the same in all geographic units j within the same fixed LMG but may contain structural 0’s, and the rates $μ_{a, j}$ . are determined for all j from fixed-effect regression scores $η_{a, j}$ . We view $μ_{a, j}$ as a location parameter for the random unknown whole-domain ratio $ω_{j}^{(a)} = N_{j g}^{(a + 1)} / N_{j g}^{(a)}$ , with random rates $ω_{j}^{(a)}$ assumed independent across indices j.

The direct (i.e., survey-weighted) estimated counts ${\hat{N}}_{j}^{(a + 1)}$ are the basic data from ACS in the VRA analysis. The direct-estimated VOT count ${\hat{N}}_{j}^{(0)}$ in the jurisdiction is treated as fixed, along with the number $n_{j}^{(0)}$ of sampled VOT individuals in domain (j, g), so that ${\hat{N}}_{j}^{(a + 1)} / {\hat{N}}_{j}^{(0)}$ is the direct ratio estimate of $N_{j}^{(a + 1)} / N_{j}^{(0)} = \prod_{k = 0}^{a} ω_{j}^{(k)}$ . The effective observed $a + 1$ outcome counts in domain (j, g) on the sampled VOT individuals are the sample VOT counts scaled by the estimated proportion of outcome a + 1 within VOT, so the response data $Y_{a + 1, j}$ at all outcome levels a + 1 in domain (j, g) are defined by

Y_{k, j} = n_{j}^{(0)} \cdot \{{\hat{N}}_{j}^{(k)} / {\hat{N}}_{j}^{(0)}\}, k = 0, 1, 2, 3

(2.2)

These response variables are in general not integers, although their successive differences are modelled below as multinomial random variables, with parameters estimated using likelihood expressions which make mathematical sense but are not actually likelihoods for non-integer responses.

We next present the models used and considered in past and present cycles of SAE in the VRA context. First express the response data for mutually exclusive rather than nested outcome categories (VOT not CIT, CIT not LEP, LEP not ILL, and ILL) in the form

{\underline{W}}_{j} \equiv (W_{k, j}, k = 0, 1, 2, 3) = (Y_{0, j} - Y_{1, j}, Y_{1, j} - Y_{2, j}, Y_{2, j} - Y_{3, j}, Y_{3, j})

The multinomial regression model has the form

{\underline{W}}_{j} \sim Multinom (n_{j}^{(0)}, {\underline{π}}_{j}) given {\underline{π}}_{j} = (π_{j}^{(a)}, a = 0, 1, 2, 3)

(2.3)

where ${\underline{π}}_{j}$ is a vector of random probabilities with entries $π_{j}^{(a)}$ defined in terms of fixed-effects $η_{a, j}$ and of random effects entering differently in different models.

The Dirichlet multinomial model used in the VRA project in 2016, similar in spirit but not in detail to the one used in 2011, was (2.3) with ${\underline{π}}_{j}$ defined by

{\underline{π}}_{j} = (π_{j}^{(a)}, a = 0, 1, 2, 3) \sim Dirichlet (τ \cdot {\underline{v}}_{j})

(2.4)

where the fixed-effect rates ${\underline{v}}_{j} = (v_{0, j}, v_{1, j}, v_{2, j}, v_{3, j})$ are defined from (2.1) as

{\underline{v}}_{j} = (1 - μ_{j}^{(0)}, μ_{j}^{(0)} \cdot (1 - μ_{j}^{(1)}), μ_{j}^{(0)} μ_{j}^{(1)} \cdot (1 - μ_{j}^{(2)}), μ_{j}^{(0)} μ_{j}^{(1)} μ_{j}^{(2)})

and the unknown parameters are $(τ, (β^{(k)}, k = 0, 1, 2))$ . The random-effect structure of this model is standard (Carlin and Louis)^[5], but the fixed-effect parameterization in terms of (2.1) is not. As used in 2016 (Slud et al.)^[31], the model replaced $τ$ in each domain j by a term $τ + τ_{*} \cdot {(n_{j}^{(0)})}^{1 / 2}$ containing an additional parameter and a dependence on VOT sample size.

The MLN model with similar fixed-effect parameterization takes the form (2.3) with

π_{j}^{(a)} = (1 - h (η_{a, j} + u_{a, j})) \cdot \prod_{k = 0}^{a - 1} h (η_{k, j} + u_{k, j}), a = 0, 1, 2

(2.5)

where the random effect triples $(u_{0, j}, u_{1, j}, u_{2, j})$ are jointly normal $N (\underline{0}, Σ)$ for each j and independent across distinct j, and the empty product $\prod_{k = 0}^{- 1} \{\cdot\}$ is taken to be equal to 1. The fixed-effect regression terms here are called a continuation-ratio parameterization by Coull and Agresti^[7], who were the first to use it in a mixed-effect multinomial regression model. The unknown parameters in this (MLN-F) model are $(β^{(0)}, β^{(1)}, β^{(2)}, Σ)$ , where $Σ$ is a general 3 × 3 covariance matrix parameter of dimension 6. Model (2.5) can readily be shown to be equivalent to the three successive conditional models for a = 0,1,2:

Y_{a + 1, j} \sim Binom (Y_{a, j}, h (η_{a, j} + u_{a, j})) conditionally given (Y_{k, j}, u_{k j} : k \leq a)

(2.6)

The Appendix summarizes the different mathematical notations and assumptions used in these random-effect models and relates them to the random true-population ratios $ω_{j}^{(a)}$ .

If in (2.3) and (2.5) or in (2.6) the random effects $u_{k, j}$ are assumed independent across $k$ , that is, if $Σ$ is assumed diagonal with diagonal entries $(σ_{a}^{2}, a = 0, 1, 2)$ , then the conditional models simplify. The combined model in this case is called MLN-D from now on and has three fewer parameters than MLN-F. In MLN-D, the unknown parameters decouple into three non-overlapping vectors of parameters $(β_{j}^{(a)}, σ_{a}^{2})$ for a = 0,1,2 that can be estimated separately by maximizing the conditional likelihoods for data ${\{Y_{a + 1, j}\}}_{j}$ given ${\{(Y_{k, j} : k \leq a)\}}_{j}$ .

Model (2.3)-(2.4) can also be presented as successive conditional models. Using well-known properties of the Dirichlet distribution it is shown to be precisely equivalent to the models

\begin{array}{l} Y_{a + 1, j} \sim Binom (Y_{a, j}, α_{a, j}) conditionally given \{Y_{k, j} : k \leq a\} \\ α_{a, j} \sim Beta (τ_{a} \prod_{k = 0}^{a} h (η_{k, j}), τ_{a} (1 - h (η_{a, j})) \cdot \prod_{k = 0}^{a - 1} h (η_{k, j})) \end{array}

(2.7)

in the special case where $τ_{0} = τ_{1} = τ_{2} \equiv τ$ . On the other hand, by allowing the unknown $τ_{k}$ parameters to be distinct in (2.7), we enrich the Dirichlet multinomial model while still maintaining the appealing nested hierarchical structure and the same parameter dimension as MLN-D. This expanded model is no longer Dirichlet multinomial. In fact, it resembles the conditional beta-binomial model used in VRA analysis in 2011 (Joyce et al.)^[15], except that the covariates entered there not through regression but through a grouping of data across domains within LMGs.

In the process of model-building in Slud et al.^[32] (Section 3.2), predictions based on maximum likelihood fits of parameters in the Dirichlet multinomial model (2.7) were compared with those in MLN-D. The MLN-D model was found to perform better with respect to simple diagnostics, and further analyses and production of estimates for the 2021 VRA determinations were done only with the MLN-D and MLN-F models.

2.3. Covariates

Covariates from administrative records were not available when the model was developed. Instead, covariates from ACS were computed from the same ACS 2015-2019 dataset as the domain response variables, but at higher levels of aggregation. This was done to mitigate problems of measurement error, which can cause suboptimal predictions and misspecification of uncertainty measures (Ybarra and Lohr ^[36]; Bell et al.^[4]) In addition, the sample overlap between the response and the covariate at a higher level of aggregation was removed. Overlap in the sample induces dependence between the sampling and model errors which contradicts the conditional independence assumptions of MLN and other area-level models.

For the 2021 determinations based on the ACS data, one set of covariates was defined as the higher-geography-level LMG proportion of CIT within VOT, LEP within CIT, and ILL within LEP, in the portion of the State complementary to a Jurisdiction it contains, or the portion of the whole AIAN LMG complementary to an AIA. All other covariates were defined at the level of the geographic unit, without regard to LMG. One such covariate OTHLANG, in all geography types, was the proportion of VOT persons speaking a language other than English in the home; another was EDU2, the proportion of VOT persons with less than a college education. Covariates used in Asian and Hispanic LMGs include the proportion FRNBORN of Foreign-born, as well as the average years in the United States for the foreign-born, and the proportions in coarse age-groups. Covariates used in various AIAN LMGs include the proportions of high-school graduates, of white non-hispanic people, and of people in poverty. More information about the covariates can be found in the Technical Report Slud et al.^[32] (Section 2.2).

3. Results on Performance of Modeled Estimates

Whether counts are estimated by direct or model-based methods, the uncertainty in SAE can be quantified by the conditional or unconditional expected square discrepancy (MSPE) between the estimate and the target. Subtle differences between these notions arise depending on whether we regard the target totals as being pure counts or stochastic, modelled quantities. See the Supplement, Section S1, for the connection between model random effects and true domain populations.

The estimation and uncertainty measurement arises for the survey-weighted direct estimators ${\hat{N}}_{j}^{(a)}$ . These design-based estimators are unbiased for their nonstochastic target counts $N_{j}^{(a)}$ , and the variances and MSPEs are identical. However, when $N_{j}^{(a)} = N_{j}^{(0)} \cdot ω_{j}^{(a)}$ are viewed as random, as in a mixed-effects model, the MSPE $E {({\hat{N}}_{j}^{(a)} - N_{j}^{(a)})}^{2}$ is not the variance. In this case, the successive difference replication (SDR) estimator is approximately unbiased for the MSPE, not the variance, and that is the reported measure of variability.

When assessing model-based SAE estimation or prediction errors, the targets are random, and there is a distinction between the unconditional and conditional expectation of ${({\hat{N}}_{j}^{(a)} - N_{j}^{(a)})}^{2}$ . The unconditional MSPE is generally what one estimates in frequentist SAE (Rao and Molina)^[28], while the Bayesian estimates the posterior variance conditional on data.

Throughout this paper, variances’ or MSPEs for direct or model-based frequentist estimates are reported as unconditional MSPEs derived from SDR, while conditional MSPEs for Bayesian estimates are reported as MCMC-based posterior variance estimates. The detailed discussion of the differences between these variance and MSPE concepts is given in Section D.1 of the Technical Report Slud et al.^[32].

The (Geo, LMG) estimates for 2021 determinations in jurisdictions were produced in two different ways. For the 24 largest LMGs, those for which there were at least 700 jurisdictions with any ACS sample, the parameter estimates, domain predictions and posterior variances were computed under model MLN $F$ via Bayesian MCMC. The Bayesian estimators were implemented for these LMGs in R-Stan (Stan Development Team)^[34] with 4 parallel chains, using 1000 burn-in iterations for each and a further 1000 iterations which provided 4000 draws (pooled across the 4 chains) for purposes of convergence assessment, parameter estimation and prediction. Details of the Stan MLN-F parameter estimates and their convergence diagnostics (Development Team)^[33] for three LMGs $(4, 8, 10)$ are provided, with brief explanations, in the Supplement, Section S2.

For the LMGs other than the largest 24, parameters were computed via frequentist MLEs, and domain predictions and SDR based mean-squared prediction error (MSPE) estimates were derived from those. For purposes of comparison, in nine large LMGs for which the production estimates were Bayesian, parameter estimates and domain predictions and MSPEs were re-computed using frequentist MLEs and prediction formulas under both models MLN-D and MLN-F.

The subsections that follow explore data-analytically different aspects of the direct and MLN model-based predictions. Section 3.1 focuses on SEs of direct versus MLN production estimates in jurisdictions, across all LMGs. The remaining subsections restrict attention to nine LMGs (eight Asian and NE regional Hispanic) for which direct, frequentist MLN-D and MLN-F, and Bayesian MLN-F estimates are all available. Section 3.2 explores MLN-D versus MLN-F predictions, whether those predictions came from Bayes or frequentist analysis. Section 3.3 uses diagnostics to assess the quality of fitted models, and Section 3.4 compares Bayes versus frequentist predictions, SEs, and computation times.

3.1. MSPE Comparisons

Three types of estimates and SEs were considered for $N_{j}^{(a)}$ in jurisdictions:

$(1^{\circ})$ direct survey-weighted estimates with SEs estimated by SDR; $(2^{\circ})$ Bayesian estimates from the general MLN-F model, with SEs estimated as posterior standard deviations; and $(3^{\circ})$ frequentist estimates from the MLN-D model with SEs estimated by SDR from model-based predictions. Each (Geo, LMG) domain with a VOT sample has one direct and one model-based estimate, the latter of type $(2^{\circ})$ in data-rich LMGs and of type $(3^{\circ})$ in LMGs with fewer sampled domains. This section compares the estimated coefficient of variation (CV) and SE of direct versus model-based estimates for LEP counts and ILL/LEP ratios.

We first address the frequency of improvement of variability of the estimates in single domains, by comparing direct to model-based estimates of SEs and CVs. Larger SEs or CVs imply a higher degree of uncertainty, and therefore worse estimation performance. For most comparisons, we omit estimates with direct-method SE estimates of 0. In some cases, we omit estimates based on VOT sample size < 5, both because such estimates are unreliable and because samples this small only rarely result in Section 203(b) determinations. Exhibits are created from all (Geo, LMG) domains meeting the stated criteria.

The scatterplots in Figure 1 show the relationship between direct (SDR) and model-based SEs for single (Geo, LMG) domains for the variables LEP and ILL/LEP. In each plot, SDR and model-based SEs would be equal along the dashed blue line. Points below [above] the line indicate domains for which the direct-estimated SE is larger [smaller] than the corresponding model-based estimate. Domains for which either the SDR or (frequentist) model SEs are estimated as zero are not meaningful, and have been omitted. To reduce visual clutter, and emphasize relevance to VRA determinations, the plots are restricted to domains with LEPprop > 0.05, where LEPprop $= {\hat{N}}_{j g} / {\hat{N}}_{j +}$ is defined as the direct-estimated fraction of LEP in the domain over the containing jurisdiction’s total of citizens. In both plots, the points tend to cluster on or below the dashed blue line; that is, for the great majority of domains, the model-based estimates have smaller estimated SEs than the direct estimates.

Table 1 and Supplement Table S. 1 confirm the same point, respectively for estimates of LEP and of ILL/LEP ratios, by displaying quantiles of the percent reduction of model versus direct CVs by decile of the direct estimated CV. The tables tally only domains for which both the model and SDR SEs are positive. In both tables, the median percent reduction of CV is positive (> 9%), that is, the model-based CV is typically lower than the direct CV. For LEP estimates, the median reduction is similar in most of the direct CV bins, except in the largest deciles which show the largest reduction. The range of LEP CV reductions is seen to increase with direct CV. In contrast, the median reduction of the ILL/LEP CVs increases systematically across all deciles of the direct CVs: the higher the direct CV, the larger appears the benefit of the model-based estimate.

Another advantage of model-based estimates is that direct-method CVs often cannot be calculated for domains where the sample is too small to estimate both the standard deviation and mean, while model-based CVs are estimable. Table 2 (third column) shows the numbers of (Geo, LMG) domains with estimable CVs for direct and model-based estimates of LEP and ILL/LEP and compares quantiles of the model and direct CV estimates in domains where both can be calculated. In those domains, the model-based CVs tend to be lower than the direct CVs for both LEP and ILL/LEP. The diagnostics shown above all support the conclusion that the model SE estimates are generally smaller than the direct SDR estimates of standard deviation for both LEP and ILL/LEP. The model-based Section 203(b) estimates are more accurate in that sense than the direct survey-weighted estimates.

Figure 1.

Note: Axis labels are: SDR_SD for SDR SEs and Mod_SD for SEs of model-based estimates.

Table 1.

Quantiles of percent CV reduction of model vs direct CV of LEP in (Jurisdiction, LMG) domains with VOT sample size ≥ 5 and nonzero SE, shown by decile of direct CV.

	CV Decile	# domains	Qu.05	Qu. 25	Median	Qu.75	Qu.95
1	(0.01,0.25]	1,849	-7.38	6.86	15.51	23.94	34.52
2	(0.25,0.41]	1,809	-11.69	4.68	15.82	26.22	38.61
3	(0.41,0.55]	1,826	-17.77	4.36	17.61	29.81	44.41
4	(0.55,0.68]	1,786	-19.43	3.37	17.06	29.41	45.74
5	(0.68,0.79]	1,542	-24.47	3.62	18.61	32.16	49.01
6	(0.79,0.89]	1,154	-26.25	7.06	21.50	35.62	50.26
7	(0.89,0.95]	1,104	-30.50	10.34	24.22	36.95	52.16
8	(0.95,1.01]	1,039	-32.51	9.79	26.72	38.85	54.36
9	(1.01,1.13]	1,023	-23.39	14.24	30.91	42.17	58.58
10	(1.13,11.30]	924	-6.92	29.62	44.22	57.54	74.05

Table 2.

Quantiles of CVs for LEP and ILL/LEP variables in (Jurisdiction, LMG) domains where both direct and model CVs can be estimated and are not 0.

Variable	Est. Type	# Estimable	Q5	Q25	Median	Q75	Q95
LEP	Direct	18,600	0.16	0.49	0.79	0.98	1.28
LEP	Model	78,260	0.13	0.38	0.61	0.90	1.38
ILL/LEP	Direct	5,527	0.14	0.36	0.60	0.89	1.33
ILL/LEP	Model	78,200	0.15	0.31	0.41	0.56	0.75

3.2. MLN-D versus MLN-F Comparisons of predictions

The statistical adequacy of MLN-D as a submodel of MLN-F can be tested using LRT. The parameter dimension of MLN-D is smaller by three, corresponding to the above-diagonal entries of the covariance matrix $Σ$ for random effects in MLN-F, so the LRT $α = 0.05$ cutoff is $χ_{3, 0.05}^{2} = 7.81$ . For the 9 LMGs numbered 3-10 and 73 in which MLN-D and both frequentist and Bayesian MLN-F estimates are available, the LRT statistic values from Table 6 of the Technical Report Slud et al.^[32] are respectively 7.6, 5.6, 9.6, 9.2, 8.8, 4.4, 22.6, 7.2, and 27.4. These LMGs show P-values $\geq 0.02$ for the difference between models MLN-D and MLN-F, with the exception that LMGs 9 (Korean) and 73 (Hispanic, in NE Census Region) respectively have $χ_{3}^{2}$ LRT statistics of 22.6 and 27.4. An example of a very large LMG (third largest in the national population, when the Hispanic LMG is split into four regional parts) without a significant difference between MLN-D and MLN-F fits is LMG 4 (Chinese). Its LRT statistic is 5.6, and Figure 2(a)–(c) shows that the jurisdiction-level predictions of LEP counts from the MLN-D model are distinctly different from the direct ACS estimates but largely indistinguishable between the MLN-D and MLN-F models, whether the latter are fitted by frequentist likelihood maximization or Bayesian MCMC. Figure 2 (b)-(c) shows the closeness of the model-based predictions for jurisdictions with at most 35 sampled CIT persons within LMG 4. MLN-F estimates are denoted MLN-F when frequentist, and Bayes when Bayesian. For jurisdictions with larger samples, all three types of model-based predictions have ratios over direct predictions in the range (0.97,1.01) and the ratios among the model-based predictions are in the range (0.97,1.003). A very similar pattern for LEP predictions in LMG 9 and also for ILL predictions in LMG 4 and LMG 9 is shown in Supplement Figures S2, S3 and S4. Despite the ratios of predictions from MLN-F over those from MLN-D being considerably more spread out in LMG 9 than in LMG 4, the Bayesian and frequentist estimates remain very close to one another in both LMGs.

Figure 2.

Comparison of LEP estimates in 2966 LMG4 jurisdictions with CIT sample ≤ 35. (a) Differences between MLN-D and Direct estimates. Blue line is average difference for jurisdictions with specified CIT sample sizes. (b) Ratio of MLN-F over MLN-D LEP estimates for jurisdictions, plotted by CIT sample-size. (c) Ratio of Bayes over frequentist MLN-F LEP estimates, plotted by CIT sample-size. (d) Ratio of MLN-D estimates fitted on all jurisdictions with VOT sample (minsamp = 1) vs. MLN-D estimates fitted on jurisdictions with VOT sample ≥ 5 (minsamp = 5), plotted vs. minsamp = 1 estimates.

The overall conclusions of these comparisons are that in many but not all large LMGs, the MLNF models are barely significantly different from the MLN-D models, and when the LRT statistic is significant, the MLN-F model parameter estimates and SAE predictions are essentially the same whether made by Bayesian or frequentist computations. Further comparisons between the computational burdens of the frequentist (MLN-D or MLN-F) and Bayesian (MLN-F) analyses are given in Section 3.4.

3.3. Assessments of Model Quality

Panels (a) in Figure 2 and Supplement Figure S4 point to possible flaws in the small-sample model predictions that we explore further in diagnostic tables assessing the goodness of fit of these models. Close examination of panels (a) in these figures shows a slight tendency for the average MLN-D predictions minus direct ACS weighted estimates within CIT sample size groups to be positive for sample sizes < 3 and negative for jurisdictions with CIT sample-size in the range 3-35. These differences may seem trivial for each single CIT sample size, but as Figure S.1 shows, the great majority of jurisdictions have very small sample sizes. When the MLN-D predictions are aggregated versus the direct ACS estimates for LEP or ILL over jurisdictions defined by intervals of CIT sample size, the population groups have large total sample size and the (roughly unbiased) ACS estimates are expected to be accurate, and so can be used to assess the MLN-D predictions. These aggregated comparisons are shown in Supplement Tables S2 and S3, which confirm in detail for both LMGs 4 and 9 the patterns seen for LMG 4 in Figure 2.

In LMG 4, the aggregated MLN-D estimates for LEP counts are larger than the direct estimates when CIT sample size is $\leq 2$ , smaller for sample sizes 3-50, and larger but extremely close for larger CIT sample sizes. In LMG 9, the MLN-D estimates of LEP are larger than the direct estimates in the CIT sample-size bins with sizes $\leq 4$ , smaller for sizes 5-25, slightly larger for 26-50, and very close for sizes > 500. In Supplement Table S3 comparing ILL estimates, MLN-D estimates in LMG 4 are larger than direct for sample sizes 1-2 and smaller or very close for sample sizes > 2. In LMG 9, the MLN-D ILL estimates are larger than the direct for sample-size bins up to (12, 25) and smaller or very close for the bins with larger-sample-sizes. Exactly the same pattern held (but is not included in tables) also in LMGs 3 (Cambodian) and 5 (Filipino), in which the sample-size bins with MLN-D LEP estimates cross over at sample-size 5 from being larger than direct to smaller or approximately equal. In ILL estimates for these LMGs, crossover occurs respectively at CIT sample sizes 12 and 50.

What might cause this pattern of model-based over-estimation in very small-sample jurisdictions and under-estimation in jurisdictions with intermediate sample size? The VRA models were fitted in a unified fashion on all jurisdictions, from very small to very large. The likelihood estimating equations for unknown parameters $β^{(k)}$ strike a balance between jurisdictions. Covariates play a role primarily in ensuring that in large geographic units with correspondingly large samples, the outcomes are well predicted by the fixed-effect terms of the model. On the other hand, level a + 1 outcome counts are often 0 for very small-sample geographic units. Then by definition, a model that predicts positive counts in those domains is over-estimating, and our weak covariates have very little effect in counteracting that tendency. Since the data values $Y_{a + 1, j}$ for the outcome a + 1 direct estimates are re-scaled versions of ${\hat{N}}_{j, g}^{(a + 1)}$ , the likelihood equations emphasize parameter values that balance over-estimates in small-sample domains with underestimates in larger-sample domains, and that is what we see.

An alternative estimation method considered (Slud et al.)^[32] was to fit models (MLN-D or MLN-F) using only data on jurisdictions in which the minimum VOT sample size (minsamp) was five, but then using the fitted models to obtain predicted counts on all jurisdictions. (That is how model fitting and prediction was done in 2016.) Panels (d) of Figure 2 and Supplement Figure S4 show the ratios of MLN-D estimates done both ways. (Fitting models using all jurisdictions with VOT sample is designated minsamp = 1.) The blue lines in panels (d) show the ratio of the minsamp = 5 to the minsamp = 1 MLN-D estimates according to the linear regression of the minsamp = 5 estimates on those with minsamp = 1. The clear implication is that the minsamp = 5 estimates are on average higher and actually would have been too high, across the whole range of sample sizes.

Table 3.

MLN-D Direct, and MLN-D Intercept-only estimates of LEP, aggregated over quartiles of EDU2 for sampled jurisdictions in each of LMGs 4,5 and 9, for CIT sample-size classes (50,200] (columns 3-5) and 20I+ (columns 6-8).

		CITsamp (50, 200]			CITsamp 20I+
LMG	EDU2-quartile	MLN-D	Direct	IntOnly	MLN-D	Direct	IntOnly
4	(0,0.337]	22990	22870	23100	389500	389700	389400
4	(0.337,0.424]	15560	15600	15530	264500	264600	264300
4	(0.424,0.505]	10670	10760	10570	194000	193900	193800
4	(0.505,1.01]	1074	1030	1016	3346	3343	3309
5	(0,0.364]	10620	10580	10780	178800	179200	178600
5	(0.364,0.446]	9938	10030	9916	112400	112600	112200
5	(0.446,0.525]	2798	2660	2684	26380	26390	26210
5	(0.525,1.01]	1324	1323	1223	5224	5289	5150
9	(0,0.344]	20540	20680	20410	105820	105930	105530
9	(0.344,0.429]	18980	19350	18830	101050	101170	100900
9	(0.429,0.507]	4124	4133	3984	30230	30235	30130
9	(0.507,1.01]	1306	1325	1273	0	0	0

Note: Counts are rounded to four significant figures.

Although the covariates used in the MLN-D and MLN-F models are undoubtedly weak, likelihood analysis shows them to be highly significant in large LMGs. Supplement Tables S2 and S3 show the aggregated estimates by CIT sample-size category for the intercept-only MLN-D model. These tables do not show the aggregated MLN-D model predictions with covariates to be generally closer to the aggregated direct predictions than the intercept-only MLN-D predictions.

However, direct and MLN-D and intercept-only MLN-D estimates can also be aggregated over population subgroups defined by the covariates used in the models. The single covariate appearing most prominently in the LEP and ILL models is EDU2, the jurisdiction-level fraction of voting-age persons with no college education. Indeed, EDU2 is a covariate in each of the MLN-D models fitted to LEP and ILL outcomes in LMGs 4, 5 and 9. We next aggregated the estimates over all jurisdictions falling in each of four quartile intervals for EDU2 computed for the sampled jurisdictions for each of LMGs 4, 5 and 9 (not shown). Again for these aggregates, MLN-D with covariates agrees no better with the direct estimates than MLN-D fitted intercept-only. The reason for this is that the covariates in MLN-D dramatically improve the agreement of aggregated predictions with the Direct estimates in the size-classes (50,200] and 201+ where the great bulk of the LEP population (> 90%, for all three LMGs) falls. Table 3 shows the aggregated estimates for the EDU2 quartile intervals within the CIT sample-size class (50,200] in columns 3-5 and within the CIT sample-size class (200, ∞) in columns 6-8. This comparison shows that EDU2 does clearly have the effect of bringing the aggregated estimates from the MLN-D models with covariates much closer to the aggregated direct estimates than those from the Intercept-only models. The full Table S4 with all CIT sample-size classes is Included in the Supplement.

Table 4.

Run-times (Hours) for Bayesian and frequentist MLN runs computing parameter estimates, outcome (Geo, LMG) predictions and MSPEs in Jurisdiction Geography for selected LMGs.

Method	Model	Language Minority Group Number
Method	Model	3	4	5	6	7	8	9	10	73
Bayes	MLN-F	0.25	3.50	3.20	0.12	0.21	1.60	2.00	0.20	1.6
Freq	MLN-F	0.10	0.43	0.41	0.07	0.12	0.67	0.65	0.11	0.17
Freq	MLN-D	0.02	0.14	0.12	0.01	0.01	0.12	0.18	0.02	0.04

Notes: Bayes MLN-F: average of 4 MCMC runs; MLN-D: total for 81 replicate runs; Freq MLN-F: from MLN-D initialization to convergence. LMGs 3-10 are Asian and 73 is NE Region Hispanic.

3.4. Bayes vs. Frequentist MLN-F Analyses

Both frequentist and Bayesian analyses were used in producing estimates in the VRA Section 203(b) project, due to the special nature of the data. Differences in the numbers and sizes of jurisdictions among the different LMGs resulted in a wide variety of degrees of data-richness. The models that could be fitted in the LMGs, and the associated computational burdens, were sufficiently different that a flexible approach was necessary. This section summarizes comparisons made between Bayesian MLN-F and frequentist MLN-D and MLN-F methods, on 9 of the largest LMGs, numbered 3-10 and 73.

Frequentist estimates and predictions in the simpler MLN-D model were much easier to compute numerically than in the Bayesian MLN-F, so that MLE in MLN-D was the method of choice in all LMGs for exploratory data analyses with alternative model specifications. Table 4 shows runtimes for the two types of analysis, for LMGs 3-10 and 73 (Hispanic NE Region). The respective numbers of jurisdictions with ACS 2015-2019 sample in these LMGs were 989,3309, 4004, 742, 845, 3029, 3126, 1052 and 1045. The first row of Table 4 gives runtimes in hours for the average of four MCMC chains, although this advantages the Bayes methods. The third row displays the total run-time for the 81 replicate-weight frequentist MLN-D estimation and prediction runs. The second row gives the additional times for each LMG beyond the MLN-D initializations for a single fit using the original ACS weights.

The times in Table 4 to produce estimates in the MLN-F models are definitely faster for frequentist AGQ calculations than for Bayesian MCMC. However, Bayesian MCMC runs produce posterior variances as part of the same computation. With the frequentist estimates, MSPEs from parametric predictions based on model MLEs must be produced from 81 separate model-fitting calculations using SDR weight-replicates. The time required for 81 frequentist MLN-F calculations would not have been faster than the Bayesian MCMC calculations. The Bayesian approach was ultimately employed only for the most data-rich LMGs with Jurisdiction geography. In less data-rich LMGs, it was found that the MLN-F fitted models were not significantly different from the MLN-D fits, and the tremendously greater speed of MLN-D frequentist computations makes that method preferable on all but the largest LMGs.

All but one of the frequentist MLN-F fits in Table 4 used standard R nonlinear minimization by function nlm of the AGQ numerically integrated log likelihood and analytical (numerically integrated) gradient. The exception was LMG 6, where nlm and optim did not converge and were restarted and completed via optimal-steplength gradient descent. Such unexpected non-convergence requiring re-start with modified optimization method occurred many times in our testing of programmes for MLN-F model fitting, which was another reason we performed Bayesian MLN-F analysis in large LMGs.

These purely computational considerations are relevant because the estimates from MLN-F models by Bayesian and frequentist methods were substantially the same. The estimates are not identical, typically differing only in the second or third decimal place, and in a very few cases in the first decimal place when estimated SEs are large (> 0.7). Table 5 quantifies the differences between frequentist and Bayesian MLN-F predictions and between frequentist MLN-F and MLN-D predictions, in (Geo, LMG) counts for outcomes CIT, LEP, and ILL The table quantifies the predicted differences separately for small and large counts. For jurisdictions where counts for an outcome are predicted (by frequentist MLN-D) to be ≤ 30, mean absolute differences of predicted counts are shown in the table’s first three columns. For jurisdictions with predicted outcome counts > 30, the table’s last three columns display mean absolute differences of (natural) logarithms of predicted counts. The discrepancies between MLN-F frequentist and Bayesian predictions are very small.

Table 5.

Differences between MLN-F frequentist and Bayesian predictions of outcome counts CIT, LEP, ILL by jurisdiction within selected LMGs. Columns I-3 are mean absolute differences of predictions within jurisdictions with MLN-D predicted counts ≤ 30. Columns 4-6 are mean absolute differences of log counts for jurisdictions with MLN-D predicted count > 30.

	Predicted Count ≤ 30			Predicted Count > 30
	CIT	LEP	ILL	CIT	LEP	ILL
LMG 3	0.024	0.075	0.046	0.002	0.009	0.012
LMG 4	0.022	0.032	0.020	0.002	0.006	0.014
LMG 5	0.014	0.026	0.011	0.001	0.006	0.020
LMG 6	0.028	0.098	0.059	0.002	0.012	0.016
LMG 7	0.058	0.125	0.060	0.006	0.033	0.199
LMG 8	0.020	0.026	0.025	0.002	0.010	0.047
LMG 9	0.015	0.033	0.025	0.001	0.007	0.022
LMG 10	0.029	0.056	0.060	0.003	0.007	0.015
LMG 73	0.011	0.040	0.016	0.001	0.015	0.014

Estimated variances or MSPEs of the two types of model-based predictors are found to agree closely. An interesting pattern relates the SE differences to the size of the Bayes posterior SE for estimated LEP: when the Bayesian SEs are low (≤ 20), the frequentist SDR-based estimates are lower; for intermediate Bayes SEs (20-50), frequentist SEs are slightly lower or higher equally often; for medium-high Bayes SEs (50-300), frequentist SEs tend to be higher; and for higher Bayes SEs, the frequentist SEs are very slightly larger. This pattern for LEP-count SEs persists also with scaled-down ranges for ILL-count SEs. The estimated SEs track together fairly closely in all the LMGs investigated, with the largest but still tolerable differences seen in LMG 4, which is illustrated in Figure 3 for LEP.

The striking closeness of Bayesian to frequentist estimates and measures of uncertainty, despite their different origins, agrees with theory (Slud et al.)^[32] (Section D.1). It is also practically important when considering a hybrid estimation strategy motivated by computational efficiency.

4. Discussion

This paper has presented as an SAE problem an important, legally mandated, large-scale US statistical programme. Special features of the estimation task, described in the Introduction, have not previously appeared together in the SAE literature. These features led first of all to novel model choices. The nesting of the essential outcome subpopulations CIT, LEP and ILL within voting-age (Geo, LMG) domains strongly motivated models defined in a sequential conditional-probability form in 2011 and 2016 as well as in the recent 2021 VRA Section 203(b) programme cycle. Similar considerations had motivated the introduction of the continuation-ratio random effects model in a biomedical context with nested outcomes (Coull and Agresti) $^{[7]}$ , and as far as we know our use of the MLN-F model is its first appearance in SAE journal literature. The need to run separate but similar models for many LMGs with drastically different data sizes, and the weakness of the available predictive covariates, was an obstacle to detailed variable selection for single LMGs. In data-rich LMGs, we felt driven to use multiple covariates and allow dependence of random effects in the successive conditional models (2.6). In data-sparse LMGs, few covariates and sometimes intercept-only MLN-D models were all that the data and model convergence would allow. For computational reasons, Bayesian MCMC estimation and prediction techniques in Stan (Stan Development Team) $^{[33]}$ were used in the data-rich LMGs with the MLN-F model while frequentist likelihood maximization based on AGQ (Pinheiro and Bates) $^{[26]}$ yielded estimates and predictions from MLN-D in LMGs with less data. Because frequentist MLN-F predictions were also feasible, our data problem provided in nine of the largest LMGs a test-bed for comparisons of Bayesian MLN-F versus frequentist MLN-F and MLN-D predictions and MSPE estimates.

Figure 3.

Plots of estimated SEs by Frequentist SDR versus Bayesian MCMC for jurisdiction LEP counts in LMG4 for four ranges of successively larger SE estimates. Red lines are 45 ∘.

Diagnostic assessments of the (Geo, LMG) model predictions were heavily driven by data imbalances, since the great majority of jurisdictions had tiny sample sizes while the bulk of population was concentrated in relatively few jurisdictions with large-sample sizes. Diagnostics showed that estimates from MLN-D and MLN-F successfully reduce MSPEs in the great majority of (Geo, LMG) domains; that in large LMGs the MLN models tend to over-predict LEP and ILL versus direct estimates in aggregated jurisdictions at tiny sample sizes, and then under-predict at middle sample sizes, with approximate equality at very high sample sizes; that covariates show little beneficial effect based on aggregated comparisons with direct estimates in large LMG tabulations of sample-size categories alone or covariate-quantile intervals alone; but that the covariates improve over intercept-only models in comparisons with direct estimates, when aggregated over sets of jurisdictions defined by covariate-quantile intervals and moderate-to-large (> 50) sample size.

4.1. Limitations

The models presented in this paper have some limitations that were not addressed in the VRA project reports. These are of two broad types: the weakness of the available covariates, and model inadequacies due to the great majority of (Geo, LMG) domains in every LMG having very small size while the bulk of the population falls in a small number of large-sample domains.

The covariates available for our small area models were estimated from the same survey as the dependent variables. This necessitated estimation of covariates at higher levels of geographic aggregation and the removal of the sample overlap between the covariate and the dependent variables. Both of these steps can attenuate the relationship between the covariate and the dependent variable. Further, our covariates are ACS estimates subject to sampling error, but our model specifications ignore this error and treat covariates as if known precisely. It is known that such measurement errors can lead to noisier predictions and errors in estimated MSPEs (Ybarra and Lohr^[36]; Bell et al.^[4]). This imprecision in estimates makes our predictively weak covariates even weaker. The magnitude of this error will generally be larger for smaller geographic units, which have fewer sample observations.

Our diagnostics indicate that the MLN-D and MLN-F models tend on average to overestimate LEP and ILL counts on geographies with small-sample sizes and underestimate the counts in geographies with intermediate sample sizes. This seems to be due to a combination of the weakness of covariates and the practice of fitting a single model to all geographic units for each LMG regardless of sample size.

4.2. Future Directions

Various avenues for research might improve the current estimation methodology for future VRA Section 203(b) cycles. Some are specific improvements in computing or coordination of data sources, but most will involve new methodological work.

Administrative records may be an independent source of potentially stronger predictive covariates that avoid the pitfalls discussed above regarding measurement error and sample overlapping with the response. For the VRA application, challenges to overcome in using administrative records include: geocoding, lack of the necessary race/ethnicity information to compute the LMG in many of the sources, missing data, intricacies of the administrative records themselves, as well as difficulties in finding administrative-record fields predictive of the outcomes in (Geo, LMG) domains.

Another potential source from which to borrow strength in SAE models is past ACS five year direct estimates of outcome counts. The MLN-F model (2.5) could be extended to have random effects with a time dependence similar to a different multinomial logistic model (López-Vizcaíno et al.^[19]. Such models could be generalized in a way that would also extend a recent temporal Binomial Logit-normal Model (Franco and Bell)^[9] and could be estimated via Bayesian computation.

Future research in this area could fruitfully extend MLN models to account for covariate measurement error. Bayesian analysis of such models should be feasible using Stan (Stan Development Team)^[33]. New research is also needed on ways to capture the effect of the survey design on the sample size for multinomial response data. The VRA covariates currently come from survey aggregates, and tiny geographic domains will have the most troublesome covariate errors-in-variables, diluting the effect of covariate-based prediction in those domains. This is a reason to fit models in the future with simple GVFs in which variances are scaled by sample size, as was done by Malec et al.^[20]; Joyce et al.^{[14, 15]} and in SAIPE (Bell et al.)^[3]. A proposed method of defining multinomial effective sample size (McAllister and Iannelli)^[22] was used in SAE (Maples)^[21], but a more comprehensive study is needed.

Supplemental Material

Supplemental material for this article is available online.

Supplemental Material for Small area estimates for Voting Rights Act Section 203(b) coverage determinations by Eric Slud, Carolina Franco and Adam Hall, in Calcutta Statistical Association Bulletin

Footnotes

Acknowledgements

This paper is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily of the US Census Bureau. This research was performed under DMS project number 7510051. The statistics were released under DRB Clearance number CBDRB-FY23-CSRM001-0001.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Francesca Battaglia

References

Agresti

Categorical Data Analysis . 3rd edition. John Wiley, 2013.

Ashmead

and Slud

Small area model diagnostics and validation with applications to the Voting Rights Act, Section 203. In: Proc. Survey Res. Meth. Section, Amer. Statist. Assoc., Alexandria, VA . 2017; pp. 1515–1523.

Bell

, Basel

and Maples

An overview of the U.S. Census Bureau’s Small Area Income and Poverty Estimates program: Pratesi/Analysis of Poverty Data by Small Area Estimation . Wiley. 2016; ch19, pp. 349–378.

Bell

, Chung

, Datta

and Franco

Measurement error in small area estimation: Functional versus structural versus naïve models. Survey Methodology 2019; 45: 61–80.

Carlin

and Louis

Bayesian Methods for Data Analysis . Taylor & Francis, Chapman & Hall, 2009.

Census RVRDO . Section 203 language determinations. Technical report, US Census Bureau, 2021. https://www.census.gov/programs-surveys/decennial-census/about/voting-rights/voting-rights-determination-file.html.

Coull

, and Agresti

Random effects modeling of multiple binomial responses using the multivariate binomial logit-normal distribution. Biometrics 2000; 56: 73–80.

Dawber

, Salvati

, Fabrizi

, and Tzavidis

. Expectile regression for multi-category outcomes with application to small area estimation of labour force participation. Jour Royal Statist Soc Ser A 2022; 185(S2): 590–619.

Franco

and Bell

Borrowing information over time in binomial/logit normal models for small area estimation. Statist in Transition New Ser 2015; 16(4): 563–584.

10.

Franco

and Maitra

Combining surveys in small area estimation using area-level models. Wiley Interdisciplinary Review Series 2023; 15(6): e1613.

11.

Gelman

, Lee

and Guo

Stan: A probabilistic programming language for Bayesian inference and optimization. Jour Educ & Behav Statist 2015; 40: 530–543.

12.

Ghosh

, Natarajan

, Stroud

and Carlin

Generalized linear models for small-area estimation. Jour Amer Statist Assoc 1998; 93: 273–282.

13.

Jiang

and Lahiri

Mixed model prediction and small area estimation, with discussion. TEST , 2006; 15: 1–96.

14.

Joyce

, Malec

, Little

and Gilary

Statistical modeling methodology for the Voting Rights Act Section 203 language assistance determinations. Technical report, US Census Bureau, 2012. https://www.census.gov/content/dam/Census/library/working-papers/2012/adrm/rrs2012-02.pdf.

15.

Joyce

, Malec

, Little

, Gilary

, Navarro

and Asiala

Statistical modeling methodology for the Voting Rights Act Section 203 language assistance determinations. Jour Amer Statist Assoc 2014; 109: 36–47.

16.

Koster

and McElreath

The multinomial analysis of behavior. Behav Ecol & Sociobiol 2017; 71: 138.

17.

Liu

, Lahiri

and Kalton

Hierarchical Bayes modeling of survey-weighted small area proportions. Survey Methodology 2014; 40: 1–13.

18.

López-Vizcaíno

, Lombardía

and Morales

Multinomial-based small area estimation of labour force indicators. Statist Modelling 2013; 13(2): 153–178.

19.

López-Vizcaíno

, Lombardía

and Morales

Small area estimation of labour force indicators under a multinomial model with correlated time and area effects. Jour Royal Statist Soc Ser A 2015; 178: 535–565.

20.

Malec

, Sedransk

, Moriarty

and LeClere

Small area inference for binary variables in the National Health Interview Survey. Jour Amer Statist Assoc 1997; 92: 815–827.

21.

Maples

Small area estimates of the child population and poverty in school districts using Dirichletmultinomial models. In: Proc. Survey Res. Meth. Section, Amer. Statist. Assoc., Alexandria, VA . 2019; pp. 2448–2456.

22.

McAllister

and Iannelli

Bayesian stock assessment using catch-at-age data and the sampling-importance resampling algorithm. Canad Jour Fisheries and Aquatic Sci 1997; 54(2): 284–300.

23.

McElreath

Statistician Rethinking: A Bayesian Course with Examples in R and Stan . Chapman & Hall, 2020.

24.

Molina

, Saei

and Lombardía

Small area estimates of labour force participation under a multinomial logit mixed model. Jour

Royal Statist Soc Ser A 2007; 170: 975–1000.

25.

Parker

, Holan

and Janicki

Computationally efficient Bayesian unit-level models for non-Gaussian data under informative sampling with application to estimation of health insurance coverage. Ann Appl Statist 2022 16(2): 887–904.

26.

Pinheiro

and Bates

Approximations of the loglikelihood function in the nonlinear mixed effects model. Jour Comput & Graph Statist 1995; 4: 12–35.

27.

R Core Team . R: A Language and Environment for Statistical Computing . R Foundation for Statistical Computing, Vienna, Austria, 2021. URL https://www.R-project.org/.

28.

Rao

JNK

and Molina

Small-Area Estimation . 2nd edition. John Wiley, 2015.

29.

Saei

and Taylor

Labour force status estimates under a bivariate random components model. Jour Indian Soc Agricult Statist 2012; 66(1): 187–201.

30.

Scealy

Small area estimation using a multinomial logit mixed model with category-specific random effects. Austral Bur Statist 2010; Research paper.

31.

Slud

, Ashmead

, Joyce

, and Wright

(2018) Statistical modeling methodology (2016) for Voting Rights Act, Section 203 determinations. Technical report, US Census Burea. Available online at https://www.census.gov/content/dam/Census/library/working-papers/2018/adrm/RRS2018-12.pdf

32.

Slud

, Franco

, Hall

, and Kang

(2022) Statistical modeling methodology (2021) for Voting Rights Act, Section 203 determinations. Technical report, US Census Bureau. Available online at https://www.census.gov/content/dam/Census/library/working-papers/2022/adrm/RRS2022-06.pdf

33.

Stan Development Team.

Stan modeling language users guide and reference manual, version 2.32.0, 2022. URL http://mc-stan.org/.

34.

Stan Development Team . RStan: the R interface to Stan, 2023. URL https://mc-stan.org/. R package version 2.21.8.

35.

Wang

, Berg

, Zhu

, Sun

and Demuth

Small area estimation of proportions with constraint for National Resources Inventory survey. Jour Agricult Biol & Envir Statist 2018; 23: 509–528.

36.

Ybarra

and Lohr

Small area estimation when auxiliary information is measured with error. Biometrika 2008; 95(4): 919–931.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.90 MB