Abstract

Significant enhancements have been made in small area estimation (SAE) methodology because there has been an increased demand for reliable estimates through SAE over the past decades. This article describes the advanced statistical methodology used to produce state and county model-based estimates of average scores and various proficiency levels of adults for all states and counties, using data from the first cycle of the US Program for the International Assessment of Adult Competencies and the American Community Survey. Challenges and issues are discussed especially in light of the small number of sample counties with survey data. Each major stage of the estimation process is discussed, including the approach for generating design-based survey estimates and modelled variances, identifying a set of predictor variables (available and measured consistently for all counties), the hierarchical Bayes linear threefold models, including bivariate models for proportions, and the diagnostics and evaluation.

Keywords

PIAAC, small area estimation, model-based

Introduction

To make evidence-based policies and laws relating to adult education, sound research is needed using reliable data that are most relevant to the jurisdiction, for example, the state or county in the United States. The Program for the International Assessment of Adult Competencies (PIAAC) is an international study under the leadership of the Organisation for Economic Co-operation and Development (OECD). The first cycle of PIAAC involved over 30 countries and was designed to provide national estimates of the proficiency of adult literacy, numeracy, and problem-solving skills. Because the US PIAAC sample size was too small to support state and county estimates, small area estimation (SAE) methodology was used to produce model-based estimates of average scores for literacy and numeracy, and of various proficiency levels, for adults aged 16-74 years residing in the United States in 2012-2017.

The term SAE refers to a variety of methods or statistical techniques to estimate parameters for subpopulations or smaller domains of interest. In the past decades, two major types of SAE models have been developed: area- and unit-level models. Area-level models take as input survey direct estimates and auxiliary data at the area level.¹ The unit-level models take as input survey data and auxiliary data at the unit level.² An overview of theoretical developments and a number of notable implementations have been provided.³ The review covers topics particularly important for the application to be described here. One of these topics is the possible effect of an informative sample design on estimating the SAE model.⁴,⁵ Another topic is the extension to multivariate models.⁶ The Fay-Herriot model has been extended to estimate correlated descriptive measures.⁷ The authors found that the multivariate empirical best linear unbiased predictors (EBLUPs) have a lower mean square error (MSE) than the corresponding EBLUPs from univariate models when the true generating model is multivariate. A bivariate hierarchical Bayes (HB) SAE model has been developed⁸ using sparse survey data available for fine domains defined by geography, occupations, work levels within occupations, and job characteristics from the Bureau of Labor Statistics' National Compensation Survey. This model accounts for a more general variance-covariance structure for the latent effects. The model is also used to predict employee compensation in a large number of domains without survey data.

In terms of applications to literacy proficiency estimates, state- and county-level estimates of the proportions of adults lacking basic prose literacy skills were produced by the National Center for Education Statistics (NCES) using data from the 2003 National Assessment of Adult Literacy (NAAL) and 1992 National Adult Literacy Survey (NALS),⁹ and model-based SAE methods. The estimates were constructed using an HB unmatched area-level twofold model using 2003 NAAL and auxiliary data from the 2000 census.¹⁰ The model consisted of two "unmatched" components: a normal sampling model for the county-level direct survey estimates of the percentages and a smoothing (or linking) model to link the logit of the true, underlying percentages to a set of auxiliary variables that were available, measured consistently for all counties, and related to the outcome. The model was "twofold" in that the smoothing model included random effects for the states and counties. The process was repeated using the 1992 NALS data. In addition, model-based SAE methods were applied to literacy assessment data.¹¹ The authors were faced with a continuous variable that had a large peak of zero values, which can happen in developing countries where a zero value indicates illiteracy and positive values measure the level of literacy. In the United Kingdom, a unit-level nonlinear (logistic) model was used for literacy and numeracy binomial outcomes for the 2011 Skills for Life Survey.¹² A model-dependent approach was developed using Canadian PIAAC data.¹³ The author used population parameters derived from the survey data and auxiliary information, such as the census, to produce province-level estimates of skill distribution. Netherlands' PIAAC data were used¹⁴ to produce municipality-level average literacy scores using a unit-level linear HB model, and an area-level linear model was used to model the proportion of low literates.

In our application of SAE, which was sponsored by NCES with an extensive background discussion,¹⁵ county-level HB linear threefold models (i.e., using random effects for counties, states and census divisions) are developed using PIAAC data. The models are used to predict four outcomes of interest for adult literacy and numeracy proficiencies for counties (the small area for this study) that are in the sample and counties that are out of the sample: an average score (on the PIAAC scale of 0-500), and the proportion of adults at or below Level 1 (low proficiency), at Level 2 (medium proficiency) and at or above Level 3 (high proficiency). There were challenges to overcome in the application of the SAE methodology to US PIAAC, which include the existence of small sample size within counties and a small proportion of the US counties with the sample, existence of the sampling error and imputation error, and the aggregation of county-level predictions.

A description of the sources of data is provided in the ‘Background on Ingested Data’ section, including PIAAC and data from other sources. In the ‘Description of the PIAAC Small Area Estimation Process’ section, methodology is discussed including a covariate selection process to gather strong covariates, predictions through a bivariate model and conducting extensive model diagnostics. To allow major sources of error to propagate through the modelling process into the resulting estimates, a hierarchical modelling framework is used. A threefold model brings the model-based estimates into close alignment with the design-based estimates at high levels of aggregation. Next, the results for each step in the process are presented in the ‘Results’ section. The last section contains a summary of the findings, supplemented by the discussion of the dissemination of the model-based estimates.

Background on Ingested Data: Sources and Key Issues

PIAAC is the sixth of a series of adult skills surveys, sponsored by NCES, which have been implemented in the United States. The first cycle of PIAAC included three national data collections (in 2012, 2014 and 2017). In each year, a four-stage stratified area probability sample was selected. In the first stage, primary sampling units (PSUs) were selected with probabilities proportionate to size, consisting of individual counties or groups of contiguous counties. In the second stage, secondary sampling units (SSUs) were selected, consisting of census blocks (e.g., a city block bounded by streets) or block groups (generally contains between 600 and 3,000 people). Dwelling units were selected in the third stage where a screener interview was used to identify the eligible persons. Using a computer-assisted personal interviewing system, in the fourth stage, one or more eligible persons were selected. Next, a Background Questionnaire (BQ) interview was administered, and then respondents were provided with the assessment. To increase the number of counties with PIAAC data for the SAE modelling process, the 2012, 2014 and 2017 samples were combined resulting in 12,330 respondents from 185 counties. Sample weights were created for the combined PIAAC 2012/2014/2017 sample for the purpose of survey estimates. Response rates ranged from 70% in 2012 to 56% in 2017. The US PIAAC Technical Report¹⁶ provides more sample design details.

Key Issues Inherent in the Ingested Data from PIAAC

The first key issue inherent in the PIAAC data that presents challenges from the modelling perspective is informative sampling and nonresponse. PSUs were selected with probability proportionate to size sampling, and therefore the set of states and counties with the PIAAC sample results from informative sampling, which needs to be addressed in the SAE process. Also, when nonresponse is not sufficiently explained by the weighting variables used in weighting adjustments, informative nonresponse exists. In particular, literacy-related nonresponse needs to be addressed, which is estimated to be about 5% of the population. Due to a literacy-related reason (language barrier, reading/writing barrier or mental disability), such nonrespondents cannot complete the BQ and assessment (conducted in English). Cases that received a final weight include respondents to the BQ and literacy-related nonrespondents; however, the BQ literacy-related nonrespondents did not receive a literacy score.

A second issue inherent in the PIAAC data is related to the small sample sizes within counties and a small number of counties with sample. Table 1 provides a breakdown of the county sample sizes for the combined 2012/2014/2017 sample. Of the 50 states, plus the District of Columbia, 44 have completed cases in the combined 2012/2014/2017 sample. In addition, only about 6% of the counties in the United States (185 out of 3,142) have PIAAC respondent data.

Table 1.

Number of Completed Cases per County.

Number of Completed Cases	Number of Counties
Total	185
<5	4
5-10	14
11-20	10
21-50	58
51-100	56
101+	43

A third key issue is the existence of the sampling error (due to small sample sizes within counties) and the imputation error. The imputation error exists due to the PIAAC test design, which is based on an approach most common to the major large-scale assessments. To reduce respondent burden, a subset of test items is administered such that different groups of respondents answer different sets of items. For each domain, scores are derived using item response theory (IRT) scaling. A total of 10 plausible values (PVs, or multiple imputations) help to facilitate a measure of the uncertainty of the cognitive measurement. In a population model, the PVs are drawn from a posterior distribution based on the IRT scaling of the cognitive items with a latent regression model using information from the BQ. For more details about the IRT scaling models and the population models, see chapter 17 of the OECD PIAAC Technical Report.¹⁷

Predictor Variables

Model-based SAE methods are used to produce estimates for areas where survey data are insufficient for reliable design-based estimation. The SAE models ‘borrow strength’ across related counties and from the auxiliary information to produce reliable model-based estimates. Model-based SAE estimates for counties that are not part of the national sample rely almost entirely on the model, since design-based estimates are not available for such counties. That is, the model structure and covariates play an important role in the prediction of county levels of proficiency for literacy or numeracy. Reliable data sources and variables to serve as potential covariates of proficiency levels were initially identified, and more than 70 county-level variables across five major variable types were obtained as potential predictors from eight data sources. Focus was given to variables that were found to be related to adult literacy skills in previous studies,¹⁸,¹⁹ and were available for all the counties in the nation. These variables included those related to location, education, demographic characteristics, socio-economic status and immigration status. In addition to county-level variables, a set of 24 state-level variables was selected to provide additional information. These variables included those related to socio-economic status and education such as the graduation rate.

Description of the PIAAC SAE Process

The general method steps include creating the model inputs (including the selection of covariates), developing the model, generating predictions for all counties in the nation, aggregating predictions to the state and national levels, and performing diagnostics and evaluations.

Creating the Model Inputs

For the model inputs, the methods include computing design-based estimates and variances, making adjustments using survey regression estimation (SRE) and conducting variance smoothing. Design-based estimates were computed for each of the 185 counties for the outcomes of interest. Informative nonresponse is addressed by assuming that any literacy-related nonrespondent is below Level 1 (which affects the estimation of the proportion at or below Level 1); for the estimation of averages, the first percentile of proficiency scores is imputed for literacy-related nonrespondents.

The multiple imputation variance estimate²⁰ is used to account for both the sampling error and the imputation error in the PIAAC estimates. First, the survey estimate for the mth PV for county k is computed as

{\hat{y}}_{k m} = \sum_{l = 1}^{n_{k}} w_{k l} y_{k l m} / \sum_{l = 1}^{n_{k}} w_{k l},

(1)

where $w_{k l}$ is the final PIAAC 2012/2014/2017 national weight for person $l$ in county $k$, $y_{k l m}$ is an indicator variable for the proficiency level (for proportions) or the proficiency score (for averages) and $n_{k}$ is the number of cases in county $k$. Then the county-level direct estimate $({\hat{y}}_{k})$ is calculated as ${\hat{y}}_{k} = \frac{1}{10} \sum_{m = 1}^{10} {\hat{y}}_{k m}$. The multiple imputation variance estimate is ${\hat{σ}}_{k}^{2} = {\hat{σ}}_{W k}^{2} + (\frac{11}{10}) {\hat{σ}}_{B k}^{2}$, where ${\hat{σ}}_{W k}^{2}$ is the within-imputation variance and ${\hat{σ}}_{B k}^{2}$ is the between-imputation variance. The within-imputation variance component is computed as the average of the sampling variances for the 10 PVs: ${\hat{σ}}_{W k}^{2} = (\sum_{m = 1}^{10} {\hat{σ}}_{k m}^{2}) / 10$, where ${\hat{σ}}_{k m}^{2}$ is the sampling variance of the estimated mean or proportion for the PV $m$, $({\hat{y}}_{k m})$. The between-imputation component is calculated as ${\hat{σ}}_{B k}^{2} = [\sum_{m = 1}^{10} {({\hat{y}}_{k m} - {\hat{y}}_{k})}^{2}] / 9$. Taylor series linearization²¹ (p. 234) is used to compute sampling variances $({\hat{σ}}_{k m}^{2})$, with PSUs as strata and SSUs as variance units (clusters). For 170 counties, there were at least two SSUs, and sample sizes were five or more. However, 15 counties had just 1 SSU with PIAAC data; therefore, direct variance estimates could not be computed for them.
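The combining rules above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated in the text, not the production code used for PIAAC; the function name `mi_variance` and the input values are hypothetical, and the per-PV sampling variances are assumed to have been computed elsewhere (for example, by Taylor series linearization).

```python
from statistics import mean, variance


def mi_variance(pv_estimates, pv_sampling_variances):
    """Combine per-PV estimates for one county (hypothetical helper).

    Returns (direct estimate, within variance, between variance, total variance).
    """
    m = len(pv_estimates)
    if m < 2 or m != len(pv_sampling_variances):
        raise ValueError("need matching lists with at least two plausible values")
    y_hat = mean(pv_estimates)                # average of the PV-specific estimates
    within = mean(pv_sampling_variances)      # average sampling variance
    between = variance(pv_estimates)          # divisor m - 1 (9 when m = 10)
    total = within + (1 + 1 / m) * between    # factor 11/10 when m = 10
    return y_hat, within, between, total


# Illustrative (made-up) proportions for 10 PVs in one county.
pv = [0.20, 0.22, 0.18, 0.21, 0.19, 0.20, 0.23, 0.17, 0.20, 0.20]
sv = [0.001] * 10
y_hat, within, between, total = mi_variance(pv, sv)
```

With 10 PVs, the factor `1 + 1 / m` reproduces the 11/10 multiplier on the between-imputation component.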

Next, SRE was used to help address small sample sizes and concerns about resulting large variances and the representation of the sample within each county with a sample. The survey regression estimate of the mth PV for county k can be written in the following form²²:

{\hat{y}}_{k m}^{s u r v} = X_{k}^{T} {\hat{B}}^{m} / N_{k} + \sum_{l = 1}^{n_{k}} e_{k l} w_{k l} / \sum_{l = 1}^{n_{k}} w_{k l},

(2)

where $X_{k}$ is the vector of population totals in county k corresponding to the predictors in the unit-level regression model, ${\hat{B}}^{m} = {(\sum_{l \in s} w_{l} x_{l} x_{l}^{T})}^{- 1} \sum_{l \in s} w_{l} x_{l} y_{l m}$ is the vector of survey-weighted regression coefficients from the unit-level regression based on the whole sample $s$ for the PV $m$, $N_{k}$ is the known size of the eligible population in the county, $e_{l m} = y_{l m} - x_{l}^{T} {\hat{B}}^{m}$ are the unit-level residuals from the regression fit for units in county k for the PV $m$, $w_{l}$ are the corresponding survey weights and $s_{k}$ is the sample in county k. The models for the eight literacy/numeracy estimates used the same set of predictors, and the final set consisted of 15 indicator variables for the following 15 categories:

Age groups: 18-19; 20-24; 25-34; 35-44; 45-54; 55-64 and 65-74 years

Gender by age: males of age 18-74 years

Race/ethnicity by age: Black of age 18-74 years and Hispanic of age 18-74 years

Educational attainment by age: less than high school education of age 18-64 years, high school education of age 18-64 years, college education of age 18-64 years and bachelor’s degree of age 18-64 years

Nativity by age: foreign born of age 20-74 years.

The predictors for the unit-level SRE model were required to have (a) population totals with the same definition and coverage as the corresponding PIAAC variables (obtained from the American Community Survey [ACS] 2012-2016) and (b) a low level of item nonresponse (less than 5%), with imputation used to fill in the missing values. The approach brought survey-estimated county population totals closer to the county totals from a reliable external source.
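A minimal sketch of the survey regression estimate in equation (2) is given below, using only the Python standard library. All function names and the toy inputs are hypothetical; the sketch assumes that the first predictor is an intercept, so that the corresponding population total equals $N_{k}$. It is meant to show how the synthetic term and the weighted residual correction fit together, not to reproduce the PIAAC implementation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def weighted_coefficients(xs, ys, ws):
    """Survey-weighted regression coefficients from the whole sample."""
    p = len(xs[0])
    xtwx = [[sum(w * x[i] * x[j] for x, w in zip(xs, ws)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * x[i] * y for x, y, w in zip(xs, ys, ws)) for i in range(p)]
    return solve_linear_system(xtwx, xtwy)


def survey_regression_estimate(beta, pop_totals, pop_size, xs_k, ys_k, ws_k):
    """Equation (2): synthetic part plus weighted mean of county residuals."""
    synthetic = sum(t * b for t, b in zip(pop_totals, beta)) / pop_size
    residuals = [y - sum(xi * b for xi, b in zip(x, beta))
                 for x, y in zip(xs_k, ys_k)]
    correction = sum(w * e for w, e in zip(ws_k, residuals)) / sum(ws_k)
    return synthetic + correction


# Toy whole-sample data: intercept plus one predictor (made-up values).
xs = [[1, 0], [1, 1], [1, 2], [1, 3]]
ys = [2, 5, 8, 11]
ws = [1, 2, 1, 2]
beta = weighted_coefficients(xs, ys, ws)
estimate = survey_regression_estimate(beta, [100, 150], 100, xs[:2], [3, 6], ws[:2])
```

In this toy example the county respondents sit one unit above the regression line, so the residual term shifts the synthetic value upward by one.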

The use of SRE and the Taylor series variance estimation approach in SAE has been described elsewhere³ (pp. 21-23). For each PV, the SRE variance was estimated by applying the standard variance expression to the residuals $(e_{l m})$, with PSUs as strata and SSUs as clusters. Each county is in a single stratum (PSU), which simplifies the notation. Counties with only one sample segment were excluded in the following computation of the sampling variance for each PV:

var ({\hat{y}}_{k m}^{surv}) = N_{k}^{- 2} \frac{n_{seg, p}}{(n_{seg, p} - 1)} \sum_{c \in p} {(e_{p k c m} - e_{p k m})}^{2},

where $n_{seg, p}$ is the number of SSUs c selected within PSU $p$, $e_{p k c m} = \sum_{l \in c} a_{p k c} w_{l} e_{l m}$, $e_{p k m} = n_{seg, p}^{- 1} \sum_{c \in p} e_{p k c m}$, and $a_{p k c}$ is an indicator with value 1 if segment c of PSU p is in county k, and 0 otherwise. To obtain the survey regression estimates and variances for the sampled counties and account for the imputation variance, the multiple imputation variance formulas above are applied using ${\hat{y}}_{k m}^{surv}$ in place of ${\hat{y}}_{k m}$. Note that the SRE approach was used to make survey estimates for counties with the PIAAC sample.
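The variance expression above can be illustrated with a short Python sketch. The function name, the data layout (one list of `(weight, residual)` pairs per SSU) and the numbers are assumptions made for this example only.

```python
def sre_sampling_variance(segments, in_county, pop_size):
    """Between-segment variance of weighted residual totals for one county.

    segments  : one list of (weight, residual) pairs per SSU in the PSU
    in_county : 1 if the SSU lies in the county of interest, else 0
    pop_size  : eligible population size N_k of the county
    """
    n_seg = len(segments)
    if n_seg < 2:
        raise ValueError("at least two SSUs are needed for a variance estimate")
    # Weighted residual total per segment, zeroed outside the county.
    totals = [a * sum(w * e for w, e in seg) for seg, a in zip(segments, in_county)]
    mean_total = sum(totals) / n_seg
    sum_sq = sum((t - mean_total) ** 2 for t in totals)
    return n_seg / (n_seg - 1) * sum_sq / pop_size ** 2


# Made-up residuals for a PSU with three SSUs, all inside the county.
segments = [[(10, 0.1), (10, -0.1)], [(20, 0.2)], [(5, -0.4)]]
var_surv = sre_sampling_variance(segments, [1, 1, 1], pop_size=100)
```

The explicit error for a single SSU mirrors the exclusion of counties with only one sample segment.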

The HB models (to be discussed) for the PIAAC SAE process assume that the variances of the SRE county estimates are known, whereas in practice they are estimated as $var ({\hat{y}}_{k m}^{surv})$ . Variance smoothing is sometimes used to stabilize the variances. Inspired by the generalized variance function methods in chapter 7 of Wolter,²¹ a variance estimation smoothing model for proportions is specified as

ln ({neff}_{k}) = β_{0} + β_{1} ln (C_{k}) + β_{2} ln (B_{k}) + ε,

(3)

where ${neff}_{k} = {\hat{y}}_{k m}^{surv} (1 - {\hat{y}}_{k m}^{surv}) / var ({\hat{y}}_{k m}^{surv})$ is the effective sample size, $C_{k}$ is the number of clusters in county $k$, $B_{k}$ is the average cluster size among clusters in county k and $ε$ is an error term. The model was weighted by $C_{k} - 1$, and the exponentiation of the predicted value from this model, ${\widetilde{neff}}_{k}$, was used to derive the smoothed variance as ${\tilde{σ}}_{k}^{2} = {\hat{y}}_{k m}^{surv} (1 - {\hat{y}}_{k m}^{surv}) / {\widetilde{neff}}_{k}$. Covariances were calculated using $Cov ({\hat{y}}_{1 k m}^{surv}, {\hat{y}}_{2 k m}^{surv}) = \frac{{\tilde{σ}}_{3 k}^{2} - {\tilde{σ}}_{1 k}^{2} - {\tilde{σ}}_{2 k}^{2}}{2}$, given the smoothed variances and that $Var ({\hat{y}}_{1 k m}^{surv} + {\hat{y}}_{2 k m}^{surv}) = Var ({\hat{y}}_{3 k m}^{surv})$, since the sum of the three proportions is equal to 1. Here ${\hat{y}}_{1 k m}^{surv}, {\hat{y}}_{2 k m}^{surv}$ and ${\hat{y}}_{3 k m}^{surv}$ are the estimated proportions at or below Level 1, at Level 2 and at Level 3 and above, respectively, and ${\tilde{σ}}_{1 k}^{2}, {\tilde{σ}}_{2 k}^{2}$ and ${\tilde{σ}}_{3 k}^{2}$ are the corresponding smoothed variance estimates. Later, the variance-covariance matrix after smoothing will be denoted by $Σ_{i j k}$. For averages, the variance of the estimated average from the SRE process is smoothed by fitting a weighted least-squares model as

ln ({Var}_{r_{k}}) = β_{0} + β_{1} ln (C_{k}) + β_{2} ln (B_{k}) + β_{3} ln ({\hat{σ}}_{y_{k}}^{2}) + ε,

(4)

where ${Var}_{r_{k}}$ is the residual variance for each county $k$, $C_{k}$ is the number of clusters in each county, $B_{k}$ is the average cluster size for each county k and ${\hat{σ}}_{y_{k}}^{2}$ is the estimated population variance of the literacy/numeracy scores within each county k. The model is weighted by $C_{k} - 1$. The exponentiation of the predicted value from this model is the smoothed variance.
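For the proportions, the step from a smoothed effective sample size to a smoothed variance, and from three smoothed variances to a covariance, can be sketched as follows. The function names are hypothetical and the smoothed effective sample size is taken as given (the regression fit itself is not shown). The check uses a simple multinomial case, where the covariance between two proportions is known to be $- p_{1} p_{2} / n$, as an illustrative assumption.

```python
def effective_sample_size(p_hat, variance):
    """neff = p (1 - p) / var for an estimated proportion."""
    return p_hat * (1 - p_hat) / variance


def smoothed_variance(p_hat, neff_smoothed):
    """Smoothed variance implied by a smoothed effective sample size."""
    return p_hat * (1 - p_hat) / neff_smoothed


def implied_covariance(var_a, var_b, var_c):
    """Cov(a, b) when a + b + c = 1, so that Var(a + b) = Var(c)."""
    return (var_c - var_a - var_b) / 2


# Made-up proportions at or below Level 1, at Level 2, at or above Level 3,
# each with a smoothed effective sample size of 100.
p1, p2, p3 = 0.2, 0.3, 0.5
v1, v2, v3 = (smoothed_variance(p, 100) for p in (p1, p2, p3))
cov_12 = implied_covariance(v1, v2, v3)   # equals -p1 * p2 / 100 in this case
```

Because the three proportions sum to 1, the same identity gives the covariance for any pair once the variance of the remaining proportion is supplied as the third argument.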

Selecting Covariates for the SAE Models

Only 185 counties have PIAAC data, and therefore the model-based estimates will rely on the model extensively for a vast majority of the counties in the United States. To improve the strength of the SAE models, an extensive covariate selection process has been developed in the hope that predictors can be found that are highly related to the key outcomes. A summary of the process is discussed here, while more details can be found in Ren et al.²³ First, the candidate predictor variables are treated as fixed effects and a correlation matrix is created among all the covariates to identify highly correlated variables. One variable in each of the highly correlated pairs is dropped to avoid multicollinearity. Then the least absolute shrinkage and selection operator (LASSO) method²⁴ is used to select several sets of covariates for each of the four outcome models for literacy and for numeracy. To take into account the random effect estimation in the SAE model (described below), the final list of covariates is determined using a cross-validation process. Note that the probability of selection of the PSUs is included in the variable selection process to help address informative sampling; however, it did not enter the final model.
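The first screening step (dropping one variable from each highly correlated pair) might look like the sketch below. The threshold of 0.9 and the rule of keeping the variable that appears first are assumptions for illustration; the text does not state which cut-off or tie-breaking rule was used. The LASSO and cross-validation steps are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant covariate has no defined correlation")
    return sxy / sqrt(sxx * syy)


def drop_highly_correlated(covariates, threshold=0.9):
    """Keep a covariate only if |r| < threshold with every one already kept.

    covariates: dict mapping a name to its county-level values; the
    insertion order decides which member of a correlated pair survives
    (an assumed rule, not taken from the article).
    """
    kept = []
    for name, values in covariates.items():
        if all(abs(pearson(values, covariates[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up county-level covariates: "b" is almost a multiple of "a".
candidates = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [5, 1, 4, 2, 3],
}
selected = drop_highly_correlated(candidates)
```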

Developing the Model and Conducting Diagnostics

An SAE modelling approach is used to produce model-based estimates that are the predictions of how the adults in a state or county would have performed had they been administered the PIAAC assessment. The methodology uses PIAAC survey data in combination with ACS data at the county level to model the quantities of interest.

Models for Proportions

To produce estimated proportions of literacy and numeracy proficiency while continuing to propagate the high levels of sampling error and imputation error, an area-level bivariate HB linear threefold model is developed. The model is fitted at the county level, with the input data being the sets of county-level survey regression estimates and their associated variance estimates (smoothed). Two proportions are modelled jointly: Level 1 and below and Level 3 and above. The third proportion (Level 2) is derived through subtraction. The model is written in a hierarchical form, that is, a linking level accounts for the relationship between the target proportions and the covariates, and a sampling level is used for the direct estimates of proportions. A linear relationship between the proportions and the predictors is assumed for the linking model, which has random effects at three nested levels defined by the county, state and census division. To account for multiple outcomes, the SAE model is specified using the matrix form notation as follows:

\begin{array}{l} P_{i j k} \sim N (θ_{i j k}, Σ_{i j k}) \\ θ_{i j k} = X_{i j k} β + c_{i j k} + v_{i j} + d_{i}, \end{array}

(5)

where $i$ is an index for the census division, $j$ is an index for the state, k is an index for the county, $P_{i j k}$ is a jointly normally distributed bivariate vector of survey regression estimates for proportions at or below Level 1 $(P_{1})$ and at or above Level 3 $(P_{3})$, with mean $θ_{i j k}$ and associated estimated variance-covariance matrix $Σ_{i j k}$, $X_{i j k}$ is a matrix of covariates, $β$ is a matrix of regression coefficients and $c_{i j k}, v_{i j}, d_{i}$ are the county-, state- and division-level random effects, respectively. The estimated variance-covariance matrices $Σ_{i j k}$ are the result of the variance smoothing process and treated as fixed and known hereafter.

Using the hierarchical model specification, sources of error are accounted for, including smoothed sampling variances and random effects at the county, state and division levels. A benefit of the threefold model is that model-based estimates for counties and states without a sample will not be fully synthetic because they will be functions of direct estimates in other counties and states. Another benefit is that associations of counties within states, and states within census divisions, will be accounted for, helping improve precision of the model-based estimates at all three levels of aggregation. In doing so, benchmarking¹ the estimates may not be necessary. The Bayesian approach is used for inference, and prior distributions are adopted for the model parameters. Summaries for the model-based estimates, such as credible intervals, and functions of the model parameters, such as the Level 2 proportion, are straightforward to obtain through the use of Bayesian methods.
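As a small illustration of why such summaries are straightforward, the sketch below derives Level 2 draws by subtraction from posterior draws of the two modelled proportions and summarizes them with a posterior mean and an equal-tailed interval. The function names, the nearest-rank quantile rule and the three toy draws are assumptions for this example; in practice thousands of MCMC draws would be used.

```python
from math import ceil, floor


def level2_draws(p1_draws, p3_draws):
    """Level 2 proportion per draw: 1 - P1 - P3."""
    return [1 - a - b for a, b in zip(p1_draws, p3_draws)]


def posterior_summary(draws, level=0.95):
    """Posterior mean and an equal-tailed interval (nearest-rank quantiles)."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1 - level) / 2
    lower = ordered[floor(tail * (n - 1))]
    upper = ordered[ceil((1 - tail) * (n - 1))]
    return sum(ordered) / n, lower, upper


# Toy posterior draws for one county (made-up values).
p1 = [0.20, 0.22, 0.24]
p3 = [0.46, 0.44, 0.45]
p2 = level2_draws(p1, p3)
p2_mean, p2_low, p2_high = posterior_summary(p2)
```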

Independent priors are assumed for the regression coefficients and the random effects in the fully specified HB model. Specifically, it is assumed $β \sim N (0, 100)$ , where the normal distribution specification uses the mean of 0 and the standard deviation of 10. It is also assumed that the random effects are mutually independent, following bivariate normal distributions,

\begin{array}{l} c_{i j k} \sim N (0, Σ_{c}), \\ v_{i j} \sim N (0, Σ_{v}), \\ d_{i} \sim N (0, Σ_{d}), \end{array}

(6)

where $Σ_{c}, Σ_{v}$ and $Σ_{d}$ are $2 \times 2$ variance-covariance matrices. For example, in $Σ_{c} = (\begin{matrix} Σ_{c, 1, 1} & Σ_{c, 1, 2} \\ Σ_{c, 2, 1} & Σ_{c, 2, 2} \end{matrix})$, the elements on the diagonal, $Σ_{c, 1, 1}$ and $Σ_{c, 2, 2}$, are the variances of the county-level random effects for the proportion at or below Level 1 and at or above Level 3, respectively, and the off-diagonal elements $Σ_{c, 2, 1}$ and $Σ_{c, 1, 2}$ denote the covariance of the county-level random effects for P1 and P3. The variance-covariance matrices $Σ_{c}, Σ_{v}$ and $Σ_{d}$ can be decomposed as follows:

\begin{array}{l} Σ_{c} = S_{c} Ω_{c} S_{c}, \\ Σ_{d} = S_{d} Ω_{d} S_{d}, \\ Σ_{v} = S_{v} Ω_{v} S_{v}, \end{array}

(7)

where $S_{c}, S_{d}$ and $S_{v}$ are the diagonal matrices with standard deviations along the diagonal, and $Ω_{c}, Ω_{d}$ and $Ω_{v}$ are the correlation matrices (with diagonal entries being equal to 1). The Cauchy prior distribution adopted for the standard deviation parameters (diagonal entries in $S_{c}, S_{d}$ and $S_{v}$) has a location (median) hyper-parameter of 0 and a scale (half the interquartile range) hyper-parameter of 5; the support of the distribution was restricted to the positive real line. In the model for literacy proportions, an LKJ_corr(1) prior²⁵ is adopted as the prior distribution for the correlation matrices $Ω_{c}, Ω_{d}$ and $Ω_{v}$. For numeracy proportions, the LKJ_corr_cholesky(1) prior is adopted for the Cholesky factors of the correlation matrices, that is, the lower triangular matrices $L_{c}, L_{d}$ and $L_{v}$, where $Ω_{c} = L_{c} L_{c}^{T}$, $Ω_{d} = L_{d} L_{d}^{T}$ and $Ω_{v} = L_{v} L_{v}^{T}$. The idea behind the LKJ prior is based on the decomposition of the variance-covariance matrix, with priors adopted for its components.
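The decomposition in equation (7) and the Cholesky factorization of a correlation matrix can be checked numerically for the $2 \times 2$ case. The sketch below is a plain illustration of the algebra with hypothetical function names and made-up values; it does not sample from the Cauchy or LKJ priors.

```python
from math import sqrt


def matmul(a, b):
    """Product of two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def covariance_from_sd_and_corr(sd_1, sd_2, rho):
    """Sigma = S Omega S for a 2 x 2 variance-covariance matrix."""
    if sd_1 < 0 or sd_2 < 0 or not -1 <= rho <= 1:
        raise ValueError("need non-negative SDs and a correlation in [-1, 1]")
    s = [[sd_1, 0.0], [0.0, sd_2]]
    omega = [[1.0, rho], [rho, 1.0]]
    return matmul(matmul(s, omega), s)


def cholesky_of_corr(rho):
    """Lower triangular L with L L^T equal to the 2 x 2 correlation matrix."""
    return [[1.0, 0.0], [rho, sqrt(1 - rho ** 2)]]


# Made-up county-level SDs for P1 and P3 and a negative correlation.
sigma_c = covariance_from_sd_and_corr(0.03, 0.04, -0.5)
l_c = cholesky_of_corr(-0.5)
omega_c = matmul(l_c, [list(col) for col in zip(*l_c)])  # L times its transpose
```

Placing priors on the standard deviations and on the correlation matrix (or its Cholesky factor) separately is what allows the two components to be specified independently.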

Models for Averages

For each domain, PIAAC averages were estimated using an area-level univariate HB linear threefold model, which includes three levels of random effects: county, state and census division. The model is specified as follows:

\begin{array}{l} Y_{i j k} \sim N (θ_{i j k}, σ_{i j k}^{2}) \\ θ_{i j k} = X_{i j k} β + c_{i j k} + v_{i j} + d_{i}, \end{array}

(8)

where $Y_{i j k}$ is the survey regression estimate of average literacy or numeracy scores at the county level, assumed normally distributed with the mean of $θ_{i j k}$ and the associated estimated variance $σ_{i j k}^{2}$, $X_{i j k}$ is a vector of covariates, $β$ is a vector of regression coefficients and $c_{i j k}, v_{i j}$ and $d_{i}$ are the county-level, state-level and division-level random effects, respectively. The estimated variances $σ_{i j k}^{2}$ are the result of the variance smoothing process and treated as fixed and known hereafter. For the regression coefficients, independent priors $β \sim N (0, 1000)$ are assumed, where the normal distribution specification uses the mean and the standard deviation. The random effects are assumed to be mutually independent and to follow the normal distributions:

\begin{array}{l} c_{i j k} \sim N (0, σ_{c}^{2}), \\ v_{i j} \sim N (0, σ_{v}^{2}), \\ d_{i} \sim N (0, σ_{d}^{2}). \end{array}

(9)

For the random effects, the variances $σ_{c}^{2}, σ_{v}^{2}$ and $σ_{d}^{2}$ are assumed to follow a uniform prior distribution over a wide range, 0−1,000. For both sets of models for proportions and averages, little information is provided to the model because of the choice of vague (almost noninformative) priors for the model parameters; therefore, the data (likelihood) have a major role in the posterior distribution.

Model Fitting, Estimation and Prediction

Using the PIAAC sample data available in 184 counties with at least 2 records, the proportion and average models are fitted to literacy and numeracy data separately. RStan, the R interface to the Stan modelling language, is employed for this purpose. The R and Stan starting seeds used in the generation of sequences of random numbers are set equal to constants so that results can be repeated.

Markov chain Monte Carlo (MCMC) methods are used for the HB models. Three independent Markov chains are run to facilitate the calculation of Monte Carlo standard errors.²⁶,²⁷ (p. 229) The model-fitting procedure starts with three assigned sets of distinct initial values for $β$, and the initial values for other model parameters are randomly generated within RStan. The initial values of $β$ are obtained from fitting weighted linear regressions to the proportions or averages using the set of covariates (given in the ‘Results’ section), with the county sample size being the weight. These initial values are then modified by adding/subtracting a constant, resulting in three distinct sets to be used by the three chains. For each chain, 5,000 iterations are run as a warm-up, then 15,000 iterations are produced and ‘thinned’ by taking 1 in every 10. Thus, over the three chains, a total of 4,500 iterations remain. These 4,500 final iterations (referred to as MCMC samples) then simulate the posterior distributions of all the parameters in $η = (θ, β, c, v, d, Σ_{c}, Σ_{v}, Σ_{d})$ or $(θ, β, c, v, d, σ_{c}^{2}, σ_{v}^{2}, σ_{d}^{2})$ for proportions and averages, respectively. The means of the 4,500 MCMC samples are the HB estimates.
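The bookkeeping behind the 4,500 retained draws, and one possible way of forming three distinct sets of starting values, can be sketched as follows. Both function names are hypothetical. The choice to leave one chain at the regression-based values and shift the other two by plus or minus a constant is an assumption; the text only says that a constant was added or subtracted.

```python
def retained_draws(post_warmup_iterations, thin, chains):
    """Number of MCMC samples kept after thinning, summed over chains."""
    per_chain = len(range(0, post_warmup_iterations, thin))
    return per_chain * chains


def chain_initial_values(beta_start, shift):
    """Three distinct starting vectors: shifted down, unchanged, shifted up.

    The layout of the three sets is assumed for illustration only.
    """
    return [[b + delta for b in beta_start] for delta in (-shift, 0.0, shift)]


# 15,000 post-warm-up iterations per chain, keeping 1 in 10, over 3 chains.
total_samples = retained_draws(15000, 10, 3)   # 1,500 per chain, 4,500 in total
inits = chain_initial_values([1.0, -2.0], 0.5)
```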

Across the 4,500 MCMC samples, predictions are produced for proportions and averages of sampled counties, non-sampled counties (without PIAAC data) and for states and the nation. The posterior mean ${\hat{θ}}_{i j k}^{HB}$, which is also called the county-level HB model prediction (for proportions or averages) for sampled county k in state $j$ and census division i, is ${\hat{θ}}_{i j k}^{HB} = \frac{\sum_{b = 1}^{4{,}500} θ_{i j k}^{(b)}}{4{,}500}$, where the value of $θ_{i j k}^{(b)}$ for the MCMC sample $b$ is obtained from

$$θ_{i j k}^{(b)} = X_{i j k} β^{(b)} + c_{i j k}^{(b)} + v_{i j}^{(b)} + d_{i}^{(b)}. \tag{10}$$
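As a small worked illustration of equation (10) and of the posterior mean taken over the MCMC samples, the sketch below evaluates the linear predictor for each draw and averages the result. The array names and the four toy draws are hypothetical; the article uses 4,500 draws.

```python
import numpy as np

def theta_draws(x, beta, c, v, d):
    """Equation (10) for one county: theta^(b) = x beta^(b) + c^(b) + v^(b) + d^(b).

    x       : (p,) covariate vector for the county
    beta    : (B, p) draws of the regression coefficients
    c, v, d : (B,) draws of the county, state and division random effects
    """
    return beta @ x + c + v + d

def hb_estimate(draws):
    """Posterior mean across the retained MCMC draws."""
    return draws.mean()

# Toy example with B = 4 draws.
x = np.array([1.0, 0.2])
beta = np.array([[0.10, 0.50], [0.12, 0.48], [0.08, 0.52], [0.10, 0.50]])
c = np.array([0.01, -0.01, 0.00, 0.00])
v = np.array([0.00, 0.00, 0.02, -0.02])
d = np.zeros(4)

draws = theta_draws(x, beta, c, v, d)
theta_hb = hb_estimate(draws)
```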

The values of $c_{i j k}^{(b)}$ are not available for any of the non-sampled counties. Likewise, values of $v_{i j}^{(b)}$ are not available for the non-sampled counties in states without a sampled county; all census divisions are represented in the sample. Values of the county and state random effects are generated from the appropriate normal distribution for those counties and states without sample. Specifically, $c_{i j k}^{(b)}$ is drawn from $N (0, Σ_{c}^{(b)})$ for proportions and from $N (0, σ_{c}^{2 (b)})$ for averages. Similarly, $v_{i j}^{(b)}$ is drawn from $N (0, Σ_{v}^{(b)})$ for proportions and from $N (0, σ_{v}^{2 (b)})$ for averages. For the non-sampled counties in states with one or more sampled counties, the estimated state effect is available. The linking level is used to generate samples for non-sampled areas. In both cases, once the set of 4,500 values of $θ_{i j k}^{(b)}$ is obtained, the posterior mean $\hat{θ}_{i j k}^{HB}$ for non-sampled counties is computed. (Note that any posterior summary can be computed using the 4,500 values.) Because a linear model was used, the predicted proportion at or above Level 3 was negative in one county for literacy and in eight counties for numeracy. Negative values were assigned values of zero in the NCES Skills Map, which is discussed further in the ‘Limitations and Summary’ section. The use of linear models provided some benefits, including reduced run time and model options such as multivariate models. We note that the NAAL survey application$^{9}$ used a logit linking model. The NAAL had a national proportion of 0.14 for a single measure, below Basic prose literacy. For this PIAAC application, the national proportions were higher: for example, 0.22 at or below Level 1 for literacy (0.32 for numeracy) and 0.46 at or above Level 3 for literacy (0.36 for numeracy). Therefore, the PIAAC data fall closer to the middle, approximately linear section of a sigmoidal curve than the NAAL data did, although not exclusively, and the models therefore produced a small number of negative predicted values.
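The treatment of non-sampled counties can be sketched for the univariate (averages) case, where the random-effect variances are scalars. This is an illustrative Python sketch under stated assumptions, not the production code: the function and argument names are hypothetical, and the bivariate case for proportions would draw from a multivariate normal with covariance $Σ_{c}^{(b)}$ or $Σ_{v}^{(b)}$ instead.

```python
import numpy as np

def draws_for_nonsampled_county(x, beta, sigma2_c, d, rng, v=None, sigma2_v=None):
    """Draws of theta for a county with no sample (univariate 'averages' case).

    beta               : (B, p) coefficient draws
    sigma2_c, sigma2_v : (B,) draws of the county and state variances
    d                  : (B,) division-effect draws (always available)
    v                  : (B,) state-effect draws if the state has a sampled
                         county; None if the state effect must be simulated
    """
    B = beta.shape[0]
    c = rng.normal(0.0, np.sqrt(sigma2_c), size=B)      # c^(b) ~ N(0, sigma_c^2(b))
    if v is None:
        v = rng.normal(0.0, np.sqrt(sigma2_v), size=B)  # v^(b) ~ N(0, sigma_v^2(b))
    return beta @ x + c + v + d

def floor_at_zero(p):
    """Replace a negative predicted proportion with zero, as done for the Skills Map."""
    return max(p, 0.0)
```

One random effect is drawn per MCMC sample, using that sample's variance, so the extra uncertainty for counties and states without sample carries through to the 4,500 values of $θ_{i j k}^{(b)}$.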

Aggregation to the State Level and the National Level

Once the county model-based estimates are produced, the estimates for states (and the nation) are computed as weighted aggregates of the county estimates for each iteration, where the weights are the county household populations of adults aged 16-74 years, obtained from the 2013-2017 ACS data.
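A minimal sketch of this aggregation, assuming the county draws are held in an array with one row per MCMC iteration and one column per county (the names and toy numbers are hypothetical):

```python
import numpy as np

def aggregate_draws(county_draws, pop):
    """Population-weighted aggregate of county draws, computed per MCMC iteration.

    county_draws : (B, K) array, one column per county in the state (or nation)
    pop          : (K,) household population of adults aged 16-74 in each county
    returns      : (B,) draws of the aggregate
    """
    w = pop / pop.sum()
    return county_draws @ w

# Toy example: 3 iterations, 2 counties.
county_draws = np.array([[0.20, 0.30], [0.22, 0.28], [0.18, 0.32]])
pop = np.array([3000.0, 1000.0])

state_draws = aggregate_draws(county_draws, pop)
state_estimate = state_draws.mean()
```

Aggregating within each iteration, rather than aggregating the county posterior means, yields a full set of draws for the state, from which posterior standard deviations and credible intervals can be computed directly.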

Diagnostics

The models were subjected to rigorous diagnostic checks that included various methods of internal and external model validation. The methods of internal model validation included convergence and mixing diagnostics, collinearity tests, residual analysis, posterior predictive checks, model sensitivity checks, examining changes in the specification of the prior distribution for the variance-covariance matrices (including changes in initial values and in hyperparameter values), examining changes in the model specification (including univariate versus bivariate models for literacy proportions), tuning parameters in the Hamiltonian Monte Carlo and no-U-turn sampler algorithms, and relaxing normality assumptions in the bivariate HB models for proportions.

The methods of external model validation included examining histograms of differences between model-based and survey estimates, shrinkage plots, interval coverage plots, bubble plots of survey regression estimates and model-based estimates, and smoothed and small area model variances, as well as comparing aggregates of model predictions and survey estimates. The model results were also assessed on the merits of improvements in precision.

Results

As shown in Table 2, at the county level, the imputation error can be a significant portion of the variance and cannot be ignored when producing variance estimates. For literacy skills, imputation contributes on average 11% of total variance for the average score, 22% of total variance for the proportion at or below Level 1, 35% of total variance for the proportion at Level 2 and 20% of total variance for the proportion at Level 3 and above. Across counties, the contribution to the total variance from multiple imputation ranges from nearly 0% to 90%. The distribution of the proportion for numeracy is similar to that for literacy. The results clearly show that the survey estimates and variance estimates must properly use the PVs.

Table 2.

Distribution of the Proportion of Variance Associated with Multiple Imputation for Direct Estimates Across Counties: 2012/2014/2017.

Proficiency Domain	Proportion of Variance due to Multiple Imputation for	N^a	Mean	Minimum	Maximum	Standard Deviation
Literacy	Average score	170	0.11	0.00	0.87	0.122
Literacy	Proportion at or below Level 1	170	0.22	0.02	0.90	0.154
Literacy	Proportion at Level 2	170	0.35	0.07	0.82	0.143
Literacy	Proportion at Level 3 and above	170	0.20	0.00	0.82	0.142
Numeracy	Average score	170	0.10	0.01	0.73	0.097
Numeracy	Proportion at or below Level 1	170	0.21	0.02	0.81	0.149
Numeracy	Proportion at Level 2	170	0.39	0.07	0.81	0.145
Numeracy	Proportion at Level 3 and above	169	0.21	0.03	0.70	0.139

Note. ^a Variances could not be estimated for 15 counties that have just one secondary sampling unit. The remaining 170 counties have at least two clusters. For numeracy, one county has a variance equal to 0 because it has no respondents with scores at Level 3 and above.
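The quantity summarized in Table 2 can be illustrated with the standard multiple-imputation combining rules (Rubin), in which the total variance is the average sampling variance plus $(1 + 1/M)$ times the between-imputation variance over the $M$ plausible values. The sketch below assumes that decomposition; the exact computation used for Table 2 is not spelled out in this section, and the function name and toy inputs are hypothetical.

```python
import numpy as np

def imputation_share(estimates, sampling_vars):
    """Share of total variance due to imputation under Rubin's combining rules.

    estimates     : (M,) point estimates, one per plausible value
    sampling_vars : (M,) sampling variance of each estimate
    """
    M = len(estimates)
    u_bar = np.mean(sampling_vars)     # average within-imputation (sampling) variance
    b = np.var(estimates, ddof=1)      # between-imputation variance
    imp = (1.0 + 1.0 / M) * b          # imputation component of the total variance
    return imp / (u_bar + imp)

# Toy example with M = 2 plausible values.
share = imputation_share(np.array([0.20, 0.24]), np.array([0.0036, 0.0036]))
```

In this toy case the between-imputation variance is 0.0008, the imputation component is 0.0012 and the total variance is 0.0048, so the share is 0.25.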

In Table 3, for averages, the proportion at or below Level 1 and the proportion at Level 3 and above, the median SRE variance decreases substantially in comparison with the variance of the corresponding direct estimates. However, for the proportion at Level 2, there is only a modest decline. The R² values for the Level 2 models are of the order of 0.04, compared with 0.27 for the Level 1 or below models, 0.25 for the Level 3 and above models, and 0.40 for averages. The table also shows the impact of the variance smoothing process: while the level of variance is essentially maintained, as shown by the median, the standard deviation of the variance is lower for the smoothed variance than for the SRE variance, as expected.

Related to covariate selection, the PSU selection probability was initially included as a potential county-level covariate to account for the informative sampling design; however, it was not identified as a significant predictor through the covariate selection process. Covariates with the highest correlations are education related; for example, the correlation is greater than 0.7 between the proportion of the population with less than high school education and the proportion at or below Level 1 literacy. Reducing the large pool of variables through the use of correlations led to the following initial covariates at the county level: percentage of population age 25 years and above with less than high school education (no high school diploma), percentage of population age 25 years and above with more than high school education (including some college, no degree), percentage of population below 100% of the poverty line, percentage of Black or African-American population, percentage of Hispanic population, percentage of civilian noninstitutionalized population with no health insurance coverage, percentage of population age 16 years and above with service occupations, percentage of foreign-born people who entered the United States after 2010 among the population born outside the United States, percentage of population born outside the United States, percentage of population age 16 years and above who did not work at home and who spend more than 60 minutes travelling to work, unemployment rate, percentage diagnosed with diabetes, the birth rate per 1,000 women, and the average amount of grant and scholarship aid received. During the cross-validation phase of the variable selection process, a decision was made to use the same set of seven county-level variables from the 2013 to 2017 ACS data in all four models fitted for proportions and averages for literacy and numeracy. The seven covariates provide strong models for proportions and averages. 
For example, the adjusted R² is 0.58 for the linear regression of literacy proportions at or below Level 1 on the seven covariates. More details on the results for each step of the covariate selection process are published.²³ The estimated model parameters are given in Table 4 for the bivariate models for proportions and in Table 5 for the univariate models for averages.

Table 3.

Distribution of Variance Estimates Prior to SRE, After SRE and After Smoothing: 2012/2014/2017

Proficiency Domain	Sampling Variance for	Stage	Number of Counties	Minimum	Median	Mean	Maximum	Standard Deviation
Literacy	Average score	Direct estimate	170	16.51	100.23	127.48	663.42	111.623
		SRE	170	8.73	47.46	61.93	476.57	58.305
		Smoothed	170	15.86	45.55	56.20	275.15	40.635
	Proportion at or below Level 1	Direct estimate	170	0.0007	0.0058	0.0079	0.0566	0.00771
		SRE	170	0.0006	0.0036	0.0055	0.0594	0.00648
		Smoothed	170	0.0009	0.0035	0.0054	0.0427	0.00595
	Proportion at Level 2	Direct estimate	170	0.0011	0.0069	0.0094	0.0600	0.00822
		SRE	170	0.0011	0.0067	0.0090	0.0602	0.00817
		Smoothed	170	0.0011	0.0071	0.0090	0.0636	0.00803
	Proportion at Level 3 and above	Direct estimate	170	0.0040	0.0160	0.0246	0.2306	0.02586
		SRE	170	0.0010	0.0051	0.0065	0.0659	0.00640
		Smoothed	170	0.0010	0.0054	0.0064	0.0309	0.00451
Numeracy	Average score	Direct estimate	170	18.00	127.99	164.41	771.03	138.954
		SRE	170	8.94	60.43	78.15	737.71	87.124
		Smoothed	170	19.62	54.70	67.45	346.00	46.129
	Proportion at or below Level 1	Direct estimate	170	0.0016	0.0078	0.0101	0.0712	0.00916
		SRE	170	0.0008	0.0045	0.0066	0.0594	0.00715
		Smoothed	170	0.0010	0.0045	0.0063	0.0454	0.00589
	Proportion at Level 2	Direct estimate	170	0.0010	0.0068	0.0097	0.1004	0.00983
		SRE	170	0.0008	0.0065	0.0095	0.0937	0.00949
		Smoothed	170	0.0010	0.0072	0.0092	0.0749	0.00827
	Proportion at Level 3 and above	Direct estimate	170	0.0042	0.0201	0.0303	0.3254	0.03256
		SRE	170	0.0006	0.0047	0.0063	0.0637	0.00726
		Smoothed	170	0.0010	0.0050	0.0058	0.0300	0.00407

Table 4.

Final HB Model Estimates for Regression Coefficients and Components of the Variance–Covariance Matrices of Random Effects: For Literacy and Numeracy Proportions: 2012/2014/2017

	Literacy at or Below Level 1		Literacy at or Above Level 3		Numeracy at or Below Level 1		Numeracy at or Above Level 3
Parameters^a	HB mean	HB standard deviation	HB mean	HB standard deviation	HB mean	HB standard deviation	HB mean	HB standard deviation
Intercept	0.12	0.09	0.09	0.10	0.28	0.10	−0.16	0.10
Education—LH	0.67	0.23	−0.02	0.25	0.52	0.25	0.53	0.25
Education—MH	−0.13	0.09	0.81	0.11	−0.30	0.11	0.97	0.11
Poverty	0.20	0.15	−0.14	0.18	0.37	0.17	−0.13	0.18
Black	0.16	0.05	−0.17	0.05	0.26	0.05	−0.17	0.05
Health insurance	−0.07	0.16	−0.15	0.20	0.01	0.18	−0.32	0.19
Hispanic	0.19	0.06	−0.12	0.06	0.18	0.07	−0.16	0.06
Service occupations	0.10	0.18	−0.19	0.21	0.18	0.22	−0.17	0.21
$Σ_{c, 1, 1}$	0.00010	0.00013			0.00019	0.00022
$Σ_{c, 1, 2}$ , $Σ_{c, 2, 1}$	−0.00007	0.00013			−0.00018	0.00024
$Σ_{c, 2, 2}$	0.00023	0.00027			0.00047	0.00042
$Σ_{v, 1, 1}$	0.00013	0.00016			0.00015	0.00020
$Σ_{v, 1, 2}$ , $Σ_{v, 2, 1}$	−0.00005	0.00011			−0.00008	0.00015
$Σ_{v, 2, 2}$	0.00015	0.00020			0.00023	0.00027
$Σ_{d, 1, 1}$	0.00010	0.00018			0.00014	0.00025
$Σ_{d, 1, 2}$ , $Σ_{d, 2, 1}$	−0.00001	0.00014			−0.00005	0.00016
$Σ_{d, 2, 2}$	0.00037	0.00072			0.00026	0.00052

Note. ^a Education—LH: proportion of population age 25 years and above with less than high school education; education—MH: proportion of population age 25 years and above with more than high school education; poverty: proportion of population below 100% of the poverty line; Black: proportion of the Black or African-American population; Hispanic: proportion of the Hispanic population; health insurance: proportion of the civilian non-institutionalized population who has no health insurance coverage; service occupations: proportion of population age 16 years and above with service occupations.

Table 5.

Final HB Model Estimated Regression Coefficients and Variances of Random Effects: for Literacy and Numeracy Averages: 2012/2014/2017.

	Literacy		Numeracy
Parameters^a	HB mean	HB standard deviation	HB mean	HB standard deviation
Intercept	243.50	11.53	223.99	12.60
Education—LH	-29.87	30.44	1.36	32.93
Education—MH	72.46	12.89	88.38	13.93
Poverty	-17.83	20.80	-44.31	22.31
Black	-23.49	5.97	-34.40	6.58
Health insurance	-14.30	24.32	-25.65	25.79
Hispanic	-28.79	7.94	-36.47	8.58
Service occupations	-42.78	24.42	-46.98	26.69
$σ_{c}^{2}$	15.32	6.74	17.71	8.20
$σ_{v}^{2}$	5.70	5.35	6.53	5.79
$σ_{d}^{2}$	6.96	10.52	6.63	10.32

Note. ^a Education—LH: proportion of population age 25 years and above with less than high school education; education—MH: proportion of population age 25 years and above with more than high school education; poverty: proportion of population below 100% of the poverty line; Black: proportion of Black or African-American population; Hispanic: proportion of Hispanic population; health insurance: proportion of civilian noninstitutionalized population who has no health insurance coverage; service occupations: proportion of population age 16 years and above with service occupations.

In Table 6, the mean and quartiles of diagnostic statistics are shown, including the effective sample size, Gelman-Rubin $\hat{R}$ statistic,²⁶ MC standard error, autocorrelation and cross-correlation across all the monitored parameters. The Table 6 results indicate that convergence and mixing of the three chains have been reached. In particular, after accounting for autocorrelation, none of the monitored parameters has an effective MC sample size below 5% of the total sample size (4,500 samples), an MC standard error greater than 10% of the posterior standard deviation, or an $\hat{R}$ above 1.1. Autocorrelations within chains and cross-correlations among the monitored parameters are low. In other diagnostics performed but not shown here, for proportions at or below Level 1 for literacy, the posterior predictive p values for the indicator test statistics are close to 0.5, the deviations and the unscaled residuals are close to zero, and the scaled residuals lie within -1.96 to 1.96. Overall, therefore, there is no substantial indication of model lack of fit. The posterior predictive checks for the other quantities of interest (literacy proportions at or above Level 3, numeracy proportions, and literacy and numeracy averages) do not indicate lack of fit for the models adopted. The final models for the state and county model-based estimates passed the various diagnostic tests and were found to be insensitive to different model assumptions. The measures of model fit indicate good fits to the data.

Table 6.

Convergence Diagnostics for the MCMC.

Metric	$\hat{R}$	Effective Sample Size	MC Standard Error/Posterior Standard Deviation	Autocorrelation Lag 1	Autocorrelation Lag 5	Cross-correlation
Minimum	0.9993	0.000	0.0000	-0.0520	-0.0439	-0.9554
1st quartile	0.9998	3,934.763	0.0149	0.0030	-0.0077	-0.0190
Median	1.0000	4,303.378	0.0153	0.0189	0.0045	-0.0001
Mean	1.0002	4,073.012	0.0162	0.0312	0.0095	0.0040
3rd quartile	1.0004	4,500.000	0.0160	0.0372	0.0178	0.0189
Maximum	1.0234	5,844.301	0.0982	0.9302	0.7471	1.0000
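The convergence criteria quoted above can be sketched as follows. The `gelman_rubin` function implements the classic potential scale reduction factor from within-chain and between-chain variances; this is an assumption for illustration, since the $\hat{R}$ reported by RStan may be computed with a refined (for example, split-chain) variant. The function names are hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic potential scale reduction factor for an (m, n) array of m chains."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    var_plus = (n - 1) / n * w + b / n         # pooled estimate of the posterior variance
    return np.sqrt(var_plus / w)

def passes_checks(rhat, ess, mcse, post_sd, n_draws=4500):
    """Apply the three thresholds quoted in the text to one monitored parameter."""
    return bool(rhat <= 1.1 and ess >= 0.05 * n_draws and mcse <= 0.10 * post_sd)
```

For example, a parameter with $\hat{R} = 1.0004$, an effective sample size of 4,300 and an MC standard error equal to 1.5% of its posterior standard deviation satisfies all three conditions, whereas any parameter with $\hat{R} = 1.2$ would be flagged.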

To determine the benefits of using the bivariate model, the proportions are also modelled using univariate models with uniform priors on the variances over a wide range (0-1.000) and compared against the bivariate model with an LKJ prior on the correlation matrix (specification one in the report) and half-Cauchy priors on the standard deviations. Figure 1 illustrates the results of these comparisons for county-level literacy proportions under univariate and bivariate HB models. When the proportions are modelled using a bivariate model, the posterior variances are reduced, as expected.

In Figure 1, the top row relates to proportions at or below Level 1 and the bottom row relates to proportions at or above Level 3. The left-hand side illustrates the relationship between the posterior means, while the right-hand side displays the relationship between the posterior standard deviations.

Figure 1.

Posterior Means and Standard Deviations for County-Level Literacy Proportions Under Univariate and Bivariate HB Models: 2012/2014/2017.

The Figure 2 histograms show the differences between survey regression-estimated literacy proportions and model-based estimates. Overall, the means and medians of the differences are approximately 0, and the majority of the differences are within 20 percentage points. The large differences between model predictions and survey regression estimates (about 20-40 percentage points) are mostly associated with counties that have small sample sizes; they are not a concern because the survey regression estimates for those counties are less reliable than the corresponding estimates for counties with large sample sizes.

Figure 2.

Literacy Proportion—Histograms of Differences Between Survey Regression and Indirect Estimates.

As expected, and as shown in Figure 3, areas with smaller sample sizes show greater shrinkage towards the means than areas with larger sample sizes. When sample sizes are greater than 100, model predictions and survey regression estimates become much more similar. We note that one county with a sample size of around 160 shows larger shrinkage in the proportion at Level 2 and the proportion at or above Level 3.

Figure 3.

Literacy Proportion—Shrinkage Plots of Point Estimates, by Sample Size.

In the interval coverage plots of Figure 4, the credible intervals for areas with large sample sizes mainly cover the survey regression estimates. However, when the sample sizes are small (<50), as expected, some credible intervals do not cover the survey regression estimates because the survey regression estimates are less reliable and contribute less to the model-based estimates.

Figure 4.

Literacy Proportion—Indication of Coverage by Credible Interval.

The scatterplot in Figure 5 shows the majority of survey regression estimates and indirect estimates around the 45° line. That is, the model predictions are generally close to the survey regression estimates. Counties with larger sample sizes, indicated by larger bubbles, have closer estimates than counties with smaller sample sizes. As expected, some of the small counties are farther away from the 45° line because of their large sampling errors. Probably because P1 and P3 are included in the model fit and estimation, the proportion at or below Level 1 (P1) and the proportion at Level 3 and above (P3) have closer estimates than the proportion at Level 2 (P2).

Figure 5.

Literacy Proportion—Comparison Between Survey Regression Estimates and Indirect Estimates.

The plot in Figure 6 shows that, for areas with small sample sizes, the posterior standard deviations from the SAE model are smaller than the smoothed standard errors of the survey regression estimates. For these plots, because the standard errors of proportions depend on the sizes of the estimated proportions, the model proportion could differ from the survey regression proportion, and the variances will therefore in theory be different.

Figure 6.

Literacy Proportion—Comparison Between Model Standard Errors and Smoothed Standard Errors.

The precision of the model-based estimates depends heavily on the ability of the covariates in the model to predict the outcomes. The model-based estimates produced for counties not in the sample rely almost entirely on the model predictions, with some contributions from the division and/or state random effects. The model-based estimates for counties that were included in the sample (and for which direct estimation is possible) also rely heavily on the model predictions because their direct estimates are based on small samples and are generally imprecise. Table 7 summarizes the distributions of the widths (the difference between the upper bound and the lower bound) of the credible intervals as well as the coefficients of variation (CVs) for the 3,142 counties and 51 states in the United States. Overall, the state predictions are more precise than the county predictions and, to a lesser extent, the predictions for counties with PIAAC sample are more precise than those for counties without PIAAC sample. For example, for the proportion at or below Level 1 in literacy, the median credible interval width for county predictions is 8.0 percentage points, while the median is 6.1 percentage points for state predictions. Also, the median credible interval width is 7.2 percentage points for counties with PIAAC sample and 8.0 percentage points for counties without PIAAC sample. The CVs for the county-level model predictions are of the order of 10%. Estimates with CVs of this magnitude are considered precise.

Table 7.

Credible Interval Widths and Coefficients of Variation (in %) for Model-Based Estimates: 2012/2014/2017

	Literacy				Numeracy
Statistic	At or Below Level 1	At Level 2	At or Above Level 3	Averages	At or Below Level 1	At Level 2	At or Above Level 3	Averages
County model-based estimates
95% credible interval width	8.0	10.8	10.3	19.1	9.5	11.7	12.2	20.5
Coefficient of variation	10.0	7.2	6.3	1.8	7.7	7.8	10.1	2.1
Sampled counties
95% credible interval width	7.2	9.8	9.3	15.3	8.5	10.5	10.9	16.7
Coefficient of variation	9.3	7.3	5.0	1.5	7.3	7.9	7.5	1.7
Non-sampled counties
95% credible interval width	8.0	10.9	10.3	19.2	9.6	11.8	12.3	20.6
Coefficient of variation	10.0	7.2	6.4	1.8	7.7	7.8	10.3	2.1
State model-based estimates
95% credible interval width	6.1	8.2	7.3	11.0	6.7	8.3	7.9	11.7
Coefficient of variation	8.1	6.0	3.9	1.0	5.9	6.0	5.2	1.2
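The two precision measures in Table 7 can be computed from the MCMC draws of any county or state estimate. The sketch below assumes an equal-tailed 95% credible interval and a CV defined as the posterior standard deviation divided by the posterior mean; the article defines the width as the upper bound minus the lower bound but does not state the interval type, so both choices are assumptions, as are the function names.

```python
import numpy as np

def interval_width(draws, level=0.95):
    """Width of the equal-tailed credible interval: upper bound minus lower bound."""
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(draws, [tail, 1.0 - tail])
    return hi - lo

def cv_percent(draws):
    """Coefficient of variation in percent: posterior SD over posterior mean."""
    return 100.0 * np.std(draws, ddof=1) / np.mean(draws)
```

For proportions, multiplying the width by 100 expresses it in percentage points, the unit used in Table 7.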

Table 8 compares the national-level model predictions with the direct estimates for proportions and averages for literacy and numeracy. The two sets of estimates are not significantly different, which indicates that benchmarking to national direct estimates was not necessary and that the use of a threefold model helps to reduce the need for benchmarking.

Table 8.

National-Level Model-Based and Survey Estimates: 2012/2014/2017

Model	Model-Based Estimate	Posterior Standard Deviation of Model-Based Estimate	Survey Estimate	Standard Error of Survey Estimate
Literacy P1	0.218	0.0048	0.226	0.0050
Literacy P2	0.323	0.0062	0.322	0.0063
Literacy P3	0.458	0.0056	0.452	0.0061
Numeracy P1	0.319	0.0052	0.321	0.0069
Numeracy P2	0.322	0.0062	0.321	0.0069
Numeracy P3	0.360	0.0054	0.359	0.0074
Literacy average	263.5	0.61	263.3	0.44
Numeracy average	249.1	0.67	248.9	0.84

Limitations and Summary

Covariates with associated sampling or non-sampling error (e.g., possible systematic bias due to inaccurate measurements) introduce measurement error into SAE models. The SAE model results rely on the precision of the covariates and on the correlation between the outcome of interest and the covariates. SAE models in the literature that use covariates at low levels of geography, such as census tracts, or cross-tabulations of county-level variables from the ACS, have substantial sampling error or measurement error associated with those covariates. An approach has been provided to allow the propagation of the measurement error into the small area estimates.²⁸ However, because the approach may not extend to this many variables, we have not accounted for this source of measurement error. When covariate sampling error (measurement error) exists and is not accounted for, the model is misspecified and will misstate the prediction MSE.²⁹ An exception is a county whose covariate sampling error equals the average sampling error across the counties. It is interesting to note that the resulting prediction error is generally overestimated when the county’s sampling error is below average and more severely underestimated when it is above average. Small counties likely have less precise ACS data; therefore, cautionary notes are provided on the state and county model-based estimates website to alert users that the covariates used in the model or predictions for counties with populations of less than 1,500 (2% of counties) may have higher associated uncertainty.

One feature of the PIAAC SAE process was that models account for informative sampling and informative nonresponse; otherwise, the process would have resulted in biased estimates. The set of counties were selected by way of probability proportionate-to-size sampling; therefore, the sample design is informative. To address the issue of informative sampling, we included the probability of selection of PSUs in the covariate selection process. Because it was not an important factor, it was decided to exclude it from the final SAE model. Also, to help address informative nonresponse, weighting adjustments can be effective if the weighting variables are associated with the proficiency scores. Literacy-related nonresponse (about five percent) was also addressed through imputation of low scores prior to creating the direct estimates.

A second feature was that the models accounted for all important sources of variability so that the reported estimated error reflects the true level of precision. For PIAAC, these sources of error include the following: (a) the sampling error, (b) the model error, (c) the prediction error and (d) the imputation error.

The sampling error results from probability sampling and from the fact that different results would occur for repeated samples. Likewise, the imputation error results from the generation of PVs and from the fact that different results would occur for replications of the imputation process. The uncertainty due to sampling and imputation has been accounted for in the SAE process and captured in the HB model. The model error results from the estimation of model parameters, such as area-level random effects. This type of error accounts for different results occurring for different runs of the modelling process due to the random mechanism in fitting the models. The HB method accounted for the noise contributions attributed to estimating model parameters (beta coefficients and random effects variance-covariance parameters). The prediction error results from making estimates from the final model for areas (including those without PIAAC sample cases) and is accounted for in the resulting credible intervals.

Although the modelling process allows various sources of error to propagate into the results, the following model features have been implemented to reduce the amount of error. The SRE approach reduces the amount of error associated with the survey estimates. In addition, the thorough covariate selection process results in strong models, which included the bivariate model for the proportions. The model also takes advantage of the covariance between domains by using the same covariates for each domain.

Finally, the three levels of random effects (county, state and census division, that is, the threefold model) ensure that estimates are not fully synthetic. That is, states that do not have PIAAC sample will still receive some contribution from the PIAAC sample because all census divisions have PIAAC sample. Furthermore, because the same random effect is applied to counties within a state, and to states within a division, the associations of counties within states and of states within divisions will have some impact.

The resulting estimates are accessible through the PIAAC Skills Map for State and County Indicators of Literacy and Numeracy.² The Skills Map provides interested users the ability to analyse and explore the model-based estimates through heat maps and allows for statistical comparisons between counties, between counties and their state, between states, and between states and the nation. Figure 7 shows a heat map for the county model-based estimates for the proportion at or below Level 1 numeracy.

Figure 7.

County-Level Heat Map of Model-Based Proportions at or Below Level 1 Literacy.

Footnotes

Acknowledgements

The authors are grateful for the guidance provided by J. N. K. Rao, who was a consultant on the project during both the NAAL and the PIAAC SAE processes. Also, the contributions of Weijia Ren to the evaluation and graphs were much appreciated. In addition, Partha Lahiri, William Bell and Danny Pfeffermann provided some important feedback in the early stages of model development as members of the international SAE expert panel convened by NCES. The authors greatly appreciated the reviewer comments which improved the manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research and development of the SAE models was conducted under contract to the National Center for Education Statistics.

Notes

ORCID iDs

Tom Krenzke

Andreea L. Erciulescu

References

Fay

, and Herriot

RA.

Estimates of income for small places: An application of James-Stein procedures to census data. J Am Stat Assoc 1979; 74: 269–277.

Battese

, Harter

, and Fuller

WA.

An error-components model for prediction of county crop areas using survey and satellite data. J Am Stat Assoc 1988; 83: 28–36.

Rao

JNK

, and Molina

Small area estimation . 2nd ed. (The Wiley Series in Survey Methodology). Hoboken, NJ: Wiley, 2015.

Pfeffermann

New important developments in small area estimation. Stat Sci 2013; 28: 40–68.

Pfeffermann

and Sverchkov

Small area estimation under informative probability sampling of areas and within selected areas. J Am Stat Assoc 2007; 102: 1427–1439.

Datta

, Fay

, and Ghosh

Hierarchical and empirical Bayes multivariate analysis in small area estimation. In: Proceedings of Bureau of Census 1991 annual research conference , U.S. Bureau of the Census, Washington, DC, 1991, pp. 63–79.

Benavent

and Morales

Multivariate Fay-Herriot models for small area estimation. Comput Stat Data Anal 2016; 94: 372–390.

Erciulescu

, and Opsomer

JD.

A model-based approach to predict employee compensation components. JR Stat Soc: C 2022; 71: 1503–1520. https://doi.org/10.1111/rssc.12587

Mohadjer

, Kalton

, and Krenzke

, . National assessment of adult literacy indirect county and state estimates of the percentage of adults at the lowest literacy level for 1992 and 2003 (NCES 2009-482). U.S. Department of Education. National Center for Education Statistics. Washington, DC, 2009.

10.

Mohadjer

, Rao

JNK

, Liu

, . Hierarchical Bayes small area estimates of adult literacy using unmatched sampling and linking models. J Indian Soc Agric Stat 2011; 66: 1–9.

11.

Pfeffermann

, Terryn

and Moura

FAS

. Small area estimation under a two-part random effects model with application to estimation of literacy in developing countries. Surv Methodol 2008; 34: 235–249.

12.

Gibson

and Hewson

2011 Skills for life survey: Small area estimation user guide. The United Kingdom Department for Business, Innovation and Skills research paper number 81D, 2012, https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/36076/12-1316-2011-skills-for-lifesmall-area-estimation-user-guide.pdf (accessed 24 December 2019).

13.

Yamamoto

Using PIAAC data to produce regional estimates [Unpublished manuscript] . Princeton, NJ: Educational Testing Service, 2014.

14.

Bijlsma

, Van den Brakel

, Van der Velden

, . Estimating literacy levels at a detailed regional level: An application using Dutch data. J Off Stat 2020; 36: 251–274. https://doi.org/10.2478/jos-2020-0014

15. Krenzke, Mohadjer, Li, et al. Program for the International Assessment of Adult Competencies (PIAAC): State and County Estimation Methodology Report. Report no. NCES 2020-225. Report for the U.S. Department of Education, National Center for Education Statistics. Washington, DC: U.S. Government Printing Office, 2020.

16. Krenzke, Van de Kerckhove, Thornton, et al. U.S. Program for the International Assessment of Adult Competencies (PIAAC) 2012/2014/2017: Main Study, National Supplement, and PIAAC 2017 technical report: Report for the National Center for Education Statistics. Technical report no. NCES 2020-224. Washington, DC: U.S. Department of Education, 2019.

17. Yamamoto, Khorramdel and Von Davier. Scaling PIAAC cognitive data. In: Technical Report of the Survey of Adult Skills (PIAAC) [Prepublication copy 2013], Chapter 17, pp. 406–438. Paris: Organisation for Economic Cooperation and Development, https://www.oecd.org/site/piaac/_TechnicalReport_17OCT13.pdf (accessed 24 December 2019).

18. Rampey, Finnegan, Goodman, et al. Skills of U.S. unemployed, young, and older adults in sharper focus: Results from the Program for the International Assessment of Adult Competencies (PIAAC) 2012/2014: First Look. Report no. NCES 2016-039rev. Washington, DC: U.S. Department of Education, National Center for Education Statistics, 2016.

19. Goodman, Finnegan, Mohadjer, et al. Literacy, numeracy, and problem solving in technology-rich environments among U.S. adults: Results from the Program for the International Assessment of Adult Competencies 2012: First Look. Report no. NCES 2014-008. Washington, DC: U.S. Department of Education, National Center for Education Statistics, 2013. https://nces.ed.gov/pubs2014/2014008.pdf (accessed 24 December 2019).

20. Rubin. Multiple imputation for nonresponse in surveys. New York: Wiley, 1987.

21. Wolter KM, ed. Taylor series methods and generalized variance functions. In: Introduction to variance estimation. Statistics for social and behavioral sciences. New York: Springer, 2007.

22. Särndal and Hidiroglou. Small domain estimation: A conditional analysis. J Am Stat Assoc 1989; 84: 266–275. https://doi.org/10.1080/01621459.1989.10478765

23. Ren, Li, Erciulescu, et al. A variable selection method for small area estimation modeling of the proficiency of adult competency. Stats 2022; 5: 689–713.

24. Tibshirani. The lasso method for variable selection in the Cox model. Stat Med 1997; 16: 385–395.

25. Lewandowski, Kurowicka and Joe. Generating random correlation matrices based on vines and extended onion method. J Multivariate Anal 2009; 100: 1989–2001. https://doi.org/10.1016/j.jmva.2009.04.008

26. Gelman and Rubin. Inference from iterative simulation using multiple sequences. Stat Sci 1992; 7: 457–472.

27. Rao JNK. Small area estimation. New York: Wiley, 2003.

28. Ybarra LMR and Lohr SL. Small area estimation when auxiliary information is measured with error. Biometrika 2008; 95: 919–931.

29. Bell, Chung, Datta, et al. Measurement error in small area estimation: Functional versus structural versus naïve models. Surv Methodol 2019; 45: 61–80.

An Area-Level Hierarchical Bayes Bivariate Threefold Linear Model for US State and County Indicators of Adult Skills in Literacy and Numeracy
