A Disaggregation Method for Estimation of Adult Competency in Small Domains

Abstract

Estimation at fine levels of aggregation is necessary, to better describe society. Small area estimation model-based approaches that combine sparse survey data with rich data from auxiliary sources have been proven useful to improve the reliability of estimates for small domains. Considered here is a scenario where small area model-based estimates, produced at a given aggregation level, need to be disaggregated to better describe the social structure at finer levels. For this scenario, an allocation method was developed to implement the disaggregation, overcoming challenges associated with data availability and model development at such fine levels. The method is applied to adult literacy and numeracy estimation at the county-by-group-level, using data from the U.S. Program for the International Assessment of Adult Competencies. In this application, the groups are defined in terms of age or education.

Keywords

allocation official statistics Program for the International Assessment of Adult Competencies small area estimation

1. Introduction

Making effective evidence-based policies and laws relating to adult education requires sound research based on reliable data that are most relevant to jurisdictions such as counties, states, and demographic groups within counties and states. As a multicycle international study involving over thirty countries under the leadership of the Organization for Economic Cooperation and Development (OECD), the first cycle of the Program for the International Assessment of Adult Competencies (PIAAC) was designed to provide national estimates of the proficiency of adult literacy, numeracy, and problem-solving skills. In the United States (U.S.), PIAAC is sponsored by the National Center for Education Statistics (NCES) at the Institute of Education Sciences (IES). The U.S. PIAAC survey involves a multistage probability design with in-person data collections that include a screener questionnaire, a background questionnaire, and an assessment of adult skills. Approximately 12,330 U.S. adults aging 16 to 74 and living in households were surveyed for U.S. PIAAC during three rounds of data collection fielded in 2012, 2014, and 2017. Each round of data collection involved an area probability sample, including a national sample in 2012, a national supplement in 2014 from the same selected areas as in 2012, and another national sample in 2017 from a new sample of areas. Combining the 2012/2014/2017 samples increased the number of counties with PIAAC data for the small area estimation modeling process. Sample weights were created for the combined PIAAC 2012/2014/2017 sample. The 2012–16 American Community Survey (ACS) 5-year Public Use Micro Sample (PUMS) data were used to construct control totals for calibrating the PIAAC sample weights for the combined sample. Details about the 2012/2014/2017 combined sample are provided in Krenzke et al. (2019). In the rest of the paper, we will refer to the U.S. PIAAC as PIAAC and we will refer to the 2012/2014/2017 U.S. PIAAC survey sample as PIAAC survey sample.

National survey estimates constructed using the PIAAC sample were of high quality. However, the PIAAC sample size was too small to support the production of survey estimates at disaggregate levels. To address the need for disaggregated statistics, Krenzke et al. (2020) and Li et al. (2022) developed hierarchical Bayes model-based small area estimation (SAE) methodology to produce county, state, and state by age and education groups estimates for average scores for literacy and numeracy, and various proficiency levels. The methodology used the PIAAC survey data and auxiliary data from the ACS. The resulting SAE estimates provide reliable U.S. official statistics of adult literacy and numeracy skills in all fifty states, all 3,141 counties, and the District of Columbia, and in all fifty states and the District of Columbia by six age groups, 16 to 24, 25 to 34, 35 to 44, 45 to 54, 55 to 64, and 65 to 74 year old, and four education groups, less than high school, high school diploma or an alternative to high school diploma, some college (no degree or attained associate’s degree), and bachelor’s degree or higher.

Complementing the estimation of county, state, and state by age and education groups adult proficiency, this paper further addresses the need for disaggregated statistics. Specifically, county by group estimation of adult proficiency is addressed, where the quantities of interest are the same as the ones considered in Krenzke et al. (2020) and Li et al. (2022), and the groups of interest are the same ones considered in Li et al. (2022). A deterministic method is used to allocate the previously available state by group estimates to the county by group level, for all the counties nested within the corresponding state, using the ratios of county to state estimates. The allocation is applied to the posterior samples available for the state, county, and state by group quantities of interest, resulting in pseudo-posterior distributions for the county by group quantities of interest. The resulting county by group-level estimates are available in a publicly available interactive tool known as the Skills Map and available at https://nces.ed.gov/surveys/piaac/skillsmap/, along with the previously available county, state, and state by group-level estimates, for all outcomes of interest, the proportions at or below Level 1 (P1), at Level 2 (P2), and at or above Level 3 (P3), and average scores.

The rest of the paper is organized as follows. In Section 2, we describe the data available from PIAAC, the Skills Map, and the ACS, serving as input into the county by group-level estimation process. The allocation method is presented in Section 3 and validation metrics are provided in Section 4. A discussion is provided in Section 5. Additional details about the allocation method are included in the Appendix. Unless otherwise noted, selected results are presented throughout the manuscript only for one quantity of interest, the proportion at or below Level 1 literacy, and for one demographic group, the population with less-than-high-school education. Results for all quantities of interest and for all domains of interest are comparable and are available in the Skills Map and in the report Erciulescu et al. (2022).

2. Data Available at Various Levels of Aggregation

Person-level survey data are available from PIAAC and are used to produce survey estimates for the county by group domains with sample data. Because these data are sparse at disaggregated levels such as county by group, an indirect estimation approach is developed. The indirect estimation approach uses data produced as the output of SAE models for higher levels of aggregation, that is, county, state, and state by group. The survey estimates and select auxiliary data are used in the validation process of the indirect county by group estimates. Selected details about the PIAAC survey, model, and auxiliary data are provided in the rest of this section. Providing further details is beyond the purposes of this paper. See Krenzke et al. (2020) and Li et al. (2022) for additional details, and Krenzke et al. (2024) for a succinct version of Krenzke et al. (2020).

2.1. Survey Estimates

Following Li et al. (2022), a multiple imputation approach (Rubin 1987, 1996) is implemented for constructing the survey estimates and the associated variance estimates. The PIAAC micro-level data includes ten plausible values generated from a posterior distribution by combining item response theory scaling of the cognitive items with a latent regression model using information from the background questionnaire in a population model (Hogan et al. 2016; Krenzke et al. 2019; Mislevy and Sheehan 1989, 1992). For each plausible value, Háyek-type estimates and associated Taylor-series approximated variance estimates (Särndal et al. 1992) are constructed for the county by group-level quantities of interest, with survey weights calibrated to 2013 to 2017 ACS control totals for age groups, education levels, gender, and race/ethnicity within state. The variance units or clusters are assumed to be the secondary sampling units, which are groups of census blocks. Variances for domains nested within a variance cluster are nonestimable. Therefore, for county-by-group domain estimation, the methodology in Krenzke et al. (2020) and Li et al. (2022) is not directly applicable.

Let $m$ denote the plausible value, $j$ denote the state, $k$ denote the county, and $g$ denote the group. Then, the county by group survey estimate for plausible value $m$ is defined as

Y_{g, j k}^{(m)} = \frac{\sum_{a = 1}^{n_{g, j k}} w_{g, j k a} y_{g, j k a}^{(m)}}{\sum_{a = 1}^{n_{g, j k}} w_{g, j k a}},

where $w_{g, jka}$ is the sampling weight for person $a$ , $y_{g, jka}^{(m)}$ is the proficiency score (for average) or an indicator variable for the proficiency level (for proportions), and $n_{g, jk}$ is the number of cases in the domain. Then, the estimates for the ten plausible values are averaged to produce the county by group-level survey estimates,

Y_{g, jk} = 10^{- 1} \sum_{m = 1}^{10} Y_{g, jk}^{(m)} .

To complete the survey estimation process, the variances of the county by group-level survey estimates are estimated using the multiple imputation approach by combining the within-imputation and between-imputation variances,

σ_{g, jk}^{2} = σ_{W, g, jk}^{2} + \frac{11}{10} σ_{B, g, jk}^{2},

where $σ_{W, g, jk}^{2}$ is the within-imputation variance and $σ_{B, g, jk}^{2}$ is the between-imputation variance. The within-imputation variance is computed as the average of the sampling variances for the ten plausible values,

σ_{W, g, jk}^{2} = 10^{- 1} \sum_{m = 1}^{10} σ_{g, jk}^{2 (m)},

where $σ_{g, jk}^{2 (m)}$ is the sampling variance of the estimated mean or proportion for plausible value $m$ . The between-imputation variance is computed as follows:

σ_{B, g, jk}^{2} = 9^{- 1} \sum_{m = 1}^{10} {(Y_{g, jk}^{(m)} - Y_{g, jk})}^{2} .

The number of county by group domains with sample data and the distributions of county by group domain sample sizes are provided in Table 1. With a small number of county-level survey estimates available for the groups, an indirect estimation approach is deemed necessary to produce estimates for all the county by age/education group domains. The medians of the county by group domain sample sizes range from 6.5 to 16, indicating that the survey estimates are subject to great uncertainty in most county by group domains. As illustrated in Table 2, the median coefficients of variation (CVs) for county-level survey estimates of literacy P1 for age and education groups, ranges from 36.6% to 132.2%. In addition, some of the CVs are not available, because the associated survey estimates are either zero or have nonestimable variances. In addition to producing estimates for all the domains of interest, the indirect estimation approach described in the next section helps reduce the uncertainty of the survey estimates. For the rest of the manuscript, let the county by group domains with sample data be denoted by in-sample domains and let the county by group domains without sample data be denoted by not-in-sample domains.

Table 1.

County-Level Sample Size Distributions for Age and Education Groups: 2012/2014/2017 PIAAC.

Age and education groups	Number of counties with sample	Number of respondents
Age and education groups	Number of counties with sample	Minimum	10th percentile	Median	90th percentile	Maximum	Mean
16–24	172	1	2	10	31	82	15
25–34	177	1	3	11	33	96	16
35–44	180	1	2	9.5	21.5	56	11
45–54	178	1	2	10	21	72	11
55–64	180	1	2	10	18	45	10
65–74	176	1	1	6.5	14	31	7
Less than high school	175	1	2	8	25	67	12
High school diploma or an alternative to high school diploma	180	1	4	16	42.5	86	20
Some college	182	1	3	16	38	114	19
Bachelor’s degree or higher	179	1	3	13	36	115	18

Table 2.

Distribution of Coefficients of Variation (CVs) for County-Level Survey Estimates for Age and Education Groups for Literacy Proportion at or Below Level 1: 2012/2014/2017 PIAAC.

Age and education groups	Percentile					Number of counties with missing CVs
Age and education groups	20	40	50 (median)	60	80	Number of counties with missing CVs
16–24	52.5	67.1	77.4	89.0	122.2	25
25–34	47.3	71.7	78.6	87.4	112.0	17
35–44	48.9	69.3	77.7	85.4	116.6	23
45–54	44.1	61.8	73.3	80.5	108.7	19
55–64	47.9	64.8	71.2	80.4	105.6	16
65–74	39.1	56.0	63.1	74.2	109.8	33
Less than high school	21.5	32.1	36.6	42.2	61.5	12
High school diploma or an alternative to high school diploma	31.8	41.0	47.4	50.7	65.3	1
Some college	49.8	63.0	75.4	86.8	123.3	5
Bachelor’s degree or higher	75.2	110.2	132.2	148.9	241.0	42

2.2. Model-Based Estimates

Posterior distributions for the adult proficiency quantities of interest at the county, state, and state by group levels were produced in Krenzke et al. (2020) and Li et al. (2022) using area-level hierarchical Bayes linear models. Point estimates with associated variance estimates and credible intervals are directly available to download from the Skills Map. Also produced were 4,500 posterior samples for the adult proficiency quantities of interest. These posterior samples are used in this manuscript to construct the county by group-level estimates using an indirect estimation approach. High level details about the production of these model estimates are presented in the rest of this section. See Krenzke et al. (2020) and Li et al. (2022) for additional information.

In Krenzke et al. (2020), the proportions P1 and P3 were modeled using a bivariate model specified as follows:

\begin{matrix} P_{ijk} & ~ & N (θ_{ijk}, Σ_{ijk}) \\ θ_{ijk} & ~ & X_{ijk} β + c_{ijk} + v_{ij} + d_{i}, \end{matrix}

where $i$ is an index for the census division, $j$ is an index for the state, $k$ is an index for the county, $P_{ijk}$ is a jointly normally distributed bivariate vector of survey regression estimates (Rao and Molina 2015, 21–23) for proportions at or below Level 1 ( $P_{1}$ ) and at or above Level 3 ( $P_{3}$ ), with mean $θ_{ijk}$ and associated estimated variance-covariance matrix $Σ_{ijk}$ , $X_{ijk}$ is a matrix of covariates, $β$ is a matrix of regression coefficients, and $c_{ijk}, v_{ij}, d_{i}$ are county-level, state-level, and division-level random effects, respectively. The estimated variance-covariance matrices $Σ_{ijk}$ are the result of a variance smoothing process and treated as fixed and known. The random effects are mutually independent, following bivariate normal distributions with mean 0 and unknown variance-covariance matrices. Independent, weakly-informative prior distributions were adopted for the regression coefficients and the variances of the random effects. The averages were modeled using a univariate model specified as follows:

\begin{matrix} Y_{ijk} & ~ & N (θ_{ijk}, σ_{ijk}^{2}) \\ θ_{ijk} & ~ & x'_{ijk} β + c_{ijk} + v_{ij} + d_{i}, \end{matrix}

where $Y_{ijk}$ is the survey regression estimate of average literacy or numeracy scores at the county level, assumed normally distributed with mean of $θ_{ijk}$ and associated estimated variance $σ_{ijk}^{2}$ , $x_{ijk}^{'}$ is a vector of covariates, $β$ is a vector of regression coefficients, and $c_{ijk}, v_{ij}, d_{i}$ are county-level, state-level, and division-level random effects, respectively. The estimated variances $σ_{ijk}^{2}$ are the result of variance smoothing process and treated as fixed and known. The random effects are mutually independent, following normal distributions with mean 0 and unknown variances. Independent, weakly-informative prior distributions were adopted for the regression coefficients and the variances of the random effects. Similar to the survey direct estimation approach described in Section 2.1, plausible values were used in the construction of the survey regression estimates and their associated variance estimates, serving as inputs into the models. Irrespective of the outcome variable, the final covariates used in these models were percentage of population age 25 and over with less than high school education, percentage of population age 25 and over with more than high school education, percentage of population below 100% of the poverty line, percentage of Black and African American population, percentage of Hispanic population, percentage of civilian noninstitutionalized population who has no health insurance coverage, and percentage of population age 16 and over with service occupations.

In Li et al. (2022), for each group $g$ , each of the proportions P1 and P3 was modeled using a univariate model specified as follows:

\begin{matrix} P_{j, g} & ~ & N (θ_{j, g}, σ_{j, g}^{2}) \\ θ_{j, g} & ~ & x'_{j, g} β_{g} + v_{j, g}, \end{matrix}

where $j$ is an index for the state, $g$ is an index for the group, $P_{j, g}$ is the arcsine square root transformed survey estimate for proportion at or below Level 1 ( $P_{1}$ ) or at or above Level 3 ( $P_{3}$ ), with mean $θ_{jg}$ and associated estimated variance $σ_{j, g}^{2}$ , $x_{jg}^{'}$ is a vector of covariates, $β_{g}$ is a vector of regression coefficients, and $v_{jg}$ are state-level random effects. The estimated variance $σ_{j, g}^{2}$ depends only on the effective sample size (as a result of the arcsine square root transformation) and treated as fixed and known. The random effects follow a normal distribution with mean 0 and unknown variance. Independent, weakly-informative prior distributions were adopted for the regression coefficients and the standard deviation of the random effects. The averages were modeled using a univariate model specified as follows:

\begin{matrix} y_{j, g} & ~ & N (θ_{j, g}, σ_{j, g}^{2}) \\ θ_{j, g} & ~ & x'_{j, g} β_{g} + v_{j, g}, \end{matrix}

where $j$ is an index for the state, $g$ is an index for the group, $y_{j, g}$ is the survey estimate for average literacy or numeracy scores in group $g$ at the state level, with mean $θ_{jg}$ and associated estimated variance $σ_{j, g}^{2}$ , $x_{j, g}^{'}$ is a vector of covariates, $β_{g}$ is a vector of regression coefficients, and $v_{j, g}$ are state-level random effects. The estimated variance $σ_{j, g}^{2}$ are the result of smoothing functions of the variances for the survey estimates and treated as fixed and known. The random effects follow a normal distribution with mean 0 and unknown variance. Independent, weakly-informative prior distributions were adopted for the regression coefficients and the variance of the random effects. Similar to the survey direct estimation approach described in Subsection 2.1, plausible values were used in the construction of the survey estimates and their associated variance estimates, serving as inputs into the models. Irrespective of the outcome variable, the final covariates used in these models were percentage of population with less than high school education, percentage of population with more than high school education, percentage of population below 100% of the poverty line, percentage of Black and African American population, percentage of Hispanic population, percentage of civilian noninstitutionalized population who has no health insurance coverage, percentage of population age 16 and over with service occupations, and percentages of population for age groups 16 to 24, 25 to 34, 35 to 44, 45 to 54, 55 to 64, and 65 to 74.

Without loss of generality in notation, let $Y_{j}^{(b)}, Y_{jk}^{(b)}$ , and $Y_{j, g}^{(b)}$ denote the state-level, county-level, and state by group-level posterior samples, respectively. The variable $Y$ denotes any of the quantities of interest, P1, P2, P3, or average, $j$ indexes states, $k$ indexes counties, $g$ indexes groups, and $b$ indexes posterior samples. These samples are available for all the counties and states, and for all the age/education groups of interest.

The quality of the county by group-level estimates is dictated by the quality of the county-level, state-level, and state by group-level model-based estimates, since the latter serve as the only inputs into the model used to produce the former, to be described in the next section. Uncertainty statistics for model-based estimates of literacy P1 are provided in Tables 3 and 4. Recall that these model-based estimates were produced using hierarchical Bayes small area estimation models. The median credible interval width is 8.0% for county-level model-based estimates of P1, 6.1% for state-level model-based estimates, and 21.1% for state-level model-based estimates of literacy P1 for the less-than-high-school education group. In addition, the median CV is 10.0% for the county-level model-based estimates of P1, 8.1% for the state-level model-based estimates of P1, and 11.8% for state-level model-based estimates of literacy P1 for the less-than-high-school education group.

Table 3.

Distribution of Credible Interval Widths and Coefficients of Variation for County-Level and State-Level Model-Based Estimates for Literacy Proportion at or Below Level 1: 2012/2014/2017.

Statistics	Percentile
Statistics	20	40	50 (median)	60	80
County estimates for all domains
95% credible interval width (%)	7.2	7.7	8.0	8.3	9.4
Coefficient of variation (%)	7.8	9.1	10.0	10.9	13.5
County estimates for in-sample domains
95% credible interval width (%)	6.6	7.0	7.2	7.4	8.1
Coefficient of variation (%)	7.4	8.7	9.3	10.0	12.8
County estimates for not-in-sample domains
95% credible interval width (%)	7.2	7.8	8.0	8.4	9.5
Coefficient of variation (%)	7.8	9.1	10.0	10.9	13.6
State estimates for all domains
95% credible interval width (%)	5.2	5.7	6.1	6.4	6.9
Coefficient of variation (%)	6.3	7.0	8.1	8.6	10.3

Table 4.

Distribution of Credible Interval Widths and Coefficients of Variation for State-Level Model-Based Estimates for the Less Than High School Group and for Literacy Proportion at or Below Level 1: 2012/2014/2017 PIAAC.

Statistics for less than high school	Percentile
Statistics for less than high school	20	40	50 (median)	60	80
State estimates for all domains
95% credible interval width (%)	18.6	20.5	21.1	22.0	24.6
Coefficient of variation (%)	9.8	11.2	11.8	12.8	17.0
States estimates for in-sample domains
95% credible interval width (%)	18.3	20.4	21.0	22.0	24.5
Coefficient of variation (%)	9.8	10.7	11.5	12.4	15.9
States estimates for not-in-sample domains
95% credible interval width (%)	20.7	21.4	21.5	22.7	24.8
Coefficient of variation (%)	12.7	13.0	13.4	15.6	17.7

Finally, it is important to note that the posterior distributions for the county-level and state-level adult proficiency quantities were produced assuming normal distribution for the data level in the hierarchical Bayes model specified for the survey estimates (see Krenzke et al. 2020). Hence, some of the posterior samples $Y_{j}^{(b)}, Y_{jk}^{(b)}$ are negative. Unlike the county-level estimation approach adopted in Krenzke et al. (2020), the state by group-level hierarchical Bayes estimation approach assumed a normal distribution for transformed survey estimates which after back-transformation led to positive posterior samples $Y_{j, g}^{(b)}$ (see Li et al. 2022).

3. Allocation Method

The county-level indirect estimates for groups are constructed using an allocation method. This method consists of using the county-level, state-level, and state by group-level SAE model-based estimates and allocating them to the county by group domains of interest via a deterministic model. Model-based estimates for P1 and P3 are constructed first, and then their sum is subtracted from one to obtain the model-based estimates for P2; recall that P1 + P2 + P3 = 1 by definition. Also, literacy and numeracy measures are estimated independently. These approaches are consistent with the approaches in Krenzke et al. (2020) and Li et al. (2022).

Following the notation from the previous section, let the pseudo-posterior samples for the county by group-level quantities of interest be defined as

Y_{jk, g}^{(b)} : = Y_{j, g}^{(b)} \frac{Y_{jk}^{(b)}}{Y_{j}^{(b)}},

(1)

where $Y$ now denotes either P1, P3, or the average. The implicit working assumption is that the county-to-state ratio of posterior samples is constant across the age and education groups; see the Appendix for details about the working assumption and its evaluation. Pseudo-posterior samples for the county by group-level P2 are then defined as $1 - P 1_{jk, g}^{(b)} - P 3_{jk, g}^{(b)}$ . Posterior summaries, such as means, variances, and credible intervals are constructed using the pseudo-posterior distributions described above. Posterior means for P1, P2, and P3 that are below 0 or above 1 are set equal to 0 or 1, respectively.

Among the 175 counties with sample data for the less-than high-school education group, there were three P1 model-based literacy estimates greater than 1. Among the 2,967 counties without sample data for the less-than-high-school education group, there were forty-two P1 model-based literacy estimates greater than 1. None of the county-level P1 model-based estimates for the less-than-high-school education group were below 0. No adjustment was needed for the average score model-based estimates because all were positive. The P1 literacy model-based county by group-level estimates for the population with less than high school education are illustrated in the map in Figure 1.

Figure 1.

Map of county-level model-based estimates for the proportion at or below Level 1 literacy for less-than-high-school education group (as percentages).

4. Validation of Final Estimates

The survey estimates constructed for the county by age/education groups serve as inputs into the validation process, along with the selected ACS estimates related to proficiency described in Section 2. Three validation metrics are considered: (1) visual displays of the magnitude and direction from survey estimates to model-based estimates, (2) visual displays of the relationship between the proficiency estimates and related ACS variables, and (3) uncertainty measures for the model-based estimates.

The deviation plots in Figure 2 show the magnitude and direction from survey estimates to model-based estimates of literacy proportions for the less-than-high-school education group, by sample size. The starting point of the arrow is the survey estimate and the end point of the arrow is the model estimate. Short arrows correspond to small differences between the survey and the model-based estimates and long arrows correspond to large differences between the survey and the model-based estimates. The deviation is more substantial in domains with smaller sample sizes than those in domains with larger sample sizes. Arrows pointing upward correspond to negative differences between the survey and the model-based estimates and arrows pointing downward correspond to positive differences between the survey and the model-based estimates. Both positive and negative differences are observed.

Figure 2.

Literacy proportion (less than high school)—Deviation plots of point estimates by sample size.

Two ACS variables related to adult proficiency are now selected in the validation of the county by group-level estimates. They are available from the initial pool of variables considered by Krenzke et al. (2020) for the SAE models that combined PIAAC survey data with ACS data. The first variable is the proportion of population below poverty and the second variable is the proportion of population employed. The former variable is one of the seven covariates included in the SAE models in Krenzke et al. (2020) and Li et al. (2022). While the latter variable was not included as a covariate in the SAE models in Krenzke et al. (2020) or Li et al. (2022), it was part of the initial pool of variables considered in the development of those models. The proportion of population employed was not included in the final SAE models because this variable was highly correlated with other model covariates. These variables are available from the 2013 to 2017 ACS, for all the county by group domains of interest. Both the ACS estimates and the PIAAC estimates are “period” estimates, being constructed using combined samples across multiple years.

Scatterplots of estimates of literacy P1 for the less-than-high-school education group are illustrated in Figure 3, against the selected ACS variables (proportion of population below poverty, proportion of population employed). In general, similar range of the survey estimates and model-based estimates, and similar relationship between the ACS variables and the estimates are observed, with this relationship being clearer between the model-based estimates and the ACS variables than between the survey estimates and the ACS variables. This result is expected because the county by group-level model-based estimates are functions of the county, state, and state by group-level model-based estimates, which are themselves functions of, or related to, the ACS variables considered here; recall that the county, state, and state by group-level model-based estimates are composites of survey estimates (when available) and ACS estimates (as described in section 1) which include estimated proportions of population below poverty and other covariates related to the proportion of population employed (as described above).

Figure 3.

Literacy proportion at or below Level 1 for less-than-high-school education group versus selected ACS variables—County by group-level estimates: 2012/2014/2017 PIAAC.

As reported in Table 5, the median credible interval width is 24.5% for county-level estimates of literacy P1 for the less-than-high-school education group, where the median is taken over all the county by group domains in the U.S. When these county by group domains are categorized by sample availability, the median credible interval widths are 22.7% and 24.7% for in-sample and not-in-sample domains, respectively.

Table 5.

Distribution of Credible Interval Widths and Coefficients of Variation for County-Level Model-Based Estimates for Less Than High School for Literacy Proportion at or Below Level 1: 2012/2014/2017 PIAAC.

Statistics for less than high school	Percentile
Statistics for less than high school	20	40	50 (median)	60	80
County estimates for all domains
95% credible interval width (%)	21.1	23.4	24.5	25.9	29.4
Coefficient of variation (%)	10.7	12.1	12.9	14.0	17.7
County estimates for in-sample domains
95% credible interval width (%)	19.6	21.5	22.7	24.0	27.9
Coefficient of variation (%)	10.3	11.6	12.3	12.8	16.3
County estimates for not-in-sample domains
95% credible interval width (%)	21.2	23.5	24.7	25.9	29.5
Coefficient of variation (%)	10.7	12.1	13.0	14.1	17.7

The CVs for the county-level model-based estimates of literacy P1 for the less-than-high-school education group are also summarized in Table 5 by sample availability. For most of these county by group domains, the CVs are lower than 20%. All CVs reported in Table 5 for in-sample domains are less than half of the corresponding CVs for the survey estimates reported previously in Table 2. This result is expected because the uncertainty in the model-based county-level, state-level, and state by group-level estimates used as inputs into the allocation method was lower than the uncertainty in the corresponding survey estimates (see Krenzke et al. 2020 and Li et al. 2022). Also, the CVs reported in Table 5 are larger than the CVs for the state-level model-based estimates of literacy P1 for the less-than-high-school education group reported previously in Table 4: the state-level estimates by group have a median CV of 11.8% compared to 12.9% for the county-level estimates by group. This result is expected because the level of aggregation in this paper is finer than the level of aggregation in the cited report (county by group vs. state by group).

5. Concluding Remarks and Discussion

An allocation approach was developed to produce county-level model-based estimates of adult proficiency for six age groups and four educational attainment groups. Previous model-based SAE estimates, constructed at the county, state, and state by group levels, served as inputs into the allocation approach. The resulting county by group-level estimates are available in the Skills Map at https://nces.ed.gov/surveys/piaac/skillsmap/, along with the previously available county, state, and state by group-level estimates.

Due to challenges related to data availability at the county by group level, a top-down estimation approach was adopted in this manuscript, unlike the estimation approach in Krenzke et al. (2020) and Li et al. (2022). By starting with the available estimates at the county, state, and state by group levels, the proposed method allows for estimation at the county by group level, with pseudo-posterior distributions computed for all the quantities of interest at this fine level of aggregation. No benchmarking adjustments are necessary to the county by group-level estimates, because upon aggregation to higher levels these estimates are consistent with the previously available estimates that served as starting values in the estimation process. While the proposed method is based on the exact definition of the quantities of interest, an approximation of this definition had to be implemented due to a lack of consistently measured and reliable population estimates at nested levels of aggregation. The evaluation of the implicit working assumption and resulting estimates did not raise any serious concerns.

Because this manuscript addresses the production of reliable estimates for small domains, where there was insufficient sample data available, there are similarities between the presented estimation approach and small area estimation approaches (see Rao and Molina 2015). However, unlike many small area estimation studies, the presented estimation approach did not start at the level of aggregation at which estimates were needed, or at a lower level of aggregation followed by an aggregation step to higher levels. Also, unlike a small area estimation study, the presented estimation approach did not explicitly address variance reduction in the survey estimates at the level of interest. The observed variance reduction was a result of the variance reduction already achieved by Krenzke et al. (2020) and Li et al. (2022). To note, an area-level small area estimation model specified for the county by group-level survey estimates would have addressed variance reduction explicitly. However, in addition to good quality for the survey variance estimates, the predictive power of such a model would have needed a strong set of available predictors for all the domains of interest. In addition, benchmarking adjustments would have been needed for the resulting estimates to be consistent with the previously published estimates, and any aggregation to higher levels implies availability of consistent population totals (to serve as aggregation weights).

Footnotes

Appendix

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Andreea L. Erciulescu

Received: December 2022

Accepted: November 2024

References

Erciulescu

Ren

Krenzke

Mohadjer

Fay

2022. “Program for the International Assessment of Adult Competencies (PIAAC): County-Level Estimation for Age and Education Groups Methodology Report.” NCES 2023-004, U.S. Department of Education, National Center for Education Statistics. U.S. Government Publishing Office. https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2023004 (accessed October 31, 2022).

Hogan

Thornton

Diaz-Hoffmann

Mohadjer

Krenzke

van de Kerckhove

, et al. 2016. “U.S. Program for the International Assessment of Adult Competencies (PIAAC) 2012/2014: Main Study and National Supplement Technical Report.” NCES 2016-036REV, U.S. Department of Education. National Center for Education Statistics. https://nces.ed.gov/pubs2016/2016036.pdf (accessed November 6, 2024).

Krenzke

Erciulescu

A. L.

Mohadjer

Fay

2024. “An Area-Level Hierarchical Bayes Bivariate Three-Fold Linear Models for U.S. State and County Indicators of Adult Skills in Literacy and Numeracy.” Calcutta Statistical Association Bulletin 76 (1): 160–83. DOI: https://doi.org/10.1177/00080683231199020.

Krenzke

Mohadjer

Erciulescu

A. L.

Fay

Ren

van de Kerckhove

, et al. 2020. “Program for the International Assessment of Adult Competencies (PIAAC): State and County Estimation Methodology Report.” NCES 2020-225, U.S. Department of Education, National Center for Education Statistics. U.S. Government Printing Office. https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2020225 (accessed October 26, 2020).

Krenzke

van de Kerckhove

Thornton

Diaz-Hoffmann

Hogan

Mohadjer

, et al. 2019. “U.S. Program for the International Assessment of Adult Competencies (PIAAC) 2012/2014/2017: Main Study, National Supplement, and PIAAC 2017 Technical Report.”U.S. Department of Education, National Center for Education Statistics. U.S. Government Printing Office. https://nces.ed.gov/pubs2020/2020224.pdf (accessed August 19, 2023).

Krenzke

Ren

Mohadjer

Fay

Erciulescu

2022. “Program for the International Assessment of Adult Competencies (PIAAC): State-Level Estimation for Age and Education Groups Methodology Report.” NCES 2022-050, U.S. Department of Education, National Center for Education Statistics. U.S. Government Printing Office. https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2022050 (accessed September 2, 2022).

Mislevy

R. J.

Beaton

A. E.

Kaplan

Sheehan

K. M.

1992. “Estimating Population Characteristics from Sparse Matrix Samples of Item Responses.”Journal of Educational Measurement 29 (2): 133–61. https://www.jstor.org/stable/1434599 (accessed November 7, 2024).

Mislevy

R. J.

Sheehan

K. M.

1989. “The Role of Collateral Information About Examinees in Item Parameter Estimation.”Psychometrika 54: 661–79. DOI: https://doi.org/10.1007/BF02296402.

Rao

J. N. K.

Molina

2015. Small Area Estimation. 2nd ed. Hoboken, NJ: John Wiley & Sons.

10.

Rubin

1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

11.

Rubin

1996. “Multiple Imputation After 18+ Years.”Journal of the American Statistical Association 91: 473–89. DOI: https://doi.org/10.1080/01621459.1996.10476908.

12.

Särndal

C.-E.

Swensson

Wretman

1992. Model Assisted Survey Sampling. New York: Springer.