Abstract

Keywords
1. Introduction
The field of survey design has seen a growing trend, both in the United States and internationally, toward producing state- or region-specific estimates from historically national-only surveys. Approaches for achieving this must balance the dual objectives of maintaining robust national estimates and producing reliable regional estimates without significantly increasing survey costs. One notable example of this trend is the 2016 redesign of the U.S. National Crime Victimization Survey (NCVS) (Langton et al., 2017). The redesign enabled the production of direct state- and local-level victimization estimates for the twenty-two largest states as well as specific metropolitan areas within those states. By refining its sampling and reporting strategies, the NCVS preserved its national scope while providing more granular insights into state crime trends. The trend of tailoring national surveys to provide subnational insights is not limited to the United States. For example, Larmarange and Bendaud (2014) outlined approaches to generate subnational HIV epidemiology estimates using data from the Demographic and Health Surveys (DHS) Program, which spans multiple countries. Their work underscores both the challenges and the opportunities of extracting detailed regional estimates from national-level data.
While the demand for granular, high-quality estimates to meet localized needs continues to grow, national statistical offices (NSOs) face difficult challenges, especially because of ever-rising survey costs. Such financial pressures require NSOs to explore innovative methods for reducing survey expenditures without sacrificing data quality or scope. One solution to this dilemma is the blending of survey and administrative data (see, e.g., NASEM 2023). By leveraging the information from multiple data sources, data blending can help produce more granular, detailed, and cost-effective estimates.
In the quest to combine survey and administrative data sources effectively, two primary approaches—data integration and data blending—are often discussed. While they share the common goal of enhancing analytical capacity through leveraging data from disparate sources, they differ fundamentally in execution and application. Data integration is the process of merging multiple datasets into a single, coherent dataset. This method requires harmonizing variables, addressing inter-file inconsistencies, and generally ensuring the resulting derived dataset can be analyzed unambiguously. A common example is linking individual records from surveys with administrative databases using shared identifiers such as social security numbers or healthcare IDs. Data blending, on the other hand, focuses on combining datasets that can only be linked at some higher level of aggregation—such as a county or province—for the purpose of creating specific estimates directly. For example, national-level estimates might be created by combining subnational estimates after weighting them via administrative data. Thus, rather than linking individual records, blending employs statistical methods like predictive modeling and statistical weighting that use the external data to extend inferences from a more restricted survey target population to an unrestricted national target population. A particularly promising but underutilized application of data blending involves redesigning national surveys with administrative data in mind, ensuring that both national and subnational estimates can be derived with high precision and acceptable accuracy. This approach allows for the optimization of each data source—leveraging the relevance and methodological rigor of survey data and the scope and timeliness of administrative records—while minimizing survey costs and errors.
Two blending methodologies, quasi-randomization weighting (QRW) and superpopulation modeling (SPM), are described in this paper. These approaches were recently evaluated in an extensive study documented in Biemer et al. (2024) for the fourth cohort of the National Survey of Child and Adolescent Well-being (abbreviated as S&NSCAW), which began fielding in 2025. Here the “S” stands for “state” to emphasize the change from a national-only survey to one that provides state estimates as well. The design of the S&NSCAW marks a significant shift from the national probability sampling of earlier cohorts, as it draws its sample only from a small number (denoted by L) of states that were purposively selected for their intrinsic relevance to researchers and policy analysts. An important consequence of purposive selection is biased national-level estimation, which, as shown in Biemer et al. (2024), can be addressed by employing the blending methodologies. Section 2 summarizes key results from applying QRW and SPM to the S&NSCAW. Section 3 discusses the findings supporting the effectiveness of both approaches, explains how to choose between them for creating blended estimates, and outlines several future research areas to improve the quality of these estimates.
2. Summary of Results
2.1. Quasi-Randomization Weighting
Using data from two prior cohorts of the survey, NSCAW I and NSCAW II, as well as administrative data from the National Child Abuse and Neglect Data System (NCANDS), the quality of various types of QRW and SPM estimates was evaluated (see Biemer et al. 2024 for a comprehensive report on the results). NCANDS is an annual, comprehensive data source on child welfare investigations from nearly all U.S. states, including demographics, maltreatment types, services provided, and case outcomes. NCANDS variables were used as model covariates for SPM as well as for calibration weighting in the QRW approach. Because of space constraints, only the QRW results for the NSCAW I data are summarized here; the NSCAW II results are similar and can be found in Biemer et al. (2024).
Initially, the data for the NSCAW I analysis were restricted to eligible children in the eight largest U.S. states (as determined from the NCANDS data), referred to as the set A. This restriction is consistent with the preliminary design of the S&NSCAW, which is confined to the ten largest U.S. states. The QRW approach is applied to the sample from A (denoted by SA), which consists of the children randomly selected for NSCAW I from each of the eight states in A. Thus, QRW regards SA as a “quasi-random sample” from the full population, denoted by U, comprising all fifty states and DC. The probability that a child in U is selected for the sample SA is the product of probabilities across three stages: (a) selecting the child’s PSU from U to be included in A, (b) selecting the child’s PSU for the sample SA, and (c) selecting the child within the PSU in SA. Because the states in A are selected purposively, probability (a) is unknown and must be estimated from NCANDS data, while probabilities (b) and (c) are known from the NSCAW I design.
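In symbols, the three-stage selection probability described above can be written as follows. The notation here ($\pi$, the stage superscripts, and the PSU index $j(i)$) is introduced only for illustration and is not taken from Biemer et al. (2024).

```latex
\pi_i \;=\; \pi^{(a)}_{j(i)} \times \pi^{(b)}_{j(i)} \times \pi^{(c)}_{i \mid j(i)},
\qquad i \in S_A ,
```

where $j(i)$ denotes the PSU containing child $i$, $\pi^{(a)}_{j(i)}$ is the unknown probability that this PSU falls in A (to be estimated from NCANDS data), $\pi^{(b)}_{j(i)}$ is the known probability that the PSU is selected for SA, and $\pi^{(c)}_{i \mid j(i)}$ is the known probability that the child is selected within the PSU.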
Two QRW methods were evaluated for estimating (a). Method 1 is a PSU-level logistic regression approach that used NCANDS data to model (a). Method 2 does not use a model; rather, it assigns each PSU in A the theoretical selection probability it would have had if it had been sampled from U by a probability proportional to size (PPS) scheme similar to the scheme used for A. Finally, the so-called pseudo-base weights yielded by each method, defined as the inverse of the product of (a), (b), and (c), were adjusted for nonresponse and noncoverage using calibration factors that were also derived from NCANDS data.
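The weighting steps above can be illustrated with a minimal Python sketch. It is not the procedure used in Biemer et al. (2024): the function names, the capped PPS formula, the single-total ratio calibration, and all numeric inputs are assumptions made for illustration, and a real application would calibrate to several NCANDS control totals.

```python
def pps_inclusion_prob(size, total_size, n_draws):
    """Approximate PPS inclusion probability for one PSU, capped at 1.

    Simplification (assumption): no iterative handling of certainty PSUs.
    """
    return min(1.0, n_draws * size / total_size)


def pseudo_base_weight(p_a, p_b, p_c):
    """Inverse of the product of the three stage probabilities (a), (b), (c)."""
    return 1.0 / (p_a * p_b * p_c)


def ratio_calibrate(weights, control_total):
    """Scale weights so that they sum to one external control total."""
    factor = control_total / sum(weights)
    return [w * factor for w in weights]


# Hypothetical inputs, for illustration only.
p_a = pps_inclusion_prob(size=200, total_size=1000, n_draws=2)  # stage (a)
sample = [(0.5, 0.10), (0.5, 0.10), (0.5, 0.20)]  # known (p_b, p_c) per child
base = [pseudo_base_weight(p_a, p_b, p_c) for p_b, p_c in sample]
final = ratio_calibrate(base, control_total=150.0)
print(base)
print(final)
```

Under Method 1, `p_a` would instead be a predicted probability from a PSU-level logistic regression on NCANDS covariates; the remaining steps would be unchanged in this sketch.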
QRW national estimates based solely on SA and the pseudo-weights were computed for twenty-six NSCAW dichotomized variables. In all, four types of estimates were computed from SA: (1) unweighted estimates, (2) estimates based on the original (unadjusted) NSCAW weights, (3) QRW estimates using Method 1, and (4) QRW estimates using Method 2. Estimates (1) and (2) were compared with estimates (3) and (4) primarily to assess the effect of QRW on estimates. A key component in the evaluation was the NSCAW I sample in the states not included in SA, denoted by SB. This made it possible to use the full NSCAW I sample (denoted by
Biemer et al. (2024) conclude that both QRW methods can be used to successfully extend inference from the restricted, nonrandom sample, SA, to U, although Method 2 yielded slightly better results overall. Even though some bias may remain in the estimates after applying QRW, the biases were substantially reduced from their unweighted counterparts for almost all the characteristics considered. Further gains in the accuracy of the estimates are possible through the use of superpopulation methods, as shown in the next section.
2.2. Superpopulation Modeling
Biemer et al. (2024) investigated both single and double robust SPM estimates. As described in Elliott and Valliant (2017), single robust models do not incorporate survey weights, whereas double robust models do. Both types of model may still use variance weights to address heteroscedasticity, that is, unequal variances of the model error terms. In the NSCAW I analysis, heteroscedasticity may arise from the variation in sample sizes across PSUs. Single robust models provide valid inferences as long as the SPM model specification is correct. Double robust models have the advantage of providing valid inferences when either the model specification is correct or the final survey weights are correct.
Two choices of survey weights were considered for the double robust models: the original PSU-level weight from NSCAW I and the QRW pseudo-weights developed via QRW Method 2. Double robust models with and without variance weights were also evaluated. In all, six model specifications were considered in the analysis: single robust without (labeled 1) and with (labeled 2) variance weights; double robust using traditional survey weights without (labeled 3) and with (labeled 4) variance weights; and double robust using QRW pseudo-weights without (labeled 5) and with (labeled 6) variance weights.
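The way survey weights and variance weights enter these specifications can be sketched with a simple weighted regression prediction estimator. This is a minimal illustration under stated assumptions, not the model fitted in Biemer et al. (2024): it uses a single hypothetical PSU-level covariate, a linear model, variance weights taken to be PSU sample sizes, and invented data; all function names are hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def spm_national_mean(x, y, x_pop_mean, survey_w=None, variance_w=None):
    """Predict the national mean of y at the population mean of x.

    survey_w=None gives a single robust fit; passing survey weights
    (original or QRW pseudo-weights) gives a double robust style fit.
    variance_w optionally down-weights noisier PSU-level observations.
    """
    w = [1.0] * len(y)
    if survey_w is not None:
        w = [wi * si for wi, si in zip(w, survey_w)]
    if variance_w is not None:
        w = [wi * vi for wi, vi in zip(w, variance_w)]
    intercept, slope = weighted_linear_fit(x, y, w)
    return intercept + slope * x_pop_mean


# Hypothetical PSU-level data, for illustration only.
x = [0.10, 0.20, 0.30, 0.40]      # administrative covariate per PSU
y = [0.22, 0.35, 0.41, 0.58]      # survey outcome proportion per PSU
qrw_w = [40.0, 25.0, 30.0, 55.0]  # stand-in QRW pseudo-weights
n_psu = [50, 120, 80, 200]        # PSU sample sizes used as variance weights

single = spm_national_mean(x, y, x_pop_mean=0.25)
double = spm_national_mean(x, y, 0.25, survey_w=qrw_w, variance_w=n_psu)
print(single, double)
```

The six specifications correspond to the combinations of `survey_w` (none, original weights, or QRW pseudo-weights) and `variance_w` (absent or present) in this sketch.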
Theoretically, specification (6) should be preferred because it maximizes the information incorporated into the models—both QRW weights and variance weights. In practical applications, however, (6) may not be uniformly superior to the other models, as was our experience. Nevertheless, unless a valid gold standard estimate is available (as it was for both the NSCAW I and II analyses), model (6) might be the best choice for assessing the bias in the estimates based on models (1) through (5). In that case, the weighted regression analysis approaches reviewed in Bollen et al. (2016) could be employed to determine which of the six alternative model specifications is best. For example, using the criteria in Bollen et al., if an alternative model yields significantly different estimates than those from model (6), that model should be discarded in favor of model (6). However, if an alternative model produces estimates that are statistically equivalent to those from model (6), and the alternative estimates exhibit smaller standard errors, then the estimates from the alternative model should be preferred. A jackknife method for computing the standard errors of the SPM estimates can be found in Biemer et al. (2024).
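The comparison logic just described can be sketched as follows. This is a generic delete-one-PSU jackknife and a simplified decision rule, offered only as an illustration: the jackknife in Biemer et al. (2024) and the criteria in Bollen et al. (2016) may differ in detail, treating a non-significant difference as "statistically equivalent" is a simplifying assumption, and the function names and the 1.96 critical value are hypothetical choices.

```python
import math


def jackknife_se(psus, estimator):
    """Delete-one-PSU jackknife standard error.

    psus: list of PSU-level records; estimator: callable mapping such a
    list to a single number (for example, a blended estimate).
    """
    n = len(psus)
    reps = [estimator(psus[:j] + psus[j + 1:]) for j in range(n)]
    rep_mean = sum(reps) / n
    return math.sqrt((n - 1) / n * sum((r - rep_mean) ** 2 for r in reps))


def choose_model(est_alt, se_alt, est_ref, se_ref, se_diff, z_crit=1.96):
    """Simplified rule for choosing between an alternative and a reference model.

    Keep the reference model if the two estimates differ significantly;
    otherwise prefer whichever estimate has the smaller standard error.
    se_diff is the standard error of (est_alt - est_ref), which could be
    obtained by jackknifing the difference itself.
    """
    if abs(est_alt - est_ref) > z_crit * se_diff:
        return "reference"
    return "alternative" if se_alt < se_ref else "reference"


def simple_mean(values):
    """Toy estimator used only to demonstrate jackknife_se."""
    return sum(values) / len(values)


# Hypothetical PSU-level values, for illustration only.
print(jackknife_se([0.22, 0.35, 0.41, 0.58], simple_mean))
print(choose_model(est_alt=0.30, se_alt=0.02, est_ref=0.31, se_ref=0.03, se_diff=0.02))
```

For a simple mean, the delete-one jackknife reproduces the usual standard error of the mean, which provides a convenient check on the implementation before applying it to more complex blended estimators.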
The limited SPM analysis in Biemer et al. (2024) suggests that SPM is capable of producing estimates with smaller biases than those from QRW and, further, that double robust SPM estimates outperform their single robust counterparts. However, those results should be considered preliminary until further evaluation (in progress) is completed. Finally, a replicate-based variance estimation method was also developed to estimate the standard errors of the bias estimates and is currently being implemented.
3. Additional Research Needed to Further Improve the Quality of Blended Estimates
This paper described how the blending of administrative and nonprobability survey data can address the high costs of producing both national and subnational estimates from a single survey. The approach involves designing the survey to generate reliable subnational estimates and then using national-level administrative data to extend inferences to the national level via data blending. Two data blending approaches, QRW and SPM, were evaluated, and both were found to be effective in producing high-quality national estimates from a survey of targeted states by leveraging administrative data to essentially remove the bias present in the unblended, survey-only national estimates. The evaluation results in Biemer et al. (2024) further suggest that QRW can be recommended as the primary method because of its ease and breadth of application; however, because it tends to be a “one-size-fits-all” approach, it may not yield acceptable results for all survey characteristics. SPM, though more analytically demanding, can be tailored to each individual outcome variable by incorporating the covariates that are most highly correlated with the outcome, and it offers a less-biased alternative to QRW in situations where QRW produces unacceptably biased estimates. Therefore, we recommend that QRW and SPM be used in tandem, with QRW providing estimates for the vast majority of characteristics and SPM applied primarily in situations where a QRW estimate is of dubious quality.
With regard to future research, several important areas of investigation can be identified that were addressed in neither Biemer et al. (2024) nor this paper. One concerns the quality of inferences from substantive modeling. A substantive model is a theoretical or conceptual model, often used in exploratory data analysis, that aims to explain the underlying mechanisms, processes, and relationships within a particular research domain. Rather than focusing on descriptive statistics, substantive modeling fits models that attempt to reveal relationships between an outcome and its determinants or correlates in a population of interest. It often employs more complex multivariate statistics such as multiple and logistic regression, correlation, and contingency table analyses. In fact, the S&NSCAW is designed for such analyses to be conducted individually for each of the L states as well as for the combined L-state sample. It is not clear, however, whether using blended estimation allows high-quality inferences to be extended to the entire nation from such substantive models. We recommend pursuing a research agenda to address the following questions:
Do substantive models fit to the data in the quasi-random sample, SA (with or without survey weights), also hold for U?
Does weighting the models using the QRW weights improve the agreement between models that are fit to SA and the same models fit to a nationally representative sample?
Can SPM models be adapted to provide an alternative to QRW-weighted models for national inferences when conducting substantive research?
What guidance should be given to data users on how to analyze the data substantively for national inferences?
Similar questions can be posed for other statistics as well, such as correlations, medians, and other quantiles.
