1. Introduction: Statistical Ethics and Scientific Integrity Require Consideration of Categorical Imperatives Plus Complex Empirical Trade-Offs
For this special issue of Journal of Official Statistics, the call for papers highlighted “future challenges and research needs across the different fields of Official Statistics,” and included “ethics, transparency and scientific integrity in the new era” as a topic of interest. This important topic involves many dimensions that have received renewed attention from National Statistical Institutes (NSIs) and other large-scale public-stewardship statistical organizations. United Nations Statistical Division (1994 ff.) and Eltinge (2024) provide some general background and literature review on these dimensions, and note some features of the changing social, methodological, technological, and data environments that have led to that renewed attention.
Some of those issues center on categorical imperatives, for example, strict prohibitions on plagiarism, falsification of data, or mistreatment of human subjects. There is an extensive literature on those necessary prohibitions, and this note will not elaborate further on those issues.
Other issues require somewhat more nuanced approaches. For example, norms of scientific integrity generally include the expectation that practitioners will follow procedures that have been established through rigorous development, testing, and peer review. Those norms also require well-reasoned justification for decisions to deviate from those established procedures when necessary. NSIs can encounter important challenges in the practical application of those norms within a statistical environment subject to rapid changes in stakeholder information priorities, data sources, and available statistical methodology and technology. This is one of the important reasons that Journal of Official Statistics and other methodological journals are devoting extensive attention to statistical concepts and applications that provide practical responses to those changes.
In addition, statistical ethics and scientific integrity generally involve an expectation that NSIs will provide transparent and actionable information on (a) the data sources, methodology, and technology that they employ, and on (b) the resulting properties of their estimation and dissemination procedures. National Academies of Sciences, Engineering, and Medicine (2022), and the extensive literature cited therein, provide some general discussion of transparency for NSIs. Due to space limitations, the current paper will note only that practical application of transparency norms and concepts often hinges on (i) specific decisions and actions that key stakeholder groups may need to make based on information received regarding (a) and (b); (ii) the extent and ways in which specific information released by the NSI can enhance the stakeholder decisions in (i); and (iii) the ways in which development and dissemination of that transparent information can help to improve the quality, efficiency, and credibility of the NSI.
As a complement to the ideas summarized above, the remainder of this paper will focus on some crucial components of statistical ethics and scientific integrity that center on the obligations of NSIs to deliver value to a wide range of stakeholders through the production of high-quality statistical information on a sustainable and cost-effective basis; and to provide stakeholders with reasonable guidance on ways in which the strengths and limitations of our statistical products may affect the value that we deliver. Realistic approaches to these issues require nuanced consideration of complex trade-offs, which often hinge on important empirical information, to the extent available. To provide initial ideas on these issues, Section 2 outlines some underlying concepts and questions; and Section 3 presents three illustrative examples.
2. Connecting Stakeholder Utility with Statistical Information Quality, Risk, and Cost
Many NSI obligations to provide valuable information to stakeholders center on two groups of concepts.
(i) Performance profiles and related trade-offs. NSIs seek to obtain, and to communicate to others, some realistic measures of the “performance profiles” of their suites of statistical information products and services. These profiles generally involve multiple dimensions of quality (e.g., relevance, accuracy, comparability, granularity, punctuality, interpretability, and accessibility); risk (e.g., inferential risk, as well as risks of failure of production systems, or violation of privacy protections); and cost (including cash expenditures, respondent burden, and cognitive and operational burden for data users). Many NSI decisions on design and operations center on trade-offs among competing dimensions of these performance profiles (e.g., accuracy versus punctuality, accuracy versus cost, or accuracy and availability versus confidentiality protection), and are guided by discussions of ethical obligations related to those trade-offs.
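As a minimal sketch of how such a profile might be recorded when comparing design options, the following Python fragment defines a hypothetical profile structure with a few of the dimensions listed above; the field names, measures, and numerical values are illustrative assumptions, not an established NSI schema.

```python
from dataclasses import dataclass

@dataclass
class PerformanceProfile:
    """Hypothetical per-product summary of selected quality, risk, and cost measures."""
    relative_rmse: float     # accuracy: estimated relative RMSE (lower is better)
    release_lag_days: int    # punctuality: days from reference period to release
    disclosure_risk: float   # assessed re-identification risk (lower is better)
    annual_cost_kusd: float  # production cost, thousands of US dollars

# Two hypothetical design options exhibiting an accuracy-versus-punctuality
# trade-off: option_b releases 25 days sooner, at roughly 75% higher relative RMSE.
option_a = PerformanceProfile(0.020, 45, 0.01, 900.0)
option_b = PerformanceProfile(0.035, 20, 0.01, 820.0)
print(option_a)
print(option_b)
```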
(ii) Stakeholder utility functions. The value delivered to a given stakeholder by a given set of statistical information products may include both utility conveyed through concrete use of specific estimates (“use value” in economic terminology) and utility conveyed through potential future use (“option value”). Questions on option value are especially important when economic and social structures are evolving rapidly. Consistent with point (i), NSIs generally expect that performance profiles will have an important impact on stakeholder utility. In addition, utility functions will naturally vary across multiple competing stakeholder groups, and also will vary with changes in underlying social and economic phenomena. For example, statistical information on labor markets and price levels may be of special interest during recessions or periods of high inflation. We emphasize that throughout this paper, we restrict the concept of “stakeholder utility” to involve only the value delivered to stakeholders through improved objective understanding of empirical phenomena. Thus, we specifically exclude cases involving misuse of statistical information in ways that degrade or distract from the goal of improved public understanding.
In principle, one could develop empirical models for stakeholder utility as functions of (a) the performance profiles in (i); (b) intended uses of a suite of statistical information products by a given stakeholder; (c) related available auxiliary information; (d) characteristics of that stakeholder (e.g., previous training and experience in the use of statistical information; and access to analytic software); and (e) other measurable variables. In less formal settings, an NSI may seek qualitative indications regarding the extent to which some or all of (a) to (e) are considered important by a given stakeholder. (A small illustrative sketch of one such empirical model appears after the list below.) For either the formal or the informal case, some notable questions are:
- What are some concrete ways in which statistical ethics and scientific integrity are connected with stakeholder needs for statistical information?
- What are realistic ways in which to measure stakeholder utility as considered in (ii); and to evaluate ways in which that utility is associated with characteristics of a given stakeholder, and of a given body of statistical information? These ways might include enhanced work with customary external advisory groups; case-specific product testing, focus groups and user interviews; use of methods developed for elicitation of prior distributions and utility functions in Bayesian methodology; and other options.
- What are realistic measures for each of the factors (a) to (e) described above?
- To what extent do the performance profile measures in (i) serve as satisfactory proxies for the stakeholder utility functions in (ii)? If the performance profiles generally are not satisfactory proxies for (ii), what important factors are they missing, and what are realistic ways in which to measure those missing factors?
- To what extent do the performance-profile trade-offs described in (i) offer insights into ways that we can improve stakeholder utility considered in (ii)?
- What are transparent, actionable, and respectful ways in which NSIs can provide balanced and nuanced stakeholder communication regarding decisions intended to enhance stakeholder utility; to address the above-mentioned trade-offs; and thus to enhance the scientific integrity of the NSI work?
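As one concrete, hedged illustration of the formal case described above, the following Python sketch simulates a hypothetical linear model in which stakeholder utility depends on two performance-profile measures from (a) and one stakeholder characteristic from (d), then recovers the coefficients by least squares; the variables, the coefficients, and the linear form itself are all illustrative assumptions, not estimates from real data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: utility depends on two performance-profile measures
# from (a) and one stakeholder characteristic from (d).
n = 200
rel_rmse = rng.uniform(0.01, 0.05, n)      # accuracy of the product used
release_lag = rng.uniform(10, 60, n)       # punctuality, in days
experienced = rng.integers(0, 2, n)        # 1 = analytically experienced user

use_value = 10 - 80 * rel_rmse - 0.05 * release_lag + 1.5 * experienced
option_value = rng.uniform(0, 2, n)        # value of potential future use
utility = use_value + option_value + rng.normal(0, 1, n)

# Recover the coefficients by ordinary least squares, as an empirical
# study might when relating elicited utility to profile measures.
X = np.column_stack([np.ones(n), rel_rmse, release_lag, experienced])
beta, *_ = np.linalg.lstsq(X, utility, rcond=None)
print("OLS estimates (intercept, rel_rmse, release_lag, experienced):")
print(np.round(beta, 2))
```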
3. Three Illustrative Examples: Quality Issues Relevant to Statistical Ethics, and Related Trade-Offs
Section 2 identified some issues that warrant in-depth empirical study that would go well beyond the space limitations of the current note. As an initial step, this section provides three brief illustrative examples of ways in which quality issues, and related trade-offs, can be relevant to the general issues of stakeholder utility and statistical ethics summarized above. These issues have been present for decades in customary NSI work; they, and extensions thereof, may warrant deeper and more nuanced consideration as we move further into a “new era” of changing sets of stakeholder information needs; data sources with changing quality profiles and cost structures; and rapidly evolving methodology and technology with properties that are not yet fully evaluated.
3.1. Survey Nonresponse and Related Incomplete-Data Patterns in Non-Survey Sources
General context: The methodological literature includes many in-depth studies of biases arising from survey nonresponse, related phenomena arising from incomplete data in non-survey data sources, and methods for mitigation of these issues. See, for example, the reviews and related discussion in Little and Rubin (2019), Miller et al. (2020), Bradley et al. (2021), and Eltinge (2023). In addition, there is extensive literature on adaptive and responsive survey designs intended to mitigate the impact of nonresponse.
Some quality issues relevant to ethics: NSIs have an obligation to produce high-quality statistical information, and nonresponse bias can severely degrade accuracy. This issue has led to an extensive literature, per the citations above. However, the magnitude of nonresponse bias for a given estimator in a specific application ultimately depends on empirical phenomena that generally are not entirely known at the time of public release of a given statistical product. Thus, despite best efforts at mitigation, an NSI cannot provide absolute guarantees that a given set of published estimates has negligible bias. For a given magnitude of nonresponse bias, what do we know about the resulting impact on the stakeholder value of the resulting published estimates?
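To make the dependence on unknown empirical quantities concrete, the following minimal Python sketch simulates a hypothetical population with a nonignorable response mechanism and verifies the familiar identity that the bias of the unadjusted respondent mean equals the nonresponse rate times the difference between respondent and nonrespondent means; the population and the propensity model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2025)

# Hypothetical finite population in which the outcome y drives response.
N = 100_000
y = rng.normal(50, 10, N)
# Nonignorable response mechanism: propensity increases with y.
p_respond = 1 / (1 + np.exp(-(y - 50) / 10))
respond = rng.random(N) < p_respond

resp_rate = respond.mean()
bias_direct = y[respond].mean() - y.mean()
# Identity: bias of the respondent mean equals the nonresponse rate
# times the respondent/nonrespondent mean difference.
bias_decomp = (1 - resp_rate) * (y[respond].mean() - y[~respond].mean())

print(f"response rate:        {resp_rate:.3f}")
print(f"bias (direct):        {bias_direct:.3f}")
print(f"bias (decomposition): {bias_decomp:.3f}")  # agrees with the direct value
```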
Some notable trade-offs: (a) standard bias-variance trade-offs encountered with weighting and imputation methods at various levels of refinement; (b) decisions about whether to incorporate into production a given source of non-survey data that is known to have incomplete-data issues; (c) decisions on allocation of limited resources for, respectively, direct nonresponse follow-ups, or design and analytic efforts to address (a) and (b); (d) versions of the decisions in (c), specifically for relatively small subpopulations, and for estimands that are of interest primarily for very specialized groups of data users; and (e) prospective decisions not to publish a given set of estimates, or to publish them with strong cautionary notes about quality, due to concerns about incomplete-data effects. For each of these trade-offs, what do we know about the resulting impact on the stakeholder utility of published estimates, and thus on NSI obligations to provide statistical information that is of the best feasible value for stakeholders? Similar comments and questions apply to NSI work with other components of a Total Survey Error model, for example, population coverage or measurement error.
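As a small sketch of trade-off (a), the following Python simulation compares an unadjusted respondent mean with a simple weighting-class adjustment over repeated hypothetical samples; the printed empirical bias, standard deviation, and RMSE summarize the trade-off. The population, propensity model, and class structure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def one_replicate(n=1_000):
    # Hypothetical sample: auxiliary variable x defines four weighting classes;
    # both the outcome y and the response propensity depend on x.
    x = rng.integers(0, 4, n)
    y = 40 + 5 * x + rng.normal(0, 8, n)
    respond = rng.random(n) < (0.4 + 0.15 * x)
    naive = y[respond].mean()                   # unadjusted respondent mean
    # Weighting-class estimator: respondent class means weighted by
    # full-sample class shares.
    shares = np.array([(x == c).mean() for c in range(4)])
    class_means = np.array([y[respond & (x == c)].mean() for c in range(4)])
    return naive, (shares * class_means).sum(), y.mean()

reps = np.array([one_replicate() for _ in range(2_000)])
for name, col in [("naive", 0), ("weighted", 1)]:
    err = reps[:, col] - reps[:, 2]             # error vs. full-sample mean
    print(f"{name:9s} bias={err.mean():+.3f}  sd={reps[:, col].std(ddof=1):.3f}  "
          f"rmse={np.sqrt((err ** 2).mean()):.3f}")
```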
Some “new era” issues: Each of issues (a) to (e) has received attention in the literature and in practice for decades. Under the “new era” conditions emphasized at the beginning of Section 3, issue (a) warrants further in-depth consideration, especially for cases in which weighting and imputation use data-driven methods whose statistical properties may not be entirely transparent. Similar comments apply to data-driven methods for the specialized estimators considered in issue (d). Also, “new era” versions of issues (b), (c), and (e) may require special attention in the use of novel data sources whose properties are not fully evaluated and may change unpredictably over time. Transparent and actionable evaluation of those properties, and of the resulting stakeholder impact, may be especially important.
3.2. Classification Systems and Limitations Thereof
General context: Many statistical information products from NSIs use standardized classification systems involving, for example, demographics, health conditions, education, occupations, geography, industries, and products. NSIs and related participants carry out extensive work to evaluate, and build consensus on, realistic options for these classification systems.
Some quality issues relevant to ethics: The methodological literature broadly recognizes that any nontrivial classification system involves multiple sources of uncertainty. Notable cases include (a) conceptual issues with coarsening of underlying phenomena that are often highly granular or continuous; (b) other conceptual issues arising from classification systems based on high-dimensional measures for a given unit, and on associated separating hyperplanes and dimension reduction; (c) classifications that depend on underlying phenomena that may be volatile over time, leading to frequent changes in true class membership status of a given unit; and (d) classification errors that arise from uncertainties in the measurement of a given unit, or from uncertainty in data-driven classification rules. Each of these issues can have important implications for estimation of the conditional distributions of outcome variables within a specified class. For each of these sources of uncertainty, what do we know about the resulting impact on the stakeholder utility of the published estimates? Also, in some applications, the relevance and interpretability of a given classification system may be highly context dependent. These issues can be especially important when empirical results of classifications are contrary to the expectations of some stakeholders. What are practical ways in which to enhance stakeholder value through transparent, actionable, and forthright ethical communication of classification concepts, empirical results, and related context?
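As a brief sketch of issue (d) and its effect on class-conditional estimates, the following Python simulation applies a hypothetical symmetric misclassification mechanism to a two-class population and shows that means computed within observed classes are attenuated toward the overall mean; the class structure and the error rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical two-class population with distinct outcome means.
n = 50_000
true_class = rng.integers(0, 2, n)
y = np.where(true_class == 1, 70.0, 50.0) + rng.normal(0, 5, n)

# Symmetric misclassification: each observed label flips with probability 0.15.
flip = rng.random(n) < 0.15
obs_class = np.where(flip, 1 - true_class, true_class)

for c in (0, 1):
    print(f"class {c}: mean by true label = {y[true_class == c].mean():.2f}, "
          f"mean by observed label = {y[obs_class == c].mean():.2f}")
# Means computed within observed classes are pulled toward the overall mean,
# so between-class contrasts based on observed labels are understated.
```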
Some notable trade-offs: The methodological literature often notes trade-offs encountered with decisions about the use of coarser versus finer classifications, for example, variance versus relevance and interpretability; and estimator variance versus confidentiality. Also, when an NSI considers updating a legacy classification system, it encounters trade-offs between relevance and interpretability. Additional confidentiality trade-offs also may arise. For each of these trade-offs, what do we know about the resulting impact on the stakeholder utility of published estimates? What are realistic ways in which NSIs can use those trade-off insights to address our obligations to provide statistical information that is of the best feasible value for stakeholders?
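A small sketch of the coarse-versus-fine trade-off noted above: in the hypothetical sample below, fine cells give more granular (and arguably more relevant) estimates with larger standard errors, while coarse cells give more stable estimates that mask within-group differences. The cell structure and variances are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sample: 1,200 units in 24 fine cells; each coarse cell
# aggregates 6 fine cells. Cell means vary at the fine level.
n = 1_200
fine = rng.integers(0, 24, n)
coarse = fine // 6
y = 100 + fine + rng.normal(0, 15, n)

def avg_cell_se(values, labels):
    """Average estimated standard error of the cell means."""
    return float(np.mean([
        values[labels == c].std(ddof=1) / np.sqrt((labels == c).sum())
        for c in np.unique(labels)
    ]))

print(f"average SE over fine cells:   {avg_cell_se(y, fine):.2f}")   # ~50 units/cell
print(f"average SE over coarse cells: {avg_cell_se(y, coarse):.2f}") # ~300 units/cell
# Coarse cells yield more stable estimates but mask the fine-level differences.
```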
Some “new era” issues: The confidentiality issues noted above may be of special concern under “new era” conditions that include expanded availability of external data sources that can increase identification risk and attribute risk. In some cases, these risks may be exacerbated by “slivering” phenomena, for example, movement of a small fraction of the population into a different classification due to changes in the data-driven classification criterion. In addition, “new era” work may include non-conventional stakeholders who employ different conceptual frameworks. That may lead to classification criteria that differ from those used in the development of legacy classifications. For those cases, highly transparent explanations of those differences, and of their impact on stakeholder usage, can be especially important.
3.3. Comparability Issues: Microdata, Published Estimates and Inferences
General context: The methodological literature often notes the importance of the comparability of statistical information over multiple data sources, over different cross-sectional subpopulations, and over time. For published estimates, this generally includes the comparability of the underlying concepts, and the resulting estimands; and the comparability of the production methodology, the resulting distributions of estimation error, and inferential use. At the microdata level, comparability questions can involve the extent to which data from different sources have essentially identical distributions; and can also involve the extent to which unit-level differences are approximately equal to zero. Some of these issues are discussed in additional depth in, for example, Brackstone (1999), Federal Committee on Statistical Methodology (2020), and Eurostat (2022).
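As a minimal sketch of the two microdata-level comparability questions just noted, the following Python fragment compares the distributions of a common variable from two hypothetical sources with a two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp), and summarizes hypothetical linked-record differences at the unit level; the data, the shift, and the linkage are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical microdata on the same concept from two sources, e.g., a
# survey and an administrative file, with a small location shift between them.
survey = rng.normal(50.0, 10.0, 3_000)
admin = rng.normal(51.0, 10.0, 3_000)

# Distribution-level comparability: two-sample Kolmogorov-Smirnov test.
result = stats.ks_2samp(survey, admin)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")

# Unit-level comparability would instead examine differences for linked
# records; the values below are hypothetical linked-record differences.
linked_diff = rng.normal(0.2, 2.0, 500)
print(f"mean unit-level difference = {linked_diff.mean():.3f}, "
      f"sd = {linked_diff.std(ddof=1):.3f}")
```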
Some quality issues relevant to ethics: There is a natural tension inherent in work with the “comparability” dimension of quality. On the one hand, scientific integrity and statistical ethics generally involve the expectation of objective and repeatable measurement. For this reason, and due to related expectations of many data users, NSIs may often be reluctant to make changes in underlying variable definitions, data collection methods, and estimation procedures. On the other hand, many substantive phenomena can evolve rapidly due to changes in underlying technological or social environments, thus imposing some limitations on the concept of comparability. In economic statistics, notable examples may include measures of income and productivity. Also, declining survey response rates, and the increased availability of non-survey data sources, may lead NSIs to consider changes in their production processes, despite the above-mentioned reluctance. Finally, there are important comparability questions related to analytic methods and inference. In many cases, inference from NSI publications may be based on published point estimates and related standard errors intended to account for applicable sources of random variability, for example, through the construction of customary confidence intervals. However, in some other cases, external analysts may use statistical information from NSIs for highly exploratory analyses. The resulting inferential statements may be of substantive interest, but the associated profiles of inferential quality may not be directly comparable to those of customary inference methods; cf. the conceptual distinction between “conclusions” and “indications” as discussed by Tukey (1962) and others. When underlying phenomena, data sources, and analytic methods are changing rapidly, what are some ways in which stakeholders change their perceptions of the impact of “comparability” on the utility of related estimates produced by NSIs? In addition, what are some realistic ways in which to assess the impact on stakeholder utility of different inferential methods that are not fully comparable?
Some notable trade-offs: Within the methodological literature and practice, commonly encountered trade-offs include the impact of changing variable definitions or data sources (relevance vs. comparability and interpretability); the use of data analysis or text processing methods based on procedures that are computationally intensive but not entirely transparent (possible improvements in accuracy and punctuality vs. loss of comparability, interpretability, and possibly relevance); and use of “bridge studies” to allow in-depth comparison of estimates produced through legacy procedures and updated procedures, respectively (improvement of comparability and interpretability vs. reduction in other resource allocations to pay for the study). For each of these trade-offs, what do we know about the resulting impact on the stakeholder utility of published estimates, and what are the best ways to communicate those trade-offs to our stakeholders?
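As a sketch of the bridge-study trade-off mentioned last, the following Python fragment simulates a hypothetical parallel run of a legacy and an updated procedure over 24 reference periods and estimates the level shift with a simple confidence interval; the series, the shift, and the noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical parallel run: 24 reference periods estimated under both a
# legacy procedure and an updated procedure; the shift and noise are assumed.
periods = 24
legacy = 100 + np.cumsum(rng.normal(0.2, 0.5, periods))
updated = legacy + 0.8 + rng.normal(0, 0.3, periods)

diff = updated - legacy
# Simple interval for the level shift, treating period-to-period
# differences as independent (a simplification for this sketch).
se = diff.std(ddof=1) / np.sqrt(periods)
print(f"estimated level shift: {diff.mean():.2f} "
      f"(approx. 95% CI: {diff.mean() - 1.96 * se:.2f} to {diff.mean() + 1.96 * se:.2f})")
# A stable, well-estimated shift can support publication of a linking factor
# that helps users maintain comparability across the transition.
```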
Some “new era” issues: As noted above, “new era” changes in underlying substantive concepts, data sources, and estimation methodology all can require in-depth reconsideration of questions related to the comparability of statistical information over time and across subpopulations. In addition, conventional and non-conventional stakeholders may have differing priorities related to comparability, and to the utility that they receive from statistical information that meets specified criteria for comparability. Consequently, there will be special interest in “new era” use cases that provide solid empirical information on the ways in which different dimensions of comparability have different effects on specific dimensions of stakeholder information needs and related utility functions.
Acknowledgements
The author thanks many colleagues in government statistical agencies, academia, and the private sector for productive discussions of the topics considered in this paper; and thanks Tommy Wright, Rolando Rodriguez, Wendy Martinez, Michael Hawes, and Paul Beatty for insightful comments on an earlier version of this paper.
Disclaimer
The views expressed here are those of the author and do not represent the policies of the United States Census Bureau.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Received: January 10, 2025
Accepted: March 15, 2025
