Abstract

I wrote this article as the Director of the United States Census Bureau, a principal agency in our nation’s federal statistical system. Our mission is to provide quality statistical data on our nation’s people and economy. Like all national statistical institutes, we are deep into the process of transforming and modernizing into a twenty-first century statistical agency. And we are doing so amid several challenges.
National statistical institutes are facing challenges that seem to parallel what I find in nature. I happen to own a spit of land in the hill country of central Texas, 35 acres to be exact. It never ceases to amaze me how the ecosystem in the hill country (that is, the wildlife, foliage, insects, and the land itself) is constantly adapting to its environment. Incursions into this ecosystem include the heavy hand of humans like me. I clear out brush, have built structures, and have drilled a water well. Exotic wildlife regularly escapes local “trophy” ranches and then competes for survival with our native fauna. And, of course, the harsh climate offers drought, flash floods, wildfires, hail, and windstorms, not to mention the extreme heat and cold that accompany the central Texas seasons. Yet, despite these challenges, life in the hill country is intrinsically resilient. It adapts, and the ecosystem ultimately thrives. Even in the harshest times, the hill country manages to flaunt its beauty. The same can be said of national statistical institutes, including the U.S. Census Bureau.
Our statistical institutes exist within their own ecosystems. We continuously evolve and adapt to highly complex, ever-changing threats and opportunities. Over time, we’ve all faced funding uncertainty, staffing challenges, decreasing public trust, robust political engagement, and the diminishing efficacy of gold standard methodologies (e.g., withering declines in survey participation). Statistical estimation approaches have had to adapt to this new reality. Add to this accelerating advances in technology and a full societal acceptance of technology, with a thirst for more contemporaneous data. Despite these many challenges to our business ecosystems, innovations by national statistical institutes have borne the fruits of insightful statistical data. Just as the Texas hill country ecosystem taps its resilience to ultimately thrive, so too do statistical institutes regenerate and grow stronger against continuous challenges.
Context
The Census Bureau is the largest of the current sixteen statistical agencies in the U.S. government. We conduct three censuses: a Population and Housing Census every ten years, and an Economic Census and a Census of Governments every five years. We continuously conduct our flagship American Community Survey (ACS), which features a national sample of 3.5 million households annually. We also conduct over 130 surveys of our population, housing, and economy (i.e., businesses) at various frequencies. These include the biennial American Housing Survey for the U.S. Department of Housing and Urban Development, the semiannual National Crime Victimization Survey for the U.S. Department of Justice, and the monthly Current Population Survey (CPS), among many others.
We face our challenges head on, modernizing and transforming to meet the data needs of our nation. Our ability to transform and modernize ebbs and flows with available resources and the changing priorities of elected administrations. Yet, our mission and values remain steadfast. Even a global pandemic could not keep us from achieving our mission. Our decision-making is guided by our core values of scientific integrity, objectivity, transparency, and independence. Our mission is constitutionally and statutorily driven and embraced fully by our staff.
Below, I describe our agency challenges in the context of data needs and best value to the public.
Challenge: Thirst for More Contemporaneous Data
The accelerating appetite for data and information in society over the past quarter century is truly remarkable. The smartphone, with its applications leveraging digital data, has revolutionized many aspects of our lives. This includes how we travel, eat, sleep, work, buy and sell, and manage our interpersonal relationships. It extends into governance and policy implementation as well.
Historically, statistical agencies have focused on maximizing the accuracy and reliability of statistical data. These are two very important aspects of data quality. “Gold standard” statistics are generated by our censuses as well as our flagship surveys like the ACS and CPS. The five-year ACS product, for instance, publishes statistics with nominal levels of precision down to the block group level of geography. Such precision takes time: the 2019 to 2023 data product was published in December 2024. Yes, it takes the better part of a year after data collection ends to produce five-year gold standard ACS statistics. For this product, achieving “gold standard” statistical quality comes at the expense of another important quality factor: timeliness.
The accuracy versus timeliness tradeoff was never more evident than at the start of the COVID pandemic. The nation critically needed a contemporaneous, quick snapshot of how our country’s people and businesses were faring. Yet, many of our data products required too much collection and post-processing time to fulfill this urgent need. So, within weeks of the pandemic’s emergence on U.S. soil, the Census Bureau worked with other federal agencies to develop and conduct biweekly national data collections like the Household Pulse Survey. While such high-frequency programs did not provide “gold standard” statistical precision, the biweekly flow of statistical data allowed the public to understand short-term trends in employment, health, schooling, and the economy. Since then, high-frequency data products have become so popular that they have been added to our portfolio for both our demographic and economic directorates. But weighing lower statistical error against greater timeliness remains a balancing act that all statistical agencies still face.
Challenge: Rethinking Our Data Collection Model
Raw data are the fodder for statistical data production. For as long as the Census Bureau has conducted censuses and surveys, raw data have been gathered using a specific operational model. This legacy model features direct solicitation from entities that possess the information being sought. Solicitation models are necessary for many important policy-related measures (e.g., perceived wellbeing, attitudes, beliefs, experiences). But for others (e.g., employment status, personal income, assets, retail sales, exported goods), administrative data can be used and may be more accurate than self-responses.
A key challenge facing the Census Bureau is its historical reliance on a data solicitation model. Our infrastructure was literally built around how we gathered data. Unfortunately, this approach is no longer sustainable in our current society. Participation rates in economic and demographic surveys and censuses have dwindled steadily for decades. Public trust in government is diminishing over time, driven by respondents’ concerns about the confidentiality of their data. In turn, the per-unit costs of data collection are increasing. Our legacy solicitation model alone is therefore unsustainable as a way to generate the data we need. Other data sources must be tapped, even if they complicate the overall error structures of resulting statistics.
Administrative records are helping to transform our approach to data gathering. The Census Bureau currently has agreements to acquire limited administrative data from such federal entities as the Social Security Administration and the Centers for Medicare & Medicaid Services, among others. State and private administrative data are also increasingly available.
We are pooling administrative data with our census and survey data to form an enterprise-level data lake. This allows in-field operational efficiencies with solicitation-based data collection. The pooled data can also be used to impute item-missing data. Moreover, we can reduce respondent burden by using administrative data items instead of soliciting such items.
Amid the challenges we face, there are opportunities. An important one comes from forming statistical data products that blend data from multiple sources. This can lead to new products that otherwise could never have been imagined. An illustration of this is a Census Bureau data product called the Opportunity Atlas. It is a collaborative project between the Census Bureau and researchers at Harvard University and Brown University. The Opportunity Atlas represents a comprehensive dataset of children’s outcomes when they reach adulthood. It traces the roots of poverty, incarceration, and other measures from the neighborhoods where children were raised into their adulthood. It was created by linking data from administrative sources and federal programs to ACS data. Other examples of linked and pooled data include our Annual Population Estimates, our Veteran Employment Outcomes, and our Community Resilience Estimates.
Innovative, powerful data products that link data from multiple sources illustrate how we are reimagining our legacy solicitation model and its products. Having said that, pooling administrative, survey, and census data compounds the challenge of working with the resulting non-standardized data. The use of machine learning and artificial intelligence offers a glimmer of hope to address such challenges, but much work remains ahead of us. The challenge of rethinking our data collection model continues.
Challenge: Data Relevance
Advances in technology enable societal evolution in population behavior and information needs. Seldom do we acknowledge the implications for the relevance of the items we are measuring vis-à-vis what we should be measuring. The relevance of historical measures can fade with time. They should be regularly reviewed to align with contemporary society. To illustrate, U.S. public interest in race-ethnicity and ancestry has grown tremendously with the emergence of commercial DNA testing and online genealogical services. This has led to an expanded, revised standard for race-ethnicity collection. It features disaggregated categories of black, Hispanic, Asian, American Indian, plus other races and ethnicities. Similar interest exists for such measures as disability, employment, and gender. This creates a basic conflict between retaining a historical measure with diminishing relevance and developing more appropriate measures that better align with today’s society. Economists, public health experts, epidemiologists, demographers, educators, and others want to preserve trends over time, even in the face of dwindling relevance of the measures they track. So, a real challenge is conducting regular reviews and establishing both the criteria for when a long-established measure requires revision and the research protocols for making changes.
This also applies to adopting new measures. Forty years ago, federal surveys did not ask about broadband access, social media, remote schooling and work, or telemedicine. These topics are highly relevant today. Agencies need a protocol for deciding when and how we alter or add measures to produce more meaningful information. This review process greatly benefits from engagement with data users, stakeholders, and the public to inform data relevance and products.
Challenge: Statistical Approaches in a Twenty-First Century World
Besides pressure for more data, there is the challenge of applying statistically valid methodologies to produce better data products and overcome the limitations inherent in combined data. Two challenges are prominent. First, there is the challenge of producing statistical estimates that blend data from multiple sources, for example, surveys and administrative data. Second, there is the challenge of protecting against disclosure while preserving sufficient granularity in the measure itself (e.g., income in dollars) and at small levels of geography (e.g., blocks).
Combining data from different sources to produce statistics is not new. Small area estimation, synthetic estimates, and formal Bayesian statistics are examples. But what is relatively new is national statistical institutes’ use of combined data to overcome the limitations of solicitation-based data. However, the error structures of resulting statistics can be complex, as they are affected by both sampling error and model-based error. Historical, publicly familiar measures of statistical error like margins of error and confidence intervals no longer apply to these statistics. More sophisticated measures of variability must be used, such as mean squared error and credible intervals. Unfortunately, the public and even many analysts do not understand how to use these to gauge the strength of their inferences.
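To illustrate why mean squared error is a natural yardstick for blended statistics, the sketch below combines an unbiased but noisy survey estimate with a precise but possibly biased administrative estimate. It is a minimal illustration under stated assumptions, not a Census Bureau method: the function names, the assumption that the two sources are independent, and all numeric inputs are hypothetical.

```python
def composite_mse(w, survey_var, admin_var, admin_bias):
    """MSE of the blend w * survey + (1 - w) * admin.

    Assumes (hypothetically) that the survey estimate is unbiased and that
    the two sources are statistically independent.
    """
    variance = w**2 * survey_var + (1 - w) ** 2 * admin_var  # sampling/model variance
    bias = (1 - w) * admin_bias                              # bias inherited from admin data
    return variance + bias**2                                # MSE = variance + bias squared


def best_weight(survey_var, admin_var, admin_bias):
    """Weight on the survey estimate that minimizes composite_mse."""
    admin_mse = admin_var + admin_bias**2
    return admin_mse / (survey_var + admin_mse)


if __name__ == "__main__":
    # Made-up inputs: a noisy survey and a precise but biased administrative source.
    survey_var, admin_var, admin_bias = 4.0, 0.25, 1.0
    w = best_weight(survey_var, admin_var, admin_bias)
    print("survey only:", composite_mse(1.0, survey_var, admin_var, admin_bias))
    print("admin only: ", composite_mse(0.0, survey_var, admin_var, admin_bias))
    print("blended:    ", composite_mse(w, survey_var, admin_var, admin_bias))
```

With these made-up inputs, the blend has a lower mean squared error than either source alone. Its error, however, now mixes sampling variance with model bias, which is why a conventional margin of error no longer tells the whole story.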
At the other extreme, sometimes the statistical results from combining data sources can be too accurate. They can raise the risk of disclosure of individual responses. The Census Bureau must abide by U.S. Code Title 13 protections against disclosure when publishing statistical data. We employ swapping, suppression, coarsening, and other legacy methods to reduce the risk to acceptable levels. But acceptable risk levels are subjectively assessed. Formal privacy solutions are also available for census enumeration counts at different levels of geography. They measure and control the risk of disclosure even against future threats. Approaches such as differential privacy can be used, but they rely on balancing the granularity of information at a specific level of geography against the accuracy of the counts themselves. There is also a strategy to synthesize entire survey micro datasets to protect against disclosure. However, the science behind this approach has not been fully realized.
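The tension between geographic granularity and accuracy can be seen in a textbook Laplace mechanism, one of the simplest ways to make a count differentially private. The sketch below is an illustration under stated assumptions and not the Bureau's production disclosure avoidance system: the function names, the privacy-loss values (epsilon), and the population figures are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Textbook Laplace mechanism for a count.

    Adding or removing one person changes the count by at most `sensitivity`,
    so noise with scale sensitivity / epsilon yields epsilon-differential privacy.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=20000, seed=0):
    """Average absolute error of the noisy count over repeated simulated releases."""
    rng = random.Random(seed)
    total = sum(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # Hypothetical populations for a small block and a large county.
    for label, population in [("block", 12), ("county", 50_000)]:
        for epsilon in (0.1, 1.0):
            err = mean_abs_error(population, epsilon)
            print(f"{label:6s} eps={epsilon:<4} mean abs error={err:6.2f} "
                  f"relative={err / population:.4%}")
```

The expected absolute noise is the same for both geographies (about sensitivity / epsilon), so it is negligible for the county but large relative to a 12-person block. A smaller epsilon strengthens the privacy guarantee and increases the noise.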
Finding a solution that provides both data utility for the user and protection for respondents is an incredibly hard challenge. It is one with which we and all national statistical institutes continue to grapple.
Challenge: Thinking Differently About Our Work
I close with what may be our greatest challenge. It is the challenge of culture change: getting our staff to think differently about what they do, how they do it, and what constitutes improvement. We are all on a journey of transformation and modernization. All too often, this is conveniently seen as adopting new technologies (e.g., cloud computing, artificial intelligence, enterprise software). As such, the major players are often a small subset of technical experts who design and implement technological changes. Yet, throughout the acquisition and development of these new technologies, day-to-day production work continues, often with shrunken budgets to help offset the costs of transformation. All the while, staff continue to operate under legacy systems; they are often resistant to change.
This is where organizational culture helps. Everyone should have a role in organizational change. The entire staff should understand and buy into the transformation and modernization effort. This includes how it changes their jobs, how it benefits the organization, and how it benefits them. Staff should be enabled to develop and exercise innovation and creativity. Only then can the organization’s transformation goals be achieved. That is why nurturing cultural change is as important as the development of new technologies and systems. But it is also our greatest challenge because of a natural aversion to the uncertainty associated with change.
Conclusion
National statistical institutes are continually faced with myriad challenges. The future of national statistical institutes lies in their ability to become efficient and effective through innovation and new technologies. All the while, society is changing and expecting the delivery of more accurate, relevant data. This mimics the ecosystem of the central Texas hill country. Natural challenges befall the land, fauna, and vegetation. But life adapts and shows its beauty and resilience. So, too, do national statistical institutes face a host of challenges within their ecosystems. But those challenges can motivate innovation, create efficiencies, and reveal new, creative data products. I am optimistic that statistical institutes will persevere and even thrive, provided we engage our staff in the effort with a focus on mission and values.
