Abstract
Introduction:
Many health plans in the United States are working to address disparities in care by race/ethnicity. A fundamental part of this work involves implementing processes for collecting race/ethnicity data to visualize where there are gaps for certain groups.
Methods:
A team of researchers at the National Committee for Quality Assurance interviewed health plans to understand how they collect and manage race/ethnicity data and to gain insights into how they overcome barriers to these practices. The research team purposively sampled and interviewed 13 health plans that report on the Healthcare Effectiveness Data and Information Set (HEDIS®), a widely used clinical quality measurement tool. Plans represented all product lines, and operate in a variety of United States geographic regions. Interview questions focused on how plans source race/ethnicity data, data granularity, processes for linking race/ethnicity data to quality performance metrics, and how data inform quality improvement efforts.
Results:
Participants highlighted four primary challenges to collecting and managing race/ethnicity data. First, there must be a balance between federal, state, and accreditation standards about which groups to include in data collection. Second, plans must balance the number of groups included with their ability to meaningfully report on those groups. Third, there are multiple sources of race/ethnicity data, requiring prioritization of certain data sources over others. Fourth, plans need an organizational structure conducive to such efforts, which includes the capacity for smooth interdepartmental collaboration and the organizational priority to collect, organize, and act on race and ethnicity data.
Discussion:
Despite the challenges associated with data wrangling, many plans emphasized interdepartmental collaboration as necessary for success in collecting and managing high-quality data to understand where there are health disparities by race/ethnicity, and to inform efforts aimed at addressing gaps in care.
Introduction
Racial and ethnic disparities in health outcomes exist throughout the U.S. health care system. Stratification of health care data by race/ethnicity is a critical tool; done correctly, it can inform strategies to address disparities and generate equitable health outcomes.1–7 Incomplete and low-quality data, however, prevent clear assessment of where disparities exist, leading to disproportionate, poor outcomes among certain racial and ethnic populations. Health plans in particular have an important role in addressing disparities by identifying gaps in care, working with clinicians and patients to identify the cause of disparities, and providing proper and inclusive services and coverage to everyone.
Many health plans have begun collecting race/ethnicity data to understand and improve health outcomes and reduce disparities in the populations they manage, in line with state and federal policies incentivizing such outcomes.8–10 But while there are many strategies to overcome limitations on collecting these data, simply mandating data collection is insufficient.11–14 Health plans must also understand and implement strategies for managing and utilizing the data to provide more equitable health outcomes.
This study highlights health plans’ experience collecting and managing race/ethnicity data, and how plans are working to improve data quality and completeness. These insights can help plans better understand how to use race/ethnicity data to develop initiatives that improve the quality of care for patients, and can help plans in the early stages of their data collection journey.
Methods
This study reports qualitative findings from the Race/Ethnicity Stratification Learning Network (RESLN), a mixed-methods quality improvement and research program conducted by the National Committee for Quality Assurance (NCQA) between October 2022 and January 2023. The goal of the RESLN was to identify preliminary performance on race/ethnicity-stratified Healthcare Effectiveness Data and Information Set (HEDIS®)a quality measures, and pain points experienced by health plans in their efforts to manage quality data, along with strategies and opportunities to overcome barriers and achieve actionable quality insights. The quantitative arm of the program involved submitting stratified performance data on up to five HEDIS measures using a standardized data collection tool. For the qualitative arm, program staff conducted virtual, small-group interviews with plan-identified key personnel. Complete methods and findings from the quantitative analysis, as well as initial qualitative themes, were previously reported.15 No patient-level data were exchanged during this study. This study applied a rigorous qualitative framework to identify structured themes.
Participants were purposively sampled through email outreach to HEDIS-reporting health plans. Organizations offered all product lines (Commercial, Medicaid, Medicare, Exchange) and operated in a variety of states and geographic regions across the United States.
Plans that volunteered to participate were oriented to program aims and methods. Confirmed participants were asked to identify key personnel to be interviewed, based on a list of topics. Each interview had between three and five participants. All organizations were interviewed separately.
Interviews were 60 min long and semistructured. Interview questions focused on race/ethnicity data sources that plans relied on, the level of granularity in the data, the processes used to collect the data and link it to quality performance metrics, and how the data informed quality improvement efforts.
Qualitative analysis proceeded through several rounds of reading and coding, using an inductive approach to allow themes to emerge.16–19 First, all interview transcripts and flagged sections were reviewed, according to the questions on the interview guide. Two researchers reviewed all transcripts to ensure that materials were captured logically and were appropriately tagged to a relevant question.
Next, we began a fresh review process to identify themes and patterns, which we labeled with formal codes. Each researcher independently read the collated documents to identify potential themes and codes for inclusion in the codebook. We discussed identified codes as a group and agreed on official code labels and descriptions, which were organized into a codebook.20 We then generated an unmarked copy of the collated documents and applied the codes. At this stage, one team member performed an initial review, and another followed to verify agreement. Mismatches were discussed as a team to find agreement. As a final quality assurance step, codes were applied to relevant sections of the original transcripts. This allowed team members to reexamine the transcripts for materials that may have been missed in the earlier phases.
The final phase was analytic synthesis: extracting the coded transcript material into files organized by code and reviewing the material anew. Each researcher was assigned a set of codes to review and summarize, identifying key themes and patterns within the code, and pointing to especially representative examples. As we reviewed these summaries, we built toward broader patterns in the data, following an inductive approach.
Interview Sample Summary
Thirteen health plans participated in qualitative group interviews. Participating organizations varied in size, location, and product line (Table 1). Plan characteristics are masked. Interview participant roles included HEDIS manager, director/manager of quality improvement or data analytics, health equity officer, IT developer, and vice president.
Table 1. Summary Characteristics of Participating Plans
Results
Plans identified numerous challenges to managing and reporting race/ethnicity data. While some challenges concerned end goals such as visualization and reporting to appropriate audiences, most fell into two major categories: how to define and categorize race/ethnicity groups (which groups are considered, and which data sources are available on those groups) and how best to achieve actionable data.
Part I: Defining and Categorizing Race and Ethnicity
There are three primary questions to address when considering how data are managed for quality data analysis. First, which race/ethnicity groups are recognized? Second, what data sources should health plans leverage? Third, how should plans prioritize certain data sources over others?
Race/ethnicity groups
Interviews identified multiple influencing forces on how race/ethnicity data are collected and harnessed for quality improvement. All plans indicated that state standards are involved, and most also referenced accreditation standards (e.g., NCQA) and federal standards (e.g., Office of Management and Budget) on which they are based. NCQA HEDIS specifications require collection of race/ethnicity data for certain measures, including Race and Ethnicity Diversity of Membership. Several states have instituted mandates, and they offer financial incentives to increase race/ethnicity data collection.
But differing requirements can generate confusion and challenges. For example, federal (and accreditation) standards and state standards might both direct reporting of race/ethnicity but use different reporting categories. State-recognized categories are often already in Medicaid enrollment files, which means that plans must map data for different reporting outputs. This is further complicated for national or multistate organizations, because the structure of enrollment files can vary between states.
Misalignment is common in the “Hispanic/Latino” category. At the time of the qualitative interviews, federal standards recognized this category as an ethnicity independent from race categories and asked for a person’s race separately from their ethnicity. However, many intake forms include “Hispanic/Latino” within the broader list of races that could be indicated, making it difficult to disambiguate data: anyone who marked only “Hispanic” would be listed as such for ethnicity, but would be “Unknown” for race.
Similarly, not all intake forms include the same categories. For example, while federal standards separate the categories of Asian and Native Hawaiian, some intake forms collapse them into one item, creating a challenge when mapping to federal categories. Some forms allow an individual to check multiple race categories, while others instruct respondents to select only one, or list a “Multiracial” option. When only one category can be indicated, individuals who identify with more than one race may not know how to respond: Do they select only one, select “Other,” or perhaps skip the question altogether? This uncertainty lowers data reliability. It suggests that the data collection instrument is critical to gathering good-quality race/ethnicity data, and that there may be room to better standardize intake of these data.
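The mapping problem described above can be made concrete with a minimal sketch. It assumes a hypothetical single-question intake form whose options mix race and ethnicity, and it splits each response into separate race and ethnicity fields. The category labels, the `split_response` function, and the treatment of a collapsed "Asian or Pacific Islander" option are illustrative assumptions, not any plan's actual logic.

```python
# Hypothetical mapping from a single-question intake form to separate
# race and ethnicity fields. All labels are illustrative assumptions.
RACE_OPTIONS = {
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
}
HISPANIC = "Hispanic/Latino"
# A collapsed option cannot be assigned to a single separate category.
COLLAPSED = {"Asian or Pacific Islander"}


def split_response(selected):
    """Return (race, ethnicity) for the options a member checked."""
    selected = set(selected)
    # On a combined form, not checking Hispanic/Latino is treated here as
    # "Unknown" ethnicity rather than "Not Hispanic" (an assumption).
    ethnicity = "Hispanic or Latino" if HISPANIC in selected else "Unknown"
    races = sorted(selected & RACE_OPTIONS)
    if selected & COLLAPSED:
        race = "Unknown"  # ambiguous: could belong to either category
    elif len(races) == 1:
        race = races[0]
    elif len(races) > 1:
        race = "Two or More Races"
    else:
        race = "Unknown"  # e.g., only Hispanic/Latino was checked
    return race, ethnicity
```

A member who checks only "Hispanic/Latino" comes out with a known ethnicity and an unknown race, which is the disambiguation problem the interviews described.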
Multiple sources of race/ethnicity data
Plans can draw on an array of data sources for member race/ethnicity information—from member self-reporting, to enrollment files, to third-party records (e.g., state immunization registries). High-quality imputation algorithms are also available. Some plans have developed logic documents to facilitate prioritization of certain data sources based on data completeness and anticipated accuracy, often using criteria set by health plan staff.
The gold standard is race/ethnicity data that members report themselves, collected through a method the health plan fully controls, without an intermediary or third party. This ensures accurate assessment of how a person identifies their racial background. The member portal is the most prevalent method of collecting member self-reported data. Many RESLN participants stated that they are building, or have already built, a member portal that can collect an array of demographic information directly from members.
To ensure optimal outcomes from a portal, plans provided ideas to avoid or overcome key challenges. First, it is important to steer members to the member portal to ensure access and uptake. Once there, questions on race/ethnicity should be easy to understand and answer; this helps improve members’ digital experience. It also builds member trust if the health plan explains why it is asking for the information and assures members that their response does not affect their care or coverage.
Collecting member self-reported data requires investment of time and resources. When self-reported data are not immediately available, plans may have access to other data sources. Medicaid plans regularly receive enrollment files from their states, and Medicare plans may receive similar files from the Centers for Medicare & Medicaid Services (CMS). These files typically contain member race/ethnicity information. Some plans may use other types of state-level records, such as immunization registries, to obtain race/ethnicity data. Programs such as the Special Supplemental Nutrition Program for Women, Infants, and Children, or other state-run services, may also provide these data. Some plans partner with their provider networks to pull race/ethnicity data from the provider’s electronic health record system.
Many plans in the RESLN discussed leveraging imputation methods to fill in missing race/ethnicity data, although all plans prioritized direct sources over this approach. The most common imputation method used was the RAND Bayesian Improved Surname Geocoding method. A few plans stated they were hesitant to use imputation methods, largely out of a lack of trust in the methods’ accuracy.
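The Bayesian update at the core of surname-and-geography imputation can be sketched in a few lines. The example below is a toy illustration, not the RAND implementation: the probability tables are made-up numbers, the group labels are placeholders, and `bisg_posterior` is a hypothetical function name. It multiplies a surname-based prior, P(group | surname), by the share of each group living in the member's area, P(area | group), and renormalizes.

```python
def bisg_posterior(p_group_given_surname, p_area_given_group):
    """Combine surname and geography evidence with Bayes' rule.

    Both arguments are dicts keyed by group label. The values used
    below are made-up illustrative numbers, not census figures.
    """
    unnormalized = {
        g: p_group_given_surname[g] * p_area_given_group[g]
        for g in p_group_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no group has nonzero probability")
    return {g: v / total for g, v in unnormalized.items()}


# Toy inputs (assumed, for illustration only).
surname_prior = {"A": 0.6, "B": 0.3, "C": 0.1}
area_likelihood = {"A": 0.001, "B": 0.004, "C": 0.002}

posterior = bisg_posterior(surname_prior, area_likelihood)
```

Because the output is a set of probabilities rather than a single assigned category, any use in stratified reporting has to decide how to handle that uncertainty.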
Prioritizing data sources
Different sources can contradict each other, leading to questions about how to prioritize information or identify a source of truth. Most plans developed a hierarchy of data sources, although the structure of these hierarchies varied. Plans most often relied on state data, typically supplemented with either CMS data or data from providers and clinics. Some plans described their process for updating race/ethnicity data based on information in the member portal, or on interactions with case managers who work directly with the member. Some allowed member self-reported data to override all other information.
Not all organizations can do this. One plan shared how its state’s enrollment data contradicts the information the plan collected through member self-reporting, yet state reporting requirements prioritized the original file, making it difficult for the plan to update and reconcile the information. Another plan stated that it takes all the information it receives and rolls it up into state-defined output “buckets.” Deciding how to prioritize data includes not only data availability and quality but also state reporting requirements.
Plans that developed a source prioritization hierarchy were often also responsible for ensuring that staff understood how it worked, and/or when they were allowed to use certain data sources. This often meant expanding access to race/ethnicity data and training staff to ensure consistency of analyses across units. One plan instituted governance standards to ensure that only staff with training could access race/ethnicity data. Such standards clarified that the only allowable use of race/ethnicity data was to measure and address inequalities, and they also helped prevent inappropriate use that might exacerbate inequalities.
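A source hierarchy of the kind plans described can be sketched as a simple ranked lookup. The source names, their ordering, and the `resolve_race_ethnicity` function below are hypothetical; actual hierarchies varied by plan and, as noted above, may be constrained by state reporting requirements.

```python
# Hypothetical source ranking; a lower number means higher priority.
SOURCE_PRIORITY = {
    "member_self_report": 0,
    "state_enrollment_file": 1,
    "cms_file": 2,
    "provider_ehr": 3,
    "imputed": 4,
}
MISSING = {None, "", "Unknown"}


def resolve_race_ethnicity(records, priority=SOURCE_PRIORITY):
    """Pick the value from the highest-priority source that has one.

    `records` is a list of (source, value) pairs for one member.
    Returns (value, source), or ("Unknown", None) if nothing is usable.
    """
    usable = [(s, v) for s, v in records if v not in MISSING and s in priority]
    if not usable:
        return "Unknown", None
    source, value = min(usable, key=lambda sv: priority[sv[0]])
    return value, source
```

Keeping the ranking in a single shared table, rather than in each analyst's code, is one way to support the consistency across units that plans emphasized. A plan whose state prioritizes the enrollment file could pass a different `priority` mapping without changing the function.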
Illustrative quotes for Part I are described in Table 2.
Table 2. Themes and Illustrative Quotes, Defining and Categorizing Race and Ethnicity
Part II: Achieving Actionable Data
In addition to defining race/ethnicity categories and determining sources to leverage, plans must address at least two overarching logistical questions on use of race/ethnicity data in assessing disparities. First, what is a plan’s ability to report on the data? Second, how can it be assured that departments across the health plan collect, store, and report these data in the most effective manner?
Reportability
When linked to quality performance metrics, analysis of race/ethnicity data must balance including an appropriate number of groups (granularity) against having counts large enough to report on those groups adequately. Smaller counts lower precision; because racial identity is inherent in the reported value, they also increase the risk that individuals can be identified if a reported category is too small. Plans with very small counts of certain groups may not be able to report on those groups, despite otherwise valid data. Some health plans noted situations where smaller groups experienced considerably worse care than other groups, but plans were uncertain how to conduct valid disparity assessments or target proper resources to eliminate the disparities faced by those groups. Further, small totals make it more difficult to demonstrate return on investment to stakeholders when reducing gaps for smaller groups, which leads plans to focus on addressing gaps found in larger populations.
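The trade-off between granularity and precision can be illustrated with a short sketch. It suppresses a stratified rate when the denominator falls below a threshold and otherwise returns the rate with a Wilson 95% confidence interval. The threshold of 30 and the function name `stratified_rate` are assumptions for illustration, not a rule reported by the plans interviewed.

```python
import math


def stratified_rate(numerator, denominator, min_denominator=30, z=1.96):
    """Return (rate, lower, upper) with a Wilson interval, or None.

    None means the group is suppressed because its denominator is
    below `min_denominator` (an assumed threshold for illustration).
    """
    if denominator < min_denominator:
        return None  # too few members to report reliably or safely
    p = numerator / denominator
    scale = 1 + z**2 / denominator
    center = (p + z**2 / (2 * denominator)) / scale
    half = (z / scale) * math.sqrt(
        p * (1 - p) / denominator + z**2 / (4 * denominator**2)
    )
    return p, center - half, center + half
```

For the same observed rate, the interval narrows as the denominator grows, which is why a gap observed in a small group is harder to distinguish from noise than the same gap in a large one.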
Analysis becomes more complicated for plans that want to analyze data at a more granular level than the federal categories. For example, one critique of the category “Asian” is that it comprises a diverse set of groups with a variety of experiences and needs. Plans with a large Asian population may want to examine differences between their sub-populations. Three plans noted that they run analyses using more granular data because it better suits their needs than the required aggregated levels. However, such an approach often runs into the problem of smaller cell sizes generating less reliable output.
One plan proposed a creative solution to this problem: rather than aggregating smaller groups and risking the loss of nuance between them, the plan aggregates three consecutive years of data on a measure, on the premise that disparities are not limited to a single measurement year. This approach may help identify and clarify trends in disparity while maintaining meaningful and reliable data.
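Pooling across measurement years, as this plan described, amounts to summing numerators and denominators per group before computing a rate. The sketch below assumes a hypothetical input layout (year, then group, then a numerator/denominator pair) and a hypothetical function name, `pool_years`; the counts in any usage would come from the plan's own measure results.

```python
from collections import defaultdict


def pool_years(yearly_counts, years):
    """Sum numerators and denominators per group across `years`.

    `yearly_counts` maps year -> {group: (numerator, denominator)}.
    Returns {group: (numerator, denominator, rate)}; rate is None
    when the pooled denominator is zero.
    """
    totals = defaultdict(lambda: [0, 0])
    for year in years:
        for group, (num, den) in yearly_counts.get(year, {}).items():
            totals[group][0] += num
            totals[group][1] += den
    return {
        g: (num, den, num / den if den else None)
        for g, (num, den) in totals.items()
    }
```

A member can appear in more than one measurement year, so pooled denominators count member-years rather than unique members.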
The importance of organizational structure and priorities
Different departments make different contributions to enhancing health equity. A diverse array of teams and perspectives necessitates collaboration across departments for both data management and data access. Most plans emphasized the importance of multiple teams and departments coordinating the process of collecting, storing, and reporting race/ethnicity data in support of equity initiatives. This approach minimizes duplication of effort across units, for example by establishing shared standards for reporting on race/ethnicity and for allowable data sources.
Acquiring, managing, and using race/ethnicity data in the pursuit of health equity is a complex task. Some plans mentioned the importance of direction from senior leadership to prioritize and advance initiatives, making health equity a priority across the organization. Asking plans how they define health equity, and who is responsible for establishing the definition, resulted in a fundamental insight: The best approaches to harnessing race/ethnicity data and striving for equity goals include strong leadership support, coupled with a broad team effort.
Over half the plans interviewed discussed the importance of leadership roles in setting equity goals and defining equity as a priority. Three plans stated that the board of directors established a commitment to equity, often by initiating a health equity committee or health equity campaign. Many units or departments in a plan have a stake in the data, and the best way to support them is to invest in resources at the top, and find efficient ways to coordinate units that work with race/ethnicity data.
But a successful strategy requires more than a top-down approach. One plan noted that even though its efforts were spearheaded by the chief equity officer, setting equity goals involved a strong collaborative component across multiple units. Other plans suggested a dialectic approach to how their equity goals were defined and set, relying on cross-departmental or multiparticipant efforts. One plan said equity decisions were made by a mix of top leadership, middle management, and subject matter experts from multiple departments, including population health, case management, medical management, member services, provider services, and human resources. Representation from all business units is necessary to motivate change to eliminate inequities.
Illustrative quotes for Part II are described in Table 3.
Table 3. Themes and Illustrative Quotes, Achieving Actionable Data
Discussion and Health Equity Implications
Incomplete, unstandardized race/ethnicity data prevent clear assessment of health care disparities. In this paper, we outline primary actions identified by health plans as critical to successful collection and use of race/ethnicity data. These include collecting high-quality race/ethnicity data and managing data effectively to address inequities in care. Plans must consider which race/ethnicity groups are recognized for data collection, in addition to their ability to meaningfully report on those groups, and have an effective data infrastructure and clear mechanisms for prioritizing inconsistent information. Plans also emphasized the importance of intraorganizational collaboration for success in collecting and managing high-quality data.
The ultimate goal in collecting race/ethnicity data is to be able to systematically track and address disparities. Despite the challenges presented in this paper, plans across the health care system are demonstrating creative ways to collect and manage these data, to understand where disparities exist, and to inform efforts aimed at addressing gaps.
Although some plans could point to specific examples of using race/ethnicity data to address an identified gap, there remains significant work to integrate data collection into efforts to address health disparities. Data collection and management processes described in this paper are a critical first step to facilitating efforts at closing racial disparities in health care. As health plans become more adept at managing race/ethnicity data, the next step is to understand how to use the data in the pursuit of health equity. Future research should consider how to link stratified outcomes by race/ethnicity with efforts to reduce gaps in care.
The health plans in this study have laid the foundation for eliminating health disparities by establishing effective methods of collecting and managing high-quality race/ethnicity data. While the early part of this decade saw an increased emphasis on equitable outcomes at the state and federal levels, we are currently seeing a retreat from such accountability. However, for organizations that intend to continue pursuing health equity work, the need for reliable data cannot be overstated. Health care organizations can implement findings from this study to build capacity and develop an infrastructure to collect, manage, and leverage race/ethnicity data to advance high-quality, equitable care.
Footnotes
a. HEDIS is a registered trademark of the National Committee for Quality Assurance.
Authors’ Contributions
S.A.T.: Conceptualization (equal); data curation (lead); formal analysis (equal); investigation (equal); methodology (lead); writing – original draft (lead); writing – review & editing (equal); K.T.: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); project administration (lead); writing – original draft (equal); writing – review & editing (lead); R.M.: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); writing – original draft (equal); writing – review & editing (equal); R.H.: Conceptualization (equal); supervision (lead); writing – review & editing (equal).
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This project received no external funding.
References
