Abstract
Recent scientific and policy initiatives frame clinical settings as sites for intervening upon inequality. Electronic health records and data analytic technologies offer the opportunity to record standard data on education, employment, social support, and race-ethnicity, and numerous audiences expect biomedicine to redress social determinants based on newly available data. However, little is known about how health practitioners and institutional actors view data standardization in relation to inequity. This article examines a public safety-net health system’s expansion of race, ethnicity, and language data collection, drawing on 10 months of ethnographic fieldwork and 32 qualitative interviews with providers, clinic staff, data scientists, and administrators. Findings suggest that electronic data capture institutes a decontextualized racialization within biomedicine as health practitioners and data workers rely on biological, cultural, and social justifications for collecting racial data. This demonstrates a critical paradox of stratified biomedicalization: The same data-centered interventions expected to redress injustice may ultimately reinscribe it.
While social scientists have long recognized the social conditions of health and illness (House 2001; Link and Phelan 1995; Williams and Sternthal 2010), biomedical institutions have historically granted limited attention to the social nature of health inequality. Nearly two decades of attention toward health disparities has coexisted alongside clinical practice, with medicine’s role in redressing inequity narrowed to the mitigation of provider bias and differential treatment (Ibrahim et al. 2003; Institute of Medicine 2003; Manzer and Bell 2021; Stepanikova 2012; van Ryn and Fu 2003). Recent scientific and policy initiatives now seek to expand the scope of biomedicine by integrating consideration of “social factors” within clinical care, most prominently through expanded clinical data collection on social and behavioral domains (Centers for Medicare and Medicaid Services 2021; Institute of Medicine 2014). Standardized data on education, employment, stress, social support, and race and ethnicity within electronic health records (EHRs) are expected to guide health practitioners to place “patients in context” (Adler and Stead 2015), with clinical settings framed as sites for intervening upon social problems following institutional uptake of data analytic technologies: Capturing social determinants of health in EHR data will allow health care providers to better characterize, understand the causes of, and identify appropriate interventions that health systems can make to reduce health disparities, which will allow critical social problems and also costly problems for the health system and society as a whole to be addressed. (Institute of Medicine 2014:17)
Electronic data capture on social determinants is widely expected to induce providers and health systems to intervene upon social conditions, with scientific, policy, and advocacy audiences alike seeking to leverage technological advances as means of reducing inequity (Cantor and Thorpe 2018; Chin 2015; Douglas et al. 2015; Pérez-Stable, Jean-Francois, and Aklin 2019; Zhang et al. 2017).
Despite the broad legitimacy afforded to these data-centered interventions, investment in data analytics risks obscuring the social nature of health inequalities to the very health practitioners and other institutional actors expected to redress them. The focus on technical standards alone sidesteps the sociological context of stratified biomedicalization (Clarke et al. 2003; Shim 2010; Spencer and Grace 2016; Timmermans and Kaufman 2020; Wright and Perry 2010), including how health care workers view themselves as implicated in “critical social problems” (Mackenzie, Skivington, and Fergie 2020; Malat et al. 2010). Previous attempts to include “race” within biomedical research and medical education have often resulted in bio-reductionist reification, with institutional initiatives appearing to attend to social factors without specifying their sociological significance (Duster 2005; Epstein 2007; Lee 2009; Olsen 2019; Roberts and Rollins 2020; Shim 2014). Recent controversy over racialized surveillance and algorithmic bias (e.g., a commercial risk algorithm recommending Black patients receive less care than White patients for similar health conditions; Obermeyer et al. 2019) also suggests that data-centered, high-tech innovations may further exacerbate existing inequality (Benjamin 2019; Braun et al. 2021; Noble 2018; Vyas, Eisenstein, and Jones 2020). Thus, despite the seeming promise of data analytic technologies for understanding, monitoring, and intervening upon social conditions, empirical observation is needed to show how standard data integrate attention to social factors within biomedicine, including how such data reconfigure relations of injustice.
In this article, I examine how health practitioners and institutional actors understand data standardization on social and behavioral domains in relation to inequity, focusing on a public safety-net health system’s expansion of race, ethnicity, and language (REAL) data collection within its EHR data system. Drawing on 10 months of ethnographic fieldwork (September 2017–June 2018) and 32 interviews with providers, clinic staff, data scientists, and administrators, I illustrate how standardized data institute a decontextualized racialization within biomedicine, inscribing “race” within clinical settings without specifying its social nature as understood by social scientists (Duster 2005; Phelan and Link 2015; Williams and Sternthal 2010). Health practitioners and data workers rely on biological, cultural, and social justifications for collecting racial data, recognizing potential tailored care practices that ultimately result in limited change to racialized social conditions. I call this critical paradox “racing the machine,” whereby data analytic technologies become enrolled in projects of racialization while forestalling alternative possibilities of advancing social justice.
Background
Institutional Pressure to Integrate the Social within Biomedicine
Canonical medical sociology positions the social study of health and illness as an alternative to biomedical understandings of disease (Engels 1845; House et al. 1994; Link and Phelan 1995; McKinlay 1996), prioritizing fundamental differences in resources, living conditions, and relationships in explaining class-based and racialized health inequities (Phelan and Link 2015; Williams and Sternthal 2010). Du Bois (1906:90; see also Gamble 2010) first explained the unequal burden of sickness among African Americans compared to Whites as profoundly social: Challenging then-dominant theories of biological essentialism, he argued racial differences in health were “largely matters of [social] condition and not due to racial traits . . . with improved sanitary conditions, improved education, and better economic opportunities, the mortality of the race may and probably will steadily decrease.” Contemporary sociologists continue Du Bois’s legacy by studying residential and institutional segregation, cultural racism, race-based social stress, and stigma and discrimination in explaining racialized health inequities (Ahmad and Bradby 2007; Nazroo, Bhui, and Rhodes 2020; Turner, Brown, and Hale 2017; Williams, Lawrence, and Davis 2019; Williams and Sternthal 2010). Much of this social science scholarship further recommends policy programs that directly confront these social conditions as means of reducing inequality.
Biomedical institutions today face growing pressure to redress such “social factors,” typically coded as attending to social determinants as a part of clinical care. With the exception of a few notable contributions (McKeown 1976; Navarro 1986; Waitzkin 2000), biomedicine has historically deemphasized the social conditions outlined by social scientists in favor of biological and cultural explanations of health differences, while further narrowing the scope of expected activity to the mitigation of provider bias (Betancourt et al. 2003; Ibrahim et al. 2003; Manzer and Bell 2021; van Ryn and Fu 2003). Racial reckoning regarding police violence and COVID-19 inequities, however, has provoked a flurry of activities toward integrating the social within biomedical practices: These efforts include institutional recognition of health disparities within state policy and advocacy (e.g., building on National Institutes of Health Revitalization Act, see Centers for Disease Control and Prevention 2013; Centers for Medicare and Medicaid Services 2021; Epstein 2007:75–83), new accountable care arrangements that expect providers to meet patient social needs within medical care (America’s Health Insurance Plans 2018; Cantor and Thorpe 2018; Fraze et al. 2016; Vale and Perkins 2022), and student organizing for racial justice within health professions schools themselves (Balch 2020; Evans et al. 2020; White Coats for Black Lives 2022). Explicit consideration of the social is thus now a part of medical education (Olsen 2019; Sharma, Pinto, and Kumagai 2018), biomedical research (Bliss 2012; Lee 2009; Rollins 2021; Shim 2014), and clinical guidelines and standards-based protocols (Mackenzie et al. 2020; Smart and Weiner 2018; Vyas et al. 2020). This marks a distinctive shift from previous eras of biomedicine, with biomedical institutions appearing to finally recognize the importance of social factors amid public pressure to redress inequality.
The Rise of Electronic Health Records and Data Analytic Technologies
Alongside these developments, health care organizations and practitioners find themselves in the midst of another transformation with institutional investment in data analytic technologies such as EHRs. Incentivized under the HITECH Act of 2009 and resulting in near-universal adoption across sites of care (Atasoy, Greenwood, and McCullough 2019; Ferris 2010), EHRs have ushered in an emerging digital paradigm of what Clarke et al. (2003) characterize as biomedicalization. In circulating the new lifeblood of data, EHR data systems are described as a unifying nervous system connecting clinical care, biomedical research, and policy governance via the widespread expansion of computer and information technologies within modern medicine (Blumenthal 2010; Institute of Medicine 2014). Large-scale biomedical data sources in turn have given rise to an explosion of data-centered artifacts, ranging from performance metrics and clinical algorithms to public data dashboards tracking progress toward health equity (Anderson et al. 2018; Penman-Aguilar et al. 2016; Zhang et al. 2017; e.g., covid.cdc.gov/covid-data-tracker). The rise of data analytics effectively links clinical settings to converging projects of state administration, scientific research, and population surveillance, reflecting the ongoing, increasingly complex and technoscientific transformation of United States biomedicine in the twenty-first century.
At the intersection of these two developments—growing pressure for biomedical institutions to redress social determinants, and the broad-scale transformation engendered by emerging data analytic technologies—lies the challenge of representing the social within EHR data systems. The National Academies of Sciences, Engineering, and Medicine and other advisory bodies affirm the importance of EHR data collection on social and behavioral domains in clinical settings (Adler and Stead 2015; Centers for Medicare and Medicaid Services 2021; Douglas et al. 2015; Institute of Medicine 2014). Clinical data collection on social factors, including patient race and ethnicity, is cited as key for redressing long-standing social problems such as health inequities, including those studied by social scientists. When confronted by the possibility of racialized differences associated with COVID-19, for example, elite medical societies jointly called for federally mandated data collection and release on race, ethnicity, and language as an essential first step for promoting health equity (American Medical Association 2020; American Medical Association et al. 2020; e.g., covid.cdc.gov/covid-data-tracker/#health-equity-data). Standardized data may allow institutions to place “patients in context,” with providers and systems expected to intervene upon social conditions based on newly available data, thereby resulting in the reduction of inequality.
Biomedical Stratification and the Social Nature of “Race”
Yet as Du Bois (1906) first recognized in countering biological explanations of racialized health differences, neither recognition of race–health associations nor data collection itself can guarantee that health practitioners understand the social basis of any observed data-based findings, let alone lead them to actively transform social conditions. Contemporary scholars have shown how the routine inclusion of “race” within biomedical research may result in reification, leaving biological notions of difference intact while limiting consideration of racialized social conditions (Duster 2005; Lee 2009; Roberts and Rollins 2020; Shim 2014). This occurs even when biomedical scientists explicitly cite the social in justifying novel research programs, yet prioritize approaches that reduce social complexity or deemphasize structural inequality (Ackerman et al. 2016; Bliss 2012; Rollins 2021; Shim et al. 2014). Attention to social determinants within medical education has similarly resulted in social problems seen as “facts to know, rather than conditions to challenge and change” (Olsen 2019, 2021; Sharma et al. 2018:25). And despite scientific and policy expectations of leveraging data analytics for equity, emerging concerns over algorithmic bias and racialized surveillance (Benjamin 2019; Noble 2018; Obermeyer et al. 2019) suggest that technoscientific advances remain inseparably enmeshed with contemporary processes of stratified biomedicalization. Taken together, these developments reflect the complex challenges of accounting for the social within EHR data systems and integrating such data to redress inequality.
This article presents empirical fieldwork and qualitative interviews with providers, clinic staff, data scientists, and administrators to examine the social life of racial data within a large public safety-net health system. It follows biomedical data across a range of contexts within a single organization to examine how data artifacts appear in the everyday practice of care. Ethnographic and interview-based approaches are especially appropriate for studying emerging data practices, including how such heterogeneous practices may diverge from scientific and policy expectations of data analytics. Published scholarship and public reports, for example, already present arguments for clinical data collection on social and behavioral domains and its expected use to intervene upon critical social problems (Adler and Stead 2015; American Medical Association et al. 2020; Institute of Medicine 2014; Pérez-Stable et al. 2019; Zhang et al. 2017). But to understand how standard data integrate social factors within biomedical institutions, we must examine all of the elements in the situation (Clarke and Fujimura 1992), including the views of the institutional actors expected to intervene upon social conditions post-data collection. This research carries particular weight given the centrality of EHR data reporting in institutional recognition of racialized health disparities associated with COVID-19 (American Medical Association et al. 2020; Black Public Health Collective 2020); what remains to be seen, however, is how data analytics results in the fundamental transformation of social conditions informing such inequities.
Data and Methods
This article drew from a multisite project on the rise of data analytics in U.S. biomedicine, focusing on ethnographic fieldwork of the integration of data artifacts (e.g., metrics, algorithms, and data dashboards) within a large public safety-net health system. The organization primarily serves low-income, racial and ethnic minority, and/or immigrant patients insured through Medicaid managed care, reflecting national enrollment trends in capitated managed care plans in the wake of the Affordable Care Act (America’s Health Insurance Plans 2018). As a county institution, the organization further participated in several state and federal programs mandating data reporting in exchange for reimbursement. Billed under “delivery system reform” and “value-based payment” initiatives, these programs tie financial incentives to quality metrics and performance goals based on data generated through EHR use (Anderson et al. 2018; DeMeester et al. 2017). As part of a state accountability program, the health system was required to initiate data collection on race, ethnicity, and language (REAL) in the primary care setting following growing institutional pressure to recognize and reduce health disparities. These data requirements, however, were simply one initiative out of many: Providers and staff faced many other data monitoring efforts such as controlling high blood pressure, reducing hospital readmissions, and reconciling patient medications under related data-centered accountable care programs. This article focuses on how health practitioners understood the initiation of race-based data collection, including its availability within EHRs and its relation to disparity reduction.
My ethnographic fieldwork spanned 10 months (September 2017–June 2018; n = 450 hours) and included observations and interviews with the range of institutional actors involved in data analytics integration (n = 32; Table 1). I first shadowed a data team charged with local implementation of the state accountability program, examining their work in introducing new data-centered care initiatives across multiple outpatient clinics, the IT and analytics division, and system administration. These observations introduced me to many other health care workers involved in data analytics integration, including frontline providers and staff, clinic and nurse managers, C-suite executives and administrators, and on-site technical consultants and data scientists. Although I did not originally intend to study REAL data collection, the announcement of this initiative during my time in the field led me to include interview questions on race and ethnicity in relation to data analytics. Following the procedures of constructivist grounded theory (Charmaz 2014), preliminary analysis of initial interviews led to refined questions specific to the REAL data, including how health practitioners and institutional actors viewed such efforts in relation to care and disparity reduction.
Table 1. Participant Sample.^a
Note: To protect participant confidentiality, aggregate information is presented by position, location, and background. Due to overlapping roles and responsibilities, several participants reported more than one response for each column. For more information on data collection procedures, please see Cruz and Smith (2021). EHR = electronic health record.
^a Table 1 reports 32 unique interviews, including 28 original participant interviews and 4 follow-ups.
Table 1 presents the wide range of health care workers reflected in the sample.
By following the implementation of REAL data collection across organizational units, I was able to examine how race-based data appeared in the daily work of health practitioners, clinic staff, data scientists, and administrators (Table 1; note the range of positions, locations, and professional backgrounds of participants). Field notes were digitally recorded following each day of observations, and interviews were audio recorded and subsequently transcribed. I then recruited a team of eight research assistants to organize the resulting qualitative data, first by isolating all excerpts concerning the REAL data initiative or otherwise mentioning race, ethnicity, language, or immigration. Line-by-line coding and analytical memoing resulted in team identification of early themes; I then independently refined these analyses to focus specifically on the sociological implications of investing in data analytics for redressing inequality.
Results
This section presents how electronic data capture institutes a decontextualized racialization within biomedicine. First, data standardization inscribes “race” within clinical settings but without specification as to its social nature or relation to inequality. Second, health practitioners collectively offer biological, cultural, and social justifications for collecting race-based data, reflecting lack of consensus over the meaning of race despite expanded data availability. Third, few providers or staff anticipate substantive change post-data collection, with data-based strategy deferring action toward redressing observed disparities. Taken together, these findings suggest the same data-centered interventions expected to redress injustice may ultimately reinscribe it.
Association without Context: Data Collection as Decontextualized Racialization
Within the state program, the collection of race-based data took place alongside many other IT and data initiatives across public health systems. But unlike the integration of other performance metrics—such as controlling high blood pressure, reducing hospital readmissions, and reconciling patient medications, all of which serve policy objectives of delivery system reform and quality assurance—the REAL data collection was introduced as a part of health system disparity reduction. A policy expert responsible for overseeing the state program justified REAL data collection accordingly, with race-based data expected to inform future work within local systems: We had some interest in doing better disparity reduction within our health systems, and we wanted to think about how to better integrate the collection of REAL data, as well as focus around disparity reduction in our program. . . . While there has been interest in wanting to reduce disparities in our communities and in our systems, if systems and providers don’t have any information about what the disparities are, how they differ within their patient populations, it’s hard to create that focus to better address some pretty important issues for our patients. As things started to materialize, I think there was recognition that data collection was the key missing piece. (Policy expert, state program)
This understanding of race-based data collection is commonplace among policy officials and health advocates, and aligns with many public calls for more robust data collection for purposes of disparity reduction. Standard data are seen as foundational for knowing and acting upon inequities, with data collection coupled with expectations of future change.
Yet this particular understanding of REAL data is ultimately lost in translation once it reaches frontline health care and data workers, in part due to the explosion of health IT and data-centered initiatives associated with EHRs and data analytics. Neither the social nature of race nor objectives of disparity reduction necessarily travel with public data mandates, with data collection serving as a technique of decontextualized racialization. Institutional actors do cite and recognize race–health associations within their work, but oftentimes in forms divorced from critical social problems such as inequality. For example, one of the main data analysts charged with leading data analytics integration explicitly cited racialized differences in explaining system investment in data-driven care, but failed to note the importance of reducing disparities between populations: For instance, when it comes to race and ethnicity, different rates of heart complications are more prevalent in certain races. Having the ability to research that and understand that can change the way we deliver health care for a specific population of diabetics, of hypertensives. You know, reviewing the data and seeing the outcomes from specific interventions for specific populations allow us to fine-tune our care so we can provide it not only more efficiently, but more effectively for people. That’s what data-driven care means to me.
The data-centered strategy of dividing patients into populations is ostensibly carried out for health care improvement, justified under “tailored care” practices but without specific acknowledgment of racialized health inequality. Data collection alone sustains association without context, with differences observed without explanation as revealed when I asked for an example: So let’s see, like with blood pressure. What are certain things that make general blood pressures rise for African Americans versus non-African Americans? Is it diet? Is it lack of exercise? Socioeconomic factors play into it, too. Is there more opportunity based on where this population lives here? Or is it based on what this population is eating? You know, that type of thing. It could be any of those. (Data analyst, administration)
Race is indeed recognized as an important basis for difference, as reflected in his belief that “certain things” raise the blood pressure of African Americans but not for others. Yet his open-ended questioning suggests that data collection alone cannot explain race’s particular social significance for health and medicine. Institutional actors instead are left to grapple with observed differences on their own, resulting in a decontextualized racialization despite public investment in such data for reducing inequality.
A data analytics programmer also noted the patterning of race and health, citing a general recognition of racialized health differences in light of scientific research and local data sources. Because a racial or ethnic group may be more “prone” to a particular disease, race-based data in her estimation carries unspecified potential to promote better health: I think with race and ethnicity, it helps in a certain way [to collect race-based data] because let’s just say Southeast Asians are more prone to diabetes. That’s something we all know. Maybe American Indians are more prone to something else. So, it comes back to the topic of prevention—to have it effectively done, you have to capture the data, see the population, look at the trends and see, “Okay, this population has this kind of tendency.” And so maybe the data will help them in having better health. . . . Data will only tell you it as good as it is. There is nothing new you can put in it or that you can give to it. So, if it’s not in the data, it’s not there. And if there’s data on race and ethnicity, and you have diabetes demarcated by race and ethnicity, then it definitely means certain races are higher risk than other races. Maybe others have something else. Like maybe certain races have more Vitamin D deficiency, or certain races have higher cholesterol, or certain people have arthritis. And how do you make any right decisions without data? (Programmer, IT and analytics)
The epistemic authority granted to data (“if it’s not in the data, it’s not there”) serves to bolster the mattering of race (being “higher risk”) and justify expanded data analytics, but without clear explanation of why health outcomes are regularly patterned by racial difference. Her response further risks the biologization of health concerns across social groups, given its limited consideration of racialized social conditions or health inequalities. Taken together, the insistence on the need for data without recognition of the social nature of race suggests electronic data capture institutes a decontextualized racialization. Data analytics cannot guarantee that institutional actors understand the sociological context of difference or that they relate such data to broader projects of redressing inequality.
The Politics of Explanation: Multiple Justifications for Race-Based Data Collection
Decontextualized racialization ultimately leaves health care workers to develop their own justifications for the importance of race-based data, provoking biological, cultural, and social explanations of race’s relevance for health and medicine. The most common explanation offered linked race and ethnicity with specific disease forms, with responses to data collection routinely coupled with biological explanations for race–health associations. These explanations often emerged in response to the mandated data collection, despite original policy objectives of instituting race-based data for disparity reduction. One quality improvement (QI) coordinator with a nursing background, who was enrolled in a computer science graduate program at the time of interview, provided one such example when I asked for her thoughts on the REAL data collection: Yeah, so when you have race recorded with high blood pressure, it is better to have all that information there [in the health record]. For example, with African Americans, these patients come in and they tend to have high blood pressure. So, you know that maybe it is your ethnicity which is contributing to you having those kinds of conditions, and with those ethnicities you have to be careful. So, I believe it is good to have [REAL data in the EHR]. Let me tell you another thing—South Asians have very high rates, are very prone to having heart attacks. So, for that group, they say you should watch your cholesterol, watch your activities, and your cholesterol has to be lower at a certain level. They normally say 220, but for certain ethnicities, it’s more stringent actually. So, I feel it is good to have all that information in there. It will help. (QI coordinator, quality division)
Despite recognizing race–health associations, this health care worker views race as a contributing factor to poor health itself but not as an indicator of inequality—a seeming given set of circumstances rather than something to actively work to change. The additional reference to race-specific medicine further reveals an underlying belief in the importance of race: not for the purposes of disparity reduction, but to inform “tailored care” practices (e.g., adhering to different cholesterol standards based on race-ethnicity data or offering translation services; for a review of clinical algorithms implicated in race-specific care, see Braun et al. 2021; Vyas et al. 2020).
Race-based data collection thus reveals competing practitioner understandings for the overall relevance of “race” for biomedicine, with these oftentimes conflicting with sociological explanations of racialized health inequities. Another nurse, this one working on the data integration team leading systemwide data-centered care programs, emphasized “genes” in explaining the importance of collecting race and ethnicity data: As a nurse, I know in fact that there are diseases that are more pervasive in some ethnicities, and some diseases that are more pervasive in other ethnicities. So, for me, the [race-based data] information is very important because I will have insight when caring for them. They may have a disease that is pervasive in their ethnic group that maybe the doctor hasn’t seen, but me as a nurse, I may see it.
What do you mean? Can you give me an example?
Okay . . . so we were told that Mexicans or the Latino population, they have a very high threshold for alcohol. So, they may tell us that no, they’re not drinking because they’re still not out of it, but we could smell their alcohol so we would know as a nurse, “Oh, he might have a gene that won’t really tip him over to the other side, no matter how much he drinks.” So that is important in taking care of this population because we could educate them. The alcohol may not impair him, but it will kill his liver or his kidneys. So, he can deny it to us, that he is not drinking that much, but because he may have that gene, he will be able to drink so much but not be impaired. (pause) Unlike other ethnicities, one or two glasses, already wasted! (laughs) (QI coordinator, administration)
The existence of these beliefs, especially among key clinical members of the data integration team, reveals a much more complex set of stakes in leveraging data analytics to redress inequity. In this instance, the nurse justifies race-based data collection through an inaccurate, genetics-based account of difference, perpetuates behavioral stereotypes by ethnicity, and cites the relevance of racial difference for care accordingly. Expanded data collection here confronts preexisting biological and behavioral understandings of race, but with limited consideration of racialized social conditions or their relation to inequity.
Not all health care workers offered biological justifications for collecting REAL data: Others offered explanations grounded in patient cultural differences, with a smaller minority recognizing the social significance of REAL data (see next subsection). Several emphasized such data for “tailored care” alone, oftentimes without recognizing how data analytics may inform population-level disparity monitoring. A frontline clinic manager offered another such example of the data’s immediate relevance for care: When you collect the REAL information, it fosters trust between the patient and you as the data collector. The cultural sensitivity is there, and you don’t just talk to them in a nonpersonal way. You’re not going to talk to a Filipino patient without really knowing you know their culture, right? You know that Vietnamese people, when they talk to you, they don’t look straight into your eyes since it can be construed as rudeness. But some people just look straight into your eyes because for them eye contact means showing respect, right? So, for me it’s really important that you know [patient race and ethnicity]. These are basic data, but you can really make a connection with your patient. (Nurse manager, outpatient clinic)
In this context, race, ethnicity, and language information is deemed important and relevant, but for the limited purposes of respecting patient differences as a part of clinical treatment. Race-based data collection is seen as crucial for cultural sensitivity and tailored care (echoing cultural competency initiatives or need for translation services; Betancourt et al. 2003; Olsen 2021), but there is little acknowledgment of racialized social conditions or policy objectives of disparity reduction. Thus, data collection does more than inscribe “race” within clinical settings in a decontextualized manner. In leaving providers and staff to their own devices in making sense of difference, standard data also reanimate a long-standing politics of explaining race’s connection to health, now openly taking place directly within sites of care.
Activity without Change: Data Collection as Strategy of Deferred Action
After several months of fieldwork and interviews and noting the regular decoupling of REAL data collection from stated policy objectives, I began asking informants directly about using such data to redress observed disparities. I was in part surprised by the limited acknowledgment of racialized health inequities (as grounded in social conditions, not decontextualized biological or cultural explanations), especially within the public safety-net context of serving a large racial and ethnic minority and immigrant patient population. I was also keenly aware of general policy and advocacy expectations that such data be used by local health systems for the purposes of disparity reduction. This led to another challenge: Few providers and staff members viewed their work as connected to the problem of “disparities,” and thus most anticipated little change following expanded data availability. A clinician was quick to separate her care practices from the problem of inequality itself, choosing instead to focus on her patients as individuals:
Based on fieldwork I’ve done in Washington D.C., some health advocates have called for data collection on race and ethnicity to assess disparities in health and care. Do you have any thoughts on that overall?
I don’t know. I treat every patient like an individual and I treat them all the same, you know what I mean? I don’t discriminate. “Oh gee! You’re Hispanic so we don’t need to control you as well because you’re eating tortillas.” I mean, it’s not like that, I’m trying to get the patient to have optimal health, and I’m trying to help them be the healthiest person they can be, regardless of who they are. It doesn’t matter, so I’m not sure [the data] makes a difference now when I treat a patient. (Physician and primary care provider, outpatient clinic)
This response conveys an outward expression of belief in equal treatment for all patients, perhaps in anticipation of questions on provider bias (Ibrahim et al. 2003; van Ryn and Fu 2003). This not only reflects a limited means of relating care to inequality but also betrays an absolution of the very responsibility for redressing critical social problems that data collection is expected to induce among providers (Institute of Medicine 2014:17). Thus, in contrast to scholarly and policy audiences who call for providers and systems to intervene on inequality post-data collection, the practitioners required to collect such data may not necessarily see themselves as implicated in disparity reduction. Overreliance on data collection in turn risks creating activity without change, resulting in limited transformation of the social conditions informing health inequities.
In contrast with the previously described health care workers, a small minority of informants did understand racial health inequities as grounded in social conditions and further understood the general expectation that race-based data collection be used for disparity reduction. However, these two workers still anticipated little change post-data collection based on their own experiences working within the health system. Both were immigrant women of color—from Latin America and Africa, respectively (country specifics removed to preserve participant confidentiality)—and had spent over two decades each working to serve the largely low-income, minority, and immigrant patient population as a part of the safety-net. Neither expressed much confidence in the REAL data initiative as a means of redressing inequity:
What are your thoughts on the recent Race, Ethnicity, and Language data collection?
In that respect, I still feel we fall short. For example, I’m Hispanic, okay? And I don’t see that many Hispanic providers in our hospital, in the nursing setting. We’re not seeing that representation. Our [patient] population of Hispanics is very high in our county, and in our hospital, but we’re not seeing that representation in the staff.
(pause) Thank you for sharing that with me—I’ll come back to that. Can you tell me more about the REAL data collection? Do you think it’s important to collect that in the health record?
Yes, because we want fair representation, and we really want people to advocate for the community. There is a major health crisis when it comes to being overweight, high blood pressure, diabetes in the Latino population. . . . It’s also very difficult to relate to somebody in a different culture or to teach somebody healthy eating habits, especially if they don’t have resources. I was just explaining this to my colleague nurse, “How do we teach a homeless person to eat healthy when he has to worry about that one meal a day?” We have to take little steps at a time, and it’s better when you have that understanding and sensitivity. . . . I see a lot of those kinds of issues in the hospital and in our outpatient setting. I would love to see more Hispanics involved in health care.
This complex care nurse, unlike the other health care workers previously described, here demonstrates some awareness of racialized social conditions and associated inequities, and the need for greater advocacy on behalf of underserved communities. Her further identification of the importance of basic resources and need for “sensitivity” appears to reveal an understanding of the social nature of race and ethnicity in relation to health. However, she had very little confidence in the REAL data initiative for addressing the more fundamental issues she identified herself within the organization:
Do you think the REAL data collection relates to this at all?
Not really. No, not really. Because the data collection is more like . . . when a patient comes in, we want to know whether you’re Hispanic, blah blah blah, just to lump you in this area, but not necessarily to provide you with the resources that you may need as a Hispanic person or an African American person.
What do you mean? Can you give me an example?
Well, like say I see the need for a diabetes class in Spanish. Yes, it’s good for the patients. But it also involves resources from us. We’re going to have to pay somebody to give this class. So, what is the gain to the entity [County Health System]? Long term, this patient is going to have decreased fasting glucose levels, is going to have their A1C better controlled, and ultimately that’s going to result in fewer hospitalizations . . . but that’s a long-term goal. It’s not tangible, and it will involve spending money from our part to give these specific classes in Spanish. There also just aren’t enough classes for the demand we already have, with our patient population. We also have to understand that in the Hispanic community—because I’m very familiar with it myself—we’re not all at the same learning level, okay? We still have people who are illiterate. And how do you teach somebody how to take this medicine when they cannot read the name on the bottle? You know, we really have to understand the situation. So having language data on record, what’s that going to do? . . . I don’t really see how [the REAL data] helps the patients. (Complex care registered nurse, administration)
Data collection, in this nurse’s estimation, does not stand in for understanding differential social conditions or addressing “social factors” directly. Citing the uneven distribution of resources, limited educational opportunity, and language barriers—including how these social conditions intersect with race and ethnicity and organizational dynamics—she recognizes the social context of patient situations and the need for “fair representation and advocacy.” Yet she also recognizes that data on race, ethnicity, and language alone will not generate new resources; nor will it guarantee that providers, staff, and administrators work toward redressing observed disparities, especially if they do not understand the social conditions of racialized health inequity.
A clinic manager (the second immigrant woman of color described previously) also expressed doubt that race-based data collection would ultimately lead to change, despite recognizing the REAL data’s expected role within disparity reduction. In her estimation, the narrow focus on data obscures the lack of will and resources dedicated to redressing observed differentials, with data collection ultimately a strategy of deferred action:
Do you think it’s important to collect the REAL information?
(pause) Personally, to be honest, I don’t know. From all the things I’ve read, they say that it’s used to direct care, to channel care. After you collect all the data, and you find out, “Okay, with African American women, the data show that their numbers are high in this area. What can we do to improve that aspect of their life that shows that they are not being well taken care of?” If that’s really true, that’s what it is going to be used for, and it is being done, then yes I like it, but where I have issues with it—and this is not just particular to the REAL data, but all data—is you have to have a plan, you have to know what you’re going to do with that data when you collect it. If you don’t have a plan on what to do with it, don’t do it, because the people who you are collecting that data from would like to know what happens to all this information once you’ve collected it from them. Yes, the intention is good, all very well intended, but what is the plan? If you say that African American women have a high rate of breast cancer, what are you doing to prevent it, to reach African American women, to help them? What is the plan to stop them from being high risk? What is causing them to have high risk of breast cancer, and if you find out the cause, what are you going to do about it? If you’re going to do something, then it’s worth every ounce, every second we spend on it, but if you don’t have a plan, then please don’t collect.
Like the complex care nurse previously quoted, this clinic manager recognizes the causes of disparities as rooted in modifiable social conditions (“what is causing them to have high risk” and “What are you going to do about it?”). She further demonstrates a clear grasp of stated policy goals behind REAL data collection and appears to understand very well the expectations of what might potentially be done with race-based data. Yet she ultimately viewed such data-centered initiatives as strategy without action, anticipating no system interest in following up post-data collection:
Do you think it’s a good start in terms of identifying disparities?
Again, as I said, if it’s about identifying disparity, if you find out that there is a disparity between the Latino group and the Caucasian group, how are you going to bridge it? If you don’t have a plan on bridging that disparity, why are you collecting the data? That’s why people get so data tired. If I knew that after we collected the data, and there is a disparity, and this is being done to bridge that gap, then when you come back and say, “Okay, this is what we found, this is what we’re doing, this is what we have seen, now we want to look at this”—then we would be happy to engage in that. But, instead, you will find that you collected this so many years ago, but what did you do with it? The government changed and nothing was done about it. That’s when people get data tired. (Nurse manager, outpatient clinic)
Like the complex care nurse previously quoted, she recognizes the intention of data collection but draws on her own long experience working within the organization in anticipating little follow-up work post-data collection. She describes the experience as one of neither frustration nor naïve optimism, but of being data tired: the sensation of complying with ever more data requests while not seeing change on the frontlines of the clinic or within broader society. In contrast to the public officials and health advocates who argue for investing in data analytics to redress health inequities, she instead interprets data-based mandates as a means of appearing to make progress without transforming the social conditions informing health inequality.
Discussion
In this article, I examined how embedded health practitioners and institutional actors (including providers, clinic staff, data scientists, and administrators) understand standard data on “social factors,” focusing on mandated data collection on race, ethnicity, and language (REAL). Such data are broadly described by scientific and policy audiences as an essential first step toward transforming social conditions (Adler and Stead 2015; Chin 2015; Institute of Medicine 2014), with clinical settings framed as potential sites for intervening upon inequality. Through ethnographic fieldwork and in-depth interviews, I show how such data collection, despite original intentions of supporting work toward disparity reduction, institutes a decontextualized racialization within biomedicine, provoking biological, cultural, and social justifications for collecting racial data. Such multiple explanations may ultimately run counter to projects of redressing inequities, with expected “tailored care” practices leading to limited change to racialized social conditions. In sum, investment in data analytics risks resulting in the invisible inscription of racialized health injustice, even as such data sources are legitimated as public means of rendering inequity visible.
By studying data analytic technologies within a clinical setting primarily serving low-income, racial and ethnic minority, and immigrant patients, this research thus demonstrates a critical paradox of stratified biomedicalization: The same data-centered interventions expected to redress inequity may ultimately reinscribe it. Sociologists have long recognized that the incorporation of race within biomedical research risks reification in failing to specify its sociological basis (Duster 2005; Lee 2009; Roberts and Rollins 2020), even when such scientific programs are publicly justified as attending to societal problems (Ackerman et al. 2016; Bliss 2012; Rollins 2021). Scholars of medical education have similarly shown how curricular expansion to include social determinants may perpetuate inequities, such as by appearing to address the social while preserving biological notions of difference (Olsen 2019; Sharma et al. 2018). The present study suggests that these issues do not only manifest within scientific research or medical education: Everyday clinical settings now encounter similar challenges in integrating attention to social causes following the widespread digitized transformation of care. The same racial data invested in by scientific, policy, and advocacy audiences may ultimately take on a life of their own as they move across organizational contexts into a diversity of practices. Despite the legitimacy afforded to race-based data for purposes of disparity reduction, I have shown how data analytic technologies may become enrolled within projects of racialization while forestalling alternative possibilities of advancing social justice.
This research further demonstrates the power of qualitative and ethnographic work in studying the social life of data analytics, revealing the heterogeneous and emergent practices surrounding clinical integration of data artifacts (e.g., metrics, algorithms, data dashboards). Indeed, while existing disciplinary trends adopt new computational methods and large-scale data sources for basic understanding of social processes, empirical observation of people and technologies working together reveals unique insight into how social problems appear in everyday life. In this case on leveraging data analytics to redress inequality, most health care workers expressed limited understanding of racialized social conditions as emphasized by social scientists (Ahmad and Bradby 2007; Turner et al. 2017; Williams et al. 2019), and many did not see themselves as active agents of change within their own organizations (Mackenzie et al. 2020; Spencer and Grace 2016; Wright and Perry 2010). Perhaps most concerning, the decontextualized nature of data collection and race–health associations provoked biological justifications that sociologists have long challenged in explaining enduring health inequities (Du Bois 1906; Duster 2005; Olsen 2019; Roberts 2012; Williams and Sternthal 2010). These findings underline the need for multimethod examination of the broad-scale transformations engendered by data analytic technologies, and the significance of a critical sociology of medicine in studying the contextual dynamics of enduring inequality.
If investment in biomedical data analytics results in racing the machine, then this holds broad implications for scientific, policy, advocacy, and activist audiences genuinely committed to advancing social change. As noted in the early stages of the COVID-19 pandemic, elite medical societies and advocacy groups jointly called for federally mandated data collection on race as the primary means of monitoring and redressing emerging inequities (American Medical Association 2020; American Medical Association et al. 2020; e.g., covid.cdc.gov/covid-data-tracker/#health-equity-data). Meanwhile, the racialized social conditions of differential population health profiles (e.g., higher burden of chronic illness), stratified living and working conditions, and legacies of residential and institutional segregation remain structurally intact despite heightened public awareness of COVID-19 inequities. This research thus lends empirical support to the Black Public Health Collective’s (2020) declaration that “race-based data are not racial justice,” recognizing technological effects may be paradoxical (Cruz and Paine 2021; Hoeyer 2023; Thompson 2021; Ziebland, Hyde, and Powell 2021) and themselves highly unequal (Noble 2018; Obermeyer et al. 2019; Vyas et al. 2020). It further joins recent critical scholarship in reimagining the role of data within collective struggles for freedom and social justice (Benjamin 2019; Hatch 2022; Nelson 2016; Rodríguez-Muñiz 2016).
Conclusion
Despite institutional objectives of reducing inequality, this article demonstrates how the same data expected to redress injustice may ultimately reinscribe it. Data standardization serves as a formal technique of accounting for “race,” potentially informing race-specific clinical practice, revealing practitioner belief in behavioral and genetic accounts of difference, all while forestalling alternative possibilities for advancing social justice. As data-intensive and computational logics further the ongoing, increasingly complex and technoscientific transformation of U.S. biomedicine, all audiences must confront what demands for greater data, and even the final data reports themselves, may ultimately encode: a seemingly socially conscious medicine that remains just as impenetrable to change as ever before.
Footnotes
Acknowledgements
This article is dedicated in loving memory of Luis Angel Rodriguez. May you rest in power.
An earlier version of this manuscript received the Ida B. Wells-Troy Duster Award from the ASA Section on Science, Knowledge, and Technology. I would like to honor and thank all Black scholars who inspire us to join collective struggles for freedom and social justice.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the following grants: 1144247 from the National Science Foundation, UCSF Department of Social and Behavioral Sciences (Harrington and Newcomer Health Policy Awards), and UCSF Department of Anthropology, History, and Social Medicine (Forsythe Award for Social Studies of Science, Technology, and Health). The content is solely the responsibility of the author and does not represent the official views of any of the aforementioned organizations.
