Abstract
Machine learning offers new possibilities for developing more precise diagnostics and treatments, but the increasing use of sex stratification in precision medicine algorithms raises concerns. Using Alzheimer's disease (AD) research as an example in which machine learning approaches are applied to a heterogeneous, socially patterned disease, this paper examines how the move toward sex-specific “pink” and “blue” algorithms reinforces biological sex essentialist assumptions and their attendant harms. We analyze three examples of sex-stratified algorithmic approaches in AD research and identify three interacting processes (effacing contested knowledge, obscuring social factors, and ossifying binary sex categories) that can occur when binary sex variables are incorporated into predictive models. These case studies demonstrate that even in models intended to be causally agnostic, sex categories are likely to be interpreted as decontextualized, self-evident health determinants in a manner that can imply causality of biological sex. We call for establishing ethical norms and empirical standards for including gender/sex variables in precision medicine algorithms to avoid perpetuating crude ontologies of sex and gender that undermine both scientific validity and health justice.
Introduction
Machine learning offers new possibilities for medicine: data-driven tools that promise to tailor care to each patient's particular needs and circumstances. Those who develop such tools are, with increasing frequency, stratifying their models by sex or including sex as a predictor. In light of concerns about comparable accuracy and fairness of medical algorithms for men and women (Celeste et al., 2023; McCradden et al., 2020), the appeal of sex stratification is obvious; its dangers less so.
The embrace of sex stratification stands in contrast to discussions of “race corrections” in biomedical algorithms. Recent scholarship in medicine and science & technology studies argues that race-based corrections reinforce stereotypes about biological differences between groups; systematically mischaracterize risk for non-white groups; and suggest that race is a cause of health outcomes when racism is what is causally relevant (Braun et al., 2021; Delgado et al., 2022; Denny and Collins, 2021; Eneanya et al., 2022). As a result, medical researchers are reassessing the widespread inclusion of race categories in clinical algorithms, once believed to be necessary for accurate risk prediction and racially inclusive medicine. Reflecting this reversal, a 2020 perspective in the
As we will argue, sex stratification calls for similar scrutiny. One place where sex stratification has taken particular hold is in research on Alzheimer's Disease and Related Dementias (ADRD or AD), a well-funded, politically powerful, and socially salient field of biomedicine (see, e.g., Alzheimer's Association, n.d. a, n.d. b, n.d. c) with a history of contentious debate over research ethics and the role of biological versus social factors in disease risk and prediction (Buckley et al., 2019; Morris et al., 2014; Piller, 2025). In this paper, we use AD research to characterize visions of binary sex-stratified solutions to multifactorial, costly, socially patterned health conditions, and anticipate how these visions may ossify binary sex categories across precision medicine platforms.
In medical research, sex is conventionally defined as “the different biological and physiological characteristics of males and females, such as reproductive organs, chromosomes, hormones, etc.,” and gender as “the socially constructed characteristics of women and men—such as norms, roles and relationships of and between groups of women and men” (WHO, 2021). In this paper, we also use the term “gender/sex” for cases in which “gender and sex cannot be easily or at all disentangled” (van Anders, 2015: 1181). The term “biological sex essentialism” refers to concepts and practices where sex is defined by a set of binary, fixed variables that are facts of biology, found in nature across species and ecologies, uncontroversially ‘scientific,’ and omnipresent throughout the body so that every tissue at every level of biological organization can be characterized as male or female. (Richardson, 2022: 14)
Machine learning algorithms in precision medicine use large datasets to identify patterns and features relevant for predicting health outcomes or disease categories. In this paper, we use “sex-stratified algorithms,” sometimes also called “sex-specific algorithms” and “sex-sensitive algorithms,” as an umbrella term to refer to machine learning approaches that incorporate sex categories as a source of predictive information; build a separate computational model for each sex; and/or include different numerical cutoffs or relevant predictive features for each sex.
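The three forms named in this definition can be made concrete with a minimal sketch. Everything in it is hypothetical and invented for illustration: the `Patient` record, the "F"/"M" coding, the weights, and the cutoffs are assumptions, not values taken from any study discussed in this paper.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    sex: str           # recorded binary category, "F" or "M" (hypothetical coding)
    test_score: float  # hypothetical cognitive test score on a 0-30 scale


def _clamp(x: float) -> float:
    """Keep an illustrative risk score inside [0, 1]."""
    return max(0.0, min(1.0, x))


# Form 1: the sex category is one predictor inside a single pooled model.
def pooled_risk(p: Patient) -> float:
    sex_term = 0.1 if p.sex == "F" else 0.0  # made-up weight
    return _clamp(1.0 - p.test_score / 30.0 + sex_term)


# Form 2: a separate model is built for each sex category.
def risk_model_f(p: Patient) -> float:
    return _clamp(1.1 - p.test_score / 30.0)


def risk_model_m(p: Patient) -> float:
    return _clamp(1.0 - p.test_score / 30.0)


SEPARATE_MODELS = {"F": risk_model_f, "M": risk_model_m}


def stratified_risk(p: Patient) -> float:
    # A record with any other category raises KeyError: it cannot be scored.
    return SEPARATE_MODELS[p.sex](p)


# Form 3: one shared score, but a different numerical cutoff per category.
CUTOFFS = {"F": 24.0, "M": 26.0}  # made-up cutoffs


def flag_for_followup(p: Patient) -> bool:
    return p.test_score < CUTOFFS[p.sex]


a, b = Patient("F", 25.0), Patient("M", 25.0)
print(pooled_risk(a), pooled_risk(b))
print(stratified_risk(a), stratified_risk(b))
print(flag_for_followup(a), flag_for_followup(b))
```

In all three forms, two individuals with identical test scores receive different outputs solely because of the recorded category, and none of the forms states why the category should matter.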
Technically, these methods are often intended to be causally agnostic. That is, an algorithm can offer predictions—for example, whether an individual is more likely to develop AD or will benefit more from an early intervention—without positing a causal relationship between predictors and outcomes, and without elaborating mechanisms that explain why certain individuals are more at risk or why a given intervention works. Our analysis of sex-stratified algorithms will show that the application of such algorithms, even if intended to be causally agnostic, can reinforce biological sex essentialist assumptions and their attendant harms. Notably, this identifies a novel way in which sex essentialism can become embedded in scientific research, as compared with the explicit biological sex determinism of more traditional life and biomedical sciences that feminist scholars have previously analyzed (Fausto-Sterling, 2000; Jordan-Young, 2011; Richardson, 2013).
The clinical and direct-to-consumer technologies built using sex-stratified algorithms may harm individuals and communities and impede the advancement of scientific knowledge, both in the short and long term. In AD science, they may, for example, produce disproportionately inaccurate AD predictions for people with nonbinary or gender expansive identities, or people with cisgender identities who lie on the ends of statistical distributions for men and women. They may also uphold the notion of a woman's brain and a man's brain as innately different, deepening stereotypes about differences in intelligence, cognition, career preferences or motivations, and other psychometrics that, due to their link to the brain and cognition, carry stigma and status and have historically contributed to the subjugation of women (Fine, 2010; Jordan-Young, 2011; Lock, 2013).
We argue that the unreflective use of binary sex categories in machine learning for precision medicine research accelerates approaches to studying gender/sex disparities in health that focus exclusively on biological factors and entrench biological essentialist understandings of gender/sex categories. We show that this occurs through three interconnected processes: effacing contested knowledge; obscuring the social; and ossifying binary sex categories. We describe these processes through close analysis of major research initiatives advancing the study of sex in AD precision medicine, such as the Women's Brain Project (WBP), and through three worked examples of research programs attempting to execute sex-stratified predictive algorithms aligned with the vision of the WBP. We call for precision medicine scientists, ethicists, and critical data studies scholars to establish ethical norms and empirical standards for including gender/sex-related variables in precision medicine algorithms.
Sex as the “gateway to precision medicine”
Many women's health and sex-based biology advocates posit that sex represents a “gateway” into a precision medicine future (Clayton, 2016; Ferretti et al., 2018; Hampel et al., 2018; Miller et al., 2015; Stachenfeld and Mazure, 2022). Precision medicine is a vision of medical care that is tailored to each individual patient, offering customized diagnoses, treatment plans, and predictions of health risks based on an individual's lifestyle, environment, and, most prominently, genetic make-up (Erikainen and Chan, 2019; Ferryman and Pitcan, 2018; Joel et al., 2015; National Institutes of Health (NIH), 2020; Prainsack et al., 2018). Proponents of precision medicine claim that this future can be achieved by harnessing the potential of large digital datasets and machine learning to reveal differences in disease risk, biology, and progression amongst subpopulations (Behl et al., 2022; Denny and Collins, 2021; Schaefer et al., 2019). To reach this aim, advocates call for increasing investment in AI-powered models trained on larger and more granular datasets for use in medical research.
ADRD are “debilitating conditions that impair memory, thought processes, and functioning, primarily among older adults” (U.S. Dept. of Health and Human Services, n.d.), and are responsible for significant human suffering, health care costs, and caregiving labor. AD is a highly heterogeneous disease category, that is, one with large variability in disease presentation, progression, and patterns of pathology. Despite billions invested annually into research, existing hypotheses about underlying mechanisms remain contested (e.g., the amyloid hypothesis (Morris et al., 2014)), and effective treatments have proven challenging to develop (Moutinho, 2022). Funders and researchers thus consider AD an ideal target for precision medicine (Arafah et al., 2023; Hampel et al., 2018; NIA-AA Symposium Enabling Precision Medicine for Alzheimer's Disease Through Open Science, 2021; Yang et al., 2021). That is, machine learning looks especially well-suited to tackling AD, since etiological mechanisms are multiple, complex, and interlocking—all factors that make it difficult for human researchers to craft fruitful hypotheses or identify useful patterns.
More women live with AD than men, and women are more likely to be caretakers of individuals with AD. These sharp disparities in the burden of the disease have made AD a focus of both women's health advocacy and research on sex differences. Leading voices behind this focus include the Society for Women's Health Research (Society for Women's Health Research, n.d.), the Office of Research on Women's Health, the Alzheimer's Association (Alzheimer's Association, n.d. a, n.d. b, n.d. c), the UK's Alzheimer's Society (Why is dementia different for women?, 2024), Maria Shriver's Women's Alzheimer's Movement (The Women's Alzheimer's Movement, n.d.), and the WBP (Ferretti et al., 2021)—the last of which is an influential non-profit consortium advocating for the importance of sex-based analysis in precision medicine for AD. In 2020 the Alzheimer's Association created a formal professional interest area, “Sex and Gender Differences in Alzheimer's Disease” (ISTAART Community, n.d.), including an award program to support scientific research on “understanding the contributions of biological sex and gender…to address the gaps in our understanding of the role of sex assigned at birth and related genetic, biological, lifestyle and societal factors may play in increasing vulnerability to AD” (Alzheimer's Association, n.d. a, n.d. b, n.d. c), which also established scientific research cohorts to examine sex differences in AD risk factors.
A 2023 flow chart (Figure 1), published by the WBP as part of their efforts to launch a “Sex and Gender Precision Medicine Institute,” serves as an ideograph of these developments (Castro-Aldrete et al., 2023). In this figure, research begins in the laboratory with comparisons of male and female cells in petri dishes or rodent models. It then proceeds in iterative dialogue, as depicted by the two-headed arrows, to “sex-based disease modeling,” which might involve methods such as “stratification of algorithms by sex” and “sex-sensitive deep learning algorithms.” Multi-modal population data and patient data are then processed through these algorithms to develop “sex-sensitive clinical diagnosis and treatment.” Only at the last stage might features such as “socio-cultural determinants” of health be included, alongside “access to digital health tools” that presumably will make such sex-based algorithms a ubiquitous and everyday part of people's lives.

Figure 1. A flow chart from the Women's Brain Project illustrating a “sex-sensitive Alzheimer’s disease (AD) approach” to inform “precision medicine agendas” (Castro-Aldrete et al., 2023: 7).
The WBP asserts that “
The push for sex stratification in biomedicine is not unique to AD research. The WBP's approach emerges within a powerful global movement towards stratifying by sex at all levels of biomedical research (Pape, 2021). Exemplifying this work is the mandate, introduced in 2016 by the National Institutes of Health (NIH), requiring all NIH-funded preclinical research to consider sex as a biological variable (SABV). Champions of this movement see precision medicine initiatives as naturally aligned. As Janine Clayton, Director of the NIH's Office of Research on Women's Health, puts it: attending to sex differences is “one step toward the more individualized approach to human health that is the trajectory of medical practice and the aim of the Precision Medicine Initiative” (Clayton, 2016). In close alignment with SABV policy, the WBP proposes implementing a sex-stratified approach at every stage of AD research, modeling, assessment, diagnosis, and treatment, “taking into account sex and gender differences to make a precise diagnosis and recommend a tailored and more effective treatment for each individual” (Cirillo et al., 2020: 81).
The WBP, along with similar initiatives rooted in sex-based biology and promoting gender-specific medicine, presents sex-stratified approaches to AD as not only scientifically important but also ethically essential for addressing a history of androcentrism in biomedicine. Appealing to a gender equity imperative to include sex and gender (Cirillo et al., 2020), the WBP envisions a program of research and clinical risk prediction that is centrally oriented around the category of sex, organizing data and pursuing analysis within a framework of binary sex difference.
Contested science
Sex-stratified machine learning approaches to AD risk prediction and diagnosis introduce binary sex categories into research designs in a space of underdetermination, in which knowledge about the existence of sex differences in the epidemiology of AD and the role of social variables such as education, socioeconomic class, and occupation in producing any disparities is contested. As in many areas of sex disparities science (Danielsen et al., 2022; Einstein, 2017; Lee et al., 2023; Rushovich et al., 2023), in AD research, both the
Due in large part to demographics of aging, with women living longer than men (Arias et al., 2019) and increases in AD incidence with age (Hebert et al., 2013; Mayeda et al., 2016), a greater number of women live with AD than men. Evidence on sex/gender differences in AD beyond longevity remains equivocal and contested (Mayeda, 2019). For example, claims of excess AD mortality and morbidity among women (e.g., Buckley et al., 2019) face methodological challenges resulting from survival bias, competing risks of death from other causes, and measurement challenges (Mayeda, 2019). Apart from survival differences, reported disparities in AD risk may also be shaped by the gender/sex patterning of known, modifiable risk factors, as suggested by Geraets and Leist (2023) who, for example, found no sex difference in the risk of dementia but rather “differences in the prevalence of modifiable risk factors for dementia,” such as childhood deprivation and low wealth. Anstey et al. (2021), as another example, demonstrated that gendered differences in midlife cardiovascular risk factors, such as physical activity and hypertension, largely explain observed sex differences in cognitive decline. Likewise, a growing literature supports the theory that known, modifiable risk factors of AD are patterned by gender/sex and gendered experiences (Dekhtyar et al., 2015; Sindi et al., 2021; Wolters et al., 2020), including gender identity (Brady et al., 2024). Moreover, any sex differences noted in AD incidence are “slight” compared to disparities observed across socioeconomic and racial/ethnic groups (Lim et al., 2022).
While it is likely that AD emerges from a complex of biological (e.g., genetics, cardiovascular disease, protein plaque accumulation, protein tangles) and lifestyle and environmental factors (e.g., education, employment, caregiving burden, socioeconomic status, lifetime stress, traumatic brain injury exposure), there is little consensus about the relative importance of and interaction between biological and social variables in gender/sex disparities in AD. Research indicating an important role for gender-related social factors—such as education level and occupation (e.g., Garibotto et al., 2008; Vemuri et al., 2014), experiences of violence and trauma (e.g., Severs et al., 2023), and social isolation (e.g., Shen et al., 2022)—in AD has greatly expanded over the past decade, yet considerable research effort and funding has been and continues to be predominantly directed toward biological mechanisms. For instance, the Alzheimer's Association distributed over half of its annual research funding to areas of “Molecular Pathogenesis and Physiology” and “Diagnosis, Assessment and Disease Monitoring” in 2022 and 2023 (Alzheimer's Association, n.d. a, n.d. b, n.d. c). Starting in 2018, the NIH's National Institute on Aging convened an effort to develop a “new biological research framework for Alzheimer's” that focuses on biomarker data (Silverberg et al., 2018). Biomedical researchers who advocate a “sex-specific or gender-specific focus in AD research” (Mielke et al., 2014) operate in this milieu and have set a research agenda at the highest levels of government, foundation, and pharmaceutical agencies focused on the discovery of biologically-driven, sex-specific differences in the prodromal (stage between initial symptoms and full disease onset), diagnostic, treatment, or late stage of the disease that must be taken into account in the development of therapies, screenings, and risk prediction tools (Nebel et al., 2018; Pike, 2017).
The persistence of biological explanations of observed sex disparities may reflect entrenched but outdated theories in Alzheimer's research, which have centered on hormonal explanations of AD since the 1990s. More specifically, growing interest in hormone therapy for postmenopausal women in the 1980s raised the hypothesis of estradiol decline playing a role in sex differences in AD. Indeed, the prospect of reducing AD risk became central to the explosion of hype around hormone therapy (HT) during the 1990s, with proponents of HT suggesting that the therapy reduced risk of cognitive decline in women (Henderson et al., 1994, 1996; Schmidt et al., 1996). Estradiol was not only touted as a potential cure for or preventive measure against AD; HT proponents also gestured towards a sex-specific etiological theory of AD, helping to cement, in the eyes of both researchers (Fillit et al., 1986) and the general public (Fillit, 1986), the idea of AD as a female-biased medical condition, linked to female sex-related biology.
By the 2000s, however, the tide had shifted sharply away from estradiol theories as new studies revisited the efficacy of HT and called into question the safety of long-term estradiol use. In 1992, the NIH started the Women's Health Initiative (WHI), a study intended to evaluate the efficacy and safety of HT as a preventive measure against heart disease and other health events. By 2002, the WHI elected to halt all study arms that involved estrogens and progestin treatments, citing increased risks for breast cancer, strokes, heart attacks, and other conditions (Writing Group for the Women's Health Initiative Investigators and Rossouw, 2002; for more recent developments see Manson et al., 2024). By 2004, the WHI canceled all programs involving any form of estrogen treatment. In addition to raising concerns about the risk of HT, the preliminary findings from the WHI research program on memory (Women's Health Initiative Memory Study) suggested that estrogen treatment could, in fact, increase dementia risk in women aged 65 years or older (Shumaker et al., 2003, 2004). Despite this, research on the role of biological sex-related factors in AD remains a top priority of major AD research funders, who are interested in molecular biological mechanisms and pathways for understanding AD pathology, including the interaction between apolipoprotein E ε4 (APOE; a risk factor for late life AD) genotype and chromosomal sex (Riedel et al., 2016) and enduring interest in the role of the postmenopausal reproductive transition (Scheyer et al., 2018).
In sum, today, sex and gender research on AD is characterized by extensive debate and uneven evidence about the contributions of sex and gender factors to AD. As the examples in the following section demonstrate, the application of machine learning to this contested field of science can further obfuscate the role of gender/sex in disease.
Sex-stratified predictive algorithms in Alzheimer's precision medicine science: Examples
To characterize and understand models, claims, and assumptions within the emergent use of gender/sex categories in predictive precision medicine approaches, we analyzed three papers applying sex-stratified predictive algorithms in AD precision medicine science. To identify these papers, we surveyed biomedical literature that uses sex variables in predictive models for estimating AD risk. Using a keyword-driven snowball search in Google Scholar (keywords: Alzheimer's, dementia, prediction, predictive model, risk, sex, gender, algorithm, and sex-stratified), we identified 25 articles for close analysis (see Supplementary 1). From this pool, the authorship team, composed of scholars of gender and sexuality studies, history and philosophy of science, public health, and social studies of science, selected three articles that represent emergent strategies for incorporating sex/gender categories in machine learning precision medicine, aligned with the vision of the WBP. We emphasize that these examples do not represent a systematic review of the field and that our focus is not on critiquing individual researchers, but on illuminating assumptions and approaches emerging in this particular area.
Across the 25 papers reviewed, studies use a variety of tools such as neural networks, decision trees, or classifier models. These models compute an AD risk score or assign an AD disease status on the basis of individual patient demographics, biomarkers, and/or behavioral data. In cases such as the WBP, sex categories are a central analytic because sex disparities and sexed disease pathways are an explicit focus of the research program. Elsewhere, sex categories are routinely incorporated in study designs for other reasons, including: the assumption that sex is always an important moderating factor in AD; institutional mandates by funders and publishers that gender/sex categories be included in biomedical research (e.g., “NIH Policy on Sex as a Biological Variable,” see Arnegard et al., 2020); a desire to make tools for clinical settings, where the sex category is a ubiquitous demographic variable accessible to physicians; or an everything-but-the-kitchen-sink approach that inputs all available variables and allows an algorithm to select the most predictive ones.
Below, we describe the three studies and characterize their methods. In Section 5, we reference these examples as we identify three interconnected processes—
Calculating individualized AD risk
Our first example, published in 2020, comes from the journal

Part of a figure from Qiu et al. (2020), illustrating a neural network that combines neuroimaging and non-neuroimaging inputs, including “gender,” to predict disease status for the patient (AD or “normal cognition,” NC).
This study operationalizes “gender” as “male” or “female.” The authors thereby depart from conventional definitions of gender and offer no details as to how their “male” and “female” categories are defined, assessed or reported. In this study, “gender” is included because it is easily available and usable in clinical practice. The researchers describe gender and other non-imaging variables as “known Alzheimer's disease risk factors … easily obtained by non-Alzheimer's disease specialists” (Qiu et al., 2020: 1925). In contrast to other model inputs, the inclusion of gender is not elaborated. For example, the researchers further contextualize the inclusion of MMSE scores as a current standard of diagnosis. Likewise, they justify the inclusion of age as a means of controlling for “the natural progression of cerebral morphological changes over the lifespan,” citing literature showing a clear “proportionality between age and global cerebral atrophy” (Qiu et al., 2020: 1928).
Although “gender” is included here without a hypothesis about its causal relationship to AD, the authors report that “when age, gender and MMSE information were added to the model, then the performance increased significantly” (Qiu et al., 2020: 1928). Together with the statement that gender is a “known Alzheimer's disease risk factor,” the research team's findings imply that machine learning tools have identified binary sex categories as offering valuable predictive information about AD risk.
Sex-specific algorithms for AD diagnosis
Our second example is a 2019 paper published in the journal
To do this, the authors stratified the dataset by sex, creating two sex-specific branches—one for males and one for females—in the decision tree algorithm. They then applied feature selection algorithms to the male and female subpopulations, which selected the most informative neuropsychological tests for predicting cognitive outcomes. The results showed that the set of top five most predictive tests for AD status in males differed from the set of top five most predictive tests in females, generating two different decision tree computations for males and females. Although the authors also performed stratifications by education level (receiving high school education and above, or not) and APOE e4 status and found that optimal neuropsychological test profiles also differed in these stratifications, they elaborate only the results for sex stratification in the paper's discussion section.
The research team interprets these different optimal neuropsychological test profiles as evidence of a need for sex-specific algorithms and decision rules in AD screening. They also hypothesize that the sex-specific decision trees likely differ for males and females due to sex-related “cognitive heterogeneity” (Ang et al., 2019: 268) in AD phenotype, which might include sex differences in performance on cognitive tests and/or differential impacts of disease on cognitive domains in men as compared to women. The researchers conclude in support of the view that machine learning offers a promising approach for developing tailored diagnostics and “sex-specific decision rules” for AD (Ang et al., 2019: 268). Several co-authors of this paper have since collaborated with WBP researchers on a deeper probe to develop “sex-specific predictive models” based on male and female neuropsychological profiles (Ferretti et al., 2024). The result of this sex-specific machine learning approach to AD is to generate separate diagnostic or screening algorithms for men and women—creating pink and blue algorithms.
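The stratify-then-select procedure described in this example can be sketched in a few lines. This is an illustration under stated assumptions, not the method of Ang et al. (2019): the records, the test names (`test_a`, `test_b`, `test_c`), and the ranking rule (absolute gap between case and control means) are all hypothetical stand-ins for the feature selection algorithms used in that study.

```python
from statistics import mean

FEATURES = ["test_a", "test_b", "test_c"]  # hypothetical neuropsychological tests

# Hypothetical records, invented for illustration; "ad" is 1 for AD, 0 for controls.
RECORDS = [
    {"sex": "F", "ad": 1, "test_a": 2, "test_b": 5, "test_c": 4},
    {"sex": "F", "ad": 1, "test_a": 3, "test_b": 5, "test_c": 5},
    {"sex": "F", "ad": 0, "test_a": 8, "test_b": 6, "test_c": 5},
    {"sex": "F", "ad": 0, "test_a": 9, "test_b": 6, "test_c": 8},
    {"sex": "M", "ad": 1, "test_a": 5, "test_b": 2, "test_c": 4},
    {"sex": "M", "ad": 1, "test_a": 6, "test_b": 3, "test_c": 5},
    {"sex": "M", "ad": 0, "test_a": 6, "test_b": 9, "test_c": 7},
    {"sex": "M", "ad": 0, "test_a": 7, "test_b": 8, "test_c": 8},
]


def top_features(records, k=2):
    """Rank tests by the gap between case and control means; keep the top k."""
    cases = [r for r in records if r["ad"] == 1]
    controls = [r for r in records if r["ad"] == 0]
    gap = {
        f: abs(mean(r[f] for r in cases) - mean(r[f] for r in controls))
        for f in FEATURES
    }
    return sorted(FEATURES, key=lambda f: gap[f], reverse=True)[:k]


def stratified_top_features(records, key, k=2):
    """Split records on any grouping field, then select features per stratum."""
    strata = {}
    for r in records:
        strata.setdefault(r[key], []).append(r)
    return {level: top_features(rows, k) for level, rows in strata.items()}


print(top_features(RECORDS))                    # pooled selection
print(stratified_top_features(RECORDS, "sex"))  # one selection per stratum
```

The function accepts any grouping field, so the same code would return stratum-specific profiles for education level or APOE e4 status if those fields were present. The output reports which tests differ between strata, and carries no information about why they differ.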
Predicting patient sex from AD data
Our third example is a paper from the WBP's research program published in 2022 in the
In this study, the authors demonstrated that a machine learning classifier (a type of model that predicts the category of a given input) could use digital biomarker data to predict the self-reported sex of healthy patients. Using 793 biomarkers capturing features of cognitive processing and physiology, which were collected from participants’ completion of two motor tasks and two augmented reality tests, the sex classifier demonstrated that it could successfully distinguish the sex of participants with good predictive performance (0.75 AUC). This finding is, in turn, interpreted as validating the importance of sex classifiers in “precision neurology” (Harms et al., 2022: 310). The WBP-Altoida research team interprets the classifier's performance in predicting the sex of healthy subjects as evidence of sex differences in baseline healthy neurocognitive performance. As the authors write, “sex differences [are] expressed by the capacity of results to inform a sex classifier” (Harms et al., 2022: 302).
The researchers further found that the classifier performed more poorly on a dataset of patients with mild cognitive impairment or AD as compared to healthy patients. They interpret this difference in the classifier's predictive performance between healthy and AD subjects as evidence that sex-based neurocognitive profiles change across the course of AD disease progression. In other words, they interpret the differential performance of classifier algorithms as evidence of both a sex difference in healthy patients and a sex difference in how AD progresses.
In this case, the perceived successful use of neurocognitive data to predict sex is interpreted as evidence of a sex difference in disease progression, inferring etiological significance from a dataset's performance in categorizing sex. As such, Harms et al. illustrate how algorithms can be mobilized not only to generate pink and blue disease-predictive models, but also in reverse, to predict people's sex category. The researchers conclude by calling for an “integrated framework for sex-stratified prediction, monitoring, and personalized treatment” (Harms et al., 2022: 310), which they argue has clinical significance for early disease detection and tailoring preventative treatment for those at-risk or in early stages of dementia, and can guide the development of digital diagnostic and preventative tools.
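Because the argument in this example rests on a single performance metric, it may help to spell out what an AUC measures. The sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen member of one category receives a higher classifier score than a randomly chosen member of the other. The scores are hypothetical and were chosen so that the toy AUC equals 0.75; they are not data from Harms et al. (2022).

```python
def auc(scores_a, scores_b):
    """Share of (a, b) pairs in which a scores above b; ties count as half."""
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


def accuracy(scores_a, scores_b, threshold):
    """Share of individuals assigned to their recorded category at a threshold."""
    correct = sum(s >= threshold for s in scores_a)
    correct += sum(s < threshold for s in scores_b)
    return correct / (len(scores_a) + len(scores_b))


# Hypothetical classifier scores for people recorded in category A and category B.
GROUP_A = [0.9, 0.7, 0.6, 0.4]
GROUP_B = [0.8, 0.5, 0.3, 0.2]

print(auc(GROUP_A, GROUP_B))             # 12 of 16 pairs are ordered "correctly"
print(accuracy(GROUP_A, GROUP_B, 0.55))  # 6 of 8 individuals classified correctly
```

In this toy case the two score distributions overlap substantially, and one in four individuals is assigned to the other category. The metric summarizes how well scores separate two labels in a given dataset; it does not indicate what produces the separation.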
From sex stratification to biological sex essentialism
Scholars across information science, science and technology studies, public health, and gender studies have warned of the complexities and risks of uncritically embracing big data and machine learning. These include the risks of perpetuating discrimination in areas such as policing, employment, healthcare, and biometrics (Benjamin, 2019; Chun, 2021; Hu and Kohler-Hausmann, 2020; Pierson, 2024; Scheuerman et al., 2021; Selbst and Barocas, 2018; Wang et al., 2023), as well as the dangerous bioessentialist trade-offs of “inclusion and difference” approaches to correcting histories of sexism in medicine and advancing women's health (Epstein, 2007; Keyes et al., 2020; Richardson et al., 2015). These literatures motivate a concern that algorithmic practices may uncritically bake essentialized gender/sex markers of human difference into medicine.
Limited previous work that has looked specifically at the inclusion of gender/sex variables in medical algorithms has flagged the exclusion and omission of women and gender minorities from research and the potential for precision data platforms to inadequately capture social dimensions of sex and gender oppression (Pot et al., 2019). Writing about the exclusion of nonbinary individuals in algorithms for calculating body composition, Albert and Delano (2022a) highlight how the use of binary sex categories perpetuates “category-based erasure, the idea that although a particular group or subgroup of people may be present in a dataset, categories have been constructed in such a way that their presence cannot be determined one way or the other” (Albert and Delano, 2022a: 4). For example, nonbinary or intersex persons may be included in a dataset but lumped together within male/men or female/women categories, invisibilizing them within the structure of research categories. Consequently, gender/sex data in electronic health records—which are frequently used for machine learning because of their scale and ubiquity—exhibit, among other things, “slippage” between sex and gender variables; ambiguity of what a sex category refers to (genitals, sex assigned at birth, chromosomes, etc.); and fixation on sex assigned at birth as ground truth (Albert and Delano, 2022b).
Sex-stratified precision medicine algorithms clearly raise concerns about sex/gender slippage and category-based erasure. But they also raise additional, distinct issues, demonstrated by our analysis of their application to AD research. Here, in the face of contested science, purportedly causally agnostic machine learning approaches carry the potential to introduce mutually reinforcing, looping processes that sediment essentialist, binary, and biological approaches to explaining disparities in human health.
In the three examples above, sex is incorporated into models without an explicit hypothesis about the causal pathways between sex and AD outcomes, that is, in a manner agnostic to the reasons for the predictive value of sex variables. This is a common feature of precision medicine machine learning approaches, which are often solely or primarily interested in predictive power, in contrast to approaches aimed at identifying the causal mechanisms. Predictive machine learning-based research need not posit a causal role for sex in AD etiology (the causal pathway of a disease). Likewise, it need not resolve whether sex variables are a measure of biological factors and/or a proxy for other variables, such as gender or gendered exposures. Such causal agnosticism is often touted as an advantage of predictive machine learning (Anderson, 2008; Mayer-Schönberger and Cukier, 2013).
However, examining sex-stratified algorithms in AD precision medicine research makes clear how this causal agnosticism can work to
When sex-stratification in algorithms also works to obscure the role of social factors in health outcomes, the utilization of sex in predictive models without hypothesis or mechanism is likely to endorse and sediment essentialist, binary, and biological approaches to explaining health disparities. This can be seen in Example 2, the Ang et al. (2019) study. Observed sex differences in optimal neuropsychological test profiles could reflect differences in gendered educational and/or occupational experiences—which impact testing comfort and performance in these birth cohorts—rather than inherent differences between groups in aspects of sex-linked biology. However, the question of whether these findings are due to gender/sex differences in testing or measurement validity, rather than underlying AD heterogeneity, goes unasked. Here,
The risks of effacing contested knowledge and obscuring the social are amplified by the potential for looping, self-confirmatory processes to ossify categorical machine learning approaches in biomedical and population health research. When sex makes a difference to the algorithm's accuracy or decision nodes, and contested knowledge about the complex role of gender/sex related variables is sidestepped, some researchers use this to call for sex-based risk assessment in healthcare. We see this, for instance, in Example 3, Harms et al. (2022), in which the authors conclude that “more data on sex differences could guide future clinical practice, informing choices for ad hoc prevention (knowing sex-specific risk profiles), diagnosis (adjusting diagnostic cut-offs by sex), and treatment options (if sex specific efficacy and safety profiles will be found)” (2022: 310). Here,
Sex-stratified predictive algorithms in brain-related conditions use binarized data that fuses diverse demographic, biomedical, and social measures, capturing signals from a range of contextually situated sexed and gendered variables that systematically vary across gender/sexed bodies, but which cannot always be causally attributed to sex-related biology. Because the distribution of disease as well as lifestyle factors and social and environmental exposures are known to vary considerably across genders/sexes, we can expect that such approaches will likely return models of AD risk that, by some quantitative measure, “work” better for women or for men, and that appear to confirm the hypothesis that women and men carry different vulnerabilities for dementias. For instance, if sex functions as a proxy for occupational hazards that were gender-specific in the context of the cohorts used for testing and validation, adding sex as a variable or stratifying by sex will improve the model's predictive performance for that cohort, even though sex is not mechanistically related to AD diagnosis.
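The proxy scenario described above can be illustrated with a minimal simulation. The sketch below is not drawn from any of the studies discussed. The cohort, the variable names (`sex`, `exposed`, `outcome`), and every probability are hypothetical assumptions chosen only to make the logic visible. By construction, the outcome depends solely on an occupational exposure that is unevenly distributed by sex, and sex has no direct effect. Each "model" simply predicts the observed outcome rate within the groups defined by its variables, scored in-sample with the Brier score (lower is better).

```python
import random
from collections import defaultdict


def simulate_cohort(n=20000, seed=0):
    """Simulate a hypothetical cohort in which sex affects the outcome
    only through a gender-patterned occupational exposure."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sex = rng.randint(0, 1)  # binary sex tag as recorded in the data
        # Assumption: exposure is far more common in one sex category.
        p_exposed = 0.7 if sex == 1 else 0.2
        exposed = int(rng.random() < p_exposed)
        # Assumption: outcome risk depends on exposure only, never on sex.
        p_outcome = 0.4 if exposed else 0.1
        outcome = int(rng.random() < p_outcome)
        rows.append({"sex": sex, "exposed": exposed, "outcome": outcome})
    return rows


def brier_score(rows, features):
    """In-sample Brier score of a model that predicts the observed
    outcome rate within each group defined by `features`."""
    totals = defaultdict(lambda: [0, 0])  # group key -> [outcomes, count]
    for r in rows:
        key = tuple(r[f] for f in features)
        totals[key][0] += r["outcome"]
        totals[key][1] += 1
    squared_error = 0.0
    for r in rows:
        cases, count = totals[tuple(r[f] for f in features)]
        squared_error += (cases / count - r["outcome"]) ** 2
    return squared_error / len(rows)


cohort = simulate_cohort()
scores = {
    "baseline": brier_score(cohort, ()),  # one overall rate for everyone
    "sex": brier_score(cohort, ("sex",)),
    "exposure": brier_score(cohort, ("exposed",)),
    "exposure+sex": brier_score(cohort, ("exposed", "sex")),
}
for name, score in scores.items():
    print(f"{name:>13}: {score:.4f}")
```

In this constructed cohort, adding sex to the baseline improves the score even though sex plays no mechanistic role, while adding sex to a model that already contains the exposure changes the score only negligibly. A model that never measures the exposure would therefore appear to confirm a sex difference that is entirely an artifact of the proxy relationship.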
The outcome is a binary sexed model of AD, that is, a model of separate “male” and “female” Alzheimer's diseases, each with different etiologies, courses, and outcomes. Indeed, the vision of a future of precision medicine guided by sex-specific algorithms, exemplified by the WBP, is a drive toward separate testing protocols, diagnosis metrics, and preventative and treatment regimes for males and females, enabled by binary sex stratification in data collection and machine learning analyses. Such protocols and treatments will likely prove less effective not only for many nonbinary, intersex, transgender, and gender expansive individuals, but also for cisgender men and women whose biomarker profiles deviate from the mean of their sex's distribution. In the form of bioessentialist “pink algorithm/blue algorithm” claims, sex stratified precision medicine exits the laboratory into a world of powerful misogynist beliefs about sex differences in intellectual ability and cognitive strengths, which contribute to stigmatizing stereotypes in a range of arenas including education, career, and economic potential (Fine, 2010; Jordan-Young, 2011).
Everyday encounters with the technologies developed using these algorithms, many of which are intended for use in the clinic as well as at home or in direct-to-consumer devices, may directly help to construct gendered subjectivities in which people understand themselves to be part of a category with particular cognitive-behavioral risks, potentialities, and baselines. This data may be tracked and used in other areas, such as health insurance premiums (Sadowski, 2024), in an unequal way that further disadvantages those with higher predicted risk for AD. Companies that have commercialized digital biomarker data and cognitive performance metrics may profit from targeting women and heightening their anxieties about female risk for AD. When used in interaction with physicians, educational institutions, and other social institutions, these technologies risk contributing to the othering, exclusion, and stigma of nonbinary, trans, and intersex people in medical knowledge and at the clinical interface, for whom binary logics are particularly incoherent. Moreover, binarized algorithms can be co-opted by non-scientific actors, such as law and policymakers, to lend scientific legitimacy to trans-exclusionary laws and regulations (Sudai et al., 2022). This is all in addition to the potential harm to biomedical knowledge, where sex-binary tools, once deployed and standardized widely in the clinic, are difficult to revise or remove, creating intellectual and infrastructural barriers to pursuing scientific questions outside of this binary (Pape et al., 2024; Richardson, 2022; Richardson et al., 2015).
Conclusion and recommendations
AD leads to significant suffering and its prevalence will likely increase in the future as populations age. While there is no cure for AD, researchers hope that risk prediction algorithms may convince people to make lifestyle changes that might delay symptoms. It is understandable that researchers and policymakers want to predict risk of AD onset, understand the causal mechanisms behind it, quantify and track the burden of the disease, and develop tools that will help support those with the condition and their caretakers. Equally, it is commendable that researchers wish to correct for and avoid repeating histories of sexism in medicine and biomedical research. It is also vital to attend to sex-related biases in datasets and their implications for inaccuracies and error rates for people tagged with a specific sex. But risk prediction involving socially salient categories such as sex and gender can also bring harm.
Sex-stratified approaches to risk prediction in precision medicine are rapidly advancing with minimal social, ethical, conceptual, and methodological dialogue around these practices. The specific case of sex-stratified algorithms in AD research illustrates how bold programs that build binary sex categories into algorithmic approaches to disease risk prediction are moving forward despite significant ongoing uncertainty among health researchers regarding any causal relationship between biological sex and AD—and even over whether gender/sex disparities exist in the disease at all, and if they do, the magnitude of these effects relative to other drivers of AD vulnerability, such as education and occupational history.
AD is but one example of a broader move toward sex-specific metrics, cutoffs, and diagnostics across biomedicine. Sex-stratified algorithms are currently under development across a range of domains and diseases, such as predicting opioid use (Bright et al., 2021) and liver disease (Straw and Wu, 2022), diagnosing cardiac disease (Bermúdez-López et al., 2022), and assessing ACL knee injury risk (Beynnon et al., 2015). Further analysis of this landscape is needed to understand the prevalence of sex variables in precision medicine research designs using machine learning, characterize how sex constructs are operationalized, and examine the specific assumptions that inform sex-stratification using binary sex categories.
This is challenging, especially given that existing data on sexed and gendered social factors is often limited and of poor or biased quality (e.g., Pot et al., 2019). There are, nonetheless, ways to engage in more ethical and precise research when attending to gender/sex variables. For example, D'Ignazio and Klein (2020) have proposed data collection, interpretation, and visualization practices to align data science with principles of gender equity and diversity. Scheuerman and Brubaker (2018) suggest a model of participatory design workshops to develop more trans-inclusionary algorithms. Theoretical developments, such as sex contextualism (Richardson, 2022), offer frameworks for systematically engaging in research on sex-related variation outside of a binary model, acknowledging contextual and pragmatic limitations on interpreting sex-tagged data, and testing plural hypotheses about the structure of sex and gender-related variation in relation to an outcome of interest. In some instances a solution may involve eliminating sex and gender variables, while in others it may mean a more precise, contextualized use of those variables. As discussions on race-related variables in clinical algorithms demonstrate, it is possible for the research community to productively come together around standards for the ethical use of a contested social category in algorithmic research in a way that advances the rigor and precision of the use of these categories in biomedicine.
In summary, we conclude that it is reasonable to anticipate that the current pursuit of sex-stratified precision medicine platforms using AI-informed tools will contribute to the acceleration of unjustified biological sex essentialist assumptions and frameworks in medicine, from data collection, to the laboratory and computational analysis, to the clinic, to health policy and economics, health-adjacent wellness discourses and consumer devices, and ultimately legal and folk understandings of sex as a biological category. In these ways, the naturalization of sex as a predictor of risk in machine learning and related precision medicine approaches is poised to repeat the mistakes of other uncritical uses of social categories, perpetuating crude ontologies of sex and gender that are ill-suited to both precision and health equity.
Supplemental Material
Supplemental material, sj-docx-1-bds-10.1177_20539517251381674 for Sex in the medical machine: How algorithms can entrench bioessentialism in precision medicine by Kelsey Ichikawa, Marion Boulicault, Alex Thinius, Marina DiMarco, Audrey R Murchland, Ben Maldonado, Abigail S Higgins and Sarah S Richardson in Big Data & Society
Footnotes
Acknowledgements
The authors would like to thank the members of the Harvard GenderSci Lab, particularly Rory Brinkmann and Kai Jillson, for their invaluable contributions to this research project. Kendra Albert, Solon Barocas, Seetha Davis, Gillian Einstein, Kadija Ferryman, Nancy Krieger, David Jones, and Brian Liu offered detailed comments on earlier drafts of the paper. We are also grateful to the many Alzheimer’s researchers and computational scientists who shared their time and expertise.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Robert Wood Johnson Foundation and the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (grant numbers: 79892, 019.221SG.009).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.