Abstract
Background:
NIH requires NIH-funded studies to use historical race and ethnicity categories, originally put forth by the Office of Management and Budget in 1997, for demographics collection. These historical categories were designed for use only within the United States (US). We questioned whether these categories adequately capture the true diversity of participants enrolled in the Inherited Neuropathy Consortium (INC), and whether they are applicable to a broader, international rare disease population.
Objectives:
To determine the feasibility and outcomes of using updated categories for rare disease patients that can be collapsed into the required historical categories.
Design:
We expanded on existing government categories from countries with INC sites to create categories that reached 100% consensus among the research team. Quantitative cross-sectional analysis was performed in two cohorts.
Methods:
Common government census categories among the US, the United Kingdom, Italy, and Australia were used to generate updated demographic categories capturing racial, ethnic, sex, and gender identities. We piloted the updated categories at three INC sites with participants enrolled in the INC. We then made a minor update and sent the survey to everyone who had joined the Rare Disease Clinical Research Network’s contact registry.
Results:
Both the pilot study and the contact registry saw an increase in diversity with the updated categories. The sex breakdown of the survey respondents was similar to that of the contact registry as a whole, but several participants were able to identify as nonbinary with the updated categories.
Discussion:
The updated categories allow researchers to provide a more inclusive race and ethnicity identification experience to participants. This may have implications for understanding differences in study populations that may translate to treatment response, with the overall aim of increasing enrollment in and adherence to observational research.
Plain language summary
The authors noticed that the race and ethnicity categories required for NIH-funded studies might not work well for participants living outside the U.S. To address this, they created new categories that include those used by other countries but can still be grouped to match the traditional NIH categories. These updated categories were first tested on a small group of patients enrolling in the Inherited Neuropathy Consortium. After adjustments, they were tested again using a survey sent to people registered in the Rare Disease Clinical Research Network contact registry. The updated categories revealed more diversity among respondents compared to the original categories. These new categories meet federal requirements for demographic data collection and provide better insights into diverse populations outside the U.S. This approach could help researchers address health disparities in clinical studies.
Introduction
There is evidence of poor representation of people from disadvantaged backgrounds in clinical research. 1 People with rare diseases are also considered “hard to reach,” and the intersection of both characteristics further compounds the issue. Categorization of people by race and ethnicity has a controversial history originating from the eugenics movement. 2 It has been a historically difficult endeavor, as races are not genetically distinct subspecies, and they are poor markers for underlying genotypic, structural, and cultural characteristics. 3 There is a recognition, however, that many societies are diverse and there is a call to ensure inclusion of people from different backgrounds in research for biomedical and socio-political reasons.3,4 While the collection of race and ethnicity data in clinical research studies can be controversial, it is recommended when examining health disparities in research, per National Academies guidelines. 5
The persistent underrepresentation of diverse populations in clinical trials undermines the generalizability of research findings. 6 Inclusivity is not just beneficial but is essential for conducting high-quality science. It is necessary for developing effective medical algorithms and risk models. This will ensure that health outcomes are more comprehensive and applicable across various genetic and cultural backgrounds. Achieving diversity in research participants from all ancestral, racial, ethnic, and cultural groups is essential for developing effective treatments and interventions. One example is the work of Yalcouyé et al., 7 which demonstrated that Charcot-Marie-Tooth (CMT) type 1A (CMT1A), by far the most common CMT subtype among Inherited Neuropathy Consortium (INC) participants, is actually quite rare in Africa, and which suggests that protective factors among those of African ancestry could account for this. This could have implications for future CMT1A treatments.
Sex differences also exist in research engagement: research participants have historically been more likely to be male, although there is a trend toward more female participation. 8 Additional barriers exist for people who are transgender/nonbinary due to negative experiences of health care. 9 There is a social inclusion argument, but a biomedical rationale has also been presented. Inclusion of people who have undergone gender reassignment has been highlighted as a specific need in pharmaceutical trials due to differences in pharmacokinetics between biological sexes (assigned at birth) and also for individuals receiving gender-affirming treatments. 10
The Inherited Neuropathy Consortium consists of 20 international centers that have followed over 8000 subjects with inherited peripheral neuropathies, collectively described under the umbrella of Charcot-Marie-Tooth disease. Pathogenic variants in more than 100 different genes are known to cause CMT, 11 which affects individuals of all ethnic and genetic sex backgrounds. 12
The INC is part of the Rare Disease Clinical Research Network (RDCRN), funded by NIH. NIH requires studies it funds to use its historical race and ethnicity categories for demographic data collection; these were devised by the Office of Management and Budget (OMB) in 1997 and had not been updated until recently. 13 The RDCRN maintains an international patient contact registry in which patients from the 20 different RDCRN consortia self-report their demographic data using the historical NIH categories. Ninety percent of these 3146 individuals describe themselves as White, with 60% identifying as female.
The INC conducts natural history studies to determine how various genetic subtypes of CMT progress over time in order to be “clinical trial ready” for emerging interventional studies. An international collaboration like the INC faces challenges, as the racial and ethnic categories collected by the countries of the consortium differ according to the socio-political status and the indigenous or immigrant groups within those countries.
The Diversity Committee of the INC aimed to harmonize the nationally recognized racial/ethnic categories for the four main countries represented in the consortium. Within the INC, approximately 90% of individuals in the database self-report as White, with around 60% female, nearly identical to the RDCRN Contact Registry. The authors were concerned that the INC was not acquiring data from a diverse group of participants. Without a diverse participant population, it would be impossible for the INC to determine whether there are racial or ethnic differences in how the disease presents and progresses, and whether disparities in diagnosis and access to care exist. Learning this information should lead to a more comprehensive understanding of the disease and may present an opportunity to create more effective treatments and therapies for all patients with CMT in the future. Further, a diverse participant population may lead to increased interest and confidence in future treatments or therapies among members of some racial and ethnic groups. 6
As an initial step to understand why researchers were not reaching a more diverse population, the historical demographic categories were re-examined. It was speculated that they may not accurately capture the diversity among an international population, as they were designed for use within the US only. They do not specifically recognize people with mixed racial and ethnic backgrounds, and they force people of Hispanic ethnicity to also choose a racial identity that they may not feel is appropriate (White, for example). In addition to concerns about the accuracy of the historical categories in an international population, the authors intended to provide a more inclusive experience for research participants. For instance, the historical categories incorrectly use male and female as binary gender (rather than sex) categories without additional options to represent a spectrum. 14 The authors presume that when participants feel that their demographics are accurately represented in a study, enrollment and retention in a long-term study will increase. This project sought to determine the feasibility and outcomes of using updated categories for rare disease patients who represent an international cohort while maintaining the requirements of the historical categories.
Methods
Creation of INC-updated categories
From spring of 2020 to spring of 2021, nationally collected race and ethnicity government census categories were identified for the United States 15 and the United Kingdom. 16 The INC includes sites in Italy and Australia; however, these countries do not collect race or ethnicity data as part of their censuses, only nationality.17,18 Common categories were identified between each source, then grouped under broad categories and sub-categories. These categories, along with the INC’s original race, ethnicity, and gender categories, are available in the Supplemental Materials. Next, categories specific to one country were grouped under the broad and sub-categories. This early framework was presented to the INC diversity committee to reach consensus on categories and check appropriate wording, with the goal of staying as close as possible to the original published categories from each government.
The race categories were expanded to include Hispanic and Latino/a identities, rather than listing them as an ethnicity separate from race, as they are in the historical categories, consistent with the emerging guidance in 2021 when the finalized categories were created. 19 While race and ethnicity are social constructs, race refers to groupings based on physical appearance and is distinct from ancestry, whereas ethnicity involves shared culture, language, and traditions. 20 Latino/a refers to individuals from Latin American countries and includes Portuguese-speaking Brazilians, while Hispanic refers to those of Spanish-speaking origin, including Spain. 21 This clarification ensures a more accurate representation of these diverse populations.
Pilot study
For this point prevalence study, we selected three high-enrolling INC sites located in areas with comparatively distinct local demographic profiles, one of them outside the US: Cedars-Sinai Medical Center, Los Angeles, California; University College London Queen Square Institute of Neurology, London, United Kingdom; and University of Iowa, Iowa City, Iowa.
The updated categories were utilized by the three selected study sites when completing demographic data collection on participants who were newly enrolling in the INC from January 1, 2022, through March 31, 2022. Any participant who was enrolling at one of the three sites as a new INC participant during this period was included in this pilot. The Cedars-Sinai site also included INC participants who were returning for a study visit. A power analysis for sample size was not performed, as this was intended to include as many participants as possible, provided they met the inclusion criteria.
Participants were asked to provide information about their identity using the historical categories and then were asked if they knew and/or were willing to provide any additional details in the updated categories.
Individual participants’ demographic information from the historical categories was compared to that from the updated categories. As there is no known racial or ethnic predisposition to having CMT, 12 enrolled subject demographics at each study site should roughly match those of the corresponding local population. Therefore, pilot project demographic breakdowns at each site were also compared to the local demographics in each city to help determine whether the sites were truly reaching the target population.
Contact registry survey
Following the pilot study, the authors modified the updated race and ethnicity categories to (1) reduce the number of persons in the “multi-racial” category who are of multiple White European ethnicities; 22 (2) update the term “American Indian” to “North American Indigenous” to reflect the inclusion of all North American Indigenous peoples, rather than only those residing within the United States; 23 and (3) update the sex and gender categories on the data collection form per the recommendations of the Office of the Chief Statistician of the United States. 24 The first of these was achieved by streamlining our main categories to collapse into the historical categories from the US Office of Management and Budget, with the addition of a Middle Eastern/North African (MENA) category, while still maintaining our expandable sub-categories. 22
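To illustrate how expandable sub-categories can collapse into the historical categories, the following is a minimal sketch only. The sub-category labels and the helper name `collapse` are hypothetical and chosen for illustration; the study's actual categories are listed in its Supplemental Materials. The target labels are assumed to follow the 1997 OMB race categories plus a separate Hispanic ethnicity question, under which people of Middle Eastern or North African origin are classified as White.

```python
# Hypothetical sub-category -> historical (1997 OMB) race category lookup.
# These labels are illustrative assumptions, not the study's actual list.
TO_HISTORICAL_RACE = {
    "White - British": "White",
    "White - Italian": "White",
    "Middle Eastern/North African": "White",  # 1997 OMB definition of White
    "Black - Caribbean": "Black or African American",
    "Asian - Indian": "Asian",
    "North American Indigenous": "American Indian or Alaska Native",
}
HISPANIC = "Hispanic or Latino/a"  # combined race/ethnicity option (assumed label)
UNKNOWN = "Unknown or Not Reported"


def collapse(selections):
    """Collapse one respondent's updated selections into (race, ethnicity)."""
    if not selections:
        return UNKNOWN, UNKNOWN
    # Distinct historical races implied by the selected sub-categories.
    races = {TO_HISTORICAL_RACE[s] for s in selections if s in TO_HISTORICAL_RACE}
    ethnicity = "Hispanic or Latino" if HISPANIC in selections else "Not Hispanic or Latino"
    if not races:
        race = UNKNOWN
    elif len(races) == 1:
        race = next(iter(races))
    else:
        race = "More Than One Race"
    return race, ethnicity


# Two White European sub-categories collapse to White, not to a multi-racial category.
print(collapse(["White - British", "White - Italian"]))
# A respondent identifying solely as Hispanic is not forced into a race.
print(collapse([HISPANIC]))
```

In this sketch, multi-racial status is decided after collapsing rather than before, which is one possible way to keep respondents with several White European ethnicities out of the "more than one race" count.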
For an additional point prevalence study, we then created a brief survey in which participants were asked to provide their racial, ethnic, sex, and gender identities using the updated categories. Additional IRB approval was obtained (HawkIRB protocol #202211436), and the survey was sent to anyone who had previously agreed to be contacted for possible research when they joined the RDCRN contact registry. Study data were collected and managed using REDCap electronic data capture tools hosted at Yale University,25,26 and a link to the survey was included in the invitation email that all members of the contact registry received. When recipients opened the survey link, the initial screen displayed a letter containing elements of consent. Upon reviewing the letter, interested potential participants were taken to the survey. The initial invitation was sent on March 1, 2023. Two more invitations were sent on March 15 and March 29, and the survey remained open until April 12, 2023 (42 days total). Power analysis for sample size was not performed, as this was intended to include as many participants as possible as long as they met the inclusion criteria.
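The survey timeline above can be checked with simple date arithmetic. The sketch below uses only the dates stated in the text; the variable names are illustrative.

```python
from datetime import date

# Invitation dates and closing date as reported in the text.
invitations = [date(2023, 3, 1), date(2023, 3, 15), date(2023, 3, 29)]
closed = date(2023, 4, 12)

# Total time the survey was open, counted from the first invitation.
days_open = (closed - invitations[0]).days

# Days between consecutive milestones (invitation to invitation, then to close).
milestones = invitations + [closed]
gaps = [(later - earlier).days for earlier, later in zip(milestones, milestones[1:])]

print(days_open)  # 42
print(gaps)       # [14, 14, 14]
```

The reminders were therefore spaced two weeks apart, and the survey closed two weeks after the final invitation.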
Results
Pilot project
Across the three sites, 43 participants were enrolled during the three-month pilot period.
Demographic information from participants enrolled at all three sites when using the historical categories versus the updated categories is outlined in the Supplemental Materials. When using the updated categories, the number of participants who identified as “White” decreased, and the number who identified as “mixed/multiple ethnic group” increased at all three sites. Four participants who initially identified as “other race and ethnic background” were placed into more descriptive categories. Comparisons of the demographics of enrolled subjects at each site versus local census data are provided in the Supplemental Materials.
RDCRN contact registry survey
The RDCRN contact registry at the time that the survey was completed (in March and April of 2023) included a total of 3146 registrants who agreed to be contacted. Registrants include persons with a rare disease, unaffected family members, unaffected volunteers, parents of children under 18 with rare disease, or parents of adults with rare disease who need assistance. Registrants represented 73 countries and all 50 states within the United States. When originally signing up for the registry, registrants had the option to complete a demographics form with the historical race, ethnicity, and sex categories. The racial and ethnic breakdown of registrants using the historical categories is provided in Figure 1.

Figure 1. Race/ethnicity of contact registry.
Eight hundred twenty-seven of the 3146 registrants responded to the survey (response rate of 26.3%). Forty-three respondents answered “no” when asked whether they were part of the RDCRN contact registry and were therefore unable to answer further questions, and three respondents submitted blank surveys; these 46 entries were removed. The final number of participants was 781, representing 24.8% of the contact registry. The breakdown of respondents’ racial and ethnic identities is directly compared to those in the historical categories in Figure 1. Approximately 90% of participants self-declared as being of White race using the historical categories, while this figure dropped to 76% using the updated categories.
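The response figures reported above follow from a short calculation, sketched here with the counts stated in the text (variable names are illustrative):

```python
registrants = 3146      # contact registry members invited
responses = 827         # surveys returned
not_in_registry = 43    # answered "no" to being part of the registry
blank = 3               # blank submissions

# Entries retained for analysis after exclusions.
analyzed = responses - not_in_registry - blank

# Percentages of the full registry, rounded to one decimal place.
response_rate = round(100 * responses / registrants, 1)
analyzed_share = round(100 * analyzed / registrants, 1)

print(analyzed)        # 781
print(response_rate)   # 26.3
print(analyzed_share)  # 24.8
```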
The racial identity breakdown among survey respondents who selected “more than one race” (N = 80) is depicted in Figure 2. Using the historical categories, 3% of individuals self-selected as being in more than one race, which increased to 10% when using the updated categories.

Figure 2. Race and ethnicity selections of contact registry survey respondents who selected “More Than One Race” (N = 80).
Figure 3(a) provides the sex at birth of updated survey respondents compared to the historical contact registry. The breakdown is very similar in both cases. The gender identity breakdown among survey respondents is shown in Figure 3(b). When comparing individual entries, 13 respondents did not identify as cisgender, including six respondents who considered themselves nonbinary, four who considered themselves agender, and three with other identities.

Figure 3. (a) Sex assigned at birth of contact registry. (b) Gender identity of contact registry (updated version).
Discussion
Clearly defining demographic categories is crucial for fostering diversity and inclusion in clinical research, which in turn improves health outcomes by ensuring that treatments and interventions are both inclusive and representative. Rare genetic disorders may vary in their onset, severity, and natural history. Sex, 27 ethnicity, and race can offer clues to causation. 14 Therefore, natural history data should include patients from different sexes, ethnicities, and racial backgrounds to ensure that the data accurately reflect disease severity and progression. In addition, demographic data used to identify ethnic and racial backgrounds in subjects must be precise and up-to-date to ensure that the classifications of ethnicity and race are accurate. Our data demonstrate that the historical demographic data identified using NIH criteria may underestimate racial and ethnic diversity, which could limit the ability to interpret natural history studies and, ultimately, clinical trials. Furthermore, for people who do not identify as their sex assigned at birth, forcing them into selected sex categories may contribute to “minority stress” 28 and be perceived as a microaggression on the part of the research team, 29 which can, in turn, affect the enrollment and retention of underrepresented groups.
The expansion of the demographic categories, as we have done in this study, helps address these limitations. For example, in both the overall RDCRN contact registry and the pilot, the percentage of those who identified as White dropped when using the updated categories. Using the updated categories, there was an increase in the number of respondents identifying as more than one race. Additionally, by collecting gender identity data in addition to sex, we were able to learn that 13 survey respondents did not identify as cisgender. Asking only about sex assigned at birth does not capture this population.
Efforts to recruit diverse populations in research have become increasingly common. Since this project began, other groups have published inclusive race and ethnicity categories for use in rare disease research, and most are designed for use within the US.19,30 The United States Food and Drug Administration (FDA) published draft guidance on the collection of race and ethnicity data for industry in January 2024, which included the suggestion of using an updated set of NIH categories for studies on international patient populations. 31 The US OMB also released updates to the federal standards for race and ethnicity collection that took effect on March 28, 2024. 22 Furthermore, the UK’s National Institute for Health and Care Research (NIHR) has launched a Research Inclusion Strategy to ensure that funded research is inclusive. 4 The updated categories presented in this paper provide a suitable framework for collecting race and ethnicity data in international studies while satisfying the updated US federal standard, the FDA’s proposed requirements, and the NIHR’s Research Inclusion Strategy.
Several federal agencies have also published draft or final guidelines regarding the collection of sex, gender, and sexual orientation data in NIH-funded research since 2022.14,24,32 The sex and gender categories that were used for the RDCRN survey are not completely aligned with any of these recommendations.
Inclusive demographics data collection also allows researchers to identify populations on which they need to focus. 33 In order to achieve diverse trial enrollment, building trusting relationships with members of the population is paramount, and this should begin long before there is a clinical trial. 6 Authentic community engagement can be resource-intensive, and the availability of detailed preliminary data may, in some cases, help justify allocation of the necessary resources for these efforts. Once the trial begins, asking participants to provide demographic information that does not include their racial or ethnic identity and that clearly was not designed for them may make recruitment more challenging, especially for those who are already skeptical. The ability to include and understand the needs of all affected individuals is especially important in trials for rare diseases, as the population may be so small that all diagnosed individuals should be included in any potential trial.
Going forward, these updated demographic categories will be applied to other consortia within the RDCRN to determine whether our findings are specific to the INC or are translatable to other rare disease networks. Similar increases in identified diversity are expected if the updated categories are applied in more common disease populations. Further research analyzing the free responses to the question “Do you feel these categories capture accurate representation on your identity?” will inform revisions based on community feedback. The authors recognize that many challenges remain in reaching more diverse populations, not only for research studies but also for clinical care. Nevertheless, these results are an important proof-of-concept for capturing diversity within study participants more comprehensively.
Limitations
There are several limitations to this project. Since the contact registry survey was anonymous, the authors were unable to directly compare the updated categories to the respondents’ historical categories for this cohort. The authors decided to maintain this anonymity in order to encourage survey completion, since some of the demographic information may be considered sensitive. 34 In addition, because survey respondents were not asked to identify the RDCRN consortium with which they are involved or their role (e.g., patient, caregiver), these data are not consortium-specific. Rather, the authors hope that this paper serves as a feasibility study to encourage wider use. Furthermore, although it was not possible to prevent individuals from completing the survey more than once, this was unlikely to be a significant issue, given that there was no incentive to complete the survey and that it was open for a short period of time. Moreover, it appears that the “more than one race” category could have been confusing for some respondents, as some checked “no” to that question while selecting more than one race. It is possible that those individuals do not identify as multi-racial but consider their race and ethnicity to be separate, such as those who chose both “Jewish” and “White.” Another limitation of the updated categories is that if power analyses are performed in future studies, statistical power may be low due to the potentially small number of participants in some categories. Thus, it is helpful to be able to collapse the updated categories into the historical ones. A power analysis for sample size was not performed, as the study attempted to recruit every participant who met the inclusion criteria. Finally, it was not possible to include every possible identity across all nations. While our updated categories do increase the ability to capture the diversity of a study population, they are not all-encompassing.
Conclusion
The updated categories offer several additional advantages over the historical categories. The updated version allows participants to specify which races they identify with or to simply choose “more than one race” as a standalone identity. In contrast, while the historical categories allow respondents to select more than one race or ethnicity checkbox, they do not recognize “more than one race” as its own identity. Allowing respondents to choose one or more races, rather than creating specific checkboxes for the most common multi-racial identities (Black/White, for example), allows for inclusion of all individuals and their racial identities, rather than including the most common and potentially causing those whose identities are underrepresented to experience “minority stress.” 28 Another advantage of the updated categories is the inclusion of “Hispanic” in a combined race and ethnicity section, rather than as distinct from race. This change eliminates the need for those who identify solely as “Hispanic” or “Jewish” to choose a race with which they may not identify at all. By offering a more inclusive race and ethnicity identification experience to participants, researchers may improve their understanding of differences in study populations. This improved understanding may translate to increases in enrollment, adherence to observational research, and, eventually, improvements in treatment responses.
Supplemental Material
sj-docx-1-trd-10.1177_26330040251359676 – Supplemental material for Updated demographics categories to capture the true diversity of an international registry of rare disease patients
Supplemental material, sj-docx-1-trd-10.1177_26330040251359676 for Updated demographics categories to capture the true diversity of an international registry of rare disease patients by Nicole Kressin, Michael E. Shy, Tara Jones, Nidia Villalpando and Gita Ramdharry in Therapeutic Advances in Rare Disease
Footnotes
Acknowledgements
The authors would like to thank the members of the RDCRN and INC diversity committees for their ongoing support and feedback over the course of this project.
Declarations
Supplemental material
Supplemental material for this article is available online.
References
