Abstract

Surrogate markers, which are biomarkers or physiological measures intended to substitute for clinical outcomes, are commonly used as endpoints in clinical trials because they can be modified earlier by treatment and thereby allow for shorter, smaller, and less costly studies.1 However, surrogate markers often lack strong correlations with clinical outcomes and may overestimate treatment effects or fail to predict meaningful clinical benefits.2,3
While the use of surrogate markers as endpoints in industry-sponsored trials has drawn scrutiny,4 federally funded clinical trials also shape the evidence base and influence scientific norms. The National Institutes of Health (NIH), the largest public funder of biomedical research globally, allocates an estimated $28 billion to support clinical trials and related activities.5–7 NIH-sponsored studies are subject to different pressures and expectations than industry-sponsored trials, yet the extent to which they rely on surrogate markers, as opposed to clinical outcomes, remains poorly understood.
We identified all NIH-sponsored Phase 2–4 interventional clinical trials registered on ClinicalTrials.gov between January 1, 2006 and December 16, 2024, to characterize their primary and secondary efficacy endpoints and evaluate use of surrogate markers, exploring trends over time and variation by intervention type. Our sample was limited to trials with at least one publicly pre-specified primary efficacy endpoint on ClinicalTrials.gov; trials focused solely on feasibility, pharmacokinetics, or pathophysiologic mechanisms were excluded.
Data from study records as of December 16, 2024, were retrieved by one author (A.M.), who extracted all primary and secondary efficacy endpoints; a 10% subsample was cross-validated by a second author (S.Y.), and any uncertainties were discussed with a third author (J.S.R.) and resolved by consensus. Using an established framework,8 efficacy endpoints were classified into one of four mutually exclusive categories: clinical outcome, intermediate outcome, surrogate marker, or other (feasibility, toxicity/safety, diagnostic, and pharmacokinetic/pharmacodynamic endpoints). Clinical outcomes were defined as direct measures of how a patient feels, functions, or survives. Intermediate outcomes were defined as measures of function or symptoms that do not directly reflect final health outcomes, such as behavioral, engagement, and adherence measures. Surrogate markers were defined as laboratory, imaging, or physiological indicators intended to substitute for clinical benefit; the Food and Drug Administration’s (FDA) table of surrogate endpoints was used as a reference.9 Intervention type was determined using the structured “Intervention Type” field from ClinicalTrials.gov, which categorizes trials by intervention modality (e.g., drug, behavioral, device, procedure). Descriptive statistics were calculated using R (version 4.4.1) to characterize efficacy endpoints by classification type and intervention category.
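The trial-level measure reported in the results (whether a trial registered at least one endpoint in a given category) can be derived from a long table of classified endpoints. The following is a minimal Python sketch of that aggregation, not the authors' R code; the trial identifiers, the record layout, and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical endpoint records: (trial_id, endpoint_role, category).
# These rows are illustrative placeholders, not data from the study.
ENDPOINTS = [
    ("TRIAL-A", "primary", "surrogate marker"),
    ("TRIAL-A", "secondary", "clinical outcome"),
    ("TRIAL-B", "primary", "intermediate outcome"),
    ("TRIAL-B", "secondary", "intermediate outcome"),
    ("TRIAL-C", "primary", "clinical outcome"),
    ("TRIAL-C", "secondary", "other"),
]

# The four mutually exclusive categories described in the methods.
CATEGORIES = (
    "clinical outcome",
    "intermediate outcome",
    "surrogate marker",
    "other",
)


def categories_by_trial(records):
    """Map each trial to the set of endpoint categories it registered."""
    by_trial = defaultdict(set)
    for trial_id, _role, category in records:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        by_trial[trial_id].add(category)
    return dict(by_trial)


def trials_with_at_least_one(records):
    """Count trials with at least one endpoint (primary or secondary) per category."""
    by_trial = categories_by_trial(records)
    return {c: sum(c in cats for cats in by_trial.values()) for c in CATEGORIES}


counts = trials_with_at_least_one(ENDPOINTS)
n_trials = len(categories_by_trial(ENDPOINTS))
for category in CATEGORIES:
    print(f"{category}: {counts[category]}/{n_trials} trials")
```

Because a trial can register endpoints in several categories, the trial-level counts produced this way can sum to more than the number of trials, which is why the trial-level percentages in the results do not add up to 100%.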
Among 1592 eligible clinical trials registered on ClinicalTrials.gov between January 1, 2006, and December 16, 2024, we identified 2707 unique primary efficacy endpoints (median = 1 per trial, interquartile range (IQR) = 1–2) and 6808 secondary endpoints (median = 3 per trial, IQR = 2–6). Of the 2707 primary endpoints, 667 (24.6%) were classified as clinical outcomes, 860 (31.8%) as intermediate outcomes, 728 (26.9%) as surrogate markers, and 452 (16.7%) as other endpoints. Among secondary endpoints, 2262 (33.2%) were clinical outcomes, 2016 (29.6%) intermediate outcomes, 1989 (29.2%) surrogate markers, and 541 (7.9%) other endpoints. Overall, 882 trials (55.4%) used at least one efficacy endpoint (primary or secondary) categorized as clinical outcomes, 763 (47.9%) intermediate outcomes, and 811 (50.9%) surrogate markers. Across trial start years, the proportion using endpoints from each category fluctuated yearly without a consistent pattern.
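The endpoint-level percentages above follow directly from the reported counts. As a quick arithmetic check, the short Python sketch below recomputes them; the counts are copied from the text, while the dictionary and function names are illustrative assumptions rather than part of the authors' analysis, which was done in R.

```python
# Endpoint counts by category, copied from the results paragraph above.
PRIMARY = {
    "clinical outcome": 667,
    "intermediate outcome": 860,
    "surrogate marker": 728,
    "other": 452,
}
SECONDARY = {
    "clinical outcome": 2262,
    "intermediate outcome": 2016,
    "surrogate marker": 1989,
    "other": 541,
}


def shares(counts):
    """Return each category's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for label, counts in (("primary", PRIMARY), ("secondary", SECONDARY)):
    print(f"{label} endpoints (total = {sum(counts.values())})")
    for name, pct in shares(counts).items():
        print(f"  {name}: {counts[name]} ({pct}%)")
```

The category counts sum to the stated totals of 2707 primary and 6808 secondary endpoints, and the recomputed shares match the percentages given in the text.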
Among all primary and secondary endpoints classified as clinical outcomes (n = 3003), the most common were quality of life (n = 224, 7.5%), depressive symptoms (n = 198, 6.6%), and overall survival (n = 193, 6.4%). Similarly, the most common endpoints classified as intermediate outcomes (n = 2876) were weight, body mass index, or waist circumference (n = 198, 6.9%), smoking behavior or abstinence (n = 141, 4.9%), and physical activity (n = 127, 4.4%); the most common endpoints classified as surrogate markers (n = 2643) were progression-free survival (n = 165, 6.2%), laboratory biomarkers (n = 164, 6.2%), and objective response rate (n = 100, 3.8%); and the most common endpoints classified as other endpoints (n = 993) were adverse events or safety outcomes (n = 323, 32.5%), feasibility or implementation measures (n = 139, 14.0%), and pharmacokinetic parameters (n = 67, 6.7%).
Among these 1592 clinical trials, the intervention types were behavioral (n = 605, 38.0%), drug/biologic (n = 535, 33.6%), device/diagnostic/procedure (n = 223, 14.0%), and other (n = 229, 14.4%) (Table 1). Use of at least one clinical outcome efficacy endpoint was similar across intervention types: it was highest among device/diagnostic/procedure trials (132/223, 59.2%), followed by drug/biologic trials (304/535, 56.8%), behavioral trials (327/604, 54.1%), and other trials (119/229, 52.0%). In contrast, use of at least one surrogate marker efficacy endpoint varied widely: it was highest among drug/biologic trials (409/535, 76.4%), followed by device/diagnostic/procedure trials (129/223, 57.8%), other trials (96/229, 41.9%), and behavioral trials (177/604, 29.3%).
Table 1. Outcome classification endpoints by intervention type in NIH-sponsored clinical trials (2006–2024).
Each cell shows the number and percentage of trials in a given intervention category that reported at least one endpoint of the specified type, based on primary or secondary efficacy endpoints.
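The contrast between the two endpoint types can be made concrete by comparing the spread of trial-level percentages across intervention types. The Python sketch below does this with the numerators and denominators exactly as printed in the text; the variable and function names are illustrative assumptions, and the spread (maximum minus minimum percentage) is a simple descriptive summary introduced here, not a statistic reported by the authors.

```python
# (trials with at least one endpoint of the type, trials in the category),
# copied as printed in the results text.
CLINICAL = {
    "device/diagnostic/procedure": (132, 223),
    "drug/biologic": (304, 535),
    "behavioral": (327, 604),
    "other": (119, 229),
}
SURROGATE = {
    "drug/biologic": (409, 535),
    "device/diagnostic/procedure": (129, 223),
    "other": (96, 229),
    "behavioral": (177, 604),
}


def percentages(cells):
    """Convert (numerator, denominator) pairs to percentages."""
    return {name: 100 * num / den for name, (num, den) in cells.items()}


def spread(cells):
    """Difference, in percentage points, between the highest and lowest category."""
    values = list(percentages(cells).values())
    return max(values) - min(values)


for label, cells in (("clinical outcome", CLINICAL), ("surrogate marker", SURROGATE)):
    ranked = sorted(percentages(cells).items(), key=lambda kv: kv[1], reverse=True)
    print(f"Trials with at least one {label} endpoint")
    for name, pct in ranked:
        print(f"  {name}: {pct:.1f}%")
    print(f"  spread: {spread(cells):.1f} percentage points")
```

The numerators sum to the overall trial counts reported earlier (882 for clinical outcomes and 811 for surrogate markers), and the spread is several times larger for surrogate markers than for clinical outcomes.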
In this analysis of NIH-sponsored clinical trials, more than half used at least one clinical outcome as a primary or secondary efficacy endpoint, and just over half used at least one surrogate marker. There were no observed changes in the pattern of endpoint use over time. While the use of clinical outcomes was consistent across intervention types, use of surrogate markers, intermediate outcomes, and other endpoint types differed, with surrogate markers used most commonly in trials involving drug or biologic interventions. In contrast, behavioral trials, the most common type of NIH-sponsored trial in our sample, relied more heavily on clinical and intermediate outcome endpoints.
Prior studies evaluating surrogate marker use, especially those assessing FDA approvals, have focused primarily on industry-sponsored trials and primary efficacy endpoints alone.2,10–12 Our study extends these analyses to all registered primary and secondary efficacy endpoints, offering a more comprehensive understanding of the type of evidence that NIH-sponsored trials aim to generate.
This study has several limitations. First, we analyzed registered endpoint data only and did not evaluate whether trials were initiated, completed, or published. Second, registered endpoints in ClinicalTrials.gov may not represent those reported in final publications. Finally, although we used a structured framework and independent review of a subsample to categorize endpoints, classification may be influenced by trial descriptions and reporting quality.
Our findings suggest that surrogate markers remain widely used as endpoints in NIH-sponsored trials. Future work could explore how outcome selection varies by therapeutic area, NIH institute, trial phase, or other characteristics, including trial-level combinations of clinical and surrogate primary and secondary endpoints, and whether the use of surrogate endpoints ultimately aligns with meaningful clinical benefit.
Footnotes
Acknowledgements
The authors thank Kyungwan Hong, PharmD, PhD, for his feedback during project development and manuscript preparation. Dr. Hong is an employee of the National Institutes of Health, received no compensation for his effort, and has no competing interests to disclose.
Author contributions
Mr A.M. had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. A.M. and J.S.R. contributed to the concept and design. All authors contributed to the acquisition, analysis, or interpretation of data. A.M. contributed to the drafting of the manuscript. All authors contributed to the critical revision of the manuscript for important intellectual content. A.M. contributed to the statistical analysis. J.S.R. contributed to the supervision.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Mr A.M. and Mr S.Y. have no potential conflicts of interest to report. Dr J.D.W. reported receiving grants from the National Institute on Alcohol Abuse and Alcoholism (K01AA028258, R01AA032254), Arnold Ventures, Johnson & Johnson (through the Yale Open Data Access Project), and the FDA (through the Yale University-Mayo Clinic Center of Excellence in Regulatory Science and Innovation) and previously receiving personal fees from Hagens Berman Sobol Shapiro LLP and Dugan Law Firm APLC (as a consultant) outside the submitted work. Dr R.R. currently receives research support from Arnold Ventures, The Greenwall Foundation, and Public Citizen for work related to the US FDA and previously received research support through Yale University from the Stavros Niarchos Foundation and FDA. She also receives an honorarium through the Reimagining America Fellowship with the Roosevelt Institute, as well as personal fees from Debevoise & Plimpton LLP as a consultant outside the submitted work. She previously received consultancy fees from the Johns Hopkins Bloomberg School of Public Health in 2022 for work funded by the Swedish International Development and Cooperation Agency, outside the submitted work. Dr J.S.R. currently receives research support through Yale University from Johnson and Johnson to develop methods of clinical trial data sharing, from the FDA for the Yale-Mayo Clinic Center for Excellence in Regulatory Science and Innovation (CERSI) program (U01FD005938), from the Greenwall Foundation, and from Arnold Ventures.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
