Abstract

Background

Machine learning (ML) and artificial intelligence (AI) applications have increased across different stages of clinical research. Their use in clinical trials (CTs) has been discussed but not quantified.

Methods

A scoping review was conducted by searching PubMed, Embase (Ovid), and Scopus for CTs or protocols. The goal was to understand the extent of ML and AI applications in the design, conduct, and analysis of CTs. Screening was performed on Covidence, with GPT model support.

Findings

After title/abstract and full-text screening, 108 records were included; in some studies, AI/ML was applied across multiple stages. At the design stage, 20 studies involved advanced methods: six applied them to stratification, four to treatment selection during randomization, six to participant selection, two to outcome assessment, and two to site selection. At the conduct stage, seven studies used them in the collection and analysis of data from wearable devices, and one for monitoring. AI/ML was used more commonly at the analysis stage, in 93 CTs; however, limitations in reporting trial objectives make it difficult to distinguish between primary and exploratory analyses.

Interpretation

This research identifies a serious mismatch between the potential and actual applications of ML in CTs. Considering the potential benefits of ML in CTs, such underuse could hinder the evolution of CTs toward faster and more efficient approaches.

Keywords

Machine learning, artificial intelligence, clinical trials, design opportunities, design conduct analysis, reporting guidelines, randomized controlled trials, adaptive trial design

Introduction

Machine learning (ML) and artificial intelligence (AI) methods are gaining popularity across many sectors, including clinical research,¹ with numerous opportunities already identified in the literature.² They can reshape clinical settings, reduce complexity, lower risk of failure, and improve success and efficiency across all stages of clinical trials (CTs).¹ However, less is known about their current usage in these contexts; therefore, this review aims to identify the most common applications of these digital technologies and explore how they are implemented to provide insights for regulatory agencies and organizations planning to adopt them.

In the discussion paper “Using Artificial Intelligence & Machine Learning in the Development of Drug & Biological Products,” the US Food and Drug Administration (FDA) discusses current and potential applications of AI and ML in interventional and non-interventional studies, including participant recruitment and selection for population enrichment, participant stratification, dose optimization, monitoring to improve adherence and retention, site selection, continuous data collection with devices, data management, analysis, and real-time clinical endpoint assessment.³

Similarly, the European Medicines Agency (EMA) and the European Medicines regulatory network published a “Draft reflection paper on the use of Artificial Intelligence (AI) in the medicinal product lifecycle” in December 2023, specifying that AI usage should be documented in the protocol along with its risk-benefit assessment following the ICH E6 guidelines and that the statistical analysis plan (SAP) must detail data handling and safety concerns.⁴ National agencies, such as the Italian Medicines Agency's (AIFA) Clinical Trials Office, have raised concerns about reliability, transparency, and security, particularly in the case of non-fully validated algorithms.⁵

Few reviews have examined AI-assisted tools in interventional⁶ or non-interventional trials,⁷ yet the growing interest has fueled discussions on their integration to improve the operational management, design, and conduct of CTs. A recent protocol aimed to review AI tools designed to optimize recruitment and retention,⁸ which were previously identified as key areas of interest.² For example, wearables allow continuous monitoring to identify dropout risk and predict events, enabling personalized interventions.²

AI-assisted procedures and ML-guided interventions have been explored for treatment selection and allocation, guided by reinforcement learning,⁹ and for predictive modeling of responses or optimal doses.¹⁰ Advances have been made in patient-to-trial matching informed by real-world data and large language models (LLMs),¹¹ and in supporting prescreening and screening procedures with natural language processing (NLP); although these do not directly affect the design, they contribute to earlier planning phases.¹²

Some reviews have described how high-dimensional data from wearable devices can inform trial design.⁶ However, systematic evidence on AI/ML use for trial design, conduct, and analysis remains limited.¹,¹³ It is important to determine whether these methods have real applicability or whether current interest is merely speculative.

The aim of this scoping review is to summarize current AI and ML usage in CTs. We describe the frequency of usage, field of application, model type, and clinical area. By examining current applications, we assess whether these methods are impactful, with practical rather than theoretical benefits. Of special interest are: (1) Design, which involves patient selection and cohort composition (screening, matching to the trial) and predictive modeling for stratification (optimal treatment allocation, automatic recruitment, cohort composition, and biomarker identification for risk stratification); (2) Conduct, which concerns activities after the start of the trial, with monitoring via devices to collect passive data aimed at increasing adherence, retention, and safety and at reducing adverse events; and (3) Analysis of the primary or secondary/exploratory endpoints, if included in the main paper or study protocol, covering complex modeling, imaging analysis, feature selection, and outcome prediction. AI-assisted procedures and ML-guided interventions themselves are beyond the scope of this review.

Methods

Protocol and registration

The protocol was drafted following the PRISMA Extension for Scoping Reviews (PRISMA-ScR),¹⁴ available in Supplementary Table S1, and it was registered on the Open Science Framework (OSF) on October 16, 2024 (registration link: https://osf.io/8qb5n).¹⁵

Eligibility criteria

For this review, peer-reviewed papers published without time restrictions up to February 28, 2025, written in English, were considered. They had to refer to CTs or protocols conducted in humans, regardless of intervention type. To be selected, studies had to implement AI/ML methods in the design, conduct, or main analysis of the trial. Papers were excluded if they concerned non-interventional studies, such as cohort, cross-sectional observational, and retrospective studies. Specifically, post hoc and secondary analyses of earlier published CTs were excluded. Articles were excluded if they did not mention AI or ML; however, if there was an implicit reference to these methods, the full text was reviewed. For interventions, studies on robotic, remote-controlled, or computer-assisted interventions, and those testing an AI-assisted intervention or ML-based predictive algorithm as the experimental arm versus a control arm, were excluded. These do not fit the review’s scope, since in them advanced methods are validated as interventions against gold standard procedures or devices. Articles without an available full text and non-peer-reviewed articles were also excluded.

Search strategy and selection criteria

Potential studies were identified by searching three public bibliographic databases, namely PubMed, Embase (via Ovid), and Scopus, up to February 28, 2025. The search strategy was based on partitioning the research question into three main concepts: type of study, usage of AI/ML, and stage of application. This facilitated the translation of the search strings across the three databases using the Polyglot Review Accelerator (https://sr-accelerator.com/#/polyglot, accessed on February 19, 2024) to maintain consistency in the searches. The search strings and methodologies are provided in Supplementary Table S2.

Citations were selected using strings built on the three main concepts. The filter for randomized controlled trials (RCTs) used the Cochrane validated strings for PubMed (https://work.cochrane.org/pubmed, accessed on February 19, 2024). The AI/ML concept used terminology covering the best-known advanced methods. The stage of application was built on the stages considered appropriate in the FDA discussion papers, which include design (screening eligibility, patient enrollment, risk stratification pre- and post-randomization), conduct (monitoring, adherence, retention with continuous collection of data from wearable devices), and analysis (imaging, endpoint detection with prediction algorithms, wearables). The string was developed to be as comprehensive as possible.
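The three-concept structure described above can be sketched as a Boolean query in which synonyms within a concept are joined with OR and the concepts are joined with AND. This is only an illustration: the terms and the names `CONCEPTS`, `or_block`, and `build_query` are hypothetical, and the actual search strings are those reported in Supplementary Table S2.

```python
# Hypothetical terms for illustration only; the real strings are in
# Supplementary Table S2 and are not reproduced here.
CONCEPTS = {
    "study_type": ["randomized controlled trial", "clinical trial", "trial protocol"],
    "ai_ml": ["machine learning", "artificial intelligence", "deep learning"],
    "stage": ["design", "conduct", "analysis"],
}


def or_block(terms):
    """Join the synonyms of one concept with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(concepts):
    """Join the concept blocks with AND, so a record must match all of them."""
    return " AND ".join(or_block(terms) for terms in concepts.values())


query = build_query(CONCEPTS)
print(query)
```

A database-specific translation step (here performed with the Polyglot tool) would then adapt field tags and syntax for each database; that step is not modeled in this sketch.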

Details on title and abstract screening, which involved different GPT models (gpt-4-turbo-2024-04-09, gpt-4o-2024-08-06, and gpt-4o-mini-2024-07-18), are available in Supplementary Table S3 along with the prompts. Title and abstract screening was performed in two separate instances: first, on articles collected up to February 19, 2024, with gpt-4-turbo using individual calls; then, on the articles of the first instance plus the additional articles collected up to February 28, 2025, with gpt-4o, using a batch application programming interface call performed with the gpteasyr package.¹⁶ All articles chosen by at least one of the GPT models were included for full-text screening; the additional articles of the second instance did not undergo gpt-4-turbo screening because it was not cost-efficient. A first comparison was conducted on a sample of the included articles to refine the prompt. Once the prompt correctly identified the articles to include, it was tested on the full sample of 1,002 entries using different models, and a comparison with the gold standard was performed, as shown in Supplementary Table S4.
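The inclusion rule stated above, where an article proceeds to full-text screening if at least one model selects it, amounts to a set union over the per-model decisions. The sketch below assumes each model's output has already been reduced to a set of selected article identifiers; the identifiers and the function name `select_for_full_text` are hypothetical.

```python
def select_for_full_text(decisions):
    """Return every article selected by at least one model.

    decisions maps a model name to the set of article ids it marked 'include'.
    """
    selected = set()
    for included_ids in decisions.values():
        selected |= included_ids
    return selected


# Hypothetical article ids, for illustration only.
decisions = {
    "gpt-4-turbo": {"a1", "a3"},
    "gpt-4o": {"a3", "a7"},
}
full_text_queue = select_for_full_text(decisions)
print(sorted(full_text_queue))
```

A union favors sensitivity over specificity: it cannot lose an article that any single model would have kept, at the price of a longer full-text queue.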

Data charting process and synthesis of results

Data charting was performed on Covidence, and variables of interest, such as country of conduct, study size, field of application, type of intervention, model type, and registration code, were extracted by a single reviewer. The results were summarized in tables with absolute and relative frequencies, grouped by the stage of application (design, conduct, and analysis), and more specific information on how AI/ML was applied was reported for each group.
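As a minimal sketch of this kind of summary, the snippet below computes absolute and relative frequencies of one charted variable within each stage of application. The records are hypothetical placeholders, not data from the review, and the function name `frequency_table` is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical charted records as (stage of application, field of application).
records = [
    ("Analysis", "Neurology"),
    ("Analysis", "Neurology"),
    ("Analysis", "Cardiology"),
    ("Design", "Public health"),
]


def frequency_table(rows):
    """Return {stage: {field: (n, percent)}} with percentages per stage."""
    counts_by_stage = {}
    for stage, field in rows:
        counts_by_stage.setdefault(stage, Counter())[field] += 1
    table = {}
    for stage, counts in counts_by_stage.items():
        total = sum(counts.values())
        table[stage] = {
            field: (n, round(100 * n / total, 1)) for field, n in counts.items()
        }
    return table


for stage, row in frequency_table(records).items():
    print(stage, row)
```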

Results

Selection of sources of evidence

The flow diagram is adapted from the PRISMA 2020 statement and reproduced with permission. As shown in Figure 1, references imported from Scopus (29,570), Embase (10,847), and PubMed (10,827) were loaded into Covidence, which removed 15,284 duplicates and marked 18,280 references as ineligible, leaving 17,624 articles for consideration. A sample of 1,002 articles was manually screened by the reviewers to define the gold standard; the 15 articles selected were used to develop a prompt for screening titles and abstracts. Specificity and sensitivity were 0.9432 and 0.9333 for gpt-4-turbo and 0.9615 and 0.9333 for gpt-4o, respectively. The latter showed higher specificity at equal sensitivity, and higher cost efficiency; therefore, only the gpt-4o-2024-08-06 model with batch API execution was used to screen the remaining articles. The total error rate in the sample was 5.68% for gpt-4-turbo and 3.89% for gpt-4o, both lower than the error rate measured among human reviewers of 10.76% (95% CI: 7.43%–14.09%).¹⁷ This indicates higher performance and faster screening when using GPT models.
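For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, and total error rate follow from a 2×2 comparison against the gold standard. The cell counts are an assumption: they were back-calculated so that a sample of 1,002 with 15 gold-standard inclusions reproduces the values reported for gpt-4o, and they are not taken from Supplementary Table S4.

```python
def screening_metrics(tp, fn, tn, fp):
    """Compute screening accuracy measures from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),      # share of true inclusions kept
        "specificity": tn / (tn + fp),      # share of true exclusions rejected
        "error_rate": (fp + fn) / total,    # share of all decisions that were wrong
    }


# Assumed counts (back-calculated, not reported in the text):
# 15 gold-standard inclusions and 987 exclusions in a sample of 1,002.
metrics = screening_metrics(tp=14, fn=1, tn=949, fp=38)
print({name: round(value, 4) for name, value in metrics.items()})
```

Under these assumed counts, a single missed inclusion accounts for the sensitivity of 0.9333, and almost all of the error rate comes from false positives, which cost extra full-text reading rather than lost studies.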

Figure 1.

PRISMA 2020 flow diagram of study selection.

The selected prompt and model were then applied to the remaining 16,622 articles, yielding 1,645 selected articles. These, together with the 15 articles initially selected manually, were subsequently screened in full-text. Summary data for each study included in the review are provided in Supplementary Table S5.

Literature analysis

The earliest publication of interest dates back to 2004, when these methods were employed in the exploratory analysis stage; at the time, these approaches were still commonly referred to as data mining.¹⁸ This was followed in 2012 by a study that involved the use of AI/ML in the analysis of electroencephalography.¹⁹ Only since 2020 has there been a real increase in the number of studies (Supplementary Figure S1). However, only 59 CTs were registered in ClinicalTrials.gov or national registries. Most of the studies were conducted in China (35 articles), followed by 27 in the USA, six in the UK, four in Italy, and four in Spain; the full list is available in Supplementary Table S6, and a spatial distribution map in Supplementary Figure S2. Concerning the type of CT, 75 were single-center trials, 18 multicenter trials, 12 cluster trials, and three other designs: two protocols combining multiple subsequent trials (two trials in one case²⁰ and several so-called mini-trials in another²¹) and one protocol for a project involving a pilot study followed by a CT.²² Regarding sample size, the studies had a median of 114 enrolled subjects, with a minimum of six²³ in a phase I oncology study with imaging and a maximum of 2,000,000 in the case of cluster trials with digital interventions using social media.²⁴ Neurology, Psychology and Mental Health, and Public Health emerged as the primary fields of application. Behavioral interventions were the most common (43/108), followed by pharmacological ones in 28 studies. Study details and characteristics by application type are presented in Table 1.

Table 1.

Details and study characteristics by stage of application, only analysis, conduct and analysis (both C/A), design and analysis (both D/A), and only design.

Characteristic	Overall N = 108^a	Analysis N = 80^a	Both (C/A) N = 8^a	Both (D/A) N = 5^a	Design N = 15^a
Stage of study
Protocol	32 (29.6%)	19 (24%)	2 (25%)	3 (75%)	8 (53%)
Ongoing	3 (2.8%)	2 (2.5%)	1 (13%)	0 (0%)	0 (0%)
Completed	73 (67.6%)	59 (74%)	5 (63%)	1 (25%)	7 (47%)
Type of trial
Cluster trial	12 (11.1%)	2 (2.5%)	1 (13%)	1 (20%)	8 (53%)
Multicenter	18 (16.7%)	13 (16%)	1 (13%)	1 (20%)	3 (20%)
Single center	75 (69.4%)	64 (80%)	6 (75%)	1 (20%)	4 (27%)
Other	3 (2.8%)	1 (1.3%)	0 (0%)	2 (40%)	0 (0%)
Phase of trial
I	2 (1.9%)	2 (2.5%)	0 (0%)	0 (0%)	0 (0%)
II	2 (1.9%)	2 (2.5%)	0 (0%)	0 (0%)	0 (0%)
III	3 (2.9%)	2 (2.5%)	1 (12.5%)	0 (0%)	0 (0%)
Pilot	4 (3.7%)	3 (3.8%)	1 (12.5%)	0 (0%)	0 (0%)
Not applicable	97 (90%)	71 (89%)	6 (75%)	5 (100%)	15 (100%)
Field of application
Neurology	22 (20%)	19 (24%)	1 (11%)	0 (0%)	2 (13%)
Psychology/mental health	18 (17%)	13 (16%)	0 (0%)	4 (80%)	1 (6.7%)
Cardiology	12 (11%)	8 (10%)	1 (13%)	0 (0%)	3 (20%)
Public health	11 (9.3%)	4 (5.0%)	2 (25%)	1 (20%)	3 (20%)
Orthopedics	6 (5.6%)	5 (6.3%)	0 (0%)	0 (0%)	1 (6.7%)
Gastroenterology	4 (3.7%)	4 (5.0%)	0 (0%)	0 (0%)	0 (0%)
Gynecology	4 (3.7%)	4 (5.0%)	0 (0%)	0 (0%)	0 (0%)
Pediatric pulmonology/pulmonology	4 (3.7%)	4 (5.0%)	0 (0%)	0 (0%)	0 (0%)
Anesthesiology	3 (2.8%)	3 (3.8%)	0 (0%)	0 (0%)	0 (0%)
Geriatrics	3 (2.8%)	2 (2.5%)	1 (13%)	0 (0%)	0 (0%)
Vascular	2 (1.9%)	2 (2.5%)	0 (0%)	0 (0%)	0 (0%)
Other	11 (10%)	5 (6.3%)	3 (37.5%)	0 (0%)	3 (20%)
Type of intervention
Acupuncture	4 (3.7%)	4 (5.0%)	0 (0%)	0 (0%)	0 (0%)
AI intervention	4 (3.7%)	1 (1.3%)	0 (0%)	0 (0%)	3 (20%)
Behavioral	43 (40%)	25 (31%)	3 (37.5%)	5 (100%)	10 (67%)
Medical device	10 (9.3%)	6 (7.5%)	4 (50%)	0 (0%)	0 (0%)
Nutritional	1 (0.9%)	1 (1.3%)	0 (0%)	0 (0%)	0 (0%)
Pharmacological	28 (26%)	28 (35%)	0 (0%)	0 (0%)	0 (0%)
Surgical	6 (5.6%)	6 (7.5%)	0 (0%)	0 (0%)	0 (0%)
Other	12 (11.1%)	9 (11%)	1 (12.5%)	0 (0%)	2 (13%)
Study size
Less than 50	26 (24.1%)	22 (28%)	1 (13%)	1 (20%)	2 (13%)
Between 51 and 100	27 (24.7%)	25 (31%)	2 (25%)	0 (0%)	0 (0%)
Between 101 and 200	28 (25.9%)	24 (30%)	3 (38%)	0 (0%)	1 (6.7%)
Between 201 and 1500	15 (13.8%)	7 (8.8%)	2 (25%)	3 (60%)	3 (20%)
Over 1500	12 (11.1%)	2 (2.5%)	0 (0%)	1 (20%)	9 (60%)
Is the RCT registered?	59 (54.6%)	39 (49%)	5 (62.5%)	4 (80%)	11 (73%)

^a n (%).

With regard to the primary question of the review, it was found that most of the studies used AI/ML only for analysis (80/108 articles), eight for both analysis and conduct,²⁵–³² five for both design and analysis,²⁰,²¹,³³–³⁵ and 15 only at the design stage.²⁴,³⁶–⁴⁹ They are reported by subcategories within Design, Conduct, and Analysis of CTs in Table 2.

Table 2.

Type of application found in the design, conduct, and analysis of CTs.

Applications	N (%)	Studies
Design	N = 20^a
Stratification	6 (30%)	²⁴,³⁷,³⁹,⁴⁰,⁴⁶,⁴⁸
Participant selection (screening)	6 (30%)	³⁵,³⁶,³⁸,⁴²–⁴⁴
Adaptive arm selection	4 (20%)	²⁰,²¹,³³,⁴¹
Outcome definition/assessment	2 (10%)	³⁴,⁴⁹
Site selection (screening)	2 (10%)	⁴⁵,⁴⁷
Conduct	N = 8
Collection of data from wearables	7 (87.5%)	²⁵,²⁷,²⁹–³²,⁷⁰
Monitor	1 (12.5%)	²⁸
Analysis	N = 93^a
Analysis of outcome prediction	52 (56%)	–
Analysis of imaging	28 (30%)	–
Analysis of imaging and outcome prediction	5 (5.4%)	¹⁹,⁵⁶,⁷¹–⁷³
Assessment of endpoints (device)	4 (4.3%)	²⁵–²⁷,³²
Analysis of wearables and outcome prediction	2 (2.2%)	²⁹,⁷⁴
Analysis of audio	1 (1.1%)	⁷⁵
Evaluation of intervention effect	1 (1.1%)	³⁴

^a n (%).
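Because some studies applied AI/ML at more than one stage, the stage totals in Table 2 exceed the mutually exclusive groups of Table 1. The short calculation below makes the arithmetic explicit using the counts reported in the text; only the variable names are introduced here.

```python
# Mutually exclusive groups reported in the text and in Table 1.
groups = {
    "analysis_only": 80,
    "conduct_and_analysis": 8,
    "design_and_analysis": 5,
    "design_only": 15,
}

# A study counts toward every stage in which AI/ML was applied.
design_total = groups["design_only"] + groups["design_and_analysis"]
conduct_total = groups["conduct_and_analysis"]
analysis_total = (
    groups["analysis_only"]
    + groups["conduct_and_analysis"]
    + groups["design_and_analysis"]
)

print(sum(groups.values()), design_total, conduct_total, analysis_total)
```

The stage totals sum to 121 rather than 108 because the 13 studies in the two combined groups are each counted twice.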

For studies that involved ML at both the conduct and analysis stages, the collection of data from wearable devices was the most common application, followed by the analysis of collected high-dimensional data. Of relevance were the use of a k-nearest neighbors classifier to evaluate differences in the training effect of virtual reality²⁶ and the use of the Acumen IQ device for both monitoring and collection of secondary endpoint data, alongside an intervention through an ML-guided Hypotension Prediction Index.²⁷ Additional predictive modeling was developed to estimate hemoglobin worsening,³⁰ hospital readmissions,³¹ the uptake of physical activity behavior,³² and the risk in the context of transplant.²⁹

With regard to the use of models at the design stage to guide arm selection, one two-phase trial used the first phase to develop treatment assignment scores and the second phase to allocate participants to the treatment with the higher probability of being optimal.⁴¹ Similarly, but with additional predictive modeling, a protocol of an AI-adaptive trial with 12 mini-trials allowed for optimization of allocation ratios; however, no details of the AI algorithm were provided.²¹ A third study involved reinforcement learning for adapting arm and treatment selection,³³ and the last aimed to develop an optimal treatment rule in the second and third trials and to test it through unequal treatment allocation.²⁰
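To illustrate the general idea of response-adaptive allocation, the sketch below uses Thompson sampling with Beta posteriors on a binary outcome. This technique was chosen here purely for illustration and is not claimed to be the algorithm of any of the cited trials, none of which is described in enough detail in this review to reproduce. The response rates, seed, and function names are all hypothetical.

```python
import random


def thompson_allocate(successes, failures, rng):
    """Sample each arm's Beta(s + 1, f + 1) posterior and pick the largest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def simulate(true_rates, n_participants, seed=0):
    """Allocate participants one at a time, updating counts after each outcome."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_participants):
        arm = thompson_allocate(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


# Hypothetical response rates for two arms.
successes, failures = simulate(true_rates=[0.2, 0.8], n_participants=500)
allocated = [s + f for s, f in zip(successes, failures)]
print(allocated)
```

With a large true difference between arms, allocation drifts toward the better arm as evidence accumulates. In a real trial, such a rule would need pre-specification and simulation of type I error and power, in line with the reporting items proposed in this review.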

In many cluster trials, it is challenging to clearly distinguish between intervention and design because the models can actively determine when and how the intervention occurs, as in an adaptive intervention for malaria control³³ and in the identification of tuberculosis hotspots.⁴⁴ In some cases, the model can also affect outcomes, such as in the Contrast Risk study, where the clinical decision support system predicts patient risk, guides follow-up decisions, and informs interventions with optimized fluid recommendations.³⁶

In several studies, ML algorithms were used at screening to identify the risk level, in order to enroll high-risk patients and then stratify by predicted risk;³⁵,³⁶ additionally, they were used for site selection.⁴⁷ Site selection was performed in another study using NLP to identify cases of interest among imaging reports from primary care practices.⁴⁵ Other screening processes involved real-time speech processing,³⁸ a prediction model applied daily to electronic health records (EHRs) for the identification of high-risk patients,⁴² application to CT scans for predicting the coronary artery calcium (CAC) score and identifying high-risk groups through imaging,⁴³ and screening individual X-rays with detection software guided by AI.⁴⁴

Risk prediction was also used for stratification; in one study, it was performed across sites to measure the primary outcome on subgroups.⁴⁰ In another, it was used to identify high-risk patients in both arms, with an additional diagnostic test in the interventional arm.⁴⁸ In a stepped wedge trial, it was applied at the patient level to inform clinicians randomized to weekly emails and reminders to improve attention to high-risk patients and usual cases, and for subgroup analysis.³⁹ In an educational cluster-randomized trial, a clustering algorithm was used to group the target population to guide the choice of educational content.²⁴ Similarly, clustering and classification were used for stratification on EEG (electroencephalography) profiles.³⁷ Of interest, a trial testing an assessment rule guided by AI used the derived scores to select high-risk subjects for follow-up outcome assessment.⁴⁶

In summary, regarding the design, AI/ML was used for cohort stratification pre- or post-randomization in six studies and during the randomization phase in four studies to optimize treatment allocation. For participant selection, two studies used imaging algorithms and four used predictive models; two studies were specifically for site selection, and two defined or assessed the primary outcome. At the conduct stage, ML models were constructed using data collected from wearable devices (7) and monitoring (1).

More common was the application in the analysis stage, with 93 references found, particularly as exploratory analysis in 52 studies, if included in the protocol or main paper of the trial. In 41 studies, it was applied for primary analysis. Table 3 shows the details of the type of application in the analysis stage by primary and exploratory analyses. Imaging was mainly used for primary analysis in the evaluation of treatment effects using magnetic resonance imaging or ultrasound. ML involved in outcome prediction has its main application in exploratory analysis for the development of predictive models for the response and outcome.

Table 3.

Application of AI/ML in the analysis by primary, exploratory, or both.

	Type of analysis with AI/ML
	Primary	Exploratory	Total
Analysis of audio	1	0	1
Analysis of imaging	25	3	28
Analysis of imaging and outcome prediction	1	4	5
Analysis of outcome prediction	8	44	52
Analysis of wearables and outcome prediction	1	1	2
Assessment of endpoints (device)	4	0	4
Evaluation of intervention effect	1	0	1
Total	41	52	93

Regarding the transparency of AI/ML method reporting, the methods were specified in 86/108 articles. Convolutional Neural Networks were used in the context of imaging, as was Deep Learning, which was used in five studies for imaging and in two for predicting outcomes. Support Vector Machines or Support Vector Regression were applied to imaging analysis (1), combined with outcome prediction (4), and solely for predicting outcomes (4). Random Forests were also mainly used for predicting outcomes (8) and in one case in the context of imaging. In Supplementary Table S7, the specification and application of the ML algorithms for primary and exploratory analyses are summarized.

China appears to be the leading country; however, reporting is unclear in eight studies⁵⁰–⁵⁷ that used AI/ML for the primary analysis, because they combined validation of deep learning imaging with an interventional (pharmacological, acupuncture) experimental arm with randomized patients, and in three that used it in the exploratory analysis. Among all Chinese studies, only 10 have been registered, either in ClinicalTrials.gov⁵⁸–⁶¹ or in China's national registry.⁶²–⁶⁷

The primary finding is the limited number of studies included for data extraction: only 108 of 17,624 unique studies (duplicates and non-RCTs removed). This suggests that the use of AI/ML is still quite restricted and primarily involves partial contributions to classic statistical analyses. The increasing number of published protocols signals growth; however, CT repositories were not screened for the identification of additional studies, and only studies reported in English were considered, which may have led to some underreporting.

Discussion

Overall, AI/ML techniques are often referred to by their commercial names, making it difficult to clearly identify them as ML models. Additionally, their role is rarely specified, making it unclear whether they are used in primary, secondary, or exploratory analysis. This lack of transparency highlights the need for regulatory guidance, as these methods are linked not only to operational complexity but also to regulatory constraints. Therefore, they are more common in secondary and exploratory analysis.

In the absence of formal standards, and to mitigate the lack of transparency in reporting AI/ML methods in CTs, a minimum requirement for authors should be adherence to the ICH E6 Good Clinical Practice (GCP) guideline, with consideration of the EMA reflection paper.⁴ In addition, reports and papers should include at least the following, in the study protocol or as part of the CT data and documentation: (i) specify the context of usage (COU) of the algorithm in the trial (Design, Conduct, Analysis) in the abstract, report in the Introduction and Background whether there is literature supporting the usage of these methods in the design, and add details on the COU and its implications for decisions; (ii) describe the algorithm, motivating the model choice based on COU objectives (for an in-house model, specify the whole process of model development with a full description of the data; for a commercial model, specify its commercial name, version, and documentation references); (iii) specify the performance measures from validation of the frozen, qualified model; and (iv) ensure transparency and reproducibility by publishing the model in an accessible repository, documenting the pipeline and database before lock, and specifying that any non-prespecified modification should be considered post hoc.

To promote standardized reporting, we developed a draft operational extension of the CONSORT-AI and SPIRIT-AI guidelines,⁶⁸ aimed at guiding the reporting of AI/ML methods in design, conduct, and analysis (Table 4).

Table 4.

Structured checklist for reporting the AI/ML methods in CTs, aligned with the CONSORT-AI and SPIRIT-AI.

Item	Section	MLT in CT item
0	Detail the context of usage (COU)	Specify in detail the COU of ML among: - Design: □ Participant selection □ Stratification □ Treatment allocation □ Outcome definition □ Outcome assessment □ Site selection □ Other: __________________ - Conduct: □ Collection of data from devices □ Monitoring □ Other: __________________ - Analysis: □ Primary analysis □ Exploratory analysis □ Other: __________________ - Details on the analysis: □ Imaging processing □ Wearable processing □ Other devices processing (e.g., audio) □ Outcome prediction □ Evaluation of intervention effect □ Other: __________________
1	Title and abstract	(i) State the intended use of the AI/ML in the design, conduct or analysis within the abstract.
Introduction
2	Background and objectives	(i) Specify if there is literature supporting the usage of these methods in the design. (ii) Specify in detail the COU and the implications on decision-making. (iii) Define the primary and secondary objectives and how AI/ML relates to them.
Methods
3	Trial design	(i) Define how AI/ML influences the design. (ii) Document pre-specified rules and thresholds.
4	Participants	(i) Define how the AI/ML influences the participant selection and enrollment. (ii) Define variables required by the model and the source of data. (iii) Define how monitoring is performed.
5	Interventions	(i) Define if the model influences the strategy/dosing and describe rules and thresholds.
6	Outcomes	(i) Define how outcomes have been obtained and their evaluation.
7	Sample size	(i) Report simulations on implications of type I error and power if applicable.
8	Sequence generation
9	Allocation mechanism	(i) Specify if AI/ML influences stratification, allocation probabilities, and blinding.
10	Implementation
11	Blinding	(i) Specify who can access the AI/ML output.
12	Statistical methods	(i) Motivate and name the model choice based on COU objectives. (ii) Specify how AI/ML is involved in the analysis. (iii) Specify performance measures of internal or external validation. (iv) If necessary, provide simulation scenarios of different configurations. (v) Specify the output with the model, and document the worst-case handling scenario. (vi) Describe how errors are handled and identified.
Results
13	Participant flow (diagram)	(i) Specify the participant flow and reason for exclusion/inclusion for screening performed by AI/ML.
14	Recruitment	(i) Specify how AI/ML is involved in the recruitment.
17	Outcome and estimation	(i) Distinguish if the outcome is primary or derived by AI/ML (imaging, wearable, audio, etc.). (ii) If derived, provide the complete pipeline and documentation.
18	Ancillary analysis	(i) Specify how AI/ML is involved in subgroup definition and explorative objectives.
Discussion
20	Limitations	(i) State the limitations of AI/ML integration in the trial.
23	Registration	(i) Publish model in an accessible repository. (ii) Document the frozen pipeline and database before lock. (iii) Specify that any non-prespecified modification should be considered post hoc.
24	Protocol

Of interest, 92 articles were excluded at full-text review because they evaluated ML-integrated interventions, such as real-time monitoring, personalized alerts and therapies, AI-integrated chatbots, and prediction models for improving adherence via wearables. An additional 240 studies focused on AI interventions. The high number of excluded references, including proceedings papers and congress abstracts, indicates that although applications are increasing, they remain at an early stage. Adoption in clinical settings depends on meeting regulatory requirements, and hesitation about compliance leads to delays in approval and implementation standards.⁶⁹

Critical gaps in the integration of AI/ML into CTs may stem from scalability and regulatory compliance issues, which are pressing concerns for healthcare providers and policymakers. The lack of standardized reporting and the challenge of validating AI/ML algorithms for clinical use can hinder their widespread adoption. A spin-off of this review would be a synthesis of existing practices and identified gaps, contributing to the definition of a regulatory framework and to robust guidelines that facilitate the safe and effective integration of AI/ML technologies into clinical research and practice.³ Bridging these gaps could significantly accelerate the transition from experimental to routine clinical applications, addressing urgent healthcare challenges such as improving CT efficiency, enhancing the accuracy of diagnostic and prognostic tools, and reducing healthcare disparities through more personalized data-driven care.¹⁰

Broader methods of patient-to-trial matching or trial success prediction were not readily found in the review, as they are mainly used by private companies and are either not mentioned or described using their commercial names, making it difficult to identify them as AI/ML in published CT results. The impact of such methods will become clearer in the future as they move from development to application.

Conclusions

The application of AI/ML in the design, execution, and analysis of CTs is still in its early stages. The use of commercial acronyms instead of detailed descriptions of the methods, along with vague explanations of their contributions to different trial phases and goals, has hindered the evaluation of their implementation. Although interest in this area is growing and more techniques, especially in risk modeling and prediction, are being developed, many are tailored to specific studies and lack scalability, which delays their broader application. Therefore, it is essential to transparently report the use of AI/ML, clearly defining their role from the initial protocol stage to the final publication, as suggested in the developed checklist.

Supplemental Material

Supplemental material, sj-pdf-1-dhj-10.1177_20552076251393272 for Current applications and future challenges of machine learning and artificial intelligence in clinical trials: A scoping review by Ajsi Kanapari, Giulia Lorenzoni, Honoria Ocagli and Dario Gregori in DIGITAL HEALTH

Footnotes

Acknowledgements

This research was carried out within the framework of the PhD Programme G.B. Morgagni at the University of Padua, Cycle XXXIX, Fellowship No. 8682. The fellowship was co-funded by Zeta Research s.r.l. and the European Union—Next Generation EU, under the National Recovery and Resilience Plan (PNRR), Mission 4 Component 1, Investment 4.1—Ministerial Decree No. 117/2023, I.3.3 “PNRR Innovative PhD fellowships addressing the innovation needs of enterprises.”

ORCID iDs

Ajsi Kanapari

Giulia Lorenzoni

Honoria Ocagli

Dario Gregori

Contributorship

Conceptualization: DG and GL. Data curation: AK and GL. Design and methodology: AK, GL, and HO. Investigation: AK. Supervision: GL. Writing—original draft: AK. Writing—review and editing: GL and DG. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

The characteristics of the studies, search strings, and prompts are available in the Supplementary Material.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used GPT-4o to check the clarity and language of the manuscript (e.g. grammar, spelling, and style). After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Guarantor

DG.

Supplemental material

Supplemental material for this article is available online.

References

1. Weissler, Naumann, Andersson, et al. The role of machine learning in clinical research: transforming the future of evidence generation. Trials 2021; 22: 537.

2. Askin, Burkhalter, Calado, et al. Artificial intelligence applied to clinical trials: opportunities and challenges. Health Technol 2023; 13: 203–213.

3. U.S. Food & Drug Administration. Using artificial intelligence & machine learning in the development of drug & biological products, https://www.fda.gov/media/167973/download (2025, accessed 4 April 2025).

4. European Medicines Agency. Reflection paper on the use of artificial intelligence (AI) in the medicinal product lifecycle, https://www.ema.europa.eu/en/documents/scientific-guideline/draft-reflection-paper-use-artificial-intelligence-ai-medicinal-product-lifecycle_en.pdf (2023, accessed 18 July 2024).

5. AIFA Italian Medicines Agency. Guide to the submission of a request for authorisation of a clinical trial involving the use of artificial intelligence (AI) or machine learning (ML) systems, https://www.aifa.gov.it/documents/20142/871583/Guide_CT_AI_ML_v_1.0_date_24.05.2021_EN.pdf (2021, accessed 18 July 2024).

6. Han, Acosta, Shakeri, et al. Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. Lancet Digit Health 2024; 6: e367–e373.

7. Lam TYT, Cheung MFK, Munro, et al. Randomized controlled trials of artificial intelligence in clinical practice: systematic review. J Med Internet Res 2022; 24: e37188.

8. Chen, et al. Artificial intelligence tools for optimising recruitment and retention in clinical trials: a scoping review protocol. BMJ Open 2024; 14: e080032.

9. Zhao, Zeng, Socinski, et al. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer. Biometrics 2011; 67: 1422–1433.

10. Truong ATL, Tan S-B, Wang, et al. CURATE.AI-assisted dose titration for anti-hypertensive personalized therapy: study protocol for a multi-arm, randomized, pilot feasibility trial using CURATE.AI (CURATE.AI ADAPT trial). Eur Heart J Digit Health 2024; 5: 41–49.

11. Unlu, Varugheese, Shin, et al. Manual vs AI-assisted prescreening for trial eligibility using large language models—a randomized clinical trial. JAMA 2025; 333(12): 1084–1087.

12. Idnay, Dreisbach, Weng, et al. A systematic review on natural language processing systems for eligibility prescreening in clinical research. J Am Med Inform Assoc JAMIA 2021; 29: 197–206.

13. Kolluri, Lin, Liu, et al. Machine learning and artificial intelligence in pharmaceutical research and development: a review. AAPS J 2022; 24: 19.

14. Tricco, Lillie, Zarin, et al. PRISMA Extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018; 169: 467–473.

15. Kanapari, et al. The Application of Machine Learning and Artificial Intelligence Methods in the Design of Clinical Trials: A Scoping Review Protocol. OSF. October 16 2024; Web. doi:https://doi.org/10.17605/OSF.IO/8QB5N

16. Lanera. gpteasyr: a basic and simple interface to OpenAI’s GPT API. R package version 0.5.0, https://corradolanera.github.io/gpteasyr/authors.html#citation (2024, accessed 2 April 2025).

17. Wang, Nayfeh, Tetzlaff, et al. Error rates of human reviewers during abstract screening in systematic reviews. PLoS One 2020; 15: e0227742.

18. Minamino, Jiyoong, Asakura, et al. Rationale and design of a large-scale trial using nicorandil as an adjunct to percutaneous coronary intervention for ST-segment elevation acute myocardial infarction: Japan-working groups of acute myocardial infarction for the reduction of necrotic damage. Circ J 2004; 68: 101–106.

19. Graversen, Olesen, et al. The analgesic effect of pregabalin in patients with chronic pain is reflected by changes in pharmaco-EEG spectral indices. Br J Clin Pharmacol 2012; 73: 363–372.

20. Benjet, Kessler, Kazdin, et al. Study protocol for pragmatic trials of Internet-delivered guided and unguided cognitive behavior therapy for treating depression and anxiety in university students of two Latin American countries: the Yo Puedo Sentirme Bien study. Trials 2022; 23(1): 450.

21. Huckvale, Hoon, Stech, et al. Protocol for a bandit-based response adaptive trial to evaluate the effectiveness of brief self-guided digital interventions for reducing psychological distress in university students: the Vibe Up study. BMJ Open 2023; 13(4): e066249.

22. De Fabritiis, Trisolini, Bertuletti, et al. An internet-based multi-approach intervention targeting university students suffering from psychological problems: design, implementation, and evaluation. Int J Environ Res Public Health 2022; 19(5): 2711.

23. Pouessel, Ken, Gouaze-Andersson, et al. Hypofractionated stereotactic re-irradiation and anti-PDL1 durvalumab combination in recurrent glioblastoma: STERIMGLI phase I results. Oncologist 2023; 28: 825–826.

24. Rocha, Pinheiro, de Paula Monteiro, et al. Adaptive content tuning of social network digital health interventions using control systems engineering for precision public health: cluster randomized controlled trial. J Med Internet Res 2023; 25: e43132.

25. Dorsch, Thomas, et al. SIRRACT: a multi-center, international, randomized clinical trial using wireless technology to affect outcomes during acute stroke rehabilitation Enabled by Wireless Sensing. Neurorehabilitation and Neural Repair 2014. doi:https://doi.org/10.1177/1545968314550369

26. Sokołowska, Świderski, Smolis-Bąk, et al. A machine learning approach to evaluate the impact of virtual balance/cognitive training on fall risk in older women. Front Comput Neurosci 2024; 18: 1390208.

27. Beyls, Lefebvre, Mollet, et al. Norepinephrine weaning guided by the hypotension prediction index in vasoplegic shock after cardiac surgery: protocol for a single-centre, open-label randomised controlled trial—the NORAHPI study. BMJ Open 2024; 14: e084499.

28. Kladny, Glatz, Lieberum, et al. Supine positioning for graft attachment after Descemet membrane endothelial keratoplasty: a randomized-controlled trial. Am J Ophthalmol 2023; 263: 117–125.

29. Murray, Foroutan, Amadio, et al. Remote mobile outpatient monitoring in transplant (Reboot) 2.0: protocol for a randomized controlled trial. JMIR Res Protoc 2021; 10(10): e26816.

30. Patel, Polsky, Small, et al. Predicting changes in glycemic control among adults with prediabetes from activity patterns collected by wearable devices. npj Digit Med 2021; 4(1): 172.

31. Patel, Volpp, Small, et al. Using remotely monitored patient activity patterns after hospital discharge to predict 30 day hospital readmission: a randomized trial. Sci Rep 2023; 13(1): 8258.

32. Mendoza, Haaland, Jacobs, et al. Bicycle trains, cycling, and physical activity: a pilot cluster RCT. Am J Prev Med 2017; 53: 481–489.

33. Zhou, Lee M-C, Atieli, et al. Adaptive interventions for optimizing malaria control: an implementation study protocol for a block-cluster randomized, sequential multiple assignment trial. Trials 2020; 21(1): 665.

34. Liu, et al. Personalized stress optimization intervention to reduce adolescents’ anxiety: a randomized controlled trial leveraging machine learning. J Anxiety Disord 2025; 110: 102964.

35. Weinstock, Bishop, Bauer, et al. Design of a multicenter randomized controlled trial of a post-discharge suicide prevention intervention for high-risk psychiatric inpatients: the Veterans Coordinated Community Care Study. Int J Methods Psychiatr Res 2024; 33: e70003.

36. James, Har, Tyrrell, et al. Clinical decision support to reduce contrast-induced kidney injury during cardiac catheterization: design of a randomized stepped-wedge trial. Can J Cardiol 2019; 35: 1124–1133.

37. Dagnino, Braboszcz, Kroupi, et al. Stratification of responses to tDCS intervention in a healthy pediatric population based on resting-state EEG profiles. Sci Rep 2023; 13(1): 8438.

38. Blomberg, Christensen, Lippert, et al. Effect of machine learning on dispatcher recognition of out-of-hospital cardiac arrest during calls to emergency medical services: a randomized clinical trial. JAMA Netw Open 2021; 4: e2032320.

39. Manz, Zhang, Chen, et al. Long-term effect of machine learning-triggered behavioral nudges on serious illness conversations and End-of-life outcomes among patients with cancer: a randomized clinical trial. JAMA Oncol 2023; 9: 414–418.

40. Takvorian, Bekelman, Beidas, et al. Behavioral economic implementation strategies to improve serious illness communication between clinicians and high-risk patients with cancer: protocol for a cluster randomized pragmatic trial. Implement Sci 2021; 16(1): 90.

41. Pigeon, Bishop, Bossarte, et al. A two-phase, prescriptive comparative effectiveness study to optimize the treatment of co-occurring insomnia and depression with digital interventions. Contemp Clin Trials 2023; 132: 107306.

42. Bull, Arendarczyk, Reis, et al. Impact on all-cause mortality of a case prediction and prevention intervention designed to reduce secondary care utilisation: findings from a randomised controlled trial. Emerg Med J 2023; 0: 1–9. doi:https://doi.org/10.1136/emermed-2022-212908

43. Sandhu, Rodriguez, Ngo, et al. Incidental coronary artery calcium: opportunistic screening of previous nongated chest computed tomography scans to improve statin rates (NOTIFY-1 project). Circulation 2023; 147: 703–714.

44. Zaidi SMA, Mahfooz, Latif, et al. Geographical targeting of active case finding for tuberculosis in Pakistan using hotspots identified by artificial intelligence software (SPOT-TB): study protocol for a pragmatic stepped wedge cluster randomised control trial. BMJ Open Respir Res 2024; 11: e002079.

45. Wang, Knight, Demeshko, et al. Integrated model of secondary fracture prevention in primary care (INTERCEPT): protocol for a cluster randomised controlled multicentre trial. BMC Prim Care 2024; 25: 349.

46. Dublin, Greenwood-Hickman, Karliner, et al. The Electronic Health Record Risk of Alzheimer’s and Dementia Assessment Rule (eRADAR) Brain Health Trial: protocol for an embedded, pragmatic clinical trial of a low-cost dementia detection algorithm. Contemp Clin Trials 2023; 135: 107356. doi: https://doi.org/10.1016/j.cct.2023.107356

47. Kadota, Packel, Mlowe, et al. Pamoja Kundini (RKPK): study protocol for a hybrid type 1 randomized effectiveness-implementation trial using data science and economic incentive strategies to strengthen the continuity of care among people living with HIV in Tanzania. Trials 2024; 25: 114.

48. Hill, Arden, Beresford-Hulme, et al. Identification of undiagnosed atrial fibrillation patients using a machine learning risk prediction algorithm and diagnostic testing (PULsE-AI): study protocol for a randomised controlled trial. Contemp Clin Trials 2020; 99: 106191.

49. Bilaver, Ariza, Binns, et al. Design of the intervention to reduce early peanut allergy in children (iREACH): a practice-based clinical trial. Pediatr Allergy Immunol 2024; 35: e14115.

50. Sun, Zhang, Yang Y-M, et al. Exploration of the influence of early rehabilitation training on circulating endothelial progenitor cell mobilization in patients with acute ischemic stroke and its related mechanism under a lightweight artificial intelligence algorithm. Eur Rev Med Pharmacol Sci 2023; 27: 5338–5355.

51. Lou. Machine learning algorithm-based analysis of efficacy of pulmonary surfactant combined with mucosolvan in meconium aspiration syndrome of newborns through ultrasonic images. Sci Program 2021; 2021: 1–7.

52. Zheng, Lei, Bai, et al. The curative effect of pregabalin in the treatment of postherpetic neuralgia analyzed by deep learning-based brain resting-state functional magnetic resonance images. Contrast Media Mol Imaging 2022; 2022: 2250621.

53. Mei, Zhang. Vomiting management and effect prediction after early chemotherapy of lung cancer with diffusion-weighted imaging under artificial intelligence algorithm and comfort care intervention. Comput Math Methods Med 2022; 2022: 1–11.

54. Yin, Wang. Evaluation of nursing effect of pelvic floor rehabilitation training on pelvic organ prolapse in postpartum pregnant women under ultrasound imaging with artificial intelligence algorithm. Comput Math Methods Med 2022; 2022: 1–13.

55. Feng, Yuan, et al. Classification algorithm-based fMRI images for evaluating the effect of yishen tiaodu acupuncture on the recovery period of cerebral infarction. Comput Intell Neurosci 2022; 2022: 1–9.

56. Xie, Liu, et al. Resting-state functional connectivity patterns predict acupuncture treatment response in primary dysmenorrhea. Front Neurosci 2020; 14: 559191.

57. Fan, Pack, De Man. A virtual imaging trial framework to study cardiac CT blooming artifacts. Proc. SPIE 12304, 7th International Conference on Image Formation in X-Ray Computed Tomography 2022: 16. https://doi.org/10.1117/12.2646407

58. Wei, Meng, Guo, et al. Hybrid exercise program for sarcopenia in older adults: the effectiveness of explainable artificial intelligence-based clinical assistance in assessing skeletal muscle area. Int J Environ Res Public Health 2022; 19(16): 9952.

59. Lin, Huang, et al. Comparison of diet and exercise on cardiometabolic factors in young adults with overweight/obesity: multiomics analysis and gut microbiota prediction, a randomized controlled trial. MedComm 2020 2025; 6: e70044.

60. Wei, Meng, et al. Self-determined sequence exercise program for elderly with sarcopenia: a randomized controlled trial with clinical assistance from explainable artificial intelligence: AI-assisted exercise for sarcopenic elderly. Arch Gerontol Geriatr 2024; 119: 105317.

61. Guo, Cao, et al. Quantifying the enhancement of sarcopenic skeletal muscle preservation through a hybrid exercise program: randomized controlled trial. JMIR Aging 2024; 7: e58175–e58175.

62. Hong, Sun, Zhang, et al. Neurological mechanism and treatment effects prediction of acupuncture on migraine without aura: study protocol for a randomized controlled trial. Front Neurol 2022; 13: 981752.

63. Zhang, Liang, et al. Network topology and machine learning analyses reveal microstructural white matter changes underlying Chinese medicine Dengzhan Shengmai treatment on patients with vascular cognitive impairment. Pharmacol Res 2020; 156: 104773.

64. Wang J-L, Wei X-Y, et al. Psychological and neurological predictors of acupuncture effect in patients with chronic pain: a randomized controlled neuroimaging trial. Pain 2023; 164: 1578–1592.

65. Zhou, Deng, Zeng, et al. Unconscious classification of quantitative electroencephalogram features from propofol versus propofol combined with etomidate anesthesia using one-dimensional convolutional neural network. Front Med Lausanne 2024; 11: 1447951.

66. Zhu, Sun, et al. Compared to histamine-2 receptor antagonist, proton pump inhibitor induces stronger oral-to-gut microbial transmission and gut microbiome alterations: a randomised controlled trial. Gut 2023; 73: 1087–1097.

67. Pei, Yang, et al. Acupuncture combined with cognitive-behavioural therapy for insomnia (CBT-I) in patients with insomnia: study protocol for a randomised controlled trial. BMJ Open 2022; 12: e063442.

68. Liu, Rivera, Faes, et al. CONSORT-AI and SPIRIT-AI: new reporting guidelines for clinical trials and trial protocols for artificial intelligence interventions. Invest Ophthalmol Vis Sci 2020; 61: 1617.

69. Ahmed, Spooner, Isherwood, et al. A systematic review of the barriers to the implementation of artificial intelligence in healthcare. Cureus 2023; 15: e46454.

70. Sokolowska. Novel machine learning and statistical learning approaches in neurology. Folia Neuropathol 2018; 56: 270.

71. Zhang, Liang

72. Yang, et al. Assessment of rTMS treatment effects for methamphetamine addiction based on EEG functional connectivity. Cogn Neurodyn 2024; 18: 2373–2386.

73. Feng, et al. Oxytocin effects on the resting-state mentalizing brain network. Brain Imaging Behav 2020; 14: 2530–2541.

74. Gilman, Schmitt, Potter, et al. Identification of Δ9-tetrahydrocannabinol (THC) impairment using functional brain imaging. Neuropsychopharmacology 2022; 47: 944–952.

75. Jaschke, Howlin, Pool, et al. Study protocol of a randomized control trial on the effectiveness of improvisational music therapy for autistic children. BMC Psychiatry 2024; 24: 637.

