Abstract
Background:
Active learning is a proposed method for accelerating the screening phase of systematic reviews. Although it has been studied extensively, the evidence remains scattered across a fragmented body of literature.
Objective:
This scoping review investigates whether active learning is recommended for systematic review screening and identifies areas needing further research.
Design:
We screened 1887 records published since 2006 using ASReview, an active learning tool, and included 60 relevant studies. We also analysed 238 of 336 collected datasets for study design, dataset usage, and implementation.
Results:
All 60 studies recommended active learning as a means to improve screening efficiency. Despite some methodological heterogeneity, consistent endorsement was found across the literature.
Conclusion:
Active learning shows strong potential to support systematic review screening. Standardising evaluation metrics, encouraging open data practices, and diversifying model configurations are key priorities for advancing this field.
1. Introduction
This article investigates the use of active learning in the context of systematic reviews. While systematic reviews are the gold standard for evidence synthesis, they can be characterised by considerable time and labour demands. Active learning is a machine-learning approach that has been widely tested in this context, yet results are fragmented across many studies. To bring clarity, we conduct a scoping review of simulation studies on active learning for systematic review screening, with the aim of mapping existing evidence, identifying gaps, and guiding future research.
The systematic review process involves manually searching databases for relevant literature, screening each record for relevance, and analysing the resulting collection of research. Despite these demands, the systematic review is a widely used and trusted form of research because of its methodological approach, complete data collection, and critical appraisal of available evidence. The need for workload reduction [1] has motivated a new field of meta-systematic review research [2], where research is aimed at increasing the efficiency with which systematic reviews can be performed to free up valuable time and resources. The traditional systematic review consists of many steps [3] and overviews of the entire systematic review software environment exist, for example, see [4, 5].
Among all stages of a systematic review, the screening phase is arguably the most time-consuming and labour-intensive. Active learning [6], as we show in this article, is especially well-suited by design and is therefore often employed to alleviate the burden of this step. Active learning is already incorporated into numerous software tools [5,7–18] 1 , and simulation studies have been conducted to assess its performance. However, the study designs, datasets used, and active learning models compared vary widely across studies. The purpose of this article is to systematically review the existing literature on active learning applications in systematic reviewing.
Workload reduction in the screening phase is much sought after but proves to be a complex task to accomplish. The relevance of a paper is often complex and nuanced, as the research question being addressed in the systematic review may be broad or multifaceted, and it can be difficult to determine which studies are directly relevant to the question. Because of this complexity, a human reviewer is needed with a good understanding of the nuances of the research question and the criteria for relevance. Machine learning algorithms for text classification have proven to be an effective tool for assisting these human reviewers. Trained on a dataset of human-labelled data, these algorithms can distil the subtleties present in the labels and approximate human judgements of relevance. As a result, the utilisation of machine learning algorithms is frequently attempted in research aimed at enhancing the efficiency of systematic reviews [20].
One research paper [21] identifies sub-processes within the systematic review process and provides commentary on which aspects of the process can be supported by various types of machine learning. The authors note that automatic classification poses a significant challenge for machine learning but suggest that active learning [6] can be used to address this challenge. Several researchers [21, 22] propose active learning as an effective method of application of machine learning on text-based systematic reviewing.
Active learning is a machine learning approach in which the model incrementally trains itself during the labelling process. Instead of relying on a large pretrained model or a static training set, it begins with minimal labelled data and continuously updates its predictions as new labels are obtained. The model actively selects the records with the highest likelihood of being relevant and requests human input on them. These new labels are immediately used to retrain the model, allowing it to improve in parallel with the screening task.
This setup enables the model to assist the reviewer early in the process, even before a large training set exists. With each new label, the model becomes more accurate in identifying which documents are likely to be relevant, and it prioritises these for human screening. In this cycle, both the algorithm and the human reviewer improve over time: the model becomes better at selecting useful records and the reviewer screens more efficiently. The ability to learn and support screening simultaneously makes active learning particularly well-suited for systematic reviews.
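As a concrete illustration of this cycle, the following is a minimal sketch of a certainty-based active learning loop built with scikit-learn; it is not drawn from any of the reviewed studies, and the `ask_reviewer` callback is a hypothetical stand-in for the human screener.

```python
# Minimal sketch of a certainty-based active learning screening loop using
# TF-IDF features and Naive Bayes as an illustrative configuration. The
# `ask_reviewer` callback is a hypothetical stand-in for the human screener;
# labels are 1 = relevant, 0 = irrelevant.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def screen_with_active_learning(texts, prior_idx, prior_labels, ask_reviewer, n_queries=100):
    X = TfidfVectorizer().fit_transform(texts)
    labeled = dict(zip(prior_idx, prior_labels))      # prior knowledge: a few known labels
    model = MultinomialNB()

    for _ in range(n_queries):
        idx = list(labeled)
        model.fit(X[idx], [labeled[i] for i in idx])  # retrain on all labels collected so far
        unlabeled = [i for i in range(len(texts)) if i not in labeled]
        if not unlabeled:
            break
        scores = model.predict_proba(X[unlabeled])[:, 1]   # P(relevant) for unseen records
        query = unlabeled[int(np.argmax(scores))]          # present the most likely relevant record next
        labeled[query] = ask_reviewer(query)               # human provides the label
    return labeled
```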
It is the nature of systematic reviews that makes the traditional training of machine learning algorithms for text classification of literature challenging. Systematic reviews address novel and evolving questions, and as a result, the labelled data required to train an algorithm to recognise relevance almost never exists. This absence of pre-collected data is of course, in part, what motivates the conduct of systematic reviews in the first place. Active learning addresses this constraint by initiating learning without requiring a large labelled dataset, making it a suitable method for optimising the screening process in systematic reviews.
Despite the advantages of incorporating active learning in the application of machine learning algorithms for the support of literature screening, and the numerous simulation studies that have been conducted on this topic, an overview of the usage and performance of active learning is currently lacking. As a result, research in this field is scattered and lacks consistency. From the literature as it stands, one cannot easily answer whether active learning is the answer to the challenges systematic reviewing brings.
The field of simulation studies for active learning-based systematic reviewing shows a lack of consistency and uniformity in study design and methodology. This includes variations in study design, dataset usage, the number of datasets utilised, the specific active learning models employed, the performance metrics applied, and more. As a result, it is difficult to draw broad conclusions or make comparisons between studies, hindering both the advancement of the field and the application of active learning in systematic review practice.
In light of the current state of the field, the main objective of this article is to provide an overview of the performance of active learning during the screening phase of systematic reviews. We collect and present data extracted from simulation studies that deal with the performance and application of active learning in the acceleration of systematic reviews.
From the studies identified in our systematic search, the aim is to identify potential areas for future research and provide recommendations for future studies on active learning in systematic reviews. We extract information on currently standard study design, dataset utilisation and statistics, and machine learning applications in this field. Our study can serve as a reference point for anyone interested in simulating active learning performance and optimising their systematic reviews or active learning-aided software tools.
In following the outlined goals, this study aims to achieve several objectives. First, it analyses the study designs of the included simulation studies to assess the scale and methodology employed, while collecting data on the availability of machine learning source code. In addition, it gathers and presents information on the labelled datasets used to test performance, evaluating their accessibility and reproducibility. Finally, the study examines the use of active learning models, reporting on the evaluations of different applications across the selected studies. By integrating these elements, the research ultimately addresses the question of whether active learning should be recommended for accelerating systematic reviews.
2. Methodology
This review was reported in accordance with PRISMA. The PRISMA flowchart in Figure 1 represents the steps taken during this scoping review, from data gathering to the final selection of included records. The information extraction in this study is divided into three categories: Study design, Datasets, and Models. The methodology section of this article is therefore divided into sections following those categories.

PRISMA flowchart representing the workflow for the current study, along with the number of records found and discarded at each step.
We searched the following databases for literature containing any variant of the term ‘systematic review’ in combination with the term ‘active learning’, published after 2005: Web of Science, Scopus, and EMBASE. The timeframe of “after 2005” was selected as the starting point for the analysis, as per a review by O’Mara-Eves [23], which identified the first documented application of techniques used for title and abstract screening in literature as occurring in 2006. The results confirm that this cutoff was sufficient; the earliest true simulation study found was published in 2010.
The database search used in our systematic review was performed first in September 2021 and repeated in March 2023 and August 2024. The exact search terms, along with other information, are available on the Open Science Framework as [24].
The screening of abstracts for potential relevance was done by two screening researchers using active learning, through the use of the open-source software package ASReview [14], using version 0.19 for the primary screening and versions 1.1.1 and 1.6.2 for the literature updates. The application of active learning has been shown to result in a significant decrease in total screening time, as demonstrated in [14, 25–27]. This is achieved by prioritising relevant records at the beginning of the screening process, allowing for partial screening with full results, thus reducing the overall number of records requiring screening.
The algorithm employed for both the original search and the first literature update is an implementation of TF-IDF, Naïve Bayes, and a dynamic double resampling algorithm. For the second update, the models were changed based on simulation results reported in [28], opting for a retrieval-optimised transformer, mxbai-embed-large-v1, together with a Random Forest classifier.
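For illustration, the sketch below shows how such an embedding-plus-classifier configuration could be assembled; it is a simplified approximation rather than the exact pipeline used for screening, and the Hugging Face model identifier is an assumption.

```python
# Sketch of an embedding-plus-classifier configuration of the kind described
# above: a sentence embedding model producing features for a Random Forest.
# The Hugging Face identifier for mxbai-embed-large-v1 is assumed to be
# "mixedbread-ai/mxbai-embed-large-v1"; this is an illustration, not the exact
# screening pipeline used in this review.
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier

def embed_texts(texts):
    encoder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
    return encoder.encode(texts, show_progress_bar=False)

def train_classifier(embeddings, labels):
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(embeddings, labels)   # labels: 1 = relevant, 0 = irrelevant
    return clf
```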
To initiate the active learning process, seven records known to be relevant and five randomly selected irrelevant records were identified and utilised as prior knowledge for the initialization of the active learning machine learning algorithm. For each literature update, the model is initiated using all relevant records from the previous searches combined with a random irrelevant record. Specifically, the first update uses records from the initial search, while the second update incorporates records from both the first and second searches.
A stopping heuristic of 50 consecutive irrelevant records was set as the criterion for terminating the screening process and proceeding to the next phase of the systematic review. The consecutive count of irrelevant records was shared between screening researchers, and inclusions and uncertain exclusions were evaluated by both screening researchers.
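The stopping heuristic itself is straightforward; a minimal sketch of the rule as described above is given below, assuming records are labelled in the order proposed by the active learning model.

```python
# Minimal sketch of the stopping rule described above: halt screening once a
# given number of consecutive records (here 50) have been labelled irrelevant.
# Labels are assumed to arrive in the order proposed by the active learning
# model, with 1 = relevant and 0 = irrelevant.
def should_stop(labels_in_screening_order, threshold=50):
    consecutive_irrelevant = 0
    for label in labels_in_screening_order:
        consecutive_irrelevant = consecutive_irrelevant + 1 if label == 0 else 0
        if consecutive_irrelevant >= threshold:
            return True
    return False
```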
To be considered relevant, a record must describe a simulation study that tests the performance of machine learning for systematic review screening, with at least one of the simulations utilising active learning to apply the machine learning algorithm. In addition, the dataset used in the simulation must be of scientific nature, excluding materials such as legal documents or emails.
The searches yielded a total of 1290 articles in the first search and 504 in the search updates. The results from all three databases were combined and deduplicated based on DOI, title, and/or abstract in preparation for the screening phase. Following the stopping heuristic, screening was halted at 269 of the 1290 records (~20%) for the initial screening phase, and at 91 of 223 (~41%) and 128 of 285 (~45%) for the literature updates. The performance of the first screening is depicted in Figure 2 and the search updates in Figure 3.

In this graph for the screening phase of the current scoping review, the blue line represents the number of relevant papers identified after a given number of papers screened, a metric known as recall. The dotted yellow line shows the moment the screening was halted, and the black line represents the hypothetical recall for random screening. The x-axis indicates the number of records screened out of the total dataset.

Recall graphs for the first and second literature update phases of this scoping review: (a) Recall graph for the first literature update phase. (b) Recall graph for the second literature update phase.
Combining the searches, of the total 525 records screened, 106 were deemed to be relevant based on their abstracts. These 106 papers were further assessed for eligibility based on their full text. This resulted in a selection of 55 records deemed relevant to the study in accordance with the established inclusion criteria. Additional rounds of screening were conducted through snowballing and citation searching of the reference lists of these inclusions. This led to the identification of an additional five records, bringing the final number of relevant records from 55 to 60.
2.1. Study design analysis
We searched for simulation studies testing the performance of active learning in accelerating the screening phase of systematic reviews. For these papers, the analysis focuses on their study design. The papers were carefully reviewed, and the relevant data were organised into three separate tables. These tables can be found as supplementary materials in the persistent storage location 2 : Table 1 (paper and model information), Table 2 (dataset information), and Table 3 (linking datasets to their corresponding papers). From the identified papers, the following information was extracted:
Record title
Authors
Publication year
Number of datasets used for simulations
Was the dataset originally prelabeled
Which metrics are used to quantify the results
Is the used dataset reported?
– If so, is the dataset accessible?
– If so, where is the dataset stored?
Is the used code reported?
– If so, is the code accessible?
– If so, where is the code stored?
List of studies selected for inclusion in the literature review, sorted by publication year. The table includes the title, year of publication, and reference for each study.
Descriptions of most used metrics in the identified studies.
Summary of the largest available datasets utilised for systematic review screening phase automation research.
In assessing the availability of code and datasets for this study, the criterion employed was the feasibility of access through reasonable effort. Instances in which a link was provided but found to be non-functional were documented as unavailable. Similarly, cases where code or datasets were described without an accessible link were also considered unavailable. Furthermore, if instructions for accessing code or datasets were provided but proved to be inoperable within a reasonable effort, such cases were also recorded as unavailable. Finally, if a link directed to a programme that was neither open source nor accessible due to a paywall, the programme was deemed unavailable for the purposes of this study.
The accessibility of each dataset was confirmed through manual verification (i.e., can the dataset be accessed through reasonable effort?).
2.2. Dataset analysis
The collection of datasets is not all-inclusive, as certain criteria must be met for a dataset to be considered. The dataset must pertain to a substantive topic (i.e., created for the purpose of a systematic review, not solely for use in a simulation study) and be derived from scientific literature, thus excluding datasets such as those containing news articles. In addition, the dataset must either be pre-labelled or be labelled during the study before the simulation starts, to accurately evaluate the performance of active learning algorithms. This ensures that the datasets are representative of scientific fields and of relevance to systematic review studies.
From the literature, 336 datasets were extracted. Of these, 238 datasets were deemed valid for inclusion based on the aforementioned criteria. The following variables were extracted:
Dataset publication year
Original author
Collection author
Originally pre-labeled
Data type (title, abstract, full-text)
Original data purpose
Field & topic
Number of records in the dataset
Number of inclusions
Original dataset storage location
2.3. Model analysis
With the model analysis, this study aims to reveal model intricacies and provide a clearer understanding of how each model contributes to the performance of active learning, as well as how often and in what way the models are applied in the field.
Detailed data about the models utilised in each study were extracted. Specifically, we focused on:
The type of machine learning model used
Any customisation applied to the model (if available, referred to by its custom model name)
Are hyperparameter optimization techniques used
The size of the batch used in active learning cycles
In total, information on 15 distinct models was gathered. Each of these models has integrated active learning techniques in some form, showing a diverse array of approaches used in this field. This analysis of different models will shed light on the design nuances that might influence the successful application of active learning in systematic reviews.
3. Results
The results section of this scoping review provides an analysis of the studies identified through the literature search process. To present the findings in a cohesive manner, the results are organised in the same manner as the methodology section: subsection 3.1 Study Designs, subsection 3.2 Datasets and Usage, and subsection 3.3 Active Learning Models and Evaluation. Each category focuses on specific variables relevant to that topic and provides a detailed examination of the key findings and their significance in the context of active learning literature.
The 60 papers labelled as relevant can be described as studies that use active learning in their simulations to test its performance. These studies focus specifically on the performance of active learning in improving systematic reviews, often in the form of a case study or a report on a systematic review that employed active learning as part of its machine learning implementation. Table 1 presents the 60 simulation studies that were selected for inclusion in this scoping review.
3.1. Study designs of simulation papers
The first section of the results is about study design. The information in this section provides an understanding of the scale of the studies included in this review.
The distribution of papers by year of publication can be observed in Figure 4 and provides insight into the rising popularity of the field.

Histogram visualising the number of papers per year. While the search covered publications from 2006 onwards, no papers running simulation studies were found between 2006 and 2010.
Figure 5 shows the distribution of the number of datasets used for simulation studies. The most common choice is a single dataset, while the median number of datasets per study is 3.5. Moreover, it was found that most studies train their model on a title-abstract combination.

Boxplot with the number of datasets used in the included papers.
A remarkable observation is the considerable quantity and diversity of metrics employed in the field. In Table 2 and Figure 6, only metrics with three or more instances are displayed; however, a total of 61 distinct metrics were identified for evaluating simulation performance. This substantial variation poses a significant challenge to the cross-comparison of models.

Barplot showing the frequency of metrics used in studies.
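To make the kinds of metrics encountered concrete, the sketch below computes two measures that recur in this body of work, recall as a function of records screened and work saved over sampling (WSS). It is purely illustrative; the definitions used here are common formulations rather than those of any particular reviewed study.

```python
# Illustrative computation of two measures that recur in this literature:
# recall as a function of the number of records screened, and work saved over
# sampling (WSS) at a chosen recall level. Definitions vary between studies;
# this is a sketch, not a reproduction of any particular paper's metric.
import numpy as np

def recall_curve(labels_in_screening_order):
    """Cumulative fraction of all relevant records found after each screened record."""
    labels = np.asarray(labels_in_screening_order)
    return np.cumsum(labels) / labels.sum()

def wss_at(labels_in_screening_order, recall_level=0.95):
    """Reduction in screening effort relative to random screening at `recall_level`,
    assuming that recall level is actually reached during screening."""
    recall = recall_curve(labels_in_screening_order)
    n = len(recall)
    n_screened = int(np.argmax(recall >= recall_level)) + 1  # first point reaching the level
    return (n - n_screened) / n - (1.0 - recall_level)
```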
Figure 7 presents the distribution of the primary studies incorporated in this review, focusing on their compliance with open science principles. The figure delineates the proportion of studies that provided access to their code, datasets, both, or neither. It reveals that while most studies report on their dataset and code, fewer than a quarter of the analysed studies had both resources available. Factors contributing to the lack of shared code and datasets vary, including government restrictions as exemplified in [65], unintentional omissions of code and data in relevant publications, and persistence issues such as broken links or relocated resources over time. Regardless of the reasons for these missing resources, their absence can impede study reproducibility, verifiability, and overall utility.

Stacked bar chart illustrating the actual availability of reported resources in relation to the frequency of their occurrence. The categories on the x-axis represent the types of resources reported, while the y-axis displays the reported frequency. The differently coloured segments of each bar denote the actual availability, highlighting discrepancies between what was reported and the reality of access.
3.2. Datasets and usage
The second set of variables analysed in this review considers the datasets reported in the studies, with particular attention given to the manner in which they were utilised. This information provides insight into how the datasets were initially intended to be used, the specific fields and topics they belong to, and the characteristics of the text and labelling status of the datasets.
Table 3 lists open dataset collections with more than eight datasets for systematic review screening phase automation research, as identified in our searches. To facilitate organisation and identification, a dataset_id and a collection_id are assigned to each dataset and collection in the tables.
Figure 8 shows the distribution of research fields found in the collected datasets. In some fields, the use of the systematic review format is more commonplace than in others, leading to these fields being overrepresented in the simulation papers.

Horizontal bar chart depicting the distribution of fields across the datasets under review. The y-axis represents the various fields, while the x-axis indicates the number of datasets in each field.
Figure 9 illustrates the relationship between the number of records and the number of inclusions, excluding outliers for enhanced clarity. This visualisation provides insight into the association between these two variables within the context of the analysed datasets. A subsequent statistical analysis was conducted on the data, revealing no significant correlation between the number of records and the number of inclusions.

Scatterplot depicting the relationship between the number of records and the number of inclusions, with outliers removed for clarity. The x-axis represents the number of records, while the y-axis displays the number of inclusions. The plot highlights the association between these two variables within the context of the analysed datasets.
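The sketch below shows one standard way such an association could be tested; Pearson's correlation is assumed here purely for illustration and is not necessarily the test used in the underlying analysis.

```python
# One standard way to test the association between dataset size and number of
# inclusions; Pearson's correlation is assumed here for illustration and is not
# necessarily the test used in the underlying analysis.
from scipy.stats import pearsonr

def correlate_counts(n_records, n_inclusions):
    r, p_value = pearsonr(n_records, n_inclusions)
    return r, p_value

# Hypothetical usage with made-up per-dataset counts:
# r, p = correlate_counts([1200, 850, 4300, 560], [40, 12, 95, 7])
```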
3.3. Active learning models and evaluation
The third group of variables in this review presents the active learning models used in the studies and the methods used to evaluate and compare these models. This information provides insight into the active learning models utilised in the studies. In addition, this group also includes the final conclusions of the studies on the effectiveness of active learning.
The studies show a concentration around a small set of supervised learning models, as seen in Figure 10. SVMs were the most common choice by a wide margin, followed by logistic regression. Naïve Bayes, random forests, and neural network architectures appeared regularly but at lower frequencies. Beyond these core models, usage dropped sharply.

Horizontal bar chart showcasing the frequency of machine learning model choices employed in the studies. The y-axis represents various machine learning models, while the x-axis indicates their frequency of usage. This visualisation highlights the popularity of different models within the context of the analysed studies.
The analysis revealed that approximately one quarter of the simulation studies incorporated some form of hyperparameter optimization for their machine-learning algorithms. The other studies either employed models that did not require optimization or relied on standard settings for their algorithms.
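As an illustration of what such optimisation typically involves, the sketch below applies a cross-validated grid search to an SVM text classifier; the model, parameter grid, and scoring choice are assumptions for the sketch and are not taken from any specific reviewed study.

```python
# Illustrative hyperparameter optimisation for an SVM text classifier using a
# cross-validated grid search; the parameter grid and scoring choice are
# assumptions for the sketch, not settings taken from any reviewed study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def tuned_svm(texts, labels):
    pipeline = make_pipeline(TfidfVectorizer(), SVC(probability=True))
    grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
    search = GridSearchCV(pipeline, grid, scoring="recall", cv=5)
    search.fit(texts, labels)   # labels: 1 = relevant, 0 = irrelevant
    return search.best_estimator_
```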
The code for the visualisations used in this article is available as [85]. The datasets are available as [24].
Finally, each paper was assessed on its evaluation of active learning. It was found that, despite some identifying certain limitations, all 60 provided positive endorsements for using active learning to improve the efficiency of the screening phase in systematic reviews.
4. Discussion
Systematic reviews are a gold standard of research but require considerable time and effort, and the screening phase remains notably laborious and time-consuming. It is in this step that active learning, which has already found application in numerous software tools, can greatly contribute by reducing human effort. However, there is no solid overview of how well active learning performs in this role. Our aim was to review the existing literature on active learning in systematic reviews to map out the field, present its various approaches, methodologies, and findings, and ultimately determine whether active learning is indeed the recommended solution for systematic review acceleration. We believe this work will serve as an important reference point, providing an understanding of the current state of the field and highlighting areas for future research and practical applications.
The review conducted in this study has thoroughly analysed the current state of active learning-based systematic review screening, taking into account the variations in study design, dataset usage, active learning models employed, and performance metrics applied. Despite the diversity of the examined studies, a shared theme emerged: our analysis consistently found that active learning is recommended as a solution to the challenges posed by the implementation of machine learning for systematic reviews. This finding not only supports the broader application of active learning in the field of systematic review practice but also serves as a strong foundation for both researchers in meta-systematic review research and practitioners considering the implementation of active learning in their own systematic review efforts.
The results section of this review offers an analysis of the identified studies, focusing on three key aspects: study design, dataset usage, and active learning models. The study design analysis provided insights into the scale and methodology of the included studies, as well as the distribution of papers by year of publication, number of datasets used, and the considerable diversity of metrics employed. The dataset analysis offered information on how datasets were utilised, their intended purposes, the specific fields they belong to, and the characteristics of the text and labelling status. The active learning models analysis presented the models used in the studies, the methods of evaluation and comparison, and the conclusions on the effectiveness of active learning.
One limitation encountered during this review is the diversity of metrics used in screened literature. This diversity makes cross-comparison of performance and models difficult. In order to effectively evaluate the performance of active learning in systematic reviews, a robust and consistent measurement tool is required. While previous studies have contributed to the development of sophisticated metrics, the absence of a universal standard and the variety of performance measures employed hinder direct comparisons of active learning methods’ effectiveness.
O’Mara-Eves et al. [23] provide an overview of the performance measures definitions used in studies of text mining for systematic reviews. This study presents clear and easily understandable documentation of the performance measures employed in various studies in this field. Their work serves as a valuable resource for researchers seeking to compare and evaluate the effectiveness of different text-mining methods for systematic reviews. However, we found that since then the diversity of metrics has only increased, with a total of 61 different metrics identified. This further complicates the cross-comparison of active learning methods, posing a challenge for researchers aiming to draw broad conclusions about the field.
Another limitation identified in this review is the lack of dataset availability. Many datasets used in the studies are not open data or open science, which restricts the reproducibility of the research. Reproducibility is a vital aspect of the scientific method, as it allows researchers to validate the findings of a study and build upon the existing body of knowledge. Ensuring the reproducibility of research is essential for the credibility of scientific results. To address this issue, we encourage researchers to open up their datasets, adhering to open science principles, and facilitating the replication of their work by others.
Olorisade et al. [86] report that around 80% of the studies they assessed lacked sufficient information regarding dataset usage. While our observations come to around 40% of assessed papers missing dataset information, this is still a significant portion. This lack of open data undermines the reproducibility of research in data science. To counter this, Olorisade et al. provide a framework for ensuring the reproducibility of research in data science, which can help researchers produce reliable and trustworthy results that can be validated and reused by others. By adopting this framework and sharing datasets, researchers can contribute to the advancement of the scientific method and bolster the credibility of their findings in the active learning-based systematic review screening field.
Another limitation found by this review is the lack of cross-analysis between models. The majority of the reviewed papers employ either support vector machines (SVMs) or logistic regression (LR), and most papers only compare against manual work, not against other models. Furthermore, the models are typically tested against unique datasets, making it challenging to compare their performance across different datasets. While the reviewed studies provide useful insights into the application of active learning models for systematic reviews, there is still a need for more extensive and comparative analyses across various models and datasets. Such research could help in identifying the most effective active learning models for systematic reviews and provide more standardised performance evaluation methods.
There are several areas of active learning-based systematic review screening that warrant further exploration. One critical area for future research is the development and standardisation of metrics to evaluate active learning methods. With the proliferation of different metrics used in the field, there is a pressing need to identify the most appropriate metrics to use for evaluating active learning models.
In addition, advocating for the use of open data practices could be beneficial in improving the availability of datasets and promoting collaborative research efforts.
There is a need to explore a wider variety of models to improve the understanding of active learning techniques. While SVMs and LR models are currently popular choices, exploring a more extensive range of models may lead to improved performance and a better understanding of the strengths and weaknesses of different active learning techniques.
Future work in active learning-based systematic review screening should focus on standardising metrics, promoting open data practices, and exploring a wider variety of models to improve the efficacy and transparency of research in this field.
4.1. Recommendations
As practical results of our analysis of 60 simulation studies, we find consistent evidence that active learning improves efficiency in systematic review screening compared with random or manual-only approaches. Across diverse designs and datasets, this conclusion appears repeatedly. From this body of work, three priorities can be formulated as initial steps towards a more standardised model: the inclusion of pre-existing evaluation criteria when introducing new metrics, increasing the availability of open datasets, and broadening the range of models compared. Taken together, these priorities provide a start for both researchers and practitioners, and can be seen as the beginning of a more unified framework for applying active learning in systematic reviewing.
4.2. Declarations
4.2.1. Data availability
The datasets generated and analysed during this study are available in the Open Science Framework repository at the following URL: https://osf.io/t9hgm. The source code for ‘Simulation-Based Active Learning for Systematic Reviews’ is publicly accessible via our GitHub repository at the following URL: https://doi.org/10.5281/zenodo.13361795. These resources provide supplementary data, methods, and materials related to the study.
The OSF repository includes as much information as possible in the shared datasets within our interpretation of copyright restrictions, but this remains a limiting factor. Copyright restrictions constrain the extent to which full record sets can be made openly available, even though recent work has shown that reproducing systematic review datasets is often highly challenging without access to the original records [87].
We encourage interested readers to explore these resources for a more complete understanding of the methods and results presented in this article. The GitHub repository is actively maintained and updated by the authors, and may contain more recent versions or enhancements of the code used in this study. The cited version (v3.0) is the version used for this work.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Usage of generative AI and AI-assisted technologies in the writing process
During the preparation of this work the authors used Open Source Generative AI to increase language readability. After use of this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
