Abstract

Artificial Intelligence (AI)-based clinical decision support systems provide clinicians with insights extending beyond conventional medical tools. The aim is to improve diagnostic and prognostic accuracy by capitalizing on the granularity of available data, allowing a larger population to benefit from tailored care. AI-driven reconstruction of stroke imaging has been shown to be non-inferior to a neuroradiologist in ischemic lesion scoring according to ASPECTS,1 while also providing consistent scoring of collaterals in the hyperacute setting.2 Machine learning (ML) methods to aid detection of large vessel occlusion and salvageable tissue represent a pivotal stride toward optimizing the onset-to-treatment window, potentially improving cost effectiveness3 and providing reasonably accurate prediction of response to treatment.4 Beyond imaging, AI has also been tested in the prediction of functional outcome.5–7 In such settings, using data from baseline assessment and further details on patient status 24 h after stroke, ML models have predicted good functional outcome at 3 months with up to 80% accuracy.5–7
Taken together, these preliminary studies underscore the potential of ML to leverage real-world data for outcome prediction. At the same time, they draw attention to a reproducibility crisis in AI-based research that must be addressed.
ML involves the development of models that enable computers to learn from data autonomously, improving performance without explicit programming. The proliferation of ML has spurred an exponential increase in clinical AI model development, with over 75,000 reported studies (https://aiforhealth.app).
Amid such progress, a critical challenge looms large: the reproducibility of AI-based studies. This issue came to the fore in studies on AI-aided interpretation of medical images for COVID-19 diagnosis, where several models faced implementation hurdles due to implicit biases and reproducibility issues.8 Consequently, as AI continues to integrate into stroke care, prioritizing the three pillars of scientific rigor – reliability, replicability, and reproducibility – becomes imperative.
Reliability concerns the consistency and accuracy with which a method discriminates outcomes. Replicability concerns whether findings can be confirmed by applying the same procedures in new studies (scientific replicability). This entails the same algorithm producing similar results under stress conditions, such as increased sampling variability, greater system uncertainty, or varying data quality. Far from implying exact replication of point estimates, scientific replicability rather represents the need to reach consistent results within probabilistic and non-probabilistic variation tolerances.
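To make replicability under sampling variability concrete, the sketch below retrains one fixed pipeline across many random train/test partitions and reports the spread of its performance rather than a single point estimate. The synthetic data, pipeline, and tolerance reporting are illustrative assumptions, not a method drawn from the cited studies.

```python
# Minimal sketch of a scientific-replicability check: the same algorithm,
# re-run under resampling variability, should yield consistent results
# within a stated tolerance. Data and pipeline are synthetic/illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

aucs = []
for seed in range(20):  # vary the train/test partition, keep the pipeline fixed
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# Report the spread, not a single (possibly fortunate) point estimate.
print(f"AUC mean {np.mean(aucs):.3f}, 95% range "
      f"[{np.percentile(aucs, 2.5):.3f}, {np.percentile(aucs, 97.5):.3f}]")
```

A study reporting such an interval, rather than one favorable split, gives reviewers a direct handle on whether results remain consistent within the stated variation tolerances.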
Lastly, reproducibility relates to the transparency of data, methods, and analysis, and the steps put in place to allow re-testing and refinement (computational reproducibility). The latter implies that any computed result must be obtainable by any investigator using the same data and algorithms. This principle extends beyond result validity, constituting a fundamental requirement at the computational level: every algorithm must be reproducible to justify its existence.9
For broad applicability, “an AI model needs to be reproducible, which means the code and data should be available and error-free.”9 Several competing factors impede this principle, including ethical concerns, privacy issues, and legal obstacles at institutional and national levels, particularly in experiments involving statistical and sub-symbolic domains with deep learning models.9 The lack of standardized reporting and the limited availability of source code and datasets undermine AI ethics principles.10 In the stroke research field in particular, limited data sharing and source code reporting consistently restrict the applicability and external validation of research findings and models.11 A scoping search shows that, over the last year, more than 30 ML algorithms were reported for stroke patients, but only one study fully disclosed its source code,4 one referred to open-source libraries for data mining and algorithm development,7,9 and only one provided external validation of the ML model.12 Therefore, implementing technical and ethical measures is crucial to ensure fairness, accountability, and transparency in AI studies.
From a technical perspective, authors may be encouraged to report the architecture of the ML algorithm (unless patented). This also applies to the methods for detecting and preventing data leakage, ensuring explicit criteria for dataset splitting during training, tuning, testing, and validation; a sketch of such a split follows below. Even when building on community-based frameworks (e.g. MONAI.io), providing source code would allow external validation and reproducibility. Standardized reporting of data curation and annotation can enhance understanding of the platform on which the ML model was developed. From a reviewer perspective, grasping the functioning of ML models is critical. As neutral or negative results often remain unpublished, the main risk for the scientific community is to be overwhelmed by ML studies with unrecognized overfitting – namely those with the highest apparent predictive accuracy. Without source code and data available, reviewers will likely have little, if any, ability to discriminate between good use and misuse of AI.10 Although data and source code sharing is common in non-medical ML literature (e.g. Open Neural Network Exchange, http://onnx.ai), it remains underused in clinical studies.13 Indeed, the sharing of sensitive personal data among different institutions raises privacy concerns. Federated platforms may partially overcome data sharing issues, allowing algorithms to be trained without sharing sensitive patient data while providing external validation of AI-based models. Such approaches would also mitigate the intrinsic limitations of studies adopting AI-driven text mining for data collection.7,14–16 To this extent, readers and clinicians, as well as patients as final users, would likely be more confident trusting a validated and transparent ML algorithm than an unexplainable model with undefined internal or external validity. Implementing safeguard measures for reporting ML studies can help identify potential undisclosed issues concerning treatment bias, ethnic group bias, subgroup and sensitivity analyses, and generalizability.
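As a concrete illustration of explicit dataset-splitting criteria, the sketch below shows one way to enforce a patient-level split with scikit-learn’s GroupShuffleSplit, so that repeated observations from the same patient never leak across the train/test boundary. The synthetic cohort and column names (patient_id, nihss, outcome) are our own assumptions, not taken from any cited study.

```python
# Minimal sketch: patient-level dataset splitting to prevent data leakage.
# The cohort is synthetic; column names are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(42)
n_rows = 200  # e.g. repeated imaging studies, several per patient
df = pd.DataFrame({
    "patient_id": rng.integers(0, 60, size=n_rows),  # 60 distinct patients
    "nihss": rng.integers(0, 25, size=n_rows),       # toy predictor
    "outcome": rng.integers(0, 2, size=n_rows),      # toy binary label
})

# Hold out ~20% of *patients* (not rows), so records from the same patient
# never appear in both the training and the test set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]

# Explicit leakage check: no patient may straddle the split.
assert set(train_df["patient_id"]).isdisjoint(test_df["patient_id"])
```

Reporting the grouping variable and an explicit disjointness check of this kind would let reviewers verify the splitting criteria directly, rather than inferring them from reported accuracy alone.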
From an ethical standpoint, editorial bodies should adopt a framework that promotes transparency and fairness, including checklist-based reporting and repositories for source code storage. This approach would streamline the peer-review process, foster reproducibility, and help ensure proper citation of original work. The stroke field may also draw on common frameworks developed for cancer imaging, where transparency and explainability are promoted and real-world testing of AI solutions is expected.17 Besides requiring checklists (e.g. TRIPOD, CLAIM, MAIC-10) for ML study submission, reviewers should use a dedicated lexicon and glossary for reviews and test ML model reproducibility and reliability. Reviewers should also categorize ML reproducibility in tiers, which could be prominently displayed alongside article titles to inform journal audiences.
Historically, we transitioned from AI as a branch of mathematics, grounded in deduction, to ML as a branch of statistics, grounded in probability and correlation. But probabilities and correlations are by definition estimates of events rather than deterministic statements. And although working with probabilities is commonplace in the clinical setting, with AI we may be able to explain general outcomes through infinitesimal – potentially microscopic – parameters and patterns. If ML is to help in stroke care, algorithms must evolve from inscrutable to informative and explainable, for the sake of clinicians and patients.13 A model’s accuracy depends on the quality of the underlying data, but it can improve and gain trust over time with transparency. As AI develops, the ultimate goal is broad generalizability and a positive impact on many lives. Consequently, the community of authors, reviewers, and editors shares a common interest in pooling efforts toward this goal.
Footnotes
Acknowledgements
None
Declaration of conflicting interest
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: MR is supported by Young Investigator Grants from the Italian Stroke Association (ISA-AII), and declares support for educational activities from CLS-Behring and PRESTIGE-AF trial.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical approval
This is an editorial; no ethical approval was required.
Informed consent
This is an editorial; no patients were included.
Guarantor
MR.
Contributorship
MR, PC: concept, design, writing.
