Abstract

A good radiology article that explores the use of artificial intelligence (AI) techniques has the potential to improve patient care; however, producing such an article is no easy feat. The difficulty lies in the interdisciplinary nature of a radiology AI study: it must be strong in both the clinical and the technical sense. With a growing number of manuscripts on this topic being submitted to radiology journals for consideration, we highlight the following clinical and technical elements that a good radiology AI article should demonstrate.
Clinical Elements
Clinical Value
Although early interest in the use of AI in radiology appeared to be driven primarily by technical advances in machine learning and deep learning algorithms, 1 the fundamental value of an AI solution intended for radiology rests on its eventual role in patient care. A good radiology AI study answers a carefully crafted imaging-based question grounded in clinical need, one that maximizes the potential of AI techniques within their limitations. For studies that investigate the diagnostic accuracy of an AI solution, it is important to define its role in the diagnostic pathway and to investigate its performance accordingly (in the intended role and compared with the appropriate standard of care). For example, if an AI solution is intended as a replacement test, the authors should compare its performance with that of radiologists; if it is intended for a triage or add-on role, the authors should instead compare performance in a combined AI/radiologist setting with that of radiologists alone. Furthermore, superior technical efficacy or diagnostic accuracy in isolation does not necessarily translate into improved clinical decision making or patient outcomes. Exploring clinically oriented endpoints beyond diagnostic accuracy can better demonstrate the practical value of radiology AI techniques and facilitate their widespread adoption. 2
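For a replacement-test comparison on a paired design (the AI solution and radiologists interpreting the same cases), McNemar's test is one common choice. The following is a minimal sketch, not a prescribed analysis; the per-case correctness arrays and their values are hypothetical and purely illustrative:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired reads on the same cases: 1 = correct call, 0 = incorrect.
ai_correct  = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
rad_correct = np.array([1, 0, 0, 1, 1, 1, 0, 0, 1, 1])

# 2x2 table of agreement/disagreement between the two paired readers;
# McNemar's test uses only the discordant cells.
table = [
    [np.sum((ai_correct == 1) & (rad_correct == 1)),
     np.sum((ai_correct == 1) & (rad_correct == 0))],
    [np.sum((ai_correct == 0) & (rad_correct == 1)),
     np.sum((ai_correct == 0) & (rad_correct == 0))],
]

result = mcnemar(table, exact=True)  # exact binomial test suits small samples
print(f"McNemar p-value: {result.pvalue:.3f}")
```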
Patient/Case Selection
To achieve high generalizability, careful patient selection is critical to ensure that the studied population best represents the patient population for the intended use. The strength of this representation is affected by the studied patient demographics (age, gender and ethnicity), basis of identification (prospective/retrospective, patient care setting, enrolment location and dates, consecutive/random/convenience series and inclusion/exclusion criteria), and disease profile (prevalence and severity of both the investigated and alternative diagnoses). 3 If pre-curated public online datasets are used, the developers/publishers of the public datasets should attempt to make such information available, and researchers using these datasets should report the relevant information in the manuscript along with a discussion of the benefits/limitations of the dataset selection. 4 The studied cases should be homogeneously distributed across the training, validation and test datasets in terms of patient demographics and disease profile. Furthermore, the cases in the test dataset should be unique, that is, not duplicated in the training or validation datasets, and should be evaluated only once, using the final model, to ensure an unbiased measurement of accuracy.
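As a minimal illustration of such a split (assuming a hypothetical tabular case list with patient_id and label columns), a grouped, stratified partition keeps disease prevalence comparable across sets while ensuring that no patient contributes cases to more than one set:

```python
import pandas as pd
from sklearn.model_selection import StratifiedGroupKFold

# Hypothetical case list: one row per imaging study.
cases = pd.read_csv("cases.csv")  # columns include patient_id and label

# Stratify on the disease label while grouping on patient_id, so that
# label prevalence is similar across splits and no patient contributes
# images to more than one split.
splitter = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
folds = list(splitter.split(cases, y=cases["label"], groups=cases["patient_id"]))

# Hold out the first fold as the test set; the remainder can be further
# divided into training and validation sets by the same procedure.
train_val_idx, test_idx = folds[0]
train_ids = set(cases.iloc[train_val_idx]["patient_id"])
test_ids = set(cases.iloc[test_idx]["patient_id"])
assert train_ids.isdisjoint(test_ids)  # guard against patient-level leakage
```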
External Validation
A major challenge to the clinical adoption of AI techniques is maintaining the performance reported in research articles once deployed in real-world clinical settings. This can be partially attributed to the risk of overfitting, in which a machine learning or deep learning algorithm becomes optimized for the datasets on which it was developed, absorbing unknown confounders inherent in the small number of studied cases. To generalize performance to routine clinical practice, algorithms should be validated on external test datasets, ideally prospectively, using adequately sized multi-institutional data.
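In code, external validation amounts to scoring the frozen final model, exactly once, on data from institutions that played no part in development. A minimal sketch, assuming hypothetical label and score arrays from such an external cohort, with a bootstrap confidence interval on the AUROC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical arrays from an external, multi-institutional test set that
# was never used for training or model selection.
y_true = np.load("external_labels.npy")
y_score = np.load("external_scores.npy")  # outputs of the frozen final model

auroc = roc_auc_score(y_true, y_score)

# Non-parametric bootstrap for a 95% confidence interval on the AUROC.
rng = np.random.default_rng(0)
boot = []
n = len(y_true)
for _ in range(2000):
    idx = rng.integers(0, n, n)
    if len(np.unique(y_true[idx])) < 2:  # skip resamples with only one class
        continue
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"External AUROC = {auroc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```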
Technical Elements
Reporting of Accuracy: Bridging Two Fields
When reporting diagnostic accuracy, early radiology AI studies tended to use metrics and terminology common in the computer science field, such as F1 score, precision, recall, accuracy or area under the receiver operating characteristic curve (AUROC). In the clinical field, on the other hand, the performance of diagnostic tools is commonly assessed using metrics such as sensitivity, specificity and positive and negative predictive values, although AUROC is also frequently used. If an AI solution is intended for eventual clinical adoption, the applicable clinical metrics should be used. This will not only facilitate communication with physicians and researchers, but also allow consistent comparison with other clinical diagnostic tools. In addition, all metrics used should be clearly defined in the manuscript, with a description of the method of calculation, in consideration of readers in one field who may be unfamiliar with the conventions of the other.
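To make the correspondence concrete (recall is sensitivity, precision is the positive predictive value), the following sketch computes both families of metrics from the same confusion matrix; the binary labels, predictions and scores are hypothetical:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Hypothetical data: y_true are reference-standard labels, y_pred are
# thresholded model outputs, y_score are continuous model outputs.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred  = np.array([1, 0, 1, 0, 0, 1, 1, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # = "recall" in the computer science literature
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)          # = "precision"
npv = tn / (tn + fn)
f1 = f1_score(y_true, y_pred)        # harmonic mean of PPV and sensitivity
auroc = roc_auc_score(y_true, y_score)  # shared by both fields

print(f"Sens {sensitivity:.2f}, Spec {specificity:.2f}, PPV {ppv:.2f}, "
      f"NPV {npv:.2f}, F1 {f1:.2f}, AUROC {auroc:.2f}")
```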
Reproducibility and Transparency
Rigorous reporting of both clinical and technical details is critical to allow readers to fully assess the merits of a radiology AI study. When applicable, the relevant clinical reporting guidelines should be followed, including the Standards for Reporting of Diagnostic Accuracy Studies (STARD), 3 Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) 5 and Consolidated Standards of Reporting Trials (CONSORT). 6 An excellent checklist developed specifically for AI in medical imaging (CLAIM) was recently published. 7 An AI-specific extension to the STARD guidelines (STARD-AI) is also in development. 8 To allow readers to verify the reproducibility of the results, authors are encouraged to share the de-identified patient images and the code for AI modelling, training and data analysis in a publicly accessible repository.
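Beyond sharing code and data, deterministic behaviour of the shared code helps others regenerate the reported numbers. A minimal sketch, assuming a PyTorch-based pipeline (the framework choice is illustrative, not prescriptive):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix random seeds so that shared training code can reproduce results."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for run-to-run determinism on GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)  # call once at the start of training and analysis scripts
```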
Summary
Demonstrating clinical value with reliable generalizability is key to the successful clinical adoption of AI techniques in radiology.9,10 A good radiology AI study should follow established clinical research and reporting guidelines where appropriate, while ensuring technical reproducibility and transparency. This requires collaborative efforts among radiologists, other physicians and technical experts from the inception of a study and throughout all subsequent stages.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
