Abstract

A good radiology article that explores the use of artificial intelligence (AI) techniques has the potential to improve patient care; however, producing such an article is no easy feat. The difficulty lies in the interdisciplinary nature of a radiology AI study: it must be strong in both the clinical and the technical sense. With a growing number of manuscripts on this topic being submitted to radiology journals for consideration, we highlight the following clinical and technical elements that a good radiology AI article should demonstrate.
Clinical Elements
Clinical Value
Although early interest in the use of AI in radiology appeared to be driven primarily by technical advances in machine learning and deep learning algorithms, 1 the fundamental value of an AI solution intended for radiology rests on its eventual role in patient care. A good radiology AI study answers a carefully crafted imaging-based question grounded in clinical need, one that maximizes the potential of AI techniques within their limitations. For studies that investigate the diagnostic accuracy of an AI solution, it is important to define its role in the diagnostic pathway and to investigate its performance accordingly (in the intended role and compared with the appropriate standard of care). For example, if an AI solution is intended as a replacement test, the authors should compare its performance with that of radiologists; if it is intended for a triage or add-on role, the authors should instead compare performance in a combined AI/radiologist setting with that of radiologists alone. Furthermore, superior technical efficacy or diagnostic accuracy in isolation does not necessarily translate into improved clinical decision making or patient outcomes. Exploring clinically oriented endpoints beyond diagnostic accuracy can better demonstrate the practical value of radiology AI techniques and facilitate their widespread adoption. 2
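For a replacement-test comparison on a paired design (the AI solution and radiologists interpreting the same cases), McNemar's test is one common choice. The following is a minimal sketch, not a prescribed analysis; the per-case correctness arrays and their values are hypothetical and purely illustrative:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired reads on the same cases: 1 = correct call, 0 = incorrect.
ai_correct  = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
rad_correct = np.array([1, 0, 0, 1, 1, 1, 0, 0, 1, 1])

# 2x2 table of agreement/disagreement between the two paired readers;
# McNemar's test uses only the discordant cells.
table = [
    [np.sum((ai_correct == 1) & (rad_correct == 1)),
     np.sum((ai_correct == 1) & (rad_correct == 0))],
    [np.sum((ai_correct == 0) & (rad_correct == 1)),
     np.sum((ai_correct == 0) & (rad_correct == 0))],
]

result = mcnemar(table, exact=True)  # exact binomial test suits small samples
print(f"McNemar p-value: {result.pvalue:.3f}")
```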
Patient/Case Selection
To achieve high generalizability, careful patient selection is critical to ensure that the studied population best represents the patient population for the intended use. The strength of this representation is affected by the studied patient demographics (age, gender and ethnicity), basis of identification (prospective/retrospective, patient care setting, enrolment location and dates, consecutive/random/convenience series and inclusion/exclusion criteria), and disease profile (prevalence and severity of both the investigated and alternative diagnoses). 3 If pre-curated public online datasets are used, the developers/publishers of the public datasets should attempt to make such information available, and researchers using these datasets should report the relevant information in the manuscript along with a discussion of the benefits/limitations of the dataset selection. 4 The studied cases should be homogeneously distributed across the training, validation and test datasets in terms of patient demographics and disease profile. Furthermore, the cases in the test dataset should be unique, that is, not duplicated in the training or validation datasets, and should be evaluated only once, using the final model, to ensure an unbiased measurement of accuracy.
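As a minimal illustration of such a split (assuming a hypothetical tabular case list with patient_id and label columns), a grouped, stratified partition keeps disease prevalence comparable across sets while ensuring that no patient contributes cases to more than one set:

```python
import pandas as pd
from sklearn.model_selection import StratifiedGroupKFold

# Hypothetical case list: one row per imaging study.
cases = pd.read_csv("cases.csv")  # columns include patient_id and label

# Stratify on the disease label while grouping on patient_id, so that
# label prevalence is similar across splits and no patient contributes
# images to more than one split.
splitter = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
folds = list(splitter.split(cases, y=cases["label"], groups=cases["patient_id"]))

# Hold out the first fold as the test set; the remainder can be further
# divided into training and validation sets by the same procedure.
train_val_idx, test_idx = folds[0]
train_ids = set(cases.iloc[train_val_idx]["patient_id"])
test_ids = set(cases.iloc[test_idx]["patient_id"])
assert train_ids.isdisjoint(test_ids)  # guard against patient-level leakage
```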
External Validation
A major challenge to the clinical adoption of AI techniques is maintaining the performance reported in research articles once deployed in real-world clinical settings. This can be partially attributed to the risk of overfitting, in which a machine learning or deep learning algorithm becomes optimized for the datasets on which it was developed, absorbing unknown confounders inherent in the small number of studied cases. To generalize performance to routine clinical practice, algorithms should be validated on external test datasets, ideally prospectively, using adequately sized multi-institutional data.
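In code, external validation amounts to scoring the frozen final model, exactly once, on data from institutions that played no part in development. A minimal sketch, assuming hypothetical label and score arrays from such an external cohort, with a bootstrap confidence interval on the AUROC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical arrays from an external, multi-institutional test set that
# was never used for training or model selection.
y_true = np.load("external_labels.npy")
y_score = np.load("external_scores.npy")  # outputs of the frozen final model

auroc = roc_auc_score(y_true, y_score)

# Non-parametric bootstrap for a 95% confidence interval on the AUROC.
rng = np.random.default_rng(0)
boot = []
n = len(y_true)
for _ in range(2000):
    idx = rng.integers(0, n, n)
    if len(np.unique(y_true[idx])) < 2:  # skip resamples with only one class
        continue
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"External AUROC = {auroc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```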
Technical Elements
Reporting of Accuracy: Bridging Two Fields
When reporting diagnostic accuracy, early radiology AI studies tended to use metrics and terminology common in the computer science field, such as F1 score, precision, recall, accuracy or area under the receiver operating characteristic curve (AUROC). In the clinical field, on the other hand, the performance of diagnostic tools is commonly assessed using metrics such as sensitivity, specificity and positive and negative predictive values, although AUROC is also frequently used. If an AI solution is intended for eventual clinical adoption, the applicable clinical metrics should be used. This will not only facilitate communication with physicians and researchers, but also allow consistent comparison with other clinical diagnostic tools. In addition, all metrics used should be clearly defined in the manuscript, with a description of the method of calculation, in consideration of readers in one field who may be unfamiliar with the conventions of the other.
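To make the correspondence concrete (recall is sensitivity, precision is the positive predictive value), the following sketch computes both families of metrics from the same confusion matrix; the binary labels, predictions and scores are hypothetical:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Hypothetical data: y_true are reference-standard labels, y_pred are
# thresholded model outputs, y_score are continuous model outputs.
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred  = np.array([1, 0, 1, 0, 0, 1, 1, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # = "recall" in the computer science literature
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)          # = "precision"
npv = tn / (tn + fn)
f1 = f1_score(y_true, y_pred)        # harmonic mean of PPV and sensitivity
auroc = roc_auc_score(y_true, y_score)  # shared by both fields

print(f"Sens {sensitivity:.2f}, Spec {specificity:.2f}, PPV {ppv:.2f}, "
      f"NPV {npv:.2f}, F1 {f1:.2f}, AUROC {auroc:.2f}")
```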
Reproducibility and Transparency
Rigorous reporting of both clinical and technical details is critical to allow readers to fully assess the merits of a radiology AI study. When applicable, the relevant clinical reporting guidelines should be followed, including the Standards for Reporting of Diagnostic Accuracy Studies (STARD), 3 Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) 5 and Consolidated Standards of Reporting Trials (CONSORT). 6 An excellent checklist developed specifically for AI in medical imaging (CLAIM) was recently published. 7 An AI-specific extension to the STARD guidelines (STARD-AI) is also in development. 8 To allow readers to verify the reproducibility of the results, authors are encouraged to share the de-identified patient images and the code for AI modelling, training and data analysis in a publicly accessible repository.
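Beyond sharing code and data, deterministic behaviour of the shared code helps others regenerate the reported numbers. A minimal sketch, assuming a PyTorch-based pipeline (the framework choice is illustrative, not prescriptive):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix random seeds so that shared training code can reproduce results."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for run-to-run determinism on GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)  # call once at the start of training and analysis scripts
```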
Summary
Demonstrating clinical value with reliable generalizability is key to the successful clinical adoption of AI techniques in radiology.9,10 A good radiology AI study should follow established clinical research and reporting guidelines where appropriate, while ensuring technical reproducibility and transparency. This requires collaborative efforts among radiologists, other physicians and technical experts from the inception of a study and throughout all subsequent stages.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
