Abstract
Artificial intelligence (AI) software in radiology is becoming increasingly prevalent, and performance is improving rapidly, with new applications for given use cases being developed continuously, oftentimes with development and validation occurring in parallel. Several guidelines have provided reporting standards for publications of AI-based research in medicine and radiology. Yet, there is an unmet need for recommendations on the assessment of AI software before adoption and after commercialization. As the radiology AI ecosystem continues to grow and mature, a formalization of system assessment and evaluation is paramount to ensure patient safety, relevance and support to clinical workflows, and optimal allocation of limited AI development and validation resources before broader implementation into clinical practice. To fulfil these needs, we provide a glossary of AI software types, use cases, and roles within the clinical workflow; list healthcare needs, key performance indicators, and required information about software prior to assessment; and lay out examples of software performance metrics per software category. This conceptual framework is intended to streamline communication with the AI software industry and provide healthcare decision makers and radiologists with tools to assess the potential use of these software tools. The proposed software evaluation framework lays the foundation for a radiologist-led prospective validation network of radiology AI software.

Learning Points:
- The rapid expansion of AI applications in radiology requires standardization of AI software specification, classification, and evaluation.
- The Canadian Association of Radiologists' AI Tech & Apps Working Group proposes an AI specification document format and supports the implementation of a clinical expert evaluation process for radiology AI software.
Introduction
The increasing variety of radiology artificial intelligence (AI) software offers numerous opportunities to improve detection, diagnosis, and prediction of patient outcomes and to increase quality and operational efficiencies. Numerous AI software solutions approved by regulatory bodies have been announced in radiology, with more than 241 FDA-approved solutions as of the time of writing, the vast majority of which were approved since 2017.1 Independent market research estimates that the global market for medical imaging AI applications will reach $1.2 billion by 2024.2
Despite this growth, as radiology AI software has been deployed clinically, limitations in real-world effectiveness and weaknesses of radiology AI have been observed. Many of these weaknesses concern the inability of contemporary AI to adapt to novel inputs that differ from training datasets by disease subtype, imaging parameters, or imaging artifacts. Standardized methods for external validation and performance evaluation, and mechanisms for prospective post-market system reliability monitoring, remain elusive and inconsistent throughout the field. Improving the state of evaluation is particularly urgent: in a recent survey conducted by the ACR Data Science Institute, approximately 30% of radiologists reported currently using AI as a part of their practice, and 20% of practices not currently using AI planned to purchase AI tools in the next 1 to 5 years.3
To help guide this growth in AI development and deployment, several guidelines have provided reporting standards for publications of AI-based research in medicine and radiology.4-7 Yet, in a large systematic review of COVID-19 detection algorithms, 258 of 320 publications screened using the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) and Radiomics Quality Score (RQS) failed quality control, with insufficient documentation of model selection, image pre-processing, and algorithm training approach as the most common causes for rejection.8 A recent systematic review of AI algorithms that have undergone external validation showed that only 13% of 86 studies stated adherence to a reporting quality guideline.9 Additionally, although many AI algorithms report performance metrics, fewer solutions have undergone multi-centric or external validation or evaluation, comprising only 6% of 516 medical imaging algorithms reviewed in a 2019 systematic review of publications investigating AI software performance in medical imaging analysis.10 Multiple recently published external validation studies of AI algorithms showed at least some decrease in performance between the training and external datasets, with 24% of 86 studies showing a substantial decrease. This loss of generalizability is commonly attributed to differences between the training and validation/test populations and to the limited interpretability of neural networks.9,11
As the radiology AI ecosystem continues to grow and mature, there is a need for a formal approach to the evaluation of AI software to ensure patient safety and user acceptance, both pre- and post-deployment. This paper seeks to provide a comprehensive, formal framework for regulators, healthcare organizations, or clinical radiology practices to evaluate AI models before deployment. Complementary to the recent ECLAIR guidelines for the evaluation of commercial AI solutions,12 this paper takes the format of standardized evaluation forms that can be used as is, or further modified and built upon, for structured evaluation of candidate AI solutions. These forms propose a systematic approach to the evaluation of a radiology AI product: from accurately specifying and describing clinical radiology AI software, through standards for single- or multi-centre pre-deployment evaluation, to post-deployment monitoring.
Artificial Intelligence Software Overview
As per the Canadian Association of Radiologists (CAR) White Paper on AI in Radiology,13 radiology AI software can be categorized according to its role in the clinical workflow, type of application, or class of use case. This typology is expected to evolve with advances in the field.
Clinical Applications of AI in Radiology: Workflow Roles, Application Types and Use Cases
The existing clinical pathway includes image acquisition followed by the radiologist's interpretation and report creation, readily available to the clinician. Artificial intelligence software can be embedded in the clinical workflow for triage14 (e.g. screening tool), as an add-on (complementary tasks), or as a replacement of existing tasks (Figure 1). Alternatively, software may be defined according to the tasks performed, which for computer vision could include classification, detection, or segmentation, and for natural language processing could include sentiment classification or summarization. Finally, software may also be divided by use cases: separating normal vs not normal, computer-aided detection, radiomics, workflow optimization, quality assurance, grading and classification, natural language processing, computer-assisted reporting, and knowledge management. Some use cases may target specific clinical scenarios, such as computer-aided detection of pneumothorax. Table 1 provides an overview of AI software categorized according to role in the clinical workflow, type of application, and use case, along with short definitions and companion examples. It is critical that radiology providers spend time up front clearly deciding on the intended clinical use to guide effective evaluation.

Figure 1. Role of radiology AI software in the clinical radiology workflow. In the standard scenario, the clinician requests an imaging procedure, to which a protocol is assigned either by the radiologist or imaging technologist. The imaging data is acquired from the patient, processed, formatted to DICOM images, and read and interpreted by the radiologist, who then generates a radiological report. Clinicians can then use the information contained in the radiological report to guide their diagnosis and use existing tools to estimate the patient's health outcome and prognosis. AI software tools complement radiologists by fulfilling different roles incorporated into their clinical workflow (blue). AI software tools can use DICOM images as input and accomplish a variety of task types (green). Many AI software use cases (black) improve the existing scenario.

Table 1. Overview of role in clinical workflow, types of applications, and use cases of AI software.
Framework
Radiological Practice Health Needs Assessment
Appropriate evaluation of radiology AI software requires a clear description of the target clinical problem in terms of the unmet need and the technical specifications required to operate. Developers should define the healthcare needs addressed by their software (e.g. quality of radiological care, healthcare network integration, patient perspective, employee experience, efficiency and costs), the applicable clinical domain (e.g. mammography and chest radiograph) and task (e.g. breast density evaluation and pneumothorax detection), and describe the expected positive impact on these needs and patient outcomes. New initiatives that help standardize radiological AI use cases are emerging, most notably the ACR Data Science Institute's (DSI) comprehensive catalogue of AI use case documents, each providing a description of a clinical need potentially addressed by an AI algorithm and specifying technical details about the expected inputs needed by the AI algorithm and the outputs produced.15
Table 2. Healthcare needs and key performance indicators.
Radiology AI Software Specifications
Table 3. Required information about each software solution prior to assessment.
Independent Evaluation Performance Assessment
Once a radiology AI software solution is appropriately specified, evaluation would greatly benefit from expert review, wherein a radiologist reviewer evaluates a trial or demonstration of a solution hands-on to assess clinical relevance and integration with the expected clinical workflow, and has the opportunity to challenge the system with sample cases for face validation of software performance claims.18 This is especially relevant to AI-driven software, where there may be significant model generalization difficulties due to systematic differences between training datasets and an end-user's own clinical population.
This evaluation can be done either retrospectively, by running the AI software on previously reported cases, or prospectively, if the software is applied to cases while the radiologist is reading them. A unique advantage of the latter approach is that the effect of AI software on a reader's clinical workflow and user interface can be directly assessed, an important practical consideration rarely characterized in the scientific literature. However, prospective evaluation may not adequately assess performance for rare diseases.
Independent external test cases should be chosen to reflect the actual patient population on which the system is expected to be applied, with a disease prevalence replicating that encountered in a clinical radiologist's practice. Appropriate test set sizes can be estimated by sample proportion (p) calculation methods.15 The size of the test set will depend on many factors, particularly the number of output classes and the frequency of relevant disease presentations. Ground truth should then be established for each test case, ideally with multi-rater consensus to characterize inter-rater diagnostic variability, or with reference standards derived from appropriately obtained clinical or pathology references. There is ongoing work to determine statistical power calculation methods to estimate appropriate sample sizes in AI software validation.19,20 Some authors have suggested that 50-100 cases may be sufficient when testing lesion detection or segmentation, whereas at least 200 cases may be necessary for binary classification diagnostic software.15 Such minimums are not exhaustive but could be used as general guidelines and may vary by clinical application and the AI technique being evaluated.
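As an illustration of such a calculation, the following is a minimal sketch using the standard normal-approximation formula for a single proportion, n = z^2 p(1 - p)/d^2; the sample size methods in the cited works may differ in detail, and the function name and parameters here are illustrative only.

```python
from math import ceil
from scipy.stats import norm

def estimate_test_set_size(expected_prop: float, margin: float, confidence: float = 0.95) -> int:
    """Estimate how many relevant test cases are needed so that a performance
    proportion (e.g., an expected sensitivity of 0.90) is estimated to within
    +/- `margin`, using the normal approximation n = z^2 * p * (1 - p) / d^2."""
    z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value (1.96 at 95%)
    return ceil(z**2 * expected_prop * (1 - expected_prop) / margin**2)

# Example: estimating an expected sensitivity of 0.90 to within +/- 5% at 95% confidence
print(estimate_test_set_size(0.90, 0.05))  # -> 139 positive cases
```

For instance, estimating an expected sensitivity of 0.90 to within ±5% at 95% confidence yields 139 positive cases, of the same order of magnitude as the minimums suggested above.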
Table 4. Examples of performance metrics per software category.
Metrics should be intuitive to the end user or clinician and directly relevant to clinical use. Where possible, AI software evaluation should emphasize clear, unambiguous statements of how a solution improves patient outcomes and should precisely define quality-of-care outcomes that complement technical metrics in the performance assessment protocol, to maximize relevance to patient care and value for the healthcare system.
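As a concrete illustration, the sketch below computes a few metrics commonly reported for two software categories: sensitivity, specificity, and positive predictive value for binary classification, and the Dice similarity coefficient for segmentation. This is a minimal example; the function names are illustrative, and the appropriate metrics for a given product should follow its software category.

```python
import numpy as np

def binary_classification_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Sensitivity, specificity, and PPV for binary labels (1 = disease present)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp)}

def dice_coefficient(mask_true: np.ndarray, mask_pred: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    intersection = np.sum(mask_true & mask_pred)
    return 2 * intersection / (np.sum(mask_true) + np.sum(mask_pred))
```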
Post-Market Quality Assurance and Quality Control
Software output is subject to change over time, driven either by changes in the population (data drift), by changes over time in the underlying statistical relationship between the input and output data (concept drift), or by the nature of the AI software product (e.g. locked vs adaptive algorithms).21,22 Documentation of the software version along with predictions, and quality control of software output, are essential to detect and remediate any decrease in accuracy. Different strategies have been proposed for performance monitoring and detection of statistical changes in data generating processes, such as the ACR Assess-AI data registry.23
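As a simple illustration of one such monitoring strategy, the sketch below flags a shift in the distribution of model output scores between a reference window and a recent window using a two-sample Kolmogorov-Smirnov test. This is a minimal, assumption-laden example: a significant difference suggests possible data or concept drift worth investigating but does not by itself localize the cause, and the data shown are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

def output_drift_detected(reference_scores, recent_scores, alpha: float = 0.01) -> bool:
    """Compare recent model output scores against a reference window using a
    two-sample Kolmogorov-Smirnov test; True flags a distributional shift."""
    statistic, p_value = ks_2samp(reference_scores, recent_scores)
    return p_value < alpha

# Hypothetical example: this month's predicted probabilities vs a validation-era baseline.
rng = np.random.default_rng(0)
baseline = rng.beta(2, 5, size=2000)  # scores observed during validation
current = rng.beta(2, 3, size=500)    # shifted scores observed in production
print(output_drift_detected(baseline, current))  # -> True
```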
As a matter of both practical and technical capability, if a solution is not developed in-house, we expect that AI solution vendors themselves would ultimately bear primary responsibility for refreshing a software's models when concept drift occurs or decreased performance is noted. There should be a pre-specified strategy for continued quality control established between the developer and end-users, clearly outlined in any service contract or agreement. There should also be a clearly specified procedure for reporting and escalating unexpected adverse events.
Future Directions
AI software has the potential to address a large variety of use cases. Tremendous intellectual and financial capital has been invested in developing new tools and platforms to further develop these use cases. This has created a steady stream of new products intended, in time, for full-scale clinical use. Despite this large array of imaging software, standardized methods for external validation and performance evaluation, as well as mechanisms for prospective post-market system reliability monitoring, are still nascent.
Because of the great variety in applications and tasks, it is inherently difficult to standardize these evaluation processes. Nevertheless, basic benchmarking protocols can be adopted to validate products before clinical implementation and purchase: to benchmark real-world performance against developer-provided specifications, to assess for potential bias in algorithm training, risk of systematic errors, and unanticipated pitfalls, and to ensure continuous quality control of software performance over extended periods of time.
Artificial intelligence software metric development is analogous to benchmarking in the computing hardware industry, where a widening variety of hardware products has created a need for application-specific performance metrics. Because it is logistically challenging, and in many hospital environments impractical and cost-prohibitive, for individual radiology departments to validate a variety of new AI software, we see a need for a national validation network providing test cases for multiple clinical use case scenarios, labelled by radiology experts from several Canadian institutions representing a broad range of users from academic and community settings. This would help ensure the external validity and generalizability of performance claims by AI software manufacturers and provide patients, physicians, and regulators with reassurance of system reliability, while also providing an efficient and expedient way for a vendor to demonstrate solution quality to stakeholders.
It is important to acknowledge that due to the dynamic nature of healthcare, performance at the time of deployment does not guarantee continued performance. As real-world deployments expand in reach, post-market surveillance of AI software will prove increasingly critical, akin to the development of pharmacovigilance in the pharmaceutical industry, potentially indicating a future need for standardized radiology AI adverse event reporting. Again, the infrastructure and expertise provided by a national validation network could provide updated testing datasets for dynamic post-market surveillance.
Conclusion
In summary, we have proposed our vision of how radiology AI validation processes should be performed, inspired by previous work in the field and similar methods in other industries. The responsibility for validation is complex and multi-faceted and will have to draw heavily from the expertise of academic and clinical end-users, industry, and regulatory and legal entities to address the variety of challenges ahead. A coordinated approach that balances the priorities of all stakeholders while leveraging each group's individual strengths will best serve the public's and patients' interests, further grow the burgeoning field of radiology AI, and sustain user trust in these technologies for medical care provision. Delivering on such a robust ecosystem of development and validation will require the right technical infrastructure and strategic investments, but, more than this, will require an enthusiastic network of stakeholders and investigators to realize the full potential of radiology AI software.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a Junior 2 and Senior research scholarship from the Fonds de recherche du Québec — Santé (Awards #26993 and #298509) to An Tang.
