Radiomics and artificial intelligence for risk stratification of pulmonary nodules: Ready for primetime?

Abstract

Pulmonary nodules are ubiquitously found on computed tomography (CT) imaging, either incidentally or via lung cancer screening, and require careful diagnostic evaluation and management to both diagnose malignancy when present and avoid unnecessary biopsy of benign lesions. To engage in this complex decision-making, clinicians must first risk stratify pulmonary nodules to determine the best course of action. Recent developments in imaging technology, computer processing power, and artificial intelligence algorithms have yielded radiomics-based computer-aided diagnosis tools that use CT imaging data, including features invisible to the naked human eye, to predict pulmonary nodule malignancy risk and are designed to be used as a supplement to routine clinical risk assessment. These tools vary widely in their algorithm construction, internal and external validation populations, intended-use populations, and commercial availability. While several clinical validation studies have been published, robust clinical utility and clinical effectiveness data are not yet available. However, there is reason for optimism as ongoing and future studies aim to target this knowledge gap, in the hopes of improving the diagnostic process for patients with pulmonary nodules.

Keywords

Radiomics; artificial intelligence; lung cancer; risk stratification; pulmonary nodule

1. Introduction

The past four decades have seen a dramatic increase in thoracic computed tomography (CT) imaging, resulting in approximately 1.6 million adults in the U.S. diagnosed with incidentally detected pulmonary nodules (PNs) annually [1,2]. Moreover, based on the 2013 U.S. Preventive Services Task Force (USPSTF) recommendations, an estimated 8.0 million adults in the U.S. are eligible for lung cancer screening with low-dose CT [3], and this population is anticipated to expand to 15.1 million with implementation of the 2021 USPSTF recommendations [4]. When PNs are detected, either incidentally or via screening, lung cancer is the primary concern, as it is the deadliest malignancy in the U.S. and worldwide [5]. A definitive diagnosis of lung cancer requires an invasive lung biopsy, which is associated with procedural costs and potentially significant risks, including respiratory failure, pneumothorax, myocardial infarction, and even death [6,7,8,9,10]. Therefore, malignancy risk stratification is the fundamental first step in guiding PN management decisions among clinicians, who seek to diagnose cancer in a timely manner while avoiding unnecessary procedures for those with benign PNs [11]. Among suspicious PNs > 8 mm in maximal diameter, clinical guidelines for both incidentally detected PNs (American College of Chest Physicians [12], Fleischner Society [13]) and screen-detected PNs (American College of Radiology Lung Imaging Reporting and Data System [Lung-RADS] [14]) recommend surgical resection for high risk PNs (> 65% risk of malignancy) and conservative management with non-invasive serial CT imaging surveillance for very low risk PNs (< 5% risk of malignancy). However, clinicians face a diagnostic dilemma among intermediate risk (5%–65% malignancy risk) PNs, as they must decide whether to pursue a lung biopsy or surveil with serial imaging. This crucial decision has important implications for patients. A malignant PN inappropriately managed with imaging surveillance delays a cancer diagnosis and may even deny a patient the opportunity for curative treatment. On the other hand, a patient with a benign PN recommended to undergo lung biopsy has been unnecessarily exposed to the risks and costs associated with this invasive procedure.
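
The guideline thresholds described above amount to a simple decision rule, which can be sketched in Python for illustration. This is a minimal sketch, not a clinical tool: the function name and the category labels are hypothetical, the treatment of values exactly equal to 5% or 65% as intermediate risk is an assumption, and the rule applies only to the PNs > 8 mm discussed above.

```python
def risk_category(p_malignancy: float) -> str:
    """Map an estimated malignancy probability (0 to 1) to a guideline risk tier.

    Hypothetical helper for illustration; thresholds follow the < 5% and
    > 65% cut-offs cited in the text. Boundary handling is an assumption.
    """
    if not 0.0 <= p_malignancy <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_malignancy < 0.05:
        return "very low risk: serial CT surveillance"
    if p_malignancy > 0.65:
        return "high risk: surgical resection"
    # Everything in between is the zone of diagnostic uncertainty.
    return "intermediate risk: biopsy versus surveillance"


# Example: three hypothetical nodules with different estimated risks.
examples = {p: risk_category(p) for p in (0.02, 0.30, 0.80)}
```

The intermediate tier spans most of the probability range, which is why better risk estimates matter most for nodules that land there.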

There currently exists a significant misalignment between malignancy risk stratification processes and clinical management decisions [9,15,16,17,18]. As many as 45% of individuals undergoing a lung biopsy for evaluation of a PN are ultimately found to have a benign diagnosis [15,16,17,18,19,20], meaning that a considerable proportion of patients are unnecessarily exposed to the potential complications and harms of lung biopsy procedures. While conventional regression-based risk prediction models incorporating a variety of clinical and PN characteristics have been in existence since the late 1990s (e.g., the Mayo Clinic and Brock models) [21,22,23,24], they do not reliably outperform routine clinician assessment of malignancy risk [15,25,26]. Moreover, only 18% of thoracic surgeons and 31% of pulmonologists regularly use any clinical risk prediction model [27], and clinicians do not consistently document a quantitative estimate of cancer risk [28]. Thus, a core focus of the thoracic oncology scientific and clinical community is to improve PN malignancy risk stratification to better guide subsequent management decisions [29,30]. A wide range of biomarkers has been developed recently [31], including blood-based [32,33,34,35,36], airway-based [19,37,38], breath-based [39,40,41], and imaging-based tests and devices [42,43]. This article will focus specifically on recent efforts to use radiomics and artificial intelligence (AI) technology for PN risk stratification and the practical hurdles that exist for clinical implementation.

2. Radiomics and artificial intelligence

Table 1
Selected studies on radiomics-based risk stratification of pulmonary nodules.

Publication	Study design and objective	Populations or datasets	Model and analytical details	Key results
Way et al, 2006 [51]	Analytical validation study to develop a CAD model and assess performance of image segmentation.	Training data: 96 PNs (4–60 mm; 46% malignant) from 58 pts at the University of Michigan. Validation data for segmentation: experienced radiologists’ segmentation of 23 PNs from LIDC.	3D active contour segmentation with manual feature extraction, selection, and classification. CAD model trained and tested using leave-one-case-out resampling scheme.	AUC = 0.83. Model-segmented PN volumes greater than those outlined by LIDC radiologists.
Way et al, 2009 [52]	Analytical validation study to refine above CAD model.	Training data: 256 PNs (3–38 mm; 48% malignant) from 152 pts at the University of Michigan.	Novel PN surface features characterizing smoothness and shape irregularity added to CAD model (described above). Demographics (age, gender) and LDA classifier also assessed.	AUC = 0.86 with addition of novel PN surface features. No significant difference in CAD model performance when demographic features or LDA classifier included.
Way et al, 2010 [53]	Retrospective multi-reader, multi-case study to assess effect of above CAD model on radiologists’ performance discriminating between malignant and benign PNs.	Reader study: 6 fellowship-trained thoracic radiologists evaluated 256 PNs (3–38 mm; 48% malignant) from 152 pts at the University of Michigan.	CAD model (described above). Model output = relative malignancy rating on a scale of 1 to 10, representing a 10-bin histogram of scores with fitted Gaussian distributions for malignant and benign PNs.	CAD AUC = 0.86. Average radiologists’ AUC increased from 0.83 to 0.85 with CAD.
Huang et al, 2018 [54]	Analytical validation study using matched case-control data to derive and evaluate a novel CAD model.	Training data: 140 PNs (4–20 mm; 50% malignant) from 140 pts in the NLST. Validation data: 46 PNs (4–20 mm; 43% malignant) from 46 pts in the NLST. All pts underwent lung biopsy. Malignant and benign PNs were matched based on demographic, clinical, and PN variables.	Image processing and feature extraction performed by expert radiologists. Random forest machine learning algorithm used to select variables and develop CAD model.	Validation cohort: CAD AUC = 0.92. CAD: Sn = 0.95, Sp = 0.88, PPV = 0.86, NPV = 0.96. Three radiologists’ combined reading: Sn = 0.70, Sp = 0.69, PPV = 0.64, NPV = 0.75.
Peikert et al, 2018 [55]	Analytical validation study to develop and internally validate a radiomics-based multivariable model (BRODERS model).	Training data: 726 PNs (7–30 mm; 56% malignant) from 726 pts in the NLST.	PNs segmented manually using ANALYZE software (Mayo Clinic Biomedical Imaging Resource) and radiomic features extracted. LASSO multivariable analysis used to develop final model.	Optimism-corrected AUC for final 8-variable BRODERS model = 0.94.
Maldonado et al, 2020 [56]	Analytical validation study to externally validate BRODERS model.	External validation data: 170 PNs (7–30 mm; 54% malignant) from 170 consecutive pts with incidentally detected PNs at Vanderbilt University.	BRODERS model (described above) compared to Brock model.	BRODERS AUC = 0.90; Brock AUC = 0.87.
Balagurunathan et al, 2019 [57]	Analytical validation study using a 2:1 nested case-control study design to develop a novel radiomics model.	Training data: 244 PNs ( > 4 mm; 32% malignant) from 244 pts in the NLST. Validation data: 235 PNs ( > 4 mm; 37% malignant) from 235 pts in the NLST. Malignant and benign PNs were matched based on demographic and clinical variables.	PNs 3D segmented by radiologists via semi-automated algorithm, 219 quantitative features extracted, and an optimal linear classifier model was used.	In both training (0.85 vs 0.80) and validation (0.88 vs 0.86) datasets, AUC was higher for best texture feature set compared to size and shape feature set. Addition of clinical data did not significantly improve AUC.
Ardila et al, 2019 [58]	Analytical validation study and retrospective reader study to develop and externally validate a novel radiomics-based AI CAD model.	Training data: 29,541 PNs (4% malignant) from NLST. Tuning data: ∼6,343 PNs (5% malignant) from NLST. Validation data: 6,716 PNs (4% malignant) from NLST. Reader study: 6 board-certified radiologists evaluated 507 CTs with PNs (16% malignant; subset of validation data).	CAD approach developed using the TensorFlow platform (Google Inc.) and employed a 3D CNN model that performs end-to-end analysis of whole-CT volumes. Model output = LUMAS, roughly meant to correspond to Lung-RADS 3, 4A, and 4B/4X.	Validation cohort: AI CAD AUC = 0.94. AI CAD outperformed radiologists within each LUMAS bucket in reader study when either 1 CT scan was used per pt or when multiple scans were available per pt.
Venkadesh et al, 2021 [59]	Analytical validation study and retrospective reader study to develop and externally validate a novel radiomics-based AI CAD model.	Training data: 16,077 PNs ( > 4 mm; 8% malignant) from NLST. Validation data: 883 PNs in full cohort (7% malignant); 175 non-size-matched PNs in subset A (34% malignant); 177 size-matched PNs in subset B (33% malignant) from the DLCST. Reader study: 11 clinicians (9 radiologists, 2 pulmonologists) evaluated PNs in cancer-enriched cohorts.	2D CNN with ResNet50 backbone and 3D CNN based on Inception-v1 architecture used to develop AI CAD algorithm. Internally validated using 10-fold cross validation. AI CAD model compared to Brock model and clinicians. Model output = risk score from 0 to 1.	Full validation cohort: AI CAD AUC = 0.93; Brock AUC = 0.90. Subset A cohort: AI CAD AUC = 0.96; average clinician AUC = 0.90; Brock AUC = 0.94. Subset B cohort: AI CAD AUC = 0.86; average clinician AUC = 0.82; Brock AUC = 0.75.
Massion et al, 2020 [60]	Analytical validation study to develop and externally validate a novel radiomics-based AI CAD model (Optellum LCP-CNN).	Training data: > 130,000 PNs (∼50% malignant) from NLST. Internal validation data: 15,693 PNs ( > 6 mm; 6% malignant) from 6,547 pts in the NLST. External validation data: 116 PNs (5–30 mm; 55% malignant) from 116 pts with incidentally detected PNs at Vanderbilt University; 463 PNs (5–19 mm; 14% malignant) from 427 pts with incidentally detected PNs at Oxford University.	2.5D CNN with DenseNet architecture with 5 dense blocks and PyTorch framework for machine learning. Internally validated using 8-fold cross validation. Model output = score between 0% and 100% to represent likelihood of malignancy. Compared to Brock and Mayo Clinic models.	Internal validation cohort: LCP-CNN AUC = 0.92; Brock AUC = 0.86; Mayo Clinic AUC = 0.85. Vanderbilt University external validation cohort: LCP-CNN AUC = 0.84; Mayo Clinic AUC = 0.78. Oxford University external validation cohort: LCP-CNN AUC = 0.92; Mayo Clinic AUC = 0.82.
Baldwin et al, 2020 [61]	Analytical validation study to externally validate the Optellum LCP-CNN model.	External validation data: 1,397 PNs (5–15 mm; 17% malignant) from 1,187 U.K. pts in IDEAL study.	Optellum LCP-CNN model (described above) compared to Brock model.	LCP-CNN AUC = 0.87; Brock AUC = 0.83.
Kim et al, 2022 [62]	Retrospective multi-reader, multi-case study to assess the effect of Optellum AI CAD model on clinicians’ performance discriminating between malignant and benign PNs.	Reader study: 12 clinicians (6 radiologists, 6 pulmonologists) evaluated 300 CTs with PNs (5–30 mm; 50% malignant) from 300 pts from 7 sources in the U.S., U.K., and NLST.	Optellum LCP-CNN model (described above). Model output = LCP score 1 to 10, categorizing malignancy risk on a decile scale for a population with 30% cancer prevalence.	Average clinicians’ AUC increased from 0.82 to 0.89 with AI CAD. Interobserver agreement (Fleiss Kappa) improved with AI CAD for < 5% risk (0.71 vs 0.50) and > 65% risk (0.71 vs 0.54) categories and PN management decisions (0.52 vs 0.44).
Kim et al, 2023 [63]	Secondary analysis of above retrospective multi-reader, multi-case study to assess the effect of Optellum AI CAD model on clinicians’ management of PNs.	Reader study: described above.	LCP score (described above). Appropriate PN management defined as surgery, biopsy, or immediate imaging for malignant PNs and imaging follow-up for benign PNs.	Average clinicians’ risk estimate without vs with AI CAD: 60% vs 69% (malignant PNs); 23% vs 21% (benign PNs). Average clinicians’ appropriate PN management without vs with AI CAD: 80% vs 84% (overall); 72% vs 81% (malignant PNs); 87% vs 89% (benign PNs).

Abbreviations: CAD = computer-aided diagnosis; PN = pulmonary nodule; pts = patients; LIDC = Lung Image Database Consortium; LDA = linear discriminant analysis; NLST = National Lung Screening Trial; Sn = sensitivity; Sp = specificity; PPV = positive predictive value; NPV = negative predictive value; BRODERS = Benign Versus Aggressive Nodule Evaluation Using Radiomics Stratification; LASSO = least absolute shrinkage and selection operator; AI = artificial intelligence; CNN = convolutional neural network; CT = computed tomography; LUMAS = lung malignancy score; Lung-RADS = Lung Imaging Reporting and Data System; DLCST = Danish Lung Cancer Screening Trial; LCP-CNN = Lung Cancer Prediction Convolutional Neural Network; U.K. = United Kingdom; IDEAL = Artificial Intelligence and Big Data for Early Lung Cancer Diagnosis; U.S. = United States.

Radiomics-based computer-aided diagnosis (CAD) tools demonstrate promise for noninvasive PN risk stratification using solely CT imaging data. CAD describes the automation of image review to assist clinicians with making diagnoses [44], and the past two decades have seen CAD paired with radiomics, which uses advanced mathematical analysis of imaging data to aid interpretation [45,46]. More recently, the evolution of AI has allowed deep learning using neural networks to enhance the development of radiomics-based CAD tools [47,48]. The potential benefit of such tools lies in their ability to analyze additional data invisible to the human eye (including shape, spatial complexity, textures, and wavelet transformations) and provide information to clinicians beyond PN size, spiculation, and density [49]. Additionally, in contrast to traditional clinical risk prediction models that require clinicians to enter discrete variables into a model to calculate a probability of malignancy, radiomics-based CAD tools automate this process, which theoretically could lower the threshold for clinical uptake. Numerous studies to date have been published describing the development and validation of radiomics-based biomarkers for PN risk stratification [50,51,52,53,54,55,56,57,58,59,60,61,62,63]. An exhaustive systematic review of all radiomics-based CAD tools is outside the scope of this focused narrative review, which will cover select notable examples to date (Table 1).

Initial efforts to incorporate radiomics-based quantitative imaging data into models to distinguish between malignant and benign PNs used conventional machine learning approaches, which rely on explicit parameters based on expert knowledge and classic multivariable model development techniques. In 2006, Way and colleagues first described a CAD system that was trained on clinical imaging data, evaluated using data from the Lung Image Database Consortium, and differentiated malignant from benign PNs using morphological and texture characteristics via a three-dimensional active contour method, achieving an area under the receiver operating characteristic curve (AUC) of 0.83 [51]. This system was updated a few years later to include additional nodule characteristics, including surface smoothness and shape irregularity, achieving an AUC of 0.86 [52]. Next, this group performed a multi-reader, multi-case study using retrospective PN CT data from the University of Michigan to evaluate the effect of this CAD tool on radiologists’ performance discriminating between malignant and benign PNs and found that on average radiologists’ AUC increased from 0.83 to 0.85 (P < 0.01) [53]. In 2018, Huang and colleagues published the results of their CAD algorithm, which analyzed adjacent lung tissues in addition to PN texture features and was derived from random forest machine learning using National Lung Screening Trial (NLST) data [54]. They performed a matched case-control study and reported a CAD AUC of 0.92, sensitivity of 0.95, specificity of 0.88, positive predictive value (PPV) of 0.86, and a negative predictive value (NPV) of 0.96, which outperformed three radiologists’ collective evaluations (sensitivity: 0.70, specificity: 0.69, PPV: 0.64, NPV: 0.75). In 2018, Peikert and colleagues also used NLST data to develop a distinct radiomics-based model via manual software segmentation, incorporation of both PN and adjacent lung tissue characteristics, and the least absolute shrinkage and selection operator (LASSO) method for multivariable model development, and reported an associated AUC of 0.94 on internal validation [55]. Subsequent external validation of this model using data from the Vanderbilt University Lung Nodule Registry yielded an AUC of 0.90 [56]. In 2019, Balagurunathan and colleagues published the results of their radiomics-based models, also trained on NLST data, reporting an AUC as high as 0.85 and noting the superior contribution of texture metrics in comparison to traditional size metrics [57]. The authors also found that discrimination was not augmented when clinical factors were incorporated into their radiomics-based models.
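
Unlike sensitivity and specificity, the PPV and NPV quoted for a model depend on the cancer prevalence of the cohort in which they were measured. The Python sketch below applies Bayes' rule to illustrate this. The function names are hypothetical, and the inputs are the sensitivity (0.95) and specificity (0.88) reported by Huang and colleagues together with the 43% malignancy prevalence of their validation cohort listed in Table 1; the 5% prevalence scenario is an assumption chosen only to show the effect of a low-prevalence population, and the calculation is not a re-analysis of the study data.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Reported operating point, evaluated at the validation-cohort prevalence.
ppv_enriched = ppv(0.95, 0.88, 0.43)
npv_enriched = npv(0.95, 0.88, 0.43)

# Same operating point in a hypothetical low-prevalence (5%) population.
ppv_low_prev = ppv(0.95, 0.88, 0.05)
npv_low_prev = npv(0.95, 0.88, 0.05)
```

At 43% prevalence the computed values round to the published PPV of 0.86 and NPV of 0.96, whereas at 5% prevalence the same sensitivity and specificity give a PPV of roughly 0.29. This is one reason that predictive values from cancer-enriched case-control cohorts should not be carried over directly to screening populations.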

An alternative method of harnessing and analyzing radiomics-based quantitative imaging data from CT scans to develop a predictive model requires the use of AI [48,49]. Advancements in AI have ushered in the emergence of deep learning algorithms that do not rely on explicit feature parameter inputs but instead are trained via direct interaction with the data, theoretically enhancing problem-solving abilities. Convolutional neural networks (CNNs) are currently the most commonly used deep learning architecture in medical imaging. Generally speaking, these AI deep learning algorithms simultaneously evaluate imaging data, extract and aggregate features, and integrate this information to achieve high-level reasoning and ultimately make a prediction regarding PN malignancy risk. Radiomics-based tools that use AI technology fundamentally differ from those that do not, as these algorithms “learn” independently, can potentially identify previously unknown imaging features, and are capable of being iteratively updated by the introduction of new training data. A small but growing number of radiomics-based AI tools have been developed to date. In 2019, Ardila and colleagues described the development of a CNN model designed by Google that was trained on and validated in NLST imaging data. Notably, this model used full-volume imaging data (i.e., the entire axial series of images) to classify malignancy risk. They reported an AUC of 0.94, which outperformed six radiologists [58]. The authors proposed a four-tier lung malignancy scoring (LUMAS) system, loosely meant to correspond with estimated malignancy probabilities associated with American College of Radiology Lung-RADS categories, but emphasized that optimization of this scoring system for use in clinical practice had yet to be performed. Separately, in 2021 Venkadesh and colleagues published the results of their CNN-based algorithm that was trained on NLST data and externally validated using data from the Danish Lung Cancer Screening Trial. Their deep learning algorithm outperformed the Brock (PanCan) traditional clinical risk prediction model (AUC: 0.93 vs 0.90; P < 0.05) and performed similarly to thoracic radiologists (AUC: 0.96 vs 0.90; P = 0.11) [59]. The authors made their algorithm freely accessible to the public for a time and concluded that their AI-based algorithm could serve as an adjunct for radiologists evaluating screening CT scans in the future.

To date, the only radiomics-based AI algorithm to gain both U.S. Food and Drug Administration 510(k) clearance (2021) and European Union CE marking (2022) is the Lung Cancer Prediction Convolutional Neural Network (LCP-CNN) developed by Optellum. This AI CAD tool was trained on and internally validated in NLST data of screen-detected PNs (AUC: 0.92) and was externally validated using imaging data of incidentally-detected PNs from Vanderbilt University Medical Center (AUC: 0.84), Oxford University Hospital National Health Service (NHS) Foundation Trust (AUC: 0.92), Leeds Teaching Hospital NHS Trust (AUC: 0.88), and Nottingham University Hospitals NHS Trust (AUC: 0.89) [60,61]. Additionally, the LCP-CNN had superior discrimination compared to both the Mayo Clinic and the Brock (PanCan) clinical models. A commercially available version of the LCP-CNN generates a radiomics biomarker Lung Cancer Prediction (LCP) score that represents an estimate of predicted risk of malignancy on a decile scale. In 2022, a retrospective multi-reader, multi-case study was performed to evaluate the effect of the LCP-CNN on clinicians’ malignancy risk assessments [62]. Twelve clinicians (six pulmonologists and six radiologists) each evaluated 300 chest CT cases of PNs and were asked to provide an estimate of PN malignancy risk (0%–100%) and a management recommendation for each case before and after using the AI tool. When using the tool, clinicians’ average discrimination improved by 7 percentage points (AUC: 0.89 vs 0.82; P < 0.001) and sensitivity and specificity at both the 5% and 65% malignancy risk thresholds increased as well. Interobserver agreement for both clinically relevant malignancy risk categories ( < 5%, 5%–30%, 31%–65%, > 65%) and management recommendations (no action, CT surveillance, diagnostic procedure) also increased with use of the AI tool. 
Moreover, the average proportion of appropriately managed PN cases (defined as immediate imaging or biopsy for malignant PNs and no action or imaging surveillance for benign PNs) increased from 80% to 84% with use of the LCP-CNN in this retrospective study [63].
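
To make the reader-study metrics above concrete, the following minimal Python sketch computes sensitivity and specificity at the 5% and 65% malignancy risk thresholds and assigns each estimate to one of the four risk categories. The risk estimates and outcomes are hypothetical values invented for illustration, not data from the cited studies, and the function names and the handling of values exactly at a threshold (risk at or above the threshold counts as test-positive) are assumptions of this sketch.

```python
# Hypothetical clinician risk estimates (probability of malignancy) and
# confirmed outcomes (1 = malignant, 0 = benign). Illustrative values only.
risks = [0.02, 0.04, 0.10, 0.20, 0.40, 0.70, 0.80, 0.03, 0.50, 0.90]
malignant = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]


def sens_spec(risks, malignant, threshold):
    """Sensitivity and specificity when risk >= threshold is test-positive.

    The >= convention at the boundary is an assumption of this sketch.
    """
    pairs = list(zip(risks, malignant))
    tp = sum(1 for r, m in pairs if m == 1 and r >= threshold)
    fn = sum(1 for r, m in pairs if m == 1 and r < threshold)
    tn = sum(1 for r, m in pairs if m == 0 and r < threshold)
    fp = sum(1 for r, m in pairs if m == 0 and r >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def risk_category(risk):
    """Map a risk estimate onto the four categories used in the text."""
    if risk < 0.05:
        return "<5%"
    if risk <= 0.30:
        return "5%-30%"
    if risk <= 0.65:
        return "31%-65%"
    return ">65%"


for threshold in (0.05, 0.65):
    sens, spec = sens_spec(risks, malignant, threshold)
    print(f"threshold {threshold:.0%}: sensitivity {sens:.2f}, specificity {spec:.2f}")

print([risk_category(r) for r in risks])
```

Running the same calculation on estimates made without and then with the AI tool would show whether sensitivity and specificity move at each threshold, which is the comparison the reader study reports.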

3. Barriers to implementation

Despite the plethora of novel radiomics and AI-based CAD tools that have been developed and the well-known need for improved PN risk stratification, widespread adoption of this technology has not yet occurred, even though such tools are commercially available. The reasons are likely multifaceted. First, while all of the aforementioned studies reported metrics for model performance (i.e., AUC, sensitivity, and specificity), prospective clinical utility studies using real-world data have not yet been performed. It is critical to note that a high level of discrimination (i.e., AUC) does not guarantee that a model will perform well in clinical settings whose patient populations differ from those in which the model was originally trained and validated [64]. Specifically, differences in demographic characteristics and cancer prevalence could limit the generalizability of model performance to distinct populations. In fact, the more relevant metric for model performance and applicability to specific patient care scenarios is model calibration, the agreement between predicted and observed risk [65]. Currently, there is no standardized approach for systematically evaluating AI in healthcare or for assessing the clinical utility of new technologies. However, several approaches to rigorously evaluating novel AI technologies have been proposed. For example, Park and colleagues have proposed an approach akin to the classic framework for new drug development, advancing scientific inquiry from phase 1 safety-focused studies to eventual phase 4 clinical effectiveness studies [66]. Khera and colleagues have suggested a holistic approach to AI evaluation and implementation with an emphasis on health quality, equity, generalizability, and medical education in addition to evaluating patient-centered outcomes [67]. Of course, the optimal method for evaluating any novel intervention is to perform a prospective randomized controlled trial assessing patient-centered outcomes. To date, no such studies have been published. 
Second, much has been made of the unique challenges AI technology poses in the medical setting. As AI tools use an automated approach to independent learning, concerns have been raised regarding the “black-box” nature of which factors drive AI decision-making and risk estimation [68]. This opacity about what is “under the hood” of AI algorithms has resulted in mistrust among clinicians [69]. In fact, a recent survey of clinicians highlighted limited acceptance and trust of AI technology as a significant perceived barrier to implementation [70]. This survey also revealed clinicians’ concerns about safety, inconsistent technical performance, absence of standardized guidelines, lack of technical knowledge, and loss of autonomy. Radiologists have additionally raised concerns regarding medical-legal liability, responsibility for the results of AI-generated recommendations, and the nature of AI integration into routine clinical workflow [71]. Third, as radiomics-based tools require high resolution CT images to be available and large imaging data files to be uploaded into CAD software platforms, practical barriers to clinical implementation include lack of standardization of CT image acquisition across different healthcare institutions and disruption of clinical workflow in already busy pulmonary nodule clinics. Finally, the medical community’s overall wariness of AI technology is understandable given previous examples of unintended consequences of CAD on medical decision-making [72,73]. For example, a 2003 study assessing the effect of CAD on electrocardiogram (ECG) interpretation by inexperienced resident physicians demonstrated that when incorrect CAD interpretations were provided, residents were more likely to misinterpret an ECG compared to when CAD was not used [74]. In another study, use of CAD was associated with a reduction in breast cancer discrimination on mammography among high-performing expert clinicians [75]. 
Subsequent studies reported either no significant impact of CAD on radiologists’ decision-making [76] or a decrease in clinician discrimination when using CAD [77]. These examples underscore the need for high quality studies assessing the effect of CAD tools on both clinical decision-making and patient outcomes.
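
The distinction drawn in this section between discrimination and calibration can be illustrated with a small Python sketch. It uses hypothetical predictions and outcomes, and a square-root transformation stands in, purely as an assumption, for the kind of systematic risk overestimation a model might show in a population unlike its training cohort. Because the transformation preserves the ranking of cases, the AUC is unchanged, while the gap between mean predicted risk and observed malignancy rate (calibration-in-the-large) widens.

```python
import math

# Hypothetical predicted risks and outcomes (1 = malignant, 0 = benign).
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds = [0.05, 0.10, 0.20, 0.20, 0.30, 0.50, 0.40, 0.60, 0.80, 0.85]


def auc(scores, labels):
    """AUC as the probability that a malignant case outranks a benign one.

    Ties between a malignant and a benign case count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(scores, labels):
    """Observed event rate minus mean predicted risk (0 is ideal)."""
    return sum(labels) / len(labels) - sum(scores) / len(scores)


# Assumed stand-in for miscalibration in a new population: a monotone
# transformation that inflates every predicted risk but keeps the ordering.
inflated = [math.sqrt(p) for p in preds]

print("AUC original:", auc(preds, labels))
print("AUC inflated:", auc(inflated, labels))
print("calibration gap original:", calibration_in_the_large(preds, labels))
print("calibration gap inflated:", calibration_in_the_large(inflated, labels))
```

In this toy setting a report of AUC alone would not reveal that the inflated predictions push many more nodules above a fixed biopsy threshold, which is why calibration in the intended-use population matters for clinical application.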

4. Future directions

Before widespread implementation of radiomics-based AI tools for PN risk stratification can be recommended, well-executed studies must be performed to assess the effect of such tools on medical decision-making and patient-centered outcomes and to determine how best to implement these devices into routine clinical practice. Importantly, AI algorithms have been developed and trained to discriminate between malignant and benign PNs, but they are not capable of understanding the nuances of patient preferences and clinician assessments of the associated risks of various management approaches [68,69]. For a given indeterminate PN, clinicians have inconsistent approaches to PN risk assessment and variable malignancy probability thresholds above which they would recommend pursuing a lung biopsy [29,78,79,80,81]. For example, a more conservative clinician might not recommend a biopsy unless a PN diameter is greater than 10 mm or unless the estimated malignancy risk is greater than 20% or 30%, whereas a more aggressive clinician might recommend a biopsy for any PN larger than 8 mm or with a risk greater than 10%. Apart from clinicians’ variable perspectives on PN malignancy risk and management, individual patients can have widely disparate opinions on acceptable risk and anxiety related to the lack of certainty associated with a PN detected on a CT scan [82,83,84]. For example, a patient who values not missing a cancer diagnosis and places high importance on timeliness of care might choose to pursue a biopsy upfront for a given indeterminate PN even at the lower end of malignancy risk. On the other hand, a patient with multiple comorbidities who might be more anxious about the potential risks and complications of a lung biopsy procedure might choose to avoid a biopsy initially, opting for surveillance with serial CT scans instead. 
Thus, radiomics-based AI tools are not designed to replace clinicians’ decision-making but, at best, could assist clinicians and patients in jointly making the challenging decision of whether or not to biopsy a given PN [85]. As such, several decision analytic modeling approaches have been developed to estimate the clinical utility of diagnostic tests across a range of threshold probabilities for biopsy. The oldest and most widely used is decision curve analysis, developed by Vickers and colleagues in 2006 [86,87,88,89]. This analytic technique plots net clinical benefit (a weighted difference between true positives and false positives for malignancy) on the Y-axis against threshold probability (the malignancy probability above which biopsy would be recommended) on the X-axis and has been used in multiple areas of research [90,91,92]. Notable alternative approaches include the relative utility curve developed by Baker and colleagues [93,94,95,96] and the intervention probability curve from Kammer and colleagues [97]. A necessary first step to understanding the potential effect of novel AI tools on PN management decisions will be the rigorous application of such clinical utility models using real-world patient data.
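
As a rough illustration of decision curve analysis as described above, the Python sketch below computes net benefit at several threshold probabilities using the standard formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability, and compares a risk model with the default strategy of biopsying every nodule (the strategy of biopsying none has a net benefit of zero). The predicted risks and outcomes are hypothetical values chosen for illustration, and the function names are assumptions of this sketch rather than part of any published software.

```python
# Hypothetical predicted malignancy risks and confirmed outcomes
# (1 = malignant, 0 = benign). Illustrative values only.
risks = [0.02, 0.04, 0.10, 0.20, 0.40, 0.70, 0.80, 0.03, 0.50, 0.90]
malignant = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]


def net_benefit(risks, malignant, pt):
    """Net benefit of biopsying every nodule whose predicted risk is >= pt."""
    n = len(risks)
    tp = sum(1 for r, m in zip(risks, malignant) if m == 1 and r >= pt)
    fp = sum(1 for r, m in zip(risks, malignant) if m == 0 and r >= pt)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_biopsy_all(malignant, pt):
    """Net benefit of biopsying all nodules regardless of predicted risk."""
    prevalence = sum(malignant) / len(malignant)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# One point of the decision curve per threshold probability.
for pt in (0.05, 0.10, 0.20, 0.30, 0.65):
    model = net_benefit(risks, malignant, pt)
    biopsy_all = net_benefit_biopsy_all(malignant, pt)
    print(f"pt={pt:.2f}  model={model:.3f}  biopsy-all={biopsy_all:.3f}  biopsy-none=0.000")
```

Plotting these values against the threshold probability yields the decision curve. A tool adds clinical value only over the range of thresholds where its curve lies above both default strategies, which is why the range of thresholds that clinicians and patients actually use matters.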

Promisingly, a growing number of studies have begun to estimate the clinical utility of radiomics-based tools in a retrospective fashion. For example, a recent publication from Paez and colleagues demonstrated the potential clinical utility of the Optellum LCP-CNN for longitudinal assessment of PNs, as malignancy risk estimates for malignant PNs increased over time while those for benign PNs remained relatively stable [98]. Separately, in 2021 Kammer and colleagues described the development of a novel combination biomarker incorporating clinical variables in addition to blood and radiomics-based inputs and performed a clinical utility analysis to estimate what the effect of using the biomarker would have been on clinical decision-making [99]. They found that use of this novel biomarker would theoretically have reduced both the proportion of individuals with benign PNs undergoing invasive procedures and the time to diagnosis of cancer among those with malignant PNs.

As previously mentioned, the gold standard method of evaluating any novel intervention is to perform a prospective randomized controlled trial that directly assesses the impact of an intervention on patient-centered clinical outcomes. Multiple experts have urged the performance of such trials when evaluating any novel AI-based technology [69,72,100,101]. To date, no clinical trials have been conducted evaluating the clinical effectiveness of a radiomics-based AI tool on PN risk stratification. However, a recent search of ClinicalTrials.gov reveals one such trial that is actively recruiting patients (NCT05968898). This pragmatic randomized controlled trial will compare usual care with an approach to PN risk stratification that incorporates use of the Optellum LCP-CNN tool. The primary outcome will be the composite proportion of malignant PNs managed with biopsy or empiric treatment and benign PNs managed with imaging surveillance, and secondary outcomes include timeliness of care, adverse events, diagnostic yield of biopsy procedures, and healthcare costs. Thus, much needed future efforts to carefully investigate AI technology are currently in the pipeline.

5. Conclusions

In conclusion, recent advances in radiomics-based AI technology have yielded promising preliminary data suggesting that AI may serve a complementary role to routine clinical decision-making for PN management in the future. However, widespread adoption of such novel tools has not yet been observed despite commercial availability, and use of such technology is not currently recommended by any clinical guidelines due to a dearth of adequate clinical utility and prospective randomized controlled trial data. Future rigorously conducted clinical research studies are required to fully evaluate the clinical effectiveness of radiomics-based AI tools for PN risk stratification and to clearly define what role, if any, these tools should play within routine clinical practice.

Footnotes

Author contributions

R.Y.K. performed the literature review and wrote the manuscript.

Funding

R.Y.K. is supported by a National Cancer Institute Career Development Award (K08CA279881).

Conflict of interest

No relevant financial conflicts of interest to disclose.

References

1. Smith-Bindman, Miglioretti, D.L., Johnson, Lee, Feigelson, H.S., Flynn, Greenlee, R.T., Kruger, R.L., Hornbrook, M.C., Roblin, Solberg, L.I., Vanneman, Weinmann, Williams, A.E., Use of diagnostic imaging studies and associated radiation exposure for patients enrolled in large integrated health care systems, 1996–2010, JAMA 307 (2012), 2400–2409.

2. Gould, M.K., Tang, Liu, I.L., Lee, Zheng, Danforth, K.N., Kosco, A.E., Di Fiore, J.L., Suh, D.E., Recent trends in the identification of incidental pulmonary nodules, Am J Respir Crit Care Med 192 (2015), 1208–1214.

3. Moyer, V.A., U.S. Preventive Services Task Force, Screening for lung cancer: U.S. Preventive Services Task Force recommendation statement, Ann Intern Med 160 (2014), 330–338.

4. Meza, Jeon, Toumazis, Ten Haaf, Cao, Bastani, Han, S.S., Blom, E.F., Jonas, D.E., Feuer, E.J., Plevritis, S.K., de Koning, H.J., Kong, C.Y., Evaluation of the benefits and harms of lung cancer screening with low-dose computed tomography: Modeling study for the US Preventive Services Task Force, JAMA 325 (2021), 988–997.

5. Siegel, R.L., Miller, K.D., Fuchs, H.E., Jemal, Cancer statistics, 2022, CA Cancer J Clin 72 (2022), 7–33.

6. Huo, Sheu, Volk, R.J., Shih, Y.T., Complication rates and downstream medical costs associated with invasive diagnostic procedures for lung abnormalities in the community setting, JAMA Intern Med 179 (2019), 324–332.

7. Zhao, Huo, Burks, A.C., Ost, D.E., Shih, Y.T., Updated analysis of complication rates associated with invasive diagnostic procedures after lung cancer screening, JAMA Netw Open 3 (2020), e2029874.

8. Nishi, S.P.E., Zhou, Okereke, Kuo, Y.F., Goodwin, Use of imaging and diagnostic procedures after Low-Dose CT screening for lung cancer, Chest 157 (2020), 427–434.

9. Farjah, Monsell, S.E., Gould, M.K., Smith-Bindman, Banegas, M.P., Heagerty, P.J., Keast, E.M., Ramaprasan, Schoen, Brewer, E.G., Greenlee, R.T., Buist, D.S.M., Association of the intensity of diagnostic evaluation with outcomes in incidentally detected lung nodules, JAMA Intern Med 181 (2021), 480–489.

10. Rendle, K.A., Saia, C.A., Vachani, Burnett-Hartman, A.N., Doria-Rose, V.P., Beucker, Neslund-Dudas, Oshiro, Kim, R.Y., Elston-Lafata, Honda, S.A., Ritzwoller, Wainwright, J.V., Mitra, Greenlee, R.T., Rates of downstream procedures and complications associated with lung cancer screening in routine clinical practice: A retrospective cohort study, Ann Intern Med 177 (2024), 18–28.

11. Ost, D.E., Gould, M.K., Decision making in patients with pulmonary nodules, Am J Respir Crit Care Med 185 (2012), 363–372.

12. Gould, M.K., Donington, Lynch, W.R., Mazzone, P.J., Midthun, D.E., Naidich, D.P., Wiener, R.S., Evaluation of individuals with pulmonary nodules: when is it lung cancer? Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines, Chest 143 (2013), e93S–e120S.

13. MacMahon, Naidich, D.P., Goo, J.M., Lee, K.S., Leung, A.N.C., Mayo, J.R., Mehta, A.C., Ohno, Powell, C.A., Prokop, Rubin, G.D., Schaefer-Prokop, C.M., Travis, W.D., Van Schil, P.E., Bankier, A.A., Guidelines for management of incidental pulmonary nodules detected on CT images: From the Fleischner Society 2017, Radiology 284 (2017), 228–243.

14. American College of Radiology, Lung CT Screening Reporting and Data System (Lung-RADS), in.

15. Tanner, N.T., Porter, Gould, M.K., X.J., Vachani, Silvestri, G.A., Physician assessment of pretest probability of malignancy and adherence with guidelines for pulmonary nodule evaluation, Chest 152 (2017), 263–270.

16. Tanner, N.T., Aggarwal, Gould, M.K., Kearney, Diette, Vachani, Fang, K.C., Silvestri, G.A., Management of pulmonary nodules by community pulmonologists: A multicenter observational study, Chest 148 (2015), 1405–1414.

17. Lokhandwala, Bittoni, M.A., Dann, R.A., D’Souza, A.O., Johnson, Nagy, R.J., Lanman, R.B., Merritt, R.E., Carbone, D.P., Costs of diagnostic assessment for lung cancer: A Medicare claims analysis, Clin Lung Cancer 18 (2017), e27–e34.

18. Wiener, R.S., Gould, M.K., Slatore, C.G., Fincke, B.G., Schwartz, L.M., Woloshin, Resource use and guideline concordance in evaluation of pulmonary nodules for cancer: too much and too little care, JAMA Intern Med 174 (2014), 871–880.

19. Silvestri, G.A., Vachani, Whitney, Elashoff, Porta Smith, Ferguson, J.S., Parsons, Mitra, Brody, Lenburg, M.E., Spira, Team, A.S., A bronchial genomic classifier for the diagnostic evaluation of lung cancer, N Engl J Med 373 (2015), 243–251.

20. National Lung Screening Trial Research, Aberle, D.R., Adams, A.M., Berg, C.D., Black, W.C., Clapp, J.D., Fagerstrom, R.M., Gareen, I.F., Gatsonis, Marcus, P.M., Sicks, J.D., Reduced lung-cancer mortality with low-dose computed tomographic screening, N Engl J Med 365 (2011), 395–409.

21. Swensen, S.J., Silverstein, M.D., Ilstrup, D.M., Schleck, C.D., Edell, E.S., The probability of malignancy in solitary pulmonary nodules, Archives of Internal Medicine 157 (1997).

22. Herder, G.J., van Tinteren, Golding, R.P., Kostense, P.J., Comans, E.F., Smit, E.F., Hoekstra, O.S., Clinical prediction model to characterize pulmonary nodules: validation and added value of 18F-fluorodeoxyglucose positron emission tomography, Chest 128 (2005), 2490–2496.

23. Reid, Choi, H.K., Han, Wang, Mukhopadhyay, Kou, Ahmad, Wang, Mazzone, P.J., Development of a risk prediction model to estimate the probability of malignancy in pulmonary nodules being considered for biopsy, Chest 156 (2019), 367–375.

24. McWilliams, Tammemagi, M.C., Mayo, J.R., Roberts, Liu, Soghrati, Yasufuku, Martel, Laberge, Gingras, Atkar-Khattra, Berg, C.D., Evans, Finley, Yee, English, Nasute, Goffin, Puksa, Stewart, Tsai, Johnston, M.R., Manos, Nicholas, Goss, G.D., Seely, J.M., Amjadi, Tremblay, Burrowes, MacEachern, Bhatia, Tsao, M.S., Lam, Probability of cancer in pulmonary nodules detected on first screening CT, N Engl J Med 369 (2013), 910–919.

25. MacMahon, Jiang, Armato, S.G., 3rd, Accuracy of the Vancouver lung cancer risk prediction model compared with that of radiologists, Chest 156 (2019), 112–119.

26. Balekian, A.A., Silvestri, G.A., Simkovich, S.M., Mestaz, P.J., Sanders, G.D., Daniel, Porcel, Gould, M.K., Accuracy of clinicians and models for estimating the probability that a pulmonary nodule is malignant, Ann Am Thorac Soc 10 (2013), 629–635.

27. Tanner, N.T., Brasher, P.B., Jett, Silvestri, G.A., Effect of a rule-in biomarker test on pulmonary nodule management: A survey of pulmonologists and thoracic surgeons, Clin Lung Cancer 21 (2020), e89–e98.

28. Maiga, A.W., Deppen, S.A., Massion, P.P., Callaway-Lane, Pinkerman, Dittus, R.S., Lambright, E.S., Nesbitt, J.C., Grogan, E.L., Communication about the probability of cancer in indeterminate pulmonary nodules, JAMA Surg 153 (2018), 353–357.

29. Iaccarino, J.M., Simmons, Gould, M.K., Slatore, C.G., Woloshin, Schwartz, L.M., Wiener, R.S., Clinical equipoise and shared decision-making in pulmonary nodule management. A survey of American Thoracic Society Clinicians, Ann Am Thorac Soc 4 (2017), 968–975.

30. Slatore, C.G., Horeweg, Jett, J.R., Midthun, D.E., Powell, C.A., Wiener, R.S., Wisnivesky, J.P., Gould, M.K., ATS Ad Hoc Committee on Setting a Research Framework for Pulmonary Nodule Evaluation, An official American Thoracic Society research statement: A research framework for pulmonary nodule evaluation and management, Am J Respir Crit Care Med 192 (2015), 500–514.

31. Paez, Kammer, M.N., Tanner, N.T., Shojaee, Heideman, B.E., Peikert, Balbach, M.L., Iams, W.T., Ning, Lenburg, M.E., Mallow, Yarmus, Fong, K.M., Deppen, Grogan, E.L., Maldonado, Update on biomarkers for the stratification of indeterminate pulmonary nodules, Chest (2023).

32. Kammer, M.N., Massion, P.P., Noninvasive biomarkers for lung cancer diagnosis, where do we stand? J Thorac Dis 12 (2020), 3317–3330.

33. Mamdani, Ahmed, Armstrong, Mok, Jalal, S.I., Blood-based tumor biomarkers in lung cancer for detection and treatment, Transl Lung Cancer Res 6 (2017), 648–660.

34. Tao, Cao, Zhu, Nie, Wang, Liu, Chen, Hong, Zhao, Liquid biopsies to distinguish malignant from benign pulmonary nodules, Thorac Cancer 12 (2021), 1647–1655.

35. Liu, Xiang, Han, Lim, H.Y., Zhang, Yang, Guo, Soo, Ren, Wang, Goh, B.C., Blood-based liquid biopsy: Insights into early detection and clinical management of lung cancer, Cancer Lett 524 (2022), 91–102.

36. Silvestri, G.A., Tanner, N.T., Kearney, Vachani, Massion, P.P., Porter, Springmeyer, S.C., Fang, K.C., Midthun, Mazzone, P.J., Team, P.T., Assessment of plasma proteomics biomarker’s ability to distinguish benign from malignant lung nodules: Results of the PANOPTIC (Pulmonary Nodule Plasma Proteomic Classifier) trial, Chest 154 (2018), 491–500.

37. Vachani, Whitney, D.H., Parsons, E.C., Lenburg, Ferguson, J.S., Silvestri, G.A., Spira, Clinical utility of a bronchial genomic classifier in patients with suspected lung cancer, Chest 150 (2016), 210–218.

38. Team, A.S., Shared gene expression alterations in nasal and bronchial epithelium for lung cancer detection, J Natl Cancer Inst 109 (2017).

39. Keogh, R.J., Riches, J.C., The use of breath analysis in the management of lung cancer: Is it ready for primetime? Curr Oncol 29 (2022), 7355–7378.

40. Horvath, Lazar, Gyulai, Kollai, Losonczy, Exhaled biomarkers in lung cancer, Eur Respir J 34 (2009), 261–275.

41. Wang, Huang, Meng, Liu, Zhao, Wang, Qiu, Identification of lung cancer breath biomarkers based on perioperative breathomics testing: A prospective observational study, EClinicalMedicine 47 (2022), 101384.

42. Y.J., F.Z., Yang, S.C., Tang, E.K., Liang, C.H., Radiomics in early lung cancer diagnosis: From diagnosis to clinical decision support and education, Diagnostics (Basel) 12 (2022).

43. Khawaja, Bartholmai, B.J., Rajagopalan, Karwoski, R.A., Varghese, Maldonado, Peikert, Do we need to see to believe?-radiomics for lung nodule classification and lung cancer risk stratification, J Thorac Dis 12 (2020), 3303–3316.

44. Fujita, AI-based computer-aided diagnosis (AI-CAD): the latest review to read first, Radiol Phys Technol 13 (2020), 6–19.

45. Gillies, R.J., Kinahan, P.E., Hricak, Radiomics: Images are more than pictures, they are data, Radiology 278 (2016), 563–577.

46. Wilson, Devaraj, Radiomics of pulmonary nodules and lung cancer, Transl Lung Cancer Res 6 (2017), 86–91.

47. Yang, Feng, Chi, Duan, Liu, Liang, Wang, Chen, Liu, Deep learning aided decision support for pulmonary nodules diagnosing: A review, J Thorac Dis 10 (2018), S867–S875.

48. Hosny, Parmar, Quackenbush, Schwartz, L.H., Aerts, Artificial intelligence in radiology, Nat Rev Cancer 18 (2018), 500–510.

49. Ather, Kadir, Gleeson, Artificial intelligence and radiomics in pulmonary nodule management: Current status and future applications, Clin Radiol 75 (2020), 13–19.

50. Wang, Zhou, Yuan, Pang, Fang, Zhang, Huang, Zhou, Wang, Lin, Sun, Tang, Yan, Zhang, Cheng, Zhang, Development and validation of a clinically applicable deep learning strategy (HONORS) for pulmonary nodule classification at CT: A retrospective multicentre study, Lung Cancer 155 (2021), 78–86.

51. Way, T.W., Hadjiiski, L.M., Sahiner, Chan, H.P., Cascade, P.N., Kazerooni, E.A., Bogot, Zhou, Computer-aided diagnosis of pulmonary nodules on CT scans: segmentation and classification using 3D active contours, Med Phys 33 (2006), 2323–2337.

52. Way, T.W., Sahiner, Chan, H.P., Hadjiiski, Cascade, P.N., Chughtai, Bogot, Kazerooni, Computer-aided diagnosis of pulmonary nodules on CT scans: improvement of classification performance with nodule surface features, Med Phys 36 (2009), 3086–3098.

53. Way, Chan, H.P., Hadjiiski, Sahiner, Chughtai, Song, T.K., Poopat, Stojanovska, Frank, Attili, Bogot, Cascade, P.N., Kazerooni, E.A., Computer-aided diagnosis of lung nodules on CT scans: ROC study of its effect on radiologists’ performance, Acad Radiol 17 (2010), 323–332.

54. Huang, Park, Yan, Lee, Chu, L.C., Lin, C.T., Hussien, Rathmell, Thomas, Chen, Hales, Ettinger, D.S., Brock, Fishman, E.K., Gabrielson, Lam, Added value of computer-aided CT image features for early lung cancer diagnosis with small pulmonary nodules: A matched case-control study, Radiology 286 (2018), 286–295.

55. Peikert, Duan, Rajagopalan, Karwoski, R.A., Clay, Robb, R.A., Qin, Sicks, Bartholmai, B.J., Maldonado, Novel high-resolution computed tomography-based radiomic classifier for screen-identified pulmonary nodules in the National Lung Screening Trial, PLoS ONE 13 (2018), e0196910.

56. Maldonado, Varghese, Rajagopalan, Duan, Balar, Lakhani, D.A., Antic, S.B., Massion, Johnson, T.F., Karwoski, R.A., Robb, R.A., Bartholmai, B.J., Peikert, Validation of the BRODERS classifier (Benign versus aggressive nODule Evaluation using Radiomic Stratification), a novel high-resolution computed tomography-based radiomic classifier for indeterminate pulmonary nodules, Eur Respir J (2020).

57. Balagurunathan, Schabath, M.B., Wang, Liu, Gillies, R.J., Quantitative imaging features improve discrimination of malignancy in pulmonary nodules, Sci Rep 9 (2019), 8528.

58. Ardila, Kiraly, A.P., Bharadwaj, Choi, Reicher, J.J., Peng, Tse, Etemadi, Corrado, Naidich, D.P., Shetty, End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography, Nat Med 25 (2019), 954–961.

59. Venkadesh, K.V., Setio, A.A.A., Schreuder, Scholten, E.T., Chung, W.W., Saghir, van Ginneken, Prokop, Jacobs, Deep learning for malignancy risk estimation of pulmonary nodules detected at Low-Dose Screening CT, Radiology (2021), 204433.

60. Massion, P.P., Antic, Ather, Arteta, Brabec, Chen, Declerck, Dufek, Hickes, Kadir, Kunst, Landman, B.A., Munden, R.F., Novotny, Peschl, Pickup, L.C., Santos, Smith, G.T., Talwar, Gleeson, Assessing the accuracy of a deep learning method to risk stratify indeterminate pulmonary nodules, Am J Respir Crit Care Med 202 (2020), 241–249.

61. Baldwin, D.R., Gustafson, Pickup, Arteta, Novotny, Declerck, Kadir, Figueiras, Sterba, Exell, Potesil, Holland, Spence, Clubley, O’Dowd, Clark, Ashford-Turner, Callister, M.E., Gleeson, F.V., External validation of a convolutional neural network artificial intelligence tool to predict malignancy in pulmonary nodules, Thorax 75 (2020), 306–312.

62. Kim, R.Y., Oke, J.L., Pickup, L.C., Munden, R.F., Dotson, T.L., Bellinger, C.R., Cohen, Simoff, M.J., Massion, P.P., Filippini, Gleeson, F.V., Vachani, Artificial intelligence tool for assessment of indeterminate pulmonary nodules detected with CT, Radiology 304 (2022), 683–691.

63. Kim, R.Y., Oke, J.L., Dotson, T.L., Bellinger, C.R., Vachani, Effect of an artificial intelligence tool on management decisions for indeterminate pulmonary nodules, Respirology 28 (2023), 582–584.

64. de Hond, A.A.H., Steyerberg, E.W., van Calster, Interpreting area under the receiver operating characteristic curve, Lancet Digit Health 4 (2022), e853–e855.

65. Van Calster, Steyerberg, E.W., Wynants, van Smeden, There is no such thing as a validated prediction model, BMC Med 21 (2023), 70.

66. Park, Jackson, G.P., Foreman, M.A., Gruen, Das, A.K., Evaluating artificial intelligence in medicine: phases of clinical research, JAMIA Open 3 (2020), 326–331.

67. Khera, Butte, A.J., Berkwits, Hswen, Flanagin, Park, Curfman, Bibbins-Domingo, AI in medicine-JAMA’s focus on clinical outcomes, patient-centered care, quality, and equity, JAMA (2023).

68. Challen, Denny, Pitt, Gompels, Edwards, Tsaneva-Atanasova, Artificial intelligence, bias and clinical safety, BMJ Qual Saf 28 (2019), 231–237.

69. K.H., Kohane, I.S., Framing the challenges of artificial intelligence in medicine, BMJ Qual Saf 28 (2019), 238–241.

70. Strohm, Hehakaya, Ranschaert, E.R., Boon, W.P.C., Moors, E.H.M., Implementation of artificial intelligence (AI) applications in radiology: hindering and facilitating factors, Eur Radiol 30 (2020), 5525–5532.

71. Neri, Coppola, Miele, Bibbolino, Grassi, Artificial intelligence: Who is responsible for the diagnosis? Radiol Med 125 (2020), 517–521.

72. Cabitza, Rasoini, Gensini, G.F., Unintended consequences of machine learning in medicine, JAMA 318 (2017), 517–518.

73. Kohli, Jha, Why CAD failed in mammography, J Am Coll Radiol 15 (2018), 535–537.

74. Tsai, T.L., Fridsma, D.B., Gatti, Computer decision support as a source of interpretation error: the case of electrocardiograms, J Am Med Inform Assoc 10 (2003), 478–483.

75. Povyakalo, A.A., Alberdi, Strigini, Ayton, How to discriminate between computer-aided and computer-hindered decisions: a case study in mammography, Med Decis Making 33 (2013), 98–107.

76. Cole, E.B., Zhang, Marques, H.S., Edward Hendrick, Yaffe, M.J., Pisano, E.D., Impact of computer-aided detection systems on radiologist accuracy with digital mammography, AJR Am J Roentgenol 203 (2014), 909–916.

77. Lehman, C.D., Wellman, R.D., Buist, D.S., Kerlikowske, Tosteson, A.N., Miglioretti, D.L., Breast Cancer Surveillance, Diagnostic accuracy of digital screening mammography with and without computer-aided detection, JAMA Intern Med 175 (2015), 1828–1837.

78. Nair, Bartlett, E.C., Walsh, S.L.F., Wells, A.U., Navani, Hardavella, Bhalla, Calandriello, Devaraj, Goo, J.M., Klein, J.S., MacMahon, Schaefer-Prokop, C.M., Seo, J.B., Sverzellati, Desai, S.R., Lung Nodule Evaluation, Variable radiological lung nodule evaluation leads to divergent management recommendations, Eur Respir J 52 (2018).

79. Verdial, F.C., Madtes, D.K., Cheng, G.S., Pipavath, Kim, Hubbard, J.J., Zadworny, Wood, D.E., Farjah, Multidisciplinary team-based management of incidentally detected lung nodules, Chest 157 (2020), 985–993.

80. van Riel, S.J., Sanchez, C.I., Bankier, A.A., Naidich, D.P., Verschakelen, Scholten, E.T., de Jong, P.A., Jacobs, van Rikxoort, Peters-Bax, Snoeren, Prokop, van Ginneken, Schaefer-Prokop, Observer variability for classification of pulmonary nodules on Low-Dose CT images and its effect on nodule management, Radiology 277 (2015), 863–871.

81. van Riel, S.J., Jacobs, Scholten, E.T., Wittenberg, Winkler Wille, M.M., de Hoop, Sprengers, Mets, O.M., Geurts, Prokop, Schaefer-Prokop, van Ginneken, Observer variability for Lung-RADS categorisation of lung cancer screening CTs: impact on patient management, Eur Radiol 29 (2019), 924–931.

82. Wiener, R.S., Gould, M.K., Woloshin, Schwartz, L.M., Clark, J.A., What do you mean, a spot? A qualitative analysis of patients’ reactions to discussions with their physicians about pulmonary nodules, Chest 143 (2013), 672–677.

83. Slatore, C.G., Golden, S.E., Ganzini, Wiener, R.S., D.H., Distress and patient-centered communication among veterans with incidental (not screen-detected) pulmonary nodules. A cohort study, Ann Am Thorac Soc 12 (2015), 184–192.

84. Freiman, M.R., Clark, J.A., Slatore, C.G., Gould, M.K., Woloshin, Schwartz, L.M., Wiener, R.S., Patients’ knowledge, beliefs, and distress associated with detection and evaluation of incidental pulmonary nodules for cancer: results from a multicenter survey, J Thorac Oncol 11 (2016), 700–708.

85. Verghese, Shah, N.H., Harrington, R.A., What this computer needs is a physician: Humanism and artificial intelligence, JAMA 319 (2018), 19–20.

86. Van Calster, Wynants, Verbeek, J.F.M., Verbakel, J.Y., Christodoulou, Vickers, A.J., Roobol, M.J., Steyerberg, E.W., Reporting and interpreting decision curve analysis: A guide for investigators, Eur Urol 74 (2018), 796–804.

87. Vickers, A.J., Cronin, A.M., Elkin, E.B., Gonen, Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers, BMC Med Inform Decis Mak 8 (2008), 53.

88. Vickers, A.J., Elkin, E.B., Decision curve analysis: a novel method for evaluating prediction models, Med Decis Making 26 (2006), 565–574.

89. Vickers, A.J., van Calster, Steyerberg, E.W., A simple, step-by-step guide to interpreting decision curve analysis, Diagn Progn Res 3 (2019), 18.

90. Fitzgerald, Saville, B.R., Lewis, R.J., Decision curve analysis, JAMA 313 (2015), 409–410.

91. Raji, O.Y., Duffy, S.W., Agbaje, O.F., Baker, S.G., Christiani, D.C., Cassidy, Field, J.K., Predictive accuracy of the Liverpool Lung Project risk model for stratifying patients for computed tomography screening for lung cancer: a case-control and cohort validation study, Ann Intern Med 157 (2012), 242–250.

92. Siddiqui, M.M., Rais-Bahrami, Turkbey, George, A.K., Rothwax, Shakir, Okoro, Raskolnikov, Parnes, H.L., Linehan, W.M., Merino, M.J., Simon, R.M., Choyke, P.L., Wood, B.J., Pinto, P.A., Comparison of MR/ultrasound fusion-guided biopsy with ultrasound-guided biopsy for the diagnosis of prostate cancer, JAMA 313 (2015), 390–397.

93. Baker, S.G., Putting risk prediction in perspective: relative utility curves, J Natl Cancer Inst 101 (2009), 1538–1542.

94. Baker, S.G., Decision Curves and Relative Utility Curves, Med Decis Making 39 (2019), 489–490.

95. Baker, S.G., Kramer, B.S., Evaluating prognostic markers using relative utility curves and test tradeoffs, J Clin Oncol 33 (2015), 2578–2580.

96. Baker, S.G., Van Calster, Steyerberg, E.W., Evaluating a new marker for risk prediction using the test tradeoff: an update, Int J Biostat 8 (2012).

97. Kammer, M.N., Rowe, D.J., Deppen, S.A., Grogan, E.L., Kaizer, A.M., Baron, A.E., Maldonado, The intervention probability curve: modeling the practical application of threshold-guided decision-making, evaluated in lung, prostate, and ovarian cancers, Cancer Epidemiol Biomarkers Prev 31 (2022), 1752–1759.

98. Paez, Kammer, M.N., Balar, Lakhani, D.A., Knight, Rowe, Xiao, Heideman, B.E., Antic, S.L., Chen, S.C., Peikert, Sandler, K.L., Landman, B.A., Deppen, S.A., Grogan, E.L., Maldonado, Longitudinal lung cancer prediction convolutional neural network model improves the classification of indeterminate pulmonary nodules, Sci Rep 13 (2023), 6157.

99. Kammer, M.N., Lakhani, D.A., Balar, A.B., Antic, S.L., Kussrow, A.K., Webster, R.L., Mahapatra, Barad, Shah, Atwater, Diergaarde, Qian, Kaizer, New, Hirsch, Feser, W.J., Strong, Rioth, Miller, Y.E., Balagurunathan, Rowe, D.J., Helmey, Chen, S.C., Bauza, Deppen, S.A., Sandler, Maldonado, Spira, Billatos, Schabath, M.B., Gillies, R.J., Wilson, D.O., Walker, R.C., Landman, Chen, Grogan, E.L., Baron, A.E., Bornhop, D.J., Massion, P.P., Integrated biomarkers for the management of indeterminate pulmonary nodules, Am J Respir Crit Care Med 204 (2021), 1306–1316.

100. Park, S.H., Han, Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction, Radiology 286 (2018), 800–809.

101. Topol, E.J., High-performance medicine: the convergence of human and artificial intelligence, Nat Med 25 (2019), 44–56.